Property testing and geometry / Christian Sohler [original]

Property Testing and Geometry

Dissertation of Christian Sohler

September 30, 2002

Acknowledgments

First of all, I would like to thank my thesis advisor Professor Friedhelm Meyer auf der

Heide for his great support. He showed me in many helpful discussions how to improve

my work and suggested in which direction my research should go. But he also left me the

opportunity to go my own way.

I would also like to thank my ’co-advisor’ Professor Artur Czumaj for doing an excel-

lent job. Artur introduced me to property testing and suggested to consider it in the context

of geometry. His encouragement and ideas were always helpful and it was a pleasure for

me to work with him.

Then I would like to thank Valentina Damerow, Alexander May, Thomas M¨

uller, Har-

ald R¨

acke, and Martin Ziegler for carefully reading parts of this thesis and their helpful

suggestions to improve its readability.

Finally, a special thanks to the anonymous person(s) who cheered me up with a gift in

form of a ’SchlagerBild’ music CD during the last phase of writing.

iii

Contents

1 Introduction 1

1.1 Motivation.................................. 4

1.2 RelatedWork ................................ 5

2 Preliminaries 9

2.1 The Standard Testing Model . . . . . . . . . . . . . . . . . . . . . . . . 9

2.2 Combinatorial Properties . . . . . . . . . . . . . . . . . . . . . . . . . . 11

2.3 The Power of Uniform Sampling . . . . . . . . . . . . . . . . . . . . . . 12

2.4 Two Probability Lemmas . . . . . . . . . . . . . . . . . . . . . . . . . . 15

3 Testing Algorithms for Geometric Properties 17

3.1 Intersection of Geometric Objects . . . . . . . . . . . . . . . . . . . . . 17

3.2 ConvexPosition............................... 20

3.3 Euclidean Minimum Spanning Tree . . . . . . . . . . . . . . . . . . . . 28

3.3.1 Basic Definitions and Input Representation . . . . . . . . . . . . 29

3.3.2 Testing EMSTs in Well-Shaped Graphs . . . . . . . . . . . . . . 31

3.3.3 A Simple Property Tester in Well-Shaped Graphs . . . . . . . . . 34

3.3.4 An Improved Property Tester in Well-Shaped Graphs . . . . . . . 37

3.3.5 A Property Tester in Graphs with Maximum Degree 5 . . . . . . 43

3.3.6 A Property Tester in General Graphs . . . . . . . . . . . . . . . . 48

4 Efficient Property Testers 51

4.1 Abstract Combinatorial Programs . . . . . . . . . . . . . . . . . . . . . 51

4.1.1 Testing Abstract Combinatorial Programs . . . . . . . . . . . . . 53

4.2 Property Testing vs. Testing Abstract Combinatorial Programs . . . . . . 56

4.3 ClusteringProblems............................. 57

4.3.1 Radiusclustering.......................... 58

4.3.2 Diameter clustering . . . . . . . . . . . . . . . . . . . . . . . . . 60

4.4 ReversalDistance.............................. 64

4.5 Property Testing vs. Testing Abstract Combinatorial Programs (continued) 68

4.6 GraphColoring............................... 70

4.7 HypergraphColoring............................ 73

4.8 Testable Hereditary Graph Properties . . . . . . . . . . . . . . . . . . . . 78

Contents

5 Testing Algorithms with Geometric Queries 83

5.1 Property Testing with Range Queries . . . . . . . . . . . . . . . . . . . . 83

5.2 Testing Convex Position with Range Queries . . . . . . . . . . . . . . . . 85

5.3 MapLabeling................................ 89

5.4 ClusteringProblems............................. 93

6 Property Testing and Moving Data 97

6.1 Soft Kinetic Data Structures . . . . . . . . . . . . . . . . . . . . . . . . 98

6.1.1 Analysis of Soft Kinetic Data Structures . . . . . . . . . . . . . . 100

6.1.2 Discussion of The Model . . . . . . . . . . . . . . . . . . . . . . 101

6.2 Basic Soft Kinetic Data Structures . . . . . . . . . . . . . . . . . . . . . 101

6.2.1 SortedSequences.......................... 101

6.2.2 Balanced Search Trees . . . . . . . . . . . . . . . . . . . . . . . 105

6.2.3 RangeTrees............................. 108

6.2.4 Euclidean Minimum Spanning Trees . . . . . . . . . . . . . . . . 111

7 Conclusions 113

1 Introduction

Property Testing is the computational task to decide whether a given object (e.g., a graph,

a function, or a point set) has a certain predetermined property (e.g., bipartiteness, mono-

tonicity, or convex position) or is far away from every object having this property. If the

input object neither has the property nor is far away from it, the algorithm may answer

arbitrarily. In contrast to traditional algorithms a property testing algorithm does not get a

complete description of the object under consideration as input. Instead, it has the ability

to perform queries about the (local) structure of the object. This way it is possible that a

property testing algorithm achieves its goal by looking only at a small (usually randomly

selected) part of the whole object.

The type of query used by a property testing algorithm depends on the object under

consideration. For example, if the object is a (dense) graph Gthen the algorithm may

query for entries in the adjacency matrix of G. If the object is a function f, it may ask

queries of the form: ’What is the value of f(x)?’ where xis an element of the domain of

f. For point sets the algorithm may ask for the position of the i-th point of the set (given

an arbitrary, fixed, but unknown ordering among the points).

To specify if an object is far away from a property we need a distance measure between

objects. The distance between two objects O1and O2is typically given by the ’fraction

of object O1’ that has to be changed in order to obtain object O2. For example, in the

adjacency matrix model for graphs the distance between two graphs is the fraction of

entries in the adjacency matrix on which the two graphs differ. The distance between

functions is the fraction of domain elements on which the value of the two functions differs

and the distance of points sets is the fraction of points that are contained in one set but not

in the other. To determine when an object is far away from a property a distance parameter

is introduced. Given a distance measure and a distance parameter we say that an object

is -far from a property, if the distance to every object having the property is more than .

The quality of a property testing algorithm is measured by its query complexity and its

running time. The query complexity is the number of queries the algorithm asks about the

input. The running time is the time required for additional computations.

The concept of property testing was first explicitly formulated by Rubinfeld and Sudan

mainly in the context of program checking [75]. The study of property testing for com-

binatorial objects with focus on graphs was initiated by Goldreich et al. [51]. In many

follow-up papers the concept of property testing has been applied to different classes of

objects including graphs, hypergraphs, matrices, formal languages, Boolean expressions,

and point sets.

In this thesis we study geometric problems in the context of property testing and we

1 Introduction

apply concepts from computational geometry to property testing problems. We start our

investigations in Chapter 3 with the development of property testing algorithms for some

fundamental geometric properties [A. Czumaj, C. Sohler, and M. Ziegler, Property testing

in computational geometry, In: Proceedings of the 8th Annual European Symposium on

Algorithms (ESA), pages 155 - 166, 2000]. We show in particular, that disjointness of a set

of ngeometric objects can be tested with O(pn/)query complexity. Next we develop a

property testing algorithm for the convex position property of point sets in the Rd. A point

set is in convex position if every point of the set is a vertex of its convex hull. Our testing

algorithm has a query complexity of Od+1

pnd/. For both problems we show that our

property testing algorithm is optimal with respect to its asymptotic query complexity. Next

we consider the property that a geometric graph in R2(a graph with vertex set equal to a

point set in the R2)is a Euclidean minimum spanning tree. We develop a property testing

algorithm with query complexity O(pn/ log(n/)) for this problem. Our algorithm

is designed in the adjacency list model for graphs and it uses an interesting non uniform

sampling technique.

In the next chapter we present a general framework for property testing algorithms with

one-sided error [A. Czumaj and C. Sohler, Abstract combinatorial programs and efficient

property testers, In: Proceedings of the 43rd Annual IEEE Symposium on Foundations of

Computer Science (FOCS), to appear, 2002]. Our framework is based on a connection of

property testing with a new class of problems which we call abstract combinatorial pro-

grams. Informally, we show that a property can be tested efficiently, if for every problem

instance (no matter, if the instance has the property or not) there is an abstract combinato-

rial program (of small dimension and width) that satisfies a Feasibility Preserving property

and a Distance Preserving property. We apply our framework to a variety of classical com-

binatorial problems. We prove the testability of some clustering properties of point sets,

of the reversal distance property of permutations, and the k-coloring property of graphs

and hypergraphs [A. Czumaj and C. Sohler, Testing hypergraph coloring, In: Proceedings

of the 28th Annual International Colloquium on Automata, Languages and Programming

(ICALP), pages 493 - 505, 2001]. Although for most of these properties a property testing

algorithm has been known before, our framework can be used to simplify the correctness

proofs and in most cases it slightly improves the analyzed query complexity. More im-

portant, it shows that our framework is fairly general and can be applied to a couple of

different problems. In the last section of Chapter 4 we continue our investigations in the

generality of the framework and show: If a hereditary graph property can be tested in

time independent of the size of the graph (for constant ), then there exists a proof for its

testability in our framework. A graph property is called hereditary, if it is closed under

taking induced subgraphs, i.e., if G= (V, E)is a graph with a hereditary graph property

then every subgraph of Ginduced by a set S⊆Vhas the property, as well. Although the

current formulation of the framework is purely combinatorial and there is no trace of the

geometric origin, its development has been influenced by concepts from geometry such as

the geometry of linear programming and LP-type problems [76].

In Chapter 5 we return to the development of specific property testing algorithms;

this time under a different model of computation [A. Czumaj and C. Sohler, Property

testing with geometric queries, In: Proceedings of the 9th Annual European Symposium on

Algorithms (ESA), pages 266 - 277, 2001]. While in the standard testing model considered

in Chapter 3 a testing algorithm for properties of point sets is only allowed to query for

the position of the i-th point of the set we now allow a certain kind of geometric queries

when accessing the input point set. In this new model a testing algorithm may specify a

query range Rand ask for the i-th point within R. Depending on the type of query range

allowed we have different models of computation. The queries we consider are supported

by standard spatial data structures as they appear in many applications. We show that in

the new model it is possible to design more efficient testing algorithms than in the model

considered so far. In particular, we reconsider the convex position property and show that

it can be tested with O(logn/)triangular range queries. Then we show that there is a

property testing algorithm for a basic map labeling property. This algorithm uses O(1/3)

rectangular range queries. Finally, we visit the clustering properties from Chapter 4 again.

In Chapter 6 we apply property testing in the context of moving objects [A. Czumaj,

C. Sohler, Soft kinetic data structures, In: Proceedings of the 12th ACM-SIAM Symposium

on Discrete Algorithms (SODA), pages 865 -872, 2001]. When we consider moving object

like cars, robots, and mobile phones we do not know the future position of an object.

In many situations an object is controlled and moved by a third party which is located

outside of our system. Basically, the only query a system can guarantee to answer correctly

is for the current position of an object. In Chapter 6 we develop a theoretical model

for this kind of motion. We are interested in combinatorial structures that are uniquely

defined by the current position of all objects. We assume the following: Queries about

the structures we maintain occur subsequently in time. Between two queries the objects

may move arbitrarily. This movement induces combinatorial changes in the maintained

structure because the structure is uniquely defined by the objects’ positions. Before each

query a data structure may spend some time to ’adjust’ its data. The time spent is compared

to the number of ’combinatorial changes’ that occurred in the data structure since the

most recent query. More precisely, we assume that the objects moved on the ’shortest’ (in

terms of combinatorial changes) route to their current position. If we do not want to ask

for the position of every single object before each query, there is no hope to maintain a

combinatorial structure correctly. That is, where property testing can be applied. We only

maintain approximately correct structures (in the property testing sense). If a structure

is far from correct, then the property testing algorithm usually rejects and returns a small

counter example. This counter example is then locally corrected. This procedure is applied

until the property tester accepts. Using this approach we can show that we can maintain

approximately correct data structures spending only slightly more time than the number

of combinatorial changes in the structure during the shortest motion from configuration to

configuration. We illustrate our approach on sorted sequences, binary search trees, range

trees, and the Euclidean minimum spanning tree.

In the remainder of the Introduction we discuss our motivation to consider property

testing in general, and in particular in the geometric context. Lateron we give a brief

summary of the state of the art in property testing.

1 Introduction

1.1 Motivation

Property testing can be viewed as a relaxation of a standard decision procedure: Instead

of deciding whether an object has a given property or not, we only have to distinguish

between the case when the object has the property and the case when the object is far

away from it. This simpler problem can often be solved much faster and by looking only

at a small - sometimes constant - part of the input. For this reason the running time of a

property testing algorithm is usually much smaller than that of the corresponding decision

procedure. Problems that are infeasible in their original formulation as a decision problem

(e.g., NP-complete problems) are tractable in the property testing setting. The standard

way to deal with intractable problems would be, of course, to apply an approximation

algorithm (in the classical sense) to the problem. But sometimes polynomial or even linear

time approximation algorithm may also be too slow. In telecommunications, web traffic

analysis, and data mining we have massive data sets that cannot be processed by classical

algorithms. In such cases the only possibility is to use a sublinear time and space algorithm

as provided by property testing.

We also observe that property testing is a natural form of approximation. Whether or

not the approximation provided by a standard approximation algorithm is more useful than

that of property testing depends on the problem at hand. For example, in the case of graph

and hypergraph coloring it might be more useful to obtain a coloring that violates at most

an -fraction of the edges than a coloring that might use many more colors than necessary.

We remark that property testing can be very useful in the field of program verification.

In software development people often specify interfaces between different program parts.

Even with a correct and complete interface specification it may happen (more often than

not ?) that data is passed through the interface that is not consistent with the specification.

Typically, data is passed internally using pointers and so it is possible to pass huge objects

in constant time. Hence it is infeasible to check the whole object for its correctness without

heavy impact on the running time of the system. In this case a property testing algorithm

provides a useful alternative: If we cannot give a full guarantee that the input is correct

then we can at least detect the case when it is far from correct.

A further reason for the research in property testing algorithms is the fact that in some

situations being close to a property is almost as good as having the property. This can be

seen in the following example from Computer Graphics: One of the major problems in

Computer Graphics is to design efficient rendering algorithms. Rendering is the computa-

tional task to display a virtual scene on the screen. Such a virtual scene is typically com-

posed of several millions of polygons. Using nowadays graphics hardware and a standard

z-buffer algorithm it is not possible to render all polygons in a reasonable time. Therefore,

it is necessary to determine in advance if certain parts of the scene cannot be seen from

the current point of view. This process is called occlusion culling. Unfortunately, it is very

difficult to compute exactly those polygons that cannot be seen from a certain point of

view. Many approaches have been developed to deal with this problem and some of them

work only for objects that have (roughly) a convex shape (see, e.g., the survey [26]). If we

could determine quickly in advance whether an object is convex or far away from convex

then we could quickly decide if it is useful for the purpose of (online) occlusion culling.

1.2 Related Work

Such a service can be provided by a property testing algorithm. Therefore we consider the

closely related problem, if a point set (possibly the set of vertices of an object used for

culling purposes) is in convex position, in Chapter 3 and 5 under different property testing

models.

One particular situation when property testing can be useful in Computational Geom-

etry is the exact computation of geometric structures. It is well-known that fixed precision

arithmetic leads to serious problems in the design of geometric algorithms. Therefore,

software packages provide efficient algorithms to evaluate geometric predicates exactly.

Unfortunately, precision takes its time and so the exact evaluation of geometric predicates

is usually much slower than what we could achieve by using fixed precision arithmetic.

One approach to deal with these problems is the following: We first compute the geomet-

ric structure using fixed precision arithmetic and then verify the final structure using exact

arithmetic. If we find errors we try to fix them locally. Unfortunately, such a heuristic

has the drawback that an early error in the computation may cause damage to the whole

part of the structure computed later. Therefore, we should make sure that our structure is

not ’too far away’ from the correct structure during the whole computation. This could

be achieved by a property testing algorithm. If the property testing algorithm rejects the

current structure, then we can correct it before we finish the computation.

Another situation when property testing is useful is in the case of continuously chang-

ing data. Continuously changing data arises in applications that deal with moving objects,

e.g., when we want to maintain a mobile ad-hoc network for moving robots. In such a

situation we can query each robot for its current position but such a query may be slow

and each query increases the network load. It also does not make sense to update the posi-

tion of each robot before a transmission using the network is performed - such an update

would often create much more network load than the transmission itself. Instead we may

use property testing techniques to check the structure of the current network. If we find

errors in the structure we then correct them locally. In Chapter 6 we develop a theoretical

framework for mobile data and we show how to maintain a certain type of approximate

range trees. These range trees may be useful for the problem of maintaining such a com-

munication network.

1.2 Related Work

We now give a brief summary of the state of the art in property testing. Property testing

was first explicitly formulated by Rubinfeld and Sudan in the context of program checking

[75]. In [51] Goldreich, Goldwasser, and Ron initiated the study of property testing for

combinatorial objects. Since then, people designed property testing algorithm for a wide

variety of problems and tried to classify large classes of testable properties. In the follow-

ing we try to summarize the known results in the different areas where property testing

has been applied. We say that a property can be tested efficiently, if there is a property

testing algorithm with a query complexity that does not depend on the size of the input (is

constant, if the distance parameter is a constant).

Related document tools

Tools for careful academic writing

Plag helps identify passages that may need closer source checking. Identific can support academic and institutional document processes. They help keep academic document workflows clearer.

plag.ai

1 Introduction

Graph, Hypergraph, and Matrix Properties. Graph properties have been exten-

sively studied in the context of property testing. There are essentially two models for

testing graph properties. The adjacency matrix model for dense graphs and the adjacency

list model for sparse graphs. In the adjacency matrix model introduced in [51] it is known

that properties that can be formulated as a certain class of graph partitioning properties

can be tested efficiently [51]. Among these properties are clique size, cut size, bisection,

and k-colorability properties of graphs. Colorability properties have received much atten-

tion in subsequent work. The analysis from [51] for the bipartiteness and the (standard)

k-colorability property has been improved in [6]. In [5] Alon et al. used a variant of Sze-

mer´

edi’s Regularity Lemma to show that a very general graph coloring property can be

tested efficiently. They use this result to prove that every first order logic graph property

without ∀∃ quantification is efficiently testable. Though the query complexity of their al-

gorithm is independent of the size of the graph its dependency on is enormous (i.e., a

tower of towers of a polynomial in ). The notion of coloring from [5] has been further

generalized in various ways in [42] and shown to be efficiently testable.

Another general property considered in the research is the property of H-freeness

(graphs not containing a fixed graph Has an induced subgraph). Alon proved that the

query complexity of a one-sided error property tester is polynomial, if and only if His

bipartite [3]. The property of H-freeness can also be tested efficiently in 3-uniform hyper-

graphs [62].

In the adjacency matrix model there is always a canonical property testing algorithm as

the following result of Goldreich and Trevisan shows: A property in the adjacency matrix

model can be tested with query complexity independent of the size of the graph, if and

only if there exists a property testing algorithm that samples a set Sof vertices of size

independent of the graph uniformly at random and that accepts if and only if the subgraph

induced by Shas some fixed graph property (possibly different from the tested one) [54]. In

this work the authors also characterize graph partition problems that are efficiently testable

with one-sided error.

The adjacency list model for graphs has been introduced in [52] by Goldreich and Ron.

They show that some graph properties like connectivity, planarity, and cycle-freeness can

be tested efficiently. In contrast to the adjacency matrix model in the adjacency list model

there is no canonical property testing algorithm known. Typically, non-uniform sampling

strategies that depend on the problem at hand have been applied. For this reason the prop-

erties considered in the adjacency list model seem to be structurally simplier than the ones

analyzed in the adjacency matrix model. Colorability properties have also been considered

in the adjacency list model: In [53] Goldreich and Ron showed that bipartiteness can be

tested with a query complexity of Opoly((logn)/)pn/using a non-uniform sam-

pling strategy based on random walks. The authors also present an almost matching lower

bound on the query complexity. Very recently, Bogdanov et al. [19] presented a lower

bound of Ω(n)(for sufficiently small ) on the query complexity of 3-colorability in the

adjacency list model. It is also known that the diameter of graphs [67] and acyclicity of

directed graphs is testable in sublinear time [15].

The popularity of the adjacency matrix model for graphs lead to the question which

matrix properties can be tested efficiently. Fischer and Newman showed that matrix prop-

1.2 Related Work

erties defined by a finite collection of forbidden induced partially ordered sets (where we

view a matrix as a partially ordered set in the standard way, that is, as a product order of

total orders) are efficiently testable [45]. Parnas and Ron developed property testing algo-

rithms for the problem to test whether a given matrix is a Euclidean metric, a tree metric,

or an ultrametric [68].

Properties of Functions. Monotonicity properties of functions have also received a

lot of attention. Monotonicity of binary functions f:{0, 1}n→{0, 1}can be tested with

a query complexity of O(n/)[50]. This result has been generalized to functions over

arbitrary finite ordered sets in [37]. Sortedness of a sequence of nnumbers can be tested

in O(logn/)time [40]. Lower bounds on monotonicity testing have been given in [44].

Properties of Boolean functions that can be tested efficiently are singletons, monomi-

als, and the property that fis a monotone DNF formula with at most `terms. [70]. In [43]

Fischer et al. proved that the property that a Boolean function over nvariables depends

only on kof them can be tested with a number of queries depending only on kand .

Language Properties. In order to understand the nature of efficiently testable prop-

erties people considered property testing in the context of formal languages. Alon et al.

showed that a regular language seen as a property can be tested efficiently [7] and there

exists a context-free language that cannot be efficiently tested. Further, there is a property

testing algorithm for testing Dyck languages (languages containing strings of balanced

parenthesis) with query complexity e

O(n2/3/)[69]. Newman proved that constant width

oblivious read-once branching programs viewed as properties can be tested efficiently [66].

Recently, Fischer and Newman showed that read-twice constant width branching programs

are not necessarily testable [46].

Other Results. There are a couple of results on specific topics that do not fit into one

of the three categories above. We summarize them in this paragraph.

Alon and Shapira showed that a general satisfiability problem which includes hyper-

graph k-coloring can be tested efficiently [8].

In the context of clustering Alon et al. considered the radius and diameter clustering

problem, i.e. the problem if a given point set can be partitioned into ksubsets such that

the radius of the smallest sphere enclosing each set (the maximum distance between two

points within each set, respectively) is at most b. If we view this problem as a property of

point sets then it can be tested efficiently as proved in [4].

Some results related to statistics have been obtained: In [14] Batu et al. gave a prop-

erty testing algorithm with query complexity O(n2/3/−4logn)for closeness of distri-

butions. Given a distribution Aover [n]×[m]it can be tested with query complexity

O(n2/3m1/3poly(−1)) if the distributions obtained by projecting Ato each coordinate

are independent [13].

Recently, Buhrman et al. introduced the notion of property testing to quantum com-

puting [21].

1 Introduction

Surveys. There are three surveys that summarize some of the work done in the field of

property testing [42, 74, 49].

2 Preliminaries

In this chapter we introduce some notation used throughout this thesis. In particular, we

introduce a formulation of the standard property testing model with one sided error. We

introduce a special class of properties which we call combinatorial properties.

Then we show that if there exists a property tester for a combinatorial property of point

sets then there exists a property tester with the same query complexity that picks a sample

set uniformly at random and decides based on the sample and its internal coin flips. Such a

result is used for lower bound constructions for the query complexity of a property testing

algorithm.

Finally, we prove two auxiliary lemmas. The first lemma gives a lower bound on the

probability that a sample taken uniformly at random from a set Ωcontains one of kdisjoint

sets of size l. The second lemma gives an upper bound on that probability.

2.1 The Standard Testing Model

We begin with some basic notation and definitions. We use the e

O-notation to hide poly-

logarithmic factors, i.e. we have O(npolylog(n)) = e

O(n). We write [n] := {1,...,n}

for the set of integer numbers between 1and n. Throughout this thesis, let Ddenote a

finite set called domain and let Rdenote a (possibly infinite) set called range. Further let

Fdenote a set of functions from Dto R. For a subset S⊆ D and a function f∈ F let

f|Sdenote the restriction of fto S. That is, f|S:S→Rwith f|S(x) = f(x)for all x∈S.

Now we define a property of Fas a set of functions from F:

Definition 2.1.1 A set Π⊆ F is called a property.

As already mentioned in the introduction we also need a distance measure between

functions in Fto define a property testing problem. In general, such a distance measure

can be an arbitrary function σ:F ×F →[0, 1]:

Definition 2.1.2 Given a distance measure σbetween functions in Fand a real number

,0≤<1, we say a function f∈ F is -far from (having a property) Πif σ(f, g)> 

for every function g∈Π.

Typically, we define the distance between two functions f, g ∈ F as the fraction of

domain elements on which the two functions differ (see, for example, [51]). Therefore, we

denote this distance measure as the standard distance measure:

2 Preliminaries

Definition 2.1.3 Given two functions f, g ∈ F we define the standard distance measure σ1

for functions in Fas:

σ1(f, g) = |{x∈ D :f(x)6=g(x)}|

|D|.

The goal of property testing is to develop efficient property testers. A property tester

for Πis an algorithm that gets a distance parameter and a (possibly implicit) description

of D(for example, when Dis the set of numbers from [n]it gets n). The property tester

has oracle access to the input function f(for each x∈ D it may ask queries of the form:

’What is the value of f(x)?’). A property tester must

•accept every function f∈Π, and

•reject every function fthat is -far from Πwith probability at least 2

1Notice that if f /∈Πand fis not -far from Π, then the outcome of the algorithm can go

either way.

Complexity of Property Testers. There are two types of possible complexity mea-

sures for property testers: The query complexity and the running time. The query com-

plexity measures the number of queries asked by a property testing algorithm:

Definition 2.1.4 The number of queries to the oracle is the query complexity of the prop-

erty tester.

If one counts also the time the algorithm needs to perform other tasks than querying

the input function values (for example, to compute certain combinatorial structures or to

compute a coloring of a graph), then the obtained complexity is called the running time of

the property tester. Now we present two examples how objects other than functions can be

represented in our model.

Point Sets. When we consider point sets we set D= [n]. Then we can represent a

point set P={p1,··· , pn}in Rdby a function f: [n]→Rdwith f(i) = pifor 1≤i≤n.

Throughout the thesis we assume that the input point set is in general position when we

deal with property testing algorithms for point sets. General position means that our point

set is not in a degenerate configuration. Roughly speaking, we call a configuration of

points degenerate, if it occurs with probability 0when the points are subject to small

random perturbations.

To be more precise, when we consider the convex position property as defined in Chap-

ter 3 then a point set Pin the Rdis degenerate, if there are more than k+1points in a

k-dimensional affine subspace with k<d. When we consider the Euclidean minimum

spanning tree, then a point set is degenerate, if the point set does not have a unique min-

imum spanning tree or if it has a subset that does not have a unique minimum spanning

tree.

1We consider a one-sided error model, though in the literature also a two-sided error model has been

considered. In the two-sided error model the goal is to distinguish with probability at least 2

3between

the case f∈Πand the case of fbeing -far from Π.

2.2 Combinatorial Properties

Graphs. In the literature one can find three models for property testing in graphs. The

first (and most widely considered) model is called the adjacency matrix model [51]. Here

one assumes that a graph G= (V, E)is represented as an adjacency matrix, i.e. it is

possible to query for each pair (v, u)with v, u ∈V, if (v, u)∈Eor not. Similarly, one

can assume that the graph is represented by a function f:V×V→{0, 1}that encodes the

adjacency matrix. W.l.o.g., we assume that V= [n]. Hence when we set D= [n]×[n]

and R={0, 1}then we can represent graphs as functions from Dto Ras required.

The second model is the bounded length adjacency list model [52]. Here one assumes

that a degree bound don the maximum degree of the vertices in the input graph is known.

In this case a graph is represented as a function f: [n]×[d]→[n]∪{0}where f(i, j)

denotes the j-th neighbor of vertex iand where f(i, j) = 0, if such a neighbor does not

exist.

The third model is the unbounded length adjacency list model [67]. In this model, the

input is not represented as a function. The model is specified by the different queries that

may be asked about the input graph G= (V, E). First of all, the algorithm may ask for the

degree deg(v)of each vertex v∈V. Further it may ask for the i-th neighbor of vertex v

for each i≤deg(v).

2.2 Combinatorial Properties

We now want to introduce a special class of properties that plays an important role in

property testing. We refer to these properties as combinatorial properties. Combinatorial

properties are only defined for functions whose domain Dis (isomorphic to) a set of the

form [n]dfor some constant d>1. They are properties that are closed under permutations

(isomorphisms) of the domain. This is typically the case when the function is used to

represent a set of objects or a combinatorial structure such like a graph or hypergraph.

Definition 2.2.1 Let Fbe the set of functions with domain D= [n]dand arbitrary range

R. Let Sndenote the set of permutations of [n]. A property Πof Fis called combinatorial,

if it is closed under permutations of [n]. That is, if π∈ Snis a permutation of [n]then

for every f∈Πit holds: π(f)∈Πwhere π(f)(i1, . . . , id) := f(π(i1), . . . , π(id)) for

i1,··· , id∈[n].

Let us consider a few examples for combinatorial properties. If the considered object

is a set of basic objects (that is, there is no ordering among the objects) then every property

must be combinatorial. This is because the property does not depend on the way the object

is represented as a function. Examples for combinatorial properties of sets of objects

are the disjointness property considered in Chapter 3 or the clustering properties of point

sets in Chapter 4. Other examples are all graph and hypergraph properties (if viewed as

properties of an n-vertex graph or hypergraph).

Examples for properties that are not combinatorial are properties like sortedness,con-

vex polygons2, and the reversal distance property considered in Chapter 4. These prop-

erties typically depend heavily on the ordering given implicitly by the representation of

2The property that a sequence of points forms a simple convex polygon.

2 Preliminaries

the input object as a function. When we design property testers for these properties we

sometimes require non-uniform sampling techniques.

For combinatorial properties of point sets we show in the following section that there is

no chance to improve the query complexity of a property tester by using adaptive sampling.

If we consider combinatorial properties of point sets and we have an arbitrary property

tester with query complexity q(n, )then there is a property tester that samples a set

S⊆[n]of size q(n, )uniformly at random and decides based on Sand its internal coin

flips.

2.3 The Power of Uniform Sampling

We now prove a fundamental lemma that is needed for lower bound constructions. We

show that if there is a property tester for a combinatorial property Πof point sets with

query complexity q(, n)then there is a property tester for Πthat picks a sample set

S⊆Pof size q(, n)uniformly at random and decides based on Sand its internal coin

flips. The proof is almost similar to the proof of Lemma 4.1 in the paper [54] by Goldreich

and Trevisan which is based on an analogous statement proven by Bar-Yossef et al. in [11].

The idea of the proof is simple: There are n!different representations of the same

point set, one for each permutation of [n]. When we start our algorithm on a random

representation of the same set then choosing the next index adaptively simply means to

pick another element uniformly at random. Indeed we cannot gain anything using adaptive

sampling:

Lemma 2.3.1 Let Πbe an arbitrary combinatorial property of point sets and let Abe an

arbitrary property tester for Πwith query complexity q(, n). Then there exists a property

tester for property Πthat selects a set Sof q(, n)points uniformly at random and decides

based on Sand its internal coin flips.

Proof : Recall that a point set P={p1,··· , pn}in the Rdis represented by a function

f: [n]→Rdwith f(i) = pi. Since Πis a combinatorial property we know that we can

talk about properties of Prather than properties of the function representing P. That is, if

one representation of Phas property Πthen every other representation also has property

Π. We use this fact in the following to construct a property tester that samples uniformly at

random from an arbitrary property tester for Π. Both property testers have the same query

complexity.

Let Abe an arbitrary property tester for Π. Wlog., we assume that algorithm Aoperates

in iterations. In each iteration, depending on its internal coin flips and the answers obtained

in the past iterations, the algorithm selects a new index iand makes a query for the position

of the point pi.

Now we obtain a new algorithm A0from Ain the following way: When A0is started

with input point set Pof size n(given as an oracle) it first choses a permutation πof [n]

uniformly at random. Then algorithm Ais invoked with oracle access to the point set π(P)

that is represented by the function fπ: [n]→Rddefined by fπ(i) = f(π(i)). We observe

2.3 The Power of Uniform Sampling

that π(P)has property Π, if and only if Phas property Πsince π(P) = Pand since Πis

combinatorial. Further we know that Ais a property tester for property Π. Thus it must

accept, if π(P)has property Π. It follows that A0accepts, if Phas property Π.

If Pis -far from Πthen so is π(P). By the fact that Ais a property tester it follows

that π(P)is rejected by Awith probability at least 2/3. It follows that A0also rejects P

with probability at least 2/3. Thus we know that A0is a property tester for property Π.

Our next step is to show that in each iteration of algorithm A0the next chosen point

is uniformly distributed among all possible choices (among all points that have not been

chosen in a previous iteration). For every sequence rof coin flips let Ardenote the deter-

ministic algorithm obtained from Aby fixing the outcome of the coin flips according to r.

Further let A0

rdenote the algorithm similar to A0with the exception that Aris invoked in-

stead of A. We show that for every sequence rof coin flips and for every possible sequence

of queries and answers the choice of the point selected next by algorithm A0is uniformly

distributed among the points not selected so far.

Let iπ

1,r,··· , iπ

j,r denote the indices selected by A0

rin iterations 1to j, i.e., the indices se-

lected when Aris invoked with oracle access to the point set π(P). Further let qπ

1,r,··· , qπ

j,r

denote the points selected by A0

rin iterations 1to j, i.e., qπ

`,r =fπ(iπ

`,r)for 1≤`≤j. The

choice of the index iπ

j+1,r selected by A0

rin iteration j+1depends only on the previously

selected points qπ

1,r, ..., qπ

j,r . Thus when we condition πby fπ(iπ

`,r) = qπ

`,r,1≤`≤j,

the choice of index iπ

j+1,r is deterministic. Further we prove in the following claim that

for every fixed index iπ

j+1,r under the same conditioning the point fπ(iπ

j+1,r)is uniformly

distributed among the points not selected so far:

Claim 2.3.2 Let i1,··· , ij∈[n]be distinct indices and let q1,··· , qj∈Pbe the distinct

points. Then for each i∈[n]\ {i1,··· , ij}and each p∈P\ {q1,··· , qj}:

Pr[fπ(i) = pfπ(i`) = q`, 1 ≤`≤j] = 1

n−j

Proof : The number of permutations of [n]with fπ(i`) = q`for 1≤`≤jis (n−j)!.

For each i∈[n]\ {i1,··· , ij}and each p∈P\ {q1,··· , qj}the number of permutations

of [n]with fπ(i) = pand fπ(i`) = q`for 1≤`≤jis (n−j−1)!. Thus

Pr[fπ(i) = pfπ(i`) = q`, 1 ≤`≤j] = (n−j−1)!

(n−j)! =1

n−j

Thus we know that in each iteration of algorithm A0

rthe next chosen point is uniformly

distributed among the points not chosen so far. We use this fact to build an algorithm

A00 that has the same output distribution as A0. Then we design an algorithm A000 that

samples a set S⊆Pof q(, n)points uniformly at random and then decides based on

internal coin flips and the point positions. We observe that we can execute algorithm A0by

first choosing a sequence rof coin flips uniformly at random and then invoking algorithm

r. Algorithm A00 works in a similar manner. It first chooses a sequence rof coin flips

2 Preliminaries

uniformly at random. Then it invokes algorithm A00

r. Algorithm A00

rbehaves similarly to

algorithm A0

rwith the exception that in each iteration when A0

rrequests the position of

the point with index ialgorithm A00

rselects uniformly at random an index not selected so

far and returns the corresponding point. Let us formalize this point and look at iteration

jof both algorithms. Let ijdenote the index selected by algorithm A0

rin iteration jand

let i0

jdenote the index selected by algorithm A00

r. When during iteration j+1algorithm

rrequests the position of the point stored at index ij+1then A00

rselects an index i0

j+1

uniformly at random from the set [n]\ {i0

1,··· , i0

j}and answers the query with the position

of point i0

j+1.

Claim 2.3.3 Let Pbe any point set in the Rd. If algorithms A00 and A0are invoked with

oracle access to the same function representing Pthen their output distributions are iden-

tical.

Proof : Let us fix the function representing Pand r(for both algorithms). We show by

induction on the number of iterations that the output distribution of algorithms A00

rand A0

are identical. We already proved for a fixed selection of points in iterations 1to jthat the

next point chosen by algorithm A0

ris uniformly distributed among all points not selected so

far. Clearly, the same is true for algorithm A00

r(if the distribution of the indices is uniform

then so is the distribution of the corresponding points). Thus the induction step holds and

thus the output distributions of both algorithms are identical. 2

We have already proven that A0is a property tester for property Π. Since the output

distributions of A00 and A0are identical for every input point set we have shown that algo-

rithm A00 is also a property tester. But A00 uses a non-adaptive sampling. Finally, we show

that there is an algorithm A000 that samples a set Sof q(, n)points uniformly at random

and decides based on Sand its internal coin flips.

Algorithm A000 first samples a set Sof q(, n)indices uniformly at random and then

chooses a permutation (i1,··· , iq(,n))of Suniformly at random. It then runs algorithm

A00 and answers the j-th query of algorithm A00 by returning the point stored at index ij.

Claim 2.3.4 Let Pbe any point set in Rd. If algorithms A000 and A00 are invoked with oracle

access to the same function representing Pthen their output distributions are identical.

Proof : Again we fix the function representing point set Pand the random choice of r. We

show by induction on the number of iterations that both algorithms have the same output

distribution. It suffices to show that the choice of the index in iteration j+1is uniformly

distributed among the indices not chosen so far. For every index i∈[n]\ {i1,··· , ij}it

holds that

Pr[ij+1=ii`6=ifor 1≤`≤j] = q(, n) − j

n−j·(q(, n) − j−1)!

(q(, n) − j)! =1

n−j

Thus the index is chosen uniformly at random from the set of remaining indices. Hence,

the output distributions of algorithms A000 and A00 are identical. 2

Algorithm A000 satisfies the statement of Lemma 2.3.1 and thus the proof is completed.

2.4 Two Probability Lemmas

The first lemma shows that if we take a sample of size sfrom a set Ωof nelements, then

with good probability we have at least one of kdisjoint sets (each of size l) in our sample.

Lemma 2.4.1 Let Ωbe an arbitrary set of nelements. Let kand lbe arbitrary integers

(possibly dependent on n) and let sbe an arbitrary integer such that s≥2 n

(2 k)1/l . Let

W1, . . . , Wkbe arbitrary disjoint subsets of Ωeach of size l. Let Sbe a subset of Ωof

size swhich is chosen independently and uniformly at random. Then

Pr∃j∈[k] : (Wj⊆S)≥1

Proof : We first observe that we can focus on the case k≤n

2, because if k > n

2, then

every set Wicontains exactly one element and then we immediately get Pr[∃j∈[k] :

(Wj⊆S)] ≥1

2. Further, since k≤n

2yields l+n−l

(2 k)1/l ≤2 n

(2 k)1/l , it is sufficient to

consider only the case s=l+n−l

(2 k)1/l . Next, by Boole-Benferoni inequality (see, e.g., [65,

Proposition C.2]) we obtain

Pr∃j∈[k] : (Wj⊆S)≥

j=1

Pr[Wj⊆S] − X

1≤i<j≤k

Pr[(Wi∪Wj)⊆S].

Further, we observe that for every j∈[k]it holds that

Pr[Wj⊆S] = n−l

s−l

n

s=(n−l)!

(s−l)! (n−s)! ·s! (n−s)!

n!=(n−l)! s!

n! (s−l)! =

l−1

r=0

s−r

n−r.

Similar arguments can be used to show that for every i, j ∈[k], if i6=j, then it holds that

Pr[(Wi∪Wj)⊆S] = n−2 l

s−2 l

n

s=

2 l−1

r=0

s−r

n−r=

l−1

r=0

s−r

n−r·

l−1

r=0

(s−l) − r

(n−l) − r.

Hence, Boole-Benferoni inequality implies that

Pr∃j∈[k] : (Wj⊆S)≥k·

l−1

r=0

s−r

n−r−k

2·

l−1

r=0

s−r

n−r·

l−1

r=0

(s−l) − r

(n−l) − r

=k·

l−1

r=0

s−r

n−r· 1−k−1

2·

l−1

r=0

(s−l) − r

(n−l) − r!

≥k·

l−1

r=0

s−l

n−l· 1−k·

l−1

r=0

s−l

n−l!

=k·s−l

n−ll

· 1−k·s−l

n−ll!.

2 Preliminaries

Now, since we assumed that s=l+n−l

(2 k)1/l , our calculations above yield Pr[∃j∈[k] :

(Wj⊆S)] ≥1

4, and thus the lemma follows. 2

The next lemma is useful for certain lower bound constructions. The lemma gives an

upper bound on the probability that a sample set of size schosen uniformly at random

contains one of ksets Wicompletely.

Lemma 2.4.2 Let Ωbe an arbitrary set of nelements. Let k,l, and sbe arbitrary integers

(possibly dependent on n). Let W1, . . . , Wkbe any disjoint subsets of Ωof size leach.

Let Sbe a subset of Ωof size swhich is chosen independently and uniformly at random.

For every real p,0 < p < 1, (possibly depending on other parameters), if s≤(n− (l−

1)) ·(p/k)1/l, then Pr∃j∈[k] : (Wj⊆S)≤p .

Proof : We use the union bound to obtain that

Pr∃j∈[k] : (Wj⊆S)≤

j=1

Pr[Wj⊆S] =

j=1

l−1

r=0

s−r

n−r≤k·s

n− (l−1)l

Therefore, if s≤(n− (l−1)) ·(p/k)1/l, then the above inequality is upper bounded by

p.2

3 Testing Algorithms for Geometric

Properties

In this chapter we design and analyze testing algorithms for fundamental problems from

the area of Computational Geometry. The problems we consider deal with global proper-

ties of geometric objects such as point sets or geometric graphs. We can formulate property

testing in the standard model for geometric objects as follows:

Given a geometric object O(for example, a set of points or a geometric graph) and the

ability to perform local queries about the object (e.g. queries of the form: ’What is the

position of the i-th point of the collection ?’, ’What is the i-th edge incident to vertex p

?’, or ’What is the position of vertex p?’) decide whether Ohas a certain (predetermined)

property (e.g., whether the set of points is in convex position or whether the geometric

graph is a Euclidean minimum spanning tree) or is far away from every object having this

property.

We assume that our object representation is in the standard testing model. Point sets are

represented in the same way as it has been described in the previous chapter. Other objects

we consider are sets of geometric objects and geometric graphs. The object representation

is again in the spirit of the standard testing model. We allow only the most basic queries

about the object. It is only required that it is possible to determine the whole structure of

the object using these queries. By allowing only basic queries we ensure that our testing

algorithm is almost independent of the application (almost every structure supports these

basic queries). Later, in Chapter 5 we consider a model where we allow more powerful

queries.

The first problem we consider is disjointness of geometric objects. Then we design

a property tester for the convex position property of point sets. And finally, we give a

property tester for Euclidean minimum spanning trees.

3.1 Intersection of Geometric Objects

The first problem we consider is to test whether a set O={O1, . . . , On}of ngeometric

objects is pairwise disjoint. A geometric object is an (implicitly represented) subset of

Rd. Two geometric objects O1and O2are said to be disjoint, if O1∩O2=∅. A set

O={O1, . . . , On}of ngeometric objects is pairwise disjoint, if each pair of objects Oi

and Oj,1≤i, j ≤n, is disjoint. An equivilant formulation of this problem is to test

3 Testing Algorithms for Geometric Properties

whether the intersection graph of the objects contains no edges.1

We represent sets of geometric objects by a function f: [n]→Rwhere Rcontains

implicit representations of all geometric objects of a certain class of objects, e.g. all line

segments, points, or rectangle. For example, when we consider sets of line segments in the

Rdthen R=Rd×Rd. The property tester for disjointness of geometric objects is used

later in this thesis when we design a property tester for the Euclidean minimum spanning

tree. We use the standard distance measure from Definition 2.1.3 which has the following

equivalent formulation:

Definition 3.1.1 A set Oof nobjects in the Rdis -far from being pairwise disjoint, if one

has to remove more than n objects from Oto obtain a disjoint set of objects.

The Testing Algorithm. Like many other property testers the tester for disjointness of

geometric objects is very simple. It picks a set Sof 8pn/ objects uniformly at random

and computes whether these objects are pairwise disjoint. If they are, then the algorithm

concludes that in this case either the objects are disjoint or at least ’most of the objects of

the input set are disjoint’. Thus in such a case the algorithm can accept the input. If the

sample set contains two or more objects that intersect, then we know for sure that the set of

objects is not disjoint and the algorithm can reject the input. We prove that the following

algorithm is a property tester for disjointness of geometric objects.









DISJOINTNESSTESTER(set of objects O)

Choose a set S⊆ O of size 8pn/ uniformly at random

if Sis disjoint then accept

else reject

Lemma 3.1.2 Algorithm DISJOINTNESSTESTER is a property tester for disjointness of

geometric objects. Its query complexity is O(pn/)and its running time is T(8pn/)+

O(1), where T(m)is the time to decide whether a set of minput objects is disjoint.

Proof : We have to prove that (1) algorithm DISJOINTNESSTESTER accepts every set of

disjoint geometric objects, and (2) that it rejects every set of geometric objects that is -far

from disjoint with probability at least 2/3.

Part (1) is immediate: If Ois pairwise disjoint, the every subset of Ois also pairwise

disjoint and so algorithm DISJOINTNESSTESTER accepts O.

Thus let us suppose that Ois -far from disjoint and prove part (2). It is easy to see that

we can apply k=n/2 times the following procedure to O: pick a pair of intersecting

objects Wi,i∈[k], and remove it from O. In order to prove that DISJOINTNESSTESTER

is a property tester it is sufficient to show that with probability at least 2/3 at least one

of these pairs is in the sample set S. We apply Lemma 2.4.1 and standard amplification

1We do not use the formulation as a graph problem here because our distance measure would be based on

vertex deletion rather than edge deletion as it is usually in the property testing literature (see Section 2.1).

3.1 Intersection of Geometric Objects

arguments. We consider the sample set Sas four independently selected sets S∗

1, S∗

2, S∗

3, S∗

each of size 2n

(2k)1/2 and then apply Lemma 2.4.1 to obtain:

Pr∃j∈[k] : (Wj⊆S)≥1−Y

1≤i≤41−Pr∃j∈[k] : (Wj⊆S∗

i)≥1−(3/4)4≥2/3

A Lower Bound on the Query Complexity. We now want to show that the algo-

rithm DISJOINTNESSTESTER in the general setting of arbitrary geometric objects is op-

timal with respect to its asymptotic query complexity. For this purpose we consider the

problem if a set of unit disks is disjoint. We represent a set of nunit disks by their center

points, that is, by a function f: [n]→Rd. We observe that the disjointness property for

unit disks is equivalent to the following sparseness property of point sets. A point set Pin

the Rdis called sparse, if for each pair of points p, q we have dist(p, q)> 1. In order to

prove that every property tester for disjointness of geometric objects has query complexity

Ω(pn/)we show that there is no property tester for sparseness of point sets with query

complexity o(pn/).

Lemma 3.1.3 Every property tester for disjointness of sets of geometric objects must have

query complexity Ω(pn/).

Proof : As already mentioned above sparseness of point sets is a special case of dis-

jointness of geometric objects. Hence, it suffices to show a lower bound for the sparseness

property of point sets. Let us assume that there is a property tester Afor sparseness of

point sets with query complexity q(, n) = o(pn/).

Claim 3.1.4 Let Abe a property tester for sparseness of point sets with query complexity

q(, n). Then there is a property tester A00 for sparseness of point sets that samples a set S

of q(, n)points uniformly at random and accepts if and only if the points in Sare sparse.

Such a property tester is called canonical.

Proof : By Lemma 2.3.1 we know that there is a property tester A0that samples a set of

q(, n)points uniformly at random and decides based on Sand its internal coin flips. Let

A00 denote an algorithm that samples a set Sof q(, n)points uniformly at random and that

accepts if and only if Sis sparse. We want to prove that A00 is a property tester. Clearly,

algorithm A00 accepts every sparse point set. Thus we have to prove that it rejects every

point set that is -far from being sparse (a point set is -far from sparse if the corresponding

set of unit disks is -far from disjoint). We show this indirectly by proving that if algorithm

A0rejects a sample set Sthen so does algorithm A00. By definition a property tester has

one-sided error and so A0must accept if the sample is sparse. But if the sample Sis not

sparse then A00 rejects and thus it follows that if A0rejects then so does A00. Since A0is a

property tester it rejects every point set Pthat is -far from sparse with probability at least

2/3. Hence algorithm A00 also rejects such a set Pwith probability at least 2/3. Since A00

3 Testing Algorithms for Geometric Properties

accepts every sparse point set we have shown that A00 is a property tester for sparseness.

This proves Claim 3.1.4. 2

We construct a set Pof npoints in Rthat is -far from sparse but is not rejected by a

canonical property tester with query complexity q(, n)if nis sufficiently large. Pis the

union of bnc+1pairs Wiof points and a set Uof n−2(bnc+1)points such that the

following condition is satisfied for all points p, q ∈Pwith p6=q: If there is a Wiwith

Wi={p, q}then dist(p, q)≤1. Otherwise, we have dist(p, q)> 1. We observe that

algorithm A00 rejects Pif and only if the sample contains one of the sets Wi. We apply

Lemma 2.4.2 with p=1/2,k=n +1,

s=n−1

2(n +1)

=Ω(pn/)

and l=2; thus:

Pr∃j∈[k] : (Wj⊆S)≤1/2

Since q(, n) = o(pn/)for sufficiently large nwe have

q(, n)≤n−1

2(n +1)

This proves that there is a point set that is -far from sparse and that is rejected by algorithm

A00 with probability at most 1/2. Since 1/2 < 2/3 we conclude that A00 is not a property

tester. Contradiction. 2

We also note that the proof of the lower bound for disjointness of unit disks can be

modified to prove lower bounds for other classes of objects including line segments, rect-

angles, etc. This means that we cannot asymptotically improve the query complexity of a

property tester when we use the geometry of the special class of objects.

Summary. We summarize the results from this section in the following theorem:

Theorem 1 There is a property tester for pairwise disjointness of a set of geometric ob-

jects Owith query complexity O(pn/)and running time T(8pn/) + O(1)where

T(m)denotes the time needed to compute if a set of mobjects is disjoint. Every property

tester for disjointness of geometric objects has query complexity Ω(pn/).

3.2 Convex Position

In this section we consider a classical property of point sets: Being in convex position. Let

Pdenote a set of npoints in general position in the Rd. We say that a point set Pis in

convex position, if all points in Pare vertices of the convex hull of P:

3.2 Convex Position

Definition 3.2.1 Let Pbe a point set in the Rdand let CH(P)denote the convex hull of P.

A point p∈Pis called an extreme point of P, if pis a vertex of CH(P). A point set Pis in

convex position if each point in Pis an extreme point.

Using our standard representation of point sets we assume that the input point set Pis

represented by a function f: [n]→Rd. We also use the standard distance measure from

Definition 2.1.3. In terms of the problem under consideration this can be stated as follows:

Definition 3.2.2 A set Pof npoints is -far from convex position, if no set Qof size (at

most) n exists such that P\Qis in convex position.

In the remainder of this section we prove that the following algorithm is a property

tester for convex position:

CONVEXTESTER (P, )

let s=164d+1

pnd/ +2d +2

Choose a set S⊆Pof size suniformly at random

if Sis in convex position then accept

else reject

In order to show that CONVEXTESTER is a property tester we have to prove that (1)

it accepts every point set in convex position and (2) it rejects every point set that is -far

from convex position with probability at least 2/3.

We observe that CONVEXTESTER accepts every point set in convex position because

every subset of a set in convex position is in convex position, as well.

Thus we only have to prove that a point set that is -far from convex position is rejected

with high probability. We make use of the following theorem by Carath´

eodory [23]:

Theorem 2 (Carath´

eodory’s Theorem) If P⊆Rdand p∈ CH(P)thenp∈ CH{p0, . . . , pd}

for certain affinely independent vectors {p0, . . . , pd}in P.

Now let us assume that Pis -far from convex position. We want to prove that algo-

rithm CONVEXTESTER rejects Pwith probability at least 2/3. It follows from Theorem 2

that every point set that is not in convex position can be rejected by finding a small set of

d+2points that is not in convex position. We refer to such a set as a counter example.

The idea of the analysis is now to consider sets Wi,1≤i≤n

2(d+1), of d+1points

that can be easily extended to a set of d+2points that is not in convex position. Easily

extendible means that there is a large set Uiof points such that for each p∈Uithe set

Wi∪{p}is a counter example. Intuitively, the existence of such a large set means that we

get one of the d+2points of the counter example for free because our sample set contains

at least one point of Uiwith very high probability.

3 Testing Algorithms for Geometric Properties

Figure 3.1: Illustration for the proof of Lemma 3.2.3. One of the cones Ci(this one is

defined by p1and p2).

Finding Counter Examples. We now want to formalize this idea and prove two tech-

nical lemmas. First we prove that for a given point pin the interior of the convex hull of

a point set Pthere exists a set W⊆Pof dpoints such that there are at least n

d+1points in

Psuch that any of them can be added to W∪{p}to obtain a point set that is not in convex

position.

Lemma 3.2.3 Let Pbe a set of npoints in Rdand let p∈Rdbe a point with p∈

CHint(P)and let P∪{p}be in general position. Then there exist sets W⊆Pand U⊆

P\Wwith |W|=dand |U|≥n

d+1such that p∈ CHint(W∪{q})for each q∈U.

Proof : By Theorem 2 and the general position assumption it follows that there is a set

W∗⊆Pof size d+1such that p∈ CHint(W∗). Let us denote by W∗

i,1≤i≤d+1,

the subsets of W∗of size d. We show that one of the subsets W∗

iof W∗satisfies Lemma

3.2.3. W.l.o.g., let us assume that p= (0,...,0). We now want to partition the Rd

into d+1cones. Let W−

i,1≤i≤d+1, denote the set of points {(−x1,...,−xd) :

(x1, . . . , xd)∈W∗

i}. The conic combination of the point vectors in the set W−

idefines a

cone Ci,1≤i≤d+1. The union of these cones Cicovers the Rd. Thus there is a cone Cj,

1≤j≤d+1that contains at least n

d+1points of P(and also of P\W∗

jsince W∗

j∩Cj=∅).

We further observe that for each q∈P∩Cjit holds that p∈ CHint(W∗

j∪{q}). We

conclude that the sets W=W∗

jand U=P∩Cjsatisfy the lemma. 2

The second technical lemma shows that if a point set is -far from convex position

then there are many sets Wiand Uisuch that Wi∪{q}is not in convex position for every

q∈Ui.

Lemma 3.2.4 Let Pbe a set of npoints in the Rdthat is -far from convex position and

let k=n

d+1. Then there exist sets Wi⊆P,i∈[k], and sets Ui⊆P,i∈[k], such that the

following properties are fulfilled:

(1) |Wi|=d+1,i∈[k],

(2) Wi∩Wj=∅for all i, j ∈[k]with i6=j,

3.2 Convex Position

(3) Wi∪{q}is not in convex position for every q∈Ui, and

(4) |Ui|≥1−

d+1n.

Proof : We construct point sets P1=P, P2, . . . , Pkwith Wi⊆Piand Pi+1=Pi\Wi.

Clearly, our construction satisfies that the sets Wiare disjoint. We show how to construct

Wifrom the set Pi: First of all, observe that for every i∈[k]it holds that |Pi|=n− (d+

1)(i−1)and thus |Pi|≥n− (d+1)( n

d+1−1). This implies that |P\Pi|< n and so the

set Picannot be in convex position (by the definition of -far we can delete every set of

size at most n and do not obtain a point set in convex position). Since Piis not in convex

position there is a point p∈ CHint(Pi). We apply Lemma 3.2.3 with pand Pi\ {p}and

obtain the sets Wand U. We choose Wi=W∪{p}and Ui=U. By Lemma 3.2.3 we get

|Wi|=d+1and Wi∪{q}is not in convex position for each q∈Ui. We already observed

that |Pi|≥n− (d+1)( n

d+1−1). Thus we know that |Pi\ {p}| ≥n−n which means

that |Ui|≥n−n

d+1=1−

d+1n. All properties in Lemma 3.2.4 are satisfied and the lemma is

proven. 2

Lemma 3.2.5 Algorithm CONVEXTESTER is a property tester for the convex position

property. Its query complexity is O(d+1

pnd/).

Proof : We have to prove that CONVEXTESTER accepts every point set in convex position

and rejects every point set that is -far from convex position with probability 2/3.

We already observed that algorithm CONVEXTESTER accepts every point set in convex

position.

Thus we have to show that every point set that is -far from convex position is rejected

with probability at least 2/3. Let Pbe a point set that is -far from convex position and let

k=n

d+1. Then there exist sets Wiand Ui,i∈[k], with the properties stated in Lemma

3.2.4. Now let S⊆Pbe a point set of size 4d+1

pnd/ +2(d+1)chosen uniformly at

random from P. W.l.o.g., let us assume that  < 1/2. We view Sas the union of two sets

Wand Uof size 4d+1

pnd/ and 2(d+1), respectively. Clearly, we can assume that each

of them is chosen uniformly (and independently) at random from P. We first observe that

for a fixed Uiof size 1−

d+1n:

PrUi∩U6=∅≥1−1−|Ui|

n|U|≥1−1/e ≥1/2

We further know that:

PrPis rejected≥Pr∃i∈[k] : Wi⊆Sand Ui∩S6=∅

≥Pr∃i∈[k] : Wi⊆Wand Ui∩U6=∅

≥Pr∃i∈[k] : Wi⊆W·1/2 .

To derive a bound on Pr∃i∈[k] : Wi⊆Wwe apply Lemma 2.4.1 with k=n

d+1,

l=d+1, and

s=4d+1

pnd/ ≥d+1

p(2n)d·(d+1)/ ≥2n

(2k)1/l

3 Testing Algorithms for Geometric Properties

and obtain: Pr∃i∈[k] : Wi⊆W≥1/4 .

We conclude that PrPis rejected≥1/8 .

If algorithm CONVEXTESTER chooses a set Sof size 16 ·4d+1

pnd/ +2(d+1)then

PrPis rejected≥1− (1−1/8)16 ≥2/3

which proves the query complexity stated in Lemma 3.2.5. 2

A Lower Bound on the Query Complexity. We now want to prove that our prop-

erty tester is optimal with respect to the asymptotic query complexity. The proof follows

the same line of thought as the proof of Lemma 3.1.3. We first show that the existence of

a property tester with query complexity q(, n)implies that there is a canonical property

tester with the same query complexity, i.e. a property tester that samples a set Sof q(, n)

points uniformly at random and accepts, if and only if the sample set Sis in convex po-

sition. Then we show that for sufficiently large nthere is a point set Pthat is -far from

convex position such that

Pr[Sis in convex position ]> 1/3

holds, if S⊆Pis a set of q(, n) = o(d+1

pnd/)points chosen uniformly at random

from P. This implies that there is no property tester with query complexity q(, n).

Lemma 3.2.6 Every property tester (with one sided error) for convex position has query

complexity Ω(d+1

pnd/).

Proof : The proof is by contradiction. Assume there is a property tester Afor convex

position with query complexity q(, n) = o(d+1

pnd/). We first want to prove that there

is a canonical tester with the same query complexity:

Claim 3.2.7 Let Abe a property tester for convex position of point sets with query com-

plexity q(, n). Then there is a property tester A00 for convex position of point sets that

samples a set Sof q(, n)points uniformly at random and accepts if and only if the points

in Sare in convex position.

Proof : The proof is similar to the proof of Claim 3.1.4. By Lemma 2.3.1 we know that

there is a property tester A0that samples a set of q(, n)points uniformly at random and

decides based on Sand its internal coin flips. Let A00 be an algorithm that samples a set Sof

q(, n)points uniformly at random and that accepts if and only if Sis in convex position.

We want to prove that A00 is a property tester. Clearly, algorithm A00 accepts every point set

in convex position. Thus we have to prove that it rejects every point set that is -far from

convex position. We show this indirectly by proving that if algorithm A0rejects a sample

set Sthen so does algorithm A00. By definition a property tester has one-sided error and

3.2 Convex Position

so A0must accept if the sample is in convex position. But if the sample Sis not in convex

position then A00 rejects. It follows that A00 always rejects when A0rejects. Since A0is

a property tester it rejects every point set Pthat is -far from disjoint with probability at

least 2/3. Hence algorithm A00 also rejects such a set Pwith probability at least 2/3. Since

A00 accepts every point set in convex position we have shown that A00 is a property tester

for convex position. This proves Claim 3.2.7. 2

Let us denote by A00 the property tester as obtained in Claim 3.2.7. In the remainder of

the proof we construct a point set Pof npoints that is -far from convex position and show

that this point set is accepted by A00 (observe that A00 is uniquely defined by q(, n)) with

probability at least 1/2. Since Pis -far from convex position this is a contradiction to the

fact that A00 is a property tester (which would require that Pis rejected with probability at

least 2/3 > 1 −1/2 =1/2).

We start with an overview of our construction and then present it in details. The idea is

to construct a point set Pthat consists of n+1sets Wiof size d+1(and some additional

points). Each set Wiconsists of a set Fiof dextreme points that form a facet of the convex

hull and one point qithat is located infinitesimally close to the facet but in the interior of

the convex hull. The point set has the property that every subset S⊆Pof size more than

d+1is in convex position, if and only if there is a set Wi⊆S.

Figure 3.2: An example for the lower bound construction in 2D.

Now let Sbe a point set of size schosen uniformly at random from P. We apply

Lemma 2.4.2 with p=1/2 and obtain that for

s≤(n−d)·1

2(n +1)

1/(d+1)

=Θ(d+1

pnd/)

our sample set Sdoes not contain one of the sets Wiwith probability at least 1/2. Hence S

is in convex position with probability at least 1/2 and thus algorithm A00 is not a property

tester, if q(, n)≤(n−d)·1

2(n+1

1/(d+1).

TheDetailedConstruction. For a set Xanda pointpin Rdletdist(X, p) = minx∈X||x−

p||. Let P1be a set of n−d·(n +1)points in general and convex position and let

 < 1/d −1/n. Further let P2={p1, . . . , pk}⊆P1be a subset of size k=n +1.

3 Testing Algorithms for Geometric Properties

For each point p∈P1let hpdenote a hyperplane (a tangent to CH(P1)) such that

hp∩CH(P1) = p. Further let H+

p,H−

pdenote the closed halfspaces induced by hpsuch

that H+

pdenotes the halfspace containing P1(Figure 3.2). Let

H−

pHp

Figure 3.3: Step 1: We start with a point set in convex position.

β=min

p,q∈P1,p6=qdist(hp, q).

Observation 3.2.8 Every point set e

P1obtained from P1by perturbating each of the points

in P1by a distance of less than β/2 is in convex position.

Instead of perturbating the point set we replace each point pi∈P2by a set Fi,i∈[k], of d

points (Figure 3.4). We do this replacement in such a way that we obtain a point set P3in

general and convex position. For a point p∈Rdlet B(p, β/3)denote the d-dimensional

ball with center pand radius β/3. We choose each Fisuch that

•q∈hp∩B(p, β/3)for each q∈Fi.

•P3=P1\P2∪Fis in general position, where F=Si∈[k]Fi

Clearly, we can choose the Fisuch that P3is in general position because we can move the

points within B(pi, β/3)∩hpito destroy every degenerate cases. We observe that P3is in

convex position since hpiis a witness for the fact that the points in Fiare extreme points

in P3. We obtain the set Pfrom P3by adding a set Q={q1, . . . , qk}of kpoints to P3. Let

Wi=Fi∪{qi}for each qi∈Q. We want to choose Qin such a way that:

•P=P3∪Qis -far from convex position,

•If a subset S⊆Pcontains no set Wi,i∈[k], completely, then Sis in convex

position.

We now give a construction that satisfies these properties. For each i∈[k]let q0

idenote

a point in the interior of the ((d−1)-dimensional) convex hull of Fiand let videnote

a vector pointing from q0

iin the interior of CH(P3)(Figure 3.5). Let qi(β)denote the

point q0

i+β·vi. For i∈[k]and p∈Filet hFi,p(β)denote the hyperplane induced by

Fi\ {p}∪{qi(β)}. Further let H+

Fi,p(β)and H−

Fi,p(β)denote the halfspaces induced by

3.2 Convex Position

Figure 3.4: Step 2: We replace some points by a set Fiof d=2points.

hFi,p(β)such that H+

Fi,p(β)contains Pfor β=0. Since P3is in general position there is

a value β∗such that for every i∈[k]and every p∈Fithe halfspace H+

Fi,p(β∗)contains

P3\ {p}and such that P3∪Qis in general position (where Qis defined as follows). We

choose qi=qi(β∗)and Q=Si∈[k]qi. We now have to prove that Qhas the required

properties:

Claim 3.2.9 The set P=P3∪Qis -far from convex position. If a subset S⊆Pdoes not

contain a set Wicompletely, then Sis in convex position.

Proof : Assume Pis -close to convex position. Then there is a set Rof at most n

points such that P\Ris in convex position. Since |R|≤n there is a set Wicompletely

contained in P\R. Further we know that by the size of Pand Rthere must be at least

one further point qin P\R. By the choice of Qwe know that qis in H+

Fi,p(β∗)for each

p∈Fi. Hence qiis in the interior of CH(Fi∪{q})which contradict the fact that P\Ris

in convex position.

It remains to prove the second statement. Let S⊆Pa subset that contains no set Wi

completely. W.l.o.g., we assume that it contains exactly dpoints of each set Wi. If these

points are the points in Fithen by our construction the hyperplane hpiis a witness that the

points in Wi=Fiare extreme. Otherwise, one of the points in Fiis not in S. Let is call the

missing point p. Then hFi,p(β∗)is a witness that the points are extreme points. Thus for

every point p∈Swe have a witness that pis extreme. Hence Sis in convex position. 2

q’

vii

Figure 3.5: Step 3: We place another point in the interior of CH(Fi)and move it slightly

along viinto the interior of the convex hull.

3 Testing Algorithms for Geometric Properties

Now we can apply Lemma 2.4.2 with p=1/2 and obtain that for

s≤(n−d)·1

2(n +1)

1/(d+1)

=Ω(d+1

pn/)

a sample set Sof size schosen uniformly at random from Pdoes not contain one of the

Wiwith probability at least 1/2. Since q(, n) = o(d+1

pnd/ we have for sufficiently

large n

q(, n)≤(n−d)·1

2(n +1)

1/(d+1)

It follows that there exists a point set Pthat is -far from convex position and that is ac-

cepted by algorithm A00 with probability at least 1/2. Since the existence of a property

tester for convex position with query complexity q(, n)implies that algorithm A00 is a

property tester, it followsthat thereis no propertytester withquery complexity o(d+1

pnd/).

Summary. We summarize the results of this section in the following theorem:

Theorem 3 Let Pbe a point set of npoints in the Rd. Then there is a property tester

for the convex position property with query complexity O(d+1

pnd/)and running time

O(T(d+1

pnd/)) where T(n) = O(n·logO(1)h+ (nh)bd/2c

bd/2c+1·logO(1)n)is the running

time of the fastest known algorithm [24] to decide if a point set is in convex position (h

denotes the number of extreme points of the set). Every property tester for convex position

has a query complexity of Ω(d+1

pnd/).

Proof : Follows from Lemma 3.2.5 and Lemma 3.2.6. 2

3.3 Euclidean Minimum Spanning Tree

We are given a point set Pin the R2and a geometric graph G= (P, E)with vertex set

Pand edge set Eand we want to design a property tester for the property that Gis a

Euclidean Minimum Spanning Tree (EMST) of the point set P. A graph G= (P, E)is

called a Euclidean minimum spanning tree of point set P, if Gis a minimum spanning

tree of the complete Euclidean graph of P. The complete Euclidean graph is a complete

weighted graph where each edge e= (p, q)∈P×Phas weight equal to the Euclidean

distance between pand q.

In this section we make slightly different assumptions on the input representation. We

assume that our property tester has oracle access to point set Pand graph Geach repre-

sented by a separate function. The point set is represented in a similar way as in the pre-

vious subsection. The graph is represented in the unbounded length adjacency list model

[67]. Details on the input representation follow in Subsection 3.3.1.

For simplicity we assume throughout this section that Pis in general position, i.e., all

edge weights are distinct and hence the EMST is unique and its maximum degree is five.

3.3 Euclidean Minimum Spanning Tree

3.3.1 Basic Definitions and Input Representation

Let Pdenote a set of npoints in general position in R2(we consider only the problem in

two dimensional space) and let T= (P, E)denote the Euclidean minimum spanning tree

of P. We start with some basic definitions needed in this section before we discuss the

input representation:

Definition 3.3.1 Ageometric graph for Pis a weighted graph G= (P, E)with vertex set P

and edge set E⊆P×P(the edges can be interpreted as straight-line segments connecting

the endpoints). The weight of an edge (p, q)is implicitly given by the Euclidean distance

between pand qin R2.

Definition 3.3.2 A geometric graph for Pthat is the minimum spanning tree of the com-

plete geometric graph for Pis called the Euclidean minimum spanning tree (EMST) of

In this section we do not use the standard distance measure (though the differences are

insignificant), because the input object consists of two parts, the point set and the graph.

Further we use the unbounded length adjacency list model which uses a non-functional

representation of the graph. Our results also hold in the (weaker) bounded degree adja-

cency list model.

Typically, a distance measure for graph properties depends on the number of entries

in the graph representation that must be changed to obtain a graph that has the tested

property. In our case, the size of the graph representation depends on the number of edges

in the graph. Since every EMST has n−1edges we let our distance measure depend on

the number of vertices instead of the number of edges:

Definition 3.3.3 Let G= (P, E)be a geometric graph for Pand let T= (P, E)denote

the Euclidean minimum spanning tree of P. We say Gis -far from being the Euclidean

minimum spanning tree of P(or, in short, -far from EMST), if one has to modify (insert

or delete) more than n edges in Gto obtain T, that is :

|E\E|+|E\E|> n .

We want to design a property tester for the property of being a Euclidean minimum

spanning tree of a point set in the plane. Such a property tester takes as input a point set P

in general position in R2and a geometric graph Gfor P, and accepts the input if Gis the

EMST for Pand rejects the input with probability at least 2

3if Gis -far from being the

Euclidean minimum spanning tree of P.

Input Representation. Our property tester has to verify whether a geometric graph is

the EMST of a point set P. Hence, it must have access to the graph and the point set. Both

the point set and the graph are given as an oracle. Similar to the previous section the point

set is represented by a function f: [n]→R2. Thus the algorithm may query the oracle for

the value of f(i)for some i∈[n]and gets in return the position of the i-th point of P.

3 Testing Algorithms for Geometric Properties

The geometric graph is given in the unbounded length adjacency list representation

introduced in [67]. Let us briefly recall the description of the model as given in Chapter

2: The unbounded length adjacency list model is a general model for sparse graphs. The

graph structure is represented by adjacency lists of varying length. Our property tester may

query for the degree deg(p)of a vertex pand for each i≤deg(p)it may query for the

i-th neighbor of p. We represent the vertex set of our graph by the set of numbers [n].

Thus we can easily obtain the position of a vertex pfrom the point set representation by

querying for the value of f(p).

BasicPropertiesof EMSTs. The followingclaim statessome basic(and well-known)

properties about Euclidean minimum spanning trees:

Claim 3.3.4 Every Euclidean minimum spanning tree of a point set in general position

in R2has maximum degree less than or equal to five, is connected, and its straight-line

embedding is crossing-free.

Proof : The EMST is connected since it is a spanning tree of a complete graph. Its

embedding is crossing-free because it is a subgraph of the Delaunay triangulation; see,

e.g., [61]. Assume the EMST has a vertex with degree 6 or larger. Then there are two

edges with an angle less than 60 degree since the point set is in general position. But this

means that at most one of these edges can be in the EMST. Contradiction. 2

Now we want to introduce some additional notation that will be useful to simplify the

description of the algorithm and its analysis. Let Gdenote a geometric graph for P. The

EMST-completion of Gis a geometric graph G0over the same point set that contains all

edges from Gand all edges of the EMST (that are missing in G).

Definition 3.3.5 For a given point set P, a geometric graph G= (P, E), and the Euclidean

minimum spanning tree T= (P, E)of P, the EMST-completion of Gis the geometric graph

GC= (P, E ∪E)that contains all edges that are either in Gor in T.

In the next subsection we present a property tester for EMST that works for a special

class of input graphs which we call well-shaped. A well-shaped graph is a connected graph

with maximum degree 5 that has crossing-free EMST-completion. The restriction to well-

shaped graphs simplifies the analysis of the algorithm and it allows a clear view on the

important features of the property tester.

Definition 3.3.6 Let Gbe a geometric graph for point set P. We call Gwell-shaped, if

•it has maximum degree of 5,

•it is connected,

•and the straight-line embedding of its EMST-completion is crossing-free.

3.3 Euclidean Minimum Spanning Tree

Later in this section we present a general testing algorithm for EMST. This algorithm first

tests if the input graph is far from a well-shaped graph. If this is the case then we can

reject the graph by Claim 3.3.4. If the input graph passes the test, then we know that with

good probability it is either close to a well-shaped graph or it is well-shaped. If the graph

is well-shaped we can use the testing algorithm for the special case of well-shaped graphs.

At the end of this section we show that for this algorithm it does not matter if the graph is

well-shaped or close to well-shaped. Hence, we can also use the algorithm if the graph is

close to a well-shaped graph.

3.3.2 Testing EMSTs in Well-Shaped Graphs

We now design a property tester for EMST for the case that our input graph is well-shaped.

First, we give an overview of the algorithm:

Let Gdenote a well-shaped geometric graph with vertex set P. Our property tester uses

the following procedure: We first pick a sample set S⊆Pusing some randomized scheme

to be described later. Next, we find the subgraph GSof Gthat is induced by the vertex set

S. Then we compute the EMST-completion of GS. If the EMST-completion has a cycle

then we reject the input, otherwise we accept. Computing the EMST-completion can be

done in time O(|S|·log|S|). We just have to compute the EMST of Sand insert the missing

edges in GS. The query complexity of the tester is O(|S|).

We first show that we can always reject the input graph, if the EMST-completion of GS

contains a cycle. We use the following lemma which follows easily from standard theory

of minimum spanning trees (see, e.g., [77, Chapter 6]). It implies that if we sample a set

of vertices S⊆Pthen an edge ecannot be in the EMST of Pif it is not in the EMST of S.

Lemma 3.3.7 Let S⊆Pbe a subset of Pand let p, q ∈S. If the edge e= (p, q)does not

belong to the EMST of S, then edoes not belong to the EMST of P.

Proof : Our proof is by contradiction. Let us suppose that edoes not belong to the EMST

of Sand ebelongs to the EMST Tof P. The removal of ein Tcuts Tinto two trees. These

two trees induce a partition of Pinto two subsets P1and P2. Since ebelongs to the EMST

of P,emust also be the shortest edge between these two subsets. Let S1=P1∩Sand

S2=P2∩S.P1and P2are not empty since one vertex of eis in each of the sets. Then

eis also the shortest edge between S1and S2and therefore it belongs to the EMST of S;

contradiction. 2

Lemma 3.3.7 is of major importance for the way we proceed. Therefore we state its

consequences in Corollary 3.3.8 below. First of all, it shows that our property tester always

accepts the EMST of P(it rejects only, if there is a cycle in the EMST-completion and such

a cycle provides a counter example for the EMST property). Further it can be used to show

that we can reject the input graph, if all vertices of a cycle in the EMST-completion of G

are contained in S:

Corollary 3.3.8 Let Gbe a geometric graph for P. Let S⊆Pbe a subset of Pand let GS

be the subgraph induced by S.

3 Testing Algorithms for Geometric Properties

•If the EMST-completion G0= (P, E0)of Gcontains a cycle C= (p0, . . . , pk)2of

length k≥3and pi∈Sfor all 0≤i≤k, then there is a cycle in the EMST-

completion of GS.

•If the EMST-completion of GScontains a cycle C= (p0, . . . , pk)of length k≥3,

then Gis not the EMST of P.

Proof : We have to prove that Cis a cycle in the EMST-completion of GS, i.e., that every

edge (pi−1, pi+1)for all i∈[k]is in the EMST-completion G0

Sof GS. Every such edge

is either an edge of the EMST or an edge of the input graph. Obviously, the edges of the

input graph are in G0

Sand by Lemma 3.3.8 all EMST edges are also in G0

Now we prove the second part: Since the EMST-completion of GScontains a cycle

there is an edge ein GSthat does not belong to the EMST of S. It follows by Lemma 3.3.7

that eis not in the EMST of P. Hence Gis not the EMST of P.2

Now we consider the case when the input graph G= (P, E)is -far from EMST. Our

goal is to design a randomized sampling scheme such that the EMST-completion of the

subgraph induced by the sample set contains a cycle with high probability. Let T= (P, E)

denote the EMST of Pand let GC= (P, E∪E)denote the EMST-completion of G. In the

following we refer with red edges to the edges in E\Eand with blue edges to the edges

in E. It turns out that it is sufficient to focus on “short” cycles that contain at most two red

edges:

Definition 3.3.9 Let Gbe a geometric graph for Pand let Cbe a cycle of length kin the

EMST-completion of G. We call C -short if (1) its length kis smaller than or equal to 72



and (2) it contains at most two red edges.

In our algorithm we try to find -short cycles that satisfy some additional “topological”

properties. We want to exploit the fact that Gis well-shaped, in particular, that the EMST-

completion of Ghas a crossing-free straight-line embedding. Hence we use a topological-

like representation of the geometric graph Gto exploit the fact that every minimal cycle

in a (well-shaped) planar geometric graph corresponds to a face in its straight-line embed-

ding. In order to use this approach in a formal framework we will consider the geometric

graph Gnot only as an undirected graph, but at the same time also using its “directed” rep-

resentation by “replacing” each undirected edge (p, q)by two directed edges [p, qiand

[q, pi.

For every vertex pin Gwe (cyclically) sort incident outgoing edges in clockwise order

around the vertex pwith respect to the Euclidean positions of the edges’ endpoints. (This

sorting is done only implicitly, but since we assume that each vertex has a constant degree,

each time we consider a vertex we can in constant time sort its incident edges.) The

successor of a directed edge [p, qiis the edge [q, riwhere ris the vertex adjacent to q

that precedes pin the adjacency list of q(sorted in clockwise order around q). With this

definition cycles of succeeding edges correspond to faces of the straight-line embedding.

2Here, C= (p0,...,pk)is a cycle (of length k) if pi∈Pfor all i∈[k],p0=pk,(pi−1, pi)∈E0for all

i∈[k]and pi6=pjfor all i, j ∈[k],i6=j.

3.3 Euclidean Minimum Spanning Tree

Figure 3.6: A straight-line embedding and its planar map representation.

Such a representation of Gis called a planar map for the straight-line embedding of G(see

figure 3.6). We denote it e

Definition 3.3.10 Let Gbe a well-shaped geometric graph for Pand let e

Gbe the corre-

sponding planar map. For each edge e= [p, qiin e

Gthe successor of eis the edge [q, ri

in e

Gwith rbeing the vertex adjacent to qthat precedes pin the adjacency list of q(sorted

in clockwise order around q); if qhas degree one, then r=p.

For edge e= [p, qiin Gthe kth successor,k≥0, is defined recursively as follows:

the 0th successor of [p, qiis [p, qiitself, and for k > 0, the kth successor of [p, qiis the

successor of the (k−1)st successor of [p, qi.

Similarly, edge eis the predecessor of edge e0if edge e0is the successor of edge e, and

eis the kth predecessor of e0if e0is the kth successor of e.

We have introduced the planar map representation of a graph Gbecause it describes the

faces of the corresponding embedding in a simple way (using succeeding edges in e

G). We

observe that the correspondence between the faces in the embedding of Gand the cycles

of succeeding edges in e

Gis one to one. We also note that each (directed) edge is contained

in exactly one cycle of succeeding edges.

Definition 3.3.11 Let Gbe a well-shaped geometric graph for Pand let C= (p0, . . . , pk)

be a cycle in the planar map of the EMST-completion of G. Then Cis called topological,

if for every two consecutive edges on the cycle [pi, pi+1iand [pi+1, pi+2i,[pi+1, pi+2iis

the successor of [pi, pi+1i. We also call the corresponding cycle in Gtopological.

The following key lemma shows that every well-shaped geometric graph that is far

from EMST must contain many short topological cycles in its EMST-completion.

Lemma 3.3.12 Let G= (P, E)be a well-shaped geometric graph for P. If Gis -far from

EMST, then there are at least  n

100 -short topological cycles in the EMST-completion of G.

Proof : Let T= (P, E)denote the EMST of Pand let EB=E\Edenote the blue edges of

the EMST-completion of Gthat are not in E. Further let ER=E\Edenote the red edges

3 Testing Algorithms for Geometric Properties

of the EMST-completion of G. Since Gis -far from EMST we have |EB|+|ER|> n by

definition of -far.

Now let Hdenote the EMST-completion of Gand let e

Hdenote the planar map of

its straight-line embedding. Let us denote by ρthe number of faces in the straight-line

embedding of H. Hence e

Hhas ρ(disjoint) topological cycles since each face is bounded

by a unique cycle of succeeding edges (which by definition is called topological). Since

His planar and connected and has more than n−1+n/2 edges we can apply Euler’s

formula to deduce that ρ≥n/2.

Now let s(f)denote the number of (directed) edges in the topological cycle bounding

face f. Since Pfs(f)≤6n by Euler’s formula there can be at most ρ

4faces fwith

s(f)≥48

. Thus 3ρ

4faces fhave s(f)<48

.

Since |ER|≤ρthe number of directed red edges is at most 2ρ. Hence the number of

topological cycles with 3 or more red edges can be at most 2ρ

3. Since we have shown that

there are at least 3ρ

4topological cycles having less than 48

edges, at least ρ

12 ≥n

100 of them

have at most two red edges. 2

Definition 3.3.13 Let Gbe a well-shaped geometric graph for P. For every vertex p∈P,

we define its topological k-neighborhood as the set of vertices that are the endpoints of the

edges that are either the ith successor, 0≤i≤k, of any edge incident to p, or are the jth

predecessor, 0≤j≤k, of any edge incident to p. The topological k-neighborhood of a

vertex pis denoted Ntop

G(p, k).

The following claim follows easily from the fact that our input graph has maximum

degree of 5.

Claim 3.3.14 Let Gbe a well-shaped geometric graph for P. For every vertex p∈P, we

can find its topological k-neighborhood in time O(k).2

Now, provided that Gis -far from EMST (but well-shaped), our first approach is to

sample uniformly at random a sufficiently large set Qof points in P. Then we add for

every point in Qits topological 72

-neighborhood to the sample set. Provided that the set

Qis sufficiently large, we prove in Lemma 3.3.18 that if Gis -far from EMST, then

the so obtained set will contain all vertices from a certain -short topological cycle in the

EMST-completion of Gwith probability at least 2/3. Therefore, this will certify that G

is not an EMST by Corollary 3.3.8. In Lemma 3.3.24, we tune the sketched algorithm to

slightly improve the complexity bound.

3.3.3 A Simple Property Tester in Well-Shaped Graphs

Now we discuss our first property tester for EMST for well-shaped input graphs. We follow

the approach sketched in the beginning of this subsection. According to that description

the key issue is to describe the algorithm that finds the sample set Sand to prove that

with probability at least 2

3the input is rejected if Gis -far from EMST. In particular, by

our discussion above we know from Lemma 3.3.12 that if a well-shaped graph Gis -far

3.3 Euclidean Minimum Spanning Tree

from EMST, then there are many -short topological cycles in the EMST-completion of G.

Further, by Corollary 3.3.8, in order to reject Git is sufficient to prove that with probability

at least 2

3the sample Scontains all vertices from a certain -short topological cycle in the

EMST-completion of G. We provide a precise description of this algorithm and its analysis

below. We assume that Gis well-shaped. Every -short topological cycle either

1. is a cycle consisting of at most 72

blue edges, or

2. is a path consisting of at most 72

blue edges having the endpoints connected by a red

edge, or

3. is a path consisting of at most 72

blue edges having the endpoints connected by a

path of two red edges, or

4. consists of two paths containing at most 72

blue edges that are connected to each

other by two red edges.

Our first observation is that if there are many -short topological cycles of type (1) or (2),

then we can easily spot them. Indeed, if there are at least  n

200 -short topological cycles of

type (1) or (2) in the EMST-completion of G, then it is enough to take a random subset Q

of Pof size Θ(1/)to ensure that at least one vertex from any -short topological cycle

will be in Qwith probability at least 2

Lemma 3.3.15 Let G= (P, E)be awell-shaped geometricgraph. Ifthe EMST-completion

of Gcontains at least  n

200 -short topological cycles of type (1) or (2), then a set Q⊆P

of size 4000/ chosen uniformly at random contains at least one vertex from any -short

topological cycle.

Proof : Since Gis well-shaped it has a maximum degree of 5and so the EMST-

completion of Ghas a maximum degree of 10. We conclude that every vertex p∈P

is contained in at most 10 -short topological cycles. Furthermore, the set PCof all ver-

tices that are contained in at least one -short topological cycle (of type (1) or (2)) has

cardinality at least  n

2000 . Now let Q⊆Pdenote a set of size 4000

taken uniformly at

random from P. Then

PrQ∩PC=∅≤(1−|PC|

n)|Q|≤(1−

2000)|Q|≤1/3 .

Therefore, PrQ∩PC6=∅≥2/3 .

Now, we explore the key feature of -short topological cycles of type (1) or (2): For

every vertex vfrom such a cycle all other vertices from the cycle belong to the topological

-neighborhood of v. This motivates us to define the sample set Sas the topological 72

-

neighborhood of all vertices in Q. Since the set Qcontains at least one vertex from any

-short topological cycle of type (1) or (2) with probability at least 2/3, we can conclude

that Scontains all vertices from a particular -short topological cycle of type (1) or (2)

3 Testing Algorithms for Geometric Properties

with probability at least 2

3. Since every vertex of the topological cycle is contained in our

sample set we know by Corollary 3.3.8 that the EMST-completion of the subgraph induced

by our sample contains a cycle. Thus our property tester rejects the input with probability

3if it is -far from EMST.

It is easy to see that the procedure for cycles of type (1) and (2) described above has a

query complexity of O(|Q|

) = O(1

2).

Lemma 3.3.16 Let G= (P, E)be a well-shaped geometric graph and let Q⊆Pbe a set

of size 4000/ chosen uniformly at random from P. If the EMST-completion of Gcontains

at least  n

200 -short topological cycles of type (1) or (2), then the set

S=[

p∈QNtop

G(p, 72

)

contains all vertices of at least one -short topological cycle with probability at least 2/3.

Proof : Follows from Lemma 3.3.15 and the observation that a cycle of type (1) and (2)

is completely contained in S, if one of its vertices is contained in Q.2

The -short topological cycles of type (3) and (4) are a little bit more difficult to detect.

However, we can still use a very similar approach as for cycles of type (1) or (2), but

this time we must find two vertices that belong to the same -short topological cycle.

Suppose that there are at least  n

200 -short topological cycles of type (3) or (4) in the

EMST-completion of G. As before, we first take a random subset Qof P, but this time

the size of Qis Θ(pn/). Then, we define the sample set Sto be the union of the

topological 72

-neighborhood of all vertices in Q. We show now that the so defined sample

set is sufficient to certify that G(if it is -far from EMST) is not an EMST by proving an

analogous statement to Lemma 3.3.16 for cycles of type (3) and (4):

Lemma 3.3.17 Let G= (P, E)be a well-shaped geometric graph and let Q⊆Pbe a

set of size 80pn/ chosen uniformly at random from P. If the EMST-completion of G

contains at least  n

200 -short topological cycles of type (3) or (4), then the set

S=[

p∈QNtop

G(p, 72

)

contains all vertices of at least one -short topological cycle with probability at least 2/3.

Proof : For every -short topological cycle Cof type (3) let us define the set WCto

contain two vertices: one vertex on the blue path in Cand the vertex incident to the two

red edges in C. Similarly, for every -short topological cycle Cof type (4) let us define the

set WCto contain one vertex from the first blue path in Cand one vertex from the second

blue path in C.

Since each vertex p∈Pbelongs to at most 10 -short topological cycles we can select

from the sets WCthe sets Wi,1≤i≤n

2000 , such that the sets Wiare disjoint and for each

3.3 Euclidean Minimum Spanning Tree

i,1≤i≤n

2000 , there is an -short topological cycle Cwith Wi=WC. Then we apply

Lemma 2.4.1 with k=n

2000 ,l=2, and s=20n

√n =20pn/ and obtain

Pr∃j∈[k] : (Wj⊆Q)≥1− (1−1/4)4≥2/3 .

Now observe that all vertices of a cycle Care in S, if WC⊆Q. Therefore, the lemma

follows. 2

We prove that the following algorithm is a property tester for EMST:

EMST-TEST-SIMPLE(G, )

s=80pn/ +4000/

choose a set Q⊆Pof size suniformly at random

S=Sq∈QNtop

G(q, 72

)

compute the subgraph GSinduced by S

compute the EMST-completion GCof GS

if GCcontains a cycle then reject

else accept

Lemma 3.3.18 Let Gbe a well-shaped geometric graph for P. Then there is a property

tester that in time Opn/3·log(n/)and with query complexity O(pn/3)accepts

the input if Gis an EMST of Pand rejects the input with probability at least 2

3, if Gis -far

from EMST.

Proof : We claim the algorithm EMST-TEST-SIMPLE is a property tester for EMST. By

Corollary 3.3.8 we know that EMST-TEST-SIMPLE accepts, if the input graph G= (P, E)

is the EMST.

Now let us consider the case when Gis -far from EMST. Then by Lemma 3.3.12 we

know that there are n/100 -short topological cycles in the EMST completion of G. It

follows that there are n/200 cycles of type (1) and (2) or n/200 cycles of type (3) or

(4). By Lemma 3.3.16 and 3.3.17 we know that the sample taken by EMST-TEST-SIMPLE

contains an -short topological cycle with probability at least 2/3. By Corollary 3.3.8 we

know that then there is a cycle in the EMST-completion of the subgraph induced by our

sample. Hence the algorithm rejects in such a case.

The query complexity of the algorithm is immediate. Its running time follows from

Claim 3.3.14 and the fact that the EMST completion of a graph with mvertices can be

computed in time O(mlogm).2

3.3.4 An Improved Property Tester in Well-Shaped Graphs

Now we slightly improve the complexity of the property tester described in Lemma 3.3.18.

In our analysis above we were always trying to catch one initially fixed single vertex from

each blue path although an -short topological cycle can contain as many as 72

vertices.

3 Testing Algorithms for Geometric Properties

We now want to take the length of the topological cycles into consideration. Further, we

were always taking topological 72

-neighborhoods of all vertices. This strategy should be

applied to the cycles that haveas many as 72

edges, but it is not necessary for shorter cycles.

Our approach now is to improve the complexity of the property tester by combining these

two observations. We want to show that the following algorithm is a property tester for

EMST, if the input graph is well-shaped:

EMSTTEST(G, )

s=1700pn/ +192000/ +4000/

S=FINDCYCLE(G, s, )

compute the subgraph GSinduced by S

compute the EMST-completion GCof GS

if GCcontains a cycle then reject

else accept

Where the procedure FINDCYCLE is the following:

FINDCYCLE(G, s, )

S(0)=∅

for i=1to 2s do

j=0

pick a vertex p(i)∈Puniformly at random

while j≤log 72

do

j=j+1

flip a coin

if head then exit

S(i)=S(i−1)∪Ntop

G(p(i), 2j)

return S(2s)

First of all, we observe that algorithm EMSTTEST accepts every Euclidean minimum

spanning tree by Corollary 3.3.8.

Thus we have to prove that the algorithm rejects, if the input graph is -far from EMST.

Let us assume that Gis well-shaped and -far from EMST. Then, by Lemma 3.3.12, there

are at least  n

100 -short topological cycles in the EMST-completion of G. Let Cj,j=

1, 2, 3, 4, denote the set of all -short topological cycles of type (j) in the EMST-completion

of G. Now we consider separately cycles in C1∪C2and cycles in C3∪C4. By our discussion

above we have either |C1∪C2|≥ n

200 or |C3∪C4|≥n

200 .

Cycles of type (1) and (2). Suppose that Gis a geometric graph for Pwith maxi-

mum degree 5and there are at least εn -short topological cycles of type (1) or (2) in the

EMST-completion of Gfor ε=

200 . We first consider the probability that a fixed -short

topological cycle C∈C1∪C2is contained in the sample set. Let `denote the number

of vertices in cycle C. Then the probability that in round iof the FINDCYCLE procedure

3.3 Euclidean Minimum Spanning Tree

vertex p(i)is one of the `vertices of cycle Cis `

n. Further the probability that the topolog-

ical neighborhood of p(i)is chosen large enough to contain all vertices of Cis at least 1

2` .

Overall, for a fixed cycle Cthe probability that a vertex of Cis chosen in round iand that

the topological neighborhood of the vertex is large enough is at least 1

2n . If the cycles are

vertex disjoint then it is simple to prove that after O(1

)rounds at least one cycle is com-

pletely contained in the sample set with constant probability. Unfortunately, in the general

case the cycles are not vertex disjoint. To overcome this technical problem we use the

planar map representation of Gand the following trick for the analysis: Instead of taking

the whole topological 2j-neighborhood of vertex p(i)we assume that our algorithm selects

only one of the outgoing edges (in its planar map representation) uniformly at random.

Then it includes only the 2jsuccessors and predecessors of the chosen edge in the planar

map representation of G. Clearly, this procedure considers only a subset of the vertices

considered in the original procedure. Nevertheless, we can show that the set of vertices

we pick using this procedure is still sufficiently large. We can now use the fact that the

(directed) cycles are edge disjoint. Assume that we pick a vertex that belongs to a cycle C.

Provided that the chosen neighborhood is large enough we still have to choose the correct

outgoing edge to have all vertices of Cin the sample set. Since our graph has a degree

bound of 5 the probability that this edge is chosen is at least 1/5. Since type (2) cycles

consist of a path of `−1(directed) blue edges the probability that p(i)is one of the `−1

origins of these edges is `−1

n≥`

2n (a directed edge points from its origin to its destina-

tion). Hence the probability that a cycle Cof type (1) or (2) is completely contained in the

sample set is at least 1

20n . We know that the cycles are disjoint and so the probability that at

least one cycle is completely contained in the sample set in round iis at least εn

20n =ε

20 . Let

XCdenote the indicator random variable for the event that cycle C∈C1∪C2is completely

contained in the sample set. Then we have for s≥20/ε =4000/:

Pr∀C∈C1∪C2:XC=0≤1−ε

202s ≤1/3

and hence Pr∃C∈C1∪C2:XC=1≥2/3

and so we have just proved:

Lemma 3.3.19 Let Gbe a geometric graph for Pwith maximum degree 5that has at least

εn =n

200 topological -short cycles of type (1) or (2). If algorithm FINDCYCLE(G, s, )

is invoked with s≥4000/ then the set S(2s)returned by the algorithm contains an -short

topological cycle with probability at least 2/3.2

Cycles of type (3) and (4). Let Gbe a geometric graph with maximum degree 5. Let

us further assume that there are at least εn topological -short cycles of type (3) and (4) in

the EMST-completion of G, for ε=

200 . We want to show that the sample set computed

by algorithm FINDCYCLE contains every vertex of at least one -short topological cycle

with good probability.

Recall that cycles of type (4) consist of 2 paths of blue edges connected by two red

edges. Cycles of type (3) are a special case of type (4) cycles: The shorter path has length

3 Testing Algorithms for Geometric Properties

0. For each cycle C∈C3∪C4let X(i)

Cdenote the indicator random variable for the event

that all vertices of the longer (blue) path of cycle Care in S(i). Let Y(i)

Cdenote the indicator

random variable for the event that all vertices of the shorter (blue) path of cycle Care in

S(i). Further let ∆(i+1)denote the indicator random variable for the event that there is a

cycle C0∈C3∪C4with X(i)

C0=0and X(i+1)

C0=1. We say that a cycle C∈C3∪C4is

half-contained in S(i), if X(i)

C=1. Cycle Cis contained in S(i), if X(i)

C=1and Y(i)

C=1.

We analyze the algorithm in two steps. We first show that with high probability many

(at least εs/80) topological -short cycles are half-contained in the set S(s). Then we show

that the set S(2s)contains at least one cycle C∈C3∪C4with high probability.

Claim 3.3.20 Let the outcome of the random choices in round 1to iof the for-loop of

FINDCYCLE be fixed. Then it holds that, if

C∈C3∪C4

X(i)

C<εs

2(3.1)

then

Pr∆(i+1)=1≥ε/40 . (3.2)

Proof : Let us assume that Equation (3.1) is true. Then we observe that:

C∈C3∪C4

X(i)

C<εs

2≤εn

since s≤n. We conclude that we have more than εn/2 cycles in C3∪C4that are not

half-contained in S(i). If p(i+1)is one of the vertices of the longer path of one of these

cycles and if the topological neighborhood included in FINDCYCLE is large enough then

we have ∆(i+1)=1. To estimate the probability for ∆(i+1)=1we apply the same trick as

in the analysis for the case of type (1) and (2) cycles. This yields immediately (observing

the fact that we have εn/2 cycles instead of εn):

Pr∆(i+1)=1≥1

2` ·`

2n ·1

5·εn

2=ε/40

Our next goal is to show that there are at least εs/80 cycles that are half-contained in

S(s).

Claim 3.3.21

PrX

C∈C3∪C4

X(s)

C≤εs/80≤e−εs/300 .

3.3 Euclidean Minimum Spanning Tree

Proof :

PrX

C∈C4∪C4

X(s)

C≤εs

80

≤PrX

1≤i≤s

∆(i)≤εs

80

≤PrX

1≤i≤s

B(i)≤εs

80

≤PrX

1≤i≤s

B(i)≤(1−1

2)·εs

40

where B(i)are independent 0−1variables with Pr[B(i)=1] = ε/40. The latter inequality

follows from Claim 3.3.20. We apply a Chernoff bound [56, Inequality (7)] and it follows

that

≤e−( 1

2)2εs

40·2=e−εs/320 .

Let W(i+1)denote the indicator random variable for the event that there exists C∈

C3∪C4with X(i)

C=1and Y(i)

C=0and Y(i+1)

C=1.

Claim 3.3.22 Let the outcome of the random choices in round 1to iof the procedure

FINDCYCLE be fixed. If X

C∈C3∪C4

X(i)

C> εs/80

then PrW(i+1)≥εs

1600 ·n.

Proof :

We assume that there are more than εs/80 cycles that are half-contained in S(i). Again

we use essentially the same trick as in the case of type (1) and (2) cycles. We observe that

there is a little problem with cycles of type (3). Since the length of the shorter path is 0there

is no directed edge in this path. Thus we have to modify our approach slightly. We use

the following sampling scheme for the analysis: Instead of taking the whole topological

2j-neighborhood of p(i)we choose a number kbetween 1and 6uniformly distributed. If k

is between 1and 5we include the 2jpredecessors and successors of the k-th edge incident

to p(i). In the case k=6we only include the vertex p(i)in the sample. Then we get that

the probability that a cycle Cis contained in the sample is at least 1

2` ·`

2n ·1

6=1

24n . We

have more than εs/80 cycles that are half-contained in S(i). Therefore we obtain that:

PrW(i+1)≥εs

1920 ·n.

3 Testing Algorithms for Geometric Properties

Lemma 3.3.23 Let Gbe a geometric graph for Pwith maximum degree 5that has at

least εn =n

200 topological -short cycles of type (3) or (4). Then FINDCYCLE is an

algorithm with (expected) query complexity O(pn/ log(n/)) that samples a set S⊆P,

|S|≥1700pn/ +192000/, such that the EMST-completion of the subgraph GS(2s)

induced by S(2s)has an -short topological cycle.

Proof : Let ε=/200 and let Gbe a geometric graph for Pwith maximum degree 5

that has at least εn topological -short cycles of type (3) or (4).

PrThere is a cycle in the EMST-completion of the subgraph induced by S(2s)

≥1−Pr1

C∈C3∪C4

X(s)

C≤ε

80+1−εs

1920ns

by Claim 3.3.22. Choosing s≥1700pn/ +192000/ it follows together with Claim

3.3.21 for n≥4:

≥1− (e−2+e−5)≥4

Obtaining Deterministic Query Complexity. It is easy to modify the algorithm

such that the query complexity has an upper bound of O(pn/ log(n/)) by insignifi-

cantly increasing the error of the algorithm. We can do this in the following way: We run

algorithm EMSTTEST and stop, if it either accepts or rejects or if the sample size becomes

too large. Let XSdenote the random variable for the size of S(2s). We stop the algorithm

and accept the input, if we find out that the size of S(2s)becomes larger than 10 ·E[XS]. By

Markov inequality we have:

PrXS≥10 E[XS]≤1/10 .

Hence it follows that our new algorithm rejects a geometric graph that is -far from EMST

with probability 4/5 −1/10 ≥2/3. Thus it is a property tester with a deterministic bound

of O(log(n/)·pn/)on the query complexity of our algorithm (rather than expected

query complexity).

Lemma 3.3.24 Let Gbe a well-shaped geometric graph for P. Then there is a property

testerthat in time O(log2(n/)·pn/)and withquery complexityof O(log(n/)·pn/)

accepts the input G, if Gis an EMST of Pand that rejects the input with probability at least

3if Gis -far from EMST.

Proof : Follows from Lemma 3.3.12, Lemma 3.3.19 and Lemma 3.3.23. 2

3.3 Euclidean Minimum Spanning Tree

3.3.5 A Property Tester in Graphs with Maximum Degree 5

Now we want to remove the well-shaped condition for input graphs. In this subsection

we do the first step towards that goal. We develop property tester for connectivity and

crossing-free EMST-completions. Then we replace the well-shaped condition for EM-

STTEST by the assumption that the input graph has maximum degree 5. Before EM-

STTEST is invoked we test if the input graph is /200-far from connected and if its

EMST-completion has a crossing-free straight line embedding. Thus we may assume that

EMSTTEST gets an input graph that is /200-close to connected and /200-close to hav-

ing an EMST-completion with a crossing-free straight line embedding (if this is not the

case, the property testers for connectivity and crossing-free EMST-completions reject).

This way we develop a property tester for graphs with maximum degree 5. The degree

bound is then removed in the last subsection.

Testing Connectivity. In the first phase we test whether the input graph is connected.

Definition 3.3.25 A geometric graph Gfor Pis -far from connected if one has to add

more than ·nedges to Gto obtain a connected graph. If Gis not -far from connected,

then we call it -close to connected.

Remark 1 Let us notice here the equivalent characterization of geometric graphs for P

that are -far from connected — these are geometric graphs for Phaving more than ·n+1

connected components.

Since the property of being connected does not depend on the positions of the input

points in P, we can use a property tester for connectivity in graphs. In [52] it has been

proven that connectivity of bounded degree graphs can be tested efficiently:

Lemma 3.3.26 [52] Let Gbe a graph with degree bound d. Connectivity of Gcan be

tested with O(log2(1/d)

d )time and query complexity in the bounded length adjacency list

model (see Chapter 2) 3.

We can immediately apply this result to geometric graphs:

Corollary 3.3.27 Let Gbe a geometric graph for Pwith maximum degree 5. There is a

property tester that in time O(log2(1/)

)and with a query complexity of O(log2(1/)

)accepts

the input if Gis connected and rejects the input with probability at least 2

3if Gis -far

from connected.

Proof : We run the tester from [52] with d=5and 0=/5.2

3In the bounded degree graph model a graph is -far from connected if one has to add more than dn edges

to obtain a connected graph.

3 Testing Algorithms for Geometric Properties

Testing Crossing-Free EMST-Completions. Next, we design a tester that accepts

the input graph, if it is the EMST and rejects it if the straight-line embedding of its EMST-

completion is -far from crossing-free. We proceed in two steps. First, our property tester

checks for pairs of intersecting blue edges and then it tries to find intersections between

blue and red edges; red edges cannot intersect because they are edges of an EMST. We use

the following distance measure.

Definition 3.3.28 The straight-line embedding of a geometric graph Gfor Pis -far from

crossing-free if one has to remove more than ·nedges in Gto obtain a crossing-free

straight-line embedding. If the straight-line embedding of Gis not -far from crossing-

free, then we call it -close to crossing-free.

We first use the tester DISJOINTNESS developed in Section 3.1 to find intersections be-

tween blue segments (induced by blue edges). Since Ghas maximum degree 5it has at

most 5n edges. Therefore, since one can verify in time O(nlogn)if a geometric graph

with nvertices has crossing-free straight-line embedding [16] (the number of edges must

be O(n); otherwise there must be a crossing), Theorem 1 implies the following result.

Lemma 3.3.29 Let Gbe a geometric graph with maximum degree 5. There is a property

tester that in time O(pn/ log(n/)) and with the query complexity of O(pn/)ac-

cepts the input if the straight-line embedding of Gis crossing-free and rejects the input

with probability at least 2

3if the straight-line embedding of Gis -far from crossing-free.

It remains to design a property tester for red-blue intersections in the EMST-completion

of G4. A geometric graph with red and blue edges has a straight-line embedding without

red-blue intersections, if there is no intersection between the corresponding red and blue

segments. Similarly, the straight-line embedding of a geometric graph whose edges are

colored blue and red is -far from having no red-blue intersections, if one has to delete

more than an -fraction of its edges to remove all red-blue intersections.

The main difficulty with testing for red-blue intersections in the EMST-completion of

Gis that the red edges are only defined implicitly. The following lemma shows a relation

between the endpoints of red and blue edge intersecting each other.

Lemma 3.3.30 Let pq be a red and xy be a blue segment in the EMST-completion of G.

If pq and xy intersect each other, then either pq is not in the EMST of every set containing

{p, q, x}or it is not in the EMST of every set containing {p, q, y}.

Proof : The points p, q, x, y are in convex position because the segments pq and xy

intersect. We consider the quadrilateral pxqy (see figure 3.7). Let us call the inner angles

in the quadrilateral at vertices p, q, x, y to be α,β,γ, and δ, respectively. Let us recall that

the longest edge of a triangle is opposite of the largest angle.

4More precisely, we do not design a property tester for the property of having no red-blue intersections. Our

algorithm might reject an input graph if its EMST-completion has no red-blue intersections. However, if

the input graph is the EMST then it is always accepted by our algorithm.

3.3 Euclidean Minimum Spanning Tree

Figure 3.7: A red blue intersection. The red edge is dotted.

If α < π

2and β < π

2then γor δis larger than π

2because α+β+γ+δ=2 π. Without

loss of generality, let γ > π

2. Then, segment pq is the longest edge of triangle pqx and

thus is cannot be the EMST of {p, q, x}. By Lemma 3.3.7 this is a contradiction to the fact

that (p, q)is an EMST edge. Hence we must have either α≥π

2or β≥π

If α≥π

2then segment xy is the longest edge in triangle pxy. Hence edge (x, y)is not

contained in the EMST of p, x, y. By Lemma 3.3.7 it is also not contained in every EMST

of a subset of Pthat contains p, x, y.

If β≥π

2then segment xy is the longest edge in triangle qxy. Hence edge (x, y)is not

contained in the EMST of q, x, y. By Lemma 3.3.7 it is also not contained in every EMST

of a subset of Pthat contains q, x, y.2

This lemma shows that each red-blue intersection has a “witness” consisting of one

point and one edge of the input graph. Our property tester for red-blue intersections is

similar to the DISJOINTNESS tester. We modify the disjointness property in the following

way: We say that two points p, q ∈Pintersect, if there is an (blue) edge e= (p, x)(or

e= (q, x)) incident to p(or q) such that the other point is a witness that eis not in the

EMST of P, that is eis not in the EMST of p, q, x. The following algorithm is a property

tester for red-blue intersections:

REDBLUETEST(G, )

Choose a set S0⊂Pof size 16p5n/ uniformly at random

Let S=S0∪N(S0)where N(S0)denotes the set of neighbors of points in S0

Let GSdenote the subgraph induced by S

if the EMST-completion of GShas a cycle then reject

else accept

The analysis of the algorithm is similar to the analysis of the algorithm DISJOINTNESS.

If the input graph is the EMST then there are no intersections among the points in P.

If the EMST-completion of the input graph is -far from having no red-blue intersections

then by Lemma 3.3.30 it follows that the point set Pis -far from being disjoint, if the

EMST-completion is -far from having no red-blue intersections. Thus we have:

3 Testing Algorithms for Geometric Properties

Lemma 3.3.31 Let Gbe a geometric graph for Pwith maximum degree 5. Algorithm

REDBLUETEST runs in time O(pn/ logn)and with the query complexity of O(pn/)

and accepts the input graph Gif it is the Euclidean Minimum Spanning Tree and rejects

the input with probability at least 2

3if the straight-line embedding of the EMST-completion

of Gis -far from having no red-blue intersection.

Proof : If G= (P, E)is the EMST then its EMST-completion is crossing-free. Thus

algorithm REDBLUETEST accepts G.

Let GCdenote the EMST-completion of Gand let us assume that GCis -far from

having no red-blue intersections. By Lemma 3.3.30 we can apply the following procedure

k=n/20 times: pick a pair of intersecting (with the definition from above) points

{p, q}=Wi,i∈[k], and remove all edges incident to pand qfrom GC. By the degree

bound, we have to remove at most 10 edges for each of the two vertices. Therefore, we

can apply this procedure at least ktimes.

In order to prove that REDBLUETEST rejects Gwith probability at least 2/3 if GCis

-far from having no red-blue intersections we show that with probability at least 2/3 one

of the pairs Wi,i∈[k], is in S0. We apply Lemma 2.4.1 and obtain:

Pr∃j∈[k] : (Wj⊆S0)≥1− (1−3/4)4≥2/3

It remains to show that the algorithm rejects, if there is a pair Wi⊆S0. If Wi={p, q}⊆S0

then there exists e= (p, x)(or e= (q, x)) such that eis not in the EMST of p, q, x. By

Lemma 3.3.7 eis not in the EMST of S. Hence Smust have a cycle and REDBLUETEST

rejects. 2

Finally, we can combine Lemma 3.3.29 and Lemma 3.3.31 to obtain.

Lemma 3.3.32 Let Gbe a geometric graph for Pwith maximum degree 5. There is an

algorithm that in time O(pn/ log(n/)) and with a query complexity of O(pn/)

accepts G, if Gis the EMST of Pand rejects Gwith probability at least 2

3if the straight-

line embedding of its EMST-completion is -far from crossing-free.

Proof : Let Gbe -far from having a crossing-free EMST-completion. Then, either the

straight-line embedding of Gis ε

2-far from crossing-free or the EMST-completion of Gis

2-far from having no red-blue intersections. Applying Lemma 3.3.29 and Lemma 3.3.31

with =ε

2shows that Gis rejected with probability at least 2

3. Since the tester for blue-

blue intersection and the tester for red-blue intersections both accept the EMST we have

completed the proof of Lemma 3.3.32. 2

Extension to Degree Bounded Graphs. Now we want to design a property tester

for arbitrary degree bounded input graphs. We first extend the planar map representa-

tion for well-shaped graphs to a planar map like representation for general graphs in the

following straightforward way:

For every vertex pin Gwe (cyclically) sort incident outgoing edges in clockwise order

around the vertex pwith respect to the Euclidean positions of the edges endpoints. For

sake of completeness we restate Definitions 3.3.10, 3.3.11, and 3.3.13 for general graphs

(these generalizations are immediate):

3.3 Euclidean Minimum Spanning Tree

Definition 3.3.33 Let Gbe a geometric graph for Pand let e

Gbe the corresponding planar

map like representation. For each edge e= [pqiin e

Gthe successor of eis the edge [q, ri

in e

Gwith rbeing the vertex incident to qthat precedes pin the adjacency list of q(sorted

in clockwise order around q); if qhas degree one, then r=p.

For edge e= [p, qiin Gthe kth successor,k≥0, is defined recursively as follows:

the 0th successor of [p, qiis [p, qiitself, and for k > 0, the kth successor of [p, qiis the

successor of the (k−1)st successor of [p, qi.

Similarly, edge eis the predecessor of edge e0if edge e0is the successor of edge e, and

eis the kth predecessor of e0if e0is the kth successor of e.

Definition 3.3.34 Let Gbe a geometric graph for Pand let C= (p0, p1, . . . , pk)be a

cycle in the planar map of the EMST-completion of G. Then Cis called topological, if

for any two consecutive edges on the cycle [pi, pi+1iand [pi+1, pi+2i,[pi+1, pi+2iis the

successor of [pi, pi+1i. We also call the corresponding cycle in Gtopological.

Definition 3.3.35 Let Gbe a geometric graph for P. For any vertex p∈P, we define

its topological k-neighborhood as the set of vertices that are the endpoints of the edges

that are either the ith successor, 0≤i≤k, of any edge incident to p, or are the jth

predecessor, 0≤j≤k, of any edge incident to p. The topological k-neighborhood of a

vertex vis denoted Ntop

G(v, k).

Then we recall that Lemmas 3.3.19 and 3.3.23 do not require that Gis connected not

that the EMST-completion. Hence it suffices to show that the bound of Lemma 3.3.12

carries over to general graphs:

Lemma 3.3.36 Let G= (P, E)be a geometric graph for Pthat is /200-close to con-

nected and /200-close to having a crossing-free straight-line embedding. If Gis -far

from EMST, then there are at least  n

100 -short topological cycles in the EMST-completion

of G.

Proof : Let GCdenote the EMST-completion of G= (V, E). Since GCis /200-close

to crossing-free we can delete a set EDof at most n/200 edges from GCto make it

crossing-free. Then we can insert a set EIof at most n/100 edges to make Gconnected

(and still keep its EMST-completion crossing-free). This is always possible since we can

insert EMST edges to connect the disconnected components. Let us denote the resulting

graph G0

C= (V, E \ED∪EI).

Using the same arguments as in the proof of Lemma 3.3.12 we obtain that there are at

least n/24 -short topological cycles in G0

C. Now we reverse our modifications of the

EMST-completion and we remove the edges EIfrom G0

C. We observe that the removal of

a single edge can destroy at most two cycles since the cycles are disjoint (in their planar

map representation). Hence the removal of EDdestroys at most n/50 -short topological

cycles. Then we reinsert the removed edges ED. Now each edge can destroy at most 2

cycles. Hence the re-insertion of EDdestroys at most n/100 -short topological cycles.

Counting the remaining cycles we get that at least n/100 -short topological cycles are

in GC.2

We conclude:

3 Testing Algorithms for Geometric Properties

Lemma 3.3.37 Let Gbe a geometric graph for Pwith maximum degree 5. Then there

is a property tester that in time O(log2(n/)·pn/)and with query complexity of

O(log(n/)·pn/)accepts the input if Gis an EMST of Pand rejects the input with

probability at least 2

3if Gis -far from EMST.

Proof : Follows from Lemma 3.3.19, Lemma 3.3.23, and Lemma 3.3.36. 2

3.3.6 A Property Tester in General Graphs

It remains to remove the degree bound condition for the input graph. In order to do this,

we first describe a low degree tester:

A property tester for low degree. We first define when a graph is far from having

low degree vertices:

Definition 3.3.38 A geometric graph G= (P, E)for Pis -far from having low degree if

one has to remove more than ·nedges in Gto obtain a graph having maximum degree

smaller than or equal to five.

If Gis not -far from having low degree, then we call it -close to having small degree.

It is easy to see that if Gis -far from having small degree, then there are at least

√·nvertices in Geither having degree greater than five or having an adjacent vertex

with degree greater than five. Therefore the simple algorithm that picks a random set Sof

4pn/ points in Pand tests if every point p∈Shas the degree smaller than or equal to

5and if so then it tests if every neighbor of p∈Shas degree smaller than or equal to 55,

will detect with probability greater than or equal to 2

3every geometric graph Gthat is -far

from having small degree.

Lemma 3.3.39 Let Gbe a geometric graph for P. There is a property tester that in time

O(pn/)and with the query complexity of O(pn/)accepts the input if Ghas the

maximum degree smaller than or equal to 5and rejects the input with probability at least

3if Gis -far from having small degree.

Proof : Clearly, our algorithm accepts every graph having maximum degree of five. Let

us assume that Gis -far from having small degree. A vertex that has degree more than

five or that is adjacent to a vertex with degree more than 5is called a heavy vertex. Let S

denote a sample of size 4√n chosen uniformly at random from P.

PrScontains no heavy vertex≤(1−1/pn/)4√n/ ≤1/3 .

It follows that PrScontains a heavy vertex≥2/3

Hence our algorithm rejects every graph that is -far from having small degree with prob-

ability at least 2/3. Thus it is a property tester. 2

5Since the degree of every p∈Sis less than or equal to 5, such a test can be performed in a constant time

per vertex p.

3.3 Euclidean Minimum Spanning Tree

The Tester for General Graphs. To obtain a tester for general graphs we have to

modify the tester for graph with degree bound of 5in the following way: We first test

whether the input graph is /2-far from having low degree (using the tester described be-

low). Then we run the property tester for graphs with maximum degree of 5after applying

the following modifications and with distance parameter ε=/2:

•If during the course of the algorithm we encounter a vertex with degree more than 5,

we immediately reject.

•For each vertex v∈Swe also include every neighbor of vin Ginto the sample

set. This can be done without asymptotically increasing the running time of the

algorithm (because we reject, if we encounter a vertex with degree more than 5).

Clearly, the above modifications do not affect the case when the input graph is the

EMST of the point set: The algorithm will still accept the input graph. Thus let us consider

the case when the input graph Gis -far from EMST. If the low degree tester rejects the

input graph, we are done. Thus let us assume that the input graph passes this test (and thus

is /2-close to having low degree):

Let us assume that Gis /2-close to having low degree but -far from EMST. We call

a vertex of Gadistinguisher, if it either has degree more than 5 or if it has a neighbor

whose degree is greater than 5. Now we define the graph G0to be a graph obtained from

Gby deleting a minimal set of edges such that G0has a maximum degree of 5. Since we

deleted less than n/2 edges from Gto obtain G0we conclude that G0is /2-far from

EMST.

Inorder to analyze the behaviorof themodified algorithm weconsider the(unmodified)

algorithm for graphs with maximum degree 5. First of all, we observe that the modified

algorithm always rejects, if there is a distinguisher in the sample chosen by the unmod-

ified algorithm. But if there is no distinguisher in the sample chosen by the unmodified

algorithm then the graph ’looks’ like the graph G0which has maximum degree 5 and is

/2-far from EMST. If we run the unmodified algorithm on input G0it rejects with prob-

ability 2/3. Thus the modified algorithm always rejects when the unmodified algorithm

rejects G0. We conclude that the modified algorithm rejects Gwith probability at least 2/3

because it either rejects or it behaves like the unmodified algorithm with input G0. Hence

we proved:

Theorem 4 There is a property tester for the EMST property with a running time of

O(√ n)·log2(n/)) and with a query complexity of O(√ n ·log(n/)).2

3 Testing Algorithms for Geometric Properties

4 Efficient Property Testers

In the previous chapter we have seen some property testers and their analyses. This leads

to the general question if it is possible to characterize all properties of a certain class of

objects (e.g., all properties of point sets) that can be tested efficiently. First of all, it is

necessary to specify what is meant by ”efficiently testable”: A property Πis efficiently

testable, if there is a property tester for Πthat has query complexity independent of the

size of the input (for constant distance parameter ). One of the major open questions in the

field is to characterize for certain classes of objects all properties that are efficiently testable

and there has been some effort to achieve this goal: For graph properties it is known that

all first order graph properties that do not have a ∀∃ quantification are efficiently testable

in the adjacency matrix model [5]. In the same model it has also been proven that a large

class of graph partitioning problems can be tested efficiently [51]. For matrix properties it

is known that certain monotonicity properties can be tested efficiently [45].

In this chapter we try to find a characterization of efficiently testable properties for

functions from an arbitrary finite domain Dinto an arbitrary range R. Using the very

general formulation of a property (see Definition 2.1.1) we can use our framework for a

large class of objects and - as shown in this chapter - also for a large class of properties. In

particular, we show that our framework can be used to prove that certain clustering prop-

erties over point sets, a reversal distance property over permutations, and the k-coloring

property over `-uniform hypergraphs can be tested efficiently. For some of these problems

a property testing algorithm has been analyzed before [4]. Our goal is here to show that

our framework is fairly general and that it leads to elegant proofs that highlight the combi-

natorial features of the problems. We can also show that every hereditary graph property

(every graph property that is closed under taking induced subgraphs) is efficiently testable,

if and only if there is a proof of its testability using our framework.

We emphasize that our results only hold for property testers with one-sided error. Our

result for hereditary graph properties is existential. It shows that our framework can be

applied to a large variety of problems. It would be very interesting to show that our charac-

terization can be used to show that certain large classes of functions are efficiently testable.

This is one of the major open questions regarding our framework.

4.1 Abstract Combinatorial Programs

In this section we introduce abstract combinatorial programs. An abstract combinatorial

program (ACP) is a combinatorial structure defined over a ground set of atom items. In

4 Efficient Property Testers

this combinatorial structure atom items may be arranged into sets. These sets describe

possible ”basic” configurations and they are called the bases of the abstract combinatorial

program. Further there is a relationship between atom items and bases: Each atom item

is either consistent with a given basis or it violates it. This relationship is described by a

violation function.

ACPs can be used to highlight combinatorial features of property testing problems.

In typical applications the ground set depends on the problem under consideration and

corresponds in a natural way to the basic items of the considered problem. When we want

to apply ACPs to graph problems then the ground set may be the set of vertices of the graph

and when we consider properties of point sets the ground set may be the set of points.

The set of bases describes possible “basic” configurations of the corresponding prob-

lem. If we consider a graph coloring problem then a basis may correspond to a subset of

vertices Wtogether with an associated coloring of W. For technical reasons we define

every basis as a pair (W, `), where Wis a subset of the ground set and `is an index de-

scribing a configuration of W(for example, in the graph-coloring example above, it is a

coloring of vertices in W).

The violation function describes for each basis band each atom item xif xis consistent

with bor not. If xis not consistent with bthen we say xviolates b. If every atom item

is consistent with a basis then we call this basis feasible. Normally, the violation function

depends on the input instance. For the graph-coloring example above one could define the

violation function such that a vertex vviolates a basis (colored vertex set W) if and only

if in the input graph the k-coloring of Wcannot be extended to a proper k-coloring of

W∪{v}.

Formally, we define an abstract combinatorial program in the following way.

Definition 4.1.1 An abstract combinatorial program (ACP) is a triple (C,B, $), where

• C is a finite set called ground set,

• B ⊆ {(W, `) : W⊆ C, ` ∈N}is a set of bases, and

•$:B→2Cis a violation function. For each basis b∈ B the set $(b)is the set of

atom items violating b.

A basis bis feasible if $(b) = ∅. An abstract combinatorial program is feasible if it has

a feasible basis.

We also would like to remark here that in the context of randomized incremental algorithms

configuration spaces have been introduced as an analysis tool (e.g., see [35]). Although

the structure of a configuration space is similiar to that of an ACP it would be misleading

to describe our framework in terms of configuration spaces as the intuitive meaning of the

corresponding components of the definitions differs widely.

We are now interested in the problem of testing the feasibility of ACPs. In order to do

so we need some further definitions. The first definition introduces the term dimension to

denote the maximum number of atom items involved in a basis. Then we denote by the

width of an ACP the maximum number of different bases with the same set of atom items:

4.1 Abstract Combinatorial Programs

Definition 4.1.2 (ACP Dimension) Let Pbe an abstract combinatorial program. The

dimension of P(denoted dim(P)) is defined as max{|W|: (W, `)∈ B}. The width of P

denoted (width(P)) is defined as max{`: (W, `)∈ B}.

Since we are interested in property testing we want to investigate in ”local” properties

of ACPs to conclude if the ACP as a whole is feasible or not. For this purpose we need

some further definitions. We say that a basis is feasible for a set of atom items, if none

of these items violates the basis. A basis is covered by a set of atom items, if the basis

contains only atom items from this set. And a set of atom items contains a self-feasible

basis, if there is a feasible basis that is covered by the set:

Definition 4.1.3 (Self-feasible bases) Let P= (C,B, $)be an abstract combinatorial

program. We say a basis b= (W, `)∈ B is covered by a subset C∗⊆ C if W⊆C∗. We

say that a basis bis feasible for a subset C∗⊆ C, if no c∈C∗violates b. We say a subset

C∗⊆ C contains a self-feasible basis if there is a basis bthat is covered by C∗and that is

feasible for C∗.

In the next section we want to design a property tester for feasibility of ACPs. We

assume that the algorithm has the possibility to determine for a set S⊆ C if Shas a self-

feasible basis or not. The size of the set Sshould be as small as possible (the size of this

set could be seen as the query complexity of the algorithm). Since a property tester has

1-sided error it is necessary that we can always determine if the input ACP is feasible.

But then this would require that an ACP that consists of a single feasible basis is always

accepted. The only chance to ensure this is to set S=C. Therefore, we consider only

monotone ACPs. If a monotone ACP Pwith dimension dim(P)is feasible then every

S⊆ C with |S|≥dim(P)has a self-feasible basis.

Definition 4.1.4 (Monotonicity) Let P= (C,B, $)be an abstract combinatorial pro-

gram with dimension dim(P).Pis called monotone if it is either not feasible, or if (it is

feasible and) every subset S⊆ C with |S|≥dim(P)contains a self-feasible basis.

4.1.1 Testing Abstract Combinatorial Programs

In this section we design a property tester for monotone ACPs. We first have to define when

an ACP is far from feasible. We do this directly without specifying a distance measure

between ACPs:

Definition 4.1.5 An abstract combinatorial program is -far from feasible if every basis

is violated by more than ·|C|objects from the ground set C.

A property tester for ACPs is an algorithm that (i) accepts every feasible ACP and (ii)

rejects with probability at least 2

3every ACP that is -far from feasible. In the following

theorem we analyze a property tester for certain ACPs:

4 Efficient Property Testers

Theorem 5 (Testing ACPs) Let P= (C,B, $)be an abstract combinatorial program

with dimension at most δand width at most ρ. Then there exists swith

s=Θ(−1·(δ·ln(δ/) + lnρ))

such that the following algorithm









ACP-TESTER(P, )

Sample a set Sof sobjects from Cuniformly at random

if Scontains a self-feasible basis then accept P

else reject P

1. accepts P, if Pis monotone and feasible,

2. and rejects Pwith probability at least 2/3, if Pis -far from feasible .

Proof : Let P= (C,B, $)be an ACP that is -far from feasible. Further let Phave

dimension at most δand width at most ρ. For a basis b= (W, `)let Ebbe the random

event (with respect to the random choice of S) that W⊆Sand that none of the elements

from $(b)is in S. Now, in order to prove the theorem it is sufficient to show that with the

probability larger than or equal to 2

3for none of b∈ B the event Ebholds.

For every r,0≤r≤δ, let ∆rbe the set of all b= (W, `)∈ B with |W|=r. Let us fix

an arbitrary b∈∆r. Then we have

Pr[Eb]≤(1−)n−r

s−r

n

s.

Since Phas dimension at most δand width at most ρ, we have |∆r|≤ρ·n

rfor every

r≥0. Furthermore, we have |∆r|=0for all r > δ. Therefore, by the union bound we

obtain

Pr[∃b∈ B :Eb]≤X

b∈B

Pr[Eb] =

r=0X

b∈∆r

Pr[Eb]

≤ρ·

r=0n

r·(1−)n−r

s−r

n

s

=ρ·

r=0n

r

n

s·(1−)n−r

s−r

=ρ·

r=0s

r(n−s)!

(n−r)! ·((1−)n−r)!

((1−)n−s)!

4.1 Abstract Combinatorial Programs

=ρ·

r=0s

r·((1−)n−r)···((1−)n−s+1)

(n−r)···(n−s+1)

≤ρ·

r=0

sr·(1−)s−r≤ρ·

r=0

sr·e−(s−r)

≤ρ·δ·sδ·e−(s−δ)≤ρ·δ·sδ·e−(s−s/2),

where we assume in the last inequality that s≥2δ. Then we set s0:= (δ−1ln(3δ ·ρ))3

and

s=2−1(δlns0+ln(3δ ·ρ)) .

With these choices we have

s=2−1(δlns0+ln(3δ ·ρ))

≤2−1·δ·lns0·ln(3δ ·ρ)

≤2(−1δln(3δ ·ρ))2

≤(δ−1ln(3δ ·ρ))3=s0

We further conclude

ρ·δ·sδ·e−(s−s/2)≤ρ·δ·sδ·(s0)−δ·(3δ ·ρ)−1

≤1/3

Hence, with probability at least 2

3all b= (W, `)∈ B with W⊆Sare violated by S, which

completes the proof of the first part of the theorem. If Pis monotone and feasible then

every set X⊆ C of size at least dim(P)contains a self-feasible basis. Therefore, Smust

contain a self-feasible basis because s≥dim(P). Hence the tester accepts the input. It

remains to show that

s=Θ(−1·(δ·ln(δ/) + lnρ))

We have

s=2−1(δlns0+ln(3δ ·ρ))

=Θ(−1(δ(ln(δ−1) + lnln(δρ)) + ln(δρ))

=Θ(−1(δ(ln(δ−1) + lnlnρ) + lnρ))

=Θ(−1·(δ·ln(δ/) + lnρ))

by the observation that for δ≥√lnρwe have δlnlnρ=O(δln(δ/)) and for δ < √lnρ

we have δlnlnρ=o(lnρ).2

Corollary 4.1.6 Algorithm ACP-TESTER is a property tester for monotone abstract com-

binatorial programs.

Proof : Follows immediately from Theorem 5. 2

4 Efficient Property Testers

4.2 Property Testing vs. Testing Abstract

Combinatorial Programs

Our motivation to introduce abstract combinatorial programs was to study its relation to

property testing problems. We now prove a theorem that shows how we can use abstract

combinatorial programs to prove for certain properties that there is an efficient property

tester. Roughly speaking, a property can be tested with small query complexity, if for

every problem instance there is an equivalent (in the sense of property testing) abstract

combinatorial program of small dimension and width.

We now present a first (simple) variant of the main theorem of this chapter. Then

we give some examples and discuss in detail how our theorem can be used to prove the

existence of a property tester with small query complexity. In most examples the obtained

algorithm has also a small running time.

Our approach of using the framework of abstract combinatorial programs to study

property testers of functions f∈ F is to reduce testing of fto testing certain ACPs. In

this section we consider only ACPs whose ground set Cis the domain Dof the function

f. Later in Section 4.5 we show how to deal with other cases. In order to show that a

property Πcan be tested with low query complexity we construct for every f∈ F an ACP

Pf.Pfmust satisfy the following constraints: If fis -far from Πthen Pfmust be -far

from feasible. The second constraint requires that if the function values of fon a set X

can be extended to a function in Π(and if Xhas a certain size) then Xcontains a feasible

basis. We now want to prove a special case of the main theorem of this chapter. We only

consider ACPs whose ground set Cis the domain Dof the tested function f.

Theorem 6 Let Fbe a set of functions from a finite set Dto a set Rand let Πbe a

property of F. Let 0<<1and let δ, ρ ∈N. If for every f∈ F there exists an ACP Pf

with dim(Pf)≤δand width(Pf)≤ρsuch that the following two conditions are satisfied:

(Distance Preserving) if fis -far from Πthen Pfis -far from feasible and

(Feasibility Preserving) for every X⊆ C with |X|≥δ: If there exists g∈Πwith

f|X=g|X, then Xcontains a self-feasible basis,

then there exists s=Θ(−1·(δ·ln(δ/) + lnρ))) such that the following algorithm is a

property tester for property Π:









TESTER(f, )

Sample a set Sof selements in Duniformly at random

if f|S=g|Sfor some g∈Πthen accept f

else reject f

Proof : In order to show that TESTER(f, ) is a property tester for Π, we have to prove

that every function having property Πis accepted by the tester, and every function that is

4.3 Clustering Problems

-far from having property Πis rejected with probability at least 2

3. If f∈Πthen for every

X⊆ C we have f|X=g|Xwith g=f∈Π. This immediately implies that every f∈Π

is accepted by TESTER(f, ). Therefore, it remains to prove that if fis -far from Π, then

the algorithm rejects the input with probability greater than or equal to 2

3. We prove this

by relating ACP-TESTER(P, ) to TESTER(f, ) and by applying Theorem 5.

By the Distance Preserving property, if fis -far from Πthen Pfis -far from feasible.

Furthermore, by Theorem 5, if Pfis -far from feasible then ACP-TESTER(Pf, ) rejects

Pfwith probability greater than or equal to 2

3.Pfis rejected by ACP-TESTER(Pf, )

only if the chosen sample set Scontains no self-feasible basis. But now the Feasibility

Preserving property implies that if there is g∈Πthat agrees with fon the sample set then

every set X⊆ C with |X|≥δ≥dim(Pf)contains a self-feasible basis. By the fact that

|S|≥δwe can conclude that Scontains a self-feasible basis, if there exists a g∈Πthat

agrees with fon the sample set S. Therefore, we can conclude that if fis -far from Π

then with probability at least 2

3there is no such g∈Πwith f|S=g|S. Hence, fis rejected

by TESTER(f, ) with probability at least 2

3. This implies that TESTER(f, ) is a property

tester for Π.2

4.3 Clustering Problems

So far we have introduced a framework that can be used to prove that certain properties can

be tested efficiently. Although in principle we can model many different property testing

problems in terms of our framework, it is not clear that we can use our framework to

prove that these properties have efficient property testers. For this reason we now consider

problems from different fields and show that our framework can be used to design and

analyze efficient property testers.

The first class of problems we consider are clustering problems. Clustering deals with

the problem to partition a set of items into different groups called clusters such that a given

optimization function is minimized. If we consider the corresponding decision problem we

want to know if a clustering with a given optimization value exists.

We consider two clustering problems of point sets in the Rd. The first problem is called

radius k-clustering. Here the goal is to partition a point set in the Rdinto kdifferent

clusters such that the maximum cost of the kclusters is minimized. The cost of a cluster is

given by the radius of the smallest ball enclosing the cluster. The second problem is called

diameter k-clustering. Again the goal is to minimize the maximum cost among a set of k

clusters but this time the cost of a cluster is given by the maximum distance between any

two points within a cluster.

Both problems have been analyzed in the context of property testing and it is known

that there are efficient property testers [4]. We merely want to show that these results

can also be achieved using our framework. The proofs we present are in the spirit of the

proofs from [4]. Yet we think that our proofs might give a clearer view of the important

combinatorial aspects that make these problems testable. In case of the diameter clustering

problem we also slightly improve the query complexity.

4 Efficient Property Testers

4.3.1 Radius clustering

The decision version of the radius k-clustering problem in the Euclidean space Rd[4, 41]

[57, p. 325] (sometimes also called the Euclidean k-center problem) is to verify whether a

given set Pof npoints in Rdcan be partitioned into ksets such that the points in each set

are contained in a unit ball. If such a partition exists, we say that Pis k-clusterable. We

assume that Pis in general position and, as usual, that the point set Pis represented as a

function f: [n]→Rd. Let Fdenote the set of functions representing point sets of size

nand let Π⊆ F denote the set of functions representing point sets that are k-clusterable.

The distance between two point sets is given by the standard distance measure between

functions (see Definition 2.1.3) which is consistent with the following definition:

Definition 4.3.1 A set Pof npoints in Rdis -far from being k-clusterable if more than

 n points must be deleted from Pto obtain a point set that is k-clusterable.

Now, in orderto useour framework fromTheorem 6we haveto describefor everyinput

point set Pin Rdan ACP P= (C,B, $)with C= [n]that satisfies the two conditions

of the theorem. Before we start thinking about the construction of Pwe observe that the

radius k-clustering property is a combinatorial property. This means that the ’number’

of a point in the representation is irrelevant for the property. Thus we can identify the

ground set Cof the ACP with the point set P. Hence a basis of such an ACP consists of a

small set of points from P(formally, of their corresponding indices) and some additional

information (formally encoded as an integer number).

Now we are ready to talk about how to construct the ACPs. Usually, the hard part is

to find the right set of bases. Typically, a basis is used to represent implicitly a ”possible

solution” to the problem. Usually, it should be the case that the set of bases corresponds

to the set of possible solutions of the problem. Additionally, it is required that this implicit

representation can be done in the form of some atom items from the ground set of the ACP

(in our case these items correspond to points from P) and some ”additional information”.

For simplicity, let us first consider the radius 1-clustering problem. If the point set P

is 1-clusterable then by definition it is contained in some unit ball. Vice versa, we can

describe every ’possible solution’ of the problem implicitly by the position of such a unit

ball. In a similar way we can consider every ball with radius at most 1as a possible

solution. Our goal is to describe a subset of these balls implicitly in terms of atom items

(points). If Pis 1-clusterable then this subset must contain a ball that contains every point

in P.

In the following we use the fact that every finite point set is contained in a unique

(closed) ball of smallest radius (see, e.g., [78]). We denote this ball by sball(P). With

every subset W⊆Pwe associate its smallest enclosing ball. If the radius of sball(W)

is at most 1then this ball can be interpreted as a ’possible solution’ of the problem and

W(formally, the pair (W, 1)) as a basis. So we could say that for each W⊆Pthe pair

(W, 1)is a basis, if and only if the radius of sball(W)is at most 1. But to get a query

complexity independent of nfrom our framework we need to reduce the number of atom

items involved in a basis. To do this we use the fact that there is always a subset W⊆P

of cardinality at most d+1such that sball(W) = sball(P)[78].

4.3 Clustering Problems

We say that (W, 1)is a basis, if the following two conditions are satisfied:

•|W|≤d+1,

•the radius of sball(W)is at most 1.

We can now define a natural violation function in the following form: A basis (W, 1)

is violated by all points that are not contained in sball(W). Using this definition we know

(a) that Phas a feasible basis if Pis 1-clusterable and (b) that every basis in Pis violated

by more than n points, if Pis -far from 1-clusterable.

We can easily extend this definition of a basis to kclusters: A basis for the radius

k-clustering problem consists of kbases for single clusters. Formally, a basis consists

of at most k(d+1)points from Pand an integer number that encodes a partition of the

k(d+1)points into ksets (of size at most d+1). The violation function is defined in the

straightforward way: A point violates a basis, if it is not contained in any of the smallest

enclosing balls defined by the bases. We have now done the ”tricky part” of the proof -

the design of the bases - what remains is straightforward verification of the requirements

of our framework.

Bases for radius clustering: We define the bases to specify all possible representa-

tions of feasible k-clusterings. Formally, we define the set of bases B={(W, `) : W⊆

[n],|W|≤(d+1)k, 1 ≤`≤k(d+1)k}, where the pair (W, `)∈ B should be interpreted

as follows:

•Wis the set of points defining ksmallest enclosing balls (clusters), and

•`is represented as a vector hν1, . . . , ν(d+1)kiof length (d+1)ksuch that for 1≤

i≤|W|, the ith point in Wdefines the smallest enclosing ball containing cluster

number νi∈[k].

We say a basis b∈ B is valid if every set W0of points defining one of the ksmallest

enclosing balls in bsatisfies that the radius of sball(W0)is at most 1.

It is easy to see that with such a definition of the bases, the abstract combinatorial

program designed for the radius clustering problem has dim(P)=(d+1)k=: δand

width(P) = k(d+1)k=: ρ.

Violation Function for Radius Clustering. We say p∈ C = [n]violates a basis

b∈ B if bis not valid or the point p(located at f(p)) is not contained in any of the kballs

defined by b. Furthermore, all non-valid bases are violated by all ground set elements.

Now, once we have defined formally the abstract combinatorial program Pfor every

instance of the radius clustering problem, we have to verify the prerequisites of Theorem 6:

The Distance Preserving and the Feasibility Preserving properties.

4 Efficient Property Testers

Distance Preserving Property. If Pis -far from being k-clusterable, then for every

set of kballs in Rdof radius at most one there is always a set of more than  n points in

Pthat are not contained in any of the balls. By definition every basis corresponds to such

a set of smallest enclosing balls and the violation function is defined according to these

balls. Thus every basis b∈ B must be violated by more than  n elements from C. This

implies the Distance Preserving property.

Feasibility Preserving Property. Every set S,|S|≥δ, that is k-clusterable is con-

tained in kunit balls. Thus we can partition Sinto kclusters S1, . . . , Skeach of which is

contained in a unit ball. Further there exists sets Wi⊆Siwith sball(Wi) = sball(Si)and

|Wi|≤d+1. The sets Widefine a certain basis in Bwhich is covered by Sand is feasible

for S. This implies the Feasibility Preserving property. Therefore, we can apply Theorem

6 to obtain a property tester for the radius clustering problem with a query complexity of

O(d k/).

Implementation. We also observe that we can implement the second statement of the

algorithm TESTER(f, ) efficiently in the following way: We compute whether the sample

set Sis k-clusterable. If it is, we accept the input. If it is not we reject. The correctness

follows immediately from the fact that if a point set Sis k-clusterable then there exists a

superset of Swith size nthat is also k-clusterable. We summarize our discussion in this

section with the following theorem.

Theorem 7 There is a property tester for the radius clustering problem with query com-

plexity of

O(d k −1ln(d k/)) = e

O(d k/).2

Finally, let us discuss what we learned from this example with respect to our frame-

work. We have seen that the bases of the constructed ACP correspond to possible solutions

of the problem. If we generalize this observation we can say, roughly speaking, that prop-

erties are testable if every possible (global) solution to the problem can be encoded using

a few ”atom items” (of the problem instance) and some additional information. Unfor-

tunately, things are not always that simple. In this first example we have a one-to-one

correspondence between the clustering problem and the ACP, i.e., a sample S⊆ C has a

self-feasible basis if and only if the corresponding point set is k-clusterable. This is not

always the case as we see in the next section.

4.3.2 Diameter clustering

The decision version of the diameter k-clustering problem ([4], [9, Problem ND54], [48,

Problem MS9], [57, p.326]) is defined as follows: Given a point set Pin Rdand a positive

integer k, can Pbe partitioned into kdisjoint sets (clusters) C1, . . . , Cksuch that for every

i,1≤i≤k, and every x, y ∈Ciit holds that dist(x, y)≤1. If such a partition exists,

we say that Pis k-clusterable. As always, we assume in the property testing setting that the

4.3 Clustering Problems

point set is represented by a function f: [n]→Rd. In this case we do not use the standard

distance measure. It has been shown in [4] that under the standard distance measure every

property tester must have a query complexity of Ω(√n). Instead we use the bicriteria

distance measure proposed in [4] which is defined in the following way:

Definition 4.3.2 [4] Let Pbe a point set in Rdand kbe a positive integer. We say P

is (, β)-far from being k-clusterable if for every partition of Pinto sets C0, C1, . . . , Ck

satisfying dist(x, y)≤1+βfor all 1≤i≤kand x, y ∈Ci, it holds that |C0|>  ·|P|.

It is known that under this distance measure there is a property tester with query com-

plexity e

Ok2

·2

β2dfor the diameter k-clustering problem [4]. Again our goal in this

section is to show that we can prove this result using our framework and improve the query

complexity of e

Ok

·2

βd. Our proof uses combinatorial arguments that appear implicitly

in the proof presented in [4]. Our goal is to design an efficient property tester that for given

k,and β>0(i) accepts every point set that is k-clusterable and (ii) rejects with proba-

bility at least 2

3every input that is (, β)-far from being k-clusterable. Again we denote by

Fall functions representing point sets of size nand by Π⊆ F all functions representing

point sets that are k-clusterable.

As in the case of the radius clustering problem we begin our discussion with the con-

struction of bases for a single cluster. We will see that we can generalize this definition

to the case of k-clusters in a similar way as in the radius clustering problem. We use the

following notation: A cluster Cis a non-empty set of points in Rdwith dist(x, y)≤1for

every x, y ∈C.

Definition 4.3.3 Let Cbe a cluster. The kernel kern(C)of Cis defined as the intersection

of unit balls with centers at the points in C.

We use the following simple properties of a kernel:

Claim 4.3.4 Let Cbe a cluster. Then we have:

1. C⊆kern(C).

2. There exists a unit ball containing all the points in C.

3. If p∈kern(C)then dist(x, y)≤1for every x, y ∈C∪{p}.

Proof : (1) Since dist(x, y)≤1for every x, y ∈C, each point x∈Cis contained

in every unit ball with the center at any other point y∈C, and hence, xis contained in

kern(C). (2) Since C6=∅, from the previous property we get kern(C)6=∅. Let us pick

any point x∈kern(C). Since xis contained in all unit balls with centers at the points

in C, it is at the distance at most 1from every point in C. Therefore all points in Care

contained in the unit ball with the center at x. (3) If p∈kern(C), then pis contained in

all unit balls with centers at the points in Cand thus its distance to every point in Xis at

most 1.2

4 Efficient Property Testers

Now, in order to use our framework from Theorem 6 we have to describe for every

input set Pof npoints in Rdan ACP P= (C,B, $)with C= [n]that satisfies the

preconditions of the theorem. Following the arguments from the radius clustering case we

can again identify the ground set Cof the ACP with the input point set P. Thus we can

again construct our bases using points from Pas atom items. We first consider the case

k=1:

We start by making an observation about the kernel of a cluster C: If we add a point

pto Cthen Cremains a cluster (i.e., the pairwise distance between points from Cis at

most 1) if and only if pis in the kernel of C. Thus a good basis would contain exactly

those elements from Cthat define the boundary of kern(C). Unfortunately, it might be

the case that almost every element of Cdefines the boundary of C. So this approach is

doomed to failure. But we can do the following: We can find a small set of points Win C

that approximates the kernel of C. We choose a basis for a cluster Cto be every maximal

subset W⊆Cwith the property that all points in Whave a mutual pairwise distance of

more than β. This way, we ensure that (a) the kernel is well approximated and (b) that the

number of points defining the kernel of a cluster is small. We now prove that every basis

for a single cluster is defined by no more than (1+2

β)dpoints.

Lemma 4.3.5 Let Cbe a cluster and let Wbe a subset of Csuch that for every p, q ∈W

dist(p, q)> β > 0. Then |W|≤(1+2/β)d.

Proof : Let Cand Wbe as stated in the lemma. If we draw balls of radius β/2 centered

at every point in W, then all these balls are pairwise disjoint. By Property 2 in Claim

4.3.4, all points in Ware contained in some unit ball because W⊆Cand Cis a cluster.

Therefore, if we draw balls of radius β/2 centered at every point in W, then all these balls

are contained in a ball of radius 1+β/2. Since all these small balls are disjoint, we have

the following upper bound for the size of W:

|W|·Vol(ball of radius β/2)≤Vol(ball of radius 1+β/2),

where Vol() denotes the volume of the object. Since the volume of a d-dimensional ball of

radius ris equal to1rd·πd/2

Γ(1+d/2), we obtain an upper bound for the size of Wof (1+ (2/β))d.

To formalize our definition of bases we need:

Definition 4.3.6 Let Pbe a point set in Rd, and βa positive real. Let Cbe a cluster.

We say a point p∈Pis β-covered by Cif p∈kern(C)and there is q∈Csuch that

dist(p, q)≤β.

Similar to the radius clustering problem we want to represent every possible solution

to the diameter 1-clustering problem by a basis. As already mentioned we can do this

1Here, Γ() is Euler’s Gamma (factorial) function, that is formally defined as Γ(x) = R∞

0tx−1e−tdt for all

positive x. It is well known that Γ(x+1) = x Γ(x)and that for integer x≥0we have Γ(x+1) = x!and

Γ(x+1

2)! = √π·((2 x)!)/(x!·4x).

4.3 Clustering Problems

only approximately. Therefore we say that a pair b= (W, `)is a basis for the diameter

1-clustering problem if Wis a subset of Pof size at most (1+2

β)dand `=1. We say

that a basis is valid, if for all p, q ∈Wwe have β≤dist(p, q)≤1. It remains to define

the violation function. If a basis is not valid it is violated by every point in P. If a basis

is valid, we have two different types of violation: First of all a point pviolates a basis

b= (W, 1), if p /∈kern(W). This is the straightforward type of violation. If a point

is not in the kernel of Wthen it cannot belong to the same cluster. The second type of

violation is different. A point p∈kern(W)violates b, if pis not β-covered by W. Here

a point violates a basis if it is consistent with the current basis of the cluster, but it changes

the kernel significantly. But if the kernel is changed significantly then the basis bis no

longer ”a good implicit description of the solution of the clustering problem”. In this case,

we can obtain a new basis b0= (W∪{p}, 1)from bthat approximates the new kernel.

Bases for diameter clustering. We can extend our definition of bases for the diam-

eter 1-clustering problem to arbitrary kin the following way: A basis for the diameter k-

clustering problem is an encoding of ksets W1, . . . , Wk⊆Peach of size at most (1+2

β)d.

A basis is valid if for every p, q ∈Wi,1≤i≤k, it holds that β≤dist(p, q)≤1.

Formally, a basis is a set W=SiWiwith an integer encoding the partition of Win the

sets Wi.

Lemma 4.3.7 Any ACP Pfor the diameter k-clustering problem has dimension at most

k·(1+2/β)dand width at most kk·(1+2/β)d.

Proof : The dimension of the ACP follows immediately from the definition of the bases.

The width follows from the fact that every point of a basis can belong to one of ksets, that

is, we have (at most) kchoices for each point of the basis. 2

Violation function for diameter clustering. A basis bthat is an encoding of the

sets W1, . . . , Wkis violated by a point p∈P, if bis not valid or if pviolates every Wi

(seen as a basis for the 1-clustering problem), 1≤i≤k.

Feasibility Preserving property. In order to show the Feasibility preserving prop-

erty we have to show that every k-clusterable set S⊆Pof size at least δ:= k·(1+2/β)d≥

dim(P)has a self-feasible basis. If Sis k-clusterable then there exists a partition of Sinto

kclusters C1, . . . , Ck. Since for every Wi⊆Ciit holds that kern(Ci)⊆kern(Wi)

we know that there exists sets Wiwith the property that for each p, q ∈Wiwe have

β≤dist(p, q)≤1and for each p∈Ciand q∈Wiwe have dist(p, q)≤β. By

Lemma 4.3.5 and the size of the bases the Feasibility Preserving property follows.

Distance Preserving property. We prove the Distance Preserving property by con-

tradiction. Let us assume Pis (, β)-far from being k-clusterable and suppose there is a

basis bencoding the sets W1, . . . , Wkthat is violated by less than  n points. We delete

all points in Pthat violate band let P∗be the remaining point set. Since all the points in

4 Efficient Property Testers

P∗are β-covered by some Wi, for each point p∈P∗there is a Wiwith p∈kern(Wi)

and for which there exists qp∈Wiwith dist(p, qp)≤β. We assign each such a point

pto the cluster corresponding to Wi. Observe that all points in the cluster are contained

in kern(Wi). Furthermore, for every point r∈kern(Wi)the distance between pand

ris not larger than the distance from pto qpplus the distance from qpto r. Hence, we

can conclude that the distance between two points in the cluster (both of which must be

contained in kern(Wi)) is at most 1+β. This implies that P∗can be partitioned into k

clusters of diameter at most 1+βeach, which is a contradiction.

Implementation. Similar to the radius clustering problem we can implement the sec-

ondstatement ofthe algorithm TESTER(f, ) bychecking ifthe sampleset Sis k-clusterable.

This follows from the fact that every set k-clusterable set Sof size at most ncan be ex-

tended to a set Pof size nthat is clusterable. On the other hand, every set Sthat is not

k-clusterable cannot be extended to a set Pthat is k-clusterable. Now, by our discussion

above, we can apply Theorem 6 to obtain the following result.

Theorem 8 There is a property tester for the diameter k-clustering problem with a query

complexity of

O(k·−1·(1+ (2/β))d)

4.4 Reversal Distance

In the previous section we have seen how to apply our framework to certain clustering

problems. We have seen that a clustering problem is efficiently testable, if we have an im-

plicit characterization of the cluster consisting of a small number of input objects and some

additional information. We now want to consider a different problem which is called the

reversal distance problem. Determining the reversal distance between two permutations is

a fundamental problem in computational biology. It has been introduced in the pioneering

works by Sankoff and later on many researchers have investigated in this problem (see,

e.g., the survey in [71, Chapter 10]). The problem to compute the reversal distance be-

tween two permutations can be reduced to the problem of computing the distance between

one permutation and the identity permutation (details below). Therefore, the problem is

also called ’sorting by reversals’.

In sorting by reversals one is asked to compute the shortest sequence of (interval)

reversals that transforms a given permutation πinto the identity permutation. The number

of reversals that are necessary is called the reversal distance between πand the identity

permutation. Because of its applications in computational biology, sorting by reversals has

been widely studied in the recent years (see, e.g., [10, 17, 22, 59, 71, 72]). It is known that

sorting by reversals is NP-hard [22], its optimization version is MAX-SNP-hard [18], and

that there exits a polynomial-time 1.375-approximation algorithm [17] (see also [10, 59]).

We now introduce the reversal distance problem, formally. Let Sndenote the set of all

permutations of [n].

4.4 Reversal Distance

Definition 4.4.1 A reversal ρhi, jiof an interval [i, j],1≤i≤j≤n, is the permutation

1 2 . . . i −1ii+1 . . . j −1 j j +1 . . . n

1 2 . . . i −1 j j −1 . . . i +1ij+1 . . . n .

That is, for a permutation π= (π1, . . . , πn)∈ Sn,ρhi, jihas the effect of reversing

the order of (πi, πi+1, . . . , πj)and transforming πinto

π·ρhi, ji= (π1, . . . , πi−1, πj, πj−1, . . . , πi, πj+1, . . . , πn)

The reversal distance between two permutations πand σis the minimum number of

reversals that is necessary to transform πinto σ.

Definition 4.4.2 Given a pair of permutations π= (π1, . . . , πn), σ = (σ1, . . . , σn)∈ Sn,

the reversal distance drev(π, σ)between πand σis the minimum number of reversals

needed to transform πinto σ(that is, the minimum number ksuch that there exists a

sequence of reversals ρ1, ρ2, . . . , ρkwith π·ρ1·ρ2···ρk=σ).

Equivalently, we can compute the number of reversals necessary to transform σ−1πinto

the identity permutation id. Therefore, we are interested in the reversal distance between

a given permutation πand the identity permutation. We want to consider the decision

version of the reversal distance problem.

Definition 4.4.3 The reversal distance problem is to decide for a given permutation π∈

Snand an integer kif the reversal distance drev(π, id)between πand the identity permu-

tation id is at most k.

We now want to formulate the problem as a property testing problem that fits into our

framework. The set of functions Fwe want to consider is the set of permutations of [n],

that is, F=Sn. We are interested in all permutations that have a reversal distance of at

most kto the identity permutation. We can write this as a property Πas follows:

Π={π∈ Sn:drev(π, id)≤k}.

Now that we have defined the property we need a distance measure between permuta-

tions. Here we can use the standard distance measure from Definition 2.1.3. Equivalently,

we can use the following definition:

Definition 4.4.4 A permutation π∈ Snis -far from having reversal distance smaller

than or equal to kif for every sequence of kreversals ρ1, ρ2, . . . , ρkthe permutation

π·ρ1·ρ2···ρkdisagrees with the identity permutation on more than ·nplaces, that is,

if π·ρ1·ρ2···ρk= (σ1, . . . , σn)then |{i∈{1,2,...,n}:σi6=i}| >  ·n.

4 Efficient Property Testers

In contrast to the clustering problems the reversal distance property is obviously not

combinatorial. We conclude that we have to take the domain of the function into account.

For a permutation π= (π1,··· , πn)we denote atom items by πi. This notion covers the

fact that the value of domain element iis πi=π(i). Hence it captures also the domain of

a value of f.

Let us notice that we can encode an interval [i, j]by the two domain elements πiand

πj(using the fact that π−1(πi) = iand π−1(πj) = j). If we apply a reversal ρto πthen

πiand πjinduce the interval ((π·ρ)−1)(πi),((π·ρ)−1)(πj)(for this reason we want to

work with πirather than with i). We denote the interval induced by two elements πiand

πjby [πi, πj].

We say a reversal ρhr, sisplits an interval [πi, πj], if i<r≤jor i≤s<j(or both).

Definition 4.4.5 Let π= (π1, . . . , πn)be a permutation and let [πi, πj]denote an interval.

We say that a reversal ρhk, `isplits an interval [πi, πj], if i<k≤jor if i≤`<j.

We generalize this notion to k-reversals:

Definition 4.4.6 Let π= (π1, . . . , πn)be a permutation. A k-reversal ρ=ρ1·ρ2···ρk

splits an interval [πi, πj], if there exists `,0≤`<k, such that ρ`+1splits

[(π·ρ1···ρ`)−1(πi),(π·ρ1···ρ`)−1(πj)] .

If ρdoes not split [πi, πj]then we say ρis safe for [πi, πj].

Notice that if ρ1, . . . , ρkis safe for [πi, πj]then each of the reversals ρ1, . . . , ρkeither

entirely contains [πi, πj]or it does not contain any π`∈[πi, πj]. Therefore, in this case,

after applying ρ1, . . . , ρkthe positions of πi+1, . . . , πj−1are determined by the position of

πiand πj.

Bases for the k-reversal problem: The idea is now to define a basis as a set of

2k +1intervals induced by pairs of the atom items of the basis. For each such set we then

consider only reversal that are safe for these intervals. We say that a set of 2k +1intervals

is a basis if a sequence of kreversals ρ1,...ρkexists such that for each atom item πiof

the basis it holds that (π·ρ1···ρk)−1(πi) = πiand if ρ1,...ρkis safe for all intervals

induced by the elements involved in the basis.

Definition 4.4.7 (Bases for k-reversal distance) Let π= (π1,··· , πn)∈ Sn. A set Iof

2 k +1intervals is a valid basis for the reversal distance problem if there is a sequence

ρ1, . . . , ρkof kreversals such that

•(π·ρ1···ρk)−1(πi) = πiand (π·ρ1···ρk)−1(πj) = πjfor each interval [πi, πj]∈

I, and

•no interval [πi, πj]∈Iis split by ρ1, . . . , ρk.

If the set of intervals is a basis b, then we associate with it the k-reversal ρb=ρ1···ρk

(in case that there are different sequences that witness the basis property we choose an

arbitrary one).

4.4 Reversal Distance

Formally, a basis consists of 4k +2atom items and an integer number encoding the

2k +1pairs of atom items (an atom item may be paired with itself). The integer number

can be seen as a vector of length 2k +1having entries with values from [4k +2]×[4k +2]

to specify each of the 2k +1pairs.

Lemma 4.4.8 For each instance of the reversal distance problem the corresponding ACP

Phas dim(P)≤4k +2and width(P)≤(4k +2)4k+2.

Proof : By definition a basis consists of 4k +2atom items. Thus we have dim(P)≤

4k+2. Each vector of length 2k+1with values from [4k+2]×[4k+2]can be encoded as

an integer number between 1and (4k +2)4k+2. Thus we have width(P)≤(4k +2)4k+2.

Violation function for k-reversal distance: Let bbe a basis and let ρb=ρ1···ρk

bethe k-reversal associated with basis b. We say bisviolated by πi∈ C, if (π·ρb)−1(πi)6=

πi, that is, πiis not moved to position πiwhen ρbis applied to π.

Distance Preserving Property: We have associated a k-reversal ρbto each basis.

An atom item violates a basis, if it is not at the correct position when ρbis applied to π. If

a permutation is -far from having reversal distance smaller than or equal to kthen every

k-reversal puts more than n elements to the wrong position. Hence every basis is violated

by more than  n elements. Therefore, the distance preserving property is satisfied.

Feasibility Preserving Property: The hard part in this case is to prove the Feasibility

Preserving property. Let S⊆ C be a set of atom items and let ρ=ρ1···ρkbe a k-reversal

with (π·ρ)−1(πi) = πifor each πi∈S. We show that in this case Shas a self-feasible

basis. First we want to construct a set of 2k +1intervals that are safe for ρ1,...ρk. We

start with the set of s−1intervals induced by S. now we observe that each reversal ρican

split at most 2of these intervals. We conclude that at most 2k of these intervals are split by

ρ1,...ρk. We can merge adjacent intervals not split by ρ1,...ρkand obtain a set of 2k+1

intervals that are not split by ρ1,...ρk. Hence there exists a sequence of kreversals that

is safe for each of our intervals. Thus these intervals form a basis b. It remains to prove

that this basis is not violated (the k-reversal associated with the basis must not be the k-

reversal ρ). By our construction of the intervals each πi∈Sis contained in a safe interval.

Therefore, its position after applying the reversal is uniquely determined by the positions

of the endpoints of the interval. Let SI⊆Sdenote the set of endpoints of intervals of the

basis b. Since bis a basis there is a k-reversal ρbwith

(π·ρb)−1(πi) = πi= (π·ρ)−1(πi)for each πi∈SI

Since the endpoints are mapped to the same places when ρband ρare applied to πwe can

conclude that each other point in Sis also mapped to the same place. Hence no πi∈S

violates band we have shown the Feasibility Preserving property.

Once we have proven the Distance and the Feasibility preserving property, Lemma

4.4.8 and Theorem 6 allow us to conclude with the following theorem.

4 Efficient Property Testers

Theorem 9 There exists a property tester for the k-reversal distance property with query

complexity e

O(k/).2

4.5 Property Testing vs. Testing Abstract

Combinatorial Programs (continued)

In Section 4.2, we described a framework for testing problems via testing abstract combi-

natorial programs. We only considered ACPs whose ground set is equivalent to the domain

of the tested function. Although we showed that this framework can be applied to differ-

ent problems, it is not always powerful enough. In some cases it is necessary to consider

ground sets different from the domain of the tested function. For example, when we con-

sider graph problems we want to identify the ground set with the set of vertices of the

graph. If we consider the adjacency matrix model using the approach from Section 4.2 and

we represent a graph by a function f:V×V→{0, 1}then the ground set would be a set

of entries in the adjacency matrix. But sampling entries of the adjacency matrix results in

a disconnected set of edges, if the size of the sample set is o(√n)[51]. In order to have a

more flexible model we introduce interpretations.

Interpretations. Interpretations are functions that map each subset of the ground set of

the ACP to a subset of the domain of the tested function. Given a sample set S of ground

set items we use the interpretation to determine a set of domain elements DS. Then we

query for the value f(x)for each x∈ DS.

Definition 4.5.1 An interpretation of Cin Dis a function I:2C→2D.

To investigate quantitative properties of the reduction we need the following definition.

Definition 4.5.2 For a function h:N→N, we say an interpretation Iof Cin Dis h-

bounded if for every X⊆ C it holds |I(X)|≤h(|X|). (We write in that case that Iis

h(N)-bounded, with Nbeing the formal input variable.)

The main idea behind introducing these notions is to allow a more general analysis of

algorithm TESTER(f, )from Section 4.2. Similarly to the proof of Theorem 6, we want

to test an input function f∈ F via testing a related ACP P= (C,B, $). Since Pis now

allowed to be an ACP with arbitrary ground set C, we use the interpretation Iof Cin D

to link the domains of fand Pin the reduction. The notion of h-bounded functions in

Definition 4.5.2 is used to describe the size of the random sample in the tester. That is, if

the interpretation Iis h(N)-bounded and if our algorithm samples a set S⊆ C then we

query for the value of f(x)for every x∈I(S)⊂ D and the restriction |I(S)|≤h(s)yields

an upper bound on the query complexity.

4.5 Property Testing vs. Testing Abstract Combinatorial Programs (continued)

Distance Preserving Property. In Theorem 6 we used the Distance Preserving prop-

erty that requires that if a function fis -far from property Πthen the ACP is -far from

feasible. In general, however, one can parameterize this property and require the (, λ)-

Distance Preserving property: if fis -far from property Πthen the ACP is λ-far from

feasible.

Now, in the framework defined above, it is easy to see that Theorem 6 can be general-

ized to the following theorem, which describes our framework in its full generality.

Theorem 10 Let Fbe a set of functions from a finite set Dto a set R, and let Πbe a

property of F. Let 0 < , λ < 1 and let I:2C→2Dbe an h-bounded interpretation of

Cin D. If for every f∈ F there exists an ACP Pfwith dim(Pf)≤δand width(Pf)≤ρ

such that:

((, λ)-Distance Preserving) if fis -far from Πthen every basis in Pfis λ-far from

feasible, and

(Feasibility Preserving) For every X⊆ C with |X|≥δ: If there exists g∈Πwith

f|I(X)=g|I(X)then Xcontains a self-feasible basis,

then there exists s=Θ(λ−1·(δ·ln(δ/λ) + lnρ)) such that the following algorithm is a

property tester for Πwith query complexity h(s):









TESTER(f, )

Sample a set Sof selements in Cuniformly at random

if f|I(S)=gI(S)for some g∈Πthen accept f

else reject f

Proof : In order to show that TESTER(f, ) is a property tester for Π, we have to prove

that every function having property Πis accepted by the tester, and every function that is

-far from having property Πis rejected with probability at least 2

3. If f∈Πthen for every

X⊆ C we have f|I(X)=g|I(X)with g=f∈Π. This immediately implies that every f∈Π

is accepted by TESTER(f, ). Therefore, it remains to prove that if fis -far from Π, then

the algorithm rejects the input with probability greater than or equal to 2

3. We prove this

by relating ACP-TESTER(P, ) to TESTER(f, ) and by applying Theorem 5.

By the Distance Preserving property, if fis -far from Πthen Pfis -far from feasible.

Furthermore, by Theorem 5, if Pfis -far from feasible then ACP-TESTER(Pf, ) rejects

Pfwith probability greater than or equal to 2

3.Pfis rejected by ACP-TESTER(Pf, )

only if the chosen sample set Scontains no self-feasible basis. But now the Feasibility

Preserving property implies that if there is g∈Πthat agrees with fon the interpretation of

the sample set then every set X⊆ C with |X|≥δ≥dim(Pf)contains a self-feasible basis.

By the fact that |S|≥δwe can conclude that Scontains a self-feasible basis, if there exists

ag∈Πthat agrees with fon the sample set S. Therefore, we can conclude that if fis -far

from Πthen with probability at least 2

3there is no such g∈Πwith f|I(S)=g|I(S). Hence, f

is rejected by TESTER(f, ) with probability at least 2

3. This implies that TESTER(f, ) is

a property tester for Π.2

4 Efficient Property Testers

4.6 Graph Coloring

In this section we apply Theorem 10 to graph coloring. A k-coloring of a graph G= (V, E)

is an assignment χ:V→{1,...,k}of colors to the vertices of the graph. A coloring is

proper if there is no edge e= (v, u)∈Esuch that χ(v) = χ(u). If Ghas a proper k-

coloring, then Gis k-colorable. The graph k-coloring problem is to decide whether a given

graph is k-colorable. It is a classical problem in algorithmic graph theory. It is known that

for k≥3the problem of verifying if an input graph is k-colorable is NP-complete (see,

e.g., [48, Problem GT4] or [9, Problem GT5]). It is also well known that this problem is

very hard to approximate, and so, for example, it is NP-hard to 4-color 3-colorable graphs

and it is hard to color k-colorable graphs with approximation within n1−, and even within

n1−O(1/√loglogn)[38] where ndenotes the number of vertices in the graph. The best known

approximation bound for arbitrary kis of O(n(loglogn)2/log3n)[9, Problem GT5].

We want to consider the graph coloring problem in the adjacency matrix model. That

is, the input graph G= (V, E)is given as a function V×V→{0, 1}representing the

adjacency matrix of the graph. Thus we have f(u, v) = 1, if and only if (u, v)∈E.

W.l.o.g. we assume that V= [n]. Let Πdenote all n-vertex graphs that have a proper

k-coloring. We use the standard distance measure between graphs (see Definition 2.1.3)

which is equivalent to the following definition:

Definition 4.6.1 A graph Gis -far from being k-colorable if in order to transform Ginto

ak-colorable graph one has to modify more than  n2entries in the adjacency matrix of

It is known that graph coloring in the adjacency model can be tested efficiently [51] and

the proof we present uses combinatorial arguments that essentially appear in the proof

presented in [51]. The achievement of our framework is again a clear and elegant proof

that highlights the combinatorial aspects of the problem and a slight improvement in the

query complexity that matches for k>2the bounds from [6] in the e

O-notation.

For the k-coloring problem, we identify the ground set Cwith the set of vertices V

of the input graph G= (V, E). Since in our framework Gcan be viewed as given by its

adjacency matrix representation, we define the interpretation Ito map each set of vertices

to the submatrix induced by these vertices. That is, for every W⊆V, we have I(W) =

W×W. Clearly, the interpretation is N2-bounded.

The bases of the coloring problem are formed by some properly colored sets of vertices.

That is, everybasis correspondsto apair (W, χ∗), where W⊆V=Candχ∗isan encoding

of a proper k-coloring of W(here, one can think that a k-coloring of Wis represented by

a vector of length |W|with values in kW). Notice that with this definition, if for each

(W, χ∗)∈ B we have |W|≤δ, then so defined abstract combinatorial program Phas

dim(P)≤δand width(P)≤kδ.

Before we define the bases formally, we need some more definitions:

Definition 4.6.2 Let S⊆Vbe a set of vertices and let χbe a proper k-coloring of S. Let

Vi=v∈V:∃u∈Swith χ(u) = iand (v, u)∈E

4.6 Graph Coloring

be the set of vertices that cannot be properly colored in color iusing any extension of χ

to V. We call a vertex v∈V\Sheavy for hS, χiif there is a proper extension of χto

S∪{v}, but every extension increases the number of vertices in certain Vi,1≤i≤k, by

at least  n/3 (that is, (i) there exists 1≤j≤k, with v /∈Vjand (ii) ∀1≤j≤kif v /∈Vjthen

{w /∈Vj: (v, w)∈E}≥ n/3).

A vertex v∈V\Sthat cannot be properly colored by any extension of the coloring χ

(that is, v∈Tk

i=1Vi) is called a conflict vertex for hS, χi.

Bases for k-coloring: For the graph coloring problem we define the bases inductively:

•{∅, 1}is a basis (where 1is the encoding of the coloring of the empty set of vertices)

and

•if b= (K, χ)is a basis, vis a heavy vertex for band χ∗is an encoding of the

previous coloring χof Kextended by a proper coloring of v, then (K∪{v}, χ∗)is a

basis.

Our next step is to show that the ACP we define this way has small dimension and

width:

Lemma 4.6.3 For every input instance of the graph coloring problem the corresponding

abstract combinatorial program Phas dim(P)≤3k/ and width(P)≤k3 k/.

Proof : Since every k-coloring of a set of rvertices can be encoded using an integer

number between 1 and kr, it is enough to show that |K|≤3 k/ for every basis b= (K, χ).

Let b= (K, χ)be a basis with |K|=r. Since bis a basis, there must exist a sequence

of bases (K0, χ0),(K1, χ1),. . .,(Kr, χr)with (K0, χ0)=(∅, 1),(Kr, χr)=(K, χ), and

such that for every 1≤i≤rwe have |Ki|=iand the only vertex in Ki\Ki−1is heavy

for hKi−1, χi−1i.

Let V(i)

1, V(i)

2, . . . , V(i)

k,0≤i≤r, be the sets of vertices such that each vertex v∈V(i)

cannot be properly colored in color jif we want to extend coloring χito the set Ki∪{v}.

That is,

V(i)

j=v∈V:∃u∈Kiwith χi(u) = jand (v, u)∈E.

It is easy to see that for every i, j we have V(i)

j⊆V(i+1)

j. Furthermore, by the definition of

heavy vertices, we know that for every i,1≤i≤r, there is certain j,1≤j≤k, such

that |V(i)

j|≥|V(i−1)

j|+ n/3. Therefore, since we have V(0)

j=∅for every j,1≤j≤k, it

must hold that Pk

j=1|V(i)

j|≥i  n/3 for every i,1≤i≤r. Finally, since |V(r)

j|≤nfor

every j,1≤j≤k, we conclude that r≤3 k/.2

Violation function for k-coloring: A basis b= (K, χ)is violated by a vertex v∈V

if either (i) vis a heavy vertex for hK, χior (ii) vis a conflict vertex for hK, χi.

Once we have described k-coloring in our framework, in order to apply Theorem 10,

we must show that the (, λ)-Distance Preserving and the Feasibility Preserving properties

hold. We begin with the proof of the following lemma that implies the (, λ)-Distance

Preserving property with λ=/3.

4 Efficient Property Testers

Lemma 4.6.4 ((, /3)-Distance Preserving property) Let G= (V, E)be a graph that

is -far from being k-colorable and let S⊆Vbe any set of properly k-colored vertices

with a proper coloring χ. Then, Vcontains more than  n/3 conflict vertices for hS, χior

Vhas more than  n/3 heavy vertices for hS, χi.

Proof : Our proof is by contradiction. Assume Gis -far from having a proper k-coloring

and there are less than  n/3 conflict vertices for hS, χiand less than  n/3 heavy vertices

for hS, χi. Then, we show that there is a k-colorable graph G∗that is obtained from Gby

removing less than  n2edges. This is a contradiction.

Let Xbe the set of all conflict vertices for hS, χi, let Ybe the set of all heavy vertices

for hS, χi, and let Zbe the set of remaining uncolored vertices (i.e., Z=V\(S∪X∪Y)).

For every i,1≤i≤k, let Vibe the set of vertices in Vsuch that for every v∈Vithe

extension of χby coloring vwith color iis not a proper coloring (cf. Definition 4.6.2).

We first construct a graph G0by removing all edges incident to the vertices in X∪Y

and extend the coloring χof Sto a k-coloring of S∪X∪Yby coloring the vertices in

X∪Yarbitrarily. Since |X∪Y|< 2  n/3, less than 2  n2/3 edges are removed from G

in this way. Furthermore, since all vertices in X∪Yare isolated, the obtained coloring χ0

is a proper k-coloring of S∪X∪Y.

Now, we modify G0to extend the coloring χ0to all vertices in Z. For each v∈Z, let

τ(v)be a color that satisfies the following two constraints:

•If we extend χ0to S∪{v}and we color vwith τ(v)then the resulting k-coloring is

proper, and

•If we extend χ0to S∪{v}and we color vwith τ(v)then the absolute increase in the

size of Vτ(v)is minimal (among all possible choices that satisfy the first constraint).

In other words, v /∈Vτ(v)and for every i,1≤i≤k, if v /∈Vithen |{w /∈Vi: (v, w)∈

E}| ≥|{w /∈Vτ(v): (v, w)∈E}|.) Since vis not a conflict vertex for hS, χi, such a color

τ(v)always exists (but is possibly not unique). By the fact that vis not a heavy vertex for

hS, χi, we know that |{w /∈Vτ(v): (v, w)∈E}| <  n/3. Therefore, if we remove from

G0for every v∈Zall edges (v, w)∈Ewith w /∈Vτ(v), then the resulting graph G∗is

obtained from Gby removal of less than  n2edges. It remains to show that the following

coloring of G∗is proper: We color each vertex v∈S∪X∪Ywith color χ0(v). Each

vertex v∈Zis colored with color τ(v). Now assume that this coloring is not proper. Then

there must be an edge (v, u)such that vand uare colored in the same color. Since χis

proper, all vertices in Xand Yare isolated, and by the definition of τ(v)it is immediate

that such an edge can only be between vertices in Z. But then we have that u /∈Vτ(v)and

this implies that (v, u)has been removed from G0. Contradiction. 2

It remains to prove the Feasibility Preserving property:

Lemma 4.6.5 (Feasibility Preserving property) If for some S⊆Vevery basis covered

by Sis violated by certain v∈Sthen the subgraph of Ginduced by vertices in Scannot

be properly k-colored.

4.7 Hypergraph Coloring

Proof : The proof is by contradiction. Let us suppose there is a proper k-coloring χof the

subgraph of Ginduced by the vertices in S. For every U⊆S, let χUdenote the coloring χ

restricted to vertex set U.

Let us observe that the set of bases covered by Sis not empty, because it contains

the “empty set” basis (∅, χ∅). Therefore, there exists a basis (possibly one of many) b=

(U, χU)covered by Shaving the maximum size of set U. Since bis violated by certain v∈

S, either vis a conflict vertex for hU, χUior vis a heavy vertex for hU, χUi. Furthermore,

since χwas assumed to be a proper k-coloring of S,vcannot be a conflict vertex for

hU, χUiand therefore it must be a heavy vertex for hU, χUi. But this implies that (U∪

{v}, χU∪{v})is a basis that is moreover covered by S. This yields a contradiction, because

we assumed that there is no basis (K, χK)covered by Shaving |K|>|U|.2

Therefore, we summarize our discussion in this section with the following theorem that

follows directly from Theorem 10 and Lemmas 4.6.4 and 4.6.5.

Theorem 11 There is a property tester for the graph k-coloring property with a query

complexity of

O(k −2ln(k/2))2=e

O(k2/4).2

4.7 Hypergraph Coloring

We now want to extend the analysis from Section 4.6 to obtain an efficient property tester

for hypergraph coloring. A hypergraph is a pair H= (V, E)with a finite vertex set V

and the edge set E⊆2V. A hypergraph His `-uniform if |e|=`for all e∈E. (Notice

that a 2-uniform hypergraph is a graph.) A k-coloring of a hypergraph His an assignment

χ:V→{1,...,k}of colors to the vertices of the hypergraph. A coloring is proper if

no edge in Eis monochromatic, that is, if for every edge e∈Ethere are v, u ∈ewith

χ(v)6=χ(u). If Hhas a proper k-coloring, then His k-colorable. The k-coloring problem

for hypergraphs is to decide whether a given hypergraph is k-colorable.

Hypergraph coloring is a well studied problem in the literature in discrete mathemat-

ics, combinatorics, and computer science. In contrast to graphs, where one can decide in

linear time if a graph is 2-colorable (or equivalently, bipartite), deciding if a given hyper-

graph is 2-colorable is NP-complete even for 3-uniform hypergraphs [64]. In [63], it is

shown that unless NP ⊆ ZPP, for every fixed `≥3, it is impossible to approximate in

polynomial time the chromatic number of `-uniform hypergraphs within a factor n1−εfor

every constant ε>0. See also [55, 60] for further inapproximability results. The property

of hypergraph 2-colorability has been also extensively studied in combinatorics (see, e.g.,

[28, 29, 39, 73]), and for example, the study of this problem led to the discovery of the

celebrated Lov´

asz Local Lemma [39].

We want to design a property tester for the k-coloring problem in `-uniform hyper-

graphs. An `-uniform hypergraph can be represented by a function f:V`→{0, 1}that

encodes its adjacency matrix. We use the standard distance measure (Definition 2.1.3) to

measure the distance between hypergraphs. In terms of hypergraphs we can express this

distance measure as follows:

4 Efficient Property Testers

Definition 4.7.1 An `-uniform hypergraph H= (V, E)is -far from having a proper k-

coloring, if we have to remove more than n`edges from Hto obtain a hypergraph that

has a proper k-coloring.

Now, we discuss how to apply our framework to hypergraph coloring. Similarly to

the graph coloring problem, we identify the ground set Cwith the set of vertices Vof

the input hypergraph H= (V, E). Since in our representation Hcan be viewed as given

by its adjacency matrix representation, we define the interpretation Ito map each set of

vertices to the submatrix induced by these vertices. That is, for every S⊆V, we have

I(S) = S×S×···×S. Clearly, the interpretation is N`-bounded. Let hS, χibe a pair with

S⊆Vand χa proper k-coloring of vertices in S.

Definition 4.7.2 We say a vertex vis i-colorable with respect to hS, χiif for every e∈E

with v∈e, either (i) there exists a vertex u∈(S∩e)with χ(u)6=ior (ii) there exists a

vertex w∈e\(S∪{v}).

We want to extend our approach for graph coloring to hypergraphs. Again we use a

recursive definition of bases that is based on heavy vertices and conflict vertices. In order

to define heavy vertices we introduce a potential function. A vertex is a heavy vertex if

its coloring increases the potential significantly (very much in the spirit of graph coloring;

further motivation behind the definition of the potential function can be found in the proof

of Lemma 4.7.7.)

Definition 4.7.3 Let H= (V, E)be a hypergraph. Let S⊆Vand let χbe a proper

k-coloring of vertices in S. The potential of hS, χiis defined as

ΦH(hS, χi) :=

i=1

`−1

j=1

nj−1·|ϕ(hS, χi, i, j)|

where

ϕ(hS, χi, i, j) = W⊆V:|W|=`−j&∃e∈E(W⊆e&∀v∈e\Wχ(v) = i).

Hence if W∈ϕ(hS, χi, i, j), then coloring all vertices in Xwith color icreates a

monochromatic edge. Our next step is to extend the notion of heavy and conflict vertices

to hypergraphs:

Definition 4.7.4 A vertex v∈V\Sis heavy with respect to hS, χiif (i) there is an i,

1≤i≤k, such that vis i-colorable and (ii) for every i,1≤i≤k, if vis i-colorable and

χ0is the extension of χto S∪{v}by coloring vwith color ithen

∆ΦH(hS, χi, v, i) := ΦH(hS∪{v}, χ0i) − ΦH(hS, χi)>n`−1

Definition 4.7.5 A vertex v∈V\Sis a conflict vertex with respect to hS, χiif for every

i,1≤i≤k,vis not i-colorable.

The bases and the violation function are defined in the same way as for graph coloring:

4.7 Hypergraph Coloring

Bases for k-coloring:

•{∅, 1}is a basis (where 1is the encoding of the coloring of the empty set of vertices)

and

•if b= (K, χ)is a basis, vis a heavy vertex for band χ0is an encoding of the previous

coloring χof Kextended by a proper coloring of v, then (K∪{v}, χ0)is a basis.

Violation function for k-coloring: A basis b= (K, χ)is violated by a vertex v∈V

if either (i) vis a heavy vertex for hK, χior (ii) vis a conflict vertex for hK, χi.

In a similar way as for the graph coloring problem we can give an upper bound for the

dimension of the constructed ACP:

Lemma 4.7.6 For every problem instance of the hypergraph k-coloring problem the cor-

responding abstract combinatorial program Phas dim(P)≤3k`/ and width(P)≤

k3k`/.

Proof : Since every k-coloring of a set of rvertices can be encoded using an integer

in the range between 1 and kr, it is enough to show that |K|≤3 k `/ for every basis

b= (K, χ). To show this, let us recall that the bases are defined inductively by adding a

heavy vertex to another basis. By definition, a heavy vertex increases the potential of the

corresponding basis by more than 1

3 n`−1. The maximum potential of every basis is less

than k ` n`−1and the starting potential is 0. Thus, it follows for every basis b= (K, χ)that

|K|≤3 k `/.2

Our next step is to prove the Distance Preserving property.

Lemma 4.7.7 ((, /3)-Distance Preserving property) Let H= (V, E)be a hypergraph

that is -far from being k-colorable and let S⊆Vbe an arbitrary set of vertices colored

according to a proper k-coloring χ. Then, either Vcontains more than  n/3 conflict

vertices with respect to hS, χior Vhas more than  n/3 heavy vertices for hS, χi.

Proof : Our arguments are similar to those used in the proof of Lemma 4.6.4. The proof

is by contradiction. Let us assume there are less than or equal to  n/3 heavy vertices

and less than or equal to  n/3 conflict vertices with respect to hS, χi. Then, we show

that it is possible to extend coloring χof Sto a coloring χ∗of Vthat has at most  n`

monochromatic (violating) edges in H. This will yield contradiction.

We define χ∗as follows:

χ∗(v) =











χ(v)for every v∈S

1if v∈V\Sand vis a heavy vertex or a conflict vertex with respect to

hS, χi

iif v∈V\Sis i-colorable with respect to hS, χiand iminimizes (over

all possible choices of proper coloring i) the increase in the

potential, that is, ∆ΦH(hS, χi, v, i)≤∆ΦH(hS, χi, v, j)for every

proper coloring jof v

4 Efficient Property Testers

Now, we give an upper bound on the number of monochromatic edges in coloring χ∗

of H. Let us first consider heavy and conflict vertices. By our assumption, the number of

such vertices is upper bounded by 2

3 n. Therefore, the number of edges incident to these

vertices is upper bounded by 2

3 n`. Hence, it is sufficient to show that there are at most

3 n`monochromatic edges in Hthat are not incident to heavy or conflict vertices. We

show this indirectly by defining a set ED⊆Eof at most 1

3 n`edges. Then we prove that

this set contains all monochromatic edges for the coloring χ∗. Let Vlight denote the set of

all vertices in V\Sthat are neither heavy nor conflict vertices for hS, χi.

For a vertex v∈Vlight let us define

∆ϕ(hS, χi, v, i, j) := ϕ(hS∪{v}, χ0i, i, j)\ϕ(hS, χi, i, j)

where χ0is the extension of χto S∪{v}by coloring vertex vwith color i. We further

denote by E(X, v) = {e∈E:v∈e&X⊆e}all edges of the hypergraph that contain

X∪{v}. Now we make a simple but important observation:

Claim 4.7.8 If E(X, v) = ∅then X /∈∆ϕ(hS, χi, v, χ∗(v), j).

Proof : Follows immediately from the definition of ∆ϕ(hS, χi, v, χ∗(v), j).2

We want to define the set

ED:= [

v∈Vlight [

j∈[`−1]

E(v)

D,j

using sets E(v)

D,j that determine for each vertex va set of edges that is responsible for the

sets in ∆ϕ(hS, χi, v, χ∗(v), j). As we will see later these edges are the only edges that may

possibly be monochromatic in χ∗. We define the set E(v)

D,j as follows:

E(v)

D,j := [

X∈∆ϕ(hS,χi,v,χ∗(v),j)

E(X, v)

Claim 4.7.9 Let e∈Ebe a monochromatic edge in the coloring hV, χ∗i. Then e∈ED.

Proof : If eis monochromatic then there is i∈[k]such that χ∗(u) = ifor all u∈

e. Further we conclude that for each u∈ewe have {u}∈ϕ(hV, χi, i, ` −1). Now

we distinguish between two cases. First let us consider the case when there is a u∈e

with {u}∈ϕ(hS, χi, i, ` −1). In this case χ∗(u)6=iby the definition of χ∗. In the

second case such a udoes not exists but we have {u}∈ϕ(hV, χi, i, ` −1). But - as

can be seen from Claim 4.7.8 - we defined EDin such a way that there cannot be an

X∈ϕ(hV, χi, i, ` −1)that is not contained in ϕ(hS, χi, i, ` −1). Hence, if eis not in ED

it cannot be monochromatic. 2

It remains to show that the size of EDis small enough. By definition we have

ED=X

v∈Vlight

`−1

j=1E(v)

D,j

4.7 Hypergraph Coloring

and it is easy to see that for the E(v)

D,j it holds

E(v)

D,j≤∆ϕ(hS, χi, v, χ∗(v), j)·nj−1

because of the fact that |E(X, v)|≤n`−(|X|+1). We conclude that we have

ED≤X

v∈Vlight

`−1

j=1∆ϕ(hS, χi, v, χ∗(v), j)·nj−1

Since all considered vertices are light we also have for every v∈Vlight

∆ΦH(hS, χi, v, χ∗(v)) =

`−1

j=1∆ϕ(hS, χi, v, χ∗(v), j)·nj−1≤1

3n`−1.

This finally gives us ED≤1

3n`.

Together with Claim 4.7.9 we get a contradiction to the fact that His -far from being

k-colorable. This proves the lemma. 2

The proof of the following lemma is essentially the same as the proof of Lemma 4.6.5

and we present it here for the sake of completeness only.

Lemma 4.7.10 (Feasibility Preserving property) If for some S⊆Vevery basis covered

by Sis violated by certain v∈S, then the sub-hypergraph of Hinduced by vertices in S

cannot be properly k-colored.

Proof : The proof is by contradiction. Let us suppose there is a proper k-coloring χof the

subgraph of Hinduced by the vertices in S. For every U⊆S, let χ|Udenote the coloring

χrestricted to vertex set U.

Let us observe that the set of bases covered by Sis not empty, because it contains

the “empty set” basis (∅, χ∅). Therefore, there exists a basis (possibly one of many) b=

(U, χ|U)covered by Shaving the maximum size of set U. Since bis violated by certain v∈

S, either vis a conflict vertex for hU, χ|Uior vis a heavy vertex for hU, χ|Ui. Further, since

χwas assumed to be a proper k-coloring of S,vcannot be a conflict vertex for hU, χUi

and therefore it must be a heavy vertex for hU, χ|Ui. But this implies that (U∪{v}, χ|U∪{v})

is a basis that is moreover covered by S. This yields a contradiction, because we assumed

that there is no basis (K, χ|K)covered by Shaving the size of Kgreater than |U|.2

The three lemmas above combined with our framework from Theorem 10 imply the

following result.

Theorem 12 There is a property tester for the hypergraph k-colorability with the query

complexity

O(k`−2ln(k/))`=e

O((k `/2)`).2

4 Efficient Property Testers

4.8 Testable Hereditary Graph Properties

In this section we consider arbitrary hereditary graph properties. A graph property Π

is a family of graphs that is preserved under graph isomorphism (that is, if Gsatisfies

property Πand G0is a graph isomorphic to Gthen G0has property Π, too). A graph

property Πis hereditary if it is closed under taking induced subgraphs, that is, if graph

G= (V, E)has Πthen every subgraph GSinduced by a set S⊆Vhas property Π(see,

e.g., [20]). Similarly as in Section 4.6, we consider undirected graphs that are represented

by a function f:V×V→{0, 1}that encodes the adjacency matrix of the graph. W.l.o.g.

we assume V= [n]. When we talk about graph properties we have to observe that in

our framework a property is defined as a subset of the set of n-vertex graphs. When no

confusion can arise we use Πto denote the graph property Πas well as the corresponding

set of n-vertex graphs (which is the formal definition of a property in this thesis). Our

main result is that a hereditary graph property is efficiently testable if and only if it can be

reduced to an abstract combinatorial program of dimension that is independent of the size

of the input graph.

For every graph G= (V, E)and every subset U⊆Vwe denote by GUthe subgraph of

Ginduced by U.

We use the standard distance measure from Definition 2.1.3 for testing graph proper-

ties:

Definition 4.8.1 Let Πbe an arbitrary graph property. A graph Gis -far from (satisfy-

ing) Πif in order to transform Ginto a graph satisfying Πone has to modify more than

 n2entries in the adjacency matrix of G.

Now, we can formally state the main result of this section.

Theorem 13 Let Πbe a hereditary graph property. Let 0<<1. Let Gbe the set of all

graphs on the vertex set V= [n]. Then, Πis efficiently testable if and only if there are

δ=δ(),ρ=ρ()and λ=λ(), such that every G∈ G can be mapped to an abstract

combinatorial program PG= ([n],B, $)with dim(PG)≤δ()and width(PG)≤ρ()

satisfying the following two properties:

((, λ)-Distance Preserving) if Gis -far from Πthen every basis in PGis λ-far from

feasible, and

(Feasibility Preserving) for every S⊆Vwith |S|≥δ(): If there is G0∈Πwith

GS=G0

Sthen there is a self-feasible basis for Sin PG.

The remaining part of this section is devoted to the proof of Theorem 13. We begin with

the following lemma that simplifies the analysis of property testers for graph properties. By

this lemma, if a hereditary graph property Πhas a property tester with a query complexity

of q()then it has property tester that samples a set Sof r(q()) vertices and accepts, if

and only if the subgraph induced by Shas Π.

4.8 Testable Hereditary Graph Properties

Lemma 4.8.2 (Alon, In Appendix E of [54]) Let Πbe a hereditary graph property.

Suppose there is a property tester for property Πwith query complexity q(). Then, there

is a property tester for property Πthat selects uniformly at random a set of r(q()) vertices

and accepts the input graph if and only if the corresponding induced subgraph satisfies

property Π.2

In order to prove Theorem 13, we must prove that our condition is both necessary and

sufficient for efficient testability of any property Π. We first observe that our proof of The-

orem 10 with the interpretation Ias defined in Section 4.6 implies directly the sufficiency

of our conditions. Now, we prove the necessity of our condition.

Lemma 4.8.3 (Necessary Condition) Let Πbe a hereditary graph property. Let 0 <  <

1. Suppose there is a property tester for property Πwith query complexity q()≤1

4nand

let us set r=r(q()). Let Gbe the set of all graphs on the vertex set V={1,...,n}.

Let λ=λ() = 1

3·2r. Then, for every G∈ G there exists an abstract combinatorial

program PG= ([n],B, $)with dim(PG)≤2r =: δand width(PG) = 1satisfying the

(, λ)-Distance Preserving and the Feasibility Preserving properties.

Proof : Let us fix an arbitrary graph G= (V, E)for which we describe ACP PGsatisfying

the required properties.

Bases for graph properties: We define the bases of PGto be of the form (K, 1)where

K⊆V. We first give a set of basis candidates BC and then show how to obtain the set of

bases from this set of candidates.

We define the set of basis candidates BC as follows:

•(∅, 1)is a basis candidate,

•if Kis a set of vertices of size rand GK(the subgraph induced by K) does not satisfies

Π, then (K, 1)is a basis candidate, and

•if (K, 1)is a basis and K0⊆K, then (K0, 1)is a basis candidate.

We also define the notion of violators and light bases:

Definition 4.8.4 Let BC be a set of basis candidates and let b= (K, 1)∈ BC. The set of

violators ViolBC(b)of bwith respect to BC is defined as follows:

•if |K|=rthen ViolBC(b) = V\K, and

•if |K|< r then ViolBC(b) = {v∈V\K:there exists a basis candidate b0=

(K∪{v}, 1)}.

Definition 4.8.5 Let BC be a set of basis candidates and let b∈ BC. We call blight (with

respect to BC) if |ViolBC(b)|≤3

2λ n. If bis not light then it is heavy.

4 Efficient Property Testers

Now, we define the set of bases Bas the output of the following algorithm COMPUTE-

BASES:







COMPUTEBASES(Set of candidates BC)

while there is a light basis b(with respect to the current BC)do

BC =BC \ {b}

return B:= BC ∪{(∅, 1)}

Violation function for graph properties: A basis b∈ Bis violated by its violators, that

is, by all vertices contained in ViolB(b).

Notice that our construction of bases and the definition of the violation function implies

that every basis b∈ B is heavy, that is, it is violated by more than 3

2λ n vertices in V.

Now, the following claim follows trivially from our construction.

Claim 4.8.6 PGhas dimension (r, 1).2

(, λ)-Distance Preserving Property: We prove now that the ACP PGdefined above

satisfies the (, λ)-Distance Preserving Property, that is, that if Gis -far from Πthen

every basis in PGis λ-far from feasible, where λ=λ() = 1

3·2r. Notice that by the

definition of bases, every basis except for (∅, 1)is violated by more than λ n ground set

elements. Therefore, what remains to be proven is that also the basis (∅, 1)is violated by

more than λ n ground set elements. Our proof of this fact is via a sequence of claims that

top-down establish lower bounds for the number of bases of given size.

For a basis b= (K, 1), let |K|be called its size. We begin with the following simple

claim about bases of size r.

Claim 4.8.7 Suppose r < (1−2

3λ)n. If Gis -far from Π, then the number of bases of

size rin Bis bigger than 2

3·n

r.

Proof : The proof follows directly from the definition of the bases and from Lemma

4.8.2. Indeed, Lemma 4.8.2 implies that for a graph that is -far from Π, if one picks at

random a set of rvertices in V, then with probability at least 2

3the subgraph induced by

these vertices does not satisfy Π. This is equivalent to say that the number of subsets of V

of size rfor which the induced subgraph does not satisfy Πis bigger than 2

3·n

r. By the

definition of the basis candidates, for every set K⊆Vof size r,(K, 1)∈ BC. Moreover,

the set V\Kis the set of violators of K. Hence (K, 1)is a basis that is violated by n−r

violators, and thus (K, 1)is heavy. Thus, (K, 1)belongs to B. Therefore, the number of

bases of size rin Bis bigger than 2

3·n

r.2

The next claim deals with the relation between the number of bases of size kand of

size k−1.

Claim 4.8.8 Suppose that n≥2 r. Let ζ,0≤ζ≤1, and let k,1≤k≤r, be an integer.

If there are more than ζ·n

kbases of size kin Bthen the number of bases in Bof size

k−1is bigger than ζ−3·λ

2·n

k−1.

4.8 Testable Hereditary Graph Properties

Proof : Recall that Bcontains only heavy bases. Furthermore, if (K, 1)is a basis then our

construction of bases ensures that for every u∈K,(K\ {u}, 1)is a basis (in the following

we refer with basis also to basis candidates) and that this basis is violated by u. Thus,

every basis of size kdefines kviolators for bases for size k−1. We conclude that overall,

there are more than

k·ζ·n

k= (n−k+1)·ζ·n

k−1>1

2·n·ζ·n

k−1

violators for bases of size k−1. Observe that every light basis has at most 3

2λ n violators.

Therefore, the number of violators of all light bases of size k−1is at most 3

2λ n n

k−1. It

follows that the number of violators of heavy bases of that size is bigger than

2·n·ζ·n

k−1−3

2·λ·n·n

k−1=1

2·(ζ−3 λ)·n·n

k−1.

Since each basis has at most nviolators it follows that the number of heavy bases is larger

than 1

2·(ζ−3 λ)·n

k−1,

which completes the proof of our claim. 2

The next claim gives a lower bound for the number of bases of given size (bigger than

0).

Claim 4.8.9 Let 1≤k≤rbe an integer. Then the number of bases of size r−kis bigger

than 1

3·2k−1−3

2λ2−1

2k−1·n

r−k.

Proof : The proof is by induction on k. For k=1the claim follows directly from Claims

4.8.7 and 4.8.8. Now, let us assume the claim is true for k=k0, for certain 1≤k0< r,

and we show that this implies that the claim is true also for k=k0+1. By induction, this

will yield the claim.

By the induction hypothesis, the number of bases in Bof size r−k0is bigger than

1

3·2k0−1−3

2λ(2−1

2k0−1)·n

r−k0

=1

3·2k−2−3

2λ2−1

2k−2·n

r−k+1.

Therefore, by Claim 4.8.8, the number of bases in Bof size r−k= (r−k0) − 1is bigger

than

1

3·2k−2−3

2λ2−1

2k−2−3 λ

2·n

r−k=1

3·2k−1−3

2λ2−1

2k−1·n

r−k

4 Efficient Property Testers

and the claim follows. 2

Now, we are ready to complete the proof of the (, λ)-Distance Preserving Property for

PG. By our discussion above, we only must show that the basis (∅, 1)is violated by more

than λ n ground set elements. By Claim 4.8.9, the number of bases in Bof size 1is bigger

than 1

3·2r−2−3·λ·n

1=4

3·2r−3·λ·n≥λ·n ,

by our assumption that λ=1

3·2r. Each of the bases of size 1is of the form ({u}, 1)for some

u∈Vand by the definition of violation, such a vertex uviolates (∅, 1). Since the number

of such vertices uis bigger than λ·n, this completes the proof of the (, λ)-Distance

Preserving Property for PG.

Feasibility Preserving Property: We prove now that the ACP PGsatisfies the Feasi-

bility Preserving Property. We prove that for every S⊆V,|S|≥δ, if the subgraph GS

satisfies Πthen there is a self-feasible basis for Sin PG. This implies the Feasibility Pre-

serving Property because Πis hereditary and so: If GSdoes not have property Πthen G

cannot have Π, as well.

Our arguments are roughly the same as in the proof of Lemma 4.6.5. The proof is

by contradiction. Let us suppose that the subgraph GSsatisfies Π, but every basis in S

is violated. Observe that the set of bases covered by Sis not empty, because it contains

the “empty set” basis (∅, 1). Therefore, there exists a basis b= (K, 1)covered by S

maximizing the size of set K. Since bis violated by certain v∈Sand since Kis of

maximum size, we conclude that |K|=r. It follows that the subgraph GKinduced by

Kdoes not have property Π. Since Πis hereditary, it is closed under taking induced

subgraphs, and hence we conclude that GSdoes not have property Π.

If n < 2r then we use the following simple ACP: The ACP has a single basis (V, 1)

which is violated by all ground set elements, if Gis -far from Πand which is not violated

otherwise. This completes the proof of Lemma 4.8.3. 2

Now, we can conclude our discussion to complete the proof of Theorem 13.

Proof : The “only if” part follows from Lemma 4.8.3 and the “if” part follows from

Theorem 10 with the interpretation Ias defined in Section 4.6. 2

5 Testing Algorithms with

Geometric Queries

In Chapter 3 we considered Property Testing in the ’standard model’ for geometric objects

such as point sets. This means that the property testing algorithm is only allowed to ask

basic queries about the given object. In the case of point sets, our algorithm was allowed

to ask queries of the form ’What is the position of the i-th point of the point set ?’.

In this section we want to introduce a new type of query for point sets: Range queries.

Range queries are of the form: ’What is the i-th point in a query range R?’ where R

is a range in the Rd(typically, an axis parallel rectangle or a simplex). Such queries are

efficiently supported by basic spatial data structures, e.g. range trees and partition trees

[1].Later in this section we show that the property of being in convex position (in the R2)

can be tested with O(logn/)triangular range queries. This stands in contrast to the lower

boundof Ω(d+1

pnd/)onthe query complexityin thestandard model fromChapter 3. We

also consider two other examples, labeling and clustering, and show that we can achieve

much better results in the new model than it would be possible in the standard model.

5.1 Property Testing with Range Queries

Many fundamental data structures for point sets are designed to efficiently answer range

queries (cf. [1]), i.e. queries that ask to return all points (or just their number) located in

a given query range. Usually, the shape of a query range is restricted to a certain class of

ranges, e.g. axis parallel rectangles or simplices. This way it is possible to answer queries

much more efficient than it would be possible for arbitrary query ranges.

In this section we introduce models of computation whose basic operations are range

queries. All models are defined for point sets in the Rd. Let Pdenote the point set under

consideration.

The Simplex Range Query Model.In the Simplex Range Query Model a Property

Testing algorithm is allowed to ask the following types of queries:

•’What is the number of points of Pcontained in a (possibly degenerated) simplex S

?’.

5 Testing Algorithms with Geometric Queries

•’What is the position of the i-th point of Pcontained in a (possibly degenerated)

simplex S?’ The ordering of the points of Pcontained in simplex Sis fixed but

unknown to the property testing algorithm. If iis bigger than the number of points

contained in Sthen the symbol ∅is returned to denote that such a point does not

exist.

•’What is the position of the i-th point of P?’ The ordering of the points of Pis fixed

but unknown. If iis bigger than |P|then the symbol ∅is returned. Such a query

could be realized using the second type of query with a sufficiently large simplex.

For convenience in notation we introduce this as a separate type of query.

Of course, the simplex Sspecified by the property testing algorithm may differ from

query to query. This model is motivated by the large number of efficient data structures

that support triangular range queries such as partition trees and cutting trees. The most

efficient data structure for simplex range counting queries requires O(nd+)space (for

any >0) and supports queries in time O(logdn)[1].

The Rectangle Range Query Model.In the Rectangle Range Query Model a prop-

erty testing algorithm is allowed to ask the following types of queries:

•What is the number of points of Pcontained in a (possibly unbounded) axis parallel

rectangle R?’

•’What is the position of the i-th point of Pcontained in a (possibly unbounded) axis

parallel rectangle R?’ The ordering of the points of Pcontained in rectangle Ris

fixed but unknown to the Property Testing algorithm. If iis bigger than the number

of points contained in Rthen the symbol ∅is returned to denote that such a point

does not exist.

•’What is the position of the i-th point of P?’ The ordering of the points of Pis fixed

but unknown. If iis bigger than |P|then the symbol ∅is returned. Similar to the

Simplex Range Query Model such a query could be realized using the second type

of query with a sufficiently large rectangle.

Orthogonal range queries are supported by many fundamental data structures such

as range trees, R-tree, and quad trees. The most efficient data structure for orthogonal

range counting queries in the RAM model supports queries in O(logd−1n)time and uses

O(nlogd−2n)space [1].

The Mixed Range Query Model. In the Mixed Range Query Model the property

testing algorithm may ask every query that is allowed in the Simplex Range Query Model

as well as every query that is allowed in the Rectangle Range Query Model.

Some further definitions. In all three models the query complexity of a property

testing algorithm is the number of queries asked about the point set. The size n=|P|of

the point set is known to the testing algorithm.

5.2 Testing Convex Position with Range Queries

In Chapter 3 we designed a property tester for convex position in the standard property

testing model. Its query complexity was O(d+1

pnd/). We also proved a matching

lower bound on the query complexity of every testing algorithm in the standard testing

model. Obviously, the simplex range query model is more powerful than the standard test-

ing model. But how much does the possibility to use simplex range queries help us when

we want to design property testing algorithms ?

We first design and analyze a property tester for convex position in the simplex range

query model. We only consider the 2-dimensional case when the point set Pis in the

R2. Again we assume that the input point set is in general position. We prove that in

the simplex range query model there is a property tester for convex position with query

complexity O(logn/). Hence it is possible to achieve an exponential improvement in

the number of queries using this new model.

Our algorithm is based on a randomized test for extremality of points. Hence, we first

have to prove that many points of a point set are not extreme, if the set is -far from convex

position. This is done in the following Proposition:

Proposition 5.2.1 Let Pbe a point set that is -far from convex position. Then more than

|P|points in Pare not extreme.

Proof : Let Pbe -far from convex position. Assume less than |P|points in Pare not

extreme. Then we can delete these points from Pto obtain a point set in convex position.

Hence the point set Pcannot be -far from convex position. This is a contradiction. 2

Our next step is to show that the simple algorithm ISFACET decides if two points are

the vertices of the same facet (segment) of the convex hull.

Figure 5.1: If two points pand qbelong to the same facet of the convex hull then one of

the halfspaces induced by a line through pand qcontains exactly 2 points.

ISFACET(p, q)

Let H+and H−denote the two halfspaces induced by the line through pand q

if H+contains exactly 2 points from Pthen accept

if H−contains exactly 2 points from Pthen accept

reject

5 Testing Algorithms with Geometric Queries

Figure 5.2: A circular ordering for all points in a halfspace induced by a line through p

and r.

We show the following lemma:

Lemma 5.2.2 Let p, q ∈Pbe two points from a point set P⊆R2. Then algorithm

IsFacet(p, q)decides whether pand qare the vertices of the same facet of the convex hull

of P. The algorithm uses 2simplex range queries.

Proof : First of all, let us observe that two points p, q ⊆Pare the vertices of the

same facet of the convex hull of P, if and only if one of the closed halfspaces induced

by a line through pand qcontains exactly the points pand q. A halfspace range query

is a degenerated triangular range query and so we are allowed to use such a query in the

simplex range query model. Hence the algorithm is correct and obviously uses at most 2

queries. 2

So far, we can only find out whether two points belong to the same facet of the convex

hull of P. This would imply that both of them are extreme points. We use this possibility

to design an algorithm that finds out whether a single point pis extreme. If pis extreme

(and if |P|> 1) then there is a point q∈Psuch that pand qare vertices of the same facet

of the convex hull of P. Thus we have to design a procedure that finds such a point q, if p

is extreme.

The idea of our procedure is simple: Given an extreme point pwe pick another point

runiformly at random from P. If pr is a facet of the convex hull of Pthen we have a

witness that pis extreme. If not we select one of the halfspaces induced by a line through

pand r. Let us call the selected halfspace H(see Figure 5.2). Then we observe that the

point pinduces a circular ordering among the points in halfspace H. The ’last’ point in

this ordering is the point qwe are looking for. We can find this point efficiently using a

randomized binary search-like procedure on the points within the halfspace.

5.2 Testing Convex Position with Range Queries

EXTREMECHECK (P, p)

Choose a point r∈Puniformly at random

if ISFACET(p, r)then return pis extreme

Let Hdenote a closed halfspace induced by the line through pand r

Let τ1denote the (degenerated) triangle H

i=1

while τicontains more than 2points and i≤100 logndo

Choose a point si∈τiuniformly at random

if si6=pand si6=rthen

Let H(i)

sdenote the halfspace induced by a line through pand si

that does not contain r

Let τi+1denote the intersection of Hand H(i)

i=i+1

if τicontains more than 2 points then return don’t know

else if ISFACET(p, si)then return pis extreme

else return pis not extreme

We first show that the algorithm is correct, if it returns an answer other than ”don’t

know”.

Lemma 5.2.3 Let pbe an extreme point in P. If algorithm EXTREMECHECK is started

with input pthen the algorithm returns either ”pis extreme” or ”don’t know”.

Proof : We first observe that pis incident to two facets of the convex hull of Psince p

is an extreme point. The segment pr is not a facet of the convex hull of P, since otherwise

the algorithm had accepted the input in the first if-statement by Lemma 5.2.2. Now let

q1and q2denote the points of pthat share a facet with pin the convex hull of P. Then

either one of the points must be in halfspace H. If τi+1contains more than 2 points then

we know that sicannot share a facet with pbecause both closed halfspaces induced by a

line through pand sicontain at least 3 points. One of the halfspaces contains p,si, and

rand the other one contains all points in τi+1. Hence sican only share a facet with p, if

τi+1contains exactly 2points. If the algorithm does not answer ”don’t know” it computes

a triangle that contains exactly 2points. As we already observed there is a point in Hthat

shares a facet with pit must be the computed point. 2

Lemma 5.2.4 Let pbe a point in Pthat is not extreme. If algorithm EXTREMECHECK is

started with input pthen the algorithm returns either ”pis not extreme” or ”don’t know”.

Proof : By Lemma 5.2.2 we know that the algorithm only return ”pis extreme” if pis a

vertex of a facet of the convex hull of P. The lemma follows since pcannot be a vertex of

a facet of the convex hull, if it is not extreme. 2

Then we show that the algorithm typically returns an answer other than ”don’t know”.

5 Testing Algorithms with Geometric Queries

Lemma 5.2.5 Let Pbe a point set of npoints in the R2. Let Xp

idenote the random variable

for the number of points in triangle τiwhen running algorithm EXTREMECHECK with

input p∈P. Let Zidenote the indicator random variable for the event that Xp

i+1≤2

3Xp

Then we have

Pr[Zi=1]≥1

Proof : Let Qidenote the set of points that are contained in triangle τi. We observe

that Pr[si=q] = 1

|Qi|for each point q∈Qi. For the sake of analysis we enumerate all

points in Qibut paccording to their circular ordering around p. Then it is immediate that

i+1≤2

3Xp

iholds, if sibelongs to the last b2

3|Qi|c−1points in this particular ordering. It

follows for |Qi|> 2 that

Pr[Zi=1]≥b2/3|Qi|c−1

|Qi|≥1

Lemma 5.2.6 Algorithm EXTREMECHECK returns the answer ”don’t know” with prob-

ability less than 1/3.

Proof : We first observe that

Pr[EXTREMECHECK returns ”don’t know”]≤PrX

1≤i≤100 logn

Zi≤log3/2 n

Now let Yibe a 0-1 random variable with PrYi=1=1/4. Then by Lemma 5.2.5 we

observe that

PrX

1≤i≤100 logn

Zi≤log3/2 n≤PrX

1≤i≤100 logn

Yi≤log3/2 n

=PrX

1≤i≤100 logn

Yi≤4logn

log(3/2)E[Yi]

≤PrX

1≤i≤100 logn

Yi≤8lognE[Yi]

≤PrX

1≤i≤100 logn

Yi≤(1−9

10)100 logn·E[Yi]

We apply a Chernoff bound [56] to the last inequality and obtain:

PrX

1≤i≤100 logn

Yi≤(1−9

10)100 logn·E[Yi]≤e−(9/10)225 logn/2 ≤e−10 logn≤1/3

Now we can come up with the testing algorithm:

5.3 Map Labeling

CONVEXTESTER-C(P, )

for i=1to 6−1do

choose a point pfrom Puniformly at random

if EXTREMECHECK(P, p)returns ”pis not extreme” then reject

Theorem 14 Algorithm CONVEXTESTER-Cis a property tester for convex position in the

plane in the simplex range query model. Its query complexity is O(logn/).

Proof : We observe that the algorithm only rejects a point set, if algorithm EXTREME-

CHECK returns that the point is not an extreme point. By Lemma 5.2.3 algorithm EX-

TREMECHECK always returns ”pis extreme” or ”don’t know” if pis extreme. Hence

every point set in convex position is accepted because every point is an extreme point.

It remains to prove that a point set Pis rejected, if Pis -far from convex position.

When a point pis chosen uniformly at random from Pthen by Proposition 5.2.1 the

probability that pis not extreme is at least . By Lemma 5.2.6 we have that the prob-

ability that EXTREMECHECK answers ”don’t know” is at most 1/3. We conclude that the

probability that a point chosen uniformly at random by EXTREMECHECK is not extreme

and EXTREMECHECK rejects it, is at least 2/3. Hence the probability that algorithm

CONVEXTESTER-Cdoes not reject a point set that is -far from convex position is at least

PrCONVEXTESTER-Caccepts≤(1−2/3)6−1≤e−4≤1/3

and hence, PrCONVEXTESTER-Crejects≥2/3

5.3 Map Labeling

We continue the investigations in the power of our new model. We now consider a basic

map labeling problem. Map labeling in general deals with problems where to place labels,

e.g. names of towns, rivers, mountains, and oceans, on a geographic map. The constraints

for such a placement of labels are fairly obvious: Labels should be close to the labeled

feature and labels should not intersect. A basic map labeling problem can be formulated

in the following way [47]:

Let Pbe a set of npoints in the R2. Decide whether it is possible to place naxis-

parallel unit squares such that

•all squares are pairwise disjoint (labels do not overlap),

•each point is a corner of exactly one square (each point is labeled), and

•each square has exactly one point on its corners (each point has a unique label).

5 Testing Algorithms with Geometric Queries

If a set Sof nsquares satisfies the conditions above, then Sis called a valid labeling

for P. The map labeling problem is known to be NP-complete and the corresponding

optimization problem (with the goal to maximize the label size) is known to have no ap-

proximation algorithm with ratio better than 2, unless P=NP [47].

Our goal is to design a property testing algorithm for the above labeling problem. We

use the standard distance measure from Definition 2.1.3. In terms of map labeling this can

be formulated as follows:

Definition 5.3.1 A set Pof npoints in the plane is -far from having a valid labeling, if

we have to delete more than  n points to obtain a set of points that has a valid labeling.

Before we consider the map labeling problem in the orthogonal range query model, we

prove a strong lower bound on the query complexity in the standard testing model. This

bound shows that we cannot hope for a property testing algorithm that has o(n1−δ)query

complexity for every constant δ>0.

Theorem 15 For every constant δ,0<δ<1, there is a positive constant such that

there is no property tester for the labeling problem with o(n1−δ)query complexity in the

standard testing model.

Proof : For a given δlet us define c=d1−δ

2δ eand let =1

2(2c+1). We observe that by the

chosen value of cit holds that δ≥1

2c+1. We construct a point set that is -far from having

a valid labeling. The set consists of n/(2c +1)copies of a set Qcof 2c +1points. Qchas

the property that it does not have a valid labeling but if we delete one point from it, then it

always has a valid labeling. A property tester with query complexity o(n1−δ)that samples

a subset of points uniformly at random cannot reject such a point set with probability 2/3.

Therefore, we first show that the existence of a property tester for a point set Pimplies

the existence of a property tester with similar query complexity that selects a sample set S

from Puniformly at random and accepts if and only if Shas a valid labeling:

Claim 5.3.2 Let Abe a property tester for the map labeling problem with query complexity

q(n, ). Then there exists a property tester A00 for the map labeling problem that sample a

set Sof q(, n)points uniformly at random and accepts if and only if the points in Shave

a valid labeling. Such a property tester is called canonical.

Proof : By Lemma 2.3.1 we know that there is a property tester A0that samples a set Sof

q(, n)points uniformly at random and decides based on Sand and its internal coin flips.

Let A00 be an algorithm that samples a set Sof q(, n)points uniformly at random and that

accepts if and only if Shas a valid labeling. We want to prove that A00 is a property tester.

Clearly, algorithm A00 accepts every point set having a valid labeling. Thus it remains to

prove that it rejects every point set that is -far from having a valid labeling. We show this

indirectly by proving the following statement: If algorithm A0rejects a sample set Sthen

so does algorithm A00. By definition a property tester has one-sided error and so A0must

accept if the sample set has a valid labeling. But if the sample Sdoes not have a valid

5.3 Map Labeling

labeling we know that A00 rejects the input. Hence, A00 always rejects, if A0rejects. Since

A0is a property tester it rejects every point set Pthat is -far from having a valid labeling

with probability at least 2/3. Thus algorithm A00 is a property tester. This proves Claim

5.3.2. 2

Now we show that there exists a point set that is -far from having a valid labeling

and that is not rejected by a canonical property tester with query complexity o(n1−δ)with

probability more than 1/3. Our construction is based on a point set Qcof size 2c +1with

the following two properties:

•Qcdoes not have a valid labeling,

•for every p∈Qcthe set Qc\ {p}has a valid labeling.

Label:

Does not have a valid labeling

Figure 5.3: The lower bound construction. If one point is missing then the set has a valid

labeling (right figure).

Claim 5.3.3 For every c≥2there exists a set Qcof 2c +1points such that the set Qc

does not have a valid labeling and such that Qc\{p}has a valid labeling for every p∈Qc.

Proof : For a given c≥2we construct Qc. W.l.o.g. we assume that two squares intersect

if and only if their interiors intersect. We place points lcand rcat position (1, 0)and (c, 1),

respectively, a set Ucof cpoints at the position (1, 1),...,(c, 1), and a set Lcof c−1points

at positions (1.5, 0.5),...,(c−0.5, 0.5). We claim that the set Qc=Uc∪Lc∪{lc, rc}

does not have a valid labeling. This is because we have only |Lc|−1label positions for the

points in Lcthat do not necessarily intersect with points from Uc∪{lc, rc}. Hence there

cannot be a valid labeling for Qc. On the other hand, for every q∈Qcthe set Qc\ {q}has

a valid labeling as one easily verifies by a case distinction (see also the figure above):

•In the first case we either have q=lcor q=rc(these cases are symmetric). Wlog.

let q=lc. Then we choose the upper left square for all points in Uc, the lower left

square for all points in Lcand the lower right square for rc.

•Now let qbe in Lcand let qhave the position (i, 0.5). We choose the upper left

square for all points in Uc, the lower right square for all points r= (j, 0.5)∈Lc

with j<iand the the lower left square for the remaining points in Lc. For the points

lcand rcwe choose the lower left and lower right square, respectively.

5 Testing Algorithms with Geometric Queries

•It remains to consider the case q∈Uc. Let us assume that qhas position (i, 1).

Then we pick the upper left square for all points r= (j, 1)∈Ucwith j<iand the

upper right square for the remaining points in Uc. For all points r= (j, 0.5)∈Lc

with j<i−1we choose the lower right square. For (i−0.5, 0.5)∈Lcwe choose

the upper right square, and for the remaining points from Lcwe choose the lower

left square. For the points lcand rcwe choose the lower left and lower right square,

respectively.

In all cases we easily verify that the labels do not intersect. Hence Claim 5.3.3 follows.

Now we can construct a set Pof npoints that consists of k=bn

2c+1ctranslated copies

W1, . . . , Wkof the set Qc(such that their possible labelings do not intersect). In order to

obtain a set with a valid labeling from Pwe must delete at least one point from each Wi,

1≤i≤k. Thus we have that Pis -far from having a valid labeling. Further a canonical

property tester can reject Ponly if the sample set Scontains at least one of the sets Wi. If

we now apply Lemma 2.4.2 with l=2c +1and p=1/2 then we obtain that

Pr∃j∈[k] : (Wj∈S)≤1/2

if the sample size is less than (n−2c)·1

2n 1

2c+1=Ω(n1−δ). Hence there is no canonical

property tester with query complexity o(n1−δ)and so there cannot be any property tester

with that query complexity. 2

We show now that if we use the computational model that allows range queries we can

design a tester with O(1/3)query complexity. It is based on the approach developed in

[36] and [58].

LABELTEST(P):

choose a sample set Sof size 4/ uniformly at random from P

for each p∈Sdo

i=0,T=∅

Let Rbe the axis parallel square with center pand side length 20d1/e

while i≤(20d1/e+2)2do

Let qbe the i-th point in the query range R

if q6=∅then T=T ∪{q}

i=i+1

if Tdoes not have a valid labeling then reject

Theorem 16 Algorithm LABELTEST is a tester for the labeling problem that has query

complexity O(1/3)and running time exp(O(1/2)).

Proof : First of all, let us observe that algorithm LABELTEST accepts every input point

set that has a valid labeling. Thus we have to show that every instance Pthat is -far

5.4 Clustering Problems

from having a valid labeling is rejected with probability at least 2/3. Let us assume that

Pis -far from having a valid labeling. Now we partition the plane into grid cells of side

length 10d1/esuch that at most n/2 points of Pare within a distance of 1or less to the

boundary of the grid cell they are contained in. Such a partition can be constructed in the

following way: We start by partitioning the plane into vertical stripes Ai={(x, y)∈R2:

2.5i ≤x < 2.5(i+1)}. For j∈[k], with k=4d1

e, let Rj=Si:imodk=jAi. Then we

have that there is m∈[k]with |Rm∩P|≤n/4. If we put the vertical lines of the grid

cells in the middle of the stripes in Rmthen at most n/4 points are within a distance of 1

to these vertical lines. A similar argument applied to horizontal lines gives us the desired

grid partition.

For the analysis we delete all points that are within a distance of 1or less to the bound-

ary of their grid cell from the point set P. This way we obtain the point set P0. Let

Xconfl ⊆P0be a minimal-size set of points such that P0\Xconfl has a valid labeling. Since

Pis -far from having a valid labeling we know that Xconfl > n/2. Now we consider a

single grid cell C. By the construction of P0we know that the labeling of the points in C

cannot intersect with labels from points outside of C.

Claim 5.3.4 If S∩Xconfl 6=∅then algorithm LABELTEST rejects.

Proof : Let xbe a point from Xconfl that is contained in Sand let Cdenote the grid cell

containing x. The while-loop either adds all points in Rto Tor it finds out that more than

(20d1/e+2)2points are contained in R. In the case that Tcontains all points in Rit must

also contain all points from C. Thus the points in Tdo not have a valid labeling. In the

case when we have more than (20d1/e+2)2points in Twe would have to fit more than

(20d1/e+2)2unit square labels on an area of size (20d1/e+2)2. Thus there cannot be

a labeling in this case, either. Hence the algorithm rejects, if S∩Xconfl 6=∅.2

By Claim 5.3.4 the theorem follows with

PrS∩Xconfl =∅≤(1−/2)4/ ≤2/3

and so PrS∩Xconfl 6=∅≥2/3 .

5.4 Clustering Problems

In this section we revisit the radius and diameter clustering problem as defined in Chapter

4. Recall that the goal of a clustering problem is to decide whether a set of npoints Pin

Rdcan be partitioned into ksubsets (called clusters)S1, . . . , Sksuch that the cost of each

cluster is at most b. There are several different ways to define the cost of a cluster. Let S

be a set of points in Rd. In the radius clustering problem the cost costR(S)of a cluster S

is twice the minimum radius of a ball containing all points of the cluster. In the diameter

5 Testing Algorithms with Geometric Queries

clustering problem the cost costD(S)of a cluster Sis the maximum distance between a

pair of points of the cluster.

The goal of our property tester is to accept all instances that admit a clustering into k

subsets of cost band to reject with high probability those instances that cannot be clustered

into ksubsets of cost (1+)b. Note that this distance measure is different from the one

used in Chapter 4.

Definition 5.4.1 A point set Pis (b, k)-clusterable for a cost measure cost(), if there is a

partition of Pinto sets S1,...Sksuch that cost(Si)≤bfor all 1≤i≤k. A point set P

is -far from being (b, k)-clusterable, if for every partition of Pinto sets S1,...Skat least

one set Sihas cost larger than (1+)b.

Let us assume, without loss of generality, that b=1and thus we want to design a

tester for the problem whether a point set Pis (1, k)-clusterable for the two cost measures

above. We partition Rdinto grid cells of side length /(3√d). For each cell containing an

input point, we choose an arbitrary input point from the cell as its representative. Then, we

compute whether the set of representatives is (1, k)-clusterable. If it is so, then we accept

it, if it is not so, then we reject it. Clearly, every set of points that is (1, k)-clusterable

is accepted by the algorithm. On the other hand, every instance that is -far from (1, k)-

clusterable will be rejected. (This approach has been introduced in [2] to obtain a (1+)-

approximation algorithm for the radius clustering problem.)

Our algorithms starts with an empty box with vertices at infinity. Then we query for

a point in this box. This point is located in a grid cell C. We will use the point as repre-

sentative for this cell. The bounding hyperplanes of Cinduce a partition of the space in

3dboxes (one of them being the grid cell). We mark all empty boxes and the grid cell.

Then we continue this process with an unmarked box. If there are only marked boxes the

algorithm terminates.

So far, our partition into grid cells works fine, only if there are many points in a single

cell. On the other hand, if no two points are in the same grid cell, the algorithm has Ω(n)

query complexity. Thus we need an upper bound on the number of representatives that are

(1, k)-clusterable. Similar to Chapter 4 we use a volume argument to show that we can

stop the process when we find a set of representatives of size k·(6√d/)d.

Lemma 5.4.2 Let Sbe a set of points in Rdno two points of which belonging to the same

cell of a grid of size /(3√d)< 1. If cost(S)≤1for any of the two cost measures

described above, then |S|≤(6√d/)d, where cost(·)∈{costR, costD}.

Proof : We first observe that if cost(S)≤1then Sis contained in a hypercube with

side length 1. We show that a no set Sof size more than (6√d/)dthat satisfies the

precondition of Lemma 5.4.2 is contained in a hypercube with side length 1. A hypercube

with side length 1 can contain at most (1

/(3√d)+2)d≤(6√d/)dgrid cells. This proves

Lemma 5.4.2. 2

Let V=k·(6√d/)dthe maximum number of cells that can contain points that

belong to one of the kclusters. We observe that we can stop our procedure if the number

5.4 Clustering Problems

of representatives is more than V. Thus, we can guarantee that the algorithm requires at

most V·3drange queries.

Theorem 17 There is an property tester for the radius clustering and diameter clustering

problem that uses at most k·(18 √d/)dorthogonal range queries.

Proof : We first describe the algorithm:

CLUSTERTEST(P, k, b, ):

B= infinite box

S=∅

Initialize empty queue Q

insert Binto Q

while Qis not empty do

B=Q.head()

if Bcontains a point pthen

find the grid cell Cthat contains p

S=S∪{p}

split Binto 3dsub-boxes using the bounding hyperplanes of C

insert all sub-boxes (except for C) into Q

if there are more than Vcells with representatives then

reject

if Sis (k, b)-clusterable then accept

else reject

We charge to each point p∈Sthe 3d−1range queries that have to be done to check

the 3d−1new boxes (other than the grid cell containing p) that are created when pis

selected to be the representative for a grid cell. Thus the overall number of range queries

is at most V·3d=k·(18 √d/)d.

Now we have to prove that our algorithm is a property tester. Clearly, if Pis (k, b)-

clusterable then so is every subset of Pand so the algorithm accepts. If the algorithm

rejects because there are too many cells with representatives then by Lemma 5.4.2 the

input points set is not (k, b)-clusterable. Thus it remains to show that the algorithm rejects

every point set that is -far from (k, b)-clusterable.

Let us assume that Pis -far from (k, b)-clusterable. Then for every partition into sets

C1, . . . , Ckthere is a set Ciwith cost(Ci)>(1+)b. By definition of the diameter

clustering problem this means that there are two points p, q ∈Cisuch that dist(x, y)>

(1+)b. Since the diagonal of a grid cell is at most b/3 the distance between the

representatives of the cells of pand qis at least (1+/3)b. Thus the set Scannot be

clustered and the algorithm rejects.

Similar arguments show that every -far instance of the radius clustering problem is

rejected. 2

5 Testing Algorithms with Geometric Queries

6 Property Testing and Moving Data

In this chapter we apply the concept of property testing in the context of moving data. We

develop a theoretical model for the analysis of approximate combinatorial structures under

a very general type of motion. Then we illustrate our model on some examples. Among

these examples are range trees and the Euclidean minimum spanning tree of objects mov-

ing in the R2. We consider the following general scenario: We are given a set Oof n

objects and these objects are moving in the Rd. We assume that the objects are moved by

a third party. We have no information about the future movement of the objects. The only

thing we are allowed to do is to update the current position of an object at certain points of

time and at unit cost per update. Such an update is also called an update query. Our goal

is to approximately maintain a combinatorial structure defined uniquely by the positions

of these objects. At certain points of time there are queries to the combinatorial structure

we maintain. Before we answer such a query we may update some (possibly all) object

positions and perform some additional computations. After these updates we require that

our structure (which we call hypothesis) is close to correct with probability at least 2/3.

Then the query is answered. The combinatorial structure we maintain together with the

update scheme we use is called a soft kinetic data structure. The quality of a soft kinetic

data structure is measured by a form of competitive analysis. For every possible (non de-

generate) continuous motion of the objects we compare the number of updates made by

our algorithm against the dynamics of the motion. The dynamics is measured in terms

of combinatorial changes that occur if the combinatorial structure is correctly maintained

for the complete motion. The worst case ratio (over all continuous motions) between the

number of updates and the dynamics is called the dynamic efficiency of a soft kinetic data

structure.

Our strategy to design a soft kinetic data structure is the following: We first design a

property tester for the combinatorial structure we want to maintain. If this property tester

rejects a hypothesis it should provide a witness that this hypothesis is not correct. Then we

try to find a strategy that fixes such a witnessed error by applying some local changes in

the structure.

Each time before we have to answer a query we run the property tester. If the property

tester rejects, it provides a witness. The witnessed error is then corrected and we invoke

the property tester again. In order to show that this process terminates we have to prove

that every error correction brings the combinatorial structure closer to the correct structure.

In order to analyze the dynamic efficiency of such a soft kinetic data structure we have to

relate the number of corrections we do to the number of combinatorial changes induced by

the underlying motion.

6 Property Testing and Moving Data

Our model is partly inspired by the framework of kinetic data structures by Basch et

al. [12]. In their framework Basch et al. make different assumptions on the motion of the

objects than we do. In particular, they assume that there is a flight plan that describes the

near future motion of every object. This flight plan is known to the algorithm and changes

in the flight plan are passed to the algorithm. The information provided by the flight plan

results in a completely different model of computation. The analysis of a kinetic data

structure is in terms of combinatorial changes of the maintained structured and inspired

our way to analyze the dynamics.

6.1 Soft Kinetic Data Structures

In this section we describe the framework of soft kinetic data structures.

Moving Objects. We first introduce some terminology used to describe objects and

their motion. We are given a set O={O1, . . . , On}of nobjects (e.g., points, balls, cubes)

that move in the Rd. The motion of the objects is described by a motion plan. A motion

plan contains the position pi(t)of each object Oi∈ O at every point of time t≥0. The

entity of all positions of objects P(t) = {p1(t), . . . , pn(t)}is called the configuration of

Oat time t. With this definition a motion plan is a function that assigns a configuration to

each point of time t∈R,t≥0. We assume that the motion plan is fixed but unknown to

what we later define as a soft kinetic data structure. The way we can access the positions

of our objects is similar to property testing: If we fix a point of time tthen the current

configuration of the objects can be seen as a point set. We may then update the position of

every point by specifying its index. This is similar to our representation of point sets used

in the context of property testing. We may update the position of an object certain points

of time.

We say that a motion plan is continuous if for each t, δ ∈R, t ≥0, there exists an ε0

such that |pi(t) − pi(t+ε)|≤δholds for each i, 1 ≤i≤n, and every ε≤ε0.

Hypotheses. In our model each of the nobjects is represented by a unique object

identifier which is a number from [n]. At every point of time tthe soft kinetic data struc-

ture may update the current position pi(t)of object Oiusing the object identifier i. We

introduce object identifier to syntactically distinguish between objects and their current

position. That is, we may, for example, compute a combinatorial structure defined over

the objects without knowing their current positions. In our framework we are interested in

combinatorial structures that are uniquely defined by the objects’ positions. When the set

of objects is a point set then some examples for such structures could be sorted sequences

in 1D, Voronoi diagrams, or the Euclidean minimum spanning tree. We use the fact that

we can build such a structure over a given set of objects even if we do not know the current

object configuration. In such a case we make a guess about the position of each object

and then we compute the structure. These guesses are usually based on the past config-

urations of the point set and on additional information obtained by update queries. The

computed structure may or may not be consistent with the current object configuration.

6.1 Soft Kinetic Data Structures

Therefore, we call such a structure a hypothesis. Given a distance measure σa hypothesis

His called -close to correct if the distance σ(H, H∗)to the correct hypothesis is at most

. Otherwise, we say that His -far from correct.

Now we want to illustrate our definitions on the example of a point set in 1D and the

combinatorial structure ’sorted sequence’. When we consider point sets then at every fixed

point of time tthe current configuration P(t)is a point set represented essentially in the

same way as in the context of property testing in the previous chapters. In particular, we

have that the object identifiers are the numbers from [n]and the current position of each

point can be accessed by an update query for the value of pi(t). Since we consider a point

set in one dimension we have a total ordering defined by the positions of the points. This

total ordering corresponds to a permutation of [n]which in turn can be viewed as a sorted

sequence. Now assume we do not know the positions of the points at a fixed point of time

t. Then we can still make some guesses (e.g. based on the past configurations we know)

about the positions of the points and compute a permutation πof [n]. Such a permutation

is a hypothesis for the sorted sequence defined by the configuration P(t).

The Objective of Soft Kinetic Data Structures. The objective of a soft kinetic

data structure is to maintain a hypothesis for a predetermined combinatorial structure un-

der an unknown motion plan. At certain points of time the soft kinetic data structure has

to answer a query about the hypothesis it maintains. Such a query is usually the standard

query supported by the combinatorial structure (typically, some kind of access or search

query). In our example of the sorted sequence a query may ask for the i-th item in the

sorted sequence. Each time before a query is processed, an algorithm called the reorga-

nizer is invoked. This algorithm may perform update queries about the current position

of some objects (we assume that the time does not proceed as long as the reorganizer is

working) and perform changes in the hypothesis. Technically, a reorganizer consists of a

property tester for the combinatorial structure and a procedure that corrects all errors in the

hypothesis that are detected by the property tester. After the reorganizer finished its work

the hypothesis should be -close to correct with probability at least 2/3. Then the query

algorithm is invoked and answers the query.

We assume that the soft kinetic data structure has to process a sequence of mchrono-

logical queries hQi= (Q1, . . . , Qm). Query Qitakes place at time tiand we associate

with each query the point of time when it takes place.

Soft Kinetic Data Structures. Now we can say that a soft kinetic data structure con-

sists of

•ahypothesis for a predetermined combinatorial structure of a set of npoints,

•areorganizer that is invoked each time before a query to the hypothesis has to be

processed,

•aquery algorithm that processes the queries to a hypothesis.

6 Property Testing and Moving Data

6.1.1 Analysis of Soft Kinetic Data Structures

In this section we explain how to evaluate the quality of a soft kinetic data structure. Our

concept is to apply competitive analysis. That is, for every sequence of queries and every

continuous motion plan Mwe compare the number of update queries made by our soft

kinetic data structure against the number of combinatorial changes that occur in a correct

hypothesis when the points move according to M.

Let Mdenote an arbitrary motion plan describing a continuous motion and let CS

denote a combinatorial structure uniquely defined by a configuration of O. The dynamics

dynCS(M)of Mw.r.t. CS is defined as the number of combinatorial changes in CS when

we move the objects according to Mfrom time 0to ∞. The term combinatorial change

depends on the problem at hand. For example, when the objects are points moving in the

Rand the combinatorial structure CS is a sorted sequence then a combinatorial change

takes places at each point of time twhen the ordering in the sorted sequence changes.

When the objects are points and the structure is a Euclidean minimum spanning tree then

a combinatorial change takes place when the graph structure changes. We always assume

that the motion plan is non degenerate, that is, at a fixed point of time there can be at most

one combinatorial change in the structure.

Now let us considered a given soft kinetic data structure DS. For a given sequence

hQiof mqueries Q1, . . . , Qmtaking place at time t1, . . . , tiand a motion plan Mwe

denote the update cost cost(hQi,M)of DS to be the number of update queries done

by the reorganizer when the objects move according to M. The dynamic efficiency of a

soft kinetic data structure is roughly the worst case ratio between the update cost and the

dynamics.

Definition 6.1.1 The dynamic efficiency deff of a soft kinetic data structure is defined as

follows:

deff := sup

hQi,M

costhQi,M

dynCSM+m

where the supremum is taking over all non-degenerate motion plans Mdescribing a con-

tinuous motion and all possible sequences of mqueries hQi.

A second quality measure for soft kinetic data structures is the update time. The up-

date time is the worst case ratio between overall running time of the reorganizer and the

number of update queries. We denote by time(hQi,M)the overall running time used by

the reorganizer for sequence hQiunder a motion plan M.

Definition 6.1.2 The update time of a soft kinetic data structure is defined as

tu:= sup

hQi,M

timehQi,M

costhQi,M+m

where the supremum is taking over all non-degenerate motion plans Mdescribing a con-

tinuous motion and all possible sequences of mqueries hQi.

100

6.2 Basic Soft Kinetic Data Structures

6.1.2 Discussion of The Model

There are a few points in our model that require a further discussion. First of all, let us

make some notes on the assumption that the points do not move while the reorganizer

works. We made this assumption because we have no other restrictions on the motion. As

soon as we allow the points to move during this process in a theoretical model there is no

real chance to make reasonable statements about the quality of the structure. In an im-

plementation the reorganizer is a continuous process typically running in the background.

When a query arrives we stop the reorganizer and answer the query. Typically, the changes

that occur during the time when the query is processed are neglectable.

A second point that requires discussion is the distance measure needed to evaluate the

quality of a hypothesis. Unlike in the case of property testing we want to use our soft

kinetic data structure to answer queries. Therefore, a distance measure should reflect the

’correctness of a hypothesis w.r.t. the query algorithm’ rather than the pure combinatorial

distance to a correct combinatorial structure. For example, if we consider soft kinetic

search trees then a query could be an access operation to an item stored in the tree. In such

a case a single error at the root of the tree can make almost every access operation fail.

Although the combinatorial distance to a correct structure would be rather small in such

a case, a hypothesis for a balanced search tree having such an error should be -far from

correct. In general, we want to have a distance measure that guarantees that, typically, an

access operation works correctly.

6.2 Basic Soft Kinetic Data Structures

In this section we describe how to apply our framework to some combinatorial structures

like sorted sequences, balanced search trees, range trees and Euclidean minimum spanning

trees.

6.2.1 Sorted Sequences

The first example for a soft kinetic data structure we consider is that of a sorted sequence.

W.l.o.g., we assume that our set of objects Ois a set of npoints moving in R. We assume

that the total order induced by the positions of the objects at a fixed point of time is always

unique. That is, we never have two objects with the same position. We can achieve this

by using an arbitrary tie-breaker (e.g., the object number). The object identifiers of O

are stored in an array A[1...n]. When we consider continuous motion the topological

structure of a sorted sequence changes when the positions v1, v2of two points change from

v1< v2to v1> v2. We say that a permutation A(t) = (A[1], . . . , A[n]) of [n]stored in

Aat time tis a hypothesis that is -close to correct, if the sequence pA[1](t), . . . , pA[n](t)

has edit distance (see Definition 6.2.1 for an equivalent definition) at most n to a sorted

sequence. We use the following definition:

Definition 6.2.1 Let pA[1](t), . . . , pA[n](t)denote a sequence stored in an array A[1...n]

at time t. We say that pA[1](t), . . . , pA[n](t)is -close to sorted, if the longest increasing

101

6 Property Testing and Moving Data

subsequence in pA[1](t), . . . , pA[n](t)has length at least (1−)n. Otherwise, we say that

pA[1](t), . . . , pA[n](t)is -far from sorted.

Now we are given a sequence of maccess queries hQi= (Q1, . . . , Qm)to our array

and we assume that query Qitakes place at time ti. Before the query is processed the

reorganizer is invoked. This algorithm checks whether the current hypothesis is -close to

correct. In order to do so, it invokes a property tester SORTEDTEST for sorted sequences

from [40]. This property tester checks, if a sequence of number pA[1](t), . . . , pA[n](t)is a

sorted sequence or -far from sorted (where the distance is measured according to the edit

distance).

If this algorithm rejects it returns two indices k, l with k<land pA[k](t)> pA[l](t).

We then use a binary search like procedure to find an index k0such that pA[k0](t)>

pA[k0+1](t)and swap the object identifier stored in A[k0]and A[k0+1]. Then we make

two recursive calls to the reorganizer:

ARRAYREORGANIZER(A, t, )

if SORTEDTESTpA[1](t), . . . , pA[n](t), rejects then

Let (k, l)be the pair returned by SORTEDTEST pA[1](t), . . . , pA[n](t), 

k0=FINDINVERSION(A, k, l)

swap(A[k0], A[k0+1])

ARRAYREORGANIZER(A, t, )

From the pair of indices (k, l)returned by the property tester SORTEDTEST from [40]

algorithm FINDINVERSION can compute a pair (k0, k0+1)with pA[k0](t)> pA[k0+1](t)

by a binary search like procedure in the following way: As long k6=l−1we compute

m=dk+l

2eand proceed either with (k, m)or with (m, l)depending on the value of

pA[m](t).

We now want to prove that algorithm ARRAYREORGANIZER computes a hypothesis

that is -close to correct.

Lemma 6.2.2 Let A(t) = A[1], . . . , A[n]be -far from a correct hypothesis for a

sorted sequence. Then algorithm ARRAYREORGANIZER computes with probability at

least 2/3 a hypothesis that is -close to correct.

Proof : We start with a simple claim about the edit distance:

Claim 6.2.3 Let k0denote an index such that pA[k0](t)> pA[k0+1](t). If we swap A[k0]

and A[k0+1]then the edit distance of pA[1](t), . . . , pA[n](t)to a sorted sequence does not

increase.

Proof : pA[1](t), . . . , pA[n](t)has edit distance k, if and only if the longest increasing

subsequence of pA[1](t), . . . , pA[n](t)has length n−k. The length of the longest increasing

subsequence cannot decrease when we swap A[k0]and A[k0+1]and hence the edit distance

cannot increase. 2

102

6.2 Basic Soft Kinetic Data Structures

Now we want to analyze the probability that ARRAYREORGANIZER stops when the

hypothesis is -far from correct. We assume that the probability that SORTEDTEST accepts

a sequence that is -far from sorted is less than 1/10. If the probability is larger then we can

use standard amplification arguments to get the desired probability. Now we observe that

for the case that ARRAYREORGANIZER terminates when the input is -far from correct

there is a witness in form of a binary tree. Each node of this tree corresponds to a call

of ARRAYREORGANIZER that rejects the current hypothesis and it has a left (right) child

if the first (second) recursive call rejects the hypothesis. For example, if the first call to

ARRAYREORGANIZER accepts the hypothesis, then the corresponding tree is the empty

tree and it appears with probability 1

10 . It is well known that the number of binary trees

with knodes is less than 4k.

Now let us fix a binary tree Twith knodes and analyze the probability that the be-

havior of the algorithm is according to the witness T. We observe that the probability that

ARRAYREORGANIZER behaves according to Tis at most (1

10 )k+1because each binary tree

with knodes has k+1empty leafs and each empty leaf corresponds to a call to the reor-

ganizer that accepted a sequence that is -far from sorted. The overall probability that the

process stops before the hypothesis is -close to correct is bounded from the above by

0≤i≤∞

4i(1

10)i+1≤1

10 X

0≤i≤∞

5)i≤1

This proves Lemma 6.2.2. 2

Let Mdenote an arbitrary non-degenerate continuous motion plan. In the following we

want to make some observations w.r.t. the combinatorial changes induced by the motion

plan. First let us observe that any combinatorial change in the sorted sequence can be

written as a reversal σhi, i +1ifor some i∈[n−1](unless we have a degenerate case

wheremore than 2 points areat thesame position). Such a reversal is called atransposition.

Now let hQi= (Q1, . . . , Qm)denote a sequence of mqueries and let us denote by tithe

time when query Qitakes place and t0=0. Now we derive a lower bound on the number

of combinatorial changes that can be expressed in terms of the configurations of the point

set at times ti,1≤i≤m. For this purpose let us introduce some notation. For a number

πiin a permutation π= (π1, . . . , πn)of [n]let us define the rank(πi)as the number of

elements πjwith πj< πiand j>i. Then we define the weight w(π)of a permutation π

as Pi∈[n]rank(πi). It is well known that we can write every permutation πas a sequence

of w(π)transpositions and that every sequence of transpositions equal to πhas length at

least w(π).

We further observe that at every point of time tthere is a unique permutation π=

(π1, . . . , πn)such that pπ1(t)<··· < pπn(t). Thus we can write πi= (πi

1, . . . , πi

n)to

denote the permutation that satisfies pπi

1(ti)<··· < pπi

n(ti), that is, πidenotes the sorted

sequence at the time when the i-th query takes place. Then we can define permutations

σ1,...σmby the equation:

πi·σi=πi+1.

103

6 Property Testing and Moving Data

Now we can give a lower bound on the dynamics dyn(M)of the motion plan Mw.r.t. the

combinatorial structure sorted sequence:

dyn(M)≥X

1≤i≤m

w(σi).

We use this lower bound to show that the number of errors corrected by the reorganizer is

at most the dynamics of the motion plan M.

Lemma 6.2.4 Let hQi= (Q1, . . . , Qm)be an arbitrary sequence of mqueries and let M

be an arbitrary non-degenerate continuous motion plan. If query Qitakes place at time ti

and the reorganizer corrects err(i)errors at time ti, then we have:

1≤i≤m

err(i)≤dyn(M)

where dyn(M)denotes the dynamics of the motion plan Mw.r.t. the combinatorial struc-

ture sorted sequence.

Proof : For a given point of time tilet A(ti) = (A[1], . . . , A[n]) denote the permutation

defined by the elements of array Aat time ti. Further let us define the permutation σC(ti)

in the following way:

A(ti)·σC(ti) = πi.

Hence the permutation σC(ti)denotes the changes that must be performed to transform

the array at time tiinto a sorted sequence. The weight w(σC(ti)) denotes the minimum

number of transpositions necessary to transform the array into a sorted sequence. We prove

by induction on the number of queries that for k≤mthe following inequality holds:

1≤i≤k

err(i) + w(σC(tk)) ≤X

1≤i≤k

w(σi).

Since every permutation can be expressed as a sequence of transpositions we know that

we can apply the triangle inequality to the weight of permutations. This way, we observe

that the number of errors corrected in the (k+1)-st call of the reorganizer is at most

err(k+1)≤w(σC(tk)·σk+1)− w(σC(tk+1)) ≤w(σC(tk))+ w(σk+1)− w(σC(tk+1)) .

And hence err(k+1) − w(σC(tk)) + w(σC(tk+1)) ≤w(σk+1).

This proves the induction hypothesis for every k≤mand we get

1≤i≤m

err(i)≤X

1≤i≤m

err(i) + w(σC(tm)) ≤X

1≤i≤m

w(σi)≤dyn(M)

which completes the proof of Lemma 6.2.4. 2

Now we are ready to prove:

104

6.2 Basic Soft Kinetic Data Structures

Theorem 18 There is a soft kinetic sorted array with dynamic efficiency O(logn

)and up-

date time O(1). It answers queries about the i-th object in the soft kinetic sorted array in

time O(1).

Proof : By Lemma 6.2.2 we know that our reorganizer satisfies the requirements of a soft

kinetic data structure, that is, with probability at least 2/3 the sorted sequence maintained

in the array is -close to correct (after the reorganizer has been invoked). Thus we only

have to analyze the dynamic efficiency and update time of the data structure. The property

tester SORTEDTEST makes Θ(logn/)(update) queries and has the same running time.

The procedure FINDINVERSION makes O(logn)update queries and its running time is

proportional to the number of performed update queries. For a sequence of queries hQi=

(Q1, . . . , Qm)the number of calls to the reorganizer is m+’twice the number of errors

corrected by the reorganizer’. Thus, together with Lemma 6.2.4 we get that the dynamic

efficiency of the data structure is O(logn/). The update time and the query time of O(1)

are immediate because for each update we do only a constant amount of work and the

query algorithm just has to return one entry in the array. 2

6.2.2 Balanced Search Trees

The next structure we want to look at is a binary search tree. A (balanced) binary search

tree is a rooted tree. At each node vof the tree a key KEY(v)is stored. For every node vin

the tree and every node uin the left (right, respectively) subtree of vit holds that KEY(v)≥

KEY(u)(KEY(v)<KEY(u)). In the case of soft kinetic data structures each node stores

an object identifier. The key of a node is the position of the corresponding object (w.l.o.g.,

we assume that the objects are points). Again we assume that every configuration of points

induces a unique ordering among the objects (that is, the case ’=’ does not occur). We want

to consider standard access (search) queries in binary trees. At first glance the problem

seems to be similar to the case of soft kinetic arrays. The combinatorial structure we use

for the analysis is (essentially) a sorted sequence. Thus we have a combinatorial change in

the search tree, if there is a combinatorial change in the sorted sequence. The difference

lies in the distance function used for soft kinetic balanced search trees. In the case of

binary trees the edit distance to a sorted sequence is not a good distance measure w.r.t. the

functionality of the trees. If only a single object is not at its correct position in the sorted

sequence and this object is stored at the root of the tree then it might be the case that a huge

fraction of access operations fails. Therefore, we consider a different distance measure:

Definition 6.2.5 A search tree is -far from correct, if more than  n access operations

fail.

With this new distance measure we also need a new property tester for the invariant

of the search tree. But this is simple for balanced search trees with the distance measure

from above. We store all nodes of the search tree in an array such that we can pick a

node uniformly at random in constant time. Then for s≥2/ the following algorithm

is a property tester for the invariant of a search tree Tw.r.t. the distance measure from

Definition 6.2.5:

105

6 Property Testing and Moving Data

TREETEST(T, s, )

for i=1to s

pick a node vof the search tree uniformly at random

if node vcannot be accessed reject

The test whether a node can be accessed can be implemented by walking from node v

to the root and checking the invariant at each node of the search path. We show that for

s≥2/ algorithm TREETEST is a property tester.

Lemma 6.2.6 For s≥2/ algorithm TREETEST is a property tester for correctness of

balanced binary search trees. Its query complexity and running time is O(logn/).

Proof : Clearly, the algorithm accepts every correct search tree. Thus we assume that the

tree is -far from correct. Then the probability that we find a node that cannot be accessed

is more than in each pass of the for-loop. Hence the overall probability that the algorithm

reject is

PrAlgorithm TREETEST rejects≥1− (1−)2/ ≥1−e−2≥2/3 .

The query complexity and running time follow immediately from the fact that the search

tree is balanced. 2

If an object in the search tree cannot be accessed there must be an error on the search

path (the path from the object’s node to the root). We conclude that algorithm TREETEST

rejects, only if it finds two objects whose current position is inconsistent with the tree

order. Thus we are basically in the same situation as in the case of the sorted arrays. We

therefore maintain an array A[1...n]where the items are stored in the ordering induced

by the search tree (this can be the same array that is used for the sampling). Then we

can proceed similarly to the sorted arrays. We use a binary search to find two adjacent

array entries whose ordering is inconsistent with the corresponding object positions. Then

we swap these entries in the array and we swap the entries in the corresponding nodes in

the tree (these nodes must not be adjacent in the tree). Unfortunately, the change of the

distance function makes the proof for the sorted arrays invalid for the search trees. This is

because of the fact that reducing errors in the tree order by swapping two objects (in the

way just explained) does not necessarily decrease the distance to a correct data structure

in terms of the distance measure from Definition 6.2.5. In fact, in situations it can even

increase this distance. We must apply a different technique. We therefore change the

sample size of algorithm TREETEST in such a way that it accepts an input that is -far

from correct with probability at most 1

3n2:

Lemma 6.2.7 Let s≥2ln(3n)/. Then the probability that algorithm TREETEST ac-

cepts an input that is -far from correct is at most 1

3n2.

Proof : For the case s≥2ln(3n)/ we get:

PrAlgorithm TREETEST accepts≤(1−)2ln(3n)/ ≤1

3n2.

106

6.2 Basic Soft Kinetic Data Structures

Then we use the following reorganizer:

TREEREORGANIZER(T, t, )

if TREETEST(T, 2 ln(3n)/, )rejects then

Let (k, l)be the pair returned by TREETEST(T, 2 ln(3n)/, )

k0=FINDINVERSION(A, k, l)

swap(A[k0], A[k0+1])

swap the corresponding tree nodes

TREEREORGANIZER(T, t, )

Now we prove that the probability that the reorganizer terminates when the tree is -far

from correct is at most 1/3.

Lemma 6.2.8 The probability that TREEREORGANIZER stops when the hypothesis Tis

-far from correct is at most 1/3.

Proof : Let Adenote the array in which the object identifiers are stored with respect

to the tree order. Let A(t) = (A[1], . . . , A[n]) denote the permutation of [n]induced

by array Aat time t. Further let π= (π1, . . . , πn)denote the permutation such that

pπ1(t)<··· < pπn(t). Let

A(t)·σ=π .

Then we have that w(σ)≤n2−3by the definition of the weight of a permutation. We

further observe that each call of the reorganizer decreases the weight of σby 1. In the case

that Tis -far from correct we know that the reorganizer has to reject at most n2times. Let

Xdenote the indicator random variable for the event that algorithm REORGANIZER does

not stop when Tis -far from correct. Then we have:

PrX=1≥(1−1

3n2)n2−3≥1

√e≥2/3

since

PrAlgorithm TREETEST accepts≤1

3n2.

Lemma 6.2.4 holds also for trees because it does only assume that errors are corrected

by swapping two adjacent objects in the maintained sorted sequence (which is equivalent

to the tree order and is maintained in array A). Therefore we can conclude:

Theorem 19 There isa soft kinetic balanced searchtree with dynamicefficiencyO(log2n/)

and update time O(1). It answers access queries in time O(logn).

107

6 Property Testing and Moving Data

Proof : By Lemma 6.2.8 we know that our reorganizer satisfies the requirements of a

soft kinetic data structure. The number of update queries made by the property tester is

again proportional to its running time of O(log2n/). For a sequence of queries hQi=

(Q1, . . . , Qm)the number of calls to the reorganizer is m+’the number of errors corrected

by the reorganizer’. Together with Lemma 6.2.4 we get that the dynamic efficiency of the

data structure is O(log2n/). The update time of O(1)is immediate. The query algorithm

needs time O(logn)for an access query because the tree is balanced. 2

6.2.3 Range Trees

We now want to focus on structures from computational geometry. We first consider range

trees in one and two dimensions. Then we design soft kinetic Euclidean minimum span-

ning trees.

1D Range Trees. 1D range trees are balanced binary search trees (see [1] for a formal

definition of range trees). The only difference to a standard search tree lies in the type of

query. A query to a range tree specifies a query range [x, y]and the answer should return

all points within the interval [x, y]. The standard way to answer such queries is to search

for xand yin the range tree and then to report all points on the search paths that are within

the query range as well as all points between the two search paths. Obviously, we can

apply the analysis of soft kinetic search trees to range trees. So the only question is what

time is needed to answer a query. A query to a soft kinetic range tree works in a similar

way as a standard query to a range tree. But in soft kinetic range trees it may be the case

that a point whose position is not within the query interval is stored between the two search

paths (because there is an error in the structure). In such a case we do not report this point.

For each point between the two search paths that is not contained in the query interval

we can find an error in the data structure. This way we can amortize the time needed to

process those points against the dynamics of the system.

We can use soft kinetic data structures for binary search trees to obtain the following

result (see [1] for a formal definition of range trees):

Theorem 20 There is a soft kinetic 1D-range tree with dynamic efficiency O(log2n/)

and update time O(1). The soft kinetic range trees supports 1D-range queries in O(logn+

k)time, where kis the number of reported points (a point is reported, only if it is within

the query range). If k∗denotes the number of points in the query range (at the time the

query is processed) then it holds with probability at least 2/3 that k≥k∗− n.

Proof : We have to show that k≥k∗− n holds with probability 2/3 and we have

to prove the running time for the range queries. The other results follow from the binary

search trees. First of all, we can assume that the data structure is -close to correct with

probability 2/3 when the query is processed. This follows from Lemma 6.2.8 because the

reorganizer is called before a query is answered.

A range query for a one dimensional range tree is answered in the following standard

way: we search for the right and the left end of the query interval and output all nodes

108

6.2 Basic Soft Kinetic Data Structures

that are between the search paths. Before we report a point we check if it is contained

inside the query interval. If it is, we output it. If it is not we have found a conflict with

some node on the search paths. After all nodes inside the query interval have been reported

we partition the set of points Perr whose position is not in the query interval but that are

located between the two search paths into two sets. The first set P<x contains all points

of Perr whose position is smaller than x. The second set P>y contains all points of Perr

whose position is larger than y. We pick the larger set of these two and correct an error for

each point in this set. W.l.o.g. let us assume that P<x is the larger set. In the following we

identify the point with the node where it is stored. We observe that there are two different

orders among the points. The standard ’<’-relation and the tree order ≺T. We observe that

each point pin P<x is either stored in a right subtree of a node of the left search path or

in a left subtree of a node of the right search path. These are the two cases we consider.

Let us start with the case when pis stored in a right subtree of a point qon the left search

path. Then we observe that qmust be larger than x(because the search path took the left

edge). Therefore, we have p < q but q≺Tp. This is a conflict. In the second case, the

point pis stored in a left subtree of a point on the right search path. Then we know that the

point qstored at the split node (the node where the two search paths split) lies within the

search interval [x, y]. Since pis stored in a left subtree of the right path it must be larger

than q. But this is not the case. Hence there is a conflict because q≺Tp. For each point

we use the standard procedure to find two points that are adjacent in the tree order and we

swap them. Since we process the points in increasing order none of the conflicts of the

remaining points is resolved (though the ’conflict partner of a point may be moved in the

tree order).

The number of inversions corrected in this process is at least half the number of points

in the search interval that are not reported. Therefore we can amortize the update queries

and the time needed to deal with these points against the dynamics of the motion.

Now assume that less than k∗− n points are reported. Then there are more than n

nodes that cannot be accessed, because every access operation to one of the missing nodes

will end either on the search paths or between them. But the nodes that can be accessed

this way are the nodes reported by the query algorithm. Thus we have that the tree is -far

which occurs with probability at most 1/3.2

2D-Range Trees. In this section we describe and analyze 2-dimensional soft kinetic

range trees (see, e.g., [1] for a formal definition). We consider a standard implementation

of 2D-range trees. A 2D-range tree for a set of npoints in R2is a two level data structure.

The first level consists of a binary search tree (a 1D-range tree) sorted according to the

x-coordinate. The second level consists of ntrees (one for each node in the first level tree)

sorted according to the y-coordinates of the points. (see, e.g., [1] for more details).

Similar to the binary search trees we want to have a data structure that supports range

queries in a reasonable way. That is, we want to make sure that, if a structure is -close to

correct, it cannot happen that almost every range query fails.

We say that a point is accessible in a 2D range tree, if it can be accessed in the first

level tree using its x-coordinate as key and it can be accessed in all second level trees on

109

6 Property Testing and Moving Data

the search path using its y-coordinate.

Definition 6.2.9 A 2D range tree is -far from correct, if there are more than n points

that are not accessible.

Similar to the binary search trees we can design a simple property tester. We sample a set

of O(1/)nodes and check whether they are accessible by following the parents pointers

in the trees. We also use two arrays as auxiliary data structures. Array AXis used to define

the order of objects in the first level of the range tree and array AYdefines the order in

the second level. We maintain the invariant that at any time the order of the objects in

both levels is according to the order of the corresponding array. A combinatorial change

in a range tree takes places when the there is a change in the sorted sequence w.r.t. the

x-coordinate or in the sorted sequence w.r.t. the y-coordinate.

When the property tester rejects, it provides a proof that one of these arrays is not in

the correct order and we can find an adjacent pair that is in the wrong order (using the

same procedure as in the case of arrays and trees). Then we swap these two objects in the

corresponding array and we adjust the range tree to the changes we made. The adjustment

in the second level is done by deleting and reinserting the objects. This can be done in

O(log2n)time.

If we swap two objects in the first level tree, we delete all occurrences in the second

level trees. For this purpose we maintain pointers to all occurrences of an object in the

secondleveltrees. Then we swapthetwonodes inthe first level tree andreinsert theobjects

into the second level trees (using the ordering maintained in array AYfor comparisons).

This procedure can be done in O(log2n)time.

Now we can proceed in a similar way as for the search trees. We first show that the

following algorithm is a property tester for correctness of range trees that accepts a range

tree with probability at most 1

6n2if it is -far from correct.

RANGETREETEST(T, s, )

for i=1to s

pick a node vof the search tree uniformly at random

if node vis not accessible then reject

Lemma 6.2.10 Let s≥2ln(6n)/. Then algorithm RANGETREETEST is a property

tester. If an input tree Tis -far from correct then it is accepted with probability at most

6n2.

Proof : Since algorithm RANGETREETEST accepts every correct tree we only have to

analyse the probability of rejectance. Let Tbe a range tree that is -far from correct. Then

we have

Pralgorithm RANGETREETEST accepts≤(1−)2ln(6n)/ ≤1

6n2

110

6.2 Basic Soft Kinetic Data Structures

We already explained the way the errors are corrected by the reorganizer. The remain-

ing part of the reorganizer is similar to the one for search trees.









RANGETREEREORGANIZER(T, t, )

if RANGETREETEST(T, 2 ln(3n)/, )rejects then

correct the detected error

RANGETREEREORGANIZER(T, t, )

Lemma 6.2.11 The probability that RANGETREEREORGANIZER stops when the hypoth-

esis Tis -far from correct is at most 1/3.

Proof : The distance between the hypothesis and a correct range tree is at most two times

the distance in the one dimensional case (because we have two sorted sequences instead

of one). In the case that Tis -far from correct we know that the reorganizer has to reject

at most 2(n2−3)times. Let Xdenote the indicator random variable for the event that

algorithm REORGANIZER does not stop when Tis -far from correct. Then we have:

PrX=1≥(1−1

6n2)2(n2−3)≥1

√e≥2/3

since

PrAlgorithm RANGETREETEST accepts≤1

6n2

Checking whether a point is accessible takes O(log2n)time. We conclude:

Theorem 21 There is a soft kinetic 2D-range trees with dynamic efficiency O(log3n/)

and update time O(1). The soft kinetic range tree supports 2D-range queries in O(log2n+

k)time where kis the number of reported points in the given query range. If k∗denotes

the number of points in the given query range then it holds with probability 2/3 than

k≥k∗−n.

Proof : The analysis of the running time of the queries is similar to the 1D case. And

again, if there is a query that returns less than k∗− n points, then the structure must be

-far which is a contradiction. 2

6.2.4 Euclidean Minimum Spanning Trees

The last structure we consider is a Euclidean minimum spanning tree of a point set in the

R2. In order to design a soft kinetic Euclidean minimum spanning tree we use the property

tester and the distance measure from Section 3.3. Recall that this property tester only

rejects, if it finds a cycle in the EMST-completion of the input graph. The longest edge

of such a cycle cannot be in the EMST of the underlying point set. We use the following

reorganizer:

111

6 Property Testing and Moving Data

EMSTREORGANIZER(T, t, )

if EMSTTESTER (T, )rejects then

Let edenote the edge that is not in the EMST

mark eas deleted

increase counter for deleted edges by 1

if n/200 edges have been deleted then

recompute the EMST from scratch (query all point positions)

set counter to 0

EMSTREORGANIZER(T, t, )

Here EMSTTESTER(T, )denotes the property tester for EMSTs in degree bounded

graphs from Section 3.3 with the exception that we do not test the connectivity of the

tree. In the algorithm EMSTTESTER marked edges will be treated as missing edges. Since

the number of marked edges is at most n/200 our graph is always /200-close to con-

nected. By the analysis from Section 3.3 it follows that algorithm EMSTTESTER rejects

the hypothesis with probability at least 2/3, if it is -far from EMST.

Theorem 22 There is a soft kinetic Euclidean minimum spanning tree with dynamic effi-

ciency O(pn/ ·log(n/)) and update time O(logn/). Queries about incident edges

to a vertex vcan be answered in time O(1).

Proof : We first observe that the number of calls to the reorganizer is at most m+

’number of detected errors’. If we found n/200 errors then we correct all of them at

once by recomputing the EMST from the scratch. This requires nupdate queries and

O(nlogn)time. We observe that for each of the n/200 errors there must be a change

in the combinatorial structure of the Euclidean minimum spanning tree. Therefore, we

can amortize the number of update queries and the running time over the n/200 errors.

We obtain that the dynamic efficiency is dominated by the property tester and therefore

O(pn/ ·log(n/)). The update time is dominated by the time needed to recompute the

EMST and therefore O(logn/).2

112

7 Conclusions

In this thesis we studied geometric problems in the context of property testing and we

applied concepts from geometry to property testing. In Chapter 3 we designed property

testing algorithms for specific problems including disjointness of geometric objects, con-

vex position, and the Euclidean minimum spanning tree. Our algorithms have sublinear

query complexity and for some of them we give matching lower bounds. We also believe

that every property tester for the Euclidean minimum spanning tree in 2D has a query

complexity of Ω(√n)though we do not have a proof of that.

We think that property testing is an interesting concept, i.e., if applied to geometric

problems. We hope that further research in this area will provide new insights into the

structure of geometric properties. Besides the development of new property testing algo-

rithms it could be also an interesting problem to use existing property testers in the de-

sign of sublinear algorithms for geometric problems. For example, in the recent sublinear

approximation algorithm for the weight of a minimum spanning tree in degree bounded

graphs by Chazelle, Rubinfeld, and Trevisan [25] a property tester for connectivity in

graphs has been used as a subroutine.

In Chapter 4 we designed a framework for property testing algorithms with one-sided

error. The achievement of our framework is a nice decoupling of the probabilistic and

the combinatorial aspects of property testing. As a consequence of this, a proof using our

framework is usually elegant and gives a clear view of the important combinatorial features

of the problem. We applied our framework to analyze a couple of different property testers

and we showed that a large class of graph properties can be tested efficiently, if and only

if there is a proof using our framework. There are several open problems regarding our

framework and we want to mention two of them here:

The first problem is to extend the framework to algorithms with 2-sided error. The

question is here, if it is possible to formulate a framework for a 2-sided error model and if

such a model is useful for the design and analysis of property testing algorithms. Another

interesting problem is to prove in our framework that triangle-free graphs can be tested

efficiently. Although it is known that triangle-free graphs can be tested efficiently, the only

known proof uses Szemer´

edi’s Regularity Lemma. A proof using other techniques might

provide new insights into property testing and the structure of dense graphs.

Chapter 5 and 6 were devoted to the design of new models that involve property test-

ing. In Chapter 5 we introduced the possibility of range queries to property testing and

showed that the query complexity for some problems can be dramatically improved, if

range queries are allowed. Very recently, in a related model a sublinear time approxima-

tion algorithm for the weight of the Euclidean minimum spanning tree of a point set in

113

7 Conclusions

the Rdhas been developed [27]. Is it possible to find other sublinear time approximation

algorithms in such a model ?

In Chapter 6 we designed a model (called soft kinetic data structures) to analyze the

maintainance of combinatorial structures under a very general model of motion. We ana-

lyzed some basic soft kinetic structures. Besides the obvious problem to design other soft

kinetic data structures it would be very interesting to further develop models for moving

data. Keeping the results from Chapter 5 in mind an interesting direction of research is to

find out which queries should be supported by mobile objects in order to design efficient

algorithms for mobile data.

114

Bibliography

[1] P. Agarwal. Range searching. Handbook of Discrete and Computational Geometry,

1997.

[2] P. Agarwal and C. Procopiuc. Exact and approximation algorithms for clustering.

In Proceedings of the 9th Annual ACM-SIAM Symposium on Discrete Algorithms

(SODA), pages 658–667, 1998.

[3] N. Alon. Testing subgraphs in large graphs. In Proccedings of the 42nd Annual IEEE

Symposium on Foundations of Computer Science (FOCS), pages 434 – 441, 2001.

[4] N. Alon, S. Dar, M. Parnas, and D. Ron. Testing of clustering. In Proceedings of the

41st Annual IEEE Symposium on Foundations of Computer Science (FOCS), pages

240–250, 2000.

[5] N. Alon, E. Fischer, M. Krivelevich, and M. Szegedy. Efficient testing of large

graphs. Combinatorica, 20(4):451 – 476, 2000.

[6] N. Alon and M. Krivelevich. Testing k-colorability. SIAM Journal on Discrete Math-

ematics, 15:211 – 227, 2002.

[7] N. Alon, M. Krivelevich, I. Newman, and M. Szegedy. Regular languages are testable

with a constant number of queries. SIAM Journal on Computing, 30(6):1842 – 1862,

2001.

[8] N. Alon and A. Shapira. Testing satisfiability. In Proceedings of the 13th Annual

ACM-SIAM Symposium on Discrete Algorithms (SODA), 2002.

[9] G. Ausiello, P. Crescenzi, G. Gambosi, V. Kann, A. Marchetti-Spaccamela, and

M. Protasi. Complexity and approximation. combinatorial optimization problems

and their approximability properties. Springer, New York, NY, 1998.

[10] V. Bafna and P. A. Pevzner. Genome rearrangements and sorting by reversals. SIAM

Journal on Computing, 25(2):272 – 289, 1996.

[11] Z. Bar-Yossef, R. Kumar, and D. Sivakumar. Sampling algorithms: Lower bounds

and applications. In Proceedings of the 33th Annual ACM Symposium on Theory of

Computing (STOC), pages 266–275, 2001.

115

Bibliography

[12] J. Basch, L. Guibas, and J. Hershberger. Data structures for mobile data. Journal of

Algorithms, 31(1):1 – 28, 1999.

[13] T. Batu, E. Fischer, L. Fortnow, R. Kumar, R. Rubinfeld, and P. White. Testing

random variables for independence and identity. In Proceedings of the 42nd Annual

IEEE Symposium on Foundations of Computer Science (FOCS), pages 442 – 451,

2001.

[14] T. Batu, L. Fortnow, R. Rubinfeld, W. Smith, and P. White. Testing closeness of

distributions. In Proceedings of the 41st Annual IEEE Symposium on Foundations of

Computer Science (FOCS), pages 259 – 269, 2001.

[15] M. A. Bender and D. Ron. Testing acyclicity of directed graphs in sublinear time. In

Proceedings of the Annual Interrnational Colloquium on Automata, Languages and

Programming (ICALP), pages 809–820, 2000.

[16] J. L. Bentley and T. A. Ottmann. Algorithms for reporting and counting geometric

intersections. IEEE Trans. Comput., C-28:643 – 647, 1979.

[17] P. Berman, S. Hannenhall, and M. Karpinski. 1.375-approximation algorithm for

sorting by reversals. Electronic Colloquium on Computational Complexity, Technical

Report Series, 8(47), 2001.

[18] P. Berman and M. Karpinski. On some tighter inapproximability results, further

improvements. In Proceedings of the 26th Annual Interrnational Colloquium on

Automata, Languages and Programming (ICALP), pages 200 – 209, 1999.

[19] A. Bogdanov, K. Obata, and L. Trevisan. A lower bound for testing 3-colorability

in bounded-degree graphs. In Proceedings of the 43th Annual IEEE Symposium on

Foundations of Computer Science (FOCS), pages 93–102, 2002.

[20] B. Bollob´

as. Hereditary properties of graphs: Asymptotic enumeration, global struc-

ture, and colouring. Documenta Mathematica, III:333 – 342, 1998.

[21] H. Buhrman, L. Fortnow, I. Newman, and H. R¨

ohrig. Quantum property testing.

In Proceedings of the 14th Annual ACM-SIAM Symposium on Discrete Algorithms

(SODA), 2003.

[22] A. Caprara. Sorting permutations by reversals and Eulerian cycle decompositions.

SIAM Journal on Discrete Mathematics, 12(1):91 – 110, 1999.

[23] C. Carath´

eodory. ¨

Uber den Variabilit¨

atsbereich der Fourierschen Konstanten von

positiven harmonischen Funktionen. Rendiconto del Circolo Matematico di Palermo,

32:193–217, 1911.

[24] T. M. Chan. Output-sensitive results on convex hulls, extreme points, and related

problems. Discrete & Computational Geometry, 16(3):369–387, 1996.

116

Bibliography

[25] B. Chazelle, R. Rubinfeld, and L. Trevisan. Approximating the minimum spanning

tree weight in sublinear time. In Proceedings of the 28th Annual Interrnational Collo-

quium on Automata, Languages and Programming (ICALP), pages 190 – 200, 2001.

[26] D. Cohen-Or, Y. Chrysanthou, C. T. Silva, and F. Durand. A survey of visibility

for walkthrough applications. to appear in: IEEE Transactions on Visualization &

Computer Graphics.

[27] A. Czumaj, F. Erg¨

un, L. Fortnow, A. Magen, I. Newman, R. Rubinfeld, and C. Sohler.

Sublinear approximation of Euclidean minimum spanning tree. In Proceedings of the

14th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 813–

822, 2003.

[28] A. Czumaj and C. Scheideler. An algorithmic approach to the general Lov´

asz local

lemma with applications to scheduling and satisfiability problems. In Proceedings of

the 32th Annual ACM Symposium on Theory of Computing (STOC), pages 38 – 47,

2000.

[29] A. Czumaj and C. Scheideler. Coloring non-uniform hypergraphs: A new algorithmic

approach to the general Lov´

asz Local Lemma. Random Structures and Algorithms,

17(3 - 4):213–237, 2000.

[30] A. Czumaj and C. Sohler. Property testing with geometric queries. In Proceedings of

the 9th Annual European Symposium on Algorithms (ESA), pages 266 – 277, 2001.

[31] A. Czumaj and C. Sohler. Soft kinetic data structures. In Proceedings of the 12th

Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 865 – 872,

2001.

[32] A. Czumaj and C. Sohler. Testing hypergraph coloring. In Proceedings of the

28th Annual Interrnational Colloquium on Automata, Languages and Programming

(ICALP), pages 493 – 505, 2001.

[33] A. Czumaj and C. Sohler. Abstract combinatorial programs and efficient property

testers. In Proceedings of the 43rd Annual IEEE Symposium on Foundations of Com-

puter Science (FOCS), pages 83–92, 2002.

[34] A. Czumaj, C. Sohler, and M. Ziegler. Property testing in computational geometry.

In Proceedings of the 8th Annual European Symposium on Algorithms (ESA), pages

155 – 166, 2000.

[35] M. de Berg, M. van Kreveld, M. Overmars, and O. Schwarzkopf. Computational

Geometry: Algorithms and Applications. Springer-Verlag Berlin Heidelberg, 1997.

[36] S. Doddi, M. Marathe, A. Mirzaian, B. Moret, and B. Zhu. Map labeling and its gen-

eralizations. In Proceedings of the 8th Annual ACM-SIAM Symposium on Discrete

Algorithms (SODA), pages 148 – 157, 1997.

117

Bibliography

[37] Y. Dodis, O. Goldreich, E. Lehman, S. Raskhodnikova, D. Ron, and A. Samorod-

nitsky. Improved testing algorithms for monotonicity. In Proceedings of the 3rd In-

ternation Workshop on Randomization and Approximation Techniques in Computer

Science, pages 97 – 108, 1999.

[38] L. Engebretsen and J. Holmerin. Clique is hard to approximate within n1−o(1). In

Proceedings of the 27th Annual Interrnational Colloquium on Automata, Languages

and Programming (ICALP), pages 2 – 12, 2000.

[39] P. Erd˝

os and L. Lov´

asz. Problems and results on 3-chromatic hypergraphs and some

related questions. Infinite and Finite Sets (to Paul Erd˝

os on his 60th birthday), II:609

– 627, 1975.

[40] F. Erg¨

un, S. Kannan, S.R. Kumar, R. Rubinfeld, andM. Viswanathan. Spot-checkers.

J. Comput. Syst. Sci., 60:717–751, 2000.

[41] T. Feder and D. H. Greene. Optimal algorithms for approximate clustering. In Pro-

ceedingsof the 20th Annual ACM Symposium on Theory ofComputing (STOC), pages

434 – 444, 1998.

[42] E. Fischer. The art of uniformed decisions. a primer to property testing. Bulletin of

the European Association for Theoretical Computer Science, 75:97 – 126, 2001.

[43] E. Fischer, G. Kindler, D. Ron, S. Safra, and A. Samorodnitsky. Testing juntas.

In Proceedings of the 43th Annual IEEE Symposium on Foundations of Computer

Science (FOCS), pages 103–112, 2002.

[44] E. Fischer, E. Lehman, I. Newman, S. Rashkodnikova, R. Rubinfeld, and

A. Samorodnitsky. Monotonicity testing over general poset domains. In Proceed-

ings of the 34th Annual ACM Symposium on Theory of Computing (STOC), 2002.

[45] E. Fischer and I. Newman. Testing of matrix properties. In Proceedings of the 33rd

Annual ACM Symposium on Theory of Computing (STOC), pages 286 – 295, 2001.

[46] E. Fischer and I. Newman. Functions that have read-twice constant width branching

programs are not necessarily testable. In Proceedings of the 17th Conference on

Computational Complexity, 2002.

[47] M. Formann and F. Wagner. A packing problem with applications to lettering of

maps. In Proceedings of the 7th Annual ACM Symposium on Computational Geom-

etry (SoCG), pages 281 – 288, 1991.

[48] M. R. Garey and D. S. Johnson. Computers and intractability: A guide to the theory

of NP-completeness. Freeman, New York, NY, 1979.

[49] O. Goldreich. Combinatorial property testing (a survey). In Proceedings of DIMACS

Workshop in Randomization Methods in Algorithm Design, pages 45 – 59, 1997.

118

Bibliography

[50] O. Goldreich, S. Goldwasser, E. Lehman, D. Ron, and A. Samorodnitsky. Testing

monotonicity. Combinatorica, 20(3):301 – 337, 2000.

[51] O. Goldreich, S. Goldwasser, and D. Ron. Property testing and its connection to

learning and approximation. Journal of the ACM, 45(4):653 – 750, 1998.

[52] O. Goldreich and D. Ron. Property testing in bounded degree graphs. In Proceedings

of the Annual ACM Symposium on Theory of Computing (STOC), pages 406–415,

1997.

[53] O. Goldreich and D. Ron. A sublinear bipartiteness tester for bounded degree graphs.

Combinatorica, 19(3):335–373, 1999.

[54] O. Goldreich and L. Trevisan. Three theorems regarding testing graph properties.

In Proceedings of the 42nd Annual IEEE Symposium on Foundations of Computer

Science (FOCS), pages 460–469, 2001.

[55] V. Guruswami, J. H˚

astad, and M. Sudan. Hardness of approximate hypergraph color-

ing. In Proceedings of the 41st Annual IEEE Symposium on Foundations of Computer

Science (FOCS), pages 149 – 158, 2000.

[56] T. Hagerub and C. Rub. A guided tour to chernoff bounds. Information Processing

Letters, 33:305 – 308, 1989.

[57] D. S. Hochbaum. Approximation algorithms for NP-hard problems. PWS Publish-

ing Company, Boston, MA, 1996.

[58] H. H. III, M. Marathe, V. Radhakrishnan, D. R. S. Ravi, and R. Stears. Nc-

approximation for np- and pspace-hard problems for geometric graphs. Journal of

Algorithms, 26(2):238 – 274, 1998.

[59] J. D. Kececioglu and D. Sankoff. Exact and approximation algorithms for sorting by

reversals, with application to genome rearrangement. Algorithmica, 13(1/2):180 –

210, 1995.

[60] S. Khot. Hardness results for approximate hypergraph coloring. In Proceedings of

the 34th Annual ACM Symposium on Theory of Computing (STOC), 2002.

[61] D. Kirkpatrick and J. Radke. A framework for computational morphology. Compu-

tational Geometry, pages 217 – 248, 1985.

[62] Y. Kohayakawa, B. Nagle, and V. R¨

odl. Efficient testing of hypergraphs. In Pro-

ceedings of the Annual Interrnational Colloquium on Automata, Languages and Pro-

gramming (ICALP), pages 1017–1028, 2002.

[63] M. Krivelevich and B. Sudakov. Approximate coloring of uniform hypergraphs. In

Proceedings of the 6th Annual European Symposium on Algorithms (ESA), pages 477

– 489, 1998.

119

Bibliography

[64] L. Lov´

asz. Coverings and colorings of hypergraphs. In Proceedings of the 4th South-

eastern Conference on Combinatorics, Graph Theory and Computing, pages 3 – 12,

1973.

[65] R. Motwani and P. Raghavan. Randomized algorithms. Cambridge University Press,

New York, NY, 1995.

[66] I. Newman. Testing of function that have small width branching programs. In Pro-

ceedings of the 41st Annual IEEE Symposium on Foundations of Computer Science

(FOCS), pages 251 – 258, 2000.

[67] M. Parnas and D. Ron. Testing the diameter of graphs. In Proceedings of the

RANDOM-APPROX’99, pages 85–96, 1999.

[68] M. Parnas and D. Ron. Testing metric properties. In Proceedings of the 33rd Annual

ACM Symposium on Theory of Computing (STOC), pages 276 – 285, 2001.

[69] M. Parnas, D. Ron, and R. Rubinfeld. Testing parenthesis languages. In Proceedings

of the 5th International Workshop on Randomization and Approximation Techniques

in Computer Science, pages 261 – 272, 2001.

[70] M. Parnas, D. Ron, and A. Samorodnitsky. Proclaiming dictators and juntas or testing

boolean formulae. In Proceedings of the 5th International Workshop on Randomiza-

tion and Approximation Techniques in Computer Science, pages 273 – 284, 2001.

[71] P. A. Pevzner. Computational Molecular Biology. MIT Press, 2000.

[72] P. A. Pevzner and M. S. Waterman. Open combinatorial problems in computational

molecular biology. In Proceedings of the 3rd Israel Symposium on Theory of Com-

puting and Systems, pages 158 – 173, 1995.

[73] J. Radhakrishnan and A. Srinivasan. Improved bounds and algorithms for hypergraph

two-coloring. In Proceedings of the 39th Annual IEEE Symposium on Foundations

of Computer Science (FOCS), pages 684 – 693, 1998.

[74] D. Ron. Property testing. In Handbook of Randomized Algorithms. Kluwer Academic

Publishers, 2001.

[75] R. Rubinfeld and M. Sudan. Robust characterization of polynomials with applica-

tions to program testing. SIAM Journal on Computing, 25(2):252 – 271, 1996.

[76] M. Sharir and E. Welzl. A combinatorial bound for linear programming and related

problems. In Proceedings of the 9th Annual Symposium on Theoretical Aspects of

Computer Science (STACS), pages 569 – 579, 1992.

[77] R. E. Tarjan. Data structures and network algorithms. In CBMS-NSF Regional Con-

ference Series in Applied Mathematics, volume 44. Society for Industrial and Applied

Mathematics (SIAM), 1983.

120

Bibliography

[78] E. Welzl. Smallest enclosing disks(balls and ellipsoids. In H. Maurer, editor, New

Results and New Trends in Computer Science, number 555 in LNCS. Springer, 1991.

121