Document [original]

Visual Data Mining

of Graph-Based Data

Oliver Niggemann

From the Department of Mathematics and Computer Science

of the University of Paderborn, Germany

The Accepted Dissertation of Oliver Niggemann

In Order to Obtain the Academic Degree of Dr. rer. nat.

Contents

I Concepts of Visual Data Mining 7

1 FundamentalMethodology 9

1.1 Structure of this Work . ....................... 9

1.2 The AI Visualization Paradigm ................... 10

1.3 Structure-Based Visualization . ................... 12

1.4 Visualizing Graphs . . . ....................... 13

1.5 Visualizing Tabular Data....................... 15

1.6 Existing Methods for Visual Data Mining . . . . . ........ 16

1.7 Evaluation . .............................. 18

2 Structure Identification 19

2.1 Existing Methods . . . . ....................... 21

2.1.1 Agglomerative Approaches . . ............... 21

2.1.2 Divisive Approaches . . ................... 24

2.1.3 Optimization Approaches . . . ............... 25

2.1.4 Self-Organizing Approaches . ............... 26

2.1.5 Other Approaches . . . ................... 27

2.2 Natural Clustering . . . ....................... 28

2.2.1 MajorClust . . . . ....................... 29

2.2.2



-Maximization . ....................... 35

2.3 Adaptive Clustering . . ....................... 38

2.4 Evaluation . .............................. 39

3 Structure Classification 43

3.1 Direct Classification . . . ....................... 44

3.1.1 Learning the optimal Layout Method . . . ........ 44

3.2 Case-Based Classification . . . ................... 45

3.3 Evaluation . .............................. 46

4 Structure Envisioning 49

4.1 Existing Methods . . . . ....................... 49

4.1.1 Force-Directed Drawing ................... 50

4.1.2 Multidimensional Scaling . . . ............... 52

4.1.3 Level-Oriented Drawing ................... 52

4.1.4 Other Methods . ....................... 54

4.2 Factor-Based Drawing . ....................... 55

4.3 Combining Clustering and Graph Drawing . . . . ........ 59

4.4 Evaluation . .............................. 63

CONTENTS

5 Creating Graphs from Tabular Data 65

5.1 Similarity Functions . ........................ 68

5.2 Learning Similarity Functions . . . ................. 70

5.2.1 Existing Methods....................... 71

5.2.2 New Methods . ........................ 71

5.3 Evaluation . . . ............................ 75

II Implementing Visual Data Mining 77

6 Visualizing Models given as Graphs 81

6.1 Network Traffic ............................ 81

6.1.1 Using STRUCTUREMINER .................. 84

6.1.2 Administration Tasks . . . ................. 86

6.1.3 Simulation . . . ........................ 92

6.1.4 Discussion . . . ........................ 97

6.2 Configuration Knowledge-Bases . ................. 99

6.2.1 Identifying Technical Subsystems ............. 101

6.2.2 Classifying Clusters according to the Optimal Layout Al-

gorithm ............................ 102

6.2.3 Discussion . . . ........................ 103

6.3 Protein Interaction Graphs . . . . . ................. 103

6.3.1 Discussion . . . ........................ 105

6.4 Fluidic Circuits ............................ 107

6.4.1 Domain Clustering . . . . . ................. 107

6.4.2 Discussion . . . ........................ 109

7 Visualizing Models given as Tables 113

7.1 Object Visualization . . ........................ 113

7.1.1 Document Visualization . . ................. 113

7.2 Visualization of Features . . . . . . ................. 121

8 Summary 127

A Machine Learning 131

A.1 Learning a Classification Function ................. 132

A.2 Regression . . . ............................ 133

A.3 Neural Networks . . . ........................ 134

B Some Graph-Theoretic Definitions 137

Acknowledgments

This work has profited very much by suggestions from colleagues and by co-

operations with other researchers. I especially wish to thank Professor Dr. H.

Kleine Büning, my thesis advisor, for supporting me and my work. Further-

more, I thank Professor Dr. F. Meyer auf der Heide for evaluating this thesis.

I owe Dr. Benno Stein many thanks for setting the standard for this work

and for reminding me that there’s no such thing as a free lunch. The discus-

sions with Dr. TheoLettmann,SvenMeyerzuEissen, AndrèSchulz, andallmy

other colleagues and their constant support were very important to me during

the last years.

Michael Lappe from the Structural Genomics Group of the European Bioin-

formatics Institute in Cambridge (UK) provided me with many interesting

problems from the field of biology and also helped me to solve some of them. I

also thank Jens Tölle from the University of Bonn (Germany) for answering all

those questions about computer networks.

Finally, I would like to express my deepest gratitude to my wife Petra. She

made this work possible.

CONTENTS

Part I

Concepts of Visual Data

Mining

Chapter 1

Fundamental Methodology

The field of data visualization has reached a point of inflection. The rapidly

growing amount of data does not anymore allow for a presentation of all given

data items. Furthermore, the complexity of many datasets surpasses the user’s

ability to identify the gist or the underlying concepts. One solution to these

problems is to analyze the given data and confront the user with an abstract

view of the information.

The following examples may help to illustrate the breadth and complexity

of current datasets:

“For instance, Wal-Mart, the chain of over 2000 retail stores, every day uploads

20 million point-of-sale transactions to an AT&T massively parallel system with 483

processors running a centralized database. At corporate headquarters, they want to

know trends down to the last Q-Tip.” (in [63])

Visualizing trafficin computer networks is one of six applications described

in this work. Currently several million IPs (computer addresses) are used in

the internet (from [118]). Even if not all of these addresses have to be visual-

ized at the same time, most realistic scenarios still comprise several thousand

computers which interact in highly complex ways.

SGI describes customers for its data visualization software package from

the telecommunication market as follows: “According to the Technology Research

Institute, 26 percent of the carriers it interviewed had warehouses of 500GB or greater,

with that percentage projected to increase to 63 percent in 1997. Terabyte, and multi-

terabyte-sized telecommunications databases are not unusual.” (in [171])

Another typical domain is the visualization of protein interactions within

the framework of the human genome project (see section 6.3): E.g. the SCOP

hierarchy for the classification of proteins comprises 11410entries (from [150]).

Again, the interactions between proteins can be very complex.

In this chapter, a methodology is presented which combines data abstrac-

tion techniques and visualization methods.

1.1 Structure of this Work

This work is structured as follows: Two main parts exist. Part I introduces the

visualization concepts which are applied to six domains in part II.

CHAPTER 1. FUNDAMENTAL METHODOLOGY

This chapter describes the ideas discussed in this work. Section 1.3 intro-

duces the general methodology for visualizing complex data according to the

AI-visualization paradigm. Section 1.4 formally defines the to-be-visualized

graphs. In section 1.5, the visualization approach is extended for tabular data

sets. A short overview of existing methods for visual data mining is given in

section 1.6.

Each of the next four chapters explains a single step of the new visual-

ization methodology. These chapters end with a section called “Evaluation”

which verifies whether or not that step complies with the four features given

above. After all steps are explained comprehensively, section 4.3 elaborates on

the combination of those steps, i.e. the visualization methodology is discussed

in detail.

Chapter 6 describes applications where data is given directly as a graph.

Tabular data sets are the subject of chapter 7. Each application closes with a

discussion of the results. A summary is given in chapter 8.

While the first part of this work gives only theoretical evaluations of the

described algorithms, the second part assesses the usefulness of the methods

by applying them to real-world problems.

1.2 The AI Visualization Paradigm

The field of Artificial Intelligence (AI) tries to solve complex problems by an-

alyzing human problem solving approaches. This work applies the general

AI-philosophy to the visualization of complex data: Humans understand in-

formation by forming a mental model which captures only the gist of the in-

formation. Manual visualizations often resemble this mental model. Thus,

applying Artificial Intelligence to the automatic visualization of data results in

the following AI-visualization paradigm:

“Every visualization should aim at the human mental model.”

Finding a presentation close to the mental model supports the human un-

derstanding of complex data. Thus, information hidden in the given data is

made explicit. In this sense, such a visualization may be viewed as visual data

mining or visual knowledge discovery. Data mining1or knowledge discovery

is a field of computer science which deals with the problem of finding un-

known structures and relations in complex data sets. The main idea of visual

data mining is to combine the advantages of computers and humans, i.e. the

computer’s ability to quickly process largeamounts of data and the human tal-

ent of identifying visual patterns. In this work, visualization is used solely as a

means of knowledge discovery, esthetic considerations are less important.

While it is difficultto find general featuresof visualization methods follow-

ing the AI-visualization paradigm, it is possible to analyse problem instances

and to identify common features of good solutions. The author does not claim

that all visualization methods following the AI-visualization paradigm must

comprise these features. But he claims that in many domains the application

of the AI-visualization paradigm results in visualization methods comprising

the four features presented below.

1An overview concerning data mining can be found at [43, 59, 14, 18, 113].

1.2. THE AI VISUALIZATION PARADIGM

In this work, six problem domains areanalyzed and solutions to the visual-

ization problem are developed. Four common featuresare identified. The new

methodology of structure-based visualization is introduced in section 1.3; this

methodology leads to visualizations which comprise all four features.

These features can also be used as criterions for the applicability of the

methodology to a new problem: The user assesses whether the features are

desirable for a solution.

The result of the visualization as shown to the user is called presentation.

The first two features describe this presentation:

1. Object-based Presentation: Whenever the given data comprises distinct

objects, the userwants torecognizethese objects inthe representation,i.e.

objects and the spatial relations between them should carry the visual

information. Objects are often represented by simple graphical objects

(e.g. circle, texts, etc.).

Thus, the given datashould either consist of objects and theirrelations or

it must be transformed into this structure. Such data can be modeled by

a graph2, where objects form the nodes, while edges represent the given

relations between the objects.

2. Constant Dimension Presentation: The presentation always uses a con-

stant number of dimensions, i.e. more complex data does not result in

a more complex representation. In this work, all data sets are visualized

on a 2-dimensional plane because for cognitive reasons many users pre-

fer 2-dimensional presentations. Most results can easily be extended for

3-dimensional presentations. Since humans often have problems with

presentations having more than three dimensions, the constant dimen-

sion restriction makes sense.

This work deals with data for which no inherent visualization exists.

Data which includes inherent visualizations often describes geometrical

or spatial information: A mathematical function

(

)

has an inherent

visualization (its 2-dimensional plot). A set of 3-dimensional polygons

defining the outline of a human face also has an inherent visualization

(any rendering of the polygons). The readermay note that since the data

directly defines the corresponding visualization, the number of dimen-

sions used for the representation is determined by the data. Therefore, a

variable dimension presentation is used.

A set of people and their relationships to one another does not have an

inherent visualization. Nevertheless most humans have a clear idea of

an appropriate visualization: Visualize people as objects on a plane and

place two people with a close relationship closely together. The number

of dimensions used for the visualization is defined by the capabilities of

the user, not by the data.

The last two criterions define features of the visualization process itself:

3. Abstraction: In order to approximate the mental model, the data is ab-

stracted: Machine learning is used to identify the gist of the information,

i.e. abstract concepts hidden in the data are identified. In this process

2This graph is formally defined in section 1.4, an introduction to graphs can be found in [80].

CHAPTER 1. FUNDAMENTAL METHODOLOGY

details are lost, thus the methodology presented here is not suited for

problem classes where no information can be lost. Artificial data sets

which often do not comprise any underlying concept are also not suited.

Abstraction also allows for the visualization of larger data sets: Working

in a top-down approach, visualization algorithms can first layout only

the identified abstract concepts. In a second step, more detailedinforma-

tion within the abstract concepts are visualized. Such a method therefore

creates from the original dataset several smaller datasets which are laid

out separately.

4. User Adaption: Since the precise mental model of the user is unknown,

the user should be integrated into the visualization process. When ap-

propriate, the needs and possibilities for user interactions are examined

in the following chapters.

In this work, the to-be-visualized data is restricted to graphs. Graphs are a

very common means of modeling and always present an object-oriented view

of the data. Tabular data is only discussed in one special context: It is viewed

as a special type of graph (see section 1.5) and graph visualization methods are

applied. The reader may note that different visualization techniques exist for

tabular data, details can be found in section 1.6.

1.3 Structure-Based Visualization

Human understandingis basedon structureidentification. Faced witha highly

complex world, the mind reduces the problem of comprehending a situation

by means of abstraction: structures are identified and classified. E.g. visual

recognition relies on identifying alreadyknown objects, the human design pro-

cess has often been explained by a pattern recognition approach, and people

diagnosing a defective technical device exploit the structure of the device by

identifying faultless or defectivesubstructures. Automatic structure identifica-

tion emulates this process. Since most data sets model a limited part of reality,

reality’s structure is also imprinted on the data.

In many cases, classified structures should be visualized in a deterministic

way, i.e. similar structures should result in similar visualizations. This allows

for an easy understanding of alreadyknown structures and repetitive patterns.

ABA

MODEL

Figure

1.1:

The general Methodology of Structure-based Visualization

These thoughts lead to the new general methodology of structure-based

visualization (see also figure 1.1):

1.4. VISUALIZING GRAPHS

1. Structure Identification: The first step identifies the model’s structure

which is defined by closely related submodels and their relationships.

2. Structure Classification: The second step consists of classifying the indi-

vidual substructures.

3. Structure Envisioning: The substructures and their relationships are vi-

sualized in the third step. Similar substructures are visualized in a sim-

ilar way and repetitive substructures are emphasised. The visualization

places nodes on a plane.

WWW

(O0

(OBJECTTYPE (TYPE STRING) (VALUE

"CT_COMPONENT"))

(COMPONENTTYPE (TYPE STRING) (VALUE

"compCheckValve"))

(GEOMETRY (TYPE LIST_OF_INTS) (VALUE

(62216 115712 70903 140288 180 0 398123)))

(P_V_FEST (TYPE BOOLEAN) (VALUE T))

(P_V (TYPE FLOAT) (VALUE 1))

Figure

1.2:

An Example for the Visualization Methodology

Figure1.2showsanexampleofthis: Firstofall, thestructureofthetextually

defined hydraulic circuit is identified, i.e. substructures are outlined. These

substructures are classified:

denotes a supply substructure and

denotes a

working substructure. Finally, the model is drawn. The layout emphazises the

structure by placing elements of the same substructure closely together. The

repetitive pattern of the three working units is emphazised by their parallel

positioning.

1.4 Visualizing Graphs

As mentioned in the beginning of this section, the method of structure-based

visualization requests the data to be given in the form of a graph. Thus a visu-

alization graph is now defined formally:

Definition 1.4.1 (Visualization Graph)

The visualization graph

V; E

;Æ;w;

)

is defined as follows:



denotes the set of nodes. Each node corresponds to an object.







is the set of edges used for finding a layout.







is the set of visible edges.

CHAPTER 1. FUNDAMENTAL METHODOLOGY







defines the node labels over the alphabet





defines the edge weights. The larger the weight, the closer

the relationship between nodes.

Remark 1.4.1 Byintroducingtwoseparatesetofedges

and

,itbecomes possible

to model situations where one set of edges (

) should be used to calculate positions

for the nodes and a different set of edges (

) is shown to the user. Situations where

;

are possible.

One example is the visualization of hydraulic circuits: The user should only see

edges representing tubes between hydraulic components. For the visualization task

more edges exist: These edges connect components which should be placed closely to-

gether, e.g. a cylinder and its controlling valve.

We normally presume that edges are undirected, i.e.

and

comprise

sets of nodes. Directed edgesare modeledas lists of nodes. If methods areonly

suited for directed or undirected graphs, it is explicitly mentioned in this text.

The reader may note that directed graphs can always be treated as undirected

graphs. The final visualization can still show directed edges. Of course, in this

way all information coded into the edge directions is lost. If undirected graphs

are transformed into directed graphs, either additional knowledge about edge

directions is used or the directions are randomly chosen. In the later case, the

visualization method uses nonexisting information. This may lead to unsatis-

factory results.

For simplicity reasons,only the currently importantelements of avisualiza-

tion graph are listed in this work, e.g

V; E

)

may denote a visualization

graph in a context where

;Æ;w;



are of no importance.

In order to solve the graph layout problem, positions for all nodes have to

be found:

Definition 1.4.2 (Graph Layout)

A graph layout

is a tuple

G; 

where

V; E

;Æ;w;

)

is a visualiza-

tion graph and





defines a two-dimensional position for each

node.

In this work, it will be outlined how the general methodology of structure-

based visualization can be implemented using visualization graphs:

The firststep canbe operationalizedby meansof clusteralgorithms (section

2). In the second step (classifying the clusters), machine learning is used to ab-

stract the necessary classification knowledge from given examples. Section 3

describes two different approaches: direct classification and case-based classi-

fication.

The graph is laid out in the third step using graph drawing algorithms (see

section 4).

Similar clusters can be visualized in a similar way by applying the same

layout algorithms (e.g. see section 3.1.1). For many applications, the only rea-

son to classify clusters is to use similar layout methods for similar clusters. For

this reason, step 2 and step 3 are often highly interwoven.

1.5. VISUALIZING TABULAR DATA

Name education income occupation citizenship gender

Meyer Ms. CS. $70.000 Sales German f

Smith Ba. CS. $55.000 Devolpment British m

Wagner High school $45.000 Training Dutch f

     

Table 1.1: A typical tabular data set

1.5 Visualizing Tabular Data

A special type of visualization graph is called tabular data set. Each node of a

tabular data set has a node label which comprises a number of node features.

The characters

;

(

;

)

are used to separate the features in the node label and

are therefore part of the alphabet



Definition 1.5.1 (Tabular Data Set)

Let

V; E

;Æ;w;

)

be a visualization graph.

is called a tabular data

set if and only if:



;



(

)=(

(

)

;:::;f

(

)

V; p

(

)

(

;

(

;

)



 f

;

(

;

)

g



The variables

(

)

are called features. Tabular data sets can be viewed as

a table (see table 1.1), each row represents a node or object and each column

defines a feature. In order to transform a tabular data set into a visualization

graph, the edges

must be defined; this is the subject of section 5.

The analysis of tabular data sets arises in many different domains:

☞Case-based reasoning (CBR) tries to solve a new problem by finding a

similar old problem and adapting the old solution to the new problem.

Old problem and their solution are saved in form of a table, each row

represents an old problem and its solution. The columns model features

of problem instances or their solutions respectively. CBR is explained

shortly in section 3.2.

☞Relational database also use tables to store data. An introduction can be

found in [40, 176].

☞Spreadsheets are used in many domains to store and analyze financial

data and calculations.

All these fields developed highly specialized methods to analyze and visu-

alizetabulardata;themost popularmethods areoutlinedin thenextsection. In

this work only one aspect is examined: How can the methodology of structure-

based visualization be applied to tabular data sets? For this, weights between

objects are computed by analyzing the given features. The visualization itself

uses only these weights, the original features are disregarded. Details can be

found in chapter 5.

CHAPTER 1. FUNDAMENTAL METHODOLOGY

1.6 Existing Methods for Visual Data Mining

Several methods for visual data mining exist. In this work, the four features

from the beginning of this chapter are used to classify these techniques. We

furthermoredifferentiatebetweendatagivenin theformofagraphandtabular

data. This approach may result in an AI-centered view onto the field of visual

data mining, but the readermay find such a classification helpful in comparing

existing methods with the methodology presented in this work.

Four typical classes of visual data mining methods exist:

1. Graph-given Data, Object-Oriented, Constant Dimension Visualiza-

tion: An detailed overview of such visualization methods is given in

section 4.1.

2. TabularData,Object-Oriented,ConstantDimensionVisualization: The

object features of a tabular data set can be viewed as positions in

dimensional space3.

These visualization techniques reduce the

dimensions to only two or

three dimensions. Typical methods are Principle Component Analysis

and Factor Analysis (seesection 4.2), FastMap(this method may be view-

ed as a heuristic for Principle Component Analysis, see [42]) and Multi-

dimensional Scaling (an overview can be found in section 4.1.2). Clus-

tering may also be viewed as a dimension reduction technique: The

dimensional features are replaced by a single value, the respective clus-

ter. An extensive overview of existing algorithms for clustering is given

in section 2. Another possibility is to transform the tabular data into a

graph and apply graph visualization method. This approach is used in

this work and an introduction can be found in chapter 5.

3. Tabular Data, Object-Oriented, Variable Dimension Visualization:

These methods also treat the object features as

-dimensional positions.

Unlike the dimension reduction techniques described above, special vi-

sualization techniques are used to show more than two or three dimen-

sions. Sometimes several diagrams, each depicting a subset of the orig-

inal dimensions, are used instead of one single diagram. The general

idea is to use more complex visualization for more complex data. Of

course,manyofthesetechniques canalsobeusedtovisualizedimension-

reduced data.

Thoroughly and extensive overviews can be found in [38, 92, 23, 20].

Since this type of visualization is not the subject of this work, only a few

typical examples are mentioned4:



Geometric Techniques:

–Scatterplots: For all feature subsets, 2- or 3-dimensional visual-

izations are created,see [24].

3We assume metric features;as explained in section 5.1, non-metric featurescan be transformed

into metric features.

4The following enumeration of the variable dimension visualization methods closely follows

the classification suggested in [92].

1.6. EXISTING METHODS FOR VISUAL DATA MINING

–Prosection Views: Selected value ranges are projected orthogo-

nal onto a plane. This results in several differentdiagrams, each

showing one projection (see [48]).

–Parallel Coordinates: Objects are depicted as lines in a 2-dimen-

sional drawing. The original

dimensions are placed on the

x-axis. The y-value of a line (i.e. object) at a special x-value (i.e.

dimension) is defined by the corresponding value of the object

feature. Details can be found in [74].



Icon-based Techniques: These techniques depict objects as icons on

a plane. The appearance of the icon is used to visualize additional

dimensions. An overview of these techniques can be found in [92].



Pixel-oriented Techniques: These methods depict each feature in a

single window. Feature values correspond to exactly one pixel in a

window. The main idea is that similar feature value sets result in

similar visualizations. An examples is Recursive Pattern Visualiza-

tion ([90]).



HierarchicalTechniques: Suchmethodsdepictmorethantwo-dimen-

sions in one 2D-diagram by placing smaller diagrams within larger

diagrams. Examples are:

–Dimension Stacking: Each position on a 2-dimensional plane

comprises another 2-dimensional diagramshowing dimensions

threeandfour. Detailscan befound in [103]. Anextension to 3D

(boxes within transparent boxes) has been introduced in [140].

–Treemap ([152]): For this method, a classification tree for the

data has to be given. Such a treesubdivides the data recursively

according to value ranges. Different features are used on differ-

ent levels of the tree. This classification tree is used to subdivide

the visualization plane.

4. Tabular Data, Non Object-Oriented, Constant Dimension Visualiza-

tion: These methods concentrate on the visualization of aggregated val-

ues. The objects themself areno longer shown, only aggregations of their

values are visualized. Typical examples are histograms, pie charts, and

function plots. The OLAP method for managing databases also com-

prises such methods (see [27, 153, 136]). This type of visualization causes

totally differentquestions than thevisualization method describedin this

work. Thus, the reader is referedto [173, 174] for an an overview.

Most of these methods have been combined with distortion techniques.

Such techniques distort a visualization in such way that important aspects are

shown in detail while unimportant parts are blended-out.

Abstraction is mainly used by techniques in the visualization classes 1,2,

and 4. In this work, a class 1 visualization technique is described which uses

clustering to abstract graphs. Section 4.1.4 gives an overview of existing com-

binations of clustering algorithms and class 1 visualization techniques. The

dimension reduction algorithms used with visualization methods from class 3

are also abstraction techniques. All of these methods work automatically.

Since most aggregations used by methods from class 4 create more abstract

information, they may be viewed as a form of abstraction as well. But unlike

CHAPTER 1. FUNDAMENTAL METHODOLOGY

the other methods, aggregations are normally defined manually. This work

focuses on the automatic abstraction of data.

User interaction can mainly be found in combination with visualization

methods from class 3. It often compensates the missing abstraction ability.

The user normally either chooses parameters of the representation (e.g colors,

icons etc.), influences the used projection (e.g. for scatterplots), changes the

level of detail, or chooses a subset of the data. A more extensive overview of

user-interaction techniques for algorithms from class 3 can be found in [91].

Different techniques for user-interaction are introduced in this work.

1.7 Evaluation

The next three chapters describe individual steps of the structure-based visu-

alization methodology. The fourth chapter deals with the transformation of

tabular data sets into visualization graphs. At the end of each chapter, a sec-

tion called “Evaluation”checks whether thedescribed methods follow the four

features of visualizations from section 1.3.

Some general comments can now be given:

1. Object-basedPresentation: All steps treat objects as atomic entities. The

“Structure Envisioning” step especially depicts objects as geometric enti-

ties.

2. Constant Dimension Presentation: This feature only has to be followed

by the final envisioning step (see section 4.4). It is therefore only exam-

ined there. The reader may note that several existing visualization meth-

ods do not use a constant dimension representation (see section 1.6).

3. Abstraction: “Structure Identification” and“StructureClassification” are

the main abstraction steps. Details can be found in chapters 2.4 and 3.3.

This feature is consequently not discussed in other chapters.

4. User Adaption: The question of user adaption, i.e. the integration of the

user in the problem solving process, is subject to world-wide research

(e.g. see [137, 73, 88]). Most approaches try to create a model of the

user’s behaviour5which is then used to adapt an algorithm. This work

is neither an introduction to user adaption nor does it focus on the user

adaption aspect. Nevertheless, user adaption is examined as far as the

structure-based visualization method is concerned.

5This model is often implemented by means of Bayesian Networks, see [133].

Chapter 2

Structure Identification

As outlined in sections 1.3 and 1.4, the visualization graphs are clustered in

order to identify their structure and to speed up the layout process.

Section 2.1givesan overviewofexisting approaches. Two newmethods ap-

plicable to general graphsarepresented in section 2.2. An innovative approach

integrating additional domain knowledge is given in section 2.3.

A clustering is now formally defined:

Definition 2.0.1 (Clustering)

A clustering of a visualization graph

= (

V; E

)

is a set

;:::;C





with



Definition 2.0.2 (Disjunctive Clustering)

A disjunctive clustering of a graph

= (

V; E

)

is a clustering

;:::;C

with



i; j



Definition 2.0.3 (Set of all Disjunctive Clusterings (

))

(

)

is defined as the set of all disjunctive clusterings of a graph

V; E

)

(

;:::;C

gjf

;:::;C

is a disjunctive clustering of

Definition 2.0.4 (Induced Structure of a Graph (

))

Let

V; E

)

be a graph and

(

)

. The graph

(

G; C

)=(

)

with

gj9f

g 2

and

(

)

is called the structure induced

w=1

w=2

Figure 2.1:

A clusteredgraph (left side) and its structure

An exemplary clustering and its induced structure can be seen in figure 2.1.

CHAPTER 2. STRUCTURE IDENTIFICATION

For some applications it is helpful to identify further subclusters within

clusters. Such a clustering a called hierarchical clustering:

Definition 2.0.5 (Hierarchical Clusterings)

Ahierarchicalclusteringofagraph

V; E

)

isdefinedasatriple

< C;T;C

where



(

)



)

denotes the so-called clustering tree

and



defines the root of

C6C4

C6C1

Figure 2.2:

An hierarchical cluster (left side) and the corresponding clusteringtree (right

hand side).

An exemplary hierarchical clustering can be seen in figure 2.2: On the left

side a hierarchical clustering of a graph

V; E

)

can be seen (the top cluster

node

, which comprises all clusters, is not shown). The figure on the right

hand side shows the clustering tree

)

with

;

d; e

a; b; c

, and

f; g; h

The reader may note that every disjunctive clustering can also be treated as a

hierarchical clustering.

Parenthesis: For many graphs the clustering (and the layout

step) can be improved by using only a subset of the edges

E.g. a common pre-processingstepis to deletealledges which

have a weight below a given threshold.

Another popular pre-processing step is the

-nearest-

neighbour method: Every node deletes all adjacent edges ex-

cept the

edges with the highest weights.

1See definition B.0.8.

2.1. EXISTING METHODS

2.1 Existing Methods

Clustering is one of the oldest subjects investigated in the field of Machine

Learning: Pearson introduced in 1894 ([134]) a statistical clustering method.

Since then, a large number of methods and algorithms has been introduced,

some generally applicable, others highly specialized. The following section

tries to give an overview by introducing a new classification scheme for clus-

tering algorithms.

This schemes classifies clustering methods according to two categories.

Other schemes have been given in [111, 41, 160]:

1. Additional Knowledge Necessary for the Algorithm: Some clustering

methods rely on graph-theoretical features only. These algorithms ex-

ploit the domain knowledge coded implicitly into the graph’s structure.

Clusters found by these approaches are sometimes called natural clus-

ters. Other methods use additional knowledge, e.g. explicit node labels

or an explicit clustering criterion.

In most cases, natural clusters are desired, i.e. the given graph com-

prises allnecessary information to find itsstructure. Thus, any additional

knowledge needed should be seen as a weakness of the algorithm.

2. Optimization Criterion: All clustering techniques optimize some crite-

rion. Some methods use a given explicit function, while others have the

criterion coded into an algorithm.

Most people agree on the right clustering of a given graph, but finding a

generally approved criterion proves to be difficult (see also section 2.1.3

and 2.2). The variety of different criterions used by different algorithms

does not mirror an uncertainty or arbitrariness inherent in the human

understanding of correct clustering, but an inability to express this im-

plicit human understanding in terms of a mathematical function. The

reader may note that the author neither denies the existence of problem-

atic graphs, where different experts see different clusters, nor does he

question the necessity to sometimes develop specialized algorithms for

special domains.

The clustering approaches presented in the following paragraphs are typi-

cal for a whole class of algorithms. Most methods used today are specialized

versions of these algorithms. Section 2.1.5 presents some variations.

2.1.1 Agglomerative Approaches

Additional Knowledge: extra parameters necessary

Criterion: inter-cluster distances

Agglomerative methods try to minimize the inter-node distances within

clusters. Initially, every node forms its own cluster. The two closest clusters

are united until reasonable clusters are found. The definition of the closeness

of two clustersand the definition ofthe term “reasonableclusters”arethe main

problems when applying this method. Besides the graph, additional explicit

parameters are used to define when to stop the union of clusters. Therefore,

the algorithm needs extra knowledge.

CHAPTER 2. STRUCTURE IDENTIFICATION

Figure 2.3: An Example Graph Figure 2.4: The correspond-

ing edges

dmin

dmin dmin

Figure 2.5: Nearest

Neighbour: Step 1 Figure 2.6: Nearest

Neighbour: Step 2 Figure 2.7: Nearest

Neighbour: Step 3

One of the first implementations was the so-called single-linkage or nearest

neighbour approach (see [45, 154, 78]). For this method, the distance between

two clusters is defined by the distance between their closest nodes. Figure 2.3

gives an example: The graph

= (

V; E

)

already has a layout2, it is fully

connected, and all edges

= (

)

are weighted by the term



dist

(

)

i.e. the closer the nodes are,the higher are the corresponding edge weight. For

claritysake, the edgesareoftennot shown. Figure2.4shows the corresponding

edges.

The iterative union of clusters normally stops when the distance between

all clusters is larger then a given

min

. Figures 2.5, 2.6, and 2.7 show three

steps of the algorithm. The algorithm stops at step 3, because all inter-cluster

distance are large then

min

(

min

can be seen in figure 2.5).

Other agglomerative methods define the distancebetween two clusters dif-

ferently:

☞Complete Linkage: The distance is defined by the two furthest nodes.

☞Average Distance: The distance is defined by the average distance be-

tween two nodes.

☞Centroid Distance: For each cluster its centroid is calculated. The cen-

troid of a cluster is the point with the smallest average distance to all

other nodes within the cluster. The distance between clusters is then de-

fined by the distance between their centroids.

comes with a layout in order to simplify the following explanations.

2.1. EXISTING METHODS

Figure 2.8: Chaining

Effect: Step 1 Figure 2.9: Chaining

Effect: Step 2 Figure 2.10: Chaining

Effect: Step 3

A more extensive overview over agglomerative methods can be found in [41,

54].

Parenthesis: For some applications, agglomerative methods

areusedin aslightly differentfashion: The algorithm doesnot

stop, i.e. the method always ends with one cluster containing

all nodes. A typical application is the creation of taxonomies

in biology (e.g. in [154, 155]).

Agglomerative methods distinguish themself by some advantages:



They are fast: When the edge weights (i.e the node distances) are given,

single-linkage algorithms need for a graph

V; E

)

(

)

(

log(

)

steps. This can been seen by noting that single-linkage

algorithms createiteratively a minimal spanning tree for

(see e.g. [188,

156]). Such an tree can be constructed in

(

)

or in

(

log(

)

([80]).



They can be easily implemented.



They are well understood and have been applied frequently.

Several disadvantages reduce the applicability of this method:



As outlined before, this approach relies solely on distance information.

The so-called chaining effect may be used to illustrate that distance in-

formation is often insufficient: Figures 2.8, 2.9, and 2.10 show a situa-

tion where two distinct clusters are connected by a so-called bridge. The

nodes forming the bridge are closer together than the nodes within the

clusters. As can be seen in the figures, the clusters can not be found by

using distance information only.



Being greedy algorithms, all agglomerative methods suffer from the dis-

ability to revise an earlier decision. Wrong early local decision, e.g. the

union of two clusters, can not be changed. Therefore, agglomerative

methods are a local heuristic approach.



The iterative union of clusters should stop when reasonable clusters are

found. In order to define the term “reasonable clusters”, extra parame-

ters are needed. These parameters are domain dependent and must be

determined for each problem separately.

CHAPTER 2. STRUCTURE IDENTIFICATION

Figure 2.11: Mincut Clustering:

Clustering 1 Figure 2.12: Mincut Clustering:

Clustering 2



Graphs without weights can not be clustered.

2.1.2 Divisive Approaches

Additional Knowledge: extra parameters necessary

Criterion: minimum cut between clusters

Divisive approaches(also called MinCut clustering) tryto minimize the cut

of a clustering, i.e. the weight of the edges between clusters. Theses methods

start with on cluster containing all nodes. The clusters are iteratively divided

until reasonable clusters are found. Again, additional knowledge is necessary

to define the term “reasonable clusters”.

The cut of a graph division plays an important role for these algorithms.

The cut is defined as follows:

Definition 2.1.1 (Cut of a Clustering (cut))

The cut of a clustering

;:::;C

of a graph

V; E

)

is defined

as:

cut

(

G; C

;

(

)

Definition 2.1.2 (Minimum Cut of a Graph)

The minimum cut of a graph

V; E

)

is defined as:

min



V;C

;

[

cut

(

)

Divisive algorithms find a clustering

with a minimum cut and

subdivide

and

recursively. Since this method would always end with

clusters containing one single node, it is necessary to define when to stop the

partitioning. A common method uses a maximum cut value: A cluster is not

subdivided if its maximum cut exceeds this given value.

More details can be found in [104, 185]. MinCut clustering has some inter-

esting advantages:



Unlike Nearest-Neighbourapproaches,not only distanceinformation are

taken into consideration, i.e. the chaining-effect (see section 2.1.1) is

avoided.



It has got a good theoretical basis (see especially [185]).

The following disadvantages exist:



In some cases, the cut between the clusters is not sufficient as a cluster

criterion: Figures 2.11 and 2.12 show two clusterings at the minimum

cut; obviously only the left clustering is appropriate.

2.1. EXISTING METHODS



MinCut clustering is a greedy algorithm and cannot modify earlier deci-

sions. The clustering in figure 2.12is an example: It would make sense to

revoke the decision and to choose a different minimum cut.



Extra parameters are necessary to define the optimal number of clusters.



Its run-time behaviour makes it inappropriate for larger graphs. In [79],

an overview over the complexity of determining a minimum cut is pre-

sented. E.g. in[117]anefficientalgorithmisgivenwhichneeds

(

log

)

time (

(

)

space). A probabilistic algorithm needing

(

log

))

steps (

(

)

space) has been introduced in [85].

For the following run-time estimations, we assume a run-time complex-

ity of

(

)

for the minimum cut algorithm (dense graphs). If the parti-

tions areequally sized, a maximum of

log

calls to the minimum cut al-

gorithm is necessary. By solving the resulting recursion formula

(

)

, we get an overall run-time of

(

)

. If the partitions are

not equally sized, a recursion formula

(

j

follows

and an overall run-time of

(

)

Fornon densegraphs, weassumearun-timecomplexityof

(



log

)

This results for equally sized partitions in a recursion formula

(

log

(

)

and in an overall complexity of

(

log

)

. For arbitrary partition sizes, we get an overall run-time

(

log

)

Remark 2.1.1 In[111],SvenMeyer zuEissen introducedaMinCut-Clustering method

which uses the



value (see section 2.2.1) to determine when to stop further cluster

divisions.

2.1.3 Optimization Approaches

Additional Knowledge: explicit criterion

Criterion: all

Optimization approaches use a given function

(

)

to rate the

clustering of a graph

= (

V; E

)

. By means of optimization methods, a

clustering is identified which maximizes

. These methods rely on the given

graph only, but they demand the explicit definition of the quality criterion.

Unfortunately,thesheernumberofpossible clusteringsmakesasimple“try

all, keep best”-approachunfeasible. In [108], Liu found the number of possible

partitions of

V; E

)

into

clusters to be equal to:

(







Since

is normally not known, the overall number of possible clusterings

follows as:

all

(

)

CHAPTER 2. STRUCTURE IDENTIFICATION

all

(

)

10 115 975

15 1 382 958 545

20 51 724 158 235 372

25 4 638 590 332 229 999 353

Table 2.1: Complexity of the Optimization problem

Table 2.1 gives for some graph sizes

the resulting

(

)

. For complex-

ity reasons, most methods start with an initial clustering and refine it using a

hill-climbing technique. An overview can be found in [41, 54].

The quality functions used in most applications aremainly variations of the

same idea: Maximize the edge density within clusters and minimize the edge

density between clusters. The edge density of a graph

V; E

)

is defined

. Such approaches can be found in [41, 54, 7]. Most of these methods

demand that the number of clusters is given.

In [143], a method combining the inter-cluster cut and the cluster sizes has

been used.

The advantages of this method are:



The user can define the quality of a clustering explicitly.



All knowledge from the field of optimization can be applied.

Optimization methods are seldomly used, because of the following disadvan-

tages:



For run-time reasons, the optimal solution is almost never found.



If fast optimization techniques as hill-climbing are used, the solution is

often suboptimal.



Researchers do not agree on a good quality function.

2.1.4 Self-Organizing Approaches

Additional Knowledge: none

Criterion: inter-cluster distances

The best known self-organizing clustering approach is due to Kohonen

([9, 96]). The main idea is to mark a subset of the vertices as specimen ver-

tices. These vertices define clusters implicitly: Each vertice belongs its closest

specimen vertice.

Initially, a fraction of the vertice is marked randomly as specimen vertices.

In each step, every specimen vertice is moved into the center of its cluster. The

center of a cluster

is the node

which minimizes

dist

(

v; v

)

where

dist

denotes the graph-theoretic distance between

and

(see defini-

tion B.0.4).

Figure 2.13 shows a typical start situation. The white vertices are the spec-

imen vertices and their clusters are outlined. In the next step (figure 2.14), the

specimen vertices are moved into the center of their cluster.

2.1. EXISTING METHODS

Figure 2.13: Kohonen Clustering:

Step 1 Figure 2.14: Kohonen Clustering:

Step 2

When two specimen vertices move too close together, they are replaced by

one single specimen vertice. Sometimes a new specimen vertice is added, if the

distance between two neighbouring specimen vertices exceeds a given thresh-

old. Therefore, this algorithm is able to find the optimal number of clusters.

Other self-organizing approaches are given in [44, 110, 121]. An extension

to non-spherical clusters can be found in [56].

Sometimes the problem is stated as follows: Find

(

given) specimen ver-

tices

;:::;s

,sothattheterm

dist

(

s; v

)

where

denotes

the cluster implied by

and

dist

denotes the length of the shortest path from

is minimal. This problem is called

-means clustering (see [114] for a good

introduction) and can also be solved using optimization methods (see section

2.1.3).

Kohonen-Clustering possesses some nice features:



The number of clusters can be found automatically.



The theoretical background of the method is well funded.

Some disadvantages reduce the applicability of this method:



The algorithm needs predefinedthreshold valuesin orderto define when

to split and when to unite two specimen vertices.



Only circular clusters can be found.



The run-time behaviour is unsatisfiable: Calculating the centroid of a

cluster with

nodes takes

(

)

times if all nodes distances are known.

Node distances can be computed (e.g. with the Floyd-Warshall algo-

rithm) in

(

)

. If the algorithm needs

iterations, an overall runtime

complexity of

(



)

results. Most practical tests have shown

that in most cases at least

iterations were necessary.

2.1.5 Other Approaches

In the last decades, most of the basic algorithms presented in the last sections

have been enhanced and extended. Furthermore, several different strategies

have been developed. This section gives a brief overview of these methods.

In the last few years, three-way or multilevel approaches have been devel-

oped: First of all, a graph is coarsed, i.e. several nodes are merged into one.

The resulting coarsed graph is clustered and the result is mapped back onto

CHAPTER 2. STRUCTURE IDENTIFICATION

the original graph. Such methods can be found e.g. in [86, 66, 87]. These meth-

ods normally apply standardtechniques (e.g. nearest-neighbouror optimizing

approaches) for the coarsening and the clustering step.

Non-disjunctive or fuzzy clustering finds overlapping clusters, examples of

which are given in [52, 186, 21, 177].

In the field of statistics, several algorithms for clustering already laid out

graphs exist. These approachesnormally presume extra information about the

clusters (distribution of the nodes, shape of the clusters, etc.). Most such sys-

tems combine statistical methods with algorithms presented in the previous

sections. Details and overviews can be found in [41, 54, 142, 182, 6, 62].

Some applications constrain the cluster sizes. E.g. in VLSI or parallel com-

puting clusters with similar sizes are needed. This version of the clustering

problem is most times called “partitioning”. Introductions to this field and to

different methods can be found in [104, 93, 31, 66, 65, 126, 77].

2.2 Natural Clustering

In this work, clustering is applied first of all as a means of data mining, thus so

called “Natural Clusters” are wanted. Such clusters are defined by the graph’s

structureonly, i.e. no node or edgelabels areused. Naturalclusterscorrespond

to clusters as identified by a human analyzer not using any additional domain

knowledge.

The algorithms described in section 2.1 are in most cases not able to find

naturalclusters3. In thissection, two new algorithms fordetecting naturalclus-

tersarepresented,onetheoreticalmethod(section2.2.2)andone fastclustering

algorithm (section 2.2.1).

Many attempts exist to define natural clusters; some formal definitions

have been mentioned in section 2.1.3. Godehardt defines a cluster in [54] as

follows:

“Whatever the case, the clusters should be chosen so that the ob-

jects within a cluster are mutual similar (homogeneity within the

classes) while the representatives of distinct classes are mutual dis-

similar (heterogeneity between the classes).”

T. Roxborough and A. Sen outline in [143]:

“In spite of the differencesof opinions as to what constitutes a clus-

ter, one idea is universally accepted: the nodes belonging to a clus-

termusthavestrongrelationshipbetweenthemincomparisonwith

the nodes outside the cluster.”

In [119] a similar statement can be found:

“The goal of a clustering algorithm is to automatically identify a

set of nodes/or edges that are more strongly connected with each

other than with the rest of the graph.”

Such statements can be found in a large number of work concerning clus-

tering, all of which refer to the definition of natural clusters.

3Reasons for this are discussed in section 2.4.

2.2. NATURAL CLUSTERING

2.2.1 MajorClust

Additional Knowledge: none

Criterion: inter-cluster connections

The Algorithm

In this section, a fast new clustering algorithm (see also [160]) is presented:

Initially, the algorithm assigns each node of a graph its own cluster. Within

the following re-clusteringsteps, anodeadopts thesame clusterasthe majority

of its neighbors belong to. If there exist several such clusters, one of them is

chosen randomly. If re-clustering comes to an end, the algorithm terminates.

Figure 2.15: A definite majority clustering situation (left) and an undecided

majority clustering situation (right).

The left hand side of Figure 2.15shows the definite case: most of the neigh-

bors of the centralnode belong to the left cluster, andthe central node becomes

a member of that cluster. In the situation depicted on the right hand side, the

central node has the choice between two different clusters.

We now write down this algorithm formally.

Algorithm 2.2.1 (MAJORCLUST)

Input. A graph

V; E

Output. A function

which assigns a cluster number to each node.

(1)

false

(2)

(

end

(3) while

false

(4)

true

(5)

(6)







u; v

(





is max.

(7) if



is a unique solution then

false

(8) if

(

)



then

(



(9) end

(10) end

Remark 2.2.1 Choosing a node

in step

and choosing between clusters

with

the same size of



u; v

(



in step

must be totally randomly.

CHAPTER 2. STRUCTURE IDENTIFICATION

Theorem 2.2.1

MAJORCLUST

terminates after

(

)

definite majority decisions. The cluster

change of a node is a definite majority decision if and only if it has happened

in a definite majority situation (see figure 2.15).

Proof.

Let

V; E

)

be a visualization graph.

(

)

is defined as the number

of nodes adjacent to

with a different cluster membership:

(

jff

v; w

gjf

v; w

V; c

(

)

(

)

(

)

is twice the overall

number of inter-cluster edges:

(

) =

(

)

. Obviously

(

)



holds for all visualization graphs

Let

be a node which is about to change its cluster membership in

a definite majority decision. Without loss of generality let

;:::;k

denote the clusters adjacent to

;



is defined as the number of

nodes adjacent to

which belong to cluster

. Without loss of generality, we

assume



:::



. Since

will change its cluster membership in a

definite majority situation,

(

j; j



and

follows.

Let



denote the change of

caused by the cluster change of

decreased by



neighbours decrease their

value by 1, because

(

)=1

after the cluster change of

neighbour increase their

value by 1.

Thus:





(



)



The reader may note that

(

)

does not change for non-definite majority

situations in step

:If

keeps its old cluster membership, no changes happen.

In an undecided majority clustering (see figure 2.15),

holds and



follows.

Since initially

(

)



holds,

MAJORCLUST

terminates after

(

)

definite majority decisions.

The following corollary analyses the run-time behaviour of algorithm 2.2.1.

Corollary 2.2.1 MAJORCLUST terminates, when for

steps (step 6 of algorithm

2.2.1) no definite majority decision has happened. Therefore, MAJORCLUST termi-

nates after

(

jj

)

steps.

Proof.

Follows directly from theorem 2.2.1.

A faster variation of MAJORCLUST is introduced by the next corollary.

Corollary 2.2.2 If MAJORCLUST terminates, when for

p; p

(

constant) steps

(step 6 of algorithm 2.2.1) no definite majority decision has happened, MAJORCLUST

terminates after

(

)

steps.

Proof.

Follows directly from theorem 2.2.1.

Remark 2.2.2 MAJORCLUST does stop, even when in the last

steps an unde-

cidedmajority clusteringsituation has happended(suchsituations aredefined in figure

2.15). This problem may be solved by randomizing the edge weights, i.e. by adding a

small, random value to the edge weights. For such randomized graphs, undecided ma-

jority clustering situation are very unprobable, i.e. MAJORCLUST becomes (almost)

determininistic.

2.2. NATURAL CLUSTERING

Remark 2.2.3 MAJORCLUST can also use the edge weights

: Each neighbour

a node

is assessed using the weight of the edge

Remark 2.2.4 In order to find subclusters within clusters, most cluster algorithm

(including MAJORCLUST) can be extended for weighted graphs as follows: After the

clusters are identified, all edge weights are squared. This operation increases the dif-

ferences between large and small edge weights. Now all subgraphs induced by the

previous clustering are clustered again. MAJORCLUST is specially suited for this ap-

proachsince is automatically detectswhen no more clusters can be found. This version

of MAJORCLUST is called hierarchical MAJORCLUST.

Properties of MAJORCLUST

In the rest of this section some hints will be given why MAJORCLUST finds

reasonable clusters.

Definition 2.2.1 (Stable Regions of a Graph)

Let

= (

V; E

)

be a graph. A connected subgraph

= (

)

induced by



with

(

) =

a; v

, where

denotes

the cluster function from algorithm 2.2.1 is called a stable region iff no further

iterations of

MAJORCLUST

can change

(

)

for all

Figure

2.16:

Two Stable Regions

Figure 2.16 shows two stable regions.

Remark 2.2.5 Clusters found by MAJORCLUST are always stable regions.

Remark 2.2.6 MAJORCLUST may not find all stable regions.

Definition 2.2.2 (Border of a Subgraph)

Let

= (

V; E

)

be a graph and

= (

)

a stable region in

. The

border of

is defined as follows:

bor der

(

j9f

with

Corollary 2.2.3 Let

V; E

)

be a graph and

)

a stable region in

. Furthermore let

cut

(

)

be the cut4of

and let





be defined as follows:

=

bor der

(

)

. Then

cut

(

)

<cut

(

bor der

(

)) + 2

j



holds.

Proof.

bor der

(

)

cut

(

bor der

(

)) + 2

j



because in the sum above some edges are counted twice, furthermore:

4The definition of the term cut can be found in definition B.0.7

CHAPTER 2. STRUCTURE IDENTIFICATION

bor der

(

)

>cut

(

(follows directly from the definition of a stable region)

The theorem above relates the cut at both sides of a border and the number

of edge within the border (



). Two situations are possible: (i) Only few edges

arewithin the border, i.e. we canassume

cut

(

)

cut

(

bor der

(

))

. Inthat

case, the stable area is a reasonable cluster (see beginning of section 2.2 for a

definition of theterm “reasonablecluster”, similar clusterpropertieshave been

used in [188, 99, 98]). (ii)



is relatively high, e.g.



>cut

(

bor der

(

))

Then most edges connected to the border are between nodes within the border

and the border itself forms a circular reasonable cluster area. The reader may

note that in both cases stable areas are reasonable cluster candidates.

Lemma 2.2.1

Let

V; E

)

be a connected graph and

)

a stable region in

with

. Then

contains a cycle

Proof.

The following holds:

(

)

has at least one edge to another node within

(since

is con-

nected)

jj

j

(

)

bor der

(

)

: Such a node

exists, because

holds. Because of

v; v

j 

j  j

can be con-

cluded. It follows directly, that

contains a cycle (see [80], theorem 1.2.3).

Corollary 2.2.4 From lemma 2.2.1 follows, thatevery stable region of a graph

con-

tains at least as many nodes as the smallest cycle in

Thereadermaynotethatcorollary2.2.4makessense: Graphswhich contain

only large cycles do not have the property of “strong local relations” between

groups of nodes, i.e. they contain none or only few clusters. The corollary

makes the same statement about stable areas.

At the beginning of section 2.2, the author has quoted some opinions about

the featuresof optimal natural clusters. One possible conclusion is the demand

of a relatively small cut between differentclusters. It can be shown that the cut

between two stable regions is also relatively small (see also corollary 2.2.3 and

[188, 99, 98]).

Figure

2.17:

Node exchange and cut.

Theorem 2.2.2

Let

V; E

)

be connected graph and be

)

and

)

be two neighboured stable regions in

. Then

cut

(

)

<cut

(

)

5A cycle is defined in definition B.0.2

2.2. NATURAL CLUSTERING

holds.

Proof.

For all

holds:

v; v

. The edges

v; v

are part of

cut

(

)

.If

is moved to cluster

, those

edges arereplacedby the edges

v; v

, i.e. the cut increases. This

is also illustrated by figure 2.17.

A Statistical Model for MAJORCLUST

In this short section, a statistical model will be presented, that may help to

explain the behaviour of MAJORCLUST. First of all a statistical model for the

number of cluster will be given. In addition, an empirical evaluation will sup-

port this model.

Theorem 2.2.3

Let

V; E

)

;:::;v

)

be a connected graph with a constant node

degree of

. Then after one turn of

MAJORCLUST

(one turn is finished when all

node cluster memberships have been adjusted),

(

)

expected clusters will

be left.

Proof.

The only possibility for a node to keep its cluster membership is to give its

cluster membership first to a neighbour and then to get it back afterwards.

The probability that the first node (

) keeps its cluster membership is

. The

probability that

keeps its clustermembership is





(



, because



is the probability that

is a neighbour of

and

is the probability that

took the cluster membership of

and

is also the probability that

got

the same cluster membership back from

The probability that

keeps its cluster membership is then equal to



(





(



. So it follows that afterone turn





(



(

)

expected clusters are left.

To verify this theorem empirically, MAJORCLUST has been applied to ran-

domgraphs. Theserandom graphshavetofulfilltwoconditions: (i)Aconstant

node degree of

and (ii) they have to comprise structure, i.e. areas with a den-

sity above average. Meeting the last condition is a special challenge because

normally structure is a property of graphs that have been used to model a real-

world problem. The domain structure is then imprinted into these graphs.

The graphs used for the evaluation have been created using the following

function:

Algorithm 2.2.2 (CreateRandomGraph)

Input The number of nodes

, a node degree

Used Functions

(

)=2





(

)

where

denotes the standardized normal distribution

(





dist

(

)

denotes the Euclidean distance between the nodes

Output A graph

V; E

)

with

CHAPTER 2. STRUCTURE IDENTIFICATION

( 1) set

V; E

)

with

(2)for

(3) for

( 4) create a new node

(5)



(

)=(

x; y

)





(

)

denotes the position of node



(6)

(7) end

(8)end

(9)do

(10) for all

V; v

(11) if

deg r ee

(

)

and

deg r ee

(

)

then

(12) With a probability

(

dist

(

))

add the edge

(13) end

(14) until

(

dist

(

))

Figure

2.18:

The average number of clusters for growing

The algorithms above creates first of all a grid of

nodes (steps 1-8), i.e.

each node has position within a



grid. In addition, an edge

created with a probability which depends on the Euclidean distance between

and

(step 12). This guarantees for the creation of neighbourhoods which

are connected higher-than-average. An edge is not createdif it would result in

a node degree higher than

(step 11).

Figure 2.18 shows the average number of clusters

(

av g

)

after one turn

of MAJORCLUST over the number of nodes

(

)

for a node degreeof

. Its linear

nature is obvious. Figure 2.19 depicts the average number of clusters

(

av g

)

over the node degree (

deg

) for

1000

nodes. The dependent variable is shown

using a logarithmic axis. It can be seen that the function behaves like

deg

for

larger

deg

. The diagrams do not change for different

2.2. NATURAL CLUSTERING

DEG

6050403020100

avg#cl

4,0

3,5

3,0

2,5

2,0

1,5

1,0

135,0

115,0

100,0

Figure

2.19:

The average number of clusters for growing node degrees

1000

2000

3000

4000

5000

6000

0 5000 10000 15000 20000 25000

|E|

sec

deg=1.9 deg=2.4 deg=2.9

Figure

2.20:

Run-time behaviour of

MAJORCLUST

for different numbers of edges.

These randomly created graphs also allow for an estimate of run-time be-

haviour: Figure 2.20 depicts for large artificial graphs the required run-time6

over the number of edges. Three different average node degrees are shown.

Since more clusters exist for graphs with lower average node degrees (see ex-

planations above), it makes sense that for such graphs the run-time increases.

Obviously MAJORCLUST behaves very nicely.

2.2.2



-Maximization

Additional Knowledge: none

Criterion:



-Maximization is based on the idea of weighting each node by the connec-

tivity of its appurtenant cluster. By maximizing the sum of node weights, the

resulting clusters closely resemble the human idea of an ideal graph decom-

positon.

The edge connectivity



(

)

of a graph

= (

V; E

)

denotes the minimum

number of edges that must be removed to make

a not-connected graph:



(

) = min



and

V; E

)

is not connected}.

6Clisp on a Pentium II, 400 Mhz has been used.

CHAPTER 2. STRUCTURE IDENTIFICATION

Definition 2.2.3 (



)

Let

V; E

)

beagraph,andlet

;:::;C

bea disjunctive clustering

. The

weighted partial connectivity

(

)

, is defined as

(

):=

j



;

where



(

)





designates the edge connectivity of

(

)

, where

(

)

is the sub-

graph induced by

(see definition B.0.6).

Figure 2.21 illustrates the weighted partial connectivity



λ = 2

λ = 1

λ = 2

Λ = 3∗2+2∗1+3∗2 = 14

Λ = 5∗1+3∗2 = 11

λ = 1

λ = 2

λ = 3

Λ =Λ∗= 4∗2+4∗3 = 20

Figure

2.21:

Graph decompositions and related



values.

Definition 2.2.4 (



-Structure)

Let

= (

V; E

)

be a graph, and let



be a disjunctive clustering of

that

maximizes



(



)





:= max

(

)

is a disjunctive clustering of

Then the structure

(

G; C



)

(see definition 2.0.4) is called



-structure

of the

system represented by

Figure 2.22 shows that



-maximization means structure identification.

Unstructured graph Structure

Λ∗ =12

Λ∗ = 46

Optimum structured graph wrt.

Figure

2.22:

Examples for decomposing a graph according to the structure definition.

2.2. NATURAL CLUSTERING

Remark 2.2.7 A key feature of the above structure measure is its implicitdefinition of

a structure’s number of clusters.

Two rules of decomposition, which are implied in the structure definition,

are worth to be noted. (i) If for a (sub)graph

= (

V; E

)

and a disjunctive

clustering

;:::;C

the strong splitting condition



(

)

min



;:::;

is fulfilled,

will be decomposed. Note that (i) the strong splitting condi-

tion is commensurate for decomposition. Obviously this splitting rule follows

the human sense when identifying clusters in a graph.

(ii) If for no disjunctive clustering

the strong splitting condition holds,

will be decomposed only, if for some

the condition

j



(

)

(

)

fulfilled. This inequality forms a necessary condition for decomposition.

Theweighted partialconnectivity,



,canbemadeindependentofthe graph

size by dividing it by the graph’s node number

. The resulting normalized



value is designated by











It is usefulto extend thestructureidentification approachby integrating the

edge weights

. For this a generalization of

(

)

is introducing: the weighted

edge connectivity





of a graph. It is defined as follows:



(

) = min

(

) :



and

V; E

is not connected}.

Using this definition, all previous results can be directly extended to graphs

with edge weights.

The following theorem relates Min-cut-clustering (see section 2.1.2) to clus-

tering by means of



-maximization.

Theorem 2.2.4 (Strong Splitting Condition)

Let

= (

V; E

)

be a (sub)graph and

a disjunctive cluster-

ing in

two

clusters. Then applying the strong splitting condition (



(

)

min



;

) results in a decomposition at a minimum cut.

Proof of Theorem. Every clustering

(

)

would decompose

, or both. Since

min



;

>

(

)

can not be a clustering at a minimum

cut. Therefore only

can be a decomposition at a minimum cut.

Remark 2.2.8 When the strong splitting condition does not hold, an optimum de-

composition according to the structuring value need not be the same decomposition

as found using the minimum cut. This is because of the latter’s disregard for cluster

sizes. Figure 2.23 is such an example. Here

refers to a clique with



nodes.

An optimum solution according to the weighted partial connectivity



(which is also

closer to human sense of esthetics) consists of one cluster

and a second

cluster

. An algorithm using the minimum cut would only separate

Thereadermayalsonoticethat,asmentionedbefore,maximizingtheweight-

ed partial connectivity implies an optimum number of clusters, while most

other known clustering algorithms need additional parameters defining more

or less explicitly the number of clusters.

The main advantages of this method are:



The



-value strongly resembles the human assessments of clusterings.

CHAPTER 2. STRUCTURE IDENTIFICATION

Cx Clustering according to MinCutClustering according to Λ

Figure 2.23: Weighted partial connectivity (



-) maximization versus Min-Cut-

clustering.

Figure 2.24:

The problematic cases for the



-maximization.



The number of clusters is defined implicitly.

The disadvantages of



-maximization are:



No fast method for finding the



-structure exists yet.





-maximization fails to find the optimal clustering in some cases. Fig-

ure 2.24 shows two examples: While the optimal clustering on the left

side can still be found by using the node connectivity instead of the edge

connectivity7, both variations fail for the graph on the right hand side.

2.3 Adaptive Clustering

In some domains, additional knowledge can be integrated into the clustering

process. The most popular approach works by using node labels: Heuristic

rules and algorithms are defined which compute clusters by means of graph-

theoretic information and by using given node labels. An example can be

found in [149, 163]: Special clusters (so-called hydraulic axis) are found by

using graph grammars and path-search techniques, both of which rely heavily

on node labels.

The main drawback of this approach is the time needed to define the rules

and algorithms. In the example mentioned above, ittook one master thesis and

additional work of approximately 4 month to develop a working system. The

main reason for this is that using node labels leads to a knowledge acquisition

problem: The necessary knowledge has to be extracted from an expert, who in

most cases is not explicitly aware of his knowledge.

One solution is to use Machine Learning. For this, the expert gives a typi-

cal set of clustered and labeled graphs and a system tries to conclude domain

specific clustering knowledge from the examples.

7See [80] for an introduction to node connectivity

2.4. EVALUATION

Such a system has been developed by the author. A description can be

found in section 6.4.18.

2.4 Evaluation

As outlined in section 1.7, the methods described in this chapter are rated by

the four features introduced at beginning of chapter 1. This allows for an as-

sessment whether the AI-visualization paradigm has been followed.

1. Object-based Presentation: All clustering methods result in a clustering

of objects. Even after the clustering step, the individual object do not

vanish. Unlike methods where the cluster replacesthe objects, clustering

information is used here only as additional information.

3. Abstraction: Clustering is well suited to abstract knowledge coded into

a graph. It is based on the assumption that edges connect nodes which

somehow belong together, i.e. highly connected subgraphs form clusters.

Clustering imitates the human approach of comprehending complex sys-

tems: find reasonable subsystems and work on different levels of ab-

stractions. I.e. either subsystems are examined separately or informa-

tion within subsystems is disregarded and only the global structure is

analyzed. Therefore, the approach used in this work can be viewed as a

typical AI-method: identify the human problem solving process and find

an algorithm which emulates it.

The reader may note that comparing different clustering algorithms is

difficult since no generally accepted quality measure exists. The author

personallyprefersthe



-valueasa quality measure,but thatquality func-

tion is neither generally accepted nor does it work in all cases (see section

2.2.2 for details). Therefore, only two methods remain to compare clus-

tering algorithms:

(a) Theoretical evaluations (as have been given in this chapter) may

help to choose the most suitable algorithm. The abstraction qual-

ity of clustering algorithms depends mainly on the clustering cri-

terion. Some criterions have obvious disadvantages, e.g. the dis-

tance information used for nearest-neighbour clustering. Optimiza-

tion approaches suffer from the problem that no generally applica-

ble quality measureexists. MAJORCLUST,



-Optimization, Divisive-

Clustering, and Self-Organizing-Clustering are harder to compare;

all algorithms have specific advantages and disadvantages. Those

four algorithms must be compared using real-word problems (see

below).

Run-time considerations also influence the choice between differ-

ent methods. From that point of view, MAJORCLUST and Nearest-

Neighbour clustering should be prefered. Divisive and Self-Organi-

zing-Clustering canstill beapplied tosmall problems,while



-Opti-

mizationandgeneraloptimization approachesarenormallynot suit-

able for larger graphs.

8The description has been placed in the chapter “Hydraulic Circuits” because it has mainly

been applied to that domain.

CHAPTER 2. STRUCTURE IDENTIFICATION

(b) Clustering algorithms can also be compared using real-world prob-

lems: In chapters 6 and 7, six such applications are described.

MAJORCLUST is applied successfully to all six domains. Self-org-

anizing approaches also prove helpful, but normally have problems

finding the optimal number of clusters. Divisive algorithms suffer

from an unsatisfiable run-time behaviour and from the problem de-

scribed in figure 2.12. For some small graphs, this approach shows

promising results (e.g. in section 6.4).

500

1000

1500

2000

2500

3000

3500

4000

0 500 1000 1500 2000 2500 3000 3500 4000

|E|

sec

MajorClust

Kohonen

MinCut

Figure

2.25:

Run-time behaviour of three clustering algorithms. Only clustering results

have been used which passed a visual inspection.

Some of the results from chapters 6 and 7 are summarized below:

Figure 2.25 compares the run-time behaviour of MAJORCLUST, self-

organizing clustering, and MinCut Clustering (according to remark

2.1.1). It shows the run-time9(in seconds) over the number of edges.

One of the main problems concerning the run-time comparison of

clustering algorithms is that in many cases bad quality causes a fast

performance. E.g. MinCut-clustering only needs few steps if it car-

ries out only a small number of cluster divisions, but such cluster-

ings are very often suboptimal. The fact that in this work clustering

is applied to several domains allows for a different approach:

typical graphs from all six domains are clustered and the results are

verified by a user. Only such graphs which pass this manual inspec-

tion areused for the run-time comparison. The readermay note that

this procedure still causes some problems: (i) The quality of the re-

sults is disregarded. E.g. algorithm

may need

15%

less time than

algorithm

, but its quality might be

30%

worse than the quality of

algorithm

.(ii) The graphs from the six domains are not typical

for all graph classes. (iii) Several versions and implementations of

the same clustering algorithm exist. This test may disregard some

improvements to the standard algorithms.

9The tests have been carried out on a Pentium II, 400Mhz using CLisp.

2.4. EVALUATION

Nevertheless, figure 2.25 may help to give a notion of the differ-

ent run-time behaviour. Obviously, MAJORCLUST shows the best

performancewhileself-organizingclusteringandespeciallyMinCut

Clustering are not suited for larger graphs. The run-time behaviour

of MAJORCLUST is also evaluated using artificial graphs in section

2.17.

100

Quality

MajorClust

Kohonen

MinCut

Figure

2.26:

The quality of the results from different clustering algorithms.

Figure 2.26 presents the percentage of accepted clustering results.

Obviously, MAJORCLUST is able to identify reasonable clusters in

most cases while self-organizing clustering and especially MinCut-

Clustering have more problems. The unsatisfactory quality of Min-

CutClustering is mainly dueto the problemdescribed infigure 2.12.

Kohonen clustering suffers from two main problems: (i) It often

has problems finding the optimal number of clusters and (ii) non-

circular clusters can not be detected.

Run-time behaviourand clustering qualityarediscussed inchapters

6 and 7 for each domain separately.

4. User Adaption: In most cases, all necessary knowledge for identifying

natural10 clusters is coded into the graph’s structure. Therefore, any ad-

ditional parameters should be viewed as a weakness of the clustering al-

gorithms. In other words, the best user-adaption in clustering is no user-

adaption. Thus, MAJORCLUST and



-optimization and (partially) some

self-organizingapproachesareoptimal with respectto user-adaption. All

other algorithms rely on extra information.

In some cases, the information coded into the edges and edge weights

in not sufficient, i.e. non-natural clusters are wanted. In that case either

specialized algorithms or algorithms with parameters have to be used.

When specialized methods have to be found, the application of cluster

learning as described in section 2.3 and 6.4.1 allows for the definition

of cluster detection methods on an abstract level, thus supporting the

knowledge acquisition and user-adaption process.

10see section 2.2

CHAPTER 2. STRUCTURE IDENTIFICATION

Chapter 3

Structure Classification

In this chapter, two methods for the automatic classification of clusters arepre-

sented. Section 3.1 discusses direct classification methods, while section 3.2

deals with case-based approaches. Both section also describe algorithms for

learning the necessary classification knowledge from given exemplary cluster-

ings.

Clusterlabelscansupportthehuman understand of complex graphs. E.g. a

cluster in a network traffic graph may be identified as “AI Lab” or as “Human

Resource Management”1. An example drawing can seen in figure 6.8.

Another application is knowledge-bases, e.g. a configuration knowledge-

base modeling a technical system. Since such knowledge-bases form a graph,

clustering can be used to identify technical subsystem. Labeling these clusters

with the name of the corresponding technical subsystem increases the value of

the clustering step. This application is described in detail in section 6.2.

Therefore, an automatic labeling or classification of clusters is desirable.

This can be done by a classification function mapping clusters onto cluster la-

bels. Manual definitions of such functions are highly complex. Therefore, this

work will concentrate on the usage of machine learning algorithms to define

classification functions. In this chapter, two classification approaches are pre-

sented: (i) Direct classification and (ii) Case-based Classification.

Typical datastructures for graphsandclusters (e.g. adjacencylists or matri-

ces) arenot well suited as an input for classification functions for the following

reasons:

☞Similar (or even isomorph graphs) can result in differentdata structures.

☞Users think about cluster similarities in terms of abstract graph features

as the number of nodes, diameter, or domain dependent properties. Such

features are not part of an adjacency list or matrix.

For these reasons, it makes sense to represent a cluster by a set of abstract

graph features. Thus, for each cluster

a vector

!

(

)

comprising sev-

eral graph features is calculated. Table 3.1 shows typical graph features.

The cluster classification function

can now be formally defined:

1An extensivedescription of cluster classification in thedomain of network traffic can be found

in section 6.1.

CHAPTER 3. STRUCTURE CLASSIFICATION

Feature

number of connected components

edge connection

number of biconnected components

number of nodes

number of edges

minimum/average/maximum distance between nodes

diameter

minimum/average/maximum node degree

directed/ undirected

number of clusters as found by MAJORCLUST

Table 3.1: Important graph features for drawing purposes

Definition 3.0.1 (Cluster Classification Function)

Let

be a cluster of a graph and let

!

(

)

be the feature vector of

. A func-

tion



! f

;:::;l

, where

denotes a cluster label, is called a cluster

classification function.

(

!

(

))

denotes the label of cluster

In the definition above, only real numbers are allowed as cluster features.

This is sufficient for the applications presented in this work. In general, nomi-

nal and ordinal features (see also section 5.1) could be used as well.

3.1 Direct Classification

Direct classification tries to map cluster features directly onto cluster labels.

Since manually finding such a classification function is a difficult task, the fol-

lowing approach can be used: The user gives a set of clusters and the corre-

sponding labels. By applying Machine Learning, the functional relation be-

tween cluster featuresand cluster labels is determined. An application is given

in the next section. In section 6.2, clusters in knowledge-bases are classified.

3.1.1 Learning the optimal Layout Method

Inmanydomains,nooptimalgraph-drawingmethodexists,buttheuserprefers

different drawing methods for different graphs or even for different clusters.

Theideapresentedinthissectionis toanalyzetheuser’spreferences(i.e. which

method is used for which graph or cluster) and to learn a function which maps

a graph or cluster onto a layout method.

The rest of this section elaborates on how the problem of finding a classifi-

cation function

can be reducedto a standard regression problem, making the

automatic learning of

possible.

The main idea is to have the user give some exemplary cluster labelings,

learn a correlation between cluster features and labels, and use this learned

knowledge to classify new clusters.

The learning process is quite simple: A set of typical clusters

;:::;C

has to be given. For each cluster

, the userchooses his favoritelayout method

(

)

(see definition 3.0.1). This results in a database

of classified feature

vectors

!

(

)

(

)

>; :::;<

!

(

)

(

)

3.2. CASE-BASED CLASSIFICATION

Figure

3.1:

Different graph drawing methods (spring embedder for the top-left cluster,

circuit layout on the right, and a level-oriented approach for the bottom

cluster) have been chosen automatically accordingto previous user

preferences.

The reader may note that such classified feature vectors can be obtained

during the normal course of work: Every time the user chooses a special layout

method, the system analyzes his behaviour. After a while, the system will be

able to choose the layout method automatically.

Databaseslike

arenormal inputfor standardregressionalgorithms (see

section A.2), i.e.

can be learned by applying regression to

learns which

features support or weaken the applicability of a graph drawing method. For

runtime reasons, it may bereasonable for largedatabases and complex feature-

vectors to use neural-networks as a heuristic method to solve the regression

problem.

Thismethod hasbeenappliedinsection 6.2tothevisualizationofknowledge-

bases.

3.2 Case-Based Classification

Case-basedClassification can be applied to the problem of clusterclassification

as follows: Old cluster classifications are saved together with the correspond-

ing clusters. Again, this leads to a database

of classified feature vectors

!

(

)

(

)

>; :::;<

!

(

)

(

)

. For a new feature vector

the most similar existing vector is retrievedand the classification of this vector

is also used for the new cluster. In order to find a similar vector, a so-called

similarity measure

sim







is used.

sim

(

!

(

)

;

!

(

)

calculates

the similarity between two clusters

and

While similarity measures are used here in a different way than in section

5, all definitions, methods, and results from that section can nevertheless be

applied to Case-based Classification. Obviously, similarity measures are an in-

strument used in differentareasof computer science. The methods for learning

similarity measures from chapter 5 can be especially applied.

Thus, the classification function is definedindirectly: Let

be anew cluster

and let

be defined as above.

(

)

is defined as

(

)=~

, where

(

)= ~

CHAPTER 3. STRUCTURE CLASSIFICATION

and

is the cluster in

which maximizes the function

sim



(

C; C

)

Case-based Classification is a special form of Case-based Reasoning (CBR).

Good introductions to CBR can be found in [159, 97].

In section 6.1.2, Case-based Classification is applied to the identification of

clusters in traffic graphs. For this, a manually defined similarity measure has

been developed.

3.3 Evaluation

As outlined in section 1.7, the methods described in this chapter are rated by

the four features introduced at the beginning of chapter 1. This allows for an

assessmentofwhetherornot theAI-visualizationparadigmhasbeenfollowed.

1. Object-based Presentation: Since clusters and not objects are the subject

of this chapter, the object-oriented concept can not be violated.

3. Abstraction: Cluster classification is another step of abstraction: After

a graph is abstracted into a system of clusters, each cluster is further

abstracted into a single symbol, i.e. its label. Again, this approach is

borrowed from human problem solving methods: As many author have

stated before(e.g. see [109, 120]), human thinking often happens in terms

of symbols. It therefore makes sense to associate clusters with labels.

The abstraction power of these methods mainly depends on the under-

lying classification or similarity function (see appendix A for regression

and neural networks and chapter 5 for the learning of similarity func-

tions). Allmethodsarealsoevaluatedusingreal-worldproblemsinchap-

ters 6 and 7.

4. User Adaption: Cluster classification adapts the visualization methodol-

ogy to the user by labeling identified clusters with meaningful symbols.

Three ways of user-adaption have been introduced and assessed in this

section:

(a) Manual Definition of a Classification Function: This method de-

mands the formulation of knowledge in the form of a mathematical

function and is therefore not applicable in most cases.

(b) Learning of a Direct Classification Function: In section 3.1, the

cluster classification process is adapted to the user by applying Ma-

chine Learning: Exemplary labelings are analyzed and a classifica-

tion function is abstracted. This classification method is used for

the identification of optimal clustering methods and cluster labels

in section 6.2. Users find the notion of automatically chosen layout

methods to be interesting and promising (see e.g. [125]).

This approachis almostoptimal froma knowledge acquisition point

of view; the user behaviour is observed and compiled into classifi-

cation knowledge. The following comparison to case-based classifi-

cation reveals some disadvantages.

to direct classification, interesting advantages: (i) New labels are

3.3. EVALUATION

“learned” simply by remembering the corresponding cases2. The

reader may note that by integrating new cases, the classification is

adapted to the user. (ii) Sometimes finding (or learning) a similarity

measureis easier than finding a function for the directclassification.

E.g. only experts are able to classify wine, but most people can as-

sess the similarity between two given wines. (iii) In some domains,

e.g. in medical science, cases are a natural form of knowledge rep-

resentation.

The following two methods forfinding similarity measuresdifferby

the explicity of the demanded knowledge:

i. Manual Definition of a Similarity Measure: Explicit knowl-

edge about object similarities is needed for this method. An

example is given in section 6.1.2.

ii. Learning of a Similarity Measure: Abstract and user-friendly

methods for the definition of similarity measures are given in

section 5.2.2. An examples is given in section 7.1.1.

These methods arealso discussed for each domain separatelyin chapters

6 and 7.

2Assuming that the similarity measure is still correct.

CHAPTER 3. STRUCTURE CLASSIFICATION

Chapter 4

Structure Envisioning

In this chapter, methods arepresented which compute a graph layout (see def-

inition 1.4.2) for a given (clustered) visualization graph (see definition 1.4.1).

First, existing algorithms for drawing graphs are described in section 4.1. New

graph drawing methods are described in section 4.2. All algorithms are ex-

tended in section 4.3 to allow for the integration of cluster knowledge.

In this chapter, only theoretical evaluations of the presented methods are

given. Anevaluationusing graphsfromvariousrealisticdomains canbefound

in chapters 6 and 7.

Two categories are used to classify graph drawing techniques:

I. QualityCriterion: Somedrawingmethodsuselocalcriterions,e.g. edges,

to define a good layout. Other techniques are based on global features of

the graph, e.g. graph diameters. This criterion is called quality criterion.

II. GraphClass: While some graph drawingalgorithms can be used with all

graphs, other methods are mainly suited for special graph classes. Typi-

cal graph classes are “Directed Graphs”, “Weighted Graphs”, “Trees”, or

“Bipartite Graphs”.

4.1 Existing Methods

Graph drawing is the area of computer science dealing with the problem of

finding two- or three-dimensional1positions for the nodes of a graph. Nor-

mally, an esthetic graph layout is wanted. Though, the visualization method

presented in this work emphasizes the data-mining aspect: Graphs are visu-

alized to support the analysis of large and complex data. Hence, similarity

definitions, clustering, and cluster classification have been used. This work is

intended as a contribution to the field of visual data-mining, not mainly to the

field of graph drawing.

Much researchhas been done recently in the relatively young areaof graph

drawing; good overviews can be found in [49, 8] or in the proceedings of the

annualconference“GraphDrawing”(publishedbySpringer). Inthe restofthis

section, the most popular classes of graph drawing algorithms are presented.

13-dimensional drawings are not dealt with in this work.

CHAPTER 4. STRUCTURE ENVISIONING

This short introduction focuses on algorithms suited for general (directed)

graphs. Methods developed for special graph classes (trees, acyclic graph) are

mentioned in section 4.1.4.

4.1.1 Force-Directed Drawing

Criterion: local: edge lengths

Class: general graphs

Force-directed drawing methods rely on knowledge about optimal edge

lengths. Wrong edgelengths arecorrectedby moving the corresponding nodes

into a better position.

This is often illustrated using an analogy from the field of physics: Nodes

correspondtosmallbodies, whileedgescorrespondtosprings connecting them.

Starting from arandom layout, the spring forcesmove the bodies into such po-

sitions that leave the springs as relaxed as possible. In order to simulate such a

system on a computer, all forces affecting a node must be calculated and itera-

tively applied to the bodies.

Figure

4.1:

One step of a force-directedlayout, forces are depicted as arrows,the

optimal edge length can be seen at the bottom of the figure.

Figure 4.1 shows an example: On the left side a randomly laid out graph

can be seen; edge lengths are suboptimal. The optimal edge length is shown

at the bottom of the figure. Arrows depict the forces on the nodes; these forces

move the nodes into the positions shown on the right side of the figure.

An alternative view of such approaches uses energy functions: The sum

of deviations between optimal edge lengths and actual edge length is called

the energy or the overall-error of the graph layout. Force-directedmethods try

to find a layout minimizing that energy. From this point of view, calculating

forces on nodes and moving them respectively may be seen as a local search

heuristics for finding a minimum of the energy function. The reader may note

that by using forces, only a local minimum can be found.

Several realizations of this algorithm exist, two typical implementations are

presented in more detail:



Eades: P. Eeades introduced the idea of force-directed drawing to the

field of graph-drawing ([33]) and named his approach spring-embedder.

Two connected nodes

placed at a distance of

dist

are attracted

with aforceproportionalto

log(

dist

)

where

denotestheoptimal

distance between

and

. This force obviously attracts the nodes only

dist

> Æ

, otherwise is repels them. Not connected nodes are

repelled by a force proportional to

. Eades moves iteratively one

node after the other until an equilibrium as achieved.

4.1. EXISTING METHODS



Kamada and Kawai: This approach has been developed in [83] and, in-

dependently, in [101]. The optical distance between nodes is defined by

their graph-theoretical distance2in the graph. Applying the energy view

onto the graph layout, Kamadaand Kawai assumethe energy in theedge

between

and

to be



(

dist



)

, where

dist

again denotes the

actual (Euclidean) distance and

the optimal distance.

The solution is found iteratively, Kamada and Kawai always move that

nodes first, whose movement results in the highest improvement of the

energy function. The direction of the movement is defined by the local

gradient of the energy function.

Other variations have been presented e.g. by Fruchtermann and Reingold

in[47],whouseddifferentforcestoimprovetherun-timebehaviour. Sugiyama

and Misue introduced in [167, 168] an algorithm which incorporated new con-

straints by adding magnetic forces. Tunkelang applied in [175] numerical an-

alytic methods to improve the optimization process, another method has been

proposed in [128]. Some author, e.g. in [19, 29], have used different optimiza-

tion method. A good overview of force-directeddrawing can be found in [8].

In this work, two versions of force-directed layout algorithms have been

used: Tunkelang’s method3and Eades’s spring-embedder.

Force-directed method offer many advantages:



It is intuitively appealing.



The results often resemblethe human esthetic notion of nice graph draw-

ings.

It also has disadvantages:



The run-time behaviourallows only for the visualization of small graphs:

In every step

(

)

pairs of nodes have to be evaluated. Since for most

graphsatleast

(

)

stepsarenecessaryforagood layout, a complexity of

(

)

results. Tunkelang’s methods needs only

(

jj

log

)

time (assuming

iterations). Clustering provides a way to reduce the

complexity, e.g. in section 2 or in [57, 61].



Edge crossings and other constraints are not taken into consideration in

the basic version of the algorithm; the only optimization criterion is the

given optimal edge length. These problems have been the subject of sev-

eral improvements to the basic method, e.g. in [167, 35, 19].

Two examples can be seen in figure 4.2, a protein-network on the left (sec-

tion 6.3) and a knowledge-base for the configuration of telephone systems (see

section 6.2).

2see definition B.0.4

3Implemented in [178].

CHAPTER 4. STRUCTURE ENVISIONING

Figure 4.2:

A Protein-network (left side) and a knowledge-base of a telephone system

(right side) visualized by a spring-embedder

4.1.2 Multidimensional Scaling

Criterion: local: node distances

Class: general graphs

A similar or even identical graph visualization technique to force-directed

drawing (section 4.1.1) is multidimensional scaling. This approach stems from

the field of statistics and descriptive data analysis (see e.g. [172]). Good over-

views can be found in [17, 6]. A good comparison to force-directed drawing

and to other graph layout methods is given in [30].

Generallyspeaking, Multidimensional Scalingdefinesenergyfunctionssim-

ilar to force-directed drawing (see section 4.1.1) and applies gradient methods

to find a local minimum. In addition to the different history, the main dif-

ferences to force-directed drawing are (i) modified energy functions, (ii) the

usage of a totally connected graph (the force-directed layout method due to

Kamada and Kawai [83] can be seen as a multidimensional scaling algorithm)

and (iii) different optimization algorithms.

4.1.3 Level-Oriented Drawing

Criterion: global: graph inherent hierarchy

Class: directed graphs with few cycles

Sugiyama, Tagawa and Toda introduced in [81] a new layout approach

comprising three steps. These steps are also shown in figure 4.3:

1. Layer Assignment: Nodes are assigned to horizontal layers.

2. Crossing Reduction: Nodes within a layer are ordered to minimize the

number of edge crossings.

3. Horizontal Coordinate Assignment: Nodes are assigned x-coordinates.

This method normally assumes directed graphs. The first step associates

nodes with (horizontal) layers in such a way that nodes within a layer are not

connected. For acyclic graphs, this can be done by a topological sorting of

the nodes. Cyclic graphs are transformed into acyclic graphs (see [135, 8]) by

reversing the direction of some edges. For complexity reasons (the so-called

feedback arc set problem is NP-complete, see [51]), heuristics are applied.

4.1. EXISTING METHODS

Figure

4.3:

The three steps of the level-oriented layout approach.

If connected nodes are not in neighboured layers, edges have to cross sev-

eral layers. To improvethe optical appearanceof such edges, so-called dummy

nodes areinserted. E.g. for an edge between a node

in layer 1 and a node

in layer 3, a dummy node

in layer 2 is created. The original edge

replaced by the edges

. This can also be seen in figure 4.3.

Within the second step, an ordering of the nodes is found that minimizes

the edges crossings between layers. Since this problem is also NP-complete,

heuristics (e.g. barycenter method, see [81]) are used. An overview of existing

heuristics can be found in [8].

In a third step a balanced layout is found. For this, positions for the nodes

are found without changing the ordering from step 2. The balanced layout is

found by minimizing the distances between connected nodes. Details can be

found in [50].

Several improvements to this algorithm exist, e.g. the inclusion of com-

pound or cluster information (see [148, 166, 147, 39]). A general overview can

be found in [147, 8].

The main advantages are:



Graphs which comprise a general flow, i.e. a general direction of the

edges, can be drawn nicely.



Graphs which model a given hierarchy, e.g. the positions in a company,

are drawn in a natural way.



Bipartite graphs (for a definition refer [80]) are laid out nicely, since each

layer can only comprise nodes of the same type.

Disadvantages are:



The method is not well suited to undirected graphs. Of course, undi-

rected graphs may be transformed into directed graphs, but randomly

chosen edge directions do not carry any real information.



Graphswhich donot haveahierarchicalstructurearedrawnratherpoorly.

An exampleofa level-orienteddrawing canbe seenon the leftside offigure

4.4.

CHAPTER 4. STRUCTURE ENVISIONING

Figure 4.4:

A fluidic circuit visualized by a level-oriented algorithm (left side) and net-

work traffic visualized by a circular layout algorithm (right side).

4.1.4 Other Methods

Several other methods for laying out a graph exist. A classification can be seen

in table 4.1.

Name Criterion Class

CircularLayout global: minimum spanning tree all

Flow-based Drawing local: edges planar graphs

Self-Organizing local: edge lengths all

Table 4.1: Classification of some Graph Drawing Methods

Tree Layout algorithms are specialized in laying out trees; examples can

be found in [139, 169]. A popular approach is the circular layout ([34, 12]) :

The root of the tree is placed in the center of the plane and all descendants are

positioned on concentric circle around the root node.

The reader may note that tree drawing algorithms can be applied to all

graphs by computing the minimum spanning tree first and using only edges

for the visualization process included in the spanning tree. The root of the

spanning tree is chosen in such a way that the height of the tree is minimized,

the reader may refer to [115] for details. An example can be seen in Figure 4.4

(right side).

Flow-based drawings form a popular class of layout algorithms for planar

graphs. The drawings are orthogonal, i.e. edges are drawn rectangular and

edge bends are minimized. An overview can be found in [8]. The reader may

note that though most algorithms compute global features of the graph (e.g.

minimum cost flow), the criterion itself relies mainly on local graph features.

Orthogonal drawing algorithms exist for general graphs as well: Some author

planarize a graph and apply layout techniques for planar graphs afterwards

(e.g. in [76, 129, 8]), while others construct a orthogonal drawing directly (e.g.

in [131, 8]).

Self-organizinglayouts usingKohonen maps havebeenintroducedin [110].

Several authors have combined clustering knowledge and graph drawing:

In [32], a method for clustering already laid out graphs is presented. This

method subdivides the graph into almost equally sized partitions. A clas-

sical optimization approach to clustering has been combined with a spring-

4.2. FACTOR-BASED DRAWING

embedder in [145]. In [143], a divisive clustering technique has been used

which also tries to generate equally sized clusters. A force-directed layout

method has been applied to visualize the resulting clusters.

Several papers exists which introduce layout techniques for already clus-

tered graphs: In [179, 70, 107] clustered graphs are visualized by means of

force-directed layout algorithms. 3-dimensional drawing of clustered graphs

are described in [36]. Orthogonal drawings of clustered graphs are dealt with

in [37]. In [166, 148], level-oriented layout algorithms areextended to allow for

an integration of clustering knowledge.

4.2 Factor-Based Drawing

Criterion: global: variances

Class: all

Most graph drawing methods presented in section 4.1 can be ranked into

two types:



Some methods use local information to find a layout (e.g. force-directed

drawing, self-organizing layout, etc.). Since they disregard global infor-

mation about the graph, the resulting layout is often locally pleasing but

fails to emphasize the global graph structure.



Other algorithms use a global quality criterion which exploits a-priori

knowledge about the graph’s structure (e.g. level-oriented layout, cir-

cular layout, etc.). Of course, these algorithms can only be applied to

specific graph classes.

This section introduces a new method which is applicable to general graphs

and which uses a global optimization criterion.

Inthefieldofstatistics,principlecomponentanalysisisapopulardataanal-

ysis technique4(e.g. see [11, 6, 105, 141]). Given a tabular data set

with

fea-

tures, principle component analysis reduces

to a new data set

with only

m; m < n

features, whereby as much information as possible is preserved.

So far, while being a popular method for visualizing tabular data sets in the

area of statistics, principle component analysis has not been applied to graph

drawing. The main idea presented here is as follows: The adjacency matrix

of a given graph

V; E

)

is interpreted as a tabular data set. Each row is

the description of a node, i.e. nodes are described by their edges. This

dimensional tabular data set is reducedto a 2-dimensional data set using prin-

ciple component analysis. The new 2-dimensional features of a node are used

as visualization coordinates.

The basic idea of principle component analysis is now described using an

example: Figure 4.5 shows a 2-dimensional data set on the left side. This data

set is transfered into a new 2-dimensional data set. First of all, an ellipse is

placed around the nodes. The first new dimension runs from one tip of the

ellipse to the other (i.e. the first axis of the ellipse). This axis is depicted in the

figure as a solid line. The second new dimension (see the dotted line in the

4Some authors classify it as a special factor analysis method.

CHAPTER 4. STRUCTURE ENVISIONING

Figure

4.5:

The identification of the new dimensions by principlecomponent analysis

figure) is defined by the second largest distance within the ellipse and must be

orthogonal to the first dimension (i.e. the second axis of the ellipse). The data

is now transfered into the new coordinate system. Obviously, global features

of the graph are used for this layout technique.

More formally, the algorithm first finds a new x-axis in such a way that

the variance of this new feature is maximized. In the second step, the second

feature(y-value) is computed, using the same method as before, from that part

of the original matrix which has not been explained by the first axis.

Since principle component analysis has not been described so far in the

context of graph drawing, a formal description of a graph drawing by means

of principle component analysis will be given now:

Algorithm 4.2.1 (Factor-basedLayout)

Input:

A visualization graph

V; E

;Æ;w;

)

Output:

A function





defining two-dimensional positions

1. The

jj

adjacencymatrix

forthegraph

isconstructedasfollows:

Let

denote the element of

in the

th row and

jth

column.

is set

(

)

. We presume that

is standardised: The average value of

each column of

and the variances are

2. The correlation matrix

is constructed as follows: Let

denote the

element of

in the

th row and

th column.

is the (linear) correlation

between the

th and the

th column of

. The reader may note that only

linear correlations between columns areused.

is computed as follows:

(







)(







)

(







)



(







)

;

where





denotes the average valueof the

th column. Since





is always

and the variances

(







)

j

are

can also be computed by

j



3. Let



and



be the first two eigenvalues

and let

be the

corresponding eigenvectors. The first new axis

is then definedby



and the second axis



4. The only problem left is the calculation of the new coordinates for the

objects accordingto the2-dimensional coordinatesystem

. In[82]

5For an introduction to eigenvalues and eigenvectors, the reader may refer to [100, 181].

4.2. FACTOR-BASED DRAWING

a fast method which, unlike many other methods, does not rely on all

eigenvectors has been introduced: The matrix

= (



)





defines the new coordinates of the objects and thus the graph layout



In the implementation used for the STRUCTUREMINER system, the power

method (see [100, 181, 68]) has been applied to compute the first two eigen-

values and eigenvectors. This approximative algorithm identifies one eigen-

value/vector pair after the other. Since only the first two eigenvalues/vectors

are needed in the last step of the algorithm, the usage of such algorithms is

recommended.

Figure4.6:

Agridlaidout byfactor-baseddrawing (leftside)andby all-pairfactor-based

drawing.

One typical problem of factor-based layout is shown on the left side of fig-

ure 4.6: The layout of the



grid seems to be crumpled. This is mainly

because the original matrix

only comprises local information. No global

information keeps the algorithm from placing two unneighboured nodes too

close to each other. Factor-based drawing now tries to find a compact layout,

thus leading to a “crumbled”layout. Most humans want a differentlayout, i.e.

they want the original distances between nodes in the graph to be preserved.

This can be carried out by computing the matrix

in the first step of the al-

gorithm 4.2.1 differently:

is set to

dist

(

)

(

dist

is defined in definition

B.0.4). The result can be seen on the right side of figure 4.6. This modified

version of algorithm 4.2.1 is called all-pair factor-based drawing.

The main advantages of this method are:



No assumptions about the graphs are made.



Since global featuresareused, the layout emphasises the global structure

ofthegraphsverywell. Becauseofthemaximizationoffeaturevariances,

distance proportions in the graph are emphasised.

Figure

4.7:

A cycle visualized using all-pair factor-drawing

This is a major advantage compared to force-directed layout methods:

Force-directed drawing uses only local information to find an optimum

CHAPTER 4. STRUCTURE ENVISIONING

layout, i.e. it uses edges and edgelengths. This works well for symmetric

graphs (e.g. grids) since the global structure of the graph is mirrored in

its localstructure. Most graphsfromrealisticdomains asused in chapters

6 and 7 are non-symmetric.

A simple example can be seen in figure 4.7: A cycle with

200

nodes is

visualized using all-pair factor-drawing. This method always finds the

optimum layout.

Force-directed methods have problems laying out cycles, because the lo-

cal structure is different from the global structure. Tunkelang’s version

of force-directed drawing only finds in

tries a planar layout;

in most cases extra loops were generated (see left side of figure 4.8).

Such loops correspond to local minima of the error function optimized

by force-directedlayout (see also section 4.1.1). Even if a planar layout is

found, the layout is suboptimal; an example can be see on the right side

of figure 4.8.

Figure 4.8:

A cycle visualized using Tunkelang’s version of force-directed layout



The computed solution is always optimal with regard to the explained

variances.

Some disadvantages are:



It is not very fast: Step 1 of algorithm 4.2.1 needs

(

)

time and step

2 and step 4 have a complexity of

(

)

. The power-method used in

this implementation for thecomputation of eigenvaluesand eigenvectors

converges with the same rate as

(



)

, where

denotes the number

of iterations (see [181]). The reader may note that this method is at least

as fast as typical force-directedapproaches.

The following theorem ranges the run-time behaviour in:

Theorem 4.2.1

If thelayout criterion “Nodedistancesin the final layoutshould resemble

the original node distances in the graph” is reasonable, then any satisfy-

ing graph layout technique must have at least the same run-time com-

plexity as the all-pair-shortest-pathesproblem

Proof.

Since these distances are initially unknown, any graph drawing method

must compute them. Algorithms which rely on local information only

(e.g. force-directed methods) can not abide by the criterion from above.

Algorithmsasthelevel-orientedmethodsrelyonspecialknowledgeabout

node distances.

6The well-known Floyd-Warshall algorithm needs

(

)

time.

4.3. COMBINING CLUSTERING AND GRAPH DRAWING



Edges have no uniform lengths and edge crossings are not minimized.



Since rows do not need to be different, nodes may be placed at the same

position.

The problems of equally placed nodes and of non uniform edge length can

be overcome by using an incremental spring-embedder (see section 4.1.1) as

a post-processing step: This special spring-embedder uses the node positions

computed by the factor-basedlayout algorithm as starting positions. Just like a

normal spring-embedder, nodes are moved according to attraction and repul-

sion forces. Unlike a normal spring-embedder, the amount of node movement

is restricted, i.e. nodes are only allowed small position changes. Furthermore,

the amount of movement allowed decreases over time. Such an algorithm re-

sults in small local position changes while the overall layout remains unal-

tered; hence combining the advantages of spring-embedder (local optimiza-

tion criterion) and factor-based drawing (global optimization criterion). Fig-

ure 4.9 gives an example. The reader may note that (i) the drawing on the left

side (pure factor-based layout) shows precisely the tube’s two most important

dimensions and (ii) the drawing on the right side (factor-based layout com-

bined with an incremental spring-embedder) is a 2-dimensional drawing. No

3-dimensional information has been used.

Figure 4.9:

An all-pair factor-based layout of a 3-dimensional tube (left side), obviously

nodes have been placed very closely together. The drawing on the right side shows

the same graph after an incremental spring-embedder has been applied as a post-

processing step.

4.3 Combining Clustering and Graph Drawing

The cluster information from section 2 still has to be integrated into the graph

drawing processin orderto implement the methodology of structure-basedvi-

sualizationfromsection 1.3. Thefollowing detaileddescriptionof thestructure-

based visualization process illustrates the exploitation of the computed struc-

Figure 4.10: Example Step 1 Figure 4.11: Example Step 2

CHAPTER 4. STRUCTURE ENVISIONING

ture information for the graph layout process. Later on, necessary extensions

to the layout methods are explained.

1. Structure Identification: As described in chapter 2, the graph’s inher-

ent structure is identified, resulting in a (hierarchically) clustered graph.

This is illustrated using a small exemplary graph: The graph in figure

4.10 depicts a small traffic graph. This graph is clustered (e.g. using MA-

JORCLUST, see section 2.2.1 for details). The result can be seen in figure

4.11, i.e. the clustering tree from definition 2.0.5 is depicted. Three clus-

ters have been identified.

2. Structure Classification: The cluster are classified, i.e. each cluster is

described by a label. Figure 4.13 shows the classified clusters.

3. StructureEnvisioning: The graph is laid out with respectto the structure

information. For this, three steps are necessary:

(a) Size Estimation: For each cluster a bounding box is estimated. Clus-

ters arelaid out within their bounding boxes. The size of the bound-

ing box is estimated from the bottom up: For each node, the size is

given. This size may be seen as the bounding box of the node.

Clusters comprising only clusters or nodes, whose bounding boxes

are already known, compute their bounding box size by merging

the individual bounding boxes of their subclusters. Normally, the

resulting box size is multiplied by a constant value to allow for an

uncongested layout and to support the layout process. This step can

be seen in figure 4.14.

(b) Layout: The clustered graph is laid out from the top down: The

graph comprising the top clusters (i.e. the graph’s induced struc-

ture, see definition 2.0.4) is laid out using a given graph drawing

method. The bounding boxes of the clusters are the nodes of this

graph. Laid out bounding boxes (i.e. clusters) can be seen in figure

4.14.

are laid out. For this, the substructure becomes the new input for

the layout algorithm and the process recursively goes to step 3b. In

our example, the substructures consist of the original nodes, the laid

out substructures of the three top clusters are shown in figure 4.15.

One problem remains, namely that substructures can not be totally

independently laid out. If two connected nodes belong to different

clusters, they should neverthelessbe placedtogether as close aspos-

sible. I.e. if a substructure is laid out, edges to other substructures

have to be considered. An example can be seen in figure 4.12 where

the nodes

and

are in different clusters.

should be placed on

the right side of the left cluster and

on the left side of the right

cluster.

(d) Post-Optimization: In some cases, a computed graph layout does not

fit into the given bounding box. In these cases, two post-optimi-

zations are possible: (i) The bounding box uses free space in its

neighbourhood for enlargement. (ii) If no free space is available,

the graph layout is scaled to fit into the bounding box.

4.3. COMBINING CLUSTERING AND GRAPH DRAWING

Figure

4.12:

The dependency between substructures

In order to implement the methodology described above, the graph layout

methods have to be extended in three aspects:

1. Bounding Box: Since each layout happens within the bounding box of

the respective parent cluster, the layout algorithms have to incorporate

the possibility to define maximum and minimum x- and y-coordinates.

2. Integration of external Attractions: As described in step 3c, some nodes

are connected to nodes in other clusters. These connections result in an

attraction to other clusters. In order to simplify this problem, only four

external attraction are allows for a node: to the left, to the right, to the

top, and to bottom of its bounding box.

3. Spatial nodes: Nodes are now defined by a bounding box, i.e. they are

not dimensionless quantities anymore, but geometric entities. The graph

drawing algorithms have to be able to work with such nodes.

These extensions have been implemented for STRUCTUREMINER as fol-

lows:



Level-orientedLayout:

1. Bounding Box: Only the methods from step 3d are used.

2. Integration of external Attractions: Attractions to clusters above or

below the current cluster can be implemented by assigning a node

to a top or bottom level. Horizontal attractions are taken into con-

sideration by moving a node to the left or right of its level.

3. Spatialnodes: Spatial nodes can be integrated by using appropriate

distances between layers (each layer has the height of its highest

node) and by leaving enough space between nodes within a layer.



Force-directed Layout:

1. Bounding Box: Additionally to the methods from step 3d, nodes

which leave the bounding box are placed in the next step randomly

within the bounding box again.

2. IntegrationofexternalAttractions: Attractionsto nodeswithinother

clusters are implemented by additional attraction forces.

Cluster A Cluster B Cluster C

Figure 4.13: Example Step 3

Cluster A

Cluster B

Cluster C

Figure 4.14: Example Step 4

CHAPTER 4. STRUCTURE ENVISIONING

Cluster A

Cluster B Cluster C

Figure 4.15: Example Step 5

Cluster A

Cluster B

Cluster C

Figure 4.16: Example Step 6

3. Spatial nodes: Spatial nodes can be implemented by modifying the

optimal edge length: Bounding box diameters are addedto the edge

lengths.



Circular Layout:

1. Bounding Box: Only the methods from step 3d are used.

2. Integrationof externalAttractions: External attractions can be inte-

grated into the circular layout. The general idea is as follows: Each

node in the tree summarizes its own attractions and the attractions

of all its descendants. Since attractions can be seen as vectors (e.g.

;

for an attraction to the right), summarizing attraction means

summarizing vectors. The root node is always placed in the center

of the plane, its descendants are positioned on the inner circle ac-

cording to their summarized attractions. Nodes deeper in the tree

are also placed according to their attractions. Since they also should

be placed close to their parent node, a trade-off between external

attractions and attraction to their parent node has to be found.

3. Spatial nodes: This can be implemented by choosing appropriate

diameters for the concentric circles.



Factor-based Layout:

1. Bounding Box: Only the methods from step 3d are used.

2. IntegrationofexternalAttractionsandSpatialnodes: Thisdemands

aremet by applyinga force-directedalgorithms asa post-processing

step. Theforce-directedalgorithmsimply usestheresultofthefactor-

based layout as its initial starting point. Details can also be found in

section 4.2.

4.4. EVALUATION

4.4 Evaluation

As outlined in section 1.7, the methods described in this chapter are rated by

the four features introduced at beginning of chapter 1. This allows for an as-

sessment of whether or not the AI-visualization paradigm has been followed.

1. Object-basedPresentation: All layout techniques presentedin this chap-

ter depict objects as clearly recognizable geometric entities.

2. ConstantDimensionPresentation: Alldrawing methods reduceagraph

to a 2-dimensional layout. Only the techniques of force-directed draw-

ing, multidimensional scaling, and factor-based drawing are suited for

general graphs. The new technique of factor-based drawing especially

supports a fast and deterministic drawing of graphs. Other layout meth-

ods, e.g. level-oriented drawing, orthogonal drawing, or tree drawing

have been developed mainly for special graph classes.

Common weaknessesof allalgorithms, especiallyrun-time problemsand

an insufficient emphasis of the graph’s structure, have been partially

overcome by the combination with clustering methods. For this, exten-

sions of all drawing methods have been described in section 4.3.

Comparing layout methods leads to the same problems as the compar-

ison of clustering algorithms (see section 2.4): No generally accepted

quality measure exists. Thus, only two ways remain to carry out such

a comparison: (i) Theoretical features can be compared which has been

done in this chapter. (ii) The layout methods can been evaluated using

real-world problems: chapters 6 and 7 describe six applications.

500

1000

1500

2000

2500

3000

3500

0 500 1000 1500 2000 2500

|E|

sec

Eades Tunkelang Factor-based Pair-Factor

Figure

4.17:

A run-time comparison of different layout methods.

The same method as in section 2.4 is used here to compare the run-time

behaviour7of layout algorithms:

typical graphs from all six domains

are visualized and rated. Figure 4.17 depicts the run-time of accept-

able layouts over the number of edges. The reader may note that (i)As

expected, Eade’s force-directed layout algorithm is much slower than

Tunkelang’s. (ii) The run-time of the factor-basedalgorithms is less influ-

enced by the number of edges than the other methods. (iii) Circular and

7The tests have been carried out on a Pentium II, 400Mhz using CLisp.

CHAPTER 4. STRUCTURE ENVISIONING

level-oriented layout methods are not shown since their run-time never

exceeds one minute.

The layout quality is compared for each domain separately in chapters 6

and 7.

3. Abstraction: In most cases, a 2-dimensional layout of a graph comprises

less information than the original graph. This is mainly because the Eu-

clideandistanceson theplanedifferfromtheedgeweightsandthelayout

now defines distances between previously not connected nodes. The loss

of information can be seen clearly when layout techniques such as mul-

tidimensional scaling or factor-based drawing are used. Both methods

use an explicit loss function (the optimization function for multidimen-

sional scaling or the percentage of explained variances for factor-based

drawing).

Loss of information is connected to data abstraction: Important infor-

mation is emphazised while less important information is lost. In this

sense, graph layout may be seen as an abstraction technique. The general

information about node relations is preservedwhile the precise informa-

tion about edge weights is lost. Factor-based drawing controls the loss

of information by preserving as much variance as possible. It therefore

provides a theoretically well-founded means of data abstraction. Force-

directed methods only use local information, thereforedisregard the loss

of global information.

4. User Adaption: User adaption in the field of graph drawing mainly

means the choice of the proper drawing method. For some graphs, a

level-oriented technique might be optimal while other graphs or sub-

graphs can best be laid out by a force-directed method or even by a spe-

cially developed algorithm. Section 3.1.1 elaborated on a method for the

automatic choice between drawing methods.

The classification of drawing methods given in this work may also help

in manually choosing the correct layout algorithm.

Chapter 5

Creating Graphs from Tabular

Data

In this chapter, methods are presented which take a tabular data set (see defi-

nition 1.5.1) and compute a corresponding visualization graph (see definition

1.4.1). First, general concepts for defining object similarities are described. Ex-

isting algorithms for learning similarity functions are the subject of section

5.2.1. Section 5.2.2 introduces new methods for learning object similarities.

These methods use abstract user interfaces and machine learning to bridge the

gap between implicit expert knowledge and explicit knowledge as necessary

for the usage by a computer system.

Dataminingorknowledge discoverytriestofindcoherenciesandunknown

relationships between objects in large setsof data. In this sense, the graph anal-

ysis techniques introduced and described in this work should be looked at as

data mining methods. Nevertheless, in many cases, the data is not given as a

graph, but as a tabular data set (see definition 1.5.1).

For many data mining tasks, the general goal of the analysis is known, e.g.

when investigating the influence of people’s youth on their later lives. More

precisely, users normally have an object similarity category (people’s youth)

and an investigation goal (an assessment of people’s later life, e.g. their in-

come). The underlying visualization assumption is that the object similarity

category influences the investigation goal.

The object similarity category defines the similarity between objects, e.g.

people with similar youths are similar. The visualization should express this

similarity between objects by means of spatial relations in the visualization,

e.g. people with similar education, citizenship, and gender should be placed

closely to each other, since their youths were probably similar.

Using such a visualization, the user is often able to recognize structures in

the data, i.e. if the visualization assumption holds, spatial patterns of objects

should correlate with the investigation goal. The reader may note that the task

of identifying unknown patterns in the visualization is left the the user. Thus,

the distinctive human ability of visual pattern recognition is exploited. This

general methodology can also be seen in figure 5.1.

Many existing visualization and data analysis systems do not support the

definition of object similarities. Instead, given object features are directly in-

CHAPTER 5. CREATING GRAPHS FROM TABULAR DATA

f1 f2 f3 f4 f5

1 2 y 3 5.5

2 2 n 6 7.7

1 2 y 5 1.2

Object

Similarities

Visualization

Investigation

Goal

Object

Similarity

Category

Data Analysis

Pattern Recognition

by User

Figure

5.1:

The general methodology for the visual analysis of tabular data: Using a

given object similarity category, object similarities are found and a graph is

constructed. This graph is visualized. The user can now try to find

coherencies between the visualization result and the investigation goal.

terpreted as coordinates in an Euclidean space1. Thus, object similarities are

defined implicitly by the corresponding Euclidean distances. Such techniques

must not be understood as different from the method described above; the

responsibility for the definition of object similarities is shifted only to a data

preprocessingstep. The given Euclideandistances implicitly defineobject sim-

ilarity categories.

Therefore, no visualization technique for tabular data is possible where the

user has not defined object similarities. Either he scales the data in such a way

that the resulting features may be viewed as Euclidean coordinates (implicitly

defining object distances), or he specifies the object similarities explicitly. Eu-

clidean similarity definitions are highly sensible to feature scaling, e.g. choos-

ing meters or feet as a unit for height influences the object distances. For these

reasons, this sections deals with the direct definition of object similarities by

means of functions.

Humans normally find it hard to define object similarities explicitly in the

form ofa mathematical function formost datasets andanalysis tasks. In simple

cases, e.g. when the similarity between people equals the difference between

their incomes, a similarity function can be stated easily. But for most cases,

the process of transferring a fuzzy understanding of “similar” into an explicit

mathematical function overtaxes the capabilities of most users, hence making

theapplicationofgraph-basedanalysisandvisualizationmethodsdescribedin

this work impossible. Therefore,this section introducesknowledge acquisition

methods for learning a similarity measure which are based on a combination

of high-level user interfaces and Machine Learning.

1For this, non metric feature are transformed into metric feature.

Similarity functions are now defined formally:

Definition 5.0.1

Let

V; E

;

;Æ;w;

)

be a tabular data set. As defined in definition

1.5.1,

(

) = (

(

)

;:::;f

(

)

V; p

denotes an object feature list, and

(

)

(

;

(

;

)



denotes the feature

of node

. A function

sim

: 









is called a similarity function for

if and only if

sim

(

)

;Æ

(

)) =

sim

(

)

;Æ

(

))

Sometimes similarity functions are defined more restrictively (e.g. in [41,

75]). Such functions will be called strict similarity functions:

Definition 5.0.2

Let

= (

V; E

;

;Æ;w;

)

be a tabular data set and let

sim

be a sim-

ilarity function for

sim

is called a strict similarity function if and only if

sim

(

)

;Æ

(

)) = max

sim

(

)

;Æ

(

))

(

)

For reasonable similarity functions, the following holds: Let

V; E

;

;Æ;w;

)

be a tabular data set, let

and let

sim

be a similar-

ity function for

. Then the statements “

sim

(

)

;Æ

(

))

>sim

(

)

;Æ

(

))

”

and “

is more similar to

than

” are equivalent.

Parenthesis: Sometimes not the similarity but the dissimilar-

ity between two objects is defined. Such a function is called

dissimilarity-function. If for a dissimilarity-function

dist

the

property

dist

(

a; c

)



dist

(

a; b

dist

(

b; c

)

holds for all objects

a; b; c

, it is called a distance function. Dissimilarity functions

and similarity functions can be transformed into each other

(e.g. by the operations



). Details can be found in [41].

When the similarity function is known, the corresponding visualization

graph can be constructed: For this, all nodes are connected by an edge and

each edge is weighted by the objects similarities.

Let

V; E

;

;Æ;w;

)

be a tabular data set and let

sim

be a simi-

larity function for

. A corresponding visualization graph

;Æ

;



)

can be constructed as follows:



del ta



sim





=

2For asymmetric similarity functions, i.e. functions where

sim

(

)

;Æ

(

))

sim

(

)

;Æ

(

))

, see [26]

CHAPTER 5. CREATING GRAPHS FROM TABULAR DATA

This graph can now be visualized using the method of structure-based vi-

sualization (see section 1.3).

In the rest of this section, the problem of defining a similarity function is

addressed. Since, as mentioned before,a manual definition is in most cases not

possible, knowledge acquisition methods are used to make the user’s implicit

understanding of the domain explicit. For this, machine learning is used to

narrow the gap between abstract and fuzzy expert knowledge and explicit and

precise similarity functions.

5.1 Similarity Functions

Much work has been done in the last decades on similarity functions; good

overviews can be found in [138, 41, 75, 180]. This section will introduce some

basic concepts from similarity functions and will concentrate on one class of

similarity functions which is suited especially for the applications presented

in section 7: weighted similarity functions. The next section will focus on the

automatic learning of these similarity functions.

In section 5, tabular data sets have been defined formally; each node (or

object)

has been described by a set of features

(

) = (

;:::;f

)

. Nodes

correspond to rows and features correspond to columns in a typical tabular

representation of tabular data sets (see table 1.1). Similarity functions measure

the similarity between the features of two nodes. These functions are often

classified according to the type of features they can handle:

☞Cardinal: We call a features cardinal if and only if all values of the fea-

ture are real numbers. Typical cardinal features are height, temperature,

or distances. Values of cardinal features can be added, subtracted, multi-

plied and divided and the result is still a reasonablevalue for the feature.

☞Ordinal: The only operation on values of ordinal features are the func-

tions

. School-grades or positions in a company hierarchy are typ-

ical examples. Adding or subtracted values of ordinal feature does not

make sense.

☞Nominal: Nominal values can only be compared for equality. Name or

profession of a person are nominal features. If a nominal feature has only

two possible values (e.g. gender of a person), it is called a binary feature.

Nominal Features

If all features of two given nodes are binary, only four combination of pos-

sible value comparisons exist:

;

, and

. The number of value

pair occurrencesare summarized,

denotes the number of

combinations,

the number of

combinations,

the number of

combinations, and

the

number of

combinations (see also table 5.1).

Several authors have combined the values

a; b; c

and

and formed simi-

larity functions, e.g. the Jaccard coefficient

or the matching coefficient

. Details and further similarity functions for binary features can be

found in [155, 25, 55].

Nominal featureswith more thantwo values can be reducedto a set of new

binary features: Let

be a nominal features with possible values

;:::;a

5.1. SIMILARITY FUNCTIONS

values 1 0

a c a

b d b

b c

Table 5.1: Possible combinations for binary feature values

Then

newfeatures

;:::;f

arecreatedandthe following relation between

and

exists:



For some applications (see section 1.6), it may make sense to treat binary

features as metric features with only two values.

Ordinal Features

Ordinal features are normally either transformed into cardinal or nominal fea-

tures, details can be found in [41, 62, 138].

Cardinal Features

Similarity between cardinal features are often defined by the so-called



Minkovski-Metrics:

Let

and

be two objects with corresponding features

(

)

;:::;f

(

)

and

(

)

;:::;f

(

)

. Then the Minkovski-Metric is defined as:



(



(

)



(

)

. For

= 2

, we get the well known Euclidean distance (only with an

extra “



”). Other cardinal similarity functions and information about mixing

different types of features can be found in [41, 62, 138, 6].

Weighted Similarity

In most cases, weighted similarities are used. These functions first calculate

the difference between individual pairs of features. The overall similarity is

createdby weighting the featuredifferencesand by summarizing the resulting

weighted differences. A typical example is the following functions:

Example 5.1.1 (Weighted Linear Similarity Function of Interaction Level 1)

Let

(

(1)

;:::;f

(1)

)

;

(

(2)

;:::;f

(2)

)

be two feature vectors and let all features be car-

dinal. Then the weighted linear similarity function of interaction level 1 is defined

by:



j

(1)



(2)

This is one of the most commonly used similarity functions.

Since all the functions in this section use individual feature differences, it

makes sense to define a so-called difference vector:

Definition 5.1.1 (Difference Vector

)

Let

(

(1)

;:::;f

(1)

)

;

(

(2)

;:::;f

(2)

)

be two feature vectors. The vector

(

)

;Æ

(

)) = (

(1)



(2)

;:::;f

(1)



(2)

)

;

where



denotes the appropriatedifferenceoperator between the features(de-

pending on their type, e.g. for cardinal features



j



), is called the

difference vector

CHAPTER 5. CREATING GRAPHS FROM TABULAR DATA

Since in many cases, dependencies between elements of the difference vec-

tor exist, it makes sense to introducethe following class of similarity functions.

These functions createadditional features by multiplying the original features,

e.g. a combined feature

income



g ender

is created by multiplying the origi-

nal features income and gender.

Definition 5.1.2

(Weighted Linear Similarity Function of Interaction Level



)

Let

V; E

;

;Æ;w;

)

be a tabular data set and let

sim

be a similarity

function for

sim

is called a weighted linear similarity function of interaction

level



(



)

if and only if it has the form:

sim

(

)

;Æ

(

)) =





(



)

;





where

(

) = (

(1)

;:::;f

(1)

)

;Æ

(

) = (

(2)

;:::;f

(2)

)

V; p

and

(

)

;Æ

(

)) = (

;:::;d

)

denotes the difference vector of

and

Remark 5.1.1



is called the interaction level because a weighted linear similarity

function of interaction level



takes interactions between





features of the difference

vector into consideration.

Example 5.1.2 (Weighted Linear Similarity Function of Interaction Level 2)

Let

(

(1)

;:::;f

(1)

)

;

(

(2)

;:::;f

(2)

)

be two feature vectors and let all features be car-

dinal. Then the weighted linear similarity function of interaction level 2 is defined

by:



j

(1)



(2)



j

(1)



(2)

Such a function has been used in section 7.1.1.

Remark 5.1.2 Thereadermaynotethatlearning suchweightedfunctionsmeansfind-

ing values for the parameter



5.2 Learning Similarity Functions

Especially in the field of Case-Based Reasoning (see section 3.2), several meth-

ods for learning weighted similarity measures have been developed. A short

overview is given in section 5.2.1. New methods developed by the author are

presented in section 5.2.2. These methods provide high level interfaces to sup-

port the knowledge-acquisition process.

The methods introduced in section 5.2.2 are applied to the problem of doc-

ument retrieval and document management in section 7.1.1.

Parenthesis: All methods for learning similarity measures

can also be used in the field of Case-Based Reasoning (see

chapter 3.2). Similarity measurearethereused to find already

known solutions for new problem descriptions.

5.2. LEARNING SIMILARITY FUNCTIONS

Name Type Remarks Literature

EACH reinforcement-

learning extra

parameters

needed

[146]

RELIEF reinforcement-

learning binary weights [95]

CCF statistical only binary

features [28]

GM-CDW statistical [69]

Table 5.2: Some existing methods for learning similarity measures

5.2.1 Existing Methods

Methods forlearning similarity measurescan bedivided into two mainclasses:

(i) Methods using Reinforcement-Learning and (ii) Algorithms relying mainly

on statistical analysis. Reinforcement-Learning methods normally predict a

similarity and the user or a different system rates the prediction. Based on the

rating, the weights are adjusted. Statistical methods analyze given examples

and deduceappropriateweights. Table5.2gives some examplesof wellknown

methods. Other examples can be found in [16, 4, 157].

All these methods have in common that the knowledge acquisition step

(identifying exemplar feature vectors including similarities) and the learning

step (using the examples to learn weights) are not treated separately. The new

method described in the next section differentiates between the two steps.

Notseparatingthe knowledge-acquisition stepfromthe learningstepyields

several problems:



Since the expert is integrated into such methods in a predefined way,

no flexibility is possible in the way in which the knowledge is obtained.

Hence, most well-known knowledge-acquisition procedures cannot be

applied.



Most existing methodsrely onwell-known basic learning paradigms,e.g.

on reinforcement-learning. Nevertheless, they do not reducethe learning

problem to well-known learning algorithms (e.g. regression, neural net-

works) which realize those paradigms. Instead, they use specially devel-

oped proprietaryalgorithms. While for most known learning algorithms

advantages and disadvantages have been examined, almost nothing is

known about the applied proprietary algorithms.



Verifying such algorithms is difficult, because learning problems cannot

be distinguished from knowledge-acquisition problems.

5.2.2 New Methods

Stein and Niggemann presented in [162] a general methodology for learning

similarity measures. Figure 5.2 shows this framework: Two main steps are

identified, a knowledge acquisition step and a learning step.

As mentioned before, the definition of object similarities relies mainly on

subjective matters: (i) What is the purpose of the visualization? Often the same

CHAPTER 5. CREATING GRAPHS FROM TABULAR DATA

δ(v1), δ(v2), sim(δ(v1), δ(v2))

δ(v3), δ(v4), sim(δ(v3), δ(v4))

δ(v5), δ(v6), sim(δ(v5), δ(v6))

. . ...

Knowledge

Acquisition

Step

Learning

Step

w1=1.45

w2=0.02

w3=0.4

........

Figure

5.2:

A General Methodology for Learning Similarity Measures

data has to be visualized in differentways in order to help with differentques-

tions. (ii) Who is using the visualization tool? Different experts often have

different views of a domain. (iii) Different data sets often need different simi-

larity measures. It is thereforedesirable to use knowledge acquisition methods

which allow for a fast and reliable definition of similarity measures.

The knowledge acquisition step uses existing knowledge sources (experts,

databases, etc.) to get the necessary knowledge. This step always results in a

set of feature vector pairs whose similarity is known (see also figure 5.2), e.g.

(

)

;Æ

(

)

;sim

(

)

;Æ

(

))

;

(

)

;Æ

(

)

;sim

(

)

;Æ

(

))

;:::

The second steps uses these rated vector pairs and applies a supervised

learning strategy to find values for the weights

. In this work, regression

and neural networks have been mainly used (seeappendix A). The introduced

general framework allows for the application of standard learning techniques.

Normally only a small (but typical) subset of the data set, called a learning

set, is used to learn the similarity measure. The computed similarity measure

is then used for a larger set of data. By using separate learning sets, two main

problemsarise: (i) The question of whether or not a givenlearning set is typical

for a domain has to be answered for each domain separately. (ii) All applied

learning algorithms have to abstract the small learning set in a reasonable way.

The reader may note that the first problem concerns all learning processes; the

data used for the learning must be correlated to the data used later on. The

second problem is discussed in section A.

The next three sections introduce techniques for the knowledge acquisi-

tion step. All these methods have been applied in section 7.1.1 to the complex

problem of document visualization and retrieval. For simplicity sake, a less

complex problem is used here to illustrate the example: Finding similarities

between dogs. Table 5.3shows information about 8 dogs; eachdog is described

by 8 features (2 binary feature,4 nominal features, 2 cardinal features).

Learning by Means of Given Similarities

This technique is quite trivial: The user provides the similarity for pairs of

objects. This can be done by presenting two objects to the expert. The expert

then assesses the similarity between this pair of objects, e.g. on a scale from

Such methods have several disadvantages:



objectsareusedforthe learningprocess,theexperthastorate



object pairs.



The similarities have to be comparable, i.e. the expert has to decide

whether object

is more similar to object

then object

. Such

decisions can be quite difficult.

5.2. LEARNING SIMILARITY FUNCTIONS

Name Race Age Colour Sex Height Exam. Character

Murmel Briard 1 black f 64 no friendly

Roma crossbreed 1 gold f 50 no friendly

Carlo Shepard 6 gray m 62 yes dominant

Keddy Terrier 7 wheat m 48 yes unfriendly

Aiko Briard 1 fauve m 62 no friendly

Mücke Briard 2 black f 64 yes touchy

Bajaro Shepard 2 gray m 68 yes friendly

Henry crossbreed 1 brown m 62 no sensible

Table 5.3: Example: Dog Data

In the field of statistics, the problem of assessing object similarity is well

known. Most methods3(e.g. ranking method, free sorting, anchor stimulus)

relyon an user who defines all object similarities, i.e. no learning or abstraction

mechanisms are applied. The reader may note that this method is used here

in a different way: The expert assess the similarity of

object pairs. These

assessments are used to learn a similarity measure. This similarity measure is

then used to compute the similarity between

n; n



pairs of objects.

Figure

5.3:

Example: Learning by Means of given Similarities

Normally, the user does not see the feature vector but the objects itself. In

our dogs example, the user would be confronted with a questionnaire similar

to figure 5.3.

Learning by Classification

For this method, the expert classifies the objects. The main assumption is that

two objects are similar if and only if they belong to the same class. Let

(

V; E

;

;Æ;w;

)

be a tabular data set and

be the classification

function.

is the set of all possible classes. The function

is normally given

by the user. The similarity

sim

is then defined by:

sim

(

)

;Æ

(

)) =



;

(

)

;

otherwise

;

with

For our dogs example, reasonable classes would be

dangerous, unknown,

harmless

hunting hound, tracking hound, family dog

. A possible ques-

tionnaire can be seen in figure 5.4.

3Good overviews can be found in [6, 17].

CHAPTER 5. CREATING GRAPHS FROM TABULAR DATA

Figure

5.4:

Example: Learning by Classification

This method has some interesting advantages:



classifications define

similarities.



Most experts have few problems classifying objects.



Severaldata setscome with a classification, e.g. the charactersof thedogs

are part of the given data set.

The main disadvantages are:



A disjunctive classification is sometimes difficult, e.g. not all dogs can be

associated with a race.

While in the learning setonly similarities

exist, learning algorithms as

regression result in similarity measures which can yield arbitrary similarities.

This is because learning algorithms abstract the given data.

Learning by Examples

The method presented now demands the least explicit knowledge from the

expertand was first introducedby the author andStein in [122]. The main idea

is to have the expert give some exemplary visualizations, analyze them, and

abstract a similarity measure. Since the visualization methodology used in this

work requires the user to know the investigation goal (see beginning of this

chapter), no additional explicit knowledge is demanded from the expert, i.e.

this method can always be applied.

Again let

V; E

;

;Æ;w;

)

be a tabulardata set (the learning set).

The expert now manually defines a graph layout (see definition 1.4.2), i.e. he

specifies a function





, which defines a two-dimensional position

for each object. Figure 5.5 shows an exemplary drawing for our dogs example.

The similarity of two objects

is now defined by:

sim

(

jj

;

where

jj jj

denotes the Euclidean distance between the position of

and

The following points distinguish this method:



The graphical definition of similarities is close to the mental model of the

user.



By placing

objects,

similarities are defined.

5.3. EVALUATION

Figure

5.5:

Example: Learning by Example

Some disadvantages exist:



Finding a good layout may overstrain the expert. This is mainly because

by placing one object, the distances to



other objects are defined. To

solve this problem, only distances between close objects could be used.

For this, the layout is first clustered, i.e. groups of closely related objects

are identified (see section 2 for an introduction to clustering techniques).

Three clusters can be seen in figure 5.5. Next, only distances to objects

within the same cluster are considered. The distances between all other

objects are defined by the cluster distances.



Implementing the necessary user interface may require some work.

5.3 Evaluation

As outlined in section 1.7, the methods described in this chapter are rated by

the four features introduced at beginning of chapter 1. This allows for an as-

sessment of whether or not the AI-visualization paradigm has been followed.

1. Object-basedPresentation: Allmethodspresentedtreatobjectsasatomic

entities.

4. UserAdaption: Thethreenewknowledgeacquisitionmethodspresented

above differ by the explicity of the knowledge demanded from the user.

Method 1 (Learning by Means of Given Similarities) requires relatively

complex knowledge about object similarities and the similarity of given

object pairs has to be correctly assessed.

Classifyingobjects (Method2: LearningbyClassification)islessdemand-

ing; instead of object pairs, only single objects have to be rated.

Method 3 (Learning by Examples) hardly uses any explicit knowledge;

the graphical definition used is very close to the mental model of the

user.

Experts rate the algorithms presented here as helpful and promising (e.g.

in [162, 122]). In section 7.1.1, these methods are evaluated using a real-

world problem.

CHAPTER 5. CREATING GRAPHS FROM TABULAR DATA

At the end of part I of this work, it can be said that the methodology of

structure-based visualization from section 1.3 can be implemented by a com-

bination of existing algorithms, new methods, and extensions to existing tech-

niques.

Part II

Implementing Visual Data

Mining

The methods presented in the first part of this work have all been imple-

mented and applied to various problem domains. In chapter 6, applications

are presented where the data is given in the form of graphs. Chapter 7 con-

centrates on domains which use tabular data, i.e. data where the similarities

between objects is unknown.

Chapter 6

Visualizing Models given as

Graphs

All four applications presented in this chapter have in common that the to-

be-visualized data is given in the form of a graph. Chapter 6.1 deals with

the visualizing of traffic in computer networks. Visualizing knowledge-bases

for resource-based configuration is the subject of chapter 6.2. In chapter 6.3,

the methodology of structure-basedvisualization is applied to visualizing pro-

tein interaction graphs. Visualizing these graphs is useful in the context of the

Human Genome Project. Chapter 6.4 focuses on the visualization of technical

drawings, especially hydraulic circuits.

The software system STRUCTUREMINER has been applied to all four do-

mains. It is described in detail in chapter 6.1, i.e. chapter 6.1 also serves as

an introduction to the capabilities and to the usage of STRUCTUREMINER. The

chapters 6.2, 6.3, and 6.4 focus on the aspects peculiar to the respective do-

mains.

6.1 Network Traffic

Modern computer networks are designed to allow for a fast and unimpeded

communication between computers. To guarantee a problem free operation,

the administrator is faced with the task of understanding the traffic. If the

traffic is well understood, the administrator can identify network components

constituting bottlenecks for the traffic (see section 6.1.2) or even detect critical

network situations (see [125]).

For the following reasons, understanding network traffic is difficult:

☞Even small local networks (LANs) consist of several hundred computers.

Large company or university networks often comprise several thousand

computers. These computers communicate with each other and with

computers outside the LAN. Furthermore, the amount of traffic between

two computers, its temporal distribution, and the used protocols vary

greatly. In order to help the administrator understand the traffic, auto-

matic abstraction of the data and a high-level visualization is necessary.

CHAPTER 6. VISUALIZING MODELS GIVEN AS GRAPHS

☞Almost nothing is known about the structure of the traffic beforehand.

Terms like “strong traffic,” “weak related subnets,” or “strong internet

user” have a totally different meaning in differentcontexts.

☞Much information about the traffic can not be concluded from static in-

formation (e.g. the traffic of an hour or a day), but the temporal variation

of the traffic structure must be examined.

☞Because of new users, new technologies, and new scopes of duties within

the organization, the overall structure of the traffic can rapidly change.

Desktop-System

Router

Desktop-System

Router

Desktop-System

Router

Internet

Desktop-System

Desktop-System Desktop-System

Desktop-System

Router Konzentrator

Desktop-System

Figure

6.1:

Transforming network traffic into a graph

To addressthese problems, an abstractcommunication model is formed au-

tomatically: Every active network component (PC, router, switch etc.) forms a

node in a so-called traffic graph. If and only if two nodes (i.e. network compo-

nents) communicate with each, they are connected with each other by an edge

which is weightedby the amount ofcorresponding traffic. Figure6.1 illustrates

this: On the left side, the network topology can be seen; each edge models a

direct communication link. In the middle figure, network components are un-

derstood as nodes of a graph. The edges of the right-handed graph are com-

munication links and this graph must be visualized in order to enhance the

user’s understanding of the network traffic.

Formally, a networkis mapped onto avisualization graph

V; E

;Æ;

)

(see definition 1.4.1) as follows:



is the set of network components.



is the set of communication pairs in the network, i.e. every edge in

connects two components which communicate with one another..



is normally identical to

. Sometimes

represents the network

topology, i.e. each edge in

represents a direct connection (e.g. a cable)

between two network components.



(

)

is the IP address of node



=

;:::;

;

g[f

;

(

;

)



(

)

is the amount of traffic between

and

This graph is now visualized using the method of structure-based visual-

ization presented in chapter 1.3.

The user can choose between several clustering methods for step 1 (Struc-

ture Identification), allowing him/her to evaluate the applicability of cluster

methods to given graphs.

6.1. NETWORK TRAFFIC

Figure

6.2:

Using

STRUCTUREMINER

for Visualizing Network Traffic

Step 2 (Structure Classification) is implemented as follows: The user can

label clusters. When later on, a new cluster in the same network as before is

found, the label of the most similar known cluster is used. Typical clusters

are “Team 2”, “Department C”, or “Project A”. In most computer networks,

the clusters do only change slightly over a period of time; normally only a

few computers change their cluster membership. This classification approach

may be viewed as a form of case-based classification (see section 3.2). The

classification of traffic clusters is the subject of section 6.1.2.

For step 3 (Structure Envisioning), various graph drawing methods (see

chapter 4) can be applied. This allows the user to choose the most suitable

layout method for each graph or cluster separately.

Figure

6.3:

The Tree View of

STRUCTUREMINER

The analysis and visualization approach presented here is totally different

from the methods used so far: Most existing systems mainly show statistical

analysis of traffic data (e.g. average traffic on line A, lost packets between

router 1 and switch 2 etc.). Typical academic research can be found in [72, 71,

10], established commercial systems are e.g. CINEMA(Hirschmann), TREN-

DREPORTER (NetScout Systems), NETWORKAUDIT (Kaspia Systems), OPTIV-

ITY (BayNetworks), OBSERVER (NetworkInstruments), or SURVEYER (Shomiti).

CHAPTER 6. VISUALIZING MODELS GIVEN AS GRAPHS

6.1.1 Using STRUCTUREMINER

Figure6.2showsthe graphicaluserinterfaceof STRUCTUREMINER: Eachproject

consists of a tree view (on the left side) and of several graph views. The tree

view shows the structure of the graph. On the top level, the connected compo-

nents of the graph can be seen. The clusters are arranged below the respective

connected component. If subclusters exist, they can be found below their par-

ent cluster. In figure 6.3, a tree view can be seen in detail.

For each connected component, a graph view exist which is used to visual-

ize the graph. Figure 6.4shows a graph view: The separate clusters are empha-

sized by not showing edges between clusters and by bounding boxes. Figure

6.4 depicts the same graph, but now the edges between clusters (sometimes

called external edges) are drawn.

Figure 6.4:

Left side: The

STRUCTUREMINER

Graph View. Right side: Showing External

Edges

The user is often mainly interested in the graph’s structure: Figure 6.5

shows only the structure. Figure 6.5 depicts a random layout; the reader may

note that the structured approachimproves the comprehensibility of the graph

layout.

Figure 6.5:

Left side: The Graph’s Structure. Right side: A Random Layout

The recursive structure of the graph is not only visible in the tree view;

but also appears in the graph view: Figure 6.6 shows a top-level cluster: By

choosing “Fold-in” from its context menu, the subclusters are shown (figure

6.6).

6.1. NETWORK TRAFFIC

In STRUCTUREMINER, different cluster algorithms can be applied to each

connected component:

☞MAJORCLUST (see section 2.2.1)

☞Hierarchical MAJORCLUST (see remark 2.2.4)

☞Kohonen (see section 2.1.4)

☞MinCut (see section 2.1.2). This version of MinCut Clustering has been

developed and implemented in [111]: The iterative division of clusters at

their minimum cutstops, if the



-value(see section 2.2.2)of theclustering

worsens.

The step of cluster classification is explained in section 6.1.2.

Figure 6.6:

Left side: Before the “Fold-in” action, Right side: After the “Fold-in” action

After the clustering step, the graph can be visualized by applying one of six

layout methods:

☞Force-directed Layout (see section 4.1.1): STRUCTUREMINER offers two

versionsofforce-directedlayout, Tunkelang’smethod1andEades’sspring-

embedder.

☞Level-oriented Layout (see section 4.1.3)

☞Circular Layout (see section 4.1.4)

☞Factor-based Drawing (see section 4.2). An optional incremental spring-

embedder (see section 4.2) can be applied as a post-processing step.

☞All-Pair Factor-based Drawing (see section 4.2).

☞Random Layout

It is possible to choose the drawing method for each cluster individually.

This can be seen in figure 6.7 (the actual drawing method is marked). The new

drawing method can be applied recursively to the subclusters as well.

Further Features of StructureMiner

Further features of STRUCTUREMINER are:

☞Using a slider control, only edges with weights above a given threshold



are shown, i.e.

(

)

>

. This allows for the concen-

tration on important edges.

1Implemented in [178].

CHAPTER 6. VISUALIZING MODELS GIVEN AS GRAPHS

☞In a separate file, the user can associate name templates with colors, i.e.

node names matching the regular expression “controller*.pb*” (e.g. con-

troller2.pb3) may be associated with the color red. Nodes are then visu-

alized using the appropriate color.

☞The user can zoom into (and out of) the graph.

☞Several graphs can be visualized at the same time.

☞Graphs can be printed and exported (e.g. into the wmf-format).

☞Nodes and edges can be added, deleted and moved, i.e. STRUCTUREM-

INER works also as a graph editor.

Figure

6.7:

Individual Drawing Methods for each Cluster

Features especially developed for network administration are described in

section 6.1.2. J. Tölle from the Network Institute of Prof. Martini at the Uni-

versity of Bonn served as a network expert for section 6.1.2. He also provided

most of the knowledge concerning intrusion detection in [124]. The author

concentrated on the machine learning and visualization aspects.

6.1.2 Administration Tasks

The three following extensions to STRUCTUREMINER became necessary in or-

der to agree with demands from network administrators:

Classification

The administrator should be able to label clusters, typical labels are e.g.

“Project A” or “AI Lab”. When a new cluster is identified, it should automat-

ically be given the same label as the most similar cluster seen before, i.e. the

clusters are classified using a case-based approach (see section 3.2).

In orderto find a similar cluster, in case-basedclassification a so-calledsim-

ilarity measure is used. Here the similarity measure is a function which mea-

sures the similarity between two clusters (see 3.2 for details). When a new

cluster is found, it is compared to all previous clusters using the similarity

function. The label of the previous cluster which yields the highest similar-

ity (i.e. the highest value of the similarity function), is also used for the new

cluster. An automatically labeled traffic graph can be seen in figure 6.8.

6.1. NETWORK TRAFFIC

Group A

AI Lab

Unkown

Group B

Figure

6.8:

An automatically classified traffic graph

In the rest of this section, let

V; E

;;

)

be a traffic graph and let the

node label function



be a one-to-one function, i.e. the node labels are unique.

All node labels have a length of

log

, i.e.



(

:::l

with



;



. Furthermore let

num

: 

! f

;:::;



be an one-to-one

function mapping letters in



onto numbers.

Fornetworkadministrationpurposesthefollowingsimilaritymeasuremakes

sense:

Definition 6.1.1 (Cluster Similarity)

Let

and

be two clusters of the same graph

V; E

)

, i.e.



and

need neither be disjunctive nor belong to the same clustering (see

definition 2.0.1). Normally

denotes an already labeled cluster created by a

previous clustering, while

denotes a cluster found by a later clustering. The

similarity between

and

is used to decide whether the label of

should

also be used for

. Then the similarity function

sim

between two clusters is

defined as:

sim

(

max(

;

)

Remark 6.1.1 For administration purposes, clusters or groups of users are normally

identified by their nodes. Therefore its makes sense not to use the edges to define cluster

similarities.

Other functions are possible, e.g. functions that also take the non-similar

nodes into consideration.

Theorem 6.1.1

For this theorem, we assume that all clusters are subgraphs of a given graph

= (

V; E

)

(the computer network). Let

;:::;C

denote a set of

already labeled traffic clusters. A new cluster

can be labeled using the

similarity measure

sim

from definition 6.1.1 in

(

j



)

time where



= max

(



)

Proof. An algorithm is given which labels clusters in

(

j



)

time:

CHAPTER 6. VISUALIZING MODELS GIVEN AS GRAPHS

Figure 6.9: The Anima-

tion Graph Figure 6.10: Animation

Step 1

Figure 6.11: Animation

Step 2 Figure 6.12: Animation

Step 3

(1) It is assumed, that every node

is linked to a set



(

)

of clusters

which comprise



can be computed in a preprocessing step.



is equal to

max



(

)

(2)

(

)=0

(3)

(4)

then

(5)



(

)

(6)

(

)+1

(7) remember the cluster

best

with the highest value of

(8)

(9)

(10)

Initially,

(

)

time is needed to inform a node

whether it belongs

. Since

is a subgraph of

, ever node can be informed in

(1)

time.

This information is needed in step 4.

Step 3 causes

iterations. In each iteration, step 5 causes

(



)

executions

of steps 6 and 7. Since step 6 and 7 need

(1)

time, the overall run-time is

(

j



)

Animation

The administrator is also highly interested in understanding the change in

trafficover aday, a week, or ayear. Since the amount of changes makesa quan-

titativepresentationimpossible, a qualitativevisualization oftrafficchangehas

to befound. For this, the techniqueof structure-basedvisualization(see section

1.3) can be extended as follows:

The network trafficis now defined by aset of trafficgraphs which sharethe

6.1. NETWORK TRAFFIC

same nodes:

;:::;G

with



= (

V; E

(

)

(

)

. Each

graph represents the traffic over a given period of time (e.g. the graph

may

represent the traffic from 8am to 9am,

is the traffic from 9am to 10am etc.).

The method of structure-based visualization is now applied to the graph

, i.e. it is clustered and visualized. The edges are normally not shown. Col-

ors areused to highlight clusters. The user may note that since structure-based

visualization has beenused, clusters areemphasized bythe spatial distribution

of nodes. An example can be seen in figures 6.9 and 6.10 where the clusters are

representedby borderlines. The first figure shows the visualized graph

, the

second figure shows the same graph layout, but without the edges.

Figure 6.13:

Traffic animation with

STRUCTUREMINER

: Three clusters are visualized, the

clustering changes over time.

Nextthe graph

is clustered. This new clusteringis appliedto the oldlay-

out, i.e. all nodes keep their positions, but they may change their cluster mem-

bership. A new cluster membership results in a different node color/shape.

This process continues for the graphs

;i >

. Figures 6.11 and 6.12 depict

typical cluster changes.

Figures 6.13 and 6.13 show an example: Traffic changes recorded at the

University of Bonn throughout the course of a day.

Parenthesis: The reader may note that it is helpful for the

user if clusters keep their color/shape during the animation.

This may look like a simple problem, but it is essentially the

cluster classification problem described earlier in this section.

Because clusters change during the animation, it is necessary

to identify the cluster most similar to a new cluster from the

previous animation step.

Analyzing the Network Topology

primary distributors

secondary distributors

tertiary distributors

users

Figure

6.14:

Structured Cabling

Three differentviews of a computer network exist. Each view is defined by

a graph:

CHAPTER 6. VISUALIZING MODELS GIVEN AS GRAPHS

1. Cabling Graph

:Edges of this graph model cables. Usually, every

computer is connected to a so-called tertiary distributor by a single ca-

ble. The tertiary distributors are again connected to secondary distribu-

tors which are connected to a single primary distributor. Therefore, the

cabling forms a tree2(see figure 6.14). The reader may note that (i) dis-

tributor nodes may be connected by severalcables and (ii) the purpose of

a cable is unspecified.

Computers and distributors form the nodes of

and the edges are de-

fined by the tree structure. This view onto the network is called the ca-

bling or the physical topology of the network.

2. LogicalTopology

:The logical topology defines which computers can

communicate directly with each other. This view of a network may dif-

fer from the cabling, e.g. by connecting two cables to each other, tree-

like cabling can be used to form a ring. Unlike the cabling, the logical

topology also specifies which active network components (e.g. switches,

routers, computers) exist and how they are connected to each other. E.g.

a tertiary distributor node in

may comprise

switches and

hubs.

Active network components form the nodes of

and edges represent

direct communication links3.

3. Traffic Graph

:As defined above, nodes of traffic graphs are active

network components while edges model network traffic. Logical topol-

ogy and traffic graphs are also explained in figure 6.1.

Figure6.15:

Avisualized physical topology

(left side)and a superimposedclustering

of the traffic graph

Structure-basedvisualization cannow be usedto solve two typical network

problems:

1. Network Planning: Choosing a network topology

which takes the

given cabling

and the expected traffic graph

into consideration is

a major network planning task. Structure-based visualization is applied

as follows:

(a) Visualize

(b) Identify a clustering

2This type of cabling has beenstandardized in EN 50173 and ISO/IEC DIS 11801.

3I.e. direct links on layer 2 of the OSI model.

6.1. NETWORK TRAFFIC

over the visualization

(while keeping all node positions). An example can be seen

in figure 6.15.

The planer can now try to find a logical topology which allows for a fast

communication between nodes within the same cluster.

One typical problemconsists of grouping the set of computers connected

to a tertiary distributor. Computers in the same group are connected

using a single hub or a switch. Because of technical restrictions, only

a limited number of computers can be connected to a switch or hub.

Since communication within a group is faster than the communication

between groups, the administrator wants computers which communi-

cate with each other quite frequently (i.e. computers in the same cluster)

to be in the same group.

Parenthesis: Finding an optimal grouping of computers can

also be viewed as a k-means clustering problem (see sec-

tion 2.1.4). An alternative is to apply MinCut Clustering

(or Nearest-Neighbour clustering) and stop the division (or

union) step when the given number of clusters is reached.

2. Analysis of an Existing Logical Topology: STRUCTUREMINER can help

the network administrator to examine an existing logical topology. This

can be done as follows:

(a) Visualize

(b) Identify a clustering

over the visualiza-

tion of

(while keeping all node positions).

By comparing the structure of

(represented by the layout) with the

clustering

(represented by node colors/shapes), the adminis-

trator can see whether or not the existing logical topology still makes

sense. The more

corresponds to the layout, the more appropriate the

logical topology is.

Figure 6.16:

A network topology (left hand side) and the superimposed traffic clusters

(right hand side). The clusters are represented by differentnode shapes.

CHAPTER 6. VISUALIZING MODELS GIVEN AS GRAPHS

Figure 6.16 shows a typical example: The node groups at the top and at

the bottom of the screenshot are cut open by the traffic clusters (MAJOR-

CLUST has been used as a clustering algorithm).

6.1.3 Simulation

Regarding visualization and simulation as a symbiotic combination makes

sense for several domains: Whenever a simulation is done in order to present

the results to a user, both visualization and simulation must be examined to-

gether. Therefore, it is helpful to differentiate between two classes of simula-

tion tasks:

1. Internal Simulations: The results of these simulations are only used

within a larger algorithm and they are never shown to the user. A typ-

ical example is the field of configuration: Technical systems consist of

several small components. The configuration task is to combine com-

ponents in such a way that the system implements a given function.

Many systems propose a configuration and test it by using a simulation

(see e.g. [164, 158]). When the simulation shows a discrepancy between

the wanted and simulated behaviour of the system, the configuration is

changed. Another example is the model-baseddiagnosis of technical sys-

tems (e.g. in [165, 46]): A diagnosis is constructed by comparing the

normal and observed behaviours of a technical system. The normal be-

haviour is found by means of simulation.

2. External Simulation: These simulations are done with the sole purpose

of showing the results to the user. An example will be given in this sec-

tion.

External simulation has two natural connections to the field of visualization:

☞The simulation results must be visualized.

☞We presume that a user wants to simulate a given system and that the

simulation results are presented to the user. The system is modeled on

three different levels (see also figure 6.17): The user has a mental model

of the given system, i.e. the user thinks about the system on a specific

abstraction level. This system is formally defined on a modeling level

, which should be as close as possible to

is used as a modeling

level for the computer.

For many domains, the same modeling level is used for the presentation

of the simulation results. This makes sense, because

is an abstract

level close to

, but is also manageable by a computer. For some do-

mains, different modeling levels for the input and for the visualization

are possible.

The simulation is done on a modeling level

. The simulation result

is then transformed back onto the input and visualization level, resulting

. The user sees and comprehends

by transforming it into a mental

model

6.1. NETWORK TRAFFIC

Choosing a modeling level for

is normally a difficult problem, since

a simulation should be fast, simulate all important variables of the sys-

tem, should often be able to work with unprecise input (e.g. qualita-

tive simulation in [102]), and should also result in easily comprehend-

able data. The examination of the simulation/visualization combination

leads to some hints for the modeling level

: Since the input is trans-

formed from the modeling level

into

and the simulation result

is transformed back into

on the visualization level, it can be con-

cluded that visualization and simulation levels must be chosen in such

a way, that model transformations can be easily carried out. A possible

solution would be to choose two similar or equal modeling levels.

Inmanycases,userscarryoutmanualsimulationsonthelevel

. There-

fore, it often possible to choose

close to

Thesethoughts can besummarizedas follows: Wheneverpossible, asim-

ulation level close to a reasonable visualization level should be chosen.

Visualization

Mental Model

Simulation

M'' R

R''

Figure

6.17:

Model-transformations for the Visualization of Simulation Results

The restof this section presents a system for the simulation of network traf-

fic. This system has been developed and implemented by the author. While

the main focus of the description lies on the connection between visualization

and simulation, this section may also be worth reading for someone interested

in the field of network simulation.

The simulation system presented here has been developed to support the

planning of networks. In order to choose appropriate components for a com-

puter network, the planer has to know what the demands on the components

are. The demands aremainly defined by theexpected trafficin the network, i.e.

the more precise the traffic is known beforehand, the better the planning can

be. The subject of this section is the network simulation only; further aspects

of network planning and configuration are not described here.

Figure 6.18:

Left side: The topology of an example network, every edge represents a

physical connections between network components. Right side: The traffic connections

of the example network, each edge represents network traffic.

CHAPTER 6. VISUALIZING MODELS GIVEN AS GRAPHS

Existing simulation algorithms for computer-networks were mainly devel-

oped foran in-depth analysisof protocols andnetwork-configurations. Typical

systems employ discrete-event-simulation,e.g. in [127, 53, 144, 151]. However,

this work concentrates on the demands of a network planner or administrator:

a fast and understandable simulation of large networks. Furthermore, since

future network traffic as caused by the users is only vaguely known, in-depth

analysis takes second place to a reasonable processing of vague and abstract

terms of traffic-load.

Most existing algorithms pay for their accuracy with a slow run-time be-

haviour and with a dependency on precise inputs, making them inappropriate

for planning and administration tasks. Therefore, the system presented here

uses fast algorithms which can work with imprecise input.

Figure

6.19:

Choosing a Traffic Pattern

The simulation system is based on STRUCTUREMINER. One main differ-

ence is that now two sets of edge are used. One set

represents the logical

structure of the network (i.e. each edge connects two components which can

communicate directly with each other) and another set

represents the com-

munication links (i.e. eachedge connects two components which communicate

with each other). Figure 6.18 shows the logical view of a small example net-

work. Figure 6.18 depicts the communication links of the network,



and



are the only communication edges.

Thefollowing input andvisualizationmodelinglevelhasbeenchosen: Each

communication pattern between two nodes

is modeled as a function

;

maps the amount of traffic per second onto a proba-

bility, i.e.

(1000) = 0

means that with a probability of

1 1000

bytes per

second are send from

. The readermay note that traffic is not modelled

as a time-dependent function; only a static view of the traffic is used.

Since the user does not want to define such a function for each communi-

cation link, standard functions can be used. The user simply chooses for each

communication link respective traffic patterns from a set of predefined func-

tions (see figure 6.19), i.e. a communication link is defined as a typical WWW

communication of a secretary.

Parenthesis: Typical communication patterns have been

found by analyzing network traffic. For this, traffic has been

recorded using a software probe and analyzed statistically.

The simulation itself is first explained using the example from figures 6.18

and 6.18, a formal algorithm is presented afterwards: Figure 6.20 shows the

6.1. NETWORK TRAFFIC

Figure 6.20:

Traffic Pattern 1 (left side) and Traffic Pattern 2 (right side)

communication pattern

A;F

for the traffic from node

to node

, figure 6.20

shows the pattern

F;G

. The traffic for each communication pair is routed

through the (physical) network, e.g. the traffic from

uses the way



This wayisnormally definedby routingschemes which areknown formost

networks. Herethe shortest way routing hasbeen used (see [58, 170] fordetails

and for other routing schemes). The simulation method presented here also

allows for the comparison of different routing schemes.

For each physical edge, the communication patterns using this edge are

summarized. Since the patterns

are probability density functions, two

patterns can be summarized using the method of convolution (see [62]). The

result for the logical edge



can be seen in figure 6.21; the functions

A;F

and

F;G

have been summarized. The traffic on the edges



B; B



defined only by

A;F

(since only this communication pair uses those edges)

and the traffic of



corresponds to

F;G

The implementation uses discrete functions to approximates the continues

functions

Figure

6.21:

The resulting Superimposition of Traffic Patterns

The algorithm for the simulation can be stated formally as follows:

Algorithm 6.1.1 (StatisticalSimulation)

Input. A graph

V; E

)

CHAPTER 6. VISUALIZING MODELS GIVEN AS GRAPHS

where

defines the logical topology of the network,

a set

(1)

(2)

;:::;p

(1)

(2)

probability density functions with

(

)

Output. A probability function

for each

(1)

(



;

otherwise

end

(2)

v;w

(3) let

;:::;e

)

be the way

of the traffic from

using the edges

(4)

(5)

(



)

v;w

(

)

// convolution

This simulation approach has been chosen for the following reasons:



The simulation meets the demands of a network planner. A network

planner is mainly interested in the amount of traffic which the network

components have to cope with. A pure worst-case scenario, i.e. adding

the maximum possible traffic for each logical edge, does not help the

planner, because he/she has to decide whether better components are

worth the money. I.e. he/she must know how often the cheaper compo-

nents would cause traffic delays or network problems. By modeling the

trafficwith probability densityfunctions, not only the possible amount of

traffic, but also the probability for a special traffic situation is simulated.



The approach is also abstract enough to allow for the processing of only

vaguely known traffic patterns. The exact future behaviour of a user is

unknown, but typical statistical characteristics of his/her behaviour are

normally known (e.g. the typical trafficpatterns caused bya secretaryac-

cessing the WWW).The changeof trafficovertime is especiallyunknown

for future traffic.



The statistical simulation presented here is fast enough to simulate even

large networks. The reader may note that calculating the convolution

requires

(

)

time (

being the number of traffic intervals used for the

discrete probability functions). The number of convolutions is bounded

by the overall length of paths used by traffic in the physical network.



The simulation does not have to take all technical restrictions (e.g. max-

imum traffic allowed a special switch) into consideration. This makes

sense, since we are not simulating a real network, but trying to finds de-

mands on a new network. E.g. the user wants to identify an appropriate

switch, but is not interested in evaluating an existing one.

The main disadvantages are:



Typical traffic patterns have to be known.



Unlikediscrete-event-simulations,temporalchangeofthe trafficcanonly

be modelled by a series of different simulations.

6.1. NETWORK TRAFFIC



Some network events (buffer overflows) cannot be precisely predicted.

The simulation method presented here allows for the simulation of even

large networks (about

500



1000

nodes) within a few seconds. It has been

tested with several realistic networks (30-40) and the results proved helpful for

network planners and administrators. Simulation results have been compared

to measurements in real networks (traces with software-probes). In order to

verify the simulation results, a test laboratory has been used. In this labo-

ratory, different network topologies can be created and software systems on

the computer (PCs and SUNs) generate communication patterns. The result-

ing traffic has been recorded and has been compared to the predictions of the

simulation method presented in this paper. The simulation results resembled

reality closely.

When asked aboutthe optimal method for the presentation of recordednet-

work traffic, most network administrators prefered probability density func-

tions. Therefore, input/visualization modeling level and simulation level are

identical. Many of the advantagesofthe simulation approach,arealso applica-

ble to a visualization using probability density functions: (i) The visualization

is abstract enough to allow for an overview of large amounts of traffic. (ii) The

probability of problematic traffic situation is visible. (iii) The precision of the

visualization corresponds tothe precisionof the knowledge aboutfuturetraffic

and to the reliability of the knowledge about user behaviour. These correspon-

dences are due to the unique mental model of the user.

Furthermore, the simulation results can be visualized using the visualiza-

tion method for network traffic described earlier in section 6.1.1. The only re-

maining question is how the probability function

of an edge

can be trans-

feredinto an edgeweight. Agood solution is to use the centroidof the function

as the corresponding edge weight:

(

1

(

)

1

(

)

The simulation system presentedhere is, besidesbeing beneficial to the net-

workplanner,agoodexampleofthecloserelationshipbetweensimulationand

visualization. It can be seen that, when developing a system for external simu-

lation, it makes sense to regard visualization and simulation as a whole and to

choose similar or even identical modeling levels.

6.1.4 Discussion

In order to evaluate the analysis and visualization approach of structure-based

visualization, traffic recorded at the universities of Paderborn and Bonn is

used. Furthermore,



100

artificial traffic situations have been generated in

the laboratory described in section 6.1.3. For these artificial situations, the

correct clustering4is known. Hierarchical MAJORCLUST is able to identify

87%

of the given traffic clusters. MinCut-Clustering5finds

53%

and Kohonen-

Clustering finds

76%

of these artificial clusters. MinCut-Clustering mainly

4Since the laboratory comprises only



computers, only divisions into

clusters have been

generated.

5The implementationfrom [111] has been used.

CHAPTER 6. VISUALIZING MODELS GIVEN AS GRAPHS

suffers from the problem described in figure 2.12, while MAJORCLUST and

Kohonen-Clustering sometimes fails to differentiate between the two given

clusters. As mentioned before, MAJORCLUST has problem finding small clus-

ters (

The size of real traffic graphs (300-600 nodes) makes an application of Min-

Cut-Clustering hardly possible (see also figure 2.25). For most graphs (44 out

of 51 traffic graphs) Kohonen-Clustering finds reasonable clusters, but needs

additional parameters in order to detect the correct number of clusters (see

section 2.1.4). In all cases, users find the results produced by hierarchical MA-

JORCLUST quite adequate. MAJORCLUST needs only a few seconds to compute

the clustering, while Kohonen clustering needs





minutes.

Because traffic graphs normally have an underlying star-like structure6, cir-

cular layout (see section 4.1.4) yields the best results. Even the usually good

factor-based layout can not compete in this domain with circular layout. The

figures 6.22 and 6.22 show the same traffic graph visualized using circular and

factor-based layout.

Figure 6.22:

Left side: A traffic visualization using

MAJORCLUST

and circular layout.

Right side: A traffic visualization using

MAJORCLUST

and factor-based drawing

Experts rate the results of structure-basedvisualization as very helpful (see

e.g. [124, 125, 160]). The combination of clustering, classification, and graph

drawing allows for the analysis of otherwise incomprehensible traffic situa-

tions.

Some open problems remain:



It would be helpful if data could be read directly from network probes.



Much research is needed to apply the clustering and visualization meth-

ods tothe problemof intrusion detection. Somequestions concerntheus-

age of rule learning paradigms, appropriate visualization methods, and

user modeling (hacker modeling). Early answers to these questions can

be found in [124].

In Conclusion, it can be said that the methodology of structure-based vi-

sualization allows for a decent visualization of network traffic. Users are able

to identify unknown patterns and information in the data. The application of

clustering and classification proves especially helpful.

6This is caused by the typical client-server relations.

6.2. CONFIGURATION KNOWLEDGE-BASES

6.2 Configuration Knowledge-Bases

The configuration of technical systems is one of the oldest fields of Artificial

Intelligence. Configuration denotes thetask ofassembling asystem fromgiven

atomic components. In most cases, the system has to fulfill given demands and

has to minimize a cost function.

Resource-based configuration is one of many existing configuration meth-

ods7. Components are described by means of functionalities which are offered

or demanded. Initially, a set of wanted functionalities is given. A solution is

the cheapest set of components which leaves no demands unfulfilled.

Let

;:::;c

denote the set of components and

;:::;f

the set of functionalities. Each component

demands functionalities

demand

(

)



and offers functionalities offer

(

)



Knowledge-bases for resource-based configuration can be treated as a di-

rected visualization graph

(

)

(

)

(

)

;Æ

(

)

(see definition 1.4):



(

)

[



(

)

(

c; f

)

C; f

F; f

offer

(

)

(

f; c

)

C; f

F; f

demand

(

)



(

)

(

)

pl ug in

;

sl ot

;

virtual

;

pr esumed

;

where

(

)

(

) = \

(

)

provides additional information

about functionalities: “Virtual” and “presumed” functionalities do not,

like“slot” or“plugin”functionalities, correspondto physicalconnections

between components.

Obviously,

(

)

is bipartite.

Experts are mainly interested in causal effects between components, i.e.

they want to know if the existence of a component

directly causes the

existence of a different component

. Therefore it makes sense, not to vi-

sualize

(

)

butagraph

(

)

which modelscomponents andthe causaleffects

between them.

(

)

(

)

(

)

is constructed as follows:



(

)



(

)

(

)

;

(

f; c

)

(

)



(

)

(

)

. Let

(

)

;

(

f; c

)

(

)

be the edges which cause an

edge

(

)

(

)

depends on the importance of the type of the func-

tionality

(

)=4

for “slot” functionalities,

(

)=2

for “plugin” and

“virtual” functionalities, and

(

)=1

for “presumed” functionalities.

An additional function



(

)

! f

pl ug in

;

sl ot

;

v ir tual

;

pr esumed

;

(

)

(

)

is used to label edges according to the type of the corresponding func-

tionality.

7Introduction can be found in [64, 123].

100

CHAPTER 6. VISUALIZING MODELS GIVEN AS GRAPHS

Figure

6.23:

A visualized knowledge-base for resource-basedconfiguration.

is visualized using the methodology of structure-based visualization.

Edges are displayed differently according to their



value:



“Slot” edges are represented by red and solid lines.



“Plugin” edges use grey, thick, and solid lines.



“Virtual” edges are displayed using red and dotted lines.



“Presumed” edges are drawn using a dotted black pen.

STRUCTUREMINER is used to visualize such graphs. In this work, a knowl-

edge basemodeling a telecommunication system is used. This knowledge base

comprises components such as cables, boxes, computer plug-in boards, tele-

phone exchange systems, adapters, etc.

The system Pre-AKON,which has been developedin 1994for the company

Tenovis (formerly Bosch-Telenorma), uses this knowledge bases to support the

synthesis of telecommunication system.

Figure 6.23 gives an example: First, MAJORCLUST is used to cluster the

graph

. The clusters and nodes are visualized by means of a circuit layout

method (see section 4.1.4 for details).

STRUCTUREMINER canblend-out edgetypes. Figure6.24gives anexample:

In the left screenshot all edges are shown; in the screenshot on the right, only

“slot” and “plugin” edges are displayed.

Other graph drawing methods have also been applied: Figure 6.25 shows

the result of a force-directed layout method (left side, see section 4.1.1 for de-

tails) and a random layout (right side).

6.2. CONFIGURATION KNOWLEDGE-BASES

101

Figure 6.24:

Left side: One cluster of the knowledge-base, all edges are shown. Right

side: Only the edge types “SLOT” and “PLUGIN” are shown.

Figure 6.25:

The same cluster as in figure 6.24 visualized using a force-directed layout

method (left side) and using a random layout (right side).

6.2.1 Identifying Technical Subsystems

Asdescribedinchapter3,itisoftenhelpfultolabelclusters. Fortheknowledge-

base used in this work, clusters should correspond to technical subsystems.

Knowledge-bases often come in different variations. E.g. for different tele-

phone systems, slightly different knowledge-bases are used. Such knowledge

bases sharemost oftheir components andfunctionalities. However, afew com-

ponents normally offer or demand different functionalities. In such cases, it

is helpful to recognize subsystems which are similar to already known sub-

systems. For this work, variations of the original knowledge base have been

generated artificially. The variations are similar to variations as found between

real problem instances. The original knowledge base comprises

technical

subsystems.

As described in section 3.1, a classification function

is used to map for a

cluster

from cluster features

!

(

)

onto the cluster labels

;:::;l

. Here,

denotes the name of a technical subsystem. I.e.

(

!

(

))

denotes the correct

cluster label for

is learned by analyzing sample cluster labels, details can

be found in section 3.1 and in appendix A.

184

cluster are examined manually.

136

of these clusters correspond to one

of original

subsystems. These

136

clusters are labeled using one of the

different labels (i.e. technical subsystems).

labeled clusters are chosen ran-

102

CHAPTER 6. VISUALIZING MODELS GIVEN AS GRAPHS

domly and used to learn8a classification function

which maps from clusters

onto labels. Details concerning the learning process can be found in appendix

A. Thelearnedfunction

classifiesincorrectly

oftheclustersin thelearn-

ing set.

Figure6.26:

Two differentknowledge-bases;clusters are labeled,i.e. technical subsystem

are identified

Next, the other

labeled clusters are used to verify the learning results,

i.e. these clusters form a test set:

of the clusters in the test are labeled

incorrectly by

. Obviously, a classification function is learned by analyzing

sample labelings. Figure 6.26 gives two examples: The clusters correspond to

technical subsystems and are labeled automatically.

6.2.2 Classifying Clusters according to the Optimal Layout Al-

gorithm

No single graph layout method proved optimal for all knowledge-bases or

knowledge-base clusters. Therefore, STRUCTUREMINER supports the choice

of individual layout methods for each graph and each cluster. In section 3.1.1,

a method has been introduced that analyzes such choices and chooses appro-

priate layout methods for new graphs or clusters automatically.

This method calculates for each cluster

a set of features

(

)

(see table

3.1). In this work, cardinal features are used, i.e.

(

)



. As described in

appendix A, regression can be use to learn a classification function



;:::;l

where

;



denotes a layout method. Thus,

maps from

features onto graph drawing algorithms.

An example: The author has chosen his favorite graph drawing method for

75 clusters. 48 of these clusters has been used to learn a classification function9

Thesetofclassifiedclustersisdividedintoalearningsetcomprising

clusters

and a test set of

clusters10. The learning set is used to learn

. The purpose

of the test set is to evaluate the generalization capabilities of

. In this example,

of the cases in the learning set and

of the clusters in the test set are

8Regression is used as a learning method.

9The feature

from appendix A correspond the features

(

)

10As usually, several such apportionments are used; the variance of the error rate has been

0076

for the test set. The layout methods have been represented in each apportionments by

an equal number of cases.

6.3. PROTEIN INTERACTION GRAPHS

103

classified incorrectly. Obviously, it was possible to learn the author’s layout

preferences.

Examples can be seen in figures 3.1 and 6.27.

Figure

6.27:

Different layout methods have been chosen automatically for different

clusters (circularlayout for the top-left cluster, level-oriented layout for the

bottom-left cluster, and force-directedlayout for the right cluster).

6.2.3 Discussion

The visualization method presented in this section helps users to comprehend

complex knowledge bases quickly. This is done by means of the following

visualization features:



MAJORCLUST is able to find technical subsystems.



Labels identify clusters as special technical subsystems. By analyzing

samplelabelings, afunction, which labelsclustersautomatically,islearned

successfully.



Individual layout methods are applied to different clusters. This is done

byanalyzingprevioususerpreferences. Usersmainlychoseforce-directed

layout methods, all-pair factor-drawing, and circular drawing.

6.3 Protein Interaction Graphs

After the completion of the human genome project and because the complete

sequences of many other organisms are available these days, it becomes in-

creasinglyimportanttoidentifythebiologicalfunction ofthenovelgenesfound.

Since proteins are the products of genes, researchers work on the characterisa-

tion of the Proteome, the set of all proteins in a living cell. In order to under-

stand how proteins work together and build the complex machinery of living

cells, experimental and computational studies of Protein-Protein Interactions

are central to enhance our understanding of cellular processes as a whole.

The reader may note that in this work, using publicly available sources,

protein interactions are taken for granted. The various underlying biological

principles are not examined (e.g. enzymatic reactions, protein complex assem-

bly, gene regulation, signal transduction, see [84]). When protein interactions

are understood, it is often possible to prevent or influence a sequence of bio-

logical events (e.g. disease states).

104

CHAPTER 6. VISUALIZING MODELS GIVEN AS GRAPHS

Figure

6.28:

A typical protein-network visualized using

MAJORCLUST

and factor-based

layout, no inter-cluster connections are shown.

Introducing even the basic underlying concepts of protein analysis is be-

yond the scope of this work. Biological questions have not been the focus

of research of the author11. Introductions to protein analysis can be found in

[84, 132, 60]. This section therefore treats proteins as nodes of a so-called pro-

tein network. Edges represent interactions between proteins. In some cases,

nodes do not model a single protein, but a whole family of proteins. In this

case, edges between protein families summarize all interactions between sin-

gle proteins and are therefore weighted by the amount of original interactions.

For such analyses, existing protein taxonomies (e.g. the SCOP hierarchy, see

also [150]) are used.

The methodology of structure-basedvisualization is applied to protein net-

works. The results are presented and discussed at the end of this section.

As mentioned before, biologists classify proteins by means of taxonomies.

Taxonomies are hierarchical clusterings (see definition 2.0.5). In this work, the

SCOP hierarchy is used which describes proteins by the categories “Class,”

“Fold,” “Superfamily,” “Family,” and “Protein”. At the top level of the hier-

archical clustering, proteins are differentiate by their membership to a special

class, the category “Fold” is used on the second level, etc.

In order to obtain a notion of the degree of interactions within and between

clusters, it is helpful to compare the clusterings as defined by the SCOP hi-

erarchy (clustering using domain knowledge) and the clusterings computed

by one of the methods from chapter 2 (graph-theoretic clustering). The reader

may note that graph-theoretic clustering tries to find a clustering which maxi-

mizes the numberof edges within clusters and which minimizes the numberof

edges between clusters. The following text defines a so-called clustering com-

parison coefficient



which can be used for arbitrary clustering comparisons.



is used later on to compare the results of SCOP-clustering and MAJORCLUST

respectively.

11Those merits must be assigned to Michael Lappe from the Structural Genomics Group of the

European Molecular Biology Laboratory — European Bioinformatics Institute in Cambridge (UK).

6.3. PROTEIN INTERACTION GRAPHS

105

Definition 6.3.1 (Routing Distance in a Hierarchical Clustering)

Let

= (

V; E

)

be a visualization graph.

= (

C; E

)

denotes a hierar-

chical clustering of

(see definition 2.0.5). The routing distance

dist

between

two nodes

is defined as follows:

dist

(



are in the same cluster

dist

(

)

where

dist

denotes the length of the shortest path between two nodes in the

tree

and is formally defined by definition B.0.4.

Definition 6.3.2 (Clustering Comparison Coefficient



)

Let

V; E

)

be a visualization graph.

(1)

)

and

(2)

(

(2)

)

denote two hierarchical clusterings of

(see definition 2.0.5).



(

G; T

(1)

(2)

)

is defined as follows:



(

G; T

(1)

(2)

dist

(1)

(

)



dist

(2)

(

)



;

where



j

)



denotes a normalization factor.

stands for the

maximum of the heights of the trees

(1)

and

(2)

dist

(

)

(

)

denotes the

routing distance of

(

)

(see definition 6.3.1).



(

G; T

(1)

(2)

)

examines all node pairs of

. Some node pairs may be

neighbours in

(1)

, but belong todifferentclustersof

(2)



assessesthediffer-

ences of distances in

(1)

and

(2)

respectively.



is normalized, i.e. all values

are in the interval

;

. A value of

means that the clusterings are equal.

nb5nb4dip5dip4

0.9

0.8

0.7

0.6

Figure

6.29:

The clustering comparison coefficient



for different protein networks.

Figure 6.29 shows



(

G; T

(1)

(2)

)

-values for 4 protein networks.

(2)

defined by the SCOP-hierarchy,

(1)

has been computed using MAJORCLUST

(see section 2.2.1). All values are between

and

. Thus, the clusters

defined by MAJORCLUST differ significantly from the SCOP-hierarchy. With

other words, protein interactions happen mostly between SCOP-clusters.

6.3.1 Discussion

The system STRUCTUREMINER is used to visualize 20 different protein net-

works12, 60% of which represent protein families. Those graphs comprise 300-

12Because proteinnetworks are rather valuable, only asmall number of such graphsis available.

106

CHAPTER 6. VISUALIZING MODELS GIVEN AS GRAPHS

Figure

6.30:

A visualized protein network including inter-cluster connections.

500 proteins; most graphs have an edge density (see section 2.1.3) of





Figure 6.28 shows a part of a typical protein network: MAJORCLUST has been

used as a clustering algorithm. Nodes and clusters have been visualized by the

all-pair factor-based drawing method (combined with an incremental spring-

embedder). Figure 6.30 shows some inter-cluster interactions.

Only MAJORCLUST is used as a cluster algorithm since evaluating cluster

qualities is time-consuming and can only be done by experts who have limited

time.

Figure 6.31:

The same cluster of a proteinnetwork visualized by a spring-embedder (left

side, Tunkelang’s spring-embedder has been used, see 4.1.1 for details) and by all-pair

factor-based drawing (right hand side).

Since no information about a graph’s structure is known beforehand, only

the following two layout methods areapplied: (i) Force-directeddrawing (sec-

tion 4.1.1) and (ii) a combination of all-pair factor-based drawing and an in-

cremental spring-embedder (see section 4.2.1). The second method normally

produces for many cases the better results. Figures 6.31 and 6.32 show two

examples; the reader may note that while Tunkelang’s method produces more

esthetic layouts, all-pair factor-based drawing leads to easier analyzable lay-

outs. Force-directeddrawingoftenfailsto identifytheunderlyingglobalgraph

structure.

Even for larger graphs (



400



500

nodes,



4000



5000

edges), MAJOR-

CLUST is able to compute the clusters within



minute.

Experts find the combination of clustering methods and graph drawing

techniques helpful for the analysis of protein networks. By analyzing inner

cluster interactions first and by considering edgesbetween clusters in a second

step, hardly comprehendable graphs can be examined faster and more reliably.

The structure-based visualization of protein networks is still in its infancy.

6.4. FLUIDIC CIRCUITS

107

Figure 6.32:

The same cluster of a proteinnetwork visualized by a spring-embedder (left

side, Tunkelang’s spring-embedder has been used, see 4.1.1 for details) and by all-pair

factor-based drawing (right hand side).

Nevertheless, the few results which have already been evaluated by experts,

suggest the continuation of this research. Some open researchissues are:

☞Existing protein taxonomies should be integrated in structure-based vi-

sualization using node colors or node shapes. This would allow for a

visual comparison of graph-theoretic and biological clusterings.

☞Identified clusters should be labeled automatically. Even the manual

classification of clusters (according to their biological function) currently

constitutes a difficult problem. No automatic methods for such a cluster

analysis are known so far.

☞The visualization tool should be customized according to the special de-

mands of biologists. This may include node shapes, edge thicknesses, or

node and edge labels.

6.4 Fluidic Circuits

This section elaborates on the visualization of fluidic circuits. The main focus

lies on the clustering step. Since fluidic circuits often have an almost serial-

parallel structure, special layout algorithms can be applied (see [8] for details).

These serial-parallel layout methods are not the subject of this work. More

information about fluidic circuits can also be found in section 7.1.1.

6.4.1 Domain Clustering

In this section, some new answers to the following question are given: “Can a

system learn a special, domain-dependent clustering method, if the user only

gives some sample clusterings?”. The method of definition by exemplary de-

mands no explicit knowledge from the user and can help to bridge the gap

between implicit expert knowledge and explicit algorithms.

Domain-dependent knowledge is expressed using node labels, i.e. unlike

the clustering methods presented before, the node labels

of a visualization

graph

= (

V; E

;Æ;w;

)

are used. The reader may note that arbitrary

information can be coded into node labels

, e.g. when modeling a computer

network, labels maydenote anIP-addressorthe functionof the node(terminal,

router, switch, WWW-server etc.).

The idea employed here is to map a graph

where domain-dependent

knowledge is coded into the node labels onto a domain-independent graph

where all domain knowledge is coded into the edge distribution and the edge

108

CHAPTER 6. VISUALIZING MODELS GIVEN AS GRAPHS

Figure

6.33:

Transfering a domain-dependent graph into a domain-independent one.

weights. This graph can be clustered using the graph-theoretic techniques de-

scribed in section 2. This is also shown in figure 6.33, in a first step the labeled

graph

is transfered into a graph

which is only defined by nodes, edges,

and edge weights. This graph is clustered in a second step. The remaining

problem is to find a mapping between

and

, this step is called domain

reduction.

Whether two nodes

belong to the same cluster, is defined by

their labels and by others features of

, e.g. the distance between

and

. The next definition states this formally:

Definition 6.4.1 (Domain Reduction)

Let

(

)

(

)

(

)

;Æ

(

)

(

)

;



(

)

beavisualizationgraphandlet

(

)

(

)

(

)

be a graph without node labels.

results from

as follows:



(

)

(

)



(

)

(

)



(

)

(

))

v1v2v1v2

Figure 6.34:

A The local maximum flow for

(left-hand side) and for

(right-

hand side).

(

)

denotes a function, which analyzes

and their relation in

and compiles the result into a set of features

(

;:::;f

)

. Typical features are:

(

)

(

)

;Æ

(

)

(

)

2. The distance

dist

(

)

3. The maximum flow

between

and

in the subgraph of

induced

by the nodes



;min

(

dist

(

)

;dist

(

))

< i

. This

so-called local maximum flow is normally calculated for



and

measures the connectivity between

and

in their neighbourhood.

Figure 6.34 shows an example, on the left-side a graph and the subgraph

as induced by



is shown (gray background), the local maximum flow

between

and

. On the right-side the subgraph induced by



shown, now the local maximum flow between

and

. The reader

may note that the local maximum flow can be calculated quickly since

the induced subgraph is relatively small.

13For a definition of distance see definition B.0.4.

14For a definition of maximum flow see definition B.0.5.

6.4. FLUIDIC CIRCUITS

109

4. Domain-specific features can be added.

is a function which takes the features

(

)

and computes the likeli-

hood of

and

being in the same cluster. Learning

by analyzing given

clusterings is the subject of the rest of this section.

In order to learn

, the user defines a set of clustered graphs

;

(1)

;:::;C

(1)

;:::;

;

(

)

;:::;C

(

)

where

(

)

;:::;C

(

)

is a disjunctive clustering of

For every graph

(

)

(

)

and each pair of nodes

(

)

set

(

)) = 1

;:::;p

(

)

. For all other pair of

nodes

(

)

we set

(

)) = 0

Now a supervisedlearning strategy (e.g. neural networks or regression,see

section A for details) can be used to learn

. This method is now applied to

the clustering of hydraulic circuits:

Using such an adaptive method for the clustering of hydraulic circuits in-

stead of a specially developed method as in section 6.4.1 has some advantages:



Developing a special clustering method takes a long time; almost a year

has been necessary to develop the method presented in [149, 163]. Since

not only the hydraulic axis, but also other types of cluster are needed for

an analysis of hydraulic circuits, it makes sense to speed up the develop-

ment process.



Experts often have problems expressing their implicit knowledge explic-

itly. Adefinitionbyexamplemayhelptobridgethisknowledge-acquisition

gap.



In some domains it is possible to get hold of examples, but an expert is

hard to find. E.g. this is often true for medical domains.

W W

Figure 6.35:

A manually clustered hydraulic circuit (left-hand side) and its structure

(right-hand side).

Figure6.35depictsatypicalmanualclusteredhydrauliccircuits: Twowork-

ing units and a supply unit have been identified.

6.4.2 Discussion

In this example, 20 out of 62 circuits are clustered manually. As described

above, for each pair of nodes

a feature vector

(

)

is computed. In

addition to the general feature mentioned before, a domain-specific feature is

110

CHAPTER 6. VISUALIZING MODELS GIVEN AS GRAPHS

added: Each node is classified according to its function in the circuit (cylin-

der, tank, valve, pump, or miscellaneous). A function

, which maps from

the features of two nodes onto their likelihood of being in the same cluster, is

learned using a one-layer neural network and backpropagation (see section A

for details).

Domain-independent graphs are constructed using both the learned func-

tion

and the method of domain reduction (see definition 6.4.1). In a second

step, these graphs are clustered using MAJORCLUST (see section 2.2.1).

Figure 6.36:

Two automatically clustered Circuits

Figure 6.36shows two automatically clustered circuits; both clusterings are

correct.

The method of domain reduction can correctly cluster

75%

of the graphs in

the learning set and

69%

of the graphs in the test set. Obviously, it is possible to

automaticallylearnaclustermethodforhydrauliccircuitsbyanalyzingsample

clusterings.

The methodfails forcircuitsthat comprise severalsmallworking units. Fig-

ure 6.37 shows a typical example; the five working units on the left are not

recognized as single clusters, but they are identified incorrectly as one cluster.

The author thinks that it may be possible to apply this method to other

technical domains as well, if the graphs from these domains fulfill certain re-

quirements:

Figure

6.37:

An incorrectly clustered circuit

☞The node labels are given and define important domain knowledge.

☞The wanted clusters can be defined using node labels and topological

information only.

6.4. FLUIDIC CIRCUITS

111

☞All clusters are defined using the same set of instructions, i.e. the system

can learn one set of instructions to identify all clusters.

☞Clusters have at least a certain size.

Figure 6.38:

Two fluidic circuits visualized by a level-oriented layout. On the left side,

components are depicted as nodes which emphasizes the structure of the circuit. On

the right side, bitmaps are used to visualize single components.

The domain of fluidic circuits is quite differentfrom the other domains dis-

cussed in this work: Much knowledge aboutthe domain exists andcan be used

to adjust the three steps of structure-based visualization:



Step 1 has been discussed intensively.



Classification, i.e. step 2, can be done quite easily: subgraphs which in-

clude cylinders are working units while supply units comprise tanks or

pumps.



As mentioned before, special layout methods exists for serial-parallel

graphs. Level-oriented drawing (section 4.1.3) is also able to find rea-

sonable layouts, figure 6.38 shows two examples.

This application is used mainly to show how domain knowledge can be

used to improve clustering methods. The identification of substructures is a

key aspect of the visualization of fluidic circuits. This example therefore un-

derlines the importance of the concept of structure-based visualization.

112

CHAPTER 6. VISUALIZING MODELS GIVEN AS GRAPHS

Chapter 7

Visualizing Models given as

Tables

Tabular data sets (see section 1.5.1) may be viewed as a table: rows define ob-

jects while columns correspond to object features. Such tables can be evaluated

according to two differentaspects: (i) Objects and their relations may be exam-

ined or (ii) the user may be interested in features and their relationship to one

another. Section 7.1 deals with object analysis, while section 7.2 concentrates

on the examination of feature relationships.

7.1 Object Visualization

In this section, an application for the visualization of objects, which aredefined

by tabular data sets, is presented. One focus is the usage of the knowledge-

acquisition methods from section 5. A largevariety of datais storedin the form

of a table (databases, spreadsheets, etc.). Therefore, the techniques described

in this section are also applicable to other applications.

7.1.1 Document Visualization

The visualization and retrieval of documents is subject to worldwide research

because since the development of world spanning computer networks (Inter-

net) exceeds the amount of accessible documents by far any administrable

number. Typical documents are technical drawings, manuals, letters, or court

decisions In orderto allow fora rapidandcorrectaccess toall documents, visu-

alization methods areused. For this, documents areplaced on aplane in such a

way that spatial closeness between documents corresponds to document simi-

larities.

Such document management systems have been examined in several dif-

ferent works: In [183], a management system for arbitrary text documents is

presented. This system extracts features from text documents and uses Multi-

dimensional Scaling or Principle Component Analysis to visualize documents

on a plane. In [22], a force-directed layout algorithm has been used to visual-

ize bibliographic data in 3 dimensions. Self-organizing maps were employed

in [106, 1]. Document clustering techniques were used in [187, 94, 1, 5]. The

113

114

CHAPTER 7. VISUALIZING MODELS GIVEN AS TABLES

proceedings of the International Conference on Information Knowledge Man-

agement (CIKM) or [15] may be used to obtain a more extensive overview.

The readermay note that the methodology of structure-based visualization

allows for a separation between (i) general tasks (e.g. graph drawing) and

(ii) specialized algorithms (e.g. Multidimensional Scaling), thus supporting

the application of different standard algorithms to the problem of document

visualization.

In this section, the visualization methodology presented in this work is

applied to the retrieval of technical documents, i.e. to the visualization of a

sets of fluidic circuits. Each fluidic circuit (e.g. hydraulic and pneumatic cir-

cuits) forms a node of a visualization graph, node pairs are connected by edges

weighted with the appropriate document similarity. A short introduction to

fluidic circuits is given in section 6.4.

f1 f2 f3 f4 f5

1 2 y 3 5.5

2 2 n 6 7.7

1 2 y 5 1.2

0.6

0.7

0.5

Figure

7.1:

Document Retrieval by Graph Visualization

The main idea is to describe a given set of documents as a tabular data set

(see section 1.5). Each row of this table consists of a set of features describing

one document. This is also shown in figure 7.1: Each document on the left side

forms a row of the table in the middle. The columns of this table correspond to

features of fluidic circuits. Using the methods introduced in section 5, a graph

is constructed. This graph is visualized using the method of structure-based

visualization from section 1.3.

This section focuses on two main problems: (i) The identification of reason-

able features for fluidic circuits and (ii) the application of the methods from

section 5 to the automatic construction of similarity functions for fluidic cir-

cuits. A similarity function assesses the similarity between the features of two

circuits. Asmentioned before,this similarity functionis used todefine theedge

weights of the visualization graph.

This process will now be described formally.

Formalization

Let

;:::;S

be a set of fluidic circuits. A tabular data set1

(

;

;Æ

;



)

is constructed as follows:



;:::;v

where

represents the fluidic circuit

1See definition 1.5.1

7.1. OBJECT VISUALIZATION

115



;



(

)

is set of features describing the fluidic circuit

In the following text,

will also be called a fluidic circuit.

As describedin section 5, edges andedge weights have to befound in order

to visualize a tabular data set. This is done by means of a so-called similarity

function2

sim

: (

)





(

)



;

where

sim

(

)

;Æ

(

))

denotes the

similarity between the nodes

(i.e. fluidic circuits

and

Using this function

sim

, a visualization graph

can be constructed as fol-

lows: Let

;

;Æ

;



)

be the tabular dataset as defined

above and let

sim

be a similarity function defined on

. Then a visualization

graph

V; E

;

;Æ;w;

)

can be constructed as follows:



(

sim

(

)

;Æ

(

))



;



(

)

In the rest of this section, appropriate functions

and

sim

will be intro-

duced. First, a method is described which computes for each circuit a feature

list

Finding Features for a Fluidic Circuit

Engineers normally do not look at fluidic circuits from a behaviour point of

view but instead employ a functional model of the circuit. Or, in other words,

they are more interested in what a circuits does than in how it fulfills its func-

tion. When talking about circuit similarity, we normally mean circuits with

similar functions. This is mainly caused by the problem-oriented approach of

engineers: An existing plan is normally retrieved in order to help solve a new

given problem; new problems are defined as a set of wanted functions.

Therefore, it make sense to reduce a circuit to a set a of features describing

its function. This feature set can then be used to define a similarity function. In

this section, only a short introduction to the featuredefinition of fluidic circuits

is given, more extensive overviews can be found in [67, 161].

Fluidic circuitscomprise functional units, so-called fluidic axes. The coreof

such an axis is a working element, i.e. a cylinder. From an engineer’s perspec-

tive, every cylinder, i.e. every axes, implements a function. Typical examples

are a cylinder raising a car or a cylinder locking a door. Thus, the functions of

a circuit are defined by its fluidic axes.

(

)

isdefinedasthecombinationofthe featuresofthecorrespondingaxes:

(

[

axis

(

)

;

where

denotes the fluidic axes of

axis

(

)

is a function which defines a

set of feature describing a fluidic axes

2An introduction to similarity functions can be found in chapter 5.

116

CHAPTER 7. VISUALIZING MODELS GIVEN AS TABLES

F1 , x1 , v1 , p1F2 , x2 , v2 , p2

a1a2

Figure

7.2:

A fluidic circuit with two axes

and

. Each axes is described by the

parameters force,position, velocity, and pressureof the corresponding

working element.

Axes are mainly described by their working element, i.e. in terms of force,

positions, velocities, and pressure. This can also be seen in figure 7.2.

A working element can be specified by its position diagram: This diagram

shows the cylinder position over a period of time. An example can be seen in

figure7.3: The working element ofthe axis

isfirstextended. Afterwards,itis

retracted. Inathirdtimeinterval,itstays inaretractedposition. Obviously,itis

possible to identify three so-called phases: drive-out, drive-in, and null-phase.

Since each phase implements a function (e.g. drive-out may correspond to the

raising of a car),it makes sense to describe axes by their corresponding phases.

null-phase

constant-drive-out constant-drive-in

null-phasehold-pressure

null-phase

x (m)

t (s)2.0 4.5 5.5

fast-drive

Figure

7.3:

Position diagram of two fluidic axes

and

Each axis is described by two types of features, (i) phase descriptions and

(ii) phase orders. These groups of features are defined as follows:

(i)Phase Descriptions

Phases are classified into the categories constant-drive, position-drive, hold-

position, accelerate, fast-drive, hold-pressure, and press; each category in turn

is characterized by 6 features:

1. How many phases of the specific category exist in the respective

axis?

2. How long (in seconds) is the duration of the phase ?

3. Which distance (in mm) is covered by the working element?

7.1. OBJECT VISUALIZATION

117

4. Which force (in Newton) is applied to the working element?

5. How precise must the axis work? This is a value from

[0; 1]

that

defines the acceptable deviations from the duration, the distance,

and the force.

(ii)Phase Orders These 49 features

ord

i;j

;



i; j



capture the order of the

phases. For example,

ord

;

is the number of times a phase of category

(constant-drive) is directly followed by a phase of category

(position-

drive).

Together, the feature vector

axes

(

)

for an axis

is of the following form:

(

phase description constant-drive

)

(

phase description position-drive

)

(

phase description hold-position

)

(

phase description accelerate

)

(

phase description fast-drive

)

(

phase description hold-pressure

)

(

phase description press

)

(

phase orders

)

Since the features of an fluidic axes and of fluidic circuits are now formally

defined, the similarity functions will be learned in the next section.

Learning a Similarity Function for Fluidic Circuits

Findingsimilaritiesbetweenfluidiccircuitsposesaknowledgeacquisitionprob-

lem, because experts are rare, expensive, and have their necessary knowledge

only implicitly at disposal. When asked to express their knowledge about

axes similarities explicitly as a mathematical function, most experts are over-

strained.

Below, two methods from section 5 that tackle the knowledge acquisition

problem are presented. The methods differ from each other in the explicity of

the necessary expert knowledge. The following text assumes that the readeris

familiar with the knowledge acquisition methods introduced in section 5.2.2.

First, a similarity function for fluidic axes is presented.

Using the definitions from above, the similarity between two nodes

(representing circuits

) is defined as:

sim

(

)

;Æ

(

)) =

max

sim

axes

(

axis

(

)

;Æ

axis

(

))

;

where

denotes the axes of

sim

axes

(

axis

(

)

;Æ

axis

(

))

denotes a function measuring the similarity be-

tween the features

axis

(

)

and

axis

(

)

, and

jj

holds.

Defining a similarity function

sim

(

)

;Æ

(

))

between two fluidic circuits

and

has now beenreducedto the problem ofdefining a similarity function

sim

axes

(

axis

(

)

;Æ

axis

(

))

between two fluidic axes

and

The similarity betweenaxes is definedusing the Weighted Linear Similarity

Function of Interaction Level 1 (see example 5.1.1):

Let

and

be two fluidic axes and let

axis

(

)=(

;:::;f

)

and

axis

(

118

CHAPTER 7. VISUALIZING MODELS GIVEN AS TABLES

(

;:::;f

)

be the corresponding feature vectors.

sim

axes

(

axis

(

)

;Æ

axis

(

))

is then defined as follows:

sim

axes

(

axis

(

)

;Æ

axis

(

)) =



j



;

with

Two knowledge-acquisition methods from section 5.2.2 are applied to find

values for the weights

. As mentioned before, the so-learned similarity mea-

sure is then used to construct a visualization graph.

Knowledge Acquisition Method 1: Learning by Classification

As described in section 5.2.2, one way to obtain the necessary expert knowl-

edge is to exploit a classification of the fluidic circuit. This knowledge can be

used to learn values for the weights

67 fluidic axes are classified into 9 classes by a domain expert. Using re-

gression3, a similarity measure is constructed. The error rate is defined as the

percentageof axes pairs whose similarity is assessed incorrectly by the learned

measure. The error rate on the learning set4is 12%, while the error rate on a

test set5is 16%. Obviously, a good similarity measureis constructed.

Knowledge Acquisition Method 2: Learning by Examples

Figure

7.4:

A manual arrangement of fluidic circuits

A graphical arrangement of circuit documents, which has been generated

by the domain expert, is analyzed and similarity measures are constructed 6.

An example can be seen in figure 7.4. To evaluate the quality of the learned

measure, the Mean Square Error (MSE) is used:

(

sim

(

)



sim

(

))

;

3For details concerning the following characteristics of the regression result please refer to sec-

tion A.2: 33 features are used, 40 % of them represent phase orders,

= 0

482

, significance of

the F-test=

4The learning set comprised axes pairs used for the learning process.

5The axes pairs in the test set have not been used for the learning process.

6For details concerning the following characteristics of the regression result please refer to sec-

tion A.2: 16 features are used, 51 % of them represented phase orders,

= 0

, significance of

the F-test=

001

7.1. OBJECT VISUALIZATION

119

and

denote fluidic axes,

denotes the number of axes,

sim

denotes the

similarity as predictedby the learned similarity measure, and

sim

denotes the

empirical similarity measure defined by the manual layout.

On a test set, a MSE of

045

is achieved. All similarity values are from the

interval

;

, hence the MSE defines a (squared) average deviation from the

correct similarity.

Only 16 features are used for the regression; all other features have con-

stant values. 51 % of them represent phase orders. Therefore, both phase or-

ders and phase descriptions are important. The most important featuresof the

phase descriptions are precision, distance, and number of phases. Especially,

the phase orders “constant-drive after constant-drive” and “position-drive af-

ter hold-position” are relevant.

The significance of the regression result7is

001

; meaning that the result is

not caused by random influences.

The

value of the regression (measuring the amount of variation ex-

plained by the regression) is

which can be interpreted as the statement that

the regression result can explain the observations only partially; i.e. there re-

main unexplained observations. In order to improve this result, a more com-

plex similarity function is needed. Experts assume that dependencies between

the features of an axes exist. Therefore, the Weighted Linear Similarity Func-

tion of Interaction Level 2 (see example 5.1.2) is used:

sim

axes

(

axis

(

)

;Æ

axis

(

)) =



j





j



jj



;

with

Applying regression to this function results in an MSE of

029

. The

value can be improved to

. Hence this complex similarity measure is able

to capture the dependencies between features.

I.e., also with this knowledge source a good similarity measure can be con-

structed.

Visualizing Fluidic Circuits

As described earlier, the constructed similarity measure can be used to trans-

form a tabular data set into a visualization graph. While only a few circuits are

normally used for the knowledge acquisition process, largenumbers of circuits

can be involved in the visualization process. Of course, as with all learning

processes, the fluidic circuits used for the learning set must be representative.

This problem can be solved by an expert who is normally able to choose an

appropriate set of plans.

A modifiedversion of STRUCTUREMINER (see section 6)is used tovisualize

circuits. Figure 7.5 shows a typical visualization and each plan is represented

by its name. A different visualization mode can be seen in figure 7.6 where the

circuits arenow shown as graphs. The positioning and especially the grouping

of circuits allows the user the assess similar documents quite easily. E.g. when

the user knows the positions of two matching circuits, other circuits between

them might also be interesting.

7Using an F-Test.

120

CHAPTER 7. VISUALIZING MODELS GIVEN AS TABLES

Figure

7.5:

Visualized Fluidic Circuits, textual display mode

Parenthesis: Such a visualization can easily be used to sup-

port the design of fluidic circuits: Experts define the expecta-

tions of a to-be-designed circuit in form of a functional de-

scription, i.e. the expert gives a (partial) feature vector of

the wanted circuits. This feature vector is visualized together

with several known circuits. All existing circuits close to the

new feature vector resemble the problem description and can

therefore be used to solve the design problem. If no visual-

ization is used and only the closest circuits are retrieved, this

solution is nothing but a normal Case-Based Reasoning ap-

proach.

Discussion

This section applies the knowledge-acquisition methods from chapter 5 to the

domain of document management. First, abstract features for fluidic docu-

ments are developed. In a second step, a similarity measure working on these

features is learned by means of different knowledge acquisition methods. The

resultspresentedaboveshow that reasonablesimilaritymeasuresarefound. In

the final step, a given set of documents is transformed into a graph using this

similarity measure. This graph is visualized using the method of structure-

based visualization from section 1.3.

Since, as outlined above, high quality measures can be learned, the result-

ing layouts successfully reproduce the user’s notion of document similarity.

Experts (e.g. in [161]) find the introduced knowledge acquisition and visual-

ization methods to be both helpful for the domain of fluidic circuitsand for the

general problem of document management.

This approachhasseveraladvantageswhen comparedto existing methods:

7.2. VISUALIZATION OF FEATURES

121

Figure

7.6:

Visualized Fluidic Circuits, graphical display mode



Thenew knowledge-acquisition methods allowfora fastandreliablelog-

ging of expert knowledge.



Structure-based visualization emphasizes the underlying structure, thus

supporting the fast access to large sets of documents.



Users can understand sets of unknown (fluidic) documents relatively

quickly.



The visualization can support the design of fluidic circuits.

Many problems remain, e.g.:



This approach to document management should be applied to different

types of documents, e.g. text documents. Such documents demand the

creation of different features.



Further, already existing document management methods, e.g. taxono-

mies, search algorithms, can be combined with the methods presented

here.

7.2 Visualization of Features

Tabular data sets (see definition 1.5.1) are normally depicted as a table. Rows

define objects while columns correspond object features. Users are often inter-

ested in dependencies between features. E.g. table 1.1 describes persons by

the features name, education, income, occupation, and gender. Typical ques-

tion formulations arewhether the gender influences theincome or whether the

education depends on the gender and on the citizenship.

In this section, features of tabular data sets (see section 1.5.1) are analysed

and visualized. The main idea is to treat features (the columns of a tabular

122

CHAPTER 7. VISUALIZING MODELS GIVEN AS TABLES

data set) as nodes of a visualization graph (see definition 1.4.1). Each pair of

nodes is connected by an edge which is weighted by the dependency between

the corresponding features. Feature dependencies express an undirected and

unsigned causality, i.e. two features

depend on each other if values of

(

) cause

(

) to assume special values. Examples are age and the number

of children (the older a person is, the more children she/he has) or income and

type of car. The causalities may be probabilistic and may comprise more than

just two features.

Parenthesis: Of course, feature dependencies can also be

computed using the methods from chapter 5: Features be-

come objects, while object descriptions become featuresa, i.e.

the table is rotated at

degrees. Using the terminology from

chapter 5, in this section the similarities are defined implicitly

by the given feature values, i.e. no explicit similarity function

is used and the object similarity category is specified by the

given values. Another difference exists: Features only com-

prise values of the same type (nominal, cardinal etc.). This

allows for the application of special statistical methods.

aNevertheless,theauthorwillrefrainfromcalling featuresobjectsandvice

versa.

Since the dependencies are not known beforehand, but constitute the goal

of the analysis, they must be induced from the tabular data sets by statistical

means. Typicaltechniquescanbeclassifiedaccordingtothetypeofthefeatures

(see also section 5.1):



Nominal features: Two main classes of algorithms can be identified:

(i) classical statistical approaches and (ii) association rules. The reader

may note thatassociation rules arealso basedon statistical computations;

the main different is rather that association rules (introductions can be

found in [2, 3, 13]) have been developed in the field of computer science

with a different focus than classical statistical approaches (see [6, 62]).

Association rules use a two-step method: In a first step, promising candi-

date sets of dependent variables are identified. The second steps gener-

ates rules using these candidate sets. Here, only the first step is needed.

Let

denotes the number of objects and

the maximum number of de-

pendent features8. If only dependencies between pairs of features are

wanted, the association rules method needs

(

)

time, otherwise its run-

time behaviour is exponential in

. Association rules are a relatively

fast method for identifying dependenciesbetween largersets of variables

(larger

) but, since this method is based on heuristic decisions, it is diffi-

cult to assess the quality of the results.

A common method for analysing the dependency between nominal fea-

tures in the field of statistics is the



-test ([6, 62]). This method allows

for a quantitative assessment of feature dependencies. Dependencies be-

tween pairs of features can be computed in

(

)

time, for all other cases

8For reasons of simplicity, binary variables are assumed.

7.2. VISUALIZATION OF FEATURES

123

the run-time is exponential in

. For

, the

log



l inear

-analysis ([62])

is normally applied.



Cardinal features: Regression (see section A) is normally used to com-

pute (linear) dependencies between cardinal features. The coefficient of

determination expresses the degree of dependency. An alternative is a

transformation into nominal features by introducing intervals.



Mixed cardinal and nominal features: Variance-analysis and discrimi-

nation-analysis (see [6, 62] fordetails) arewell-known techniques to anal-

yse such cases. Here, a different approach is used: Let

be a nominal

feature with

different values



;:::;

and

be a cardinal feature.

is transformed into

new feature

;:::;f

with

(

)=1

(



for all objects

and values



can be treated as a cardinal feature,

thus the dependency between

and

can be computed by applying

regression to the pairs

and

(



q).

Parenthesis: It is also helpful to combine association rules

and classical statistical approaches: The association rule al-

gorithm is used to identify promising sets of dependent vari-

ables. Classical approaches are than used to analyze the de-

gree of dependency between variables.

LFC 4

POP90C5

HU3

HC21

HC20

DOMAIN

OEDC5

POP90C4

RECINHSE

RECP 3

LFC 3

Figure

7.7:

A typical visualized cluster (using

MAJORCLUST

and factor-based drawing),

all nodes are related to middle-class houses.

This method is now applied to the analysis of largegroups of people. Every

person is describedby

481

features,including personal information (age, num-

ber of children, etc.), information about the neighbourhood (average income,

percentage of married people, etc.), and hobbies (pets, gardening, etc.). This

extensive data set and further information about the features can be found in

[89].

Here,only pair-wisedependencies betweenfeaturesof agroupof

1000

peo-

ple are analyzed and visualized. If both features are nominal, a



-test is done

and the Cramer’s V value(see [6] for details)is used as the corresponding edge

weight. Purecardinalfeaturesareanalyzedusing regressionandmixedfeature

pairs are treated as explained above.

124

CHAPTER 7. VISUALIZING MODELS GIVEN AS TABLES

Discussion

LSC1

HOMEE

P HOTO

LSC3

ETH4

ETH8

ETH16

ETH9

POBC1 ETH6

ETH7

ETH15

Figure

7.8:

Two visualized clusters (using

MAJORCLUST

and factor-based drawing). The

left cluster shows features related to people who were born outside the

U.S.A. or who speak a non-english language. The cluster on the right relates

for unknown reasons the hobby photography, people who work at home,

and people from the Philippines.

AGEC 6

HHN2

ETHC3

AGE9 01

AGE9 03 AGE 906

AGE9 02

AGE9 04

AGE9 05

MARR3

LFC 4

HHAGE2

LFC 2

LFC 1

AGEC 7

VC3

HHAS1

HHAGE3

HHAGE1

WWIIVETS

ANC6

POP90C5

HU3

HC21

HC20

DOMAIN

OEDC 5

POP 90C4

RE CINHSE

RECP 3

LSC 4

LFC 5

LFC 3

P UBP HOT O

AGEC 2

HUP A5

HUP A1

ANC14

ANC9

MC3

HC9

ANC12

MC2

HC1

Figure

7.9:

A clusteredand visualized set of features.

MAJORCLUST (see section 2.2.1) identifies 20 clusters. 13 of them are intu-

itively meaningful; figures 7.7 and 7.8 show three typical clusters. Figure 7.9

depicts a part of the overall layout of the visualized feature graph. Hierarchi-

cal MAJORCLUST even identifies 50 clusters on 3 hierarchy levels; at least 30

of them are intuitively meaningful. Kohonen-Clustering (section 2.1.4) iden-

tifies 22 clusters, most of them similar to the clusters found by MAJORCLUST.

MinCut-Clustering (section 2.1.2) finds 12 clusters, 11 containing



nodes

and one cluster comprising all other nodes. While MAJORCLUST finds the clus-

terswithin



seconds, Kohonen-Clustering needs



minutes and MinCut-

Clustering needs about



minutes. Therefore, the author advocates the us-

age of MAJORCLUST for these type of graphs.

Since nothing is known about feature graphs beforehand, only general

graph layout method should be applied. The author prefers all-pair factor-

based layout combined with an incremental spring-embedder(see section 4.2),

7.2. VISUALIZATION OF FEATURES

125

since the overall structure of the graph is emphazised (see also figure 7.10).

Force-directed methods (section 4.1.1) fail to produce a reasonable global lay-

out. The run-time behaviour of all-pair factor-based layout and force-directed

layout is comparable.

Figure 7.10:

Left side: A part of a feature graph visualized by means of all-pair factor-

based layout. Right side: The same graph visualized with Tunkelang’s version of force-

directed layout.

Several open research issues remain:

☞More datasets should be used.

☞An extensive comparison of the different methods for the identification

of feature dependencies is needed.

☞Causalcharacteristics ofthe featuredependenciesshould beemphasized.

Nevertheless, the evaluation of this application suggests that the method-

ology of structure-based visualization can be used to analyze complex feature

dependencies.

126

CHAPTER 7. VISUALIZING MODELS GIVEN AS TABLES

Chapter 8

Summary

In this work, a novel methodology for the visualization of graphs is presented.

Here, graphsarevisualized with the purpose of data mining, i.e. the visualiza-

tion should help the user to understand the concepts and information hidden

in the graphs.

The methodology of structure-based visualization comprises three steps:

(i) Identification of the graph’s structure by means of clustering algorithms,

(ii) cluster classification by means of classification functions or case-basedclas-

sification, and (iii) layout of the clusters and nodes by means of graph drawing

methods.

For tabular data sets, a fourth step exists: Node similarities are defined

using knowledge acquisition techniques and machine learning.

These four steps are implemented as follows:

1. Structure Identification: In this work, graph structures are identified by

clustering algorithms. First, existing clustering approaches are described

and classified according to two criterions: (i) Optimization criterion and

(ii) necessary additional knowledge.

Then, three new clustering methods are presented:



MajorClust: MAJORCLUST allows for a fast and reliable clustering

of graphs. Theoretical deliberations and practical tests using six

applications in chapters 6 and 7 show the high quality of MAJOR-

CLUST’s clustering results.





-Optimization:



is a graph-theoretical quality measure for clus-

terings. Using



, features of good clusterings are presented.



AdaptiveClustering: Thismethodintegratesdomain-specificknow-

ledge into the clustering process. For this, given exemplary cluster-

ings are analyzed and clustering knowledge is learned. In section

6.4.1, this method is applied to the clustering of fluidic circuits.

2. Structure Classification: Identified clusters are illustrated by a label, e.g.

a network traffic cluster may be identified as a special university insti-

tute. For this, two classification approaches are presented: (i) Direct clas-

sification and(ii)case-basedclassification. Machinelearning methods are

127

128

CHAPTER 8. SUMMARY

used to abstract the necessary classification knowledge from exemplary

classifications.

3. Structure Envisioning: In this step, the graphs are laid out, i.e. node po-

sitions arecomputed. First, existing methods arepresentedand classified

accordingto two criterions: (i) Used quality criterion and (ii) correspond-

ing graph class.

Anovelmethod,whichcombinesprinciplecomponentanalysisandforce-

directeddrawing, is presentedin section 4.2. This method is appliedsuc-

cessfully to several domains in chapters 6 and 7.

All graph drawing methods used for this step are extended in order to

integrate clustering knowledge.

4. CreatingGraphsfrom TabularData: In this work, tabulardata is treated

asaspecialtypeofgraph. Inordertoapplythemethodologyofstructure-

based visualization to tabular data, similarities between nodes have to be

defined. This is done by means of a so-called similarity measure which

assesses the similarity between two nodes.

Since the manual definition of such measures overtaxes the capabilities

of most experts, a new methodology for learning of similarity measures,

which allows for the application of standard machine learning methods,

is described in section 5.2.2 . Three knowledge acquisition methods are

introduced which implement this methodology: (i) Learning by means

of given similarities, (ii) learning by classification, and (iii) learning by

examples. All these methods allow for a fast and abstract definition of

object similarities.

The visual data mining techniques introduced in this work are applied to

six domains. For this, the methodology of structure-based visualization and

all algorithms have been implemented within the scope of the system STRUC-

TUREMINER.STRUCTUREMINER is applied to the following domains:



Network Traffic: Network trafficmay be seen as a graph: computers and

active network components are modeled by nodes while communication

links form the edges. Such graphs are visualized successfully in section

6.1. The following results are worth mentioning as well:

–Clusters are classified by a means of a case-based approach.

–The change oftraffic overa period oftime is shown using a specially

developed animation method.

–STRUCTUREMINER is used to support the analysis and planing of

network topologies.

–The connections between the field of visualization and simulation

are explained using a novel network simulation approach.



Configuration Knowledge-bases: Systems for the configuration of tech-

nicalsystems useaso-calledknowledgebasetostoretheirdomainknowl-

edge. This knowledge-base may be seen a graph: technical components

form the nodes and causal effects result in directed edges, i.e. an edge

from node

to node

means that if component

is chosen for the

129

configuration,

must be chosen as well. STRUCTUREMINER helps the

user to understand such knowledge bases. Interesting feature are:

–Technical subsystem are recognized automatically, i.e. clusters are

labeled.

–STRUCTUREMINER allows for the choice of different layout meth-

ods for different clusters. By analyzing the user’s choices, STRUC-

TUREMINER learns a function which automatically chooses for fu-

ture clusters the appropriate layout method.

–Differentedgeshapesandcolorsareusedto representdifferenttypes

of edges.



ProteinInteractionGraphs: Such graphsmodel the interactionsbetween

proteins and are important in the context of the Human Genome Project.

STRUCTUREMINER makes instructive insights into these graphsand their

structure possible.

Furthermore, MAJORCLUST is used to examine an existing protein taxon-

omy; it can be proved that this taxonomy groups together such proteins

that have only a few interactions with each other.



Fluidic Circuits: This domain is used mainly to present a method which

learns a domain-dependent clustering method by analyzing given sam-

ple clusterings. In section 6.4, results are given which prove the success

of this approach.



Document Management: The visual presentation of large numbers of

documents can help to manage and access otherwise incomprehensible

document sets. In this work, each document represents a fluidic circuit.

In order to visualize document sets, three steps are necessary:

1. Generation of Document Features: For each document, i.e. fluidic

circuit, abstract features are generated automatically. These features

give a functional description of fluidic circuits. The readermay note

that this domain is therefore an example of the visualization of tab-

ular data sets.

2. Definition of Document Similarities: The knowledge acquisition

and machine learning methods described above (see “Creating

Graphs from Tabular Data”), are successfully applied to the defi-

nition of a similarity measure which assess the similarity between

document features.

3. Visualization: Using the learned similarity measurefrom the previ-

ous step, a document graph is constructed: documents form nodes;

edges are weighted by document similarities. This graph is visual-

ized using the methodology of structure-based visualization.



Visualization of Tabular Data Features: Tabular data is often presented

in the form of a table: objects define the rows while object features be-

come the columns. E.g. if objects represent people, reasonable features

arename,age, andgender. In thisapplication, STRUCTUREMINER is used

to visualize dependencies between features. E.g. as an example, a data

130

CHAPTER 8. SUMMARY

set consisting of

100

;

000

people who are described by

481

features, is

used.

For this, two steps are necessary:

1. First, the dependencies between features are computed. Depend-

ing on the type of feature (cardinal or nominal), different machine

learning techniques are used, e.g. regression or



-test. Next a

dependency-graph is constructed: Nodes model features; the graph

is totally connected. Edges are weighted by the degree of depen-

dency.

2. This graph is visualized using the methodology of structure-based

visualization.

Thevisualizationhelps the usertounderstandthe dependenciesbetween

features. Mostoftheidentifiedclusterscorrespondtoreasonableclusters,

e.g. people living in rural areas or rich neighbourhoods.

In conclusion, it can be said that the methodology of structure-based visu-

alization has been validated by (i) theoreticaldeliberations and (ii) by applying

it to six domains. For each step of the methodology, either existing approaches

have beenused, existing algorithms havebeen extended, ornew methods have

been developed.

Appendix A

Machine Learning

This chapter is not intended as an introduction to the field of machine learn-

ing. Its main purpose is to define two common machine learning techniques

formally: neural networks1and regression. Furthermore, section A.1 explains

how regression and neural networks can be used to learn a classification func-

tion.

The next two sections should (i) remind the reader of basics concepts and

(ii) explain some important terms. For introductions to the field of machine

learning, the reader may refered to [112, 113, 6]. Neural networks are intro-

duced in [9, 130] and regression in [184, 116, 6].

Regression and neural networks try to learn a function

which maps from

cardinal2independent features onto one so-called dependent variable.

Both methods employ supervised-learning, i.e.

is learned by analyzing a set

(

x; g

(

)

, the so-called learning set. Normally, a second set

(

x; g

(

))

(

x; g

(

))

, the so-called testset, is given which is

used to examine the quality of the learned function.

Regression and neural networks respectively use a function

to approxi-

mate

should possess two features: (i) A resemblance to

and (ii) its struc-

ture should take the learning process into consideration. Therefore, the user

has to know for both methods the general structure of

. This structure is used

to choose an appropriate function template for

. E.g. regression often uses

the linear function

((

;:::;x

)) =



where

. Regression and

neural networks have also in common that they try to minimize the same error

function

(

(

x;g

(

))

(

)



(

))

1In this work, the term “neural network” denotes feed-forward neural nets which use back-

propagation as a learning method.

2See section 5.1.

131

132

APPENDIX A. MACHINE LEARNING

Parenthesis: In section 5.1, a so-called weighted linear sim-

ilarity function of interaction level



(see definition 5.1.2)

has been introduced. There, each



defines one of









elementsain

, i.e.









. Both regres-

sion and neural networks can learn weighted linear similarity

functions of arbitrary interaction levels



p; A

are defined by definition 5.1.2.

A.1 Learning a Classification Function

In order to apply regression or neural networks,

(

)

must be a cardi-

nal value, i.e.

. Nevertheless, regression and neural networks are

also used to learn a classification function. Such a function maps from a list of

features

onto a set of labels

;:::;l

, i.e.

;:::;l

. For

this, a set of new function

;:::;g

is introduced where

The functions

are now used to solve the classification problem for given

features

(

)=max

(

)



with



Since

(

)

denotes a cardinal value, regression and neural net-

works can now be used to learn the functions

. For this,

new learning set

;:::;L

are created as follows:

(

x; g

(

))

(

)

g[f

(

x; g

(

))

(

where



Regression and neural networks can now use the learning set

to learn

the function

. Obviously, the problem of learning a classification function for

classes has been reduced to the problem of learning

functions which map

onto cardinal values.

Input Regression/

Neural Net

g1(x)

g2(x)

max

Maximization

Classification

Figure

A.1:

Using regressionor neural networks to learn a classification function.

This is also illustrated by figure A.1: For each class

a classification func-

tion

is learned using regression or neural networks. The classification of an

input vector

= (

;:::;x

)

is defined by the class

corresponding to the

function

with the highest value

(

)

, i.e.



(

)



(

)

A.2. REGRESSION

133

A.2 Regression

LinearRegressiontriestoapproximate

(

)

bymeansofalinearfunction

(



where

;:::;x

)

. For this, weights

are computed

which minimize the term



. Figure A.2 gives an example: The linear func-

tion approximatesthe boxes whose positions are defined by the given patterns

(

x; g

(

))

. The dotted lines are the respective errors.

Figure

A.2:

A learned linear function and the respective errors (dotted lines).

By using matrices, linear regression can be written down more clearly. It is

presumed that

=((

(1)

(

(1)

))

;:::;

(

)

(

)

)))



(

)

, i.e.

has

rows and

columns. Row

is defined by the

feature list

(

)



(

;

. Row

consists of

(

)



(

;

. Row

consists of



(

;

. Row

denotes the error defined by

(

)



(

)

Obviously,



holds.

canbecomputedbytheterm



)





. For this, the matrix



must be invertible, i.e. the columns

must be linear independent3. Thus, linear regression requires

(

)

time. Proofs and further explanations are given in [184, 116, 6].

Several methods exist which evaluate the quality of a regression result:



Coefficient of Determination: This coefficient measures the “goodness

of fit” of

, i.e. is answers the question of how closely

approximates the

patterns in

. It is defined as



j

(

)





)



j

(

)





)

where





j

(

)

has values between

(no fit) and

(perfect fit, i.e. all nodes are on the

line).

3If the columns are not linear independent either neural networks are used or columns are

deleted.

134

APPENDIX A. MACHINE LEARNING



F-Test: The F-test examines whether the regression result can also be

applied to patterns which are not part of

. For this, the term

emp



(

j



)

is computed. Using the F-distribution, a standard statistical

test is carried out.



t-Test: A t-test can be carried out for each feature separately. This statisti-

cal hypothesis test examines whether a single feature is necessary for the

regression result. Details can be found in [6].

A.3 Neural Networks

For this work, only one special class of neural nets is used: Multilayer, feed-

forwardperceptronnetworks which usebackpropagationasalearningmethod.

Such a neural network consists of functional units, so-called perceptrons,

which are arranged in layers. Let

;



;



denotes the

perceptron in the

th layer. Obviously,

layers exist and layer

consists of

perceptrons. Perceptrons in layer





can only be connected

to other perceptrons in layer

; connections are directed towards layer

. A connection

(

)

between two perceptrons

and

is weighted by a

factor

(

)

. Layer

is called input-layer, layer

is called output-layer.

denotes the set of all perceptrons and

denotes the set of all connections.

Figure A.3 (left side) shows an exemplary neural network.

Layer 1

Layer 2

Layer 3

0.2

0.4

0.6

0.8

–6 –4 –2 2 4 6

Figure A.3:

Left side: A multilayer neural net, right side: the sigmoidal threshold func-

tion

Each perceptron

is defined by two states: its activation

net

(

)

and its

output

(

)

. While the activation designates an internal perceptron state, the

output of a perceptron can be seen by all of its descendants. The activation of

a perceptron

is defined as follows:

net

(

)

(

)

(

)

Theoutput ofaperceptron

isdefinedas

(

net

(

))

where

denotes

a function

. In this work, the sigmoidal function



used. This function can be seen on the right side of figure A.3. The readermay

note that the structure of

is coded into the topology of the neural network.

The features given in the learning set

are used as inputs for the percep-

trons of the input-layer. Using a feature value

as the input for a perceptron

A.3. NEURAL NETWORKS

135

in the input layer means that the output

(

)

is set to the feature

value

In this work, only one perceptron

is needed in the output layer. For a

given feature set

, the output function

(

)

should approximate

(

)

As mentioned above, the learning set comprises both features and respective

values of

. Therefore, a supervised learning strategy can be applied.

Backpropagation uses a gradient optimization method to minimize



. For

this, the individual error terms



(

x; g

(

)) = (

(

)



(

))

are minimized iter-

atively (

(

(

x;g

(

))



(

x; g

(

))

(

)

denotes the output of

when the

features

are used as inputs for the input-layer perceptrons.

Since backpropagation, unlike regression, does not minimize



in one step

but minimizes the terms



one after the other, the order of the elements of

may influence the learning process.

Now, all patterns are presented to the neural network, i.e. they are used as

inputs for the perceptrons in the input-layer. For each pattern

(

x; g

(

))

the weights of a connection

(

)

areadjusted by adding the term

Æ

(

)

(

)

to the hitherto weight.



is a given learning factor which sometimes decreases

over time.

(

)

is an error term for

and is defined as follows:

(

)(1



(

))(

(

)



(

))

i.e.

belongs to

the output layer

(

)(1



(

))

(

)

(

)

(

)

else

Proofs and further explanations for this learning algorithm can be found in

[9].The quality of the result is often evaluated by means of the so-called mean

squared error:

MSE

(

x;g

(

))



(

x; g

(

))

(

)

The reader may note that, by having more than one perceptron in the out-

put layer, several functions sharing the same input can be learned at the same

time. E.g. each of the functions

, which are used for classification tasks (see

the descriptions above), can be modeled by one perceptron of the same neu-

ral network. That way, the functions

can share parts of their computations

(since they share parts of the network).

136

APPENDIX A. MACHINE LEARNING

Appendix B

Some Graph-Theoretic

Definitions

Definition B.0.1 (Walk)

Let

V; E

)

be a graph and

(

;:::;e

)

a set of edges. If there exists a set of

nodes

(

;:::;v

)

with



, the set of edges is called walk. If

holds, it is called a closed walk.

Definition B.0.2 (Path, Cycle)

Let

V; E

)

be a graph and

(

;:::;e

)

a set of edges. If there exists a set

of nodes

(

;:::;v

)

with



and



i; j



, the set of

edges is called path. If

and



hold, the set of edges is called a cycle.

Definition B.0.3 (Length of a Path)

Let

V; E

)

be a graph and

;:::;v

)

a path in

. The length of

the path

is defined as





(

)

Definition B.0.4 (Distance

dist

)

Let

V; E

)

be a graph. The distance

dist

(

)

between

defined as the length of the shortest path between

and

Definition B.0.5 (Maximum Flow)

Let

V; E

)

be a graph. The maximum flow between

is defined

as the maximum number of edge-disjunctive paths in

from

, i.e. as:

max(

(

;:::;v

)

;:::;

(

;:::;v

)

For a weighted graph

V; E

)

the maximum flow between

is defined very similar, only is every path now assessed by the minimum

weight of an used edge:

max(

(

;:::;v

)

;:::;

(

;:::;v

)

(

))

(

)

denotes the smallest weight of an edge in

, i.e.

((

;:::;v

)) = min





(

))

Details can be found in [80].

Definition B.0.6 (Induced Subgraph)

Let

= (

V; E

)

be a graph and



. The graph

(

) = (

C; E

)

with

is called the subgraph of

induced by

137

138

APPENDIX B. SOME GRAPH-THEORETIC DEFINITIONS

Definition B.0.7 (Cut of a Subgraph)

Let

V; E

)

be a graph and



. The cut of

(

cut

(

)

) is defined as:

cut

(

jff

Definition B.0.8 (Tree)

An undirected graph

V; E

)

is called a tree iff (

)

is connected and (

)

contains no cycles.

Bibliography

[1] A.Becks, S.Sklorz, and C.Tresp. Semantic structuring and visual query-

ing of document abstracts in digital libraries. In Second European Confer-

ence on Research and Advanced Technology for Digital Libraries (ECDL’98),

1998.

[2] R. Agrawal, T. Imielinski, and A. Swami. Mining association rules be-

tween sets of items in large databases, 1993.

[3] R. Agrawal and R. Srikant. Fast algorithms for mining association rules,

1994.

[4] D. Aha. Tolerating noisy, irrelevant, and novel attributes in instance-

based learning algorithms. International Journal of Man-Machine Studies,

1992.

[5] J. Aslam, K. Pelekhov, and D. Rus. Using star ckusters for filtering. In

9th International Conference on Information Knowledge Management (CIKM

2000), 2000.

[6] K. Backhaus, B. Erichson, W. Plinke, and R. Weber. Multivariate

Anaylsemethoden. Springer, 1996.

[7] T. Bailey and J. Cowles. Cluster definition by the optimization of simple

measures. IEEE Transactions on Pattern Analysis and Machine Intelligence,

September 1983.

[8] G. D. Battista, P. Eades, R. Tamassi, and I. G. Tollis. Graph Drawing -

Algorithms for the Visualization of Graphs. Prentice Hall, 1999.

[9] R. Beale and T. Jackson. Neural computing. Institute of Physics Publish-

ing, Bristol, Philadephia, 1990.

[10] R. A. Becker, S. G. Eick, and A. R. Wilks. Visualizing network data. IEEE

Transactions on Visualization and Computer Graphics, 1(1), march 1995.

[11] K. Überla. Faktorenanalyse. Springer, 1977.

[12] M. Bernard. On the automated drawing of graphs. In 3rd Caribbean Conf.

on Combinatorics and Computing, 1981.

[13] M. Berry and G. Linoff. Data Mining Techniques. Wiley Computer Pub-

lishing, 1997.

139

140

BIBLIOGRAPHY

[14] M. Bertold and D. Hand, editors. Intelligent Data Analysis: An Introduc-

tion. Springer Verlag, 1999.

[15] L. Bielawski and J. Boyle. Electronic document Management Systems.

Prentice-Hall, 1997.

[16] A. Bonzano, P. Cunningham, and B. Smyth. Using introspective learning

to improve retrieval in cbr: A case study in air traffic control. In Proceed-

ings of the Second ICCBR Conference, 1997.

[17] I. Borg and P. Groenen. Modern Multidimensional Scaling. Springer, 1997.

[18] M.Bramer,editor. Knowledge discoveryand data mining: theory and practice.

IEEE Books, 1999.

[19] J. Branke,F. Bucher, and H.Schmeck. Using geneticalgorithms for draw-

ing undirected graphs. In Proc. Third Nordic Workshop on Genetic Algo-

rithms and thier Applications, 1997.

[20] K. Brodlie, L. Carpenter, R. Earnshaw, J. Gallop, R. Hubbold, A. Mum-

ford, C. Osland, and P. Quarendon, editors. Scientific Visualization.

Springer, 1992.

[21] R. Cannon, J. Dave, and J. Bezdek. Efficient implementation of the fuzzy

c-means clustering algorithms. IEEE Trans. Pattern Analysis Machine In-

telligence, 8, 1986.

[22] M. Chalmers and P. Chitson. Bead: Explorations in information visual-

ization. In 15th Int’l SIGIR, 1992.

[23] W. Cleveland. The Elements of Graphing Data. Wadsworth Advanced

Books and Software, 1985.

[24] W. Cleveland. Visualizing Data. AT&T Bell Laboratories, 1993.

[25] H.CliffordandW. Stephenson. An IntroductiontoNumericalClassification.

Academic Press, 1975.

[26] A. Constantine and J. Gower. Graphical representation of asymmetric

matrices. Applied Statistics, 1978.

[27] T. O. Council. http://www.olapcouncil.org/.

[28] R. H. Creecy, B. M. Masand, S. Smith, and D. Waltz. Trading mips and

memory for knowledge engineering. Communications of the ACM, 35,

1992.

[29] R. Davidson and D. Harel. Drawing graphs nicely using simulated an-

nealing. ACM Trans. Graph., 15(4), 1996.

[30] J. de Leeuw and G. Michailides. Graph layout techniques and multidi-

mensional data analysis. Statistics Series 248, UCLA, 1999.

[31] R. Diekmann, B. Monien, and R. Preis. Using helpful sets to improve

graph bisections. Technical Report tr-rf-008-94,University of Paderborn,

1994.

BIBLIOGRAPHY

141

[32] C. Duncan, M. Goodrich, and S. Kobourov. Balanced aspect ratio trees

and their use for drawing very large graphs. In S. Whitesides, editor,

Graph Drawing, Lecture Notes in Computer Science. Springer, 1998.

[33] P. Eades. A heuristic for graph-drawing. Congressus Numerantium,

42:149–160,1984.

[34] P. Eades. Drawing free trees. Bulletin of the Institute for Combinatorics

and its Application, 1992.

[35] P. Eades, R. Cohen, and M. Huang. Online animated graph drawing for

web navigation. In Proc. Graph Drawing ’97. Springer, 1997.

[36] P. Eades and Q.-W. Feng. Multilevel visualization of clustered graphs.

In S. North, editor, Graph Drawing, Lecture Notes in Computer Science.

Springer, 1996.

[37] P. Eades and Q.-W. Feng. Drawing clustered graphs on an orthogonal

grid. In G. DiBattista, editor, Graph Drawing, LectureNotes in Computer

Science. Springer, 1997.

[38] R. Earnshaw and N. Wiseman. AN Introductory Guide to Scientific Visual-

ization. Springer, 1992.

[39] L. A. R. Eli B. Messinger and R. R. Henry. A divide-and-conquer algo-

rithm for theautomatic layout of largedirectedgraphs. IEEETransactions

on Systems, Man, and Cybernetics, January/February 1991.

[40] R. Elmasri and S. Navathe. Fundamentals of Database Systems. Ben-

jamin/Cummings Publ. Company, 1994.

[41] B. S. Everitt. Cluster analysis. Eward Arnolds, New York, Toronto, 1993.

[42] C. Faloutsos and K.-I. Lin. Fastmap: A fast algorithm for indexing, data-

mining and visualization of traditional and multimedia datasets. In SIG-

MOD Conference, 1995.

[43] U. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy, editors.

Advances in Knowledge Discovery and Data Mining. AAAI/MIT Press,

1996.

[44] A. Flexer. On the use of self-organizing maps for clustering and visu-

alization. Principles of Data-Mining and Knowledge Discovery (PKDD99),

1999.

[45] K. Florek, J. Lukaszewiez, J. Perkal, H. Steinhaus, and S. Zubrzchi. Sur

la liason et la division des points d’un ensemble fini. Colloquium Methe-

maticum, 2, 1951.

[46] K. D. Forbus and J. de Kleer. Building Problem Solvers. MIT Press, Cam-

bridge, MA, 1993.

[47] T.FruchtermanandE.Reingold. Graph-drawingbyforce-directedplace-

ment. Software-Practice and Experience, 21(11):1129–1164,1991.

142

BIBLIOGRAPHY

[48] G. Furnas and A. Buja. Prosection views: Dimensional interference

through sections and projections. Journal of Computational and Graphical

Statistics, 3(4), 1994.

[49] R. T. G. DiBattista, P. Eades andI. Tollis. Algorithms for drawing graphs:

An annotated bibliography. Computational Geometry, 4, 1994.

[50] E. Gansner, E. Koutsofios, S. North, and K. Vo. A technique for drawing

directed graphs, 1993.

[51] M. R. Garey and D. S. Johnson. Computers and Intractability: A Guide to

the Theory of NP-Completeness. Freeman, San Francisco, CA, 1979.

[52] I. Gath and B. Geva. Unsupervised optimal fuzzy clustering. IEEETrans-

actions on Pattern Analysis and Machine Intelligence, 11(7), 1989.

[53] Markus Rümekasten. Hybride, tolerante Synchronisation für die verteilte

und parallele Simulation gekoppelter Rechnernetze. PhD thesis, University

of Paderborn, 1997.

[54] E. Godehardt. Graphs as Structural Models, volume 4. Vieweg, 1988.

[55] J. Gower. Measures of similarity, dissimilarity and distance. In S. Kotz,

N. Johnson, and C. Reads, editors, Encyclopedia of Statistical Science, vol-

ume 5. John Wiley and Sons, 1985.

[56] S. Guha, R. Rastogi, and K. Shim. Cure: An efficient clustering algorithm

for large databases. Proceedings of ACM SIGMOD International Conference

on Management of Data, 1998.

[57] R. Hadany and D. Harel. A multi-scale algorithm for drawing graphs

nicely. In P. Widmayer, G. Neyer, and S. Eidenbenz, editors, Graph Theo-

retic Concepts in Computer Science (WG ’99). Springer, 1999.

[58] F. Halsall. Data Communications, Computer Networks and Open Systems.

Addison-Wesley, 1996.

[59] J. Han and M. Kamber. Data Mining : Concepts and Techniques. Morgan

Kaufmann, August 2000.

[60] J. Hanke andJ. Reich. Kohonen map as avisualization tool for the analy-

sis of protein sequences: multiple alignments, domains and segments of

secondary structures. Computer Applications in the Biosciences, 12(6),1996.

[61] D. Harel and Y. Koren. A fast multi-scale method for drawing large

graphs. In Proc. Working Conf. on Anvanced Visual Interfaces. ACM Press,

2000.

[62] J. Hartung. Statistik. Oldenbourg, 1999.

[63] S. R. Hedberg. Smart data miners are cashing in on valuable information

buried in private and public data sources. Byte Magazine, October 1995.

[64] M. Heinrich and E. W. Jüngst. A Resource-based Paradigm for the Con-

figuring of Technical Systems for Modular Components. In Proc. CAIA

’91, pages 257–264,1991.

BIBLIOGRAPHY

143

[65] B. Hendrickson. Multidimensional spectral load balancing. Technical

Report SAND93-0074,Sandia National Laboratories, 1993.

[66] B. Hendrickson and R. Leland. A multilevel algorithm for partitioning

graphs. Technical Report SAND 93-1301, Sandia National Laboratories,

1993.

[67] M.Hoffmann. Zur Automatisierung des Designprozesses fluidischer Systeme.

eingereichtedissertation, University of Paderborn,Department of Math-

ematics and Computer Science, 1999.

[68] K. Holm. Die Befragung 3. UTB, 1976.

[69] N. Howe and C. Cardie. Examining locally varying weights for nearest

neighbor algorithms. In Proceedings of the Eleventh ICML. Morgan Kauf-

mann, 1997.

[70] M. Huang and P. Eades. A fully animated interactive system for cluster-

ing and navigating huge graphs. In S. Whitesides, editor, Graph Drawing,

Lecture Notes in Computer Science. Springer, 1998.

[71] B. Huffaker, J. Jung, D. Wessels, and K. Claffy. Visualiza-

tion of the growth and topology of the nlanr caching hierarchy.

http://www.caida.org/Tools/Plankton/Paper.

[72] B. Huffaker,E. Nemeth, and K.Claffy. Otter: A general-purposenetwork

visualization tool. http://www.caida.org/Tools/Otter/Paper.

[73] IJCAI Workshop on "Learning about User", 1999.

[74] A. Inselberg and B.Dimsdale. Parallel coordinates: A tool for visualizing

multi-dimensional geometry. In Visualization, 1990.

[75] M. Jambu. Explorative Datenanalyse. Gustav Fischer Verlag, 1992.

[76] R. Jayakumar.

(

)

algorithms for graph planarization. IEEE Trans.

Comp.-Aided Design, 8, 1989.

[77] D. Johnson, C. Aragon, L. McGeoch, and C. Schevon. Optimization by

simulated annelaing; an experimental evaluation; part 1, graph parti-

tioning. Operations Research, 37(6), 1989.

[78] S. Johnson. Hierarchical clustering schemes. Psychometrika, 32, 1967.

[79] M. J"unger, G. Rinaldi, and S. Thienel. Practical performance of efficient

minimum cut algorithms. Technical Report 97.271, Angewandte Mathe-

matik und Informatik - Univeristät zu Köln, 1997.

[80] D. Jungnickel. Graphen, Netzwerke und Algorithmen. BI Wissenschaftsver-

lag, 1990.

[81] S. T. K. Sugiyama and M. Toda. Methods for visual understanding of

hierarchical system structures. IEEE Transactions on Systems, Man, and

Cybernectics, 11(2), 1981.

[82] H. Kaiser. Formulas for component scores. Psychometrika, 27, 1962.

144

BIBLIOGRAPHY

[83] T. Kamada and S. Kawai. An algorithm for drawing general undirected

graphs. Information Processing Letters, 31:7–15, 1989.

[84] M. Kanehisa. Post-genom Informatics. Oxford University Press, 2000.

[85] D. Karger and C. Stein. A new approach to the minimum cut problem.

Journal of the ACM, 43:601–640,1996.

[86] G. Karypis, E.-H. Han, and V. Kumar. Chameleon: A hierarchical clus-

tering algorithm using dynamic modeling. Technical Report Paper No.

432, University of Minnesota, Minneapolis, 1999.

[87] G. Karypsis and V. Kumar. A fast and high quality multilevel scheme

for partitioning irregular graphs. Technical Report 95-035, University of

Minnesota, Minneapolis, 1995.

[88] J. Kay, editor. proc. of the seventh international conference on user modeling

(UM99). Springer, 1999.

[89] Data for the kdd cup 1998, provided by the paralyzed veterans of amer-

ica. http://kdd.ics.uci.edu/databases/kddcup98/kddcup98.html,1998.

[90] D. Keim, H.-P. Kriegel, and M. Ankerst. Recursive pattern: A technique

for visualizing very large amounts of data. In Proc. Visualization, 1995.

[91] D. A. Keim. Databases and visualization. In Proc. ACM SIGMOD Int.

Conf. on Management of Data, Montreal, Canada, 1996. Tutorial.

[92] D. A. Keim and H.-P. Kriegel. Visualization techniques for mining large

databases: A comparison. IEEE Transactions on Knowledge and Data Engi-

neering, 1996.

[93] B. Kernighan and S. Lin. Partitioning graphs. Bell Laboratories Record,

January 1970.

[94] H. Kim. A semi-supervised document clustering technique for informa-

tion organization. In 9thInternational Conference on InformationKnowledge

Management (CIKM 2000), 2000.

[95] K. Kira and L. Rendell. A practical approach to feature selection. In

Proceedings of the Ninth International Conference on Machine Learning, 1992.

[96] T. Kohonen. Self Organization and Assoziative Memory. Springer, 1990.

[97] J. Kolodner. Case-based reasoning. Morgan Kaufmann, San Mateo, CA,

1993.

[98] W. L. G. Koontz andK. Fukunaga. Asymptotic analysis of a nonparamet-

ric clustering technique. IEEE Transactions on Computers, c-21(9), 1972.

[99] W. L. G. Koontz and K. Fukunaga. A nonparametric valley-seeking tech-

nique for cluster analysis. IEEE Transactions on Computers, c-21(2), 1972.

[100] A. D. Kraus. Matrices for Engineers. Hemisphere, 1987.

BIBLIOGRAPHY

145

[101] J. B. Kruskal and J. B. Seery. Designing network diagrams. In Proc. First

general Conference on SocialGraphics.U.S.Department ofthe Census, 1980.

[102] B. Kuipers. Qualitative Simulation. Artificial Intelligence, 29, 1986.

[103] J. LeBlanc, M.Ward, and N. Wittels. Exploring n-dimensional databases.

In Proc. Visualization, 1990.

[104] T. Lengauer. Combinatorial Algorithms for Integrated Circuit Layout.B.G.

Teubner, Stuttgart, 1990.

[105] M. Lewis-Beck, editor. Factor Analysis and Related Techniques. SAGE Pub-

lications Toppan Publications, 1994.

[106] X. Lin, D. Soergel, and G. Marchionini. A self-organizing semantic map

for information retrieval. In annual international ACM SIGIR conference on

research and development in information retrieval, 1991.

[107] I. Lisitsyn and V. Kasyanov. Higres - visualization system for clustered

graphs and graph algorithms. In J. Kratochvil, editor, Graph Drawing,

Lecture Notes in Computer Science. Springer, 1999.

[108] G. Liu. Introduction to Combinatorical Mathematics. McGraw Hill, 1968.

[109] J. McCarthy. private communications. Technical report, Stanford Uni-

versity, 1956.

[110] B. Meyer. Self-organizinggraphs -a neuralnetwork perspective of graph

layout. In S. Whitesides, editor, Graph Drawing 1998. Springer, 1998.

[111] S. Meyer zu Eißen. Natürliche Graphpartitionierung am Beispiel von

Aufgabenmodellen in Unternehmensnetzwerken. Master’s thesis, Uni-

versity of Paderborn, 2000.

[112] R. Michalski, J. G. Carbonell, and T. Mitchell, editors. Machine Learning:

An Artificial Intelligence Approach, volume 1. Tioga, Palo Alto, California,

1983.

[113] R. S. Michalski, I. Bratko, and M.Kubat, editors. Machine Learning & Data

Mining: Methods and Applications. John Wiley, 1998.

[114] B. Mirkin. Mathematical Classification and Clustering. Kluwer Academics

Publishers, 1996.

[115] S. Mitchell Hedetniemi, E. Cockayne, and S. Hedetniemi. Linear Algo-

rithms for Finding the Jordan Center and Path Center of a Tree. Trans-

portation Science, 15:98–114,1981.

[116] R. H. Myers. Classical and Modern Regression with Applications. Duxbury

Press, Boston, 1986.

[117] H. Nagamochi, T. Ono, and T. Ibaraki. Implementing an efficient mini-

mum capacity cut algorithm. Mathematical Programming, 67, 1994.

[118] NetFactual. http://www.netfactual.com/, January 2001.

146

BIBLIOGRAPHY

[119] F. J. Newbery. Edge concentration: A method for clustering directed

graphs. Software Engineering Notes, ACM Press, 14(5), 1997.

[120] A. Newell. IPL-V Programmer’s Manual. Rand Corporation, rm-3739-rc

edition, 1963.

[121] R. Ng and J. Han. Efficient and effective clustering methods for spatial

data mining. Proc. of the VLDB Conference, Santiago, Chile, 1994.

[122] O. Niggemann and B. Stein. Visualization by example. In ProceedingsAd-

aptivität und Benutzermodellierung in interaktiven Softwaresystemen (ABIS-

2000), 2000.

[123] O. Niggemann, B. Stein, and M. Suermann. On Resource-based

Configuration—Rendering Component-PropertyGraphs. In J.Sauerand

B. Stein, editors, 12. Workshop “Planen und Konfigurieren”, tr-ri-98-193,

Paderborn, Apr. 1998. University of Paderborn, Department of Mathe-

matics and Computer Science.

[124] O. Niggemann, B. Stein, and J. Tölle. Visualization of network traffic. In

IEEE International Conference on Communications (ICC 2001), 2001.

[125] O. Nigggemann and B. Stein. A meta heuristic for graph drawing. Pro-

ceedings of the Working Conference on AdvancedVisual Interfaces (AVI 2000),

2000.

[126] B. Nour-Omid, A. Raefsky, and G. Lyzengaomid:1986. Solving finite el-

ement equations on concurrent computers. In Parallel computers and thier

impact on mechanics. AM. Soc. Mech. Eng., New York, 1986.

[127] The network simulator - ns-2. http://www.isi.edu/nsnam/ns/.

[128] D. Ostry. Drawing graphs on convex surfaces. Master’s thesis, Dept. of

Computer Science, University of Newcastle, 1996.

[129] T. Ozawa and H. Takahashi. Graph Theory and Algorithms, volume 108 of

Lecture Notes in Computer Science, chapter A Graph-planarization Algo-

rithm and its Application to Random Graphs. Springer-Verlag,1981.

[130] Y.-H. Pao. Adaptive Pattern Recognition and Neural Networks. Addison-

Wesley, Reading, MA, 1989.

[131] A. Papakostas and I. Tollis. Algorithms for area-efficient orthogonal

drawings. Comp. Geom. Theory Appl., 9(1-2), 1998.

[132] J. Park, M. Lappe, and S. A. Teichmann. Mapping protein family interac-

tions. submitted, 2000.

[133] J. Pearl. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible

Inference. Morgan Kaufmann, 1988.

[134] K. Pearson. Contribution to the mathematical theory of evolution. Phil.

Trans., 1894.

BIBLIOGRAPHY

147

[135] X. L. Peter Eades and W. F. Smyth. A fast and effictive heuristic for the

feedback arc set problem. In Information Processing Letters 47, Elsevier

Science Publishers, 1993.

[136] A. Plana. http://altaplana.com/olap/.

[137] Proceedings Adaptivität und Benutzermodellierung in interaktiven Software-

systemen (ABIS-2000),2000.

[138] A. Reckmann. Ähnlichkeitsmaße und deren Parametrisierung für die

fallbasierte Diagnose am Beispiel einer medizinischen Anwendung.

Master’s thesis, University of Paderborn, 1999.

[139] E. Reingold and J. Tilford. Tidier drawing of trees. IEEE Transactions on

Software Engineering, 7(2):223–228,1981.

[140] J. Rekimoto and M. Green. The information cube: Using transparency in

3dinformattionvisualization. InProc.3rdAnnualWorkshoponInformation

Technologies and Systems, 1993.

[141] J. Ritsert, E. Stracke, and F. Heider. Grundzüge der Varianz- und Faktoren-

analyse. Campus Verlag, 1976.

[142] S. J. Roberts. Parametic and non-parametric unsupervised cluster ana-

lyis. Pattern Recognition, April 1996.

[143] T. Roxborough and Arunabha. Graph Clustering using Multiway Ra-

tio Cut. In S. North, editor, Graph Drawing, Lecture Notes in Computer

Science, Springer Verlag, 1996.

[144] M. R"umekasten. ATLAS - Eine Kurzeinf"uhrung. University of Pader-

born, March 1992.

[145] R. Sablowski andA. Frick. Automatic GraphClustering. In S.North, edi-

tor, Graph Drawing, Lecture Notes in Computer Science, Springer Verlag,

1996.

[146] S. L.Salzberg. A nearesthyperrectangle learningmethod. MachineLearn-

ing, 1991.

[147] G. Sander. Graph Layout through the VCG Tool. Technical Report

A/03/94, 1994.

[148] G. Sander. Layout of Compound Directed Graphs. Technical Report

A/03/96, 1996.

[149] A. Schulz. Graphenanalyse hydraulischer Schaltkreise zur Erkennung

von hydraulischen Achsen und deren Kopplung. Master’s thesis, Uni-

versity of Paderborn, 1997.

[150] Structural classification of proteins. http://scop.mrc-

lmb.cam.ac.uk/scop/index.html.

[151] B. Seck. Ein Simulationsmodel zur Auswertung aktiver Komponenten

in einem Konfigurationssystem für Rechnernetze. Master’s thesis, Uni-

versity of Paderborn, 1997.

148

BIBLIOGRAPHY

[152] B. Shneiderman. Tree visualizations with treemaps: A 2d space-filling

approach. ACM Transactions on Graphics, 11(1), 1992.

[153] A. Shoshani. Olap and statistical databases: Similarities and differences.

In Proc. ACM PODS, 1997.

[154] P. Sneath. The application of computers to taxonomy. J. Gen. Microbiol.,

17, 1957.

[155] P. Sneath and R. Sokal. Principles of Numerical Taxonomy. W.H. Freeman

Comp., 1973.

[156] H. Sp"ath. Cluster-Analyse-Algorithmen. R. Oldenbourg Verlag, 1975.

[157] C. Stanfill and D. Waltz. Toward memory-based learning. Communica-

tions of the ACM, 29:1213–1228,1986.

[158] B. Stein. Supporting Hydraulic Circuit Design by Efficiently Solving the

Model Synthesis Problem. In International ICSC Symposium on Engineer-

ing of Intelligent Systems (EIS 98), pages 1274–1280.ICSI Academic Press,

Feb. 1998.

[159] B. Stein and M. Hoffmann. On adaptation in case-based design. In Pro-

ceedings of the third International ICSC Symposia on Soft Computing (SOCO

’99). ICSC Academic Press, 1998.

[160] B. Stein andO.Niggemann. 25.WorkshoponGraphTheory, chapterOn the

Nature of Structure and its Identification. Lecture Notes on Computer

Science, LNCS. Springer, Ascona, Italy, July 1999.

[161] B. Stein and O. Niggemann. Generation of similarity measures from dif-

ferent sources. The Fourteenth International Conference on Industrial

and Engineering Applications of Artificial Intelligence and Expert Sys-

tems (IEA/AIE-2001),2000.

[162] B. Stein, O. Niggemann, and U. Husemeier. Learning Complex Similar-

ity Measures. In Jahrestagung der Gesellschaft für Klassifikation, Bielefeld,

Germany, 1999. Springer.

[163] B. Stein and A. Schulz. Topological analysis of hydraulic systems. Tech-

nical Report tr-ri-98-197,University of Paderborn, 1998.

[164] B. Stein and E. Vier. Recent Advances in Information Science and Technology,

chapter An Approach to Formulate and to Process Design Knowledge

in Fluidics. Second Part of the Proceedings of the 2nd IMACS Interna-

tional Conference on Circuits, Systems and Computers, CSC ’98. World

Scientific, London, 1998.

[165] P. Struss. Fundamentals of Model-BasedDiagnosis of Dynamic Systems.

In Proceedings of the 15th International Joint Conference on Artificial Intelli-

gence, Nagoya, Japan, 1997.

[166] K. Sugiyama and K. Misue. Visualization of structural information: Au-

tomatic drawing of compound graphs. IEEE Transactions on Systems,

Man, and Cybernetics, 1991.

BIBLIOGRAPHY

149

[167] K. Sugiyama and K. Misue. Graph drawing by magnetic-spring model.

J. Viusal Lang. Comp., 6(3), 1995.

[168] K. Sugiyama and K. Misue. A simple and unified method for draw-

ing graphs. In R. Tamassia and I. Tollis, editors, Proc. Graph Drawing.

Springer, 1995.

[169] K. Supowit and K. Misue. The complexity of drawing trees nicely. Acta

Informatica, 18:359–368,1983.

[170] A. S. Tanenbaum. Computer Networks. Prentice Hall, 1996.

[171] L. Technologies, R. B. Systems, and SGI. Im-

proved customer control through churn management.

http://www.sgi.com/software/mineset/tech_info/churn.html.

[172] W. Togerson. Multidimensional scaling: I. theory and method. Psychome-

trika, 1952.

[173] E. Tufte. The Visual Display of Quantitative Information. Graphcs Press,

1983.

[174] J. Tukey. Exploratory Data Analysis. Addison-Wesly, 1977.

[175] D. Tunkelang. A Numerical Optimization Approachto General Graph Draw-

ing. PhD thesis, School of Computer Science, Carnegie Mellon Univer-

sity, Pittsburgh, 1999.

[176] J. Ullman. Priciples of Database and Knowledge-Based Systems, Vol. I/II.

Computer Science Press, 1988.

[177] K. Urahama. Unsupervised learning algorithm for fuzzy clustering. IE-

ICE Trans. Inf. and Syst., E76-D(3), 1993.

[178] A. Wagner. Optimierungsverfahren für das Graph-Layout Problem.

Master’s thesis, Universität Paderborn, 2001.

[179] X. Wang and I. Miyamoto. Generating customized layouts. In Graph

Drawing. Springer, 1995.

[180] S. Wess. Fallbasiertes Problemlösen in wissensbasierten Systemen

zur Entscheidungsunterstützung und Diagnostik: Grundlagen, Systeme

und Anwendungen. Technical report, Sankt Augustin: Infix, 1996.

[181] J. H. Wilkinson. The Algebraic Eigenvalue Problem. Clarendon Press, 1965.

[182] R. Wilson and M. Span. A new approach to clustering. Pattern Recogni-

tion, 23(12), 1990.

[183] J. Wise, J. Thomas, K. Pennock, and D. Lantrip. Visualizing the non-

visual: Spatial analysis and interaction with information from text docu-

ments. In Proc. IEEE Inf. Visualization, 1995.

[184] T.Wonnacott and R.Wonnacott. Regression: asecond course in statistics.

John Wiley & Sons, New York, Chichester/Brisbane/Toronto, 1981.

150

BIBLIOGRAPHY

[185] Z.WuandR.Leahy. Anoptimalgraphtheoreticapproachtodatacluster-

ing: Theory and its application to image segmentation. IEEE Transactions

on Pattern Analysis and Machine Intelligence, November 1993.

[186] J.-T. Yan and P.-Y. Hsiao. A fuzzy clustering algorithm for graph bisec-

tion. Information Processing Letters, 52, 1994.

[187] J. York and S. Bohn. Clustering and dimensionality reduction in spire. In

Automated Intelligence Processing and Analysis Symposium, 1995.

[188] C. T. Zahn. Graph-theoretical methods for detecting and describing

gestalt clusters. IEEE Transactions on computers, C-20(1),1971.