Maintaining Semantic Networks: Challenges and Algorithms [original]

Maintaining Semantic Networks:

Challenges and Algorithms

Bernd Michelberger

University of Applied Sciences

Ravensburg-Weingarten,

Weingarten, Germany

bernd.michelberger@

hs-weingarten.de

Klaus Ulmschneider

University of Ulm, Institute of

Artificial Intelligence,

Ulm, Germany

klaus.ulmschneider@

uni-ulm.de

Birte Glimm

University of Ulm, Institute of

Artificial Intelligence,

Ulm, Germany

birte.glimm@

uni-ulm.de

Bela Mutschler

University of Applied Sciences

Ravensburg-Weingarten,

Weingarten, Germany

bela.mutschler@

hs-weingarten.de

Manfred Reichert

University of Ulm, Institute of

Databases and Information

Systems, Ulm, Germany

manfred.reichert@

uni-ulm.de

ABSTRACT

Knowledge workers are confronted with a massive load of

data from numerous heterogeneous sources, making it dif-

ficult for them to identify the information relevant for per-

forming their tasks. Particularly challenging is the align-

ment of information with business processes. In previous

work, we introduced a Semantic Network (SN) for bridging

this gap, i.e., for identifying relations between information

and business processes. What has been neglected so far is

the maintenance of an SN in order to keep the SN consis-

tent, complete, and up-to-date. This paper tackles this issue

and extends our approach with algorithms dealing with the

maintenance of an SN. For this purpose, we identify and

classify properties of objects and relations captured in an

SN and show how these properties can be maintained. We

use a case from the automotive domain in order to demon-

strate and validate the feasibility and applicability of our

maintenance framework.

Categories and Subject Descriptors

I.2.4 [Computing methodologies]: Artificial intelligence

—Knowledge Representation Formalisms and Methods

General Terms

Algorithms

Keywords

Semantic network, evolution, maintenance

Permission to make digital or hard copies of all or part of this work for

personal or classroom use is granted without fee provided that copies are not

made or distributed for profit or commercial advantage and that copies bear

this notice and the full citation on the first page. Copyrights for components

of this work owned by others than ACM must be honored. Abstracting with

credit is permitted. To copy otherwise, or republish, to post on servers or to

redistribute to lists, requires prior specific permission and/or a fee. Request

permissions from [email protected].

iiWAS ’14, December 04 - 06 2014, Hanoi, Viet Nam

http://dx.doi.org/10.1145/2684200.2684287

1. INTRODUCTION

Knowledge workers and decision makers are confronted

with a continuously increasing load of data during their

daily work. Examples include e-mails, office files, checklists,

guidelines, fact sheets, web pages, and best practices. In

daily practice, knowledge workers and decision makers are

not only interested in getting access to this data, but also re-

quire comprehensive and aggregated information when per-

forming specific tasks in a business process context [19].

Handling such information is by far more complex and time-

consuming than just storing data because such information

can, for example, be incomplete, incorrect, unreliable, un-

structured, or outdated [18, 25].

A particular challenge is to align information with busi-

ness processes and to provide relevant information to knowl-

edge workers and decision makers. In practice, information

is not only stored in distributed and heterogeneous sources,

but also managed separately from business processes. For

example, shared drives, databases, enterprise portals, and

enterprise information systems are used to store informa-

tion. In turn, business processes and their tasks are man-

aged using process-aware information systems [24].

In such an environment, information and business pro-

cesses are often linked manually and statically, e.g., in en-

terprise portals. However, it has been shown that such en-

terprise portals rather contain complex and static content

(e.g., long lists of documents, large process maps) confusing

users [9]. Therefore, it is challenging for users to identify

relations between information and business processes as re-

quired when performing a specific process task.

In previous work, we introduced a Semantic Network (SN)

bridging the gap between information and business processes

[16]. Such an SN can be created using a bottom-up ap-

proach, i.e., starting with the integration of information and

business processes from heterogeneous sources [20]. Follow-

ing this, the integrated information and business processes

are syntactically and semantically analyzed. The resulting

SN comprises unified information objects (e.g., checklists,

guidelines, forms), process objects (e.g., pools, lanes, tasks,

gateways, events), and semantic relations (e.g., “is similar

to”, “is used by”). More specifically, information objects are

needed when performing business processes. In turn, pro-

cess objects are process elements such as tasks or gateways

that guide process-oriented work. Finally, semantic relations

allow identifying inter-linked objects in different ways, e.g.,

information objects referring to the same topic or objects

required for performing a particular process task.

An SN constitutes the basis for delivering relevant in-

formation to knowledge workers and decision makers (cf.

Fig. 1). More specifically, an SN offers an application in-

terface (cf. [8] for details) that may be queried to retrieve

required information. Based on a query (e.g., “information

objects relevant for creating a review”), the SN automati-

cally delivers respective information objects to users [20].

Semantic Relation

Information Object

Process Object

Delivering relevant

information objects

Users

Figure 1: Delivering relevant information objects

In order to provide those information objects to users fit-

ting their demands best, information and process objects in

an SN must be up-to-date, complete, and consistent, i.e.,

an SN must be maintained over time. In two case stud-

ies and an online survey [10, 17] we have already shown

that the maintenance of SNs constitutes a crucial prerequi-

site in order to continuously provide required information

to knowledge workers and decision makers. Maintenance is,

however, a non-trivial task. For example, objects may be

integrated (e.g., new guidelines are created), updated (e.g.,

a task is modified), and deleted (e.g., a checklist is no longer

valid). Likewise, relations may be established (e.g., two doc-

uments have the same author), updated (e.g., two forms

become more similar to each other), and deleted (e.g., two

documents no longer having the same author). Such changes

can happen outside of an SN (e.g., a checklist has changed

on a shared drive), which we call exogenous changes, or in-

side an SN over time (e.g., a guideline being out-of-date),

which we call endogenous changes. Both exogenous and en-

dogenous changes must be properly handled (cf. Fig. 2).

Exogenous

Changes Endogenous

Changes

Data

Source

Figure 2: Exogenous and endogenous changes

Picking up the above mentioned challenges on the mainte-

nance of SNs, the contribution is as follows: (i) We propose

a framework for maintaining SNs. Specifically, we show how

SNs evolve over time and identify characteristics of object

and relation properties and their influence with respect to

the maintenance of SNs. (ii) We introduce three algorithms

dealing with exogenous and endogenous changes of an SN.

We examine feasibility and cost of our algorithms by a proof-

of-concept implementation and show by an empirical evalua-

tion in the automotive domain that automatic maintenance

of SNs is essential and practical at the same time.

The paper is organized as follows: We next introduce the

preliminaries (cf. Section 2) and then present the mainte-

nance framework in Section 3. Section 4 validates the cost

of the algorithms and presents a case study demonstrating

the applicability of the algorithms. Finally, we discuss re-

lated work (cf. Section 5) and conclude (cf. Section 6).

2. PRELIMINARIES

An SN constitutes a labeled and weighted directed graph

whose vertices represent objects and the (labeled) edges

represent the semantic relations between the objects with

weights indicating the relevance of the relations. A weight

is expressed in terms of a number ranging between 0 and 1,

with 1 indicating the strongest possible semantic relation.

Definition 1. ASemantic Network SN is a tuple (V, E, L,

W, fl, fw), where Vis a set of vertices such that each v∈V

represents an information object or a process object; Eis

a multiset of edges such that each edge e= (v, v0)∈E,

v, v0∈Vand v6=v0, represents a relation between such

objects. The function fl:E→Llabels each edge e∈E

with an edge label from the set of labels L. Furthermore, the

function fw:E→Wassigns a weight from the set of weights

Wto each edge e∈E. Given an edge e= (v, v0)∈E, we

call vthe source and v0the destination of e.

Given a vertex v∈V, the internal neighborhood of v, de-

noted Γ−(v), is the set of vertices {v0|(v0, v)∈E}. Anal-

ogously, for v∈V, the external neighborhood of v, denoted

Γ+(v), is the set of vertices {v0|(v, v0)∈E}. Then, the

total neighborhood of v∈Vis the union of the internal and

external neighborhood of v, denoted Γ(v).

The incoming degree of a vertex v∈Vis the number of

incoming edges and the outgoing degree is the number of

its outgoing edges. The total degree of vis the sum of its

incoming and outgoing degree.

For example, given two edges e= (v, v0), e0= (v0, v00 )∈E,

v, v0, v00 ∈V, we call van internal neighbor of v0and v00 an

external neighbor of v0. Thus, the total degree of v0is 2.

Note that we often refer to vertices as objects (e.g., infor-

mation and process objects) and to edges as relations of an

SN. We next define properties for vertices and edges.

Definition 2. Each vertex v∈Vand each edge e∈E

has a set of properties P(v) and P(e), respectively, where

each p∈P(v)∪P(e) is a pair (key, val). We denote key

as the unique name and val as the value of pand write

key(v) (key(e)) to denote val.

In order to create an SN, business processes and pieces of

information, possibly from different data sources (e.g., pro-

cess repositories, shared drives), are transformed into pro-

cess and information objects (cf. Figs. 3(a) and 3(b)), each

represented by a vertex and its according properties. The

transformation ensures that proprietary formats (e.g., office

formats) are converted into a generic and uniform format,

which allows for analyzing the objects in the SN.

After that, objects in an SN are syntactically and se-

mantically analyzed to detect their semantic relations (cf.

Fig. 3(c)) [8]. First, properties such as authorship are com-

pared (syntactic analysis), e.g., to link objects with the same

author. Second, the properties of the objects are analyzed

(semantic analysis). For this purpose, algorithms from the

fields of data mining, text mining (e.g., text preprocessing,

linguistic preprocessing, clustering, classification, informa-

tion extraction), pattern-matching, and machine learning

(e.g., supervised learning, unsupervised learning, reinforce-

ment learning, transduction) are applied [11, 26] to further

classify and group correlated objects.

Information

Object

Process

Object

Semantic Relation

Business Processes

Information

(a)

(b)

(c)

Figure 3: Schematic creation of an SN

Generally, semantic relations in an SN may exist between

information objects (e.g., a guideline similar to another one)

or process objects (e.g., an event triggering a sub-process).

Additionally, semantic relations exist between information

and process objects as well (e.g., an instruction required for

executing a specific process task). A detailed description of

the creation process is given in [20].

3. MAINTAINING SEMANTIC NETWORKS

This section introduces our maintenance framework. First,

we show how an SN evolves over time (cf. Section 3.1). Sec-

ond, we illustrate why properties of objects and relations are

important (cf. Section 3.2). Third, we present a collection

of algorithms for maintaining SNs (cf. Section 3.3).

3.1 Evolution

Evolution of SNs is driven by exogenous and endogenous

changes. We further distinguish between evolution in depth

and breadth [21]: Depth is defined by the size of all property

values in an SN, i.e., the amount of information stored within

all objects. Breadth is defined as the number of relations in

an SN, i.e., the cardinality of the set of edges.

Depth can be increased by adding objects (e.g., new doc-

uments on a shared drive), adding properties (e.g., adding

keywords to an existing document), or by updating values of

properties (e.g., a property is described in more detail) (cf.

Fig. 4(a)). In turn, deleting objects and properties decreases

the depth of an SN (cf. Fig. 4(b)). Note, that updates of

property values can decrease depth as well. Breadth can be

increased by adding relations (e.g., a new link between two

objects) (cf. Fig. 4(c)), while deleting relations (e.g., two ob-

jects no longer have the same author) decreases breadth (cf.

Fig. 4(d)). Hence, depth and breadth are indicators for the

cost of performing maintenance operations such as adding

an object or a relation to an SN (cf. Section 4).

(a) increase depth (b) decrease depth (c) increase breadth (d) decrease breadth

Figure 4: SN evolution in depth and breadth

In order to formalize such maintenance operations in an

SN, we next define suitable actions, which can consider both

exogenous and endogenous changes.

Definition 3. An action changes an SN. Each action ahas

a set of parameters PA(a), where each pa ∈PA(a) is a pair

(key, val). We call key the unique name and val the value of

pa and write key(a) to denote val. A parameter pa is either

mandatory or optional. If pa with key key is mandatory,

then, for each action a, there exists a value valasuch that

(key, vala)∈PA(a).

Typical mandatory parameters of an action are a unique

identifier uri (e.g., a URL) and the function func (e.g., add,

update, delete) to be executed. Actions are triggered by ex-

ogenous or endogenous changes (cf. Fig. 5), e.g., if a docu-

ment on a shared drive is deleted (an exogenous change) or

if a document is outdated (an endogenous change).

Endogenous Changes

PA(a1) = {(uri, 1), (func, delete)};

PA(a2) = {(uri, 2), (func, add)};

PA(a3) = {(uri, 2), (func, update)};

Changed SN

Changes Exogenous Changes

Figure 5: Actions changing an SN

For example, consider an engineer in the automotive do-

main conducting a review of product requirements docu-

mented in functional specifications. The goal is to improve

as well as to approve such specifications. Due to a revision

of the review process, an employee from the quality man-

agement department replaced an outdated review template.

Thus, an action a1was triggered with uri(a1) = “H:/temp-

lates/review-v1.xls” and func(a1) = “delete”. Thereby, an-

other action a2was triggered with uri(a2) = “H:/temp-

lates/review-v2.xls” and func(a2) = “add”. However, based

on new guidelines the engineer noticed that the template was

incomplete (e.g., a required question was missing). There-

fore, the engineer adapts the template. Thus, another action

a3is triggered with uri(a3) = “H:/templates/review-v2.xls”

and func(a3) = “update”. Thereby, we ensure by actions

that an SN is maintained.

3.2 Property Classification

As mentioned above, maintaining SNs requires not only

to consider the object-relation-level, but also the properties

of objects and relations. Furthermore, if, for example, the

author of a document has changed, it is not necessary to up-

date the entire object but only relevant parts, i.e., the value

of the property author and according relations. One can ob-

serve that some properties change over time (e.g., filesize),

whereas others do not (e.g., uri). This can be exploited in

maintenance algorithms by focusing only on properties that

are relevant for a particular operation. To capture this, we

categorize properties as follows:

Definition 4. Properties are classified into two categories:

existence and mutability.Existence expresses whether a

property is mandatory or optional, where a property pwith

key key is mandatory for the vertices V(edges E) of an SN

if, for each v∈V(e∈E), there exists a value valv(vale)

such that (key, valv)∈P(v) ((key, vale)∈P(e)) and it is

optional otherwise. Mutability expresses whether a prop-

erty’s value is dynamic or static, where pis dynamic if val

in (key, val) can change over time and it is static otherwise.

For example, for v∈V, typical mandatory properties are

a unique identifier uri, a data source source, a creation date

cdate, a modification date mdate, and a content cont. The

categories, existence and mutability, can be combined into a

matrix comprising four blocks (a, b, c, d) to which we assign

the properties (cf. Fig. 6).

(c)

mandatory/static

(a)

mandatory/dynamic

(d)

optional/static

(b)

optional/dynamic

Existence

Mutubility

Figure 6: Property classification

In the following, we illustrate the assignment of individual

properties to different blocks with examples:

(a) mandatory/dynamic: Some properties are always part

of an SN and are dynamic. For example, the modifica-

tion date mdate changes with every update of an ob-

ject or a relation. The content cont or the total degree

deg of an object can change over time as well. There-

fore, these properties are mandatory and dynamic.

(b) optional/dynamic: The title of a document can change

over time, but some filetypes (e.g., a text file) do not

have a title. Therefore, the title of a document can be

considered as optional and dynamic.

cdate exists for all objects and relations and therefore

is mandatory. Since these properties do not change

over time, they can be considered as static.

(d) optional/static: If a property does not change over

time and does not exist for every object or relation it is

called optional/static, e.g., the filetype of a document.

Based on the property classification we infer the follow-

ing for adding, updating, and deleting elements of an SN:

One must ensure that static/mandatory properties (c) are

given as a minimum requirement when adding elements (cf.

Fig. 7(a)). When deleting elements one has to consider prop-

erties within all blocks (a, b, c, d) (cf. Fig. 7(b)). When

executing updates, only properties which are not assigned

to the mandatory/static block (a, b, d) must be considered

(cf. Fig. 7(c)). Note that the grey background color in Fig. 7

indicates affected blocks for each function (cf. Section 3.3).

(c)

mandatory/static

(a)

mandatory/dynamic

(d)

optional/static

(b)

optional/dynamic

(a) add-function

(c)

mandatory/static

(a)

mandatory/dynamic

(d)

optional/static

(b)

optional/dynamic

(b) delete-function

(c)

mandatory/static

(a)

mandatory/dynamic

(d)

optional/static

(b)

optional/dynamic

Figure 7: Property classification and functions

3.3 Algorithms

We next specify functions (add,delete, and update) used

in our algorithms that perform maintenance operations.

The add-function adds a vertex vadd and its properties

to a Semantic Network SN and determines which semantic

relations exist between vadd and existing vertices. As men-

tioned above, mandatory/static properties are the minimum

input parameter for the add-function.

Function add(SN, vadd)

Require:SN = (V, E, L, W, fl, fw) an SN,

vadd the vertex to be added incl. its properties P(vadd);

Ensure:SN is updated;

V:= V∪ {vadd};

foreach v∈Vdo

if uri(v)6=uri(vadd)then

E:= E∪ {new edge/s between vand vadd};

The delete-function deletes a vertex vdel in a Semantic

Network SN including existing semantic relations between

vdel and its total neighborhood. Note that the function con-

siders all blocks of the property classification, i.e., all prop-

erties of vdel are deleted.

Function delete(SN, vdel)

Require:SN = (V, E, L, W, fl, fw) an SN,

vdel the vertex to be deleted incl. its properties P(vdel);

Ensure:SN is updated;

v:= get v∈Vs.t. uri(v) = uri(vdel);

E:= E\ {(v, γ),(γ, v)|γ∈Γ(v)};

V:= V\ {v};

The update-function takes a vertex vupd as input, which

is used to update the vertex vin the SN that is identified

by the same uri as vupd. The function also adds, deletes,

and updates semantic relations between the updated vertex

vand existing vertices. Note that mandatory/static prop-

erties are not considered in case of objects.

Function update(SN, vupd)

Require:SN = (V, E, L, W, fl, fw) an SN,

vupd the vertex used to update the SN incl. its

properties P(vupd);

Ensure:SN is updated;

P(vupd) := {p∈P(vupd)|pis not mandatory/static};

v:= get v∈Vs.t. uri(v) = uri(vupd);

foreach (key,val)∈P(v)do

if (key,valupd)∈P(vupd)then

val := valupd;

P(vupd) := P(vupd)\ {(key,valupd)};

else

P(v) := P(v)\ {(key,val)};

P(v) := P(v)∪P(vupd);

foreach v0∈Vdo

if v0∈Γ(v)then

E:= update edge/s betw. v0and v;

E:= E\ {obsolete edge/s between v0and v};

if uri(v0)6=uri(v)then

E:= E∪ {new edge/s between v0and v};

Based on these functions, we propose three algorithms for

maintaining SNs. The maintenance is based on two main

principles: the push- and the pull-principle. The former

can be applied to both exogenous and endogenous changes,

whereas the latter can only be applied to exogenous changes.

With the push-principle, the data source pushes informa-

tion and business processes automatically to an SN when

they are added, updated, or deleted within the data source.

However, with regard to exogenous changes, prerequisite is

that the data source is able to send notifications if informa-

tion and/or business processes have been changed. Regard-

ing endogenous changes, the prerequisite is that the SN de-

tects changes automatically and triggers respective actions.

With the pull-principle, an SN gathers information and

business processes from a data source. Such a maintenance

process is triggered by time-based schedulers, i.e., the SN is

maintained at a certain point in time. The principle is used

for data sources which are not capable of sending change

notifications (e.g., a document has changed) to the SN.

However, the use of a specific principle depends on the ca-

pabilities of a data source (whether the data source is able

to send change notifications to the SN or not). For exam-

ple, for an enterprise information system which is capable of

sending notifications we use the push-principle whereas for

a shared drive we use the pull-principle.

For each of these two principles, we introduce a corre-

sponding algorithm (cf. Fig. 8). Prerequisite for both prin-

ciples is that the SN has access to underlying data sources.

In case of exogenous changes the SN transforms information

and business processes into a uniform format. In case of

endogenous changes no transformation is necessary.

Push-Principle

Push-Algorithm

add delete update

Pull-Principle

Pull-Algorithm

add delete update

Principles

Algorithms

Functions

Figure 8: Push- and Pull-Algorithm

3.3.1 Push-Algorithm

The Push-Algorithm deals with changes of an SN based

on the push-principle, e.g., a policy is no longer valid in 2014

and the corresponding object in the SN has to be maintained

accordingly. Thus, the maintenance of the SN is triggered by

an action that is applied to the SN by the Push-Algorithm.

The algorithm works as follows: In the add and update

case, we create a vertex vincluding its properties from the

data source affected by the action a(i.e., based on the uri of

the action). After that, we call the corresponding function.

In the delete case, we identify the corresponding vertex v∈

Vbased on the uri of the action and call the according

function. Hence, the Push-Algorithm applies endogenous

and exogenous changes to the SN by actions.

3.3.2 Pull-Algorithm

The Pull-Algorithm deals with changes of an SN based on

the pull-principle, i.e., data has changed in the data source

and needs to be gathered by the SN. For example, docu-

ments on a shared drive are updated and, therefore, respec-

tive changes have to be made in the SN. As aforementioned,

the maintenance of the SN is triggered by a scheduler.

Algorithm 1: Push-Algorithm

Require:SN = (V, E, L, W, fl, fw) an SN,

aan action;

Ensure:SN is updated;

switch func(a) do

case add

v:= create a vertex incl. its properties from

the data source affected by the uri of a;

add(SN, v);

break;

case update

v:= create a vertex incl. its properties from

the data source affected by the uri of a;

update(SN, v);

break;

case delete

v:= get v∈Vs.t. uri(v) = uri(a);

delete(SN, v);

The algorithm works as follows: First, we create a set of

vertices Vds from a data source ds. After that, for each ver-

tex v∈Vin the SN that was created from ds (property

source(v)), we check if a corresponding vertex vds ∈Vds ex-

ists. If this is the case, we check if vds is newer than v(e.g.,

by comparing the creation and/or modification dates). If v

is out-of-date, it is updated with the properties of vds by call-

ing update(SN, vds). After that, vds is removed from Vds. If

no corresponding vertex exists in the data source, we delete

the vertex (v) in the SN using the delete(SN, v) function.

Finally, we add each remaining vertex vds ∈Vds from the

data source to the SN by calling add(SN, vds), which leaves

the SN synchronized with the data source. Hence, the Pull-

Algorithm allows for maintaining the SN at a certain point

in time. Note that the process has to be repeated for each

data source that has to be synchronized.

Algorithm 2: Pull-Algorithm

Require:SN = (V, E, L, W, fl, fw) an SN,

ds the data source;

Ensure:SN is updated;

Vds := create a set of vertices incl. their properties

from the data source ds;

foreach v∈Vs.t. source(v) = ds do

vds := get vds ∈Vds s.t. uri(vds) = uri(v);

if vds then

if vds is newer than vthen

update(SN, vds);

Vds := Vds \ {vds}

else

delete(SN, v);

foreach vds ∈Vds do

add(SN, vds);

3.3.3 Partial-Pull-Principle and Algorithm

In practice, SNs can comprise a large amount of objects

and relations. Maintaining SNs using the pull-principle can,

therefore, be a very time-consuming task. In a specific work

context, a user might, however, only be interested in a se-

lected part of the SN. For example, during a review, review

templates, review minutes, existing reviews, or even results

of a real-time evaluation (e.g., prioritization of projects in

a workshop) are of great importance, while checklists and

best practices for performing effective project management

are less interesting. Thus, it is sufficient to maintain only

these objects and relations that are relevant to the user

when querying the SN. To capture this, we introduce a fur-

ther principle, called the partial-pull-principle, where the SN

gathers only information and business processes from data

sources as requested by a user. These (and only these ob-

jects) are then updated on demand.

In contrast to the other principles, the partial-pull-princi-

ple is completely user-driven because it is triggered by a user

request, whereas the push- and pull-principle are machine-

driven, e.g., through notifications from other enterprise in-

formation systems or schedulers.

Based on the partial-pull-principle, we introduce a third

algorithm (cf. Fig. 9) as a lightweight version of the Pull-

Algorithm. It does not maintain the entire SN, but only the

parts which are relevant for a given request.

Partial-Pull-Principle

Partial-Pull-Algorithm

add delete update

Principle

Algorithm

Functions

Figure 9: Partial-Pull-Algorithm

The algorithm works as follows: First, we create a set of

vertices Vds from affected data sources according to the user

request req. Then, for each vertex v∈Vaffected by req we

retrieve the corresponding vertex vds from the affected data

sources. Thus, if a corresponding vertex vds is in the data

source, we check if vds is newer than v(e.g., by compar-

ing the creation and/or modification dates) and if vis out-

of-date, it is updated with vds by calling update(SN, vds).

If there is no corresponding vertex in the data source, we

delete the vertex (v) in the SN using the delete(SN, v) func-

tion. Hence, the Partial-Pull-Algorithm allows for maintain-

ing parts of an SN based on a user request and ensures that

all requested objects including their properties are synchro-

nized with affected data sources.

4. VALIDATION

To prove that our algorithms are able to successfully main-

tain SNs, we implemented them and evaluated their perfor-

mance in consideration of depth and breadth. We further

evaluated the necessity and application of our algorithms in

a real-world case study in the automotive domain.

Our validation is guided by three research questions:

RQ1 Is automatic maintenance of SNs feasible in consider-

ation of exogenous and endogenous changes?

RQ2 How do depth and breadth affect the runtime of main-

tenance operations?

RQ3 How essential is automatic maintenance of SNs?

Algorithm 3: Partial-Pull-Algorithm

Require:SN = (V, E, L, W, fl, fw) an SN,

req the request to an SN;

Ensure:SN is partially updated;

Vreq contains the requested vertices;

Vds := create a set of vertices from the

data sources affected by req;

foreach v∈Vaffected by req do

vds := get vds ∈Vds s.t. uri(vds) = uri(v);

if vds then

if vds is newer than vthen

update(SN, vds);

vupd := get v0∈Vs.t. uri(v0) = uri(vds);

Vreq := Vreq ∪ {vupd};

else

Vreq := Vreq ∪ {v};

else

delete(SN, v);

4.1 Implementation and Configuration

Driven by these research questions, we implemented a pro-

totype installed on a laptop with an Intel quad-core CPU,

16 GB RAM, and a Windows 7 64-bit operating system.

The prototype is a web-based Java (JRE7)/Scala (2.10)1

application realized as a 3-tier architecture.

The presentation layer uses the web application frame-

work Play,2the Twitter Bootstrap framework,3the Data-

Driven Documents (D3) library,4jQuery,5HTML 5 tem-

plates, and Cascading Style Sheets (CSS) 3. We created

our SNs using the semantic middleware iQser GIN Server

(2.0.0.36) [26]. In addition, we developed several Java open-

source plugins6on the logic layer. The data layer is realized

by a Lucene search index7and a MySQL database.8

Based on our property classification (cf. Section 3.2) we

configured objects and relations of our prototype. As manda-

tory/static object properties we choose cdate,uri, and source

(cf. Fig. 10(a)). Optional properties are filetype and title.

While the filetype does not change over time (static), the

title can vary (dynamic), but it might not be available for

every filetype (e.g., text file) and, therefore, it is optional.

Mandatory dynamic properties are the buzz (activity on ob-

ject), cont,deg, and mdate.

Analogous to object properties, uri and cdate are manda-

tory for relations (cf. Fig. 10(b)). However, relations have

additional mandatory properties such as the source vertex

and destination vertex. The label of an edge is also manda-

tory, while its relevance (weight) can vary, e.g., changing

cont may affect the weight of “is similar to” relations. Addi-

tionally, the reason of a relation, which describes why a re-

lation has been established (e.g., a particular method which

detected a “is similar to” relation), is static and optional.

1http://www.scala-lang.org/

2http://www.playframework.com/

3http://getbootstrap.com/

4http://d3js.org/

5http://jquery.com/

6http://sourceforge.net/directory/?q=nipro

7http://lucene.apache.org/

8http://www.mysql.com/

We split our validation into two parts: The first part ad-

dresses the technical perspective (cf. Section 4.2), i.e., the

influence of depth and breadth on our algorithms, whereas

the second part evaluates their application in an empirical

case study in the automotive domain (cf. Section 4.3).

(c)

cdate, source, uri

(a)

buzz, cont,

deg, mdate

(d)

filetype

(b)

title

(c)

cdate, destination

vertex, label,

source vertex, uri

(a)

mdate, weight

(d)

reason

(b)

(a) objects (b) relations

Figure 10: Property classification implementation

Therefore, we created two SNs based on the configuration

mentioned above: one with synthetic text files to evaluate

the runtime behavior of the algorithms and one with real-

world documents from the automotive domain to evaluate

the functional perspective of our maintenance framework.

Both SNs were created with the following relations: “is au-

thor of”, “has same title as”, and “is similar to”. The suc-

cessful implementation and initial tests have already shown

that automatic maintenance of SNs is feasible in general.

Therefore, we now examine the runtime in consideration of

depth and breadth. Finally, we evaluate the necessity and

practicability of automatic maintenance of SNs in a business

environment.

4.2 Technical Validation

In order to address RQ1 and RQ2, we created SNs contain-

ing 5, 50 and 500 objects once with small (1 KB) and once

with larger (100 KB) files. To obtain comparable results

for our measurements, all objects within an SN were identi-

cal (e.g., identical property cont). Therefore, we simulated

the worst-case scenario regarding performance: Every object

was connected with every other object in each SN yielding

40, 4,900 and 499,000 relations. Note, only “is similar to”

and “is author of” relations were detected since text files do

not have the property title and therefore no “has same ti-

tle as” relations were recognized. Based on the initial SNs,

we performed operations (add, update, delete) with a single

object using our Push- and Pull-Algorithm (cf. Section 3.3).

Each combination of these dimensions represented one case

to be examined with regard to small and large files (36 cases

in total, e.g., updating an object with a size of 1 KB through

the Push-Algorithm in an SN containing 5 objects).

For each case, the shown numbers in the diagrams (cf.

Figs. 11-13) constitute averages over three warm runs. Warm

runs were chosen to ensure comparability of the measured

values since the iQser GIN Server [26] performs several initial

background tasks on start-up. Note the logarithmic scale

used in the diagrams.

Fig. 11 shows that objects can be added to an SN in linear

time. Compared to the number of objects within an SN, the

depth has a bigger influence on the runtime as shown by the

comparison with objects of different size (1 KB vs. 100 KB).

Detecting relations between the added and existing objects

is polynomial in the number of vertices. However, note that

the actual performance of the algorithms also depends on the

property values of vertices. For example, the value of cont

(i.e., the size of the property in bytes) affects the analysis of

similarity relations between the added and existing vertices.

100

1,000

10,000

100,000

5 5 5 5 50 50 50 50 500 500 500 500

push pull push pull push pull push pull push pull push pull

1 KB 1 KB 100 KB 100 KB 1 KB 1 KB 100 KB 100 KB 1 KB 1 KB 100 KB 100 KB

time in ms

no. of objects / algorithm / filesize

relation

object

Figure 11: Effect of depth and breadth on additions

As shown in Fig. 12 update operations perform similarly.

In contrast to the integration of objects, however, some op-

erations are not required (e.g., for mandatory static proper-

ties). Comparisons between existing properties take place

instead (e.g., to check whether a property has changed).

Nevertheless, we could not detect significant differences con-

cerning cost between add and update operations on identical

initial situations.

100

1,000

10,000

100,000

5 5 5 5 50 50 50 50 500 500 500 500

push pull push pull push pull push pull push pull push pull

1 KB 1 KB 100 KB 100 KB 1 KB 1 KB 100 KB 100 KB 1 KB 1 KB 100 KB 100 KB

time in ms

no. of objects / algorithm / filesize

relation

object

Figure 12: Effect of depth and breadth on updates

In contrast to add and update operations, delete opera-

tions perform differently. While the deletion of objects can

be done in linear time, the deletion of relations varies sig-

nificantly with the size of the objects. The reason is that

all references, which form the basis of a relation (e.g., ex-

tracted concepts and co-occurrences of a specific object in

case of “is similar to” relations) must be also deleted. Note

that such “housekeeping” tasks are controlled by the iQser

GIN Server. In turn, this might be the reason for differences

in measured values resulting in unexpected outcomes. For

example, the cost for deleting an object with a size of 1 KB

out of 5 objects was higher than deleting an object with a

size of 100 KB out of 5 objects (cf. Fig. 13). Furthermore,

measurements with the programming language Java can be

less accurate compared to native programming languages

(e.g., the Java garbage collector cannot be disabled).

Despite the fact that the implemented technology might

cause inaccuracies in measurements, we were still able to

verify that maintenance cost with our algorithms is highly

dependent on depth and breadth (RQ2). However, external

components (e.g., a component for the semantic analysis of

object properties) and their computation cost can influence

the performance of maintenance operations significantly. In

contrast, the consideration of our property classification (cf.

Section 3.2) can have positive effects on performance if only

necessary operations (e.g., only updating properties which

are not mandatory/static) are executed.

100

1,000

10,000

5 5 5 5 50 50 50 50 500 500 500 500

push pull push pull push pull push pull push pull push pull

1 KB 1 KB 100 KB 100 KB 1 KB 1 KB 100 KB 100 KB 1 KB 1 KB 100 KB 100 KB

time in ms

no. of objects / algorithm / filesize

relation

object

Figure 13: Effect of depth and breadth on deletes

Altogether, the Push- and the Pull-Algorithm both ensure

a synchronized SN over affected data sources after comple-

tion. Thereby, automatic maintenance is feasible (RQ1).

4.3 Empirical Validation

As proposed by Yin [27] and Kitchenham et al. [12], we

performed a case study to evaluate RQ3 in a“typical”project

setting with knowledge workers and decision makers from

several innovation departments in the automotive domain.

Thus, all participants were involved in knowledge-intensive

business processes. The participants were selected based on

their expert knowledge regarding the considered case. No

participant was a member of the research team.

The case study was performed in July 2014 with five de-

cision makers and six knowledge workers from a large au-

tomotive manufacturer. It was conducted based on a ques-

tionnaire comprising 60 questions.9Each interview lasted

around 90 minutes.

First, the participants had to answer general questions

about their current work environment and information hand-

ling within their processes. Afterwards they had to perform

tasks with our prototype and answer questions based on

these tasks with respect to maintenance (i.e., add, update,

delete, and search for objects and validate property values

as well as detected relations). The underlying corpus (data

sources) contained 333 internal documents from their field

of interest (e.g., scientific papers from departments dealing

with technology monitoring and technology development).

Therefore, the users were familiar with the information rep-

resented in the SN and able to judge the containing infor-

mation and its quality on a certain level.

In order to generalize the results and to gain further in-

sights, we asked the users some concluding questions when

all tasks had been completed. More specifically, RQ1 was

addressed from the user’s perspective by the tasks with as-

sociated questions, whereas RQ3 was investigated by the

introducing and concluding questions.

9Available at http://nipro.hs-weingarten.de/casestudy

The initial questions about information handling in the

users’ current work environment revealed that, with the ex-

ception of personal contacts (e.g., in meetings, phone calls),

information is mostly handled and accessed in a digital way.

Information is, however, mostly not well-structured and of-

ten distributed in different sources (e.g., information systems

or shared drives), which makes it difficult to search for it.

These statements are endorsed by the fact that a significant

amount of information is stored in files.

Given the fact of enormous file usage, a consequential chal-

lenge working with files in a business environment is keeping

track of changes (cf. Fig. 14). This is mainly substantiated

by the dynamics and amount of information available as well

as a lack of technological assistance (e.g., search over differ-

ent data sources, notifications on changed content).

5lower quartile

minimum

median

maximum

upper quartile

Question: The information needed for my

daily business is very dynamic.

1: completely disagree, 2: disagree, 3: neutral,

4: agree, 5: completely agree

(a) Information dynamics

5lower quartile

minimum

median

maximum

upper quartile

Question: Based on my current work

environment I can easily identify

interdependencies between information

from different data sources.

1: completely disagree, 2: disagree, 3: neutral,

4: agree, 5: completely agree

(b) Information overview

Figure 14: Information characteristics

In order to address these challenges (e.g., dynamics of in-

formation in distributed sources, heterogeneous filetypes),

we introduced our prototype (which captures such dynam-

ics) and asked the participants to perform tasks with the

offered graphical user interface (e.g., check if property val-

ues were updated, validate the correctness of a relation).

In the concluding questions all participants stated that

every task with our algorithms could be completed success-

fully. Since objects, relations, and properties were adapted

based on exogenous and endogenous changes, we can confirm

that automatic maintenance of SNs is feasible in practice.

Pull/Add Push/Add Pull/Delete Push/Delete

Question: Updating the SN was easy.

1: completely disagree, 2: disagree, 3: neutral,

4: agree, 5: completely agree

lower quartile

minimum

median

maximum

upper quartile

Figure 15: Simplicity of maintenance for users

However, some users could not recognize any relation be-

tween the document they added to the SN and others. While

no “is author of” and “has same title as” relation existed, the

“is similar to” relations were not shown. For more sophis-

ticated user experience we filtered these relations according

to a hardcoded weight-threshold with respect to the overall

amount of “is similar to” relations. Nevertheless, we could

verify with according database entries that the relations were

set and take this as an input for further user interface devel-

opment by allowing users to set the threshold dynamically.

Additionally, we asked the users about their preference be-

tween push and pull. All users preferred the Push-Algorithm

since the effects of their tasks are reflected immediately in

the user interface. By contrast, the Pull-Algorithm always

has a delay since it is triggered at a certain point in time.

A synchronization every three minutes caused obviously too

much delay for some users, even when performing other tasks

in between. However, the Partial-Pull-Algorithm, which ad-

dresses this downside, received positive feedback from the

participants. Note that the Partial-Pull Algorithm only con-

siders objects currently existing in an SN.

Pull/Add Push/Add

Question: My effort updating the SN was low.

1: completely disagree, 2: disagree, 3: neutral,

4: agree, 5: completely agree

lower quartile

minimum

median

maximum

upper quartile

Figure 16: Maintenance effort for users

We interpret the disadvantage of the Pull-Algorithm as

a confirmation of RQ3 (automatic maintenance is essential)

and recommend the Push-Algorithm if technology permits

offering an improved real-time-experience to users. For the

Pull-Algorithm, the update intervals can be shortened, de-

pending on the number of files in a data source. Neverthe-

less, the delay of the Pull-Algorithm, as observed by the

users, is caused by the configured time interval of imple-

mented schedulers. However, its performance compared to

the Push-Algorithm is almost identical (cf. Section 4.2).

Concluding, we asked the users about their impression

concerning benefits for their daily work in consideration of

SN maintenance. All users stated that integrating and con-

necting distributed information was easy when using the

prototype (cf. Fig. 15) and could be done with reasonable

effort (cf. Fig. 16). Additionally they confirmed the benefit

of their daily work (cf. Fig. 17).

Pull/Update Push/Update

Synchronization: Automatic updates of

objects, relations and properties are useful.

1: completely disagree, 2: disagree, 3: neutral,

4: agree, 5: completely agree

lower quartile

minimum

median

maximum

upper quartile

Figure 17: User perspective on auto. maintenance

More than 90% of the participants also stated that an SN

provides an enhanced overview on information and that a

maintained SN is desirable. In particular, this would be a

benefit compared to distributed data sources. One user sum-

marized: “Maintenance and corresponding updates should

be automatic. No user interaction for maintenance should

be required. Users should focus on working with the SN.”

Asking about use cases for a maintained SN, both, knowl-

edge workers and decision makers recognized further poten-

tials of the SN to support their daily work. For example,

an expert search for people assigned to objects (e.g., au-

thors) and their relations with each other as well as decision

support scenarios (e.g., detecting patterns within the SN).

4.4 Discussion

We have shown that automatic maintenance of SNs with

regard to both exogenous and endogenous changes is feasible

with acceptable cost (RQ1). Furthermore, we examined the

effect of depth and breadth on the runtime of the proposed

algorithms (RQ2). Both algorithms performed satisfactorily

in terms of adding, updating, and deleting objects. The cost

of detecting relations, however, varies widely when expensive

linguistic and/or statistical algorithms are used.

All users appreciated a unified single point of information

access, which is up-to-date and makes distributed informa-

tion searchable. Many of the interviewed users already ex-

perienced a well maintained SN as an enabler for various

use cases (e.g., expert search, gap analysis). In particu-

lar, knowledge workers and decision makers can benefit from

maintained SNs. Both groups emphasized that relations be-

tween objects can be interesting for various purposes (e.g.,

navigation within an SN or decision support).

Our case study results on the deficits of information hand-

ling in current business environments (cf. Section 4.3) have

shown that it becomes increasingly crucial to provide up-

to-date, integrated, and homogenous views on enterprise in-

formation. The empirical validation confirms that there is

a high demand for a single point of information access in

knowledge-intensive processes. Finally, we verified that a

maintained SN is not only essential for such processes, but

a practicable solution as well (RQ3).

5. RELATED WORK

Generally, various types of SNs exist. Examples include

associative networks [5], topic networks [22], fact networks

[23], or ontologies [13]. They usually have a special field of

application (e.g., semantic search, reasoning). All of them

(including our approach) have in common, that they have to

be maintained. Depending on the expressiveness, however,

maintenance effort varies widely. Usually the higher the ex-

pressiveness, the higher the creation and maintenance effort

[23]. In practice, most of the above mentioned SNs have to

be maintained manually. However, several semi-automatic

maintenance approaches exist such as ontology maintenance

using textual analysis [7], ontology mapping maintenance

support [3], or SN integrity maintenance [2].

In turn, further approaches such as data warehousing [14],

enterprise architecture management (EAM) [6], or search

engines [15] have to deal with maintenance as well. However,

manual effort is usually necessary (e.g., schema mapping)

[1]. For example, the literature study of Farwick et al. [4]

confirms that there exists no related work addressing the

problem of automated EA maintenance.

Our SN represents information in a unified, integrated,

user-, and machine-interpretable form. Unlike existing main-

tenance approaches, our approach focuses on the integration

of distributed and heterogeneous information and business

processes including the detection of their interdependencies.

We showed that no manual effort is needed to maintain such

an SN, but manual effort can increase the expressiveness,

i.e., better conceptualization and delimitation of concepts

(e.g., for enhanced detection of relations).

6. SUMMARY AND OUTLOOK

This paper presented a framework for maintaining SNs,

i.e., adapting exogenous and endogenous changes. There-

fore, we presented three algorithms. Furthermore, our prop-

erty classification allows maintaining SNs efficiently.

Our validation confirms that our maintenance framework

is able to keep SNs synchronized with underlying sources in

reasonable time. Moreover, we applied our algorithms to a

real-world case, i.e., validated them based on an implemen-

tation in the automotive domain.

In future, we will improve our algorithms to support self-

learning components with a focus on endogenous changes.

7. REFERENCES

[1] P. Bernstein and L. Haas. Information integration in

the enterprise. Communications of the ACM, pages

72–79, 2008.

[2] T. ˇ

Capek. Semantic network integrity maintenance via

heuristic semi-automatic tests. In Proc. of the

RASLAN Workshop 2009, pages 63–67. Masaryk

University, 2009.

[3] D. Dinh, J. Dos Reis, C. Pruski, M. Da Silveira, and

C. Reynaud-Delaˆıtre. Identifying relevant concept

attributes to support mapping maintenance under

ontology evolution. J. of Web Semantics: Science,

Services and Agents on the World Wide Web, 2014.

[4] M. Farwick, B. Agreiter, R. Breu, S. Ryll, K. Voges,

and I. Hanschke. Requirements for automated

enterprise architecture model maintenance - a

requirements analysis based on a literature review and

an exploratory survey. In Proc. of the 13th Int’l Conf.

on Enterprise Information System (ICEIS’11), pages

325–337. SciTePress, 2011.

[5] N. V. Findler. Associative Networks: The

Representation and Use of Knowledge of Computers.

Academic Press, 1979.

[6] R. Fischer, S. Aier, and R. Winter. A federated

approach to enterprise architecture model

maintenance. In Proc. of the 2nd Int’l Workshop on

Enterprise Modelling and Information Systems

Architectures (EMISA’07), pages 9–22. Springer, 2007.

[7] Y. Gargouri, B. Lefebvre, and J.-G. Meunier.

Ontology maintenance using textual analysis. In Proc.

of the 7th World Multi-Conf. on Systemics,

Cybernetics and Informatics (SCI’03). International

Institute of Informatics and Systemics, 2003.

[8] M. Hipp, B. Michelberger, B. Mutschler, and

M. Reichert. A framework for the intelligent delivery

and user-adequate visualization of process information.

In Proc. of the 28th Symp. on Applied Computing

(SAC’13), pages 1383–1390. ACM Press, 2013.

[9] M. Hipp, B. Mutschler, B. Michelberger, and

M. Reichert. Navigating in process model repositories

and enterprise process information. In Proc. of the 8th

Int’l Conf. on Research Challenges in Information

Science (RCIS’14). IEEE Computer Society, 2014.

[10] M. Hipp, B. Mutschler, and M. Reichert. On the

context-aware, personalized delivery of process

information: Viewpoints, problems, and requirements.

In Proc. of the 6th Int’l Conf. on Availability,

Reliability and Security (ARES’11), pages 390–397.

IEEE Computer Society, 2011.

[11] A. Hotho, A. N¨

urnberger, and G. Paaß. A brief survey

of text mining. LDV Forum, 20(1):19–62, 2005.

[12] B. Kitchenham, L. Pickard, and S. L. Pfleeger. Case

studies for method and tool evaluation. IEEE

Software, 12(4):52–62, 1995.

[13] L. W. Lacy. OWL: Representing Information Using the

Web Ontology Language. Trafford Publishing, 2005.

[14] J. Lechtenb¨

orger. Data warehouse schema design.

Akademische Verlagsgesellschaft AKA, PhD Thesis,

University of M¨

unster, 2001.

[15] D. Lewandowski. Web searching, search engines and

information retrieval. Information Services & Use,

18(3), 2005.

[16] B. Michelberger, B. Mutschler, M. Hipp, and

M. Reichert. Determining the link and rate popularity

of enterprise process information. In Proc. of the 21st

Int’l Conf. on Cooperative Information Systems

(CoopIS’13), pages 112–129. Springer, 2013.

[17] B. Michelberger, B. Mutschler, and M. Reichert. On

handling process information: Results from case

studies and a survey. In Proc. of the 2nd Int’l

Workshop on Empirical Research in Business Process

Management (ER-BPM’11), pages 333–344. Springer,

2011.

[18] B. Michelberger, B. Mutschler, and M. Reichert.

Towards process-oriented information logistics: Why

quality dimensions of process information matter. In

Proc. of the 4th Int’l Workshop on Enterprise

Modelling and Information Systems Architectures

(EMISA’11), pages 107–120. Koellen-Verlag, 2011.

[19] B. Michelberger, B. Mutschler, and M. Reichert. A

context framework for process-oriented information

logistics. In Proc. 15th Int’l Conf. on Business

Information Systems (BIS’12), pages 260–271.

Springer, 2012.

[20] B. Michelberger, B. Mutschler, and M. Reichert.

Process-oriented information logistics: Aligning

enterprise information with business processes. In

Proc. of the 16th Int’l EDOC Conf. (EDOC’12), pages

21–30. IEEE Computer Society, 2012.

[21] S. Oertelt and K. Ulmschneider. Prozessintegrierter

Einsatz virtueller Methoden im strategischen

Technologie- und Innovationsmanagement. In Proc. of

KnowTech - Wissensmanagement und Social Media -

Markterfolg im Innovationswettbewerb, pages 485–498.

GITO, 2013.

[22] J. Park and S. Hunting. XML Topic Maps: Creating

and Using Topic Maps for the Web. Addison-Wesley

Professional, 2002.

[23] K. Reichenberger. Kompendium semantische Netze:

Konzepte, Technologie, Modellierung. Springer, 2010.

[24] M. Reichert and B. Weber. Enabling Flexibility in

Process-Aware Information Systems: Challenges,

Methods, Technologies. Springer, 2012.

[25] J. Rowley. The wisdom hierarchy: representations of

the DIKW hierarchy. J. of Information Science,

33(2):163–180, 2007.

[26] J. Wurzer. New approach for semantic web by

automatic semantics. In Proc. of the 2nd European

Semantic Technology Conf. (ESCT’08), 2008.

[27] R. K. Yin. Case Study Research: Design and Methods.

Sage Publications, 2009.