Maintaining Semantic Networks:
Challenges and Algorithms
Bernd Michelberger
University of Applied Sciences
Ravensburg-Weingarten,
Weingarten, Germany
bernd.michelberger@
hs-weingarten.de
Klaus Ulmschneider
University of Ulm, Institute of
Artificial Intelligence,
Ulm, Germany
klaus.ulmschneider@
uni-ulm.de
Birte Glimm
University of Ulm, Institute of
Artificial Intelligence,
Ulm, Germany
birte.glimm@
uni-ulm.de
Bela Mutschler
University of Applied Sciences
Ravensburg-Weingarten,
Weingarten, Germany
bela.mutschler@
hs-weingarten.de
Manfred Reichert
University of Ulm, Institute of
Databases and Information
Systems, Ulm, Germany
manfred.reichert@
uni-ulm.de
ABSTRACT
Knowledge workers are confronted with a massive load of
data from numerous heterogeneous sources, making it dif-
ficult for them to identify the information relevant for per-
forming their tasks. Particularly challenging is the align-
ment of information with business processes. In previous
work, we introduced a Semantic Network (SN) for bridging
this gap, i.e., for identifying relations between information
and business processes. What has been neglected so far is
the maintenance of an SN in order to keep the SN consis-
tent, complete, and up-to-date. This paper tackles this issue
and extends our approach with algorithms dealing with the
maintenance of an SN. For this purpose, we identify and
classify properties of objects and relations captured in an
SN and show how these properties can be maintained. We
use a case from the automotive domain in order to demon-
strate and validate the feasibility and applicability of our
maintenance framework.
Categories and Subject Descriptors
I.2.4 [Computing methodologies]: Artificial intelligence
—Knowledge Representation Formalisms and Methods
General Terms
Algorithms
Keywords
Semantic network, evolution, maintenance
Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are not
made or distributed for profit or commercial advantage and that copies bear
this notice and the full citation on the first page. Copyrights for components
of this work owned by others than ACM must be honored. Abstracting with
credit is permitted. To copy otherwise, or republish, to post on servers or to
redistribute to lists, requires prior specific permission and/or a fee. Request
iiWAS ’14, December 04 - 06 2014, Hanoi, Viet Nam
Copyright 2014 ACM 978-1-4503-3001-5/14/12 ...$15.00.
http://dx.doi.org/10.1145/2684200.2684287
1. INTRODUCTION
Knowledge workers and decision makers are confronted
with a continuously increasing load of data during their
daily work. Examples include e-mails, office files, checklists,
guidelines, fact sheets, web pages, and best practices. In
daily practice, knowledge workers and decision makers are
not only interested in getting access to this data, but also re-
quire comprehensive and aggregated information when per-
forming specific tasks in a business process context [19].
Handling such information is by far more complex and time-
consuming than just storing data because such information
can, for example, be incomplete, incorrect, unreliable, un-
structured, or outdated [18, 25].
A particular challenge is to align information with busi-
ness processes and to provide relevant information to knowl-
edge workers and decision makers. In practice, information
is not only stored in distributed and heterogeneous sources,
but also managed separately from business processes. For
example, shared drives, databases, enterprise portals, and
enterprise information systems are used to store informa-
tion. In turn, business processes and their tasks are man-
aged using process-aware information systems [24].
In such an environment, information and business pro-
cesses are often linked manually and statically, e.g., in en-
terprise portals. However, it has been shown that such en-
terprise portals rather contain complex and static content
(e.g., long lists of documents, large process maps) confusing
users [9]. Therefore, it is challenging for users to identify
relations between information and business processes as re-
quired when performing a specific process task.
In previous work, we introduced a Semantic Network (SN)
bridging the gap between information and business processes
[16]. Such an SN can be created using a bottom-up ap-
proach, i.e., starting with the integration of information and
business processes from heterogeneous sources [20]. Follow-
ing this, the integrated information and business processes
are syntactically and semantically analyzed. The resulting
SN comprises unified information objects (e.g., checklists,
guidelines, forms), process objects (e.g., pools, lanes, tasks,
gateways, events), and semantic relations (e.g., “is similar
to”, “is used by”). More specifically, information objects are
needed when performing business processes. In turn, pro-
cess objects are process elements such as tasks or gateways
that guide process-oriented work. Finally, semantic relations
allow identifying inter-linked objects in different ways, e.g.,
information objects referring to the same topic or objects
required for performing a particular process task.
An SN constitutes the basis for delivering relevant in-
formation to knowledge workers and decision makers (cf.
Fig. 1). More specifically, an SN offers an application in-
terface (cf. [8] for details) that may be queried to retrieve
required information. Based on a query (e.g., “information
objects relevant for creating a review”), the SN automati-
cally delivers respective information objects to users [20].
SN
Semantic Relation
Information Object
Process Object
Delivering relevant
information objects
Users
Figure 1: Delivering relevant information objects
In order to provide those information objects to users fit-
ting their demands best, information and process objects in
an SN must be up-to-date, complete, and consistent, i.e.,
an SN must be maintained over time. In two case stud-
ies and an online survey [10, 17] we have already shown
that the maintenance of SNs constitutes a crucial prerequi-
site in order to continuously provide required information
to knowledge workers and decision makers. Maintenance is,
however, a non-trivial task. For example, objects may be
integrated (e.g., new guidelines are created), updated (e.g.,
a task is modified), and deleted (e.g., a checklist is no longer
valid). Likewise, relations may be established (e.g., two doc-
uments have the same author), updated (e.g., two forms
become more similar to each other), and deleted (e.g., two
documents no longer having the same author). Such changes
can happen outside of an SN (e.g., a checklist has changed
on a shared drive), which we call exogenous changes, or in-
side an SN over time (e.g., a guideline being out-of-date),
which we call endogenous changes. Both exogenous and en-
dogenous changes must be properly handled (cf. Fig. 2).
Exogenous
Changes Endogenous
Changes
SN
Data
Source
Figure 2: Exogenous and endogenous changes
Picking up the above mentioned challenges on the mainte-
nance of SNs, the contribution is as follows: (i) We propose
a framework for maintaining SNs. Specifically, we show how
SNs evolve over time and identify characteristics of object
and relation properties and their influence with respect to
the maintenance of SNs. (ii) We introduce three algorithms
dealing with exogenous and endogenous changes of an SN.
We examine feasibility and cost of our algorithms by a proof-
of-concept implementation and show by an empirical evalua-
tion in the automotive domain that automatic maintenance
of SNs is essential and practical at the same time.
The paper is organized as follows: We next introduce the
preliminaries (cf. Section 2) and then present the mainte-
nance framework in Section 3. Section 4 validates the cost
of the algorithms and presents a case study demonstrating
the applicability of the algorithms. Finally, we discuss re-
lated work (cf. Section 5) and conclude (cf. Section 6).
2. PRELIMINARIES
An SN constitutes a labeled and weighted directed graph
whose vertices represent objects and the (labeled) edges
represent the semantic relations between the objects with
weights indicating the relevance of the relations. A weight
is expressed in terms of a number ranging between 0 and 1,
with 1 indicating the strongest possible semantic relation.
Definition 1. ASemantic Network SN is a tuple (V, E, L,
W, fl, fw), where Vis a set of vertices such that each v∈V
represents an information object or a process object; Eis
a multiset of edges such that each edge e= (v, v0)∈E,
v, v0∈Vand v6=v0, represents a relation between such
objects. The function fl:E→Llabels each edge e∈E
with an edge label from the set of labels L. Furthermore, the
function fw:E→Wassigns a weight from the set of weights
Wto each edge e∈E. Given an edge e= (v, v0)∈E, we
call vthe source and v0the destination of e.
Given a vertex v∈V, the internal neighborhood of v, de-
noted Γ−(v), is the set of vertices {v0|(v0, v)∈E}. Anal-
ogously, for v∈V, the external neighborhood of v, denoted
Γ+(v), is the set of vertices {v0|(v, v0)∈E}. Then, the
total neighborhood of v∈Vis the union of the internal and
external neighborhood of v, denoted Γ(v).
The incoming degree of a vertex v∈Vis the number of
incoming edges and the outgoing degree is the number of
its outgoing edges. The total degree of vis the sum of its
incoming and outgoing degree.
For example, given two edges e= (v, v0), e0= (v0, v00 )∈E,
v, v0, v00 ∈V, we call van internal neighbor of v0and v00 an
external neighbor of v0. Thus, the total degree of v0is 2.
Note that we often refer to vertices as objects (e.g., infor-
mation and process objects) and to edges as relations of an
SN. We next define properties for vertices and edges.
Definition 2. Each vertex v∈Vand each edge e∈E
has a set of properties P(v) and P(e), respectively, where
each p∈P(v)∪P(e) is a pair (key, val). We denote key
as the unique name and val as the value of pand write
key(v) (key(e)) to denote val.
In order to create an SN, business processes and pieces of
information, possibly from different data sources (e.g., pro-
cess repositories, shared drives), are transformed into pro-
cess and information objects (cf. Figs. 3(a) and 3(b)), each
represented by a vertex and its according properties. The
transformation ensures that proprietary formats (e.g., office
formats) are converted into a generic and uniform format,
which allows for analyzing the objects in the SN.
After that, objects in an SN are syntactically and se-
mantically analyzed to detect their semantic relations (cf.
Fig. 3(c)) [8]. First, properties such as authorship are com-
pared (syntactic analysis), e.g., to link objects with the same
author. Second, the properties of the objects are analyzed
(semantic analysis). For this purpose, algorithms from the
fields of data mining, text mining (e.g., text preprocessing,
linguistic preprocessing, clustering, classification, informa-
tion extraction), pattern-matching, and machine learning
(e.g., supervised learning, unsupervised learning, reinforce-
ment learning, transduction) are applied [11, 26] to further
classify and group correlated objects.
Information
Object
Process
Object
Semantic Relation
SN
Business Processes
Information
(a)
(b)
(c)
Figure 3: Schematic creation of an SN
Generally, semantic relations in an SN may exist between
information objects (e.g., a guideline similar to another one)
or process objects (e.g., an event triggering a sub-process).
Additionally, semantic relations exist between information
and process objects as well (e.g., an instruction required for
executing a specific process task). A detailed description of
the creation process is given in [20].
3. MAINTAINING SEMANTIC NETWORKS
This section introduces our maintenance framework. First,
we show how an SN evolves over time (cf. Section 3.1). Sec-
ond, we illustrate why properties of objects and relations are
important (cf. Section 3.2). Third, we present a collection
of algorithms for maintaining SNs (cf. Section 3.3).
3.1 Evolution
Evolution of SNs is driven by exogenous and endogenous
changes. We further distinguish between evolution in depth
and breadth [21]: Depth is defined by the size of all property
values in an SN, i.e., the amount of information stored within
all objects. Breadth is defined as the number of relations in
an SN, i.e., the cardinality of the set of edges.
Depth can be increased by adding objects (e.g., new doc-
uments on a shared drive), adding properties (e.g., adding
keywords to an existing document), or by updating values of
properties (e.g., a property is described in more detail) (cf.
Fig. 4(a)). In turn, deleting objects and properties decreases
the depth of an SN (cf. Fig. 4(b)). Note, that updates of
property values can decrease depth as well. Breadth can be
increased by adding relations (e.g., a new link between two
objects) (cf. Fig. 4(c)), while deleting relations (e.g., two ob-
jects no longer have the same author) decreases breadth (cf.
Fig. 4(d)). Hence, depth and breadth are indicators for the
cost of performing maintenance operations such as adding
an object or a relation to an SN (cf. Section 4).
(a) increase depth (b) decrease depth (c) increase breadth (d) decrease breadth
Figure 4: SN evolution in depth and breadth
In order to formalize such maintenance operations in an
SN, we next define suitable actions, which can consider both
exogenous and endogenous changes.
Definition 3. An action changes an SN. Each action ahas
a set of parameters PA(a), where each pa ∈PA(a) is a pair
(key, val). We call key the unique name and val the value of
pa and write key(a) to denote val. A parameter pa is either
mandatory or optional. If pa with key key is mandatory,
then, for each action a, there exists a value valasuch that
(key, vala)∈PA(a).
Typical mandatory parameters of an action are a unique
identifier uri (e.g., a URL) and the function func (e.g., add,
update, delete) to be executed. Actions are triggered by ex-
ogenous or endogenous changes (cf. Fig. 5), e.g., if a docu-
ment on a shared drive is deleted (an exogenous change) or
if a document is outdated (an endogenous change).
Endogenous Changes
SN
PA(a1) = {(uri, 1), (func, delete)};
PA(a2) = {(uri, 2), (func, add)};
PA(a3) = {(uri, 2), (func, update)};
1
5
3
4
Changed SN
2
54
3
Changes Exogenous Changes
Figure 5: Actions changing an SN
For example, consider an engineer in the automotive do-
main conducting a review of product requirements docu-
mented in functional specifications. The goal is to improve
as well as to approve such specifications. Due to a revision
of the review process, an employee from the quality man-
agement department replaced an outdated review template.
Thus, an action a1was triggered with uri(a1) = “H:/temp-
lates/review-v1.xls” and func(a1) = “delete”. Thereby, an-
other action a2was triggered with uri(a2) = “H:/temp-
lates/review-v2.xls” and func(a2) = “add”. However, based
on new guidelines the engineer noticed that the template was
incomplete (e.g., a required question was missing). There-
fore, the engineer adapts the template. Thus, another action
a3is triggered with uri(a3) = “H:/templates/review-v2.xls”
and func(a3) = “update”. Thereby, we ensure by actions
that an SN is maintained.
3.2 Property Classification
As mentioned above, maintaining SNs requires not only
to consider the object-relation-level, but also the properties
of objects and relations. Furthermore, if, for example, the
author of a document has changed, it is not necessary to up-
date the entire object but only relevant parts, i.e., the value
of the property author and according relations. One can ob-
serve that some properties change over time (e.g., filesize),
whereas others do not (e.g., uri). This can be exploited in
maintenance algorithms by focusing only on properties that
are relevant for a particular operation. To capture this, we
categorize properties as follows:
Definition 4. Properties are classified into two categories:
existence and mutability.Existence expresses whether a
property is mandatory or optional, where a property pwith
key key is mandatory for the vertices V(edges E) of an SN
if, for each v∈V(e∈E), there exists a value valv(vale)
such that (key, valv)∈P(v) ((key, vale)∈P(e)) and it is
optional otherwise. Mutability expresses whether a prop-
erty’s value is dynamic or static, where pis dynamic if val
in (key, val) can change over time and it is static otherwise.
For example, for v∈V, typical mandatory properties are
a unique identifier uri, a data source source, a creation date
cdate, a modification date mdate, and a content cont. The
categories, existence and mutability, can be combined into a
matrix comprising four blocks (a, b, c, d) to which we assign
the properties (cf. Fig. 6).
(c)
mandatory/static
(a)
mandatory/dynamic
(d)
optional/static
(b)
optional/dynamic
Existence
Mutubility
Figure 6: Property classification
In the following, we illustrate the assignment of individual
properties to different blocks with examples:
(a) mandatory/dynamic: Some properties are always part
of an SN and are dynamic. For example, the modifica-
tion date mdate changes with every update of an ob-
ject or a relation. The content cont or the total degree
deg of an object can change over time as well. There-
fore, these properties are mandatory and dynamic.
(b) optional/dynamic: The title of a document can change
over time, but some filetypes (e.g., a text file) do not
have a title. Therefore, the title of a document can be
considered as optional and dynamic.
(c) mandatory/static: An identifier uri or a creation date
cdate exists for all objects and relations and therefore
is mandatory. Since these properties do not change
over time, they can be considered as static.
(d) optional/static: If a property does not change over
time and does not exist for every object or relation it is
called optional/static, e.g., the filetype of a document.
Based on the property classification we infer the follow-
ing for adding, updating, and deleting elements of an SN:
One must ensure that static/mandatory properties (c) are
given as a minimum requirement when adding elements (cf.
Fig. 7(a)). When deleting elements one has to consider prop-
erties within all blocks (a, b, c, d) (cf. Fig. 7(b)). When
executing updates, only properties which are not assigned
to the mandatory/static block (a, b, d) must be considered
(cf. Fig. 7(c)). Note that the grey background color in Fig. 7
indicates affected blocks for each function (cf. Section 3.3).
(c)
mandatory/static
(a)
mandatory/dynamic
(d)
optional/static
(b)
optional/dynamic
(a) add-function
(c)
mandatory/static
(a)
mandatory/dynamic
(d)
optional/static
(b)
optional/dynamic
(b) delete-function
(c)
mandatory/static
(a)
mandatory/dynamic
(d)
optional/static
(b)
optional/dynamic
(c) update-function
Figure 7: Property classification and functions
3.3 Algorithms
We next specify functions (add,delete, and update) used
in our algorithms that perform maintenance operations.
The add-function adds a vertex vadd and its properties
to a Semantic Network SN and determines which semantic
relations exist between vadd and existing vertices. As men-
tioned above, mandatory/static properties are the minimum
input parameter for the add-function.
Function add(SN, vadd)
Require:SN = (V, E, L, W, fl, fw) an SN,
vadd the vertex to be added incl. its properties P(vadd);
Ensure:SN is updated;
V:= V∪ {vadd};
foreach v∈Vdo
if uri(v)6=uri(vadd)then
E:= E∪ {new edge/s between vand vadd};
The delete-function deletes a vertex vdel in a Semantic
Network SN including existing semantic relations between
vdel and its total neighborhood. Note that the function con-
siders all blocks of the property classification, i.e., all prop-
erties of vdel are deleted.
Function delete(SN, vdel)
Require:SN = (V, E, L, W, fl, fw) an SN,
vdel the vertex to be deleted incl. its properties P(vdel);
Ensure:SN is updated;
v:= get v∈Vs.t. uri(v) = uri(vdel);
E:= E\ {(v, γ),(γ, v)|γ∈Γ(v)};
V:= V\ {v};
The update-function takes a vertex vupd as input, which
is used to update the vertex vin the SN that is identified
by the same uri as vupd. The function also adds, deletes,
and updates semantic relations between the updated vertex
vand existing vertices. Note that mandatory/static prop-
erties are not considered in case of objects.
Function update(SN, vupd)
Require:SN = (V, E, L, W, fl, fw) an SN,
vupd the vertex used to update the SN incl. its
properties P(vupd);
Ensure:SN is updated;
P(vupd) := {p∈P(vupd)|pis not mandatory/static};
v:= get v∈Vs.t. uri(v) = uri(vupd);
foreach (key,val)∈P(v)do
if (key,valupd)∈P(vupd)then
val := valupd;
P(vupd) := P(vupd)\ {(key,valupd)};
else
P(v) := P(v)\ {(key,val)};
P(v) := P(v)∪P(vupd);
foreach v0∈Vdo
if v0∈Γ(v)then
E:= update edge/s betw. v0and v;
E:= E\ {obsolete edge/s between v0and v};
if uri(v0)6=uri(v)then
E:= E∪ {new edge/s between v0and v};
Based on these functions, we propose three algorithms for
maintaining SNs. The maintenance is based on two main
principles: the push- and the pull-principle. The former
can be applied to both exogenous and endogenous changes,
whereas the latter can only be applied to exogenous changes.
With the push-principle, the data source pushes informa-
tion and business processes automatically to an SN when
they are added, updated, or deleted within the data source.
However, with regard to exogenous changes, prerequisite is
that the data source is able to send notifications if informa-
tion and/or business processes have been changed. Regard-
ing endogenous changes, the prerequisite is that the SN de-
tects changes automatically and triggers respective actions.
With the pull-principle, an SN gathers information and
business processes from a data source. Such a maintenance
process is triggered by time-based schedulers, i.e., the SN is
maintained at a certain point in time. The principle is used
for data sources which are not capable of sending change
notifications (e.g., a document has changed) to the SN.
However, the use of a specific principle depends on the ca-
pabilities of a data source (whether the data source is able
to send change notifications to the SN or not). For exam-
ple, for an enterprise information system which is capable of
sending notifications we use the push-principle whereas for
a shared drive we use the pull-principle.
For each of these two principles, we introduce a corre-
sponding algorithm (cf. Fig. 8). Prerequisite for both prin-
ciples is that the SN has access to underlying data sources.
In case of exogenous changes the SN transforms information
and business processes into a uniform format. In case of
endogenous changes no transformation is necessary.
Push-Principle
Push-Algorithm
add delete update
Pull-Principle
Pull-Algorithm
add delete update
Principles
Algorithms
Functions
Figure 8: Push- and Pull-Algorithm
3.3.1 Push-Algorithm
The Push-Algorithm deals with changes of an SN based
on the push-principle, e.g., a policy is no longer valid in 2014
and the corresponding object in the SN has to be maintained
accordingly. Thus, the maintenance of the SN is triggered by
an action that is applied to the SN by the Push-Algorithm.
The algorithm works as follows: In the add and update
case, we create a vertex vincluding its properties from the
data source affected by the action a(i.e., based on the uri of
the action). After that, we call the corresponding function.
In the delete case, we identify the corresponding vertex v∈
Vbased on the uri of the action and call the according
function. Hence, the Push-Algorithm applies endogenous
and exogenous changes to the SN by actions.
3.3.2 Pull-Algorithm
The Pull-Algorithm deals with changes of an SN based on
the pull-principle, i.e., data has changed in the data source
and needs to be gathered by the SN. For example, docu-
ments on a shared drive are updated and, therefore, respec-
tive changes have to be made in the SN. As aforementioned,
the maintenance of the SN is triggered by a scheduler.
Algorithm 1: Push-Algorithm
Require:SN = (V, E, L, W, fl, fw) an SN,
aan action;
Ensure:SN is updated;
switch func(a) do
case add
v:= create a vertex incl. its properties from
the data source affected by the uri of a;
add(SN, v);
break;
case update
v:= create a vertex incl. its properties from
the data source affected by the uri of a;
update(SN, v);
break;
case delete
v:= get v∈Vs.t. uri(v) = uri(a);
delete(SN, v);
The algorithm works as follows: First, we create a set of
vertices Vds from a data source ds. After that, for each ver-
tex v∈Vin the SN that was created from ds (property
source(v)), we check if a corresponding vertex vds ∈Vds ex-
ists. If this is the case, we check if vds is newer than v(e.g.,
by comparing the creation and/or modification dates). If v
is out-of-date, it is updated with the properties of vds by call-
ing update(SN, vds). After that, vds is removed from Vds. If
no corresponding vertex exists in the data source, we delete
the vertex (v) in the SN using the delete(SN, v) function.
Finally, we add each remaining vertex vds ∈Vds from the
data source to the SN by calling add(SN, vds), which leaves
the SN synchronized with the data source. Hence, the Pull-
Algorithm allows for maintaining the SN at a certain point
in time. Note that the process has to be repeated for each
data source that has to be synchronized.
Algorithm 2: Pull-Algorithm
Require:SN = (V, E, L, W, fl, fw) an SN,
ds the data source;
Ensure:SN is updated;
Vds := create a set of vertices incl. their properties
from the data source ds;
foreach v∈Vs.t. source(v) = ds do
vds := get vds ∈Vds s.t. uri(vds) = uri(v);
if vds then
if vds is newer than vthen
update(SN, vds);
Vds := Vds \ {vds}
else
delete(SN, v);
foreach vds ∈Vds do
add(SN, vds);
3.3.3 Partial-Pull-Principle and Algorithm
In practice, SNs can comprise a large amount of objects
and relations. Maintaining SNs using the pull-principle can,
therefore, be a very time-consuming task. In a specific work
context, a user might, however, only be interested in a se-
lected part of the SN. For example, during a review, review
templates, review minutes, existing reviews, or even results
of a real-time evaluation (e.g., prioritization of projects in
a workshop) are of great importance, while checklists and
best practices for performing effective project management
are less interesting. Thus, it is sufficient to maintain only
these objects and relations that are relevant to the user
when querying the SN. To capture this, we introduce a fur-
ther principle, called the partial-pull-principle, where the SN
gathers only information and business processes from data
sources as requested by a user. These (and only these ob-
jects) are then updated on demand.
In contrast to the other principles, the partial-pull-princi-
ple is completely user-driven because it is triggered by a user
request, whereas the push- and pull-principle are machine-
driven, e.g., through notifications from other enterprise in-
formation systems or schedulers.
Based on the partial-pull-principle, we introduce a third
algorithm (cf. Fig. 9) as a lightweight version of the Pull-
Algorithm. It does not maintain the entire SN, but only the
parts which are relevant for a given request.
Partial-Pull-Principle
Partial-Pull-Algorithm
add delete update
Principle
Algorithm
Functions
Figure 9: Partial-Pull-Algorithm
The algorithm works as follows: First, we create a set of
vertices Vds from affected data sources according to the user
request req. Then, for each vertex v∈Vaffected by req we
retrieve the corresponding vertex vds from the affected data
sources. Thus, if a corresponding vertex vds is in the data
source, we check if vds is newer than v(e.g., by compar-
ing the creation and/or modification dates) and if vis out-
of-date, it is updated with vds by calling update(SN, vds).
If there is no corresponding vertex in the data source, we
delete the vertex (v) in the SN using the delete(SN, v) func-
tion. Hence, the Partial-Pull-Algorithm allows for maintain-
ing parts of an SN based on a user request and ensures that
all requested objects including their properties are synchro-
nized with affected data sources.
4. VALIDATION
To prove that our algorithms are able to successfully main-
tain SNs, we implemented them and evaluated their perfor-
mance in consideration of depth and breadth. We further
evaluated the necessity and application of our algorithms in
a real-world case study in the automotive domain.
Our validation is guided by three research questions:
RQ1 Is automatic maintenance of SNs feasible in consider-
ation of exogenous and endogenous changes?
RQ2 How do depth and breadth affect the runtime of main-
tenance operations?
RQ3 How essential is automatic maintenance of SNs?
Algorithm 3: Partial-Pull-Algorithm
Require:SN = (V, E, L, W, fl, fw) an SN,
req the request to an SN;
Ensure:SN is partially updated;
Vreq contains the requested vertices;
Vds := create a set of vertices from the
data sources affected by req;
foreach v∈Vaffected by req do
vds := get vds ∈Vds s.t. uri(vds) = uri(v);
if vds then
if vds is newer than vthen
update(SN, vds);
vupd := get v0∈Vs.t. uri(v0) = uri(vds);
Vreq := Vreq ∪ {vupd};
else
Vreq := Vreq ∪ {v};
else
delete(SN, v);
4.1 Implementation and Configuration
Driven by these research questions, we implemented a pro-
totype installed on a laptop with an Intel quad-core CPU,
16 GB RAM, and a Windows 7 64-bit operating system.
The prototype is a web-based Java (JRE7)/Scala (2.10)1
application realized as a 3-tier architecture.
The presentation layer uses the web application frame-
work Play,2the Twitter Bootstrap framework,3the Data-
Driven Documents (D3) library,4jQuery,5HTML 5 tem-
plates, and Cascading Style Sheets (CSS) 3. We created
our SNs using the semantic middleware iQser GIN Server
(2.0.0.36) [26]. In addition, we developed several Java open-
source plugins6on the logic layer. The data layer is realized
by a Lucene search index7and a MySQL database.8
Based on our property classification (cf. Section 3.2) we
configured objects and relations of our prototype. As manda-
tory/static object properties we choose cdate,uri, and source
(cf. Fig. 10(a)). Optional properties are filetype and title.
While the filetype does not change over time (static), the
title can vary (dynamic), but it might not be available for
every filetype (e.g., text file) and, therefore, it is optional.
Mandatory dynamic properties are the buzz (activity on ob-
ject), cont,deg, and mdate.
Analogous to object properties, uri and cdate are manda-
tory for relations (cf. Fig. 10(b)). However, relations have
additional mandatory properties such as the source vertex
and destination vertex. The label of an edge is also manda-
tory, while its relevance (weight) can vary, e.g., changing
cont may affect the weight of “is similar to” relations. Addi-
tionally, the reason of a relation, which describes why a re-
lation has been established (e.g., a particular method which
detected a “is similar to” relation), is static and optional.
1http://www.scala-lang.org/
2http://www.playframework.com/
3http://getbootstrap.com/
4http://d3js.org/
5http://jquery.com/
6http://sourceforge.net/directory/?q=nipro
7http://lucene.apache.org/
8http://www.mysql.com/
We split our validation into two parts: The first part ad-
dresses the technical perspective (cf. Section 4.2), i.e., the
influence of depth and breadth on our algorithms, whereas
the second part evaluates their application in an empirical
case study in the automotive domain (cf. Section 4.3).
(c)
cdate, source, uri
(a)
buzz, cont,
deg, mdate
(d)
filetype
(b)
title
(c)
cdate, destination
vertex, label,
source vertex, uri
(a)
mdate, weight
(d)
reason
(b)
-
(a) objects (b) relations
Figure 10: Property classification implementation
Therefore, we created two SNs based on the configuration
mentioned above: one with synthetic text files to evaluate
the runtime behavior of the algorithms and one with real-
world documents from the automotive domain to evaluate
the functional perspective of our maintenance framework.
Both SNs were created with the following relations: “is au-
thor of”, “has same title as”, and “is similar to”. The suc-
cessful implementation and initial tests have already shown
that automatic maintenance of SNs is feasible in general.
Therefore, we now examine the runtime in consideration of
depth and breadth. Finally, we evaluate the necessity and
practicability of automatic maintenance of SNs in a business
environment.
4.2 Technical Validation
In order to address RQ1 and RQ2, we created SNs contain-
ing 5, 50 and 500 objects once with small (1 KB) and once
with larger (100 KB) files. To obtain comparable results
for our measurements, all objects within an SN were identi-
cal (e.g., identical property cont). Therefore, we simulated
the worst-case scenario regarding performance: Every object
was connected with every other object in each SN yielding
40, 4,900 and 499,000 relations. Note, only “is similar to”
and “is author of” relations were detected since text files do
not have the property title and therefore no “has same ti-
tle as” relations were recognized. Based on the initial SNs,
we performed operations (add, update, delete) with a single
object using our Push- and Pull-Algorithm (cf. Section 3.3).
Each combination of these dimensions represented one case
to be examined with regard to small and large files (36 cases
in total, e.g., updating an object with a size of 1 KB through
the Push-Algorithm in an SN containing 5 objects).
For each case, the shown numbers in the diagrams (cf.
Figs. 11-13) constitute averages over three warm runs. Warm
runs were chosen to ensure comparability of the measured
values since the iQser GIN Server [26] performs several initial
background tasks on start-up. Note the logarithmic scale
used in the diagrams.
Fig. 11 shows that objects can be added to an SN in linear
time. Compared to the number of objects within an SN, the
depth has a bigger influence on the runtime as shown by the
comparison with objects of different size (1 KB vs. 100 KB).
Detecting relations between the added and existing objects
is polynomial in the number of vertices. However, note that
the actual performance of the algorithms also depends on the
property values of vertices. For example, the value of cont
(i.e., the size of the property in bytes) affects the analysis of
similarity relations between the added and existing vertices.
1
10
100
1,000
10,000
100,000
5 5 5 5 50 50 50 50 500 500 500 500
push pull push pull push pull push pull push pull push pull
1 KB 1 KB 100 KB 100 KB 1 KB 1 KB 100 KB 100 KB 1 KB 1 KB 100 KB 100 KB
time in ms
no. of objects / algorithm / filesize
relation
object
Figure 11: Effect of depth and breadth on additions
As shown in Fig. 12 update operations perform similarly.
In contrast to the integration of objects, however, some op-
erations are not required (e.g., for mandatory static proper-
ties). Comparisons between existing properties take place
instead (e.g., to check whether a property has changed).
Nevertheless, we could not detect significant differences con-
cerning cost between add and update operations on identical
initial situations.
1
10
100
1,000
10,000
100,000
5 5 5 5 50 50 50 50 500 500 500 500
push pull push pull push pull push pull push pull push pull
1 KB 1 KB 100 KB 100 KB 1 KB 1 KB 100 KB 100 KB 1 KB 1 KB 100 KB 100 KB
time in ms
no. of objects / algorithm / filesize
relation
object
Figure 12: Effect of depth and breadth on updates
In contrast to add and update operations, delete opera-
tions perform differently. While the deletion of objects can
be done in linear time, the deletion of relations varies sig-
nificantly with the size of the objects. The reason is that
all references, which form the basis of a relation (e.g., ex-
tracted concepts and co-occurrences of a specific object in
case of “is similar to” relations) must be also deleted. Note
that such “housekeeping” tasks are controlled by the iQser
GIN Server. In turn, this might be the reason for differences
in measured values resulting in unexpected outcomes. For
example, the cost for deleting an object with a size of 1 KB
out of 5 objects was higher than deleting an object with a
size of 100 KB out of 5 objects (cf. Fig. 13). Furthermore,
measurements with the programming language Java can be
less accurate compared to native programming languages
(e.g., the Java garbage collector cannot be disabled).
Despite the fact that the implemented technology might
cause inaccuracies in measurements, we were still able to
verify that maintenance cost with our algorithms is highly
dependent on depth and breadth (RQ2). However, external
components (e.g., a component for the semantic analysis of
object properties) and their computation cost can influence
the performance of maintenance operations significantly. In
contrast, the consideration of our property classification (cf.
Section 3.2) can have positive effects on performance if only
necessary operations (e.g., only updating properties which
are not mandatory/static) are executed.
1
10
100
1,000
10,000
5 5 5 5 50 50 50 50 500 500 500 500
push pull push pull push pull push pull push pull push pull
1 KB 1 KB 100 KB 100 KB 1 KB 1 KB 100 KB 100 KB 1 KB 1 KB 100 KB 100 KB
time in ms
no. of objects / algorithm / filesize
relation
object
Figure 13: Effect of depth and breadth on deletes
Altogether, the Push- and the Pull-Algorithm both ensure
a synchronized SN over affected data sources after comple-
tion. Thereby, automatic maintenance is feasible (RQ1).
4.3 Empirical Validation
As proposed by Yin [27] and Kitchenham et al. [12], we
performed a case study to evaluate RQ3 in a“typical”project
setting with knowledge workers and decision makers from
several innovation departments in the automotive domain.
Thus, all participants were involved in knowledge-intensive
business processes. The participants were selected based on
their expert knowledge regarding the considered case. No
participant was a member of the research team.
The case study was performed in July 2014 with five de-
cision makers and six knowledge workers from a large au-
tomotive manufacturer. It was conducted based on a ques-
tionnaire comprising 60 questions.9Each interview lasted
around 90 minutes.
First, the participants had to answer general questions
about their current work environment and information hand-
ling within their processes. Afterwards they had to perform
tasks with our prototype and answer questions based on
these tasks with respect to maintenance (i.e., add, update,
delete, and search for objects and validate property values
as well as detected relations). The underlying corpus (data
sources) contained 333 internal documents from their field
of interest (e.g., scientific papers from departments dealing
with technology monitoring and technology development).
Therefore, the users were familiar with the information rep-
resented in the SN and able to judge the containing infor-
mation and its quality on a certain level.
In order to generalize the results and to gain further in-
sights, we asked the users some concluding questions when
all tasks had been completed. More specifically, RQ1 was
addressed from the user’s perspective by the tasks with as-
sociated questions, whereas RQ3 was investigated by the
introducing and concluding questions.
9Available at http://nipro.hs-weingarten.de/casestudy
The initial questions about information handling in the
users’ current work environment revealed that, with the ex-
ception of personal contacts (e.g., in meetings, phone calls),
information is mostly handled and accessed in a digital way.
Information is, however, mostly not well-structured and of-
ten distributed in different sources (e.g., information systems
or shared drives), which makes it difficult to search for it.
These statements are endorsed by the fact that a significant
amount of information is stored in files.
Given the fact of enormous file usage, a consequential chal-
lenge working with files in a business environment is keeping
track of changes (cf. Fig. 14). This is mainly substantiated
by the dynamics and amount of information available as well
as a lack of technological assistance (e.g., search over differ-
ent data sources, notifications on changed content).
1
2
3
4
5lower quartile
minimum
median
maximum
upper quartile
Question: The information needed for my
daily business is very dynamic.
1: completely disagree, 2: disagree, 3: neutral,
4: agree, 5: completely agree
(a) Information dynamics
1
2
3
4
5lower quartile
minimum
median
maximum
upper quartile
Question: Based on my current work
environment I can easily identify
interdependencies between information
from different data sources.
1: completely disagree, 2: disagree, 3: neutral,
4: agree, 5: completely agree
(b) Information overview
Figure 14: Information characteristics
In order to address these challenges (e.g., dynamics of in-
formation in distributed sources, heterogeneous filetypes),
we introduced our prototype (which captures such dynam-
ics) and asked the participants to perform tasks with the
offered graphical user interface (e.g., check if property val-
ues were updated, validate the correctness of a relation).
In the concluding questions all participants stated that
every task with our algorithms could be completed success-
fully. Since objects, relations, and properties were adapted
based on exogenous and endogenous changes, we can confirm
that automatic maintenance of SNs is feasible in practice.
1
2
3
4
5
Pull/Add Push/Add Pull/Delete Push/Delete
Question: Updating the SN was easy.
1: completely disagree, 2: disagree, 3: neutral,
4: agree, 5: completely agree
lower quartile
minimum
median
maximum
upper quartile
Figure 15: Simplicity of maintenance for users
However, some users could not recognize any relation be-
tween the document they added to the SN and others. While
no “is author of” and “has same title as” relation existed, the
“is similar to” relations were not shown. For more sophis-
ticated user experience we filtered these relations according
to a hardcoded weight-threshold with respect to the overall
amount of “is similar to” relations. Nevertheless, we could
verify with according database entries that the relations were
set and take this as an input for further user interface devel-
opment by allowing users to set the threshold dynamically.
Additionally, we asked the users about their preference be-
tween push and pull. All users preferred the Push-Algorithm
since the effects of their tasks are reflected immediately in
the user interface. By contrast, the Pull-Algorithm always
has a delay since it is triggered at a certain point in time.
A synchronization every three minutes caused obviously too
much delay for some users, even when performing other tasks
in between. However, the Partial-Pull-Algorithm, which ad-
dresses this downside, received positive feedback from the
participants. Note that the Partial-Pull Algorithm only con-
siders objects currently existing in an SN.
1
2
3
4
5
Pull/Add Push/Add
Question: My effort updating the SN was low.
1: completely disagree, 2: disagree, 3: neutral,
4: agree, 5: completely agree
lower quartile
minimum
median
maximum
upper quartile
Figure 16: Maintenance effort for users
We interpret the disadvantage of the Pull-Algorithm as
a confirmation of RQ3 (automatic maintenance is essential)
and recommend the Push-Algorithm if technology permits
offering an improved real-time-experience to users. For the
Pull-Algorithm, the update intervals can be shortened, de-
pending on the number of files in a data source. Neverthe-
less, the delay of the Pull-Algorithm, as observed by the
users, is caused by the configured time interval of imple-
mented schedulers. However, its performance compared to
the Push-Algorithm is almost identical (cf. Section 4.2).
Concluding, we asked the users about their impression
concerning benefits for their daily work in consideration of
SN maintenance. All users stated that integrating and con-
necting distributed information was easy when using the
prototype (cf. Fig. 15) and could be done with reasonable
effort (cf. Fig. 16). Additionally they confirmed the benefit
of their daily work (cf. Fig. 17).
1
2
3
4
5
Pull/Update Push/Update
Synchronization: Automatic updates of
objects, relations and properties are useful.
1: completely disagree, 2: disagree, 3: neutral,
4: agree, 5: completely agree
lower quartile
minimum
median
maximum
upper quartile
Figure 17: User perspective on auto. maintenance
More than 90% of the participants also stated that an SN
provides an enhanced overview on information and that a
maintained SN is desirable. In particular, this would be a
benefit compared to distributed data sources. One user sum-
marized: “Maintenance and corresponding updates should
be automatic. No user interaction for maintenance should
be required. Users should focus on working with the SN.”
Asking about use cases for a maintained SN, both, knowl-
edge workers and decision makers recognized further poten-
tials of the SN to support their daily work. For example,
an expert search for people assigned to objects (e.g., au-
thors) and their relations with each other as well as decision
support scenarios (e.g., detecting patterns within the SN).
4.4 Discussion
We have shown that automatic maintenance of SNs with
regard to both exogenous and endogenous changes is feasible
with acceptable cost (RQ1). Furthermore, we examined the
effect of depth and breadth on the runtime of the proposed
algorithms (RQ2). Both algorithms performed satisfactorily
in terms of adding, updating, and deleting objects. The cost
of detecting relations, however, varies widely when expensive
linguistic and/or statistical algorithms are used.
All users appreciated a unified single point of information
access, which is up-to-date and makes distributed informa-
tion searchable. Many of the interviewed users already ex-
perienced a well maintained SN as an enabler for various
use cases (e.g., expert search, gap analysis). In particu-
lar, knowledge workers and decision makers can benefit from
maintained SNs. Both groups emphasized that relations be-
tween objects can be interesting for various purposes (e.g.,
navigation within an SN or decision support).
Our case study results on the deficits of information hand-
ling in current business environments (cf. Section 4.3) have
shown that it becomes increasingly crucial to provide up-
to-date, integrated, and homogenous views on enterprise in-
formation. The empirical validation confirms that there is
a high demand for a single point of information access in
knowledge-intensive processes. Finally, we verified that a
maintained SN is not only essential for such processes, but
a practicable solution as well (RQ3).
5. RELATED WORK
Generally, various types of SNs exist. Examples include
associative networks [5], topic networks [22], fact networks
[23], or ontologies [13]. They usually have a special field of
application (e.g., semantic search, reasoning). All of them
(including our approach) have in common, that they have to
be maintained. Depending on the expressiveness, however,
maintenance effort varies widely. Usually the higher the ex-
pressiveness, the higher the creation and maintenance effort
[23]. In practice, most of the above mentioned SNs have to
be maintained manually. However, several semi-automatic
maintenance approaches exist such as ontology maintenance
using textual analysis [7], ontology mapping maintenance
support [3], or SN integrity maintenance [2].
In turn, further approaches such as data warehousing [14],
enterprise architecture management (EAM) [6], or search
engines [15] have to deal with maintenance as well. However,
manual effort is usually necessary (e.g., schema mapping)
[1]. For example, the literature study of Farwick et al. [4]
confirms that there exists no related work addressing the
problem of automated EA maintenance.
Our SN represents information in a unified, integrated,
user-, and machine-interpretable form. Unlike existing main-
tenance approaches, our approach focuses on the integration
of distributed and heterogeneous information and business
processes including the detection of their interdependencies.
We showed that no manual effort is needed to maintain such
an SN, but manual effort can increase the expressiveness,
i.e., better conceptualization and delimitation of concepts
(e.g., for enhanced detection of relations).
6. SUMMARY AND OUTLOOK
This paper presented a framework for maintaining SNs,
i.e., adapting exogenous and endogenous changes. There-
fore, we presented three algorithms. Furthermore, our prop-
erty classification allows maintaining SNs efficiently.
Our validation confirms that our maintenance framework
is able to keep SNs synchronized with underlying sources in
reasonable time. Moreover, we applied our algorithms to a
real-world case, i.e., validated them based on an implemen-
tation in the automotive domain.
In future, we will improve our algorithms to support self-
learning components with a focus on endogenous changes.
7. REFERENCES
[1] P. Bernstein and L. Haas. Information integration in
the enterprise. Communications of the ACM, pages
72–79, 2008.
[2] T. ˇ
Capek. Semantic network integrity maintenance via
heuristic semi-automatic tests. In Proc. of the
RASLAN Workshop 2009, pages 63–67. Masaryk
University, 2009.
[3] D. Dinh, J. Dos Reis, C. Pruski, M. Da Silveira, and
C. Reynaud-Delaˆıtre. Identifying relevant concept
attributes to support mapping maintenance under
ontology evolution. J. of Web Semantics: Science,
Services and Agents on the World Wide Web, 2014.
[4] M. Farwick, B. Agreiter, R. Breu, S. Ryll, K. Voges,
and I. Hanschke. Requirements for automated
enterprise architecture model maintenance - a
requirements analysis based on a literature review and
an exploratory survey. In Proc. of the 13th Int’l Conf.
on Enterprise Information System (ICEIS’11), pages
325–337. SciTePress, 2011.
[5] N. V. Findler. Associative Networks: The
Representation and Use of Knowledge of Computers.
Academic Press, 1979.
[6] R. Fischer, S. Aier, and R. Winter. A federated
approach to enterprise architecture model
maintenance. In Proc. of the 2nd Int’l Workshop on
Enterprise Modelling and Information Systems
Architectures (EMISA’07), pages 9–22. Springer, 2007.
[7] Y. Gargouri, B. Lefebvre, and J.-G. Meunier.
Ontology maintenance using textual analysis. In Proc.
of the 7th World Multi-Conf. on Systemics,
Cybernetics and Informatics (SCI’03). International
Institute of Informatics and Systemics, 2003.
[8] M. Hipp, B. Michelberger, B. Mutschler, and
M. Reichert. A framework for the intelligent delivery
and user-adequate visualization of process information.
In Proc. of the 28th Symp. on Applied Computing
(SAC’13), pages 1383–1390. ACM Press, 2013.
[9] M. Hipp, B. Mutschler, B. Michelberger, and
M. Reichert. Navigating in process model repositories
and enterprise process information. In Proc. of the 8th
Int’l Conf. on Research Challenges in Information
Science (RCIS’14). IEEE Computer Society, 2014.
[10] M. Hipp, B. Mutschler, and M. Reichert. On the
context-aware, personalized delivery of process
information: Viewpoints, problems, and requirements.
In Proc. of the 6th Int’l Conf. on Availability,
Reliability and Security (ARES’11), pages 390–397.
IEEE Computer Society, 2011.
[11] A. Hotho, A. N¨
urnberger, and G. Paaß. A brief survey
of text mining. LDV Forum, 20(1):19–62, 2005.
[12] B. Kitchenham, L. Pickard, and S. L. Pfleeger. Case
studies for method and tool evaluation. IEEE
Software, 12(4):52–62, 1995.
[13] L. W. Lacy. OWL: Representing Information Using the
Web Ontology Language. Trafford Publishing, 2005.
[14] J. Lechtenb¨
orger. Data warehouse schema design.
Akademische Verlagsgesellschaft AKA, PhD Thesis,
University of M¨
unster, 2001.
[15] D. Lewandowski. Web searching, search engines and
information retrieval. Information Services & Use,
18(3), 2005.
[16] B. Michelberger, B. Mutschler, M. Hipp, and
M. Reichert. Determining the link and rate popularity
of enterprise process information. In Proc. of the 21st
Int’l Conf. on Cooperative Information Systems
(CoopIS’13), pages 112–129. Springer, 2013.
[17] B. Michelberger, B. Mutschler, and M. Reichert. On
handling process information: Results from case
studies and a survey. In Proc. of the 2nd Int’l
Workshop on Empirical Research in Business Process
Management (ER-BPM’11), pages 333–344. Springer,
2011.
[18] B. Michelberger, B. Mutschler, and M. Reichert.
Towards process-oriented information logistics: Why
quality dimensions of process information matter. In
Proc. of the 4th Int’l Workshop on Enterprise
Modelling and Information Systems Architectures
(EMISA’11), pages 107–120. Koellen-Verlag, 2011.
[19] B. Michelberger, B. Mutschler, and M. Reichert. A
context framework for process-oriented information
logistics. In Proc. 15th Int’l Conf. on Business
Information Systems (BIS’12), pages 260–271.
Springer, 2012.
[20] B. Michelberger, B. Mutschler, and M. Reichert.
Process-oriented information logistics: Aligning
enterprise information with business processes. In
Proc. of the 16th Int’l EDOC Conf. (EDOC’12), pages
21–30. IEEE Computer Society, 2012.
[21] S. Oertelt and K. Ulmschneider. Prozessintegrierter
Einsatz virtueller Methoden im strategischen
Technologie- und Innovationsmanagement. In Proc. of
KnowTech - Wissensmanagement und Social Media -
Markterfolg im Innovationswettbewerb, pages 485–498.
GITO, 2013.
[22] J. Park and S. Hunting. XML Topic Maps: Creating
and Using Topic Maps for the Web. Addison-Wesley
Professional, 2002.
[23] K. Reichenberger. Kompendium semantische Netze:
Konzepte, Technologie, Modellierung. Springer, 2010.
[24] M. Reichert and B. Weber. Enabling Flexibility in
Process-Aware Information Systems: Challenges,
Methods, Technologies. Springer, 2012.
[25] J. Rowley. The wisdom hierarchy: representations of
the DIKW hierarchy. J. of Information Science,
33(2):163–180, 2007.
[26] J. Wurzer. New approach for semantic web by
automatic semantics. In Proc. of the 2nd European
Semantic Technology Conf. (ESCT’08), 2008.
[27] R. K. Yin. Case Study Research: Design and Methods.
Sage Publications, 2009.