Advanced Engineering Informatics 62 (2024) 102582
Available online 16 May 2024
1474-0346/© 2024 The Author(s). Published by Elsevier Ltd. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
Full length article
Evaluating the efficiency and performance of data persistent systems in
managing building and environmental Data: A comparative study
Eyosias Dawit Guyo a,b,*, Timo Hartmann a
a Technische Universität Berlin, Germany
b Trimble Solutions Oy, Finland
ARTICLE INFO
Keywords:
Building data
Environmental data
Interrelated data
Relational database
Graph-based database
Comparative evaluation
ABSTRACT
Selecting an appropriate data persistent system for a specific use case necessitates a thorough examination of the
application domain and the characteristics of the data expected to be stored. While comparative studies of data
persistent systems exist in various domains, there is a notable absence of such studies concerning building and
environmental data management. This research aims to bridge this gap by conducting a comparative evaluation
based on building and environmental datasets and use cases. The study primarily focuses on two types of
database systems, namely relational database systems and graph-based database systems. Two building and two
city models are employed in the evaluation. The building data sets are extracted from IFC models, and envi-
ronmental data are extracted from CityGML and OpenStreetMap. The assessment involves qualitatively analysing
the database design process of the systems and quantitatively evaluating the efficiency of retrieving data from
those systems. The comparative evaluation identifies at least two crucial aspects to consider when selecting a
suitable data-persistent system for managing building and environmental data. The first aspect pertains to the
stability of the data to be stored, along with the complexity of interrelationships within the building and envi-
ronmental dataset. The second aspect involves the manner in which data is retrieved to accomplish different
tasks within the particular business case. The findings demonstrate that use cases that typically manage inter-
related data and necessitate the traversal of complex relationships between building and environmental features
are better managed by graph-based database systems, particularly when dealing with large datasets. Conversely,
relational databases exhibit superior performance for use cases requiring minimal or no relationship traversal,
regardless of dataset size. The contributions of this study can serve as valuable input when designing information
management tools and systems for building and environmental data management.
1. Introduction
Software applications that utilise building and environmental data
can play a crucial role in supporting decision-making processes associ-
ated with the design, construction, and operation of building facilities.
These applications generally deal with data that can be categorised as
either ephemeral or persistent in nature. Ephemeral data refers to tran-
sitory data that exists for a short duration of time and is typically stored
in temporary storage [1]. Such data ceases to exist once the operation
responsible for its creation terminates. In contrast, persistent data is
intended to be permanently stored unless explicitly deleted [2]. Data
residing in persistent storage exhibits greater independence from the
application that created it and can be accessed again in future sessions
and even by a different application [44]. Various methods can be
employed to persist data, one of which is by using database systems.
A database is more than a simple aggregation and storage of data.
Instead, it is a collection of related data that is logically organised for a
specific purpose [3]. It can be created, managed, and accessed with the
help of computer applications referred to as Database Management
Systems (DBMS). These systems utilise various data models to organise
and store data. One of the data models that is widely used by numerous
DBMSs is the relational model. In the relational model, data is organised
as a collection of relations or related data points, commonly represented
as rows in tables [3]. A single relational database can contain multiple
tables that are often associated with each other, enabling the storage and
retrieval of interrelated data. The data stored in the relational database
is accessed and manipulated by the Structured Query Language or SQL,
which is also often used to refer to relational database systems in general.
* Corresponding author at: Gustav-Meyer-Allee 25 13555, Berlin, Germany.
E-mail address: [email protected] (E.D. Guyo).
https://doi.org/10.1016/j.aei.2024.102582
Received 11 January 2024; Received in revised form 22 April 2024; Accepted 30 April 2024
In addition to relational models or SQL, other alternative data
models are used to persist data in a database. While these alternative
methods are distinct from one another, they are often collectively
referred to as non-relational database systems or NoSQL. Some of these
non-relational database systems primarily focus on aggregating related
data points, while others chiefly focus on representing the relationship
between data points [4]. The graph-based database is one example of a
relationship-oriented database system that does not follow the relational
model. Instead, it represents data using a graph model. The model
typically uses nodes and edges to represent data points and their re-
lationships. Most situations that involve objects with a high level of
interrelationship can be represented using a graph model (hence using a
graph database) [5]. While both relational and graph-based database
systems possess the ability to represent complex relationships, they also
have significant differences. Consequently, thorough consideration is
necessary before selecting either as a data persistence method for a
specific use case.
This research aims to conduct a comparative analysis of relational
and graph database systems to evaluate the relative advantages and
limitations of these systems in managing interrelated building and
environmental data. To meet this research objective, four real-world
data sets consisting of two building data sets and two city data sets
are obtained. These data sets are subsequently stored and managed
using the target data persistent systems for the assessment. The first part
of the assessment involves a qualitative evaluation of the process of
designing each database system, populating them with the provided test
data sets, as well as manipulating them to store evolving data. In the
second part of the assessment, quantitative performance-focused ex-
periments are conducted to assess the speed and cost associated with
accessing interrelated data from the database systems. The tests cover
various scenarios that represent different combinations of database size
and levels of interrelationship within the data sets.
The study’s findings indicate that the complexity of the in-
terrelationships between building and environmental features in data
sets should be an essential consideration when selecting a data-
persistent system. The qualitative evaluation of the target database
systems has determined that building data sets that represent intricate
relationships between different building features and spaces, as well as
city data sets, which represent complex relationships between environ-
mental features, are better represented using a graph database. Addi-
tionally, the quantitative performance evaluation has demonstrated that
the manner in which data is retrieved to execute different tasks in a
specific business case should also be a critical factor to consider prior to
selecting a database system. In situations where data retrieval from a
database requires traversing a minimal number of relationships, rela-
tional database systems outperform their graph-based counterparts,
regardless of the database’s size. However, when a given use case ne-
cessitates traversing a substantial number of relationships, the perfor-
mance advantage often shifts to the graph database solutions. Overall,
the outcomes of this study offer valuable insights that can prove highly
beneficial when deciding on a suitable data persistence system for the
efficient management of building and environmental data across various
applications within the domain of built environment management.
Finally, while this study discusses the selection of suitable database
systems, it is crucial to acknowledge that real-world software systems
often require the integration of multiple data persistence technologies
due to their inherent complexity. Therefore, the primary objective of
this study is not to advocate for the superiority of any single system, but
rather to underscore the significance of strategically selecting data
persistence systems based on specific requirements within a collabora-
tive environment.
The paper is structured as follows: Section 2 provides a theoretical
background about different database concepts and database systems
that are essential to understand the research. In Section 3, the research
method is thoroughly discussed, including details about the test data
sets, the assessment metrics and the configuration of the experiments.
Then, in Section 4, the result of the comparative assessment is presented.
Section 5 discusses the insights gained from the research results. The
discussion includes the practical implications of the research, its limi-
tations, and potential future directions. Finally, concluding remarks are
given in Section 6.
2. Technical background and related works
Databases serve as essential tools for the persistent storage of data.
They facilitate the decoupling of data from the applications that
generate it, thereby ensuring that data remains accessible for future
utilisation by either the same or different applications [44]. Rather than
merely storing data, databases also include valuable information
regarding the relationships between stored data [6]. Hence, they func-
tion as repositories where related data is stored, organised and accessed
[7]. Databases are accessed and managed through different database
management systems (DBMS). These DBMSs have key defining features
that influence how data is going to be stored and accessed, with one
crucial aspect being the data model they employ for data persistence.
This section will provide some technical background regarding these
data models that is essential for understanding this research study.
Additionally, it will also offer a summary of prior research that relates to
data persistent models, underscoring the research gap that this study
aims to address.
2.1. Relational and Non-Relational database systems
Different database management systems employ different models for
organising and storing data. The most widely utilised model among
them is the relational data model, which has been the primary choice for
data storage for many decades [8]. However, in recent years, alternative
data persistence models have emerged and been embraced by various
database management systems. These alternative models are collec-
tively known as non-relational database systems. While these non-
relational data models exhibit several commonalities, they are distinct
from one another. The following section offers an exploration of the
defining features of both relational data models and non-relational data
models, which are employed for data persistence in different database
systems.
2.1.1. Relational database systems
Relational database systems utilise a relational model that is founded
on relations [9]. A relation contains a finite set of ordered data points or
tuples [3]. All tuples in a relation have the same set of fields [10]. Re-
lations are represented using tables in relational databases where col-
umns represent data fields and rows represent a single record, which is a
set of related data points (tuple). Each column in a relational database
table has a unique name and specific data type that it can store. One or
more columns in a table can be used to store unique keys that can be
used to identify and retrieve each row or record in the table. Such col-
umns are called primary keys. Relational database systems are the most
popular database systems currently in use. Some popular relational
DBMSs include Oracle, MySQL, and PostgreSQL. Relational databases
are suitable for business cases in which the data to be stored is well-
known, stable and predictable where changes in the data structure are
expected to be rare and non-urgent [6].
Relational database systems can support some level of relationship
between the data sets they store. A single relational database can contain
multiple tables which can be related to one another. The relationship
could be one-to-one, one-to-many or many-to-many. In some instances,
a table can also be associated with itself. Foreign keys, which are con-
straints set on columns, are used to establish relationships between ta-
bles in a database. By setting a foreign key constraint on a column from
one table, a reference to another table can be established. These keys can
then be used to request data that is distributed across multiple tables.
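As a minimal sketch of how primary and foreign keys relate tables, the following example uses Python's built-in sqlite3 module with an in-memory database. The table names (building, space), their columns, and the sample rows are hypothetical and are not taken from the data sets used in this study.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE building (
    id   INTEGER PRIMARY KEY,          -- primary key: identifies each row
    name TEXT NOT NULL
);
CREATE TABLE space (
    id          INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    building_id INTEGER NOT NULL REFERENCES building(id)  -- foreign key
);
""")

conn.execute("INSERT INTO building (id, name) VALUES (1, 'Main building')")
conn.executemany(
    "INSERT INTO space (name, building_id) VALUES (?, ?)",
    [("Office 1.01", 1), ("Corridor 1", 1)],
)

# A space that points to a non-existent building violates the constraint.
try:
    conn.execute("INSERT INTO space (name, building_id) VALUES ('Orphan room', 99)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# One-to-many relationship: all spaces that belong to building 1.
space_names = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM space WHERE building_id = 1 ORDER BY name"
    )
]
```

The one-to-many relationship lives entirely in the building_id column. A many-to-many relationship would typically need an additional linking table that holds two foreign keys.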
Relational database systems use joins to retrieve data that is saved across
multiple interrelated tables during read operations. Joins can also be
used to connect a table with itself. They are often created by overlapping
two tables using their common column. When two tables are joined, the
query processor first creates a table using a cross-product of the two
tables. In other words, it pairs all the rows of one table with all rows of
the other table [7]. If there is a third table to join, a cross-product of the
resulting table from the first join with the third table will be performed,
and the results will be stored in a new table.
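The cross-product view of a join described above can be sketched in a few lines of Python. This is a conceptual illustration only: the tuples and the helper name nested_loop_join are hypothetical, and production query planners usually avoid materialising the full cross-product by using indexes or hash-based strategies.

```python
from itertools import product

# Hypothetical rows: (id, name) and (id, name, building_id).
buildings = [(1, "Main building"), (2, "Annex")]
spaces = [(10, "Office 1.01", 1), (11, "Corridor 1", 1), (12, "Storage", 2)]


def nested_loop_join(left, right, left_key, right_key):
    """Pair every left row with every right row, then keep matching pairs."""
    return [
        l_row + r_row
        for l_row, r_row in product(left, right)
        if l_row[left_key] == r_row[right_key]
    ]


# The full cross-product has len(buildings) * len(spaces) candidate pairs.
cross = list(product(buildings, spaces))

# Joining on building.id = space.building_id keeps only the matching pairs.
joined = nested_loop_join(buildings, spaces, left_key=0, right_key=2)
```

The number of candidate pairs grows multiplicatively with each additional table, which is the intuition behind the cost of queries that chain many joins.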
2.1.2. Non-Relational database systems
Non-relational data persistence methods do not refer to a single data
storage model. Instead, they encompass multiple distinct systems that
are not based on a relational model. The following are the most common
types of non-relational database systems.
2.1.2.1. Key-Value store. Key-Value databases offer the simplest struc-
ture to store data, where records are stored as pairs comprising a unique
key and its associated value. Any type of data can be stored as a value
which can be accessed and retrieved using its key. These data persistence
systems are ideal for quick data access and exhibit high scalability [4].
However, they are unsuitable for storing interconnected data or
executing complex queries. Key-Value stores are utilised in use cases
such as caching, storing user data, storing large image databases, large
catalogues and websites that have a large number of pages [6]. Exam-
ples of key-value DBMSs include Redis, Amazon DynamoDB, Microsoft
Azure Cosmos DB and Memcached.
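The access pattern of a key-value store can be illustrated with a toy in-memory class. The class name KeyValueStore, its methods, and the keys below are hypothetical and do not mirror the API of any of the systems listed above.

```python
class KeyValueStore:
    """Toy in-memory key-value store (illustrative only)."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        # The value is opaque to the store: it is never inspected or indexed.
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)

    def delete(self, key):
        self._data.pop(key, None)


store = KeyValueStore()
store.put("sensor:42:latest", b"\x00\x15")
store.put("model:main-building", {"format": "IFC", "storeys": 4})

latest = store.get("sensor:42:latest")
missing = store.get("sensor:99:latest")
```

Every operation goes through the key. Finding all models with four storeys would require scanning every value, which is why such stores are a poor fit for interconnected data or complex queries.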
2.1.2.2. Document-Based database systems. Document-based database
systems resemble key-value databases, with the distinction that instead
of values, they store self-contained documents [11]. Each document in a
document store has a unique key that is used to identify and retrieve the
document. Additional keys within the documents can also be used to
retrieve specific documents. While document databases offer advantages
similar to key-value stores, they may exhibit slightly lower performance
since they may contain multiple data fields within a single record [11].
Therefore, they are suitable for storing semi-structured data with mul-
tiple attributes. For instance, they can be used to store user profiles
where multiple attributes specific to each user are stored or for content
management, where different types of content need to be saved [43].
Popular document-based database systems include MongoDB, Amazon
DynamoDB, Databricks, and Microsoft Azure Cosmos DB.
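The difference from a plain key-value store can be sketched with self-contained documents held in a Python dictionary. The document keys, field names, and the find helper are hypothetical and are not the query interface of any particular document database.

```python
# Each document is self-contained and may carry different fields.
documents = {
    "wall-001": {"type": "Wall", "material": "concrete", "thickness_mm": 200},
    "door-001": {"type": "Door", "width_mm": 900, "fire_rating": "EI30"},
    "door-002": {"type": "Door", "width_mm": 1000},
}


def find(docs, **criteria):
    """Return the keys of all documents whose fields match every criterion."""
    return sorted(
        key
        for key, doc in docs.items()
        if all(doc.get(field) == value for field, value in criteria.items())
    )


# Retrieval by the unique document key.
wall = documents["wall-001"]

# Retrieval by keys inside the documents.
doors = find(documents, type="Door")
rated_doors = find(documents, type="Door", fire_rating="EI30")
```

Unlike the opaque values of a key-value store, the fields inside each document are visible to the store and can therefore be used in queries.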
2.1.2.3. Column-Oriented database systems. Column-oriented database
systems store data in columns, the fundamental units of data
storage, each of which consists of a name-value pair. These
columns can be grouped into column families, which results in a named
set of columns where each set gives a complete view of one entity [4]. By
introducing related key-value pairs, the column database system en-
hances the structure of key-value stores [11]. However, similar to key-
value and document-based systems, column-oriented databases are
primarily based on data aggregation and do not support joins and re-
lationships [4]. Nevertheless, they offer fast data aggregation capabil-
ities [6]. Prominent examples of column-oriented DBMS include
Cassandra, HBase and Microsoft Azure Cosmos DB.
2.1.2.4. Labelled property graphs. Graphs, typically made up of nodes
and edges (relationships between nodes), are used by some database
management systems to model scenarios involving distinct objects with
interrelationships [5]. One common type of a graph-based database
system is the labelled property graph. In labelled property graphs, nodes
are labelled and interconnected through edges (relationships), and both
nodes and edges can possess properties [4]. Property graphs offer a
simple, straightforward, and compact data representation that closely
resembles real-world relationships. Moreover, these systems provide an
efficient mechanism for traversing relationships, in which traversal is localised to
the target record without necessitating a scan of the entire database. As a
result, they excel at representing data with complex relationships and
efficiently traversing those relationships [11]. These database systems
are particularly suitable for use cases where managing relationships and
paths between data is the primary requirement, such as in social
networking applications, route-finding tools, and recommendation en-
gines. Examples of popular graph DBMS include Neo4j, Microsoft Azure
Cosmos DB, and ArangoDB.
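A labelled property graph and a localised traversal can be sketched in plain Python as follows. The node identifiers, labels, relationship types, and the reachable helper are hypothetical; a graph DBMS would store and traverse such structures natively rather than through dictionaries.

```python
from collections import deque

# Nodes carry labels and properties.
nodes = {
    "b1": {"labels": {"Building"}, "props": {"name": "Main building"}},
    "s1": {"labels": {"Storey"}, "props": {"name": "Ground floor"}},
    "r1": {"labels": {"Space"}, "props": {"name": "Office 0.01"}},
    "r2": {"labels": {"Space"}, "props": {"name": "Corridor 0"}},
}

# Edges are (source, relationship type, target, properties).
edges = [
    ("b1", "HAS_STOREY", "s1", {}),
    ("s1", "CONTAINS", "r1", {}),
    ("s1", "CONTAINS", "r2", {}),
    ("r1", "ADJACENT_TO", "r2", {"via": "door-001"}),
]

# Each node keeps a direct list of its outgoing relationships.
adjacency = {}
for source, rel_type, target, props in edges:
    adjacency.setdefault(source, []).append((rel_type, target, props))


def reachable(start, max_hops):
    """Breadth-first traversal that only visits neighbours of visited nodes."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, hops = queue.popleft()
        if hops == max_hops:
            continue
        for _rel_type, target, _props in adjacency.get(node, []):
            if target not in seen:
                seen.add(target)
                queue.append((target, hops + 1))
    seen.discard(start)
    return seen


one_hop = reachable("b1", 1)
two_hops = reachable("b1", 2)
```

The traversal touches only the nodes connected to the starting node, so its cost depends on the size of the explored neighbourhood rather than on the total number of records.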
2.1.2.5. RDF triple stores. The Resource Description Framework (RDF)
is a framework that adopts a triple structure (RDF triples), comprising a
subject, predicate, and object, to describe concepts in a given domain
[48]. RDF has a standardised specification, which is maintained by the
W3C. Furthermore, although not obligatory, ontologies are often uti-
lised with RDF stores to provide a formalised and shared vocabulary and
relationships within the target domain [10]. These ontologies serve as a
schema to organise RDF triples. Another distinguishing feature of RDF is
the use of unique, global identifiers for resources, referred
to as Uniform Resource Identifiers (URIs), which facilitates the integration
of data across diverse data sets. Consequently, RDF excels at defining
data semantics and linking data across data sets. Examples of DBMSs
that can store RDF triples include Virtuoso, Apache Jena and GraphDB.
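The triple structure can be illustrated without a dedicated RDF library by storing triples as Python tuples and matching them against simple patterns. The namespace http://example.org/building# and all resource names are hypothetical, and the match helper is only a rough stand-in for the pattern matching that SPARQL provides.

```python
# Hypothetical namespace: URIs give every resource a global identifier.
EX = "http://example.org/building#"

# Each triple is (subject, predicate, object).
triples = {
    (EX + "Office101", EX + "isPartOf", EX + "Storey1"),
    (EX + "Storey1", EX + "isPartOf", EX + "MainBuilding"),
    (EX + "Office101", EX + "hasArea", "24.5"),
}


def match(store, s=None, p=None, o=None):
    """Return all triples matching the pattern; None acts as a wildcard."""
    return sorted(
        t
        for t in store
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


# All "isPartOf" statements, whatever their subject and object.
part_of = match(triples, p=EX + "isPartOf")

# Everything that is stated about one resource.
about_office = match(triples, s=EX + "Office101")
```

Because subjects and predicates are global identifiers, triples from a second data set that reuse the same URIs can simply be added to the same set, which is the basis of the data-linking strength noted above.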
2.1.3. Important database features
The following section briefly introduces some fundamental database
features that are crucial for effective database management. Further-
more, the discussion offers a high-level insight into the distinction be-
tween relational and non-relational database systems in relation to these
key database features. This discussion is imperative for understanding
the comparative assessment that is presented in this study.
Query Language: Database users, including both human users and
computer applications, use different querying languages and general-
purpose programming languages to access and manipulate databases.
Queries are used to perform the four essential operations in database
management, which are Create, Read, Update and Delete or CRUD.
Furthermore, queries can be employed to generate a customised view of
a database tailored to the requirements of a specific use case [44]. Da-
tabases can be modelled to perform a specific set of queries efficiently.
However, such database optimisation can lead to suboptimal perfor-
mance when attempting to execute a different set of queries than the
planned one [10].
The Structured Query Language (SQL) is used to retrieve data from
relational databases as well as to perform various kinds of database
manipulations. SQL has gained widespread adoption across multiple
relational database management systems, enabling users to avoid
vendor lock-in by providing interoperability across different environ-
ments. In contrast, various distinct query languages have been devised
for various non-relational database systems. One such example is the
Cypher query language, which is used for querying Neo4j property
graphs [11]. Meanwhile, the SPARQL query language is developed to
query data from RDF triple stores [10]. Another example is the Cas-
sandra Query Language (CQL), which was introduced to manipulate
Cassandra wide-column stores [12]. Evidently, non-relational database
systems lack a universally standardised query language comparable to
SQL, which limits their cross-system compatibility. Consequently, users
who opt for a specific vendor’s non-relational database face the risk of
being tied to that vendor’s environment, as data migration and inte-
gration options are restricted.
Database Schema: When a relational database is created, it is
mandatory to define a schema that describes the database’s data struc-
ture. The schema serves as a logical blueprint for the database, outlining
the tables, relationships, and constraints that can be stored [13]. The
database is then populated by adding data to the tables defined within
the schema [7]. The schema imposes restrictions on the nature and size
of data inserted into specific columns as well as the permissibility of null
and duplicate values. A schema is defined during database design and is
not expected to undergo frequent changes [3]. Hence, designing a
schema requires thoughtful consideration of the data to be stored.
However, it is not uncommon to encounter scenarios where a data set
cannot be fully represented by a schema. In such cases, sparsely popu-
lated tables are created, where certain columns are left without values
[4]. When the variability between the data fields within records is sig-
nificant, it will lead to tables with numerous unused data fields. For that
reason, relational databases are a good fit for use cases where the data to
be stored is well-known, stable and predictable, and changes in the data
structure are rare and non-urgent [6].
In contrast, most non-relational database systems are schemaless and
eliminate the need to define entities and relationships when the data-
base is created. Instead, applications must enforce schemas when
reading from the database. These systems offer flexibility when writing
to the database by allowing the addition of new data without disturbing
existing functionalities [4]. Unlike relational databases, attributes
irrelevant to a particular record are not created for that record, thereby
avoiding empty data fields. As a result, schema-free non-relational
database systems better accommodate evolving data sets when
compared to relational databases [4]. However, it should be noted that
some non-relational database systems, RDF stores specifically, do have a
standardised schema.
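The contrast between a fixed schema with sparsely populated columns and schemaless records can be sketched as follows, using Python's built-in sqlite3 module for the relational side and plain dictionaries for the schemaless side. The element table, its columns, and the sample values are hypothetical.

```python
import sqlite3

# Relational side: one table must provide a column for every attribute
# that any element type might need.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE element (
        id           INTEGER PRIMARY KEY,
        type         TEXT NOT NULL,
        thickness_mm REAL,
        width_mm     REAL,
        fire_rating  TEXT
    )
    """
)
conn.executemany(
    "INSERT INTO element (type, thickness_mm, width_mm, fire_rating) "
    "VALUES (?, ?, ?, ?)",
    [("Wall", 200.0, None, None), ("Door", None, 900.0, "EI30")],
)
rows = conn.execute(
    "SELECT thickness_mm, width_mm, fire_rating FROM element"
).fetchall()
null_cells = sum(value is None for row in rows for value in row)

# Schemaless side: each record stores only the attributes it actually has.
records = [
    {"type": "Wall", "thickness_mm": 200.0},
    {"type": "Door", "width_mm": 900.0, "fire_rating": "EI30"},
]
empty_fields = sum(value is None for record in records for value in record.values())

# The reading application supplies the structure it expects (schema on read).
thicknesses = [record.get("thickness_mm") for record in records]
```

In this small example, half of the optional cells in the relational table are empty, whereas the schemaless records carry no empty fields, at the price of moving the structural checks into the reading code.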
Indexes: In many database systems, each record comes with a
unique key that serves as an identifier and facilitates record retrieval
through queries. Depending on the specific database systems, other data
fields within a record can also be used to retrieve the record. The data
fields that are frequently used to query the database are often indexed to
improve the efficiency of searching data. An index is typically an ordered
list of keys that point to the location of the full records stored in a
database [14]. If indexes are absent, the entire database needs to be
scanned when searching for records. With indexes, however, only a
limited portion of the database is scanned, reducing the time required to
retrieve data from the database. Nevertheless, indexes come with costs
associated with their storage and maintenance. They can slow down
updating operations since they also need to be updated when changes
are made to the table.
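The effect of an index on how a query is executed can be observed with SQLite's EXPLAIN QUERY PLAN statement, used here through Python's built-in sqlite3 module. The table, the index name idx_element_type, and the generated rows are hypothetical, and the exact wording of the plan differs between SQLite versions, so the sketch only inspects it for keywords.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE element (id INTEGER PRIMARY KEY, type TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO element (type, name) VALUES (?, ?)",
    [("Wall" if i % 2 else "Door", f"element-{i}") for i in range(1000)],
)

QUERY = "SELECT name FROM element WHERE type = ?"


def plan(connection):
    """Return the textual query plan; the last column holds the description."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + QUERY, ("Door",)).fetchall()
    return " ".join(row[-1] for row in rows)


# Without an index, the whole table has to be scanned.
plan_before = plan(conn)

# With an index on the filtered column, only matching entries are visited.
conn.execute("CREATE INDEX idx_element_type ON element (type)")
plan_after = plan(conn)

door_count = len(conn.execute(QUERY, ("Door",)).fetchall())
```

The query returns the same rows in both cases; only the access path changes. The index is also an extra structure that must be updated on every insert, which reflects the maintenance cost mentioned above.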
Scaling: The scalability of a database is an important aspect to
consider, as the size of the database can potentially exceed the capacity
of a single machine. Scalability can be approached either vertically or
horizontally. Vertical scaling, also known as scaling up, entails acquiring
a new machine with greater processing power or upgrading an existing
machine by equipping it with hardware that has greater processing
power. This approach to scaling can prove to be costly and is constrained
by the present advancements in computer processor technologies. The
other alternative to scale is horizontal scaling or scaling out, which
entails adding more machines to a system so it can accommodate a
higher workload. Sharding or partitioning a database is a common way to
scale horizontally. It involves splitting a database into unique pieces
known as shards, which are subsequently distributed across multiple
machines [6]. One of the primary driving forces behind the develop-
ment of non-relational database systems was the need to address the
scalability challenges of relational database systems [15]. Consequently,
the aggregation-based non-relational data models are more suited for
horizontal scaling, given that they inherently possess compartmental-
ised data units that can naturally be distributed across multiple ma-
chines [8]. In contrast, sharding is significantly harder in graph
databases due to their relationship-oriented nature, which typically re-
sults in a lack of boundaries within the database that can be used for
partitioning [11]. Nonetheless, the implementation of a Uniform
Resource Identifier (URI) in certain graph database systems, such as RDF
triple stores, can alleviate some of the challenges associated with
sharding. By relying on URIs to identify entities and relationships, it
becomes straightforward to split data across distributed servers that
support the HTTP protocol.
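Horizontal partitioning of aggregate-style records can be sketched with a simple hash-based shard assignment. The function shard_for, the shard count, and the record keys are hypothetical, and real systems often use more elaborate schemes, such as consistent hashing or range partitioning.

```python
import hashlib

SHARD_COUNT = 3


def shard_for(key, shard_count):
    """Map a record key to a shard number in a stable, repeatable way."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


# Self-contained records can be placed on any machine independently.
records = {
    "space:r1": {"name": "Office 0.01", "area_m2": 24.5},
    "space:r2": {"name": "Corridor 0", "area_m2": 12.0},
    "space:r3": {"name": "Storage", "area_m2": 6.0},
    "space:r4": {"name": "Lobby", "area_m2": 40.0},
}

# Each dictionary stands in for a separate machine.
shards = {number: {} for number in range(SHARD_COUNT)}
for key, value in records.items():
    shards[shard_for(key, SHARD_COUNT)][key] = value


def lookup(key):
    """A read only needs to contact the single shard that owns the key."""
    return shards[shard_for(key, SHARD_COUNT)].get(key)


lobby = lookup("space:r4")
```

This works because each record is a self-contained unit. If the records were nodes connected by relationships, a traversal could repeatedly cross shard boundaries, which illustrates why partitioning is harder for graph databases.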
Transactions: A transaction refers to a sequence of operations per-
formed on a database, which can either be committed to the database or
rolled back (cancelled) [7]. When a transaction is committed, the
resulting changes persist in the database, while rolling back a trans-
action leads to its cancellation and removal of any associated changes
[14]. ACID and BASE are two common transaction models employed in
different database systems. ACID transaction models prioritise consistency of
the data at any given time, often at the expense of data availability.
Relational DBMSs often use the ACID transaction model, which stands for
Atomicity, Consistency, Isolation and Durability. An ACID transaction
ensures that either all or none of the operations within the transaction are
executed (atomicity), the database is consistent at both the beginning
and end of the transaction (consistency), the transaction is treated as the
sole operation being executed on the database (isolation) and the
changes made to the database by a successful transaction persist in the
database (durability) [16]. In contrast, the BASE transaction model,
which stands for Basically Available, Soft State, and Eventually Consistent,
prioritises the data availability at any given time (basically available)
even though it may not always be consistent (soft state). While a BASE-
compliant database does not ensure immediate consistency, it will,
however, eventually achieve it (eventually consistent). As a result, to
achieve eventual consistency, the database state could change over time
without any input action. This model is widely adopted by most NoSQL
systems. However, it is worth noting that some NoSQL database systems,
such as certain graph databases, offer ACID compliance.
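The commit and rollback behaviour of an ACID transaction can be sketched with Python's built-in sqlite3 module, whose connection object commits on success and rolls back on an exception when used as a context manager. The space table and the add_spaces helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE space (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.commit()


def add_spaces(connection, names):
    """Insert all names as one transaction: either every row or none."""
    with connection:  # commits on success, rolls back if an exception escapes
        for name in names:
            connection.execute("INSERT INTO space (name) VALUES (?)", (name,))


# First transaction: both inserts succeed and are committed together.
add_spaces(conn, ["Office 1.01", "Corridor 1"])

# Second transaction: the second insert violates NOT NULL, so the first
# insert of this batch is rolled back as well (atomicity).
try:
    add_spaces(conn, ["Office 1.02", None])
    second_batch_failed = False
except sqlite3.IntegrityError:
    second_batch_failed = True

stored = [row[0] for row in conn.execute("SELECT name FROM space ORDER BY name")]
```

After the failed batch, the table still contains only the rows of the first transaction, so readers never observe a partially applied change.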
In summary, relational and non-relational data persistence methods
that are used by different DBMSs possess distinct features and charac-
teristics. Consequently, they exhibit different behavioural patterns
under different circumstances. As a result, researchers have delved into
evaluating how these systems perform under diverse circumstances. The
upcoming section offers a concise overview of prior research that has
evaluated the efficacy of relational and non-relational data storage
systems. Afterwards, it outlines a crucial research gap that this study
seeks to fill.
2.2. Previous comparative studies
Comparative studies of the numerous data persistence methods assist
users in selecting a suitable solution that meets their specific needs.
Consequently, multiple researchers have conducted comparative as-
sessments with the aim of highlighting the relative advantages and
limitations of different data persistence models. A subset of these studies
compared the relational model with non-relational models, while others
concentrated on comparing different non-relational models with each
other. The studies utilise a range of evaluation metrics, encompassing
write, update, and read performance, scalability, and the ability to
manage the complexity of queries. Based on the particular study, the
term “complex query” can refer to queries that either involve traversing
relationships, aggregating data from multiple data sets, or performing
mathematical computations. The following discussion focuses on studies
that conducted the comparison through experiments.
The majority of comparative studies found in the literature primarily
focus on comparing relational databases with document-based data-
bases. A recent example of such a study is the research conducted by
Antas et al. [17], in which Microsoft SQL Server is compared with
MongoDB, a document-based DBMS and Cassandra, a column-oriented
DBMS. The evaluation results demonstrate that in scenarios involving
structured and interrelated data, SQL performs better than the non-
relational alternatives considered in the study. However, in cases
where the data is unstructured and does not necessitate complex operations,
such as joins, the two non-relational database systems
outperformed the relational database.
In addition to the complexity of the query, the size of the database
and the volume of data retrieved by a query can also impact the per-
formance of database systems. Studies have shown that an increase in
the database size has more adverse effects on relational database
systems compared to document-based database systems [18]. Further-
more, as demonstrated in a study conducted by Matallah et al. [19],
SQL systems perform better than document-based databases in small to
medium-scale read operations, but their performance declines as the size
of retrieved data increases. The same study also evaluated the writing
performance of the two database models and noted that the checks and
verifications associated with ACID-based transaction modelling caused
the SQL system to be less time-efficient when writing compared to the
BASE-compliant document-based database. These research outcomes
support the findings of a prior study that compared MySQL with
CouchDB (a document-based DBMS) and concluded that document-
based databases exhibit superior performance to relational databases
when executing write operations, particularly those involving large volumes of
data [20]. This study further reinforces the assertion that document
databases outperform relational databases when reading a large amount
of data, while the performance advantage shifts to the relational data-
base as the complexity of the query increases and more relationships are
involved.
While the document-based database dominates the comparative
studies, a few studies have considered alternative non-relational data
models. For instance, the studies conducted by Abramova et al. [21]
and Antas et al. [17] compared relational databases with column-based
database systems. Both studies have demonstrated that column-based
databases offer easier scalability and faster execution of CRUD
queries. However, the data models have limited functionality compared
to the relational database when it comes to managing more complex
queries, such as joins, which are necessary for managing interrelated
data. Additionally, in rare instances, graph databases have also been
compared to relational databases. A study conducted by Kotiranta et al.
[22] has made the comparison based on the performance of read
queries. In this study, the graph database performed slightly better than
the relational database for simple queries, but it underperformed in
more complex queries. It is worth noting that, in the context of this
study, query complexity refers to operations that involve data aggre-
gation and mathematical computations.
In the context of comparing the non-relational database systems to
one another, the studies mostly focus on comparing document-based
database systems with column-based database systems, occasionally
including key-value store data models. In the assessments that included
key-value stores, that data model emerged as the most optimised for
executing read and write operations [23]. Moreover, when the studies
only include document-based and column-based data models, the
document-based model had a slightly better performance than the
column-based model when executing simple read queries and a signifi-
cantly inferior performance when executing read queries that include
relationships [17,24]. The column-based model also has better data
writing performance than the document-based model, which is attrib-
uted to the fact that the column-based database used in the study did not
require a large amount of memory to run the data insertion [24].
Besides comparing different data models, some studies delve into
analysing various implementations of the same data model across
different environments. Despite a shared underlying data model, distinct
implementations can result in notable differences in performance and
suitability for specific tasks. For instance, De Witte et al. [25] conducted
a comparison of four RDF storage solutions: Blazegraph, Enterprise Store
I, Enterprise Store II, and Virtuoso. Their study aimed to determine
which storage solution offers optimal performance across various data
set sizes by conducting stress tests. Another example is a performance
comparison conducted by Pauwels et al. [26], which focused on rule-
checking procedures that are employed in the construction industry.
This study analysed three distinct rule-checking procedures that utilise
RDF graphs. The study offers both a qualitative assessment of the
essential characteristics of the solutions under consideration and a
quantitative analysis grounded in task execution times.
In summary, previous research comparing non-relational database
systems with relational database systems has predominantly used
document-based data models to represent the non-relational side. In
contrast, the inclusion of graph-based database systems in such studies
has been extremely limited. Studies also often omit graph database
systems even when comparing non-relational databases with one
another. Nevertheless, including graph-based data persistence methods
in comparative studies is an important undertaking due to their unique
characteristics, one of which is their emphasis on relationships, which
sets them apart from other types of non-relational database systems. These other
systems have consistently been outperformed by relational database
systems in prior studies when managing complex relationships in data
sets. This may not hold true when comparing graph-based databases
with relational databases since both systems are capable of handling
complex relationships. Furthermore, a crucial aspect to consider when
evaluating database systems is the intended business use case. Each
database system possesses unique strengths and weaknesses that render
it suitable for specific domains and unsuitable for others due to the
potential variability in the nature of data sets encountered across
different domains. Prior database comparative studies within the
context of managing building and environmental data are lacking.
Therefore, the main objective of this study is to investigate and provide
new insights into the comparative advantages and disadvantages of
graph-based database systems and relational database systems in the
management of building and environmental data sets. In the pursuit of
delivering an objective analysis within this comparative study, it is
acknowledged that inherent challenges exist which may impact the
fairness and direct comparability of distinct database systems. Factors
including the specific implementations of data models by DBMSs and the
different design choices undertaken by the practitioner have the po-
tential to influence the outcomes. Readers are advised to take these
challenges into account when assessing the findings. A more compre-
hensive discussion of these limitations will be provided in the limitations
section of this paper. Despite these challenges, measures have been
taken throughout this study to maintain as fair a comparison as possible
between the database systems being examined.
3. Research method
3.1. Evaluation metrics and configuration
The comparative analysis presented in this research assesses the
management of interrelated building and environmental data by rela-
tional database systems and graph-based database systems. The type of
graph system selected for the study is the labelled property graph (LPG).
In property graphs, nodes are labelled and interconnected through re-
lationships, and both nodes and relationships can possess properties
[4]. Property graphs offer a simple, straightforward, and compact data
representation that closely resembles real-world relationships. They
allow for richly detailed modelling of relationships where properties can
be assigned directly to relationships, making them a suitable candidate
for a study focused on managing relationships in data sets. Furthermore,
these graph types allow data to be organised without predefining a
schema and hence offer the flexibility to be adapted to changing re-
quirements. These inherent characteristics of labelled property graphs
are the reasons why they are utilised to store interrelated data in this
study.
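The structure of a labelled property graph described above can be illustrated with a minimal in-memory sketch. It is not how Neo4j stores data internally; the class names, the `NEARBY` relationship type and the `distance_m` property are hypothetical and chosen only for illustration, while the building identifiers and heights are taken from the Tokyo sample shown later in Tables 1 and 2.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass
class Node:
    node_id: str
    labels: set[str]
    properties: dict[str, Any] = field(default_factory=dict)


@dataclass
class Relationship:
    rel_type: str
    start: str  # node_id of the start node
    end: str    # node_id of the end node
    properties: dict[str, Any] = field(default_factory=dict)


class PropertyGraph:
    """Toy labelled property graph: no schema is declared up front."""

    def __init__(self) -> None:
        self.nodes: dict[str, Node] = {}
        self.relationships: list[Relationship] = []

    def add_node(self, node_id: str, labels: set[str], **properties: Any) -> Node:
        node = Node(node_id, set(labels), dict(properties))
        self.nodes[node_id] = node
        return node

    def relate(self, start: str, rel_type: str, end: str, **properties: Any) -> Relationship:
        if start not in self.nodes or end not in self.nodes:
            raise KeyError("both end points must exist before they are related")
        rel = Relationship(rel_type, start, end, dict(properties))
        self.relationships.append(rel)
        return rel

    def neighbours(self, node_id: str, rel_type: str) -> list[Node]:
        """Follow relationships of one type in either direction."""
        found = []
        for rel in self.relationships:
            if rel.rel_type != rel_type:
                continue
            if rel.start == node_id:
                found.append(self.nodes[rel.end])
            elif rel.end == node_id:
                found.append(self.nodes[rel.start])
        return found


graph = PropertyGraph()
graph.add_node("13120-bldg-89402", {"Building"}, height=6.9, survey_year=2016)
# The second node omits survey_year: nothing forces nodes to share one set of fields.
graph.add_node("13120-bldg-90061", {"Building"}, height=7.4)
# distance_m is an illustrative property stored directly on the relationship.
graph.relate("13120-bldg-89402", "NEARBY", "13120-bldg-90061", distance_m=12.5)
```

The sketch shows the two characteristics the paragraph relies on: properties can sit on relationships as well as on nodes, and two nodes with the same label need not carry the same properties.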
A comprehensive comparison of database persistence systems entails
a multifaceted analysis involving numerous evaluation metrics and
factors. An exhaustive exploration of every conceivable metric and its
implications is a task that would surpass the practical limitations of a
single article. Consequently, this study narrows its focus to a qualitative
assessment of the database design process and a quantitative assessment
based on query execution time. Execution time is a critical parameter
that is universally relevant across a broad spectrum of use cases, making
it a pragmatic choice for this study. This focus is particularly relevant
from the standpoint of end users in engineering domains, where time-
efficient data retrieval is paramount for user satisfaction. By
concentrating on this aspect, the study aims to shed light on a funda-
mental component of database performance, providing valuable insights
that are broadly applicable yet manageable within the scope of this
research.
The overall evaluation is systematically divided into two distinct
sections.
• A qualitative evaluation of the design and maintenance of the
database systems. In this assessment section, a comparison is made
regarding the data structuring capabilities offered by the two data-
base models when managing building and environmental data sets.
Additionally, the effort and cost involved in populating, maintaining,
manipulating and updating the database systems are examined.
• A quantitative evaluation of the database systems’ performance
in retrieving data. This evaluation section involves conducting
experiments to evaluate the cost and efficiency of retrieving data from
the databases. The key parameters emphasised in this performance
comparison are the number of relationship levels traversed by the
queries and the size of the data sets, the latter being used to assess
scalability. Various combinations of these parameters are used in the
experiments to account for different potential use cases.
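To make the notion of relationship levels concrete, the following sketch builds a relational query whose number of self-joins grows with the traversal depth. It is an illustration only and does not reproduce the queries used in the experiments: SQLite stands in for MySQL so that the example is self-contained, and the table name `nearby`, its column names and the function names are assumptions.

```python
import sqlite3


def k_hop_query(depth: int) -> str:
    """Return SQL that finds entities reached by walks of exactly `depth` hops.

    Each additional level of relationship adds one self-join on the
    association table. Walks may revisit entities, including the start.
    """
    if depth < 1:
        raise ValueError("depth must be at least 1")
    joins = [
        f"JOIN nearby AS n{i} ON n{i}.building_id_1 = n{i - 1}.building_id_2"
        for i in range(2, depth + 1)
    ]
    return (
        f"SELECT DISTINCT n{depth}.building_id_2 FROM nearby AS n1 "
        + " ".join(joins)
        + " WHERE n1.building_id_1 = ?"
    )


def reachable(conn: sqlite3.Connection, start: str, depth: int) -> list[str]:
    rows = conn.execute(k_hop_query(depth), (start,)).fetchall()
    return sorted(row[0] for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nearby (building_id_1 TEXT, building_id_2 TEXT)")
# A small chain A - B - C - D, stored in both directions.
chain = [("A", "B"), ("B", "C"), ("C", "D")]
conn.executemany(
    "INSERT INTO nearby VALUES (?, ?)",
    chain + [(b, a) for a, b in chain],
)

one_hop = reachable(conn, "A", 1)
two_hops = reachable(conn, "A", 2)
```

Varying `depth` while holding the data set fixed, and varying the data set while holding `depth` fixed, corresponds to the two parameters named in the second bullet.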
The research utilised MySQL as a relational DBMS and Neo4j as a
graph DBMS, with both systems occupying prominent positions in their
respective categories [27]. Neo4j adopts a property graph as its un-
derlying technology, and it has been in the market for a longer time than
most other graph database systems, suggesting potential advantages in
terms of stability and maturity. Similarly, MySQL, with its long presence
in the market, is expected to offer similar advantages. Furthermore, both
systems are supported by an active community and possess extensive
documentation and other supporting resources crucial for effective uti-
lisation. SQL is used to interact with MySQL, while Cypher, a graph query
language, is used to communicate with Neo4j. It should be noted that the
evaluation does not focus on the DBMS themselves but rather on the
underlying data model they employ to persist data. However, it is
imperative to recognise that the choice of DBMS to implement a
particular data model can have some level of influence on performance.
This is due to differences in optimisation features and underlying
architecture between DBMSs. Such variations can result in disparate
performance outcomes, even if a similar data model is implemented.
Therefore, selecting DBMSs that are different from the ones utilised in
this study may affect the outcomes of the study to some degree.
All the tests in this study are performed on a single computer
equipped with an Intel(R) Core(TM) i7-10750H CPU @ 2.60 GHz 2.59
GHz processor and 32.0 GB RAM. Measures were taken to maintain
consistent system configurations throughout the tests and to ensure that
both DBMSs had similar access to the system’s memory and processing
capabilities. These measures entail allocating an equal amount of
memory to both DBMSs and ensuring that the DBMS under evaluation is
the sole application running on the system when it is being tested.
Furthermore, performance tests were conducted several times, and
average values were obtained from the last ten tests for the comparisons.
Indexes were utilised throughout the database management to ensure
the efficient operation of both database systems. Despite our efforts to
ensure a comparable configuration for both DBMSs in the interest of
fairness, it is important to acknowledge that we cannot assert the sys-
tems have completely identical configurations. This is due to the
inherent differences between the two DBMSs, as they are two entirely
distinct software systems, each with its own unique settings and con-
figurations. Therefore, while we strive for fairness in our comparative
analysis, some variances inherent to each DBMS’s design and architec-
ture may influence the outcomes, underscoring the complexity of con-
ducting a perfectly balanced comparison between such diverse
technologies.
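The measurement procedure described above, in which a test is repeated and the last ten runs are averaged, could be sketched as follows. The function name and the total number of runs are assumptions; the text states only that tests were run several times and that the last ten were averaged.

```python
import statistics
import time
from typing import Callable


def mean_of_last_runs(
    run_query: Callable[[], object],
    total_runs: int = 15,   # assumed value; the paper says only "several times"
    keep_last: int = 10,    # the paper averages the last ten tests
) -> float:
    """Run `run_query` repeatedly and return the mean wall-clock time,
    in seconds, of the final `keep_last` executions."""
    if keep_last < 1 or keep_last > total_runs:
        raise ValueError("keep_last must be between 1 and total_runs")
    timings = []
    for _ in range(total_runs):
        start = time.perf_counter()
        run_query()
        timings.append(time.perf_counter() - start)
    # Earlier runs are discarded so that warm-up effects (for example,
    # cold caches) weigh less on the reported average.
    return statistics.mean(timings[-keep_last:])
```

In use, `run_query` would wrap a single query execution against the DBMS under test, for example a closure that submits one SQL or Cypher statement and fetches all results.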
3.2. Data sets
This section delves into the four real-world data sets that are used for
the comparative analysis. Two building data sets and two city data sets
are acquired for this study. In both instances, one data set is
of a small scale, while the other is comparatively larger. This is done
intentionally in order to observe how the scalability of the target
database systems is affected by varying data set sizes. Consequently, in
the context of the city data, one small and one large city data model
were sought. The publicly available model of the city of
Espoo, which contains around 60,000 buildings, is selected as the small
city data set. Then, for a large city data set, after a thorough
comparison of available options, the city model of Tokyo, which is
considered among the largest cities in the world, housing more than 1.7 million
buildings, is selected. This data set is significantly larger in scale
compared to the Espoo data set, and it is believed it can highlight the
impact of data set size on the performance and scalability of the target
databases. Both city data sets are publicly available [28,46]. Similar
considerations were applied to the selection of building data sets, where
a small building model is paired with another model with a compara-
tively greater scale. Both building data sets are from real-world projects
and were shared privately using IFC models for use in this research.
Hence, sharing these data sets will require explicit permission from the
project/data owners. More details about all of the data sets are given in
the subsequent section. It is recognised that these data sets might not be
as large in scale as data sets found in other domains. However, they are
real-world data sets that reflect the realities of the target domain.
The building data sets are derived from building models stored in the
Industry Foundation Classes (IFC) standard. IFC is an open international
standard developed by buildingSMART International with the
objective of facilitating data exchange between Building Information
Model (BIM) software tools [45]. Currently, over 400 BIM applications
support the IFC standard, thereby enabling practitioners to exchange
and share valuable building data throughout the design, construction
and operation of buildings [29]. The standard is capable of representing
data about physical components, spatial elements, project structures,
involved actors and analytical items [30].
The city data that is used in this study is obtained from city models
published in accordance with the City Geography Markup Language
(CityGML) data standard. CityGML is an internationally recognized data
standard approved by the Open Geospatial Consortium (OGC) for the
storage and exchange of 3D city and landscape models [47]. It has the
capability to represent both human-made structures (such as buildings,
tunnels, bridges, roads, and railways) and natural features (including
terrains, vegetation, and water bodies) within an urban environment.
Presently, over 50 cities and regions spanning 18 countries globally have
made their urban data publicly accessible through the CityGML stan-
dard, thus providing invaluable environmental data to researchers and
practitioners [31]. In addition to CityGML, this study employed the
crowdsourced geographic database OpenStreetMap (OSM) as an addi-
tional source of environmental data [32]. This service offers a collab-
orative platform featuring a freely editable map database that covers the
entire world [33]. OSM represents a wide range of physical features,
including buildings, natural features, amenities, commercial establish-
ments, transportation infrastructures, energy distribution in-
frastructures, water distribution infrastructures and various other
categories. The rich and varied geodata it offers have been applied in
various fields, including disaster management, routing and navigation
services, tourism, leisure, and research [34].
The structure of IFC and CityGML data standards is used in this study
to design database systems. It is recognised that the principal objective
of these open standards is to facilitate the seamless exchange of data
among diverse software applications [45,47]. In typical situations,
software systems possess proprietary internal data models customized to
their particular functionalities and requirements. They utilise these in-
ternal models to create, structure and persist data in database systems.
However, in the context of this study, the input data is already defined
and structured using these open data standards. As a result, the design of
database systems in this study followed the structure provided by these
data standards rather than developing an entirely new data model. This
approach enabled the precise storage of input data while minimising the
need for restructuring processes that pose the risk of data loss or
distortion. Following the extraction of the necessary building and
environmental data from the aforementioned data sources, two building
and two environmental data sets are composed. Each of these data sets is
subsequently stored in both a relational database and a graph-based
database for the evaluation. The subsequent sections provide a
description of each of these data sets.
3.2.1. Small building data set
The smaller of the two building data sets is obtained from the Torre
Turina building model, which is a residential building belonging to the
Cuatro de Marzo pilot case district in Valladolid, Spain [35]. The
building consists of 12 floors and 284 rooms. The building model is
acquired in IFC format. For the purpose of the study, representations of
3,528 physical building elements, 700 element types, 348 spaces and
108,522 property items are extracted from the IFC file. Furthermore,
44,093 relationships, which include relationships between building el-
ements, between building elements and spaces, and between building
elements and their type, as well as their properties, are extracted from
the building model.
3.2.2. Large building data set
The second building data set used in the study belongs to a larger
office complex situated in Espoo, Finland, whose model files were pro-
vided by Trimble Inc. The data set is made up of multiple IFC files that
correspond to architectural, structural, electromechanical, sanitary and
HVAC models. From these files, a total of 371,693 building elements,
1,047 spaces, 14,794 object types and 6,323,802 property items are
extracted. Moreover, the data set includes 1,572,303 relationships that
consist of relationships between building elements, between building
elements and spaces, and between building elements and their type, as
well as their properties.
3.2.3. Small city data set
The Espoo 3D city model is used in this study as a source of a small
city data set. The data set is openly shared by the city of Espoo, and it can
be retrieved from the source listed in the reference section of this paper
[28]. The model is shared in a format that adheres to the CityGML 2.0
standard and includes various objects such as buildings, vegetation,
water bodies, roads, city furniture, land use and other objects that are
found within the city. Most of these city items share a common set of
attributes such as id, class, usage, function, and GPS coordinates. At the
same time, most objects also have attributes that are specific to their
category. However, there are some building attributes as well as city
features (such as fire hydrants) that are not available in the city model.
To address this, the dataset is later enriched by gathering additional data
from OpenStreetMap and integrating it with the existing data from the
city model. The final data set consists of 63,042 buildings, 130,341
vegetation items, 12,552 land use features, 5,325 city amenities and 767
fire hydrants. Furthermore, some additional information is generated
through computations. This involves identifying buildings in proximity
to each other and determining which fire hydrants are located near
buildings. The coordinate points of the objects are used for the computation. The computation
yielded 981,742 building-to-building proximity relationships and
35,051 hydrant-to-building proximity relationships.
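A proximity computation of this kind can be sketched as below. The paper does not state the distance measure or the threshold that was applied, so the haversine distance, the 50 m threshold and the sample coordinates are all assumptions made for illustration.

```python
import math
from itertools import combinations

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (
        math.sin(d_phi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    )
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def proximity_pairs(
    buildings: dict[str, tuple[float, float]], max_distance_m: float
) -> list[tuple[str, str]]:
    """Return each unordered pair of buildings lying within the threshold."""
    pairs = []
    for (id_a, pos_a), (id_b, pos_b) in combinations(sorted(buildings.items()), 2):
        if haversine_m(*pos_a, *pos_b) <= max_distance_m:
            pairs.append((id_a, id_b))
    return pairs


# Hypothetical coordinates: b1 and b2 are about 11 m apart, b3 is about 1.1 km away.
sample = {
    "b1": (60.2000, 24.6500),
    "b2": (60.2001, 24.6500),
    "b3": (60.2100, 24.6500),
}
pairs = proximity_pairs(sample, max_distance_m=50)
```

The pairwise loop compares every building with every other building, which is adequate for a small sample but would likely need a spatial index or grid partitioning for data sets of the sizes reported here.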
3.2.4. Large city data set
The Tokyo city model is used as a source of a larger city data set in
the comparative study. This publicly available model encompasses all 23
wards of Tokyo and spans 627.57 square kilometres. The data set can be
obtained from a public repository that is given in the reference section of
this paper [46]. It is shared in CityGML 2.0 format and includes
1,768,233 buildings along with their attributes. These attributes
encompass various aspects such as geometry (height and roof area),
position (latitude and longitude), address (town, district, zone, and
prefecture) and flooding risk. Using an approach similar to the one employed on the
Espoo city data, the coordinates of buildings in the Tokyo city model are
utilised to identify proximity relationships between buildings, resulting
in a total of 41,871,173 relationships between adjacent buildings.
3.3. Evaluation design
3.3.1. Designing and maintaining the database systems
Upon the acquisition and processing of the test data sets, the database
design ensued. The process of designing a database system has
inherently subjective elements. The designer’s expertise and
preferences play some role in design choices. However, objective
principles and best practices are followed in this study to arrive at a well-
designed database. One of the principles is avoiding storing redundant
data since it can inflate database size and lead to inconsistencies [36].
Another best practice implemented in this study is ensuring all entries
are atomic or indivisible. This implies, for instance, storing each part of a
building address (such as street name, building number and postal code)
separately instead of saving the full address as a single entry. This en-
sures individual entities can be retrieved and utilised. Another guiding
principle is ensuring the completeness of data stored in both target
database systems. This, in addition to being a good design principle,
ensures the database comparison is fair since the same volume of data
will be stored in the target database systems.
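The atomicity principle can be illustrated with a short sketch that splits a single-line address into separately stored parts. The exact layout of the address line in the source model is not given in the text, so the pattern below assumes a street name, a house number and a five-digit postal code in that order; the function and class names are hypothetical.

```python
import re
from typing import NamedTuple


class Address(NamedTuple):
    street: str
    house_number: str
    postal_code: str  # kept as text so that a leading zero is preserved


# Assumed layout: "<street> <house number>[,] <5-digit postal code>"
ADDRESS_PATTERN = re.compile(
    r"^(?P<street>.+?)\s+(?P<number>\d+\w*)\s*,?\s*(?P<postal>\d{5})$"
)


def split_address(line: str) -> Address:
    """Split one address line into atomic fields, or fail loudly."""
    match = ADDRESS_PATTERN.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognised address layout: {line!r}")
    return Address(match["street"], match["number"], match["postal"])


parts = split_address("Meripoiju 4, 02320")
```

Once the parts are stored separately, a query can filter on the postal code or the street alone without parsing text at query time.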
In addition to the general principles discussed in the preceding
paragraph, additional design strategies tailored for each database sys-
tem are implemented. For relational databases, the initial step involves
defining schemas that delineate the structure of the database. The test
data sets contain numerous entities, each possessing multiple attributes
and interrelations with one another. These concepts steered the design of
the database schemas, guiding the creation of tables and columns and
the definition of various constraints related to data types and relation-
ships. For the transformation of building data from IFC models, the IFC
entities, such as IfcBuildingElement, IfcBuildingStorey and IfcPropertySet,
are turned into entity tables. Similarly, for the city data extracted from
CityGML, entity classes such as AbstractBuilding, CityFurniture and
SolitaryVegetationObject are transformed into separate tables. In both
cases, attributes of the entity classes are added to the relational tables as
columns. The unique keys of each record, obtainable from the original
data sources, serve as primary keys in the entity tables.
Furthermore, in addition to entity tables, association tables are
created to store relationships between entities. In the context of IFC
data, relationship-oriented classes such as IfcRelAggregates,
IfcRelConnectsElements and IfcRelDefinesByType are transformed into
association tables, with their attributes stored as columns. Similarly, for
the city dataset, an association table is established to store the proximity
relationship between buildings, which is calculated using the location of
buildings provided by the CityGML files. These association tables utilise
foreign keys that refer to the primary keys of records in the entity table
to establish relationships. Inverse relationships are explicitly defined in
the table to ensure relationships between entities can be traversed in
both directions. Finally, to enhance query efficiency, all columns that
are going to be used to filter tables are indexed.
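The relational design steps above (entity table, association table with foreign keys, explicit inverse rows and an index on the filter column) can be sketched as follows. This is a simplified illustration, not the schema used in the study: SQLite replaces MySQL so that the example runs without a server, and the table, column and function names are assumptions. The sample values are taken from Tables 1 and 2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript(
    """
    CREATE TABLE building (
        id          TEXT PRIMARY KEY,   -- unique key taken from the source model
        height      REAL,
        survey_year INTEGER
    );
    CREATE TABLE nearby (
        id            INTEGER PRIMARY KEY,
        building_id_1 TEXT NOT NULL REFERENCES building(id),
        building_id_2 TEXT NOT NULL REFERENCES building(id)
    );
    -- Index the column used to filter the association table.
    CREATE INDEX idx_nearby_building_1 ON nearby (building_id_1);
    """
)
conn.executemany(
    "INSERT INTO building (id, height, survey_year) VALUES (?, ?, ?)",
    [("13120-bldg-89402", 6.9, 2016), ("13120-bldg-90061", 7.4, 2016)],
)


def add_proximity(conn: sqlite3.Connection, id_a: str, id_b: str) -> None:
    """Store the relationship and its inverse so it can be read from either side."""
    conn.executemany(
        "INSERT INTO nearby (building_id_1, building_id_2) VALUES (?, ?)",
        [(id_a, id_b), (id_b, id_a)],
    )


def neighbours(conn: sqlite3.Connection, building_id: str) -> list[str]:
    rows = conn.execute(
        "SELECT building_id_2 FROM nearby WHERE building_id_1 = ? "
        "ORDER BY building_id_2",
        (building_id,),
    )
    return [row[0] for row in rows]


add_proximity(conn, "13120-bldg-89402", "13120-bldg-90061")
```

Because both directions are stored, `neighbours` needs to filter on only one column, at the cost of two rows per relationship.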
The design of the labelled property graph databases is also guided by
the entities, attributes, and relationships present in the test datasets.
These elements informed the formulation of nodes, labels, properties, and
edges (representing relationships) within the graph databases. The
design adheres to the official data modelling guidelines provided by
Neo4j to ensure a comparable design with its relational database
counterpart [37]. Nodes are used to represent entities, equivalent to the
rows of the entity tables in the relational database. While entities are
typically organized across multiple tables in relational databases, the
nodes are similarly organised using different labels. Moreover, entity
attributes, which are represented by column fields in the relational
database, are represented using node attributes in the graph database.
Unlike relational databases, nodes within graph databases are not
required to possess an identical set of fields, allowing for the inclusion of
only relevant node attributes. Relationships within the graph database
are depicted using named edges (arrows), which can be assigned prop-
erties. These edges serve as equivalents to the association tables and
foreign key columns utilised in the relational databases. All attribute
column fields present in association tables are mirrored as properties
within the corresponding relationship edges in the graph database.
These edges can be assigned directions and traversed bidirectionally,
thus eliminating the need to define inverse relationships explicitly.
Finally, the node and edge properties that will be used for querying are
indexed to ensure efficient querying.
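For comparison with the relational design, the graph-side steps can be expressed as a handful of Cypher statements, shown here as Python strings together with a small helper. These are illustrative statements, not the ones used in the study; the `Building` label, the `NEARBY` relationship type, the property names and the helper name are assumptions. The helper expects a session object that exposes `run(query, **parameters)`, as sessions of the Neo4j Python driver do, so no database connection is opened by the sketch itself.

```python
# Index the property that is used to look nodes up.
CREATE_INDEX = "CREATE INDEX building_id IF NOT EXISTS FOR (b:Building) ON (b.id)"

# Create (or reuse) a labelled node and attach whatever properties are available.
MERGE_BUILDING = "MERGE (b:Building {id: $id}) SET b += $properties"

# A single directed edge is stored for each proximity relationship.
MERGE_NEARBY = (
    "MATCH (a:Building {id: $id_1}), (b:Building {id: $id_2}) "
    "MERGE (a)-[:NEARBY]->(b)"
)

# Omitting the arrow head in the pattern matches the edge in either direction,
# so no inverse edge has to be stored.
FIND_NEARBY = (
    "MATCH (a:Building {id: $id})-[:NEARBY]-(b:Building) "
    "RETURN b.id AS neighbour_id"
)


def nearby_ids(session, building_id: str) -> list[str]:
    """Return the ids of buildings connected to `building_id` by a NEARBY edge."""
    result = session.run(FIND_NEARBY, id=building_id)
    return [record["neighbour_id"] for record in result]
```

With the Neo4j Python driver, `session` would typically come from `driver.session()` on a driver created with `GraphDatabase.driver(...)`; the connection details depend on the local installation.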
The preceding paragraphs have provided an overview of the general
principles as well as the database-specific design methodologies adopted
in this study. These principles allowed the design of efficient yet com-
parable relational and graph database systems that are tailored to the
requirements of the research. The upcoming section will provide a more
in-depth explanation of the steps taken to create each individual data-
base system for each of the four data sets used in the study. Samples
extracted from both the relational and graph-based databases will be
included in the discussion.
3.3.1.1. Managing city data sets. The Tokyo city data set includes a long
and structured list of buildings that share a common set of attributes
such as height and roof area, as well as data about regional organisation.
Consequently, in the relational database, a single table is created that
lists all the buildings along with their respective attributes as columns.
Concurrently, nodes representing each building in the city are created in
the graph database, where each node is assigned a set of properties that
store building attributes. A sample extract from the relational database
storing the Tokyo city data is presented in Table 1. Similarly, an extract
from the graph database storing similar data is presented in Fig. 1.
In the case of the Espoo city data set, the city model from CityGML
encompasses not only buildings but also other environmental entities
such as vegetation, city amenities, and various land use features. While
the entities within the same group share similar attributes, entities
belonging to various categories possess some distinct attributes. Thus, in
the relational database, separate tables are created for each environ-
mental entity group, with columns representing their attributes. Build-
ings, for instance, possess multiple attributes, including address,
occupancy type, height, and location in terms of latitude and longitude.
The building address is given in the model in a single line that includes
the street name, building number and postal code. To ensure atomicity,
this attribute is split into three fields, each representing one part
of the address. Meanwhile, in the graph database, nodes are created for
all environmental entities and assigned labels that refer to their type,
such as building or vegetation. Samples extracted from the building data
table are presented in Table 3 with a few selected columns. Similarly, a
comparable extract from the graph database is given in Fig. 2. In both
the Tokyo and Espoo test cases, a schema is carefully designed for the
relational databases, determining the precise structure of the tables and
imposing constraints on the records eligible for insertion into these
tables. Conversely, in the case of graph-based databases, the data sets are
introduced immediately following the creation of the databases without
defining a schema.

Table 1
Sample extract from the building table in the larger (Tokyo) city SQL database.

| Id (Primary key) | Height | Building area | Districts and zone | Survey year |
| --- | --- | --- | --- | --- |
| 13120-bldg-89402 | 6.9 | 53.66159 | 3 | 2016 |
| 13120-bldg-90754 | 9.5 | 133.90931 | 3 | 2016 |
| 13120-bldg-90061 | 7.4 | 84.86166 | 3 | 2016 |
| 13120-bldg-89798 | 7.3 | 184.41252 | 3 | 2016 |
| 13120-bldg-90553 | 9.4 | 30.26722 | 3 | 2016 |
| ... | ... | ... | ... | ... |

Fig. 1. Sample extract from the large (Tokyo) city graph database presenting building nodes, their labels, relationships and properties.
Once the environmental entities and their attributes from the city
data sets are inserted into the database systems, the next step involves
storing the relationships among these entities. In relational database
systems, association tables are created to represent these relationships
between city entities, which exhibit many-to-many characteristics.
Foreign key constraints are utilised to reference the main tables in the
association table as well as to enforce the link. The validation of these
constraints by the DBMS during data insertion adds
processing time to the procedure. Given that proximity between buildings
is a bidirectional association, two rows are created to represent each
direction of such relationships, resulting in a record count that is twice
the number of available relationships. In the case of the graph-based
database, the procedure for establishing relationships between envi-
ronmental entities involves retrieving each target node through read
queries, followed by the assignment of the relationship. Indexes are first
created to enhance the efficiency of retrieving nodes. In contrast to the
relational database, bidirectional relationships can be represented using
a single edge in the graph-based database. Hence, a single edge is suf-
ficient to represent a relationship between two entities, avoiding the
need to define inverse relationships. Sample extract from the association
tables from the Tokyo and Espoo data set is presented in Tables 2 & 4,
respectively. Meanwhile, Figs. 1 & 2 demonstrate how the same
relationship is stored in the graph databases.

Fig. 2. Sample extract from the small (Espoo) city graph database presenting building nodes, their labels, relationships, and properties.

Table 2
Sample extract from the nearby association table that stores proximity relationships between buildings in the large (Tokyo) city SQL database.
Id (Primary key) | Building Id 1 (Foreign key) | Building Id 2 (Foreign key)
67637903 | 13120-bldg-89402 | 13120-bldg-90061
67682035 | 13120-bldg-89798 | 13120-bldg-90061
67682040 | 13120-bldg-89798 | 13120-bldg-90553
67738056 | 13120-bldg-90061 | 13120-bldg-90553
67738059 | 13120-bldg-90061 | 13120-bldg-90754
... | ... | ...

Table 3
Sample extract from the building data table in the small (Espoo) city SQL database.
Id (Primary key) | Street | House Number | Postal Code | Year of Construction | Structure | Type of use
14219 | Meripoiju | 4 | 02320 | 2007 | Concrete | Parking
14220 | Meripoiju | 3 | 02320 | 1972 | Concrete | Apartment building
14223 | Meripoiju | 1 | 02320 | 1973 | Concrete | Apartment building
14254 | Kivenlahdenkatu | 4 | 02320 | 1973 | Concrete | Apartment building
14253 | Kivenlahdenkatu | 6 | 02320 | 1973 | Concrete | Apartment building
... | ... | ... | ... | ... | ... | ...

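The difference in record counts between the two representations can be illustrated with a small sketch. It uses Python's built-in sqlite3 module as a stand-in for the relational system and a plain Python list as a stand-in for graph edges; neither is the software used in the study. The table and column names (`nearby`, `building_id_1`, `building_id_2`) are assumed for illustration, and the building ids are taken from Table 4.

```python
import sqlite3

# Undirected proximity pairs, using building ids that appear in Table 4.
pairs = [(14219, 14223), (14219, 14254), (14223, 14254)]

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE nearby (
           id INTEGER PRIMARY KEY,
           building_id_1 INTEGER NOT NULL,
           building_id_2 INTEGER NOT NULL)"""
)

# Relational side: store every relationship once per direction, so a lookup
# on building_id_1 alone finds all neighbours of a building.
both_directions = pairs + [(b, a) for a, b in pairs]
conn.executemany(
    "INSERT INTO nearby (building_id_1, building_id_2) VALUES (?, ?)",
    both_directions,
)
record_count = conn.execute("SELECT COUNT(*) FROM nearby").fetchone()[0]


def sql_neighbours(building_id):
    """Neighbours of a building from the two-direction association table."""
    rows = conn.execute(
        "SELECT building_id_2 FROM nearby WHERE building_id_1 = ? "
        "ORDER BY building_id_2",
        (building_id,),
    )
    return [row[0] for row in rows]


# Graph side (plain-Python stand-in): one edge per pair, matched from either end.
edges = [frozenset(pair) for pair in pairs]


def graph_neighbours(building_id):
    """Neighbours of a building when each relationship is a single edge."""
    return sorted(
        other for edge in edges if building_id in edge
        for other in edge if other != building_id
    )


print(record_count, len(edges))
print(sql_neighbours(14254), graph_neighbours(14254))
```

Both lookups return the same neighbours, but the association table holds twice as many records as there are relationships, whereas the edge list holds one entry per relationship.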
After completing the design of the database systems for the Espoo
city data set and populating them with the structured data originating
from the CityGML model, we proceeded to introduce additional data
from OpenStreetMap that is characteristically unstructured. This step is
included to evaluate how each database system adapts to and manages
the incorporation of data that deviates from its original design. This
approach provides insights into the systems’ capabilities in handling
scenarios that involve irregular or evolving data structures. The sup-
plementary data from OpenStreetMap introduces new city elements like
fire hydrants and extra attributes to the existing buildings. To incorpo-
rate the data related to fire hydrants, a new table is established within
the relational database, listing all fire hydrants within the city along
with their associated attributes. Additionally, another table is set up to
capture the relationship between each hydrant and the buildings it
serves. In parallel, within the graph database, new nodes labelled 'Hydrants'
are instantiated to represent individual fire hydrants. These hydrants are
then connected to the buildings they serve through relationship edges. An
excerpt of the hydrant table, as well as the association table that relates
hydrants to the buildings they serve in the relational database, is displayed
in Table 5. Meanwhile, a representation of similar data in the graph database
can be seen in Fig. 3.

Table 4
Sample extract from the nearby association table that stores proximity relationships between buildings in the small (Espoo) city SQL database.
Id (Primary key) | Building Id 1 (Foreign key) | Building Id 2 (Foreign key)
1775022 | 14219 | 14223
1775029 | 14219 | 14254
1775020 | 14219 | 14220
1775161 | 14223 | 14254
1776059 | 14253 | 14254
... | ... | ...

Table 5
Sample extract from the hydrant table (top) and the association table (bottom) that relates a hydrant with the building it serves in the small (Espoo) city SQL database.
Id (Primary key) | Latitude | Longitude | Type | Position
4300534015 | 60.1547 | 24.7054 | Underground |
4300534016 | 60.1548 | 24.7388 | Underground |
4300534017 | 60.1548 | 24.774 | Underground | Lane
4300534018 | 60.1549 | 24.6307 | Underground |
4300534019 | 60.1549 | 24.6395 | Underground |
... | ... | ... | ... | ...

Id (Primary key) | Building Id (Foreign key) | Hydrant Id (Foreign key)
23232 | 14252 | 4300534018
23233 | 14252 | 4300534020
23234 | 14253 | 4300534018
23235 | 14254 | 4300534018
23236 | 14255 | 4300534018
... | ... | ...

Fig. 3. Sample extract from the small (Espoo) city graph database presenting building nodes, fire hydrant nodes, their labels, relationships, and properties.

The newly acquired data from OpenStreetMap also includes building
attribute data that vary significantly from one building to another. For
instance, while some civic buildings possess official names in Finnish,
Swedish, and English, most city buildings lack these attributes. Such
variability in attributes is pervasive throughout the dataset. In the
design of the relational database, a new table is created to store all the
newly added building property data. Additionally, properties extracted
from CityGML, apart from the address, are transferred from the initial
building data table to this new property table. The rationale for this
division is to maintain a lightweight address table, which will be utilised
to identify buildings. In subsequent stages of testing, this table will be
used jointly with the proximity table to identify adjacent buildings.
Thus, by keeping the table lightweight, the cost of joins will be mini-
mised. Overall, incorporating the newly obtained building attribute data
into the Espoo relational database necessitates adding a new table,
modifying an existing table and migrating some data from the existing
table to the new one, all of which require meticulous attention to ensure
data integrity. In contrast, integrating the newly obtained attribute data
into the graph database was comparatively straightforward. This is
achieved by assigning the existing building nodes their new attributes.
The approach described above for incorporating the new data into the
relational and graph database systems is one of several possible
approaches. Other practitioners may choose different approaches, which could
yield different performance outcomes. Table 6 displays excerpts from the updated relational
database, featuring both the address and property tables. Similarly, a
sample from the revised graph database is illustrated in Fig. 4.
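The relational migration steps described above (adding a table, reducing the original table to address data and moving the remaining attributes) can be sketched as follows. This is a minimal illustration using Python's built-in sqlite3 module as a stand-in relational system, not the study's implementation. The table names `building`, `address` and `property` and all column names are assumed; the sample values come from Tables 3 and 6. The graph side is imitated with a plain dictionary standing in for a node's property map.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Initial design: one table with address and CityGML attributes (cf. Table 3).
conn.execute(
    """CREATE TABLE building (
           id INTEGER PRIMARY KEY, street TEXT, house_number TEXT,
           postal_code TEXT, year_of_construction TEXT, structure TEXT,
           type_of_use TEXT)"""
)
conn.executemany(
    "INSERT INTO building VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        (14219, "Meripoiju", "4", "02320", "2007", "Concrete", "Parking"),
        (14254, "Kivenlahdenkatu", "4", "02320", "1973", "Concrete",
         "Apartment building"),
    ],
)

# Step 1: create the lightweight address table and copy the address columns.
conn.execute(
    """CREATE TABLE address (
           id INTEGER PRIMARY KEY, street TEXT, house_number TEXT,
           postal_code TEXT)"""
)
conn.execute(
    "INSERT INTO address SELECT id, street, house_number, postal_code FROM building"
)

# Step 2: create the name/value property table and migrate the other columns.
conn.execute(
    """CREATE TABLE property (
           id INTEGER PRIMARY KEY,
           building_id INTEGER NOT NULL REFERENCES address(id),
           name TEXT NOT NULL, value TEXT)"""
)
for column, name in [
    ("year_of_construction", "Year of Construction"),
    ("structure", "Structure"),
    ("type_of_use", "Type of use"),
]:
    conn.execute(
        f"INSERT INTO property (building_id, name, value) "
        f"SELECT id, ?, {column} FROM building",
        (name,),
    )

# Step 3: retire the original table, then add the OpenStreetMap attributes
# only for the buildings that actually have them (values from Table 6).
conn.execute("DROP TABLE building")
conn.executemany(
    "INSERT INTO property (building_id, name, value) VALUES (?, ?, ?)",
    [(14254, "Building levels", "5"), (14254, "Roof shape", "Flat")],
)

# Graph side (plain-Python stand-in for a node's property map): the existing
# node simply receives the new keys; nothing else has to change.
node_14254 = {"street": "Kivenlahdenkatu", "house_number": "4"}
node_14254.update({"building_levels": "5", "roof_shape": "Flat"})

properties_14254 = dict(
    conn.execute("SELECT name, value FROM property WHERE building_id = 14254")
)
print(properties_14254)
```

The relational side needs three coordinated steps, each of which must preserve the link between a building and its values, while the node update is a single operation. A name/value property table is only one way to store sparse attributes, and it trades column typing for flexibility.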
3.3.1.2. Managing building data sets. The building data sets extracted
from the IFC files comprise various building elements and features
interconnected by numerous relationships. The database systems
created to store this data are designed to closely adhere to the structure
defined in the IFC files. For both the smaller and larger building data
sets, a similar procedure is followed to design relational and graph
database systems.
In the case of the relational database, tables are created from IFC
entity classes representing building elements, spatial elements, object
types and property sets. The building elements table is formed based on
the IfcBuildingElement class, with each row storing instances belonging to
the class. While each of these instances belongs to a different subclass of
the IfcBuildingElement class, they are all stored in a single table for
simplicity and to minimise the use of join queries. Attributes of the
IfcBuildingElement class become columns in the building elements table;
they include global ID, owner history, name, predefined type, object
type and tag. Furthermore, a column is added to store the subclass of the
IfcBuildingElement class (such as IfcBeam, IfcColumn and IfcDoor) to
which each instance belongs, which helps categorise the records in the
table. Afterwards, another table is created to store object types,
aggregating instances of the IfcBuildingElementType class. Similar to the
building elements table, attributes of this class are translated into column fields.

Table 6
Sample extract from the modified Espoo city SQL database, depicting the address (top) and the new properties (bottom) table.
Id (Primary key) | Street | House Number | Postal Code
14219 | Meripoiju | 4 | 02320
14220 | Meripoiju | 3 | 02320
14223 | Meripoiju | 1 | 02320
14254 | Kivenlahdenkatu | 4 | 02320
14253 | Kivenlahdenkatu | 6 | 02320
... | ... | ... | ...

Id (Primary key) | Building id (Foreign key) | Name | Value
142017 | 14254 | Year of Construction | 1973
142018 | 14254 | Structure | Concrete
142019 | 14254 | Type of use | Apartment building
142020 | 14254 | Building levels | 5
142021 | 14254 | Roof shape | Flat
... | ... | ... | ...

Fig. 4. Sample extract from the small (Espoo) city graph database presenting building nodes, their labels, relationships, and updated properties.

Next, a table is created to house data concerning the different spaces
within the building datasets, represented in IFC using the IfcSpace class,
while another table is created for storing building storey data following
the structure of the IfcBuildingStorey class. Lastly, a table is devised to
store properties, with columns for an ID, property name, property set
name, and value. This single table accommodates properties for all
building elements, element types, and spaces. In the case of the graph
database design, the process adheres to the general procedure outlined
at the beginning of this section. Nodes and their properties are generated
according to the IFC class-attribute structure. Collectively, these nodes
correspond to the records in all the tables of the relational database. Labels
are subsequently employed to categorise these nodes, and they correspond
to the tables (i.e. table names) in the relational database. Since
multiple labels can be utilised, building elements are assigned a second
label to store their specific element type which is akin to the element
type column defined in the building elements table of the relational
database.
After designing the main tables in the relational database, the sub-
sequent step involves creating association tables to store selected re-
lationships from the IFC datasets. Individual association tables are
created to separately manage various relationships, including those
between decomposition elements (IfcRelAggregates), connected ele-
ments (IfcRelConnectsElements), elements and their types (IfcRelDefi-
nesByType), elements or element types and their properties
(IfcRelDefinesByProperties), elements and the spatial structure that
contains them (IfcRelContainedInSpatialStructure), a space and its
bounding elements (IfcRelSpaceBoundary) and a space and its covering
(IfcRelCoversBldgElements). Each of these tables is structured with
columns derived from the attributes of the respective IFC classes. Every
table contains two foreign key columns that refer to the records being
associated. These are complemented by global ID, owner
history, and name columns. To represent these relationships within
the graph database, named edges that can store properties are utilised.
The edge names correspond to the relationship types highlighted in this
paragraph, mirroring equivalent association tables in the relational
database. Similarly, the attributes of the IFC relationship classes, rep-
resented as column fields in the association tables, are transformed into
properties of the relationship edge in the graph database. In IFC, each
relationship is also linked to an owner history. However, Neo4j only
supports relationships between nodes, so an edge cannot itself be connected
to an owner history node. Therefore, as a workaround,
the ID of the relevant owner history node is included as a property in the
relationship edges. Similar to the database design for the city database,
it should be noted that these design choices are not the only possible
approaches for designing the databases. The guiding principle for the
design outlined here is aimed at closely following the IFC structure to
minimise structural alterations to the data sets. Table 7 presents a
sample from the building elements table extracted from the small
building relational database. Table 8 showcases examples from the as-
sociation table from the same database that stores the relationships
between interconnected building elements. Finally, Table 9 presents the
owner history table that is referred to in the other two tables. The data
organisation in the larger building database adheres to a similar structure.
Meanwhile, a sample extract from the small building graph database
presenting the connected building elements along with their labels
and properties is given in Fig. 5.
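The relational layout described above (a single building elements table with an element type column, plus an association table with two foreign keys and an owner history reference) can be sketched as follows. The sketch uses Python's built-in sqlite3 module as a stand-in, and all table and column names are assumed rather than taken from the study. Sample values come from Tables 7 to 9; the ids prefixed with `hypothetical-` are invented to trigger a constraint violation. The edge dictionary at the end imitates the workaround of keeping the owner history id as an edge property, and its edge name is also an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only on request
conn.executescript(
    """
    CREATE TABLE owner_history (id INTEGER PRIMARY KEY, change_action TEXT);
    CREATE TABLE building_element (
        id TEXT PRIMARY KEY,             -- IFC global ID
        element_type TEXT NOT NULL,      -- IFC subclass, e.g. IfcWall
        owner_history INTEGER REFERENCES owner_history(id),
        name TEXT, object_type TEXT, tag TEXT, predefined_type TEXT
    );
    CREATE TABLE connects_path_elements (
        id TEXT PRIMARY KEY,             -- global ID of the IFC relationship
        owner_history INTEGER REFERENCES owner_history(id),
        name TEXT,
        relating_element TEXT NOT NULL REFERENCES building_element(id),
        related_element TEXT NOT NULL REFERENCES building_element(id)
    );
    """
)
conn.execute("INSERT INTO owner_history VALUES (1, 'NOCHANGE')")
conn.executemany(
    "INSERT INTO building_element VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        ("0I$5J2q9f4neHdIhXKwt0k", "IfcWall", 1, "Basic Wall",
         "4M_External_basement", "150906", "NOTDEFINED"),
        ("1PC0J7QuP1XvCE8kJzHdv8", "IfcWall", 1, "Basic Wall",
         "4M_Interior_single_hollow_brick", "155802", "NOTDEFINED"),
    ],
)
conn.execute(
    "INSERT INTO connects_path_elements VALUES (?, ?, ?, ?, ?)",
    ("2St4Zrjrj6BgvhZ4uLX_AW", 1, "Structural",
     "1PC0J7QuP1XvCE8kJzHdv8", "0I$5J2q9f4neHdIhXKwt0k"),
)

# A relationship that points at a non-existent element is rejected on insert.
try:
    conn.execute(
        "INSERT INTO connects_path_elements VALUES (?, ?, ?, ?, ?)",
        ("hypothetical-rel-id", 1, "Structural",
         "1PC0J7QuP1XvCE8kJzHdv8", "hypothetical-missing-element"),
    )
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

relationship_count = conn.execute(
    "SELECT COUNT(*) FROM connects_path_elements"
).fetchone()[0]

# Graph side (plain-Python stand-in for a property-carrying edge).
edge = {
    "type": "CONNECTS_PATH_ELEMENTS",    # assumed edge name
    "start": "1PC0J7QuP1XvCE8kJzHdv8",
    "end": "0I$5J2q9f4neHdIhXKwt0k",
    "properties": {
        "global_id": "2St4Zrjrj6BgvhZ4uLX_AW",
        "name": "Structural",
        "owner_history_id": 1,           # workaround: id kept as a plain property
    },
}
print(rejected, relationship_count)
```

In this sketch the foreign keys let the relational system check every association as it is written, whereas the owner history id stored on the edge is an ordinary value that nothing validates.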
The process described so far represents the work carried out to store the
four real-world data sets in relational and graph databases. In a later
section, qualitative observations made during this database design and
manipulation process are presented. In the next section, a quantitative
evaluation of the target database systems is conducted by successively
increasing the complexity of relationships within both the building and
city data sets.

Table 7
Sample extract from the building elements table in the small building SQL database.
Id (Primary key) | Element type | Owner history (Foreign key) | Name | Object type | Tag | Predefined type
0I$5J2q9f4neHdIhXKwt0k | IfcWall | 1 | Basic Wall | 4M_External_basement | 150906 | NOTDEFINED
1PC0J7QuP1XvCE8kJzHa24 | IfcWall | 1 | Basic Wall | 4M_Interior_single_hollow_brick | 155222 | NOTDEFINED
1PC0J7QuP1XvCE8kJzHdv8 | IfcWall | 1 | Basic Wall | 4M_Interior_single_hollow_brick | 155802 | NOTDEFINED
1PC0J7QuP1XvCE8kJzHa3n | IfcWall | 1 | Basic Wall | 4M_Interior_single_hollow_brick | 155171 | NOTDEFINED
1PC0J7QuP1XvCE8kJzHaGW | IfcWall | 1 | Basic Wall | 4M_Interior_double_hollow_brick | 154354 | NOTDEFINED
... | ... | ... | ... | ... | ... | ...

Table 8
Sample extract from the connected path elements association table in the small building SQL database.
Id (Primary key) | Owner history (Foreign key) | Name | Relating element (Foreign key) | Related element (Foreign key)
2St4Zrjrj6BgvhZ4uLX_AW | 1 | Structural | 1PC0J7QuP1XvCE8kJzHdv8 | 0I$5J2q9f4neHdIhXKwt0k
3Z06H71mr5LvYai$WSxt6t | 1 | Structural | 1PC0J7QuP1XvCE8kJzHdv8 | 1PC0J7QuP1XvCE8kJzHa24
225XQaH6L2gAbtm6vx9CSC | 1 | Structural | 1PC0J7QuP1XvCE8kJzHa24 | 1PC0J7QuP1XvCE8kJzHa3n
0smQl3IGbAiQ2WeuOxOM$v | 1 | Structural | 1PC0J7QuP1XvCE8kJzHa24 | 1PC0J7QuP1XvCE8kJzHaGW
1k5U80sVH6BR3TT5smPLg$ | 1 | Structural | 1PC0J7QuP1XvCE8kJzHa3n | 1PC0J7QuP1XvCE8kJzHaCD
... | ... | ... | ... | ...

Table 9
Sample extract from the owner history table in the small building SQL database.
Id (Primary key) | Owning user id (Foreign key) | Owning application id (Foreign key) | State | Change action | Last modified date | Last modifying user | Last modifying application | Creation date
1 | 1 | 1 | | NOCHANGE | | | | 1549365521

3.3.2. Quantitative evaluation of data retrieval performance
This section of the comparative study focuses on evaluating the data
retrieval performance of the database systems when dealing with
interrelated data sets. The relationships traversed and the size of data
sets are key control parameters in the evaluation. The assessment in-
volves executing a set of queries and measuring the amount of time
required by each database system to execute the query and retrieve the
requested data. Exemplary activities from the fire emergency domain
are used to illustrate the real-world application of the queries utilised in
the study. Each query is executed ten times to obtain an average execution
time, and the averages are reported in several tables.
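The measurement procedure (ten executions per query, averaged) can be sketched as below. This is an illustrative harness under stated assumptions, not the authors' benchmarking code. It times queries against an in-memory SQLite database through Python's sqlite3 module; the helper `time_query`, the toy table and its contents are all hypothetical.

```python
import sqlite3
import statistics
import time


def time_query(conn, query, params=(), runs=10):
    """Execute `query` `runs` times and return each wall-clock duration in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        conn.execute(query, params).fetchall()  # fetch rows so retrieval is timed too
        durations.append(time.perf_counter() - start)
    return durations


# Toy data purely for demonstration; not one of the study's data sets.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE building_element (id INTEGER PRIMARY KEY, element_type TEXT)"
)
conn.executemany(
    "INSERT INTO building_element (element_type) VALUES (?)",
    [("IfcWall",)] * 900 + [("IfcDoor",)] * 100,
)

durations = time_query(
    conn, "SELECT * FROM building_element WHERE element_type = ?", ("IfcWall",)
)
average_seconds = statistics.mean(durations)
print(f"{len(durations)} runs, mean {average_seconds:.6f} s")
```

Fetching all rows inside the timed region matters, because a query that is executed but never read may defer most of its work. A real comparison would also need to control for caching and warm-up, which this sketch ignores.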
The performance tests conducted are grouped into three categories
based on their objective, which are:
1. Retrieving data without traversing any relationships.
2. Retrieving data by traversing a single relationship.
3. Retrieving data by traversing several relationships.
3.3.2.1. Retrieving data without traversing any relationships. The first use
case to be considered is the retrieval of data from the data sets that do
not require traversing relationships. This is a critical use case to consider
since there are several scenarios in which required building and envi-
ronmental data can be obtained from a single table. In such scenarios,
only a list of physical or abstract features filtered by one or more attri-
butes is needed. For example, a firefighters’ information system might
request a list of windows from a building database to determine a point
of entry into a building. In those cases, retrieving necessary data does
not necessitate connecting multiple tables or nodes to traverse re-
lationships between data points. Hence, this evaluation case compares
the suitability of relational and graph-based database systems for use
cases where the required data is retrieved without traversing any
relationships. The experiment involves executing a series of read
queries, wherein the quantity of retrieved records increases with each
subsequent query. Consequently, the tests offer insights into the per-
formance of the database systems as the retrieved data becomes exten-
sive. Two datasets, a small building dataset and a large city dataset, are
utilised in the tests to evaluate the effect of database size on the per-
formance of the data persistent systems. For the building test case, a set
of queries that request building elements filtered by their element type
attribute are executed. The five queries used for the relational database
and graph database are given in Listings 1 & 2, respectively. Successive
queries progressively retrieve larger volumes of data from their
respective databases. Notably, comparable queries from both sets, such
as the first SQL query and the first Cypher query, yield identical results.
Furthermore, all attributes used as filters in these queries are indexed in
both database systems to optimise query performance.
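As a rough illustration of this kind of test, and not a reproduction of the study's listings, the sketch below runs successive single-table reads filtered on an indexed element type column, with each query retrieving more records than the previous one. It uses Python's built-in sqlite3 module; the table, the index name, the helper `fetch_by_types` and the element counts are assumed toy values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE building_element (id INTEGER PRIMARY KEY, element_type TEXT NOT NULL)"
)
# Index on the attribute used as a filter.
conn.execute("CREATE INDEX idx_element_type ON building_element (element_type)")

# Toy element counts chosen for illustration only.
toy_counts = {"IfcStair": 5, "IfcDoor": 15, "IfcWindow": 30, "IfcWall": 50}
for element_type, count in toy_counts.items():
    conn.executemany(
        "INSERT INTO building_element (element_type) VALUES (?)",
        [(element_type,)] * count,
    )

# Successive filters: each query keeps the previous types and adds one more,
# so every query retrieves more records than the one before it.
filters = [
    ["IfcStair"],
    ["IfcStair", "IfcDoor"],
    ["IfcStair", "IfcDoor", "IfcWindow"],
    ["IfcStair", "IfcDoor", "IfcWindow", "IfcWall"],
]


def fetch_by_types(types):
    """Single-table read: no join and no relationship traversal is involved."""
    placeholders = ", ".join("?" for _ in types)
    query = (
        "SELECT id, element_type FROM building_element "
        f"WHERE element_type IN ({placeholders})"
    )
    return conn.execute(query, types).fetchall()


retrieved_counts = [len(fetch_by_types(types)) for types in filters]
print(retrieved_counts)
```

Widening the filter is only one way to grow the result set; the study's own queries may achieve the increase differently.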
Listing 1. The successive SQL queries that are used to retrieve data without
traversing any relationships in the small building database.
Fig. 5. Sample extract from the small building graph database presenting the connects path element relationship between building elements along with their label
and properties (the building element nodes are assigned the ‘BuildingElement’ label in addition to the labels visible in the figure).
Listing 2. The successive Cypher queries that are used to retrieve data without
traversing any relationships in the small building database.
For the city test case, a series of queries that retrieve a collection of
buildings filtered based on their district and zone attributes are executed
against the large (Tokyo) databases. The SQL and Cypher queries utilised
for this test are provided in Listings 3 and 4, respectively.
Corresponding queries from both sets yield identical values. Successive
queries incrementally retrieve larger volumes of data from their
respective databases. All attributes utilised in these queries are indexed
in both database systems.
Listing 3. The successive SQL queries that are used to retrieve data without
traversing any relationships in the large city database.
Listing 4. The successive Cypher queries that are used to retrieve data without
traversing any relationships in the large city database.
3.3.2.2. Retrieving data by traversing a single relationship. The second
evaluation case considered in this study is the retrieval of data from a
database, which requires the traversal of a single relationship. In
numerous applications that utilise building and environmental data,
required data is often retrieved by traversing a few relationships. For
instance, fire service providers may seek a list of windows along with
their fire rating property from a building data set. In the relational
database created for this research, property sets are stored in a separate
table from the building elements table, hence necessitating a join to
access the required information. Similarly, firefighters often require a
list of fire hydrants located near a building, which requires a join be-
tween building and hydrant tables in our city databases. As a result,
performance tests are conducted in this section to evaluate the
comparative performance of the relational and graph database system in
managing use cases where few relationships need to be traversed to
acquire needed data. To this end, a series of queries are executed that
traverse a single relationship to retrieve data. Each query set gradually
increases the amount of retrieved data to assess its impact on query
performance. The queries are executed against both small and large data
sets to highlight the effect of building and environmental data set size on
the performance of the database systems. The sets of queries executed
against these data sets are as follows.
For the building test case, queries are run to fetch building elements
alongside their corresponding properties, which are stored in separate
tables or nodes. The queries are executed against the large building data
set. The SQL and Cypher queries used for this test are given in Listings 5
and 6. Corresponding queries from both query sets return identical values.
All attributes used in these queries are indexed in both database systems.
Listing 5. The successive SQL queries that are used to traverse a single rela-
tionship in the small building database.
Listing 6. The successive Cypher queries that are used to traverse a single
relationship in the small building database.
In the city test cases, queries retrieving a list of buildings adjacent to
a target building are executed. The queries are executed against the
Tokyo city data set. The SQL and Cypher queries used for this test are
given in Listings 7 and 8. Corresponding queries from both query sets
return identical values. All attributes used in these queries are indexed in
both database systems.
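Retrieving the buildings adjacent to a target building requires one join between the proximity table and the address table. The sketch below illustrates this with Python's built-in sqlite3 module. It is an assumption-laden example rather than the query from the study's listings: the table and column names are invented, and the rows are the Espoo samples from Tables 4 and 6, used because their addresses appear in the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE address (
        id INTEGER PRIMARY KEY, street TEXT, house_number TEXT, postal_code TEXT
    );
    CREATE TABLE nearby (
        id INTEGER PRIMARY KEY,
        building_id_1 INTEGER NOT NULL REFERENCES address(id),
        building_id_2 INTEGER NOT NULL REFERENCES address(id)
    );
    CREATE INDEX idx_nearby_building_1 ON nearby (building_id_1);
    """
)
conn.executemany(
    "INSERT INTO address VALUES (?, ?, ?, ?)",
    [
        (14219, "Meripoiju", "4", "02320"),
        (14220, "Meripoiju", "3", "02320"),
        (14223, "Meripoiju", "1", "02320"),
        (14254, "Kivenlahdenkatu", "4", "02320"),
    ],
)
conn.executemany(
    "INSERT INTO nearby VALUES (?, ?, ?)",
    [(1775022, 14219, 14223), (1775029, 14219, 14254), (1775020, 14219, 14220)],
)

# One relationship traversed: filter the association table on the target
# building, then join once to fetch the neighbours' address data.
adjacent = conn.execute(
    """SELECT a.id, a.street, a.house_number
       FROM nearby AS n
       JOIN address AS a ON a.id = n.building_id_2
       WHERE n.building_id_1 = ?
       ORDER BY a.id""",
    (14219,),
).fetchall()
print(adjacent)
```

Keeping the address table narrow, as the design section describes, limits how much data each joined row carries.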
Listing 7. The successive SQL queries that are used to traverse a single rela-
tionship in the large city database.
Listing 8. The successive Cypher queries that are used to traverse a single
relationship in the large city database.
3.3.2.3. Retrieving data by traversing several relationships. There are
several practical use cases in built environment management where it is
required to traverse several relationships within building and environ-
mental data sets. To illustrate, in the context of building emergency
management, there are situations where it becomes essential to trace
relationships starting from a specific point of interest. For instance, in
the case of indoor navigation, it is often crucial to find a route from the
building’s entry to the room affected by fire, which can be achieved by
utilising space and door adjacency relationships. By tracing the
connections among spaces and doors, fire service providers can effec-
tively manoeuvre from a particular space, such as the entry door, to-
wards the impacted area, or, in the case of rescuing individuals, they can
navigate from the affected space to an exit door. Similarly, when eval-
uating the propagation of smoke and fire from an affected area to the
rest of the building, traversing spatial relationships from the affected
space to the rest of the building becomes necessary.
Similar to the building data sets, environmental data sets can also
contain complex relationships that need to be traversed in order to
retrieve data that is needed for a given use case. For instance, during a
fire hazard, a fire could spread from an affected building to nearby
buildings, and smoke may propagate to adjacent areas. Identifying
nearby high-risk buildings, such as schools, hospitals, or shopping cen-
tres, can facilitate necessary actions to protect the occupants of those
buildings. This identification process requires traversing the adjacency
relationship between buildings, starting from the affected building.
Moreover, the capability to traverse spatial relationships between
environmental features can support firefighters as they explore and
navigate a complex environment to reach an affected building. These
examples underscore the importance of efficiently traversing complex
relationships that often exist in building and environmental data sets.
The tests in this section centre on assessing the influence of
traversing multiple relationships on the performance of relational and
graph database systems. Each test involves executing a sequence of ten
individual queries, with the number of traversed relationships
increasing in consecutive queries. The tests are conducted on both small
and large data sets to understand how performance is influenced by the
scale and complexity of building and environmental data sets. Four tests
are included in this category, each associated with one of the four test
cases.
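In a relational system, each additional relationship to traverse typically adds one more self-join on the association table. The sketch below generates such queries for a given number of hops. It is a hedged illustration using Python's built-in sqlite3 module, not the queries of the study's listings. The helper `n_hop_query`, the two-column `nearby` table and the four-building chain are hypothetical.

```python
import sqlite3


def n_hop_query(hops):
    """Build a SQL query that follows the `nearby` table `hops` times via self-joins."""
    joins = "".join(
        f" JOIN nearby AS r{i} ON r{i}.building_id_1 = r{i - 1}.building_id_2"
        for i in range(2, hops + 1)
    )
    return (
        f"SELECT DISTINCT r{hops}.building_id_2 FROM nearby AS r1{joins} "
        f"WHERE r1.building_id_1 = ? ORDER BY 1"
    )


# Toy chain of buildings 1 - 2 - 3 - 4, stored once per direction.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nearby (building_id_1 INTEGER, building_id_2 INTEGER)")
chain = [(1, 2), (2, 3), (3, 4)]
conn.executemany(
    "INSERT INTO nearby VALUES (?, ?)", chain + [(b, a) for a, b in chain]
)

# Buildings reachable from building 1 after exactly 1, 2 and 3 hops.
# The joins follow walks, so a walk may step back to an earlier building.
reached = {
    hops: [row[0] for row in conn.execute(n_hop_query(hops), (1,))]
    for hops in (1, 2, 3)
}
join_counts = {hops: n_hop_query(hops).count(" JOIN ") for hops in (1, 2, 3)}
print(reached)
print(join_counts)
```

The query text grows by one join per hop, which is the pattern behind the increasing query length in the relational case. How a particular database engine plans and executes these joins is outside the scope of this sketch.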
For the small building data sets, queries that traverse the connection
between adjacent building elements are utilised. The initial three
consecutive SQL and Cypher queries employed for this test are provided
in Listings 9 and 10, with the remaining queries following a similar
pattern.
Listing 9. The first three queries from the successive SQL queries that are used
to traverse an increasing number of relationships from the small build-
ing database.
Listing 10. The first three queries from the successive Cypher queries that are
used to traverse an increasing number of relationships from the small build-
ing database.
A similar test was carried out against the larger building database
where the connection between building elements is traversed. The SQL
and Cypher queries used for this test are given in Listings 11 and 12.
Listing 11. The first three queries from the successive SQL queries that are
used to traverse an increasing number of relationships from the large build-
ing database.
Listing 12. The first three queries from the successive Cypher queries that are
used to traverse an increasing number of relationships from the large build-
ing database.
In the case of the city databases, queries are executed to traverse the
adjacency relationships between buildings, starting from a specific building. The
SQL and Cypher queries utilised for this purpose are provided in Listings
13 and 14 for the small city database and in Listings 15 and 16 for the
larger city database.
Listing 13. The first three queries from the successive SQL queries that are
used to traverse an increasing number of relationships from the small
city database.
Listing 14. The first three queries from the successive Cypher queries that are
used to traverse an increasing number of relationships from the small
city database.
Listing 15. The first three queries from the successive SQL queries that are
used to traverse an increasing number of relationships from the large
city database.
Listing 16. The first three queries from the successive Cypher queries that are
used to traverse an increasing number of relationships from the large city
database.
4. Evaluation results
4.1. Designing and maintaining the database systems
Managing City Data Sets: The following observations are made
from the qualitative assessment of the design and maintenance of the
target database systems.
• Loading the Tokyo city data set, which exhibits a
well-organised structure, into both the relational database and the
graph-based database system was relatively straightforward and
quick. However, a disparity in the required effort is observed
when relationships are created. In the relational database, a new
association table was created and populated quite quickly. In
contrast, writing relationships to the graph-based database required
more effort, as the nodes first had to be queried before the required
relationships could be created. Overall, the implementation of the data
model was noticeably more straightforward in the relational database
than in the graph-based database system.
• Additional unstructured data from OpenStreetMap was integrated
into the databases containing the Espoo dataset. This inclusion aimed
to observe how the systems adapt to changes in the data that need to
be stored. Consequently, the relational database that initially stored
the CityGML data underwent significant alterations to accommodate
the new data. Specifically, a new table was created to store the newly
added unstructured building properties separately from the table
containing the structured building address data. In contrast, the
integration of new unstructured datasets is notably straightforward
in the graph database. There, each pre-existing node, which repre-
sents a building, is readily allocated new attributes pertinent to it.
Notably, the data schema within the graph database dynamically
evolves alongside the inserted data. This stands in stark contrast to
the relational database system, where the schema is predefined prior
to data insertion. Consequently, in response to evolving re-
quirements, the relational database schema necessitates redefinition,
imposing additional overhead and complexity. Thus, the graph
database offered increased flexibility in managing the unstructured
data and accommodating changes in the data structure. In contrast,
the relational database ensures a clear and consistent data structure
is maintained by employing a predefined schema that is separated
from the data set.
• The average number of relationships created per second in the
relational database and the graph-based database system is presented
in Table 10 for both the Espoo and Tokyo data sets. The results
show how write performance is affected by the database size. The
relational database system validates the relationships against
foreign key constraints as they are created, and hence it is significantly
affected by the increase in database size. Meanwhile, the
graph-based database, which queries nodes using indexes, is minimally
affected by the increase in database size. Consequently, the relational
DBMS exhibited superior write performance when handling the
smaller database but lagged behind the graph DBMS as the database
size expanded.
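The write rates reported in Table 10 can be turned into implied total write times and into a measure of how much each system slows down between the two data sets. The short calculation below does this under one assumption, which the source does not state explicitly: that each rate applies to the relationship count listed in the same row. The function names are hypothetical.

```python
# Figures copied from Table 10.
table_10 = {
    "Espoo": {"relationships": 981_742, "relational": 57_742, "graph": 35_400},
    "Tokyo": {"relationships": 41_871_173, "relational": 15_350, "graph": 33_335},
}


def implied_write_seconds(data_set, system):
    """Total write time implied by a rate, assuming it applies to the listed count."""
    row = table_10[data_set]
    return row["relationships"] / row[system]


def throughput_drop(system):
    """How many times lower the per-second rate is for Tokyo than for Espoo."""
    return table_10["Espoo"][system] / table_10["Tokyo"][system]


for data_set in table_10:
    for system in ("relational", "graph"):
        print(data_set, system, round(implied_write_seconds(data_set, system), 1), "s")
for system in ("relational", "graph"):
    print(system, "throughput drop:", round(throughput_drop(system), 2))
```

Under that assumption, the implied write times are roughly 17 s (relational) and 28 s (graph) for Espoo, and roughly 45 min (relational) and 21 min (graph) for Tokyo. The relational write rate falls by a factor of about 3.8 between the two data sets, while the graph write rate falls by a factor of about 1.06, which matches the observation that the graph database is minimally affected by database size.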
Managing building data sets: Two use cases were considered in the
study to assess the management of building data from the IFC model in
the target data persistent systems. The following observations are made
from the study.
• The complex interrelationships exhibited in IFC building models are
precisely replicated in the graph-based database system in both use
cases. In contrast, when implementing this data structure in the
relational database, significant modifications were necessary to
transform it into a table-based structure. Numerous tables and foreign
keys were required to replicate the networked nature of IFC data.
The complex relationships in the IFC data are therefore represented more
intuitively, and are easier for humans to comprehend, in the graph-based
database system's node-edge structure than in the tables and foreign
keys used in the relational database system.
• The distinction in how relationships are managed by the two database
systems is also reflected in the structure and performance of
queries that traverse those relationships. Listings 9–16 show the
composition of queries written to traverse the relationships found in
the data sets. Although the SQL and Cypher queries fulfil the
same goal and retrieve identical results, there is a noteworthy
contrast in their composition and execution. The Cypher queries written for
the graph database are notably more concise than their SQL counterparts,
which made navigating these relationships more straightforward. In contrast,
the relational database system relies on multiple joins to
construct the relationships between tables, resulting in verbose and
complex scripts. Furthermore, the graph-based database queries
reduce the cost of traversing relationships by isolating the starting
point of a relationship prior to establishing relationships originating
from it. Conversely, the SQL queries perform expensive joins on
entire tables before filtering out unneeded data.
4.2. Query performance results
The results of the quantitative evaluation of read query performance
are presented in Tables 11–14. The following paragraphs
discuss these findings.

Table 10
Comparing the time spent writing relationships in both the relational and graph-based database systems.
City data set | No. of buildings | No. of relationships (between buildings) | No. of relationships written per second (Relational DB) | No. of relationships written per second (Graph DB)
Espoo | 63,042 | 981,742 | 57,742 | 35,400
Tokyo | 1,768,233 | 41,871,173 | 15,350 | 33,335

Table 11
Execution time comparison for retrieving data without traversing any relationships from a small building (a) and a large city (b) data set.
(a) Small building data set
No. of retrieved records | Execution time, Relational DB (s) | Execution time, Graph DB (s)
20 | 0.004 | 0.017
100 | 0.010 | 0.030
300 | 0.017 | 0.046
500 | 0.023 | 0.060
900 | 0.028 | 0.074
(b) Large city data set
No. of retrieved records | Execution time, Relational DB (s) | Execution time, Graph DB (s)
150,000 | 8.85 | 11.41
250,000 | 9.83 | 17.40
350,000 | 10.65 | 22.05
450,000 | 12.52 | 27.99
550,000 | 15.16 | 34.70

• Retrieving data without traversing any relationships: As the
experiment results presented in Table 11 demonstrate, when
retrieving data that does not require traversing any relationships, the
relational database performed better than
the graph-based database across all executed queries. This result
holds whether the target data set is small or large. It can
also be observed that, as the number of records being
retrieved increases, the performance difference between the
systems increases in a relatively uniform manner. Hence, retrieving a
large number of records exhibited the largest difference
in the execution time of the database systems. For instance, Table 11
shows that the performance difference for the smallest number of
records in the small building data set was a fraction of a second, while
for the largest number of records in the large city data set it was
almost 20 seconds.
•Retrieving data by traversing a single relationship: The results
presented in Table 12 show that queries traversing a single
relationship behave much like the previous test case, where no
relationship traversal was required. Here again, the relational
database outperforms the graph database regardless of the database
size, and the performance difference becomes more pronounced as
the number of records retrieved from the data sets grows.
•Retrieving data by traversing several relationships:
o Small building data set: All queries in this test case were
executed successfully in both data persistent systems (Table 13).
The increase in relationships traversed by the queries had a
negligible impact on their performance, as the increase in execu-
tion time proved to be very small. Although the SQL database
consistently exhibited faster execution times compared to the
graph database, the performance gap observed was also minute,
considering that both systems managed to retrieve the requested
data within a fraction of a second (about ten milliseconds or less).
o Large building data set: The increase in the size of the building
data set had a notably more adverse effect on the relational database
than on the graph database, as indicated in Table 13. The first
nine queries were successfully executed in the relational database,
where the performance demonstrated a marked decline in the
latter queries. The last query in the set, which required traversing
ten relationships, failed to produce results within a 3-hour time-
frame (10800 seconds) and was consequently terminated. In
contrast, the graph database returned data for all queries, with
only a marginal increase in execution time as the number of re-
lationships increased. Evidently, the performance was minimally
impacted by the increase in the data set size, as all tests were
successfully completed in under a second. It is worth noting from
the results that when traversing up to three relationships, the two
data persistence systems demonstrated comparable performance,
with the relational database slightly faster for one and two
relationships and the two systems about a millisecond apart at three.
Nevertheless, upon surpassing the third relationship level, the
graph database outperformed the relational database system by
increasingly substantial margins.
o Small city data set: As depicted in Table 14, the traversal of
interrelated environmental data, even within a small city data set,
incurred a performance cost to the relational database. The
executed queries resulted in a short execution time while
traversing up to three relationships. However, this duration
notably increased when traversing four or more relationships. The
test could not be completed for more than five relationship levels
as the execution time ran beyond 3 hours. In contrast, the graph
database executed all queries within a second, exhibiting only a
marginal increase in execution time as the number of traversed
relationships increased.
o Large city data set: The trends observed in the small city data set
are further confirmed when the same test is conducted on a larger
city data set, as presented in Table 14. Only four out of
the ten tests could be completed for the relational database due to
the extensive execution time resulting from traversing relation-
ships, which surfaced early this time owing to the larger database
size. Notably, there was a significant increase in execution time as
the number of relationships increased from three to four. Beyond
that point, when the queries involved traversing more than four
relationships, the relational database failed to produce results
within 3 hours, leading to the termination of the test. In contrast,
the graph database maintained a comparable level of performance
with only a marginal increase in execution time.
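To make the trend in the large building data set concrete, the following Python sketch computes how much the execution time grows with each additional relationship, and the ratio between the two systems. The numbers are copied from Table 13. The function names are illustrative, and the arithmetic is only one simple way of summarising the table.

```python
# Execution times in seconds for the large building data set, copied from Table 13
# (1 to 9 relationships traversed; the 10-relationship relational run did not finish).
relational = [0.001, 0.002, 0.006, 0.024, 0.174, 0.798, 5.434, 28.305, 209.427]
graph = [0.004, 0.005, 0.005, 0.005, 0.010, 0.013, 0.019, 0.038, 0.076]


def growth_factors(times):
    """Ratio of each execution time to the time for one fewer relationship."""
    return [later / earlier for earlier, later in zip(times, times[1:])]


def time_ratios(slow, fast):
    """Ratio slow/fast at each relationship count; values above 1 favour `fast`."""
    return [s / f for s, f in zip(slow, fast)]


for hops, factor in enumerate(growth_factors(relational), start=2):
    print(f"{hops} relationships: relational time grew {factor:.1f}x")
for hops, ratio in enumerate(time_ratios(relational, graph), start=1):
    print(f"{hops} relationships: relational/graph time ratio {ratio:.2f}")
```

For these figures, each added relationship multiplies the relational execution time by a factor of between roughly 2 and 7.4, whereas the graph execution time at most doubles from one step to the next.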
Table 12
Execution time comparison for retrieving data by traversing a single relationship from a small building (a) and a large city (b) data set.

(a) Small building data set

| No. of retrieved records | Execution time, Relational DB (s) | Execution time, Graph DB (s) |
| --- | --- | --- |
| 100 | 0.004 | 0.012 |
| 1,000 | 0.350 | 0.818 |
| 20,000 | 0.558 | 1.248 |
| 30,000 | 0.859 | 1.993 |
| 40,000 | 1.154 | 2.853 |

(b) Large city data set

| No. of retrieved records | Execution time, Relational DB (s) | Execution time, Graph DB (s) |
| --- | --- | --- |
| 5 | 0.0008 | 0.0033 |
| 25 | 0.0014 | 0.0054 |
| 50 | 0.0024 | 0.0066 |
| 75 | 0.0034 | 0.0088 |
| 100 | 0.0046 | 0.0102 |

Table 13
Execution time comparison for traversing an increasing number of relationships for the small and large building data sets. All execution times are in seconds.

| No. of relationships traversed | Small building, Relational DB | Small building, Graph DB | Large building, Relational DB | Large building, Graph DB |
| --- | --- | --- | --- | --- |
| 1 | 0.0007 | 0.0033 | 0.001 | 0.004 |
| 2 | 0.0009 | 0.0034 | 0.002 | 0.005 |
| 3 | 0.0009 | 0.0039 | 0.006 | 0.005 |
| 4 | 0.0010 | 0.0039 | 0.024 | 0.005 |
| 5 | 0.0014 | 0.0044 | 0.174 | 0.010 |
| 6 | 0.0015 | 0.0044 | 0.798 | 0.013 |
| 7 | 0.0015 | 0.0058 | 5.434 | 0.019 |
| 8 | 0.0022 | 0.0069 | 28.305 | 0.038 |
| 9 | 0.0033 | 0.0092 | 209.427 | 0.076 |
| 10 | 0.0056 | 0.0104 | >10,800 (incomplete) | 0.164 |

Table 14
Execution time comparison for traversing an increasing number of relationships for the small and large city data sets. All execution times are in seconds.

| No. of relationships traversed | Small city, Relational DB | Small city, Graph DB | Large city, Relational DB | Large city, Graph DB |
| --- | --- | --- | --- | --- |
| 1 | 0.003 | 0.014 | 0.010 | 0.015 |
| 2 | 0.023 | 0.041 | 0.076 | 0.063 |
| 3 | 0.835 | 0.062 | 4.263 | 0.138 |
| 4 | 46.373 | 0.099 | 443.249 | 0.194 |
| 5 | 3149.000 | 0.137 | >10,800 (incomplete) | 0.241 |
| 6 | >10,800 (incomplete) | 0.157 | - | 0.282 |
| 7 | - | 0.199 | - | 0.276 |
| 8 | - | 0.211 | - | 0.384 |
| 9 | - | 0.233 | - | 0.486 |
| 10 | - | 0.249 | - | 0.554 |

5. Discussion
The selection of a suitable data persistent system for a given
application requires a thorough consideration of the application’s
requirements and the nature of the data to be stored. Each database system
possesses distinct strengths and weaknesses that make it suitable for
certain tasks while unsuitable for others. Although there are existing
comparative studies covering multiple domains, there is a lack of
comparative studies within the context of managing building and
environmental data. This research addressed this gap by conducting a
comparative evaluation of relational database systems and graph-based
database systems, specifically in the context of handling interrelated
building and environmental data.
Comparing database systems in their entirety is an extremely chal-
lenging task due to the countless factors that can impact a system’s
performance. Variables such as data volume, query tuning and optimi-
sation, DBMS configurations, and the requirements of the use case used
for evaluation can all significantly influence the results of any compar-
ison. Recognizing these challenges, this study does not attempt to pro-
vide an exhaustive benchmark that covers all aspects of database
performance. Instead, it narrows its focus to compare the two types of
database systems based on a select set of criteria deemed most relevant.
This study employs query processing time, relationship complexity, and
dataset size as its primary metrics for comparison. Additionally, it in-
cludes a qualitative evaluation centred on the design process and
adaptability to evolving requirements. The study is conducted using
real-world data sets based on real-world use cases. This targeted
approach allows for a more manageable and meaningful comparison
that can yield practical insights for professionals and researchers
working in this specific area. The following sections delve into the
practical implications derived from the comparative study, accompa-
nied by an examination of its limitations and suggestions for future
research directions.
5.1. Practical implications
In the first half of the evaluation, a relational database and a graph
database are designed to store building data extracted from an IFC
building model and environmental data obtained from CityGML city
models and OpenStreetMap. The findings from this evaluation under-
score the critical importance of data structure and interrelationships in
the selection of a database system for storing and managing building and
environmental data. It was observed that the building data derived from
the IFC models, possessing significant levels of interrelationships,
particularly on the larger data set, was well represented using the graph
database. The graph database’s inherent flexibility in managing inter-
connected data contrasts sharply with the relational system, wherein
relationships were split into multiple tables, potentially complicating
data retrieval and analysis. When it comes to environmental data, Cit-
yGML yielded relatively structured data with limited interrelationships,
which was efficiently stored in relational tables. However, the intro-
duction of additional environmental data from OpenStreetMap pre-
sented significant challenges due to its unpredictable structure and
variable data fields. The relational database required substantial modi-
fications to integrate this data, as it deviated significantly from the
original schema designed during the initial database design. In stark
contrast, the graph database, with its flexible nature, enabled a more
seamless integration of this diverse and unstructured data. Based on
these findings, it can be inferred that the choice between a relational or
graph database should be guided by the nature of the data to be stored.
For building and environmental data with intricate interrelationships
and evolving structures, the flexible and adaptive nature of graph da-
tabases offers a distinct advantage. In contrast, relational databases may
be more suitable for data with a well-defined, stable structure and fewer
interconnections. This insight is relevant for engineers, urban planners,
and other practitioners who rely on efficient management of building
and environmental data, and it calls for careful consideration of their
data’s inherent structure.
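The schema-evolution issue described above can be sketched in Python. The sketch uses the built-in `sqlite3` module as a stand-in for a relational DBMS and plain dictionaries as a stand-in for property-graph nodes. The table, column, and tag names are hypothetical and do not reproduce the schema or the DBMSs used in the study.

```python
import sqlite3

# Hypothetical OpenStreetMap-like records whose fields vary from record to record.
osm_records = [
    {"osm_id": 1, "name": "Library", "amenity": "library"},
    {"osm_id": 2, "name": "Cafe", "cuisine": "coffee_shop", "wheelchair": "yes"},
]

# Relational stand-in: a fixed schema designed before the new fields were known.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE building (osm_id INTEGER PRIMARY KEY, name TEXT)")


def insert_relational(conn, record):
    """Insert a record, altering the table schema for every previously unseen field."""
    known = {row[1] for row in conn.execute("PRAGMA table_info(building)")}
    for field in record:
        if field not in known:
            # A schema change is required; field names come from trusted toy data only.
            conn.execute(f'ALTER TABLE building ADD COLUMN "{field}" TEXT')
    columns = ", ".join(f'"{f}"' for f in record)
    marks = ", ".join("?" for _ in record)
    conn.execute(
        f"INSERT INTO building ({columns}) VALUES ({marks})", list(record.values())
    )


# Property-graph stand-in: each node carries its own key-value properties.
graph_nodes = {}


def insert_graph(nodes, record):
    """Store the record as a node; new fields need no schema change."""
    nodes[record["osm_id"]] = dict(record)


for rec in osm_records:
    insert_relational(conn, rec)
    insert_graph(graph_nodes, rec)
```

In this toy run the relational table grows from two to five columns, most of them empty for any given row, while each node simply keeps the properties it was given.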
The second phase of the evaluation centred on retrieving the
necessary data from the database systems by traversing relationships.
The test results indicate that, for tasks requiring minimal or
no relationship traversal, the relational database demonstrates superior
performance, irrespective of data set size. This makes relational
databases particularly well suited for applications with straightforward
data retrieval needs, where the complexity of relationships between data
entities is minimal.
However, as the number of relationships traversed by queries increases,
the performance advantage often shifts to the graph database since the
relational database’s performance is negatively impacted by each
additional relationship it needs to traverse. This is due to the inherent
nature of labelled property graph databases, which are optimised for
handling interconnected data, thereby minimizing the performance
degradation associated with complex relationship traversals. However,
the size of the datasets influences the extent to which the performance of
the database systems is affected by relationship traversal. When
employing a small building dataset, the relational database continues to
exhibit better performance than the graph database. This suggests that
for smaller-scale building data sets, relational databases might still offer
the best performance, even when relationships are a component of the
data model. Conversely, when dealing with large building and envi-
ronmental datasets, the performance of relational databases is signifi-
cantly compromised as the volume of relationships increases. In extreme
cases, especially when traversing several relationships within city
datasets, the relational database failed to provide values within a
reasonable timeframe. This underscores the importance of graph data-
bases for use cases with large-scale relationship-intensive data sets,
where the ability to efficiently navigate complex networks of relation-
ships is paramount.
It’s important to clarify at this point that this study is not advocating
for the outright dismissal of relational databases from managing inter-
connected data. Practitioners can employ various strategies to minimize
or avoid the use of joins in relational database systems to traverse re-
lationships. Such techniques can include merging tables or incorpo-
rating redundant data within tables to eliminate the necessity for joins,
storing the precomputed results of complex queries that involve joins to
obviate the need for real-time execution, and caching the results of
frequently executed joins. Additionally, designing the database schema
to enable common data requests to be fulfilled with fewer joins and
optimising queries for efficiency are other crucial strategies. All these
approaches, however, often entail trade-offs with regard to database
integrity, system complexity, storage requirements, and other factors.
Consequently, this study encourages practitioners to adopt alternative
technologies that are better suited for traversing complex relationships,
namely the labelled property graph.
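Two of the join-avoidance strategies listed above, storing precomputed join results and caching frequently executed joins, can be sketched as follows. This is a minimal illustration using Python's built-in `sqlite3` and `functools.lru_cache`. The `building`, `adjacent`, and `two_hop` tables and both function names are hypothetical and are not taken from the study.

```python
import sqlite3
from functools import lru_cache

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE building (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE adjacent (from_id INTEGER, to_id INTEGER);
    INSERT INTO building VALUES (1, 'A'), (2, 'B'), (3, 'C');
    INSERT INTO adjacent VALUES (1, 2), (2, 3);
    -- Precomputed result: run the two-hop self-join once and store its output.
    CREATE TABLE two_hop AS
        SELECT a.from_id AS from_id, b.to_id AS to_id
        FROM adjacent AS a JOIN adjacent AS b ON a.to_id = b.from_id;
    CREATE INDEX idx_two_hop_from ON two_hop (from_id);
""")


def two_hop_precomputed(building_id):
    """Read two-hop neighbours from the stored table; no join runs at query time."""
    rows = conn.execute("SELECT to_id FROM two_hop WHERE from_id = ?", (building_id,))
    return tuple(r[0] for r in rows)


@lru_cache(maxsize=1024)
def neighbour_names_cached(building_id):
    """Run the join on first request and serve repeated requests from memory."""
    rows = conn.execute(
        "SELECT b.name FROM adjacent AS a JOIN building AS b ON b.id = a.to_id "
        "WHERE a.from_id = ?",
        (building_id,),
    )
    return tuple(r[0] for r in rows)
```

Both shortcuts illustrate the trade-offs noted above: the `two_hop` table must be rebuilt whenever `adjacent` changes, and cached results become stale unless the cache is cleared after writes.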
This study has demonstrated that the performance of relational
database systems and graph database systems varies significantly when
utilised for different tasks, underscoring that the choice of a data storage
solution is not a one-size-fits-all decision. Although the study discusses
the optimum selection of database systems, it is important to recognise
that in real-world scenarios, the complexity of software systems often
necessitates the integration of multiple data persistence technologies.
Each technology in such system architecture is chosen for its optimal
alignment with particular requirements. In a collaborative environment,
a relational database system could manage structured data with a clearly
defined schema, while a graph database could be used to manage
complex relationships between data points. Therefore, the overarching
aim of this study is not to suggest the superiority of one system over
another but rather to highlight the importance of strategic selection of
data persistence systems based on specific requirements within a
collaborative environment. This collaborative approach allows for the
harnessing of each database system’s strengths, thereby enabling a more
efficient software infrastructure that can meet varying requirements.
5.2. Limitations and future work
While this comparative study endeavours to provide valuable in-
sights regarding the comparative advantages and limitations of data
persistent systems, it also has some known limitations with regard to the
data sets utilised, the configuration of the DBMSs, the optimisation of
the queries and the assessment metrics. The building and environmental
data sets employed in the study are exclusively comprised of static data.
Nonetheless, a significant volume of real-time data is generated at the
building and city levels. For example, the sensor and detector systems in
buildings produce live data and status updates related to various envi-
ronmental factors like temperature, air quality, and potential hazards.
Moreover, building management systems responsible for monitoring
HVAC systems, fire safety systems, and other utility systems can also
generate a substantial amount of live data [38]. Similarly, city-level
sensors generate a significant volume of live data, catering to diverse
use cases such as traffic management and public safety [39]. Conse-
quently, future research works should delve into the assessment of
relevant database systems with regard to managing real-time data.
In acknowledging the factors that can influence the outcomes of the
database comparison presented in the study, it’s crucial to highlight the
significance of decisions related to database configuration and query
optimisation. The extent of tuning and optimisation applied to the target
database systems can potentially lead to disparities that may not solely
reflect the inherent capabilities of the database technologies themselves.
Efforts were made in this study to apply comparable optimisations to
each database system under review, such as the implementation of
indexing, as detailed in Section 3.1. However, achieving identical levels
of tuning and optimisation is challenging due to the inherent differences
in the database systems’ underlying technologies, architecture, and
available optimisation settings. Moreover, the decisions related to
configuration and optimisation inevitably contain a subjective element
that is influenced by the experience and expertise of the practitioner.
Therefore, it is possible to take a different approach towards database
configuration and query optimisation than what is employed in this
study, potentially impacting some of the findings presented here.
An additional limitation of the study that is open for future research
is the measurement metrics and the exemplary use cases used for the
evaluation. The metrics that are the focus of this study are the agreement
of the data model of the database systems with the data to be stored, the
effort required to create and populate the database systems, the
composition of queries in each system and the time required to write and
retrieve interrelated data from the database systems. Nonetheless,
there are more indicators that should be considered for a full evaluation
of data persistent systems. For instance, transaction-related character-
istics play a pivotal role in use cases where numerous users are expected
to access the database simultaneously. The utilisation of mixed load
queries in database comparison is crucial for accurately simulating real-
world conditions where databases are required to handle simultaneous
read and write operations. This approach not only tests database sys-
tems’ efficiency in managing concurrent data access and manipulation
but also can reveal potential bottlenecks and performance trade-offs
between read and write operations. With an increasing number of
users, the database systems’ throughput, which is the number of queries
they can execute per unit of time, becomes an important indicator.
Furthermore, as the size of data sets grows, the scalability of the target
data persistent system becomes imperative. Hence, data partitioning and
distribution are also important factors. The inclusion of these and other
measurement metrics is essential for a fuller comparison of the database
systems. However, providing a comprehensive and thorough explora-
tion of the implications of each of the listed metrics would have
extended beyond the confines of a single article. Consequently, by
concentrating on processing time and database size for quantitative
analysis, this study is able to provide a sufficiently detailed presentation
of findings without compromising the rigour of analysis. In addition to
the measurement metrics, the use case scenarios that are used to guide
the study could also be expanded. The study primarily concentrated on
use cases that highlight the impact of complex data relationships on the
performance of data-persistent systems. Nevertheless, it is acknowledged
here that the management of building and environmental data within
a data-persistent system can vary based on the nature of the application
area under consideration.
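As a rough illustration of the throughput metric mentioned above, the following Python sketch runs a mixed read and write workload against an in-memory SQLite table and reports operations per second. It is a single-threaded toy and does not simulate concurrent users. The `sensor` table, the function name, and the default parameters are assumptions, and a real mixed-load benchmark would drive the target DBMS from multiple clients.

```python
import random
import sqlite3
import time


def mixed_load_throughput(n_ops=2000, write_ratio=0.2, seed=0):
    """Run a single-threaded mix of reads and writes.
    Returns (operations per second, number of reads, number of writes)."""
    rng = random.Random(seed)
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sensor (id INTEGER PRIMARY KEY, value REAL)")
    conn.executemany("INSERT INTO sensor (value) VALUES (?)", [(0.0,)] * 100)
    reads = writes = 0
    start = time.perf_counter()
    for _ in range(n_ops):
        row_id = rng.randint(1, 100)
        if rng.random() < write_ratio:
            # Write operation: update one existing row.
            conn.execute(
                "UPDATE sensor SET value = ? WHERE id = ?", (rng.random(), row_id)
            )
            writes += 1
        else:
            # Read operation: fetch one row by primary key.
            conn.execute("SELECT value FROM sensor WHERE id = ?", (row_id,)).fetchone()
            reads += 1
    # Guard against a zero interval on very coarse clocks.
    elapsed = max(time.perf_counter() - start, 1e-9)
    conn.close()
    return n_ops / elapsed, reads, writes
```

Calling `mixed_load_throughput(write_ratio=0.5)` shifts the mix towards writes, which is one simple way to expose read and write trade-offs of the kind described above.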
The scope of the assessment conducted in this study is focused on the
data models utilised by database systems, as opposed to the database
management systems (DBMS) themselves, namely MySQL and Neo4j.
Nevertheless, it is imperative to acknowledge that the DBMS that is
selected to implement the data models can have some level of influence
on performance. DBMSs offer different query optimisation techniques
which aim to enhance the efficient utilisation of time and resources
during query execution [40]. Moreover, DBMSs offer varying types of
indexes aimed at optimising query execution alongside diverse ap-
proaches for the implementation of these indexes [14]. The variation in
these features among different DBMSs can lead to disparate performance
outcomes, even when they implement similar data models. As a result,
the selection of DBMSs that are different from the ones utilised in this
study may influence the outcomes of the study to some extent. Consequently,
a holistic evaluation of a given database system should consider
these optimisation features alongside the characteristics of the funda-
mental data model implemented by the system. Furthermore, beyond
resource optimisation, there are more facets of DBMSs that warrant
thorough consideration. For instance, considering the security features
provided by a DBMS for safeguarding data in both storage and transit is
of paramount importance, given that database users typically expect
their data to remain private and subject to access control [41]. It is also
essential to evaluate the attributes of the DBMS aimed at mitigating data
loss through mechanisms such as backup and recovery in the event of
system failures [42]. Moreover, factors such as overall operational cost,
compatibility and integration capability with other software systems,
and the extent of support available from both vendors and the broader
community are also important considerations prior to selecting a spe-
cific DBMS.
This comparative study employed a specific type of graph database
technology known as labelled property graphs. While this technology
offers significant advantages in terms of flexibility and intuitive
modelling of graph-based data, it is important to consider other graph
technologies. Among these alternatives, RDF triple stores represent a
promising direction. RDF inherently utilises a graph data model, facili-
tating the representation of complex relationships among data entities in
a manner akin to labelled property graphs. Simultaneously, RDF enables
the definition of standardized schemas through ontologies, offering a
balance between the rigid schemas of relational databases and the
schema-less nature of labelled property graphs.
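The triple-based graph model of RDF can be sketched in a few lines of Python. This is not a triple store and does not model ontologies. The `ex:` identifiers, the predicate names, and both helper functions are made up for illustration only.

```python
# Hypothetical triples of the form (subject, predicate, object).
triples = {
    ("ex:Building1", "rdf:type", "ex:Building"),
    ("ex:Building1", "ex:hasStorey", "ex:Storey1"),
    ("ex:Storey1", "rdf:type", "ex:Storey"),
    ("ex:Storey1", "ex:hasSpace", "ex:Space1"),
    ("ex:Space1", "rdf:type", "ex:Space"),
}


def match(triples, s=None, p=None, o=None):
    """Return all triples matching the pattern, sorted; None acts as a wildcard."""
    return sorted(
        t
        for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


def spaces_of(triples, building):
    """Traverse building -> storey -> space by chaining two triple patterns."""
    storeys = [o for _, _, o in match(triples, s=building, p="ex:hasStorey")]
    return sorted(
        o for storey in storeys for _, _, o in match(triples, s=storey, p="ex:hasSpace")
    )
```

Here `spaces_of(triples, "ex:Building1")` follows two relationships, in a way loosely similar to how a SPARQL query would chain two triple patterns.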
In future research, it would be beneficial to explore which database
systems are appropriate for the various use cases related
to the management of the built environment. Relational databases could
be evaluated for their efficiency in handling structured data, whereas
graph-based systems could be assessed for their effectiveness in
modelling and navigating complex relationships within infrastructure
systems. Further investigations could also focus on hybrid approaches
that leverage the strengths of both database systems. This approach has
the potential to unlock new insights and methods for handling the
increasingly complex data challenges faced in the built environment
management domain.
6. Conclusion
This research provides a comparative assessment of graph-based
database systems and relational database systems in terms of man-
aging building and environmental data. The study is conducted using
four data sets, including two building data sets and two city data sets.
The evaluation focused on a qualitative assessment of data organisation
and maintenance as well as a quantitative evaluation of performance
when traversing relationships and retrieving data from the target data-
base systems. To assess the performance of data retrieval, experiments
are conducted with the number of relationships traversed and the size of
the database as key parameters. The following conclusions are derived
from the findings of the comparative study.
•The conclusions drawn from this evaluation highlight the paramount
significance of data structure and interrelationships when selecting a
database system to store and manage building and environmental
data. In scenarios involving intricate interrelationships and dynamic
structural changes within data sets, the inherent flexibility and
adaptability of graph databases confer a distinct advantage.
Conversely, relational databases may prove more apt for persisting
data characterized by a well-defined, stable structure and fewer
interconnections.
•The study has shown a significant variance in performance between
relational database systems and graph database systems when
applied to different tasks. This emphasizes that selecting a data
storage solution is not a one-size-fits-all decision. The study high-
lights the importance of graph databases for scenarios involving
large-scale, relationship-intensive datasets, where efficiently navi-
gating complex networks of relationships is crucial. Conversely,
relational database technology is better suited for data management
applications with straightforward data retrieval needs, where the
complexity of relationships between data entities is minimal.
Effective management of building and environmental data can sup-
port several decision-making processes in the management of the built
environment. This necessitates the careful selection of a suitable data
persistence system that can effectively organise data, facilitating its
seamless access and utilisation for the intended use case. This study has
made a valuable contribution towards this objective by conducting an
assessment of relational and graph-based database systems, highlighting
their relative advantages and limitations in managing building and
environmental data. The findings can serve as a valuable resource for
selecting an appropriate data persistence system to manage building and
environmental data effectively. Subsequent investigations may further
enrich this field by applying suitable data persistent systems to
various practical use cases related to the built environment.
CRediT authorship contribution statement
Eyosias Dawit Guyo: Writing – review & editing, Writing – original
draft, Visualization, Validation, Software, Resources, Methodology,
Investigation, Formal analysis, Data curation, Conceptualization. Timo
Hartmann: Writing – review & editing, Supervision, Project adminis-
tration, Funding acquisition.
Declaration of competing interest
The authors declare that they have no known competing financial
interests or personal relationships that could have appeared to influence
the work reported in this paper.
Data availability
Data will be made available on request.
Acknowledgement
This project has received funding from the European Union’s Horizon
2020 research and innovation programme under the Marie
Skłodowska-Curie grant agreement No 860555.
References
[1] The Sedona Conference, 2020. The Sedona Conference Glossary: eDiscovery &
Digital Information Management, Fifth. ed.
[2] C.J. Date, The Relational Database Dictionary, Apress, Berkeley, CA. (2008),
https://doi.org/10.1007/978-1-4302-1042-9.
[3] R. Elmasri, S.B. Navathe, Fundamentals of Database Systems, Sixth. ed.,
Addison-Wesley, 2011.
[4] I. Robinson, J. Webber, E. Eifrem, Graph Databases: New Opportunities for
Connected Data, 2nd ed., O’Reilly Media Inc., 2015.
[5] Murty, P.S.R., 2017. Power Systems Analysis, 2nd ed. Elsevier. https://doi.org/
10.1016/B978-0-08-101111-9.00002-1.
[6] Harrington, J.L., 2016. Relational Database Design and Implementation, Fourth.
ed.
[7] M.J. Donahoo, G.D. Speegle, SQL: Practical Guide for Developers, Elsevier (2005),
https://doi.org/10.1016/B978-0-12-220531-6.X5000-3.
[8] P. Sadalage, M. Fowler, NoSQL Distilled: A Brief Guide to the Emerging World of
Polyglot Persistence, Pearson Education Inc, Vasa, 2012.
[9] Harrington, J.L., 2002. Relational Database Design Clearly Explained, 2nd ed.
Elsevier. https://doi.org/10.1016/B978-1-55860-820-7.X5000-4.
[10] O. Curé, G. Blin, RDF Database Systems: Triples Storage and SPARQL Query
Processing, Elsevier (2015), https://doi.org/10.1016/C2013-0-14009-3.
[11] A. Vukotic, N. Watt, T. Abedrabbo, D. Fox, Neo4j in Action, Manning Publications,
Shelter Island, 2015.
[12] Carpenter, J., Hewitt, E., 2016. Cassandra: The Definitive Guide, Second. ed.
[13] J.L. Harrington, SQL Clearly Explained, Third. Ed. Elsevier. (2010), https://doi.
org/10.1016/C2009-0-61592-0.
[14] T. Halpin, T. Morgan, Information Modeling and Relational Databases, Second. Ed.
Elsevier. (2008), https://doi.org/10.1016/B978-0-12-373568-3.X5001-2.
[15] G. Vaish, Getting Started with NoSQL, Packt Publishing, 2013.
[16] D. Pritchett, BASE: An Acid Alternative, Queue 6 (2008) 48–55, https://doi.org/
10.1145/1394127.1394128.
[17] J. Antas, R. Rocha Silva, J. Bernardino, Assessment of SQL and NoSQL Systems to
Store and Mine COVID-19 Data, Computers 11 (2022) 29, https://doi.org/
10.3390/computers11020029.
[18] M.M. Eyada, W. Saber, M.M. El Genidy, F. Amer, Performance Evaluation of IoT
Data Management Using MongoDB Versus MySQL Databases in Different Cloud
Environments, IEEE Access 8 (2020) 110656–110668, https://doi.org/10.1109/
ACCESS.2020.3002164.
[19] H. Matallah, G. Belalem, K. Bouamrane, Comparative Study Between the MySQL
Relational Database and the MongoDB NoSQL Database, Int. J. Softw. Sci. Comput.
Intell. 13 (2021) 38–63, https://doi.org/10.4018/IJSSCI.2021070104.
[20] C.A. Győrödi, D.V. Dumşe-Burescu, D.R. Zmaranda, R.Ş. Győrödi, G.A. Gabor,
G.D. Pecherle, Performance Analysis of NoSQL and Relational Databases with
CouchDB and MySQL for Application’s Data Storage, Appl. Sci. 10 (2020) 8524,
https://doi.org/10.3390/app10238524.
[21] V. Abramova, J. Bernardino, P. Furtado, SQL or NoSQL? Performance and
scalability evaluation, Int. J. Bus. Process Integr. Manag. 7 (2015) 314, https://doi.
org/10.1504/IJBPIM.2015.073655.
[22] P. Kotiranta, M. Junkkari, J. Nummenmaa, Performance of Graph and Relational
Databases in Complex Queries, Appl. Sci. 12 (2022), https://doi.org/10.3390/
APP12136490.
[23] V. Abramova, J. Bernardino, P. Furtado, Experimental Evaluation of NoSQL
Databases, Int. J. Database Manag. Syst. 6 (2014) 01–16, https://doi.org/10.5121/
ijdms.2014.6301.
[24] H. Matallah, G. Belalem, K. Bouamrane, Experimental comparative study of NoSQL
databases: HBase versus MongoDB by YCSB, Comput. Syst. Sci. Eng. 32 (2017)
307–317.
[25] D. De Witte, F. Pattyn, L. De Vocht, H. Constandt, E. Mannens, R. Verborgh,
K. Knecht, R. Van De Walle, Big Linked data ETL benchmark on cloud commodity
hardware, in: Proceedings of the ACM SIGMOD International Conference on
Management of Data, Association for Computing Machinery, 2016,
https://doi.org/10.1145/2928294.2928304.
[26] P. Pauwels, T.M. de Farias, C. Zhang, A. Roxin, J. Beetz, J. De Roo, C. Nicolle,
A performance benchmark over semantic rule checking approaches in construction
industry, Adv. Eng. Informatics 33 (2017) 68–88, https://doi.org/10.1016/j.
aei.2017.05.001.
[27] DB-Engines, 2023. DB-Engines Ranking - popularity ranking of relational DBMS
[WWW Document]. URL https://db-engines.com/en/ranking/relational+dbms
(accessed 6.17.23).
[28] City of Espoo, 2023. Espoo’s 3D city model [WWW Document]. URL https://kartat.
espoo.fi/3d/citymodel_en.html.
[29] buildingSMART International, 2023. Software Implementations [WWW
Document]. URL https://technical.buildingsmart.org/resources/software-
implementations/ (accessed 9.11.23).
[30] buildingSMART International, 2017. Industry Foundation Classes 4.0.2.1 [WWW
Document]. URL https://standards.buildingsmart.org/IFC/RELEASE/IFC4/ADD2_
TC1/HTML/ (accessed 3.9.21).
[31] Wysocki, O., Schwab, B., Willenborg, B., 2022. Awesome CityGML [WWW
Document]. URL https://github.com/OloOcki/awesome-citygml.
[32] OpenStreetMap, 2023. OpenStreetMap [WWW Document]. URL https://www.
openstreetmap.org/about (accessed 6.17.23).
[33] J. Jokar Arsanjani, A. Zipf, P. Mooney, M. Helbich, An Introduction to
OpenStreetMap in Geographic Information Science: Experiences, Research, and
Applications. (2015) 1–15, https://doi.org/10.1007/978-3-319-14280-7_1.
[34] Mooney, P., Minghini, M., 2017. A Review of OpenStreetMap Data, in: Mapping
and the Citizen Sensor. Ubiquity Press, London, pp. 37–59.
https://doi.org/10.5334/bbf.c.
[35] Costa, G., Sicilia, Á., Madrazo, L., Scaramella, L., Martín, S., Izkara, J., Prieto, I.,
Katsigarakis, K., 2017. D2.6: Validation of the district data model repository and
exchange protocols.
[36] S. Roman, Access Database Design & Programming, 3rd ed., O’Reilly Media
Inc., 2002.
[37] Hunger, M., Boyd, R., Lyon, W., 2021. The Definitive Guide to Graph Databases for
the RDBMS Developer 34.
[38] Y.Y. Ghadi, M.G. Rasul, M.M.K. Khan, Design and development of advanced fuzzy
logic controllers in smart buildings for institutional buildings in subtropical
Queensland, Renew. Sustain. Energy Rev. 54 (2016) 738–744,
https://doi.org/10.1016/j.rser.2015.10.105.
[39] J. Zhao, H. Xu, H. Liu, J. Wu, Y. Zheng, D. Wu, Detection and tracking of
pedestrians and vehicles using roadside LiDAR sensors, Transp. Res. Part C Emerg.
Technol. 100 (2019) 68–87, https://doi.org/10.1016/j.trc.2019.01.007.
[40] Hellerstein, J., 2015. Chapter 7: Query Optimization, in: Bailis, P., Hellerstein, J.M.,
Stonebraker, M. (Eds.), Readings in Database Systems.
[41] Ramakrishnan, R., Gehrke, J., 2003. Database Management Systems.
[42] Bailis, P., 2015. Chapter 3: Techniques Everyone Should Know, in: Bailis, P.,
Hellerstein, J.M., Stonebraker, M. (Eds.), Readings in Database Systems.
[43] Amazon Web Services, 2023. Amazon DocumentDB: Developer Guide.
[44] Bell, D.A. (Ed.), 1986. Relational Databases. Elsevier.
https://doi.org/10.1016/C2013-0-03808-X.
[45] International Organisation for Standardisation (ISO), 2018. ISO 16739-1:2018
Industry Foundation Classes (IFC) for data sharing in the construction and facility
management industries — Part 1: Data schema.
[46] Ministry of Land, Infrastructure, Transport and Tourism, 2023. 3D city model
(Project PLATEAU) 23 wards of Tokyo [WWW Document]. URL
https://www.geospatial.jp/ckan/dataset/plateau-tokyo23ku (accessed 6.17.23).
[47] Open Geospatial Consortium (OGC), 2021. OGC City Geography Markup Language
(CityGML) Part 1: Conceptual Model Standard.
[48] The World Wide Web Consortium (W3C), 2014. RDF 1.1 Concepts and Abstract
Syntax [WWW Document]. URL https://www.w3.org/TR/rdf11-concepts/
(accessed 11.11.23).