Advanced Engineering Informatics 62 (2024) 102582

Available online 16 May 2024

Full length article

Evaluating the efficiency and performance of data persistent systems in managing building and environmental Data: A comparative study

Eyosias Dawit Guyo, Timo Hartmann

Technische Universität Berlin, Germany

Trimble Solutions Oy, Finland

ARTICLE INFO

Keywords:

Building data

Environmental data

Interrelated data

Relational database

Graph-based database

Comparative evaluation

ABSTRACT

Selecting an appropriate data persistent system for a specific use case necessitates a thorough examination of the application domain and the characteristics of the data expected to be stored. While comparative studies of data persistent systems exist in various domains, there is a notable absence of such studies concerning building and environmental data management. This research aims to bridge this gap by conducting a comparative evaluation based on building and environmental datasets and use cases. The study primarily focuses on two types of database systems, namely relational database systems and graph-based database systems. Two building and two city models are employed in the evaluation. The building data sets are extracted from IFC models, and environmental data are extracted from CityGML and OpenStreetMap. The assessment involves qualitatively analysing the database design process of the systems and quantitatively evaluating the efficiency of retrieving data from those systems. The comparative evaluation identifies at least two crucial aspects to consider when selecting a suitable data-persistent system for managing building and environmental data. The first aspect pertains to the stability of the data to be stored, along with the complexity of interrelationships within the building and environmental dataset. The second aspect involves the manner in which data is retrieved to accomplish different tasks within the particular business case. The findings demonstrate that use cases that typically manage interrelated data and necessitate the traversal of complex relationships between building and environmental features are better managed by graph-based database systems, particularly when dealing with large datasets. Conversely, relational databases exhibit superior performance for use cases requiring minimal or no relationship traversal, regardless of dataset size. The contributions of this study can serve as valuable input when designing information management tools and systems for building and environmental data management.

1. Introduction

Software applications that utilise building and environmental data

can play a crucial role in supporting decision-making processes associ-

ated with the design, construction, and operation of building facilities.

These applications generally deal with data that can be categorised as

either ephemeral or persistent in nature. Ephemeral data refers to tran-

sitory data that exists for a short duration of time and is typically stored

in temporary storage [1]. Such data ceases to exist once the operation

responsible for its creation terminates. In contrast, persistent data is

intended to be permanently stored unless explicitly deleted [2]. Data

residing in persistent storage exhibits greater independence from the

application that created it and can be accessed again in future sessions

and even by a different application [44]. Various methods can be

employed to persist data, one of which is by using database systems.

A database is more than a simple aggregation and storage of data.

Instead, it is a collection of related data that is logically organised for a

specific purpose [3]. It can be created, managed, and accessed with the

help of computer applications referred to as Database Management

Systems (DBMS). These systems utilise various data models to organise

and store data. One of the data models that is widely used by numerous

DBMSs is the relational model. In the relational model, data is organised

as a collection of relations or related data points, commonly represented

as rows in tables [3]. A single relational database can contain multiple

tables that are often associated with each other, enabling the storage and

retrieval of interrelated data. The data stored in the relational database

is accessed and manipulated by the Structured Query Language or SQL,

which is also often used to refer to relational database systems in general.

* Corresponding author at: Gustav-Meyer-Allee 25 13555, Berlin, Germany.

E-mail address: [email protected] (E.D. Guyo).

https://doi.org/10.1016/j.aei.2024.102582

Received 11 January 2024; Received in revised form 22 April 2024; Accepted 30 April 2024

In addition to relational models or SQL, other alternative data

models are used to persist data in a database. While these alternative

methods are distinct from one another, they are often collectively

referred to as non-relational database systems or NoSQL. Some of these

non-relational database systems primarily focus on aggregating related

data points, while others chiefly focus on representing the relationship

between data points [4]. The graph-based database is one example of a

relationship-oriented database system that does not follow the relational

model. Instead, it represents data using a graph model. The model

typically uses nodes and edges to represent data points and their re-

lationships. Most situations that involve objects with a high level of

interrelationship can be represented using a graph model (hence using a

graph database) [5]. While both relational and graph-based database

systems possess the ability to represent complex relationships, they also

have significant differences. Consequently, thorough consideration is

necessary before selecting either as a data persistence method for a

specific use case.

This research aims to conduct a comparative analysis of relational

and graph database systems to evaluate the relative advantages and

limitations of these systems in managing interrelated building and

environmental data. To meet this research objective, four real-world

data sets consisting of two building data sets and two city data sets

are obtained. These data sets are subsequently stored and managed

using the target data persistent systems for the assessment. The first part

of the assessment involves a qualitative evaluation of the process of

designing each database system, populating them with the provided test

data sets, as well as manipulating them to store evolving data. In the

second part of the assessment, quantitative performance-focused ex-

periments are conducted to assess the speed and cost associated with

accessing interrelated data from the database systems. The tests cover

various scenarios that represent different combinations of database size

and levels of interrelationship within the data sets.

The study’s findings indicate that the complexity of the in-

terrelationships between building and environmental features in data

sets should be an essential consideration when selecting a data-

persistent system. The qualitative evaluation of the target database

systems has determined that building data sets that represent intricate

relationships between different building features and spaces, as well as

city data sets, which represent complex relationships between environ-

mental features, are better represented using a graph database. Addi-

tionally, the quantitative performance evaluation has demonstrated that

the manner in which data is retrieved to execute different tasks in a

specific business case should also be a critical factor to consider prior to

selecting a database system. In situations where data retrieval from a

database requires traversing a minimal number of relationships, rela-

tional database systems outperform their graph-based counterparts,

regardless of the database’s size. However, when a given use case ne-

cessitates traversing a substantial number of relationships, the perfor-

mance advantage often shifts to the graph database solutions. Overall,

the outcomes of this study offer valuable insights that can prove highly

beneficial when deciding on a suitable data persistence system for the

efficient management of building and environmental data across various

applications within the domain of built environment management.

Finally, while this study discusses the selection of suitable database

systems, it is crucial to acknowledge that real-world software systems

often require the integration of multiple data persistence technologies

due to their inherent complexity. Therefore, the primary objective of

this study is not to advocate for the superiority of any single system, but

rather to underscore the significance of strategically selecting data

persistence systems based on specific requirements within a collabora-

tive environment.

The paper is structured as follows: Section 2 provides a theoretical

background about different database concepts and database systems

that are essential to understand the research. In Section 3, the research

method is thoroughly discussed, including details about the test data

sets, the assessment metrics and the configuration of the experiments.

Then, in Section 4, the result of the comparative assessment is presented.

Section 5 discusses the insights gained from the research results. The

discussion includes the practical implications of the research, its limi-

tations, and potential future directions. Finally, concluding remarks are

given in Section 6.

2. Technical background and related works

Databases serve as essential tools for the persistent storage of data.

They facilitate the decoupling of data from the applications that

generate it, thereby ensuring that data remains accessible for future

utilisation by either the same or different applications [44]. Rather than

merely storing data, databases also include valuable information

regarding the relationships between stored data [6]. Hence, they func-

tion as repositories where related data is stored, organised and accessed

[7]. Databases are accessed and managed through different database

management systems (DBMS). These DBMSs have key defining features

that influence how data is going to be stored and accessed, with one

crucial aspect being the data model they employ for data persistence.

This section will provide some technical background regarding these

data models that is essential for understanding this research study.

Additionally, it will also offer a summary of prior research that relates to

data persistent models, underscoring the research gap that this study

aims to address.

2.1. Relational and Non-Relational database systems

Different database management systems employ different models for

organizing and storing data. The most widely utilised model among

them is the relational data model, which has been the primary choice for

data storage for many decades [8]. However, in recent years, alternative

data persistence models have emerged and been embraced by various

database management systems. These alternative models are collec-

tively known as non-relational database systems. While these non-

relational data models exhibit several commonalities, they are distinct

from one another. The following section offers an exploration of the

defining features of both relational data models and non-relational data

models, which are employed for data persistence in different database

systems.

2.1.1. Relational database systems

Relational database systems utilise a relational model that is founded

on relations [9]. A relation contains a finite set of ordered data points or

tuples [3]. All tuples in a relation have the same set of fields [10]. Re-

lations are represented using tables in relational databases where col-

umns represent data fields and rows represent a single record, which is a

set of related data points (tuple). Each column in a relational database

table has a unique name and specific data type that it can store. One or

more columns in a table can be used to store unique keys that can be

used to identify and retrieve each row or record in the table. Such col-

umns are called primary keys. Relational database systems are the most

popular database systems currently in use. Some popular relational

DBMSs include Oracle, MySQL, and PostgreSQL. Relational databases

are suitable for business cases in which the data to be stored is well-

known, stable and predictable where changes in the data structure are

expected to be rare and non-urgent [6].

Relational database systems can support some level of relationship

between the data sets they store. A single relational database can contain

multiple tables which can be related to one another. The relationship

could be one-to-one, one-to-many or many-to-many. In some instances,

a table can also be associated with itself. Foreign keys, which are con-

straints set on columns, are used to establish relationships between ta-

bles in a database. By setting a foreign key constraint on a column from

one table, a reference to another table can be established. These keys can

then be used to request data that is distributed across multiple tables.

Relational database systems use joins to retrieve data that is saved across

multiple interrelated tables during read operations. Joins can also be

used to connect a table with itself. They are often created by overlapping

two tables using their common column. When two tables are joined, the

query processor conceptually first creates a table using a cross-product of the two

tables. In other words, it pairs all the rows of one table with all rows of

the other table [7]. If there is a third table to join, a cross-product of the

resulting table from the first join with the third table will be performed,

and the results will be stored in a new table.
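The foreign key and join mechanism described above can be illustrated with a minimal sketch that uses Python's standard-library sqlite3 module. The tables building and space, their columns, and the sample rows are hypothetical and are not taken from the data sets used in this study.

```python
import sqlite3

# In-memory database; table and column names are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute("CREATE TABLE building (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE space ("
    " id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " building_id INTEGER NOT NULL REFERENCES building(id))"  # foreign key: one-to-many
)
conn.executemany("INSERT INTO building VALUES (?, ?)", [(1, "Office A"), (2, "Lab B")])
conn.executemany(
    "INSERT INTO space VALUES (?, ?, ?)",
    [(10, "Lobby", 1), (11, "Meeting room", 1), (12, "Clean room", 2)],
)

# The join pairs rows of the two tables and keeps those whose key columns match.
joined = conn.execute(
    "SELECT building.name, space.name FROM space"
    " JOIN building ON space.building_id = building.id"
    " ORDER BY space.id"
).fetchall()

# A cross join without a condition yields every pairing (2 buildings x 3 spaces).
cross_count = conn.execute(
    "SELECT COUNT(*) FROM building CROSS JOIN space"
).fetchone()[0]
```

The cross join count shows the size of the full cross-product, whereas the join condition reduces the result to the rows that are actually related.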

2.1.2. Non-Relational database systems

Non-relational data persistence methods do not refer to a single data

storage model. Instead, they encompass multiple distinct systems that

are not based on a relational model. The following are the most common

types of non-relational database systems.

2.1.2.1. Key-Value store. Key-Value databases offer the simplest struc-

ture to store data, where records are stored as pairs comprising a unique

key and its associated value. Any type of data can be stored as a value

which can be accessed and retrieved using its key. These data persistence

systems are ideal for quick data access and exhibit high scalability [4].

However, they are unsuitable for storing interconnected data or

executing complex queries. Key-Value stores are utilised in use cases

such as caching, storing user data, storing large image databases, large

catalogues and websites that have a large number of pages [6]. Exam-

ples of key-value DBMSs include Redis, Amazon DynamoDB, Microsoft

Azure Cosmos DB and Memcached.

2.1.2.2. Document-Based database systems. Document-based database

systems resemble key-value databases, with the distinction that instead

of values, they store self-contained documents [11]. Each document in a

document store has a unique key that is used to identify and retrieve the

document. Additional keys within the documents can also be used to

retrieve specific documents. While document databases offer advantages

similar to key-value stores, they may exhibit slightly lower performance

since they may contain multiple data fields within a single record [11].

Therefore, they are suitable for storing semi-structured data with mul-

tiple attributes. For instance, they can be used to store user profiles

where multiple attributes specific to each user are stored or for content

management, where different types of content need to be saved [43].

Popular document-based database systems include MongoDB, Amazon

DynamoDB, Databricks, and Microsoft Azure Cosmos DB.
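As a rough illustration of the document model described above, the following sketch mimics a document store with a plain Python dictionary. It is not the API of any of the listed products; the keys, field names, and helper functions get and find are assumptions made for this example.

```python
import json

# A toy in-memory document store: a unique key maps to a self-contained document.
# Keys, field names, and values are illustrative assumptions.
store = {
    "user:1": {"name": "Ada", "role": "architect", "projects": ["P1", "P2"]},
    "user:2": {"name": "Ben", "role": "engineer"},  # no "projects" field is required
}

def get(key):
    """Retrieve one document by its unique key."""
    return store[key]

def find(field, value):
    """Return the keys of documents whose inner field equals value (full scan)."""
    return sorted(k for k, doc in store.items() if doc.get(field) == value)

architects = find("role", "architect")
serialised = json.dumps(get("user:2"), sort_keys=True)
```

Each document carries only the attributes relevant to it, which is why such stores suit semi-structured records.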

2.1.2.3. Column-Oriented database systems. Column-oriented database

systems store data in columns, which are the fundamental units of data

storage where each one of them consists of a name-value pair. These

columns can be grouped into column families, which results in a named

set of columns where each set gives a complete view of one entity [4]. By

introducing related key-value pairs, the column database system en-

hances the structure of key-value stores [11]. However, similar to key-

value and document-based systems, column-oriented databases are

primarily based on data aggregation and do not support joins and re-

lationships [4]. Nevertheless, they offer fast data aggregation capabil-

ities [6]. Prominent examples of column-oriented DBMS include

Cassandra, HBase and Microsoft Azure Cosmos DB.

2.1.2.4. Labelled property graphs. Graphs, typically made up of nodes

and edges (relationships between nodes), are used by some database

management systems to model scenarios involving distinct objects with

interrelationships [5]. One common type of a graph-based database

system is the labelled property graph. In labelled property graphs, nodes

are labelled and interconnected through edges (relationships), and both

nodes and edges can possess properties [4]. Property graphs offer a

simple, straightforward, and compact data representation that closely

resembles real-world relationships. Moreover, these systems provide an

efficient system to traverse relationships where traversing is localised to

the target record without necessitating a scan of the entire database. As a

result, they excel at representing data with complex relationships and

efficiently traversing those relationships [11]. These database systems

are particularly suitable for use cases where managing relationships and

paths between data is the primary requirement, such as in social

networking applications, route-finding tools, and recommendation en-

gines. Examples of popular graph DBMS include Neo4j, Microsoft Azure

Cosmos DB, and ArangoDB.
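The labelled property graph structure and its localised traversal can be sketched in plain Python as follows. This is a simplified in-memory illustration rather than the storage layout of any particular graph DBMS; the node identifiers, labels, edge types, and the reachable helper are all hypothetical.

```python
from collections import deque

# Nodes carry a label and properties; edges carry a type and properties.
# All identifiers and values below are hypothetical.
nodes = {
    "b1": {"label": "Building", "name": "Office A"},
    "s1": {"label": "Storey", "level": 0},
    "r1": {"label": "Space", "name": "Lobby"},
    "r2": {"label": "Space", "name": "Meeting room"},
}
edges = [
    ("b1", "CONTAINS", "s1", {}),
    ("s1", "CONTAINS", "r1", {}),
    ("s1", "CONTAINS", "r2", {}),
    ("r1", "ADJACENT_TO", "r2", {"via": "door"}),
]

# Adjacency list: each node points directly at its outgoing edges, so a
# traversal only touches the neighbourhood of the nodes it visits.
adjacency = {node_id: [] for node_id in nodes}
for source, edge_type, target, props in edges:
    adjacency[source].append((edge_type, target, props))

def reachable(start, edge_type):
    """Breadth-first traversal following only edges of the given type."""
    seen, queue, order = {start}, deque([start]), []
    while queue:
        current = queue.popleft()
        for etype, target, _ in adjacency[current]:
            if etype == edge_type and target not in seen:
                seen.add(target)
                order.append(target)
                queue.append(target)
    return order

contained = reachable("b1", "CONTAINS")
```

The traversal starts at one node and follows its edges directly, without scanning unrelated records.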

2.1.2.5. RDF triple stores. The Resource Description Framework (RDF)

is a framework that adopts a triple structure (RDF triples), comprising a

subject, predicate, and object, to describe concepts in a given domain

[48]. RDF has a standardised specification, which is maintained by the

W3C. Furthermore, although not obligatory, ontologies are often uti-

lised with RDF stores to provide a formalised and shared vocabulary and

relationships within the target domain [10]. These ontologies serve as a

schema to organise RDF triples. Another distinguishing feature of RDF is

the application of unique and global identification to resources, referred

to as Uniform Resource Identifier (URI), which facilitates the integration

of data across diverse data sets. Consequently, RDF stores excel in defining

data semantics and linking data across data sets. Examples of DBMSs

that can store RDF triples include Virtuoso, Apache Jena and GraphDB.
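The subject, predicate, object structure can be sketched with plain Python tuples. This is not an RDF library and it does not implement SPARQL; the example.org URIs, the predicate names, and the match helper are assumptions made for illustration only.

```python
# Each statement is a (subject, predicate, object) triple. The URIs use the
# reserved example.org domain and are purely illustrative.
EX = "http://example.org/"
triples = {
    (EX + "building/1", EX + "hasStorey", EX + "storey/1"),
    (EX + "storey/1", EX + "hasSpace", EX + "space/1"),
    (EX + "space/1", EX + "name", "Lobby"),
}

def match(s=None, p=None, o=None):
    """Return triples matching a pattern; None acts as a wildcard."""
    return sorted(
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )

storeys = [o for _, _, o in match(s=EX + "building/1", p=EX + "hasStorey")]
```

Because every resource is identified by a global URI, triples from different data sets that mention the same URI can be merged into one set without renaming.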

2.1.3. Important database features

The following section briefly introduces some fundamental database

features that are crucial for effective database management. Further-

more, the discussion offers a high-level insight into the distinction be-

tween relational and non-relational database systems in relation to these

key database features. This discussion is imperative for understanding

the comparative assessment that is presented in this study.

Query Language: Database users, including both human users and

computer applications, use different querying languages and general-

purpose programming languages to access and manipulate databases.

Queries are used to perform the four essential operations in database

management, which are Create, Read, Update and Delete or CRUD.

Furthermore, queries can be employed to generate a customised view of

a database tailored to the requirements of a specific use case [44]. Da-

tabases can be modelled to perform a specific set of queries efficiently.

However, such database optimisation can lead to suboptimal perfor-

mance when attempting to execute a different set of queries than the

planned one [10].
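The four CRUD operations can be illustrated with a short sketch that issues SQL statements through Python's standard-library sqlite3 module. The sensor table and its values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sensor (id INTEGER PRIMARY KEY, room TEXT, temperature REAL)")

# Create
conn.execute("INSERT INTO sensor VALUES (1, 'Lobby', 20.5)")
conn.execute("INSERT INTO sensor VALUES (2, 'Lab', 18.0)")
# Read
before = conn.execute("SELECT temperature FROM sensor WHERE id = 1").fetchone()[0]
# Update
conn.execute("UPDATE sensor SET temperature = 21.0 WHERE id = 1")
after = conn.execute("SELECT temperature FROM sensor WHERE id = 1").fetchone()[0]
# Delete
conn.execute("DELETE FROM sensor WHERE id = 2")
remaining = conn.execute("SELECT COUNT(*) FROM sensor").fetchone()[0]
```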

The Structured Query Language (SQL) is used to retrieve data from

relational databases as well as to perform various kinds of database

manipulations. SQL has gained widespread adoption across multiple

relational database management systems, enabling users to avoid

vendor lock-in by providing interoperability across different environ-

ments. In contrast, various distinct query languages have been devised

for various non-relational database systems. One such example is the

Cypher query language, which is used for querying Neo4j property

graphs [11]. Meanwhile, the SPARQL query language was developed to

query data from RDF triple stores [10]. Another example is the Cas-

sandra Query Language (CQL), which was introduced to manipulate

Cassandra wide-column stores [12]. Evidently, non-relational database

systems lack a universally standardised query language comparable to

SQL, which limits their cross-system compatibility. Consequently, users

who opt for a specific vendor’s non-relational database face the risk of

being tied to that vendor’s environment, as data migration and inte-

gration options are restricted.

Database Schema: When a relational database is created, it is

mandatory to define a schema that describes the database’s data struc-

ture. The schema serves as a logical blueprint for the database, outlining

the tables, relationships, and constraints that can be stored [13]. The

database is then populated by adding data to the tables defined within

the schema [7]. The schema imposes restrictions on the nature and size

of data inserted into specific columns as well as the permissibility of null

and duplicate values. A schema is defined during database design and is

not expected to undergo frequent changes [3]. Hence, designing a

schema requires thoughtful consideration of the data to be stored.

However, it is not uncommon to encounter scenarios where a data set

cannot be fully represented by a schema. In such cases, sparsely popu-

lated tables are created, where certain columns are left without values

[4]. When the variability between the data fields within records is sig-

nificant, it will lead to tables with numerous unused data fields. For that

reason, relational databases are a good fit for use cases where the data to

be stored is well-known, stable and predictable, and changes in the data

structure are rare and non-urgent [6].
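A minimal sketch of how a relational schema restricts the data that can be inserted is given below, again using Python's standard-library sqlite3 module. The wall table, its columns, and the sample values are assumptions made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The schema fixes column names, types, and null/duplicate rules up front.
conn.execute(
    "CREATE TABLE wall ("
    " id INTEGER PRIMARY KEY,"
    " guid TEXT NOT NULL UNIQUE,"
    " thickness_mm REAL,"      # optional: may be NULL
    " fire_rating TEXT)"       # optional: may be NULL
)
conn.execute("INSERT INTO wall (guid, thickness_mm) VALUES ('w-001', 200.0)")

rejected = []
for label, sql in [
    ("missing guid", "INSERT INTO wall (thickness_mm) VALUES (150.0)"),
    ("duplicate guid", "INSERT INTO wall (guid) VALUES ('w-001')"),
]:
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        rejected.append(label)

# The stored row has no fire rating, so that column is simply left empty.
fire_rating = conn.execute(
    "SELECT fire_rating FROM wall WHERE guid = 'w-001'"
).fetchone()[0]
```

The empty fire_rating value is a small instance of the sparsely populated columns discussed above: the schema reserves the field for every row, whether or not a given record has a value for it.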

In contrast, most non-relational database systems are schemaless and

eliminate the need to define entities and relationships when the data-

base is created. Instead, applications must enforce schemas when

reading from the database. These systems offer flexibility when writing

to the database by allowing the addition of new data without disturbing

existing functionalities [4]. Unlike relational databases, attributes

irrelevant to a particular record are not created for that record, thereby

avoiding empty data fields. As a result, schema-free non-relational

database systems better accommodate evolving data sets when

compared to relational databases [4]. However, it should be noted that

some non-relational database systems, RDF stores specifically, do have a

standardised schema.

Indexes: In many database systems, each record comes with a

unique key that serves as an identifier and facilitates record retrieval

through queries. Depending on the specific database systems, other data

fields within a record can also be used to retrieve the record. The data

fields that are frequently used to query the database are often indexed to

improve the efficiency of searching data. An index is typically an ordered

list of keys that point to the location of the full records stored in a

database [14]. If indexes are absent, the entire database needs to be

scanned when searching for records. With indexes, however, only a

limited portion of the database is scanned, reducing the time required to

retrieve data from the database. Nevertheless, indexes come with costs

associated with their storage and maintenance. They can slow down

updating operations since they also need to be updated when changes

are made to the table.
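The effect of an index on how a query is executed can be observed with SQLite's EXPLAIN QUERY PLAN statement, as in the following sketch. The element table, the index name, and the plan helper are hypothetical, and the exact wording of the plan text differs between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE element (id INTEGER PRIMARY KEY, guid TEXT, kind TEXT)")
conn.executemany(
    "INSERT INTO element (guid, kind) VALUES (?, ?)",
    [(f"g-{i}", "wall" if i % 2 else "door") for i in range(1000)],
)

def plan(sql):
    """Return SQLite's textual query plan for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)

query = "SELECT * FROM element WHERE guid = 'g-500'"
plan_without_index = plan(query)  # reports a scan of the whole table
conn.execute("CREATE INDEX idx_element_guid ON element (guid)")
plan_with_index = plan(query)     # reports a search that uses the new index
```

Printing the two plan strings shows the shift from a full table scan to an index search for the same query.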

Scaling: The scalability of a database is an important aspect to

consider, as the size of the database can potentially exceed the capacity

of a single machine. Scalability can be approached either vertically or

horizontally. Vertical scaling, also known as scaling up, entails acquiring

a new machine with greater processing power or upgrading an existing

machine by equipping it with hardware that has greater processing

power. This approach to scaling can prove to be costly and is constrained

by the present advancements in computer processor technologies. The

other alternative to scale is horizontal scaling or scaling out, which

entails adding more machines to a system so it can accommodate a

higher workload. Sharding or partitioning a database is a common way to

scale horizontally. It involves splitting a database into unique pieces

known as shards, which are subsequently distributed across multiple

machines [6]. One of the primary driving forces behind the develop-

ment of non-relational database systems was the need to address the

scalability challenges of relational database systems [15]. Consequently,

the aggregation-based non-relational data models are more suited for

horizontal scaling, given that they inherently possess compartmental-

ised data units that can naturally be distributed across multiple ma-

chines [8]. In contrast, sharding is significantly harder in graph

databases due to their relationship-oriented nature, which typically re-

sults in a lack of boundaries within the database that can be used for

partitioning [11]. Nonetheless, the implementation of a Uniform

Resource Identifier (URI) in certain graph database systems, such as RDF

triple stores, can alleviate some of the challenges associated with

sharding. By relying on URIs to identify entities and relationships, it

becomes straightforward to split data across distributed servers that

support the HTTP protocol.
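The idea of splitting a database into shards can be sketched with a simple hash-based partitioning function. Hash partitioning is only one of several possible sharding strategies and is chosen here as an assumption for illustration; the record keys and the shard_for helper are hypothetical.

```python
import hashlib

def shard_for(key, shard_count):
    """Map a record key to a shard index using a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count

# Hypothetical record keys distributed over three machines.
keys = [f"building:{i}" for i in range(12)]
shards = {index: [] for index in range(3)}
for key in keys:
    shards[shard_for(key, 3)].append(key)
```

This works well for self-contained records, since each key can be placed independently. For highly interconnected data, records linked by a relationship may land on different shards, which is the difficulty noted above for graph databases.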

Transactions: A transaction refers to a sequence of operations per-

formed on a database, which can either be committed to the database or

rolled back (cancelled) [7]. When a transaction is committed, the

resulting changes persist in the database, while rolling back a trans-

action leads to its cancellation and removal of any associated changes

[14]. ACID and BASE are two common transaction models employed in

different database systems. ACID transaction models prioritise consistency of

the data at any given time, often at the expense of data availability.

Relational DBMSs often use the ACID transaction model, which stands for

Atomicity, Consistency, Isolation and Durability. An ACID transaction ensures

that either all or none of the operations within the transaction are

executed (atomicity), the database is consistent at both the beginning

and end of the transaction (consistency), the transaction is treated as the

sole operation being executed on the database (isolation) and the

changes made to the database by a successful transaction persist in the

database (durability) [16]. In contrast, the BASE transaction model,

which stands for Basically Available, Soft State, and Eventually Consistent,

prioritises the data availability at any given time (basically available)

even though it may not always be consistent (soft state). While a BASE-

compliant database does not ensure immediate consistency, it will,

however, eventually achieve it (eventually consistent). As a result, to

achieve eventual consistency, the database state could change over time

without any input action. This model is widely adopted by most NoSQL

systems. However, it is worth noting that some NoSQL database systems,

such as certain graph databases, offer ACID compliance.
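The commit and rollback behaviour of an ACID transaction can be demonstrated with a minimal sketch based on Python's standard-library sqlite3 module. The inventory table, the site names, and the move_units helper are assumptions made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE inventory (site TEXT PRIMARY KEY, units INTEGER CHECK (units >= 0))"
)
conn.executemany("INSERT INTO inventory VALUES (?, ?)", [("a", 100), ("b", 0)])
conn.commit()

def move_units(amount):
    """Move units from site a to site b; both updates succeed or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute("UPDATE inventory SET units = units + ? WHERE site = 'b'", (amount,))
            conn.execute("UPDATE inventory SET units = units - ? WHERE site = 'a'", (amount,))
        return True
    except sqlite3.IntegrityError:
        return False

first = move_units(40)    # within the available stock, so it is committed
second = move_units(500)  # violates the CHECK constraint, so it is rolled back
stock = dict(conn.execute("SELECT site, units FROM inventory"))
```

In the second call, the first update has already been applied when the second one fails, yet the rollback removes it, so the table is left as it was after the first call.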

In summary, relational and non-relational data persistence methods

that are used by different DBMSs possess distinct features and charac-

teristics. Consequently, they exhibit different behavioural patterns

under different circumstances. As a result, researchers have delved into

evaluating how these systems perform under diverse circumstances. The

upcoming section offers a concise overview of prior research that has

evaluated the efficacy of relational and non-relational data storage

systems. Afterwards, it outlines a crucial research gap that this study

seeks to fill.

2.2. Previous comparative studies

Comparative studies of the numerous data persistence methods assist

users in selecting a suitable solution that meets their specific needs.

Consequently, multiple researchers have conducted comparative as-

sessments with the aim of highlighting the relative advantages and

limitations of different data persistence models. A subset of these studies

compared the relational model with non-relational models, while others

concentrated on comparing different non-relational models with each

other. The studies utilise a range of evaluation metrics, encompassing

write, update, and read performance, scalability, and the ability to

manage the complexity of queries. Based on the particular study, the

term “complex query” can refer to queries that either involve traversing

relationships, aggregating data from multiple data sets, or performing

mathematical computations. The following discussion focuses on studies

that conducted the comparison through experiments.

The majority of comparative studies found in the literature primarily

focus on comparing relational databases with document-based data-

bases. A recent example of such a study is the research conducted by

Antas et al. [17], in which Microsoft SQL Server is compared with

MongoDB, a document-based DBMS and Cassandra, a column-oriented

DBMS. The evaluation results demonstrate that in scenarios involving

structured and interrelated data, SQL performs better than the non-

relational alternatives considered in the study. However, in cases

where the data is unstructured and does not necessitate complex operations,

such as joins, the two non-relational database systems

outperform the relational database.

In addition to the complexity of the query, the size of the database

and the volume of data retrieved by a query can also impact the per-

formance of database systems. Studies have shown that an increase in

the database size has more adverse effects on relational database

systems compared to document-based database systems [18]. Further-

more, as demonstrated in a study conducted by Matallah et al. [19],

SQL systems perform better than document-based databases in small to

medium-scale read operations, but their performance declines as the size

of retrieved data increases. The same study also evaluated the writing

performance of the two database models and noted that the checks and

verifications associated with ACID-based transaction modelling caused

the SQL system to be less time-efficient when writing compared to the

BASE-compliant document-based database. These research outcomes

support the findings of a prior study that compared MySQL with

CouchDB (a document-based DBMS) and concluded that document-

based databases exhibit superior performance to relational databases

when executing write operations, particularly those involving large volumes of

data [20]. This study further reinforces the assertion that document

databases outperform relational databases when reading a large amount

of data, while the performance advantage shifts to the relational data-

base as the complexity of the query increases and more relationships are

involved.

While the document-based database dominates the comparative

studies, a few studies have considered alternative non-relational data

models. For instance, the studies conducted by Abramova et al. [21]

and Antas et al. [17] compared relational databases with column-based

database systems. Both studies have demonstrated that column-based

databases offer easier scalability and faster execution of CRUD

queries. However, column-based data models have limited functionality compared

to the relational database when it comes to managing more complex

queries, such as joins, which are necessary for managing interrelated

data. Additionally, in rare instances, graph databases have also been

compared to relational databases. A study conducted by Kotiranta et al.

[22] has made the comparison based on the performance of read

queries. In this study, the graph database performed slightly better than

the relational database for simple queries, but it underperformed in

more complex queries. It is worth noting that, in the context of this

study, query complexity refers to operations that involve data aggre-

gation and mathematical computations.

In the context of comparing the non-relational database systems to

one another, the studies mostly focus on comparing document-based

database systems with column-based database systems, occasionally

including key-value store data models. In the assessments that included

key-value stores, this data model emerged as the most optimised for

executing read and write operations [23]. Moreover, when the studies

only include document-based and column-based data models, the

document-based model had a slightly better performance than the

column-based model when executing simple read queries and a signifi-

cantly inferior performance when executing read queries that include

relationships [17,24]. The column-based model also has better data

writing performance than the document-based model, which is attrib-

uted to the fact that the column-based database used in the study did not

require a large amount of memory to run the data insertion [24].

Besides comparing different data models, some studies delve into

analysing various implementations of the same data model across

different environments. Despite a shared underlying data model, distinct

implementations can result in notable differences in performance and

suitability for specific tasks. For instance, De Witte et al. [25] conducted

a comparison of four RDF storage solutions: Blazegraph, Enterprise Store

I, Enterprise Store II, and Virtuoso. Their study aimed to determine

which storage solution offers optimal performance across various data

set sizes by conducting stress tests. Another example is a performance

comparison conducted by Pauwels et al. [26], which focused on rule-

checking procedures that are employed in the construction industry.

This study analysed three distinct rule-checking procedures that utilise

RDF graphs. The study offers both a qualitative assessment of the

essential characteristics of the solutions under consideration and a

quantitative analysis grounded in task execution times.

In summary, previous research comparing non-relational database

systems with relational database systems has predominantly used

document-based data models to represent the non-relational side. In

contrast, the inclusion of graph-based database systems in such studies

has been extremely limited. Studies also often omit graph database

systems even when comparing non-relational databases with one

another. Nevertheless, including graph-based data persistence methods

in comparative studies is an important undertaking due to their unique

characteristics, one of which is their emphasis on relationships, which sets them

apart from other types of non-relational database systems. These other

systems have consistently been outperformed by relational database

systems in prior studies when managing complex relationships in data

sets. This may not hold true when comparing graph-based databases

with relational databases since both systems are capable of handling

complex relationships. Furthermore, a crucial aspect to consider when

evaluating database systems is the intended business use case. Each

database system possesses unique strengths and weaknesses that render

it suitable for specific domains and unsuitable for others due to the

potential variability in the nature of data sets encountered across

different domains. Prior database comparative studies within the

context of managing building and environmental data are lacking.

Therefore, the main objective of this study is to investigate and provide

new insights into the comparative advantages and disadvantages of

graph-based database systems and relational database systems in the

management of building and environmental data sets. In the pursuit of

delivering an objective analysis within this comparative study, it is

acknowledged that inherent challenges exist which may impact the

fairness and direct comparability of distinct database systems. Factors

including the specific implementations of data models by DBMSs and the

different design choices undertaken by the practitioner have the po-

tential to influence the outcomes. Readers are advised to take these

challenges into account when assessing the findings. A more compre-

hensive discussion of these limitations will be provided in the limitations

section of this paper. Despite these challenges, measures have been

taken throughout this study to maintain as fair a comparison as possible

between the database systems being examined.

3. Research method

3.1. Evaluation metrics and configuration

The comparative analysis presented in this research assesses the

management of interrelated building and environmental data by rela-

tional database systems and graph-based database systems. The type of

graph system selected for the study is the labelled property graph (LPG).

In property graphs, nodes are labelled and interconnected through re-

lationships, and both nodes and relationships can possess properties

[4]. Property graphs offer a simple, straightforward, and compact data

representation that closely resembles real-world relationships. They

allow for richly detailed modelling of relationships where properties can

be assigned directly to relationships, making them a suitable candidate

for a study focused on managing relationships in data sets. Furthermore,

these graph types allow data to be organised without predefining a

schema and hence offer the flexibility to be adapted to changing re-

quirements. These inherent characteristics of labelled property graphs

are the reasons why they are utilised to store interrelated data in this

study.
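
To make the labelled property graph model concrete, the following minimal Python sketch represents nodes that carry labels and properties and relationships that carry their own properties. The class names, the `NEARBY` relationship type, and the `pressure_class` and `distance_m` properties are illustrative assumptions and do not reflect the study's Neo4j implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A graph node: one or more labels plus free-form properties."""
    node_id: str
    labels: set
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    """A typed edge between two nodes that can carry its own properties."""
    rel_type: str
    start: str
    end: str
    properties: dict = field(default_factory=dict)


# No schema is declared up front: nodes under different labels may carry
# different property sets (all values here are made up for illustration).
building = Node("b1", {"Building"}, {"height": 6.9, "survey_year": 2016})
hydrant = Node("h1", {"Hydrant"}, {"pressure_class": "unknown"})

# The relationship itself stores a property (hypothetical distance in metres).
nearby = Relationship("NEARBY", hydrant.node_id, building.node_id, {"distance_m": 42.0})
```

Because properties live in plain dictionaries, adding a new attribute to one node does not require changing any other node, which mirrors the schema flexibility described above.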

A comprehensive comparison of database persistence systems entails

a multifaceted analysis involving numerous evaluation metrics and

factors. An exhaustive exploration of every conceivable metric and its

implications is a task that would surpass the practical limitations of a

single article. Consequently, this study narrows its focus to a qualitative

assessment of the database design process and a quantitative assessment

based on query execution time. Execution time is a critical parameter

that is universally relevant across a broad spectrum of use cases, making

it a pragmatic choice for this study. This focus is particularly relevant

from the standpoint of end users in engineering domains, where time-

efficient data retrieval is paramount for user satisfaction. By

concentrating on this aspect, the study aims to shed light on a funda-

mental component of database performance, providing valuable insights

that are broadly applicable yet manageable within the scope of this

research.

The overall evaluation is systematically divided into two distinct

sections.

•A qualitative evaluation of the design and maintenance of the

database systems. In this assessment section, a comparison is made

regarding the data structuring capabilities offered by the two data-

base models when managing building and environmental data sets.

Additionally, the effort and cost involved in populating, maintaining,

manipulating and updating the database systems are examined.

•A quantitative evaluation of the database systems’ performance

in retrieving data. This evaluation section involves conducting ex-

periments to evaluate the cost and efficiency of retrieving data from

the databases. Key parameters that are emphasised in this perfor-

mance comparison are the level of relationship traversed by the

queries and the size of the data sets to assess scalability. Various

combinations of these parameters are used in the experiments to

account for different potential use cases.
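
The "level of relationship traversed" parameter can be illustrated with a small sketch that uses a toy adjacency list rather than the study's databases. The function name `within_hops` is hypothetical; it returns every entity reachable within a given number of relationship hops. In a relational system, each additional hop would typically correspond to one more self-join on an association table.

```python
from collections import deque


def within_hops(adjacency, start, max_hops):
    """Return nodes reachable from `start` in at most `max_hops` relationship hops."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_hops:
            continue  # do not expand beyond the requested depth
        for neighbour in adjacency.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                frontier.append((neighbour, depth + 1))
    seen.discard(start)
    return seen


# Toy proximity chain A - B - C - D (undirected, so listed in both directions).
adjacency = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
one_hop = within_hops(adjacency, "A", 1)
two_hops = within_hops(adjacency, "A", 2)
```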

The research utilised MySQL as a relational DBMS and Neo4j as a

graph DBMS, with both systems occupying prominent positions in their

respective categories [27]. Neo4j adopts a property graph as its un-

derlying technology, and it has been in the market for a longer time than

most other graph database systems, suggesting potential advantages in

terms of stability and maturity. Similarly, MySQL, with its long presence

in the market, is expected to offer similar advantages. Furthermore, both

systems are supported by an active community and possess extensive

documentation and other supporting resources crucial for effective uti-

lisation. SQL is used to interact with MySQL, while Cypher, a graph query

language, is used to communicate with Neo4j. It should be noted that the

evaluation does not focus on the DBMS themselves but rather on the

underlying data model they employ to persist data. However, it is

imperative to recognise that the choice of DBMS to implement a

particular data model can have some level of influence on performance.

This is due to the difference in optimisation features and underlying

architecture between DBMSs. Such variations can result in disparate

performance outcomes, even if a similar data model is implemented.

Therefore, selecting DBMSs that are different from the ones utilised in

this study may affect the outcomes of the study to some degree.

All the tests in this study are performed on a single computer

equipped with an Intel(R) Core(TM) i7-10750H CPU @ 2.60 GHz

processor and 32.0 GB RAM. Measures were taken to maintain

consistent system configurations throughout the tests and to ensure that

both DBMSs had similar access to the system’s memory and processing

capabilities. These measures entail allocating an equal amount of

memory to both DBMSs and ensuring that the DBMS under evaluation is

the sole application running on the system when it is being tested.

Furthermore, performance tests were conducted several times, and

average values were obtained from the last ten tests for the comparisons.

Indexes were utilised throughout the database management to ensure

the efficient operation of both database systems. Despite our efforts to

ensure a comparable configuration for both DBMSs in the interest of

fairness, it is important to acknowledge that we cannot assert the sys-

tems have completely identical configurations. This is due to the

inherent differences between the two DBMSs, as they are two entirely

distinct software systems, each with its own unique settings and con-

figurations. Therefore, while we strive for fairness in our comparative

analysis, some variances inherent to each DBMS’s design and architec-

ture may influence the outcomes, underscoring the complexity of con-

ducting a perfectly balanced comparison between such diverse

technologies.
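
The measurement procedure described above (repeated runs, with the mean taken over the last ten) can be sketched as follows. This is a minimal illustration under stated assumptions: `run_query` is a hypothetical callable standing in for a MySQL or Neo4j query, and the total number of runs is an assumption, since the paper only states that the last ten tests were averaged.

```python
import statistics
import time


def mean_of_last_runs(run_query, total_runs=15, keep_last=10):
    """Execute `run_query` repeatedly and average the durations of the final runs."""
    if keep_last > total_runs:
        raise ValueError("keep_last cannot exceed total_runs")
    durations = []
    for _ in range(total_runs):
        started = time.perf_counter()
        run_query()
        durations.append(time.perf_counter() - started)
    # Earlier runs are treated as warm-up (an assumption) and discarded.
    return statistics.mean(durations[-keep_last:])


# Stand-in workload; a real test would execute a SQL or Cypher query here.
average_seconds = mean_of_last_runs(lambda: sum(range(10_000)))
```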

3.2. Data sets

This section delves into the four real-world data sets that are used for

the comparative analysis. Two building data sets and two

city data sets are acquired for this study. In both instances, one data set is

of a small scale, while the other is comparatively larger. This is done

intentionally in order to observe how the scalability of the target database

systems is affected by varying data set sizes. Consequently, for the city data,

one small and one large city model were sought. The publicly available model of the city of

Espoo, which contains around 60,000 buildings, is selected as the small

city data set. For the large city data set, after a thorough comparison

of available options, the city model of Tokyo, which is considered

among the largest cities in the world, housing more than 1.7 million

buildings, is selected. This data set is significantly larger in scale

compared to the Espoo data set, and it is believed it can highlight the

impact of data set size on the performance and scalability of the target

databases. Both city data sets are publicly available [28,46]. Similar

considerations were applied to the selection of building data sets, where

a small building model is paired with another model with a compara-

tively greater scale. Both building data sets are from real-world projects

and were shared privately using IFC models for use in this research.

Hence, sharing these data sets will require explicit permission from the

project/data owners. More details about all of the data sets are given in

the subsequent section. It is recognised that these data sets might not be

as large in scale as data sets found in other domains. However, they are

real-world data sets that reflect the realities of the target domain.

The building data sets are derived from building models stored in the

Industry Foundation Classes (IFC) standard. IFC is an open international

standard developed by buildingSMART International with the

objective of facilitating data exchange between Building Information

Model (BIM) software tools [45]. Currently, over 400 BIM applications

support the IFC standard, thereby enabling practitioners to exchange

and share valuable building data throughout the design, construction

and operation of buildings [29]. The standard is capable of representing

data about physical components, spatial elements, project structures,

involved actors and analytical items [30].

The city data that is used in this study is obtained from city models

published in accordance with the City Geography Markup Language

(CityGML) data standard. CityGML is an internationally recognized data

standard approved by the Open Geospatial Consortium (OGC) for the

storage and exchange of 3D city and landscape models [47]. It has the

capability to represent both human-made structures (such as buildings,

tunnels, bridges, roads, and railways) and natural features (including

terrains, vegetation, and water bodies) within an urban environment.

Presently, over 50 cities and regions spanning 18 countries globally have

made their urban data publicly accessible through the CityGML stan-

dard, thus providing invaluable environmental data to researchers and

practitioners [31]. In addition to CityGML, this study employed the

crowdsourced geographic database OpenStreetMap (OSM) as an addi-

tional source of environmental data [32]. This service offers a collab-

orative platform featuring a freely editable map database that covers the

entire world [33]. OSM represents a wide range of physical features,

including buildings, natural features, amenities, commercial establish-

ments, transportation infrastructures, energy distribution in-

frastructures, water distribution infrastructures and various other

categories. The rich and varied geodata it offers have been applied in

various fields, including disaster management, routing and navigation

services, tourism, leisure, and research [34].

The structure of IFC and CityGML data standards is used in this study

to design database systems. It is recognised that the principal objective

of these open standards is to facilitate the seamless exchange of data

among diverse software applications [45,47]. In typical situations,

software systems possess proprietary internal data models customized to

their particular functionalities and requirements. They utilise these in-

ternal models to create, structure and persist data in database systems.

However, in the context of this study, the input data is already defined

and structured using these open data standards. As a result, the design of

database systems in this study followed the structure provided by these

data standards rather than developing an entirely new data model. This

approach enabled the precise storage of input data while minimising the

need for restructuring processes that pose the risk of data loss or

distortion. Following the extraction of the necessary building and

environmental data from the aforementioned data sources, two building

and two environmental data sets are composed. Each of these data sets is

subsequently stored in both a relational database and a graph-based

database for the evaluation. The subsequent sections provide a

description of each of these data sets.

3.2.1. Small building data set

The smaller of the two building data sets is obtained from the Torre

Turina building model, which is a residential building belonging to the

Cuatro de Marzo pilot case district in Valladolid, Spain [35]. The

building consists of 12 floors and 284 rooms. The building model is

acquired in IFC format. For the purpose of the study, representations of

3,528 physical building elements, 700 element types, 348 spaces and

108,522 property items are extracted from the IFC file. Furthermore,

44,093 relationships, which include relationships between building el-

ements, between building elements and spaces, and between building

elements and their type, as well as their properties, are extracted from

the building model.

3.2.2. Large building data set

The second building data set used in the study belongs to a larger

office complex situated in Espoo, Finland, whose model files were pro-

vided by Trimble Inc. The data set is made up of multiple IFC files that

correspond to architectural, structural, electromechanical, sanitary and

HVAC models. From these files, a total of 371,693 building elements,

1,047 spaces, 14,794 object types and 6,323,802 property items are

extracted. Moreover, the data set includes 1,572,303 relationships that

consist of relationships between building elements, between building

elements and spaces, and between building elements and their type, as

well as their properties.

3.2.3. Small city data set

The Espoo 3D city model is used in this study as a source of a small

city data set. The data set is openly shared by the city of Espoo, and it can

be retrieved from the source listed in the reference section of this paper

[28]. The model is shared in a format that adheres to the CityGML 2.0

standard and includes various objects such as buildings, vegetation,

water bodies, roads, city furniture, land use and other objects that are

found within the city. Most of these city items share a common set of

attributes such as id, class, usage, function, and GPS coordinates. At the

same time, most objects also have attributes that are specific to their

category. However, there are some building attributes as well as city

features (such as fire hydrants) that are not available in the city model.

To address this, the dataset is later enriched by gathering additional data

from OpenStreetMap and integrating it with the existing data from the

city model. The final data set consists of 63,042 buildings, 130,341

vegetation items, 12,552 land use features, 5,325 city amenities and 767

fire hydrants. Furthermore, some additional information is generated

through computations. This involves identifying buildings in proximity

to each other and determining which fire hydrants are near which buildings. Coordinate

points of the objects are used for the computation. The computation

yielded 981,742 building-to-building proximity relationships and

35,051 hydrant-to-building proximity relationships.
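
The proximity computation can be sketched as a distance test between coordinate pairs. The paper does not state the distance threshold or the algorithm used, so the 50 m radius, the haversine formula, and the brute-force pairing below are all assumptions chosen for illustration; a data set with millions of buildings would call for a spatial index rather than a comparison of all pairs.

```python
import math
from itertools import combinations

EARTH_RADIUS_M = 6_371_000.0


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def proximity_pairs(buildings, radius_m=50.0):
    """Return unordered id pairs of buildings whose points lie within `radius_m`."""
    pairs = []
    for (id_a, (lat_a, lon_a)), (id_b, (lat_b, lon_b)) in combinations(buildings.items(), 2):
        if haversine_m(lat_a, lon_a, lat_b, lon_b) <= radius_m:
            pairs.append((id_a, id_b))
    return pairs


# Made-up coordinates; the ids are placeholders.
buildings = {
    "b1": (60.2050, 24.6550),
    "b2": (60.2052, 24.6552),  # a few tens of metres from b1
    "b3": (60.2150, 24.6550),  # more than a kilometre from b1
}
pairs = proximity_pairs(buildings)
```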

3.2.4. Large city data set

The Tokyo city model is used as a source of a larger city data set in

the comparative study. This publicly available model encompasses all 23

wards of Tokyo and spans 627.57 square kilometres. The data set can be

obtained from a public repository that is given in the reference section of

this paper [46]. It is shared in CityGML 2.0 format and includes

1,768,233 buildings along with their attributes. These attributes

encompass various aspects such as geometry (height and roof area),

position (latitude and longitude), address (town, district, zone, and

prefecture) and flooding risk. Using a similar approach employed on the

Espoo city data, the coordinates of buildings in the Tokyo city model are

utilised to identify proximity relationships between buildings, resulting

in a total of 41,871,173 relationships between adjacent buildings.

3.3. Evaluation design

3.3.1. Designing and maintaining the database systems

Upon the acquisition and processing of the test data sets, the data-

base design ensued. The process of designing a database system has

inherently subjective elements. The designers’ expertise, as well as

preference, plays some role in design choices. However, objective

principles and best practices are followed in this study to arrive at a well-

designed database. One of the principles is avoiding storing redundant

data since it can inflate database size and lead to inconsistencies [36].

Another best practice implemented in this study is ensuring all entries

are atomic or indivisible. This implies, for instance, storing each part of a

building address (such as street name, building number and postal code)

separately instead of saving the full address as a single entry. This en-

sures individual entities can be retrieved and utilised. Another guiding

principle is ensuring the completeness of data stored in both target

database systems. This, in addition to being a good design principle,

ensures the database comparison is fair since the same volume of data

will be stored in the target database systems.
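
As an illustration of the atomicity principle, the sketch below splits a single-line address into separate fields. The input layout (street name, house number, five-digit postal code) and the function name `split_address` are assumptions, since the paper does not show the raw address strings; the sample value follows the Espoo extract shown in Table 3.

```python
import re

# Assumed layout: "<street name> <house number> <5-digit postal code>".
ADDRESS_PATTERN = re.compile(
    r"^(?P<street>.+?)\s+(?P<number>\d+\w?)\s+(?P<postal_code>\d{5})$"
)


def split_address(address_line):
    """Split a one-line address into atomic street, number and postal code fields."""
    match = ADDRESS_PATTERN.match(address_line.strip())
    if match is None:
        raise ValueError(f"Unrecognised address format: {address_line!r}")
    return match.groupdict()


fields = split_address("Meripoiju 4 02320")
```

Storing the three fields separately lets a query filter on the postal code alone without parsing strings at retrieval time.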

In addition to the general principles discussed in the preceding

paragraph, additional design strategies tailored for each database sys-

tem are implemented. For relational databases, the initial step involves

defining schemas that delineate the structure of the database. The test

data sets contain numerous entities, each possessing multiple attributes

and interrelations with one another. These concepts steered the design of

the database schemas, guiding the creation of tables and columns and

the definition of various constraints related to data types and relation-

ships. For the transformation of building data from IFC models, the IFC

entities, such as IfcBuildingElement, IfcBuildingStorey and IfcPropertySet,

are turned into entity tables. Similarly, for the city data extracted from

CityGML, the entity classes such as AbstractBuilding, CityFurniture and

SolitaryVegetationObject are transferred into separate tables. In both

cases, attributes of the entity classes are added to the relational tables as

columns. The unique keys of each record, obtainable from the original

data sources, serve as primary keys in the entity tables.

Furthermore, in addition to entity tables, association tables are

created to store relationships between entities. In the context of IFC

data, relationship-oriented classes such as IfcRelAggregates,

IfcRelConnectsElements and IfcRelDefinesByType are transformed into association

tables, with their attributes stored as columns. Similarly, for

the city dataset, an association table is established to store the proximity

relationship between buildings, which is calculated using the location of

buildings provided by the CityGML files. These association tables utilise

foreign keys that refer to the primary keys of records in the entity table

to establish relationships. Inverse relationships are explicitly defined in

the table to ensure relationships between entities can be traversed in

both directions. Finally, to enhance query efficiency, all columns that

are going to be used to filter tables are indexed.
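
The relational design described above can be sketched with Python's built-in `sqlite3` module. SQLite is used here only so that the example is self-contained; the study itself used MySQL, and the table and column names below are illustrative assumptions rather than the schema used by the authors.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

conn.executescript("""
CREATE TABLE building (
    id TEXT PRIMARY KEY,      -- unique key taken from the source data
    height REAL
);
CREATE TABLE nearby (
    id INTEGER PRIMARY KEY,
    building_id_1 TEXT NOT NULL REFERENCES building(id),
    building_id_2 TEXT NOT NULL REFERENCES building(id)
);
-- Index the column used to filter the association table.
CREATE INDEX idx_nearby_building_1 ON nearby (building_id_1);
""")

conn.executemany("INSERT INTO building VALUES (?, ?)",
                 [("b1", 6.9), ("b2", 7.4)])
# The inverse relationship is stored explicitly as a second row.
conn.executemany("INSERT INTO nearby (building_id_1, building_id_2) VALUES (?, ?)",
                 [("b1", "b2"), ("b2", "b1")])

neighbours_of_b2 = [row[0] for row in conn.execute(
    "SELECT building_id_2 FROM nearby WHERE building_id_1 = ?", ("b2",))]
```

With the inverse row in place, a lookup in either direction only needs to filter on the single indexed column, at the cost of doubling the number of association records.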

The design of the labelled property graph databases is also guided by

the entities, attributes, and relationships present in the test datasets.

These elements informed the formulation of nodes, labels, properties, and

edges (representing relationships) within the graph databases. The

design adheres to the official data modelling guidelines provided by

Neo4j to ensure a comparable design with its relational database

counterpart [37]. Nodes are used to represent entities, equivalent to the

rows of the entity tables in the relational database. While entities are

typically organized across multiple tables in relational databases, the

nodes are similarly organised using different labels. Moreover, entity

attributes, which are represented by column fields in the relational

database, are represented using node attributes in the graph database.

Unlike relational databases, nodes within graph databases are not

required to possess an identical set of fields, allowing for the inclusion of

only relevant node attributes. Relationships within the graph database

are depicted using named edges (arrows), which can be assigned prop-

erties. These edges serve as equivalents to the association tables and

foreign key columns utilised in the relational databases. All attribute

column fields present in association tables are mirrored as properties

within the corresponding relationship edges in the graph database.

These edges can be assigned directions and traversed bidirectionally,

thus eliminating the need to define inverse relationships explicitly.

Finally, the node and edge properties that will be used for querying are

indexed to ensure efficient querying.
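
The point about bidirectional traversal can be illustrated with a small in-memory sketch: each relationship is stored once as a directed edge, yet neighbours can be looked up from either end. The class `TinyGraph` and its methods are hypothetical stand-ins for illustration and do not reflect Neo4j's storage engine or the Cypher language.

```python
from collections import defaultdict


class TinyGraph:
    """Stores each edge once but indexes it by both endpoints."""

    def __init__(self):
        self.edges = []                      # (start, rel_type, end, properties)
        self._outgoing = defaultdict(list)
        self._incoming = defaultdict(list)

    def add_edge(self, start, rel_type, end, **properties):
        edge = (start, rel_type, end, properties)
        self.edges.append(edge)
        self._outgoing[start].append(edge)
        self._incoming[end].append(edge)

    def neighbours(self, node, rel_type):
        """Follow edges of `rel_type` in either direction from `node`."""
        found = {end for _, t, end, _ in self._outgoing[node] if t == rel_type}
        found |= {start for start, t, _, _ in self._incoming[node] if t == rel_type}
        return found


graph = TinyGraph()
graph.add_edge("b1", "NEARBY", "b2")  # one edge, no inverse edge needed
```

Querying `graph.neighbours("b2", "NEARBY")` finds `b1` even though the edge was created from `b1` to `b2`, which is the behaviour that removes the need for explicit inverse relationships.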

The preceding paragraphs have provided an overview of the general

principles as well as the database-specific design methodologies adopted

in this study. These principles allowed the design of efficient yet com-

parable relational and graph database systems that are tailored to the

requirements of the research. The upcoming section will provide a more

in-depth explanation of the steps taken to create each individual data-

base system for each of the four data sets used in the study. Samples

extracted from both the relational and graph-based databases will be

included in the discussion.

3.3.1.1. Managing city data sets. The Tokyo city data set includes a long

and structured list of buildings that share a common set of attributes

such as height, roof area as well as data about regional organisation.

Consequently, in the relational database, a single table is created that

lists all the buildings along with their respective attributes as columns.

Concurrently, nodes representing each building in the city are created in

the graph database, where each node is assigned a set of properties that

store building attributes. A sample extract from the relational database

storing the Tokyo city data is presented in Table 1. Similarly, an extract

from the graph database storing similar data is presented in Fig. 1.

Table 1

Sample extract from the building table in the larger (Tokyo) city SQL database.

| (Primary key) | Height | Building area | Districts and zone | Survey year |
| --- | --- | --- | --- | --- |
| 13120-bldg-89402 | 6.9 | 53.66159 | 3 | 2016 |
| 13120-bldg-90754 | 9.5 | 133.90931 | 3 | 2016 |
| 13120-bldg-90061 | 7.4 | 84.86166 | 3 | 2016 |
| 13120-bldg-89798 | 7.3 | 184.41252 | 3 | 2016 |
| 13120-bldg-90553 | 9.4 | 30.26722 | 3 | 2016 |
| ... | ... | ... | ... | ... |

Fig. 1. Sample extract from the large (Tokyo) city graph database presenting building nodes, their labels, relationships and properties.

In the case of the Espoo city data set, the city model from CityGML

encompasses not only buildings but also other environmental entities

such as vegetation, city amenities, and various land use features. While

the entities within the same group share similar attributes, entities

belonging to different categories possess some distinct attributes. Thus, in

the relational database, separate tables are created for each environmental

entity group, with columns representing their attributes. Buildings,

for instance, possess multiple attributes, including address,

occupancy type, height, and location in terms of latitude and longitude.

The building address is given in the model in a single line that includes

the street name, building number and postal code. To ensure atomicity,

this attribute is translated into three fields, each representing one part

of the address. Meanwhile, in the graph database, nodes are created for

all environmental entities and assigned labels that refer to their type,

such as building or vegetation. Samples extracted from the building data

table are presented in Table 3 with a few selected columns. Similarly, a

comparable extract from the graph database is given in Fig. 2. In both

the Tokyo and Espoo test cases, a schema is carefully designed for the

relational databases, determining the precise structure of the tables and

imposing constraints on the records eligible for insertion into these ta-

bles. Conversely, in the case of graph-based databases, the data sets are

introduced immediately following the creation of the databases without

defining a schema.

Once the environmental entities and their attributes from the city

data sets are inserted into the database systems, the next step involves

storing the relationships among these entities. In relational database

systems, association tables are created to represent these relationships

between city entities, which exhibit many-to-many characteristics.

Foreign key constraints are utilised to reference the main tables in the

association table as well as to enforce the link. Validating these constraints

during data insertion by the DBMS introduces additional

processing time to the procedure. Given that proximity between build-

ings is a bidirectional association, two rows are created to represent each

direction of such relationships, resulting in a record count that is twice

the number of available relationships. In the case of the graph-based

database, the procedure for establishing relationships between envi-

ronmental entities involves retrieving each target node through read

queries, followed by the assignment of the relationship. Indexes are first

created to enhance the efficiency of retrieving nodes. In contrast to the

relational database, bidirectional relationships can be represented using

a single edge in the graph-based database. Hence, a single edge is suf-

ficient to represent a relationship between two entities, avoiding the

need to define inverse relationships. Sample extracts from the association
tables of the Tokyo and Espoo data sets are presented in Tables 2 and 4,
respectively. Meanwhile, Figs. 1 and 2 demonstrate how the same
relationships are stored in the graph databases.

Fig. 2. Sample extract from the small (Espoo) city graph database presenting building nodes, their labels, relationships, and properties.

Table 2
Sample extract from the nearby association table that stores proximity relationships between buildings in the large (Tokyo) city SQL database.

| Id (Primary key) | Building Id 1 (Foreign key) | Building Id 2 (Foreign key) |
|---|---|---|
| 67637903 | 13120-bldg-89402 | 13120-bldg-90061 |
| 67682035 | 13120-bldg-89798 | 13120-bldg-90061 |
| 67682040 | 13120-bldg-89798 | 13120-bldg-90553 |
| 67738056 | 13120-bldg-90061 | 13120-bldg-90553 |
| 67738059 | 13120-bldg-90061 | 13120-bldg-90754 |
| … | … | … |

Table 3
Sample extract from the building data table in the small (Espoo) city SQL database.

| Id (Primary key) | Street | House Number | Postal Code | Year of Construction | Structure | Type of use |
|---|---|---|---|---|---|---|
| 14219 | Meripoiju | 4 | 02320 | 2007 | Concrete | Parking |
| 14220 | Meripoiju | 3 | 02320 | 1972 | Concrete | Apartment building |
| 14223 | Meripoiju | 1 | 02320 | 1973 | Concrete | Apartment building |
| 14254 | Kivenlahdenkatu | 4 | 02320 | 1973 | Concrete | Apartment building |
| 14253 | Kivenlahdenkatu | 6 | 02320 | 1973 | Concrete | Apartment building |
| … | … | … | … | … | … | … |
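The relational side of this design can be sketched with Python's built-in `sqlite3` module. This is a minimal illustration under stated assumptions, not the schema used in the study: the table names `building` and `nearby`, the column names, and the helper `add_proximity` are hypothetical, and only the building identifiers are borrowed from Table 4. The sketch shows the foreign key constraints on the association table and the two rows written for each bidirectional proximity relationship.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE building (
    id INTEGER PRIMARY KEY
);
-- Hypothetical association table for the many-to-many proximity relationship.
CREATE TABLE nearby (
    id INTEGER PRIMARY KEY,
    building_id_1 INTEGER NOT NULL REFERENCES building(id),
    building_id_2 INTEGER NOT NULL REFERENCES building(id)
);
""")

conn.executemany(
    "INSERT INTO building (id) VALUES (?)",
    [(14219,), (14220,), (14223,)],
)


def add_proximity(connection, a, b):
    """Store one bidirectional relationship as two directed rows."""
    connection.executemany(
        "INSERT INTO nearby (building_id_1, building_id_2) VALUES (?, ?)",
        [(a, b), (b, a)],
    )


relationships = [(14219, 14220), (14219, 14223)]
for a, b in relationships:
    add_proximity(conn, a, b)

row_count = conn.execute("SELECT COUNT(*) FROM nearby").fetchone()[0]

# A row that references a non-existent building is rejected by the DBMS;
# this validation is the extra work performed on every insert.
try:
    add_proximity(conn, 14219, 99999)
    fk_violation_rejected = False
except sqlite3.IntegrityError:
    fk_violation_rejected = True
```

In a property graph, each entry of `relationships` would instead map to a single edge between two building nodes, so the edge count would equal the number of relationships rather than twice that number.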

After completing the design of the database systems for the Espoo

city data set and populating them with the structured data originating

from the CityGML model, we proceeded to introduce additional data

from OpenStreetMap that is characteristically unstructured. This step is

included to evaluate how each database system adapts to and manages

the incorporation of data that deviates from its original design. This

approach provides insights into the systems’ capabilities in handling

scenarios that involve irregular or evolving data structures. The sup-

plementary data from OpenStreetMap introduces new city elements like

fire hydrants and extra attributes to the existing buildings. To incorpo-

rate the data related to fire hydrants, a new table is established within

the relational database, listing all fire hydrants within the city along

with their associated attributes. Additionally, another table is set up to

capture the relationship between each hydrant and the buildings it

serves. In parallel, within the graph database, new nodes labelled ’Hy-

drants’ are instantiated to represent individual fire hydrants. Subse-

quently, these hydrants are interconnected with the buildings they

service through relationship edges. An excerpt of the hydrant table, as

well as an association table that relates hydrants to the building they

serve in the relational database, is displayed in Table 5. Meanwhile, a

representation of similar data in the graph database can be seen in Fig. 3.

The newly acquired data from OpenStreetMap also includes building

attribute data that vary significantly from one building to another. For

instance, while some civic buildings possess official names in Finnish,

Swedish, and English, most city buildings lack these attributes. Such

variability in attributes is pervasive throughout the dataset. In the

design of the relational database, a new table is created to store all the

newly added building property data. Additionally, properties extracted

from CityGML, apart from the address, are transferred from the initial

building data table to this new property table. The rationale for this

division is to maintain a lightweight address table, which will be utilised
to identify buildings. In subsequent stages of testing, this table will be
used jointly with the proximity table to identify adjacent buildings.
Thus, by keeping the table lightweight, the cost of joins will be minimised.
Overall, incorporating the newly obtained building attribute data
into the Espoo relational database necessitates adding a new table,
modifying an existing table and migrating some data from the existing
table to the new one, all of which require meticulous attention to ensure
data integrity. In contrast, integrating the newly obtained attribute data
into the graph database is comparatively straightforward: the existing
building nodes are simply assigned their new attributes.
The approach described above for incorporating the new data into the
relational and graph database systems is one among several
possible approaches. Other designs may be chosen, potentially yielding
different performance outcomes. Table 6 displays excerpts from the updated relational
database, featuring both the address and property tables. Similarly, a
sample from the revised graph database is illustrated in Fig. 4.

Table 4
Sample extract from the nearby association table that stores proximity relationships between buildings in the small (Espoo) city SQL database.

| Id (Primary key) | Building Id 1 (Foreign key) | Building Id 2 (Foreign key) |
|---|---|---|
| 1775022 | 14219 | 14223 |
| 1775029 | 14219 | 14254 |
| 1775020 | 14219 | 14220 |
| 1775161 | 14223 | 14254 |
| 1776059 | 14253 | 14254 |
| … | … | … |

Table 5
Sample extract from the hydrant table (top) and the association table (bottom) that relates a hydrant with the building it serves in the small (Espoo) city SQL database.

| Id (Primary key) | Latitude | Longitude | Type | Position |
|---|---|---|---|---|
| 4300534015 | 60.1547 | 24.7054 | Underground | |
| 4300534016 | 60.1548 | 24.7388 | Underground | |
| 4300534017 | 60.1548 | 24.774 | Underground | Lane |
| 4300534018 | 60.1549 | 24.6307 | Underground | |
| 4300534019 | 60.1549 | 24.6395 | Underground | |
| … | … | … | … | … |

| Id (Primary key) | Building Id (Foreign key) | Hydrant Id (Foreign key) |
|---|---|---|
| 23232 | 14252 | 4300534018 |
| 23233 | 14252 | 4300534020 |
| 23234 | 14253 | 4300534018 |
| 23235 | 14254 | 4300534018 |
| 23236 | 14255 | 4300534018 |
| … | … | … |

Fig. 3. Sample extract from the small (Espoo) city graph database presenting building nodes, fire hydrant nodes, their labels, relationships, and properties.
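The relational restructuring described above (new table, modified table, migrated data) can be sketched as follows using Python's built-in `sqlite3` module. The table names `building`, `address` and `building_property`, the column names, and the order of the steps are assumptions made for illustration; the sample values are taken from Tables 3 and 6. It is one possible migration path, not necessarily the one followed in the study.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
-- Original wide table with the structured CityGML attributes (cf. Table 3).
CREATE TABLE building (
    id INTEGER PRIMARY KEY,
    street TEXT, house_number TEXT, postal_code TEXT,
    year_of_construction TEXT, structure TEXT, type_of_use TEXT
);
INSERT INTO building VALUES
    (14254, 'Kivenlahdenkatu', '4', '02320', '1973', 'Concrete',
     'Apartment building');

-- Step 1: lightweight address table that keeps only the identifying columns.
CREATE TABLE address (
    id INTEGER PRIMARY KEY,
    street TEXT, house_number TEXT, postal_code TEXT
);
INSERT INTO address
    SELECT id, street, house_number, postal_code FROM building;

-- Step 2: name/value table for attributes that vary between buildings.
CREATE TABLE building_property (
    id INTEGER PRIMARY KEY,
    building_id INTEGER NOT NULL REFERENCES address(id),
    name TEXT NOT NULL,
    value TEXT
);
""")

# Step 3: migrate the non-address CityGML columns into name/value rows.
# The column names come from this fixed mapping, never from user input.
migrated_columns = {
    "year_of_construction": "Year of Construction",
    "structure": "Structure",
    "type_of_use": "Type of use",
}
for column, label in migrated_columns.items():
    conn.execute(
        "INSERT INTO building_property (building_id, name, value) "
        f"SELECT id, ?, {column} FROM building WHERE {column} IS NOT NULL",
        (label,),
    )

# Step 4: append the irregular OpenStreetMap attributes as further rows.
conn.executemany(
    "INSERT INTO building_property (building_id, name, value) VALUES (?, ?, ?)",
    [(14254, "Building levels", "5"), (14254, "Roof shape", "Flat")],
)

# Step 5: drop the wide table once its data has been migrated.
conn.execute("DROP TABLE building")
conn.commit()

properties = dict(conn.execute(
    "SELECT name, value FROM building_property WHERE building_id = ?",
    (14254,),
))
tables = {
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
}
```

On the graph side, the corresponding change would amount to setting properties on an existing node, for example with a Cypher statement along the lines of `MATCH (b:Building {id: 14254}) SET b.roof_shape = 'Flat'`, where the label and property names are again hypothetical.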

3.3.1.2. Managing building data sets. The building data sets extracted

from the IFC files comprise various building elements and features

interconnected by numerous relationships. The database systems

created to store this data are designed to closely adhere to the structure

defined in the IFC files. For both the smaller and larger building data

sets, a similar procedure is followed to design relational and graph

database systems.

In the case of the relational database, tables are created from IFC

entity classes representing building elements, spatial elements, object

types and property sets. The building elements table is formed based on

the IfcBuildingElement class, with each row storing instances belonging to

the class. While each of these instances belongs to a different subclass of

the IfcBuildingElement class, they are all stored in a single table for

simplicity and to minimise the use of join queries. The attributes of the
IfcBuildingElement class become columns in the building elements table;
they include global ID, owner history, name, predefined type, object
type and tag. Furthermore, a column is added to store the subclass of the
IfcBuildingElement class (such as IfcBeam, IfcColumn and IfcDoor) to
which each instance belongs, which helps to categorise the records in the
table. Afterwards, another table is created to store object types, aggregating
instances of the IfcBuildingElementType class. Similar to the building

elements table, attributes of this class are translated into column fields.

Table 6
Sample extract from the modified Espoo city SQL database, depicting the address (top) and the new properties (bottom) table.

| Id (Primary key) | Street | House Number | Postal Code |
|---|---|---|---|
| 14219 | Meripoiju | 4 | 02320 |
| 14220 | Meripoiju | 3 | 02320 |
| 14223 | Meripoiju | 1 | 02320 |
| 14254 | Kivenlahdenkatu | 4 | 02320 |
| 14253 | Kivenlahdenkatu | 6 | 02320 |
| … | … | … | … |

| Id (Primary key) | Building id (Foreign key) | Name | Value |
|---|---|---|---|
| 142017 | 14254 | Year of Construction | 1973 |
| 142018 | 14254 | Structure | Concrete |
| 142019 | 14254 | Type of use | Apartment building |
| 142020 | 14254 | Building levels | 5 |
| 142021 | 14254 | Roof shape | Flat |
| … | … | … | … |

Fig. 4. Sample extract from the small (Espoo) city graph database presenting building nodes, their labels, relationships, and updated properties.

Next, a table is created to house data concerning the different spaces

within the building datasets, represented in IFC using the IfcSpace class,

while another table is created for storing building storey data following

the structure of the IfcBuildingStorey class. Lastly, a table is devised to

store properties, with columns for an ID, property name, property set

name, and value. This single table accommodates properties for all

building elements, element types, and spaces. In the case of the graph

database design, the process adheres to the general procedure outlined

at the beginning of this section. Nodes and their properties are generated

according to the IFC class-attribute structure. These nodes are equiva-

lent to all the records in all the tables of the relational database. Labels

are subsequently employed to categorize these nodes, and they corre-

spond to the tables (i.e. table names) in the relational database. Since

multiple labels can be utilised, building elements are assigned a second

label to store their specific element type which is akin to the element

type column defined in the building elements table of the relational

database.

After designing the main tables in the relational database, the sub-

sequent step involves creating association tables to store selected re-

lationships from the IFC datasets. Individual association tables are

created to separately manage various relationships, including those

between decomposition elements (IfcRelAggregates), connected ele-

ments (IfcRelConnectsElements), elements and their types (IfcRelDefi-

nesByType), elements or element types and their properties

(IfcRelDefinesByProperties), elements and the spatial structure that

contains them (IfcRelContainedInSpatialStructure), a space and its

bounding elements (IfcRelSpaceBoundary) and a space and its covering

(IfcRelCoversBldgElements). Each of these tables is structured with

columns derived from the attributes of the respective IFC classes. Every

table contains two foreign key columns that refer to the records that are

being associated. Complementing these columns are global ID, owner

history, and name column fields. To represent these relationships within

the graph database, named edges that can store properties are utilised.

The edge names correspond to the relationship types highlighted in this

paragraph, mirroring equivalent association tables in the relational

database. Similarly, the attributes of the IFC relationship classes, rep-

resented as column fields in the association tables, are transformed into

properties of the relationship edge in the graph database. Each rela-

tionship needs to be linked to an owner history node. However, Neo4j

only supports relationships between nodes. Therefore, as a workaround,

the ID of the relevant owner history node is included as a property in the

relationship edges. Similar to the database design for the city database,

it should be noted that these design choices are not the only possible

approaches for designing the databases. The guiding principle for the

design outlined here is aimed at closely following the IFC structure to

minimise structural alterations to the data sets. Table 7 presents a

sample from the building elements table extracted from the small

building relational database. Table 8 showcases examples from the as-

sociation table from the same database that stores the relationships

between interconnected building elements. Finally, Table 9 presents the

owner history table that is referred to in the other two tables. The data

organization in the larger building database adheres to a similar struc-

ture. Meanwhile, a sample extract from the small building graph data-

base presenting the connected building elements along with their label

and properties is given in Fig. 5.
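The owner history workaround can be illustrated with a small in-memory property-graph model in Python. This is only a sketch of the idea: the `Node` and `Edge` classes, the edge name `CONNECTS_PATH_ELEMENTS`, and the helper functions are hypothetical and do not reproduce the Neo4j implementation used in the study; the sample values are taken from the first row of Table 8.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: str
    labels: frozenset
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    edge_type: str
    start_id: str
    end_id: str
    properties: dict = field(default_factory=dict)


def row_to_edge(row):
    """Convert one association-table row (cf. Table 8) into a graph edge.

    An edge can only connect nodes, so the owner history reference cannot be
    an edge attached to this edge. Its ID is stored as an edge property.
    """
    return Edge(
        edge_type="CONNECTS_PATH_ELEMENTS",  # hypothetical edge name
        start_id=row["relating_element"],
        end_id=row["related_element"],
        properties={
            "global_id": row["global_id"],
            "name": row["name"],
            "owner_history_id": row["owner_history"],
        },
    )


def owner_history_of(edge, nodes_by_id):
    """Resolve the workaround property back to the owner history node."""
    return nodes_by_id[edge.properties["owner_history_id"]]


wall_labels = frozenset({"BuildingElement", "IfcWall"})
nodes_by_id = {
    "1": Node("1", frozenset({"OwnerHistory"}), {"change_action": "NOCHANGE"}),
    "1PC0J7QuP1XvCE8kJzHdv8": Node(
        "1PC0J7QuP1XvCE8kJzHdv8", wall_labels, {"tag": "155802"}
    ),
    "0I$5J2q9f4neHdIhXKwt0k": Node(
        "0I$5J2q9f4neHdIhXKwt0k", wall_labels, {"tag": "150906"}
    ),
}

# First row of Table 8, keyed by illustrative column names.
row = {
    "global_id": "2St4Zrjrj6BgvhZ4uLX_AW",
    "owner_history": "1",
    "name": "Structural",
    "relating_element": "1PC0J7QuP1XvCE8kJzHdv8",
    "related_element": "0I$5J2q9f4neHdIhXKwt0k",
}
edge = row_to_edge(row)
owner = owner_history_of(edge, nodes_by_id)
```

The trade-off of this design is that the link to the owner history is no longer a traversable relationship: retrieving it requires a second lookup by ID, as `owner_history_of` does here.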

Table 7
Sample extract from the building elements table in the small building SQL database.

| Global ID (Primary key) | Element type | Owner history (Foreign key) | Name | Object type | Tag | Predefined type |
|---|---|---|---|---|---|---|
| 0I$5J2q9f4neHdIhXKwt0k | IfcWall | 1 | Basic Wall | 4M_External_basement | 150906 | NOTDEFINED |
| 1PC0J7QuP1XvCE8kJzHa24 | IfcWall | 1 | Basic Wall | 4M_Interior_single_hollow_brick | 155222 | NOTDEFINED |
| 1PC0J7QuP1XvCE8kJzHdv8 | IfcWall | 1 | Basic Wall | 4M_Interior_single_hollow_brick | 155802 | NOTDEFINED |
| 1PC0J7QuP1XvCE8kJzHa3n | IfcWall | 1 | Basic Wall | 4M_Interior_single_hollow_brick | 155171 | NOTDEFINED |
| 1PC0J7QuP1XvCE8kJzHaGW | IfcWall | 1 | Basic Wall | 4M_Interior_double_hollow_brick | 154354 | NOTDEFINED |
| … | … | … | … | … | … | … |

Table 8
Sample extract from the connected path elements association table in the small building SQL database.

| Global ID (Primary key) | Owner history (Foreign key) | Name | Relating element (Foreign key) | Related element (Foreign key) |
|---|---|---|---|---|
| 2St4Zrjrj6BgvhZ4uLX_AW | 1 | Structural | 1PC0J7QuP1XvCE8kJzHdv8 | 0I$5J2q9f4neHdIhXKwt0k |
| 3Z06H71mr5LvYai$WSxt6t | 1 | Structural | 1PC0J7QuP1XvCE8kJzHdv8 | 1PC0J7QuP1XvCE8kJzHa24 |
| 225XQaH6L2gAbtm6vx9CSC | 1 | Structural | 1PC0J7QuP1XvCE8kJzHa24 | 1PC0J7QuP1XvCE8kJzHa3n |
| 0smQl3IGbAiQ2WeuOxOM$v | 1 | Structural | 1PC0J7QuP1XvCE8kJzHa24 | 1PC0J7QuP1XvCE8kJzHaGW |
| 1k5U80sVH6BR3TT5smPLg$ | 1 | Structural | 1PC0J7QuP1XvCE8kJzHa3n | 1PC0J7QuP1XvCE8kJzHaCD |
| … | … | … | … | … |

Table 9
Sample extract from the owner history table in the small building SQL database.

| Id (Primary key) | Owning user id (Foreign key) | Owning application id (Foreign key) | State | Change action | Last modified date | Last modifying user | Last modifying application | Creation date |
|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 1 | | NOCHANGE | | | | 1549365521 |

The process described so far covers the work undertaken to store the
four real-world data sets in relational and graph databases. Qualitative
observations made during this database design and
manipulation process are presented in Section 4.1. In the next section, a quantitative
evaluation of the target database systems is conducted by successively
increasing the complexity of relationships within both the building and
city data sets.

3.3.2. Quantitative evaluation of data retrieval performance

This section of the comparative study focuses on evaluating the data

retrieval performance of the database systems when dealing with

interrelated data sets. The relationships traversed and the size of data

sets are key control parameters in the evaluation. The assessment in-

volves executing a set of queries and measuring the amount of time

required by each database system to execute the query and retrieve the

requested data. Exemplary activities from the fire emergency domain

are used to illustrate the real-world application of the queries utilised in

the study. Each query is executed ten times, and the average execution
time is reported in Tables 11–14.

The performance tests conducted are grouped into three categories

based on their objective, which are:

1. Retrieving data without traversing any relationships.

2. Retrieving data by traversing a single relationship.

3. Retrieving data by traversing several relationships.
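The measurement protocol described above, in which each query is run ten times and the execution times are averaged, can be sketched as a small Python harness. The function `average_execution_time`, the table `building_element`, and the synthetic rows are assumptions made for illustration; the study's actual tooling and data are not reproduced here. The example query belongs to the first category, since it filters a single indexed table without traversing any relationship.

```python
import sqlite3
import statistics
import time


def average_execution_time(run_query, repeats=10):
    """Run `run_query` `repeats` times and return the mean time in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        durations.append(time.perf_counter() - start)
    return statistics.mean(durations)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE building_element (global_id TEXT PRIMARY KEY, element_type TEXT)"
)
# The filter attribute is indexed, mirroring the set-up described in the text.
conn.execute("CREATE INDEX idx_element_type ON building_element (element_type)")
conn.executemany(
    "INSERT INTO building_element VALUES (?, ?)",
    [
        (f"element-{i}", "IfcWindow" if i % 5 == 0 else "IfcWall")
        for i in range(1000)
    ],
)


def fetch_windows():
    # fetchall() materialises the full result set, so the measurement covers
    # data retrieval and not only query submission.
    return conn.execute(
        "SELECT global_id FROM building_element WHERE element_type = ?",
        ("IfcWindow",),
    ).fetchall()


mean_seconds = average_execution_time(fetch_windows)
window_count = len(fetch_windows())
```

Passing the query as a callable keeps the harness independent of the database driver, so the same function could wrap a call to a graph database client.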

3.3.2.1. Retrieving data without traversing any relationships. The first use

case to be considered is the retrieval of data from the data sets that do

not require traversing relationships. This is a critical use case to consider

since there are several scenarios in which required building and envi-

ronmental data can be obtained from a single table. In such scenarios,

only a list of physical or abstract features filtered by one or more attri-

butes is needed. For example, a firefighters’ information system might

request a list of windows from a building database to determine a point

of entry into a building. In those cases, retrieving necessary data does

not necessitate connecting multiple tables or nodes to traverse re-

lationships between data points. Hence, this evaluation case compares

the suitability of relational and graph-based database systems for use

cases where the required data is retrieved without traversing any

relationships. The experiment involves executing a series of read

queries, wherein the quantity of retrieved records increases with each

subsequent query. Consequently, the tests offer insights into the per-

formance of the database systems as the retrieved data becomes exten-

sive. Two datasets, a small building dataset and a large city dataset, are

utilised in the tests to evaluate the effect of database size on the per-

formance of the data persistent systems. For the building test case, a set

of queries that request building elements filtered by their element type

attribute are executed. The five queries used for the relational database

and graph database are given in Listings 1 and 2, respectively. Successive

queries progressively retrieve larger volumes of data from their

respective databases. Notably, comparable queries from both sets, such

as the first SQL query and the first Cypher query, yield identical results.

Furthermore, all attributes used as filters in these queries are indexed in

both database systems to optimise query performance.

Listing 1. The successive SQL queries that are used to retrieve data without

traversing any relationships in the small building database.

Fig. 5. Sample extract from the small building graph database presenting the connects path element relationship between building elements along with their label

and properties (the building element nodes are assigned the ‘BuildingElement’ label in addition to the labels visible in the figure).

Listing 2. The successive Cypher queries that are used to retrieve data without

traversing any relationships in the small building database.

For the city test case, a series of queries that retrieve a collection of

buildings filtered based on their district and zone attributes are executed

against the large (Tokyo) databases. The SQL and Cypher queries utilised
for this test are provided in Listings 3 and 4, respectively.
Comparable queries from both sets yield identical results. Successive

queries incrementally retrieve larger volumes of data from their

respective databases. All attributes utilised in these queries are indexed

in both database systems.

Listing 3. The successive SQL queries that are used to retrieve data without

traversing any relationships in the large city database.

Listing 4. The successive Cypher queries that are used to retrieve data without

traversing any relationships in the large city database.

3.3.2.2. Retrieving data by traversing a single relationship. The second

evaluation case considered in this study is the retrieval of data from a

database, which requires the traversal of a single relationship. In

numerous applications that utilise building and environmental data,

required data is often retrieved by traversing a few relationships. For

instance, fire service providers may seek a list of windows along with

their fire rating property from a building data set. In the relational

database created for this research, property sets are stored in a separate

table from the building elements table, hence necessitating a join to

access the required information. Similarly, firefighters often require a

list of fire hydrants located near a building, which requires a join be-

tween building and hydrant tables in our city databases. As a result,

performance tests are conducted in this section to evaluate the

comparative performance of the relational and graph database system in

managing use cases where few relationships need to be traversed to

acquire needed data. To this end, a series of queries are executed that

traverse a single relationship to retrieve data. Each query set gradually

increases the amount of retrieved data to assess its impact on query

performance. The queries are executed against both small and large data

sets to highlight the effect of building and environmental data set size on

the performance of the database systems. The queries executed
against these data sets are as follows:

For the building test case, queries are run to fetch building elements

alongside their corresponding properties, which are stored in separate

tables or nodes. The queries are executed against the large building data

set. The SQL and Cypher queries used for this test are given in Listings 5
and 6. Comparable queries from both query sets return identical results.

All attributes used in these queries are indexed in both database systems.
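A query of this kind can be sketched with Python's built-in `sqlite3` module. The schema below is an assumption made for illustration: the table names `building_element`, `property` and `rel_defines_by_properties`, the column names, and the fire rating values are hypothetical and are not taken from the study's listings. The sketch shows that following one relationship through an association table requires two joins in SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE building_element (
    global_id TEXT PRIMARY KEY,
    element_type TEXT
);
CREATE TABLE property (
    id INTEGER PRIMARY KEY,
    name TEXT,
    value TEXT
);
-- Association table linking elements to their properties.
CREATE TABLE rel_defines_by_properties (
    element_id TEXT REFERENCES building_element(global_id),
    property_id INTEGER REFERENCES property(id)
);
CREATE INDEX idx_element_type ON building_element (element_type);
CREATE INDEX idx_rel_element ON rel_defines_by_properties (element_id);

-- Made-up sample rows for illustration only.
INSERT INTO building_element VALUES
    ('w1', 'IfcWindow'), ('w2', 'IfcWindow'), ('d1', 'IfcDoor');
INSERT INTO property VALUES
    (1, 'FireRating', 'EI30'), (2, 'FireRating', 'EI60'), (3, 'FireRating', 'EI30');
INSERT INTO rel_defines_by_properties VALUES ('w1', 1), ('w2', 2), ('d1', 3);
""")

SINGLE_RELATIONSHIP_SQL = """
SELECT e.global_id, p.value
FROM building_element AS e
JOIN rel_defines_by_properties AS r ON r.element_id = e.global_id
JOIN property AS p ON p.id = r.property_id
WHERE e.element_type = ? AND p.name = ?
ORDER BY e.global_id
"""

# A hypothetical Cypher counterpart, shown for comparison and not executed here.
CYPHER_EQUIVALENT = """
MATCH (e:BuildingElement:IfcWindow)-[:DEFINES_BY_PROPERTIES]->
      (p:Property {name: 'FireRating'})
RETURN e.global_id, p.value
"""

windows_with_rating = conn.execute(
    SINGLE_RELATIONSHIP_SQL, ("IfcWindow", "FireRating")
).fetchall()
```

The relationship that takes two `JOIN` clauses in SQL is a single pattern element in the Cypher form, which anticipates the difference in query composition as more relationships are traversed.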

Listing 5. The successive SQL queries that are used to traverse a single rela-

tionship in the small building database.

Listing 6. The successive Cypher queries that are used to traverse a single

relationship in the small building database.

In the city test cases, queries retrieving a list of buildings adjacent to

a target building are executed. The queries are executed against the

Tokyo city data set. The SQL and Cypher queries used for this test are
given in Listings 7 and 8. Comparable queries from both query sets
return identical results. All attributes used in these queries are indexed in

both database systems.

Listing 7. The successive SQL queries that are used to traverse a single rela-

tionship in the large city database.

Listing 8. The successive Cypher queries that are used to traverse a single

relationship in the large city database.

3.3.2.3. Retrieving data by traversing several relationships. There are

several practical use cases in built environment management where it is

required to traverse several relationships within building and environ-

mental data sets. To illustrate, in the context of building emergency

management, there are situations where it becomes essential to trace

relationships starting from a specific point of interest. For instance, in

the case of indoor navigation, it is often crucial to find a route from the

building’s entry to the room affected by fire, which can be achieved by

utilising space and door adjacency relationships. By tracing the

connections among spaces and doors, fire service providers can effec-

tively manoeuvre from a particular space, such as the entry door, to-

wards the impacted area, or, in the case of rescuing individuals, they can

navigate from the affected space to an exit door. Similarly, when eval-

uating the propagation of smoke and fire from an affected area to the

rest of the building, traversing spatial relationships from the affected

space to the rest of the building becomes necessary.

Similar to the building data sets, environmental data sets can also

contain complex relationships that need to be traversed in order to

retrieve data that is needed for a given use case. For instance, during a

fire hazard, a fire could spread from an affected building to nearby

buildings, and smoke may propagate to adjacent areas. Identifying

nearby high-risk buildings, such as schools, hospitals, or shopping cen-

tres, can facilitate necessary actions to protect the occupants of those

buildings. This identification process requires traversing the adjacency

relationship between buildings, starting from the affected building.

Moreover, the capability to traverse spatial relationships between

environmental features can support firefighters as they explore and

navigate a complex environment to reach an affected building. These

examples underscore the importance of efficiently traversing complex

relationships that often exist in building and environmental data sets.

The tests in this section centre on assessing the influence of

traversing multiple relationships on the performance of relational and

graph database systems. Each test involves executing a sequence of ten

individual queries, with the number of traversed relationships

increasing in consecutive queries. The tests are conducted on both small

and large data sets to understand how performance is influenced by the

scale and complexity of building and environmental data sets. Four tests

are included in this category, each associated with one of the four test

cases.

For the small building data sets, queries that traverse the connection

between adjacent building elements are utilised. The initial three

consecutive SQL and Cypher queries employed for this test are provided

in Listings 9 and 10, with the remaining queries following a similar

pattern.
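The way such successive queries grow can be sketched in Python with the built-in `sqlite3` module. The helper `k_hop_sql`, the table `rel_connects_elements`, and the short element identifiers are assumptions made for illustration and do not reproduce the study's listings; the sample links only mirror the shape of the rows in Table 8. Each additional relationship to traverse adds one more self-join on the association table.

```python
import sqlite3


def k_hop_sql(hops):
    """Build a query that follows `hops` connections through self-joins."""
    if hops < 1:
        raise ValueError("hops must be at least 1")
    joins = "".join(
        f" JOIN rel_connects_elements AS r{i}"
        f" ON r{i}.relating_element = r{i - 1}.related_element"
        for i in range(2, hops + 1)
    )
    return (
        f"SELECT DISTINCT r{hops}.related_element"
        f" FROM rel_connects_elements AS r1{joins}"
        " WHERE r1.relating_element = ?"
        f" ORDER BY r{hops}.related_element"
    )


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE rel_connects_elements (relating_element TEXT, related_element TEXT)"
)
conn.execute(
    "CREATE INDEX idx_relating ON rel_connects_elements (relating_element)"
)
# Directed links between hypothetical elements: A-B, A-C, C-D, C-E, D-F.
conn.executemany(
    "INSERT INTO rel_connects_elements VALUES (?, ?)",
    [("A", "B"), ("A", "C"), ("C", "D"), ("C", "E"), ("D", "F")],
)

# Elements reached from "A" after exactly one, two and three connections.
reached = {
    hops: [row[0] for row in conn.execute(k_hop_sql(hops), ("A",))]
    for hops in (1, 2, 3)
}
```

A Cypher counterpart would typically express the same traversal as a single variable-length path pattern such as `(a)-[:CONNECTS*3]->(b)`, with a hypothetical edge name. Because Cypher does not reuse a relationship within one matched path, its results can differ from a plain self-join when the stored relationships contain cycles or inverse rows.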

Listing 9. The first three queries from the successive SQL queries that are used

to traverse an increasing number of relationships from the small build-

ing database.

Listing 10. The first three queries from the successive Cypher queries that are

used to traverse an increasing number of relationships from the small build-

ing database.

A similar test was carried out against the larger building database

where the connection between building elements is traversed. The SQL

and Cypher queries used for this test are given in Listings 11 and 12.

Listing 11. The first three queries from the successive SQL queries that are

used to traverse an increasing number of relationships from the large build-

ing database.

Listing 12. The first three queries from the successive Cypher queries that are

used to traverse an increasing number of relationships from the large build-

ing database.

In the case of the city databases, queries are executed to traverse the

adjacency relationships between buildings, starting from a specific building. The
SQL and Cypher queries utilised for this purpose are provided in Listings
13 and 14 for the small city dataset and in Listings 15 and 16 for the

larger city database.

Listing 13. The first three queries from the successive SQL queries that are

used to traverse an increasing number of relationships from the small

city database.

Listing 14. The first three queries from the successive Cypher queries that are

used to traverse an increasing number of relationships from the small

city database.

Listing 15. The first three queries from the successive SQL queries that are

used to traverse an increasing number of relationships from the large

city database.

Listing 16. The first three queries from the successive Cypher queries that are

used to traverse an increasing number of relationships from the large city

database.

4. Evaluation results

4.1. Designing and maintaining the database systems

Managing City Data Sets: The following observations are made

from the qualitative assessment of the design and maintenance of the

target database systems.

• Populating the relational and graph-based database systems with
the Tokyo city data set, which exhibits a well-organised structure,
was relatively straightforward and quick.
However, a disparity in the required effort is observed

when relationships are created. In the relational database, a new

association table was created and populated quite quickly. In

contrast, writing relationships to the graph-based database required

more effort as nodes needed to be first queried before the required

relationships were created. Overall, the implementation of the data

model was noticeably more straightforward in the relational data-

base compared to the graph-based database system.

• Additional unstructured data from OpenStreetMap was integrated

into the databases containing the Espoo dataset. This inclusion aimed

to observe how the systems adapt to changes in the data that need to

be stored. Consequently, the relational database that initially stored

the CityGML data underwent significant alterations to accommodate

the new data. Specifically, a new table was created to store the newly

added unstructured building properties separately from the table

containing the structured building address data. In contrast, the

integration of new unstructured datasets is notably straightforward

in the graph database. There, each pre-existing node, which repre-

sents a building, is readily allocated new attributes pertinent to it.

Notably, the data schema within the graph database dynamically

evolves alongside the inserted data. This stands in stark contrast to

the relational database system, where the schema is predefined prior

to data insertion. Consequently, in response to evolving re-

quirements, the relational database schema necessitates redefinition,

imposing additional overhead and complexity. Thus, the graph

database offered increased flexibility in managing the unstructured

data and accommodating changes in the data structure. In contrast,

the relational database ensures a clear and consistent data structure

is maintained by employing a predefined schema that is separated

from the data set.

• The average number of relationships created per second in the
relational and graph-based database systems is
presented in Table 10 for both the Espoo and Tokyo data sets. The results
show how writing performance is affected by the database size. The
relational database system validates the relationships against
foreign key constraints as they are created, and hence, it is significantly
affected by the increase in database size. Meanwhile, the

graph-based database that queries nodes using indexes is minimally

affected by the increase in database size. Consequently, the relational

DBMS exhibited superior write performance when handling a

smaller database but lagged behind the graph DBMS as the database

size expanded.
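The size effect reported in Table 10 can be quantified with a short calculation. The sketch below only restates the published write rates and derives the relative change between the Espoo and Tokyo data sets; the variable and function names are illustrative.

```python
# Relationships written per second, copied from Table 10.
throughput = {
    "Espoo": {"relational": 57_742, "graph": 35_400},
    "Tokyo": {"relational": 15_350, "graph": 33_335},
}


def relative_change(small, large):
    """Fractional change in write rate from the small to the large data set."""
    return (large - small) / small


change = {
    system: relative_change(
        throughput["Espoo"][system], throughput["Tokyo"][system]
    )
    for system in ("relational", "graph")
}

for system, value in change.items():
    print(f"{system}: {value:+.1%}")
```

Under this calculation, the relational write rate falls by roughly 73% between the two data sets, whereas the graph write rate falls by about 6%.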

Managing building data sets: Two use cases were considered in the

study to assess the management of building data from the IFC model in

the target data persistent systems. The following observations are made

from the study.

• The complex interrelationships exhibited in IFC building models are

precisely replicated in the graph-based database system in both use

cases. In contrast, when implementing this data structure in the

relational database, significant modifications were necessary to

transform it into a table-based structure. The implementation of

numerous tables and foreign keys was required to replicate the net-

worked nature of IFC data. Evidently, the complex relationships in

the IFC data are more intuitively represented and are more friendly

for humans to comprehend when using the graph-based database

system’s node-edge structure compared to the tables and foreign

keys used in the relational database system.

• The distinction in how relationships are managed by the two database
systems is also reflected in the structure and performance of
queries that traverse those relationships. Listings 9–16 show the

composition of queries written to traverse the relationships found in

the data set. Although both the SQL and Cypher queries fulfil the

same goal and retrieve identical results, there exists a noteworthy

contrast in their composition and execution. The Cypher queries written for
the graph database are notably more concise than their SQL counterparts,
which makes navigating these relationships
more straightforward in the graph-based database. In contrast,

the relational database system relies on creating multiple joins to

construct the relationship between tables, resulting in verbose and

complex scripts. Furthermore, the graph-based database queries

reduce the cost of traversing relationships by isolating the starting

point of a relationship prior to establishing relationships originating

from it. On the contrary, the SQL queries perform expensive joins on

entire tables before filtering out unneeded data.

4.2. Query performance results

The results of the quantitative evaluation of read query performance

are presented in Tables 11–14. The following paragraphs
discuss these findings.

• Retrieving data without traversing any relationships: As the

experiment results presented in Table 11 demonstrate, when

retrieving data that does not require traversing any relationships, the

relational database demonstrated better performance compared to

the graph-based database across all executed queries. This result is

maintained whether the target data set is small or large in size. It can

also be observed from the results that as the size of data being

retrieved increases, the performance difference between the data

systems increases in a relatively uniform manner. Hence, retrieving a

Table 10

Comparing the time spent writing relationships in both the relational and graph-

based database systems.

City data

sets

No. of

buildings

No. of relationships

(between buildings)

No. of relationships

written per second

Relational

Graph

Espoo 63,042 981,742 57,742 35,400

Tokyo 1,768,233 41,871,173 15,350 33,335

Table 11

Execution time comparison for retrieving data without traversing any relation-

ships from a small building (a) and a large city (b) data set.

(a) (b)

Small Building data set Large City data set

No of

Retrieved

Records

Execution Time (Sec) No of

Retrieved

Records

Execution Time (Sec)

Relational

Graph

Relational

Graph

20 0.004 0.017 150,000 8.85 11.41

100 0.010 0.030 250,000 9.83 17.40

300 0.017 0.046 350,000 10.65 22.05

500 0.023 0.060 450,000 12.52 27.99

900 0.028 0.074 550,000 15.16 34.70

E.D. Guyo and T. Hartmann

Advanced Engineering Informatics 62 (2024) 102582

large number of records exhibited the largest performance difference

in the execution time of the database systems. For instance, Table 11

shows that the performance difference for the smallest number of

records was a fraction of a second, while for retrieving the largest

number of records, it is almost 20 seconds.

•Retrieving data by traversing a single relationship: The results presented in Table 12 demonstrate that test cases requiring the traversal of a single relationship produce largely similar results to the previous test case, where no relationship traversal is required. Here again, the relational database outperforms the graph database regardless of the database size, and the performance difference is more pronounced when retrieving a larger number of records from the data sets.

•Retrieving data by traversing several relationships:

o Small building data set: All queries in this test case were executed successfully in both data persistent systems (Table 13). The increase in relationships traversed by the queries had a negligible impact on their performance, as the increase in execution time proved to be very small. Although the relational database consistently exhibited faster execution times than the graph database, the performance gap was also minute, considering that both systems retrieved the requested data within a fraction of a second (roughly ten milliseconds or less).

o Large building data set: The increase in the size of the building data set has a notably more adverse effect on the relational database than on the graph database, as indicated in Table 13. The first nine queries were executed successfully in the relational database, although performance declined markedly in the later queries. The last query in the set, which required traversing ten relationships, failed to produce results within a 3-hour timeframe (10,800 seconds) and was consequently terminated. In contrast, the graph database returned data for all queries, with only a marginal increase in execution time as the number of relationships increased. Evidently, its performance was minimally impacted by the increase in data set size, as all tests completed in under a second. It is worth noting that when traversing up to three relationships, the two data persistence systems demonstrated comparable performance, with the relational database slightly faster for one and two relationships. Beyond the third relationship level, however, the graph database outperformed the relational database system by increasingly substantial margins.

o Small city data set: As depicted in Table 14, the traversal of interrelated environmental data, even within a small city data set, incurred a performance cost for the relational database. The queries executed quickly when traversing up to three relationships. However, the execution time increased notably when traversing four or more relationships. The test could not be completed for more than five relationship levels, as the execution time ran beyond 3 hours. In contrast, the graph database executed all queries within a second, exhibiting only a marginal increase in execution time as the number of traversed relationships increased.

o Large city data set: The trends observed in the small city data set are confirmed when the same test is conducted on the larger city data set, as presented in Table 14. Only four of the ten tests could be completed for the relational database due to the extensive execution time resulting from traversing relationships, which surfaced earlier this time owing to the larger database size. Notably, there was a significant increase in execution time as the number of relationships increased from three to four. Beyond that point, when the queries involved traversing more than four relationships, the relational database failed to produce results within 3 hours, leading to the termination of the test. In contrast, the graph database maintained a comparable level of performance with only a marginal increase in execution time.
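To make the growth pattern in these results concrete, the short calculation below divides each execution time by the one for the preceding relationship level. The values are copied from Tables 13 and 14 (completed runs only); the helper name `growth_factors` and the variable names are introduced here purely for illustration.

```python
# Execution times in seconds, copied from Tables 13 and 14.
# Position 0 corresponds to one relationship traversed, position 1 to two, etc.
RELATIONAL_LARGE_BUILDING = [
    0.001, 0.002, 0.006, 0.024, 0.174, 0.798, 5.434, 28.305, 209.427,
]
GRAPH_LARGE_BUILDING = [
    0.004, 0.005, 0.005, 0.005, 0.010, 0.013, 0.019, 0.038, 0.076,
]
RELATIONAL_SMALL_CITY = [0.003, 0.023, 0.835, 46.373, 3149.000]


def growth_factors(times):
    """Ratio of each execution time to the one for the previous level."""
    return [later / earlier for earlier, later in zip(times, times[1:])]


for label, series in [
    ("relational, large building", RELATIONAL_LARGE_BUILDING),
    ("graph, large building", GRAPH_LARGE_BUILDING),
    ("relational, small city", RELATIONAL_SMALL_CITY),
]:
    factors = ", ".join(f"x{factor:.1f}" for factor in growth_factors(series))
    print(f"{label}: {factors}")
```

On these figures, each additional relationship multiplies the relational execution time on the large building data set by a factor between 2 and roughly 7.4, whereas the graph execution time never more than doubles from one level to the next.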

5. Discussion

The selection of a suitable data persistent system for a given application requires a thorough consideration of the application’s requirements and the nature of the data to be stored. Each database system possesses distinct strengths and weaknesses that make it suitable for certain tasks while unsuitable for others. Although comparative studies exist for multiple domains, there is a lack of such studies in the context of managing building and environmental data. This research addressed this gap by conducting a comparative evaluation of relational database systems and graph-based database systems, specifically in the context of handling interrelated building and environmental data.

Table 12
Execution time comparison for retrieving data by traversing a single relationship from a small building (a) and a large city (b) data set.

(a) Small building data set
No. of retrieved records   Relational (sec)   Graph (sec)
100                        0.004              0.012
1,000                      0.350              0.818
20,000                     0.558              1.248
30,000                     0.859              1.993
40,000                     1.154              2.853

(b) Large city data set
No. of retrieved records   Relational (sec)   Graph (sec)
5                          0.0008             0.0033
25                         0.0014             0.0054
50                         0.0024             0.0066
75                         0.0034             0.0088
100                        0.0046             0.0102

Table 13
Execution time comparison for traversing an increasing number of relationships: small and large building data sets. Execution times in seconds.

No. of relationships   Small building data set    Large building data set
traversed              Relational   Graph         Relational             Graph
1                      0.0007       0.0033        0.001                  0.004
2                      0.0009       0.0034        0.002                  0.005
3                      0.0009       0.0039        0.006                  0.005
4                      0.0010       0.0039        0.024                  0.005
5                      0.0014       0.0044        0.174                  0.010
6                      0.0015       0.0044        0.798                  0.013
7                      0.0015       0.0058        5.434                  0.019
8                      0.0022       0.0069        28.305                 0.038
9                      0.0033       0.0092        209.427                0.076
10                     0.0056       0.0104        >10 800 (Incomplete)   0.164

Table 14
Execution time comparison for traversing an increasing number of relationships: small and large city data sets. Execution times in seconds; "-" marks tests that were not run.

No. of relationships   Small city data set               Large city data set
traversed              Relational             Graph      Relational             Graph
1                      0.003                  0.014      0.010                  0.015
2                      0.023                  0.041      0.076                  0.063
3                      0.835                  0.062      4.263                  0.138
4                      46.373                 0.099      443.249                0.194
5                      3149.000               0.137      >10 800 (Incomplete)   0.241
6                      >10 800 (Incomplete)   0.157      -                      0.282
7                      -                      0.199      -                      0.276
8                      -                      0.211      -                      0.384
9                      -                      0.233      -                      0.486
10                     -                      0.249      -                      0.554

Comparing database systems in their entirety is an extremely chal-

lenging task due to the countless factors that can impact a system’s

performance. Variables such as data volume, query tuning and optimi-

sation, DBMS configurations, and the requirements of the use case used

for evaluation can all significantly influence the results of any compar-

ison. Recognizing these challenges, this study does not attempt to pro-

vide an exhaustive benchmark that covers all aspects of database

performance. Instead, it narrows its focus to compare the two types of

database systems based on a select set of criteria deemed most relevant.

This study employs query processing time, relationship complexity, and

dataset size as its primary metrics for comparison. Additionally, it in-

cludes a qualitative evaluation centred on the design process and

adaptability to evolving requirements. The study is conducted using

real-world data sets based on real-world use cases. This targeted

approach allows for a more manageable and meaningful comparison

that can yield practical insights for professionals and researchers

working in this specific area. The following sections delve into the

practical implications derived from the comparative study, accompa-

nied by an examination of its limitations and suggestions for future

research directions.

5.1. Practical implications

In the first half of the evaluation, a relational database and a graph

database are designed to store building data extracted from an IFC

building model and environmental data obtained from CityGML city

models and OpenStreetMap. The findings from this evaluation under-

score the critical importance of data structure and interrelationships in

the selection of a database system for storing and managing building and

environmental data. It was observed that the building data derived from

the IFC models, possessing significant levels of interrelationships,

particularly on the larger data set, was well represented using the graph

database. The graph database’s inherent flexibility in managing inter-

connected data contrasts sharply with the relational system, wherein

relationships were split into multiple tables, potentially complicating

data retrieval and analysis. When it comes to environmental data, CityGML yielded relatively structured data with limited interrelationships, which was efficiently stored in relational tables. However, the introduction of additional environmental data from OpenStreetMap presented significant challenges due to its unpredictable structure and

variable data fields. The relational database required substantial modi-

fications to integrate this data, as it deviated significantly from the

original schema designed during the initial database design. In stark

contrast, the graph database, with its flexible nature, enabled a more

seamless integration of this diverse and unstructured data. Based on

these findings, it can be inferred that the choice between a relational or

graph database should be guided by the nature of the data to be stored.

For building and environmental data with intricate interrelationships

and evolving structures, the flexible and adaptive nature of graph da-

tabases offers a distinct advantage. In contrast, relational databases may

be more suitable for data with a well-defined, stable structure and fewer

interconnections. This insight is crucial for engineers, urban planners and other practitioners who rely on the efficient management of building and environmental data, and it calls for thoughtful consideration of their data’s inherent structure.

The second phase of the evaluation was centred on the extraction of

necessary data from the database systems by efficiently traversing relationships. The test results indicate that, for tasks requiring minimal or no relationship traversal, the relational database demonstrates superior performance, irrespective of data set size. This makes relational databases particularly well-suited for applications with straightforward data retrieval needs, where the complexity of relationships between data entities is minimal.

However, as the number of relationships traversed by queries increases,

the performance advantage often shifts to the graph database since the

relational database’s performance is negatively impacted by each

additional relationship it needs to traverse. This is due to the inherent

nature of labelled property graph databases, which are optimised for

handling interconnected data, thereby minimizing the performance

degradation associated with complex relationship traversals. However,

the size of the datasets influences the extent to which the performance of

the database systems is affected by relationship traversal. When

employing a small building dataset, the relational database continues to

exhibit better performance than the graph database. This suggests that

for smaller-scale building data sets, relational databases might still offer

the best performance, even when relationships are a component of the

data model. Conversely, when dealing with large building and environmental datasets, the performance of relational databases is significantly compromised as the number of relationships traversed increases. In extreme cases, especially when traversing several relationships within city datasets, the relational database failed to return results within a reasonable timeframe (3 hours in these tests). This underscores the importance of graph databases for use cases with large-scale, relationship-intensive data sets, where the ability to efficiently navigate complex networks of relationships is paramount.

It’s important to clarify at this point that this study is not advocating

for the outright dismissal of relational databases from managing inter-

connected data. Practitioners can employ various strategies to minimize

or avoid the use of joins in relational database systems to traverse re-

lationships. Such techniques can include merging tables or incorpo-

rating redundant data within tables to eliminate the necessity for joins,

storing the precomputed results of complex queries that involve joins to

obviate the need for real-time execution, and caching the results of

frequently executed joins. Additionally, designing the database schema

to enable common data requests to be fulfilled with fewer joins and

optimising queries for efficiency are other crucial strategies. All these

approaches, however, often entail trade-offs with regard to database

integrity, system complexity, storage requirements, and other factors.

Consequently, this study encourages practitioners to adopt alternative

technologies that are better suited for traversing complex relationships,

namely the labelled property graph.
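One of the strategies listed above, storing the precomputed result of a join-heavy query, can be sketched as follows. This is an illustration under stated assumptions rather than part of the study: it uses Python's built-in sqlite3 module, and the table names `adjacent` and `two_hop`, the index name and the sample rows are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE adjacent (src TEXT, dst TEXT)")
conn.executemany(
    "INSERT INTO adjacent VALUES (?, ?)",
    [("A", "B"), ("B", "C"), ("C", "D"), ("A", "E"), ("E", "D")],
)


def refresh_two_hop(conn):
    """Recompute and store the result of the two-level self-join."""
    conn.execute("DROP TABLE IF EXISTS two_hop")
    conn.execute(
        "CREATE TABLE two_hop AS "
        "SELECT DISTINCT e1.src AS src, e2.dst AS dst "
        "FROM adjacent AS e1 JOIN adjacent AS e2 ON e2.src = e1.dst"
    )
    conn.execute("CREATE INDEX idx_two_hop_src ON two_hop (src)")


def two_hop_neighbours(conn, start):
    """Read the stored result; no join is executed at query time."""
    rows = conn.execute("SELECT dst FROM two_hop WHERE src = ?", (start,))
    return {row[0] for row in rows}


refresh_two_hop(conn)
print(two_hop_neighbours(conn, "A"))

# The trade-off: the stored table is stale until it is refreshed again.
conn.execute("INSERT INTO adjacent VALUES ('D', 'F')")
stale = two_hop_neighbours(conn, "C")  # does not yet see the new relationship
refresh_two_hop(conn)
fresh = two_hop_neighbours(conn, "C")  # now includes the new two-hop result
print(stale, fresh)
```

The stale read after the insert illustrates the integrity and maintenance trade-off mentioned above: the lookup is cheap, but the stored result must be kept in step with the base table.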

This study has demonstrated that the performance of relational

database systems and graph database systems varies significantly when

utilised for different tasks, underscoring that the choice of a data storage

solution is not a one-size-fits-all decision. Although the study discusses

the optimum selection of database systems, it is important to recognise

that in real-world scenarios, the complexity of software systems often

necessitates the integration of multiple data persistence technologies.

Each technology in such system architecture is chosen for its optimal

alignment with particular requirements. In a collaborative environment,

a relational database system could manage structured data with a clearly

defined schema, while a graph database could be used to manage

complex relationships between data points. Therefore, the overarching

aim of this study is not to suggest the superiority of one system over

another but rather to highlight the importance of strategic selection of

data persistence systems based on specific requirements within a

collaborative environment. This collaborative approach allows for the

harnessing of each database system’s strengths, thereby enabling a more

efficient software infrastructure that can meet varying requirements.

5.2. Limitations and future work

While this comparative study endeavours to provide valuable in-

sights regarding the comparative advantages and limitations of data

persistent systems, it also has some known limitations with regard to the

data sets utilised, the configuration of the DBMSs, the optimisation of

the queries and the assessment metrics. The building and environmental

data sets employed in the study are exclusively comprised of static data.

Nonetheless, a significant volume of real-time data is generated at the

building and city levels. For example, the sensor and detector systems in

buildings produce live data and status updates related to various envi-

ronmental factors like temperature, air quality, and potential hazards.

Moreover, building management systems responsible for monitoring

HVAC systems, fire safety systems, and other utility systems can also

generate a substantial amount of live data [38]. Similarly, city-level

sensors generate a significant volume of live data, catering to diverse

use cases such as traffic management and public safety [39]. Consequently, future research should assess relevant database systems with regard to managing real-time data.

In acknowledging the factors that can influence the outcomes of the

database comparison presented in the study, it’s crucial to highlight the

significance of decisions related to database configuration and query

optimisation. The extent of tuning and optimisation applied to the target

database systems can potentially lead to disparities that may not solely

reflect the inherent capabilities of the database technologies themselves.

Efforts were made in this study to apply comparable optimisations to

each database system under review, such as the implementation of

indexing, as detailed in Section 3.1. However, achieving identical levels

of tuning and optimisation is challenging due to the inherent differences

in the database systems’ underlying technologies, architecture, and

available optimisation settings. Moreover, the decisions related to

configuration and optimisation inevitably contain a subjective element

that is influenced by the experience and expertise of the practitioner.

Therefore, it is possible to take a different approach towards database

configuration and query optimisation than what is employed in this

study, potentially impacting some of the findings presented here.
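The influence of one such optimisation decision, adding an index on a column used to look up relationships, can be inspected through a query plan. The sketch below is an illustration using Python's built-in sqlite3 module rather than the MySQL or Neo4j configurations of the study; the table name, index name and generated rows are hypothetical, and the wording of the plan differs between database systems and versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE adjacent (src INTEGER, dst INTEGER)")
conn.executemany(
    "INSERT INTO adjacent VALUES (?, ?)",
    [(i, (i * 7 + 1) % 1000) for i in range(1000)],
)

QUERY = "SELECT dst FROM adjacent WHERE src = ?"


def plan(conn, sql, params):
    """Return the human-readable steps of SQLite's query plan."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]  # the last column holds the description


plan_without_index = plan(conn, QUERY, (42,))
conn.execute("CREATE INDEX idx_adjacent_src ON adjacent (src)")
plan_with_index = plan(conn, QUERY, (42,))

print(plan_without_index)  # a full scan of the table
print(plan_with_index)     # a search that names idx_adjacent_src
```

The same query is answered by a table scan before the index exists and by an index search afterwards, which is why comparable indexing across the compared systems matters for a fair measurement.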

An additional limitation of the study that is open for future research

is the measurement metrics and the exemplary use cases used for the

evaluation. The metrics that are the focus of this study are the agreement

of the data model of the database systems with the data to be stored, the

effort required to create and populate the database systems, the

composition of queries in each system and the time required to write and retrieve interrelated data from the database systems. Nonetheless,

there are more indicators that should be considered for a full evaluation

of data persistent systems. For instance, transaction-related character-

istics play a pivotal role in use cases where numerous users are expected

to access the database simultaneously. The utilisation of mixed load

queries in database comparison is crucial for accurately simulating real-

world conditions where databases are required to handle simultaneous

read and write operations. This approach not only tests database sys-

tems’ efficiency in managing concurrent data access and manipulation

but also can reveal potential bottlenecks and performance trade-offs

between read and write operations. With an increasing number of

users, the database systems’ throughput, which is the number of queries

they can execute per unit of time, becomes an important indicator.

Furthermore, as the size of data sets grows, the scalability of the target

data persistent system becomes imperative. Hence, data partitioning and

distribution are also important factors. The inclusion of these and other

measurement metrics is essential for a fuller comparison of the database

systems. However, providing a comprehensive and thorough explora-

tion of the implications of each of the listed metrics would have

extended beyond the confines of a single article. Consequently, by

concentrating on processing time and database size for quantitative

analysis, this study is able to provide a sufficiently detailed presentation

of findings without compromising the rigour of analysis. In addition to

the measurement metrics, the use case scenarios that are used to guide

the study could also be expanded. The study primarily concentrated on

use cases that highlight the impact of complex data relationships on the

performance of data-persistent systems. Nevertheless, it is acknowledged that the management of building and environmental data within a data-persistent system can vary with the nature of the application area under consideration.
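Throughput under a mixed read and write load, as described above, could be measured along the following lines. This is a minimal single-connection sketch using Python's built-in sqlite3 module, not the set-up of the study: it interleaves reads and writes on one connection instead of issuing them from concurrent clients, and the `reading` table, the function name `run_mixed_load` and the workload parameters are all hypothetical.

```python
import random
import sqlite3
import time


def run_mixed_load(n_operations, read_ratio, seed=0):
    """Interleave reads and writes and report operations per second."""
    rng = random.Random(seed)  # fixed seed keeps the workload repeatable
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE reading (sensor INTEGER, value REAL)")
    reads = writes = 0
    started = time.perf_counter()
    for _ in range(n_operations):
        sensor = rng.randrange(100)
        if rng.random() < read_ratio:
            # Read operation: aggregate the values stored for one sensor.
            conn.execute(
                "SELECT AVG(value) FROM reading WHERE sensor = ?", (sensor,)
            ).fetchone()
            reads += 1
        else:
            # Write operation: store a new value for one sensor.
            conn.execute(
                "INSERT INTO reading VALUES (?, ?)", (sensor, rng.random())
            )
            writes += 1
    elapsed = time.perf_counter() - started
    conn.close()
    throughput = n_operations / elapsed if elapsed > 0 else float("inf")
    return {
        "reads": reads,
        "writes": writes,
        "elapsed": elapsed,
        "throughput": throughput,
    }


result = run_mixed_load(n_operations=1000, read_ratio=0.8)
print(f"{result['reads']} reads, {result['writes']} writes, "
      f"{result['throughput']:.0f} operations per second")
```

A fuller benchmark would issue the operations from several concurrent clients and repeat the measurement for different read ratios, since the balance between reads and writes is itself one of the variables of interest.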

The scope of the assessment conducted in this study is focused on the

data models utilised by database systems, as opposed to the database

management systems (DBMS) themselves, namely MySQL and Neo4j.

Nevertheless, it is imperative to acknowledge that the DBMS that is

selected to implement the data models can have some level of influence

on performance. DBMSs offer different query optimisation techniques

which aim to enhance the efficient utilisation of time and resources

during query execution [40]. Moreover, DBMSs offer varying types of

indexes aimed at optimising query execution alongside diverse ap-

proaches for the implementation of these indexes [14]. The variation in

these features among different DBMSs can lead to disparate performance

outcomes, even when they implement similar data models. As a result, selecting DBMSs other than those utilised in this study may influence the outcomes to some extent. Consequently, a holistic evaluation of a given database system should consider

these optimisation features alongside the characteristics of the funda-

mental data model implemented by the system. Furthermore, beyond

resource optimisation, there are more facets of DBMSs that warrant

thorough consideration. For instance, considering the security features

provided by a DBMS for safeguarding data in both storage and transit is

of paramount importance, given that database users typically expect

their data to remain private and subject to access control [41]. It is also

essential to evaluate the attributes of the DBMS aimed at mitigating data

loss through mechanisms such as backup and recovery in the event of

system failures [42]. Moreover, factors such as overall operational cost,

compatibility and integration capability with other software systems,

and the extent of support available from both vendors and the broader

community are also important considerations prior to selecting a spe-

cific DBMS.

This comparative study employed a specific type of graph database

technology known as labelled property graphs. While this technology

offers significant advantages in terms of flexibility and intuitive

modelling of graph-based data, it is important to consider other graph

technologies. Among these alternatives, RDF triple stores represent a

promising direction. RDF inherently utilises a graph data model, facili-

tating the representation of complex relationships among data entities in

a manner akin to labelled property graphs. Simultaneously, RDF enables

the definition of standardized schemas through ontologies, offering a

balance between the rigid schemas of relational databases and the

schema-less nature of labelled property graphs.

In future research, it would be beneficial to explore the application of suitable database systems to various use cases related to the management of the built environment. Relational databases could

be evaluated for their efficiency in handling structured data, whereas

graph-based systems could be assessed for their effectiveness in

modelling and navigating complex relationships within infrastructure

systems. Further investigations could also focus on hybrid approaches

that leverage the strengths of both database systems. This approach has

the potential to unlock new insights and methods for handling the

increasingly complex data challenges faced in the built environment

management domain.

6. Conclusion

This research provides a comparative assessment of graph-based

database systems and relational database systems in terms of man-

aging building and environmental data. The study is conducted using

four data sets, including two building data sets and two city data sets.

The evaluation focused on a qualitative assessment of data organisation

and maintenance as well as a quantitative evaluation of performance

when traversing relationships and retrieving data from the target data-

base systems. To assess the performance of data retrieval, experiments

are conducted with the number of relationships traversed and the size of

the database as key parameters. The following conclusions are derived

from the findings of the comparative study.

•The conclusions drawn from this evaluation highlight the paramount

significance of data structure and interrelationships when selecting a

database system to store and manage building and environmental

data. In scenarios involving intricate interrelationships and dynamic

structural changes within data sets, the inherent flexibility and

adaptability of graph databases confer a distinct advantage.

Conversely, relational databases may prove more apt for persisting

data characterized by a well-defined, stable structure and fewer

interconnections.

•The study has shown a significant variance in performance between

relational database systems and graph database systems when

applied to different tasks. This emphasizes that selecting a data

storage solution is not a one-size-fits-all decision. The study high-

lights the importance of graph databases for scenarios involving

large-scale, relationship-intensive datasets, where efficiently navi-

gating complex networks of relationships is crucial. Conversely,

relational database technology is better suited for data management

applications with straightforward data retrieval needs, where the

complexity of relationships between data entities is minimal.

Effective management of building and environmental data can sup-

port several decision-making processes in the management of the built

environment. This necessitates the careful selection of a suitable data

persistence system that can effectively organise data, facilitating its

seamless access and utilisation for the intended use case. This study has

made a valuable contribution towards this objective by conducting an

assessment of relational and graph-based database systems, highlighting

their relative advantages and limitations in managing building and

environmental data. The findings can serve as a valuable resource for

selecting an appropriate data persistence system to manage building and

environmental data effectively. Subsequent investigations may further enrich this field by applying suitable data persistent systems to various practical use cases related to the built environment.

CRediT authorship contribution statement

Eyosias Dawit Guyo: Writing – review & editing, Writing – original

draft, Visualization, Validation, Software, Resources, Methodology,

Investigation, Formal analysis, Data curation, Conceptualization. Timo

Hartmann: Writing – review & editing, Supervision, Project adminis-

tration, Funding acquisition.

Declaration of competing interest

The authors declare that they have no known competing financial

interests or personal relationships that could have appeared to influence

the work reported in this paper.

Data availability

Data will be made available on request.

Acknowledgement

This project received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No 860555.

References

[1] The Sedona Conference, 2020. The Sedona Conference Glossary: eDiscovery &

Digital Information Management, Fifth. ed.

[2] C.J. Date, The Relational Database Dictionary, Apress, Berkeley, CA. (2008),

https://doi.org/10.1007/978-1-4302-1042-9.

[3] R. Elmasri, S.B. Navathe, Fundametals of Database Systems, Sixth. ed., Addison-

Wesley, 2011.

[4] I. Robinson, J. Webber, E. Eifrem, Graph Databases: New Opportunities for

Connected Data, 2nd ed., O’Reilly Media Inc., 2015.

[5] Murty, P.S.R., 2017. Power Systems Analysis, 2nd ed. Elsevier. https://doi.org/

10.1016/B978-0-08-101111-9.00002-1.

[6] Harrington, J.L., 2016. Relational Database Design and Implementation, Fourth.

ed.

[7] M.J. Donahoo, G.D. Speegle, SQL: Practical Guide for Developers, Elsevier (2005),

https://doi.org/10.1016/B978-0-12-220531-6.X5000-3.

[8] P. Sadalage, M. Fowler, NoSQL Distilled: A Brief Guide to the Emerging World of

Polyglot Persistence, Pearson Education Inc, Vasa, 2012.

[9] Harrington, J.L., 2002. Relational Database Design Clearly Explained, 2nd ed.

Elsevier. https://doi.org/10.1016/B978-1-55860-820-7.X5000-4.

[10] O. Cur´

e, G. Blin, RDF Database Systems: Triples Storage and SPARQL Query

Processing, Elsevier (2015), https://doi.org/10.1016/C2013-0-14009-3.

[11] A. Vukotic, N. Watt, T. Abedrabbo, D. Fox, Neo4j in Action, Manning Publications,

Shelter Island, 2015.

[12] Carpenter, J., Hewitt, E., 2016. Cassandra: The Definitive Guide, Second. ed.

[13] J.L. Harrington, SQL Clearly Explained, Third. Ed. Elsevier. (2010), https://doi.

org/10.1016/C2009-0-61592-0.

[14] T. Halpin, T. Morgan, Information Modeling and Relational Databases, Second. Ed.

Elsevier. (2008), https://doi.org/10.1016/B978-0-12-373568-3.X5001-2.

[15] G. Vaish, Getting Started with NoSQL, Packt Publishing, 2013.

[16] D. Pritchett, BASE: An Acid Alternative, Queue 6 (2008) 48–55, https://doi.org/

10.1145/1394127.1394128.

[17] J. Antas, R. Rocha Silva, J. Bernardino, Assessment of SQL and NoSQL Systems to

Store and Mine COVID-19 Data, Computers 11 (2022) 29, https://doi.org/

10.3390/computers11020029.

[18] M.M. Eyada, W. Saber, M.M. El Genidy, F. Amer, Performance Evaluation of IoT

Data Management Using MongoDB Versus MySQL Databases in Different Cloud

Environments, IEEE Access 8 (2020) 110656–110668, https://doi.org/10.1109/

ACCESS.2020.3002164.

[19] H. Matallah, G. Belalem, K. Bouamrane, Comparative Study Between the MySQL

Relational Database and the MongoDB NoSQL Database, Int. J. Softw. Sci. Comput.

Intell. 13 (2021) 38–63, https://doi.org/10.4018/IJSSCI.2021070104.

[20] C.A. Gy˝

or¨

odi, D.V. Dums¸e-Burescu, D.R. Zmaranda, R.S

¸. Gy˝

or¨

odi, G.A. Gabor, G.

D. Pecherle, Performance Analysis of NoSQL and Relational Databases with

CouchDB and MySQL for Application’s Data Storage, Appl. Sci. 10 (2020) 8524,

https://doi.org/10.3390/app10238524.

[21] V. Abramova, J. Bernardino, P. Furtado, SQL or NoSQL? Performance and

scalability evaluation, Int. J. Bus. Process Integr. Manag. 7 (2015) 314, https://doi.

org/10.1504/IJBPIM.2015.073655.

[22] P. Kotiranta, M. Junkkari, J. Nummenmaa, Performance of Graph and Relational

Databases in Complex Queries, Appl. Sci. 12 (2022), https://doi.org/10.3390/

APP12136490.

[23] V. Abramova, J. Bernardino, P. Furtado, Experimental Evaluation of NoSQL

Databases, Int. J. Database Manag. Syst. 6 (2014) 01–16, https://doi.org/10.5121/

ijdms.2014.6301.

[24] H. Matallah, G. Belalem, K. Bouamrane, Experimental comparative study of NoSQL

databases: HBase versus MongoDB by YCSB, Comput. Syst. Sci. Eng. 32 (2017)

307–317.

[25] D. De Witte, F. Pattyn, L. De Vocht, H. Constandt, E. Mannens, R. Verborgh, K. Knecht, R. Van De Walle, Big Linked data ETL benchmark on cloud commodity hardware, in: Proceedings of the ACM SIGMOD International Conference on Management of Data, Association for Computing Machinery, 2016, https://doi.org/10.1145/2928294.2928304.

[26] P. Pauwels, T.M. de Farias, C. Zhang, A. Roxin, J. Beetz, J. De Roo, C. Nicolle, A performance benchmark over semantic rule checking approaches in construction industry, Adv. Eng. Informatics 33 (2017) 68–88, https://doi.org/10.1016/j.aei.2017.05.001.

[27] DB-Engines, 2023. DB-Engines Ranking - popularity ranking of relational DBMS

[WWW Document]. URL https://db-engines.com/en/ranking/relational+dbms

(accessed 6.17.23).

[28] City of Espoo, 2023. Espoo’s 3D city model [WWW Document]. URL https://kartat.espoo.fi/3d/citymodel_en.html.

[29] buildingSMART International, 2023. Software Implementations [WWW Document]. URL https://technical.buildingsmart.org/resources/software-implementations/ (accessed 9.11.23).

[30] buildingSMART International, 2017. Industry Foundation Classes 4.0.2.1 [WWW Document]. URL https://standards.buildingsmart.org/IFC/RELEASE/IFC4/ADD2_TC1/HTML/ (accessed 3.9.21).

[31] Wysocki, O., Schwab, B., Willenborg, B., 2022. Awesome CityGML [WWW

Document]. URL https://github.com/OloOcki/awesome-citygml.

[32] OpenStreetMap, 2023. OpenStreetMap [WWW Document]. URL https://www.openstreetmap.org/about (accessed 6.17.23).

[33] J. Jokar Arsanjani, A. Zipf, P. Mooney, M. Helbich, An Introduction to

OpenStreetMap in Geographic Information Science: Experiences, Research, and

Applications. (2015) 1–15, https://doi.org/10.1007/978-3-319-14280-7_1.

[34] Mooney, P., Minghini, M., 2017. A Review of OpenStreetMap Data, in: Mapping and the Citizen Sensor. Ubiquity Press, London, pp. 37–59. https://doi.org/10.5334/bbf.c.

[35] Costa, G., Sicilia, Á., Madrazo, L., Scaramella, L., Martín, S., Izkara, J., Prieto, I., Katsigarakis, K., 2017. D2.6: Validation of the district data model repository and exchange protocols.

[36] S. Roman, Access Database Design & Programming, 3rd ed., O’Reilly Media Inc., 2002.

[37] Hunger, M., Boyd, R., Lyon, W., 2021. The Definitive Guide to Graph Databases for

the RDBMS Developer 34.

[38] Y.Y. Ghadi, M.G. Rasul, M.M.K. Khan, Design and development of advanced fuzzy logic controllers in smart buildings for institutional buildings in subtropical Queensland, Renew. Sustain. Energy Rev. 54 (2016) 738–744, https://doi.org/10.1016/j.rser.2015.10.105.

[39] J. Zhao, H. Xu, H. Liu, J. Wu, Y. Zheng, D. Wu, Detection and tracking of

pedestrians and vehicles using roadside LiDAR sensors, Transp. Res. Part C Emerg.

Technol. 100 (2019) 68–87, https://doi.org/10.1016/j.trc.2019.01.007.

[40] Hellerstein, J., 2015. Chapter 7: Query Optimization, in: Bailis, P., Hellerstein, J.M., Stonebraker, M. (Eds.), Readings in Database Systems.

[41] Ramakrishnan, R., Gehrke, J., 2003. Database Management Systems.

[42] Bailis, P., 2015. Chapter 3: Techniques Everyone Should Know, in: Bailis, P., Hellerstein, J.M., Stonebraker, M. (Eds.), Readings in Database Systems.

[43] Amazon Web Services, 2023. Amazon DocumentDB: Developer Guide.

[44] Bell, D.A. (Ed.), 1986. Relational Databases, Relational Databases. Elsevier.

https://doi.org/10.1016/C2013-0-03808-X.

[45] International Organisation for Standardisation (ISO), 2018. ISO 16739-1:2018

Industry Foundation Classes (IFC) for data sharing in the construction and facility

management industries — Part 1: Data schema.

[46] Ministry of Land, Infrastructure, Transport and Tourism, 2023. 3D city model (Project PLATEAU) 23 wards of Tokyo [WWW Document]. URL https://www.geospatial.jp/ckan/dataset/plateau-tokyo23ku (accessed 6.17.23).

[47] Open Geospatial Consortium (OGC), 2021. OGC City Geography Markup Language

(CityGML) Part 1: Conceptual Model Standard.

[48] The World Wide Web Consortium (W3C), 2014. RDF 1.1 Concepts and Abstract

Syntax [WWW Document]. URL https://www.w3.org/TR/rdf11-concepts/

(accessed 11.11.23).
