Analysis of Wireless Measurements and Sensor Data for Autonomous Mobility in Industrial Environments

Bachelor's Thesis

Herdy Herlianto

# XXXXXX

09.12.2022

Supervisors: Dr.-Ing. Martin Kasparick

Rodrigo Hernangómez

Examiners: Prof. Dr.-Ing. Slawomir Stanczak

Prof. Dr.-Ing. Wilhelm Keusgen

Technische Universität Berlin

Fakultät IV – Elektrotechnik und Informatik

Institut für Telekommunikationssysteme

Fachgebiet Network and Information Theory (NetIT)

Statutory Declaration

I hereby declare that I have produced this thesis independently and by my own hand, without unauthorized outside help, and using only the sources and aids listed.

Berlin, 09.12.2022

Herdy Herlianto

Herdy Herlianto, # XXXXXX iii

Abstract

This thesis analyzes data from the Enway Campaign, which was conducted in 2021 and offered as a bachelor's thesis by Fraunhofer HHI.

The Enway Campaign is part of the AI4Mobile research project, which is funded by the Federal Ministry of Education and Research under the umbrella of the Artificial Intelligence in Communication Networks plan within the scope of the German Federal Government's High-Tech Strategy. The project intends to develop methodologies that provide sustainable Quality of Service (QoS) prediction at high mobility by combining data from the entire mobile network with secondary data from vehicles, applications, or the environment. The result of this campaign, several sets of wireless measurement and sensor data, will be used for machine learning activities. In addition, the sensor data are processed to create a clearance model that provides the prediction target for the machine learning classifier.

This thesis analyzes the sensor data to derive Line-of-Sight (LoS) labels from the clearance model using the Fresnel Zone Clearance Model, and builds classifiers that predict LoS from the wireless measurements using machine learning methods such as Decision Tree, Random Forest, and Extreme Gradient Boosting (XGB).


Zusammenfassung (Summary)

This thesis is an analysis of data from the Enway Campaign, which was conducted in 2021 and offered as a bachelor's thesis by Fraunhofer HHI.

The Enway Campaign is part of the AI4Mobile research project, which is funded by the Federal Ministry of Education and Research (Bundesministerium für Bildung und Forschung, BMBF) under the umbrella of the Artificial Intelligence in Communication Networks plan within the High-Tech Strategy of the German Federal Government. The project intends to develop methods that enable sustainable QoS prediction at high mobility by combining data from the entire mobile network with secondary data from vehicles, applications, or the environment. The result of this campaign, in the form of several wireless measurement and sensor data sets, is used for machine learning activities. In addition, the sensor data are processed to create a clearance model that serves as the prediction target of the machine learning classifier.

This bachelor's thesis deals with the analysis of sensor data to derive the Line-of-Sight (LoS) prediction from the clearance model using the Fresnel Zone Clearance Model, and with the creation of a classifier for LoS prediction from the wireless measurements using machine learning classifiers such as Decision Tree, Random Forest, and XGB.


Contents

List of Acronyms and Abbreviations
List of Figures
List of Tables
1 Introduction
2 Theoretical Background
2.1 Clearance Model
2.2 Data Prediction/Classification
2.2.1 Decision Tree
2.2.2 Random Forest
2.2.3 Extreme Gradient Boosting (XGB)
3 Data Source
4 Evaluation Methodology
4.1 Sensor Data
4.2 Wireless Measurement
4.3 Workflow
5 Prediction Result
6 Summary and Conclusions
Bibliography


List of Acronyms and Abbreviations

3GPP 3rd Generation Partnership Project

AGV Automated Guided Vehicle

AI Artificial Intelligence

AT Attention

AUC Area Under ROC Curve

BMBF Bundesministerium für Bildung und Forschung

BS Base Station

EPC Evolved Packet Core

LiDAR Light Detection and Ranging

LoS Line-of-Sight

LTE Long Term Evolution

MIMO Multiple Input Multiple Output

ML Machine Learning

nLoS non-Line-of-Sight

PHY Physical

QoS Quality of Service

ROC Receiver Operating Characteristic

RSRP Reference Signal Received Power

RSRQ Reference Signal Received Quality

RSSI Received Signal Strength Indication

SINR Signal-to-Interference-plus-Noise Ratio

SLAM Simultaneous Localization and Mapping

TDD Time-Division Duplexing

UE User Equipment

XGB Extreme Gradient Boosting


List of Figures

2.1 Fresnel Zone [Jcm12]
2.2 Ellipse [Ag217]
2.3 Decision Tree [Pro]
2.4 Random Forest [unk]
2.5 XGB [WCC20]
3.1 Autonomous Cleaning Sweeper Provided by Enway [Gar+22]
3.2 Route (black) of sweeper for several rounds of measurements including indoor (green) and outdoor (blue). The position of the base station provided by Götting (right picture) is marked by the yellow cross [Gar+22].
3.3 Dynamic Obstacle Maps [Gar+22]
4.1 Reference Signal Received Power (RSRP) Comparison
4.2 Reference Signal Received Quality (RSRQ) Comparison
4.3 Static Map
4.4 Static Map with Fresnel Zone
4.5 Heat Map
4.6 LoS and non-Line-of-Sight (nLoS) Area
5.1 Top Section of the Decision Tree with Feature Distance
5.2 Top Section of the Decision Tree without Feature Distance
5.3 Area Under ROC Curve (AUC) Score Comparison
5.4 AUC Scores as the Number of Nodes Increases


List of Tables

4.1 Topic Table from the .bag File
5.1 Feature Importance in Decision Tree
5.2 Feature Importance in Random Forest
5.3 Feature Importance in XGB


1 Introduction

An Automated Guided Vehicle (AGV) is a mobile robot that is often used in factories, warehouses, and inventories across several industries. AGVs serve as a distribution system for the processing and transfer of various commodities. They automate the workplace, reducing the amount of time and work needed [Kar+16]. Nearly all manufacturing facilities operate AGVs, as do businesses in paper and printing, food and beverage, and other sectors [Cha+19].

The data used in this work were collected by an AGV in the form of an autonomous sweeper from Enway [Gar+22]. While the AGV drove along a dedicated path, it recorded data from its sensors as well as wireless measurements of the link between the AGV and the base station. In many use cases of AGVs in industrial environments, the data captured by the AGV can be used to predict the QoS. One example is the cooperation between AGVs and stationary robots, where a stationary robot loads and unloads products to and from an AGV without the AGV having to stop [Kul+21]. To synchronize the load handling and identify the location of the AGV, additional sensors, such as camera systems, are currently used. With the data provided by the AGV, the load handling could instead be synchronized with the help of a QoS prediction, which indicates how reliable the communication link will be. As a result, extra specialized sensors, such as cameras with image analysis at the stationary robot, could be avoided, potentially cutting costs. The prediction of QoS is also beneficial for use cases such as Traffic Management and Velocity Adaptation for AGVs in a factory building, where the main idea is to manage the traffic between several AGVs [Kul+21]. Through the prediction of QoS, information such as link stability, anticipatory velocity, and route adaptation becomes available, which enables the central entity to identify the best decision for the AGV and achieve higher efficiency. Furthermore, it enables more AGVs to operate simultaneously, which speeds up the transfer of products.

The focus of this thesis is to analyze the potential of data collected by an AGV as an enabler of Artificial Intelligence (AI) for telecommunication in industrial environments. The data from the AGV are used to predict LoS between the AGV and the base station. First, the sensor data are analyzed to derive LoS using the Fresnel Zone Clearance Model. The sensor features used for this purpose are the position of the sweeper as given by odometry techniques and a static map. A Fresnel Zone is constructed between each point of the odometry path and the base station. Then, the obstacles inside the Fresnel Zone are summed, and the values in the transition area between LoS and non-Line-of-Sight (nLoS) are examined. The value found in the transition area becomes the threshold that determines LoS. Once the LoS area is known, the prediction of LoS from wireless measurement features gathered by the AGV, such as Reference Signal Received Power (RSRP), Reference Signal Received Quality (RSRQ), Received Signal Strength Indication (RSSI), and Signal-to-Interference-plus-Noise Ratio (SINR), is analyzed. LoS is predicted by Machine Learning (ML) classifiers, namely a Decision Tree, a Random Forest, and XGB, with the wireless measurements from the AGV as input.


2 Theoretical Background

2.1 Clearance Model

In an industrial environment, the path that the AGV drives along is predetermined. However, the environment between the AGV and the base station may change due to obstacles, which can disturb the communication channel. To analyze the disturbance, the path clearance between the AGV and the base station is commonly evaluated by applying the Fresnel Zone Clearance Model. The fraction of the Fresnel Zone that is obscured by obstacles can be used to determine the availability of LoS [JHJ15].

A Fresnel Zone is a buffer shaped like a prolate spheroid (an ellipsoid that is "pointy" instead of "squashed") around the LoS path between two objects, as seen in Figure 2.1. Its radius varies with the distance from the objects and with the transmission frequency. It is largest at the center point, which is the point equally far from both objects, and shrinks toward the two objects [Bro+20].

Figure 2.1: Fresnel Zone [Jcm12]

A Fresnel Zone is made up of multiple layers or zones depending on its radius, in particular the semi-minor axis, represented as $b$ in Figure 2.1. The first zone, which is the zone with the smallest semi-minor axis, must be kept mostly clear of obstacles to avoid interfering with the radio reception. However, a minor obstruction of the Fresnel Zone is often acceptable. When at least 80% of the first Fresnel Zone is unobstructed, the wireless connection is treated as a free-space link [JHJ15].

Figure 2.2 shows the parameters of an ellipse, with $a$ and $b$ representing the semi-major and semi-minor axes, respectively. The points $F_1$ and $F_2$ represent the AGV and the base station, and $D$, as shown in Figure 2.1, is the distance between them. To construct the Fresnel Zone, its semi-minor axis is calculated first.


Figure 2.2: Ellipse [Ag217]

The formula to calculate the semi-minor axis is as follows:

$$b_n = \frac{1}{2}\sqrt{n \lambda D} \quad \text{with} \quad \lambda = \frac{c}{f} \tag{2.1}$$

In Equation (2.1), $n$ is the index of the $n$-th Fresnel Zone, $c$ is the speed of light, and $f$ is the carrier frequency. The semi-major axis $a$ can be calculated with the Pythagorean theorem. Once the semi-minor and semi-major axes of the Fresnel Zone are found, the ellipse can be constructed. Since the ellipse is not always horizontal or vertical and can lie at an angle, the formula for a general ellipse has to be adapted. The formula of the first Fresnel Zone with $n = 1$ and $b_1 = b$ is as follows:

$$\frac{x^2}{a^2} + \frac{y^2}{b^2} \le 1 \tag{2.2}$$

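With the help of the polar coordinate system, $x$ and $y$ are given as follows:

$$\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} \rho\cos(\phi) \\ \rho\sin(\phi) \end{pmatrix}; \quad \phi \in [0, 2\pi[ \tag{2.3}$$

Since $\rho$ is already defined for the $x$ and $y$ axes as $a$ and $b$, the polar coordinates are assigned as:

$$\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} a\cos(\phi) \\ b\sin(\phi) \end{pmatrix}; \quad \phi \in [0, 2\pi[ \tag{2.4}$$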

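Rotate the ellipse with the rotation matrix $R$ and substitute $x$ and $y$ from Equation (2.4):

$$\begin{pmatrix} x \\ y \end{pmatrix} = R \begin{pmatrix} a\cos(\phi) \\ b\sin(\phi) \end{pmatrix} \tag{2.5}$$

$$\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} \cos(\theta) & -\sin(\theta) \\ \sin(\theta) & \cos(\theta) \end{pmatrix} \begin{pmatrix} a\cos(\phi) \\ b\sin(\phi) \end{pmatrix} \tag{2.6}$$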


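Multiply both sides with the inverse rotation matrix $R^{-1} = R^T$:

$$\begin{pmatrix} \cos(\theta) & \sin(\theta) \\ -\sin(\theta) & \cos(\theta) \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} a\cos(\phi) \\ b\sin(\phi) \end{pmatrix} \tag{2.7}$$

With matrix multiplication, the following equations are given:

$$x\cos(\theta) + y\sin(\theta) = a\cos(\phi) \tag{2.8}$$

$$y\cos(\theta) - x\sin(\theta) = b\sin(\phi) \tag{2.9}$$

Translate the center of the ellipse from $(0, 0)$ to $(x_c, y_c)$ and isolate $\cos(\phi)$ and $\sin(\phi)$:

$$\frac{(x - x_c)\cos(\theta) + (y - y_c)\sin(\theta)}{a} = \cos(\phi) \tag{2.10}$$

$$\frac{(y - y_c)\cos(\theta) - (x - x_c)\sin(\theta)}{b} = \sin(\phi) \tag{2.11}$$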

With the help of the Pythagorean trigonometric identity (2.12) and substitution of Equations (2.10) and (2.11), Equation (2.13) is obtained as the formula for the Fresnel Zone ellipse:

$$\cos^2(\phi) + \sin^2(\phi) = 1 \tag{2.12}$$

$$\left[\frac{(x - x_c)\cos(\theta) + (y - y_c)\sin(\theta)}{a}\right]^2 + \left[\frac{(y - y_c)\cos(\theta) - (x - x_c)\sin(\theta)}{b}\right]^2 \le 1 \tag{2.13}$$

In Equation (2.13), $\theta$ represents the counterclockwise angle of rotation, and the symbol $\le$ is used to also account for points inside the ellipse.
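As a rough illustration of Equations (2.1) and (2.13), the following Python sketch computes the semi-axes of a Fresnel Zone and tests whether a point lies inside the rotated ellipse. It is not the implementation used in this thesis: the function names are hypothetical, the two end points are assumed to be the foci of the ellipse (so that $a^2 = b^2 + (D/2)^2$), and $\theta$ is assumed to be the direction from one end point to the other.

```python
import math

C = 299_792_458.0  # speed of light in m/s


def fresnel_semi_axes(distance_m, freq_hz, n=1):
    """Return (a, b): b from Equation (2.1), a from the Pythagorean theorem.

    Assumption: the end points are the foci, so a^2 = b^2 + (D/2)^2.
    """
    wavelength = C / freq_hz
    b = 0.5 * math.sqrt(n * wavelength * distance_m)
    a = math.hypot(b, distance_m / 2.0)
    return a, b


def inside_fresnel_zone(x, y, p1, p2, freq_hz, n=1):
    """Evaluate Equation (2.13) for the point (x, y)."""
    a, b = fresnel_semi_axes(math.dist(p1, p2), freq_hz, n)
    xc, yc = (p1[0] + p2[0]) / 2.0, (p1[1] + p2[1]) / 2.0  # ellipse center
    theta = math.atan2(p2[1] - p1[1], p2[0] - p1[0])       # rotation angle
    u = ((x - xc) * math.cos(theta) + (y - yc) * math.sin(theta)) / a
    v = ((y - yc) * math.cos(theta) - (x - xc) * math.sin(theta)) / b
    return u * u + v * v <= 1.0
```

For two end points about 14.1 m apart and the 3790 MHz carrier listed in Section 3, the first-zone semi-minor axis comes out at roughly 0.53 m, so a point 0.7 m to the side of the midpoint would already fall outside the zone.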

2.2 Data Prediction/Classification

Machine Learning (ML) is a major element of AI that allows a system to learn and improve on its own. Identifying patterns in targeted data, creating descriptive models, and making predictions even without clear predefined rules and models are a few abilities of ML algorithms [Mam+18]. ML has been used in many sectors, for example in medicine, where ML is used to predict whether a person has breast cancer [BD22]. Another example is education, where ML helps identify student attentiveness [Ros+13]. In addition, ML contributes to the industrial environment by predicting QoS, as mentioned in Section 1.

ML algorithms are broadly categorized into four paradigms [Mam+18]:

• Supervised learning: This learning algorithm produces a function that predicts output values. The procedure starts by analyzing a labeled training data set. Based on what it has learned from the labeled data, the algorithm can then predict future occurrences for new data.


• Unsupervised learning: This algorithm works with data that have not been categorized or labeled in the training data set. It infers a function that describes a hidden structure in the unlabeled data. A typical method of unsupervised learning is clustering.

• Semi-supervised learning: It incorporates features from both supervised and unsupervised learning. These algorithms take a large amount of unlabeled data and a small amount of labeled data.

• Reinforcement learning: This algorithm interacts with the environment through actions and error discovery. It enables software agents and machines to decide which behavior is appropriate in each situation to maximize performance.

One of the many capabilities of ML is to classify data, that is, to predict which category a sample belongs to, which falls under supervised learning. In classification, assessments are made based on values acquired during observation. The method uses a mapping function $f$ on input variables $x$ to approximate a discrete output variable $y$. Although the classification output is usually discrete, it may alternatively be continuous, in the form of a probability for each class label [Mam+18]. For this work, the ML models are taken from the Python package Scikit-learn [Ped+11], an open-source library that includes code for various ML tasks such as regression, classification, and clustering [Ma21]. The classification algorithms assessed in this work are Decision Tree, Random Forest, and XGB.

2.2.1 Decision Tree

Decision Tree induction is one of the most important components of inductive learning and among its most popular and useful techniques. Starting from a training set, a Decision Tree is created by the following general process. The root node of the tree initially holds the complete training set. The root node is then divided into several sub-nodes using a heuristic. If a sub-node contains only instances that are members of a single class, it becomes a leaf node; otherwise, the sub-node is split further based on the heuristic. This procedure is repeated until all leaf nodes are produced [Li+09]. Figure 2.3 illustrates a Decision Tree for street crossing as an example.

Figure 2.3: Decision Tree [Pro]


2.2.2 Random Forest

Random Forest is a supervised ML classification technique that combines several Decision Trees. Because the ensemble approach of aggregating multiple trees and combining the results of a varied group of predictors improves performance, the Random Forest is preferred over a single Decision Tree. Random Forest training is based on the bootstrap approach: each Decision Tree learns from a random sample of the data set that is drawn with replacement, so samples may be used repeatedly. In addition, Random Forest selects a random subset of features from the data set and generates a collection of Decision Trees from the previously selected random samples. The final class of the input is then determined by averaging the prediction results of these Decision Trees. The process is visualized in Figure 2.4. The algorithm works effectively because a single Decision Tree is sensitive to noise, whereas several Decision Trees together reduce the noise and provide more accurate predictions [Tam20].
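The bootstrap-and-aggregate idea described above can be sketched in a few lines of plain Python. This is only a toy under stated assumptions, not the Scikit-learn implementation: each "tree" is reduced to a one-feature threshold rule, the random feature subset has size one, the function names are hypothetical, and the ensemble decides by majority vote (which for binary labels matches averaging the votes and thresholding at 0.5).

```python
import random
from collections import Counter


def train_stump(samples, feature):
    """Fit a one-feature threshold rule on (features, label) pairs."""
    by_class = {0: [], 1: []}
    for features, label in samples:
        by_class[label].append(features[feature])
    if not by_class[0] or not by_class[1]:
        only = 0 if by_class[0] else 1       # bootstrap sample missed a class
        return lambda features: only
    mean0 = sum(by_class[0]) / len(by_class[0])
    mean1 = sum(by_class[1]) / len(by_class[1])
    threshold = (mean0 + mean1) / 2.0
    high = 1 if mean1 > mean0 else 0         # class on the upper side
    return lambda features: high if features[feature] > threshold else 1 - high


def train_forest(data, n_trees=25, seed=0):
    rng = random.Random(seed)
    n_features = len(data[0][0])
    forest = []
    for _ in range(n_trees):
        bootstrap = rng.choices(data, k=len(data))  # sampling with replacement
        feature = rng.randrange(n_features)         # random feature subset
        forest.append(train_stump(bootstrap, feature))
    return forest


def predict(forest, features):
    """Aggregate the individual predictions by majority vote."""
    votes = Counter(tree(features) for tree in forest)
    return votes.most_common(1)[0][0]
```

A single stump trained on an unlucky bootstrap sample may be poor, but the vote over many stumps is much more stable, which is the noise-reduction effect the paragraph above refers to.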

2.2.3 Extreme Gradient Boosting (XGB)

Gradient boosting refers to a class of ensemble ML algorithms that can be applied to regression and classification problems. The ensembles are built from Decision Tree models. Trees are added to the ensemble one at a time, and each new tree is meant to correct the prediction errors made by the prior models. XGB is an efficient open-source implementation of the gradient boosting method. It helps researchers solve large-scale real-world problems with fewer resources [Sar+21].

Figure 2.4: Random Forest [unk]

Figure 2.5: XGB [WCC20]


3 Data Source

The following section is based on the AI4Mobile research project report "Final description of the collected data" and describes how the autonomous sweeper AGV conducted the measurements [Gar+22]. It should be noted that the report is not publicly available and that this thesis does not focus on how the data were collected.

An autonomous sweeper from Enway, with an attached User Equipment (UE) linked to a private Long Term Evolution (LTE) network with a single base station, was used to perform a measurement campaign at the Enway facilities in Berlin (see Figure 3.1). The autonomous sweeper followed the same track (see Figure 3.2) several times, sweeping the inside of the building and a small portion of the outside. The base station for this campaign was placed inside, and attenuators were employed to obtain a variety of radio measurements, including connection losses. The UE was the only device connected to the network, and traffic between the UE and a local server was generated and recorded on both the UE and server sides. The data comprise more than 40000 seconds of measurements.

Figure 3.1: Autonomous Cleaning Sweeper Provided by Enway [Gar+22]

Figure 3.2: Route (black) of sweeper for several rounds of measurements including indoor

(green) and outdoor (blue). The position of the base station provided by Götting

(right picture) is marked by the yellow cross [Gar+22].

The radio and system characteristics of the LTE network are as follows:

• EPC: 4G Core acc. 3GPP Rel. 15

• BS: 3GPP Rel. 9

• TDD system with 20 MHz bandwidth


• Frequency: 3790 MHz, in the industrial spectrum between 3.7 GHz and 3.8 GHz

• Output power: 24 dBm, 2x2 MIMO

• Radiation pattern of built-in antennas: opening angle horizontal 60°, vertical 60°

• Gain: 6 dBi

• Size (H x W x D): 330 x 230 x 130

A variety of sensors on the autonomous sweeper were used to collect data throughout the campaign. The following attributes are included:

• Position of the sweeper as given by odometry techniques.

• Light Detection and Ranging (LiDAR) data: A point cloud of reflecting obstacles such as walls and standing objects, identified by x, y, and z coordinates.

• Static elevation map: A precomputed, static occupancy grid.

• Far and near map obstacles: Occupancy grids that are relative to the sweeper’s position

and created dynamically in real-time. See Figure 3.3 for details.

Figure 3.3: Dynamic Obstacle Maps [Gar+22]

Through the use of proprietary algorithms for sensor fusion and Simultaneous Localization and

Mapping (SLAM), the occupancy grids are produced from the raw sensor data. Along with the

sensor data, the following features were captured:

• Full UE LTE Stack was captured (MobileInsight [Li+16])

• Physical (PHY) Layer Measurements: RSRP, RSRQ, RSSI, and SINR are also available

via Attention (AT)-command-based measurements

• QoS Measurements (e.g., Throughput and round-trip time from ping command)

The full data set from this autonomous sweeper measurement campaign has been published in

[Her+22].


4 Evaluation Methodology

This work does not use the full data set gathered by the AGV as described in Section 3; only the data relevant for LoS prediction are analyzed. The relevant data are the sensor data, collected in the form of .bag files, the PHY Layer Measurements such as RSRP, RSRQ, RSSI, and SINR in the form of .log files, and the Full UE LTE Stack from MobileInsight [Li+16] in the form of .mi2log files. First, the sensor data are considered by extracting the appropriate features from the .bag files.

4.1 Sensor Data

The .bag files contain different types of messages that store information about the measurement. Table 4.1 lists the topics and their message types.

Table 4.1: Topic Table from the .bag File

| Topic | Type | Message Count | Frequency (Hz) |
|---|---|---|---|
| /drive_state | simple_drive_msgs/SimpleDrive | 29508 | 49.997664 |
| /global_odom | nav_msgs/Odometry | 17705 | 30.001102 |
| /navigation/enway_map/far_map_obstacles | nav_msgs/OccupancyGrid | 11753 | 20.051698 |
| /navigation/enway_map/map_static_elevation | nav_msgs/OccupancyGrid | 1 | NaN |
| /navigation/enway_map/near_map_obstacles | nav_msgs/OccupancyGrid | 11753 | 20.015242 |
| /sensors/inertial/imu | sensor_msgs/Imu | 58944 | 98.474022 |
| /sensors/lidar/lidar_top/points | sensor_msgs/PointCloud2 | 5902 | 10.000963 |
| /tf | tf2_msgs/TFMessage | 94738 | 148.971906 |
| /tf_static | tf2_msgs/TFMessage | 2 | 950.658205 |

Two topics are important for this thesis, namely /global_odom and /navigation/enway_map/map_static_elevation. The topic /global_odom contains the odometry position of the AGV, while the topic /navigation/enway_map/map_static_elevation contains the layout of the static map. The data in both topics play an important part in creating the Fresnel Zone, which determines LoS between the AGV and the base station. The workflow of this thesis is explained further in Section 4.3.

4.2 Wireless Measurement

Two wireless measurements from the Enway Campaign are relevant for this work: the PHY Layer Measurements such as RSRP, RSRQ, RSSI, and SINR, obtained via AT-command-based measurements in the form of .log files, and the Full UE LTE Stack captured with MobileInsight [Li+16] in the form of .mi2log files. First, the .mi2log files are parsed and their features are inspected. After parsing, the only features relevant for this work that these files contain are RSRP and RSRQ. Since RSSI and SINR, which are contained in the .log files, are not available in the .mi2log files, the data from the .log files are used for the ML activities.

However, the sampling of the data in the .log files is coarser than in the .mi2log files, namely one sample every 200 ms compared with one every 40 ms. For this reason, the data from the .log files are up-sampled, and it is checked whether the up-sampled data deviate strongly from the data in the .mi2log files. Figures 4.1 and 4.2 compare the up-sampled data with the data from the .mi2log files.
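The up-sampling step can be illustrated with a short Python sketch. The interpolation method is not stated here, so the sketch assumes a simple sample-and-hold scheme in which each 200 ms reading is repeated on a 40 ms grid until the next reading arrives; the function name and the integer millisecond timestamps are hypothetical.

```python
def upsample_hold(samples, step_ms=40):
    """Repeat each (timestamp_ms, value) reading until the next one arrives.

    Assumption: sample-and-hold; the thesis may have used another method.
    """
    out = []
    for (t0, v0), (t1, _) in zip(samples, samples[1:]):
        t = t0
        while t < t1:
            out.append((t, v0))
            t += step_ms
    out.append(samples[-1])  # keep the final reading as is
    return out


# Hypothetical RSRP readings in dBm, one every 200 ms
coarse = [(0, -101), (200, -103), (400, -102)]
fine = upsample_hold(coarse)
```

With this scheme every coarse reading yields five fine samples, so the up-sampled series lines up with a 40 ms grid without inventing intermediate values.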

[Plot: Comparison of RSRP between MobileInsight and AT-Command Measurement; x-axis: Timestamp; y-axis: RSRP [dBm]; series: RSRP(dBm) [AT-Command], RSRP(dBm) [MobileInsight]]

Figure 4.1: RSRP Comparison

[Plot: Comparison of RSRQ between MobileInsight and AT-Command Measurement; x-axis: Timestamp; y-axis: RSRQ [dB]; series: RSRQ(dB) [AT-Command], RSRQ(dB) [MobileInsight]]

Figure 4.2: RSRQ Comparison

Figures 4.1 and 4.2 show that the difference is tolerable, and therefore the up-sampled data can be used for the ML activities.


4.3 Workflow

The .bag files are first extracted using the Python package bagpy [Bha]. Figure 4.3 is created from the topic /global_odom, which contains the odometry path of the AGV, and the topic /navigation/enway_map/map_static_elevation, which contains the static map.

Figure 4.3: Static Map

In Figure 4.3, the path of the AGV is shown in green. The base station, shown in blue, is located inside the building. For each point of the AGV's path, a Fresnel Zone is created between that point and the base station. To count the obstacles inside the Fresnel Zone, the floor of the map is removed entirely, so that the values representing the floor do not enter the calculation. Figure 4.4 shows, as an example, the Fresnel Zone between one point on the path of the AGV and the base station, and Figure 4.5 shows a heat map with the accumulated obstacles as the value at each point of the path taken by the AGV. Figures 4.4, 4.5, and 4.6 are magnified to show the path of the AGV in more detail.


Figure 4.4: Static Map with Fresnel Zone

Figure 4.5: Heat Map

Figure 4.5 already suggests a distinction in obstacle count between the LoS and nLoS areas. To classify the two areas, a threshold that separates the two classes is needed. Therefore, the values in the transition area between LoS and nLoS are observed, and the value that distinguishes both areas is set as the threshold. Figure 4.6 shows a clear contrast between the LoS and nLoS areas. The LoS area is shown in green and the nLoS area in red.

Figure 4.6: LoS and nLoS Area
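The obstacle accumulation and thresholding described above might look roughly like the following Python sketch. It is an illustration under several assumptions rather than the code used for the figures: the occupancy grid is a list of rows with floor cells already set to 0, cell centers are tested against Equation (2.13), the semi-axes $a$ and $b$ are passed in, and the function names, grid resolution, and threshold value are hypothetical.

```python
import math


def obstacle_sum(grid, resolution, agv, bs, a, b):
    """Sum the occupancy values of all cells whose centers lie in the ellipse."""
    xc, yc = (agv[0] + bs[0]) / 2.0, (agv[1] + bs[1]) / 2.0
    theta = math.atan2(bs[1] - agv[1], bs[0] - agv[0])
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    total = 0
    for row, cells in enumerate(grid):
        for col, value in enumerate(cells):
            x, y = (col + 0.5) * resolution, (row + 0.5) * resolution
            u = ((x - xc) * cos_t + (y - yc) * sin_t) / a
            v = ((y - yc) * cos_t - (x - xc) * sin_t) / b
            if u * u + v * v <= 1.0:      # Equation (2.13)
                total += value
    return total


def is_los(total, threshold):
    """Label a path point as LoS when the accumulated obstacles stay low."""
    return total <= threshold
```

Evaluating `obstacle_sum` once per odometry point would give the kind of per-point values shown in the heat map, and `is_los` then turns them into the binary labels; the threshold itself has to be read off the transition area as described above.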


After establishing the LoS and nLoS areas from the sensor data, the next step is to predict LoS from the wireless measurement data using ML. As mentioned in Section 4.2, data from the AT-command-based measurement are used. For this step it is important that the sampling of the data from the .log files matches that of the sensor data. The sensor data have a sampling interval of 30 ms, while the wireless measurement data have a sampling interval of 40 ms. Therefore, the sensor data are down-sampled to match the wireless data.
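One way to bring the two streams onto a common time base is to keep, for every wireless timestamp, the sensor sample closest in time. The sketch below assumes this nearest-timestamp rule, which is not confirmed by the text; the function names and the millisecond timestamps are hypothetical.

```python
from bisect import bisect_left


def nearest_index(sorted_times, t):
    """Index of the entry in sorted_times that is closest to t."""
    i = bisect_left(sorted_times, t)
    if i == 0:
        return 0
    if i == len(sorted_times):
        return len(sorted_times) - 1
    # Choose the closer neighbor; ties go to the earlier sample
    return i if sorted_times[i] - t < t - sorted_times[i - 1] else i - 1


def align(sensor_times, sensor_values, wireless_times):
    """Pick one sensor sample per wireless timestamp (down-sampling)."""
    return [sensor_values[nearest_index(sensor_times, t)] for t in wireless_times]
```

With sensor samples every 30 ms and wireless samples every 40 ms, roughly one in four sensor samples is dropped, and each remaining row pairs a wireless measurement with the position (and LoS label) nearest to it in time.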

After the sensor data are down-sampled, the data sets for the ML activities are created. Two versions of the data set are used to train the ML models. Both data sets include the features RSRP, RSRQ, RSSI, and SINR, as well as the binary LoS/nLoS classification as the prediction target. The two data sets differ in the feature distance, which is the distance between the AGV and the base station at each point of the path taken by the AGV and is included in only one of them. It should be noted that the data sets combine the data from all runs of the AGV and not only from a single run.

After the data sets are created, a prediction model that takes the data set as its input is used to determine LoS and nLoS. The classification algorithms Decision Tree, Random Forest, and XGB are used to predict LoS, and the accuracy of the methods is then compared. For training, 80% of the data are used as the training set and the other 20% as the validation (or test) set. It should be noted that the data set is not split randomly. Instead, it is split in order: the first 80% of the data set is allocated to the training set and the remaining 20% to the validation (or test) set.
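The ordered split amounts to cutting the data set at a fixed position instead of shuffling it. A minimal sketch, with a hypothetical function name:

```python
def chronological_split(rows, train_fraction=0.8):
    """Split rows in their given order: first part for training, rest for testing."""
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]


train_rows, test_rows = chronological_split(list(range(10)))
```

The reason for splitting in order is not stated here. One plausible reading is that neighboring samples of a time series are strongly correlated, so an ordered split keeps near-duplicates of test samples out of the training set.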

In addition, the best hyper-parameters for each classification algorithm have to be determined. For the Decision Tree, only the hyper-parameter max_depth is considered. As the name suggests, max_depth limits the growth of the Decision Tree by limiting its depth. To find the best max_depth, k-fold cross-validation is used. In this work, 5-fold cross-validation is applied, which divides the training set into five equal parts. In each round, four parts are used for training and the remaining part serves as the test set, so that 80% of the training set is used for training and 20% for testing. The hyper-parameter value that gives the highest Area Under ROC Curve (AUC) score on the test set is used to create the final prediction model. The AUC score assesses the performance of the model: the higher the AUC score, the better the model performs.
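Both ingredients of this procedure can be written down compactly. The sketch below is an illustration, not the Scikit-learn code used in this work: `k_fold_indices` assumes contiguous, unshuffled folds, and `auc_score` uses the pairwise-ranking definition of the AUC (the probability that a randomly chosen positive sample is scored above a randomly chosen negative one), which equals the area under the ROC curve. Both function names are hypothetical.

```python
def k_fold_indices(n_samples, k=5):
    """Yield (train_idx, test_idx) for k contiguous folds."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


def auc_score(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to a perfect separation of the two classes, which is why a higher score indicates a better model.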

For the Random Forest algorithm, the hyper-parameters max_depth and n_estimators are tuned, where n_estimators sets the number of trees used in the algorithm. The method GridSearchCV is used for this purpose: it tries every combination from a list of max_depth and n_estimators values, trains on the training set, and reports the combination with the highest AUC score. GridSearchCV, as used in this work, also applies 5-fold cross-validation to find the hyper-parameter values that give the highest AUC score.
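The grid search can be sketched with Scikit-learn as follows. The data is synthetic and the grid values are deliberately small and purely illustrative; they are not the values searched in this work.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the training set (not the thesis data).
X_train, y_train = make_classification(
    n_samples=300, n_features=4, n_informative=3, n_redundant=1, random_state=0
)

# Illustrative grid; every combination is evaluated with 5-fold CV.
param_grid = {"max_depth": [2, 4], "n_estimators": [10, 20]}

search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    scoring="roc_auc",  # rank combinations by AUC
    cv=5,
)
search.fit(X_train, y_train)

print(search.best_params_, round(search.best_score_, 4))
```

Here best_score_ is the mean cross-validated AUC of the best combination, not a score on a separate held-out set.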

For XGB, the hyper-parameters max_depth, n_estimators, and learning_rate are tuned, and GridSearchCV is again applied to find the best value for each hyper-parameter.

After the best hyper-parameters for each classification algorithm are determined, the final prediction model for each algorithm is created and tested on the validation (or test) set, which, as mentioned before, comprises 20% of the whole data set.
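The complete procedure (ordered split, grid search on the training part, final test on the held-out 20%) might look like the sketch below. To keep the example dependent on Scikit-learn only, GradientBoostingClassifier is used here as a stand-in for XGB; it exposes the same three hyper-parameter names but is a different implementation. Data and grid values are assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the full data set (not the thesis data).
X, y = make_classification(
    n_samples=500, n_features=4, n_informative=3, n_redundant=1, random_state=0
)

# Ordered 80/20 split: no shuffling.
n_train = int(0.8 * len(X))
X_train, X_test = X[:n_train], X[n_train:]
y_train, y_test = y[:n_train], y[n_train:]

# Illustrative grid over the three hyper-parameters named in the text.
param_grid = {
    "max_depth": [2, 4],
    "n_estimators": [20, 40],
    "learning_rate": [0.1, 0.3],
}
search = GridSearchCV(
    GradientBoostingClassifier(random_state=0),  # stand-in for XGB
    param_grid,
    scoring="roc_auc",
    cv=5,
)
search.fit(X_train, y_train)

# With the default refit=True, the best combination is refitted on the
# whole training set; that model is then scored on the held-out 20%.
final_model = search.best_estimator_
test_auc = roc_auc_score(y_test, final_model.predict_proba(X_test)[:, 1])
print(search.best_params_, round(test_auc, 4))
```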


5 Prediction Result

This section discusses the results of each classification algorithm for both the data set with distance as a feature and the data set without it. First, the Decision Tree algorithm is assessed. The max_depth value that gives the highest AUC score is six for the data set with the feature distance and four for the data set without it. The top section of the Decision Tree for the data set with the feature distance is shown in Figure 5.1, and the one for the data set without the feature distance in Figure 5.2.

[Tree diagram, top three levels; deeper levels are omitted, shown as (...) in the figure. "yes" marks the branch where the parent condition holds, "no" the branch where it does not.]

distance <= 10.207 (gini = 0.5, samples = 936023, value = [468011.5, 468011.5], class = 0)
    yes: RSRP[dBm] <= -108.5 (gini = 0.106, samples = 213255, value = [27758.593, 467897.097], class = 1)
        yes: distance <= 8.459 (gini = 0.357, samples = 27922, value = [15954.993, 4827.248], class = 0)
        no: RSRP[dBm] <= -106.5 (gini = 0.048, samples = 185333, value = [11803.6, 463069.849], class = 1)
    no: distance <= 10.217 (gini = 0.001, samples = 722768, value = [440252.907, 114.403], class = 0)
        yes: RSRP[dBm] <= -100.5 (gini = 0.343, samples = 465, value = [266.81, 75.339], class = 0)
        no: distance <= 10.27 (gini = 0.0, samples = 722303, value = [439986.097, 39.064], class = 0)

Figure 5.1: Top Section of the Decision Tree with Feature Distance

[Tree diagram, top three levels; deeper levels are omitted, shown as (...) in the figure. "yes" marks the branch where the parent condition holds, "no" the branch where it does not.]

RSRP[dBm] <= -107.5 (gini = 0.5, samples = 936023, value = [468011.5, 468011.5], class = 0)
    yes: RSRP[dBm] <= -109.5 (gini = 0.048, samples = 719178, value = [435681.806, 11035.704], class = 0)
        yes: RSRP[dBm] <= -110.5 (gini = 0.007, samples = 678478, value = [412973.11, 1490.029], class = 0)
        no: RSRQ[dB] <= -7.5 (gini = 0.417, samples = 40700, value = [22708.697, 9545.674], class = 0)
    no: RSSI[dBm] <= -76.5 (gini = 0.123, samples = 216845, value = [32329.694, 456975.796], class = 1)
        yes: RSRP[dBm] <= -102.5 (gini = 0.329, samples = 48490, value = [16097.535, 61565.554], class = 1)
        no: RSSI[dBm] <= -74.5 (gini = 0.076, samples = 168355, value = [16232.158, 395410.242], class = 1)

Figure 5.2: Top Section of the Decision Tree without Feature Distance


To choose the root of the Decision Tree, the classifier selects the feature using the Gini criterion, which measures the quality of a split. To split a node into sub-nodes, the classifier uses the best-split strategy, i.e. it chooses the split with the highest Gini gain from the node to its sub-nodes. This applies to both Decision Trees, each trained on one of the two data sets. In Figures 5.1 and 5.2, the root of the Decision Tree is the feature distance and RSRP, respectively. To quantify the importance of these features, the classifiers of the Python package Scikit-learn [Ped+11] provide the attribute feature_importances_, which returns the importance, or contribution, of each feature to the model. The feature importance for both data sets in the Decision Tree is listed in Table 5.1.

Table 5.1: Feature Importance in Decision Tree

Feature    Importance (Data Set with Distance)    Importance (Data Set without Distance)
Distance   0.943239                               -
RSRP       0.053288                               0.979066
RSRQ       0.002017                               0.002096
RSSI       0.000147                               0.013137
SINR       0.001309                               0.005701
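As a worked example of the Gini criterion mentioned above, the impurity values printed in the nodes of Figure 5.1 can be recomputed from the node "value" entries. The sketch assumes that these entries are per-class (apparently class-weighted) sample counts and that the Gini gain of a split is the parent impurity minus the weighted mean impurity of the two sub-nodes; this is an interpretation of the figure, not code from this work.

```python
def gini(value):
    """Gini impurity 1 - sum(p_k^2) from per-class (weighted) counts."""
    total = sum(value)
    return 1.0 - sum((v / total) ** 2 for v in value)


# "value" entries as printed in the nodes of Figure 5.1.
root = [468011.5, 468011.5]        # node "distance <= 10.207"
child_a = [27758.593, 467897.097]  # node "RSRP[dBm] <= -108.5"
child_b = [440252.907, 114.403]    # node "distance <= 10.217"

# Gini gain of the root split: parent impurity minus the weighted
# average impurity of the two sub-nodes.
n_root, n_a, n_b = sum(root), sum(child_a), sum(child_b)
weighted_child_gini = (n_a * gini(child_a) + n_b * gini(child_b)) / n_root
gini_gain = gini(root) - weighted_child_gini

print(round(gini(root), 3), round(gini(child_a), 3), round(gini(child_b), 3))
# 0.5 0.106 0.001
print(round(gini_gain, 3))
```

The three impurities match the gini values shown in the figure, and the large gain of the root split is consistent with the high importance of the feature distance in Table 5.1.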

Table 5.1 shows that the feature distance is by far the most important feature in the Decision Tree for the prediction of LoS. The main reason is that, in this work, LoS only occurs when the AGV is at its closest to the base station, as seen in Figure 4.6. The algorithm therefore uses the distance value as the major decider for LoS: the closer the AGV is to the base station, the higher the probability of LoS. The feature distance also plays a significant role in the AUC score on the validation (or test) set: the AUC score of the Decision Tree is 0.9559 with the feature distance and 0.9286 without it. Although the feature distance is important in this case, relying on it could cause problems in other environments, where LoS between the AGV and the base station does not only occur when the AGV is at its closest to the base station.
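The effect described above can be reproduced on toy data. In the following sketch, LoS is defined purely by a distance threshold and the radio features are noisy functions of distance; all numbers, the path-loss-like formulas, and the two-feature radio set are invented for illustration and are unrelated to the measured data.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
n = 2000

# Hypothetical toy data: LoS only occurs close to the base station.
distance = rng.uniform(2.0, 40.0, n)
rsrp = -85.0 - 0.7 * distance + rng.normal(0.0, 3.0, n)
sinr = 20.0 - 0.3 * distance + rng.normal(0.0, 4.0, n)
los = (distance < 10.0).astype(int)

feature_sets = {
    "with distance": (["distance", "RSRP", "SINR"],
                      np.column_stack([distance, rsrp, sinr])),
    "without distance": (["RSRP", "SINR"],
                         np.column_stack([rsrp, sinr])),
}

importances = {}
for name, (columns, X) in feature_sets.items():
    clf = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X, los)
    # feature_importances_ sums to 1 over the features of the fitted tree.
    importances[name] = dict(zip(columns, clf.feature_importances_))
    print(name, {k: round(float(v), 3) for k, v in importances[name].items()})
```

When distance is available, the tree relies on it almost exclusively; without it, the importance is redistributed over the radio features, as in Table 5.1.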

After the Decision Tree, the other algorithms, namely Random Forest and XGB, are addressed. For Random Forest, the hyper-parameters to be tuned are max_depth and n_estimators, while for XGB they are learning_rate, max_depth, and n_estimators. The selected hyper-parameters for each algorithm and each data set are as follows:

• Data set with feature distance:
    – Random Forest: max_depth = 4 and n_estimators = 350
    – XGB: learning_rate = 0.1, max_depth = 2, and n_estimators = 400
• Data set without feature distance:
    – Random Forest: max_depth = 2 and n_estimators = 300
    – XGB: learning_rate = 0.1, max_depth = 4, and n_estimators = 250

The Random Forest and XGB classifiers also provide the attribute feature_importances_. The resulting feature importances are listed in Tables 5.2 and 5.3:


Table 5.2: Feature Importance in Random Forest

Feature    Importance (Data Set with Distance)    Importance (Data Set without Distance)
Distance   0.427069                               -
RSRP       0.305767                               0.505130
RSRQ       0.001387                               0.000674
RSSI       0.110506                               0.173494
SINR       0.155270                               0.320702

Table 5.3: Feature Importance in XGB

Feature    Importance (Data Set with Distance)    Importance (Data Set without Distance)
Distance   0.390010                               -
RSRP       0.306542                               0.306442
RSRQ       0.038372                               0.133894
RSSI       0.199047                               0.207965
SINR       0.066030                               0.351699

Tables 5.2 and 5.3 show that the feature distance is again the most important feature for the prediction of LoS. The reason is the same as discussed for the feature importance in the Decision Tree. For the data set without the feature distance, the features RSRP and SINR have the highest importance values, which shows that RSRP and SINR play a significant role in predicting LoS with Random Forest and XGB.

After analyzing the feature importance for both classification algorithms, the AUC scores on the validation (or test) set are compared for both data sets. The AUC score of the Random Forest is 0.9586 with the feature distance and 0.9294 without it. The AUC score of XGB is 0.9922 with the feature distance and 0.9491 without it. Figure 5.3 summarizes the AUC scores on the validation (or test) sets for the different classification algorithms.

[Bar chart "AUC Score Comparison". y-axis: AUC Score (ticks from 0.88 to 0.98); x-axis: Test Set With Distance, Test Set Without Distance; bars: Decision Tree, Random Forest, XGB.]

Figure 5.3: AUC Score Comparison


Figure 5.3 shows that the AUC scores for the data set with the feature distance are higher than those for the data set without it. However, even where the AUC scores are lower, each classification algorithm still performs satisfactorily, with AUC scores above 0.92. Furthermore, XGB clearly performs best, while the Decision Tree and Random Forest differ only slightly. In addition, the AUC scores of Random Forest and XGB can be analyzed as the number of trees in the algorithm increases. However, since the trees are built differently in each algorithm, the AUC scores are instead analyzed as the number of nodes increases, which allows a direct comparison between the two algorithms. Figure 5.4 compares the AUC scores of Random Forest and XGB as the number of nodes increases.

[Line plot "AUC Scores as the Number of Nodes Increases". x-axis: Nodes (0 to 10000); y-axis: AUC Score (ticks from 0.90 to 0.98); legend: Algorithm (XGB, Random Forest), Data Set (With Distance, Without Distance).]

Figure 5.4: AUC Scores as the Number of Nodes Increases

Figure 5.4 shows that Random Forest has a stable AUC score as the number of nodes increases, whereas the AUC score of XGB increases gradually. This also reflects how the two algorithms arrive at their final models. As mentioned in Sections 2.2.2 and 2.2.3, XGB adds trees to the ensemble one at a time, with each new tree mitigating the prediction mistakes of the prior models, while Random Forest does not mitigate prediction mistakes in this way. Therefore, the AUC score of XGB increases gradually as trees are added. It should also be noted that, despite having fewer trees (see the hyper-parameter list above), Random Forest on the data set with the feature distance produces more nodes than XGB on the same data set, because each tree in the Random Forest is larger than each tree in XGB.
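One possible way to obtain curves like those in Figure 5.4 is sketched below: the test AUC is evaluated after each added tree and plotted against the cumulative node count. This is not the procedure used in this work. The data is synthetic, the ensemble sizes are small illustrative values, and Scikit-learn's GradientBoostingClassifier again stands in for XGB.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.metrics import roc_auc_score

# Synthetic stand-in data with an ordered 80/20 split.
X, y = make_classification(
    n_samples=600, n_features=4, n_informative=3, n_redundant=1, random_state=0
)
X_train, X_test, y_train, y_test = X[:480], X[480:], y[:480], y[480:]

rf = RandomForestClassifier(
    n_estimators=30, max_depth=4, random_state=0
).fit(X_train, y_train)
gb = GradientBoostingClassifier(  # stand-in for XGB
    n_estimators=30, max_depth=2, learning_rate=0.1, random_state=0
).fit(X_train, y_train)

# Random Forest: average the class-1 probabilities of the first k trees.
rf_tree_probs = np.array([t.predict_proba(X_test)[:, 1] for t in rf.estimators_])
tree_counts = np.arange(1, len(rf.estimators_) + 1)[:, None]
rf_running_mean = np.cumsum(rf_tree_probs, axis=0) / tree_counts
rf_auc = [roc_auc_score(y_test, p) for p in rf_running_mean]
rf_nodes = np.cumsum([t.tree_.node_count for t in rf.estimators_])

# Boosting: staged_predict_proba yields the prediction after each new tree.
gb_auc = [roc_auc_score(y_test, p[:, 1]) for p in gb.staged_predict_proba(X_test)]
gb_nodes = np.cumsum([t.tree_.node_count for t in gb.estimators_[:, 0]])

for k in (0, 9, 29):
    print(k + 1, rf_nodes[k], round(rf_auc[k], 4), gb_nodes[k], round(gb_auc[k], 4))
```

Because the deeper Random Forest trees contain more nodes each, its cumulative node count grows faster than that of the boosted ensemble with shallow trees, which mirrors the observation about tree sizes above.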


6 Summary and Conclusions

This thesis analyzes the potential of data collected by an AGV as an enabler of AI for telecommunication in industrial environments. The data, in the form of sensor data and wireless measurements from the AGV, are used to predict LoS between the AGV and the base station. The sensor data is analyzed to define the areas of LoS and nLoS. Then, using the wireless measurement data, several ML algorithms, namely Decision Tree, Random Forest, and XGB, are used to predict LoS. The performance of each ML algorithm is compared on two data sets. The first data set includes the features RSRP, RSRQ, RSSI, SINR, and distance, while the second data set omits the feature distance.

The performance analysis shows that the data set with the feature distance gives higher AUC scores than the data set without it, and that the feature distance plays a significant role in the prediction of LoS. However, the main reason distance is important is that, in this work, LoS only occurs when the AGV is at its closest to the base station. The algorithms therefore use the distance value as the major decider for LoS: the closer the AGV is to the base station, the higher the probability of LoS. This could cause problems in other environments, where LoS between the AGV and the base station does not only occur when the AGV is at its closest.

The prediction models of the ML algorithms are ranked by their AUC scores. In this case, XGB outperforms the other algorithms, and Random Forest performs slightly better than the Decision Tree. However, even though the AUC scores of Random Forest and Decision Tree are lower than that of XGB, each classification algorithm still performs satisfactorily, with AUC scores above 0.92. Finally, the performance of Random Forest and XGB as the number of nodes increases is visualized to show how both algorithms arrive at their final models. The results of this thesis could serve as a guideline for future measurements in which LoS predictions are not as dependent on distance as in this work. For future applications, this thesis could be an early step towards profiling communication channels using AI. It could be expanded beyond the classification of LoS and nLoS by also considering other wireless measurement features.


Bibliography

[Ag217] Ag2gaeh. Deutsch: Ellipse: Parameter. 26th Mar. 2017. URL: https://commons.wikimedia.org/wiki/File:Ellipse-param.svg (visited on 17/11/2022).

[BD22] Abdoulaye Bah and Muhammed Davud. ‘Analysis of Breast Cancer Classification with Machine Learning based Algorithms’. In: 2022 2nd International Conference on Computing and Machine Intelligence (ICMI). July 2022, pp. 1–4. DOI: 10.1109/ICMI55296.2022.9873696.

[Bha] Rahul Bhadani. bagpy: A python class to facilitate the reading of rosbag file based on semantic datatypes. Version 0.5. URL: https://github.com/jmscslgroup/bagpy (visited on 06/12/2022).

[Bro+20] Philip E. Brown et al. ‘Interactive Testing of Line-of-Sight and Fresnel Zone Clearance for Planning Microwave Backhaul Links and 5G Networks’. In: Proceedings of the 28th International Conference on Advances in Geographic Information Systems. SIGSPATIAL ’20. New York, NY, USA: Association for Computing Machinery, 13th Nov. 2020, pp. 143–146. ISBN: 978-1-4503-8019-5. DOI: 10.1145/3397536.3422332. URL: https://doi.org/10.1145/3397536.3422332 (visited on 17/11/2022).

[Cha+19] Rocky Chakma et al. ‘Navigation and Tracking of AGV in ware house via Wireless Sensor Network’. In: 2019 IEEE 3rd International Electrical and Energy Conference (CIEEC). Sept. 2019, pp. 1686–1690. DOI: 10.1109/CIEEC47146.2019.CIEEC-2019589.

[Gar+22] Victor Garrido et al. Deliverable D1.2c: Final description of the collected data. AI4Mobile, 15th June 2022, p. 43.

[Her+22] Rodrigo Hernangomez et al. AI4Mobile Industrial Wireless Datasets: iV2V and iV2I+. 2022. DOI: 10.21227/04ta-v128. URL: https://dx.doi.org/10.21227/04ta-v128.

[Jcm12] Jcmcclurg. English: distances d1 and d2 were identified close but incorrect distances. 19th Oct. 2012. URL: https://commons.wikimedia.org/wiki/File:FresnelSVG1.svg (visited on 17/11/2022).

[JHJ15] Jhihoon Joo, Dong Seog Han and Hong-Jong Jeong. ‘First Fresnel zone analysis in vehicle-to-vehicle communications’. In: 2015 International Conference on Connected Vehicles and Expo (ICCVE). ISSN: 2378-1297. Oct. 2015, pp. 196–197. DOI: 10.1109/ICCVE.2015.18.


[Kar+16] Aniket K. Kar et al. ‘Automated guided vehicle navigation with obstacle avoidance in normal and guided environments’. In: 2016 11th International Conference on Industrial and Information Systems (ICIIS). Dec. 2016, pp. 77–82. DOI: 10.1109/ICIINFS.2016.8262911.

[Kul+21] Daniel F. Kulzer et al. ‘AI4Mobile: Use Cases and Challenges of AI-based QoS Prediction for High-Mobility Scenarios’. In: 2021 IEEE 93rd Vehicular Technology Conference (VTC2021-Spring). Helsinki, Finland: IEEE, Apr. 2021, pp. 1–7. ISBN: 978-1-72818-964-2. DOI: 10.1109/VTC2021-Spring51267.2021.9449059. URL: https://ieeexplore.ieee.org/document/9449059/ (visited on 22/11/2022).

[Li+09] Ning Li et al. ‘A new heuristic of the decision tree induction’. In: 2009 International Conference on Machine Learning and Cybernetics. Vol. 3. ISSN: 2160-1348. July 2009, pp. 1659–1664. DOI: 10.1109/ICMLC.2009.5212227.

[Li+16] Yuanjie Li et al. ‘Mobileinsight: extracting and analyzing cellular network information on smartphones’. In: Proceedings of the 22nd Annual International Conference on Mobile Computing and Networking. MobiCom ’16. New York, NY, USA: Association for Computing Machinery, 3rd Oct. 2016, pp. 202–215. ISBN: 978-1-4503-4226-1. DOI: 10.1145/2973750.2973751. URL: https://doi.org/10.1145/2973750.2973751 (visited on 06/12/2022).

[Ma21] Nigel Ma. ‘NBA Playoff Prediction Using Several Machine Learning Methods’. In: 2021 3rd International Conference on Machine Learning, Big Data and Business Intelligence (MLBDBI). Dec. 2021, pp. 113–116. DOI: 10.1109/MLBDBI54094.2021.00030.

[Mam+18] Sunakshi Mamgain et al. ‘Car Popularity Prediction: A Machine Learning Approach’. In: 2018 Fourth International Conference on Computing Communication Control and Automation (ICCUBEA). Aug. 2018, pp. 1–5. DOI: 10.1109/ICCUBEA.2018.8697832.

[Ped+11] F. Pedregosa et al. ‘Scikit-learn: Machine Learning in Python’. In: Journal of Machine Learning Research 12 (2011), pp. 2825–2830.

[Pro] Sebastian Proft. Random Forest Models. URL: https://www.genecascade.org/MutationTaster2021/rf/ (visited on 17/11/2022).

[Ros+13] Manus Ross et al. ‘Using Support Vector Machines to Classify Student Attentiveness for the Development of Personalized Learning Systems’. In: 2013 12th International Conference on Machine Learning and Applications. Vol. 1. Dec. 2013, pp. 325–328. DOI: 10.1109/ICMLA.2013.66.


[Sar+21] Anusmita Sarkar et al. ‘A Novel Detection Approach of Ground Level Ozone using Machine Learning Classifiers’. In: 2021 Fifth International Conference on I-SMAC (IoT in Social, Mobile, Analytics and Cloud) (I-SMAC). ISSN: 2768-0673. Nov. 2021, pp. 428–432. DOI: 10.1109/I-SMAC52330.2021.9640852.

[Tam20] Srikanth Tammina. ‘A Hybrid Learning approach for Sentiment Classification in Telugu Language’. In: 2020 International Conference on Artificial Intelligence and Signal Processing (AISP). ISSN: 2640-5768. Jan. 2020, pp. 1–6. DOI: 10.1109/AISP48273.2020.9073109.

[unk] unknown. What is a Random Forest? TIBCO Software. URL: https://www.tibco.com/reference-center/what-is-a-random-forest (visited on 17/11/2022).

[WCC20] Weilun Wang, Goutam Chakraborty and Basabi Chakraborty. ‘Predicting the Risk of Chronic Kidney Disease (CKD) Using Machine Learning Algorithm’. In: Applied Sciences 11 (28th Dec. 2020), p. 202. DOI: 10.3390/app11010202.
