Analysis of Wireless Measurements
and Sensor Data for Autonomous
Mobility in Industrial Environments
Bachelor's Thesis
Herdy Herlianto
# XXXXXX
09.12.2022
Supervisors: Dr.-Ing. Martin Kasparick
Rodrigo Hernangómez
Examiners: Prof. Dr.-Ing. Slawomir Stanczak
Prof. Dr.-Ing. Wilhelm Keusgen
Technische Universität Berlin
Fakultät IV Elektrotechnik und Informatik
Institut für Telekommunikationssysteme
Fachgebiet Network and Information Theory (NetIT)
Statutory Declaration
I hereby declare that I have written this thesis independently and by my own hand, without
unauthorized outside help, and using only the sources and aids listed.
Berlin, 09.12.2022
Herdy Herlianto
Herdy Herlianto, # XXXXXX iii
Abstract
This thesis analyzes data from the Enway Campaign, which was conducted in 2021. The topic
was offered as a bachelor's thesis by Fraunhofer HHI.
The Enway Campaign is part of the AI4Mobile research project, which is funded by the Federal
Ministry of Education and Research under the umbrella of the Artificial Intelligence in
Communication Networks plan within the scope of the German Federal Government's High-Tech
Strategy. The project intends to develop methodologies that provide sustainable Quality of
Service (QoS) prediction at high mobility by combining data from the entire mobile network with
secondary data from vehicles, applications, or the environment. The result of this campaign,
several sets of wireless measurement and sensor data, will be used for machine learning. In
addition, the sensor data will be processed to create a clearance model whose output serves as
the prediction target of the machine learning classifier.
This thesis analyzes the sensor data to derive Line-of-Sight (LoS) labels with the Fresnel Zone
Clearance Model, and it trains classifiers that predict LoS from the wireless measurements using
machine learning algorithms such as Decision Tree, Random Forest, and Extreme Gradient
Boosting (XGB).
Summary
This thesis is an analysis of data from the Enway Campaign, which was conducted in 2021 and
offered as a bachelor's thesis by Fraunhofer HHI.
The Enway Campaign is part of the AI4Mobile research project, which is funded by the Federal
Ministry of Education and Research (BMBF) under the umbrella of the Artificial Intelligence in
Communication Networks plan within the scope of the German Federal Government's High-Tech
Strategy. The project intends to develop methods that enable sustainable QoS prediction at high
mobility by combining data from the entire mobile network with secondary data from vehicles,
applications, or the environment. The result of this campaign, in the form of several wireless
measurement and sensor data sets, is used for machine learning activities. In addition, the
sensor data are processed to create a clearance model that serves as the target prediction of
the machine learning classifier.
This bachelor's thesis deals with the analysis of sensor data to derive the Line-of-Sight (LoS)
prediction from the clearance model using the Fresnel Zone Clearance Model, and with the
creation of a classifier for LoS prediction from the wireless measurement using machine
learning classifiers such as Decision Tree, Random Forest, and XGB.
Contents
List of Acronyms and Abbreviations
List of Figures
List of Tables
1 Introduction
2 Theoretical Background
2.1 Clearance Model
2.2 Data Prediction/Classification
2.2.1 Decision Tree
2.2.2 Random Forest
2.2.3 Extreme Gradient Boosting (XGB)
3 Data Source
4 Evaluation Methodology
4.1 Sensor Data
4.2 Wireless Measurement
4.3 Workflow
5 Prediction Result
6 Summary and Conclusions
Bibliography
List of Acronyms and Abbreviations
3GPP 3rd Generation Partnership Project
AGV Automated Guided Vehicle
AI Artificial Intelligence
AT Attention
AUC Area Under ROC Curve
BMBF Bundesministerium für Bildung und Forschung
BS Base Station
EPC Evolved Packet Core
LiDAR Light Detection and Ranging
LoS Line-of-Sight
LTE Long Term Evolution
MIMO Multiple Input Multiple Output
ML Machine Learning
nLoS non-Line-of-Sight
PHY Physical
QoS Quality of Service
ROC Receiver Operating Characteristic
RSRP Reference Signal Received Power
RSRQ Reference Signal Received Quality
RSSI Received Signal Strength Indication
SINR Signal-to-Interference-plus-Noise Ratio
SLAM Simultaneous Localization and Mapping
TDD Time-Division Duplexing
UE User Equipment
XGB Extreme Gradient Boosting
List of Figures
2.1 Fresnel Zone [Jcm12]
2.2 Ellipse [Ag217]
2.3 Decision Tree [Pro]
2.4 Random Forest [unk]
2.5 XGB [WCC20]
3.1 Autonomous Cleaning Sweeper Provided by Enway [Gar+22]
3.2 Route (black) of sweeper for several rounds of measurements including indoor (green) and outdoor (blue). The position of the base station provided by Götting (right picture) is marked by the yellow cross [Gar+22].
3.3 Dynamic Obstacle Maps [Gar+22]
4.1 Reference Signal Received Power (RSRP) Comparison
4.2 Reference Signal Received Quality (RSRQ) Comparison
4.3 Static Map
4.4 Static Map with Fresnel Zone
4.5 Heat Map
4.6 LoS and non-Line-of-Sight (nLoS) Area
5.1 Top Section of the Decision Tree with Feature Distance
5.2 Top Section of the Decision Tree without Feature Distance
5.3 Area Under ROC Curve (AUC) Score Comparison
5.4 AUC Scores as the Number of Nodes Increase
List of Tables
4.1 Topic Table from the .bag File
5.1 Feature Importance in Decision Tree
5.2 Feature Importance in Random Forest
5.3 Feature Importance in XGB
1 Introduction
An Automated Guided Vehicle (AGV) is a mobile robot that is often used in factories,
warehouses, and inventories across several industries. AGVs serve as a distribution system for
the processing and transfer of various commodities. These vehicles automate the workplace,
reducing the amount of time and work needed [Kar+16]. Nearly all manufacturing facilities
operate AGVs, along with businesses in paper and printing, food and beverage, and other sectors
[Cha+19].
The data used in this work were collected by an AGV in the form of an autonomous sweeper from
Enway [Gar+22]. While driving along a dedicated path, the AGV recorded data from its sensors as
well as wireless measurements of the link between the AGV and the base station. In many
industrial use cases, data captured by the AGV can be used to predict the QoS. One example is
the cooperation between AGVs and stationary robots, where the main purpose is to have a
stationary robot load and unload products to and from an AGV without the AGV having to stop
[Kul+21]. To synchronize the load handling and identify the location of the AGV, additional
sensors, such as camera systems, are currently used. With the data provided by the AGV, the
load handling could instead be synchronized with the help of a QoS prediction, which indicates
how reliable the communication link will be. As a result, it is possible to avoid extra
specialized sensors, such as cameras and image analysis at the stationary robot, potentially
cutting costs. QoS prediction is also beneficial for use cases such as Traffic Management and
Velocity Adaptation for AGVs in a factory building, where the main idea is to manage the
traffic between several AGVs [Kul+21]. Through QoS prediction, information such as link
stability, anticipatory velocity, and route adaptation becomes available, which allows the
central entity to identify the best decision for each AGV and to achieve higher efficiency.
Furthermore, it enables more AGVs to operate simultaneously, which speeds up the transfer of
products.
The focus of this thesis is to analyze the potential of data collected by an AGV as an enabler
of Artificial Intelligence (AI) for telecommunication in industrial environments. The data from
the AGV are used to predict LoS between the AGV and the base station. First, the sensor data
are analyzed to derive LoS using the Fresnel Zone Clearance Model. The sensor features used for
this purpose are the position of the sweeper, as given by odometry techniques, and a static
map. A Fresnel Zone is constructed between each point of the odometry path and the base
station. Then, the obstacles inside the Fresnel Zone are counted, and the counts in the area of
transition between LoS and non-Line-of-Sight (nLoS) are examined. The count observed in the
transition area becomes the threshold that determines LoS. Once the LoS area is known, the
prediction of LoS from the wireless measurement features gathered by the AGV, namely Reference
Signal Received Power (RSRP), Reference Signal Received Quality (RSRQ), Received Signal
Strength Indication (RSSI), and Signal-to-Interference-plus-Noise Ratio (SINR), is analyzed.
The prediction is achieved by training Machine Learning (ML) classifiers, namely a Decision
Tree, a Random Forest, and XGB, with the wireless measurements from the AGV as input.
2 Theoretical Background
2.1 Clearance Model
In an industrial environment, the path that the AGV drives along is predetermined. However,
the environment between the AGV and the base station might change due to obstacles, which
could disturb the communication channel. To analyze the disturbance, the path clearance between
the AGV and the base station is commonly evaluated by applying the Fresnel Zone Clearance
Model. The fraction of the Fresnel Zone that is obscured by obstacles can be used to determine
the availability of LoS [JHJ15].
A Fresnel Zone is a buffer shaped like a prolate spheroid (an ellipsoid that is "pointy"
instead of "squashed") around the LoS path between two objects, as seen in Figure 2.1. Its
radius varies with the distance from the objects and the transmission frequency. The radius is
largest at the center point, which is equally far from both objects, and shrinks toward either
object [Bro+20].
[Figure: Fresnel Zone with the labels D, b, d1, and d2]
Figure 2.1: Fresnel Zone [Jcm12]
A Fresnel Zone is made up of multiple layers or zones that differ in radius, in particular in
the semi-minor axis, represented as b in Figure 2.1. The first zone, which is the zone with the
smallest semi-minor axis, must be kept mostly clear of obstacles to avoid interfering with the
radio reception. However, a minor obstruction of the Fresnel Zone is often acceptable. When 80%
of the first Fresnel Zone is unobstructed, the wireless connection is treated as a free-space
link [JHJ15].
Figure 2.2 shows the parameters of an ellipse, with a and b representing the semi-major and
semi-minor axes, respectively. The points F1 and F2 represent the AGV and the base station, and
D, as presented in Figure 2.1, is the distance between them. To create the Fresnel Zone, its
semi-minor axis should first be calculated.
Figure 2.2: Ellipse [Ag217]
The formula to calculate the semi-minor axis is as follows:
$$ b_n = \frac{1}{2}\sqrt{n \lambda D} \quad \text{with} \quad \lambda = \frac{c}{f} \tag{2.1} $$
In Equation (2.1), n is the index of the n-th Fresnel Zone, c is the speed of light, and f is
the carrier frequency. The semi-major axis a can be calculated with the Pythagorean theorem:
since the foci F1 and F2 each lie D/2 from the center of the ellipse, $a = \sqrt{b^2 + (D/2)^2}$.
Once the semi-minor and semi-major axes of the Fresnel Zone are found, the ellipse can be
constructed. Since the ellipse is not always horizontal or vertical and can lie at an angle,
the standard ellipse formula has to be generalized. The formula of the first Fresnel Zone with
n = 1 and b_1 = b is as follows:
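$$ \frac{x^2}{a^2} + \frac{y^2}{b^2} \le 1 \tag{2.2} $$
With the help of the polar coordinate system, x and y are given as follows:
$$ \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} \rho\cos(\phi) \\ \rho\sin(\phi) \end{pmatrix}; \quad \phi \in [0, 2\pi) \tag{2.3} $$
Since ρ is already defined for the x and y axes as a and b, the polar coordinates are assigned as:
$$ \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} a\cos(\phi) \\ b\sin(\phi) \end{pmatrix}; \quad \phi \in [0, 2\pi) \tag{2.4} $$
Rotate the ellipse with the rotation matrix R and substitute x and y from Equation (2.4):
$$ \begin{pmatrix} x \\ y \end{pmatrix} = R \begin{pmatrix} a\cos(\phi) \\ b\sin(\phi) \end{pmatrix} \tag{2.5} $$
$$ \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} \cos(\theta) & -\sin(\theta) \\ \sin(\theta) & \cos(\theta) \end{pmatrix} \begin{pmatrix} a\cos(\phi) \\ b\sin(\phi) \end{pmatrix} \tag{2.6} $$
Multiply both sides with the inverse rotation matrix $R^{-1} = R^{T}$:
$$ \begin{pmatrix} \cos(\theta) & \sin(\theta) \\ -\sin(\theta) & \cos(\theta) \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} a\cos(\phi) \\ b\sin(\phi) \end{pmatrix} \tag{2.7} $$
With matrix multiplication, the following equations are given:
$$ x\cos(\theta) + y\sin(\theta) = a\cos(\phi) \tag{2.8} $$
$$ y\cos(\theta) - x\sin(\theta) = b\sin(\phi) \tag{2.9} $$
Translate the center of the ellipse from (0, 0) to (x_c, y_c) and isolate cos(ϕ) and sin(ϕ):
$$ \frac{(x - x_c)\cos(\theta) + (y - y_c)\sin(\theta)}{a} = \cos(\phi) \tag{2.10} $$
$$ \frac{(y - y_c)\cos(\theta) - (x - x_c)\sin(\theta)}{b} = \sin(\phi) \tag{2.11} $$
With the help of the Pythagorean trigonometric identity (2.12) and the substitution of
Equations (2.10) and (2.11), Equation (2.13) is obtained as the formula for the Fresnel Zone
ellipse:
$$ \cos^2(\phi) + \sin^2(\phi) = 1 \tag{2.12} $$
$$ \left[ \frac{(x - x_c)\cos(\theta) + (y - y_c)\sin(\theta)}{a} \right]^2 + \left[ \frac{(y - y_c)\cos(\theta) - (x - x_c)\sin(\theta)}{b} \right]^2 \le 1 \tag{2.13} $$
In Equation (2.13), θ represents the counterclockwise angle of rotation, and the symbol ≤ is
used to also account for points inside the ellipse.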
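The derivation can be condensed into a short Python sketch. It is only an illustration under stated assumptions: the function names `fresnel_axes` and `inside_fresnel_zone` are hypothetical, the geometry is treated in two dimensions, and the rotation angle θ is taken as the direction from the AGV to the base station. Equation (2.1) gives b, the Pythagorean theorem gives a, and Equation (2.13) is evaluated directly.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # c in m/s


def fresnel_axes(distance_m, frequency_hz, n=1):
    """Return (a, b) of the n-th Fresnel Zone ellipse for a link of length D > 0."""
    wavelength = SPEED_OF_LIGHT / frequency_hz          # lambda = c / f
    b = 0.5 * math.sqrt(n * wavelength * distance_m)    # Eq. (2.1)
    a = math.sqrt(b ** 2 + (distance_m / 2.0) ** 2)     # foci lie D/2 from the center
    return a, b


def inside_fresnel_zone(point, agv, base_station, frequency_hz, n=1):
    """Evaluate Eq. (2.13): True if the 2-D point lies inside or on the ellipse."""
    (x, y), (x1, y1), (x2, y2) = point, agv, base_station
    distance = math.hypot(x2 - x1, y2 - y1)
    a, b = fresnel_axes(distance, frequency_hz, n)
    xc, yc = (x1 + x2) / 2.0, (y1 + y2) / 2.0           # center of the ellipse
    theta = math.atan2(y2 - y1, x2 - x1)                # assumed rotation angle
    u = ((x - xc) * math.cos(theta) + (y - yc) * math.sin(theta)) / a
    v = ((y - yc) * math.cos(theta) - (x - xc) * math.sin(theta)) / b
    return u ** 2 + v ** 2 <= 1.0
```

For a 10 m link at the campaign's carrier frequency of 3790 MHz, the sketch yields a semi-minor axis of roughly 0.44 m, so a point 0.4 m beside the midpoint of the link is inside the first Fresnel Zone while a point 0.5 m beside it is not.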
2.2 Data Prediction/Classification
Machine Learning (ML) is a major element of AI that allows a system to learn and improve on
its own. Identifying patterns in targeted data, creating descriptive models, and making
predictions even without clear predefined rules and models are a few abilities of ML algorithms
[Mam+18]. ML has been used in many sectors, such as the medical sector, where ML is used to
predict whether a person has breast cancer [BD22]. Another example is the educational
environment, where ML helps identify student attentiveness [Ros+13]. In addition, ML
contributes to the industrial environment by predicting QoS, as mentioned in Section 1.
ML algorithms are broadly categorized into four paradigms [Mam+18]:
Supervised learning: This learning algorithm produces a function that predicts output values.
The procedure begins by analyzing a labeled training data set. The learned function can then
be applied to new data to predict future occurrences.
Unsupervised learning: This algorithm works with data that has not been categorized or
labeled in the training data set. It deduces a function that describes a concealed structure
in the unlabeled data. A typical unsupervised method is clustering.
Semi-supervised learning: It incorporates features from both supervised and unsupervised
learning. These algorithms take a large amount of unlabeled data and a small amount of
labeled data.
Reinforcement learning: This algorithm involves interacting with the environment through
actions and discovering errors. It enables software agents and machines to decide what
behavior is appropriate in each situation to maximize performance.
One of the many capabilities of ML is classifying data, that is, predicting which category a
sample belongs to, which falls under supervised learning. In classification, assessments are
made based on values acquired during observation. The method uses a mapping function f on
input variables x to approximate a discrete output variable y. Although the classification
output is often discrete, it may alternatively be continuous, in the form of a probability for
each class label [Mam+18]. For this work, the ML models are taken from the Python package
Scikit-learn [Ped+11], an open-source library with code for various ML tasks such as
regression, classification, and clustering [Ma21]. The classification algorithms assessed in
this work are Decision Tree, Random Forest, and XGB.
2.2.1 Decision Tree
Decision Tree induction is one of the most important components of inductive learning and
among its most popular and useful techniques. Starting from a training set, a Decision Tree is
created by the following general process, which is visualized in Figure 2.3. The root node of
the tree initially represents the complete training set. The root node is then divided into
several sub-nodes using a heuristic. If a sub-node contains only instances that are members of
one class, it is referred to as a leaf node; otherwise, the sub-node is split further based on
the heuristic. This procedure is repeated until all leaf nodes are produced [Li+09]. Figure 2.3
illustrates a Decision Tree for street crossing as an example.
[Figure: decision tree for street crossing. Node labels: car, light, child; branch labels: Green, Red, Yes, No; leaf labels: cross, stop]
Figure 2.3: Decision Tree [Pro]
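The splitting step described above can be sketched for a single numeric feature. The text does not name the heuristic, so this sketch assumes the Gini impurity, which is the quantity reported as "gini" in the tree plots of Section 5. The function names `gini` and `best_threshold` and the sample values are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum(p_k^2)."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return 1.0 - sum((count / total) ** 2 for count in counts.values())


def best_threshold(values, labels):
    """Return (threshold, weighted_gini) of the best split 'value <= threshold'."""
    best = (None, gini(labels))  # no split at all is the baseline
    candidates = sorted(set(values))
    # Candidate thresholds are the midpoints between neighboring feature values.
    for low, high in zip(candidates, candidates[1:]):
        threshold = (low + high) / 2.0
        left = [l for v, l in zip(values, labels) if v <= threshold]
        right = [l for v, l in zip(values, labels) if v > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[1]:
            best = (threshold, score)
    return best


# Hypothetical RSRP values in dBm with LoS labels (1 = LoS, 0 = nLoS).
rsrp = [-112, -110, -109, -104, -101, -98]
los = [0, 0, 0, 1, 1, 1]
threshold, impurity = best_threshold(rsrp, los)
```

In this toy data the two classes are perfectly separable, so the best split leaves both sub-nodes pure and they become leaf nodes. A full tree would repeat the same search recursively on every impure sub-node and over every feature.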
2.2.2 Random Forest
Random Forest is a supervised ML classification technique that combines several Decision
Trees. The Random Forest is preferred over a single Decision Tree because its ensemble
approach, which aggregates the results of a varied group of predictors, increases performance.
Random Forest training is based on the bootstrap approach: each Decision Tree learns from a
random sample of the data set, and the samples are drawn with replacement, so that the same
data point can be used repeatedly. In addition, Random Forest selects a random subset of
features from the data set, and a collection of Decision Trees is generated from the random
samples and feature subsets. The final class of the input is then determined by averaging the
prediction results from these Decision Trees. The process is visualized in Figure 2.4. This
algorithm works effectively because a single Decision Tree is sensitive to noise, whereas
several Decision Trees reduce the noise and provide more accurate predictions [Tam20].
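The three ingredients named above (bootstrap sampling, random feature subsets, and averaging) can be illustrated without any ML library. This is a sketch only: the helper names are hypothetical, the trees themselves are left out, and the per-tree probabilities in the usage lines are made up.

```python
import random


def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement, so rows may repeat."""
    return [rng.choice(rows) for _ in rows]


def random_feature_subset(feature_names, k, rng):
    """Pick k distinct features for one tree."""
    return rng.sample(feature_names, k)


def average_vote(tree_probabilities, threshold=0.5):
    """Average the per-tree probabilities for class 1 and threshold the mean."""
    mean = sum(tree_probabilities) / len(tree_probabilities)
    return (1 if mean >= threshold else 0), mean


rng = random.Random(0)  # fixed seed so the sketch is repeatable
sample = bootstrap_sample(list(range(10)), rng)
features = random_feature_subset(["RSRP", "RSRQ", "RSSI", "SINR"], 2, rng)
predicted_class, mean_probability = average_vote([0.9, 0.6, 0.2])
```

Because each tree sees a different sample and a different feature subset, the errors of individual trees are less correlated, which is why the averaged result is less sensitive to noise than any single tree.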
2.2.3 Extreme Gradient Boosting (XGB)
Gradient boosting refers to a class of ensemble ML algorithms that can be applied to
regression and classification problems. The ensembles are built from Decision Tree models.
Trees are added to the ensemble one at a time, and each new tree is meant to reduce the
prediction errors made by the prior models. XGB is an efficient open-source implementation of
the gradient boosting method. It helps researchers solve large-scale real-world problems with
fewer resources [Sar+21].
Figure 2.4: Random Forest [unk]
Figure 2.5: XGB [WCC20]
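The idea of adding trees one at a time, each correcting the errors of its predecessors, can be sketched in plain Python. This is a deliberately reduced illustration and not the XGB implementation: it assumes a squared-error loss, a single input feature, and one-split trees (stumps); XGB additionally uses regularization and second-order gradient information. The names `fit_stump` and `boost` are hypothetical.

```python
def fit_stump(xs, residuals):
    """Least-squares stump: returns (threshold, left_mean, right_mean)."""
    best = None
    # Every unique value except the largest keeps both sides non-empty.
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1:]


def boost(xs, ys, n_estimators=20, learning_rate=0.5):
    """Add stumps one at a time, each fitted to the current residuals."""
    base = sum(ys) / len(ys)              # initial constant prediction
    predictions = [base] * len(xs)
    stumps = []
    for _ in range(n_estimators):
        residuals = [y - p for y, p in zip(ys, predictions)]
        threshold, left_mean, right_mean = fit_stump(xs, residuals)
        stumps.append((threshold, left_mean, right_mean))
        predictions = [
            p + learning_rate * (left_mean if x <= threshold else right_mean)
            for x, p in zip(xs, predictions)
        ]
    return base, stumps, predictions


xs = [1, 2, 3, 4, 5, 6]
ys = [0, 0, 0, 1, 1, 1]
base, stumps, predictions = boost(xs, ys)
```

With a learning rate of 0.5, every added stump halves the remaining error on this toy data, which mirrors how `learning_rate` and `n_estimators` trade off against each other in the hyper-parameter search later in the thesis.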
3 Data Source
The following section is taken from the AI4Mobile research project report "Final description
of the collected data" and describes how the autonomous sweeper AGV conducted the
measurements [Gar+22]. It should be noted that the report is not publicly available and that
this thesis does not focus on how the data were collected.
An autonomous sweeper from Enway with an attached User Equipment (UE) linked to a private
Long Term Evolution (LTE) network with a single base station was used to perform a
measurement campaign at the Enway facilities in Berlin (see Figure 3.1). The autonomous sweeper
traced the same track (see Figure 3.2) several times, sweeping the inside of the building and a
small portion of the outside. The base station for this campaign was placed inside, and
attenuators were employed to produce a variety of radio measurements, including connection
losses. The only network-connected device was the UE, and traffic between the UE and a local
server was generated and recorded on both the UE and server sides. The data set contains more
than 40000 seconds of measurements.
Figure 3.1: Autonomous Cleaning Sweeper Provided by Enway [Gar+22]
Figure 3.2: Route (black) of sweeper for several rounds of measurements including indoor
(green) and outdoor (blue). The position of the base station provided by Götting
(right picture) is marked by the yellow cross [Gar+22].
The radio and system characteristics of the LTE network are as follows:
EPC: 4G Core according to 3GPP Rel. 15
BS: 3GPP Rel. 9
TDD system with 20 MHz bandwidth
Frequency at 3790 MHz in the industrial spectrum between 3.7 GHz and 3.8 GHz
Output power: 24 dBm, 2x2 MIMO
Radiation pattern of built-in antennas: opening angle hor. 60°, vert. 60°
Gain: 6 dBi
Size (H x W x D): 330 x 230 x 130
A variety of sensors on the autonomous sweeper were used to collect data throughout the cam-
paign. Included are the following attributes:
Position of the sweeper as given by odometry techniques.
Light Detection and Ranging (LiDAR) data: A point cloud of reflected obstacles such as
walls and standing objects, identified with x, y, and z coordinates.
Static elevation map: A precomputed, static occupancy grid.
Far and near map obstacles: Occupancy grids that are relative to the sweeper’s position
and created dynamically in real-time. See Figure 3.3 for details.
Figure 3.3: Dynamic Obstacle Maps [Gar+22]
Through the use of proprietary algorithms for sensor fusion and Simultaneous Localization and
Mapping (SLAM), the occupancy grids are produced from the raw sensor data. Along with the
sensor data, the following features were captured:
Full UE LTE Stack was captured (MobileInsight [Li+16])
Physical (PHY) Layer Measurements: RSRP, RSRQ, RSSI, and SINR are also available
via Attention (AT)-command-based measurements
QoS Measurements (e.g., Throughput and round-trip time from ping command)
The full data set from this autonomous sweeper measurement campaign has been published in
[Her+22].
4 Evaluation Methodology
This work does not use the full data set gathered by the AGV as described in Section 3; only
the data relevant for LoS prediction are analyzed. The relevant data are the sensor data,
collected in the form of .bag files, the PHY layer measurements RSRP, RSRQ, RSSI, and SINR in
the form of .log files, and the full UE LTE stack from MobileInsight [Li+16] in the form of
.mi2log files. First, the sensor data are examined by extracting the appropriate features from
the .bag files.
4.1 Sensor Data
The .bag files contain different types of messages that store information about the
measurement. Table 4.1 lists the topics together with their message types.
Table 4.1: Topic Table from the .bag File
Topic                                         Type                            Message Count   Frequency (Hz)
/drive_state                                  simple_drive_msgs/SimpleDrive   29508           49.997664
/global_odom                                  nav_msgs/Odometry               17705           30.001102
/navigation/enway_map/far_map_obstacles       nav_msgs/OccupancyGrid          11753           20.051698
/navigation/enway_map/map_static_elevation    nav_msgs/OccupancyGrid          1               NaN
/navigation/enway_map/near_map_obstacles      nav_msgs/OccupancyGrid          11753           20.015242
/sensors/inertial/imu                         sensor_msgs/Imu                 58944           98.474022
/sensors/lidar/lidar_top/points               sensor_msgs/PointCloud2         5902            10.000963
/tf                                           tf2_msgs/TFMessage              94738           148.971906
/tf_static                                    tf2_msgs/TFMessage              2               950.658205
Two topics are important for this thesis, namely /global_odom and
/navigation/enway_map/map_static_elevation. The topic /global_odom contains the odometry
position of the AGV, while the topic /navigation/enway_map/map_static_elevation contains the
layout of the static map. The data contained in both topics play an important part in creating
the Fresnel Zone, which determines LoS between the AGV and the base station. The workflow of
this thesis is explained further in Section 4.3.
4.2 Wireless Measurement
Two wireless measurements from the Enway Campaign are relevant for this work, namely the PHY
layer measurements RSRP, RSRQ, RSSI, and SINR, obtained via AT-command-based measurements in
the form of .log files, and the full UE LTE stack captured with MobileInsight [Li+16] in the
form of .mi2log files. First, the .mi2log files are parsed and the features are inspected.
After parsing, the only relevant features for this work contained in the .mi2log files are
RSRP and RSRQ. Since RSSI and SINR, which are contained in the .log files, are not available
in the .mi2log files, the data from the .log files are used for the ML activities.
However, the sampling interval of the data in the .log files is coarser than that of the
.mi2log files, namely 200 ms compared to 40 ms. For this reason, the data in the .log files
are up-sampled and compared against the data in the .mi2log files to check whether the two
differ sharply. Figures 4.1 and 4.2 show the difference between the up-sampled data and the
data from the .mi2log files.
[Plot: Comparison of RSRP between MobileInsight and AT-Command Measurement. x-axis: Timestamp (15:31:57 to 15:50:37); y-axis: RSRP [dBm]; series: RSRP(dBm) [AT-Command], RSRP(dBm) [MobileInsight]]
Figure 4.1: RSRP Comparison
[Plot: Comparison of RSRQ between MobileInsight and AT-Command Measurement. x-axis: Timestamp (15:31:57 to 15:50:37); y-axis: RSRQ [dB]; series: RSRQ(dB) [AT-Command], RSRQ(dB) [MobileInsight]]
Figure 4.2: RSRQ Comparison
It is clear from Figures 4.1 and 4.2 that the difference is tolerable and therefore the up-sampled
data could be used for ML activities.
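The up-sampling step from a 200 ms grid to a 40 ms grid can be sketched as follows. The thesis does not state which up-sampling method was used, so the forward fill (holding the most recent value) shown here is an assumption, as are the function name `upsample_forward_fill` and the sample values.

```python
def upsample_forward_fill(timestamps_ms, values, step_ms):
    """Resample sorted samples onto a regular grid by holding the latest value."""
    grid, filled = [], []
    index = 0
    t = timestamps_ms[0]
    while t <= timestamps_ms[-1]:
        # Advance to the last original sample that is not later than t.
        while index + 1 < len(timestamps_ms) and timestamps_ms[index + 1] <= t:
            index += 1
        grid.append(t)
        filled.append(values[index])
        t += step_ms
    return grid, filled


# Hypothetical AT-command RSRP samples every 200 ms, resampled to 40 ms.
at_times = [0, 200, 400]
at_rsrp = [-100, -102, -105]
grid, rsrp_40ms = upsample_forward_fill(at_times, at_rsrp, 40)
```

Forward fill adds no information, it only repeats each reading until the next one arrives, which is consistent with the observation that the up-sampled AT-command curve stays close to the MobileInsight curve. Linear interpolation would be an equally simple alternative.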
4.3 Workflow
The .bag files are first extracted using the Python package bagpy [Bha]. Figure 4.3 is created
from the topic /global_odom, which contains the odometry path of the AGV, and the topic
/navigation/enway_map/map_static_elevation, which contains the static map.
Figure 4.3: Static Map
In Figure 4.3, the path of the AGV is shown in green. The base station, shown in blue, is
located inside the building. For each point of the AGV's path, a Fresnel Zone is created
between that point and the base station. To count the obstacles inside the Fresnel Zone, the
floor of the map is removed entirely, so that the values representing the floor are not
included in the calculation. Figure 4.4 shows, as an example, the Fresnel Zone between one
point on the path of the AGV and the base station, and Figure 4.5 shows a heat map in which the
value at each point of the AGV's path is the accumulated obstacle count. Figures 4.4, 4.5, and
4.6 are magnified to show the path of the AGV in more detail.
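The obstacle count for one point of the path could be computed along the following lines. This is a sketch under several assumptions that the text does not confirm: the static map is taken to be already binarized (1 for an obstacle cell, 0 for floor or free space), grid cell (row, col) is assumed to have its center at ((col + 0.5) · resolution, (row + 0.5) · resolution) in map coordinates, and the semi-axes a and b are passed in precomputed. The function name `count_obstacles` is hypothetical.

```python
import math


def count_obstacles(grid, resolution, agv, base_station, a, b):
    """Count obstacle cells whose centers satisfy the ellipse inequality (2.13)."""
    (x1, y1), (x2, y2) = agv, base_station
    xc, yc = (x1 + x2) / 2.0, (y1 + y2) / 2.0      # ellipse center
    theta = math.atan2(y2 - y1, x2 - x1)           # rotation of the link
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    count = 0
    for row, cells in enumerate(grid):
        for col, cell in enumerate(cells):
            if cell != 1:
                continue                           # floor or free space is ignored
            x = (col + 0.5) * resolution           # assumed cell-center convention
            y = (row + 0.5) * resolution
            u = ((x - xc) * cos_t + (y - yc) * sin_t) / a
            v = ((y - yc) * cos_t - (x - xc) * sin_t) / b
            if u * u + v * v <= 1.0:
                count += 1
    return count


# Hypothetical 5 x 5 map with 1 m cells and three obstacle cells.
grid = [[0] * 5 for _ in range(5)]
grid[2][2] = 1   # on the direct path between AGV and base station
grid[0][2] = 1   # far to the side of the link
grid[4][4] = 1   # in a corner of the map
obstacles = count_obstacles(grid, 1.0, (0.5, 2.5), (4.5, 2.5), a=2.1, b=0.6)
```

Repeating this count for every odometry point would give one value per point of the path, which is the kind of quantity a heat map over the path can display.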
Figure 4.4: Static Map with Fresnel Zone
Figure 4.5: Heat Map
Figure 4.5 already suggests a distinction in obstacle count between the LoS and nLoS areas.
To classify the two areas, a threshold that separates the two classes is needed. Therefore,
the values in the area of transition between LoS and nLoS are observed, and the value that
distinguishes both areas is set as the threshold. Figure 4.6 shows a clear contrast between
the LoS and nLoS areas. The LoS area is shown in green and the nLoS area in red.
Figure 4.6: LoS and nLoS Area
After establishing the LoS and nLoS areas from the sensor data, the next step is to predict
LoS from the wireless measurement data using ML. As mentioned in Section 4.2, data from the
AT-command-based measurement are used. For this step, it is important that the sampling
interval of the data from the .log files matches that of the sensor data. The sensor data have
a sampling interval of 30 ms, while the up-sampled data from the wireless measurement have a
sampling interval of 40 ms. The sensor data are therefore down-sampled to match the wireless
data.
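One way to bring the sensor-derived labels onto the 40 ms grid of the wireless data is to pick, for every wireless timestamp, the sensor sample that is closest in time. The thesis does not describe the down-sampling method, so this nearest-timestamp matching is an assumption, and the names `nearest_index` and `align_labels` as well as the sample values are hypothetical.

```python
from bisect import bisect_left


def nearest_index(sorted_times, t):
    """Index of the entry in sorted_times that is closest to t."""
    pos = bisect_left(sorted_times, t)
    if pos == 0:
        return 0
    if pos == len(sorted_times):
        return len(sorted_times) - 1
    before, after = sorted_times[pos - 1], sorted_times[pos]
    # Ties go to the earlier sample.
    return pos if after - t < t - before else pos - 1


def align_labels(sensor_times, sensor_labels, wireless_times):
    """Return one sensor label per wireless timestamp."""
    return [sensor_labels[nearest_index(sensor_times, t)] for t in wireless_times]


# Hypothetical timestamps in ms: sensor every 30 ms, wireless every 40 ms.
sensor_times = [0, 30, 60, 90, 120]
sensor_los = [1, 1, 0, 0, 1]
wireless_times = [0, 40, 80, 120]
aligned = align_labels(sensor_times, sensor_los, wireless_times)
```

After this step every wireless sample carries exactly one LoS or nLoS label, which is what the classifier needs as its target.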
After the sensor data are down-sampled, the data sets for the ML activities are created. Two
versions of the data set are used to train the ML models. Both data sets include the features
RSRP, RSRQ, RSSI, and SINR, and the binary LoS/nLoS classification as the prediction target.
The difference between the two data sets is the feature distance, which represents the
distance between the AGV and the base station at each point of the AGV's path and is included
in only one of them. It should be noted that the data sets combine the data from every run of
the AGV and not only from a single run.
After the data sets are created, prediction models that take a data set as input are used to
determine LoS and nLoS. The classification algorithms Decision Tree, Random Forest, and XGB
are used for the prediction of LoS, and the accuracy of each method is then compared. To train
the models, 80% of the data are used as the training set and the other 20% as the validation
(or test) set. It should be noted that the data set is not split randomly. Instead, it is
split in order: the first 80% of the data set is allocated to the training set and the
remaining 20% to the validation (or test) set.
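The ordered split amounts to cutting the data set at the 80% mark. A minimal sketch, with the hypothetical helper name `chronological_split`:

```python
def chronological_split(rows, train_percent=80):
    """Split rows in their given order: the first part trains, the rest tests."""
    cut = len(rows) * train_percent // 100   # integer arithmetic avoids rounding issues
    return rows[:cut], rows[cut:]


samples = list(range(10))                    # stand-in for time-ordered samples
train_set, test_set = chronological_split(samples)
```

Keeping the order means that the test set consists of samples recorded after the training samples, so neighboring, nearly identical samples cannot end up on both sides of the split. In scikit-learn, `train_test_split` with `shuffle=False` produces the same kind of ordered split.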
In addition, the best hyper-parameters for each classification algorithm have to be
determined. For the Decision Tree, only the hyper-parameter max_depth is examined. As the name
suggests, max_depth limits the growth of the Decision Tree by limiting its depth. To find the
best max_depth, k-fold cross-validation is used. In this work, 5-fold cross-validation is
applied, which divides the training set into five equal parts. In each round, four parts are
used for training and the remaining part as a test set, so that every part serves as the test
set once. This means that 80% of the training set is used for training and 20% as a test set.
The hyper-parameter value that gives the highest Area Under ROC Curve (AUC) score on the test
set is used to create the final prediction model. The AUC score is used to assess the
performance of the model: the higher the AUC score, the better the model performs.
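Both building blocks of this procedure, the fold partition and the AUC score, can be written down in a few lines of plain Python. The sketch below is illustrative only: the names `k_fold_indices` and `auc_score` are hypothetical, the folds are assumed to be contiguous and unshuffled, and the pairwise AUC computation is only practical for small samples.

```python
def k_fold_indices(n_samples, k=5):
    """Yield (train_indices, test_indices) for k contiguous folds."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


def auc_score(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    positives = [s for l, s in zip(labels, scores) if l == 1]
    negatives = [s for l, s in zip(labels, scores) if l == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5      # ties count as half a correct ranking
    return wins / (len(positives) * len(negatives))


folds = list(k_fold_indices(10, k=5))
example_auc = auc_score([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

In the example, three of the four positive/negative pairs are ranked correctly, so the AUC is 0.75. An AUC of 0.5 corresponds to random ranking and 1.0 to a perfect separation of LoS and nLoS samples.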
For the Random Forest algorithm, the hyper-parameters max_depth and n_estimators are
examined. n_estimators sets the number of trees used in the algorithm. The method
GridSearchCV is used, which tries every combination from lists of max_depth and n_estimators
values, fits the model on the training set, and reports the combination with the highest AUC
score. GridSearchCV as used in this work also applies 5-fold cross-validation to find the
hyper-parameter values that give the highest AUC score.
For XGB, the hyper-parameters max_depth, n_estimators, and learning_rate are examined, and
the same method GridSearchCV is applied to find the best value for each hyper-parameter.
After the best hyper-parameters for each classification algorithm are determined, the final
prediction model for each algorithm is created and tested using the validation (or test) set,
which is, as mentioned before, 20% of the whole data set.
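Conceptually, a grid search evaluates every combination of the listed hyper-parameter values and keeps the best one. The following plain-Python sketch shows that loop. It is not scikit-learn's implementation: `grid_search` and `toy_score` are hypothetical names, and `toy_score` merely stands in for the mean cross-validated AUC that a real search would compute by training a model for each combination.

```python
from itertools import product


def grid_search(param_grid, evaluate):
    """Try every parameter combination and return (best_params, best_score)."""
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    for combo in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, combo))
        score = evaluate(params)             # e.g. mean AUC over the five folds
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_score(params):
    """Made-up score that peaks at max_depth = 4 and n_estimators = 100."""
    return (1.0
            - 0.1 * abs(params["max_depth"] - 4)
            - 0.001 * abs(params["n_estimators"] - 100))


grid = {"max_depth": [2, 4, 6], "n_estimators": [50, 100]}
best_params, best_score = grid_search(grid, toy_score)
```

The cost grows with the product of the list lengths: this grid has six combinations, and with 5-fold cross-validation each combination would require five model fits.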
5 Prediction Result
The following section discusses the result of each classification algorithm for both the data
set without distance as a feature and the data set with the feature distance. First, the
Decision Tree algorithm is assessed. The max_depth value that gives the highest AUC score is
six for the data set with the feature distance and four for the data set without it. The top
section of the Decision Tree for the data set with the feature distance is shown in Figure 5.1,
and that for the data set without the feature distance in Figure 5.2.
[Figure: decision tree diagram. Nodes are listed by depth from the root down, left to right; deeper levels are elided as "(...)" in the figure.]
Depth 0: distance <= 10.207, gini = 0.5, samples = 936023, value = [468011.5, 468011.5], class = 0
Depth 1: RSRP[dBm] <= -108.5, gini = 0.106, samples = 213255, value = [27758.593, 467897.097], class = 1
Depth 1: distance <= 10.217, gini = 0.001, samples = 722768, value = [440252.907, 114.403], class = 0
Depth 2: distance <= 8.459, gini = 0.357, samples = 27922, value = [15954.993, 4827.248], class = 0
Depth 2: RSRP[dBm] <= -106.5, gini = 0.048, samples = 185333, value = [11803.6, 463069.849], class = 1
Depth 2: RSRP[dBm] <= -100.5, gini = 0.343, samples = 465, value = [266.81, 75.339], class = 0
Depth 2: distance <= 10.27, gini = 0.0, samples = 722303, value = [439986.097, 39.064], class = 0
Figure 5.1: Top Section of the Decision Tree with Feature Distance
[Figure: decision tree diagram. Nodes are listed by depth from the root down, left to right; deeper levels are elided as "(...)" in the figure.]
Depth 0: RSRP[dBm] <= -107.5, gini = 0.5, samples = 936023, value = [468011.5, 468011.5], class = 0
Depth 1: RSRP[dBm] <= -109.5, gini = 0.048, samples = 719178, value = [435681.806, 11035.704], class = 0
Depth 1: RSSI[dBm] <= -76.5, gini = 0.123, samples = 216845, value = [32329.694, 456975.796], class = 1
Depth 2: RSRP[dBm] <= -110.5, gini = 0.007, samples = 678478, value = [412973.11, 1490.029], class = 0
Depth 2: RSRQ[dB] <= -7.5, gini = 0.417, samples = 40700, value = [22708.697, 9545.674], class = 0
Depth 2: RSRP[dBm] <= -102.5, gini = 0.329, samples = 48490, value = [16097.535, 61565.554], class = 1
Depth 2: RSSI[dBm] <= -74.5, gini = 0.076, samples = 168355, value = [16232.158, 395410.242], class = 1
Figure 5.2: Top Section of the Decision Tree without Feature Distance
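The gini values printed in the nodes of Figures 5.1 and 5.2 can be reproduced from each node's value pair. The sketch below assumes that the two entries of value are the (weighted) sample counts of class 0 and class 1, which is consistent with the printed numbers; the helper names gini and impurity_decrease are chosen for this example only. The three nodes used are the root and its two children from Figure 5.1.

```python
def gini(value):
    """Gini impurity 1 - sum(p_c ** 2) for the per-class weights in a node."""
    total = sum(value)
    return 1.0 - sum((v / total) ** 2 for v in value)


def impurity_decrease(parent, children):
    """Parent impurity minus the weight-averaged impurity of the child nodes."""
    n_parent = sum(parent)
    weighted_children = sum(sum(c) / n_parent * gini(c) for c in children)
    return gini(parent) - weighted_children


# Node values copied from Figure 5.1 (root and its two depth-1 children).
root = [468011.5, 468011.5]           # split: distance <= 10.207
child_rsrp = [27758.593, 467897.097]  # node: RSRP[dBm] <= -108.5
child_dist = [440252.907, 114.403]    # node: distance <= 10.217

print(round(gini(root), 3))        # 0.5
print(round(gini(child_rsrp), 3))  # 0.106
print(round(gini(child_dist), 3))  # 0.001
print(round(impurity_decrease(root, [child_rsrp, child_dist]), 3))
```

The large drop in impurity at the root illustrates why a single split on distance already separates most of the two classes in this data set.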
To choose the root of the Decision Tree, the classifier selects a feature using the Gini
criterion, which measures the quality of a split. To split a node into sub-nodes, the
classifier uses the best-split strategy, which picks the split with the largest decrease in
Gini impurity from the node to its sub-nodes. This method applies to both Decision Trees, one
for each data set. In Figures 5.1 and 5.2, the root of the Decision Tree splits on the feature
distance and on RSRP, respectively. To quantify the importance of these features, the
classifiers of the Python package Scikit-learn [Ped+11] provide an attribute called
feature_importances_, which returns the importance, or contribution, of each feature to the
model. The feature importance for both data sets in the Decision Tree is listed in Table 5.1:
Table 5.1: Feature Importance in Decision Tree
Feature    Importance (Data Set with Distance)    Importance (Data Set without Distance)
Distance   0.943239                               -
RSRP       0.053288                               0.979066
RSRQ       0.002017                               0.002096
RSSI       0.000147                               0.013137
SINR       0.001309                               0.005701
Table 5.1 shows that the feature distance dominates the Decision Tree's prediction of LoS.
The main reason is that, in this work, LoS only occurs when the AGV is at its closest to the
base station, as seen in Figure 4.6. The algorithm uses the distance value as the major
decider because the closer the AGV is to the base station, the higher the probability of LoS.
The feature distance also plays a significant role in the AUC score on the validation (or
test) set: the Decision Tree with the feature distance reaches an AUC score of 0.9559,
compared with 0.9286 without it. Although the feature distance is important in this case,
relying on it could cause problems in other environments where LoS between the AGV and the
base station does not occur only when the AGV is at its closest to the base station.
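The effect described above can be reproduced on a toy example. The sketch below trains a Scikit-learn DecisionTreeClassifier on synthetic data in which the label depends only on distance, and then reads feature_importances_. All data, the 10-unit threshold, and the feature relationships are assumptions made for illustration and do not come from the measurement campaign.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
n = 2000

# Synthetic features (assumptions): signal strength degrades with distance, SINR is unrelated.
distance = rng.uniform(0.0, 30.0, n)
rsrp = -90.0 - 0.8 * distance + rng.normal(0.0, 3.0, n)
sinr = rng.normal(15.0, 5.0, n)

# Toy label: LoS (1) only when the vehicle is close to the base station.
los = (distance <= 10.0).astype(int)

X = np.column_stack([distance, rsrp, sinr])
feature_names = ["distance", "RSRP", "SINR"]

tree = DecisionTreeClassifier(criterion="gini", max_depth=4, random_state=0)
tree.fit(X, los)

# feature_importances_ is normalized so that the values sum to 1.
importances = dict(zip(feature_names, tree.feature_importances_))
for name, value in importances.items():
    print(f"{name}: {value:.6f}")
```

Because the toy label is a pure function of distance, the tree attributes almost all importance to that feature, even though RSRP is correlated with the label. This mirrors the caveat that a model trained this way may not transfer to environments where LoS is not determined by proximity.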
After the Decision Tree, the other algorithms, namely Random Forest and XGB, are addressed.
For Random Forest, the tuned hyper-parameters are max_depth and n_estimators, while for XGB
they are learning_rate, max_depth, and n_estimators. The selected hyper-parameters for each
algorithm and each data set are as follows:
Data set with feature distance:
Random Forest: max_depth = 4 and n_estimators = 350
XGB: learning_rate = 0.1, max_depth = 2, and n_estimators = 400
Data set without feature distance:
Random Forest: max_depth = 2 and n_estimators = 300
XGB: learning_rate = 0.1, max_depth = 4, and n_estimators = 250
Random Forest and XGB classifiers also provide the attribute feature_importances_, which
returns the importance of each feature. The values are listed in Tables 5.2 and 5.3:
Table 5.2: Feature Importance in Random Forest
Feature    Importance (Data Set with Distance)    Importance (Data Set without Distance)
Distance   0.427069                               -
RSRP       0.305767                               0.505130
RSRQ       0.001387                               0.000674
RSSI       0.110506                               0.173494
SINR       0.155270                               0.320702
Table 5.3: Feature Importance in XGB
Feature    Importance (Data Set with Distance)    Importance (Data Set without Distance)
Distance   0.390010                               -
RSRP       0.306542                               0.306442
RSRQ       0.038372                               0.133894
RSSI       0.199047                               0.207965
SINR       0.066030                               0.351699
As Tables 5.2 and 5.3 show, the feature distance is again the most important feature for the
prediction of LoS, for the reason already given for the Decision Tree. For the data set
without the feature distance, RSRP and SINR have the highest importance values, which shows
that they play a significant role in predicting LoS with Random Forest and XGB.
After the feature importance of both classification algorithms, the AUC scores on the
validation (or test) set are compared for both data sets. Random Forest reaches an AUC score
of 0.9586 with the feature distance and 0.9294 without it, while XGB reaches 0.9922 with the
feature distance and 0.9491 without it. Figure 5.3 summarizes the AUC scores on the
validation (or test) sets for the different classification algorithms.
[Figure: bar chart "AUC Score Comparison". Vertical axis: AUC Score (0.88 to 1). Groups: Test Set With Distance, Test Set Without Distance. Bars: Decision Tree, Random Forest, XGB.]
Figure 5.3: AUC Score Comparison
Figure 5.3 shows that the AUC scores for the data set with the feature distance are higher
than those for the data set without it. However, even with the lower AUC scores, each
classification algorithm still performs satisfactorily, with AUC scores above 0.92.
Furthermore, XGB clearly performs best, while the Decision Tree and Random Forest differ only
slightly. Additionally, the AUC scores of Random Forest and XGB could be analyzed as the
number of trees in the algorithm increases. However, because the two algorithms build their
trees differently, the AUC scores are instead analyzed as the number of nodes increases, which
allows a direct comparison. Figure 5.4 compares the AUC scores for Random Forest and XGB as
the number of nodes increases.
[Figure: line plot "AUC Scores as the Number of Nodes Increases". Horizontal axis: Nodes (0 to 10000). Vertical axis: AUC Score (0.90 to 0.98). Legend: Algorithm (XGB, Random Forest); Data Set (With Distance, Without Distance).]
Figure 5.4: AUC Scores as the Number of Nodes Increases
Figure 5.4 shows that Random Forest has stable AUC scores as the number of nodes increases,
whereas the AUC scores for XGB increase gradually. This also reflects how each algorithm
arrives at its final model. As mentioned in Sections 2.2.2 and 2.2.3, XGB adds trees to the
ensemble one at a time, each correcting the prediction mistakes of the prior models, while
Random Forest does not correct prediction mistakes in this way. Therefore, the AUC scores for
XGB increase gradually as trees are added. It should also be noted that, despite having fewer
trees (see the hyper-parameter list above), Random Forest on the data set with the feature
distance produces more nodes than XGB on the same data set, because each tree in the Random
Forest is larger than each tree in XGB.
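A rough upper bound on the node counts follows from the selected hyper-parameters alone: a binary tree limited to depth d has at most 2^(d+1) - 1 nodes. The sketch below applies this bound to the max_depth and n_estimators values listed earlier in this chapter. It is only an upper bound under the assumption that depth counts edges from the root; the actual trees may contain fewer nodes, so the numbers are not the node counts plotted in Figure 5.4.

```python
def max_nodes(max_depth, n_trees):
    """Upper bound on total nodes: each tree has at most 2**(max_depth + 1) - 1 nodes."""
    return n_trees * (2 ** (max_depth + 1) - 1)


# (max_depth, n_estimators) taken from the hyper-parameter list in this chapter.
configs = {
    ("Random Forest", "with distance"): (4, 350),
    ("XGB", "with distance"): (2, 400),
    ("Random Forest", "without distance"): (2, 300),
    ("XGB", "without distance"): (4, 250),
}

bounds = {key: max_nodes(depth, trees) for key, (depth, trees) in configs.items()}
for (algorithm, data_set), bound in bounds.items():
    print(f"{algorithm}, {data_set}: at most {bound} nodes")
```

For the data set with the feature distance, the bound for Random Forest (350 trees of depth 4) is larger than that for XGB (400 trees of depth 2), which is in line with the observation that fewer but deeper trees can still yield more nodes.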
6 Summary and Conclusions
This thesis analyzes the potential of data collected by an AGV as an enabler of AI for
telecommunication in industrial environments. Sensor data and wireless measurements from the
AGV are used to predict LoS between the AGV and the base station. The sensor data is analyzed
to define the areas of LoS and nLoS. Then, using the wireless measurement data, several ML
algorithms, namely Decision Tree, Random Forest, and XGB, are used to predict LoS. The
performance of each ML algorithm is compared on two data sets. The first data set includes
the features RSRP, RSRQ, RSSI, SINR, and distance, while the second is constructed without
the feature distance.
The analysis shows that the data set with the feature distance gives higher AUC scores than
the data set without it, and that the feature distance plays a significant role in the
prediction of LoS. However, the main reason distance is important is that, in this work, LoS
only occurs when the AGV is at its closest to the base station. The algorithms use the
distance value as the major decider because the closer the AGV is to the base station, the
higher the probability of LoS. This could cause problems in other environments where LoS
between the AGV and the base station does not occur only when the AGV is at its closest.
The performance of the prediction model of each ML algorithm is ranked using the AUC score.
XGB outperforms the other algorithms, and Random Forest performs slightly better than the
Decision Tree. However, even though the AUC scores of Random Forest and Decision Tree are
lower than that of XGB, each classification algorithm still performs satisfactorily, with AUC
scores above 0.92. Finally, the performance of Random Forest and XGB as the number of nodes
increases is visualized to show how each algorithm arrives at its final model. The results of
this thesis could serve as a guideline for future measurements where LoS predictions are not
as dependent on distance as in this work. For future applications, this thesis could be an
early step towards profiling communication channels using AI. It could expand beyond the
classification of LoS and nLoS by also considering other wireless measurement features.
Bibliography
[Ag217] Ag2gaeh. Deutsch: Ellipse: Parameter. 26th Mar. 2017.
URL: https://commons.wikimedia.org/wiki/File:Ellipse-param.svg (visited on 17/11/2022).
[BD22] Abdoulaye Bah and Muhammed Davud. ‘Analysis of Breast Cancer Classification with
Machine Learning based Algorithms’. In: 2022 2nd International Conference on Computing and
Machine Intelligence (ICMI). July 2022, pp. 1–4. DOI: 10.1109/ICMI55296.2022.9873696.
[Bha] Rahul Bhadani. bagpy: A python class to facilitate the reading of rosbag file based on
semantic datatypes. Version 0.5. URL: https://github.com/jmscslgroup/bagpy (visited on
06/12/2022).
[Bro+20] Philip E. Brown et al. ‘Interactive Testing of Line-of-Sight and Fresnel Zone
Clearance for Planning Microwave Backhaul Links and 5G Networks’. In: Proceedings of the 28th
International Conference on Advances in Geographic Information Systems. SIGSPATIAL ’20. New
York, NY, USA: Association for Computing Machinery, 13th Nov. 2020, pp. 143–146. ISBN:
978-1-4503-8019-5. DOI: 10.1145/3397536.3422332. URL: https://doi.org/10.1145/3397536.3422332
(visited on 17/11/2022).
[Cha+19] Rocky Chakma et al. ‘Navigation and Tracking of AGV in ware house via Wireless
Sensor Network’. In: 2019 IEEE 3rd International Electrical and Energy Conference (CIEEC).
Sept. 2019, pp. 1686–1690. DOI: 10.1109/CIEEC47146.2019.CIEEC-2019589.
[Gar+22] Victor Garrido et al. Deliverable D1.2c: Final description of the collected data.
AI4Mobile, 15th June 2022, p. 43.
[Her+22] Rodrigo Hernangomez et al. AI4Mobile Industrial Wireless Datasets: iV2V and iV2I+.
2022. DOI: 10.21227/04ta-v128. URL: https://dx.doi.org/10.21227/04ta-v128.
[Jcm12] Jcmcclurg. English: distances d1 and d2 were identified close but incorrect
distances. 19th Oct. 2012. URL: https://commons.wikimedia.org/wiki/File:FresnelSVG1.svg
(visited on 17/11/2022).
[JHJ15] Jhihoon Joo, Dong Seog Han and Hong-Jong Jeong. ‘First Fresnel zone analysis in
vehicle-to-vehicle communications’. In: 2015 International Conference on Connected Vehicles
and Expo (ICCVE). ISSN: 2378-1297. Oct. 2015, pp. 196–197. DOI: 10.1109/ICCVE.2015.18.
[Kar+16] Aniket K. Kar et al. ‘Automated guided vehicle navigation with obstacle avoidance
in normal and guided environments’. In: 2016 11th International Conference on Industrial and
Information Systems (ICIIS). Dec. 2016, pp. 77–82. DOI: 10.1109/ICIINFS.2016.8262911.
[Kul+21] Daniel F. Kulzer et al. ‘AI4Mobile: Use Cases and Challenges of AI-based QoS
Prediction for High-Mobility Scenarios’. In: 2021 IEEE 93rd Vehicular Technology Conference
(VTC2021-Spring). Helsinki, Finland: IEEE, Apr. 2021, pp. 1–7. ISBN: 978-1-72818-964-2.
DOI: 10.1109/VTC2021-Spring51267.2021.9449059.
URL: https://ieeexplore.ieee.org/document/9449059/ (visited on 22/11/2022).
[Li+09] Ning Li et al. ‘A new heuristic of the decision tree induction’. In: 2009
International Conference on Machine Learning and Cybernetics. Vol. 3. ISSN: 2160-1348. July
2009, pp. 1659–1664. DOI: 10.1109/ICMLC.2009.5212227.
[Li+16] Yuanjie Li et al. ‘Mobileinsight: extracting and analyzing cellular network
information on smartphones’. In: Proceedings of the 22nd Annual International Conference on
Mobile Computing and Networking. MobiCom ’16. New York, NY, USA: Association for Computing
Machinery, 3rd Oct. 2016, pp. 202–215. ISBN: 978-1-4503-4226-1. DOI: 10.1145/2973750.2973751.
URL: https://doi.org/10.1145/2973750.2973751 (visited on 06/12/2022).
[Ma21] Nigel Ma. ‘NBA Playoff Prediction Using Several Machine Learning Methods’. In: 2021
3rd International Conference on Machine Learning, Big Data and Business Intelligence
(MLBDBI). Dec. 2021, pp. 113–116. DOI: 10.1109/MLBDBI54094.2021.00030.
[Mam+18] Sunakshi Mamgain et al. ‘Car Popularity Prediction: A Machine Learning Approach’.
In: 2018 Fourth International Conference on Computing Communication Control and Automation
(ICCUBEA). Aug. 2018, pp. 1–5. DOI: 10.1109/ICCUBEA.2018.8697832.
[Ped+11] F. Pedregosa et al. ‘Scikit-learn: Machine Learning in Python’. In: Journal of
Machine Learning Research 12 (2011), pp. 2825–2830.
[Pro] Sebastian Proft. Random Forest Models.
URL: https://www.genecascade.org/MutationTaster2021/rf/ (visited on 17/11/2022).
[Ros+13] Manus Ross et al. ‘Using Support Vector Machines to Classify Student Attentiveness
for the Development of Personalized Learning Systems’. In: 2013 12th International Conference
on Machine Learning and Applications. Vol. 1. Dec. 2013, pp. 325–328.
DOI: 10.1109/ICMLA.2013.66.
[Sar+21] Anusmita Sarkar et al. ‘A Novel Detection Approach of Ground Level Ozone using
Machine Learning Classifiers’. In: 2021 Fifth International Conference on I-SMAC (IoT in
Social, Mobile, Analytics and Cloud) (I-SMAC). ISSN: 2768-0673. Nov. 2021, pp. 428–432.
DOI: 10.1109/I-SMAC52330.2021.9640852.
[Tam20] Srikanth Tammina. ‘A Hybrid Learning approach for Sentiment Classification in
Telugu Language’. In: 2020 International Conference on Artificial Intelligence and Signal
Processing (AISP). ISSN: 2640-5768. Jan. 2020, pp. 1–6. DOI: 10.1109/AISP48273.2020.9073109.
[unk] unknown. What is a Random Forest? TIBCO Software.
URL: https://www.tibco.com/reference-center/what-is-a-random-forest (visited on 17/11/2022).
[WCC20] Weilun Wang, Goutam Chakraborty and Basabi Chakraborty. ‘Predicting the Risk of
Chronic Kidney Disease (CKD) Using Machine Learning Algorithm’. In: Applied Sciences 11
(28th Dec. 2020), p. 202. DOI: 10.3390/app11010202.