Millimeter Wave Wireless

Communication: Initial Acquisition,

Data Communication and Relay

Network Investigation

submitted by

M.Eng.

Xiaoshen Song

to Faculty IV - Electrical Engineering and Computer Science

of the Technische Universität Berlin

for the award of the academic degree

Doktor der Ingenieurwissenschaften

- Dr.-Ing. -

approved dissertation

Doctoral committee:

Chair: Prof. Rafael Schaefer

Reviewer: Prof. Giuseppe Caire

Reviewer: Prof. Robert W. Heath Jr.

Reviewer: Prof. Joerg Widmer

Date of the scientific defense: 18 September 2020

Berlin 2020

Abstract

Wireless communication has become an important part of our daily lives. In the past

decades, the phenomenally increasing demand for mobile wireless data services has been

pushing both industry and academia to move to millimeter wave (mmWave) frequencies

(30-300 GHz) for the next generation (5G) mobile communication. The main motivation

for mmWave communication is the unprecedented massive bandwidth (multi-GHz) which

can offer multi-Gbps data rates for each mobile device. However, mmWave signals

experience high path loss, directivity and blockages, which severely limit the network

performance.

To overcome the aforementioned challenges in mmWave communication, large antenna

arrays are used on the transceivers aiming for a large beamforming gain to compensate for the

severe path loss. In addition, hybrid digital-analog (HDA) architectures, with a much smaller

number of radio frequency (RF) chains in comparison with the number of antennas, are

commonly implemented at the transceivers in order to reduce the hardware complexity

and power consumption. All these new features pose a big challenge for signaling and

networking for mmWave wireless systems.

The goal of this thesis is to explicitly account for all these new features at mmWave

frequencies and, on top of this, to provide new state-of-the-art schemes for the different

communication phases. Specifically, this thesis contains four main contributions.

First, we propose an efficient beam alignment (BA) scheme for mmWave OFDM

(orthogonal frequency division multiplexing) systems. In this scheme, we use pseudo-

random multi-finger beam patterns in the downlink to explore the beam-domain channel,

and then construct an estimate of the channel second-order statistics. By using the non-

negative least-squares (NNLS) technique, the resulting under-determined equations can

be efficiently solved. Accordingly, the proposed BA scheme is very robust to channel

time-dynamics and is strongly scalable to multi-user scenarios.

Second, we further explore single-carrier (SC) operation mode at mmWave frequencies

and propose a new BA scheme for mmWave SC systems. In this scheme, the BS

periodically probes the channel via a pre-specified pseudo-random beamforming codebook

and pseudo-random spreading codes. Each UE formulates the BA problem as the

estimation of a sparse non-negative second-order statistic channel vector, which can be

efficiently solved by using the NNLS technique. In addition to the advantage of multi-user

scalability, the proposed scheme operates purely in the time domain and is highly robust to fast

channel variations caused by the large Doppler spread between multipath components.

Third, we define two HDA antenna architectures which can be regarded as two

“extreme” cases, i.e., the fully-connected (FC) architecture and the one-stream-per-

subarray (OSPS) architecture. We propose a joint performance evaluation of the initial

BA, the subsequent data communication as well as the hardware impairments, where we

consider, from a realistic point of view, only the limited channel state information (CSI)

that is obtained from the BA phase. Also, a family of multi-user MIMO (MU-MIMO)

precoding schemes is investigated to adapt well to the hybrid architectures and the

beam information extracted from the BA phase. An interesting observation from this

work is that the two aforementioned architectures achieve similar sum spectral efficiency,

while the OSPS architecture is advantageous with respect to the FC case in terms

of hardware complexity and power efficiency, only at the cost of a slightly longer BA

time-to-acquisition due to its reduced beam angle resolution.

Fourth, we extend our work to relay networking to further increase the communication

range at mmWave frequencies. For a general mmWave half-duplex (HD) relay network

with arbitrary relay connections, we introduce the information-theoretically optimal

schedule in order to first perform a topology simplification, on top of which we propose two

practical beam scheduling schemes, i.e., the deterministic edge coloring (EC) scheduler

and the adaptive backpressure (BP) scheduler. The former is more suitable for static

scenarios while the latter is more favorable for time-varying scenarios. Both proposed

schedulers can effectively stabilize the network within its capacity range, while

achieving much smaller queuing backlogs, much smaller backlog fluctuations, and much

lower packet end-to-end delays in comparison with the reference baseline scheme.

Zusammenfassung (Summary)

Wireless communication has become an important part of our daily lives. The demand for mobile data services has risen massively in recent years. This has prompted both industry and academia to move to frequencies in the millimeter wave range (mmWave, 30-300 GHz) for the next generation (5G) of mobile communication. The main motivation for mmWave communication is the availability of an enormous bandwidth (several GHz), which enables data rates of several Gbps for multiple mobile devices. However, mmWave signals experience high path loss, very specific directional characteristics and blockages, which severely limits the network performance.

To overcome the above challenges for mmWave communication, large antenna arrays are used at the transceivers. These provide a large beamforming gain to compensate for the high path loss. In addition, a hybrid digital-analog (HDA) architecture, with a much smaller number of baseband signal paths than antennas, is commonly employed at the transceivers. This reduces the hardware complexity and the power consumption. All these new features pose great challenges for the developers of the physical layer of mmWave radio systems and of their networks.

The goal of this thesis is to clearly incorporate all the new properties at mmWave frequencies and, beyond that, to provide new state-of-the-art algorithms for different phases of communication. In particular, this thesis contains four main contributions.

First, we present an efficient algorithm for the initial beam alignment (BA) for mmWave orthogonal frequency division multiplexing (OFDM) systems. In this algorithm, we use pseudo-random multi-finger beam patterns in the downlink to probe the mobile radio channel, and then estimate the second-order statistics of the channel. By using the non-negative least-squares (NNLS) technique, the resulting under-determined equations can be solved efficiently. The proposed BA algorithm is very robust to the time dynamics of the channel and scales very well to multi-user scenarios.

Second, we investigate the single-carrier (SC) operation mode at mmWave frequencies. We propose a new BA algorithm for mmWave SC systems. In this scheme, the base station periodically probes the channel via a predetermined codebook of pseudo-random multi-finger beam patterns and pseudo-random spreading codes. Each user device formulates the BA problem as the estimation of a sparse non-negative second-order statistical channel vector, which can be solved efficiently using the NNLS technique. In addition to the advantage of multi-user scalability, the proposed algorithm operates only in the time domain and is therefore extremely robust to fast channel variations caused by a large Doppler shift between path components of the radio channel.

Third, we define two specific structures of the HDA architecture, which can be regarded as two extreme cases: the fully-connected (FC) architecture and the one-stream-per-subarray (OSPS) architecture. We jointly consider the performance of the initial BA, the resulting data communication and the hardware impairments. In doing so, we realistically use only limited channel state information (CSI), which can be obtained directly from the BA phase. We also investigate various multi-user MIMO (MU-MIMO) precoding schemes that are adapted to the proposed HDA architectures and to the channel information extracted from the BA phase. An interesting observation from this work is that the two aforementioned architectures achieve a similar spectral efficiency. While the OSPS architecture is advantageous compared to the FC case in terms of hardware complexity and energy efficiency, the FC architecture shows a slightly better BA performance. This is due to the lower angular resolution of the OSPS structure.

Fourth, we extend our work to relay networks in order to further increase the range at mmWave frequencies. For a general half-duplex mmWave relay network with arbitrary relay connections, we introduce the information-theoretically optimal schedule. This allows us to first carry out a procedure for simplifying the topology. We then propose two practical beam scheduling schemes: the deterministic edge coloring (EC) scheme and the adaptive backpressure (BP) scheme. The former is better suited to static scenarios, while the latter is more favorable for time-varying scenarios. Both proposed schemes can effectively operate the network within its capacity range. Compared to the reference scheme, both have much smaller queues, much smaller fluctuations in queue occupancy and much lower end-to-end delays.

To My Youth and My Beloved Husband.

— Xiaoshen Song

Acknowledgements

First and foremost, I would like to express my deepest gratitude to my PhD supervisor

Prof. Giuseppe Caire. Without him this research would not have been possible. He has

always been there for me, providing unending support and motivation. He tolerates my

shortcomings and helps me to overcome my weaknesses. Besides, he is passionate about

new technologies and exciting ideas. He trained me in building efficient research skills

which led me to finish my PhD research effectively and in time. I consider myself very

lucky to have joined his group.

A very special thanks to Dr. Saeid Haghighatshoar. “Research life is tough, find a

mentor!”. So yes, he is my best mentor. Without him I would never have got started

in my PhD research. From academic writing to technical skill, he unconditionally

passes on his valuable experience to me without any reservation. His critical advice,

wide perspective and methodological precision have substantially shaped my scientific

thinking.

I would also like to thank Mozhgan Bayat and Thomas Kühne. Adapting to a new

country and a new culture is not easy. They have provided a listening ear, encouraged

me and unconditionally helped me in private. They have made me feel that I am not

alone in this country. I will always be grateful to them for their valuable friendship.

Many thanks to the doctoral committee experts Prof. Robert W. Heath Jr., Prof.

Joerg Widmer and Prof. Rafael Schaefer for providing insightful comments on my

research and thesis. I have learned a lot from their publications as well as from the long

discussion with them during the defense.

In addition, I would like to express my gratitude and appreciation to my colleagues.

They made my PhD journey fun and interesting. I will always hold dear the days

and nights spent in the office.

I also thank my parents and my sisters for their inspiration and unequivocal support

during my PhD journey. The journey from China to TU Berlin would not have been

possible without their support.

Last but not least, I owe thanks to my beloved husband Youjiang. I find it difficult

to express my appreciation because it is so boundless. He has seen me through the ups

and downs of the entire PhD journey. He is my most enthusiastic cheerleader; he is my

best friend; and he is an amazing husband. Without his sunny optimism, I would be a

much grumpier person; without his love and support, I would be lost. Fate brought us

together in college. The nine years since we met and got to know each other remain

happy and romantic in my memory. I have always been determined to spend my

future with him, pursuing our dreams and facing life’s challenges hand in hand.

I would like to thank all my friends from different countries and cultures for

long-lasting friendships and enjoyable experiences during my stay in Berlin.

List of Publications

Below is a selection of publications that I authored / co-authored during my PhD

candidacy.

Journal Papers:

1. X. Song, S. Haghighatshoar, and G. Caire, “A scalable and statistically robust beam alignment technique for mm-wave systems,” IEEE Transactions on Wireless Communications, 2018. (page 20)

2. X. Song, S. Haghighatshoar, and G. Caire, “Efficient beam alignment for mmWave single-carrier systems with hybrid MIMO transceivers,” IEEE Transactions on Wireless Communications, 2019. (page 38)

3. X. Song, T. Kühne, and G. Caire, “Fully-/Partially-Connected Hybrid Beamforming Architectures for mmWave MU-MIMO,” IEEE Transactions on Wireless Communications, 2019. (page 58)

4. X. Song, Y. H. Ezzeldin, G. Caire, and C. Fragouli, “Efficient Beam Scheduling for Half-Duplex mmWave Relay Networks,” IEEE Transactions on Wireless Communications, 2020. (to be submitted) (page 78)

Conference Papers:

1. X. Song, S. Haghighatshoar, and G. Caire, “A robust time-domain beam alignment scheme for multi-user wideband mmwave systems,” in WSA 2018; 22nd International ITG Workshop on Smart Antennas, 2018, pp. 1-7.

2. X. Song, S. Haghighatshoar, and G. Caire, “An Efficient CS-Based and Statistically Robust Beam Alignment Scheme for mmWave Systems,” in 2018 IEEE International Conference on Communications (ICC), 2018, pp. 1-6.

3. X. Song, T. Kühne, and G. Caire, “Fully-Connected vs. Sub-Connected Hybrid Precoding Architectures for mmWave MU-MIMO,” in ICC 2019-2019 IEEE International Conference on Communications (ICC), 2019, pp. 1-7.

4. X. Song and G. Caire, “Queue-Aware Beam Scheduling for Half-Duplex mmWave Relay Networks,” in 2020 IEEE International Symposium on Information Theory (ISIT). (accepted)

5. T. Kühne, X. Song, G. Caire, K. Rasilainen, T. H. Le, M. Rossi, et al., “Performance Simulation of a 5G Hybrid Beamforming Millimeter-Wave System,” in WSA 2020; 24th International ITG Workshop on Smart Antennas, 2020, pp. 1-6.

This is a cumulative thesis. It is based on the above selected

journal papers (three peer-reviewed published papers and one to-be-submitted

journal manuscript), which I wrote as first author. These four journal papers constitute

the four main chapters (Chapter 3 - Chapter 6) in this thesis. At the beginning of

the corresponding chapters, an introductory section with supplementary background

information as well as the clarification of each author’s contributions is provided.

In reference to IEEE copyrighted material which is used with permission in this

thesis, the IEEE does not endorse any of TU Berlin’s products or services. Internal or

personal use of this material is permitted. If interested in reprinting/republishing IEEE

copyrighted material for advertising or promotional purposes or for creating new collective

works for resale or redistribution, please go to

https://www.ieee.org/publications/rights/rights-link.html to learn how to obtain a License from RightsLink.

Table of Contents

Title Page
Abstract
Zusammenfassung
Acknowledgements
List of Publications
1 Introduction
1.1 Background for mmWave communication
1.1.1 Distinctive mmWave characteristics
1.1.2 Potentials and challenges for mmWave communication
1.2 Related works in the state of the art
1.3 Contributions and structure of this thesis
1.4 Notations
2 System Model for mmWave MU-MIMO
2.1 Hybrid mmWave transceiver architectures
2.2 Channel model
2.3 Signaling model
2.4 Summary
3 Initial Beam Alignment for mmWave OFDM Systems
3.1 Introduction
3.2 Clarification of each authors’ contributions
3.3 Original journal article
4 Initial Beam Alignment for mmWave Single-Carrier Systems
4.1 Introduction
4.2 Clarification of each authors’ contributions
4.3 Original journal article
5 Data Communication for mmWave Multi-User MIMO
5.1 Introduction
5.2 Clarification of each authors’ contributions
5.3 Original journal article
6 Beam Scheduling for mmWave Relay Networks
6.1 Introduction
6.2 Clarification of each authors’ contributions
6.3 Original journal article
7 Conclusions
7.1 Summary of this thesis
7.2 Future directions
Appendix A Acronyms and Abbreviations
Bibliography

Introduction

1.1 Background for mmWave communication

Wireless communication has become an integral part of our lives today. As data-hungry

mobile devices and applications become increasingly prevalent, the mobile communication

infrastructure needs to evolve dramatically to support the exploding demand for wireless

data. Ericsson has predicted that the volume of mobile data traffic will increase fivefold

from 2018 to 2024, reaching 136 exabytes (EB) per month (as illustrated in Figure 1.1),

equivalent to a compound annual growth rate of 31% [1]. This ever-growing trend is

expected to continue mainly due to the services that require massive data, such as high

definition (HD) video streaming, online gaming, virtual reality applications and so on [2,

3, 4]. For example, video streaming contributed 60% (∼16 EB/month) of total

mobile traffic in 2018, which is expected to reach 74% (∼100 EB/month) by 2024 [1].

In addition, billions of new devices envisioned to be connected in the future generation

of wireless networks to provide massive connectivity are also expected to contribute to

the increase in data consumption [1].
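
As a quick sanity check on the forecast figures quoted above, the following Python sketch recomputes the compound annual growth rate implied by a fivefold increase over the six years from 2018 to 2024, together with the video traffic volumes. The variable names are illustrative and not taken from [1]; the 2018 total is an assumption inferred here from the fivefold claim rather than a value quoted from the report.

```python
# Figures quoted in the text above (forecast from [1]).
TRAFFIC_2024_EB = 136.0  # total mobile traffic, EB per month, end of 2024
GROWTH_FACTOR = 5.0      # "fivefold" increase from 2018 to 2024
YEARS = 2024 - 2018

# Compound annual growth rate implied by a fivefold increase in six years.
cagr = GROWTH_FACTOR ** (1.0 / YEARS) - 1.0

# Assumption: the 2018 total is inferred from the fivefold claim.
traffic_2018_eb = TRAFFIC_2024_EB / GROWTH_FACTOR

# Video share of the total traffic in both years (60% and 74%).
video_2018_eb = 0.60 * traffic_2018_eb
video_2024_eb = 0.74 * TRAFFIC_2024_EB

print(f"implied CAGR: {cagr:.1%}")
print(f"2018: total {traffic_2018_eb:.1f} EB/month, video {video_2018_eb:.1f} EB/month")
print(f"2024: total {TRAFFIC_2024_EB:.1f} EB/month, video {video_2024_eb:.1f} EB/month")
```

Under these assumptions the implied growth rate is roughly 31% per year, and the video volumes come out at roughly 16 and 100 EB/month, in line with the values quoted from [1].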

The motivation for the evolution from 1G to 2G, 3G, and 4G (the first, second,

third and fourth generation mobile communication, respectively) was to improve a

particular aspect of mobile communication. For example, the 1G-to-2G transition improved

voice service and increased network capacity by using digital communications; 3G and

4G were developed to improve the data rates. However, the evolution of the next

generation mobile communication (5G) features concurrent improvements in many

areas. Specifically, they include high data rates (1-10 Gbps), low latency (less than 1

millisecond), massive connectivity (∼tens of billions of new devices), and better quality

of service [1, 5, 6].

[Figure 1.1: Expected worldwide mobile data demand [1]. Panels: global mobile data traffic (EB per month), 2014-2024, split into 5G and 4G/3G/2G traffic; mobile data traffic per active smartphone (GB per month) by region, 2018 and 2024.]

The international telecommunication union (ITU) has classified the 5G usage

scenarios (i.e., the international mobile telecommunications (IMT) for 2020 and beyond)

into three broad categories [7]: Enhanced mobile broadband (eMBB), ultra-reliable

and low-latency communications (uRLLC), and massive machine type communications

(mMTC). Specifically, eMBB aims to meet the customers’ demand for an increasingly

digital lifestyle, and focuses on services that have high requirements for bandwidth,

such as HD videos, virtual reality (VR), and augmented reality (AR). uRLLC aims to

meet expectations for the demanding digital industry and focuses on latency-sensitive

services, such as assisted and automated driving, and remote management. Finally, mMTC

aims to meet demands for a further developed digital society and focuses on services

that include high requirements for connection density, such as smart city and smart

agriculture. Figure 1.2 illustrates some examples of the envisioned 5G usage scenarios

in IMT 2020 and beyond [8].

According to [7], the IMT 2020 requirements for 5G data rates are: peak data rate of

20 Gbps in downlink, 10 Gbps in uplink, user perceived data rate of 100 Mbps in

downlink and 50 Mbps in uplink, spectral efficiency of 30 bps/Hz in downlink and

15 bps/Hz in uplink. In order to achieve such high data rates, some key technologies

are expected to be integral parts of 5G, e.g., network densification with various

small cells for interference management [9, 10, 11], massive multiple input multiple

output (MIMO) for spatial multiplexing [12, 13], and shifting to higher frequency for

larger bandwidth [14, 15]. In particular, due to the congestion of sub-6GHz bands

used by current cellular networks, migration towards millimeter wave (mmWave)

frequency bands (30-300 GHz) is considered as the most attractive enabler for 5G and

1.1.1 Distinctive mmWave characteristics

Although the available bandwidth of mmWave frequencies is very large, the propagation

characteristics are significantly different from those of the microwave frequency bands

and can be briefly summarized as follows [22, 16]:

• Path loss. From Friis’s law, the isotropic path loss increases with the carrier

frequency. As an example, the free-space path loss grows with the square of the

carrier frequency. Thus, in a point-to-point communication, one may expect

significant path loss when moving from sub-6 GHz to carrier frequencies greater

than 30 GHz [23].

• Diffraction and blockage. Diffraction leads to wave propagation in the geometrical

shadow region behind obstacles. Diffraction may cause non-negligible multipath

propagation under both line of sight (LOS) and non-LOS conditions [24]. From

electromagnetic theory, it is well understood that electromagnetic waves have

difficulty diffracting around obstacles with physical dimensions

significantly larger than the wavelength [25]. Furthermore, signals at microwave

frequencies can penetrate more easily through solid materials and buildings than

mmWaves. For these reasons, mmWave signals are influenced by the effects of

shadowing and diffraction to a much greater extent than microwave signals. For

instance, one can observe more than 35 dB blockage losses due to brick, concrete,

etc., and around 35 dB due to the human body, whereas these losses are negligible at

microwave frequency bands [14].

• Rain attenuation. In general, the losses due to rain attenuation at mmWave

frequency bands are much larger than those at microwave bands. At

a typical mmWave frequency of 73 GHz, one can observe a rain attenuation of

roughly 10 dB/km, which is quite large [26].

• Atmospheric absorption. Field measurement results have shown that mmWave

signals are more susceptible to oxygen absorption than microwave signals.

For instance, one can observe roughly 20 dB loss for mmWave signals around 60 GHz

(see Fig. 1 of [27]).

• Foliage loss. The attenuation of radio signals caused by trees

obstructing the radio link is termed foliage loss. Foliage losses for mmWaves

are significant and can be a limiting factor in some propagation environments.

Empirical results demonstrate that at 10 m foliage penetration, the loss at an 80 GHz

mmWave frequency reaches around 23.5 dB, which is about 15 dB higher than

that at the 3 GHz microwave frequency [28].
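
To make the frequency dependence of the path loss in the list above concrete, the following Python sketch evaluates the standard free-space (Friis) path loss for isotropic antennas, 20 log10(4 pi d f / c) in dB. The 100 m link distance and the function name are illustrative assumptions; the model deliberately ignores blockage, rain, atmospheric absorption and foliage, which add to the loss as described above.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # m/s


def free_space_path_loss_db(distance_m: float, freq_hz: float) -> float:
    """Isotropic free-space path loss in dB according to Friis's law."""
    return 20.0 * math.log10(4.0 * math.pi * distance_m * freq_hz / SPEED_OF_LIGHT)


# Assumed link distance of 100 m, evaluated at one microwave and two mmWave carriers.
for freq_ghz in (3.0, 30.0, 73.0):
    loss_db = free_space_path_loss_db(100.0, freq_ghz * 1e9)
    print(f"{freq_ghz:5.1f} GHz over 100 m: {loss_db:6.1f} dB")
```

Because the loss scales with the square of the carrier frequency, moving from 3 GHz to 30 GHz at a fixed distance adds 20 dB in this idealized model.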

1.1.2 Potentials and challenges for mmWave communication

With the distinctive characteristics of mmWave propagation in mind, we now turn to the

advantages of mmWave communication. As we have mentioned before,

mmWave frequencies allow for larger channel bandwidth allocations, which directly result

in higher data rates and indirectly in reduced network latency. Namely, service

providers will be able to support data-hungry applications with minimal latency.

Also, mmWave communication can be used in small-cell settings with reduced coverage

area, i.e., to establish more densely packed communication links and exploit spatial

reuse to provide increased capacity gains.

In addition to the large bandwidth and capacity gain, the use of massive MIMO

techniques can provide many extra performance gains for mmWave communication:

• Beamforming gain. The small wavelength at mmWave frequencies makes it possible to pack

a large number of antenna elements into a small form factor, which offers high

beamforming gain to compensate for the excessive free space propagation path

loss.

• Interference suppression. In multi-user systems, the use of multiple antennas at

the transmitter and receiver can significantly increase the transmission directivity,

which in turn increases the potential to alleviate intra-channel interference.

This is achieved via precoding at the transmitter or combining at the receiver, or

a combination of both.

• Diversity gain. Spatial diversity can be exploited in mmWave systems with multiple

antennas at both ends to mitigate the impact of fluctuations and loss of signals in

the channel.

• Multiplexing gain. With multiple antennas at the transmitter, parallel streams can

be transmitted to the users without using additional bandwidth or power. This

increases the number of spatial dimensions for communication.
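
The beamforming-gain argument in the list above can be illustrated numerically. The sketch below counts how many half-wavelength-spaced elements fit into a square aperture and reports the idealized array gain, 10 log10(N) dB for N coherently combined elements. The 5 cm aperture, the simple counting model and the function names are assumptions made for illustration only; practical arrays achieve less than this ideal gain.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # m/s


def elements_in_square(side_m: float, freq_hz: float) -> int:
    """Count half-wavelength-spaced elements fitting in a square of the given side."""
    spacing_m = SPEED_OF_LIGHT / freq_hz / 2.0
    per_side = int(side_m / spacing_m)
    return per_side * per_side


def ideal_array_gain_db(num_elements: int) -> float:
    """Idealized gain of coherently combining num_elements antennas."""
    return 10.0 * math.log10(num_elements)


# Assumed aperture: a 5 cm x 5 cm square, roughly handset-sized.
for freq_ghz in (3.0, 30.0, 60.0):
    n = elements_in_square(0.05, freq_ghz * 1e9)
    print(f"{freq_ghz:4.1f} GHz: {n:4d} elements, ideal gain {ideal_array_gain_db(n):5.1f} dB")
```

In this simplified model the same aperture holds a single element at 3 GHz but 100 elements at 30 GHz, which corresponds to an ideal gain of 20 dB and offsets the additional 20 dB of free-space path loss incurred by the tenfold increase in carrier frequency.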

However, despite the great potential associated with mmWave communication, a

number of challenges need to be addressed in order to exploit these benefits:

• Power consumption. In a conventional fully-digital massive MIMO structure, a

radio frequency (RF) chain is dedicated to each antenna element, which would result

in a large number of RF chains at mmWave frequencies and consequently

impose prohibitive power consumption.

• Hybrid transceiver architecture. A common solution to reduce the power

consumption in mmWave systems is to utilize a hybrid transceiver architecture.

Namely, each RF chain is cascaded with a digital baseband unit, which leads to

a much lower number of RF chains in comparison with the number of antennas.

As a consequence, traditional channel estimation / precoding / combining approaches

devised for fully-digital MIMO architectures are not applicable to mmWave hybrid

MIMO architectures. This is because the transceivers lack direct access to each

antenna element owing to the limited number of RF chains. In addition, the

channel matrix is large due to the large arrays employed in mmWave systems, and

the signal to noise ratio (SNR) is fairly low as a result of severe path loss before

beamforming.

• User mobility. A major challenge that comes with user mobility in mmWave

transmissions is the significant fluctuation of the channel coefficients: the large

Doppler spread at mmWave frequencies makes the channel coherence time very small.

Thus, the signaling schemes used in mobile mmWave communication

must take into account the fast time-varying channel states.

• Integrated circuit (IC) design. Additional factors that need to be considered

when designing ICs for mmWave systems with high carrier frequencies and wide

bandwidth include non-linear distortions in the power amplifiers (PAs), phase

noise and IQ (in-phase and quadrature) imbalance, because the severity of these

impairments increases at high transmission frequencies.

• mmWave relaying. With the increasing interest in developing small cells for

mmWave communication, how to use relays to increase coverage and to support

mmWave wireless backhaul for dense small cell deployments remains a big challenge.

The main concerns include how to guarantee efficient beam scheduling,

high data rates, network stability, low latency, etc.
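
The user-mobility challenge listed above can be quantified with a back-of-the-envelope calculation. The sketch below computes the maximum Doppler shift, v f / c, and a rule-of-thumb coherence time taken here as its reciprocal. The 30 m/s speed, the specific rule of thumb (other constants are also used in the literature) and the function names are assumptions for illustration.

```python
SPEED_OF_LIGHT = 299_792_458.0  # m/s


def max_doppler_shift_hz(speed_mps: float, freq_hz: float) -> float:
    """Maximum Doppler shift for a terminal moving at speed_mps."""
    return speed_mps * freq_hz / SPEED_OF_LIGHT


def coherence_time_s(speed_mps: float, freq_hz: float) -> float:
    """Rule-of-thumb coherence time, taken here as 1 / (maximum Doppler shift)."""
    return 1.0 / max_doppler_shift_hz(speed_mps, freq_hz)


speed_mps = 30.0  # assumed vehicular speed (108 km/h)
for freq_ghz in (3.0, 60.0):
    doppler_hz = max_doppler_shift_hz(speed_mps, freq_ghz * 1e9)
    t_coh_ms = coherence_time_s(speed_mps, freq_ghz * 1e9) * 1e3
    print(f"{freq_ghz:4.1f} GHz: Doppler {doppler_hz:7.1f} Hz, coherence time {t_coh_ms:.3f} ms")
```

Since the Doppler shift grows linearly with the carrier frequency, the coherence time at 60 GHz is 20 times shorter than at 3 GHz for the same speed, which is why signaling schemes for mobile mmWave links have to cope with fast channel variations.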

1.2 Related works in the state of the art

Due to the great potential of mmWave communications, multiple international

organizations have emerged for the standardization. In particular, IEEE 802.15.3c

specifies the physical layer and MAC (Media Access Control) layer for indoor wireless

personal area networks (WPANs) in the unlicensed 60 GHz band; a WPAN, also referred to as a

piconet, is composed of several wireless nodes and a single piconet controller [29].

IEEE 802.11ad specifies the physical layer and MAC layer for local area networks (LANs)

in the 60 GHz band to support multi-Gbps wireless applications [30]. In particular, for

the physical layer, two operating modes are defined, the orthogonal frequency division

multiplexing (OFDM) mode for high performance applications (e.g., high data rate), and

the single carrier (SC) mode for low power and low complexity implementation [29, 6].

Additionally, 5G NR (new radio), which is designed to be the global standard for a unified

and more capable 5G wireless air interface, specifies the capability to use mmWave bands

to achieve high data rates, enhanced network energy performance, forward compatibility,

low latency, and a beam-centric design that allows for a massive number of antennas [3]. I

refer the reader to [3, 6, 29, 30] for a more detailed review of the corresponding standards.

In addition to the aforementioned standardization activities, many research efforts

have also been put into mmWave system design. In terms of the different communication

phases in a wireless system, we classify the related literature into four categories, i.e.,

the initial access phase, the data communication phase, the relay networking and finally

the hardware aspect.

An essential component to obtain large antenna gains at mmWave frequencies consists

of identifying suitable narrow beam combinations in the initial access phase. The problem

of finding an AoA-AoD (angle of arrival, angle of departure) pair is referred to as beam

alignment (BA). The inefficiency of naive exhaustive search and the sparse nature of

mmWave channels have motivated a large variety of BA approaches in the literature, e.g.,

the multi-level hierarchical schemes [31, 32, 33, 34, 35], the compressed sensing schemes

[36, 37, 38, 39], etc. All these algorithms suffer from some limitations, e.g.,

poor scalability to multi-user scenarios, the assumption of a channel that stays invariant

over a long time, or a restriction to single-sided training.

A large number of works on hybrid architectures have investigated the data

communication phase for mmWave systems with an assumption of full channel state

information (CSI) [40, 41, 42, 43, 44, 45, 46, 47], namely, the vectors of baseband

complex channel coefficients at each array element are known. These works focus on

the optimization of the hybrid precoder using the full CSI knowledge. Unfortunately,

this assumption is not feasible in a realistic system, since in order to acquire

such coefficients, one should be able to sample each antenna element, i.e., one would

need an RF chain per antenna element or exhaustively measure all elements successively.

The former is prohibitively power consuming and the latter is prohibitively

time consuming.

While relays on sub-6GHz bands suffer from severe interference due to their

omnidirectional transmissions, the directivity of mmWave antennas significantly

mitigates interference, especially in backhaul systems [19, 48]. Recently many efforts

have also been made to study the mmWave relay network regime with an emphasis on

one or several aspects, such as relay selection, congestion control, routing, scheduling and

so on [17, 49, 19, 50, 48, 51]. However, the existing works have

limitations, e.g., a restriction to single-path streaming or the neglect

of source admission control. In particular, a fundamental information-theoretic

understanding of the maximum potential of mmWave relay networks remains largely

unexplored.

For the hardware aspects, a common theme that underlies most of the hybrid

mmWave works is that the fully-connected (FC) architecture outperforms the subarray

architecture only at the cost of a higher hardware complexity. However, many reference

works [52, 41, 40, 46, 43] have ignored hardware impairments [42], such as the power

dissipation, the PA nonlinear distortion and so on [53]. In particular, the nonlinear

PAs employed at the BS can drastically distort the transmit signal when operated close

to saturation [54]. To this end, a certain power backoff from the saturation power of

a PA should be considered accordingly for different signaling schemes and transceiver

architectures, such that the PAs can always work in their linear operating region.

As we can see, although many studies have been dedicated to mmWave communication

in the last decade, there are still many research gaps regarding a practical mmWave

implementation.

1.3 Contributions and structure of this thesis

This thesis is a cumulative dissertation. It is based on four selected journal papers

(three peer-reviewed published papers [55, 56, 57] and one to-be-submitted journal

manuscript [58]), which I wrote as first author. These four journal papers constitute the

four main chapters (Chapter 3 - Chapter 6) of this thesis.

An overview of the thesis structure and the contributions of each chapter is given

below.

•

Chapter 1 is the introduction, which provides the background of mmWave

communication as well as its distinctive characteristics, potential, challenges

and the state of the art.

•

Chapter 2 provides a description of mmWave wireless communication systems as

well as the relevant concepts. The mathematical channel and signaling models for

mmWave multi-user MIMO (MU-MIMO) are also provided in this chapter so as

to prepare the reader for the technical subjects covered in this thesis.

•

Chapter 3 studies the initial beam alignment (BA) problem for OFDM mmWave

systems. This chapter presents an efficient BA scheme, which explores the AoA-

AoD channel domain through pseudo-random multi-finger beam patterns, and

then constructs an estimate of the resulting channel second-order statistics. The

resulting under-determined system of equations is efficiently solved by using the

technique of non-negative least-squares (NNLS). As a result of the quadratic channel

measurements, the proposed scheme is highly robust to variations of the channel

time-dynamics compared with competing approaches in the literature. Also,

since all the estimations take place in the downlink, the proposed approach

scales well to multi-user scenarios.

•

Chapter 4 is a horizontal extension of Chapter 3, which studies the BA problem

for single-carrier (SC) mmWave systems. In this chapter, we propose a new BA

scheme where the base station (BS) periodically probes the channel in the downlink

via a pre-specified pseudo-random beamforming codebook and pseudo-random

spreading codes, letting each user equipment (UE) estimate its strongest path

direction. This scheme again formulates the BA problem as the estimation of a

sparse non-negative vector of second-order channel statistics and then uses the NNLS

technique to efficiently find the strongest AoA-AoD pair connecting each UE to the

BS. The proposed scheme operates entirely in the time domain and is highly robust to

fast channel variations caused by the large Doppler spread between the multipath

components. Furthermore, this chapter will show that after achieving BA, the

beamformed channel is essentially frequency-flat, such that SC communication

needs no equalization in the time domain.

•

Chapter 5 focuses on data communication after BA is achieved. This chapter

presents two typical hybrid digital analog (HDA) mmWave antenna architectures

that can be regarded as two extreme cases, namely, the fully-connected (FC) and

the one-stream-per-subarray (OSPS) architectures. A joint evaluation of the initial

BA and the consequent data communication is considered, where the latter takes

place by using the beam direction information obtained by the former. A family

of MU-MIMO precoding schemes is investigated that adapts well to the hybrid

architectures and the beam information extracted from the BA phase. In addition,

the power efficiency of the two hybrid architectures is evaluated by taking

into account the power dissipation at different hardware components as well as the

power backoff under typical power amplifier constraints. One conclusion from

this chapter is that the two architectures achieve similar sum spectral efficiency,

while the OSPS architecture is advantageous with respect to the FC case in terms

of hardware complexity and power efficiency, at the sole cost of a slightly longer

BA time-to-acquisition due to its reduced beam angle resolution.

•

Chapter 6 studies the relay networking for mmWave wireless systems. Although the

optimal beam directions for each node pair can be obtained through a BA phase,

how to efficiently schedule the beams, in terms of avoiding the queuing explosion

as well as assuring large data rates and small end-to-end delays is the main focus

of this chapter. More precisely, this chapter studies the beam scheduling problem

for mmWave half-duplex (HD) relay networks, where the relay topology can be

arbitrary and a link is active only if both nodes focus their beams to face each

other. The approximate information-theoretic Shannon capacity is introduced to

help understand the maximum potential of the underlying networks. Based on

the theoretically optimal schedule results, a prior network simplification procedure

is implemented to reduce the network topology complexity, on top of which two

practical beam scheduling schemes, i.e., the deterministic edge coloring (EC)

scheduler and the adaptive backpressure (BP) scheduler are presented. The former

requires only a simple one-time computation followed by periodic state repetition, and hence

is more suitable for static scenarios. The latter is an “online” approach that

updates in every time slot, and is thus more favorable for time-varying scenarios.

Both of the proposed schedulers can achieve much smaller queuing backlogs,

much smaller backlog fluctuations, and much lower packet end-to-end delays in

comparison with the reference baseline scheme.

•Chapter 7 finally concludes the thesis and provides suggestions for future work.

1.4 Notations

Vectors, matrices and scalars are denoted by boldface small letters (e.g., $\mathbf{a}$), boldface capital letters (e.g., $\mathbf{A}$) and non-boldface letters (e.g., $a$), respectively. Sets are denoted by calligraphic letters (e.g., $\mathcal{A}$), with the cardinality denoted by $|\mathcal{A}|$. The empty set is denoted by $\emptyset$. $\mathbb{E}[\cdot]$ is for the expectation, $\otimes$ is for the Kronecker product, $\odot$ is for the Hadamard product, and $\circledast$ is for continuous-time convolution. $\mathbf{A}^{\mathsf{T}}$ denotes the transpose, $\mathbf{A}^{*}$ denotes the conjugate, and $\mathbf{A}^{\mathsf{H}}$ denotes the conjugate transpose of a matrix $\mathbf{A}$. The complex circularly symmetric Gaussian distribution with mean $\mu$ and variance $\gamma$ is denoted by $\mathcal{CN}(\mu, \gamma)$. For an integer $K \in \mathbb{Z}_{+}$, the shorthand notation $[K]$ is used to represent the set of positive integers $\{1, \dots, K\}$.

System Model for mmWave

MU-MIMO

5G promises great flexibility to support a myriad of Internet Protocol (IP) devices,

small cell architectures, and dense coverage areas. Applications envisioned for 5G

include the Tactile Internet, vehicle-to-vehicle communication, vehicle-to-infrastructure

communication, as well as peer-to-peer and machine-to-machine communication, all of

which will require extremely low network latency and on-call demand for large bursts of

data over minuscule time epochs. Figure 2.1 shows how backhaul connects the fixed

cellular infrastructure (e.g., BS) to the core telephone network and the Internet [5].

Figure 2.1: An illustration of 5G small cells, edge servers, wireless backhaul, and multi-tier architecture model [5].

As we have discussed before, to address the ever increasing data demand, the wireless

industry for 5G is moving to mmWave frequencies, since for the backhaul/fronthaul,

2.2 Channel model

equals to a half-wavelength $\lambda/2$, the elements of $\mathbf{a}_{\mathrm{T}}(\theta_{k,l})$ and $\mathbf{a}_{\mathrm{R}}(\phi_{k,l})$ are given by

$$[\mathbf{a}_{\mathrm{T}}(\theta)]_{(i'-1)\cdot\hat{M}+d} = e^{j(d-1)\pi\sin(\theta)} \cdot e^{j\Psi(i',\theta)}, \quad d \in [\hat{M}], \tag{2.2a}$$

$$[\mathbf{a}_{\mathrm{R}}(\phi)]_{n} = e^{j(n-1)\pi\sin(\phi)}, \quad n \in [N], \tag{2.2b}$$

where in (2.2a) we assume that ($i' \equiv 1$, $\hat{M} = M$) for the FC architecture as shown in Figure 2.3(a), and ($i' \in [M_{\mathrm{RF}}]$, $\hat{M} = M/M_{\mathrm{RF}}$) for the OSPS architecture as shown in Figure 2.3(b). The additional term $\Psi(i', \theta)$ in (2.2a) takes into account the phase shifts among different subarrays, given by

$$\Psi(i', \theta) = \frac{2\pi}{\lambda}\,(i'-1) \cdot D_x \cdot \sin(\theta), \tag{2.3}$$

where $i'$ indicates the index of the subarrays and $D_x \geq 0$ denotes the subarray center-to-center spacing in the scan direction. Hence, in the special case with $D_x = 0$, all the subarrays are co-located; while with $D_x = \frac{M}{M_{\mathrm{RF}}} \cdot \frac{\lambda}{2}$, the antenna element layout in the scan direction for the OSPS architecture is exactly the same as for the FC architecture.
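The following sketch evaluates the transmit array response of (2.2a) and (2.3) and checks numerically that the OSPS response coincides with the FC response when $D_x = \frac{M}{M_{\mathrm{RF}}} \cdot \frac{\lambda}{2}$. The function name, the argument convention (the spacing is passed as $D_x/\lambda$) and the numeric values are assumptions made for illustration only.

```python
import cmath
import math


def tx_array_response(theta, m_total, m_rf=1, dx_over_lambda=0.0):
    """Array response a_T(theta) following (2.2a) and (2.3).

    m_rf = 1 reproduces the FC case (i' = 1, M_hat = M); m_rf > 1 models OSPS
    with M_hat = M / M_RF elements per subarray. dx_over_lambda is D_x / lambda.
    """
    m_hat = m_total // m_rf
    response = []
    for i in range(1, m_rf + 1):  # subarray index i'
        psi = 2 * math.pi * (i - 1) * dx_over_lambda * math.sin(theta)  # (2.3)
        for d in range(1, m_hat + 1):  # element index d within the subarray
            element_phase = (d - 1) * math.pi * math.sin(theta)
            response.append(cmath.exp(1j * element_phase) * cmath.exp(1j * psi))
    return response


if __name__ == "__main__":
    m, m_rf, theta = 8, 2, 0.3  # hypothetical array size, RF chains, AoD in radians
    fc = tx_array_response(theta, m)
    osps = tx_array_response(theta, m, m_rf, dx_over_lambda=(m // m_rf) / 2)
    print(max(abs(x - y) for x, y in zip(fc, osps)))  # numerically zero
```

With `dx_over_lambda=0.0` every subarray returns the same response, which corresponds to the co-located special case mentioned above.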

We adopt a block fading model, where the coefficient of the $l$-th multipath component $\rho_{s,k,l}$ is constant over a short interval (within one slot) and changes from slot to slot according to a wide-sense stationary process characterized by its power spectral density (Doppler spectrum) [60]. When the channel coherence time (related to the inverse of the bandwidth of the Doppler spectrum, see [60]) is significantly larger than the slot duration but equal to or smaller than the (non-consecutive) slot separation in time, a convenient model is to consider the coefficients as i.i.d. across different slots. Moreover, the Doppler shift $\nu_{k,l}$ as defined in (2.1) introduces a continuous phase rotation for each channel sample. Each multipath component (channel tap coefficient) is formed by the superposition of a large number of micro-scattering components (e.g., due to rough surfaces) having (approximately) the same AoA-AoD and delay. By the central limit theorem, it is customary to model the superposition of these many small effects as Gaussian [61, 62]. Hence, the multipath component coefficients can be modeled as Rice fading given by

$$\rho_{s,k,l} \sim \sqrt{\gamma_{k,l}} \left( \sqrt{\frac{\eta_{k,l}}{1+\eta_{k,l}}} + \frac{1}{\sqrt{1+\eta_{k,l}}}\, \check{\rho}_{s,k,l} \right), \tag{2.4}$$

where $\gamma_{k,l}$ denotes the overall multipath component strength, $\eta_{k,l} \in [0, \infty)$ indicates the strength ratio between the specular reflection (or LOS) and the scattered components, and $\check{\rho}_{s,k,l} \sim \mathcal{CN}(0, 1)$ is a zero-mean unit-variance complex Gaussian random variable whose value changes in an i.i.d. fashion across different slots. In particular, $\eta_{k,l} \to \infty$ indicates a pure LOS path while $\eta_{k,l} = 0$ indicates a pure scattered path, affected by Rayleigh fading.
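As a sanity check of (2.4), the sketch below draws i.i.d. per-slot coefficients and verifies empirically that their average power equals $\gamma_{k,l}$ for any Rice factor $\eta_{k,l}$, since $\frac{\eta}{1+\eta} + \frac{1}{1+\eta} = 1$. The function names, the sample count and the parameter values are illustrative assumptions.

```python
import math
import random


def sample_rice_coefficient(gamma, eta, rng):
    """Draw one slot coefficient rho according to (2.4) for a finite eta >= 0."""
    # rho_check ~ CN(0, 1): real and imaginary parts each have variance 1/2.
    sigma = math.sqrt(0.5)
    rho_check = complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))
    specular = math.sqrt(eta / (1.0 + eta))      # LOS / specular part
    scattered = rho_check / math.sqrt(1.0 + eta)  # diffuse part
    return math.sqrt(gamma) * (specular + scattered)


def average_power(gamma, eta, num_slots=20000, seed=0):
    """Empirical mean of |rho|^2 over i.i.d. slots."""
    rng = random.Random(seed)
    total = sum(abs(sample_rice_coefficient(gamma, eta, rng)) ** 2
                for _ in range(num_slots))
    return total / num_slots


if __name__ == "__main__":
    for eta in (0.0, 3.0, 100.0):  # Rayleigh, mixed, nearly pure LOS
        print(eta, average_power(gamma=2.0, eta=eta))  # each close to 2.0
```

The pure LOS limit $\eta_{k,l} \to \infty$ is approached by a large finite `eta`; passing an infinite value directly would produce an undefined ratio in this sketch.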

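The AoA-AoDs $(\phi_{k,l}, \theta_{k,l})$ in (2.1) can take on arbitrary values in the continuous AoA-AoD domain. Following the widely used approach of [63], known as beam-domain representation, we obtain a finite-dimensional representation of the channel response (2.1). More precisely, we consider the discrete set of AoA-AoDs

$$\Phi := \left\{ \check{\phi} : \frac{1+\sin(\check{\phi})}{2} = \frac{n-1}{N},\ n \in [N] \right\}, \tag{2.5a}$$

$$\Theta := \left\{ \check{\theta} : \frac{1+\sin(\check{\theta})}{2} = \frac{m-1}{M},\ m \in [M] \right\}. \tag{2.5b}$$

It follows that the corresponding sets $\{\mathbf{a}_{\mathrm{R}}(\check{\phi}) : \check{\phi} \in \Phi\}$ and $\{\mathbf{a}_{\mathrm{T}}(\check{\theta}) : \check{\theta} \in \Theta\}$ form discrete dictionaries to represent the channel response. For the ULAs considered in this paper, the two dictionaries, after suitable normalization, reduce to the columns of unitary discrete Fourier transform (DFT) matrices $\mathbf{F}_N \in \mathbb{C}^{N \times N}$ and $\mathbf{F}_M \in \mathbb{C}^{M \times M}$, with elements

$$[\mathbf{F}_N]_{n,n'} = \frac{1}{\sqrt{N}}\, e^{j2\pi(n-1)\left(\frac{n'-1}{N} - \frac{1}{2}\right)}, \quad n, n' \in [N], \tag{2.6a}$$

$$[\mathbf{F}_M]_{m,m'} = \frac{1}{\sqrt{M}}\, e^{j2\pi(m-1)\left(\frac{m'-1}{M} - \frac{1}{2}\right)}, \quad m, m' \in [M]. \tag{2.6b}$$

Consequently, based on a subarray basis indexed by $i'$, the beam-domain representation of the channel response (2.1) is given by [63, 15]

$$\check{\mathbf{H}}^{i'}_{s,k}(t, \tau) = \mathbf{F}_N^{\mathsf{H}}\, \mathbf{H}_{s,k}(t, \tau) \cdot \left( \mathbf{F}_M \odot \mathbf{1}_{\{(i'-1)\hat{M}+1 : i'\hat{M},\ 1:M\}} \right) = \sum_{l=1}^{L_k} \check{\mathbf{H}}^{i'}_{s,k,l}(t)\, \delta(\tau - \tau_l), \tag{2.7}$$

where ($i' \equiv 1$, $\hat{M} = M$) for the FC architecture, and ($i' \in [M_{\mathrm{RF}}]$, $\hat{M} = M/M_{\mathrm{RF}}$) for the OSPS architecture. Here we define $\check{\mathbf{H}}^{i'}_{s,k,l}(t) := \mathbf{F}_N^{\mathsf{H}}\, \mathbf{H}_{s,k,l}(t) \cdot \left( \mathbf{F}_M \odot \mathbf{1}_{\{(i'-1)\hat{M}+1 : i'\hat{M},\ 1:M\}} \right)$ as the beam-domain $l$-th multipath component between the $k$-th UE and the BS, where $\mathbf{1}_{\{a_1:a_2,\ b_1:b_2\}} \in \mathbb{C}^{M \times M}$ is an indicator matrix, with 1 at the components indexed by rows from $a_1$ to $a_2$ and by columns from $b_1$ to $b_2$, otherwise zero. The indicator matrix takes into account the fact that the number of antenna elements for each subarray in the OSPS architecture is $M_{\mathrm{RF}}$ times less than that in the FC architecture.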
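The relation between the angle grid (2.5a), the array response (2.2b) and the DFT dictionary (2.6a) can be checked numerically. The sketch below builds $\mathbf{F}_N$ from (2.6a), compares each column with the normalized array response evaluated at the corresponding grid angle, and verifies that the columns are orthonormal. All function names and the choice $N = 8$ are assumptions made for illustration.

```python
import cmath
import math


def rx_array_response(phi, n):
    """a_R(phi) from (2.2b) for an n-element half-wavelength ULA."""
    return [cmath.exp(1j * k * math.pi * math.sin(phi)) for k in range(n)]


def grid_angles(n):
    """Grid angles of (2.5a): (1 + sin(phi)) / 2 = (n' - 1) / n for n' in [n]."""
    return [math.asin(2.0 * idx / n - 1.0) for idx in range(n)]


def dft_dictionary(n):
    """Matrix F_N of (2.6a); r and c are the zero-based versions of n and n'."""
    return [[cmath.exp(2j * math.pi * r * (c / n - 0.5)) / math.sqrt(n)
             for c in range(n)] for r in range(n)]


def is_unitary(matrix, tol=1e-9):
    """Check that the columns of a square matrix are orthonormal."""
    n = len(matrix)
    for c1 in range(n):
        for c2 in range(n):
            inner = sum(matrix[r][c1].conjugate() * matrix[r][c2] for r in range(n))
            if abs(inner - (1.0 if c1 == c2 else 0.0)) > tol:
                return False
    return True


if __name__ == "__main__":
    n = 8  # hypothetical number of UE antennas
    f_n = dft_dictionary(n)
    worst = max(abs(f_n[r][c] - rx_array_response(phi, n)[r] / math.sqrt(n))
                for c, phi in enumerate(grid_angles(n)) for r in range(n))
    print(worst, is_unitary(f_n))
```

The grid is uniform in $\sin(\check{\phi})$ rather than in the angle itself, which is what makes the dictionary columns mutually orthogonal.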

2.3 Signaling model

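Let $\mathbf{x}_s(t) = [x_{s,1}(t), x_{s,2}(t), \dots, x_{s,K}(t)]^{\mathsf{T}}$ denote the continuous-time baseband equivalent signal (either pilot or data signal), transmitted from the BS over the $s$-th slot. With HDA beamforming, the beamformed signal at the output of the transmitter over the $s$-th slot is generally given by

$$\hat{\mathbf{x}}_s(t) = \sqrt{E_0} \cdot \mathbf{U}^{\mathrm{RF}}_s \cdot \mathbf{W}^{\mathrm{BB}}_s \cdot \mathbf{x}_s(t), \tag{2.8}$$

where for simplicity of exposition we restrict to the case of uniform power allocation, with $E_0$ (proportional to $P_{\mathrm{tot}} T_c$) indicating the per-chip energy of each signal stream, where $P_{\mathrm{tot}}$ denotes the total radiated power at the BS and $T_c = 1/B$ denotes the chip duration with $B$ indicating the signaling bandwidth. In (2.8), we define $\mathbf{W}^{\mathrm{BB}}_s \in \mathbb{C}^{M_{\mathrm{RF}} \times K}$ and $\mathbf{U}^{\mathrm{RF}}_s \in \mathbb{C}^{M \times M_{\mathrm{RF}}}$ as the baseband (digital) and the RF analog beamforming matrices, respectively. Note that, depending on the transmitter architecture, the analog beamforming matrix $\mathbf{U}^{\mathrm{RF}}_s$ takes on the form

$$\mathbf{U}^{\mathrm{RF}}_s = \left[ \tilde{\mathbf{u}}_{s,1}, \tilde{\mathbf{u}}_{s,2}, \cdots, \tilde{\mathbf{u}}_{s,M_{\mathrm{RF}}} \right] \quad \text{and} \quad \mathbf{U}^{\mathrm{RF}}_s = \begin{bmatrix} \tilde{\mathbf{u}}_{s,1} & \mathbf{0} & \cdots & \mathbf{0} \\ \mathbf{0} & \tilde{\mathbf{u}}_{s,2} & \cdots & \mathbf{0} \\ \vdots & \vdots & \ddots & \vdots \\ \mathbf{0} & \mathbf{0} & \cdots & \tilde{\mathbf{u}}_{s,M_{\mathrm{RF}}} \end{bmatrix} \tag{2.9}$$

for the FC and the OSPS architectures, respectively, where $\tilde{\mathbf{u}}_{s,i} \in \mathbb{C}^{\hat{M}}$, $i \in [M_{\mathrm{RF}}]$, with $\hat{M} = M$ for the FC architecture and $\hat{M} = M/M_{\mathrm{RF}}$ for the OSPS architecture. Hence, in both cases $\mathbf{U}^{\mathrm{RF}}_s$ has dimension $M \times M_{\mathrm{RF}}$, but FC has a full matrix, while OSPS has a block-diagonal matrix, due to the constrained connectivity. Without loss of generality, the beamforming vectors are normalized as $\sum_{i=1}^{M_{\mathrm{RF}}} \|\tilde{\mathbf{u}}_{s,i}\|^2 = M_{\mathrm{RF}}$.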
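The structural difference between the two forms in (2.9) can be illustrated with a short sketch that assembles the analog beamforming matrix from per-RF-chain vectors. The helper names and the toy dimensions ($M = 4$, $M_{\mathrm{RF}} = 2$) are hypothetical; the sketch only mirrors the matrix layout and does not model phase-shifter constraints.

```python
import math


def fc_analog_matrix(beam_vectors):
    """FC layout: the M-dimensional vectors are the columns of a full matrix."""
    m = len(beam_vectors[0])
    return [[vec[row] for vec in beam_vectors] for row in range(m)]


def osps_analog_matrix(beam_vectors):
    """OSPS layout: each (M / M_RF)-dimensional vector fills one diagonal block."""
    m_rf = len(beam_vectors)
    m_hat = len(beam_vectors[0])
    matrix = [[0j] * m_rf for _ in range(m_rf * m_hat)]
    for i, vec in enumerate(beam_vectors):
        for d, value in enumerate(vec):
            matrix[i * m_hat + d][i] = value
    return matrix


def squared_norm_sum(beam_vectors):
    """Left-hand side of the normalization: sum over i of ||u_i||^2."""
    return sum(abs(value) ** 2 for vec in beam_vectors for value in vec)


if __name__ == "__main__":
    # Unit-norm toy vectors, so that the squared norms sum to M_RF = 2.
    fc_vectors = [[0.5, 0.5, 0.5, 0.5], [0.5, -0.5, 0.5, -0.5]]
    osps_vectors = [[1 / math.sqrt(2), 1 / math.sqrt(2)],
                    [1 / math.sqrt(2), -1 / math.sqrt(2)]]
    print(fc_analog_matrix(fc_vectors))
    print(osps_analog_matrix(osps_vectors))
```

Both helpers return a list of $M$ rows with $M_{\mathrm{RF}}$ entries each, which matches the common dimension $M \times M_{\mathrm{RF}}$ stated above.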

The beamformed signal (2.8) goes through the channel as defined in (2.1). At the UE side, because of the HDA architecture, the UE does not have direct access to each antenna element. Instead, at each slot $s$, the UE obtains only a projection of the received signal by applying some beamforming vector in the analog domain. For notational simplicity, let us consider a single RF chain at each UE with $N_{\mathrm{RF}} = 1$. The extension to $N_{\mathrm{RF}} > 1$ is straightforward and will be considered in later sections. Thus, the received signal at the $k$-th UE side is given by

$$\begin{aligned} \hat{y}_{s,k}(t) &= \mathbf{v}^{\mathsf{H}}_{s,k}\, \mathbf{H}_{s,k}(t, \tau) \circledast \hat{\mathbf{x}}_s(t) + z_{s,k}(t) \\ &= \sqrt{E_0}\, \mathbf{v}^{\mathsf{H}}_{s,k}\, \mathbf{H}_{s,k}(t, \tau) \circledast \left( \mathbf{U}^{\mathrm{RF}}_s \cdot \mathbf{W}^{\mathrm{BB}}_s \cdot \mathbf{x}_s(t) \right) + z_{s,k}(t), \end{aligned} \tag{2.10}$$

where $\mathbf{v}_{s,k} \in \mathbb{C}^{N}$ denotes the normalized beamforming vector with $\|\mathbf{v}_{s,k}\| = 1$ at the $k$-th UE, and $z_{s,k}(t)$ is the continuous-time complex additive white Gaussian noise (AWGN) at the output of the UE RF chain, with a power spectral density (PSD) of $N_0$ Watt/Hz.

In order to clearly describe the channel condition between the BS and a generic UE, it is useful to first define the channel SNR before beamforming (BBF) $\mathsf{SNR}_{\mathrm{BBF}}$, given by

$$\mathsf{SNR}_{\mathrm{BBF},k} = \frac{P_{\mathrm{tot}} \sum_{l=1}^{L_k} \gamma_{k,l}}{N_0 B}, \tag{2.11}$$

where $k$ is the index of the UE and $\gamma_{k,l}$ denotes the strength of the $l$-th multipath component. The SNR in (2.11) indicates the ratio of the total received signal power (summing over all the multipath components) over the total noise power at the receiver baseband processor input, assuming that the signal is isotropically transmitted by the BS and isotropically received at the $k$-th UE over the total bandwidth $B$. As mentioned before, one of the challenges of mmWave communication is that the SNR before beamforming $\mathsf{SNR}_{\mathrm{BBF}}$ in (2.11) may be very low.
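A direct evaluation of (2.11) is sketched below. The function name and all numeric values (transmit power, path strengths, noise density, bandwidth) are placeholders chosen only to exercise the formula; they are not parameters reported in this thesis.

```python
import math


def snr_bbf_db(p_tot_watt, path_strengths, n0_watt_per_hz, bandwidth_hz):
    """SNR before beamforming of (2.11), returned in dB.

    path_strengths holds the gamma_{k,l} of all multipath components of UE k.
    """
    signal_power = p_tot_watt * sum(path_strengths)
    noise_power = n0_watt_per_hz * bandwidth_hz
    return 10.0 * math.log10(signal_power / noise_power)


if __name__ == "__main__":
    # Hypothetical example: 1 W radiated power, two paths, 1 GHz bandwidth.
    print(snr_bbf_db(p_tot_watt=1.0,
                     path_strengths=[1e-12, 1e-12],
                     n0_watt_per_hz=4e-21,
                     bandwidth_hz=1e9))
```

Because the noise power grows linearly with $B$, widening the bandwidth lowers the SNR before beamforming for a fixed radiated power, which is one reason the beamforming gain is indispensable at mmWave bandwidths.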

2.4 Summary

This chapter presents two “extreme” hybrid mmWave antenna architectures, on top of

which the mathematical channel and signaling models are also provided. The main objective

of this chapter is to prepare the reader for the basic mmWave channel mathematics

covered in this thesis.

Initial Beam Alignment for mmWave

OFDM Systems

3.1 Introduction

To cope with the severe path loss at mmWave frequencies, directional beamforming

both at the BS side and the UE side is necessary in order to establish a strong path

conveying enough signal power. Finding such beamforming directions is referred to as

beam alignment (BA). This chapter presents an efficient BA scheme which can be used

in the initial access phase for mmWave OFDM systems.

3.2 Clarification of each author’s contributions

This chapter is a journal publication, which is joint work with Saeid Haghighatshoar

and Giuseppe Caire. I wrote this journal paper as the first author. The citation information is

given below:

X. Song, S. Haghighatshoar, and G. Caire,“A scalable and statistically robust

beam alignment technique for mm-wave systems,” IEEE Transactions on Wireless

Communications, 2018. DOI: 10.1109/TWC.2018.2831697

All the authors contributed to this paper, but I have implemented all the experiments

and simulations. I also wrote the complete first draft (including all sections) of this

paper.

Saeid Haghighatshoar provided valuable ideas for the channel and signaling model

as well as the mathematical techniques for the channel estimation. He also revised the

English wording of my first draft.

Giuseppe Caire, who is my PhD supervisor, provided valuable discussions in each

meeting on this work. He also made the final revisions to the overall draft.

3.3 Original journal article

The following article is a reprint of the original journal paper. It is the accepted version

of the paper. The copyright information is given on page xii of this thesis as well as on

the first page of the reprinted paper.

A Scalable and Statistically Robust Beam

Alignment Technique for mm-Wave Systems

Xiaoshen Song, Student Member, IEEE, Saeid Haghighatshoar, Member, IEEE, Giuseppe Caire, Fellow, IEEE

for mm-wave systems,” IEEE Transactions on Wireless Communications, 2018. The published version can be found online: https://ieeexplore.ieee.org/abstract/document/8356247. This reprint is the accepted version of the paper.

Abstract—Millimeter-Wave (mm-Wave) frequency bands provide an opportunity for much wider channel bandwidth compared with the traditional sub-6 GHz band. Communication at mm-Waves is, however, quite challenging due to the severe propagation pathloss incurred by conventional isotropic antennas. To cope with this problem, directional beamforming both at the Base Station (BS) side and at the User Equipment (UE) side is necessary in order to establish a strong path conveying enough signal power. Finding such beamforming directions is referred to as Beam Alignment (BA). This paper presents a new scheme for efficient BA. Our scheme finds a strong propagation path identified by an Angle-of-Arrival (AoA) and Angle-of-Departure (AoD) pair, by exploring the AoA-AoD domain through pseudo-random multi-finger beam patterns, and constructing an estimate of the resulting second-order statistics (namely, the average received power for each pseudo-random beam configuration). The resulting under-determined system of equations is efficiently solved using non-negative constrained Least-Squares, yielding naturally a sparse non-negative vector solution whose maximum component identifies the optimal path. As a result, our scheme is highly robust to variations of the channel time-dynamics compared with alternative concurrent approaches based on the estimation of the instantaneous channel coefficients, rather than of their second-order statistics. In the proposed scheme, the BS probes the channel in the Downlink (DL) and trains simultaneously an arbitrarily large number of UEs. Thus, “beam refinement”, with multiple interactive rounds of Downlink/Uplink (DL/UL) transmissions, is not needed. This results in a scalable BA protocol, where the protocol overhead is virtually independent of the number of UEs since all the UEs run the BA procedure at the same time. Extensive simulation results illustrate that our approach is superior to the state-of-the-art BA schemes proposed in the literature in terms of training overhead in multi-user scenarios and robustness to variations in the channel dynamics.

Index Terms—Millimeter-Wave, Beam Alignment, Compressed Sensing, Non-Negative Least-Squares (NNLS).

The authors are with the Electrical Engineering and Computer Science Department, Technische Universität Berlin, 10587 Berlin, Germany. X. Song is sponsored by the China Scholarship Council (201604910530).

I. INTRODUCTION

Communication at millimeter-waves (mm-Waves) provides an opportunity to fulfill the demand for high data rates in the next generation communication networks because of the large available bandwidth [1]. A critical challenge to signaling at mm-Waves compared with sub-6 GHz spectrum is the severe propagation loss when conventional isotropic antennas are used [2]. The standard way to counter the isotropic pathloss consists of using antenna gain at both the transmitter and the receiver sides. In a mobile environment, such antenna gain is achieved by electronically steerable antenna arrays, in order to cope with beam direction changes due to the relative motion of transmitter and receiver. Fortunately, due to the small wavelength, it is possible to package a large number of antenna elements in a small form factor, such that large antenna arrays can be implemented at both the Base Station (BS) side and the User Equipment (UE) side. Moreover, it has been observed experimentally and modeled mathematically that the propagation channel at mm-Waves is formed by a very sparse collection of scatterers in the angle domain [3–6]. This implies that, to establish reliable communication, the BS and the UE need to focus their beams in the direction of a strong path. For example, in the case of Line-of-Sight (LoS) propagation, the beams must point at each other since the LoS path is typically the strongest one.

More generally, we refer to the problem of finding a narrow beam direction at both the BS and the user sides yielding an SNR after beamforming above a desired threshold as the Beam Alignment (BA) problem. This problem is quite well studied in the literature [3–16]. In particular, it is known to be a challenging problem since in mm-Waves the SNR before beamforming (i.e., in isotropic propagation conditions) is typically very low, especially in outdoor non-LoS conditions. Moreover, although the number of array antennas may be very large, the number of Radio Frequency (RF) chains is limited, due to the difficulty of implementing a full RF chain (including A/D conversion, modulation, and PA/LNA amplification) for each array element in a very small form factor and for a very large bandwidth. The small number of RF chains prevents the implementation of classical digital beamforming schemes in the baseband domain. Hence, a widely studied approach consists of Hybrid-Digital-Analog (HDA) beamforming [7, 17]. In this case, a naive sequential scanning of the Angle-of-Departure (AoD) and Angle-of-Arrival (AoA) domains with narrow beams in order to find an alignment to a strongly connected propagation path is very time-consuming and would incur a large initial acquisition protocol overhead, not suited for outdoor mobile applications [11–18].

3. Initial Beam Alignment for mmWave OFDM Systems 21

A. Related State-of-the-Art

The inefficiency of naive alignment search has motivated

BA algorithms based on hierarchical adaptive search, in-

teractive search, and Compressed Sensing (CS) techniques

[8–16].

The fundamental idea of hierarchical methods is to use

wider beam patterns at the start of the search and to refine

them in several consecutive stages. In [11], for example, the

authors develop a bisection algorithm in which the ranges of AoDs and AoAs are each divided by a factor of 2 at each step, and the search is refined by probing the resulting 2×2 sections and identifying the section with the maximum received power.

A similar idea using overlapped beam patterns is used in

[12]. Such hierarchical techniques, however, require the

interaction of the BS with each individual UE, since the

training is bi-directional and involves both Downlink (DL)

probing and Uplink (UL) feedback for each iterative round.

In [13], a method is proposed where the BS and the UE

iteratively and collaboratively identify the dominant eigen-

vector of their channel matrix via the well-known power

method. However, this approach requires demodulating the signal at each antenna at both the BS and the UE sides.

Therefore, this method is essentially incompatible with the

HDA beamforming structure.

More recently, considering the natural channel sparsity

in the AoA-AoD domain [3–6], CS-based algorithms have

been proposed for BA in mm-Waves [14–16, 19–21]. These

algorithms are efficient and particularly attractive for multi-

user scenarios, but they are based on the assumption that

the instantaneous channel remains invariant during the

whole probing/measuring stage (the same assumption is

also adopted in [11, 12]). This assumption is typically not

satisfied in practice due to the large Doppler spread at mm-

Waves, implying significant time-variations of the chan-

nel coefficients even in conditions of moderate mobility

[22, 23].1

B. Contributions

In this paper, we propose a novel BA scheme that has

the following advantages compared with the existing works

in the literature:

1) Low-Complexity Beam Direction Estimation: Our

scheme finds a strong propagation path identified by

an AoA-AoD pair, by exploring the AoA-AoD domain

through pseudo-random multi-finger beam patterns, and

constructing an estimate of the resulting second-order

statistics (namely, the average received power for each

pseudo-random beam configuration). The resulting under-determined system of equations is efficiently solved using Non-Negative Least-Squares (NNLS), yielding naturally a sparse non-negative vector solution whose maximum component identifies the optimal path.

1 Notice that the channel time-variations are greatly reduced after BA is achieved, since once the beams are aligned, the effective channel angular spread is very small [23]. However, before BA is achieved, the channel variability over time can be large, since even a small motion of a few centimeters traverses several wavelengths, potentially producing multiple deep fades [22].

2) System-Level Scalability: In our approach, the BS

actively probes the channel by periodically broadcasting a beamforming codebook (consisting of a sequence of pseudo-random beamforming patterns) over reserved beacon slots

in the DL, while all UEs stay in listening mode. Measure-

ments are collected by the UEs, which locally and in-

dependently identify the AoA-AoD of a strong multipath

component. Since there is no need for interaction between

the BS and each UE, the proposed BA scheme is highly

scalable and its overhead and complexity do not grow with

the number of active users in the system.

3) User-Specific Beamforming Codebook: During the

beacon slots, each UE makes use of its own receive

beamforming codebook. The BS needs no knowledge of

such codebook, which can be locally generated by each UE.

We shall show that the optimal angular spreading factor of

the receiver beamforming patterns yielding the fastest BA

acquisition time depends on the pre-beamforming SNR.

Hence, our method has the advantage that the beamforming codebook of each UE can be individually and locally

tailored, depending on hardware constraints (number of RF

chains) and SNR conditions, without impacting the overall

system functions.

4) Robustness to Variations in Channel Statistics: Our

scheme is based on quadratic measurements (i.e., averaged

received power, yielding estimates of the channel second

order statistics), rather than linear measurements of the

channel coefficient vectors. As such, our scheme is highly

robust to variations in the channel time-dynamics. We

also illustrate via numerical simulations that existing CS-

based algorithms fail to estimate the channel strong path

direction when the channel is significantly time-varying,

i.e., it undergoes several fading cycles during the estimation

period, whereas our scheme performs well for a wide range

of channel dynamics. Using channel second order statistics

for BA is also considered in [24] via the Maximum Like-

lihood (ML) estimation of the channel covariance matrix.

However, in [24] the channel probing signals are transmit-

ted isotropically through a single antenna or via a fixed

beamforming pattern from the BS side. The drawback is

that with isotropic transmission the received SNR at the UE

side might be very low whereas with fixed beamforming

the transmit pattern might not hit any strong multipath

component. Moreover, in [24] the UEs can estimate only

their corresponding AoAs rather than the joint AoA-AoD

pairs of the strong paths. In contrast, our scheme yields the

joint AoA-AoD pairs, allowing full BA at both the BS and

the UE sides.

Notation: We denote vectors by boldface small (e.g., $\mathbf{a}$) and matrices by boldface capital (e.g., $\mathbf{A}$) letters. Scalars are denoted by non-boldface letters (e.g., $a$, $A$). We represent sets by calligraphic letters (e.g., $\mathcal{A}$) and their cardinality by $|\mathcal{A}|$. We denote the empty set by $\emptyset$. We use $\mathbb{E}$ for the expectation, $\otimes$ for the Kronecker product of two matrices,

22 3.3 Original journal article

[Figure 1 labels: BS with $M$ antennas; scatterer clusters; UE1 with $N$ antennas; UE2 with $N$ antennas; pseudo-random codebook (BS → UEs); random codebook and NNLS estimation at each UE.]

Fig. 1: Illustration of the physical channel model and our proposed Beam Alignment (BA) scheme.

$\mathbf{A}^T$ for transpose, $\mathbf{A}^*$ for conjugate, and $\mathbf{A}^H$ for conjugate transpose of a matrix $\mathbf{A}$. The output of an optimization problem such as $\arg\min_{x \in \mathcal{X}} f(x)$ is denoted by $x^*$. The complex circularly symmetric Gaussian distribution with mean $\mu$ and variance $\gamma$ is denoted by $\mathcal{CN}(\mu, \gamma)$. For a positive integer $k$, we use the shorthand notation $[k]$ for the set of integers $\{1, ..., k\}$.

II. BASIC SETUP

A. Channel Model

We consider a mm-Wave system including a BS equipped with a Uniform Linear Array (ULA) with $M$ antennas and $m \ll M$ RF chains. We consider a generic UE, also equipped with a ULA with $N$ antennas and $n \ll N$ RF chains. We assume that both the BS and UE arrays have the antenna spacing $d = \frac{\lambda}{2}$, where $\lambda$ is the wavelength given by $\lambda = \frac{c_0}{f_0}$, where $c_0$ is the speed of light and $f_0$ is the carrier frequency. We denote by $\theta, \phi \in [-\frac{\pi}{2}, \frac{\pi}{2}]$ the steering angles with respect to the BS and UE arrays. We represent the array responses of the BS and UE to a planar wave coming from the angles $\theta$ and $\phi$ by the $M$-dim and $N$-dim array vectors $\mathbf{a}(\theta) \in \mathbb{C}^M$ and $\mathbf{b}(\phi) \in \mathbb{C}^N$, respectively, with elements

$$[\mathbf{a}(\theta)]_k = e^{j(k-1)\pi\sin(\theta)}, \quad k \in [M], \tag{1a}$$

$$[\mathbf{b}(\phi)]_l = e^{j(l-1)\pi\sin(\phi)}, \quad l \in [N]. \tag{1b}$$
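As an illustration of (1a) and (1b), the following minimal sketch evaluates the array response of a half-wavelength ULA using only the Python standard library. The function names `ula_response` and `inner` and the numerical values (30 degrees, 8 antennas) are assumptions made for this example, not part of the original text.

```python
import cmath
import math


def ula_response(angle_rad, num_antennas):
    """Half-wavelength ULA response: entry k (0-based) is exp(j*k*pi*sin(angle))."""
    phase_step = math.pi * math.sin(angle_rad)
    return [cmath.exp(1j * k * phase_step) for k in range(num_antennas)]


def inner(x, y):
    """Hermitian inner product x^H y."""
    return sum(xk.conjugate() * yk for xk, yk in zip(x, y))


# Illustrative values only: an 8-antenna array and a 30-degree steering angle.
a = ula_response(math.radians(30.0), 8)
matched_gain = abs(inner(a, a))  # a^H a equals the number of antennas
```

Every entry has unit modulus, so a beamformer matched to the array response collects a coherent gain equal to the number of antennas, which is the gain that BA tries to unlock.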

We assume that the communication between the BS and the UE occurs via a collection of sparse multi-path components (MPCs) in the AoA-AoD-delay domain [1], where the $N \times M$ low-pass equivalent impulse response of the channel at a symbol time $s$ is given by$^2$

$$\mathbf{H}_s(\tau) = \sum_{l=1}^{L} \rho_{s,l}\, \mathbf{b}(\phi_l)\, \mathbf{a}(\theta_l)^H\, \delta(\tau - \tau_l), \tag{2}$$

2 Consistently with the current technology trend in mm-Wave systems, in this paper we focus on Time-Division Duplexing (TDD), where the UL and the DL communication occur over the same frequency band.

where $\rho_{s,l}$ is the random channel gain of the $l$-th MPC at AoA-AoD-delay $(\theta_l, \phi_l, \tau_l)$, $l \in [L]$. Typically the number of significant MPCs satisfies $L \ll \max\{M, N\}$

[2]. In practice there may be a large number of MPCs

that convey such a small amount of signal power that can

be simply neglected since in any case they will not be

useful for signal transmission even after the BA is achieved.

Note that in the channel model, we made the implicit assumption (very common in most beamforming and array processing literature) that the communication bandwidth $B$ is much smaller than the carrier frequency $f_0$, such that the array responses in (1) are essentially constant with $f \in [f_0 - B/2, f_0 + B/2]$. We adopt a block fading model, where the channel gains $\rho_{s,l}$, $l \in [L]$, remain invariant over the channel coherence time $\Delta t_c$ but change randomly across different coherence times according to a given wide-sense stationary process with given Doppler power spectral density [25]. We also assume that each MPC is formed by a cluster of micro-scatterers corresponding (roughly) to the same delay and AoA-AoD (see Fig. 1), such that the channel gains $\rho_{s,l} \sim \mathcal{CN}(0, \gamma_l)$ have a zero-mean complex Gaussian distribution.

We also assume that the angle coherence time, i.e., the time scale over which the AoA-AoDs of the scatterers $\{(\theta_l, \phi_l)\}_{l=1}^{L}$ change significantly, is much longer than the channel coherence time $\Delta t_c$. Hence, the angles can be

treated as locally constant (but unknown) during the BA

phase. This local stationarity of the scattering geometry

is widely used in the literature and confirmed by channel

sounding measurements (e.g., see [23, 26]).

B. Signaling Model

Consider the communication between the BS and a generic UE. Since the BS has $m$ RF chains, it can transmit up to $m$ different data streams. For a given signaling interval $t_0$, let $x_{s,i}(t)$, $t \in [s t_0, (s+1) t_0)$, be the continuous-time baseband equivalent signal corresponding to the $i$-th data stream. We assume that the channel is time-invariant over each symbol, i.e., $t_0 < \Delta t_c$. To transmit the $i$-th data stream, the BS applies a beamforming vector $\mathbf{u}_{s,i} \in \mathbb{C}^M$. Without loss of generality, the beamforming vectors are normalized such that $\|\mathbf{u}_{s,i}\| = 1$.$^3$ The (baseband equivalent) transmitted signal at symbol time $s$ is given by

$$\mathbf{x}_s(t) = \sum_{i=1}^{m} x_{s,i}(t)\, \mathbf{u}_{s,i}. \tag{3}$$

The corresponding received signal at the UE array is

$$\begin{aligned} \mathbf{r}_s(t) &= \int \mathbf{H}_s(\tau)\, \mathbf{x}_s(t - \tau)\, d\tau \\ &= \sum_{l=1}^{L} \sum_{i=1}^{m} \rho_{s,l}\, x_{s,i}(t - \tau_l)\, \mathbf{b}(\phi_l)\, \mathbf{a}(\theta_l)^H \mathbf{u}_{s,i} \\ &= \sum_{l=1}^{L} \sum_{i=1}^{m} \rho_{s,l}\, g^{\text{BS}}_{s,l,i}\, x_{s,i}(t - \tau_l)\, \mathbf{b}(\phi_l), \end{aligned} \tag{4}$$

where $g^{\text{BS}}_{s,l,i} := \mathbf{a}(\theta_l)^H \mathbf{u}_{s,i}$ denotes the beamforming gain along the $l$-th MPC at the BS side for the $i$-th RF chain.

3 Also, note that here we are assuming that the beamforming vectors $\mathbf{u}_{s,i}$, $i \in [m]$, are implemented in the RF domain via an analog beamforming network and therefore they are frequency flat, i.e., they are constant over the whole signal bandwidth.

As stated before, we assume that the UE is also equipped with $n$ RF chains and the analog RF signal received at the UE antenna array is distributed into these chains for demodulation. This is achieved by signal splitters that divide the signal power by a factor of $n$. The noise in the receiver is mainly introduced by the RF chain electronics (filter, mixer, and A/D conversion). It follows that the noisy received signal at the output of the $j$-th RF chain at the UE side is given by

$$\begin{aligned} y_{s,j}(t) &= \frac{1}{\sqrt{n}}\, \mathbf{v}_{s,j}^H \mathbf{r}_s(t) + z_{s,j}(t) \\ &= \frac{1}{\sqrt{n}} \sum_{l=1}^{L} \sum_{i=1}^{m} \rho_{s,l}\, g^{\text{BS}}_{s,l,i}\, x_{s,i}(t - \tau_l)\, \mathbf{v}_{s,j}^H \mathbf{b}(\phi_l) + z_{s,j}(t) \\ &= \sum_{i=1}^{m} \frac{1}{\sqrt{n}} \sum_{l=1}^{L} \rho_{s,l}\, g^{\text{BS}}_{s,l,i}\, g^{\text{UE}}_{s,l,j}\, x_{s,i}(t - \tau_l) + z_{s,j}(t), \end{aligned} \tag{5}$$

where $\mathbf{v}_{s,j} \in \mathbb{C}^N$ denotes the normalized beamforming vector of the $j$-th RF chain at the UE side, where $g^{\text{UE}}_{s,l,j} := \mathbf{v}_{s,j}^H \mathbf{b}(\phi_l)$ denotes the array gain of the $j$-th RF chain along the $l$-th MPC, and where $z_{s,j}(t)$ is the continuous-time complex Additive White Gaussian Noise (AWGN) at the output of the $j$-th RF chain, with Power Spectral Density (PSD) of $N_0$ Watt/Hz. The factor $1/\sqrt{n}$ in (5) takes into account the power split mentioned above.

In this paper, we consider OFDM signaling with given subcarrier separation $\Delta f$; hence, each symbol $x_{s,i}(t)$ in the general model defined before corresponds here to an OFDM symbol. The number of subcarriers is given by $F := B/\Delta f$, where $B$ denotes the channel bandwidth as defined before. We make the standard assumption that the duration $\tau_{cp}$ of the Cyclic Prefix (CP) of the OFDM modulation is longer than the channel delay spread, implying $t_0 = 1/\Delta f + \tau_{cp}$ with $\tau_{cp} \ge \max\{\tau_l\} - \min\{\tau_l\}$. Hence, after OFDM demodulation, the Inter-Block Interference is completely removed and we can focus on a per-symbol model in the frequency domain [25]. Also, for simplicity, we neglect the effect of pulse-shaping in the OFDM signaling and assume a frequency-flat pulse response. Applying the Fourier transform to the matrix-valued channel impulse response (2), the frequency-domain channel matrix at symbol interval $s$ is given by

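$$\mathbf{H}_s(f) = \sum_{l=1}^{L} \rho_{s,l}\, \mathbf{b}(\phi_l)\, \mathbf{a}(\theta_l)^H\, e^{-j 2\pi f \tau_l}. \tag{6}$$

We denote the OFDM subcarriers as $\{f_\omega = \frac{\omega}{t_0} : \omega \in [F]\}$. The channel matrix at subcarrier $\omega$ is given by $\mathbf{H}_s[\omega] := \mathbf{H}_s(f_\omega)$. Let $\check{x}_{s,i}[\omega]$ denote the frequency-domain data symbol for the $i$-th stream. Applying OFDM demodulation to the received signal (5), we obtain the corresponding frequency-domain received signal at the $j$-th receiver RF chain, with transmit beamforming vector $\mathbf{u}_{s,i}$ and receive beamforming vector $\mathbf{v}_{s,j}$, in the form

$$\begin{aligned} \check{y}_{s,i,j}[\omega] &= \frac{1}{\sqrt{n}}\, \mathbf{v}_{s,j}^H \mathbf{H}_s[\omega]\, \mathbf{u}_{s,i}\, \check{x}_{s,i}[\omega] + \check{z}_{s,j}[\omega] \\ &= \frac{1}{\sqrt{n}} \sum_{l=1}^{L} \rho_{s,l}\, e^{-j 2\pi \frac{\omega}{t_0} \tau_l}\, g^{\text{BS}}_{s,l,i}\, g^{\text{UE}}_{s,l,j}\, \check{x}_{s,i}[\omega] + \check{z}_{s,j}[\omega], \end{aligned} \tag{7}$$

where $\check{z}_{s,j}[\omega] \sim \mathcal{CN}(0, \sigma^2)$ denotes the noise at the $j$-th RF chain of the UE at subcarrier $\omega$, with variance $\sigma^2 = \Delta f N_0$, which we assume is known for each UE [6].

[Figure 2: four panels (a) to (d).]

Fig. 2: Illustration of the sparsity of the channel matrix $\check{\mathbf{H}}_s[\omega]$ at an arbitrary subcarrier $\omega$ consisting of 3 off-grid AoA-AoDs with increasing number of antennas for $M = N = 4$ (a), $M = N = 8$ (b), $M = N = 16$ (c), $M = N = 32$ (d).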
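The noise-free part of (7) can be sketched as follows, assuming the channel is described by a list of hypothetical `(rho, theta, phi, tau)` tuples, one per MPC. The helper names, the single-path channel, and all numerical values are assumptions chosen for illustration; the sketch only evaluates the sum over MPCs for one subcarrier index.

```python
import cmath
import math


def ula_response(angle_rad, num_antennas):
    """Half-wavelength ULA response, as in (1a) and (1b)."""
    phase_step = math.pi * math.sin(angle_rad)
    return [cmath.exp(1j * k * phase_step) for k in range(num_antennas)]


def inner(x, y):
    """Hermitian inner product x^H y."""
    return sum(xk.conjugate() * yk for xk, yk in zip(x, y))


def received_symbol(omega, t0, paths, u, v, n_rf, x_sym):
    """Noise-free part of (7) for subcarrier index omega.

    paths: list of (rho, theta, phi, tau) tuples, one per MPC (hypothetical format).
    """
    num_bs, num_ue = len(u), len(v)
    total = 0j
    for rho, theta, phi, tau in paths:
        g_bs = inner(ula_response(theta, num_bs), u)  # a(theta)^H u
        g_ue = inner(v, ula_response(phi, num_ue))    # v^H b(phi)
        phase = cmath.exp(-2j * math.pi * (omega / t0) * tau)
        total += rho * phase * g_bs * g_ue
    return total * x_sym / math.sqrt(n_rf)


# Illustrative single-path channel with beams matched to the path.
M = N = 4
theta, phi = math.radians(20.0), math.radians(-35.0)
u = [x / math.sqrt(M) for x in ula_response(theta, M)]
v = [x / math.sqrt(N) for x in ula_response(phi, N)]
y = received_symbol(omega=3, t0=1e-6, paths=[(1.0, theta, phi, 50e-9)],
                    u=u, v=v, n_rf=1, x_sym=1.0)
```

With both beams matched to the single path, the magnitude of `y` equals $\sqrt{MN}$, while the delay only contributes a subcarrier-dependent phase rotation.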

C. Beam Alignment

During the DL probing slots (see frame structure discussed in Section III), we assume that the signals corresponding to different transmitted streams $x_{s,i}(t)$ are orthogonal, i.e.,

$$\langle x_{s,i}, x_{s,i'} \rangle := \int_{s t_0}^{(s+1) t_0} x_{s,i}(t)^*\, x_{s,i'}(t)\, dt = E_i\, \delta_{i,i'}, \tag{8}$$

where $E_i$ is the energy per symbol for the $i$-th data stream and $\delta_{i,i'}$ is the Kronecker delta symbol (equal to 1 for $i = i'$ and 0 otherwise). For example, this can be obtained in the frequency domain by using OFDM and mapping the different streams onto sets of non-overlapping subcarriers.

Letting $r_{s,i,j}(t) := \sum_{l=1}^{L} \rho_{s,l}\, g^{\text{BS}}_{s,l,i}\, g^{\text{UE}}_{s,l,j}\, x_{s,i}(t - \tau_l)$ denote the signal contribution relative to the $i$-th transmitted data stream of the BS received at the output of the $j$-th RF chain of the UE (see (5)), defining $B_i$ and $P_i = E_i/t_0$ to be the bandwidth and the average power of $x_{s,i}(t)$, respectively, and recalling that $\gamma_l = \mathbb{E}[|\rho_{s,l}|^2]$, the SNR after beamforming (ABF) for the $i$-th data stream received at the $j$-th RF chain at the UE is given by

$$\text{SNR}^{\text{ABF}}_{i,j} := \frac{\mathbb{E}\left[\int_{s t_0}^{(s+1) t_0} |r_{s,i,j}(t)|^2\, dt\right]}{n N_0 B_i} = \frac{P_i \sum_{l=1}^{L} \gamma_l\, |g^{\text{UE}}_{s,l,j}|^2\, |g^{\text{BS}}_{s,l,i}|^2}{n N_0 B_i}. \tag{9}$$

We define the total transmit power of the BS as $P_{\text{tot}} = \sum_{i=1}^{m} P_i$. In particular, for equal power allocation ($P_i = P_{\text{tot}}/m$) over the streams, we have

$$\text{SNR}^{\text{ABF}}_{i,j} = \frac{P_{\text{tot}} \sum_{l=1}^{L} \gamma_l\, |g^{\text{UE}}_{s,l,j}|^2\, |g^{\text{BS}}_{s,l,i}|^2}{m n N_0 B_i}. \tag{10}$$

For later use, we also define the SNR before beamforming (BBF) by

$$\text{SNR}_{\text{BBF}} := \frac{P_{\text{tot}} \sum_{l=1}^{L} \gamma_l}{N_0 B}. \tag{11}$$

This is the SNR obtained when a single data stream ($m = 1$) is transmitted through a single BS antenna and is received in a single UE antenna (isotropic transmission) over a single RF chain ($n = 1$) with full-band spreading.

A challenge in mm-Wave communication is that the SNR before beamforming $\text{SNR}_{\text{BBF}}$ in (11) is typically very low. It cannot be increased by simply boosting the transmit power $P_{\text{tot}}$, because of hardware and regulation limitations and because, in general, we would like to design energy-efficient systems. Another option consists of communicating over a smaller bandwidth $B' < B$. However, it is well-known that this strategy is suboptimal.$^4$ In fact, assuming a Gaussian channel with SNR equal to $\text{SNR}_{\text{BBF}}$, Shannon's capacity formula yields that the achievable rate in bit/s when communicating over a bandwidth $B'$ is given by $R = B' \log(1 + (B/B')\, \text{SNR}_{\text{BBF}})$, which is increasing for $0 < B' \le B$. Hence, by using a bandwidth smaller than the available channel bandwidth $B$, the achievable rate is reduced. It follows that the only viable alternative to achieve a reasonable SNR consists in using antenna arrays with a large number of antennas both at the BS and at the UE. The goal of BA is to find good beamforming vectors $\mathbf{u}_s$ and $\mathbf{v}_s$ at the BS and the UE, respectively, in order to boost the SNR by a factor $\approx M$ at the BS side and a factor $\approx N$ at the UE side. This is achieved by aligning the beamforming vectors along the AoA-AoD of a strong MPC of the channel.
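The monotonicity of the rate expression above can be checked numerically with a short sketch. The base-2 logarithm (to obtain bit/s), the 1 GHz bandwidth, and the -20 dB pre-beamforming SNR are assumptions made only for this illustration.

```python
import math


def achievable_rate(sub_bw_hz, full_bw_hz, snr_bbf):
    """R = B' log2(1 + (B / B') SNR_BBF) in bit/s, for 0 < B' <= B."""
    if not 0.0 < sub_bw_hz <= full_bw_hz:
        raise ValueError("require 0 < B' <= B")
    return sub_bw_hz * math.log2(1.0 + (full_bw_hz / sub_bw_hz) * snr_bbf)


B = 1e9                     # hypothetical 1 GHz channel bandwidth
snr_bbf = 10 ** (-20 / 10)  # hypothetical -20 dB SNR before beamforming
rates = [achievable_rate(frac * B, B, snr_bbf) for frac in (0.125, 0.25, 0.5, 1.0)]
```

The entries of `rates` grow with the fraction of bandwidth used, which is why shrinking the bandwidth is not a substitute for beamforming gain under this model.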

4 This statement holds only in the case where the channel coefficients change sufficiently slowly in time. More generally, for time-varying wideband fading channels, it has been shown (e.g., see [27–30]) that spreading the transmit power over the entire bandwidth is suboptimal and drives the achievable rate to zero for $B \to \infty$. Intuitively, this is due to the inability of the receiver to estimate the fading channel coefficients, as explained in [27]. The issue of optimal signaling in the presence of time-varying fading is quite intricate and goes beyond the scope of this paper. As a matter of fact, when a large beamforming gain is available at both the BS and the UE side, the effective channel coefficients after beam alignment are slowly varying (see [23]) and the SNR after beamforming is large enough, such that the channel can be treated as a standard block-fading AWGN channel with fully known channel coefficients.

D. Sparse Beamspace Representation

The AoA-AoDs $(\theta_l, \phi_l)$ in (6) take continuous values. In this paper we adopt the approximate finite-dimensional (discrete) beamspace representation following the well-known approach of [1, 3, 31]. We consider the discrete set of AoA-AoDs

$$\Theta := \left\{\check{\theta} : (1 + \sin(\check{\theta}))/2 = \tfrac{k-1}{M},\ k \in [M]\right\}, \tag{12a}$$

$$\Phi := \left\{\check{\phi} : (1 + \sin(\check{\phi}))/2 = \tfrac{k'-1}{N},\ k' \in [N]\right\}, \tag{12b}$$

and use the corresponding array responses $\mathcal{A} := \{\mathbf{a}(\check{\theta}) : \check{\theta} \in \Theta\}$ and $\mathcal{B} := \{\mathbf{b}(\check{\phi}) : \check{\phi} \in \Phi\}$ as a discrete dictionary to represent the channel response. For the ULAs considered in this paper, the dictionaries $\mathcal{A}$ and $\mathcal{B}$, after suitable normalization, yield orthonormal bases corresponding to the columns of the unitary DFT matrices $\mathbf{F}_M$ and $\mathbf{F}_N$ [5], where we define the $D$-dimensional DFT matrix with elements

$$[\mathbf{F}_D]_{k,k'} = \frac{1}{\sqrt{D}}\, e^{j 2\pi (k-1)\left(\frac{k'-1}{D} - \frac{1}{2}\right)}, \quad k, k' \in [D]. \tag{13}$$

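Hence, we obtain the beamspace representation of the channel matrix as $\mathbf{H}_s[\omega] = \mathbf{F}_N \check{\mathbf{H}}_s[\omega] \mathbf{F}_M^H$, where

$$\check{\mathbf{H}}_s[\omega] = \sum_{l=1}^{L} \rho_{s,l}\, e^{-j 2\pi \frac{\omega}{t_0} \tau_l}\, \check{\mathbf{b}}(\phi_l)\, \check{\mathbf{a}}(\theta_l)^H, \tag{14}$$

where $\check{\mathbf{a}}(\theta_l) := \mathbf{F}_M^H \mathbf{a}(\theta_l)$ and $\check{\mathbf{b}}(\phi_l) := \mathbf{F}_N^H \mathbf{b}(\phi_l)$ denote the coefficient vectors of the array responses $\mathbf{a}(\theta_l)$ and $\mathbf{b}(\phi_l)$ with respect to the DFT bases, respectively. The $m'$-th entry of $\check{\mathbf{a}}(\theta_l)$ is given by

$$\begin{aligned} [\check{\mathbf{a}}(\theta_l)]_{m'} &= \frac{1}{\sqrt{M}} \sum_{i=0}^{M-1} e^{-j 2\pi i \left(\frac{m'-1}{M} - \frac{1}{2}\right)}\, e^{j \pi i \sin(\theta_l)} \\ &= \frac{1}{\sqrt{M}}\, \frac{\sin(\pi \psi_l M)}{\sin(\pi \psi_l)}\, e^{-j \pi \psi_l (M-1)}, \end{aligned} \tag{15}$$

where $\psi_l = \frac{m'-1}{M} - \frac{1}{2}\sin(\theta_l) - \frac{1}{2}$. A similar expression holds for $\check{\mathbf{b}}(\phi_l)$. It is seen from (15) that $|[\check{\mathbf{a}}(\theta_l)]_{m'}| = \frac{1}{\sqrt{M}} \frac{|\sin(\pi \psi_l M)|}{|\sin(\pi \psi_l)|}$ is a localized kernel around $\theta_l = \sin^{-1}\left[\frac{2(m'-1)}{M} - 1\right]$ with a resolution of $\frac{1}{M}$. In general, the AoA-AoDs of the MPCs are not aligned with the discrete grid $\mathcal{G} = \Theta \times \Phi$. However, as the number of antennas $M$ at the BS and $N$ at the UE increases, the DFT bases provide good sparsification of the channel matrix $\check{\mathbf{H}}_s[\omega]$. This is qualitatively illustrated in Fig. 2 for a channel with $L = 3$ discrete off-grid MPCs. It is seen that, as $M$ and $N$ increase, the resulting representation $\check{\mathbf{H}}_s[\omega]$ is more and more sparse.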
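The localization described by (13) and (15) can be explored with a small standard-library sketch that projects an array response onto the shifted DFT basis. The function names, the 16-antenna array, and the two test angles are assumptions made for this illustration only.

```python
import cmath
import math


def ula_response(angle_rad, dim):
    """Half-wavelength ULA response, as in (1a)."""
    phase_step = math.pi * math.sin(angle_rad)
    return [cmath.exp(1j * k * phase_step) for k in range(dim)]


def dft_column(col, dim):
    """Column col (0-based, i.e. k' - 1) of the unitary DFT matrix in (13)."""
    shift = col / dim - 0.5
    return [cmath.exp(2j * math.pi * k * shift) / math.sqrt(dim) for k in range(dim)]


def beamspace_coeffs(angle_rad, dim):
    """Coefficients F^H a(angle) of the array response in the DFT basis."""
    a = ula_response(angle_rad, dim)
    return [sum(f.conjugate() * x for f, x in zip(dft_column(col, dim), a))
            for col in range(dim)]


def top_energy_fraction(coeffs, count):
    """Fraction of the total energy held by the count largest coefficients."""
    energies = sorted((abs(c) ** 2 for c in coeffs), reverse=True)
    return sum(energies[:count]) / sum(energies)


on_grid = beamspace_coeffs(math.asin(2 * 4 / 16 - 1), 16)  # grid index k' - 1 = 4
off_grid = beamspace_coeffs(math.radians(17.3), 16)        # arbitrary off-grid angle
```

For the on-grid angle all the energy falls on a single coefficient of magnitude $\sqrt{M}$. For the off-grid angle the energy leaks into neighboring coefficients, and `top_energy_fraction(off_grid, 3)` gives a rough measure of how concentrated the representation remains.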

III. PROPOSED BEAM-ALIGNMENT ALGORITHM

A. High-Level Overview

In the proposed scheme, the channel is periodically

probed by the BS while the UEs remain in the listen-

ing mode. During the listening mode, each UE gathers


measurements of the channel, which is continued until the

UE gathers a sufficient number of measurements such that

the AoA-AoD of a strong MPC can be reliably identified.

After this directional channel estimation is done, the UE

tries to announce its identity (user ID) and its beam ID

(i.e., the index of the discrete AoD corresponding to the

estimated strong MPC) to the BS by sending a control

packet. This control packet is sent over the Random Access Control CHannel (RACCH), i.e., a dedicated slot in the frame used for random access, as in virtually all cellular standards in use today. During the RACCH, the BS

stays in listening mode. If the control packet is successfully

decoded, the BS responds with a beamformed ACK using

the AoD information extracted from the control packet,

over a DL data slot. During the data slots, the UE stays

in listening mode using its own estimated beam. It follows

that the ACK enjoys the full (two-sided) beamforming gain.

At this point, BA is achieved and high-SNR communication

can take place over the data slots. An overview of the

proposed initial acquisition and BA protocol is illustrated

in Fig. 3.

Fig. 4 illustrates the proposed frame structure, consisting

of three parts: the DL beacon slot, the RACCH slot,

and Data Transmission slot. During the DL beacon slots

(corresponding to Fig. 3 #1), the BS probing signal is

formed by a sequence of pseudo-random beam patterns

(referred to as the transmit beamforming codebook), repeated periodically and known a priori to all UEs. Each

UE makes measurements of the beacon transmission by

applying its own (individual) sequence of receive beam

patterns (referred to as the receive beamforming codebook).

The number of measurements may differ from user to user,

depending on the individual pre-beamforming SNR and on

the number of receiver RF chains. We will show in the

simulation section that, when a UE is close to the BS,

i.e., its received signal power (SNR) is sufficiently high, it can use wider beams and take fewer measurement rounds in time to speed up the estimation. In contrast, when a UE

is far from the BS, i.e., the received signal power (SNR)

is very low, it applies narrower receive beams to achieve

sufficiently large SNR and takes more rounds in time to

collect sufficient number of measurements. In general, a

UE might not know the SNR of its channel and may

need an adaptive strategy to find a suitable beamwidth

for BA. Nevertheless, since the beacon signal is repeated

periodically, all the users no matter whether they are weak

or strong are able to gather as many measurements as they

need.

[Figure 3 labels: BS; UE; #1 Pilot; Listening Mode; Estimation; #2 Control Packet; Random Access; #3 ACK; #4 ACK; #5 Data; Data transmission.]

Fig. 3: Illustration of the proposed Beam Alignment (BA) process between the BS and a generic UE. The procedures (#2∼#5) are independently done at each UE. All the UEs share the same BS beamforming codebook (#1).

During the RACCH slots (Fig. 3 #2), the BS stays in listening mode and uses its $m$ RF chains to form $m$ coarse beam patterns (sectors), covering the whole BS angle domain, in order to provide some receiver beamforming gain. Notice that the control packet in the RACCH may fail because of incorrect directional channel estimation (i.e., the UE beam points in a wrong direction), because of a collision in the RACCH due to another user, or simply because of statistical fluctuations of the noise, yielding a small but non-zero packet error probability. In all these

cases, the BS will not respond with the ACK packet in the

data field, and the UE will try again, after gathering more

beacon measurements. It should be noticed that packet

losses in the RACCH are handled in various ways in

all cellular standards in operation today, and surely the

RACCH can be dimensioned such that it does not represent

a system bottleneck. Furthermore, collisions in the RACCH

are not a specific problem of our scheme. In fact, they

exist in some form in any scheme for initial acquisition

operating in a multiuser environment. Actually, schemes

based on interactive beam refinement, requiring multiple

control packets and pilot signals to be sent in both UL and

DL, are definitely more prone to such problems than the

proposed scheme. Since the RACCH is not specific to the proposed scheme, in the following we shall assume that, when the UE has correctly estimated its best MPC, the control packet is received without errors. This allows us to compare different systems in a simple and direct manner, and to focus on the important and specific aspects of BA.

B. BS Channel Probing and UE Sensing

Without loss of generality, we focus on the BA procedure for a generic UE and omit the UE index. Consider the channel matrix $\mathbf{H}_s[\omega]$ between the BS and the UE arrays, as defined in Section II-D, and its beamspace representation $\check{\mathbf{H}}_s[\omega]$ at beacon slot $s \in [T]$ and subcarrier $\omega \in [F]$, where $T$ is the effective period of beam training.

For simplicity of exposition, we assume that the beacon slot contains a single OFDM symbol interval.$^5$ At each beacon slot, the BS uses its $m$ RF chains to probe the channel along $m$ beamforming vectors $\mathbf{u}_{s,i}$, $i \in [m]$, by transmitting an OFDM symbol $x_{s,i}(t)$ along each $\mathbf{u}_{s,i}$. We design the beacon OFDM symbols $x_{s,i}(t)$ such that they are mutually orthogonal in the frequency domain and can be separated at the UE side. Thanks to orthogonality, each beacon pilot stream provides the UE with different measurements from the underlying channel. In particular, for each $i \in [m]$ we define a subset $\mathcal{F}_i \subset [F]$ of size $|\mathcal{F}_i| \le F$ such that $\mathcal{F}_i \cap \mathcal{F}_{i'} = \emptyset$ for $i \ne i'$ (see Fig. 4). We choose each subset $\mathcal{F}_i$ to form a “comb” of subcarriers of equal size $|\mathcal{F}_i| = F'$, with sufficient subcarrier separation such that the corresponding channel matrices $\{\mathbf{H}_s[\omega] : \omega \in \mathcal{F}_i\}$ are mutually uncorrelated.

5 The generalization to multiple OFDM symbols per slot is immediate and slots of $S \ge 1$ OFDM symbols shall be used in the numerical results.

[Figure 4 labels: pseudo-random beam sweeping (BS beacon); Random Access Control CHannel (RACCH) slot; data slots; subcarrier combs $\omega \in \mathcal{F}_1$ and $\omega \in \mathcal{F}_2$; large separation.]

Fig. 4: (Top) Frame structure of the proposed BA scheme. Notice that the beacon, RACCH, and data slots are multiplexed in time according to a TDD scheme. The data slot includes both DL and UL subslots. (Bottom) Different beacon signals are orthogonally multiplexed over disjoint sets of subcarriers $\omega \in \mathcal{F}_i$, $i \in [m]$. In the figure's example we have two orthogonal beacon signals on the “blue” and on the “orange” combs of subcarriers.

A main ingredient of our proposed BA scheme is the pseudo-random beamforming codebook transmitted by the BS during the beacon slots, defined as the collection of sets $\mathcal{C}_{\text{BS}} := \{\mathcal{U}_{s,i} : s \in [T], i \in [m]\}$, where $\mathcal{U}_{s,i}$ is the angle-domain support (i.e., the subset of quantized angles in the virtual beamspace representation) defining the directions to which the transmit beam pattern $\mathbf{u}_{s,i}$ sends the signal power. We assume $|\mathcal{U}_{s,i}| = \kappa_u \le M$ for all $(s, i)$. The beamforming vectors are given by $\mathbf{u}_{s,i} = \mathbf{F}_M \check{\mathbf{u}}_{s,i}$, where $\check{\mathbf{u}}_{s,i} = \frac{\mathbf{1}_{\mathcal{U}_{s,i}}}{\sqrt{\kappa_u}}$, and where $\mathbf{1}_{\mathcal{U}_{s,i}}$ denotes a vector with 1 at components in the support set $\mathcal{U}_{s,i}$ and 0 elsewhere. An example of such patterns with the corresponding vector $\check{\mathbf{u}}_{s,i}$ is shown in Fig. 5 (a). The pseudo-random nature of the codebook is due to the fact that the sequences of angular support sets $\{\mathcal{U}_{s,i} : i \in [m], s \in [T]\}$ are generated in a pseudo-random manner.

The second ingredient of our proposed BA algorithm is a local receive codebook at each UE, through which the UE makes measurements in order to estimate the AoA-AoD information of its strong MPCs. Each UE can customize (locally) its own receive beamforming codebook defined by the collection of sets $\mathcal{C}_{\text{UE}} := \{\mathcal{V}_{s,j} : s \in [T], j \in [n]\}$, where $\mathcal{V}_{s,j}$ is the angle-domain support defining the directions from which the receive beam pattern $\mathbf{v}_{s,j}$ collects signal power. We assume $|\mathcal{V}_{s,j}| = \kappa_v \le N$ for all $(s, j)$. The beamforming vectors are given by $\mathbf{v}_{s,j} = \mathbf{F}_N \check{\mathbf{v}}_{s,j}$, where $\check{\mathbf{v}}_{s,j} = \frac{\mathbf{1}_{\mathcal{V}_{s,j}}}{\sqrt{\kappa_v}}$. Similar to the parameter $\kappa_u$ at the transmitter side, the parameter $\kappa_v$ controls the spread of the sensing window at the UE side. This is illustrated again in Fig. 5 (a).
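A minimal sketch of how such a pseudo-random codebook could be generated is given below. It assumes that each support set is drawn uniformly at random without replacement from a seeded generator, which is one possible reading of "pseudo-random" and not necessarily the construction used by the authors. The function names and parameter values are hypothetical.

```python
import cmath
import math
import random


def random_support(dim, kappa, rng):
    """Pick kappa distinct beamspace indices (0-based) out of dim."""
    return sorted(rng.sample(range(dim), kappa))


def beam_from_support(support, dim):
    """Antenna-domain beamformer F_dim @ (1_support / sqrt(kappa))."""
    scale = 1.0 / math.sqrt(dim * len(support))
    return [scale * sum(cmath.exp(2j * math.pi * k * (col / dim - 0.5))
                        for col in support)
            for k in range(dim)]


def make_codebook(dim, kappa, num_slots, num_rf, seed):
    """Map (slot, RF chain) to a support set; the seed makes it reproducible."""
    rng = random.Random(seed)
    return {(s, i): random_support(dim, kappa, rng)
            for s in range(num_slots) for i in range(num_rf)}


# Illustrative BS codebook: M = 32, kappa_u = 4, T = 10 slots, m = 2 RF chains.
bs_codebook = make_codebook(dim=32, kappa=4, num_slots=10, num_rf=2, seed=1)
u_00 = beam_from_support(bs_codebook[(0, 0)], 32)
```

Because the DFT matrix is unitary, each resulting beamforming vector has unit norm. Sharing the seed is one simple way for the UEs to know the BS codebook in advance, while each UE can call `make_codebook` with its own `kappa` and seed for the receive side.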

During the $s$-th beacon slot, the UE applies the receive beamforming vector $\mathbf{v}_{s,j}$ to its $j$-th RF chain, obtaining the frequency-domain received signal (after OFDM demodulation) given by (7) for $i \in [m]$ and $\omega \in \mathcal{F}_i$. Note that the $m$ probing signals $x_{s,i}(t)$ are orthogonal in the frequency domain and therefore can be perfectly separated at the receiver. It is convenient to write (7) directly in terms of the beamspace representation as

$$\check{y}_{s,i,j}[\omega] = \frac{1}{\sqrt{n}}\, \check{\mathbf{v}}_{s,j}^H\, \check{\mathbf{H}}_s[\omega]\, \check{\mathbf{u}}_{s,i}\, \check{x}_{s,i}[\omega] + \check{z}_{s,j}[\omega]. \tag{16}$$

The BS total transmit power $P_{\text{tot}}$ is allocated equally to all the probing streams $i \in [m]$, all the subcarriers $\omega \in \mathcal{F}_i$, and all the $\kappa_u$ beamspace directions. Hence, the symbols $\{\check{x}_{s,i}[\omega] : \omega \in \mathcal{F}_i\}$ have uniform power distribution with $\mathbb{E}[|\check{x}_{s,i}[\omega]|^2] = \frac{P_{\text{tot}}}{m F'} := P_{\text{dim}}$ (power per transmit signal dimension). In fact, without loss of generality, we choose the frequency-domain probing symbols to be constant and given by $\check{x}_{s,i}[\omega] = \sqrt{P_{\text{dim}}}$.

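Considering the beamforming patterns defined by $\mathcal{C}_{\text{BS}}$ and $\mathcal{C}_{\text{UE}}$, it is not difficult to see that

$$|g^{\text{BS}}_{s,l,i}|^2 = |\mathbf{a}(\theta_l)^H \mathbf{u}_{s,i}|^2 = |\mathbf{a}(\theta_l)^H \mathbf{F}_M \check{\mathbf{u}}_{s,i}|^2 \le \frac{M}{\kappa_u}, \tag{17a}$$

$$|g^{\text{UE}}_{s,l,j}|^2 = |\mathbf{v}_{s,j}^H \mathbf{b}(\phi_l)|^2 = |\mathbf{b}(\phi_l)^H \mathbf{F}_N \check{\mathbf{v}}_{s,j}|^2 \le \frac{N}{\kappa_v}. \tag{17b}$$

Applying the upper bounds (17a) and (17b) in (10), we obtain the maximum possible SNR for channel estimation in the per-subcarrier observation (16), given by

$$\text{SNR}^{\text{CE}}_{\text{ABF}} := \frac{P_{\text{dim}}\, M N \sum_{l=1}^{L} \gamma_l}{n\, \kappa_u \kappa_v\, \sigma^2} = \frac{M N}{\kappa_u \kappa_v m n} \times \frac{B}{F' \Delta f} \times \text{SNR}_{\text{BBF}}, \tag{18}$$

where $F'$ denotes the number of subcarriers adopted for each beacon stream and $\Delta f$ denotes the subcarrier bandwidth.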
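The product form on the right-hand side of (18) is easy to evaluate in decibels, as in the following sketch. The function name and every numerical value are hypothetical and serve only to show how the spreading factors trade exploration against per-measurement SNR.

```python
import math


def snr_ce_abf_db(snr_bbf_db, M, N, kappa_u, kappa_v, m, n,
                  bandwidth_hz, comb_size, delta_f_hz):
    """Right-hand side of (18) in dB: spatial gain x frequency gain x SNR_BBF."""
    spatial_gain = (M * N) / (kappa_u * kappa_v * m * n)
    frequency_gain = bandwidth_hz / (comb_size * delta_f_hz)
    return snr_bbf_db + 10.0 * math.log10(spatial_gain * frequency_gain)


# Hypothetical setting: -20 dB pre-beamforming SNR, 32-antenna arrays, 2 RF chains each.
common = dict(M=32, N=32, m=2, n=2, bandwidth_hz=800e6, comb_size=10, delta_f_hz=1e6)
wide = snr_ce_abf_db(-20.0, kappa_u=8, kappa_v=8, **common)
narrow = snr_ce_abf_db(-20.0, kappa_u=2, kappa_v=2, **common)
```

Reducing both angular spreading factors from 8 to 2 raises the per-measurement SNR by a factor of 16 (about 12 dB), at the cost of probing fewer directions per beacon slot.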

By setting $\kappa_u = \kappa_v = 1$ in (18), we obtain the beamforming gain after BA, namely, the gain obtained by aligning the beams along the strongest scatterer. Moreover, (18) puts in evidence the role of the different factors: the first term expresses the power concentration in the spatial domain, i.e., the ratio of the maximal available beamforming gain $MN$ to the total signal dimensions in the spatial multiplexing domain $\kappa_u \kappa_v m n$ over which the signal is spread; the second term corresponds to the power concentration in the frequency domain; the third term is the SNR before beamforming, defined in (11).

The frequency spreading factor $F'$ and angle spreading factors $\kappa_u$, $\kappa_v$ can be optimized depending on the specific cell topology (e.g., on the size of the cell, which in turn determines the worst-case SNR before beamforming). Clearly, by making $\kappa_u$ (resp., $\kappa_v$) larger, each beam pattern probes (resp., senses) simultaneously more directions, but the total power is spread over all such directions. In contrast, by making $\kappa_u$ (resp., $\kappa_v$) smaller, the beam pattern explores fewer directions but obtains better power concentration in the angle domain. It is also important to notice the effect of $F'$: as we shall see in Section III-C, the AoA-AoD estimator builds some sample-mean statistics by averaging over a sufficiently large number of uncorrelated channel fading realizations over the frequency domain. Hence, a larger $F'$ provides better averaging at the cost of spreading the total power over more subcarriers.$^6$

Remark 1: Angular probing schemes via random,

pseudo-random, or even adaptive codebooks can also be

found in [11, 12, 15, 21]. Our proposed codebook in

this paper can be seen as an improved version of those

schemes where the width of the beam, i.e., κv, can be

individually selected by each UE to achieve an optimal

tradeoff between angular exploration and the SNR obtained

in each measurement. ♦

C. Channel Estimation at the UE Side

The strong MPCs of the channel correspond to the components $(k, k')$ in the matrix $\check{H}_s[\omega]$ with large second moment. An immediate consequence of the channel model definition and the standard assumption of uncorrelated MPCs is that the element second moments $\mathbb{E}[|[\check{H}_s[\omega]]_{k,k'}|^2]$ are invariant both with respect to $s$ (time) and with respect to $\omega$ (frequency) [32]. If we had direct access to measurements of the elements of $\check{H}_s[\omega]$, a naive approach would build estimators for the second moments (sample covariance) and try to identify the largest element. However, this would require a number of RF chains equal to the number of antenna elements. In contrast, we only have access to the projections $\check{v}_{s,j}^{H} \check{H}_s[\omega] \check{u}_{s,i}$ from the observation in (16).

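Using $\check{x}_{s,i}[\omega] = \sqrt{P_{\rm dim}}$ in (16), we can write the received beacon symbol observation at the UE as

\begin{align}
\check{y}_{s,i,j}[\omega] &= \sqrt{\frac{P_{\rm dim}}{n}}\, \check{v}_{s,j}^{H} \check{H}_s[\omega] \check{u}_{s,i} + \check{z}_{s,j}[\omega] \notag\\
&= \sqrt{\frac{P_{\rm dim}}{n}}\, (\check{u}_{s,i}^{T} \otimes \check{v}_{s,j}^{H})\, \check{h}_s[\omega] + \check{z}_{s,j}[\omega] \notag\\
&= \sqrt{\frac{P_{\rm dim}}{n}}\, g_{s,i,j}^{H} \check{h}_s[\omega] + \check{z}_{s,j}[\omega], \tag{19}
\end{align}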

where $\check{h}_s[\omega] = {\rm vec}(\check{H}_s[\omega])$ denotes the vectorized beamspace representation of the channel matrix at subcarrier $\omega \in \mathcal{F}_i$, where we used the well-known identity ${\rm vec}(ABC) = (C^{T} \otimes A)\,{\rm vec}(B)$, and where we define the combined probing and sensing beamforming pattern as $g_{s,i,j} = \check{u}_{s,i}^{*} \otimes \check{v}_{s,j} \in \mathbb{C}^{MN}$, which is common across all the subcarriers $\omega \in \mathcal{F}_i$ but differs for different pairs of BS and UE RF chains $(i, j)$.

⁶ This tradeoff in the choice of the spreading parameters $F'$ and $\kappa_u$, $\kappa_v$ can be seen as an instance of the well-known exploration-exploitation tradeoff in statistics.
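The step from the bilinear form to the inner product in (19) can be checked numerically. The sketch below uses only the Python standard library; the helper names `vec` and `kron`, the small dimensions, and the random test vectors are assumptions made for illustration. It verifies that $\check{v}^{H} \check{H} \check{u} = g^{H}\,{\rm vec}(\check{H})$ with $g = \check{u}^{*} \otimes \check{v}$ when ${\rm vec}(\cdot)$ stacks columns.

```python
import random

def vec(H):
    """Stack the columns of an N x M matrix (given as a list of rows) into one list."""
    return [H[r][c] for c in range(len(H[0])) for r in range(len(H))]

def kron(a, b):
    """Kronecker product of two vectors given as lists."""
    return [x * y for x in a for y in b]

def random_complex(rng):
    return complex(rng.gauss(0, 1), rng.gauss(0, 1))

rng = random.Random(0)
N, M = 4, 3  # small illustrative dimensions (UE x BS)
H = [[random_complex(rng) for _ in range(M)] for _ in range(N)]
u = [random_complex(rng) for _ in range(M)]  # probing vector (BS side)
v = [random_complex(rng) for _ in range(N)]  # sensing vector (UE side)

# Direct evaluation of the projection v^H H u
direct = sum(v[r].conjugate() * H[r][c] * u[c]
             for r in range(N) for c in range(M))

# Evaluation through g = conj(u) kron v and h = vec(H): g^H h
g = kron([x.conjugate() for x in u], v)
h = vec(H)
via_g = sum(gi.conjugate() * hi for gi, hi in zip(g, h))

print(abs(direct - via_g))  # difference at machine-precision level
```

The ordering matters: with column stacking, entry $(k, k')$ of the matrix sits at position $k' N + k$ of the vector, which is why the BS-side vector appears first in the Kronecker product.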

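In practice, each beacon slot is formed by a block of $S \ge 1$ OFDM symbols. With a slight abuse of notation, we index the symbols belonging to the $(s+1)$-th slot as $sS + s'$, for $s' \in [S]$. In order to estimate the average received power at the UE $j$-th RF chain output due to the signal transmitted by the BS $i$-th RF chain in the $s$-th beacon slot, we form the averaged quadratic measurement

\begin{align}
\check{q}_{s,i,j} &= \frac{1}{SF'} \sum_{s' \in [S]} \sum_{\omega \in \mathcal{F}_i} |\check{y}_{sS+s',i,j}[\omega]|^2 \notag\\
&= \frac{P_{\rm dim}}{n}\, g_{s,i,j}^{H} \Big( \frac{1}{SF'} \sum_{s' \in [S]} \sum_{\omega \in \mathcal{F}_i} \check{h}_{sS+s'}[\omega]\, \check{h}_{sS+s'}[\omega]^{H} \Big) g_{s,i,j} \notag\\
&\quad + \frac{1}{SF'} \sum_{s' \in [S]} \sum_{\omega \in \mathcal{F}_i} |\check{z}_{sS+s',j}[\omega]|^2 + \frac{1}{SF'} \sum_{s' \in [S]} \sum_{\omega \in \mathcal{F}_i} \xi_{sS+s',j}[\omega], \tag{20}
\end{align}

where the first and the second terms correspond to the signal contribution and to the noise contribution, respectively, and where

\begin{equation}
\xi_{sS+s',j}[\omega] = 2 \sqrt{\frac{P_{\rm dim}}{n}}\, {\rm Re}\Big\{ g_{s,i,j}^{H}\, \check{h}_{sS+s'}[\omega]\, \check{z}_{sS+s',j}[\omega]^{H} \Big\} \tag{21}
\end{equation}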

denotes the signal-noise cross term. Note that since the AWGN noise ($\check{z}_{sS+s',j}[\omega]$) and the Gaussian channel coefficients ($\check{h}_{sS+s'}[\omega]$) are independent, the cross term has zero mean. Thus, when the number of dimensions $S \times F'$ (over which the instantaneous power $\check{q}_{s,i,j}$ is averaged) is large, it contributes negligibly to (20) and can be treated as a residual term in our formulation. Moreover, the empirical covariance matrix of the channel vector converges as

\begin{equation}
\frac{1}{SF'} \sum_{s' \in [S]} \sum_{\omega \in \mathcal{F}_i} \check{h}_{sS+s'}[\omega]\, \check{h}_{sS+s'}[\omega]^{H} \to \mathbb{E}\big[\check{h}_s[\omega]\, \check{h}_s[\omega]^{H}\big] =: \Sigma_h. \tag{22}
\end{equation}

Similarly, the noise term converges to

\begin{equation}
\frac{1}{SF'} \sum_{s' \in [S]} \sum_{\omega \in \mathcal{F}_i} |\check{z}_{sS+s',j}[\omega]|^2 \to \sigma^2. \tag{23}
\end{equation}

Hence, the received power in (20) gives an approximate 1-dimensional noisy projection of the covariance matrix $\Sigma_h$ with respect to the combined probing and sensing vector $g_{s,i,j}$. We include the signal-noise cross term in (21) and the difference between the empirical and statistical averages in (22) and (23) as a residual error. This results in

\begin{equation}
\check{q}_{s,i,j} = \frac{P_{\rm dim}}{n}\, g_{s,i,j}^{H} \Sigma_h\, g_{s,i,j} + \sigma^2 + \check{w}_{s,i,j}, \tag{24}
\end{equation}

where $\check{w}_{s,i,j}$ is the measurement error term. When all the AoA-AoDs lie on a discrete grid, $\check{h}_s[\omega]$ is a sparse vector with i.i.d. components, with only a few nonzero coefficients corresponding to the scatterers. In practice, for large $M$ and $N$, $\check{h}_s[\omega]$ is almost sparse, with small clusters of non-zero coefficients concentrated around the AoA-AoD pairs of the strong MPCs (as illustrated in Fig. 2). Correspondingly, $\Sigma_h$ is also a very sparse matrix, with strong components localized on the main diagonal.

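Next, we put (24) in the form suitable for the proposed AoA-AoD estimation algorithm. With reference to Fig. 5 (a), recall the beam probing and sensing vectors $\check{u}_{s,i} = \frac{\mathbf{1}_{\mathcal{U}_{s,i}}}{\sqrt{\kappa_u}}$ and $\check{v}_{s,j} = \frac{\mathbf{1}_{\mathcal{V}_{s,j}}}{\sqrt{\kappa_v}}$. With reference to Fig. 5 (b), let $\check{\Gamma}$ denote the $N \times M$ matrix whose $(k, k')$-th element is $\frac{P_{\rm dim}}{n \kappa_u \kappa_v} \mathbb{E}[|[\check{H}_s[\omega]]_{k,k'}|^2]$. Finally, define the $NM \times 1$ binary vectors $b_{s,i,j} := \mathbf{1}_{\mathcal{U}_{s,i}} \otimes \mathbf{1}_{\mathcal{V}_{s,j}}$, with components equal to 0 or 1, where the 1's correspond to the positions in the set $\mathcal{V}_{s,j} \times \mathcal{U}_{s,i}$ of the quantized AoA-AoD pairs probed/sensed by the beamforming vectors $\check{v}_{s,j}$ and $\check{u}_{s,i}$, respectively. With these definitions, after a little algebra, we can rewrite (24) as

\begin{equation}
\check{q}_{s,i,j} = b_{s,i,j}^{T}\, {\rm vec}(\check{\Gamma}) + \sigma^2 + \check{w}_{s,i,j}. \tag{25}
\end{equation}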
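The algebra leading from (24) to (25) can be illustrated with a small numerical check. The sketch below assumes a diagonal $\Sigma_h$ (the on-grid case), hypothetical index subsets, and illustrative values for $P_{\rm dim}$ and $n$; none of these numbers come from the paper. It confirms that the signal term $\frac{P_{\rm dim}}{n} g^{H} \Sigma_h g$ equals $b^{T} {\rm vec}(\check{\Gamma})$.

```python
import math

def indicator(subset, size):
    """Binary indicator vector of `subset` within range(size)."""
    return [1.0 if k in subset else 0.0 for k in range(size)]

def kron(a, b):
    """Kronecker product of two vectors given as lists."""
    return [x * y for x in a for y in b]

M = N = 10
U = {0, 2, 3, 5, 7, 9}   # hypothetical AoD subset (0-based indices)
V = {1, 3, 4, 6, 8}      # hypothetical AoA subset (0-based indices)
kappa_u, kappa_v = len(U), len(V)
P_dim, n = 1.0, 2        # assumed per-dimension power and number of UE RF chains

# second_moment[k][k'] plays the role of E|H[k, k']|^2 (two MPCs, values assumed)
second_moment = [[0.0] * M for _ in range(N)]
second_moment[3][5] = 1.0   # AoA 3 is in V and AoD 5 is in U: this MPC is probed
second_moment[8][1] = 0.4   # AoD 1 is not in U: this MPC is missed by the pattern

# Column-stacked vectorization: entry (k, k') sits at position k' * N + k
vec_second = [second_moment[k][kp] for kp in range(M) for k in range(N)]

# Left-hand side: (P_dim / n) * g^H diag(vec_second) g with g = u_check kron v_check
u_check = [x / math.sqrt(kappa_u) for x in indicator(U, M)]
v_check = [x / math.sqrt(kappa_v) for x in indicator(V, N)]
g = kron(u_check, v_check)  # real-valued here, so conjugation is a no-op
lhs = (P_dim / n) * sum(gi * gi * s for gi, s in zip(g, vec_second))

# Right-hand side: b^T vec(Gamma) with Gamma = P_dim / (n kappa_u kappa_v) * second moments
b = kron(indicator(U, M), indicator(V, N))
vec_gamma = [P_dim / (n * kappa_u * kappa_v) * s for s in vec_second]
rhs = sum(bi * gi for bi, gi in zip(b, vec_gamma))

print(lhs, rhs)
```

Only the MPC whose AoA and AoD both fall inside the probed subsets contributes, which is the sense in which each measurement is a binary projection of ${\rm vec}(\check{\Gamma})$.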

Since the BS transmits along $m$ RF chains in each beacon slot and the UE has $n$ RF chains to sense the channel, the UE obtains $mn$ equations for the unknown vector ${\rm vec}(\check{\Gamma})$ as in (25). Over $T$ beacon slots the UE obtains $mnT$ equations, which can be written in the form

\begin{equation}
\check{q} = B \cdot {\rm vec}(\check{\Gamma}) + \sigma^2 \mathbf{1} + \check{w}, \tag{26}
\end{equation}

where the vector $\check{q} = [\check{q}_{1,1,1}, \ldots, \check{q}_{1,m,n}, \ldots, \check{q}_{T,m,n}]^{T}$ consists of all the $mnT$ measurements calculated as in (20), the $mnT \times MN$ matrix $B = [b_{1,1,1}, \ldots, b_{1,m,n}, \ldots, b_{T,m,n}]^{T}$ is uniquely defined by the beamforming codebooks $\mathcal{C}_{\rm BS}$ and $\mathcal{C}_{\rm UE}$, and $\check{w} \in \mathbb{R}^{mnT}$ is the residual noise and error in the measurements. At this point, some remarks are in order.

Remark 2: An implicit assumption made here is that each UE is frame-synchronous with the BS, i.e., it knows, at each beacon slot $s$, the subsets $\{\mathcal{U}_{s,i} : i = 1, \ldots, m\}$ of beam directions of the BS. It is clear that a lack of frame synchronization between a UE and the BS would lead to a wrong construction of the measurement matrix $B$ in (26). Notice however that, since the beacon patterns are repeated periodically with some period of $T$ frames, this only requires the UE to be aware of the start epoch of the period. This assumption is explicitly or implicitly made in virtually all works dealing with initial beam acquisition (i.e., the BA problem) [8–15], as reviewed in Section I. Therefore, this is not a particularly restrictive assumption specific to our approach. As in most works, we assume that such coarse frame information can be gathered from some external source. In practice, this source may be an overlay cell operating at some standard cellular frequency (e.g., typically in the sub-6 GHz range); alternatively, for a stand-alone small-cell mm-Wave system, the cells can be made frame synchronous. In this way, the UE needs to acquire the frame period only once, when it first joins the system, at the cost of a small additional overhead.⁷ Then, the frame synchronization is maintained while the UE roams from cell to cell. ♦

Remark 3: As already remarked, a fairly large number of existing/concurrent works make use

of pseudo-random beam patterns in order to gather linear

(compressed) measurements of the channel matrix, with the

goal of estimating the channel matrix coefficients. This is

typically obtained by using some CS technique, leveraging

the fact that the propagation channels are sparse in the

angle/delay domain [14–16, 19–21]. It is important to note

that our scheme differs from all these works in one key

fact. Namely, our scheme gathers quadratic compressed

measurements (see (24)) and not linear. While in this way

we lose the ability to estimate the complex channel

coefficients, we can estimate the channel second order

statistics in the beamspace domain, and identify the strong

MPCs in terms of the corresponding average received

signal power. This information is much more stable and

robust to variations in the channel time-dynamics than the

channel coefficients themselves. In fact, it is easy to see

that when the channel coefficients vary significantly over

the measurement time, the system of linear equations in CS

schemes tends to become unidentifiable. For example, in

the limiting case of independent channel coefficients across

the measurement slots, each new measurement depends

on a new set of coefficients, such that the number of

measurements is always less than the number of non-

zero channel coefficients. In these conditions, fundamental

information theoretic bounds show that stable reconstruc-

tion is not possible for any CS algorithm, no matter how

sparse the channel is [33]. In contrast, focusing on the

channel second-order statistics sampled by the quadratic measurements in (24)-(26), we can always gather a number of measurements $mnT$ larger than the number of non-zero coefficients in ${\rm vec}(\check{\Gamma})$, for sufficiently large $T$, such that the strong components in ${\rm vec}(\check{\Gamma})$ can be identified with high probability. ♦

As a general comment, we notice here that it is more

sensible and more robust to first estimate the beamforming

direction (e.g., via the proposed scheme) and then estimate

the beamformed channel in the regime of high SNR, rather

than trying to first estimate the channel coefficients (in low

SNR and at the mercy of the possibly large time-variations)

and then computing the beamforming coefficients.

⁷ For example, this can be obtained by trying sequentially the different cyclic shifts of the beacon sequence until successful alignment.

Fig. 5: (a) Illustration of the subset of AoA-AoDs at time slot $s$ probed by the $i$-th beacon stream transmitted by the BS and received by the $j$-th RF chain of the UE, for $M = N = 10$. The AoD subset is given by $\mathcal{U}_{s,i} = \{1,3,4,6,8,10\}$ (numbered counterclockwise) with beamforming vector $\check{u}_{s,i} = \frac{1}{\sqrt{6}}[1,0,1,0,1,0,1,1,0,1]^{T}$. The AoA subset is given by $\mathcal{V}_{s,j} = \{2,4,5,7,9\}$ (numbered counterclockwise) with receive beamforming vector $\check{v}_{s,j} = \frac{1}{\sqrt{5}}[0,1,0,1,1,0,1,0,1,0]^{T}$. (b) The channel gain matrix $\check{\Gamma}$ (with two strong MPCs indicated by the dark spots) measured along $\mathcal{V}_{s,j} \times \mathcal{U}_{s,i}$.

D. Non-Negative Least-Squares

In order to identify the AoA-AoD directions of the strong scatterers, we estimate the $MN$-dimensional vector ${\rm vec}(\check{\Gamma})$ from the $mnT$-dimensional observation given in (26). Because of the presence of the measurement noise $\check{w}$, a standard approach consists of solving the Least-Squares (LS) problem $\min_{\check{\Gamma}} \|B \cdot {\rm vec}(\check{\Gamma}) + \sigma^2 \mathbf{1} - \check{q}\|^2$. However,

in general $MN$ is significantly larger than $mnT$, such that the system of equations is heavily underdetermined and the LS solution yields meaningless results. The key observation here is that $\check{\Gamma}$ is sparse (by assumption) and non-negative (by construction). Recent results in CS show that when the underlying parameter $\check{\Gamma}$ is non-negative, the simple non-negative constrained LS given by

\begin{equation}
\check{\Gamma}^{*} = \mathop{\rm arg\,min}_{{\rm vec}(\check{\Gamma}) \in \mathbb{R}_{+}^{MN}} \|B \cdot {\rm vec}(\check{\Gamma}) + \sigma^2 \mathbf{1} - \check{q}\|^2, \tag{27}
\end{equation}

is still enough to impose sparsity of the solution $\check{\Gamma}^{*}$
[34, 35], with no need for an explicit sparsity-promoting

regularization term in the objective function as in the

classical LASSO algorithm [36]. The (convex) optimization

problem (27) is generally referred to as Non-Negative

Least-Squares (NNLS), and has been well investigated in

the literature, starting with Donoho et al. in [37]. More

recently, in the context of CS it has been shown that the

non-negativity constraint alone might suffice to recover a

sparse non-negative signal from under-determined linear

measurements both in the noiseless case [38–41] and in

the noisy case [34, 35]. Moreover, [34] demonstrates that

NNLS has a noisy recovery performance comparable to

that of LASSO. In [34] it is also shown that NNLS along

with an appropriate thresholding provides state-of-the-art

performance in terms of support estimation. This property

is very relevant in the context of this paper, where the identification of the support of $\check{\Gamma}$ corresponds to finding the AoA-AoD directions strongly coupled by MPCs.

As discussed in [34], NNLS implicitly performs $\ell_1$-regularization and promotes the sparsity of the resulting solution provided that the measurement matrix satisfies the $\mathcal{M}^{+}$-criterion [42]. This property is beneficial for our proposed BA scheme because of the natural sparsity of the mm-Wave channel in the AoA-AoD domain. Posed in our framework, the measurement matrix $B$ fulfills the $\mathcal{M}^{+}$-criterion if there is a vector $g_0 \in \mathbb{R}_{+}^{mnT}$ such that $B^{T} g_0 > 0$. It is not difficult to see that when $g_0 = \mathbf{1}$ is an all-one vector of dimension $mnT$, the $i$-th component $[B^{T} g_0]_i$ corresponds to the number of measurement patterns that hit the AoA-AoD pair corresponding to $i \in [MN]$. Hence, the necessary condition $B^{T} g_0 > 0$ can be simply interpreted as the fact that the set of $mnT$ measurement patterns should hit all $MN$ AoA-AoD pairs at least once. Also, as stated in [42], NNLS performs better when the condition number $\frac{\max_{i \in [MN]} [B^{T} g_0]_i}{\min_{i \in [MN]} [B^{T} g_0]_i}$ is close to 1, which is met when the measurement patterns (i.e., the rows of $B$) cover the whole set of AoA-AoDs quite uniformly. This also provides a criterion to design good pseudo-random beamforming codebooks for the BA problem.
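The coverage interpretation of the criterion with the all-one vector is easy to evaluate for a given codebook. The following is a minimal sketch with a hypothetical helper `coverage_stats` and a toy 0/1 matrix; it computes the hit counts $B^{T}\mathbf{1}$, checks that every AoA-AoD pair is hit at least once, and reports the max/min ratio discussed above.

```python
def coverage_stats(B):
    """Return (hits, covered, ratio) for a 0/1 measurement matrix B given as a list of rows.

    hits[i] is the number of patterns (rows) that hit column i, i.e. [B^T 1]_i;
    covered is True when every column is hit at least once;
    ratio is max(hits) / min(hits), or infinity when some column is never hit.
    """
    hits = [sum(column) for column in zip(*B)]
    lowest, highest = min(hits), max(hits)
    covered = lowest > 0
    ratio = highest / lowest if covered else float("inf")
    return hits, covered, ratio

# Toy example with 3 patterns over 4 AoA-AoD pairs (values chosen for illustration)
B_toy = [
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [1, 0, 1, 0],
]
hits, covered, ratio = coverage_stats(B_toy)
print(hits, covered, ratio)

# Using only the first pattern leaves two pairs unprobed, so the criterion fails
hits_one, covered_one, ratio_one = coverage_stats(B_toy[:1])
print(hits_one, covered_one, ratio_one)
```

A codebook designer could use the ratio as a quick screening statistic before running any recovery experiment; a value close to 1 indicates near-uniform coverage.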

In terms of numerical implementations, the NNLS can

be posed as an unconstrained LS problem over the positive

orthant and can be solved by several efficient techniques

such as Gradient Projection, Primal-Dual techniques, etc.,

with an affordable computational complexity [43], gener-

ally significantly less than CS algorithms for problems of

the same size and sparsity level. We refer to [44, 45] for

the recent progress on the numerical solution of NNLS and

a discussion on other related work in the literature.
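To make the structure of (27) concrete, the sketch below solves a small synthetic instance with a plain projected-gradient iteration written in standard-library Python. This is not the solver used in the paper (the simulations in Section IV rely on MATLAB's lsqnonneg); the problem sizes, the seed, the noiseless measurements, and the function names are all assumptions for illustration, and recovery of the true support is not guaranteed at such a small scale.

```python
import random

def nnls_projected_gradient(B, q, iters=500):
    """Approximately minimise ||B x - q||^2 subject to x >= 0 by projected gradient descent."""
    rows, cols = len(B), len(B[0])
    # Conservative step: trace(B^T B) upper-bounds the largest eigenvalue of B^T B.
    step = 1.0 / sum(entry * entry for row in B for entry in row)
    x = [0.0] * cols
    for _ in range(iters):
        resid = [sum(B[r][c] * x[c] for c in range(cols)) - q[r] for r in range(rows)]
        grad = [sum(B[r][c] * resid[r] for r in range(rows)) for c in range(cols)]
        # Gradient step followed by projection onto the non-negative orthant
        x = [max(0.0, x[c] - step * grad[c]) for c in range(cols)]
    return x

def random_pattern(M, N, kappa_u, kappa_v, rng):
    """Binary row b = 1_U kron 1_V for random subsets U (AoD) and V (AoA)."""
    U = set(rng.sample(range(M), kappa_u))
    V = set(rng.sample(range(N), kappa_v))
    # Position k' * N + k matches the column-stacked vec of the N x M matrix
    return [1.0 if (kp in U and k in V) else 0.0 for kp in range(M) for k in range(N)]

rng = random.Random(1)
M = N = 6
kappa_u = kappa_v = 3
num_measurements = 20          # fewer equations than the M * N = 36 unknowns
B = [random_pattern(M, N, kappa_u, kappa_v, rng) for _ in range(num_measurements)]

gamma_true = [0.0] * (M * N)
gamma_true[2 * N + 4] = 1.0    # a single strong MPC at AoD index 2, AoA index 4
sigma2 = 0.1                   # assumed known noise power
q = [sum(b * g for b, g in zip(row, gamma_true)) + sigma2 for row in B]

# Move the known noise floor to the right-hand side, as in (27)
estimate = nnls_projected_gradient(B, [qi - sigma2 for qi in q])
strongest = max(range(M * N), key=lambda i: estimate[i])
print("true index:", 2 * N + 4, "estimated index:", strongest)
```

The iterate stays non-negative by construction and the objective cannot increase with this step size; a production implementation would more likely call an active-set solver such as the one behind lsqnonneg.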

IV. PERFORMANCE EVALUATION

In this section we evaluate the performance of our proposed

algorithm via numerical simulations. To run the NNLS

optimization in (27), we use the implementation of NNLS

in MATLAB© called lsqnonneg.m.

Fig. 6: Detection probability $P_D$ of the proposed scheme versus the number of slots $T$ for different pseudo-random BS codebooks (denoted by $\mathcal{C}_{\rm BS}$ #1 to #4), where $M = N = 32$, $F' = 3$, $m = 3$, $n = 2$, $\kappa_u = \kappa_v = 8$, ${\rm SNR}_{\rm BBF} = -33$ dB.

Channel and Signal Model. We assume a carrier frequency of $f_0 = 70$ GHz and a bandwidth of $B = 1$ GHz. The OFDM subcarrier spacing is 480 kHz in compliance with recent 3GPP standard specifications [46, 47]. Assuming $\tau_{\rm cp} \Delta f = 0.07$ (i.e., the CP length is 7% of the OFDM symbol duration), we obtain $t_0 = 2.23$ µs and around $F = 2048$ subcarriers (plus some guard band). We fix the frame duration of our scheme (i.e., the repetition interval of the beacon slot) to 1 ms, consisting of 448 OFDM symbols (per subcarrier). A beacon slot contains $S = 14$ OFDM symbols, the random access slot also contains 14 OFDM symbols, and the remaining 420 symbols are used for data transmission [46]. We assume that the BS has $M = 32$ antennas and $m = 3$ RF chains, and the UE has $N = 32$ antennas and $n = 2$ RF chains. We declare an individual experiment successful if the index of the strongest component in $\check{\Gamma}$ is correctly estimated (i.e., it coincides with the actual strongest MPC AoA-AoD location, up to the discrete angle grid quantization).
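The frame numerology quoted above follows from a few lines of arithmetic. The sketch below reproduces it from the stated subcarrier spacing, CP fraction, bandwidth, and frame duration; the variable names are illustrative, and the raw subcarrier count it prints is the unrounded ratio $B / \Delta f$ rather than the power-of-two value adopted in the text.

```python
import math

delta_f_hz = 480e3        # subcarrier spacing
cp_fraction = 0.07        # CP length relative to the useful OFDM symbol duration
bandwidth_hz = 1e9        # total bandwidth B
frame_s = 1e-3            # frame duration (beacon repetition interval)
beacon_symbols = 14       # S, OFDM symbols in the beacon slot
rach_symbols = 14         # OFDM symbols in the random access slot

# OFDM symbol duration including the cyclic prefix
t0_s = (1 + cp_fraction) / delta_f_hz
# Whole OFDM symbols that fit into one frame
symbols_per_frame = math.floor(frame_s / t0_s)
# Symbols left for data after the beacon and random access slots
data_symbols = symbols_per_frame - beacon_symbols - rach_symbols
# Unrounded number of subcarriers that fit in the bandwidth
raw_subcarriers = bandwidth_hz / delta_f_hz

print(f"t0 = {t0_s * 1e6:.2f} us, symbols per frame = {symbols_per_frame}, "
      f"data symbols = {data_symbols}, B / delta_f = {raw_subcarriers:.0f}")
```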

Dependence on the Random BS Codebook. We generated 4 different beamforming codebooks at the BS side. Each codebook consists of a randomly generated sequence of patterns, identified by binary vectors of dimension $M$ and Hamming weight $\kappa_u$, obtained by independently sampling the set of all $\binom{M}{\kappa_u}$ such equal-weight vectors. For simplicity, we consider a very sparse channel model with only one MPC. Fig. 6 illustrates the detection probability for the different pseudo-random codebooks, where the angle spreading factors at the BS and the user sides are both set to $\kappa_u = \kappa_v = 8$. We repeated each experiment 200 times and plot the resulting detection probability versus the training period length $T$. Notice that different codebooks have quite similar performances. This demonstrates the fact, well known in several CS contexts, that the scheme is quite insensitive to the specific measurement matrix, as long as it is sufficiently randomized.
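The codebook construction described above amounts to drawing fixed-weight binary vectors uniformly at random. A minimal sketch follows; the function name `random_codebook`, the seed, and the choice of $T = 40$ slots are assumptions for illustration, while $M = 32$, $\kappa_u = 8$, and $m = 3$ are the values stated in the text.

```python
import math
import random

def random_codebook(num_patterns, M, kappa, rng):
    """Draw `num_patterns` binary vectors of length M with Hamming weight kappa."""
    codebook = []
    for _ in range(num_patterns):
        # Choose kappa distinct beam directions uniformly at random
        active = set(rng.sample(range(M), kappa))
        codebook.append([1 if k in active else 0 for k in range(M)])
    return codebook

rng = random.Random(0)
M, kappa_u, m, T = 32, 8, 3, 40
# One pattern per BS RF chain and per beacon slot
codebook_bs = random_codebook(m * T, M, kappa_u, rng)
print(len(codebook_bs), "patterns drawn from", math.comb(M, kappa_u), "candidates")
```

Because the number of candidate patterns is so large, independently drawn codebooks almost never coincide, which is consistent with comparing several independently generated codebooks in Fig. 6.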

Fig. 7: Detection probability $P_D$ of the proposed scheme versus the number of slots $T$ for different numbers of paths (scatterers) $L \in \{1, 2, 3\}$, where $M = N = 32$, $F' = 3$, $m = 3$, $n = 2$, $\kappa_u = \kappa_v = 16$, ${\rm SNR}_{\rm BBF} = -33$ dB ($L = 1$), $-32$ dB ($L = 2$), $-31$ dB ($L = 3$).

Fig. 8: Detection probability $P_D$ of the proposed scheme versus the number of slots $T$ for different angle spreading factors $\kappa_u = \kappa_v \in \{4, 8, 16, 25\}$, where $M = N = 32$, $F' = 3$, $m = 3$, $n = 2$, ${\rm SNR}_{\rm BBF} = -33$ dB.

Performance with Different Numbers of Paths (Scatterers) $L$. To illustrate that our proposed scheme works equally well for single-path and multi-path scenarios, we repeat the simulation with different numbers of MPCs $L = 1, 2, 3$, with different strengths $\gamma_1 > \gamma_2 = \gamma_3$. Fig. 7 shows the performance of the proposed scheme, where we declare an individual experiment successful if the strongest path ($\gamma_1$) is correctly identified. It is seen that the scheme performs equally well for a single MPC ($L = 1$) and multiple MPCs ($L = 2, 3$): in both cases, at most $T = 40$ beacon slots are sufficient to ensure a successful BA with high probability.

Dependence on the Angle Spreading Factors $\kappa_u$ and $\kappa_v$. The angle spreading factors $\kappa_u$ and $\kappa_v$ impose a trade-off between the angle coverage of the probing/sensing matrix $B$ (exploration) and its receive SNR at the user side (exploitation). By making $\kappa_u$ (resp., $\kappa_v$) larger, each beam pattern probes simultaneously more directions, but the total power is spread over all such directions. In contrast, by making $\kappa_u$ (resp., $\kappa_v$) smaller, the beam pattern explores fewer directions but obtains better power concentration in the angle domain. This is illustrated in Fig. 8. It is seen that increasing the spreading factor from $\kappa_u = \kappa_v = 4$ to $\kappa_u = \kappa_v = 8$ yields a performance improvement. However, a further increase to $\kappa_u = \kappa_v = 16$ slightly degrades the performance, and the degradation is severe for the even larger value $\kappa_u = \kappa_v = 25$. As already remarked in this paper, the choice of the spreading factor $\kappa_v$ at the UE can be tailored to the individual SNR condition (which may depend, e.g., on the distance between UE and BS). To illustrate this, we repeated the simulation to find the best $\kappa_v$ at the UE side as a function of its channel SNR. This is illustrated in Fig. 9, which reports the best $\kappa_v$ and the best search time $T$ as a function of the UE SNR (assuming a threshold of $P_D \ge 0.95$ and a spreading factor of $\kappa_u = 8$ at the BS side). It is seen that, as expected, when a UE enjoys a larger SNR (e.g., it is close to the BS), it should use a larger $\kappa_v$ in order to better explore the channel, thus reducing the search time $T$. In contrast, when a UE is in low SNR conditions (e.g., it is far from the BS), it should apply a smaller $\kappa_v$ in order to gather measurements with sufficiently large SNR.

Dependence on the Number of Subcarriers $F'$. As explained in Sections III-B and III-C, using a large number of subcarriers $F'$ in the beacon signals ensures a reliable averaging of the received instantaneous power (see (24)) at the cost of a reduced SNR per subcarrier. As shown in Fig. 10, increasing the number of subcarriers from $F' = 1$ to $F' = 3$ improves the performance, but increasing further to $F' = 10, 30$ degrades the performance considerably.

Dependence on the Probing Dimensions ($\kappa_u \kappa_v m n$). Note that, for a certain pre-beamforming SNR, the output of the proposed BA scheme inherently depends on the probing dimensions, i.e., the product $\kappa_u \kappa_v m n$. This is illustrated by the curves marked as (#1, #1′) and (#2, #2′) in Fig. 11. For various different configurations of the parameters, if both the number of measurements ($mn$) and the probing dimensions ($\kappa_u \kappa_v m n$) in each slot are the same, a similar performance is achieved. This is useful in terms of system design, where more complexity can be pushed towards the BS (e.g., more RF chains at the BS) while keeping the same system-level performance.

System-Level Scalability. We consider a multiuser scenario where $K$ denotes the number of UEs in the system and $K(T)$ denotes the number of UEs that have achieved BA (i.e., that have successfully detected their strong MPC) after $T$ frames. Fig. 12 compares the fraction $\frac{K(T)}{K}$ achieved by the interactive bisection scheme [11] and by our proposed scheme. In the simulations, we assume that for [11] the users are trained one by one with an ideal cost-free feedback in each round, whereas in our case, all the users are trained independently and simultaneously, i.e., all the users share the same BS pseudo-random probing codebook, while each user uses its own sensing codebook, which is also randomly generated. As we can see, the training overhead of interactive methods scales proportionally with the number of active users, whereas in our scheme the overhead does not grow with the number of users. Note that in practice, the feedback scheme for each iterative round in [11] costs UL transmissions and may not be ideal, since the beamforming gains are very poor in the initial rounds. In contrast, the proposed scheme needs only one UL transmission of the RACCH packet, where the full beamforming gain at the UE side and the sectored beamforming gain at the BS side (as discussed in Section III-A) are available.

Robustness w.r.t. Variations in Channel Statistics. To investigate the sensitivity of the proposed scheme as well as competing CS-based schemes to channel time-variations, we consider a simple Gauss-Markov model for the channel correlation in time given by

\begin{equation}
\rho_{s,l} = \alpha \rho_{s-1,l} + \sqrt{1 - |\alpha|^2}\, \nu_{s,l}, \quad s \in \mathbb{Z}_{+}, \tag{28}
\end{equation}

where $\rho_{0,l} \sim \mathcal{CN}(0, \gamma_l)$, where $\nu_{s,l} \sim \mathcal{CN}(0, \gamma_l)$ is an i.i.d. sequence (innovation), and where $|\alpha| \in [0, 1]$ controls the channel correlation in time. This model is widely used as a simple and intuitive way to model correlated fading (see [48]). We assume that the channel is constant over each beacon slot of 14 OFDM symbols, and evolves in time according to (28) from slot to slot. More precisely, $|\alpha| = 1$ yields channel coefficients that are constant over the whole BA phase, while $|\alpha| = 0$ yields channel coefficients that change in an i.i.d. fashion over the beacon slots. In general, a full range of channel time variations can be obtained by varying $|\alpha|$ between 0 and 1. In Fig. 13, we compare the detection probability of the proposed scheme with that of other CS-based schemes presented in [15, 16]. In [15], the instantaneous channel coefficients are estimated using the Orthogonal Matching Pursuit (OMP) technique. In [16], an improvement is proposed where the congruence of the channel AoA/AoD components across a Selected comb of subcarriers is exploited by applying a Simultaneous Orthogonal Matching Pursuit (SS-OMP) technique. Fig. 13 illustrates the simulation results. It is seen that the proposed NNLS scheme performs much better over a wide range of channel time-correlations, whereas the OMP/SS-OMP schemes are quite fragile in the presence of channel time-variations.
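The behaviour of the Gauss-Markov model (28) at the two extremes and its stationary power can be checked with a short simulation. The sketch below is standard-library Python; the function names, the seed, the path strength $\gamma = 1$, and the number of slots are assumptions for illustration only.

```python
import math
import random

def complex_gaussian(rng, variance):
    """Draw one sample from a circularly symmetric complex Gaussian CN(0, variance)."""
    scale = math.sqrt(variance / 2)
    return complex(rng.gauss(0, scale), rng.gauss(0, scale))

def gauss_markov_path(alpha, gamma, num_slots, rng):
    """Generate rho_0, ..., rho_{num_slots-1} for one path according to (28)."""
    rho = complex_gaussian(rng, gamma)
    path = [rho]
    innovation_scale = math.sqrt(1 - abs(alpha) ** 2)
    for _ in range(num_slots - 1):
        rho = alpha * rho + innovation_scale * complex_gaussian(rng, gamma)
        path.append(rho)
    return path

rng = random.Random(0)
gamma = 1.0  # assumed average path strength
for alpha in (0.0, 0.7, 1.0):
    path = gauss_markov_path(alpha, gamma, 20000, rng)
    mean_power = sum(abs(r) ** 2 for r in path) / len(path)
    print(f"alpha = {alpha}: empirical mean power = {mean_power:.3f}")
```

For $|\alpha| < 1$ the empirical mean power should be close to $\gamma$, reflecting that (28) preserves the variance from slot to slot; for $\alpha = 1$ the path is frozen at its initial draw, so the printed value is simply $|\rho_{0}|^2$.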

Fig. 9: (a) The optimal spreading factor ${\rm opt}(\kappa_v)$ at the UE as a function of ${\rm SNR}_{\rm BBF}$ (dB) when the BS spreading factor is fixed at $\kappa_u = 8$. (b) The average number of training slots $T$ that ensures a high detection probability $P_D \ge 0.95$.

Fig. 10: Detection probability $P_D$ of the proposed scheme versus the number of slots $T$ for different numbers of subcarriers $F' \in \{1, 3, 10, 30\}$ deployed for each RF chain at the BS side, where $M = N = 32$, $m = 3$, $n = 2$, ${\rm SNR}_{\rm BBF} = -33$ dB.

Fig. 11: Detection probability $P_D$ of the proposed scheme versus the number of slots $T$ when the products $mn$ and $mn\kappa_u\kappa_v$, i.e., the number of measurements and the probing dimensions in the spatial multiplexing domain, are constant, and where $M = N = 32$, $F' = 3$, ${\rm SNR}_{\rm BBF} = -33$ dB. Curves: #1 ($m = 3$, $n = 2$, $\kappa_u = 8$, $\kappa_v = 8$); #1′ ($m = 6$, $n = 1$, $\kappa_u = 8$, $\kappa_v = 8$); #2 ($m = 3$, $n = 1$, $\kappa_u = 8$, $\kappa_v = 16$); #2′ ($m = 3$, $n = 1$, $\kappa_u = 16$, $\kappa_v = 8$).

V. CONCLUSION

In this paper, we proposed an efficient Beam Alignment (BA) scheme for mm-Wave multiuser MIMO systems. In the proposed scheme, the AoA/AoD of a strong MPC is estimated by exploring the AoA-AoD domain through pseudo-random multi-finger beam patterns, and constructing an estimate of the resulting second-order statistics (namely, the average received power for each pseudo-random beam configuration). The resulting under-determined system of equations is efficiently solved using NNLS, naturally yielding a sparse non-negative vector solution whose maximum component identifies the optimal path. In the proposed scheme, the channel is probed by the BS by sending (pseudo-random) beamformed beacon DL signals, and sensed by the UEs by applying (pseudo-random) receive beam patterns. The scheme can train simultaneously a large number of users, since it requires no interactive (multiple rounds) bi-directional transmission of pilots and/or control packets as in bisection methods. Also, the scheme is robust to the channel coefficient time-dynamics since it is based on the estimation of the channel second-order statistics (received power) rather than on trying to estimate the complex channel coefficients, as done in other concurrent schemes also based on random beams and compressed sensing. Overall, the proposed scheme provides very competitive performance, both in terms of scalability (with respect to the number of users) and robustness (with respect to the channel coefficient statistics), compared with the state-of-the-art algorithms for initial beam acquisition proposed so far.

ACKNOWLEDGMENTS

X.S. is sponsored by the China Scholarship Council

(201604910530). G.C. is supported by the Alexander von

Humboldt Foundation through a Professorship Grant, and

this work was also supported in part by a Collaborative

Research Grant of Intel Research.

Fig. 12: Comparison of the performance of our proposed scheme (NNLS) with that of the interactive bisection method (Bisec) in [11] in terms of the fraction of users whose channel is estimated until a given time slot $T$, given by $\frac{\mathbb{E}[K(T)]}{K}$, for $K \in \{6, 9, 15\}$. We take $M = N = 32$, $F' = 3$, $m = 3$, $n = 2$, $\kappa_u = \kappa_v = 8$, ${\rm SNR}_{\rm BBF} = -33$ dB.

Fig. 13: Comparison of detection probability $P_D$ versus the number of slots $T$ between the proposed NNLS scheme ($m = 3$, $F' = 3$, $\kappa_u = \kappa_v = 8$), the OMP scheme in [15] ($m = 1$, $F' = 1$, $\kappa_u = \kappa_v = 16$), and the SS-OMP scheme in [16] ($m = 3$, $F' = 3$, $\kappa_u = \kappa_v = 8$) for $M = N = 32$, $n = 2$ and ${\rm SNR}_{\rm BBF} = -33$ dB, when the path gains change from i.i.d. ($\alpha = 0$) to constant ($\alpha = 1$) over consecutive beacon slots. Curves are shown for $\alpha \in \{0, 0.7, 1\}$.

REFERENCES

[1] R. W. Heath, N. Gonzalez-Prelcic, S. Rangan, W. Roh, and A. M. Sayeed, “An overview of signal processing techniques for millimeter wave MIMO systems,” IEEE journal of selected topics in signal processing, vol. 10, no. 3, pp. 436–453, 2016.

[2] T. S. Rappaport, S. Sun, R. Mayzus, H. Zhao, Y. Azar, K. Wang,

G. N. Wong, J. K. Schulz, M. Samimi, and F. Gutierrez, “Millimeter

wave mobile communications for 5g cellular: It will work!” Access,

IEEE, vol. 1, pp. 335–349, 2013.

[3] A. M. Sayeed, “Deconstructing multiantenna fading channels,” IEEE

Transactions on Signal Processing, vol. 50, no. 10, pp. 2563–2579,

2002.

[4] T. Nitsche, C. Cordeiro, A. B. Flores, E. W. Knightly, E. Perahia,

and J. C. Widmer, “IEEE 802.11 ad: directional 60 GHz communi-

cation for multi-Gigabit-per-second Wi-Fi,” IEEE Communications

Magazine, vol. 52, no. 12, pp. 132–141, 2014.

[5] Z. Chen and C. Yang, “Pilot decontamination in wideband massive

MIMO systems by exploiting channel sparsity,” IEEE Transactions

on Wireless Communications, vol. 15, no. 7, pp. 5087–5100, 2016.

[6] S. Haghighatshoar and G. Caire, “The beam alignment problem in

mmwave wireless networks,” in 2016 50th Asilomar Conference on

Signals, Systems and Computers, Nov 2016, pp. 741–745.

[7] V. Desai, L. Krzymien, P. Sartori, W. Xiao, A. Soong, and A. Alkha-

teeb, “Initial beamforming for mmwave communications,” in 2014

48th Asilomar Conference on Signals, Systems and Computers, Nov

2014, pp. 1926–1930.

[8] J. Wang, Z. Lan, C.-W. Pyo, T. Baykas, C.-S. Sum, M. A. Rahman,

J. Gao, R. Funada, F. Kojima, H. Harada et al., “Beam codebook

based beamforming protocol for multi-gbps millimeter-wave wpan

systems,” Selected Areas in Communications, IEEE Journal on,

vol. 27, no. 8, pp. 1390–1399, 2009.

[9] L. Chen, Y. Yang, X. Chen, and W. Wang, “Multi-stage beam-

forming codebook for 60ghz wpan,” in 2011 6th International

ICST Conference on Communications and Networking in China

(CHINACOM), Aug 2011, pp. 361–365.

[10] S. Hur, T. Kim, D. J. Love, J. V. Krogmeier, T. Thomas, A. Ghosh

et al., “Millimeter wave beamforming for wireless backhaul and

access in small cell networks,” Communications, IEEE Transactions

on, vol. 61, no. 10, pp. 4391–4403, 2013.

[11] A. Alkhateeb, O. El Ayach, G. Leus, and R. W. Heath, “Channel

estimation and hybrid precoding for millimeter wave cellular sys-

tems,” Selected Topics in Signal Processing, IEEE Journal of, vol. 8,

no. 5, pp. 831–846, 2014.

[12] M. Kokshoorn, H. Chen, P. Wang, Y. Li, and B. Vucetic, “Millimeter

wave MIMO channel estimation using overlapped beam patterns and

rate adaptation,” IEEE Transactions on Signal Processing, vol. 65,

no. 3, pp. 601–616, 2017.

[13] P. Xia, R. W. Heath, and N. Gonzalez-Prelcic, “Robust analog

precoding designs for millimeter wave MIMO transceivers with

frequency and time division duplexing,” IEEE Transactions on

Communications, vol. 64, no. 11, pp. 4622–4634, Nov 2016.

[14] D. E. Berraki, S. M. D. Armour, and A. R. Nix, “Application

of compressive sensing in sparse spatial channel recovery for

beamforming in mmwave outdoor systems,” in 2014 IEEE Wire-

less Communications and Networking Conference (WCNC), 2014,

Conference Proceedings, pp. 887–892.

[15] A. Alkhateeb, G. Leus, and R. W. Heath, “Compressed sensing

based multi-user millimeter wave systems: How many measurements

are needed?” in 2015 IEEE International Conference on Acoustics,

Speech and Signal Processing (ICASSP), 2015, Conference Proceed-

ings, pp. 2909–2913.

[16] J. Rodríguez-Fernández, N. González-Prelcic, K. Venugopal, and R. W. Heath Jr, “Frequency-domain compressive channel estimation for frequency-selective hybrid mmwave MIMO systems,” arXiv preprint arXiv:1704.08572, 2017.

[17] A. F. Molisch, V. V. Ratnam, S. Han, Z. Li, S. L. H. Nguyen, L. Li,

and K. Haneda, “Hybrid beamforming for massive MIMO-a survey,”

arXiv preprint arXiv:1609.05078, 2016.

[18] C. N. Barati, S. A. Hosseini, S. Rangan, P. Liu, T. Korakis,

S. S. Panwar, and T. S. Rappaport, “Directional cell discovery in

millimeter wave cellular networks,” IEEE Transactions on Wireless

Communications, vol. 14, no. 12, pp. 6664–6678, 2015.

[19] J. Choi, “Beam selection in mm-wave multiuser MIMO systems

using compressive sensing,” IEEE Transactions on Communications,

vol. 63, no. 8, pp. 2936–2947, 2015.

[20] R. Méndez-Rial, C. Rusu, N. González-Prelcic, A. Alkhateeb, and

R. W. Heath, “Hybrid MIMO architectures for millimeter wave

communications: Phase shifters or switches?” IEEE Access, vol. 4,

pp. 247–267, 2016.

[21] K. Venugopal, A. Alkhateeb, N. G. Prelcic, and R. W. Heath, “Chan-

nel estimation for hybrid architecture based wideband millimeter

wave systems,” IEEE Journal on Selected Areas in Communications,

2017.

[22] R. J. Weiler, M. Peter, W. Keusgen, and M. Wisotzki, “Measuring

the busy urban 60 GHz outdoor access radio channel,” in 2014

IEEE International Conference on Ultra-WideBand (ICUWB), 2014,

Conference Proceedings, pp. 166–170.

[23] V. Va, J. Choi, and R. W. Heath, “The impact of beamwidth on

temporal channel variation in vehicular channels and its implica-

tions,” IEEE Transactions on Vehicular Technology, vol. 66, no. 6,

pp. 5014–5029, 2017.

[24] P. A. Eliasi, S. Rangan, and T. S. Rappaport, “Low-rank spatial

channel estimation for millimeter wave cellular systems,” IEEE

Transactions on Wireless Communications, vol. 16, no. 5, pp. 2748–

2759, 2017.

[25] A. F. Molisch, Wireless communications. John Wiley & Sons, 2012,

vol. 34.

[26] W. Shen, L. Dai, B. Shim, Z. Wang, and R. W. Heath Jr.,

“Channel feedback based on AoD-adaptive subspace codebook in

FDD massive MIMO systems,” CoRR, vol. abs/1704.00658, 2017.

[Online]. Available: http://arxiv.org/abs/1704.00658

[27] M. Médard and R. G. Gallager, “Bandwidth scaling for fading

multipath channels,” IEEE Transactions on Information Theory,

vol. 48, no. 4, pp. 840–852, 2002.

[28] A. Lozano and D. Porrat, “Non-peaky signals in wideband fading

channels: Achievable bit rates and optimal bandwidth,” IEEE Trans-

actions on Wireless Communications, vol. 11, no. 1, pp. 246–257,

2012.

[29] F. Gómez-Cuba, J. Du, M. Médard, and E. Erkip, “Unified capacity

limit of non-coherent wideband fading channels,” IEEE Transactions

on Wireless Communications, vol. 16, no. 1, pp. 43–57, 2017.

[30] G. Durisi, U. G. Schuster, H. Bolcskei, and S. Shamai, “Noncoherent

capacity of underspread fading channels,” IEEE Transactions on

Information Theory, vol. 56, no. 1, pp. 367–395, 2010.

[31] W. U. Bajwa, J. Haupt, A. M. Sayeed, and R. Nowak, “Compressed

channel sensing: A new approach to estimating sparse multipath

channels,” Proceedings of the IEEE, vol. 98, no. 6, pp. 1058–1076,

2010.

[32] M. R. Akdeniz, Y. Liu, M. K. Samimi, S. Sun, S. Rangan, T. S.

Rappaport, and E. Erkip, “Millimeter wave channel modeling and

cellular capacity evaluation,” IEEE Journal on Selected Areas in

Communications, vol. 32, no. 6, pp. 1164–1179, 2014.

[33] Y. Wu and S. Verdú, “Optimal phase transitions in compressed

sensing,” IEEE Transactions on Information Theory, vol. 58, no. 10,

pp. 6241–6263, 2012.

[34] M. Slawski, M. Hein et al., “Non-negative least squares for high-

dimensional linear models: Consistency and sparse recovery without

regularization,” Electronic Journal of Statistics, vol. 7, pp. 3004–

3056, 2013.

[35] R. Kueng and P. Jung, “Robust nonnegative sparse recovery

and the nullspace property of 0/1 measurements,” arXiv preprint

arXiv:1603.07997, 2016.

[36] R. Tibshirani, “Regression shrinkage and selection via the lasso,”

Journal of the Royal Statistical Society. Series B (Methodological),

pp. 267–288, 1996.

[37] D. L. Donoho, I. M. Johnstone, J. C. Hoch, and A. S. Stern,

“Maximum entropy and the nearly black object,” Journal of the

Royal Statistical Society. Series B (Methodological), pp. 41–81,

1992.

[38] A. M. Bruckstein, M. Elad, and M. Zibulevsky, “On the uniqueness

of non-negative sparse & redundant representations,” in 2008 IEEE

International Conference on Acoustics, Speech and Signal Process-

ing, March 2008, pp. 5145–5148.

[39] D. L. Donoho and J. Tanner, “Counting the faces of randomly-

projected hypercubes and orthants, with applications,” Discrete &

computational geometry, vol. 43, no. 3, pp. 522–541, 2010.

[40] M. Wang and A. Tang, “Conditions for a unique non-negative

solution to an underdetermined system,” in 2009 47th Annual

Allerton Conference on Communication, Control, and Computing

(Allerton), Sept 2009, pp. 301–307.

[41] M. Wang, W. Xu, and A. Tang, “A unique “nonnegative” solution

to an underdetermined system: From vectors to matrices,” IEEE

Transactions on Signal Processing, vol. 59, no. 3, pp. 1007–1016,

2011.

[42] R. Kueng and P. Jung, “Robust nonnegative sparse recovery

and the nullspace property of 0/1 measurements,” arXiv preprint

arXiv:1603.07997, 2016.

[43] D. P. Bertsekas, Convex Optimization Algorithms. Belmont, MA:

Athena Scientific, 2015.

[44] D. Kim, S. Sra, and I. S. Dhillon, “Tackling box-constrained

optimization via a new projected quasi-newton approach,” SIAM

Journal on Scientific Computing, vol. 32, no. 6, pp. 3548–3563,

2010.

[45] D. K. Nguyen and T. B. Ho, “Anti-lopsided algorithm for

large-scale nonnegative least square problems,” arXiv preprint

arXiv:1502.01645, 2015.

[46] “3GPP TR 38.802 V2.0.0 (2017-03) - Study on New Radio (NR)

Access Technology; Physical Layer Aspects (Release 14),” 2017.

[47] A. Ghosh. (2017) 5G mmWave Revolution & New Radio - IEEE 5G. [Online].

Available: https://5g.ieee.org/images/files/pdf/5GmmWave_Webinar_IEEE_Nokia_09_20_2017_final.pdf

[48] C. C. Tan and N. C. Beaulieu, “On first-order Markov modeling for

the Rayleigh fading channel,” IEEE Transactions on Communica-

tions, vol. 48, no. 12, pp. 2032–2040, 2000.

Xiaoshen Song (S’17) received the B.Sc. degree

in Communication Engineering from Northwest-

ern Polytechnical University, Xi’an, China, in

2013, and the M.Sc. degree in Communication

and Information Systems from the Institute of

Electronics, University of Chinese Academy of

Sciences, Beijing, China, in 2016. Her master’s

thesis focuses on video synthetic aperture radar

(VideoSAR) system design and imaging algorithms. She is currently pursuing the Ph.D. degree with the Communications and Information Theory (CommIT) group at Technische Universität Berlin, Berlin, Germany. Her research interests include wireless communication, mmWave

massive MIMO, and compressed sensing.

Saeid Haghighatshoar (S’12–M’15) received

the B.Sc. degree in Electrical Engineering (Elec-

tronics) in 2007 and the M.Sc. degree in Elec-

trical Engineering (Communication Systems) in

2009, both from Sharif University of Technology, Tehran, Iran, and the Ph.D. degree in Computer and Communication Sciences from École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland, in 2014. Since 2015, he has been a postdoctoral researcher with the Communications and Information Theory (CommIT) group at Technische Universität Berlin, Berlin, Germany. His research interests lie in

Information Theory, Communication Systems, Wireless Communication,

Optimization Theory, and Compressed Sensing.

Giuseppe Caire (S’92 – M’94 – SM’03 – F’05)

was born in Torino, Italy, in 1965. He received

the B.Sc. in Electrical Engineering from Politec-

nico di Torino (Italy), in 1990, the M.Sc. in Elec-

trical Engineering from Princeton University in

1992 and the Ph.D. from Politecnico di Torino in

1994. He has been a post-doctoral research fel-

low with the European Space Agency (ESTEC,

Noordwijk, The Netherlands) in 1994-1995, As-

sistant Professor in Telecommunications at the

Politecnico di Torino, Associate Professor at the

University of Parma, Italy, Professor with the Department of Mobile

Communications at the Eurecom Institute, Sophia-Antipolis, France, a

Professor of Electrical Engineering with the Viterbi School of Engineer-

ing, University of Southern California, Los Angeles, and he is currently

an Alexander von Humboldt Professor with the Electrical Engineering

and Computer Science Department of the Technical University of Berlin,

Germany.

He served as Associate Editor for the IEEE Transactions on Communi-

cations in 1998-2001 and as Associate Editor for the IEEE Transactions on

Information Theory in 2001-2003. He received the Jack Neubauer Best

System Paper Award from the IEEE Vehicular Technology Society in

2003, the IEEE Communications Society & Information Theory Society

Joint Paper Award in 2004 and in 2011, the Okawa Research Award

in 2006, the Alexander von Humboldt Professorship in 2014, and the

Vodafone Innovation Prize in 2015. Giuseppe Caire is a Fellow of IEEE

since 2005. He has served in the Board of Governors of the IEEE

Information Theory Society from 2004 to 2007, and as officer from 2008

to 2013. He was President of the IEEE Information Theory Society in

2011. His main research interests are in the field of communications

theory, information theory, channel and source coding with particular

focus on wireless communications.

Initial Beam Alignment for mmWave

Single-Carrier Systems

4.1 Introduction

As discussed before, the IEEE 802.11ad standard specifies two operating modes in the

60 GHz band, i.e., the OFDM mode for high performance applications (e.g., high data

rate), and the single carrier (SC) mode for low power and low complexity implementation.

On top of the efficient BA scheme for mmWave OFDM systems provided in the last

chapter, this chapter focuses on developing a new BA scheme for mmWave SC systems.

4.2 Clarification of each author’s contributions

This chapter is a journal publication, which is a joint work with Saeid Haghighatshoar

and Giuseppe Caire. I wrote this journal article as the first author. The citation information is

given below:

X. Song, S. Haghighatshoar, and G. Caire, “Efficient beam alignment for mmWave

single-carrier systems with hybrid MIMO transceivers,” IEEE Transactions on Wireless

Communications, 2019. DOI: 10.1109/TWC.2019.2892043

All the authors contributed to this paper, but I have implemented all the experiments

and simulations. I also wrote the complete first draft (including all sections) of this

paper.

Saeid Haghighatshoar provided valuable ideas for the signaling model. He also

modified my first draft in terms of its English expressions.

Giuseppe Caire, who is my PhD supervisor, provided valuable discussions in each

meeting of this work. He also did a final modification of the overall draft.

4.3 Original journal article

The following article is a reprint of the original journal paper. It is the accepted version

of the paper. The copyright information is given on page xii of this thesis as well as on

the first page of the reprinted paper.

Efficient Beam Alignment for mmWave

Single-Carrier Systems with Hybrid MIMO

Transceivers

Xiaoshen Song, Student Member, IEEE, Saeid Haghighatshoar, Member, IEEE, Giuseppe Caire, Fellow, IEEE

X. Song, S. Haghighatshoar, and G. Caire, “Efficient beam alignment for mmWave single-carrier systems with hybrid MIMO transceivers,” IEEE Transactions on Wireless Communications, 2019. The published version can be found online:

https://ieeexplore.ieee.org/abstract/document/8625694. This reprint is the accepted version of the paper.

Abstract—Communication at millimeter wave (mmWave)

bands is expected to become a key ingredient of next

generation (5G) wireless networks. Effective mmWave

communications require fast and reliable methods for

beamforming at both the User Equipment (UE) and the

Base Station (BS) sides, in order to achieve a sufficiently

large Signal-to-Noise Ratio (SNR) after beamforming. We

refer to the problem of finding a pair of strongly

coupled narrow beams at the transmitter and receiver

as the Beam Alignment (BA) problem. In this paper, we

propose an efficient BA scheme for single-carrier mmWave

communications. In the proposed scheme, the BS periodically

probes the channel in the downlink via a pre-specified

pseudo-random beamforming codebook and pseudo-random

spreading codes, letting each UE estimate the Angle-of-Arrival

/ Angle-of-Departure (AoA-AoD) pair of the multipath channel

for which the energy transfer is maximum. We leverage

the sparse nature of mmWave channels in the AoA-AoD

domain to formulate the BA problem as the estimation of a

sparse non-negative vector. Based on the recently developed

Non-Negative Least Squares (NNLS) technique, we efficiently

find the strongest AoA-AoD pair connecting each UE to the

BS. We evaluate the performance of the proposed scheme

under a realistic channel model, where the propagation

channel consists of a few multipath components each having

different delays, AoAs-AoDs, and Doppler shifts. The channel

model parameters are consistent with experimental channel

measurements. Simulation results indicate that the proposed

method is highly robust to fast channel variations caused by

the large Doppler spread between the multipath components.

Furthermore, we also show that after achieving BA the

beamformed channel is essentially frequency-flat, such that

single-carrier communication needs no equalization in the

time domain.

Index Terms—mmWave, Beam Alignment, Single-Carrier,

Compressed Sensing, Non-Negative Least Squares (NNLS).

The authors are with the Electrical Engineering and Computer Science Department, Technische Universität Berlin, 10587 Berlin, Germany (e-mail: [email protected]).

The work of X. Song is sponsored by the China Scholarship Council (201604910530).

The work of G. Caire and S. Haghighatshoar is partially funded by a Professorship Grant of the Alexander von Humboldt Foundation and by the EU H2020 Project SERENA.

I. INTRODUCTION

The majority of existing wireless communication systems

operate in the sub-6GHz microwave spectrum, which

has now become very crowded. As a result, millimeter

wave (mmWave) spectrum ranging from 30 to 300 GHz

has been considered as an alternative to achieve very

high data rates in the next generation wireless systems.

At these frequencies, a signal bandwidth of 1GHz with

Signal-to-Noise Ratio (SNR) between 0dB and 3dB yields

data rates ∼1Gb/s per data stream. A mmWave Base

Station (BS) supporting multiple data streams through the

use of multiuser Multiple-Input Multiple-Output (MIMO)

can achieve tens of Gb/s of aggregate rate, thus fulfilling

the requirements of enhanced Mobile Broad Band (eMBB)

in 5G [1, 2].
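As a rough check of the rate figure quoted above, the following Python sketch evaluates the Shannon capacity B log2(1 + SNR) of an ideal AWGN link. The function name `shannon_rate_bps` is illustrative and does not come from the paper, and the ideal AWGN capacity formula is a simplifying assumption used only to reproduce the order of magnitude.

```python
import math


def shannon_rate_bps(bandwidth_hz: float, snr_db: float) -> float:
    """Shannon capacity of an ideal AWGN channel, in bit/s."""
    snr_linear = 10.0 ** (snr_db / 10.0)  # convert dB to a linear power ratio
    return bandwidth_hz * math.log2(1.0 + snr_linear)


if __name__ == "__main__":
    # 1 GHz bandwidth and the SNR range quoted in the text.
    for snr_db in (0.0, 3.0):
        rate_gbps = shannon_rate_bps(1e9, snr_db) / 1e9
        print(f"SNR = {snr_db:.0f} dB -> {rate_gbps:.2f} Gb/s per stream")
```

With 1 GHz of bandwidth this gives 1 Gb/s at 0 dB and roughly 1.6 Gb/s at 3 dB, consistent with the ∼1 Gb/s per stream quoted in the text.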

A main challenge of communication at mmWaves

is the short range of isotropic propagation. According

to Friis’s Law [3], the effective area of an isotropic

antenna decreases with the square of the frequency; therefore,

the isotropic pathloss at mmWaves is considerably larger

than that of its sub-6GHz counterpart. Moreover, signal

propagation through scattering elements also suffers from a

large attenuation at high frequencies. Fortunately, the small

wavelength of mmWave signals makes it possible to pack a large

number of antenna elements in a small form factor, such

that it is possible to cope with the severe isotropic pathloss

by using large antenna arrays both at the BS side and

the User Equipment (UE) side, providing an overall large

beamforming gain. An essential component to obtain such

large antenna gains consists of identifying suitable narrow

beam combinations, i.e., a pair of Angle of Departure

(AoD) at the BS and Angle of Arrival (AoA) at the UE,

yielding a sufficiently large beamforming gain through the

scatterers in the channel.¹ The problem of finding an

AoA-AoD pair with a large channel gain is referred to

as Initial Beam Training, Acquisition, or Alignment in the

literature (see references in Section I-A). Consistently with

our previous work [4], we shall refer to it simply as Beam

Alignment (BA).
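The frequency dependence of the isotropic pathloss mentioned in this paragraph can be made concrete with a small Python sketch of the Friis free-space formula. The comparison frequencies (3 GHz and 60 GHz), the 100 m distance, and the function names are illustrative assumptions, not values taken from the paper.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # m/s


def isotropic_aperture_m2(freq_hz: float) -> float:
    """Effective area of an isotropic antenna: lambda^2 / (4*pi)."""
    wavelength = SPEED_OF_LIGHT / freq_hz
    return wavelength ** 2 / (4.0 * math.pi)


def free_space_pathloss_db(freq_hz: float, distance_m: float) -> float:
    """Friis free-space pathloss between two isotropic antennas, in dB."""
    wavelength = SPEED_OF_LIGHT / freq_hz
    return 20.0 * math.log10(4.0 * math.pi * distance_m / wavelength)


if __name__ == "__main__":
    distance = 100.0  # illustrative link distance in meters (assumption)
    loss_3 = free_space_pathloss_db(3e9, distance)
    loss_60 = free_space_pathloss_db(60e9, distance)
    print(f"3 GHz:  {loss_3:.1f} dB")
    print(f"60 GHz: {loss_60:.1f} dB")
    print(f"extra isotropic loss at 60 GHz: {loss_60 - loss_3:.1f} dB")
```

Going from 3 GHz to 60 GHz shrinks the isotropic aperture by a factor of 400, i.e. 20 log10(20) ≈ 26 dB of additional isotropic pathloss at any fixed distance. This is the gap that the beamforming gain of large arrays is meant to close.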

¹ We refer to AoD for the BS and AoA for the UE since the proposed scheme consists of downlink probing from the BS to the UEs. Of course, due to the propagation angle reciprocity, the roles of AoA and AoD are reversed in the uplink.

It is important to define the conditions under which the

BA operation must be performed. In this work, we focus

on MIMO devices with a Hybrid Digital Analog (HDA)

structure. HDA MIMO is widely proposed especially for

mmWave systems, since the size and power consumption

of all-digital architectures prevent the integration of many

antenna elements in a small space. An HDA transceiver

architecture consists of the concatenation of an analog part

implementing the beamforming functions, and a digital part

implementing the baseband processing [5, 6]. This poses

some specific challenges: i) The signal received at the

antennas passes through an analog beamforming network

with only a limited number of Radio Frequency (RF)

chains, much smaller than the number of antennas. Hence,

the baseband received signal samples at the

output of the physical antenna array are not simultaneously

available; ii) Due to the large isotropic pathloss, the

received signal power is very low before beamforming, i.e.,

at every antenna port. Therefore, the BA scheme must be

able to operate in very low SNR conditions; iii) Because

of the large number of antennas at both sides, the size

of the channel matrix between each UE and the BS is

very large. However, extensive measurements have shown

that mmWave channels typically exhibit a small number

of multipath components (on average, up to 3 strong

components), each corresponding to a scattering cluster

with small delay / angle spreading [7, 8]. Considering the

discretization of the AoA-AoD domain according to the

UE and BS array resolution, a suitable BA scheme requires

the identification of a very sparse set of AoA-AoD pairs

coupled via strong propagation coefficients in the very

high-dimensional matrix of all possible pairs of discrete

beam directions [9, 10].

The other fundamental aspect of the BA problem

is that this is the first operation that a UE must

accomplish in order to communicate with the BS. Hence,

while coarse frame and carrier frequency synchronization

may be assumed (especially for a non-standalone

system, assisted by some other existing cell operating

at lower frequencies), the fine timing and Doppler shift

compensation cannot be assumed. It follows that the BA

operation must cope with significant timing offsets and

Doppler shifts. In addition, in a multipath propagation

environment with paths coming from different directions,

each path may be affected by a different Doppler shift.

In multicarrier (OFDM-based) systems, this may lead

to significant inter-carrier interference, which has been

typically ignored in most of the current literature.
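To illustrate why paths arriving from different directions see different Doppler shifts, the sketch below uses the standard relation ν = (v / λ) cos(α), where α is the angle between the UE velocity and the path direction. The carrier frequency, UE speed, and path angles are hypothetical values chosen for illustration; they are not parameters from the paper.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # m/s


def doppler_shift_hz(speed_mps: float, carrier_hz: float, angle_rad: float) -> float:
    """Doppler shift of one path; angle is between the velocity and the path direction."""
    return speed_mps * carrier_hz / SPEED_OF_LIGHT * math.cos(angle_rad)


if __name__ == "__main__":
    carrier = 60e9                         # 60 GHz carrier (illustrative)
    speed = 1.0                            # walking-speed UE in m/s (assumption)
    path_angles_deg = [0.0, 90.0, 180.0]   # hypothetical path directions
    shifts = [doppler_shift_hz(speed, carrier, math.radians(a)) for a in path_angles_deg]
    for angle, shift in zip(path_angles_deg, shifts):
        print(f"path at {angle:5.1f} deg -> Doppler {shift:+8.1f} Hz")
    print(f"Doppler spread between paths: {max(shifts) - min(shifts):.1f} Hz")
```

Even at walking speed, two paths arriving from opposite directions differ by about 400 Hz at 60 GHz, so a single frequency-offset correction cannot compensate all paths at once before the beams are aligned.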

A. Related Work

The most straightforward BA method is an exhaustive

search, where the BS and the UE scan all the AoA-AoD

beam pairs until they find a strong one [7]. This is, however,

prohibitively time-consuming, especially considering the

very large dimension of the channel matrix due to the very

large number of antennas. Several BA algorithms have been

recently proposed in the literature. All these algorithms, in

some way, aim at achieving reliable BA while using less

overhead than exhaustive search.

In [11], a two-stage pseudo-exhaustive BA scheme was

proposed, where in the first stage, the BS isotropically

probes the channel, while the UE scans its discrete beam

directions (beam sweeping) to find the best AoA. In the

second stage, the UE probes the channel along the AoA

found in the first stage, while the BS performs beam

sweeping to find the best AoD. A main limitation of [11] is

that, due to the isotropic BS beamforming in the first stage,

the scheme may suffer from a low pre-beamforming SNR

[9, 12, 13], which may impair the whole BA performance.

Some mmWave standards such as IEEE 802.11ad [14]

proposed to use multi-level hierarchical BA schemes

(e.g., see also [15–18]). The underlying idea is to start

with sectors of wide beams to do a coarse BA and

then shrink the beamwidth adaptively and successively

to obtain a more refined BA. The drawback of such

schemes, however, is that each UE has its own specific

AoA as seen from the BS side, thus, the BS needs

to interact with each UE individually. As a result, all

these hierarchical schemes require non-trivial coordination

among the UEs and the BS, which is difficult to have

at the initial channel acquisition stage. Moreover, since

hierarchical schemes require interactive uplink-downlink

communication between the BS and each individual UE,

it is not clear how the overhead of such schemes scales

in small cell scenarios with significant mobility of users

across cells, where the BA procedure should be repeated

at each handover.

The sparse nature of mmWave channels, i.e.,

large-dimension channel matrices along with very

sparse scatterers in the AoA-AoD domain [7, 8], motivates

the application of Compressed Sensing (CS) methods to

speed up the BA. There are two groups of CS-based

methods in the literature. The first group (e.g., see

[9, 19–21]) applies CS to estimate the complex baseband

channel coefficients. These algorithms are efficient and

particularly attractive for multiuser scenarios, but they are

based on the assumption that the instantaneous channel

remains invariant during the whole probing/measuring

stage. As anticipated before, this assumption is difficult

to meet at mmWaves because of the large Doppler

spread between the multipath components coming from

different angles, implying significant time-variations of

the channel coefficients even for UEs with small mobility

[10, 22, 23].² The second group of CS-based schemes

focuses on estimating the second-order statistics of the

channel, i.e., the covariance of the channel matrix, which

is very robust to channel variations. In [10] for example,

a Maximum Likelihood (ML) method was proposed to

estimate the covariance of the channel matrix. However,

this scheme suffers from low SNR and the BA is achieved

only at the UE side because of isotropic probing at the

BS.

² Notice that the channel delay spread and time-variation are greatly reduced after BA is achieved, since once the beams are aligned, the communication occurs only through a single multipath component with small effective angular spread, whose delay and Doppler shift can be well compensated [23]. However, before BA is achieved the channel delay spread and time-variation can be large due to the presence of several multipath components, each with its own delay and Doppler shift. In this case, even a small motion of a few centimeters traverses several wavelengths, potentially producing multiple deep fades [22].

In our previous work [4], we proposed a novel efficient

BA scheme that jointly estimates the two-sided AoA-AoD

of the strongest path from the second-order statistics

of the channel matrix. A limitation of [4], as well as

most works based on OFDM signaling [9, 10], is the

assumption of perfect OFDM frame synchronization and no

inter-carrier interference. This is in fact difficult to achieve

at mmWaves due to the potentially large multipath delay

spread, Doppler shifts, and very low SNR before BA. These

weaknesses, together with the fact that OFDM signaling

suffers from large Peak-to-Average Power Ratio (PAPR),

have motivated the proposal of single-carrier transmission

[24, 25] as a more favorable option at mmWaves. Recently,

[20, 21] proposed a time-domain BA approach based

on CS techniques for single-carrier mmWave systems.

However, as in [9, 19], these works focus on estimating

the instantaneous complex channel coefficients, with the

assumption that these complex coefficients remain invariant

over the whole training stage, which is an unrealistic

assumption, as discussed above [4, 10, 22, 23].

B. Contributions

In this paper, we propose a novel efficient BA scheme

for single-carrier mmWave communications with HDA

transceivers and frequency-selective multipath channels. In

the proposed scheme, each UE independently estimates its

best AoA-AoD pair over the reserved beacon slots (see

Section III), during which the BS periodically broadcasts

its probing time-domain sequences. We exploit the sparsity

of the mmWave channel in both angle and delay domains

[26] to reduce the training overhead. We also pose

the estimation of the strongest AoA-AoD pair as a

Non-Negative Least Squares (NNLS) problem, which can

be efficiently solved by standard techniques. Our main

contributions can be summarized as follows:

1) Pure time-domain operation. Unlike our prior work in

[4] and other works based on OFDM signaling [9, 10], the

scheme proposed in this paper takes place completely in

the time-domain and uses Pseudo-Noise (PN) sequences

with good correlation properties that suit single-carrier

mmWave systems.

2) More general and realistic mmWave channel model.

We consider a quite general mmWave wireless channel

model, taking into account the fundamental features of

mmWave channels such as fast time-variation due to

Doppler, frequency-selectivity, and the AoA-AoD sparsity

[10, 22, 27].

3) Tolerance to large Doppler shifts. As in [4, 10],

we design a signaling scheme to collect quadratic

measurements, yielding estimates of the channel

second-order statistics in the discretized AoA-AoD

domain. Since quadratic measurements are related to the

estimation of the received signal power, which is invariant

with respect to the phase rotation of the channel taps,

the proposed scheme is highly robust to the channel

time-variations caused by the large Doppler spread

between the multipath components.

4) Impact of the PN sequence length. Unlike our prior

work in [28] and the work in [6], where Doppler is

modeled as a phase rotation across different frames but

the phase is kept constant over each beacon slot, here

we consider a truly continuous linear (in time) phase

rotation within the whole beacon slot. As a by-product

of this realistic Doppler model, we notice that longer PN

sequences do not necessarily exhibit better performance

since they undergo larger phase rotations. We illustrate by

numerical simulations that there is an optimal PN sequence

length based on the given set of parameters, using which

the proposed scheme achieves better performance in the

presence of large Doppler shifts.

5) System-level scalability and low-complexity beam

direction estimation. In our scheme, the BS actively probes

the channel by periodically broadcasting a pseudo-random

beamforming codebook over reserved beacon slots while

all the UEs remain in the listening mode. Therefore, each

UE is able to collect measurements from its channel locally

and independently of all the other UEs. We pose the

identification of the strongly coupled AoA-AoD pairs based

on the measurements of each UE as an underdetermined

system of noisy linear equations and solve it efficiently

using Non-Negative Least-Squares (NNLS). Due to the

properties of the NNLS, this yields a sparse estimate of

the vector of non-negative channel gain coefficients in the

discrete AoA-AoD domain. We illustrate via numerical

simulations that the proposed scheme outperforms existing

time-domain BA algorithms proposed in the literature in

terms of training overhead. Moreover, in contrast with

hierarchical algorithms, it does not require multiple rounds

of uplink-downlink interaction between the BS and the

UEs during the BA. Therefore, the proposed scheme

is scalable, in the sense that its protocol overhead is

essentially constant with the number of active UEs in the

system.

6) Effectiveness of single-carrier modulation. Our

proposed time-domain BA scheme is tailored to

single-carrier mmWave systems. In particular, we

show that, after achieving BA, the effective channel

reduces essentially to a single path with a single delay

and Doppler shift, with relatively large SNR due to the

high beamforming gain. This means that single-carrier

modulation needs no time-domain equalization and the

baseband signal processing after BA is indeed very

simple, requiring only standard timing and carrier

frequency offset (CFO) recovery, operating in relatively

large SNR conditions (after beamforming).
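Contributions 3) and 4) above can be illustrated with a noise-free toy model in Python. It is a simplified sketch, not the signaling model of the paper: the ±1 chips from `random` stand in for a PN sequence, the channel is a single path with unit gain, and the per-chip phase step `2π/256` is a hypothetical Doppler value. The sketch compares coherent despreading, which is sensitive to the linear phase rotation, with an energy (quadratic) measurement, which is not.

```python
import cmath
import math
import random


def random_chips(length: int, seed: int = 0) -> list:
    """Pseudo-random +/-1 chips (a stand-in for a PN sequence)."""
    rng = random.Random(seed)
    return [rng.choice((-1.0, 1.0)) for _ in range(length)]


def received_chips(chips, gain: complex, phase_step: float) -> list:
    """Single-path, noise-free samples with a linear-in-time Doppler phase."""
    return [gain * c * cmath.exp(1j * phase_step * n) for n, c in enumerate(chips)]


def coherent_despread(chips, samples) -> float:
    """Magnitude of the matched-filter output at the aligned delay."""
    return abs(sum(c * s for c, s in zip(chips, samples)))


def received_energy(samples) -> float:
    """Quadratic (energy) measurement; independent of the sample phases."""
    return sum(abs(s) ** 2 for s in samples)


if __name__ == "__main__":
    phase_step = 2.0 * math.pi / 256  # Doppler phase advance per chip (assumption)
    for length in (64, 128, 256):
        chips = random_chips(length)
        rx = received_chips(chips, gain=1.0 + 0.0j, phase_step=phase_step)
        print(length, round(coherent_despread(chips, rx), 2), round(received_energy(rx), 2))
```

In this toy setting the coherent output grows from 64 to 128 chips but collapses at 256 chips, where the phase completes a full turn over the sequence, whereas the energy measurement scales with the sequence length regardless of the phase rotation. This is only meant to convey the intuition behind the trade-off in the PN sequence length; the actual optimum depends on noise and on the system parameters.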

Notation: We denote vectors, matrices, and scalars by $\mathbf{a}$, $\mathbf{A}$, and $a$ ($A$), respectively. We represent sets by $\mathcal{A}$ and their cardinality by $|\mathcal{A}|$. We use $\mathbb{E}$ for the expectation, $\otimes$ for the Kronecker product, $\mathbf{A}^{\mathsf{T}}$ for transpose, $\mathbf{A}^{*}$ for conjugate, and $\mathbf{A}^{\mathsf{H}}$ for conjugate transpose. We define the vectorization operator as $\mathrm{vec}(\cdot)$. For an integer $k \in \mathbb{Z}$, we use the shorthand notation $[k]$ for the index set $\{1, ..., k\}$.

[Figure 1: two panels, (a) and (b), each with AoA and AoD axes.]

Fig. 1: Illustration of the channel sparsity in the Angle of Arrival (AoA), Angle of Departure (AoD), and delay domains. (a) Slices of the channel power spread function over discrete delay taps, where only a few slices contain scattering components with large power. (b) Marginal power spread function of the channel in the AoA-AoD domain obtained from the integration of the power spread function over the delay domain.

II. PROBLEM STATEMENT

In this section, we provide a general overview of the BA

problem based on the channel second-order statistics. Then,

in Sections III and IV we provide the fully detailed system

model and the proposed algorithm.

A. Channel Second-Order Statistics

We consider a widely used and well accepted mmWave

scattering channel model (e.g., see [7, 8]), where the

propagation between the BS and a generic UE occurs

along a sparse collection of multipath components in the

continuous AoA-AoD-delay $(\phi, \theta, \tau)$ domain, including

a possible Line-of-Sight (LOS) component as well as

some Non-Line-of-Sight (NLOS) reflected paths [25].

The channel follows locally the classical Wide-Sense

Stationary with Uncorrelated Scattering (WSSUS) model

[29, 30]. The average signal energy distribution over the

AoA-AoD-Delay domain is described by the Power Spread

Function (PSF) $f_p(\phi, \theta, \tau)$. In brief, $f_p(\phi, \theta, \tau)\, d\phi\, d\theta\, d\tau$

is the aggregate signal power transfer coefficient for the

propagation paths in the AoA-AoD region $[\phi, \phi + d\phi) \times [\theta, \theta + d\theta)$

with path delays in $[\tau, \tau + d\tau)$. The PSF encodes

the second-order statistics of the channel and it is locally

time-invariant as long as the propagation geometry does

not change significantly. The time scale over which the

PSF is time-invariant is very large with respect to the

inverse of the signaling bandwidth, justifying the local

WSS assumption. Practical channel measurements have

shown that only a few discrete delays carry significant

signal energy, corresponding to the propagation delays of

the LOS and some reflection NLOS paths [7, 8, 26]. This

is illustrated in Fig. 1 (a), where only a few slices of the

PSF with respect to the delay domain contain scattering

components with large power. The marginal PSF of the

channel in the AoA-AoD domain is obtained by integrating

over the delay variable as

$$f_p(\phi, \theta) = \int_{\tau} f_p(\phi, \theta, \tau)\, d\tau, \qquad (1)$$

and it is typically very sparse in the continuous angle

domain (see, e.g., Fig. 1 (b)).
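A discretized counterpart of (1) replaces the integral over the delay with a sum over delay taps. The Python sketch below stores a sparse PSF as a dictionary keyed by (AoA bin, AoD bin, delay tap); the bin indices and power values are hypothetical and chosen only to mimic the sparsity shown in Fig. 1.

```python
from collections import defaultdict


def marginal_psf(psf: dict) -> dict:
    """Sum a discretized PSF {(aoa, aod, delay): power} over the delay index."""
    marginal = defaultdict(float)
    for (aoa, aod, _delay), power in psf.items():
        marginal[(aoa, aod)] += power
    return dict(marginal)


def strongest_pair(marginal: dict) -> tuple:
    """Return the (aoa, aod) index pair with the largest marginal power."""
    return max(marginal, key=marginal.get)


if __name__ == "__main__":
    # Hypothetical sparse PSF: a few (AoA bin, AoD bin, delay tap) entries.
    psf = {
        (3, 7, 0): 0.5,     # LOS-like component
        (3, 7, 1): 0.25,    # same direction, next delay tap
        (10, 2, 4): 0.375,  # weaker reflected component
    }
    marginal = marginal_psf(psf)
    print(marginal)
    print("strongest AoA-AoD pair:", strongest_pair(marginal))
```

Only two AoA-AoD bins carry power after marginalization, and the strongest one is the natural beam-alignment target.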

B. Beam-Alignment Using Second-order Statistics

In terms of BA, we are interested in finding an AoA-AoD

pair corresponding to a strong communication path between

the UE and the BS. If the marginal PSF of the channel in

the AoA-AoD domain, $f_p(\phi, \theta)$ in (1), were known a priori,

the BA problem would simply boil down to finding the

support of $f_p(\phi, \theta)$ (e.g., see the two bubbles in Fig. 1

(b)). In practice, however, $f_p(\phi, \theta)$ is not known and must

be estimated via a suitable signaling scheme. With this in

mind, we can pose the BA problem as follows.

BA Problem: Design a suitable signaling scheme between the

BS and the UE, find an estimate of the AoA-AoD PSF

$f_p(\phi, \theta)$, and identify an AoA-AoD pair $(\phi_0, \theta_0)$ with

a sufficiently large strength $f_p(\phi_0, \theta_0)$.

In this paper, we use pseudo-random waveforms with good auto-/cross-correlation properties as the probing signals. We will show that, using the proposed signaling, each UE is able to collect its own quadratic measurements, which yield noisy linear projections of a suitably discretized version of the marginal PSF fp(φ, θ). By expressing such linear projections as a matrix-vector product, we formulate the PSF estimation as the Least-Squares solution of an underdetermined system of linear equations. Imposing the non-negativity of the discretized PSF coefficients, we obtain an NNLS problem, which naturally yields a sparse solution [31, 32].

Fig. 2 (a) illustrates the proposed frame structure which

consists of three parts: the downlink beacon slot, the

Random Access Control CHannel (RACCH) slot, and the

data slot. An overview of the proposed initial acquisition

and BA protocol is illustrated in Fig. 2 (b). As in [4, 28], the measurements are collected by the UEs from the sequence of downlink beacon slots broadcast by the BS. By running the NNLS estimation algorithm mentioned above, each UE selects its strongest AoA-AoD pair, i.e., the discrete beam indices corresponding to the strongest path in the estimated discretized PSF. Then, the initial acquisition protocol proceeds as described in [4, 28]. Namely, the UE sends a beamformed packet to the BS in the RACCH slot, during which the BS stays in listening mode and uses its $M_{\text{RF}}$ RF chains to form $M_{\text{RF}}$ coarse beam patterns (sectors) covering the whole BS angle domain, in order to provide some receiver beamforming gain. The RACCH

packet contains basic information such as user ID and the


where $(\phi_l, \theta_l, \tau_l, \nu_l)$ denote the AoA, AoD, delay, and Doppler shift of the $l$-th component, and $\delta(\cdot)$ denotes the Dirac delta function. The vectors $\mathbf{a}_T(\theta_l) \in \mathbb{C}^M$ and $\mathbf{a}_R(\phi_l) \in \mathbb{C}^N$ are the array response vectors of the BS and UE at AoD $\theta_l$ and AoA $\phi_l$, respectively, with elements given by

$$[\mathbf{a}_T(\theta)]_m = e^{j(m-1)\pi\sin(\theta)}, \quad m \in [M], \tag{3a}$$

$$[\mathbf{a}_R(\phi)]_n = e^{j(n-1)\pi\sin(\phi)}, \quad n \in [N], \tag{3b}$$

where we assume that the spacing of the ULA antennas equals half a wavelength.
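The array responses in (3a) and (3b) can be sketched numerically as follows. The function name `ula_response` and the example angles are illustrative assumptions; the array sizes are taken from the simulation setup used later in the paper.

```python
import numpy as np

def ula_response(angle_rad, num_antennas):
    """Half-wavelength ULA response: element m (1-based) is exp(j (m-1) pi sin(angle))."""
    m = np.arange(num_antennas)              # holds m - 1 = 0, ..., num_antennas - 1
    return np.exp(1j * np.pi * m * np.sin(angle_rad))

a_T = ula_response(np.deg2rad(30.0), 32)     # BS side, M = 32, AoD = 30 degrees
a_R = ula_response(np.deg2rad(-10.0), 32)    # UE side, N = 32, AoA = -10 degrees
H_rank1 = np.outer(a_R, a_T.conj())          # spatial signature a_R(phi) a_T(theta)^H
```

Each multipath component contributes one such rank-1 term, scaled by its complex coefficient.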

For the sake of modeling simplicity, we assume in (2)

that each multipath component has a very narrow footprint

over the AoA-AoD and delay domain. Extension to more

widely spread multipath clusters is straightforward and

will be applied in the numerical simulations. Moreover,

we make the very standard assumption in array processing

that the array response vectors are invariant with frequency

over the signal bandwidth. More precisely, we assume that the wavelength $\lambda$ over the frequency interval $f \in [f_0 - B/2, f_0 + B/2]$ can be approximated as $\lambda_0 = c/f_0$, where $c$ denotes the speed of light. This is indeed well verified when $B$ is less than 1/10 of the carrier frequency (e.g., $B = 1$ GHz with a carrier between 30 and 70 GHz). Each scatterer corresponding to an AoA-AoD-Delay triple $(\phi_l, \theta_l, \tau_l)$ has a Doppler shift $\nu_l = \frac{\Delta v_l f_0}{c}$, where $\Delta v_l$ indicates the relative speed of the receiver, the $l$-th scatterer, and the transmitter [6]. We adopt a block fading model, where the coefficient $\rho_{s,l}$ of the $l$-th multipath component remains invariant over the channel coherence time $\Delta t_c$ but changes i.i.d. randomly across different coherence times [10]. Each scatterer is formed by the superposition of a large number of micro-scattering components (e.g., due to rough surfaces) having (approximately) the same AoA-AoD and delay. By the central limit theorem, it is customary to model the superposition of these many small effects as Gaussian [29, 30]. Hence, the multipath component coefficients are modeled as Rice fading, given by

$$\rho_{s,l} \sim \sqrt{\gamma_l}\left(\sqrt{\frac{\eta_l}{1+\eta_l}} + \frac{1}{\sqrt{1+\eta_l}}\,\check{\rho}_{s,l}\right), \tag{4}$$

where $\gamma_l$ denotes the overall multipath component strength, $\eta_l \in [0, \infty)$ indicates the strength ratio between the LOS and the NLOS components, and $\check{\rho}_{s,l} \sim \mathcal{CN}(0, 1)$ is a zero-mean unit-variance complex Gaussian random variable. In particular, $\eta_l \to \infty$ indicates a pure LOS path, while $\eta_l = 0$ indicates a pure NLOS path, affected by standard Rayleigh fading.
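A minimal sketch of drawing coefficients according to the Rice model (4) is given below. The helper name `sample_rice_coeff`, the seed and the sample size are assumptions for illustration; the two parameter pairs are those of the first and third path in the simulation section.

```python
import numpy as np

def sample_rice_coeff(gamma, eta, size, rng):
    """Draw rho = sqrt(gamma) * (sqrt(eta/(1+eta)) + rho_check/sqrt(1+eta)),
    with rho_check ~ CN(0, 1), following the structure of (4)."""
    rho_check = (rng.standard_normal(size) + 1j * rng.standard_normal(size)) / np.sqrt(2)
    los_part = np.sqrt(eta / (1.0 + eta))       # deterministic (LOS) component
    nlos_part = rho_check / np.sqrt(1.0 + eta)  # diffuse (NLOS) component
    return np.sqrt(gamma) * (los_part + nlos_part)

rng = np.random.default_rng(0)
rho_los = sample_rice_coeff(1.0, 100.0, 100_000, rng)   # nearly pure LOS
rho_nlos = sample_rice_coeff(0.6, 0.0, 100_000, rng)    # pure Rayleigh
```

In both cases the average power of the samples is close to the chosen strength, since the two terms in (4) carry fractions η/(1+η) and 1/(1+η) of the total power.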

B. Proposed Signaling Scheme

We assume that the BS can simultaneously transmit up to $M_{\text{RF}} \ll M$ different pilot streams. In our previous work [4], we considered OFDM signaling where the different pilot streams are assigned to non-overlapping sets of orthogonal subcarriers, such that (in the absence of inter-carrier interference) they can be perfectly separated by the UE in the frequency domain. However, such a scheme

may incur performance degradation in the presence of

significant Doppler spread between the different multipath

components and/or carrier frequency offset between the

BS transmitter and UE receiver. Hence, in this work

we consider single-carrier signaling where different PN

sequences are assigned to each pilot stream, similar

to Code Division Multiple Access (CDMA). In the

proposed scheme, the different pilot streams are generally

non-orthogonal but the cross-interference is very small if

the assigned PN sequences are sufficiently long with good

cross-correlation properties. As we shall see, this signaling

scheme yields very good robustness to Doppler.
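The near-orthogonality of long ±1 sequences can be checked with a short sketch. Random binary chips are used here as a stand-in for proper PN codes, which is an assumption of this example; the seed and the variable names are likewise illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
Nc, M_RF = 64, 3                                  # chips per sequence, pilot streams
pn = rng.choice([-1.0, 1.0], size=(M_RF, Nc))     # random +/-1 chips (stand-in for PN codes)

# Normalized zero-lag correlation matrix: exactly one on the diagonal,
# off-diagonal entries of the order of 1/sqrt(Nc).
corr = pn @ pn.T / Nc
max_cross = np.max(np.abs(corr - np.eye(M_RF)))
```

Increasing `Nc` shrinks the off-diagonal entries, which is the sense in which longer sequences reduce the cross-interference between pilot streams.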

Let $x_{s,i}(t)$, $t \in [s t_0, (s+1) t_0)$, be the continuous-time baseband equivalent PN signal corresponding to the $i$-th ($i \in [M_{\text{RF}}]$) pilot stream transmitted over the $s$-th slot, given by

$$x_{s,i}(t) = \sum_{n=1}^{N_c} \varrho_{n,i}\, p_r(t - nT_c), \quad \varrho_{n,i} \in \{1, -1\}, \tag{5}$$

where $t_0$ denotes the duration of the PN sequence, $p_r(t)$ is a square-root Nyquist pulse$^4$ [33] with normalized energy $\int |p_r(t)|^2\, dt = 1$, and $\varrho_{n,i}$, $n \in [N_c]$, is the $n$-th chip symbol. The PN sequence has a chip duration $T_c$, bandwidth $B' \approx 1/T_c \le B$ (where $B$ denotes the maximum available channel bandwidth), and a total of $N_c = t_0/T_c$ chips. We shall choose a suitable sequence length $N_c$ such that the time-domain signal (5) is transmitted in a sufficiently small time interval $t_0$ over which the channel can be considered time-invariant, i.e., $t_0 \ll \Delta t_c$.

To transmit the $i$-th pilot stream, the BS applies a beamforming vector $\mathbf{u}_{s,i} \in \mathbb{C}^M$. Without loss of generality, the beamforming vectors are normalized such that $\|\mathbf{u}_{s,i}\| = 1$. As mentioned before, we consider an HDA beamforming architecture where the beamforming function is implemented in the analog RF domain. Hence, the beamforming vectors $\mathbf{u}_{s,i}$, $i \in [M_{\text{RF}}]$, are independent of frequency and constant over the whole bandwidth. The transmitted signal at slot $s$ is given by

$$\mathbf{x}_s(t) = \sum_{i=1}^{M_{\text{RF}}} \sqrt{\frac{P_{\text{tot}} T_c}{M_{\text{RF}}}}\, x_{s,i}(t)\, \mathbf{u}_{s,i} = \sum_{i=1}^{M_{\text{RF}}} \sum_{n=1}^{N_c} \sqrt{\frac{P_{\text{tot}} T_c}{M_{\text{RF}}}}\, \varrho_{n,i}\, p_r(t - nT_c)\, \mathbf{u}_{s,i}, \tag{6}$$

where $P_{\text{tot}}$ is the total transmit power, which is equally distributed across the $M_{\text{RF}}$ RF chains of the BS. The term $\frac{P_{\text{tot}} T_c}{M_{\text{RF}}}$ indicates the energy per chip of the transmitted PN sequences, where $T_c$ denotes the chip duration.

$^4$ A square-root Nyquist pulse is a finite-energy waveform $p_r(t)$ such that the squared magnitude of its spectrum $|P_r(f)|^2$ satisfies the Nyquist criterion [33].


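Consequently, the received baseband equivalent signal at the UE array is

$$\begin{aligned} \mathbf{r}_s(t) &= \int \mathbf{H}_s(t, \tau)\, \mathbf{x}_s(t - \tau)\, d\tau = \sum_{l=1}^{L} \mathbf{H}_{s,l}(t)\, \mathbf{x}_s(t - \tau_l) \\ &= \sum_{i=1}^{M_{\text{RF}}} \sum_{l=1}^{L} \sqrt{\frac{P_{\text{tot}} T_c}{M_{\text{RF}}}}\, \mathbf{H}_{s,l}(t)\, x_{s,i}(t - \tau_l)\, \mathbf{u}_{s,i}, \end{aligned} \tag{7}$$

where $\mathbf{H}_{s,l}(t) := \rho_{s,l}\, e^{j2\pi\nu_l t}\, \mathbf{a}_R(\phi_l)\, \mathbf{a}_T(\theta_l)^{\mathsf{H}}$, $l \in [L]$, are the time-varying MIMO channel taps corresponding to the $L$ multipath components as in (2).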

With a hybrid MIMO structure, the UE does not have

direct access to (a sampled version of) the components of

rs(t). Instead, at each beacon slot s, the UE must apply

some beamforming vector in the analog domain obtaining a

projection of the received signal. Since the UE has NRF RF

chains, it can obtain up to NRF such projections per slot.

The analog RF signal received at the UE antenna array

is distributed across the NRF RF chains for demodulation.

This is achieved by signal splitters that divide the signal

power by a factor of NRF. Thus, the received signal at the

output of the j-th RF chain at the UE side is given by

$$\begin{aligned} \hat{y}_{s,j}(t) &= \frac{1}{\sqrt{N_{\text{RF}}}}\, \mathbf{v}_{s,j}^{\mathsf{H}}\, \mathbf{r}_s(t) + z_{s,j}(t) \\ &= \sum_{i=1}^{M_{\text{RF}}} \sum_{l=1}^{L} \sqrt{E_{\text{dim}}}\, \mathbf{v}_{s,j}^{\mathsf{H}}\, \mathbf{H}_{s,l}(t)\, \mathbf{u}_{s,i}\, x_{s,i}(t - \tau_l) + z_{s,j}(t), \end{aligned} \tag{8}$$

where $E_{\text{dim}} = \frac{P_{\text{tot}} T_c}{M_{\text{RF}} N_{\text{RF}}}$ indicates the per-stream pilot chip energy distributed over the transmit and receive RF chains, $\mathbf{v}_{s,j} \in \mathbb{C}^N$ denotes the normalized beamforming vector of the $j$-th RF chain at the UE side with $\|\mathbf{v}_{s,j}\| = 1$, and $z_{s,j}(t)$ is the continuous-time complex Additive White Gaussian Noise (AWGN) at the output of the $j$-th RF chain, with a Power Spectral Density (PSD) of $N_0$ Watt/Hz. The noise at the receiver is mainly introduced by the RF chain electronics, e.g., filter, mixer, and A/D conversion. The factor $\frac{1}{\sqrt{N_{\text{RF}}}}$ in (8) accounts for the power split mentioned above, assuming that it applies only to the useful signal and not to the thermal noise. Therefore, this received signal model is a conservative worst-case assumption.

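In realistic conditions, we have $T_c \nu_l \ll 1$.$^5$ Hence, the phase time-variation over the duration of the chip pulse shape is negligible. It follows that we can replace the continuously time-varying matrix tap coefficient $\mathbf{H}_{s,l}(t)$ with its discrete approximation, which can be simply written in the form

$$\mathbf{H}_{s,l}(t)\Big|_{t \in [nT_c, (n+1)T_c)} \approx \rho_{s,l}\, e^{j2\pi(\check{\nu}_{s,l} + \nu_l n T_c)}\, \mathbf{a}_R(\phi_l)\, \mathbf{a}_T(\theta_l)^{\mathsf{H}} = \mathbf{H}_{s,l}\, e^{j2\pi\nu_l n T_c} \tag{9}$$

with $n \in [N_c]$, where $\mathbf{H}_{s,l} := \rho_{s,l}\, e^{j2\pi\check{\nu}_{s,l}}\, \mathbf{a}_R(\phi_l)\, \mathbf{a}_T(\theta_l)^{\mathsf{H}}$, and where $\check{\nu}_{s,l}$ represents a phase rotation at the beginning of the $s$-th beacon slot, which is irrelevant since it can be incorporated in the Gaussian coefficient $\rho_{s,l}$. As a result, the product term $\mathbf{H}_{s,l}(t)\, x_{s,i}(t - \tau_l)$ in (8) can be written as

$$\mathbf{H}_{s,l}(t)\, x_{s,i}(t - \tau_l) = \mathbf{H}_{s,l} \sum_{n=1}^{N_c} \varrho_{n,i}\, e^{j2\pi\nu_l n T_c}\, p_r(t - nT_c - \tau_l) := \mathbf{H}_{s,l}\, x^{l}_{s,i}(t - \tau_l), \tag{10}$$

where $x^{l}_{s,i}(t)$ is given by

$$x^{l}_{s,i}(t) = \sum_{n=1}^{N_c} \varrho_{n,i}\, e^{j2\pi\nu_l n T_c}\, p_r(t - nT_c). \tag{11}$$

$^5$ For example, consider $T_c = 1$ ns, $\Delta v_l = 10$ m/s and $f_0 = 60$ GHz, yielding $\nu_l = 2$ kHz and $T_c \nu_l = 2 \cdot 10^{-6}$.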

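Notice that $x^{l}_{s,i}(t)$ consists of a modified modulated PN sequence where the chip symbols $\varrho_{n,i}$ are rotated by the time-varying phase factor $e^{j2\pi\nu_l n T_c}$ due to the Doppler shift. Substituting (10) into (8), we can write the received signal $\hat{y}_{s,j}(t)$ in (8) as

$$\hat{y}_{s,j}(t) = \sum_{i=1}^{M_{\text{RF}}} \sum_{l=1}^{L} \sqrt{E_{\text{dim}}}\, \mathbf{v}_{s,j}^{\mathsf{H}}\, \mathbf{H}_{s,l}\, \mathbf{u}_{s,i}\, x^{l}_{s,i}(t - \tau_l) + z_{s,j}(t). \tag{12}$$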

Since the PN sequences assigned to the $M_{\text{RF}}$ RF chains are mutually (roughly) orthogonal, the $M_{\text{RF}}$ pilot streams transmitted from the BS side can be approximately separated at the UE by passing each $j$-th received signal (12) through a bank of matched filters, where the $i$-th filter has impulse response $x^{*}_{s,i}(-t) = \sum_{n=1}^{N_c} \varrho_{n,i}\, p_r^{*}(-t - nT_c)$. Consequently, the $i$-th BS pilot stream received through the $j$-th RF chain at the UE is given by

$$\begin{aligned} y_{s,i,j}(t) &= \int \hat{y}_{s,j}(\tau)\, x^{*}_{s,i}(\tau - t)\, d\tau \\ &= \sum_{l=1}^{L} \sum_{i'=1}^{M_{\text{RF}}} \sqrt{E_{\text{dim}}}\, \mathbf{v}_{s,j}^{\mathsf{H}}\, \mathbf{H}_{s,l}\, \mathbf{u}_{s,i'}\, R^{x^l}_{i',i}(t - \tau_l) + z^{c}_{s,j}(t) \\ &\overset{(a)}{\approx} \sum_{l=1}^{L} \sqrt{E_{\text{dim}}}\, \mathbf{v}_{s,j}^{\mathsf{H}}\, \mathbf{H}_{s,l}\, \mathbf{u}_{s,i}\, R^{x^l}_{i,i}(t - \tau_l) + z^{c}_{s,j}(t), \end{aligned} \tag{13}$$

where, for all $i, i' \in [M_{\text{RF}}]$, $R^{x^l}_{i',i}(t) := \int x^{l}_{s,i'}(\tau)\, x^{*}_{s,i}(\tau - t)\, d\tau$ represents the correlation between the Doppler-rotated sequence $x^{l}_{s,i'}(t)$ given by (11) and the desired sequence $x_{s,i}(t)$, and $z^{c}_{s,j}(t) = \int z_{s,j}(\tau)\, x^{*}_{s,i}(\tau - t)\, d\tau$ denotes the noise at the output of the matched filter. The approximation (a) in (13) follows from the fact that the cross-correlations between different PN sequences are nearly zero, i.e., $R^{x}_{i',i}(t) = \int x_{s,i'}(\tau)\, x^{*}_{s,i}(\tau - t)\, d\tau \approx 0$ for $i' \neq i$. Since the phase rotation introduced by Doppler is very small ($\nu_l T_c \ll 1$), we can also safely assume that $R^{x^l}_{i',i}(t) = \int x^{l}_{s,i'}(\tau)\, x^{*}_{s,i}(\tau - t)\, d\tau \approx 0$ for $i' \neq i$. However, it is important to point out that these are only working assumptions in order to derive our algorithm. The actual


performance of the scheme will of course depend also on

the residual non-zero cross-interference between the PN

sequences. Hence, in our numerical simulations, we made

no such simplification and took into account all the cross

terms arising from non-perfect orthogonality.

Consider (13) and suppose that the output signal at the UE side is sampled at the chip rate. The resulting discrete-time signal can be written as

$$y_{s,i,j}[k] = y_{s,i,j}(t)\big|_{t = kT_c} = \sum_{l=1}^{L} \sqrt{E_{\text{dim}}}\, \mathbf{v}_{s,j}^{\mathsf{H}}\, \mathbf{H}_{s,l}\, \mathbf{u}_{s,i}\, R^{x^l}_{i,i}(kT_c - \tau_l) + z^{c}_{s,j}[k], \tag{14}$$

where $k \in [\check{N}_c]$ indicates the sampling index, $\check{N}_c \ge N_c + \frac{\Delta\tau_{\max}}{T_c}$ denotes the total number of samples in each received PN sequence, and $\Delta\tau_{\max} = \max\{|\tau_l - \tau_{l'}| : l, l' \in [L]\}$ denotes the maximum delay spread of the channel. Note that for PN sequences, the sequence of samples $\{|R^{x^l}_{i,i}(kT_c - \tau_l)| : k \in [\check{N}_c]\}$ in (14) has sharp peaks at indices $k_l \approx \frac{\tau_l}{T_c}$, corresponding to the delays of the channel multipath components. Intuitively speaking, the output $y_{s,i,j}[k]$ at those indices $k_l$ yields Gaussian variables whose power is obtained by projecting the AoA-AoD-Delay PSF $f_p(\phi, \theta, \tau)$ along the beamforming vectors $(\mathbf{u}_{s,i}, \mathbf{v}_{s,j})$ in the angular domain and along the $k_l$-th delay slice $\tau \in [k_l T_c, (k_l + 1) T_c]$. The slicing in the delay domain results from the fact that, as said before, $|R^{x^l}_{i,i}(kT_c - \tau_l)|$ is well localized around $k_l$. We refer to Fig. 1 (a) for an illustration and will use this property later on in the paper to design our BA algorithm.
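The delay localization described above can be reproduced with a chip-rate toy model. Everything below is an illustrative assumption: a single stream, rectangular chip pulses (so that sampling at the chip rate gives the chips themselves), no noise, random ±1 chips instead of a designed PN code, and made-up delays, gains and Doppler shifts.

```python
import numpy as np

rng = np.random.default_rng(2)
Nc, Tc = 1024, 1e-9                       # number of chips and chip duration in seconds
chips = rng.choice([-1.0, 1.0], size=Nc)
delays = [5, 20]                          # path delays in chips (hypothetical)
gains = [1.0, 0.7]                        # path amplitudes (hypothetical)
dopplers = [2e3, -1e3]                    # Doppler shifts in Hz (hypothetical)

n = np.arange(Nc)
rx = np.zeros(Nc + max(delays), dtype=complex)
for k_l, g, nu in zip(delays, gains, dopplers):
    # Each path delivers a delayed copy whose chips are slowly rotated by Doppler.
    rx[k_l:k_l + Nc] += g * chips * np.exp(2j * np.pi * nu * n * Tc)

# Chip-rate matched filter: mf[k] = sum_n rx[n + k] * chips[n] / Nc.
mf = np.correlate(rx, chips, mode="valid") / Nc
peaks = np.argsort(np.abs(mf))[-2:]       # indices of the two largest outputs
```

The two largest matched-filter outputs sit at the path delays, and the Doppler rotation accumulated over the sequence is far too small to blur them in this setting.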

C. Sparse Beam-space Representation

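The AoA-AoDs $(\phi_l, \theta_l)$ in (2) take on arbitrary values in the continuous AoA-AoD domain. Following the widely used approach of [34], known as beam-space representation, we obtain a finite-dimensional representation of the channel response (2) by discretizing the angle domain. Consider the discrete set of AoA-AoDs

$$\Phi := \left\{\check{\phi} : \frac{1 + \sin(\check{\phi})}{2} = \frac{n-1}{N},\ n \in [N]\right\}, \tag{15a}$$

$$\Theta := \left\{\check{\theta} : \frac{1 + \sin(\check{\theta})}{2} = \frac{m-1}{M},\ m \in [M]\right\}. \tag{15b}$$

The corresponding sets of array responses $\mathcal{A}_R := \{\mathbf{a}_R(\check{\phi}) : \check{\phi} \in \Phi\}$ and $\mathcal{A}_T := \{\mathbf{a}_T(\check{\theta}) : \check{\theta} \in \Theta\}$ form discrete dictionaries to represent the channel response. For the ULAs considered in this paper, the dictionaries $\mathcal{A}_R$ and $\mathcal{A}_T$, after suitable normalization, reduce to the columns of unitary Discrete Fourier Transform (DFT) matrices $\mathbf{F}_N \in \mathbb{C}^{N \times N}$ and $\mathbf{F}_M \in \mathbb{C}^{M \times M}$, with elements

$$[\mathbf{F}_N]_{n,n'} = \frac{1}{\sqrt{N}}\, e^{j2\pi(n-1)\left(\frac{n'-1}{N} - \frac{1}{2}\right)}, \quad n, n' \in [N], \tag{16a}$$

$$[\mathbf{F}_M]_{m,m'} = \frac{1}{\sqrt{M}}\, e^{j2\pi(m-1)\left(\frac{m'-1}{M} - \frac{1}{2}\right)}, \quad m, m' \in [M]. \tag{16b}$$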

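The channel beam-space representation consists of expressing the channel matrix as a linear combination of rank-1 matrices of the form $\mathbf{f}_{N,n}\mathbf{f}_{M,m}^{\mathsf{H}}$ for all $n \in [N]$ and $m \in [M]$, where $\mathbf{f}_{N,n}$ and $\mathbf{f}_{M,m}$ denote the $n$-th and $m$-th columns of $\mathbf{F}_N$ and of $\mathbf{F}_M$, respectively. Explicitly, the beam-space representation expression is given by

$$\mathbf{H}_s(t, \tau) = \sum_{n=1}^{N} \sum_{m=1}^{M} [\check{\mathbf{H}}_s(t, \tau)]_{n,m}\, \mathbf{f}_{N,n}\mathbf{f}_{M,m}^{\mathsf{H}} = \mathbf{F}_N\, \check{\mathbf{H}}_s(t, \tau)\, \mathbf{F}_M^{\mathsf{H}}, \tag{17}$$

where the beam-space representation of the channel response is given by

$$\check{\mathbf{H}}_s(t, \tau) = \mathbf{F}_N^{\mathsf{H}}\, \mathbf{H}_s(t, \tau)\, \mathbf{F}_M = \sum_{l=1}^{L} \check{\mathbf{H}}_{s,l}(t)\, \delta(\tau - \tau_l), \tag{18}$$

where $\check{\mathbf{H}}_{s,l}(t) := \mathbf{F}_N^{\mathsf{H}}\, \mathbf{H}_{s,l}(t)\, \mathbf{F}_M$ corresponds to the beam-space $l$-th channel path.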

As shown in our earlier work [4], as the number of antennas $M$ at the BS and $N$ at the UE increases, the DFT basis provides a good sparsification of the propagation channel. As a result, $\check{\mathbf{H}}_s(t, \tau)$ can be approximated as a sparse matrix, with non-zero elements in the locations corresponding to small clusters of discrete AoA-AoD pairs in the proximity of the (continuous) angle pairs of the $L$ scatterers of the physical channel. We may encounter a grid error in (18), since the AoAs/AoDs do not necessarily fall on the uniform grid $\Phi \times \Theta$. Nevertheless, as shown in [4], the grid error becomes negligible as the number of antennas (i.e., the grid resolution) increases. We hasten to say that, in our simulations, we do not constrain the AoA-AoD pairs of the physical channel to take on values on the discrete grid; therefore, the grid discretization effect is fully taken into account in our numerical results.
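The following sketch builds the dictionaries in (16), checks that they are unitary, and looks at how much energy of a single off-grid path is captured by a few beam-space entries. The function names, the angles 0.3 and -0.7 rad, and the choice of looking at the four largest entries are assumptions made for illustration.

```python
import numpy as np

def dft_dictionary(N):
    """Matrix with entries exp(j 2 pi (n-1) ((n'-1)/N - 1/2)) / sqrt(N), as in (16)."""
    n = np.arange(N)[:, None]           # holds n - 1
    n_prime = np.arange(N)[None, :]     # holds n' - 1
    return np.exp(2j * np.pi * n * (n_prime / N - 0.5)) / np.sqrt(N)

def ula_response(angle_rad, num_antennas):
    """Half-wavelength ULA response, as in (3)."""
    m = np.arange(num_antennas)
    return np.exp(1j * np.pi * m * np.sin(angle_rad))

N = M = 32
F_N, F_M = dft_dictionary(N), dft_dictionary(M)

# One off-grid path H = a_R(phi) a_T(theta)^H and its beam-space image F_N^H H F_M.
H = np.outer(ula_response(0.3, N), ula_response(-0.7, M).conj())
H_check = F_N.conj().T @ H @ F_M
energy = np.abs(H_check) ** 2
top4_fraction = np.sort(energy.ravel())[-4:].sum() / energy.sum()
```

Even though the angles do not lie on the grid, most of the energy concentrates in a small cluster of neighbouring beam-space entries, which is the leakage (grid error) effect discussed above.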

IV. PROPOSED BEAM ALIGNMENT SCHEME

A. BS Channel Probing and UE Sensing

Consider the scattering channel model in (2) and its beam-space representation in (18). In our proposed scheme, at each beacon slot $s$, the BS probes the channel along $M_{\text{RF}}$ beamforming vectors $\mathbf{u}_{s,i}$, $i \in [M_{\text{RF}}]$, each of which is applied to a unique PN sequence signal $x_{s,i}(t)$. We select the beamforming vectors at the BS side according to a pre-defined pseudo-random codebook, which is a collection of angle sets $\mathcal{C}_T := \{\mathcal{U}_{s,i} : s \in [T],\ i \in [M_{\text{RF}}]\}$, where $\mathcal{U}_{s,i}$ denotes the angle-domain support of the beamforming vector $\mathbf{u}_{s,i}$, i.e., the indices of the quantized angles in the beam-space representation of $\mathbf{u}_{s,i}$, and where $T$ is the effective period of beam training. We assume that the beamforming vector $\mathbf{u}_{s,i}$ sends equal power along the directions in $\mathcal{U}_{s,i}$, with the number of active angles given by $|\mathcal{U}_{s,i}| =: \kappa_u \le M$, which we assume to be the same for all $(s, i)$. We call $\kappa_u$ the angle spreading factor with respect


to the transmit beamforming vectors. Consequently, we can write such beamforming vectors as $\mathbf{u}_{s,i} = \mathbf{F}_M \check{\mathbf{u}}_{s,i}$, where $\check{\mathbf{u}}_{s,i} = \frac{\mathbf{1}_{\mathcal{U}_{s,i}}}{\sqrt{\kappa_u}}$, and where $\mathbf{1}_{\mathcal{U}_{s,i}}$ denotes a vector with 1 at components in the support set $\mathcal{U}_{s,i}$ and 0 elsewhere. One can simply imagine the vector $\check{\mathbf{u}}_{s,i}$ as a multi-finger beam pattern in the angle domain, as illustrated in Fig. 3 (a).$^6$ We assume that the angle indices in $\mathcal{U}_{s,i}$ in the codebook $\mathcal{C}_T$ are generated a priori in a random manner and are known a priori to all UEs in the system. This is similar to the BS-dependent pseudo-random synchronization codes used in the 3G WCDMA standard [36]. Thus, we call $\mathcal{C}_T$ a pseudo-random codebook.

At the UE side, each UE can locally customize its own receive beamforming codebook, defined as $\mathcal{C}_R := \{\mathcal{V}_{s,j} : s \in [T],\ j \in [N_{\text{RF}}]\}$, where $\mathcal{V}_{s,j}$, with $|\mathcal{V}_{s,j}| = \kappa_v \le N$ for all $(s, j)$, is the angle-domain support defining the directions from which the receiver beam patterns collect the signal power. We define the beamforming vectors at the UE side by $\mathbf{v}_{s,j} = \mathbf{F}_N \check{\mathbf{v}}_{s,j}$, where $\check{\mathbf{v}}_{s,j} = \frac{\mathbf{1}_{\mathcal{V}_{s,j}}}{\sqrt{\kappa_v}}$ again defines the finger-shaped beam patterns shown in Fig. 3 (a). Similar to the power spreading factor $\kappa_u$ at the BS, the parameter $\kappa_v$ controls the spread of the sensing beam patterns at the UE.
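A multi-finger beamforming vector of this kind can be generated as in the sketch below, which draws a random support of size κ and maps the normalized indicator vector through the DFT dictionary. The function names and the seed are assumptions; an actual codebook would be fixed in advance and shared as described above.

```python
import numpy as np

def dft_dictionary(N):
    """Matrix with entries exp(j 2 pi (n-1) ((n'-1)/N - 1/2)) / sqrt(N), as in (16)."""
    n = np.arange(N)[:, None]
    n_prime = np.arange(N)[None, :]
    return np.exp(2j * np.pi * n * (n_prime / N - 0.5)) / np.sqrt(N)

def finger_beam(N, kappa, rng):
    """Return (beam, support): beam = F_N @ (indicator of support) / sqrt(kappa)."""
    support = np.sort(rng.choice(N, size=kappa, replace=False))
    beam_space = np.zeros(N)
    beam_space[support] = 1.0 / np.sqrt(kappa)   # equal power on each active angle
    return dft_dictionary(N) @ beam_space, support

rng = np.random.default_rng(3)
M, kappa_u = 32, 8
u, U = finger_beam(M, kappa_u, rng)
u_check = dft_dictionary(M).conj().T @ u         # back to the beam-space domain
```

The resulting vector has unit norm, and its beam-space image is non-zero exactly on the drawn support, i.e., it is a pattern with κ equal-power fingers.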

In our scheme, the UEs collect their measurements

independently and simultaneously, without influencing or

coordinating with each other. Therefore, the scheme is

quite scalable for multiuser scenarios, where the overhead

of training all the UEs does not increase with the number

of UEs. This represents a significant advantage with

respect to traditional multi-level/interactive BA schemes,

that require multiple beam-sweeping rounds and interactive

data exchanges between the BS and each UE, such that the

acquisition protocol overhead grows proportionally to the

number of UEs being acquired.

B. UE Measurement Sparse Formulation

During the $s$-th beacon slot, the UE applies the receive beamforming vector $\mathbf{v}_{s,j}$ to its $j$-th RF chain. Assuming that the probing PN signals $x_{s,i}(t)$ are approximately orthogonal in the time domain as discussed before, each RF chain at the UE side can almost perfectly separate the transmitted $M_{\text{RF}}$ pilot streams. Thus, using the beam-space representation of the channel in (18), we can write (14) as

$$y_{s,i,j}[k] = \sum_{l=1}^{L} \sqrt{E_{\text{dim}}}\, \check{\mathbf{v}}_{s,j}^{\mathsf{H}}\, \check{\mathbf{H}}_{s,l}\, \check{\mathbf{u}}_{s,i}\, R^{x^l}_{i,i}(kT_c - \tau_l) + z^{c}_{s,j}[k], \tag{19}$$

where $\check{\mathbf{u}}_{s,i} = \mathbf{F}_M^{\mathsf{H}} \mathbf{u}_{s,i}$ and $\check{\mathbf{v}}_{s,j} = \mathbf{F}_N^{\mathsf{H}} \mathbf{v}_{s,j}$ are the beamforming vectors in the beam-space domain. Here, we used the unitary property of the DFT matrices, i.e., $\mathbf{F}_M^{\mathsf{H}} \mathbf{F}_M = \mathbf{I}_M$ and $\mathbf{F}_N^{\mathsf{H}} \mathbf{F}_N = \mathbf{I}_N$, where $\mathbf{I}_M$ and $\mathbf{I}_N$ are identity matrices of dimension $M$ and $N$, respectively.

$^6$ Note that, in our scheme, the beamforming vectors result from a uniform linear combining of the DFT vectors. Further optimization of the beamforming vectors with non-uniform combining [35] is possible. However, this goes outside the scope of the present work and is left for future investigation.

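To formulate the sparse estimation problem, we define the vectors $\check{\mathbf{h}}_{s,l} = \frac{1}{\sqrt{N_{\text{RF}}}} \cdot \mathrm{vec}(\check{\mathbf{H}}_{s,l})$, $l \in [L]$, resulting in a reformulated channel matrix $\check{\mathbf{H}}_s = [\check{\mathbf{h}}_{s,1}, \cdots, \check{\mathbf{h}}_{s,L}]$ that collects all the channel coefficients in the beam-space domain. We also define a vector $\mathbf{c}^{i}_{k} = [R^{x^1}_{i,i}(kT_c - \tau_1), \cdots, R^{x^L}_{i,i}(kT_c - \tau_L)]^{\mathsf{T}} \cdot \sqrt{E_{\text{dim}}}$, which can be regarded as the Power Delay Profile (PDP) of the $i$-th pilot stream transmitted along the $L$ paths and sampled at the $k$-th discrete delay tap $kT_c$. Consequently, we can express the received beacon signal (19) at the UE as

$$\begin{aligned} y_{s,i,j}[k] &= \sum_{l=1}^{L} \sqrt{E_{\text{dim}}}\, \check{\mathbf{v}}_{s,j}^{\mathsf{H}}\, \check{\mathbf{H}}_{s,l}\, \check{\mathbf{u}}_{s,i}\, R^{x^l}_{i,i}(kT_c - \tau_l) + z^{c}_{s,j}[k] \\ &= (\check{\mathbf{u}}_{s,i} \otimes \check{\mathbf{v}}^{*}_{s,j})^{\mathsf{T}}\, \check{\mathbf{H}}_s\, \mathbf{c}^{i}_{k} + z^{c}_{s,j}[k] \\ &= \mathbf{g}_{s,i,j}^{\mathsf{T}}\, \check{\mathbf{H}}_s\, \mathbf{c}^{i}_{k} + z^{c}_{s,j}[k], \end{aligned} \tag{20}$$

where we used the well-known identity $\mathrm{vec}(\mathbf{A}\mathbf{B}\mathbf{C}) = (\mathbf{C}^{\mathsf{T}} \otimes \mathbf{A})\, \mathrm{vec}(\mathbf{B})$, and where $\mathbf{g}_{s,i,j} := \check{\mathbf{u}}_{s,i} \otimes \check{\mathbf{v}}^{*}_{s,j}$ denotes the combined beam-space representation of the beamforming vectors corresponding to the $i$-th RF chain at the BS and the $j$-th RF chain at the UE.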
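The Kronecker step used in (20) can be verified numerically for a single path: the bilinear form v^H H u equals (u ⊗ v*)^T vec(H) when vec stacks the columns of H. The dimensions, the seed and the random test data below are arbitrary assumptions for this check.

```python
import numpy as np

rng = np.random.default_rng(4)
N, M = 4, 5                                       # small sizes for a quick check
H = rng.standard_normal((N, M)) + 1j * rng.standard_normal((N, M))
u = rng.standard_normal(M) + 1j * rng.standard_normal(M)
v = rng.standard_normal(N) + 1j * rng.standard_normal(N)

lhs = v.conj() @ H @ u                            # v^H H u
g = np.kron(u, v.conj())                          # g = u (kron) v^*
rhs = g @ H.flatten(order="F")                    # g^T vec(H), column-major vec
```

Note that `flatten(order="F")` is needed so that the vectorization matches the column-stacking convention of the identity.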

Next, we introduce a slight generalization of the scheme illustrated so far, by allowing the repetition of the PN beacon sequences $S \ge 1$ times during each beacon slot (see Fig. 2 (a)). Hence, each beacon slot consists of $S$ subslots, each of which contains a PN sequence transmission as explained above. Since beamforming is implemented in the analog RF domain, it is typically impractical to switch the beamforming pattern during the beacon slot. Hence, we assume that the combined beamforming vector $\mathbf{g}_{s,i,j}$ remains constant over the $S$ subslots, whereas $\check{\mathbf{H}}_s$ changes because of the Doppler shifts $\nu_l$. Over different beacon slots, in contrast, the beamforming vector $\mathbf{g}_{s,i,j}$ changes periodically according to the pre-defined pseudo-random beamforming codebook $\mathcal{U}_{s,i} \times \mathcal{V}_{s,j}$, as said before. In order to accommodate this extension, with a slight abuse of notation, we index the received subslots belonging to the $s$-th beacon slot as $sS + s'$, $s' \in [S]$, where the index $s$ labels the beacon slots and the index $s'$ labels the subslots inside each beacon slot. It follows that the received signal through the $i$-th RF chain at the BS and the $j$-th RF chain at the UE after matched filtering (refer to (20)) can be written as

$$y_{sS+s',i,j}[k] = \mathbf{g}_{s,i,j}^{\mathsf{T}}\, \check{\mathbf{H}}_{sS+s'}\, \mathbf{c}^{i}_{k} + z^{c}_{sS+s',j}[k]. \tag{21}$$

As anticipated in Section I, in order to ensure a robust scheme with respect to fast channel variations [10], we focus on the second-order statistics of the channel coefficients. More specifically, we accumulate the energy at the output of the matched filter across all the $\check{N}_c$ discrete delay taps, by computing the following quadratic measurements in (22), where the first two terms correspond to the useful signal and noise contributions, respectively,


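$$\begin{aligned} q_{s,i,j} &= \frac{1}{S} \sum_{s'=1}^{S} \check{q}_{sS+s',i,j} \\ &= \mathbf{g}_{s,i,j}^{\mathsf{T}} \left[ \sum_{l=1}^{L} \left( \frac{1}{S} \sum_{s'=1}^{S} \check{\mathbf{h}}_{sS+s',l}\, \check{\mathbf{h}}_{sS+s',l}^{\mathsf{H}} \right) \sum_{k=1}^{\check{N}_c} E_{\text{dim}}\, |R^{x^l}_{i,i}(kT_c - \tau_l)|^2 \right] \mathbf{g}^{*}_{s,i,j} && \text{(23a)} \\ &\quad + \sum_{k=1}^{\check{N}_c} \left( \frac{1}{S} \sum_{s'=1}^{S} |z^{c}_{sS+s',j}[k]|^2 \right) + w_{s,i,j}, && \text{(23b)} \end{aligned}$$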

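have variance $N_0 R^{x}(0)$ [33]. Hence, we can assume the approximation

$$\frac{1}{S} \sum_{s'=1}^{S} |z^{c}_{sS+s',j}[k]|^2 \approx \mathbb{E}\big[|z^{c}_{sS+s',j}[k]|^2\big] = N_0 R^{x}(0), \tag{26}$$

which holds true in the limit of large $S$. Using the definition of $\Gamma$ in (25), the approximations (24) and (26), and defining the $NM$-dimensional binary vectors

$$\mathbf{b}_{s,i,j} := \mathbf{g}_{s,i,j} \sqrt{\kappa_u \kappa_v} = \mathbf{1}_{\mathcal{U}_{s,i}} \otimes \mathbf{1}_{\mathcal{V}_{s,j}}, \tag{27}$$

we can eventually write (23) in the convenient compact form

$$q_{s,i,j} = \mathbf{b}_{s,i,j}^{\mathsf{T}}\, \mathrm{vec}(\Gamma) + \check{N}_c N_0 R^{x}(0) + \tilde{w}_{s,i,j}, \tag{28}$$

where $\tilde{w}_{s,i,j}$ collects the error term $w_{s,i,j}$ plus all the residual errors incurred by the above approximations.$^7$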

Notice that $\mathbf{b}_{s,i,j}$ contains “ones” in the positions corresponding to the discrete angle support $\mathcal{U}_{s,i} \times \mathcal{V}_{s,j}$ from the beamforming codebook, while it contains “zeros” everywhere else. Hence, the inner product $\mathbf{b}_{s,i,j}^{\mathsf{T}}\, \mathrm{vec}(\Gamma)$ corresponds effectively to collecting all the signal energy received from the AoA-AoD pairs indexed by the angular support $\mathcal{U}_{s,i} \times \mathcal{V}_{s,j}$. An example of the probing geometry is illustrated in Fig. 3 (b).
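As a toy check of this interpretation, the sketch below builds a binary probing vector as in (27) and confirms that its inner product with vec(Γ) equals the sum of the entries of Γ over the probed AoA-AoD block. The 8 × 8 grid, the two non-zero entries of Γ and the supports are hypothetical values chosen for the example.

```python
import numpy as np

N, M = 8, 8
Gamma = np.zeros((N, M))          # rows index AoA bins, columns index AoD bins
Gamma[2, 5] = 1.0                 # strong path (hypothetical)
Gamma[6, 1] = 0.6                 # weaker path (hypothetical)

U = [1, 5, 7]                     # probed AoD indices at the BS
V = [2, 3]                        # sensed AoA indices at the UE
one_U = np.zeros(M)
one_U[U] = 1.0
one_V = np.zeros(N)
one_V[V] = 1.0

b = np.kron(one_U, one_V)                     # binary probing vector as in (27)
signal = b @ Gamma.flatten(order="F")         # b^T vec(Gamma)
direct = Gamma[np.ix_(V, U)].sum()            # energy inside the probed block
```

Here only the strong path falls inside the probed block, so the noiseless signal part of the measurement equals its strength.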

In order to gain insight into the role of the algorithm parameters $\kappa_u$, $\kappa_v$, $M_{\text{RF}}$, $N_{\text{RF}}$, and $T_c$, it is useful to compare the SNR before beamforming (BBF) with the SNR associated with each of the measurements in (28). We define the SNR BBF as

$$\mathrm{SNR}_{\text{BBF}} = \frac{P_{\text{tot}} \sum_{l=1}^{L} \gamma_l}{N_0 B}. \tag{29}$$

This is the ratio of the total received signal power (summing over all the multipath components) over the total noise power at the receiver baseband processor input, with total bandwidth $B$. As mentioned before, one of the challenges of BA, and of communication at mmWaves in general, is that the SNR before beamforming $\mathrm{SNR}_{\text{BBF}}$ in (29) is typically very low. The average SNR of the measurements in (28), with the average taken over the randomness of the beam codebook and the channel, can be roughly quantified as

$$\mathrm{SNR}_{\text{MEA}} = \frac{P_{\text{tot}} T_c \sum_{l=1}^{L} \gamma_l \cdot MN}{\kappa_u \kappa_v M_{\text{RF}} N_{\text{RF}} N_0}. \tag{30}$$

$^7$ We should point out here again that the goodness of such approximations is reflected in the variance of the error term $\tilde{w}_{s,i,j}$. The fact that our algorithm works very well in very low SNR conditions (see Section V) confirms that all the working assumptions made here are valid and justified.

This quantity is explained as follows: the energy per chip $P_{\text{tot}} T_c$ is uniformly spread over the angular fraction $\kappa_u \kappa_v / (MN)$ and over the $M_{\text{RF}} N_{\text{RF}}$ measurements obtained in each beacon slot. Comparing (29) and (30) prompts the following qualitative observations: i) By making the product $\kappa_u \kappa_v$ large, we explore more angle directions simultaneously, but the signal power is spread over a broader angle such that the SNR per measurement decreases. Therefore, we expect the existence of an exploration/exploitation trade-off with respect to the product $\kappa_u \kappa_v$ (as noticed in [4]). ii) The scheme gathers $M_{\text{RF}} N_{\text{RF}}$ new measurements for each beacon slot, but the SNR per measurement decreases with $M_{\text{RF}} N_{\text{RF}}$. Hence, a similar exploration/exploitation trade-off exists with respect to the number of RF chains used in the BA algorithm (see also [4]). iii) By making $T_c$ larger than $1/B$, the signal power is effectively concentrated in a bandwidth $1/T_c < B$. This energy accumulation in the frequency domain improves the SNR per measurement. However, given a total pilot signal duration, increasing $T_c$ decreases the number of chips of the PN sequence, such that the cross-interference between PN sequences and their delayed versions increases. Therefore, there exists a trade-off between energy concentration in the frequency domain and self-interference in the system, reflected in the variance of the error term $\tilde{w}_{s,i,j}$.
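To make observation i) concrete, the following sketch evaluates (29) and (30) for the simulation parameters used later (M = N = 32, three and two RF chains, B = 1.76 GHz, SNR before beamforming of -14 dB). Setting the noise density to one, choosing T_c = 1/B and the helper names are assumptions made for this illustration.

```python
import math

def snr_bbf(p_tot, gammas, n0, bandwidth):
    """SNR before beamforming, following (29)."""
    return p_tot * sum(gammas) / (n0 * bandwidth)

def snr_mea(p_tot, t_c, gammas, M, N, kappa_u, kappa_v, m_rf, n_rf, n0):
    """Average SNR per measurement, following (30)."""
    return p_tot * t_c * sum(gammas) * M * N / (kappa_u * kappa_v * m_rf * n_rf * n0)

def to_db(x):
    return 10.0 * math.log10(x)

B = 1.76e9                        # available bandwidth in Hz
gammas = [1.0, 0.6, 0.6]          # path strengths
n0 = 1.0                          # normalized noise PSD (assumption)
p_tot = 10 ** (-14 / 10) * n0 * B / sum(gammas)   # power giving SNR_BBF = -14 dB

bbf_db = to_db(snr_bbf(p_tot, gammas, n0, B))
mea8_db = to_db(snr_mea(p_tot, 1 / B, gammas, 32, 32, 8, 8, 3, 2, n0))
mea16_db = to_db(snr_mea(p_tot, 1 / B, gammas, 32, 32, 16, 16, 3, 2, n0))
```

With these numbers, κu = κv = 8 gives a per-measurement SNR roughly 4.3 dB above the SNR before beamforming, whereas κu = κv = 16 gives one roughly 1.8 dB below it, i.e., doubling both spreading factors costs about 6 dB.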

C. Path Strength Estimation via Non-Negative Least Squares

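After $T$ beacon slots, the UE obtains a total number of $M_{\text{RF}} N_{\text{RF}} T$ equations, given by

$$\mathbf{q} = \mathbf{B} \cdot \mathrm{vec}(\Gamma) + \check{N}_c N_0 R^{x}(0) \cdot \mathbf{1} + \tilde{\mathbf{w}}, \tag{31}$$

where the vector $\mathbf{q} = [q_{1,1,1}, \ldots, q_{1,M_{\text{RF}},N_{\text{RF}}}, \ldots, q_{T,M_{\text{RF}},N_{\text{RF}}}]^{\mathsf{T}} \in \mathbb{R}^{M_{\text{RF}} N_{\text{RF}} T}$ consists of all $M_{\text{RF}} N_{\text{RF}} T$ measurements obtained as in (28), the matrix $\mathbf{B} = [\mathbf{b}_{1,1,1}, \ldots, \mathbf{b}_{1,M_{\text{RF}},N_{\text{RF}}}, \ldots, \mathbf{b}_{T,M_{\text{RF}},N_{\text{RF}}}]^{\mathsf{T}} \in \mathbb{R}^{M_{\text{RF}} N_{\text{RF}} T \times MN}$ is uniquely defined by the pseudo-random beamforming codebook of the BS and the local beamforming codebook of the UE, and $\tilde{\mathbf{w}} \in \mathbb{R}^{M_{\text{RF}} N_{\text{RF}} T}$ denotes the residual error.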

In order to identify the strong AoA-AoD quantized directions, the UE needs to estimate the $MN$-dimensional vector $\mathrm{vec}(\Gamma)$ from the $M_{\text{RF}} N_{\text{RF}} T$-dimensional observation (31) in the presence of the measurement noise $\tilde{\mathbf{w}}$, where, in general, $MN$ is significantly larger than $M_{\text{RF}} N_{\text{RF}} T$. There is a great variety of algorithms to solve (31) in the Least-Squares sense. The key observation here is that $\Gamma$ is sparse (by the sparse nature of mmWave channels) and non-negative (by the second-order statistic construction of our scheme). As discussed in our previous work [4], recent results in CS show that when the underlying parameter $\Gamma$ is non-negative, the simple non-negative constrained Least Squares (LS) given by

$$\Gamma^{\star} = \underset{\Gamma \in \mathbb{R}_{+}^{N \times M}}{\arg\min}\ \big\| \mathbf{B} \cdot \mathrm{vec}(\Gamma) + \check{N}_c N_0 R^{x}(0) \cdot \mathbf{1} - \mathbf{q} \big\|^2, \tag{32}$$

is sufficient to yield a sparse solution $\Gamma^{\star}$ [31, 32], without the need for an explicit sparsity-promoting regularization term in the objective function as, for example, in the classical LASSO algorithm [37]. The (convex) optimization problem (32) is generally referred to as Non-Negative Least Squares (NNLS), and has been well investigated in the literature. As discussed in [31], NNLS implicitly performs $\ell_1$-regularization and promotes the sparsity of the resulting solution provided that the measurement matrix $\mathbf{B}$ satisfies the $\mathcal{M}^{+}$-criterion [32], i.e., there exists a vector $\mathbf{d} \in \mathbb{R}_{+}^{M_{\text{RF}} N_{\text{RF}} T}$ such that $\mathbf{B}^{\mathsf{T}} \mathbf{d} > 0$. In our case, this criterion can be simply interpreted as the fact that the set of $M_{\text{RF}} N_{\text{RF}} T$ measurement beam patterns should hit all the $MN$ AoA-AoD pairs at least once, which is almost always satisfied in our scheme because of the random finger-shaped beam patterns and the pseudo-random property of the designed beamforming codebook.

In terms of numerical implementation, the NNLS can be

posed as an unconstrained LS problem over the positive

orthant and can be solved by several efficient techniques

such as Gradient Projection, Primal-Dual techniques,

etc., with an affordable computational complexity [38],

which is generally significantly less than conventional CS

algorithms for problems of the same size and sparsity

level. We refer to [39, 40] for the recent progress on the

numerical solution of NNLS and a discussion on other

related work in the literature.
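A toy instance of (31) and (32) is sketched below using `scipy.optimize.nnls` as a stand-in for the solvers mentioned above. The problem is noiseless and deliberately small: the 8 × 8 grid, the spreading factor 4, the 60 measurements, the two-path Γ and the known constant offset are all assumptions of this example and not the paper's simulation setup.

```python
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(5)
N = M = 8
kappa, num_meas = 4, 60                    # 60 equations for N * M = 64 unknowns
Gamma = np.zeros((N, M))
Gamma[2, 5], Gamma[6, 1] = 1.0, 0.4        # hypothetical two-path strength matrix

def indicator(size, k):
    """Random binary vector with exactly k ones (one finger-shaped support)."""
    vec = np.zeros(size)
    vec[rng.choice(size, size=k, replace=False)] = 1.0
    return vec

# Each row is a probing vector 1_U (kron) 1_V, as in (27).
B = np.array([np.kron(indicator(M, kappa), indicator(N, kappa))
              for _ in range(num_meas)])
noise_floor = 0.1                          # plays the role of the known offset in (31)
q = B @ Gamma.flatten(order="F") + noise_floor

# Subtract the known offset and solve the non-negative least squares problem.
x_hat, residual = nnls(B, q - noise_floor)
Gamma_hat = x_hat.reshape((N, M), order="F")
best_pair = tuple(int(i) for i in np.unravel_index(np.argmax(Gamma_hat), Gamma_hat.shape))
```

Although the system is underdetermined, the non-negativity constraint alone suffices here to place the largest estimated coefficient at the strongest AoA-AoD pair, without any explicit sparsity penalty.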

V. PERFORMANCE EVALUATION

We consider a system with $M = 32$ antennas and $M_{\text{RF}} = 3$ RF chains at the BS, and $N = 32$ antennas and $N_{\text{RF}} = 2$ RF chains at a generic UE. We assume the short preamble structure used in IEEE 802.11ad [20, 41], where the beacon slot has duration $t_0 S = 1.891$ µs. The system is assumed to operate at $f_0 = 70$ GHz with a maximum available bandwidth of $B = 1.76$ GHz, so that each beacon slot amounts to more than 3200 chips, as in [20, 21]. We assume the channel contains $L = 3$ links given by $(\gamma_1 = 1, \eta_1 = 100)$, $(\gamma_2 = 0.6, \eta_2 = 10)$ and $(\gamma_3 = 0.6, \eta_3 = 0)$, where $\gamma_l$ denotes the scatterer strength and $\eta_l$ indicates the strength ratio between the LOS and the NLOS propagation as in (4). Thus, the first scatterer can be roughly regarded as the LOS path, while the remaining scatterers represent the NLOS paths. This is consistent with the practical mmWave MIMO channel measurements in [27], where the relative power level of the NLOS path is around 10 dB lower than that of the desired LOS path. We assume that the relative speed $\Delta v_l$ for each path is in the range 0 to 8 m/s. We declare a success if the location of the strongest component in $\Gamma^{\star}$ (see (32)) coincides with the LOS path.$^8$

In the following simulations,$^9$ we evaluate the

performance of our time-domain BA scheme according to

three viewpoints: i) We study the effect of various scheme

parameters on the achieved BA probability; ii) We show

the superiority of our proposed scheme in comparison with

other recently proposed time-domain BA schemes [20, 21];

iii) We consider the effectiveness of the BA scheme in

the context of single-carrier modulation. To tackle the

latter aspect we compute upper and lower bounds on the

ergodic achievable rate for the effective SISO channel

between the BS and the UE after BA. These bounds

show that BA yields essentially a frequency-flat channel

even when the original channel has multiple multipath

components. Also, the effective SNR of the channel after

BA is quite large. Therefore, single-carrier modulation with

standard timing and carrier synchronization and without

time-domain equalization works very well.

A. Success Probability of the Proposed BA Scheme

Dependence on the beam spreading factors $(\kappa_u, \kappa_v)$. As discussed at the end of Section IV-B (see also our previous work [4, 28]), there is a trade-off between the angle exploration of the measuring matrix $\mathbf{B}$ and the SNR of the received measurements, which is illustrated in Fig. 4 (a). Increasing the angular spreading factor from $\kappa_u = \kappa_v = 4$ to $\kappa_u = \kappa_v = 8$ improves the performance. However, the performance degrades progressively when $(\kappa_u, \kappa_v)$ are increased further to $\kappa_u = \kappa_v = 16$ and $22$.

Dependence on the PN sequence length Ncand

robustness to Doppler shifts. In general, larger PN

8 In the case that there is no LOS link, one can announce a success if
the location of the strongest component in $\Gamma^\star$ coincides with the
central AoA-AoD of the strongest scatterer cluster.

9 We use lsqnonneg.m in MATLAB to solve the NNLS optimization problem in (32).
Also, for simplicity, in our simulations we assume that the sizes of the
beamforming codebooks, given by $\frac{1}{M_{\rm RF}}|\mathcal{C}_T|$ and
$\frac{1}{N_{\rm RF}}|\mathcal{C}_R|$ at the two sides, are the same as the
number of effective beam training beacon slots $T$. In a practical
implementation, however, the BS codebook size should be fixed and used
periodically, since it is shared with all UEs in advance, while the local
beamforming codebook of each UE can be set to any size depending on the
individual UE.


[Figure 4: three panels, each plotted against the number of beacon slots $T$.
(a) Curves for $\kappa_u = \kappa_v = 4, 8, 16, 22$.
(b) Curves for $N_c = 16, 32, 64, 128, 256$.
(c) Curves for $\Delta v_{l^\star} = 0, 3, 5, 8$ m/s.]

Fig. 4: Detection probability $P_D$ of the proposed time-domain scheme with
respect to (a) different power spreading factors $(\kappa_u, \kappa_v)$, where
$N_c = 64$ and the relative speed of the strongest path is
$\Delta v_{l^\star} = 5$ m/s; (b) different PN sequence lengths $N_c$, where
$\kappa_u = \kappa_v = 8$ and the relative speed of the strongest path is
$\Delta v_{l^\star} = 5$ m/s; (c) different relative speed values of the
strongest path $\Delta v_{l^\star}$, where $\kappa_u = \kappa_v = 8$ and
$N_c = 64$. In all the above cases, $M = N = 32$, $M_{\rm RF} = 3$,
$N_{\rm RF} = 2$, $B' = B$, ${\sf SNR}_{\rm BBF} = -14$ dB.

sequence length $N_c$ provides better correlation properties, such that
different pilot streams can be well separated at the UE. However, increasing
$N_c$ increases the total duration $t_0 = N_c T_c$ of the transmitted signal.
Because of the Doppler shift, the chips of the received PN sequence then
undergo a larger phase rotation, which degrades the correlation property of
the PN sequence. This is illustrated in Fig. 4 (b): increasing the PN sequence
length from $N_c = 16$ to $N_c = 32$ and $64$ improves the performance of the
proposed scheme, whereas the performance degrades slightly when $N_c$ is
increased to $128$ and $256$. In general, our scheme is highly insensitive to
the Doppler spread between different multipath components, as illustrated in
Fig. 4 (c). For example, when the speed difference between the paths varies
from 0 to 8 m/s, the BA success probability remains virtually unchanged. This
is a significant advantage over schemes based on OFDM signaling, which is
known to be fragile to uncompensated Doppler shifts that cause
inter-carrier interference.
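The effect of the PN sequence length on the correlation peak can be sketched numerically. The snippet below is a minimal illustration, assuming ideal ±1 chips and a constant Doppler shift $\nu$ over the sequence, so that after despreading only the per-chip phase rotation $e^{j2\pi \nu T_c k}$ remains. The function names and the normalized Doppler value `nu_tc` are hypothetical and chosen only to make the trend visible; they are not parameters taken from the paper, and in a real system $\nu T_c$ is typically much smaller.

```python
import cmath
import math


def coherent_correlation_gain(n_chips: int, doppler_times_tc: float) -> float:
    """Normalized magnitude of the PN correlation peak under a constant Doppler shift.

    Assumes ideal +/-1 chips, so that after despreading only the per-chip phase
    rotation exp(j*2*pi*nu*Tc*k) is left inside the coherent sum.
    """
    acc = sum(cmath.exp(2j * math.pi * doppler_times_tc * k) for k in range(n_chips))
    return abs(acc) / n_chips


def dirichlet_gain(n_chips: int, doppler_times_tc: float) -> float:
    """Closed form of the same quantity: |sin(pi*nu*Tc*Nc) / (Nc*sin(pi*nu*Tc))|."""
    x = math.pi * doppler_times_tc
    if abs(math.sin(x)) < 1e-15:
        return 1.0
    return abs(math.sin(n_chips * x) / (n_chips * math.sin(x)))


if __name__ == "__main__":
    nu_tc = 1e-3  # hypothetical normalized Doppler shift (nu * Tc), for illustration only
    for n_c in (16, 32, 64, 128, 256):
        print(n_c, round(coherent_correlation_gain(n_c, nu_tc), 4))
```

For a fixed normalized Doppler shift the retained fraction of the peak shrinks as the sequence gets longer, which is the mechanism the paragraph above describes; how much this matters in practice depends on the actual value of $\nu T_c$.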

Comparison with other time-domain methods. Fig. 5

compares the performance of our proposed scheme with

a recently proposed time-domain approach [20, 21] based

on the Orthogonal Matching Pursuit (OMP) CS technique.

The approach in [20, 21] assumes that the channel

vector coefficients remain constant over the whole training

stage (in other words, it assumes a completely stationary

situation with zero Doppler shifts). It can be seen from

Fig. 5 that the proposed scheme exhibits much more robust

performance with respect to the channel time-variations

whereas the approach in [20, 21] fails when the channel is

fast time-varying.
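To make the NNLS step behind the proposed scheme more tangible, here is a minimal sketch of a non-negative least squares fit on a toy problem. It is not the authors' implementation: the paper solves its problem (32) with MATLAB's lsqnonneg, whereas this sketch swaps in plain projected gradient descent so that it runs with the Python standard library only. The matrix `a`, the vector `gamma_true`, and the function name are hypothetical.

```python
def nnls_projected_gradient(a, b, iterations=2000):
    """Approximately solve min_x ||a x - b||^2 subject to x >= 0.

    `a` is a list of rows and `b` a list of measurements. Plain projected
    gradient descent is used as a stand-in for an active-set NNLS solver.
    """
    n_rows, n_cols = len(a), len(a[0])
    # The squared Frobenius norm upper-bounds the largest eigenvalue of a^T a,
    # so 1 / frob_sq is a safe (conservative) step size.
    frob_sq = sum(v * v for row in a for v in row)
    step = 1.0 / frob_sq
    x = [0.0] * n_cols
    for _ in range(iterations):
        residual = [sum(a[i][j] * x[j] for j in range(n_cols)) - b[i] for i in range(n_rows)]
        # Gradient of half the squared residual norm: a^T (a x - b).
        grad = [sum(a[i][j] * residual[i] for i in range(n_rows)) for j in range(n_cols)]
        # Gradient step followed by projection onto the non-negative orthant.
        x = [max(0.0, x[j] - step * grad[j]) for j in range(n_cols)]
    return x


if __name__ == "__main__":
    # Toy 0/1 measurement matrix and sparse non-negative vector (both hypothetical).
    a = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 1]]
    gamma_true = [0.0, 2.0, 0.5]
    b = [sum(r * g for r, g in zip(row, gamma_true)) for row in a]
    gamma_hat = nnls_projected_gradient(a, b)
    strongest = max(range(len(gamma_hat)), key=lambda j: gamma_hat[j])
    print(gamma_hat, strongest)
```

The index of the largest entry of the estimate plays the role of the strongest-path location. In practice a library solver such as lsqnonneg or SciPy's `scipy.optimize.nnls` would normally replace the hand-written loop.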

Remark 1: In all the simulations so far, for simplicity, we have considered
path delays equal to integer multiples of the chip duration, namely,
$\tau_l = G_l \cdot T_c$ for some integer $G_l$. In such cases, any
square-root Nyquist chip pulse [33] yields the same

[Figure 5: four curves plotted against the number of beacon slots $T$:
NNLS slow-varying channel, NNLS fast-varying channel, OMP slow-varying
channel, OMP fast-varying channel.]

Fig. 5: Comparison of the proposed scheme based on NNLS with that in [20, 21]
based on OMP for both slow-varying channels (i.e., when the instantaneous
channel coefficients are time-invariant) and fast-varying channels (i.e., when
the instantaneous channel coefficients decorrelate almost completely from slot
to slot due to the large Doppler spread), where $M = N = 32$,
$M_{\rm RF} = 3$, $N_{\rm RF} = 2$, $\kappa_u = \kappa_v = 8$, $B' = B$,
$N_c = 64$, ${\sf SNR}_{\rm BBF} = -14$ dB.

performance since the samples of any Nyquist pulse at the output of the
matched filter (see, e.g., (14)) are zero at all integer multiples of $T_c$
except 0. In practice, however, the path delays are not integer multiples of
$T_c$ and can generally be written as $\tau_l = G_l \cdot T_c + \Delta\tau_l$
for some $0 < \Delta\tau_l < T_c$, referred to as the fractional part of the
delay. In general, this is not an issue during the data communication phase,
since the delays are well compensated by suitable synchronization at the
receiver, but it may affect the performance of our proposed BA scheme, since
it is unrealistic to assume any proper synchronization during BA. As a result,
in the presence of nonzero fractional delays, the performance of our scheme
may depend on the specific




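[Figure 6: three curves plotted against the number of beacon slots $T$:
NNLS, integer delay; NNLS, RRC, fractional delay; NNLS, Rect, fractional
delay.]

Fig. 6: Illustration of the pulse shaping effect, where $M = N = 32$,
$M_{\rm RF} = 3$, $N_{\rm RF} = 2$, $\kappa_u = \kappa_v = 8$, $B' = B$,
$N_c = 64$, ${\sf SNR}_{\rm BBF} = -14$ dB, and the relative speed of the
strongest path is $\Delta v_{l^\star} = 5$ m/s.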

chip pulse used. In order to investigate this effect, we perform numerical
simulations with two different pulse shapes: root-raised-cosine (RRC) and
rectangular (Rect) pulses [33]. Fig. 6 illustrates the simulation results for
random fractional delays. As expected, we see a slight performance degradation
compared with the ideal integer-delay curve (an additional 5 to 10 slots are
needed for $P_D \geq 0.95$). However, the effect of non-integer delays and of
the specific chip pulse shape is rather small. Furthermore, we observe that
the Rect pulse yields less degradation than the RRC pulse.
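The role of the fractional delay in Remark 1 can be illustrated with the matched-filter output pulses themselves. The sketch below assumes $T_c = 1$, a raised-cosine response for the RRC chip pulse with a hypothetical roll-off factor of 0.25, and a triangular response for the rectangular chip pulse; none of these numerical choices is taken from the paper. It only shows how the chip-spaced samples change when a fractional delay is present, not the resulting BA performance.

```python
import math


def raised_cosine(t: float, beta: float = 0.25) -> float:
    """Matched-filter output of an RRC chip pulse (a raised cosine), with Tc = 1."""
    if abs(t) < 1e-12:
        return 1.0
    sinc = math.sin(math.pi * t) / (math.pi * t)
    if beta > 0 and abs(abs(2 * beta * t) - 1.0) < 1e-9:
        # Removable singularity at |t| = 1 / (2 * beta): the limit is (pi / 4) * sinc(t).
        return (math.pi / 4) * sinc
    return sinc * math.cos(math.pi * beta * t) / (1.0 - (2 * beta * t) ** 2)


def triangle(t: float) -> float:
    """Matched-filter output of a rectangular chip pulse of duration Tc = 1."""
    return max(0.0, 1.0 - abs(t))


def chip_spaced_samples(pulse, fractional_delay: float, span: int = 3):
    """Samples pulse(k - fractional_delay) for k = -span, ..., span."""
    return [pulse(k - fractional_delay) for k in range(-span, span + 1)]


if __name__ == "__main__":
    for delay in (0.0, 0.5):
        print("delay", delay)
        print("  RRC :", [round(v, 3) for v in chip_spaced_samples(raised_cosine, delay)])
        print("  Rect:", [round(v, 3) for v in chip_spaced_samples(triangle, delay)])
```

With an integer delay both responses collapse to a single nonzero sample. With a fractional delay the triangular response spreads over two adjacent chip offsets only, while the raised-cosine response also has smaller tails at more distant offsets.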

B. Effectiveness of Single-Carrier Modulation

After running the BA protocol described in Section IV, and assuming that the
strongest multipath component is correctly identified, we denote this
component by the index $l^\star$. The estimated beamforming vectors for the
data transmission are then given by
$\mathbf{u}_{l^\star} = \mathbf{F}_M \check{\mathbf{u}}_{l^\star}$ at the BS
and $\mathbf{v}_{l^\star} = \mathbf{F}_N \check{\mathbf{v}}_{l^\star}$ at the
UE, where $\check{\mathbf{u}}_{l^\star} \in \mathbb{C}^M$ is an all-zero
vector with a 1 at the component corresponding to the AoD of the
$l^\star$-th scatterer, and $\check{\mathbf{v}}_{l^\star} \in \mathbb{C}^N$
is an all-zero vector with a 1 at the component corresponding to the AoA of
the $l^\star$-th scatterer. In this section we focus on the data transmission
phase, assuming that beam alignment has been achieved. During this phase, a
standard single-carrier linearly modulated signal consisting of $N_d$
information symbols is used. The complex baseband signal is given by
$x(t) = \sum_{n=1}^{N_d} \sqrt{P_{\rm tot} T_d}\, d_n\, p_r(t - nT_d)$, where
$p_r(t)$ denotes a unit-energy square-root Nyquist pulse shaping filter (e.g.,
an RRC pulse), $T_d = 1/B$ is the symbol interval, and $\{d_n\}$ is the
sequence of unit-energy modulation symbols, belonging to a suitable modulation
constellation [33]. From (7) and (8), the received signal including the
transmit and receive beamforming vectors
$(\mathbf{v}_{l^\star}, \mathbf{u}_{l^\star})$ is given by

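\begin{align}
\hat{y}(t) &= \int \mathbf{v}_{l^\star}^{\sf H}\, \mathbf{H}(t,\tau)\, \mathbf{u}_{l^\star}\, x(t-\tau)\, d\tau + z(t) \nonumber \\
&= \sum_{l=1}^{L} \sum_{n=1}^{N_d} C_l\, d_n\, p_r(t - nT_d - \tau_l)\, e^{j2\pi(\check{\nu}_l + \nu_l n T_d)} + z(t), \tag{33}
\end{align}
where $C_l := \sqrt{P_{\rm tot} T_d}\, \rho_l\, \mathbf{v}_{l^\star}^{\sf H}\, \mathbf{a}_R(\phi_l)\, \mathbf{a}_T(\theta_l)^{\sf H}\, \mathbf{u}_{l^\star}$. The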

receiver uses standard timing and carrier synchronization with respect to the
multipath component $l^\star$ selected by the BA algorithm. When BA is
achieved, the SNR of this component is quite large, since it is boosted by the
combined beamforming gain of the UE and the BS. For example, in order to
support a spectral efficiency of 1 bit/s/Hz with practical coded modulation
(e.g., a QPSK constellation with binary coding rate 1/2), the SNR after
beamforming should be between 0 and 3 dB, depending on the coding scheme used.
In these conditions, it is well known that timing and carrier synchronization
can be considered virtually ideal. Therefore, the receiver performs matched
filtering with respect to the symbol pulse $p_r(t)$, sampling at epochs
$t = kT_d + \tau_{l^\star}$, and symbol de-rotation by the factor
$e^{-j2\pi(\check{\nu}_{l^\star} + \nu_{l^\star} n T_d)}$. It follows that the
discrete-time baseband signal at the output of the matched filter and
synchronizer takes the form of (34), where $z^c_n[k]$ denotes the noise at the
output of the matched filter, with variance $N_0$ (see footnote 10), and we
define $\varphi(t) = \int p_r(\tau)\, p_r^*(\tau - t)\, d\tau$. In (34) (a) we
used the fact that, since $p_r(t)$ is a square-root Nyquist pulse,
$\varphi(\bar{k} \cdot T_d)$ is equal to 1 for $\bar{k} = 0$ and is zero
otherwise. The first term in (34) corresponds to the desired symbol $d_k$
multiplied by an overall channel coefficient $C_{l^\star}$ that contains the
beamforming gain achieved by BA, whereas the last two terms correspond to the
inter-symbol interference and the noise, respectively.

The resulting SNR after beamforming, ${\sf SNR}_{\rm ABF}$, is given by (35).
In (35) (a) we used the fact that $\varphi(t) \approx 0$ for $|t| > T_d$, and
thus $\sum_{n \in [N_d]} \mathbb{E}[|C_l\, \varphi((k-n)T_d + \tau_{l^\star} - \tau_l)|^2] \lesssim \mathbb{E}[|C_l|^2]$.
In (35) (b) we used the fact that the interference caused by the other paths
is negligible (compared with the noise floor of the receiver), since for the
paths whose AoA-AoD is away from the beamforming directions the SNR is even
lower than the isotropic SNR ${\sf SNR}_{\rm BBF}$ defined in (29). Finally,
in (35) (c) we used the fact that the dominant path $l^\star$ has nearly full
beamforming gain $MN$. It is seen from (35) that ${\sf SNR}_{\rm ABF}$ is
around $MN$ times larger than ${\sf SNR}_{\rm BBF}$. This justifies the
assumption of nearly ideal timing and carrier recovery.
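As a quick numerical check of this scaling, the following sketch converts a pre-beamforming SNR into the post-beamforming SNR under the idealized assumption that the full gain $MN$ is obtained, i.e., ${\sf SNR}_{\rm ABF} = MN \cdot {\sf SNR}_{\rm BBF}$. The function name is hypothetical; the values $M = N = 32$ and ${\sf SNR}_{\rm BBF} = -14$ dB are the ones used in the simulations above.

```python
import math


def snr_after_beamforming_db(snr_bbf_db: float, m_tx: int, n_rx: int) -> float:
    """SNR_ABF in dB, assuming the idealized relation SNR_ABF = M * N * SNR_BBF."""
    return snr_bbf_db + 10.0 * math.log10(m_tx * n_rx)


if __name__ == "__main__":
    # 32 x 32 antennas give 10*log10(1024), i.e. about 30.1 dB of combined gain.
    print(round(snr_after_beamforming_db(-14.0, 32, 32), 1))
```

With these values the post-beamforming SNR is about 16.1 dB, comfortably above the 0 to 3 dB quoted above for 1 bit/s/Hz with QPSK and rate-1/2 coding.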

Consequently, the ergodic achievable rate in (34) can be

upper and lower bounded as (36) and (37), respectively

[42]. The upper bound (36) is obtained via the Maximum

Ratio Combining for the case where all the delayed

versions of the transmitted signal are separately observable

(this is sometimes referred to as “matched filter upper

bound”). The lower bound is actually achieved by a simple

10 As usual, we assume that the symbol pulse has unit energy, i.e.,
$\int |p_r(t)|^2\, dt = 1$; therefore the noise sample has variance equal to
the noise power spectral density $N_0$ [33].


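\begin{align}
y(t)\big|_{t = kT_d + \tau_{l^\star}} &= \underbrace{\sum_{n=1}^{N_d} d_n C_{l^\star}\, \varphi[(k-n)T_d]}_{\overset{(a)}{=}\; d_k C_{l^\star}}
+ \sum_{n=1}^{N_d} d_n \sum_{l \neq l^\star} C_l\, \varphi[(k-n)T_d + \tau_{l^\star} - \tau_l] \cdot e^{j2\pi(\check{\nu}_l - \check{\nu}_{l^\star} + (\nu_l - \nu_{l^\star}) n T_d)}
+ z^c_n[k], \tag{34}
\end{align}

\begin{align}
{\sf SNR}_{\rm ABF} &= \frac{\mathbb{E}[|C_{l^\star} d_k|^2]}{\sum_{l \neq l^\star} \sum_{n \in [N_d]} \mathbb{E}[|d_n|^2]\, \mathbb{E}[|C_l\, \varphi((k-n)T_d + \tau_{l^\star} - \tau_l)|^2] + \mathbb{E}[|z^c_n[k]|^2]} \nonumber \\
&\overset{(a)}{\approx} \frac{\mathbb{E}[|d_k|^2] \times \mathbb{E}[|C_{l^\star}|^2]}{\mathbb{E}[|d_n|^2] \times \sum_{l \neq l^\star} \mathbb{E}[|C_l|^2] + \mathbb{E}[|z^c_n[k]|^2]} \nonumber \\
&\overset{(b)}{\approx} \frac{\mathbb{E}[|d_k|^2] \times \mathbb{E}[|C_{l^\star}|^2]}{\mathbb{E}[|z^c_n[k]|^2]}
\overset{(c)}{=} \frac{P_{\rm tot} \cdot \gamma_{l^\star} \cdot MN}{N_0 B}, \tag{35}
\end{align}

\begin{align}
R_{\rm ub}^{\star} &= \mathbb{E}\left[ \log_2\left( 1 + \frac{\sum_{l=1}^{L} |C_l\, \varphi(\tau_{l^\star} - \tau_l)|^2}{N_0} \right) \right], \tag{36} \\
R_{\rm lb}^{\star} &= \log_2\left( 1 + \frac{|\mathbb{E}[C_{l^\star}\, \varphi(0)]|^2}{N_0 + {\rm Var}(C_{l^\star}\, \varphi(0)) + \sum_{m \in [N_d]} \sum_{l \neq l^\star} \mathbb{E}[|C_l\, \varphi(mT_d + \tau_{l^\star} - \tau_l)|^2]} \right). \tag{37}
\end{align}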

[Figure 7: achievable rate (bit/s/Hz) versus ${\sf SNR}_{\rm BBF}$ (dB), for
${\sf SNR}_{\rm BBF}$ from −30 to 30 dB, with a marked "Desired Region".]

Fig. 7: The ergodic achievable rate after a successful beam alignment using
the proposed time-domain scheme, where $M = N = 32$, $B' = B$, $N_c = 64$, and
the relative speed of the strongest path is $\Delta v_{l^\star} = 5$ m/s.

receiver that treats all the inter-symbol interference (ISI)
as Gaussian noise.

Ergodic achievable rate bounds. In Fig. 7, we illustrate the upper and lower
bounds (36) and (37) on the achievable ergodic rate as a function of
${\sf SNR}_{\rm BBF}$, under the assumption of successful BA, i.e., that the
BA algorithm found the beam indices of the strongest path (see footnote 11).
It is clear that the lower bound is self-interference limited while the upper
bound is not. However, the gap between the bounds is quite small in the regime
of low pre-beamforming SNR (${\sf SNR}_{\rm BBF} < 10$ dB), which is the
relevant regime in mmWave applications, while the achievable ergodic spectral
efficiency in this regime can be quite high. In particular, it is important to
recall that the lower bound refers to single-carrier transmission without any
equalization. For example, focusing on a realistic spectral efficiency between
1 and 2 bit/s/Hz, we notice that single-carrier transmission with the proposed
BA scheme and no equalization (just standard post-beamforming timing and
frequency synchronization) achieves this spectral efficiency for
${\sf SNR}_{\rm BBF}$ between −30 and −20 dB, with a very small gap with
respect to the best possible equalization (given by the upper bound).

11 As seen before, this happens with probability ≈ 1 after a few tens of
beacon slots.
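The qualitative behavior of the two bounds can be reproduced with a small sketch. It evaluates (36) and (37) in a deliberately simplified special case: the coefficients $C_l$ are treated as fixed rather than random, so the expectations disappear and the variance term in (37) is zero. The raised-cosine pulse with roll-off 0.25, the symmetric window of symbol offsets `span` (standing in for the index set in (37)), and the toy path powers and delays are all assumptions made for illustration.

```python
import math


def raised_cosine(t: float, beta: float = 0.25) -> float:
    """Raised-cosine pulse phi(t) with Td = 1 (matched-filter output of an RRC pulse)."""
    if abs(t) < 1e-12:
        return 1.0
    sinc = math.sin(math.pi * t) / (math.pi * t)
    if beta > 0 and abs(abs(2 * beta * t) - 1.0) < 1e-9:
        return (math.pi / 4) * sinc  # limit at the removable singularity
    return sinc * math.cos(math.pi * beta * t) / (1.0 - (2 * beta * t) ** 2)


def rate_bounds(c_sq, delays, star, n0, span=8):
    """Return (lower, upper) in bit/s/Hz for fixed, non-random coefficients.

    c_sq[l] is |C_l|^2 and delays[l] is tau_l / Td; `star` indexes the path
    selected by beam alignment.
    """
    useful = sum(c * raised_cosine(delays[star] - d) ** 2 for c, d in zip(c_sq, delays))
    upper = math.log2(1.0 + useful / n0)
    # Interference from the non-selected paths, summed over a window of symbol offsets.
    isi = sum(
        c_sq[l] * raised_cosine(m + delays[star] - delays[l]) ** 2
        for l in range(len(c_sq))
        if l != star
        for m in range(-span, span + 1)
    )
    lower = math.log2(1.0 + c_sq[star] / (n0 + isi))
    return lower, upper


if __name__ == "__main__":
    c_sq = [1.0, 1e-3, 1e-3]   # hypothetical post-beamforming path powers
    delays = [0.0, 0.3, 1.7]   # hypothetical delays in units of Td
    for n0 in (1e-1, 1e-3, 1e-6, 1e-9):
        lo, up = rate_bounds(c_sq, delays, star=0, n0=n0)
        print(n0, round(lo, 2), round(up, 2))
```

As the noise level decreases, the lower bound saturates because the residual interference dominates, while the upper bound keeps growing; at higher noise levels the two are close, in line with the discussion above.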

Power delay profile (PDP) before and after BA. Fig. 8
compares the average PDP of the mmWave channel with
$L = 3$ multipath components before and after BA. It can

be seen from Fig. 8 (a) that, before BA, the channel has

a relatively large delay spread and is highly frequency

selective. Moreover, since different multipath components

are mixed with each other and since each one has its own

delay and Doppler shift, the time-domain channel is highly

time-varying. In contrast, as seen from Fig. 8 (b), after

BA, the channel effectively consists of a single multipath

component, thus, it is almost flat in frequency. Also,

note that in contrast with the former case where different



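[Figure 8: two panels, (a) and (b), each showing
$\mathbb{E}[|y_{s,i,j}[k]|^2]$ versus delay.]

Fig. 8: Illustration of the PDP with multipath ($L = 3$) channel in (14).
(a) Before beam alignment. (b) After beam alignment.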

multipath components were mixed with different Doppler

frequencies, in the latter case the Doppler frequency of

the single multipath component can be easily compensated

by standard timing, frequency, and phase synchronization

techniques at the receiver.
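Why the channel collapses to essentially a single tap after BA can be sketched with a toy model. The code below assumes uniform linear arrays whose response is parameterized by a normalized spatial frequency, DFT beams as beamforming vectors (in the spirit of the one-hot selection through $\mathbf{F}_M$ and $\mathbf{F}_N$ above), and three hypothetical paths given as (power, AoD bin, AoA bin). The array model, the path values, and the function names are illustrative assumptions, not the paper's simulation setup.

```python
import cmath
import math


def dft_beam_gain(psi: float, beam_index: int, n_ant: int) -> float:
    """|f^H a(psi)|^2 for a unit-norm DFT beam f and a unit-norm ULA response a(psi).

    psi is the normalized spatial frequency in [0, 1); this ULA model is an assumption.
    """
    acc = sum(
        cmath.exp(2j * math.pi * (psi - beam_index / n_ant) * m) for m in range(n_ant)
    )
    return abs(acc / n_ant) ** 2


def effective_taps(paths, star, m_tx=32, n_rx=32):
    """Per-path power seen through the beam pair aligned with path `star`.

    Each path is (power, aod_bin, aoa_bin); bins may be fractional (off-grid).
    """
    _, aod_star, aoa_star = paths[star]
    u_idx, v_idx = round(aod_star), round(aoa_star)
    taps = []
    for power, aod, aoa in paths:
        g_tx = dft_beam_gain(aod / m_tx, u_idx, m_tx)
        g_rx = dft_beam_gain(aoa / n_rx, v_idx, n_rx)
        # A perfectly aligned path obtains the full gain M * N.
        taps.append(power * m_tx * n_rx * g_tx * g_rx)
    return taps


def strongest_share(powers, star):
    """Fraction of the total power carried by path `star`."""
    return powers[star] / sum(powers)


if __name__ == "__main__":
    paths = [(1.0, 10, 20), (0.5, 15.5, 3.5), (0.3, 25.5, 27.5)]  # hypothetical L = 3 channel
    before = [p for p, _, _ in paths]
    after = effective_taps(paths, star=0)
    print("share before BA:", round(strongest_share(before, 0), 3))
    print("share after BA :", round(strongest_share(after, 0), 6))
```

Before alignment the strongest path carries only a little more than half of the total power in this toy channel; after alignment the other two paths are attenuated by the sidelobes of both beams, so nearly all of the received power sits in one delay tap.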

VI. CONCLUSION

In this paper, we proposed a novel time-domain Beam

Alignment (BA) scheme for mmWave MIMO systems

with a HDA architecture. The proposed scheme is

particularly suited for single-carrier multiuser mmWave

communication, where each user has access to the whole

bandwidth, and all the users within the BS coverage can

be trained simultaneously. We focused on the channel

second-order statistics, incorporating both the random

channel gains and Doppler shifts into the channel matrix to

further capture the realistic features of mmWave channels.

We applied the recently developed Non-Negative Least

Squares (NNLS) technique to efficiently find the strongest

path for each user. Simulation results showed that the proposed scheme incurs
a moderately low training overhead and is robust both to fast time-varying
channels and to large Doppler shifts among different multipath components.
Furthermore, we have shown that the multipath channel after BA reduces
essentially to a single dominant tap. Hence, single-carrier signaling performs
very efficiently: it requires only standard timing and frequency
synchronization (which works well at the high SNR obtained after beamforming)
and no time-domain equalization. This makes the proposed

BA scheme together with single-carrier signaling a strong

contender for future mmWave systems, especially in

outdoor mobile scenarios.

REFERENCES

[1] F. Boccardi, R. W. Heath, A. Lozano, T. L. Marzetta, and

P. Popovski, “Five disruptive technology directions for 5G,” IEEE

Communications Magazine, vol. 52, no. 2, pp. 74–80, 2014.

[2] J. G. Andrews, S. Buzzi, W. Choi, S. V. Hanly, A. Lozano, A. C.

Soong, and J. C. Zhang, “What will 5G be?” IEEE Journal on

selected areas in communications, vol. 32, no. 6, pp. 1065–1082,

2014.

[3] J. Zhao, X. Wang, and H. Viswanathan, “Directional beam

alignment for millimeter wave cellular systems,” in 2016 IEEE

36th International Conference on Distributed Computing Systems

(ICDCS), June 2016, pp. 619–628.

[4] X. Song, S. Haghighatshoar, and G. Caire, “A scalable and

statistically robust beam alignment technique for mm-Wave

systems,” IEEE Trans. on Wireless Comm., vol. PP, pp. 1–1, 2018.

[5] A. F. Molisch, V. V. Ratnam, S. Han, Z. Li, S. L. H. Nguyen,

L. Li, and K. Haneda, “Hybrid beamforming for massive MIMO:

A survey,” IEEE Communications Magazine, vol. 55, no. 9, pp.

134–141, 2017.

[6] R. W. Heath, N. Gonzalez-Prelcic, S. Rangan, W. Roh, and A. M.

Sayeed, “An overview of signal processing techniques for millimeter

wave MIMO systems,” IEEE journal of selected topics in signal

processing, vol. 10, no. 3, pp. 436–453, 2016.

[7] M. R. Akdeniz, Y. Liu, M. K. Samimi, S. Sun, S. Rangan, T. S.

Rappaport, and E. Erkip, “Millimeter wave channel modeling and

cellular capacity evaluation,” IEEE Journal on Selected Areas in

Communications, vol. 32, no. 6, pp. 1164–1179, June 2014.

[8] P. Schniter and A. Sayeed, “Channel estimation and precoder design

for millimeter-wave communications: The sparse way,” in 2014 48th

Asilomar Conference on Signals, Systems and Computers, Nov 2014,

pp. 273–277.

[9] J. Rodríguez-Fernández, N. González-Prelcic, K. Venugopal, and
R. W. Heath Jr, “Frequency-domain compressive channel estimation
for frequency-selective hybrid mmWave MIMO systems,” arXiv
preprint arXiv:1704.08572, 2017.

[10] P. A. Eliasi, S. Rangan, and T. S. Rappaport, “Low-rank spatial

channel estimation for millimeter wave cellular systems,” IEEE

Transactions on Wireless Communications, vol. 16, no. 5, pp.

2748–2759, 2017.

[11] J. Palacios, D. D. Donno, and J. Widmer, “Tracking mm-Wave

channel dynamics: Fast beam training strategies under mobility,”

in IEEE INFOCOM 2017 - IEEE Conference on Computer

Communications, May 2017, pp. 1–9.

[12] S. Hur, T. Kim, D. J. Love, J. V. Krogmeier, T. A. Thomas,

and A. Ghosh, “Millimeter wave beamforming for wireless

backhaul and access in small cell networks,” IEEE Transactions

on Communications, vol. 61, no. 10, pp. 4391–4403, October 2013.

[13] S. Haghighatshoar and G. Caire, “The beam alignment problem in

mmWave wireless networks,” in 2016 50th Asilomar Conference on

Signals, Systems and Computers, Nov 2016, pp. 741–745.

[14] IEEE P802.11ad, Part 11, “Wireless LAN medium access control

(MAC) and physical layer (PHY) specifications amendment 3:

enhancements for very high throughput in the 60 GHz band,” IEEE

Computer Society, 2012.

[15] A. Alkhateeb, O. El Ayach, G. Leus, and R. W. Heath, “Channel

estimation and hybrid precoding for millimeter wave cellular

systems,” Selected Topics in Signal Processing, IEEE Journal of,

vol. 8, no. 5, pp. 831–846, 2014.

[16] M. Kokshoorn, H. Chen, P. Wang, Y. Li, and B. Vucetic, “Millimeter

wave MIMO channel estimation using overlapped beam patterns and

rate adaptation,” IEEE Transactions on Signal Processing, vol. 65,

no. 3, pp. 601–616, 2016.

[17] S. Noh, M. D. Zoltowski, and D. J. Love, “Multi-resolution

codebook and adaptive beamforming sequence design for

millimeter wave beam alignment,” IEEE Transactions on Wireless

Communications, vol. 16, no. 9, pp. 5689–5701, Sept 2017.

[18] M. Hussain and N. Michelusi, “Throughput optimal beam alignment

in millimeter wave networks,” in 2017 Information Theory and

Applications Workshop (ITA), Feb 2017, pp. 1–6.

[19] A. Alkhateeb, G. Leus, and R. W. Heath, “Compressed sensing

based multi-user millimeter wave systems: How many measurements

are needed?” in 2015 IEEE International Conference on Acoustics,

Speech and Signal Processing (ICASSP), April 2015, pp.


2909–2913.

[20] K. Venugopal, A. Alkhateeb, N. G. Prelcic, and R. W.

Heath, “Channel estimation for hybrid architecture-based wideband

millimeter wave systems,” IEEE Journal on Selected Areas in

Communications, vol. 35, no. 9, pp. 1996–2009, 2017.

[21] K. Venugopal, A. Alkhateeb, R. W. Heath, and N. G. Prelcic,

“Time-domain channel estimation for wideband millimeter wave

systems with hybrid architecture,” in Acoustics, Speech and Signal

Processing (ICASSP), 2017 IEEE International Conference on.

IEEE, 2017, Conference Proceedings, pp. 6493–6497.

[22] R. J. Weiler, M. Peter, W. Keusgen, and M. Wisotzki, “Measuring

the busy urban 60 GHz outdoor access radio channel,” in 2014 IEEE

International Conference on Ultra-WideBand (ICUWB), Sept 2014,

pp. 166–170.

[23] V. Va, J. Choi, and R. W. Heath, “The impact of beamwidth

on temporal channel variation in vehicular channels and its

implications,” IEEE Transactions on Vehicular Technology, vol. 66,

no. 6, pp. 5014–5029, 2017.

[24] A. Ghosh, T. A. Thomas, M. C. Cudak, R. Ratasuk, P. Moorut,

F. W. Vook, T. S. Rappaport, G. R. MacCartney, S. Sun, and S. Nie,

“Millimeter-wave enhanced local area systems: A high-data-rate

approach for future wireless networks,” IEEE Journal on Selected

Areas in Communications, vol. 32, no. 6, pp. 1152–1163, June 2014.

[25] S. Buzzi, C. D’Andrea, T. Foggi, A. Ugolini, and G. Colavolpe,

“Single-carrier modulation versus OFDM for millimeter-wave

wireless MIMO,” IEEE Transactions on Communications, vol. PP,

no. 99, pp. 1–1, 2017.

[26] A. Nasser and M. Elsabrouty, “Frequency-selective massive MIMO

channel estimation and feedback in angle-time domain,” in 2016

IEEE Symposium on Computers and Communication (ISCC), June

2016, pp. 1018–1023.

[27] T. Hälsig, D. Cvetkovski, E. Grass, and B. Lankl, “Statistical
properties and variations of LOS MIMO channels at millimeter wave
frequencies,” arXiv preprint arXiv:1803.07768, 2018.

[28] X. Song, S. Haghighatshoar, and G. Caire, “A robust time-domain

beam alignment scheme for multi-user wideband mmWave systems,”

in WSA 2018; 22th International ITG Workshop on Smart Antennas

(to be published), March 2018, pp. 1–7.

[29] P. Bello, “Characterization of randomly time-variant linear

channels,” IEEE Transactions on Communications Systems, vol. 11,

no. 4, pp. 360–393, 1963.

[30] A. Goldsmith, Wireless communications. Cambridge University

Press, 2005.

[31] M. Slawski, M. Hein et al., “Non-negative least squares for

high-dimensional linear models: Consistency and sparse recovery

without regularization,” Electronic Journal of Statistics, vol. 7, pp.

3004–3056, 2013.

[32] R. Kueng and P. Jung, “Robust nonnegative sparse recovery and

the nullspace property of 0/1 measurements,” IEEE Transactions on

Information Theory, vol. 64, no. 2, pp. 689–703, Feb 2018.

[33] J. G. Proakis and M. Salehi, Digital communications. McGraw-Hill,

2008.

[34] A. M. Sayeed, “Deconstructing multiantenna fading channels,” IEEE

Transactions on Signal Processing, vol. 50, no. 10, pp. 2563–2579,

2002.

[35] J. Song, J. Choi, T. Kim, and D. J. Love, “Advanced quantizer

designs for FDD-based FD-MIMO systems using uniform planar

arrays,” IEEE Transactions on Signal Processing, vol. 66, no. 14,

pp. 3891–3905, July 2018.

[36] E. Dahlman, P. Beming, J. Knutsson, F. Ovesjo, M. Persson, and

C. Roobol, “WCDMA – the radio interface for future mobile

multimedia communications,” IEEE Transactions on Vehicular

Technology, vol. 47, no. 4, pp. 1105–1118, 1998.

[37] R. Tibshirani, “Regression shrinkage and selection via the lasso,”

Journal of the Royal Statistical Society. Series B (Methodological),

pp. 267–288, 1996.

[38] D. P. Bertsekas, Convex optimization algorithms. Belmont: Athena
Scientific, 2015.

[39] D. Kim, S. Sra, and I. S. Dhillon, “Tackling box-constrained

optimization via a new projected quasi-Newton approach,” SIAM

Journal on Scientific Computing, vol. 32, no. 6, pp. 3548–3563,

2010.

[40] D. K. Nguyen and T. B. Ho, “Anti-lopsided algorithm for

large-scale nonnegative least square problems,” arXiv preprint

arXiv:1502.01645, 2015.

[41] E. Perahia, C. Cordeiro, M. Park, and L. L. Yang, “IEEE 802.11ad:
Defining the next generation multi-Gbps Wi-Fi,” in Consumer
Communications and Networking Conference (CCNC), 2010 7th
IEEE. IEEE, Conference Proceedings, pp. 1–5.

[42] G. Caire, “On the ergodic rate lower bounds with applications to

massive MIMO,” arXiv preprint arXiv:1705.03577, 2017.

Xiaoshen Song (S’17) received the B.Sc.

degree in Communication Engineering from

Northwestern Polytechnical University, Xi’an,

China, in 2013, and the M.Sc. degree in

Communication and Information Systems from

the Institute of Electronics, University of

Chinese Academy of Sciences, Beijing, China,

in 2016. Her master’s thesis focuses on

video synthetic aperture radar (VideoSAR)

system design and imaging algorithms. She is

currently pursuing the Ph.D. degree with the

Communications and Information Theory (CommIT) group at Technische
Universität Berlin, Berlin, Germany. Her research interests include
wireless communication, mmWave MIMO, and compressed sensing.

Saeid Haghighatshoar (S’12–M’15) received

the B.Sc. degree in Electrical Engineering

(Electronics) in 2007 and the M.Sc. degree

in Electrical Engineering (Communication

Systems) in 2009, both from Sharif University

of Technology, Tehran, Iran, and the Ph.D.

degree in Computer and Communication

Sciences from École Polytechnique Fédérale de Lausanne, Lausanne,
Switzerland, in 2014. Since 2015, he has been a postdoctoral researcher
with the Communications and Information Theory (CommIT) group at
Technische Universität Berlin, Berlin, Germany. His

research interests lie in Information Theory, Communication Systems,

Wireless Communication, Optimization Theory, and Compressed

Sensing.


Giuseppe Caire (S’92 – M’94 – SM’03

– F’05) was born in Torino, Italy, in

1965. He received the B.Sc. in Electrical

Engineering from Politecnico di Torino (Italy),

in 1990, the M.Sc. in Electrical Engineering

from Princeton University in 1992 and the

Ph.D. from Politecnico di Torino in 1994.

He has been a post-doctoral research fellow

with the European Space Agency (ESTEC,

Noordwijk, The Netherlands) in 1994-1995,

Assistant Professor in Telecommunications at

the Politecnico di Torino, Associate Professor at the University of Parma,

Italy, Professor with the Department of Mobile Communications at the

Eurecom Institute, Sophia-Antipolis, France, a Professor of Electrical

Engineering with the Viterbi School of Engineering, University of

Southern California, Los Angeles, and he is currently an Alexander

von Humboldt Professor with the Electrical Engineering and Computer

Science Department of the Technical University of Berlin, Germany.

He served as Associate Editor for the IEEE Transactions on

Communications in 1998-2001 and as Associate Editor for the IEEE

Transactions on Information Theory in 2001-2003. He received the Jack

Neubauer Best System Paper Award from the IEEE Vehicular Technology

Society in 2003, the IEEE Communications Society & Information Theory

Society Joint Paper Award in 2004 and in 2011, the Okawa Research

Award in 2006, the Alexander von Humboldt Professorship in 2014, and

the Vodafone Innovation Prize in 2015. Giuseppe Caire is a Fellow of

IEEE since 2005. He has served in the Board of Governors of the IEEE

Information Theory Society from 2004 to 2007, and as officer from 2008

to 2013. He was President of the IEEE Information Theory Society in

2011. His main research interests are in the field of communications

theory, information theory, channel and source coding with particular

focus on wireless communications.


Data Communication for mmWave

Multi-User MIMO

5.1 Introduction

Hybrid digital analog (HDA) beamforming is the most practical solution for
mmWave communication in terms of implementation cost, performance, and power
efficiency. This chapter presents two HDA mmWave antenna architectures that
can be regarded as two extreme cases, namely, the fully-connected (FC)
architecture and the one-stream-per-subarray (OSPS) architecture. A joint
performance evaluation of the initial beam alignment and the subsequent data
communication is provided, such that the latter uses the beam direction
information obtained by the former. In addition, the power efficiency of the
two architectures is evaluated, taking into account hardware impairments such
as power dissipation and power backoff.

5.2 Clarification of each author's contributions

This chapter is a journal publication, which is joint work with Thomas Kühne
and Giuseppe Caire. I wrote this journal article as the first author. The
citation information is given below:

X. Song, T. Kühne, and G. Caire, “Fully-/Partially-Connected Hybrid
Beamforming Architectures for mmWave MU-MIMO,” IEEE Transactions on Wireless
Communications, 2019. DOI: 10.1109/TWC.2019.2957227


All the authors contributed to this paper. I authored the channel and
signaling models. I implemented the simulations for the beam alignment and
data communication sections. I also wrote the complete first draft (including
all sections) of this paper.

Thomas Kühne authored all the hardware sections. He proposed the HDA antenna
architecture model and the hardware impairment model.

Giuseppe Caire, who is my PhD supervisor, provided valuable discussions in
each meeting on this work. He also made the final revision of the overall
draft.

5.3 Original journal article

The following article is a reprint of the original journal paper. It is the
accepted version of the paper. The copyright information is given on page xii
of this thesis as well as on the first page of the reprinted paper.

Fully- / Partially-Connected Hybrid Beamforming

Architectures for mmWave MU-MIMO

Xiaoshen Song, Student Member, IEEE, Thomas Kühne, and Giuseppe Caire, Fellow, IEEE

X. Song, T. Kühne, and G. Caire, “Fully-/Partially-Connected Hybrid
Beamforming Architectures for mmWave MU-MIMO,” IEEE Transactions on Wireless
Communications, 2019. The published version can be found online:
https://ieeexplore.ieee.org/document/8931770. This reprint is the accepted
version of the paper.

Abstract—Hybrid digital analog (HDA) beamforming has

attracted considerable attention in practical implementation

of millimeter wave (mmWave) multiuser multiple-input

multiple-output (MU-MIMO) systems due to the low power

consumption with respect to its fully digital baseband

counterpart. The implementation cost, performance, and

power efficiency of HDA beamforming depend on the

level of connectivity and reconfigurability of the analog

beamforming network. In this paper, we investigate the

performance of two typical architectures that can be regarded

as extreme cases, namely, the fully-connected (FC) and

the one-stream-per-subarray (OSPS) architectures. In the

FC architecture each RF antenna port is connected to all

antenna elements of the array, while in the OSPS architecture

the RF antenna ports are connected to disjoint subarrays.

We jointly consider the initial beam acquisition and data

communication phases, such that the latter takes place by

using the beam direction information obtained by the former.

We use the state-of-the-art beam alignment (BA) scheme

previously proposed by the authors and consider a family

of MU-MIMO precoding schemes well adapted to the beam

information extracted from the BA phase. We also evaluate

the power efficiency of the two HDA architectures taking

into account the power dissipation at different hardware

components as well as the power backoff under typical power

amplifier constraints. Numerical results show that the two

architectures achieve similar sum spectral efficiency, while

the OSPS architecture is advantageous with respect to the FC

case in terms of hardware complexity and power efficiency,

at the sole cost of a slightly longer BA time-to-acquisition due

to its reduced beam angle resolution.

Index Terms—Millimeter Waves, MU-MIMO, HDA

Beamforming, Beam Acquisition, Spectral Efficiency, Power

Efficiency.

I. INTRODUCTION

Millimeter wave (mmWave) multiuser multiple-input

multiple-output (MU-MIMO) communications have

emerged as one of the most promising techniques for the

second phase of 5G wireless systems, aimed at achieving

broadband data communications at unprecedentedly
high rates (≥ 1 Gb/s per user) in very dense urban
small-cell environments.

The authors are with the Electrical Engineering and Computer Science Department, Technische Universität Berlin, 10587 Berlin, Germany (e-mail: [email protected]).
X. Song is sponsored by the China Scholarship Council (201604910530). This work was funded by the European Union’s Horizon 2020 research and innovation programme under grant agreement No. 779305 (SERENA).

The relatively underutilized mmWave spectrum (30-300 GHz) makes it possible to achieve a

target of ∼1 Gb/s per data stream with ∼1 GHz signal
bandwidth, provided that the system can support a spectral
efficiency of about 1 bit/s/Hz. Such a relatively low spectral
efficiency per stream can be achieved with rather standard
modulation and coding techniques (e.g., binary codes of
rate 1/2 mapped onto a QPSK constellation) when
the signal to interference plus noise ratio (SINR) at the
receiver is between 0 and 3 dB (depending on the gap to
capacity of the underlying code).¹

Due to the severe isotropic pathloss incurred by

mmWave frequencies, large antenna gains are required both

at the base station (BS) side and the user equipment (UE)

side. Fortunately, the small carrier wavelength associated

with mmWave frequencies enables large antenna arrays to

be packed in a small form factor, such that the required

large antenna gain can be obtained using beamforming. For

example, in a single-user scenario where the signal-to-noise

ratio (SNR) at the receiver in isotropic propagation

conditions² is between −30 dB and −20 dB (a quite realistic

situation for outdoor mmWave channels), a combined Tx

and Rx beamforming gain of 30 dB is needed such that,

when the Tx and the Rx beams are well aligned, the

resulting SNR after beamforming reaches the desired target

(a bit above 0 dB, as argued before).
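The link-budget arithmetic in the two paragraphs above can be checked with a minimal sketch. It assumes ideal capacity-achieving coding (so the spectral efficiency is log2(1 + SNR)) and perfectly aligned beams, so that the full 30 dB gain simply adds to the isotropic SNR; the function names are illustrative and not part of the paper.

```python
import math

def db_to_linear(value_db):
    """Convert a power ratio from dB to linear scale."""
    return 10.0 ** (value_db / 10.0)

def spectral_efficiency(snr_db):
    """Shannon spectral efficiency log2(1 + SNR) in bit/s/Hz (ideal coding assumed)."""
    return math.log2(1.0 + db_to_linear(snr_db))

BF_GAIN_DB = 30.0    # combined Tx/Rx beamforming gain quoted in the text
BANDWIDTH_HZ = 1e9   # ~1 GHz signal bandwidth quoted in the text

for snr_bbf_db in (-30.0, -25.0, -20.0):   # isotropic SNR range quoted in the text
    snr_db = snr_bbf_db + BF_GAIN_DB       # assumes perfectly aligned beams
    rate_gbps = BANDWIDTH_HZ * spectral_efficiency(snr_db) / 1e9
    print(f"isotropic {snr_bbf_db:6.1f} dB -> after beamforming {snr_db:4.1f} dB, "
          f"{rate_gbps:.2f} Gb/s")
```

At the low end of the range (−30 dB), the beamformed SNR is 0 dB, which gives exactly 1 bit/s/Hz, i.e., the 1 Gb/s per-stream target over 1 GHz; any gap to capacity of a practical code would have to be covered by additional SNR.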

Realizing fast and accurate digitally steerable

beamforming at mmWave, however, is not a trivial

task. One main challenge is that the conventional full

digital transceiver architecture (with one radio frequency

(RF) chain per antenna element) is infeasible due

to hardware cost, power consumption, and above all

power dissipation in the small integrated array form

factor. Each RF chain consists of (roughly speaking)

analog-to-digital / digital-to-analog (A/D, D/A) converters,

up / down-conversion mixers, filters, power amplifiers

(PAs), and low-noise amplifiers (LNAs). It follows that

a design goal for mmWave transceivers is to reduce the

number of RF chains to be significantly smaller than the

number of antenna array elements. For this reason, the

¹ With ideal single-user capacity-achieving codes for the Gaussian channel, log2(1 + SINR) = 1 bit/s/Hz is achieved for SINR = 1 (i.e., 0 dB). In practice, gaps ranging from a fraction of a dB to 3-4 dB are obtained by the actual coding schemes adopted in current standards.

² Here the isotropic propagation conditions correspond to one active antenna at the transmitter (Tx) and one active antenna at the receiver (Rx), respectively.


concatenation of digital and analog beamforming, known

as hybrid digital analog (HDA) beamforming architecture,

has been widely considered. In such a context, the limited

number of RF chains are used to enable the multistream

baseband processing, while an analog processing is used to

realize the antenna beamforming gain. A primary objective

of HDA beamforming is to maximize the multiuser sum

rate, while keeping the hardware costs, complexity, and

power efficiency, within some desirable targets.

A. Related Work

A large number of works have addressed HDA

beamforming for mmWave communication systems.

Rather than giving a complete account of such considerable

body of literature (out of scope of the present non-tutorial

paper), we consider a few significant representatives and

examine their proposed approaches in a critical manner.

A common assumption in most existing works is that

the analog part of the HDA precoder can only utilize

phase control. This phase control can be realized through

either phase shifters [1–6] or lenses [7, 8]. Consequently,

the problem of finding the (sub-) optimal analog and

digital precoding matrices is transformed into a series

of relatively complicated decomposition steps [2–6],

since the underlying optimization problem is non-convex.

This phase-only control assumption may somewhat

reduce the hardware complexity. However, the signaling

freedom is also drastically reduced and the corresponding

optimization computational complexity is typically

prohibitive for practical real-time implementations.

These drawbacks motivate the exploration of an analog

precoding architecture with both phase and amplitude

controls [9, 10]. In fact, it has been demonstrated in

practice that simultaneous phase and amplitude control

is fully feasible at mmWaves with good accuracy, low

complexity, and low cost [11, 12].

Another severe limitation appearing in several HDA

beamforming works is the assumption of invariant

instantaneous channel coefficients over a large time

duration [1, 13, 14]. It is known that, in order to overcome

the heavy signal attenuation, communication at mmWaves

requires an initial beam acquisition (which we refer to as
beam alignment (BA)) [7, 15, 16]. The goal of BA is to
find a pair of narrow beams connecting each UE with the
BS.³ Thus, the nearly invariant channel assumption only

makes sense after BA is achieved, since once the beams are

aligned, the communication occurs only through a single

narrow path with small effective angular spread, whose

delay and Doppler shift can be easily compensated using

standard synchronization techniques [17–19]. However,

before BA is achieved, the channel delay spread and

time-variation can be large due to the presence of several

multipath components (both the LOS and the non-LOS

(NLOS) paths), each with its own delay and Doppler

shift.

³ E.g., in line-of-sight (LOS) propagation, the aligned directions typically coincide with the AoA and AoD of the LOS path.

In this case, the instantaneous channel coefficients
change very fast. Any BA algorithm relying on an
invariant instantaneous channel assumption is no longer

feasible, since for example, even a small motion of a

few centimeters traverses several wavelengths, potentially

producing multiple deep fades [20, 21].

In addition, a large number of works on HDA

architectures investigated only the data communication

phase and assume full channel state information (CSI)

[2–6, 10, 22, 23], i.e., that the vectors of baseband

complex channel coefficients at each array element

are known. These works focus on the optimization

of the HDA precoder using the full CSI knowledge.

Unfortunately, this assumption is obviously not feasible in

a realistic system. In order to acquire such coefficients,

one should be able to sample each antenna element, i.e.,

one would need an RF chain per antenna element or

exhaustively measure all elements successively. Hence, if

full CSI knowledge were possible, no HDA beamforming

would be needed, since we could simply implement

baseband digital beamforming/multiuser precoding, which

is performance-wise more efficient. As a matter of fact,

it makes sense to study HDA architectures under the

assumption that only a low-dimensional projection of the

channel vectors can be measured by the limited number

of RF chains. To this end, a hybrid precoding scheme

exploiting implicit CSI (i.e., the couplings of all possible

pairs of analog beamforming vectors) was proposed in [24].

However, the work in [24] (as well as in [4, 6, 10, 22, 23])

is limited to a single-user configuration and does not treat

the MU-MIMO case.

It is known that MU-MIMO is superior to single-user

beamforming from a network spectral efficiency

perspective even under HDA, provided that the user

density is rich enough such that the BS can schedule

subsets of UEs to be served by spatial multiplexing

with sufficient angular separation [25, 26]. Hence, this

motivates us to consider the implementation of MU-MIMO

schemes under realistic HDA architecture constraints. Two

“extreme” HDA architectures are depicted in Fig. 1 [27].

Fig. 1 (a) shows a fully-connected (FC) architecture, where

each RF antenna port is connected to all antenna elements

of the array. At the other extreme, Fig. 1 (b) shows

what we refer to as the one-stream-per-subarray (OSPS)

architecture, where each RF antenna port is connected

to a disjoint subarray. A common theme that underlies

most of the HDA works is that the FC architecture

outperforms the OSPS architecture only at the cost of

higher hardware complexity. However, many reference

works [3, 8, 10, 22, 23] ignore hardware impairments

[6], such as the power dissipation and the PA nonlinear

distortion. In particular, the nonlinear PAs employed at

the BS can drastically distort the transmit signal when

operated close to saturation [28]. To this end, a certain

power backoff from the saturation power of a PA should

be considered accordingly for different signaling schemes


and transceiver architectures, such that the PAs can always

work in their linear operating region.

B. Contributions

In this paper we overcome the shortcomings of the

present literature outlined before, and comprehensively

evaluate the performance of HDA architectures (in

particular, as shown in Fig. 1), where we assume both

amplitude and phase control for each analog path. Our

main focus is on the MU-MIMO downlink, but similar

and symmetric conclusions can be reached for the uplink

as well. Our main contributions are summarized as follows:

1) More general and realistic mmWave channel model.

We consider a quite general mmWave wireless channel

model, taking into account the fundamental features of

mmWave channels such as fast time-variation due to

Doppler, frequency-selectivity, and the AoA-AoD sparsity

[20, 21, 29]. The numerical results based on our proposed

channel model are further verified on the 3D geometry

based channel generator QuaDRiGa [30], which has

become a standard tool in industrial R&D as well as in

3GPP standardization.

2) More practical hardware impairments and

power efficiency analysis. When comparing the HDA

beamforming performance of different transmitter

architectures, we take into account the practical hardware

impairments, particularly, the potential power dissipation

of the underlying analog network components, as well

as the unavoidable power backoff for the nonlinear PAs.

While the former is not difficult to compensate, the

latter is highly dependent on the peak-to-average power

ratio (PAPR) of the input signal, which (as illustrated in

Section V) should be carefully investigated in terms of

different signaling and modulation schemes. On top of

the potential hardware impairments, we also evaluate the

power efficiency of the most power consuming PAs with

respect to different transmitter architectures. Numerical

results show that the OSPS architecture with single-carrier

(SC) modulation achieves the highest power efficiency.

3) A joint evaluation of initial BA and data

communication. As mentioned before, a main limitation in

most hybrid beamforming works is that they only focus on

the data communication and assume full CSI. To address

this issue, we consider both initial BA and consecutive data

communication in this paper. We assume that the precoder

in the data communication phase can only exploit a limited

amount of CSI, which is obtained along the beams acquired

in the BA phase. Hence, the signaling and communication

procedure in our paper captures the fundamental features

of practical mmWave communication.

4) Low-complexity data transmit precoding. In the

BA phase, we use our previously proposed BA scheme

[16, 18, 19], after which each UE obtains a sparse estimate

of the channel gains associated to all pairs of AoA-AoD on

a finely spaced discrete grid, corresponding to the Tx and

Rx beamforming codebooks. For the data communication

phase, we consider three alternative precoding options on

top of the effective channel after the BA phase. These are

referred to as beam steering (BST), analog maximum ratio

transmission (MRT), and joint analog maximum ratio and

baseband zeroforcing (MR-ZF), respectively. The proposed

schemes are very suitable for practical implementations

due to their low time overhead and low complexity. In

particular, the MR-ZF precoding scheme proposed in this

paper outperforms the state-of-the-art counterparts in the

literature.

Notation: We denote vectors, matrices, and scalars by $\mathbf{a}$, $\mathbf{A}$, and $a$ ($A$), respectively. For an integer $K \in \mathbb{Z}$, $[K]$ denotes the index set $\{1, ..., K\}$. We represent sets by calligraphic $\mathcal{A}$ and their cardinality by $|\mathcal{A}|$. We use $\mathbb{E}[\cdot]$ for the expectation, $\|\cdot\|$ for the $\ell_2$-norm, $\circledast$ for continuous-time convolution, $\otimes$ for the Kronecker product, and $\odot$ for the Hadamard product.

II. CHANNEL AND SIGNAL MODELS

A. Channel Model

One of the main new features of 5G wireless networks

is the densely spread small cell layer [31]. In small

cell configurations as illustrated in Fig. 2 (a), the BS

creates a fixed arc-like sectorized beam in the elevation

direction. The orientation of the BS beam center in the

elevation direction tends to be fixed with an elevation

angle $\alpha_e$ [32]. It follows that the probing area in the

range direction is restricted and the intensive initial beam

searching takes place mainly in the azimuth direction.

For notational simplicity, in this paper we only focus on
the 2D azimuth plane. Extension to the 3D geometry is
conceptually straightforward, although it may lead to a rather
high-dimensional search for the initial beam acquisition

phase. In the small cell scenario as illustrated in Fig. 2,

where the beam shape in the elevation direction is fixed

a priori in order to define the cell footprint area, the 2D

azimuth geometry is fully justified. We assume that the BS
simultaneously serves $K$ UEs. The BS is equipped with
a uniform linear array (ULA) of $M$ antennas and $M_{\rm RF}$ RF
chains, where $K \le M_{\rm RF} \ll M$. Each UE is equipped with
a ULA of $N$ antennas and $N_{\rm RF} \ll N$ RF chains. Since
the focus of this paper is the BS architecture, we consider
the case of $N_{\rm RF} = 1$; the extension to $N_{\rm RF} > 1$
is straightforward and was considered in our work on BA
[16, 18, 19]. The propagation channel between the BS and
the $k$-th UE, $k \in [K]$, consists of $L_k \ll \max\{M, N\}$
significant multipath components. As a result, the $N \times M$
baseband equivalent impulse response of the channel at
time slot $s$ can be written as

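$$
\mathbf{H}_{s,k}(t,\tau) = \sum_{l=1}^{L_k} \rho_{s,k,l}\, e^{j2\pi\nu_{k,l}t}\, \mathbf{a}_{\rm R}(\phi_{k,l})\, \mathbf{a}_{\rm T}(\theta_{k,l})^{\sf H}\, \delta(\tau-\tau_{k,l}) = \sum_{l=1}^{L_k} \mathbf{H}_{s,k,l}(t)\, \delta(\tau-\tau_{k,l}), \tag{1}
$$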


multipath component $\rho_{s,k,l}$ is constant over a short interval
(within one slot) and changes from slot to slot according
to the statistics of a wide-sense stationary process characterized
by its power spectral density (Doppler spectrum) [33].

When the channel coherence time (related to the inverse

of the bandwidth of the Doppler spectrum, see [33]) is

significantly larger than the slot duration but equal or

smaller than the (non-consecutive) slot separation in time,

a convenient model is to consider the coefficients as i.i.d.

across different slots. Moreover, the Doppler shift νk,l as

defined in (1) introduces a continuous phase rotation for

each channel sample. Each multipath component (channel

tap coefficient) is formed by the superposition of a large

number of micro-scattering components (e.g., due to rough

surfaces) having (approximately) the same AoA-AoD and

delay. By the central limit theorem, it is customary to model

the superposition of these many small effects as Gaussian

[34, 35]. Hence, the multipath component coefficients can

be modeled as Rice fading given by

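$$
\rho_{s,k,l} \sim \sqrt{\gamma_{k,l}} \left( \sqrt{\frac{\eta_{k,l}}{1+\eta_{k,l}}} + \frac{1}{\sqrt{1+\eta_{k,l}}}\, \check{\rho}_{s,k,l} \right), \tag{4}
$$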

where $\gamma_{k,l}$ denotes the overall multipath component
strength, $\eta_{k,l} \in [0, \infty)$ indicates the strength ratio
between the specular reflection (or LOS) and the scattered
components, and $\check{\rho}_{s,k,l} \sim \mathcal{CN}(0,1)$ is a zero-mean
unit-variance complex Gaussian random variable whose
value changes in an i.i.d. fashion across different slots.
In particular, $\eta_{k,l} \to \infty$ indicates a pure LOS path, while
$\eta_{k,l} = 0$ indicates a pure scattered path, affected by
Rayleigh fading.
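As an illustration of the Rice fading model in (4), the following minimal sketch draws slot-wise i.i.d. coefficients and checks empirically that their average power equals the component strength, for both a purely scattered path and a strongly specular one. The function names, the seed, and the number of slots are illustrative assumptions; the limit of an infinite strength ratio is not handled and would need a separate branch.

```python
import math
import random

def sample_rice_coefficient(gamma, eta, rng):
    """Draw one coefficient sqrt(gamma) * (sqrt(eta/(1+eta)) + rho_check/sqrt(1+eta))."""
    # rho_check ~ CN(0, 1): independent real and imaginary parts, each of variance 1/2
    rho_check = complex(rng.gauss(0.0, math.sqrt(0.5)), rng.gauss(0.0, math.sqrt(0.5)))
    specular = math.sqrt(eta / (1.0 + eta))        # deterministic (LOS-like) part
    scattered = rho_check / math.sqrt(1.0 + eta)   # random (scattered) part
    return math.sqrt(gamma) * (specular + scattered)

def mean_power(gamma, eta, num_slots, seed):
    """Empirical average of |rho|^2 over num_slots i.i.d. slots."""
    rng = random.Random(seed)
    total = sum(abs(sample_rice_coefficient(gamma, eta, rng)) ** 2
                for _ in range(num_slots))
    return total / num_slots

print(mean_power(2.0, 0.0, 20000, seed=1))    # Rayleigh case, close to gamma = 2
print(mean_power(1.0, 100.0, 20000, seed=1))  # nearly pure LOS, close to gamma = 1
```

Since the specular and scattered weights have squared sum one, the expected power is the component strength regardless of the strength ratio, which is why that parameter can be interpreted as the overall multipath component strength.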

The AoA-AoDs $(\phi_{k,l}, \theta_{k,l})$ in (1) can take on arbitrary

values in the continuous AoA-AoD domain. Following

the widely used approach of [36], known as beam-domain

representation, we obtain a finite-dimensional

representation of the channel response (1). More precisely,

we consider the discrete set of AoA-AoDs

Φ := ˇ

φ: (1 + sin(ˇ

φ))/2 = n−1

N, n ∈[N],(5a)

Θ := ˇ

θ: (1 + sin(ˇ

θ))/2 = m−1

M, m ∈[M].(5b)

It follows that the corresponding sets AR:= {aR(ˇ

φ) : ˇ

φ∈

Φ}and AT:= {aT(ˇ

θ) : ˇ

θ∈Θ}form discrete dictionaries

to represent the channel response. For the ULAs considered

in this paper, the dictionaries ARand AT, after suitable

normalization, reduce to the columns of unitary Discrete

Fourier Transform (DFT) matrices FN∈CN×Nand

FM∈CM×M, with elements

[FN]n,n0=1

√Nej2π(n−1)( n0−1

N−1

2), n, n0∈[N],(6a)

[FM]m,m0=1

√Mej2π(m−1)( m0−1

M−1

2), m, m0∈[M].

(6b)
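The relation between the angle grid (5) and the DFT dictionary (6) can be verified numerically with a minimal sketch. It assumes a unit-norm ULA response with half-wavelength element spacing, whose n-th entry is proportional to exp(jπ(n−1) sin φ); this form is inferred from (5) and (6) and is an assumption of this sketch, since the array response definition is not reproduced in this section. All function names are illustrative.

```python
import cmath
import math

def dft_dictionary(size):
    """Entries exp(j*2*pi*n*(c/size - 1/2)) / sqrt(size), i.e. (6) with 0-based n, c."""
    scale = 1.0 / math.sqrt(size)
    return [[scale * cmath.exp(2j * math.pi * n * (c / size - 0.5))
             for c in range(size)] for n in range(size)]

def ula_response(sin_angle, size):
    """ASSUMED unit-norm, half-wavelength ULA response for a given sin(angle)."""
    scale = 1.0 / math.sqrt(size)
    return [scale * cmath.exp(1j * math.pi * n * sin_angle) for n in range(size)]

def grid_sines(size):
    """sin of the grid angles in (5): (1 + sin(angle)) / 2 = c / size."""
    return [2.0 * c / size - 1.0 for c in range(size)]

def max_unitarity_error(mat):
    """Largest deviation of mat^H mat from the identity."""
    size = len(mat)
    worst = 0.0
    for a in range(size):
        for b in range(size):
            inner = sum(mat[n][a].conjugate() * mat[n][b] for n in range(size))
            worst = max(worst, abs(inner - (1.0 if a == b else 0.0)))
    return worst

def max_dictionary_mismatch(size):
    """Largest entry-wise gap between DFT columns and ULA responses on the grid."""
    mat = dft_dictionary(size)
    worst = 0.0
    for c, sin_angle in enumerate(grid_sines(size)):
        resp = ula_response(sin_angle, size)
        worst = max(worst, max(abs(mat[n][c] - resp[n]) for n in range(size)))
    return worst

print(max_unitarity_error(dft_dictionary(8)), max_dictionary_mismatch(8))
```

Both printed values are at the level of floating-point rounding, which is consistent with the statement that the dictionaries reduce to the columns of unitary DFT matrices under the assumed array response.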

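Consequently, on a per-subarray basis indexed by $i'$, the
beam-domain representation of the channel response (1) is
given by [7, 36]

$$
\check{\mathbf{H}}^{i'}_{s,k}(t,\tau) = \mathbf{F}_N^{\sf H}\, \mathbf{H}_{s,k}(t,\tau) \cdot \left( \mathbf{F}_M \odot \mathbf{1}_{\{(i'-1)\hat{M}+1 : i'\hat{M},\, 1:M\}} \right) = \sum_{l=1}^{L_k} \check{\mathbf{H}}^{i'}_{s,k,l}(t)\, \delta(\tau-\tau_{k,l}), \tag{7}
$$

where $(i' \equiv 1, \hat{M} = M)$ for the FC architecture,
and $(i' \in [M_{\rm RF}], \hat{M} = \frac{M}{M_{\rm RF}})$ for the OSPS
architecture. Here we define $\check{\mathbf{H}}^{i'}_{s,k,l}(t) := \mathbf{F}_N^{\sf H}\, \mathbf{H}_{s,k,l}(t) \cdot \left( \mathbf{F}_M \odot \mathbf{1}_{\{(i'-1)\hat{M}+1 : i'\hat{M},\, 1:M\}} \right)$ as the beam-domain $l$-th
multipath component between the $k$-th UE and the BS,
where $\mathbf{1}_{\{a_1:a_2,\, b_1:b_2\}} \in \mathbb{C}^{M \times M}$ is an indicator matrix with
ones at the components indexed by rows $a_1$ to $a_2$ and by
columns $b_1$ to $b_2$, and zeros elsewhere. The indicator matrix
takes into account the fact that the number of antenna
elements in each subarray of the OSPS architecture is $M_{\rm RF}$
times smaller than in the FC architecture.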

As shown in our earlier work [16] (and the references
therein), for the FC architecture, as the number of antennas
$M$ at the BS and $N$ at the UE increases, the DFT
basis provides a good sparsification of the propagation
channel. As a result, $\check{\mathbf{H}}^{i'}_{s,k}(t,\tau)$ can be approximated as
a sparse matrix, with non-zero elements in the locations
corresponding to small clusters of discrete AoA-AoD pairs.
For the OSPS architecture, note that the indices of non-zero
elements in $\check{\mathbf{H}}^{i'}_{s,k}(t,\tau)$ are identical for all $i' \in [M_{\rm RF}]$.

However, the channel sparsity depends on the number of

antennas in each subarray. In both cases, we may encounter

a grid error in (7) since the AoAs-AoDs do not necessarily

fall into the uniform grid Φ×Θ. Nevertheless, as shown

in [16], the grid error becomes negligible by increasing the

number of (subarray) antennas (i.e., the grid resolution). In

our simulations, we do not constrain the AoA-AoD pairs

of the physical channel to take on values on the discrete

grid; therefore, the grid discretization effect is fully taken

into account in our numerical results.
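The effect of the subarray mask in (7) on sparsity can be illustrated with a minimal transmit-side sketch for a single path. It evaluates the magnitude of the row vector obtained by multiplying the conjugated array response with the masked DFT matrix, for the FC case (all rows) and for one OSPS subarray. The unit-norm half-wavelength ULA response, the array sizes, and the chosen grid index are assumptions of this sketch, not values from the paper.

```python
import cmath
import math

def beam_spectrum(sin_aod, num_ant, sub_index, sub_size):
    """Magnitudes of a_T^H (F_M masked to the rows of subarray sub_index, 1-based)."""
    rows = range((sub_index - 1) * sub_size, sub_index * sub_size)
    spectrum = []
    for col in range(num_ant):
        sin_grid = 2.0 * col / num_ant - 1.0   # grid angle of DFT column `col`
        acc = sum(cmath.exp(1j * math.pi * n * (sin_grid - sin_aod)) for n in rows)
        spectrum.append(abs(acc) / num_ant)    # 1/sqrt(M) from a_T and from F_M
    return spectrum

def support(spectrum, threshold=1e-6):
    """Indices of the beam-domain entries that are not negligible."""
    return [i for i, value in enumerate(spectrum) if value > threshold]

M, M_RF = 16, 4
sin_on_grid = 2.0 * 5 / M - 1.0                        # AoD exactly on grid column 5
fc = beam_spectrum(sin_on_grid, M, 1, M)               # FC: all M rows active
osps_1 = beam_spectrum(sin_on_grid, M, 1, M // M_RF)   # OSPS, first subarray
osps_2 = beam_spectrum(sin_on_grid, M, 2, M // M_RF)   # OSPS, second subarray

print(support(fc))                              # a single active beam index
print(osps_2.index(max(osps_2)), len(support(osps_2)))
```

In this toy setting the FC spectrum has a single non-zero entry, whereas each OSPS subarray peaks at the same index but with a wider main lobe and an identical magnitude profile across subarrays, which illustrates why the sparsity depends on the number of antennas per subarray.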

B. Signaling Model

Because of space limitations, in this paper we focus
on SC signaling. Similar conclusions can be reached for
OFDM, although the latter is generally more vulnerable to
frame synchronization errors, suffers from a large PAPR, and, before
BA is achieved, is exposed to inter-carrier interference because
the Doppler spread across the several
multipath components may be large [19, 37]. Let $\mathbf{x}_s(t) = [x_{s,1}(t), x_{s,2}(t), ..., x_{s,K}(t)]^{\sf T}$ denote the continuous-time
baseband equivalent signal (either pilot or data signal)
transmitted over the $s$-th slot. With HDA beamforming, the
beamformed signal at the output of the transmitter over the
$s$-th slot is generally given by

$$
\hat{\mathbf{x}}_s(t) = \sqrt{E_0} \cdot \mathbf{U}^{\rm RF}_s \cdot \mathbf{W}^{\rm BB}_s \cdot \mathbf{x}_s(t), \tag{8}
$$


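where for simplicity of exposition we restrict to the case
of uniform power allocation, with $E_0 = \frac{P_{\rm tot} T_c}{K}$ indicating
the per-chip energy of each signal stream, where $P_{\rm tot}$
denotes the total radiated power at the BS and $T_c = \frac{1}{B}$
denotes the chip duration, with $B$ indicating the signaling
bandwidth. In (8), we define $\mathbf{W}^{\rm BB}_s \in \mathbb{C}^{M_{\rm RF} \times K}$ and $\mathbf{U}^{\rm RF}_s \in \mathbb{C}^{M \times M_{\rm RF}}$ as the baseband (digital) and the RF analog
beamforming matrices, respectively. Note that, depending
on the transmitter architecture, the analog beamforming
matrix $\mathbf{U}^{\rm RF}_s$ takes on the form

$$
\left[ \tilde{\mathbf{u}}_{s,1}, \tilde{\mathbf{u}}_{s,2}, \cdots, \tilde{\mathbf{u}}_{s,M_{\rm RF}} \right]
\quad \text{and} \quad
\begin{bmatrix}
\tilde{\mathbf{u}}_{s,1} & \mathbf{0} & \cdots & \mathbf{0} \\
\mathbf{0} & \tilde{\mathbf{u}}_{s,2} & \cdots & \mathbf{0} \\
\vdots & \vdots & \ddots & \vdots \\
\mathbf{0} & \mathbf{0} & \cdots & \tilde{\mathbf{u}}_{s,M_{\rm RF}}
\end{bmatrix}
\tag{9}
$$

for the FC (left) and the OSPS (right) architectures,
respectively, where $\tilde{\mathbf{u}}_{s,i} \in \mathbb{C}^{\hat{M}}$, $i \in [M_{\rm RF}]$, with $\hat{M} = M$
for the FC architecture and $\hat{M} = \frac{M}{M_{\rm RF}}$ for the OSPS
architecture. Hence, in both cases $\mathbf{U}^{\rm RF}_s$ has dimension
$M \times M_{\rm RF}$, but FC has a full matrix, while OSPS has a
block-diagonal matrix, due to the constrained connectivity.
Without loss of generality, the beamforming vectors are
normalized as $\sum_{i=1}^{M_{\rm RF}} \|\tilde{\mathbf{u}}_{s,i}\|^2 = M_{\rm RF}$.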
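The two structures in (9) can be made concrete with a minimal sketch that assembles the analog matrix for both architectures from per-chain beamforming vectors and compares their dimensions and numbers of non-zero entries. The array sizes and the uniform unit-norm weights are illustrative assumptions; reading the number of non-zero entries as the number of analog paths is an interpretation of this sketch rather than a statement from the paper.

```python
import math

def fc_analog_matrix(vectors):
    """FC: column i is the full length-M vector of RF chain i."""
    num_ant = len(vectors[0])
    return [[vec[row] for vec in vectors] for row in range(num_ant)]

def osps_analog_matrix(vectors):
    """OSPS: block-diagonal M x M_RF matrix built from length-(M/M_RF) vectors."""
    sub_size, num_rf = len(vectors[0]), len(vectors)
    matrix = [[0j] * num_rf for _ in range(sub_size * num_rf)]
    for i, vec in enumerate(vectors):
        for row, value in enumerate(vec):
            matrix[i * sub_size + row][i] = value   # chain i drives subarray i only
    return matrix

def uniform_vector(length):
    """Illustrative unit-norm weight vector with equal amplitudes."""
    return [complex(1.0 / math.sqrt(length))] * length

def count_nonzero(matrix):
    return sum(1 for row in matrix for value in row if value != 0)

def squared_norm_sum(vectors):
    return sum(abs(value) ** 2 for vec in vectors for value in vec)

M, M_RF = 8, 2
fc_vectors = [uniform_vector(M) for _ in range(M_RF)]
osps_vectors = [uniform_vector(M // M_RF) for _ in range(M_RF)]
u_fc = fc_analog_matrix(fc_vectors)
u_osps = osps_analog_matrix(osps_vectors)

print(len(u_fc), len(u_fc[0]), count_nonzero(u_fc))        # 8 2 16
print(len(u_osps), len(u_osps[0]), count_nonzero(u_osps))  # 8 2 8
```

Both matrices have the same dimension and satisfy the same normalization with unit-norm vectors, but the OSPS matrix has a factor of M_RF fewer non-zero entries, reflecting its constrained connectivity.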

The beamformed signal (8) goes through the channel

as defined in (1). At the UE side, because of the HDA

architecture, the UE does not have direct access to each

antenna element. Instead, at each slot s, the UE obtains

only a projection of the received signal by applying some

beamforming vector in the analog domain. We consider a

single RF chain at each UE as mentioned before. Thus, the

received signal at the k-th UE side is given by

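$$
\begin{aligned}
\hat{y}_{s,k}(t) &= \mathbf{v}_{s,k}^{\sf H}\, \mathbf{H}_{s,k}(t,\tau) \circledast \hat{\mathbf{x}}_s(t) + z_{s,k}(t) \\
&= \sqrt{E_0}\, \mathbf{v}_{s,k}^{\sf H}\, \mathbf{H}_{s,k}(t,\tau) \circledast \mathbf{U}^{\rm RF}_s \cdot \mathbf{W}^{\rm BB}_s \cdot \mathbf{x}_s(t) + z_{s,k}(t),
\end{aligned} \tag{10}
$$

where $\mathbf{v}_{s,k} \in \mathbb{C}^N$ denotes the normalized beamforming
vector with $\|\mathbf{v}_{s,k}\| = 1$ at the $k$-th UE, and $z_{s,k}(t)$ is the
continuous-time complex Additive White Gaussian Noise
(AWGN) at the output of the UE RF chain, with a Power
Spectral Density (PSD) of $N_0$ Watt/Hz.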

In the following, we will evaluate the performance of

different transmitter architectures as shown in Fig. 1. For

this purpose, it is useful to first define the channel SNR
before beamforming (BBF), ${\sf SNR}_{\rm BBF}$, given by

$$
{\sf SNR}_{{\rm BBF},k} = \frac{P_{\rm tot} \sum_{l=1}^{L_k} \gamma_{k,l}}{N_0 B}, \tag{11}
$$

where $k$ is the index of the UE and $\gamma_{k,l}$ denotes the

strength of the l-th multipath component. The SNR in

(11) indicates the ratio of the total received signal power

(summing over all the multipath components) over the

total noise power at the receiver baseband processor input,

assuming that the signal is isotropically transmitted by

the BS and isotropically received at the k-th UE over

the total bandwidth B. As mentioned before, one of the

challenges of mmWave communication is that the SNR

before beamforming SNRBBF in (11) may be very low.

III. BEAM ACQUISITION AND DATA TRANSMISSION

We evaluate the performance of the FC and OSPS

architectures including both the BA phase and the

consequent data transmission phase, where the latter uses

the beam information obtained by the former. For the

BA phase we use the scheme proposed in our previous

work [19], which compares favorably with several
competing schemes proposed in the literature. Due to
space limitations, we provide here only a high-level
summary of the scheme and refer the reader to
[19] for the full details. Fig. 3 (a) illustrates the considered

frame structure, which consists of three parts: the beacon

slot, the random access control channel (RACCH) slot,

and the data slot. As shown in Fig. 3 (b), the BS

broadcasts its pilot signals periodically over the beacon

slots. The measurements are collected at each UE locally

and independently of other UEs. Based on measurements

accumulated over a sequence of several beacon slots, each

UE can estimate a set of strongly coupled AoA-AoD

pairs, corresponding to the directions of strong propagation

paths between the UE and the BS arrays. These determine

the beamforming direction for possible data transmission.

During the RACCH slot, the BS stays in listening mode and

the UEs send beamformed uplink packets. These packets

contain basic information such as the UE ID and the

beam indices of the selected BS beam directions. The BS

responds with an acknowledgment (ACK) data packet in

the data subslot of a next frame, using the indicated beam

indices for transmission. From this moment on, the BS and

the UE are connected in the sense that, if the procedure is

successful, they can communicate by aligning their beams

along a small number of multipath components with strong

average power transmission.

As explained in detail in [19], the BS beacon signals
are formed by $M_{\rm RF}$ different PN sequences, each of
which undergoes a “multifinger” beam pattern obtained by
selecting a subset of the DFT columns (or masked DFT columns
in the case of OSPS). The beamforming patterns send

the signal energy uniformly distributed along subsets of

the BS AoD grid. The beamforming patterns follow a

pre-determined pseudo-random sequence, similar in the

spirit to the primary synchronization code of a W-CDMA

3G system for BS identification. During the beacon slot,

each UE $k$ receives using its own pseudo-random sequence

of multifinger beam patterns, and integrates the received

signal energy over the multiple time segments within a

beacon slot in order to obtain an estimate of the average

received energy. As a result, this fully non-coherent energy

measurement yields (approximately) the average energy

sum of several multipath components. These multipath

components correspond to the AoA-AoD pairs in the grid


[33], and $N_d$ indicates the number of transmit symbols.
Accordingly, the received data signal at the $k$-th UE is
given by (13), where $\mathbf{w}_{k'}$ denotes the $k'$-th column of
$\mathbf{W}^{\rm BB}$, $\Delta_{k,n,l} = 2\pi(\check{\nu}_{k,l} + \nu_{k,l} n T_c)$, and $C_{k,k',l,n} := \rho_{k,l}\, d_{k',n} \sqrt{E_0} \left( \mathbf{v}_k^{\sf H} \mathbf{a}_{\rm R}(\phi_{k,l}) \mathbf{a}_{\rm T}(\theta_{k,l})^{\sf H} \mathbf{U}^{\rm RF} \mathbf{w}_{k'} \right)$. We assume
that each UE uses standard timing synchronization with
respect to its strongest multipath component, indexed by
$l_1$, which is selected by its initial BA. To decode the
data signal, each UE performs matched filtering with
respect to the symbol pulse $p_r(t)$, sampling at epochs
$t = \hat{n} T_c + \tau_{k,l_1}$. It follows that the discrete-time baseband
signal received at the $k$-th UE receiver takes on the form
of (14), where $\hat{n}^{\Delta}_{k,k',\hat{n},n,l} := (\hat{n} - n) T_c + \tau_{k,l_1} - \tau_{k',l}$,
$\varphi_r[t_\Delta] = \varphi_r(t)|_{t=t_\Delta} := \int p_r(\tau)\, p_r^*(\tau - t_\Delta)\, d\tau$, and $z^c_k[\hat{n}]$
denotes the noise at the output of the matched filter with
variance $N_0 \cdot \int |p_r(t)|^2 dt = N_0$. As we can see, the first
term in (14) corresponds to the desired data symbol $d_{k,n}$
multiplied by a different complex coefficient over each path
$l$.⁶ The last two terms in (14), in turn, correspond to the
multiuser interference and noise, respectively. By treating
the multiuser interference as noise, the asymptotic ergodic
spectral efficiency of the $k$-th UE is given by (15), and the
sum rate reads $R_{\rm sum} = \sum_{k=1}^{K} R_k$. In all the schemes treated
here, coherent communication can be practically achieved
by including per-user beamformed pilot symbols at the cost
of a very small overhead, as is standard practice
in virtually any modern wireless communication standard.
For simplicity, we shall not take into account this overhead
or the degradation of quasi-coherent receivers, which is
well known and not a specific feature of the systems under
consideration.

1) Hybrid Precoding Formulation

Now the remaining problem is how to define the
precoding/combining vectors. We assume that the BS
communicates with the $k$-th UE along its top-$p$ beams.
We will show later that the parameter $p \ge 1$ controls a
tradeoff between the transmitter power spreading, the multiuser
interference, and the system robustness to potential
blockages. To simplify the practical implementation, we
define the combining vector at the $k$-th UE as

$$
\mathbf{v}_k = \frac{1}{\sqrt{p}}\, \mathbf{F}_N \cdot \sum_{p'=1}^{p} \check{\mathbf{v}}_{k,p'}, \tag{16}
$$

where $\check{\mathbf{v}}_{k,p'} \in \mathbb{C}^N$ is an all-zero vector with a 1 at
the component corresponding to the $p'$-th strongest AoA,
i.e., the AoA index of the $p'$-th strongest component in
$\Gamma^{\star}_k$. We denote by $\mathbf{V} \in \mathbb{C}^{NK \times K}$ the aggregated receive
beamforming matrix, given by $\mathbf{V} = {\rm diag}(\mathbf{v}_1, \mathbf{v}_2, ..., \mathbf{v}_K)$.

⁶ In fact, we have shown in our previous work [19] that the phase perturbations over several strong paths are easy to compensate by standard carrier synchronization techniques, given that a successful BA is achieved and the effective channel after BA has a very small time spreading. Due to space limitations, in (14) and also in our simulations we keep the phase perturbations, such that the numerical results correspond to the conservative worst-case scenario.

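It follows that the received data signal vector $\bar{\mathbf{y}}(t) = [y_1(t), y_2(t), ..., y_K(t)]^{\sf T} \in \mathbb{C}^K$ corresponding to the $K$
UEs can be written as

$$
\begin{aligned}
\bar{\mathbf{y}}(t) &= \sqrt{E_0}\, \mathbf{V}^{\sf H} \cdot \mathbf{H}(t,\tau) \circledast \mathbf{U}^{\rm RF} \cdot \mathbf{W}^{\rm BB} \cdot \mathbf{x}_d(t) + \bar{\mathbf{z}}(t) \\
&\overset{(a)}{=} \sqrt{E_0}\, \mathbf{V}^{\sf H} \cdot \mathbf{H}(t,\tau) \cdot \mathbf{U} \cdot \mathbf{A}^{\rm RF} \cdot \mathbf{W}^{\rm BB} \circledast \mathbf{x}_d(t) + \bar{\mathbf{z}}(t) \\
&\overset{(b)}{=} \sqrt{E_0}\, \widetilde{\mathbf{H}}(t,\tau) \cdot \mathbf{A}^{\rm RF} \cdot \mathbf{W}^{\rm BB} \circledast \mathbf{x}_d(t) + \bar{\mathbf{z}}(t),
\end{aligned} \tag{17}
$$

where $\bar{\mathbf{z}}(t) \in \mathbb{C}^K$ indicates the noise vector, $\mathbf{U}^{\rm RF} := \mathbf{U} \cdot \mathbf{A}^{\rm RF}$ is the analog beamforming matrix, $\widetilde{\mathbf{H}}(t,\tau) := \mathbf{V}^{\sf H} \cdot \mathbf{H}(t,\tau) \cdot \mathbf{U}$ denotes a constructed effective channel,
and $\mathbf{H}(t,\tau) \in \mathbb{C}^{NK \times M}$ represents the aggregated
instantaneous channel of all the $K$ UEs, given by

$$
\mathbf{H}(t,\tau) = \left[ \mathbf{H}_1(t,\tau)^{\sf T}, \mathbf{H}_2(t,\tau)^{\sf T}, \cdots, \mathbf{H}_K(t,\tau)^{\sf T} \right]^{\sf T}, \tag{18}
$$

where $\mathbf{H}_k(t,\tau)$, $k \in [K]$, is given in (1). In (17) (a), we
define $\mathbf{U} \in \mathbb{C}^{M \times pK}$ as the angular support, and $\mathbf{A}^{\rm RF} = [\mathbf{a}_1, \mathbf{a}_2, ..., \mathbf{a}_K] \in \mathbb{C}^{pK \times K}$ as the coefficient tuning for the
analog part. More precisely, we assume $\mathbf{U} = [\mathbf{U}_1, ..., \mathbf{U}_K]$,
where $\mathbf{U}_k \in \mathbb{C}^{M \times p}$, $k \in [K]$, takes on the form

$$
\mathbf{U}_k = \left( \mathbf{F}_M \odot \mathbf{1}_{\{(i'-1)\hat{M}+1 : i'\hat{M},\, 1:M\}} \right) \times \left[ \check{\mathbf{u}}_{k,1}, \check{\mathbf{u}}_{k,2}, ..., \check{\mathbf{u}}_{k,p} \right], \tag{19}
$$

where $(i' \equiv 1, \hat{M} = M)$ for the FC architecture, and $(i' = k, \hat{M} = \frac{M}{M_{\rm RF}})$ for the OSPS architecture. Also, we define
$\check{\mathbf{u}}_{k,p'} \in \mathbb{C}^M$, $p' \in [p]$, as an all-zero vector with a 1 at
the component corresponding to the $p'$-th strongest AoD
of $\Gamma^{\star}_k$.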

Notice that in order to construct the beamforming vector at the $k$-th UE and the precoding vectors at the BS, only the AoA-AoD indices of the $p$ strongest components in the estimated channel gain matrix $\Gamma_k^{\star}$ are needed. Then, once these vectors are fixed, the resulting effective channel has much lower dimensions than the original physical $N \times M$ channel (from array to array). Therefore, it can be estimated using orthogonal uplink pilots and channel reciprocity as in regular TDD MU-MIMO (e.g., see [38, 39]). Namely, the constructed effective channel matrix $\tilde{\mathbf{H}}(t,\tau)$ in (17) (b) has dimension $K \times (pK)$, and can be estimated using $pK$ uplink pilot sub-slots using TDD reciprocity.
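The dimension reduction described above can be checked with a small NumPy sketch. It is only an illustration under stated assumptions: the channel entries and combining vectors are random placeholders rather than samples of the model in (1), the beam indices are drawn at random instead of coming from BA, and only the FC case (no row mask in (19)) is shown. The helper names `crandn` and `one_hot_columns` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
M, N, K, p = 128, 16, 4, 2   # BS antennas, UE antennas, scheduled UEs, paths per UE

def crandn(*shape):
    """Complex Gaussian placeholder data (not a mmWave channel model)."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

def one_hot_columns(dim, indices):
    """dim x len(indices) matrix whose j-th column has a single 1 at row indices[j]."""
    cols = np.zeros((dim, len(indices)), dtype=complex)
    cols[indices, np.arange(len(indices))] = 1.0
    return cols

# Aggregated channel as in (18): the K per-UE N x M channels stacked vertically.
H = np.vstack([crandn(N, M) for _ in range(K)])

# V = diag(v_1, ..., v_K): one unit-norm combining vector per UE on the block diagonal.
V = np.zeros((N * K, K), dtype=complex)
for k in range(K):
    v_k = crandn(N)
    V[k * N:(k + 1) * N, k] = v_k / np.linalg.norm(v_k)

# U = [U_1, ..., U_K] for the FC case: p DFT columns per UE, picked by one-hot selectors.
F_M = np.fft.fft(np.eye(M)) / np.sqrt(M)
U = np.hstack([F_M @ one_hot_columns(M, rng.choice(M, size=p, replace=False))
               for _ in range(K)])

# Effective channel of (17) (b): K x (pK) instead of (NK) x M.
H_eff = V.conj().T @ H @ U
print(H.shape, H_eff.shape)   # (64, 128) (4, 8)
```

With the parameters of Section V, the matrix to be estimated shrinks from 64 x 128 to 4 x 8 entries, which is why pK uplink pilot sub-slots suffice.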

2) Beam Steering (BST) Scheme

The BST scheme consists of simply steering the $K$ data streams towards the $K$ UEs along their strongest AoD. Hence, we have $p = 1$ in (16) and in (19), respectively. In this case, the analog tuning matrix and the baseband precoding matrix under the BST precoding scheme reduce to the identity, i.e., $\mathbf{A}_{\mathrm{RF}} = \mathbf{W}_{\mathrm{BB}} = \mathbf{I}_K$. Note that in the BST scheme, we do not need any additional uplink channel estimation of $\tilde{\mathbf{H}}(t,\tau)$. Namely, once each UE has fed back its strongest AoD control packet, the BS can immediately provide the BST precoder.


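\begin{align}
\hat{y}_k(t) &= \sqrt{E_0}\, \mathbf{v}_k^{\mathsf{H}} \mathbf{H}_k(t,\tau) \circledast \mathbf{U}_{\mathrm{RF}} \cdot \mathbf{W}_{\mathrm{BB}} \cdot \mathbf{x}_d(t) + z_k(t) \notag \\
&= \sum_{n=1} \sum_{k'=1} \sum_{l=1}^{L_k} d_{k',n} \sqrt{E_0}\, \mathbf{v}_k^{\mathsf{H}} \mathbf{H}_{k,l}(t) \mathbf{U}_{\mathrm{RF}} \mathbf{w}_{k'}\, p_r(t - \tau_{k',l} - nT_c) + z_k(t) \notag \\
&= \sum_{n=1} \sum_{k'=1} \sum_{l=1}^{L_k} C_{k,k',l,n}\, e^{j\Delta_{k,n,l}}\, p_r(t - \tau_{k',l} - nT_c) + z_k(t) \tag{13}
\end{align}

\begin{align}
y_k[\hat{n}] &= y_k(t)\big|_{t = \hat{n}T_c + \tau_{k,l_1}} = \hat{y}_k(t) \circledast p_r^{*}(-t)\big|_{t = \hat{n}T_c + \tau_{k,l_1}} \notag \\
&= \sum_{n=1} \sum_{k'=1} \sum_{l=1}^{L_k} C_{k,k',l,n}\, e^{j\Delta_{k,n,l}}\, \varphi_r\big(\hat{n}\Delta_{k,k',\hat{n},n,l}\big) + z_k^{c}[\hat{n}] \notag \\
&= \sum_{n=1} \Bigg( \sum_{l=1}^{L_k} C_{k,k,l,n}\, e^{j\Delta_{k,n,l}}\, \varphi_r\big(\hat{n}\Delta_{k,k,\hat{n},n,l}\big) + \sum_{k' \neq k} \sum_{l=1}^{L_k} C_{k,k',l,n}\, e^{j\Delta_{k,n,l}}\, \varphi_r\big(\hat{n}\Delta_{k,k',\hat{n},n,l}\big) \Bigg) + z_k^{c}[\hat{n}] \tag{14}
\end{align}

\begin{equation}
R_k = \log_2 \left( 1 + \frac{\mathbb{E}\Big[ \big| \sum_{l=1}^{L_k} C_{k,k,l,n}\, e^{j\Delta_{k,n,l}}\, \varphi_r\big(\hat{n}\Delta_{k,k,\hat{n},n,l}\big) \big|^2 \Big]}{\mathbb{E}\Big[ \big| \sum_{k' \neq k} \sum_{l=1}^{L_k} C_{k,k',l,n}\, e^{j\Delta_{k,n,l}}\, \varphi_r\big(\hat{n}\Delta_{k,k',\hat{n},n,l}\big) \big|^2 \Big] + N_0} \right) \tag{15}
\end{equation}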

3) Analog Maximum Ratio Transmission (MRT) Scheme

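In this scheme, we aim to maximize the desired signal power as well as to increase the robustness of the scheme against blockage. To this end, the baseband precoding matrix remains the identity, i.e., $\mathbf{W}_{\mathrm{BB}} = \mathbf{I}_K$, while the $k$-th analog MRT tuning vector (i.e., the $k$-th column of $\mathbf{A}_{\mathrm{RF}}$) is given by

\begin{equation}
\mathbf{a}_k = \left( \tilde{\mathbf{H}}(t,\tau)_{\{k,:\}} \right)^{\mathsf{H}} \odot \mathbb{1}_{\{(k'-1)\hat{p}+1 \,:\, k'\hat{p}\}} \cdot \Delta_{\mathrm{RF}}, \tag{20}
\end{equation}

where $\tilde{\mathbf{H}}(t,\tau)_{\{k,:\}}$ indicates the $k$-th row of $\tilde{\mathbf{H}}(t,\tau)$, and $\Delta_{\mathrm{RF}} \in \mathbb{R}_{+}$ denotes the normalizing factor such that $\sum_{i=1}^{M_{\mathrm{RF}}} \|\mathbf{u}_i\|^2 = M_{\mathrm{RF}}$. The indicator vector $\mathbb{1}_{\{(k'-1)\hat{p}+1 \,:\, k'\hat{p}\}} \in \mathbb{C}^{pK}$ has components 1 over the index range $\{(k'-1)\hat{p}+1 : k'\hat{p}\}$ and 0 otherwise, where $(k' \equiv 1, \hat{p} = pK)$ for the FC architecture and $(k' = k, \hat{p} = p)$ for the OSPS architecture. Here the indicator vector ensures that, in the OSPS architecture, the analog beamforming matrix $\mathbf{U}_{\mathrm{RF}} = \mathbf{U} \cdot \mathbf{A}_{\mathrm{RF}}$ satisfies the block diagonal structure as illustrated in (9).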

4) Joint Analog Maximum Ratio and Baseband

Zeroforcing (MR-ZF) Scheme

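On top of the previous MRT scheme, in this joint MR-ZF scheme, we propose to make use of the baseband precoding to further reduce the multiuser interference. Accordingly, the analog MRT vectors in $\mathbf{A}_{\mathrm{RF}}$ are given by (20), while the baseband ZF matrix $\mathbf{W}_{\mathrm{BB}}$ takes on the form

\begin{equation}
\mathbf{W}_{\mathrm{BB}} = \left( \tilde{\mathbf{H}}(t,\tau) \mathbf{A}_{\mathrm{RF}} \right)^{\mathsf{H}} \cdot \left( \tilde{\mathbf{H}}(t,\tau) \mathbf{A}_{\mathrm{RF}} \left( \tilde{\mathbf{H}}(t,\tau) \mathbf{A}_{\mathrm{RF}} \right)^{\mathsf{H}} \right)^{-1} \cdot \Delta_{\mathrm{ZF}}, \tag{21}
\end{equation}

where $\Delta_{\mathrm{ZF}} \in \mathbb{R}_{+}$ is the normalizing factor ensuring the total radiated power constraint, i.e., $\sum_{k=1}^{K} \|\mathbf{w}_k\|^2 = K$.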
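The MRT tuning in (20) and the ZF baseband stage in (21) can be sketched in a few lines of NumPy. This is a minimal illustration, not the simulation code behind the results: the effective channel is a random placeholder, the function names `mrt_analog` and `zf_baseband` are hypothetical, and the normalization of the analog stage assumes that U has orthonormal columns, so that the norm of each column of U_RF equals the norm of the corresponding column of A_RF.

```python
import numpy as np

def mrt_analog(H_eff, p, architecture="FC"):
    """Columns a_k of A_RF as in (20): conjugated k-th row of H_eff, masked for OSPS."""
    K, pK = H_eff.shape
    A = np.zeros((pK, K), dtype=complex)
    for k in range(K):
        mask = np.ones(pK)                  # FC: all pK coefficients are usable
        if architecture == "OSPS":
            mask = np.zeros(pK)
            mask[k * p:(k + 1) * p] = 1.0   # OSPS: only the k-th block of p entries
        A[:, k] = H_eff[k, :].conj() * mask
    # Normalization (assumption: orthonormal U, so the column norms sum to K = M_RF).
    return A * np.sqrt(K / np.sum(np.abs(A) ** 2))

def zf_baseband(H_eff, A_RF):
    """W_BB as in (21): right pseudo-inverse of G = H_eff A_RF, scaled so sum ||w_k||^2 = K."""
    G = H_eff @ A_RF
    W = G.conj().T @ np.linalg.inv(G @ G.conj().T)
    return W * np.sqrt(W.shape[1] / np.sum(np.abs(W) ** 2))

rng = np.random.default_rng(1)
K, p = 4, 2
H_eff = rng.standard_normal((K, p * K)) + 1j * rng.standard_normal((K, p * K))

A_RF = mrt_analog(H_eff, p, "OSPS")
W_BB = zf_baseband(H_eff, A_RF)
end_to_end = H_eff @ A_RF @ W_BB            # scaled identity: no multiuser interference
print(np.round(np.abs(end_to_end), 3))
```

Setting `W_BB` to the identity instead recovers the analog MRT scheme, and additionally fixing p = 1 with an identity `A_RF` recovers BST.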

IV. HARDWARE IMPAIRMENTS

In all the above derivations, we have implicitly assumed

that all the hardware components work in their ideal

range without any distortion or power dissipation. However,

in practical hardware systems, such an assumption is not

trivial to meet. For example, the implementation of HDA

transceivers consists of a large number of power dividers

and combiners in the analog part, particularly for the

FC architecture. The power dissipation caused by these

components has a severe impact on the transmit power and

the power efficiency. Moreover, due to the superposition

of multiple beamformed pilots / data, the input signal at

the PAs may encounter a large PAPR. Also, different

beamforming vectors will create different power levels

for different PAs. As a result, the input power for

some individual PAs may exceed their saturation limit

(relevant to per-antenna power constraint) and even cause

a disruption of the whole transmission. All these hardware

impairments have a severe impact on the transmitter

performance and should not be neglected. In this section,

we will provide the mathematical model to evaluate the

hardware efficiency of different transmitter architectures

given in Fig. 1.


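We assume that each analog path has simultaneous amplitude and phase control as shown in Fig. 1. Referring to (8), let $\tilde{\mathbf{x}} \in \mathbb{C}^{M}$ denote the pre-amplified beamformed signal⁷, given by

\begin{equation}
\tilde{\mathbf{x}} = \sqrt{\alpha_{\mathrm{com}}} \cdot \tilde{\mathbf{U}}_{\mathrm{RF}} \cdot \sqrt{\alpha_{\mathrm{div}}} \cdot \mathbf{W}_{\mathrm{BB}} \cdot \mathbf{x}, \tag{22}
\end{equation}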

where x= [x1,··· , xK]∈CKdenotes the transmit

symbol, with E[|xi|2] = ,i∈[K]. The factor αdiv

indicates the power splitting at the divider, with $\alpha_{\mathrm{div}} = \frac{1}{M}$ for the FC architecture as shown in Fig. 1 (a) and $\alpha_{\mathrm{div}} = \frac{M_{\mathrm{RF}}}{M}$ for the OSPS architecture as shown in Fig. 1 (b). Moreover, the factor $\alpha_{\mathrm{com}}$ models the power dissipation of the combiners, i.e., $\alpha_{\mathrm{com}} = \frac{1}{M_{\mathrm{RF}}}$ for the FC architecture, and $\alpha_{\mathrm{com}} = 1$ for the OSPS architecture. Both

αdiv and αcom result from the hardware implementation

and are based on the corresponding S-parameters of the

dividers and combiners as in [5]. We assume that the

baseband beamforming matrix WBB is of dimension K×K

with K=MRF, and the analog beamforming matrix

URF = [u1, ..., uMRF ]∈CM×MRF satisfies the specific

FC / OSPS architecture as illustrated in (9).

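We consider the rather simple BST precoding with $\mathbf{W}_{\mathrm{BB}} = \mathbf{I}_K$. To first meet the total power constraint, for any $i \in [M_{\mathrm{RF}}]$, we have $\|\mathbf{u}_i\|^2 = M$ for the FC architecture and $\|\mathbf{u}_i\|^2 = \frac{M}{M_{\mathrm{RF}}}$ for the OSPS architecture, respectively. It follows that the effective pre-amplified radiated power of the beamformed signal $\tilde{\mathbf{x}}$ in (22) can be written as

\begin{align}
\tilde{P} = \mathbb{E}[\tilde{\mathbf{x}}^{\mathsf{H}} \tilde{\mathbf{x}}] &= \alpha_{\mathrm{com}} \alpha_{\mathrm{div}} \cdot \mathbb{E}\big[ \mathbf{x}^{\mathsf{H}} (\tilde{\mathbf{U}}_{\mathrm{RF}})^{\mathsf{H}} \tilde{\mathbf{U}}_{\mathrm{RF}} \mathbf{x} \big] \notag \\
&= \alpha_{\mathrm{com}} \alpha_{\mathrm{div}} \cdot \mathrm{tr}\Big( \mathbb{E}[\mathbf{x}\mathbf{x}^{\mathsf{H}}] \cdot (\tilde{\mathbf{U}}_{\mathrm{RF}})^{\mathsf{H}} \tilde{\mathbf{U}}_{\mathrm{RF}} \Big). \tag{23}
\end{align}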

Accordingly, the pre-amplified radiated power for the FC and the OSPS architectures reads $\tilde{P}_{\mathrm{FC}} = M_{\mathrm{RF}} \cdot \frac{1}{M_{\mathrm{RF}}}$ and $\tilde{P}_{\mathrm{OSPS}} = M_{\mathrm{RF}}$, respectively. As we can see, in order to achieve the same output power, the FC transmitter should compensate for an additional combiner power dissipation. More precisely, the transmitter should either boost the input signal as $M_{\mathrm{RF}}\mathbf{x}$ or choose PAs with larger gain for the amplification stage. We consider the former approach and mathematically include the potential boosting factor $M_{\mathrm{RF}}$ as well as the factors $(\alpha_{\mathrm{com}}, \alpha_{\mathrm{div}})$ into the beamforming matrix $\tilde{\mathbf{U}}_{\mathrm{RF}}$. We denote by $\mathbf{U}_{\mathrm{RF}}$ the resulting integrated analog beamforming matrix, such that the pre-amplified beamformed signal in (22) can be written as $\tilde{\mathbf{x}} = \mathbf{U}_{\mathrm{RF}} \cdot \mathbf{W}_{\mathrm{BB}} \cdot \mathbf{x}$, which is consistent with our assumptions and formulations in Section II.
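The trace expression in (23) can be evaluated numerically for both architectures. The sketch below is an illustration under explicit assumptions: the analog matrix has unit-modulus entries on its allowed support, the FC divider factor is taken as 1/M (an M-way split per RF chain), and the per-symbol power `eps` is a hypothetical parameter set to 1, since its value is not recoverable from the text. The function name `preamp_power` is likewise hypothetical.

```python
import numpy as np

def preamp_power(M, M_RF, architecture, eps=1.0, seed=0):
    """Evaluate (23) for BST precoding (W_BB = I) with unit-modulus analog weights."""
    rng = np.random.default_rng(seed)
    phases = np.exp(1j * 2 * np.pi * rng.random((M, M_RF)))
    if architecture == "FC":
        alpha_div, alpha_com = 1.0 / M, 1.0 / M_RF   # assumed M-way divider, M_RF-way combiner
        U = phases                                   # every RF chain drives every antenna
    elif architecture == "OSPS":
        alpha_div, alpha_com = M_RF / M, 1.0         # (M / M_RF)-way divider, no combiner
        U = np.zeros((M, M_RF), dtype=complex)
        sub = M // M_RF
        for i in range(M_RF):                        # block diagonal support, one sub-array each
            U[i * sub:(i + 1) * sub, i] = phases[i * sub:(i + 1) * sub, i]
    else:
        raise ValueError("architecture must be 'FC' or 'OSPS'")
    cov_x = eps * np.eye(M_RF)                       # E[x x^H] for uncorrelated symbols
    return float(np.real(alpha_com * alpha_div * np.trace(cov_x @ U.conj().T @ U)))

P_FC = preamp_power(128, 4, "FC")
P_OSPS = preamp_power(128, 4, "OSPS")
print(P_FC, P_OSPS, P_OSPS / P_FC)
```

Under these assumptions the ratio of the two powers equals M_RF, which is the combiner loss that the FC transmitter has to compensate for.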

⁷ For notational simplicity, here we ignore the slot index $s$ and the time index $t$.

The beamformed signal (22) then goes through the amplification stage, where at each antenna branch a PA amplifies the signal before transmission. We assume that the PAs in different antenna branches have the same input-output relation. For any given antenna in the transmitter array, let $P_{\mathrm{rad}}$ denote the radiated power of the antenna, and $P_{\mathrm{cons}}$ denote the consumed power of the corresponding PA, which includes both the radiated power and the dissipated power. Following the approach in [28], the power consumed by the PA takes on the form

\begin{equation}
P_{\mathrm{cons}} = \frac{\sqrt{P_{\max}}}{\eta_{\max}} \sqrt{P_{\mathrm{rad}}}, \tag{24}
\end{equation}

where $P_{\max}$ is the maximum output power of the PA with $P_{\mathrm{rad}} \leq P_{\max}$, and $\eta_{\max}$ is the maximum efficiency of the PA. Note that this relation holds for the most common PA implementations and is therefore a good choice for the following calculation. Considering that the PAs are often the predominant source of power consumption, we define

\begin{equation}
\eta_{\mathrm{eff}} = \frac{P_{\mathrm{rad}}}{P_{\mathrm{cons}}} \tag{25}
\end{equation}

as the metric to compare the power efficiency of the two transmitter architectures shown in Fig. 1. Note that due to the superposition of multiple beamforming vectors (particularly for the FC architecture) and the potentially high PAPR of the time-domain transmit waveform $\tilde{\mathbf{x}}$ in (22) (particularly with OFDM signaling), the input power for some individual PA may exceed its saturation limit. This would result in non-linear distortion and even the disruption of the whole transmission. To compare the two transmitter architectures and ensure that all the underlying $M$ PAs simultaneously work in their linear range, we generally have two options:

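Option I: Both the FC architecture and the OSPS architecture utilize the same PA but apply a different input back-off $\alpha_{\mathrm{off}} \in (0, 1]$, such that the peak power of the radiated signal is smaller than $P_{\max}$. As a reference, we denote by $(P_{\mathrm{rad},0}, \eta_{\max,0})$ the parameters of a reference PA under the reference precoding/beamforming strategy with a power back-off factor $\alpha_{\mathrm{off},0}$ (as illustrated later in Section V). For different scenarios (with a certain $\alpha_{\mathrm{off}}$), the average radiated power and the consumed power take the form

\begin{equation*}
P_{\mathrm{rad}} = \frac{\alpha_{\mathrm{off}}}{\alpha_{\mathrm{off},0}} P_{\mathrm{rad},0}, \qquad P_{\mathrm{cons}} = \frac{\sqrt{P_{\max,0}}}{\eta_{\max,0}} \sqrt{P_{\mathrm{rad}}}.
\end{equation*}

The transmitter power efficiency is given by

\begin{equation}
\eta_{\mathrm{eff}} = \frac{P_{\mathrm{rad}}}{P_{\mathrm{cons}}} = \frac{\sqrt{P_{\mathrm{rad}}} \cdot \eta_{\max,0}}{\sqrt{P_{\max,0}}}. \tag{26}
\end{equation}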

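Option II: We choose to deploy different PAs for the FC architecture and the OSPS architecture. More precisely, we assume that the underlying PA has a maximum output power of $P_{\max} = \frac{\alpha_{\mathrm{off},0}}{\alpha_{\mathrm{off}}} P_{\max,0}$, where $\alpha_{\mathrm{off}}$ has the same value as in Option I. Consequently, the average radiated power and the consumed power of the underlying PA can be written as

\begin{equation*}
P_{\mathrm{rad}} = P_{\mathrm{rad},0}, \qquad P_{\mathrm{cons}} = \frac{\sqrt{P_{\max,0} \cdot \alpha_{\mathrm{off},0} / \alpha_{\mathrm{off}}}}{\eta_{\max}} \sqrt{P_{\mathrm{rad}}}.
\end{equation*}

The transmitter power efficiency is given by

\begin{equation}
\eta_{\mathrm{eff}} = \frac{P_{\mathrm{rad}}}{P_{\mathrm{cons}}} = \frac{\sqrt{P_{\mathrm{rad}}} \cdot \eta_{\max}}{\sqrt{P_{\max,0} \cdot \alpha_{\mathrm{off},0}}} \cdot \sqrt{\alpha_{\mathrm{off}}}. \tag{27}
\end{equation}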
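The two options can be compared with a short Python sketch that applies (24) to (27) directly. The function names are hypothetical, and the example reference point at the end (an average radiated power equal to the backed-off maximum power) is an assumption made only to obtain concrete numbers.

```python
import math

def pa_consumed_power(p_rad, p_max, eta_max):
    """PA consumed power as in (24); valid only for 0 < p_rad <= p_max."""
    if not 0 < p_rad <= p_max:
        raise ValueError("p_rad must lie in (0, p_max]")
    return math.sqrt(p_max) / eta_max * math.sqrt(p_rad)

def efficiency_option1(alpha_off, alpha_off0, p_rad0, p_max0, eta_max0):
    """Option I: same PA, different back-off. Returns (P_rad, eta_eff) as in (26)."""
    p_rad = alpha_off / alpha_off0 * p_rad0
    p_cons = pa_consumed_power(p_rad, p_max0, eta_max0)
    return p_rad, p_rad / p_cons

def efficiency_option2(alpha_off, alpha_off0, p_rad0, p_max0, eta_max):
    """Option II: PA with scaled maximum power. Returns (P_rad, eta_eff) as in (27)."""
    p_max = alpha_off0 / alpha_off * p_max0
    p_rad = p_rad0
    p_cons = pa_consumed_power(p_rad, p_max, eta_max)
    return p_rad, p_rad / p_cons

# Example numbers in mW; the choice p_rad0 = alpha_off0 * p_max0 is an assumption.
p_max0, eta_max0 = 10 ** 0.6, 0.3          # 6 dBm reference PA, maximum efficiency 0.3
alpha_off0 = 10 ** (-7.2 / 10)             # reference back-off of 7.2 dB
alpha_off = 10 ** (-9.8 / 10)              # a scenario needing 9.8 dB of back-off
p_rad0 = alpha_off0 * p_max0

print(efficiency_option1(alpha_off, alpha_off0, p_rad0, p_max0, eta_max0))
print(efficiency_option2(alpha_off, alpha_off0, p_rad0, p_max0, eta_max0))
```

In Option I the larger back-off shows up as a lower radiated power, whereas in Option II the radiated power is preserved and the penalty appears in the efficiency through the factor containing the square root of the back-off.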

Note that the characteristics (Pmax and ηmax)of

different PAs highly depend on the operation frequency,


implementation, and technology. Aiming at illustrating how

to apply the proposed analysis framework in practical

system design, we will exemplify a set of PA parameters

in Section V to evaluate the efficiency ηeff of the two

architectures given in Fig. 1. For the comparison of BA

and data communication algorithms, we are interested in

the performance of the corresponding algorithms using

the different transmitter architectures but with the same

channel condition as in (11). Therefore, we assume

the same total radiated power Ptot constraint for both

architectures in Fig. 1. In practical systems, this assumption

can be satisfied by applying a certain power backoff as in

Option I or choosing different PAs as in Option II. This in

addition fulfills the per-antenna power constraint, such that

all the underlying PAs work in their linear range with an

identical scalar gain. However, we will show in Section V

that, under the same radiated power constraint, different

architectures may have a different power efficiency.

V. NUMERICAL EVALUATION

We now present the numerical results to evaluate

the proposed precoding schemes and to illustrate the

performance of different transmitter architectures as shown

in Fig. 1. The BA scheme was already extensively

studied in [16, 18, 19] in terms of complexity,

system-level scalability, and robustness to fast channel

time-variations / large Doppler spread. Hence, here we

focus only on the difference in time-to-successful BA

required by the two BS architectures under comparison.

We consider a system with a BS using M= 128 antennas

and MRF = 4 RF chains. The BS simultaneously schedules

K=MRF = 4 UEs, each of which uses N= 16 antennas

and NRF = 1 RF chain. We assume a short preamble

structure used in IEEE 802.11ad [1, 40], where the beacon

slot is of duration t0S= 1.891 µs. The system is assumed

to work at f0= 40 GHz with a bandwidth of B= 0.8GHz,

namely, each beacon slot amounts to more than 1500 chips.

In the following simulations, unless otherwise stated, we will assume a fixed total radiated power constraint Ptot, where all the underlying PAs work in their linear range (w.r.t. the per-antenna power constraint Pmax) with an identical scalar gain. The MU-MIMO channel is generated in two ways:

1) In Section V-A and Section V-B, we use the channel

model in (1) to generate the channel matrix between

each UE kand the BS. Based on the practical mmWave

MIMO channel measurements in [29], we assume Lk= 3,

k∈[K], multipath components for each UE, given by

(γk,1= 1, ηk,1= 100),(γk,2= 0.6, ηk,2= 10) and

(γk,3= 0.4, ηk,3= 0) with respect to (4). Thus, the first

link can be roughly regarded as the LOS path, while the

remaining links represent the NLOS paths. We also assume

that the LOS paths for the simultaneously scheduled UEs

are well separated in the beam domain, while all the NLOS

paths are generated in a random way.

2) In Section V-C, we use the quasi-deterministic radio

channel generator (QuaDRiGa) to generate the propagation

channel matrix. The channel model is based on the

3GPP 38.901 standard and takes into account the spatial

consistency [30, 41]. In this case, the height of the BS

antenna array is set to 10 m. The beam center of the BS is oriented towards the ground with an elevation angle⁸ of αe = −20° as shown in Fig. 2 (a). The simultaneously scheduled UEs are placed at a height of 1.5 m and 18 to 25 m horizontally away from the BS, with a downlink AoD difference of ∆θmin ≈ 8° [42]. Each UE k moves towards the BS at a speed of ∆vk = 1 m/s. We will show

that the numerical results based on our proposed channel

model (1) are consistent with the results based on the

QuaDRiGa generator, implying that the proposed work not

only theoretically but also practically provides valuable

references for mmWave system design.

A. Evaluation of the Proposed Precoding Schemes

The efficiency of the proposed precoding schemes is illustrated in Fig. 6. As a comparison, we also simulate the ZF precoder proposed in [26], where the effective channel is approximated by the initial BA vectors, and only a single path is selected between each UE and the BS. As we can see from Fig. 6 (a), for the FC architecture with no blockage, all the schemes coincide with each other in the range of SNRBBF ≤ 0 dB. When SNRBBF > 0 dB, in contrast, the performance ranking of the underlying precoding schemes is as follows: (MR-ZF, p = 2) ≈ (MR-ZF, p = 1) > (MRT, p = 1) > (MRT, p = 2) > (BST) ≈ (ZF in [26]). Here the MRT scheme with p = 2 performs worse than with p = 1 because the transmit power is spread over multiple paths and because, with multiple receiving directions, the UE tends to collect more interference. However, this effect is not observable in the MR-ZF scheme because of the further power coefficient tuning and interference cancellation provided by the baseband zero-forcing. Next, when the blockage probability of the strongest path is increased while all the weaker paths between each UE and the BS remain unblocked, as shown in Fig. 6 (b) and Fig. 6 (c), the curves with p = 2 drop much less than the others (equivalent to p = 1), and the MR-ZF scheme with p = 2 achieves the best performance. For the OSPS architecture, when there is no blockage as shown in Fig. 6 (d), all the curves (roughly) coincide with each other in the low SNR range (SNRBBF ≤ −10 dB). For SNRBBF > −10 dB, in contrast, the precoding schemes rank as (MR-ZF, p = 2) ≈ (MR-ZF, p = 1) > (MRT, p = 1) ≈ (BST) ≈ (ZF in [26]) > (MRT, p = 2). Similar to the FC case, the MR-ZF scheme for the OSPS architecture achieves the best performance when the blockage probability increases, as shown in Fig. 6 (e) and Fig. 6 (f). As a brief summary w.r.t. the given scenario, for both architectures, when the channel SNR is weak and there is no blockage, we claim that the BST scheme is

⁸ In QuaDRiGa, the elevation angle 90° points to the zenith and 0° points to the horizon.


[Figure 6: six panels (a) to (f), each plotting Rsum (bit/s/Hz) versus SNRBBF (dB) from −30 dB to 30 dB. Panels (a) to (c) show the FC architecture and panels (d) to (f) the OSPS architecture, each with the curves BST; MRT, p = 1; MRT, p = 2; MR-ZF, p = 1; MR-ZF, p = 2; and ZF in [26].]

Fig. 6: The sum spectral efficiency vs. increasing SNRBBF. The blockage probability of the strongest path is given by (a) 0.0, (b) 0.3, (c) 0.6 for the FC architecture, and (d) 0.0, (e) 0.3, (f) 0.6 for the OSPS architecture.

than the latter for PD≥0.95.

Spectral efficiency for the data communication phase. To compare the spectral efficiency of the two transmitter architectures shown in Fig. 1, we consider a no-blockage scenario and focus on two precoding schemes, i.e., the simple BST scheme and the high-performance MR-ZF scheme with p = 2. As we can see in Fig. 7 (b), in the range of SNRBBF ≤ −10 dB, which is the more relevant range in mmWave channels, all four curves coincide with each other. Namely, for either the MR-ZF scheme or the BST scheme, the two architectures achieve a rather similar spectral efficiency. In contrast, when SNRBBF > −10 dB, the MR-ZF scheme performs better. The two architectures with the MR-ZF precoding again achieve a rather similar performance.

Hardware power efficiency. To evaluate the power efficiency of the architectures, unless otherwise stated, we consider the simple BST precoder. Also, since the modulation strongly affects the power efficiency, we take into account both SC and OFDM signaling in this section. We first assume a reference scenario as the baseline, i.e., the OSPS architecture using the BST precoder and SC modulation. We use reference PAs with Pmax,0 = 6 dBm and ηmax,0 = 0.3. The back-off factor with respect to different waveforms and transmitter architectures can be written as αoff = 1/(PPAPR), where PPAPR represents the PAPR of the input signal at a PA. The investigation for 3GPP LTE in [37] showed that, with a probability of 0.999, the PAPR of the LTE SC waveform (known as SC-FDMA) is smaller than ∼7.2 dB and the PAPR of the LTE OFDM waveform (with 512 subcarriers employing QPSK) is smaller than ∼11.4 dB. We set PPAPR to these values for the OSPS architecture. For the FC architecture, however, the input signals of the PAs are the sum of the signals from different RF chains. Since each OFDM signal can be modeled as a Gaussian random process [37] and the signals from different RF chains are independent, the PAPR of the sum is the same as that of one RF chain. For the case of SC signaling, there is no clear work in the literature that shows how the sum of SC signals behaves. We simulated the sum of MRF = 4 SC signals using the same parameters as in [37]. The result shows that, with a probability of 0.999, the PAPR of the sum is smaller than ∼9.8 dB. We apply these values and, without loss of generality, choose αoff,0 = −7.2 dB as the reference scenario. As shown in (26), by deploying the same PAs (Option I), the two architectures achieve the same efficiency for a given Prad. However, as illustrated in Fig. 7 (c), the OSPS architecture with SC signaling (OSPS, SC) achieves the highest Prad, followed by (FC, SC), (OSPS, OFDM), and (FC, OFDM). In contrast, by deploying different PAs (Option II)⁹, Fig. 7 (d) shows that (OSPS, SC) achieves the highest power efficiency, followed by (FC, SC), (OSPS, OFDM) and (FC, OFDM).
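The back-off arithmetic behind this comparison can be reproduced with a few lines of Python. The sketch below only uses the PAPR values quoted above and the scaling laws of Option I (Prad proportional to αoff) and Option II (ηeff proportional to the square root of αoff for equal ηmax); the dictionary layout and variable names are hypothetical. Because the same 11.4 dB PAPR is used for both OFDM cases, this simplified calculation returns identical values for (OSPS, OFDM) and (FC, OFDM).

```python
import math

# PAPR values (dB) exceeded with probability 0.001, as quoted in the text.
PAPR_DB = {
    ("OSPS", "SC"): 7.2,
    ("FC", "SC"): 9.8,
    ("OSPS", "OFDM"): 11.4,
    ("FC", "OFDM"): 11.4,
}

def db_to_linear(x_db):
    """Convert a power ratio from dB to linear scale."""
    return 10 ** (x_db / 10)

alpha_off0 = 1 / db_to_linear(PAPR_DB[("OSPS", "SC")])   # reference back-off (7.2 dB)

results = {}
for scenario, papr_db in PAPR_DB.items():
    alpha_off = 1 / db_to_linear(papr_db)                # alpha_off = 1 / PAPR
    ratio = alpha_off / alpha_off0
    results[scenario] = {
        "prad_rel_option1": ratio,                       # P_rad / P_rad,0 with the same PA
        "eta_rel_option2": math.sqrt(ratio),             # eta_eff / reference eta_eff, from (27)
    }

for scenario, values in results.items():
    print(scenario, {name: round(v, 3) for name, v in values.items()})
```

Relative to the (OSPS, SC) reference, the FC architecture with SC signaling loses 2.6 dB of radiated power under Option I, and either architecture with OFDM loses 4.2 dB.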

To sum up, given the parameters in this paper, the two architectures achieve a similar sum spectral efficiency with certain precoders, but the OSPS architecture outperforms

⁹ Since the ηmax of different PAs highly depends on the technology, for simplicity, we assume that different PAs working in their linear range have roughly the same maximum efficiency ηmax,0.

−30 −20 −10 0 10 20 30

SNRBBF (dB)

Rsum (bit/s/Hz)

(a)

FC, BST

FC, MRT, p= 1

FC, MRT, p= 2

FC, MR-ZF, p= 1

FC, MR-ZF, p= 2

FC, ZF in [26]

−30 −20 −10 0 10 20 30

SNRBBF (dB)

(b)

FC, BST

FC, MRT, p= 1

FC, MRT, p= 2

FC, MR-ZF, p= 1

FC, MR-ZF, p= 2

FC, ZF in [26]

−30 −20 −10 0 10 20 30

SNRBBF (dB)

(c)

FC, BST

FC, MRT, p= 1

FC, MRT, p= 2

FC, MR-ZF, p= 1

FC, MR-ZF, p= 2

FC, ZF in [26]

−30 −20 −10 0 10 20 30

SNRBBF (dB)

Rsum (bit/s/Hz)

(d)

OSPS, BST

OSPS, MRT, p= 1

OSPS, MRT, p= 2

OSPS, MR-ZF, p=1

OSPS, MR-ZF, p= 2

OSPS, ZF in [26]

−30 −20 −10 0 10 20 30

SNRBBF (dB)

(e)

OSPS, BST

OSPS, MRT, p= 1

OSPS, MRT, p= 2

OSPS, MR-ZF, p=1

OSPS, MR-ZF, p= 2

OSPS, ZF in [26]

−30 −20 −10 0 10 20 30

SNRBBF (dB)

(f)

OSPS, BST

OSPS, MRT, p= 1

OSPS, MRT, p= 2

OSPS, MR-ZF, p=1

OSPS, MR-ZF, p= 2

OSPS, ZF in [26]

Fig. 6: The sum spectral eﬀiciency vs. increasing SNRBBF. The blockage probability of the strongest path is given by (a) 0.0,

(b) 0.3, (c) 0.6for the FC architecture, and (d) 0.0, (e) 0.3, (f) 0.6for the OSPS architecture.

than the latter for PD≥0.95.

Spectral eﬀiciency for the data communication phase.

To compare the spectral eﬀiciency of the two transmitter

architectures as shown in Fig. 1, we consider a no-blockage

scenario and focus on two precoding schemes, i.e., the

simple BST scheme and the high-performance MR-ZF

scheme with p= 2. As we can see in Fig. 7 (b), in the

range of SNRBBF ≤ −10 dB, which is more relevant in

mmWave channels, all the 4curves coincide with each

other. Namely, for either the MR-ZF scheme or the BST

scheme, the two architectures achieve a rather similar

spectral eﬀiciency. In contrast, when SNRBBF >−10 dB,

the MR-ZF scheme performs better. The two architectures

with the MR-ZF precoding again achieve a rather similar

performance.

Hardware power eﬀiciency. To evaluate the architecture

power eﬀiciency, otherwise stated, we will consider the

simple BST precoder. Also, since the modulation highly

affects the power eﬀiciency, we will take into account

both the SC and the OFDM signaling in this section.

We first assume a reference scenario as the baseline, i.e,

the OSPS architecture using the BST precoder and a

SC modulation. We use reference PAs with Pmax,0= 6

dBm and ηmax,0= 0.3. The backoff factor with respect

to different waveforms and transmitter architectures can

be written as αoff = 1/(PPAPR), where PPAPR represents

the PAPR of the input signal at a PA. The investigation

for 3GPP LTE in [37] showed that with a probability

of 0.999, the PAPR of the LTE SC waveform (known as

SC-FDMA) is smaller than ∼7.2dB and the PAPR of the

LTE OFDM waveform (with 512 subcarriers employing

QPSK) is smaller than ∼11.4dB. We set PPAPR to these

values for the OSPS architecture. For the FC architecture,

however, the input signals of the PAs are the sum of

the signals from different RF chains. Since each OFDM

signal can be modeled as a Gaussian random process [37]

and the signals from different RF chains are independent,

the PAPR of the sum is the same as of one RF chain.

For the case of SC signaling, there is no clear work in

the literature that shows how the sum of SC signals

behaves. We simulated the sum of MRF = 4 SC signals

using the same parameters as in [37]. The result shows

that with probability of 0.999 the PAPR of the sum is

smaller than ∼9.8dB. We apply these values and without

loss of generality, we choose αoff,0=−7.2dB as the

reference scenario. As shown in (26), by deploying the

same PAs (Option I), the two architectures achieve the

same eﬀiciency for a given Prad. However, as illustrated

in Fig. 7 (c), the OSPS architecture with SC signaling

(OSPS, SC) achieves the highest Prad, followed by (FC,

SC), (OSPS, OFDM), and (FC, OFDM). In contrast, by

deploying different PAs (Option II)9, Fig. 7 (d) shows that

(OSPS, SC) achieves the highest power eﬀiciency, followed

by (FC, SC), (OSPS, OFDM) and (FC, OFDM).

To sum up, given the parameters in this paper, the two

architectures achieve a similar sum spectral eﬀiciency with

certain precoders, but the OSPS architecture outperforms

9Since the ηmax of different PAs highly depends on the technology,

for simplicity, we assume that different PAs working in their linear

range have roughly the same maximum eﬀiciency ηmax,0.

−30 −20 −10 0 10 20 30

SNRBBF (dB)

Rsum (bit/s/Hz)

(a)

FC, BST

FC, MRT, p= 1

FC, MRT, p= 2

FC, MR-ZF, p= 1

FC, MR-ZF, p= 2

FC, ZF in [26]

−30 −20 −10 0 10 20 30

SNRBBF (dB)

(b)

FC, BST

FC, MRT, p= 1

FC, MRT, p= 2

FC, MR-ZF, p= 1

FC, MR-ZF, p= 2

FC, ZF in [26]

−30 −20 −10 0 10 20 30

SNRBBF (dB)

(c)

FC, BST

FC, MRT, p= 1

FC, MRT, p= 2

FC, MR-ZF, p= 1

FC, MR-ZF, p= 2

FC, ZF in [26]

−30 −20 −10 0 10 20 30

SNRBBF (dB)

Rsum (bit/s/Hz)

(d)

OSPS, BST

OSPS, MRT, p= 1

OSPS, MRT, p= 2

OSPS, MR-ZF, p=1

OSPS, MR-ZF, p= 2

OSPS, ZF in [26]

−30 −20 −10 0 10 20 30

SNRBBF (dB)

(e)

OSPS, BST

OSPS, MRT, p= 1

OSPS, MRT, p= 2

OSPS, MR-ZF, p=1

OSPS, MR-ZF, p= 2

OSPS, ZF in [26]

−30 −20 −10 0 10 20 30

SNRBBF (dB)

(f)

OSPS, BST

OSPS, MRT, p= 1

OSPS, MRT, p= 2

OSPS, MR-ZF, p=1

OSPS, MR-ZF, p= 2

OSPS, ZF in [26]

Fig. 6: The sum spectral eﬀiciency vs. increasing SNRBBF. The blockage probability of the strongest path is given by (a) 0.0,

(b) 0.3, (c) 0.6for the FC architecture, and (d) 0.0, (e) 0.3, (f) 0.6for the OSPS architecture.

than the latter for PD≥0.95.

Spectral eﬀiciency for the data communication phase.

To compare the spectral eﬀiciency of the two transmitter

architectures as shown in Fig. 1, we consider a no-blockage

scenario and focus on two precoding schemes, i.e., the

simple BST scheme and the high-performance MR-ZF

scheme with p= 2. As we can see in Fig. 7 (b), in the

range of SNRBBF ≤ −10 dB, which is more relevant in

mmWave channels, all the 4curves coincide with each

other. Namely, for either the MR-ZF scheme or the BST

scheme, the two architectures achieve a rather similar

spectral eﬀiciency. In contrast, when SNRBBF >−10 dB,

the MR-ZF scheme performs better. The two architectures

with the MR-ZF precoding again achieve a rather similar

performance.

Hardware power eﬀiciency. To evaluate the architecture

power eﬀiciency, otherwise stated, we will consider the

simple BST precoder. Also, since the modulation highly

affects the power eﬀiciency, we will take into account

both the SC and the OFDM signaling in this section.

We first assume a reference scenario as the baseline, i.e,

the OSPS architecture using the BST precoder and a

SC modulation. We use reference PAs with Pmax,0= 6

dBm and ηmax,0= 0.3. The backoff factor with respect

to different waveforms and transmitter architectures can

be written as αoff = 1/(PPAPR), where PPAPR represents

the PAPR of the input signal at a PA. The investigation

for 3GPP LTE in [37] showed that with a probability

of 0.999, the PAPR of the LTE SC waveform (known as

SC-FDMA) is smaller than ∼7.2dB and the PAPR of the

LTE OFDM waveform (with 512 subcarriers employing

QPSK) is smaller than ∼11.4dB. We set PPAPR to these

values for the OSPS architecture. For the FC architecture,

however, the input signals of the PAs are the sum of

the signals from different RF chains. Since each OFDM

signal can be modeled as a Gaussian random process [37]

and the signals from different RF chains are independent,

the PAPR of the sum is the same as of one RF chain.

For the case of SC signaling, there is no clear work in

the literature that shows how the sum of SC signals

behaves. We simulated the sum of MRF = 4 SC signals

using the same parameters as in [37]. The result shows

that with probability of 0.999 the PAPR of the sum is

smaller than ∼9.8dB. We apply these values and without

loss of generality, we choose αoff,0=−7.2dB as the

reference scenario. As shown in (26), by deploying the

same PAs (Option I), the two architectures achieve the

same eﬀiciency for a given Prad. However, as illustrated

in Fig. 7 (c), the OSPS architecture with SC signaling

(OSPS, SC) achieves the highest Prad, followed by (FC,

SC), (OSPS, OFDM), and (FC, OFDM). In contrast, by

deploying different PAs (Option II)9, Fig. 7 (d) shows that

(OSPS, SC) achieves the highest power eﬀiciency, followed

by (FC, SC), (OSPS, OFDM) and (FC, OFDM).

To sum up, given the parameters in this paper, the two

architectures achieve a similar sum spectral eﬀiciency with

certain precoders, but the OSPS architecture outperforms

9Since the ηmax of different PAs highly depends on the technology,

for simplicity, we assume that different PAs working in their linear

range have roughly the same maximum eﬀiciency ηmax,0.

−30 −20 −10 0 10 20 30

SNRBBF (dB)

Rsum (bit/s/Hz)

(a)

FC, BST

FC, MRT, p= 1

FC, MRT, p= 2

FC, MR-ZF, p= 1

FC, MR-ZF, p= 2

FC, ZF in [26]

−30 −20 −10 0 10 20 30

SNRBBF (dB)

(b)

FC, BST

FC, MRT, p= 1

FC, MRT, p= 2

FC, MR-ZF, p= 1

FC, MR-ZF, p= 2

FC, ZF in [26]

−30 −20 −10 0 10 20 30

SNRBBF (dB)

(c)

FC, BST

FC, MRT, p= 1

FC, MRT, p= 2

FC, MR-ZF, p= 1

FC, MR-ZF, p= 2

FC, ZF in [26]

−30 −20 −10 0 10 20 30

SNRBBF (dB)

Rsum (bit/s/Hz)

(d)

OSPS, BST

OSPS, MRT, p= 1

OSPS, MRT, p= 2

OSPS, MR-ZF, p=1

OSPS, MR-ZF, p= 2

OSPS, ZF in [26]

−30 −20 −10 0 10 20 30

SNRBBF (dB)

(e)

OSPS, BST

OSPS, MRT, p= 1

OSPS, MRT, p= 2

OSPS, MR-ZF, p=1

OSPS, MR-ZF, p= 2

OSPS, ZF in [26]

−30 −20 −10 0 10 20 30

SNRBBF (dB)

(f)

OSPS, BST

OSPS, MRT, p= 1

OSPS, MRT, p= 2

OSPS, MR-ZF, p=1

OSPS, MR-ZF, p= 2

OSPS, ZF in [26]

Fig. 6: The sum spectral eﬀiciency vs. increasing SNRBBF. The blockage probability of the strongest path is given by (a) 0.0,

(b) 0.3, (c) 0.6for the FC architecture, and (d) 0.0, (e) 0.3, (f) 0.6for the OSPS architecture.

than the latter for PD≥0.95.

Spectral eﬀiciency for the data communication phase.

To compare the spectral eﬀiciency of the two transmitter

architectures as shown in Fig. 1, we consider a no-blockage

scenario and focus on two precoding schemes, i.e., the

simple BST scheme and the high-performance MR-ZF

scheme with p= 2. As we can see in Fig. 7 (b), in the

range of SNRBBF ≤ −10 dB, which is more relevant in

mmWave channels, all the 4curves coincide with each

other. Namely, for either the MR-ZF scheme or the BST

scheme, the two architectures achieve a rather similar

spectral eﬀiciency. In contrast, when SNRBBF >−10 dB,

the MR-ZF scheme performs better. The two architectures

with the MR-ZF precoding again achieve a rather similar

performance.

Hardware power eﬀiciency. To evaluate the architecture

power eﬀiciency, otherwise stated, we will consider the

simple BST precoder. Also, since the modulation highly

affects the power eﬀiciency, we will take into account

both the SC and the OFDM signaling in this section.

We first assume a reference scenario as the baseline, i.e,

the OSPS architecture using the BST precoder and a

SC modulation. We use reference PAs with Pmax,0= 6

dBm and ηmax,0= 0.3. The backoff factor with respect

to different waveforms and transmitter architectures can

be written as αoff = 1/(PPAPR), where PPAPR represents

the PAPR of the input signal at a PA. The investigation

for 3GPP LTE in [37] showed that with a probability

of 0.999, the PAPR of the LTE SC waveform (known as

SC-FDMA) is smaller than ∼7.2dB and the PAPR of the

LTE OFDM waveform (with 512 subcarriers employing

QPSK) is smaller than ∼11.4dB. We set PPAPR to these

values for the OSPS architecture. For the FC architecture,

however, the input signals of the PAs are the sum of

the signals from different RF chains. Since each OFDM

signal can be modeled as a Gaussian random process [37]

and the signals from different RF chains are independent,

the PAPR of the sum is the same as of one RF chain.

For SC signaling, there is no clear result in the literature on how the sum of SC signals behaves. We therefore simulated the sum of $M_{\mathrm{RF}} = 4$ SC signals using the same parameters as in [37]. The result shows that, with probability 0.999, the PAPR of the sum is smaller than approximately 9.8 dB. We apply these values and, without loss of generality, choose $\alpha_{\mathrm{off},0} = -7.2$ dB as the reference scenario. As shown in (26), by deploying the same PAs (Option I), the two architectures achieve the same efficiency for a given $P_{\mathrm{rad}}$. However, as illustrated in Fig. 7 (c), the OSPS architecture with SC signaling (OSPS, SC) achieves the highest $P_{\mathrm{rad}}$, followed by (FC, SC), (OSPS, OFDM), and (FC, OFDM). In contrast, by deploying different PAs (Option II; see footnote 9), Fig. 7 (d) shows that (OSPS, SC) achieves the highest power efficiency, followed by (FC, SC), (OSPS, OFDM), and (FC, OFDM).
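The backoff values used in this comparison follow directly from $\alpha_{\mathrm{off}} = 1/P_{\mathrm{PAPR}}$, which in decibels reads $\alpha_{\mathrm{off}}\,[\mathrm{dB}] = -P_{\mathrm{PAPR}}\,[\mathrm{dB}]$. The sketch below tabulates them for the four (architecture, waveform) pairs using the PAPR values quoted in the text. The mapping from backoff to average PA output power, taken here as $P_{\max,0}$ plus the backoff in dB, is an assumption made only for illustration, and all identifiers are hypothetical.

```python
# PAPR levels (dB) not exceeded with probability 0.999, as quoted in the text.
PAPR_DB = {
    ("OSPS", "SC"): 7.2,
    ("OSPS", "OFDM"): 11.4,
    ("FC", "SC"): 9.8,     # simulated sum of M_RF = 4 SC signals
    ("FC", "OFDM"): 11.4,  # Gaussian sum: same as a single RF chain
}

P_MAX0_DBM = 6.0      # reference PA maximum output power
ALPHA_OFF0_DB = -7.2  # reference backoff, i.e. the (OSPS, SC) case


def db_to_linear(value_db: float) -> float:
    """Convert a power ratio from dB to linear scale."""
    return 10.0 ** (value_db / 10.0)


def summarize() -> list:
    """Return one record per case, sorted from smallest to largest backoff."""
    rows = []
    for case, papr_db in PAPR_DB.items():
        alpha_db = -papr_db  # alpha_off = 1 / P_PAPR, expressed in dB
        rows.append({
            "case": case,
            "alpha_off_db": alpha_db,
            "alpha_off_lin": db_to_linear(alpha_db),
            # Assumption: average PA output = maximum output reduced by the backoff.
            "avg_out_dbm": P_MAX0_DBM + alpha_db,
            "rel_to_ref_db": alpha_db - ALPHA_OFF0_DB,
        })
    rows.sort(key=lambda r: r["alpha_off_db"], reverse=True)
    return rows


for row in summarize():
    arch, wave = row["case"]
    print(f"({arch}, {wave}): alpha_off = {row['alpha_off_db']:.1f} dB, "
          f"average PA output = {row['avg_out_dbm']:.1f} dBm, "
          f"relative to reference = {row['rel_to_ref_db']:.1f} dB")
```

On backoff alone, (OSPS, SC) and (FC, SC) come first and second, while the two OFDM cases tie at 11.4 dB. The sketch therefore does not by itself explain the ordering between (OSPS, OFDM) and (FC, OFDM) reported for Fig. 7, which depends on architecture details not modeled here.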

To sum up, given the parameters in this paper, the two architectures achieve a similar sum spectral efficiency with certain precoders, but the OSPS architecture outperforms

9 Since the $\eta_{\max}$ of different PAs depends strongly on the technology, for simplicity we assume that different PAs working in their linear range have roughly the same maximum efficiency $\eta_{\max,0}$.

[Figure 6, panels (a) to (f). Horizontal axis: $\mathrm{SNR}_{\mathrm{BBF}}$ (dB), from −30 to 30. Vertical axis: $R_{\mathrm{sum}}$ (bit/s/Hz). Curves in panels (a) to (c): FC with BST; MRT, $p = 1$; MRT, $p = 2$; MR-ZF, $p = 1$; MR-ZF, $p = 2$; ZF in [26]. Curves in panels (d) to (f): the same precoders with OSPS.]

Fig. 6: The sum spectral efficiency vs. increasing $\mathrm{SNR}_{\mathrm{BBF}}$. The blockage probability of the strongest path is (a) 0.0, (b) 0.3, (c) 0.6 for the FC architecture, and (d) 0.0, (e) 0.3, (f) 0.6 for the OSPS architecture.

than the latter for $P_D \ge 0.95$.



Hardware power eﬀiciency. To evaluate the architecture

power eﬀiciency, otherwise stated, we will consider the

simple BST precoder. Also, since the modulation highly

affects the power eﬀiciency, we will take into account

both the SC and the OFDM signaling in this section.

We first assume a reference scenario as the baseline, i.e,

the OSPS architecture using the BST precoder and a

SC modulation. We use reference PAs with Pmax,0= 6

dBm and ηmax,0= 0.3. The backoff factor with respect

to different waveforms and transmitter architectures can

be written as αoff = 1/(PPAPR), where PPAPR represents

the PAPR of the input signal at a PA. The investigation

for 3GPP LTE in [37] showed that with a probability

of 0.999, the PAPR of the LTE SC waveform (known as

SC-FDMA) is smaller than ∼7.2dB and the PAPR of the

LTE OFDM waveform (with 512 subcarriers employing

QPSK) is smaller than ∼11.4dB. We set PPAPR to these

values for the OSPS architecture. For the FC architecture,

however, the input signals of the PAs are the sum of

the signals from different RF chains. Since each OFDM

signal can be modeled as a Gaussian random process [37]

and the signals from different RF chains are independent,

the PAPR of the sum is the same as of one RF chain.

For the case of SC signaling, there is no clear work in

the literature that shows how the sum of SC signals

behaves. We simulated the sum of MRF = 4 SC signals

using the same parameters as in [37]. The result shows

that with probability of 0.999 the PAPR of the sum is

smaller than ∼9.8dB. We apply these values and without

loss of generality, we choose αoff,0=−7.2dB as the

reference scenario. As shown in (26), by deploying the

same PAs (Option I), the two architectures achieve the

same eﬀiciency for a given Prad. However, as illustrated

in Fig. 7 (c), the OSPS architecture with SC signaling

(OSPS, SC) achieves the highest Prad, followed by (FC,

SC), (OSPS, OFDM), and (FC, OFDM). In contrast, by

deploying different PAs (Option II)9, Fig. 7 (d) shows that

(OSPS, SC) achieves the highest power eﬀiciency, followed

by (FC, SC), (OSPS, OFDM) and (FC, OFDM).

To sum up, given the parameters in this paper, the two

architectures achieve a similar sum spectral eﬀiciency with

certain precoders, but the OSPS architecture outperforms

9Since the ηmax of different PAs highly depends on the technology,

for simplicity, we assume that different PAs working in their linear

range have roughly the same maximum eﬀiciency ηmax,0.

−30 −20 −10 0 10 20 30

SNRBBF (dB)

Rsum (bit/s/Hz)

(a)

FC, BST

FC, MRT, p= 1

FC, MRT, p= 2

FC, MR-ZF, p= 1

FC, MR-ZF, p= 2

FC, ZF in [26]

−30 −20 −10 0 10 20 30

SNRBBF (dB)

(b)

FC, BST

FC, MRT, p= 1

FC, MRT, p= 2

FC, MR-ZF, p= 1

FC, MR-ZF, p= 2

FC, ZF in [26]

−30 −20 −10 0 10 20 30

SNRBBF (dB)

(c)

FC, BST

FC, MRT, p= 1

FC, MRT, p= 2

FC, MR-ZF, p= 1

FC, MR-ZF, p= 2

FC, ZF in [26]

−30 −20 −10 0 10 20 30

SNRBBF (dB)

Rsum (bit/s/Hz)

(d)

OSPS, BST

OSPS, MRT, p= 1

OSPS, MRT, p= 2

OSPS, MR-ZF, p=1

OSPS, MR-ZF, p= 2

OSPS, ZF in [26]

−30 −20 −10 0 10 20 30

SNRBBF (dB)

(e)

OSPS, BST

OSPS, MRT, p= 1

OSPS, MRT, p= 2

OSPS, MR-ZF, p=1

OSPS, MR-ZF, p= 2

OSPS, ZF in [26]

−30 −20 −10 0 10 20 30

SNRBBF (dB)

(f)

OSPS, BST

OSPS, MRT, p= 1

OSPS, MRT, p= 2

OSPS, MR-ZF, p=1

OSPS, MR-ZF, p= 2

OSPS, ZF in [26]

Fig. 6: The sum spectral eﬀiciency vs. increasing SNRBBF. The blockage probability of the strongest path is given by (a) 0.0,

(b) 0.3, (c) 0.6for the FC architecture, and (d) 0.0, (e) 0.3, (f) 0.6for the OSPS architecture.

than the latter for PD≥0.95.

Spectral eﬀiciency for the data communication phase.

To compare the spectral eﬀiciency of the two transmitter

architectures as shown in Fig. 1, we consider a no-blockage

scenario and focus on two precoding schemes, i.e., the

simple BST scheme and the high-performance MR-ZF

scheme with p= 2. As we can see in Fig. 7 (b), in the

range of SNRBBF ≤ −10 dB, which is more relevant in

mmWave channels, all the 4curves coincide with each

other. Namely, for either the MR-ZF scheme or the BST

scheme, the two architectures achieve a rather similar

spectral eﬀiciency. In contrast, when SNRBBF >−10 dB,

the MR-ZF scheme performs better. The two architectures

with the MR-ZF precoding again achieve a rather similar

performance.

Hardware power eﬀiciency. To evaluate the architecture

power eﬀiciency, otherwise stated, we will consider the

simple BST precoder. Also, since the modulation highly

affects the power eﬀiciency, we will take into account

both the SC and the OFDM signaling in this section.

We first assume a reference scenario as the baseline, i.e,

the OSPS architecture using the BST precoder and a

SC modulation. We use reference PAs with Pmax,0= 6

dBm and ηmax,0= 0.3. The backoff factor with respect

to different waveforms and transmitter architectures can

be written as αoff = 1/(PPAPR), where PPAPR represents

the PAPR of the input signal at a PA. The investigation

for 3GPP LTE in [37] showed that with a probability

of 0.999, the PAPR of the LTE SC waveform (known as

SC-FDMA) is smaller than ∼7.2dB and the PAPR of the

LTE OFDM waveform (with 512 subcarriers employing

QPSK) is smaller than ∼11.4dB. We set PPAPR to these

values for the OSPS architecture. For the FC architecture,

however, the input signals of the PAs are the sum of

the signals from different RF chains. Since each OFDM

signal can be modeled as a Gaussian random process [37]

and the signals from different RF chains are independent,

the PAPR of the sum is the same as of one RF chain.

For the case of SC signaling, there is no clear work in

the literature that shows how the sum of SC signals

behaves. We simulated the sum of MRF = 4 SC signals

using the same parameters as in [37]. The result shows

that with probability of 0.999 the PAPR of the sum is

smaller than ∼9.8dB. We apply these values and without

loss of generality, we choose αoff,0=−7.2dB as the

reference scenario. As shown in (26), by deploying the

same PAs (Option I), the two architectures achieve the

same eﬀiciency for a given Prad. However, as illustrated

in Fig. 7 (c), the OSPS architecture with SC signaling

(OSPS, SC) achieves the highest Prad, followed by (FC,

SC), (OSPS, OFDM), and (FC, OFDM). In contrast, by

deploying different PAs (Option II)9, Fig. 7 (d) shows that

(OSPS, SC) achieves the highest power eﬀiciency, followed

by (FC, SC), (OSPS, OFDM) and (FC, OFDM).

To sum up, given the parameters in this paper, the two

architectures achieve a similar sum spectral eﬀiciency with

certain precoders, but the OSPS architecture outperforms

9Since the ηmax of different PAs highly depends on the technology,

for simplicity, we assume that different PAs working in their linear

range have roughly the same maximum eﬀiciency ηmax,0.

−30 −20 −10 0 10 20 30

SNRBBF (dB)

Rsum (bit/s/Hz)

(a)

FC, BST

FC, MRT, p= 1

FC, MRT, p= 2

FC, MR-ZF, p= 1

FC, MR-ZF, p= 2

FC, ZF in [26]

−30 −20 −10 0 10 20 30

SNRBBF (dB)

(b)

FC, BST

FC, MRT, p= 1

FC, MRT, p= 2

FC, MR-ZF, p= 1

FC, MR-ZF, p= 2

FC, ZF in [26]

−30 −20 −10 0 10 20 30

SNRBBF (dB)

(c)

FC, BST

FC, MRT, p= 1

FC, MRT, p= 2

FC, MR-ZF, p= 1

FC, MR-ZF, p= 2

FC, ZF in [26]

−30 −20 −10 0 10 20 30

SNRBBF (dB)

Rsum (bit/s/Hz)

(d)

OSPS, BST

OSPS, MRT, p= 1

OSPS, MRT, p= 2

OSPS, MR-ZF, p=1

OSPS, MR-ZF, p= 2

OSPS, ZF in [26]

−30 −20 −10 0 10 20 30

SNRBBF (dB)

(e)

OSPS, BST

OSPS, MRT, p= 1

OSPS, MRT, p= 2

OSPS, MR-ZF, p=1

OSPS, MR-ZF, p= 2

OSPS, ZF in [26]

−30 −20 −10 0 10 20 30

SNRBBF (dB)

(f)

OSPS, BST

OSPS, MRT, p= 1

OSPS, MRT, p= 2

OSPS, MR-ZF, p=1

OSPS, MR-ZF, p= 2

OSPS, ZF in [26]

Fig. 6: The sum spectral eﬀiciency vs. increasing SNRBBF. The blockage probability of the strongest path is given by (a) 0.0,

(b) 0.3, (c) 0.6for the FC architecture, and (d) 0.0, (e) 0.3, (f) 0.6for the OSPS architecture.

than the latter for PD≥0.95.

Spectral eﬀiciency for the data communication phase.

To compare the spectral eﬀiciency of the two transmitter

architectures as shown in Fig. 1, we consider a no-blockage

scenario and focus on two precoding schemes, i.e., the

simple BST scheme and the high-performance MR-ZF

scheme with p= 2. As we can see in Fig. 7 (b), in the

range of SNRBBF ≤ −10 dB, which is more relevant in

mmWave channels, all the 4curves coincide with each

other. Namely, for either the MR-ZF scheme or the BST

scheme, the two architectures achieve a rather similar

spectral eﬀiciency. In contrast, when SNRBBF >−10 dB,

the MR-ZF scheme performs better. The two architectures

with the MR-ZF precoding again achieve a rather similar

performance.

Hardware power eﬀiciency. To evaluate the architecture

power eﬀiciency, otherwise stated, we will consider the

simple BST precoder. Also, since the modulation highly

affects the power eﬀiciency, we will take into account

both the SC and the OFDM signaling in this section.

We first assume a reference scenario as the baseline, i.e,

the OSPS architecture using the BST precoder and a

SC modulation. We use reference PAs with Pmax,0= 6

dBm and ηmax,0= 0.3. The backoff factor with respect

to different waveforms and transmitter architectures can

be written as αoff = 1/(PPAPR), where PPAPR represents

the PAPR of the input signal at a PA. The investigation

for 3GPP LTE in [37] showed that with a probability

of 0.999, the PAPR of the LTE SC waveform (known as

SC-FDMA) is smaller than ∼7.2dB and the PAPR of the

LTE OFDM waveform (with 512 subcarriers employing

QPSK) is smaller than ∼11.4dB. We set PPAPR to these

values for the OSPS architecture. For the FC architecture,

however, the input signals of the PAs are the sum of

the signals from different RF chains. Since each OFDM

signal can be modeled as a Gaussian random process [37]

and the signals from different RF chains are independent,

the PAPR of the sum is the same as of one RF chain.

For the case of SC signaling, there is no clear work in

the literature that shows how the sum of SC signals

behaves. We simulated the sum of MRF = 4 SC signals

using the same parameters as in [37]. The result shows

that with probability of 0.999 the PAPR of the sum is

smaller than ∼9.8dB. We apply these values and without

loss of generality, we choose αoff,0=−7.2dB as the

reference scenario. As shown in (26), by deploying the

same PAs (Option I), the two architectures achieve the

same eﬀiciency for a given Prad. However, as illustrated

in Fig. 7 (c), the OSPS architecture with SC signaling

(OSPS, SC) achieves the highest Prad, followed by (FC,

SC), (OSPS, OFDM), and (FC, OFDM). In contrast, by

deploying different PAs (Option II)9, Fig. 7 (d) shows that

(OSPS, SC) achieves the highest power eﬀiciency, followed

by (FC, SC), (OSPS, OFDM) and (FC, OFDM).

To sum up, given the parameters in this paper, the two

architectures achieve a similar sum spectral eﬀiciency with

certain precoders, but the OSPS architecture outperforms

9Since the ηmax of different PAs highly depends on the technology,

for simplicity, we assume that different PAs working in their linear

range have roughly the same maximum eﬀiciency ηmax,0.

−30 −20 −10 0 10 20 30

SNRBBF (dB)

Rsum (bit/s/Hz)

(d)

OSPS, BST

OSPS, MRT, p= 1

OSPS, MRT, p= 2

OSPS, MR-ZF, p=1

OSPS, MR-ZF, p=2

OSPS, ZF in [26]

−30 −20 −10 0 10 20 30

SNRBBF (dB)

(e)

OSPS, BST

OSPS, MRT, p= 1

OSPS, MRT, p= 2

OSPS, MR-ZF, p=1

OSPS, MR-ZF, p=2

OSPS, ZF in [26]

−30 −20 −10 0 10 20 30

SNRBBF (dB)

(f)

OSPS, BST

OSPS, MRT, p= 1

OSPS, MRT, p= 2

OSPS, MR-ZF, p=1

OSPS, MR-ZF, p=2

OSPS, ZF in [26]

Fig. 6: The sum spectral eﬀiciency vs. increasing SNRBBF. The blockage probability of the strongest path is given by (a) 0.0,

(b) 0.3, (c) 0.6for the FC architecture, and (d) 0.0, (e) 0.3, (f) 0.6for the OSPS architecture.

than the latter for PD≥0.95.

Spectral eﬀiciency for the data communication phase.

To compare the spectral eﬀiciency of the two transmitter

architectures as shown in Fig. 1, we consider a no-blockage

scenario and focus on two precoding schemes, i.e., the

simple BST scheme and the high-performance MR-ZF

scheme with p= 2. As we can see in Fig. 7 (b), in the

range of SNRBBF ≤ −10 dB, which is more relevant in

mmWave channels, all the 4curves coincide with each

other. Namely, for either the MR-ZF scheme or the BST

scheme, the two architectures achieve a rather similar

spectral eﬀiciency. In contrast, when SNRBBF >−10 dB,

the MR-ZF scheme performs better. The two architectures

with the MR-ZF precoding again achieve a rather similar

performance.

Hardware power eﬀiciency. To evaluate the architecture

power eﬀiciency, otherwise stated, we will consider the

simple BST precoder. Also, since the modulation highly

affects the power eﬀiciency, we will take into account

both the SC and the OFDM signaling in this section.

We first assume a reference scenario as the baseline, i.e,

the OSPS architecture using the BST precoder and a

SC modulation. We use reference PAs with Pmax,0= 6

dBm and ηmax,0= 0.3. The backoff factor with respect

to different waveforms and transmitter architectures can

be written as αoff = 1/(PPAPR), where PPAPR represents

the PAPR of the input signal at a PA. The investigation

for 3GPP LTE in [37] showed that with a probability

of 0.999, the PAPR of the LTE SC waveform (known as

SC-FDMA) is smaller than ∼7.2dB and the PAPR of the

LTE OFDM waveform (with 512 subcarriers employing

QPSK) is smaller than ∼11.4dB. We set PPAPR to these

values for the OSPS architecture. For the FC architecture,

however, the input signals of the PAs are the sum of

the signals from different RF chains. Since each OFDM

signal can be modeled as a Gaussian random process [37]

and the signals from different RF chains are independent,

the PAPR of the sum is the same as of one RF chain.

For the case of SC signaling, there is no clear work in

the literature that shows how the sum of SC signals

behaves. We simulated the sum of MRF = 4 SC signals

using the same parameters as in [37]. The result shows

that with probability of 0.999 the PAPR of the sum is

smaller than ∼9.8dB. We apply these values and without

loss of generality, we choose αoff,0=−7.2dB as the

reference scenario. As shown in (26), by deploying the

same PAs (Option I), the two architectures achieve the

same eﬀiciency for a given Prad. However, as illustrated

in Fig. 7 (c), the OSPS architecture with SC signaling

(OSPS, SC) achieves the highest Prad, followed by (FC,

SC), (OSPS, OFDM), and (FC, OFDM). In contrast, by

deploying different PAs (Option II)9, Fig. 7 (d) shows that

(OSPS, SC) achieves the highest power eﬀiciency, followed

by (FC, SC), (OSPS, OFDM) and (FC, OFDM).

To sum up, given the parameters in this paper, the two

architectures achieve a similar sum spectral eﬀiciency with

certain precoders, but the OSPS architecture outperforms

9Since the ηmax of different PAs highly depends on the technology,

for simplicity, we assume that different PAs working in their linear

range have roughly the same maximum eﬀiciency ηmax,0.

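Fig. 6: The sum spectral efficiency $R_{\mathrm{sum}}$ (bit/s/Hz) vs. increasing $\mathrm{SNR}_{\mathrm{BBF}}$ (dB, from −30 to 30). The blockage probability of the strongest path is given by (a) 0.0, (b) 0.3, (c) 0.6 for the FC architecture, and (d) 0.0, (e) 0.3, (f) 0.6 for the OSPS architecture. Each panel compares the BST, MRT ($p = 1, 2$), and MR-ZF ($p = 1, 2$) precoders with the ZF scheme in [26].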

preferred since it is simple yet adequate to achieve good performance. However, when the channel SNR is not too low or there are potential blockages, the MR-ZF scheme with $p > 1$ outperforms the other schemes. As a side note, in a practical implementation $p$ should not be chosen too large, since it governs a trade-off between blockage robustness, power spreading, and the overhead for additional channel estimation.

B. Fully-Connected (FC) or One-Stream-Per-Subarray (OSPS)?

Note that the performance of the different architectures depends strongly on the channel conditions and the underlying precoders. For the scenario considered in this paper, we jointly evaluate the architecture performance in three aspects:

Training efficiency for the initial BA phase. Let $P_D$ denote the detection probability, i.e., the probability of finding the strongest AoA-AoD pair between the BS and a generic UE. The BA results are illustrated in Fig. 7 (a). For comparison, we also simulate a recent time-domain BA algorithm proposed in [43], which estimates the instantaneous channel coefficients with an orthogonal matching pursuit (OMP) technique. As we can see, the proposed BA scheme requires much less training overhead than that in [43]. In addition, because the OSPS architecture has a lower angular resolution and suffers from larger sidelobe power leakage than the FC architecture, the former requires roughly 10 more beacon slots than the latter to reach $P_D \geq 0.95$.

Spectral efficiency for the data communication phase. To compare the spectral efficiency of the two transmitter architectures shown in Fig. 1, we consider a no-blockage scenario and focus on two precoding schemes, i.e., the simple BST scheme and the high-performance MR-ZF scheme with $p = 2$. As we can see in Fig. 7 (b), in the range $\mathrm{SNR}_{\mathrm{BBF}} \leq -10$ dB, which is the more relevant one in mmWave channels, all four curves coincide. Namely, for either the MR-ZF scheme or the BST scheme, the two architectures achieve a very similar spectral efficiency. In contrast, when $\mathrm{SNR}_{\mathrm{BBF}} > -10$ dB, the MR-ZF scheme performs better, and the two architectures with MR-ZF precoding again achieve very similar performance.

Fig. 7: Performance comparison of the different transmitter architectures. (a) The initial BA detection probability vs. the training overhead (number of beacon slots $T$) with $\mathrm{SNR}_{\mathrm{BBF}} = -19$ dB, for (FC, NNLS), (OSPS, NNLS), and (FC, OMP [43]). (b) The sum spectral efficiency $R_{\mathrm{sum}}$ (bit/s/Hz) vs. increasing $\mathrm{SNR}_{\mathrm{BBF}}$ (dB), without blockage, for the BST and MR-ZF ($p = 2$) precoders. (c) The actual radiated power $P_{\mathrm{rad}}$ (dBm) under Option I vs. the radiated power $P_{\mathrm{rad},0}$ (dBm) of the reference scenario, with $\alpha_{\mathrm{off}} = -9.8$ dB for (FC, SC), $-11.4$ dB for (FC, OFDM), $-7.2$ dB for (OSPS, SC), and $-11.4$ dB for (OSPS, OFDM). (d) The power efficiency $\eta_{\mathrm{eff}}$ under Option II vs. the actual radiated power $P_{\mathrm{rad}}$ (dBm).

Hardware power efficiency. To evaluate the architecture power efficiency, unless otherwise stated, we consider the simple BST precoder. Also, since the modulation strongly affects the power efficiency, we take into account both SC and OFDM signaling in this section. We first fix a reference scenario as the baseline, i.e., the OSPS architecture using the BST precoder and SC modulation. We use reference PAs with $P_{\max,0} = 6$ dBm and $\eta_{\max,0} = 0.3$. The backoff factor for the different waveforms and transmitter architectures can be written as $\alpha_{\mathrm{off}} = 1/P_{\mathrm{PAPR}}$, where $P_{\mathrm{PAPR}}$ represents the PAPR of the input signal at a PA. The investigation for 3GPP LTE in [37] showed that, with a probability of 0.999, the PAPR of the LTE SC waveform (known as SC-FDMA) is smaller than ∼7.2 dB and the PAPR of the LTE OFDM waveform (with 512 subcarriers employing QPSK) is smaller than ∼11.4 dB. We set $P_{\mathrm{PAPR}}$ to these values for the OSPS architecture. For the FC architecture, however, the input signal of each PA is the sum of the signals from the different RF chains. Since each OFDM signal can be modeled as a Gaussian random process [37] and the signals from different RF chains are independent, the sum is again Gaussian, so its PAPR is the same as that of a single RF chain. For SC signaling, there is no clear result in the literature showing how the sum of SC signals behaves. We therefore simulated the sum of $M_{\mathrm{RF}} = 4$ SC signals using the same parameters as in [37]. The result shows that, with a probability of 0.999, the PAPR of the sum is smaller than ∼9.8 dB. We apply these values and, without loss of generality, choose $\alpha_{\mathrm{off},0} = -7.2$ dB for the reference scenario. As shown in (26), by deploying the same PAs (Option I), the two architectures achieve the same efficiency for a given $P_{\mathrm{rad}}$. However, as illustrated in Fig. 7 (c), the OSPS architecture with SC signaling (OSPS, SC) achieves the highest $P_{\mathrm{rad}}$, followed by (FC, SC), (OSPS, OFDM), and (FC, OFDM). In contrast, by deploying different PAs (Option II),$^{9}$ Fig. 7 (d) shows that (OSPS, SC) achieves the highest power efficiency, followed by (FC, SC), (OSPS, OFDM), and (FC, OFDM).
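The Gaussian argument for the sum of OFDM signals can be checked numerically. The following is a minimal Monte Carlo sketch and not the simulation setup of [37]: it assumes Nyquist-rate sampling without oversampling or pulse shaping, and the function names `qpsk_ofdm_symbols` and `papr_quantile_db` are hypothetical. It estimates the PAPR level that is not exceeded with a given probability, both for a single OFDM signal and for the sum of several independent ones.

```python
import numpy as np


def qpsk_ofdm_symbols(rng, n_sym, n_sc):
    """Return n_sym time-domain OFDM symbols with n_sc QPSK subcarriers each."""
    bits_i = rng.integers(0, 2, size=(n_sym, n_sc))
    bits_q = rng.integers(0, 2, size=(n_sym, n_sc))
    freq = ((2 * bits_i - 1) + 1j * (2 * bits_q - 1)) / np.sqrt(2.0)
    # Unitary IFFT so that the average sample power stays at 1.
    return np.fft.ifft(freq, axis=-1) * np.sqrt(n_sc)


def papr_quantile_db(n_streams, prob=0.999, n_sym=4000, n_sc=512, seed=0):
    """Empirical PAPR level (dB) not exceeded with probability `prob`
    when n_streams independent OFDM signals are added at one PA input."""
    rng = np.random.default_rng(seed)
    total = np.zeros((n_sym, n_sc), dtype=complex)
    for _ in range(n_streams):
        total += qpsk_ofdm_symbols(rng, n_sym, n_sc)
    power = np.abs(total) ** 2
    # Per-symbol PAPR: peak sample power over mean sample power.
    # The ratio is scale-invariant, so the sum needs no normalization.
    papr = power.max(axis=-1) / power.mean(axis=-1)
    return float(np.quantile(10.0 * np.log10(papr), prob))


if __name__ == "__main__":
    print("one RF chain   :", papr_quantile_db(1, seed=1), "dB")
    print("four RF chains :", papr_quantile_db(4, seed=2), "dB")
```

Under these assumptions the two estimates should lie close to each other, in line with the statement that summing independent OFDM signals does not change the PAPR. The same check does not carry over to SC signals, whose samples are not Gaussian; this is why the text relies on a dedicated simulation for that case.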

To sum up, given the parameters in this paper, the two architectures achieve a similar sum spectral efficiency with certain precoders, but the OSPS architecture outperforms the FC architecture in terms of hardware complexity and power efficiency, at the cost of only a slightly longer latency for the initial BA.
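The backoff values behind this comparison follow directly from $\alpha_{\mathrm{off}} = 1/P_{\mathrm{PAPR}}$ and the PAPR levels quoted above. The sketch below tabulates them; the helper names are hypothetical, and `max_average_output_dbm` rests on the simplifying assumption that the signal peak is placed exactly at $P_{\max,0}$. It ignores all other architecture-dependent losses, so it does not reproduce (26) or the curves in Fig. 7.

```python
# PAPR levels (dB) quoted in the text for each (architecture, waveform) pair.
PAPR_DB = {
    ("OSPS", "SC"): 7.2,
    ("OSPS", "OFDM"): 11.4,
    ("FC", "SC"): 9.8,
    ("FC", "OFDM"): 11.4,
}
P_MAX_0_DBM = 6.0          # reference PA maximum output power
REFERENCE = ("OSPS", "SC")  # reference scenario, alpha_off,0 = -7.2 dB


def backoff_db(papr_db):
    """Backoff factor alpha_off = 1 / P_PAPR, expressed in dB."""
    return -papr_db


def backoff_linear(papr_db):
    """Backoff factor alpha_off on a linear scale."""
    return 10.0 ** (backoff_db(papr_db) / 10.0)


def max_average_output_dbm(papr_db, p_max_dbm=P_MAX_0_DBM):
    """Assumption: the signal peak sits at P_max, so the average
    PA output power is P_max scaled by alpha_off."""
    return p_max_dbm + backoff_db(papr_db)


def penalty_vs_reference_db(config):
    """Extra backoff (dB) of a configuration relative to the reference."""
    return backoff_db(PAPR_DB[config]) - backoff_db(PAPR_DB[REFERENCE])


if __name__ == "__main__":
    for config, papr in PAPR_DB.items():
        print(config,
              "alpha_off = %.1f dB" % backoff_db(papr),
              "penalty = %.1f dB" % penalty_vs_reference_db(config))
```

With these numbers, (FC, SC) needs 2.6 dB more backoff than the reference and both OFDM configurations need 4.2 dB more. The backoff alone therefore cannot separate (OSPS, OFDM) from (FC, OFDM); that distinction comes from the full model in the paper.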

C. Simulations Based on QuaDRiGa

In this section, we resort to the 3D geometry-based channel generator QuaDRiGa [30] to show that our numerical results are consistent with practical mmWave communication channels.$^{10}$ More precisely, we apply our BA and precoding schemes over $\sim 3 \times 10^{5}$ channel snapshots generated by QuaDRiGa. These channel snapshots correspond to a short

$^{9}$ Since the $\eta_{\max}$ of different PAs depends strongly on the technology, for simplicity we assume that different PAs working in their linear range have roughly the same maximum efficiency $\eta_{\max,0}$.

$^{10}$ Due to the QuaDRiGa generator limits, only the no-blockage scenario is considered in this section.

20 40 60 80 100 120 140 160

0.2

0.4

0.6

0.8

Number of beacon slots T

(a)

FC, NNLS

OSPS, NNLS

FC, OMP [43]

−30 −20 −10 0 10 20 30

SNRBBF (dB)

Rsum (bit/s/Hz)

(b)

FC, BST

FC, MR-ZF p=2

OSPS, BST

OSPS, MR-ZF p=2

−2 0 2 4 6

−5

Prad,0(dBm)

Prad (dBm)

(c)

FC, SC, αoff=−9.8dB

FC, OFDM, αoff =−11.4dB

OSPS, SC, αoff=−7.2dB

OSPS, OFDM, αoff=−11.4dB

−6−4−20246

0.1

0.15

0.2

0.25

0.3

Prad (dBm)

ηeff

(d)

FC, SC

FC, OFDM

OSPS, SC

OSPS, OFDM

Fig. 7: The performance comparison of different transmitter architectures. (a) The initial BA detection probability vs. the training overhead

with SNRBBF =−19 dB. (b) The sum spectral efficiency vs. increasing SNRBBF, without blockage. (c) The actual radiated power under

Option I vs. the radiated power of the reference scenario. (d) The power efficiency under Option II vs. the actual radiated power.

achieve a rather similar spectral efficiency. In contrast, when

SNRBBF >−10 dB, the MR-ZF scheme performs better. The

two architectures with the MR-ZF precoding again achieve a

rather similar performance.

Hardware power efficiency. To evaluate the architecture

power efficiency, otherwise stated, we will consider the simple

BST precoder. Also, since the modulation highly affects the

power efficiency, we will take into account both the SC and the

OFDM signaling in this section. We first assume a reference

scenario as the baseline, i.e, the OSPS architecture using

the BST precoder and a SC modulation. We use reference

PAs with Pmax,0= 6 dBm and ηmax,0= 0.3. The backoff

factor with respect to different waveforms and transmitter

architectures can be written as αoff = 1/(PPAPR), where

PPAPR represents the PAPR of the input signal at a PA.

The investigation for 3GPP LTE in [37] showed that with

a probability of 0.999, the PAPR of the LTE SC waveform

(known as SC-FDMA) is smaller than ∼7.2dB and the PAPR

of the LTE OFDM waveform (with 512 subcarriers employing

QPSK) is smaller than ∼11.4dB. We set PPAPR to these

values for the OSPS architecture. For the FC architecture,

however, the input signals of the PAs are the sum of the

signals from different RF chains. Since each OFDM signal

can be modeled as a Gaussian random process [37] and the

signals from different RF chains are independent, the PAPR

of the sum is the same as of one RF chain. For the case of SC

signaling, there is no clear work in the literature that shows

how the sum of SC signals behaves. We simulated the sum

of MRF = 4 SC signals using the same parameters as in [37].

The result shows that with probability of 0.999 the PAPR of

the sum is smaller than ∼9.8dB. We apply these values and

without loss of generality, we choose αoff,0=−7.2dB as

the reference scenario. As shown in (26), by deploying the

same PAs (Option I), the two architectures achieve the same

efficiency for a given Prad. However, as illustrated in Fig. 7 (c),

the OSPS architecture with SC signaling (OSPS, SC) achieves

the highest Prad, followed by (FC, SC), (OSPS, OFDM), and

(FC, OFDM). In contrast, by deploying different PAs (Option

II)9, Fig. 7 (d) shows that (OSPS, SC) achieves the highest

power efficiency, followed by (FC, SC), (OSPS, OFDM) and

(FC, OFDM).

To sum up, given the parameters in this paper, the two

architectures achieve a similar sum spectral efficiency with

certain precoders, but the OSPS architecture outperforms the

FC case in terms of hardware complexity and power efficiency,

only at the cost of a slightly longer latency for the initial BA.

C. Simulations Based on QuaDRiGa

In this section, we resort to the 3D geometry based channel

generator QuaDRiGa [30] to show that our numerical results

are quite consistent with practical mmWave communication

channels.10 More precisely, we apply our BA and precoding

schemes over ∼3×105channel snapshots generated by

QuaDRiGa. These channel snapshots correspond to a short

9Since the ηmax of different PAs highly depends on the technology, for

simplicity, we assume that different PAs working in their linear range have

roughly the same maximum efficiency ηmax,0.

10Due to the QuaDRiGa generator limits, only the no-blockage scenario is

considered in this section.

20 40 60 80 100 120 140 160

0.2

0.4

0.6

0.8

Number of beacon slots T

(a)

FC, NNLS

OSPS, NNLS

FC, OMP [43]

−30 −20 −10 0 10 20 30

SNRBBF (dB)

Rsum (bit/s/Hz)

(b)

FC, BST

FC, MR-ZF p=2

OSPS, BST

OSPS, MR-ZF p=2

−2 0 2 4 6

−5

Prad,0(dBm)

Prad (dBm)

(c)

FC, SC, αoff=−9.8dB

FC, OFDM, αoff =−11.4dB

OSPS, SC, αoff=−7.2dB

OSPS, OFDM, αoff=−11.4dB

−6−4−20246

0.1

0.15

0.2

0.25

0.3

Prad (dBm)

ηeff

(d)

FC, SC

FC, OFDM

OSPS, SC

OSPS, OFDM

Fig. 7: The performance comparison of different transmitter architectures. (a) The initial BA detection probability vs. the training overhead

with SNRBBF =−19 dB. (b) The sum spectral efficiency vs. increasing SNRBBF, without blockage. (c) The actual radiated power under

Option I vs. the radiated power of the reference scenario. (d) The power efficiency under Option II vs. the actual radiated power.

achieve a rather similar spectral efficiency. In contrast, when

SNRBBF >−10 dB, the MR-ZF scheme performs better. The

two architectures with the MR-ZF precoding again achieve a

rather similar performance.

Hardware power efficiency. To evaluate the architecture

power efficiency, otherwise stated, we will consider the simple

BST precoder. Also, since the modulation highly affects the

power efficiency, we will take into account both the SC and the

OFDM signaling in this section. We first assume a reference

scenario as the baseline, i.e, the OSPS architecture using

the BST precoder and a SC modulation. We use reference

PAs with Pmax,0= 6 dBm and ηmax,0= 0.3. The backoff

factor with respect to different waveforms and transmitter

architectures can be written as αoff = 1/(PPAPR), where

PPAPR represents the PAPR of the input signal at a PA.

The investigation for 3GPP LTE in [37] showed that with

a probability of 0.999, the PAPR of the LTE SC waveform

(known as SC-FDMA) is smaller than ∼7.2dB and the PAPR

of the LTE OFDM waveform (with 512 subcarriers employing

QPSK) is smaller than ∼11.4dB. We set PPAPR to these

values for the OSPS architecture. For the FC architecture,

however, the input signals of the PAs are the sum of the

signals from different RF chains. Since each OFDM signal

can be modeled as a Gaussian random process [37] and the

signals from different RF chains are independent, the PAPR

of the sum is the same as of one RF chain. For the case of SC

signaling, there is no clear work in the literature that shows

how the sum of SC signals behaves. We simulated the sum

of MRF = 4 SC signals using the same parameters as in [37].

The result shows that with probability of 0.999 the PAPR of

the sum is smaller than ∼9.8dB. We apply these values and

without loss of generality, we choose αoff,0=−7.2dB as

the reference scenario. As shown in (26), by deploying the

same PAs (Option I), the two architectures achieve the same

efficiency for a given Prad. However, as illustrated in Fig. 7 (c),

the OSPS architecture with SC signaling (OSPS, SC) achieves

the highest Prad, followed by (FC, SC), (OSPS, OFDM), and

(FC, OFDM). In contrast, by deploying different PAs (Option

II)9, Fig. 7 (d) shows that (OSPS, SC) achieves the highest

power efficiency, followed by (FC, SC), (OSPS, OFDM) and

(FC, OFDM).

To sum up, given the parameters in this paper, the two

architectures achieve a similar sum spectral efficiency with

certain precoders, but the OSPS architecture outperforms the

FC case in terms of hardware complexity and power efficiency,

only at the cost of a slightly longer latency for the initial BA.

C. Simulations Based on QuaDRiGa

In this section, we resort to the 3D geometry based channel

generator QuaDRiGa [30] to show that our numerical results

are quite consistent with practical mmWave communication

channels.10 More precisely, we apply our BA and precoding

schemes over ∼3×105channel snapshots generated by

QuaDRiGa. These channel snapshots correspond to a short

9Since the ηmax of different PAs highly depends on the technology, for

simplicity, we assume that different PAs working in their linear range have

roughly the same maximum efficiency ηmax,0.

10Due to the QuaDRiGa generator limits, only the no-blockage scenario is

considered in this section.

[Figure 7, four panels. (a) x-axis: number of beacon slots T; curves: FC, NNLS; OSPS, NNLS; FC, OMP [43]. (b) x-axis: SNR_BBF (dB); y-axis: R_sum (bit/s/Hz); curves: FC, BST; FC, MR-ZF, p = 2; OSPS, BST; OSPS, MR-ZF, p = 2. (c) x-axis: P_rad,0 (dBm); y-axis: P_rad (dBm); curves: FC, SC, α_off = −9.8 dB; FC, OFDM, α_off = −11.4 dB; OSPS, SC, α_off = −7.2 dB; OSPS, OFDM, α_off = −11.4 dB. (d) x-axis: P_rad (dBm); y-axis: η_eff; curves: FC, SC; FC, OFDM; OSPS, SC; OSPS, OFDM.]

Fig. 7: The performance comparison of different transmitter architectures. (a) The initial BA detection probability vs. the training overhead with SNR_BBF = −19 dB. (b) The sum spectral efficiency vs. increasing SNR_BBF, without blockage. (c) The actual radiated power under Option I vs. the radiated power of the reference scenario. (d) The power efficiency under Option II vs. the actual radiated power.


for 3GPP LTE in [37] showed that, with probability 0.999, the PAPR of the LTE SC waveform (known as SC-FDMA) is smaller than ∼7.2 dB and the PAPR of the LTE OFDM waveform (with 512 subcarriers employing QPSK) is smaller than ∼11.4 dB. We set P_PAPR to these values for the OSPS architecture. For the FC architecture, however, the input signal of each PA is the sum of the signals from the different RF chains. Since each OFDM signal can be modeled as a Gaussian random process [37] and the signals from different RF chains are independent, their sum is again Gaussian, so the PAPR of the sum is the same as that of a single RF chain. For SC signaling, there is no clear result in the literature on how the sum of SC signals behaves. We therefore simulated the sum of M_RF = 4 SC signals using the same parameters as in [37]. The result shows that, with probability 0.999, the PAPR of the sum is smaller than ∼9.8 dB. We apply these values and, without loss of generality, choose α_off,0 = −7.2 dB (OSPS with SC signaling) as the reference scenario. As shown in (26), when the same PAs are deployed (Option I), the two architectures achieve the same efficiency for a given P_rad. However, as illustrated in Fig. 7 (c), the OSPS architecture with SC signaling (OSPS, SC) achieves the highest P_rad, followed by (FC, SC), (OSPS, OFDM), and (FC, OFDM). In contrast, when different PAs are deployed (Option II)⁹, Fig. 7 (d) shows that (OSPS, SC) achieves the highest power efficiency, followed by (FC, SC), (OSPS, OFDM), and (FC, OFDM).
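The Gaussian argument for OFDM signals can be checked numerically. The following is a minimal sketch under stated assumptions, not the simulation used in the paper: each RF chain is modeled directly as i.i.d. circularly symmetric complex Gaussian samples (no actual OFDM modulator), blocks have 512 samples, and the median PAPR is compared instead of the 0.999 quantile to keep the run short. The function names, block count, and seeds are illustrative choices. The helper `backoff_db` only restates the backoff as the inverse of the PAPR, as defined earlier in this section.

```python
import math
import random


def papr_db(block):
    """Peak-to-average power ratio of one block of complex samples, in dB."""
    powers = [abs(x) ** 2 for x in block]
    return 10.0 * math.log10(max(powers) / (sum(powers) / len(powers)))


def gaussian_block(n, rng):
    """n i.i.d. circularly symmetric complex Gaussian samples with unit variance."""
    s = math.sqrt(0.5)
    return [complex(rng.gauss(0.0, s), rng.gauss(0.0, s)) for _ in range(n)]


def median_papr_db(num_chains, n_samples=512, n_blocks=300, seed=0):
    """Median PAPR (dB) of the sum of num_chains independent Gaussian signals."""
    rng = random.Random(seed)
    values = []
    for _ in range(n_blocks):
        chains = [gaussian_block(n_samples, rng) for _ in range(num_chains)]
        # Sample-wise sum over the RF chains, as seen at the input of one PA.
        summed_block = [sum(samples) for samples in zip(*chains)]
        values.append(papr_db(summed_block))
    values.sort()
    return values[len(values) // 2]


def backoff_db(papr_in_db):
    """Backoff alpha_off = 1 / P_PAPR, expressed in dB."""
    return -papr_in_db


single = median_papr_db(num_chains=1, seed=1)
summed = median_papr_db(num_chains=4, seed=2)
print(f"median PAPR, 1 chain : {single:.2f} dB")
print(f"median PAPR, 4 chains: {summed:.2f} dB")
print(f"backoff for a 7.2 dB PAPR: {backoff_db(7.2):.1f} dB")
```

Under the Gaussian model the two medians should agree up to Monte Carlo noise, because the sum of independent Gaussian signals is a scaled Gaussian signal and the PAPR is invariant to scaling. This does not carry over to SC signals, which is why the text relies on a separate simulation for that case.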

To sum up, given the parameters in this paper, the two architectures achieve a similar sum spectral efficiency with certain precoders, but the OSPS architecture outperforms the FC case in terms of hardware complexity and power efficiency, at the cost of only a slightly longer latency for the initial BA.

C. Simulations Based on QuaDRiGa

In this section, we use the 3D geometry-based channel generator QuaDRiGa [30] to show that our numerical results are consistent with practical mmWave communication channels.¹⁰ More precisely, we apply our BA and precoding schemes to ∼3×10^5 channel snapshots generated by QuaDRiGa. These snapshots correspond to a short segment of time evolution in which the BS is stationary and each UE moves along its direction at a speed of 1 m/s. The simulation results for the different transmitter architectures are shown in Fig. 8. As Fig. 8 (a) shows, for the initial BA with P_D ≥ 0.95, the FC architecture requires ∼10 fewer beacon slots than the OSPS case. For the data communication phase, Fig. 8 (b) shows that the two architectures achieve quite similar performance when using either the BST or the MR-ZF precoder in the low-SNR range (SNR_BBF ≤ −15 dB) and the MR-ZF precoder in the high-SNR range (SNR_BBF > −15 dB). In addition, for both architectures, as shown in Fig. 8 (c) and Fig. 8 (d), respectively, all the curves coincide in the low-SNR range, whereas the MR-ZF precoder outperforms the rest in the high-SNR range. Overall, the results based on the QuaDRiGa generator are consistent with the results based on our proposed channel model. This consistency suggests that our models, schemes, and conclusions are not only theoretically sound but also practically applicable.

[Figure 8, four panels. (a) x-axis: number of beacon slots T; curves: FC, NNLS; OSPS, NNLS; FC, OMP [43]. (b) x-axis: SNR_BBF (dB); y-axis: R_sum (bit/s/Hz); curves: FC, BST; FC, MR-ZF, p = 2; OSPS, BST; OSPS, MR-ZF, p = 2. (c) x-axis: SNR_BBF (dB); y-axis: R_sum (bit/s/Hz); curves: FC with BST; MRT, p = 1; MRT, p = 2; MR-ZF, p = 1; MR-ZF, p = 2; ZF in [26]. (d) same axes and precoders as (c), for OSPS.]

Fig. 8: The simulations based on QuaDRiGa: (a) The initial BA detection probability vs. the training overhead, with SNR_BBF = −19 dB. (b) The sum spectral efficiency of different transmitter architectures vs. increasing SNR_BBF. (c) The sum spectral efficiency of the FC architecture vs. increasing SNR_BBF. (d) The sum spectral efficiency of the OSPS architecture vs. increasing SNR_BBF.

⁹ Since the η_max of a PA depends strongly on the technology, we assume for simplicity that different PAs operating in their linear range have roughly the same maximum efficiency η_max,0.

¹⁰ Due to limitations of the QuaDRiGa generator, only the no-blockage scenario is considered in this section.
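The qualitative low-SNR and high-SNR behavior described above can be illustrated with a toy example. The sketch below is not the MR-ZF scheme of this paper: it applies a generic zero-forcing (ZF) baseband precoder to a hypothetical 2-user effective channel and compares it with an identity baseband precoder, which suppresses no inter-user interference. The channel values, the unit noise variance, and the unit total transmit power are assumptions made only for illustration.

```python
import math

# Hypothetical 2-user effective channel after beam alignment (illustrative values only).
H = [[1.0, 0.3 + 0.2j],
     [0.25 - 0.1j, 0.9]]
IDENTITY = [[1.0, 0.0], [0.0, 1.0]]


def zero_forcing(h):
    """Unnormalized ZF precoder for a 2x2 channel: the matrix inverse of h."""
    det = h[0][0] * h[1][1] - h[0][1] * h[1][0]
    return [[h[1][1] / det, -h[0][1] / det],
            [-h[1][0] / det, h[0][0] / det]]


def sum_rate(h, w, snr_db):
    """Sum rate (bit/s/Hz) of two users, with total transmit power normalized to 1."""
    snr = 10.0 ** (snr_db / 10.0)
    total = sum(abs(w[j][s]) ** 2 for j in range(2) for s in range(2))
    rate = 0.0
    for k in range(2):
        # Power gain from stream s to user k after precoding and power normalization.
        gains = [abs(sum(h[k][j] * w[j][s] for j in range(2))) ** 2 / total
                 for s in range(2)]
        interference = sum(g for s, g in enumerate(gains) if s != k)
        rate += math.log2(1.0 + snr * gains[k] / (1.0 + snr * interference))
    return rate


for snr_db in (-30, 0, 30):
    r_id = sum_rate(H, IDENTITY, snr_db)
    r_zf = sum_rate(H, zero_forcing(H), snr_db)
    print(f"SNR {snr_db:>4} dB: identity {r_id:.3f}, ZF {r_zf:.3f} bit/s/Hz")
```

At low SNR both precoders are noise-limited and their absolute gap is negligible. At high SNR the identity precoder saturates because of residual inter-user interference, whereas ZF keeps growing with the SNR. This is the generic mechanism consistent with the trend reported for Fig. 8 (c) and Fig. 8 (d).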

VI. CONCLUSION

In this paper, we proposed an analysis framework

to evaluate the performance of typical hybrid

transmitters at mmWave frequencies. In particular,

we focused on the comparison of a fully-connected (FC)

architecture and a partially-connected architecture with

one-stream-per-subarray (OSPS) for a MU-MIMO base

station using HDA beamforming. We jointly evaluated

the performance of the two architectures in terms of the

initial beam alignment (BA), the data communication, and

the transmitter power efficiency. We used our recently

proposed BA scheme and further proposed three simple

precoding schemes on top of the effective channel after

the BA. The precoding schemes are based on beam

steering (BST), analog maximum ratio transmission

(MRT), and joint analog maximum ratio and baseband

zero-forcing (MR-ZF), respectively. Particularly, both the

BA scheme and the MR-ZF precoding scheme outperform

the state-of-the-art counterparts in the literature. Given

the parameters in this paper, our simulation results show

that the two architectures achieve a similar sum spectral

efficiency, but the OSPS architecture outperforms the

FC case in terms of hardware complexity and power

efficiency, only at the cost of a slightly longer latency for

the initial BA. Therefore, the OSPS architecture emerges

as a good choice for a simple and efficient design of

MU-MIMO base stations operating at mmWave.

REFERENCES

[1] K. Venugopal, A. Alkhateeb, N. G. Prelcic, and R. W.

Heath, “Channel estimation for hybrid architecture-based wideband

millimeter wave systems,” IEEE Journal on Selected Areas in

Communications, vol. 35, no. 9, pp. 1996–2009, 2017.

[2] F. Sohrabi and W. Yu, “Hybrid digital and analog beamforming

design for large-scale antenna arrays,” IEEE Journal of Selected

Topics in Signal Processing, vol. 10, no. 3, pp. 501–513, April 2016.

[3] A. Li and C. Masouros, “Hybrid analog-digital millimeter-wave

MU-MIMO transmission with virtual path selection,” IEEE

Communications Letters, vol. 21, no. 2, pp. 438–441, 2017.

[4] S. S. Ioushua and Y. C. Eldar, “Hybrid analog-digital beamforming

for massive MIMO systems,” arXiv preprint arXiv:1712.03485,

2017.

[5] J. Du, W. Xu, H. Shen, X. Dong, and C. Zhao, “Hybrid precoding

architecture for massive multiuser MIMO with dissipation:

sub-connected or fully connected structures?” IEEE Transactions

on Wireless Communications, vol. 17, no. 8, pp. 5465–5479, 2018.

[6] P. L. Cao, T. J. Oechtering, and M. Skoglund, “Precoding design

for massive MIMO systems with sub-connected architecture and

per-antenna power constraints,” in WSA 2018; 22nd International

ITG Workshop on Smart Antennas, March 2018, pp. 1–6.

[7] R. W. Heath, N. Gonzalez-Prelcic, S. Rangan, W. Roh, and A. M.

Sayeed, “An overview of signal processing techniques for millimeter

wave MIMO systems,” IEEE Journal of Selected Topics in Signal

Processing, vol. 10, no. 3, pp. 436–453, 2016.

[8] X. Gao, L. Dai, and A. M. Sayeed, “Low RF-complexity

technologies to enable millimeter-wave MIMO with large antenna

array for 5G wireless communications,” IEEE Communications

Magazine, vol. 56, no. 4, pp. 211–217, April 2018.

[9] A. F. Molisch, V. V. Ratnam, S. Han, Z. Li, S. L. H. Nguyen,

L. Li, and K. Haneda, “Hybrid beamforming for massive MIMO:

A survey,” IEEE Communications Magazine, vol. 55, no. 9, pp.

134–141, 2017.

[10] M. Majidzadeh, A. Moilanen, N. Tervo, H. Pennanen, A. Tölli, and

M. Latva-aho, “Hybrid beamforming for single-user MIMO with

partially connected RF architecture,” in 2017 European Conference

on Networks and Communications (EuCNC), June 2017, pp. 1–6.

[11] M. R. Castellanos, V. Raghavan, J. H. Ryu, O. H. Koymen,

J. Li, D. J. Love, and B. Peleato, “Hybrid multi-user precoding

with amplitude and phase control,” in 2018 IEEE International

Conference on Communications (ICC), May 2018, pp. 1–6.

[12] V. Raghavan, A. Partyka, A. Sampath, S. Subramanian,

O. H. Koymen, K. Ravid, J. Cezanne, K. Mukkavilli, and

J. Li, “Millimeter-wave MIMO prototype: Measurements and

experimental results,” IEEE Communications Magazine, vol. 56,

no. 1, pp. 202–209, 2018.

[13] Z. Gao, L. Dai, and Z. Wang, “Channel estimation for mmwave

massive MIMO based access and backhaul in ultra-dense network,”

in Communications (ICC), 2016 IEEE International Conference on.

IEEE, Conference Proceedings, pp. 1–6.

[14] J. Rodríguez-Fernández, N. González-Prelcic, K. Venugopal, and

R. W. Heath Jr, “Frequency-domain compressive channel estimation

for frequency-selective hybrid mmWave MIMO systems,” arXiv

preprint arXiv:1704.08572, 2017.

[15] S. Haghighatshoar and G. Caire, “The beam alignment problem in

mmWave wireless networks,” in 2016 50th Asilomar Conference on

Signals, Systems and Computers, Nov 2016, pp. 741–745.

[16] X. Song, S. Haghighatshoar, and G. Caire, “A scalable and

statistically robust beam alignment technique for mm-Wave

systems,” IEEE Trans. on Wireless Comm., vol. PP, pp. 1–1, 2018.

[17] V. Va, J. Choi, and R. W. Heath, “The impact of beamwidth

on temporal channel variation in vehicular channels and its

implications,” IEEE Transactions on Vehicular Technology, vol. 66,

no. 6, pp. 5014–5029, 2017.

[18] X. Song, S. Haghighatshoar, and G. Caire, “A robust time-domain

beam alignment scheme for multi-user wideband mmWave systems,”

in WSA 2018; 22th International ITG Workshop on Smart Antennas

(to be published), March 2018, pp. 1–7.

[19] ——, “Efficient beam alignment for mmWave single-carrier systems

with hybrid MIMO transceivers,” IEEE Transactions on Wireless

Communications, 2019.

[20] R. J. Weiler, M. Peter, W. Keusgen, and M. Wisotzki, “Measuring

the busy urban 60 GHz outdoor access radio channel,” in 2014 IEEE

International Conference on Ultra-WideBand (ICUWB), Sept 2014,

pp. 166–170.

[21] P. A. Eliasi, S. Rangan, and T. S. Rappaport, “Low-rank spatial

channel estimation for millimeter wave cellular systems,” IEEE

Transactions on Wireless Communications, vol. 16, no. 5, pp.

2748–2759, 2017.

[22] O. El Ayach, R. W. Heath, S. Rajagopal, and Z. Pi, “Multimode

precoding in millimeter wave MIMO transmitters with multiple

antenna sub-arrays,” in Global Communications Conference

(GLOBECOM), 2013 IEEE. IEEE, Conference Proceedings, pp.

3476–3480.

[23] D. Zhang, Y. Wang, X. Li, and W. Xiang, “Hybridly connected

structure for hybrid beamforming in mmWave massive MIMO

systems,” IEEE Transactions on Communications, vol. 66, no. 2,

pp. 662–674, 2018.

[24] H.-L. Chiang, W. Rave, T. Kadur, and G. Fettweis, “Hybrid

beamforming based on implicit channel state information for

millimeter wave links,” IEEE Journal of Selected Topics in Signal

Processing, vol. 12, no. 2, pp. 326–339, 2018.

[25] V. Raghavan, S. Subramanian, J. Cezanne, A. Sampath, O. Koymen,

and J. Li, “Directional hybrid precoding in millimeter-wave MIMO

systems,” in Global Communications Conference (GLOBECOM),

2016 IEEE. IEEE, Conference Proceedings, pp. 1–7.

[26] V. Raghavan, S. Subramanian, J. Cezanne, A. Sampath, O. H.

Koymen, and J. Li, “Single-user versus multi-User precoding for

millimeter wave MIMO systems,” IEEE Journal on Selected Areas

in Communications, vol. 35, no. 6, pp. 1387–1401, June 2017.

[27] X. Song, T. Kühne, and G. Caire, “Fully-connected vs.

sub-connected hybrid precoding architectures for mmWave

MU-MIMO,” in 2019 IEEE International Conference on

Communications (ICC) (accepted).

[28] N. N. Moghadam, G. Fodor, M. Bengtsson, and D. J. Love, “On

the energy efficiency of MIMO hybrid beamforming for millimeter

wave systems with nonlinear power amplifiers,” arXiv preprint

arXiv:1806.01602, 2018.

[29] T. Hälsig, D. Cvetkovski, E. Grass, and B. Lankl, “Statistical

properties and variations of LOS MIMO channels at millimeter wave

frequencies,” arXiv preprint arXiv:1803.07768, 2018.

[30] S. Jaeckel, L. Raschkowski, K. Börner, and L. Thiele, “QuaDRiGa:

A 3-D multi-cell channel model with time evolution for

enabling virtual field trials,” IEEE Transactions on Antennas and

Propagation, vol. 62, no. 6, pp. 3242–3256, 2014.

[31] A. Gupta and R. K. Jha, “A Survey of 5G Network: Architecture

and Emerging Technologies,” IEEE Access, vol. 3, pp. 1206–1232,

2015.

[32] M. Agiwal, A. Roy, and N. Saxena, “Next Generation 5G Wireless

Networks: A Comprehensive Survey,” IEEE Communications

Surveys Tutorials, vol. 18, no. 3, pp. 1617–1655, thirdquarter 2016.

[33] J. G. Proakis and M. Salehi, Digital communications. McGraw-Hill,

2008.

[34] P. Bello, “Characterization of randomly time-variant linear

channels,” IEEE Transactions on Communications Systems, vol. 11,

no. 4, pp. 360–393, 1963.

[35] A. Goldsmith, Wireless communications. Cambridge University

Press, 2005.

[36] A. M. Sayeed, “Deconstructing multiantenna fading channels,” IEEE

Transactions on Signal Processing, vol. 50, no. 10, pp. 2563–2579,

2002.

[37] H. G. Myung, J. Lim, and D. J. Goodman, “Peak-to-average

power ratio of single carrier FDMA signals with pulse shaping,”

in Personal, Indoor and Mobile Radio Communications, 2006 IEEE

17th International Symposium on. IEEE, Conference Proceedings,

pp. 1–5.

[38] C. Shepard, H. Yu, N. Anand, E. Li, T. Marzetta, R. Yang,

and L. Zhong, “Argos: Practical many-antenna base stations,” in

Proceedings of the 18th annual international conference on Mobile

computing and networking. ACM, 2012, pp. 53–64.

[39] T. L. Marzetta, E. G. Larsson, H. Yang, and H. Q. Ngo,

Fundamentals of massive MIMO. Cambridge University Press,

2016.

[40] E. Perahia, C. Cordeiro, M. Park, and L. L. Yang, “IEEE 802.11

ad: Defining the next generation multi-Gbps Wi-Fi,” in Consumer

Communications and Networking Conference (CCNC), 2010 7th

IEEE. IEEE, Conference Proceedings, pp. 1–5.

[41] T. S. Rappaport, G. R. MacCartney, S. Sun, H. Yan, and

S. Deng, “Small-Scale, Local Area, and Transitional Millimeter

Wave Propagation for 5G Communications,” IEEE Transactions on

Antennas and Propagation, vol. 65, no. 12, pp. 6474–6490, Dec

2017.

[42] S. Jaeckel, L. Raschkowski, K. Börner, L. Thiele, and F. Burkhardt,

“Quasi deterministic radio channel generator user manual and

documentation,” Fraunhofer Heinrich Hertz Institute Wireless

Communications and Networks, 2016.

[43] K. Venugopal, A. Alkhateeb, R. W. Heath, and N. G. Prelcic,

“Time-domain channel estimation for wideband millimeter wave

systems with hybrid architecture,” in Acoustics, Speech and Signal

Processing (ICASSP), 2017 IEEE International Conference on.

IEEE, 2017, Conference Proceedings, pp. 6493–6497.

Xiaoshen Song (S’17) received the B.Sc.

degree in Communication Engineering from

Northwestern Polytechnical University, Xi’an,

China, in 2013, and the M.Sc. degree in

Communication and Information Systems from

the Institute of Electronics, University of

Chinese Academy of Sciences, Beijing, China,

in 2016. Her master’s thesis focuses on

video synthetic aperture radar (VideoSAR)

system design and imaging algorithms. She is

currently pursuing the Ph.D. degree with the

Communications and Information Theory (CommIT) group at Technische

Universität Berlin, Berlin, Germany. Her research interests include

wireless communication, mmWave MIMO, and compressed sensing.

Thomas Kühne received his university degree

(5-year Dipl.-Ing. equivalent to a M.Sc.) in

Electrical Engineering from the University

of Technology Dresden. During his master

studies he focused on communication systems

and circuit design. He gained professional

research experience while working 3 years

for the Fraunhofer Heinrich-Hertz-Institute in

Berlin Germany. At the Heinrich-Hertz-Institute

he developed prototypes for mm-wave

communication and measurement devices for

mm-wave channels. Since 2015 he has worked for the Communications and

Information Theory group of Prof. Caire at the Technische Universität

Berlin. His research interests include hardware software co-design,

wireless communication systems, and signal processing.

Giuseppe Caire (S’92 – M’94 – SM’03

– F’05) was born in Torino in 1965. He

received the B.Sc. in Electrical Engineering

from Politecnico di Torino in 1990, the M.Sc.

in Electrical Engineering from Princeton

University in 1992, and the Ph.D. from

Politecnico di Torino in 1994. He has been a

post-doctoral research fellow with the European

Space Agency (ESTEC, Noordwijk, The

Netherlands) in 1994-1995, Assistant Professor

in Telecommunications at the Politecnico di

Torino, Associate Professor at the University of Parma, Italy, Professor

with the Department of Mobile Communications at the Eurecom Institute,

Sophia-Antipolis, France, a Professor of Electrical Engineering with

the Viterbi School of Engineering, University of Southern California,

Los Angeles, and he is currently an Alexander von Humboldt Professor

with the Faculty of Electrical Engineering and Computer Science at the

Technical University of Berlin, Germany.

He received the Jack Neubauer Best System Paper Award from the

IEEE Vehicular Technology Society in 2003, the IEEE Communications

Society & Information Theory Society Joint Paper Award in 2004 and

in 2011, the Leonard G. Abraham Prize for best IEEE JSAC paper in

2019, the Okawa Research Award in 2006, the Alexander von Humboldt

Professorship in 2014, the Vodafone Innovation Prize in 2015, and an

ERC Advanced Grant in 2018. Giuseppe Caire is a Fellow of IEEE since

2005. He has served in the Board of Governors of the IEEE Information

Theory Society from 2004 to 2007, and as officer from 2008 to 2013.

He was President of the IEEE Information Theory Society in 2011.

His main research interests are in the field of communications theory,

information theory, channel and source coding with particular focus on

wireless communications.


Beam Scheduling for mmWave Relay

Networks

6.1 Introduction

In mmWave communication, one effective way to mitigate the severe path loss and the sensitivity to blockages, while also increasing the communication range, is beamforming in combination with relaying. Having studied the beamforming issues in the previous chapters, this chapter focuses on the beam scheduling problem for mmWave half-duplex (HD) relay networks. Two practical beam scheduling schemes, i.e., the deterministic edge coloring (EC) scheduler and the adaptive backpressure (BP) scheduler, will be presented. Both stabilize the network within its capacity range while guaranteeing a small queuing backlog and end-to-end delay.

6.2 Clarification of each author’s contributions

This chapter is a journal manuscript, written jointly with Yahya H. Ezzeldin, Giuseppe Caire, and Christina Fragouli. I wrote this manuscript as the first author. It will be submitted to the journal IEEE TWC shortly; at present it is still being revised by the co-authors. The citation information is given below:

Xiaoshen Song, Yahya H. Ezzeldin, Giuseppe Caire, Christina Fragouli, “Efficient

Beam Scheduling for Half-Duplex mmWave Relay Networks,” IEEE Transactions on

Wireless Communications, 2020. (to be submitted).


All the authors contributed to this paper. I authored the beam scheduling sections. I

proposed the underlying beam scheduling methods and implemented the simulations for

different beam schedulers. I also wrote the complete first draft (including all sections)

of this paper.

Yahya H. Ezzeldin authored the network capacity section. He implemented the

simulations for the network capacity.

Giuseppe Caire, who is my PhD supervisor, provided valuable discussions in each

meeting of this work. He will also do a final modification together with Christina

Fragouli for the overall draft.

6.3 Original journal article

The following article is a reprint of the original journal manuscript. It is the latest version of our work. The copyright information is given on page xii of this thesis as well as on the first page of the reprinted paper.

Efficient Beam Scheduling for Half-Duplex

mmWave Relay Networks

Xiaoshen Song†, Student Member, IEEE, Yahya H. Ezzeldin∗, Student Member, IEEE, Giuseppe Caire†,

Fellow, IEEE, Christina Fragouli∗, Fellow, IEEE

© All the authors. Reprinted, with permission, from X. Song, Y. H. Ezzeldin, G. Caire, and C. Fragouli. This paper will be submitted to one of the IEEE

transactions. This reprint is the latest version of the paper.

Abstract—Millimeter wave (mmWave) communication is

expected to play a central role in next generation mobile

systems (5G) and beyond, by providing multi-Gbps data rates.

However, the severe pathloss and sensitivity to blockages

at mmWave frequencies significantly challenge practical

implementations. One effective way to mitigate these effects

and to increase the communication range is beamforming in

combination with relaying. In this paper, we study the beam

scheduling problem for mmWave half-duplex (HD) relay

networks, where the relay topology can be arbitrary. Based

on theoretically optimal schedule results, we first implement

a network simplification procedure to reduce the network

topology complexity, and then propose two practically

relevant beam scheduling schemes: the deterministic edge

coloring (EC) scheduler and the adaptive backpressure (BP)

scheduler. The former consists of a very simple one-time

computation of the sequence of scheduling states, which

is then repeated periodically. The one-time computation

depends on the underlying network topology, and therefore

it must be repeated when such topology changes. As such,

this approach is more suited to quasi-static scenarios. The

latter is an “online” approach which updates scheduling

weights and solves a weighted sum rate maximization

problem at each time slot. Hence, its computational complexity may be

significantly higher than that of EC, but it is better suited to

dynamic time-varying scenarios. With the aid of computer

simulations, we show that both the proposed schedulers

guarantee network stability within the network capacity.

Particularly, in comparison with a baseline scheme, the

proposed schedulers achieve much smaller queuing backlogs,

much smaller backlog fluctuations, and much lower packet

end-to-end delays.

Index Terms—mmWave, relay network, scheduling,

network stability, end-to-end delay, network capacity

I. INTRODUCTION

Migration towards millimeter wave (mmWave) bands

(30-300 GHz) is considered a key enabler for next

generation (5G) mobile networks and beyond [1–4].

Thanks to the large available bandwidth, a mmWave

transceiver can potentially achieve individual link rates

in tens of Gbps. However, compared with the traditional

sub-6GHz frequencies, mmWave communication has three

main characteristics [4–6]: 1) High free-space isotropic propagation loss; 2) Highly directional propagation along the line of sight (LoS) and a small number of specular paths; 3) Vulnerability to obstacles. One effective way to mitigate these effects is beamforming in combination with relaying [2], where the former is achieved by utilizing large antenna arrays at both the transmitter (Tx) and receiver (Rx) sides and pointing their beams towards each other, and the latter refers to using intermediate nodes to relay the source signal to the destination [3].

†X. Song and G. Caire are with the Electrical Engineering and Computer Science Department, Technische Universität Berlin, 10587 Berlin, Germany. ∗Y. H. Ezzeldin and C. Fragouli are with the University of California, Los Angeles, CA 90095, USA.

The beamforming problem in a small cell mmWave

scenario with one base station (BS) and multiple user

equipments (UEs) has been studied in our previous

work [7–9], in which we proposed an efficient initial

beam alignment scheme to find the strongest beam pair

connecting each UE and the BS, such that the consequent

data communication phase can achieve large directivity and

beamforming gain. Under this directional communication,

the multi-user interference among different links is

negligible [3, 10–12], and thus concurrent transmissions

(i.e., spatial reuse) can be fully utilized to improve

the transmission efficiency and to increase the network

capacity.

With the increasing interest in developing small cells for

mmWave communication, how to use relays to increase

the coverage and to support high-rate mmWave wireless

connections for dense small cell deployments remains

a major challenge [3]. The relay network problem at

sub-6 GHz frequencies has been well studied in the past

decades [13–15]. A single source single destination relay

network is a classical information theoretic model [16],

and represents one step towards the understanding and

designing of general multiple multicast networks. In its

own right, there are important situations where one node

wishes to communicate a common message to a set of

other nodes (single multicast relay network), e.g., vehicle

to vehicle (V2V) communication for platooning, where the

head of the platoon sends commands to the other vehicles,

or vehicle to everything (V2X) fast emergency control,

where a road-side base station wishes to send emergency

control messages to all the vehicles in a certain area

[17, 18]. By taking into account the unique characteristics

at mmWave frequencies, the relay nodes in a mmWave

network divide the long link into some short but very

high-rate links to overcome the mmWave sensitivity to

blockages. In such a case, a link is active only if both

nodes focus their beams to face each other, which is


determined by the underlying beam scheduling scheme.

The source and destination cannot communicate with each

other directly because the distance between them is too

large to achieve the required data rate and/or some

obstacles in between prevent direct communication.

Consider a general half-duplex (HD) relay network, where

all the nodes are assumed to work in HD mode and thus

cannot simultaneously transmit and receive¹. Although

the optimal beam directions for each node pair can be

obtained through an initial BA phase, how to efficiently

schedule the beams, in terms of avoiding too large queuing

backlogs at the intermediate nodes as well as assuring a

small end-to-end delay, becomes an important concern in

practical network operations [12].

In this paper, we study the beam scheduling problem

for HD mmWave relay networks with arbitrary topology.

Our study will focus on developing practically relevant

scheduling algorithms guided by theoretical results on

the network approximate capacity $C_{\mathrm{cs,iid}}$ and the optimal

scheduling in mmWave network models [19].

A. Related work

While relays on sub-6 GHz bands suffer from severe interference due to their omnidirectional transmissions, the directivity of mmWave antennas significantly mitigates interference [10, 11], especially in backhaul systems [3, 12]. A large body of work has studied the mmWave relay network regime with an emphasis on one or several aspects, e.g., relay selection, congestion control, routing, and scheduling. However, we observe that the existing works lack a fundamental understanding of the information theoretic limit of the underlying relay network model.

The work in [1, 20] studied the relay selection problem, in which, once a direct LoS link is blocked, a relay selection scheme is activated to search for the best relay path in terms of the achievable data rate. The work in [1, 20] can effectively handle an accidental blockage. However, one should note that mmWave relay network settings potentially offer many more advantages than only

passively dealing with blockages. The work in [3] and

[21] focused on designing a multi-hop mmWave network

for backhauling, range extension and improved robustness

from path diversity. A main limitation in [3, 21], however,

is that it considers only single path streaming [12], i.e., the

selection of a single relay-path with the highest throughput

for each UE. Although a claim is made to maximize the

network capacity, we observe that a more fundamental

capacity exploitation between the source-destination pair

with possibly multiple relay paths is not taken into

consideration. Actually, the throughput improvement with

multiple relay paths (flows) for a single source-destination

pair has been proved in [22]. The underlying idea is to inject as much traffic demand as possible so as to activate more concurrent transmission flows. Unfortunately, the authors in [22] ignored a crucial congestion control procedure, which may result in large queue build-up and network instability. A recent work in [12] used a network utility maximization (NUM) framework to study the operation regime of mmWave relay networks, subject to an upper delay bound and network stability. As mentioned in that paper, one important suggestion is to randomly re-select some paths from the set of all available paths and then shift among the links with higher payoff (e.g., the minimum power consumption or the highest throughput). However, without a prior topology simplification to remove unnecessary links, the underlying method in [12] is very likely to split data into too many paths, resulting in increased signaling overhead and traffic congestion.

¹We assume each node is equipped with an electronically steerable antenna array to beamform in the transmitting or receiving directions, so each node works in the half-duplex (HD) mode.

In general, the Shannon capacity of an arbitrary HD

mmWave relay network is unknown and is notoriously

hard to study, since for a network with $N$ nodes, each

of which can either transmit or receive, there exist

as many as $2^N$ possible states. The classical network

optimization scheme uses a NUM framework [12, 21,

23–25], which includes a joint congestion control and

routing/scheduling, so as to accept data into the network

to maximize certain utilities and to make scheduling

decisions at each node, such that all accepted data are

delivered to intended destinations without overflowing

any queue in intermediate nodes. However, since the

network capacity is unknown, all the existing algorithms

suffer from the complexity of a multi-parameter tuning

procedure to tackle the fundamental utility-delay tradeoff

[23–25]. Recent progress in information theory [19]

proposed a Gaussian HD 1-2-1 model, which corresponds

to an idealized and simplified information theoretic relay

network. In this model, all the nodes work in HD mode. A

potential link is active only if the transmitter beam and the

receiver beam are pointing at each other. In this way, the

fundamental characteristic of directional transmission and

necessity of two-sided (Tx and Rx) beamforming to “close

the link” (i.e., achieving a sufficient received signal power

after beamforming), are captured by the 1-2-1 model. The

authors in [19] designed an algorithm that computes the

optimal schedule to achieve the approximate capacity in

polynomial time. The approximate capacity is information

theoretically optimal for the Gaussian HD 1-2-1 model

within a gap that depends only on the network size $N$

but not on the topology or the operating signal-to-noise

ratio (SNR). Moreover, this approximate capacity can be

achieved by activating only a subset of all the available

links.

By noticing the great similarities between the Gaussian

HD 1-2-1 model [19] and the HD mmWave relay network

(i.e., very high pathloss and strong directivity), in this

paper, we introduce this information theoretical result

from [19] into the operation regime of HD mmWave

relay networks. This helps us to understand the maximum


II. SYSTEM MODEL

A. Channel model

We consider a general topology for an HD mmWave

$N$-node network denoted by $\mathcal{N}_0$. The network, as shown

in Fig. 1 (c), consists of $N-2$ relays assisting the

communication between a source node (node 1) and a

destination node (node $N$)². We assume that the network

operates in slotted time, denoted by $t \ge 0$. At any

time slot, each node can point its transmitting/receiving

beam towards at most one other node along the links

corresponding to the edges of the network graph. In

addition, all the relay nodes operate in HD mode, namely,

at any time slot, each relay can be either transmitting to

or receiving from at most one node. Note that the network

graph describes the ensemble of potential links, i.e., the

links that can transmit information provided that the beam

pointing condition is satisfied. This captures the notions

of blocking and distance, i.e., two nodes in the graph are

connected by an edge if they are sufficiently close and there

is no blocking object between them. The potential links

are actually “active” when the beams of the Tx and Rx

nodes connected by the link are “aligned”. This captures

the fact that even in LoS/proximity condition, isotropic

transmission is not sufficient to achieve the desired SNR

over the link, and that beam alignment is necessary. At

any point in time, the network state is determined by

where the node beams are pointing and whether the node

is transmitting or receiving. We denote the network state

by $s$. We can mathematically model the aforementioned network operational features by introducing two discrete set variables $S_{i,t}$ and $S_{i,r}$ for each node $i \in [N]$ in state $s$. The set variable $S_{i,t}$ (respectively, $S_{i,r}$) indicates the node towards which node $i$ is pointing its Tx (respectively, Rx) beam in state $s$. With this, we have

\begin{align}
S_{i,t} &\subseteq [N]\setminus\{1, i\}, \quad |S_{i,t}| \le 1, \tag{1a}\\
S_{i,r} &\subseteq [N]\setminus\{i, N\}, \quad |S_{i,r}| \le 1, \tag{1b}\\
|S_{i,t}| + |S_{i,r}| &\le 1, \tag{1c}
\end{align}

where $S_{1,r} = S_{N,t} = \emptyset$ since the source node always transmits and the destination node always receives, and where (1c) follows from the HD operation, i.e., for any relay node $i$, if $S_{i,t} \neq \emptyset$, then $S_{i,r} = \emptyset$, and vice versa.
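As an illustration of constraints (1a)–(1c), the following minimal Python sketch checks whether a candidate network state is feasible and lists the links that are active in it. Representing a state as a dictionary that maps each node to a pair (Tx target, Rx target), with `None` standing for the empty set, is an assumption made for this example only, as are the function names `is_feasible_state` and `active_links`.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical representation: node i -> (Tx target, Rx target); None means "empty set".
State = Dict[int, Tuple[Optional[int], Optional[int]]]


def is_feasible_state(state: State, n: int) -> bool:
    """Check constraints (1a)-(1c) for every node of an n-node network."""
    for i in range(1, n + 1):
        tx, rx = state.get(i, (None, None))  # nodes not listed are idle
        # (1a): the Tx beam may point to any node except node i itself and the source 1.
        if tx is not None and (not 1 <= tx <= n or tx in (1, i)):
            return False
        # (1b): the Rx beam may point to any node except node i itself and the destination n.
        if rx is not None and (not 1 <= rx <= n or rx in (i, n)):
            return False
        # (1c): half-duplex operation, a node cannot transmit and receive in the same state.
        if tx is not None and rx is not None:
            return False
        # The source never receives and the destination never transmits.
        if (i == 1 and rx is not None) or (i == n and tx is not None):
            return False
    return True


def active_links(state: State, n: int) -> List[Tuple[int, int]]:
    """Return the links (i, j) whose Tx beam at i and Rx beam at j point at each other."""
    links = []
    for i in range(1, n + 1):
        tx, _ = state.get(i, (None, None))
        if tx is not None and state.get(tx, (None, None))[1] == i:
            links.append((i, tx))
    return links
```

For a four-node network, the state in which node 1 transmits to node 2 while node 3 transmits to node 4 passes the check and yields two concurrently active links, which is the spatial reuse that the scheduling schemes exploit.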

We denote by $H_0 \in \mathbb{C}^{N\times N}$ the matrix of complex channel coefficients between nodes in the network, with element $H_{0,[j,i]} = h_{j,i}$, $i, j \in [N]$, representing the complex channel coefficient from node $i$ to node $j$. Also, since the source node can only transmit and the destination can only receive, we have $h_{1,i} = h_{j,N} \equiv 0$ for all $i, j \in [N]$. Aside from these restrictions, the node connection and channel coefficients can be arbitrary.

²For clarity, we focus on a single source-destination pair. However,

the proposed work can be readily extended to multicasting as long as the

(approximate) network capacity and the corresponding optimal scheduling

are known.

Denote by $L_0 \in \mathbb{C}^{N\times N}$ the matrix of point-to-point link capacities, with elements $L_{0,[j,i]} = l_{j,i}$, $i, j \in [N]$, where $l_{j,i}$ is the capacity of the link from node $i$ to node $j$. Suppose that the channel inputs satisfy a unit average power constraint. The link capacity $l_{j,i}$ can then be written as

$$ l_{j,i} = \log\left(1 + G\,|h_{j,i}|^2\right), \quad \forall i, j \in [N], \tag{2} $$

where we assume the additive Gaussian noise at each node is independent and identically distributed (i.i.d.) as $\mathcal{CN}(0,1)$. The factor $G$ indicates the combined BF gain of the Tx and Rx beams in alignment condition. Following [27], we refer to the HD mmWave network described above as a Gaussian HD 1-2-1 network.
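The link capacity in (2) can be evaluated with a few lines of Python. The sketch below assumes a base-2 logarithm (so that capacities are in bits per channel use), which the text does not specify, and the helper names `link_capacity` and `capacity_matrix` are chosen for this example only.

```python
import math
from typing import List, Sequence


def link_capacity(h: complex, bf_gain: float) -> float:
    """Evaluate (2) for one link with unit transmit power and unit noise variance.

    Assumption: the logarithm is taken in base 2.
    """
    return math.log2(1.0 + bf_gain * abs(h) ** 2)


def capacity_matrix(H: Sequence[Sequence[complex]], bf_gain: float) -> List[List[float]]:
    """Build the matrix of link capacities, entry [j][i] being the capacity from node i to node j."""
    n = len(H)
    return [[link_capacity(H[j][i], bf_gain) for i in range(n)] for j in range(n)]
```

A zero channel coefficient, as imposed on the rows and columns that would let the source receive or the destination transmit, yields a zero link capacity, so absent links need no special handling.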

Note that the Gaussian capacity in (2) is fully justified

in light of our previous results in [8], where we have

shown that effectively, after beam alignment, the channel

for each link is reduced to a pure delay and Doppler shift

(all multipath is killed by directional beamforming), hence,

timing and frequency synchronization after beamforming

can be easily implemented. Therefore, the unfaded

Gaussian capacity for the links after beamforming (2) is

a good first-order model.

B. Network capacity results

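The Shannon capacity $C$ of the considered Gaussian HD 1-2-1 network is in general unknown. However, the work in [27] has proved that $C$ can be approximated by $C_{\mathrm{cs,iid}}$ as follows,

\begin{align}
C_{\mathrm{cs,iid}} &\le C \le C_{\mathrm{cs,iid}} + \mathrm{GAP}, \tag{3a}\\
C_{\mathrm{cs,iid}} &= \max_{\substack{\lambda_s:\ \lambda_s \ge 0 \\ \sum_s \lambda_s = 1}} \ \min_{\substack{\mathcal{A} \subseteq [N-1], \\ 1 \in \mathcal{A}}} \ \sum_{\substack{(j,i):\ i \in \mathcal{A}, \\ j \in \mathcal{A}^c}} \Bigg( \sum_{\substack{s:\ j \in S_{i,t}, \\ i \in S_{j,r}}} \lambda_s \Bigg)\, l_{j,i}, \tag{3b}\\
\mathrm{GAP} &= O(N \log N), \tag{3c}
\end{align}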

where (i) $\mathcal{A}$ enumerates all possible cuts in the graph representing the network topology, such that the source node 1 always belongs to $\mathcal{A}$, and $\mathcal{A}^c = [N]\setminus\mathcal{A}$; (ii) $s$ ranges over all possible network states of the HD 1-2-1 network, with each network state $s$ corresponding to specific values for the set variables $S_{i,t}$ and $S_{i,r}$ as defined in (1); (iii) $\{\lambda_s\}$, i.e., the optimization variables, are the fractions of time for which each state $s$ is active. We refer to a schedule as the collection of $\{\lambda_s\}$ for all feasible states, such that they sum up at most to 1. The expression in (3b) can be explained as maximizing a graph-theoretical min-cut over all feasible schedules of the HD 1-2-1 network. For any Gaussian HD 1-2-1 network, $C_{\mathrm{cs,iid}}$ is the approximate capacity of the network: there exists a gap in comparison with the Shannon capacity $C$, and this gap depends only on the network size $N$, as shown in (3c).

In [19], it was shown that $C_{\mathrm{cs,iid}}$ in (3b) can be efficiently computed by solving an equivalent linear program (LP), where the state activation times $\{\lambda_s\}$ are replaced by link activation times $\{\lambda_{j,i}\}$. Let $\bar{\Lambda} \in \mathbb{C}^{N\times N}$ be the average link activation time fraction matrix with elements $\bar{\Lambda}_{[j,i]} = \lambda_{j,i}$.

Then, it follows that

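\begin{align}
C_{\mathrm{cs,iid}} = \max\ & \sum_{j=1}^{N} F_{j,1}, \tag{4a}\\
\text{s.t.}\ \ & 0 \le F_{j,i} \le \lambda_{j,i}\, l_{j,i}, && i, j \in [N], \tag{4b}\\
& \sum_{j \in [N]} F_{j,i} = \sum_{k \in [N]} F_{i,k}, && i \in [N-1]\setminus\{1\}, \tag{4c}\\
& \lambda_{j,i} \ge 0, && i, j \in [N], \tag{4d}\\
& \tilde{\lambda}_{i,j} = \lambda_{j,i} + \lambda_{i,j}, && i \in [N-1],\ j \in [N]\setminus[i], \tag{4e}\\
& \tilde{\lambda}_{i,j} \ge 0, && i \in [N-1],\ j \in [N]\setminus[i], \tag{4f}\\
& \sum_{\substack{(i,j):\ i = k \text{ or } j = k \\ i < j}} \tilde{\lambda}_{i,j} \le 1, && k \in [N], \tag{4g}\\
& \sum_{\substack{i \in \mathcal{S},\ j \in \mathcal{S} \\ i < j}} \tilde{\lambda}_{i,j} \le \frac{|\mathcal{S}| - 1}{2}, && \mathcal{S} \subseteq [N],\ |\mathcal{S}| \text{ is odd}, \tag{4h}
\end{align}

where $F_{j,i}$ represents the data flow through the link of capacity $l_{j,i}$, $\lambda_{j,i}$ represents the fraction of time in which the link from node $i$ to node $j$ is active, and $\tilde{\lambda}_{i,j}$ in (4e) collects the total fraction of time in which the link between nodes $i$ and $j$ is active in either direction. Note that although the relay links satisfy reciprocity with $l_{j,i} = l_{i,j}$, $i, j \in [N-1]\setminus\{1\}$, the corresponding link activation times $\lambda_{i,j}$ and $\lambda_{j,i}$ are not necessarily equal.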
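To make the LP in (4) concrete, consider the smallest non-trivial case: a three-node line network in which node 1 reaches node 3 only through relay node 2. Under the assumption that there is no direct link from node 1 to node 3, the half-duplex constraint (4g) at the relay reduces to $\lambda_{2,1} + \lambda_{3,2} \le 1$, and the flow is limited by $\min(\lambda_{2,1} l_{2,1}, \lambda_{3,2} l_{3,2})$. Balancing the two terms gives the rate $l_{2,1} l_{3,2} / (l_{2,1} + l_{3,2})$. The Python sketch below evaluates this closed form and cross-checks it with a brute-force grid search; the function names are hypothetical and the example is not a general LP solver.

```python
from typing import Tuple


def line_network_rate(l21: float, l32: float) -> Tuple[float, float, float]:
    """Return (rate, lam21, lam32) for the path 1 -> 2 -> 3.

    Assumes no direct link from node 1 to node 3. The relay is half-duplex,
    so lam21 + lam32 <= 1, and the flow is min(lam21 * l21, lam32 * l32).
    """
    if l21 <= 0.0 or l32 <= 0.0:
        return 0.0, 0.0, 0.0
    lam21 = l32 / (l21 + l32)  # time share of the first hop
    lam32 = l21 / (l21 + l32)  # time share of the second hop
    return lam21 * l21, lam21, lam32


def grid_search_rate(l21: float, l32: float, steps: int = 1000) -> float:
    """Brute-force check: scan lam21 on a grid with lam32 = 1 - lam21."""
    best = 0.0
    for k in range(steps + 1):
        lam21 = k / steps
        best = max(best, min(lam21 * l21, (1.0 - lam21) * l32))
    return best
```

The weaker hop receives the larger time share, which is the basic mechanism by which a schedule trades activation time across links of unequal capacity.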

Remark 1: Although the LP in (4) has an exponential

number of constraints, it has been shown in [19] that

using the ellipsoid method, the optimal solution for (4) can

be found in polynomial-time in N. The approach relies

on constructing a polynomial-time separation oracle for

the ellipsoid method for the HD 1-2-1 network using the

concept of Gomory-Hu trees [28]. We refer to [19] for more

comprehensive details. Throughout this work, we will use

the approximate capacity $C_{\mathrm{cs,iid}}$ in (4) as a prior to bound the

network capacity. ♦

C. Network stability and end-to-end delay

All the exogenous arrivals first enter the transport layer

at the source node, and this data is held in storage reservoirs

to await acceptance to the network layer. The resulting

source admission rate is determined by a congestion control

mechanism. Assume that the transport layer reservoir at

the source node 1 is infinitely backlogged. We denote by $x_1(t)$ the source admission rate at slot $t$. We say that a network is stable for an average admission rate $\bar{x}_1 = \lim_{T\to\infty} \frac{1}{T}\sum_{t=1}^{T} x_1(t)$ if there exists a scheduling strategy such that the average backlog of all queues is finite. A well known result [23] is that the network can be stabilized for any $\bar{x}_1 < C$, where $C$ is the Shannon capacity of the network. Considering a first-in-first-out (FIFO) system, we assume that only the packets currently in node $i$ at the beginning of slot $t$ can be transmitted during that slot. Let $D_i(t)$ and $A_i(t)$ be the transmitting and arriving processes at node $i$, respectively. The arriving process $A_i(t)$ is composed of random exogenous arrivals as well as endogenous arrivals resulting from routing and transmission decisions from other nodes of the network. We assume that the $A_i(t)$ arrivals occur at the end of each slot $t$, so that they cannot be transmitted during that slot. Accordingly, the slot-to-slot dynamics of the queuing backlog $U_i(t)$ satisfies

$$ U_i(t+1) = \max\left[U_i(t) - D_i(t),\ 0\right] + A_i(t). \tag{5} $$

To evaluate the network stability under a certain scheduling scheme, we define the network average sum backlog as

$$ \bar{U} = \lim_{T\to\infty} \frac{1}{T}\sum_{t=1}^{T}\sum_{i=1}^{N-1} U_i(t) = \sum_{i=1}^{N-1}\left(\lim_{T\to\infty} \frac{1}{T}\sum_{t=1}^{T} U_i(t)\right) = \sum_{i=1}^{N-1} \bar{U}_i, \tag{6} $$

where $\bar{U}_i = \lim_{T\to\infty} \frac{1}{T}\sum_{t=1}^{T} U_i(t)$ denotes the time average backlog in the queue of node $i$. Here we have implicitly ignored the destination node since the backlog at the destination node $U_N(t)$ is always zero.
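The queue recursion (5) and a finite-horizon version of the average sum backlog (6) can be sketched in Python as follows. The per-slot departure and arrival values are supplied by the caller, the limit over $T$ is replaced by an average over the simulated slots, and the function names are assumptions made for this example.

```python
from typing import List, Sequence


def step_backlog(u: float, d: float, a: float) -> float:
    """One slot of (5): serve up to d from the backlog u, then add the arrivals a."""
    return max(u - d, 0.0) + a


def average_sum_backlog(departures: Sequence[Sequence[float]],
                        arrivals: Sequence[Sequence[float]]) -> float:
    """Finite-horizon estimate of (6).

    departures[t][i] and arrivals[t][i] hold D_i(t) and A_i(t) for the queued
    nodes (the destination is excluded). All queues start empty, and U_i(t)
    is sampled at the beginning of each slot.
    """
    num_slots = len(departures)
    u: List[float] = [0.0] * len(departures[0])
    total = 0.0
    for d_t, a_t in zip(departures, arrivals):
        total += sum(u)  # sum over nodes of U_i(t) at the start of slot t
        u = [step_backlog(ui, di, ai) for ui, di, ai in zip(u, d_t, a_t)]
    return total / num_slots
```

Because arrivals are added after the service term, a packet that arrives in slot $t$ can leave in slot $t+1$ at the earliest, matching the assumption that arrivals occur at the end of each slot.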

Note that if the average admission rate at the source node $\bar{x}_1$ exceeds the network capacity, the network becomes unstable regardless of the underlying scheduling scheme. However, within the network capacity region, a superior beam scheduling scheme should achieve a smaller average backlog as defined in (6). In fact, by Little's theorem [29], a small average backlog also implies a small end-to-end delay. Here the end-to-end delay refers to the time taken for a packet to be transmitted across the network from the source node 1 to the destination node $N$. The end-to-end delay comes from several sources, including transmission delay, propagation delay, processing delay and queuing delay. We assume that the slot duration is long enough that the aforementioned transmission, propagation and processing times are included within each slot. Moreover, the slot duration remains constant regardless of the coding and scheduling policies. Accordingly, the most time-consuming part is the queuing delay [30]. By Little's theorem [29], the average queuing delay $\bar{\omega}$ that a packet spends in the network satisfies $\bar{\omega} = \bar{U}/\bar{x}_1$. In the later simulation section, we will evaluate the network stability as well as the packet end-to-end delay performance with respect to (w.r.t.) different scheduling schemes.
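As a rough illustration of the relation $\bar{\omega} = \bar{U}/\bar{x}_1$, the sketch below plugs in the sum-backlog range of [93, 98] packets and the admission rate of 15 packets/slot that the numerical results report for the BP scheduler on Exp1. The function name is hypothetical, and treating the endpoints of the fluctuation range as bounds on the time-average backlog is an assumption made here for illustration.

```python
def average_queuing_delay(avg_backlog, avg_admission_rate):
    """Little's theorem: average delay (slots) = average backlog / average rate."""
    if avg_admission_rate <= 0:
        raise ValueError("the average admission rate must be positive")
    return avg_backlog / avg_admission_rate


# Exp1, BP scheduler: backlog fluctuates within [93, 98] packets at 15 packets/slot.
low = average_queuing_delay(93, 15)
high = average_queuing_delay(98, 15)
print(f"average queuing delay between {low:.2f} and {high:.2f} slots")
```

Under these assumptions the average queuing delay is roughly 6.2 to 6.5 slots, which is consistent with the maximum delay below 13 slots reported for the same setting.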

D. The NUM framework for joint congestion control and

scheduling

When the exogenous arrival rates are outside the network

capacity region, the network cannot be stabilized without

a congestion control mechanism to limit the amount

of data that is admitted. The classical NUM (network

utility maximization) framework controls the admission


Algorithm 1: The network utility maximization (NUM) framework for joint congestion control and scheduling

Initialization:
Choose $V > 0$ and $x_{\max} > 0$ as constant parameters. Initialize the queue backlog at the beginning of time slot $t = 1$ as $U_i(1) = 0, \forall i \in [N]$.

Iteration:
In each time slot $t \geq 1$, repeat the following three steps.

1. Scheduling: At the beginning of each slot $t$, define the differential backlog weight matrix $W(t)$, with elements $W(t)_{[j,i]} = \max\{U_i(t) - U_j(t), 0\}$ for all $j, i \in [N]$. Then choose the scheduling decision matrix $\Lambda(t) \in \mathbb{C}^{N\times N}$ and the link rate allocation matrix $R(t) \in \mathbb{C}^{N\times N}$ as the solution to the following optimization problem

$$\Lambda(t), R(t) = \arg\max \sum_{j=1}^{N} \sum_{i=1}^{N} \left( W(t) \odot \Lambda(t) \odot R(t) \right)_{[j,i]} \tag{7a}$$

$$\text{s.t.} \quad R(t)_{[j,i]} \leq l_{j,i}, \quad \forall i, j \in [N] \tag{7b}$$

$$\Lambda(t) \in \mathcal{I}, \tag{7c}$$

where $\odot$ denotes the element-wise product, $l_{j,i}$ is the link capacity defined in (2), and $\mathcal{I}$ consists of all feasible link activation sets, i.e., all sets of links that can be simultaneously activated.

2. Congestion control: For the source node $i = 1$, calculate the admission rate $x_1(t)$ as the solution to the following optimization problem

$$x_1(t) = \arg\max \; V \cdot g_1(x_1(t)) - x_1(t) \cdot U_1(t) \tag{8a}$$

$$\text{s.t.} \quad x_1(t) \in [0, x_{\max}], \tag{8b}$$

where the utility function $g_1(\cdot)$ is assumed to be non-decreasing and concave, and $x_{\max}$ is a large constant.

3. Queuing update: For each node $i \in [N-1]$, update the queue backlogs for the next time slot as

$$U_i(t+1) = U_i(t) - \sum_{j \in \mathcal{O}(i)} R(t)_{[j,i]} + \sum_{j \in \mathcal{I}(i)} R(t)_{[i,j]} + x_1(t) \cdot 1\{i = 1\}, \tag{9}$$

where $\mathcal{O}(i)$ and $\mathcal{I}(i)$ represent the sets of outgoing links and incoming links of node $i$, respectively, and $1\{\cdot\}$ is an indicator function that takes the value 1 if the underlying condition is true, and 0 otherwise.

congestion via an optimization of the utility function $g_1(x_1(t))$, which represents the “satisfaction” received by sending the commodity data from source node 1 to the destination node $N$ at an admission rate of $x_1(t)$. The network is then stabilized by applying the backpressure algorithm at each time slot $t$ [12, 23, 25, 31]. Define $R(t) \in \mathbb{C}^{N\times N}$ and $\Lambda(t) \in \mathbb{C}^{N\times N}$ as the link rate allocation and the scheduling decision matrices at time slot $t$, respectively. The scheduling decision matrix has elements $\Lambda(t)_{[j,i]} = 1$ if link $(i, j)$ is activated, and 0 otherwise. We summarize the conceptual NUM framework in Algorithm 1. As discussed before, in most of the literature it is not clear how to tune the algorithm parameters $V$ and $x_{\max}$, which often requires an empirical trial-and-error procedure. In contrast, by knowing $C_{\mathrm{cs,iid}}$ in our proposed scheme, we can dispense with the parameters $V$ and $x_{\max}$ and set the source admission rate to a simple constant. Moreover, we exploit the insight on the underlying optimization problem to simplify the network, so as to significantly reduce the network topology and scheduling complexity. In what follows we present our scheduling schemes in more detail.
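To make the role of $V$ and $x_{\max}$ in the congestion control step (8) concrete, the sketch below solves (8) in closed form for the linear utility $g_1(x) = x$, which is the utility the paper later uses for the baseline scheme. The function names and the grid-search cross-check are illustrative assumptions, not part of the paper.

```python
def admission_rate_linear(v, u1, x_max):
    """Solve (8) for g1(x) = x: maximize (V - U1) * x over x in [0, x_max]."""
    # The objective is linear in x, so the maximizer sits at an endpoint:
    # admit at full rate while the source backlog is below V, otherwise stop.
    return x_max if u1 < v else 0.0


def admission_rate_grid(v, u1, x_max, utility, steps=1000):
    """Brute-force check of (8) on a uniform grid for any concave utility."""
    grid = [x_max * k / steps for k in range(steps + 1)]
    return max(grid, key=lambda x: v * utility(x) - x * u1)


# (V, x_max) = (200, 50) is one of the parameter sets used for the baseline.
print(admission_rate_linear(200, 150, 50))                # backlog below V
print(admission_rate_linear(200, 250, 50))                # backlog above V
print(admission_rate_grid(200, 150, 50, lambda x: x))     # agrees with closed form
```

With a linear utility the source therefore behaves like an on/off controller with threshold $V$ on its own backlog, which illustrates why larger $V$ tends to produce larger backlogs.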

III. PROPOSED BEAM SCHEDULING METHODS

In this section we first introduce a pre-processing procedure

to simplify the network topology, on top of which two

beam scheduling schemes are provided: the deterministic

edge coloring (EC) scheduler and the adaptive backpressure

(BP) scheduler.

A. The prior network topology simplification

The topology of the original network $\mathcal{N}_0$ hinges on the link capacity matrix $L_0$. Namely, a link connection $(i, j)$ exists only if $L_{0,[j,i]} = l_{j,i} > 0$. However, based on the link activation time $\bar{\Lambda}_{[j,i]} = \lambda_{j,i}$ as calculated in (4), some links need not be activated at all in order to approach the network approximate capacity $C_{\mathrm{cs,iid}}$. Aiming at reducing the scheduling complexity, we define a new associated $N$-node network $\mathcal{N}$ with link capacity matrix $L$ given by

$$L_{[j,i]} = \begin{cases} l_{j,i}, & \lambda_{j,i} > 0 \\ 0, & \text{otherwise.} \end{cases} \tag{10}$$
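The masking in (10) amounts to a single element-wise test. Below is a minimal sketch on a hypothetical 3-node network; the matrices and the function name are made up for illustration, and the row/column convention follows the paper (entry $[j,i]$ refers to link $(i,j)$).

```python
def simplify_network(l0, lam):
    """Apply (10): keep capacity l0[j][i] only where the activation time lam[j][i] > 0."""
    n = len(l0)
    return [[l0[j][i] if lam[j][i] > 0 else 0 for i in range(n)] for j in range(n)]


# Hypothetical example: the direct link from node 1 to node 3 (capacity 4)
# has zero activation time and is therefore dropped.
l0 = [[0, 0, 0],
      [5, 0, 0],
      [4, 6, 0]]
lam = [[0, 0, 0],
       [0.5, 0, 0],
       [0, 0.5, 0]]
simplified = simplify_network(l0, lam)
print(simplified)
```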

The new network $\mathcal{N}$ is a simplified version of the original network $\mathcal{N}_0$ and contains only the links that are necessary to use w.r.t. the network approximate capacity $C_{\mathrm{cs,iid}}$. We consider a running example as shown in Fig. 2 (a). Without loss of generality, we assume that the link capacities $l_{j,i}$

are in the unit of packet per slot (packet/slot). The link


B. The deterministic edge coloring (EC) beam scheduler

The EC scheduler leverages the similarities between network states in HD and edge coloring in a graph [32, 33]. In particular, an edge coloring assigns colors to edges in a graph such that no two adjacent edges are colored with the same color. Similarly, in HD, a network node cannot be a receiver and a transmitter simultaneously. Consider the same running example as illustrated in Fig. 2 with link capacity matrix $L$ (after the prior network simplification) and its associated link activation time matrix $\bar{\Lambda}$, as defined in (12). Let $M$ be a common multiple of the denominators in $\bar{\Lambda}$. We construct an associated multigraph $\mathcal{N}_1$ w.r.t. the network $\mathcal{N}$, as illustrated in Fig. 2 (c), where the set of nodes is the same as in $\mathcal{N}$ and each link $(i, j)$ with capacity $L_{[j,i]} > 0$ is replaced by $n_{j,i}$ parallel edges, given by

$$n_{j,i} = \begin{cases} M \cdot \bar{\Lambda}_{[j,i]}, & L_{[j,i]} > 0 \\ 0, & \text{otherwise.} \end{cases} \tag{15}$$

It is not difficult to see that $n_{j,i} \in \mathbb{Z}, \forall i, j \in [N]$. It follows that the maximum node degree $\Delta$ of the graph $\mathcal{N}_1$ can be written as

$$\Delta = \max_{i,j,k \in [N]} \{ n_{j,k} + n_{k,i} \}. \tag{16}$$

In the running example we have $M = 12$, $\Delta = 12$. The values of $n_{j,i}$ are labeled beside each edge in Fig. 2(c).

Our proposed EC scheduler is applied on $\mathcal{N}$ and $\mathcal{N}_1$ consecutively, and consists of two procedures, namely, the Path Partitioning (PP) and the Alternating Coloring (AC) procedures described below.
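The construction in (15) and the quantity in (16) can be evaluated with exact fractions. The sketch below uses a hypothetical 3-node line network rather than the running example, takes $M$ as the least common multiple of the denominators (one valid choice of common multiple), and evaluates (16) literally as written. It assumes Python 3.9 or later for `math.lcm`; the function names are illustrative.

```python
from fractions import Fraction
from math import lcm


def multigraph_edges(lam):
    """Return (M, n) with n[j][i] = M * lam[j][i], following (15)."""
    denominators = [f.denominator for row in lam for f in row if f > 0]
    m = lcm(*denominators) if denominators else 1
    # M is a multiple of every denominator, so each product is an exact integer.
    n = [[int(m * f) for f in row] for row in lam]
    return m, n


def max_degree(n):
    """Evaluate (16) as written: the maximum over i, j, k of n[j][k] + n[k][i]."""
    size = range(len(n))
    return max(n[j][k] + n[k][i] for i in size for j in size for k in size)


# Hypothetical activation times: link 1 -> 2 active 1/2 of the time, 2 -> 3 active 1/3.
zero = Fraction(0)
lam = [[zero, zero, zero],
       [Fraction(1, 2), zero, zero],
       [zero, Fraction(1, 3), zero]]
m, n = multigraph_edges(lam)
delta = max_degree(n)
print(m, n, delta)
```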

1) Path Partitioning (PP): The PP procedure is based on network $\mathcal{N}$ and gives a partition of the network links into independent paths, such that each link in $\mathcal{N}$ appears in only one path. The main motivation of the PP procedure is to provide a logical order for the subsequent coloring procedure. Also, each path resulting from the PP procedure corresponds to a simple line network with a single flow direction, which logically aligns with the overall data flow from the source node 1 to the destination node $N$. This enables us to implement a subsequent alternating coloring (AC) procedure (as provided in the next paragraph) to reduce the packet delay [34]. The PP procedure can be applied as follows: Choose a node in $\mathcal{N}$ with no incoming edges. Traverse an edge from that node to another, and erase that edge. Continue traversing and erasing edges until a node with no outgoing edge is reached. This gives a path in the partition. Then choose a new start node and repeat the process. Do this until no possible start node remains. We summarize the PP procedure in step 1) of Algorithm 2. Define $\mathcal{P}$ as the set of disjoint paths from the PP procedure and let $P$ be the number of such paths. For the running example we have $P = 3$ and the paths in $\mathcal{P}$ are illustrated in Fig. 2(d).

2) Alternating Coloring (AC): For each path in $\mathcal{P}$, replace each link in the path with its corresponding parallel edges in $\mathcal{N}_1$, with the number defined in (15). We color the edges in an alternating manner, such that any data packets entering the network $\mathcal{N}$ will be transmitted towards the destination node as soon as possible. More precisely, for each path, extract one non-colored edge for each link, if one exists. Start from the first non-colored edge. Consecutive edges in the path are alternately colored with the smallest legal color. Continue this extracting and alternately coloring process until no non-colored edge remains. We summarize the AC procedure in step 2) of Algorithm 2. The color assignments for the running example are shown in Fig. 2(d). Note that, as illustrated in Fig. 2(d), the coloring is done greedily: the first path $p_1$ is colored before moving on to the second path $p_2$, and so on. When coloring the edge in $p_2$ for the first hop, the smallest legal color is #2 since color #1 is already used for an edge connected to node 1 in $p_1$.
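The two procedures can be sketched compactly in code. The following is one possible reading of the PP and AC descriptions above, not the authors' implementation: links are written as hypothetical `(source, destination)` tuples, ties are broken by sorting, and the 4-node diamond network with single edges is made up for illustration.

```python
def path_partition(links):
    """PP: start at a node with no incoming link, follow and erase links until a sink."""
    remaining = set(links)
    paths = []
    while remaining:
        sources = {s for s, _ in remaining}
        targets = {d for _, d in remaining}
        starts = sorted(sources - targets)
        if not starts:
            raise ValueError("no start node left: the link set contains a cycle")
        node, path = starts[0], []
        while True:
            outgoing = sorted(link for link in remaining if link[0] == node)
            if not outgoing:
                break
            link = outgoing[0]
            remaining.remove(link)
            path.append(link)
            node = link[1]
        paths.append(path)
    return paths


def alternating_coloring(paths, multiplicity):
    """AC: sweep each path hop by hop, coloring one uncolored parallel edge per hop."""
    colors = {link: [] for path in paths for link in path}
    used_at = {}  # node -> colors already used on edges touching that node
    for path in paths:
        while any(len(colors[link]) < multiplicity[link] for link in path):
            for link in path:
                if len(colors[link]) == multiplicity[link]:
                    continue
                src, dst = link
                taken = used_at.setdefault(src, set()) | used_at.setdefault(dst, set())
                color = 1
                while color in taken:  # smallest legal color at both endpoints
                    color += 1
                colors[link].append(color)
                used_at[src].add(color)
                used_at[dst].add(color)
    return colors


# Hypothetical diamond network 1 -> {2, 3} -> 4 with one edge per link.
multiplicity = {(1, 2): 1, (2, 4): 1, (1, 3): 1, (3, 4): 1}
paths = path_partition(multiplicity)
colors = alternating_coloring(paths, multiplicity)
num_colors = max(c for cs in colors.values() for c in cs)
print(paths, colors, num_colors)
```

In this toy case the first hop of the second path receives color 2 because color 1 is already used at node 1, mirroring the behavior described for $p_2$ above.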

Define $K$ as the total number of unique colors used in Algorithm 2, and $\mathcal{E}_{j,i}$ as the set of colors assigned to link $(i, j)$. Once the two procedures in Algorithm 2 have finished, the subsequent beam scheduling reduces to a simple deterministic repetition among the $K$ states, where each state indexed by $k \in [K]$ corresponds to activating the links associated with the $k$-th color. In particular, for each time slot $t \geq 1$, the scheduling decision is given by

$$\Lambda(t)_{[j,i]} = 1\{\hat{t} \in \mathcal{E}_{j,i}\}, \quad i, j \in [N], \quad \hat{t} = ((t-1) \bmod K) + 1, \tag{17}$$

where $\Lambda(t)_{[j,i]} = 1$ indicates that link $(i, j)$ is activated at slot $t$, and it is idle otherwise. The actual transmission rate for link $(i, j)$ at slot $t$ is given by

$$R(t)_{[j,i]} = \min\{L(t)_{[j,i]} \cdot \Lambda(t)_{[j,i]},\, U_i(t)\}. \tag{18}$$

Accordingly, the slot-to-slot queuing evolution follows (9).
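The periodic schedule in (17) and the rate rule in (18) reduce to a modulo lookup and a minimum. The sketch below assumes a hypothetical color assignment with $K = 2$ on a 4-node diamond network; links are written as `(source, destination)` tuples and all names are illustrative.

```python
def active_links(t, colors, num_colors):
    """Links activated at slot t >= 1 under (17): those whose color set contains t_hat."""
    t_hat = ((t - 1) % num_colors) + 1
    return sorted(link for link, assigned in colors.items() if t_hat in assigned)


def link_rate(capacity, active, backlog):
    """Rate (18): capacity if the link is active, capped by the transmitter's backlog."""
    return min(capacity * (1 if active else 0), backlog)


# Hypothetical color sets for links 1->2, 2->4, 1->3, 3->4.
colors = {(1, 2): [1], (2, 4): [2], (1, 3): [2], (3, 4): [1]}
schedule = [active_links(t, colors, num_colors=2) for t in (1, 2, 3)]
print(schedule)
print(link_rate(capacity=7, active=True, backlog=3))   # limited by the backlog
print(link_rate(capacity=7, active=False, backlog=3))  # idle link
```

The schedule at slot 3 repeats the one at slot 1, which is the deterministic repetition over the $K$ states.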

Lemma 1: The proposed EC scheduler can achieve at least $\frac{1}{2}$ of the network approximate capacity, i.e.,

$$\tfrac{1}{2} C_{\mathrm{cs,iid}} < C_{\max} \leq C_{\mathrm{cs,iid}}, \tag{19}$$

where $C_{\max}$ denotes the maximum achievable data rate under the EC scheduler.

Proof. With a total number of $K$ colors used in the EC scheduler, we have $C_{\max} = \frac{\Delta}{K} C_{\mathrm{cs,iid}}$, where $K \geq \Delta$ since no two incident edges have the same color. This proves the upper bound in (19), $C_{\max} \leq C_{\mathrm{cs,iid}}$. On the other hand, the number of colors $K$ w.r.t. network graph $\mathcal{N}_1$ satisfies $K \leq 2\Delta - 1$. The proof is straightforward, since for any given edge, there are at most $\Delta - 1$ colored edges incident to each of its endpoints; thus, even if all $2\Delta - 2$ edges have different colors, there is still a single usable color. Accordingly, the proposed EC scheduler is guaranteed to achieve at least $\frac{1}{2}$ of the approximate capacity $C_{\mathrm{cs,iid}}$, with $C_{\max} > \frac{1}{2} C_{\mathrm{cs,iid}}$.

This is a strong performance guarantee, given the low complexity and simplicity of the scheduling and the fact that it yields very low latency (as shown later). Actually, a


Algorithm 2: The two procedures for the edge coloring (EC) beam scheduler

1) Procedure path partition (PP)
Initialization: Make $\mathcal{P}$ an empty list; make $\mathcal{V}$ a set that contains all the nodes in the network $\mathcal{N}$;
while $\mathcal{V}$ is nonempty do
    let $v$ be the first node in $\mathcal{V}$ that has no incoming links;
    delete $v$ from $\mathcal{V}$;
    if node $v$ has nonzero outgoing links then
        make a new path $\rho$ empty;
        $\hat{v} := v$;
        while node $\hat{v}$ has nonzero outgoing links do
            let $(w, \hat{v})$ be an outgoing link of $\hat{v}$;
            delete $(w, \hat{v})$ from $\mathcal{N}$;
            put $(w, \hat{v})$ in $\rho$;
            $\hat{v} := w$;
        end
        put path $\rho$ in $\mathcal{P}$;
    end
    if node $v$ has nonzero degree then
        put $v$ in $\mathcal{V}$
    end
end

2) Procedure alternately coloring (AC)
Initialization: Define $P$ as the number of paths in $\mathcal{P}$; define $\bar{v}_p$, $p \in [P]$, as the number of nodes in the $p$-th path; define $\mathcal{E}_{j,i}$ as the set of colors assigned to link $(i, j)$, with all $\mathcal{E}_{j,i}$ initialized as empty; replace each link in the $p$-th path with parallel edges as defined in (15);
for each path $p$ in $\mathcal{P}$ do
    while there still exists a non-colored edge in the $p$-th path do
        for $k$ in $[\bar{v}_p - 1]$ do
            assign the smallest legal color $e \in \mathbb{Z}_+$ to one of the non-colored edges in the $k$-th hop of path $p$;
            put $e$ in the corresponding set $\mathcal{E}_{j,i}$;
        end
    end
end

classical upper bound [35] on coloring the multigraph $\mathcal{N}_1$ states that an optimal coloring scheme (one that uses the minimum number of colors possible) uses at most $\Delta + \mu$ colors, where $\mu$ is the multiplicity of graph $\mathcal{N}_1$, i.e., the maximum number of edges in any bundle of parallel edges. Although not theoretically proven, we have observed in our simulations that the number of colors $K$ used by our EC scheduler satisfies $K \leq \Delta + \mu$. Hence in most cases, the EC scheduler can guarantee much more than $\frac{1}{2} C_{\mathrm{cs,iid}}$.
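As a worked instance of Lemma 1, the values of $\Delta$, $K$ and $C_{\mathrm{cs,iid}}$ reported for the two simulated examples in the numerical results ($\Delta = K = 12$, $C_{\mathrm{cs,iid}} = 15$ for Exp1; $\Delta = 18$, $K = 19$, $C_{\mathrm{cs,iid}} = 7$ for Exp2) can be substituted into $C_{\max} = \frac{\Delta}{K} C_{\mathrm{cs,iid}}$. The decimal value is rounded, and the last line restates the general bound obtained from $K \leq 2\Delta - 1$.

```latex
\begin{align*}
\text{Exp1:}\quad C_{\max} &= \tfrac{12}{12}\cdot 15 = 15 \ \text{packets/slot},\\
\text{Exp2:}\quad C_{\max} &= \tfrac{18}{19}\cdot 7 = \tfrac{126}{19} \approx 6.63 \ \text{packets/slot},\\
\text{worst case:}\quad C_{\max} &= \tfrac{\Delta}{K}\, C_{\mathrm{cs,iid}}
  \geq \tfrac{\Delta}{2\Delta-1}\, C_{\mathrm{cs,iid}} > \tfrac{1}{2}\, C_{\mathrm{cs,iid}}.
\end{align*}
```

The Exp2 value is consistent with the admission rate of 6 packets/slot used for the EC scheduler in the simulations, 6 being the largest integer rate below 6.63.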

C. The adaptive backpressure (BP) beam scheduler

The EC scheduler described in the previous subsection is rather simple: once the $K$ color states are obtained, the network scheduling becomes deterministic, namely, the scheduler just needs to periodically repeat the $K$ states defined by (17). However, since the EC scheduler is computed once from the network link capacities $l_{j,i}$, it is best suited to quasi-static scenarios, and needs to be recomputed whenever significant changes in the network topology or potential link capacities occur. As an alternative approach for time-varying scenarios, we consider “online” dynamic scheduling policies that are guaranteed to achieve stability for all $x_1(t) \leq C_{\mathrm{cs,iid}}$. In particular, we consider the well-known BP algorithm [23], which is well understood to stabilize the network whenever the source admission rate lies within the capacity region of the network.

Define the differential backlog weight matrix $W(t) \in \mathbb{C}^{N\times N}$ with elements given by

$$W(t)_{[j,i]} = \begin{cases} \max\{U_i(t) - U_j(t), 0\}, & \lambda_{j,i} > 0 \\ 0, & \text{otherwise,} \end{cases} \tag{20}$$

where, as mentioned before, we have intentionally ignored all the links that would never be activated ($\lambda_{j,i} = 0$) in terms of achieving the network approximate capacity $C_{\mathrm{cs,iid}}$. As a consequence, the scheduler only needs to deal with a much smaller set of links, which can significantly reduce the scheduling complexity. Similarly, define the candidate transmit rate matrix $\hat{R}(t) \in \mathbb{C}^{N\times N}$ with elements given by

$$\hat{R}(t)_{[j,i]} = \begin{cases} \min\{U_i(t), L_{[j,i]}\}, & \lambda_{j,i} > 0 \\ 0, & \text{otherwise.} \end{cases} \tag{21}$$

Then choose the scheduling matrix $\Lambda(t)$ at slot $t$ as the solution to the following binary integer program (BIP) optimization problem

$$\Lambda(t) = \arg\max \sum_{j=1}^{N} \sum_{i=1}^{N} \left( W(t) \odot \hat{R}(t) \odot \Lambda(t) \right)_{[j,i]} \tag{22a}$$

$$\text{s.t.} \quad \Lambda(t)_{[j,i]} \in \{0, 1\}, \tag{22b}$$

$$\|\Lambda(t)_{[:,i]}\|_1 + \|\Lambda(t)_{[i,:]}\|_1 \leq 1, \quad i \in [N], \tag{22c}$$

where $\odot$ denotes the element-wise product, (22b) denotes the binary activation constraint, and (22c) indicates the HD 1-2-1 network operating constraint, i.e., a node can at most receive from (or transmit to) one node and cannot do both simultaneously. Accordingly, the actual link transmission rate due to (22) is given by

$$R(t)_{[j,i]} = \hat{R}(t)_{[j,i]} \cdot \Lambda(t)_{[j,i]}, \tag{23}$$

and the slot-to-slot queuing evolution follows the procedure of (9).
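For small networks, one BP decision per (20) to (22) can be found by exhaustive search over node-disjoint link sets. The sketch below is an illustrative brute force, not the polynomial-time method and not the authors' solver; links are hypothetical `(source, destination)` tuples, and the backlogs and capacities are made up.

```python
def bp_schedule(backlog, capacity):
    """Pick node-disjoint links maximizing the sum of weight * candidate rate.

    backlog:  dict node -> queue length U_i(t)
    capacity: dict (src, dst) -> capacity of a link kept after simplification
    """
    candidates = []
    for (src, dst), cap in capacity.items():
        weight = max(backlog[src] - backlog[dst], 0)  # differential backlog (20)
        rate = min(backlog[src], cap)                 # candidate rate (21)
        if weight * rate > 0:
            candidates.append((weight * rate, src, dst))

    best = (0, [])

    def search(index, busy, value, chosen):
        nonlocal best
        if value > best[0]:
            best = (value, list(chosen))
        for k in range(index, len(candidates)):
            gain, src, dst = candidates[k]
            if src in busy or dst in busy:
                continue  # (22c): each node takes part in at most one active link
            chosen.append((src, dst))
            search(k + 1, busy | {src, dst}, value + gain, chosen)
            chosen.pop()

    search(0, frozenset(), 0, [])
    return best


# Hypothetical diamond network; node 4 is the destination with zero backlog.
backlog = {1: 10, 2: 4, 3: 6, 4: 0}
capacity = {(1, 2): 7, (1, 3): 7, (2, 4): 9, (3, 4): 7}
value, links = bp_schedule(backlog, capacity)
print(value, links)
```

The search is exponential in the number of candidate links, so it is only meant for toy instances such as this one.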

It is well known that BP is able to stabilize the network whenever the source admission rate lies within the capacity region of the network [23]. In the following section, we compare the performance of our two proposed algorithms with a “standard” baseline scheme, and show how applying the EC and BP schedulers on top of the simplified network $\mathcal{N}$ can significantly reduce the scheduling complexity, thus yielding smaller queuing backlogs. Also, the packets experience much smaller end-to-end delays.

Remark 3: Note that although (22) is an integer linear program, the convex hull of its feasible points can be represented by a set of linear inequalities using Edmonds' matching polytope [36]. A linear program over the matching polytope, although it has an exponential number of constraints, can be solved in polynomial time using the ellipsoid method [19]. ♦

IV. NUMERICAL RESULTS

In this section, we investigate the numerical performance

of the proposed EC and BP schedulers, and compare

them with a “standard” baseline scheme. We start off by

presenting our simulation scenarios and then discuss our

baseline comparison before delving into the simulation

results.

Simulated Examples. We consider two running examples (two random network topologies) and denote the two examples by Exp1 and Exp2, respectively. The first running example Exp1 is the same network $\mathcal{N}_0$ used in the previous subsections, with total number of nodes $N = 7$ and the link capacity matrix $L_0$ given by (11). By solving (4) and (10) for Exp1, the network approximate capacity is $C_{\mathrm{cs,iid}} = 15$ packets/slot. The link activation time fraction matrix $\bar{\Lambda}$ and the link capacity matrix $L$ for the simplified network $\mathcal{N}$ are shown in (12). The second running example Exp2 again has $N = 7$ nodes. The link capacity matrix for Exp2 is given by

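$$L_0 = \begin{bmatrix}
0 & 0 & 0 & 0 & 0 & 0 & 0 \\
7 & 0 & 6 & 9 & 8 & 9 & 0 \\
7 & 6 & 0 & 7 & 6 & 6 & 0 \\
9 & 9 & 7 & 0 & 6 & 7 & 0 \\
9 & 8 & 6 & 6 & 0 & 6 & 0 \\
8 & 9 & 6 & 7 & 6 & 0 & 0 \\
0 & 6 & 7 & 6 & 6 & 7 & 0
\end{bmatrix}. \tag{24}$$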

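Following the approach in (4) and (10), the link activation time fraction matrix $\bar{\Lambda}$ and the link capacity matrix $L$ of the simplified network $\mathcal{N}$ satisfy

$$\bar{\Lambda} = \begin{bmatrix}
0 & 0 & 0 & 0 & 0 & 0 & 0 \\
\frac{1}{2} & 0 & 0 & 0 & 0 & 0 & 0 \\
\frac{1}{2} & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & \frac{7}{18} & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & \frac{1}{2} & 0 & 0 & 0 \\
0 & 0 & \frac{1}{2} & 0 & 0 & \frac{1}{2} & 0
\end{bmatrix} \tag{25}$$

and

$$L = \begin{bmatrix}
0 & 0 & 0 & 0 & 0 & 0 & 0 \\
7 & 0 & 0 & 0 & 0 & 0 & 0 \\
7 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 9 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 7 & 0 & 0 & 0 \\
0 & 0 & 7 & 0 & 0 & 7 & 0
\end{bmatrix}, \tag{26}$$

respectively. The network approximate capacity reads $C_{\mathrm{cs,iid}} = 7$ packets/slot. Note that unless otherwise stated, we assume that the simulated network is static.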

Baseline scheme. As a comparison, we also consider

a baseline backpressure scheme from [23] and denote it

by BPo. The baseline scheme BPo has been commonly

used in the literature [12, 21, 31], which uses the same

NUM framework as in Algorithm 1 and the underlying

scheduling is also based on the concept of backpressure.

In contrast with our proposed beam schedulers, the BPo

does not exploit the knowledge of the network approximate

capacity and the resulting network simplification. As a

result, the congestion control in BPo requires a complex multi-parameter $(V, x_{\max})$ tuning procedure so as to tackle the fundamental utility-delay tradeoff as illustrated in (8).

Moreover, the scheduling procedure in BPo is directly

implemented on the original network topology N0, which

consequently encounters a larger scheduling complexity.

In what follows, we provide numerical results that: 1)

Evaluate the performance of our proposed schemes; 2)

Compare the performance of our proposed schemes and

the aforementioned baseline scheme. We also separately

provide simulation results for time-varying scenarios,

where the network encounters accidental blockages.

A. Performance evaluation of the proposed schemes

We consider three performance metrics, i.e., the

network stability, the packet end-to-end delay and the

queuing evolution in time-varying scenarios with accidental

blockages.

In terms of network stability, Fig. 3 (a) illustrates the numerical performance for the example network Exp1. Here the network approximate capacity is $C_{\mathrm{cs,iid}} = 15$ packets/slot, shown by the vertical dotted line. As we can see from Fig. 3 (a), within the network capacity region $x_1(t) \leq C_{\mathrm{cs,iid}}$, both the EC scheduler and the BP scheduler can guarantee a finite average backlog $\bar{U}$, i.e., can effectively stabilize the network. In particular,


[Figure 3, panels (a) and (b). Horizontal axis: admission rate $x_1(t)$ (packet/slot); vertical axis: average backlog $\bar{U}$ (packets); $C_{\mathrm{cs,iid}}$ is marked in each panel.]

Fig. 3: The average backlog $\bar{U}$ with respect to different source admission rates $x_1(t)$. (a) Evaluation on the running example Exp1, where the network approximate capacity reads $C_{\mathrm{cs,iid}} = 15$ packets/slot, the maximum node degree of the corresponding associated multigraph is $\Delta = 12$, and the total number of unique colors used in the EC scheduler reads $K = 12 = \Delta$. (b) Evaluation on the running example Exp2, with $C_{\mathrm{cs,iid}} = 7$ packets/slot, $\Delta = 18$ and $K = 19 > \Delta$.

since the total number of unique colors used in the EC scheduler equals the maximum node degree of the corresponding associated multigraph $\mathcal{N}_1$, with $K = \Delta = 12$, the maximum achievable data rate under the EC scheduler reaches exactly the maximum network capacity point with $C_{\max} = \frac{\Delta}{K} C_{\mathrm{cs,iid}} = C_{\mathrm{cs,iid}}$. The stability evaluation w.r.t. the example network Exp2 is shown in Fig. 3 (b), where the network approximate capacity is $C_{\mathrm{cs,iid}} = 7$ packets/slot. In this particular example, the total number of unique colors used in the EC scheduler is slightly greater than the maximum node degree of the corresponding associated multigraph $\mathcal{N}_1$, with $K = 19$ colors and $\Delta = 18$ maximum degree. As a result, the maximum achievable data rate for Exp2 using the EC scheduler is $C_{\max} = \frac{\Delta}{K} C_{\mathrm{cs,iid}} < C_{\mathrm{cs,iid}}$. Apart from this, the numerical performance in Exp2 is similar to that in Exp1. As shown in Fig. 3 (b), within the corresponding capacity ranges, i.e., $x_1(t) \leq C_{\max}$ for the EC scheduler and $x_1(t) \leq C_{\mathrm{cs,iid}}$ for the BP scheduler, both schedulers can efficiently stabilize the network with finite average backlog $\bar{U}$. With the same source admission $x_1(t) \leq C_{\max}$, the average backlog $\bar{U}$ with the BP scheduler is slightly smaller than that with the EC scheduler.

Note that, although the BP scheduler shows a slight benefit over the EC scheduler in Fig. 3 (a)-(b), this should be weighed against its operational complexity. The BP scheduler must solve a weighted sum rate maximization (22) at each time slot, while the EC scheduler uses only a one-time computation followed by periodic state repetition.

In terms of packet end-to-end delay, Fig. 4 (a)-(b) illustrates the numerical performance w.r.t. Exp1. Here the end-to-end delay indicates how long the packets are delayed in the queues during the transmission from the source node to the destination node. The cumulative distribution function (CDF) of the packet delay in Fig. 4 indicates the probability that the packet end-to-end delay is smaller than the specified delay. The packet delay distribution for example Exp1 under the EC scheduler is shown in Fig. 4 (a), where the source admission rate is set as $x_1(t) = 12$ packets/slot. As we can see, with probability 1 the end-to-end delay of each individual data packet is smaller than 13 slots. By increasing the source admission rate from $x_1(t) = 12$ packets/slot to $x_1(t) = 15$ packets/slot, the maximum end-to-end delay increases to 15 slots. The BP scheduler achieves similar performance, as shown in Fig. 4 (b). As we can see, with probability 1 the packet end-to-end delay is smaller than 3 slots for source admission $x_1(t) = 12$ packets/slot. This maximum delay shifts to 13 slots for source admission $x_1(t) = 15$ packets/slot.

The delay performance for Exp2 is similar, as shown in Fig. 4 (c)-(d). Namely, with probability 1 the packet end-to-end delay is smaller than 10 slots under the EC scheduler with $x_1(t) = 4$ packets/slot, 14 slots under the EC scheduler with $x_1(t) = 6$ packets/slot, 4 slots under the BP scheduler with $x_1(t) = 4$ packets/slot, and finally 5 slots under the BP scheduler with $x_1(t) = 7$ packets/slot. In short, for either of the two proposed schedulers, increasing the source admission rate $x_1(t)$ shifts the packet delay CDF curve to the right, namely, the packets experience longer delays.

The queuing evolution regarding the instantaneous (i.e., not time-averaged) sum backlog $\sum_{i=1}^{N-1} U_i(t)$ w.r.t. Exp1 is shown in Fig. 5 (a) and (b) for the static and the time-varying scenarios, respectively. Here we assume that in the static scenario, the link capacities are constant with no accidental blockages, whereas in the time-varying


[Figure 4, panels (a)-(d). Horizontal axis: end-to-end delay (slots); vertical axis: CDF. Curves: (a) EC, $x_1(t) = 12$ and $x_1(t) = 15$; (b) BP, $x_1(t) = 12$ and $x_1(t) = 15$; (c) EC, $x_1(t) = 4$ and $x_1(t) = 6$; (d) BP, $x_1(t) = 4$ and $x_1(t) = 7$.]

Fig. 4: The packet end-to-end delay distribution under the proposed EC and BP schedulers. (a) The delay distribution in Exp1 under the EC scheduler, with admission rate $x_1(t) = 12$ and $x_1(t) = 15$, respectively. (b) The delay distribution in Exp1 under the BP scheduler, with admission rate $x_1(t) = 12$ and $x_1(t) = 15$, respectively. (c) The delay distribution in Exp2 under the EC scheduler, with admission rate $x_1(t) = 4$ and $x_1(t) = 6$, respectively. (d) The delay distribution in Exp2 under the BP scheduler, with admission rate $x_1(t) = 4$ and $x_1(t) = 7$, respectively. In all cases, the maximum delay increases with the source admission rate $x_1(t)$.

scenario, link (7,6) is blocked every $T_0 = 200$ slots and each blockage lasts for 80 slots. We assume that the network state, the computation of the network approximate capacity and the overall scheduling decisions are updated every $T_{\mathrm{EC}} = 50$ slots for the EC scheduler and every $T_{\mathrm{BP}} = 1$ slot for the BP scheduler, respectively. As we can see from Fig. 5 (a), in the static scenario, the sum backlog and its fluctuations under the EC scheduler are slightly larger than those under the BP scheduler. In the time-varying scenario, however, the sum backlog and its fluctuations under the EC scheduler are much larger than those under the BP scheduler, as illustrated in Fig. 5 (b). We can observe a similar performance in the Exp2 network, as illustrated in Fig. 5 (c) and (d) for the static and the time-varying scenarios, respectively. Here we assume that in the static scenario, the link capacities are constant with no accidental blockages, while in the time-varying scenario, link (1,0) is blocked every $T_0 = 200$ slots and each blockage lasts for 40 slots. The scheduler updates are the same as in Exp1. As we can see, the performance difference between the two proposed schedulers in the static scenario is very moderate. However, in the time-varying scenario, the BP scheduler again outperforms the EC scheduler in terms of the amount of queuing backlog and its fluctuations. Therefore, we claim that the EC scheduler is more suitable for static scenarios, with much less computation and a slightly larger sum backlog than the BP scheduler. In contrast, the BP scheduler is updated in every time slot and reacts very quickly to blockages, and is thus more favorable for time-varying scenarios.


[Figure 5, panels (a)-(d). Horizontal axis: iterations (slots); vertical axis: sum backlog (packets). Curves: EC and BP with $x_1(t) = 13$ in (a)-(b), and EC and BP with $x_1(t) = 6$ in (c)-(d).]

Fig. 5: The instantaneous sum backlog $\sum_{i=1}^{N-1} U_i(t)$ w.r.t. increasing iterations (slots): (a) Exp1 in static scenario. (b) Exp1 in time-varying scenario with accidental blockages. (c) Exp2 in static scenario. (d) Exp2 in time-varying scenario with accidental blockages.

B. The performance comparison with the baseline scheme

Fig. 6 (a) compares the instantaneous sum backlog $\sum_{i=1}^{N-1} U_i(t)$ between the proposed schemes (EC, BP) and the baseline scheme (BPo) w.r.t. the running example Exp1. For the baseline scheme BPo, we choose the sum-rate utility as follows: since there is only one commodity, in the NUM framework in Algorithm 1 we have $g_1(x_1(t)) = x_1(t)$. Aiming on the one hand to approach the network capacity (w.r.t. a large value of $x_{\max}$), and on the other hand to handle the utility-delay tradeoff (w.r.t. $(O(V), O(1/V))$), we choose three sets of parameters for the baseline BPo scheme with $(V, x_{\max}) = (200, 200)$, $(V, x_{\max}) = (200, 50)$ and $(V, x_{\max}) = (50, 200)$, respectively. For our proposed schemes EC and BP, since we have managed to compute the network approximate capacity $C_{\mathrm{cs,iid}}$ as shown in (4), the congestion control reduces to a simple constant threshold given by (14). Hence we do not need a complex multi-parameter $(V, x_{\max})$ tuning procedure. We pick the point with the maximum achievable data rate (source admission rate) for the proposed schemes, i.e., $x_1(t) = 15$ packets/slot for both the EC and BP schedulers. As we can see from Fig. 6 (a), all the underlying schemes can stabilize the network since they all converge to finite backlogs. The baseline scheme BPo can approximately approach $C_{\mathrm{cs,iid}}$ in a long-term average sense, as indicated by $\bar{x}_1$. It is worth noting that the fluctuation ranges of the instantaneous sum backlog converge to [96, 105] packets under the EC scheduler and [93, 98] packets under the BP scheduler, respectively. However, these fluctuations increase to the ranges of [326, 529], [306, 404], and [118, 346] packets under the baseline BPo scheme with $(V, x_{\max}) = (200, 200)$, $(V, x_{\max}) = (200, 50)$, and $(V, x_{\max}) = (50, 200)$, respectively. Hence, the proposed schemes achieve much smaller backlog and much smaller backlog fluctuations compared with the baseline scheme.

As for the packet end-to-end delay, Fig. 6 (b) illustrates


[Figure 6, panels (a) and (b). (a) Sum backlog (packets) versus iterations (slots); (b) CDF versus end-to-end delay (slots). Curves in both panels: EC, $x_1(t) = 15$; BP, $x_1(t) = 15$; BPo #1, $\bar{x}_1 = 14.8$; BPo #2, $\bar{x}_1 = 14.7$; BPo #3, $\bar{x}_1 = 14.5$.]

Fig. 6: The performance comparison between the proposed schemes (EC, BP) and the baseline scheme (BPo) w.r.t. the first running example Exp1. (a) The instantaneous sum backlog $\sum_{i=1}^{N-1} U_i(t)$ w.r.t. increasing iterations (slots). (b) The packet end-to-end delay distribution. The multi-parameter sets in the BPo scheme are $(V, x_{\max}) = (200, 200)$, $(V, x_{\max}) = (200, 50)$ and $(V, x_{\max}) = (50, 200)$ for #1, #2 and #3, respectively.

the delay distributions w.r.t. different schemes in the

running example Exp1. As we can see, when the source

admission rate is set as x1(t) = 15 packets/slot, with

probability 1the end-to-end delay of each individual data

packet is smaller than 15 slots under the EC scheduler and

smaller than 13 slots under the BP scheduler, respectively.

However, all the curves w.r.t. the BPo scheme significantly

shift to the right side. Namely, the maximum packet

end-to-end delays under the BPo scheme with different

parameter sets are much larger than that under the proposed

schemes (>68 slots).

As illustrated in Fig. 7, the numerical results in the running example Exp2 show similar behavior to those in Exp1. Again we choose three parameter sets for the baseline scheme BPo, namely (V, xmax) = (40,50), (V, xmax) = (40,20) and (V, xmax) = (10,50). For the proposed schemes, we pick the point with the maximum achievable data rate (source admission rate), i.e., x1(t) = 6 packets/slot for the EC scheduler and x1(t) = 7 packets/slot for the BP scheduler. As we can see from Fig. 7 (a), the fluctuation ranges of the instantaneous sum backlog converge to [26,36] packets under the EC scheduler, [21,21] packets under the BP scheduler, [224,326] packets under the BPo with (V, xmax) = (40,50), [226,279] packets under the BPo with (V, xmax) = (40,20), and [50,149] packets under the BPo with (V, xmax) = (10,50). Hence, the proposed schemes achieve much smaller backlogs and much smaller backlog fluctuations than the baseline scheme. The packet end-to-end delay distribution is illustrated in Fig. 7 (b). As we can see, the maximum end-to-end delays under the proposed schemes are 14 slots (EC, x1(t) = 6) and 5 slots (BP, x1(t) = 7), respectively. However, the packets under the baseline BPo scheme with different parameter sets experience much longer delays (35 slots).
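As a rough plausibility check on these numbers, Little's formula L = λW (reference [29] of this article) relates the average backlog L, the average admission rate λ and the average delay W. The sketch below applies it to the Exp2 backlog ranges quoted above and to the rates given in the Fig. 7 legend. It assumes that the network is in steady state and that the sum backlog counts all packets in the network; the helper name `little_delay_range` is ours and purely illustrative. It yields average delays, whereas the text reports maximum delays, so the two are comparable only in order of magnitude.

```python
# Little's law: average delay W = average backlog L / average admission rate lambda.
# Backlog ranges (packets) and rates (packets/slot) are the Exp2 values quoted
# in the text and in the Fig. 7 legend.

def little_delay_range(backlog_range, rate):
    """Return the (min, max) average delay in slots implied by W = L / lambda."""
    low, high = backlog_range
    return low / rate, high / rate

exp2 = {
    "EC": ((26, 36), 6.0),
    "BP": ((21, 21), 7.0),
    "BPo #1": ((224, 326), 6.8),
    "BPo #2": ((226, 279), 6.7),
    "BPo #3": ((50, 149), 6.7),
}

delays = {name: little_delay_range(b, r) for name, (b, r) in exp2.items()}

for name, (d_min, d_max) in delays.items():
    print(f"{name}: average delay between {d_min:.1f} and {d_max:.1f} slots")
```

Under these assumptions the proposed schedulers sit at a few slots of average delay, while the BPo parameter sets with V = 40 sit above 30 slots, which agrees with the ordering of the curves in Fig. 7 (b).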

V. CONCLUSION

In this paper, we studied the beam scheduling problem

for HD mmWave relay networks with arbitrary topology.

Our study focused on developing practically relevant

scheduling algorithms guided by theoretical results on the

approximate capacity Ccs,iid and optimal scheduling in

mmWave network models [19]. Based on the theoretically

optimal schedule results, we first implemented a network

simplification procedure to reduce the network topology

complexity. Then, using this simplified topology, we proposed two simple and practical beam scheduling schemes: the deterministic edge coloring (EC) scheduler and the adaptive backpressure (BP) scheduler. The former is a one-time computation followed by a periodically repeated schedule, and hence is more suitable for quasi-static scenarios. The latter is an “online” approach that updates in every time slot, and thus is more favorable for time-varying scenarios. We have shown

through simulation that both the proposed schedulers can

guarantee the network stability within a certain operating

range of the input rate. In particular, the EC scheduler

guarantees stability for input rates less than (∆/K) Ccs,iid, where ∆ and K denote the maximum degree and the number of colors used in EC for an associated multigraph, respectively; the BP scheduler guarantees stability for rates

less than Ccs,iid. Moreover, in comparison with a standard

baseline scheme, which consists of applying classical

BP-based NUM over the whole network (without network

simplification), the proposed schedulers do not require the empirical tuning of the BP control parameters and achieve much smaller queuing backlogs and packet end-to-end delays.

Fig. 7: The performance comparison between the proposed schemes (EC, BP) and the baseline scheme (BPo) w.r.t. the second running example Exp2. (a) The instantaneous sum backlog Σ_{i=1}^{N−1} U_i(t) (packets) w.r.t. increasing iterations (slots). (b) The packet end-to-end delay distribution (CDF over the delay in slots). The curves are EC with x1(t) = 6, BP with x1(t) = 7, BPo #1 with x̄1 = 6.8, BPo #2 with x̄1 = 6.7 and BPo #3 with x̄1 = 6.7. The multi-parameter sets in the BPo scheme are (V, xmax) = (40,50), (V, xmax) = (40,20) and (V, xmax) = (10,50) for #1, #2 and #3, respectively.
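To make the EC idea concrete, the following minimal sketch colors the edges of a small multigraph and reads each color class as one slot of a periodic schedule. It is only an illustration under our own assumptions: the toy topology, the function names and the greedy coloring rule are hypothetical and are not the algorithm used in the paper, and the greedy rule may need up to 2∆ − 1 colors. Each edge stands for one link activation, and a repeated edge stands for a link that must be active in several slots per period.

```python
from collections import defaultdict

def greedy_edge_coloring(edges):
    """Give each edge the smallest color not yet used at either endpoint."""
    used = defaultdict(set)  # node -> colors already taken by incident edges
    colors = []
    for u, v in edges:
        c = 0
        while c in used[u] or c in used[v]:
            c += 1
        used[u].add(c)
        used[v].add(c)
        colors.append(c)
    return colors

def max_degree(edges):
    """Maximum number of edges (with multiplicity) incident to one node."""
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return max(degree.values())

# Hypothetical toy multigraph: source "s", relays "a" and "b", destination "d".
edges = [("s", "a"), ("s", "a"), ("s", "b"), ("a", "d"),
         ("a", "b"), ("b", "d"), ("b", "d")]

colors = greedy_edge_coloring(edges)
K = max(colors) + 1          # number of colors = schedule period in slots
delta = max_degree(edges)    # maximum degree of the multigraph

# One slot per color: edges of the same color share no node, so they can be
# activated together when every node serves at most one link per slot.
schedule = [[e for e, c in zip(edges, colors) if c == slot] for slot in range(K)]

for slot, links in enumerate(schedule):
    print(f"slot {slot}: {links}")
print(f"max degree = {delta}, colors = {K}, rate factor = {delta / K:.2f}")
```

With K colors the schedule repeats every K slots, and the printed factor ∆/K is the fraction of Ccs,iid for which the text states that the EC scheduler guarantees stability.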

REFERENCES

[1] Y. Niu, W. Ding, H. Wu, Y. Li, X. Chen, B. Ai, and Z. Zhong,

“Relay-Assisted and QoS Aware Scheduling to Overcome Blockage

in mmWave Backhaul Networks,” IEEE Transactions on Vehicular

Technology, vol. 68, no. 2, pp. 1733–1744, 2019.

[2] A. Dimas, D. S. Kalogerias, and A. P. Petropulu, “Cooperative

beamforming with predictive relay selection for urban mmWave

communications,” IEEE Access, vol. 7, pp. 157 057–157 071, 2019.

[3] Y. Yan, Q. Hu, and D. M. Blough, “Path Selection with Amplify

and Forward Relays in mmWave Backhaul Networks,” in 2018

IEEE 29th Annual International Symposium on Personal, Indoor

and Mobile Radio Communications (PIMRC), Sep. 2018, pp. 1–6.

[4] M. Shafi, J. Zhang, H. Tataria, A. F. Molisch, S. Sun, T. S.

Rappaport, F. Tufvesson, S. Wu, and K. Kitao, “Microwave vs.

Millimeter-Wave Propagation Channels: Key Differences and Impact

on 5G Cellular Systems,” IEEE Communications Magazine, vol. 56,

no. 12, pp. 14–20, 2018.

[5] A. F. Molisch, V. V. Ratnam, S. Han, Z. Li, S. L. H. Nguyen,

L. Li, and K. Haneda, “Hybrid beamforming for massive MIMO:

A survey,” IEEE Communications Magazine, vol. 55, no. 9, pp.

134–141, 2017.

[6] R. W. Heath, N. Gonzalez-Prelcic, S. Rangan, W. Roh, and A. M.

Sayeed, “An overview of signal processing techniques for millimeter

wave MIMO systems,” IEEE journal of selected topics in signal

processing, vol. 10, no. 3, pp. 436–453, 2016.

[7] X. Song, S. Haghighatshoar, and G. Caire, “A scalable and

statistically robust beam alignment technique for mm-Wave

systems,” IEEE Trans. on Wireless Comm., vol. PP, pp. 1–1, 2018.

[8] X. Song, S. Haghighatshoar, and G. Caire, “Efficient Beam

Alignment for Millimeter Wave Single-Carrier Systems With

Hybrid MIMO Transceivers,” IEEE Transactions on Wireless

Communications, vol. 18, no. 3, pp. 1518–1533, 2019.

[9] X. Song, T. Kühne, and G. Caire, “Fully-/Partially-Connected

Hybrid Beamforming Architectures for mmWave MU-MIMO,”

IEEE Transactions on Wireless Communications, vol. 19, no. 3, pp.

1754–1769, 2020.

[10] Y. Xu, H. Shokri-Ghadikolaei, and C. Fischione, “Distributed

Association and Relaying With Fairness in Millimeter Wave

Networks,” IEEE Transactions on Wireless Communications,

vol. 15, no. 12, pp. 7955–7970, 2016.

[11] Y. Xu, G. Athanasiou, C. Fischione, and L. Tassiulas,

“Distributed Association Control and Relaying in Millimeter

Wave Wireless Networks,” in 2016 IEEE International Conference

on Communications (ICC), 2016, pp. 1–6.

[12] T. K. Vu, M. Bennis, M. Debbah, and M. Latva-Aho,

“Joint Path Selection and Rate Allocation Framework for 5G

Self-Backhauled mm-wave Networks,” IEEE Transactions on

Wireless Communications, vol. 18, no. 4, pp. 2431–2445, 2019.

[13] J. Chang and Y. Chen, “A cluster-based relay station deployment

scheme for multi-hop relay networks,” Journal of Communications

and Networks, vol. 17, no. 1, pp. 84–92, 2015.

[14] Y. Wei, Y. Hou, L. Li, and M. Song, “Energy efficient topology

control for multi-hop relay cellular networks based on flow

management,” Journal of Communications and Networks, vol. 19,

no. 6, pp. 618–626, 2017.

[15] X. Song, R. Zhang, J. Pan, and J. Liu, “A statistical geometric

approach for capacity analysis in two-hop relay communications,”

in 2013 IEEE Global Communications Conference (GLOBECOM),

Conference Proceedings, pp. 4823–4829.

[16] T. Cover and A. E. Gamal, “Capacity theorems for the relay

channel,” IEEE Transactions on Information Theory, vol. 25, no. 5,

pp. 572–584, 1979.

[17] W. Yi, Y. Liu, Y. Deng, A. Nallanathan, and R. W. Heath, “Modeling

and analysis of mmwave v2x networks with vehicular platoon

systems,” IEEE Journal on Selected Areas in Communications,

vol. 37, no. 12, pp. 2851–2866, 2019.

[18] S. Lien, Y. Kuo, D. Deng, H. Tsai, A. Vinel, and A. Benslimane,

“Latency-optimal mmwave radio access for v2x supporting next

generation driving use cases,” IEEE Access, vol. 7, pp. 6782–6795,

2019.

[19] Y. H. Ezzeldin, M. Cardone, C. Fragouli, and G. Caire,

“Polynomial-time Capacity Calculation and Scheduling for

Half-Duplex 1-2-1 Networks,” in 2019 IEEE International

Symposium on Information Theory (ISIT), 2019, pp. 460–464.

[20] R. Abdel-Raouf, H. Esmaiel, and O. A. Omer, “Fuzzy logic

based relay selection for mmWave communications,” in 2019 9th

Annual Information Technology, Electromechanical Engineering

and Microelectronics Conference (IEMECON), March 2019, pp.

263–267.

[21] J. García-Rois, F. Gómez-Cuba, M. R. Akdeniz, F. J. González-Castaño, J. C. Burguillo, S. Rangan, and B. Lorenzo, “On the analysis of scheduling in dynamic duplex multihop mmwave cellular systems,” IEEE Transactions on Wireless Communications, vol. 14, no. 11, pp. 6028–6042, 2015.

[22] B. P. S. Sahoo, C. Yao, and H. Wei, “Millimeter-Wave Multi-Hop

Wireless Backhauling for 5G Cellular Networks,” in 2017 IEEE 85th

Vehicular Technology Conference (VTC Spring), 2017, pp. 1–5.

[23] L. Georgiadis, M. J. Neely, and L. Tassiulas, “Resource allocation

and cross-layer control in wireless networks,” Foundations and

Trends in Networking, vol. 1, no. 1, pp. 1–144, 2006.

[24] S. Wang and N. Shroff, “Towards fast-convergence, low-delay and

low-complexity network optimization,” Proceedings of the ACM on

Measurement and Analysis of Computing Systems, vol. 1, no. 2,

p. 34, 2017.

[25] H. Yu and M. J. Neely, “A new backpressure algorithm for joint rate

control and routing with vanishing utility optimality gaps and finite

queue lengths,” IEEE/ACM Transactions on Networking, vol. 26,

no. 4, pp. 1605–1618, 2018.

[26] X. Song and G. Caire, “Queue-Aware Beam Scheduling for

Half-Duplex mmWave Relay Networks,” in 2020 IEEE International

Symposium on Information Theory (ISIT), 2020, pp. 1611–1616.

[27] Y. H. Ezzeldin, M. Cardone, C. Fragouli, and G. Caire, “Gaussian

1-2-1 networks: Capacity results for mmWave communications,” in

2018 IEEE International Symposium on Information Theory (ISIT),

June 2018, pp. 2569–2573.

[28] R. E. Gomory and T. C. Hu, “Multi-Terminal Network Flows,”

Journal of the Society for Industrial and Applied Mathematics,

vol. 9, no. 4, pp. 551–570, 1961.

[29] J. D. Little, “A proof for the queuing formula: L = λW,” Operations

research, vol. 9, no. 3, pp. 383–387, 1961.

[30] D. Park, “A throughput-optimal scheduling policy for wireless relay

networks,” in 2010 IEEE Wireless Communication and Networking

Conference, April 2010, pp. 1–5.

[31] J. Wang, L. He, and J. Song, “Stochastic Optimization Based

Dynamic User Scheduling and Hybrid Precoding for Broadband

MmWave MIMO,” in ICC 2019 - 2019 IEEE International

Conference on Communications (ICC), 2019, pp. 1–6.

[32] H. N. Gabow, “Using Euler Partitions to Edge Color Bipartite

Multigraphs,” International Journal of Computer & Information

Sciences, vol. 5, no. 4, pp. 345–355, 1976.

[33] C. Sinnamon, “Fast and Simple Edge-Coloring Algorithms,”

preprint arXiv:1907.03201, 2019.

[34] X. Song and G. Caire, “Queue-Aware Beam Scheduling for

Half-Duplex mmWave Relay Networks,” in 2020 IEEE International

Symposium on Information Theory (ISIT), 2020, pp. 1611–1616.

[35] V. G. Vizing, “On an estimate of the chromatic class of a p-graph,”

Discret Analiz, vol. 3, pp. 25–30, 1964.

[36] J. Edmonds, “Maximum matching and a polyhedron with 0,

1-vertices,” Journal of research of the National Bureau of Standards

B, vol. 69, no. 125-130, pp. 55–56, 1965.

Conclusions

7.1 Summary of this thesis

In the past decades, tremendous funding and research effort have been dedicated to the investigation of millimeter wave (mmWave) wireless communication, since the use of mmWaves will solve the spectrum shortage in current sub-6 GHz cellular communication systems and offer unprecedented multi-Gbps data rates for each mobile device in the next generation (5G) mobile communication systems. This thesis has proposed several enabling schemes to address the challenges in mmWave communication, including the initial access, the data communication and the relay networking.

For the initial access, we proposed two efficient beam alignment (BA) schemes, one for mmWave OFDM (orthogonal frequency division multiplexing) systems and one for mmWave SC (single-carrier) systems. The proposed schemes are based on quadratic channel measurements and the non-negative least squares (NNLS) technique in compressed sensing (CS). These schemes can operate in much more realistic conditions than existing schemes in the literature, are strongly scalable for multi-user scenarios and are very robust to fast channel variations caused by Doppler spread.

For the data communication after BA is achieved, we defined two “extreme” hybrid

digital analog (HDA) antenna architectures, i.e., the fully-connected (FC) architecture

and the one-stream-per-subarray (OSPS) architecture. We provided a joint performance

evaluation of the initial access and data communication phases with more realistic channel

and hardware conditions. In each phase, we proposed our own BA and precoding schemes

that outperform the counterparts in the literature. We have observed that the proposed

two architectures achieve similar sum spectral efficiency, but the OSPS architecture

outperforms the FC case in terms of hardware complexity and power efficiency, only at

the cost of a slightly longer time of initial beam acquisition.

On top of the above beamforming work, we further extended our work to mmWave relay networking. For a general half-duplex (HD) mmWave relay network with arbitrary relay connections, we proposed two beam scheduling schemes to approach the approximate information-theoretic Shannon capacity, namely, the deterministic edge coloring (EC) scheduler and the adaptive backpressure (BP) scheduler. The EC scheduler is more suitable for static scenarios, since it requires only a one-time computation followed by a periodically repeated schedule. In contrast, the BP scheduler is more favorable for time-varying scenarios because it updates in every time slot. Both proposed schedulers can effectively stabilize the network while achieving much smaller queuing backlogs, much smaller backlog fluctuations, and much lower packet end-to-end delays in comparison with the reference baseline scheme.
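The per-slot decision of a backpressure scheduler can be sketched as follows. This is a generic textbook max-weight rule written for illustration, not the exact BP scheduler of this thesis: the toy topology, link rates, arrival rate and function names are all hypothetical, and we assume that half-duplex operation means each node takes part in at most one active link per slot. In every slot the sketch activates the feasible link set that maximizes the sum of (backlog difference) × (link rate), then updates the queues.

```python
from itertools import combinations

def is_matching(links):
    """True if no node appears in more than one of the given links."""
    nodes = [n for link in links for n in link]
    return len(nodes) == len(set(nodes))

def backpressure_slot(backlog, rates):
    """Return the feasible link set with maximum total backpressure weight."""
    weights = {(i, j): max(backlog[i] - backlog[j], 0) * r
               for (i, j), r in rates.items()}
    best, best_weight = (), 0
    links = list(rates)
    # Brute force over all link subsets; fine for a toy network only.
    for k in range(1, len(links) + 1):
        for subset in combinations(links, k):
            if not is_matching(subset):
                continue
            weight = sum(weights[link] for link in subset)
            if weight > best_weight:
                best, best_weight = subset, weight
    return best

def step(backlog, rates, arrivals, source="s", sink="d"):
    """Serve the selected links for one slot, then admit new packets."""
    active = backpressure_slot(backlog, rates)
    new = dict(backlog)
    for i, j in active:
        sent = min(new[i], rates[(i, j)])
        new[i] -= sent
        if j != sink:          # packets reaching the sink leave the network
            new[j] += sent
    new[source] += arrivals
    return new, active

# Hypothetical directed links (transmitter, receiver) with rates in packets/slot.
rates = {("s", "a"): 4, ("s", "b"): 3, ("a", "d"): 3, ("b", "d"): 4, ("a", "b"): 2}
backlog = {"s": 0, "a": 0, "b": 0, "d": 0}

totals = []
for _ in range(200):
    backlog, active = step(backlog, rates, arrivals=2)
    totals.append(sum(backlog.values()))

print("largest sum backlog over 200 slots:", max(totals))
```

The decision depends only on the current backlogs, which is why such a rule can follow changes in rates or topology from one slot to the next, whereas a precomputed periodic schedule would have to be recomputed.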

7.2 Future directions

One interesting direction building on this thesis is mmWave vehicle-to-vehicle (V2V) and vehicle-to-everything (V2X) communication. mmWave V2V and V2X communication can provide NLOS information about the surrounding environment, thus improving the safety and traffic efficiency of cooperative automated driving. In the V2V and V2X scenarios, the nodes will create a relay network and route data packets through multi-hop transmission. An essential component in these networks consists of selecting, at each time slot, which beams are active and in which direction they should be pointed. Therefore, the problem of beamforming (addressed in Chapters 4-6) is intrinsically connected with the problem of beam scheduling (addressed in Chapter 7), since transmission is not isotropic as in conventional wireless networks, but rather highly directional.

Also, the development of mmWave massive MIMO communication technology is now in the hands of the product departments of companies such as Huawei, Qualcomm, Ericsson and Nokia. A large number of communication, signal processing, and optimization algorithms have been developed over the years, and it remains to be seen which ones will work well in practice. If 5G becomes a commercial success, massive digitally controllable antenna arrays will be deployed “everywhere” for countless applications at mmWave frequencies and even at much higher THz frequencies (6G). Thus, we can expect a future where extremely large aperture arrays with thousands of antenna elements are used to serve a set of users. There are, however, practical limits to how many antennas can be deployed at conventional towers and rooftop locations.

In addition, there is a recent surge of papers applying machine learning (ML) to various problems in communications. ML is especially powerful when a system has characteristics that are hard to model or analyze by conventional approaches. Thus, it would be an exciting possibility to use ML in future mmWave wireless systems whenever a good model is lacking, or a model is available but intractable for analysis. However, before ML can be successfully used in communication systems, many obstacles, such as the acquisition of training data and hard real-time constraints, still need to be overcome.

Acronyms and Abbreviations

AoA angle of arrival
AoD angle of departure
AR augmented reality
AWGN additive white Gaussian noise
BA beam alignment
BBF before beamforming
BP backpressure
BS base station
CSI full channel state information
D-RoF digital radio-over-fiber
DFT discrete Fourier transform
EB exabytes
EC edge coloring
eMBB enhanced mobile broadband
FC fully-connected
Gbps gigabit per second
HD half-duplex
HD high definition
HDA hybrid digital analog
IC integrated circuit
IMT international mobile telecommunications
IP Internet protocol
IQ in-phase and quadrature
LAN local area network
MAC media access control
MIMO multiple input multiple output
mMTC massive machine type communications
mmWave millimeter wave
OFDM orthogonal frequency division multiplexing
OSPS one-stream-per-subarray
PA power amplifier
PSD power spectral density
RF radio frequency
SC single carrier
SNR signal to noise ratio
UE user equipment
ULA uniform linear array
uRLLC ultra-reliable and low-latency communications
V2V vehicle to vehicle
V2X vehicle to everything
VR virtual reality
WPAN wireless personal area network
