Facets of neural network complexity [original]

Christoph Hertrich

Facets of Neural Network Complexity

˜1

˜2

˜3

˜4

˜5

-1

-1 1

-1

gout(p)

p∗

out

sin

gin(P)

···

gin(2)

gin(1)

p∗

Com-

pute

dold

Com-

pute

dnew

Select

gin(p(2))

Select

gin(p(1))

min

Facets of Neural Network Complexity

vorgelegt von

M. Sc.

Christoph Hertrich

ORCID: 0000-0001-5646-8567

an der Fakultät II – Mathematik und Naturwissenschaften

der Technischen Universität Berlin

zur Erlangung des akademischen Grades

Doktor der Naturwissenschaften

Dr. rer. nat.

genehmigte Dissertation

Promotionsausschuss:

Vorsitzender: Prof. Dr. John Sullivan (TU Berlin)

Gutachter: Prof. Dr. Martin Skutella (TU Berlin)

Gutachter: Prof. Dr. Sebastian Pokutta (TU Berlin & Zuse-Institut Berlin)

Gutachter: Prof. Dr. Amitabh Basu (Johns Hopkins University, Baltimore, USA)

Tag der wissenschaftlichen Aussprache: 2. März 2022

Berlin 2022

Abstract

Artificial neural networks are at the heart of some of the greatest advances in mod-

ern technology. They enable huge breakthroughs in applications ranging from computer

vision via machine translation to speech recognition as well as autonomous driving and

many more. However, we are still far away from a more rigorous theoretical explanation

of these overwhelming success stories. Consequently, the development of a better math-

ematical understanding of neural networks is currently one of the hottest research topics

in computer science. In this thesis we provide several contributions in that direction

for the simple, but practically powerful and widely used model of feedforward neural

networks with rectified linear unit (ReLU) activations.

Our focus is on various notions of what we call the complexity of such neural net-

works: how much computing resources (time, hardware, network size, etc.) are required

to achieve a certain goal? Of course, such questions can be asked in various contexts.

We identify and study the following three facets of complexity for neural networks with

ReLU activations.

The first facet is neural networks’ expressivity: What functions can be represented by

certain neural network architectures? Even though this is such a fundamental question,

very little is known so far. We make progress concerning the question whether the

class of exactly representable functions strictly increases by adding more layers (with no

restrictions on size). We also provide upper bounds on the number of neurons required to

represent arbitrary piecewise linear functions with small-depth ReLU neural networks.

The second facet is neural networks’ computational power. Here, we view neu-

ral networks as a model of computation, just like Boolean, or even closer, arithmetic

circuits. We then investigate which network (or circuit) size is required to solve vari-

ous problems, with a focus on combinatorial optimization problems. Even though this

model is quite restrictive compared to other models of computation, we are able to show

that comparably small neural networks can provably solve problems like the efficiently

solvable Maximum Flow Problem or the NP-hard Knapsack Problem.

The third facet is neural networks’ training complexity: How difficult is it to fit the

weights of a neural network to training data? It is widely known that optimal solutions to

the training problem are hard to obtain, which is why local optimization techniques like

stochastic gradient descent are used in practice. We focus on the question whether the

situation improves for fixed input dimension, leading to the paradigm of parameterized

complexity analysis. We provide running time lower bounds in terms of W[1]-hardness

results, proving that known brute-force strategies are essentially optimal. On the positive

side, we extend a known polynomial-time algorithm for constant dimension and convex

loss functions to a more general class of loss functions.

The mathematical methods used in this thesis include polyhedral theory, discrete

and tropical geometry, mixed-integer and combinatorial optimization, as well as tools

from complexity theory.

Loading more pages...