scieee Science in your language
[en] (orig)
Christoph Hertrich
Facets of Neural Network Complexity
x
˜1
x
˜2
x
˜3
x
˜4
x
˜5
x1
x2
x3
x4
y
1
1
-1
1
1
-1
1
1
-1
1
-1
1
-1 1
-1
1
-1
1
-1
1
-1
gout(p)
p
out
sin
pin
gin(P)
···
gin(2)
gin(1)
p
in
Com-
pute
dold
Com-
pute
dnew
+
Select
gin(p(2))
Select
gin(p(1))
min
+
Facets of Neural Network Complexity
vorgelegt von
M. Sc.
Christoph Hertrich
ORCID: 0000-0001-5646-8567
an der Fakultät II Mathematik und Naturwissenschaften
der Technischen Universität Berlin
zur Erlangung des akademischen Grades
Doktor der Naturwissenschaften
Dr. rer. nat.
genehmigte Dissertation
Promotionsausschuss:
Vorsitzender: Prof. Dr. John Sullivan (TU Berlin)
Gutachter: Prof. Dr. Martin Skutella (TU Berlin)
Gutachter: Prof. Dr. Sebastian Pokutta (TU Berlin & Zuse-Institut Berlin)
Gutachter: Prof. Dr. Amitabh Basu (Johns Hopkins University, Baltimore, USA)
Tag der wissenschaftlichen Aussprache: 2. rz 2022
Berlin 2022
Advertisement
Abstract
Artificial neural networks are at the heart of some of the greatest advances in mod-
ern technology. They enable huge breakthroughs in applications ranging from computer
vision via machine translation to speech recognition as well as autonomous driving and
many more. However, we are still far away from a more rigorous theoretical explanation
of these overwhelming success stories. Consequently, the development of a better math-
ematical understanding of neural networks is currently one of the hottest research topics
in computer science. In this thesis we provide several contributions in that direction
for the simple, but practically powerful and widely used model of feedforward neural
networks with rectified linear unit (ReLU) activations.
Our focus is on various notions of what we call the complexity of such neural net-
works: how much computing resources (time, hardware, network size, etc.) are required
to achieve a certain goal? Of course, such questions can be asked in various contexts.
We identify and study the following three facets of complexity for neural networks with
ReLU activations.
The first facet is neural networks’ expressivity: What functions can be represented by
certain neural network architectures? Even though this is such a fundamental question,
very little is known so far. We make progress concerning the question whether the
class of exactly representable functions strictly increases by adding more layers (with no
restrictions on size). We also provide upper bounds on the number of neurons required to
represent arbitrary piecewise linear functions with small-depth ReLU neural networks.
The second facet is neural networks’ computational power. Here, we view neu-
ral networks as a model of computation, just like Boolean, or even closer, arithmetic
circuits. We then investigate which network (or circuit) size is required to solve vari-
ous problems, with a focus on combinatorial optimization problems. Even though this
model is quite restrictive compared to other models of computation, we are able to show
that comparably small neural networks can provably solve problems like the efficiently
solvable Maximum Flow Problem or the NP-hard Knapsack Problem.
The third facet is neural networks’ training complexity: How difficult is it to fit the
weights of a neural network to training data? It is widely known that optimal solutions to
the training problem are hard to obtain, which is why local optimization techniques like
stochastic gradient descent are used in practice. We focus on the question whether the
situation improves for fixed input dimension, leading to the paradigm of parameterized
complexity analysis. We provide running time lower bounds in terms of W[1]-hardness
results, proving that known brute-force strategies are essentially optimal. On the positive
side, we extend a known polynomial-time algorithm for constant dimension and convex
loss functions to a more general class of loss functions.
The mathematical methods used in this thesis include polyhedral theory, discrete
and tropical geometry, mixed-integer and combinatorial optimization, as well as tools
from complexity theory.
v
Advertisement
Loading more pages...