Document [original]

Stefan Apostel, Nicholas D. Haynes, Eckehard Schöll, Otti D’Huys,

Daniel J. Gauthier

Reservoir Computing Using Autonomous Boolean

Networks Realized on Field-Programmable Gate

Arrays

Open Access via institutional repository of Technische Universität Berlin

Document type

Book chapter | Accepted version

(i. e. final author-created version that incorporates referee comments and is the version accepted for

publication; also known as: Author’s Accepted Manuscript (AAM), Final Draft, Postprint)

This version is available at

https://doi.org/10.14279/depositonce-14985

Citation details

Apostel S., Haynes N. D., Schöll E., D’Huys O., Gauthier D. J. (2021) Reservoir Computing Using Autonomous

Boolean Networks Realized on Field-Programmable Gate Arrays. In: Nakajima K., Fischer I. (eds.) Reservoir

Computing. pp 239–271. Natural Computing Series. Springer, Singapore.

https://doi.org/10.1007/978-981-13-1687-6_11.

This work is protected by copyright and/or related rights. You are free to use this work in any way permitted by

the copyright and related rights legislation that applies to your usage. For other uses, you must obtain

permission from the rights-holder(s).

Reservoir Computing using Autonomous

Boolean Networks Realized on

Field-Programmable Gate Arrays

Stefan Apostel, Nicholas D. Haynes, Eckehard Sch¨

oll, Otti D’Huys, and Daniel J.

Gauthier

Abstract In this Chapter, we consider realizing a reservoir computer on an elec-

tronic chip that allows for many tens of network nodes whose connection topology

can be quickly reconfigured. The reservoir computer displays analog-like behav-

ior and has the potential to perform computations beyond that of a classic Turning

machine. In detail, we present our preliminary results of using a physical reservoir

computer for performing the task of identifying written digits. The reservoir is real-

ized on a commercially-available electronic device known as a field-programmable

gate array on which we create an autonomous Boolean network for information

processing. Even though the network nodes are Boolean logic elements, they dis-

play analog behavior because their is no master clock that controls the nodes. In

addition, the electronic signals related to the written-digit images are injected into

the reservoir at high speed, leading to the possibility of full image classification on

the nanosecond time scale. We explore the dynamics of the autonomous Boolean

networks in response to injected signals and, based on these results, investigate the

performance of the reservoir computer on the written-digit task. For a wide range of

Stefan Apostel

Institut f¨

ur Theoretische Physik, Technische Universit¨

at Berlin, 10623 Berlin, Germany, e-mail:

[email protected]

Nicholas D. Haynes

Duke University, Department of Physics, Box 90305, Durham, North Carolina 27706 USA, e-mail:

nickda[email protected]

Eckehard Sch¨

oll

Institut f¨

ur Theoretische Physik, Technische Universit¨

at Berlin, 10623 Berlin, Germany, e-mail:

[email protected]

Otti D’Huys

Department of Mathematics, Aston University, B4 7ET Birmingham, United Kingdom e-mail:

[email protected]

Daniel J. Gauthier

The Ohio State University, Department of Physics, 191 W. Woodruff Ave., Columbus, Ohio 43210

USA, e-mail: gauthier[email protected]

2 Reservoir computing with ABNs

reservoir structures, we obtain a typical performance of ⇠90% for correctly identi-

fying a written digit, which exceeds that obtained by a linear classifier. This work

paves the way for achieving low-power, high-speed reservoir computing on readily-

available field-programmable gate arrays, which are well matched to existing com-

puting infrastructure.

1 Introduction

As demonstrated by the breadth of contributions in this volume, there is great inter-

est in reservoir computing [1], from fundamental studies of their properties to using

them for a wide range of applications. While many studies use a standard Turing-

von Neumann computer to simulate a reservoir computer (RC), physical RCs have

the potential to demonstrate beyond-Turing computing and may increase the infor-

mation processing speed. One challenge in building a physical RC is fabricating a

large enough reservoir, which may require 100’s to 1,000’s of nonlinear input-output

nodes, and a large number of interconnects (links) between the nodes.

A highly successful alternative that is capable of operating at high speeds is to

realize a RC using a single nonlinear node with time-delay feedback with a large

number of ‘virtual’ nodes determined by the ratio of the delay time to the charac-

teristic time scale of the node. Such platforms have been used for a wide variety

of machine learning tasks, such as classifying spoken words [2] at high speed [3],

nonlinear channel equalization [4], and classifying serial digital data [5]. Unfortu-

nately, this approach requires complex temporal time delaying and multiplexing of

the input information (masking) and injection into the loop [6, 7]. In principle, this

masking could be done in real time using dedicated hardware as has been demon-

strated recently [8], but most demonstrations to date perform this preprocessing of-

fline thereby slowing down the overall operation rate of the RC.

A different, potentially scalable, approach is to use nonlinear optical micro-ring

resonators fabricated on a planar photonic chip [9, 10], or arrays of linear micro-ring

resonators or swirls where the nonlinearity is provided by the square-law detectors

(that is, detectors that respond to the intensity of the light rather than the field) in

the read-out layer [11, 12, 13, 14]. This technology is currently limited to a small

(⇠16) number of resonators because of optical loss and the network connectivity is

limited by the planar geometry. But this platform may find future application as the

quality of photonic chips improves.

Here, we focus on an approach using a commercially-available electronic device

known as a field-programmable gate array (FPGA), which greatly simplifies the

creation of a reservoir in comparison to the photonic approaches discussed above.

FPGAs are also much easier to reconfigure in comparison to custom-built boards

with discrete logic [15, 16] (that is, devices using a collection of single-chip logic

gates soldered on the board). Rosin [17] pioneered the application of FPGAs to real-

ize autonomous time-delay Boolean networks [18]. Here, the FPGA is configured to

realize a reservoir, where the FPGA logic elements serve as the network nodes that

Reservoir computing with ABNs 3

nominally perform a Boolean operation on the inputs. The term autonomous indi-

cates that the logic elements are not gated by a master clock; they process new edges

as soon as they arrive on the inputs to the logic elements. Furthermore, because the

network is autonomous, signals traveling along the links have a finite propagation

speed so that the link delays must be accounted for. The autonomous nature of the

complex network allows for the possibility of beyond-Turing computation [19]. Au-

tonomous Boolean networks on FPGAs have been used previously for reservoir

computing and applied to high-speed pattern recognition of digital serial data pat-

terns [5] and high-speed forecasting of chaotic dynamics [20]. The FPGA can also

operate in the clocked mode, thus realizing a finite-state-machine and offering the

possibility of accelerating the software simulation of RCs. This approach has been

applied to channel equalization [21, 22, 23, 24] and speech recognition [8, 25].

In this chapter, we describe our preliminary studies on using an FPGA-based

RC for performing a classification task: identifying images containing human-

written digits (the MNIST task). While RCs are most suited for processing time-

dependent signals, performing a classification task on images presents interesting

challenges, such as identifying efficient methods for injecting data into the reser-

voir. We also take this opportunity to study the dynamics of autonomous Boolean

networks (ABNs) in response to a perturbation, such as a phase transition as the

node in-degree varies. This study helps guide the choice of metaparameters for the

RC.

In the next section, we present the reservoir concept with an associated mathe-

matical model, describe our experimental studies of the dynamics of ABNs as the

network parameters vary in Sec. 3, realize an RC on an FPGA and use it for the

written digit (MNIST) task in Sec. 4, and discuss future directions and conclude in

Sec. 5 [26]. We briefly discuss using FPGAs and describe our experimental work-

flow in the Appendix.

2 Reservoir computers based on autonomous Boolean networks

One aspect that distinguishes RCs from other artificial neural networks is that each

node is itself a dynamical system. Glass and colleagues [27] have studied exten-

sively the dynamics of autonomous Boolean networks using a model where node i

is described by a continuous variable xiwhose behavior is governed by a first-order

differential equation with decay rate giand driven by a Boolean function fiof the

node inputs. The function fiis defined via a lookup table as described below and

can take on essentially any Boolean function consistent with the number of inputs

to the node. In the context of our work, the decay rate gimodels the finite rise-time

of the FPGA logic elements due to their input capacitance and inductance and are

nominally identical for all nodes with gi⇠2p/(0.41 ns). When signals propagating

from node jto iexperience delay ti,j, the so-called Glass model can be extended

to a set of delay differential equations [28]. Delays appear in our FPGA-based sys-

tem because the signals propagate at a finite velocity along the link wires, where a

4 Reservoir computing with ABNs

typical delay time between nearby logic elements is only a few 10’s of picoseconds.

To purposefully add more delay, we use pairs of series-connected NOT gates along

the network links, where we obtain a delay of ⇠0.52 ns per pair of gates [29]. We

use special care in writing the hardware description language used to configure the

FPGA (described in greater detail in the Appendix) to make sure that the logic gates

used to create delay are not removed from our design.

Adopting this theoretical approach for reservoir computing with time-delay

ABNs, illustrated in Fig. 1, the dynamics of the RC are given by [30]

dxi

dt =gixi+gifi J

j=1

Win

i,juj(t)+

n=1

Wi,nxn(tti,n)!,i=1,...,N(1)

ym(t)=

n=1

Wout

m,nxn,m=1,...,M.(2)

Here, uj(t)are the Jsignals input to the RC, which are connected to the Nreservoir

nodes with random fixed weights Win

i,j,Wi,nare the random fixed internal weights of

the reservoir, ym(t)are the Moutputs of the RC with trained (morphable) weights

Wm,n. Often, the W’s are sparse so that the typical node in-degree (number of inputs)

Kiis small. The reservoir embeds the input data to a higher-dimension phase space

when N>Jbecause of the nonlinear response of the node, known as dimension ex-

pansion and often a key requirement for effective information processing. Because

of the time delays, the phase-space dimension of the reservoir is infinite, which

can be seen by considering that the initial conditions of the time-delay differential

equations must be specified over the real interval of the maximum link delay.

Evaluating fiis particularly simple when the arguments ujand xnare Boolean,

but the W’s are real. Here, the multiplications and additions appearing in the argu-

ment of fican be done a single time after the W’s are chosen, thereby defining the

Boolean look-up table (LUT). Thereafter, only the LUT is used and no further op-

erations are required. This procedure maps perfectly onto the structure of the FPGA

logic elements, which are based on LUTs, as discussed in the Appendix.

Models similar to Eq. (1) have been used to understand the dynamics of time-

delay ABNs representing simple genetic regulatory networks realized on an FPGA,

for example, and can capture the extremely long transient behavior observed exper-

imentally [5, 29, 31]. While we do not consider solutions to Eq. 1 here, we include

it for completeness because it is a good starting point for a theoretical analysis of

FPGA-based RCs.

For the classification task considered here, we adjust Wout

k,nusing a finite-size

training data set so that the resulting output properly classifies each input in a least-

square sense, know as supervised learning. Specifically, the output weight matrix

Wout is determined by injecting a finite-length input training data set U(t)and

recording the network dynamics X(t). Here, we assume that the state of the net-

work is sampled at equal discrete time intervals Dtso that the matrices are finite

dimensional. Based on these observations, we modify Wout to minimize the error of

the output Y(the classes) to the expected output Yexpected, resulting in

Reservoir computing with ABNs 5

KƵƚƉƵƚůĂǇĞƌ

;ĐůĂƐƐĞƐͿ

ZĞƐĞƌǀŽŝƌ

/ŶƉƵƚůĂǇĞƌ

/ŶƉƵƚĚĂƚĂ

ƌĂŶĚŽŵ

ĨŝǆĞĚ

ŝŶƉƵƚ

ǁĞŝŐŚƚƐ

ƌĂŶĚŽŵĨŝǆĞĚ

ƌĞĐƵƌƌĞŶƚ

ǁĞŝŐŚƚƐ

ƚƌĂŝŶĞĚ

;ŵŽƌƉŚĂďůĞͿ

ŽƵƚƉƵƚǁĞŝŐŚƚƐ

Fig. 1 Illustration of the reservoir computer architecture.

Wout =YexpectedXTXXT+b2I1,(3)

where bis a regularization parameter, Iis the identity matrix, and Tindicates the

transpose. A nonzero value of bensures that the norm of Wout does not become

large, prevents sensitivity of the training to noise, and improves the generalizability

of the RC to different inputs. We stress that X(t)is a concatenation of the network

dynamics over all input data, which is a collection of image data described below

in Sec. 4.1. We can also find a solution to Eq. (3) using gradient descent methods,

which are helpful when the matrix dimensions are large. Gradient-descent routines

are readily found in modern toolkits developed by the deep learning community and

they can operate at high speed using graphical processing units. Another approach

is to use recursive least-squares.

3 Dynamics of random autonomous Boolean networks

Before discussing the performance of the RC applied to the MNIST classification

task, we explore the dynamics of time-delay ABNs realized on an FPGA for dif-

ferent network parameters. We seek to identify a phase transition from ordered to

disordered (chaotic) behavior and compare our observation to the prediction for

clocked Boolean networks. Identifying the location of the phase transition is impor-

tant because the performance of the RC is expected to be highest near the border of

the transition from order to chaos [32].

3.1 Dynamics of clocked random Boolean Networks

There exists a considerable literature on the dynamics of clocked random Boolean

networks and we briefly summarize some of the findings here. A clocked random

6 Reservoir computing with ABNs

Boolean network (RBN) is typically taken to consist of Nnodes each of which

receives exactly Kinputs. The Boolean node function fiis defined in terms of a

random LUT, where the probability of an entry in the LUT taking on the value 0 (1)

is p(1 p) and is often called the node bias [33, 34, 35].

For a clocked RBN with random and fixed (quenched) functions and connections,

the network is a deterministic system with a finite number of states (a finite-state

machine) and must show periodic behavior with a maximum period of 2N. Different

network behaviors are observed as Kand pvaries. One behavior is called ordered

when the dynamics go to a fixed point or to a low-period periodic orbit. Another

behavior, often called chaotic, is when the period approaches its maximum value

and the specific periodic orbit followed is sensitive to a change in the initial value

of a single network node. Of course, chaos cannot exist in an RBN because all

behaviors are periodic, but the term is suggested because of the sensitivity of the

dynamics on the initial conditions. Using an annealed network approach [33], the

order-chaos transition is given by

Kc=1

2p(1p)(4)

in the limit N!•and shown in Fig. 2.

Fig. 2 Order-chaos phase transition in a clocked RBN.

3.2 Dynamics of ABNs on an FPGA

Guided by the results on clocked RBNs, we investigate experimentally the dynamics

of random ABNs on an FPGA. Here, we measure the Booleanized state of each

node at discrete (clocked) time intervals using the finite-state machine described in

the Appendix. We choose a subset of networks from the full range of possibilities

described by Eq. 1 so we can make a comparison to the previous work on clocked

networks. In particular, we consider networks with N=64 each having exactly K

Reservoir computing with ABNs 7

inputs and outputs so that Wi,nappearing in Eq. 1 is quite sparse. Here, we consider

only K4. Each node is assigned a random and fixed Boolean function fiwith bias

punder the restriction that fi=0 when all the inputs are zero so that the network is

in the quiescent state (no self-oscillation) when the FPGA is first turned on or reset

to this value. This restriction limits the bias to pmax =11/2K, but is advantageous

for realizing an RC because it results in a well-defined initial state of the reservoir. In

addition, to slow down the network dynamics [20] to better match the rate at which

we can inject data into the RC (see the next section), and to more closely match

the clock RNB theory, we use a nominally constant link delay ti,n=12.8±0.5 ns.

Here, the link time delay is set by using pairs of series-connected NOT gates, where

each pair of gates causes a delay of ⇠0.52 ns [29]. For this study, we investigate the

dynamics of 6,000 randomly chosen networks.

The initial condition of the network is set using an OR logic gate at the output of

each node as illustrated in Fig. 3. Initially, the state of all nodes are set to 0 to reset

the network, and then the desired initial condition is set during a 5-ns-long window

and passed to the link delay lines. This is accomplished using the OR gate, where

uj(t)appears at the output while the node is in the state 0. Once signals appear at

the input of the node from other network nodes (after a delay time tj,s), any signals

related to uj(t)still being injected into the node will be ‘blended’ with the incoming

node signals via the OR gate.

In general, setting the initial conditions to a random binary value will destabi-

lize the initial stable network state where all nodes are equal to 0, giving rise to a

transient that can last for a time beyond our data-collection window for p⇠0.5.

After initializing the network, we measure the state of each node every 5 ns for 200

samples (1 µs total record length), and transfer the data to a computer for analysis.

The network is reset to the 0 state at the end of data collection for initial condition.

For each network realization, we use 1,000 randomly chosen initial conditions and

repeat the experiment 50 times for each initial condition. We do not attempt to study

the duration of the transient beyond 1 µs because it is not relevant for reservoir

computing although we note the extremely long (second to minutes) chaotic time

transient times have been observed in simple Boolean networks with similar bias

[5].

<ŝŶƉƵƚƐ

ŝũ

ƌũ

<ŽƵƚƉƵƚƐ

EŽĚĞũ

>hd

/ŶƉƵƚ

Ƶũ;ƚͿ

Fig. 3 Using an OR gate on the output of node jto inject the input data for the node, illustrated

here for K=2. The signal emanating from the OR gate passes to the delay-line links to nodes iand

8 Reservoir computing with ABNs

One issue that arises when using an FPGA is that a randomly assigned node LUT

may have the same output (0 or 1) regardless of the inputs. In this case, the optimizer

within the compiler that converts the Verilog hardware description language to a

bitstream to configure the FPGA removes such nodes from the network. This is

more pronounced for low and high biases and the actual network synthesized on

the chip has fewer than 64 nodes when the network is pruned by the optimizer

(typically, only a few nodes are pruned). In the figures presented below, we use

different color symbols to indicate networks that have been pruned by the compiler.

This behavior does not change the network dynamics or time scales because such a

node would have been inactive. Nodes with high bias (p⇠1) may also be inactive,

but we already exclude nodes with high bias as discussed above.

We observe a wide range of network dynamics as we vary pand K. Again moti-

vated by the research on clocked RBNs, we initially search for the network to settle

down to a fixed-point behavior where all network nodes take on a constant value

(frozen dynamics) after a transient time. While we only measure the network dy-

namics at discrete times, it is unlikely that fast oscillations in between the measure-

ment times would escape unnoticed. We search for fixed-point behavior by measur-

ing the Hamming distance between the state of the network at time t0with its values

at all later times up to the limit of our 200 samples. Here, the Hamming distance

between two Boolean vectors Aand Bis defined as

H(A,B)=

i=1

AiBi.(5)

Figure 4 shows the number of time steps required for the dynamics of the ABN

to reach a fixed point (network realizations where a fixed point is never observed

are not shown). We also show the probability for a network to reach a fixed point.

Qualitatively consistent with the predictions shown in Fig. 2 for a clocked RBN

with the same network structure as our ANB, we observe longer transients as p

approaches 0.5 from above and below, and the range of biases that do not show

fixed-point behavior widens as Kincreases. Of course, we are only considering

fixed-point behavior and so these results only suggest the location of the order-

chaos boundary, although we find that chaotic behavior tends to be observed when

the probability for observing a fixed point drops near zero around p=0.5

Over the range of pwhere the probability of observing a fixed point is nearly zero

(the region around p⇠0.5 where the green line is near zero), we observe fast, com-

plex behavior after an initial short transient that depends sensitively on the initial

conditions, likely a manifestation of chaos in the ABN [16] as shown in Fig. 5(b).

Here, the dynamics is so fast that our finite sampling rate is too low to faithfully

record the dynamics. As seen in Fig. 5(a), the behavior predicted by a discrete-time

model using the same LUT and initial conditions is similar to our observations of

the ABN on the first few time steps, but disagrees thereafter. This disagreement is

perhaps not too surprising because the discrete map is a finite-state machine and

hence cannot display chaos (the dynamics eventually has to repeat). The fact that

we observe chaos gives a hint that an RC realized with an ABN reservoir on an

Reservoir computing with ABNs 9

Fig. 4 Length of transient for cases when the ABN reaches a stable fixed point for different net-

work parameters. Blue symbols indicate unmodified networks and red symbols indicate optimizer-

pruned networks. The green solid line shows the probability that a network reaches a stable fixed

point.

FPGA could display beyond-Turing computation, although observing chaos is not a

necessary or sufficient condition for beyond-Turning computation.

Fig. 5 Dynamics of a Boolean networks with N=64, K=4, and p=0.55 for (a) a clocked

network simulated numerically and (b) an autonomous network on the FPGA averaged over 50 ex-

perimental runs with the same initial conditions for each. Boolean 0 (1) is colored cyan (magenta).

This network realization has no pruned nodes.

Close inspection of Fig. 5(b) reveals apparent horizontal stripes, indicating that

the average value of the nodes take on different values. For this data set, the average

value of the nodes range from hxii=0.36 to 0.74, with an average for all nodes of

10 Reservoir computing with ABNs

hxi=0.57±0.07, which is consistent with the value of p. We currently do not have

an explanation for the appearance of these patterns.

We find that another qualitative measure of the order-chaos boundary can be

obtained by measuring hxifor different network parameters as shown in Fig. 6. Here

we perform the average from time steps 150 through 200 to minimize the effects of

the transients. At p=0, we expect that hxi=0 because all nodes in the network

are inactive regardless of their input. As pincreases, we see that hxiremains near

zero until an apparent transition to complex behavior indicated by a rapid rise in hxi.

The value pat this transition is smaller as Kincreases, consistent with the phase-

transition behavior of clocked RBNs shown in Fig. 2. We also show a line indicating

hxi=p, which is expected for a RBN with N,K!•. As Kincreases, the average

node value approaches this line, which is an interesting result because our network

is quite different from an RBN. Specifically, we are using an ABN, we restrict the

node LUTs so that the all-zero state is a solution, and Nand Kare not all that large.

Fig. 6 Average value of the network nodes for different network parameters.

4 Reservoir computing with ABNs on an FPGA

The experiments described in the previous section demonstrate that an ABN-based

reservoir can be constructed on an FPGA whose behavior can be adjusted from or-

dered to chaotic simply by adjusting the node bias and in-degree. Operating near or

within the chaotic domain demonstrates that the reservoir embeds the dynamics in a

high-dimensional phase space, one crucial characteristic needed for reservoir com-

puting. It is also widely believed that the reservoir must display other characteristics

when data is input to the reservoir, including [1]:

•Separation of different input states. Input data in different classes should have

distinct output dynamics.

•Generalization of similar inputs to similar outputs. Different input data within

a class should have similar output dynamics.

Reservoir computing with ABNs 11

•Fading memory. The output dynamics should be correlated with the input dy-

namics over a time known as the consistency window, but this temporal correla-

tion should fade.

Based on these properties, an RC is well suited for classification or prediction

based on correlations in data over time because it inherently has temporal memory.

For classification tasks on static image data such as the MNIST data set, the cor-

relations are inherently spatial. Thus, we break the image up into sub-images and

feed them rapidly into the reservoir (at the 200 MHz limit imposed by our write and

read procedure) to create spatial-temporal correlations. While this process is a bit

contrived, it may be well suited for processing high-speed video imagery data where

the data is usually generated in a progressive scan of the image.

In the next section, we introduce the MNIST classification task and modifications

we make to the input data to make a better match to our reservoir characteristics,

then inject this data into ABNs to verify the behaviors described above. Finally, we

describe the performance of an FPGA-based RC for the classification task.

4.1 The MNIST classification task

The MNIST database [36] of handwritten digits is a collection of 60,000 training

samples and 10,000 test samples of handwritten digits from 0 to 9. Each written

digit in the original data set, referred to as NIST after the U.S. federal agency that

collected the data, is represented by black-and-white 20⇥20-pixel, which was mod-

ified (MNIST) by centering the image and padding with white space, resulting in a

gray-scale 28⇥28-pixel image. Figure 7 shows four sample MNIST images for

each written digit.

Fig. 7 Four examples for each digit of the MNIST data set shown as gray scale images.

As mentioned above, we seek to inject sub-images into the reservoir to exploit

its fading memory. There is a trade-off in using too many sub-images because the

time interval for injecting data may be longer than the reservoir consistency window.

We also want to use sub-images with no more than pixels than reservoir nodes. To

reduce the parameter space, we choose to map each pixel to a single reservoir node

12 Reservoir computing with ABNs

and thus to have a sub-image size of 64 pixels. Thus, in this case, the input layer is

the sub-image and we use a restricted connectivity of the input layer to the reservoir:

each input node only connects to a single reservoir node. For cases when there is

not enough image data to fill a full 64-element input data vector, we pad the data

with zeros. When the reservoir is pruned by the Verilog compiler, the corresponding

pixel is not mapped into the reservoir and hence this information is lost.

For most of the studies presented below, we reduce the MNIST images to 16⇥16

pixels distributed over 4 sub-images. We tested many methods for creating sub-

images, such as vertical or horizontal image segments, or using random masks. We

find no statistically significant differences in the performance of the RC for the

MNIST classification task for different protocols for sub-images creation; the data

shown below uses vertical image segments.

The data is injected into the network using the same method described in Sec. 3.2

above using an OR gate at the output of each node (see Fig. 3) controlled by our data

write/read FSM described in the Appendix. Here, the data is loaded into the links,

which have a nominal propagation delay of 12.8 ns. A sub-image is input every 5 ns

(200 MHz input data rate) and hence some of the data is ‘blended’ with the reservoir

dynamics if the total image time exceeds the link delay time. For example, it takes

4⇥(5 ns) = 20 ns to inject a full image with 4 sub-images so that the first sub-image

is fully blended with the reservoir dynamics, the second sub-image is blended with

the reservoir dynamics for ⇠2.2 ns, while the last two sub-images are presented fully

for 5 ns. We did not realize this shortcoming of our method until after substantial

data collection. This represents a limitation of this initial study and the error rates

of the RC presented in Sec. 4 below are likely lower bounds. This limitation will be

addressed in future studies.

Figure 8 illustrates the steps involved to create the reduced image. This first step

is to determine an image ‘focus,’ which is an image coordinate determined by an

average of the pixel coordinates weighted by their respective gray-scale value. The

intersection of the solid-blue lines shown in Fig. 8(a) shows the image focus, which

is not necessarily the center of the image indicated by the intersection of the black

dashed lines. Next, we determine exterior white pixels to remove the excess padding.

We then create a grid of q⇥qpixels within this region centered on the focus with

q=820, but q=16 typically. A resulting pixel is labeled by filling each pixel

like a ‘bucket’ by adding the value overlapping area ⇥gray-scale weight (0-255)

from each original MNIST pixel to the new grid pixel. The new pixel will be set to

Boolean 1 (0) if the filled value reaches (does not reach) a threshold set at approxi-

mately the half-gray-scale point. This process is depicted in Fig. 8(b) with Boolean

1 shown as blue and the the bucket gray scale shown as an underlay.

Reservoir computing with ABNs 13

Fig. 8 (a) Determining the focus of the MNIST image. (b) Creating a reduced image shown here

for the case of 16⇥16 pixels.

4.2 Dynamics of ABNs with data injection: The consistency

window

As mentioned above, an important feature of an RC is to separate data from dif-

ferent classes and to give similar output for data from the same class. Even if the

reservoir is operating in the chaotic state, the transient dynamics can still be used for

classification when the output data is restricted to the consistency window [5, 37].

Haynes et al. [5] proposed using the Hamming distance for Boolean data given in

Eq. (5) as a measure for the consistency window. We follow the same approach here.

Figure 9 shows examples of the dynamics of the average Hamming distance, where

the bias is set in the ordered regime (panel a) so that the dynamics evolves to a fixed

point for all inputs and the other for a bias in the chaotic regime (panel b). Here, a

random Boolean initial condition is injected into the network for a single 5-ns-long

period, which destabilizes the network dynamics. This is repeated many times for

the same and different initial conditions. The consistency window wquantifies the

interval over which similar and different inputs can be distinguished.

Also apparent in this plot is the fading memory property of the ABNs. Beyond

the consistency window, the output state is uncorrelated with the input state in both

the ordered and chaotic regimes.

We repeat this experiment for a large number of randomly chosen reservoirs as a

function of network parameters. We determine wby finding the intersection point of

lines fit to the early part of the data. This procedure can result in values greater than

the 200 steps over which we collect data when the slopes are shallow. As seen in

Fig. 10, the consistency window sharply peaks for bias values at the ordered-chaotic

boundary (see Fig. 4) and approaches zero for small p. In the chaotic regime, w

takes on it smallest value around p=0.5 of ⇠50 time steps for K=3 and ⇠35 time

steps for K=4.

We also explore the dynamics of the network where all nodes execute the XOR

function, which corresponds to p=0.5 This is motivated by the previous work of

14 Reservoir computing with ABNs

Fig. 9 Temporal evolution of the normalized Hamming distance using two randomly selected data

sets with 64 pixels injected into the reservoir for 5 ns. The blue symbols indicates repeated injection

of the same data and finding the Hamming distance for each trial, and the red symbols indicate

the difference between this data set and a randomly chosen second data set for a) K=2 and

p=0.387 and b) K=4 and p=0.476. The black dashed lines are fits of the data to straight lines, the

intersection point of which are used to indicate the end of the consistency window, about ⇠30 time

steps for both sets of data.

Fig. 10 Average consistency window for as a function of network parameters. Blue symbols indi-

cate unmodified networks and red symbols indicate optimizer-pruned networks.

Haynes et al. [5] who realized an RC for serial digital data pattern recognition using

a single XOR node with self-feedback over two links with different time delays. The

results of this study are contained in the tight cluster of blue data points in Fig. 10

at p=0.5 and w⇠35 for K=2, w⇠25 for K=3, and w⇠20 for K=4. Given

that these point clusters are below the ‘U’-shaped band of points appearing above

the point cluster, corresponding to the random networks, indicates that the random

networks outperform the XOR network with regards to the consistency window,

which may improve the performance of the RC.

The results presented in this section demonstrate that a random ABN satisfies

the criteria of separation, generalization, and fading memory that is believed to be

required for reservoir computing.

Reservoir computing with ABNs 15

4.3 Realizing an RC for the MNIST classification task

In this section, we report on the performance of the RC on the MNIST classification

task using the following procedures. We use the first 42,000 images from the MNIST

data set and reduce the images to 16⇥16 pixels using the procedure given in Sec. 4.1.

We use less than the total MNIST image database size to reduce the computational

cost of our experiments and subsequent analysis. The output of the reservoir for each

written-digit image is a series of 200 time steps for each of the 64 reservoir nodes.

The resulting data for the output state of the network is transferred to a personal

computer, then to the Open Science Grid (see the Appendix) for training the RC

output weights. Even in binary format, the file size for the network output state

(42,000 images ⇥32 bytes/image ⇥200 time steps) is 672 MB for a single reservoir

realization. For this reason, we only choose 800 random networks when measuring

RC performance as a function of network parameters rather than the 6,000 used in

Sec. 3.2 above.

These data are used to determine Wout

m,nusing a ridge-regression regularization

parameter b2=103. The 10-element class output vector Yexpected has a value of

1 for the corresponding image and -1 for the remaining elements. The output data

for each image is reshaped into a single feature vector with a total length 64⇥200

= 12,800 elements with a total feature matrix Xin R12,800⇥42,000; performing the

matrix inversion given in Eq. 3 with this data set pushes the limits of our accessible

computer hardware in terms of memory to store the matrices and computation time.

The classification performance for each individual written digit is evaluated us-

ing the cross-validation method. Here, our selected data set of 42,000 written-digit

images is split into 5 equal-size groups of 8,400 images each. The training of Wout

m,n

is repeated 5 times using 4 groups for the training and the remaining for testing the

performance of the RC using this ‘unseen’ data set. After repeating this procedure

for all groups, the percent of correct classifications for each written digit is calcu-

lated (that is, the fraction of classification attempts that was completed correctly).

This procedure is then repeated for each of the 800 random reservoirs and for each

network parameter. The performance Pis defined as the average of the percentage

correct score for each written digit.

4.3.1 Exploring fading memory in the RC

In the previous section, our observations indicate the presence of fading memory

(see Fig. 9) as quantified by a single metric given by the consistency window. A

more nuanced measure for how the fading memory affects the performance on the

MNIST task is obtained by varying the range of time steps used in the classifica-

tion task (that is, the location and length of the block of data X(t)used during RC

training). We expect that the initial data is highly correlated with the input data and

hence it contains limited new information that can be used for classification. At the

other extreme, there is a loss of correlation at later times as the network short-term

16 Reservoir computing with ABNs

memory fades. We adjust the location and length of the data block by repeating the

classification task using data only from a start time step in the range 0 <tstart <70

until a final time step of tend =70 to determine the classification performance P. That

is, X only has data from observation times tstart to tend. At the final time step, there

is is almost no data recorded from the reservoir, explaining the small dip apparent

for the last data point of Fig. 11b).

Figure 11 shows two characteristic dependencies of the performance as a func-

tion of tstart . The performance starts high, followed by a rapid decrease, followed

by a long decaying tail. To compare performance across the random networks, we

find empirically that the initial high-performance section is well-fit with a Gaussian

distribution, which characterizes the duration of ‘short term memory.’ The longer

tail is well-fit by a power-law of the form

P(tstart )=c(tstart )a+0.1 (6)

and quantifies the ‘long-term’ memory of the reservoir. As tstart !•,P!0.1,

corresponding to a random guess of the 10 written-digit classes. As seen in the

figure, the fits to these two functions is reasonable.

Fig. 11 Fading memory of two example random reservoirs with (a) K=4 and p=0.912 and (b)

K=4 and p=0.553.

From the Gaussian fit, we extract a short-term memory coefficient sequal to the

1/ehalf-width of the Gaussian function in units of start samples used for classifica-

tion and gives a characteristic time scale for the short-term memory. Already after a

few time steps, the performance drops substantially, for which the output data con-

tains only partial information of the input image (recall we insert data into the net-

work over 4 time steps for the 16⇥16-pixel images). Figure 12 shows the short-term

memory as a function of the network parameters. Similar to the time to reach a fixed

point shown in Fig. 4, the short-term memory increases near the ordered-chaotic

transition boundary, but remains high throughout the chaotic regime for K=2. For

K=2 and 3, there is less scatter in the data, the domain of long memory time in-

creases with K, the memory scale peaks at the transition boundary, and there exists a

Reservoir computing with ABNs 17

minima in the memory time at p⇠0.5 between the peaks, although the dependence

is fairly flat. The longer short-term memory time for K=2 may suggest a reason for

the slightly higher best performance found for this network connectivity described

below in Sec. 4.

Fig. 12 Short-term memory of the RC for different reservoir parameters.

From the power-law fit, we extract a long-term memory coefficient Mdefined as

the time at which the the performance drops to 0.11. This is a measure of the time

needed for almost all information to vanish from the system. As seen from Fig. 13,

Mis peaked near the ordered-chaotic boundary, with a larger separation between the

two boundaries as Kincreases, and takes on a minimum value in the chaotic domain

when p⇠0.5. Interestingly, it is possible to create networks with long memory (10’s

of samples corresponding to >50 ns) even for reservoirs with biases as low as 0.1

when K>2, likely due to the large number of recurrent loops in the network and

the relatively long link time delays.

Fig. 13 Long-term memory of the RC for different reservoir parameters.

It is not possible to give an appropriate scale for the long-term memory because

it appears to be well described by a power law, which has no scale. As a substitute,

we determine the power-law exponent afor different network parameters as shown

in Fig. 14. For a power-law with a negative exponent (a>0 in Eq. 6), its integral is

18 Reservoir computing with ABNs

finite for a>1 and will diverge otherwise. The exponent becomes large for p!0

and p!1, which is due to fact that these networks reach a fix point within a few

time steps and almost all nodes are in the inactive (active) state for low (high) bias.

For pclose to the phase transition, a<1, suggesting the existence of substantial

long-term memory and hence long-term retention of information in the network. For

p⇠0.5, a⇡2 for all K, demonstrating that the long-term memory nearly absent

(a>1) and fairly insensitive to K, whereas the short-term memory is more sensitive

to this parameter.

Fig. 14 Power-law exponent for long-term memory in the RC for different network parameters.

4.3.2 RC Performance for different reservoir parameters

The previous sections demonstrate that an FPGA-based RC shows a fairly strong

dependence of the consistency window and memory on network parameters Kand

p. Based on the vast literature on RCs that states it is important to optimize the

consistency window and fading memory, we find a surprising lack of dependence of

the RC performance on Kand pfor the MNIST task as seen in Fig. 15. As in our

previous plots, the red symbols correspond to networks whose inactive nodes are

pruned by the Verilog compiler and optimizer. These pruned networks tend to have

lower performance; the lowest performer outliers have only a few active nodes and

hence there is little information available for classifying the written digits. For fully

realized networks with no pruning, indicated by the blue symbols, the performance

is only weakly dependent on Kand p, although the domain of well-performing

networks increases with K. Interestingly, we obtain good performance throughout

the chaotic domain and well into the ordered domains [32].

Zooming in on the high-performance, un-pruned networks (Fig. 16), some struc-

ture is evident with the largest spread of 4% for K=2. Evident is a slight peaking in

performance near the ordered-chaotic boundaries, although the spread in the data is

comparable to the peak sizes. Even within the middle of the chaotic domain, we find

some networks that perform nearly as well as the best performers near the ordered-

chaotic boundaries, especially for K=3 and 4.

Reservoir computing with ABNs 19

Fig. 15 The classification performance for various network parameters. The black line at P⇡

0.82 indicates the score of the linear classifier for comparison. Blue symbols indicate unmodified

networks and red symbols indicate optimizer-pruned networks.

Fig. 16 Higher resolution plots of the data shown in Fig. 15, but only including the results from

full (un-pruned) networks.

4.3.3 RC Performance dependence on output data sample size

We find that it is not necessary to use the data characterizing the network dynamics

X(t)collected over the entire 200 time steps. As above, we use the cross-validation

method but with the smaller recorded network data. Figure 17 shows the dependence

of the performance on the number of time steps used in the training where we always

start with the first time step. The performance saturates only after ⇠10 time steps

(corresponding to a interval of only 50 ns). Note that all nodes are connected to

the input layer and hence are activated immediately, but information injected to a

node does not fully spread fully throughout the network because of the substantial

link time delay (⇠13 ns). This suggests that the recurrent connections, which allow

for arbitrarily long loops in the reservoir, may not be as important for the MNIST

classification task.

4.3.4 Best results

After an exhaustive search over network parameters, we find the highest perfor-

mance over all p, finding the results given in Table 1. For comparison, we find that

20 Reservoir computing with ABNs

Fig. 17 Performance score as a function of the length of the data record used for training the RC

for K=2 and p=0.222. The black dashed line represents the performance of a linear classifier.

a linear classifier (no reservoir) has P⇡82.4% for all Kand thus the RC has better

performance in all cases. For a linear classifier with the full 28⇥28 image, LeCun

et al. [38] obtained P=88%. Our results with a compressed image still outperform

a full-image linear classifier. Furthermore, the image data is processed by the reser-

voir within 1 µs, suggesting high-speed prediction could be possible if the weights

are applied in real time on the FPGA (after off-line completion of the training of the

output weights), which we will explore in future studies. The best performing net-

works occurred for pnear the ordered-chaotic transition, although the dependence

on pis weak as discussed in Sec. 4.3.2 above.

Table 1 Observed RC best performance on the MNIST task

K PKoptimum p

2 90.5% p=0.688

3 89.9% p=0.668

4 89.0% p=0.229

The performance in a typical classification experiment is about 90%. In particu-

lar, the classification results were quite different for each written digit. Figure 18a)

shows the average classification accuracy Pifor each written digit iover all ex-

periments. When the classification fails (that it, the actual digit injected into the

RC is mis-classified), the RC will incorrectly classify the data as a different digit.

Figure 18b) shows the probability that the RC selects a digit for the case when it

mis-classified the image.

The digit ‘1’ is classified with the highest accuracy with a score of P1=94.6%,

while digit ‘5’ has the worst result of P5=76.0%, nearly a 20% difference from

the best digit. Surprisingly, digit ‘1’ is selected with much higher probability when

the RC fails to correctly classify an image. The digit selected least during a mis-

classification are ‘0’, ‘2’, and ‘6.’

To go deeper into understanding the errors, Fig. 19 shows the confusion matrix

quantifying how often a certain digit is mis-identified as a function of the input

digit. Here, the rows of the matrix are normalized to one; in each row, a blue color

Reservoir computing with ABNs 21

Fig. 18 a) Average classification accuracy over all experiments and b) mis-classification of the

written digit images.

indicates that this digit is hardly chosen while magenta pixels contribute the most to

the error probability.

Fig. 19 The confusion matrix visualizing the error for each digit.

From the confusion matrix, we see that if a digit is mis-classified as another digit

with high likelihood, the same can be said about the other digit. The most obvious

mis-classification pairs are 4 $9, 7$9, 1$7 and 3$5. Given the similarities of the

strokes to write these numbers, their confusion is sensible. However, it is surprising

that the pair 3$8 is not pronounced.

We now turn to investigate the influence of the image size on the performance.

For images compressed to a very small number of pixels, there is loss of information

because the coarse representation will effectively merge features in the written digit.

On the other hand, once the image has ⇠13 ⇥13 pixels, some of the data input to

the reservoir is lost because of the shortcoming of the way in which we inject data

into the reservoir as discussed above in Sec. 3.2. Figure 20 shows the performance

22 Reservoir computing with ABNs

of four different randomly chosen reservoirs for each connectivity K. In addition,

we show the performance of the linear classifier for the same data sets.

Fig. 20 Classification performance variation with the number of pixels representing the written

digit. The colored lines are for 4 randomly chosen reservoirs, while the black line is for the linear

classifier. The parameters are: K=2, p=0.5 (red), 0.566 (green), 0.719 (blue), 0.352 (yellow);

K=3, p=0.758 (red), 0.521 (green), 0.170 (blue), 0.482 (yellow); and K=4, p=0.176 (red), 0.244

(green), 0.297 (blue), 0.743 (yellow).

We see that the FPGA-based reservoirs always outperform the linear classifier for

all image sizes. As expected based on increased image resolution, the performance

of the linear classifier increased monotonically but there is a noticeable decrease in

the slope beyond ⇠12 ⇥12-pixel image size. On the other hand, the RC classifier

performance peaks around a ⇠14 ⇥14-pixel image size and then decreases some-

what. This decrease is likely due to the fact that only part of these images are input

to the reservoir by our method for data injection as mentioned above and discussed

in Sec. 3.2, a limitation we will address in future studies. However, for all cases, the

performance is above the linear classifier.

4.3.5 Parallel RCs

One method to possibly improve the performance of our reservoir computer is to

use multiple reservoirs, which we explore here in some preliminary studies. Similar

to deep learning artificial neural networks, a hierarchical structure can allow for

forecasting dynamical systems with different time scales or to focus the network on

different temporal or spatial features [39]. Also, a serial cascaded of RCs improves

the chaotic time series prediction performance [40]. A recent review summarizes

variations on these architectures [41].

Specific to the MNIST task Jalalvand et al. [42], showed improved performance

of an RC using multiple RCs in a parallel, serial, or parallel-serial geometries. Their

work is based on a software simulation with each RC having ⇠10,000 reservoir

nodes, but only feeding in a single column of pixel data from the original MNIST

28⇥28-pixel images. It is not apparent whether a similar approach will work here

Reservoir computing with ABNs 23

where we consider using ABNs with link time delays, whereas Ref. [42] considers

nodes with a hyperbolic tangent nonlinearity and no link delays. There is some hope

for an improvement using a parallel approach because specific reservoir realizations

perform somewhat better on some digits than others. We use the data collected from

the different reservoir realizations in a parallel configuration with `RCs where we

choose the `best reservoirs. The results are combined using a majority vote for

classification. As seen in Fig. 21, there is a modest increase in the performance,

peaking at P=91.8% with `⇠8.

Fig. 21 Classification performance for parallel RCs that each classify all written digits (blue line)

and the best combination of reservoirs where each only classifies a sub-set of the digits (green

line).

To obtain a sense of a possible optimal FPGA-based reservoir with 64-nodes, we

search through our database of results to find the `reservoirs that perform the best

for a single digit. The performance of this hypothetical machine is shown by the

green line. Better performance can likely only be obtained by increasing the various

resources as discussed in the next section.

5 Discussion and Conclusions

In this Chapter, we demonstrate that an FPGA-based RC can perform the written-

digit classification task. The performance over a wide range of reservoir parameters

exceeds that of a linear classifier, although the error rate is still relatively high in

comparison to the state-of-the-art deep learning feed-forward artificial neural net-

works with fully trained weights. One way to improve the performance of our RC

is to increase the number of connections of the input data to the reservoir and to

increase the number of reservoir nodes; in other related studies, we find that the

performance scales approximately as the square root of the number of input connec-

tions and reservoir nodes. In the future, we will also investigate full on-chip training

and classification [22], which should lead to extremely fast operation of an RC.

The FPGA platform is an especially appealing RC physical substrate because it is

commercially available, operates at low power, runs at high speed, easily integrated

24 Reservoir computing with ABNs

with traditional computer infrastructure, and can potentially lead to beyond-Turing

computing because we run the reservoir in the autonomous mode.

Acknowledgements S.A. and E.S. gratefully acknowledge the financial support of the Deutsche

Forschungsgemeinschaft (DFG, German Research Foundation) - Projektnummer 163436311- SFB

910. N.D.H., O.D. and D.J.G. gratefully acknowledge financial support from the U.S. Army Re-

search Office through Grant No. W911NF-12-1-0099. O.D. also gratefully acknowledges financial

support from the European Union’s Horizon 2020 research and innovation programme under the

Marie Skłodowska-Curie grant agreement No. 713694. We gratefully acknowledge the IT support

of Jimmy Dorff of the Duke University Department of Physics for extensive help in developing the

server to automate our experiments.

Appendix: FPGAs and project workflow

Brief introduction to FPGAs

An extensive discussion on using FPGAs for experiments on autonomous Boolean

networks, including hardware description language metacode, can be found in

Rosin [17], D’Huys et al. [31], Lohmann et al. [29], and Canaday et al. [20].

Briefly, FPGA’s contain more than 105programmable logic elements (LE) that can

be linked via programmable connections. The logic elements are based on CMOS

technology and are arranged in a grid system. Each of these elements consists of a

look-up table (LUT) with four inputs for the Altera Cyclone IV chip family used

here (EP4CE115F29C7N, Terasic DE2-115 demonstration board), a flip-flop and a

multiplexer. Figure A.1 illustrates the layout of a logic element.

Fig. A.1 Structure of a single logic element on the FPGA. The figure is adapted from [17].

The multiplexer is used to switch between clocked and unclocked (autonomous)

mode and the flip-flop provides the clocked output. The look-up table can be used

to define an operation for Kinputs, so having a length of 2Kbits and thus can be

used to realize up to 22Kdifferent Boolean functions. The Altera Cyclone IV logic

elements used in this FPGA have a maximum input number of K=4 so a total

number of 65,536 different operation can be performed by a single LE. This makes

Reservoir computing with ABNs 25

it easy to create an arbitrary setup on the device. Their relatively low costs and

their operation time on the nanosecond time scale are ideally suited for reservoir

computing applications.

The network is defined using the hardware description language Verilog, which

instructs the FPGA how to configure itself, including the function of the nodes and

the network links. This code is compiled using the proprietary software Quartus II,

version 14.0 [43]. The resulting bitstream is loaded onto the FPGA, which causes the

configuration of the RC. See the references above for details. Unique to this project

is the way in which the input and output state data are handled and interfaced with

the autonomous network forming the reservoir. We use a finite state machine (FSM)

synthesized from clocked logic and communicates with a personal computer (PC)

via a USB controller chip (FTDI part #FT232H), which is connected directly to the

FPGA pins via the GPIO connector on the demonstration board.

The FSM has the distinct states: idle,receive,acquire,send,program

and reset whose possible transitions are depicted in Fig. A.2. It starts in the idle

state and remains without any action as long as it does not receive any input over the

USB connection via the PC. After a signal is detected on the USB, the FSM changes

to the receive state. Next, the first available byte read from the USB determines

the next state of the FSM. The program state reads a number of bytes from the

USB connection and stores it onto the RAM onboard the chip. This number is set

in the Verilog code defining the reservoir (see below) and can only be changed by

recompiling the Verilog code. The data transferred to the FPGA in this manner sets

the initial state for a reservoir.

Fig. A.2 Illustration of the transition between the six states of the FSM.

When the state changes to acquire, all nodes in the reservoir are set to the

value zero and the data from the RAM that was transferred during the program

process is fed into the reservoir. Because the FSM operates with clocked logic, the

initial state is inserted over one clock cycle with the length of 5 ns (corresponding

to a data rate of 200 MHz). After this data insertion, the reservoir runs freely and

26 Reservoir computing with ABNs

autonomously. The only restrictions for the dynamics arise from the finite response

time of electric elements within the FPGA. On every clock cycle (5 ns period), a

snapshot of the reservoir is taken by saving the current value of each node to the

RAM. In this fashion, a series of 200 samples is captured and stored. Note that the

dynamics in between clock cycles can’t be measured using this method.

After all data is collected, the FSM is set to the send state, which will send

all the data stored in the RAM to the USB connection and is available to a Python

program running on the PC. The same reservoir can be used to collect data for the

same input word by sending several acquire commands or a new input word can

be used just by changing the initial-state RAM contents without having to recompile

the Verilog code and transferring the resulting bitstream to the FPGA.

The Verilog code for the FSM has not been presented in our previous publications

so we give the module below for completeness.

2 Master FSM monitors bytes sent from the PC to

3 look for command signals

4 Valid command signals are:

5 Begin aquiring data

6 Send data from RAM to PC

7 Program input words

8 Reset

9 When a valid command is recognized, the FSM flips

10 a signal bit that other modules are monitoring

11 */

13 module master_fsm(CLOCK_50, KEY, rcvd_sig,

byte_from_pc, acquire_signal, send_signal,

prog_word_signal, reset);

16 // Parameters

17 parameter IDLE = 0, RCVD = 1, ACQD_I = 2,

ACQD_II = 3, SEND_I = 4, SEND_II = 5, PROG_I = 6,

PROG_II = 7, RST_I = 8, RST_II = 9;

19 // Internal elements

20 input CLOCK_50;

21 input KEY;

22 input rcvd_sig;

23 input [7:0] byte_from_pc;

25 output acquire_signal;

26 output send_signal;

27 output prog_word_signal;

28 output reset;

Reservoir computing with ABNs 27

30 reg [3:0] state;

31 reg [5:0] count;

33 wire hardware_reset;

35 reg acq_sig, send_sig, prog_sig, rst_sig;

36 assign acquire_signal = acq_sig;

37 assign send_signal = send_sig;

38 assign prog_word_signal = prog_sig;

39 assign hardware_reset = ˜KEY;

40 assign reset = hardware_reset | rst_sig;

42 // The finite state machine

43 always @(*)

44 begin

45 case (state)

46 IDLE:

47 begin

48 acq_sig <= 1’b0;

49 send_sig <= 1’b0;

50 prog_sig <= 1’b0;

51 rst_sig <= 1’b0;

52 end

53 RCVD:

54 begin

55 acq_sig <= 1’b0;

56 send_sig <= 1’b0;

57 prog_sig <= 1’b0;

58 rst_sig <= 1’b0;

59 end

60 ACQD_I:

61 begin

62 acq_sig <= 1’b1;

63 send_sig <= 1’b0;

64 prog_sig <= 1’b0;

65 rst_sig <= 1’b0;

66 end

67 ACQD_II:

68 begin

69 acq_sig <= 1’b1;

70 send_sig <= 1’b0;

71 prog_sig <= 1’b0;

72 rst_sig <= 1’b0;

73 end

74 SEND_I:

28 Reservoir computing with ABNs

75 begin

76 acq_sig <= 1’b0;

77 send_sig <= 1’b1;

78 prog_sig <= 1’b0;

79 st_sig <= 1’b0;

80 end

81 SEND_II:

82 begin

83 acq_sig <= 1’b0;

84 send_sig <= 1’b1;

85 prog_sig <= 1’b0;

86 rst_sig <= 1’b0;

87 end

88 PROG_I:

89 begin

90 acq_sig <= 1’b0;

91 send_sig <= 1’b0;

92 prog_sig <= 1’b1;

93 rst_sig <= 1’b0;

94 end

95 PROG_II:

96 begin

97 acq_sig <= 1’b0;

98 send_sig <= 1’b0;

99 prog_sig <= 1’b1;

100 rst_sig <= 1’b0;

101 end

102 RST_I:

103 begin

104 acq_sig <= 1’b0;

105 send_sig <= 1’b0;

106 prog_sig <= 1’b0;

107 rst_sig <= 1’b1;

108 end

109 RST_II:

110 begin

111 acq_sig <= 1’b0;

112 send_sig <= 1’b0;

113 prog_sig <= 1’b0;

114 rst_sig <= 1’b1;

115 end

116 default:

117 begin

118 acq_sig <= 1’b0;

119 send_sig <= 1’b0;

Reservoir computing with ABNs 29

120 prog_sig <= 1’b0;

121 rst_sig <= 1’b0;

122 end

123 endcase

124 end

125

126 always @(posedge CLOCK_50 or posedge

hardware_reset)

127 begin

128 if (hardware_reset)

129 state <= IDLE;

130 else

131 case (state)

132 IDLE:

133 begin

134 if (rcvd_sig)

135 state <= RCVD;

136 else

137 state <= IDLE;

138 end

139 RCVD:

140 begin

141 case (byte_from_pc[7:0])

142 8’b00000001: state <= ACQD_I;

143 8’b00000010: state <= SEND_I;

144 8’b00001000: state <= PROG_I;

145 8’b00010000: state <= RST_I;

146 default:

147 state <= IDLE;

148 endcase

149 end

150 ACQD_I:

151 state <= ACQD_II;

152 ACQD_II:

153 state <= IDLE;

154 SEND_I:

155 state <= SEND_II;

156 SEND_II:

157 state <= IDLE;

158 PROG_I:

159 state <= PROG_II;

160 PROG_II:

161 state <= IDLE;

162 RST_I:

163 state <= RST_II;

30 Reservoir computing with ABNs

164 RST_II:

165 state <= IDLE;

166 default:

167 state <= IDLE;

168 endcase

169 end

170

171 endmodule

Experimental pipeline

Creating networks, running the experiments, and analyzing the results presented in

this chapter was simplified by automating the process with the following feature

specifications:

• The system runs in an automated fashion with as few manual steps as possible

• The system is separated into the distinct steps: creating the network; compiling

the network; running the experiment; and analyzing the data

• Scalability: Each individual step has any number of computers that can be

added or removed to or from the system at any time to run tasks in parallel

• Extensibility: New methods for running experiments or analyzing data can be

added at any time and all connected machines will automatically receive all

updates

• Supervision: The system is accompanied by a website that always shows the

current state of all ongoing actions

The code for this project is extensive; we provide it on a repository on an as-is basis

for those interested in exploring or extending the system [44].

The entire system is implemented in Python and runs on a centralized server

set up for us by the Duke University Physics Department IT staff. The core of the

system is based on a MySQL database that provides tables for reservoir,experiment

and analysis. All tables contain the data unique to each step of the workflow, such

as the number of reservoir nodes, number of delay elements, network connectivity,

number of written-image input words, and experimental repetitions. The pipeline is

controlled by two additional fields: status and tag, which are assigned to all items.

The status is an integer number that indicates open, running, successful, error, or

canceled. All sub-systems have a unique identifier that creates a tag on each item,

which allows us to determine which machine creates a certain item. The run-time

order is usually random and depends on the internal ordering of the database. If

some items are preferred over others, we assign a priority tag to force a schedule

order.

It is challenging to interact with the Quartus II system to automate creating dif-

ferent reservoirs in an automated manner. Therefore, this step was completed by a

user who creates a new python script to add new reservoirs. This process was fa-

cilitated as much as possible by providing many functions to create random nodes

and links and assign a bias to node functions. Finally, the Verilog code defining the

reservoir is created with appropriate metadata to identify the reservoir.

Reservoir computing with ABNs 31

After uploading a compiled bitstrea file, a new entry is inserted into the database

with a unique id and all attributes extracted from the reservoir file. The new item

is created as status open and has no tag assigned. For this step, a compile daemon

checks for new reservoirs, specifically looking for status open. If a new item is

found, it is updated to status running with the tag of the machine that claimed the

reservoir, thus preventing any other machine from taking the same item. The daemon

then downloads the reservoir file from the server and use the Quartus II software to

compile it. If the process ends successfully, the new file is then uploaded back to the

server and the status becomes successful. Otherwise the status is set to error and a

file containing the error message is uploaded instead.

To run experiments, an input creation function must exist. This function defines

the initial values of the network nodes. If a new function is required by a user, it

is developed and added to the system. Developing the project is managed by the

versioning software git, which has its repository on the Duke University centralized

data storage system.

Creating a new experiment is undertaken by inserting the experimental defini-

tion for an existing reservoir with input method and all required parameters into

the database with status open. An experiment daemon similar to reservoir daemon

checks for new experiments that have their status open, but additionally requires

the used reservoir to have status successful. It then assigns status running and sets

the tag identifier. If a new experiment is found, the daemon checks if an update has

been made to the code via the git repository and updates if necessary. Next, the in-

put creation function is called with all parameters and returns the defined number of

different input words. These initialize the network and each input is run with the set

repetition number. The entire data is stored in a single binary file that is uploaded if

the collection succeeds. An error message is uploaded in case of an error.

To analyze the data, we use the Open Science Grid (OSG) [45], which is a giant

computer cluster that includes machines from many universities all over the United

States of America. This platform allows us to start hundreds of jobs at the same

time and distributes the work to all connected machines, thus providing massive

computational power. However, this resource is not always reliable and sometimes

fails.

To account for this possibility, we use an analysis daemon that runs on local

machines as a fall back. All analysis code is developed in advance and synchronized

with git repository. The local and OSG daemons check for open analyses that have

successful experiments as their predecessor, and verifies that the latest version of

the code is in use. For the OSG, the data is uploaded and a configuration file sent

with all analysis parameters. When the analysis completes successfully, the results

files are uploaded back to the Duke University data storage system and all data on

the OSG belonging to this analysis is deleted. In case of an error, the data remains

on the OSG for further investigation. In any case, the status is set to the result.

32 Reservoir computing with ABNs

References

1. H. Jaeger and H. Haas. Harnessing nonlinearity: Predicting chaotic systems and saving energy

in wireless communication. Science 304, 78 (2004).

2. L. Appeltant, M. C. Soriano, G. Van der Sande, J. Danckaert, S. Massar, J. Dambre, B.

Schrauwen, C. R. Mirasso, and I. Fischer. Information processing using a single dynamical

node as complex system, Nat. Commun. 2, 468 (2011).

3. L. Larger, A. Bayl´

oon-Fuentes, R. Martinenghi, V. S. Udaltsov, Y. K. Chembo, and M.

Jacquot. High-Speed Photonic Reservoir Computing Using a Time-Delay-Based Architec-

ture: Million Words per Second Classification. Phys. Rev. X 7, 011015 (2017).

4. Y. Paquot, F. Dupart, A. Smerieri, J. Dambre, B. Schrauwen, M. Haelterman, and S. Massar.

Optoelectronics reservoir computing. Sci. Rep. 2, 287 (2012).

5. N. D. Haynes, M. C. Soriano, D. P. Rosin, I. Fischer, and D. J. Gauthier. Reservoir computing

with a single time-delay autonomous Boolean node. Phys. Rev. E 91, 020801 (2015).

6. L. Appeltant, G. Van der Sande, J. Danckaert, and I. Fischer. Constructing optimized binary

masks for reservoir computing with delay systems, Sci. Rep. 4, 3629 (2014).

7. A. R¨

ohm, and K. L¨

udge. Multiplexed networks: reservoir computing with virtual and real

nodes. J. Phys. Commun. 2, 085007 (2018).

8. B. Penkovsky, L. Larger, and D. Brunner. Efficient design of hardware-enabled reservoir com-

puting in FPGAs. J. Appl. Phys. 124, 162101 (2018).

9. C. Mesaritakis, A. Bogris, A. Kapsalis, and D. Syvridis. High-speed all-optical pattern recog-

nition of dispersive Fourier images through a photonic reservoir computing subsystem. Opt.

Lett. 40, 3416-3419 (2015).

10. F. Denis-Le Coarer, M. Sciamanna, A. Katumba, M. Freiberger, J. Dambre, P. Bienstman and

D. Rontani. All-Optical Reservoir Computing on a Photonic Chip Using Silicon-Based Ring

Resonators. IEEE J. Sel. Top. Quantum Electron. 24 1-8 (2018).

11. K. Vandoorne, P. Mechet, T. Van Vaerenbergh, M. Fiers, G. Morthier, D. Verstraeten, B.

Schrauwen, J. Dambre, and P. Bienstman. Experimental demonstration of reservoir comput-

ing on a silicon photonics chip. Nat Commun 5, 3541 (2014).

12. H. Zhang, Z. Feng, B. Li, Y. Wang, K. Cui, F. Lin, W. Dou, and Y. Huang. Integrated pho-

tonic reservoir computing based on hierarchical time-multiplexing structure. Opt. Express. 22

31356-31370 (2014).

13. Q. Vinckier, F. Duport, A. Smerieri, K. Vandoorne, P. Bienstman, M. Haelterman, and S.

Massar. High-performance photonic reservoir computer based on a coherently driven passive

cavity. Optica 2438-446 (2015).

14. A. Katumba, J. Heyvaert, B. Schneider, S. Uvin, J. Dambre, and P. Bienstman. Low-loss

photonic reservoir computing with multimode photonic integrated circuits. Sci. Reports 8,

2653 (2018).

15. J. P. Mason, P. S. Linsay, J. J. Collins, L. Glass. Evolving complex dynamics in electronic

models of genetic networks. Chaos 14, 707 (2004).

16. R. Zhang, H. L. D. de S. Cavalcante, Z. Gao, D. J. Gauthier, J. E. S. Socolar, M. M. Adams,

and D. P. Lathrop. Boolean Chaos. Phys. Rev. E. 80, 045202(R) (2009).

17. D. P. Rosin. Dynamics of Complex Autonomous Boolean Networks. (Springer, Heidelberg,

2015).

18. M. Ghil and A. Mullhaupt. Boolean delay equations. II. Periodic and aperiodic solutions. J.

Stat. Phys. 41, 125 (1985).

19. M. Ghil, I. Zaliapin, and B. Coluzzi. Boolean delay equations: A simple way of looking at

complex systems. Physica D 237, 2967 (2008).

20. D. Canaday, A. Griffith, and D. J. Gauthier. Rapid Time Series Prediction with a Hardware-

Based Reservoir Computer. Chaos 28, 123119 (2018).

21. P. Antonik, A. Smerieri, F. Duport, M. Haelterman, and S. Massar. FPGA Implementation

of Reservoir Computing with Online Learning. Benelearn (24: 19 June 2015: Delft, Nether-

lands), 24th Belgian-Dutch Conference on Machine Learning (Benelearn).

Reservoir computing with ABNs 33

22. Y. Yi, Y. Liao, B. Wang, X. Fu, F. Shen, H. Hou, and L. Liu. FPGA based spike-time depen-

dent encoder and reservoir design in neuromorphic computing processors. Microprocessors

and Microsystems 46B, 175 (2016).

23. P. Antonik. Application of FPGA to Real-Time Machine Learning. (Springer, Cham, Switzer-

land, 2018).

24. E. S. Skibinsky-Gitlin, M. L. Alomar, C. F. Frasser, V. Canals, E. Isern, M. Roca, and J. L.

Rossell´

o. Cyclic Reservoir Computing with FPGA Devices for Efficient Channel Equaliza-

tion. In: Rutkowski L., Scherer R., Korytkowski M., Pedrycz W., Tadeusiewicz R., Zurada J.

(eds.) Artificial Intelligence and Soft Computing. ICAISC 2018. Lecture Notes in Computer

Science, vol 10841. (Springer, Cham, 2018).

25. M. L. Alomar, M. C. Soriano, M. Escalona-Mor´

an, V. Canals, I. Fischer, C. R. Mirasso, and J.

L. Rossell´

o. Digital Implementation of a Single Dynamical Node Reservoir Computer. IEEE

Trans. Circuits Sys. II: Exp. Briefs 62, 977 (2015).

26. S. Apostel. Dynamics of driven complex autonomous Boolean networks with application to

reservoir computing. M.S. Thesis, Technische Universit¨

at Berlin. Unpublished (2017).

27. R. Edwards and L. Glass. A calculus for relating the dynamics and structure of complex

biological networks. In: Adventures in Chemical Physics: A Special Volume of Advances in

Chemical Physics. R. S. Berry and J. Jortner, Eds. (John Wiley & Sons, Inc., Hoboken, 2006),

pp. 151-178.

28. R. Edwards, P. van den Driessche, and L. Wang, Periodicity in piecewise-linear switching

networks with delay. J. Math. Bio. 55, 271 (2007).

29. J. Lohmann, O. D’Huys, N. D. Haynes, E. Sch¨

oll, and D. J. Gauthier. Transient dynamics

and their control in time-delay autonomous Boolean ring networks. Phys. Rev. E 95, 022211

(2017).

30. D.J. Gauthier. Reservoir computing: Harnessing a universal dynamical system. SIAM News

51:2, 12 (2018).

31. O. D’Huys, J. Lohmann, N. D. Haynes, and D. J. Gauthier. Super-transient scaling in time-

delay autonomous Boolean network motifs. Chaos 26, 4954274 (2016).

32. I. B. Yildiz, H. Jaeger, and S. J. Kiebe. Re-visiting the echo state property. Neural Networks

35, 1 (2012).

33. B. Derrida and Y. Pomeau. Random networks of automata: a simple annealed approximation.

Europhys. Lett. 1, 45 (1986).

34. H. Flyvbjerg. An order parameter for networks of automata. J. Phys. A. 21, L955 (1988).

35. B. Luque and R. V. Sol´

e. Lyapunov exponents in random Boolean networks. Physica A. 284,

33 (2000).

36. http://yann.lecun.com/exdb/mnist/

37. A. Uchida, R. McAllister, and R. Roy. Consistency of nonlinear system response to complex

drive signals. Phys. Rev. Lett. 93, 244102 (2004).

38. Y. Lecun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document

recognition. Proceedings IEEE, 86, 2278 (1998).

39. H. Jaeger. Discovering multiscale dynamical features with hierarchical echo state networks.

Technical report 10, School of Engineering and Science, Jacobs University, unpublished.

40. R. Y. Webb. Multi-layer corrective cascade architecture for one-line predictive echo state

networks. Appl. Artif. Intell. 22, 811 (2008).

41. G. Tanaka, T. Yamane, J. B. H´

eroux, R. Nakane, N. Kanazawa, S. Takeda, H. Numata,

D. Nakano, and A. Hirose. Recent advances in physical reservoir computing: A review.

https://arxiv.org/abs/1808.04962v2 (2018).

42. A. Jalalvand, K. Demuynck, W. De Neve, R. Van de Walle, and J.-P. Martens. Design of

Reservoir Computing Systems for Noise-Robust Speech and Handwriting Recognition. In:

Conference on graphics, patterns and images, 28. (SIBGRAPI), 2015, Salvador. Porto Alegre:

Sociedade Brasileira de Computac¸˜

ao, 2015. On-line. IBI: 8JMKD3MGPBW34M/3JUJ5DP.

Available from: http://urlib.net/rep/8JMKD3MGPBW34M/3JUJ5DP.

43. https://www.intel.com/content/www/us/en/programmable/downloads/download-center.html

44. https://github.com/nickdavidhaynes/boolean-reservoir-computer

45. https://opensciencegrid.org/