Document [original]

Expanding the analysis of functional

Near-Infrared Spectroscopy (fNIRS) data

with multivariate techniques:

application to a children’s literacy study

vorgelegt von

M.Sc.

Jessica Gemignani

ORCID: 0000-0002-7722-4489

von der Fakultät IV

−

Elektrotechnik und Informatik

der Technischen Universität Berlin

zur Erlangung des akademischen Grades

Doktor der Ingenieurwissenschaften (

Dr.-Ing

)

genehmigte Dissertation

Promotionausschuss:

Vorsitzender: Prof. Dr. Klaus Obermayer

Gutachter: Prof. Dr. Benjamin Blankertz

Gutachter: Prof. Dr. Judit Gervain

Gutachter: Prof. Dr. Isabell Wartenburger

Gutachter: Dr. Christoph H. Schmitz

Tag der wissenschaftlichen Aussprache: 25. September 2019

Berlin 2019

This project has received funding from the European Union’s Horizon 2020 research

and innovation programme, under the Marie Sklodowska-Curie grant agreement No.

641858, Innovative Training Network PredictAble, "Understanding and predicting de-

velopmental language abilities and disorders in multilingual Europe".

Acknowledgements

These years as a Ph.D. student in Berlin have been a true rollercoaster of experiences

and emotions that I will never forget, and now that I finished writing this thesis I want

to take a moment to thank all those who made it that way.

As someone who pursued her Ph.D at university, while being employed at a company,

framed in a network with several other universities, and did a research visit overseas,

I surely have come in touch with a lot of people. And it is indeed the case that when

I began, my worst fear was that there was just “too many people” involved and that I

wouldn’t be able to meet everyone’s expectations.

Little did I know that those “too many people” would actually be key to the success of

this work and that they would all help me shape my project and my scientific growth

in such a smooth way I could not possibly imagine at the beginning.

First of all, I want to thank Prof. Benjamin Blankertz, who guided me scientifically

and from whom I learned so much - but I wish I had learnt so much more!

Then my gratitude to the whole NIRx family: you have been, indeed, like a family to

me. In particular, I want to thank Dr. Christoph Schmitz and Prof. Randall Barbour

for always helping and sharing their immense knowledge on the theoretical aspects of

fNIRS, you definitely have been the ace in the hole that anyone in my position would

have wanted; I want to express my gratitude to Dr. Eike Middell, former colleague of

the Development division, for always jumping together with me into my data analysis

problems and contributing to shape the project from the very start. I want also to

thank the whole Support Team, in particular Ms. Lamija Pašalić, because by giving

me constant exposure to the support cases they helped me understand so much better

many practical aspects of various fNIRS applications - this was so precious for my

learning. Finally, a big thank you to Mr. Richard Barbour, NIRx’s CEO, for believing

iii

in my abilities and making possible that I could be in contact with all these very

different aspects of the company life.

I want to express my gratitude also to Prof. Kenneth Pugh, director of Haskins Labo-

ratories, where I spent three wonderful and so productive months: I have been amazed

at how available he has been to me during the whole visit and afterwards, and how

open and helpful the Haskins members have all been to me; working there has been a

real landmark moment of my Ph.D journey.

I am also very grateful I had the opportunity of being a Marie Skłodowska-Curie Early

Stage Researcher (ESR): knowing to be part of a network has been very encouraging

from the very beginning, and I want to thank in particular Prof. Isabell Wartenburger,

for letting me in with her fNIRS data, guiding me through her data analysis, sharing

tips, discussing with me shortcomings and problems that were to be addressed, in other

words, for teaching me a lot!

That name of mine on this dissertation represents therefore not only my work but also

the sum of the contributions of all of you, who helped me so much along my way.

Finally, nothing would have been possible without the support of my wonderful family:

my husband, my siblings, my loving parents, always ready to jump on a plane at a

moment’s notice and come to Berlin just to see me for a few hours. I feel so lucky to

have you in my life.

Abstract

The use of functional Near-Infrared Spectroscopy is experiencing a rapid growth in use

in every area of neuroscientific research, but one of the most prolific areas of application

is the field of language development in children.

If over the past decades there have been incredible advancements on the technology

side, the same can not be said about data analysis techniques: for fNIRS, an ad-hoc

standardized data analysis procedure is still lacking; there is no general consensus

around the best set of pre-processing steps to be performed; there is no standard

choice of which hemoglobin component should be used to infer functional brain activity

(oxygenated or deoxygenated hemoglobin); the statistical analysis has been carried out

for a long time with methods borrowed from the fMRI practice. Only recently, effort

has been made in defining fNIRS-specific techniques, both at single-channel and multi-

channel level.

The first contribution of this work was to introduce a single-channel classifier that com-

bines features from both oxy- and deoxyhemoglobin and that employs Linear Discrim-

inant Analysis (LDA) to classify channels as ‘active’ or ‘not-active’. Its performances

were compared to those achieved by the General Linear Model (GLM) and it was found

that the LDA-based classifier not only yields higher classification accuracies but those

accuracies are more stable across different subjects, i.e. the multivariate method was

more robust with respect to intersubjective variability of the hemodynamic response.

The second contribution of this work was to investigate the impact of literacy on

functional brain organization, with both single-channel (univariate) and multi-channel

(multivariate) analysis approaches. Using a large fNIRS dataset including children of

various ages and literacy levels, we were able to show that the univariate approach

has a unique strength in localizing the effects under investigation. On the other hand,

the multi-channel analysis approach did not produce a statistically significant effect,

most likely because the experimental design was not optimally suited for the trial-by-

trial classification; nevertheless, it highlighted a trend that had eluded the univariate

analysis.

We conclude that both types of analysis should be routinely employed, because their

complementary strengths can answer to different questions, but also that the single-

channel analysis should be rendered more robust by using both hemoglobin compo-

nents and that a data-driven approach such as the one proposed may mitigate the

shortcomings of the model-based single-channel analysis. Finally, in order to perform

a multivariate pattern analysis, the experimental paradigm should be designed as to

include enough trials for classification.

Zusammenfassung

Der Einsatz der funktionellen Nahinfrarotspektroskopie nimmt in allen Bereichen

der neurowissenschaftlichen Forschung rasant zu, und eines der produktivsten Anwen-

dungsgebiete ist das Feld der Sprachentwicklung bei Kindern.

Wenn es in den letzten Jahrzehnten technologisch gesehen unglaubliche Fortschritte

gegeben hat, kann man das Gleiche nicht über Datenanalyseverfahren sagen: Für

fNIRS fehlt noch ein ad-hoc standardisiertes Datenanalyseverfahren; es gibt keinen all-

gemeinen Konsens über die beste Reihe von durchzuführenden Vorverarbeitungsschrit-

ten; es gibt keine Standardauswahl, welche Hämoglobinkomponente verwendet werden

sollte, um funktionelle Hirnaktivität abzuleiten (sauerstoffreiches oder sauerstoffarmes

Hämoglobin); die statistische Analyse wird seit langem mit Methoden durchgeführt,

die der fMRI-Praxis entnommen wurden. Erst kürzlich wurden Anstrengungen unter-

nommen, um fNIRS-spezifische Techniken zu definieren, sowohl auf Einkanal- als auch

auf Mehrkanalebene.

Der erste Beitrag dieser Arbeit war die Einführung eines Einkanalklassifizierers, der

Merkmale von Oxy- und Desoxyhämoglobin kombiniert und der die Linear Discrimi-

nant Analysis einsetzt, um Kanäle als "aktiv" oder "nicht aktiv" zu klassifizieren. Seine

Leistungen wurden mit denen des General Linear Model verglichen und es wurde

festgestellt, dass der LDA-basierte Klassifikator nicht nur höhere Klassifikationsge-

nauigkeiten liefert, sondern dass diese Genauigkeiten bei verschiedenen Probanden

stabiler sind, d.h. die multivariate Methode war robuster in Bezug auf intersubjek-

tive Variabilität der hämodynamischen Reaktion.

Der zweite Beitrag dieser Arbeit war die Untersuchung der Auswirkungen der Al-

phabetisierung auf die funktionelle Gehirnorganisation, mit einkanaligen (univariaten)

vii

und mehrkanaligen (multivariaten) Analysemethoden. Anhand eines großen fNIRS-

Datensatzes mit Kindern unterschiedlichen Alters und Alphabetisierungsgrades kon-

nten wir zeigen, dass der univariate Ansatz eine einzigartige Stärke bei der Lokalisierung

der untersuchten Effekte hat. Andererseits zeigte der Multi-Channel-Analyse-Ansatz

keinen statistisch signifikanten Effekt, wahrscheinlich weil das experimentelle Design

nicht optimal für die Klassifizierung im Versuch geeignet war; dennoch zeigte er einen

Trend auf, der sich der univariaten Analyse entzogen hatte.

Wir kommen zu dem Schluss, dass beide Arten der Analyse routinemäßig eingesetzt

werden sollten, weil ihre komplementären Stärken auf unterschiedliche Fragen antworten

können, aber auch, dass die Einkanalanalyse durch den Einsatz beider Hämoglobinkom-

ponenten robuster gemacht werden sollte und dass ein datengetriebener Ansatz wie der

vorgeschlagene die Mängel der modellbasierten Einkanalanalyse mildern kann.

Schließlich, um eine multivariate Musteranalyse durchzuführen, sollte das experimentelle

Paradigma so gestaltet werden, dass es genügend Versuche für die Klassifizierung en-

thält.

viii

Contents

Title Page i

Acknowledgements iii

Abstract v

Zusammenfassung vii

Contents ix

List of Figures xv

List of Tables xxv

1 Introduction 1

1.1 The use of functional Near-Infrared Spectroscopy in language research . 1

1.2 Motivations and outline of this thesis . . . . . . . . . . . . . . . . . . . 4

1.3 Publications................................. 6

1.3.1 First-author publications . . . . . . . . . . . . . . . . . . . . . . 6

Peer-Reviewed journals . . . . . . . . . . . . . . . . . . . . . . . 6

Peer-Reviewed Conferences . . . . . . . . . . . . . . . . . . . . . 6

Abstracts .............................. 6

1.3.2 Additional contributions . . . . . . . . . . . . . . . . . . . . . . 7

Peer-Reviewed journals . . . . . . . . . . . . . . . . . . . . . . . 7

Peer-Reviewed Conferences . . . . . . . . . . . . . . . . . . . . . 7

Abstracts .............................. 7

2 Physical principles and applications of functional Near-Infrared Spec-

troscopy 9

2.1 Physiological and biological principles of fNIRS . . . . . . . . . . . . . 9

2.1.1 Vasculature and hemodynamics of the brain . . . . . . . . . . . 9

2.1.2 The hemodynamic response . . . . . . . . . . . . . . . . . . . . 10

2.1.3 Instrumentation........................... 11

2.1.4 Acquisition of the signal . . . . . . . . . . . . . . . . . . . . . . 12

2.1.5 The fNIRS signal: how the raw data looks like . . . . . . . . . . 14

2.1.6 From light intensities to concentration changes: the modified

Beer-LambertLaw ......................... 16

2.2 Other neuroimaging techniques . . . . . . . . . . . . . . . . . . . . . . 20

2.2.1 EEGandfNIRS........................... 21

2.2.2 fMRIandfNIRS .......................... 23

2.3 The use of fNIRS in language research on children . . . . . . . . . . . . 25

2.4 AnalysisoffNIRSdata........................... 26

2.4.1 Reduction of motion artifacts . . . . . . . . . . . . . . . . . . . 27

2.4.2 Physiological confounds . . . . . . . . . . . . . . . . . . . . . . 29

2.4.3 Frequency filtering . . . . . . . . . . . . . . . . . . . . . . . . . 31

2.4.4 Computation of concentration changes . . . . . . . . . . . . . . 32

2.4.5 Statistical inference . . . . . . . . . . . . . . . . . . . . . . . . . 33

Thedesignmatrix ......................... 34

Solvingthemodel.......................... 35

2.4.6 Limitations of the univariate statistical analysis . . . . . . . . . 38

fNIRS errors are non-spherical . . . . . . . . . . . . . . . . . . . 38

A-priori selection of the basis function . . . . . . . . . . . . . . 41

Multiplecomparisons........................ 42

Independent analysis of HbO and HbR . . . . . . . . . . . . . . 43

2.5 The potential role of Machine Learning . . . . . . . . . . . . . . . . . . 44

2.6 Linear Discriminant Analysis (LDA) . . . . . . . . . . . . . . . . . . . 46

2.7 Support Vector Machines (SVM) . . . . . . . . . . . . . . . . . . . . . 50

2.8 Lessonslearned ............................... 52

3 From univariate to multivariate analysis of fNIRS data: a theoretical

formulation and validation 53

3.1 Introduction................................. 53

3.2 Methods................................... 55

3.3 Theoretical formulation . . . . . . . . . . . . . . . . . . . . . . . . . . . 57

3.3.1 Generation of the synthetic dataset . . . . . . . . . . . . . . . . 57

3.3.2 Dataanalysis ............................ 58

Pre-processing............................ 58

AnalysiswithGLM......................... 59

AnalysiswithLDA......................... 59

Evaluation of performances . . . . . . . . . . . . . . . . . . . . 61

Analysis of the correlation between subjects’ demographics and

classification accuracy . . . . . . . . . . . . . . . . . . 62

3.4 Application of the algorithm to experimental data . . . . . . . . . . . . 63

3.4.1 Experimental setup and data collection . . . . . . . . . . . . . . 63

3.4.2 Dataanalysis ............................ 64

AnalysiswithGLM......................... 66

AnalysiswithLDA......................... 66

3.4.3 Statistical analysis . . . . . . . . . . . . . . . . . . . . . . . . . 66

3.5 Results.................................... 67

3.5.1 Theoretical formulation . . . . . . . . . . . . . . . . . . . . . . . 67

Performance of the algorithms: overall classification accuracies . 67

Correlation between classification accuracies and individual mea-

sures............................ 69

3.5.2 Experimental results . . . . . . . . . . . . . . . . . . . . . . . . 70

3.6 Discussion.................................. 76

3.7 Lessonslearned ............................... 80

4 Investigating the effect of literacy on the functional organization of

the brain with univariate analysis of fNIRS data 81

4.1 Introduction................................. 81

4.2 Methods................................... 83

4.2.1 Participants............................. 83

4.2.2 Behavioural assessment . . . . . . . . . . . . . . . . . . . . . . . 84

4.2.3 Experimental design . . . . . . . . . . . . . . . . . . . . . . . . 84

4.2.4 Dataacquisition........................... 85

4.2.5 Optode localization . . . . . . . . . . . . . . . . . . . . . . . . . 85

4.2.6 Pre-processing............................ 87

4.2.7 Statistical analysis . . . . . . . . . . . . . . . . . . . . . . . . . 88

4.3 Results.................................... 89

4.3.1 Mixed-Effects Linear Model . . . . . . . . . . . . . . . . . . . . 89

4.3.2 Blockaverages ........................... 92

4.4 Discussion.................................. 94

4.5 Lessonslearned ............................... 96

5 Decoding the mental representation of word meaning in children to

predict their literacy skills, with multivariate analysis of fNIRS data 97

5.1 Introduction................................. 97

5.2 Methods................................... 99

5.2.1 Participants............................. 99

Pre-processing............................ 100

5.2.2 Within-subject classifier . . . . . . . . . . . . . . . . . . . . . . 102

Featureextraction ......................... 102

Classification of Multivariate Patterns . . . . . . . . . . . . . . 104

5.2.3 Analysis of individual classification accuracies . . . . . . . . . . 105

5.2.4 Within-groups classifier . . . . . . . . . . . . . . . . . . . . . . . 106

5.3 Results.................................... 106

5.3.1 Overall performances: SVM vs LDA . . . . . . . . . . . . . . . 106

5.3.2 Individual performances . . . . . . . . . . . . . . . . . . . . . . 107

5.3.3 Correlation between individual decoding accuracies and reading

abilities ............................... 107

5.3.4 Within-groups classification . . . . . . . . . . . . . . . . . . . . 111

5.4 Discussion.................................. 113

5.5 Lessonslearned ............................... 116

6 Summary and Conclusions 117

A Appendix 123

A.1 Supplementary information of Chapter 3 . . . . . . . . . . . . . . . . . 123

A.1.1 Correlation between individual measures and classification accu-

racies................................. 123

A.1.2 Experimental results . . . . . . . . . . . . . . . . . . . . . . . . 124

xii

A.2 Supplementary information of Chapter 4 . . . . . . . . . . . . . . . . . 142

A.2.1 Localization of optodes . . . . . . . . . . . . . . . . . . . . . . . 142

Bibliography 149

xiii

List of Figures

2.1 Extinction coefficients for HbO (HbO2), HbR (Hb) and cytochrome ox-

idase (CtOx). Figure adapted from Delpy and Cope, 1997 [19]. . . . . . 11

2.2 Different technological approaches for fNIRS . . . . . . . . . . . . . . . 11

2.3 Each source-detector pair constitutes a NIRS "channel"; the light emit-

ted from the source and measured by the detector follows the proba-

bility model given by the banana-shaped path. This path represents

the probability of sampling a given optical path. In this case, the pairs

source1-detector2 and source2-detector1 allow penetration of the light

into the cortex, while the pairs source1-detector1 and source2-detector2

are close enough to only allow penetration through the shallow layers

(short-distance channels). Figure is taken from www.nirx.net . . . . . . 14

2.4 Composition of the fNIRS signal. Figure is taken from Tachsidis and

Scholkmann,2016[26] ........................... 15

2.5 (Top and bottom left): Example of the raw signal of light intensity. In

this timetrace the different components (heartbeat, respiration, Mayer

waves) are clearly recognizable, as well as a few spike artifacts, probably

caused by the displacement between one of the optodes and the skin.

(Bottom right) Typical frequency spectrum of the fNIRS signal . . . 16

2.6 Tabulated DPF values from Essenpreis et al. 1993. The different time-

traces correspond to different subjects (1 female, 6 males, median age

28 years old). It is clear that, despite the intersubject variability, the

DPF exhibits a descending trend as the wavelength increases. . . . . . 19

2.7 EEG, fNIRS and fMRI in terms of spatial and temporal resolution . . . 22

2.8 Spring-loaded fiber holders. The spring keeps the tip of the optode at

constant pressure, reducing the possibility of loss of contact between the

tipandthescalp............................... 27

2.9 Multi-distance measurement; by placing a detector at a shorter distance,

light measured by that detector will only come from the shallow layer . 30

2.10 Example of application of the algorithm proposed by Saager and Berger

[62]. On the right side of the figure, it is clearly visible that the ∼0.1

Hz fluctuations ("Mayer waves") are removed with the short-distance

correction .................................. 30

2.11 Effect of applying a band-pass filter [0.01-0.2 Hz]; the data has sampling

frequency 4.4 Hz and the magenta parts represent the task periods (10

s, with a jittered pause of 19-26 s). On the PSD graphs, the red line is

used for the PSD of the first wavelength and the blue one for the second

wavelength (in this case 760 and 850 nm) . . . . . . . . . . . . . . . . . 32

2.12 Each column of the design matrix X(t) is the result of the convolu-

tion of the stimulus vector of each experimental condition, u(t) (left),

with a “basis function” representing the expected hemodynamic response

elicited by that condition (middle). The result is a theoretical time

course that is completely predicted by the administered stimuli (right),

and this is the timetrace that is then fitted to the measured data. The

more the condition actually had an impact on the data, the better will

be the fit, in terms of amplitude of the corresponding ß. Figure is taken

from[4] ................................... 34

xvi

2.13 Basis functions that are most commonly used to model the expected

HRF in a GLM. Figures are produced with the analysis software nirsLAB

[66]. On the top row, the Canonical HRF is depicted (top-left); the

Canonical HRF can be used in conjunction with its time derivative (top-

middle) to capture temporal shifts in the peak latency, and also with

the dispersion derivative (top-right) to capture differences in the peak

duration. The (bottom-left) panel shows the Fourier set; the Fourier

set consists of a constant + K sine functions + K cosine functions (tot:

2K +1) of harmonic period T, T/2, . . . T/K. The (bottom-middle)

panel shows K gamma functions (K=3); while the Finite Impulse Re-

sponse (FIR) basis function (bottom-right) consists of K contiguous

boxcar functions, each lasting T/K seconds, where T is the duration of

the HRF. Linear combinations of the FIR or Fourier basis functions can

capture any shape of response. For a more comprehensive description of

these functions, we refer the reader to the book “Statistical Parametric

Mapping: the analysis of functional brain images” [4] . . . . . . . . . . 36

2.14 Figure taken and adapted from [5]. [Left] the distribution of the resid-

uals is not Gaussian, but it’s characterized by heavy-tails produced by

large outliers, such as motion artifacts. [Right] The temporal auto-

correlation of the raw data along with the autocorrelation of the data

processed according to different algorithms; the chart shows that the

AR(n) prewhitening yields the best results in terms of reduction of au-

tocorrelation. ................................ 39

2.15 Geometry of the classification problem with Linear Discriminant Analy-

sis: the decision surface (in red) is perpendicular to w, and its intercept

is defined by the bias w0. Figure taken from Bishop (2006) [92] . . . . . 47

3.1 In each iteration, data are simulated, based on either synthetic or real

resting-state data; HbO and HbR (red and blue time traces in the “Syn-

thetic dataset” panel, respectively) were analyzed with GLM or LDA,

and ROC analysis was performed to compare the classification accura-

cies. The simulated HRFs vary in shape and size, and 30% of them are

characterized by a “double bump” as a simplified model of stimulus-

lockedMayerwaves. ............................ 56

xvii

3.2 (A) Example of how simulated HRFs are created (black line) and added

to a real resting-state time trace (dark grey line). The top trace is

a simple HRF while the bottom trace contains a double bump. (B)

The grey time trace represents the HbO signal before the hemodynamic

activations are added; the red and blue ones are, respectively, the time

traces after a simple HRF or a double-bump HRF have been added. . . 56

3.3 (A): Block averages of HbO (left) and HbR (right) signal used for feature

extraction; dashed lines represent the 1s steps used for the moving-

window computation of amplitude and slope. (B): Features vectors are

obtained from the block averages by computing mean and slope of the

signal over a sliding window of 3 s duration with 1 s steps, resulting

in a 30-features vectors that were then normalized and concatenated

to produce the 60-features multivariate (HbO + HbR) classifier. Grey

lines represent individual trials, black lines highlight the mean value of

the feature vectors corresponding to “active” channels, and blue lines

highlight the mean value of the feature vectors corresponding to the “not

active” channels. Values of the y axis are normalized values . . . . . . 60

3.4 (A) 8 sources ×8 detectors montage (covering the motor region); (B)

Source-Detector pairs and corresponding channels . . . . . . . . . . . . 64

3.5 (Left) Probe setup. The arrangement of optodes follows the 10-20 stan-

dard and the placement is analogous in the other hemisphere. Red dots

indicate the sources, blue dots indicate the detectors, and yellow lines

indicate the formed channels. (Right) Sensitivity profile of the probe

setup. The sensitivity profile has units of mm−1. Multiplying the value

of the sensitivity profile of a given measurement channel by an absorp-

tion change (mm−1) and the area over which that change occurs (mm2)

will give an estimate of the optical density change that would be mea-

sured by that channel as a result of that absorption change [111] . . . . 65

xviii

3.6 (A) ROC curves for GLM and LDA, using HbO, HbR and HbO+HbR

features (only for LDA). The curves for the real resting-state data (solid

lines) are obtained by averaging the individual curves across subjects,

while dotted lines refer to the completely synthetic dataset. (B) The

table reports the mean classification accuracies, over all iterations, of

the three algorithms applied to synthetic and real resting-state data.

The classification accuracy is computed from the ROC curves at the

false positive rate of 0.05. . . . . . . . . . . . . . . . . . . . . . . . . . 67

3.7 (left) Individual classification accuracies for the real-resting-state datasets,

for LDA(HbO+HbR), GLM(HbO), and GLM(HbR) (left, middle, right).

The red line indicates the mean accuracy reached by each algorithm

over all the subjects. The errorbars represent the standard error of

the mean for each individual subject, over all the iterations performed.

The individual mean accuracies achieved by the LDA method is sig-

nificantly higher than those achieved by the GLM(HbO) (p = 0.0002)

and GLM(HbR) (p = 0.01), but no significant difference was found

between GLM(HbO) and GLM(HbR) (p = 0.24, Repeated Measures

ANOVA 1-way with Fixed Effect: “Analysis Method”). Also, the indi-

vidual standard errors of the mean yielded by the LDA are significantly

lower than those achieved by the GLM(HbO) and GLM(HbR) (p=0.021

and p=0.022, respectively), but no difference was found between those

yielded by GLM(HbO) and GLM(HbR) (p=0.97). (right) Classification

accuracies computed on two subsets of the real-resting-state datasets,

one completely free from Mayer-wave oscillations and the other one with

all the HRFs tainted by double-bumps. For the LDA, there is no signif-

icant difference between the accuracies reached in presence and absence

of Mayer waves (paired t-test, p = 0.44), while for the GLM the differ-

ence was statistically significant (GLM(HbO), p < 0.001, GLM(HbR),

p < 0.001) Note: the analysis, both with LDA and with GLM, was con-

ducted on the two datasets (with and without Mayer waves) independently 69

xix

3.8 Distribution of individual classification accuracies within the two hair

color classes (six blond and nine brown). The accuracies reached by

the GLM(HbR) are significantly higher for blond-haired subjects than

brown-haired. More distributions for the other modelled effects can be

found in the Appendix. The central red marks represent the median

values, the blue boxes extend from the 25th to the 75th percentiles,

and the black whiskers extend to the most extreme data points not

considered outliers (which are marked with red crosses). . . . . . . . . . 70

3.9 Classifier results for the finger-tapping experimental data, for the three

different analyses. GLM t-statistic values and LDA classifier outputs

(the latter derived from application of the separating hyperplane for-

mula) are thresholded at p≤0.05. Blank cells indicate non-significant

values (i.e., that the corresponding channel was classified as not active).

The individual minimum value for statistical significance for the results

of the LDA classifier varied across channels, ranging from 0.12 to 1.18.

The numbers of channels classified as active by the three analyses are

significantly different (p = 0.01, 1-way repeated measures ANOVA). . . 71

3.10 Topographic images and block averages for all the analyses on Subject 1 73

3.11 Block-average data for Subject 1, Channel 16. On the left the plot of

the averaged signal is accompanied by the plot of the model used by the

GLM analysis, namely a canonical HRF with peak time = 6s. On the

right, the same plot is accompanied by the plot of an example of average

of resting state trials against which the task trials are classified. . . . . 74

3.12 Block-average data for Subject 2, Channel 5. . . . . . . . . . . . . . . . 74

3.13 Topographic images and block averages for all the analyses on Subject 2 75

4.1 Structure of the experimental design . . . . . . . . . . . . . . . . . . . 85

4.2 Bar plots of activation values averaged across conditions, channels (32,

33, 44, 45 ,46) and subjects within each group; bar plots are reported

for both hemoglobin forms to show that, although the test did not prove

statistically significant on HbO values, they nevertheless mirror the HbR

results, in that they both indicate a higher hemodynamic response in

these RH channels in Pre-Readers as compared to Readers. . . . . . . . 91

4.3 (center) Scalp plot of F-values resulting from the ANOVA test on the

results of the Linear-Mixed Effects model on HbR beta values. From

this analysis, a main effect of Group resulted statistically significant for

channels 32, 33, 44, 45 and 46, whose block averages are reported around

the scalp plot. The plots confirm that in these channels, hemodynamic

activation is greater in Pre-Readers than in Readers. . . . . . . . . . . 93

5.1 Grand averages of the epoched timeseries; timeseries were averaged

across trials, channels and finally subjects. Errorbars represent the stan-

dard error of the mean across subjects. The shaded area of the plots

represent the period of stimulation of 7 seconds, during which partici-

pants passively listened to the auditory stimuli (Words or Non Words). 101

5.2 (Top) Features are extracted from each block of HbO and HbR time-

series as the positive and negative peak value, respectively, within the

time window [5-10] seconds after the presentation of the stimulus. Fea-

tures were normalized by removing their mean value and dividing by

their standard deviation. (Bottom) Distribution of normalized features

for each class, Word and Non Word. . . . . . . . . . . . . . . . . . . . . 103

5.3 Examples of how individual classification accuracies were statistically

assessed. In Panel Aan example of individual performances that did

not reach statistical significance, while in Panel Bthe individual clas-

sification accuracy of 76% was higher than the 95th percentile (75%),

obtaining a p-value of 0.049. . . . . . . . . . . . . . . . . . . . . . . . . 105

5.4 (Left) Distribution of classification performances across all subjects and

all channel groups. (Right) The performances of the two algorithms did

notsignificantlydiffer. ........................... 107

5.5 Classification accuracies achieved by LDA and SVM within the channel

subsets defined in Table 5.2 . . . . . . . . . . . . . . . . . . . . . . . . 108

5.6 In none of the three employed channel groups was the correlation be-

tween Reading Scores and Classification Accuracies statistically signifi-

cant at p=0.05; nevertheless, a pattern can be qualitatively detected of

negative and positive correlations in the RH and LH, respectively. . . . 109

xxi

5.7 Effect of removing the effect of Age from the original Reading Scores,

using Partial Least Squares (PLS) regression. The dashed lines represent

the least squares correlation lines (r= 0.67 for the original scores, r= 0

for the “corrected” scores). . . . . . . . . . . . . . . . . . . . . . . . . . 110

5.8 Boxplots of individual classification accuracies within the two groups

Pre-Readers vs Readers; the central red line in each box indicates the

median; the bottom and top edges of the box indicate the 25th and 75th

percentiles, respectively. The whiskers represent the whole range of ob-

served values. In the RH, the unpaired t-test revealed a significant group

difference, with Pre-Readers achieving better accuracies than Readers

(60% vs 50%,respectively). ........................ 111

5.9 Results of within-groups classification. Classification accuracies obtained

using trials from Readers are not statistically significant for any of the

channel subsets, compared to their corresponding null distributions. For

the Pre-Readers, accuracies are marginally significant when using whole-

brain channels and more so when using RH channels, but still they are

not high (53.2% and 54.1%, respectively). . . . . . . . . . . . . . . . . 112

5.10 Boxplots of within-groups classification accuracies, for the three selec-

tions of channels; the central red line in each box indicates the median;

the bottom and top edges of the box indicate the 25th and 75th per-

centiles, respectively. The whiskers represent the whole range of ob-

served values. The unpaired t-tests confirm a significant difference be-

tween the two groups when using channels from the RH and also Whole-

Brain; however, the mean classification accuracy does not exceed 54.1%. 113

A.1 Distributions of classification accuracies grouped by Time of Measure-

ment. The central red marks represents the median values, the blue

boxes extend from the 25th to the 75th percentiles, and the black whiskers

extend to the most extreme data points not considered outliers (which

are marked with red crosses) . . . . . . . . . . . . . . . . . . . . . . . . 124

A.2 Distributions of classification accuracies grouped by Gender . . . . . . 124

xxii

A.3 (Left) Effect of subject age on classification accuracy, according to Lin-

ear Mixed Effects analysis. Partial-dependency line for each method is

superimposed on the individual-subject data points (Right) Coefficients

estimated by the LME for Predictor: Age, Response Variable: Classifi-

cation Accuracy. None of the methods has a statistically significant age

dependence on accuracy. . . . . . . . . . . . . . . . . . . . . . . . . . . 125

A.4 Uncorrected p-values resulting from the 3 different analyses (GLM on

HbO traces, GLM on HbR traces, LDA-based classification using com-

bination of HbO and HbR features). Cells highlighted in red indicate

the channels most likely covering the motor cortex. Green highlight-

ing indicates channels that are classified as “active” at p=0.05. Empty

(dark-gray highlighting) cells indicate channels with poor data quality,

which were therefore discarded from the analysis. . . . . . . . . . . . . 126

A.5 Plots of GLM and LDA results- Subject 1 . . . . . . . . . . . . . . . . 127

A.6 Table of GLM and LDA results- Subject 1 . . . . . . . . . . . . . . . . 128

A.7 Plots of GLM and LDA results- Subject 2 . . . . . . . . . . . . . . . . 129

A.8 Table of GLM and LDA results- Subject 2 . . . . . . . . . . . . . . . . 130

A.9 Plots of GLM and LDA results- Subject 3 . . . . . . . . . . . . . . . . 131

A.10 Table of GLM and LDA results- Subject 3 . . . . . . . . . . . . . . . . 132

A.11 Plots of GLM and LDA results- Subject 4 . . . . . . . . . . . . . . . . 133

A.12 Table of GLM and LDA results- Subject 4 . . . . . . . . . . . . . . . . 134

A.13 Plots of GLM and LDA results- Subject 5 . . . . . . . . . . . . . . . . 135

A.14 Table of GLM and LDA results- Subject 5 . . . . . . . . . . . . . . . . 136

A.15 Plots of GLM and LDA results- Subject 6 . . . . . . . . . . . . . . . . 137

A.16 Table of GLM and LDA results- Subject 6 . . . . . . . . . . . . . . . . 138

A.17 Plots of GLM and LDA results- Subject 7 . . . . . . . . . . . . . . . . 139

A.18 Table of GLM and LDA results- Subject 7 . . . . . . . . . . . . . . . . 140

xxiii

A.19 Table of all the experimental results. Uncorrected p-values resulting

from the 3 different analyses (GLM on HbO traces, GLM on HbR traces,

LDA-based classification using combination of HbO and HbR features).

Highlighted cells (red) indicate the channels most likely covering the

motor cortex. Highlighted cells (green) indicate p<0.05. Highlighted

cells (grey) with missing values indicate channels with poor acquisition

quality and therefore discarded from the analysis. . . . . . . . . . . . . 141

xxiv

List of Tables

2.1 Characteristics of different neuroimaging modalities . . . . . . . . . . . 20

3.1 Results of the linear model fitted to the individual classification accura-

cies, with fixed effects: Age, Hair Color, Gender and Time of Measurement 70

4.1 Participant’s summary information . . . . . . . . . . . . . . . . . . . . 83

4.2 The X, Y, and Z columns represent median MNI coordinates across

subjects. Anatomical labelling of the MNI coordinates was conducted

using the Talairach atlas. The last column lists the atlas-based proba-

bility that the channel coordinates are within that anatomical location.

In this table only the Brodmann area with the highest probability is

displayed; the complete table is available in the Appendix. . . . . . . . 87

4.3 Channels showing a significant (p≤0.05) effect of Group, Stimulus

Type or Stimulus Type x Group, before FDR correction. The formula

employed for the model was ’beta ∼stim*group + (stim|id)’; categorical

variables were coded with the “effects” coding, that means that the

sum of the dummy coefficients amounts to zero. This is also known

as deviation coding and it is appropriate for testing main effects and

interactioneffects[123]........................... 90

4.4 Results of two-sided t-tests on HbR activation values between Pre-Readers

and Readers in channels 32, 33, 44, 45 and 46. The negative t-values

indicate that the mean of activation in Pre-Readers is significantly lower

than the mean of activation in Readers, in each of the tested channels. 91

5.1 Participant’s summary information . . . . . . . . . . . . . . . . . . . . 100

xxv

5.2 Classification was performed on brain patterns distributed over the whole

brain, plus two hemispheric macro-regions, that were identified as groups

of adjacent anatomically-defined regions of interest. . . . . . . . . . . . 104

5.3 Results of the LME models performed within each channel group (’Accuracy ∼

ReadingScores + (1|P articipant)’).................... 108

5.4 Results of unpaired t-tests between groups (Pre-Readers vs Readers),

divided at ReadingScore=20 (degrees of freedom = 63). . . . . . . . . . 110

A.1 The X, Y, and Z columns represent median MNI coordinates across sub-

jects. Anatomical labelling of the MNI coordinates was conducted using

the Talairach atlas. The last column lists the atlas-based probability

that the channel coordinates are within that anatomical location. The

Brodmann area with the highest probability is highlighted in bold. . . . 147

xxvi

Chapter 1

Introduction

1.1 The use of functional Near-Infrared Spectroscopy

in language research

Functional Near Infrared Spectroscopy (fNIRS) is a relatively recent technique: it

was only in 1977 that Franz Jöbsis reported in his seminal Science article “Noninva-

sive, infrared monitoring of cerebral and myocardial oxygen sufficiency and circulatory

parameters” [1] that biological materials are transparent enough in the near-infrared

region of the spectrum so to allow to observe and record changes in tissue blood volume

and in hemoglobin-oxyhemoglobin equilibrium for research and clinical purposes.

Since then, the technique has experienced a rapid growth in use, thanks to steady

advancements in the development of the technology: while in the early 90s recordings

were conducted mostly with bulky, single-channel devices, nowadays fNIRS devices

are extremely portable, include hundreds of channels, multi-distance measurements to

measure depth-dependent contributions, they are compatible for simultaneous acquisi-

tion of any other measurement and can record at a very high sampling frequency.

As more sophisticated instrumentation has been developed and put to use in cogni-

tive experiments, fNIRS has made important contributions to the understanding of

functional brain activity and higher cognitive functions in virtually every area of neu-

roscientific research. In particular, one of the most prolific areas of application for

fNIRS has been the field of language development in children.

Chapter 1. Introduction

Several aspects are key to this success: the fact that it is completely portable means

that children do not need to be physically isolated from others to participate to an

experiment, but instead they can be setup in a comfortable environment where they

feel at ease; the fact that there are virtually no operating costs makes it possible to

run several sessions on the same child, and this may prove especially useful when the

goal is closely following up the outcome of a treatment or a remediation program. The

fact that it is completely silent allows to administer any kind of auditory stimuli in

completely ecological conditions. Finally, it is completely safe and harmless.

And it is indeed the study of typical and atypical language development in children

that laid the foundations of this doctoral project.

The work presented in this thesis is embedded in the framework of “PredictAble”, a

large multi-disciplinary Marie Skłodowska Curie Innovative Training Network (ITN)

that includes, besides the Technische Universität Berlin and NIRx Medizintechnik

GmbH, also University of Potsdam, University Paris Descartes, University of Barcelona

Pompeu Fabra and University of Jyväskylä.

The work scope of PredictAble is to pioneer an interdisciplinary approach to enhance

the understanding of the cognitive mechanisms that underlie developmental disorders

of spoken and written language, two areas of study that have been traditionally con-

sidered separately, and to do so by making the best use of the recent technological

developments, such as fNIRS.

In this context, this doctoral project was carried out at the Technische Universität

Berlin and at NIRx Medizintechnik, one of the leading manufacturers of fNIRS devices.

The role of this work in the grant was to expand and optimize the tools available for

the analysis of fNIRS data, in order to render them suitable for use with very young

children, with the ultimate goal of developing diagnostic tools to detect early risks for

language-related impairments.

In fact, despite the dramatic increase in the use and application of fNIRS, an ad-

hoc standardized data analysis procedure is still lacking; there is no general consensus

around the best set of pre-processing steps to be performed, or around the correct chan-

nel exclusion criteria [2]; furthermore, there is no standard choice of which hemoglobin

1.1. The use of functional Near-Infrared Spectroscopy in language research

component should be used to infer functional brain activity: while in principle func-

tional activation should manifest with a concurrent increase in oxygenated-hemoglobin

and a decrease in deoxygenated-hemoglobin, it is oftentimes the case that significant

changes in response to a certain task are only found in one chromophore and not the

other; this can happen for many different reasons, but it is clear that, without an es-

tablished guideline, a serious issue of reproducibility arises: the same study may draw

different conclusions when considering only results found on oxy-, or deoxy-hemoglobin,

or a combination of the two [3]. As for the statistical analysis, the last step of the anal-

ysis chain, methods reported in literature mostly involve time series averaging and/or

the use of methods borrowed from the fMRI practice, like Statistical Parametric Map-

ping (SPM) [4], which proved suboptimal for the analysis of fNIRS data [5]. Recently,

a more advanced and fNIRS-tailored General Linear Model has been proposed that

overcomes some of the drawbacks of SPM [6]. Nevertheless, all of these methods are

most commonly applied to one channel at a time and to one hemoglobin component

(univariate analysis). Recently, effort has been made in the direction of defining more

sophisticated methods that involve classification of multi-channel patterns of hemody-

namic activity by means of machine learning algorithms [7][8].

This new direction could represent a turning point in the fNIRS data analysis, since it

could help mitigate many of the limitations of the aforementioned methods [9]: concur-

rent variations of both hemoglobin components could be considered in a multi-variate

fashion, with the potential of increasing the sensitivity of the analysis while removing

the ambiguity associated with choosing one or the other chromophore; furthermore,

in this framework, the natural inter- and intra-subject variability of the hemodynamic

responses, that for any model-based analysis constitutes a problem, could be accounted

for in what might be called a self-referencing scheme; finer effects that are distributed

across multiple channels could be detected, even when they don’t achieve statistical

significance at single-channel level. For these reasons, multi-variate analysis approaches

based on the use of machine learning have the potential of offering novel insights while

overcoming many technical problems related to the current analysis practice, and rep-

resent the object of study of this doctoral project.

Chapter 1. Introduction

1.2 Motivations and outline of this thesis

This thesis reviews the options available for the analysis of fNIRS data and describes

the special consideration that need to be taken into account - especially when working

with children.

While reviewing the opportunities offered by the application of exquisitely univariate

analysis approaches, this work wants to promote the appreciation that the fNIRS sig-

nal’s richness of features may open the way for the discovery of subtle effects, by means

of multivariate analysis approaches.

To pursue this goal, this work was developed in two steps: the first step includes the

quantitative comparison, under controlled conditions, of a univariate and a multivariate

analysis approach. This step was crucial in that it allowed to precisely quantify in

terms of classification accuracy the advantage of using a multivariate approach over a

univariate.

The second part of this work was carried out on experimental data collected at Haskins

Laboratories (New Haven, CT, USA), a secondary partner of the PredictAble network;

Haskins Laboratories is a leading institution in the study of typical and atypical de-

velopment of spoken and written language in children.

Here, a large dataset of fNIRS measurements was collected on beginning and skilled

readers; we applied both univariate and multivariate analyses on this data to explore

the effect of literacy acquisition on the functional organization of the brain and to

investigate the possibility of developing a “literacy predicting tool”. This allowed to

quantitively and qualitatively compare the results yielded by the two approaches.

The thesis is structured in the following way:

•Chapter 2 introduces the fundamentals about the physiological origin of the

fNIRS signal, describes the instrumentation for signal acquisition and how the

methodology compares to other neuroimaging techniques like fMRI and EEG;

lastly, it describes the most common steps that need to be taken for the pre-

processing of the data and outlines the concepts behind univariate and multivari-

ate approaches for the data analysis.

1.2. Motivations and outline of this thesis

•Chapter 3 describes the first step of this research project, namely the quan-

titative comparison between the univariate analysis, performed by applying a

General Linear Model (GLM) and a multivariate approach involving the use of

Linear Discriminant Analysis (LDA) to classify single channels as “active” or

“not active”. To this end, both synthetic and experimental data; synthetic data,

namely data with known ground-truth, allowed to obtain a quantitative compar-

ison; experimental data allowed us to further characterize the reliability of the

proposed multivariate approach. This study was published in [9].

•Chapter 4 introduces a literacy study in which participants of various ages and

literacy abilities were presented with purely auditory stimuli and illustrates the

application of a channel-wise General Linear Model to detect group differences

between beginning Readers and skilled Readers but also to detect a differential

activation to non-words depending on the literacy levels. Such univariate analysis

failed to expose the latter effect, and therefore another analytical approach was

required. This work is being prepared for publication [10]

•Chapter 5 extends the work of Chapter 4 and presents the implementation

of a within-subject neural decoder and its application for the classification of

multivariate distributed patterns of hemodynamic activations in response to two

different classes of stimuli: words and non-words; in doing so, it compares the

performances of Linear Discriminant Analysis and Support Vector Machines; in-

dividual classification accuracies were correlated with reading scores, with the

expectation that high accuracies would indicate an increased effort in processing

non-words, and therefore would be associated to poor reading scores. Such a

correlation was found but it was not statistically significant. Finally, the Chap-

ter discusses possible reasons for the underwhelming results. Preliminary results

of this project were published in [11] and final results are being prepared for a

journal publication [10].

•Chapter 6 concludes the thesis, provides an overview of the findings of each

single step of this doctoral work, represented by the three research chapters, and

discusses the remaining limitations and future directions of this work.

Chapter 1. Introduction

1.3 Publications

1.3.1 First-author publications

The work in this thesis has been published in peer-reviewed journals and conferences

and follows the publications listed below:

Peer-Reviewed journals

1. Gemignani, J., Middell, E., Barbour, R. L., Graber, H. L. and Blankertz,

B. (2018) Improving the analysis of near-infrared spectroscopy data with mul-

tivariate classification of hemodynamic patterns: A theoretical formulation and

validation, Journal of Neural Engineering, 15(4)

2. Gemignani, J., Blankertz, B., Aslin, R. N. and Pugh, K. R. Observing the

effects of literacy acquisition on the brain with fNIRS. (2019) Neuroimage (in

preparation)

Peer-Reviewed Conferences

3. Gemignani, J., Bayet, L., Kabdebon, C., Blankertz, B., Pugh, K. R. and Aslin,

R. N. (2018). Classifying the mental representation of word meaning in children

with Multivariate Pattern Analysis of fNIRS. Engineering in Medicine and Bi-

ology Society (EMBC) 2018 40th Annual International Conference of the IEEE

(pp. 295-298). IEEE.

Abstracts

4. Gemignani, J., Middell. E., Blankertz, B. Investigating The Impact Of Indi-

vidual Differences On The Accuracy Reached By Univariate And Multivariate

Analysis of fNIRS Measurements, FENS (2018, July, Berlin)

1.3. Publications

5. Gemignani, J., Middell, E., Blankertz, B. Improving the analysis of NIRS data

with multivariate classification of hemodynamic patterns, Joint Italian-French

Workshop on fNIRS, Politecnico di Milano (2018,June, Milan)

6. Gemignani, J., Middell, E., Blankertz, B. From Univariate to Multivariate

Analysis of NIRS data. Neuroadaptive Technology Meeting (2017, July, Berlin)

1.3.2 Additional contributions

Additional publications in peer-reviewed journals and conferences that were co-authored

and/or that are not strictly related to the topic of the dissertation are listed below:

Peer-Reviewed journals

7. Fisher, S. P., Cui, N., McKillop, L. E., Gemignani, J., Bannerman, D. M.,

Oliver, P. L., Peirson, S. N. and Vyazovskiy, V. V. (2016). Stereotypic wheel

running decreases cortical activity in mice. Nature Communications, 7, 13138.

Peer-Reviewed Conferences

8. Bartkowski, C. H., Gemignani, J., Barbour, R. L., Lang, K. D., von Krshi-

woblozki, M., Vieroth, R., Dils, C., Jung, E. and Schmitz, C. H. (2019) Vari-

able Source Emitter, Smart Textile Headgear Design for Functional Near-Infrared

Spectroscopy. Engineering in Medicine and Biology Society (EMBC) 2019 41th

Annual International Conference of the IEEE

Abstracts

9. Gemignani, J., Barbour, R. L., Middell, E. and Schmitz, C. H. Optode de-

sign for improved optical efficiency and fast setup time on hairy head regions,

The Society for Functional Near Infrared Spectroscopy, Biennial Meeting (2016,

October, Paris)

Chapter 2

Physical principles and applications

of functional Near-Infrared

Spectroscopy

2.1 Physiological and biological principles of fNIRS

2.1.1 Vasculature and hemodynamics of the brain

The regulation of the blood flow in the human brain is complex and builds upon

several paradigms and components [12]. In particular, there are three main regulatory

mechanisms to consider: cerebral autoregulation, which is the process whereby cerebral

arterioles maintain a constant blood flow in response to changing perfusion pressure;

flow-metabolism coupling, which refers to the capability of the brain to change the

blood flow to match the metabolic demands; neurogenic regulation, which refers to the

role of the extensive network of perivascular nerves in controlling the flow. Out of these

three mechanisms, the most relevant from the neuroimaging point of view is definitely

the flow-metabolism coupling.

The tight correlation between brain metabolism and cerebral blood flow (CBF) was

suggested already more than a century ago [13]. The main processes that contribute to

the high energy need of the brain are the maintenance and restoration of ion gradients

dissipated by signaling processes, like postsynaptic and action potentials, as well as

the use and recycling of neurotransmitters [14].

Chapter 2. Physical principles and applications of functional Near-Infrared

Spectroscopy

The main energy source in the brain is glucose, and its oxidation is the main ATP

(adenosine-tri-phosphate) producing mechanism 1. While there are small amounts of

glucose stored in the brain, there are no stores of oxygen, that must therefore be carried

to the site of neuronal activation on demand. This demand is met by the surrounding

vasculature: neuronal activation leads to arteriolar dilation, so that CBF increases and

ATP can be produced [15][16].

2.1.2 The hemodynamic response

The relationship between cerebral metabolic rate of oxygen consumption (CMRO2)

and CBF is at the basis of functional neuroimaging. As said, the metabolic demand

induces a local increase in CBF. The quantity of oxygen that is in fact removed from

the blood is the arteriovenous oxygen difference, and it can be termed as Ca– Cv,

where Cais the oxygen concentration of arterial (oxygenated) blood and Cvis the

oxygen concentration of venous (deoxygenated) blood. The arteriovenous difference

can be defined as [16]:

CMRO2

CBF =Ca −Cv (2.1)

In an activated region, CBF increases, so the difference Ca-Cvbecomes smaller. Since

the blood is almost fully oxygenated at the arterial end, Ca-Cvcan become smaller only

if venous oxygenation is increased. In other words, the metabolically driven increase of

local CBF is disproportionally larger than the increase of CMRO2and this results in a

local hyperoxygenation that can be observed as an increase of oxygenated hemoglobin

(HbO) accompanied by a corresponding decrease in deoxygenated hemoglobin (HbR)

at the venous end of the capillary [17]. This is called hemodynamic response and it can

be usually observed with a latency of approximately [4-6] seconds after the onset of the

stimulus that induced the activation [18]. When the neural activation is terminated,

CBF decreases and both HbO and HbR return to their baseline values.

These concentration changes can be observed with functional techniques like fMRI and

fNIRS. While fMRI relies on the oxygen-dependent magnetic properties of hemoglobin

(Section 2.2.2), fNIRS, in turn, relies on the oxygen-dependent optical properties of

1To oxidate one mole of glucose , 6 moles of oxygen are needed, according to the stechiometry of

complete glucose oxidation: C6H12O6 + 6O2 →6CO2 + 6H20

2.1. Physiological and biological principles of fNIRS

hemoglobin. In fact, HbO and HbR show differential absorption spectra in the near-

infrared spectral range (∼650 −950nm), and for this reason, their relative concen-

tration changes can be separately detected by illuminating the brain with specific

wavelengths within this range.

Figure 2.1: Extinction coefficients for HbO (HbO2), HbR (Hb) and

cytochrome oxidase (CtOx). Figure adapted from Delpy and Cope, 1997

[19].

2.1.3 Instrumentation

From the technology point of view, fNIRS is most commonly implemented in one of

three spectroscopic methods: Continuous Wave (CW-NIRS), Frequency Domain (FD-

NIRS) and Time Domain (TD-NIRS).

Figure 2.2: Different technological approaches for fNIRS

Chapter 2. Physical principles and applications of functional Near-Infrared

Spectroscopy

•Continuous Wave (CW-NIRS)

The Continuous Wave (CW) method relies on the illumination of tissue with

light of constant amplitude, and the detection of the transmitted light intensity,

allowing therefore to assess the overall light attenuation inside the tissue. This

method cannot differentiate between scattering and absorption effects, and can

therefore only quantify concentration changes relatively to a baseline, rather than

absolute values (see Section 2.1.6 for more details).

•Frequency Domain (FD-NIRS)

In this technique the light source is intensity-modulated in the radio frequency

range (∼100MHz). The propagating waves are then measured by the detector.

This allows FD instruments to extract, besides the intensity attenuation, also

two more independent quantities: a phase shift (φ) and the decay of modulation

depth (ratio of AC to DC component). These quantities are affected differently

by absorption and scattering of the tissue, therefore with FD systems it is possible

to distinguish these parameters and compute absolution concentration levels.

•Time Domain (TD-NIRS)

Time Domain measurements utilize picoseconds pulses of photons to irradiate

the tissue and fast responding detectors to register the shape of the light pulse

as it exits the tissue. The properties of the received photon distribution such as

area under the curve, the time of maximum, and width allow assessment of the

tissues absorption and scattering properties.

In terms of costs and complexity of the instrumentation, FD and TD systems are

definitely more expensive and cumbersome than CW systems, which on the contrary

can be miniaturized and made portable. Therefore, since quantification of absolute

values is often not essential for most neuroscientific applications, the great majority of

the fNIRS-based research utilizes CW systems.

2.1.4 Acquisition of the signal

To acquire the fNIRS signal, light sources and detectors are necessary. A source-

detector pair constitutes a “channel”, and the difference between the light emitted

2.1. Physiological and biological principles of fNIRS

through the source and that measured at the detector location represents the changes

of the optical properties of the tissue underlying that channel.

Light in the near-infrared spectral range can propagate relatively deeply into the bio-

logical tissue, including the skull, because in this range light is only weakly absorbed

by it (Figure 2.1). The selection of the precise wavelengths depends on several factors

[20] but the pair 780 and 830 nm is most often used.

When light is emitted through the source, it diffuses in all directions through the

tissues of the head (scalp, skull, cerebrospinal fluid); part of the light is scattered,

part is absorbed by the chromophores. Although there are several chromophores that

absorb light, the only ones that exhibit oxygenation-dependent absorption in the NIR

region are HbO, HbR and cytochrome oxidase (CytOx), found in the cell mitochondrial

membrane and involved in the oxidative metabolism of glucose.

On the other hand, the predominant factor of light transport in tissues is scattering,

since the transport scattering coefficient µsof most tissues is much larger than their

absorption coefficient µa, in the NIR region [19]. This introduces an unknown light

loss but also results in a non-linear relationship between absorption changes and the

resulting attenuation changes.

The fact that light is both absorbed and scattered coupled with the fact that the trav-

eled tissues are inhomogeneous and anisotropic make it difficult to exactly determine

the path followed by the light.

However, if we simplify the model and we assume the medium is homogenous, then

we can model this path by only taking into account the phenomena of absorption

and scattering. Several studies by Okada et al. were conducted on the propagation

of light in simplified media (homogeneous or semi-homogeneous) and it was possible

to formulate the Photon Measurement Density Function, or as it is most commonly

called, the “banana-shaped” path [21, 22, 23]. This function is a three-dimensional

model representing the probability that a photon travels through a certain optical path

and allows therefore to determine the brain volume sampled by a given source-detector

pair. The depth reached by the photons depends on the source-detector distance and

it is usually approximately one half of it [24].

Chapter 2. Physical principles and applications of functional Near-Infrared

Spectroscopy

Figure 2.3: Each source-detector pair constitutes a NIRS "channel";

the light emitted from the source and measured by the detector follows

the probability model given by the banana-shaped path. This path

represents the probability of sampling a given optical path. In this case,

the pairs source1-detector2 and source2-detector1 allow penetration of

the light into the cortex, while the pairs source1-detector1 and source2-

detector2 are close enough to only allow penetration through the shallow

layers (short-distance channels). Figure is taken from www.nirx.net

2.1.5 The fNIRS signal: how the raw data looks like

The photon detectors detect light and convert it into a voltage signal, whose amplitude

is proportional to the light intensity. The fNIRS signal can be looked at as a combina-

tion of several components, that can be distinguished according to the physical source

of the signal, that can be cerebral or extracerebral, the relation of it with a task, namely

it can be evoked by the task or independent on it, and its physiological origin, that can

be neuronal or systemic, as exemplified in Figure 2.4. This model has been described

by recent works of Tachsidis and Scholkmann ([20][25][26]), among others.

The components of the signal originated in the cerebral compartment can be either neu-

ronal or systemic and can be either evoked by the task or spontaneous (not evoked).

What the experimenter usually aims at is to unambiguously identify the cerebral neu-

ronal component, i.e. the signal resulting from the neurovascular coupling, but the

changes associated with it are relatively small compared to the overall variability of

2.1. Physiological and biological principles of fNIRS

the signal (∼0.5µM for HbO, ∼0.2µM for HbR, compared to ∼1µM for the whole

signal for both HbO and HbR, [27]).

Figure 2.4: Composition of the fNIRS signal. Figure is taken from

Tachsidis and Scholkmann, 2016 [26]

Besides the signal of neuronal origin, the cerebral component also contains a systemic

element, that is sometimes referred to as “physiological noise” and that accounts for

changes in blood pressure, cerebral blood flow, cerebral blood volume, heart rate and

respiration. Although these elements are always physiologically present and spon-

taneous, and therefore non-evoked, it is also true that the execution of a task can

significantly modulate their characteristics, like amplitude and frequency.

All these components are reflected and can be clearly observed in the rich frequency

spectrum of the fNIRS signal, that shows clear contributions by the heartbeat (∼1 Hz),

the respiration (∼0.3 Hz), the low-frequency oscillation also known as “Mayer wave”

(∼0.1 Hz) and very-low-frequency oscillations of different origin (< 0.1 Hz) (Figure

2.5)

Chapter 2. Physical principles and applications of functional Near-Infrared

Spectroscopy

Figure 2.5: (Top and bottom left): Example of the raw signal of

light intensity. In this timetrace the different components (heartbeat,

respiration, Mayer waves) are clearly recognizable, as well as a few spike

artifacts, probably caused by the displacement between one of the op-

todes and the skin. (Bottom right) Typical frequency spectrum of the

fNIRS signal

2.1.6 From light intensities to concentration changes: the mod-

ified Beer-Lambert Law

After the light has been measured by the photon detector and converted into a voltage

signal, it is necessary to proceed and convert this raw signal into relative concentrations

changes of the chromophores of interest. This is made possible by the modified Beer-

Lambert Law.

Here we will focus only on the case where two wavelengths are used and the chro-

mophores of interest are HbO and HbR, but of course, the same math applies when

more wavelengths are used: in that case, it is possible to resolve the concentration

changes of more chromophores, like that of the cytochrome oxidase.

2.1. Physiological and biological principles of fNIRS

The way light is transported through a homogeneous, non-scattering tissue, is modelled

by the Beer-Lambert Law:

I(λ) = I0(λ)·e−µa(λ)·L(2.2)

where I0(λ) is the intensity of light emitted by the source, I(λ) is the intensity of

light detected by the detector, µa(λ) is the absorption coefficient of the medium at

the wavelength λ; L is the optical pathlength and it represents the distance travelled

by the photons through the medium. The ratio between I and I0defines the optical

density of attenuation:

OpticalDensity =−log(I

) = µa(λ)·L(2.3)

As said, the Beer-Lambert equation would hold true in case the medium was non-

scattering, so when the light is only absorbed. But when light undergoes scattering,

the distance travelled by the photons is no longer L, because the path followed by the

photons becomes random, and photons will take more time to exit the tissue, and they

will have more probability to be absorbed.

Furthermore, scattering introduces a certain amount of losses, which depends on the

geometry and on the scattering coefficients of the involved tissues. The modified Beer-

Lambert Law (mBLL) was formulated to take account scattering:

OpticalDensity =µa(λ)·d·DP F (λ) + G(λ)(2.4)

The optical pathlength, L, is replaced by d·DP F (λ), where d is the distance between

source and detector and DPF is the Differential Pathlength Factor, that accounts for the

increased path followed by the photons as a result of scattering. The term d·DP F (λ)

corresponds to the mean light propagation distance in the medium; basically, the DPF

is a scaling factor that indicates how many times more than d the detected light has

travelled. G(λ)accounts for the light losses due to scattering and, as the DP F (λ),

depends on the wavelength λ.

This equation can be used to calculate the chromophores concentrations, because the

absorption coefficient of any chromophore, µa, can be written as the product of its

Chapter 2. Physical principles and applications of functional Near-Infrared

Spectroscopy

concentration, c, and its extinction coefficient, (λ):

µa=c·(λ)(2.5)

Therefore, the modified Beer-Lambert Law can be written as:

OpticalDensity =c·(λ)·d·DP F (λ) + G(λ)(2.6)

where c is our unknown of interest: the chromophore concentration. In the real-world

scenario, this equation is very difficult to be resolved exactly, because both DP F (λ)

and G(λ)are not constant and depend on the amount of scattering. However, under

the hypothesis of constant scattering, they can be considered constant. Concentration

changes of the chromophores can therefore be calculated by measuring a change in

optical density:

∆OD =ODt−ODt0=∆c·(λ)·d·DP F (λ)(2.7)

Where ODtis the optical density measured at time t and ODt0is the optical density

measured at time t0. If we apply this equation to the two chromophores of our interest,

HbO and HbR, the mBLL becomes:

∆OD = (∆[HbO]·HbO(λ) + ∆[HbR]·HbR(λ)) ·d·DP F (λ)(2.8)

Where HbO(λ)and HbR(λ)are the extinction coefficients of HbO and HbR, respec-

tively, at the wavelength of choice. By using at least two different wavelengths λ1and

λ2it is possible to resolve the two unknowns, ∆[HbO]and ∆[HbR]:

∆[HbO] = HbO(λ1)·∆OD(λ2)

DP F (λ2)−HbR(λ2)·∆OD(λ1)

DP F (λ1)

(HbR(λ1)·HbO(λ2)−HbR(λ2)·HbO(λ1)) ·d(2.9)

∆[HbR] = HbO(λ2)·∆OD(λ1)

DP F (λ1)−HbR(λ1)·∆OD(λ2)

DP F (λ2)

(HbR(λ1)·HbO(λ2)−HbR(λ2)·HbO(λ1)) ·d(2.10)

2.1. Physiological and biological principles of fNIRS

The extinction coefficients are represented in Figure 2.1. As for the DPFs, several

works have been published over the years, attempting at tabulating values for different

ages, tissues and wavelengths.

Figure 2.6: Tabulated DPF values from Essenpreis et al. 1993. The

different timetraces correspond to different subjects (1 female, 6 males,

median age 28 years old). It is clear that, despite the intersubject vari-

ability, the DPF exhibits a descending trend as the wavelength increases.

The DPF does not depend only on the wavelength but also on the absorption and

scattering coefficients of the tissue; in particular, it increases with µs and decreases with

µa. Furthermore, it depends on d, but this only for small d values: for d≥2.5 cm, the

DPF is virtually independent on d [28]. In 1993, Essenpreis et al. [29] determined the

DPFs on the adult head, forearm, calf and infant head, between 740 and 840 nm, using

in vivo and post mortem measurements (Figure 2.6). In a recent paper, Scholkmann

and Wolf [28] introduced a formula to calculate the dependency of the DPF on the age

of the measured subject.

Finally, it is important to point out that CW-NIRS systems are not able to measure

absolute concentration levels of the chromophores, but only concentration changes

relative to a baseline. This is because, although the light emitted from the source

Chapter 2. Physical principles and applications of functional Near-Infrared

Spectroscopy

in principle could be determined, due to complex boundary conditions not all the

light that is emitted actually enters the tissue. Since CW-NIRS systems are unable to

quantify the optical properties of the tissue, in particular the scattering coefficient, it is

not possible to know the incident light I0and use it in the mBLL. Instead, these systems

use another reference level, a baseline, and express the chromophores’ concentration

changes relative to that.

2.2 Other neuroimaging techniques

fNIRS is only one of the many neuroimaging techniques available. Methods to investi-

gate brain physiology include other techniques that rely on the hemodynamics of the

brain, such as functional Magnetic Resonance Imaging (fMRI) and Positron Emission

Tomography (PET). Also, brain physiology can be explored in terms of electrical ac-

tivity, as it is the case with Electroencephalography (EEG), and magnetic activity, as

it is the case for Magnetoencephalography (MEG). The strengths and limitations of

these methods are listed in Table 2.1.

Property EEG MEG fNIRS fMRI PET

Measurement Electrical

Activity

Electrical

Activity

∆[HbO],

∆[HbR],

∆[HbT ]

BOLD Blood

flow

Temporal Resolution 1 ms 1 ms 100 ms 2-5 s 1 s

Spatial Resolution > cm < cm < cm mm3mm3

Cost €/€€ €€€ €/€€ €€€ €€€

Required Immobility low high low high high

Table 2.1: Characteristics of different neuroimaging modalities

In this paragraph, we will focus on the comparison between fNIRS and its closest

– in terms of outcome – neuroimaging technique: fMRI, and its closest in terms of

setup: EEG. Furthermore, a brief review of the research fields where fNIRS is currently

employed will be presented, with particular attention to research studies including

young subjects and the study of language development in children, the research topic

of the grant under which this doctoral project was developed.

2.2. Other neuroimaging techniques

2.2.1 EEG and fNIRS

The most known neuroimaging technique is probably the Electroencephalography (EEG).

EEG is a well-established modality that allows to non-invasively record electrical ac-

tivity at scalp level, by using electrodes. The electrical activity is a measure of the

flow of extracellular currents produced by the summation of the activity of thousands

of neurons. The contribution of a single neuron cannot, therefore, be resolved at scalp

level. Additionally, the skull is a semi-conductive medium and has a smearing effect on

the signal: all these factors make it hard to determine the localization of the sources

(inverse problem).

Nevertheless, EEG is very useful in that many cognitive processes and states of con-

sciousness are reflected in electrical activity within specific frequency bands. Moreover,

EEG can be used for the study of event related potentials (ERP), that are stereotypical

electric responses resulting directly from a specific cognitive or motor stimulation. From

the technical standpoint, EEG has a poor spatial resolution in comparison to fNIRS

but an excellent temporal resolution: normally the sampling frequency is around 250

Hz but some modern systems can reach rates of 20 kHz, while for fNIRS that can be

at most in the hundreds of Hz. They are both perfectly portable, allowing therefore to

conduct a whole range of studies that require the subject to move freely, for example

motor rehabilitation studies [30].

Chapter 2. Physical principles and applications of functional Near-Infrared

Spectroscopy

Figure 2.7: EEG, fNIRS and fMRI in terms of spatial and temporal

resolution

Apart from these purely technical considerations, a meaningful comparison between the

two modalities cannot really be drawn: they provide insights on different aspects of the

neurophysiological processes under investigation, carrying complementary information

that can become really meaningful when taken into account together [31][32]: in fact,

it is the case that in recent years the integration of the two modalities has been gaining

increasing interest and recognition, in several research fields. An example is the field of

Brain-Computer Interfaces (BCIs): while until a few years ago BCIs were mostly EEG-

based [33], recently Fazli and colleagues [34] demonstrated that a BCI based on the

simultaneous acquisition of EEG and NIRS can achieve augmented classification per-

formances, by taking advantage of each modality’s individual strength, namely the fast

responsiveness of the EEG and the information about the vascular response provided

by the NIRS signal. In their study, they presented for the first time an EEG-fNIRS

based Sensory Motor Rhythm (SMR) BCI and showed that the combination increased

the classification of 5%, on average, with respect to using only EEG. Also in cognitive

neuroscience simultaneous EEG-NIRS measurements are widely used, for example in

the research areas of language processing [35][36] and epilepsy [37], just to name a few.

2.2. Other neuroimaging techniques

2.2.2 fMRI and fNIRS

fMRI is also a non-invasive technique, that builds on the MRI scanning technology.

The MRI signal comes almost entirely from the protons of water of the tissues (water

makes up for almost 65% of the body weight).

Protons have their own electric charge and have their own spin, i.e. they rotate about

their own axes, resulting in a magnetic dipole that lies parallel to the axis of the nucleus.

Normally, these dipoles are randomly oriented, but during an MRI scan a strong and

static magnetic field B0(typically ≥3Tesla) is applied, causing the dipoles to align

with its direction. The summation of all the magnetic moments of the aligned dipoles

is the magnetization vector M, and has the same direction of B0.

When a brief radio-frequency pulse is applied, with a frequency equal to the resonating

frequency of the rotating protons, then the phenomenon of magnetic resonance occurs:

protons can change energy state and the magnetization vector M can rotate away from

equilibrium. When the pulse terminates, protons go back to equilibrium by releasing

the absorbed energy as a radio-frequency wave, with a process called relaxation, and

producing a measurable signal.

With MRI it is possible to investigate the levels of oxygenation of the brain, thanks to

the different magnetic properties showed by oxygenated and de-oxygenated hemoglobin;

in fact, while oxy-hemoglobin is diamagnetic, and has therefore a little magnetic mo-

ment, deoxy-hemoglobin is paramagnetic, and produces a large magnetic moment.

Large concentrations of deoxy-hemoglobin cause a reduction in the relaxation time of

the tissue and therefore a decrease of the measured signal. Correspondingly, when a

brain area activates in response to a certain stimulus, deoxy-hemoglobin decreases and

the signal increases. This phenomenon is the Blood-Oxygen Level-Dependent (BOLD)

effect, and this is the reason why the fMRI signal is also called BOLD signal[38].

It is fairly easy to recognize that fNIRS and fMRI are two different roads for the

same destination: while fMRI relies on the oxygen-dependent magnetic properties of

hemoglobin, fNIRS uses its oxygen-dependent optical properties. However, the two

methods present some significant technical differences. First of all, the BOLD signal

Chapter 2. Physical principles and applications of functional Near-Infrared

Spectroscopy

reflects only the contribution of deoxy-hemoglobin, while fNIRS can resolve the con-

centration changes of oxy-hemoglobin too (and also of other chromophores, provided

that the instrument can use more than two wavelengths).

The spatial resolution of fMRI is better than that of fNIRS, that is in the order of cm.

However, the temporal resolution of fMRI is quite poor, and that is the cost for having

such a good spatial resolution: due to the sequential acquisition of slices (30-100 ms

each), it takes about 1-3 seconds (0.5 – 1.3 Hz) to achieve a whole-brain coverage at

the spatial resolution of 3-5 mm [39], [40]. On the other hand, the fNIRS technical

sampling frequency can be much higher, in the order of 100 Hz, depending on the

technology used. However, it should be noted that in both cases of fMRI and fNIRS,

the measured signal is the hemodynamic response, and this is naturally slow, showing

a peak at around 4-6 seconds after stimulus and lasting for at least 20 seconds.

Aside from this technological comparison, practical considerations are really crucial

when it comes to comparing fMRI and fNIRS: first of all, the fNIRS setup is rela-

tively quick and easy, just like the EEG’s, and an fNIRS experiment can be performed

anywhere, while fMRI has some significant physical constraints: the scanner is cumber-

some and can’t be used outside its specific facility; both its purchase and maintenance

are expensive; during an experiment the subject must lay very still inside to avoid the

occurrence of motion artifacts; no one is allowed in the room where the scan takes

place, and the scanner is also very noisy.

Especially the two latter features make fMRI inconvenient for babies and children, as

well as for certain groups of adult subjects, depending on their pathologies or disabil-

ities, and make fNIRS stand out: and fNIRS measurement is completely silent and

allows a whole range of studies involving the administration of auditory stimuli, that

could not be performed during an fMRI scan, at least not in ecological conditions.

Also, because it is silent and can be performed anywhere, it is definitely more tolerable

for children and babies, that can undergo an experiment while being around parents,

for example, if the experiment allows so.

Another field where fNIRS is naturally preferred over fMRI is the field of motor and

rehabilitation studies: whether the goal is measuring the functional effect of a reha-

bilitation task or the simple monitoring of a sustained movement, the portability of

2.3. The use of fNIRS in language research on children

the technique and the tolerance to motion artifacts clearly represent a key advantage

[41][42].

Lastly, fNIRS techniques do not interfere with signals of other nature: this feature

makes it ideal for multimodal integration with other modalities like EEG, as already

mentioned, but also MRI: in this case, it would be possible to acquire structural infor-

mation and localize anatomically the hemodynamic activity measured optically.

2.3 The use of fNIRS in language research on chil-

dren

fNIRS is now used in a wide variety of fields in neuroscience, from neurology [43]

to psychiatry [44], cognitive neuroscience [45], to basic research. Its versatility and

relatively low-cost make it a useful technique for the investigation of many research

questions. For the scope of this dissertation, we will focus on the use of fNIRS on

children for language studies.

As already articulated, fNIRS proves to be particularly convenient for use on children:

from a very practical point of view, children do not necessarily need to be isolated in

order to be tested, as it is the case for fMRI. An fNIRS instrument is easily portable,

therefore an experiment can be conducted in a comfortable environment where the

child can feel at ease. The setup itself is easy and quick, and after the cap has been

prepared the first time, it can be just taken off a subject and put on another subject

with very little preparation needed, especially for children and infants, since they have

little and thin hair.

Additionally, fNIRS is less sensitive than fMRI and EEG to motion artifacts and this

allows to test freely behaving participants. Artifacts can certainly occur, especially

when testing infants, as a result of bad coupling between the optodes and the skin

following an abrupt movement of the head; the result is generally a spike artifact

that is not very destructive of the underlying signal, and in many cases this can be

recovered, provided that appropriate motion artifacts removal algorithms are in place.

For a review and comparison of these algorithms, we refer the reader to [46].

Chapter 2. Physical principles and applications of functional Near-Infrared

Spectroscopy

Thanks to all these features, fNIRS can be used at any age and it is particularly

employed on newborns; this has made possible to make giant progress in understanding

the mechanisms by which the brain processes language and how this processing evolves

throughout the whole span of the brain development, to the point that was even possible

to infer, from fNIRS measurements at birth, the effects of prenatal experience on later

language abilities and to document an innate hemispheric lateralization for speech

processing [47, 48]. It also made possible to broadly localize the functional areas linked

to the perception of other kinds of stimulation, like music, odours, particular visual

patterns.

For a review on the use of fNIRS in developmental language research we recommend

[49, 50, 51].

It is therefore clear that the silenceness of the technique proves extremely useful in

language research, because allows the presentation of auditory stimuli in a non-intrusive

environment.

Auditory stimuli are not used only in the context of developmental neuroscience of

language, but also to the investigation of literacy abilities: in fact, findings suggest

that the very way our brain processes auditory information is affected by the degree

of literacy, since learning to read strengthens the neural pathways associated with

phonological processing skills, that are otherwise weakened in illiterate subjects [52].

2.4 Analysis of fNIRS data

As already pointed out in a previous section, a raw fNIRS timetrace is a voltage signal

representing the light intensity measured by each detector. This timetrace includes the

contribution of several phenomena that are reflected in hemoglobin relative concentra-

tion changes, and the cognitive processes underlying the specific research question is

only one among them. Furthermore, raw data may contain motion artifacts, in the

form of shifts and spikes. All these considerations, related to quality and to physiologi-

cal confounds, needs to be dealt with before starting the statistical analysis of the data.

Therefore, we review in this section the most relevant state of the art pre-processing

steps.

2.4. Analysis of fNIRS data

2.4.1 Reduction of motion artifacts

Motion artifacts are the result of a momentary loss of optical coupling between the

scalp and the optodes, resulting in a brief and sharp wave superimposed to the signal;

even when the coupling is restored, it is invariably different from before, and this results

in a shift of the level of the signal reflective of a change of the baseline caused by the

altered coupling.

The constant advances in hardware provide increasingly efficient ways to prevent this

uncoupling from occurring; this is the case for example of spring-loaded fiber holders

from NIRx (Figure 2.8).

Figure 2.8: Spring-loaded fiber holders. The spring keeps the tip of the

optode at constant pressure, reducing the possibility of loss of contact

between the tip and the scalp.

Nevertheless, motion artifacts still occur; many methods have been proposed to reduce

and correct motion artifacts. Some of them require additional hardware, for example

a short-distance fNIRS channel or an accelerometer, but the majority of the proposed

methods seek to detect artifacts by relying only on changes of amplitude and frequency

of the data corresponding to the occurrence of an artifact. Among these approaches, we

will briefly describe the most popular ones. For a more complete review, we recommend

the reader [46, 53].

Principle Component Analysis (PCA)

Chapter 2. Physical principles and applications of functional Near-Infrared

Spectroscopy

The correction of motion artifacts via PCA was first proposed by Zhang et al [54]. The

idea behind it is that motion artifacts account for a large proportion of variance of the

data, due to their large amplitude; therefore, when decomposing the data into uncor-

related principal components, they should be represented in the first M components,

since these are organized in descending order of explained variance. Therefore, motion

artifacts can be removed by removing the first M components and reconstructing the

“clean” signal.

Wavelet Filtering

The motion artifact correction based on the use of wavelets was introduced by Molavi

and Dumont [55]. The timetrace of each channel is decomposed into L levels by ap-

plying a discrete wavelet transform; for each level, a series of coefficients are obtained

for detail and approximation. The proposed method relies on the assumption that the

measured signal is a linear combination of the signal of interest plus motion artifacts

and that the distribution of the detail coefficients accounting for the signal of interest is

centered around zero and has low variance, while the detail coefficients corresponding

to artifacts will appear as outliers in this distribution; therefore, a threshold of variance

can be set on the probability distribution and coefficients exceeding the threshold are

set to zero. The timetrace can be then reconstructed with the inverse discrete wavelet

transform.

Spline Interpolation

The method based on spline interpolation was first proposed by Scholkmann et al [56].

In this procedure, a moving window is employed for the calculation of the standard

deviation of the signal; a motion artifact is identified if, within the window, the standard

deviation is greater than a certain threshold, predefined by the user. The timetrace is

segmented and each segment containing an artifact is spline interpolated. Finally, the

timetrace is reconstructed.

Transient Artifacts Reduction Algorithm (TARA)

TARA was introduced by Selesnick et al [57]. The method models the timeseries as a

combination of a low-pass signal, an artifact of each type (spikes and shifts), and white

noise. The estimation of the different components is formulated as an optimization

problem. After shifts and spikes are estimated, they are removed from the signal.

2.4. Analysis of fNIRS data

2.4.2 Physiological confounds

As already described, the fNIRS signal of interest is significantly contaminated with

global components arising from superficial layers of the scalp. A certain amount of

these component is task-independent, and could therefore be modelled, in principle;

however, the contribution to the signal arising in the superficial-layers is also modulated

by the task, to some extent. This happens because, depending on the type of stimuli

that are administered, they can produce significant changes in systemic variables, like

heart-rate, blood pressure, respiration rate.

Many algorithms have been proposed to overcome this issue but one of the main lim-

itations of any of them is that they rely uniquely on theoretical assumptions on the

features of the confounding signal. For example, several studies proposed to decom-

pose the signal via PCA and reconstruct the signal of interest by removing the first N

components, under the assumption that they contain mainly the global contribution

of the superficial layers [54, 58]. This kind of assumptions may yield adequate results

when dealing with motion artifacts, since they are discrete events in the data, but they

are at high risk of approximation when it comes to estimating the different sources of

the signal and the extent of contribution given by each of them.

While this issue is definitely still an open research question, it is certainly true that the

best way forward is to combine fine analysis tools with appropriate hardware means,

for example multi-distance measurements.

Multi-distance channels allow to measure the signal at different depths, and therefore

to obtain a signal from the superficial layers (Figure 2.9). The optimal short-distance

separation was recently determined to be 8.4 mm for adults and 2.15 for term age

infants [59].

Short-distance signals can then be regressed out from the whole signal. To this end,

several methods have been introduced: one possibility is to employ them during the

first-level statistical analysis as additional regressors of a General Linear Model (GLM)

[60]; another approach is to use them in combination with Independent Component

Analysis (ICA) [61].

In any case, the ideal amount of short-distance measurement to be removed from

the corresponding long-distance measurement is still an open question. Saager and

Chapter 2. Physical principles and applications of functional Near-Infrared

Spectroscopy

Figure 2.9: Multi-distance measurement; by placing a detector at a

shorter distance, light measured by that detector will only come from

the shallow layer

Berger [62] proposed a least-squares minimization procedure to fit the short-distance

measurement to the long-distance measurement, and to use the result of the fit as a

weighing coefficient.

Figure 2.10: Example of application of the algorithm proposed by

Saager and Berger [62]. On the right side of the figure, it is clearly

visible that the ∼0.1 Hz fluctuations ("Mayer waves") are removed with

the short-distance correction

2.4. Analysis of fNIRS data

2.4.3 Frequency filtering

Frequency filtering is the most straightforward method to eliminate components of

the signal that are not of interest, for example those associated with the heart beat

(∼1 Hz) and the very-low frequency fluctuations (drifts <0.1 Hz). For this reason,

the use of band-pass filters has been widely used in literature, usually with lower cut-

off around 0.01 Hz and upper cut-off around 0.5 Hz; of course, the frequencies being

removed should not include the “frequency of stimulation”, i.e. the frequency at which

hemodynamic activation is expected based on the stimulation timings: for example,

with a task of duration 10 s, and a pause of 30 s, the frequency at which the task-specific

activation is expected is 1/40 s = 0.025 Hz. A band-pass filter should be designed that

does not exclude this frequency.

This is also the reason why band-pass filters can not generally be used to remove

other kinds of unwanted features of the signal, such as Mayer waves or respiratory

oscillations: their frequencies normally overlap with (or are very close to) the frequency

of stimulation. Also, caution should be used when designing band-pass filters: the use

of overly narrow bands could produce artifacts in the signal in the form of ripples, or

could remove the hemodynamic responses that one is looking for.

Finally, also the filter choice requires special attention: Finite Impulse Response (FIR)

filters are more stable than the Infinite Impulse Response (IIR) and the phase delay

they introduce is constant across the whole frequency range, while IIRs introduce a

frequency-dependent phase shift; on the other hand, IIRs are generally more compu-

tationally efficient.

In both cases, the phase-distortion can be corrected by shifting back in time the filtered

signal of the delay amount (in case of FIRs) or by filtering the time series forward and

backward (in case of IIRs). This latter option is named ‘zero-phase filter’ and it is

widely used in fNIRS pre-processing [63]. Zero-phase filters are non-causal, in that

they use samples not only from the past and present inputs but also from the future

ones: as a consequence, they can be employed only as an offline operation, while real-

time filtering has to be performed with causal filters.

Chapter 2. Physical principles and applications of functional Near-Infrared

Spectroscopy

Figure 2.11: Effect of applying a band-pass filter [0.01-0.2 Hz]; the

data has sampling frequency 4.4 Hz and the magenta parts represent

the task periods (10 s, with a jittered pause of 19-26 s). On the PSD

graphs, the red line is used for the PSD of the first wavelength and the

blue one for the second wavelength (in this case 760 and 850 nm)

2.4.4 Computation of concentration changes

After the data has been pre-processed, relative concentration changes can be computed

through the modified Beer-Lambert Law (mBLL). The formulation of the mBLL has

been already described in detail in Section 2.1.6.

From a practical perspective, there is an important difference between the theory and

how the algorithm is implemented in most of the available software tools. As illustrated

in Section 2.1.3, CW-NIRS instruments – which represent the great majority of the

fNIRS instruments employed in research - are unable to quantify the amount of incident

light; therefore, some other reference signal is needed to compute the concentration

changes: usually, the mean of the signal over the whole course of the study is used as

a baseline, in place of the incident light. This is why CW instruments are not able to

compute absolute values of concentration changes but changes relative to an arbitrary

baseline.

As mentioned, the implementation of the mBLL, as well as of the other processing

steps, is now offered by many available software solutions, like Homer2 [64], the NIRS

Brain AnalyzIR Toolbox [65] and nirsLAB [66], just to name a few.

2.4. Analysis of fNIRS data

2.4.5 Statistical inference

Any analysis workflow is unique and depends on the specific research question that is

being investigated. However, there is a general consensus around the necessary pre-

processing steps that lead from the intensity measurements to having chromophores

concentration changes.

The roadmap is not as well marked for the statistical inference. This is partially because

the way data is statistically assessed depends on the question being investigated, but

also because for fNIRS there is not yet a clear gold standard. Therefore, many different

methods can be found in literature [67].

However, due to the similarity to the BOLD signal, the fNIRS signal has been often-

times analysed for a long time with the same methods, namely by modelling the signal

with a General Linear Model; the hypothesis of using a GLM for this purpose is that

the signal can be seen as a superimposition of hemodynamic responses, elicited by the

experimental conditions, and physiological oscillations.

Although the measured timeseries Y is really discrete, it is useful to treat it as a

function of time, Y(t) [4, 68]; a GLM seeks to express the concentration timeseries

Y(t) as:

Y(t) = X(t)β+(t)(2.11)

where X(t) is the design matrix, with regressors in each column; βis a vector containing

the coefficients of the regressors, to be estimated; finally, (t)is the error term, having

variance σ2:

(t)∼N0, σ2(2.12)

If each regressor contains the timings of each experimental condition, the resulting β

for that regressor represents therefore the impact of that condition on the measured

data.

Chapter 2. Physical principles and applications of functional Near-Infrared

Spectroscopy

Figure 2.12: Each column of the design matrix X(t) is the result of the

convolution of the stimulus vector of each experimental condition, u(t)

(left), with a “basis function” representing the expected hemodynamic

response elicited by that condition (middle). The result is a theoretical

time course that is completely predicted by the administered stimuli

(right), and this is the timetrace that is then fitted to the measured

data. The more the condition actually had an impact on the data, the

better will be the fit, in terms of amplitude of the corresponding ß.

Figure is taken from [4]

The design matrix

Each column of the design matrix X encodes the timings of each experimental condition;

in particular, a convolutional model is employed to express the corresponding column

in X as the convolution of the stimulus vector u(t), usually a stick function containing

the stimulation timings, with the expected hemodynamic response function (HRF) h(t)

elicited by that stimulation:

X(t) = u(t)⊗h(τ) = ZT

0u(t−τ)h(τ)dτ (2.13)

where τindexes the peristimulus time(PST), over which the HRF is expressed. The

validity of this model relies on the assumption that the hemodynamic response can be

modelled as a linear time-invariant (LTI) process [69]: thus, it has a finite duration, it

is independent of time and responses to successive inputs superimpose in a linear way.

The result of convolving a stimulus vector u(t) with a canonical HRF is shown in Figure

2.12.

2.4. Analysis of fNIRS data

The simplest and most commonly used basis function is the canonical HRF. The canon-

ical HRF is a model characterized by two gamma functions, one modelling the peak

and one modelling the undershoot. It is usually parametrized by a peak delay of 6 s

and an undershot delay of 16 s, with a peak-undershoot amplitude ratio of 6.

However, the precise shape of the real HRF may vary over brain regions and over indi-

viduals; therefore, it is possible to choose a basis function different from the canonical

HRF in order to accommodate this variability (Figure 2.13). Another possibility is

to use the canonical HRF together with its partial derivatives. This set is known as

informed basis set and it is used to capture variations of the peak delay and dispersion:

the temporal derivative accounts for shifts in the latency of the response, while the

dispersion derivative account for differences in the peak durations.

In a recent work, Santosa and colleagues investigated how the choice of the basis

function affects the sensitivity and specificity of the fNIRS analysis, in presence of

variability and/or systemic bias in the underlying evoked response, and especially in

relation to the task duration [70]. They concluded that the sensitivity of the model

depends heavily on the function choice for shorter task durations (∼5 seconds), while

for longer durations the difference in terms of outcome was less susbstantial.

Solving the model

Once the design matrix has been designed, the model is ready and it is typically solved

via generalized least squares:

β=XTX−1

XTY(2.14)

cov(β) = XTX−1(2.15)

In order for this solution to be the best unbiased estimate of the coefficients, the

Gauss-Markov assumptions have to be fulfilled [71] , namely:

1. Each sample of the error term should be uncorrelated with each other: cov (i,j)=

0 , ∀i6=j

Chapter 2. Physical principles and applications of functional Near-Infrared

Spectroscopy

Figure 2.13: Basis functions that are most commonly used to model

the expected HRF in a GLM. Figures are produced with the analysis

software nirsLAB [66]. On the top row, the Canonical HRF is depicted

(top-left); the Canonical HRF can be used in conjunction with its time

derivative (top-middle) to capture temporal shifts in the peak latency,

and also with the dispersion derivative (top-right) to capture differ-

ences in the peak duration. The (bottom-left) panel shows the Fourier

set; the Fourier set consists of a constant + K sine functions + K co-

sine functions (tot: 2K +1) of harmonic period T, T/2, . . . T/K. The

(bottom-middle) panel shows K gamma functions (K=3); while the

Finite Impulse Response (FIR) basis function (bottom-right) consists

of K contiguous boxcar functions, each lasting T/K seconds, where T

is the duration of the HRF. Linear combinations of the FIR or Fourier

basis functions can capture any shape of response. For a more compre-

hensive description of these functions, we refer the reader to the book

“Statistical Parametric Mapping: the analysis of functional brain im-

ages” [4]

2.4. Analysis of fNIRS data

2. Each sample of the error term has the same variance var (i)=σ2Σ,∀i(ho-

moscedasticity of the noise)

Once the model is estimated, t-statistics can be estimated for the regressor(s) of inter-

est:

t=β

qcov(β)(2.16)

If the goal is to statistically assess whether two or more regressors had a different

impact on the data, a contrast vector c can be employed that encodes this difference;

in practice, if one wants to test, for example, if condition 1 produced a larger activation

than condition 2, then the vector c= [1 -1 0 0 . . . ] will be defined and the corresponding

t-test will be:

t=c·β

qc·cov(β)·cT(2.17)

The whole procedure, from the definition of the model to the statistics on the regressors,

is normally conducted channel-by-channel.

This way, it is possible to determine in which channels was the hemodynamic activity

most affected by the experimental conditions, and therefore to have a clear localization

of the effects under investigation.

The same analysis is usually performed on all the subjects. The possible options

for proceeding with group level analysis are manifold and depend on the particular

experiment. For example, it is possible to run a correlation analysis between each

channel’s regressor of a particular condition and a demographic variable, like a task

score, or age of the subjects. Another possibility, when groups of subjects are specified,

is to run a Linear Mixed Model with group as a between-subject factor, to test whether

different experimental conditions had different effects on the groups.

The possibilities are numerous and it is not possible to describe a universal recipe that

applies to all circumstances. In the following chapters several approaches that have

been used in this doctoral work for group analysis will be presented.

Chapter 2. Physical principles and applications of functional Near-Infrared

Spectroscopy

2.4.6 Limitations of the univariate statistical analysis

fNIRS errors are non-spherical

The most important problem with using a GLM at subject-level for fNIRS analysis is

that the requirements needed to have an unbiased estimate of the regressors are not

met. The error term is nonspherical, namely is not uncorrelated and does not have

uniform variance.

Let’s start with the latter: the error term resulting from modelling fNIRS data does

not have uniform variance. This is caused by the fact that fNIRS data may exhibit

components that are much larger in amplitude than the physiological background, for

example motion artifacts. These strong components will then appear as outliers in the

noise distribution, making it not uniform.

Heteroscedasticity itself does not cause the coefficients estimates to be biased, but it

can cause the estimates of their variance to be biased, possibly above or below the

true variance. Therefore, while estimated βwill still be unbiased, their variances will

not, and wrong variances may lead to wrong computation of t-values, making the

subsequent statistical inference possibly wrong.

2.4. Analysis of fNIRS data

Figure 2.14: Figure taken and adapted from [5]. [Left] the distribu-

tion of the residuals is not Gaussian, but it’s characterized by heavy-tails

produced by large outliers, such as motion artifacts. [Right] The tem-

poral autocorrelation of the raw data along with the autocorrelation

of the data processed according to different algorithms; the chart shows

that the AR(n) prewhitening yields the best results in terms of reduction

of autocorrelation.

The other problem is that the samples of the error term are correlated. This is due to

the fact that the physiological signal being measured is much slower than the sampling

frequency of the acquisition instrument. For this reason, the acquired signal Y – and

so also the error term - have high autocorrelations.

Ignoring these correlations when forming a t- or F-statistics leads to an inappropriate

estimate of the degrees of freedom, that are effectively much less than the number of

samples, and therefore to invalid tests.

This issue has been known for a long time to be true also for the fMRI analysis, but for

the fNIRS analysis it is even more critical because of the much higher sampling rate.

To overcome this problem, a standard solution is to transform the noise structure before

solving the model; in particular, the two main approaches that can be adopted are the

noise pre-whitening and noise pre-coloring.

Pre-whitening refers to making the distribution of the noise uniform, thus white, by

applying a whitening filter W, selected such that the new error term W·is spherical:

Chapter 2. Physical principles and applications of functional Near-Infrared

Spectroscopy

W·Y=W·X·β+W·(2.18)

β=XTWTWX−1

XTWTWY (2.19)

The whitening matrix W can be obtained by iteratively fitting the GLM and examining

the residuals: at the first iteration, the GLM is run without any whitening (W=I), and

based on the residuals a W matrix is estimated. Usually, this is performed by fitting

an autoregressive (AR) model to the residuals vector. The AR model will remove the

noise structure related to the autocorrelation of the samples. The coefficients of the

AR model are then used to build the W matrix.

A whitening matrix based on fitting an AR model to the residuals can definitely reduce

their autocorrelations, but the extent of this effect depends on the model order, i.e.

it depends on the modelled duration of the autocorrelation. In the software package

SPM [4], that was designed for the analysis of fMRI data, the used model order is n=1.

This means that correlation is assumed only in neighbouring data samples. While this

may prove appropriate for fMRI data, it certainly is not for fNIRS data, since it has a

much higher sampling frequency. Nevertheless, the SPM approach has been employed

for some time also in the fNIRS data analysis [72].

The other option that has been proposed to solve this issue is noise pre-coloring. Pre-

coloring refers to the idea of applying a known and specific color spectrum to the noise

to mask its unknown structure. Basically the resulting noise structure is still coloured,

only this time it has a known profile and can be accounted for by estimating a pre-

coloring matrix C, to be applied to the GLM in a similar way of the pre-whitening

matrix W.

Pre-whitening is generally preferred over pre-coloring [5], but the autocorrelation of

the fNIRS data extends well beyond a model order of n=1. Therefore, research has

progressed recently towards finding an fNIRS-specific version of this approach. In

particular, an algorithm was recently proposed by Barker et al [6] that corrects both

the autocorrelation of the residuals, by implementing a more powerful pre-whitening

algorithm, and their heteroscedasticity.

2.4. Analysis of fNIRS data

In this method, the W matrix is iteratively estimated by fitting an AR(n) model to the

residuals, with ndetermined by minimization of the Bayesian Information Criterion

(BIC). This procedure is performed on each channel independently, resulting in different

model orders across different channels. Additionally, at each iteration the weights of

the least squares regression are updated to downweigh the timepoints corresponding

to outliers, and enter the model via the diagonal weight matrix S. The solution to the

model becomes:

β=XTWTSW X−1

XTWTSW Y (2.20)

The authors found that for data acquired at 10 Hz, the optimal model order was

n=33, ascertaining that indeed autocorrelation in fNIRS data do extend much beyond

the n=1 assumed by SPM.

A-priori selection of the basis function

The choice of the right basis function is another item of complication posed by the

use of a GLM. The basis function should represent as accurately as possible the real

hemodynamic response evoked by the particular task that is carried on by the subject.

In particular, complex tasks may engage temporally protracted processes that cannot

be captured by the canonical set [73]. Moreover, even assuming that the canonical

HRF is the appropriate choice, several studies have shown that there is a substantial

variability in the shape of responses collected across subjects, and even across chan-

nels with the same single subject, all of which makes the definition of the parameters

challenging [18, 74].

A strategy that has been proposed for addressing this variability is the incorporation

of temporal and dispersion derivatives into the canonical set; however, the issue is

surrounded by much debate [75, 76, 77, 78]: the main criticism regards the fact that

although it is true that the additional terms do capture additional variance, the final

test for statistical significance is still carried out on the amplitude of the nonderivative

portion of the model, therefore the improvement made in model fitness does not lead

to higher power estimate. Furthermore, the subsequent group-level analysis do not

Chapter 2. Physical principles and applications of functional Near-Infrared

Spectroscopy

typically account for the subject-level variances at all, thus it remains unaltered by the

employment of the additional terms.

Lastly, it is worth noting that the hemodynamic response to the same stimulus can

vary greatly across different channels’ positions. For example, Zimeo Morais et al.

[79] reported a consistent positive correlation between HbO and HbR in the anterior

temporal region, that can be explained by extracerebral changes but more likely by

oxygenation changes in the temporalis muscle. The contribution of muscle oxygenation

changes has been described to play an important role in the signal measured with fNIRS

[80, 81], as well as the contribution of small capillaries [82].

All these considerations explain why the GLM can hardly capture the natural inter-

and intra-subject variability of the hemodynamic response: if the basis function is

inaccurate, the design matrix is not a good model of the measured data, and actual

task-induced neural activity may remain undetected.

Multiple comparisons

As the technological advances of the fNIRS hardware provide an increasingly wide

coverage of the head, with many systems now offering caps with hundreds of channels,

the issue of the multiple testing correction is as crucial as it has ever been, and can no

longer be ignored.

The multiple comparisons problem occurs when several statistical tests are carried

out simultaneously, as if they were independent: in any statistical test, there is a

α%probability of rejecting the null hypothesis when it is true (Type I error, or false

discovery), with αdefined by the experimenter (usually α=5%). If N tests are

carried out simultaneously, the risk of Type I errors becomes (N x α)%. This is

exactly what happens when a univariate statistics is employed on fNIRS data: each

channel is statistically tested independently on each other, than a significance threshold

αis applied and channels whose t-test resulted in p≤αare considered active. Since

channels are not really independent, there is an increased risk of Type I errors, namely,

some of the identified active channels are actually false positives.

The simplest way of controlling Type I errors is using the Bonferroni correction: if

there are ktests being performed, then the significance level αis replaced by α/k, for

2.4. Analysis of fNIRS data

each test. Although this method provides a strong control for errors, it is definitely too

conservative for neuroimaging data, and can eliminate both false and true positives. For

this reason, other methods have been proposed; for a thorough review, we recommend

the reader the work of Nichols and Hayasaka [83].

One of the most advantageous for fNIRS seems to be the False Discovery Rate (FDR)

procedure introduced by Benjamini and Hochberg [84, 85]. Briefly, FDR methods aim

at controlling the proportion of errors not among the entire set of hypotheses tested, as

it is the case for the Bonferroni correction, but only among those resulting in rejecting

the null hypothesis, i.e. the declared active channels. This way, the result is less

conservative and the power of this correction increases as the number of rejected null

hypotheses increases.

Independent analysis of HbO and HbR

Lastly, a significant limitation with performing univariate statistics on fNIRS data is

that the analysis is carried out on HbO and HbR independently, when in fact they are

nonlinearly partially correlated variables.

The are several inconveniences to this. The first problem is a very theoretical one:

the correlation between HbO and HbR is per se meaningful and should be taken into

careful examination; in fact, hemodynamic activation should always involve a negative

correlation between HbO and HbR (unless we are considering a particular group of

subjects or infants [50]). However, this information is completely lost when keeping

the two analyses separate. The examination of the two individual components alone

can, therefore, lead to misinterpretation of the findings.

This leads to the other problem: for the same experimental condition, it is a very

common situation that test statistics yielded by the HbO analysis do not mirror the

test statistics yielded by the HbR analysis. In other words, one may find that channels

considered active by looking at the HbO results, would be considered not active with

the HbR results, or vice versa; this can happen because every subject has its own

“hemodynamic signature”, and while some exhibit stronger activation as an increase

in the HbO signal, others may express activation more pronouncedly with a decrease

in the HbR signal.

Chapter 2. Physical principles and applications of functional Near-Infrared

Spectroscopy

So, on one hand the correlation between the two components should always be negative

for an activation to be considered as such, but on the other hand one may empirically

observe that some subjects do not exhibit much activation with one or the other com-

ponent [86].

This issue becomes especially problematic at group level: it is often the case indeed

that group level results based on the statistical analysis of HbO values do not match

results based on the analysis of the HbR values. To overcome this limitation, a univocal

measure would be needed, that incorporates information on both components.

2.5 The potential role of Machine Learning

Machine Learning techniques have started being successfully employed in the analysis

of fNIRS data only recently, with the research field of Brain-Computer Interfaces (BCI)

leading this methodological renovation.

A BCI is a communication system that allows the use of brain activity to control

external devices, by clustering these signals into two or more classes; this makes it

an application where machine learning naturally belongs. It consists typically of five

stages: brain-signal acquisition, pre-processing, feature extraction, classification and

application interface. While the acquisition and the pre-processing steps have been

already covered in previous sections, and the interface application is beyond the scope

of this work, we will focus on the steps concerning feature extraction and classification.

The first study that demonstrated the feasibility of using fNIRS for a BCI was from

Coyle et al [87]: in their work, participants had simply to imagine squeezing a rubber

ball. The average HbO concentration level was computed within 1 second and used as

a feature: if the feature exceeded a reference level (defined as the largest value of the

signal within the first 10 seconds), then the event was classified as “task”.

Since this study, many others followed and BCIs have become increasingly complex

and articulated. Among the features that are mostly employed, the signal average

amplitude and slope are the most common; others include signal variance, skewness,

kurtosis and zero crossing. For a complete review we recommend the reader the work

from Naseer and Hong [88]. As for the classification step, they found that the most

2.5. The potential role of Machine Learning

commonly used method in fNIRS-BCI is the Linear Discriminant Analysis (LDA).

The reason for it can be found in the fact that LDA combines both good classification

performances and low computational requirements. Other algorithms that have proven

efficient for fNIRS-BCIs include Support Vector Machines (SVM), Artificial Neural

Networks (ANN) and Hidden Markov Models (HNN). A detailed description of LDA

and SVM is provided in Sections 2.6 and 2.7.

The use of machine learning in the context of BCIs is then to classify brain signals

into several classes. But it can also be used as an alternative to univariate statistics in

order to investigate neurophysiological processes and cognitive functions.

This is the case of Multivariate Pattern Analysis (MVPA) methods. MVPA methods

started being developed for the analysis of fMRI data, as an alternative to the univariate

voxel-wise analysis.

The rationale behind this approach is that part of information is represented not only

within each voxel but also across groups of voxel, as “spatial patterns” of activity;

MVPA involves the use of pattern-classification algorithms to search for reproducible

spatial patterns that differentiate across experimental conditions, and can therefore be

considered as a supervised classification problem, where the chosen classifier attempts

to capture the relationship between the patterns of activity and the experimental con-

ditions [89].

This approach has several advantages over univariate methods: first of all, it can be

more sensitive, simply because the voxel-wise response to the experimental manipula-

tion might be not strong enough to be statistically significant, but could be discrimi-

native of the condition if accumulated with the responses of neighbouring voxels and

analysed with classification algorithms [90, 91]. This way, MVPA can extract a finer

level of detail from the data.

Secondly, even if two regions do not individually bear information, they may do so

when taken into account together.

For this reasons, this approach has been gaining increasing popularity, and has been

applied to fMRI, EEG and, more recently, fNIRS [7, 8]. Emberson et al [7] described the

first demonstration of neural decoding in infants using multivariate techniques on fNIRS

data. In particular, they classified spatial patterns of hemodynamic activity arising

Chapter 2. Physical principles and applications of functional Near-Infrared

Spectroscopy

in infants in response to two conditions that were both audio-visual but of different

nature (face+music vs fireworks+speech). They demonstrated that despite the absence

of significant univariate results, the multivariate classification did manage to decode

the two different states, with a classification accuracy averaged across subjects greater

than 75%. This and other studies testify to the great potential that machine learning

and model-free feature selection have to fill the gaps left by the univariate analysis.

2.6 Linear Discriminant Analysis (LDA)

The goal of classification is to take an input vector xand assign it to one of K discrete

classes Ck(k=1, . . . .. K). The input space is divided into decision regions, and the

boundaries of these regions are called decision surfaces [92].In this section, we will focus

on the case of two-classes classification (K=2).

One of the simplest approach to the classification problem is to use a decision surface

that is a linear function of the input vector and it’s defined by hyperplanes of dimension

(D-1), with D being the number of dimensions of the input space. Such linear function

is called linear discriminant function and it is defined as:

y(x) = wTx+w0(2.21)

where wis the weight vector, xis the input vector and w0is a bias.

An input vector xis assigned to class C1 if y(x)≥0and to class C2 otherwise.

Therefore, the decision boundary is defined by the relation y(x) = 0, which corresponds

to a (D-1)-dimensional hyperplane within the D-dimensional input space. The vector

w determines the orientation of the decision surface, while the bias parameter w0

determines the location of the decision surface (Figure 2.15). The bias can be chosen

in different ways; often, it is chosen as the middle point between the projections of the

classes’ means on the normal vector w.

2.6. Linear Discriminant Analysis (LDA)

Figure 2.15: Geometry of the classification problem with Linear Dis-

criminant Analysis: the decision surface (in red) is perpendicular to w,

and its intercept is defined by the bias w0. Figure taken from Bishop

(2006) [92]

The weight vector wis chosen such as to maximize the distance between the two classes’

means (between-class scatter,SB) and minimizes the interclass variances (within-class

scatter,SW). This criterion is known as Fisher criterion, and it can be formulated as

follows. For class k, the within-class variance is defined as:

k=X

n∈Ck

(yn−mk)2(2.22)

with yn=wTxnand mkbeing the sample mean for the projected points of the class,

defined as

mk=1

nkX

n∈Ck

yn=1

nkX

n∈Ck

wTxn(2.23)

Therefore, the objective function to maximize in LDA is

J(w) = (m2−m1)2

1+s2

(2.24)

Chapter 2. Physical principles and applications of functional Near-Infrared

Spectroscopy

To find the optimum w, J(w) must be expressed as a function of w. It can be shown

that the variance within the class kcan be expressed as:

k=X

n∈Ck

wT(xn−mk) (xn−mk)Tw(2.25)

k=wTSkw(2.26)

Thus, the total within-class scatter is defined as:

1+s2

2=wTSWw(2.27)

Similarly, the difference between the projected means can be expressed in terms of the

between-class scatter SB:

(m2−m1)2=wTSBw(2.28)

Therefore, we can express J(w) as:

J(w) = (m2−m1)2

1+s2

=wTSBw

wTSWw(2.29)

Now it’s possible to find the optimum wby differentiating J(w); we find that J(w) is

maximised when

wTSBwSWw=wTSWwSBw(2.30)

By multiplying both sides of the equation by SW

−1we obtain that the expression of

the optimal weight vector is given by:

w∝b

S−1

W(c

m2−c

m1)(2.31)

2.6. Linear Discriminant Analysis (LDA)

It can be shown [93] that LDA is the optimal classifier when a) input features from

all classes are normally distributed, b) the classes have the same covariance matrices

and c) the true class distributions are known: this latter assumption is never met,

therefore means, ˆm1and ˆm2, and covariance matrices ˆ

Sof the distributions have to

be estimated from the data 2.8):

S=1

n−1

i=1

(xi−c

m) (xi−c

m)T(2.32)

The estimator of the covariance matrix is unbiased, but in the case of neuroimaging

data the estimate may become imprecise: this is due to the fact the neuroimaging

data is usually characterized by a limited number of data points in a high-dimensional

feature space.

This curse of dimensionality leads to a systematically distorted estimation of the covari-

ance matrix, given by an overestimation of the large eigenvalues and an underestima-

tion of the small ones. The distortion eventually results in degrading the classification

performances [94, 95]. The solution that has been proposed to mitigate this effect

involves the introduction of a regularization term in the Equation 2.8, such that the

new estimated covariance matrix is given by:

S(γ) = (1−γ)b

S+γvI (2.33)

Where γ∈[0, 1]and v is defined as the average eigenvalue of ˆ

S(v=trace(ˆ

S)/d),

with d being the dimensionality of the feature space. It can be shown [94] that ˆ

and e

Shave the same eigenvectors, and that this approach, named shrinkage or LDA

regularization, has indeed the effect to push extreme eigenvectors towards the average

v. The regularization is largest when γ=1and absent when γ=0. Several studies

demonstrated that regularized LDA yields improved predictive accuracy in the context

of ERP decoding [94], fMRI-based decoding [96] and also in fNIRS-based BCIs [97,

98].

Chapter 2. Physical principles and applications of functional Near-Infrared

Spectroscopy

2.7 Support Vector Machines (SVM)

While in LDA the separating hyperplane is obtained by seeking the projection that

maximises the distance between the two classes’ means and minimizes the interclass

variances, Support Vector Machines are designed to maximise the distance between

the separating hyperplane and the nearest training points, called support vectors.

Such distance M is called margin and maximising it means minimizing wTw. Addi-

tionally, wand w0should also ensure that datapoints are correctly classified, namely

the input vectors x1, .... xNshould be classified according to their target values t1,

...tN, with tn∈[−1, 1]:y(xn)≥0for tn= +1and y(xn)≤0for tn=−1. This

criterion can be expressed by the following formula, named the constraints:

tnwTxn+w0≥1for n=1, . . . n (2.34)

This is a quadratic problem; it can be shown that an optimal solution to this problem

is found when the Karush-Kuhn-Tucker (KKT) conditions are satisfied:

λ∗

n1−tnw∗Txn+w∗

0) = 0(2.35)

1−tnw∗Txn+w∗

0)≤0(2.36)

λ∗

n≥0(2.37)

where λnare positive values known as Lagrange multipliers. By solving the quadratic

problem, it can be shown that w* and w0∗are given by:

w∗=

n=1

λntnxn(2.38)

n=1

λntn=0(2.39)

2.7. Support Vector Machines (SVM)

w∗

0=1

NsX

support vectors j



tj−

n=1

λntnxT

nxj

(2.40)

And that a prediction on a new datapoint z can be made with the following formula:

w∗Tz+w∗

0=



n=1

λntnxn



z+w∗

0(2.41)

If the classes’ distributions overlap in the feature space, we can still maximize the

margin M, but we have to allow for some points to be on the wrong side of the margin,

with a penalty that increases with the distance from that boundary.

This is achieved by introducing the slack variables ξn= (ξ1,ξ2, ..., ξN), with N being

the number of data points.

Data points that lie on or inside the correct margin boundary have ξn=0, while for

the other points ξn=|tn−y(xn)|. Points for which o<ξn≤1lie inside the margin,

but on the correct side of the decision boundary; points for which ξn>1like on the

wrong side of the boundary and are misclassified.

Therefore, the constraint (Equation 2.34) must be adjusted to take into account slack

variables. This is sometimes described as relaxing the hard margin:

tnwTxn+w0≥1−ξnfor n=1, . . . .N (2.42)

with ξn≥0.

Slack variables allow to find an appropriate linear separation, even if the two classes

are not completely linearly separable.

When it is impossible to find a linear separation between the classes, the solution

adopted by SVMs is to transform the input features xwith a non-linear kernel.

This doctoral work only employed Linear SVMs (see Chapter 5), therefore the choice of

the kernel for non-Linear SVMs will not be described here; for a complete illustration

of the subject, we refer the reader to Bishop (2006) [92] and Marsland (2014) [99].

Chapter 2. Physical principles and applications of functional Near-Infrared

Spectroscopy

2.8 Lessons learned

•Systemic oscillations and temporal correlation in the data can reduce the sensi-

tivity of the General Linear Model.

•The independent univariate analysis of HbO and HbR can make difficult and

mislead the interpretation of the results, especially at group level.

•The a-priori choice of the basis function impede accounting for inter- and intra-

subject variability of the hemodynamic response.

•Machine learning methods can offer a viable alternative to uncover finer levels

of detail, by classifying patterns without using any a-priori assumptions on the

hemodynamic response; also, an univocal measure accounting for both HbO and

HbR could be employed as a feature.

Chapter 3

From univariate to multivariate

analysis of fNIRS data: a

theoretical formulation and

validation

3.1 Introduction

In the previous chapter, the limitations of the analysis based on GLMs were exten-

sively described: the need for a-priori chosen model for the data, that restricts the

possibility of accounting for subject- and channel-level shifts from the expected model;

the problems yielded by the independent analysis of the HbO and HbR components;

the systemic oscillations, that can considerably reduce the sensitivity of the GLM, that

may therefore not produce significant results.

Despite the increasing use of machine learning-based algorithms for neuroimaging data,

particularly driven by the field of BCIs, and the body of literature involving machine

learning-based procedures for fMRI and EEG for the investigation of neurocognitive

processes, to date fNIRS still lacks a comparable literature.

In particular, no systematic comparison has been performed between the GLM-based

approach and an alternative analysis based on using machine learning, for fNIRS data.

Chapter 3. From univariate to multivariate analysis of fNIRS data: a theoretical

formulation and validation

An objective comparison between the two techniques is a critical factor in order to

make a reasoned choice when it comes to data analysis.

For this reason, this effort was the very first step of this doctoral project and it is the

subject addressed in this chapter, based on the following publication:

[9] Gemignani, J., Middell, E., Barbour, R. L., Graber, H. L. and Blankertz, B. (2018).

Improving the analysis of near-infrared spectroscopy data with multivariate classifica-

tion of hemodynamic patterns: a theoretical formulation and validation. Journal of

Neural Engineering, 15(4), 045001. doi: 10.1088/1741-2552/aabb7c

The GLM approach was compared with a model-free approach based on the use of

multivariate features, including both HbO and HbR, and on the use of LDA for the

classification.

As described in Section 2.5, LDA is often used in the context of BCIs, thanks to its

advantageous combination of low computational requirements and good performances.

Here we want to assess if its characteristics make it also a convenient tool for offline

statistical analysis, in quest of interpreting hemodynamic patterns with respect to the

experimental conditions that elicited them. In particular, this work aims at comparing

GLM and the LDA-based classifier in performing channel-wise classifications: active

channel vs non-active channel.

The advantage of using a classifier for this purpose is that no assumptions on the

structure of the noise are necessary, and that no prior knowledge of the shape of the

expected hemodynamic response is needed. Furthermore, for the LDA-based classifier,

the information regarding the simultaneous variations of HbO and HbR can be com-

bined in a multivariate strategy. In this way, the analysis would yield a single metric for

“activation” for each channel, and this would be easier to test than separately testing

βcoefficients from HbO and HbR, especially at group-level.

However, the benefits of using machine learning may go beyond “just” producing a

larger classification accuracy: the comparison between the results yielded by the two

analytic approaches may be informative in the sense that the classifier might identify

unpredictable effects that elude the model-based analysis. For example, in a case where

GLM analysis reports a channel as “not active”, the availability of LDA results could

facilitate the process of deciding whether activation truly was absent (i.e., because LDA

3.2. Methods

also classified the channel as “not active”) or if the hemodynamic model used for GLM

was not optimal (i.e., because LDA classified it as “active”).

The present work comprises two steps: the first is a comparison of the proposed LDA-

based method with the canonical GLM analysis. In order to do this, an extensive

volume of simulated data is used to characterize the two algorithms in terms of re-

ceiver operator curves (ROC) when no systemic oscillation is present (i.e., simulated

hemodynamic responses were added to simulated resting-state data) or when a consid-

erable amount of systemic oscillation is present (i.e., simulated hemodynamic responses

were added to experimental resting-state data); the real-data results also were used to

characterize the impact of inter-subject variability on the outcomes of the classification

analyses. Second, the two algorithms were used to analyse and classify the task-induced

activations in a set of experimental data.

3.2 Methods

In order to compare the performances of the LDA-based and GLM-based methods

under controlled conditions, sensitivity and specificity were quantified by recovering

a known synthetic hemodynamic response added to either synthetic or real resting-

state data. This approach has been used in several reports [5, 6, 100, 101] and it is

particularly suited for studies that make use of ROC analysis, because it permits an

exact quantification of true and false discovery rates.

In a second step, the two analysis pipelines were applied to real experimental data and

channel-wise statistical assessments of each subject were compared. Figure 3.1 reports

a summary description of the whole procedure followed in this work. Figure 3.2 shows

how known hemodynamic responses were added over resting state time traces.

Chapter 3. From univariate to multivariate analysis of fNIRS data: a theoretical

formulation and validation

Figure 3.1: In each iteration, data are simulated, based on either syn-

thetic or real resting-state data; HbO and HbR (red and blue time traces

in the “Synthetic dataset” panel, respectively) were analyzed with GLM

or LDA, and ROC analysis was performed to compare the classification

accuracies. The simulated HRFs vary in shape and size, and 30% of

them are characterized by a “double bump” as a simplified model of

stimulus-locked Mayer waves.

Figure 3.2: (A) Example of how simulated HRFs are created (black

line) and added to a real resting-state time trace (dark grey line). The

top trace is a simple HRF while the bottom trace contains a double

bump. (B) The grey time trace represents the HbO signal before the

hemodynamic activations are added; the red and blue ones are, respec-

tively, the time traces after a simple HRF or a double-bump HRF have

been added.

3.3. Theoretical formulation

3.3 Theoretical formulation

3.3.1 Generation of the synthetic dataset

5000 datasets of NIRS data were iteratively simulated by combining temporally corre-

lated (“colored”) noise and synthetic hemodynamic response functions (HRFs). Base-

line noise was produced by first generating white noise, then imposing temporal corre-

lation on it by employing an autoregressive model of order 30, via tools in the NIRS

Brain AnalyzIR toolbox [65]. Each dataset contained 20 channels, and synthetic HRFs

were added to the resting state for half of them (i.e., Channels 1−10). Channels that

included synthetic HRFs in their HbO and HbR data were labelled as “active” and the

others as “not active”. For the active channels, ‘Start’ and ‘Stop’ markers were created

according to an experimental paradigm with 3 trials, each of 10 seconds duration. The

position of the ‘Start’ markers was randomized, with the constraint that successive

ones were separated in time by at least 35 seconds. The total duration of each time

series was 4 minutes, at a sampling frequency of 7.81 Hz.

To model the inter- and intra-subject variability of real hemodynamic responses, syn-

thetic evoked HRFs had variable size and shape across subjects and channels. While

each HRF had the mathematical form of a canonical HRF [4], their peak amplitudes

ranged from 0.01 to 0.1 µM [101], while, based on experience and existing literature on

the variability of the hemodynamic response [18], the onset-to-peak times ranged from

2 to 8 seconds and the onset-to-undershoot times ranged from 14 to 18 seconds.

Positive-going synthetic HRFs (see Fig.3.2A) were added to the resting-state data for

the HbO time series. The synthetic HRFs added to the corresponding HbR resting-

state data had the same form as those for HbO, but were 50% reduced in magnitude

and reversed in algebraic sign (i.e., they were negative-going). In addition, for 30%

of the active channels in each dataset the synthetic HRF included a “double bump”

(see Fig.3.2A), as an elementary model of systemic hemodynamic activity time-locked

with the experimental condition. An example of this sort of additional activity that

frequently is present in experimental data are the so-called “Mayer waves,” which

are systemic oscillations originating in the superficial layers of the tissues [102], and

which occur, more or less prominently, at ∼0.1 Hz frequencies. Such oscillations are

Chapter 3. From univariate to multivariate analysis of fNIRS data: a theoretical

formulation and validation

particularly difficult to treat in a GLM-analysis framework, owing to their extensive

spectral overlap with typical event-related activity (i.e., they cannot be eliminated via

straightforward frequency filtering) [103].

The use of synthetic baseline noise has the clear benefit that a large volume of data

can be created, and it allows to benchmark the proposed methodology against recent

literature on the topic [5, 6].

However, synthetic data might not capture all the properties of real physiological data.

Therefore we complemented the synthetic-data analysis by using experimental resting-

state data as baseline noise. For this purpose, 15 young adults (mean age ±SD: 28.1

±4.0 years old; age range: 23-38; 11 women, 4 men) participated in the collection of 4

minutes of resting-state data. For a subset of the participants, this measurement was

followed by the motor-task study that was used in a later stage of this analysis (see

Section 2.2.1 for descriptions of the experimental setup and data collection).

Experimental resting-state recordings were used as a source of real physiological and

correlated data, and were employed in the same manner as described above for the

synthetic resting-state data, namely by performing 5000 randomizations of the positions

of the ‘Start’ markers and adding simulated HRFs of variable shapes and amplitudes

to only Channels 1-10 (left hemisphere), and labeling those channels as active.

3.3.2 Data analysis

Pre-processing

Both simulated and real data underwent the same pre-processing steps. Hemoglobin

concentration changes were calculated using the modified Beer-Lambert law (Differ-

ential Pathlength Factor (DPF): 6 , absorption coefficients (µa,cm−1M−1) for HbO:

µa(760 nm) = 1349 and µa(850 nm) = 2436, for HbR: µa(760 nm) = 3565 and µa(850

nm) = 1592).

Data was bandpass filtered in the range [0.01-0.2] Hz, with a zero-phase distortion

digital FIR filter designed and implemented, respectively, with the MATLAB com-

mands firls and filtfilt. For the subsequent statistical analysis, filtered data was used

for the LDA analysis, in accordance with most fNIRS-based BCI literature [88], while

3.3. Theoretical formulation

unfiltered data was used for the GLM computations because it has been reported that

frequency filtering can produce biased estimates of the regressors [5]. In this way, both

methods were used at their optimal settings.

Analysis with GLM

GLM was applied using the autoregressive iteratively reweighted least squares ("AR

IRLS") algorithm available in the NIRS Brain AnalyzIR toolbox. This algorithm is re-

ported to efficiently remove serial correlations from data, thereby achieving an accept-

able false discovery rate [6]. HbO and HbR time traces were analyzed independently.

After the regressors βwere estimated, the null hypothesis that there was no hemody-

namic response (H0: β=0) was tested by defining a contrast vector (c) and calculating

the channel-wise t-statistic via the formula [67]:

t=cTβ

qcTcov(β)c(3.1)

In this case, with only one experimental condition to be tested, the contrast vector

would be [1 0], with the second column referring to the constant column added to the

GLM design matrix. The p-values corresponding to the t-statistics from Eq. 3.1 were

computed via two-tailed t-tests.

Analysis with LDA

For each HbO and HbR time series, trials were defined as the signal in the 15-second

time interval following each ‘Start’ marker. Each trial was baseline-corrected by re-

moving the mean value of the signal over the 3-seconds interval prior to stimulus onset.

Channel-wise block averages were obtained by averaging across all trials within each

channel.

Chapter 3. From univariate to multivariate analysis of fNIRS data: a theoretical

formulation and validation

Figure 3.3: (A): Block averages of HbO (left) and HbR (right) signal

used for feature extraction; dashed lines represent the 1s steps used for

the moving-window computation of amplitude and slope. (B): Features

vectors are obtained from the block averages by computing mean and

slope of the signal over a sliding window of 3 s duration with 1 s steps,

resulting in a 30-features vectors that were then normalized and con-

catenated to produce the 60-features multivariate (HbO + HbR) classi-

fier. Grey lines represent individual trials, black lines highlight the mean

value of the feature vectors corresponding to “active” channels, and blue

lines highlight the mean value of the feature vectors corresponding to

the “not active” channels. Values of the y axis are normalized values

3.3. Theoretical formulation

Features were extracted from the channel-wise block averages (Figure 3.3 A). To do this,

a 3-seconds-wide window was moved through the block-average time series in 1-second

steps and the mean value and mean slope (computed as the change in signal amplitude

over the time window divided by its size in number of samples) were computed within

each window, yielding a 30-features vector (2 features ×15 windows) for each of HbO

and HbR. Each feature vector was normalized to zero mean value and unit variance.

Then the HbO and HbR feature vectors were concatenated, resulting in a 60-features

vector that was used for the classification (Figure 3.3 B).

Channels were classified as active or not active with Regularized Linear Discriminant

Analysis, via tools available in the Berlin Brain-Computer interfacing (BBCI) toolbox

[104, 105]. Ten repetitions of 4-fold cross validation were performed: 20 trials (10 active

channels and 10 not active channels) were separated into 4 folds, with three folds used

for training and the remaining fold as the test dataset. The procedure was repeated

10 times.

Each feature vector x∈RNwas assigned an output by the application of the formula

of the separating hyperplane characterizing the LDA classifier [104]:

wTx+w0=0(3.2)

where wis the projection vector characterizing the classifier and w0is a bias term.

For further detail, see Chapter 2, Section 2.6. In this implementation, the class not

active was assigned to negative or zero outputs and class active was assigned to positive

outputs.

Evaluation of performances

The performances of the GLM analysis and the LDA-based method were evaluated by

computing receiver operating characteristic (ROC) curves.

ROC curves for the GLM results were computed by varying the significance threshold

for the t-test p-values, from 0 to 1 in increments of 0.001, and computing the corre-

sponding false positive rate and true positive rate for each threshold. ROC curves for

the LDA results were computed by comparing the distributions of output values of

Chapter 3. From univariate to multivariate analysis of fNIRS data: a theoretical

formulation and validation

active and not active channels. We defined a significance threshold, varying from 0

to 1 in increments of 0.001, in the following manner: on the distribution of not active

outputs, we defined a reference value as the percentile corresponding to the considered

threshold. We defined as True Negatives (TN) the samples of the not active distribu-

tion that were smaller than the reference value, False Positive (FP) the samples of the

not active distribution that were equal or greater than reference value, True Positives

(TP) the samples of the active distribution that were equal or greater than the refer-

ence value and False Negative (FN) the samples of the active distribution that were

smaller than the reference value. We repeated the procedure by sliding the reference

value until 100% of the not active distribution was covered (i.e., significance threshold

= 1). For example, by setting the threshold at 0.05, we computed the reference value

on the distribution of not active outputs corresponding to its 5% percentile, and based

on this reference value we computed TP, TN, FP, FN at p = 0.05.

For both GLM and LDA, classification accuracy was computed as the rate of correct

classifications, (TP + TN)/(TP + TN + FP + FN), at p = 0.05. In order to investigate

the impact of double-bump HRFs on classification accuracy, we conducted two separate

analyses on the two subsets of data characterized by, respectively, only HRFs with no-

double bumps and only HRFs coupled with double-bumps.

Analysis of the correlation between subjects’ demographics and classifica-

tion accuracy

Demographic information such as gender, age and chronobiology has been reported to

play a role in the cerebral metabolism [106, 107, 108, 109], and therefore we tested for

correlations between the individual-subject classification accuracies and each subject’s

characteristics. To do so, a linear mixed-effects (LME) model was fitted in Matlab

2017, with a random intercept for each participant and fixed effects for age, gender,

hair color (two levels: Blond, Brown), and time-of-day of the measurement (three

levels: 10 AM-1 PM, 1 PM-3 PM, 3 PM-6 PM):

Accuracy ∼Age +HairColor +Gender +T imeMeasurement + (1|P articipant)

3.4. Application of the algorithm to experimental data

This analysis was carried out for LDA (HbO+HbR), GLM (HbO) and GLM (HbR)

separately. Analysis of variance (ANOVA) was performed on each model to test the

significance of the effects (error DF = 10 (15 observations minus 5 modelled effects)).

3.4 Application of the algorithm to experimental

data

To provide a practical example of use of the proposed algorithm and compare it with the

GLM analysis in the framework of a real experiment, a paradigm was chosen—finger

tapping—that has a well-known effect on the motor cortex. In particular, it is expected

to elicit a recognizable and significant response in the primary motor cortex (M1,

Brodmann area 4, likely to underlie the C3/C4 positions of the EEG 10-20 system)

and the premotor cortex (PMC, Brodmann area 6, likely to underlie the FC3/FC4

positions) [110].

3.4.1 Experimental setup and data collection

Seven healthy young adults (a subset of the 15 participants in the preceding study;

mean±SD age 26.0±2.3 yr, age range 23-30 yr; 5 female, 2 male) participated in

this study. The experiment consisted of 16 trials of finger tapping (8 left, 8 right,

alternating), each of 10 s duration, with 20 s rest periods between successive episodes.

Before the experiment began, the subject was required to sit quietly for the collection

of 4 minutes of resting-state data.

NIRS recordings were conducted with a NIRSport system (NIRx GmbH, Berlin, Ger-

many), with sampling frequency 7.81 Hz, at wavelengths 760 nm and 850 nm, with

8 sources and 8 detectors. Sources and detectors were placed into a cap (EASYCAP,

Herrsching, Germany), arranged according to the International 10-20 system. The

source-detector distance was 2.5-3 cm, to form 20 channels evenly distributed between

the hemispheres (Figure 3.4).

Chapter 3. From univariate to multivariate analysis of fNIRS data: a theoretical

formulation and validation

Figure 3.4: (A) 8 sources ×8 detectors montage (covering the motor

region); (B) Source-Detector pairs and corresponding channels

A spatial sensitivity profile was calculated based on the Monte Carlo photon migration

modeling available in the AtlasViewer software [111], to prove that the probe design

was selective for the regions relevant to the finger tapping task (underlying the 10-20

positions FC3/FC4 and C3/C4). The Monte Carlo modeling was performed with 106

photons. Figure 3.5 shows the probe arrangement and the resulting sensitivity profile.

3.4.2 Data analysis

The data analysis aims at identifying which channels are significantly activated by the

motor task and can therefore be labeled active, as opposed to the not active channels

that are not significantly activated by the task. For this reason, no distinction was

made between left and right-hand finger tapping. The data was analyzed with the

GLM analysis and the LDA-based method described in the previous section.

3.4. Application of the algorithm to experimental data

Figure 3.5: (Left) Probe setup. The arrangement of optodes fol-

lows the 10-20 standard and the placement is analogous in the other

hemisphere. Red dots indicate the sources, blue dots indicate the detec-

tors, and yellow lines indicate the formed channels. (Right) Sensitivity

profile of the probe setup. The sensitivity profile has units of mm−1.

Multiplying the value of the sensitivity profile of a given measurement

channel by an absorption change (mm−1) and the area over which that

change occurs (mm2) will give an estimate of the optical density change

that would be measured by that channel as a result of that absorption

change [111]

Chapter 3. From univariate to multivariate analysis of fNIRS data: a theoretical

formulation and validation

Analysis with GLM

For the GLM analysis, the stimulus times of the task were convolved with a canonical

hemodynamic response function (peakTime = 6s) to produce the single column (“Task”

condition) of the design matrix. A GLM was applied using the autoregressive iteratively

reweighted least squares algorithm available in the NIRS Brain AnalyzIR toolbox [65].

Analysis with LDA

For the LDA analysis, amplitudes and slopes were computed for each trial of finger

tapping. For the “Rest” condition, an equal number (n = 16) of time intervals were

produced by randomly sampling the initial 4 minutes of resting-state data of each

measurement, and features were extracted. The sampling of “Rest” trials and the

classification Task vs. Rest was iterated 2000 times for each channel and for each

subject, to ensure robustness of the analysis. Ten repetitions of 4-fold cross validation

were conducted and each of the 32 trials (16 task and 16 rest) was assigned a classifier

output via Equation 3.2.

3.4.3 Statistical analysis

The results of the GLM analysis were statistically assessed by computing the channel-

wise t-statistics (Equation 3.1) from the resulting βvalues, then testing them via

two-tailed t-tests. The outputs of the LDA analysis were divided into “Task” and

“Rest”, then averaged over folds and over repetitions, and tested by comparing the

two distributions (Task and Rest). The rationale of this procedure is that, if the task

elicited a hemodynamic response and the classifier had a good discrimination between

“Task” and “Rest”, then the distributions of the outputs should be well separated and

the channel will be labeled as active. If, on the contrary, the two distributions are not

well separated, it means that for that channel the execution of the task did not elicit a

response substantially different from the resting state, and the channel will be labeled

as not active. As explained in Section 2.1.2, the class label active is assigned to positive

outputs, while not active to the negative outputs. Therefore, the channel-wise p-value

3.5. Results

in this case was computed as the fraction of “Rest” outputs equal or greater than the

mean value of the distribution of the “Task” outputs [112].

3.5 Results

3.5.1 Theoretical formulation

Performance of the algorithms: overall classification accuracies

Our first goal was to theoretically compare the two algorithms in terms of overall clas-

sification accuracy, both on synthetic and on real data. The other important objective

was to evaluate whether, with real data, the achieved results are consistent across sub-

jects, and to evaluate the impact of inter-subject variability on the performance of each

algorithm.

Figure 3.6: (A) ROC curves for GLM and LDA, using HbO, HbR and

HbO+HbR features (only for LDA). The curves for the real resting-state

data (solid lines) are obtained by averaging the individual curves across

subjects, while dotted lines refer to the completely synthetic dataset.

(B) The table reports the mean classification accuracies, over all itera-

tions, of the three algorithms applied to synthetic and real resting-state

data. The classification accuracy is computed from the ROC curves at

the false positive rate of 0.05.

Figure 3.6 A shows the ROC curves obtained using synthetic and real resting-state

data. In both cases the LDA classifier based on HbO+HbR features outperforms GLM

Chapter 3. From univariate to multivariate analysis of fNIRS data: a theoretical

formulation and validation

applied to either HbO or HbR, with results tabulated in Figure 3.6B. A difference

between GLM results for synthetic and real resting-state data is also seen, in that

GLM(HbO) is more accurate than GLM(HbR) in the former case, while GLM(HbR)

is more accurate than GLM(HBO) in the latter.

We speculate that this difference indicates that the synthetic data does not entirely

represent the properties of the real physiological data. For example, it certainly does

not reflect the frequency structure of the resting-state signal, or its spatial dependence

across the different channel positions. In addition, the temporal correlation in the

synthetic data was imposed by using an autoregressive model of fixed order (N =

30) [5], which doesn’t account for the variability that can be found in real data from

different subjects.

To further investigate the performances of the three methods, we computed the classi-

fication accuracies for each subject individually (Figure 3.7 A). The barplots indicate

the classification accuracy as computed from the individual subjects’ ROC curves at

p = 0.05, and the red line indicates the mean accuracy over all subjects, respectively

(mean±SD) 78.76 ±5.1% for LDA(HbO+HbR), 65.76 ±10.2% for GLM(HbO) and

70.29 ±8.9% for GLM(HbR), the standard deviation being computed across the 15

subjects. The individual errorbars represent the standard error of the mean across the

5000 repetitions.

Finally Figure 3.7 B shows the classification accuracies computed on two separate sets

of data: data for all the channels that did not have Mayer waves modelled (i.e., no

“double bumps” [Figure 3.2 A]) and data for all the channels that did have them. In

this case, for the data without Mayer waves we found that LDA achieves an accuracy

of 79.1 ±6%, GLM(HbO) 77.8 ±9.3% and GLM(HbR) 82.4 ±8.2%, while for data

with Mayer waves, the accuracy decreases to 77.0 ±11% for LDA, 62.4 ±7.5% for

GLM(HbO)and 64.8 ±6.8% for GLM(HbR).

3.5. Results

Figure 3.7: (left) Individual classification accuracies for the

real-resting-state datasets, for LDA(HbO+HbR), GLM(HbO), and

GLM(HbR) (left, middle, right). The red line indicates the mean ac-

curacy reached by each algorithm over all the subjects. The errorbars

represent the standard error of the mean for each individual subject, over

all the iterations performed. The individual mean accuracies achieved

by the LDA method is significantly higher than those achieved by the

GLM(HbO) (p = 0.0002) and GLM(HbR) (p = 0.01), but no signifi-

cant difference was found between GLM(HbO) and GLM(HbR) (p =

0.24, Repeated Measures ANOVA 1-way with Fixed Effect: “Analysis

Method”). Also, the individual standard errors of the mean yielded by

the LDA are significantly lower than those achieved by the GLM(HbO)

and GLM(HbR) (p=0.021 and p=0.022, respectively), but no differ-

ence was found between those yielded by GLM(HbO) and GLM(HbR)

(p=0.97). (right) Classification accuracies computed on two subsets

of the real-resting-state datasets, one completely free from Mayer-wave

oscillations and the other one with all the HRFs tainted by double-

bumps. For the LDA, there is no significant difference between the ac-

curacies reached in presence and absence of Mayer waves (paired t-test,

p = 0.44), while for the GLM the difference was statistically significant

(GLM(HbO), p < 0.001, GLM(HbR), p < 0.001) Note: the analysis,

both with LDA and with GLM, was conducted on the two datasets

(with and without Mayer waves) independently

Correlation between classification accuracies and individual measures

The goal of this analysis was to quantitatively assess the impact of individual charac-

teristics (hair color, gender, age), and of the measurement time of day, on the indi-

vidual classification accuracy. Table 3.1 shows the results of the LME analysis. The

model shows a significant correlation between Hair Color and individual accuracies for

Chapter 3. From univariate to multivariate analysis of fNIRS data: a theoretical

formulation and validation

GLM(HbR), but not for any of the other fixed effects in the model, and there are no

significant correlations for either LDA or GLM(HbO). Figure 3.8 reports distributions

of individual accuracies grouped by hair color. More plots of accuracy distributions

grouped by the other effects are provided in the Appendix (Figures A.1, A.2, A.3).

LDA:HbO + HbR GLM:HbO GLM:HbR

ßp-value ßp-value ßp-value

Age 0.0042 0.1762 0.0010 0.8734 0.0068 0.1120

Hair Color 0.0361 0.1875 -0.0936 0.1020 -0.1408 0.0052

Gender 0.0038 0.1476 -0.0077 0.1552 -0.0026 0.4904

Time of Measurement -0.0187 0.5109 0.0573 0.3308 -0.0072 0.7540

Table 3.1: Results of the linear model fitted to the individual classifi-

cation accuracies, with fixed effects: Age, Hair Color, Gender and Time

of Measurement

Figure 3.8: Distribution of individual classification accuracies within

the two hair color classes (six blond and nine brown). The accura-

cies reached by the GLM(HbR) are significantly higher for blond-haired

subjects than brown-haired. More distributions for the other modelled

effects can be found in the Appendix. The central red marks represent

the median values, the blue boxes extend from the 25th to the 75th per-

centiles, and the black whiskers extend to the most extreme data points

not considered outliers (which are marked with red crosses).

3.5.2 Experimental results

Results from the experimental-data study are reported in Figure 3.9, as t-statistic

values for GLM(HbO) and GLM(HbR), and classifier outputs for LDA, for p≤0.05

3.5. Results

threshold. White cells indicate that the channel did not reach statistical significance.

Figure 3.9: Classifier results for the finger-tapping experimental data,

for the three different analyses. GLM t-statistic values and LDA clas-

sifier outputs (the latter derived from application of the separating hy-

perplane formula) are thresholded at p≤0.05. Blank cells indicate non-

significant values (i.e., that the corresponding channel was classified as

not active). The individual minimum value for statistical significance

for the results of the LDA classifier varied across channels, ranging from

0.12 to 1.18. The numbers of channels classified as active by the three

analyses are significantly different (p = 0.01, 1-way repeated measures

ANOVA).

To better understand the source of the variability in the results, plots of the block-

averaged trials for those channels, and topographic images of the channel-wise output

values (LDA output values, and βs for GLM HbO/HbR), were produced for each

subject. The images were produced via the visualization tool available in nirsLAB

v2017.06 [66].

Plots for all subjects and corresponding output values are available in the Appendix,

while here only the plots for subject 1 and subject 2 are reported.

Subject 1: Subject 1 results are non-significant at p = 0.05 for every channel according

to the GLM(HbO) analysis, and significant for channels 1, 4 and 16 in the GLM(HbR)

analysis, while the great majority of channels are classified as active (p<0.05) by the

LDA classifier.

In the plots of Table 3.10, we observe that the HbO time traces are greatly affected

by the double-bump typical of the 0.1 Hz systemic oscillation, and also that the first

peak after stimulus occurs earlier than the onset-to-peak time of the theoretical model

(6 sec).

Chapter 3. From univariate to multivariate analysis of fNIRS data: a theoretical

formulation and validation

An enlarged depiction of the block-average behaviour for Channel 16 is shown in Figure

3.11, together with the plots of the HRF model used in the GLM analysis and the

block averages of the resting-state trials used by the LDA classifier. The resting-

state HbO trace also includes a feature that is qualitatively similar to a hemodynamic

response, but the task response is correctly discriminated from the resting-state time

series nevertheless (p<0.001).

3.5. Results

Figure 3.10: Topographic images and block averages for all the analyses on Subject 1

Chapter 3. From univariate to multivariate analysis of fNIRS data: a theoretical

formulation and validation

Subject 2: All channels of Subject 2 are classified as active (p = 0.05) in the GLM(HbO)

analysis, while 6 of 20 are classified as active by the GLM(HbR) analysis, and 19 of 20

by the LDA classifier (3.13). A depiction of Channel 5 is presented in Figure 3.12).

Figure 3.11: Block-average data for Subject 1, Channel 16. On the

left the plot of the averaged signal is accompanied by the plot of the

model used by the GLM analysis, namely a canonical HRF with peak

time = 6s. On the right, the same plot is accompanied by the plot of an

example of average of resting state trials against which the task trials

are classified.

Figure 3.12: Block-average data for Subject 2, Channel 5.

3.5. Results

Figure 3.13: Topographic images and block averages for all the analyses on Subject 2

Chapter 3. From univariate to multivariate analysis of fNIRS data: a theoretical

formulation and validation

Corresponding results for all subjects can be found in the Appendix. For each subject,

a table is reported with:

•Topographic images of channel-wise GLM βvalues and LDA classifier outputs.

Large positive (or negative) βvalues indicate a good fit of the GLM model to the

HbO (or HbR) data, and a correspondingly better chance of that channel having

a statistically significant hemodynamic response. LDA outputs are negative if

the channel is classified as not active and positive is the channel is classified as

active. Therefore, a large positive classifier output value indicates a good chance

that the channel is labeled as active.

•Block averages of the signal in response to the stimulus (read and blue curves for

HbO and HbR, respectively). The shaded error bars indicate the standard error

computed over the experimental trials. The GLM plots are accompanied by the

canonical basis function used by the model; the LDA plots are accompanied by

the block averages of the resting-state trials. The block averages are shown only

for the channels covering the motor cortex.

3.6 Discussion

The statistical analysis of fNIRS data is often complicated by serial correlations, inter-

subject variability of the hemodynamic response, and the presence of systemic oscilla-

tions and possibly motion artifacts. The study presented in this paper demonstrates

that a data-driven approach (linear discriminant analysis, LDA) to data analysis is

more robust than the most commonly employed model-based approach (general linear

model, GLM) to many of these issues, and can therefore improve the detection of the

hemodynamic activation.

Advantages of the proposed LDA approach are that no assumptions on the structure

of the noise are necessary, and that no prior knowledge of the shape of the expected

hemodynamic response is assumed. The LDA method compares data from different

temporal segments of the same recording; namely, it compares, within the same subject,

time intervals corresponding to the resting state and to execution of the task. Thus

it constitutes a self-referencing approach, and in other fNIRS imaging contexts it has

3.6. Discussion

been shown that this data-analysis strategy can enhance detectability of effects that

are small in comparison to other sources of intra- and inter-subject variance [113]. As

such, LDA can generate information potentially superior, or at least complementary,

to the information yielded by a model-based approach. For example, if LDA recog-

nizes activation where the GLM doesn’t, it could mean that the GLM model does not

accurately represent the real HRF, and it might be worth investigating why this is so.

An additional strength of the multivariate LDA classifier proposed in this study is

that it combines features from the simultaneous variations in HbO and HbR time

series, while the GLM approach analyzes them independently. This results in the

former yielding a univariate channel-wise metric for “activation”, while the latter yields

separate beta coefficients for HbO and HbR. Performing statistical tests on a single

metric is highly desirable, especially for group-level studies.

To quantify and compare the classification performances of the three methods, we

made use of both synthetic and real resting-state data. The use of synthetic data,

for which the ground truth is known with certainty, also allowed us to benchmark our

methodology against recent literature regarding GLM classification accuracy [6].

The multivariate LDA classifier yielded greater classification accuracy than GLM, for

both the synthetic and real resting-state data (78.7% for LDA, 65.76 for GLM(HbO)

and 70.3% for GLM (HbR), in the real resting-state data case [Figure 3.6A]). Moreover,

we demonstrated that the LDA had less inter-subject variability, as illustrated in Figure

3.7, where the standard deviation of individual results was 5.1% about the mean for

LDA, as opposed to 10.2% for GLM(HbO) and 8.9% for GLM(HbR). In addition,

the linear mixed model fit of individual-subject accuracies to predictors Age, Hair

Color, Gender, and Time of Measurement revealed a significant effect only for Hair

Color and only on the accuracy achieved with GLM(HbR) (blond-hair accuracy >

brown-hair accuracy). The latter findings show that the observed differences between

accuracies of the model-based and data-driven approaches is not simply accounted for

by obvious (and easily absorbed into the classification model) demographic or physical

characteristics of either the subject (e.g., gender) or the measurement (e.g., time of

day).

The LDA results also are less sensitive than those for GLM to the “double bumps”

that were used to approximate Mayer waves synchronized with hemodynamic task

Chapter 3. From univariate to multivariate analysis of fNIRS data: a theoretical

formulation and validation

responses (in the presence of double bumps, classification accuracy falls from 79.1 ±6%

to 77.0 ±11% for LDA, from 77.8 ±9.3% to 62.4 ±7.5% for GLM(HbO), and from

82.4 ±8.2% to 64.8 ±6.8% for GLM(HbR) [Figure 3.7B]). These results confirm that

the GLM, at least when used with a fixed basis function for all subjects, as was the

case here, is less successful than LDA at picking up individual variability and atypical

activation patterns, and is at risk of false negatives. These results suggest the possibility

that the model used does not accurately represent the real hemodynamic responses and

that therefore a different model would need to be designed. In this respect, the results

of the one analysis can be used in support of interpreting the results of the other.

As a control study, additional simulations were performed to identify the classification

performance of the LDA-based method when applied to data that did not actually

contain any task-induced responses (either real or simulated) in the “Task” time in-

tervals. The classifier performed at chance level in these cases (results not shown),

suggesting that the possibility of false-positive results in the analyses of simulated and

real task-response data is not an important concern.

To further understand and characterize the performances of the two pipelines in a real

application, we used the two methods to analyze data from a motor experiment. The

optode array covered the motor cortex and its vicinity on both hemispheres. Eight out

of twenty channels were placed over the scalp positions most likely to cover the motor

area. In this scenario we could verify that the LDA-based classifier is less susceptible

to than GLM to 0.1 Hz systemic oscillations. This is illustrated for the subject con-

sidered in Figure 3.10: due to the systemic oscillations, the block-average HbO and

HbR traces in the eight channels over the motor cortex differ from the hemodynamic

response modeled in the GLM computations. Consequently, the GLM recognizes none

of these channels as “active”. Conversely, by contrasting “Task” temporal segments

with “Rest” temporal segments, the LDA classifier finds significant differences in all of

these channels, regardless of the presence of Mayer waves.

On the other hand, the LDA-based method generally classified more channels as active

in response to the motor task than the canonical GLM analysis did. Because no estab-

lished ground truth exists in the real data, this classification result warrants cautious

interpretation. Especially for channels that extend beyond the center of the motor

cortex, the activations found cannot be unambiguously attributed to neural activation

3.6. Discussion

caused by the motor task. However, the resting-state data classification results, and

inspection of the experimental hemoglobin time traces, show that these classification

results also are not easily dismissed as false positives.

In fact, as shown in recent literature [26], the fNIRS signal is not composed exclusively

of cerebral task-evoked signal but also includes cerebral non-evoked signal (“cerebral

resting state”), extracerebral task-evoked signal (“extracerebral confound”), and ex-

tracerebral non-evoked signal. Quantitative characterization of the three latter com-

ponents is still an open research question [25], which is why they could not be modeled

separately in our simulations. However, they are likely to be present in the motor

experiment data and offer a plausible explanation for the classification results: when

a difference is found between “Task” and “Rest”, not only the cerebral response to

the task, but also all the systemic hemodynamic changes provoked in the extracerebral

compartment by the execution the task (e.g., changes in heart rate, blood pressure, res-

piration rate), is discriminated from the resting state. These changes involve the whole

extracerebral layer, and therefore their effect extends beyond the probes that specifi-

cally illuminate the motor cortex [26]. To exclusively associate the found activations

with cerebral recruitment, a step that would be necessary, but is beyond the scope of

the present work, would be to remove from the data the physiological component mea-

sured in the extracerebral layers before the analysis, for example with multi-distance

NIRS measurements [59].

Finally, it is worthy of note that the experimental data, in agreement with the results

of the theoretical simulations, reveal great inter-subject variability in the comparative

sensitivities of GLM(HbO) and GLM(HbR). That is, some subjects’ hemodynamic

patterns are better interpreted, and hemodynamic task responses more detectable,

using HbO data, while others’ are better explained using HbR. This is a manifestation

of the highly subject-specific hemodynamic fingerprint that has been reported [103]. A

classifier, such as the one proposed, that takes into account simultaneous variations of

both hemoglobin components has the potential to overcome this limitation and offer a

more flexible analysis that adapts to the individual’s own hemodynamic characteristics.

In this respect, the proposed approach would be of especial application for populations,

such as young children, that exhibit “atypical” patterns of hemodynamic responses,

such as uncoupled HbO and HbR or inverted response direction [114, 115].

Chapter 3. From univariate to multivariate analysis of fNIRS data: a theoretical

formulation and validation

Nevertheless, in the current form the proposed approach only classifies “activations”

vs “non-activations.” As a future development, a non-parametric framework can be

formulated in order to test more complex hypotheses on the distributions of classifier

outputs, such as the comparison of amplitudes of the responses induced by different

conditions, within subjects or between groups of subjects.

3.7 Lessons learned

•With the proposed LDA-based approach, no assumptions on the structure of the

noise are required, and no prior knowledge of the shape of the HRF is expected.

•The proposed LDA-based classifier combines features from the simultaneous vari-

ation of HbO and HbR.

•LDA yielded classification accuracies that were, both overall and at individual

level, higher than those yielded by the GLM analysis; also, they were more stable

across subjects, i.e. less susceptible to individual variations, and less affected by

the presence of systemic oscillations.

•When employed in the context of a motor experiment, LDA classified as active

channels that extend beyond the motor cortex; the reason for this result could

likely be found in the systemic hemodynamic changes provoked in the extracere-

bral compartment by the execution of the motor task, that are picked up by the

classifier and used for the “task vs rest” classification.

•Future work should include the removal of the physiological confound from the

data, by means of short-distance channels. This procedure would allow to exclu-

sively associate the found activations with cerebral recruitment.

Chapter 4

Investigating the effect of literacy

on the functional organization of

the brain with univariate analysis

of fNIRS data

4.1 Introduction

Reading is a complex cognitive task. It is usually acquired during childhood, it requires

explicit effort and training, and it relies on a whole range of brain functions in order

to develop successfully.

In particular, it has been shown that the acquisition of literacy skills is associated

with the activation of a specific brain network that includes the temporo-parietal,

occipito-temporal and inferior-frontal areas of the brain [116]; furthermore, fMRI stud-

ies demonstrated that as the development of these skills progresses, activation of these

areas becomes less and less bilateral: while it is common in beginning readers to “em-

ploy” both left and right hemisphere (LH and RH, respectively), skilled reading involves

mainly the LH only [117, 118].

This knowledge is very useful: if a “typical-development trajectory” can be clearly

charted, then it is also possible to identify deviations from it and to characterize the

Chapter 4. Investigating the effect of literacy on the functional organization of the

brain with univariate analysis of fNIRS data

neurobiological features of reading disability, or to follow up a remediation program

for reading skills improvement.

Studies have further shown that, in addition to brain changes, reading skills can also be

predicted at behavioural level, by how well a child can read “pseudo-words”, namely

words that are phonologically but not lexically possible: pseudo-words – or “non-

words” - are often used in conjunction to real words in experiments of developmental

psychology because they allow testing separately phonological and lexical skills. It has

been shown that while both skilled and less skilled readers can repeat real words, less

skilled readers fail to repeat pseudo-words [52, 119].

The knowledge about the changes in brain organization associated with reading devel-

opment and about the predictive power of behavioural measures lay the foundations

of this project.

On one hand, neuroimaging has had a crucial role in the insights we have today about

the brain organization changes, and this is especially true for fMRI; however, there

would be several advantages of using fNIRS to follow these brain changes: its ease-

of-use, low operating costs and portability would facilitate greatly the diagnosis of

reading disability, that could be formulated in a more child-friendly setting and using

ecological conditions; but especially it would make possible a tight-scheduled follow-up

of the treatment, since there are no burdensome costs associated with each session, as

it is the case for fMRI.

For these reasons, the first goal of this project was to ascertain whether it is possible

to gain with fNIRS the same insights about brain organization changes associated to

reading development that have gained with fMRI.

On the other hand, the knowledge about the predictive power of certain behavioural

measures like pseudo-word repetition can be also be explored to see how this measure

correlates with hemodynamic patterns at different levels of literacy skills.

The ultimate ambition of this project is establishing an “fNIRS-based predictor” of

literacy: if it is possible to observe developmental changes in brain organization and also

to verify that non-words elicit different brain patterns depending on reading skills, then

we have a simple yet effective tool for facilitating the diagnosis of reading difficulties.

This project is the object of this Chapter and of Chapter 5.

4.2. Methods

In particular, in this Chapter we apply a standard processing routine, that includes

the use of a General Linear Model at subject-level, to respond to the two questions:

can fNIRS see the same as fMRI does? And can fNIRS measure different hemody-

namic patterns depending on reading skills? The first question could be successfully

addressed: as it is illustrated in the Chapter, it was possible to observe a clear shift in

the cerebral recruitment, from bilateral to left-hemispheric, as reading skills develop.

As for the second question, however, we will see that it was not possible, with the

employed analysis approach, to highlight any differential activation to non-words de-

pending on reading skills. For this reason, the focus of the next chapter will be to use

an alternative procedure based on Multivariate Pattern Analysis to try and respond to

the same question.

4.2 Methods

4.2.1 Participants

Eighty-three (83) healthy children (52 males, 31 females) between the ages of 3.7 and

7.9 years (M: 5.6, S: 1.05) participated in this study. Participants were recruited from

New Haven and the surrounding areas of Connecticut. The experimental protocols were

approved by the Yale Institutional Review Board. Participants were native English

speakers. Children who had a formal diagnosis of a developmental disorder did not

meet eligibility criteria for this study.

Measure

n 83

Age (years, M=mean, S= standard deviation) 3.7÷7.8 (M: 5.6, S: 1.05)

Gender (Male:Female) 52:31

IQ 73÷134 (M:110.3, S:13.6)

Letter-Word Decoding (WJ) 4÷55 (M:25.8, S: 12.3)

Table 4.1: Participant’s summary information

Chapter 4. Investigating the effect of literacy on the functional organization of the

brain with univariate analysis of fNIRS data

4.2.2 Behavioural assessment

Behavioural testing sessions assessed children’s reading abilities. In particular, the as-

sessment was performed using the standardized Woodcock-Johnson III Tests of Achieve-

ment (WJ-III; Woodcock, McGrew, and Mather, 2001).

WJ represents a comprehensive battery of psycho-educational tests aimed at measuring

general intellectual abilities, specific skills, oral language and academic achievement

across a wide age range. Out of the comprehensive collection of tests available, we

focused on the “Letter-Word Identification” (WJ-LW): in this sub-test, participants

are presented with written words of increasing difficulty, and they are required to

read them out loud. The raw score represents the number of words that were read

correctly. As simple as it is, the raw WJ-LW identification is an effective score of

reading abilities. For this reason, it was used in this study as a measure of literacy

skills of the participants and will be called from now on just “Reading Score”.

Additionally, children’s verbal and performance intelligence quotient (IQ) was mea-

sured using the Wechsler Abbreviated Scale of Intelligence (WASI-II or WPPSI-IV).

4.2.3 Experimental design

Participants were instructed to passively listen to 16 blocks of auditory stimuli, played

though headphones while looking at a fixation cross that appeared on a monitor.

Each auditory condition consisted of 8 blocks, with each block consisting of the repeated

sequence of one word or non-word. Non-words conformed to the phonological properties

of English, but had no meaningful referent. The duration in each block was 7 s and

contained 6 repetitions of the same word or non-word, with 100 ms of silence between

each repetition. There was a 13 s rest period between each block and the order of

blocks (word vs. non-word) was randomized.

Throughout the exposure to these stimuli, participants’ fNIRS signals were measured.

4.2. Methods

Figure 4.1: Structure of the experimental design

4.2.4 Data acquisition

4.2.5 Optode localization

A Patriot 3D Digitizer (Polhemus, Colchester, VT) was employed to measure for each

participant the anatomical locations of optodes in relation to standard head landmarks.

For each channel, the corresponding Montreal Neurological Institute (MNI) coordinates

were obtained using functions available in the NFRI toolbox [120], and used to perform

anatomical labelling, according to the Talairach atlas [121, 110].

The anatomical location of each channel can vary sensibly across different subjects,

due to individual anatomical differences, and this can affect the group level results. To

account for this effect, median MNI coordinates across participants were calculated for

each channel (Table 4.2), and were later used as described in Section 4.2.7. With the

MNI coordinates it was possible to determine the anatomical location of each channel,

using the Talairach atlas [110].

Ch# MNI coordinates Brodmann Area Pr(%)

X Y Z

1 -65.6 -52.33 -14.83 37 - Fusiform gyrus 64.8

2 -70 -28.33 -16.5 21 - Middle Temporal gyrus 97.5

3 -65.5 -8.33 -23.5 21 - Middle Temporal gyrus 58.7

4 -35.6 62.83 -7 10 - Frontopolar area 79.1

5 -12.3 72 -1.5 10 - Frontopolar area 98.8

6 17 72 -0.5 10 - Frontopolar area 100

7 39.3 63.33 -6.5 10 - Frontopolar area 87.3

Chapter 4. Investigating the effect of literacy on the functional organization of the

brain with univariate analysis of fNIRS data

8 68.17 -6.33 -21.33 21 - Middle Temporal gyrus 81.6

9 72 -26.5 -14.83 21 - Middle Temporal gyrus 98.7

10 67 -48.67 -15.17 37 - Fusiform gyrus 54.8

11 -62.3 -61 -5.5 37 - Fusiform gyrus 84.7

12 -70 -39.33 -2 21 - Middle Temporal gyrus 88.8

13 -70 -17 -6.33 21 - Middle Temporal gyrus 97.5

14 -62 2.3 -19 21 - Middle Temporal gyrus 100

15 -49.3 45.83 -5.33 47 - Inferior prefrontal gyrus 98.5

16 -25.3 67 9.34 10 - Frontopolar area 100

17 5 70 14.67 10 - Frontopolar area 100

18 28.83 67.67 10.17 10 - Frontopolar area 100

19 52 47.33 -2.83 47 - Inferior prefrontal gyrus 78.3

20 64 4.33 -14.83 21 - Middle Temporal gyrus 100

21 72 -14.5 -3.33 21 - Middle Temporal gyrus 83.5

22 72 -37.67 -0.83 21 - Middle Temporal gyrus 64.2

23 64 -58.5 -5.67 37 - Fusiform gyrus 93.1

24 -67.17 -50.33 9.67 22 - Superior Temporal Gyrus 65.8

25 -69 -26.33 13.5 42 - Primary and Auditory Association Cortex 98.7

26 -66.5 -5.33 7.83 22 - Superior Temporal Gyrus 93.2

27 -56.67 25.17 3.33 45 - pars triangularis Broca’s area 56.8

28 -12 66.67 24.83 10 - Frontopolar area 100

29 15.67 67 25.83 10 - Frontopolar area 100

30 60 27.67 7.67 45 - pars triangularis Broca’s area 96.3

31 68 -3.33 9 22 - Superior Temporal Gyrus 96.1

32 71.17 -25.33 13.17 42 - Primary and Auditory Association Cortex 97.5

33 69 -48.17 10.83 22 - Superior Temporal Gyrus 88.3

34 -63 -59.33 21.33 39 - Angular gyrus, part of Wernicke’s area 50.7

35 -68 -36.5 30.67 40 - Supramarginal gyrus part of Wernicke’s area 100

36 -67 -12.33 27.33 1, 3 - Primary Somatosensory Cortex 50.5

37 -61 11.33 14.17 44 - pars opercularis, part of Broca’s area 100

38 -51 41 14.33 46 - Dorsolateral prefrontal cortex 98.6

39 -24.7 54.83 34.33 9 - Dorsolateral prefrontal cortex 90.6

40 1.67 57.17 39.67 9 - Dorsolateral prefrontal cortex 100

4.2. Methods

41 27.33 55 34.67 9 - Dorsolateral prefrontal cortex 98.2

42 53 40.33 18.67 46 - Dorsolateral prefrontal cortex 100

43 63 13.83 17 44 - pars opercularis, part of Broca’s area 90.7

44 69 -9.33 28.5 6 - Pre-Motor and Supplementary Motor Cortex 47.4

45 70 -34.67 29.5 40 - Supramarginal gyrus part of Wernicke’s area 100

46 64 -58.33 21 39 - Angular gyrus, part of Wernicke’s area 44.7

47 -64 -46.5 39.67 40 - Supramarginal gyrus part of Wernicke’s area 100

48 -65 -22.5 39.33 1,2,3 - Primary Somatosensory Cortex 98.8

49 -63 1.83 31.33 6 - Pre-Motor and Supplementary Motor Cortex 97.3

50 -53 30.33 26.67 46 - Dorsolateral prefrontal cortex 95.7

51 -37.67 44.17 36.33 9 - Dorsolateral prefrontal cortex 96.2

52 -12.33 47.67 49.67 8 - Includes Frontal eye fields 100

53 14.33 47.33 50.5 8 - Includes Frontal eye fields 100

54 38.67 44.67 38 9 - Dorsolateral prefrontal cortex 100

55 55 29.67 28.5 46 - Dorsolateral prefrontal cortex 98.6

56 65 3.33 33.33 6 - Pre-Motor and Supplementary Motor Cortex 87.7

57 67.5 -22.5 40.67 1 - Primary Somatosensory Cortex 50.6

58 65.17 -47 38.67 40 - Supramarginal gyrus part of Wernicke’s area 100

Table 4.2: The X, Y, and Z columns represent median MNI coordinates

across subjects. Anatomical labelling of the MNI coordinates was con-

ducted using the Talairach atlas. The last column lists the atlas-based

probability that the channel coordinates are within that anatomical loca-

tion. In this table only the Brodmann area with the highest probability

is displayed; the complete table is available in the Appendix.

4.2.6 Pre-processing

Data were pre-processed using custom scripts written in MATLABTM. In particu-

lar, raw data were converted into optical density and then concentration changes of

oxy- and deoxyhemoglobin (HbO and HbR) using the modified Beer-Lambert Equa-

tion (∆HbO =−1.488 ×∆A780 +0.5970 ×∆A805 +1.4847 ×∆A830,∆HbR =1.845 ×

∆A780 −0.2394 ×∆A805 −1.0947 ×∆A830 , with A780, A805 and A830 being the op-

tical absorbances at 780, 805 and 830 nm, respectively).

Chapter 4. Investigating the effect of literacy on the functional organization of the

brain with univariate analysis of fNIRS data

After conversion into HbO and HbR changes, time traces were screened to detect and

correct motion artifacts. This step was performed using the Wavelet-based algorithm

described in [55] and available in Homer2 [64], and applied with parameter threshold

=1.5 and order N=2 Daubechies wavelet. After the motion artifacts correction, the

data quality of the channels was evaluated by computing the coefficient of variation

(CV) of each channel, defined as the standard deviation of the timeseries divided by its

mean value. Channels having CV > 8% for either HbO or HbR were discarded from

the analysis. This criterion was defined by visual inspection.

The next step was to remove from the data the contribution of global systemic effects

originating in the superficial layers of scalp, dura, and peripheral vasculature and not in

the cortex [26]. To this end, a spatial filter based on principal component decomposition

[58] was applied to both HbO and HbR time traces.

Finally, a band-pass frequency filter was applied to the data in the range 0.01-0.2 Hz,

to ensure that the time traces did not contain contributions from either the very low

frequency systemic fluctuations, the respiration, or the heart rate oscillations. The

filter was designed as a zero-phase digital FIR filter with the MATLAB function filtfilt.

4.2.7 Statistical analysis

The statistical analysis at Subject Level was performed with a General Linear Model

(GLM), implemented via tools in the NIRS Brain AnalyzIR Toolbox [65]. In particular,

the design matrix was obtained by convolving the stimuli times with a Canonical

Hemodynamic Response Function (onset-to-peak time= 6 s, undershooting time= 16

s, duration= 32 s), and it was fit to the timeseries using the “iteratively reweighted

least squares autoregressive model” presented in [6] and available in the toolbox. This

implementation has been shown to have a good control of type-I-error rate [5].

This analysis resulted in two βcoefficients for each experimental condition (Word,

Non-Word).

To account for the fact that the anatomical location of each channel varies across sub-

jects, beta values were projected into the MNI standard brain space, namely, they were

normalized via linear interpolation according to the distance between each subject’s

4.3. Results

actual channel location and the median location of that channel across all subjects

(Table 4.2), according to the procedure described in [122]. This step ensured beta

values could be correctly compared at group-level.

For each channel independently, the Group Level Analysis was carried out by fitting a

2x2 Linear Mixed Model on the ß coefficients resulting from the Subject Level Analysis,

with between-subject factor Group (Non Readers, Readers) and within-subject factor

Stimulus Type (Word, Non Word), random intercept Participant and random slope

Stimulus Type. The model was implemented with the MATLAB function fitlme.

On each model, an Analysis of Variance (ANOVA) was performed aimed at quantify-

ing the main effects of Group and Stimulus Type and the interaction effect Group x

Stimulus Type. Additional T-tests were conducted only on the channels that resulted

significant (p < 0.05) from the previous step.

4.3 Results

4.3.1 Mixed-Effects Linear Model

The results of the channel-wise Mixed Effects Models are reported in Table 4.3, for

both HbO and HbR. To avoid type I errors due to multiple comparisons, p-values were

adjusted using the False Discovery Rate procedure by Benjamini and Hochberg [84].

Both HbO and HbR analyses showed a significant effect of Group in the Right Hemi-

sphere, especially in those channels pertaining to the Middle Temporal Gyrus (BA: 21,

channels 8, 9, 20, 21, 22), Fusiform Gyrus (BA: 37; channel 10, 23), Superior Temporal

Gyrus (BA: 22; channels 31, 33, 46), Supramarginal and Angular Gyri (“Wernicke’s

area", BA:39, 40; channels 45, 46) and the Pars Opericularis and Triangularis parts of

Broca’s area (BA: 44, 45, channels: 30, 43), but only the analysis of HbR results for

channels 32, 33, 44, 45, 46 kept statistical significance after FDR correction.

Only for these channels, additional two-sided t-tests were carried out to assess the

direction of the effect and it was found that in all of them Pre-Readers exhibit a

hemodynamic activation higher than Readers (Table 4.4).

Chapter 4. Investigating the effect of literacy on the functional organization of the

brain with univariate analysis of fNIRS data

Main Effect: Group (Non Readers vs Readers)

HbO HbR

Channel F DF1p pF DR F DF1p pF DR

87.22 203.4 0.0078 0.1130 0.75 243.0 0.3881 0.8658

94.69 231.0 0.0314 0.1861 2.30 231.0 0.1310 0.4468

10 0.94 219.0 0.3342 0.6057 6.11 219.0 0.0142 0.0634

20 7.94 243.0 0.0052 0.1011 1.10 243.0 0.2958 0.8658

21 8.07 243.0 0.0049 0.1011 5.17 243.0 0.0238 0.0987

22 4.79 234.0 0.0296 0.1861 6.37 201.9 0.0124 0.0634

23 4.08 213.0 0.0447 0.2159 7.89 187.5 0.0055 0.0510

31 8.07 246.0 0.0049 0.1011 6.72 199.9 0.0102 0.0594

32 6.77 246.0 0.0098 0.1140 8.82 203.4 0.0033 0.0388

33 4.62 234.0 0.0326 0.1861 8.89 191.6 0.0032 0.0388

43 5.80 237.0 0.0168 0.1621 6.21 170.9 0.0137 0.0634

44 4.48 237.0 0.0353 0.1861 10.11 175.1 0.0017 0.0338

45 3.65 243.0 0.0573 0.2555 10.88 191.8 0.0012 0.0338

46 4.59 228.0 0.0332 0.1861 10.19 186.1 0.0017 0.0338

56 2.44 234.0 0.1195 0.4348 7.43 159.5 0.0071 0.0510

57 1.30 225.0 0.2558 0.5506 7.21 180.3 0.0079 0.0510

58 2.34 231.0 0.1274 0.4348 7.36 178.3 0.0073 0.0510

Main Effect: Stimulus Type (Word vs NonWord)

1 4.92 180.5 0.0278 0.4744 4.11 180.9 0.0440 0.3070

25.14 185.2 0.0246 0.4744 4.11 179.4 0.0442 0.3070

35.73 177.1 0.0178 0.4744 4.17 168.1 0.0427 0.3070

11 3.93 182.7 0.0491 0.4744 3.65 182.5 0.0577 0.3070

12 4.03 181.9 0.0462 0.4744 4.14 177.7 0.0435 0.3070

13 4.31 189.0 0.0393 0.4744 4.81 172.9 0.0297 0.3070

24 3.49 190.4 0.0632 0.5161 4.16 187.9 0.0427 0.3070

Interaction Effect: Stimulus Type x Group

—

Table 4.3: Channels showing a significant (p≤0.05) effect of Group,

Stimulus Type or Stimulus Type x Group, before FDR correction. The

formula employed for the model was ’beta ∼stim*group + (stim|id)’;

categorical variables were coded with the “effects” coding, that means

that the sum of the dummy coefficients amounts to zero. This is also

known as deviation coding and it is appropriate for testing main effects

and interaction effects [123]

4.3. Results

Channel t degrees of freedom p

32 -2.62 244 0.0092

33 -2.82 232 0.0052

44 -3.23 235 0.0014

45 -3.37 241 0.0009

46 -3.14 226 0.0019

Table 4.4: Results of two-sided t-tests on HbR activation values be-

tween Pre-Readers and Readers in channels 32, 33, 44, 45 and 46. The

negative t-values indicate that the mean of activation in Pre-Readers is

significantly lower than the mean of activation in Readers, in each of

the tested channels.

Figure 4.2: Bar plots of activation values averaged across conditions,

channels (32, 33, 44, 45 ,46) and subjects within each group; bar plots

are reported for both hemoglobin forms to show that, although the test

did not prove statistically significant on HbO values, they nevertheless

mirror the HbR results, in that they both indicate a higher hemody-

namic response in these RH channels in Pre-Readers as compared to

Readers.

Chapter 4. Investigating the effect of literacy on the functional organization of the

brain with univariate analysis of fNIRS data

As for the main effect of Stimulus Type, the Mixed-Effects models showed that a few

channels, all located in the LH, mainly associated with the Middle Temporal Gyrus

(BA: 37, channels 1, 2, 3, 11, 12, 13) are significantly associated with a differential

activation between Words and Non Word, but this effect does not hold when the FDR

correction is applied. Therefore, these channels were not followed up to. Finally, none

of the channels resulted statistically significant for an interaction effect of Stimulus

Type x Group, as opposed to what initially was hypothesized.

4.3.2 Block averages

To visualize the time course of activation, blocks were averaged from the onset of the

presentation of the stimulus with a duration of 15 sec. A period of half a second before

the onset was used for baseline correction.

To highlight the Group effect resulting from the statistical analyses, blocks were av-

eraged over subjects within each group, irrespective of stimulus type, and these block

averages are shown in Figure 4.3 along with the scalp map of F-values reported in

Table 4.4. The shaded errorbars in the block averages’ plots represent standard errors

of the mean across subjects within each group. Scalp plots were produced with the

NFRI toolbox [85, 120].

4.3. Results

Figure 4.3: (center) Scalp plot of F-values resulting from the ANOVA

test on the results of the Linear-Mixed Effects model on HbR beta val-

ues. From this analysis, a main effect of Group resulted statistically

significant for channels 32, 33, 44, 45 and 46, whose block averages are

reported around the scalp plot. The plots confirm that in these channels,

hemodynamic activation is greater in Pre-Readers than in Readers.

Chapter 4. Investigating the effect of literacy on the functional organization of the

brain with univariate analysis of fNIRS data

4.4 Discussion

Cortical activation to auditory stimuli is known to be related to changes in brain

organization as children develop their reading abilities [52]. In particular, we know

from fMRI studies that, while beginning readers employ a wide network of brain areas

distributed over both hemispheres, the advancement of reading skills is associated with

a shift in the recruitment of brain regions towards the left hemisphere (LH). Also, we

know that while skilled readers are able to repeat effectively both words and non-words,

less skilled readers have a hard time with the latter ones.

With these two concepts in mind, the ultimate ambition of this project is to explore

the feasibility of developing an “fNIRS-based predictor” of literacy: namely, we aim at

investigating whether it is possible to assess someone’s reading skills based solely on the

analysis of their brain patterns in response to words and non-words. In order to do this,

we had first of all to establish if it is possible to observe, with fNIRS, the changes in

brain organization associated with reading development that have been observed with

fMRI, namely, the shift from bilateral to left-hemispheric brain activation. Secondly,

we wanted to see if participants’ reading skills could be correlated to different patterns

of activation in response to non-words.

To this end, we recorded the fNIRS signal of children of various ages while they were

passively listening to words and non-words.

We divided participants into two groups based on their reading abilities: namely, the

first group included children who were able to read up to the simplest 20 words, while

the second group included children who could read more words. We called the first

group “Pre-Readers” and the second group “Readers”.

It is important to note here that groups were clearly not balanced with respect to the

age factor, i.e., the mean age of the Pre-Readers group was much lower than that of

Readers (M: 4.9, S: 0.7 for the Pre-Readers VS M: 6.1, S:0.8 for the Readers), but they

were characterized by similar IQ (M:109.9, S: 13.6 for Pre-Readers, M:110.7, S: 13.9

for Readers) .

The fact that ages were very different in the two groups can be seen as a limitation

of this analysis, but for the goals of this project this was not a problem: indeed,

4.4. Discussion

we were not interested in distinguishing the effect of brain maturation from reading

development, but instead in exposing any kind of group difference in the perception of

the non-words, irrespectively of the source of this difference.

In the first steps of this project, described in this chapter, we employed a very stan-

dard analysis procedure: pre-processing included motion artifact correction, removal

of global systemic component and frequency filtering; at subject-level, timetraces were

modelled with a General Linear Model with the two experimental conditions, Words

and Non-Words, as predictors. Coefficients resulting from subject-level statistics were

then employed at group-level, to explore whether there was any significant main effect

of Group (Pre-Readers vs Readers), or Stimulus Type (Word vs Non-Word), or, most

interestingly, an interaction effect of Group x Stimulus Type.

This analysis showed an important main effect of Group in the brain areas of the

RH associated with language processing: the Middle and Superior Temporal Gyri, the

Fusiform Gyrus, the Pars Opericularis and the Pars Triangularis. In these areas, we

found that Pre-Readers exhibit a significantly greater hemodynamic activation com-

pared to Readers, irrespective of the type of stimulus. This finding is very important

because it proves that changes in brain organization during reading development can

indeed be observed also with fNIRS, and that has undeniable benefits: the possibility of

observing the developmental trajectory opens up the way to using fNIRS for following

up a reading remediation program, for example.

The second goal of this project, however, could not be pursued: the analysis did not

show any significant interaction effect between Group and Stimulus type, namely, it

was not possible to detect the differential activation pattern in response to Non-Words

between the two groups that was initially hypothesized. Therefore, another analytical

approach was explored that classifies distributed patterns of neural activity in response

to the different stimuli, and it is the object of Chapter 5.

Chapter 4. Investigating the effect of literacy on the functional organization of the

brain with univariate analysis of fNIRS data

4.5 Lessons learned

•Pre-Readers exhibit a greater hemodynamic activation than Readers in certain

channels of the Right Hemisphere, when listening to any auditory token – regard-

less of their meaning.

•The group effect was especially circumscribed to the Right Middle and Superior

Temporal Gyri, the Fusiform Gyrus, the Pars Opericularis and the Pars Trian-

gularis, all known to be associated to language processing.

•The group-level analysis did not show a differential activation to non-words de-

pending on the literacy skills.

Chapter 5

Decoding the mental representation

of word meaning in children to

predict their literacy skills, with

multivariate analysis of fNIRS data

5.1 Introduction

In the previous chapter, a literacy study was introduced where participants of various

ages and literacy abilities were presented with purely auditory stimuli including only

two conditions: words and non-words.

The goal of the study was twofold: on one hand, we aimed at confirming that learning to

read is accompanied by an increasing lateralization towards the Left Hemisphere of the

functions associated to language processing. On the other hand, we also wanted to test

whether it was possible to determine, based solely on the fNIRS patterns elicited by the

two conditions, the literacy abilities of the participants. Our hypothesis is motivated

by the notion, confirmed by several fMRI and PET studies, that Non-Word reading

and Non-Word verbal repetition is more difficult for non- and beginning readers [124];

a correct non-word repetition requires indeed a phonological processing capacity that

is sculptured by learning to read and write [125]. For this reason, non-word repetition

is known to be strongly predictive of reading outcomes [52].

Chapter 5. Decoding the mental representation of word meaning in children to

predict their literacy skills, with multivariate analysis of fNIRS data

Motivated by this understanding, our second goal was to test whether this increased

processing effort - that at behavioral level is manifested in failure in non-word repetition

– is encoded in the fNIRS signal. In other words, we wanted to test if it is possible to

distinguish, without using the behavioral evidence, the mental representations of words

and to non-words and, if such a dissimilarity exists, to use it for predicting reading

outcomes.

The first goal was achieved in the previous Chapter: it was possible, indeed, to expose

a more bilateral activation in the Pre-Readers, as opposed to the skilled Readers. This

finding is completely consistent with the literature on the topic and it is the first time,

to our knowledge, that such result has been revealed with fNIRS and by means of such

a simple experimental setup.

The second point, however, was left unresolved: with the applied approach it was not

possible to bring to light a differential activation for the two groups in response to the

two classes of stimuli. Therefore, we followed-up these results with another analytical

approach: Multi-Variate Pattern Analysis (MVPA).

MVPA is an approach aimed at decoding neural activity represented across a dis-

tributed group of channels, by means of powerful classification algorithms [90].

Several studies have shown that these methods can indeed uncover information that

is not revealed by conventional univariate statistical analysis [126, 127, 128, 129], sim-

ply because distributed patterns of neural activity may encode significant information

without producing a robust contrast when considered in a univariate fashion.

For this reason, the use of these techniques has become much widespread especially in

the EEG field , and gradually to the analysis of other neuroimaging data like fMRI [91,

130], MEG [131] and also extracellular recordings [132]; these studies have confirmed

that the distributed patterns of brain activity can decode many different classes of

stimuli [7, 8].

The goal of this work was to use MVPA to classify patterns of hemodynamic responses

to Words and to Non Words, both within-subject and within-group.

Consistently with the work presented in Chapter 3, a classifier was implemented that

used features drawn from the simultaneous variation of both oxygenated and deoxy-

genated components of the hemodynamic response measured with fNIRS.

5.2. Methods

Furthermore, the classifier was developed based on Regularized Linear Discriminant

Analysis (LDA), as it was done in Chapter 3, but also on Support Vector Machines

(SVM), in order to compare the results yielded by the two methods.

As for the within-subject analysis, we explored whether the individual classification

performances were representative of the lexical skills of the individuals, by means of

correlation analysis and also by comparing the averages over groups of subjects.

The within-subject analysis was severely limited by the number of trials available for

the trial-by-trial classification; therefore, we also applied the classifier to trials pooled

across subjects, after they were grouped according to their literacy skills (“within-

group” classification).

This chapter builds upon and extends the findings of the following publication:

[11] Gemignani J., Bayet L., Kabdebon C., Blankertz B., Pugh K.R, Aslin R.N. (2018),

Classifying the mental representation of word meaning in children with Multivariate

Pattern Analysis of fNIRS, 2018 40th Annual International Conference of the IEEE

Engineering in Medicine and Biology Society (EMBC)

5.2 Methods

5.2.1 Participants

Participants were the same as those described in Chapter 4. Out of 83 subjects, 11

were excluded because they completed less than 8 trials per condition.

In addition to this, 7 (seven) more subjects were discarded from the analysis because,

based on the quality check of the timeseries described in Section 4.2.6, they had more

than half channels with bad quality.

Therefore, the total number of participants used in this analysis was 65 (sixty-five).

Table 5.1 summarizes the characteristics of the new subset:

Chapter 5. Decoding the mental representation of word meaning in children to

predict their literacy skills, with multivariate analysis of fNIRS data

Measure

n 65

Age (years, M=mean, S= standard deviation) 3.7÷7.7 (M: 5.5, S: 1.1)

Gender (Male:Female) 35:30

IQ 74÷134 (M:110, S:13)

Letter-Word Decoding (WJ) 4÷55 (M:20.8, S: 13.7)

Table 5.1: Participant’s summary information

Pre-processing

The pre-processing of the data was identical to the one described in Chapter 4. For the

purpose of trials-classification, after the band-pass frequency filter, time traces were

segmented into trials, with a time window of -0.5 s before each trigger to 15 s after; each

trial was then baseline corrected by subtracting the mean value over the 0.5 seconds

before the trigger. This ensured all trial to be aligned at time 0. The grand averages

of the epoched time traces are depicted in Figure 5.1.

100

5.2. Methods

Figure 5.1: Grand averages of the epoched timeseries; timeseries were

averaged across trials, channels and finally subjects. Errorbars represent

the standard error of the mean across subjects. The shaded area of the

plots represent the period of stimulation of 7 seconds, during which

participants passively listened to the auditory stimuli (Words or Non

Words).

101

Chapter 5. Decoding the mental representation of word meaning in children to

predict their literacy skills, with multivariate analysis of fNIRS data

5.2.2 Within-subject classifier

Feature extraction

Features were extracted for each block and for each channel from both HbO and HbR

time traces, separately. The largest peak values of HbO and HbR were computed

within the time window from 5 to 10 seconds after onset of the stimulus, since it’s the

time interval where the peak of the hemodynamic response is most likely to occur [34].

To ensure the meaningfulness of the features in representing the hemodynamic activa-

tion in response to the task, the peak value was computed as the largest positive value

for HbO and the largest negative value of HbR, because HbR is expected to have a

negative deflection in response to neural activation [19, 24].

Each trial’s feature vector was then built by concatenating features from each of the

selected channels. In order to account for the fact that the two hemispheres may

contribute differently to lexical processing, and to explore the possible impact of this

factor on the classification results, three different schemes were employed for channel

selection and classification: in addition to using all channels (whole-brain), we also

employed, for each hemisphere, a combination of several anatomically-defined regions

of interest. Table 5.2 describes the locations of each of them.

102

5.2. Methods

Figure 5.2: (Top) Features are extracted from each block of HbO

and HbR timeseries as the positive and negative peak value, respec-

tively, within the time window [5-10] seconds after the presentation of

the stimulus. Features were normalized by removing their mean value

and dividing by their standard deviation. (Bottom) Distribution of

normalized features for each class, Word and Non Word.

103

Chapter 5. Decoding the mental representation of word meaning in children to

predict their literacy skills, with multivariate analysis of fNIRS data

Channel

subset ID Description Brodmann

Areas

Selected

channels

1 Whole brain – 1. . . 58

(RH) Fusiform Gyrus,

Middle Temporal

Gyrus; Inferior pre-

frontal G., Superior

Temporal G, Supra-

marginal G., Angular

G., Pars Triangularis,

Pars Opericularis

37, 21; 47,

22, 45, 39,

40, 44

1, 2, 3, 11,

12, 13, 14,

15, 24, 26,

27, 34, 35,

37, 47

(LH) Fusiform Gyrus,

Middle Temporal

Gyrus; Inferior pre-

frontal G., Superior

Temporal G, Supra-

marginal G., Angular

G., Pars Triangularis,

Pars Opericularis

37, 21; 47,

22, 45, 39,

40, 44

8, 9, 10,

19, 20, 21,

22, 23, 30,

31, 33, 43,

45, 46, 58

Table 5.2: Classification was performed on brain patterns distributed

over the whole brain, plus two hemispheric macro-regions, that were

identified as groups of adjacent anatomically-defined regions of interest.

Features extracted from HbO and HbR timetraces were concatenated into a single fea-

ture vector; each feature was then normalized, by removing its mean value and dividing

by its standard deviation. Figure 5.2 shows the procedure of features’ extraction and

their distributions after normalization.

Classification of Multivariate Patterns

The obtained trials were classified using both Linear Support Vector Machines as well

as Regularized Linear Discriminant Analysis.

Linear SVMs were implemented in MATLAB 2017b with the libsvm-3.11 library [133],

while Regularized LDA was implemented via tools available in the Berlin Brain-Computer

Interfacing (BBCI) Toolbox [104].

104

5.2. Methods

In both cases, cross-validation was performed using 4 folds and 100 permutations: the

sixteen blocks were randomly divided into 4 folds, with each fold containing 2 blocks for

each class. At each repetition, three folds and the remaining fold were used for training

and testing, respectively. At each repetition, the order of the trials was permuted. For

each subject, classification accuracies were averaged across the permutations.

5.2.3 Analysis of individual classification accuracies

The achieved classification accuracies were evaluated both at subject-level and at group-

level.

At subject-level, we wanted to test, for each participant, the statistical significance of

the achieved accuracy; to this end, the empirical subject-level null distribution was

estimated by classifying trials after classes’ labels were randomly shuffled; to produce

a robust distribution, the procedure was repeated 1000 times. After that, each partici-

pant’s observed accuracy was tested by computing its p value as the proportion of the

null distribution greater than, or equal to, the observed value.

Figure 5.3: Examples of how individual classification accuracies were

statistically assessed. In Panel Aan example of individual performances

that did not reach statistical significance, while in Panel Bthe individual

classification accuracy of 76% was higher than the 95th percentile (75%),

obtaining a p-value of 0.049.

At group-level, we were interested in exploring if and how classification accuracies were

related to the reading abilities of the participants. Therefore, firstly, we employed the

Reading Score as a continuous variable, by fitting a Mixed Effect Linear (LME) Model

105

Chapter 5. Decoding the mental representation of word meaning in children to

predict their literacy skills, with multivariate analysis of fNIRS data

with a random intercept for each participant, and Reading Score as fixed effect, for

each subset of channels. The model was fitted in MATLAB 2017b.

Secondly, the Reading Score was used to divide participants into two groups, in the

same fashion used in Chapter 4. For each area, an unpaired t-test was used to compare

classification accuracies achieved by the two groups (Pre-Readers vs Readers).

5.2.4 Within-groups classifier

One of the major limitations of the within-subject classification is posed by the low

number of available observations, especially considering the large number of features

(116, in case of the whole-brain scheme). To mitigate this issue we also performed the

same Word vs Non-Word classification on the trials pooled within the two groups (Pre-

Readers and Readers). Within each group, each individual’s trials were concatenated

together; a feature matrix of 116 x 544 was obtained for the Pre-Readers and a 116 x

496 for the Readers.

The obtained trials were classified using LDA and SVM; in both cases, cross-validation

was performed using 4 folds and 100 repetitions; at each repetition, trials were randomly

permuted, and classification accuracies were averaged over folds and repetitions.

Additionally, for both groups, the null distribution was generated by repeatedly ran-

domly shuffling class labels, and applying the same classification routine (1000 repeti-

tions).

5.3 Results

5.3.1 Overall performances: SVM vs LDA

Across the whole pool of subjects and all brain areas, classification accuracies yielded

by SVM and LDA did not result significantly different.

106

5.3. Results

Figure 5.4: (Left) Distribution of classification performances across

all subjects and all channel groups. (Right) The performances of the

two algorithms did not significantly differ.

Even when considering the three employed channel subsets separately, the two methods

performed similarly (Figure 5.5).

For this reason, in the subsequent sections only results yielded by LDA will be pre-

sented, but very similar analogous results were found by using SVM-based accuracies.

5.3.2 Individual performances

At individual level, statistically significant classification accuracies were achieved only

for few participants. In particular, when whole-brain channels were employed, only

Subject # 17 had a significant classification accuracy (79%, p= 0.038); when LH chan-

nels were employed, only Subjects # 11 and # 74 had significant accuracy (84%, 80%

, respectively, and p= 0.007 and 0.033); finally, with RH channels, Subjects # 11, #74

and #90 had significantly high accuracies (76%, 77%, 79% with p= 0.040, 0.047, 0.045,

respectively).

5.3.3 Correlation between individual decoding accuracies and

reading abilities

The correlation analysis between achieved classification accuracies and individual read-

ing scores was performed by means of Linear Mixed Effects (LME) models. In neither

107

Chapter 5. Decoding the mental representation of word meaning in children to

predict their literacy skills, with multivariate analysis of fNIRS data

Figure 5.5: Classification accuracies achieved by LDA and SVM within

the channel subsets defined in Table 5.2

of the three combinations of channels was the correlation between accuracies and scores

significant at p= 0.05; correlations where found to be positive when using LH channels

and negative when using RH channels:

Channel group Estimated regression coefficient Standard Error p

Whole Brain 0.03 0.18 0.8548

Whole LH 0.26 0.16 0.1232

Whole RH -0.10 0.20 0.6396

Table 5.3: Results of the LME models performed within each channel

group (’Accuracy ∼ReadingScores + (1|P articipant)’)

108

5.3. Results

Figure 5.6: In none of the three employed channel groups was the

correlation between Reading Scores and Classification Accuracies statis-

tically significant at p=0.05; nevertheless, a pattern can be qualitatively

detected of negative and positive correlations in the RH and LH, respec-

tively.

In light of the fact that the Reading Score is highly correlated with the age of the

subjects, an additional analysis was performed with the intent of removing the effect

the age from the reading scores, by means of Partial Least Squares regression (Figure

5.7.

109

Chapter 5. Decoding the mental representation of word meaning in children to

predict their literacy skills, with multivariate analysis of fNIRS data

Figure 5.7: Effect of removing the effect of Age from the original Read-

ing Scores, using Partial Least Squares (PLS) regression. The dashed

lines represent the least squares correlation lines (r= 0.67 for the original

scores, r= 0 for the “corrected” scores).

Then, the LME analysis with Classification Accuracies as the response variable was

run again, this time using the residuals of the regression between age and reading

scores as a predictor. The models yielded similar results, namely, negative and positive

correlations, but not statistically significant, in the RH and LH, respectively.

Finally, we also employed the Reading Score to group participants in two groups Pre-

Readers and Readers, to reproduce the conditions we employed in Chapter 4 for the

univariate analysis. Then, classification accuracies of the two groups were compared

using two-tailed unpaired t-tests.

Channel group t95%Confidence Interval p

Lower Upper

Whole Brain 1.665 -1.250 13.778 0.100

Whole LH 0.215 -6.399 7.943 0.830

Whole RH 2.248 1.031 17.490 0.026

Table 5.4: Results of unpaired t-tests between groups (Pre-Readers vs

Readers), divided at ReadingScore=20 (degrees of freedom = 63).

110

5.3. Results

Figure 5.8: Boxplots of individual classification accuracies within the

two groups Pre-Readers vs Readers; the central red line in each box

indicates the median; the bottom and top edges of the box indicate

the 25th and 75th percentiles, respectively. The whiskers represent the

whole range of observed values. In the RH, the unpaired t-test revealed

a significant group difference, with Pre-Readers achieving better accu-

racies than Readers (60% vs 50%, respectively).

5.3.4 Within-groups classification

The results of the two within-groups classification support the findings already ob-

served: the classification accuracy reached in the Pre-Readers group is slightly better

than that reached in the Readers, but still not a good accuracy: the mean value is only

53.2%.

111

Chapter 5. Decoding the mental representation of word meaning in children to

predict their literacy skills, with multivariate analysis of fNIRS data

Figure 5.9: Results of within-groups classification. Classification accu-

racies obtained using trials from Readers are not statistically significant

for any of the channel subsets, compared to their corresponding null

distributions. For the Pre-Readers, accuracies are marginally significant

when using whole-brain channels and more so when using RH channels,

but still they are not high (53.2% and 54.1%, respectively).

112

5.4. Discussion

Figure 5.10: Boxplots of within-groups classification accuracies, for

the three selections of channels; the central red line in each box indicates

the median; the bottom and top edges of the box indicate the 25th and

75th percentiles, respectively. The whiskers represent the whole range

of observed values. The unpaired t-tests confirm a significant difference

between the two groups when using channels from the RH and also

Whole-Brain; however, the mean classification accuracy does not exceed

54.1%.

5.4 Discussion

Beginning readers make a greater effort, compared to skilled readers, when verbally

repeating non-words; this is due to the fact that correct non-word repetition requires

the emergence of a phonological processing capacity that is carved by the very process

of learning to read.

In this project, we explored whether this differential processing capacity – that at

behavioral level is manifested with a differential outcome in verbal repetition – is also

reflected in the fNIRS signal measured when the subject is only listening to words and

non-words.

To this end, we implemented a neural decoder, based on both LDA and SVM, and

applied it to the classification of distributed patterns of hemodynamic activity elicited

by word and non-words.

In accordance to the findings presented in Chapter 3, the classifier was trained to

use features from both the HbO and the HbR component of the fNIRS signal. To

our knowledge, this is the first attempt at using both hemoglobin components in the

113

Chapter 5. Decoding the mental representation of word meaning in children to

predict their literacy skills, with multivariate analysis of fNIRS data

context of a multi-channel analysis of fNIRS, thus it extends the work done so far in

this field.

Also, to keep into account the possible spatial specificity of the processes under inves-

tigation, not only patterns distributed over the whole brain were examined but also,

separately, patterns distributed only over the Left and over the Right Hemisphere were

used for classification.

The classification was performed at two different levels: firstly, it was performed within-

subject. This allowed to obtain a value of classification accuracy for each individual,

that could be used to interrogate the relation between decoding and literacy skills.

The first interesting result is that the performances of LDA and SVM were basically

equivalent; we were interested in this comparison because in our own previous work

[11], following relevant literature, we had employed only SVM, but several other works

reported that regularized LDA and linear SVMs tend to have both good and not

significantly different performances [134, 135]. Our results supported this observation,

and for this reason we focused on LDA for the remainder of the work.

Nonetheless, the within-subject classification yielded discouraging results: ranging from

20% to 79%, the average accuracy across all subjects did not exceed chance level;

furthermore, when comparing each subject’s observed accuracy with their respective

null distributions, only one subject turned out to have a significant accuracy at p<0.05.

The large variability of the classification outcomes may suggest that the individual

subjects’ data quality was inconsistent, or also that the individuals’ response patterns

to the stimuli differed too much from one another.

Another possible explanations can be found in the limited number of trials available for

classification: having only 8 trials per classes, the experimental design was probably

not well suited for within-subject classification, even more so considering that, by

concatenating HbO and HbR features, the vector included 116 features.

Furthermore, it becomes apparent from Figure 5.2 that using the peak value of each

epoch as a single feature may not necessarily have been representative enough: rather,

it would have been better to sample several values throughout the whole trial, as we

did in Chapter 3 for example. While that would likely capture the characteristics of the

trials much better, it would increase the dimensionality of the features far too much to

114

5.4. Discussion

be feasible, considering the limited number of observations available and the number

of channels.

Finally, the large variability could actually be representative of the question under

investigation, namely, it could be related to individual differences in the mental repre-

sentation of non-words and thus to individual differences in reading abilities. To test

whether this was the case, we performed a correlation analysis between the achieved

accuracies and the reading scores, but in neither of the three schemes for channel

selection was the correlation found significant.

However, when looking at accuracies averaged over the groups of subjects, divided using

reading scores, a significant group difference was found when using channels from the

Right Hemisphere (p= 0.026), and a marginally significant difference was found for the

“Whole-Brain” scheme (p= 0.1).

These results, however weak, show a tendency toward a greater word vs non-word

difference in the Pre-Readers; when taken as a group, they exhibit indeed a higher

decoding accuracy than Readers, especially in the Right Hemisphere (median: 60% vs

50%, respectively).

To mitigate the excessively unbalanced ratio between the number of features and the

number of trials, classification was also performed within-groups: trials within each

group, Pre-Readers and Readers, were pooled together so to achieve a higher number

of observations. Also in this case, results were weak: only for Pre-Readers, the achieved

accuracies were only marginally significant in the “Whole-Brain” scheme (p= 0.1) as

well as in the “RH” scheme (p=0.07), with quite low nominal values (53.2% and 54.1%,

respectively).

However low these accuracies are, a group difference does definitely exist: for the group

Readers the distribution of accuracies across permutations overlaps completely with

the null distribution (Figure 5.9) and it is significantly lower than that of Pre-Readers

both when using channels from the Whole-Brain and from the RH (Figure 5.10); this

observation supports the hypothesis that as children strengthen their reading abilities,

the difference in fNIRS patterns between word and non-words lessen.

In this Chapter, we conclude that with multi-variate approaches it is indeed possible

to explore finer-grained effects that escape the lens of the univariate analysis. In the

115

Chapter 5. Decoding the mental representation of word meaning in children to

predict their literacy skills, with multivariate analysis of fNIRS data

very case of this dataset, the experimental design was probably not optimally suited for

this use; we did, however, manage to bring to light a trend, although only marginally

statistically significant, that was not possible to observe in the channel-wise analysis.

Future applications of this method should include a careful planning of the experimental

setup, so as to be suitable for trial-to-trial classification: especially it should have a

sufficient number of trials in consideration of the number of channels to be used.

5.5 Lessons learned

•LDA and SVM performed very similarly.

•Subject-level classification accuracies were largely not significant.

•A statistically significant correlation between subject-level decoding accuracies

and reading abilities could not be identified.

•When averaged across groups, Pre-Readers’ accuracies were significantly higher

than Readers when using patterns from the RH and marginally also when using

whole-brain patterns.

•Within-groups classification accuracies differed substantially between the two

groups, with Pre-Readers achieving a higher decoding accuracy than Readers,

but their nominal value didn’t exceed 54.1%at most.

•MVPA has a great potential to bring to light subtle effects that escape the

channel-wise analysis, but for it to be really successful a careful planning of

the experimental setup is crucial. In particular, a number of trials should be ad-

ministered high enough to allow trial-by-trial classification within each subject.

116

Chapter 6

Summary and Conclusions

As the use and application of fNIRS is becoming increasingly widespread, the need for

an ad-hoc standardized data analysis procedure is now more pressing than ever.

A long way of work and research is still required to establish guidelines and automated

procedures. Nevertheless, much effort is being made in the direction of developing new

analysis methods and systematically comparing them to traditional ones, in quest of

defining a recipe for achieving the best and most accurate results. This doctoral thesis

is framed within this attempt.

In particular, the goal of this work was to examine and tackle some of the most critical

sources of concern of a typical data analysis workflow, in particular when the subjects

are children.

The first one was the inter-subject variability of the hemodynamic response, which

means that different subjects may not only respond differently to the same task but

that, when responding to the task, their hemodynamic responses may themselves vary

significantly in shape and characteristics. Furthermore, there is also a certain extent

of intra-subject variability: due to changes in muscles’ and small capillaries’ configu-

rations, different brain areas are sometimes characterized by hemodynamic responses

of different shapes. These two issues together speak to the fact that the model-based

type of analysis is oftentimes inaccurate; an a-priori fixed HRF model can hardly cap-

ture this natural variability and actual neural activity may remain undetected. This is

especially relevant with children, since they often display a non-typical hemodynamic

response, that is even more difficult to model.

117

Chapter 6. Summary and Conclusions

Another source of complication is given by the choice of chromophore: by analyzing

separately oxy- and deoxy-hemoglobin, it is rarely the case that the results on one

chromophore do precisely mirror the results on the other. This issue is problematic,

for instance, for the statistical test at group-level: since the hemodynamic fingerprint

is so subject-specific, some individuals may naturally show stronger results with one

chromophore while other individuals with the other one. In such a situation, two

separate group-level analyses for the two chromophores may prove highly ineffective in

uncovering the existing effects.

This doctoral work has been as a two-steps process: in the first part (Chapter 3)

these problems have been tackled in a very theoretical fashion, while the second part

(Chapters 4 and 5) provided a direct application of the newly proposed method to

experimental data collected in the context of a literacy development study.

In the first part, we proposed to classify single channels as “active” or “not active”,

using Linear Discriminant Analysis. Features included amplitude and slope of both

oxy- (HbO) and deoxy- hemoglobin (HbR). To precisely quantify the performances of

this method, a synthetic dataset was generated in which hemodynamic activations of

various sizes and shapes were added to physiological resting state signal. Moreover,

part of the simulated hemodynamic responses were also artificially affected by Meyer

waves; this allowed to compare the uni- and multivariate pipelines also in terms of

robustness against this physiological confound.

The classification was conducted, within each subject, by comparing data from differ-

ent temporal segments of the same recording (“resting state” vs “task”). As for the

univariate approach, data was analyzed by fitting a General Linear Model (GLM) to

HbO and HbR traces, separately. In terms of performances, the multivariate classifier

yielded a significantly greater classification accuracy than GLM, both when applied to

HbO and HbR.

Furthermore, the performances yielded by the (HbO+HbR) classifier also resulted less

susceptible to individual variations and physiological confounds. When considering

the correlation between achieved accuracy and demographic characteristics like Hair

Color, Age, Gender and Time of Measurement, we found that none of these correlated

significantly with the classification accuracies achieved by the multivariate classifier,

118

Chapter 6. Summary and Conclusions

while accuracies achieved by GLM on HbR were significantly higher in blond-haired

subjects compared to brown-haired.

This is a clear indication that a model-based analysis is an approximation that cannot

capture the intrinsic variability that exists between and within subjects, while a data-

driven strategy that works on the comparison “rest” vs “task” not only within the

same subject but also within the same channel enhances the detectability of effects,

also when they are small in comparison to other sources of variability.

In the second part of this project, the comparison between the two pipelines was put in

practice by analyzing a large dataset of fNIRS data collected in the context of a literacy

study. The goal of this project was twofold: on one hand, we aimed at highlighting

a differential functional lateralization in response to auditory stimuli depending on

literacy levels: since the visual system of a young reader specializes gradually, during

the first stages of reading both hemispheres should functionally activate, even when

only listening to speech, while with growing reading expertise, functional activation

should be progressively more focused and should slowly converge to the left occipito-

temporal areas.

The other goal of the project was to reveal a differential activation, depending on

literacy levels, to non-words: this hypothesis comes from the notion that as reading

increases phonological awareness, skilled readers can rely on the phonemic code they

acquired and can easily process speech sounds, even when they are meaningless; on

the other hand, beginning readers do not have yet this phonemic code, therefore they

struggle in processing and repeating non-words. Therefore, we explored whether it

was possible to distinguish various levels of literacy only by looking at hemodynamic

responses to non-words.

To pursue these goals, both analysis approaches were employed: first, a General Linear

Model was fit to the data at subject-level; these results were fed into a group-level

model looking to predict the main effects of Group (Pre-Readers, Readers), Stimulus

(Word, Non-Word) and the interaction effect Group x Stimulus. This step was success-

ful in that it clearly highlighted that Pre-Readers exhibit a more bilateral functional

activation than Readers.

This first result is a very important lesson of this project that should not go overlooked:

119

Chapter 6. Summary and Conclusions

the channel-wise analysis has many limitations, that we extensively listed, but has

the unique ability of providing a clear localization of the brain regions reflecting the

cognitive process under investigation. With this respect, univariate methods have the

advantage of being directly interpretable from a neuroscientific point of view.

However, in our experiment, it failed to expose a differential activation of one group

to one type of stimulus. Therefore a multivariate approach was undertaken.

In doing so, the concept of the classifier proposed earlier was taken even further: while

in the first part it was multivariate in that it combined features from the concurrent

variations of HbO and HbR, here it was also multichannel; the goal was, indeed, to use

HbO and HbR features from the whole pattern of channels to classify trials of Words

and Non-Words, with the expectation that classification accuracies would be higher for

lower levels of literacy skills. In addition, another extension we made to the previously

developed classifier was that this time it was also implemented with Support Vector

Machines (SVMs), for comparison with LDA.

The results of this analysis were admittedly underwhelming: when compared to their

respective null distributions, individual classification accuracies were largely not sig-

nificant. Furthermore, the correlation analysis between classification accuracies did

not reveal a significant relation between different activations between Words and Non-

Words and reading scores, when classifying patterns distributed across the whole brain.

However, when using patterns distributed over the right hemisphere and averaging

accuracies across pre-defined groups, we did find that Pre-Readers’ accuracies were

significantly higher than the Readers’.

There are a few possible explanations for these mediocre results; one is that the starting

hypothesis is flawed: beginning readers and illiterates do have a hard time processing,

memorizing and repeating out loud pseudo-words, and this is a known fact that has

been proven by several studies. However, this may not necessarily translate into a much

different functional activation to non-words with respect to real words, and so into a

high decoding accuracy, especially only at auditory level. This kind of neuroscientific

considerations extend beyond the scope and the field of operation of this doctoral work.

Another, very likely, explanation has to do with the experimental design: with only

8 trials for each class, 58 channels and 2 features for each channel (HbO and HbR),

120

Chapter 6. Summary and Conclusions

the feature matrix ended up having 16 observations and 116 features. This ratio was

definitely excessively unbalanced and so, in order to mitigate the issue, an additional

classification was performed by pooling participants’ data within each group (Pre-

Readers and Readers). This way, the feature matrices had a more balanced ratio

between number of features and number of observations. In this case, classification

accuracies achieved by the Pre-Readers group were marginally higher than those by

Readers group, both using whole-brain patterns and RH patterns.

Therefore, MVPA did manage to uncover, if not really a significant correlation, at least

a trend in the direction that we initially hypothesized.

This thesis has been an evolution: it identified some of the important challenging

factors for fNIRS data analysis; it proposed a new multivariate method to tackle these

challenges, and compared it against the most up to date state-of-the-art univariate

method. Once the comparison proved successful, it moved onto the next level, by

using it for the exploration of very fine-level questions that could not be answered by

the “typical” analysis pipeline. This last step, with all its limitations, showed that

multivariate and multichannel patterns do indeed have the potential of bringing to

light effects that elude the channel-wise analysis.

This doctoral work constitutes a step forward in the scrutiny of the different fNIRS

data analysis techniques, and it taught us many different lessons: first, that despite

all its limitations, the channel-wise approach constitutes still a cornerstone of any

analysis routine, since it’s the only option we have to localize the cognitive processes

of interest. Channel-wise doesn’t necessarily mean GLM: the classifier presented in

Chapter 3 also worked on single channels, and achieved great results by (1) using a

data-driven procedure and (2) combining HbO and HbR features.

Second, we learned that by combining information from different channels we can

decode brain states associated to the experiment, and that decoding models can answer

a whole other range of fine-grained research questions. But for this approach to be really

successful, it cannot just be applied to any dataset: a well-reasoned experimental design

is crucial, especially in terms of number of trials and number of features, if the goal is

trial-by-trial classification.

121

Chapter 6. Summary and Conclusions

To conclude, advocating the exclusive use of one approach against the other is im-

possible: single-channel and multi-channel approaches respond to different questions

and can actually greatly reinforce one another, when used in conjunction on the same

dataset.

122

Appendix A

Appendix

A.1 Supplementary information of Chapter 3

A.1.1 Correlation between individual measures and classifica-

tion accuracies

The linear model fitted to the individual classification accuracies revealed a significant

effect of Hair Color on the classification accuracies reached by the GLM with HbR.

The other predictors that were used were Gender, Time of Measurement, and Age.

Reported below are the distributions of classification results for each of these effects.

123

Appendix A. Appendix

Figure A.1: Distributions of classification accuracies grouped by Time

of Measurement. The central red marks represents the median values,

the blue boxes extend from the 25th to the 75th percentiles, and the

black whiskers extend to the most extreme data points not considered

outliers (which are marked with red crosses)

Figure A.2: Distributions of classification accuracies grouped by Gen-

der

A.1.2 Experimental results

In this section, individual subjects’ channel-wise classification results from the GLM-

based and LDA-based analyses are reported.

For each subject, a table is presented, which includes:

•Topographic images of channel-wise output values, namely GLM βcoefficient

values and LDA classifier outputs. The images were produced with nirsLAB

v2017.06 [1]. Large positive GLM(HbO) βvalues, or large negative GLM(HbR)

βvalues, indicate a good fit of the model to the data, and so a better chance of

that channel being statistically significant (i.e., of being classified as “active”).

LDA outputs are negative if the channel is classified as “not active” and positive

124

A.1. Supplementary information of Chapter 3

Figure A.3: (Left) Effect of subject age on classification accuracy,

according to Linear Mixed Effects analysis. Partial-dependency line

for each method is superimposed on the individual-subject data points

(Right) Coefficients estimated by the LME for Predictor: Age, Re-

sponse Variable: Classification Accuracy. None of the methods has a

statistically significant age dependence on accuracy.

is the channel is classified as “active”. Therefore, a high classifier output indicates

a good chance that the channel is labeled as “active”.

•Block averages of the signal in response to the stimulus (red and blue curves for

HbO and HbR, respectively). The shaded error bars indicate the standard error

computed over the 16 episodes of alternating left- and right-hand finger tapping.

The GLM plots are accompanied by the canonical basis function used by the

model; the LDA plots are accompanied by the block averages of the resting-state

trials. The block averages are shown for only the channels covering the motor

cortex.

A second table reports the numerical output of each analysis, along with the cor-

responding p-values. Channels flagged as NaN were of poor quality and therefore

discarded from the analysis.

125

Appendix A. Appendix

Figure A.4: Uncorrected p-values resulting from the 3 different analyses (GLM on HbO traces, GLM

on HbR traces, LDA-based classification using combination of HbO and HbR features). Cells highlighted

in red indicate the channels most likely covering the motor cortex. Green highlighting indicates channels

that are classified as “active” at p=0.05. Empty (dark-gray highlighting) cells indicate channels with poor

data quality, which were therefore discarded from the analysis.

126

A.1. Supplementary information of Chapter 3

Figure A.5: Plots of GLM and LDA results- Subject 1

127

Appendix A. Appendix

Figure A.6: Table of GLM and LDA results- Subject 1

128

A.1. Supplementary information of Chapter 3

Figure A.7: Plots of GLM and LDA results- Subject 2

129

Appendix A. Appendix

Figure A.8: Table of GLM and LDA results- Subject 2

130

A.1. Supplementary information of Chapter 3

Figure A.9: Plots of GLM and LDA results- Subject 3

131

Appendix A. Appendix

Figure A.10: Table of GLM and LDA results- Subject 3

132

A.1. Supplementary information of Chapter 3

Figure A.11: Plots of GLM and LDA results- Subject 4

133

Appendix A. Appendix

Figure A.12: Table of GLM and LDA results- Subject 4

134

A.1. Supplementary information of Chapter 3

Figure A.13: Plots of GLM and LDA results- Subject 5

135

Appendix A. Appendix

Figure A.14: Table of GLM and LDA results- Subject 5

136

A.1. Supplementary information of Chapter 3

Figure A.15: Plots of GLM and LDA results- Subject 6

137

Appendix A. Appendix

Figure A.16: Table of GLM and LDA results- Subject 6

138

A.1. Supplementary information of Chapter 3

Figure A.17: Plots of GLM and LDA results- Subject 7

139

Appendix A. Appendix

Figure A.18: Table of GLM and LDA results- Subject 7

140

A.1. Supplementary information of Chapter 3

Figure A.19: Table of all the experimental results. Uncorrected p-values resulting from the 3 different

analyses (GLM on HbO traces, GLM on HbR traces, LDA-based classification using combination of HbO

and HbR features). Highlighted cells (red) indicate the channels most likely covering the motor cortex.

Highlighted cells (green) indicate p<0.05. Highlighted cells (grey) with missing values indicate channels

with poor acquisition quality and therefore discarded from the analysis.

141

Appendix A. Appendix

A.2 Supplementary information of Chapter 4

A.2.1 Localization of optodes

Ch# MNI coordinates Brodmann Area Pr(%)

X Y Z

1 -65.6 -52.33 -14.83 20 - Inferior Temporal gyrus 28.2

21 - Middle Temporal gyrus 7

37 - Fusiform gyrus 64.8

2 -70 -28.33 -16.5 20 - Inferior Temporal gyrus 2.5

21 - Middle Temporal gyrus 97.5

3 -65.5 -8.33 -23.5 20 - Inferior Temporal gyrus 41.3

21 - Middle Temporal gyrus 58.7

4 -35.6 62.83 -7 10 - Frontopolar area 79.1

11 - Orbitofrontal area 20.9

5 -12.3 72 -1.5 10 - Frontopolar area 98.8

11 - Orbitofrontal area 1.2

6 17 72 -0.5 10 - Frontopolar area 100

7 39.3 63.33 -6.5 10 - Frontopolar area 87.3

11 - Orbitofrontal area 12.7

8 68.17 -6.33 -21.33 20 - Inferior Temporal gyrus 18.4

21 - Middle Temporal gyrus 81.6

9 72 -26.5 -14.83 20 - Inferior Temporal gyrus 1.3

21 - Middle Temporal gyrus 98.7

142

A.2. Supplementary information of Chapter 4

10 67 -48.67 -15.17 20 - Inferior Temporal gyrus 38.4

21 - Middle Temporal gyrus 6.8

37 - Fusiform gyrus 54.8

11 -62.3 -61 -5.5 21 - Middle Temporal gyrus 15.3

37 - Fusiform gyrus 84.7

12 -70 -39.33 -2 21 - Middle Temporal gyrus 88.8

22 - Superior Temporal Gyrus 11.3

13 -70 -17 -6.33 21 - Middle Temporal gyrus 97.5

22 - Superior Temporal Gyrus 2.5

14 -62 2.3 -19 21 - Middle Temporal gyrus 100

15 -49.3 45.83 -5.33 10 - Frontopolar area 1.5

47 - Inferior prefrontal gyrus 98.5

16 -25.3 67 9.34 10 - Frontopolar area 100

17 5 70 14.67 10 - Frontopolar area 100

18 28.83 67.67 10.17 10 - Frontopolar area 100

19 52 47.33 -2.83 10 - Frontopolar area 21.7

47 - Inferior prefrontal gyrus 78.3

20 64 4.33 -14.83 21 - Middle Temporal gyrus 100

21 72 -14.5 -3.33 21 - Middle Temporal gyrus 83.5

22 - Superior Temporal Gyrus 16.5

143

Appendix A. Appendix

22 72 -37.67 -0.83 21 - Middle Temporal gyrus 64.2

22 - Superior Temporal Gyrus 35.8

23 64 -58.5 -5.67 21 - Middle Temporal gyrus 6.9

37 - Fusiform gyrus 93.1

24 -67.17 -50.33 9.67 21 - Middle Temporal gyrus 34.2

22 - Superior Temporal Gyrus 65.8

25 -69 -26.33 13.5 40 - Supramarginal gyrus part of Wernicke’s area 1.3

42 - Primary and Auditory Association Cortex 98.7

26 -66.5 -5.33 7.83 22 - Superior Temporal Gyrus 93.2

42 - Primary and Auditory Association Cortex 6.8

27 -56.67 25.17 3.33 45 - pars triangularis Broca’s area 56.8

47 - Inferior prefrontal gyrus 43.2

28 -12 66.67 24.83 10 - Frontopolar area 100

29 15.67 67 25.83 10 - Frontopolar area 100

30 60 27.67 7.67 45 - pars triangularis Broca’s area 96.3

46 - Dorsolateral prefrontal cortex 3.7

31 68 -3.33 9 6 - Pre-Motor and Supplementary Motor Cortex 3.9

22 - Superior Temporal Gyrus 96.1

32 71.17 -25.33 13.17 40 - Supramarginal gyrus part of Wernicke’s area 2.5

42 - Primary and Auditory Association Cortex 97.5

33 69 -48.17 10.83 21 - Middle Temporal gyrus 11.7

22 - Superior Temporal Gyrus 88.3

144

A.2. Supplementary information of Chapter 4

34 -63 -59.33 21.33 22 - Superior Temporal Gyrus 45.2

39 - Angular gyrus, part of Wernicke’s area 50.7

40 - Supramarginal gyrus part of Wernicke’s area 4.1

35 -68 -36.5 30.67 40 - Supramarginal gyrus part of Wernicke’s area 100

36 -67 -12.33 27.33 1 - Primary Somatosensory Cortex 32.5

3 - Primary Somatosensory Cortex 18.8

4 - Primary Motor Cortex 20

6 - Pre-Motor and Supplementary Motor Cortex 7.5

43 - Subcentral area 21.3

37 -61 11.33 14.17 44 - pars opercularis, part of Broca’s area 100

38 -51 41 14.33 45 - pars triangularis Broca’s area 1.4

46 - Dorsolateral prefrontal cortex 98.6

39 -24.7 54.83 34.33 9 - Dorsolateral prefrontal cortex 90.6

10 - Frontopolar area 9.4

40 1.67 57.17 39.67 9 - Dorsolateral prefrontal cortex 100

41 27.33 55 34.67 9 - Dorsolateral prefrontal cortex 98.2

10 - Frontopolar area 1.8

42 53 40.33 18.67 46 - Dorsolateral prefrontal cortex 100

43 63 13.83 17 44 - pars opercularis, part of Broca’s area 90.7

45 - pars triangularis Broca’s area 9.3

44 69 -9.33 28.5 3 - Primary Somatosensory Cortex 34.6

145

Appendix A. Appendix

4 - Primary Motor Cortex 10.3

6 - Pre-Motor and Supplementary Motor Cortex 47.4

43 - Subcentral area 7.7

45 70 -34.67 29.5 40 - Supramarginal gyrus part of Wernicke’s area 100

46 64 -58.33 21 22 - Superior Temporal Gyrus 46.1

39 - Angular gyrus, part of Wernicke’s area 44.7

40 - Supramarginal gyrus part of Wernicke’s area 9.2

47 -64 -46.5 39.67 40 - Supramarginal gyrus part of Wernicke’s area 100

48 -65 -22.5 39.33 1 - Primary Somatosensory Cortex 32.5

2 - Primary Somatosensory Cortex 36.4

3 - Primary Somatosensory Cortex 29.9

40 - Supramarginal gyrus part of Wernicke’s area 1.3

49 -63 1.83 31.33 6 - Pre-Motor and Supplementary Motor Cortex 97.3

9 - Dorsolateral prefrontal cortex 2.7

50 -53 30.33 26.67 45 - pars triangularis Broca’s area 4.3

46 - Dorsolateral prefrontal cortex 95.7

51 -37.67 44.17 36.33 9 - Dorsolateral prefrontal cortex 96.2

46 - Dorsolateral prefrontal cortex 3.8

52 -12.33 47.67 49.67 8 - Includes Frontal eye fields 100

53 14.33 47.33 50.5 8 - Includes Frontal eye fields 100

54 38.67 44.67 38 9 - Dorsolateral prefrontal cortex 100

55 55 29.67 28.5 45 - pars triangularis Broca’s area 1.4

146

A.2. Supplementary information of Chapter 4

46 - Dorsolateral prefrontal cortex 98.6

56 65 3.33 33.33 6 - Pre-Motor and Supplementary Motor Cortex 87.7

9 - Dorsolateral prefrontal cortex 12.3

57 67.5 -22.5 40.67 1 - Primary Somatosensory Cortex 50.6

2 - Primary Somatosensory Cortex 18.2

3 - Primary Somatosensory Cortex 26

4 - Primary Motor Cortex 5.2

58 65.17 -47 38.67 40 - Supramarginal gyrus part of Wernicke’s area 100

Table A.1: The X, Y, and Z columns represent median MNI coordi-

nates across subjects. Anatomical labelling of the MNI coordinates was

conducted using the Talairach atlas. The last column lists the atlas-

based probability that the channel coordinates are within that anatom-

ical location. The Brodmann area with the highest probability is high-

lighted in bold.

147

Bibliography

[1] F. Jobsis. “Noninvasive, infrared monitoring of cerebral and myocardial oxygen

sufficiency and circulatory parameters”. In: Science 198.4323 (1977), pp. 1264–

1267. issn: 0036-8075. doi:10.1126/science.929199. arXiv: arXiv:1011.

1669v3.

[2] Lia M. Hocke, Ibukunoluwa K. Oni, Chris C. Duszynski, Alex V. Corrigan,

Blaise de B. Frederick, and Jeff F. Dunn. “Automated processing of fNIRS data-

A visual guide to the pitfalls and consequences”. In: Algorithms 11.5 (2018),

pp. 1–25. issn: 19994893. doi:10.3390/a11050067.

[3] Paola Pinti, Ilias Tachtsidis, Antonia Hamilton, Joy Hirsch, Clarisse Aichelburg,

Sam Gilbert, and Paul W. Burgess. “The present and future use of functional

near-infrared spectroscopy (fNIRS) for cognitive neuroscience”. In: Annals of

the New York Academy of Sciences (2018), pp. 1–25. issn: 17496632. doi:10.

1111/nyas.13948.

[4] W. D. Penny, K. J. Friston, J. T. Ashburner, S. J. Kiebel, and T. E Nichols. Sta-

tistical parametric mapping: the analysis of functional brain images. Academic

press, 2011. isbn: 9780124115118.

[5] Theodore J. Huppert. “Commentary on the statistical properties of noise and its

implication on general linear models in functional near-infrared spectroscopy”.

In: Neurophotonics 3.1 (2016), p. 010401. issn: 2329-423X. doi:10.1117/1.

NPh.3.1.010401.url:http://neurophotonics.spiedigitallibrary.org/

article.aspx?doi=10.1117/1.NPh.3.1.010401.

[6] Jeffrey W Barker, Ardalan Aarabi, Theodore J Huppert, M Suzuki, I Miyai, T

Ono, K Kubota, R J Cooper, J Selb, L Gagnon, D Phillip, H W Schytz, H K

Iversen, M Ashina, J C Ye, S Tak, K E Jang, J Jung, and J Jang. “Autoregressive

model based algorithm for correcting motion and serially correlated errors in

149

BIBLIOGRAPHY

fNIRS”. In: Inst. Stat. Math. 22. A. M. Dale Hum. Brain Mapp 4.8 (2013),

pp. 35–54. issn: 2156-7085. doi:10.1364/BOE.4.001366.

[7] Lauren L. Emberson, Benjamin D. Zinszer, Rajeev D.S. Raizada, and Richard

N. Aslin. “Decoding the infant mind: Multivariate pattern analysis (MVPA)

usingfNIRS”. In: PLoS ONE 12.4 (2017), pp. 1–23. issn: 19326203. doi:10.

1371/journal.pone.0172500.url:http://dx.doi.org/10.1371/journal.

pone.0172500.

[8] Benjamin D. Zinszer, Laurie Bayet, Lauren L. Emberson, Rajeev D. S. Raizada,

and Richard N. Aslin. “Decoding semantic representations from functional near-

infrared spectroscopy signals”. In: Neurophotonics 5.01 (2017), p. 1. issn: 2329-

423X. doi:10.1117/1.NPh.5.1.011003.url:https://www.spiedigitallibrary.

org/journals/neurophotonics/volume- 5/issue- 01/011003/Decoding-

semantic-representations-from-functional-near-infrared-spectroscopy-

signals/10.1117/1.NPh.5.1.011003.full.

[9] Jessica Gemignani, Eike Middell, Randall L Barbour, Harry L Graber, and Ben-

jamin Blankertz. “Improving the analysis of near-infrared spectroscopy data

with multivariate classification of hemodynamic patterns : a theoretical formu-

lation and validation”. In: Journal of Neural Engineering 15.4 (2018), 045001

(15pp). issn: 1741-2560. doi:10.1088/1741-2552/aabb7c.url:https://

doi.org/10.1088/1741-2552/aabb7c.

[10] Jessica Gemignani, Benjamin Blankertz, Richard N. Aslin, and K. R. Pugh.

“Observing the effects of literacy acquisition on the brain with fNIRS”. In:

NeuroImage in preparation (2019).

[11] Jessica Gemignani, Laurie Bayet, Claire Kabdebon, Benjamin Blankertz, Ken-

neth R. Pugh, and Richard N. Aslin. “Classifying the mental representation

of word meaning in children with Multivariate Pattern Analysis of fNIRS”. In:

Proceedings of the Annual International Conference of the IEEE Engineering in

Medicine and Biology Society, EMBS. IEEE, 2018.

[12] Eric C Peterson, Zhengfeng Wang, and Gavin Britz. Regulation of cerebral blood

flow. 2011. doi:10.1155/2011/823525.

[13] C.S. Roy and C.S Sherrington. “On the Regulation of the Blood-supply of the

Brain”. In: Journal of Physiology 11 (1890), pp. 85–158.

150

BIBLIOGRAPHY

[14] Mireille Bélanger, Igor Allaman, and Pierre J. Magistretti. “Brain energy metabolism:

Focus on Astrocyte-neuron metabolic cooperation”. In: Cell Metabolism 14.6

(2011), pp. 724–738. issn: 15504131. doi:10.1016/j.cmet.2011.08.016.

arXiv: arXiv:1011.1669v3.

[15] Helene Girouard and Costantino Iadecola. “Regulation of the Cerebral Cir-

culation Neurovascular coupling in the normal brain and in hypertension ,

stroke , and Alzheimer disease”. In: 10021 (2010), pp. 328–335. doi:10.1152/

japplphysiol.00966.2005.

[16] R G Shulman and D L Rothman. “Brain Energetics and Neuronal Activity:

Applications to fMRI and Medicine”. In: Medicine (2004).

[17] Hellmuth Obrig and Arno Villringer. “Beyond the visible - Imaging the hu-

man brain with light”. In: Journal of Cerebral Blood Flow and Metabolism 23.1

(2003), pp. 1–18. issn: 0271678X. doi:10.1097/01.WCB.0000043472.45775.

29.

[18] GK Aguirre, E Zarahn, and M. D’Esposito. “The Variability of Human, BOLD

Hemodynamic Responses•1”. In: Neuroimage 8.4 (1998), pp. 360–369. url:

http://www.sciencedirect.com/science/article/pii/S105381199890369X.

[19] D. T. Delpy and M. Cope. “Quantification in tissue near-infrared spectroscopy”.

In: Philosophical Transactions of the Royal Society B: Biological Sciences 352.1354

(1997), pp. 649–659. issn: 0962-8436. doi:10.1098/rstb.1997.0046.

[20] Felix Scholkmann, Stefan Kleiser, Andreas Jaakko Metz, Raphael Zimmermann,

Juan Mata Pavia, Ursula Wolf, and Martin Wolf. “A review on continuous

wave functional near-infrared spectroscopy and imaging instrumentation and

methodology”. In: NeuroImage 85 (2014), pp. 6–27. issn: 10538119. doi:10.

1016/j.neuroimage.2013.05.004.url:http://dx.doi.org/10.1016/j.

neuroimage.2013.05.004.

[21] Eiji Okada, Michael Firbank, Martin Schweiger, Simon R Arridge, Mark Cope,

and David T Delpy. “Theoretical and experimental investigation of near-infrared

light propagation in a model of the adult head”. In: Applied optics 36.1 (1997),

pp. 21–31.

[22] Eiji Okada, Michael Firbank, and David T Delpy. “The effect of overlying tissue

on the spatial sensitivity profile of near-infrared spectroscopy”. In: Physics in

Medicine & Biology 40.12 (1995), p. 2093.

151

BIBLIOGRAPHY

[23] Eiji Okada and David T Delpy. “Near-infrared light propagation in an adult

head model. II. Effect of superficial tissue thickness on the sensitivity of the

near-infrared spectroscopy signal”. In: Applied optics 42.16 (2003), pp. 2915–

2921.

[24] Marco Ferrari and Valentina Quaresima. “A brief review on the history of hu-

man functional near-infrared spectroscopy (fNIRS) development and fields of

application”. In: NeuroImage 63.2 (2012), pp. 921–935. issn: 10538119. doi:

10.1016/j.neuroimage.2012.03.049.

[25] Matthew Caldwell, Felix Scholkmann, Ursula Wolf, Martin Wolf, Clare Elwell,

and Ilias Tachtsidis. “Modelling confounding effects from extracerebral contam-

ination and systemic factors on functional near-infrared spectroscopy”. In: Neu-

roImage 143 (2016), pp. 91–105. issn: 10959572. doi:10.1016/j.neuroimage.

2016.08.058.url:http://dx.doi.org/10.1016/j.neuroimage.2016.08.

058.

[26] Ilias Tachtsidis and Felix Scholkmann. “False positives and false negatives in

functional near-infrared spectroscopy: issues, challenges, and the way forward”.

In: Neurophotonics 3.3 (2016), p. 031405.

[27] Martin Wolf, Ursula Wolf, Vlad Toronov, Antonios Michalos, L. Adelina Paunescu,

Jee Hyun Choi, and Enrico Gratton. “Different time evolution of oxyhemoglobin

and deoxyhemoglobin concentration changes in the visual and motor cortices

during functional stimulation: A near-infrared spectroscopy study”. In: Neu-

roImage 16.3 I (2002), pp. 704–712. issn: 10538119. doi:10.1006/nimg.2002.

1128.

[28] Felix Scholkmann and Martin Wolf. “General equation for the differential path-

length factor of the frontal human head depending on wavelength and age”. In:

Journal of Biomedical Optics (2013). issn: 1083-3668. doi:10.1117/1.JBO.

18.10.105004.

[29] M. Essenpreis, C. E. Elwell, M. Cope, P. van der Zee, S. R. Arridge, and D.

T. Delpy. “Spectral dependence of temporal point spread functions in human

tissues”. In: Applied Optics 32.4 (1993), p. 418. issn: 0003-6935. doi:10.1364/

AO.32.000418.url:https://www.osapublishing.org/abstract.cfm?URI=

ao-32-4-418.

152

BIBLIOGRAPHY

[30] Judith D. Schaechter. Motor rehabilitation and brain plasticity after hemiparetic

stroke. 2004. doi:10.1016/j.pneurobio.2004.04.001.

[31] Felix Biessmann, Frank C. Meinecke, Arthur Gretton, Alexander Rauch, Gregor

Rainer, Nikos K. Logothetis, and Klaus Robert Müller. “Temporal kernel CCA

and its application in multimodal neuronal data analysis”. In: Machine Learning

79.1-2 (2010), pp. 5–27. issn: 15730565. doi:10.1007/s10994-009-5153-3.

[32] Felix Biessmann, Sergey Plis, Frank C Meinecke, Tom Eichele, and Klaus-Robert

Müller. “Analysis of Multimodal Neuroimaging Data”. In: IEEE Reviews in

Biomedical Engineering 4 (2011), pp. 26–58.

[33] Jonathan R Wolpaw, Niels Birbaumer, Dennis J McFarland, Gert Pfurtscheller,

and Theresa M Vaughan. “Brain-computer interfaces for communication and

control.” In: Clinical neurophysiology : official journal of the International Fed-

eration of Clinical Neurophysiology 113.6 (2002), pp. 767–91. issn: 1388-2457.

doi:10.1016/S1388-2457(02)00057-3.url:http://www.ncbi.nlm.nih.

gov/pubmed/12048038.

[34] Siamac Fazli, Jan Mehnert, Jens Steinbrink, Gabriel Curio, Arno Villringer,

Klaus-Robert Müller, and Benjamin Blankertz. “Enhanced performance by a hy-

brid NIRS–EEG brain computer interface”. In: NeuroImage 59.1 (2012), pp. 519–

529. issn: 10538119. doi:10.1016/j.neuroimage.2011.07.084.url:http:

//linkinghub.elsevier.com/retrieve/pii/S1053811911008792.

[35] Silke Telkemeyer, Sonja Rossi, Stefan P Koch, Till Nierhaus, Jens Steinbrink,

David Poeppel, Hellmuth Obrig, and Isabell Wartenburger. “Sensitivity of new-

born auditory cortex to the temporal structure of sounds.” In: The Journal of

neuroscience : the official journal of the Society for Neuroscience 29.47 (2009),

pp. 14726–33. issn: 1529-2401. doi:10.1523/JNEUROSCI.1246-09.2009.url:

http://www.ncbi.nlm.nih.gov/pubmed/19940167.

[36] Sonja Rossi, Silke Telkemeyer, Isabell Wartenburger, and Hellmuth Obrig. “Shed-

ding light on words and sentences: Near-infrared spectroscopy in language re-

search”. In: Brain and Language 121.2 (2012), pp. 152–163. issn: 0093934X.

doi:10.1016/j.bandl.2011.03.008.url:http://dx.doi.org/10.1016/j.

bandl.2011.03.008.

[37] F. Wallois, A. Patil, C. Héberlé, and R. Grebe. “EEG-NIRS in epilepsy in

children and neonates”. In: Neurophysiologie Clinique/Clinical Neurophysiology

153

BIBLIOGRAPHY

40.5-6 (2010), pp. 281–292. issn: 09877053. doi:10.1016/j.neucli.2010.08.

004.

[38] John R Glover. “Overview of Functional Magnetic Resonance Imaging”. In:

Neurosurg Clin N Am. 22.2 (2011), pp. 133–139. doi:10.1016/j.nec.2010.

11.001.Overview.

[39] Michael K. Stehling, Robert Turner, and Peter Mansfield. Echo-planar imaging:

Magnetic resonance imaging in a fraction of a second. 1991. doi:10.1126/

science.1925560.

[40] Fa Hsuan Lin, Kevin W.K. Tsai, Ying Hua Chu, Thomas Witzel, Aapo Num-

menmaa, Tommi Raij, Jyrki Ahveninen, Wen Jui Kuo, and John W. Belliveau.

Ultrafast inverse imaging techniques for fMRI. 2012. doi:10.1016/j.neuroimage.

2012.01.072.

[41] Muyue Yang, Zhen Yang, Tifei Yuan, Wuwei Feng, and Pu Wang. “A systemic

review of functional near-infrared spectroscopy for stroke: Current application

and future directions”. In: Frontiers in Neurology 10.58 (2019). issn: 16642295.

doi:10.3389/fneur.2019.00058.

[42] Pei Yi Lin, Sang I. Lin, Trevor Penney, and Jia Jin Jason Chen. “Applications

of near infrared spectroscopy and imaging for motor rehabilitation in stroke pa-

tients”. In: Journal of Medical and Biological Engineering 29.5 (2009), pp. 210–

221. issn: 16090985.

[43] Hellmuth Obrig. “NIRS in clinical neurology - a ’promising’ tool?” In: NeuroIm-

age 85.85 (2014), pp. 535–546. issn: 10538119. doi:10.1016/j.neuroimage.

2013.03.045. arXiv: NIHMS150003.

[44] Lena H. Ernst, Sabrina Schneider, Ann Christine Ehlis, and Andreas J. Fallgat-

ter. Functional near infrared spectroscopy in psychiatry: A critical review. 2012.

doi:10.1255/jnirs.970.

[45] Simone Cutini, Sara Basso Moro, and Silvia Bisconti. “Review: Functional near

infrared optical imaging in cognitive neuroscience: An introductory review”. In:

Journal of Near Infrared Spectroscopy 20.1 (2012), pp. 75–92. issn: 09670335.

doi:10.1255/jnirs.969.

154

BIBLIOGRAPHY

[46] Sabrina Brigadoi, Lisa Ceccherini, Simone Cutini, Fabio Scarpa, Pietro Scat-

turin, Juliette Selb, Louis Gagnon, David A. Boas, and Robert J. Cooper. “Mo-

tion artifacts in functional near-infrared spectroscopy: A comparison of mo-

tion correction techniques applied to real cognitive data”. In: NeuroImage 85.1

(2014), pp. 181–191. issn: 10538119. doi:10.1016/j.neuroimage.2013.04.

082. arXiv: NIHMS150003.

[47] Judit Gervain. “Plasticity in early language acquisition: the effects of prenatal

and early childhood experience”. In: Current Opinion in Neurobiology 35.JUNE

(2015), pp. 13–20. issn: 09594388. doi:10.1016/j.conb.2015.05.004.url:

http://linkinghub.elsevier.com/retrieve/pii/S095943881500094X.

[48] M. Pena, A. Maki, D. Kovacic, G. Dehaene-Lambertz, H. Koizumi, F. Bouquet,

and J. Mehler. “Sounds and silence: An optical topography study of language

recognition at birth”. In: Proceedings of the National Academy of Sciences 100.20

(2003), pp. 11702–11705. issn: 0027-8424. doi:10.1073/pnas.1934290100.

[49] Richard N. Aslin. “Questioning the questions that have been asked about the

infant brain using near-infrared spectroscopy”. In: Cognitive Neuropsychology

29.1-2 (2012), pp. 7–33. issn: 02643294. doi:10.1080/02643294.2012.654773.

[50] Judit Gervain, Jacques Mehler, Janet F. Werker, Charles A. Nelson, Gergely

Csibra, Sarah Lloyd-Fox, Mohinish Shukla, and Richard N. Aslin. “Near-infrared

spectroscopy: A report from the McDonnell infant methodology consortium”.

In: Developmental Cognitive Neuroscience 1.1 (2011), pp. 22–46. issn: 18789293.

doi:10.1016/j.dcn.2010.07.004. arXiv: arXiv:1011.1669v3.

[51] S. Lloyd-Fox, A. Blasi, and C. E. Elwell. “Illuminating the developing brain:

The past, present and future of functional near infrared spectroscopy”. In: Neu-

roscience and Biobehavioral Reviews 34.3 (2010), pp. 269–284. issn: 01497634.

doi:10.1016/j.neubiorev.2009.07.008.

[52] Kenneth R. Pugh, Nicole Landi, Jonathan L. Preston, W. Einar Mencl, Alison

C. Austin, Daragh Sibley, Robert K. Fulbright, Mark S. Seidenberg, Elena L.

Grigorenko, R. Todd Constable, Peter Molfese, and Stephen J. Frost. “The rela-

tionship between phonological and auditory processing and brain organization

in beginning readers”. In: Brain and Language 125.2 (2013), pp. 173–183. issn:

0093934X. doi:10.1016/j.bandl.2012.04.004. arXiv: NIHMS150003.url:

http://dx.doi.org/10.1016/j.bandl.2012.04.004.

155

BIBLIOGRAPHY

[53] Robert J. Cooper, Juliette Selb, Louis Gagnon, Dorte Phillip, Henrik W. Schytz,

Helle K. Iversen, Messoud Ashina, and David A. Boas. “A systematic com-

parison of motion artifact correction techniques for functional near-infrared

spectroscopy”. In: Frontiers in Neuroscience 6.OCT (2012), pp. 1–10. issn:

16624548. doi:10.3389/fnins.2012.00147.

[54] Yiheng Zhang, Dana H. Brooks, Maria Angela Franceschini, and David A. Boas.

“Eigenvector-based spatial filtering for reduction of physiological interference in

diffuse optical imaging”. In: Journal of Biomedical Optics 10.1 (2005), p. 011014.

issn: 10833668. doi:10.1117/1.1852552.

[55] Behnam Molavi and Guy A. Dumont. “Wavelet-based motion artifact removal

for functional near-infrared spectroscopy”. In: Physiological Measurement 33.2

(2012), pp. 259–270. issn: 09673334. doi:10.1088/0967-3334/33/2/259.

[56] F Scholkmann, S Spichtig, T Muehlemann, and M Wolf. “How to detect and

reduce movement artifacts in near-infrared imaging using moving standard de-

viation and spline interpolation.” In: Physiological measurement 31.5 (2010),

pp. 649–662. issn: 0967-3334. doi:10.1088/0967-3334/31/5/004.

[57] Ivan W. Selesnick, Harry L Graber, Ying Ding, Tong Zhang, and Randall L

Barbour. “Transient Artifact Reduction using Sparse Optimization”. In: 62.1

(2014), pp. 5–7.

[58] Xian Zhang, Jack Adam Noah, and Joy Hirsch. “Separation of the global and

local components in functional near-infrared spectroscopy signals using principal

component spatial filtering”. In: Neurophotonics 3.1 (2016), p. 015004. issn:

2329-423X. doi:10.1117/1.NPh.3.1.015004.url:http://neurophotonics.

spiedigitallibrary.org/article.aspx?doi=10.1117/1.NPh.3.1.015004.

[59] Sabrina Brigadoi and Robert J Cooper. “How short is short? Optimum source–detector

distance for short-separation channels in functional near-infrared spectroscopy”.

In: Neurophotonics 2.2 (2015), p. 025005. issn: 2329-423X. doi:10.1117/1.

nph.2.2.025005.

[60] Ilias Tachtsidis, Peck H. Koh, Charlotte Stubbs, and Clare E. Elwell. “Func-

tional optical topography analysis using statistical parametric mapping (SPM)

methodology with and without physiological confounds”. In: Advances in Exper-

imental Medicine and Biology. Vol. 662. 2010, pp. 237–243. isbn: 9781441912398.

doi:10.1007/978-1-4419-1241-1_34.

156

BIBLIOGRAPHY

[61] Tsukasa Funane, Hirokazu Atsumori, Takusige Katura, Akiko N. Obata, Hi-

roki Sato, Yukari Tanikawa, Eiji Okada, and Masashi Kiguchi. “Quantitative

evaluation of deep and shallow tissue layers’ contribution to fNIRS signal using

multi-distance optodes and independent component analysis”. In: NeuroImage

85 (2014), pp. 150–165. issn: 10538119. doi:10.1016/j.neuroimage.2013.

02.026.

[62] Rolf B. Saager and Andrew J. Berger. “Direct characterization and removal

of interfering absorption trends in two-layer turbid media”. In: Journal of the

Optical Society of America A 22.9 (2005), p. 1874. issn: 1084-7529. doi:10.

1364/JOSAA.22.001874.url:https://www.osapublishing.org/abstract.

cfm?URI=josaa-22-9-1874.

[63] Paola Pinti, Felix Scholkmann, Antonia Hamilton, Paul Burgess, and Ilias Tacht-

sidis. “Current Status and Issues Regarding Pre-processing of fNIRS Neuroimag-

ing Data: An Investigation of Diverse Signal Filtering Methods Within a General

Linear Model Framework”. In: Frontiers in Human Neuroscience 12.January

(2019), pp. 1–21. doi:10.3389/fnhum.2018.00505.

[64] Theodore J. Huppert, Solomon G. Diamond, Maria A. Franceschini, and David

A Boas. “HomER: a review of time-series analysis methods for near- infrared

spectroscopy of the brain”. In: 15.10 (2009), pp. 1203–1214. issn: 08966273.

doi:10.1016/j.drugalcdep.2008.02.002.A. arXiv: NIHMS150003.

[65] Hendrik Santosa, Xuetong Zhai, Frank Fishburn, and Theodore Huppert. “The

NIRS Brain AnalyzIR Toolbox”. In: (2018). doi:10.3390/a11050073.

[66] Yong Xu, Harry L. Graber, and Randall L. Barbour. “nirsLAB: A Computing

Environment for fNIRS Neuroimaging Data Analysis”. In: Biomedical Optics

2014 (2014), BM3A.1. doi:10 .1364/ BIOMED.2014 .BM3A. 1.url:https:

//www.osapublishing.org/abstract.cfm?uri=BIOMED-2014-BM3A.1.

[67] Sungho Tak and Jong Chul Ye. “Statistical analysis of fNIRS data: A com-

prehensive review”. In: NeuroImage 85 (2014), pp. 72–91. issn: 10538119. doi:

10.1016/j.neuroimage.2013.06.016.url:http://linkinghub.elsevier.

com/retrieve/pii/S1053811913006538.

[68] Karl J Friston, Andrew P Holmes, Keith J Worsley, J-P Poline, Chris D Frith,

and Richard SJ Frackowiak. “Statistical parametric maps in functional imaging:

a general linear approach”. In: Human brain mapping 2.4 (1994), pp. 189–210.

157

BIBLIOGRAPHY

[69] Geoffrey M Boynton, Stephen A Engel, Gary H Glover, and David J Heeger.

“Linear Systems Analysis of Functional Magnetic Resonance Imaging in Human

V1”. In: The Journal of Neuroscience 16.13 (1996), pp. 4207–4221. issn: 0270-

6474. url:papers3 : / / publication / uuid / B2E61BDB - C31A - 4BC7 - 8119 -

A9F33C472D11.

[70] Hendrik Santosa, Theodore J Huppert, Hendrik Santosa, Frank Fishburn, Xue-

tong Zhai, and Theodore J Huppert. “Investigation of the sensitivity-specificity

of canonical- and deconvolution-based linear models in evoked functional near-

infrared spectroscopy”. In: Neurophotonics 6.02 (2019), p. 1. issn: 2329-423X.

doi:10.1117/1.NPh.6.2.025009.url:https://www.spiedigitallibrary.

org/journals/neurophotonics/volume-6/issue-02/025009/Investigation-

of-the-sensitivity-specificity-of-canonical--and-deconvolution/

10.1117/1.NPh.6.2.025009.full.

[71] A C Aitken. “IV.—On Least Squares and Linear Combination of Observations”.

In: Proceedings of the Royal Society of Edinburgh 55 (1936), pp. 42–48. doi:

10.1017/S0370164600014346.

[72] J Ye, S Tak, K Jang, and J Jung. “NIRS-SPM: Statistical parametric map-

ping for near-infrared spectroscopy”. In: NeuroImage 44.2 (2009), pp. 428–447.

issn: 10538119. doi:10 . 1016 / j . neuroimage . 2008 . 08 . 036.url:http :

//linkinghub.elsevier.com/retrieve/pii/S1053811908009695.

[73] Richard Henson, Michael D Rugg, Karl J Friston, and Others. “The choice of

basis functions in event-related fMRI”. In: NeuroImage 13.6 (2001), p. 149.

[74] Minako Uga, Ippeita Dan, Toshifumi Sano, Haruka Dan, and Eiju Watanabe.

“Optimizing the general linear model for functional near-infrared spectroscopy:

an adaptive hemodynamic response function approach”. In: Neurophotonics 1.1

(2014), p. 015004. issn: 2329-423X. doi:10.1117/1.nph.1.1.015004.

[75] V. D. Calhoun, M. C. Stevens, G. D. Pearlson, and K. A. Kiehl. “fMRI analysis

with the general linear model: Removal of latency-induced amplitude bias by

incorporation of hemodynamic derivative terms”. In: NeuroImage 22.1 (2004),

pp. 252–257. issn: 10538119. doi:10.1016/j.neuroimage.2003.12.029.

[76] M A Lindquist and T D Wager. “Validity and Power in Hemodynamic Response

Modeling: A Comparison Study and a New Approach”. In: Hum Brain Mapp

28.8 (2012), pp. 764–784. doi:10.1002/hbm.20310.Validity.

158

BIBLIOGRAPHY

[77] C. H. Liao, K. J. Worsley, J. B. Poline, J. A.D. Aston, G. H. Duncan, and

A. C. Evans. “Estimating the delay of the fMRI response”. In: NeuroImage 16.3

I (2002), pp. 593–606. issn: 10538119. doi:10.1006/nimg.2002.1096.

[78] Valeria Della-Maggiore, Wilkin Chau, Pedro R. Peres-Neto, and Anthony R.

McIntosh. “An empirical comparison of SPM preprocessing parameters to the

analysis of fMRI data”. In: NeuroImage 17.1 (2002), pp. 19–28. issn: 10538119.

doi:10.1006/nimg.2002.1113.

[79] Guilherne Augusto Zimeo Morais, Felix Scholkmann, Joana Bisol Balardin,

Rogério Akira Furucho, Renan Costa Vieira de Paula, Claudinei Eduardo Bi-

azoli, and João Ricardo Sato. “Non-neuronal evoked and spontaneous hemo-

dynamic changes in the anterior temporal region of the human head may lead

to misinterpretations of functional near-infrared spectroscopy signals”. In: Neu-

rophotonics 5.01 (2017), p. 1. issn: 2329-423X. doi:10.1117/1.NPh.5.1.

011002.url:https://www.spiedigitallibrary.org/journals/neurophotonics/

volume- 5/issue- 01/011002/Non- neuronal- evoked- and- spontaneous-

hemodynamic- changes- in-the- anterior/10.1117/1.NPh.5.1.011002.

full.

[80] Nils Volkening, Anirudh Unni, Birte Sofie Löffler, Sebastian Fudickar, Jochem

W. Rieger, and Andreas Hein. “Characterizing the Influence of Muscle Activ-

ity in fNIRS Brain Activation Measurements”. In: IFAC-PapersOnLine 49.11

(2016), pp. 84–88. issn: 24058963. doi:10.1016/j.ifacol.2016.08.013.

[81] M. Ferrari, T. Binzoni, and V. Quaresima. “Oxidative metabolism in mus-

cle”. In: Philosophical Transactions of the Royal Society B: Biological Sciences

352.1354 (1997), pp. 677–683. issn: 09628436. doi:10.1098/rstb.1997.0049.

[82] Toru Yamamoto and Toshinori Kato. “Paradoxical correlation between signal

in functional magnetic resonance imaging and deoxygenated haemoglobin con-

tent in capillaries: A new theoretical explanation”. In: Physics in Medicine and

Biology 47.7 (2002), pp. 1121–1141. issn: 00319155. doi:10.1088/0031-9155/

47/7/309.

[83] Thomas Nichols and Satoru Hayasaka. “Controlling the familywise error rate

in functional neuroimaging: A comparative review”. In: Statistical Methods in

Medical Research 12.5 (2003), pp. 419–446. issn: 09622802. doi:10 . 1191 /

0962280203sm341ra. arXiv: arXiv:1711.02231v1.

159

BIBLIOGRAPHY

[84] Yoav Benjamini and Yosef Hochberg. Controlling the False Discovery Rate: A

Practical and Powerful Approach to Multiple Testing. 1995. doi:10 . 2307 /

3866483.

[85] Archana K Singh and Ippeita Dan. “Exploring the false discovery rate in mul-

tichannel NIRS”. In: Neuroimage 33.2 (2006), pp. 542–549.

[86] Hellmuth Obrig, Sonja Rossi, Silke Telkemeyer, and Isabell Wartenburger. “From

acoustic segmentation to language processing: evidence from optical imaging”.

In: Frontiers in Neuroenergetics 2.13 (2010). issn: 16626427. doi:10.3389/

fnene.2010.00013.url:http://journal.frontiersin.org/article/10.

3389/fnene.2010.00013/abstract.

[87] Shirley Coyle, Tomás Ward, Charles Markham, and Gary McDarby. “On the

suitability of near-infrared (NIR) systems for next-generation brain-computer

interfaces”. In: Physiological Measurement 25.4 (2004), pp. 815–822. issn: 09673334.

doi:10.1088/0967-3334/25/4/003.

[88] Noman Naseer and Keum-Shik Hong. “fNIRS-based brain-computer interfaces: a

review”. In: Frontiers in Human Neuroscience 9.January (2015), pp. 1–15. issn:

1662-5161. doi:10.3389/fnhum.2015.00003. arXiv: 0005074v1 [arXiv:astro-ph].

url:http://journal.frontiersin.org/article/10.3389/fnhum.2015.

00003/abstract.

[89] Abdelhak Mahmoudi, Sylvain Takerkart, Fakhita Regragui, Driss Boussaoud,

and Andrea Brovelli. “Multivoxel pattern analysis for fMRI data: A review”.

In: Computational and Mathematical Methods in Medicine 2012 (2012). issn:

1748670X. doi:10.1155/2012/961257.

[90] Kenneth A. Norman, Sean M. Polyn, Greg J. Detre, and James V. Haxby.

“Beyond mind-reading: multi-voxel pattern analysis of fMRI data”. In: Trends

in Cognitive Sciences 10.9 (2006), pp. 424–430. issn: 13646613. doi:10.1016/

j.tics.2006.07.005.

[91] John-Dylan Haynes and Geraint Rees. “Decoding mental states from brain ac-

tivity in humans”. In: Nature Reviews Neuroscience 7.7 (2006), pp. 523–534.

issn: 1471-003X. doi:10.1038/nrn1931.url:http://www.nature.com/

doifinder/10.1038/nrn1931.

160

BIBLIOGRAPHY

[92] Christopher M. Bishop. Pattern Recognition and Machine Learning. Springer,

2006, p. 758. isbn: 9780387310732. doi:10.1021/jo01026a014. arXiv: arXiv:

1011.1669v3.

[93] Richard O. Duda, Peter E. Hart, and David G. Stork. Pattern Classification.

2001. doi:10.1007/BF01237942. arXiv: 0-387-31073-8.

[94] Benjamin Blankertz, Steven Lemm, Matthias Treder, Stefan Haufe, and K.-R.

Müller. “Single-trial analysis and classification of {ERP} components–a tuto-

rial”. In: NeuroImage 56.2 (2011), pp. 814–825.

[95] Johannes Höhne, Daniel Bartz, Martin N. Hebart, Klaus-Robert Müller, and

Benjamin Blankertz. “Analyzing neuroimaging data with subclasses: A shrink-

age approach”. In: NeuroImage 124 (2016), pp. 740–751. issn: 10538119. doi:

10.1016/j.neuroimage.2015.09.031.url:http://linkinghub.elsevier.

com/retrieve/pii/S1053811915008460.

[96] Maogeng Xia, Sutao Song, Li Yao, and Zhiying Long. “An empirical comparison

of different LDA methods in fMRI-based brain states decoding”. In: Bio-Medical

Materials and Engineering (2015). issn: 18783619. doi:10.3233/BME-151415.

[97] Gunther Bauernfeind, David Steyrl, Clemens Brunner, and Gernot R. Muller-

Putz. “Single trial classification of fNIRS-based brain-computer interface mental

arithmetic data: A comparison between different classifiers”. In: 2014 36th An-

nual International Conference of the IEEE Engineering in Medicine and Biology

Society, EMBC 2014. 2014. isbn: 9781424479290. doi:10.1109/EMBC.2014.

6944008.

[98] Jaeyoung Shin, Klaus-R Müller, and Han-Jeong Hwang. “Near-infrared spec-

troscopy (NIRS)-based eyes-closed brain-computer interface (BCI) using pre-

frontal cortex activation due to mental arithmetic”. In: Scientific Reports 6.Oc-

tober (2016), p. 36203. issn: 2045-2322. doi:10.1038/srep36203.url:http:

//www.nature.com/articles/srep36203.

[99] Stephen Marsland. Machine Learning: An Algorithmic Perspective. Chapman

and Hall/CRC, 2014, p. 406. isbn: 1420067192. url:http://books.google.

com/books?id=n66O8a4SWGEC{\&}pgis=1.

[100] Louis Gagnon, Katherine Perdue, Douglas N. Greve, Daniel Goldenholz, Gaya-

tri Kaskhedikar, and David A. Boas. “Improved recovery of the hemodynamic

161

BIBLIOGRAPHY

response in diffuse optical imaging using short optode separations and state-

space modeling”. In: NeuroImage 56.3 (2011), pp. 1362–1371. issn: 10538119.

doi:10.1016/j.neuroimage.2011.03.001. arXiv: NIHMS150003.url:http:

//dx.doi.org/10.1016/j.neuroimage.2011.03.001.

[101] Jeffrey W. Barker. “Estimation of Cerebral Physiology and Hemodynamics via

Near-Infrared Spectroscopy”. PhD thesis. University of Pittsburgh, 2014. url:

http://d-scholarship.pitt.edu/.

[102] Claude Julien. “The enigma of Mayer waves: Facts and models”. In: Cardio-

vascular Research 70.1 (2006), pp. 12–21. issn: 00086363. doi:10.1016/j.

cardiores.2005.11.008.

[103] Meryem A. Yücel, Juliette Selb, Christopher M. Aasted, Pei-Yi Lin, David Bor-

sook, Lino Becerra, and David A. Boas. “Mayer waves reduce the accuracy

of estimated hemodynamic response functions in functional near-infrared spec-

troscopy”. In: Biomedical Optics Express 7.8 (2016), p. 3078. issn: 2156-7085.

doi:10 . 1364 / BOE . 7 . 003078.url:https : / / www . osapublishing . org /

abstract.cfm?URI=boe-7-8-3078.

[104] Benjamin Blankertz, Laura Acqualagna, Sven Dähne, Stefan Haufe, Matthias

Schultze-Kraft, Irene Sturm, Marija Ušćumlic, Markus A Wenzel, Gabriel Cu-

rio, and Klaus-Robert Müller. “The Berlin Brain-Computer Interface: Progress

Beyond Communication and Control”. In: Frontiers in Neuroscience 10 (2016),

p. 530. issn: 1662-453X. doi:10 . 3389 / fnins . 2016 . 00530.url:https :

//www.frontiersin.org/article/10.3389/fnins.2016.00530.

[105] BBCI toolbox.url:https://github.com/bbci/bbci{\_}public (visited on

11/03/2016).

[106] Masaki Kameyama, Masato Fukuda, Toru Uehara, and Masahiko Mikuni. “Sex

and age dependencies of cerebral blood volume changes during cognitive acti-

vation: A multichannel near-infrared spectroscopy study”. In: NeuroImage 22.4

(2004), pp. 1715–1721. issn: 10538119. doi:10.1016/j.neuroimage.2004.03.

050.

[107] Hongyu Yang, Zhenyu Zhou, Yun Liu, Zongcai Ruan, Hui Gong, Qingming Luo,

and Zuhong Lu. “Gender difference in hemodynamic responses of prefrontal

area to emotional stress by near-infrared spectroscopy”. In: Behavioural Brain

162

BIBLIOGRAPHY

Research 178.1 (2007), pp. 172–176. issn: 01664328. doi:10.1016/j.bbr.

2006.11.039.

[108] M. J. Herrmann, A. Walter, A. C. Ehlis, and A. J. Fallgatter. “Cerebral oxy-

genation changes in the prefrontal cortex: Effects of age and gender”. In: Neu-

robiology of Aging 27.6 (2006), pp. 888–894. issn: 01974580. doi:10.1016/j.

neurobiolaging.2005.04.013.

[109] J.A.E Anderson, K.L Campbell, T Amer, C.L Grady, and L Hasher. “Timing

Is everything: Age differences in the cognitive control network are modulated

by time of day.” In: Psychology and aging 29.3 (2014), pp. 648–657. issn: 1939-

1498. doi:10.1037/a0037243.url:http://www.ncbi.nlm.nih.gov/pubmed/

24999661.

[110] Jack L. Lancaster, Marty G. Woldorff, Lawrence M. Parsons, Mario Liotti, Cata-

rina S. Freitas, Lacy Rainey, Peter V. Kochunov, Dan Nickerson, Shawn A.

Mikiten, and Peter T. Fox. “Automated Talairach Atlas labels for functional

brain mapping”. In: Human Brain Mapping 10.3 (2000), pp. 120–131. issn:

10659471. doi:10.1002/1097-0193(200007)10:3<120::AID-HBM30>3.0.CO;

2-8.

[111] Christopher M. Aasted, Meryem A. Yücel, Robert J. Cooper, Jay Dubb, Daisuke

Tsuzuki, Lino Becerra, Mike P. Petkov, David Borsook, Ippeita Dan, and David

A. Boas. “Anatomical guidance for functional near-infrared spectroscopy: At-

lasViewer tutorial”. In: Neurophotonics 2.2 (2015), p. 020801. issn: 2329-423X.

doi:10.1117/1.NPh.2.2.020801.url:http://neurophotonics.spiedigitallibrary.

org/article.aspx?doi=10.1117/1.NPh.2.2.020801.

[112] Mike X Cohen. Analyzing neural time series data: theory and practice. MIT

Press. MIT Press, 2014.

[113] Harry L. Graber, Rabah Al abdi, Yong Xu, Armand P. Asarian, Peter J. Pappas,

Lisa Dresner, Naresh Patel, Kuppuswamy Jagarlamundi, William B. Solomon,

and Randall L. Barbour. “Enhanced resting-state dynamics of the hemoglobin

signal as a novel biomarker for detection of breast cancer”. In: Medical Physics

42.11 (2015), pp. 6406–6424. issn: 00942405. doi:10.1118/1.4932220.url:

http://dx.doi.org/10.1118/1.4932220.

[114] Saying Chen, Pan Ning, Kaoru Sakatani, Huancong Zuo, Wemara Lichty, and

Shimin Zhao. “Auditory-evoked cerebral oxygenation changes in hypoxic-ischemic

163

BIBLIOGRAPHY

encephalopathy of newborn infants monitored by near infrared spectroscopy”.

In: Early Human Development 67.1-2 (2002), pp. 113–121. issn: 03783782. doi:

10.1016/S0378-3782(02)00004-X.

[115] K Sakatani, S Chen, W Lichty, H Zuo, and Y P Wang. “Cerebral blood oxygena-

tion changes induced by auditory stimulation in newborn infants measured by

near infrared spectroscopy.” In: Early human development 55.3 (1999), pp. 229–

236. issn: 0378-3782. doi:S0378-3782(99)00019-5[pii].

[116] Karen E. Waldie, Charlotte E. Haigh, Gjurgjica Badzakova-Trajkov, Jude Buck-

ley, and Ian J. Kirk. “Reading the wrong way with the right hemisphere”.

In: Brain Sciences 3.3 (2013), pp. 1060–1075. issn: 20763425. doi:10.3390/

brainsci3031060.

[117] Donald Shankweiler, W. Einar Mencl, David Braze, Whitney Tabor, Kenneth

R. Pugh, and Robert K. Fulbright. “Reading differences and brain: Cortical

integration of speech and print in sentence processing varies with reader skill”.

In: Developmental Neuropsychology 33.6 (2008), pp. 745–775. issn: 87565641.

doi:10.1080/87565640802418688.

[118] Karen E Waldie and James L Mosley. “Developmental trends in right hemi-

spheric participation in reading”. In: Neuropsychologia 38 (2000), pp. 462–474.

[119] Stanislas Dehaene. Reading in the brain: The new science of how we read. Pen-

guin, 2009.

[120] Archana K. Singh, Masako Okamoto, Haruka Dan, Valer Jurcak, and Ippeita

Dan. “Spatial registration of multichannel multi-subject fNIRS data to MNI

space without MRI”. In: NeuroImage 27.4 (2005), pp. 842–851. issn: 10538119.

doi:10.1016/j.neuroimage.2005.05.019.

[121] N Tzourio-Mazoyer, B Landeau, D Papathanassiou, F Crivello, O Etard, N Del-

croix, B Mazoyer, and M Joliot. “Automated anatomical labeling of activations

in SPM using a macroscopic anatomical parcellation of the MNI MRI single-

subject brain.” In: NeuroImage 15.1 (2002), pp. 273–289. issn: 1053-8119. doi:

10.1006/nimg.2001.0978.

[122] Xian Zhang, Jack Adam Noah, Swethasri Dravida, and Joy Hirsch. “Signal pro-

cessing of functional NIRS data acquired during overt speaking”. In: Neuropho-

tonics 4.04 (2017), p. 1. issn: 2329-423X. doi:10.1117/1.NPh.4.4.041409.

url:https://www.spiedigitallibrary.org/journals/neurophotonics/

164

BIBLIOGRAPHY

volume-4/issue-04/041409/Signal-processing-of-functional-NIRS-

data- acquired- during-overt- speaking/10.1117/1.NPh.4.4.041409.

full.

[123] Melissa Hardy and John Reynolds. “Incorporating Categorical Information into

Regression Models: The Utility of Dummy Variables”. In: Handbook of Data

Analysis. 2012, pp. 209–237. doi:10.4135/9781848608184.n9.

[124] A. Castro-Caldas, K. M. Petersson, A. Reis, S. Stone-Elander, and M. Ingvar.

“The illiterate brain. Learning to read and write during childhood influences the

functional organization of the adult brain”. In: Brain 121.6 (1998), pp. 1053–

1063. issn: 00068950. doi:10.1093/brain/121.6.1053.

[125] A Reis and A Castro-Caldas. “Illiteracy: a cause for biased cognitive develop-

ment.” In: Journal of the International Neuropsychological Society : JINS 3.5

(1997), pp. 444–50. issn: 1355-6177. url:http://www.ncbi.nlm.nih.gov/

pubmed/9322403.

[126] Federico De Martino, Giancarlo Valente, Noël Staeren, John Ashburner, Rainer

Goebel, and Elia Formisano. “Combining multivariate voxel selection and sup-

port vector machines for mapping and classification of fMRI spatial patterns”.

In: NeuroImage 43.1 (2008), pp. 44–58. issn: 10538119. doi:10 . 1016 / j .

neuroimage.2008.06.037.

[127] Christian Habeck and Y Stern. “Multivariate data analysis for neuroimaging

data: overview and Application to Alzheimer’s disease”. In: Cell Biochem Bio-

phys 58.2 (2010), pp. 53–67. doi:10.1007/s12013-010-9093-0.Multivariate.

url:http://link.springer.com/article/10.1007/s12013-010-9093-0.

[128] David D. Cox and Robert L. Savoy. “Functional magnetic resonance imaging

(fMRI) "brain reading": Detecting and classifying distributed patterns of fMRI

activity in human visual cortex”. In: NeuroImage 19.2 (2003), pp. 261–270. issn:

10538119. doi:10.1016/S1053-8119(03)00049-1.

[129] James V Haxby, M Ida Gobbini, Maura L Furey, Alumit Ishai, Jennifer L

Schouten, and Pietro Pietrini. “Distributed and overlapping representations of

faces and objects in ventral temporal cortex”. In: Science 293.September (2001),

pp. 2425–2430. issn: 0036-8075. doi:10.1126/science.1063736.

[130] Irina Simanova, Marcel van Gerven, Robert Oostenveld, and Peter Hagoort.

“Identifying object categories from event-related EEG: Toward decoding of

165

BIBLIOGRAPHY

conceptual representations”. In: PLoS ONE 5.12 (2010), e14465. doi:https:

//doi.org/10.1371/journal.pone.0014465.

[131] Radoslaw Martin Cichy, Dimitrios Pantazis, and Aude Oliva. “Resolving human

object recognition in space and time.” In: Nature neuroscience (2014). issn:

1546-1726. doi:10.1038/nn.3635.

[132] Hesheng Liu, Yigal Agam, Joseph R. Madsen, and Gabriel Kreiman. “Timing,

Timing, Timing: Fast Decoding of Object Information from Intracranial Field

Potentials in Human Visual Cortex”. In: Neuron (2009). issn: 08966273. doi:

10.1016/j.neuron.2009.02.025.

[133] Chih-chung Chang and Chih-jen Lin. “LIBSVM : A Library for Support Vec-

tor Machines”. In: ACM Transactions on Intelligent Systems and Technology

(TIST) 2 (2013), pp. 1–39. issn: 21576904. doi:10.1145/1961189.1961199.

arXiv: 0-387-31073-8.

[134] Masaya Misaki, Youn Kim, Peter A. Bandettini, and Nikolaus Kriegeskorte.

“Comparison of multivariate classifiers and response normalizations for pattern-

information fMRI”. In: NeuroImage 53.1 (2010), pp. 103–118. issn: 10538119.

doi:10.1016/j.neuroimage.2010.05.051.url:http://dx.doi.org/10.

1016/j.neuroimage.2010.05.051.

[135] Kelly Tai and Tom Chau. “Single-trial classification of NIRS signals during

emotional induction tasks: Towards a corporeal machine interface”. In: Journal

of NeuroEngineering and Rehabilitation 6.1 (2009), pp. 1–14. issn: 17430003.

doi:10.1186/1743-0003-6-39.

166