scieee Science in your language
[en] (orig)
Data-driven analysis for
multimodal neuroimaging
vorgelegt von
Felix Bießmann
Von der Fakultät IV Elektrotechnik und Informatik
der Technischen Universität Berlin
zur Erlangung des akademischen Grades
Doktor der Naturwissenschaen (Dr. rer. nat.)
genehmigte Dissertation
Promotionsausschuss:
Vorsitzender: Prof. Dr. Olaf Hellwich (Technische Universität Berlin)
1. Gutachter: Prof. Dr. Klaus-Robert Müller (Technische Universit Berlin)
2. Gutachter: Prof. Dr. Nikos K Logothetis (MPI Tübingen)
3. Gutachter: Prof. Dr. Tom Eichele (University of Bergen)
Tag der wissenschalichen Aussprache: 07.12.2011
Berlin, 2012
D
In memoriam Florian Perleth
Acknowledgements
I grateful for the support of many people during the past few years. Above
all I thank my supervisors Prof. Dr. Klaus-Robert Müller, Prof. Dr. Gregor
Rainer and Prof. Dr. Nikos K. Logothetis. Prof. Dr. Müller always encouraged
me with scientific enthusiasm and helped in countless stimulating discussions with
experienced advice on theoretical and practical aspects of statistical learning the-
ory. With his Machine Learning Group he created an inspiringly interdisciplinary
and friendly atmosphere and gave me the opportunity to pursue research interest
in neuroscience as well as in artificial intelligence in the best way I could imagine.
Furthermore I am very grateful to Prof. Dr. Gregor Rainer for continuous support
and for valuable insights into the experimental side of neuroscience. Of course I
owe my utmost gratitude to Prof. Dr. Nikos K. Logothetis. His pioneering work
on the neurophysiological basis of the hemodynamic signal is the starting point of
this dissertation. I profited enormously from his scientific advice. Without his sup-
port, this dissertation would not have been possible. Also I would like to thank
Dr. Yusuke Murayama for scientific advice, technical help and his interest in new
analysis methods. Moreover I am indebted to the outstanding experimental exper-
tise of Axel Öltermann, Dr. Marc Augath, Dr. Jozien Goense, Dr. Alexander Rauch,
Prof. Dr. Robert Kretz, Jule Veit and Anwesha Bhattacharyya. By the same token
I would like to thank Dr. Arthur Gretton, Dr. Jakob Macke and Prof. Dr. Matthias
Bethge for their help on mathematical problems during my time in Tübingen. I also
thank Dr. Andreas Harth foran interesting research project outside of neuroscience.
I was lucky to meet many friendly, creative and bright minds who helped a lot for
calibrating my sense of whats right and wrong in science and beyond, among them
Michael Gaebler, Dr. David Greenberg, Daniel Lebrecht, Dr. Dr. Franz Kiraly , Ste-
fan Haufe, Paul von Bünau and Jule Veit. Special thanks go to Frank C. Meinecke
forconstant andpatientsupervision. Hisexperienced skepticism, hissharp thinking
and his talent to explain complex concepts in a comprehensive way were invaluable
tome. TohimIalsoowethelayoutofthisthesis. AlsoIwouldliketothankValentina
Mosienko for her patience and support. Finally I owe my deepest gratitude to my
family. e most important decisions which led to this thesis were not taken by me,
but by my parents. ank you for making me curious.
vi
Abstract
N, the measurement, analysis and visualization of neural activity,
contributed considerably to our understanding of information processing in
the brain. e availability of non-invasive neuroimaging devices such as functional
magnetic resonance imaging (fMRI) has been increasing rapidly throughout the last
two decades. Nowadays every interested student can obtain non-invasively high res-
olution image time series of the blood-oxygen level dependent (BOLD) signal using
fMRI. How exactly neural activity is reflected in the BOLD contrast is still subject of
activeresearch. Fora moreaccurateinterpretationof the fMRI signaland the under-
lying neurovascular coupling mechanisms, combined measurements of intracranial
neural activity and fMRI signals are indispensable. Such simultaneous measure-
ments have become technically possible however appropriate analysis methods are
still lacking. Classical analysisapproachesrelyonsimplifying assumptionsaboutthe
neurovascular coupling dynamics. ese assumptions are convenient but numerous
studies have provided empirical evidence against them.
In this dissertation a novel analysis method, termed temporal kernel Canoni-
cal Correlation Analysis (tkCCA), will be developed, tested on artificial data and
applied to experimental data in order to investigate neurovascular coupling mech-
anisms. TkCCA estimates dependency structures between high dimensional data
with complex temporal coupling dynamics. e important advantages of tkCCA
compared to standard methods are a) tkCCA can be directly applied to multimodal
data, b) tkCCA is very efficient for high dimensional data with few data points (as is
the case for fMRI) and c) tkCCA does not make use of restrictive assumptions about
thedatageneratingprocess. InparticulartkCCAcanbeusedtoanalyzehighdimen-
sional simultaneous measurements of neural activity and fMRI signals. Predictions
of neural activity using tkCCA are better than when using classical methods. Basic
research as well as clinical applications can profit from this more accurate predic-
tion. Besides tkCCA is readily applicable to other domains in which data streams
have high dimensional features that are non-instantaneously coupled, such as data
from social networks in the World Wide Web.
viii
Zusammenfassung
B Verfahren in den Neurowissenschaen haben unser Verständnis
von Informationsverarbeitung im Hirn entscheidend geprägt. Die schnelle
Verbreitung von nicht-invasiven bildgebenden Verfahren wie etwa der funktionellen
Magnet-Resonanz Tomographie (fMRT) in den letzten beiden Dekaden erlaubt es
heutzutage jedem interessierten Studenten nicht-invasiv den Blutsauerstoffgehalt
im Hirn zu messen und so umlich hochaufgelöste Zeitreihen von Hirnaktivit
aufzuzeichnen. Wie genau sich jedoch neuronale Aktivit im Blut-Sauerstoff Gehalt
abhängigem Kontrast (englisch: Blood-oxygen level dependent oder BOLD contrast)
wiederspiegelt, isttrotzintensiverForschungan derNeurovaskulären Kopplung nach
wie vor aktuelles Forschungsthema. Für ein besseres Verständnis des fMRT Signals
sind kombinierte Messungen von intrakranialer neuronaler Aktivität und fMRT
Signalen zwingend erforderlich. Diese multimodalen Simultanmessungen sind in-
zwischen technisch möglich. Jedoch fehlen geeignete Analysemethoden. ngige
Verfahren basieren auf vereinfachenden Annahmen über die neurovaskuäre Kop-
plungsdynamik. Diese Annahmensind zwarpraktisch, erwiesen sichaber inzahlre-
ichen Studien als falsch.
In dieser Dissertation wird ein neuartiges Analyseverfahren, temporal kernel
CanonicalCorrelationAnalysis(tkCCA),entwickelt,getestedundangewandt. TkCCA
schätztAbhängigkeitsstruktureninhochdimensionalenDatenmitnicht-instantaner
Kopplung. DieentscheidendenVorteilevontkCCAgegenüberherkömmlichenMeth-
oden sinda) direkte Anwendbarkeit aufmultimodale Daten, b)Effizienz bei hochdi-
mensionalen Daten mit wenig Datenpunkten und c) Verzicht auf einschränkende
Annahmen über die generativen Modelle der gemessenen Daten. Insbesondere er-
laubttkCCA die Analysehochdimensionalermultimodaler Simultanmessungenvon
neuronaler Aktivität und fMRT Signalen. Mit Hilfe von tkCCA können neuronale
Signale besser aus multivariaten fMRI Messungen vorhergesagt werden. Davon
nnensowohlGrundlagenforschungalsauchklinischeDiagnostikprofitieren. Da-
rüberhinausisttkCCAdirektanwendbaraufandereArtenhochdimensionalerDaten-
ströme mit nicht-instantanen Abhängigkeiten, wie etwa in sozialen Netzwerken im
World Wide Web.
x
Contents
1 Introduction 1
I Unimodal Neuroimaging
2 A short history of neuroimaging 9
3 Unimodal neuroimaging 11
4 Unimodal analysis approaches 19
II Multimodal Neuroimaging
5 Multimodal neuroimaging 35
6 Multimodal analysis approaches 45
7 Data-driven multimodal analysis 53
8 Model-free analysis of neurovascular coupling 73
9 Summary and outlook 103
Appendix
A Mathematical preliminaries 109
B Experimental protocols 113
xii CONTENTS
Chapter 1
Introduction
it is in the brain that everything takes place.
Oscar Wilde
T brain is an exceptionally interesting organ. Like no other organ it is in the
focus of attention of a broad spectrum of scientific disciplines, ranging from
natural science over psychology to philosophy. Many of the insights we have about
the brain were gained by measuring its electrical and metabolic activity. Measure-
ment of brain activity and the visualization thereof using appropriate analysis tech-
niques is called neuroimaging. Various neuroimaging techniques have been devel- Measurement, analysis
and visualization of
brain activity is called
neuroimaging
oped, but each method has its specific technological and physiological limits. Some
techniques, such as electroencephalography (EEG), can visualize brain activity at a
high temporal resolution, but only at a very poor spatial resolution. Other meth-
ods, such as measurements of the blood-oxygen level dependent (BOLD) signal for
instance, canimage thewholebrain atonce, inparticular deepstructuresthatcannot
be measured with EEG, but they cannot give insights in the fast temporal dynam-
ics of brain activity. is is why combinations of multiple neuroimaging modalities
have become popular. Multimodal imaging setups can take advantage of comple-
mentary views on neural activity and enhance our understanding about how neural
information processing is reflected in each modality. In order to exploit the poten-
tial ofmultimodalmethods, dedicated analysismethodsareneeded. Manysolutions
to this data integration problem have been proposed. However the complex physi-
ological processes that give rise to the BOLD contrast are difficult to model. us,
every analysis method for hemodynamic neuroimaging data has to make simpli-
fying assumptions about the data generating process. is allows for more efficient
computations and easier interpretation of the results. However due to these assump-
tionssomeaspectsofbrainactivitymightbelostin the analysis. Inthis dissertationa
novel analysis framework for multimodal neuroimaging data will be proposed. Ap-
plying this analysis method on multimodal neural data, a simplifying assumption
underlying many studies based on BOLD contrast will be tested empirically.
2CHAPTER 1. INTRODUCTION
What is measured with fMRI? One of the most important applications of multi-
modal neuroimaging is the investigation of the neurovascular coupling mechanisms,
the relationship between brain activity and the hemodynamic signal. While accu-e relationship
between brain activity
and its hemodynamic
response is called
neurovascular coupling
rate measurements of brain activity require intracranial electrodes, hemodynamic
signalscanbemeasurednon-invasivelyusingforinstancefMRI [Ogawaetal.,].
ehopethatnon-invasivetechniqueswilleventuallyreplaceinvasiveneuralrecord-
ings is the basis for the enormous success of fMRI in recent years [Friston, ].
However the exact relationship between neural activity and fMRI signals is still sub-
ject of active research [Logothetis, ]. e physiological processes underlying
the generation of the BOLD signal are still not sufficiently well understood to model
them accurately. Many detailed models have been proposed, but they are difficult
to test empirically. A few reasons for this are: Recording the quantities involved in
neurovascular coupling is technically challenging. Large scale recordings of all elec-
trical and chemical forces involved cannot be obtained yet. Another reason is the
computational complexity of the analysis. Classical statistical methods are not de-
signed for state of the art multimodal data with tens of thousands of dimensions and
only a few data points a classical problem faced by researchers working on fMRI
data. Technical advances in the field of multimodal recordings enable researchers to
obtain high resolution hemodynamic measurements simultaneously with intracra-
nial neurophysiological recordings [Logothetis et al., , Oeltermann et al., ,
Goense and Logothetis, ]. For these data, novel analysis methods are needed
[Dale and Halgren, , Friston, ].
Anovelanalysisframework Inthisdissertationanovelanalysisframeworktermed
temporal kernel canonical correlation analysis (tkCCA) is proposed. It is tailored
to the specific needs in multimodal neuroimaging: Scalability to high dimensional
data and the ability to account for arbitrary non-instantaneous coupling mecha-
nisms while making only minimal assumptions about the data generating process
and the coupling mechanisms. In order to meet these requirements, tkCCA com-
bines well established statistical learning techniques with modern machine learning
methods.
Does spatiotemporal variability of fMRI signals contain neural information?
Aer introduction of tkCCA and validation thereof on synthetic data, the method
is used to test a simplifying assumption underlying many neuroimaging methods, in
particular multimodal analyses. Most neuroimaging methods assume that the spa-
tial dynamics of the hemodynamic response to neural activation is separable from
the temporal dynamics of the response. Non-separable spatiotemporal variability of
the hemodynamic response is thus neglected in these methods. Empirical evidence
suggests that this spatiotemporal separability assumption is a good approximation:
3
When considering the voxels in an fMRI image sequence located around a neural
ensemble of interest aer that ensemble was stimulated, the temporal dynamics are
very similar. However there is substantial evidence showing that the hemodynamic
response varies across brain regions [Aguirre et al., ] or different cortical lay-
ers [Yacoub et al., ]. Taking this variance into account can reveal areas of ac-
tivation that would have been overlooked by methods that assume spatiotemporal
separability [Mourão-Miranda et al., , Lu et al., ]. e scientific hypothesis
underlying the spatiotemporal separability assumption is: Does the spatiotemporal
variability of the hemodynamic response carry information about neural signals?
While the above studies provide only indirect evidence, tkCCA applied to simulta-
neous recordings of neural and hemodynamic activity can directly test this hypoth-
esis.
Overview of this dissertation
e structure of this thesis is divided in two parts. e first part will give a short
overview over popular neuroimaging methods in chapters , and . e second
part will deal with the combination of neuroimaging methods in multimodal se-
tups. Chapter gives an introduction to multimodal neuroimaging setups and some
motivating examples of their application in basic neuroscientific research and clin-
ical application. Common analysis standards established in the literature will be
reviewed in chapter .
e main part of this thesis is chapter in which the proposed algorithm is de-
veloped. In collaboration with the Max-Planck Institute for Biological Cybernetics,
Tübingen, we applied tkCCA for estimation of neurovascular coupling dynamics
and high accuracy prediction of neural activity from simultaneously recorded fMRI
signals. Chapter will show applications of the proposed algorithm on artificial and
real data.
Own contributions
e algorithm and its application to multimodal neural data have been published
in internationally renown journals in the machine learning and neuroimaging com-
munity; a detailed overview is given on the next page. Early stages of this work have
been presented in form of conference abstracts at the Forum of the European Neu-
roscience Society (FENS) in Geneva [Bießmann et al., ], the Computational
Neuroscience Society (CNS) in Berlin [Bießmann et al., ] and the Computa-
tional and Systems Neuroscience (COSYNE) Conference in Salt Lake City [Bieß-
mann et al., a]. is work profited from scientific exchange aer presentation
of preliminary results at the RIKEN Brain Science Institute (Tokyo), the Cognitive
4CHAPTER 1. INTRODUCTION
and Neurobiological Imaging Lab at Stanford University and the Redwood Cen-
ter for eoretical Neuroscience at University of California, Berkeley. Parts of this
thesis are based on a review article on multimodal neuroimaging analysis methods
[Bießmann et al., b]. Next to the work on multimodal neuroimaging we also
explored applications in other domains such data mining [Bießmann and Harth,
]. A summary of the work that is published as peer reviewed manuscripts is
given in the following.
Peer reviewed manuscripts
. Bießmann, Meinecke, Gretton, Rauch, Rainer, Logothetis, and Müller, Tem-
poral Kernel CCA and its Application in Multimodal Neuronal Data Analysis,
Machine Learning, 
Summary Temporal kernel canonical correlation analysis (tkCCA) was
proposed and tested in simulations and preliminary data from simulta-
neousmeasurementsofneurophysiological recordingsandfMRIsignals
during sensory stimulation.
Contribution I developed parts of the recording soware, did all analy-
ses and wrote the manuscript.
. Murayama, Bießmann, Meinecke, Müller, Augath, Oeltermann, and Logo-
thetis, Relationship between neural and hemodynamic signals during sponta-
neous activity studied with temporal kernel CCA,Magnetic Resonance Imag-
ing, 
Summary In this manuscript we showed that tkCCA can robustly esti-
mate neurovascular coupling dynamics from recordings of spontaneous
activity in primary visual cortex.
Contribution Ididpartsoftheanalysesandwrotepartsofthemanuscript.
. Bießmann and Harth, Analysing Dependency Dynamics in Web Data,Pro-
ceedings of AAAI Symposium, 
Summary An alternative application of tkCCA is illustrated on social
networkdata; usingtimeresolvedlisteningbehaviorofusersinmyfriend
subgraph on http://last.fm I extracted music trends and identified users
who where ahead of this trend and those who lagged behind.
Contribution I collected the data, analyzed it and wrote the manuscript.
. Bießmann, Plis, Meinecke, Eichele, and Müller, Analysis of Multimodal Neu-
roimaging Data,IEEE Reviews in Biomedical Engineering, 
5
Summary is review summarizes the state of the art in multimodal
neuroimaging analysis with a special focus on data driven methods.
Contribution I wrote major parts of the manuscript and provided ap-
plication examples from artificial and real data.
. Bießmann, Murayama, Logothetis, Müller, and Meinecke, Improved Decoding
of Neural Activity from fMRI Signals: Towards Non-Separable Spatiotemporal
Deconvolutions, in revision
Summary In this manuscript we show that the spatiotemporal variabil-
ity of the hemodynamic response carries information about neural ac-
tivity that most fMRI analysis methods neglect.
Contribution I analyzed the data and wrote the paper.
. Bhattacharyya, Bießmann, Veit, Kretz, and Rainer, Functional and laminar
dissociations between muscarinic and nicotinic cholinergic neuromodulation in
the tree shrew primary visual cortex, to appear in European Journal of Neuro-
science
Summary In this paper we showed differential effects of the neuromod-
ulator Acetylcholine (ACh) on visual information processing in differ-
ent layers of primary visual cortex.
Contribution I developed parts of the stimulus presentation soware,
helped recording some of the data, contributed to the data analysis and
wrote minor parts of the manuscript.
. Rauch, Zhang, Bießmann, Meinecke, Goense, Müller, Rainer, and Logothetis,
Baseline BOLD signal shi in macaque primary visual cortex (V) aer local
application of Acetylcholine, in preparation
Summary In this paper we showed differential effects of the neuromod-
ulator Acetylcholine (ACh) on visual information processing as mea-
sured with neurophysiological recordings and hemodynamic activity.
e BOLD signal exhibited a pronounced increase in baseline activity,
while the effects on neural activity were rather heterogeneous. A series
of additional experiments outside of the scanner helped to resolve the
heterogeneous effects on neural activity: In [Bhattacharyya et al., ]
we could show that the effects of ACh can be dissociated with respect to
laminar position and receptor type.
Contribution I developed the realtime recording system, helped record-
ing some of the data, performed all analyses and wrote minor parts of
the manuscript.
6CHAPTER 1. INTRODUCTION
Part I
Unimodal Neuroimaging
Chapter 2
A short history of neuroimaging
Es war von vornherein zu erwarten, daß auch im
Zentralnervensystem […] bioelektrische
Erscheinungen nachweisbar seien.
Berger [], Über das
Elektroenkephalogramm des Menschen
S the late th century researchers explore how cognitive functions are re-
flected in measurements of neural activity [Caton, ]. Using different neu-
roimaging techniques, disciplines such as psychology, biology and medicine accu-
mulated knowledge about how what we perceive, feel, think and do is related to the
complex activity patterns of neurons in our central nervous system. Table . high-
lights a selection of important achievements in the history of (multimodal) neu-
roimaging. Animal studies have been indispensable for a better understanding of
how neural activity is related to the outside world, some examples are [Mountcas-
tle, , Hubel and Wiesel, ]. But the most fascinating mental phenomena,
such as higher cognitive functions like language or reasoning, are difficult to study
with animals. Oen long training phases are needed to establish an experimental
paradigm. Humans have a clear advantage here: We can use language to commu-
nicate what a subject is supposed to do in an experiment. Human neuroscientific
experiments rely on non-invasive measurements of brain activity. ey can be ob-
tained for instance by electroencephalography (EEG) [Berger, ], magnetoen-
cephalography (MEG) [Cohen, ], near infrared spectroscopy (NIRS) [Jöbsis,
] or functional magnetic resonance imaging (fMRI) [Ogawa et al., ]. e
quantities measured by each of these modalities have different physiological origins
and thus different limitations and advantages. And each modality reflects neural ac-
tivity at a different spatiotemporal scale. A short summary of various neuroimaging
techniques is given in the next chapter.
10 CHAPTER 2. A SHORT HISTORY OF NEUROIMAGING
 Caton Electrocorticography (ECoG) reveals electrical ac-
tivity in the brain in response to visual stimulation
 Berger Electroencephalography (EEG) reveals brain oscilla-
tionsbetween Hz(α-rhythms)in occipital areas
associated with states of vigilance
 Mountcastle Detailed topographic mapping of sensory modalities
using intracranial electrophysiology
 Cohen Magnetoencephalography (MEG) for measuring α
activity in occipital cortex
 Jöbsis Near infrared spectroscopy (NIRS) for non-invasive
imaging of brain activity
 Grinvald et al. Combined intrinsic optical signal imaging and in-
tracranial electrophysiology to investigate neurovas-
cular coupling
 Ogawa et al. Functional magnetic resonance imaging (fMRI) to
visualize brain activity
 Dale et al. Sequential fMRI and MEG
 Lemieux et al. Simultaneous fMRI and EEG
 Logothetis et al. Simultaneous fMRI and intracranial
microelectrode recordings
Table .: A short history of (multimodal) neuroimaging methods
Chapter 3
Unimodal neuroimaging
M neuroimaging modalities measure either electrophysiological or hemo-
dynamic signals. A widely used neuroimaging technique for electrophysi-
ological activity with an exquisite spatial and temporal resolution are intracranial
microelectrode recordings. Among hemodynamic modalities, fMRI became most
popular [Friston, ]. Electrophysiological recordings pick up changes in electro-
magnetic fields induced by neural activity; energy consumption of neural activity
is correlated with blood oxygenation. e hemodynamic signal thus reflects neural
activity indirectly. Both electrophysiological and hemodynamic signals can be mea-
sured invasively and non-invasively. Electrodes can be placed directly in the neural Neural activity is
reflected directly in
electrical field changes
and indirectly in the
amount of oxygen
bound to hemoglobin
molecules; both can be
measured invasively
and non-invasively;
tissue, on the cortical surface (electrocorticograms or ECoG) [Caton, ] or on
the skull (electroencephalograms or EEG) [Berger, ]. Neural activity is also re-
flected in the magnetic field uctuations which can be measured non-invasively by
magnetoencephalograms (MEG) [Cohen, ]. EEG and MEG measure changes
in electrical andmagnetic fields, respectively, onthe scalp surface. eorigin of EEG
and MEG signals areelectrical dipoles in the cortex emerging from synchronized ac-
tivity of neighboring neurons with elongated shape [Murakami and Okada, ].
e strongest signal in EEG recordings arises from dipoles oriented perpendicular
to the scalp surface [Nunez and Srinivasan, ]. MEG in contrast is most sensi-
tive to cortical dipoles tangential to the scalp [da Silva and Niedermayer, ]. e
magnetic fields measured with MEG are not affected by volume conduction and can
resolve finer structures than EEG [Grynszpan and Geselowitz, , Cuffin and Co-
hen, ]. Hemodynamic activity can be measured semi-invasively using intrinsic
optical imaging [Grinvald et al., ] ornon-invasively by functional magnetic res-
onance imaging (fMRI) [Ogawa et al., ] and near infrared spectroscopy (NIRS)
[Jöbsis, ]. In NIRS infrared light is sent through the scull and the cortical tissue;
as oxygenated blood has different wavelength absorption characteristics than deoxy-
genated blood, neural activity can be measured by examining the reflected light. e
spatial resolution is much lower than fMRI and imaging is restricted to the cortical
surface but the temporal resolution can be higher than fMRI; besides the setup is
much cheaper and less complex than that of fMRI. Next to these non-invasive neu-
roimaging techniques, there are semi-invasive techniques that require opening the
12 CHAPTER 3. UNIMODAL NEUROIMAGING
Measurement Origin Resolution Invasive
Spatial Temporal
Electrophysiological activity
Intracranial
recordings
single cell /
population activity high high yes
EEG cortical dipoles
orthogonal to scalp low high no
ECoG cortical dipoles
orthogonal to scalp medium high yes
MEG cortical dipoles
tangential to scalp low high no
Hemodynamic activity
Optical imaging blood oxygenation
and volume high high yes
NIRS blood oxygenation
and volume low medium no
fMRI blood oxygenation high low no
Table .: Simplified overview of popular neuroimaging modalities;
skull but no penetration of neural tissue as in the case of intracranial electrophysi-
ological recordings. Semi-invasive preparations such as ECoG and intrinsic optical
imaging offer a higher resolution than completely non-invasive methods and bear
fewer risks than invasive recordings. ECoG measures electrical oscillations between
electrodes directly on the cortex. Similarly hemodynamic activity can be imaged
with optical imaging in minimally invasive preparations also in humans [Arthur
and Nader, ].
3.1 Electrophysiogical measurements
e computations carried out by our brain are reflected in changes of electrical po-
tentials across the cell membrane of neurons [Hodgkin and Huxley, ]. Under-
standing neural computations requires measurements of electrophysiological activ-
ity. is can be done at various levels, invasively, semi-invasively with ECoG or
non-invasively with EEG or MEG. e most detailed measurements of neural activ-
ity can be obtained with invasive electrophysiological recordings.
Physiologicalorigin Invasiveelectrophysiologicalrecordingsmeasurethechanges
in electrical fields emerging from neural activity. Neurons communicate mainly via
activation of chemosensitive ion channels located on the (post-)synapse, illustrated
as black dots in figure .. If a neuron releases neurotransmitters at the synapse, ion
channels on the postsynaptic neuron open. e opening allows ions to flow down
their electrochemical gradient. is results in a depolarization of the dendritic part
of the cell with respect to the extracellular medium (indicated by red plus sign in
3.1. ELECTROPHYSIOGICAL MEASUREMENTS 13
Dendrites
(Input)
Soma
Axon
(Output)
-
+
+
-
Neurophys WT05/06
T01
9
Electric Dipole
ground: V 0
V > 0
V < 0
E
The dependence of the potential polarity is due to the fact that the potential differences measured
with respect to ground depend on the electrode position within the electric field of a dipole.
Activated neurons generate transient electric dipols due to inhomogeneous charge distributions.
Neurophys WT05/06
T01
9
Electric Dipole
ground: V 0
V > 0
V < 0
E
The dependence of the potential polarity is due to the fact that the potential differences measured
with respect to ground depend on the electrode position within the electric field of a dipole.
Activated neurons generate transient electric dipols due to inhomogeneous charge distributions.
Synapse
V
Figure .: Sketch of a neuron: In grey the axon of another cell forming a synapse
with the neuron in black. During rest, the intracellular medium is approximately
at mV with respect to the extracellular medium; neurotransmitters released at
the synapse result ina depolarization inthedendrite, thecellbecomesanelectrical
dipole: the soma is negatively charged relative to the dendrite; if many neurons
are arranged in parallel and receive synchronized dendritic input, this dipole will
become stronger eventually strong enough give rise to electromagnetic fields
that can be measured outside of the scull using EEG;
the post synapse and a blue minus sign in the extracellular medium). Synchronized
depolarization of large ensembles of neighboring neurons result in the generation
of electromagnetic dipoles. e fields emerging from these dipoles can be measured
extracellularly with electrodes, onthe cortical surfacewith ECoG oroutsidetheskull
with non-invasive EEG or MEG measurements. e strength of the measured signal
depends critically on the dipole generating neurons, their shape and their spatial ar-
rangement in an ensemble. Aspiny inhibitory interneurons with radially symmetric
dendritic trees around their somata for instance will form dipoles that are difficult
to measure in extracellular recordings. Other cells, such as large pyramidal neu-
rons, have a bipolar shape with the some on one end and the dendrite elongating
in parallel to neighboring neurons (see fig. .). ese pyramidal ensembles gen-
erate dipoles with are much stronger in electrophysiological recordings [Murakami
and Okada, ]. e strongest signal in EEG recordings arises from dipoles ori-
ented perpendicular to the scalp surface, the strongest signal in MEG recordings are
dipoles tangentially to the scalp [da Silva and Niedermayer, , Nunez and Srini-
vasan, ]. Invasive electrical recordings offer the highest spatiotemporal resolu-
tion. Two aspects are differentiated in intracranial electrophysiological recordings,
fast discharges spikes or action potentials (APs) and low frequency content, oen
called local field potentials or LFP. Neuronal spikes are associated with the output of
the computations carried out by a neuron. e high frequency spectrum of neu-
rophysiological recordings containing the spiking activity of many single units is
called multi-unit activity (MUA). As the electrode is at a fixed position relative to
14 CHAPTER 3. UNIMODAL NEUROIMAGING
the cells in its surround, action potentials of different cells can be differentiated by
their distinct AP shapes recorded at the electrode. Classifying single cell units is
called spike sorting. e LFP is hypothesized to reflect subthreshold membrane os-Local Field Potentials
(LFPs) are slow oscil-
lations in neurophy-
siological signals; fast
(above 1KHz) oscilla-
tions are called multi-
unit activity (MUA)
cillations (the input to or state of a neuron before it is sending out spikes). Metabolic
signals of brain activity, such as the BOLD contrast, has been reported to be more
correlated with LFPs than spikes [Logothetis et al., , Goense and Logothetis,
]. is finding is consistent with the larger number of mitochondria¹ found in
the dendritic parts of neurons (the input site) as compared to the axons (the output
part) [Wong-Riley, , Attwell and Laughlin, ].
Signalpropertiesandlimitations Electrophysiologicalmeasurementshaveahigh
spatiotemporal resolution. Intracranial recordings have the potential to resolve sin-
gle cell activity. However the effective spatial resolution depends on the number
and arrangement of electrodes. For intracranial neurophysiology the temporal res-
olution is typically around Khz. Depending on the number of electrodes up to
 single cells can be recorded at once [Lehev and Nicolelis, ]. Recently there
has been increased interest in LFP signals. ey are easy to measure, with much
lower sampling rates than spikes, but exhibit a similar sensitivity for sensory fea-
tures compared to spikes in early visual cortex [Xing et al., ]. Also the LFP is
shown to correlate better than spikes with the fMRI signal [Logothetis et al., ,
Goense and Logothetis, ]; for many applications the high temporal resolution
of spikes is not needed and LFPs represent a promising alternative [Waldert et al.,
]. e spatial resolution of LFPs is on the order of hundreds of micrometers
[Katzner et al., ]. Non-invasive electrophysiological measures like EEG have a
lower spatial resolution on the order of several millimeters as measured by compar-
ing EEG based dipole estimates with fMRI activation centers [Im et al., ]. Al-
thoughthereis evidencethatthephase ofthe EEG oscillationsis coupledtointracra-
nial spikes [Whittingstall and Logothetis, ] neuronal spikes cannot be detected
directly using EEG. Hence EEG as well as MEG is typically sampled at frequencies
below KHz.
3.2 Hemodynamic measurements
Neural activity is reflected indirectly in the hemodynamic response. Hemodynamic
activity in the entire brain can be measured non-invasively using fMRI. Nowadays
commercially available fMRI setups can easily be operated without any knowledge
about MRI physics. us fMRI has become by far the most popular neuroimaging
technique [Friston, ]. It is important to keep in mind that fMRI does not mea-
sure neural activity directly. e relationship between neural and hemodynamic
¹Mitochondria are the cellular power plants and provide the energy needed for the ion pumps pre-
serving electrochemical gradients across neuronal membranes.
3.2. HEMODYNAMIC MEASUREMENTS 15
signals is still subject of active research [Logothetis, ]. As more and more non-
technically minded researchers are making use of this powerful technique, recent
studies highlighted the importance of solid statistical standards for fMRI analysis
[Kriegeskorte et al., , Bennett et al., , Vul et al., ].
Physiological origin Neural activity in the brain consumes energy which is de-
livered by the blood stream. e blood supply is controlled by the highly complex
cortical microvasculature, shown in fig. . as a vascular corrosion cast² of primary
visual cortex of the macaque monkey. Most importantly the vascular system has to
ensure that there is always enough oxygen delivered to the neurons. A blockage of
the arterial vessels on the cortical surface can affect whole cortical columns under-
neath and thus can have devastating consequences on cognitive functions. Oxygen
is carried through the blood stream via erythrocytes, the red blood cells. e cell
plasma of erythrocytes is rich in hemoglobin molecules, which transport oxygen.
e important aspect for neuroimaging is that deoxygenated hemoglobin (HbR) Oxygenated and deoxy-
genated blood have
different magnetic and
light absorption
properties
and oxygenated hemoglobin (HbO) have different magnetic and light absorption
properties. is gives rise to the blood-oxygen level dependent (BOLD) signal. e
BOLD contrast is a complex combination of blood oxygenation, blood flow and
blood volume [Buxton et al., ]. Hemodynamic signals can be measured in-
vasively using intrinsic signal optical imaging (ISOI) [Grinvald et al., , Frostig
et al., ] as well as non-invasively using functional magnetic resonance imaging
(fMRI) [Ogawa et al., ] or near infrared spectroscopy (NIRS) [Jöbsis, ].
Signal Properties and Limitations e temporal resolution of intrinsic optical
imaging data is typically around Hz (see e.g. [Berwick et al., ]). While op-
tical imaging can visualize hemodynamic signals on the cortical surface at a high
resolution, the depth resolution is rather poor. Intrinsic optical imaging requires
opening (or at least removing parts of) the skull. Although initial applications of
optical imaging were in basic neuroscience research [Grinvald et al., , Frostig
et al., ] its usefulness for diagnostic purposes is also being explored in human
patients [Pouratian et al., ]. NIRS operates with infrared light which can travel
through the intact skull, measurements can thus be taken non-invasively. is non-
invasiveness comes at the price of a poor spatial resolution. e advantage is that
NIRS setups are simple, low-cost and portable [Villringer and Chance, , Wolf
et al., ]. e advantages of optical imaging and NIRS are combined in fMRI:
Like NIRS, fMRI measurements are non-invasive. And like optical imaging, the spa-
tial resolution of fMRI measurements is high; in contrast to optical imaging, fMRI
can image deep subcortical structures. Using specialized imaging protocols and
hardware, fMRI can resolve cortical laminae [Goense and Logothetis, ]. e
²In vascular corrosion casts vessels are perfused with plastic and surrounding tissue is removed; the
negative of the vascular system can be imaged at high resolution with electron microscopy;
16 CHAPTER 3. UNIMODAL NEUROIMAGING
Figure .: Vascular system in primary visual cortex of the macaque monkey;
veins are shown in blue, arteries in red, micro vessels in grey (taken from [Keller
et al., ] with kind permission of Anna Lena Keller, MPI Biological Cybernet-
ics, Tübingen)
point spread function of fMRI signals is on the order of mm [Shmuel et al., ,
Sirotin et al., ]. Of course the neural signal is blurred by the vascular structure,
but this can actually be helpful for decoding the neural signal as the microvascula-
ture has a functional meaning. Recent work demonstrates a laminar specificity of
the ratio of microvasculature-to-cell-density [Weber et al., , Tian et al., ].
Incorporating these insights into detailed models of neurovascular coupling allowsCortical
microvasculature gives
rise to complex
hemodynamic
activation patterns
for a more detailed analysis of fMRI signals [Boas et al., , Guibert et al., ].
e temporal resolution of fMRI can be below Hz, but high temporal resolution
comes at the price of poor signal to noise ratio of the image sequence. In human
experiments the spatial resolution of fMRI signals is typically much lower and the
temporal resolution is usually below Hz; an advantage of human fMRI is that single
subjects brain scans can be co-registered with template brains. is makes multi-
subject analysis possible. But the co-registration requires spatial smoothing which
reduces the effective spatial resolution of human fMRI scans. To summarize, the
best spatiotemporal resolution for whole brain imaging is obtained by fMRI. As
fMRI requires hardware which is expensive to maintain and is not portable, opti-
cal imaging methods are a sensible alternative for measuring hemodynamic activity
in many applications.
3.3. PRIMARY VISUAL CORTEX: A WELL STUDIED BRAIN REGION 17
3.3 Primary visual cortex: A well studied brain region
When investigating neurovascular coupling, it is reasonable to choose a brain re-
gion that is well characterized. Testing hypotheses about how multiple modalities
are related is easier if the functional and anatomical properties of the brain region
measured are well understood. In terms of the functional characterization the best
studied sensory modality is undoubtedly the visual sense. In their seminal work Neurons in primary
visual cortex (V1)
respond selectively to
bars of light at a certain
orientation
[Hubel and Wiesel, ] found that cortical neurons in occipital regions respond
very selectively to bars of light of a certain orientation when presented in the vi-
sual field. In  the authors received the Nobel Prize. As the primary visual cor-
tex (also called V) is one of the best understood brain areas we will focus on data
recorded in this brain region throughout the entire dissertation. An example of the
cortical functions first described in [Hubel and Wiesel, ] is shown in figure ..
e data was taken from [Bhattacharyya et al., ]. When placing a microelec-
trode in V and presenting driing gratings of different orientation (see fig. .A)
to the animal, cortical neurons will emit more action potentials than during rest.
Cells in V are selective for gratings of a certain orientation: A cell will emit more
spikes when the preferred orientation was shown as opposed to when the orthogonal
orientation is shown (see fig. .B).
AB
0
20
40
60
80
100
Spikerate [Hz]
Preferred
Orthogonal
0 0.5 1 1.5 2 0
Time [s]
45°135°
225°315°
0
50
100
Spikerate [Hz]
mAChR Agonist
Preferred
Orthogonal
0.5
Time [s]
OTI 6.6
45°135°
225°315°
0
20
40
60
80
100
120
Spikerate [Hz]
Recover
Preferred
Orthogonal
0.5
Time [s]
OTI 1.7
45°135°
225°315°
A
B
E
C
F
B
Orthogonal (45°)
Preferred (135°)
Figure .: Selectivity of neurons recorded in primary visual cortex in response
to moving gratings of different orientation; A: Peristimulus time histogram (top
panel) and single trial spike raster plots (bottom panel) of a cortical cells spiking
responsetoa preferredstimulus(gratingwithangle)andorthogonalstimulus
(); stimulus on- and offset is marked by vertical black dashed line; highest
spike rates were recorded when the preferred orientation was shown, lowest spike
rates when the orthogonal orientation was shown; B: Polar plot summarizing the
responses (in a s window) to all stimulus orientations (in degree); black circles
indicate Hz increments in spike rate, a light blue inner polygon denotes median
firing rate for a given orientation, a grey tube indicates variance across trials;
18 CHAPTER 3. UNIMODAL NEUROIMAGING
Chapter 4
Unimodal analysis approaches
Tchapterwill recapitulatepopularanalysisconceptsforunimodalneuroimag-
ing data in order to set the stage for multimodal extensions in later chapters.
From a data analysts point of view, unimodal analysis methods can be categorized
into three classes:
Supervised methods
Are concerned with regression between measurements
and/or experimentally controlled variables.
Unsupervised methods
Find structure in measurements in an explorative fashion.
Model driven methods
Fit physiological models to measurements.
Many neuroimaging studies combine methods from more than one of these
three classes. us the analysis of a particular neuroimaging study is oen
difficult to assign to one of these three classes. Nonetheless this categorization can
be helpful. Each of these classes has advantages and drawbacks. Not all methods
can be applied to every data set. For instance supervised methods usually require
an experimentally controlled variable as regressor, which is not available in every
experimental setting. And some scientific questions can only be addressed with
methods from one of the three. For example in order to test a scientific hypothe-
sis about a certain biological process it is oen convenient to formulate a model and
compare the model predictions with experimental data. While an exhaustive review
of these categories is beyond the scope of this dissertation, a basic understanding of
standard neuroimaging analyses will be helpful in later chapters. e following sec-
tions will give a short introduction to some methods popular in classical unimodal
neuroimaging analyses.
20 CHAPTER 4. UNIMODAL ANALYSIS APPROACHES
4.1 Supervised Methods
Supervised methods have in common that a regressor x
R
Uis used to explain a
target variable y
R
f(x) = ˆ
y+ε(.)
where f(.)denotes a function that maps the regressor onto an estimateˆ
yof the target
variable yand εdenotes noise variance that is not captured by f(.). is function is
chosen such that some difference measure between target variable and its estimate
is minimized
argmin
f(.)(ˆ
yyp), (.)
where the distance measure oen is the pnorm xp=p
xp. Typically one
choses p= such that eq. (.) minimizes the least-squares error [Legendre, ].
During the last decades two main streams of supervised neuroimaging data analysis
have emerged, mass-univariate methods, also known as statistical parametric maps
(SPMs) [Friston and Buchel, ], and multivariate pattern analysis (MPA) [Cox
and Savoy, , Haynes and Rees, , Norman et al., ]. Both are based on
(mostly linear) regression, only the role of target variables and regressors are dif-
ferent. ere is an ongoing debate as to whether mass-univariate SPMs are better
than MPA approaches; the main argument for SPMs is that brain activation can be
better localized with mass-univariate methods [Kiebel and Friston, ]; other au-
thors advocate the higher sensitivity of multivariate methods [Kriegeskorte et al.,
]. While the methods used for SPMs and MPA approaches are similar, theyA hemodynamic
response function
(HRF) models the
temporal dynamics of
the fMRI signal in
response to a neural
stimulus
rely on different assumptions about the data generating process. Some assumptions
however the two approaches have in common. For instance all supervised settings,
when applied to fMRI data, need to correct for the hemodynamic lag when corre-
lating fMRI signals with a stimulus. is is usually done by convolving the stimulus
regressor with a canonical hemodynamic response function (HRF). An example of
a typical HRF as implemented in the analysis soware SPM [Friston and Buchel,
] is shown in figure .. It is based on a biomechanical model of the hemody-
namic response, see [Buxton et al., , Friston et al., ] and section .. e
HRF was generated using the MATLAB function spm_hrf.m in SPM¹. is HRF
is commonly assumed to be the same for all voxels and subjects, hence canonical.
While this is a convenient assumption, there is evidence for a considerable variabil-
ity of HRFs across subjects and brain regions [Aguirre et al., , Handwerker et al.,
] neglecting this variability will lead to poor sensitivity to hemodynamic ac-
tivation which does not follow the canonical HRF dynamics. Chapter will show
an alternative approach without these assumptions.
¹e parameters used were the default parameters, delay of response (relative to onset) s, delay
of undershoot (relative to onset) s, dispersion of response s, dispersion of undershoot s, ratio of
response to undershoot s, onset (seconds) s, s, temporal resolution Hz
4.1. SUPERVISED METHODS 21
5 10 15 20 25 30 35 40 45 50
Time [s]
Activity [a.u.]
Artificial measurements
Stimulus
Stimulus * HRF
0 5 10 15 20
Temporal delay [s]
Neuron
HRF
Figure .: Example of a canonical hemodynamic response function (HRF); in
order to construct a regressor for the GLM analysis, a stimulus time series (black,
le panel) is convolved () with the canonical HRF (grey, right panel); the result-
ing hypothesized fMRI time series is a smoothed version of the stimulus;
Mass-univariate methods In mass-univariate methods one treats single voxel or
dipole time series as target variable yand predicts their time course from a linear
combination of multivariate regressors xcontaining experimentally controlled vari-
ables and parameters that are not of interest but account for variance y. If the time
course is explained well by the regressor of interest, that voxel will have a high weight
in the SPM. Most SPM analyses are based on the so called general linear model Statistical parametric
maps (SPMs) visualize
activation patterns that
are correlated with an
experimental stimulus
(GLM). GLMs are the most oen used class of supervised methods for finding sta-
tistical parametric maps (SPMs) of neural activation [Friston et al., , Friston
and Buchel, ]. For a given data set the target variable y
R
L(e.g. a single voxel
time course of length L) is modeled as a linear combination of all Nregressors, each
weighted by a coefficient stored in a vector β
R
N, plus some gaussian i.i.d. error
ε N(, )
ˆ
y= +ε. (.)
e L×Nmatrix X= [xx. . . xN]containing the time series of Nregressors
xiof length Las column vectors is called design matrix. A typical GLM analysis General linear models
(GLMs) are the most
oen used method to
compute SPMs
includes as regressors all experimentally controlled parameters and additionally so
called nuisance regressors, which are not of interest in the analysis but explain some
of the variance in the data. In fMRI data, such a nuisance regressor could be for
instance head movement within the scanner. Including an estimate of movement
along each axis (pitch, roll, yaw) into the design matrix will improve the t of the
GLM. In SPM analyses for fMRI data yis a single voxel time course and the same de-
sign matrix Xis fitted to all voxels separately. is approach is called mass-univariate
analysis. Equation (.) can be solved by the ordinary least squares (OLS) solution
ˆ
β= (XX)Xy. (.)
e magnitude of the entries in the vector βcan now be subjected to statistical tests.
Voxels with values of βthat are significantly different from zero will be considered
22 CHAPTER 4. UNIMODAL ANALYSIS APPROACHES
in bands below 24 Hz (δ/θ,α,βbands) makes smaller
contributions to the BOLD signal changes. The estimated
optimal spatial filter (Fig. 2B), on the other hand, shows
that voxels within the gray matter (red stripe) of V1 make a
larger contribution to the maximized correlation between
the neural and fMRI signals. In the present study the γand
MUA contributions were greater than those of spiking
activity, which commonly represents a small group of cells.
The strong contribution of the γand MUA bands was
observed in most cases (see also Fig. 3A), and it confirms
previous results showing that the highest correlation
between the neural and the BOLD responses occurred in
the 30- to 140-Hz range of the neural signal in intracranial
recordings in the anesthetized monkey [11] and in human
patients [18,19].
In order to compare tkCCA with conventional correlation
analysis, we estimated the canonical correlogram as defined
in Eq. (3). This canonical correlogram is similar to the
univariate cross-correlogram, except that it is strictly
positive. However, the right sign of correlation can be
identified by looking at the signs of the projection
coefficients (see Methods section). Much as with the
correlogram computed in mass-univariate approaches, the
peak of the correlogram was found at a lag of about 5 s,
suggesting that, at least in area V1, the time-to-peak of the
BOLD response occurs 5 s after the onset of neural activity.
Fig. 1. Recording hardware, activation maps and analysis principles for combined physiology-fMRI experiments. (A) The upper column shows anatomical
images and the multichannel electrode used in this study. Blue arrows represent positions of recording sites on the electrode; 7 out of 10 channels are visible. The
lower column shows the BOLD responses (Pb.001) tested by means of a full-field visual stimulus. Note the minimal susceptibility artifacts caused by the
electrode; no distortions are evident in the activation map. (B) Spontaneous changes in the amplitude of different frequency bands. Depicted are the bands δ/θ,α,
β,γL, γ,γH, γVH, MUA (multi-unit activity) with 18, 812, 1224, 2440, 4060, 60100, 120250 and 10003000 Hz, respectively. Spikes were
extracted by detecting zero crossings and thresholding the high-pass signal. Arrows show examples of synchronous γand spiking activation. The entire
frequencytime matrix was used for tkCCA. (C) Spontaneous changes in the BOLD signal in the regions of V1, defined on the basis of anatomical criteria in each
slice. The upper trace shows the average time course of BOLD fluctuations from all voxels seen below.
5Y. Murayama et al. / Magnetic Resonance Imaging xx (2010) xxxxxx
ARTICLE IN PRESS
in bands below 24 Hz (δ/θ,α,βbands) makes smaller
contributions to the BOLD signal changes. The estimated
optimal spatial filter (Fig. 2B), on the other hand, shows
that voxels within the gray matter (red stripe) of V1 make a
larger contribution to the maximized correlation between
the neural and fMRI signals. In the present study the γand
MUA contributions were greater than those of spiking
activity, which commonly represents a small group of cells.
The strong contribution of the γand MUA bands was
observed in most cases (see also Fig. 3A), and it confirms
previous results showing that the highest correlation
between the neural and the BOLD responses occurred in
the 30- to 140-Hz range of the neural signal in intracranial
recordings in the anesthetized monkey [11] and in human
patients [18,19].
In order to compare tkCCA with conventional correlation
analysis, we estimated the canonical correlogram as defined
in Eq. (3). This canonical correlogram is similar to the
univariate cross-correlogram, except that it is strictly
positive. However, the right sign of correlation can be
identified by looking at the signs of the projection
coefficients (see Methods section). Much as with the
correlogram computed in mass-univariate approaches, the
peak of the correlogram was found at a lag of about 5 s,
suggesting that, at least in area V1, the time-to-peak of the
BOLD response occurs 5 s after the onset of neural activity.
Fig. 1. Recording hardware, activation maps and analysis principles for combined physiology-fMRI experiments. (A) The upper column shows anatomical
images and the multichannel electrode used in this study. Blue arrows represent positions of recording sites on the electrode; 7 out of 10 channels are visible. The
lower column shows the BOLD responses (Pb.001) tested by means of a full-field visual stimulus. Note the minimal susceptibility artifacts caused by the
electrode; no distortions are evident in the activation map. (B) Spontaneous changes in the amplitude of different frequency bands. Depicted are the bands δ/θ,α,
β,γL, γ,γH, γVH, MUA (multi-unit activity) with 18, 812, 1224, 2440, 4060, 60100, 120250 and 10003000 Hz, respectively. Spikes were
extracted by detecting zero crossings and thresholding the high-pass signal. Arrows show examples of synchronous γand spiking activation. The entire
frequencytime matrix was used for tkCCA. (C) Spontaneous changes in the BOLD signal in the regions of V1, defined on the basis of anatomical criteria in each
slice. The upper trace shows the average time course of BOLD fluctuations from all voxels seen below.
5Y. Murayama et al. / Magnetic Resonance Imaging xx (2010) xxxxxx
ARTICLE IN PRESS
in bands below 24 Hz (δ/θ,α,βbands) makes smaller
contributions to the BOLD signal changes. The estimated
optimal spatial filter (Fig. 2B), on the other hand, shows
that voxels within the gray matter (red stripe) of V1 make a
larger contribution to the maximized correlation between
the neural and fMRI signals. In the present study the γand
MUA contributions were greater than those of spiking
activity, which commonly represents a small group of cells.
The strong contribution of the γand MUA bands was
observed in most cases (see also Fig. 3A), and it confirms
previous results showing that the highest correlation
between the neural and the BOLD responses occurred in
the 30- to 140-Hz range of the neural signal in intracranial
recordings in the anesthetized monkey [11] and in human
patients [18,19].
In order to compare tkCCA with conventional correlation
analysis, we estimated the canonical correlogram as defined
in Eq. (3). This canonical correlogram is similar to the
univariate cross-correlogram, except that it is strictly
positive. However, the right sign of correlation can be
identified by looking at the signs of the projection
coefficients (see Methods section). Much as with the
correlogram computed in mass-univariate approaches, the
peak of the correlogram was found at a lag of about 5 s,
suggesting that, at least in area V1, the time-to-peak of the
BOLD response occurs 5 s after the onset of neural activity.
Fig. 1. Recording hardware, activation maps and analysis principles for combined physiology-fMRI experiments. (A) The upper column shows anatomical
images and the multichannel electrode used in this study. Blue arrows represent positions of recording sites on the electrode; 7 out of 10 channels are visible. The
lower column shows the BOLD responses (Pb.001) tested by means of a full-field visual stimulus. Note the minimal susceptibility artifacts caused by the
electrode; no distortions are evident in the activation map. (B) Spontaneous changes in the amplitude of different frequency bands. Depicted are the bands δ/θ,α,
β,γL, γ,γH, γVH, MUA (multi-unit activity) with 18, 812, 1224, 2440, 4060, 60100, 120250 and 10003000 Hz, respectively. Spikes were
extracted by detecting zero crossings and thresholding the high-pass signal. Arrows show examples of synchronous γand spiking activation. The entire
frequencytime matrix was used for tkCCA. (C) Spontaneous changes in the BOLD signal in the regions of V1, defined on the basis of anatomical criteria in each
slice. The upper trace shows the average time course of BOLD fluctuations from all voxels seen below.
5Y. Murayama et al. / Magnetic Resonance Imaging xx (2010) xxxxxx
ARTICLE IN PRESS
Figure .: Statistical parametric map (SPM) of early visual cortex in the macaque
monkey, for experimental details see appendix B or [Murayama et al., ]; vox-
els with significantly non-zero βvalues are superimposed on anatomical MRI
scans; color code corresponds to percent visual activation (relative to baseline);
as activated by the experimental paradigm modeled in the design matrix X. is
involves a statistical test for every voxel. Since typically fMRI data contains a few ten
thousand voxels it is almost certain that a large fraction of voxels will be considered
active due to random variations and not because those voxels are really activated.
e statistical threshold for significance has to be adapted to this multiple compari-
son problem. Figure . shows an example of anactivation pattern visualized with an
SPM superimposed on anatomical MRI scans of primary visual cortex of a Macaque
monkey [Murayama et al., ]. Experimental details can be found in appendix B.
e visual stimulus shown was a rotating checkerboard that was on for seconds
and then o for  seconds. is was repeated  times. e regressor in the de-
sign matrix X(here, X
R
L×,xis the visual stimulus time course and xan offset
term) thuswas a boxcar function which was if the stimuluswas on and otherwise
(that is before the convolution with the canonical HRF). is approach has beenReal neuroimaging data
violate the assumptions
of SPM approaches and continues to be the most popular analysis method for fMRI data. Due to the
assumptions underlying SPM estimation there are a number of caveats when deal-
ing with GLM type analyses, for a critical review see e.g. [Monti, ]. e most
oen criticized assumptions are temporal and spatial independence: Each voxel and
time sample is treated as an independent variable. ese assumptions are violated
in real fMRI data. e dependencies present already e.g. in fMRI measurements are
even enhanced by standard preprocessing procedures such as temporal smoothing
of regressors with a canonical HRF and spatial smoothing of the fMRI data. Tem-
poral smoothing is important to temporally align stimulus time course and voxel
time course. Spatial smoothing is needed to increase the signal to noise ratio of
fMRI data, to reduce numerical instabilities when co-registering single subject data
in a common template brain and to reduce the effective degrees of freedom of the
4.1. SUPERVISED METHODS 23
fMRI data for multiple comparison corrections. Temporal correlations are usually
removed from the regressors in the design matrix prior to the GLM fitting proce-
dure [Bullmore et al., ].
Multivariate methods As mass-univariate approaches consider only one voxel at
a time, certain multivariate voxel patterns cannot be detected [Lange et al., ];
in general mass-univariate measures are less sensitive than multivariate approaches
[Kriegeskorte et al., ]. Consequently there is increased interest among neuro-
scientists in multivariate methods, oen referred to as multivariate pattern analysis
(MPA)²; in the context of fMRI analysis this approach is also known as multi-voxel
pattern analysis [Norman et al., ], brain-reading [Cox and Savoy, ], dis-
tributed pattern analysis [Haxby et al., ], mind-reading [Haynes et al., ],
information-based functional brain mapping [Kriegeskorte et al., ] or prediction
[Haynes and Rees, ]. e goal of MPA is to find patterns in the data that pre-
dict the stimulus well. A common multivariate approach for fMRI analyses is to Multivariate Pattern
Analysis (MPA) predicts
univariate stimulus
labels from multivariate
measurements, like
voxel volumes of fMRI
signals
predict some stimulus label time course y
R
using a searchlight, that means to ex-
tract small voxel volumes v
R
U(Udenotes the number of voxels of the vectorized
volume), which are mapped through some function f(.)
f(v) = ˆ
y+ε, (.)
where εagain denotes noise variance that is not explained by f(v). ere are plenty
of possibilities to estimate f(.)from a given data set of size Lof stimulus target vari-
ables y
R
×Land voxel time series V= [v,v,...,vL]
R
U×L. Popular exam-
ples include support vector machines (SVMs) [Schölkopf and Smola, , Müller
et al., ], relevance vector machines [Tipping, ], gaussian processes (GPs)
[Rasmussen and Williams, ], linear regression and Fisher Linear Discriminant
analysis [Fisher, ]. For a comparison between some of these algorithms ap-
plied to fMRI data see e.g. [Lange et al., ]. Although choosing f(.)to be a
non-linear function can yield better prediction accuracy, linear methods where
f(V):=wV,w
R
U× are more popular [Müller et al., ]. First they are easy
to compute; secondly there is a straightforward interpretation³ of the weight vector
w: It contains the contribution of each voxel to the prediction ˆ
y
R
×L. In the case
of linear regression we find win complete analogy to eq. (.), only that Xis now
the data and the target yis the time series of stimulus labels. When estimating w
one is not interested in learning the data by heart. Any stimulus time course can be
²Of course mass-univariate methods are in principle also multivariate but only with respect to
the experimental variables and not with respect to fMRI voxels; multivariate in the context of fMRI
analyses oen refers to the notion of a multivariate measurement, as opposed to the mass-univariate
treatment of single voxels.
³In order to have an interpretable solution w, the data in Vshould have been normalized, such that
each row has the same variance.
24 CHAPTER 4. UNIMODAL ANALYSIS APPROACHES
predicted at  accuracy if the data has enough dimensions. Instead one is inter-
ested in a low generalization error; this means that wshould be such that new data,
which has not been seen by the learning algorithm, is predicted at high accuracy.
is can be achieved by constraining the complexity of the solution wand is called
regularization [Duda et al., ]. In the case of linear regression this can be imple-
mented by requiring w
to be small. e objective of regularized linear regression
is, in analogy to eq. (.)
argmin
w((ywX)+κw
)(.)
where κis called regularization parameter, controlling the amount of regularization.
e larger κthe more important is the simplicity’ of the solution in the optimization
of eq. (.). In the worst case of very high noise κwill we chosen to be very large
and consequently all entries of wwill be small and similar. If all entries of ware the
same, as in the case of high noise and hence high regularization, the solution is the
same (up to a constant scaling factor) as the mean of X, in which each dimension is
weighted with /L. Solving eq. (.) for wyields, in analogy to eq. (.)
w= (XX+)Xy. (.)
e matrix product XXis also called linear kernel of X. As the regularizationA linear kernel is the
inner product of a data
matrix and contains a
similarity measure for
all pairs of data points
adds a ridge of height κon this matrix, eq. (.) is oen referred to as kernel ridge
regression. In order to find the correct κ, classical statistical learning routines such
as cross-validation can be used, see e.g. [Lemm et al., ].
4.2 Unsupervised Methods
Not in all experimental settings a regressor or target variable, needed for supervised
analyses, is available. And oen there are many more processes reflected in the sig-
nal than the experimentally controlled stimulation. Many of these factors cannot be
measured or modeled. Given the same stimulus, repeated measurements of brain
activity will show variations around the mean response to that stimulus. ese vari-
ations might be considered as noise as they do not reflect the stimulus - but just
because we cannot explain certain aspects of neural activity by a stimulus (or addi-
tional nuisance regressors) that does not mean that this variability is noise. It merely
means that we do not have a model for it. If there is no model for a measurement,
unsupervisedmethodscanbeused tofindstructureinthedatainanexplorativeman-
ner. is can help to find interesting patterns in the data, for instance interesting in
the sense that a pattern found by explorative methods coincides with anatomical
findings obtained independently. In practice most unsupervised projection meth-
ods can be cast in the form of
Y=WX (.)
4.2. UNSUPERVISED METHODS 25
where X
R
M×Lis the data matrix, W= [w
,w
,. . . ,w
n]
R
N×Mis a lin-
ear mapping that transforms the data in Xinto a new representation Y
R
N×L.
Each basis vector wicomputes a linear combination of the Moriginal dimensions of
X. How the original dimensions of Xare combined depends on the criterion opti-
mizedbytherespectiveunsupervisedlearningmethodused. Inthefollowingwewill
shortly introduce two popular examples of unsupervised learning techniques, prin-
cipal component analysis (PCA) and independent component analysis (ICA). Both
have become standard techniques in the context of neuroimaging.
Principal Component Analysis (PCA)
Principal Component
Analysis (PCA) finds
directions of maximal
variance in the data
PCA [Pearson, ] finds a linear combination of features (such as sensors, voxels,
etc.) that maximizes the variance of the data. Inspecting the feature combinations
gives a good estimate of which combination of sensors captures most variance of the
signal. And projecting the data onto those feature combinations allows to reduce a
high dimensional data set to only a few dimensions that can be easily visualized.
PCA is probably the most oen used method for dimensionality reduction. Con-
sider a data set stored in a matrix X
R
D×L(Ddimensions and Ldata points). PCA
computes a linear projection w
R
Dthat maximizes the variance in the data set:
w=argmax
w
wXXws.t. w=ww=. (.)
e matrix product lXX,l=/Lis called covariance matrix C of X(we will omit
the normalization constant lin the following). It contains in its ith row and jth
column the covariance between the ith and jth dimension of X. We will introduce
two ways of solving PCA, first the standard solution and another version that does
not require the explicit computation of covariance matrices; this is advantageous in
case the data is high-dimensional. ere are probabilistic versions of PCA in which
the underlying generative model is a multivariate gaussian distribution [Roweis and
Ghahramani, ]. e principal components W= [w,w,. . . ,wD](we assume
here that DL) are the eigenvectors of the data covariance matrix C
CW =WΛ, (.)
wherethematrixΛcontainsonitsdiagonalthecorrespondingeigenvaluesλ,. . . ,λkPrincipal components
are eigenvectors of the
data covariance matrix
(and is in its off-diagonal entries). As Λ is the new covariance matrix of the data in
the PCA space, PCA is oen referred to as diagonalization of the covariance matrix.
e basis vectors of Ware aligned with the directions of maximal variance in the
data, as depicted in fig. .. Looking at the eigenvalues helps to decide how many
principal components should be considered when analyzing the data. A fast decay
aer ieigenvalues is a good sign that the data is well captured by the first iprinci-
pal components. We can transform the data Xinto its new representation XPCA in
26 CHAPTER 4. UNIMODAL ANALYSIS APPROACHES
−2 0 2
−2
−1
0
1
2
x1
x2
Before PCA
−2 0 2
−2
−1
0
1
2
PC1
PC2
After PCA
A B
Figure .: Two dimensional toy data example for PCA; in all panels, the first
dimension is plotted on the x-axis, the second dimension along the y-axis; A: data
was drawn from a two dimensional gaussian distribution; the first and second
principal component directions are plotted in green and red, respectively; B: e
same data, projected onto the first two principal components; the direction of the
data that exhibits most variance is now aligned with the x-axis and the direction
of the second largest variance is parallel to the y-axis;
which all rows of the data matrix Xare uncorrelated
XPCA =WX=
k
i=
eiw
iX, (.)
where the eiare the basis vectors of the canonical euclidean basis. An illustration
with two dimensional toy data is shown in figure .. e measurements x(in grey)PCA can be used to
discard low variance
directions (for dimensio-
nality reduction) or high
variance directions (for
artifact removal)
are drawn from a two dimensional gaussian distribution. e directions of maximal
variance, the principal components, are plotted on top of the data in fig. .A. Aer
projecting the data onto their first two principal components, the axes of maximal
variances of XPCA are rotated such that they fall on the main coordinates (fig. .B).
Oen PCA is used for dimensionality reduction. In this case one is interested in a
lower dimensional representation of the data, capturing as much variance as possi-
ble. is can easily be achieved by retaining only the first rows of XPCA, i.e. those that
correspond to the highest eigenvalues. Since Wis orthogonal we have X=WWX
and can therefore remove certain components also in the original basis of the data
by
X=
i∈I
wiw
iX(.)
where the index set Icontains all component indices that should be retained. Even
though some components might be removed, the new data set Xis expressed in the
same basis as Xand therefore has the same number of rows.
PCA for high-dimensional data If the dimensionality Dof the data is higher than
the number of samples Lit becomes more efficient to not work on covariance ma-
4.2. UNSUPERVISED METHODS 27
trices but on inner products of the data, also known as the linear kernel matrix KX,
see also eq. (.), of the data
KX=XX. (.)
e linear kernel KXof the data is only of size L×L, while the covariance matrix
would be of size D×D. We will consider here only linear kernels. An advantage of
linearkernelsisthattheprojectionswcanbe easilyrecoveredasalinearcombination
of data points. For the first principal component wthis means:
w= (.)
Where α
R
Lis a column vector containing the contribution of each data point in
Xto principal component w. e intuition behind this expansion is that if the num-
ber of dimensions is larger than the number of data points, then the solution has to
lie in the space spanned by the data. us we can express the solution was a combi-
nation of data points, each weighted according to its respective coefficient in α. e
relationship between linear kernel PCA and standard covariance matrix based PCA
as computed in eq. (.) is easy to understand when considering the singular value
decomposition of X, see appendix A.. If each data point contributes exactly the
same wis the mean over all data points. Next to the more efficient computation (in
some settings) compared to diagonalizing the covariance matrix, the kernel version
of PCA has another advantage: Non-linear extensions are straightforward to imple-
ment [Müller et al., , Schölkopf and Smola, ] by computing the covariance
not in the input space but in a high-dimensional feature space defined by non-linear
kernel functions [Schölkopf et al., ]. is kernel-trick can be helpful if the struc-
ture of the data is not well captured by a linear feature combination. Chapter will
explain the kernel-trick in more detail.
Independent Component Analysis (ICA)
Independent Component Analysis (ICA) is another unsupervised algorithm that
maximizes statistical independence between the components, i.e. the rows of matrix
Yin eq. (.). e ICA model can be formulated as
X=AS, (.)
meaning that the nth observed time series xn
R
×L(Ldenoting the number of
time points sampled) is a linear and instantaneous mixture of Mstatistically inde-
pendent source signals sm
R
×Lwith m=, . . . ,M. e matrix Ais usually
referred to as mixing matrix. e goal of ICA is to reconstruct the original source
signals Sgiven only the mixtures X. Maximizing the independence of the source es-
timates ˆ
S
R
N×Lcorresponds to minimizing the mutual information (see section
28 CHAPTER 4. UNIMODAL ANALYSIS APPROACHES
.) between the estimated source signals
I(S) =
n
H(Sn)H(S), (.)
where Sdenotes the random variable of the estimated source signals ˆ
S,H(Sn)de-
notes the entropy of the single source estimates (the nth row of ˆ
S) and H(S)the total
entropy of the source estimates. Over the last  years, a great number of different
algorithms have been proposed to find these non-gaussian projections; some opti-
mize contrast functions like the negentropy by gradient descent [Bell and Sejnowski,
], others solve the ICA problem by approximate diagonalization of the fourth-
order cumulant tensor (kurtosis) [Cardoso and Souloumiac, ], by fixed-point
algorithms [Hyvärinen, ] or by mutual information based approaches employ-
ing kernel canonical correlation analysis [Bach and Jordan, ]. Many ICA so-
ware packages have been developed and enable non-experts to use the most popular
ICA implementations on EEG/MEG and fMRI data.
Spatial and Temporal ICA Each column of the mixing matrix Ain eq. (.) re-
flects onesourcesignal andcanbe interpretedasthefield pattern thatshowshowthis
source contributes to the measured activity in all the different channels (EEG/MEG-
channels or fMRI-voxels). According to this interpretation the right side of equa-
tion (.) is the sum of outer products between space vectors Aand time vectors S,
where the time vectors are independent. erefore, this setting is oen called tem-
poral ICA. Another variant is called spatial ICA. Here the roles between row- and
column vectors have switched, i.e. instead of independent time series one computes
independent spatial patterns. is corresponds to performing ICA on Xinstead
of X. Spatial ICA on fMRI data is oen used for functional connectivity analysis
[van de Ven et al., , Ystad et al., ]. For temporal ICA, the number of time
points should be larger than the number of channels (i.e. T>N), for spatial ICA the
number of time points should be smaller than the number of dimensions (T<N).
erefore the former is suited for typical EEG and MEG analysis, while the latter is
the method of choice for fMRI data. For a more complete discussion about differ-
ences between spatial and temporal ICA and the assumptions underlying applica-
tions of ICA on fMRI data see e.g. [McKeown and Sejnowski, , Calhoun et al.,
]. For a review on ICA for fMRI data analysis see e.g. [Calhoun et al., ].
4.3 Model Based Approaches
Most of the above mentioned analysis methods can be regarded as data driven. In
the majority of studies the only model used is the canonical HRF. A problem with
data driven analysis methods is that they do not offer a physiologically interpretable
4.3. MODEL BASED APPROACHES 29
model of the underlying neural processes. However, if one aims at testing hypothe-
ses about physiological parameters, a well defined physiological model of the signal
of interest is indispensable. Hypotheses, or prior knowledge about anatomical and
functional properties, can be incorporated in models of the physiological processes
underlying the measurements. ese physiological models are called forward mod-
els. A forward model Fis specified by a set of equations describing the dynamics
of electromagnetic or hemodynamic activity and parameters φ. In analogy to the
standard supervised learning setting , see e.g. eq. (.), these parameters are opti-
mized such that some distance measure (e.g. the euclidean distance) between the
measured data Dand the model predictions F(φ)is minimized.
argmin
φ
(∥DF(φ)). (.)
e following two paragraphs section give two examples of popular forward mod-
els for EEG and fMRI data. A more detailed overview of model driven analysis ap-
proachesinthecontextofmultimodal datacanbefoundin[Bießmannet al.,b].
EEG/MEG forward models
e physical quantities modeled in EEG/MEG forward models are current dipoles
between a current source and sink. A dipole is a convenient representation of the
electrical field emerging from synchronized activation of a neural cell assembly. e
strength of a single electrical dipole moment s
R
(three orthogonal spatial com-
ponents) is related to electrode potentials measured in x
R
U(Udenotes number
of electrodes) by the forward model
x=As (.)
where A
R
U×contains in its rows the lead-field of the respective electrode, re-
flecting how much of the dipole activity is picked up by that electrode; in its columns
Acontains the gain of each electrode. is matrix models the conductivity pro-
file of the head and can be estimated using anatomical measurements obtained by
structural MRI [Dale et al., ]. e linear relationship in eq. (.) relies on
the assumption that magnetic induction and capacitive effects do not influence the
measurements. Adding Ndipoles results in a dipole vector s
R
Nand a lead-field
matrix A
R
U×N. e forward model in Acan be estimated using e.g. boundary
element methods [Oostendorp and van Oosterom, ]. In order to recover the
source activity the forward model has to be inverted
ˆ
s=Wx, (.)
whereˆ
sdenotes estimated source time courses and W
R
N×Uis the inverse solu-
tion that maps measurements to sources. Many approaches to computing Whave
30 CHAPTER 4. UNIMODAL ANALYSIS APPROACHES
been proposed, including a simple least squares approach as in section ., bayesian
model inversion [Sato et al., ] or ICA based techniques (see section .), to
name just a few. A popular solution to the model inversion problem is called mini-
mum norm approach [Dale and Sereno, ]
W=RA(ARA+κE)(.)
where E
R
U×Uis the sensor noise covariance matrix, κis the regularization pa-
rameter and R
R
N×Nis the dipole source covariance matrix. In fMRI con-
strained current source density analysis [Liu et al., , Dale et al., , Liu and
He, ] Ris assumed to be a diagonal matrix which contains on its diagonal spa-
tial weighting factors obtained from anatomical MRI scans or from GLM type anal-
yses (see section .). For a comprehensive review of the physiological origins of
EEG/MEG see [da Silva and Niedermayer, , Nunez and Srinivasan, ]. A
detailed review on solutions to EEG/MEG forward models and model inversion can
be found in [Baillet, ]. For realistic head models there are no analytical solu-
tions and numerical solutions for realistic head models are computationally costly.
Forward Models for fMRI data
Abiomechanicalmodel ofthe BOLDsignal is proposed in[Buxtonetal.,]. is
so called balloon model was extended by [Friston et al., ] to reflect the complete
process of the BOLD response in a veneous balloon starting from synaptic activity
u(t). For each location in the brain the temporal dynamics of four physiological
quantities are modeled, a flow inducing signal ξ, blood inflow φ, venous volume
νand deoxyhemoglobin concentration ω. e model is based on six parameters,
neuronal efficacy ε, signal decay τs, auto-regulation τf, transit time τ, stiffness pa-
rameter αand resting oxygen extraction E. e differential equations of the balloon
model are:
˙
ξ=εu(t)ξ
τsφ
τf
(.a)
˙
φ=ξ(.b)
˙
ν=
τ(φfout(ν,α)) (.c)
˙
ω=
τ(φE(φ,E)
Efout(ν,α)ω
x), (.d)
where
fout(ν,α) = ν/α(.)
E(φ,E) = (E)/φ(.)
4.3. MODEL BASED APPROACHES 31
e quantity fout relates the flow-volume relationship through the exponent α, which
has been determined in experiments to be in the range of .-. [Friston et al.,
]. Equation (.) describes the fraction of oxygen extracted from the inflow-
ing blood. Based on these equations, the authors of [Buxton et al., ] propose a
canonical hemodynamic response function as depicted in fig. .. Similar to the case
of electrophysiological forward models, there is no analytic solution to eq. (.).
Hence model inversion requires numerical solutions, which are computationally de-
manding for realistic data set sizes and numerically instable, given the low temporal
resolution of fMRI data. It is also important to note that this system of differential
equations neglects spatial dependencies. Oen studies model only the time course
of the average BOLD signal within a given ROI as there is no gold standard model
available for spatial and spatiotemporal dynamics. Other studies employ efficient
implementations of probabilistic forward models to tackle full brain hemodynamic
models, see e.g. [Riera et al., , Daunizeau et al., , Valdes-Sosa et al., ].
But some of the efficiency comes at the price of assumptions which are violated in
real fMRI data. An example is the spatiotemporal separability assumption of the
hemodynamic response function in [Riera et al., , Daunizeau et al., ].
32 CHAPTER 4. UNIMODAL ANALYSIS APPROACHES
Part II
Multimodal Neuroimaging
Chapter 5
Multimodal neuroimaging
I the past decade multimodal neuroimaging methods have become an indispens-
able tool for neuroscientific research and clinical application [Friston, ].
Multimodal neuroimaging methods combine single imaging modalities that yield
complementary views on brain activity, such as electrophysiological and hemody-
namic measurements (see figure .). Multimodal setups offer many advantages
comparedtounimodalmeasurements. Inclinicalapplicationsforinstanceoneneeds
high temporal resolution to capture the temporal dynamics of (non-convulsive)
epileptic activity and at the same time high spatial resolution to determine the origin
of the seizure in the brain. Here, multimodal setups can help to overcome limita-
tions of classical unimodal setups [Vulliemoz et al., , Rosenkranz and Lemieux,
]: While unimodal methods typically do not have both high temporal and high
spatial resolution, multimodal methods combine the advantages of single modalities
yielding a view on brain activity with an unprecedented spatiotemporal resolution.
Secondly, multimodal methods can enhance our understanding of how neural ac-
tivity is reflected in single imaging modalities. While fMRI has been established al-
ready more than two decades ago [Ogawa et al., ] it was not before multimodal
methods were established that one could actually measure how neural signals are
related to the fMRI signal [Logothetis et al., , Devor et al., ]. Knowing how
neural activity is reflected in different modalities is important for a better interpre-
tation of the measurements. But there are questions beyond these epistemological
issuesforwhichmultimodalneuroimagingis indispensable. Abetterunderstanding
of the mechanisms underlying neurovascular coupling is of paramount importance
for deeper insights in the pathology of e.g. epilepsy [Vulliemoz et al., a], stroke
and Alzheimer’s disease [Iadecola, , Girouard and Iadecola, ]. erefore
the investigation of neurovascular coupling mechanisms is one of the most impor-
tant applications of multimodal imaging. is chapter will first givea short overview
about multimodal neuroimaging setups in section ., followed in section . by
motivating application examples of these setups in a clinical as well as a research
context. Finally section . will provide an information theoretic view on multi-
modal data fusion. e information theoretic view can be helpful to understand the
different approaches to multimodal data fusion in later chapters.
36 CHAPTER 5. MULTIMODAL NEUROIMAGING
Figure .: Venn Diagram of multimodal neuroimaging analysis methods; some
aspects of the brain activity are reflected in electrophysiological recordings and
others in hemodynamic measurements; certain aspects such as fast neuronal os-
cillations are only detectable in electrophysiological signals (area ), others (such
as activity in deep brain structures) are better visible in fMRI signals (area ); as-
pects that are reflected in both modalities can be subdivided into signals originat-
ing from neural activity (area ) and non-neural physiological processes reflected
in both modalities, such as muscle contractions that lead to head movement (area
); besides these common artifact sources, there are many artifacts that are re-
flected in one modality only (area and ).
5.1 Multimodal imaging setups
Manydifferentmultimodal neuroimagingsetupshavebeenestablished. issection
will give a short introduction to the most popular multimodal setups that combine
hemodynamic and electrophysiological measurements of brain activity.
Invasive electrophysiology and optical imaging
Combined optical imaging and electrophysiology setups [Grinvald et al., ] are
comparatively simple and have a high spatiotemporal resolution on the cortical sur-
face. ese setups are oen used in animal studies for investigation of neurovascular
coupling phenomena, see e.g. [Martindale et al., , Devor et al., , Berwick
et al., , Sirotin et al., ]. Also for human patients the intrinsic optical BOLD
signal is an attractive diagnostic tool for clinical applications [Arthur and Nader,
]. A disadvantage is that the optical signal is restricted to the cortical surface,
while microelectrode measurements can be obtained from deep brain structures.
5.1. MULTIMODAL IMAGING SETUPS 37
Setup Properties Invasive
Optical imaging-
Microelectrodes
[Grinvald et al., ]
High spatiotemporal resolution
Optical signals restricted to cortical surface Yes
fMRI-MEG
[Dale et al., ]
MEG restricted to cortical surface
No simultaneous measurements No
fMRI-EEG
[Lemieux et al., ]
EEG restricted to cortical surface
Simple Setup No
fMRI-Microelectrodes
[Logothetis et al., ]
Depth resolution in both modalities
Complex setup Yes
NIRS-EEG
[Moosmann et al., ]
Both modalities restricted to cortical surface
Portable Setup No
Table .: Popular multimodal neuroimaging setups;
fMRI and MEG
Combined fMRI-MEG measurements were amongst the first multimodal setups. A
fundamental limitation of fMRI-MEG is that measurements cannot be obtained si-
multaneously. In order to analyze MEG-fMRI jointly an experimental paradigm is
needed that triggers the same neural activation patterns. Compared to EEG-fMRI
setups, the combination of MEG-fMRI might offer the advantage that MEG sig-
nals oen have a better spatial resolution than EEG setups. In contrast to simulta-
neous EEG-fMRI, sequential MEG-fMRI measurements have no scanner induced
artifacts.
fMRI and EEG
In contrast to fMRI-MEG measurements fMRI-EEG can be measured simultane-
ously. e first studies using EEG-fMRI had to solve a number of technical chal-
lenges [Lemieux et al., , Allen et al., , Goldman et al., , Allen et al.,
, Lemieux et al., ]. Aer these problems were solved, fMRI-EEG has be-
come the probably most oen used multimodal neuroimaging setup. With the ad-
vent of commercially available MRI compatible EEG setups and efficient artifact
removal methods, this methodology has become a standard tool in clinical applica-
tions and neuroscientific research. An fundamental limitation of fMRI-EEG is that
the EEG measurements are limited to the cortical surface. Also, the EEG sensors
pick up only superpositions of multiple neural dipoles, due to the volume conduc-
tion effects of the skull.
NIRS and EEG
e simple and comparatively low-cost setup of NIRS allows to measure hemody-
namic activity in many situations when fMRI measurements are not feasible, e.g. for
longterm monitoring at the bedside [Bozkurt et al., ] or even outside the lab via
38 CHAPTER 5. MULTIMODAL NEUROIMAGING
wireless transmission [Muehlemann et al., ]. For multimodal measurements
NIRS has the additional advantage that the electrophysiological measurements are
not corrupted by artifacts induced by changes in the strong magnetic fields needed
for measuring fMRI signals. Besides NIRS can be measured at temporal resolutions
of Hz and more, while fMRI measurements usually have temporal resolutions be-
low Hz. Of course the temporal response properties of the hemodynamic signal
do not change by faster measurements so the effective temporal resolution of the
measured hemodynamic response is the same as that of fMRI measurements, but
higher temporal resolution can help to understand better the temporal dynamics
of the blood response to neural activation. Taken together despite the low spatial
resolution and SNR of NIRS, it is an attractive alternative to fMRI in particular for
multimodal measurements in everyday life at the bedside of patients. A promis-
ing example are applications in the context of brain-computer interfaces (BCIs). In
BCIs a computer program extracts volitionally controlled brain signals in order to
allow paraplegic patients to communicate with the outside world. So called hybrid
BCIs [Pfurtscheller et al., ], BCIs that use multimodal measurements, can ro-
bustly increase the accuracy at which brain states can be decoded from brain activity
measurements. For a hybrid NIRS-EEG BCI see for instance [Fazli et al., ].
fMRI and intracranial microelectrode recordings
e combination of fMRI and intracranial microelectrode recordings offers the best
spatiotemporal resolution of all multimodal setups [Logothetis et al., , Goense
and Logothetis, ]. All other setups, non-invasive multimodal setups like fMRI-
MEG/EEG as well as optical imaging signals combined with intracranial recordings,
have at least one modality that can only measure brain activity in the cortical man-
tle. Intracranialmicroelectroderecordingscombinedwithhigh resolutionfMRIcan
measure brain activity in deep structures such as the thalamus in both modalities at
the same time [Logothetis et al., ]. Simultaneous electrophysiology and fMRI
recordings are technically challenging and require specialized equipment [Oelter-
mann et al., ]. A major problem are the artifacts in the electrophysiological
recordings induced by switching magnetic field gradients. In the initial part of the
work on this thesis a method to remove these artifacts online during a running mul-
timodal measurement was developed and is presented in section ..
5.2 Motivations for multimodal neuroimaging
Multimodal neuroimaging offers decisive advantages for basic research and clinical
applications. A few motivating examples will illustrate the usefulness of multimodal
methods in the following.
5.2. MOTIVATIONS FOR MULTIMODAL NEUROIMAGING 39
Understanding unimodal concepts of brain activity
Mostofwhatweknowaboutbrainactivityinhealthyhumansubjects hasbeenfound
with unimodal measurements of neural activity using EEG, MEG, fMRI, PET and
invasive microelectrode recordings. Consequently most neuroscientific concepts
are tightly related to the method that has been used to discover and investigate this
particular aspect of neural activity. Translating unimodal concepts from one modal-
ity to another is an important step towards understanding the neural mechanisms
underlying these phenomena.
From EEG concepts to fMRI An important concept in the EEG literature are event
related responses (ERPs) and amplitude modulations of neural oscillations. Both,
the amplitude of oscillations in a certain frequency band as well as ERPs are being
used as diagnostic markers in clinical practice. ERPs are computed by taking the av-
erage of EEG/MEG epochs time locked to the onset of a stimulus event. Systematic
peaks and troughs in the EEG/MEG signal aer stimulus onset will add up in the
average and can be used to infer the temporal dynamics of cortical processing. A
better understanding of the neural mechanisms underlying the generation of ERPs
requires a multimodal approach. In [Eichele et al., ] multimodal EEG-fMRI
is used to link ERPs to their fMRI representation. In an auditory target detection
experiment, the amplitude modulation at different latencies was used to predict spa-
tially separated fMRI activation patterns, reflecting different stages in the hierarchy
of cortical processing. Besides ERPs a well studied EEG feature is the average ampli-
tude of neural oscillations within a certain frequency band. e best studied oscilla-
tion is the so called αrhythm¹ [Berger, ]. Concurrent EEG-fMRI studies could
relate the modulation of alpha power to decreased BOLD signals in occipital cortex
and increased BOLD signals in thalamus [Goldman et al., , Moosmann et al.,
, de Munck et al., , Wu et al., ]. Oscillations in other frequency bands
have been studied with EEG-fMRI in [Laufs et al., , Scheeringa et al., ].
From fMRI concepts to EEG An example of a unimodal concept from fMRI data
are so called resting state networks [Laufs et al., , van de Ven et al., , Cole
et al., ]. ese networks are estimated by analyzing ongoing hemodynamic ac-
tivityintheabsenceofanytask. Althoughthefunctionalrelevanceofthesenetworks
is currently not fully understood there is evidence to suggest that they represent a
fundamental and functionally relevant modular organization [Smith et al., ].
It has been proposed that their integrity is compromised in a variety of neurologic
and psychiatric conditions [Greicius et al., , Gotman et al., ]. A prominent
¹ e αrhythm is characterized by its posterior distribution and oscillations in a frequency range
from about to  Hz and reactivity to eye opening/closure, i.e. closing the eyes will result in increased
amplitudes in these oscillations, while opening leads to amplitude decrease.
40 CHAPTER 5. MULTIMODAL NEUROIMAGING
example is the default mode network [Raichle et al., ]. Per definition the de-
fault mode network is a fMRI concept as it is based on the oxygen extraction fraction,
the ratio of oxygen used by the brain divided by the amount of oxygen delivered
by the vessels. As the metabolic needs of the brain in response to sensory stimula-
tion are rather subtle compared to its energy consumption during rest, the authors
of [Raichle et al., ] advocate the idea that it is crucial for our understanding of
neural processing to define a baseline state of brain activity. Studies investigating the
default mode network are lacking experimentally controlled variables that could be
used as regressor for analyzing the network of interest; hence resting state data re-
quires specialized analysis approaches. Before multimodal neuroimaging methods
were established most resting state studies have been relying on unimodal unsu-
pervised methods (see section .). As non-invasive multimodal EEG-fMRI setups
becamereadilyavailable,researcherswereabletostudythethesehemodynamicphe-
nomena with supervised methods by using one modality as regressor for the other
modality [Laufs et al., , Mantini et al., , de Munck et al., , Laufs, ,
Ritter et al., ].
Diagnosis of pathological brain activity
In clinical research the development of concurrent EEG-fMRI protocols was moti-
vatedbytheaimtoimproveseizurelocalizationforpre-surgicalplanningofpharmaco-
resistant epilepsy. It is crucial for surgeons to localize epileptic sources as precisely
aspossiblein ordertonotdamagestructuresthatareimportantforfunctionssuch as
memory, speech or motor coordination. Moreover it is important to understand the
mechanisms underlying abnormal brain activity in these patients, such as the spa-
tial extent of the seizure generating network. Both questions have been addressed in
studies employing multimodal methods. Several groups have explored and demon-
strated the feasibility of simultaneous intracranial, EEG and fMRI or NIRS record-
ingsforthispurpose[Ivesetal.,,Krakowetal.,,Lemieux etal.,]. Also
intracranially measured interictal² EEG eventsin epileptic patients events have been
used as predictors of hemodynamic activity measured in fMRI [Vulliemoz et al.,
]. e basic assumption in this approach is that hemodynamic responses at
the time of the interictal events would map on to the regions of the brain that are in-
volved in generating seizures - the irritative zone(s). is principle was then also ap-
plied in studies of patients with generalized non-convulsive seizures in which spike-
wavedischargeswereused topredicthemodynamicresponses[Gotmanetal.,],
as well as partial seizures. Other studies investigated the hemodynamic response of
epileptic activity in children using simultaneous EEG-NIRS measurements [Roche-
Labarbe et al., ]. e authors report an increase in deoxyhemoglobin shortly
before the onset of EEG spike-and-wave discharges. As NIRS is, like fMRI, a mea-
²Ictal refers to the period of an epileptic seizure. Interictal refers to the period between epileptic
seizures during which EEG events characteristic for epileptic disorders can be detected.
5.3. AN INFORMATION THEORETIC VIEW 41
sure of metabolic activity, this study shows that epileptic activity measured by elec-
trophysiological recordings is preceded by an increase in metabolic rate. is is
in line with previous studies in animal models [Zhao et al., ] and hints at the
potential of multimodal methods for epilepsy research: Without multimodal mea-
surements it is impossible to find such a dissociation between neural and hemody-
namic activity. Oen the localization of epileptic sources improves with multimodal
measurements. e accuracy of multimodal estimates of epileptic sources can be
validated with intracranial electrophysiology [Vulliemoz et al., ] and quantifi-
cation of post-surgical seizure reduction [ornton et al., a]. However there
are many patients in whom localization of epileptic sources from one modality does
not coincide with that of another modality. Novel analysis approaches might help to
solve these discrepancies [Jann et al., , ornton et al., b, Vulliemoz et al.,
b].
5.3 An information theoretic view
For an intuitive understanding of multimodal analyses it is helpful to take an infor-
mation theoretic view. Information theory has been developed as a generic frame-
work for telecommunication [Shannon, ] and became popular in application
domains far beyond that [Cover and omas, ]. Independent of the modality,
not all aspects of brain activity can be measured; and not for all aspects measured in
one modality there might be a counterpart in other modalities. Besides especially
in multimodal setups artifacts impose additional challenges on the analyses. Infor-
mation theoretic measures can help formalizing and illustrating these relationships.
e basic quantity underlying information theory is the entropy H(X)of a random
variable Xwith probability distribution p(x); it is measured in bits and defined as
H(X) =
x
p(x)log(p(x)). (.)
Entropy can be used to derive another measure, the mutual information I(X;Y)be-
tween Xand another variable Y, which is the reduction in entropy of Xby knowing
the state of Y
I(X;Y) = H(X)H(X|Y)(.)
where the conditional entropy H(X|Y)is given by
H(X|Y) =
x
y
p(x,y)log(p(y|x)) . (.)
All quantities are illustrated in figure .A. e mutual information I(X;Y)is the
intersection of the areas corresponding to H(X)and H(Y), respectively. Taking
the information theoretic stance one can easily relate theoretical insights to simple
42 CHAPTER 5. MULTIMODAL NEUROIMAGING
geometrical intuitions. In particular we see that the mutual information can also be
computed by
I(X;Y) = H(X) + H(Y)H(X,Y)(.)
which generalizes in the case of more than two variables X,X,. . . ,XNto
I(X;X;. . . ;XN) =
N
n=
H(Xn)H(X,. . . ,XN). (.)
isrelationshipisusedforinstancein ICA(seesection.). Wewilldenotetheran-
domvariableofall (discrete)brainstatesZ, the variabledescribingall EEGmeasure-
ments Xand that of all fMRI states Y. e joint entropy of all variables H(X,Y,Z)
is illustrated in figure . as the joint area within all three ellipses. e goal of mul-
timodal neuroimaging is to find those features in the measurements that maximize
the grey area in figure .B, which is the mutual information
I(Z;X,Y) = H(Z)H(Z|X,Y)(.)
between the brain activity Zand the joint entropy of all measurements X,Y. If an
analysis method picks those features that maximize equation (.), the entire area of
H(Z)intersectingwitheitherH(X)orH(Y)iscovered. eremainingareaofH(Z)
is the conditional entropy H(Z|X,Y). If there is information gained by multimodal
measurements the area covered by I(Z;X,Y)is larger than the areas of I(Z;X)
or I(Z;Y), representing unimodal measurements. As the true brain states Zare
unknown, eq. (.) cannot be maximized directly. Instead one oen maximizes
I(S;X,Y) = H(S)H(S|X,Y)(.)
the mutual information between a stimulus Sand the measurements. As the stim-
ulus entropy is fixed by the experimenter, eq. (.) reduces to minimizing the en-
tropy H(S|X,Y); in most analysis settings this translates into explaining as much
of the stimulus variance as possible using the measurements. Few studies formulate
this objective explicitly, but for a comparison of the various analysis approaches the
information theoretic intuitions illustrated in figure . can be helpful.
Other approaches are to maximize I(X;Y), the mutual information between
multiple measurements (area and in fig. .). is is equivalent to the aforemen-
tioned approaches, but it requires thorough removal of all artifacts (corresponding
to area ) and proper feature extraction covering all relevant parts of areas and ,
prior to maximizing I(X;Y). Examples of this approach are [Ostwald et al., ,
].
5.3. AN INFORMATION THEORETIC VIEW 43
Brain Activity
Electrophysiological
Measurements
Haemodynamic
Measurements
321
4
5 6
H(X,Y)
H(X|Y)
H(Y|X)
I(X;Y)
H(X)
AB
H(Y)
Figure .: Information theoretic illustration of multimodal analyses; AInfor-
mation theoretic quantities visualized as areas in a Venn diagram; BEntropies
of brain activity labeled as in fig. .; area denotes brain activity that is re-
flected only in electrophysiological measures, area activity reflected only in
hemodynamic signals and area brain activity reflected in both modalities; ide-
ally multimodal analysis methods maximize the mutual information I(Z;X,Y)
(see eq. (.)) between brain activity Zand all measurements X,Y(correspond-
ing to electrophysiological and hemodynamic signals, respectively), comprising
areas - (grey); area is that part of I(Z;X,Y)that is not contained in the
mutual information between Zand Y, so area =I(Z;X,Y)I(Z;Y), area
=I(Z;X,Y)I(Z;X)and area =I(Z;X) + I(Z;Y)I(Z;X,Y);
44 CHAPTER 5. MULTIMODAL NEUROIMAGING
Chapter 6
Multimodal analysis approaches
Tcombinationofimagingmodalitiesrequiresdedicatedanalysismethods[Dale
and Halgren, , Friston, ]. is chapter will give a short overview over
existing approaches to data integration of multimodal measurements. Most mul-
timodal analysis approaches are combinations of different unimodal methods (or
extensions thereof) as presented in chapter . When scanning through the multi- In multimodal imaging
a rule of thumb is: e
more accurate the
measurement, the
simpler the analysis
method used
modal neuroimaging literature there is a general tendency: e more detailed the
data is, the simpler are the methods employed. High resolution techniques such as
intracranial microelectrode recordings in combination with intrinsic optical imag-
ing [Grinvald et al., , Devor et al., , Dunn et al., , Berwick et al., ]
or fMRI [Logothetis et al., ] typically employ very simple methods such as
event-related averaging, spatial averaging and univariate correlation coefficients.
e further away from neural activity the measurements are, the more important
are models of the data generating process. For instance when using MEG/EEG mea-
surements, it is crucial to know the conductivity profile of the head (see section .)
in order to make inference about the locations of the neural oscillators underlying
the measurements. Hence especially in the field of non-invasive multimodal studies
using fMRI-EEG there has been an increased interest in model based data analyses
[Daunizeau et al., , Riera et al., , Valdes-Sosa et al., ]. e following
sections will give a broad overview over popular multimodal analysis approaches.
In analogy to chapter we divide the methods into supervised,unsupervised and
model-driven approaches.
6.1 Supervised multimodal methods
Oen multimodal methods rely on classical supervised methods (see section .).
is means that one modality is used as regressor. Alternatively a stimulus variable
can serve as regressor in order to extract relevant features (i.e. those that corre-
late with the experimentally controlled stimulus) from either modality; for instance
ERPs extracted from EEG recordings (using stimulus timing information) can serve
as regressor to find fMRI correlates thereof [Eichele et al., ]. Analogously fMRI
46 CHAPTER 6. MULTIMODAL ANALYSIS APPROACHES
activity patterns associated with a stimulus are used to bias cortical dipole estimates
obtained from EEG measurements [Liu et al., , Dale and Halgren, ]. Since
in many supervised multimodal analyses one modality is used to guide the analysis
of the respective other modality, these methods are oen categorized as asymmetric
data analysis approaches, see e.g. [Laufs et al., , Rosa et al., a]. Symmetric
data integration approaches try to integrate information from multiple modalities
simultaneously. But symmetric analyses based on supervised methods are not appli-
cable to all experimental paradigms. In the following we will give a short overview of
asymmetric and symmetric approaches using supervised methods for multimodal
data fusion.
Two ways of asymmetric data fusion in supervised learning settings
ere are two main approaches to supervised asymmetric multimodal fusion.
EEG-constrained fMRI analyses Here, features from EEG data are extracted to
temporally constrain the fMRI analysis. is is usually implemented by extracting
the time course of a certain feature in the EEG data and use that as regressor in the
GLM design matrix. e two most oen used EEG features are:
Amplitude modulations of event-related potentials (ERPs)
EEG epochs are extracted time locked to the presentation of a sensory stim-
ulus; the amplitude of the ERP can be used as regressor in the GLM design
matrix [Eichele et al., ];
Amplitude modulations of neural oscillations (Band power)
e strength of neural oscillations in a certain narrow frequency band and a
given time window is extracted; the time window length is oen matched to
that of the fMRI sampling rate; this band power time series can be correlated
with the BOLD signal [Logothetis et al., , Moosmann et al., ], used as
GLM regressor [Laufs et al., , Rosa et al., b] or can be used as target
variable in multivariate pattern analysis [Martino et al., ].
FMRI-constrained EEG analyses Here, information that is primarily reflected in
fMRI activity, is used as a spatial constraint for estimating dipole locations based
on EEG data. A prominent example is fMRI-constrained current density imaging
[Liu et al., , Dale et al., ] which has already been proposed in [Dale and
Sereno, ]. Activation patterns obtained from fMRI measurements are assumed
to reflect neural activation. Constrained source density imaging uses these pat-
terns as spatial priors to bias the solution of EEG/MEG dipole estimates towards
the fMRI activation patterns. While early implementations assumed stationary co-
variances between cortical sources, this approach has been extended to account for
6.1. SUPERVISED MULTIMODAL METHODS 47
time-varying spatial constraints [Liu and He, ]. In practice fMRI-constrained
current density imaging is implemented by estimating an fMRI activation pattern
using standard SPM methodology (see section .). is SPM is assumed to reflect
the current source covariance modeled in the matrix Rin eq. (.).
Limitations
Asymmetric supervised methods require unimodal feature selection In the
classical supervised learning setting, one of the two modalities is used to extract a
univariate regressor. is seemingly negligible limitation makes classical supervised
analysis methods difficult to use for multimodal data. Each modality is actually In asymmetric
multimodal data fusion
a regressor is extracted
from one modality to
bias the activity
estimate in another
modality
multivariate. Of course classical GLM type analyses can incorporate more than one
EEG regressor, see e.g. [Rosa et al., b]; but then the analysis is mass-univariate,
meaning that single voxel time series are analyzed separately and multivariate ac-
tivation patterns are neglected in the analysis [Lange et al., ]. Although the
extension of the GLM to multivariate data and multivariate regressors is straight-
forward, this approach has not been investigated in the context of multimodal data
integration. Another important aspect is that each modality reflects certain aspects
of brain activity that other modalities are blind to (see areas and in figure .).
When using asymmetric approaches it is likely that during feature selection of the
primary’ modality (the one that is used to bias the estimate of the other modal-
ity) one selects features that are not reflected in the other (secondary) modality at
all; trying to find these features in the secondary modality can thus lead astray the
scientific interpretation. An example are synchronized oscillations between brain
regions. Synchronization measures require a high temporal resolution as they are
based on millisecond time differences between oscillations recorded at different po-
sitions. ese synchronization phenomena are unlikely to be reflected in the slow
oscillations of voxel activations in fMRI signals. Unless synchronization between re-
gions imposes additional energy demands on both brain regions, synchronized on
the time scale of hemodynamic activation, using synchronization based regressors
should not yield any significant activation in the fMRI signal.
Symmetric supervised methods require a model of neural activity Symmet-
ric integration approaches try to overcome this problem by analyzing modalities
simultaneously rather than biasing one towards the other. A symmetric data fusion
approach based on supervised learning methods is taken in [Fazli et al., ]. e
authors use a multivariate pattern classifier on EEG and NIRS data separately. e
output of each classifier is then used to train another classifier to predict a stim-
ulus label from the multimodal EEG-NIRS data. is kind of symmetric anal- Symmetric supervised
data integration
requires stimulus
information
ysis is not applicable to experimental paradigms studying intrinsic neural activity,
such as epileptic activity or ongoing oscillations reflecting resting state activity, for
a review on paradigm-less multimodal studies see [Salek-Haddadi et al., ]. In
48 CHAPTER 6. MULTIMODAL ANALYSIS APPROACHES
principle supervised methods can be applied here but this requires manual fea-
ture selection e.g. by an experienced neurologist in order to construct a regressor.
If there is no model of neural activity and no expert knowledge available for extract-
ing features manually, feature extraction becomes difficult in supervised learning
settings. Wrong feature selection can reduce the amount of mutual information be-
tween modalities significantly and thus have detrimental effects on subsequent anal-
yses [Ostwald et al., ]. If there is no model about the underlying data generation
process, which could guide manual or supervised feature extraction, unsupervised
methods can be used to automatically extract features.
6.2 Unsupervised multimodal methods
Unsupervised methods extract structure in a data set in an explorative manner. A
major problem with unsupervised learning methods for multimodal neuroimaging
is the data fusion process. e unsupervised methods in chapter . are inherently
unimodal. us when using unsupervised methods on multimodal data oen each
modality is analyzed separately. When applying PCA or ICA to each modality sep-
arately, it is not clear how the components in one modality relate to those of another
modality the first PC in one modality does not necessarily correspond to the first
PC in another modality. Data integration is especially difficult if an unsupervised
method does not result in an ordered set of components, like ICA. Many approaches
extract unimodal features using unsupervised methods from one modality. e ex-
tracted featuresare then used as regressors in a GLM based SPM [Eichele et al., ,
Debener et al., ]. ree examples of solutions to this data integration problem
are sketched in the following:
Integration of unimodal analyses [Eichele et al., ]:
. Perform unsupervised analysis separately on each modality.
. Compute the correlation between single components to find pairs
of components across modalities.
Joint analyses of concatenated data [Moosmann et al., ]:
. Reduce the dimensionality of each modality separately by PCA.
. Concatenate the data of all modalities into one matrix.
. Apply unsupervised methods to the joint data matrix.
Reciprocally constrain each modality by other modality [Yang et al., ]:
. Perform unimodal temporal ICA separately on EEG data.
. Extract the band power of one IC to construct a regressor.
6.2. UNSUPERVISED MULTIMODAL METHODS 49
. Use the IC regressor to estimate SPM based fMRI activation pattern.
. Use the fMRI activation pattern to guide the EEG dipole estimation.
ese data integration approaches combine the results from unimodal unsupervised
analyses. Other unsupervised data integration approaches are specifically designed
for multimodal data and thus do not require a second step of data fusion. Many un-
supervised multimodal data integration approaches are special cases of a statistical
learning method called canonical correlation analysis (CCA) [Hotelling, ]. A
detailed derivation of CCA is described in chapter . Related methods are for in-
stance partial least-squares [Wold, , Geladi and Kowalski, ] or multivariate
linear regression; also the GLM (see section .) can be cast as a form of CCA [Hen-
son, ]. For a comprehensive introduction to CCA and its relationship to other
methods see [Borga, ]. e advantage of these multimodal unsupervised meth-
ods is that they automatically extract the relevant features and integrate the data in
one optimization step.
Limitations
Unimodal unsupervised methods are difficult to interpret When using un-
supervised methods for data integration, it can be difficult to relate the activation
profiles in one modality to those found in another modality. In general when an-
alyzing multimodal data with unimodal methods, the correspondence between the
components found in multiple modalities cannot be guaranteed. In extreme cases,
ICA would find in one modality independent components that do not show clear
correlations with any of the components in another modality. Or a component in
one modality is correlated to more than one component in another modality.
Temporalalignmentrequiresa modelof neurovasculardynamics Multimodal
unsupervised fusion requires each sample in one modality to have a corresponding
sample in the other modality such that single samples can be correlated. In order
to get corresponding samples from a high resolution (in time) electrophysiological
measurementanda lowresolutionfMRI timeseries, onecantakesimilarapproaches
as in the supervised settings above: Extraction of stimulus locked epochs, as done in
[Correa et al., ], or extraction of band power time series with temporal bins that
have the same duration as the fMRI image acquisition, as done in [Martínez-Montes
et al., , Murayama et al., , Bießmann et al., a]. Having the same num-
ber of temporally aligned samples is not enough. Unsupervised multimodal analy-
ses require that simultaneous samples in each modality be correlated. For combined
electrophysiology and hemodynamic measurements this is not the case. e HRF
delays the fMRI signal by several seconds. Temporal alignment is usually done by
convolving the electrophysiological time series with the same canonical HRF as in
classical supervised analysis methods, see fig. .. is approach does not take into
50 CHAPTER 6. MULTIMODAL ANALYSIS APPROACHES
account the spatiotemporal dynamics and hence can fail to detect certain aspects of
neural activity. A solution to this problem will be proposed in chapter and .
Only activity reected in both modalities is found Multimodal unsupervised
data fusion is based on the assumption that all modalities reflect a common under-
lying latent variable. CCA for instance aims to find those representations of each
modality that maximizes the correlation between multiple modalities . is implies
that those aspects of brain activity that are reflected in one modality only (see figure
., area and ) will not be captured by CCA. In practice however proper prepro-
cessing helps to avoid these cases: Extracting the fast oscillations of the neurophysi-
ological signal by computing the band power in a preprocessing step captures those
aspects that are not directly reflected in the BOLD signal and reduces them to the
relevant quantity changes in membrane potentials on the level of neural ensem-
bles. Similarly proper selection of voxels on the side of fMRI data prevents from
biasing the result towards highly active voxels containing e.g. vessels on the cortical
surface.
6.3 Model driven data fusion
Detailed biophysical models for multimodal data fusion are proposed e.g. in [Riera
et al., , Daunizeau et al., , Plis et al., , Bojak et al., ]. In short,
the models usually employ forward models for electrical and vascular signals as pre-
sented in section .. e modeling typically starts at a hypothetical quantity x(t)
representing the combined postsynaptic activity of a neural ensemble in a given
voxel. is synaptic activity depends on the synaptic activity at previous time points
and potentially also on input from other voxels. Additive noise εi(t)drives the sys-
tem. Synaptic activity is then translated via appropriate forward models into uni-
modal measurements. e dynamics of the vascular and electrophysiological for-
ward models are represented in systems of stochastic non-linear differential equa-
tions (SDEs). Following the line of thought in [Valdes-Sosa et al., ] EEG/fMRI
models are particular cases of State Space Models (SSMs) [Kalman, ]
˙
x(t) = f(x(t),s(t), Φ) + μ+Λ˙
εi(t)(.)
y(t) = g(x(t), Φ, s(t)) + εe(t), (.)
where εedenotes external measurement noise. Equation (.) is oen called state
equation, containing a set of SDEs describing the dynamics of the state vector x(t).
A state refers to the common underlying quantity driving electrophysiological and
hemodynamic activity and is oen referred to as ensemble of postsynaptic potentials
[Valdes-Sosa et al., ]. e vector s(t)describes external inputs which influence
the system, such as sensory stimulation, ascending neuromodulatory influences or
6.3. MODEL DRIVEN DATA FUSION 51
ongoing oscillations. e set of parameters specifying the forward models is con-
tained in Φ. In case of a combined EEG-fMRI model these parameters include the
volume conductivity profile of the head (for the EEG forward model, see section
.) and the parameters of the vascular forward model (section .). e vector
εi(t)models random inputs following a Gaussian distribution, with mean μand co-
variance matrix Λ. Equation (.) is called observation equation and describes how
the (multimodal) measurements are determined by the states x(t). Note that the
measurements are taken at discrete time steps t, while the underlying stochastic dif-
ferential equations are continuous in time. SSMs are well known in control theory
since the s and have become popular in the neuroimaging community mainly
under the name Dynamic Causal Modeling [Friston et al., ]. Solving these for-
ward models can be difficult. Analytical solutions to these equations do not exist,
they have to be solved numerically. e solution to equations (.) and (.) can be
approximated using local linearization [Riera et al., , Valdes-Sosa et al., ].
Limitations
Which level of abstraction is the best? A general problem with model based
data fusion is that it is difficult to assess how complex a model should be. A good
model should be detailed enough to make hypotheses about the neural processes
of interest but at the same time the model should be simple enough to be computa-
tionally tractable. Obviously it is impossible to model brain activity on a realistically
small scale. Neither does one have the measurements to build a sufficiently detailed
model, nor is it computationally tractable. Strictly speaking if one wants to describe
a neuron exactly, one needs to take into account the state of each ion channel and
many metabolic processes inside the cell. With current computer technology and
the limited knowledge we have about the neural and vascular anatomy this seems
ambitious even for point neuron¹ models, not to speak of more complex neuron
models. A typical voxel in a multimodal analysis contains thousands of neurons.
Even if it would be technically feasible to place multiple electrodes into every single
cell for a typical voxel size this would amount to a few thousand electrophysiology
channels per voxel there remains the problem of scalability of these models. Effi-
cient solutions for large scale models exist, see e.g. [Valdes-Sosa et al., ], but for
a data set that would offer a realistic chance to test the validity of the model assump-
tions, these approximations would be infeasible to compute as well. Abstraction is
needed to make the problem tractable. Most studies simplify the problem and do
not model single cells but rather net activity of cell ensembles. is approximation
seems reasonable. e net dipole strength can be measured with intracranial elec-
¹Point neurons are modeled as a point in space, without accounting for the cells morphology;
the morphology however does play an important role for the integration of inputs and subthreshold
computations; point neuron models are called phenomenological as they reproduce many phenomena
of real neurons activity profiles without a biologically realistic structure;
52 CHAPTER 6. MULTIMODAL ANALYSIS APPROACHES
trodes. Importantly these measurements can be combined with measurements of
fMRI signals [Logothetis et al., ]. is allows to test hypotheses involving the
model quantities. Other popular simplifying assumptions about the neurovascular
coupling mechanisms are:
. e HRF has an isotropic gaussian profile in space [Sirotin et al., ]
. e spatiotemporal dynamics of the hemodynamic response is
independent in space and time [Daunizeau et al., ]
Both assumptions speed up analyses considerably. However there is converging em-
pirical evidence speaking against them [Aguirre et al., , Yacoub et al., ,
Berwick et al., , Bießmann et al., a].
Realistic models are difficult to test empirically Realistic models are only use-
ful as long as they can be tested empirically. If the model quantities cannot be mea-
sured nor falsified they are of limited use for scientific research [Popper , ].
Some more detailed models for multimodal data fusion explicitly model inhibitory
and excitatory neural populations, see e.g. [Bojak et al., ]. is is biologically
plausible a single neuron can only have positive spiking rates, never negative ac-
tivity. For a balanced network it is necessary to have both inhibition and excitation
[Brunel, ]. However when modeling excitation and inhibition independently,
these quantities and the model predictions associated with them have to be testable
empirically. Measuring (morphologically or immunohistochemically identified) in-
hibitory and excitatory cells independently in vivo and ideally in combination with
hemodynamic measurements is technically very challenging and to the best of my
knowledge there are no labs that have tried to accomplish this task².
Summary
Taken together most approaches to multimodal neuroimaging analysis are based
on unimodal methods and can be categorized in analogy to chapter . e main
problems associated with the respective analysis approach are:
Supervised methods find only activity correlated with a univariate regressor
Unsupervised methods can be difficult to interpret in a multimodal context
Model driven methods are difficult to compute/validate for realistic models
²Inhibitory interneurons are typically very small and thus intracellular recordings are difficult to
obtain. Classifying inhibitory and excitatory neurons only on the basis of their action potential shapes
is technically feasible but as there are very fast spiking excitatory neurons it is oen difficult to make a
clear cut distinction.
Chapter 7
Data-driven multimodal analysis
T chapter will introduce a new data driven approach to multimodal data fu-
sion, temporal kernel canonical correlation analysis (tkCCA). In order to meet
the requirements for multimodal data acquired with state of the art measurement
devices the method was designed to fulfill three criteria:
. Multi-modality without unimodal feature selection
. Scalability in particular to fMRI data (>voxels but few data points)
. Model free coupling parameters should be estimated from data
is chapter will give a step-by-step introduction to tkCCA. e first step is
a recapitulation of the classical statistical learning technique canonical corre-
lation analysis (CCA) [Hotelling, ]. e use of CCA is motivated by the first
requirement, as CCA is specifically designed for multimodal data. In its classical
form CCA is computationally costly for high dimensional data such as fMRI sig-
nals. us in order to meet the second requirement, scalability, we will introduce
in a second step the kernelized version of CCA, kernel CCA (kCCA) [Fyfe and Lai,
, Akaho, , Melzer et al., ]. Although initially designed for data with
non-linear dependencies, another advantage of kCCA is that it is very efficient for
high dimensional data, especially if the number of data points is much lower than
the number of dimensions. Neurophysiological and hemodynamic activity is not
correlated instantaneously but they are related via complex neurovascular coupling
mechanisms. e temporal dynamics of this coupling are oen modeled by a canon-
ical HRF, see fig. .. is temporal profile of the HRF is hypothesized to be in-
dependent of the spatial dynamics of neurovascular coupling. ese assumptions
are a convenient approximation, however there is empirical evidence against them.
e HRF varies between brain regions [Aguirre et al., ] and cortical depth [Ya-
coub et al., ]. High resolution optical imaging studies suggest that the local fine
structure of cortical vascularization gives rise to spatiotemporal hemodynamics that
cannot be captured by a combination of an independent spatial and temporal com-
ponent [Berwick et al., ]. Accounting for this spatiotemporal variability can
54 CHAPTER 7. DATA-DRIVEN MULTIMODAL ANALYSIS
increase the prediction accuracy of neurophysiological activity and reveal areas of
activation that separable models would fail to detect [Lu et al., ]. For a detailed
analysis of neurovascular coupling mechanisms it is preferable to refrain from these
rather restrictive assumptions. In order to meet the third requirement, model-free
analysis, we extended kCCA such that it can account for arbitrary spatiotemporal
coupling mechanisms. e resulting algorithm, temporal kernel canonical correla-
tion analysis (tkCCA) does not require the usual assumptions about neurovascular
spatiotemporal dynamics, instead it estimates the optimal dynamics from the data.
For the sake of simplicity all algorithms in this chapter will be illustrated with sim-
ple toy data examples. Chapter will then give an overview over the applications of
tkCCA to more realistic artificial data and data from real experiments.
7.1 Canonical Correlation Analysis (CCA)
Suppose we have simultaneously recorded measurements of different multivariate
modalities [x,x,. . . ,xT] = X
R
U×Tand [y,y,. . . ,yT] = Y
R
V×T. Let us as-
sume that the data are artifact free and both modalities reflect neural activity. How
should one integrate these data in a common framework? One modality has Udi-
mensions, the other has Vdimensions. It is not clear how each dimension in one
modality is related to another dimension in the other modality. e only assump-
tion one can rely on is that both modalities reflect a common underlying process
z, in the case of multimodal neuroimaging neural activity. Figure . depicts the
graphical model for this multimodal setting. One could apply PCA on each modal-
ity or on a joint data set and continue with analyzing only the strongest principal
components of the data. In fact this is oen done in multimodal analysis frame-
works as a preprocessing step [Moosmann et al., ]. However the directions of
maximal variance within one modality might not be the ones reflecting the signal
of interest. Figure . illustrates this scenario with toy data. In the toy data setting,
the common underlying variable, a sine wave, has low variance in each modality. A
standard dimensionality reduction in fMRI preprocessing discards the lowest vari-
ance directions in the data. Here, joint PCA would discard the very component
of interest. In general when applying unimodal unsupervised methods there is no
guarantee that the ith component of one modality reflects the same process as the
ith component in another modality. is is even worse when using ICA, where the
components cannot be not ranked according to the amount of variance explained
like in PCA. A simple solution is offered by CCA, which finds new representations
for simultaneously measured modalities such that the modalities are maximally cor-
related. e idea behind CCA is that the representation of the data that maximizes
the correlation between the modalities xand yreflects the common underlying pro-
cess zbest.
7.1. CANONICAL CORRELATION ANALYSIS (CCA) 55
X
Z
Y
Figure .: Graphical model for CCA; a common hidden variable Z, representing
the true neural activity, is reflected in two modalities: X, the measurements of
neurophysiologicalspectrogramsandY, measurements ofhigh dimensional fMRI
time series;
Derivation We assume centered data such that T
i=xi= and T
i=yi=.
CCA learns for each modality a feature weighting wx,wy, such that the canonical
correlation¹ρ
ρ=w
xCxywy
w
xCxxwxw
yCyywy
(.)
between the data in Xand Yis maximized. Equation . contains the empirical
auto-covariance matrices of each data source
TXX=Cxx
R
U×Uand
TYY=Cyy
R
V×V, (.)
the empirical estimate of the cross-covariance matrix
TXY=Cxy
R
U×V(.)
and the first canonical directions wx
R
Uand wy
R
Vof Xand Y, respectively. e
numerator contains the cross-variance of the data in its new representation, i.e. aer
projecting the data onto wx,wy. In the denominator we have a product of the auto-
covariances within each data set, projected onto the canonical directions. Without
this denominator, ρwould merely be a covariance. With the auto-covariance terms,
ρwill be scaled into the interval [ ]. is correlation coefficient is called canonical
since ρis invariant with respect to linear transformations of the data. Many exten-
sions to CCA have been proposed, including non-linear versions of CCA [Leurgans
¹e canonical correlation is equivalent to the concept of principle angles, introduced earlier in
[Jordan, ]. Principal angles quantify the similarity between two subspaces in an euclidean space.
56 CHAPTER 7. DATA-DRIVEN MULTIMODAL ANALYSIS
et al., ], CCA using stochastic processes [Fyfe and Leen, ] and sparse ver-
sions of CCA [Fyfe, , Wiesel et al., , Hardoon and Shawe-Taylor, ].
It could also be shown that the CCA framework can be easily extended to more
than two multivariate variables [Kettenring, ]. CCA has found widespread ap-
plication in multivariate statistics [Anderson, ], signal processing [Larimore,
, Bach and Jordan, ], data mining [Bie and Cristianini, , Bießmann
and Harth, ], artificial intelligence [Kim et al., ] and neuroscience [Friman
et al., , Hardoon et al., , Macke et al., ] to name just a few examples.
e complete solution of CCA are sets of canonical projections Wx= [w
x,. . . ,wk
x]
and Wy= [w
y,. . . ,wk
y],k=min(U,V), one set of projections for each variable.
Each Wispans a subspace in the respective data space of the ith modality that maxi-
mizes eq. (.), the canonical correlation between the variables. We will first derive
the solution for the first pair of canonical directions w
x,w
yusing the Lagrange mul-
tiplier technique. is will lead to a CCA solution based on eigenvalue decomposi-
tions (see appendix A.) of covariance matrices. is formulation naturally extends
to a CCA solution for all remaining kcanonical directions. We want the first pair of
canonical directions wx,wyto maximize the covariance between Xand Y:
argmax
wx,wy
w
xCxywy, (.)
In order to have a correlation instead of a covariance we need the right normalization
for eq. (.). is can be done by requiring the covariance within each variable X,Y
to be , i.e. we maximise eq. (.) subject to:
w
xCxxwx=, w
yCyywy=. (.)
Combining the objective function in eq. (.) with the constraints in eq (.) yields
the Lagrangian
L=w
xCxywy
λx(w
xCxxwx)
λy(w
yCyywy). (.)
As we want to maximise eq. (.) we calculate the partial derivatives with respect to
wxand set it to zero:
L
∂wx
=CxywyλxCxxwx=
Cxywy=λxCxxwx
(.)
7.1. CANONICAL CORRELATION ANALYSIS (CCA) 57
Analogously for wy:
L
∂wy
=w
xCxy λyw
yCyy =
w
xCxy =λyCyywy
(.)
By le multiplying eq. (.) with w
xwe obtain:
w
xCxywy=λxw
xCxxwx(.)
and by le multiplying eq. (.) with w
y:
w
yw
xCxy =λyw
yCyywy
w
y(Cyxwx)=λyw
yCyywy
w
xCxywy=λyw
yCyywy.
(.)
We see from eq. (.), eq. (.) and eq. (.) that λx=λy=ρ. Combining eq.
(.) and eq. (.) in one equation results the generalized eigenvalue equation
[Cxy
Cyx ][ wx
wy]=[Cxx
Cyy ][ wx
wy]λ. (.)
Successive generalized eigenvectors Wx,Wyare orthogonal in the metric defined by
the within-modality covariance matrices Cxx,Cyy. We can thus formulate the CCA
solution for all canonical projections in complete analogy to eq. (.) as
[Cxy
Cyx ][ Wx
Wy]=[Cxx
Cyy ][ Wx
Wy]Λ. (.)
where Wx
R
U×k,Wy
R
V×kare the matrices with the respective canonical pro-
jections in their columns and
Λ=
λ. . .
.
.
.....
.
.
. . . λk
(.)
is the diagonal matrix of all canonical correlations. We can now project the data into
the subspace with the highest canonical correlation by multiplying the data with the
jcolumns of Wthat correspond to the jhighest eigenvalues,
X=
j
i=
eiwi
xX= [w
xw
x...wj
x]X=W
xX(.)
Y=
j
i=
eiwi
yY= [w
yw
y. . . wj
y]Y=W
yY. (.)
58 CHAPTER 7. DATA-DRIVEN MULTIMODAL ANALYSIS
Generative Model CCA can be interpreted in a probabilistic framework [Bach
and Jordan, ]. e generative model underlying the canonical directions is
based on a multivariate gaussian latent variable z
R
Ddrawn from a multivari-
ate normal distribution with diagonal covariance matrix Λ
z N(,Λ), Λ
R
D×D. (.)
is is the underlying variable in fig. . which is reflected in the modalities x
R
U
and y
R
V. e conditional probability distributions of xgiven z, and ygiven z,
respectively are given by
x|z N(Wxz,Qx),Wx
R
U×D
y|z N(Wyz,Qy),Wy
R
V×D. (.)
e maximum likelihood solution to eq. (.) can be estimated using expectation
maximization [Bach and Jordan, ]. Extensions to mixtures of non-gaussian
distributions such as Dirichlet distributions have been proposed in [Fyfe and Leen,
]. Although the generative model perspective is helpful in many cases, the so-
lution to eq. (.) involves expectation-maximization across the parameters and is
thus computationally costly for multimodal data. Oen an analysis is not interested
in estimating the full probability density function of the variables. If the goal of an
analysis is merely to decode neural activity it is simpler to find for each modality the
features wx,wythat represent that underlying neural process best. For this task the
algebraic perspective in eq. (.) is sufficient if a fast and efficient implementation
is needed.
Extensions to more than two variables CCA can be extended to more than two
variables, see [Kettenring, ]. In general if there are Nmultivariate variables
and corresponding centered data matrices {X,X,. . . ,XN}, the basis vectors of the
canonical subspace of each variable {W,W,. . . ,WN}can be found by solving the
generalized eigenvalue problem
C . . . CN
C . . . CN
.
.
..
.
.....
.
.
CNCN. . .
W
W
.
.
.
WN
=
C . . .
C . . .
.
.
..
.
....
CNN
W
W
.
.
.
WN
Λ.
(.)
where Cij denotes the covariance matrix between the ith and jth variable.
CCA and mutual information ere is a straightforward relationship between
canonical correlation and mutual information. If the data is completely character-
ized by its first two moments, mean and covariance, the mutual information I(X,Y)
7.1. CANONICAL CORRELATION ANALYSIS (CCA) 59
(see eq. (.)) is given by
I(X,Y) =
i
log (
(λ
i)). (.)
So the mutual information between two modalities X,Ycan be computed by sum-
ming up the diagonal entries of the new cross-covariance matrix Λ aer project-
ing the data on their canonical directions. Proofs can be found in [Borga, ] or
[Cover and omas, ]. is means that, leaving higher order moments aside
and assuming elliptical data distributions of Xand Y, CCA optimizes the mutual
information between two modalities (corresponding to area and in fig. .).
At first glance one might think that optimizing mutual information would miss ex-
clusively unimodal information (areas and ), but include irrelevant features be-
longing to multimodal artifacts of non-neuronal origin (area ). In practice how-
ever, standard preprocessing techniques (proper artifact correction and unimodal
preprocessings such as band power extraction for electrophysiological data) help to
avoid these cases.
Why CCA is better than PCA for multimodal data PCA is oen used in multi-
modal analyses for unimodal feature extraction. It is easy to see in a simple toy data
example why this can lead to wrong conclusions. Figure . illustrates the PCA and
CCA solution on a two dimensional toy data set, i.e. x
R
,y
R
, generated
according to:
X=[γ s(t)
γ εx(t)]Y=[γ εy(t)
γ s(t)](.)
where the underlying common signal s(t)is a sine wave (of arbitrary frequency) and
the measurement noise εx(t),εy(t)for each variable is drawn each from a normal
distribution with mean zero and standard deviation one. e variance of the signal
is controlled by the parameter γ, in our simulations we set γ=. in order to have
alow signal variance in one dimension of X,Yand a high noise variance γ. As
shown in fig. .A/B the generative model in eq. (.) assigns the low variance
signal to the first dimension of Xbut to the second dimension of Y, the respective
other dimension carries the noise. Computing PCA separately on this multimodal
data will assign high weights to the dimension of X,Ythat exhibit most variance
and low weights to those that carry low variance. If the common underlying signal
shas a lower variance than the modality specific noise εx,εy, PCA will consider the
noise to be the most important aspect of the data and the first principal component
will reflect not the common signal but only noise. CCA in contrast will maximize
not the variance within each variable, but rather the correlation between X and Y.
If the noise within each data source is stronger than the common signal in multiple
60 CHAPTER 7. DATA-DRIVEN MULTIMODAL ANALYSIS
modalities CCA clearly yields more sensible results. Figure .C/D show the results
of PCA and CCA on the toy data generated according to eq. (.); while the first
principal component captures only the modality specific noise, the first canonical
component reflects only the common signal s.
Variable X
X2
X1
Variable Y
Y2
Y1
CCA
PCA
X projected on first
component of PCA/CCA
CCA
PCA
Y projected on first
component of PCA/CCA
A
B
C
D
Figure .: Toy data example comparing PCA and CCA for multimodal data anal-
ysis; data was generated according to eq. (.); A: first dimension of Xcontains
high variance noise εxand second dimension the low variance signal s;B: Same
for Y, dimensions for sand εyare flipped compared to X;C: First principal and
canonical component of X; PCA captures only the noise, CCA captures the signal;
D: Same as C for variable Y;
7.2. KERNEL CCA (KCCA) 61
7.2 Kernel CCA (kCCA)
Young man, in mathematics
you dont understand things.
You just get used to them.
—John von Neumann
ere are situations in which the (canonical) correlation coefficient fails to de-
tect dependencies between two modalities. A simple example of non-linear depen-
dencies between two univariate variables is a phase shi as illustrated in fig. .. In
the example we consider univariate variables only for the sake of simplicity. e
variables are generated by
x=sin(t) + .ε,y=cos(t) + .ε(.)
such that the signals are exactly the same but phase shied by π/, as shown in
fig. .A. When plotting xagainst ythe two signals will show zero correlation
(fig. .B). is example was chosen for illustrative purposes only of course there
are many other methods that could be used to extract the non-linear dependencies
in this particular toy data example; e.g. simply correcting for the time lag of π/
would remove the non-linearity in this example. Ingeneral if there are dependencies
between two variables that cannot be captured by linear relationships one can use
the kernel trick [Aizerman et al., , Schölkopf et al., , Müller et al., ]
to extract non-linear dependencies from the data. e idea behind the kernel trick
is to map all data points x
R
U,y
R
Vinto a much higher (possibly infinite-
dimensional) space S
φx:
R
U S,φy:
R
V S (.)
and solve the solution to a learning problem in that space. e trick is that one never
has to compute these mappings explicitly if the objective function can be expressed
in terms of inner products in S, denoted .,.S. All computations can then be done
on the kernels KX,KY, where the ith row and jth column are the inner product be-
tween the ith and jth data point in the kernel feature space
KX,ij =kx(xi,xj) = φx(xi),φx(xj)S, (.)
where the kernel function kx(.,.)computes the inner product in S. is of course
requires Sto have an inner product and a norm. Popular kernel functions that map
to a space Swhich meets these criteria include
Linear kernels k(xi,xj) = x
ixj, (.)
Polynomial kernels k(xi,xj) = (xi,xj)d, (.)
Gaussian kernels k(xi,xj) = e(xixj)/(σ
x).
62 CHAPTER 7. DATA-DRIVEN MULTIMODAL ANALYSIS
In kCCA the original CCA objective in eq. (.) becomes
φ
x,φ
y=argmax
φx,φy
Corr(φx(X),φy(Y)), (.)
where
φx(X) = [φx(x),φx(x),. . . ,φx(xT)](.)
φy(Y) = [φy(y),φy(y),. . . ,φy(yT)]
are the Tdata points mapped into S. e intuition behind kernel CCA is that the
solutions φ
x,φ
yhave to lie in the space spanned by the data. is means that regard-
ing the data points mapped into Sas basis vectors of a subspace in S, the solution
can be expressed as
φ
x=
T
i
φx(xi)αi=φx(X)α, (.)
φ
y=
T
i
φy(yi)βi=φy(Y)β,
where α
R
Tand β
R
Tand the index i {,. . . ,T}runs over all training data
points. e entries αi,βicontain the contribution of each data point ito the solu-
tion φ
x,φ
y. Solving kernel CCA thus means finding the α,βvectors that maximize
eq. (.). is is all that is needed in order to express the solutions φ
x,φ
yin S. Fol-
lowing eq. (.) new samples xi,yican be projected onto their canonical directions
in kernel space by
φx(xi),φ
xS=φx(xi),
T
j
φx(xj)αjS=
j
kx(xi,xj)αj=Kxα(.)
φy(yi),φ
yS=φy(yi),
T
j
φy(yj)βjS=
j
ky(yi,yj)βj=Kyβ,
where αj,βjare the jth entry of α,β, respectively. It turns out that the solution to
kernel CCA, the coefficients αand β, can be obtained just as easily as the solution to
standard CCA in the previous section. Looking at eq. (.) and the original CCA
objective in eq. (.) we see that w
xCxywy=/T w
xXYwyis equivalent to
φx(X),φ
x
Sφy(Y),φ
yS=αKxKyβ. (.)
7.2. KERNEL CCA (KCCA) 63
e Lagrangian in eq. (.) thus becomes
L=αKxKyβ
λx(αK
xα)
λy(βK
yβ). (.)
So the solution of kernel CCA is exactly the same equation except that the canonical
projections wx,wyare replaced by α,βand the covariance matrices Cxy,Cxx,Cyy are
replaced by the respective kernel products KxKy,K
x,K
y. In complete analogy to the
CCA case we can find the solution to eq. (.) by taking the partial derivatives and
setting them to zero. e resulting equations can be rearranged and the coefficients
α= [α,α,. . . ,αT]and β= [β,β,. . . ,βT]can then be obtained analogously to
eq. (.) as solution to the generalized eigenvalue problem
[KXKY
KYKX][ α
β]=ρ[K
X
K
Y][ α
β]. (.)
Statistical consistency of kernel CCA has been shown in [Fukumizu et al., ].
An important advantage of kCCA is the ability to detect non-linear dependencies.
Figure . shows that while the linear correlation coefficient is zero, aer mapping
the data through a gaussian kernel function the data are perfectly correlated.
Non-linear kernel functions offers a powerful means to estimate non-linear de-
pendencies [Gretton et al., ]. But the visualization of the canonical directions in
feature space is oen difficult. Simple toy data examples in two dimensions are easy
to visualize, see fig. .. For most real data sets in which kernel methods are being
used this is more difficult. Especially in the context of high dimensional data such
as fMRI signals oen linear kernels are sufficient. Non-linear kernels do not neces-
sarily lead to an improvement as to the accuracy at which a data set can be predicted
from a stimulus [Bergstrand et al., ]. And more importantly linear kernels offer
the advantage that the solutions to eq. (.) φ
x,φ
ycan be easily visualized in the
input space of the respective modality as a linear expansion of data points:
φ
x=,φ
y=. (.)
where the coefficients αcan be obtained by solving eq. (.). Note that while the
solution to linear kCCA is the same as linear CCA, the computation of this solution
is much more efficient in kernel space as the kernel of the data is only of size L×L,
while the covariance matrices would be of size U×Uand V×V. In the case of fMRI
data this offers a significant computational speed-up.
Regularization Real data has a limited number of samples. Within a given data
set there will be spurious correlations that do not reflect the true coupling dynamics
butrathernoisefluctuations. Whenfittinganymodel givenalimitednumber ofdata
points, it is thus important to control the complexity of the solution. Otherwise the
64 CHAPTER 7. DATA-DRIVEN MULTIMODAL ANALYSIS
results will be overfitted to the training data set and they will not generalize to new
data. Controlling the complexity of the solution is called regularization and can be
implemented by requiring the solutions to the generalized eigenvalue problem α,β
to have a small euclidean norm. is additional constraint can be incorporated in
the standard kCCA constraint by simply adding a ridge of height κx,κyonto the
respective kernel matrix
αK
xα+κxI= (.)
βK
yβ+κyI=, (.)
where Idenotes the identity matrix of the same size as Kx,Ky. Plugging these con-
straints in equation (.) yields a new Lagrangian
L=αKxKyβ
λx(αK
xα+κxI)
λy(βK
yβ+κyI). (.)
Computing the derivative, setting it to zero and rearranging the equations, the so-
lution for regularized kernel CCA can be formulated as:
[KXKY
KYKX][ α
β]=ρ[K
X+κxI
K
Y+κyI][ α
β](.)
So regularizing the kCCA solution can also be achieved by simply adding a ridge on
the denominator of the generalized eigenvalue equation (.). e choice of the
right hyper-parameter is crucial. Typically one optimizes the regularizers κx,κyby
using cross-validation [Lemm et al., ] or permutation based approaches as in
[Bießmann et al., b].
The pre-image problem Equation(.) showsthatnew datapointscanbeeasily
evaluated in kernel space. So one can compute canonical correlations in S. But of-
ten one is also interested in looking at φ
xand φ
y, to see which features kernel CCA
picked to maximize the canonical correlation. Unfortunately the space in which
these solutions are living is potentially infinite dimensional and thus difficult to in-
terpret. Sometimes it can be helpful to find the mapping φ
x,φ
ythat brings φ
x,φ
y
from Sback into the respective input space
R
U,
R
V
φ
x:S
R
U,φ
y:S
R
V(.)
Finding the right φ
x,φ
yis called the pre-image problem. e pre-image of linear
kernels is given by eq. (.). Solving the pre-image problem for the non-linear case
is not as simple. Various methods for visualizing the solution of non-linear methods
in input space have been proposed [Schölkopf et al., , Mika et al., , Kwok
and Tsang, , Bakir et al., ] but these methods do not always yield satisfac-
tory results.
7.2. KERNEL CCA (KCCA) 65
While an exhaustive treatment of kernel methods is beyond the scope of this
thesis, there are two features of kernel methods that are useful for the practitioner
to keep in mind:
. Non-linear problems can become linear in kernel feature space
. A kernel algorithm operates only on similarities between data points
e first point highlights the power of kernel methods but also emphasizes the need
for a reasonable choice of a kernel; non-linear dependencies in the original data
space oen become linear in kernel feature space but not always the solution in
kernel space is easy to visualize and interpret in the original data space. e second
aspect has its positive and negative sides, too: Working on inner products in feature
space means that the computational cost of a kernel based algorithm will scale with
the number of data points and not with the dimensions of the feature space nor
with the dimensions of the input space. In case the dimensionality of a data set is
much larger than the number of data points, using kernel methods can considerably
speed up an algorithm. But if the number of data points is larger than the number
of dimensions, kernel methods can be much slower than equivalent non-kernelized
algorithms.
66 CHAPTER 7. DATA-DRIVEN MULTIMODAL ANALYSIS
−2
0
2
x(t)=sin(t)
Time −2
0
2
y(t)=cos(t)
Time
−1 0 1
−1
−0.5
0
0.5
1
X
Y
Correlation: 0.00
−1 −0.5 0
φx(X) 0 0.5 1
φy(Y)
φx(X)
φy(Y)
Correlation: 1.00
A B
C D
Figure .: Illustration of kernel CCA on toy data generated according to eq.
(.); kernel function was a gaussian kernel with σ=, see also eq. (.); A:
Time series of first modality x(t) = sin(t); for each value of Xthe corresponding
value φx(X)in the kernel feature space is plotted to the right of the time series plot;
B: Same as A for Y=cos(t);C: e standard linear canonical correlation coeffi-
cient between Xand Yis zero; D: First canonical component of Xand Yobtained
by kCCA plotted against each other; in kernel space, the two signals are perfectly
correlated;
7.3. TEMPORAL KCCA (TKCCA) 67
7.3 Temporal kCCA (tkCCA)
It’s just a temporal embedding?
I cant believe nobody tried this before.
—Nicole Krämer
In CCA the canonical correlation, see eq. (.), between two signals is maximized.
When the two signals are not instantaneously correlated but with some time lag τ
then classical CCA will fail to find sensible canonical directions. e toy data ex-
ample in figure . shows such a scenario. Non-linear kernel CCA could find the
dependency between the variables nonetheless. But oen there is no need to use
non-linear kernel functions. In the toy data example the data is actually linearly de-
pendent: If one knew the correct time lag of π/ one could shi the cosine signal
back in time and the signals would be perfectly correlated. In general when multiple
modalities are not coupled instantaneously the dependencies can be described as a
convolution. Neurovascular coupling mechanisms for instance can not be modeled
by a mere phase shi between neural and hemodynamic signal, as in the case of
the toy data example of kCCA. e hemodynamic response to a neural impulse in
a given neural population is a spatiotemporal pattern in voxel space. If one knows
this pattern, one can use it to deconvolve the voxel time series and obtain an esti-
mate of the underlying neural signal. Temporal kernel canonical correlation analysis
(tkCCA) [Bießmann et al., b] computes this convolution from high dimen-
sional multimodal data streams and thereby compensates for non-instantaneous
coupling between modalities.
Derivation e idea behind tkCCA is simple: As the correlations between Xand
Yare not instantaneous, the canonical correlation coefficient will be small for simul-
taneous samples. By shiing one data source back and forth in time (relative to the
other data source) by some relative time lag τ= [T,T+,. . . , , ...,T, T], the
correlation will become larger for the correct time lag. Ideally we would like to have
a time lag dependent weighting wx(τ)
R
Ufor one data source (the one shied
in time) that can compensate for delays or convolutions between data sources. e
canonical convolution wx(τ)should be such that there are high weights for those
time lags and features that maximize the correlation with the time series in Y. If
wx(τ)is estimated correctly the convolution of Xwith wx(τ)will compensate for
data-source specific differences in temporal dynamics. In the case of the neurovas-
cular coupling example the canonical convolution wx(τ)will lowpass filter the neu-
ral activity and bring the signal in temporal alignment with the hemodynamic ac-
tivity [Bießmann et al., b, Murayama et al., ]. In tkCCA the time resolved
68 CHAPTER 7. DATA-DRIVEN MULTIMODAL ANALYSIS
canonical convolution wx(τ)and the canonical direction wyare optimized such that:
argmax
wx(τ),wy
τ(wx(τ)Xτ)Ywy
τ(wx(τ)XτX
τwx(τ))w
yYYwy
(.)
where Xτdenotes the data in Xshied by a time lag τand wx(τ)is the canonical
convolution of Xat time lag τ. For τ= equation (.) reduces to the standard
CCA, as in eq. (.). In general tkCCA not only optimizes the canonical correlation
for simultaneous samples, but over a short time window specified by the range of τ.
To maximize eq. (.), we embed the data set Xinto a higher dimensional space
where the additional dimensions are occupied by copies of the original data, each
shied in time by a different τ:
˜
X=
Xτ=T
.
.
.
Xτ=T
R
M(T+)×L. (.)
By using the same time-delay embedding for the wx(τ),
˜
wx=
wx(τ=T)
.
.
.
wx(τ= +T)
R
M(T+)×(.)
we reduce the optimization problem in equation (.) to an ordinary CCA problem
in the embedded space that can be solved as in eq. (.) by maximizing the CCA
objective using the embedded data in ˜
X:
argmax
˜
wx,wy
˜
w
x˜
XYwy
˜
w
x˜
X˜
X˜
wxw
yYYwy
. (.)
e tkCCA solution is thus the CCA solution in the embedded space. Using linear
kernel CCA we can express the τ-dependent filter wx(τ)as an expansion of the data
points in ˜
Xand the respective other canonical projection wyas an expansion of the
data points in Yin complete analogy to equation (.)
˜
wx=
wx(τ=T)
.
.
.
wx(τ=T)
=˜
,
wy=. (.)
Or, remembering the structure of ˜
Xin eq. (.), the delay-dependent canonical
convolution wx(τ), can be recovered equivalently as expansion of the αvector using
the respective parts of ˜
X:
wx(τ) = Xτα. (.)
7.3. TEMPORAL KCCA (TKCCA) 69
Solving tkCCA thus requires to find the optimal dual coefficients αand β. Including
a regularizing ridge κxIand κyIfor each modality, the tkCCA solution αand βcan
be obtained by solving the generalized eigenvalue problem
[K˜
XKY
KYK˜
X][ α
β]=ρ[K
˜
X+κxI
K
Y+κyI][ α
β], (.)
where K˜
X=˜
X˜
Xis the linear kernel of the temporally embedded data matrix.
For non-linear kernel functions of course K˜
Xand KYare computed according to the
kernel function chosen. Note that the kernel computed from the embedded data ˜
Xis
ofthesamesizeasthekernelKX. usaerthekernel ˜
KXhasbeencomputedtkCCA
is computationally not more demanding than normal kCCA. Another advantage of
tkCCA is that cross-validation procedures or bootstrap re-samplings can operate
on the already computed kernel, which can speed up the re-samplings considerably.
Since tkCCA just leads to a single eigenvalue equation of the type given byeq. (.),
neither the canonical correlation ρnor the corresponding vectors αand βdepend
on τ. Nonetheless tkCCA can reveal the temporal dynamics between the data in
their respective canonical subspace. In analogy to a standard cross-correlogram for
univariate time series, we can compute a time lag dependent canonical correlogram
ρ(τ)by
ρ(τ) = wx(τ)Xτ,w
yY
wx(τ)XτX
τwx(τ)w
yYYwy
(.)
=αKτKYβ
αK
τα·βK
Yβ
, (.)
where Kτ=kx(Xτ,Xτ)is the kernel computed from Xτ, so in the case of linear ker-
nels Kτ=X
τXτ. e canonical correlogram ρ(τ)reflects the temporal dynamics in
kernel feature space Sbetween the two variables and can be considered as a multi-
variate generalization of the univariate cross-correlogram. An example is shown in
figure .D. It is worth mentioning that the canonical correlogram is different from
a usual correlogram between two one-dimensional time signals in that it is always
positive ². is is because CCA is ignorant with respect to the sign of the correlation;
possible negative correlations between the signals are encoded in the relative sign of
the canonical directions. So, to interpret the results of tkCCA, one always has to take
into account both the canonical correlogram ρ(τ)and the corresponding canonical
directions wx(τ)and wy. e temporal resolution of the correlogram is determined
by the spacing of the τ-values, in particular it can be higher than the temporal res-
olution of the signals in Y. Another important aspect is that while the shape of the
²Sincetheρ(τ)arenotdirectlycalculatedbykCCA,butbyeq.(.),thisisnotstrictlytrue,i.e.only
up to noise fluctuations.
70 CHAPTER 7. DATA-DRIVEN MULTIMODAL ANALYSIS
canonical directions in kernel feature space might be difficult to visualize, the tem-
poral dynamics as reflected in ρ(τ)can easily be computed in kernel feature space
using eq. .. In case the data is high dimensional this can offer a considerable
speedup of computations, if only the temporal dynamics of correlations (in kernel
feature space) are of interest. As tkCCA can be cast as a standard kCCA problem,
the statistical consistency shown in [Fukumizu et al., ] also applies to tkCCA.
0 2 4 6 8
−2
−1
0
1
2
Time
x(t)=sin(t)
Before tkCCA
After tkCCA
−2 0 2
−10
−5
0
5
10
τ
wx(τ)
Convolution
0 2 4 6 8
−2
−1
0
1
2
y(t)=cos(t)
Time −2 0 2
0
0.5
1
τ
ρ(τ)
Correlogram
−1 0 1
−1
−0.5
0
0.5
1
x(t)
y(t)
Correlation: 0.00
−1 0 1
−5
0
5
wx(τ) x(t)
wy
T y(t)
Correlation: 1.00
F
A
B
EC
D
Figure .: Toy data example for tkCCA on same data as in previous chapter; A:
First modality x(t) = sin(t)(in blue) and first canonical component obtained by
tkCCA (in purple); B: Second modality y(t) = cos(t);C: Canonical convolution
wx(τ);D: canonical correlogram ρ(τ);E: Raw signals x(t),y(t)are not correlated;
F: Aer convolving the signal x(t)with wx(τ), the resulting signal x(t)wx(τ)is
perfectly (anti-)correlated with y(t); the convolution with wx(τ)aligned the two
signals (see red time series in Band purple time series in A), but as CCA is inher-
ently blind to the sign of the correlation, the canonical correlation is negative;
7.4 Overview and comparison
ischapterintroduced adatadrivenapproachtomultimodalanalysisstarting from
CCA over kernel CCA to temporal kernel CCA. In order to differentiate these meth-
ods it is helpful to remember to which kind of data they are applicable. Table . lists
7.4. OVERVIEW AND COMPARISON 71
Dimensionality Dependencies Appropriate Method
CCA kCCA tkCCA
low instantaneous linear
high instantaneous linear ×
low/high instantaneous non-linear ×
low/high non-instantaneous non-linear × ×
Table .: Comparison of data-driven multimodal methods;
the different kinds of dependencies exhibited between modalities and the appropri-
ate CCA method.
If the data is low dimensional and has instantaneous linear dependencies, stan-
dard CCA will find the correct solution. e computational cost of standard CCA is
quadratic in the number of dimensions, while kernel CCA has quadratic cost with
respect to the number of data points. Most computers nowadays can deal with co-
variance matrices up to a few thousand dimensions fMRI data can easily have a
few ten thousand dimensions but the number of samples rarely exceeds a few hun-
dred. Here, linear kernel CCA offers a more robust and much faster way of comput-
ing canonical projections compared to standard covariance matrix based methods.
Next to the computational speed-up kernel CCA can detect non-linear dependen-
cies whereas standard CCA is restricted to linear dependencies. If the data is high
dimensional or exhibits non-linear dependencies, kernel CCA is better than stan-
dard CCA.
Most multimodal neuroimaging data has non-instantaneous couplings simul-
taneously recorded samples are not correlated. Importantly the coupling relation-
ship is not simply a phase shi, as in our toy data example. In this case, kernel
CCA will fail to find the optimally correlated (between modalities) representation
for each modality. Temporal kernel CCA will find the dependency structure for
non-instantaneously coupled data. In principle, standard CCA could also be used
for estimating spatiotemporal neurovascular coupling dynamics. But if one wants to
estimate a multivariate convolution for fMRI data, the estimation problem increases
by a factor of T, the number of time lags; for a ten thousand dimensional fMRI data
set with a temporal resolution of Hz a convolution of length  seconds thus re-
quires to solve a , dimensional problem. In this scenario, when we need to
account for convolutions between high dimensional data streams, tkCCA has a clear
advantage. Moreover using appropriate kernels, also non-linear dependencies can
be captured by tkCCA. is enables tkCCA to extract causal relationships between
multivariate and non-linearly coupled time series [Wu et al., ].
72 CHAPTER 7. DATA-DRIVEN MULTIMODAL ANALYSIS
Chapter 8
Model-free analysis of
neurovascular coupling
Ithe previouschapterapurelydatadrivenapproachtomultimodal neuroimaging
analysis, termed temporal kernel canonical correlation analysis (tkCCA), was
developed. e advantage of tkCCA is that it does not require the usual restrictive
assumptions about neurovascular coupling mechanisms. Most classical analysis ap-
proaches to fMRI data assume that every voxel reflects neural activity with the same
temporal dynamics typically modeled as a canonical HRF, see fig. .. In contrast
tkCCA can account for non-separable coupling dynamics, that is multivariate pat-
ternsin voxelspacethanchangeacrosstime lags. is chapterwill showapplications
of tkCCA to model-free estimation of neurovascular coupling parameters. A special
focus will be placed on the spatiotemporal dynamics of the hemodynamic response,
in particular spatiotemporally non-separable coupling dynamics. Using data from
simultaneous recordings of intracranial electrophysiological and hemodynamic ac-
tivity we applied tkCCA to estimate high dimensional spatiotemporal deconvolu-
tions that predict neurophysiological spectrograms from voxel volume time series
around the recording electrode. e scientific question that can now be addressed
is whether spatiotemporally non-separable dynamics of the BOLD response con-
tain neural information. If this is the case then tkCCA will predict neural activity
better than standard methods, which are based on the spatiotemporal separability
assumption. is chapter is structured as follows: First section . introduces the
data model used for the simulations. en section . presents the analysis settings
in which tkCCA was applied. Section . describes how the different classical anal-
ysis models have been computed for comparison with tkCCA. ereaer section
. shows applications of tkCCA on artificial data and comparisons with standard
methods. Section . summarizes the artifact cleaning procedure for simultaneous
electrophysiology and fMRI data that was implemented to run online during an ex-
perimental session. Finally section . will present applications of tkCCA to real
data and comparisons with standard methods.
74 CHAPTER 8. MODEL-FREE ANALYSIS OF NEUROVASCULAR COUPLING
Functional Form Matrix Form Meaning
z(t)z
R
×TEffective neural activity
x(f,t)X
R
F×TNeural band power
(Ffrequency bands, Ttime points)
y(s,t)Y
R
S×TVectorized fMRI signal
(Svoxels, Ttime points)
εGaussian i.i.d. noise N(,)
Table .: Nomenclature: Neural spectrograms x(f,t) = X
R
F×Tare measured
at time t {t,t,. . . ,tT}and in frequency bin f. Hemodynamic signals at each
time point tand at each location in voxel space s {s,s,. . . ,sS}is given by the
voxel activation y(s,t) = Y
R
S×T.
8.1 Asimplegenerativemodel forneurovascularcoupling
A simple generative model for multimodal neuroimaging data measured at a cer-
tain location sin the brain is depicted in figure . and can be described as fol-
lows: Brain activity z(t)underlying neurophysiological measurements as well as the
BOLD signal is reflected instantaneously in neural spectrograms with a frequency
band specific weighting α(f)plus some measurement noise
x(f,t) = α(f)z(t) + εx. (.)
Since not all aspects of neural signals are reflected in the BOLD signal we will refer
to z(t)as effective neural activity, meaning that part of neural activity that is reflected
in both hemodynamic and neural signals. e intuition behind the frequency de-
pendency α(f)is that some frequency bands are more correlated to the BOLD signal
than others [Logothetis et al., ]. e spatiotemporal dynamics of the hemody-
namic response can be modeled as a linear filter H(s,τ):
ˆ
y(s,t) =
τ,s
H(s,τ)z(t+τ) + εy. (.)
e parameter τ {,, . . . ,τmax}denotes the time lag of the hemodynamic re-
sponse relative to a neural event at time point t. When using this model in the fol-
lowing we will make, without loss of generality, three simplifying assumptions:
Band power representation of neural activity Neural activity x(f,t)is not mod-
eled in its raw form as voltage uctuations. Instead we consider only the slower dy-
namics of amplitude uctuations of oscillations within a certain frequency range.
e summed variance within a frequency band reflects the activity and thus energy
demands within a neural population. e exact shape of membrane potential fluc-
tuations can be important e.g. for spike sorting when looking for single cell activity.
8.1. A SIMPLE GENERATIVE MODEL FOR NEUROVASCULAR COUPLING 75
But band power modulations are closer to the neural mass-activity on the scale of a
voxel and preserve most of the information in the neural signal that is relevant for
modeling the BOLD signal.
Neurovascular coupling modeled as linear lter e motivation for this model
assumption was twofold: Using linear filters one can deal with high-dimensional
data efficiently while being able to visualize the filters easily. e framework pre-
sented is however not restricted to the linear case. In general one might use arbi-
trary non-linear kernel functions for tkCCA, the interpretations of the results would
however be more difficult. A more physiologically motivated reason to not use non-
linear kernel functions is that the temporal dynamics of the hemodynamic response
are well captured by a linear filter [Martindale et al., ].
in bands below 24 Hz (δ/θ,α,βbands) makes smaller
contributions to the BOLD signal changes. The estimated
optimal spatial filter (Fig. 2B), on the other hand, shows
that voxels within the gray matter (red stripe) of V1 make a
larger contribution to the maximized correlation between
the neural and fMRI signals. In the present study the γand
MUA contributions were greater than those of spiking
activity, which commonly represents a small group of cells.
The strong contribution of the γand MUA bands was
observed in most cases (see also Fig. 3A), and it confirms
previous results showing that the highest correlation
between the neural and the BOLD responses occurred in
the 30- to 140-Hz range of the neural signal in intracranial
recordings in the anesthetized monkey [11] and in human
patients [18,19].
In order to compare tkCCA with conventional correlation
analysis, we estimated the canonical correlogram as defined
in Eq. (3). This canonical correlogram is similar to the
univariate cross-correlogram,exceptthatitisstrictly
positive. However, the right sign of correlation can be
identified by looking at the signs of the projection
coefficients (see Methods section). Much as with the
correlogram computed in mass-univariate approaches, the
peak of the correlogram was found at a lag of about 5 s,
suggesting that, at least in area V1, the time-to-peak of the
BOLD response occurs 5 s after the onset of neural activity.
Fig. 1. Recording hardware, activation maps and analysis principles for combined physiology-fMRI experiments. (A) The upper column shows anatomical
images and the multichannel electrode used in this study. Blue arrows represent positions of recording sites on the electrode; 7 out of 10 channels are visible. The
lower column shows the BOLD responses (Pb.001) tested by means of a full-field visual stimulus. Note the minimal susceptibility artifacts caused by the
electrode; no distortions are evident in the activation map. (B) Spontaneous changes in the amplitude of different frequency bands. Depicted are the bands δ/θ,α,
β,γL, γ,γH, γVH, MUA (multi-unit activity) with 18, 812, 1224, 2440, 4060, 60100, 120250 and 10003000 Hz, respectively. Spikes were
extracted by detecting zero crossings and thresholding the high-pass signal. Arrows show examples of synchronous γand spiking activation. The entire
frequencytime matrix was used for tkCCA. (C) Spontaneous changes in the BOLD signal in the regions of V1, defined on the basis of anatomical criteria in each
slice. The upper trace shows the average time course of BOLD fluctuations from all voxels seen below.
5Y. Murayama et al. / Magnetic Resonance Imaging xx (2010) xxxxxx
ARTICLE IN PRESS
in bands below 24 Hz (δ/θ,α,βbands) makes smaller
contributions to the BOLD signal changes. The estimated
optimal spatial filter (Fig. 2B), on the other hand, shows
that voxels within the gray matter (red stripe) of V1 make a
larger contribution to the maximized correlation between
the neural and fMRI signals. In the present study the γand
MUA contributions were greater than those of spiking
activity, which commonly represents a small group of cells.
The strong contribution of the γand MUA bands was
observed in most cases (see also Fig. 3A), and it confirms
previous results showing that the highest correlation
between the neural and the BOLD responses occurred in
the 30- to 140-Hz range of the neural signal in intracranial
recordings in the anesthetized monkey [11] and in human
patients [18,19].
In order to compare tkCCA with conventional correlation
analysis, we estimated the canonical correlogram as defined
in Eq. (3). This canonical correlogram is similar to the
univariate cross-correlogram,exceptthatitisstrictly
positive. However, the right sign of correlation can be
identified by looking at the signs of the projection
coefficients (see Methods section). Much as with the
correlogram computed in mass-univariate approaches, the
peak of the correlogram was found at a lag of about 5 s,
suggesting that, at least in area V1, the time-to-peak of the
BOLD response occurs 5 s after the onset of neural activity.
Fig. 1. Recording hardware, activation maps and analysis principles for combined physiology-fMRI experiments. (A) The upper column shows anatomical
images and the multichannel electrode used in this study. Blue arrows represent positions of recording sites on the electrode; 7 out of 10 channels are visible. The
lower column shows the BOLD responses (Pb.001) tested by means of a full-field visual stimulus. Note the minimal susceptibility artifacts caused by the
electrode; no distortions are evident in the activation map. (B) Spontaneous changes in the amplitude of different frequency bands. Depicted are the bands δ/θ,α,
β,γL, γ,γH, γVH, MUA (multi-unit activity) with 18, 812, 1224, 2440, 4060, 60100, 120250 and 10003000 Hz, respectively. Spikes were
extracted by detecting zero crossings and thresholding the high-pass signal. Arrows show examples of synchronous γand spiking activation. The entire
frequencytime matrix was used for tkCCA. (C) Spontaneous changes in the BOLD signal in the regions of V1, defined on the basis of anatomical criteria in each
slice. The upper trace shows the average time course of BOLD fluctuations from all voxels seen below.
5Y. Murayama et al. / Magnetic Resonance Imaging xx (2010) xxxxxx
ARTICLE IN PRESS
X
Z
Y
!" #"" #!" $"" $!"
#"""!%&"""
%#$"!%#"""
%%'"!%%#$"
%%("!%%%'"
%%$(!%%%("
%%#$!%%%$(
%%%)!%%%#$
*+,-%./0
1234%.560
Neural Activity
Intracranial
Electrophysiology
in bands below 24 Hz (δ/θ,α,βbands) makes smaller
contributions to the BOLD signal changes. The estimated
optimal spatial filter (Fig. 2B), on the other hand, shows
that voxels within the gray matter (red stripe) of V1 make a
larger contribution to the maximized correlation between
the neural and fMRI signals. In the present study the γand
MUA contributions were greater than those of spiking
activity, which commonly represents a small group of cells.
The strong contribution of the γand MUA bands was
observed in most cases (see also Fig. 3A), and it confirms
previous results showing that the highest correlation
between the neural and the BOLD responses occurred in
the 30- to 140-Hz range of the neural signal in intracranial
recordings in the anesthetized monkey [11] and in human
patients [18,19].
In order to compare tkCCA with conventional correlation
analysis, we estimated the canonical correlogram as defined
in Eq. (3). This canonical correlogram is similar to the
univariate cross-correlogram,exceptthatitisstrictly
positive. However, the right sign of correlation can be
identified by looking at the signs of the projection
coefficients (see Methods section). Much as with the
correlogram computed in mass-univariate approaches, the
peak of the correlogram was found at a lag of about 5 s,
suggesting that, at least in area V1, the time-to-peak of the
BOLD response occurs 5 s after the onset of neural activity.
Fig. 1. Recording hardware, activation maps and analysis principles for combined physiology-fMRI experiments. (A) The upper column shows anatomical
images and the multichannel electrode used in this study. Blue arrows represent positions of recording sites on the electrode; 7 out of 10 channels are visible. The
lower column shows the BOLD responses (Pb.001) tested by means of a full-field visual stimulus. Note the minimal susceptibility artifacts caused by the
electrode; no distortions are evident in the activation map. (B) Spontaneous changes in the amplitude of different frequency bands. Depicted are the bands δ/θ,α,
β,γL, γ,γH, γVH, MUA (multi-unit activity) with 18, 812, 1224, 2440, 4060, 60100, 120250 and 10003000 Hz, respectively. Spikes were
extracted by detecting zero crossings and thresholding the high-pass signal. Arrows show examples of synchronous γand spiking activation. The entire
frequencytime matrix was used for tkCCA. (C) Spontaneous changes in the BOLD signal in the regions of V1, defined on the basis of anatomical criteria in each
slice. The upper trace shows the average time course of BOLD fluctuations from all voxels seen below.
5Y. Murayama et al. / Magnetic Resonance Imaging xx (2010) xxxxxx
ARTICLE IN PRESS
High resolution
fMRI signals
Time [s]
Electrode
Figure .: Illustration of the data analyzed in this chapter, for experimental de-
tailssee appendixBor[Murayamaetal.,]; neural activityZisinstantaneously
reflected in electrophysiological measurements X(le); a convenient representa-
tion of electrophysiological signals are amplitude uctuations of oscillations in
different frequency bands; note the band specific sensitivity to visual stimulation,
high frequencies of neural activation are strongly modulated by the visual stimu-
lus; time series of fMRI images Y(right) were recorded in the entire primary and
parts of secondary visual cortices;
76 CHAPTER 8. MODEL-FREE ANALYSIS OF NEUROVASCULAR COUPLING
Same temporal resolution across modalities In the examples presented in the
following the modalities are assumed to have the same temporal resolution. In gen-
eral the temporal resolution of tkCCA depends only on the temporal resolution of
the faster sampled signal. One can use a sliding window approach as in [Bießmann
et al., b] to compute canonical convolutions with a temporal resolution higher
than that of the fMRI image acquisition.
8.2 Predicting neural activity from fMRI and vice versa
TkCCA computes for each modality a new representation such that the two modal-
ities are maximally correlated. As the modalities of interest, electrophysiological os-
cillations and fMRI signals, are coupled via complex neurovascular coupling mech-
anisms, the temporal coupling dynamics have to be accounted for by equipping one
canonical direction with a temporal dimension. When applying tkCCA to simul-
taneous recordings of neural and hemodynamic activity one has to decide which
modality should be temporally embedded, see eq. (.); for this modality tkCCA
yields a time resolved canonical direction. As we can only use relative time lags
between sources only one canonical direction at a time is allowed to have a tem-
poral resolution. Both approaches temporally align the two signals, but the choice
of the modality to be embedded determines what conclusions can be drawn from
the tkCCA result. For the two modalities in our data set there are two options. If
one embeds the electrophysiological band power signal x(f,t), tkCCA computes
a canonical convolution wx(τ)
R
F×τmax and a canonical projection wy
R
S.
is allows for investigating the temporal coupling dynamics of different frequency
bands of neural signals with static voxel patterns. e other option is to embed
the fMRI signal y(s,t)such that tkCCA computes a spatiotemporal deconvolution
wy(τ)
R
S×τmax and a frequency band projection wx
R
F. is allows for ana-
lyzing how the spatiotemporal coupling dynamics of the hemodynamic response in
voxel space is related to a static neural spectrogram pattern. Both analysis settings
are illustrated in figure .:
APredicting voxel patterns from time windows of neural spectrograms
In the first setting (see fig. .A) an estimate ze(t)of effective neural activity is
obtained by convolving the electrophysiological spectrograms x(f,tτ)with
the canonical convolution wx(τ). Along with the convolution wx(τ)tkCCA
estimates a canonical direction wy
R
Fin voxel space. Projecting the fMRI
signal onto its canonical direction wyyields an estimate of effective neural
activity zf(t)based on the fMRI time series y(s,t)
ze(t) =
τ
wx(τ)x(f,tτ)(.)
zf(t) =w
yy(s,t).
8.2. ANALYSIS SETTINGS 77
e correlation between ze(t)and zf(t)is the canonical correlation. e time
window used for the convolution is indicated by a grey frame in fig. .A,
the time point which is predicted is indicated as a grey vertical line at t=.
Here the temporal dynamics of neurovascular coupling modeled as H(s,t)in
eq. (.) are reflected in the temporal structure of wx(τ)while the spatial dy-
namics are expressed in the canonical projection wy. e spatial activation
pattern wyin voxel space is invariant over time; an advantage of this setting
is that one can determine the temporal dynamics of neurovascular coupling
with respect to different frequencies of neural oscillations. e convolution
wx(τ)effectively aligns the neural spectrograms with the fMRI time course,
similar to a convolution of a stimulus with a standard canonical HRF (see also
fig. .). Examples of wx(τ)estimated on real data are depicted in figure ..
BPredicting neural signals from time windows of fMRI time series
Hemodynamic activity at time points t+τcan be used to predict neural ac-
tivity at time point t(see fig. .B):
ze(t) =w
xx(f,t)(.)
zf(t) =
τ
wy(τ)y(s,t+τ).
Here the hemodynamic response pattern wy(τ)reflects the inverse of the cou-
pling dynamics H(s,t)modeled in the data generating model in eq. (.); wx
reflects the frequency dependency pattern α(f). As this setting gives insights
into the spatiotemporal dynamics of the hemodynamic response a special fo-
cus will be placed on this approach in the following application examples.
e choice of the time lags τused for the embedding is important: Neural signals
are assumed to be the source of fMRI signals. us it would make little sense to use
electrophysiological signals at t+τfor predicting fMRI signals at time tunless there
is some periodicity in both signals, e.g. due to a stimulus. Similarly it would not
make sense to predict neural activity at time tfrom fMRI signals at time points tτ.
e physiologically most sensible choice of time lags is depicted in figure .. When
predicting fMRI signals at time point tit is reasonable¹ to take into account neural
signals at tτwhere τ<s. Analogously when predicting neural signals at time
point tfrom fMRI data it is reasonable to include data at time lags t+τ,τ<s.
¹e hemodynamic response function decays within approximately s [Martindale et al., ]
78 CHAPTER 8. MODEL-FREE ANALYSIS OF NEUROVASCULAR COUPLING
X
Z
YX
Z
Y
A B

ćJT DIBQUFS XJMM TVNNBSJ[F UIF BQQMJDBUJPOT PG UL$$" JO UIF DPOUFYU PG NVMUJ
NPEBM OFVSPJNBHJOH EBUB BOBMZTJT 8F IBWF XZ(Ѭ)XY(Ѭ)

ćJT DIBQUFS XJMM TVNNBSJ[F UIF BQQMJDBUJPOT PG UL$$" JO UIF DPOUFYU PG NVMUJ
NPEBM OFVSPJNBHJOH EBUB BOBMZTJT 8F IBWF XZ(Ѭ)XY(Ѭ)
25 15 5 5 15 25
4
3
2
1
x(t)
25 15 5 5 15 25
5
4
3
2
1
y(t)
Time [a.u.]
wy
Ty(t)
x(t) wx(τ)
25 15 5 5 15 25
4
3
2
1
x(t)
25 15 5 5 15 25
5
4
3
2
1
y(t)
Time [a.u.]
y(t) wy(τ)
wx
Tx(t)
Figure .: Illustration of two approaches to model-free neurovascular coupling
analysis using tkCCA; A: Time-frequency convolutions wx(τ)of neural signals at
time points tτ(grey area) can predict a voxel pattern wyat time point t(grey ver-
tical line); B: Spatiotemporal deconvolutions wy(τ)of fMRI signals at time points
t+τ(grey area) predict spectrograms of neural activity at time point t(grey ver-
tical line);
8.3. COMPARISONS WITH OTHER METHODS 79
8.3 Comparisons with other methods
We compared tkCCA with standard methods introduced in chapter in simula-
tions and real data experiments. In the comparisons we consider here only the set-
ting in fig. .B and predict neural activity from fMRI signals. For each method m
we compare the tkCCA deconvolution w
y(τ)with a spatiotemporal deconvolution
wm
y(τ)constructed according to the assumptions underlying that analysis method,
see fig. .. Almost² all standard methods assume spatiotemporal separability of
the hemodynamic response. According to this assumption the spatiotemporal de-
convolutions wm
y(τ)
R
S×Nτcan be written as the product of two independent
functions wm
y
R
S×and wτ
R
×Nτof space and time, respectively,
wm
y(τ) = wm
ywτ. (.)
e voxel pattern wm
ydescribes the spatial dynamics of the hemodynamic response
H(s,τ)and the temporal dynamics are captured by wτ. In agreement with the as-
sumption of a canonical HRF the temporal neurovascular coupling dynamics wτ
were the same in all models. All separable models were constructed as follows: An
optimal temporal filter wτand multivariate voxel pattern wmv
ywere obtained from
the non-separable estimate w
y(τ)using singular value decomposition (SVD), see
e.g. [Strang, ]. Examples of wτobtained from real data are plotted in fig. .A.
e temporal filter wτwas used for all separable models, the spatial filters were esti-
mated for each model separately. For doing so, we first create a univariate regressor
ze(t)of neural activity from the multivariate spectrograms as described in eq. (.).
e frequency combination wxused was the optimal one computed by tkCCA. is
univariate regressor was then convolved with the optimal temporal filter wτto ac-
count for the temporal coupling dynamics. e different spatial dynamics wm
ywere
estimated as follows:
. Mass-univariate unimodal methods (see section ., eq. (.))
e mass-univariate spatial filter wmu
yweights each voxel at position sby a
factor wmu
ys according to its correlation with the neural regressor ze(t):
wmu
ys =Corr(ze(t),y(s,t)) (.)
. Multivariate unimodal methods (see section ., eq. (.))
emultivariatespatialfilterwmv
yisthe linearcombinationofvoxelsthatmax-
imizes the correlation between fMRI signals and neural regressor ze(t):
wmv
y=argmax
wy
Corr(ze(t),w
yy(s,t))(.)
where wmv
yis the spatial factor obtained by a SVD of w
y(τ).
²Exceptions are e.g. [Aguirre et al., , Lu et al., , Mourão-Miranda et al., ]
80 CHAPTER 8. MODEL-FREE ANALYSIS OF NEUROVASCULAR COUPLING
. Unsupervised unimodal methods (see section .)
e PCA spatial lter is estimated according to the PCA objective in eq. (.):
wpca
y=argmax
wy
Var (w
yy(s,t))(.)
Note that for the estimation of wpca
ythe electrophysiological activity is not
taken into account.
. Spatial average model
As a control condition we computed the spatially averaged fMRI signal in the
spherical ROI mask around the recording electrode:
wiso
y(s,τ) =
Swτ(.)
where Sdenotes the number of voxels within the spherical ROI. e decon-
volution wiso
y(τ)is equivalent to the spatial smoothing oen performed as a
preprocessing step in fMRI analyses.
Neural
Signal
fMRI
Signal
HRF
Time
HRF
Time Time
Data
Model
(Assumed)
Prediction
Model
(Learned)
HRF
Time
Prediction
Correlation Canonical
Correlation
τ=2
τ=1
τ=0
τ=2
τ=1
τ=0
τ=2
τ=1
τ=0
τ=2
τ=1
τ=0
Isotropic Smoothing
Mass-univariate
Multivariate Separable
tkCCA
Accounts for
Isotropic ROIs
Single voxels
Voxel patterns
Time-Voxel patterns
Ignores Statistical
Dependencies
Between Voxels
YES
YES
NO
NO
Assumes
Spatiotemporal
Separability of HRF
YES
YES
YES
NO
Figure .: Comparison of classical methods and tkCCA; in red a neural im-
pulse and corresponding fMRI signal in voxels at τ=,, samples aer the neu-
ral event; in space-time separable models a neural regressor is convolved with a
canonical HRF and combined with a static spatial activation pattern the spa-
tial patterns are invariant over time and the HRF identical for all voxels; spatial
smoothing weights voxels equally; mass-univariate approaches assume statistical
independence of voxels and analyze single voxel time series separately; multivari-
ate separable models predict neural signals from multivariate voxel patterns; in
contrast to space-time separable models tkCCA estimates a non-separable spa-
tiotemporal hemodynamic activation pattern.
8.4. APPLICATIONS OF TKCCA TO SIMULATED DATA 81
8.4 Applications of tkCCA to simulated data
e following simulations illustrate tkCCA and how it compares to standard meth-
ods on artificial data. In later chapters tkCCA is used to estimate the spatiotem-
poral dynamics of hemodynamic activation, according to the setting illustrated in
fig. .B. Also the simulations will be based on this setting. e simulated data con-
tains two different time series designed to match the main characteristics of real
multimodal neuroimaging data: Variable x(f,t)represents the neural band power
in different frequency bands fand time bins tat an electrode position s. Variable
y(s,t)denotes the fMRI signal at location sin a two dimensional image patch. For
the sake of simplicity we assume only univariate hidden variables representing a
neural process z(t)which is reflected in both modalities. Following the graphical
model depicted in fig. . the data is generated from this effective neural activity
z(t), here simply gaussian noise³, drawn from N(, ). From this hidden effective
neural activity z(t)electrophysiological measurements x(f,t)are generated by
x(f,t) = (γ)α(f)z(t) + γ εx(.)
and hemodynamic measurements y(s,t)are generated by
y(s,t) = (γ)
τ
H(s,τ)z(tτ) + γ εy. (.)
Note that the noise εx,εyis independently drawn but the noise amplitude was con-
trolled by the same parameter γ. e hemodynamic response function H(s,τ)was
constructed such that its spatiotemporal dynamics were non-separable in space and
time. is is physiologically plausible given the complexity of cortical microvas-
culature (see fig. .) and a variety of factors influencing the BOLD response it is
unlikely that neurovascular coupling mechanisms are spatiotemporally separable.
Experimental evidence for non-separable hemodynamic response patterns can be
found in e.g. [Berwick et al., ]. A simple non-separable hemodynamic response
was constructed as the sum of two separable functions, one with a fast transient pos-
itive response in an +-shaped voxel pattern and another one reflecting a delayed
undershoot in a ×-shaped spatial pattern
H(s,τ) = h+(s)hI(τ) + h×(s)hU(τ)(.)
resulting in the dynamics presented in figure ..
³We also tested the effect of periodic signals in z(t), such as stimulus induced variations during
block stimulation, a common experimental paradigm in fMRI studies; simulations show that tkCCA
will find the true coupling dynamics, even in the presence of periodic experimental-stimulus-like
oscillations [Bießmann et al., b].
82 CHAPTER 8. MODEL-FREE ANALYSIS OF NEUROVASCULAR COUPLING
====
Initial Response
0 5 10
0.5
0
τ
H(s,τ=3) H(s,τ=6) H(s,τ=9)
Undershoot
=
Nonseparable
Spatiotemporal
Dynamics
0 5 10
0
0.5
1
τ
H(s,τ)
H(s,τ=1) H(s,τ=2) H(s,τ=4)
+
0 5 10
0
τ
H(s,τ=1) H(s,τ=3) H(s,τ=6)
Figure .: Artificial spatiotemporal hemodynamic response function H(s,τ)
used for generating synthetic fMRI time series data according to eq. (.); two
separable space-time dynamics, an initial response H+(s,τ)with a +-shaped
spatial profile and a later undershoot H×(s,τ)with ×-shaped spatial profile
were added together yielding a spatiotemporally non-separable hemodynamic re-
sponse function H(s,τ);
Parameter Value
Noise level γ[. . . .]
Number of data points N[  ]
Number of voxels S[]
length of convolution τmax 
Table .: Simulation parameters
Simulation parameters In the simulations all relevant parameters were varied
in order to test the performance of tkCCA. e parameter values are listed in table
.. We explored a parameter range that is realistic in terms of number of data points
N, in fMRI experiments typically around , and dimensionality of y(s,t), typically
the number of voxels is between  and . e length of most canonical HRF
models used in standard fMRI analysis is below swhich corresponds at a temporal
resolution of fMRI signals of .Hz to a maximal time lag of τmax= samples.
tkCCA runtime scales
quadratically with the
number of data points ...
Effect of voxel number and sample size on tkCCA runtime Figure.showsthe
runtime of tkCCA for different numbers of data points and dimensions on a stan-
dard laptop. As expected from eq. . the runtime scales quadratically with the
number of data points. If needed, this quadratic increase can be compensated for by
not using the full kernel matrices but low rank approximations thereof as suggested
in [Bach and Jordan, ]. Since in most fMRI experiments the number of data
points is around  (dashed red line in fig. .A) this quadratic scaling is negligible
in practice. Note that the voxel number does not influence the actual tkCCA opti-
8.4. APPLICATIONS OF TKCCA TO SIMULATED DATA 83
0 1 2 3 4
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
log10(Number of Voxels S)
Runtime [s]
200 Data Points
400 Data Points
600 Data Points
Log Scaling
200 300 400 500 600
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
Data points N
Runtime [s]
112 Voxels
312 Voxels
512 Voxels
1012 Voxels
A B
Figure .: Runtime of tkCCA; A: Runtime as function of number of data points
N; different shades of grey indicate the number of voxels (); all
linesfall oneach otherasthe number ofvoxelsdoes notinfluence runtime; tkCCA
scales quadratically with the number of data points; since fMRI data sets have less
around  images (dashed red line), this quadratic scaling is negligible in prac-
tice; B: Runtime of the tkCCA optimization does not increase with dimensionality
of the data; different shades of grey indicate the number of data points (-);
logarithmic scaling with voxel number is indicated by dashed red line;
mization runtime; all lines fall on top of each other in fig. .A. is follows directly
from the tkCCA formulation in eq. .. Aer computing the temporal embedding
and the kernel matrices, tkCCA complexity does not depend on the dimensionality
of the data anymore. Figure .B shows the same data as in panel A, but here the x-
axis shows the number of voxels on a logarithmic scale, from to voxels. e
number of data points is indicated as different shades of grey. Increasing the num-
ber of voxels does not influence runtime. Logarithmic scaling is indicated as red
dashed line. Note that the true dimensionality of the problem is number of voxels ×... but tkCCA runtime is
independent of the data
dimensionality
number of time lags; when dealing with , voxels and  time lags this amounts
to ,×=. Most fMRI data sets typically have a temporal resolution
of .-Hz, meaning one image acquisition (oen called time of repetition or TR)
takes s. At a temporal resolution of .Hz  samples amount to  minutes
of fMRI signal. Most standard fMRI data sets contain under  images but more
than , voxels. ese data sets can be processed with tkCCA, including cross-
validation across several hyper-parameters, in under a minute. is highlights the
potential of tkCCA for online applications.
Accuracy of estimated coupling features Figure . shows the estimated spa-
tiotemporal deconvolutions wm
y(τ)for τ= and τ=, each model min a separate
line. e top line depicts the true non-separable spatiotemporal hemodynamic re-
sponse function H(s,τ). Results for two different noise levels, γ=. and γ=.,
84 CHAPTER 8. MODEL-FREE ANALYSIS OF NEUROVASCULAR COUPLING
wy(τ=2) wy(τ=11)
wy(τ=2) wy(τ=11)
True spatiotemporal
impulse response
tkCCA
Multivariate
Mass-univariate
PCA
Separable Models
Noise = 0.2 Noise = 0.7
Figure .: Examples of spatiotemporal deconvolutions wy(τ)estimated on arti-
ficial data generated according to eq. (.) and (.); for all analysis approaches
we plotted the spatial pattern at time lags τ= and τ= for two different noise
levels (. and .); the top row shows the true pattern of the coupling dynamics
H(s,τ); number of voxels was , number of data points used was ; none of
the separable models finds the true spatiotemporal dynamics, while tkCCA re-
covers the true dynamics even at high noise levels, high dimensional data and few
data points;
are shown. e spatiotemporal pattern estimated by tkCCA captures the dynam-
ics of the true HRF H(s,τ). Separable models cannot capture the spatiotemporalIn contrast to standard
methods tkCCA
captures non-separable
spatiotemporal
dependencies
dynamics of the hemodynamic response function used for the data generation (see
fig. .). e spatiotemporal deconvolutions estimated with methods implicitly as-
suming spatiotemporal separability are merely a stationary projection wyin voxel
space. In the best case (i.e. low noise conditions, shown in le panels), the de-
convolutions of the separable models wmv
y,wmu
yand wiso
ycapture a superposition of
both dynamics, the transient +-shaped pattern and the delayed ×-pattern. In con-
trast tkCCA learns the correct dynamics H(s,τ). Figure . shows in the bottom
row the similarity between the true coupling H(s,τ)and the estimated deconvolu-
tion averaged across  simulations for each parameter setting. Separable models
consistently fail to capture the coupling dynamics H(s,τ). e least flexible model
wiso
y, corresponding to spatial averaging of the fMRI signal in combination with a
canonical HRF, reflects the coupling dynamics worst. e non-separable deconvo-
lution estimated by tkCCA in contrast learns the correct spatiotemporal dynamics
from few data points and high dimensional, noisy data.
8.4. APPLICATIONS OF TKCCA TO SIMULATED DATA 85
Accuracyofneuralactivity predictions etoprowoffigure.showsthecanon-
ical correlation between true and estimated neural activity, averaged across  sim-
ulations. In each panel the top row shows the correlation between neural and esti-
mated neural activity deconvolved from artificial fMRI data y(s,t)using the model
specific spatiotemporal filter wm
y(s,τ). TkCCA (red line) predicts neural activity
best in all parameter settings. Mass-univariate methods (MU) and spatial averag-
ing (AVG) perform equally well, PCA based deconvolutions predict the underlying
neural activity worst.
0.2 0.4 0.6 0.8 1
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
Canonical Correlation
N=600, log10(Voxels)=3
0.2 0.4 0.6 0.8 1
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
1 − Noise level
Similarity wy(τ)
2 2.5 3 3.5 4
0.5
0.6
0.7
0.8
0.9
1Noise=0.30, N=400
2 2.5 3 3.5 4
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
log10(Voxels)
200 300 400 500 600
0.5
0.6
0.7
0.8
0.9
1
Noise=0.30, log10(Voxels)=4
200 300 400 500 600
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
N (Data points)
tkCCA MV MUC PCA AVG
Figure .: Overview of simulation results for (from le to right) different levels
of noise, increasing voxel number and increasing number of data points N; top
row shows canonical correlation between artificial fMRI time series and neural
signals; plotted are results obtained with tkCCA (symbol), multivariate decon-
volutions wmv
y(τ)(symbol), mass-univariate wmu
y(τ)(×symbol), PCA wpca
y(τ)
(symbol) and the spatial average fMRI signal wiso
y(τ)(symbol); bottom row
shows the correlation coefficient between the true coupling dynamics H(s,τ)and
the estimated spatiotemporal deconvolution of each model wm
y(τ); note that MV,
MU, PCAandAVGanalyses wereprovidedwithoptimal temporaldynamicsprior
to the respective estimation of the spatial dynamics; tkCCA clearly outperforms
separable methods in terms of prediction accuracy of neural signals as well as ac-
curacy of the estimated deconvolutions;
86 CHAPTER 8. MODEL-FREE ANALYSIS OF NEUROVASCULAR COUPLING
8.5 Removal of fMRI scanner gradient artifacts
An important step in multimodal data analysis is artifact removal. When recording
electrophysiological signals during fMRI signal acquisition, the magnetic fields of
the scanner induce currents in the electrodes which are much stronger than the cur-
rents induced by the electromagnetic fields of the brain. Examples of scanner gradi-
ent induced artifacts in neurophysiological recordings are shown in fig. .A, C and
E. As part of this dissertation a realtime system was developed that removes these
artifacts online during an experiment. For artifact removal, the electrophysiological
timeseriesis splitintoepochswhose length matchesthe length oftheacquisitionpe-
riod of one fMRI image (see fig. . C and E). As the trigger signal sent by the fMRI
scanner is not accurate enough we chose to split the epochs according to an em-
pirical threshold on the first derivative of the electrophysiological signal. e steep
rise at the beginning of a scanner readout, aer the slice selector signal, served as a
reference point for temporal alignment of all epochs. e epochs xi,i= [, ...,L],
Lbeing the total number of fMRI volumes acquired, are then stacked as row vectors
in a data matrix X= [x
,x
,. . . ,x
L]. On the resulting data matrix we com-
pute linear kernel PCA. e first principal component is the strongest artifact (see
0 50
−5
0
5
Time [s]
[s.d. units]
Raw Signal
Visual Stimulus 0 0.1 0.2
−5
0
5
Time [s]
[s.d. units]
Raw Signal
0 0.05
Epoch 1
Epoch 2
Epoch 3
Time [s]
[s.d. units]
Epochs
0 0.025 0.05 0.075
PC_3
PC_2
PC_1
Principal Components
Time [s] 0 0.1 0.2
−5
0
5
Time [s]
[s.d. units]
Clean Signal
0 50
−10
0
10
Time [s]
[s.d. units]
Clean Signal
A EC
D FB
Figure .: Example of PCA based artifact removal from neurophysiological
recordings recorded during fMRI data acquisition; A: example of raw contam-
inated time series (in blue) in units of standard deviation; neural responses to
visual stimulation (in red) are much smaller than the artifacts and cannot be de-
tected B: Clean signal aer artifact removal, neural activity robustly follows visual
stimulation; C: Signal from A (magnified in E) split into single epochs (indicated
as background in E/F); note the high amplitude artifacts; D: first three principal
components computed from epochs in C; E: epochs from C concatenated before
cleaning; F: same as E aer subtraction of the PCs plotted in D;
8.6. APPLICATIONS OF TKCCA TO REAL DATA 87
fig. .D). We then retain all but the seven⁴ strongest (artifactual) components us-
ing eq. (.). e cleaning procedure relies on the assumption that the artifacts are
temporally aligned in each epoch, as opposed to the neural signal whose phase is not
correlated with that of the scanner acquisition. Temporal alignment of the artifact
means that there is a direction in data space where all the artifacts from the different
epochs can sum up to a very large signal, while the un-aligned neural activity cancels
out. erefore the principal components corresponding to the largest eigenvalues
will capture the artifact. Figure .D shows examples of the first three principal
components reflecting the artifact. Aer projecting out the artifactual components
we concatenate the epochs to recover a continuous time series (fig. .F). e effect
of the cleaning procedure becomes evident when looking at stimulus induced activ-
ity: Figure .B shows the same data as in A aer cleaning. Neural activity in the
stimulus on-periods is clearly stronger than in the stimulus-off periods.
8.6 Applications of tkCCA to real data
is section summarizes applications of tkCCA to multimodal data for the analysis
of neurovascular coupling mechanisms. Appendix B lists the experimental details of
the setup. In short, neural activity was measured simultaneously with intracranial
electrophysiological recordings and high resolution fMRI in primary visual cortex.
Evoked activity was recorded during a classical block stimulation paradigm. e vi-
sual stimulus was a rotating checkerboard covering the full visual field, presented for
seconds followed by a blank screen presented for  seconds. Spontaneous activity
was acquired during complete darkness (for details see [Logothetis et al., ]). For
each session we computed neural spectrograms x(f,t)and voxel time series y(s,t)in
a spherical voxel volume around the recording electrode. Prior to the voxel extrac-
tion the spherical region of interest (ROI) was masked with an anatomical mask dis-
carding non-brain structure. Two separate analyses were conducted, corresponding
to the two complementary views of eq. (.) and eq. (.). In a first step we used
tkCCA to estimate a convolution wx(τ)from multivariate spectrograms of neural
activity to high dimensional fMRI image time series (see fig. .A). is allows to
investigate the time frequency dynamics of the neurovascular coupling process, and
the spatial dynamics in fMRI voxel space. In a second step we estimated spatiotem-
poral deconvolutions wy(τ)of fMRI image time series to predict band power modu-
lations of neural activity (see fig. .B). is approach enables us to (a) estimate spa-
tiotemporally non-separable neurovascular coupling dynamics and (b) test whether
the spatiotemporal variability of the coupling dynamics contain information about
neural activity.
⁴Aer rejection of the seven strongest components, the neural frequency spectrum did not show
artifactual peaks and followed a /f spectrum.
88 CHAPTER 8. MODEL-FREE ANALYSIS OF NEUROVASCULAR COUPLING
Optimizing preprocessing parameters
Spatial smoothing has become a standard preprocessing operation for any fMRI
analysis. e correct choice of smoothing parameters is crucial especially in the
following analyses. If high frequency multivariate spatial structure contains infor-
mationaboutneural processes, itwouldbe disadvantageoustosmooth thisstructure
away. On the other hand, if a spatiotemporal deconvolution of over , voxels
and  seconds at a Hz temporal resolution is estimated, the robustness of the es-
timates prots from smoothing as the danger of overfitting to noise fluctuations is
decreased. Temporal and smoothing parameters were optimized using leave-one-Preprocessing
parameters are
optimized using
cross-validation
out cross-validation. For each smoothing parameter setting we computed tkCCA
on a training data set (all sessions of an experiment except the test data session) and
computed the canonical correlation between estimated and true neural activity on
a test data set that was not used for estimating the hemodynamic response patterns.
Smoothing in time as well as smoothing in space was implemented by a parzen func-
tion that weighted each sample according to the distance of each voxel sito a center
voxel sor the distance between the time point tiand the temporal window center t.
e parzen function parzen(d,ω)used here is also referred to as de la Valle-Poussin
window and is defined as
parzen(d,ω) = {(d
ω)if d>ω
(d
ω)+(d
ω)if dω
(.)
for a distance d=|did|(dibeing the ith voxel or time point) from the window
center dand a window size ω. e parameters tested are listed in table . for fMRI
data with a temporal resolution of Hz and mmvoxel size. SpatiotemporalSpatiotemporal
smoothing of the data
imposes a prior on the
deconvolution estimates
smoothing can be interpreted as imposing a prior on the tkCCA solution: Aer fil-
teringaway thehigh frequency structurein timeand/orspace, thetkCCAestimation
procedure will not be distracted by these high frequency structures and only find de-
pendencydynamicsbelowthespatiotemporalcutofffrequencies. Figure.A/Blists
the results for prediction of fMRI activity from neural data and fig. .C/D shows
results for prediction of neural activity from fMRI activity. e canonical correla-
tions on test data for prediction of fMRI from neural signals is highest for temporal
lowpass filtering at .-.Hz and spatial smoothing up to a smoothing kernel width
of mm larger smoothing kernels result in worse performance. e canonical cor-
relation (th-th-th percentile) for the best setting during spontaneous activity
Parameter Value
Temporal smoothing (Hz) [ . . . .]
Temporal smoothing (seconds) [  ]
Spatial smoothing (mm) [ ]
Table .: Smoothing parameters used for spatial and temporal smoothing
8.6. APPLICATIONS OF TKCCA TO REAL DATA 89
Spatial
Smoothing [mm]
Temporal
Smoothing [seconds]
Spontaneous
0 2 4 6 8
0
3
5
9
15
19
Median
Correlation
0.55
0.6
0.65
0.7
Spatial
Smoothing [mm]
Temporal
Smoothing [seconds]
Evoked
02468
0
3
5
9
15
19
Median
Correlation
0.55
0.6
0.65
0.7
0.75
Spatial
Smoothing [mm]
Temporal
Smoothing [seconds]
Spontaneous
0 2 4 6 8
0
3
5
9
15
19
Median
Correlation
0.45
0.5
0.55
0.6
0.65
0.7
Spatial
Smoothing [mm]
Temporal
Smoothing [seconds]
Evoked
02468
0
3
5
9
15
19
Median
Correlation
0.5
0.55
0.6
0.65
0.7
A
Neural to fMRI
B
C
fMRI to Neural
D
Figure .: Median canonical correlation on test data across all experiments
for different smoothing kernel widths; A: Predicting fMRI signals from neural
spectrograms during spontaneous activity; B: Predicting fMRI signals from neu-
ral spectrograms during evoked activity; C: Predicting neural spectrograms from
voxel volume time series during spontaneous activity; D: Predicting neural spec-
trograms from fMRI during evoked activity;
was .-.-. and for evoked activity .-.-.. For prediction of neural
signals from multivariate fMRI time series the optimal temporal smoothing param-
eters are the same; however optimal spatial smoothing widths for spontaneous and
evoked activity differ for this setting. Spontaneous activity is better predicted when
no spatial smoothing is applied, see fig. .C; fig. .D shows that for evoked activity
there is no clear effect of spatial smoothing on neural prediction performance. is
might be due to the experimental paradigm. e visual stimulus applied activated
the entire early visual cortex. Since almost every voxel carries the same information
about the stimulus in this unspecific stimulation to the full visual field, the fine detail
of the hemodynamic response is not important for predicting neural activity. With
the optimal smoothing parameters, the correlations (th-th-th percentile) be-
90 CHAPTER 8. MODEL-FREE ANALYSIS OF NEUROVASCULAR COUPLING
tween true and predicted neural activity were .-.-. (no spatial smoothing)
for spontaneous and .-.-. (mm spatial smoothing kernel) for evoked ac-
tivity.
Implications for multivariate decoding analyses Leaving aside the fact that all
subsequentanalysisprofitfroma properparameterselectionforpreprocessingoper-
ations such as smoothing, the optimal smoothing parameters have some interesting
implications. We find that prediction of neural signals during spontaneous activity
is generally worse when fMRI data was spatially smoothed (see fig. .A-C). NoteSpatial smoothing
decreases prediction
accuracy of fMRI signals
from neural signals
that when predicting fMRI signals from neural band power (fig. .A/B), the drop in
prediction accuracy for no or little temporal smoothing is much smaller than when
predicting neural signals from fMRI (fig. .C/D). Temporal high frequency struc-
ture in the fMRI signal does not seem to help for predicting neural signals. Notably
increasing spatial smoothing kernel widths have different effects on the prediction
of spontaneous and evoked neural activity (fig. .C/D). e implications for stan-
dard fMRI analyses are that when the BOLD activity pattern is expected to have a
fine scale structure, strong spatial smoothing is detrimental but temporal smoothing
is necessary.
Another point of this parameter selection procedure is worth noting: e re-Neural signals predict
fMRI signals better than
fMRI signals predict
neural activity
sults also show that the BOLD signal is predicted better from the past neural activity
(corresponding to the setting in fig. .A) than the neural activity is predicted from
the fMRI signal (corresponding to the setting in fig. .B). is is expected as the
origin of fMRI signals is neural activation, and not the other way around.
Translating neural spectrograms to fMRI signals
In the following the results from the prediction of fMRI signals using time win-
dows of neural spectrograms (see fig. .A) for the best parameters (.Hz tempo-
ral smoothing, mm spatial smoothing, see fig. .A/B) are presented in detail. Like
classical fMRI analyses this analysis does not account for non-separable spatiotem-
poral dynamics of the hemodynamic response; the hemodynamic response pattern
is only modeled as a temporally invariant projection wy. But this setting can give
useful insights in the time-frequency coupling of neural spectrograms. It was clearBOLD signals reflect
some neural frequencies
better than others from the first studies investigating the neural basis of the BOLD signal that some
frequencies of neural oscillations are reflected in the fMRI signal better than others
[Logothetisetal.,]. Whatwasnottestedinthesestudieswaswhethertherewere
also frequency dependent differences in the temporal dynamics of hemodynamic
response. For instance higher frequencies of neural oscillations could show faster
temporal neurovascular coupling dynamics than lower frequencies. is question
was addressed in [Bießmann et al., b, Murayama et al., ]. We estimated
the time-frequency neurovascular coupling dynamics using tkCCA for evoked ac-
tivity [Bießmann et al., b] and spontaneous activity [Murayama et al., ].
8.6. APPLICATIONS OF TKCCA TO REAL DATA 91
A representative example of the data before and aer tkCCA is plotted in figure
.. Panel A shows the mean activity of all bands of neural bandpower in blue and
the mean of all voxels in red. e neural activity convolved with a standard HRF,
as depicted in fig. ., is plotted as purple dashed line. e purple line is temporally
aligned with the red fMRI signal. e correlation coefficient between fMRI activity
predicted from the canonical HRF and true fMRI activity is .. In fig. .B the
two modalities aer tkCCA are depicted. Note that also tkCCA temporally aligned
the two signals. In addition to that the canonical components computed by tkCCA
show a higher correlation coefficient of . between predicted fMRI signal and true
fMRI activity.
Examples of the estimated time-frequency convolutions wx(τ)are shown in figure
.A/B, averaged across all sessions ( sessions for spontaneous activity and 
sessions for evoked activity). For an easier interpretation of the temporal dimen-
sion, the time lag axis is labeled with positive time lags, corresponding to the shi in
time of the BOLD signal relative to the neural signals. On top of the respective axis,
denoting neural frequency bands or time lags, respectively, the strongest frequency
and time lag factors are plotted in grey. Factorization was done using singular value
decomposition (SVD), see e.g. [Strang, ]. For both stimulus conditions tkCCA
robustly estimates the time frequency coupling dynamics, characterized by a pro-
nounced peak at a time lag of about s in the higher frequency regime starting from
Hz. Frequencies below Hz contribute less to the BOLD response. e filter ob-
tained from evoked activity shows a slower decay and a slightly smaller amplitude
at the peak response. ese effects are likely to be due to the rather long duration
of the visual stimulation. e s long block of cortical activation leads to a slower
decay of the HRF. As each convolution is normalized with its vector norm this ef-
fect also leads to the slightly smaller amplitude of the peak at τ=s. It is notewor-
thy that the main characteristics of the time frequency response, time point of the
peak response and weighting of different frequencies, are very similar when com-
paring results from evoked and spontaneous activity. is can be interpreted in two
ways. For one it means that the tkCCA results are not biased by stimulus induced
spatiotemporal autocorrelations with respect to the main parameters of the hemo-
dynamic coupling [Bießmann et al., b]. Secondly it confirms that even from
spontaneous data tkCCA can robustly estimate neurovascular coupling dynamics
[Murayama et al., ].
Translating fMRI signals to neural spectrograms
While the coupling dynamics of single frequencies of neural oscillations are an im-
portant aspect for understanding neurovascular coupling mechanisms, the usual
perspective on fMRI data is the other way around: Researchers measure fMRI activ-
itywithoutsimultaneousacquisitionofintracranialneural signals. Insteadonetakes
an experimentally controlled stimulus time course as a proxy for the neural activity
92 CHAPTER 8. MODEL-FREE ANALYSIS OF NEUROVASCULAR COUPLING
0510 15 20 25 8 12 24 40 100 250 3000
−0.5
0
0.5
1
1.5
2
Frequency [Hz]
τ [s]
wx(τ)
8 12 24 40 100 250 3000
0
0.5
Hz
Frequency Weights
0 5 10 15
0
0.5
Temporal Dynamics
τ [s]
0510 15 20 25 8 12 24 40 100 250 3000
−0.5
0
0.5
1
1.5
Frequency [Hz]
τ [s]
wx(τ)
8 12 24 40 100 250 3000
0
0.5
Hz
Frequency Weights
0 5 10 15
0
0.5
τ [s]
Temporal Dynamics
wy
wy
Electrode
Evoked Activity
Spontaneous Activity
Figure .: Le column: Time-frequency coupling dynamics wx(τ)between
neural spectrograms of different frequencies of neural oscillations and fMRI sig-
nals averaged across subjects during evoked activity (top) and spontaneous ac-
tivity (bottom); dominating frequency weights and temporal dynamics are plotted
on top; note that evoked and spontaneous activity yield very similar coupling dy-
namics consistent with most models of the HRF there is a pronounced peak at
τ=s; right column shows a single experiment voxel pattern wy; scale bar in top le
corner is -by-mm;
8.6. APPLICATIONS OF TKCCA TO REAL DATA 93
0 10 20 30
−2
0
2
4
Time [s]
[sdu]
Mean activity (r=0.70)
Neural Signal
Neural * HRF
fMRI Signal
0 10 20 30
−2
0
2
4
Time [s]
[sdu]
Canonical components (r=0.82)
Neural Signal
fMRI Signal
A B
Figure .: A: Examples of mean fMRI signal within spherical ROI around elec-
trode (red) and mean neural signal (blue) in standard deviation units; note the
time delay (hemodynamic lag) of about seconds; convolution with HRF pre-
dicts an fMRI signal (purple dashed line) which is temporally aligned with the
true fMRI signal, correlation coefficient is .; B: Canonical components com-
puted by tkCCA from the same data; canonical component for neurophysiolog-
ical spectrograms (blue) and fMRI signals (red); tkCCA temporally aligns fMRI
and neural signals without the assumption of a canonical HRF as in panel A; note
that tkCCA yields a substantial increase in prediction accuracy of neural signals,
indicated by a correlation coefficient of .;
of interest. e goal of most fMRI analyses is to infer the neural activity from mul-
tivariate voxel time series. In the following we will show how to use tkCCA in order
to estimate a spatiotemporal filter that translates fMRI activity into neural spectro-
grams. We use this deconvolution to decode neural activity from time series of voxel
volumes. e simultaneous recordings allow to draw direct conclusions about the
relationship between fMRI signals and neural activity.
Spatiotemporal dynamics of the BOLD signal When a neural population is ac-
tivated, the hemodynamic response is not instantaneous. e HRF exhibits com-
plex spatiotemporal dynamics, which are mainly governed by the structure of the
vascular system (see fig. .). Many fMRI analysis approaches assume that the spa-
tiotemporal dynamics of the HRF is separable into a spatial and a temporal com-
ponent. e spatiotemporal separability assumption is a convenient approxima-
tion. It implies that the temporal dynamics of the HRF are the same for each voxel. For faster computations
most fMRI analyses
assume spatiotemporal
separability of the HRF
e function used for modeling this response is hence oen called canonical HRF.
is canonical HRF is used to temporally align a regressor, such as an experimen-
tally controlled stimulus, with the fMRI signal, see fig. .. e spatial aspects of
hemodynamic activation are typically estimated using mass-univariate approaches
[Friston et al., , , Worsley and Friston, , Friston and Buchel, ]
or multivariate pattern analysis [Cox and Savoy, , Haxby et al., , Haynes
and Rees, , Kriegeskorte et al., ]. ere is empirical evidence justifying
this approximation. Spatiotemporal separability of the hemodynamic response has
94 CHAPTER 8. MODEL-FREE ANALYSIS OF NEUROVASCULAR COUPLING
been reported in studies on non-human primates [Sirotin et al., ] and humans
[Shmuel et al., ]. ese studies however used rather simple models of the spatial
profile of the hemodynamic response [Sirotin et al., ]. And although the results
of [Shmuel et al., ] show that large parts of the hemodynamic response can be
accounted for by a separable approximation, the authors also mention that some
spatiotemporal variability of the HRF cannot be explained by separable models. In-
deed many other studies provided empirical evidence showing that the HRF varies
between brain regions, subjects [Aguirre et al., , Handwerker et al., ] and
cortical depth [Yacoub et al., ]. In particular the local fine structure of cortical
vascularization gives rise to spatiotemporal hemodynamics that cannot be captured
by a combination of an independent spatial and temporal component [Berwick
et al., ]. Moreover it can be shown that this spatiotemporal variability changes
depending on the length and intensity of the sensory stimulus used [Logothetis,
, Berwick et al., ].
Taken together, while the separability assumption of the HRF is convenient, sev-
eral studies provide evidence against it. Consequently non-separable models can
reveal areas of activation that separable models would fail to detect, see for instance
[Woolrich et al., , Lu et al., , Mourão-Miranda et al., , van Gerven
et al., ]. Many fMRI decoding studies have the goal to decode neural activity
from fMRI signals. If the spatiotemporal variability of the HRF contains informa-
tion about neural processes, then accounting for non-separable hemodynamics will
increase the decoding accuracy of neural signals from fMRI data. Although this hy-
pothesis is highly relevant for many fMRI studies, it was not tested directly so far.
Using tkCCA to estimate non-separable spatiotemporal deconvolutions from simul-
taneous recordings of neurophysiology and fMRI signals, it can be shown that non-
separable spatiotemporal hemodynamics in response to neural activation indeed
contain information about the underlying neural activity [Bießmann et al., a].
is will be explained in more detail in the following.
High gammaband contributesmost tofMRI signal Sincetheneuralbandpower
timeserieswas modeledas a linear combinationwxof frequency bands, see equation
(.), it is important to ensure that the frequency combinations found by tkCCA are
physiologically plausible. Figure .C shows the grand mean and standard error of
mean of the normalized band power coefficients across all experiments. e coef-
ficients for each band were similar across experiments and werein line with previous
findings [Logothetis et al., , Niessing et al., , Goense and Logothetis, ]:
Lower frequencies contributed relatively little to the BOLD signal and the band with
the largest coefficients was the high gamma band (-Hz).
8.6. APPLICATIONS OF TKCCA TO REAL DATA 95
0 5 10 15 20
0
0.5
1
HRF estimated
τ [s]
0 5 10 15 20
0
0.5
1
HRF fitted (r=0.99)
τ [s]
8 12 24 40 100 250 3000
0
0.2
0.4
0.6
0.8
1
Bandstop [Hz]
Frequency Band Weights
A B C
Figure .: A: Temporal component wτ(mean ±standard error of mean across
subjects) exhibit typical characteristics of commonly used HRF models, a peak at
τs and a late undershoot; B: Fits of canonical HRFs usingstandard HRF model
(as implemented in SPM); correlation coefficient rbetween estimated and t-
ted HRFs was .; C: Normalized frequency coupling coefficients α(f), averaged
across all experiments (n=); neural oscillations in the range from -Hz con-
tribute most to the BOLD signal, lower frequencies and high-frequency bands
contributed less; note the similarity of the frequency coupling parameters with
those obtained in the time-frequency convolution in fig. .
Decomposition of spatial and temporal dynamics When comparing tkCCA
with classical analysis approaches it is important to ensure that the temporal dynam-
ics wτused as canonical HRF for the classical analyses is correct. We tested whether
the temporal dynamics in wτare well captured by the commonly assumed canonical
HRF model (see fig. .). A popular HRF model, as implemented in the standard
fMRI analysis package SPM [Friston and Buchel, ], was fitted to the mean wτ
for each experimental subject. is model explained wτvery well; the grand aver-
age, across all experiments, of the mean squared error between normalized fits
and tkCCA estimates of wτwas .·, the correlation between the the SPM HRF
model and the tkCCA estimate was .. Examples for wτand the canonical HRF
fits are depicted in fig. .A/B, respectively.
Model comparisons Acomparisonofthespatiotemporaldynamicsobtainedwith
tkCCA and classical analysis methods is depicted in fig. .. e le panel shows
the temporal dynamics wτ. Vertical lines in the temporal dynamics plot indicate
the time lags at which the spatial patterns in the panels on the right side have been
extracted. e color of the frame corresponds to the color of the vertical line in the
le panels. Figure .A shows that the non-separable tkCCA model learned a non-
separable spatiotemporal pattern: Some voxels follow the main hemodynamic re-
sponse, reflected in the factorized temporal dynamics wτ. An example are the voxels
close to the electrode, indicated as peak response voxels. Other voxels show differ-
ent temporal dynamics. For instance some voxels, indicated as undershoot voxels,
show a stronger decrease in the undershoot phase of the HRF. Figures .B/C show
deconvolutions obtained with multivariate and mass-univariate separable models.
96 CHAPTER 8. MODEL-FREE ANALYSIS OF NEUROVASCULAR COUPLING
e spatial dynamics of these separable models are by design the same for all
time lags, except a voxel-independent scaling factor given by wτ.
Neural signal prediction accuracy as function of ROI radius Figure .A lists
thecorrelations(medianacrossallsessions)obtainedwiththenon-separabledecon-
volutions and all separable deconvolutions for spontaneous activity and fig. .B
for evoked activity. At ROI radii of -mm at least every second session achieves
correlations above ., independent of the analysis method used. All models pre-
dicted neural activity worst for the smallest radius of mm (see fig. .A/B). While
mass-univariatedeconvolutionsandspatial averagingdo notshowincreased predic-
tion performance beyond of radii of mm, the predictions of the two multivariate
methods w
y(τ)and wmv
y(τ)saturate slightly later at ROI radii of mm.
Non-separablespatiotemporalhemodynamics containneuralinformation If
non-separable spatiotemporal variability of the HRF contains neural information
then tkCCA will predict neural signals better than separable models. In contrast if
the true HRF were separable in space and time, then the predictions of the separable
models would be greater or equal to those obtained with tkCCA. Figure .C/D
shows the percentage of sessions in which tkCCA predicted neural activity better
than separable models. For a ROI size of up to mm there is no significant differ-
ence between the separable and non-separable multivariate model. For ROI sizes
larger than mm, tkCCA consistently predicts neural signals better than any of the
separable models. Red symbols in fig. .C/D indicate that the medians of the
correlations obtained with tkCCA are significantly larger than those of the separa-
ble models (p<., Bonferroni corrected non-parametric sign test [Hollander and
Wolfe, ]) for a given ROI size. e separable model wmv
y(τ)whose predictions
were closest to those of the non-separable model is directly compared to the tkCCA
result in the scatterplots of fig. .. Each dot is one session. Dots falling above the
red line indicate that tkCCA is better than the separable multivariate model. Light
grey dots indicate that this effect is not significant for a given ROI radius. Non-
separable deconvolutions predict neural activity consistently better for ROI sizes
larger than mm.
Differences between non-separable and separable models in space and time
In order to examine which spatial and temporal features gave rise to this superiority
of the non-separable model we compared the deconvolutions of the separable and
the non-separable multivariate model. For each session we partitioned the decon-
volution w
y(τ)and wmv
y(τ)into time slices and spherical shells around the electrode.
8.6. APPLICATIONS OF TKCCA TO REAL DATA 97
0 10 20
0
0.5
Overall Dynamics wτ
wτ
0 10 20
0
’Peak’ Voxel
0 10 20
0
’Undershoot’ Voxel
τ [s] wy
*(τ=4s) wy
*(τ=10s) wy
*(τ=18s)
’Peak’
Voxel ’Undershoot’
Voxel
Electrode
AtkCCA
0 10 20
0
0.5
Overall Dynamics wτ
wτ
τ [s]
wy
mv(τ=4s) wy
mv(τ=10s) wy
mv(τ=18s)
BMultivariate Separable
0 10 20
0
0.5
Overall Dynamics wτ
wτ
τ [s]
wy
mu(τ=4s) wy
mu(τ=10s) wy
mu(τ=18s)
CMass−univariate Separable
Figure .: Examples of spatiotemporal deconvolutions wy(τ)for three time
lags τplotted on transversal section of V (frame color corresponds to vertical
lines in le panels); A: Non-separable tkCCA model w
y(τ)shows different tem-
poral dynamics in different voxels; le column shows overall temporal dynamics
wτ(top panel), temporal dynamics in a peak voxel (middle) and in undershoot
voxel (bottom); B: Separable multivariate model wmv
y(τ)shows the same temporal
dynamics in all voxels;C: Separable mass-univariate model wmv
y(τ)also exhibits
the same temporal dynamics in all voxels;
98 CHAPTER 8. MODEL-FREE ANALYSIS OF NEUROVASCULAR COUPLING
2 4 6 8 10 12
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9 Spontaneous
ROI Radius [mm]
Correlation
2 4 6 8 10 12
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
ROI Radius [mm]
Correlation
Evoked
tkCCA
MV
MU
ISO
2 4 6 8 10 12
40
50
60
70
80
90
100 Spontaneous
ROI Radius [mm]
tkCCA better [% sessions]
2 4 6 8 10 12
40
50
60
70
80
90
100
ROI Radius [mm]
tkCCA better [% sessions]
Evoked
MV
MU
ISO
Chance level
A B
C D
Figure .: Comparison of correlations between true and estimated neural sig-
nals, obtained with non-separable (tkCCA) and separable deconvolutions for dif-
ferent ROI sizes; A: Correlation between predicted and true neural signals during
spontaneousactivity; B:SameasAbutforevokedactivityC:Percentageofsessions
in which non-separable models predicted neural signals better than respective
separable model; dashed blue line indicates random difference in performance;
red symbols denote significant difference at p<. (Bonferroni corrected); D:
Same as C for evoked activity;
e time slices are intervals of lengths s and can be defined as sets of time delays
τthat have an absolute distance of less or equal than half a second to the time slice
index t={, , . . . }
Tt={τ,τt
}(.)
e spherical shells are concentric around the electrode and have a thickness of
mm. ey can be parametrized by a radius parameter r={, , . . . }and are de-
fined by the set of all voxels whose distance from the electrode lies in the interval
[r
,r+
]mm.
Sr={s,dist(s,s)r
}(.)
8.6. APPLICATIONS OF TKCCA TO REAL DATA 99
0 0.25 0.5 0.75 1
0
0.25
0.5
0.75
1Radius 2mm
Separable
Non−Separable
0.5 0.7 0.9
0.5
0.7
0.9 Radius 4mm
0.5 0.7 0.9
0.5
0.7
0.9 Radius 6mm
0.5 0.7 0.9
0.5
0.7
0.9 Radius 8mm
0.5 0.7 0.9
0.5
0.7
0.9 Radius 10mm
0.5 0.7 0.9
0.5
0.7
0.9 Radius 12mm
0 0.25 0.5 0.75 1
0
0.25
0.5
0.75
1Radius 2mm
Separable
Non−Separable
0.5 0.7 0.9
0.5
0.7
0.9 Radius 4mm
0.5 0.7 0.9
0.5
0.7
0.9 Radius 6mm
0.5 0.7 0.9
0.5
0.7
0.9 Radius 8mm
0.5 0.7 0.9
0.5
0.7
0.9 Radius 10mm
0.5 0.7 0.9
0.5
0.7
0.9 Radius 12mm
not significant significant
Spontaneous
Evoked
Figure .: Correlations between true and neural activity estimated by the
best separable devonvolution wmv
y(τ)and the non-separable (tkCCA) deconvolu-
tion w
y(τ); tkCCA predicts neural activity better for ROI radii larger than mm;
dashed red line indicates equal performance, each dot a session; dots falling above
the diagonal indicate tkCCA is better than the best separable model; light grey
indicates the medians of separable and non-separable correlations are not signif-
icantly different at p<. (Bonferroni corrected);
where dist(s,s)denotes the euclidean distance between the voxels with the index
sand the electrode voxel s. For each time slice Ttand each spherical shell Srwe
calculated the squared difference between the optimal non-separable deconvolution
w
y(s,τ)(estimated by tkCCA) and the filter obtained by separable models:
(w
y(s,τ)wmv
y(s,τ)),sSr,τTt(.)
Figure . shows spatiotemporal difference profiles comparing separable and non-
separable hemodynamic response models during evoked and spontaneous activity.
Plotted are the averages over all experiments. e multivariate separable and non-
separable deconvolutions show similar spatial dynamics close to the electrode dur-
ing the initial main response from τ=-s. With increasing cortical distance the
patterns become more dissimilar. In a time window from τ=s to τ=s the two
approaches yield very different spatial patterns, especially at voxels far away from
the recording position. is analysis shows that non-separable spatiotemporal dy-
namics are found mainly in distant voxels in the later phase of the HRF.
Implications for fMRI analysis methods
Isotropicspatialsmoothing predicts neuralsignalsworst efactthatisotropic
spatial averagingwasconsistentlyworsethanthenon-separabledeconvolutionssug-
gests that the spatial structure of the optimal spatiotemporal deconvolution is not a
100 CHAPTER 8. MODEL-FREE ANALYSIS OF NEUROVASCULAR COUPLING
τ [s]
Distance [mm]
Spontaneous
0 5 10 15
12
10
8
6
4
(wy
*(τ) − wy
mv(τ))2
0.01 0.02
τ [s]
Distance [mm]
Evoked
0 5 10 15
12
10
8
6
4
(wy
*(τ) − wy
mv(τ))2
0.01 0.02
A B
Figure .: Difference between non-separable and separable multivariate de-
convolutions as function of time lag and cortical distance during spontaneous (A)
and evoked (B) activity averaged across all experiments; note the high similarity of
non-separable and separable deconvolutions close to the electrode at time lags of
about s when the HRF is peaking; with cortical distance the two models become
less similar, indicating non-separable dynamics; distant voxels at time lags around
s exhibit most non-separable features;
spatially isotropic filter in combination with a canonical HRF. A standard prepro-
cessing step in fMRI analyses is to spatially smooth the fMRI data and to construct
a regressor for each voxel by temporally smoothing the experimentally controlled
variables with a canonical HRF. is preprocessing step corresponds to the isotropic
smoothing setting. Our results show that accounting for non-separable spatiotem-
poral dynamics improves predictions of neural activity. is makes sense, consid-
ering the anisotropic structure of primary visual cortex: Along the cortical laminae
there are different correlational structures than perpendicular to the cortical lami-
nae. Isotropic spatial smoothing does not take this into account. is finding can be
transferred to standard fMRI experiments: Consistent with recent findings on mul-
tivariate decoding of fMRI activity [Chen et al., ] our results show that using
anatomically or functionally informed priors leads to better results in subsequent
analyses.
Multivariate vs. mass-univariate methods e underlying assumption of mass-
univariateanalysesis thatthehemodynamicresponseto neural activationcanbe ap-
proximated by analyzing time series of single voxels separately. We tested this spatial
independence assumption by predicting neural activity from mass-univariate cor-
relation estimates of the spatial hemodynamic response pattern. Indeed the mass-
univariate approach predicted neural activity reliably with correlations above ..
Compared to multivariate estimates however mass-univariate predictions were con-
sistently worse (see fig. .). is effect is consistent with theoretical studies such
8.6. APPLICATIONS OF TKCCA TO REAL DATA 101
as [Kriegeskorte et al., ] comparing mass-univariate and multivariate estimates
of neural activity from multivariate voxel time series. e superiority of multivari-
ate decoding is unlikely to be due to wrong estimates of the temporal dynamics wτ
obtained from the non-separable model w
y(τ). As shown in fig. .A/B, wτob-
tained from factorizing the tkCCA solution is fitted well by a standard canonical
HRF model.
Optimal searchlight radii for decoding neural signals Ourresultsquantifyhow
much information about neural activity is lost when choosing wrong radii in stan-
dard searchlight approaches [Kriegeskorte et al., ]. We found lower decoding
accuracies for searchlight radii smaller than mm indicating that there is informa-
tion about local neural activity in the multivariate structure around the electrode
that is missing in small ROIs. is information is not captured by spatial averag-
ing nor mass-univariate analyses. On the other hand ROIs larger than mm do not
increase the prediction accuracy, suggesting that at distances more than mm away
from the site of neural activation, there is mostly redundant information about local
neural activity in the fMRI signal.
Non-separable vs. separable deconvolutions Space-time separability of the
hemodynamic response to neural activation has been reported previously in optical
imagingstudiesonratbarrelcortexfortheoxyhemoglobinandthetotalhemoglobin
response [Devor et al., ]; the spatiotemporal dynamics of deoxyhemoglobin⁵
seems to have a more complex structure. All three hemodynamic signals however
showed very similar spatial patterns at the initial peak of the HRF in this study. Op-
tical imaging studies in alert monkeys refined this result by showing that the initial
dip of the rapid deoxyhemoglobin response had the same spatial dynamics as the
slower blood volume response [Sirotin et al., ], also speaking in favor of spa-
tiotemporal separability. However the authors of [Sirotin et al., ] assume an
isotropic gaussian profile for the spatial dynamics and consider only the estimated
width parameter. Such a simple spatial profile seems at odds with the complex cor-
tical microvasculature [Weber et al., ] (see also fig. .), other optical imaging
results [Berwick et al., ] and our finding that anisotropic filters predict neu-
ral activity significantly better than isotropic gaussian profiles. It is worth keeping
in mind that while the results of optical imaging studies offer a very high in-plane
resolution, they are limited to superficial cortical layers down to μm. ere
is evidence from high resolution fMRI studies showing that cerebral blood volume
(CBV) changes in the surface vessels return faster to baseline levels aer stimula-
tion than CBV changes in deep cortical layers [Yacoub et al., ]. Also in humans
the response to highly localized visual stimuli seems to be well approximated by
separable dynamics: In [Shmuel et al., ] it was shown that the the first spatial
⁵Deoxyhemoglobin is an important factor contributing to the BOLD signal [Ogawa et al., ]
102 CHAPTER 8. MODEL-FREE ANALYSIS OF NEUROVASCULAR COUPLING
principal component of the hemodynamic point spread function accounts for 
of the variance in the fMRI signal. e authors also report variability of the point
spread function over time. Our results confirm these results in that separable ap-
proximations are very similar to non-separable hemodynamic response patterns
especially in an early time window directly aer a neural impulse (see fig. .). We
found however that for decoding neural activity, non-separable models are consis-
tently better than separable models. us the variability of the hemodynamic re-
sponse that is not explained by separable models contains information about neural
activity. is is consistent with findings from studies without neural measurements.
In [Mourão-Miranda et al., ] the authors showed that taking into account the
spatiotemporal dynamics of the hemodynamic response to a stimulus variable can
help to find patterns of activation that are difficult to detect with classical methods.
Limitations of the decoding approach Our results show that all methods tested
yield accurate predictions of neural activity. However non-separable deconvolu-
tions predicted neural activity consistently better on test data confirming that non-
separabledynamicsofthehemodynamicresponsecontaininformationaboutneural
activity. For the correct interpretation of our results it is important to keep in mind
that we cannot rule out that systematic time delayed correlations between the neural
activity recorded and neural activity within all other voxels distort our estimate of
spatiotemporal filters. An example of such a confound would be slowly traveling
waves of neural activity [Leopold et al., ]. While this potential confound might
bias our deconvolution estimates, there is evidence from optical imaging studies
showing that the spatiotemporal hemodynamic response several millimeters away
from the center voxel is not reflecting time delayed neural correlations but really
a hemodynamic effect [Devor et al., ]. Whether or not distant neural activity
is affecting our estimates, from a decoding point of view these potential confounds
are neglegible. e aim of many fMRI analyses is not the exact estimation of the
hemodynamic response pattern but rather an accurate prediction of neural activity.
Moreover if the scientific hypothesis can be formulated in a decoding framework,
then the only quantity of interest for falsifying the hypothesis is the accuracy with
which neural activity can be predicted from fMRI signals. Our results show that
tkCCA can be used to predict neural activity better than standard methods by ex-
ploiting spatiotemporal non-separable hemodynamics. We conclude that including
non-separable spatiotemporal dynamics in the analysis of fMRI data will help to im-
prove inference about neural activity from hemodynamic signals, especially if the
hemodynamic data has a high spatial resolution.
Chapter 9
Summary and outlook
T advances enable researchers to measure brain activity at an increas-
inglyhigh levelofdetail. eusefulnessoftheseadvancesdepends criticallyon
the choice of an appropriate analysis method. Over the years, many methods have
been proposed, from biologically realistic models as in [Daunizeau et al., , Riera
et al., , Valdes-Sosa et al., , Bojak et al., ], over multivariate data driven
approaches [Cox and Savoy, , Norman et al., , Haynes and Rees, ] and
less complicated mass-univariate approaches [Friston and Buchel, ] to simple
analyses based on averaging and bivariate correlation coefficients [Logothetis et al.,
, Martindale et al., ]. is dissertation proposes a compromise between
these approaches. We introduce tkCCA as a simple but powerful data-driven algo-
rithm for the analysis of high resolution multimodal neuroimaging data.
Good models of neurophysiological and hemodynamic activity are indispens-
able for testing hypotheses about physiological processes and of great value for con-
necting different levels of research intracellular recordings, large scale connectivity
data and functional measurements. But the appropriate tradeoff between biological
realism and tractability depends on the scientific question addressed. For instance Biologically realistic
models are necessary
and useful ...
in EEG forward modeling, a conductivity profile of the head is important if one
wants to estimate the exact location of a current source. If however the goal of a
study is not localization but if the scientific hypothesis can be cast in a decoding
setting, then dipole fitting is not needed aer all the EEG forward model is just a
linear mapping, see eq. (.). If the scientific hypothesis can be tested by decod- ..., but not every
scientific hypothesis
needs biologically
realistic models
ing a stimulus variable from the EEG time series, any linear classifier could undo
the volume conduction effects for the classification, or in general the decoding task.
Fitting a dipole model to the data will not increase the amount of information in the
data. To summarize, not every hypothesis needs a forward model furnished with all
biological details accumulated in empirical research. In fact including too much bi-
ological realism oen renders it difficult to assess whether or not a particular feature
is needed to explain the data.
104 CHAPTER 9. SUMMARY AND OUTLOOK
e scientific question addressed with tkCCA was: Can we increase the amount
of information about neural activity extracted from fMRI signals by abandoning
the assumption that the hemodynamic response to neural activation is separable
in space and time? While previous studies provided plenty of indirect evidence in
favor of this hypothesis, it has not yet been investigated directly. In order to an-
swer this question we chose a decoding approach. Using a spatiotemporal decon-
volution we decoded intracortical neural activity from fMRI voxel time series. e
spatiotemporal deconvolutions were constructed such that they reflected the main
assumptions underlying popular fMRI analysis approaches. In particular all models
except the tkCCA model assumed spatiotemporal separability of the hemodynamic
response. In this setting the above hypothesis can be easily translated into different
experimental outcomes: If the non-separable parts of the hemodynamic response
do carry information about neural activity, then the non-separable tkCCA decon-
volution will predict neural activity better than separable deconvolutions.
Ourresultsshowthatthepredictionsofnon-separabledeconvolutionsestimated
with tkCCA were consistently better than those of separable deconvolutions, corre-
sponding to isotropic spatial smoothing, mass-univariate analyses and multivari-
ate decoding approaches. is represents to the best of our knowledge the first di-
rect empirical evidence in favor of the hypothesis that the spatiotemporally non-
separable part of the hemodynamic response contains information about neural
activity. A biologically realistic model for the complete spatiotemporal dynamics
of high resolution fMRI has not been established yet. Such a model requires basic
research about the neurovascular coupling mechanisms. And analysis of these ex-
perimental data requires methods that do not rely on restrictive hypotheses. e ap-
proach taken here might seem simple, yet it yields new insights about the spatiotem-
poral dynamics neurovascular coupling: We can provide spatial and temporal pa-
rameters describing where non-separable dynamics occur and how important they
are for decoding neural activity. ese results could not have been obtained with
most of the classical analysis methods. Our findings thus might help to build more
detailed models of the spatiotemporal neurovascular coupling dynamics. Moreover
tkCCA can of course also increase the prediction accuracy of neural signals in stan-
dard non-invasive fMRI experiments.
Directions of future research
e application examples on real neuroimaging data were restricted to one brain
area only. is allowed for a detailed examination of neurovascular coupling mech-
anisms in an anatomically confined region. Future work will focus on application of
the presented algorithm to completely non-invasive EEG-fMRI data in more than
one brain area. Potential areas of application include hybrid brain computer inter-
105
faces such as in [Fazli et al., ], but also the detection and localization of epilep-
tic activity. In contrast to the matrix data of the neurophysiology spectrograms this
would involve tensor data, spanning the dimensions of frequency, time and EEG
channels. As the solution of tkCCA is computed on inner products of data points,
it naturally extends to tensor data.
Moreover tkCCA is ideally suited to study changes in neurovascular coupling
mechanisms. In the application examples presented, the well controlled experi-
mental conditions guaranteed stationarity of the neurovascular coupling dynam-
ics. However, certain neuromodulators can change the hemodynamic response to
neural activation. Preliminary results show that local application of Acetylcholine
increases the baseline activity of the BOLD signal [Rauch et al., ]. Comparing
the neurovascular coupling parameters estimated by tkCCA one can detect changes
in the spatiotemporal dynamics of neurovascular coupling. is approach could
yield new insights in neurovascular coupling pathologies associated with choliner-
gic dysfunction, such as Alzheimers disease.
Anotherdirectionoffutureresearchisthe explorationofnon-linearkernelfunc-
tions for tkCCA. As the visualization of the results is of particular importance in
neuroimaging applications, we opted for linear kernels in the analyses presented.
However if the visual inspection of the features is not essential but only the question
whether two modalities are coupled or not, then non-linear kernels in combina-
tion with tkCCA are a powerful alternative to standard methods. Non-linear kernel
functions allow to estimate arbitrary non-linear state space models [Yamada and
Azimi-Sadjadi, ]. When applied to multimodal data, tkCCA with non-linear
kernels can also quantify information flow [Wu et al., ] in a Granger-Causality
sense [Granger, ].
Of course tkCCA can be readily applied to any data beyond the field of neu-
roimaging. Preliminaryresultshavebeen presentedin[BießmannandHarth,],
in which dependency structures in high dimensional graphs extracted from a social
platform on the internet were analyzed. Similar approaches can be taken for analyz-
ing news feeds crawled from the internet. Web data are inherently multimodal as
they comprise pictures, text data, geographical information, structural markers and
many more features that might contain useful information. Current web data analy-
sis frameworks do not account for the dynamic aspects of web graphs. First attempts
to tackle dynamic aspects of large web data sets have been presented in [Leskovec
et al., ]. Future work involving more efficient implementations of kCCA, for in-
stance maximum-margin based approaches to CCA as presented in [Szedmak et al.,
], will be helpful for tackling larger data sets than those we dealt with in the
neuroimaging data applications.
106 CHAPTER 9. SUMMARY AND OUTLOOK
Appendix
Appendix A
Mathematical preliminaries
A.1 Generalized Eigenvalue Equations
An eigenvector w
R
nof a square matrix A
R
n×nis defined as a vector that is
mapped to a multiple of itself when le-multiplied with A:
Aw =λw. (A.)
e corresponding factor λis called eigenvalue and eq. (A.) is called an eigenvalue
equation. e scale of the eigenvectors is undefined, since for every eigenvector w, a
rescaled version αw with α
R
is also an eigenvector with the same eigenvalue. So
it is useful to think of eq. (A.) as defining only eigendirections, i.e. directions that
do not change when projected on A.
From the eigenvalue equation we see that the matrix AλI has not full rank,
since (AλI)w=, so we can find the eigenvalues of the matrix Aby finding the
zero set of the characteristic polynomial, i.e. by solving
det(AλI) =
for λ. Since this is a n-th order polynomial in λ, this implies that there are always n
eigenvalue/eigenvector pairs. Note however, that even for real matrices Athe eigen-
values are not necessarily real, since in general even real polynomials have complex
roots. If we write all the neigenvectors as columns of a matrix W, we can write
eq. (A.) as matrix equation
AW =WΛ (A.)
where Λ is a diagonal matrix with the eigenvalues λon its diagonal. Equation (A.)
is a special form of the generalized eigenvalue equation
Aw =λBw (A.)
with A,B
R
n×nor in matrix form
AW =BWΛ (A.)
110 APPENDIX A. MATHEMATICAL PRELIMINARIES
and the generalized eigenvalues can be found by solving det(AλB) = . Oen the
matrices corresponding to Aand Bwill be real and symmetric matrices computed
from measured data, for instances if A,Bare covariance matrices of kernel matrices.
is not only leads to real eigenvalues and eigenvectors, but also to a certain orthog-
onality structure of the eigenvectors. Consider two different eigenvalue/eigenvector
pairs (λi,wi)and (λj,wj). Multiplying the respective generalized eigenvalue equa-
tions with w
jand e
ifrom the le, we obtain
w
jAwi=λiw
jBwiand w
iAwj=λjw
iBwj
Symmetry of Aand Bmeans that either λi=λjor
w
jAwi=w
jBwi=. (A.)
so that eigenvectors with different eigenvalues are orthogonal with respect to the
scalar products defined by Aand B. For ordinary eigenvalue problems, where B=
I, the eigenvectors are orthogonal wrt. the standard euclidean scalar product. In
matrix notation this means that both matrices Aand Bare simultaneously diagonal
with respect to the basis of the generalized eigenvectors
WAW =DAand WBW =DB(A.)
where DAand DBare diagonal matrices. erefore, solving the generalized eigen-
value problem for two symmetric matrices is sometimes also called simultaneous di-
agonalization. For a detailed introduction to linear algebra see for instance [Strang,
]. An advantage of eigendecompositions is that depending on how one com-
putes the matrices Aand B one can cast a large number of algorithms in this frame-
work and apply standard eigenvalue solvers. is allows to understand, implement
and extend many existing statistical learning algorithms using only equation (A.).
Generalized eigenvalue equations are a generic way of solving multivariate regres-
sion problems. Among the many applications are e.g. t-tests, analysis of variance
(ANOVA) [Henson, ], partial least squares, ridge regression [Bie and Moor,
], (kernel) fisher linear discriminant analysis [Mika et al., ], blind source
separation [Diamantaras and Kung, ] or clustering [von Luxburg, ]. De-
spite the fact that eigenvalue equationsare linear equations, they are not restricted to
linear data analysis: Many non-linear methods can be solved efficiently in the form
of equation (A.) using non-linear kernel functions, see e.g. [Schölkopf and Smola,
, Schölkopf et al., , Shawe-Taylor and Cristianini, ]. ere is a simple
intuition underlying eigendecompositions: one wants to maximize the variance in
some subspace of the data (expressed in how one computes Ain eq. (A.)), while
minimizing or constraining the variance in another subspace of the data (expressed
in how one computes Bin eq. (A.)). For a concise review of eigenvalue problems
and their application to unsupervised and supervised analysis, such as regression,
latent variable recovery and clustering, the reader is referred to e.g. [Bie et al., ,
Shawe-Taylor and Cristianini, ].
A.2. RELATION BETWEEN PCA AND LINEAR KERNEL PCA 111
A.2 Relation between PCA and linear kernel PCA
erelationbetweenthestandard PCA onthecovariancematrix CX=XXandthe
linearkernelversiononthekernelKX=XXcanbeunderstoodbestbyperforming
asingular value decomposition (SVD) (see e.g. [Strang, ]) of the data matrix X
X=ESF (A.)
where E
R
D×Nand F
R
N×Dare orthogonal matrices and Sis a diagonal ma-
trix containing the so-called singular values. It can be shown that E
R
D×Nand
F
R
N×Dcontain the eigenvectors of XXand XX, respectively. e eigenvalues
of XXand XXare the squared singular values on the diagonal of S. It is straight-
forward to see this when computing CXand KXfrom the SVD solution:
CX=XX(A.)
=ESF(ESF)
=ESFFSE
=ESISE
=ESE
and
KX=XX(A.)
=FSE(FSE)
=FSEESF
=FSISF
=FSF
where we used EE=I=FF. So to summarize when D<N(less dimen-
sions than samples), it is more efficient to compute PCA on the covariance matrix
CX=XX. When D>N(less samples than dimensions), the dimensionality of
the principal component subspace is at most (N) thus it is more efficient to
compute PCA on the linear kernel matrix KX=XX.
112 APPENDIX A. MATHEMATICAL PRELIMINARIES
Appendix B
Experimental protocols
B.1 Preparative surgery and experiment procedure
is study involved combined electrophysiological fMRI experiments in healthy
monkeys (Macaca mulatta), weighting -kg. All experiments and surgical proce-
dures were approved by the local authorities (Regierungspräsidium) and were in full
compliance with the guidelines of the European Community (EUVD //EEC)
for the care and use of laboratory animals. A custom-made plastic head holder and
a recording chamber were implanted directly on to the skull of each animal un-
der general anesthesia (balanced anesthesia consisting of isoflurane .%and fen-
tanyl μg/kg i.v. as needed). Experiments were conducted under general anesthesia.
Aer premedication with glycopyrolate (.mg/kg i.m.) and ketamine (mg/kg
i.m.), animals were preoxygenated and intubated following induction with fentanyl
(μg/kg i.v.), thiopental (mg/kg i.v.) and succinyl chloride (mg/kg i.v.) and arti-
ficially ventilated (Servo Ventilator C; Siemens, Germany) maintaining an end-
tidal CO of mmHg and oxygen saturation of over %. Anesthesia was main-
tained with remifentanil (.-μg/kg i.v.) and muscle relaxation was achieved with
mivacurium (-mg/kg/h i.v.). Body temperature was kept at -.C, Lactated
Ringer�s solution (.%glucose) was infused at mL/kg/h i.v.. Drops of %cy-
clopentolate hydrochloride were instilled into each eye to achieve mydriasis and
hard contact lenses (Zeiss) with the appropriate dioptric power were used to bring
the eyes to focus on the stimulus plane. Visual stimuli were presented binocularly
using XGA fiber-optic projectors (Silent Vision; AVOTEC, Stuart, FL). Subject�s
fiber-scopes were aligned to the fovea of each eye using a modified fundus cam-
era (RC; Zeiss). e FOV of the scope was × degrees in visual angle. Vi-
sual stimuli were generated on-fly by an OpenGL based custom program. Stimulus
timings and neuronal recordings were synchronized to image acquisition and con-
trolled by the network connected PC which ran the custom program under QNX
real time OS [Logothetis et al., ].
114 APPENDIX B. EXPERIMENTAL PROTOCOLS
B.2 Data Acquisition
Wemademeasurementsinavertical.Tscannerwithacmdiameterbore(BioSpec
/v; Bruker Medical, Ettlingen, Germany) [Logothetis et al., ]. Customized,
small radio-frequency coils (-mm diameter) were used to increase sensitivity
overarecordingchamber. Forpositioningamulti-channelelectrode,aGEFI/FLASH
sequence was used with a FOV of ×mm at a matrix of × and a slice
thicknessofmm. Multi-slicefMRIwascarriedoutbyusingmulti-shot(segmented)
GE-EPI. Acquisition parameters of fMRI were: FOV of × or ×mm; or
 slices (mm thickness); matrix size of × or ×; segments; echo time
(TE) of -ms; repetition time (TR) of -ms; flip angle of -°; sweep
width of kHz; -sec dummy scanning. For anatomical reference, GEFI/FLASH
was used with the same FOV and slice angle as fMRI but with double matrix size and
.-mm slice thickness. All imaging data were reconstructed by using Bruker so-
ware. A small craniotomy was made in each experiment to position the MRI com-
patible multi-channel electrode in early visual areas, V and V. e electrode was
made of carbon composites as a carrier (×μm cross section), Pt/Ir (/)
wires (μm diameter; California fine wire, USA) as  contact sites in the brain
and flattened Ag wires glued on the carbon sha as ground and noise sensor in
the recording chamber. Procedures for signal amplification and noise minimiza-
tion during simultaneous recording based on current measurement are described
in detail elsewhere [Logothetis et al., , Oeltermann et al., ]. e noise in-
terference caused by the alternating magnetic field gradients during fMRI has two
major components; the far-interference that can be reduced by adjusted counter-
current injection through a junction electrode (Ag wire) placed in the mouth, and
the near-interference, that is not measurable since it is too close to the recording
site to permit the placement of additional sensors. is component can be approxi-
mated by a linear sum of gradient magnetic field changes in the XYZ directions and
minimized within the preamplifier by adjusting each XYZ factor. For our multi-
channel electrode, the far-interference compensation was done by a separate device
for all channels, while the near-interference compensation was done within each
channel-amplifier. For data collection, out of  channels were selected as visu-
ally responsive channels by using a computer-controlled receptive field plotting pro-
gram. Amplified electrophysiological data were digitized with a bit AD card (PCI-
E; National Instruments) at . kHz to collect broad band neural activities
including local field potential, spiking activities and residual interference noise. For
all measurements of spontaneous uctuations, the eyes were kept open in darkness,
with stimulus projectors powered off to avoid the effects of LCD flicker [Logothetis
et al., ]. We collected spontaneous activity data and data during full field visual
stimulation (blank-stimulus-blank/--s, visual stimulation with polar gratings,
%contrast, rotating infullfield, sessionduration minutes). Intotal weacquired
 sessions of evoked activity recordings and  sessions of spontaneous activity.
B.3. DATA PREPROCESSING 115
B.3 Data Preprocessing
Data analysis was done using custom-written soware in MATLAB (e Math-
Works, Natick, MA). Regions of interest (ROIs) were manually defined according to
anatomically defined borders [Saleem et al., ]. Aer discarding the first vol-
umes, voxeltime courses withineach ROIwere centeredandnormalized to standard
deviation units (SDU) based on blank periods. In addition to the ROI comprising
the entire primary and parts of secondary visual cortex, we masked the ROIs with
isotropic distance dependent ROIs at distances of , , , ,  and mm away from
the recording site. For neuronal data broad-band signals were decimated by a factor
of and a PCA based denoising method was applied to remove residual interfer-
ence (see section .). Aer this removal of the scanner gradient induced artefacts,
the signal was split into frequency bands fi(f=-, f=-, f=-, f=-,
f=-, f=-, f=-Hz). en the signal was converted into ab-
solute amplitude, resampled to Hz aer appropriate low pass filtering and nor-
malized to standard deviation units based on blank periods. Frequency separation
was based on results from information theoretical analysis [Belitski et al., ].
116 APPENDIX B. EXPERIMENTAL PROTOCOLS
Bibliography
G. K. Aguirre, E. Zarahn, and M. DEsposito. e variability of human, BOLD hemodynamic re-
sponses. Neuroimage, ():–, .
A. Aizerman, E. Braverman, and L. Rozonoer. eoretical foundations of the potential function
method in pattern recognition learning. Automation and Remote Control, :–, .
S. Akaho. A kernel method for canonical correlation analysis. In Proceedings of the International
Meeting of Psychometric Society, .
P. J. Allen, G. Polizzi, K. Krakow, D. Fish, and L. Lemieux. Identification of EEG events in the MR
scanner: the problem of pulse artifact and a method for its subtraction. Neuroimage, ():–,
.
P. J. Allen, O. Josephs, and R. Turner. A method for removing imaging artifact from continuous EEG
recorded during functional MRI. Neuroimage, ():–, .
T. Anderson. An Introduction to Multivariate Statistical Analysis. Wiley, .
T. Arthur and P. Nader. In Vivo Optical Imaging of Brain Function, chapter Intraoperative Optical
Imaging. CRC Press, .
D. Attwell and C. Iadecola. e neural basis of functional brain imaging signals. Trends in Neuro-
sciences, ():–, .
D. Attwell and S. B. Laughlin. An energy budget for signaling in the grey matter of the brain. J Cereb
Blood Flow Metab, ():–, .
F. R. Bach and M. I. Jordan. Kernel independent component analysis. Journal of Machine Learning
Research, :–, .
F. R. Bach and M. I. Jordan. A probabilistic interpretation of canonical correlation analysis. Technical
Report, .
S. Baillet. Electromagnetic brain mapping. Signal Processing Magazine, IEEE, (): , .
G. Bakir, J. Weston, and B. B. Schölkopf. Learning to find pre-images. Advances in Neural Information
Processing Systems, .
A. Belitski, A. Gretton, C. Magri, Y. Murayama, M. A. Montemurro, N. K. Logothetis, and S. Panzeri.
Low-frequency local field potentials and spikes in primary visual cortex convey independent visual
information. J Neurosci, ():–, .
118 BIBLIOGRAPHY
A. J. Bell and T. J. Sejnowski. An information-maximization approach to blind separation and blind
deconvolution. Neural Computation, ():–, .
H. Benali, M. Pélégrini-Issac, and F. Kruggel. Spatio-temporal covariance model for medical images
sequences: Application to functional MRI data. Information Processing in Medical Imaging, :
–, .
C. M. Bennett, G. L. Wolford, and M. B. Miller. e principled control of false positives in neuroimag-
ing. Soc Cogn Affect Neurosci, ():–, .
H. Berger. Über das Elektroenkephalogramm des Menschen. Archiv für Psychatrie, , .
S. Bergstrand, M. B. Åberg, T. Niiniskorpi, and J. Wessberg. Towards unified analysis of EEG and
fMRI: a comparison of classifiers for single-trial pattern recognition. International Joint Conference
on Biomedical Engineering Systems and Technologies - BIOSTEC, pages –, .
J. Berwick, D. Johnston, M. Jones, J. Martindale, C. Martin, A. J. Kennerley, P. Redgrave, and J. E. W.
Mayhew. Fine detail of neurovascular coupling revealed by spatiotemporal analysis of the hemo-
dynamic response to single whisker stimulation in rat barrel cortex. Journal of Neurophysiology, 
():–, .
A. Bhattacharyya, F. Bießmann, J. Veit, R. Kretz, and G. Rainer. Functional and laminar dissocia-
tion between muscarinic and nicotinic cholinergic neuromodulation in the primary visual cortex.
Submitted, .
T. D. Bie and N. Cristianini. Kernel methods for exploratory pattern analysis: a demonstration on text
data. Springer Lecture Notes in Computer Science, :–, .
T. D. Bie and B. D. Moor. On the regularization of canonical correlation analysis. Int. Sympos. ICA
and BSS, .
T. D. Bie, N. Cristianini, and R. Rosipal. Eigenproblems in pattern recognition. Handbook of Geometric
Computing : Applications in Pattern Recognition, Computer Vision, Neuralcomputing, and Robotics,
.
F. Bießmann and A. Harth. Analysing dependency dynamics in web data. AAAI Spring Symposium,
.
F. Bießmann, A. Rauch, F. C. Meinecke, X. Zhang, G. Rainer, K.-R. Müller, and N. K. Logothetis.
InvestigatingtherelationshipbetweenpharmacologicalMRIandelectrophysiologyusingcanonical
correlation analysis. Forum of European Neuroscience, :, .
F. Bießmann, A. Gretton, F. C. Meinecke, X. Zhang, G. Rainer, N. K. Logothetis, K.-R. Müller, and
A. Rauch. Investigating neurovascular coupling using canonical correlation analysis between phar-
macological MRI and electrophysiology. BMC Neuroscience, (Suppl ):P, .
F. Bießmann, F. C. Meinecke, A. Bhattacharyya, J. Veit, R. Kretz, K.-R. Müller, and G. Rainer. Com-
parison of v receptive fields mapped with spikes and local field potentials. Computational and
systems neuroscience  (COSYNE), , a.
F. Bießmann, F. C. Meinecke, A. Gretton, A. Rauch, G. Rainer, N. K. Logothetis, and K.-R. Müller.
Temporal kernel cca and its application in multimodal neuronal data analysis. Machine Learning
Journal, (-):—, b.
BIBLIOGRAPHY 119
F. Bießmann, Y. Murayama, N. K. Logothetis, K.-R. Müller, and F. C. Meinecke. Improved decod-
ing of neural activity from fMRI signals: Towards non-separable spatiotemporal deconvolutions.
Neuroimage (in revision), a.
F. Bießmann, S. M. Plis, F. C. Meinecke, T. Eichele, and K.-R. Müller. Analysis of multimodal neu-
roimaging data. IEEE Reviews in Biomedical Engineering (accepted), b.
D. A. Boas, S. R. Jones, A. Devor, T. J. Huppert, and A. M. Dale. A vascular anatomical network model
of the spatio-temporal response to brain activation. Neuroimage, ():–, .
I. Bojak, T. F. Oostendorp, A. T. Reid, and R. tter. Towards a model-based integration of co-
registered electroencephalography/functional magnetic resonance imaging data with realistic neu-
ral population meshes. Philos Transact A Math Phys Eng Sci, ():–, .
M. Borga. Learning multidimensional signal processing. PhD esis, Dissertation No. , Linping
Studies in Science and Technology, .
A. Bozkurt, A. Rosen, H. Rosen, and B. Onaral. A portable near infrared spectroscopy system for
bedside monitoring of newborn brain. Biomed Eng Online, ():, .
N. Brunel. Dynamics of networks of randomly connected excitatory and inhibitory spiking neurons.
J Physiol Paris, (-):–, .
E. Bullmore, M. Brammer, S. C. Williams, S. Rabe-Hesketh, N. Janot, A. David, J. Mellers, R. Howard,
and P. Sham. Statistical methods of estimation and inference for functional mr image analysis.
Magn Reson Med, ():–, .
R. B. Buxton, E. C. Wong, and L. R. Frank. Dynamics of blood flow and oxygenation changes during
brain activation: the balloon model. Magn Reson Med, ():–, .
R. B. Buxton, K. Uludag, D. J. Dubowitz, and T. T. Liu. Modeling the hemodynamic response to brain
activation. Neuroimage, :–, .
V. D. Calhoun, T. Adali, L. K. Hansen, J. Larsen, and J. J. Pekar. ICA of functional MRI data: an
overview. Proceedings of the International Workshop on Independent Component Analysis and Blind
Signal Separation, pages –, .
V. D. Calhoun, J. Liu, and T. Adalı. A review of group ICA for fMRI data and ICA for joint inference
of imaging, genetic, and erp data. Neuroimage, (S):S–S, .
J.-F. Cardoso and A. Souloumiac. Blind beamforming for non gaussian signals. IEE Proceedings-F,
:–, .
R. Caton. e electric currents of the brain. British Medical Journal, ():, .
Y. Chen, P. Namburi, L. T. Elliott, J. Heinzle, C. S. Soon, M. W. Chee, and J.-D. Haynes. Cortical
surface-based searchlight decoding. NeuroImage, (): , . ISSN -.
Multivariate Decoding and Brain Reading.
D. Cohen. Magnetoencephalography: evidenceofmagneticfieldsproducedbyalpha-rhythmcurrents.
Science, ():–, .
D. M. Cole, S. M. Smith, and C. F. Beckmann. Advances and pitfalls in the analysis and interpretation
of resting-state fMRI data. Front Syst Neurosci, :, .
120 BIBLIOGRAPHY
N. M. Correa, T. Eichele, T. Adalı, Y.-O. Li, and V. D. Calhoun. Multi-set canonical correlation analysis
for the fusion of concurrent single trial erp and functional MRI. Neuroimage, ():–,
.
T. Cover and J. omas. Elements of information theory. Wiley series in telecommunications and signal
processing, .
D. Cox and R. Savoy. Functional magnetic resonance imaging (fMRI)“brain reading”: detecting
and classifying distributed patterns of fMRI activity in human visual cortex. Neuroimage, ():
–, .
B. N. Cuffin and D. Cohen. Comparison of the magnetoencephalogram and electroencephalogram.
Electroencephalogr Clin Neurophysiol, ():–, .
F. L. da Silva and E. Niedermayer. Biophysical aspects of EEG and magnetoencephalogram generation.
Williams & Wilkins, .
A. M. Dale and E. Halgren. Spatiotemporal mapping of brain activity by integration of multiple imag-
ing modalities. Curr Opin Neurobiol, ():–, .
A. M. Dale and M. I. Sereno. Improved localization of cortical activity by combining EEG and meg
with MRI cortical surface reconstruction: A linear approach. Journal of Cognitive Neuroscience, :
–, .
A. M. Dale, A. K. Liu, B. R. Fischl, R. L. Buckner, J. W. Belliveau, J. D. Lewine, and E. Halgren. Dynamic
statistical parametric mapping: combining fMRI and meg for high-resolution imaging of cortical
activity. Neuron, ():–, .
J. Daunizeau, C. Grova, G. Marrelec, J. Mattout, S. Jbabdi, M. Pélégrini-Issac, J.-M. Lina, and H. Be-
nali. Symmetricalevent-relatedEEG/fMRI informationfusionin avariationalbayesianframework.
Neuroimage, ():–, .
H. P. O. de Beeck. Against hyperacuity in brain reading: spatial smoothing does not hurt multivariate
fMRI analyses? Neuroimage, ():–, .
J. C. de Munck, S. I. Gonçalves, T. J. C. Faes, J. P. A. Kuijer, P. J. W. Pouwels, R. M. Heethaar, and
F. H. L. da Silva. A study of the brains resting state based on alpha band power, heart rate and
fMRI. Neuroimage, ():–, .
S. Debener, M. Ullsperger, M. Siegel, and A. K. Engel. Single-trial EEG-fMRI reveals the dynamics of
cognitive function. Trends Cogn Sci, ():–, .
A. Devor, A. K. Dunn, M. L. Andermann, I. Ulbert, D. A. Boas, and A. M. Dale. Coupling of total
hemoglobin concentration, oxygenation, and neural activity in rat somatosensory cortex. Neuron,
():–, .
A. Devor, I. Ulbert, A. K. Dunn, S. N. Narayanan, S. R. Jones, M. L. Andermann, D. A. Boas, and A. M.
Dale. Coupling of the cortical hemodynamic response to cortical and thalamic neuronal activity.
Proc Natl Acad Sci USA, ():–, .
K. I. Diamantaras and S. Y. Kung. Principal component neural networks: theory and applications. Wiley,
.
R. Duda, P. Hart, and D. Stork. Pattern classification. Wiley, .
BIBLIOGRAPHY 121
A. K. Dunn, A. Devor, A. M. Dale, and D. A. Boas. Spatial extent of oxygen metabolism and hemo-
dynamic changes during functional activation of the rat somatosensory cortex. Neuroimage, ():
–, .
T. Eichele, K. Specht, M. Moosmann, M. L. A. Jongsma, R. Q. Quiroga, H. Nordby, and K. Hugdahl.
Assessing the spatiotemporal evolution of neuronal activation with single-trial event-related po-
tentials and functional MRI. Proc Natl Acad Sci USA, ():–, .
T. Eichele, V. D. Calhoun, M. Moosmann, K. Specht, M. L. A. Jongsma, R. Q. Quiroga, H. Nordby, and
K. Hugdahl. Unmixing concurrent EEG-fMRI with parallel independent component analysis. Int
J Psychophysiol, ():–, .
S. Fazli, J. Mehnert, G. Curio, A. Villringer, K.-R. Müller, J. Steinbrink, and B. Blankertz. Enhanced
performance by a hybrid NIRS-EEG brain computer interface. Neuroimage, .
C. Fischer, F. Dailler, and D. Morlet. Noveltyp elicited by thesubjects ownnamein comatose patients.
Clin Neurophysiol, ():–, .
R. A. Fisher. e use of multiple measurements in taxonomic problems. Annals of Eugenics, :–,
.
O. Friman, M. Borga, P. Lundberg, and H. Knutsson. Exploratory fMRI analysis by autocorrelation
maximization. Neuroimage, ():–, .
K. J. Friston. Modalities, modes, and models in functional neuroimaging. Science, ():–,
.
K. J. Friston and C. Buchel. Statistical Parametric Mapping: e Analysis of Functional Brain Images.
Elseviers Science Publishers, .
K. J. Friston, P. Jezzard, and R. Turner. Analysis of functional MRI time-series. Human Brain Mapping,
:–, .
K. J. Friston, A. P. Holmes, J. B. Poline, P. J. Grasby, S. C. Williams, R. S. Frackowiak, and R. Turner.
Analysis of fMRI time-series revisited. Neuroimage, ():–, .
K. J. Friston, A. Mechelli, R. Turner, and C. J. Price. Nonlinear responses in fMRI: the balloon model,
volterra kernels, and other hemodynamics. Neuroimage, ():–, .
K. J. Friston, L. Harrison, and W. Penny. Dynamic causal modelling. Neuroimage, ():–,
.
R. D. Frostig, E. E. Lieke, D. Y. Tso, and A. Grinvald. Cortical functional architecture and local cou-
plingbetweenneuronalactivityandthemicrocirculationrevealedbyinvivohigh-resolutionoptical
imaging of intrinsic signals. Proc Natl Acad Sci USA, ():–, .
K. Fukumizu, F. R. Bach, and A. Gretton. Statistical consistency of kernel cca. Journal of Machine
Learning Research, :–, .
C. Fyfe. Two methods for sparsifying probabilistic canonical correlation analysis. Proceedings of the
International Conference on Neural Information Processing, pages –, .
C. Fyfe and P. Lai. ICA using kernel canonical correlation analysis. In Proc. Int. Workshop on Inde-
pendent Component Analysis, .
122 BIBLIOGRAPHY
C. Fyfe and G. Leen. Stochastic processes for canonical correlation analysis. Proceedings of the th
European Symposium on Artificial Neural Networks, .
P. Geladi and B. R. Kowalski. Partial least-squares regression: A tutorial. Analytica Chimica Acta,
.
H. Girouard andC. Iadecola. Neurovascular coupling in the normal brain and in hypertension, stroke,
and alzheimer disease. Journal of Applied Physiology, ():, .
J. B. M. Goense and N. K. Logothetis. Laminar specificity in monkey v using high-resolution se-fMRI.
Magnetic Resonance Imaging, ():–, .
J. B. M. Goense and N. K. Logothetis. Neurophysiology of the BOLD fMRI signal in awake monkeys.
Curr Biol, ():–, .
R. I. Goldman, J. Stern, J. EngelJr, and M. Cohen. Acquiring simultaneous EEG and functional MRI.
Clinical Neurophysiology, ():–, .
R. I. Goldman, J. M. Stern, J. Engel, and M. S. Cohen. Simultaneous EEG and fMRI of the alpha
rhythm. Neuroreport, ():–, .
J. Gotman, C. Grova, A. Bagshaw, E. Kobayashi, Y. Aghakhani, and F. Dubeau. Generalized epileptic
discharges show thalamocortical activation and suspension of the default state of the brain. Proc
Natl Acad Sci USA, ():–, .
C. W. J. Granger. Investigating causal relations by econometric models and cross-spectral methods.
Econometrica, :–, .
M. D. Greicius, B. Krasnow, A. L. Reiss, and V. Menon. Functional connectivity in the resting brain: a
network analysis of the default mode hypothesis. Proc Natl Acad Sci USA, ():–, .
A. Gretton, R. Herbrich, A. J. Smola, O. Bousquet, and B. Schölkopf. Kernel methods for measuring
independence. Journal of Machine Learning Research, :–, .
A. Grinvald, E. Lieke, R. D. Frostig, C. D. Gilbert, and T. N. Wiesel. Functional architecture of cortex
revealed by optical imaging of intrinsic signals. Nature, ():–, .
A. Grinvald, E. E. Lieke, R. D. Frostig, and R. Hildesheim. Cortical point-spread function and long-
range lateral interactions revealed by real-time optical imaging of macaque monkey primary visual
cortex. J Neurosci, ( Pt ):–, .
F. Grynszpan and D. B. Geselowitz. Model studies of the magnetocardiogram. Biophys J, ():–,
.
R. Guibert, C. Fonta, and F. Plouraboué. Cerebral blood flow modeling in primate cortex. J Cereb
Blood Flow Metab, ():–, .
E. Hamel. Perivascular nerves and the regulation of cerebrovascular tone. Journal of applied physiology,
():, .
D. A. Handwerker, J. M. Ollinger, and M. D’Esposito. Variation of BOLD hemodynamic responses
across subjects and brain regions and their effects on statistical analyses. Neuroimage, ():
–, .
D. R. Hardoon and J. Shawe-Taylor. Sparse canonical correlation analysis. Machine Learning, .
BIBLIOGRAPHY 123
D. R. Hardoon, J. Mourão-Miranda, M. Brammer, and J. Shawe-Taylor. Unsupervised analysis of fMRI
data using kernel canonical correlation. Neuroimage, ():–, .
J. V. Haxby, M. Gobbini, M. Furey, A. Ishai, J. Schouten, and P. Pietrini. Distributed and overlapping
representations of faces and objects in ventral temporal cortex. Science, ():, .
J.-D. Haynes and G. Rees. Predicting the stream of consciousness from activity in human visual cortex.
Current Biology, ():–, .
J.-D. Haynes and G. Rees. Decoding mental states from brain activity in humans. Nature Reviews
Neuroscience, ():–, .
J.-D. Haynes, K. Sakai, G. Rees, S. Gilbert, C. Frith, and R. E. Passingham. Reading hidden intentions
in the human brain. Current Biology, ():–, .
R. Henson. Demystifying parametric analyses: Illustrating canonical correlation analysis as the mul-
tivariate general linear model. Multiple Linear Regression Viewpoints, .
A. L. Hodgkin and A. F. Huxley. A quantitative description of membrane current and its application
to conduction and excitation in nerve. Journal of Physiology, :–, .
M. Hollander and D. Wolfe. Nonparametric statistical methods. John Wiley & Sons, Inc., .
H. Hotelling. Relations between two sets of variates. Biometrika, ():–, .
D. Hubel and T. Wiesel. Receptive fields of single neurones in the cats striate cortex. e Journal of
Physiology, :–, .
D. H. Hubel and T. N. Wiesel. Receptive fields, binocular interaction and functional architecture in
the cats visual cortex. e Journal of Physiology, :–, .
A. Hyvärinen. Fast and robust fixed-point algorithms for independent component analysis. IEEE
Transactions on Neural Networks, ():–, .
C. Iadecola. Neurovascular regulation in the normal brain and in alzheimers disease. Nature Reviews
Neuroscience, ():–, .
C.-H. Im, A. Gururajan, N. Zhang, W. Chen, and B. He. Spatial resolution of EEG cortical source
imaging revealed by localization of retinotopic organization in human primary visual cortex. Jour-
nal of Neuroscience Methods, ():–, .
J. R. Ives, S. Warach, F. Schmitt, R. R. Edelman, and D. L. Schomer. Monitoring the patients EEG
during echo planar MRI. Electroencephalogr Clin Neurophysiol, ():–, .
K. Jann, R. Wiest, M. Hauf, K. Meyer, C. Boesch, J. Mathis, G. Schroth, T. Dierks, and T. Koenig.
Bold correlates of continuously fluctuating epileptic activity isolated by independent component
analysis. Neuroimage, ():–, .
F. F. Jöbsis. Noninvasive, infrared monitoring of cerebral and myocardial oxygen sufficiency and cir-
culatory parameters. Science, ():–, .
C. Jordan. Essai sur la Géometrie à ndimensions. Bull. Soc. Math. France, :–, . = Oeuvres
(Tome III), Gauthiers-Villars, Paris, , -.
R. E. Kalman. A new approach to linear filtering and prediction problems. Journal of Basic Engineering,
():–, .
124 BIBLIOGRAPHY
Y. Kamitani and Y. Sawahata. Spatial smoothing hurts localization but not information: Pitfalls for
brain mappers. Neuroimage, .
K. Katanoda, Y. Matsuda, and M. Sugishita. A spatio-temporal regression model for the analysis of
functional MRI data. Neuroimage, ():–, .
S. Katzner, I. Nauhaus, A. Benucci, V. Bonin, D. L. Ringach, and M. Carandini. Local origin of field
potentials in visual cortex. Neuron, ():–, .
A. L. Keller, B. Weber, and N. K. Logothetis. Systematic investigation of vascular corrosion casts of the
macaque monkey brain. Proceedings of the th Forum of European Neuroscienc (FENS), .
J. R. Kettenring. Canonical analysis of several sets of variables. Biometrika, ():–, .
S. J. Kiebel and K. J. Friston. Statistical parametric mapping for event-related potentials: I. generic
considerations. Neuroimage, ():–, .
T. Kim, S. Wong, and R. Cipolla. Tensor canonical correlation analysis for action classification. IEEE
Computer Society Conference on Computer Vision and Pattern Recognition, .
K. Krakow, F. G. Woermann, M. R. Symms, P. J. Allen, L. Lemieux, G. J. Barker, J. S. Duncan, and
D. R. Fish. Eeg-triggered functional MRI of interictal epileptiform activity in patients with partial
seizures. Brain,  ( Pt ):–, .
N. Kriegeskorte, R. Goebel, and P. Bandettini. Information-based functional brain mapping. Proceed-
ings of the National Academy of Sciences, ():, .
N. Kriegeskorte, W. K. Simmons, P. S. F. Bellgowan, and C. I. Baker. Circular analysis in systems
neuroscience: the dangers of double dipping. Nat Neurosci, ():–, .
J. T. Kwok and I. W. Tsang. e pre-image problem in kernel methods. Proceedings of the Twentieth
International Conference on Machine Learning, .
N. Lange, S. C. Strother, J. R. Anderson, F. A. Nielsen, A. P. Holmes, T. Kolenda, R. Savoy, and L. K.
Hansen. Plurality and resemblance in fMRI data analysis. Neuroimage, ( Pt ):–, .
W. Larimore. Canonical variate analysis in identification, filtering, and adaptive control. Proceedings
of the th Conference on Decision and Control, pages –, .
H. Laufs. Endogenous brain oscillations and related networks detected by surface EEG-combined
fMRI. Human Brain Mapping, ():–, .
H. Laufs and J. S. Duncan. Electroencephalography/functional MRI in human epilepsy: what it cur-
rently can and cannot do. Curr Opin Neurol, ():–, .
H. Laufs, K. Krakow, P. Sterzer, E. Eger, A. Beyerle, A. Salek-Haddadi, and A. Kleinschmidt. Elec-
troencephalographic signatures of attentional and cognitive default modes in spontaneous brain
activity fluctuations at rest. Proc Natl Acad Sci USA, ():–, .
H. Laufs, J. Daunizeau, D. W. Carmichael, and A. Kleinschmidt. Recent advances in recording electro-
physiological data simultaneously with magnetic resonance imaging. Neuroimage, ():–,
.
A.-M. Legendre. Nouvelles méthodes pour la détermination des orbites des comètes, chap-
ter Sur la methode des moindres quarres. Firmin Didot, http://imgbase-scd-ulp.u-
strasbg.fr/displayimage.php?pos=-, .
BIBLIOGRAPHY 125
G. Lehev and M. A. L. Nicolelis. State-of-the-art microwire array design for chronic neural recordings
in behaving animals–methods for neural ensemble recordings. in Nicolelis MAL, (Ed) Methods for
Neural Ensemble Recordings. nd edition. Boca Raton (FL): CRC Press Frontiers in Neuroscience.,
.
L. Lemieux, P. J. Allen, F. Franconi, M. R. Symms, and D. R. Fish. Recording of EEG during fMRI
experiments: patient safety. Magn Reson Med, ():–, .
L. Lemieux, A. Salek-Haddadi, O. Josephs, P. J. Allen, N. Toms, C. Scott, K. Krakow, R. Turner, and
D. Fish. Event-related fMRI with simultaneous and continuous EEG: Description of the method
and initial case report. Neuroimage, ():–, .
S. Lemm, B. Blankertz, T. Dickhaus, and K.-R. Müller. Introduction to machine learning for brain
imaging. Neuroimage, ():–, .
in press.
D. A. Leopold, Y. Murayama, and N. K. Logothetis. Very slow activity fluctuations in monkey visual
cortex: Implications for functional brain imaging. Cerebral Cortex, pages –, .
J. Leskovec, L. Backstrom, and J. Kleinberg. Meme-tracking and the dynamics of the news cycle. In
Knowledge Discovery in Databases, .
S. Leurgans, R. Moyeed, and B. Silverman. Canonical correlation analysis when the data are curves.
Journal of the Royal Statistical Society, .
A. K. Liu, J. W. Belliveau, and A. M. Dale. Spatiotemporal imaging of human brain activity using
functional MRI constrained magnetoencephalography data: Monte carlo simulations. Proc Natl
Acad Sci USA, ():–, .
Z. Liu and B. He. fMRI-EEG integrated cortical source imaging by use of time-variant spatial con-
straints. Neuroimage, ():–, .
N. K. Logothetis. e underpinnings of the BOLD functional magnetic resonance imaging signal. J
Neurosci, ():–, .
N. K. Logothetis. What we can do and what we cannot do with fMRI. Nature, ():–,
.
N. K. Logothetis and B. A. Wandell. Interpreting the BOLD signal. Annu Rev Physiol, :–,
.
N. K. Logothetis, J. Pauls, M. A. Augath, T. Trinath, and A. Oeltermann. Neurophysiological investi-
gation of the basis of the fMRI signal. Nature, ():–, .
N. K. Logothetis, Y. Murayama, M. A. Augath, T. Steffen, J. Werner, and A. Oeltermann. How not to
study spontaneous activity. Neuroimage, ():–, .
N. K. Logothetis, M. A. Augath, Y. Murayama, A. Rauch, F. Sultan, J. B. M. Goense, A. Oeltermann,
and H. Merkle. e effects of electrical microstimulation on cortical signal propagation. Nature
Neuroscience, ():–, .
Y. Lu, C. Grova, E. Kobayashi, F. Dubeau, and J. Gotman. Using voxel-specific hemodynamic re-
sponse function in EEG-fMRI data analysis: An estimation and detection model. Neuroimage, 
():–, .
126 BIBLIOGRAPHY
J. H. Macke, G. Zeck, and M. Bethge. Receptive fields without spike-triggering. In Advances in Neural
Information Processing Systems : Proceedings of the  Conference, pages –, .
D. Mantini, M. G. Perrucci, C. D. Gratta, G. L. Romani, and M. Corbetta. Electrophysiological sig-
natures of resting state networks in the human brain. Proc Natl Acad Sci USA, ():–,
.
J. Martindale, J. Mayhew, J. Berwick, M. Jones, C. Martin, D. Johnston, P. Redgrave, and Y. Zheng. e
hemodynamic impulse response to a single neural event. J Cereb Blood Flow Metab, ():–,
.
E. Martínez-Montes, P. A. Valdés-Sosa, F. Miwakeichi, R. I. Goldman, and M. S. Cohen. Concurrent
EEG/fMRI analysis by multiway partial least squares. Neuroimage, ():–, .
F. D. Martino, A. W. de Borst, G. Valente, R. Goebel, and E. Formisano. Predicting EEG single trial
responses with simultaneous fMRI and relevance vector machine regression. Neuroimage, ():
–, .
M. McKeown and T. J. Sejnowski. Independent component analysis of fMRI data: Examining the
assumptions. Human Brain Mapping, pages –, .
T. Melzer, M. Reiter, and H. Bischof. Nonlinear feature extraction using generalized canonical corre-
lation analysis. International Conference for Artificial Neural Networks, pages –, .
S. Mika, B. B. Schölkopf, A. Smola, K.-R. Müller, M. Scholz, and G. Rätsch. Kernel pca and de–noising
in feature spaces. In Advances in Neural Information Processing Systems, pages –, .
M. M. Monti. Statistical analysis of fMRI time-series: A critical review of the glm approach. Front
Hum Neurosci, :, .
M. Moosmann, P. Ritter, I. Krastel, A. Brink, S. ees, F. Blankenburg, B. Taskin, H. Obrig, and A. Vill-
ringer. Correlates of alpha rhythm in functional magnetic resonance imaging and near infrared
spectroscopy. Neuroimage, ():–, .
M. Moosmann, T. Eichele, H. Nordby, K. Hugdahl, and V. D. Calhoun. Joint independent component
analysis for simultaneous EEG-fMRI: principle and simulation. Int J Psychophysiol, ():–,
.
V. B. Mountcastle. Modality and topographic properties of single neurons of cats somatic sensory
cortex. Journal of Neurophysiology, ():–, .
J. Mourão-Miranda, K. J. Friston, and M. Brammer. Dynamic discrimination analysis: a spatial-
temporal svm. Neuroimage, ():–, .
T. Muehlemann, D. Haensse, and M. Wolf. Wireless miniaturized in-vivo near infrared imaging. Opt
Express, ():–, .
C. Mulert and L. Lemieux. Eeg-fMRI: Physiological basis, technique, and applications. Springer, .
K.-R. Müller, S. Mika, G. Ratsch, K. Tsuda, and B. B. Schölkopf. An introduction to kernel-based
learning algorithms. IEEE Transactions on Neural Networks, ():–, .
K.-R. Müller, C. Anderson, and B. Birch. Linear and nonlinear methods for brain computer interfaces.
IEEE Transactions on Neural Systems and Rehabilitation Engineering, ():–, .
BIBLIOGRAPHY 127
S. Murakami and Y. Okada. Contributions of principal neocortical neurons to magnetoencephalog-
raphy and electroencephalography signals. Journal of Physiology, ():-, .
Y. Murayama, F. Bießmann, F. C. Meinecke, K.-R. Müller, M. A. Augath, A. Oeltermann, and N. K.
Logothetis. Relationship between neural and hemodynamic signals during spontaneous activity
studied with temporal kernel cca. Magnetic Resonance Imaging, ():–, .
J. Niessing, B. Ebisch, K. E. Schmidt, M. Niessing, W. Singer, and R. A. W. Galuske. Hemodynamic
signals correlate tightly with synchronized gamma oscillations. Science, ():–, .
K. A. Norman, S. M. Polyn, G. J. Detre, and J. V. Haxby. Beyond mind-reading: multi-voxel pattern
analysis of fMRI data. Trends in Cognitive Sciences, ():–, .
P. L. Nunez and R. Srinivasan. Electric fields of the brain: the neurophysics of EEG. Oxford University
Press, .
A. Oeltermann, M. A. Augath, and N. K. Logothetis. Simultaneous recording of neuronal signals and
functional nmr imaging. Magnetic Resonance Imaging, ():–, .
S. Ogawa, T. M. Lee, A. S. Nayak, and P. Glynn. Oxygenation-sensitive contrast in magnetic resonance
image of rodent brain at high magnetic fields. Magn Reson Med, ():–, .
S. Ogawa, D. W. Tank, R.Menon, J. M. Ellermann, S.G.Kim, H.Merkle, and K.Ugurbil. Intrinsic signal
changes accompanying sensory stimulation: functional brain mapping with magnetic resonance
imaging. Proc Natl Acad Sci USA, ():–, .
T. F. Oostendorp and A. van Oosterom. Source parameter estimation in inhomogeneous volume
conductors of arbitrary shape. IEEE Trans Biomed Eng, ():–, .
D. Ostwald, C. Porcaro, and A. P. Bagshaw. An information theoretic approach to EEG-fMRI inte-
gration of visually evoked responses. Neuroimage, ():–, .
D. Ostwald, C. Porcaro, and A. P. Bagshaw. Voxel-wise information theoretic EEG-fMRI feature in-
tegration. Neuroimage, pages –, .
K. Pearson. On lines and planes of closest fit to systems of points in space. Philosophical Magazine, :
–, .
G. Pfurtscheller, B. Z. Allison, C. Brunner, G. Bauernfeind, T. Solis-Escalante, R. Scherer, T. O. Zander,
G. Mueller-Putz, C. Neuper and N. Birbaumer e hybrid BCI. Front Neurosci, :-, .
S. M. Plis, V. D. Calhoun, M. P. Weisend, T. Eichele, and T. Lane. Meg and fMRI fusion for non-linear
estimation of neural and BOLD signal changes. Frontiers in neuroinformatics, :, .
N. Pouratian, A. F. Cannestra, N. A. Martin, and A. W. Toga. Intraoperative optical intrinsic signal
imaging: a clinical tool for functional brain mapping. Neurosurg Focus, ():e, .
Karl Popper. Logik der Forschung, .
M. E. Raichle, A. M. MacLeod, A. Z. Snyder, W. J. Powers, D. A. Gusnard, and G. L. Shulman. A default
mode of brain function. Proc Natl Acad Sci USA, ():–, .
C. Rasmussen and C. Williams. Gaussian Processes for Machine Learning. MIT Press, .
128 BIBLIOGRAPHY
A. Rauch, X. Zhang, F. Bießmann, F. C. Meinecke, J. B. M. Goense, K.-R. Müller, G. Rainer, and N. K.
Logothetis. Baseline BOLD signal shi in macaque primary visual cortex (V) aer local applica-
tion of acetylcholine. in preparation, .
J. J. Riera, E. Aubert, K. Iwata, R. Kawashima, X. Wan, and T. Ozaki. Fusing EEG and fMRI based on
a bottom-up model: inferring activation and effective connectivity in neural masses. Philos Trans
R Soc Lond, B, Biol Sci, ():–, .
J. J.Riera, J. C. Jimenez, X. Wan, R. Kawashima, and T. Ozaki. Nonlinearlocal electrovascular coupling.
ii: From data to neuronal masses. Human Brain Mapping, ():–, .
P. Ritter and A. Villringer. Simultaneous EEG-fMRI. Neurosci Biobehav Rev, ():–, .
P. Ritter, M. Moosmann, and A. Villringer. Rolandic alpha and beta EEG rhythms strengths are in-
versely related to fMRI-BOLD signal in primary somatosensory and motor cortex. Hum Brain
Mapp, ():–, .
L. R. Robinson, P. J. Micklesen, D. L. Tirschwell, and H. L. Lew. Predictive value of somatosensory
evoked potentials for awakening from coma. Crit Care Med, ():–, .
N. Roche-Labarbe, B. Zaaimi, P. Berquin, A. Nehlig, R. Grebe, and F. Wallois. Nirs-measured oxy- and
deoxyhemoglobin changes associated with EEG spike-and-wave discharges in children. Epilepsia,
():–, .
M. J. Rosa, J. Daunizeau, and K. J. Friston. Eeg-fMRI integration: A critical review of biophysical
modeling and data analysis approaches. J. Int. Neurosci., ():, a.
M. J. Rosa, J. Kilner, F. Blankenburg, O. Josephs, and W. Penny. Estimating the transfer function from
neuronal activity to BOLD using simultaneous EEG-fMRI. Neuroimage, ():–, b.
K. Rosenkranz and L. Lemieux. Present and future of simultaneous EEG-fMRI. Magn Reson Mater
Phy, (-):–, .
S. Roweis and Z. Ghahramani. A unifying review of linear gaussian models. Neural Computation, :
–, .
K. S. Saleem, J. M. Pauls, M. Augath, T. Trinath, B. A. Prause, T. Hashikawa, and N. K. Logothetis.
Magnetic resonance imaging of neuronal connections in the macaque monkey. Neuron, ():
–, .
A. Salek-Haddadi, K. Friston, L. Lemieux, and D. Fish. Studying spontaneous EEG activity with fMRI.
Brain research reviews, ():–, .
M. Sato, T. Yoshioka, S. Kajihara, K. Toyama, N. Goda, K. Doya, and M. Kawato. Hierarchical bayesian
estimation for meg inverse problem. Neuroimage, ():–, .
R. Scheeringa, M. C. M. Bastiaansen, K. M. Petersson, R. Oostenveld, D. G. Norris, and P. Hagoort.
Frontal theta EEG activity correlates negatively with the default mode network in resting state. Int
J Psychophysiol, ():–, .
B. Schölkopf and A. J. Smola. Learning with Kernels: Support Vector Machines, Regularization, Opti-
mization, and Beyond. MIT Press, .
B. Schölkopf, A. J. Smola, and K.-R. Müller. Nonlinear component analysis as a kernel eigenvalue
problem. Neural Computation, ():–, .
BIBLIOGRAPHY 129
B. Schölkopf, S. Mika, C. Burges, P. Knirsch, K.-R. Müller, G. Rätsch and A. Smola. Input space versus
feature space in kernel-based methods. Neural Networks, IEEE Transactions on, (): ,
.
C. E. Shannon. A mathematical theory of communication. Bell Syst. Tech. J., :–, .
J. Shawe-Taylor and N. Cristianini. Kernel methods for pattern analysis. Cambridge University Press,
.
A. Shmuel, E. Yacoub, D. Chaimow, N. K. Logothetis, and K. Ugurbil. Spatio-temporal point-spread
function of fMRI signal in human gray matter at tesla. Neuroimage, ():–, .
Y. B. Sirotin, E. M. C. Hillman, C. Bordier, and A. Das. Spatiotemporal precision and hemody-
namic mechanism of optical point spreads in alert primates. Proc Natl Acad Sci U S A, ():
–, .
S. M. Smith, P. T. Fox, K. L. Miller, D. C. Glahn, P. M. Fox, C. E. Mackay, N. Filippini, K. E. Watkins,
R. Toro, A. R. Laird, and C. F. Beckmann. Correspondence of the brains functional architecture
during activation and rest. Proc Natl Acad Sci USA, ():–, .
F. Steinke and B. Schölkopf. Kernels, regularization and differential equations. Pattern Recognition,
:–, .
G. Strang. Introduction to linear algebra. Wellesley-Cambridge Press, .
S. Szedmak, T. D. Bie, and D. R. Hardoon. A metamorphosis of canonical correlation analysis into
multivariate maximum margin learning. Proceedings of the th European Symposium on Artificial
Neural Networks, pages –, .
R. ornton, H. Laufs, R. Rodionov, S. Cannadathu, D. Carmichael, S. Vulliemoz, A. Salek-Haddadi,
A. McEvoy, S. Smith, and S. Lhatoo. Eeg correlated functional MRI and postoperative outcome in
focal epilepsy. Journal of Neurology, Neurosurgery & Psychiatry, ():, a.
R. C. ornton, R. Rodionov, H. Laufs, S. Vulliemoz, A. Vaudano, D. Carmichael, S. Cannadathu,
M. Guye, A. McEvoy, S. Lhatoo, F. Bartolomei, P. Chauvel, B. Diehl, F. D. Martino, R. D. C. Elwes,
M. C. Walker, J. S. Duncan, and L. Lemieux. Imaging haemodynamic changes related to seizures:
comparison of EEG-based general linear model, independent component analysis of fMRI and
intracranial EEG. Neuroimage, ():–, b.
P. Tian, I. C. Teng, L. D. May, R. Kurz, K. Lu, M. Scadeng, E. M. C. Hillman, A. J. D. Crespigny, H. E.
DArceuil, J. B. Mandeville, J. J. A. Marota, B. R. Rosen, T. T. Liu, D. A. Boas, R. B. Buxton, A. M.
Dale, and A. Devor. Cortical depth-specific microvascular dilation underlies laminar differences
in blood oxygenation level-dependent functional MRI signal. Proc Natl Acad Sci USA, ():
–, .
M. E. Tipping. Sparse bayesian learningand the relevance vector machine. Journal of Machine Learning
Research, :–, .
M. Ullsperger and S. Debener. Simultaneous EEG and fMRI: Recording, analysis, and application.
Oxford Univ Press, .
P. A. Valdes-Sosa, J. M. Sanchez-Bornot, R. C. Sotero, Y. Iturria-Medina, Y. Aleman-Gomez, J. Bosch-
Bayard, F. Carbonell, and T. Ozaki. Model driven EEG/fMRI fusion of brain oscillations. Human
Brain Mapping, ():–, .
130 BIBLIOGRAPHY
V. G.vandeVen, E. Formisano, D. Prvulovic, C. H.Roeder, and D. E. J. Linden. Functionalconnectivity
as revealed by spatial independent component analysis of fMRI measurements during rest. Human
Brain Mapping, ():–, .
M. A. J. van Gerven, B. Cseke, F. P. de Lange, and T. Heskes. Efficient bayesian multivariate fMRI
analysis using a sparsifying spatio-temporal prior. Neuroimage, ():–, .
A. Villringer and B. Chance. Non-invasive optical spectroscopy and imaging of human brain function.
Trends in Neurosciences, ():–, .
U. von Luxburg. A tutorial on spectral clustering. Statistics and Computing, ():–, .
E. Vul, C. Harris, P. Winkielman, and H. Pashler. Puzzlingly high correlations in fMRI studies of
emotion, personality, and social cognition. Perspectives on Psychological Science, ():–,
.
S. Vulliemoz, R. ornton, R. Rodionov, D. W. Carmichael, M. Guye, S. Lhatoo, A. W. McEvoy,
L. Spinelli, C. M. Michel, J. S. Duncan, and L. Lemieux. e spatio-temporal mapping of epileptic
networks: combination of EEG-fMRI and EEG source imaging. Neuroimage, ():–, .
S. Vulliemoz, L. Lemieux, J. Daunizeau, C. M. Michel, and J. S. Duncan. e combination of EEG
source imaging and EEG-correlated functional MRI to map epileptic networks. Epilepsia, ():
–, a.
S. Vulliemoz, R. Rodionov, D. W. Carmichael, R. ornton, M. Guye, S. D. Lhatoo, C. M. Michel, J. S.
Duncan, and L. Lemieux. Continuous EEG source imaging enhances analysis of EEG-fMRI in focal
epilepsy. Neuroimage, ():–, b.
S. Vulliemoz, D. W. Carmichael, K. Rosenkranz, B. Diehl, R. Rodionov, M. C. Walker, A. W. McEvoy,
and L. Lemieux. Simultaneous intracranial EEG and fMRI of interictal epileptic discharges in hu-
mans. Neuroimage, ():–, .
S. Waldert, T. Pistohl, C. Braun, T. Ball, A. Aertsen, and C. Mehring. A review on directional informa-
tion in neural signals for brain-machine interfaces. Journal of Physiology Paris, (-):–,
.
B. Weber, A. L. Keller, J. Reichold, and N. K. Logothetis. e microvascular system of the striate and
extrastriate visual cortex of the macaque. Cereb Cortex, ():–, .
K. Whittingstall and N. K. Logothetis. Frequency-band coupling in surface EEG reflects spiking ac-
tivity in monkey visual cortex. Neuron, ():–, .
A. Wiesel, M. Kliger, and A. Hero. A greedy approach to sparse canonical correlation analysis.
http://arxiv.org/abs/.v, .
H. Wold. Encyclopedia of statistical sciences, volume , chapter Partial least squares, pages –.
Wiley, .
M. Wolf, M. Ferrari, and V. Quaresima. Progress of near-infrared spectroscopy and topography for
brain and muscle clinical applications. J Biomed Opt, ():, .
M.T.Wong-Riley. Cytochromeoxidase: anendogenousmetabolicmarkerforneuronalactivity. Trends
in Neurosciences, ():–, .
BIBLIOGRAPHY 131
M. W. Woolrich, M. Jenkinson, J. M. Brady, and S. M. Smith. Fully bayesian spatio-temporal modeling
of fMRI data. IEEE transactions on medical imaging, ():–, .
K. J. Worsley and K. J. Friston. Analysis of fMRI time-series revisited–again. Neuroimage, ():–,
.
G. Wu, X. Duan, W. Liao, Q. Gao, and H. Chen. Kernel canonical-correlation granger causality for
multiple time series. Phys Rev E Stat Nonlin So Matter Phys, ( Pt ):, .
L. Wu, T. Eichele, and V. D. Calhoun. Reactivity of hemodynamic responses and functional connec-
tivity to different states of alpha synchrony: A concurrent EEG-fMRI study. Neuroimage, ():
–, .
D. Xing, C.-I. Yeh, and R. M. Shapley. Spatial spread of the local field potential and its laminar variation
in visual cortex. Journal of Neuroscience, ():–, .
E. Yacoub, K. Ugurbil, and N. Harel. e spatial dependence of the poststimulus undershoot as
revealed by high-resolution BOLD- and CBV-weighted fMRI. J Cereb Blood Flow Metab, ():
–, .
M. Yamada and M. R. Azimi-Sadjadi. Nonlinear signal estimation using kernel wiener filter in canon-
ical correlation analysis framework. In Proceedings of International Conference on Computational
Intelligence for Modelling, Control and Automation, .
L. Yang, Z. Liu, and B. He. Eeg-fMRI reciprocal functional neuroimaging. Clin Neurophysiol, ():
–, .
M. Ystad, E. Hodneland, S. Adolfsdottir, J. Haász, A. J. Lundervold, T. Eichele, and A. Lundervold.
Cortico-striatal connectivity and cognition in normal aging: a combined dti and resting state fMRI
study. Neuroimage, ():–, .
M. Zhao, M. Suh, H. Ma, C. Perry, A. Geneslaw, and T. H. Schwartz. Focal increases in perfusion
and decreases in hemoglobin oxygenation precede seizure onset in spontaneous human epilepsy.
Epilepsia, ():–, .