Oscillatory Source Tensor Discriminant Analysis (OSTDA): A regularized tensor pipeline for SSVEP-based BCI systems [original]

Oscillatory Source Tensor Discriminant Analysis (OSTDA): A regularized

tensor pipeline for SSVEP-based BCI systems

Tania Jorajuría

a,b,1,2

, Mina Jamshidi Idaji

b,c,d,2

, Zafer _

Isßcan

, Marisol Gómez

, Vadim V. Nikulin

b,f,

⇑

Carmen Vidaurre

⇑

Department of Statistics, Computer Science and Mathematics, Universidad Pública de Navarra, Pamplona, Spain

Department of Neurology, Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany

International Max Planck Research School NeuroCom, Leipzig, Germany

Machine Learning Group, Technical University of Berlin, Berlin, Germany

Department of Electrical & Electronics Engineering, Faculty of Engineering and Natural Sciences, Bahcesehir University, Istanbul, Turkey

Centre for Cognition and Decision Making, Institute for Cognitive Neuroscience, National Research University Higher School of Economics, Russian Federation

article info

Article history:

Received 26 February 2021

Revised 28 May 2021

Accepted 25 July 2021

Available online 23 December 2021

Communicated by Zidong Wang

Keywords:

Brain-computer interface

Steady-state visual evoked potential

Spatio-spectral decomposition

Higher order discriminant analysis

Analytical regularization

Tensor-based feature reduction

abstract

Periodic signals called Steady-State Visual Evoked Potentials (SSVEP) are elicited in the brain by flickering

stimuli. They are usually detected by means of regression techniques that need relatively long trial

lengths to provide feedback and/or sufficient number of calibration trials to be reliably estimated in

the context of brain-computer interface (BCI). Thus, for BCI systems designed to operate with SSVEP sig-

nals, reliability is achieved at the expense of speed or extra recording time. Furthermore, regardless of the

trial length, calibration free regression-based methods have been shown to suffer from significant perfor-

mance drops when cognitive perturbations are present affecting the attention to the flickering stimuli. In

this study we present a novel technique called Oscillatory Source Tensor Discriminant Analysis (OSTDA)

that extracts oscillatory sources and classifies them using the newly developed tensor-based discriminant

analysis with shrinkage. The proposed approach is robust for small sample size settings where only a few

calibration trials are available. Besides, it works well with both low- and high-number-of-channel set-

tings, using trials as short as one second. OSTDA performs similarly or significantly better than other

three benchmarked state-of-the-art techniques under different experimental settings, including those

with cognitive disturbances (i.e. four datasets with control, listening, speaking and thinking conditions).

Overall, in this paper we show that OSTDA is the only pipeline among all the studied ones that can

achieve optimal results in all analyzed conditions.

Ó2021 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND

license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

1. Introduction

A Brain-Computer Interface (BCI) is a system that uses brain sig-

nals to deliver commands according to their decoding [1]. Its appli-

cations range from neurological rehabilitation [2–6] to

communication for disabled people [1,7], controlling external

devices [8,9], marketing [10] and entertainment [11]. Over the past

years, several BCI systems have been developed using different

electroencephalogram (EEG) brain responses, including sensorimo-

tor rhythms [12–18], event-related potentials [19,20] and visual-

evoked potentials [21–25]. Among them, Steady-State Visual

Evoked Potentials (SSVEP) have attracted attention due to their

high signal-to-noise ratio (SNR) [26], robustness [27] and greater

information transfer rate [28,29], when compared to other brain

responses.

SSVEP signals are generated when a subject is looking at a flick-

ering stimulus. As a response, periodic signals with fundamental

frequency equal to the stimulus frequency, together with its har-

monics, can be observed in the occipito-parietal regions [30]. Based

on this phenomenon, the SSVEP-based BCIs can differentiate the

brain response to stimuli with different flickering frequencies

(classes) and therefore generate a specific output for each of those

classes. In order to analyze SSVEP signals, several methodologies

have been used in the recent years. Some of them include fast

Fourier transform [31,32], power spectral density-based analysis

[33–36], common spatial patterns [37], minimum energy combina-

tion [27,38,39], multivariate linear regression [40] and linear dis-

https://doi.org/10.1016/j.neucom.2021.07.103

0925-2312/Ó2021 The Authors. Published by Elsevier B.V.

This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

⇑

Corresponding authors.

E-mail addresses: [email protected] (V.V. Nikulin), carmen.vidaurre@unavar-

ra.es (C. Vidaurre).

Present address.

Both authors contributed equally.

Both senior authors contributed equally.

Neurocomputing 492 (2022) 664–675

Contents lists available at ScienceDirect

Neurocomputing

journal homepage: www.elsevier.com/locate/neucom

criminant analysis [41], among others. However, the most popular

method in the literature is the Canonical Correlation Analysis (CCA)

[42,43,39,44,45] and its extensions [46–48] or combinations with

other methods [49]. CCA-based methods aim at finding spatial fil-

ters that maximize the correlation between SSVEP signals and

sine–cosine reference templates at the fundamental and harmonic

frequencies of the stimuli. Many CCA-based methods have the

advantage of not requiring calibration (training) data, thus reduc-

ing or even eliminating the need of performing extra-recordings.

Nevertheless, they do not consider the possibly high overlap

between different classes and are very sensitive to cognitive per-

turbations that can be present in out-of-the-lab environments

[50]. Recently, a new method called Task-Related Component Anal-

ysis (TRCA) [51,52] has been developed in order to reduce trial

length in the design of SSVEP-based BCIs and operate at higher

speeds. This method is based on the maximization of the inter-

trial covariance matrix, and it was shown to be significantly more

efficient than CCA under time restricting conditions such as short

trial lengths. Another state-of-the-art method developed for

high-speed SSVEP-based BCIs, which we will refer to as High-

Speed BCI (HSBCI), was presented in [28]. HSBCI decomposes

SSVEPs and template signals into sub-band components by apply-

ing a filter bank analysis, and employs CCA-based spatial filters to

extract correlation features. Both TRCA and HSBCI use calibration

data to train their corresponding feature extraction procedures.

In fact, feature training has proven to critically increase SSVEP-

based BCI performance [53]. TRCA and HSBCI methods were origi-

nally introduced for a set of pre-selected channels over parietal and

occipital areas. Even though they both achieve high accuracy

results with short trial lengths, their ability to accurately classify

SSVEP data in the small sample size and in settings with higher

number of channels has to be established yet. Besides, in this

manuscript we also study how cognitive perturbations affect the

aforementioned state-of-the art methods to understand how they

could work in out-of-the-lab environments.

In this paper we propose a novel analysis pipeline, namely

Oscillatory Source Tensor Discriminant Analysis (OSTDA), to

extract SSVEP features from EEG data. Within this pipeline, we

introduce a method called shrinkage higher order discriminant

analysis (sHODA). sHODA is a modification of higher order discrim-

inant analysis (HODA) [54] that includes an analytic shrinkage

inspired by [55] as a tool for overcoming the small sample size

problem. sHODA is then used to extract features from oscillatory

brain sources, resulting in high accuracy results with trial lengths

as short as one second. In this manuscript, we compare OSTDA to

the aforementioned TRCA, HSBCI and CCA methods, and show that

it has the benefits of all of them, but none of their disadvantages. In

particular, we demonstrate that OSTDA is more robust in the small

sample size setting, and less affected by perturbations. Moreover,

our pipeline performs well with few channels and also in settings

with a high number of channels, which is a required condition in

certain applications. Thus, in this manuscript we propose a practi-

cal and fast SSVEP-based BCI system.

The paper is organized as follows: we introduce our proposed

pipeline, OSTDA, in Section 2. There, we also present sHODA and

show the details of the comparison between OSTDA and state-of-

the-art methods. In Section 3we describe the experimental data

used in this study as well as the evaluation settings. In Section 4,

we explain the details about OSTDA parameter selection, and in

Section 5we detail the statistical analysis done. Section 6is dedi-

cated to analyze results, which are discussed in Section 7. Finally,

Section 8includes a brief conclusion of the paper.

2. Methods

2.1. Notation

In this manuscript, tensors are denoted by calligraphic letters

(e.g. X), matrices are denoted by boldface capital letters (e.g. X),

whereas boldface lower-case letters are used to denote vectors

(e.g. x), and regular letters for scalars (e.g. x).

We denote the data matrix of the k-th training trial as

ðkÞ

CT

, where Cis the number of channels and Tis the number

of time samples. Therefore, the training data with Ktrials is a

three-way tensor X2R

CTK

2.2. Oscillatory Source Tensor Discriminant Analysis (OSTDA)

Fig. 1 shows the block diagram of our proposed feature extrac-

tion and classification pipeline, named Oscillatory Source Tensor

Discriminant Analysis (OSTDA). In the following sections, we elab-

orate upon the details necessary to employ OSTDA.

2.2.1. Feature extraction

Spatio-Spectral Decomposition (SSD). SSD [56] is a multivariate

and unsupervised method that aims at separating oscillatory

sources from brain signals recorded with multiple electrodes by

maximizing their signal-to-noise ratio (SNR) at the frequency band

of interest. It computes spatial filters that are basis vectors of a

subspace where the ratio of the signal power in a specific fre-

quency band to the power of signals at its marginal bands is

maximized.

Assume that the measured signal is X¼SþN, with Scorre-

sponding to the signal-of-interest in a specific frequency band

and Nthe noise component in a broad band. While filtering Xin

the frequency band of interest (resulting in X

) gives an approxi-

mation of the signal-of-interest plus noise, filtering in two narrow

bands (1 to 2 Hz) around the frequency band of interest (resulting

in X

) approximates the noise alone. Using generalized eigenvalue

Fig. 1. The block diagram of our proposed pipeline, OSTDA.

T. Jorajuría, M. Jamshidi Idaji, Z. _

Isßcan et al. Neurocomputing 492 (2022) 664–675

665

decomposition, SSD finds the optimum spatial filter w2R

C1

that

maximizes the ratio

, where C

and C

In order to apply SSD to our data, we concatenated all the trials

of the training tensor X2R

CTK

to have a X2R

CTK

data matrix.

Then, we applied SSD on a broad-band range between ½5;32Hz.

This band was selected because it covers all the fundamental and

second harmonic frequencies of the flickering stimuli. Afterwards,

we selected n

SSD

sources (n

SSD

6C) from the SSD components. We

obtained the SSD spatial filters separately for each subject using

the training trials. We denote the SSD spatial filters by

W2R

Cn

SSD

. The projection of Xonto the SSD subspace is denoted

by X

ssd

SSD

TK

, that corresponds to brain sources with

improved SNR within the frequency band ½5;32Hz.

Correlation with reference signals. As discussed before, a flick-

ering stimulus elicits brain activities oscillating with the funda-

mental frequency equal to the flickering frequency. Therefore,

one of the most informative SSVEP features is the similarity of

the EEG with single-tone sines and cosines at each flickering fre-

quency and its corresponding higher harmonics. In the case of this

study, there are four classes and we use the information of first and

second harmonics (similar to [50]). Thus, a total of M¼16 refer-

ence signals are necessary. In order to extract these similarities

and to reduce the dimensionality of the features, we computed

the Pearson’s correlation of each SSD source signal with these ref-

erence signals grouped in X

ref

MT

. This resulted in the training

data tensor X

corr

SSD

MK

, which is a concatenation of feature

matrices of all trials fX

ðkÞ

corr

k¼1

R

SSD

M

. In the following, we refer

to these feature matrices as correlation feature matrices.

Shrinkage Higher Order Discriminant Analysis (sHODA).

Higher Order Discriminant Analysis (HODA) [54], initially intro-

duced as DATER [57], is a generalization of linear discriminant

analysis (LDA) [58] for tensor data X

R

I

I

. In the special

case of matrix data, which is also the case of this manuscript,

assume fX

k¼1

R

I

is the set of the training data points cate-

gorized in multiple classes. In order to find the discriminant sub-

space for this data, a simultaneous matrix factorization should be

performed as the following [59]:

U

ð1Þ

ð2Þ

;k¼1;;Kð1Þ

The orthogonal basis factors fU

ðnÞ

r

n¼1

are computed,

with the same idea as in LDA, by maximizing between-class covari-

ance of the projections and minimizing the within-class covari-

ance. The projections of data points (or the features) are

computed as:

¼U

ð1Þ

ð2Þ

which can be vectorized and used for training a classifier. Test data

can be projected onto this discriminant subspace by using Eq. 2.

Note that G

r

. Since r

;n¼1;2, the discriminant sub-

space has a lower dimension than the original subspace of the data.

HODA employs an alternating least squares (ALS) algorithmic

scheme that iterates over the dimensions of the data tensor (here

the first two dimensions). In each iteration scatter matrices are

computed and the basis factors are estimated using an eigenvalue

decomposition. In other words, each iteration of the ALS algorithm

is an LDA problem as the following:

ðnÞ

¼argmax

ðnÞ

tr U

ðnÞT

n

ðnÞ



tr U

ðnÞT

n

ðnÞ



;s:t:U

ðnÞT

ðnÞ

¼Ið3Þ

where S

n

and S

n

are the between-class and total scatter matrices

of the n-th mode of the data tensor. In our special case of matrix

data, n¼1;2. More details about the computation of the scatter

matrices are presented in appendix A.

Tensor-based feature extraction methods are known to cope

well with small sample size scenarios [60,54]. However, even these

methods suffer from ill-conditioned scatter matrices. Therefore,

similar to conventional LDA, it is suggested that the within-class

covariance matrix should be regularized [54]. Previous works

[54,61] have implemented this regularization by means of a con-

stant regularization parameter. However, as we know from the lit-

erature on EEG classification, LDA can benefit from shrinkage,

where the shrinkage parameter is computed analytically in a

data-driven manner [55,62,63]. Inspired by [55], we implemented

an analytic shrinkage in HODA, and called it shrinkage-HODA

(sHODA), where the total scatter matrix of Eq. 3is regularized in

each of the iterations of the ALS algorithm. The optimization prob-

lem of sHODA is formulated as the following:

ðnÞ

¼argmax

ðnÞ

tr½U

ðnÞT

n

ðnÞ



tr½U

ðnÞT

ðS

n

ðnÞ

IÞU

ðnÞ



;s:t:U

ðnÞT

ðnÞ

¼Ið4Þ

where c

ðnÞ

is the regularization parameter of mode n. sHODA com-

putes this regularization parameter of each mode in a data-driven

approach. As mentioned before, [54,61] have introduced a constant

regularization parameter to the total scatter matrix, i.e.

ðnÞ

¼c

¼const;8n. The constant regularization adds a challenge

of selecting this parameter as a hyper-parameter of the pipeline,

which makes the calibration phase computationally more expen-

sive. Additionally, a constant regularization parameter may not be

the optimum value for regularizing the LDA problem in all itera-

tions of the ALS algorithm. sHODA overcomes both of these draw-

backs and can improve the performance of the feature extraction

in small sample size scenarios. Additionally, data-driven computa-

tion of the shrinkage parameter eliminates one parameter from

the cross-validation parameter tuning. For the details about the for-

mulation and algorithm of sHODA, we refer the reader to appendix

A of this manuscript.

After applying sHODA on the correlation features of SSVEP data

(i.e. fX

ðkÞ

corr

k¼1

R

SSD

M

), the feature matrix f2R

Kr

is obtained

and used for classification. Note that the only parameters of sHODA

are the new dimensions of the feature space, i.e. R¼½r

.

2.2.2. Classification

As in the original paper describing and analysing the datasets

used here [50], we applied the K-Nearest Neighbor (KNN) classifier

with five neighbors for all the analyses performed in this paper

with the OSTDA pipeline. The KNN implementation of the Statistics

and Machine Learning Toolbox of MATLAB

was used.

2.3. Benchmarking

We compared our proposed pipeline, OSTDA, with the following

state-of-the art methods.

Canonical Correlation Analysis (CCA). CCA is probably the most

popular method for feature extraction in SSVEP-based BCIs

[42,43,39,44,45]. CCA solves an optimization problem to find linear

mappings that maximize the correlation of two matrices. Here,

these two matrices are the EEG data from multiple electrodes

and the reference signals. In [50], for each SSVEP flickering fre-

quency, two sine–cosine reference signal matrices were used to

calculate canonical correlations; one of these matrices corre-

sponded to the stimulation frequency and the other one to its sec-

ond harmonic. Therefore, four canonical correlation values (CCA

features) were computed for each class, resulting in sixteen CCA

features.

To implement the above-mentioned CCA-based feature extrac-

tion in this paper, each trial was first detrended and band-pass fil-

T. Jorajuría, M. Jamshidi Idaji, Z. _

Isßcan et al. Neurocomputing 492 (2022) 664–675

666

tered between ½0:53;40Hz. This step removes DC, high frequency

artifacts and the 50 Hz power line noise [50]. Afterwards, the six-

teen CCA features were computed for each trial and classified with

a KNN classifier with five neighbors. This classifier achieved the

best classification results in the testing dataset in [50].

Task-Related Component Analysis (TRCA). TRCA [51] is a

recently published algorithm applied to high-speed SSVEP-based

BCIs. This method finds spatial filters that maximize the inter-

trial covariance matrix of each class separately by means of a gen-

eralized eigenvalue decomposition (more details can be found in

[52]).

In this paper, we followed the procedure explained in [51] to

implement TRCA, where the SSVEP signals are filtered into sub-

band components by using a filter bank. In order to fit the param-

eters of that work to the data used in this manuscript, the lowest

cut-off frequency was set to 5 Hz, thus we could include the small-

est flickering frequency. We computed the ensemble spatial filters

and the averaged templates for each stimuli and sub-band with the

training trials.

High-Speed BCI (HSBCI). A new high-speed SSVEP-based BCI

was introduced in [28]. HSBCI consists on a filter bank analysis that

decomposes SSVEPs and template signals into sub-band compo-

nents to extract correlation features using CCA-based spatial filters.

We calculated the SSVEP template signals with the training trials.

In order to implement HSBCI, we followed the procedure

explained in [28] with some exceptions to adapt the method to

the data analyzed here: the reference signals matrix to compute

CCA included the first and second harmonics, so the total fre-

quency range was ½5;32Hz. As for TRCA, we also set the lowest

cut-off frequency to 5 Hz to include the smallest flickering

frequency.

3. Experimental data

3.1. Data description

In this paper, we analyzed a public EEG dataset of a SSVEP-

based BCI recorded from 24 subjects from [50]. All subjects partic-

ipated in the study after giving a written informed consent in

accordance with the declaration of Helsinki and the experiments

were approved by the local ethics committee of National Research

University Higher School of Economics, Moscow [50]. The visual

stimuli consisted of four circles placed in different locations of a

screen, flickering at distinct frequencies of 5.45 Hz, 8.57 Hz,

12 Hz, and 15 Hz. Frequencies multiples of each other were

avoided in order to prevent coincidences between frequencies of

flickering stimuli and their higher harmonics. For more details

about the experimental setup, we refer the reader to [50].

Participants underwent offline and online (with feedback)

recording sessions. In this study, we used the offline data as the

training dataset and the online data as the testing dataset.

The offline session consisted of four runs of 25 trials each, with

a total of 100 trials (25 trials per class). In each offline trial, the sub-

jects were asked to focus on a randomly selected flickering circle

surrounded by a red oval for three seconds, followed by a one-

second resting period. In the online session, the subjects freely

chose a flickering circle to focus on for three seconds. After each

trial, the classification result was presented to the subject as a

feedback and they had to confirm or reject the result using the key-

board. There were four conditions (with and without perturbation)

in the online session (100 trials each, per subject). During each

condition, the subjects were asked to perform randomly one of

the following tasks, while looking at the stimulus: control (no per-

turbation), speaking (counting loudly and repeatedly from one to

ten), thinking (counting silently from one to ten repeatedly), and

listening (paying attention to a pre-recorded audio file where the

participant had counted from one to ten repeatedly).

EEG data were recorded from 60 channels based on the interna-

tional 10–20 system, as in Fig. 2, with the reference electrode on

the left mastoid and 1 kHz sampling rate. The data were segmented

using the stimuli markers specifying the onset and end of the flick-

ering. As aforementioned, offline and online trials of the studied

datasets were three seconds long. However, long trials decrease

the speed of BCI systems, thus an effort must be done to obtain

reliable results with short trial lengths. Consequently, in this study

we focused on short trials of only one second, with the aim of

increasing the information transfer rate of our system.

3.2. Evaluation settings

We evaluated the performance of two different settings with

our proposed pipeline and with the selected state-of-the-art meth-

ods. In particular, we analyzed how the training sample size and

the number of channels affect the performance of the studied

methods. The factor ‘‘sample size” addresses the amount of data

that is necessary to obtain a reliable performance. Reducing the

need of training (also called calibration) data, increases the ease

of use of BCIs. Typically, recording offline data is challenging

because participants become tired before starting the actual feed-

back BCI session. This problem is even greater for patients, whose

attention span might be compromised. Thus, BCI researchers inves-

tigate how short the training sessions can be to still obtain reliable

online performances [64–66], as we do in this work. On the other

hand, the factor ‘‘number of channels” is related to the trade-off

between a fast and easy setup with a low number of dimensions

and the need of flexible paradigms, specially in the case of partic-

ipants with special needs, for example due to pathologies of the

nervous system. This means that, specially for the case of patients,

it is not always possible to select only a few optimal channels, and

hence flexible systems with a higher number of dimensions might

be necessary. Thus, in order to design BCI systems that can be used

in out-of-the-lab conditions, it is necessary to study how reliable

they are under low sample size settings and with higher number

of dimensions, as we do here. That is, it is necessary to analyze

the trade-off between number of channels and the amount of avail-

able training data.

Sample size. Small sample size (SSS) problems in classification

arise when there are not enough training data points in compar-

ison to the number of unkowns. This condition results in overfit-

ting of the machine learning methods and therefore, a significant

drop in the classification results of test data is observed. In order

to assess the performance of our pipeline and other state-of-the-

art methods in SSS settings, we computed the classification results

for two numbers of training trials values. In particular, we used

only the first 5 or 25 trials of each class from the training dataset

for training. We refer to the case of using all the available training

data (i.e. 25 trials per class) as the full sample size (FSS) setting.

Fig. 2. Scalp plot of the electrodes configurations used in this study. 60-channel

setting: all shown electrodes; 9-channel setting: bold-circled electrodes.

T. Jorajuría, M. Jamshidi Idaji, Z. _

Isßcan et al. Neurocomputing 492 (2022) 664–675

667

Note that there are four classes and therefore we have in total 100

trials to train in FSS setting and 20 in the SSS setting.

Number of channels. We evaluated two different settings

regarding the number of channels used. The first setting included

all 60 EEG channels (all electrodes shown in Fig. 2). For the second

setting, only nine channels located in parietal and occipital areas

(bold-circled electrodes in Fig. 2) were selected according to

[28,51]. These channels are known to provide very good perfor-

mance for steady-state visual evoked potential paradigms [67].

4. Parameter selection

OSTDA pipeline has two parameters that were subject-

specifically selected: n

SSD

(the number of SSD components that

are retained) and R¼½r

(the number of components retained

by sHODA in each mode).

The range of n

SSD

was determined based on the number of chan-

nels used [68]. For the 9-channel setting 5 6n

SSD

69, while for the

60-channel setting 10 6n

SSD

635.

Rwas restricted to a specific set of possible values based on the

resulting total number of features (equal to the product of the two

elements of vector R, i.e. r

) according to the following criteria:

as we have four SSVEP flickering frequencies (classes), the lower

limit was set to three; additionally, taking into account that the

total number of trials in the FSS is 100, the upper bound was lim-

ited to about a 10% of this value [69], in this case to 12. Thus, we

had 3 6r

612. Therefore, the resulting possible values for R

were: ½1;3;½1;4;½1;5;½1;6;½1;7;½1;8;½1;9;½1;10;½1;11;½1;12;

½2;2;½2;3;½2;4;½2;5;½2;6;½3;1;½3;2;½3;3;½3;4;½4;1;½4;2;½4;3;

½5;1;½5;2;½6;1;½6;2;½7;1;½8;1;½9;1;½10;1;½11;1;½12;1.

Among these values, the ones that satisfied r

SSD

were used for

each n

SSD

For each evaluation setting, we selected n

SSD

and Rfor each sub-

ject based on the eigenvalues obtained in sHODA, taking the com-

bination of these two parameters that maximized the explained

percentage of variance. This was done with a chronological hold-

out cross-validation using the corresponding training dataset in

each sample size setting, taking the last 40% trials of each class

as the validation set. With these subject-specific parameters,

OSTDA was trained with all available trials in each sample size

setting.

5. Statistical Analysis

Two 3-way repeated measures ANOVA were applied to the clas-

sification results obtained with the testing dataset to see the influ-

ence of the analysis approach (OSTDA, TRCA, HSBCI, CCA), number

of training trials per class (5;25) and perturbation (Control, Listen-

ing, Speaking, Thinking) factors. They were performed separately

for individual accuracy values obtained with the 9- and 60-

channel settings, respectively.

For each ANOVA, we analyzed the sphericity of the response

variables in the repeated measures model, using a Mauchly’s test.

In case of rejection, we report the Greenhouse-Geisser adjusted

p-values. Furthermore, after each ANOVA we performed post hoc

tests using Tuckey’s honestly significant difference (HSD), when

the interaction of main factors was significant.

6. Results

6.1. Parameter selection (training dataset)

As mentioned in Section 4, we performed a chronological hold-

out to tune the parameters of OSTDA (n

SSD

and R) for each subject.

The last 40%trials of each class of the corresponding training data

were used as the validation set. Thus, subject-dependent parame-

ters were selected in each of the evaluation settings, i.e. 9- and

60-channel as well as small and full sample size (refer to

Section 3.2).

Fig. 3 shows the joint distribution of the selected parameters,

SSD

;Rg, over all the subjects in each setting. In the 9-channel set-

ting the best parameters for most of the subjects are f5;½2;6g in

the SSS setting, selected for 75%of subjects, and f5;½1;3g and

f5;½2;6g in the FSS setting, selected for 50%and 45:83%of sub-

jects, respectively. In the 60-channel setting the predominant

parameters are f10;½1;3g, which were selected for 70:83%and

75%of subjects in the SSS and the FSS settings, respectively.

6.2. Classification (testing dataset)

We benchmarked OSTDA against three state-of-the art meth-

ods, namely CCA, TRCA and HSBCI. Fig. 4 depicts test classification

mean accuracy values (%) and standard errors in different evalua-

tion settings and under different perturbations. Besides, in Fig. 5

these mean accuracy values obtained with analyzed approaches

are ranked for each evaluation setting and perturbation condition.

Two 3-way repeated measures ANOVA were separately con-

ducted (see Tables B.1 and B.2 in appendix B) for the individual

accuracies obtained in 9-channel and 60-channel settings. Each

ANOVA was designed with factors approach (OSTDA, TRCA, HSBCI,

CCA), number of training trials per class (5;25) and perturbation

(Control, Listening, Speaking, Thinking).

9-channel setting. The analysis of the 9-channel setting deliv-

ered significant results for all main effects (p<0:001 in each case)

and also for the 2-way interaction between approach and perturba-

tion (p<0:01) factors. Post-hoc tests showed that OSTDA per-

formed significantly better than CCA in Control, Speaking,

Thinking (p<0:001 in each case) and Listening (p<0:01) condi-

tions. There were no significant differences between OSTDA and

HSBCI nor between OSTDA and TRCA in any perturbation. Quanti-

tatively, however, mean accuracy values achieved with OSTDA

were higher than for TRCA and HSBCI in all sample sizes and per-

turbations. Specifically, they were 4:7%and 0:7%higher than for

Fig. 3. The joint distribution of selected OSTDA parameters (fn

SSD

;Rg) over N¼24

subjects in 9-channel and 60-channel settings for the SSS (5 training trials per class)

and the FSS (25 training trials per class) settings.

T. Jorajuría, M. Jamshidi Idaji, Z. _

Isßcan et al. Neurocomputing 492 (2022) 664–675

668

Loading more pages...