The tiny effects of respiratory masks on physiological, subjective, and behavioral measures under mental load in a randomized controlled trial [original]


Scientific Reports | (2021) 11:19601 | https://doi.org/10.1038/s41598-021-99100-7

www.nature.com/scientificreports

The tiny effects of respiratory masks on physiological, subjective, and behavioral measures under mental load in a randomized controlled trial

Robert P. Spang* & Kerstin Pieper

Since the outbreak of the coronavirus disease (COVID-19), face coverings have been recommended to

diminish person-to-person transmission of the SARS-CoV-2 virus. Some public debates concern claims

regarding risks caused by wearing face masks, such as decreased blood oxygen levels and impaired

cognitive capabilities. The present, pre-registered study aims to contribute clarity by delivering

a direct comparison of wearing an N95 respirator and wearing no face covering. We focused on a

demanding situation to show that cognitive efficacy and individual states are equivalent in both

conditions. We conducted a randomized-controlled crossover trial with 44 participants. Participants

performed the task while wearing an N95 FFR versus wearing none. We measured physiological (blood

oxygen saturation and heart rate variability), behavioral (parameters of performance in the task),

and subjective (perceived mental load) data to substantiate our assumption as broadly as possible.

We analyzed data regarding both statistical equivalence and differences. All of the investigated

dimensions showed statistical equivalence given our pre-registered equivalence boundaries. None of

the dimensions showed a significant difference between wearing an FFR and not wearing an FFR.

Trial Registration: Preregistered with the Open Science Framework: https://osf.io/c2xp5 (15/11/2020).

Retrospectively registered with German Clinical Trials Register: DRKS00024806 (18/03/2021).

Throughout the COVID-19 pandemic, most countries quickly adopted – amongst others – face coverings as a

measure to protect the general public. Face coverings can be roughly categorized into face masks (including cloth

face coverings), surgical masks, and respirators. According to the FDA, face masks are coverings for the nose and

mouth and do not meet filtration efficiency levels (not intended for medical purposes). In contrast, surgical masks

meet several protection standards and are considered a medical device. However, their loose fit does not provide

complete protection from contaminants1. The tight-fitting filtering facepiece respirators (FFRs) such as N95

(US) provide specific filtration efficiencies (at least 95% of small (0.3-micron) particles) and thereby higher virus

protection1,2. Additionally, surgical masks and FFRs are disposable and should therefore be replaced regularly1.

Surgical masks and FFRs diminish person-to-person transmission of the SARS-CoV-2 virus3. By redirecting the

exhaled emissions, they make aerosols diffuse around the wearer’s head4. This process reduces exposures (if other measures

such as a sufficient distance are adopted as well)5,6. The scientific background at present shows that N95 FFRs

without a valve also filter particles, droplets, and aerosols in the in- and exhaled air, which reduces the risk of

infection for the person wearing such an FFR, but also, for the people next to them7 (protection factors of several

respirators can be found in8 and information about filter efficiency in9). Modeling the potential for wearing face

masks (including homemade cloth masks, surgical masks, and FFRs) demonstrated a drastic decrease in peak

hospitalizations and deaths, decreasing the SARS-CoV-2 virus’s effective transmission rate10.

An alarming number of people worldwide question scientific findings and countermeasures against the

SARS-CoV-2 virus transmission11–14. An early Twitter analysis estimated that around 25% of all tweets regarding

the COVID-19 disease contain misinformation15. While susceptibility to misinformation seems elevated

through social media16, COVID-19 related misinformation is shared frequently due to failing to question the


Quality and Usability Lab, Institute of Software Engineering and Theoretical Computer Science, Electrical

Engineering and Computer Science, Technical University of Berlin, Berlin, Germany. *email: [email protected]


content’s truthfulness17. As such, the potential decline of cognitive performance is discussed. For example, one

article concludes that wearing facemasks has physiological and psychological consequences such as, among

others, a decline in cognitive performance18. This conclusion rests on a non-generalizable finding of decreased arterial partial

oxygen pressure that is unrelated to cognitive performance19. Moreover, the manuscript showed several limitations

and was therefore retracted20.

Our study aims to provide clarity and evidence against known myths. We investigated multiple dimensions

relevant to cognitive performance. We employ a widely acknowledged questionnaire for mental workload

(NASA-TLX21) as a subjective assessment. The objective measures are physiological values indicating blood

oxygen saturation (SpO2) and heart rate variability (HRV). Regarding the behavioral dimensions, we focus on the

number of correctly solved problems within the same time interval, the correctness and response times per trial.

Related work. Several studies investigated potential physical consequences or health risks caused by face

coverings. Several studies showed that wearing a nonmedical face mask does not lead to a decline in oxygen

saturation: none in older participants during minimal physical activity22, no effect on blood and muscle oxygenation

in healthy participants23, no effect on gas exchange during physical activity in either healthy participants or patients with

lung function impairment24, and no change in blood oxygen or heart rate during rest and a flight simulation

in healthy pilots wearing N95 FFRs25. There were also no differences in heart rate and blood oxygen parameters

in health care workers during a one-hour walk wearing N95 masks26 or FFRs with low filter resistance27.

However, ref. 28 provides evidence for slightly decreased blood oxygen saturation in patients with very severe COPD

wearing N95 respirators. In contrast, only slight differences in heart rate and pulmonary responses were found

in ref. 29. Perceptions of increased body heat most likely originate from warming of the inhaled air, and the facial

skin, skin, and core temperature were not affected by wearing an N95 FFR for more than an hour during physical

exercise30.

A subjective evaluation of surgeons reported a hampered performance and increased surgical fatigue while

wearing FFP2 masks31. A decrease in blood oxygen saturation and an increase in pulse rate between the measurements before and

after wearing masks have also been reported32. Epstein et al., 2020 compared exercising with an FFR (N95) to exercising without one

and found no significant differences regarding heart rate, respiratory rate, blood pressure, oxygen saturation, or time to

exhaustion33. Solely end-tidal carbon dioxide (EtCO2) levels were increased

while wearing an FFR. Other groups compared the physiological effects of exercising with N95 respirators during

pregnancy. Neither found changes in heart rate or blood oxygen levels (although diastolic pressure, mean

arterial pressure, and subjective exertion did change)34,35.

An extensive review of studies on the influence of face masks (medical FFR and non-medical

face masks) on physiological parameters concluded that the effects are negligible and would potentially

not impact healthy people even while exercising. However, persons with cardiopulmonary diseases might

nevertheless experience an effect36.

Deliberate misinformation often uses common knowledge to tell an allegedly fact-based story. Some social

media accounts connected heavier breathing while wearing FFR with the false claim that it reduces blood oxygen

saturation. Indeed, respiration behavior (amongst others, frequency and intensity, see37 for a review) changes

while wearing an FFR (especially during exercise), and the physical dead volume of the respiratory system causes

breathing to be more strenuous38. However, there is no evidence that wearing face masks (cloth/surgical masks

or FFR) causes the blood oxygen levels to diminish22–24,26,27,29. Nevertheless, the literature lacks investigations

tailored to quantify the impact of face masks, especially high filtering N95 FFR, on cognitive performance. We

contribute to this research to refute misinformation and face worries regarding a connection between cognitive

functioning and wearing N95 FFR.

Regarding our variables of interest, findings from Scholey et al. (1999)39 suggest that in the state of high cognitive

demand, the heart rate helps regulate the metabolism, increasing blood oxygen circulation and improving

cognitive performance. They showed that oxygen saturation and cognitive performance correlate with each other.

Chung etal., 200640 presented similar findings where hyperoxic air administration led to increased blood oxygen

saturation and improved accuracy in a verbal cognition task compared to regular air administration. In a different

study, the HRV was shown to be sensitive to varying levels of cognitive performance. A higher HRV amplitude

is suggested to contribute to a decrease in cognitive performance41. Mental stress (e.g., induced by mental

arithmetic) decreases the HRV, which is suggested to be a regulation process of the autonomic nervous system42.

Additionally, the HRV seems to be a sensitive indicator to discriminate between rest, physical- and mental

load. In a study by Tealman et al., 2011, the combination of a physical task (computer mouse work) and a cognitive

task (complex arithmetic) showed a significant decrease in HRV features compared to the physical task alone43.

The Task Load Index was created to measure demand and the interaction of a subject performing a task21,44.

It has been frequently used in various domains such as human factors and provides a solid basis for the perceived load.

Behavioral variables are commonly used to measure task difficulty and, thereby, workload. The performance

(e.g., measured as a number of solved/correct trials) is expected to decrease when workload reaches a certain

threshold45. The variation of the difficulty of a task can be indexed in a decrease of correct answers or even in no

responses, meaning it was too difficult to solve. At the same time, the duration for producing a response increases

if the task is more complex and thereby more mentally demanding than the one before46.

Our contribution. Given the body of evidence, we hypothesize equivalence of blood oxygen saturation

while wearing an N95 FFR compared to not wearing one. Further, we hypothesize equivalence of the cognitive

demand of the FFR and the no-FFR condition. We expect that the participants perform equally well in both test

conditions. In terms of behavioral data, we hypothesize equivalence between the conditions regarding the number

of correctly solved tasks, the ratio of correct responses to all tasks presented, the ratio of correct responses to

all responses given, the average response time, and the average response time of correct responses.
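The five behavioral measures named above can be illustrated with a minimal sketch. The record layout (`Trial`, `answered`, `correct`, `response_time_s`) and the function name are hypothetical and chosen for illustration only; the paper does not describe how the trial data were stored.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional


@dataclass
class Trial:
    """One presented arithmetic task (hypothetical record layout)."""
    answered: bool                    # False if no response was given
    correct: bool                     # only meaningful when answered is True
    response_time_s: Optional[float]  # None if no response was given


def behavioral_measures(trials):
    """Compute the five behavioral measures for one participant and condition.

    Assumes at least one answered and at least one correct trial.
    """
    answered = [t for t in trials if t.answered]
    correct = [t for t in answered if t.correct]
    return {
        "n_correct": len(correct),
        # correct responses relative to all tasks presented
        "correct_per_task": len(correct) / len(trials),
        # correct responses relative to all responses given
        "correct_per_response": len(correct) / len(answered),
        "mean_rt_s": mean(t.response_time_s for t in answered),
        "mean_rt_correct_s": mean(t.response_time_s for t in correct),
    }
```

Because unanswered tasks count toward the tasks presented but not toward the responses given, the ratio per task can never exceed the ratio per response.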

In terms of physiological data, we assume similar behavior in both test conditions. The task to be performed

has a cognitive focus and is carried out under time pressure. We expect no physical exertion in the relaxed sitting

position. Thus, only cognitive demand could influence the physiological parameters as described in the

mentioned literature. Providing that cognitive demand is equal in both conditions (with and without an FFR),

we hypothesize equivalent results regarding participants’ HRV, SpO2, TLX scores, and task performance. This

study adheres to CONSORT guidelines.

Results

For all following Two One-Sided Tests of Equivalence (TOST) procedures, we employed equivalence boundaries

of dz = ±0.45. This smallest effect size of interest (SESOI) translates to the absolute values of the equivalence

boundaries reported in the following paragraphs. Figure 1 provides an overview of the TOST confidence intervals

and the null hypotheses significance tests, together with the equivalence boundaries.

For both HRV analyses, we had to exclude three datasets due to incomplete recordings. Hence, both are based

on data from 41 participants. Given the chosen alpha level of α = 0.05 and the pre-defined equivalence bounds

of dz = ±0.45, both HRV TOST results have a statistical power of 1−β = 0.78. All other tests are based on all 44

participants, resulting in statistical power of the TOSTs of 1−β = 0.82.
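The reported power values can be approximated with a short sketch. It assumes a paired design, symmetric bounds, a true effect of exactly zero, and a normal approximation; whether the authors used this exact formula is an assumption, and the function name `tost_power_paired` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def tost_power_paired(n, dz_bound, alpha=0.05):
    """Approximate power of a paired TOST with bounds of +/- dz_bound.

    Normal approximation, assuming the true standardized effect is zero.
    """
    z = NormalDist()
    power = 2 * z.cdf(dz_bound * sqrt(n) - z.inv_cdf(1 - alpha)) - 1
    return max(power, 0.0)  # the approximation can go negative for tiny n


print(round(tost_power_paired(44, 0.45), 2))  # 0.82 (all 44 participants)
print(round(tost_power_paired(41, 0.45), 2))  # 0.78 (41 participants, HRV)
```

Under these assumptions the approximation agrees with the reported 1−β = 0.82 and 1−β = 0.78 to two decimals.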

Physiological data. See Fig.2 for a visualization of the blood oxygen saturation and the HRV measurement

(RMSSD) per condition. All result graphs share the same format and visualize different aspects of the group

comparison. First, we contrast the distribution of the two groups. For a precise understanding about outliers,

centers and spread of the inner 50%, we then align box-plots. In addition to that, we underline the equality of the

group means by adding simple bar-plots with 95%-range whiskers.

Physiological: blood oxygen levels. The mean difference of blood oxygen level between wearing an FFR (95% CI:

96.04–97.64%) and not doing so (95% CI: 96.48–97.79%) immediately after performing the 15 min of mental

calculation is −0.3% (difference Median: 0%, IQR: 2%). The increase of blood oxygen level without a mask has

a negligible effect size of dz = −0.12. A Shapiro–Wilk test indicated a violation of the assumption of normality

(W = 0.92, p = 0.004). Hence, we employed a robust TOST procedure using the Wilcoxon signed-rank test. To

compare the measurements of two conditions, we define an equivalence interval. It is derived from our pre-defined

effect size of dz = ±0.45, which translates to ±0.736 in the units of the metric at hand (percent in this

case). Hence, the lower equivalence boundary ΔL = −0.74% and the upper equivalence boundary ΔU = 0.74%. The

TOST procedure reveals that the effect observed is statistically equivalent; the larger of the two p values is less

than α = 0.05 (V = 682, p = 0.014). According to the Neyman-Pearson approach, this means that one can reject the

hypothesis that the true effect is greater than dz = ±0.45 and act as if the effect size falls within these equivalence

bounds47. According to our pre-registration, we additionally ran an exploratory null hypothesis significance test.

A pairwise Wilcoxon signed-rank test returned nonsignificant (V = 154, p = 0.259). Hence the H0 of no difference

between groups is not rejected.
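The logic of a robust TOST on paired data can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: it uses the normal approximation of the Wilcoxon signed-rank statistic (zeros dropped, average ranks for ties, no tie correction of the variance), and it converts the standardized bound to raw units as dz times the standard deviation of the paired differences, which is the conventional definition but is not spelled out in the text. All function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, stdev


def signed_rank_z(diffs):
    """z statistic of the Wilcoxon signed-rank test (normal approximation)."""
    d = [x for x in diffs if x != 0]          # drop zero differences
    n = len(d)
    order = sorted(range(n), key=lambda i: abs(d[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:                               # assign average ranks to ties
        j = i
        while j + 1 < n and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1             # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    w_plus = sum(r for r, x in zip(ranks, d) if x > 0)
    mean_w = n * (n + 1) / 4
    sd_w = sqrt(n * (n + 1) * (2 * n + 1) / 24)
    return (w_plus - mean_w) / sd_w


def wilcoxon_tost(with_ffr, without_ffr, dz_bound=0.45, alpha=0.05):
    """Return (raw bound, larger one-sided p value, equivalence decision)."""
    diffs = [a - b for a, b in zip(with_ffr, without_ffr)]
    bound = dz_bound * stdev(diffs)            # assumed dz-to-raw conversion
    z = NormalDist()
    # H0: shift <= -bound; rejected if (diffs + bound) lie clearly above zero
    p_lower = 1 - z.cdf(signed_rank_z([x + bound for x in diffs]))
    # H0: shift >= +bound; rejected if (diffs - bound) lie clearly below zero
    p_upper = z.cdf(signed_rank_z([x - bound for x in diffs]))
    p = max(p_lower, p_upper)
    return bound, p, p < alpha
```

Equivalence is declared only if both one-sided tests reject, which is why the larger of the two p values is the one compared against α.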

Figure 1. Equivalence boundaries (dotted lines left and right), mean of the mask / no-mask difference

(diamond) and the 95% confidence interval (thin line; for the null hypothesis significance test), as well as the

90% confidence interval for the TOST (thick line). The x-axis shows the mean difference in the unit of the

metric.

Physiological: heart rate variability (RMSSD). The mean difference of RMSSD between wearing an FFR (95%

CI: 28.81–58.6 ms) and not wearing one (95% CI: 29.27–44.69 ms) in the last five minutes of each condition is

6.73 ms (difference Median: 1.63 ms, IQR: 11.92 ms). The decrease of the RMSSD without a mask has a negligible

effect size of dz = 0.15. A Shapiro–Wilk test indicated a violation of the assumption of normality (W = 0.38,

p < 0.001), hence we employed a robust TOST procedure using the Wilcoxon signed-rank test, with equivalence

bounds of ΔL = −16.06 ms and ΔU = 16.06 ms. It reveals that the effect observed is statistically equivalent; the

larger of the two p values is less than α = 0.05 (V = 56, p < 0.001). We additionally ran an exploratory null hypothesis

significance test. A pairwise Wilcoxon signed-rank test returned nonsignificant (V = 519, p = 0.257).
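For readers unfamiliar with the metric, RMSSD is the root mean square of successive differences between adjacent RR intervals. The following minimal sketch shows the standard definition only; the function name is hypothetical, and recording, artifact correction, and the selection of the five-minute window are not covered.

```python
from math import sqrt


def rmssd(rr_intervals_ms):
    """Root mean square of successive differences of RR intervals (in ms)."""
    diffs = [b - a for a, b in zip(rr_intervals_ms, rr_intervals_ms[1:])]
    return sqrt(sum(d * d for d in diffs) / len(diffs))


# Illustrative values, not data from the study
print(round(rmssd([800, 810, 790, 805]), 2))
```

A lower RMSSD means that consecutive heartbeats are spaced more evenly, which is the pattern typically associated with higher strain.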

Subjective data. To investigate the NASA-TLX scores, we first computed the difference between post-task

and baseline ratings. The mean difference of these scores is −0.01 (difference Median: 0.1, IQR: 2.1, see Fig. 3).

The decrease of the TLX score without an FFR (95% CI: 7.66–9.95) has a negligible effect size of dz = −0.002

(95% CI of the mask condition: 7.74–9.85). The assumption of a normal distribution was not rejected (W = 0.99,

p = 0.919), so we used a TOST procedure based on Welch’s paired t-test with equivalence bounds ΔL = −0.78

and ΔU = 0.78. It reveals that the effect observed is statistically equivalent, the larger of the two p values is less

than α = 0.05 (t(43) = 2.96, p = 0.003). An exploratory null hypothesis significance test (pairwise Welch’s t-test)

returned nonsignificant (t(43) = −0.03, p = 0.977).

Figure2. Comparison of the physiological metrics (blood oxygen level and HRV) while wearing an FFR and

not wearing an FFR. The density plots to the left describe the similarity of the distributions of the two groups.

The box-plots in the center column compare the median and the interquartile range (IQR) and provide an

assessment of potential outliers. The bar charts to the right compare the plain mean of the two group; the

whiskers depict the inner 95% of the recorded data.

Figure3. Comparison of the subjective load ratings (NASA TLX) while wearing an FFR and not wearing

an FFR. While the distribution reveals minor differences between the groups, these are averaged out when

comparing mean and median values.


Behavioral data. See Fig.4 for a visualization of the following five behavioral performance data per condi-

tion.

Behavioral: correct responses. The mean difference between the number of correct responses while wearing

an FFR (95% CI: 79.13–97.37) and while not wearing one (95% CI: 82.38–98.39) is −2.14 (difference

Median: 3.5, IQR: 24.5). The increase of correct responses in conditions without an FFR has a negligible effect

size of dz = −0.08. The assumption of a normal distribution was rejected (Shapiro–Wilk test, W = 0.94, p = 0.015),

so we used a robust TOST procedure based around the Wilcoxon signed-rank test with equivalence bounds

of ΔL = −8.89 and ΔU = 8.89. It reveals that the effect observed is statistically equivalent (V = 680, p = 0.016). An

exploratory null hypothesis significance test (pairwise Wilcoxon signed-rank test) returned nonsignificant

(V = 496, p = 0.995).

Behavioral: ratio correct responses/all tasks. We investigate the ratio of correct responses against the number

of all tasks presented. The mean difference between an FFR and no FFR is nearly zero

(difference Median: −0.01, IQR: 0.09). The effect induced by the FFR (95% CI: 0.55–0.64) is negligible (dz = 0.03,

95% CI of the no-FFR condition: 0.55–0.63). The assumption of a normal distribution was not rejected (W = 0.96,

p = 0.133), so we used a TOST procedure based on Welch’s paired t-test with equivalence bounds ΔL = −0.04

and ΔU = 0.04. It reveals that the effect observed is statistically equivalent, the larger of the two p values is less

than α = 0.05 (t(43) = −2.64, p = 0.005). An exploratory null hypothesis significance test (pairwise Welch’s t-test)

returned nonsignificant (t(43) = 0.35, p = 0.728).

Behavioral: ratio correct responses/responses given. Next, we investigate the ratio of correct responses against

the number of all responses given (correct and incorrect).

The mean difference between FFR and no FFR is nearly zero (difference Median: −0.01, IQR: 0.1). The effect

induced by the FFR (95% CI: 0.67–0.78) is negligible (dz = −0.01, 95% CI of the no-FFR condition: 0.68–0.78).

The assumption of a normal distribution was not rejected (W = 0.98, p = 0.524), so we used a TOST procedure

based around Welch’s paired t-test with equivalence bounds of ΔL = −0.04 and ΔU = 0.04. It reveals that the effect

observed is statistically equivalent, the larger of the two p values is less than α = 0.05 (t(43) = 2.92, p = 0.003).

An exploratory null hypothesis significance test (pairwise Welch’s t-test) returned nonsignificant (t(43) = −0.06,

p = 0.950).

Behavioral: mean response time. The mean difference between a mask and no mask of the average response

time is 0.29s (difference Median: − 0.05s, IQR: 1.46s). The decrease of the response time in conditions without

an FFR (95% CI: 5.06–5.82s) has a small effect size of dz = 0.21 (95% CI of the FFR condition: 5.29–6.17s). The

assumption of a normal distribution was rejected (Shapiro–Wilk test, W = 0.91, p = 0.002), so we used a robust

TOST procedure based around the Wilcoxon signed-rank test with equivalence bounds of ΔL = −0.63 s and

ΔU = 0.63s. It reveals that the effect observed is statistically equivalent (V = 329, p = 0.026). An exploratory null

hypothesis significance test (pairwise Wilcoxon signed-rank test) returned nonsignificant (V = 529, p = 0.699).

Behavioral: mean response time of correct responses. Lastly, we investigate the average response time of only

correct responses. The mean difference between an FFR (95% CI: 4.9–5.6 s) and no FFR (95% CI: 4.76–5.35 s) is

0.2 s (difference Median: −0.03 s, IQR: 0.93 s). The decrease of the response time in conditions without an FFR

has a negligible effect size of dz = 0.18. The assumption of a normal distribution was rejected (Shapiro–Wilk test,

W = 0.93, p = 0.009), so we used a robust TOST procedure based around Welch’s paired t-test with equivalence

bounds of ΔL = −0.51s and ΔU = 0.51s. It reveals that the effect observed is statistically equivalent (t(43) = −1.83,

p = 0.037). An exploratory null hypothesis significance test (pairwise Welch’s t-test) returned nonsignificant

(t(43) = 1.15, p = 0.255).

Discussion

The blood oxygen saturation shows a slight decrease of 0.3% after wearing an FFR. This effect is not statistically

significant. Although some discussions against the use of facial masks argue that FFR would impair the body’s

oxygen supply, this is not supported by our findings. Instead, we found statistical equivalence and no difference

between the test conditions. The HRV metric (RMSSD) showed statistical equivalence when comparing the FFR

against the no-mask condition and no significant difference from each other. On a descriptive level, the HRV seems to

decrease slightly (not statistically significant) in the no-FFR condition.

When interpreting the HRV metrics as mental load indicators, the RMSSD typically drops if the participant

is more strained48. On a descriptive level, we find opposing results: the RMSSD indicates slightly more strain,

higher intensity load, and focus in the no-FFR condition. This underlines that the changes induced by the FFR

cause less variability than the HRV can interpret reasonably.

The subjective NASA-TLX ratings show that the participants perceived a statistically equivalent workload

between wearing an FFR and not wearing one. This result may come as a surprise: Because we did not include

a blinding protocol, participants were always fully aware of whether or not they were wearing an FFR. We did not explicitly tell

them about our research question before the experiment was over. However, some participants might have

figured out why they wore an FFR in one condition and not in the other (none of the participants indicated so). Nevertheless,

because we cannot rule out the possibility of the participants guessing our research question and perhaps even

being biased towards governmental pandemic restrictions, it remains possible to have recorded biased results.

For this very reason, it seems remarkable that the subjective TLX ratings show no evidence of favoring one of
