Scientific Reports | (2021) 11:19601 | https://doi.org/10.1038/s41598-021-99100-7
www.nature.com/scientificreports
The tiny effects of respiratory
masks on physiological, subjective,
and behavioral measures
under mental load in a randomized
controlled trial
Robert P. Spang* & Kerstin Pieper
Since the outbreak of the coronavirus disease (COVID-19), face coverings have been recommended to
diminish person-to-person transmission of the SARS-CoV-2 virus. Some public debates concern claims
regarding risks caused by wearing face masks, such as decreased blood oxygen levels and impaired
cognitive capabilities. The present, pre-registered study aims to contribute clarity by delivering
a direct comparison of wearing an N95 respirator and wearing no face covering. We focused on a
demanding situation to show that cognitive efficacy and individual states are equivalent in both
conditions. We conducted a randomized-controlled crossover trial with 44 participants. Participants
performed the task while wearing an N95 FFR versus wearing none. We measured physiological (blood
oxygen saturation and heart rate variability), behavioral (parameters of performance in the task),
and subjective (perceived mental load) data to substantiate our assumption as broadly as possible.
We analyzed data regarding both statistical equivalence and differences. All of the investigated
dimensions showed statistical equivalence given our pre-registered equivalence boundaries. None of
the dimensions showed a significant difference between wearing an FFR and not wearing an FFR.
Trial Registration: Preregistered with the Open Science Framework: https://osf.io/c2xp5 (15/11/2020).
Retrospectively registered with German Clinical Trials Register: DRKS00024806 (18/03/2021).
Throughout the COVID-19 pandemic, most countries quickly adopted – amongst others – face coverings as a
measure to protect the general public. Face coverings can be roughly categorized into face masks (including cloth
face coverings), surgical masks, and respirators. According to the FDA, face masks are coverings for the nose and
mouth and do not meet filtration efficiency levels (not intended for medical purposes). In contrast, surgical masks
meet several protection standards and are considered a medical device. However, their loose fit does not provide
complete protection from contaminants1. The tight-fitting filtering facepiece respirators (FFRs) such as N95
(US) provide specific filtration efficiencies (at least 95% of small (0.3-micron) particles) and thereby higher virus
protection1,2. Additionally, surgical masks and FFRs are disposable and should therefore be replaced regularly1.
Surgical masks and FFRs diminish person-to-person transmission of the SARS-CoV-2 virus3. By redirecting the
exhaled emissions, they cause aerosols to diffuse around the wearer's head4. This process reduces exposures (if other measures
such as a sufficient distance are adopted as well)5,6. The scientific background at present shows that N95 FFRs
without a valve also filter particles, droplets, and aerosols in the in- and exhaled air, which reduces the risk of
infection for the person wearing such an FFR, but also for the people next to them7 (protection factors of several
respirators can be found in ref. 8 and information about filter efficiency in ref. 9). Modeling the potential for wearing face
masks (including homemade cloth masks, surgical masks, and FFRs) demonstrated a drastic decrease in peak
hospitalizations and deaths, decreasing the SARS-CoV-2 virus's effective transmission rate10.
An alarming number of people worldwide question scientific findings and countermeasures against the
SARS-CoV-2 virus transmission11–14. An early Twitter analysis estimated that around 25% of all tweets regarding
the COVID-19 disease contain misinformation15. While susceptibility to misinformation seems elevated
through social media16, COVID-19 related misinformation is shared frequently due to failing to question the
Quality and Usability Lab, Institute of Software Engineering and Theoretical Computer Science, Electrical
Engineering and Computer Science, Technical University of Berlin, Berlin, Germany. *email: [email protected]
content’s truthfulness17. One such debated claim is a potential decline of cognitive performance. For example, one
article concludes that wearing facemasks has physiological and psychological consequences such as, among
others, a decline in cognitive performance18. This conclusion rests on a non-generalizable finding of reduced arterial partial
oxygen pressure that is unrelated to cognitive performance19. Moreover, the manuscript showed several limitations
and was, therefore, retracted20.
Our study aims to provide clarity and evidence against known myths. We investigated multiple dimensions
relevant to cognitive performance. We employ a widely acknowledged questionnaire for mental workload
(NASA-TLX21) as a subjective assessment. The objective measures are physiological values indicating blood
oxygen saturation (SpO2) and heart rate variability (HRV). Regarding the behavioral dimensions, we focus on the
number of correctly solved problems within the same time interval, the correctness and response times per trial.
Related work. Several studies investigated potential physical consequences or health risks caused by face
coverings. Several of them showed that wearing a nonmedical face mask does not lead to a decline in oxygen
saturation: no decline in older participants during minimal physical activity22, no effect on blood and muscle oxygenation
in healthy participants23, no effect on gas exchange during physical activity in either healthy participants or patients with
lung function impairment24, and no change in blood oxygen or heart rate during rest and a flight simulation
in healthy pilots wearing N95 FFRs25. There were also no differences in heart rate and blood oxygen parameters
in health care workers during a one-hour walk while wearing N95 masks26 or FFRs with low filter resistance27.
However, ref. 28 provides evidence for slightly decreased blood oxygen saturation while wearing N95 respirators in
patients with very severe COPD. In contrast, only slight differences in heart rate and pulmonary responses were found
in ref. 29. Perceptions of increased body heat most likely originate from warming of the inhaled air, and the facial
skin, skin, and core temperature were not affected by wearing an N95 FFR for more than an hour during physical
exercise30.
A subjective evaluation by surgeons reported hampered performance and increased surgical fatigue while
wearing FFP2 masks31. A decrease in blood oxygen saturation and an increase in pulse rate between measurements
taken before and after wearing masks were also reported32. Epstein et al. (2020)33 compared exercising with an
N95 FFR to exercising without one and found no significant differences regarding heart rate, respiratory rate,
blood pressure, oxygen saturation, or time to exhaustion. Solely end-tidal carbon dioxide (EtCO2) levels were increased
while wearing an FFR. Two other groups examined the physiological effects of exercising with N95 respirators during
pregnancy. Neither found changes in heart rate or blood oxygen levels (although diastolic pressure, mean
arterial pressure, and subjective exertion were affected)34,35.
An extensive review examined several studies on the influence of face masks (medical FFR and non-medical
face masks) on physiological parameters. The authors concluded that the effects are negligible and would probably
not impact healthy people even while exercising. However, persons with cardiopulmonary diseases might still
experience an effect36.
Deliberate misinformation often uses common knowledge to tell an allegedly fact-based story. Some social
media accounts connected heavier breathing while wearing an FFR with the false claim that FFRs reduce blood oxygen
saturation. Indeed, respiration behavior (amongst others, frequency and intensity, see ref. 37 for a review) changes
while wearing an FFR (especially during exercise), and the physical dead volume of the respiratory system causes
breathing to be more strenuous38. However, there is no evidence that wearing face masks (cloth/surgical masks
or FFR) causes the blood oxygen levels to diminish22–24,26,27,29. Nevertheless, the literature lacks investigations
tailored to quantify the impact of face masks, especially high filtering N95 FFR, on cognitive performance. We
contribute to this research to refute misinformation and address worries regarding a connection between cognitive
functioning and wearing N95 FFR.
Regarding our variables of interest, findings from Scholey et al. (1999)39 suggest that in the state of high cognitive
demand, the heart rate helps regulate the metabolism, increasing blood oxygen circulation and improving
cognitive performance. They showed that oxygen saturation and cognitive performance correlate with each other.
Chung et al. (2006)40 presented similar findings where hyperoxic air administration led to increased blood oxygen
saturation and improved accuracy in a verbal cognition task compared to regular air administration. In a different
study, the HRV was shown to be sensitive to varying levels of cognitive performance. A higher HRV amplitude
is suggested to contribute to a decrease in cognitive performance41. Mental stress (e.g., induced by mental
arithmetic) decreases the HRV, which is suggested to be a regulation process of the autonomic nervous system42.
Additionally, the HRV seems to be a sensitive indicator to discriminate between rest, physical load, and mental
load. In a study by Tealman et al. (2011), the combination of a physical task (computer mouse work) and a cognitive
task (complex arithmetic) showed a significant decrease in HRV features compared to the physical task alone43.
The Task Load Index was created to measure demand and the interaction of a subject performing a task21,44.
It has been frequently used in various fields such as human factors and provides a solid basis for assessing the perceived load.
Behavioral variables are commonly used to measure task difficulty and, thereby, workload. The performance
(e.g., measured as a number of solved/correct trials) is expected to decrease when workload reaches a certain
threshold45. The variation of the difficulty of a task can be indexed in a decrease of correct answers or even in no
responses, meaning it was too difficult to solve. At the same time, the duration for producing a response increases
if the task is more complex and thereby more mentally demanding than the one before46.
Our contribution. Given the body of evidence, we hypothesize equivalence of blood oxygen saturation
while wearing an N95 FFR compared to not wearing one. Further, we hypothesize equivalence of the cognitive
demand of the FFR and the no-FFR condition. We expect that the participants perform equally well in both test
conditions. In terms of behavioral data, we hypothesize equivalence between the conditions regarding the number
of correctly solved tasks, the ratio of correct responses to all tasks presented, the ratio of correct responses to
all responses given, the average response time, and the average response time of correct responses.
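The five behavioral measures named above can be written down as simple arithmetic over per-trial records. The following is a minimal sketch, not the authors' analysis code: the `Trial` record, its field names, and the function name are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional


@dataclass
class Trial:
    """Hypothetical per-trial record (field names are assumptions)."""
    answered: bool                    # did the participant respond at all?
    correct: bool                     # was the response correct?
    response_time_s: Optional[float]  # None if no response was given


def behavioral_metrics(trials):
    """Compute the five behavioral measures for one participant and condition."""
    answered = [t for t in trials if t.answered]
    correct = [t for t in answered if t.correct]
    return {
        # number of correctly solved tasks
        "n_correct": len(correct),
        # correct responses relative to all tasks presented
        "correct_per_presented": len(correct) / len(trials) if trials else None,
        # correct responses relative to all responses given
        "correct_per_answered": len(correct) / len(answered) if answered else None,
        # average response time over all responses
        "mean_rt": mean(t.response_time_s for t in answered) if answered else None,
        # average response time over correct responses only
        "mean_rt_correct": mean(t.response_time_s for t in correct) if correct else None,
    }
```

Because unanswered tasks count in the first ratio but not in the second, the ratio per presented task can never exceed the ratio per response given.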
In terms of physiological data, we assume similar behavior in both test conditions. The task to be performed
has a cognitive focus and is carried out under time pressure. We expect no physical exertion in the relaxed sitting
position. Thus, only cognitive demand could influence the physiological parameters as described in the
mentioned literature. Provided that cognitive demand is equal in both conditions (with and without an FFR),
we hypothesize equivalent results regarding participants’ HRV, SpO2, TLX scores, and task performance. This
study adheres to CONSORT guidelines.
Results
For all following Two One-Sided Tests (TOST) equivalence procedures, we employed equivalence boundaries
of dz = ±0.45. This smallest effect size of interest (SESOI) translates to the absolute values of the equivalence
boundaries reported in the following paragraphs. Figure 1 provides an overview of the TOST confidence intervals
and the null hypothesis significance tests, together with the equivalence boundaries.
For both HRV analyses, we had to exclude three datasets due to incomplete recordings. Hence, both are based
on data from 41 participants. Given the chosen alpha level of α = 0.05 and the pre-defined equivalence bounds
of dz = ±0.45, both HRV TOST results have a statistical power of 1−β = 0.78. All other tests are based on all 44
participants, resulting in statistical power of the TOSTs of 1−β = 0.82.
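The two power values can be approximated with the common normal-approximation formula for a paired TOST with symmetric bounds and a true effect of zero, power = 2Φ(dz·√n − z(1−α)) − 1. The sketch below assumes this formula; the text does not state which software or formula was used, and the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def tost_power_paired(n, dz_bound, alpha=0.05):
    """Approximate power of a paired TOST with symmetric bounds +/- dz_bound.

    Normal approximation, assuming the true effect is exactly zero.
    """
    z = NormalDist()  # standard normal distribution
    power = 2 * z.cdf(dz_bound * sqrt(n) - z.inv_cdf(1 - alpha)) - 1
    return max(power, 0.0)  # the approximation can go negative for tiny n


print(round(tost_power_paired(44, 0.45), 2))  # all 44 participants
print(round(tost_power_paired(41, 0.45), 2))  # HRV analyses, 41 participants
```

Under these assumptions the approximation yields about 0.82 for n = 44 and about 0.78 for n = 41, matching the values reported above.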
Physiological data. See Fig. 2 for a visualization of the blood oxygen saturation and the HRV measurement
(RMSSD) per condition. All result graphs share the same format and visualize different aspects of the group
comparison. First, we contrast the distributions of the two groups. For a precise understanding of outliers,
centers, and spread of the inner 50%, we then align box-plots. In addition, we underline the equality of the
group means by adding simple bar-plots with 95%-range whiskers.
Physiological: blood oxygen levels. The mean difference of blood oxygen level between wearing an FFR (95% CI:
96.04–97.64%) and not doing so (95% CI: 96.48–97.79%) immediately after performing the 15 min of mental
calculation is 0.3% (median difference: 0%, IQR: 2%). The increase of blood oxygen level without a mask has
a negligible effect size of dz = −0.12. A Shapiro–Wilk test indicated a violation of the assumption of normality
(W = 0.92, p = 0.004). Hence, we employed a robust TOST procedure using the Wilcoxon signed-rank test. To
compare the measurements of the two conditions, we define an equivalence interval. It is derived from our pre-defined
effect size of dz = ±0.45, which translates to ±0.736 in the units of the metric at hand (percent in this
case). Hence, the lower equivalence boundary is ΔL = −0.74% and the upper equivalence boundary is ΔU = 0.74%. The
TOST procedure reveals that the effect observed is statistically equivalent; the larger of the two p values is less
than α = 0.05 (V = 682, p = 0.014). According to the Neyman-Pearson approach, this means that one can reject the
hypothesis that the true effect lies outside the bounds of dz = ±0.45 and act as if the effect size falls within these equivalence
bounds47. According to our pre-registration, we additionally ran an exploratory null hypothesis significance test.
A pairwise Wilcoxon signed-rank test returned nonsignificant (V = 154, p = 0.259). Hence, the H0 of no difference
between groups is not rejected.
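The translation from the standardized bound dz = ±0.45 to raw units can be illustrated as follows. This is a sketch under the usual definition of dz (mean of the paired differences divided by their standard deviation), so that the raw bound equals dz times that standard deviation. The function names and the example values are hypothetical and are not the study's data.

```python
from statistics import mean, stdev


def cohens_dz(with_ffr, without_ffr):
    """Standardized paired effect size: mean difference / SD of the differences."""
    diffs = [a - b for a, b in zip(with_ffr, without_ffr)]
    return mean(diffs) / stdev(diffs)


def raw_equivalence_bounds(with_ffr, without_ffr, dz_bound=0.45):
    """Convert a standardized bound into the units of the measured metric."""
    diffs = [a - b for a, b in zip(with_ffr, without_ffr)]
    delta = dz_bound * stdev(diffs)
    return -delta, delta


# Hypothetical SpO2 readings in percent for four participants.
spo2_ffr = [97, 96, 98, 95]
spo2_no_ffr = [96, 97, 97, 97]
print(cohens_dz(spo2_ffr, spo2_no_ffr))
print(raw_equivalence_bounds(spo2_ffr, spo2_no_ffr))
```

Read backwards under the same assumption, the reported bound of ±0.736 corresponds to a standard deviation of the paired differences of roughly 0.736 / 0.45 ≈ 1.64 percentage points.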
Figure 1. Equivalence boundaries (dotted lines left and right), mean of the mask / no-mask difference
(diamond) and the 95% confidence interval (thin line; for the null hypothesis significance test), as well as the
90% confidence interval for the TOST (thick line). The x-axis shows the mean difference in the unit of the
metric.
Physiological: heart rate variability (RMSSD). The mean difference of RMSSD between wearing an FFR (95%
CI: 28.81–58.6 ms) and not wearing one (95% CI: 29.27–44.69 ms) in the last five minutes of each condition is
6.73 ms (median difference: 1.63 ms, IQR: 11.92 ms). The decrease of the RMSSD without a mask has a negligible
effect size of dz = 0.15. A Shapiro–Wilk test indicated a violation of the assumption of normality (W = 0.38,
p < 0.001); hence, we employed a robust TOST procedure using the Wilcoxon signed-rank test, with equivalence
bounds of ΔL = −16.06 ms and ΔU = 16.06 ms. It reveals that the effect observed is statistically equivalent; the
larger of the two p values is less than α = 0.05 (V = 56, p < 0.001). We additionally ran an exploratory null hypothesis
significance test. A pairwise Wilcoxon signed-rank test returned nonsignificant (V = 519, p = 0.257).
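For readers unfamiliar with the metric, RMSSD is the root mean square of successive differences between adjacent heartbeat (RR) intervals. A minimal sketch of this standard definition follows; the function name and the example intervals are illustrative assumptions, and the recording and preprocessing pipeline used in the study is not shown here.

```python
from math import sqrt


def rmssd(rr_intervals_ms):
    """Root mean square of successive differences of RR intervals (in ms)."""
    if len(rr_intervals_ms) < 2:
        raise ValueError("RMSSD needs at least two RR intervals")
    # differences between each interval and the one that follows it
    diffs = [b - a for a, b in zip(rr_intervals_ms, rr_intervals_ms[1:])]
    return sqrt(sum(d * d for d in diffs) / len(diffs))


# Hypothetical RR intervals in milliseconds.
print(rmssd([800, 810, 790, 805]))
```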
Subjective data. To investigate the NASA-TLX scores, we first computed the difference between post-task
and baseline ratings. The mean difference of these scores is 0.01 (median difference: 0.1, IQR: 2.1, see Fig. 3).
The decrease of the TLX score without an FFR (95% CI: 7.66–9.95) has a negligible effect size of dz = −0.002
(95% CI of the mask condition: 7.74–9.85). The assumption of a normal distribution was not rejected (W = 0.99,
p = 0.919), so we used a TOST procedure based on Welch’s paired t-test with equivalence bounds ΔL = −0.78
and ΔU = 0.78. It reveals that the effect observed is statistically equivalent; the larger of the two p values is less
than α = 0.05 (t(43) = 2.96, p = 0.003). An exploratory null hypothesis significance test (pairwise Welch’s t-test)
returned nonsignificant (t(43) = −0.03, p = 0.977).
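The logic of a t-test-based TOST on paired data can be sketched as below. This is a generic textbook version using only the Python standard library, not the authors' implementation: the function names are hypothetical, the bounds are derived from dz = ±0.45 as described earlier, and the t-distribution tail probability is obtained by simple numerical integration rather than a statistics package.

```python
from math import gamma, pi, sqrt
from statistics import mean, stdev


def t_sf(t, df, steps=2000):
    """Upper-tail probability P(T > t) of Student's t (Simpson's rule)."""
    c = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))

    def pdf(x):
        return c * (1 + x * x / df) ** (-(df + 1) / 2)

    h = abs(t) / steps  # steps must be even for Simpson's rule
    area = pdf(0.0) + pdf(abs(t))
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * pdf(i * h)
    area *= h / 3  # integral of the density from 0 to |t|
    p = 0.5 - area if t >= 0 else 0.5 + area
    return min(1.0, max(0.0, p))


def tost_paired(x, y, dz_bound=0.45, alpha=0.05):
    """Two one-sided paired t-tests against symmetric equivalence bounds."""
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    m, sd = mean(diffs), stdev(diffs)
    se = sd / sqrt(n)
    low, high = -dz_bound * sd, dz_bound * sd
    p_low = t_sf((m - low) / se, n - 1)        # H0: true difference <= low
    p_high = 1 - t_sf((m - high) / se, n - 1)  # H0: true difference >= high
    p = max(p_low, p_high)                     # report the larger p value
    return {"bounds": (low, high), "p": p, "equivalent": p < alpha}
```

Equivalence is declared only if both one-sided tests reject, which is why the larger of the two p values is the one compared against α.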
Figure2. Comparison of the physiological metrics (blood oxygen level and HRV) while wearing an FFR and
not wearing an FFR. The density plots to the left describe the similarity of the distributions of the two groups.
The box-plots in the center column compare the median and the interquartile range (IQR) and provide an
assessment of potential outliers. The bar charts to the right compare the plain mean of the two group; the
whiskers depict the inner 95% of the recorded data.
Figure3. Comparison of the subjective load ratings (NASA TLX) while wearing an FFR and not wearing
an FFR. While the distribution reveals minor differences between the groups, these are averaged out when
comparing mean and median values.
Behavioral data. See Fig. 4 for a visualization of the following five behavioral performance measures per condition.
Behavioral: correct responses. The mean difference between the number of correct responses while wearing
an FFR (95% CI: 79.13–97.37) and while not wearing one (95% CI: 82.38–98.39) is −2.14 (median difference:
3.5, IQR: 24.5). The increase of correct responses in the condition without an FFR has a negligible effect
size of dz = −0.08. The assumption of a normal distribution was rejected (Shapiro–Wilk test, W = 0.94, p = 0.015),
so we used a robust TOST procedure based on the Wilcoxon signed-rank test with equivalence bounds
of ΔL = −8.89 and ΔU = 8.89. It reveals that the effect observed is statistically equivalent (V = 680, p = 0.016). An
exploratory null hypothesis significance test (pairwise Wilcoxon signed-rank test) returned nonsignificant
(V = 496, p = 0.995).
Behavioral: ratio correct responses/all tasks. We investigate the ratio of correct responses against the number
of all tasks presented. The mean difference between an FFR and no FFR is nearly zero
(median difference: 0.01, IQR: 0.09). The effect induced by the FFR (95% CI: 0.55–0.64) is negligible (dz = 0.03,
95% CI of the no-FFR condition: 0.55–0.63). The assumption of a normal distribution was not rejected (W = 0.96,
p = 0.133), so we used a TOST procedure based on Welch’s paired t-test with equivalence bounds ΔL = −0.04
and ΔU = 0.04. It reveals that the effect observed is statistically equivalent; the larger of the two p values is less
than α = 0.05 (t(43) = −2.64, p = 0.005). An exploratory null hypothesis significance test (pairwise Welch’s t-test)
returned nonsignificant (t(43) = 0.35, p = 0.728).
Behavioral: ratio correct responses/responses given. Next, we investigate the ratio of correct responses against
the number of all responses given (correct and incorrect).
The mean difference between FFR and no FFR is nearly zero (median difference: 0.01, IQR: 0.1). The effect
induced by the FFR (95% CI: 0.67–0.78) is negligible (dz = −0.01, 95% CI of the no-FFR condition: 0.68–0.78).
The assumption of a normal distribution was not rejected (W = 0.98, p = 0.524), so we used a TOST procedure
based on Welch’s paired t-test with equivalence bounds of ΔL = −0.04 and ΔU = 0.04. It reveals that the effect
observed is statistically equivalent; the larger of the two p values is less than α = 0.05 (t(43) = 2.92, p = 0.003).
An exploratory null hypothesis significance test (pairwise Welch’s t-test) returned nonsignificant (t(43) = −0.06,
p = 0.950).
Behavioral: mean response time. The mean difference of the average response time between wearing an FFR
and not wearing one is 0.29 s (median difference: 0.05 s, IQR: 1.46 s). The decrease of the response time in the condition without
an FFR (95% CI: 5.06–5.82 s) has a small effect size of dz = 0.21 (95% CI of the FFR condition: 5.29–6.17 s). The
assumption of a normal distribution was rejected (Shapiro–Wilk test, W = 0.91, p = 0.002), so we used a robust
TOST procedure based on the Wilcoxon signed-rank test with equivalence bounds of ΔL = −0.63 s and
ΔU = 0.63 s. It reveals that the effect observed is statistically equivalent (V = 329, p = 0.026). An exploratory null
hypothesis significance test (pairwise Wilcoxon signed-rank test) returned nonsignificant (V = 529, p = 0.699).
Behavioral: mean response time of correct responses. Lastly, we investigate the average response time of only
correct responses. The mean difference between an FFR (95% CI: 4.9–5.6 s) and no FFR (95% CI: 4.76–5.35 s) is
0.2 s (median difference: 0.03 s, IQR: 0.93 s). The decrease of the response time in the condition without an FFR
has a negligible effect size of dz = 0.18. The assumption of a normal distribution was rejected (Shapiro–Wilk test,
W = 0.93, p = 0.009), so we used a robust TOST procedure based on Welch’s paired t-test with equivalence
bounds of ΔL = −0.51 s and ΔU = 0.51 s. It reveals that the effect observed is statistically equivalent (t(43) = −1.83,
p = 0.037). An exploratory null hypothesis significance test (pairwise Welch’s t-test) returned nonsignificant
(t(43) = 1.15, p = 0.255).
Discussion
The blood oxygen saturation shows a slight decrease of 0.3% after wearing an FFR. This effect is not statistically
significant. Although some discussions against the use of facial masks argue that FFRs would impair the body’s
oxygen supply, this is not supported by our findings. Instead, we found statistical equivalence and no significant difference
between the test conditions. The HRV metric (RMSSD) likewise showed statistical equivalence between the FFR
and the no-mask condition and no significant difference. On a descriptive level, the HRV seems to decrease slightly
(not statistically significant) in the no-FFR condition.
When interpreting the HRV metrics as mental load indicators, the RMSSD typically drops if the participant
is more strained48. On a descriptive level, we find opposing results: the RMSSD indicates slightly more strain,
higher intensity load, and focus in the no-FFR condition. This underlines that the changes induced by the FFR
are smaller than what the HRV can reasonably resolve.
The subjective NASA-TLX ratings show that the participants perceived a statistically equivalent workload
between wearing an FFR and not wearing one. This result may come as a surprise: because we did not include
a blinding protocol, participants were always fully aware of whether or not they were wearing an FFR. We did not explicitly tell
them about our research question before the experiment was over. However, some participants might have
figured out why they wore an FFR in one condition and not in the other (although none of the participants indicated so). Nevertheless,
because we cannot rule out the possibility of the participants guessing our research question and perhaps even
being biased towards governmental pandemic restrictions, it remains possible that we recorded biased results.
For this very reason, it seems remarkable that the subjective TLX ratings show no evidence of favoring one of