NeuroImage 256 (2022) 119217
Contents lists available at ScienceDirect
NeuroImage
journal homepage: www.elsevier.com/locate/neuroimage
Seeing a talking face matters: The relationship between cortical tracking of continuous auditory-visual speech and gaze behaviour in infants, children and adults ✩
S.H. Jessica Tan a,∗, Marina Kalashnikova b,c, Giovanni M. Di Liberto d, Michael J. Crosse e, Denis Burnham a
a The MARCS Institute for Brain, Behaviour and Development, Western Sydney University, Australia
b The Basque Center on Cognition, Brain and Language, Australia
c IKERBASQUE, Basque Foundation for Science, Australia
d School of Computer Science and Statistics, Trinity College Dublin, Dublin, Ireland
e Department of Mechanical, Manufacturing and Biomedical Engineering, Trinity Centre for Biomedical Engineering, Trinity College Dublin, Dublin, Ireland
ARTICLE INFO
Keywords:
Auditory-visual speech benefit
Cortical tracking
Gaze behaviour
Auditory-visual speech perception
Infants
Children
Adults
ABSTRACT
An auditory-visual speech benefit, the benefit that visual speech cues bring to auditory speech perception, is experienced from early on in infancy and continues to be experienced to an increasing degree with age. While there is both behavioural and neurophysiological evidence for children and adults, only behavioural evidence exists for infants, as no neurophysiological study has provided a comprehensive examination of the auditory-visual speech benefit in infants. It is also surprising that most studies on auditory-visual speech benefit do not concurrently report looking behaviour, especially since the auditory-visual speech benefit rests on the assumption that listeners attend to a speaker’s talking face and that there are meaningful individual differences in looking behaviour. To address these gaps, we simultaneously recorded electroencephalographic (EEG) and eye-tracking data of 5-month-olds, 4-year-olds and adults as they were presented with a speaker in auditory-only (AO), visual-only (VO), and auditory-visual (AV) modes. Cortical tracking analyses that involved forward encoding models of the speech envelope revealed that there was an auditory-visual speech benefit [i.e., AV > (A + V)], evident in 5-month-olds and adults but not 4-year-olds. Examination of cortical tracking accuracy in relation to looking behaviour showed that infants’ relative attention to the speaker’s mouth (vs. eyes) was positively correlated with cortical tracking accuracy of VO speech, whereas adults’ attention to the display overall was negatively correlated with cortical tracking accuracy of VO speech. This study provides the first neurophysiological evidence of auditory-visual speech benefit in infants and our results suggest ways in which current models of speech processing can be fine-tuned.
1. Introduction
When listening to a speaker talk face-to-face, we process visual speech cues as well as the predominant auditory signal. These visual speech cues come from facial movements that occur in tandem with acoustic speech and can provide additional information that augments speech perception both in quiet (e.g., Fort et al., 2013; Navarra and Soto-Faraco, 2007) and in noise (e.g., Moradi et al., 2013; Rudmann et al., 2003; Schwartz et al., 2004; Sumby and Pollack, 1954). The augmentation of speech perception by visual speech cues, or the auditory-visual speech benefit, has been widely studied. Most of these studies have been conducted with adults, but findings from studies with children and infants suggest that they too benefit from visual speech information, even though the degree of auditory-visual speech benefit increases with age. The studies reported here concern the auditory-visual speech benefit in 5-month-old infants, 4-year-old children and adults.
✩ This research was funded by a doctoral scholarship to the first author funded by the MARCS Institute at Western Sydney University and the HEARing Cooperative Research Centre (CRC), and by HEARingCRC funding to the last author. The second author’s work is supported by the Basque Government through the BERC 2018–2021 program, and PIBA PI-2019–0054, and by the Spanish Ministry of Science and Innovation through the Ramon y Cajal Research Fellowship, PID2019–105528GA-I00.
∗ Corresponding author.
E-mail address: [email protected] (S.H. Jessica Tan).
Behavioural studies provide evidence of an auditory-visual speech benefit across ages. For instance, 7.5-month-olds successfully segmented words from a fluent speech stream that was blended with a background voice when the auditory stimuli were paired with videos of a speaker’s talking face, but not when they were paired with a still image of the speaker’s face (Hollich et al., 2005). Studies with children and adults found that children identified phonemes and words better in the auditory-visual modality compared to the auditory-only modality, and that this benefit is evident both in quiet (Lalonde and Holt, 2015) and in noise (Lalonde and Holt, 2016; Maidment et al., 2015; Ross et al., 2011). Additionally, comparisons between children and adults revealed that adults experienced greater auditory-visual speech benefit (Maidment et al., 2015; Ross et al., 2011).
https://doi.org/10.1016/j.neuroimage.2022.119217
Received 7 November 2021; Received in revised form 9 April 2022; Accepted 14 April 2022
Available online 15 April 2022.
1053-8119/© 2022 Published by Elsevier Inc. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)
The same developmental trend has been found in neurophysiological studies with children and adults. Knowland et al. (2014) presented 6- to 11-year-olds and adults with auditory-visual words and with auditory-only words. Both the children and the adults showed attenuated amplitude and shorter latencies of the auditory P2 event-related potential (ERP) component to auditory-visual compared to auditory-only words, but the adults additionally showed attenuated amplitude and shorter latencies of N1 to auditory-visual compared to auditory-only stimuli. Together these results suggest that visual speech modulation of auditory ERP components is present, yet not fully developed, in children (Knowland et al., 2014).
Other ERP studies have measured speech perception in auditory-visual vs auditory-only and visual-only speech in terms of integration rather than enhancement. The criterion for auditory-visual integration is based on the relative magnitude of neural responses to auditory-visual (AV) stimuli compared with the summation of neural responses to auditory-only (A) and visual-only (V) stimuli [i.e., by testing whether AV = (A + V), no integration, or whether AV > (A + V), integration]. Using this method, Kaganovich and Schumaker (2014) revealed that peak amplitudes of N1 and P2, and the latency of P2, were attenuated in auditory-visual compared to the algebraic sum of ERP responses to auditory-only and visual-only /ba/, /da/, and /ga/ syllables in 7–8-year-olds, 10–11-year-olds, and adults, thereby indicating auditory-visual integration at all three ages. In a separate study, adult participants showed a significantly shorter latency of the auditory N1/P2 response peak when presented with /ka/, /pa/, and /ta/ in auditory-visual syllables than in auditory-only or visual-only syllables (van Wassenhove et al., 2005).
The same integration approach has not been used with infants; rather, the majority of the electrophysiological studies of auditory-visual speech perception in infants have involved the comparison of neural responses (in the form of ERPs) to congruent versus incongruent auditory-visual syllables (Bristow et al., 2009; Kushnerenko et al., 2008, 2013) and short phrases (Hyde et al., 2011; Reynolds et al., 2013). For example, Kushnerenko et al. (2008) examined 5-month-olds’ neural processing of conflicting auditory-visual syllables that typically result in the McGurk effect. Congruent stimuli consisted of auditory-visual /ba/ and auditory-visual /ga/ while incongruent stimuli consisted of the McGurk effect stimuli (auditory /ba/ dubbed onto a visual /ga/ which usually results in a “da” or “ða” response) and a conflicting stimulus (auditory /ga/ dubbed onto a visual /ba/ which usually results in a combination, “bga”, response). The ERPs in response to the conflicting stimulus were more positive over frontal areas and more negative over temporal areas compared to ERPs in response to the other stimulus types, suggesting that 5-month-olds detected the mismatch between the auditory /ga/ and visual /ba/ but integrated the auditory /ba/ and visual /ga/, treating it the same as they did for the integration of congruent auditory-visual stimuli. Similar findings were reported in a study that used short phrases. Hyde et al. (2011) presented 5-month-olds with an auditory recording of the phrase, “Oh, hi baby”, that was either paired with a matched video of a face saying the same phrase or a mismatched video of a face saying a different phrase. Mean amplitudes of the visual N1 and attentional Nc components were more negative in the asynchronous than the synchronous condition, while mean amplitude of the auditory P2 component was more positive in the synchronous than the asynchronous condition. Although these infant ERP studies provide some neural level evidence of auditory-visual integration by comparing neural responses to congruent versus incongruent auditory-visual stimuli, they did not include auditory-only and visual-only conditions and so do not truly quantify auditory-visual integration and, in addition, do not afford comparison with the modulating effect of visual information found in children and adults.
Beyond electrophysiological studies, the hemodynamic (fNIRS) approach has been used to investigate infants’ processing of auditory-visual speech (Altvater-Mackensen and Grossman, 2016; 2018). The neural responses of six-month-old German-learning infants were enhanced in the left inferior frontal regions when they were presented with matched auditory-visual speech as compared to when they were presented with mismatched auditory-visual speech (Altvater-Mackensen and Grossman, 2016). A separate study compared infants’ processing of unimodal auditory, visual, and multimodal auditory-visual speech at the neural level by presenting six-month-old German-learning infants with unimodal and multimodal speech stimuli /a/, /e/, and /o/ (Altvater-Mackensen and Grossman, 2018). This study revealed that the infant participants did not show differential responses to unimodal and multimodal speech within the frontal regions and between hemispheres.
Taken together, ERP studies with adults and children illustrate that auditory-visual integration occurs at a neural level and suggest that visual speech information is beneficial to speech perception. In contrast, infant ERP studies demonstrate only the detection of a mismatch between auditory and visual stimuli, and do not show whether visual speech information augments infants’ speech perception, i.e., whether there is an auditory-visual speech benefit. The fNIRS approach used with infants did not find any difference in neural responses to unimodal or multimodal speech within frontal regions. In addition to the paucity of studies investigating auditory-visual speech benefit in infants, a major drawback of these studies in general is that, in order to evoke brain responses, they require presenting participants with multiple repetitions of identical short stimuli which are averaged and then compared between conditions. In the case of auditory-visual speech perception, this comprises the use of syllables or short phrases, stimuli that are not entirely representative of natural, conversational speech.
A recent approach addresses this drawback by assessing cortical tracking, or the mathematical relationship between the speech dynamics and the corresponding brain responses (e.g., Ding and Simon, 2012; Fiedler et al., 2019; Golumbic et al., 2013; Gross et al., 2013; J. O’Sullivan et al., 2014). This approach has greater ecological validity than ERP approaches, as it allows the use of continuous stimuli rather than discrete, repeated stimuli, e.g., rather than single words, passages that more closely resemble natural speech, such as audiobooks or podcasts. Accordingly, this method has been increasingly used to examine auditory-only speech perception in adults (e.g., Ding and Simon, 2013; Ding et al., 2016), children (Di Liberto, Peter, et al., 2018; Vander Ghinst et al., 2019), and infants (e.g., Jessen et al., 2019; Kalashnikova et al., 2018). Even so, the few studies conducted with adults so far suggest that cortical tracking is augmented when visual speech information from a speaker’s talking face is provided (e.g., Crosse et al., 2015; Crosse et al., 2016; O’Sullivan et al., 2019). Importantly, although there is evidence that cortical tracking of speech can be reliably measured in children and infants, whether cortical tracking of auditory-visual speech is enhanced in children and infants remains an open question, one that this paper will address.
The auditory-visual speech benefit rests upon the assumption that listeners attend to a speaker’s facial movements. It is thus somewhat surprising that most auditory-visual speech perception studies do not concurrently examine participants’ looking behaviour to the speaker’s face (although see Foxe et al., 2015). It has been shown that while the eyes convey emotional and social information, the mouth translates information closely related to the temporal and acoustic properties of speech (Yehia et al., 1998). Face viewing studies indicate that humans are cognisant of the various types of information that different facial features provide and will shift their gaze from one facial region to another accordingly (e.g., Buchan et al., 2008; Lansing and McConkie, 1999). This attentional shift is observed even in infants as young as 6 months (Tenenbaum et al., 2013). In addition, idiosyncratic differences between individuals in facial scanning patterns of the eye and mouth regions are related to perceptual performance (Gurler et al., 2015; Mehoudar et al., 2014; Peterson and Eckstein, 2012). For instance, Gurler et al. (2015) found that individuals who report experiencing the McGurk effect more frequently also spend a larger proportion of time fixating on the speaker’s mouth. This finding points toward the strong likelihood that individuals’ idiosyncratic preferences in their fixation to the speaker’s mouth or eyes will influence the extent to which visual speech information augments their speech perception.
Interindividual variations in looking behaviour to the speaker’s face may result in subtle but significant differences in speech perception. For example, the opening and closing of the mouth corresponds to the syllabic timescale of auditory speech (Chandrasekaran et al., 2009), thus providing the richness of redundant cues relating to the start and end points of syllables that may augment speech perception, especially for listeners who fixate on the speaker’s mouth region. This pertains particularly to young infants in normal listening conditions because they are just beginning to acquire a language system. In this regard, Lewkowicz and Hansen-Tift (2012) provided evidence of a developmental trend in looking behaviour: infants move away from preferential attention to the speaker’s eye region to attending more to the speaker’s mouth region sometime between 4 and 8 months, and then back to attending more to the speaker’s eye region by 12 months of age. As this pattern coincides with the developmental timeline of speech production (Imafuku et al., 2019), the researchers propose that the initial eye-to-mouth attentional shift reflects infants’ attempt to extract the redundant cues present in auditory-visual speech while the second attentional shift converges with adults’ looking behaviour to a talking face and suggests some level of language expertise that reduces the need to focus specifically on the speaker’s mouth (Lewkowicz and Hansen-Tift, 2012). Notably, relative attention to a talker’s mouth at 6 months is positively related to expressive language skills both then (Tsang et al., 2018) and at 18 months (Young et al., 2009), and to receptive vocabulary at 12 months (Imafuku and Myowa, 2016). Failure to attend to the speaker’s mouth is associated with later language learning disorders (Pons et al., 2019). Adults, by comparison, are proficient language users and instead focus more on the talker’s eye region under optimal listening conditions but will increasingly direct their attention to the talker’s mouth as listening situations become more challenging, such as when there is background noise (e.g., Buchan et al., 2008; Stacey et al., 2020; Vatikiotis-Bateson et al., 1998). These findings raise the possibility that individuals’ idiosyncratic differences in looking patterns to a talking face will influence the degree of auditory-visual speech benefit experienced. Investigating whether this is indeed the case forms the second aim of this study.
1.1. This study and the hypotheses
To examine whether cortical tracking of auditory-visual speech is enhanced in infants and children, and whether gaze behaviour modulates the extent of auditory-visual speech benefit, EEG and gaze data were simultaneously recorded as 5-month-old and 4-year-old participants watched short clips of a speaker in auditory-only (AO), visual-only (VO), and auditory-visual (AV) presentation modes. AO presentations consisted of still photos of the speaker’s face paired with auditory recordings, VO presentations consisted of silent videos of the speaker talking, and AV presentations consisted of both the videos and the auditory recordings. As this paradigm has been used previously with adult participants (Crosse et al., 2015; Crosse et al., 2016a, 2016b), a group of adults was tested as a control.
Behavioural studies illustrate that the auditory-visual speech benefit is evident across development. Neurophysiological studies show the same for children (using ERPs) and adults (using ERPs and cortical tracking), while none have yet directly examined the auditory-visual speech benefit in infants. Even so, ERP studies with infants that investigated their detection of auditory-visual asynchrony, coupled with behavioural findings, suggest that the auditory-visual speech benefit may also be evident at the neurophysiological level in infants.
With these considerations in mind, we hypothesise that, across the three age groups, (1) cortical tracking of the speech envelope will be most accurate during AV presentations, followed by AO then VO presentations, and (2) auditory-visual speech benefit will be evident as indexed by the additive criterion [i.e., AV > (A + V)]. Next, facial scanning and speech perception findings suggest that gaze behaviour may modulate cortical tracking accuracy differently for infants compared to children and adults. At five months, infants are likely to be in the process of shifting their attentional focus from the speaker’s eyes to the speaker’s mouth region (Lewkowicz and Hansen-Tift, 2012; Pons et al., 2015). Furthermore, 5-month-olds are in the process of acquiring language and may benefit from any additional information that can be extracted from visual speech cues. Accordingly, we hypothesise that the proportion of time that infants spend attending to the speaker’s mouth will be positively correlated with cortical tracking accuracy when visual speech information is available, i.e., during VO and AV presentations. On the other hand, the same positive correlation is not expected for 4-year-olds and adults, given previous findings that older children and adults focus more on the speaker’s eyes when the auditory speech signal is clear (e.g., Lewkowicz and Hansen-Tift, 2012), presumably because the acoustic properties from the auditory signal are sufficient for speech perception and they turn to the eyes to seek out emotional and social information that may not be conveyed as clearly by auditory speech.
2. Methods
2.1. Participants
Five-month-olds: A final sample of eighteen 5-month-old infants from Australian English monolingual backgrounds was included (M age = 5.49 months, SD = 0.30 months, 8 females). This sample size was decided upon by drawing on previous neurophysiological studies that investigated infant neural processing of AV asynchrony (e.g., Hyde et al., 2011; Kushnerenko et al., 2008; Reynolds et al., 2013) and compared children’s and adults’ neural processing of AV speech (e.g., Kaganovich and Schumaker, 2014; Knowland et al., 2014). An additional 20 babies were tested but excluded because of fussiness (n = 6), excessively noisy EEG recordings (n = 11), or insufficient gaze data (n = 3). The attrition rate in this study is not uncommon for infant EEG studies (e.g., deBoer et al., 2007; Hyde et al., 2011; Reynolds et al., 2013). All infants came from a monolingual Australian English-speaking background.
Four-year-olds: A final sample of 19 Australian English monolingual 4-year-olds was included (M age = 4.16 years, SD = 0.14 years, 12 females). An additional 14 children were tested but excluded because they were very fidgety and did not complete the experiment (n = 5), had excessively noisy EEG recordings (n = 3), or had insufficient gaze data (n = 7).
Adults: A final sample of 18 Australian English monolingual adults aged between 18 and 56 years was included (M age = 23.42 years, SD = 8.75 years, 15 females). An additional eight adults were tested but excluded because seven had insufficient gaze data, and one experienced technical failure.
All infants and children were born full-term, not at-risk for any cognitive or language delay, with normal hearing and vision, and no history of ear infections. Prior to the study, their parents provided written informed consent, were briefed about the procedure and told that the session would terminate immediately if they wished so, or if their child showed any signs of distress during the session. All adult participants had self-reported normal hearing and normal or corrected-to-normal vision, were free of neurological diseases, and provided written informed consent. Adult participants took part in this study as part of a Psychology course requirement and received research participation points. This study was approved by the Human Research Ethics Committee at Western Sydney University (approval number H11517). The approved protocol regarding participant recruitment, data collection and data management was adhered to.
For all groups of participants, noisy EEG recordings were defined as datasets that contain more than 20 bad channels, as in previous infant studies (e.g., Kalashnikova et al., 2018). Additionally, for analysis purposes, participants were required to have at least 10 out of 30 common trials across the three conditions (auditory-only, visual-only, and auditory-visual) with a minimum of 15% attention (as calculated by $\mathit{attention} = \frac{\text{total fixation duration to screen during trial}}{\text{trial duration}}$) to be included in the final sample. The exclusion criterion for attention (at least 15% attention in a minimum of 10 common trials) was decided upon because previous eye-tracking studies with young infants have used similar exclusion criteria (e.g., 15% in LoBue et al., 2016; 20% in Taylor & Herbert, 2013). As infant EEG studies have a typical attrition rate of 50–75% (deBoer et al., 2007), the lower bound of 15% attention was chosen to reduce further data loss. The mean numbers of trials (per condition) included in the analyses were 15.83 for infants, 21.26 for 4-year-olds, and 25.61 for adults. The mean levels of attention across conditions were 56.24% for infants, 62.66% for 4-year-olds, and 79.95% for adults.
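The inclusion rule above can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' code: the function names, the data layout, and the reading of "common trials" as passages that meet the attention threshold in all three conditions are assumptions made here.

```python
from typing import Dict, List

MIN_ATTENTION = 0.15       # at least 15% of the trial spent fixating the screen
MIN_COMMON_TRIALS = 10     # out of the 30 passages
CONDITIONS = ("AO", "VO", "AV")


def attention(fixation_s: float, trial_s: float) -> float:
    """Proportion of the trial duration spent fixating the screen."""
    return fixation_s / trial_s


def common_trials(fixations: Dict[str, Dict[int, float]],
                  durations: Dict[int, float]) -> List[int]:
    """Return passage ids that meet the attention threshold in all conditions.

    fixations[condition][passage_id] is the total fixation duration (s);
    durations[passage_id] is the trial duration (s). Both layouts are
    hypothetical and chosen only for this illustration.
    """
    keep = []
    for pid, dur in durations.items():
        if all(pid in fixations[c]
               and attention(fixations[c][pid], dur) >= MIN_ATTENTION
               for c in CONDITIONS):
            keep.append(pid)
    return sorted(keep)


def include_participant(fixations: Dict[str, Dict[int, float]],
                        durations: Dict[int, float]) -> bool:
    """Apply the '10 common trials with at least 15% attention' rule."""
    return len(common_trials(fixations, durations)) >= MIN_COMMON_TRIALS
```

A participant with only nine passages passing in all three conditions would be excluded under this reading, even if attention was high in two of the conditions.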
2.2. Stimuli
Audiovisual recordings of 30 short speech passages were made by a female native speaker of Australian English experienced in producing infant-directed speech (see Appendix A for transcripts). To allow for infants’ limited attention span these passages were relatively short, but long enough to ensure an amount of EEG recording that was sufficient for analyses (Crosse et al., 2021). These speech passages were adapted from Richoz et al. (2017) or from recordings of infant-directed speech between mothers and their babies and varied in duration from 8.44 s to 16.35 s (M = 11.35 s, SD = 1.76 s). The recordings consisted of a close-up of the speaker’s face and shoulders against a white background. There were three presentation modes, auditory-only (AO), visual-only (VO) and auditory-visual (AV), with the unimodal auditory and visual recordings extracted separately from the auditory-visual recordings. In the auditory-only condition, a still image of the speaker’s resting face was shown on the screen as the auditory track was played. In the visual-only condition, the dynamic video of the speaker’s talking face was presented in silence. In the auditory-visual condition, both the dynamic video and its soundtrack were played together. The auditory recordings have a sampling rate of 44.1 kHz and a 16-bit resolution. The 30 speech passages were presented in three blocks. Each block consisted of 10 speech passages that were presented once in each modality. Presentation order was randomised across modalities in such a manner that the same sentence did not appear in two modalities on consecutive trials.
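One way to obtain such a constrained random order is rejection sampling: shuffle the 30 trials of a block and redraw until no passage occurs on two consecutive trials. The sketch below is an assumption about how this could be done; the paper does not state the procedure, and the function name and arguments are hypothetical.

```python
import random

MODES = ("AO", "VO", "AV")


def randomise_block(passages, rng, max_tries=10000):
    """Shuffle (passage, mode) trials until no passage repeats back to back."""
    trials = [(p, m) for p in passages for m in MODES]
    for _ in range(max_tries):
        rng.shuffle(trials)
        # Accept the order only if neighbouring trials use different passages.
        if all(a[0] != b[0] for a, b in zip(trials, trials[1:])):
            return list(trials)
    raise RuntimeError("no valid order found within max_tries")


# Example: one block of 10 passages, seeded for reproducibility.
order = randomise_block(list(range(10)), random.Random(0))
```

Rejection sampling is simple and unbiased over the valid orders; for a block of this size a valid order is usually found within a handful of shuffles.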
Attention-getter stimuli were used throughout the experiment to maintain participants’ attention. The type and frequency differed between age groups. For 5-month-olds, attention-getters consisted of 2-s animations (often used in the infant calibration routine in Tobii Studio) that appeared after each trial. For 4-year-olds and adults, attention-getters consisted of different pictures of ‘Minions’ that appeared in a random order after either two or three trials, with their frequency randomly determined. In addition, a different 3-s cartoon animation was played to mark the end of the block and to re-engage participants.
2.3. Procedure
2.3.1. Five-month-olds
Infants sat on their caregivers’ laps approximately 70 cm away from the centre of an LCD screen. Continuous EEG data were recorded with a 128-channel Hydrocel Geodesic Sensor Net (HCGSN), NetAmps 300 amplifier, and NetStation 4.5.7 software (EGI Inc) at a sampling rate of 1000 Hz, with the reference electrode placed at Cz. Electrode impedances were kept below 50 kΩ. The EEG recordings were saved for offline analyses.
Stimulus presentation was controlled using Presentation software (Neurobehavioural Systems). Triggers indicating the start and end of each trial were recorded along with the EEG. Eye-tracking recordings were co-registered with EEG recordings for two purposes: (i) to ensure that infants were attending to the visual stimuli and (ii) to examine whether gaze behaviour to the mouth region modulates cortical tracking of the speech envelope. To this end, a Tobii X120 eye tracker was placed below the screen to gather gaze fixation data.
As the entire duration of the session was quite long for an infant study (approximately 25 min), the stimuli continued to play until infants showed signs of fussiness or until completion, whichever came first.
2.3.2. Four-year-olds and adults
The procedure for 4-year-olds was identical to that for 5-month-olds with two exceptions. First, 4-year-olds were seated on their own. Second, the session was framed as a game; in order to motivate children to focus on the screen, children were required to press a button on a response pad whenever a picture of a Minion appeared on the screen (Kaganovich and Schumaker, 2014).
Adult participants were informed prior to the start of the experiment that they were part of a control group for an infant and child study. The procedure for adults was similar to that for 4-year-olds, except that adults also participated in a second EEG task which used similar stimuli but in adult-directed speech (ADS). Its order of presentation (immediately before or after the first task) was counterbalanced between participants (the results of this ADS session are not reported here).
2.3.3. EEG measure
2.3.3.1. Pre-processing. EEG data were pre-processed using EEGLAB (Delorme and Makeig, 2004), FieldTrip (Oostenveld et al., 2011), NoiseTools (http://audition.ens.fr/adc/NoiseTools/), the mTRF Toolbox (Crosse et al., 2016) and custom scripts in MATLAB R2019a (The Mathworks, Inc). First, EEG data from the three outer rings of the net were removed because these channels have been found to be very noisy in infants and children (Di Liberto et al., 2018; Folland et al., 2015; Kalashnikova et al., 2018). EEG data from the remaining 92 channels were high-pass filtered at 0.1 Hz and low-pass filtered at 12 Hz with 8th-order Butterworth filters. As infant and child EEG recordings are noisy due to movements, artefact subspace reconstruction (ASR; Kothe and Jung, 2014) was applied to remove noise. ASR uses a sliding window technique whereby each EEG window is decomposed via principal component analysis. Each EEG window is then statistically compared with reference EEG data obtained from clean portions of the EEG recording. Within each window, the ASR algorithm searches for principal subspaces that significantly deviate from the reference EEG data. These subspaces are rejected and then reconstructed using a mixing matrix computed from the reference EEG data (Chang et al., 2019). As in Kalashnikova et al. (2018), this study used a sliding window of 500 ms and a threshold of 20 standard deviations to identify corrupted subspaces. Noisy channels that were removed during ASR were replaced with an estimate of neighbouring clean channels using spherical interpolation. Finally, EEG data were re-referenced to the average of all channels (e.g., Kalashnikova et al., 2018) and later downsampled to 100 Hz to reduce processing time.
To investigate the impact of visual speech cues on the cortical tracking of auditory speech, the speech stimuli were pre-processed in a manner following Jessen et al. (2019). The auditory soundtracks of each video were extracted, downsampled to 100 Hz to match the sampling rate of the EEG data and characterised using the broadband speech envelope of the acoustic signal through the NSL toolbox that models the auditory peripheral and subcortical processing stages (Ru, 2001). A spectrogram representation of each stimulus contained band-specific envelopes of 128 logarithmically-spaced frequency bands between 0.1 and 4 kHz. The broadband temporal envelope of each soundtrack was obtained by summing up the band-specific envelopes across all frequencies.
2.3.3.2. Data analysis. Cortical tracking of the speech envelope was measured by mathematically modelling response functions that describe the linear mapping between the stimulus speech envelopes and the corresponding neural responses. For this study, the stimulus-response mapping function is modelled in the forward direction (see Crosse et al. (2016) for details), i.e., the resulting model describes an optimal linear transformation from the stimulus domain to the neural-signal domain. Such a model is fit by conducting a lagged ridge regression between the envelope and the EEG data while accounting for likely time-delays between the acoustic input and the corresponding EEG response. The regression weights obtained with this procedure estimate the temporal response function (TRF) between envelope and EEG at each EEG channel. Significant non-zero weights reflect EEG channels where cortical activity is related to stimulus encoding (Haufe et al., 2014). TRFs are similar to event-related potentials (ERPs) in that they allow for an examination of the amplitude, latency, and scalp topography of the stimulus-EEG relationship. Specifically, the distribution of TRF weights can be examined across the scalp at different latencies, or different relative time lags between the ongoing speech and EEG signals. For example, a time lag of 100 ms refers to the impact that a change in the speech stimulus at time t has on the EEG at time t + 100 ms.
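The lagged ridge regression described here can be written compactly. The following is a sketch in the usual forward-model notation; the symbols ($s$ for the envelope, $r_n$ for the EEG at channel $n$, $w_n$ for the TRF, $\varepsilon_n$ for the residual, $\mathbf{S}$ for the lagged design matrix) are introduced here for illustration and are not taken from the paper.

```latex
% Forward model: EEG at channel n as a lagged linear function of the envelope
r_n(t) = \sum_{\tau = \tau_{\min}}^{\tau_{\max}} w_n(\tau)\, s(t - \tau) + \varepsilon_n(t)

% Ridge (Tikhonov-regularised) estimate of the TRF weights for channel n
\hat{\mathbf{w}}_n = \left( \mathbf{S}^{\top} \mathbf{S} + \lambda \mathbf{I} \right)^{-1} \mathbf{S}^{\top} \mathbf{r}_n
```

Each column of $\mathbf{S}$ holds the envelope delayed by one lag $\tau$, so the weight $w_n(\tau)$ at $\tau = 100$ ms captures how a change in the envelope at time $t$ is reflected in the EEG at time $t + 100$ ms. The identity-matrix penalty shown is the plain ridge form; toolbox implementations may use a different regularisation matrix.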
To investigate neural tracking of continuous stimuli, adult studies commonly compute
response functions based on a subset (e.g., n − 1 trials) of the available data from
each participant (e.g., Crosse et al., 2015), resulting in TRFs that are then used to
model responses of the nth trial for each participant. This approach, subject-dependent
modelling, requires lengthy datasets for each participant that are typically
unattainable for the infant population. To account for the limited amount of available
data from the infant sample, the subject-independent approach (Di Liberto and Lalor,
2017) was used for this study. Instead of computing an individual response function for
each participant, this approach involves computing an average response function over
n − 1 participants that is then used to predict the EEG signal of the nth participant
via leave-one-out cross-validation. The subject-independent modelling approach has been
shown to yield better results than the subject-dependent modelling approach when used
with 5-min EEG recordings from 7-month-olds and adults (Jessen et al., 2019).
Subject-independent modelling was used for each age group. In other words, an average
response function was computed for each age group to predict the EEG signal of the nth
participant from that age group.
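The leave-one-participant-out step can be illustrated with a short sketch. It assumes that the average response function is formed by averaging per-participant TRF weights within an age group, which the text does not spell out; the function name `leave_one_out_average` and the two-lag toy TRFs are hypothetical.

```python
def leave_one_out_average(trfs):
    """For each participant, average the TRFs of all other participants.

    trfs[i][k] is the weight at lag k for participant i (same age group).
    """
    n = len(trfs)
    n_lags = len(trfs[0])
    totals = [sum(trf[k] for trf in trfs) for k in range(n_lags)]
    return [[(totals[k] - trfs[i][k]) / (n - 1) for k in range(n_lags)]
            for i in range(n)]


# Three hypothetical participants, two lags each
group_trfs = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
generic = leave_one_out_average(group_trfs)
# generic[0] is built only from participants 1 and 2: [4.0, 5.0]
```

Each held-out participant's EEG would then be predicted from the model in the matching row of `generic`, so no participant contributes to the model used to predict their own data.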
Initially, TRFs were calculated for each stimulus at time lags between −200 and
1000 ms before selecting a temporal region of the TRF (0–600 ms) that included all
relevant components to map the stimulus to the EEG signal with no visible response
outside of this range. Leave-one-out cross-validation using Tikhonov regularisation was
conducted to assess how well the unseen EEG data could be predicted based on the TRF.
The regularisation parameter for the ridge regression was set to λ = 100 for all
participants. The lambda parameter value was chosen to mitigate the potential failure
of lambda tuning due to the limited amount of data available (for a discussion, see
Crosse et al., 2021). Prediction accuracy was quantified by calculating the Pearson
correlation coefficient between the predicted and original EEG responses at each
electrode. If EEG data is indeed reflecting the encoding of the speech envelope, then
the correlation values would be significantly greater than zero. To investigate
auditory-visual speech benefit, (A + V) TRFs were computed and compared to AV TRFs in
accordance with the additive criterion. The additive criterion was chosen to
investigate auditory-visual speech benefit because this was used in previous studies
with similar paradigms (e.g., Crosse et al., 2015, 2016). The AV speech benefit was
quantified as the difference in prediction accuracy of AV TRFs relative to A + V TRFs.
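As a minimal sketch of these two quantities, the following computes a Pearson correlation between a predicted and an observed signal and then the AV benefit as a difference of prediction accuracies. The function names are hypothetical, and the adult group means from Table 2 (.022 for AV, .007 for A + V) are used only as illustrative inputs.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def av_benefit(r_av, r_a_plus_v):
    """Additive criterion: AV prediction accuracy minus (A + V) accuracy."""
    return r_av - r_a_plus_v


# Prediction accuracy for a toy predicted/observed pair
r_toy = pearson_r([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])

# Illustrative benefit from the adult group means reported in Table 2
benefit_adults = av_benefit(0.022, 0.007)
```

A positive benefit indicates that the AV model predicts the EEG better than the summed unimodal models.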
Table 1
Means (and standard deviations) of spatial offsets (measured in pixels) in gaze data
for each age group.

                5-month-olds      4-year-olds       Adults
x-coordinate    39.91 (519.75)    72.85 (278.33)    33.26 (159.45)
y-coordinate    25.37 (225.46)    98.80 (315.86)    164.78 (130.44)

Fig. 1. Areas of interest (AOIs) defined for the speaker's eye and mouth regions.
2.3.4. Gaze measures
Means and standard deviations of the spatial offsets (x- and y-coordinates) for each
age group are reported in Table 1. As 5-month-olds and 4-year-olds were more fidgety
than adults during the study, there was a considerable amount of data loss from the
eye-tracker for those groups. To circumvent the cumulative effect of data loss due to
gaze as measured by the eye-tracker and to noisy EEG data, videos of participants who
met the EEG data inclusion criterion (≤ 20 noisy channels) but had eye-tracking issues
(i.e., participants were looking at the screen but their gaze was not detected by the
eye-tracker) were coded frame-by-frame manually using ELAN software (version 5.9) for
whether or not they were looking at the screen. This resulted in hand-coded videos for
11 four-year-olds and 3 five-month-olds.
Areas of interest (AOIs) covering the top half and bottom half of the speaker's face
demarcated the speaker's eye and mouth regions (Fig. 1). These AOIs were of equal
dimensions (640 × 340 pixels) and were adjusted using the derived mean spatial offsets
for each age group. The proportion of total looks (PTLs) to these AOIs, in addition to
attention, were computed for each trial:

\[
\text{Attention} = \frac{\text{total fixation duration}}{\text{trial duration}} \tag{1}
\]

(hereafter referred to as Attention), and

\[
\text{PTL Mouth} = \frac{\text{total fixation duration to mouth}}{\text{total fixation duration to mouth} + \text{total fixation duration to eyes}} \tag{2}
\]

that is, the proportion looking to the speaker's mouth region (hereafter referred to as
PTL Mouth).
Note that PTL Mouth is a relative measure of attention to the mouth compared to eyes,
so chance is 0.5, scores > 0.5 show greater fixation to mouth than eyes and scores
< 0.5 show greater fixation to eyes than mouth. All statistical analyses on these two
gaze measures were conducted using custom scripts in MATLAB R2019a (The MathWorks,
Inc). The 11 four-year-olds and 3 five-month-olds whose gaze data were manually coded
were only included for analyses that examined attention to screen; they were excluded
from analyses that involved PTL Mouth.
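The two gaze measures can be written as a pair of small functions. This is a sketch only: the function names, the durations in the example, and the choice to return `None` when neither AOI was fixated are assumptions, not details taken from the study's scripts.

```python
def attention(total_fixation, trial_duration):
    """Eq. (1): proportion of the trial spent fixating the screen."""
    return total_fixation / trial_duration


def ptl_mouth(fix_mouth, fix_eyes):
    """Eq. (2): fixation to mouth relative to mouth plus eyes."""
    total = fix_mouth + fix_eyes
    if total == 0:
        return None  # assumed handling: undefined when neither AOI was fixated
    return fix_mouth / total


# Hypothetical trial: 60 s long, 45 s of fixation, 30 s on mouth, 10 s on eyes
trial_attention = attention(45.0, 60.0)   # 0.75
trial_ptl = ptl_mouth(30.0, 10.0)         # 0.75, i.e. above the 0.5 chance level
```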
2.4. Statistical analyses
Estimates of global field power were computed and topographic maps of TRF weights
plotted to inspect the scalp regions where responses to the speech envelope were
greatest. Mean TRFs were then computed for those scalp locations identified as regions
of interest (ROIs) for each condition.

Fig. 2. Global field power measured at each time lag for all ages.
To evaluate model performance, mean prediction accuracies were obtained by averaging
across all electrodes belonging to the ROIs and then tested against zero. Additionally,
these mean prediction accuracies were compared between conditions to investigate the
auditory-visual speech benefit and any age differences in model performance. As
age-related anatomical differences may influence cortical tracking between groups
independent of effects due to speech modality, TRF components and their respective
prediction accuracy were not directly compared statistically between age groups.
To examine gaze behaviour, ANOVAs were conducted for each age group to examine the
differences in attention and proportion looking at speaker's mouth between conditions
(see Eqs. (1) and (2)). To examine the relationship between gaze behaviour and cortical
tracking, Pearson's correlations were conducted for each condition between (1) cortical
tracking and attention, and (2) cortical tracking and looking preference for each age
group, where cortical tracking is quantified by TRF prediction accuracy.
3. Results
3.1. Prediction accuracies
Table 2
Mean prediction accuracies (and standard deviations), quantified by Pearson's r, of
TRFs from frontal, temporal and occipital scalp ROIs for each condition and age group.

                AO              VO                AV              A + V
5-month-olds    .021 (0.018)    .001 (0.008)      .035 (0.019)    .032 (0.018)
4-year-olds     .020 (0.018)    −0.005 (0.011)    .018 (0.020)    .014 (0.015)
Adults          .009 (0.011)    .0004 (0.011)     .022 (0.015)    .007 (0.012)

First, as a preliminary step, global field power (GFP), a reference-independent measure
of response strength across the entire scalp at each time lag (Murray et al., 2008),
was estimated by calculating the TRF variance across all channels. The temporal profile
of GFP for each age group showed clear TRF components at ∼200–400 ms for AO, AV and
(A + V), but not VO (Fig. 2). Topographies of TRF weights (Figs. 3–5) revealed that the
observed components were mainly located over the frontal, occipital and temporal scalp
regions. To avoid diluting the effects of interest, subsequent analyses of TRFs were
therefore focused on the frontal, occipital, and temporal groups of electrodes. These
groupings were used in previous infant (e.g., Folland et al., 2015; Peter et al., 2016)
and child (e.g., Corrigall and Trainor, 2014) EEG studies to examine the average
responses across scalp regions (Fig. 6).
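The GFP estimate described in this section, the variance of TRF weights across channels at each time lag, can be sketched as below. The function name and the three-channel toy TRF are hypothetical, and the use of the population variance (rather than the sample variance) is an assumption, since the text does not state which was used.

```python
import statistics


def global_field_power(trf):
    """trf[channel][lag] -> variance across channels at each time lag."""
    n_lags = len(trf[0])
    return [statistics.pvariance([channel[k] for channel in trf])
            for k in range(n_lags)]


# Three hypothetical channels, three lags; only the middle lag carries a response
toy_trf = [[0.0, 1.0, 0.0],
           [0.0, -1.0, 0.0],
           [0.0, 3.0, 0.0]]
gfp = global_field_power(toy_trf)
```

Lags at which the channels disagree strongly (here the middle lag) yield high GFP, which is how peaks in the temporal profile flag candidate TRF components.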
To examine the presence of envelope tracking, TRF prediction accuracies at the three
scalp ROIs were tested against zero. To assess the difference in the extent of envelope
tracking, these prediction accuracies were then compared between conditions. Of
interest are (1) the differences between cortical tracking of AO, VO and AV speech, and
(2) the presence of an auditory-visual speech benefit as quantified by the additive
criterion [i.e., AV vs. (A + V)]. One-sample t-tests were first conducted to test
prediction accuracies against zero. Next, one-way ANOVAs were conducted for each age
group with their respective prediction accuracies as the dependent variable to examine
whether prediction accuracies differed between conditions. Subsequent post-hoc
comparisons were conducted using two-tailed paired-sample t-tests with
Bonferroni-adjusted alpha levels where multiple comparisons were made. The same
analyses were conducted with 15 randomly selected trials per condition for 4-year-olds
and adults to examine whether different amounts of data from each age group influenced
the results. Fifteen trials were chosen because infant data had the least number of
trials included, with approximately 15 trials per condition.
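A one-sample t-test against zero and the accompanying effect size can be sketched as follows. The text does not state how Hedges' g was computed; the sketch assumes Cohen's d (t divided by the square root of n) multiplied by the common small-sample correction 1 − 3/(4·df − 1). Function names and the toy values are hypothetical.

```python
import math
import statistics


def one_sample_t(values, mu=0.0):
    """Return (t, df) for a one-sample t-test of the mean against mu."""
    n = len(values)
    sd = statistics.stdev(values)  # sample standard deviation
    t = (statistics.mean(values) - mu) / (sd / math.sqrt(n))
    return t, n - 1


def hedges_g_from_t(t, n):
    """Assumed formula: Cohen's d = t / sqrt(n), scaled by 1 - 3 / (4*df - 1)."""
    d = t / math.sqrt(n)
    correction = 1 - 3 / (4 * (n - 1) - 1)
    return d * correction


# Toy prediction accuracies for three hypothetical participants
t_toy, df_toy = one_sample_t([1.0, 2.0, 3.0])

# With n = 18, t = 5.15 gives g of about 1.16 under the assumed formula
g_example = hedges_g_from_t(5.15, 18)
```

Under this assumed formula, a t(17) of 5.15 corresponds to a g of about 1.16, which agrees with the value reported for 5-month-olds' AO TRFs in Section 3.1.1.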
3.1.1. Evidence of cortical tracking
All means and standard deviations of prediction accuracy for each condition and age
group are set out in Table 2.
Five-month-olds: One-sample t-tests indicated that prediction accuracies of AO, AV, and
(A + V) TRFs were significantly greater than zero (AO: t(17) = 5.15, p < .001, Hedges'
g = 1.16; AV: t(17) = 7.47, p < .001, Hedges' g = 1.68; A + V: t(17) = 7.42, p < .001,
Hedges' g = 1.67), but prediction accuracy of VO TRFs was not significantly greater
than zero, t(17) = 0.75, p = .23, Hedges' g = 0.17.

Fig. 3. (A) Topographies and TRFs of frontal, occipital and temporal locations, and (B)
prediction accuracy of TRFs from 5-month-olds' data.

Fig. 4. (A) Topographies and TRFs of frontal, occipital and temporal locations, and (B)
prediction accuracy of TRFs from 4-year-olds' data.
Four-year-olds: Prediction accuracies of AO, AV, and (A + V) TRFs were significantly
greater than zero (AO: t(18) = 4.93, p < .001, Hedges' g = 1.08; AV: t(18) = 3.86,
p < .001, Hedges' g = 0.85; A + V: t(18) = 3.96, p < .001, Hedges' g = 0.87), whereas
prediction accuracy of VO TRFs was not significantly greater than zero (t(18) = −2.13,
p = .98, Hedges' g = −0.47). The analyses with 15 trials revealed only one difference:
prediction accuracy of VO TRFs was significantly lower than zero (t(18) = −2.38,
p = .03, Hedges' g = 0.48).
Adults: Prediction accuracies of AO, AV, and (A + V) TRFs were significantly greater
than zero (AO: t(17) = 3.49, p = .001, Hedges' g = 0.79; AV: t(17) = 6.11, p < .001,
Hedges' g = 1.38; A + V: t(17) = 2.48, p = .012, Hedges' g = 0.56), whereas prediction
accuracy of VO TRFs was not significantly greater than zero, t(17) = 0.17, p = .44,
Hedges' g = 0.04. Results from the analyses with 15 trials were no different.
3.1.2. Difference in strength of cortical tracking between conditions

Fig. 5. (A) Topographies and TRFs of frontal, occipital and temporal locations, and (B)
prediction accuracy of TRFs from adults' data.

Fig. 6. Electrode groupings used for analyses. (A) Frontal electrodes, (B) occipital
electrodes, (C) temporal electrodes.

The one-way ANOVAs testing between conditions (AO, VO, AV, A + V) were significant for
all age groups (5-month-olds: F(3, 68) = 14.95, p < .001, ηp² = 0.40; 4-year-olds:
F(3, 72) = 9.63, p < .001, ηp² = 0.29; adults: F(3, 68) = 9.22, p < .001, ηp² = 0.29).
To inspect the differences between conditions and to identify whether there was
auditory-visual speech benefit [i.e., AV > (A + V)], post hoc comparisons were
subsequently performed using paired-sample t-tests with a Bonferroni-adjusted alpha
level of 0.013 (0.05/4).
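As a worked check on the quantities in this paragraph, the Bonferroni adjustment and the partial eta squared can be written out using standard relations. The relation between F and partial eta squared is assumed here to be the usual one for a one-way design; the numbers are those reported above for 5-month-olds.

```latex
\[
\alpha_{\text{adj}} = \frac{0.05}{4} = 0.0125 \approx 0.013
\]
\[
\eta_p^2 = \frac{F \cdot df_{1}}{F \cdot df_{1} + df_{2}}
         = \frac{14.95 \times 3}{14.95 \times 3 + 68}
         = \frac{44.85}{112.85} \approx 0.40
\]
```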
Five-month-olds: When prediction accuracies of AO, VO, and AV TRFs were compared,
paired-sample t-tests indicated that prediction accuracy of AV TRFs was greatest,
followed by AO, then VO TRFs (AO vs. VO: t(17) = 5.13, p < .001, Hedges' g = 1.42; AO
vs. AV: t(17) = −4.07, p < .001, Hedges' g = −0.69; AV vs. VO: t(17) = 7.73, p < .001,
Hedges' g = 2.15). Prediction accuracy of AV TRFs was also significantly greater than
(A + V) TRFs, t(17) = 2.82, p = 0.001, Hedges' g = 0.16, suggesting that
auditory-visual speech benefit was present at the scalp ROIs.
Four-year-olds: Paired-sample t-tests revealed that the prediction accuracy of AO TRFs
was significantly greater than that of VO TRFs (t(18) = 5.66, p < .001, Hedges'
g = 1.68) but not significantly different from the prediction accuracy of AV TRFs
(t(18) = 0.58, p = 0.57, Hedges' g = 0.14). The prediction accuracy of AV TRFs was
significantly greater than that of VO TRFs (t(18) = 4.75, p < .001, Hedges' g = 1.39),
but was not significantly greater than that of (A + V) TRFs (t(18) = 1.06, p = 0.30,
Hedges' g = 0.21). The analyses with 15 trials had similar findings.

Fig. 7. Scatterplots of Attention (A) and proportion of total looking time to the mouth
vs. eyes (PTL Mouth) (B) for all conditions and age groups and their corresponding bar
graphs (C: Attention; D: PTL Mouth). Error bars represent standard errors of mean
(SEM). With respect to attention, across age groups, greater attention was captured in
the AV condition. With respect to the speaker's mouth, adults fixated the speaker's
mouth to a greater extent in the AV condition than in AO and VO.
Adults: Paired-sample t-tests showed that the prediction accuracy of AV TRFs was
greatest, followed by AO, then VO TRFs (AO vs. VO: t(17) = 4.10, p < .001, Hedges'
g = 0.78; AO vs. AV: t(17) = −3.85, p = .001, Hedges' g = −0.88; AV vs. VO:
t(17) = 7.36, p < .001, Hedges' g = 1.57). Prediction accuracy of AV TRFs was also
significantly greater than (A + V) TRFs (t(17) = 5.01, p < .001, Hedges' g = 1.06),
suggesting that auditory-visual speech benefit was present at the scalp ROIs. The
analyses with 15 trials revealed only one difference: prediction accuracy of AO TRFs
was not significantly different from that of VO TRFs, t(17) = 1.97, p = .07, Hedges'
g = 0.54.
3.2. Gaze behaviour
3.2.1. Attention
Separate one-way within-subjects ANOVAs were conducted for each age group with
Attention as the dependent variable (see Eq. (1) in Statistical Analyses) and Condition
as the independent variable. The ANOVAs revealed a significant main effect of Condition
for all age groups (5-month-olds: F(2, 34) = 3.58, p = .04, ηp² = 0.17; 4-year-olds:
F(1.44, 25.89) = 26.67 with Greenhouse-Geisser correction, p < .001, ηp² = 0.60;
adults: F(2, 34) = 7.16, p = .002, ηp² = 0.30). Subsequent post-hoc comparisons between
conditions were made using paired-sample t-tests with a Bonferroni-adjusted alpha level
of 0.017 (0.05/3). Fig. 7 contains scatterplots and bar graphs of Attention and PTL
Mouth for all conditions and age groups.
Five-month-olds: Attention was significantly greater in the AV than the VO condition
(t(17) = 2.93, p = .009, Hedges' g = 0.50), but the differences between AO and VO and
between AO and AV conditions were not significant (AO vs. VO: t(17) = 1.49, p = .15,
Hedges' g = 0.34; AO vs. AV: t(17) = −0.94, p = .36, Hedges' g = 0.14).
Four-year-olds: Attention was significantly greater in the AV than in the AO condition
(t(18) = 6.10, p < .001, Hedges' g = 1.54) and in the VO condition (t(18) = 9.19,
p < .001, Hedges' g = 1.43), whereas the difference in attention between AO and VO
conditions was not significant (t(18) = −1.00, p = .33, Hedges' g = −0.26).
Adults: Attention was significantly greater in the VO than the AO condition
(t(17) = 3.58, p = .002, Hedges' g = 0.38) and in the AV than the AO condition
(t(17) = 3.06, p = .007, Hedges' g = 0.40). The difference in attention between VO and
AV conditions was not significant (t(17) = 0.11, p = .91, Hedges' g = 0.01).
Age comparisons: An Age x Condition mixed-design ANOVA was conducted with Attention as
the dependent variable. The main effects of Condition and Age, and the Age x Condition
interaction were significant (Condition: F(1.68, 87.50) = 26.00 with Greenhouse-Geisser
correction, p < .001, ηp² = 0.33; Age: F(2, 52) = 25.21, p < .001, ηp² = 0.49; Age x
Condition: F(3.37, 87.50) = 12.47 with Greenhouse-Geisser correction, p < .001,
ηp² = 0.32). To examine the Age x Condition interaction, we conducted
independent-samples t-tests for each condition. Five-month-olds attended less to the
screen than 4-year-olds only in the AV condition (t(35) = −4.45, p < .001), whereas
they attended to the screen similarly during AO and VO presentations (AO: t(35) = 0.45,
p = .65; VO: t(35) = −1.53, p = .13). Five-month-olds attended less to the screen than
adults in all conditions (AO: t(34) = −4.94, p < .001; VO: t(34) = −7.43, p < .001; AV:
t(34) = −6.10, p < .001). Four-year-olds attended less to the screen in AO and VO
conditions than adults but not during AV presentations (AO: t(35) = −4.99, p < .001;
VO: t(35) = −5.64, p < .001; AV: t(35) = −1.88, p = .07).
3.2.2. PTL to the speaker's mouth
Separate one-way within-subjects ANOVAs were conducted for each age group (DV: PTL
Mouth, IV: Condition). The ANOVAs were significant for 5-month-olds and adults
(5-month-olds: F(2, 26) = 4.98, p = .01, ηp² = 0.28; adults: F(1.35, 23.00) = 13.40
with Greenhouse-Geisser correction, p < .001, ηp² = 0.44), but not for 4-year-olds
(F(2, 14) = 1.82, p = .20, ηp² = 0.21). Subsequent analyses involved one-sample t-tests
to assess whether PTL Mouth was significantly greater than chance and paired-sample
t-tests with Bonferroni-adjusted alpha level of 0.017