Measurement and Visual Analytics of Nonverbal Behavior and
Interaction
Matus Gasparik, Carolin Bronowicz, Susanne Bleisch
In spite of the recent rapid advances in artificial intelligence, computer vision and a proliferation of off-the-shelf
tools, studying and understanding the dynamics of nonverbal behavior (NVB) and interaction still proves
challenging. New emerging research disciplines, such as Social Signal Processing (SSP), attempt to leverage
computational methods and machine learning to automatically extract NVB from high-volume multi-modal
video, audio and natural language data with moderate success. The success of these automatic approaches is
predicated on the availability of high-quality extensive training datasets and is complicated by the issues
concerning the theoretical and context-specific validity of the predicted constructs.
One underexplored approach to utilizing sensor data with significant potential to improve our understanding of NVB
is Visual Analytics (VA). In this work, we present a methodological approach using the VA framework to extract
and analyze NVB and interaction from video data in the context of collaborative learning. Using state-of-the-art
Computer Vision techniques, we extract high-resolution time series of face, hands and body landmarks from video
recordings of small groups of students tasked with collaboratively solving a task on a computer. We describe a
visualization pipeline that transforms the raw landmarks to relevant NVB signals and generates visual
representations that facilitate an effective analysis of temporal patterns by a human observer. Additionally, we
present some visual-mapping techniques to deal with the visualization of complex multi-dimensional data or with
the loss of data due to aggregation. We showcase the potential of VA for studying individual and dyadic NVB
using the temporal patterns in head movement, posture and the mutual orientation (facing directions) of pairs of
persons.
1 Introduction
The understanding of nonverbal behavior (NVB) and communication (NVC) in group processes is of great interest
across many interdisciplinary domains and application contexts. A notable example is educational research on
Computer-Supported Collaborative Learning (CSCL), where enhanced engagement and improved learning
outcomes have been demonstrated (J. Chen et al., 2018). However, our understanding of the interpersonal and the
group dynamics (e.g. the quality of engagement) and how they manifest through nonverbal behavior is still in its
early stages (Paneth et al., 2023).
The progress in studying the dynamics of NVB is predicated on our ability to measure NVB with sufficient
precision and temporal resolution. The measurement of NVB has traditionally relied on trained coders
manually annotating video recordings. However, this approach has several drawbacks, most notably the
prohibitive human effort and high cognitive load necessary to annotate extensive lengths of video data as well as
the limited temporal resolution and granularity of the annotated data. For practical reasons, the coding of NVB is
typically performed over small regular time segments with an arbitrary length (on the order of minutes) and the
observed nonverbal cues (e.g. smiling, eye contact) are recorded as aggregated values. The kind of aggregation
(e.g. count/frequency versus duration) needs to be decided upon and must be informed by the research objective
in question, or multiple aggregation values are coded jointly at the cost of increased effort. Moreover, if the coding
involves abstract, higher-order constructs (e.g. assessing a person’s “cognitive engagement”), other issues arise
regarding the theoretical validity of the construct and potential bias introduced by the subjective perception of the
coder.
The recent rapid advancements in AI, computer vision, natural language or audio processing, as well as the
abundance of ubiquitous low-cost sensors have sparked optimism about the potential of leveraging these
technologies to measure and study NVB and social interaction. The introduction of wearable sensors as
“sociometric badges” into social sciences and group research made it possible to collect vast amounts of data over
large temporal and spatial extents and outside the controlled laboratory settings (Kim et al., 2008; blok.etal_2017;
Zhang et al., 2018). The sociometric badges are typically equipped with microphones, accelerometers as well as
proximity and radio frequency sensors to capture high-resolution data of the wearer’s speech, body movement,
location as well as proximity to others (Parker et al., 2020). While they only capture some aspects of
NVB and, by themselves, might not be sufficient to study NVB and its dynamics with high granularity (unless
used in conjunction with other sensors), the sociometric sensors have helped usher the social sciences into the “big
data” era.
One promising approach that might help address the shortcomings of the traditional methods of collecting data is
Social Signal Processing. Social Signal Processing (SSP) is an emerging research domain encompassing behavioral
and social sciences, computer science and engineering with the ambitious goal of providing machines with social
intelligence, the ability to “sense and understand human social signals” (Vinciarelli et al., 2009). SSP posits that
the nonverbal cues encoded in facial expressions, posture, hand gestures or non-linguistic aspects of speech
(paralanguage) convey machine-detectable social signals amenable to automatic analysis by algorithms. The
process in SSP involves 1) the extraction of the nonverbal cues from multi-modal sensor data and 2) the “social
signal understanding”. The main challenges in machine understanding of social behavior lie in understanding and
representing the temporal dynamics of NVB and in the fusion of multi-modal data with varying timescales (Vinciarelli
et al., 2009). In practice this is usually addressed with various machine-learning and feature engineering
approaches and requires carefully curated training datasets labelled by domain experts. It should also be noted
that available research typically focuses only on specific aspects of nonverbal behavior with the goal of
automatically predicting, for example, turn-taking (Vinciarelli et al., 2009) or task and social cohesion (Lehmann-
Willenbrock & Hung, 2023). Deep-learning approaches have an advantage over classical machine learning in that they
don’t require feature engineering and typically achieve better performance in many tasks. In the context of
NVB, deep-learning methods have been used to directly predict the theoretical construct of interest, e.g. the
group learning engagement (Zheng et al., 2023).
However, the ability to collect large amounts of nearly continuous quantitative data in combination with data-
driven algorithms is not a panacea when it comes to automatically detecting nonverbal behaviors and comes with
its own drawbacks. First and foremost, it does not completely eliminate the manual coding as the training of
models for automatic detection of NVB requires substantial effort by domain experts to create and maintain high-
quality labelled datasets (Lehmann-Willenbrock & Hung, 2023). The trained models will likely only work in
narrow contexts depending on the specified objective of the model as well as the characteristics of the sensor data.
Secondly, if the models are designed to predict abstract, higher-level constructs, the concerns surrounding their
theoretical validity (Luciano et al., 2018; Müller et al., 2019), consideration of contextual and environmental
factors and potential bias become even more pressing (Renier et al., 2021). More importantly, these methods are
limited in terms of generating new insight into the NVB as they serve merely as a means to automate the existing
coding routines.
Interactive visualizations offer an alternative and effective means of analyzing large and complex multi-
dimensional data, especially in cases where the theoretical underpinnings of the observed phenomena are not yet
fully established. We believe that adopting the visual analytics approach to study the NVB has the potential to
improve our understanding of the complex interactions and temporal dynamics of interaction and group processes.
Instead of reducing the raw input data into the model outputs in an opaque, black-box manner, visual analytics
offers a semi-automatic approach, where a human observer engages and interacts with visual representations of
the data in an iterative process, with the goal of generating insight from the data. In the visualization pipeline, the
raw data is taken through a series of adjustable transformations that ultimately produce a low-dimensional (ideally
1D or 2D) graphical representation that facilitates rapid insight into the complex data thanks to the visual and
cognitive capabilities of a human observer. These transformations operate on the data or on visualizations
themselves and, in the case of the former, should be limited to algebraic or well-established feature extraction
approaches with little ambiguity in order to not negatively affect the theoretical validity. For example, instead of
relying on a black-box model to predict some abstract construct (such as the quality of the group engagement),
we seek to derive new signals using computer vision, face and human pose detection algorithms and perform
some form of visual signal fusion to derive visualizations which are effective in revealing temporal patterns.
In this contribution we outline a methodological approach using visual analytics (VA) to study nonverbal behavior
(NVB) and interaction based on video recordings of group learning activity. Concretely, we show a practical
approach to:
● extract face, hands and body landmarks at arbitrary temporal resolution using simple monocular
camera recordings and off-the-shelf computer vision tools,
● structure the data in a way that is optimized for working with large multi-dimensional time-series data,
● transform the landmark data into nonverbal signals corresponding to an individual’s head movements and
changes in posture,
● derive an approximation of eye contact in dyadic interaction using the head’s pose and simple vector
algebra,
● use some non-standard visual mapping techniques to deal with high dimensionality of the data or the
loss of detail due to data aggregation.
We note that our goal in this work is to demonstrate some feasible methodological approaches and provide pointers
for further research rather than trying to present a comprehensive and holistic treatment of how to measure and
analyze nonverbal behavior and group dynamics using the visual analytics framework. VA is agnostic to the origin
and type of the analyzed signals and lends itself well to studying complex multi-modal data such as social signals.
The specifics of data collection, processing, and visualization procedures should be guided by careful
consideration of the inherent trade-offs between the desired scope, quality, and granularity of the data and the
required experimental effort.
2 Related work & state of the Art
Human visual perception and cognition are remarkably powerful systems capable of processing large amounts of
visual information in parallel and in a non-linear fashion. Unlike language or auditory processing, which is
inherently sequential and temporally constrained, the visual system enables rapid, simultaneous assessment of
multiple spatial features such as shape, color, position, and motion (Ware, 2013). This parallelism allows humans
to detect patterns, trends, and anomalies in complex visual scenes almost instantaneously, a capability that
underlies the effectiveness of visualization as a cognitive aid. Visual representations have long served as external
aids to human cognition, predating the advent of computers by centuries. From early cartographic representations
and scientific diagrams to the use of charts in statistical analysis, visualizations have historically supported
reasoning, discovery, and communication (Card et al., 1999; Shneiderman, 1996). They extend human cognitive
capacity by externalizing abstract data into visual form, thereby leveraging the efficiency of the human perceptual
system for analytical tasks. Visualization can be broadly defined as a mapping of data to graphical forms through
a series of adjustable transformations. These transformations translate data attributes into visual variables, such
as position, size, shape, color, or motion, that can be perceived and interpreted by human observers. The goal of
this mapping process is to generate visual representations that facilitate the rapid discovery of patterns and
relationships within data, supporting insight generation and hypothesis formation. Interaction plays a crucial role
in this process. By allowing users to probe and explore data at arbitrary levels of detail, interaction transforms
visualization from a static display into a dynamic exploratory environment. Common forms of interaction include
modifying data transformations (e.g., filtering, aggregation), adjusting visual mappings (e.g., color encoding or
scaling), and performing view transformations (e.g., panning, zooming, navigation). These principles are
encapsulated in Shneiderman’s Visual Information-Seeking Mantra: “Overview first, zoom and filter, then details
on demand” (Shneiderman, 1996). The visual design of a visualization is equally critical. It involves developing
visual metaphors that effectively represent abstract or complex data. Since many data types, particularly those
without inherent spatial structure, lack intuitive graphical analogs, thoughtful design choices are required to ensure
that visual forms convey the intended meaning and maintain perceptual clarity. Information Visualization is a
subfield concerned with the visual representation of abstract, non-spatial data (C. Chen, 2010). Unlike scientific
or geospatial visualization, which deals with physical or spatially grounded phenomena (e.g., molecular structures,
terrain models), Information Visualization addresses data without natural spatial mappings, such as networks, text
corpora, or interaction patterns. The core challenges of Information Visualization lie in the development of
suitable visual encodings and interaction techniques for such abstract data, as well as the design of algorithms that
can efficiently transform data into meaningful visual structures. Through iterative design and user-centered
evaluation, Information Visualization seeks to create representations that make underlying data patterns and
relationships perceptually salient and cognitively accessible.
Visual Analytics (VA) extends the principles of visualization and Information Visualization by integrating them
with methods from data analysis, machine learning, and human–computer interaction to support analytical
reasoning and decision-making (C. Chen, 2010). It is a multidisciplinary field encompassing (1) analytical
reasoning techniques for deriving insight, (2) visual representations and interaction techniques for exploring data,
and (3) data representations and transformations for managing complexity (Cook & Thomas, 2005). The central
aim of VA is to combine the computational power of algorithms with the perceptual and cognitive strengths of
humans, thereby creating a human–machine analytical loop. Through this integration, users can interactively guide
computational processes, refine hypotheses, and iteratively explore data representations. Despite significant
advances, the discipline of VA remains in an evolving state. Designing effective visual representations for
complex, high-dimensional, or abstract data continues to be as much a craft as it is a science. A related concept is
visual data mining, which leverages visualization to support the discovery of structures, correlations, and outliers
in large datasets (Keim & Ward, 2003). Visual data mining and VA share the goal of enabling interactive
exploration, but VA typically places stronger emphasis on reasoning and decision-making in complex, uncertain
contexts.
In spite of the rapid advances in artificial intelligence, computer vision, and machine learning, understanding
nonverbal behavior (NVB) and interaction dynamics remains a significant challenge. Automated methods in
Social Signal Processing (SSP) aim to extract NVB from multimodal data using machine-learning techniques, yet
their success depends heavily on large, high-quality training datasets, and they raise questions about the theoretical
validity of inferred constructs. In this context, Visual Analytics offers an underexplored but promising approach.
By combining computational feature extraction with interactive visualization, VA enables the exploration of
complex temporal and relational patterns in NVB data that are difficult to capture through purely algorithmic or
statistical means. Through interactive visual mappings, researchers can observe synchrony, mimicry, and
coordination patterns across individuals or groups, facilitating insights into the social and cognitive dynamics of
interaction. Applying the VA framework to NVB analysis thus provides a complementary, human-centered
approach to data interpretation, one that aligns with the strengths of our visual–cognitive system and allows
analysts to discover emergent behavioral structures that might otherwise remain hidden in raw or aggregated data.
3 Methodological Framework
Figure 1: Schematic representation of a visualization pipeline contextualized for the analysis of nonverbal behavior (adapted
from the original by Card et al. (1999, pg. 17)).
3.2 Data Acquisition
Video recordings of groups of 3-4 students tasked with collaboratively solving an assignment on a computer were
obtained for two types of settings: 1) a face-to-face (f2f) laboratory setting and 2) an online meeting for a study
performed during the Covid-19 pandemic. The broader research context as well as the details on the study design
and the compliance with ethical standards are reported in Paneth et al. (2024). In the face-to-face setting, the
students were seated next to each other at a large table equipped with two monitors, a computer mouse and a
keyboard. A video camera was installed at a distance of ca. 1.5 m from the table with a good view of all three
participants. A typical consumer-grade digital camera (Panasonic HC-X909) with integrated microphone was
used to record each learning session. For the online study, recordings of online meeting sessions were produced
consisting of a mosaic of the individual participants’ camera feeds. The lengths of the video recordings ranged
from 75 minutes to 100 minutes, and from 25 minutes to 30 minutes, for the f2f and the online study, respectively.
3.3 Batch-processing
For the extraction of landmarks from video recordings we used the MediaPipe (Lugaresi et al., 2019) library and
specifically the “holistic” solution suite which combines the perception pipelines for detecting face, hand and
body (pose) landmarks into a semantically consistent end-to-end solution. Our rationale for choosing MediaPipe
was primarily based on the project’s multi-platform and multi-language support, the ease of integration into our
codebase and its real-time capabilities making the processing of large amounts of video data manageable. Table 1
shows the sizes of face, hand and pose landmarks in terms of the number of distinct points. A total of 543
landmarks are thus detected for a single person for each sampled video frame. In our case however, this didn’t
include the lower portions of the body (hips, legs, feet) as they were occluded from the camera view for the entirety
of the video recording.
Table 1: Dimensions of landmarks extracted with the MediaPipe library (mediapipe-holistic, 2020)
Feature                 Size       Coordinate system
Face                    468 × 3    screen, metric
Pose                    33 × 4     screen
Hands (lhand, rhand)    21 × 3     screen
A schematic of the batch-processing routine designed to extract the landmark data from videos is shown in
Figure 2. Starting with the raw video recordings, the individual frames are first sampled at a specified constant
sampling rate. We used an arbitrary sampling rate of 5 frames per second (5 Hz) in order to obtain data at a
sufficiently high temporal resolution. This sampling rate is high enough that it enables capturing micro-
expressions such as head nodding from the head movement data. In the online study, the video recordings
consisted of a mosaic of individual participants’ camera feeds. After the frame sampling step, each mosaic frame
was divided into ROIs (regions of interest) corresponding to the individual camera feeds, and each ROI was then
processed separately. Since MediaPipe’s holistic solution lacked multi-person functionality at the time of the
study, the pipeline incorporated a masking step. This step generated a sequence of frames, each masking all but
one person, before passing them to the landmark extraction stage. We used the pre-trained YOLOv8 model to
obtain the instance segmentation maps for the person class and applied simple heuristics (left-to-right sorting) to
identify the individual persons in the frame. It should be noted that this simple approach is only applicable to static
scenes (as in our case) where the participants don’t change their positions relative to each other during the video
recording. A more robust approach would need to involve some form of person tracking. Finally, the extracted
landmarks for each individual person are stored together with corresponding timestamps and the metadata
consisting of anonymous study, group and person identifiers in the HDF5 format which is optimized for large
hierarchical and multi-dimensional array data. The HDF5 datasets contain the landmark data for all groups and all
persons within each group and serve as inputs into the visualization pipeline.
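
The frame sampling, left-to-right sorting and masking steps described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the routine used in the study: the function names are hypothetical, and the person masks are assumed to arrive as boolean arrays of the frame's height and width from whatever instance-segmentation model is in use.

```python
import numpy as np

def sampled_frame_indices(n_frames, video_fps, sample_hz=5.0):
    """Indices of the video frames closest to a constant sampling grid."""
    n_samples = int(np.ceil(n_frames * sample_hz / video_fps))
    idx = np.round(np.arange(n_samples) * video_fps / sample_hz).astype(int)
    return idx[idx < n_frames]

def order_left_to_right(masks):
    """Order person masks by the horizontal centroid of their pixels.

    Assumes a static scene: persons never swap places during the recording.
    """
    centroids = [np.nonzero(m)[1].mean() for m in masks]
    return [int(i) for i in np.argsort(centroids)]

def mask_all_but(frame, masks, keep):
    """Return a copy of the frame in which only person `keep` stays visible."""
    out = np.zeros_like(frame)
    out[masks[keep]] = frame[masks[keep]]
    return out

# Toy example: a 4x6 RGB frame with two "persons" (hypothetical masks).
frame = np.full((4, 6, 3), 255, dtype=np.uint8)
right = np.zeros((4, 6), dtype=bool); right[:, 4:] = True
left = np.zeros((4, 6), dtype=bool); left[:, :2] = True
order = order_left_to_right([right, left])      # left person comes first
only_left = mask_all_but(frame, [right, left], keep=order[0])
```

Sorting by centroid keeps the person identifiers stable only as long as the seating order does not change, which is the limitation noted in the text.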
An example of extracted landmarks for a single video frame is presented in Figure 3. It shows the face-mesh
consisting of 468 discrete points, the upper-body portion of the pose (body) landmarks and the left and the right hand
landmarks each consisting of 21 discrete points and using different color-coding for right versus left hand.
Figure 2: Schematics of the batch-processing pipeline for the extraction of landmarks from video recording.
Figure 3: Example of a video frame with extracted landmarks shown as an overlay.
As the raw video recordings constitute personal, potentially sensitive data of the study participants, we opted for
a strict separation of any person-related data in the processed datasets. The landmark data obtained from the batch-
processing are stored as numeric coordinate values associated with the abstract person identifiers (designated as
«P1», «P2», etc.) and without the corresponding video frame samples. In order to be able to quality-check the
landmark detections or produce visualization outputs, we developed an efficient routine for combining the
landmark data with the corresponding video frames on-the-fly, based on a performant random-access retrieval of
arbitrary frames within a video file. This also had the positive side-effect of significantly reducing the file sizes
required for storing the HDF5 datasets.
3.4 Coordinate spaces
The output of the landmark detection algorithm (MediaPipe) is given in 3D screen coordinates that represent the
landmark position in the video frame, normalized to the frame’s width and height, respectively (Figure 4, left). The z-
values are relative coordinates obtained under the weak perspective camera model and rescaled to match the scale
of x (Kartynnik et al., 2019). Thus, the z-coordinates are not metrically accurate but acceptable for the intended
application domain of the MediaPipe suite of tools (AR). We denote as “pixel coordinates” the 2D screen
coordinates scaled to the video frame’s width and height (Figure 4, middle). The pixel coordinates are thus
dependent on the resolution of the video frame.
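
As a minimal sketch of the relation between the normalized screen coordinates and the pixel coordinates defined above (the function name is hypothetical, and the landmarks are assumed to be stored with x, y, z in the last array axis):

```python
import numpy as np

def to_pixel_coords(landmarks_norm, frame_width, frame_height):
    """Scale normalized (x, y) screen coordinates to pixel coordinates.

    The relative z-values are dropped: they are not metrically accurate.
    """
    landmarks_norm = np.asarray(landmarks_norm, dtype=float)
    scale = np.array([frame_width, frame_height], dtype=float)
    return landmarks_norm[..., :2] * scale

# One landmark at the horizontal centre, a quarter of the way down the frame.
pixels = to_pixel_coords([[0.5, 0.25, -0.1]], frame_width=1920, frame_height=1080)
```

Because the result depends on the frame resolution, pixel-based distances from recordings with different resolutions are not directly comparable.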
For the face landmarks, the MediaPipe library also includes a routine that estimates the metric coordinates based
on a light-weight Procrustes analysis of a canonical face-mesh model which is specified in centimeter units
(Figure 4, right). We utilized the Python version by Rasmus (2021) to obtain the 3D metric coordinates of the
face-mesh as well as its position and orientation (pose) under the rigid-body transformation.
Figure 4: The different coordinate systems used for the landmarks at different stages of the visualization pipeline.
3.5 Data representation
Within the visualization pipeline, the processed data from the HDF5 store for an entire group (single video
recording) is mapped to a specialized in-memory data structure which is optimized for handling multi-dimensional
array data. Figure 5 shows the conceptual representation of this data structure which was implemented using the
xarray Python library (Hoyer & Joseph, 2017). This structure uses a well-established API (Application
Programming Interface) for handling complex numerical arrays in the Python ecosystem and facilitates slicing or
aggregating the data along arbitrary dimensions. Moreover, thanks to broadcasting and vectorization
(Harris et al., 2020), performant mathematical operations can be applied to entire multi-dimensional arrays, which is
crucial for handling large data within the visualization pipeline.
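
To make the slicing, aggregation and broadcasting operations concrete, the following sketch builds a small labelled array with xarray. It is a simplified stand-in for the structure in Figure 5, not the authors' implementation: it holds a single feature (the face-mesh) filled with random numbers, and the dimension names and the 5 Hz time axis are assumptions taken from the text.

```python
import numpy as np
import xarray as xr

rng = np.random.default_rng(0)
n_person, n_time, n_landmark = 3, 50, 468

# Hypothetical face-mesh data: (person, time, landmark, xyz).
face = xr.DataArray(
    rng.normal(size=(n_person, n_time, n_landmark, 3)),
    dims=("person", "time", "landmark", "xyz"),
    coords={
        "person": ["P1", "P2", "P3"],
        "time": np.arange(n_time) / 5.0,  # seconds, assuming 5 Hz sampling
        "xyz": ["x", "y", "z"],
    },
    name="face",
)

# Label-based slicing: x-coordinates of all landmarks of person P1.
p1_x = face.sel(person="P1", xyz="x")

# Aggregation along one dimension: the centroid of each face-mesh.
centroid = face.mean(dim="landmark")

# Broadcasting: dimensions are matched by name, so the centroid is
# subtracted from every landmark without any explicit loop.
centred = face - centroid

# Temporal aggregation: average blocks of 5 samples (1 second each).
per_second = centroid.coarsen(time=5).mean()
```

Matching dimensions by name rather than by position is what keeps such operations readable once the arrays have four or more dimensions.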
This data representation aligns effectively with the conceptual approach of the space-time cubes for visualizing
complex temporal data described by Bach et al. (2014). In fact, it can be thought of as a generalization of the space-
time cube into higher dimensions (“space-time hypercubes”). While it is beyond the scope of this work to elaborate
on such a generalization of the conceptual framework, in practical terms, this data representation together with
visualization APIs such as HoloViews (Stevens et al., 2015) that support easy creation of interactive visualizations
over dimensioned containers of data, makes many of the operations described in Bach et al. (2014) trivial to
implement.
Figure 5: A: Conceptual representation of a data structure used for accessing the landmarks data of an entire group from the
HDF5 store as a multi-dimensional array with different dimensions. The “feature” dimension denotes either a set of
landmarks corresponding to distinct body parts (face, left hand, etc.) or a derived quantity calculated in the post-processing
such as face-mesh position and direction, “landmark” refers to the individual landmark index within a set and “xyz”
represents the 3D Cartesian coordinates. B: Schematic representation of the “space-time hypercube” conceptual framework as
an extension of the space-time cube by Bach et al. (2014) into higher dimensions.
3.6 Derived signals
The extracted landmarks represent dense geometric data from which it is possible to extract new derived signals
relevant to NVB with suitable data transformations. A range of spatial queries can be made with respect to the
instantaneous (e.g. position, orientation, distance) or dynamic values (e.g. velocity, acceleration). In this work, we
explored only a few of such derived signals described below and much more could be done in future research. For
example, the metric face-mesh data could be used to derive new signals related to the facial expressions within
the permissible spatial and temporal resolution, or the hands and the pose landmarks could be used for fine-
granular analysis of gestures. In the context of group processes research, even more interesting would be the
exploration of the spatio-temporal patterns at the group level by considering signals derived from all persons
simultaneously.
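
For the dynamic values mentioned above, one simple option is to approximate velocity and acceleration with finite differences over the sampled time axis. The sketch below is an illustration rather than the method used in this work; it assumes a signal sampled at a constant rate (5 Hz by default) with time along the first axis, and the function name is hypothetical.

```python
import numpy as np

def time_derivatives(signal, sample_hz=5.0):
    """First and second time derivatives by central finite differences."""
    signal = np.asarray(signal, dtype=float)
    dt = 1.0 / sample_hz
    velocity = np.gradient(signal, dt, axis=0)
    acceleration = np.gradient(velocity, dt, axis=0)
    return velocity, acceleration

# A position that grows by 2 units per second, sampled at 5 Hz.
t = np.arange(10) / 5.0
velocity, acceleration = time_derivatives(2.0 * t)
```

Finite differences amplify noise, so in practice the signal would likely need smoothing before differentiation; that step is left out here.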
Through the Procrustes Analysis used by the MediaPipe library, we obtain the face-mesh in metric coordinates
superimposed with the canonical face-mesh model. This procedure removes the scale, rotation, and translation
components and additionally provides the 4x4 homogeneous transformation matrix representing the rigid-body
transformation of the face-mesh. From this, we extract the translational components $(t_x, t_y, t_z)$ representing
the position of the face-mesh in the metric space and the axis-aligned elemental rotations corresponding to “pitch”,
“yaw” and “roll” (extrinsic Euler angles, Figure 6) using the scipy.spatial module (Virtanen et al., 2020):
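$$
\mathbf{H} =
\begin{bmatrix} \mathbf{R} & \mathbf{T} \\ \mathbf{0} & 1 \end{bmatrix} =
\begin{bmatrix}
r_{11} & r_{12} & r_{13} & t_x \\
r_{21} & r_{22} & r_{23} & t_y \\
r_{31} & r_{32} & r_{33} & t_z \\
0 & 0 & 0 & 1
\end{bmatrix},
\qquad \mathbf{R} = \mathbf{R}(\text{pitch}, \text{yaw}, \text{roll}) \tag{1}
$$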
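
A minimal sketch of this decomposition with `scipy.spatial.transform.Rotation` is given below. The text only states that pitch is the rotation around the x-axis; mapping yaw to the y-axis and roll to the z-axis is an assumption made here, as is the function name. In SciPy, a lowercase axis sequence such as `"xyz"` denotes extrinsic rotations.

```python
import numpy as np
from scipy.spatial.transform import Rotation

def decompose_pose(H):
    """Split a 4x4 rigid-body matrix into translation and Euler angles.

    Returns (t, (pitch, yaw, roll)) with the angles in degrees, assuming
    extrinsic rotations about x (pitch), y (yaw) and z (roll).
    """
    H = np.asarray(H, dtype=float)
    R, t = H[:3, :3], H[:3, 3]
    pitch, yaw, roll = Rotation.from_matrix(R).as_euler("xyz", degrees=True)
    return t, (float(pitch), float(yaw), float(roll))

# Round trip: build a pose from known angles, then recover them.
H = np.eye(4)
H[:3, :3] = Rotation.from_euler("xyz", [10.0, -20.0, 30.0], degrees=True).as_matrix()
H[:3, 3] = [1.0, 2.0, 50.0]
t, angles = decompose_pose(H)
```

Euler angles become ambiguous when the middle rotation approaches ±90°, which is unlikely for a head facing a camera but worth checking before interpreting outliers.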
Figure 6: The rotation of the face-mesh around the x-axis (pitch), and its first two derivatives
Using the face-mesh pose, we estimate the position and the direction of an individual’s face (head). The “facing”
direction ($d$) is obtained by the matrix-vector multiplication of the face pose ($H$) with the forward-facing vector of the
canonical face-mesh corresponding to [0,0,1]:
$$
d = H z, \qquad z = [0, 0, 1] \tag{2}
$$
Based on the positions and directions we can estimate the mutual facing orientations of each pair of individuals
which serve as a proxy for eye-contact, or more generally, for evaluating how individuals are spatially oriented
with respect to each other.
For a given pair (persons A and B) let $p_A$ and $p_B$ denote the positions, and $d_A$ and $d_B$ the facing-directions,
respectively. The relative positional vector $r_{AB}$ is defined as the vector from $A$ to $B$:
$$
r_{AB} = p_B - p_A \tag{3}
$$
What we are looking for is a projection of $d_A$ onto $r_{AB}$:
$$
u_{f1} = \operatorname{proj}_{r_{AB}} d_A = \frac{d_A \cdot r_{AB}}{|r_{AB}|^2}\, r_{AB} \tag{4}
$$
Using normalized vectors $\hat{d}_A$ and $\hat{r}_{AB}$, this simplifies to a dot product:
$$
\mathrm{dot}_{A \to B} = \hat{d}_A \cdot \hat{r}_{AB} \tag{5}
$$
which yields a value between −1 (facing away) and 1 (facing directly towards the other person). Finally, to obtain
the “facing orientation index” we map the value of the dot product to a more interpretable range of [0,1]:
$$
\mathrm{orientation}_{A \to B} \approx \frac{\mathrm{dot}_{A \to B} + 1}{2} \tag{6}
$$
We refer to this value as “mutual orientation index” (“MOI”) as it does not necessarily correspond to the eye
contact. A representation of the recorded scene from the bird’s-eye perspective and the vector representation of the
calculation of the MOI is shown in Figure 7.
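
The calculation in equations (2) to (6) can be sketched with NumPy as follows. The function names are hypothetical, and one detail is an assumption: the facing direction is computed from the 3x3 rotation block of the pose matrix only, so that the translation does not leak into the direction vector.

```python
import numpy as np

def facing_direction(H):
    """Unit facing direction: rotation part of the pose applied to [0, 0, 1]."""
    R = np.asarray(H, dtype=float)[:3, :3]
    d = R @ np.array([0.0, 0.0, 1.0])
    return d / np.linalg.norm(d)

def orientation_index(p_a, d_a, p_b):
    """Orientation index of A towards B, in the range [0, 1]."""
    r_ab = np.asarray(p_b, dtype=float) - np.asarray(p_a, dtype=float)
    r_hat = r_ab / np.linalg.norm(r_ab)
    d_hat = np.asarray(d_a, dtype=float) / np.linalg.norm(d_a)
    dot = float(np.dot(d_hat, r_hat))   # equation (5)
    return (dot + 1.0) / 2.0            # equation (6)

p_a, p_b = np.zeros(3), np.array([2.0, 0.0, 0.0])
towards = orientation_index(p_a, [1.0, 0.0, 0.0], p_b)   # A faces B
away = orientation_index(p_a, [-1.0, 0.0, 0.0], p_b)     # A faces away
sideways = orientation_index(p_a, [0.0, 0.0, 1.0], p_b)  # perpendicular
```

The index is directional: the value for A towards B generally differs from the value for B towards A, so both have to be computed for each pair.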
Figure 7: Left: Bird’s-eye view of the recorded scene showing the positions and orientations of the face-meshes of persons
P1 to P3 together with the placement of the camera and computer screen in metric coordinates. Right: Vector representation
used for the calculation of the “mutual orientation indices” (MOI) between persons “A” and “B”.
Kinematic Posture Model
To streamline the analysis of postural changes we implemented a kinematic model interface analogous to those
used in robotics and human motion imitation (Kulić, 2019). The kinematic model is defined as a hierarchical
graph of nodes representing anatomic landmarks (i.e. joints) or mechanical points (e.g. center of mass of the face
mesh) starting with the midpoint between the shoulders as an arbitrarily defined root node. Due to occlusions, we
limit the analysis to the visible portion of the upper body. Since the estimation of the metric coordinates in
MediaPipe is only possible for the face landmarks (face-mesh), but not for hands and body landmarks, all geometric
queries are performed in pixel coordinates. While this is limiting and doesn’t allow true spatial queries to be
performed, it nevertheless provides sufficient information to analyze temporal changes in the spatial configuration of the
posture.
We note that the purpose of our kinematic model is not to provide the forward and inverse kinematics (Kulić,
2019) functionality as in robotics. Rather, its main purpose is to provide an intuitive API for defining geometric
or motion-dynamic queries related to the posture (such as angles, accelerations or distances between distinct
nodes, Figure 8) in the code. In a later section, we show how these data can be used as input features for a vector-
embedding-based dimensionality reduction approach to explore temporal patterns in postural changes.
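
The idea of a hierarchical node graph with geometric queries can be illustrated with a small sketch. All node names, the parent links and the coordinates below are hypothetical and only serve the example; the interface implemented in this work is not shown in the text.

```python
import numpy as np

# Hypothetical upper-body graph: child -> parent, rooted at the shoulder midpoint.
PARENT = {
    "mid_shoulder": None,
    "nose": "mid_shoulder",
    "r_shoulder": "mid_shoulder",
    "r_elbow": "r_shoulder",
    "r_wrist": "r_elbow",
}

def path_to_root(node):
    """Chain of nodes from `node` up to the root of the graph."""
    path = [node]
    while PARENT[path[-1]] is not None:
        path.append(PARENT[path[-1]])
    return path

def distance(nodes, a, b):
    """Euclidean distance between two nodes, in pixel units."""
    return float(np.linalg.norm(np.subtract(nodes[a], nodes[b])))

def joint_angle(nodes, a, joint, c):
    """Angle in degrees at `joint` between the segments to `a` and to `c`."""
    u = np.subtract(nodes[a], nodes[joint])
    v = np.subtract(nodes[c], nodes[joint])
    cos_angle = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return float(np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0))))

# Pixel coordinates of one frame (made-up values).
nodes = {
    "mid_shoulder": (500.0, 300.0),
    "nose": (500.0, 150.0),
    "r_shoulder": (400.0, 300.0),
    "r_elbow": (400.0, 450.0),
    "r_wrist": (500.0, 450.0),
}
elbow_angle = joint_angle(nodes, "r_shoulder", "r_elbow", "r_wrist")
face_to_wrist = distance(nodes, "nose", "r_wrist")
```

Evaluating such queries for every sampled frame yields one time series per query, which can then be visualized or used as input features.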
Figure 8: Posture analysis on an example of face-to-hand distance and joint angle geometric queries. (Left) Geometric
objects used in face-to-hand distance queries defined as points (wrists) or polygons (face, hands). (Middle) Distances of face
to wrist points and hand polygons, respectively, in pixel units. (Right) Joint angles in degrees.
3.7 Visual Mappings
Visual mapping e e s o he mapping o da a o isual (g aphical) s uc u es a anged spa ially on he ou pu
medium (e.g. sc een). Di e en da a a ibu es o dimensions can be encoded along di e en isual channels such
as posi ion, leng h, a ea, colo , shape, layou , e c. Howe e , c ea ing e icien isual ep esen a ions o abs ac
o non-spa ial da a is a non- i ial ask ha mus be guided by he p inciples o design and he Ges al p inciples
o isual pe cep ion and e alua ed in e ms o exp essi eness and e ec i eness o he isual design (Munzne ,
2014). Visualiza ions o da a wi h mo e han 3 dimensions p esen a simila challenge, whe eby specially-designed
isual mappings a e used o map such da a on o a 2D medium. Fo example, Beddow (1990) used a glyph-based
isualiza ion echnique, which he dubbed “shape coding”, o analyze he empo al pa e ns in 13-pa ame e
magne osphe e and sola wind da a. Fo he pu poses o join ly isualizing highe -dimensional ime-o ien ed da a
in he p esen wo k, we implemen ed a cus om glyph-based isual mapping simila o he “shape coding” by
Beddow (1990), which we e e o as “glyph a ays” and “glyph maps”. Figu e 9 depic s he gene al idea behind
glyph arrays and glyph maps using random dummy data. The glyph arrays are made up of a 2D arrangement of
elemental shapes (circle or square) where the values (or potentially two values) can be encoded using the size or
the color (or both). Since the output medium of the visualization is limited to two dimensions, the higher
dimensions (beyond 3 or 4) are handled by arranging the glyph arrays into hierarchies (glyph maps) using the
proximity Gestalt principle in a recursive pattern (Figure 9, bottom). The dimensions need not necessarily
represent distinct (spatial) dimensions and can instead represent a space-filling arrangement of otherwise 1D
data (similar to text-wrapping and arrangement into multiple columns). In this implementation the individual
dimensions are strictly ordinal and in concrete visualizations (Figure 13) they are annotated with text marks or
accompanied by an explanatory legend.
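As a minimal illustration of the recursive arrangement described above, the following Python sketch computes 2D glyph positions for an N-dimensional array. It is not the implementation used in this work: the function name glyph_map_positions, the alternation of dimensions between the x and y axes, and the gap parameter are assumptions made for illustration. The two innermost dimensions form a glyph array with unit spacing, and each further pair of dimensions is separated by a wider gap so that proximity conveys the hierarchy.

```python
from itertools import product


def glyph_map_positions(shape, gap=1.0):
    """Map every index of an N-D array to an (x, y) glyph position.

    Hypothetical helper (not from the paper). Dimensions are consumed
    from last to first: the last runs along x, the second-to-last along
    y, the next along x again, and so on.
    """
    steps = []            # (axis, step) per dimension, innermost first
    extent = [1.0, 1.0]   # width and height of the block built so far
    for level, size in enumerate(reversed(shape)):
        axis = level % 2                  # 0 -> x, 1 -> y
        spacing = gap * (level // 2)      # no extra gap inside a glyph array
        step = extent[axis] + spacing     # offset between neighbouring blocks
        steps.append((axis, step))
        extent[axis] = (size - 1) * step + extent[axis]

    positions = {}
    for index in product(*(range(n) for n in shape)):
        xy = [0.0, 0.0]
        for i, (axis, step) in zip(reversed(index), steps):
            xy[axis] += i * step
        positions[index] = (xy[0], xy[1])
    return positions


# Two 3x2 glyph arrays placed side by side, separated by one empty column.
layout = glyph_map_positions((2, 3, 2))
```

The returned positions can be passed to any scatter-style plotting routine, with the data values mapped to glyph size, color, or both.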
Figure 9: Visualization of higher-dimensional data using “glyph arrays” and “glyph maps”. (left) example glyph arrays of
random data with spatial dimensions of 10x5 encoding single or two values using different visual channels (color, size). The
shape of the elemental glyph (square or circle) can be chosen deliberately but is not used as a visual encoding. Missing data
(2nd and 3rd columns and rows) can be indicated with a distinct color or size (or both). (right) individual glyph arrays can be
arranged hierarchically based on the proximity Gestalt principle to represent higher dimensions on a 2D medium.
4 Demonstrative Use Cases
Head movement patterns
The visualization of the raw motion data with very high temporal resolution is impractical for high-level tasks
such as analyzing patterns in NVB as the data quantity is overwhelming and difficult to make sense of. Interactive
visualizations alleviate this problem somewhat thanks to zooming and panning. For example, Figure 10 shows a
segment of the x-rotation (pitch) data of a particular person with a notable peak corresponding to a head nod.
Another way to reduce the information overload is to aggregate (resample) the raw data at a lower time resolution
at the cost of losing detail and the possibility to analyze small time-scale events (micro-expressions).
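The trade-off of aggregation can be sketched in a few lines of Python. The function resample_mean and its parameters are hypothetical, and the mean is only one possible aggregate. The sketch assumes timestamps in seconds and represents frames without a detected landmark as None.

```python
from collections import defaultdict
from statistics import fmean


def resample_mean(timestamps, values, bin_width):
    """Average the samples that fall into consecutive time bins.

    Hypothetical helper: `bin_width` is the new sampling interval in
    seconds; missing samples (None) are ignored.
    """
    bins = defaultdict(list)
    for t, v in zip(timestamps, values):
        if v is None:                     # frame without a detection
            continue
        bins[int(t // bin_width)].append(v)
    # One (bin start time, mean value) pair per non-empty bin.
    return [(k * bin_width, fmean(vs)) for k, vs in sorted(bins.items())]


# Pitch samples at 2 Hz aggregated to 1 Hz.
pitch = resample_mean([0.0, 0.5, 1.0, 1.5, 2.0],
                      [0.0, 2.0, 10.0, None, 4.0],
                      bin_width=1.0)
# pitch == [(0.0, 1.0), (1.0, 10.0), (2.0, 4.0)]
```

With a wider bin, a brief peak such as a head nod is averaged with its neighbours and may no longer stand out, which is the loss of detail noted above.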
6 Conclusion
In this contribution we demonstrated a methodological approach to analyse nonverbal behavior from video
recordings using the Visual Analytics framework. We described a data-processing pipeline to extract rich
high-temporal-resolution body movement data using off-the-shelf computer vision tools that detect face, hands and
body landmarks. Based on these data, we showcased possible approaches to derive individual and dyadic
nonverbal behavior, as well as techniques for designing effective visualizations of complex high-dimensional
data. We believe that Visual Analytics unlocks new opportunities to utilize the high-volume multi-modal sensor
data to study the complex dynamics of NVB and interaction and invite future research to further refine these
methods and demonstrate their usefulness in real-world scenarios.
References
Card, S. K., Mackinlay, J. D., & Shneiderman, B. (Eds). (1999). Readings in information visualization: Using
vision to think. Morgan Kaufmann Publishers Inc.
Chen, C. (2010). Information visualization. WIREs Computational Statistics, 2(4), 387–403.
https://doi.org/10.1002/wics.89
Chen, J., Wang, M., Kirschner, P. A., & Tsai, C.-C. (2018). The Role of Collaboration, Computer Use, Learning
Environments, and Supporting Strategies in CSCL: A Meta-Analysis. Review of Educational Research, 88(6),
799–843.
Cook, K. A., & Thomas, J. J. (2005). Illuminating the Path: The Research and Development Agenda for Visual
Analytics. IEEE Computer Society, Los Alamitos, CA, United States (US).
https://www.osti.gov/biblio/912515
Keim, D., & Ward, M. (2003). Visualization. In M. Berthold & D. J. Hand (Eds), Intelligent Data Analysis: An
Introduction (pp. 403–427). Springer. https://doi.org/10.1007/978-3-540-48625-1_11
Kulić, D. (2019). Human Motion Imitation. In A. Goswami & P. Vadakkepat (Eds), Humanoid Robotics: A
Reference (pp. 1657–1677). Springer Netherlands. https://doi.org/10.1007/978-94-007-6046-2_34
Lehmann-Willenbrock, N., & Hung, H. (2023). A Multimodal Social Signal Processing Approach to Team
Interactions. Organizational Research Methods, 10944281231202741.
https://doi.org/10.1177/10944281231202741
Luciano, M. M., Mathieu, J. E., Park, S., & Tannenbaum, S. I. (2018). A Fitting Approach to Construct and
Measurement Alignment: The Role of Big Data in Advancing Dynamic Theories. Organizational Research
Methods, 21(3), 592–632. https://doi.org/10.1177/1094428117728372
Müller, J., Fàbregues, S., Guenther, E. A., & Romano, M. J. (2019). Using Sensors in Organizational Research—
Clarifying Rationales and Validation Challenges for Mixed Methods. Frontiers in Psychology, 10.
https://doi.org/10.3389/fpsyg.2019.01188
Paneth, L., Jeitziner, L. T., Rack, O., & Zahn, C. (2023). A Multi-Method Approach to Capture Quality of
Collaborative Group Engagement. 91–98. https://doi.org/10.22318/cscl2023.134087
Renier, L. A., Schmid Mast, M., Dael, N., & Kleinlogel, E. P. (2021). Nonverbal Social Sensing: What Social
Sensing Can and Cannot Do for the Study of Nonverbal Behavior From Video. Frontiers in Psychology, 12,
606548. https://doi.org/10.3389/fpsyg.2021.606548
Shneiderman, B. (1996). The eyes have it: A task by data type taxonomy for information visualizations.
Proceedings 1996 IEEE Symposium on Visual Languages, 336–343. https://doi.org/10.1109/VL.1996.545307
Vinciarelli, A., Salamin, H., & Pantic, M. (2009). Social Signal Processing: Understanding social interactions
through nonverbal behavior analysis. 2009 IEEE Computer Society Conference on Computer Vision and
Pattern Recognition Workshops, 42–49. https://doi.org/10.1109/CVPRW.2009.5204290
Zheng, L., Long, M., Niu, J., & Zhong, L. (2023). An automated group learning engagement analysis and feedback
approach to promoting collaborative knowledge building, group performance, and socially shared regulation
in CSCL. International Journal of Computer-Supported Collaborative Learning, 18(1), 101–133.
https://doi.org/10.1007/s11412-023-09386-0