Comparison of Visual Saliency for Dynamic Point Clouds: Task-free vs. Task-dependent

Author: Zhou, Xuemei; Viola, Irene; Rossi, Silvia; Cesar, Pablo

Publisher: Zenodo

DOI: 10.1109/TVCG.2025.3549863

Source: https://zenodo.org/records/17674125/files/Comparison_of_Visual_Saliency_for_Dynamic_Point_Clouds_Task-free_vs._Task-dependent.pdf

IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS, VOL. 31, NO. 5, MAY 2025
Received 18 September 2024; revised 13 January 2025; accepted 13 January 2025.
Date of publication 11 March 2025; date of current version 31 March 2025.
Digital Object Identifier no. 10.1109/TVCG.2025.3549863
1077-2626 © 2025 IEEE. All rights reserved, including rights for text and data mining, and training of artificial intelligence and similar technologies.
Personal use is permitted, but republication/redistribution requires IEEE permission. See https://www.ieee.org/publications/rights/index.html for more information.
Comparison of Visual Saliency for Dynamic Point Clouds:
Task-free vs. Task-dependent
Xuemei Zhou, Irene Viola, Silvia Rossi, Pablo Cesar
Fig. 1: Fixation maps of dance sequences with uniform temporal sampling every 30 frames (horizontal axis: time). The blue regions represent task-free
conditions, while the red regions indicate task-dependent conditions. Gray areas denote nonsalient regions in both conditions, and
overlapping areas are shown as a blend of the two color maps.
Abstract: This paper presents a Task-Free eye-tracking dataset for Dynamic Point Clouds (TF-DPC) aimed at investigating visual
attention. The dataset is composed of eye gaze and head movements collected from 24 participants observing 19 scanned dynamic
point clouds in a Virtual Reality (VR) environment with 6 degrees of freedom. We compare the visual saliency maps generated from this
dataset with those from a prior task-dependent experiment (focused on quality assessment) to explore how high-level tasks influence
human visual attention. To measure the similarity between these visual saliency maps, we apply the well-known Pearson correlation
coefficient and an adapted version of the Earth Mover's Distance metric, which takes into account both spatial information and the
degrees of saliency. Our experimental results provide both qualitative and quantitative insights, revealing significant differences in
visual attention due to task influence. This work enhances our understanding of the visual attention for dynamic point clouds (specifically
human figures) in VR from gaze and human movement trajectories, and highlights the impact of task-dependent factors, offering
valuable guidance for advancing visual saliency models and improving VR perception.
Index Terms: dynamic point cloud, eye-tracking, task-free, visual saliency metric, similarity measurement
1 INTRODUCTION
The Human Vision System (HVS) processes vast amounts of visual
information by selectively focusing on relevant parts of the surrounding
environment. This mechanism, known as visual saliency or visual
attention, allows for efficient interpretation of complex scenes. Visual
saliency has become a key focus in image and video processing due
to its ability to efficiently identify regions of interest, improving both
processing and transmission [20,33], with extensive studies already
conducted in this area [10,11,28,43]. In particular, researchers have
investigated how oculomotor behavior and attention are affected by
high-level visual tasks [32], such as Image Quality Assessment (IQA)
or Video Quality Assessment (VQA), compared with free viewing,
where users observe the media content as they normally would, which
results in so-called natural scene saliency. For example, Liu [36] and
Le Meur [32] have collected eye-tracking data under both free viewing
and quality assessment scenarios. Their findings suggest that the main
region of interest of an image/video remains highly similar, with certain
deviations observed during quality assessment tasks.
• Xuemei Zhou and Pablo Cesar are with Centrum Wiskunde en Informatica,
Amsterdam, The Netherlands, and with TU Delft, Delft, The Netherlands.
E-mail: [email protected].
• Irene Viola and Silvia Rossi are with Centrum Wiskunde en Informatica,
Amsterdam, The Netherlands. E-mail: [email protected]
Recent advancements in immersive media have shifted the focus
to 3D content. Specifically, volumetric video, such as dynamic point
cloud, has become one of the most popular formats [6]. Unlike 2D
images and videos, where visual saliency has been extensively studied,
dynamic point clouds present unique challenges that have not been
fully addressed in the literature. For example, dynamic point clouds
differ from traditional video in terms of data volume, and the use of
Head-Mounted Displays (HMDs) for their consumption introduces
additional complexities. Thus, established findings of visual saliency
in image and video, such as the spatial bias [45] and central bias [62]
in fixation data, may not hold for dynamic point clouds.
One of the main challenges hindering the advancement of saliency-guided
applications for dynamic point clouds is the lack of ground-truth
saliency data. To address this gap, several studies have attempted to
collect eye-tracking data to generate ground-truth saliency maps for
point clouds. For instance, Alexiou et al. [7] conducted an eye-tracking
experiment in VR under a task-dependent scenario. Nguyen et al. [40]
released an open source, task-free eye-tracking dataset of 4 dynamic
point clouds in mixed reality using Hololens 2. Zhou et al. [69] presented
a task-dependent eye-tracking dataset of 50 dynamic point
clouds. A summary of existing visual attention datasets for point
clouds is shown in Table 1. These datasets, in which gaze patterns
are recorded under free viewing or different task-dependent conditions,
have been instrumental in creating ground-truth visual saliency maps
used for model design and validation. Despite these efforts, the impact
of task-free and task-dependent conditions on human visual attention
deployment in point clouds is still unexplored, unlike in its 2D counterpart.
A dataset that captures saliency maps of the same content across
Table 1: Publicly available visual attention datasets for point clouds.
Dataset          Type     Stimuli  Display  Interaction*  Visual Attention  Task-free
ViAtPCVR [7]     Static   8        VR
QAVA-DPC [69]    Dynamic  50       VR
ComPEQ-MR [40]   Dynamic  4        AR
TF-DPC (Ours)    Dynamic  19       VR
* Interaction here refers to being able to move around and observe the point cloud from different angles.
different perceptual tasks in VR/AR is still needed to assess the impact
of tasks.
In this study, we aim to address these challenges by creating a novel
Task-Free dataset for Dynamic Point Clouds (TF-DPC), which will benefit
the research community and provide extensive training data.
The dataset is composed of eye gaze and head movements collected
from 24 participants observing 19 scanned dynamic point clouds in a
Virtual Reality (VR) environment with 6 Degrees of Freedom (DoF).
Based on the collected data, we investigate how human visual attention
is affected by high-level visual tasks, by comparing our task-free
saliency maps with those obtained in a subjective quality assessment
scenario presented in [69]. To better quantify the difference between
saliency maps in task-free and task-dependent scenarios, we use Pearson's
Correlation Coefficient (PCC) and a modified version of the
Earth Mover's Distance (EMD) metric for image retrieval [55]. Our
experimental results provide both qualitative and quantitative insights,
revealing significant differences in visual attention due to task influence.
For example, Figure 1 shows the fixation maps of the dance
sequence in both task-free and task-dependent conditions (represented
by blue and red areas, respectively). Users tend to focus on different
regions of the content based on the experiment condition. Specifically,
in the task-dependent scenario, participants show a more consistent
focus on facial expressions or fine details, reflecting the specific task of
evaluating the quality of the content. To conclude, our contributions
are threefold and can be summarised as follows:
• We create a visual attention dataset for 19 original dynamic
  point clouds in a task-free VR experiment with 6-DoF. We release
  all raw data, containing the gaze samples and movement trajectory
  collected in our study, along with the code to compute and compare
  the dynamic point cloud visual saliency maps:
  https://github.com/cwi-dis/TVCG2025-TaskFree_PointCloudEyeTracking
• We provide an in-depth analysis of the collected dataset, using
  quantitative measures to explore the dataset in terms of gaze
  and trajectory; furthermore, we use qualitative methods to draw
  further insights from interviews.
• We compare the visual saliency maps under task-free and
  task-dependent conditions, to explore the impact of the high-level
  quality assessment task on human visual attention.
This novel dataset offers valuable opportunities for developing reliable
saliency models for 3D representations, which are essential for
augmented and mixed reality applications [23,24]. For instance, they
can enable advancements in several areas, including saliency-guided
compression [44,66] and live reconstruction [51] for point cloud streaming,
saliency-aware point cloud mixup for data augmentation [67],
volume visualization [27], foveated rendering [54], point cloud transmission
[51] and visual quality assessment [12,61,68].
2 RELATED WORK
2.1 Visual Attention for Point Clouds
In the early stages of visual attention computation, due to the limitations
of eye-tracking technologies, different collection procedures for
salient points were pursued. For example, Chen et al. [14] investigate
“Schelling points” on 3D meshes, feature points selected by people
in a pure coordination game due to their salience. They designed an
online experiment that asked people to select points via mouse-tracking
technology on 3D surfaces that they expected would be selected by
other people. This dataset is widely used as a benchmark for objective
saliency detection algorithms for colorless point cloud/mesh [16,59].
Later methods employ handcrafted descriptors [16,35] from more low-level
geometric properties to detect the point cloud/mesh saliency, but
these approaches lack expressiveness and overlook real human viewing
behaviors [39]. More recently, to explore the visual attention of
3D point clouds, eye-tracking experiments remain the main way to
understand human visual behaviors. Abid et al. [2] compute the visual
saliency of the point cloud considering the viewpoint from which the
3D content was seen/rendered, using an offline-computed view-based
saliency map. One eye-tracking experiment on a 2D screen is conducted
to verify the proposed saliency map. Alexiou et al. [7] conduct an
eye-tracking experiment in an immersive 3D scene. A method to exploit
the high-quality recorded gaze measurements is introduced based
on per-session profiling, and a scheme to determine areas of fixations
in a static point cloud is proposed. Zhou et al. [69] collect a dataset
containing the subjective opinion scores and visual saliency maps in a
VR environment using eye-tracking technology, which first establishes
a link between quality assessment and visual attention within the context
of dynamic point clouds. Nguyen et al. [40] propose a dataset
with compressed dynamic point clouds, rating scores, and eye-tracking
data with an Augmented Reality (AR) HMD. However, only 4 reference
dynamic point clouds have an associated visual saliency map. In our
work, we collected a dynamic point cloud dataset in VR with free
viewing. By using the same content as [69] and [40] and extending it
with other dynamic sequences, our dataset provides the possibility to
investigate the task impact or device impact on visual attention deployment
in VR or between VR and AR, as well as using the collected data
for other applications (e.g., saliency-guided compression).
2.2 Task Impact on Visual Attention
Understanding how the allocation of human visual attention changes
depending on perceptual tasks offers clear benefits in developing techniques
and improving the quality of experience in VR/AR. This is a
complex behavior that holds great importance for the field of IQA/VQA.
Specifically, task-free means that the user observes the content as naturally
as possible, with fixation data from such free viewing commonly
used to evaluate visual saliency. In contrast, task-dependent means that
the user observes the media content to fulfill a specific task; in the case
of IQA/VQA, to evaluate the visual quality. In these experiments, the
mean opinion score (typically ranging from 1 to 5) across users serves
as the ground truth for quality evaluation.
Le Meur et al. [32] carry out two eye-tracking experiments on 10 original
video sequences in a free viewing and a quality assessment task,
separately. The comparison between eye movements indicates that the
degree of similarity between human priority maps is rather high. They
observe that saliency-based distortion pooling does not significantly
improve the performance of the VQA metric. Liu et al. [36] and Hani
et al. [5] perform a similar experiment procedure for IQA. Liu evaluates
whether and to what extent the addition of natural scene saliency is
beneficial to objective quality prediction in general terms, and Hani
concludes that it is not fair to compare the effect of adding saliency in
objective metrics without specifying how the saliency was measured.
In larger contexts, task effects more broadly influence visual attention
in immersive environments. Hadnett-Hunter et al. [21] investigated
free-viewing, search, and navigation tasks in interactive virtual
environments and found task-specific differences in several human
visual attention measures, particularly during navigation. Their findings
demonstrated the potential of using attention data to dynamically
adapt virtual simulations and games. Hu et al. [25] analyzed eye and
head movements of participants performing free-viewing, visual search,
saliency, and tracking tasks in 360-degree VR videos. They revealed
significant task-driven differences in fixation durations, saccade amplitudes,
head rotation velocities, and eye-head coordination. They also
proposed EHTask, a learning-based method that employs eye and head
movements to recognize user tasks in VR. Their work provides meaningful
insights into human visual attention under different VR tasks and guides
future work on recognizing user tasks in VR. Malpica et al. [38] systematically
examined the impact of free exploration, memory, and
visual search tasks on visual behavior in immersive scenes. They reported
consistent task-specific differences in eye and head movement
patterns, offering practical insights for designing task-oriented immersive
applications. To the best of our knowledge, we are the first to
Fig. 2: Distribution of SI and TI of 19 source dynamic point clouds from 3
datasets; the color value is computed by (SI^2 + TI^2).
investigate the impact of tasks on human attention deployment in the
context of dynamic point clouds, building on insights from video, VR,
and immersive media studies.
2.3 Evaluation Metrics for Saliency Maps
The deviation between two saliency maps is often quantified depending
on how the visual saliency is represented [48]. Following the
evaluation metrics on 2D image saliency maps, we can divide the evaluation
metrics into location-based metrics and distribution-based metrics.
Location-based metrics, such as AUC [22], NSS [42], IG [30], are
designed specifically for saliency evaluation, and they operate on the
ground truth represented as discrete fixation locations. On the other
hand, distribution-based metrics, such as SIM [47], PCC, KL [34],
and EMD [50], have been adapted from information theory (IG, KL-divergence),
statistics (PCC) and image matching and retrieval (SIM,
EMD), and operate on the ground truth represented as a continuous
fixation map. Interested readers can refer to [13] for more information
about the recommendation of metric selections under specific
assumptions and for specific applications. However, the aforementioned
metrics are designed for grid-based 2D saliency maps, which
makes them difficult to apply to point cloud saliency maps due to the
intrinsic characteristics of dynamic point clouds. Based on the recommendations
for metric selection, we chose PCC and adapted EMD as
they are well-suited for evaluating distribution-based saliency maps and
can be easily extended to 3D scenarios, aligning with the nature of our
point cloud saliency maps. We further clarify our choice in Section 5.2.
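As an illustration of why PCC extends naturally from pixel grids to point clouds, the following minimal sketch computes the Pearson correlation between two per-point saliency vectors. It assumes that both maps are defined on the same points in the same order (for example, the same frame of the same content under two conditions); the function name `pearson_cc` and the sample values are hypothetical and are not taken from the released code.

```python
import math


def pearson_cc(sal_a, sal_b):
    """Pearson correlation between two per-point saliency lists.

    Assumption: sal_a[i] and sal_b[i] refer to the same 3D point.
    """
    if len(sal_a) != len(sal_b) or len(sal_a) < 2:
        raise ValueError("need two equally sized maps with at least two points")
    n = len(sal_a)
    mean_a = sum(sal_a) / n
    mean_b = sum(sal_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(sal_a, sal_b))
    var_a = sum((a - mean_a) ** 2 for a in sal_a)
    var_b = sum((b - mean_b) ** 2 for b in sal_b)
    if var_a == 0 or var_b == 0:
        return float("nan")  # correlation is undefined for a constant map
    return cov / math.sqrt(var_a * var_b)


# Hypothetical saliency values in [0, 1] for five points of one frame.
task_free = [0.0, 0.2, 0.9, 1.0, 0.1]
task_dependent = [0.1, 0.1, 0.7, 1.0, 0.0]
print(pearson_cc(task_free, task_dependent))
```

Because the computation only needs paired values, no 2D grid is required; the point correspondence between the two maps is the only structural assumption.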
3 DATASET CONSTRUCTION
To investigate how visual attention is deployed on dynamic point clouds
and compare it with task-dependent saliency maps [69], we conducted
a task-free eye-tracking experiment in a VR environment. During the
experiment, we recorded the position (x, y, z coordinates) and rotation
(three Euler angles around the x, y, and z axes) of the camera associated
with each participant's HMD, along with timestamped data. This
information was used to analyze participants' navigation movements
within the physical space (i.e., the floor). Gaze-related data (gaze origin
in x, y, z, and normalized gaze direction vector, the position of the point
cloud frames) was collected following the same method as in [69] to
generate saliency maps.
3.1 Materials
We select all 12 point cloud sequences from the UVG-VPC dynamic point
cloud dataset [19], 5 reference sequences from the QAVA-DPC dataset
[69], and 2 sequences from the Owlii dataset [64] for the task-free
eye-tracking experiment. We selected all the reference contents from
the QAVA-DPC dataset as it contains task-dependent visual attention
maps, thus aiding us in our purpose of comparing task-dependent and
task-free viewing, and we complemented it with additional high-quality
contents to provide additional saliency data. We compute the Spatial
Information (SI) and Temporal Information (TI) of each content [1]
by projecting the point cloud onto 4 views of its bounding box, namely
the left, right, front, and back views, applying SI and TI to each view
separately, and then taking the maximum value among the 4 views over
all the frames as the final SI/TI of one sequence. The distribution of all
dynamic point clouds can be seen in Figure 2. The dispersed state in SI
(horizontal axis)/TI (vertical axis) shows the diversity of our contents in
the spatial/temporal domain. All the stimuli are of reference quality
(without any compression distortion).
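The SI/TI aggregation described above can be sketched as follows. This is a minimal illustration, assuming the common definitions of SI (spatial standard deviation of the Sobel-filtered luminance frame) and TI (spatial standard deviation of the difference between consecutive frames), each maximised over time; the projection of the point cloud onto the bounding-box faces is assumed to be done elsewhere, and all function names and the toy frames are hypothetical.

```python
import math


def _std(values):
    """Population standard deviation of a flat list of numbers."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))


def sobel_magnitude(img):
    """Sobel gradient magnitude at the interior pixels of a 2D luminance list."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = (img[y - 1][x + 1] + 2 * img[y][x + 1] + img[y + 1][x + 1]
                  - img[y - 1][x - 1] - 2 * img[y][x - 1] - img[y + 1][x - 1])
            gy = (img[y + 1][x - 1] + 2 * img[y + 1][x] + img[y + 1][x + 1]
                  - img[y - 1][x - 1] - 2 * img[y - 1][x] - img[y - 1][x + 1])
            out.append(math.hypot(gx, gy))
    return out


def spatial_information(frames):
    """Maximum over frames of the std of the Sobel-filtered frame."""
    return max(_std(sobel_magnitude(f)) for f in frames)


def temporal_information(frames):
    """Maximum over frame pairs of the std of the pixel-wise difference."""
    values = []
    for prev, cur in zip(frames, frames[1:]):
        diff = [c - p for row_p, row_c in zip(prev, cur)
                for p, c in zip(row_p, row_c)]
        values.append(_std(diff))
    return max(values) if values else 0.0


def sequence_si_ti(views):
    """views maps a view name (e.g. 'front') to its list of projected frames.

    Returns the maximum SI and TI among all views, as described in the text.
    """
    si = max(spatial_information(frames) for frames in views.values())
    ti = max(temporal_information(frames) for frames in views.values())
    return si, ti


# Toy 4x4 frames standing in for projected luminance images.
flat = [[0.0] * 4 for _ in range(4)]
spike = [row[:] for row in flat]
spike[1][1] = 4.0
print(sequence_si_ti({"front": [flat, spike], "back": [flat, flat]}))
```

The pure-Python loops are only meant to make the definition explicit; for full-resolution projections an array library would be the practical choice.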
3.2 Apparatus
To ensure that the high-level task is the only variable, we used the
same apparatus as [69], to enable a fair comparison with the other
task-dependent experiment. Our experiment software is developed in Unity
(version 2021.3.10f1). The CWI point cloud Unity package (version
0.10.0) is used to import and play back the dynamic point clouds [46].
For the UVG-VPC dataset, each sequence contains 250 frames, while
other sequences contain 300 frames. The frame rate is 30 frames per
second, with each video being displayed 3 times. We use HTC Vive Pro
Eye devices with eye-tracking capabilities and Vive hand controllers
for participant interaction. The eye-tracking applications are developed
using the native HTC Vive SRanipal SDK.
We ensured a watertight appearance of all the stimuli by setting
the point size to the average distance between a point and its 10 nearest
neighbors, computed over all points in the point cloud [57]. The stimuli
are rescaled to a similar size, around 1.8 m in height, to mimic realistic
tele-immersive scenarios. The VR scene is illuminated by a virtual lamp on
the ceiling centered above the models. The lamp is set as an area light with
an intensity value of 2 in Unity to simulate ordinary lighting in a room.
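The point-size rule above can be sketched as a brute-force computation. This is a minimal illustration under the assumption that the point size equals the mean, over all points, of the average Euclidean distance to the k nearest neighbors; the function name `mean_knn_distance` is hypothetical, and a spatial index such as a k-d tree would be needed for clouds of realistic size.

```python
import math


def mean_knn_distance(points, k=10):
    """Mean over all points of the average distance to the k nearest neighbors.

    Brute force, O(n^2 log n); intended only to make the rule explicit.
    """
    if len(points) <= k:
        raise ValueError("need more than k points")
    total = 0.0
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        total += sum(dists[:k]) / k
    return total / len(points)


# Three collinear points one unit apart, using k=2 for the toy example.
cloud = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
print(mean_knn_distance(cloud, k=2))
```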
3.3 Procedure
In this study, we use a within-subject design. To avoid the effects of
contextual or memory comparisons, we randomly generated a playlist for
each subject. Before the experiment, the visual acuity and color vision
of every subject were tested using Snellen [18] and Ishihara [15] charts.
Participants were briefed and signed a consent form prior to taking part
in the study. At the beginning of the session, the inter-pupillary distance
was measured and the headset was adjusted by the subject accordingly.
Then, a training session was conducted to help familiarize the subjects
with the system, including the controllers and the naming of each button,
so that they could interact more easily. Two training sequences, namely loot
and redandblack, were used, which were not included in the dataset. The
subjects always started at the same location, which is 1.5 meters away
from the center of the virtual room, but could move freely from there
onward and end anywhere they preferred. A stimulus was located in
the center of the virtual room, and each stimulus was randomly rotated
between [0°, 360°] to avoid bias. During the experiment, the subjects
were instructed to view each model freely in the VR environment during
the playback of each sequence. The subjects were also required to
stand still while doing the calibration and error profiling.
For each subject, the test was split into two rounds, lasting for
around 17 minutes each, with a mandatory 5-minute break in between.
Before and after each round, participants were requested to fill in
a Simulator Sickness Questionnaire (SSQ) on a 1-4 discrete scale
(1=none to 4=severe) [26]. For every model and subject, a round was
split into three consecutive steps:
1. Calibration was done at the beginning of the experiment, and
   only when calibration was successful could users enter the
   dynamic point cloud playback stage.
2. Inspection of models is the step where the participants observe
   the dynamic point cloud naturally, while their movement
   trajectory and gaze-related information are recorded.
3. Error profiling is issued as the last step in order to estimate the
   accuracy of the gaze measurements, which may be affected by
   calibration inaccuracies or HMD displacements.
After participants finished the two rounds, they were requested to fill
out the Immersive Presence Questionnaire (IPQ)¹. Last, the researchers
conducted a semi-structured interview. The interview was conducted
individually in a non-VR setting, and the entire conversation was recorded
for analysis purposes.
3.4 Participants
A total of 24 participants took part in the subjective tests of this study,
with a diverse composition that includes 1 non-binary individual, 12
males, and 11 females. The participants' ages ranged from 23 to 35,
with an average age of 28.33 and a standard deviation of 3.10. Each
participant observed all the dynamic point cloud stimuli. In terms of
occupation, the majority (66.67%) of the participants were students,
ranging from master to PhD levels. The remaining 33.33% were researchers
(scientist and lecturer), one landscape designer, and one
accountant. Regarding familiarity with VR devices, 5 participants had
never experienced VR before the experiment, 13 participants had
intermediate experience (using VR 1 to 3 times), and 6 of them were
considered experts, having backgrounds as VR designers or researchers.
Additionally, 17 out of 24 participants wore glasses during the experiment.
No ethical approval was sought for this study, due to the absence
of an established ethical review board at the institution where the research
was conducted. The experimental protocol, including participant
consent and data collection, was reviewed through an internal board to
be compliant with current GDPR legislation. Participants consented
to the collection and usage of their data at the start of the experiment,
after being informed about the study.
4 EXPERIMENT RESULTS
4.1 Analysis of movements on the physical space
The analysis of the movements on the physical space is based on the
recorded data associated with the position and rotation of the HMD
collected during experiments. For the following analysis, the data was
resampled at 30 Hz. A general overview of the navigation behaviour
of participants on the floor (plane XY) is given in Figure 3 for three
selected contents, rafa2, HelloGoodbye and CasualSpin. We chose
these volumetric point clouds based on their SI and TI values to investigate
how the users' movements change in relation to content
characteristics. As shown in Figure 2, rafa2 has low TI and SI, CasualSpin
has a high value of SI, while HelloGoodbye is characterised by
high TI. The volumetric content is initially placed approximately at
the center of the floor plane and, since the sequences are dynamic, we
also represent their position over time with a trajectory of pink dots.
It can be noted that the first sequence is the least dynamic, since rafa2
stays in its initial position (Figure 3(a)). This leads to a more static
behaviour also from the participants, who mainly stay in one location
without exploring the area around the content: there are indeed some
strong red spots, which represent the positions where users spent most
of their time, and the shadow of the user position is quite compact
around the content. The point cloud CasualSpin is instead spinning
around itself. In this case, participants are more spread around the
content to view it from different perspectives, as shown in Figure 3(b),
but they are still quite compact. On the contrary, Figure 3(c) shows a
more dynamic exploratory behaviour from the users while viewing
HelloGoodbye. Note that this sequence is also the most dynamic
one, since the figure walks back and forth. Thus, users tend to explore more
while watching dynamic sequences, as already observed in [49].
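The 30 Hz resampling mentioned above can be sketched with simple linear interpolation of the timestamped HMD positions. This is only an illustration under stated assumptions: the source does not specify the interpolation method, timestamps are assumed to be strictly increasing and expressed in seconds, and the function name `resample_linear` is hypothetical. Linear interpolation is reasonable for positions; Euler angles would additionally need wrap-around handling, which is not shown.

```python
def resample_linear(timestamps, values, rate_hz=30.0):
    """Linearly interpolate timestamped tuples onto a uniform time grid.

    timestamps: strictly increasing times in seconds.
    values: one tuple per timestamp, e.g. (x, y, z) HMD positions.
    """
    if len(timestamps) != len(values) or len(timestamps) < 2:
        raise ValueError("need at least two samples with matching lengths")
    step = 1.0 / rate_hz
    t0, t_end = timestamps[0], timestamps[-1]
    n_out = int((t_end - t0) / step + 1e-9) + 1  # grid points within the recording
    out_t, out_v = [], []
    i = 0
    for k in range(n_out):
        t = t0 + k * step
        # Advance to the input segment [timestamps[i], timestamps[i + 1]] containing t.
        while i < len(timestamps) - 2 and timestamps[i + 1] < t:
            i += 1
        ta, tb = timestamps[i], timestamps[i + 1]
        w = (t - ta) / (tb - ta)
        out_t.append(t)
        out_v.append(tuple(a + w * (b - a) for a, b in zip(values[i], values[i + 1])))
    return out_t, out_v


# Irregularly sampled toy trajectory (positions in meters).
times = [0.0, 0.021, 0.055, 0.1]
positions = [(0.0, 0.0, 1.7), (0.01, 0.0, 1.7), (0.03, 0.01, 1.7), (0.05, 0.02, 1.7)]
grid_t, grid_pos = resample_linear(times, positions)
print(len(grid_t), grid_pos[-1])
```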
4.2 Analysis of gaze data
To gain a deeper understanding of visual exploration, we now analyze the
relationship between gaze and contents. Following the same gaze data
processing as in [69], we ignored the initial 400 ms of gaze data of each user
to avoid unintentional gaze caused by the unexpected appearance of
the dynamic point cloud. Then, only the valid gaze samples provided
by the native HTC Vive SRanipal SDK were selected. Each valid gaze
sample was processed as follows: 1) Verify the validity of the gaze
data by calculating the weighted average angular error of each gaze
sample with the help of GazeMetrics [9]; 2) Identify fixation points
in the gaze data with the dispersion threshold identification algorithm; 3) Map
gaze data to the dynamic point cloud frame with the truncated-cone-sector
algorithm [7]; 4) Fuse multiple users' gaze data on the dynamic point cloud
frames. After the four steps, we obtained the saliency map per frame.
Each point of a point cloud frame has a normalized heat value in the range [0,1],
0 meaning non-salient and 1 meaning the most salient. For
the processing details, please refer to [69]. Figure 4 represents the number
of fixations of each subject on each content. Specifically, each row
denotes the number of fixation points per content across the different
users. Blue colors indicate a low number of fixations while yellow ones
indicate high values. Vertically, we can notice consistent behavior per
participant across the different content. For example, User 14 always
has a low number of fixations, independent of the visualized volumetric
content, indicating a more erratic behavior. On the contrary, User 1
appears to have more consistent fixations across the content. Thus,
participants tend to preserve similar gaze behavior (highly erratic or quite
static) independently of the volumetric content. Similar outcomes were
also observed in [49]. Looking at Figure 4 per row (i.e., a single content
across different users), we can notice that contents with higher TI got
more attention: FlowerDance and model, which are characterized by
higher TI, present more fixations than rafa2. To further our analysis, in
Figure 5, we show the saliency map (randomly selected 150th frame)
of these three sequences. We can see that all three sequences show
fixations on semantically relevant areas, such as the face. However, in
FlowerDance, who is in the middle of a spinning motion, and model,
who is simply adjusting the dress, the fixation areas are smaller and
more dispersed across the content, as the users' attention is drawn by
the motion of the dresses or any patterns on them. We further analyse
and discuss gaze data in Section 5.1.
¹ https://www.igroup.org/pq/ipq/index.php
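Step 2 of the pipeline above, together with the removal of the first 400 ms, can be illustrated with a generic dispersion-threshold identification (I-DT) sketch. It is not the authors' implementation: it assumes gaze samples already expressed as 2D angular coordinates in degrees, and the dispersion threshold (1 degree) and minimum duration (100 ms) are illustrative values that do not come from the paper. The function names are hypothetical.

```python
def dispersion(window):
    """I-DT dispersion of (x, y) pairs: (max x - min x) + (max y - min y)."""
    xs = [p[0] for p in window]
    ys = [p[1] for p in window]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def idt_fixations(samples, max_dispersion=1.0, min_duration=0.1, skip_initial=0.4):
    """Detect fixations in time-sorted (t_seconds, x_deg, y_deg) samples.

    Samples recorded within skip_initial seconds of the first one are dropped.
    Returns a list of (t_start, t_end, centroid_x, centroid_y).
    """
    if not samples:
        return []
    t_first = samples[0][0]
    pts = [s for s in samples if s[0] - t_first >= skip_initial]
    fixations = []
    start, n = 0, len(pts)
    while start < n:
        # Grow the window until it spans at least min_duration.
        end = start
        while end < n and pts[end][0] - pts[start][0] < min_duration:
            end += 1
        if end >= n:
            break  # remaining samples are too short to form a fixation
        if dispersion([p[1:] for p in pts[start:end + 1]]) <= max_dispersion:
            # Extend the window while the dispersion stays under the threshold.
            while end + 1 < n and dispersion(
                    [p[1:] for p in pts[start:end + 2]]) <= max_dispersion:
                end += 1
            window = pts[start:end + 1]
            cx = sum(p[1] for p in window) / len(window)
            cy = sum(p[2] for p in window) / len(window)
            fixations.append((window[0][0], window[-1][0], cx, cy))
            start = end + 1
        else:
            start += 1
    return fixations


# Synthetic 100 Hz recording: 0.4 s to discard, then two stable gaze targets.
gaze = ([(i / 100, 50.0, 50.0) for i in range(40)]
        + [(i / 100, 0.0, 0.0) for i in range(40, 60)]
        + [(i / 100, 10.0, 10.0) for i in range(60, 80)])
for fixation in idt_fixations(gaze):
    print(fixation)
```

In the actual pipeline the detected fixations would then be mapped onto the point cloud and fused across users (steps 3 and 4), which this sketch does not cover.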
4.3 Analysis of SSQ and IPQ data
SSQ comprises 16 symptoms which are further grouped into three
different categories: Oculomotor, Nausea, and Disorientation; we also
computed the total score according to [26]. The simulator sickness scores
increased after the experiment. Specifically, the total scores rose from
6.37 to 10.33 before and after Session 1, and from 5.91 to 10.08 before
and after Session 2. However, the total score dropped back to 5.91 before
Session 2, which suggests that breaks help in reducing simulator sickness.
The current version of the IPQ has three subscales
(Spatial Presence, Involvement, Experienced Realism) and one additional
general item not belonging to a subscale. We calculate the mean
across the users for each factor. The participants experience high levels
of Spatial Presence (M_SP = 4.5) and Involvement (M_INV = 3.8),
but lower levels of Realism (M_REAL = 3.3). A possible reason
is that there is no interaction between the user and the content, as mentioned
in Section 4.4.4, and there is no eye contact. In addition, the virtual room is
empty to better capture visual attention, which normally leads to a
lower score for the question: “the virtual world seemed more realistic
than the real world.”
4.4 Qualitative results
22 valid interview audio recordings were transcribed into texts and
coded using Dovetail². Following Maguire's guideline on thematic
analysis [37], we initially reviewed and labeled the text, organized these
labels into themes, and subsequently convened to establish the connection
between content and visual attention during the subjective test.
Each participant is denoted as P1-P24, with the number of participants
concurring with each statement indicated in parentheses.
4.4.1 Factors that Capture Visual Attention Allocation
Temporal information. Participants (18) pointed out that movement
is the most attractive factor in our dynamic point cloud playback
scene (P21: “when you are watching a video, it's easy to follow the
direction of the movements”). 11 of them interpreted the information
conveyed by the content as interesting to attract their attention. However,
participants (16) also noted that high-motion sequences do not
necessarily attract more attention than low-motion sequences.
² https://dovetail.com/
Fig. 3: Spatial distribution over time of the main locations visited by users while viewing three different contents: (a) rafa2, (b) CasualSpin and (c)
HelloGoodbye. The centroid position of each volumetric content is represented by a sequence of pink points on the floor.
[Figure 4: heatmap. Horizontal axis: User Index (1-6, 8-21, 23, 24). Vertical axis: Stimuli (rafa2, dance, exercise, longdress, soldier, BlueBackpack, BlueSpin, BlueSquat, CasualSpin, CasualSquat, ElegantDance, ElegantWave, FlowerDance, FlowerWave, Gymnast, HelloGoodbye, ReadyForWinter, basketballPlayer, model). Color scale: 100 to 700.]
Fig. 4: The fixations per subject and content in the proposed TF-DPC dataset.
Each row denotes the fixations on a specific content and each column
denotes the fixations of each subject, respectively.
Fig. 5: The visual saliency map of the 150th frame of the dynamic point
cloud with the front view (panels: rafa2, FlowerDance, model).
Artifacts and Details. Artifacts (9) and details (9) are identified
as the co-second factors attracting people's attention. (P8: “what I
focused on also negative things are, on the edges of the point calls often
there was like rippling, sort of flickering, attracts a lot of attention,
distracts me, other than that, I think eyes like faces in general, people
like the expression.”)
Geometry and Texture. Geometry (2) and texture (7) are identified
as the second and third factors influencing the subjective rating
of point clouds under scrutiny. (P3: “I was observing precisely two
things, the edges of the body and how distorted they are and also some
distortions inside the costume.”)
In terms of visual attention allocation, temporal information proves
to be more crucial than either geometry or texture, with both geometry
and texture showing relatively low importance. The details of the
dynamic point cloud fall somewhere in between, while negative artifacts
in the point cloud attract significant attention, aligning with findings
from a previous study [60].
4.4.2 Factors Affecting Visual Attention
Participants (12) reported that the realism of the content and naturalness
of the action would change their attention. (P1: “I have to say there's
an effect, if I see the quality is good, I usually will look closer. I
will check the details. But if the quality is so poor that I can see
distortion everywhere, then I will consciously, I will realize this is not
real. So I will be less interested.”) Abrupt distortions of the sequence
will shift attention. (P5: “The point cloud's intended focal point might
end up being overlooked because the flaws draw my attention away
from it, instead I focus on the imperfections.”) It is worth noting that
all the point clouds under test were of reference quality; that is, any
impairment was derived from the acquisition itself, and was not due to
any additional processing such as compression. Thus, the acquisition
methods themselves can have a significant impact on visual attention.
This observation aligns with Zhang's conclusion [65] that distortions
always change the attended regions.
4.4.3 Factors Influencing User Interaction
Participants (14) attributed most of their movement to the need to observe the front face to have more understanding of the human figure. They noted that sequences showing the same human figure with only slight variations in movement and clothing, as in the UVG-VPC dataset, led to decreased movement and reduced interest. This repetition (5) and the monotonous actions (5) made the task feel dull and not engaging. Limited space (8) and the cable (1) resulted in less movement of the participants.
4.4.4 Designing the Construction of a Visual Attention Dataset
Content. Participants favored the "longdress" (7), "soldier" (6), and "Gymnast" (5) point cloud sequences among all the contents, describing them as both realistic and engaging. However, some participants (3) noted that there are only human figures. Additionally, they expressed a desire for more varied objects and increased interactivity, such as eye contact between themselves and the content, to enhance the immersive experience.
Display equipment for dynamic point cloud. Participants (16) stated that using an HMD in VR is a better alternative to a 2D screen, as it provides greater immersion and freedom. (P7: "I think it's more intuitive if you feel more real when you see it, by 1 to 1 ratio is like your size, it's like next to you while on the screen it's like really small, you can zoom in but then the screen is not as big or you only see maybe one part of it even though it's a big screen, it's not 3D.") However, the HMD is heavy (2) and uncomfortable for prolonged use (2), while 5 participants noted that its effectiveness depends on the specific application.
5 COMPARISON BETWEEN TASK-FREE AND TASK-DEPENDENT
To explore how visual tasks impact visual attention, we quantitatively analyze gaze statistics and saliency map similarity between task-free and task-dependent scenarios. Note that these analyses are limited to the five sequences shared between the proposed dataset and the one presented in [69]: rafa2 (low SI, TI), dance (medium SI, high TI), exercise (low SI, high TI), longdress (high SI, medium TI), and soldier (medium SI, TI).
5.1 Comparison Consistency of Gaze
To analyze the allocation of visual attention depending on the task, we propose three measurements. We choose the total fixation number instead of other statistics of the gaze [4] (the mean duration or scan-path magnitude) because the fixation is obtained through the dispersion threshold identification algorithm, so the duration of consecutive gaze samples is implicitly considered. Apart from gaze behavior, our focus is on where the gaze is allocated within 3D point cloud frames. We select the Volumes of Interest (VoI) [56], which can show how many volumes have been observed by humans, and the distribution of the VoI, which can tell us how their attention is dispersed across the point cloud. VoI is computed as the total number of points whose heat value is larger than zero; the spread of VoI is the average pairwise distance of the VoI within the point cloud. Figure 6, from left to right, shows the fixation, VoI, and the spread of VoI across participants in both the task-free and the task-dependent experiment. We can observe the following: 1) Fixations of all 5 sequences with varying SI and TI behave consistently. The fixation number under task-free conditions is lower than under task-dependent conditions, since people need to focus relatively more to evaluate the quality of the sequences. 2) Generally, more fixations mean a larger VoI and a sparser distribution of the VoI. However, this is not true for the dance and rafa2 sequences.
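The two point-based measures defined above can be illustrated with a minimal sketch. It assumes a frame is given as a list of 3D coordinates plus one heat value per point; the function names and the toy data are hypothetical and are not taken from the authors' implementation.

```python
import math
from itertools import combinations

def volume_of_interest(points, heat):
    """Return the salient points, i.e. those whose heat value is larger than zero."""
    return [p for p, h in zip(points, heat) if h > 0]

def voi_spread(salient_points):
    """Average pairwise Euclidean distance between salient points.

    Returns 0.0 when fewer than two salient points exist (an assumption;
    the text does not specify this corner case).
    """
    pairs = list(combinations(salient_points, 2))
    if not pairs:
        return 0.0
    return sum(math.dist(a, b) for a, b in pairs) / len(pairs)

# Toy frame (hypothetical data): four points, two of them salient.
points = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (1.0, 1.0, 1.0), (0.0, 0.0, 2.0)]
heat = [0.7, 0.2, 0.0, 0.0]
voi = volume_of_interest(points, heat)
print(len(voi), voi_spread(voi))
```

The pairwise loop is quadratic in the number of salient points, so for dense point clouds one would probably subsample or vectorize it; that choice is left open here.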
To analyze the difference between tasks with respect to these measures of visual attention, we ran a set of analysis of variance (ANOVA) tests. We grouped all fixations by task and aggregated measures by participant for each content per frame. One-way ANOVAs indicate the overall effect of the task on these measures. The p-value is below the significance threshold (0.05) for all contents and measures except for the spread of the distribution for rafa2 and the VoI for dance, which are 0.1641 and 0.8008, respectively. In conclusion:
• Across all 5 sequences, the number of fixations is significantly different between task-free and task-dependent scenarios. Task-dependent viewers, who were evaluating the quality of the content, consistently had more fixations compared to task-free viewers, who likely scanned the content more freely. This supports the idea that task-related goals require more focused attention, leading to a higher fixation count. Sequences with higher SI and TI, such as longdress and dance, tend to capture more attention, evidenced by the higher number of fixations. In contrast, lower SI and TI sequences like rafa2 generally had fewer fixations, as they may not have been as visually engaging.
• There is a significant difference for most contents, with task-dependent conditions leading to larger VoIs. This suggests that when participants are given specific tasks, they distribute their attention more widely across the point cloud (multiple specific areas), perhaps because the tasks prompt them to explore more regions for relevant information. In free viewing, by contrast, they explored generally, driven by personal curiosity or passive observation rather than the active search for specific details. dance stands out as the only content where both conditions cover the same volume. This could mean that the nature of the dance does not lead to a noticeable change in the areas participants attend to, regardless of whether they are given a task or not.
• There is a significant difference for most contents, with task-dependent conditions leading to a broader spread of attention. However, for rafa2, there is no significant difference between the two conditions since it lacks a main attention area, likely due to its low SI and TI and the absence of particularly engaging features to attract viewers' attention. As a result, people tend to look around more. A possible reason for the higher spread of VoI for dance, while its VoI remains the same, is its continuous movement over time, with the dynamic dance gestures evenly capturing attention across the point cloud.
Table 2: Property of Evaluation Metrics of Image Saliency Map (x marks a property that holds)
Property                AUC  NSS  IG   SIM  KL   PCC  EMD
Location-based          x    x    x
Distribution-based                     x    x    x    x
Similarity              x    x    x    x         x
Dissimilarity                               x         x
Sensitive to 0 values             x    x    x
With spatial distance                                 x
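As an illustration of the one-way ANOVA used for the comparisons above, the following sketch computes the F statistic for two groups of per-participant measurements. The sample values are hypothetical and the helper name is an assumption; the source does not state which statistics package was used. Turning F into a p-value additionally requires the F distribution, which a statistics library such as SciPy (`scipy.stats.f_oneway`) provides.

```python
def one_way_anova_f(*groups):
    """Return (F, df_between, df_within) for a one-way ANOVA over the groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Variation of the samples around their own group mean.
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical fixation counts for one content under the two conditions.
task_free = [1.0, 2.0, 3.0]
task_dependent = [2.0, 3.0, 4.0]
f_stat, df_b, df_w = one_way_anova_f(task_free, task_dependent)
print(f_stat, df_b, df_w)
```

With only two groups, as in the task-free versus task-dependent comparison, the one-way ANOVA is equivalent to a two-sample t-test with pooled variance.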
5.2 Comparison Consistency of Visual Saliency Map
We aim to compare the point cloud saliency maps in task-free and task-dependent scenarios. Commonly used metrics for such a comparison are listed in Table 2. The key properties include whether a metric is location-based or distribution-based, whether it measures similarity or dissimilarity, its sensitivity to 0 values, and its consideration of spatial distance. Since the saliency maps of the dynamic point clouds are generated with exactly the same method as in [69], which does not include an explicit fixation point on the point cloud, the location-based metrics are not applicable to our continuous point cloud saliency maps. Among the distribution-based metrics, SIM, as a similarity metric, penalizes misalignment and is sensitive to missing values and 0 values, while KL, as a dissimilarity metric, is also sensitive to 0 values. Thus, based on the recommendations for metric selection [13,48] and the characteristics of our dynamic point cloud saliency maps, in which the majority of the points are non-salient (i.e., heat values equal to 0), we opt not to use them. EMD, as a dissimilarity metric, is the only metric that considers spatial distance. Herein we choose PCC to measure the similarity and adapt EMD, which is used to measure 2D saliency maps, to measure the dissimilarity.
The PCC is a statistical method to measure how correlated or dependent two variables are. In our scenario, given the visual saliency maps obtained from a task-free experiment $F$ and a task-dependent experiment $D$, PCC can be defined as follows [31]:
$$\mathrm{PCC}(F, D) = \frac{\mathrm{cov}(F, D)}{\sigma_F \sigma_D}, \tag{1}$$
where $\mathrm{cov}(\cdot)$ is the covariance and $\sigma$ is the standard deviation. PCC ranges from -1 to 1, with higher absolute values indicating stronger correlation between visual saliency maps. However, PCC is sensitive to outliers and only compares the magnitudes of corresponding points. This makes it unable to account for shifts in point locations or partial matches in attended areas, which are common in eye-tracking experiments due to device limitations or participant preferences. This issue is especially noticeable in large point clouds. To address this, we propose to adapt EMD for dissimilarity measurement [50], as it better captures the distribution of attention by incorporating spatial information. EMD helps to alleviate the issues of point shifts and partial matches in large volumetric content cases. Specifically, we generate the "signature" (a feature that can represent the saliency map) by calculating a histogram
of the heat value at each point in 3D space.
Fig. 6: Aggregation of fixations, VoI, and the spread of the distribution across participants for task-free and task-dependent experimental scenarios of the 5 shared dynamic point clouds from both QAVQ-DPC and proposed TF-DPC datasets, separately.
(a) rafa2 (b) dance (c) exercise (d) longdress (e) soldier
Fig. 7: Similarity of point cloud saliency maps between task-free and task-dependent scenarios through EMD and PCC of the shared 5 sequences per frame, separately.
We denote a discrete, finite distribution $p$ from the saliency map obtained in the task-free experiment as
$$p = \{(p_1, w_1), \ldots, (p_m, w_m)\} \equiv (P, w) \in \mathbf{D}^{K,m}, \tag{2}$$
where $P = [p_1, \ldots, p_m] \in \mathbb{R}^{K \times m}$ represents the signature with $m$ points (or clusters), and $w_i \ge 0$ represents the weight or fraction associated with the $i$-th point (or cluster) for all $i = 1, \ldots, m$. Here $K$ is the dimension of the ambient space (Euclidean space for a 3D point cloud) of the points $p_i \in \mathbb{R}^K$, and $m$ is the number of points (or clusters). The total weight of the distribution $p$ is $w_\Sigma = \sum_{i=1}^{m} w_i$. Given two distributions in the task-free and task-dependent scenarios, $p = (P, w) \in \mathbf{D}^{K,m}$ and $q = (Q, u) \in \mathbf{D}^{K,n}$, we use the following EMD [50]:
$$\mathrm{EMD}(p, q) = \frac{\min_{F = (f_{ij}) \in \mathcal{F}(p, q)} \mathrm{WORK}(F, p, q)}{\min(w_\Sigma, u_\Sigma)}. \tag{3}$$
The EMD distance $\mathrm{EMD}(p, q)$ between $p$ and $q$ is the minimum amount of work to match distributions $p$ and $q$, normalized by the weight of the lighter distribution. Thus, to obtain the EMD value, we need to find the optimal flow by solving the transportation problem. The work done by a feasible flow $F \in \mathcal{F}(p, q)$ in matching $p$ and $q$ is given by
$$\mathrm{WORK}(F, p, q) = \sum_{i=1}^{m} \sum_{j=1}^{n} f_{ij} d_{ij}, \tag{4}$$
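where $d_{ij} = d(p_i, q_j)$ is the "ground distance" between $p_i$ and $q_j$. Because we consider the degree of salience and the spatial information of the point cloud jointly, the ground distance is now defined as
$$d_{ij} = \lambda |h_i - h_j| + (1 - \lambda) \left[ (x_i - x_j)^2 + (y_i - y_j)^2 + (z_i - z_j)^2 \right]^{\frac{1}{2}}, \tag{5}$$
where $h_i$ is the middle value of the $i$-th bin of the histogram in $p$, and $(x_i, y_i, z_i)$ is the location of the centroid point located in the $i$-th bin of $p$; $h_j$ and $(x_j, y_j, z_j)$ are the corresponding quantities for the $j$-th bin of $q$. $\lambda$ is a weight used to balance the importance between spatial information and the magnitude of the heat value.
(a) Average PCC (b) Average EMD
Fig. 8: Similarity of point cloud visual saliency maps between task-free and task-dependent of the shared 5 sequences averaged over 300 frames, separately.
The flow $F$ is a feasible flow between $p$ and $q$ if
$$f_{ij} \ge 0, \quad i = 1, \ldots, m, \; j = 1, \ldots, n, \tag{4.1}$$
$$\sum_{j=1}^{n} f_{ij} \le w_i, \quad i = 1, \ldots, m, \tag{4.2}$$
$$\sum_{i=1}^{m} f_{ij} \le u_j, \quad j = 1, \ldots, n, \text{ and} \tag{4.3}$$
$$\sum_{i=1}^{m} \sum_{j=1}^{n} f_{ij} = \min(w_\Sigma, u_\Sigma). \tag{4.4}$$
The detailed explanation of the constraints can be found in [50].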
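To make the roles of Eqs. (3) to (5) and constraints (4.1) to (4.4) concrete, the sketch below evaluates the ground distance, checks whether a given flow is feasible, and computes the normalized work of that flow. It does not solve the transportation problem: the EMD is the minimum of this normalized work over all feasible flows, which would require a linear-programming or network-flow solver. The bin layout `(h, (x, y, z))`, the function names, and the toy signatures are assumptions made for illustration.

```python
import math

def ground_distance(bin_p, bin_q, lam=0.5):
    """Eq. (5). Each bin is (h, (x, y, z)): middle heat value and centroid."""
    (h_i, c_i), (h_j, c_j) = bin_p, bin_q
    return lam * abs(h_i - h_j) + (1.0 - lam) * math.dist(c_i, c_j)

def is_feasible(flow, w, u, tol=1e-9):
    """Check constraints (4.1)-(4.4) for flow[i][j] with weights w and u."""
    m, n = len(w), len(u)
    nonneg = all(flow[i][j] >= -tol for i in range(m) for j in range(n))
    rows = all(sum(flow[i]) <= w[i] + tol for i in range(m))
    cols = all(sum(flow[i][j] for i in range(m)) <= u[j] + tol for j in range(n))
    total = sum(sum(row) for row in flow)
    return nonneg and rows and cols and abs(total - min(sum(w), sum(u))) <= tol

def normalized_work(flow, p_bins, q_bins, w, u, lam=0.5):
    """WORK(F, p, q) of Eq. (4) divided by min(w_sum, u_sum) as in Eq. (3)."""
    work = sum(flow[i][j] * ground_distance(p_bins[i], q_bins[j], lam)
               for i in range(len(w)) for j in range(len(u)))
    return work / min(sum(w), sum(u))

# Hypothetical two-bin signatures with equal total weight.
p_bins = [(0.1, (0.0, 0.0, 0.0)), (0.9, (1.0, 0.0, 0.0))]
q_bins = [(0.1, (0.0, 0.0, 0.0)), (0.9, (1.0, 0.0, 0.0))]
w = [0.5, 0.5]
u = [0.5, 0.5]
identity_flow = [[0.5, 0.0], [0.0, 0.5]]  # each bin matched to its twin
crossed_flow = [[0.0, 0.5], [0.5, 0.0]]   # feasible, but more costly
print(normalized_work(identity_flow, p_bins, q_bins, w, u))
print(normalized_work(crossed_flow, p_bins, q_bins, w, u))
```

Any feasible flow gives an upper bound on the EMD; here the identity flow already reaches zero, which is the minimum for two identical signatures.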
The coordinates of the distribution points are not used directly in the EMD formulation; only the ground distances $d_{ij}$ between points are needed. A larger EMD indicates a larger difference between two distributions, while an EMD of zero indicates that two distributions are the same. In this paper, we remove the points that are non-salient in both experiments before we compute the PCC and EMD to obtain an accurate measurement. The bin number of the histogram is set to 30, and $\lambda$ is set to 0.5.
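The preprocessing step and Eq. (1) can be sketched as follows. The sketch assumes that both saliency maps are stored as heat values over the same points in the same order, which the text implies but does not state explicitly; the function names and example values are hypothetical.

```python
import math

def drop_jointly_non_salient(f, d):
    """Keep only points that are salient (heat > 0) in at least one of the maps."""
    kept = [(a, b) for a, b in zip(f, d) if a > 0 or b > 0]
    return [a for a, _ in kept], [b for _, b in kept]

def pcc(f, d):
    """Pearson correlation coefficient: cov(F, D) / (sigma_F * sigma_D)."""
    n = len(f)
    mean_f, mean_d = sum(f) / n, sum(d) / n
    cov = sum((a - mean_f) * (b - mean_d) for a, b in zip(f, d)) / n
    std_f = math.sqrt(sum((a - mean_f) ** 2 for a in f) / n)
    std_d = math.sqrt(sum((b - mean_d) ** 2 for b in d) / n)
    # Undefined (division by zero) if either map is constant after filtering.
    return cov / (std_f * std_d)

# Hypothetical heat values on five shared points.
task_free = [0.0, 0.0, 1.0, 2.0, 3.0]
task_dependent = [0.0, 0.0, 2.0, 4.0, 6.0]
f, d = drop_jointly_non_salient(task_free, task_dependent)
print(len(f), pcc(f, d))
```

Removing the jointly non-salient points matters because the many shared zeros would otherwise dominate the means and inflate the correlation.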
Fig. 9: Saliency map visualization of soldier in frame 58 and frame 61, identified as the most similar maps using the adapted EMD and PCC metrics. The left side of the dotted line shows the front view of the soldier, while the right side shows the back view. TD refers to the saliency map collected under task-dependent conditions, and TF refers to task-free. Panels: frame 58 (EMD max) and frame 61 (PCC max), each shown for TD and TF.
To fairly compare similarity and dissimilarity metrics, we normalize the EMD values to the [0,1] range and convert dissimilarity into similarity. This is achieved by dividing the computed EMD by the maximum possible EMD for a given histogram, assuming all the mass (i.e., salient points) starting in the leftmost bin needs to be moved to the rightmost bin. The similarity score of EMD is then calculated as 1 minus the normalized EMD. Figure 7 compares PCC and EMD values of the shared 5 contents per frame, separately. We observe that PCC exhibits greater variance for exercise, longdress, and soldier, as evidenced by fluctuations in the PCC values across frames. This variability suggests that PCC is sensitive to outliers in the saliency map, leading to greater variation in visual similarity over time for these contents. In contrast, EMD demonstrates more stable and consistent behavior, with values that remain within a narrower range, indicating reduced fluctuations. This stability arises from EMD's consideration of spatial information and its partial match property. Figures 8a and 8b show the average similarity across frames in task-free and task-dependent scenarios. Notably, dance is identified as the most similar sequence by PCC, while soldier is the most similar according to EMD. PCC's emphasis on matching magnitudes at the same points leads to high similarity scores for dance, where obvious salient regions identified by humans remain consistent over time, independently of the task.
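The conversion from EMD to a similarity score described at the start of this paragraph can be sketched as below. One reading of "maximum possible EMD" is assumed here: since all mass moves from the leftmost to the rightmost bin, the normalized work equals the ground distance of Eq. (5) between those two bins. That interpretation, the bin layout `(h, (x, y, z))`, the clamping, and the names are assumptions, not the authors' stated procedure.

```python
import math

def max_emd(left_bin, right_bin, lam=0.5):
    """Ground distance (Eq. 5) between the leftmost and rightmost histogram bins."""
    (h_l, c_l), (h_r, c_r) = left_bin, right_bin
    return lam * abs(h_l - h_r) + (1.0 - lam) * math.dist(c_l, c_r)

def emd_similarity(emd_value, emd_max):
    """Map a dissimilarity in [0, emd_max] to a similarity in [0, 1]."""
    if emd_max <= 0:
        raise ValueError("emd_max must be positive")
    # Clamp to [0, 1] to guard against numerical overshoot (an added safeguard).
    normalized = min(max(emd_value / emd_max, 0.0), 1.0)
    return 1.0 - normalized

# Hypothetical extreme bins: heat 0.0 at the origin, heat 1.0 two units away.
left = (0.0, (0.0, 0.0, 0.0))
right = (1.0, (0.0, 0.0, 2.0))
upper = max_emd(left, right)
print(upper, emd_similarity(0.75, upper))
```

After this step a value of 1 means identical maps, which puts EMD on the same orientation as PCC for the per-frame comparison.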
Combining Figure 7 and Figure 8, it becomes clear that both EMD ([0, 0.35]) and PCC ([-0.25, 0.4]) exhibit low similarity values, suggesting substantial differences between task-free and task-dependent scenarios. This highlights that task-dependent scenarios in dynamic point clouds significantly alter human visual attention. EMD identifies overlapping regions of attention in both scenarios, providing a more spatially-aware similarity measure, while PCC captures sharp variations for specific content. Figure 9 shows saliency maps of soldier at the frames with maximum similarity under the EMD and PCC metrics. Visually, the saliency in the 58th frame appears more similar than in the 61st frame, with the inset of the head showing greater overlap, particularly from the back view. This comparison further demonstrates that while both PCC and EMD have their strengths, EMD's consideration of spatial information makes it more suitable for evaluating saliency in point cloud data.
5.3 Summary
Quality assessment, as a high-level perceptual task, significantly influences how visual attention is deployed when evaluating dynamic point clouds in VR. As discussed in Section 5.1, one key observation is that participants exhibit fewer fixations in task-free conditions compared to task-dependent ones. This is evident in Figure 9, where task-dependent viewers focus more on specific details, such as the spotlight on the soldier's hat. In contrast, task-free viewers typically form a general impression, primarily attending to broader features like facial expressions, rather than thoroughly exploring "less critical" details once they have grasped the overall scene.
In task-dependent conditions, the demand for precise quality evaluation prompts participants to observe the sequence more carefully. Their goal is to gather visual cues to assess the content's quality, which explains why saliency maps under the quality assessment task tend to have a larger VoI. Additionally, due to content repetition (same content with different quality levels), participants in task-dependent conditions are less inclined to explore the back of the point cloud, preferring the primary areas in the front view that they deem relevant to the quality assessment task. In task-free conditions, participants generally scan the content broadly, focusing on prominent movements or artifacts. Since they are not bound by a specific objective, they tend to observe both the front and back views of the point clouds without particular focus.
The spread of the VoI, however, varies between different conditions for different reasons. In the task-dependent condition, participants' attention is drawn to specific features from head to toe, like the spotlight on the hat, the watch on the hand, and the shoes, as shown in Figure 9 (frame 58 under the task-dependent condition). Participants' attention is more targeted, with individual differences in strategies for assessing quality. This variability contributes to the spread of the VoI but with greater focus on elements that are crucial for quality judgment. In contrast, the task-free condition reflects a more passive viewing approach. Participants form a holistic view of the scene, only directing their gaze toward areas of movement or obvious artifacts. Without the demand to assess quality, their focus is less concentrated on specific details, and their viewing patterns reflect a broader exploration of the scene.
Movement and semantic information in the dynamic point clouds, such as facial expressions or body movements, consistently attract visual attention in both scenarios. For example, in Figure 1, participants frequently fixate on faces across multiple frames. Interestingly, visual attention appears to be more consistent in task-dependent conditions, especially when it comes to fine details, regardless of whether the scene has high or low TI. Participants are more likely to scrutinize these details to detect subtle distortions, which are critical for assigning quality scores. This difference in attention deployment highlights how task-driven objectives shape visual behavior, with task-dependent viewers engaging in top-down mechanisms and task-free viewers adopting a more relaxed, impressionistic approach.
6 DISCUSSION
6.1 Visual attention collection limitations
In this study, we collect a task-free saliency dataset for dynamic point clouds and investigate the task impact on human attention allocation. We observed that a central bias persists to some extent when viewing human faces, regardless of whether the conditions are task-free or task-dependent. However, our study is limited by the fact that TF-DPC focuses solely on human figures, excluding other immersive content types like landscapes or interactive objects. This limitation stems directly from the lack of high-quality, realistic datasets of dynamic point cloud objects, as to date, only synthetic datasets including dynamic objects are present in the literature [58,63]. Thus, the outcomes of this study are valid only for the dynamic human category, and future work should explore broader content types. We chose a wired HMD to maintain consistency with the conditions of the previous study; however, this choice restricted physical movement due to the HMD's cable, and the device's weight and discomfort may have increased cognitive load, potentially resulting in fewer and less stable fixation points. To explore this, future studies should consider assessing pupil dilation and blink rate, reliable indicators of cognitive load, alongside gaze amplitude and fixation patterns. These constraints may limit the ability to collect natural-viewing saliency maps and could introduce systematic biases. Using wireless HMDs, such as the HTC Vive Focus Vision, could improve ecological validity. Additionally, dynamic point clouds in high-quality XR scenarios are inherently dense, but the visual saliency regions occupy only a small portion of the content. Increasing the participant sample size in future studies would enhance statistical power and improve the generalizability of the findings.
Fig. 10: The average ratio of the VoI of the shared 4 dynamic point cloud sequences in AR and VR.
6.2 Visual saliency collection under various perceptual tasks
The findings of our study on the impact of high-level tasks on human visual attention deployment differ from previous research on images [17] but align with conclusions drawn from static 3D models [53]. Specifically, similarity metrics indicate lower saliency similarity for static 3D models (PCC: 0.35) [53] compared to images (PCC: 0.84) [17]. While task-dependent, top-down mechanism effects on overt visual attention have been well-studied for 2D media [29], how these findings translate to dynamic point clouds remains largely unexplored. Additionally, there is evidence that traditional attention paradigms may not fully apply to newer media formats, such as panoramic videos [52]. Our findings have shown that quality assessment has a significant impact on human visual attention deployment, with saliency maps under both task-free and quality assessment tasks focusing on semantic areas and movement. However, their focus differs, as mentioned in Section 5.3. A critical question that emerges from our study is whether saliency collected under task-free conditions or task-dependent conditions provides greater value for specific applications, such as point cloud quality assessment. Exploring the temporal dynamics of saliency in dynamic point clouds, that is, how it evolves over time under varying task demands, is critical for optimizing visual representations. Future research should focus on exploring the temporal dynamics of saliency across various perceptual tasks, clarifying the benefits of different saliency detection methods, and incorporating these insights into prediction models tailored to dynamic point clouds for specific applications.
6.3 Visual saliency collection in AR
3D visual saliency has been measured using various devices, including eye-tracking glasses [41], AR HMD [40], and VR HMD [69]. Understanding the differences between these devices is essential for accurately predicting saliency while accounting for factors such as spatial bias [28], center bias [45], and systematic error [3]. Nguyen [40] released saliency maps of four dynamic point clouds (namely BlueSpin, CasualSquat, FlowerDance, and ReadyForWinter) in AR, overlapping with our proposed TF-DPC dataset. Thus, using these four sequences, we were able to conduct an initial analysis of saliency maps across different devices. Notably, not every frame in the AR sequences contains fixation data, so we retained only the frames with salient areas present in both AR and VR. We computed the average VoI ratio (salient area relative to the entire point cloud across the sequence), as shown in Figure 10. Our findings indicate that the VoI in the AR condition is significantly smaller than in the VR laboratory setting, with participants primarily focusing on limited regions of the point cloud. This reduction may be attributed to the HoloLens' limited field of view (about 52°) compared to the VR headset (about 110°). Furthermore, since AR blends virtual context with the real environment, users must frequently switch contexts and refocus their gaze [8], which can further reduce fixations on dynamic point clouds. Additionally, participants cannot view the entire life-sized point cloud unless they step back. Thus, the experimental protocol of saliency collection in AR requires careful consideration.
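The frame selection and VoI-ratio averaging described above can be sketched as follows. The sketch assumes that each sequence is a list of frames, each frame a list of per-point heat values, and that AR and VR frames are aligned by index; these data structures and names are hypothetical.

```python
def voi_ratio(heat):
    """Fraction of points in one frame whose heat value is larger than zero."""
    return sum(1 for h in heat if h > 0) / len(heat)

def average_voi_ratios(ar_frames, vr_frames):
    """Average VoI ratio per device over frames that are salient in both.

    Returns (ar_average, vr_average), or None if no frame qualifies.
    """
    shared = [(a, v) for a, v in zip(ar_frames, vr_frames)
              if any(h > 0 for h in a) and any(h > 0 for h in v)]
    if not shared:
        return None
    ar_avg = sum(voi_ratio(a) for a, _ in shared) / len(shared)
    vr_avg = sum(voi_ratio(v) for _, v in shared) / len(shared)
    return ar_avg, vr_avg

# Hypothetical three-frame sequences; the first AR frame has no fixation data.
ar_frames = [[0, 0, 0, 0], [1, 0, 0, 0], [0, 2, 0, 0]]
vr_frames = [[1, 1, 0, 0], [1, 1, 0, 0], [1, 1, 1, 1]]
print(average_voi_ratios(ar_frames, vr_frames))
```

Because the ratio is computed per frame, the AR and VR versions of a frame do not need to contain the same number of points.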
6.4 Evaluation metrics for the similarity of point cloud saliency maps
Several metrics exist for quantitatively measuring the similarity of 2D saliency maps, some of which can be adapted to static point cloud saliency maps with minimal adjustments. However, location-based metrics like NSS, which depend on precise fixation points, may not be directly applicable to point clouds. Human gaze fixation corresponds to a specific pixel in 2D images, but in 3D point clouds, the gaze ray may not intersect with any point in space, requiring approximation methods that introduce inaccuracies. Thus, metrics relying on fixation locations may not be suitable for point clouds unless these approximations are properly addressed. Distribution-based metrics, which compare the overall spread of attention, present a different challenge: how should we balance coverage similarity (whether the same areas are salient, regardless of magnitude) against magnitude similarity (whether the saliency levels are comparable)? Some scenes may show full spatial matches but differ in magnitude, or vice versa, making it unclear which aspect should be prioritized. This decision depends on the specific application.
Riche et al. [48] argue that no single metric is sufficient for evaluating saliency map similarity. The 3D nature of point clouds and the relatively small salient regions further complicate this task. For dynamic point clouds, the added dimension of time introduces variability due to motion, requiring spatial-temporal saliency distributions to be more effective in measuring similarity. A further question arises for human dynamic point clouds in particular: for example, in the 151st frame of the dance sequence in Figure 1, should the saliency of symmetric semantic areas (the left and right feet) be treated equivalently when we measure the similarity? Incorporating metrics that consider temporal consistency and semantic relationships could help capture nuances in saliency similarity, particularly in dynamic scenarios where motion and semantic equivalency, such as symmetrical regions, play a significant role.
7 CONCLUSION
In this work, we constructed a task-free visual saliency dataset in virtual reality with 6-DoF, comprising 19 dynamic point clouds. We analyze gaze and movement trajectories to explore how visual attention is allocated in dynamic point clouds. To compare the generated saliency maps in task-free and task-dependent conditions, we evaluate gaze statistics and the similarity of the saliency maps. Additionally, we introduced a novel metric based on the earth mover's distance, which incorporates both spatial information and salience levels, enabling us to quantify the dissimilarity of saliency maps in dynamic point clouds. Our experimental results show that high-level tasks, such as quality assessment, significantly affect human visual attention, and this effect varies based on content characteristics, particularly the temporal information.
ACKNOWLEDGMENTS
This work was supported through the NWO WISE grant and the European Commission Horizon Europe program, under the grant agreement 101070109, TRANSMIXR https://transmixr.eu/. Funded by the European Union.
REFERENCES
[1] ITU-T Rec. P.910 (04/2008) Subjective video quality assessment methods for multimedia applications. 2009.
[2] M. Abid, M. P. Da Silva, and P. Le Callet. Towards visual saliency computation on 3d graphical contents for interactive visualization. In 2020 IEEE International Conference on Image Processing (ICIP), pp. 3448–3452. IEEE, 2020.
[3] M. Adamo e, D. Padúch, and P. Kapec. Evaluation of visual saliency models in immersive analytics. In K. Arai, ed., Advances in Information