Enhancing Safety in Industry 5.0: Human-Computer
Collaboration Benefits through a Dataset of
Protective Equipment Detection
Alexandros Petropoulos∗, Georgia Apostolou∗, Georgios Tsoumplekas†, George Tziolas‡,
Nikolaos Ntampakis†, Ilias Siniosoglou†, Vasileios Argyriou§,
Panagiotis Sarigiannidis¶, Ilias Gialampoukidis∗, Stefanos Vrochidis∗
∗ITI, CERTH, Thessaloniki, Greece
Emails: {alpetrop, gapostolou, heliasgj, stefanos}@iti.gr
†METAMIND INNOVATIONS P.C., Kozani, Greece
Emails: {gtsoumplekas, nntampakis, isiniosoglou}@metamind.gr
‡Sidroco Holdings Ltd, Nicosia, Cyprus
Email: [email protected]
§Kingston University London, London, United Kingdom
Email: [email protected]
¶University of Western Macedonia, Kozani, Greece
Email: [email protected]
Abstract—This paper presents the development and implementation of a comprehensive dataset designed to enhance workplace safety in industrial environments through advanced computer vision technologies. The dataset focuses on the detection of essential protective equipment, such as helmets and vests, worn by workers in various industrial settings. Using this dataset, a YOLOv8-based computer vision model is trained that achieves an mAP of 82.1% (70.5% for helmets, 93.7% for vests) in real-time identification of whether workers are equipped with the appropriate safety gear, demonstrating high reliability for safety compliance monitoring.
This initiative is part of the European Research Project TALON, which aims to demonstrate, through its 4th pilot, the potential of collaborative efforts between humans and machines in achieving higher safety standards. The TALON system will enable automated, flexible, adaptable, programmable, explainable and energy-efficient edge Artificial Intelligence (AI) networking by developing complementary technologies such as an AI orchestrator, blockchain, edge networking and digital twins (DTs) in an integrated and innovative way. The dataset, along with the developed predictive model, offers a significant contribution to the field of safety, showcasing how technological advancements can be leveraged to safeguard human lives in the workplace.
Index Terms—protective equipment, safety, computer vision, industry, human-robot collaboration, deep learning, CNN
I. INTRODUCTION
In industrial environments, ensuring the safety of workers is paramount. Protective equipment, such as helmets and vests, plays a critical role in mitigating the risks associated with workplace hazards. However, manual monitoring of compliance with safety regulations can be labor-intensive and prone to human error. To address this challenge, we propose the development of a dataset tailored to the detection of protective equipment using computer vision technologies.
This paper introduces the dataset creation process, which encompasses the collection, annotation, and validation of images depicting workers in various industrial scenarios. The dataset is designed to train a computer vision model capable of accurately identifying whether workers are wearing the required protective gear. This initiative is undertaken as part of the European research project TALON [1], which aims to showcase the synergy between human workers and machine learning systems in enhancing workplace safety.
The TALON project highlights the importance of collaborative efforts between humans and machines to achieve greater safety standards. The power of machine learning and computer vision is leveraged to create a safe work environment and promote a culture of safety compliance. The dataset and the predictive model developed through this project represent a significant advancement in the field of safety, demonstrating how technological innovation can be harnessed to protect human lives in the workplace. The main objectives of this work are:
• Create a dataset that fills the gap in existing datasets regarding changing lighting conditions and occlusions. An additional feature that differentiates it from other comparable datasets is that the images are captured by a drone, providing more diversity.
• Train a state-of-the-art deep learning model with the above dataset.
• Present a scenario in which the dataset and the model will be used within the TALON project.
II. RELATED WORK
There are several datasets on protective equipment, most of which focus on the class 'helmet'. However, other datasets with more classes have been created recently.
The SHD dataset [2] comprises 5000 images featuring three object classes: helmet, head, and person. However, a significant portion of these images is only partially labeled. The SHEL5K dataset [3], on the other hand, introduced an enhanced version of the SHD dataset, containing a total of 75,570 labels.
The hardhat dataset [4] is a safety helmet dataset provided by Northeastern University, comprising 7,063 labeled images. It consists of three distinct classes with about 27,000 labeled objects: helmets, heads, and persons. However, in the dataset, the "person" class is not labeled accurately and there is an unequal distribution of images across the classes.
The Pictor-v3 dataset [5] focuses on the classes worker, hat, and vest. Images are collected through crowd-sourcing from construction projects and web-mining using search engines. Human annotators used the LabelBox [6] and LabelMe [7] tools for annotation to ensure accuracy. The available crowd-sourced subset contains 775 images containing workers.
The Site Object Detection Dataset (SODA) [8] is a large-scale image dataset specifically designed for object detection in construction sites, containing 15 different object classes. The dataset was developed to address the need for large-scale, annotated images in construction sites, which are crucial for training and evaluating object detection algorithms.
SHWD [9] offers a comprehensive dataset specifically for safety helmet usage and human head detection. It comprises 7,581 images featuring 9,044 instances where safety helmets are worn (positive) and 111,514 instances of normal heads (negative). The positive examples were sourced from Google and Baidu, and were manually annotated using LabelImg [10].
Finally, a significant contribution to the field was made by [11] with their SH17 dataset, which focused on human safety and personal protective equipment detection. The dataset contains over 17,000 annotated images capturing various scenes in manufacturing environments, with bounding box annotations for different types of PPE, such as hard hats, safety vests, and safety glasses. However, as noted in their paper, challenges remained in terms of detecting PPE in varying lighting conditions and occlusions.
Despite these developments, several challenges remain for personal protective equipment detection datasets, such as varying lighting conditions, the presence of occlusions, and the limited availability of drone imagery for use in corresponding scenarios. Our work aims to address these limitations by introducing a comprehensive dataset and exploiting it in real-world conditions within the TALON project.
III. DATASET CREATION AND CHARACTERISTICS
A. Introduction to the Dataset
This dataset was created as part of the TALON project to support the 4th pilot. This pilot aims to expand the level of automation in the increasingly complex manufacturing landscape through human-robot collaboration. For the needs of the pilot, a warehouse was used where, depending on the scenario, it is necessary to detect in different areas of the warehouse (either indoor or outdoor) whether workers are wearing personal protective equipment such as helmets and vests. Before the use of TALON, the monitoring of the premises was carried out by a person on patrol. Now the patrols are performed by drones, and the TALON system runs deep learning algorithms that detect whether workers are wearing the required protective equipment and raise an alert on the platform's dashboard when they are not. The goal is to create a diverse, well-labeled dataset that represents real-world scenarios, providing the foundation for training computer vision models that can be deployed for safety monitoring in industrial and construction settings.
B. Data Collection Process
The creation of the dataset started with the planning of the shoots needed to offer as much diversity as possible. Based on the available spaces where the workers move, various scenarios were designed according to which the drones would fly. Of the 5 available spaces, 3 were indoors, 1 was outdoors, and the last was indoors but with direct access to the outside for trucks to enter and exit. The filming was done during the warehouse opening hours with the participation of the staff and the team conducting the measurements. Staff followed their daily working routine with no interruptions, to obtain a realistic result. Therefore, to achieve diversity:
1) multiple different drone routes were flown in all areas
2) in each run, parameters such as the height of the drone were changed, from 3 to 7 meters indoors and up to 20 meters outdoors
3) the angle of the drone's camera was varied from 45 to 90 degrees
4) to have different lighting conditions, the shots were taken at different times of the day, both in the morning and in the afternoon, when there was less sunlight. In addition, where available, the artificial light flickered.
5) there were different numbers of people in each scene, and each time some wore protective gear and some did not. Workers are captured in standing, sitting, walking, and operating machinery poses, simulating real-life scenarios where different body orientations may affect the visibility of PPE.
6) the dataset also contains images with varying background clutter and complex scenes (e.g., workers in groups, occlusions from machinery or other workers) to simulate real-world challenges for PPE detection systems.
7) for the needs of the scenario, helmets and safety vests were used, as this was the equipment worn by the workers in the warehouse
A sample of the images taken from the dataset under different conditions is shown in Figure 1.
C. Dataset Characteristics
1) Image Size and Resolution: All images in the dataset are in full HD (1920x1080) resolution. A higher resolution than usual was chosen because the drone was at some distance, and information on both personal protective equipment and ambient lighting conditions needed to be captured more clearly. In addition, starting with high-resolution images provides the flexibility to experiment with different resolutions during preprocessing without losing the original quality.
Fig. 1. Sample images from the dataset
2) Labeling and Annotations: To annotate the dataset, the drone videos were converted into frames, keeping the most important ones. The open-source tool CVAT [12] was used, with 2 people doing the annotation and a third person doing the final evaluation at the end. The defined classes are helmet and vest, as these are the most common means of protection in the warehouse. Special attention was given to labeling occluded or partially visible PPE, as these are common in real-world scenarios. Finally, the dataset was exported to YOLOv1 format, ensuring compatibility with widely used computer vision frameworks.
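As an illustration of the export target, a YOLO-style label file stores one object per line as a class index followed by the box centre and size, all normalised by the image dimensions. The following minimal sketch converts a pixel-space box into such a line for a 1920x1080 frame. The class-index mapping (helmet = 0, vest = 1) and the helper name `to_yolo_line` are assumptions for illustration and are not taken from the dataset.

```python
# Hypothetical class mapping; the index order used in the dataset is not stated.
CLASS_IDS = {"helmet": 0, "vest": 1}


def to_yolo_line(label, x_min, y_min, x_max, y_max, img_w=1920, img_h=1080):
    """Convert a pixel-space box in corner format to a YOLO label line."""
    if not (0 <= x_min < x_max <= img_w and 0 <= y_min < y_max <= img_h):
        raise ValueError("box must lie inside the image and have positive area")
    x_c = (x_min + x_max) / 2 / img_w   # normalised box centre
    y_c = (y_min + y_max) / 2 / img_h
    w = (x_max - x_min) / img_w         # normalised box size
    h = (y_max - y_min) / img_h
    return f"{CLASS_IDS[label]} {x_c:.6f} {y_c:.6f} {w:.6f} {h:.6f}"


line = to_yolo_line("helmet", 960, 540, 1056, 594)
```

Because the coordinates are normalised, the same label file remains valid if the frames are later resized during preprocessing.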
3) Class Imbalance: The graph in Figure 2 illustrates the distribution of classes within our dataset. As is evident, there is a notable imbalance, with one class being significantly underrepresented compared to the other. Given that protective gear might not always be visible or might be worn in different configurations (e.g., helmets alone, helmets and vests), we observed that the 'helmet' class has about 1750 snapshots, while the 'vest' class has only 500.
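One common way to quantify, and optionally compensate for, such an imbalance is to derive inverse-frequency class weights from the instance counts. The sketch below uses the approximate counts quoted in the text. Whether any re-weighting was applied during training is not stated in the paper, so this is purely illustrative.

```python
def inverse_frequency_weights(counts):
    """Per-class weights proportional to 1/frequency.

    Scaled so that the sample-weighted average weight equals 1.
    """
    total = sum(counts.values())
    n_classes = len(counts)
    return {name: total / (n_classes * c) for name, c in counts.items()}


counts = {"helmet": 1750, "vest": 500}   # approximate figures from the text
weights = inverse_frequency_weights(counts)
imbalance_ratio = counts["helmet"] / counts["vest"]
```

With these counts the helmet class is 3.5 times more frequent than the vest class, so the rarer vest class receives the larger weight.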
4) Privacy and Ethical Considerations: While the dataset is designed for industrial safety, privacy and ethical concerns regarding the inclusion of worker images were carefully considered. All workers appearing in the dataset have given explicit consent for their images to be used for research and training purposes. Additionally, any identifying information, such as faces or logos on clothing, was either obscured or excluded to protect privacy.
5) Dataset Splitting and Evaluation: To evaluate the performance of deep learning models trained on this dataset, it was divided into training, validation, and test sets. The split was as follows: Training Set: 75%, Validation Set: 15% and Test Set: 10%.
Additionally, it was ensured that the splits were balanced in terms of the distribution of workers wearing full PPE versus those without, as well as the various environmental conditions.
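A balanced 75/15/10 split of this kind can be sketched as a stratified split, where frames are grouped by a stratum key and each group is divided in the same proportions. The stratum labels below (PPE status combined with environment) and the frame records are hypothetical, since the exact balancing procedure is not described in the paper.

```python
import random
from collections import defaultdict


def stratified_split(items, key, fractions=(0.75, 0.15, 0.10), seed=0):
    """Split items into train/val/test so each stratum keeps similar proportions."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for item in items:
        groups[key(item)].append(item)
    train, val, test = [], [], []
    for members in groups.values():
        rng.shuffle(members)
        n_train = round(len(members) * fractions[0])
        n_val = round(len(members) * fractions[1])
        train += members[:n_train]
        val += members[n_train:n_train + n_val]
        test += members[n_train + n_val:]   # remainder goes to the test set
    return train, val, test


# Hypothetical frame records: (frame_id, stratum).
strata = ("full_ppe/indoor", "no_ppe/indoor", "full_ppe/outdoor", "no_ppe/outdoor")
frames = [(i, s) for s in strata for i in range(100)]
train, val, test = stratified_split(frames, key=lambda f: f[1])
```

Fixing the random seed keeps the split reproducible, which matters when several models are compared on the same test set.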
Fig. 2. Class imbalance diagram.
6) Challenges and Limitations: While the dataset has been designed to be diverse, there are still several challenges and limitations:
• Partial Occlusion: In some scenarios, PPE might be partially covered by other objects or workers, making it difficult for models to correctly detect the equipment.
• Image Quality: Variability in shooting conditions, such as different lighting conditions, artifacts, noise and reflections, can significantly affect the consistency, clarity and overall quality of images.
Despite these challenges, this dataset serves as a comprehensive resource for training models that can identify PPE and ensure worker safety across a variety of environments and scenarios.
IV. PPE DETECTION WITH YOLOV8
Following the introduction of the proposed dataset, the next step is to evaluate how effectively it can be used to train an object detector aimed at PPE detection in industrial settings. Automating PPE detection could enhance industrial workers' safety by allowing for broader coverage of industrial workspaces through drone imagery. At the same time, it could reduce the possibility of errors due to human factors by using precise object detection models. In this section, we analyze how the proposed dataset can be utilized to train such an effective object detection model for PPE detection and assess its effectiveness through a series of evaluations.
In the following subsections, we first describe YOLOv8n, the lightweight real-time object detection model used as the detector in this study. Next, we outline the experimental setting in which the model was trained, as well as the metrics used for its performance evaluation. Finally, we present the results obtained, demonstrating the dataset's suitability for training an effective PPE detection model that can be easily deployed in real-world applications.
A. Model Description
Over the years, various object detection models have been developed focusing on both general object detection [13] as well as application-specific cases, such as PPE detection [14]. Generally, the most common object detection models can be categorized as one-stage or two-stage detectors. One-stage detectors utilize anchors [15] or an initial set of potential object centers [16], [17] to make predictions. On the other hand, two-stage detectors generate proposals that are subsequently refined to obtain the final predictions [13]. Finally, in recent years, various novel approaches based on the transformer architecture [18] have been proposed that eliminate the need for initial guesses in the form of anchors or proposals, such as DETR [19] and DINO [20].
An important consideration for the examined use case is that the selected model should be able to produce real-time predictions, enabling its application in real-world environments that involve a continuous stream of input, such as video. At the same time, the model should also be efficient in terms of its computational requirements during training and inference, allowing its deployment in potentially resource-constrained devices. While two-stage detectors, such as Faster-RCNN [13], have demonstrated strong performance on various benchmarks and applications, the additional overhead introduced by their Region Proposal Network renders them less effective for real-time object detection. On the other hand, modern transformer-based object detection models alleviate the need for proposal generation. However, they typically leverage large backbone architectures [19] that require significant computational power, even during inference, making them less appropriate for deployment in environments with restricted computational resources.
Since the goal is to develop an object detection model suitable for real-time object detection in real-world industrial applications, we opt for YOLOv8n [21], a well-established one-stage object detection model that combines computational efficiency with real-time object detection capabilities. YOLOv8's architecture, renowned for its balanced accuracy-speed trade-off, has been validated in industrial real-time applications [22], making it ideal for our safety monitoring context. In particular, YOLOv8n is the smallest variant of the YOLOv8 model family, consisting of approximately 3.2 million parameters. Building upon its predecessors in the You-Only-Look-Once [15] (YOLO) model family, the model leverages the CSPDarknet53 [23] network as its backbone feature extractor, which follows the Feature Pyramid Network structure introduced in [24] and enables the extraction of feature maps at different scales. Consequently, this feature allows for multi-scale object detection, which is particularly useful in the context of the examined dataset, where objects such as vests may be large compared to smaller objects like helmets, especially in images taken from a significant distance. The second major component of YOLOv8 is its detection head, which processes the multi-scale feature maps generated by the backbone feature extractor using a series of convolutional layers. The resulting representations are subsequently utilized by three detection modules, each operating at a different scale, which are responsible for producing the final classification scores and bounding boxes for the detected objects.
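To make the role of the three detection scales concrete, the following sketch computes the feature-map grid size at each scale and how many grid cells a small object spans once a full HD frame is resized. The strides (8, 16, 32) and the 640-pixel input size are assumed here as typical YOLOv8 settings and are not stated in the paper; the 30-pixel helmet width is a made-up example.

```python
def grid_sizes(img_size=640, strides=(8, 16, 32)):
    """Side length of the square feature-map grid at each detection scale."""
    return {s: img_size // s for s in strides}


def cells_covered(obj_px, src_width=1920, img_size=640, strides=(8, 16, 32)):
    """Approximate number of grid cells an object of width obj_px spans per scale."""
    scaled = obj_px * img_size / src_width   # object width after resizing the frame
    return {s: scaled / s for s in strides}


grids = grid_sizes()
helmet_cells = cells_covered(30)   # hypothetical 30-pixel-wide helmet in a full HD frame
```

Under these assumptions the helmet covers barely more than one cell at the finest scale and less than one cell at the coarser ones, which is consistent with the text's observation that small, distant helmets are harder to detect than vests.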
As a final note, within the context of the examined use case in the TALON project, the developed model is deployed in the cloud, which typically offers greater flexibility in terms of the available computational resources. However, due to YOLOv8n's lightweight architecture and its one-stage detection method based on anchor generation, it can also be effectively deployed in resource-constrained environments, such as edge devices. This is a crucial requirement in most industrial applications, where there is a need for both performance in terms of object classification and localization accuracy, as well as reduced computational and energy needs [25].
B. Experimental Setting
The proposed dataset contains two classes, namely helmet and vest, both of which are standard PPEs typically encountered in industrial sites. However, a critical factor in such use cases is the ability to detect the absence of these objects to ensure compliance with regulations that mandate the use of such PPE and potentially help reduce injuries in the event of accidents. As a result, we aim to enhance the selected YOLOv8n object detection model with this capability following a two-stage finetuning approach.
Specifically, starting from a YOLOv8n model pretrained on MS-COCO [26], the first stage involves finetuning the entire model using the Worker-Safety dataset [27]. In particular, the dataset consists of five classes: person, helmet, no-helmet, vest, and no-vest, allowing us to detect both the presence and the absence of critical PPE in industrial settings. We utilized 80% of the available data as the training set for this finetuning stage, while the remaining 20% served as our test set. Additionally, we follow a stratified approach during train-test splitting to ensure that class proportions are maintained across the two sets. As for the selected hyperparameters, we opted for the default values provided by the Ultralytics [28] framework, given their effectiveness in training a model that generalizes successfully within the specific dataset; no further improvement was achieved by using different hyperparameter values in our preliminary experimentation.
Since both helmet and vest objects, along with their corresponding labels, were included in the Worker-Safety dataset, it is possible to immediately use the developed model for inference in the proposed dataset. However, since the proposed dataset is focused on a more realistic setting where images are captured by a drone from a distance, potential issues could arise due to distributional or domain shifts between the Worker-Safety dataset's training set and the proposed dataset's evaluation set. To address any performance degradation that might result from this distributional shift, we introduce a second finetuning stage where the model is further finetuned using the training set of the proposed dataset. To avoid catastrophic forgetting [29] of classes for which labels are not available in the proposed dataset, specifically person, no-helmet, and no-vest, we only finetune the three detection modules while keeping the rest of the model frozen. Similar to the first finetuning stage, we utilize the default hyperparameter values provided by the Ultralytics framework.
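The two finetuning stages can be sketched with the Ultralytics Python API roughly as follows. This is a minimal sketch under stated assumptions, not the authors' training script: the dataset YAML names and the stage-one checkpoint name are placeholders, and `freeze=22` assumes that the detection module is the last of the 23 top-level modules in the YOLOv8n model definition, so that freezing the first 22 leaves only the detection module trainable.

```python
def stage_configs():
    """Training arguments for the two finetuning stages (file names are placeholders)."""
    return [
        # Stage 1: finetune the whole COCO-pretrained model on Worker-Safety.
        {"weights": "yolov8n.pt", "data": "worker_safety.yaml", "freeze": None},
        # Stage 2: update only the detection module on the proposed dataset.
        # freeze=22 is an assumption about the YOLOv8n layer layout (see lead-in).
        {"weights": "stage1_best.pt", "data": "proposed_dataset.yaml", "freeze": 22},
    ]


def run_stage(cfg):
    """Run one stage with Ultralytics; requires `pip install ultralytics`."""
    from ultralytics import YOLO   # imported lazily so the configs can be inspected without it
    model = YOLO(cfg["weights"])
    # All other hyperparameters are left at the framework defaults, as in the text.
    model.train(data=cfg["data"], freeze=cfg["freeze"])
    return model


configs = stage_configs()
```

Calling `run_stage(configs[0])` and then `run_stage(configs[1])`, after pointing the second stage at the checkpoint produced by the first, would reproduce the overall sequence described above.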
For both datasets, we evaluate model performance using Mean Average Precision (mAP), which is defined as follows:
\[
\mathrm{mAP}\,(\%) = \frac{1}{N} \sum_{i=1}^{N} \mathrm{AP}_i \tag{1}
\]
where $\mathrm{AP}_i$ is the Average Precision (%) obtained for the $i$-th class, given $N$ classes, using an Intersection over Union (IoU) threshold of 50%. Additionally, we report Precision and Recall values in the test sets of both datasets, as well as their corresponding Precision-Recall curves and normalized confusion matrices.
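As a small worked example of the metric, the sketch below computes the IoU of two boxes and averages per-class AP values into an mAP. The per-class AP values are those reported for the proposed dataset in Table II; the example boxes and the helper names are illustrative assumptions.

```python
def iou(box_a, box_b):
    """IoU of two boxes given as (x_min, y_min, x_max, y_max)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def mean_average_precision(ap_per_class):
    """mAP as the unweighted mean of the per-class AP values."""
    return sum(ap_per_class.values()) / len(ap_per_class)


# Per-class AP values (%) reported for the proposed dataset (Table II).
map_proposed = mean_average_precision({"helmet": 70.5, "vest": 93.7})

# A prediction shifted by half the box width has IoU = 1/3 with the ground truth,
# so it would not count as a true positive at the 50% threshold.
example_iou = iou((0, 0, 10, 10), (5, 0, 15, 10))
```

The averaged value matches the overall 82.1% reported in Table II.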
C. Experimental Results
After both finetuning stages are completed, the developed YOLOv8n model is evaluated on the test sets of both datasets used during finetuning. The following subsections present and analyze the results obtained for both datasets.
TABLE I
MODEL PERFORMANCE IN THE WORKER-SAFETY DATASET
Class        Precision (%)   Recall (%)   mAP (%)
person       97.9            91.7         99.3
helmet       98.7            98.0         99.3
no-helmet    95.8            100.0        99.5
vest         97.8            94.5         97.9
no-vest      99.5            96.0         99.2
all          97.9            96.0         99.0
1) Worker-Safety Dataset: Table I presents the results for each class, as well as aggregated results for all classes on the Worker-Safety dataset. Notably, the model achieves an overall mAP of 99.0%, with the Average Precision of each class exceeding 99%, except for the vest class, which has a slightly lower Average Precision of 97.9%, demonstrating the proposed model's effectiveness in detecting critical PPE with minimal errors. Additionally, the model exhibits consistently high precision and recall values, indicating that it produces only a very small number of false positive and false negative predictions across all examined classes. This is further corroborated in Figure 3, which illustrates the Precision-Recall curve for this dataset, showing that the model maintains very high scores for both precision and recall without suffering from the precision-recall tradeoffs that are typically encountered in deep learning models.
Figure 4 presents a normalized version of the confusion matrix obtained for the model's predictions, demonstrating that the model achieves remarkably high detection performance for all classes of interest. It is also worth noting that the model rarely misclassifies an object as belonging to a different class, with most errors arising from the model failing to detect the object and instead considering it part of the image's background. Overall, it is evident that although the model is evaluated on the test set of the first-stage dataset after the second finetuning stage is completed, it manages to maintain its high performance. This result indicates that the developed model does not suffer from catastrophic forgetting, which can be attributed to the fact that during the second finetuning stage, only the detection modules of the model were finetuned, leading to only minimal changes in the representations learned during the first stage.
Fig. 3. Precision-Recall curve for the Worker-Safety dataset.
Fig. 4. Normalized confusion matrix for the Worker-Safety dataset.
2) Proposed Dataset: Table II presents the overall results and the results per class obtained in the proposed dataset using the model after the two finetuning stages. The overall mAP of 82.1% is lower compared to the 99.0% value in the Worker-Safety dataset, which could be attributed to the proposed dataset's more challenging and realistic setting, where objects appear significantly smaller because the images are taken from a greater distance, making detection more difficult. The performance discrepancy between the two datasets is more evident for helmet detection, where an mAP of 70.5% is achieved in the proposed dataset.
TABLE II
MODEL PERFORMANCE IN THE PROPOSED DATASET
Class        Precision (%)   Recall (%)   mAP (%)
helmet       80.5            71.6         70.5
vest         96.5            88.0         93.7
all          88.5            79.8         82.1
Fig. 5. Precision-Recall curve for the proposed dataset.
Regarding performance discrepancies between helmet and vest detection, it is generally expected that performance for helmets is going to be lower compared to vests (93.7% in the proposed dataset) due to their smaller size, which makes them more challenging to detect. Figure 5 further illustrates this discrepancy by presenting the Precision-Recall curves for each class. For vest detection, the model maintains strong performance for both precision and recall, while for the more challenging helmet class there is a clear tradeoff between these two metrics, particularly when recall exceeds 0.8, which results in a significant drop in precision. However, lowering the detection confidence threshold during inference could help strike a better balance between false positive and false negative errors for the helmet class.
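The effect of the confidence threshold on this tradeoff can be illustrated with a small sweep: detections below the threshold are discarded, and precision and recall are recomputed from what remains. The detection scores and the ground-truth count below are made up for illustration and are not results from the paper.

```python
def precision_recall_at(threshold, detections, n_ground_truth):
    """detections: list of (confidence, is_true_positive) pairs after IoU matching."""
    kept = [is_tp for conf, is_tp in detections if conf >= threshold]
    tp = sum(kept)
    fp = len(kept) - tp
    precision = tp / (tp + fp) if kept else 1.0
    recall = tp / n_ground_truth
    return precision, recall


# Made-up helmet detections, sorted by confidence (illustration only).
dets = [(0.95, True), (0.90, True), (0.80, True), (0.70, False),
        (0.60, True), (0.40, False), (0.30, True), (0.20, False)]
n_gt = 6   # hypothetical number of ground-truth helmets

sweep = {t: precision_recall_at(t, dets, n_gt) for t in (0.75, 0.5, 0.25)}
```

In this toy example, lowering the threshold from 0.75 to 0.25 raises recall from 0.5 to about 0.83 while precision falls from 1.0 to about 0.71, which is the kind of exchange between false negatives and false positives described above.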
Figure 6 presents the normalized version of the confusion matrix for the model's predictions on the proposed dataset. In accordance with our previous findings, the model's performance is lower compared to the Worker-Safety dataset, which can be attributed to the more challenging setting introduced in the proposed dataset. Once again, this discrepancy is more evident in the detection of helmets due to their significantly smaller size. It is also worth noting that, unlike in the Worker-Safety dataset, a significant number of the errors in the proposed dataset occur in background-foreground separation, where elements in the background are incorrectly identified as helmets.
Fig. 6. Normalized confusion matrix for the proposed dataset.
Overall, although the performance measures in this dataset are lower, the results obtained for both classes can be considered highly competitive, especially given the challenging evaluation setting, underscoring the suitability of the dataset for training object detection models that can be effectively deployed in real-world industrial applications. Finally, it is important to note that even though the model can predict all five classes introduced in the Worker-Safety dataset, we only report evaluation metrics for helmets and vests in the proposed dataset due to the absence of labels for the rest of the classes.
V. SCENARIO
This section introduces the TALON project, its objectives, and benefits. It explains how the dataset and prediction model will be integrated and used in a real-world scenario, demonstrating the system's benefits under real conditions.
A. Introduction to TALON project and use case scenario
TALON addresses Industry 5.0 challenges by combining edge-cloud AI for intelligent automation. Its 4th use case focuses on human-robot collaboration in manufacturing, deploying drones and computer vision to enhance safety in warehouses where workers must wear helmets/vests.
Current safety checks rely on manual inspections, risking errors. TALON automates monitoring using drones and real-time deep learning to detect non-compliance, such as missing safety gear, and trigger alerts. This reduces response times, ensures faster hazard identification, validates safety protocols, and secures the workplace through efficient human-robot collaboration.
B. Overview of the scenario pipeline
This scenario demonstrates the integration of deep learning and edge computing for real-time worker safety monitoring. Drones are deployed in a warehouse setting to monitor workers' compliance with personal protective equipment (PPE) standards. The data collected by these drones is fed into a YOLOv8 [22] model that detects whether workers are wearing the required protective gear, such as helmets and vests. The system achieves 30 FPS processing with sub-100 ms latency during drone-based deployment, enabling immediate safety interventions. Additionally, privacy concerns are addressed through face anonymization, and the results are visualized in a dashboard user interface (UI) that provides actionable safety insights. Figure 7 shows a schematic diagram of the scenario.
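The two performance figures quoted above constrain the pipeline in different ways: 30 FPS limits the time any single pipelined stage may take per frame (about 33.3 ms), while the 100 ms bound limits the total time a frame spends across all stages. The sketch below checks a set of stage timings against both targets. The stage names and timings are hypothetical and are not measurements from the paper.

```python
TARGET_FPS = 30
MAX_LATENCY_MS = 100.0


def meets_targets(stage_ms):
    """stage_ms: per-frame processing time of each pipeline stage, in milliseconds."""
    frame_interval_ms = 1000.0 / TARGET_FPS
    latency_ms = sum(stage_ms.values())      # one frame passes through every stage
    bottleneck_ms = max(stage_ms.values())   # with pipelining, the slowest stage sets throughput
    return bottleneck_ms <= frame_interval_ms and latency_ms < MAX_LATENCY_MS


# Hypothetical timings for illustration only.
stages = {"transfer": 25.0, "detection": 12.0, "anonymization": 15.0, "dashboard": 8.0}
ok = meets_targets(stages)
```

A budget of this kind makes it easy to see which stage would have to be moved or optimised if either target were missed.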
Fig. 7. Diagram of the UC4 scenario
As shown in the diagram, the TALON components that participate in the scenario are divided into 2 main categories: those located at the edge and those located in the cloud. On the edge are the drones, while in the cloud are the main components that will be used, which are the prediction model, the anonymisation tool and the dashboard. All these systems communicate with each other and are organised in a unified way under the TALON umbrella.
C. Step by step analysis
This section presents the scenario step by step, along with pictures of the results after running it.
Initially, the drone is piloted into the warehouse, equipped with high-definition cameras to capture live video footage of the workplace. These videos are subsequently fed into the YOLO detector, as detailed in the previous section. Figure 8 depicts the drone's flight path within the warehouse, while Figure 9 presents a screenshot of a video taken in a corridor.
Fig. 8. Drone paths
Fig. 9. Drone stream
Next, TALON's AI orchestrator takes charge, intelligently allocating resources either in the cloud or at the edge to ensure all scenario modules run seamlessly. By optimizing the available resources at any given time, the system achieves optimal performance and scalability.
Once the resources have been allocated, the YOLO object detection model described earlier is loaded. This model processes the drone's video input, identifying workers and checking whether they are wearing helmets or vests, based on their specific work areas. The output is a video annotated with bounding boxes and text that highlight the objects of interest: the workers and their protective equipment. Figure 10 shows a screenshot of the module's results.
Fig. 10. Results of security compliance of PPE detector
The output video from the detector is subsequently processed by an anonymization tool to ensure privacy and adhere to ethical standards. This tool blurs the faces of employees, safeguarding their identities. Figure 11 demonstrates the results after anonymization.
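The idea behind this step can be sketched on a toy image. The example below pixelates a rectangular region by replacing each small block with its mean value; pixelation is used here as a simple stand-in for blurring, and it is not a description of the TALON anonymization tool, whose implementation is not given in the paper. In practice the face box would come from a face detector.

```python
def pixelate_region(image, box, block=2):
    """Return a copy of a grayscale image (list of rows) with `box` pixelated.

    box is (x_min, y_min, x_max, y_max), end-exclusive; each block inside the
    box is replaced by its integer mean. The input image is left unchanged.
    """
    x0, y0, x1, y1 = box
    out = [row[:] for row in image]
    for by in range(y0, y1, block):
        for bx in range(x0, x1, block):
            ys = range(by, min(by + block, y1))
            xs = range(bx, min(bx + block, x1))
            mean = sum(image[y][x] for y in ys for x in xs) // (len(ys) * len(xs))
            for y in ys:
                for x in xs:
                    out[y][x] = mean
    return out


# Tiny 4x4 test image with a hypothetical 2x2 "face" box in the top-left corner.
img = [[0, 10, 20, 30],
       [40, 50, 60, 70],
       [80, 90, 100, 110],
       [120, 130, 140, 150]]
blurred = pixelate_region(img, box=(0, 0, 2, 2))
```

Only the pixels inside the box change, so the PPE annotations elsewhere in the frame remain readable after anonymization.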
Finally, the dashboard, where alerts are displayed, is the most crucial component of the system. During the scenario, the system continuously monitored the processed data to detect any instances of non-compliance with PPE regulations. If an employee was found not wearing the required PPE, an alert was generated and displayed on the dashboard, accompanied by the relevant image from the anonymization tool. Figure 12 shows the alert for a worker found without a helmet and vest. This enabled safety managers to take immediate corrective actions, thereby enhancing overall workplace safety.
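The alerting rule described here can be sketched as a small function over the detector output: any sufficiently confident no-helmet or no-vest detection in a frame produces an alert listing the missing items. The record layout, the 0.5 confidence cut-off, and the function name are assumptions for illustration; the dashboard's actual interface is not described in the paper.

```python
def compliance_alerts(detections, min_conf=0.5):
    """detections: list of dicts with 'frame', 'label' and 'conf' from the detector."""
    violations = {"no-helmet": "helmet", "no-vest": "vest"}
    alerts = {}
    for det in detections:
        if det["label"] in violations and det["conf"] >= min_conf:
            alerts.setdefault(det["frame"], set()).add(violations[det["label"]])
    return [{"frame": f, "missing": sorted(items)} for f, items in sorted(alerts.items())]


# Hypothetical detector output for three frames.
dets = [
    {"frame": 12, "label": "person", "conf": 0.93},
    {"frame": 12, "label": "no-helmet", "conf": 0.81},
    {"frame": 12, "label": "no-vest", "conf": 0.77},
    {"frame": 13, "label": "helmet", "conf": 0.88},
    {"frame": 14, "label": "no-vest", "conf": 0.35},   # below the cut-off, ignored
]
alerts = compliance_alerts(dets)
```

Grouping by frame means a worker missing both items triggers a single alert rather than two, which keeps the dashboard readable.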
It is important to note that all the scenario steps, along with their results, are presented without blurring faces to demonstrate the function of each module individually. In the final system, only the final image with obscured faces, accompanied by alerts, would be seen by the operator on the dashboard. Additionally, all participants were informed about and provided consent to the processing of their data.
Fig. 11. Results of Anonymization tool
Fig. 12. Alerts dashboard
VI. CONCLUSION AND FUTURE DIRECTIONS
Firstly, we addressed the creation of a dataset designed to bridge the gap in existing data, particularly concerning lighting conditions, partial occlusions, and drone shots of various scenarios. The dataset was then utilized to train a detector, yielding highly competitive results despite the challenging conditions of such scenes. Looking ahead, we could expand the dataset by introducing additional classes such as goggles and protective gloves, while also increasing the overall amount of data.
Subsequently, a use case scenario was presented within the context of the TALON project. This scenario demonstrated significant benefits in terms of the security of warehouse workers. Based on the early results of this scenario and compared to the earlier approach, where the security officer manually patrolled the site and recorded worker status on paper, the new system has led to a remarkable improvement: inspection time has been reduced by 75%, while accuracy has increased by 70%. Moving forward, the scenario could be enhanced by incorporating additional elements like explainable AI, which would provide better justification and explanation of the results to the security officer. Finally, testing the scenario in different workplaces would present several challenges that need to be addressed.
VII. MATCH AND CONTRIBUTION
Our research aligns closely with the scope and research objectives of the ICE IEEE 2025 conference. By addressing the management of emerging technologies, we showcase the integration of drones, edge-cloud AI, and computer vision to automate PPE compliance in industrial environments. The development of a diverse, drone-acquired dataset and the deployment of a real-time YOLOv8 model provide a practical framework for enhancing safety through human-machine collaboration. We analyze implementation challenges such as lighting variability, occlusion, and domain adaptation, offering realistic solutions and metrics for performance. Ultimately, our work focuses on value creation by reducing workplace hazards and improving safety protocol enforcement through intelligent automation.
ACKNOWLEDGMENT
This work was supported by the European Union’s Horizon
Europe Research and Innovation programme TALON, under
grant agreement No. 101070181. Acknowledgments to our
colleagues at the VANOS warehouses [30] for their support
in the measurement procedure and their collaboration.
REFERENCES
[1] “TALON - talon-project.eu,” https://talon-project.eu/, [Accessed
08-01-2025].
[2] “Safety Helmet Detection - kaggle.com,”
https://www.kaggle.com/andrewmvd/hard-hat-detection, [Accessed 07-01-2025].
[3] M.-E. Otgonbold, M. Gochoo, F. Alnajjar, L. Ali, T.-H. Tan, J.-W. Hsieh,
and P.-Y. Chen, “SHEL5K: An Extended Dataset and Benchmarking
for Safety Helmet Detection,” Sensors, vol. 22, no. 6, p. 2315, Jan.
2022. [Online]. Available: https://www.mdpi.com/1424-8220/22/6/2315
[4] L. Xie, “Hardhat - dataverse.harvard.edu,”
https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/7CBGOS,
[Accessed 07-01-2025].
[5] N. D. Nath, A. H. Behzadan, and S. G. Paal, “Deep learning for site
safety: Real-time detection of personal protective equipment,”
Automation in Construction, vol. 112, p. 103085, 2020. [Online]. Available:
https://www.sciencedirect.com/science/article/pii/S0926580519308325
[6] “Labelbox - The data factory for AI teams - labelbox.com,”
https://labelbox.com/, [Accessed 23-01-2025].
[7] “Label Me – Label Printing - labelme.gr,” https://www.labelme.gr/,
[Accessed 23-01-2025].
[8] R. Duan, H. Deng, M. Tian, Y. Deng, and J. Lin, “SODA: Site Object
Detection dAtaset for Deep Learning in Construction,” Feb. 2022,
arXiv:2202.09554. [Online]. Available: http://arxiv.org/abs/2202.09554
[9] “GitHub - njvisionpower/Safety-Helmet-Wearing-Dataset: Safety helmet
wearing detect dataset, with pretrained model - github.com,”
https://github.com/njvisionpower/Safety-Helmet-Wearing-Dataset, [Accessed
07-01-2025].
[10] “GitHub - HumanSignal/labelImg: LabelImg is now part of the Label
Studio community. The popular image annotation tool created by
Tzutalin is no longer actively being developed, but you can check
out Label Studio, the open source data labeling tool for images, text,
hypertext, audio, video and time-series data. - github.com,”
https://github.com/HumanSignal/labelImg, [Accessed 24-01-2025].
[11] H. M. Ahmad and A. Rahimi, “Sh17: A dataset for human safety and
personal protective equipment detection in manufacturing industry,”
2024. [Online]. Available: https://arxiv.org/abs/2407.04590
[12] B. Sekachev, N. Manovich, M. Zhiltsov, A. Zhavoronkov, D. Kalinin,
B. Hoff, TOsmanov, D. Kruchinin, A. Zankevich, DmitriySidnev,
M. Markelov, Johannes222, M. Chenuet, a-andre, telenachos,
A. Melnikov, J. Kim, L. Ilouz, N. Glazov, Priya4607, R. Tehrani,
S. Jeong, V. Skubriev, S. Yonekura, vugia truong, zliang7, lizhming,
and T. Truong, “opencv/cvat: v1.1.0,” Aug. 2020. [Online]. Available:
https://doi.org/10.5281/zenodo.4009388
[13] S. Ren, K. He, R. Girshick, and J. Sun, “Faster r-cnn: Towards real-time
object detection with region proposal networks,” IEEE Transactions on
Pattern Analysis and Machine Intelligence, vol. 39, no. 6, pp. 1137–1149,
2017.
[14] V. Isailovic, A. Peulic, M. Djapan, M. Savkovic, and A. M. Vukicevic,
“The compliance of head-mounted industrial ppe by using deep learning
object detectors,” Scientific Reports, vol. 12, no. 1, p. 16347, 2022.
[15] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, “You only look
once: Unified, real-time object detection,” in 2016 IEEE Conference on
Computer Vision and Pattern Recognition (CVPR), 2016, pp. 779–788.
[16] Z. Tian, C. Shen, H. Chen, and T. He, “Fcos: A simple and strong
anchor-free object detector,” IEEE Transactions on Pattern Analysis and
Machine Intelligence, vol. 44, no. 4, pp. 1922–1933, 2022.
[17] X. Zhou, D. Wang, and P. Krähenbühl, “Objects as points,” arXiv
preprint arXiv:1904.07850, 2019.
[18] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N.
Gomez, L. Kaiser, and I. Polosukhin, “Attention is all you need,” 2023.
[Online]. Available: https://arxiv.org/abs/1706.03762
[19] N. Carion, F. Massa, G. Synnaeve, N. Usunier, A. Kirillov, and
S. Zagoruyko, “End-to-end object detection with transformers,” in
Computer Vision – ECCV 2020, A. Vedaldi, H. Bischof, T. Brox, and
J.-M. Frahm, Eds. Cham: Springer International Publishing, 2020, pp.
213–229.
[20] H. Zhang, F. Li, S. Liu, L. Zhang, H. Su, J. Zhu, L. Ni, and H.-Y. Shum,
“DINO: DETR with improved denoising anchor boxes for end-to-end
object detection,” in The Eleventh International Conference on Learning
Representations, 2023.
[21] J. Terven, D.-M. Córdova-Esparza, and J.-A. Romero-González, “A
comprehensive review of yolo architectures in computer vision: From
yolov1 to yolov8 and yolo-nas,” Machine Learning and Knowledge
Extraction, vol. 5, no. 4, pp. 1680–1716, 2023.
[22] R. Varghese and S. M., “Yolov8: A novel object detection algorithm
with enhanced performance and robustness,” in 2024 International
Conference on Advances in Data Engineering and Intelligent Computing
Systems (ADICS), 2024, pp. 1–6.
[23] A. Bochkovskiy, C.-Y. Wang, and H.-Y. M. Liao, “Yolov4: Optimal
speed and accuracy of object detection,” arXiv preprint
arXiv:2004.10934, 2020.
[24] T.-Y. Lin, P. Dollár, R. Girshick, K. He, B. Hariharan, and S. Belongie,
“Feature pyramid networks for object detection,” in 2017 IEEE
Conference on Computer Vision and Pattern Recognition (CVPR), 2017, pp.
936–944.
[25] G. Tsoumplekas, V. Li, I. Siniosoglou, V. Argyriou, S. K. Goudos, I. D.
Moscholios, P. Radoglou-Grammatikis, and P. Sarigiannidis, “Evaluating
the energy efficiency of few-shot learning for object detection in
industrial settings,” arXiv preprint arXiv:2403.06631, 2024.
[26] T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan,
P. Dollár, and C. L. Zitnick, “Microsoft coco: Common objects in
context,” in Computer Vision – ECCV 2014, D. Fleet, T. Pajdla, B. Schiele,
and T. Tuytelaars, Eds. Cham: Springer International Publishing, 2014,
pp. 740–755.
[27] computer vision, “Worker-safety dataset,” jul 2022, visited on
2025-02-03. [Online]. Available:
https://universe.roboflow.com/computer-vision/worker-safety
[28] G. Jocher, A. Chaurasia, and J. Qiu, “Ultralytics yolov8,” 2023.
[Online]. Available: https://github.com/ultralytics/ultralytics
[29] M. McCloskey and N. J. Cohen, “Catastrophic interference in
connectionist networks: The sequential learning problem,” in Psychology of
learning and motivation. Elsevier, 1989, vol. 24, pp. 109–165.
[30] “VANOS S.A.” https://www.vanos.gr/, [Accessed 23-02-2025].