
A New Loss Function for Simultaneous Object Localization and Classification

Authors: Sánchez Chica, Ander; Ugartemendia Telleria, Beñat; Zulueta Guerrero, Ekaitz; Fernández Gámiz, Unai; Gómez Hidalgo, Javier María
Publisher: MDPI
Year: 2023
DOI: 10.3390/math11051205
Source: https://addi.ehu.eus/bitstream/10810/60344/1/mathematics-11-01205.pdf
Citation: Sanchez-Chica, A.; Ugartemendia-Telleria, B.; Zulueta, E.; Fernandez-Gamiz, U.; Gomez-Hidalgo, J.M. A New Loss Function for Simultaneous Object Localization and Classification. Mathematics 2023, 11, 1205. https://doi.org/10.3390/math11051205
Academic Editors: Debiao Meng and Shui Yu
Received: 25 January 2023
Revised: 23 February 2023
Accepted: 25 February 2023
Published: 1 March 2023
Copyright: © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Ander Sanchez-Chica 1, Beñat Ugartemendia-Telleria 1, Ekaitz Zulueta 1,*, Unai Fernandez-Gamiz 2 and Javier Maria Gomez-Hidalgo 3
1 System Engineering and Automation Control Department, University of the Basque Country (UPV/EHU), Nieves Cano 12, 01006 Vitoria-Gasteiz, Spain
2 Department of Nuclear and Fluid Mechanics, University of the Basque Country (UPV/EHU), Nieves Cano 12, 01006 Vitoria-Gasteiz, Spain
3 MERCEDES BENZ España, Las arenas 1, 10152 Vitoria-Gasteiz, Spain
* Correspondence: [email protected]
Abstract: Robots play a pivotal role in the manufacturing industry. This has led to the development of computer vision. Since AlexNet won ILSVRC, convolutional neural networks (CNNs) have achieved state-of-the-art status in this area. In this work, a novel method is proposed to simultaneously detect and predict the localization of objects using a custom loop method and a CNN, performing two of the most important tasks in computer vision with a single method. Two different loss functions are proposed to evaluate the method and compare the results. The obtained results show that the network is able to perform both tasks accurately, classifying images correctly and locating objects precisely. Regarding the loss functions, when the target classification values are computed, the network performs better in the localization task. Following this work, improvements are expected to be made in the localization task of networks by refining the training processes of the networks and loss functions.
Keywords: image classification; object detection; deep learning; deep convolutional neural networks; computer vision; custom training loop
MSC: 49J05; 49J15
1. Introduction
Nowadays, robots are essential to the manufacturing industry. The use of robots has helped the manufacturing industry to manufacture products more efficiently, saving both costs and time. Despite the fact that an increasing need for robots has been observed in all industrial sectors in recent years, the electronics industry has been the main customer of industrial robots since 2020, when it overtook the automotive industry. However, the latter still demands 80,000 robots a year; hence, it is still an important sector for robot manufacturers. Industrial robot manufacturers are making every effort to design and develop safe and human-friendly robots. This is spurred on by the fact that small- and medium-sized companies are increasing their use of industrial robots due to the availability of affordable solutions and easy-to-use collaborative robots. Hence, collaborative solutions, where humans and robots work together, are becoming the new frontier in industrial robotics [1,2]. The use of collaborative robots is also supported by the current trend of automation and data exchange in manufacturing industries, also called Industry 4.0 [3].
In the case of the automotive industry, robots are used mainly in the manufacturing process. At the beginning of the 20th century, when chain production was introduced by the Ford Model T, cars were handmade. Nowadays, this process is mainly automatic. However, there are still tasks where humans need to intervene. In this context, collaborative robots can help workers improve the efficiency and reduce the manufacturing faults of production lines [4]. Robots also help in other areas, such as neurosurgery [5]. Since collaborative robots need to be aware of their surroundings, computer vision can help robots to detect their environment and object positions.
Image classification, object detection and semantic segmentation are the main tasks in computer vision. Since LeCun et al. [6] proposed LeNet-5 in 1998 for document recognition, convolutional neural networks (CNNs) have become the state-of-the-art techniques for these tasks. The explosion of deep learning has led to the need for high-quality, diverse and structured image datasets. In 2010, ImageNet [7] was presented as a solution to this problem. Following the path that started with LeNet-5, in 2012, AlexNet [8] won the ImageNet Large-Scale Visual Recognition Challenge (ILSVRC) [9] by outperforming previous image-classifying state-of-the-art techniques. This success led researchers to improve the performance of CNNs. Following this trend of making deeper and larger networks, Zeiler et al. [10] proposed ZFNet to understand and improve the results obtained by Krizhevsky et al. [8]. In 2014, Simonyan et al. [11] proposed a CNN with very small convolution filters to evaluate the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting.
To tackle the difficulty of training very deep neural networks, He et al. [12] presented a residual learning framework called Residual Network (ResNet), which allows for an increase in the depth, while maintaining the complexity of the network. ResNet cross-layer connections try to solve the gradient diffusion problem that very deep convolutional networks, such as VGGNet and ZFNet, have. However, training very deep residual networks is extremely costly. Thus, Zagoruyko et al. [13] conducted an experimental study on the architecture of ResNet blocks. Based on this study, they proposed a novel architecture, where they decreased the depth and increased the width of residual networks. The resulting network structures, called wide residual networks (WRNs), outperformed their thin and very deep counterparts.
GoogLeNet [14], also called Inception-v1, combines inception modules with conventional convolution modules to improve the utilization of the computing resources inside the network, hence increasing the depth and width of the network, while keeping the computational budget constant by varying the sizes of the convolution filters. Inception-v4 [15] aggregates residual connections to the inception architecture to accelerate training, while maintaining the accuracy of similarly expensive inception networks. Huang G. et al. [16] proposed the Dense Convolutional Network (DenseNet), which connects each layer to every other layer downstream, reusing the features of all previous layers to strengthen feature propagation and reduce the vanishing gradient problem. Following the trend of dense connectivity, with CondenseNetV2, Yang et al. [17] ensured that each layer simultaneously learned to carry out the following:
1. Selectively reuse the set of the most important features from preceding layers;
2. Actively update the set of preceding features to increase their utility for later layers, achieving promising performance in image classification (ImageNet) and object detection (MS COCO) in terms of both theoretical efficiency and practical speed.
In recent years, large advancements have been made in image classification tasks [18]. Therefore, CNNs have great value when there is a need to identify images. However, normally, this feature is not useful when it is used alone. It can be combined with a region proposal network (RPN) and perform traditional object detection.
The traditional object detection method consists of generating region proposals first using an RPN and then classifying each proposal into different object categories [19]. This is the case of R-CNN [20]. Nevertheless, this process is normally very computationally costly. In order to tackle this issue, different iterations of R-CNN have been proposed. Girshick et al. [21] improved their original R-CNN to be faster and more accurate. Ren et al. [22] improved this by introducing an RPN that shares full-image convolutional features with the detection network, thus enabling nearly cost-free region proposals. The Faster R-CNN architecture has achieved good results in object detection tasks. For example, Fu et al. [23] and Song et al. [24] used the Faster R-CNN based on ZFNet and VGG16, respectively, to detect kiwifruits in order to enable robots to pick them up.
The other object detection approach treats the task as a regression or classification problem and adopts a unified framework to achieve the final results (categories and locations) directly. Redmon et al. [25] predicted bounding boxes and their associated class probabilities directly from full images in one evaluation. They called this new approach to object detection You Only Look Once (YOLO). The Single-Shot Multibox Detector (SSD) [26] discretizes the output space of bounding boxes into a set of default boxes, adjusting them by the scores generated for the presence of each object category in each default box in order to better match the object shape. CenterNet, proposed by Duan et al. [27], presents an efficient solution based on the detection of each object as a triplet of keypoints rather than a pair, improving both precision and recall.
However, each of these methods has its own issues: the traditional object detection techniques require a high computational power, whereas the single-stage methods do not have the same level of accuracy as the traditional techniques. In 2017, Li et al. [28] proposed a two-stage object detector based on ResNet-101 [12] to address the shortcomings of these types of detectors, that is, the slow speed of these networks due to their heavy-head designs. In 2018, Zhang et al. [29] proposed a novel single-shot-based detector that achieves a better accuracy than the two-stage methods and maintains an efficiency comparable to that of the one-stage methods. Examples of the advancements that have been made in object detection tasks in recent years are in reference [30].
In 2019, EfficientDet [31] proposed a new family of object detectors based on EfficientNet backbones and optimized the weighted bi-directional feature pyramid network (BiFPN) and the compound scaling method. In particular, the model EfficientDet-D7 achieved state-of-the-art results at MS COCO. Another example of this appeared in 2022, when Liu et al. [32] presented a network called ConvNeXts, constructed entirely from standard ConvNet modules. These modules are ResNet modules modernized towards the design of a vision transformer, and they compete favorably with transformers in terms of accuracy and scalability.
Additionally, the networks observed in the literature focus on single-task problems: image classification, object detection, image recognition, etc. To the best of our knowledge, there are no or very few examples of CNNs that have been used to simultaneously perform different tasks. Therefore, we see the need for exploring image classification and object localization tasks using the same CNN. The objective of this article is to determine whether both tasks can be performed accurately with a single CNN. Therefore, we propose a custom evaluation loop that merges the cross-entropy loss (Ex) of the classification task and the half mean square error (mse) of the regression task (object localization). We also compare two different loss functions using different Ex and mse loss proportions and determine which method is the best.
2. Materials and Methods
2.1. Convolutional Neural Network
A CNN is a type of deep neural network that uses convolutional layers to extract feature maps from the input image. Usually, the network consists of one input layer, one or more convolutional layers, one fully connected layer and one output layer [33]. In this case, the network has two fully connected layers at the end of the convolutional layers, separating each one from the main branch. This allows the network to perform two different tasks using the same convolutional layers. At the end of one fully connected layer, a softmax layer is connected. This branch performs the classification task, while the other performs the detection task. In Figure 1, the structure of the network can be seen.
Figure 1. Structure of the proposed CNN.
The input layer (in) has dimensions of 100 × 100 × 1. Therefore, the input data consist of a single matrix with dimensions of 100 × 100, which contains the value of each pixel in gray-scale from 0 (black) to 255 (white).
The convolutional layer is the specific layer of the CNN. The convolutional equation used is that shown in Equation (1):
$$y_j = \sum w_{ij} \cdot x_j + b_j \qquad (1)$$
where $y_j$ is the result, $w_{ij}$ is the filter matrix, $x_j$ is the input of the convolutional layer, and $b_j$ is the bias term. In this case, the network features three convolutional layers. The first one has 16 filters with a 5 × 5 size. The second one has 32 filters with a 3 × 3 size. Finally, the third one also has 32 filters with a 3 × 3 size, although it has a stride of one, instead of two like the second layer. Furthermore, the output of this network uses a nonlinear activation function (ReLU), as shown in Equation (2):
$$f(x) = \begin{cases} x, & x \geq 0 \\ 0, & x < 0 \end{cases} \qquad (2)$$
In order to speed up the training and reduce the sensitivity to network initialization, a batch normalization layer is included between each convolutional layer and the ReLU layer. This is achieved by normalizing a mini-batch of data across all observations for each channel independently. The parameters of the model are listed in detail in Table 1.
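Table 1. Parameters and output shapes of the proposed CNN model.

| Layer Name | Previous Layer | Function | Weight Filter Size/Kernels | Padding | Stride | Output Tensor Size | Learnable Parameters |
|---|---|---|---|---|---|---|---|
| in | - | - | - | - | - | 100 × 100 × 1 | - |
| conv1 | in | conv2d | 5 × 5 × 1/16 | same | 1 | 100 × 100 × 16 | 416 |
| bn1 | conv1 | - | - | - | - | 100 × 100 × 16 | 32 |
| relu1 | bn1 | ReLU | - | - | - | 100 × 100 × 16 | - |
| conv2 | relu1 | conv2d | 3 × 3 × 16/32 | same | 2 | 50 × 50 × 32 | 4608 |
| bn2 | conv2 | - | - | - | - | 50 × 50 × 32 | 64 |
| relu2 | bn2 | ReLU | - | - | - | 50 × 50 × 32 | - |
| conv3 | relu2 | conv2d | 3 × 3 × 32/32 | same | 1 | 50 × 50 × 32 | 9216 |
| bn3 | conv3 | - | - | - | - | 50 × 50 × 32 | 64 |
| relu3 | bn3 | ReLU | - | - | - | 50 × 50 × 32 | - |
| fc1 | relu3 | - | - | - | - | 1 × 1 × 2 | 160 k |
| softmax | fc1 | softmax | - | - | - | 1 × 1 × 2 | - |
| fc2 | relu3 | - | - | - | - | 1 × 1 × 2 | 160 k |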
The proposed neural network contains 334.4 k parameters and has a model size of 1.22 MB after training. This network was used because it showed good results in similar tasks. It was considered valuable to use other types of convolutional neural networks, such as VGG-16 and ZFNet, but these networks had too many parameters for this application, and, thus, training would take too long to evaluate the performance of the proposed custom training loop with a custom loss function.
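The parameter total quoted above can be cross-checked against Table 1 with a few lines of arithmetic. The sketch below is written in Python purely for illustration (the work itself uses MATLAB), and the helper names are hypothetical. It assumes that the conv1 count includes a bias term while the conv2 and conv3 counts cover weights only, because that is what the listed values 416, 4608 and 9216 correspond to, that each batch normalization layer has one scale and one offset per channel, and that the bias of the fully connected layers can be ignored since Table 1 rounds them to 160 k.

```python
import math

def conv_params(kernel_h, kernel_w, in_channels, filters, bias=True):
    """Learnable parameters of a 2-D convolution layer."""
    weights = kernel_h * kernel_w * in_channels * filters
    return weights + (filters if bias else 0)

def same_padding_output(size, stride):
    """Spatial output size of a 'same'-padded convolution."""
    return math.ceil(size / stride)

# Layer-by-layer counts, chosen to reproduce the numbers listed in Table 1.
counts = {
    "conv1": conv_params(5, 5, 1, 16, bias=True),    # 416
    "bn1": 2 * 16,                                   # scale + offset per channel
    "conv2": conv_params(3, 3, 16, 32, bias=False),  # 4608 (weights only)
    "bn2": 2 * 32,
    "conv3": conv_params(3, 3, 32, 32, bias=False),  # 9216 (weights only)
    "bn3": 2 * 32,
}

# Spatial size after each convolution: 100 -> 100 -> 50 -> 50.
size = 100
for stride in (1, 2, 1):
    size = same_padding_output(size, stride)

# Each fully connected branch sees the flattened size x size x 32 feature map.
features = size * size * 32
counts["fc1"] = features * 2  # classification branch, about 160 k
counts["fc2"] = features * 2  # localization branch, about 160 k

total = sum(counts.values())
print(total)  # 334400
```

Under these assumptions the sum is 334,400, which matches the 334.4 k stated in the text; almost all of it sits in the two fully connected branches.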
We analyzed the basic structure of the proposed convolutional neural network; in the next subsection, the learning process of the network is discussed.
2.2. Learning Process
The learning process of a deep learning network consists of three steps: data acquisition, data preparation and model training. In the current work, the first step consists of capturing images of the surroundings of the pin. The images are taken using a camera that captures images of 612 × 512 pixels. The images are in gray-scale and are saved as a tiff file.
The second step consists of preparing the data to train the network. The first task is to label the images. After manually identifying the pin in each image, the data are used to generate images starting from the seed images. The identification is made by drawing a rectangle surrounding the pin. The center pixel of the marked rectangle is taken as the location of the pin, which is then used in the training process as the target value. Then, from each seed image, 5 images are obtained. In these images, the position of the pin is the same, but the contrast and the brightness of the images are randomly modified using Equations (3)–(5):
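$$\text{Contrast factor: } C = 1 - 0.2 \cdot \mathrm{rand}, \qquad (3)$$
$$\text{Brightness factor: } B = 0.3 \cdot (\mathrm{rand} - 0.5), \qquad (4)$$
$$I_{ij} = C \cdot Is_{ij} + B \qquad (5)$$
where $\mathrm{rand}$ is a random value between 0 and 1, $Is_{ij}$ is the seed pixel value, and $I_{ij}$ is the resulting pixel value. This is applied to all seed images to obtain 1620 images.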
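As an illustration of Equations (3)–(5), the following Python sketch generates five variants of a toy seed image. It is not the MATLAB code used in this work. It assumes that one random value is drawn per image for the contrast factor and an independent one for the brightness factor (the text does not say whether the two draws are shared), and the pixel scale of the toy image is arbitrary; the function name `augment` is hypothetical.

```python
import random

def augment(seed_image, rng):
    """Apply Equations (3)-(5) to one seed image given as a list of pixel rows."""
    contrast = 1 - 0.2 * rng.random()        # Equation (3): C in (0.8, 1]
    brightness = 0.3 * (rng.random() - 0.5)  # Equation (4): B in [-0.15, 0.15)
    # Equation (5): the same C and B are applied to every pixel of the image.
    return [[contrast * pixel + brightness for pixel in row] for row in seed_image]

rng = random.Random(0)            # fixed seed so the sketch is repeatable
seed = [[0.0, 0.5], [0.75, 1.0]]  # toy 2 x 2 image; real seeds are 612 x 512
variants = [augment(seed, rng) for _ in range(5)]  # 5 images per seed image
```

If every seed image yields exactly five variants, the 1620 images mentioned above would correspond to 324 seed images.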
These images, however, still have a size of 612 × 512. In order to train the network, the images need to be transformed so that their size is 100 × 100. Therefore, each image receives a random transformation, where a 100 × 100 size region is chosen from each image. This is carried out by randomly selecting whether the image has a pin, the chance of which is 50/50. At the end of the transformation, there are 810 images with a pin and 810 without a pin.
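One possible way to realize this random transformation is sketched below in Python. The text does not describe how the 100 × 100 region is sampled, so everything here is an assumption: the function name `choose_crop` is hypothetical, only the labeled center pixel of the pin is considered, and images without a pin receive the position target (0, 0), which is the convention visible in the no-pin rows of Table 4.

```python
import random

WIDTH, HEIGHT, CROP = 612, 512, 100

def choose_crop(pin_x, pin_y, rng):
    """Return (left, top, label, target_x, target_y) for one random crop."""
    with_pin = rng.random() < 0.5  # 50/50 choice between pin and no pin
    if with_pin:
        # Sample a window that is guaranteed to contain the pin center.
        left = rng.randint(max(0, pin_x - CROP + 1), min(pin_x, WIDTH - CROP))
        top = rng.randint(max(0, pin_y - CROP + 1), min(pin_y, HEIGHT - CROP))
        return left, top, 1, pin_x - left, pin_y - top
    while True:
        # Rejection sampling: redraw until the window misses the pin center.
        left = rng.randint(0, WIDTH - CROP)
        top = rng.randint(0, HEIGHT - CROP)
        pin_inside = left <= pin_x < left + CROP and top <= pin_y < top + CROP
        if not pin_inside:
            return left, top, 0, 0, 0

rng = random.Random(1)
crops = [choose_crop(300, 250, rng) for _ in range(200)]
```

A fixed 50/50 draw per image only gives an approximately even split; obtaining exactly 810 images of each kind, as reported above, would need an additional balancing step that is not shown here.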
The final step of the training consists of the model training itself. In this case, a custom training loop is used. MATLAB is the software chosen to develop the different algorithms that are involved in this work. This software has different tools to develop and train deep neural networks. One of these functionalities is to train custom training loops, updating the learnable parameters of the network using different solvers. In this case, the Adam (adaptive moment estimation) solver is used [34].
In this process, each mini-batch of data is evaluated using the modeloGradients function. The modeloGradients function takes the following as inputs: the network and a mini-batch of input data, with the corresponding targets T1 and T2 containing the labels and positions, respectively. Then, it returns the gradients of the loss with respect to the learnable parameters, the updated network state and the corresponding loss.
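The loss of each mini-batch $\theta$ is calculated by adding the cross-entropy loss of the classification task and the half mean squared error, with the latter multiplied by factor $\lambda = 0.1$, following Equation (6):
$$\mathrm{loss}_{\theta} = \mathrm{loss}_{Ex,\theta} + \lambda \cdot \mathrm{loss}_{mse,\theta} \qquad (6)$$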
The cross-entropy loss (Ex) of each mini-batch $\theta$ is calculated using Equation (7):
$$\mathrm{loss}_{Ex,\theta} = -\frac{1}{N} \sum_{n=1}^{N} \sum_{i=1}^{K} t_{ni} \ln y_{ni} \qquad (7)$$
where $N$ is the number of samples, $K$ is the number of classes, $t_{ni}$ is the indicator showing that the $n$th sample belongs to the $i$th class, and $y_{ni}$ is the output of sample $n$ for class $i$. That is, $y_{ni}$ is the probability that the network associates the $n$th input with class $i$.
The half mean squared error (mse) operation computes the half mean squared error loss between the network predictions and target values for regression tasks. The loss of each mini-batch $\theta$ is calculated using the following Equation (8):
$$\mathrm{loss}_{mse,\theta} = \frac{1}{2N} \sum_{i=1}^{M} (X_i - T_i)^2 \qquad (8)$$
where $X_i$ is the network prediction, $T_i$ is the target value, $M$ is the total number of responses in $X$ (across all observations), and $N$ is the total number of observations in $X$.
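Equations (6)–(8) can be combined into a small reference computation. The Python sketch below is illustrative only and is not the modeloGradients function used in this work; the function names and the toy mini-batch are assumptions, and the gradient computation is omitted.

```python
import math

def cross_entropy(targets, probs):
    """Equation (7): targets and probs are N lists of K values each."""
    n = len(targets)
    total = 0.0
    for t_row, y_row in zip(targets, probs):
        for t, y in zip(t_row, y_row):
            if t:  # terms with t = 0 contribute nothing
                total += t * math.log(y)
    return -total / n

def half_mse(predictions, targets):
    """Equation (8): predictions and targets are N lists of responses."""
    n = len(predictions)
    squared = sum((x - t) ** 2
                  for x_row, t_row in zip(predictions, targets)
                  for x, t in zip(x_row, t_row))
    return squared / (2 * n)

def combined_loss(t1, y1, t2, y2, lam=0.1):
    """Equation (6): cross-entropy plus lambda times the half-MSE."""
    return cross_entropy(t1, y1) + lam * half_mse(y2, t2)

# Toy mini-batch of two images: one with a pin, one without.
t1 = [[0, 1], [1, 0]]          # one-hot labels in the order (no pin, pin)
y1 = [[0.1, 0.9], [0.8, 0.2]]  # softmax outputs
t2 = [[47, 25], [0, 0]]        # target pin positions
y2 = [[45, 26], [1, -1]]       # predicted positions
loss = combined_loss(t1, y1, t2, y2)
```

The same weighting can be checked against the reported training values: with a classification loss of 0 and a regression loss of 21.087, Equation (6) gives 0.1 · 21.087 ≈ 2.11, the total loss listed in Table 3.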
Afterwards, the calculated gradients are used to update the learnable parameters of the network. This process continues until the training ends, which is when the training reaches 200 epochs. Each mini-batch consists of 60 elements. Therefore, 5400 iterations are performed. The parameters of the Adam solver are listed in Table 2.
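The iteration count follows from the dataset size, assuming that all 1620 training images are used once per epoch:

```latex
\[
\frac{1620 \text{ images}}{60 \text{ images per mini-batch}} = 27 \text{ iterations per epoch},
\qquad
27 \times 200 \text{ epochs} = 5400 \text{ iterations}.
\]
```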
Table 2. Parameter values of the Adam solver.

| Parameter | Value |
|---|---|
| Learn rate | 0.001 |
| Gradient decay factor | 0.9 |
| Squared gradient decay factor | 0.999 |
| Epsilon * | $10^{-8}$ |

* Small constant for preventing divide-by-zero errors.
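To show how the values in Table 2 enter the parameter update, the following Python sketch applies the textbook form of the Adam update to a single scalar parameter. It is a minimal illustration and not the MATLAB solver used in this work, whose internal details may differ; the quadratic objective is a stand-in for the network loss and the function name `adam_step` is hypothetical.

```python
import math

LEARN_RATE = 0.001
BETA1 = 0.9      # gradient decay factor
BETA2 = 0.999    # squared gradient decay factor
EPSILON = 1e-8   # prevents division by zero

def adam_step(param, grad, m, v, step):
    """One Adam update for a single scalar parameter; step counts from 1."""
    m = BETA1 * m + (1 - BETA1) * grad
    v = BETA2 * v + (1 - BETA2) * grad ** 2
    m_hat = m / (1 - BETA1 ** step)  # bias-corrected first moment
    v_hat = v / (1 - BETA2 ** step)  # bias-corrected second moment
    param = param - LEARN_RATE * m_hat / (math.sqrt(v_hat) + EPSILON)
    return param, m, v

# Minimise f(w) = (w - 3)^2 as a stand-in for the network loss.
w, m, v = 0.0, 0.0, 0.0
for step in range(1, 5401):  # 5400 iterations, as in the text
    grad = 2 * (w - 3)
    w, m, v = adam_step(w, grad, m, v, step)
```

In the very first step the bias-corrected moments reduce to the gradient and its square, so the parameter moves by almost exactly the learn rate (0.001) regardless of the gradient magnitude.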
During the training, a validation evaluation is performed. This is carried out to ensure that the training is performing well and that the results are converging. To perform this task, a new dataset is created following the same steps as those used for the training data. In this case, 3 images are obtained from each seed image in order to speed up the validation process. This dataset is evaluated in the same way as the training dataset, in groups of 60 data samples. At the end of each training epoch, all the validation data are evaluated, and the average loss value is returned by the algorithm.
The first loss function is based on a constant ratio between the two different losses. Regarding the second loss function, we only want to perform the localization task when the network detects an object in order to evaluate whether this approach improves the effectiveness of the network. This new loss function is also based on the cross-entropy loss of the classification task and the half mean square error of the regression task. However, the combination of both is not a simple constant ratio, as with the first loss function. At first, we thought that the loss function only needed to take into account the cross-entropy loss when the classification was not performed correctly, because trying to locate a pin in an image that does not have one would not be correct. Therefore, the loss function that was proposed included the target values of the classification task, as well as the network prediction. However, the use of the predictions to calculate the loss led the network to classify all images in one group due to the learnable parameters being related to the predictions. Because of this, it was decided that the network predictions should not be used. Consequently, only the target values of the classification task are used. In the images where there is no pin, only the cross-entropy loss is used to calculate the overall loss. In the other case, the half mean squared error is also computed. This is carried out with the objective of only taking into account the localization task when there is a pin to locate. All this is performed in each image $\mu$ of the mini-batch $\theta$ using Equation (9):
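$$\mathrm{loss}_{\theta} = \mathrm{loss}_{Ex,\theta} + \frac{1}{N} \sum_{\mu} t^{\mu}_{pc1} \cdot \mathrm{loss}_{mse,\mu} \qquad (9)$$
where $t^{\mu}_{pc1}$ is the target probability that image $\mu$ contains a pin, $\mathrm{loss}_{Ex,\theta}$ is the cross-entropy loss of the mini-batch $\theta$ (Equation (7)), $\mathrm{loss}_{mse,\mu}$ is the half mean square error loss of the image $\mu$ (Equation (8)), and $N$ is the number of images in the mini-batch $\theta$.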
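A minimal Python sketch of Equation (9) is given below for illustration; it is not the MATLAB implementation used in this work. It assumes that the target probability of a pin is the "Pin" entry of the one-hot target T1 (index 1 in the ordering no pin, pin) and that the per-image half mean square error is Equation (8) evaluated with a single observation. The function names and the toy data are hypothetical.

```python
import math

def cross_entropy(targets, probs):
    """Equation (7) for N one-hot targets and N softmax outputs."""
    total = sum(t * math.log(y)
                for t_row, y_row in zip(targets, probs)
                for t, y in zip(t_row, y_row) if t)
    return -total / len(targets)

def gated_loss(t1, y1, t2, y2, pin_index=1):
    """Equation (9): per-image half-MSE weighted by the target pin label."""
    n = len(t1)
    localization = 0.0
    for t_row, target_xy, pred_xy in zip(t1, t2, y2):
        # Half-MSE of a single image (Equation (8) with N = 1).
        image_mse = 0.5 * sum((x - t) ** 2 for x, t in zip(pred_xy, target_xy))
        localization += t_row[pin_index] * image_mse
    return cross_entropy(t1, y1) + localization / n

t1 = [[0, 1], [1, 0]]          # first image has a pin, second does not
y1 = [[0.1, 0.9], [0.8, 0.2]]  # softmax outputs
t2 = [[47, 25], [0, 0]]        # target pin positions
y2 = [[45, 26], [1, -1]]       # predicted positions
loss = gated_loss(t1, y1, t2, y2)

# Changing the predicted position of the no-pin image leaves the loss unchanged.
loss_shifted = gated_loss(t1, y1, t2, [[45, 26], [30, 40]])
```

Because the weight comes from the target label and not from the network output, the position predicted for an image without a pin has no influence on the loss, which is the behavior described in the text.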
As with the first proposed loss in this article, this loss is used to calculate the gradients of the loss with respect to the learnable parameters in order to update the latter to improve the predictions of the network. The same base network is used to compare the obtained results.
After finalizing the training, the same validation data are used to evaluate the training. At this point, 10 randomly selected images are chosen to evaluate the network performance. The same images are used to evaluate the training of the second loss function. Therefore, both results are directly comparable and allow one to conclude whether the proposed method is effective and which loss function has the best performance.
3. Results
In this section, the results of the investigation are presented. First, the network is trained using the presented loss function. The loss during the training and the average validation loss are presented in Figure 2. The quick drop that appears in the first iterations suggests that the classification of the images is optimized early in the training. The values obtained at the end of the training are collected in Table 3.
Table 3. Training properties values.

| Property | Value |
|---|---|
| Epoch | 200 |
| Iteration | 5400 |
| Training time | 54 min 37 sec |
| Loss | 2.11 |
| Classification loss | 0 |
| Regression loss | 21.087 |
| Validation loss | 1.87 |
| Validation classification loss | $2 \times 10^{-4}$ |
| Validation regression loss | 18.699 |
Figure 2. The loss during the training process (red) and the average validation error obtained for each epoch of the training data (blue) for the first proposed loss function. In this figure, the downward trend of the loss value can be seen. The loss value continues to drop at a low but constant rate after the quick drop of the early iterations.
Looking at the results in Table 4, it can be seen that the network achieves very good results in the classification task, labeling most of the images correctly. After analyzing all the images in the validation dataset, 10 images were randomly selected to illustrate the training results. Regarding the pin localization, the results can be improved. Most of the time, the network is able to locate the pin with decent precision. However, the localization task fails when there is no pin in the image, for example, as shown in images 816, 165 and 836 in Figure 3. It can also be noted that image 357 is not classified correctly, although the localization task is performed accurately.
Table 4. The values of the analyzed validation images.

| Image Index | T1 No Pin | T1 Pin | Y1 No Pin | Y1 Pin | T2 x | T2 y | Y2 x | Y2 y | Loss | Classification | Regression |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 774 | 0 | 1 | 0 | 1 | 47 | 25 | 44.19 | 26.49 | 0.505 | 0 | 5.053 |
| 34 | 1 | 0 | 1 | 0 | 0 | 0 | −0.40 | −2.47 | 0.314 | 0 | 3.142 |
| 816 | 1 | 0 | 1 | 0 | 0 | 0 | 29.24 | 33.41 | 98.58 | 0 | 985.8 |
| 869 | 0 | 1 | 0 | 1 | 43 | 55 | 37.63 | 47.26 | 4.438 | 0 | 44.384 |
| 11 | 0 | 1 | 0 | 1 | 60 | 66 | 55.52 | 64.16 | 1.173 | 0 | 11.728 |
| 165 | 1 | 0 | 1 | 0 | 0 | 0 | 10.94 | 15.25 | 17.624 | 0 | 176.24 |
| 836 | 1 | 0 | 1 | 0 | 0 | 0 | 10.53 | 7.06 | 8.036 | 0 | 80.364 |
| 357 | 0 | 1 | 1 | 0 | 31 | 25 | 21.04 | 16.74 | 44.416 | 36.044 | 83.724 |
| 697 | 0 | 1 | 0 | 1 | 28 | 31 | 33.59 | 40.84 | 6.401 | 0 | 64.014 |
| 827 | 0 | 1 | 0 | 1 | 28 | 29 | 21.69 | 22.37 | 4.193 | $1.59 \times 10^{-4}$ | 41.93 |

T1 and T2 are the target values of each image. Y1 and Y2 are the predictions made by the network. Loss is the total loss; Classification is the cross-entropy error; Regression is the half mean square error. All the values smaller than $10^{-4}$ are considered null.
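The Loss and Regression columns of Table 4 can be recomputed from the other columns, which is a useful consistency check on Equations (6) and (8). The Python sketch below does this for three rows; it assumes that the per-image regression value is half the sum of the squared coordinate errors and that the total follows Equation (6) with a factor of 0.1. The helper name is hypothetical, and small deviations are expected because the table shows rounded predictions.

```python
rows = {
    # index: (T2, Y2, loss, classification, regression) copied from Table 4
    774: ((47, 25), (44.19, 26.49), 0.505, 0.0, 5.053),
    357: ((31, 25), (21.04, 16.74), 44.416, 36.044, 83.724),
    816: ((0, 0), (29.24, 33.41), 98.58, 0.0, 985.8),
}

def half_squared_error(target, prediction):
    """Equation (8) for a single image (N = 1)."""
    return 0.5 * sum((p - t) ** 2 for p, t in zip(prediction, target))

checks = {}
for index, (t2, y2, loss, classification, regression) in rows.items():
    recomputed_regression = half_squared_error(t2, y2)
    recomputed_loss = classification + 0.1 * regression  # Equation (6)
    checks[index] = (recomputed_regression, recomputed_loss)
    print(index, round(recomputed_regression, 2), round(recomputed_loss, 3))
```

For these rows the recomputed values agree with the tabulated ones to within the rounding of the displayed predictions. The check also makes the failure mode discussed in the text visible: for image 816 the classification error is zero, yet the regression term alone produces the largest total loss in the table.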
Figure 3. Images used to analyze the performance of the network. The red marking shows the real position of the pin (manually labeled), whereas the blue marking shows the prediction of the network.
The same observation can be made with the second proposed loss. In Figure 4, the loss during the training and the average validation loss are presented.
Figure 4. The loss during the training process (red) and the average validation error obtained for each epoch of the training data (blue) for the second proposed loss function. In this figure, the downward trend of the loss value can be seen. Comparing the values with those in Figure 2, the loss value is higher at the beginning, although at the end of the training process, the values converge, as can be seen in Table 5.
Table 5. Training properties values.

| Property | Value |
|---|---|
| Epoch | 200 |
| Iteration | 5400 |
| Training time | 52 min 17 sec |
| Loss | 4.497 |
| Classification loss | 0 |
| Regression loss | 8.431 |
| Validation loss | 6.461 |
| Validation classification loss | 0 |
| Validation regression loss | 12.922 |