A New Loss Function for Simultaneous Object Localization and Classification

Author: Sánchez Chica, Ander,Ugartemendia Telleria, Beñat,Zulueta Guerrero, Ekaitz,Fernández Gámiz, Unai,Gómez Hidalgo, Javier María

Publisher: MDPI

Year: 2023

DOI: 10.3390/math11051205

Source: https://addi.ehu.eus/bitstream/10810/60344/1/mathematics-11-01205.pdf

Citation: Sanchez-Chica, A.; Ugartemendia-Telleria, B.; Zulueta, E.; Fernandez-Gamiz, U.; Gomez-Hidalgo, J.M. A New Loss Function for Simultaneous Object Localization and Classification. Mathematics 2023, 11, 1205. https://doi.org/10.3390/math11051205
Academic Editors: Debiao Meng and Shui Yu
Received: 25 January 2023
Revised: 23 February 2023
Accepted: 25 February 2023
Published: 1 March 2023
Copyright: © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Ander Sanchez-Chica 1, Beñat Ugartemendia-Telleria 1, Ekaitz Zulueta 1,*, Unai Fernandez-Gamiz 2 and Javier Maria Gomez-Hidalgo 3
1 System Engineering and Automation Control Department, University of the Basque Country (UPV/EHU), Nieves Cano 12, 01006 Vitoria-Gasteiz, Spain
2 Department of Nuclear and Fluid Mechanics, University of the Basque Country (UPV/EHU), Nieves Cano 12, 01006 Vitoria-Gasteiz, Spain
3 MERCEDES BENZ España, Las arenas 1, 10152 Vitoria-Gasteiz, Spain
* Correspondence: [email protected]
Abstract: Robots play a pivotal role in the manufacturing industry. This has led to the development of computer vision. Since AlexNet won ILSVRC, convolutional neural networks (CNNs) have achieved state-of-the-art status in this area. In this work, a novel method is proposed to simultaneously detect and predict the localization of objects using a custom loop method and a CNN, performing two of the most important tasks in computer vision with a single method. Two different loss functions are proposed to evaluate the method and compare the results. The obtained results show that the network is able to perform both tasks accurately, classifying images correctly and locating objects precisely. Regarding the loss functions, when the target classification values are computed, the network performs better in the localization task. Following this work, improvements are expected to be made in the localization task of networks by refining the training processes of the networks and loss functions.
Keywords: image classification; object detection; deep learning; deep convolutional neural networks; computer vision; custom training loop
MSC: 49J05; 49J15
1. Introduction
Nowadays, robots are essential to the manufacturing industry. The use of robots has helped the manufacturing industry to manufacture products more efficiently, saving both costs and time. Despite the fact that an increasing need for robots has been observed in all industrial sectors in recent years, the electronics industry has been the main customer of industrial robots since 2020, when it overtook the automotive industry. However, the latter still demands 80,000 robots a year; hence, it is still an important sector for robot manufacturers. Industrial robot manufacturers are making every effort to design and develop safe and human-friendly robots. This is spurred on by the fact that small- and medium-sized companies are increasing their use of industrial robots due to the availability of affordable solutions and easy-to-use collaborative robots. Hence, collaborative solutions, where humans and robots work together, are becoming the new frontier in industrial robotics [1,2]. The use of collaborative robots is also supported by the current trend of automation and data exchange in manufacturing industries, also called Industry 4.0 [3].
In the case of the automotive industry, robots are used mainly in the manufacturing process. At the beginning of the 20th century, when chain production was introduced by the Ford Model T, cars were handmade. Nowadays, this process is mainly automatic. However, there are still tasks where humans need to intervene. In this context, collaborative robots can help workers improve the efficiency and reduce the manufacturing faults of production lines [4]. Robots also help in other areas, such as neurosurgery [5]. Since collaborative robots need to be aware of their surroundings, computer vision can help robots to detect their environment and object positions.
Image classification, object detection and semantic segmentation are the main tasks in computer vision. Since LeCun et al. [6] proposed LeNet-5 in 1998 for document recognition, convolutional neural networks (CNNs) have become the state-of-the-art techniques for these tasks. The explosion of deep learning has led to the need for high-quality, diverse and structured image datasets. In 2010, ImageNet [7] was presented as a solution to this problem. Following the path started by LeNet-5, in 2012, AlexNet [8] won the ImageNet Large-Scale Visual Recognition Challenge (ILSVRC) [9] by outperforming previous image-classifying state-of-the-art techniques. This success led researchers to improve the performance of CNNs. Following this trend of making deeper and larger networks, Zeiler et al. [10] proposed ZFNet to understand and improve the results obtained by Krizhevsky et al. [8]. In 2014, Simonyan et al. [11] proposed a CNN with very small convolution filters to evaluate the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting.
To tackle the difficulty of training very deep neural networks, He et al. [12] presented a residual learning framework called Residual Network (ResNet), which allows for an increase in the depth, while maintaining the complexity of the network. ResNet cross-layer connections try to solve the gradient diffusion problem that very deep convolutional networks, such as VGGNet and ZFNet, have. However, training very deep residual networks is extremely costly. Thus, Zagoruyko et al. [13] conducted an experimental study on the architecture of ResNet blocks. Based on this study, they proposed a novel architecture, where they decreased the depth and increased the width of residual networks. The resulting network structures, called wide residual networks (WRNs), outperformed their thin and very deep counterparts.
GoogLeNet [14], also called Inception-v1, combines inception modules with conventional convolution modules to improve the utilization of the computing resources inside the network, hence increasing the depth and width of the network, while keeping the computational budget constant by varying the sizes of the convolution filters. Inception-v4 [15] aggregates residual connections to the inception architecture to accelerate training, while maintaining the accuracy of similarly expensive inception networks. Huang G. et al. [16] proposed the Dense Convolutional Network (DenseNet), which connects each layer to every other layer downstream, reusing the features of all previous layers to strengthen feature propagation and reduce the vanishing gradient problem. Following the trend of dense connectivity, with CondenseNetV2, Yang et al. [17] ensured that each layer simultaneously learned to carry out the following:
1. Selectively reuse the set of the most important features from preceding layers;
2. Actively update the set of preceding features to increase their utility for later layers, achieving promising performance in image classification (ImageNet) and object detection (MS COCO) in terms of both theoretical efficiency and practical speed.
In recent years, large advancements have been made in image classification tasks [18]. Therefore, CNNs have great value when there is a need to identify images. However, normally, this feature is not useful when it is used alone. It can be combined with a region proposal network (RPN) and perform traditional object detection.
The traditional object detection method consists of generating region proposals first using an RPN and then classifying each proposal into different object categories [19]. This is the case of R-CNN [20]. Nevertheless, this process is normally very computationally costly. In order to tackle this issue, different iterations of R-CNN have been proposed. Girshick et al. [21] improved their original R-CNN to be faster and more accurate. Ren et al. [22] improved this by introducing an RPN that shares full-image convolutional features with the detection network, thus enabling nearly cost-free region proposals. The Faster R-CNN architecture has achieved good results in object detection tasks. For example, Fu et al. [23] and Song et al. [24] used the Faster R-CNN based on ZFNet and VGG16, respectively, to detect kiwifruits in order to enable robots to pick them up.
The other family of object detection methods treats the task as a regression or classification problem and adopts a unified framework to achieve the final results (categories and locations) directly. Redmon et al. [25] predicted bounding boxes and their associated class probabilities directly from full images in one evaluation. They called this new approach to object detection You Only Look Once (YOLO). The Single-Shot Multibox Detector (SSD) [26] discretizes the output space of bounding boxes into a set of default boxes, adjusting them by the scores generated for the presence of each object category in each default box in order to better match the object shape. CenterNet, proposed by Duan et al. [27], presents an efficient solution based on the detection of each object as a triplet of keypoints rather than a pair, improving both precision and recall.
However, each of these methods has its own issues: the traditional object detection techniques require a high computational power, whereas the single-stage methods do not have the same level of accuracy as the traditional techniques. In 2017, Li et al. [28] proposed a two-stage object detector based on ResNet-101 [12] to address the shortcomings of these types of detectors, that is, the slow speed of these networks due to their heavy-head designs. In 2018, Zhang et al. [29] proposed a novel single-shot-based detector that achieves a better accuracy than the two-stage methods and maintains an efficiency comparable to that of the one-stage methods. Examples of the advancements that have been made in object detection tasks in recent years are in reference [30].
In 2019, EfficientDet [31] proposed a new family of object detectors based on EfficientNet backbones and optimized the weighted bi-directional feature pyramid network (BiFPN) and the compound scaling method. In particular, the model EfficientDet-D7 achieved state-of-the-art results on MS COCO. Another example of this appeared in 2022, when Liu et al. [32] presented a network called ConvNeXts, constructed entirely from standard ConvNet modules. These modules are ResNet modules modernized towards the design of a vision transformer, and they compete favorably with transformers in terms of accuracy and scalability.
Additionally, the networks observed in the literature focus on single-task problems: image classification, object detection, image recognition, etc. To the best of our knowledge, there are no or very few examples of CNNs that have been used to simultaneously perform different tasks. Therefore, we see the need for exploring image classification and object localization tasks using the same CNN. The objective of this article is to determine whether both tasks can be performed accurately with a single CNN. Therefore, we propose a custom evaluation loop that merges the cross-entropy loss (Ex) of the classification task and the half mean square error (mse) of the regression task (object localization). We also compare two different loss functions using different Ex and mse loss proportions and determine which method is the best.
2. Materials and Methods
2.1. Convolutional Neural Network
A CNN is a type of deep neural network that uses convolutional layers to extract feature maps from the input image. Usually, the network consists of one input layer, one or more convolutional layers, one fully connected layer and one output layer [33]. In this case, the network has two fully connected layers at the end of the convolutional layers, separating each one from the main branch. This allows the network to perform two different tasks using the same convolutional layers. At the end of one fully connected layer, a softmax layer is connected. This branch performs the classification task, while the other performs the detection task. In Figure 1, the structure of the network can be seen.
Figure 1. Structure of the proposed CNN.
The input layer, named in, has a dimension of 100 × 100 × 1. Therefore, the input data consist of a single matrix with dimensions of 100 × 100, which contains the value of each pixel in gray-scale from 0 (black) to 255 (white).
The convolutional layer is the specific layer of the CNN. The convolutional equation used is that shown in Equation (1):
$$ y_j = \sum w_{ij} \cdot x_j + b_j \tag{1} $$
where $y_j$ is the result, $w_{ij}$ is the filter matrix, $x_j$ is the input of the convolutional layer, and $b_j$ is the bias term. In this case, the network features three convolutional layers. The first one has 16 filters with a 5 × 5 size. The second one has 32 filters with a 3 × 3 size. Finally, the third one also has 32 filters with a 3 × 3 size, although it has a stride of one, instead of two like the second layer. Furthermore, the output of this network uses a nonlinear activation function (ReLU), as shown in Equation (2):
$$ f(x) = \begin{cases} x, & x \ge 0 \\ 0, & x < 0 \end{cases} \tag{2} $$
In order to speed up the training and reduce the sensitivity to network initialization, a batch normalization layer is included between each convolutional layer and the ReLU layer. This is achieved by normalizing a mini-batch of data across all observations for each channel independently. The parameters of the model are listed in detail in Table 1.
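As a rough illustration of the two operations just described, the following Python sketch applies per-channel batch normalization followed by the ReLU of Equation (2). It is a minimal sketch and not the authors' MATLAB code: the function names, the `eps` value and the omission of the learnable scale and offset are assumptions made for this example.

```python
import math

def relu(x):
    """Equation (2): keep non-negative values, clamp negative values to zero."""
    return x if x >= 0 else 0.0

def batch_norm_channel(values, eps=1e-5):
    """Normalize the activations of one channel across a mini-batch.

    `values` holds every activation of a single channel over all observations.
    The learnable scale and offset are fixed at 1 and 0 (assumption of this sketch),
    and `eps` is an assumed small constant that avoids division by zero.
    """
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / math.sqrt(var + eps) for v in values]

channel = [2.0, 4.0, 6.0, 8.0]            # toy activations of one channel
normalized = batch_norm_channel(channel)   # zero mean, roughly unit variance
activated = [relu(v) for v in normalized]  # negative half is clamped to zero
```

After normalization, roughly half of the toy activations are negative, so the ReLU zeroes them while leaving the positive ones unchanged.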
Table 1. Parameters and output shapes of the proposed CNN model.
Layer Name  Previous Layer  Function  Weight Filter Size/Kernels  Padding  Stride  Output Tensor Size  Learnable Parameters
in          -               -         -                           -        -       100 × 100 × 1       -
conv1       in              conv2d    5 × 5 × 1/16                same     1       100 × 100 × 16      416
bn1         conv1           -         -                           -        -       100 × 100 × 16      32
relu1       bn1             ReLU      -                           -        -       100 × 100 × 16      -
conv2       relu1           conv2d    3 × 3 × 16/32               same     2       50 × 50 × 32        4608
bn2         conv2           -         -                           -        -       50 × 50 × 32        64
relu2       bn2             ReLU      -                           -        -       50 × 50 × 32        -
conv3       relu2           conv2d    3 × 3 × 32/32               same     1       50 × 50 × 32        9216
bn3         conv3           -         -                           -        -       50 × 50 × 32        64
relu3       bn3             ReLU      -                           -        -       50 × 50 × 32        -
fc1         relu3           -         -                           -        -       1 × 1 × 2           160 k
softmax     fc1             softmax   -                           -        -       1 × 1 × 2           -
fc2         relu3           -         -                           -        -       1 × 1 × 2           160 k
The proposed neural network contains 334.4 k parameters and has a model size of 1.22 MB after training. This network was used because it showed good results in similar tasks. It was considered valuable to use other types of convolutional neural networks, such as VGG-16 and ZFNet, but these networks had too many parameters for this application, and, thus, training would take too long to evaluate the performance of the proposed custom training loop with a custom loss function.
We analyzed the basic structure of the proposed convolutional neural network; in the next subsection, the learning process of the network is discussed.
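The output sizes and the 334.4 k total reported above can be cross-checked with a few lines of arithmetic. The sketch below assumes that a "same"-padded convolution yields ceil(size / stride) outputs per side, and it mirrors Table 1 as printed: the conv1 entry includes its 16 biases, while the conv2, conv3 and fully connected entries match the weight counts only. Whether biases are present in the actual model is not stated in the text, so the `has_bias` flags are an inference made for this example.

```python
import math

def same_padding_output(size, stride):
    """Spatial size after a 'same'-padded convolution: ceil(size / stride)."""
    return math.ceil(size / stride)

# (name, kernel, in_channels, filters, stride, has_bias); flags inferred from Table 1.
conv_layers = [
    ("conv1", 5, 1, 16, 1, True),    # 5*5*1*16 + 16 = 416
    ("conv2", 3, 16, 32, 2, False),  # 3*3*16*32 = 4608 (weights only)
    ("conv3", 3, 32, 32, 1, False),  # 3*3*32*32 = 9216 (weights only)
]

size, total = 100, 0
for name, k, c_in, filters, stride, has_bias in conv_layers:
    size = same_padding_output(size, stride)
    total += k * k * c_in * filters + (filters if has_bias else 0)
    total += 2 * filters  # batch normalization: one scale and one offset per channel

features = size * size * 32   # 50 * 50 * 32 values feed each fully connected layer
fc_params = features * 2      # two outputs per head; Table 1 lists this as 160 k
total += 2 * fc_params        # fc1 (classification) and fc2 (localization)
print(size, total)
```

Under these assumptions the final feature map is 50 × 50 and the total is 334,400, which agrees with the 334.4 k stated in the text.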
2.2. Learning Process
The learning process of a deep learning network consists of three steps: data acquisition, data preparation and model training. In the current work, the first step consists of capturing images of the surroundings of the pin. The images are taken using a camera that captures images of 612 × 512 pixels. The images are in gray-scale and are saved as a tiff file.
The second step consists of preparing the data to train the network. The first task is to label the images. After manually identifying the pin in each image, the data are used to generate images starting from the seed images. The identification is made by drawing a rectangle surrounding the pin. The center pixel of the marked rectangle is taken as the location of the pin, which is then used in the training process as the target value. Then, from each seed image, 5 images are obtained. In these images, the position of the pin is the same, but the contrast and the brightness of the images are randomly modified using Equations (3)–(5):
$$ \text{Contrast factor: } C = 1 - 0.2 \cdot \mathrm{rand} \tag{3} $$
$$ \text{Brightness factor: } B = 0.3 \cdot (\mathrm{rand} - 0.5) \tag{4} $$
$$ I_{ij} = C \cdot Is_{ij} + B \tag{5} $$
where $\mathrm{rand}$ is a random value between 0 and 1, $Is_{ij}$ is the seed pixel value, and $I_{ij}$ is the resulting pixel value. This is applied to all seed images to obtain 1620 images.
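Equations (3)–(5) can be sketched in Python as follows. This is an illustrative sketch rather than the authors' implementation: it assumes that the contrast and brightness factors use two independent random draws, that pixel values are scaled to [0, 1], and that no clipping is applied afterwards. None of these details are specified in the text, and the `augment` helper is a hypothetical name.

```python
import random

def augment(seed_pixels, rng):
    """Apply Equations (3)-(5) to the pixel values of one seed image."""
    contrast = 1 - 0.2 * rng.random()        # Equation (3): C lies in (0.8, 1]
    brightness = 0.3 * (rng.random() - 0.5)  # Equation (4): B lies in [-0.15, 0.15)
    pixels = [contrast * p + brightness for p in seed_pixels]  # Equation (5)
    return pixels, contrast, brightness

rng = random.Random(0)               # fixed seed so the sketch is reproducible
seed_pixels = [0.0, 0.25, 0.5, 1.0]  # toy pixel values, assumed scaled to [0, 1]
variants = [augment(seed_pixels, rng) for _ in range(5)]  # 5 images per seed image
```

Each seed image yields five variants that share the pin position but differ in contrast and brightness, as described above.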
These images, however, still have a size of 612 × 512. In order to train the network, the images need to be transformed so that their size is 100 × 100. Therefore, each image receives a random transformation, where a 100 × 100 size region is chosen from each image. This is carried out by randomly selecting whether the image has a pin, the chance of which is 50/50. At the end of the transformation, there are 810 images with a pin and 810 without a pin.
The final step of the training consists of the model training itself. In this case, a custom training loop is used. MATLAB is the software chosen to develop the different algorithms that are involved in this work. This software has different tools to develop and train deep neural networks. One of these functionalities is to train custom training loops, updating the learnable parameters of the network using different solvers. In this case, the Adam (adaptive moment estimation) solver is used [34].
In this process, each mini-batch of data is evaluated using the modeloGradients function. The modeloGradients function takes the following as inputs: the network and a mini-batch of input data, with the corresponding targets T1 and T2 containing the labels and positions, respectively. Then, it returns the gradients of the loss with respect to the learnable parameters, the updated network state and the corresponding loss.
The loss of each mini-batch $\theta$ is calculated by adding the cross-entropy loss of the classification task and the half mean squared error, with the latter multiplied by factor $\lambda = 0.1$, following Equation (6):
$$ loss_{\theta} = loss_{Ex,\theta} + \lambda \cdot loss_{mse,\theta} \tag{6} $$
The cross-entropy loss (Ex) of each mini-batch $\theta$ is calculated using Equation (7):
$$ loss_{Ex,\theta} = -\frac{1}{N} \sum_{n=1}^{N} \sum_{i=1}^{K} t_{ni} \ln y_{ni} \tag{7} $$
where $N$ is the number of samples, $K$ is the number of classes, $t_{ni}$ is the indicator showing that the $n$th sample belongs to the $i$th class, and $y_{ni}$ is the output of sample $n$ for class $i$. That is, $y_{ni}$ is the probability that the network associates the $n$th input with class $i$.
The half mean squared error (mse) operation computes the half mean squared error loss between the network predictions and target values for regression tasks. The loss of each mini-batch $\theta$ is calculated using the following Equation (8):
$$ loss_{mse,\theta} = \frac{1}{2N} \sum_{i=1}^{M} (X_i - T_i)^2 \tag{8} $$
where $X_i$ is the network prediction, $T_i$ is the target value, $M$ is the total number of responses in $X$ (across all observations), and $N$ is the total number of observations in $X$.
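Equations (6)–(8) can be combined in a short Python sketch. This is not the authors' modeloGradients function; the helper names are hypothetical, and the cross-entropy skips terms whose target is zero so that `log(0)` is never evaluated (a choice made for this example). The sample call uses the targets and predictions listed for image 774 in Table 4 as a one-image mini-batch, with the softmax output taken as exactly [0, 1].

```python
import math

def cross_entropy(targets, probs):
    """Equation (7): targets and probs are N rows of K values (one-hot / probabilities)."""
    n = len(targets)
    total = sum(t * math.log(y)
                for t_row, y_row in zip(targets, probs)
                for t, y in zip(t_row, y_row) if t > 0)  # zero-target terms vanish
    return -total / n

def half_mse(predictions, targets):
    """Equation (8): sum of squared errors over all responses, divided by 2N."""
    n = len(predictions)
    total = sum((x - t) ** 2
                for x_row, t_row in zip(predictions, targets)
                for x, t in zip(x_row, t_row))
    return total / (2 * n)

def combined_loss(t1, y1, t2, y2, lam=0.1):
    """Equation (6): cross-entropy plus lambda times the half mean squared error."""
    return cross_entropy(t1, y1) + lam * half_mse(y2, t2)

# Image 774 of Table 4: pin present, target (47, 25), prediction (44.19, 26.49).
loss = combined_loss([[0, 1]], [[0.0, 1.0]], [[47, 25]], [[44.19, 26.49]])
```

With these inputs the half mean squared error is about 5.058 and the combined loss about 0.506, close to the 5.053 and 0.505 reported in Table 4; the small gap presumably comes from the table's predictions being rounded to two decimals.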
Afterwards, the calculated gradients are used to update the learnable parameters of the network. This process continues until the training ends, which is when the training reaches 200 epochs. Each mini-batch consists of 60 elements. Therefore, 5400 iterations are performed. The parameters of the Adam solver are listed in Table 2.
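The iteration count follows from the dataset size given earlier (1620 training images), the mini-batch size and the number of epochs. A quick check, assuming every epoch processes all 1620 images in full mini-batches:

```python
images, batch_size, epochs = 1620, 60, 200

iterations_per_epoch = images // batch_size        # 1620 / 60 = 27 mini-batches
total_iterations = iterations_per_epoch * epochs   # 27 * 200 = 5400 iterations
print(iterations_per_epoch, total_iterations)
```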
Table 2. Parameter values of the Adam solver.
Parameter                      Value
Learn rate                     0.001
Gradient decay factor          0.9
Squared gradient decay factor  0.999
Epsilon *                      10^-8
* Small constant for preventing divide-by-zero errors.
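To show how the four values in Table 2 enter the update, here is a minimal Python sketch of one Adam step for a single scalar parameter, following the standard Adam formulation with bias correction. It is an illustration only, not the MATLAB solver used in the paper; the `adam_step` helper and the toy quadratic objective are assumptions of this example.

```python
import math

def adam_step(param, grad, m, v, step, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter, with Table 2 values as defaults."""
    m = beta1 * m + (1 - beta1) * grad       # running mean of the gradient
    v = beta2 * v + (1 - beta2) * grad ** 2  # running mean of the squared gradient
    m_hat = m / (1 - beta1 ** step)          # bias correction for the early steps
    v_hat = v / (1 - beta2 ** step)
    param -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v

# Toy stand-in for the network loss: f(w) = (w - 3)^2, with gradient 2 * (w - 3).
w, m, v = 0.0, 0.0, 0.0
for step in range(1, 5401):  # same number of iterations as in the training above
    w, m, v = adam_step(w, 2 * (w - 3), m, v, step)
```

The first update moves the parameter by almost exactly the learn rate, regardless of the gradient magnitude, and over the 5400 steps `w` drifts from 0 towards the minimizer at 3.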
During the training, a validation evaluation is performed. This is carried out to ensure that the training is performing well and that the results are converging. To perform this task, a new dataset is created following the same steps as those used for the training data. In this case, 3 images are obtained from each seed image in order to speed up the validation process. This dataset is evaluated as the training dataset in groups of 60 data samples. At the end of each training epoch, all the validation data are evaluated, and the average loss value is returned by the algorithm.
The first loss function is based on a constant ratio between the two different losses. Regarding the second loss function, we only want to perform the localization task when the network detects an object in order to evaluate whether this approach improves the effectiveness of the network. This new loss function is also based on the cross-entropy loss of the classification task and the half mean square error of the regression task. However, the combination of both is not a simple constant ratio, as with the first loss function. At first, we thought that the loss function only needed to take into account the cross-entropy loss when the classification was not performed correctly, because trying to locate a pin in an image that does not have one would not be correct. Therefore, the loss function that was proposed included the target values of the classification task, as well as the network prediction. However, the use of the predictions to calculate the loss led the network to classify all images in one group due to the learnable parameters being related to the predictions. Because of this, it was decided that the network predictions should not be used. Consequently, only the target values of the classification task are used. In the images where there is no pin, only the cross-entropy loss is used to calculate the overall loss. In the other case, the half mean squared error is also computed. This is carried out with the objective of only taking into account the localization task when there is a pin to locate. All this is performed in each image $\mu$ of the mini-batch $\theta$ using Equation (9):
$$ loss_{\theta} = loss_{Ex,\theta} + \frac{1}{N} \sum_{\mu} t^{\mu}_{pc1} \cdot loss_{mse,\mu} \tag{9} $$
where $t^{\mu}_{pc1}$ is the target probability that image $\mu$ contains a pin, $loss_{Ex,\theta}$ is the cross-entropy loss of the mini-batch $\theta$ (Equation (7)), $loss_{mse,\mu}$ is the half mean square error loss of the image $\mu$ (Equation (8)), and $N$ is the number of images in the mini-batch $\theta$.
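The gating in Equation (9) can be sketched as follows. This is a hedged illustration, not the authors' code: it assumes the class targets are ordered [no pin, pin], as in the column order of Table 4, that the per-image half mean squared error is Equation (8) evaluated with a single observation, and that zero-target cross-entropy terms are skipped to avoid `log(0)`. The function name `gated_loss` is hypothetical.

```python
import math

def gated_loss(t1, y1, t2, y2):
    """Equation (9) for one mini-batch.

    t1, y1: per-image [no_pin, pin] targets and softmax outputs.
    t2, y2: per-image [x, y] target and predicted pin positions.
    """
    n = len(t1)
    ce_sum = sum(t * math.log(y)
                 for t_row, y_row in zip(t1, y1)
                 for t, y in zip(t_row, y_row) if t > 0)
    cross_entropy = -ce_sum / n
    localization = 0.0
    for t_cls, target, pred in zip(t1, t2, y2):
        pin_target = t_cls[1]  # target probability that the image contains a pin
        half_mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / 2
        localization += pin_target * half_mse  # images without a pin contribute 0
    return cross_entropy + localization / n

# Two illustrative images: one with a pin, one without (numbers borrowed from Table 4).
t1 = [[0, 1], [1, 0]]
y1 = [[0.0, 1.0], [1.0, 0.0]]
t2 = [[47, 25], [0, 0]]
y2 = [[44.19, 26.49], [29.24, 33.41]]
loss = gated_loss(t1, y1, t2, y2)
```

In this example the second image has no pin, so its large position error is ignored and only the first image contributes to the localization term.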
As with the first proposed loss in this article, this loss is used to calculate the gradients of the loss with respect to the learnable parameters in order to update the latter to improve the predictions of the network. The same base network is used to compare the obtained results.
After finalizing the training, the same validation data are used to evaluate the training. At this point, 10 randomly selected images are chosen to evaluate the network performance. The same images are used to evaluate the training of the second loss function. Therefore, both results are directly comparable and allow one to conclude whether the proposed method is effective and which loss function has the best performance.
3. Results
In this section, the results of the investigation are presented. First, the network is trained using the presented loss function. The loss during the training and the average validation loss are presented in Figure 2. The quick drop that appears in the first iterations suggests that the classification of the images is optimized early in the training. The values obtained at the end of the training are collected in Table 3.
Table 3. Training properties values.
Property                        Value
Epoch                           200
Iteration                       5400
Training time                   54 min 37 sec
Loss                            2.11
Classification loss             0
Regression loss                 21.087
Validation loss                 1.87
Validation classification loss  2 × 10^-4
Validation regression loss      18.699
Figure 2. The loss during the training process (red) and the average validation error obtained for each epoch of the training data (blue) for the first proposed loss function. In this figure, the downward trend of the loss value can be seen. The loss value continues to drop at a low but constant rate after the quick drop of the early iterations.
Looking at the results in Table 4, it can be seen that the network achieves very good results in the classification task, labeling most of the images correctly. After analyzing all the images in the validation dataset, 10 images were randomly selected to expose the training results. Regarding the pin localization, the results can be improved. Most of the time, the network is able to locate the pin with decent precision. However, the localization task fails when there is no pin in the image, for example, as shown in images 816, 165 and 836 in Figure 3. It can also be noted that image 357 is not classified correctly, although the localization task is performed accurately.
Table 4. The values of the analyzed validation images.
Image Index  T1 No Pin  T1 Pin  Y1 No Pin  Y1 Pin  T2 x  T2 y  Y2 x   Y2 y   Loss    Classification  Regression
774          0          1       0          1       47    25    44.19  26.49  0.505   0               5.053
34           1          0       1          0       0     0     −0.40  −2.47  0.314   0               3.142
816          1          0       1          0       0     0     29.24  33.41  98.58   0               985.8
869          0          1       0          1       43    55    37.63  47.26  4.438   0               44.384
11           0          1       0          1       60    66    55.52  64.16  1.173   0               11.728
165          1          0       1          0       0     0     10.94  15.25  17.624  0               176.24
836          1          0       1          0       0     0     10.53  7.06   8.036   0               80.364
357          0          1       1          0       31    25    21.04  16.74  44.416  36.044          83.724
697          0          1       0          1       28    31    33.59  40.84  6.401   0               64.014
827          0          1       0          1       28    29    21.69  22.37  4.193   1.59 × 10^-4    41.93
T1 and T2 are the target values of each image. Y1 and Y2 are the predictions made by the network. Loss is the total loss; Classification is the cross-entropy error; Regression is the half mean square error. All the values smaller than 10^-4 are considered null.
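The three loss columns of Table 4 can be checked against Equation (6) with λ = 0.1. The short Python check below copies the (Loss, Classification, Regression) values from the table and measures how far each reported total is from Classification + 0.1 · Regression; it is a consistency check on the printed numbers, not a re-evaluation of the network.

```python
# image index -> (Loss, Classification, Regression), copied from Table 4
rows = {
    774: (0.505, 0.0, 5.053),
    34: (0.314, 0.0, 3.142),
    816: (98.58, 0.0, 985.8),
    869: (4.438, 0.0, 44.384),
    11: (1.173, 0.0, 11.728),
    165: (17.624, 0.0, 176.24),
    836: (8.036, 0.0, 80.364),
    357: (44.416, 36.044, 83.724),
    697: (6.401, 0.0, 64.014),
    827: (4.193, 1.59e-4, 41.93),
}

lam = 0.1  # factor applied to the half mean squared error in Equation (6)
gaps = {idx: abs(loss - (ce + lam * reg)) for idx, (loss, ce, reg) in rows.items()}
max_gap = max(gaps.values())
```

Every row agrees with Equation (6) to within the three-decimal rounding of the table, including image 357, where the misclassification contributes most of the total loss.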
Figure 3. Images used to analyze the performance of the network. The red marking shows the real position of the pin (manually labeled), whereas the blue marking shows the prediction of the network.
The same observation can be made with the second proposed loss. In Figure 4, the loss during the training and the average validation loss are presented.
Figure 4. The loss during the training process (red) and the average validation error obtained for each epoch of the training data (blue) for the second proposed loss function. In this figure, the downward trend of the loss value can be seen. Comparing the values with those in Figure 2, the loss value is higher at the beginning, although at the end of the training process, the values converge, as can be seen in Table 5.
Table 5. Training properties values.
Property                        Value
Epoch                           200
Iteration                       5400
Training time                   52 min 17 sec
Loss                            4.497
Classification loss             0
Regression loss                 8.431
Validation loss                 6.461
Validation classification loss  0
Validation regression loss      12.922
