
Benchmarking Object Detection Deep Learning Models in Embedded Devices

Author: Cantero, David,Esnaola-Gonzalez, Iker,Miguel Alonso, José,Jauregi Iztueta, Ekaitz
Publisher: MDPI
Year: 2022
DOI: 10.3390/s22114205
Source: https://addi.ehu.eus/bitstream/10810/57103/1/sensors-22-04205-v2.pdf


Citation: Cantero, D.; Esnaola-Gonzalez, I.; Miguel-Alonso, J.; Jauregi, E. Benchmarking Object Detection Deep Learning Models in Embedded Devices. Sensors 2022, 22, 4205. https://doi.org/10.3390/s22114205
Academic Editor: Antonio Guerrieri
Received: 27 April 2022
Accepted: 27 May 2022
Published: 31 May 2022
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Copyright: © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
sensors
Article
Benchmarking Object Detection Deep Learning Models in Embedded Devices
David Cantero 1,*, Iker Esnaola-Gonzalez 1, Jose Miguel-Alonso 2 and Ekaitz Jauregi 3
1 TEKNIKER, Basque Research and Technology Alliance (BRTA), 20600 Eibar, Spain; iker[email protected]
2 Department of Computer Architecture and Technology, University of the Basque Country UPV/EHU, 20018 San Sebastian, Spain; [email protected]
3 Department of Languages and Information Systems, University of the Basque Country UPV/EHU, 20018 San Sebastian, Spain; [email protected]
* Correspondence: david.cantero@tekniker.es
Abstract: Object detection is an essential capability for performing complex tasks in robotic applications. Today, deep learning (DL) approaches are the basis of state-of-the-art solutions in computer vision, where they provide very high accuracy albeit with high computational costs. Due to the physical limitations of robotic platforms, embedded devices are not as powerful as desktop computers, and adjustments have to be made to deep learning models before transferring them to robotic applications. This work benchmarks deep learning object detection models in embedded devices. Furthermore, some hardware selection guidelines are included, together with a description of the most relevant features of the two boards selected for this benchmark. Embedded electronic devices integrate a powerful AI co-processor to accelerate DL applications. To take advantage of these co-processors, models must be converted to a specific embedded runtime format. Five quantization levels applied to a collection of DL models are considered; two of them allow the execution of models in the embedded general-purpose CPU and are used as the baseline to assess the improvements obtained when running the same models with the three remaining quantization levels in the AI co-processors. The benchmark procedure is explained in detail, and a comprehensive analysis of the collected data is presented. Finally, the feasibility and challenges of the implementation of embedded object detection applications are discussed.
Keywords: object detection; embedded devices; deep learning; benchmarking
1. Introduction
Deep Learning (DL) is a sub-field of Machine Learning (ML) based on the computation of multi-layer Artificial Neural Networks (ANN), also known as Deep Neural Networks (DNN) in reference to the presence of multiple internal processing layers. One of the applications where DL is proving most successful is computer vision, where impressive levels of performance are being achieved. This work discusses object detection technology, which is defined as a computer vision technique that enumerates the objects present in an image and classifies each of the detected objects, assigning a confidence or probability of existence while locating them and enclosing their position in the image with a bounding box. In the traditional computer vision approach, object detection algorithms were based on handcrafted sets of features explicitly programmed by the authors. However, an object may present a diversity of morphological appearances and could be deformed, present a large variety of shapes and/or be immersed in scenes with very different illumination levels and backgrounds. Furthermore, objects may be partially occluded by other objects, making it almost impossible to extract robust features manually. DL, on the other hand, uses a huge number of detection examples and trains a DNN to automatically infer the appropriate detection features. This strategy has proven to be highly successful.
Even if DL is a computationally intensive task, modern embedded hardware devices are powerful enough to execute some of the most successful models. In addition, hardware manufacturers have developed powerful AI (Artificial Intelligence) co-processors, specifically designed to execute DL models. These co-processors provide considerable computing power with high power efficiency. As a result, more and more AI-based applications are implemented in smart embedded devices [1]. Many techniques have been developed to improve the deployment of DL models on such devices, starting from simplified training processes using pre-trained networks and fine-tuning the parameters in a process called Transfer Learning [2], to many model simplifications and transformations, such as quantization, model pruning, etc., to squeeze the model onto embedded devices [3]. Note that even if the models are executed on the embedded devices, all the previous stages in the DL workflow cited above take place in powerful host computers, usually equipped with dedicated high performance graphics processing units (GPUs).
Embedded devices are of paramount importance to bring DL capabilities to robotic applications [4]. To name just a few examples, in [5] the authors present a system that can detect and track multiple objects from aerial images taken by a flying robot, while in [6] a 3D-printed robotic arm is brain-controlled via embedded DL from sEMG sensors. Real-time human detection is an important sub-field of computer vision, of interest in areas ranging from industrial environments to autonomous driving. For a review of this task using DL on embedded platforms, the reader is referred to [7].
The goal of this article is to provide a review of the major challenges in the development of embedded DL applications. The article is divided into two main parts. The first part presents a detailed analysis of the main elements to be taken into account in any DL embedded application: Section 2 explains the motivation for the use of embedded hardware and the most important features to be taken into account when selecting embedded devices. A description of the devices chosen for this work is also included. In Section 3, ML framework requirements are evaluated for both embedded hardware devices and host computers. The embedded hardware libraries are intended to provide a specific runtime environment for the execution of inference based on DL models in specialized hardware co-processors. ML host frameworks, on the other hand, are usually powerful software packages designed to support the whole DL application development workflow. Since the compatibility of both frameworks is mandatory, only a few options are feasible, so the selection is, as explained, quite straightforward. Section 4 describes some of the most successful and modern object detection models available and how they are handled by the selected ML framework.
The second part of the article carries out a benchmark of embedded hardware platforms based on the ML framework and previously identified models. Each model must be converted from its original format to an embedded-friendly format. Hardware co-processors support INT8 arithmetic operations, so model conversion also involves some kind of model quantization. Five quantization levels are considered for this work, as described in Section 5. After conversion, models are deployed in the embedded devices, and their inference performance is measured and tested. Section 6 describes the benchmark procedure and analyzes the obtained results. Finally, Section 7 states the conclusions of this work, and Section 8 enumerates some reflections about future lines of work.
2. AI at the Edge: Intelligent Embedded Systems
Edge computing is a distributed computing architecture where most data processing is executed by hardware devices close to the source of the data. As opposed to cloud computing, where large and powerful central facilities receive huge amounts of data from remotely connected sensors and compute complex and performance-demanding algorithms, edge computing brings the computation to devices with limited resources.
Related to cloud computing, the Internet of Things (IoT) paradigm, which consists of physical things equipped with electronic components and ubiquitous intelligence that allow them to connect, interact and exchange data [8], has contributed to the deployment of millions of connected devices in almost any imaginable scenario. Similarly, the Industry 4.0 paradigm has made available multi-sensory data of industrial processes that allow complex algorithms to control and optimize the performance of industrial plants [9].
The current trend is to move data processing from the cloud to the edge. In particular, ML algorithms are being increasingly deployed in embedded devices [10]. There are many reasons why computing at the edge is preferable to computing in the cloud [11]. On the one hand, the amount of data traffic increases together with the number of deployed devices. On the other hand, data transmission and processing in remote systems introduce a delay that in some cases is unacceptable. Additionally, there may be security issues if private or sensitive information needs to be transmitted from local facilities to an external data center [12].
In the literature, edge devices are vaguely defined. Even if the premise is always that the processing is located near the source of the data, this could refer either to a computing network infrastructure located in the same facilities as the sensors or to an embedded device with a tiny micro-controller. In the present work, edge devices are understood to be embedded devices that usually incorporate sensor data acquisition hardware and are able to autonomously execute data processing algorithms and make some “smart” decisions.
2.1. Selection of Embedded AI Hardware Devices
The first challenge of benchmarking the performance of a DL model in an embedded device is to select the appropriate hardware device itself. There are hundreds of hardware devices that claim to have a design oriented to the execution of ML algorithms. In fact, many modern micro-controllers are actually able to run a set of ML algorithms [13,14], but since one of the goals of this work is to deploy machine vision DL algorithms, a powerful enough device should be selected. On average, the number of operations required to compute a complete inference from an input image is around some tens of billions of operations or Giga-Operations (GOPS) [15]. Since a video sequence has around 30 to 60 frames per second, it is estimated that the minimum computational power an embedded device must have is around one Tera-Operations per second (TOPS). This requirement rules out most general-purpose micro-controllers, for example those based on the widely used ARM Cortex-M architecture, and also many application processors, including those based on the ARM Cortex-A architecture. Even some processors based on the x86 architecture are not powerful enough. To reach those figures, it is necessary to select a processor with a specific integrated mathematical co-processor. Due to the great success of DL, modern embedded hardware devices have begun to integrate powerful AI co-processors to perform DL computations. There are three main solutions to integrate a DL-oriented co-processor in embedded hardware: (i) use a general-purpose processor that already integrates a co-processor in the same semiconductor die; (ii) include a separate Application Specific Integrated Circuit (ASIC) designed for DL inference together with the general-purpose processor in the embedded hardware design; or (iii) use a programmable logic device (CPLD or FPGA) to implement custom co-processor hardware [16]. The design of a math accelerator circuit for DL model inference is outside the scope of this work, and therefore the third solution is rejected in favor of the first two. Based on these criteria, the embedded hardware devices selected for this work are described in the next sub-sections.
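The throughput estimate in this subsection is a simple product of the per-frame cost and the frame rate. The following minimal sketch reproduces that arithmetic; the function name and the value of 20 giga-operations per inference are illustrative assumptions standing in for "some tens of billions of operations", not figures taken from the benchmark.

```python
def required_tops(giga_ops_per_inference: float, frames_per_second: float) -> float:
    """Sustained throughput, in tera-operations per second (TOPS),
    needed to run one inference on every frame of a video stream."""
    giga_ops_per_second = giga_ops_per_inference * frames_per_second
    return giga_ops_per_second / 1000.0  # 1 TOPS = 1000 GOPS


if __name__ == "__main__":
    # Assumed cost of one inference: 20 giga-operations (illustrative only).
    for fps in (30, 60):
        print(f"{fps} fps -> {required_tops(20.0, fps):.1f} TOPS")
```

Under these assumed numbers the requirement falls between roughly 0.6 and 1.2 TOPS, which is consistent with the order of magnitude of one TOPS stated above.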
2.2. NXP i-MX8M-PLUS Application Processor
The first hardware platform selected is the i-MX8M-PLUS processor. It is an NXP heterogeneous multi-core processor for high-performance applications focused on video processing and DL (https://www.nxp.com/products/processors-and-microcontrollers/arm-processors/i-mx-applications-processors/i-mx-8-processors/i-mx-8m-plus-arm-cortex-a53-machine-learning-vision-multimedia-and-industrial-iot:IMX8MPLUS, accessed on 11 July 2021). The embedded System on Chip (SoC) from Variscite shown in Figure 1 and the matching evaluation kit were used in this work.
Figure 1. iMX 8M Plus System on Module. Image from https://www.variscite.com/ (accessed on 2 September 2021).
From a DL application development perspective, the most interesting component of this board is the embedded Neural Processing Unit (NPU) with 2.3 TOPS of computing power. It is also quite remarkable that the NPU is integrated onto the same die as the general-purpose processors and shares the high-speed internal memory bus. This architecture helps speed up the DNN inference, as the data interchange between both computing units is optimized. The NPU is a Vivante VIP8000 specifically designed for being embedded in processors of the i-MX family. It works with 8-bit integer data types (INT8) rather than 32-bit floating-point data (FLOAT32). As will be seen in Section 5, this means that the DNN needs to be transformed (quantized) before being executed in the NPU. NXP provides a complete ecosystem of tools to manage the entire workflow pipeline, including the design, deployment and inference of neural networks. The processor also features a powerful image-processing pipeline, camera interfaces and a comprehensive set of communication peripherals.
2.3. Google Coral Dev Board with EdgeTPU Module
The other hardware platform considered in this work is the Coral Dev Board. This is an evaluation kit for the EdgeTPU AI accelerator module (see Figure 2), an ASIC with a PCI or high-speed USB communication interface that performs 4 TOPS while drawing 2 W of power. It also uses INT8 operands, and it is designed to add DNN inference ability to general-purpose processors.
Figure 2. (a) EdgeTPU AI accelerator module; (b) Coral Deep Learning embedded hardware with EdgeTPU AI accelerator module. Images from https://coral.ai/products/dev-board/ (accessed on 2 September 2021).
The Coral Dev Board integrates an NXP i-MX8-MINI processor from the i-MX8 family designed for industrial applications. It is slightly less powerful than the i-MX8M-PLUS, with fewer image peripherals and interfaces and without the integrated AI co-processor; that role is played by the EdgeTPU. Note that the two devices selected for this work are partially compatible, as both use processors from the i-MX8 family. This was, as a matter of fact, one of the reasons they were chosen. However, Google provides its own tool set for both the EdgeTPU and the i-MX8-MINI SoC, based on a Mendel Linux distribution and the TensorFlow Lite framework.
3. Deep Learning Frameworks
ML’s success and popularity could not be understood without the existence of powerful and, at the same time, user-friendly application development frameworks. Some technology companies and universities have developed complete ML inference libraries for their own research purposes that they have ended up making public as open source software. Many ML algorithms are based on complex and quite cumbersome mathematical formulations that are not easy to implement. Frameworks simplify the development of such algorithms by exposing a high-level API to deal with complex calculations. In the case of DL networks, frameworks allow the implementation of a complete workflow, including defining the network architecture, training and optimization, model performance testing and model deployment into the final embedded devices.
There are many frameworks to choose from, and in general there are a lot of resources available on the web for almost all of them, but some frameworks have gained popularity among programmers and offer better support for application development. In [17], some of the most popular DL frameworks are classified by user access statistics of GitHub repositories. These frameworks demand considerable computing power, and they run on powerful computers usually complemented with GPUs [18]. Some of the processes involved in DL applications, such as model training and validation, require a large amount of memory and computational power. For that reason, they still run on high-end computing systems, and rarely on embedded devices.
Each framework uses its own model formats and APIs to build and implement DL applications. If the model is going to run in an embedded device, the framework must be supported by the embedded software distribution. This in fact determines the selection of the framework in the host (high-end) computer because the software of the host and the device must be compatible. To deal with this challenge, a standard interoperability library called Open Neural Network Exchange (ONNX) (https://onnx.ai/, accessed on 20 July 2021) was designed. Many embedded software distributions support this standard, allowing the selection of the host framework without worrying about embedded device compatibility issues, as shown in Figure 3. Furthermore, this means that, at least theoretically, any model developed using any ML framework could be deployed into any embedded device by adequately converting the format of the model. In reality, embedded software distributions present strong restrictions, even more so if the embedded hardware integrates design-specific AI co-processors, so interoperability is far from total. A main issue is that ONNX is not supported by all embedded devices, and hardware manufacturers provide specific libraries to deploy DNNs in their co-processors that support a limited, if not unique, model format. For this reason, in the following sections the frameworks and libraries available in the selected embedded devices are reviewed.
Figure 3. Interoperability of different frameworks by using ONNX.
3.1. Yocto Distribution and eIQ Machine Learning Framework for NXP i-MX8M Processors
The Yocto Project (https://www.yoctoproject.org/, accessed on 20 July 2021) is an open-source collaborative project that helps developers create custom Linux-based systems regardless of hardware architecture. NXP (the manufacturer of the i-MX8M-PLUS processor) provides a software release based on the Yocto Project framework. It can be used to build images for any i-MX8M board.
The compilation process downloads and installs many libraries and packages to create the binary image of a functional Linux distribution for the board. This binary image contains all the resources NXP provides to create an embedded ML application. In particular, the eIQ development environment supports these six run-time environments (inference engines): ArmNN, TensorFlow Lite, ONNX Runtime, PyTorch, OpenCV and DeepView™RT. To fully exploit the potential of the board, the framework selected must be supported by the internal NPU processor. Figure 4 shows the supported eIQ inference engines across the i-MX computing units.
Figure 4. i-MX8 Deep Learning runtime environments supported by embedded computing units.
PyTorch and OpenCV are not supported by the embedded NPU and are directly discarded. A user guide (https://www.nxp.com/design/software/embedded-software/i-mx-software/embedded-linux-for-i-mx-applications-processors:IMXLINUX, accessed on 20 July 2021) explains the capabilities of all inference engines. For reasons that will become apparent in the next subsection, the most suitable runtime environment for this work is TensorFlow Lite (https://www.TensorFlow.org/lite/guide, accessed on 20 July 2021). As the name suggests, this is a lightweight version of the TensorFlow library for mobile, IoT and embedded devices. It is a runtime package that provides a way to run Deep Neural Networks on a specific hardware processor.
3.2. Mendel Linux and TensorFlow Lite in Coral Dev Board
The Coral Dev Board uses a Mendel Linux distribution maintained by Google. Unlike NXP Linux distributions, Coral Mendel Linux is specifically designed for this evaluation board kit, so there is no need to configure and compile the kernel or install any software packages or libraries. Everything is already available in a binary image that can be downloaded from https://coral.ai/docs/dev-board/get-started/ (accessed on 20 July 2021). The Coral Dev Board has a complete runtime ready to deploy DL models on its EdgeTPU AI co-processor unit. This co-processor was designed by Google to deploy TensorFlow models in embedded hardware, so the use of TensorFlow and its variant TensorFlow Lite is mandatory. TensorFlow Lite models must be processed off-line with a specific tool named “EdgeTPU Compiler” before being deployed in the EdgeTPU AI co-processor.
3.3. Host PC Setup
The host computer is an essential part of the whole development ecosystem. For this work, a host PC running Ubuntu 18.04 64-bit is used. The ML framework installed in the host is TensorFlow 2.5.0. The selection was straightforward, as both embedded devices support the TensorFlow Lite runtime. TensorFlow comprises many functionalities, but the only one used in this work is the ability to convert object detection models into “lite” formats suitable for embedded systems. The TensorFlow programming interface is mainly written for Python, and it was decided to use this language to write all the model conversion scripts.
TensorFlow (and TensorFlow Lite) can be integrated with Python and C/C++ applications. Python was also used to develop all the necessary scripts for the benchmarks described in this paper.
4. Object Detection Models
Object detection models are specialized ANN architectures designed to solve the computer vision task of object identification and localization in a digital image. From the model architecture perspective, object detection models inherit the feature extraction backbone from classification models. It is common to implement an object detection model by reusing a classification model such as VGG16, Mobilenet or Resnet, trained on a very large image dataset. The backbone used in embedded devices must be carefully selected, as the number of layers in the models varies greatly. Integration of the classification and localization heads in the model defines two separate solutions: two-stage models and one-stage models, in reference to the number of functional parts that the model contains. In the case of two-stage models, the first stage generates region proposals for object detection, and the second stage computes each proposed region and extracts both the classification result and the bounding boxes. Compared to one-stage models (which perform all functions together), two-stage models tend to have higher accuracy, although at a higher computational cost [19]. One of the first and most representative two-stage models is R-CNN [20], whose region proposal stage proposes around 2000 regions from the input image.
One-stage models use a feed-forward architecture in which everything is inferred in a single pass by applying a single neural network to the entire image. This approach results in significantly lower accuracy than two-stage detectors, but also in higher detection speed. One of the first one-stage detectors was YOLO [21].
The TensorFlow library is accompanied by auxiliary libraries that complement its functionalities. Of particular interest for DL is the TensorFlow models repository (https://github.com/TensorFlow/models, accessed on 30 July 2021), also called the TensorFlow model zoo. This repository contains models for many DL applications, such as natural language processing, speech recognition and object detection. The model git repository version 2.5.0 was cloned (in accordance with the TensorFlow version). Inside the “models” directory, the “official” folder includes the code and models directly maintained by Google. The “research” folder contains some state-of-the-art technologies maintained by the developers themselves. The “object_detection” directory inside the “research” folder contains the libraries, code and models that have been used for hardware benchmarking. A brief explanation and an installation procedure can be found in https://github.com/TensorFlow/models/blob/master/research/object_detection/g3doc/tf2.md (accessed on 30 July 2021). The TensorFlow model zoo contains several types of object detection model architectures, which are described in the following paragraphs.
4.1. CenterNet
CenterNet (https://github.com/xingyizhou/CenterNet, accessed on 15 September 2021) is a one-stage object detection network that infers object position by assigning one point to every object rather than a square [22]. The size and even the pose of the object are calculated afterwards using a regression network. This strategy increases the accuracy of the network while maintaining fast inference time.
4.2. Single Shot Multibox Detection (SSD)
SSD networks [23] are widely used in embedded devices. They were the first one-stage networks, along with YOLO networks, that achieved accuracy similar to that of two-stage networks. Combined with the “mobilenet” backbone, SSD is the most widely supported network in TensorFlow Lite, mainly because it was developed by Google Research (among other researchers from academia) and it is a lightweight network suitable for deployment in embedded devices.
SSD networks usually come with a specialized component named a Feature Pyramid Network (FPN) [24] designed to improve the detection performance with objects at different scales. Usually object detection networks function quite poorly with very small or very big objects (in terms of the number of pixels that an object occupies in the image). FPNs solve this problem, increasing detection accuracy but also increasing processing time.
4.3. EfficientDet
The EfficientDet [25] DNN describes an improved one-stage network architecture that can be optimized and scaled to obtain a complete family of neural networks. Depending on the available computing resources and requirements, it is possible to select the most adequate member of the family. EfficientDet-D0 is the least resource-demanding network of the family, and it should be adequate for embedded devices. The backbone used as feature extractor is called EfficientNet, hence its name.
4.4. Faster R-CNN
Faster R-CNN [26] is a two-stage object detection network. This architecture incorporates a new first-stage region proposal that improves network performance, achieving inference times comparable to those of single-stage networks while maintaining high accuracy. It is the latest of consecutively improved architectures, starting with R-CNN, then Fast R-CNN and finally Faster R-CNN. Some enhancements have also been applied to the Faster R-CNN architecture to improve both inference speed and result accuracy [27,28].
4.5. Mask R-CNN
Mask R-CNN is an object segmentation model [29]. Object segmentation is a technique that, instead of detecting the object inside the image, categorizes each individual pixel of the image as belonging to a particular class. The goal is to obtain all the pixels belonging to a given class in the image, being able to draw the silhouette and the exact contour of an object, not only the surrounding square. In this sense, object segmentation can be seen as an improvement over object detection. Some architecture enhancements are available in the literature [30].
5. Model Conversion for Embedded Hardware Devices
The Design and Training stages of a DL model are almost always accomplished using a powerful host computer. The host computer includes an installation of a full ML framework with a set of packages and libraries to support and facilitate the whole DL application development workflow. The embedded devices, on the other hand, contain a runtime environment designed only and specifically to run a DL model inference.
In the TensorFlow environment, a model is described by a computational graph containing both the node connections and the weights or parameters of each node. The model is usually defined as a code file containing the API function calls necessary to build the model, for example using the Keras API (https://keras.io/getting_started/, accessed on 15 September 2021). The model is built sequentially by adding a series of computational layers that fully describe the model architecture. However, at this point, the model is not functional because it does not yet contain the values of the weights, which are computed in the training process. Weights are stored in separate files named checkpoints. A checkpoint can be stored and reloaded at any time. This allows comparing the performance of different training stages, or retraining some of the model layers to accomplish an object detection task different from the one the model was previously trained for. Once the model is created, it is possible to save the computational graph and the weights together in a single format named “SavedModel” using a specific TensorFlow API function call. A brief tutorial on TensorFlow model formats is available in https://www.TensorFlow.org/tutorials/keras/save_and_load (accessed on 11 July 2021).
For the TensorFlow Lite runtime environment, models created in TensorFlow must be converted using a specific library. This process modifies the model format appropriately to adapt it to run efficiently on the specific AI co-processors. Conversions mainly affect model weights, input tensors and output tensors. In general, TensorFlow models by default use floating-point parameters, which are appropriate for high-performance CPUs and GPUs, but embedded AI accelerators normally are restricted to work with integers only. Converting from float to integer types is called quantization.
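To make the float-to-integer conversion concrete, the sketch below quantizes a list of floating-point values to INT8 and back using an affine mapping, real = scale * (q - zero_point), which is the scheme documented in the TensorFlow Lite 8-bit quantization specification. It is a simplified illustration written in plain Python; the function names and the sample weights are assumptions, and the actual converter applies further rules (for example, per-channel scales for weights) that are not reproduced here.

```python
def quantization_params(x_min, x_max, q_min=-128, q_max=127):
    """Derive (scale, zero_point) so that [x_min, x_max] maps onto [q_min, q_max]."""
    # The real range is widened to include 0.0 so that zero is exactly representable.
    x_min, x_max = min(x_min, 0.0), max(x_max, 0.0)
    scale = (x_max - x_min) / (q_max - q_min)
    if scale == 0.0:  # degenerate case: every value is zero
        return 1.0, 0
    zero_point = int(round(q_min - x_min / scale))
    return scale, max(q_min, min(q_max, zero_point))


def quantize(values, scale, zero_point, q_min=-128, q_max=127):
    """Map real values to clamped 8-bit integers."""
    return [max(q_min, min(q_max, int(round(v / scale)) + zero_point)) for v in values]


def dequantize(q_values, scale, zero_point):
    """Recover approximate real values from the integers."""
    return [scale * (q - zero_point) for q in q_values]


if __name__ == "__main__":
    weights = [-0.62, -0.10, 0.0, 0.33, 0.91]  # illustrative values
    scale, zero_point = quantization_params(min(weights), max(weights))
    q = quantize(weights, scale, zero_point)
    restored = dequantize(q, scale, zero_point)
    print("scale:", scale, "zero_point:", zero_point)
    print("int8 :", q)
    print("error:", [abs(a - b) for a, b in zip(weights, restored)])
```

The round-trip error of each value is bounded by roughly the size of one quantization step, which is the accuracy cost paid for integer-only arithmetic on the co-processor.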
In this work, five different quantization levels are considered based on the TensorFlow Lite optimization guide (https://www.TensorFlow.org/lite/performance/model_optimization, accessed on 11 July 2021). A brief description of the quantization levels is presented in Table 1, assigning to each level a numerical value. Note that the TensorFlow Lite conversion with no quantization has (properly) a quantization level 0. In the rest of this work, models with quantization levels 0 and 1 will be referred to as CPU models since they will run entirely on the main processor. In contrast, level 2, 3 and 4 models are intended to be executed in the specialized AI co-processor and will be referred to as co-processor models. An important part of this work is to measure the performance advantages of co-processor models over CPU models when an AI accelerator is available.
Table 1. Model quantization (optimization) levels used in this work.
Level | Input | Weights | Output | Description
0 | float | float | float | No quantization (all data is FLOAT32)
1 | float | int8 | float | Quantization of model weights
2 | float | int8 | float | Quantization of weights and internal variables using a representative dataset. Input and output layers remain in FLOAT32
3 | int8 | int8 | float | Quantization of input tensor uses the representative dataset
4 | int8 | int8 | int8 | Full integer conversion. All computation is intended to be done in embedded AI co-processor
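One plausible way to map the five levels of Table 1 onto the options of `tf.lite.TFLiteConverter` is sketched below. This is an assumption based on the TensorFlow Lite post-training quantization guide, not the conversion scripts used in this work: the `LEVELS` table, the `convert` function, its arguments and the choice of `tf.int8` for integer inputs and outputs are all illustrative. TensorFlow is imported inside the function so that the settings table can be inspected without it.

```python
# Hypothetical mapping of the quantization levels of Table 1 to converter settings.
LEVELS = {
    0: {"optimize": False, "calibrate": False, "int_input": False, "int_output": False},
    1: {"optimize": True,  "calibrate": False, "int_input": False, "int_output": False},
    2: {"optimize": True,  "calibrate": True,  "int_input": False, "int_output": False},
    3: {"optimize": True,  "calibrate": True,  "int_input": True,  "int_output": False},
    4: {"optimize": True,  "calibrate": True,  "int_input": True,  "int_output": True},
}


def convert(saved_model_dir, level, representative_dataset=None):
    """Return the serialized TensorFlow Lite model (bytes) for one quantization level.

    representative_dataset is a generator function that yields lists of sample
    input arrays; it is required for the levels that calibrate activations.
    """
    cfg = LEVELS[level]
    if cfg["calibrate"] and representative_dataset is None:
        raise ValueError(f"level {level} needs a representative dataset")

    import tensorflow as tf  # only needed when a conversion is actually run

    converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
    if cfg["optimize"]:
        converter.optimizations = [tf.lite.Optimize.DEFAULT]
    if cfg["calibrate"]:
        converter.representative_dataset = representative_dataset
    if cfg["int_input"]:
        converter.inference_input_type = tf.int8
    if cfg["int_output"]:
        converter.inference_output_type = tf.int8
    if level == 4:
        # Assumed here: restrict the model to integer-only built-in operators.
        converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
    return converter.convert()
```

A call such as `convert("exported/saved_model", 4, my_dataset)` would then return the bytes to be written to a `.tflite` file, where both the directory name and `my_dataset` are placeholders.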
5.1. Model Conversion Issues
The model conversion workflow is depicted as a block diagram in Figure 5. Models downloaded from the TensorFlow model zoo are already trained. The parameters in the trained checkpoint files are exported into a “SavedModel” file, and afterward model conversion is applied. Five conversion Python scripts were implemented to obtain the five corresponding TensorFlow Lite models, one per quantization level. These models are ready to be deployed in the i-MX8M-PLUS processor, but for the EdgeTPU module an extra compilation step must be done using a specific compiler developed by Google named “edgetpu_compiler”. Therefore, after this compilation another five quantized models are obtained.
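The extra compilation step for the EdgeTPU can be scripted from Python by calling the `edgetpu_compiler` command-line tool. The sketch below only builds the command and the expected output path; the helper name and the file names are hypothetical, and the `_edgetpu.tflite` suffix and the `--out_dir` option reflect the compiler's documented behaviour as understood here, so they should be checked against the installed version.

```python
from pathlib import Path


def edgetpu_compile_command(tflite_path, out_dir="."):
    """Build the compiler invocation and the path of the model it is expected to write."""
    model = Path(tflite_path)
    command = ["edgetpu_compiler", "--out_dir", str(out_dir), str(model)]
    compiled = Path(out_dir) / f"{model.stem}_edgetpu.tflite"
    return command, compiled


if __name__ == "__main__":
    # Hypothetical file name for a level 4 model produced by the conversion scripts.
    command, compiled = edgetpu_compile_command("models/ssd_level4.tflite", "build")
    print(" ".join(command))
    print("expected output:", compiled)
    # To run the compiler for real (requires edgetpu_compiler on the host PC):
    #   import subprocess
    #   subprocess.run(command, check=True)
```

Keeping the command construction separate from its execution makes it easy to loop over the five converted models of each network on the host.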
There are more than 80 models available in the TensorFlow model zoo (https://github.com/TensorFlow/models/blob/master/research/object_detection/g3doc/tf2_detection_zoo.md, accessed on 30 July 2021). Table 2 lists the nine models selected to be used in the present work. The name of each model describes the architecture, the input tensor size and the dataset used for training (all models are trained using the COCO 2017 dataset). Some of the models integrate a Feature Pyramid Network (FPN) component, which improves the detection of objects at different scales in the image. Note that all the object detection architectures from the TensorFlow model zoo are represented except for Mask R-CNN. This model is in fact an object segmentation model with very different inference results and computation requirements, not comparable with the others, and for this reason it was not included in the benchmark. The justification for the selection of the rest of the models will become clear in the following subsections. For a given network, a total of ten optimized embedded “.tflite” models are generated (five for i-MX8M-PLUS and another five for the EdgeTPU).
6.2.1. Warm Up Time Analysis
Warm-up times for the i-MX8M-PLUS are displayed in Figure 13. The figure shows clearly how the warm-up time increases with model size. It is also evident that the co-processor models present much longer times than the CPU models. This is easily explained by taking into account that the latter are executed completely in the CPU, so AI co-processor initialization is not necessary, while the former are deployed in the AI co-processor.
Figure 13. i-MX8M-PLUS warm up times.
The warm-up times vary for co-processor models from approximately 10 s to about 150 s. For small, non-quantized models it is smaller than 10 s, but when model size increases, the warm-up time is extremely long. In fact, the largest model raises an execution error. Quantization level 1 presents warm-up times from some seconds to around 25 s. All these figures represent a considerable amount of time, which must be considered in application design and development.
In the EdgeTPU module, the warm-up times behave differently than in the i-MX8M-PLUS (see Figure 14). The warm-up time for co-processor models is nearly the same as that for any other inference time, showing no significant overhead in EdgeTPU module initialization. For small models, the warm-up time is in the order of hundreds of milliseconds, making a specific initialization stage unnecessary. However, the EdgeTPU did not behave well when the model size increased, showing warm-up times of more than 10 s. Indeed, the largest co-processor models do not run in the EdgeTPU module.
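The benchmark script is not reproduced in the text. A minimal sketch of how warm-up time can be separated from steady-state inference time, assuming the inference step is wrapped in a zero-argument callable (the function name `time_inferences` and its parameters are hypothetical):

```python
import time


def time_inferences(invoke, runs=10):
    """Return (warm_up_seconds, mean_steady_seconds) for a zero-argument callable.

    The first call is timed on its own because it may include one-off
    initialization, such as loading the model into an AI co-processor.
    """
    if runs < 1:
        raise ValueError("runs must be at least 1")

    start = time.perf_counter()
    invoke()
    warm_up = time.perf_counter() - start

    steady = []
    for _ in range(runs):
        start = time.perf_counter()
        invoke()
        steady.append(time.perf_counter() - start)
    return warm_up, sum(steady) / len(steady)
```

With a TensorFlow Lite interpreter, the callable could simply be the interpreter's `invoke` method after the input tensor has been set.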
6.2.2. Auxiliary Processing Time Analysis
Auxiliary processing times are fairly homogeneous in all network architectures. For i-MX8M-PLUS (Figure 15), the values vary between 20 and 40 ms with no correlation with model size. However, correlation with model quantization level is observed. The models with float input tensors (levels 0, 1 and 2) present notably larger times than those with quantized INT8 input tensors. This is more evident in “SSD_Mobilenet” networks. It is also observed that in the models with a larger input size of 640 × 640, the difference is even bigger. The explanation is straightforward. The “SSD_Mobilenet” models need a preparatory scale operation (those models have a float [−1, 1] input range) that involves floating-point operations in the input image. The cost of these operations increases with the size of the input tensor. The difference ranges from 4–5 ms for 320 × 320 input tensors up to 15 ms for sizes of 640 × 640. This time difference is not very high, but, especially in real time applications, should not be neglected.
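The preparatory scale operation can be illustrated with a small sketch. The exact formula used in the benchmark is not stated; the mapping below (divide by 127.5, subtract 1) is a common convention for a [−1, 1] input range and is an assumption here, as are the function names:

```python
def scale_to_unit_range(pixels):
    """Map 8-bit pixel values (0..255) to floats in [-1, 1].

    Assumed formula: value / 127.5 - 1. One floating-point division and
    one subtraction are needed per tensor element.
    """
    return [p / 127.5 - 1.0 for p in pixels]


def tensor_elements(side, channels=3):
    """Number of values in a square input tensor with the given side length."""
    return side * side * channels


print(scale_to_unit_range([0, 255]))               # [-1.0, 1.0]
print(tensor_elements(640) / tensor_elements(320))  # 4.0
```

A 640 × 640 tensor has four times as many elements as a 320 × 320 one, which is in line with the reported growth of the time difference from 4–5 ms to about 15 ms. Models with INT8 input tensors skip this step entirely.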
Figure 14. EdgeTPU warm up times for large models.
Figure 15. i-MX8M-PLUS auxiliary processing times.
Auxiliary processing times in the EdgeTPU are slightly larger (around 5 ms) than those in the i-MX8M-PLUS due to the slightly smaller computing power of the Coral Dev general purpose processor. However, the times behave exactly in the same way as explained above.
6.2.3. i-MX8M-PLUS Inference Time Analysis
The DL model inference time is the most relevant parameter to be analyzed in order to measure the performance of the embedded hardware and the feasibility of the deployment of DL object detection applications. Both devices’ inference times are analyzed independently, starting here with the i-MX8M-PLUS processor, and the results are compared afterwards.
The inference times for the i-MX8M-PLUS strongly depend on quantization level. As expected, CPU models have considerably longer inference times than co-processor models. CPU models’ inference times in Figure 16 range from 500 ms to around 25 s. The quantization level 0 inference time for “SSD_Mobilenet_V1” presents an outlier value exceeding one minute. This points to even longer inference times for “SSD_Resnet” networks, but those models do not work on the i-MX8M-PLUS. The co-processor models’ inference times in Figure 17 range from 20 ms to near 800 ms. Note that the timescale in the figure is 100 times lower than in the previous figure. The yellow line in the figure represents the quantization level 3 models’ inference time and is used later to compare results between hardware devices.
Attending to the inference times, it is clear that “ssd_mobilenet_v2_320” should be moved to first place, and “ssd_mobilenet_v2_640 × 640” should be moved back one position, behind “efficientdet_lite0_320”. This means that the inference times cannot be directly inferred from model size; rather, network complexity should be taken into account. Sorted by ascending inference time, “SSD_Mobilenet_V2” is followed by networks with a Feature Pyramid Network (FPN), which introduces computation complexity, and afterward the models with size 640 × 640 are positioned, as expected, at the end. It is important to note that there is no significant difference in the inference times between co-processor models with different quantization levels.
Figure 16. i-MX8M-PLUS inference time for CPU models.
Note also that even if they appear in the figure above, CenterNet and “SSD_Resnet” networks do not obtain good inference results. The inference time figures were included in the benchmark because the CPU models worked properly, and the obtained inference times are also coherent with model size and complexity.
Figure 17. i-MX8M-PLUS inference for co-processor models.
6.2.4. EdgeTPU Inference Time Analysis
Inference times for the EdgeTPU module behave nearly in the same way as those for the i-MX8M-PLUS. The times for CPU models (Figure 18) are considerably longer than those for co-processor models (Figure 19). However, the CPU models did not present the anomalous behavior for large models, and all of them were correctly executed on the Coral Dev Board.
In the case of co-processor models, for large models, there is no time reduction compared with CPU models, and those models are omitted in the inference time analysis. The yellow line in Figure 19 belongs to the quantization level 3 models, as was the case for the i-MX8M-PLUS. The fastest model is, as in the case of the i-MX8M-PLUS processor, the “ssd_mobilenet_v2_320” model, with inference time below 20 ms. The “efficientdet_lite0_320” model, with 145 ms inference time, overtakes the “centernet_Mobilenet_320”, with more than 500 ms, and “ssd_mobilenet_V2_640”, with 650 ms inference time.
Figure 18. EdgeTPU inference time for CPU models.
Figure 19. EdgeTPU inference time for co-processor models.
6.2.5. i-MX8M-PLUS vs. EdgeTPU Inference Time Comparison
A performance improvement factor is calculated by dividing the inference time of the quantization level 1 model by the inference time of the corresponding model with quantization level 3. The improvement factor for the i-MX8M-PLUS processor increases monotonically with model size, as can be observed in Figure 20. Its value varies from 5 for smaller models up to more than 30 for the largest model, “ssd_resnet_101_V1”.
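The improvement factor defined above is a plain ratio. A minimal sketch follows; the function name is hypothetical and the timings in the example are illustrative values, not measurements from the benchmark:

```python
def improvement_factor(level1_ms, level3_ms):
    """Ratio of CPU (level 1) to co-processor (level 3) inference time."""
    if level3_ms <= 0:
        raise ValueError("level 3 inference time must be positive")
    return level1_ms / level3_ms


# Illustrative numbers only: 100 ms on the CPU, 20 ms on the co-processor.
print(improvement_factor(100.0, 20.0))  # 5.0
```

A factor of 1 means the co-processor brings no benefit; larger values mean a larger speed-up.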
For the EdgeTPU module, the performance improvement factor presents a value of around 4, except for the network “ssd_mobilenet_v2_320”, which obtains a value of 23. The values are below those for the i-MX8M-PLUS processor, and these results are even worse taking into account that the inference times for quantized level 1 models in the Coral Dev board are longer (around 10%) than the corresponding values in the i-MX8M-PLUS processor due to the computing power differences in the general purpose ARM CPUs of both devices.
Figure 20. Inference time improvement factor calculated using quantization levels 1 and 3.
In Figure 21, the inference times for quantization level 3 models for both devices are displayed. In the case of the EdgeTPU, only the first, small models are depicted because the last three models do not have valid inference times. The i-MX8M-PLUS processor shows better performance than the EdgeTPU Coral Dev board for the first three models and almost the same performance for the next two. Taking into account that the EdgeTPU has 4 TOPS computing power and the i-MX8M-PLUS has 2.3 TOPS, these results suggest that the i-MX8M-PLUS processor is more efficient than the EdgeTPU module when deploying and running DL models.
Figure 21. i-MX8M-PLUS vs. EdgeTPU inference times for quantized level 3 models.
This better performance is confirmed by looking at the behavior of the largest models. In the i-MX8M-PLUS processor, the inference time is kept under one second, with an improvement factor of up to 30, while the EdgeTPU module presents times over 10 s and improvement factors below 2.
7. Conclusions
The first effect related to the AI at the edge paradigm is the emergence of many embedded devices with specialized AI co-processors to execute deep neural network inferences. In this work, after a detailed review of the available embedded hardware devices, two of them were selected to demonstrate and evaluate the feasibility of the deployment of DL object detection models in resource constrained devices: Variscite i-MX8M-PLUS Board and EdgeTPU Coral Dev Board. Requirements to select a device for this analysis included: (1) it must belong to an important and reliable manufacturer, and (2) it must offer a strong development community supporting the tools and applications. The devices selected were designed by NXP and Google. NXP is one of the most successful industrial processor manufacturers, and Google could be the most important player in the AI arena. A large portion of this work was devoted to setting up the hardware devices, that is, understanding what libraries and packages needed to be installed and the appropriate tools to use. One of the main goals of the work was to learn and understand the workflow of AI application development, and it can be concluded that the success of this task depends considerably on the selection of the development framework.
The AI framework used to develop and deploy DL networks in embedded devices was TensorFlow, together with TensorFlow Lite. As a first workflow stage, TensorFlow models need to be converted into TensorFlow Lite format. Even if an easy-to-use tool is provided by TensorFlow Lite to convert the models, the conversion is not trivial because of a number of incompatibilities between both frameworks. Many mathematical operations deeply hidden in the layers of the neural networks are not supported by the Lite version runtime, and the conversion of many model architectures remains unsolved.
All four main model architectures for object detection in the TensorFlow model repository were considered: “CenterNet”, “SSD”, “EfficientDet” and “Faster R-CNN”. However, in the early stages we realized that TensorFlow Lite conversion for some of the models was impossible. As a matter of fact, only “SSD” and “CenterNet” architectures are compatible with the current TensorFlow Lite converter; thus, a set of seven models was finally selected: six “SSD” with different feature extractor backbones and one “CenterNet”. Further, an “EfficientDet” model already converted to TensorFlow Lite format was added to test as many architectures as possible.
AI co-processors are very specialized hardware units that only accept eight-bit integers as operands, so the models must also be quantized. Five quantization levels were defined in accordance with the capabilities of the TensorFlow Lite library API. After executing model quantization scripts, 35 models for each device were compiled, plus the 2 already converted, giving a total of 72 models.
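The model count follows from simple multiplication. As a quick check, assuming the 35 models per device correspond to the seven selected networks at each of the five quantization levels (the function and parameter names are illustrative):

```python
def total_models(networks=7, levels=5, devices=2, preconverted=2):
    """Count the benchmarked models: one per network, level and device,
    plus the models that were already in TensorFlow Lite format."""
    per_device = networks * levels
    return per_device * devices + preconverted


print(total_models())  # 72
```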
It is not easy to understand the quality of the converted model or guess how the model should be deployed in the AI co-processor. As a guideline, in the case of the i-MX8M-PLUS, the inference script returns a list of unsupported operations in the initial execution stage, while in the case of the EdgeTPU, a log file is created when the TensorFlow Lite model is compiling, with the number of operations mapped to both the EdgeTPU and the CPU.
The benchmark consisted of executing all the converted models, verifying correct behavior and measuring the model inference time. Many issues were detected during this process. Some converted models did not detect the validation image objects the same way as the original model; others simply did not run in the embedded devices. The number of models with correct behavior was considerably reduced. Only forty of the initial seventy-two models provided acceptable results. If only quantized models with representative datasets are considered, the number decreases to only 16 models, 2 of them belonging to an “EfficientDet_lite0” network not created by the “standard” workflow. Finally, only the four “SSD_Mobilenet” networks were proven to be valid for embedded devices. Again, the problems lie in the efficiency and quality of the converted models and the ability of the embedded runtime to fit the models into specialized hardware.
Both hardware devices, the i-MX8M-PLUS and the EdgeTPU, were able to execute the quickest object detection models in approximately 20 ms. The auxiliary CPU processing took another 25 ms. The whole inference cycle therefore takes nearly 50 ms, or about 20 frames per second. The inference times increased up to 100 ms for more complex network models and even more, to 500–800 ms, when input image size increased. Even if the EdgeTPU claims to have almost double computing power, this benchmark demonstrates that the i-MX8M-PLUS device performed slightly better in general. The performance improvement of co-processor models compared with CPU models is about 10 times in the i-MX8M-PLUS and 5 or even worse in the EdgeTPU.
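The frame-rate figure can be reproduced with a short calculation. This is a sketch with a hypothetical helper name; it assumes inference and auxiliary processing run one after the other for every frame:

```python
def frames_per_second(inference_ms, auxiliary_ms):
    """Throughput when each frame needs inference plus auxiliary CPU work."""
    total_ms = inference_ms + auxiliary_ms
    if total_ms <= 0:
        raise ValueError("total processing time must be positive")
    return 1000.0 / total_ms


print(round(frames_per_second(20, 25), 1))  # 22.2
```

With the unrounded 45 ms total the result is slightly above 22 frames per second; rounding the total up to 50 ms gives the figure of 20 frames per second.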
A few quick calculations were carried out to determine the quality of the AI co-processor inference time results. The i-MX8M-PLUS processor integrates four ARM Cortex-A53 cores at 1.8 GHz. Assuming (to obtain a very rough estimate of computing power) that the cores are able to execute one operation per clock, the maximum theoretical computing power should be around 10 Giga-operations per second (GOPS) for floating-point operations. Compared to the AI co-processor’s 2.3 TOPS, the theoretical optimal improvement factor should be in the order of 100. The calculation is based on very imprecise and simplified assumptions, and the actual number should be lower than the theoretical number. Even so, the improvement factor of 5 to 15 obtained for most of the small “SSD_mobilenet” networks is quite far from that figure. Once again, the converted model is not suited to being executed efficiently in the AI co-processor. The models are partitioned when unsupported operations are found, and many operations are delegated back to the general purpose CPU, slowing down the total inference performance.
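The back-of-the-envelope estimate can be written out explicitly. The sketch below only restates the assumptions given in the text (four cores, 1.8 GHz, one operation per clock, 2.3 TOPS for the co-processor); the function names are hypothetical:

```python
def cpu_peak_gops(cores, clock_ghz, ops_per_clock=1):
    """Very rough peak CPU throughput in giga-operations per second."""
    return cores * clock_ghz * ops_per_clock


def theoretical_factor(coprocessor_tops, cpu_gops):
    """Ratio between co-processor and CPU peak throughput."""
    return coprocessor_tops * 1000.0 / cpu_gops


cpu = cpu_peak_gops(4, 1.8)                  # 7.2 GOPS, rounded to ~10 in the text
print(round(theoretical_factor(2.3, cpu)))   # 319
print(round(theoretical_factor(2.3, 10.0)))  # 230
```

Either way the ratio is a few hundred, that is, in the order of 100, against a measured improvement factor of 5 to 15.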
In general, the feeling about the current state of object detection for embedded devices is that many aspects of performance depend on the efficiency of the software frameworks on both the host computer and the embedded device, and on their ability to extract maximum performance from the embedded hardware co-processors. Those libraries are now under construction and continuous modification. Nearly every month, NXP releases a new version of the Yocto framework for the i-MX processor family (at least two new versions were released since the first benchmark test was accomplished). Coral also releases new compiler tools, API libraries and trained models periodically. In the case of TensorFlow and TensorFlow Lite, even if the libraries were updated many times along the development of the benchmark, new releases are now available to be downloaded. The repository of models is updated every day (there are continuous commits to the research repository), and an official version is released synchronized with every TensorFlow release.
8. Future Work
It should be clear after reading the previous sections that many issues remain open and unsolved. The present work does not make a quantitative assessment of the (numerical) performance of the converted models. Performance correctness is decided by visual inspection of the detected objects and correct object classification. Even if this approach easily detects catastrophic failures (such as those shown in Figure 11), subtle performance variations are undetected. A means to measure the error should be included as part of the inference script. There is a straightforward error computation standard defined by the COCO dataset, called mean average precision (mAP), specifically defined for object detection. This error metric is in fact available in TensorFlow, but needs to be implemented from scratch in embedded devices. It would be interesting to investigate whether different levels of quantization introduce noticeable errors, or whether certain network architectures are more sensitive to quantization processes. We plan to carry out a quantitative evaluation of these aspects in a future paper.
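A from-scratch mAP implementation would start from the intersection over union (IoU) between a detected box and a ground-truth box, which is the matching criterion that COCO-style mAP is built on. The following is a minimal sketch of that building block only, not a full mAP computation; the box layout `(x_min, y_min, x_max, y_max)` and the function name are assumptions:

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x_min, y_min, x_max, y_max)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap rectangle (zero if the boxes are disjoint).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


print(iou((0, 0, 2, 2), (0, 0, 2, 2)))  # 1.0
print(iou((0, 0, 1, 1), (2, 2, 3, 3)))  # 0.0
```

A detection is typically counted as correct when its IoU with a ground-truth box of the same class exceeds a threshold; the COCO metric averages precision over several such thresholds.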
One of the main constraints imposed on the work was the requirement of using pre-built models from the TensorFlow model zoo. TensorFlow provides the possibility to implement the model using a flexible API at different levels of abstraction. It would be illustrative to build the standard object detection models used in this work, or even other similar ones, and to investigate how those models behave after quantization in the embedded devices considered here. The final objective should be to learn if there is a way to optimize model deployment by defining model internal operations and layer connections using supported operations of the TensorFlow Lite embedded runtime. Furthermore, additional model sources besides TensorFlow should be investigated. The ONNX model exchange should allow the import of models from other AI frameworks. The EdgeTPU is only supported by TensorFlow Lite runtime libraries, but the i-MX8M-PLUS has some other supported frameworks, such as DeepViewRT, armNN or the previously mentioned ONNX.
Finally, more hardware devices should be considered. The two embedded boards considered in this work shared many hardware specifications. Both have an NXP i-MX family processor, integrate an integer tensor processor and rely on TensorFlow Lite libraries as a runtime. In order to have a more global view of the hardware performance, different types of embedded devices should be tested. At the beginning of the present work, a third hardware platform called Jetson Nano was pre-selected to be included in the benchmark. The Jetson Nano Nvidia AI platform integrates a floating point arithmetic AI co-processor and uses other specialized libraries called TensorRT. The board was successfully launched, and some preliminary tests have been performed, but the software framework is quite different from the one used with the other two boards, and significant work is needed to implement the inference processes.
Author Contributions: Conceptualization, investigation and writing, D.C. as part of his PhD research; methodology, overall supervision and writing, including review and editing, I.E.-G., J.M.-A. and E.J. All authors have read and agreed to the published version of the manuscript.
Funding: This work has received support from the following programs: PID2019-104966GB-I00 (Spanish Ministry of Science and Innovation), IT-1244-19 (Basque Government), KK-2020/00049, KK-2021/00111 and KK-2021/00095 (Elkartek projects 3KIA, ERTZEAN and SIGZE, funded by the SPRI-Basque Government) and the AI-PROFICIENT project funded by European Union’s Horizon 2020 research and innovation program under grant agreement no. 957391.
Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.
Data Availability Statement: Not applicable.
Conflicts of Interest: The authors declare no conflict of interest.
Re e ences
1.
Me enda, M.; Po ca o, C.; Ie o, D. Edge machine lea ning o ai-enabled io de ices: A e iew. Senso s
2020
,20, 2533. [C ossRe ]
[PubMed]
2. Weiss, K.; Khoshgo aa , T.M.; Wang, D. A su ey o ans e lea ning. J. Big Da a 2016,3, 1–40. [C ossRe ]
3.
Mu shed, M.S.; Mu phy, C.; Hou, D.; Khan, N.; Anan hana ayanan, G.; Hussain, F. Machine lea ning a he ne wo k edge:
A su ey. ACM Compu . Su . 2021,54, 1–37. [C ossRe ]
4.
Pena, D.; Fo embski, A.; Xu, X.; Moloney, D. Benchma king o CNNs o low-cos , low-powe obo ics applica ions. In P oceedings
o he RSS 2017 Wo kshop: New F on ie o Deep Lea ning in Robo ics, Rhodes, G eece, 15–16 July 2017; pp. 1–5.
5.
Hossain, S.; Lee, D. Deep lea ning-based eal- ime mul iple-objec de ec ion and acking om ae ial image y ia a lying obo
wi h GPU-based embedded de ices. Senso s 2019,19, 3371. [C ossRe ] [PubMed]
6.
Lonsdale, D.; Zhang, L.; Jiang, R. 3D p in ed b ain-con olled obo -a m p os he ic ia embedded deep lea ning om sEMG
senso s. In P oceedings o he 2020 In e na ional Con e ence on Machine Lea ning and Cybe ne ics (ICMLC), Adelaide, Aus alia,
2 Decembe 2020; pp. 247–253.
7.
Rahmania , W.; He nawan, A. Real- ime human de ec ion using deep lea ning on embedded pla o ms: A e iew. J. Robo .
Con ol 2021,2, 462–468.
8.
Gubbi, J.; Buyya, R.; Ma usic, S.; Palaniswami, M. In e ne o Things (IoT): A ision, a chi ec u al elemen s, and u u e di ec ions.
Fu u e Gene . Compu . Sys . 2013,29, 1645–1660. [C ossRe ]
9. Lasi, H.; Fe ke, P.; Kempe , H.G.; Feld, T.; Ho mann, M. Indus y 4.0. Bus. In . Sys . Eng. 2014,6, 239–242. [C ossRe ]
10. Vés ias, M.P.; Dua e, R.P.; de Sousa, J.T.; Ne o, H.C. Mo ing deep lea ning o he edge. Algo i hms 2020,13, 125. [C ossRe ]
11.
Shi, W.; Cao, J.; Zhang, Q.; Li, Y.; Xu, L. Edge compu ing: Vision and challenges. IEEE In e ne Things J.
2016
,3, 637–646.
[C ossRe ]
12. Cao, K.; Liu, Y.; Meng, G.; Sun, Q. An o e iew on edge compu ing esea ch. IEEE Access 2020,8, 85714–85728. [C ossRe ]
13.
B anco, S.; Fe ei a, A.G.; Cab al, J. Machine lea ning in esou ce-sca ce embedded sys ems, FPGAs, and end-de ices: A su ey.
Elec onics 2019,8, 1289. [C ossRe ]
14.
Ajani, T.S.; Imoize, A.L.; A aye o, A.A. An o e iew o machine lea ning wi hin embedded and mobile de ices–op imiza ions
and applica ions. Senso s 2021,21, 4412. [C ossRe ]
15.
Bianco, S.; Cadene, R.; Celona, L.; Napole ano, P. Benchma k analysis o ep esen a i e deep neu al ne wo k a chi ec u es. IEEE
Access 2018,6, 64270–64277. [C ossRe ]
16.
Im an, H.A.; Mujahid, U.; Wazi , S.; La i , U.; Mehmood, K. Embedded de elopmen boa ds o edge-AI: A comp ehensi e epo .
a Xi 2020, a Xi :2009.00803.
17.
Zacha ias, J.; Ba z, M.; Sonn ag, D. A su ey on deep lea ning oolki s and lib a ies o in elligen use in e aces. a Xi
2018
,
a Xi :1803.04818.
18.
Dai, W.; Be lean , D. Benchma king con empo a y deep lea ning ha dwa e and amewo ks: A su ey o quali a i e me ics. In
P oceedings o he 2019 IEEE Fi s In e na ional Con e ence on Cogni i e Machine In elligence (CogMI), Los Angeles, CA, USA,
12–14 Decembe 2019; pp. 148–155.
19.
Zhao, Z.Q.; Zheng, P.; Xu, S.; Wu, X. Objec de ec ion wi h deep lea ning: A e iew. IEEE T ans. Neu al Ne w. Lea n. Sys .
2019
,
30, 3212–3232. [C ossRe ]
20.
Gi shick, R.; Donahue, J.; Da ell, T.; Malik, J. Region-based con olu ional ne wo ks o accu a e objec de ec ion and segmen a ion.
IEEE T ans. Pa e n Anal. Mach. In ell. 2015,38, 142–158. [C ossRe ]
21.
Redmon, J.; Di ala, S.; Gi shick, R.; Fa hadi, A. You only look once: Uni ied, eal- ime objec de ec ion. In P oceedings o he
IEEE Con e ence on Compu e Vision and Pa e n Recogni ion, Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788.
22. Zhou, X.; Wang, D.; K ähenbühl, P. Objec s as poin s. a Xi 2019, a Xi :1904.07850.
23.
Liu, W.; Anguelo , D.; E han, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Be g, A.C. Ssd: Single sho mul ibox de ec o . In P oceedings o
he Eu opean Con e ence on Compu e Vision, Ams e dam, The Ne he lands, 11–14 Oc obe 2016; pp. 21–37.
24.
Lin, T.; Dollá , P.; Gi shick, R.B.; He, K.; Ha iha an, B.; Belongie, S.J. Fea u e Py amid Ne wo ks o Objec De ec ion. In P oceed-
ings o he IEEE Con e ence on Compu e Vision and Pa e n Recogni ion, Honolulu, HI, USA, 21–26 July 2016; pp. 2117–2125.
25.
Tan, M.; Pang, R.; Le, Q.V. E icien de : Scalable and e icien objec de ec ion. In P oceedings o he IEEE/CVF Con e ence on
Compu e Vision and Pa e n Recogni ion, Sea le, WA, USA, 13–19 June 2020; pp. 10781–10790.
26.
Ren, S.; He, K.; Gi shick, R.; Sun, J. Fas e R-CNN: Towa ds eal- ime objec de ec ion wi h egion p oposal ne wo ks. Ad . Neu al
In . P ocess. Sys . 2015,28, 1–9. [C ossRe ]
27.
Cao, C.; Wang, B.; Zhang, W.; Zeng, X.; Yan, X.; Feng, Z.; Liu, Y.; Wu, Z. An Imp o ed Fas e R-CNN o Small Objec De ec ion.
IEEE Access 2019,7, 106838–106846. [C ossRe ]
28.
Chu, J.; Guo, Z.; Leng, L. Objec De ec ion Based on Mul i-Laye Con olu ion Fea u e Fusion and Online Ha d Example Mining.
IEEE Access 2018,6, 19959–19967. [C ossRe ]
Senso s 2022,22, 4205 25 o 25
29.
He, K.; Gkioxa i, G.; Dollá , P.; Gi shick, R.B. Mask R-CNN. In P oceedings o he IEEE In e na ional Con e ence on Compu e
Vision (ICCV), Venice, I aly, 22–29 Oc obe 2017; pp. 2961–2969.
30.
Zhang, Y.; Chu, J.; Leng, L.; Miao, J. Mask-Re ined R-CNN: A Ne wo k o Re ining Objec De ails in Ins ance Segmen a ion.
Senso s 2020,20, 1010. [C ossRe ] [PubMed]