
Integrating digital factory twin and AI for monitoring manufacturing systems through synthetic data generation and vision transformers

Author: Urgo, Marcello; Terkaj, Walter
Publisher: Zenodo
DOI: 10.1016/j.cirp.2025.04.037
Source: https://zenodo.org/records/17301342/files/1-s2.0-S0007850625000848-main.pdf
Marcello Urgo (2) a,*, Walter Terkaj b
a Mechanical Engineering Department, Politecnico di Milano, Via La Masa 1, Milano, 20127, Italy
b CNR-STIIMA, Via Alfonso Corti 12, Milano, 20133, Italy
ARTICLE INFO
Article history:
Available online 27 April 2025
ABSTRACT
Integrating Digital Twin and Artificial Intelligence technologies is reshaping manufacturing monitoring systems by leveraging synthetic data and advanced computer vision models. This paper presents an approach where a Digital Twin of a factory is used to generate synthetic datasets to train Vision Transformers for object detection and image segmentation in manufacturing processes. The study demonstrates improved accuracy in detecting and monitoring factory assets, validated through synthetic and real-world datasets. An industrial case study further illustrates its potential to identify anomalies.
© 2025 The Authors. Published by Elsevier Ltd on behalf of CIRP. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Keywords:
Manufacturing
Digital twin
Artificial intelligence
1. Introduction and problem statement
The adoption of Artificial Intelligence (AI) in manufacturing is constantly progressing as new models emerge and the reliability of the technology improves, entailing a considerable impact on existing workflows [1,2].
Among AI technologies, computer vision stands out as one of the most promising and mature areas due to its ability to provide accurate and reliable results, supporting a wide range of applications, from automation [3], to human monitoring [4], and quality control processes [5,6]. With advanced algorithms and improved hardware, computer vision systems can detect defects, monitor production lines, and ensure accuracy, ultimately leading to lower costs and increased productivity in manufacturing.
The incorporation of deep learning techniques has significantly shaped recent advancements in object detection models. While traditional models based on convolutional neural network (CNN) architectures (e.g., Faster R-CNN and YOLO) have been widely adopted, vision transformers (ViTs) have emerged as a compelling alternative that exploits self-attention mechanisms to process images in a similar way to how transformers have revolutionised natural language processing (NLP) tasks [7].
ViTs demonstrated consistent performance across tasks such as image classification, object detection, and semantic segmentation. They also exhibited robustness to disturbances (e.g., occlusions), making them a reliable choice for real-world applications. Among these models, the Detection Transformer (DETR) uses a transformer encoder-decoder architecture to capture relationships between image features and object queries, thereby simplifying the training process [8].
These state-of-the-art techniques leverage the strengths of transformers to improve generalisation, yielding promising results for complex object detection tasks. However, they demand higher computational effort and large datasets. The latter is the main barrier to adopting AI in manufacturing, as collecting and managing these large datasets can be challenging, especially with limited data infrastructure.
The availability of a digital twin (DT) of the factory, i.e., a digital replica [9] including geometric characteristics but also products, processes, resources and their integrated behaviour [10], has been supporting several applications in manufacturing, ranging from processes to maintenance, ergonomics and system control [11,12], and facilitating the implementation of knowledge-based and data-driven approaches, especially AI.
A DT of factory assets can be used to generate synthetic data that support the training of object detection AI models with a wide range of possible applications, such as tracking products or general entities, supporting and optimising handling operations, monitoring the progress of production activities, and enabling advanced safety and ergonomics analyses [13]. This capability is vital when the actual system is non-existent or inaccessible.
This paper addresses monitoring factory objects by identifying their locations within a manufacturing system. It builds upon an existing framework [13] for object detection in manufacturing systems that relies on synthetically generated data produced through a factory DT, enhancing it by integrating ViT models. The approach is validated within a virtual factory environment. An application to an industrial case is also explored, leveraging the capability of ViTs to perform object detection alongside image segmentation, supporting the implementation of structured vision pipelines enabling the detection of objects and possible anomalies.
* Corresponding author.
E-mail address: [email protected] (M. Urgo).
https://doi.org/10.1016/j.cirp.2025.04.037
0007-8506/© 2025 The Authors. Published by Elsevier Ltd on behalf of CIRP. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)
CIRP Annals - Manufacturing Technology 74 (2025) 639–643
2. Methodology
The proposed approach (Fig. 1) specialises and extends the training workflow presented in [13]. The first step involves creating a DT model of the relevant factory assets, following a specified data model and the necessary level of detail (activity A1). The data model can be a reference ontology [14].
For each factory asset that needs monitoring (e.g., parts, fixtures, pallets), synthetic data are generated as images (activity A2, Fig. 1) using the DT model and other digital technologies, such as virtual reality (VR) (Sect. 2.1).
The synthetic data, possibly together with a set of real pictures of the same assets, are used to train a DETR model for object detection (activity A3, Fig. 1) by leveraging the functionalities of deep learning frameworks (Sect. 2.2). The trained DETR model is tested and validated (activity A4, Fig. 1) and, if the performance metrics (KPIs) meet the requirements, it can be used to detect the selected assets and support monitoring (Sect. 3).
The conventional training of object detection models is based on activities A3 and A4 in Fig. 1, but leveraging synthetic data (activity A1 plus A2, Fig. 1) can provide significant benefits [13].
The trained DETR model can also provide panoptic segmentation [6], i.e., identify the pixels in the image associated with the detected objects. This enables the processing of a real dataset to remove the background from the images.
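The background-removal step can be sketched as follows. This is a minimal illustration, assuming the segmentation output is already available as a boolean mask with the same shape as the image. The function name `remove_background`, the nested-list image representation and the black fill colour are assumptions made for the example and are not taken from the original workflow.

```python
def remove_background(image, mask, fill=(0, 0, 0)):
    """Keep the pixels where mask is True and replace all others with `fill`.

    image: list of rows, each row a list of (r, g, b) tuples.
    mask:  list of rows of booleans with the same shape as `image`
           (hypothetical stand-in for a segmentation mask of the detected object).
    """
    if len(image) != len(mask) or any(
        len(img_row) != len(mask_row) for img_row, mask_row in zip(image, mask)
    ):
        raise ValueError("image and mask must have the same shape")
    return [
        [pixel if keep else fill for pixel, keep in zip(img_row, mask_row)]
        for img_row, mask_row in zip(image, mask)
    ]


if __name__ == "__main__":
    image = [[(10, 20, 30), (40, 50, 60)], [(70, 80, 90), (100, 110, 120)]]
    mask = [[True, False], [False, True]]
    print(remove_background(image, mask))
```

With real images the same operation is usually a single element-wise multiplication on arrays; the nested-list form is used here only to keep the sketch dependency-free.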
The result can be used to effectively train a VGG [15] model for image classification (activity A5, Fig. 1) with reduced data requirements to achieve reasonable performance.
2.1. Generation of synthetic datasets
VR tools typically generate a virtual representation of factory assets based on the corresponding DT model. In addition, the VR web application VEB.js [13] provides advanced functionalities to automatically capture the rendered visualisation according to the specific perspective defined by the camera configuration of the VR environment (Fig. 2). The VR geometric model enables the automatic identification of visible assets within the scene and the calculation of their bounding box coordinates as projected onto the viewport. This information can be directly obtained from the rendering engine, simplifying the creation of synthetic images with accurate object annotations. For each relevant visible asset, the annotation contains its identifier, class, and the position and size of the bounding box.
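To make the annotation step concrete, the sketch below projects the corners of a 3D axis-aligned box onto a viewport and derives a 2D bounding box with identifier and class. It is not the VEB.js implementation, which obtains this information from the rendering engine; a simple pinhole camera (camera at the origin, looking along +z, principal point at the image centre) is assumed instead, and all names (`project_point`, `box_corners`, `annotate`) and numeric values are hypothetical.

```python
from itertools import product


def project_point(point, focal, cx, cy):
    """Project a 3D point given in camera coordinates (z forward) to pixel coordinates."""
    x, y, z = point
    if z <= 0:
        raise ValueError("point must be in front of the camera")
    return (cx + focal * x / z, cy + focal * y / z)


def box_corners(min_corner, max_corner):
    """Return the 8 corners of an axis-aligned 3D box."""
    return list(product(*zip(min_corner, max_corner)))


def annotate(asset_id, asset_class, min_corner, max_corner, focal, width, height):
    """Build an annotation with the 2D bounding box (x, y, w, h) of a 3D box."""
    points = [
        project_point(corner, focal, width / 2, height / 2)
        for corner in box_corners(min_corner, max_corner)
    ]
    # Clip the projected corners to the viewport before taking the extremes.
    xs = [min(max(u, 0.0), float(width)) for u, _ in points]
    ys = [min(max(v, 0.0), float(height)) for _, v in points]
    x_min, y_min = min(xs), min(ys)
    return {
        "id": asset_id,
        "class": asset_class,
        "bbox": (x_min, y_min, max(xs) - x_min, max(ys) - y_min),
    }


if __name__ == "__main__":
    print(annotate("pallet_01", "pallet", (-1, -1, 4), (1, 1, 6), 400, 640, 480))
```

The sketch ignores occlusion; deciding which assets are actually visible is one of the tasks the paper delegates to the VR geometric model.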
VEB.js (Fig. 2) enables both manual and automatic generation of synthetic images. In manual mode, the user controls navigation and camera perspective, while automatic mode completely controls the VR camera, generating images of selected assets from various perspectives and distances. Automatic mode is ideal for creating large training datasets centred on specific assets, whereas manual mode is better suited to generating targeted testing datasets.
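One possible way to generate camera poses for an automatic mode of this kind is to sample viewpoints on spherical shells around the target asset. The sketch below is an illustration only: the source does not describe how VEB.js chooses perspectives, so the sampling scheme, the function name `sample_viewpoints` and the default distance and elevation ranges are all assumptions.

```python
import math
import random


def sample_viewpoints(target, n, dist_range=(2.0, 6.0),
                      elev_range_deg=(10.0, 60.0), seed=0):
    """Sample n camera positions around `target`, each looking at the target.

    Distances, azimuths and elevations are drawn uniformly from the given
    ranges (hypothetical values); a fixed seed makes the dataset reproducible.
    """
    rng = random.Random(seed)
    views = []
    for _ in range(n):
        distance = rng.uniform(*dist_range)
        azimuth = rng.uniform(0.0, 2.0 * math.pi)
        elevation = math.radians(rng.uniform(*elev_range_deg))
        position = (
            target[0] + distance * math.cos(elevation) * math.cos(azimuth),
            target[1] + distance * math.cos(elevation) * math.sin(azimuth),
            target[2] + distance * math.sin(elevation),
        )
        views.append({"position": position, "look_at": target})
    return views


if __name__ == "__main__":
    for view in sample_viewpoints((0.0, 0.0, 0.5), 3):
        print(view)
```

Each sampled pose would then be rendered and annotated, which is what makes automatic generation suitable for large training sets centred on a specific asset.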
Either a realistic or a complex strategy can be used to incorporate relevant factory assets into the VR scene. In the first strategy, the scene represents a realistic context that reflects the actual use of the assets within a manufacturing system. In the second strategy, the scene serves as a background consisting of many intersecting assets usually found in a manufacturing environment (Fig. 2).
The complex strategy, proposed in [16], addresses the need to train object detection models using a dataset with adequate variety [13].
Both strategies can be illustrated in a case related to monitoring a pallet used in assembly operations. Fig. 3 (left) shows the pallet moving along a conveyor, while Fig. 3 (right) displays the pallet against a randomly generated background featuring parts of workstations, conveyors, and robots.
2.2. Training the object detection model
A pre-trained DETR model [17] is used as an object detection model. A transfer learning approach has been implemented, initialising the DETR model with pre-trained weights for the transformer and backbone parts and using data for an additional training phase with reduced learning rates. Various synthetic datasets can be generated based on the strategies for the scene background (Sect. 2.1) and the image characteristics. Table 1 lists the datasets used to train various DETR models. The training phase was carried out over 60 epochs, using the PyTorch framework supported by PyTorch Lightning [18], which provides a simplified and direct interface to handle PyTorch, and the Supervision library [19], used to prepare and preprocess the datasets as well as to calculate KPIs for the validation and testing phase.
Fig. 1. IDEF0 diagram representing the training workflow based on [12].
Fig. 2. VR representation of a scene built according to the complex strategy using the VEB.js environment.
Fig. 3. Generation of synthetic images and bounding boxes of factory objects in realistic (left) and complex (right) environments using VEB.js.
3. Validation and testing
This section demonstrates how a trained DETR model can be tested and validated (activity A4, Fig. 1), while also deriving insights to identify the most effective training strategy. The reference use case is related to monitoring pallets moving through an assembly line producing hinges for the furniture market [14].
Seven DETR models have been trained on the different datasets (Table 1) and tested using only synthetic images from the realistic VR environment, containing pallets in different positions and dimensions in the picture framing, as reported in Fig. 4.
For the testing, the following KPIs have been estimated based on the concepts of precision and recall:
$$\text{precision} = \frac{\text{True Positives}}{\text{True Positives} + \text{False Positives}}, \qquad \text{recall} = \frac{\text{True Positives}}{\text{True Positives} + \text{False Negatives}}$$
$\mathrm{AP}_{\mathrm{IoU}=0.50}$: Average precision with IoU = 0.5. It measures the precision of the model by only considering predicted bounding boxes with a value of IoU (intersection-over-union between the detected and ground truth bounding boxes) of at least 50 %.
AP: Average Precision. It is the AP across a range of IoU from 0.50 to 0.95.
AR: Average Recall. The average fraction of objects detected across a range of IoU from 0.50 to 0.95.
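The building blocks of these KPIs can be sketched in a few lines. The example below computes the IoU of two boxes and then precision and recall at a single IoU threshold, using a greedy one-to-one matching of detections to ground truth in order of decreasing confidence. It is a simplified illustration, not the evaluation code used in the paper (which relies on the Supervision library): the box format `(x_min, y_min, x_max, y_max)`, the function names and the matching rule are assumptions, and the full AP and AR additionally average over confidence levels and IoU thresholds.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x_min, y_min, x_max, y_max)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def precision_recall(detections, ground_truth, iou_threshold=0.5):
    """Precision and recall at one IoU threshold.

    detections:   list of (box, confidence) pairs.
    ground_truth: list of boxes.
    Each ground-truth box can be matched by at most one detection, so a
    second detection of the same object counts as a false positive.
    """
    matched = set()
    true_positives = 0
    for box, _ in sorted(detections, key=lambda d: d[1], reverse=True):
        best_index, best_iou = None, iou_threshold
        for index, gt_box in enumerate(ground_truth):
            if index in matched:
                continue
            overlap = iou(box, gt_box)
            if overlap >= best_iou:
                best_index, best_iou = index, overlap
        if best_index is not None:
            matched.add(best_index)
            true_positives += 1
    false_positives = len(detections) - true_positives
    false_negatives = len(ground_truth) - true_positives
    precision = true_positives / (true_positives + false_positives) if detections else 0.0
    recall = true_positives / (true_positives + false_negatives) if ground_truth else 0.0
    return precision, recall


if __name__ == "__main__":
    gt = [(0, 0, 10, 10)]
    dets = [((0, 0, 10, 10), 0.9), ((1, 1, 11, 11), 0.8), ((50, 50, 60, 60), 0.7)]
    print(precision_recall(dets, gt))
```

In the usage example, the duplicate detection of the same object lowers precision while recall stays at 1.0, which mirrors the multiple-detection issue discussed later in this section.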
The results in Table 2 show that training on data from a realistic environment (D1, Table 2) only provides reasonable but inadequate results, with an AP = 0.917. Using images exclusively from a complex environment performs worse, with AP = 0.870 (D4, Table 2).
Better performance is achieved by merging the two datasets, enabling the trained model to generalise more effectively and demonstrate greater robustness. Additionally, converting the images to a grey scale further improved performance by preventing the model from relying on colour features, which might not be relevant in some cases, and emphasising edge features.
This might have helped reduce the dimensionality of the data, making it easier for the model to focus on the essential features of the images and increase the detection rate (D2 and D5, Table 2).
Also, rescaling the images to a fixed size of 640 by 640 pixels was beneficial. This transformation helped make the datasets consistent, so that possible differences in the resolution of the screenshots had no influence.
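The two preprocessing steps, grey-scale conversion and rescaling to 640 by 640 pixels, can be sketched as below. The paper does not state how either step was implemented, so the luma weights (ITU-R BT.601), the nearest-neighbour interpolation and the function names are assumptions chosen to keep the example dependency-free.

```python
def to_grayscale(image):
    """Convert rows of (r, g, b) pixels to rows of grey levels.

    Uses the ITU-R BT.601 luma weights; the weighting actually used in the
    paper is not stated, so this choice is an assumption.
    """
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
        for row in image
    ]


def resize_nearest(image, out_width, out_height):
    """Resize a 2D image with nearest-neighbour sampling."""
    in_height, in_width = len(image), len(image[0])
    return [
        [
            image[(y * in_height) // out_height][(x * in_width) // out_width]
            for x in range(out_width)
        ]
        for y in range(out_height)
    ]


def preprocess(image, size=640):
    """Grey-scale conversion followed by rescaling to size x size pixels."""
    return resize_nearest(to_grayscale(image), size, size)


if __name__ == "__main__":
    tiny = [[(255, 255, 255), (0, 0, 0)], [(255, 0, 0), (0, 0, 255)]]
    out = preprocess(tiny)
    print(len(out), len(out[0]))
```

Applying the same `preprocess` function to every screenshot gives all images a single channel and a common resolution, which is the consistency property the paragraph above refers to.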
This led to an AP equal to 0.961, which is considered an excellent performance (D7, Table 2). Values of $\mathrm{AP}_{\mathrm{IoU}=0.50}$ were identical to AP, meaning that detections had a very high overlapping with the ground truth, and the ones used to calculate AP and $\mathrm{AP}_{\mathrm{IoU}=0.50}$ were the same.
Similar considerations apply to the AR, showing that the model trained with dataset D7 could detect 97.5 % of the objects in the testing images (see Fig. 5). However, the main emerging issue was the occurrence of false positives.
Compared to alternative models (e.g., the YOLO class of models [12]), the DETR model demonstrated superior performances, experiencing a minimal number of false positives for non-existing instances (Fig. 5, bottom-right). However, multiple detections of the same pallet with different bounding boxes occurred (Fig. 6). The associated confidence score of these detections is usually very high (see the numbers in the detected bounding boxes in Fig. 6). Adjusting this parameter, i.e., increasing the threshold confidence of detections, is expected to reduce the number of multiple detections. However, completely eliminating this issue may not be possible, as multiple detections also occur with high confidence (Fig. 6, bottom-left). Further investigations will be conducted to address this by leveraging non-maximum suppression (NMS) thresholds [17].
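The combination of a confidence threshold and NMS mentioned above can be sketched as follows. The paper leaves NMS to further investigation, so this is only an illustration of the generic technique (greedy, class-agnostic NMS), not the authors' procedure; the function name `filter_detections` and both threshold values are hypothetical.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x_min, y_min, x_max, y_max)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def filter_detections(detections, score_threshold=0.9, iou_threshold=0.5):
    """Drop low-confidence detections, then apply greedy non-maximum suppression.

    detections: list of (box, confidence) pairs.
    A detection is kept only if it does not overlap an already kept,
    higher-confidence detection by more than `iou_threshold`.
    """
    kept = []
    for box, score in sorted(detections, key=lambda d: d[1], reverse=True):
        if score < score_threshold:
            continue
        if all(iou(box, kept_box) <= iou_threshold for kept_box, _ in kept):
            kept.append((box, score))
    return kept


if __name__ == "__main__":
    dets = [
        ((0, 0, 100, 100), 0.99),
        ((5, 5, 105, 105), 0.97),      # near-duplicate of the first box
        ((300, 300, 400, 400), 0.95),
        ((0, 0, 50, 50), 0.40),        # below the confidence threshold
    ]
    print(filter_detections(dets))
```

The example shows why the confidence threshold alone is not sufficient: the near-duplicate box has a confidence of 0.97 and is removed only by the overlap test.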
Fig. 4. Characterisation of the test dataset according to the dimension and position of the objects (bounding boxes) to be detected.
Table 1
Datasets.
Dataset   Description                                                     # images
D1        Realistic environment                                           100
D2        Realistic environment + gray-scaling                            100
D3        Realistic environment + gray-scaling + rescaling                100
D4        Complex environment                                             100
D5        Complex environment + gray-scaling                              100
D6        Realistic + Complex environment + gray-scaling                  200
D7        Realistic + Complex environment + gray-scaling + rescaling      200
Table 2
KPIs of DETR models trained with different datasets.
Dataset   AP (IoU=0.50)   AP      AR
D1        0.917           0.917   0.952
D2        0.933           0.933   0.957
D3        0.921           0.921   0.952
D4        0.870           0.870   0.900
D5        0.891           0.891   0.923
D6        0.957           0.957   0.969
D7        0.961           0.961   0.975
Fig. 5. Examples of correct detections.
Fig. 6. Examples of multiple or wrong detections.
4. Industrial case
The proposed computer vision workflow has been applied to support automatic quality control and anomaly detection in an industrial case. The focus is on electric motors, where wound coils must be assembled in the stator. The quality problem concerns the winding process, where a copper wire is wound around the coil. This can cause the insulation paper to be included within the copper filament and compromise the insulation of the copper from the aluminium of the lamination stacks (Fig. 7, top).
As these anomalies cannot be recognised during the winding process, non-conform coils would be assembled into the stator, causing the motor to be defective. The objective is to use a computer vision approach to identify anomalies (the defective coils) at the end of the winding process when they are still on the winding machine (Fig. 7, bottom) and discard them.
Since multiple types of the same defect can occur, training the model to detect each would be complicated. Thus, a different approach was used. First, a DETR model has been trained (activity A3, Fig. 1) to detect coils using both synthetically generated (activity A2, Fig. 1) and real images of conforming and non-conforming parts, thus training it to detect the coils without discriminating between conforming and non-conforming ones. Hence, leveraging DETR capability to segment an image (activity A4, Fig. 1), the background is removed from real images, obtaining new coil images. These are further partitioned into conform and non-conform ones; thus, a VGG-16 model [15] can be trained for image classification (activity A5, Fig. 1).
This model was trained for 10 epochs on a dataset of 400 images (50 % real and 50 % synthetically generated), also using data augmentation techniques leveraging a rotation of ±10°; up to 10 % of zoom, width and height shifts; horizontal and vertical flipping; reaching an accuracy of 93.2 % (training) and 82.3 % (validation). The DETR and image classification models trained so far were then used for the described industrial case. Given an image of a coil after the winding phase, the DETR model segments the image and removes the background. The obtained image is analysed through the image classification model to detect non-conform coils.
The testing was carried out on 190 images captured from the industrial environment, with different resolutions, framing and orientation of the coils, leading to the results summarised in Fig. 8 (left). The top-left cell in the matrix represents true negatives, i.e., images of conforming coils that are correctly predicted. The bottom-right cell corresponds to the fraction of true positives, i.e., defective coils correctly predicted as defective. The top-right cell represents the fraction of false positive cases, coils erroneously predicted as non-conforming. In contrast, the bottom-left cell reports false negatives, i.e., coils are incorrectly predicted to conform when they are not. Based on this confusion matrix, the precision obtained was 93.8 %, recall 80.0 %, accuracy 87.4 % and specificity (i.e., the true negative rate) equal to 94.7 %.
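The four reported figures follow from the standard confusion-matrix formulas, as the sketch below shows. The cell values of Fig. 8 are not available in the text, so the counts used in the example (76 true positives, 5 false positives, 90 true negatives, 19 false negatives) are hypothetical: they are one integer split of the 190 test images that reproduces the reported percentages, not values read from the figure.

```python
def classification_metrics(tp, fp, tn, fn):
    """Precision, recall, accuracy and specificity from confusion-matrix counts.

    The positive class is "defective (non-conforming) coil".
    """
    return {
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "specificity": tn / (tn + fp),
    }


if __name__ == "__main__":
    # Hypothetical counts consistent with the reported percentages (190 images).
    metrics = classification_metrics(tp=76, fp=5, tn=90, fn=19)
    for name, value in metrics.items():
        print(f"{name}: {100 * value:.1f} %")
```

Under these assumed counts, only 5 of 95 conforming coils would be discarded unnecessarily, while 19 of 95 defective coils would be missed by the classifier.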
A traditional approach based on edge detection was also implemented. A subset of conforming and defective coils was used to define reference classes, with classification performed using cosine similarity [20]. Due to the need for consistent image dimensions and framing, this method was applied to a limited sample set. A region of interest was defined, focusing on the bottom insulation, along with a threshold to trigger the edge detection. Fig. 8 (right) shows that the edge detection method achieved 64 % precision, 90 % recall, 70 % accuracy, and 50 % specificity.
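The classification step of this baseline can be sketched as a nearest-reference rule under cosine similarity. The example assumes that each coil image has already been reduced to a numeric feature vector (for instance, statistics of the edges found in the region of interest); how the features were actually extracted is not described in the text, so the vectors, class names and function names below are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equally long feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def classify(features, references):
    """Return the label of the reference vector most similar to `features`.

    references: dict mapping a class label to its reference feature vector.
    """
    return max(references, key=lambda label: cosine_similarity(features, references[label]))


if __name__ == "__main__":
    # Hypothetical reference vectors for the two classes.
    references = {
        "conforming": [1.0, 0.0, 0.2],
        "defective": [0.1, 1.0, 0.8],
    }
    print(classify([0.9, 0.1, 0.1], references))
    print(classify([0.0, 0.8, 0.9], references))
```

Because the rule compares whole feature vectors against fixed references, any change in image dimensions or framing shifts the features, which is consistent with the requirement for consistent images stated above.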
Although the recall was higher than that of the image classification approach, the low specificity is a critical issue, with half of the conforming coils misclassified as defective. This is mainly due to the flexibility of the insulation paper, which can affect the perceived shape, even if it has not been wound together with the copper filament. Fig. 9 shows an example where paper bending led to a false positive. This is problematic because the company prioritises minimising false positives: these coils are discarded immediately, leading to material waste and added labour. In contrast, false negatives can be easily caught during inspection before subsequent assembly stages.
5. Conclusions
The proposed workflow leverages the strengths of synthetic data generation and ViTs to achieve improved object detection and image classification performance by leveraging segmentation. Combining synthetic and real datasets proved effective for training ViTs, achieving performance superior to the previous generation of object detection models [13], demonstrating that synthetic data can mitigate the impact of limited data. The industrial case study further affirmed the applicability of the proposed workflow in quality control applications. Nevertheless, assessing performance in real environments deserves further investigation. Future research will also address a more general and structured workflow and tools to support applications in other manufacturing contexts.
Declaration of interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
CRediT authorship contribution statement
Marcello Urgo: Writing – review & editing, Writing – original draft, Visualization, Validation, Software, Methodology, Investigation, Formal analysis, Data curation, Conceptualization. Walter Terkaj: Writing – original draft, Visualization, Software, Methodology, Investigation, Formal analysis, Data curation, Conceptualization.
Acknowledgements
The authors thank B.C. Mo lock, B. Böhm, G. P acella, E. Sala, and E. Scal i for their support and contributions to datasets and methodology, and an anonymous company for the industrial case. This research has been partially funded by the EU Horizon Europe programme under GA 101058505 and 101138930.
Fig. 7. Example of a non-conform coil (top) and a conform coil in the winding machine (bottom).
Fig. 8. Results of the testing on the industrial case using image classification (left) and a traditional edge detection approach (right).
Fig. 9. Example of a conform coil classified as non-conform through the edge detection approach and cosine similarity.
References
[1] Gao RX, Krüger J, Merklein M, Möhring H-C, Váncza J (2024) Artificial Intelligence in Manufacturing: State of the Art, Perspectives, and Future Directions. CIRP Annals 73(2):723–749.
[2] Cao J, Bambach M, Merklein M, Mozaffar M, Xue T (2024) Artificial Intelligence in Metal Forming. CIRP Annals 73(2):561–587.
[3] Krüger J, Lehr J, Schlüter M, Bischoff N (2019) Deep Learning for Part Identification Based on Inherent Features. CIRP Annals 68(1):9–12.
[4] Urgo M, Berardinucci F, Zheng P, Wang L (2024) AI-Based Pose Estimation of Human Operators in Manufacturing Environments. in Tolio T, (Ed.) CIRP Novel Topics in Production Engineering, Lecture Notes in Mechanical Engineering, 1, Springer, Cham, 3–38.
[5] Caggiano A, Zhang J, Alfieri V, Caiazzo F, Gao R, Teti R (2019) Machine Learning-based Image Processing for on-Line Defect Recognition in Additive Manufacturing. CIRP Annals 68(1):451–454.
[6] Zhang Y, Shan S, Frumosu FD, Calaon M, Yang W, Liu Y, Hansen HN (2022) Automated Vision-based Inspection of Mould and Part Quality in Soft Tooling Injection Moulding Using Imaging and Deep Learning. CIRP Annals 71(1):429–432.
[7] Han K, Wang Y, Chen H, Chen X, Guo J, Liu Z, Tang Y, Xiao A, Xu C, Xu Y, Yang Z, Zhang Y, Tao D (2023) A Survey on Vision Transformer. IEEE Trans Pattern Anal Mach Intell 45:87–110.
[8] Carion N, Massa F, Synnaeve G, Usunier N, Kirillov A, Zagoruyko S (2020) End-to-End Object Detection with Transformers. in Vedaldi A, Bischof H, Brox T, Frahm J-M, (Eds.) Computer Vision – ECCV 2020, Springer International Publishing, Cham, 213–229.
[9] Terkaj W, Annoni M, Martinez BO, Pessot E, Sortino M, Urgo M (2024) Digital Twin for Factories: Challenges and Industrial Applications, Springer Nature Switzerland, Cham, 255–274.
[10] Terkaj W, Gaboardi P, Trevisan C, Tolio T, Urgo M (2019) A Digital Factory Platform for the Design of Roll Shop Plants. CIRP Journal of Manufacturing Science and Technology 26:88–93.
[11] Nassehi A, Colledani M, Kádár B, Lutters E (2022) Daydreaming Factories. CIRP Annals 71(2):671–692.
[12] Urgo M, Terkaj W (2020) Formal Modelling of Release Control Policies as a Plug-in for Performance Evaluation of Manufacturing Systems. CIRP Annals 69(1):377–380.
[13] Urgo M, Terkaj W, Simonetti G (2024) Monitoring Manufacturing Systems Using AI: A Method Based on a Digital Factory Twin to Train CNNs on Synthetic Data. CIRP Journal of Manufacturing Science and Technology 50:249–268.
[14] Kádár B, Terkaj W, Sacco M (2013) Semantic Virtual Factory Supporting Interoperable Modelling and Evaluation of Production Systems. CIRP Annals 62(1):443–446.
[15] Simonyan K, Zisserman A (2015) Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv:1409.1556. https://arxiv.org/abs/1409.1556.
[16] Jhang Y-C, Palmar A, Li B, Dhakad S, Vishwakarma SK, Hogins J, Crespi A, Kerr C, Chockalingam S, Romero C, Thaman A, Ganguly S (2020) Training a Performant Object Detection ML Model on Synthetic Data Using Unity Perception Tools. https://blogs.unity3d.com/2020/09/17/training-a-performant-object-detection-ml-model-on-synthetic-data-using-unity-computer-vision-tools/.
[17] Wolf T, Debut L, Sanh V, Chaumond J, Delangue C, Moi A, Cistac P, Ma C, Jernite Y, Plu J, Xu C, Le Scao T, Gugger S, Drame M, Lhoest Q, Rush AM (2024) Transformers: State-of-the-Art Natural Language Processing. https://github.com/huggingface/transformers.
[18] Falcon W, The PyTorch Lightning Team (2024) PyTorch Lightning v. 2.5. https://github.com/Lightning-AI/pytorch-lightning.
[19] Roboflow (2024) Supervision, 0.25.0. https://github.com/roboflow/supervision.
[20] Peng Y, Ruan S, Cao G, Huang S, Kwok N, Zhou S (2019) Automated Product Boundary Defect Detection Based on Image Moment Feature Anomaly. IEEE Access 7:52731–52742.