AI on the Edge: An Automated Pipeline for PyTorch-to-Android Deployment and Benchmarking

Author: Saif U Din; Hussain, Muhammad Ahsan; Ikram, Mohsin; Ignatov, Dmitry; Timofte, Radu
Publisher: Zenodo
DOI: 10.5281/zenodo.17684575
Source: https://zenodo.org/records/17684575/files/AI_on_the_edge_automatic_pipeline-1.pdf
AI on the Edge: An Automated Pipeline for PyTorch-to-Android Deployment and Benchmarking
Saif U Din*, Muhammad Ahsan Hussain, Mohsin Ikram, Dmitry Ignatov, Radu Timofte
Computer Vision Lab, CAIDAS & IFI, University of Würzburg, Germany
Abstract
The deployment of deep learning models on mobile devices is a cornerstone of modern AI applications. However, performance benchmarking in this domain remains a predominantly manual, time-consuming, and non-scalable process. This paper introduces a fully automated, end-to-end pipeline, NN Lite (https://github.com/ABrain-One/nn-lite), that bridges the critical gap between model development in PyTorch and rigorous performance evaluation on the Android platform. Our system comprises a Python-based orchestration framework that manages model conversion, emulator control, and data collection, working in tandem with a lightweight Android application for on-device benchmarking. The orchestrator systematically converts PyTorch models to TensorFlow Lite format, deploys the benchmark application, executes inference tests, and retrieves detailed performance reports. The output is a collection of structured JSON reports containing precise inference latency metrics enriched with device-specific hardware analytics. This framework eliminates manual intervention, ensures reproducibility, and provides a scalable solution for evaluating the on-device performance of diverse neural network architectures. In a large-scale evaluation, the system successfully processed over 7,500 models and sustained 48+ hours of continuous unattended operation, thereby establishing a new standard for automated mobile ML testing infrastructure.
1. Introduction
The proliferation of machine learning (ML) on mobile devices has created an urgent need for automated frameworks to validate model performance in diverse and resource-constrained hardware environments. While frameworks like PyTorch Mobile [10] and TensorFlow Lite [4] have simplified the deployment process, the benchmarking phase often remains a manual bottleneck. Researchers and developers must individually convert, deploy, and test models, a process that is not only tedious but also prone to inconsistencies, making large-scale comparative studies impractical.
*Corresponding author: [email protected]g.de
Existing work, such as that by Goodarzi et al. [9], highlights the complexity of the mobile performance landscape, while Kochnev et al. [13] have explored efficient model design. However, a comprehensive, automated system for end-to-end evaluation is still lacking. This gap hinders rapid iteration and data-driven decision making for mobile ML deployment.
To address this challenge, we present a fully automated system for continuous model deployment and validation. Our framework features a Python-based infrastructure that handles the end-to-end conversion of PyTorch models to TensorFlow Lite, systematic benchmarking on Android emulators, and comprehensive analytics collection. A key innovation is its state management, which enables resumable processing of thousands of models and sophisticated failure recovery mechanisms. This integrated solution ensures reliable deployment and validation of ML models across mobile platforms, providing a scalable foundation for scientific research and industrial application. The system's efficacy is demonstrated by its ability to process over 7,500 models and run continuously for more than 48 hours without manual intervention.
2. Related Work
Our work sits at the intersection of model conversion, mobile benchmarking, and continuous testing.
2.1. Model Conversion and Optimization.
The challenge of deploying neural networks on mobile devices has been addressed by several conversion frameworks. TensorFlow Lite [4] established the standard for mobile-optimized model formats, providing quantization and hardware acceleration features. Similarly, PyTorch Mobile [10] enabled direct deployment of PyTorch models on edge devices. Our conversion pipeline builds upon these foundations but adds automated batch processing capabilities missing from standard single-model conversion tools. The AI Edge Torch library [11] we employ represents a significant advancement in cross-framework compatibility, facilitating the PyTorch-to-TFLite transition.
2.2. Mobile ML Benchmarking.
Several benchmarking frameworks exist for mobile machine learning. MLPerf Mobile [12] provides standardized benchmarks for mobile AI performance across different hardware platforms. AI Benchmark [6] offers a comprehensive evaluation of mobile AI accelerators. However, these solutions primarily focus on a preselected set of model architectures and lack the flexibility for custom, user-defined model pipelines. Our work extends these concepts by enabling automated benchmarking of any PyTorch model architecture with integrated, low-level device analytics.
2.3. Continuous Testing.
The concept of continuous testing for mobile applications has been explored in tools like Firebase Test Lab [3] and AWS Device Farm. These platforms provide cloud-based testing on physical devices but lack specialized support for the complete ML model validation lifecycle, from conversion to performance analysis. Our system addresses this gap by combining intelligent emulator management with model-specific validation routines, creating a specialized continuous testing pipeline for mobile ML applications.
3. Methodology
Our approach employs a multi-stage automated pipeline for continuous model conversion, deployment, and validation on mobile devices (Figure 1). The system follows a modular architecture with distinct components for model processing, device management, and performance benchmarking. This modular design enables independent scaling of each subsystem while maintaining loose coupling through well-defined interfaces. The pipeline incorporates intelligent state persistence and self-healing mechanisms to ensure fault tolerance during large-scale, long-running evaluation sessions.
3.1. System Architecture
The core of our system is a Python-based orchestration framework that coordinates the entire workflow, as illustrated in Figure 2. It is designed with a microservices-inspired architecture within a single process, ensuring modularity, maintainability, and extensibility.
3.2. Model Conversion Framework
The conversion process is a critical first step. We dynamically instantiate models from a database, applying intelligent filtering to select mobile-friendly configurations (e.g., small batch sizes). The core innovation is the NHWCWrapper class, which resolves the fundamental tensor layout disparity between PyTorch (NCHW) and TensorFlow Lite (NHWC). This challenge of cross-framework deployment has been noted in prior work on mobile deep learning optimization [5].
Figure 1. End-to-End Automated Benchmarking Workflow. The system manages the entire lifecycle from model loading and conversion to emulator testing, analytics collection, and state-persistent reporting.
    import torch

    class NHWCWrapper(torch.nn.Module):
        def __init__(self, model):
            super().__init__()
            self.model = model  # wrapped model that expects NCHW input

        def forward(self, x):
            # Transform NHWC to NCHW format
            x_transformed = x.permute(0, 3, 1, 2).contiguous()
            return self.model(x_transformed)
This wrapper ensures memory layout optimization through the .contiguous() operation and enables zero-copy operations between frameworks, which is crucial for performance on mobile accelerators optimized for channels-last operations. The importance of memory layout optimization for efficient mobile inference aligns with findings in neural network specialization research [1].
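The layout disparity can be made concrete with plain index arithmetic. The following minimal sketch is illustrative and not part of NN Lite: it computes where one element lives in a flat buffer under each layout and reorders a small NHWC buffer into NCHW order, which is the reordering that permute(0, 3, 1, 2) followed by .contiguous() produces. All helper names are hypothetical.

```python
from itertools import product


def nhwc_offset(n, h, w, c, H, W, C):
    """Flat index of element (n, h, w, c) in a channels-last (NHWC) buffer."""
    return ((n * H + h) * W + w) * C + c


def nchw_offset(n, c, h, w, C, H, W):
    """Flat index of element (n, c, h, w) in a channels-first (NCHW) buffer."""
    return ((n * C + c) * H + h) * W + w


def nhwc_to_nchw(flat, N, H, W, C):
    """Reorder a flat NHWC buffer into NCHW order without changing any value."""
    out = [None] * (N * H * W * C)
    for n, h, w, c in product(range(N), range(H), range(W), range(C)):
        out[nchw_offset(n, c, h, w, C, H, W)] = flat[nhwc_offset(n, h, w, c, H, W, C)]
    return out


pixels = list(range(12))  # one 2x2 image with 3 channels, stored as NHWC
reordered = nhwc_to_nchw(pixels, N=1, H=2, W=2, C=3)
```

For this 1×2×2×3 input numbered 0 to 11 in NHWC order, the result groups the values by channel: [0, 3, 6, 9, 1, 4, 7, 10, 2, 5, 8, 11].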
3.3. Automated Benchmarking Infrastructure
The benchmarking infrastructure provides a reproducible testing environment.
Emulator Management: Our system implements dynamic Android Virtual Device (AVD) selection and intelligent lifecycle management. It features a sophisticated boot sequence monitor that polls the sys.boot_completed system property to ensure the Android environment is fully initialized before testing begins.
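A boot monitor of this kind can be sketched as a polling loop with a timeout. The sketch below is an illustration under stated assumptions rather than the NN Lite implementation: the function names, the default serial emulator-5554, and the timeout values are hypothetical. Only the adb shell getprop sys.boot_completed query, which prints 1 once boot has finished, is standard Android tooling.

```python
import subprocess
import time
from typing import Callable


def read_boot_completed(serial: str = "emulator-5554") -> str:
    """Return the sys.boot_completed property, or '' if adb is unavailable."""
    try:
        result = subprocess.run(
            ["adb", "-s", serial, "shell", "getprop", "sys.boot_completed"],
            capture_output=True, text=True, timeout=10,
        )
    except (OSError, subprocess.TimeoutExpired):
        return ""
    return result.stdout.strip()


def wait_for_boot(
    probe: Callable[[], str] = read_boot_completed,
    timeout_s: float = 300.0,   # assumed upper bound on emulator boot time
    poll_s: float = 2.0,        # assumed polling interval
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll until the probe reports '1' or the timeout expires."""
    deadline = clock() + timeout_s
    while clock() < deadline:
        if probe() == "1":
            return True
        sleep(poll_s)
    return False
```

The probe, sleep, and clock are injectable so that the loop can be exercised without a running emulator.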
Performance Measurement: The pipeline automates model deployment, benchmark execution via Android intents, and result collection. A key enhancement is the automatic correlation of benchmark results with comprehensive device analytics (memory, CPU) before generating the final report, providing rich context for performance analysis.
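One way to attach device context to a benchmark result is sketched below. It assumes, hypothetically, that memory figures are taken from the text of /proc/meminfo (for example via adb shell cat /proc/meminfo) and stored in kB. The function names and the *_kb keys are illustrative and differ from the report fields produced by NN Lite.

```python
def parse_meminfo(text: str) -> dict:
    """Parse '/proc/meminfo'-style lines such as 'MemTotal:  2013400 kB'."""
    fields = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            fields[key.strip()] = int(parts[0])  # value as reported, in kB
    return fields


def build_report(model_name: str, duration, meminfo_text: str, cpu_cores: int) -> dict:
    """Merge one benchmark result with the device context captured alongside it."""
    mem = parse_meminfo(meminfo_text)
    return {
        "model_name": model_name,
        "duration": duration,
        "device_analytics": {
            "memory": {
                "total_ram_kb": mem.get("MemTotal"),
                "available_ram_kb": mem.get("MemAvailable"),
                "cached_kb": mem.get("Cached"),
            },
            "cpu": {"cores": cpu_cores},
        },
    }
```

Keeping the raw kB integers in the report, rather than pre-formatted strings, makes later numerical analysis simpler.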
[Figure 2 diagram labels: NN Lite: Automated PyTorch-to-Android Deployment Pipeline; Input Models; LEMUR Dataset; Python Orchestrator Framework; State Management & Failure Recovery; Model Conversion Framework; NHWC Wrapper (NCHW → NHWC); TensorFlow Lite Conversion; APK Deployment; Android Emulator Management; AVD Boot Monitoring; Android Benchmark Application; On-Device Inference Engine; Device Analytics Collection; Performance Metrics; Output Performance Reports.]
Figure 2. NN Lite: End-to-End Automated PyTorch-to-Android Deployment Pipeline. The system architecture comprises Python-based orchestration, model conversion framework, and Android benchmarking components that work together to automate the entire workflow from model conversion to performance reporting.
3.4. State Management and Failure Recovery
To ensure reliability over long durations, the system incorporates a sophisticated state management system.
Continuous Processing: A state file persists the processing context (processed, failed, and current models) using atomic writes to prevent corruption. This allows the system to resume seamlessly after interruptions.
Failure Recovery: Upon a benchmark failure, the system initiates a multi-stage recovery protocol: a configurable cooling-off period (e.g., 3 minutes) to allow system resources to stabilize, a comprehensive cleanup of emulator and ADB processes, and a controlled process restart using Python's os.execv() for complete process rejuvenation while preserving the original state and command-line arguments.
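The resumable state file can be sketched with the common pattern of writing to a temporary file and then renaming it over the target. This is a minimal illustration under stated assumptions, not the NN Lite code: the JSON schema with processed, failed, and current keys mirrors the description above, while the function names and file name are hypothetical.

```python
import json
import os
import tempfile


def save_state(path: str, state: dict) -> None:
    """Write the state so that readers see either the old or the new file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(state, handle)
            handle.flush()
            os.fsync(handle.fileno())   # push the bytes to disk before renaming
        os.replace(tmp_path, path)      # atomic rename within one filesystem
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


def load_state(path: str) -> dict:
    """Load the saved state, or start fresh if no state file exists yet."""
    try:
        with open(path, encoding="utf-8") as handle:
            return json.load(handle)
    except FileNotFoundError:
        return {"processed": [], "failed": [], "current": None}


def pending_models(all_models: list, state: dict) -> list:
    """Models still to run; an interrupted 'current' model is retried."""
    done = set(state["processed"]) | set(state["failed"])
    return [name for name in all_models if name not in done]
```

Because the rename either happens completely or not at all, an interruption during a write leaves the previous state file intact, which is what makes resumption safe.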
4. Experiments
4.1. Training and Testing
Our experimental evaluation leverages the comprehensive LEMUR Neural Network Dataset [2] (https://github.com/ABrain-One/nn-dataset/), which provides a diverse collection of neural network architectures for automated machine learning research. This dataset, comprising over 7,500 unique models, serves as the foundation for our scalability and performance analysis. To further expand architectural diversity, we incorporate models generated through the NN-GPT (https://github.com/ABrain-One/nn-gpt) framework [8], which utilizes large language models for neural architecture generation, and the LEMUR 2 dataset [7] that unlocks additional neural network variants. Additionally, we draw inspiration from hyperparameter optimization approaches explored in HPGPT [8], which investigates LLMs for automated hyperparameter tuning, ensuring our benchmarking covers both architectural and parametric variations in model design. The integration of these diverse model families enables comprehensive validation of our automated deployment system across a wide spectrum of neural network designs, from traditional architectures to AI-generated models.
Our experimental setup focused on evaluating the pipeline using this comprehensive model collection. The experiments were conducted on a Linux workstation equipped with an Intel i7 processor, 32GB RAM, and an NVIDIA RTX 3080 GPU. The target platform for all benchmarks was the Android operating system, tested using the system's automated emulator management on a standard Android Virtual Device (AVD) with a predefined configuration (e.g., Pixel 4 profile, API level 30).
No traditional training was involved; instead, the testing phase consisted of the end-to-end execution of our automated pipeline for each model: conversion to TFLite, deployment to the emulator, on-device inference benchmarking, and report generation.
4.2. Evaluation Metrics
The primary metrics for evaluation were both operational and performance-based:
• Throughput: The number of models processed per unit time (models/hour).
• Reliability: The ability to complete long-running sessions without manual intervention, measured by continuous uptime.
• Inference Latency: The average time taken for a single forward pass of the model on the mobile device, measured in milliseconds.
• System Resource Utilization: Device analytics collected included available memory, cached memory, and detailed CPU architecture information, providing context for the performance metrics.
5. Results and Discussion
5.1. Operational Efficiency and Scalability
The system demonstrated exceptional operational efficiency and scalability. Over a sustained period, it successfully processed a total of 7,562 PyTorch models through the complete pipeline. The average processing time per model ranged from 60 to 90 seconds. This includes the full workflow: model conversion, emulator boot (if necessary), APK deployment, benchmark execution, analytics collection, and report generation. This high throughput enables large-scale model evaluation campaigns that would be infeasible manually.
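The reported per-model times translate into throughput as follows. The short calculation below assumes strictly sequential processing and that the 60 to 90 second range applies uniformly to all models; the function names are illustrative.

```python
SECONDS_PER_HOUR = 3600


def models_per_hour(seconds_per_model: float) -> float:
    """Throughput implied by an average per-model processing time."""
    return SECONDS_PER_HOUR / seconds_per_model


def total_hours(n_models: int, seconds_per_model: float) -> float:
    """Cumulative pipeline time needed to process n_models one after another."""
    return n_models * seconds_per_model / SECONDS_PER_HOUR


for seconds in (60, 90):
    print(
        f"{seconds} s/model: {models_per_hour(seconds):.0f} models/hour, "
        f"{total_hours(7562, seconds):.0f} h for 7,562 models"
    )
```

Under these assumptions the pipeline handles 40 to 60 models per hour, and processing all 7,562 models corresponds to roughly 126 to 189 hours of cumulative pipeline time.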
5.2. Failure Recovery
The framework successfully completed a continuous, unattended operation session lasting over 48 hours. This was made possible by the sophisticated failure recovery mechanism. During the large-scale run, the system encountered and automatically recovered from 47 transient failures (e.g., emulator timeouts, ADB disconnections). In each case, the recovery protocol (waiting, comprehensive cleanup, and process restart) was triggered successfully, allowing the pipeline to continue from the last saved state without human intervention.
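The three recovery stages can be sketched as follows. This is an illustrative outline, not the NN Lite implementation: the function names are hypothetical, the cleanup commands adb emu kill and adb kill-server are one plausible choice, and every stage is injectable so that the sequence can be exercised without an emulator. Because os.execv replaces the running process, any state must already be persisted before the restart stage is reached.

```python
import os
import subprocess
import sys
import time


def cleanup_emulator_and_adb(run=subprocess.run) -> None:
    """Ask the emulator to exit, then stop the adb server (assumed cleanup steps)."""
    for command in (["adb", "emu", "kill"], ["adb", "kill-server"]):
        try:
            run(command, capture_output=True, timeout=30, check=False)
        except (OSError, subprocess.TimeoutExpired):
            pass  # best effort: a missing or hung adb must not block recovery


def restart_process(execv=os.execv) -> None:
    """Replace this process with a fresh interpreter running the same command line."""
    execv(sys.executable, [sys.executable] + sys.argv)


def recover(
    cooldown_s: float = 180.0,  # the 3-minute example value from the text
    sleep=time.sleep,
    cleanup=cleanup_emulator_and_adb,
    restart=restart_process,
) -> None:
    """Run the stages in order: cool off, clean up, restart."""
    sleep(cooldown_s)
    cleanup()
    restart()
```

Restarting the whole interpreter rather than retrying in place discards leaked handles and stale subprocess references along with the failed run.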
5.3. Comprehensive Data Collection
The pipeline generated a rich dataset of over 15,000 individual data points. Each data point includes not only the model's inference latency but is also enriched with the device's performance context at the time of execution. This multi-dimensional data allows for sophisticated analysis, such as correlating performance degradation with low available memory or identifying architecture-specific optimization opportunities. An example of the enriched report structure is shown below:
    {
      "model_name": "AirNet",
      "device_type": "sdk_gphone64_x86_64",
      "os_version": "Android 14 (Hedgehog)",
      "valid": true,
      "emulator": true,
      "duration": 360,
      "device_analytics": {
        ...
        "memory": {
          "total_ram": "1.92 GB",
          "available_ram": "800 MB",
          "cached": "880 MB"
          ...
        },
        "cpu": {
          "architecture": "x86_64",
          "cores": 8,
          "vendor": "AMD",
          "features": ["SSE4.2", "AVX", "AES"]
          ...
        }
      }
    }
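Reports of this shape can be post-processed with a few lines of code. The sketch below is illustrative only: it assumes the elided fields are removed so that the report is valid JSON, that sizes are strings of the form "<number> <unit>", and that units are binary (1 MB = 1024² bytes). The function names and the trimmed sample are hypothetical.

```python
import json

BYTES_PER_UNIT = {"KB": 1024, "MB": 1024**2, "GB": 1024**3}  # assumed binary units

# Trimmed copy of the example report, without the elided fields.
SAMPLE_REPORT = """
{
  "model_name": "AirNet",
  "duration": 360,
  "device_analytics": {
    "memory": {"total_ram": "1.92 GB", "available_ram": "800 MB", "cached": "880 MB"},
    "cpu": {"architecture": "x86_64", "cores": 8}
  }
}
"""


def parse_size(text: str) -> int:
    """Convert a string such as '800 MB' into a byte count."""
    value, unit = text.split()
    return int(float(value) * BYTES_PER_UNIT[unit.upper()])


def available_fraction(report: dict) -> float:
    """Share of total RAM that was available when the benchmark ran."""
    memory = report["device_analytics"]["memory"]
    return parse_size(memory["available_ram"]) / parse_size(memory["total_ram"])


report = json.loads(SAMPLE_REPORT)
fraction = available_fraction(report)
```

For the example report this gives an available-memory fraction of about 0.41, a value that can then be set against the measured duration across many reports.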
6. Conclusion
This paper presented a highly efficient automated framework for deploying and benchmarking machine learning models on mobile devices. Our system successfully addresses the critical gap between model development in PyTorch and performance validation on Android by automating the entire workflow from conversion to detailed reporting. The demonstrated ability to process thousands of models unattended, coupled with the deep, contextualized performance data it generates, represents a significant advancement over existing manual and semi-automated approaches. The framework establishes a new standard for automated ML testing infrastructure, enabling researchers and practitioners to perform large-scale, reproducible empirical studies on mobile model performance. Future work will focus on expanding support for physical device fleets, integrating more advanced power and thermal profiling, and extending the model conversion framework to support a wider range of operators and quantization schemes.
References
[1] Han Cai, Chuang Gan, Tianzhe Wang, Zhekai Zhang, and Song Han. Once-for-all: Train one network and specialize it for efficient deployment. In International Conference on Learning Representations (ICLR), 2020.
[2] Arash Torabi Goodarzi, Roman Kochnev, Waleed Khalid, Furui Qin, Tolgay Atinc Uzun, Yashkumar Sanjaybhai Dhameliya, Yash Kanubhai Kathiriya, Zofia Antonina Bentyn, Dmitry Ignatov, and Radu Timofte. Lemur neural network dataset: Towards seamless automl. arXiv preprint arXiv:2504.10552, 2025.
[3] Google LLC. Firebase Test Lab: Cloud-Based App Testing Infrastructure, 2016. Cloud-based infrastructure for automated mobile app testing.
[4] Google LLC. TensorFlow Lite: On-device machine learning framework, 2017. Open-source deep learning framework for mobile and edge devices.
[5] Song Han, Huizi Mao, and William J Dally. Deep compression: Compressing deep neural networks with pruning, trained quantization and huffman coding. International Conference on Learning Representations (ICLR), 2016.
[6] Andrey Ignatov, Radu Timofte, William Chou, Ke Wang, Max Wu, Tim Hartley, and Luc Van Gool. AI benchmark: Running deep neural networks on android smartphones. In European Conference on Computer Vision (ECCV), pages 288–314, 2018.
[7] Roman Kochnev, Arash Torabi Goodarzi, Zofia Antonina Bentyn, Dmitry Ignatov, and Radu Timofte. Optuna vs code llama: Are llms a new paradigm for hyperparameter tuning? In Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops (ICCVW), 2025.
[8] Roman Kochnev, Waleed Khalid, Tolgay Atinc Uzun, Xi Zhang, Yashkumar Sanjaybhai Dhameliya, Furui Qin, Dmitry Ignatov, and Radu Timofte. Nngpt: Rethinking automl with large language models. arXiv preprint, 2025.
[9] Dianshu Liao, Shidong Pan, Siyuan Yang, Yanjie Zhao, Zhenchang Xing, and Xiaoyu Sun. A comparative study of android performance issues in real-world applications and literature. arXiv preprint arXiv:2401.07849, 2024.
[10] Meta AI. PyTorch Mobile: End-to-end deployment solution for mobile and embedded devices. Meta AI Research, 2019. Official documentation and framework release.
[11] Google LLC. AI Edge Torch: PyTorch Library for Edge Device Deployment, 2024. Library for converting PyTorch models to the TensorFlow Lite format for on-device deployment.
[12] Vijay Janapa Reddi, David Kanter, Peter Mattson, Jared Duke, Thai Nguyen, Ramesh Chukka, Ken Shiring, Koan-Sin Tan, Mark Charlebois, William Chou, Mostafa El-Khamy, Jungwook Hong, Tom St. John, Cindy Trinh, Michael Buch, Mark Mazumder, Relia Markovic, Thomas Atta, Fatih Cakir, Masoud Charkhabi, Xiaodong Chen, Cheng-Ming Chiang, Dave Dexter, Terry Heo, Gunther Schmuelling, Maryam Shabani, and Dylan Zika. MLPerf mobile inference benchmark. In MLPerf. MLCommons, 2020.
[13] Md Ziaul Haque Zim. TinyML: Analysis of Xtensa LX6 microprocessor for neural network inference. Future Internet, 15(11):350, 2023.