AI on the Edge: An Automated Pipeline for PyTorch-to-Android Deployment and Benchmarking

Author: Saif U Din; Hussain, Muhammad Ahsan; Ikram, Mohsin; Ignatov, Dmitry; Timofte, Radu

Publisher: Zenodo

DOI: 10.5281/zenodo.17684575

Source: https://zenodo.org/records/17684575/files/AI_on_the_edge_automatic_pipeline-1.pdf

AI on the Edge: An Automated Pipeline for PyTorch-to-Android Deployment and Benchmarking
Saif U Din*, Muhammad Ahsan Hussain, Mohsin Ikram, Dmitry Ignatov, Radu Timofte
Computer Vision Lab, CAIDAS & IFI, University of Würzburg, Germany
Abstract
The deployment of deep learning models on mobile devices is a cornerstone of modern AI applications. However, performance benchmarking in this domain remains a predominantly manual, time-consuming, and non-scalable process. This paper introduces a fully automated, end-to-end pipeline, NN Lite (https://github.com/ABrain-One/nn-lite), that bridges the critical gap between model development in PyTorch and rigorous performance evaluation on the Android platform. Our system comprises a Python-based orchestration framework that manages model conversion, emulator control, and data collection, working in tandem with a lightweight Android application for on-device benchmarking. The orchestrator systematically converts PyTorch models to TensorFlow Lite format, deploys the benchmark application, executes inference tests, and retrieves detailed performance reports. The output is a collection of structured JSON reports containing precise inference latency metrics enriched with device-specific hardware analytics. This framework eliminates manual intervention, ensures reproducibility, and provides a scalable solution for evaluating the on-device performance of diverse neural network architectures. In a large-scale evaluation, the system successfully processed over 7,500 models, demonstrating exceptional reliability with 48+ hours of continuous unattended operation, thereby establishing a new standard for automated mobile ML testing infrastructure.
1. Introduction
The proliferation of machine learning (ML) on mobile devices has created an urgent need for automated frameworks to validate model performance in diverse and resource-constrained hardware environments. While frameworks like PyTorch Mobile [10] and TensorFlow Lite [4] have simplified the deployment process, the benchmarking phase often remains a manual bottleneck. Researchers and developers must individually convert, deploy, and test models, a process that is not only tedious but also prone to inconsistencies, making large-scale comparative studies impractical. Existing work, such as that by Goodarzi et al. [9], highlights the complexity of the mobile performance landscape, while Kochnev et al. [13] have explored efficient model design. However, a comprehensive, automated system for end-to-end evaluation is still lacking. This gap hinders rapid iteration and data-driven decision-making for mobile ML deployment.
To address this challenge, we present a fully automated system for continuous model deployment and validation. Our framework features a Python-based infrastructure that handles the end-to-end conversion of PyTorch models to TensorFlow Lite, systematic benchmarking on Android emulators, and comprehensive analytics collection. A key innovation is its state management, which enables resumable processing of thousands of models and sophisticated failure recovery mechanisms. This integrated solution ensures reliable deployment and validation of ML models across mobile platforms, providing a scalable foundation for scientific research and industrial application. The system's efficacy is demonstrated by its ability to process over 7,500 models and run continuously for more than 48 hours without manual intervention.
*Corresponding author: [email protected]g.de
2. Related Work
Our work sits at the intersection of model conversion, mobile benchmarking, and continuous testing.
2.1. Model Conversion and Optimization.
The challenge of deploying neural networks on mobile devices has been addressed by several conversion frameworks. TensorFlow Lite [4] established the standard for mobile-optimized model formats, providing quantization and hardware acceleration features. Similarly, PyTorch Mobile [10] enabled direct deployment of PyTorch models on edge devices. Our conversion pipeline builds upon these foundations but adds automated batch-processing capabilities missing from standard single-model conversion tools. The AI Edge Torch library [11] we employ represents a significant advancement in cross-framework compatibility, facilitating the PyTorch-to-TFLite transition.
2.2. Mobile ML Benchmarking.
Several benchmarking frameworks exist for mobile machine learning. MLPerf Mobile [12] provides standardized benchmarks for mobile AI performance across different hardware platforms. AI Benchmark [6] offers a comprehensive evaluation of mobile AI accelerators. However, these solutions primarily focus on a preselected set of model architectures and lack the flexibility for custom, user-defined model pipelines. Our work extends these concepts by enabling automated benchmarking of any PyTorch model architecture with integrated, low-level device analytics.
2.3. Continuous Testing.
The concept of continuous testing for mobile applications has been explored in tools like Firebase Test Lab [3] and AWS Device Farm. These platforms provide cloud-based testing on physical devices but lack specialized support for the complete ML model validation lifecycle, from conversion to performance analysis. Our system addresses this gap by combining intelligent emulator management with model-specific validation routines, creating a specialized continuous testing pipeline for mobile ML applications.
3. Methodology
Our approach employs a multi-stage automated pipeline for continuous model conversion, deployment, and validation on mobile devices (Figure 1). The system follows a modular architecture with distinct components for model processing, device management, and performance benchmarking. This modular design enables independent scaling of each subsystem while maintaining loose coupling through well-defined interfaces. The pipeline incorporates intelligent state persistence and self-healing mechanisms to ensure fault tolerance during large-scale, long-running evaluation sessions.
3.1. System Architecture
The core of our system is a Python-based orchestration framework that coordinates the entire workflow, as illustrated in Figure 2. It is designed with a microservices-inspired architecture within a single process, ensuring modularity, maintainability, and extensibility.
Figure 1. End-to-End Automated Benchmarking Workflow. The system manages the entire lifecycle from model loading and conversion to emulator testing, analytics collection, and state-persistent reporting.
3.2. Model Conversion Framework
The conversion process is a critical first step. We dynamically instantiate models from a database, applying intelligent filtering to select mobile-friendly configurations (e.g., small batch sizes). The core innovation is the NHWCWrapper class, which resolves the fundamental tensor layout disparity between PyTorch (NCHW) and TensorFlow Lite (NHWC). This challenge of cross-framework deployment has been noted in prior work on mobile deep learning optimization [5].
    import torch

    class NHWCWrapper(torch.nn.Module):
        def __init__(self, model):
            super().__init__()
            self.model = model  # wrapped model that expects NCHW input

        def forward(self, x):
            # Transform NHWC to NCHW format
            x_transformed = x.permute(0, 3, 1, 2).contiguous()
            return self.model(x_transformed)
This wrapper ensures memory layout optimization through the .contiguous() operation and enables zero-copy operations between frameworks, which is crucial for performance on mobile accelerators optimized for channels-last operations. The importance of memory layout optimization for efficient mobile inference aligns with findings in neural network specialization research [1].
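To make the axis reordering concrete, the following minimal sketch reproduces the permutation (0, 3, 1, 2) on plain nested lists, without any PyTorch dependency. The helper name `nhwc_to_nchw` and the sample values are hypothetical and serve only as an illustration; they are not part of NN Lite.

```python
def nhwc_to_nchw(batch):
    """Reorder a nested list from [N][H][W][C] to [N][C][H][W].

    Hypothetical helper for illustration only; it mirrors the axis order
    used by x.permute(0, 3, 1, 2) in the wrapper above.
    """
    n_size = len(batch)
    h_size = len(batch[0])
    w_size = len(batch[0][0])
    c_size = len(batch[0][0][0])
    return [
        [
            [[batch[n][h][w][c] for w in range(w_size)] for h in range(h_size)]
            for c in range(c_size)
        ]
        for n in range(n_size)
    ]


# One 2x2 image with 3 channels; each innermost list is one pixel.
image_nhwc = [[
    [[1, 2, 3], [4, 5, 6]],
    [[7, 8, 9], [10, 11, 12]],
]]
image_nchw = nhwc_to_nchw(image_nhwc)
print(image_nchw[0][0])  # channel 0 as an H x W grid: [[1, 4], [7, 10]]
```

After the reordering, each channel is stored as its own H x W plane, which is the layout a PyTorch model expects.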
3.3. Automated Benchmarking Infrastructure
The benchmarking infrastructure provides a reproducible testing environment.
Emulator Management: Our system implements dynamic Android Virtual Device (AVD) selection and intelligent lifecycle management. It features a sophisticated boot sequence monitor that polls the sys.boot_completed system property to ensure the Android environment is fully initialized before testing begins.
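A boot monitor of this kind can be sketched as a polling loop around `adb shell getprop sys.boot_completed`. The function names, the timeout and polling defaults, and the device serial below are assumptions made for illustration, not the NN Lite implementation. The property reader is injectable so that the sketch runs without an emulator.

```python
import subprocess
import time


def adb_getprop_boot_completed(serial):
    """Return the raw value of sys.boot_completed for one device."""
    result = subprocess.run(
        ["adb", "-s", serial, "shell", "getprop", "sys.boot_completed"],
        capture_output=True, text=True, timeout=10,
    )
    return result.stdout.strip()


def wait_for_boot(serial, timeout_s=300.0, poll_s=2.0,
                  read_prop=adb_getprop_boot_completed,
                  clock=time.monotonic, sleep=time.sleep):
    """Poll until the property reads "1" or the timeout expires."""
    deadline = clock() + timeout_s
    while clock() < deadline:
        try:
            if read_prop(serial) == "1":
                return True
        except (subprocess.SubprocessError, OSError):
            pass  # adb may not be reachable yet; keep polling
        sleep(poll_s)
    return False


# Demo with a fake reader: the property is empty twice, then "1".
_answers = iter(["", "", "1"])
booted = wait_for_boot("emulator-5554",
                       read_prop=lambda serial: next(_answers),
                       sleep=lambda seconds: None)
print(booted)  # True
```

Returning False on timeout, rather than raising, lets the caller decide whether to retry the boot or hand over to a recovery routine.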
Performance Measurement: The pipeline automates model deployment, benchmark execution via Android intents, and result collection. A key enhancement is the automatic correlation of benchmark results with comprehensive device analytics (memory, CPU) before generating the final report, providing rich context for performance analysis.
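The deploy, run, and collect steps map naturally onto a handful of adb invocations. The sketch below only builds the command lists. The package name, activity name, intent extra key, and device paths are hypothetical placeholders, and where the app can actually read the model and write its report depends on its storage permissions.

```python
import shlex

# Hypothetical identifiers, chosen only for illustration.
PACKAGE = "com.example.nnbench"
ACTIVITY = ".BenchmarkActivity"
DEVICE_DIR = "/data/local/tmp"


def build_benchmark_commands(serial, apk_path, model_path, model_name):
    """Return the adb invocations for one benchmark run, in order."""
    adb = ["adb", "-s", serial]
    device_model = f"{DEVICE_DIR}/{model_name}.tflite"
    device_report = f"{DEVICE_DIR}/{model_name}.json"
    return [
        # (Re)install the benchmark application.
        adb + ["install", "-r", apk_path],
        # Copy the converted model to the device.
        adb + ["push", model_path, device_model],
        # Launch the benchmark through an explicit intent with a string extra.
        adb + ["shell", "am", "start", "-n", f"{PACKAGE}/{ACTIVITY}",
               "--es", "model_path", device_model],
        # Fetch the JSON report back to the host.
        adb + ["pull", device_report, f"{model_name}.json"],
    ]


commands = build_benchmark_commands(
    "emulator-5554", "app.apk", "out/AirNet.tflite", "AirNet")
for command in commands:
    print(shlex.join(command))
```

Keeping command construction separate from execution makes the sequence easy to log and to test without a running device.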
[Figure 2 diagram. Labeled components: Python Orchestrator Framework, Android Benchmark Application, LEMUR Dataset, State Management & Failure Recovery, Model Conversion Framework, NHWC Wrapper (NCHW → NHWC), TensorFlow Lite Conversion, APK Deployment, Android Emulator Management, AVD Boot Monitoring, On-Device Inference Engine, Device Analytics Collection, Performance Metrics. Input: models. Output: performance reports.]
Figure 2. NN Lite: End-to-End Automated PyTorch-to-Android Deployment Pipeline. The system architecture comprises Python-based orchestration, model conversion framework, and Android benchmarking components that work together to automate the entire workflow from model conversion to performance reporting.
3.4. State Management and Failure Recovery
To ensure reliability over long durations, the system incorporates a sophisticated state management system.
Continuous Processing: A state file persists the processing context (processed, failed, and current models) using atomic writes to prevent corruption. This allows the system to resume seamlessly after interruptions.
Failure Recovery: Upon a benchmark failure, the system initiates a multi-stage recovery protocol: a configurable cooling-off period (e.g., 3 minutes) to allow system resources to stabilize, a comprehensive cleanup of emulator and ADB processes, and a controlled process restart using Python's os.execv() for complete process rejuvenation while preserving the original state and command-line arguments.
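One common way to obtain atomic writes for such a state file is to write to a temporary file in the same directory and then rename it over the target. The sketch below assumes a JSON state file with the keys `processed`, `failed`, and `current`. These field names and the function names are illustrative and are not taken from the NN Lite code.

```python
import json
import os
import tempfile


def save_state(path, state):
    """Write the state as JSON so readers never see a partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(state, handle, indent=2)
            handle.flush()
            os.fsync(handle.fileno())
        os.replace(tmp_path, path)  # atomic rename on the same filesystem
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


def load_state(path):
    """Return the saved state, or a fresh one when no file exists yet."""
    try:
        with open(path, encoding="utf-8") as handle:
            return json.load(handle)
    except FileNotFoundError:
        return {"processed": [], "failed": [], "current": None}


def pending_models(all_models, state):
    """Models that still need a run after a resume, in original order."""
    done = set(state["processed"]) | set(state["failed"])
    return [name for name in all_models if name not in done]
```

On resume, a caller would load the state, compute `pending_models(...)`, and save the state again after each model. Because the rename either fully succeeds or leaves the old file in place, an interruption cannot leave a truncated state file behind.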
4. Experiments
4.1. Training and Testing
Our experimental evaluation leverages the comprehensive LEMUR Neural Network Dataset [2] (https://github.com/ABrain-One/nn-dataset/), which provides a diverse collection of neural network architectures for automated machine learning research. This dataset, comprising over 7,500 unique models, serves as the foundation for our scalability and performance analysis. To further expand architectural diversity, we incorporate models generated through the NN-GPT framework [8] (https://github.com/ABrain-One/nn-gpt), which utilizes large language models for neural architecture generation, and the LEMUR 2 dataset [7] that unlocks additional neural network variants. Additionally, we draw inspiration from hyperparameter optimization approaches explored in HPGPT [8], which investigates LLMs for automated hyperparameter tuning, ensuring our benchmarking covers both architectural and parametric variations in model design. The integration of these diverse model families enables comprehensive validation of our automated deployment system across a wide spectrum of neural network designs, from traditional architectures to AI-generated models.
Our experimental setup focused on evaluating the pipeline using this comprehensive model collection. The experiments were conducted on a Linux workstation equipped with an Intel i7 processor, 32 GB RAM, and an NVIDIA RTX 3080 GPU. The target platform for all benchmarks was the Android operating system, tested using the system's automated emulator management on a standard Android Virtual Device (AVD) with a predefined configuration (e.g., Pixel 4 profile, API level 30).
No traditional training was involved; instead, the testing phase consisted of the end-to-end execution of our automated pipeline for each model: conversion to TFLite, deployment to the emulator, on-device inference benchmarking, and report generation.
4.2. Evaluation Metrics
The primary metrics for evaluation were both operational and performance-based:
• Throughput: The number of models processed per unit time (models/hour).
• Reliability: The ability to complete long-running sessions without manual intervention, measured by continuous uptime.
• Inference Latency: The average time taken for a single forward pass of the model on the mobile device, measured in milliseconds.
• System Resource Utilization: Device analytics collected included available memory, cached memory, and detailed CPU architecture information, providing context for the performance metrics.
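The two numeric metrics above reduce to simple arithmetic. The sketch below uses hypothetical helper names and made-up latency samples purely for illustration; they are not measurements from the paper.

```python
from statistics import mean


def average_latency_ms(latencies_ms):
    """Mean time of a single forward pass, in milliseconds."""
    return mean(latencies_ms)


def throughput_models_per_hour(models_processed, elapsed_seconds):
    """Models processed per hour of wall-clock time."""
    return models_processed * 3600.0 / elapsed_seconds


# Made-up latency samples for one model (milliseconds).
samples_ms = [12.0, 15.0, 18.0]
print(average_latency_ms(samples_ms))        # 15.0

# One model every 90 s or every 60 s of pipeline time.
print(throughput_models_per_hour(1, 90.0))   # 40.0
print(throughput_models_per_hour(1, 60.0))   # 60.0
```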
5. Results and Discussion
5.1. Operational Efficiency and Scalability
The system demonstrated exceptional operational efficiency and scalability. Over a sustained period, it successfully processed a total of 7,562 PyTorch models through the complete pipeline. The average processing time per model ranged from 60 to 90 seconds, i.e., roughly 40 to 60 models per hour. This includes the full workflow: model conversion, emulator boot (if necessary), APK deployment, benchmark execution, analytics collection, and report generation. This high throughput enables large-scale model evaluation campaigns that would be infeasible manually.
5.2. Failure Recovery
The framework successfully completed a continuous, unattended operation session lasting over 48 hours. This was made possible by the sophisticated failure recovery mechanism. During the large-scale run, the system encountered and automatically recovered from 47 transient failures (e.g., emulator timeouts, ADB disconnections). In each case, the recovery protocol (waiting, comprehensive cleanup, and process restart) was triggered successfully, allowing the pipeline to continue from the last saved state without human intervention.
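The three recovery steps (waiting, cleanup, process restart) can be sketched as below. This is a minimal illustration under stated assumptions: the function name, the choice of `adb emu kill` and `adb kill-server` as cleanup commands, and the expectation that progress has already been written to the state file are all assumptions, not the NN Lite implementation. The side-effecting calls are injectable so the logic can be exercised without sleeping or replacing the process.

```python
import os
import subprocess
import sys
import time


def recover_and_restart(cooldown_s=180.0, argv=None,
                        run=subprocess.run, sleep=time.sleep,
                        execv=os.execv):
    """Cool down, clean up emulator/ADB processes, then re-exec the script.

    Assumes the caller has already persisted its progress, since
    os.execv replaces the current process and does not return.
    """
    argv = list(sys.argv if argv is None else argv)

    # 1. Cooling-off period so system resources can stabilize.
    sleep(cooldown_s)

    # 2. Best-effort cleanup of the emulator and the ADB server.
    for command in (["adb", "emu", "kill"], ["adb", "kill-server"]):
        try:
            run(command, capture_output=True, timeout=30, check=False)
        except (subprocess.SubprocessError, OSError):
            pass  # a failed cleanup step should not block the restart

    # 3. Restart with the same interpreter and command-line arguments.
    execv(sys.executable, [sys.executable] + argv)
```

Re-executing the interpreter instead of looping inside the same process discards any leaked handles or stale subprocess state, which matches the "process rejuvenation" goal described in Section 3.4.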
5.3. Comprehensive Data Collection
The pipeline generated a rich dataset of over 15,000 individual data points. Each data point includes not only the model's inference latency but is also enriched with the device's performance context at the time of execution. This multi-dimensional data allows for sophisticated analysis, such as correlating performance degradation with low available memory or identifying architecture-specific optimization opportunities. An example of the enriched report structure is shown below:
{
  "model_name": "AirNet",
  "device_type": "sdk_gphone64_x86_64",
  "os_version": "Android 14 (Hedgehog)",
  "valid": true,
  "emulator": true,
  "duration": 360,
  "device_analytics": {
    ...
    "memory": {
      "total_ram": "1.92 GB",
      "available_ram": "800 MB",
      "cached": "880 MB"
      ...
    },
    "cpu": {
      "architecture": "x86_64",
      "cores": 8,
      "vendor": "AMD",
      "features": ["SSE4.2", "AVX", "AES"]
      ...
    }
  }
}
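A report of this shape can be post-processed with the standard library alone. The sketch below parses a copy of the example report with the elided fields left out and derives the fraction of RAM that was available at benchmark time. The helper names and the binary interpretation of "MB" and "GB" are assumptions for illustration.

```python
import json

# Assumption: sizes use binary multiples (1 MB = 1024 * 1024 bytes).
_UNITS = {"KB": 1024, "MB": 1024 ** 2, "GB": 1024 ** 3}


def parse_size(text):
    """Convert a string such as "800 MB" into a number of bytes."""
    value, unit = text.split()
    return int(float(value) * _UNITS[unit.upper()])


def summarize(report):
    """Extract the fields needed to relate timing to memory pressure."""
    memory = report["device_analytics"]["memory"]
    total = parse_size(memory["total_ram"])
    available = parse_size(memory["available_ram"])
    return {
        "model": report["model_name"],
        "duration": report["duration"],
        "available_fraction": round(available / total, 3),
    }


report = json.loads("""
{
  "model_name": "AirNet",
  "duration": 360,
  "device_analytics": {
    "memory": {"total_ram": "1.92 GB", "available_ram": "800 MB",
               "cached": "880 MB"},
    "cpu": {"architecture": "x86_64", "cores": 8}
  }
}
""")
summary = summarize(report)
print(summary)
```

Collecting such summaries over many reports gives the pairs of timing and available memory that the correlation analysis described above would need.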
6. Conclusion
This paper presented a highly efficient automated framework for deploying and benchmarking machine learning models on mobile devices. Our system successfully addresses the critical gap between model development in PyTorch and performance validation on Android by automating the entire workflow from conversion to detailed reporting. The demonstrated ability to process thousands of models unattended, coupled with the deep, contextualized performance data it generates, represents a significant advancement over existing manual and semi-automated approaches. The framework establishes a new standard for automated ML testing infrastructure, enabling researchers and practitioners to perform large-scale, reproducible empirical studies on mobile model performance. Future work will focus on expanding support for physical device fleets, integrating more advanced power and thermal profiling, and extending the model conversion framework to support a wider range of operators and quantization schemes.
References
[1] Han Cai, Chuang Gan, Tianzhe Wang, Zhekai Zhang, and Song Han. Once-for-all: Train one network and specialize it for efficient deployment. In International Conference on Learning Representations (ICLR), 2020.
[2] Arash Torabi Goodarzi, Roman Kochnev, Waleed Khalid, Furui Qin, Tolgay Atinc Uzun, Yashkumar Sanjaybhai Dhameliya, Yash Kanubhai Kathiriya, Zofia Antonina Bentyn, Dmitry Ignatov, and Radu Timofte. Lemur neural network dataset: Towards seamless automl. arXiv preprint arXiv:2504.10552, 2025.
[3] Google LLC. Firebase Test Lab: Cloud-Based App Testing Infrastructure, 2016. Cloud-based infrastructure for automated mobile app testing.
[4] Google LLC. TensorFlow Lite: On-device machine learning framework, 2017. Open-source deep learning framework for mobile and edge devices.
[5] Song Han, Huizi Mao, and William J Dally. Deep compression: Compressing deep neural networks with pruning, trained quantization and huffman coding. International Conference on Learning Representations (ICLR), 2020.
[6] Andrey Ignatov, Radu Timofte, William Chou, Ke Wang, Max Wu, Tim Hartley, and Luc Van Gool. AI benchmark: Running deep neural networks on android smartphones. In European Conference on Computer Vision (ECCV), pages 288–314, 2018.
[7] Roman Kochnev, Arash Torabi Goodarzi, Zofia Antonina Bentyn, Dmitry Ignatov, and Radu Timofte. Optuna vs code llama: Are llms a new paradigm for hyperparameter tuning? In Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops (ICCVW), 2025.
[8] Roman Kochnev, Waleed Khalid, Tolgay Atinc Uzun, Xi Zhang, Yashkumar Sanjaybhai Dhameliya, Furui Qin, Dmitry Ignatov, and Radu Timofte. Nngpt: Rethinking automl with large language models. arXiv preprint, 2025.
[9] Dianshu Liao, Shidong Pan, Siyuan Yang, Yanjie Zhao, Zhenchang Xing, and Xiaoyu Sun. A comparative study of android performance issues in real-world applications and literature. arXiv preprint arXiv:2401.07849, 2024.
[10] Meta AI. PyTorch Mobile: End-to-end deployment solution for mobile and embedded devices. Meta AI Research, 2019. Official documentation and framework release.
[11] Meta AI. AI Edge Torch: PyTorch Library for Edge Device Deployment, 2024. Official PyTorch extension for edge computing and mobile deployment.
[12] Vijay Janapa Reddi, David Kanter, Peter Mattson, Jared Duke, Thai Nguyen, Ramesh Chukka, Ken Shiring, Koan-Sin Tan, Mark Charlebois, William Chou, Mostafa El-Khamy, Jungwook Hong, Tom St. John, Cindy Trinh, Michael Buch, Mark Mazumder, Relia Markovic, Thomas Atta, Fatih Cakir, Masoud Charkhabi, Xiaodong Chen, Cheng-Ming Chiang, Dave Dexter, Terry Heo, Gunther Schmuelling, Maryam Shabani, and Dylan Zika. MLPerf mobile inference benchmark. In MLPerf. MLCommons, 2020.
[13] Md Ziaul Haque Zim. TinyML: Analysis of Xtensa LX6 microprocessor for neural network inference. Future Internet, 15(11):350, 2023.
