ENERGY EFFICIENCY IN NEURON NETWORKS: PROBLEMS OF OPTIMIZING LARGE MODELS

Author: M. Tursunaliyeva
Publisher: Zenodo
DOI: 10.5281/zenodo.17674536
Source: https://zenodo.org/records/17674536/files/A.T.-9.pdf
SCIENCE AND INNOVATION
INTERNATIONAL SCIENTIFIC JOURNAL VOLUME 4 ISSUE 11 NOVEMBER 2025
ISSN: 2181-3337 | SCIENTISTS.UZ
M. Tursunaliyeva
3rd year student, Fergana State University
https://doi.org/10.5281/zenodo.17674536
Abstract. This article analyzes the technical and economic problems associated with the growing energy consumption of large neural networks. Because modern AI models can have up to trillions of parameters, their training and inference require enormous computing power, which leads to increased energy consumption and higher infrastructure costs. The article examines the technical essence of quantization, its practical results, and its role in optimizing large models.
Keywords: neural networks, energy efficiency, model compression, quantization, FP32, INT8, large language models, optimization, artificial intelligence, computational costs.
Introduction
Today, artificial intelligence has penetrated almost every aspect of our lives - from assistants on our phones to large models used in research. But there is one important point: the smarter these technologies are, the more energy they consume. In particular, systems with up to trillions of parameters, such as GPT, LLaMA, or other large language models, require enormous computing power. This, of course, leads to an increase in energy consumption, an increase in the heat output of servers, a larger ecological footprint, and, unfortunately, the need for very expensive infrastructure.
For this very reason, the energy efficiency of neural networks has become one of the most pressing problems today. In this article, we consider the main causes of this problem and analyze one of the effective solutions used in practice - model compression and, in particular, the quantization technique.
Why do large neural networks require so much energy? The main reason lies in their internal structure. Each model has millions or even billions of parameters, and each mathematical operation between them is performed on power-hungry devices such as GPUs and TPUs. The larger the model, the more matrix products, normalization steps, activation functions, and other calculations are performed, which increases the number of powerful graphics processors constantly operating in server farms.
Training is the largest energy consumer, since the model repeats heavy operations such as backpropagation over millions of iterations. However, inference - that is, using the model - also requires energy, since each query must pass through all layers of the model. As a result, large neural networks require more cooling, more robust servers, and a larger power supply. This problem is becoming not only a technical, but also an economic and environmental issue. Therefore, the development of energy-efficient approaches is crucial for the future of AI technologies.
The main problem of large neural networks is a sharp increase in their demand for computing resources. Since the models contain billions of parameters, each training stage performs huge matrix multiplications, which keeps GPU clusters operating under a constant heavy load. As a result, server centers consume a large amount of energy, and a considerable part of this energy goes to cooling systems, since working graphics processors release a significant amount of heat.
The problem is not only technical - there is also an economic side. Modern training clusters are very expensive: they require hundreds or thousands of GPUs, which demand not only large investments, but also high costs of continuous operation. In addition, the carbon footprint that arises from training large models on a global scale is seriously criticized in scientific circles. Thus, as the number of model parameters grows, energy consumption, environmental impact, and infrastructure costs rise sharply. This necessitates the search for new technologies aimed at making neural networks more efficient.
One of the most effective approaches to the energy-efficient use of large models is model compression. Among compression techniques, one of the most commonly used and practically effective methods is quantization. The main idea of quantization is that the volume of computation can be significantly reduced by representing the weights and activations of the model with fewer bits, such as 8 or 4 bits, instead of the 32-bit floating-point format.
Simply put, quantization "creates a lighter version of the model while maintaining its accuracy." If the model weights are switched from 32-bit to 8-bit, this not only reduces memory requirements by a factor of 4, but also significantly simplifies the operations performed on GPUs/TPUs. As a result, the inference process is accelerated and energy consumption is reduced. In some cases, it has also been observed that energy consumption can be reduced by 6-7 times through 4-bit quantization. The advantage of quantization is that it does not change the overall architecture of the model. That is, the model remains the same in form, but becomes "lighter." The following simple diagram illustrates the gist of quantization:
32-bit model parameters
[0.245893] [1.983422] [0.000184] [3.294524]
│
▼
8-bit quantized version
[0.24] [1.98] [0.00] [3.29]
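The diagram shows the idea as decimal rounding. In practice, an INT8 scheme usually stores small integer codes together with a shared scale factor. The sketch below applies one common variant, symmetric per-tensor quantization, to the four values from the diagram. The article does not say which scheme it has in mind, so the choice of scheme and the function names `quantize_int8` and `dequantize` are assumptions made for illustration only.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization (an assumed scheme, for illustration).

    Maps each float weight to an integer code in [-127, 127] and returns
    the codes together with the scale needed to map them back.
    """
    max_abs = max(abs(w) for w in weights)
    # Guard against an all-zero tensor, where any scale would do.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate float weights from the integer codes."""
    return [c * scale for c in codes]


weights = [0.245893, 1.983422, 0.000184, 3.294524]
codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)

print(codes)  # [9, 76, 0, 127]
for original, approx in zip(weights, restored):
    print(f"{original:.6f} -> {approx:.6f}")
```

Each code fits in one byte instead of four, and the rounding error per weight is at most half of `scale`. Real toolchains often use finer-grained scales (per channel or per group) to keep that error small.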
Although these changes may seem small, they provide enormous energy savings for models with millions of parameters. In practice, a quantized model puts less pressure on the server, reduces heat output, reduces the energy consumption of cooling systems, and, of course, significantly reduces infrastructure costs.
To better imagine the practical result of quantization, let's look at a concrete example. Imagine that you have a neural network with 1 billion parameters. This model requires approximately 4 gigabytes of memory when stored in the classic FP32 (32-bit float) format. But if we compress it through 8-bit quantization, the memory requirement drops to 1 gigabyte. This means that the model will be 4 times lighter than before. Energy consumption also decreases accordingly, since weights expressed in fewer bits are read faster by the GPU and fewer transistors operate, which directly leads to a decrease in energy consumption. The following simple diagram illustrates how the quantization process works, both simply and efficiently:
As you can see in the diagram, the model itself does not change its appearance - the number of layers, the architecture, the functions, and the outputs stay the same. Only the way the parameters are represented changes. As a result, energy efficiency increases, the load on servers decreases, the model responds faster, and costs are significantly reduced in practice. For this reason, quantization has become one of the most widely used optimization methods in industry today.
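The memory figures in the 1-billion-parameter example can be checked with a few lines of arithmetic. This is a minimal sketch that counts only the weights and takes 1 gigabyte as 10^9 bytes. The helper name `model_memory_gb` is hypothetical, and the INT4 row is added here for comparison with the 4-bit case mentioned in the article. Quantized formats also store scale factors and other metadata, which this estimate ignores.

```python
def model_memory_gb(num_params, bits_per_param):
    """Memory needed for the weights alone, in gigabytes (1 GB = 10**9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


num_params = 1_000_000_000  # the 1-billion-parameter model from the example

for name, bits in [("FP32", 32), ("INT8", 8), ("INT4", 4)]:
    print(f"{name}: {model_memory_gb(num_params, bits):.2f} GB")
# FP32: 4.00 GB
# INT8: 1.00 GB
# INT4: 0.50 GB
```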
The increasing size of artificial intelligence models creates many new opportunities for us, but it also creates problems such as energy consumption, cost, and environmental impact. The good thing is that effective techniques for solving these problems already exist. One of them - quantization - makes models "lighter," and therefore cheaper to run, faster, and more practical to deploy.
Conclusion
Currently, the sustainable development of AI technologies relies on such solutions. If we want to create more intelligent and powerful, but at the same time environmentally and economically responsible systems in the future, it is very important to pay attention to energy efficiency. A well-optimized model not only saves resources but also contributes to the further spread of convenient, fast, and modern technologies for everyone.