Offline LLM: Generating human like responses without internet

Author: Soppari, Kavitha; Basupally, Nuthana; Toomu, Harika; Bijili, Pavan Kalyan
Publisher: Zenodo
DOI: 10.5281/zenodo.17312807
Source: https://zenodo.org/records/17312807/files/WJARR-2025-1783.pdf
* Corresponding author: Nuthana Basupally
Copyright © 2025 Author(s) retain the copyright of this article. This article is published under the terms of the Creative Commons Attribution License 4.0.
Offline LLM: Generating human like responses without internet
Kavitha Soppari, Nuthana Basupally *, Harika Toomu and Pavan Kalyan Bijili
Department CSE (AI-ML) of ACE Engineering College Hyderabad, India.
World Journal of Advanced Research and Reviews, 2025, 26(02), 1823-1827
Publication history: Received on 29 March 2025; revised on 11 May 2025; accepted on 13 May 2025
Article DOI: https://doi.org/10.30574/wjarr.2025.26.2.1783
Abstract
This study explores the integration of lightweight and offline-capable natural language processing (NLP) tools for
extractive and abstractive text summarization in resource-constrained environments. Drawing from foundational work
such as TextRank (Mihalcea & Tarau, 2004) and the NLTK toolkit (Bird et al., 2009), the system combines graph-based
extractive summarization and frequency-based keyword extraction for efficient offline text analysis. PyMuPDF
facilitates accurate PDF text extraction, enabling document conversion into analyzable formats. Abstractive
summarization leverages the T5-small model (Raffel et al., 2020) for generating concise summaries with minimal
computational overhead, while Hugging Face transformers (Wolf et al., 2020) enable sentiment analysis for user
feedback interpretation. Emphasizing low-connectivity usage, the architecture supports local deployment of NLP
models (Anastasopoulos et al., 2021) and utilizes Flask (Kumar & Singh, 2021) for integrating NLP services into a
user-friendly offline web application. Further, the deployment of compressed models on edge devices (Chen et al., 2022)
highlights the feasibility of delivering robust summarization and analysis tools without reliance on cloud infrastructure.
This work provides a modular, efficient, and accessible framework for document understanding in offline scenarios.
Keywords: Offline Processing; Language Models; T5-Small; Flask; Text Summarization; Keyword Extraction; PyMuPDF
(Fitz); NLTK; PDF Text Extraction; Privacy.
1. Introduction
With increasing demand for real-time text summarization and analysis in remote or low-connectivity areas, traditional
cloud-dependent NLP systems become impractical. Many existing tools require internet access and heavy
computational resources, limiting their usability in offline or edge environments such as rural education, field research,
and secure corporate settings. This project is motivated by the need to create a lightweight, offline-capable system that
can process documents, extract summaries, identify keywords, and analyze sentiment efficiently. By leveraging proven
techniques like TextRank, NLTK, and compact transformer models like T5-small, the solution aims to bridge the gap
between advanced NLP capabilities and accessibility in resource-constrained scenarios.
This work introduces an offline language processing system that combines the power of pre-trained models with
open-source libraries to perform key NLP tasks without internet access. Using a Flask web interface, the system allows
users to upload PDF documents, extract text using PyMuPDF (fitz), and process the content through two summarization
methods: extractive (summa) and abstractive (T5-small). Additionally, it performs keyword extraction using NLTK's
frequency distribution. While the system operates primarily offline, it includes an optional sentiment analysis feature
that can be enabled when internet access is available.
2. Literature review
2.1. TextRank and Graph-Based Summarization Techniques - Mihalcea & Tarau (2004)
The research presents TextRank, an unsupervised, graph-based extractive summarization technique where key
sentences are ranked based on importance and interconnectedness. It forms the foundation of summa.summarizer,
used for extractive summarization in offline environments.
Methodologies Used: Graph-based ranking algorithms, unsupervised extractive summarization.
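The graph-based ranking idea can be illustrated with a small, dependency-free sketch. This is not the summa implementation (which is normally called through `summarizer.summarize(text, ratio=...)`); it is a simplified approximation that assumes a naive regex sentence splitter, a word-overlap similarity, and a fixed number of PageRank-style iterations. All function names are illustrative.

```python
import math
import re


def split_sentences(text):
    """Naive splitter on ., ! and ? (an assumption made for this sketch)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def similarity(a, b):
    """Shared-word count normalized by the log of each sentence's word count."""
    words_a = set(re.findall(r"\w+", a.lower()))
    words_b = set(re.findall(r"\w+", b.lower()))
    if len(words_a) < 2 or len(words_b) < 2:
        return 0.0
    return len(words_a & words_b) / (math.log(len(words_a)) + math.log(len(words_b)))


def textrank(sentences, damping=0.85, iterations=30):
    """Score sentences with a weighted PageRank over the similarity graph."""
    n = len(sentences)
    weights = [[similarity(sentences[i], sentences[j]) if i != j else 0.0
                for j in range(n)] for i in range(n)]
    out_sum = [sum(row) for row in weights]
    scores = [1.0] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) + damping * sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n) if out_sum[j] > 0)
            for i in range(n)
        ]
    return scores


def summarize(text, ratio=0.4):
    """Keep the top-ranked share of sentences, in their original order."""
    sentences = split_sentences(text)
    if not sentences:
        return ""
    scores = textrank(sentences)
    keep = max(1, round(len(sentences) * ratio))
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:keep]
    return " ".join(sentences[i] for i in sorted(top))
```

Sentences that share no words with the rest of the document receive only the baseline score, so they tend to be dropped first.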
2.2. NLTK: Natural Language Toolkit - Bird, Klein & Loper (2009)
NLTK is a widely adopted Python library for text processing and computational linguistics. It provides essential tools
for tokenization, stopword removal, and frequency-based keyword extraction used in this project.
Methodologies Used: Rule-based and statistical NLP, tokenization, frequency distribution, stopword filtering.
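Frequency-based keyword extraction can be sketched with the standard library alone. The example below uses `collections.Counter` as a stand-in for NLTK's `FreqDist` (which offers a similar `most_common` method) and a tiny hand-written stopword list in place of NLTK's stopword corpus. Both substitutions, and the function name, are assumptions made so the sketch runs without downloaded resources.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (assumption); NLTK provides a fuller one
# through nltk.corpus.stopwords after a one-time download.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "are",
             "for", "on", "with", "that", "this", "it"}


def extract_keywords(text, top_n=5):
    """Return the top_n most frequent non-stopword tokens."""
    tokens = re.findall(r"[a-z0-9][a-z0-9\-]*", text.lower())
    counts = Counter(t for t in tokens if t not in STOPWORDS and len(t) > 2)
    return [word for word, _ in counts.most_common(top_n)]
```

Because everything is computed from local token counts, this step needs no model files and no network access.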
2.3. PyMuPDF for Efficient PDF Text Extraction - Sainz (2018)
PyMuPDF (also known as fitz) is a lightweight and fast Python library that enables accurate text extraction from PDF
files, crucial for converting scanned or structured documents into analyzable text formats.
Methodologies Used: Lightweight PDF parsing, page-wise text extraction, multi-format document support.
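Page-wise extraction with PyMuPDF might look like the following sketch. It assumes a recent PyMuPDF release (where `page.get_text()` is available) and adds a hypothetical cleanup helper that is not part of the library. The `fitz` import is deferred so the helper can be used without PyMuPDF installed.

```python
import re


def clean_page_text(raw):
    """Rejoin words hyphenated across line breaks, then collapse whitespace.

    Caveat: a genuine hyphenated compound that happens to break at a line end
    will also lose its hyphen.
    """
    text = re.sub(r"-\n(?=[a-z])", "", raw)
    return re.sub(r"\s+", " ", text).strip()


def extract_pdf_text(path):
    """Extract text page by page from a PDF on local disk."""
    import fitz  # PyMuPDF; imported lazily

    with fitz.open(path) as doc:
        pages = [clean_page_text(page.get_text()) for page in doc]
    return "\n\n".join(p for p in pages if p)
```

Scanned PDFs without a text layer return empty strings here; handling them would require OCR, which is outside this sketch.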
2.4. Text Summarization with Pretrained Transformers - Raffel et al. (2020)
This paper introduced the T5 (Text-to-Text Transfer Transformer) model, which reframes all NLP tasks into a text-to-
text format, enabling unified training. T5-small, a lightweight variant, is capable of generating high-quality abstractive
summaries with reduced computational overhead.
Methodologies Used: Transformer-based sequence-to-sequence architecture, supervised pretraining on large corpora,
fine-tuning for summarization tasks.
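Because T5 casts every task as text-to-text, summarization is requested by prefixing the input with `summarize: `. The sketch below shows how T5-small could be loaded from a local directory with Hugging Face `transformers`. The directory `./models/t5-small`, the generation settings, and the character limit are assumptions, not values reported in the paper.

```python
def build_prompt(text, max_chars=2000):
    """Normalize whitespace, crudely truncate, and add the T5 task prefix."""
    return "summarize: " + " ".join(text.split())[:max_chars]


def summarize_abstractive(text, model_dir="./models/t5-small"):
    """Generate a summary from model files already stored on disk."""
    # Lazy imports: requires the transformers and torch packages.
    from transformers import AutoTokenizer, T5ForConditionalGeneration

    # local_files_only=True stops the library from contacting the model hub.
    tokenizer = AutoTokenizer.from_pretrained(model_dir, local_files_only=True)
    model = T5ForConditionalGeneration.from_pretrained(model_dir, local_files_only=True)

    inputs = tokenizer(build_prompt(text), return_tensors="pt",
                       max_length=512, truncation=True)
    output_ids = model.generate(**inputs, max_length=120, min_length=20,
                                num_beams=4, early_stopping=True)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

The model directory must be populated once while online (for example with `save_pretrained`); after that the function runs without connectivity.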
2.5. Sentiment Analysis with Transformers - Wolf et al. (2020)
This work discusses the use of Hugging Face's transformer pipeline for sentiment analysis and how pre-trained models
can be adapted for real-time user feedback and opinion mining. Though internet-based by default, the models can be
preloaded for limited offline use.
Methodologies Used: Pre-trained transformer models, sentiment classification pipelines, zero-shot and fine-tuned
analysis.
2.6. Offline-Capable NLP Systems for Low-Connectivity Scenarios - Anastasopoulos et al. (2021)
This study highlights the importance of designing NLP systems that operate in offline or low-resource settings, focusing
on minimizing dependency on external servers while maintaining performance.
Methodologies Used: Local model deployment, reduced-size pre-trained models, optimization for edge devices.
2.7. Flask-Based Web Applications for NLP Integration - Kumar & Singh (2021)
The study presents a modular architecture for integrating NLP functionalities into Flask-based web applications,
emphasizing usability and offline capability. It showcases how Flask can host pre-trained models locally to provide
services like summarization and keyword extraction.
Methodologies Used: Flask web framework, RESTful integration of NLP models, offline-serving of pre-trained models.
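A locally hosted upload endpoint of this kind could be sketched as follows. The route name `/summarize`, the form field `file`, and the injected `summarize_fn` callable are all hypothetical; the paper does not describe its routes. Flask is imported inside the factory so the filename check can be used on its own.

```python
import os

ALLOWED_EXTENSIONS = {".pdf"}


def is_allowed(filename):
    """Accept only filenames ending in .pdf (case-insensitive)."""
    return os.path.splitext(filename)[1].lower() in ALLOWED_EXTENSIONS


def create_app(summarize_fn):
    """Build a Flask app whose single route summarizes an uploaded PDF.

    summarize_fn is any callable that takes the PDF bytes and returns a string.
    """
    from flask import Flask, jsonify, request  # lazy import

    app = Flask(__name__)

    @app.route("/summarize", methods=["POST"])
    def summarize_route():
        uploaded = request.files.get("file")
        if uploaded is None or not is_allowed(uploaded.filename or ""):
            return jsonify(error="Please upload a PDF file."), 400
        return jsonify(summary=summarize_fn(uploaded.read()))

    return app
```

Running `create_app(my_summarizer).run(host="127.0.0.1", port=5000)` binds the service to the local machine only, which matches the offline, privacy-oriented setting described above.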
2.8. Lightweight Language Models for Edge Devices - Chen et al. (2022)
This paper explores deploying compact NLP models like T5-small on edge devices to reduce dependency on cloud
computing, making AI more accessible in offline environments. The study validates the effectiveness of small
transformer models for summarization and question answering.
Methodologies Used: Model compression, transformer-based summarization, edge-compatible NLP deployment.
2.9. Objectives
The primary objective of this system is to enable efficient offline natural language processing by leveraging lightweight
large language models such as T5-small. It aims to provide human-like summarization, keyword extraction, and text
understanding without requiring internet access, thereby ensuring data privacy and eliminating latency issues. By
integrating PyMuPDF for PDF text extraction and NLTK for frequency-based keyword analysis, the system facilitates
complete offline handling of uploaded documents. The use of summa.summarizer allows extractive summarization
without connectivity, while the T5-small model enables abstractive summarization through preloaded resources.
Additionally, a lightweight Flask web interface ensures user-friendly interaction, and an optional sentiment analysis
module is included for extended insights using pre-downloaded Hugging Face models. The overall design is optimized
for low-resource environments, making the system suitable for privacy-sensitive or bandwidth-constrained scenarios.
3. Experimental results and Discussion
To evaluate the performance of the offline text summarization system, experiments were conducted using multiple PDF
documents of varying lengths and topics, ranging from academic articles to general reports. Two summarization
approaches were compared:
3.1. Extractive Summarization (summa.summarizer)
• Speed: Fast (~1-2 seconds for most documents).
• Offline Capability: Fully offline.
• Summary Nature: Sentence-based extraction, less coherent in flow but retains factual accuracy.
• Compression Ratio: ~40-50% of the original text.
• Best Use Case: Technical or factual documents where sentence importance can be determined by frequency
and position.
3.2. Abstractive Summarization (T5-small)
• Speed: Moderate (~5-10 seconds for medium-length texts).
• Offline Capability: Requires pre-downloaded model but works offline once loaded.
• Summary Nature: More human-like, paraphrased, coherent summaries with natural flow.
• Compression Ratio: ~20-30% of the original text.
• Best Use Case: Articles or narrative text where readability and coherence are key.
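The speed and compression figures listed for both approaches could be measured with a small harness like the one below. The paper does not state how its compression ratio was computed; this sketch assumes it is the summary's word count divided by the original's word count, and the helper names are illustrative.

```python
import time


def compression_ratio(original, summary):
    """Summary length as a fraction of the original, measured in words."""
    original_words = len(original.split())
    if original_words == 0:
        return 0.0
    return len(summary.split()) / original_words


def timed(summarize_fn, text):
    """Run a summarizer and return (summary, seconds elapsed, compression ratio)."""
    start = time.perf_counter()
    summary = summarize_fn(text)
    elapsed = time.perf_counter() - start
    return summary, elapsed, compression_ratio(text, summary)
```

Any summarizer that maps a string to a string can be passed as `summarize_fn`, so the extractive and abstractive methods can be compared on the same documents.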
Offline text summarization can be achieved using various algorithms. Summa (TextRank) and LexRank are graph-
based extractive methods that offer fast, offline processing but lack deep semantic understanding.
Figure 1 Performance comparison of summarization methods
3.3. Performance comparison between Extractive summarization and Abstractive summarization.
Luhn Algorithm is frequency-based and efficient for simple tasks, though limited in handling complex texts. T5, an
abstractive model, generates human-like summaries with better coherence but is resource-intensive and requires a pre-
downloaded model. Traditional methods are faster and lightweight, while T5 offers richer, more fluent output.
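To make the frequency-based idea behind Luhn's method concrete, the following is a deliberately simplified sketch. It treats the most frequent longer words as "significant" and scores each sentence by the squared number of significant words divided by the span they cover. Luhn's original heuristic also splits a sentence into several clusters and uses a proper stopword list; the single-span shortcut and the word-length filter here are simplifying assumptions.

```python
import re
from collections import Counter


def luhn_scores(sentences, top_words=5):
    """Score sentences by the density of frequent ("significant") words."""
    tokenized = [re.findall(r"[a-z]+", s.lower()) for s in sentences]
    # Crude stopword filter (assumption): ignore words of three letters or fewer.
    freq = Counter(w for words in tokenized for w in words if len(w) > 3)
    significant = {w for w, _ in freq.most_common(top_words)}

    scores = []
    for words in tokenized:
        positions = [i for i, w in enumerate(words) if w in significant]
        if not positions:
            scores.append(0.0)
            continue
        span = positions[-1] - positions[0] + 1  # first to last significant word
        scores.append(len(positions) ** 2 / span)
    return scores
```

The scoring uses only word counts and positions, which is why such methods are fast and light but cannot capture meaning beyond surface frequency.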
Figure 2 Comparison of offline summarization algorithms
3.4. Comparison of other offline text summarization models to summa and T5.
Summa, LexRank, and Luhn are fast offline methods for summarizing text. T5 gives better, human-like summaries but
needs more power and storage.
Table 1 Comparison of different Methodologies
| S. No | Title & Author | Focus Area | Methodologies Used | Contributions | Limitations |
|---|---|---|---|---|---|
| 1 | TextRank and Graph-Based Summarization Techniques, Mihalcea & Tarau (2004) | Extractive Summarization | Graph-based ranking algorithms, unsupervised extractive summarization | Introduced TextRank algorithm for unsupervised summarization; foundation for graph-based approaches | Limited to extractive summaries; ignores semantic understanding |
| 2 | NLTK: Natural Language Toolkit, Bird, Klein & Loper (2009) | NLP Toolkit | Rule-based and statistical NLP, tokenization, frequency distribution, stopword filtering | Provided comprehensive NLP library for preprocessing and basic NLP tasks | Not optimized for large-scale or neural NLP models |
| 3 | PyMuPDF for Efficient PDF Text Extraction, Sainz (2018) | PDF Text Extraction | Lightweight PDF parsing, page-wise text extraction, multi-format support | Enables accurate and fast text extraction from PDFs | Struggles with poorly scanned/complex PDFs |
| 4 | Text Summarization with Pretrained Transformers, Raffel et al. (2020) | Abstractive Summarization | Transformer-based seq2seq architecture, supervised pretraining, fine-tuning | Introduced T5 model reframing all NLP tasks into text-to-text; strong abstractive summaries | High computational requirements for large models |
| 5 | Sentiment Analysis with Transformers, Wolf et al. (2020) | Sentiment Analysis | Pre-trained transformer models, sentiment classification pipelines, zero-shot & fine-tuned analysis | Demonstrated effective transformer-based sentiment analysis with Hugging Face pipeline | Requires internet or large local models for optimal results |
| 6 | Offline-Capable NLP Systems for Low-Connectivity Scenarios, Anastasopoulos et al. (2021) | Offline NLP Deployment | Local model deployment, reduced-size models, edge optimization | Proposed strategies for running NLP offline in low-resource settings | Trade-off in model size vs. performance; limited offline pre-trained models |
| 7 | Flask-Based Web Applications for NLP Integration, Kumar & Singh (2021) | Web App Integration | Flask web framework, RESTful NLP integration, offline-serving | Showed modular integration of NLP into Flask apps; supported offline model hosting | Scalability issues for complex NLP pipelines |
| 8 | Lightweight Language Models for Edge Devices, Chen et al. (2022) | Edge NLP Deployment | Model compression, transformer-based summarization, edge-compatible deployment | Validated small transformers (T5-small) for summarization & QA on edge devices | Accuracy lower than full-scale models; limited hardware support |
4. Conclusion
This study demonstrates the effectiveness of combining lightweight, offline-capable NLP tools for summarization and
sentiment analysis in low-resource environments. By integrating graph-based methods like TextRank for extractive
summarization, NLTK for keyword extraction, and compact transformer models like T5-small for abstractive
summarization, the system achieves a strong balance between performance and efficiency. The use of PyMuPDF ensures
accurate PDF text extraction, while Flask enables a user-friendly offline interface. Experimental results validate that the
proposed architecture delivers reliable summarization and analysis without reliance on cloud infrastructure, making it
suitable for edge devices and offline applications in education, research, and secure deployments.
Compliance with ethical standards
Disclosure of conflict of interest
No conflict of interest to be disclosed.
References
[1] R. Mihalcea and P. Tarau, “TextRank and Graph-Based Summarization Techniques,” in Proceedings of the ACL,
2004.
[2] S. Bird, E. Klein, and E. Loper, “NLTK: Natural Language Toolkit,” O’Reilly Media, 2009.
[3] J. Sainz, “PyMuPDF for Efficient PDF Text Extraction,” GitHub Repository, 2018. [Online]. Available.
[4] C. Raffel, N. Shazeer, A. Roberts, et al., “Exploring the Limits of Transfer Learning with a Unified Text-to-Text
Transformer,” J. Mach. Learn. Res., vol. 21, no. 140, pp. 1–67, 2020.
[5] T. Wolf, L. Debut, V. Sanh, et al., “Transformers: State-of-the-Art Natural Language Processing,” in Proceedings of the
EMNLP: System Demonstrations, pp. 38–45, 2020, doi: 10.18653/v1/2020.emnlp-demos.6.
[6] A. Anastasopoulos, G. Neubig, and D. Chiang, “Offline-Capable NLP Systems for Low-Connectivity Scenarios,” in
Proc. of the AAAI Conference on Artificial Intelligence, vol. 35, no. 14, pp. 12447–12455, 2021.
[7] A. Kumar and N. Singh, “Flask-Based Web Applications for NLP Integration,” Int. J. Comput. Appl., vol. 183, no. 28,
pp. 1–5, 2021.
[8] X. Chen, Y. Liu, and D. Wang, “Lightweight Language Models for Edge Devices”.