A Comparative Study Of K-Means And Parallel K-Means Clustering Algorithms For Efficient Data Analysis

Author: Kudale Gautam Appasaheb; Shinde Monika; Dr. Gupta Gaurav
Publisher: Zenodo
DOI: 10.5281/zenodo.17312746
Source: https://zenodo.org/records/17312746/files/S063815.pdf
International Journal of Advance and Applied Research
www.ijaar.co.in
ISSN – 2347-7075
Impact Factor – 8.141
Peer Reviewed
Bi-Monthly
Vol. 6 No. 38
September - October - 2025
A Comparative Study Of K-Means And Parallel K-Means Clustering Algorithms For Efficient Data Analysis
Kudale Gautam Appasaheb1, Shinde Monika2 & Dr. Gupta Gaurav3
1&2Research Students, Dr. A.P.J. Abdul Kalam University, Indore, M.P., India
3Research Guide, Dr. A.P.J. Abdul Kalam University, Indore, M.P., India
Corresponding Author – M. Sadani Kudale Gautam Appasaheb
DOI - 10.5281/zenodo.17312746
Abstract:
Clustering groups unlabeled data into meaningful patterns. K-Means, a popular partition-based algorithm, offers simplicity and efficiency but struggles with large-scale, high-dimensional data due to scalability and initialization issues. Parallel K-Means addresses these limitations by utilizing parallel and distributed computing frameworks, enhancing performance, scalability, and computational efficiency in clustering tasks. This paper presents a comparative study of traditional K-Means and Parallel K-Means clustering algorithms. It reviews clustering techniques and algorithms, analyzes their methodologies, advantages, and limitations, and highlights the importance of parallel approaches. The study concludes by emphasizing parallelism’s role in enhancing clustering efficiency and scalability for data-intensive applications.
Keywords: Machine Learning, Data mining, Clustering, K-Means Clustering, Parallel K-Means Clustering
Introduction:
Data mining involves extracting useful information from large databases, aiding organizations in retrieving valuable insights from data warehouses. Applicable to various database types, it is widely used in sectors like banking, insurance, and pharmaceuticals. As a branch of machine learning, data mining emphasizes exploratory data analysis and is key in predictive analytics. [16]
Machine learning (ML), a subset of artificial intelligence, enables computers to learn from data and make predictions without explicit programming. ML algorithms build models from training data for tasks like prediction and decision-making. It includes supervised, unsupervised, and reinforcement learning, with applications in education, pattern recognition, sports, and industry. [25]
Data clustering is the process of grouping a set of objects so that objects in the same group are more similar to each other than to those in other groups. [20] As datasets grow larger and more complex, efficient clustering techniques are essential for uncovering hidden patterns in high-dimensional data. Clustering aids applications like customer segmentation and genomic analysis. The increasing demand for data-driven insights has driven the widespread adoption of scalable, efficient clustering algorithms across diverse machine learning domains.
K-Means Clustering:
K-means is one of the easiest algorithms of unsupervised learning used for clustering [5]. K-means is one of the simplest
unsupervised learning algorithms used for clustering. [5, 7, 10, 20] K-Means clustering generates a specific number of disjoint, flat (non-hierarchical) clusters. The K-Means method is numerical, unsupervised, non-deterministic and iterative [9]. K-means is one of the simplest unsupervised learning algorithms that solve the well-known clustering problem. [10, 20] The K-Means clustering technique is used to classify data in a crisp sense. [12] K-means is an old and widely used technique among clustering methods [15]. K-means is the most popular clustering algorithm commonly used in all metric spaces. [18]
The diagram below explains the working of the K-means Clustering Algorithm:
Fig. 1: Working of K-means Clustering Algorithm
A. Generalised Pseudocode of Traditional k-means [5, 8, 9, 15, 22, 24]
Step 1: Accept the number of clusters to group data into and the dataset to cluster as input values.
Step 2: Initialize the first K clusters
- Take first k instances or
- Take random sampling of k elements
Step 3: Calculate the arithmetic means of each cluster formed in the dataset.
Step 4: K-means assigns each record in the dataset to only one of the initial clusters. Each record is assigned to the nearest cluster using a measure of distance (e.g., Euclidean distance).
Step 5: K-means re-assigns each record in the dataset to the most similar cluster and re-calculates the arithmetic mean of all the clusters in the dataset.
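The steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used in any of the cited works: the function name `kmeans`, the `init`, `max_iter` and `seed` parameters, the tuple-based data layout, and the stopping rule (stop when no record changes cluster) are all choices made here for illustration.

```python
import math
import random


def kmeans(points, k, init="first", max_iter=100, seed=0):
    """Plain K-Means on a list of equal-length numeric tuples (illustrative sketch)."""
    # Step 2: initialise the centroids, either from the first k records
    # or by random sampling of k records.
    if init == "first":
        centroids = list(points[:k])
    else:
        centroids = random.Random(seed).sample(points, k)

    labels = None
    for _ in range(max_iter):
        # Step 4: assign each record to the nearest centroid (Euclidean distance).
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:  # no record changed cluster: stop iterating
            break
        labels = new_labels
        # Steps 3 and 5: recompute each centroid as the arithmetic mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids, labels
```

Calling `kmeans(points, 2)` on a small list of two-dimensional tuples returns the two centroids together with one cluster label per record. Different initial centroids can lead to different final clusters, which is the initialization sensitivity discussed later in this section.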
B. Flowchart
Fig. 2: Flow Chart of K-means Clustering Algorithm
C. Advantages of K-Means Clustering:
- Simple, easy to understand, and implement. [5, 7, 10, 20]
- Fast convergence and low computational cost. [15, 25]
- Scalable to large datasets and adaptable to sparse data. [6]
- Assembles stable and tight clusters efficiently. [24]
- High efficiency and widely used across various fields due to its iterative, unsupervised, and non-deterministic nature. [28]
D. Disadvantages of K-Means Clustering:
- Requires predefined number of clusters (K).
- Sensitive to initial centroid selection, risking suboptimal results.
- Assumes spherical, equally sized clusters.
- Poor performance with non-linear or complex cluster shapes.
- Highly sensitive to outliers and noise.
- Not suitable for categorical data without preprocessing.
- Requires feature scaling for accurate results. [28]
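The last point can be made concrete. Because K-Means relies on Euclidean distance, a feature measured on a large numeric range dominates the distance unless all features are brought to a comparable scale. The sketch below shows one common option, z-score standardization. The function name `standardize` and the choice of the population standard deviation are assumptions made for illustration, and other scalings (for example min-max) are equally possible.

```python
import math


def standardize(points):
    """Rescale every feature (column) to zero mean and unit variance."""
    columns = list(zip(*points))
    means = [sum(col) / len(col) for col in columns]
    stds = []
    for col, m in zip(columns, means):
        # Population standard deviation of the column.
        s = math.sqrt(sum((x - m) ** 2 for x in col) / len(col))
        # A constant column has s == 0; use 1.0 to avoid division by zero.
        stds.append(s if s > 0 else 1.0)
    return [
        tuple((x - m) / s for x, m, s in zip(p, means, stds))
        for p in points
    ]
```

After standardization, a column ranging over thousands and a column ranging over single digits contribute to the distance on equal terms.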
E. Commonly used cluster evaluation metrics for K-Means Clustering:
- Inertia (WCSS): Measures compactness within clusters; lower is better.
- Silhouette Score: Evaluates cohesion and separation; ranges from -1 to 1.
- Calinski-Harabasz Index: Higher values indicate better-defined clusters.
- Davies-Bouldin Index: Lower values suggest better clustering.
- Dunn Index: Higher is better.
- Adjusted Rand Index (ARI) and Purity: Compare clustering to true labels.
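Two of the metrics listed above, inertia and the silhouette score, can be computed directly from their definitions. The sketch below is an illustration only: the function names `inertia` and `silhouette_score` and the list-of-tuples data layout are assumptions made here, and the silhouette of a single-member cluster is set to 0 by convention.

```python
import math


def inertia(points, labels, centroids):
    """Within-cluster sum of squares: squared distance of each point to its centroid."""
    return sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points; assumes at least two clusters."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)

    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # convention for single-member clusters
            continue
        # a: mean distance to the other members of the same cluster (cohesion).
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster (separation).
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)
```

A silhouette close to 1 indicates compact, well-separated clusters, while a negative value suggests that many points lie closer to a neighbouring cluster than to their own.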
F. Challenges of K-Means Clustering:
- Requires selecting the optimal number of clusters (K).
- Sensitive to initial centroid placement, risking local minima.
- Assumes spherical, similarly sized clusters.
- Ineffective for non-linear or complex cluster boundaries.
- Outliers and noise can distort clustering results.
- Scalability issues with large or high-dimensional datasets.
- Requires numerical data and proper feature scaling.
Parallel K-Means Clustering:
Parallel K-Means is an optimized version of the traditional K-Means algorithm, designed for parallel and distributed computing environments. It divides the computation among multiple processors or nodes and performs the clustering operations on them simultaneously, which accelerates the clustering of large-scale and high-dimensional datasets.
A. Generalised Pseudocode of Parallel K-Means
Initialize K centroids
Broadcast centroids to all processors
repeat
    // Parallel Assignment Step
    for each processor in parallel:
        Assign local data points to nearest centroids
        Compute partial sums and counts for each cluster
    // Global Aggregation Step
    Gather all partial sums and counts
    Compute new centroids globally
    Broadcast updated centroids to all processors
until convergence criteria met
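The pseudocode above can be sketched in Python with a thread pool standing in for the processors. This is a structural illustration under stated assumptions, not a tuned implementation: the names `partial_stats` and `parallel_kmeans`, the `workers` and `tol` parameters, the interleaved partitioning of the data, and the centroid-shift stopping rule are all chosen here for illustration. In standard CPython, threads do not speed up pure-Python arithmetic, so a real deployment would use processes or a distributed framework; the data flow stays the same.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def partial_stats(chunk, centroids):
    """Worker step: assign local points, return per-cluster partial sums and counts."""
    dim = len(centroids[0])
    sums = [[0.0] * dim for _ in centroids]
    counts = [0] * len(centroids)
    for p in chunk:
        c = min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
        counts[c] += 1
        for d in range(dim):
            sums[c][d] += p[d]
    return sums, counts


def parallel_kmeans(points, init_centroids, workers=2, max_iter=100, tol=1e-9):
    centroids = [tuple(c) for c in init_centroids]
    dim = len(centroids[0])
    # Partition the data once; each worker always processes the same chunk.
    chunks = [points[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(max_iter):
            # Parallel assignment step: the current centroids are "broadcast"
            # by passing them to every task.
            results = list(pool.map(partial_stats, chunks, [centroids] * workers))
            # Global aggregation step: merge the partial sums and counts.
            new_centroids = []
            for c in range(len(centroids)):
                n = sum(counts[c] for _, counts in results)
                if n == 0:
                    new_centroids.append(centroids[c])  # empty cluster keeps its centroid
                    continue
                total = [sum(sums[c][d] for sums, _ in results) for d in range(dim)]
                new_centroids.append(tuple(t / n for t in total))
            shift = max(math.dist(a, b) for a, b in zip(centroids, new_centroids))
            centroids = new_centroids
            if shift <= tol:  # convergence criterion: centroids stopped moving
                break
    return centroids
```

Because each worker returns only K partial sums and K counts, the data exchanged per iteration depends on the number of clusters and dimensions rather than on the number of records. The merged centroids match those of a single-worker run up to floating-point rounding.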
A. Flowchart
Fig. 3: Flow Chart of Parallel K-means Clustering Algorithm
B. Advantages of Parallel K-Means Clustering:
- Enhances scalability for large and high-dimensional datasets.
- Reduces computation time through parallel processing.
- Efficiently handles big data using distributed computing frameworks.
- Maintains clustering accuracy while improving performance.
- Suitable for real-time and data-intensive applications.
- Balances workload across multiple processors or nodes, ensuring faster convergence.
C. Disadvantages of Parallel K-Means Clustering:
- Requires complex parallel and distributed computing infrastructure.
- Communication overhead between processors can affect efficiency.
- Load balancing issues may arise in heterogeneous environments.
- Sensitive to initial centroid selection, similar to traditional K-Means.
- Scalability can be limited by hardware and network constraints.
- Increased implementation complexity compared to standard K-Means.
D. Commonly used cluster evaluation metrics for Parallel K-Means Clustering:
- Inertia (WCSS): Measures compactness within clusters; lower values are better.
- Silhouette Score: Evaluates cohesion and separation between clusters; higher is better.
- Calinski-Harabasz Index: Assesses cluster separation; higher values indicate well-defined clusters.
- Davies-Bouldin Index: Lower values reflect better clustering.
- Adjusted Rand Index (ARI): Compares clustering against ground truth labels.
E. Challenges/Limitations of Parallel K-Means Clustering:
- Requires complex parallel or distributed computing setup.
- Communication overhead between nodes can reduce efficiency.
- Sensitive to initial centroid selection, risking suboptimal clustering.
- Scalability may be constrained by hardware and network limitations.
- Load balancing issues in heterogeneous systems.
- Increased algorithmic and implementation complexity compared to traditional K-Means.
IV. Conclusion:
This study presents a comprehensive comparative analysis of the traditional K-Means and Parallel K-Means clustering algorithms for efficient data analysis. The findings reveal that while K-Means offers simplicity and ease of implementation, it encounters limitations in handling large-scale and high-dimensional datasets due to scalability and computational inefficiencies. Parallel K-Means, leveraging parallel and distributed computing frameworks, significantly enhances clustering performance by improving scalability, reducing execution time, and handling data-intensive tasks more effectively. The study underscores the critical role of parallelism in modern clustering applications, providing a more robust and efficient approach for large-scale data analysis across diverse domains.
Future Work:
Future research will explore hybrid clustering models integrating deep learning techniques, such as autoencoders, with Parallel K-Means to enhance dimensionality reduction and clustering accuracy on complex, high-dimensional data. Scalability on heterogeneous computing environments will also be investigated.
References:
1. Data Mining Introductory and Advanced Topics, Margaret H. Dunham, Pearson
2. Data Mining Practical Machine Learning Tools and Techniques, 3rd Edition, Ian H. Witten, Eibe Frank, Mark A. Hall
3. Mining of Massive Datasets, 2nd Edition, Jure Leskovec, Anand Rajaraman, Jeffrey David Ullman
4. Data Mining, Concepts and Techniques, 3rd Edition, Jiawei Han, Micheline Kamber, Jian Pei
5. Prof. Prashant Sahai Saxena, Prof. M. C. Govil, “Prediction of Student’s Academic Performance using Clustering,” Special Conference Issue: National Conference on Cloud Computing & Big Data
6. Bindiya M Varghese, Jose Tomy J, Unnikrishnan A, Poulose Jacob K, “Clustering student data to characterize performance patterns,” (IJACSA) International Journal of Advanced Computer Science and Applications, Special Issue on Artificial Intelligence,
7. Md. Hedayetul Islam Shovon, Mahfuza Haque, “An Approach of Improving Student’s Academic Performance by using K-means clustering algorithm and Decision tree,” (IJACSA) International
Journal of Advanced Computer Science and Applications, Vol.3, No. 8, 2012
8. Oyelade, O. J, Oladipupo, O. O., Obagbuwa, I. C., “Application of k-Means Clustering algorithm for prediction of Students’ Academic Performance,” (IJCSIS) International Journal of Computer Science and Information Security, Vol. 7, No. 1, 2010
9. Rakesh Kumar Arora, Dr. Dharmendra Badal, “Evaluating Student’s Performance Using k-Means Clustering,” International Journal of Computer Science And Technology, IJCST Vol. 4, Issue 2, April - June 2013, ISSN : 0976-8491 (Online) | ISSN : 2229-4333 (Print)
10. Sharmila, R.C Mishra, “Performance Evaluation of Clustering Algorithms,” International Journal of Engineering Trends and Technology (IJETT) - Volume4 Issue7- July 2013, ISSN: 2231-5381
11. Patel, J. and Yadav, R.S. (2015) “Applications of Clustering Algorithms in Academic Performance Evaluation.” Open Access Library Journal, 2: August 2015 | Volume 2 | e1623
12. Jyotirmay Patel, Ramjee Singh Yadav, “Applications of clustering algorithms in academic performance evaluation”
13. E.Venkatesan, S.Selvaragini, “Prediction of students academic performance using classification and clustering algorithms,” International Journal of Pure and Applied Mathematics Volume 116 No. 16 2017, 327-333 ISSN: 1311-8080 (printed version); ISSN: 1314-3395 (on-line version)
14. Snehal Bhogan, Kedar Sawant, Purva Naik, Rubana Shaikh, Odelia Diukar, Saylee Dessai, “Predicting student performance based on clustering and classification,” IOSR Journal of Computer Engineering (IOSR-JCE) e-ISSN: 2278-0661, p-ISSN: 2278-8727, Volume 19, Issue 3, Ver. V (May-June 2017), PP 49-52
15. Mr. Shashikant Pradip Borgavakar, Mr. Amit Shrivastava, “Evaluating student’s performance using k-means clustering,” International Journal of Engineering Research & Technology (IJERT), ISSN: 2278-0181, Vol. 6 Issue 05, May – 2017
16. Mrs. Mary Vidya John, Akshata Police Patil, Anjali Mishra, Bindhu Reddy G, Jamuna N, “Clustering technique for student performance,” International Research Journal of Computer Science (IRJCS), Issue 06, Volume 6 (June 2019), ISSN: 2393-9842
17. Noel Varela, Edgardo Sánchez Montero, Carmen Vásquez, Jesús García Guiliany, Carlos Vargas Mercado, Nataly Orellano Llinas, Karina Batista Zea, and Pablo Palencia, “Student performance assessment using clustering techniques,” © Springer Nature Singapore Pte Ltd. 2019 Y. Tan and Y. Shi (Eds.): DMBD 2019, CCIS 1071, pp. 179–188, 2019. https://doi.org/10.1007/978-981-32-9563-6_19
18. N.Valarmathy, S.Krishnaveni, “Performance evaluation and comparison of clustering algorithms used in educational data mining,” International Journal of Recent Technology and Engineering (IJRTE) ISSN: 2277-3878, Volume-7, Issue-6S5, April 2019
19. Lubna Mahmoud Abu Zohair, “Prediction of Student’s performance by modelling small dataset size,” Abu Zohair International Journal of Educational Technology in Higher Education (2019) 16:27 https://doi.org/10.1186/s41239-019-0160-3
20. Mrs. Bhawna Janghel, Dr. Asha Ambhaikar, “Performance of student academics by k-mean clustering algorithm,” International J. Technology. January – June, 2020; Vol. 10: Issue 1, ISSN 2231-3907 (Print), ISSN 2231-3915 (Online)
21. Marzieh Babaie, Mahdi Shevidi Noushabadi, “A review of the methods of predicting students' performance using machine learning algorithms,” Archives of Pharmacy Practice ¦ Volume 11 ¦ Issue S1 ¦ January-March 2020
22. Dr. G. Rajitha Devi, “Prediction of student academic performance using clustering,” International Journal of Current Research in Multidisciplinary (IJCRM) ISSN: 2456-0979 Vol. 5, No. 6, (June’20), pp. 01-05
23. Dewi Ayu Nur Wulandari; Riski Annisa; Lestari Yusuf, Titin Prihatin, “An educational data mining for student academic prediction using k-means clustering and naïve bayes classifier,” journal Pilar Nusa Mandiri Vol 16, No 2 September 2020
24. Yann Ling Goh, Yeh Huann Goh, Chun-Chieh Yip, Chen Hunt Ting, Raymond Ling Leh Bin, Kah Pin Chen, “Prediction of students' academic performance by k-means clustering,” Peer-review under responsibility of 4th Asia International Multidisciplinary Conference 2020 Scientific Committee
25. Revathi Vankayalapati, Kalyani Balaso Ghutugade, Rekha Vannapuram, Bejjanki Pooja Sree Prasanna, “K-means algorithm for clustering of learners performance levels using machine learning techniques,” Revue d'Intelligence Artificielle Vol. 35, No. 1, February, 2021, pp. 99-104
26. Rina Harimurti, Ekohariadi, Munoto, I. G. P Asto Buditjahjanto, “Integrating k-means clustering into automatic programming assessment tool for student performance analysis,” Indonesian Journal of Electrical Engineering and Computer Science Vol. 22, No. 3, June 2021, pp. 1389~1395 ISSN: 2502-4752, DOI: 10.11591/ijeecs.v22.i3.pp1389-1395
27. Rui Shang, Balqees Ara, Islam Zada, Shah Nazir, Zaid Ullah, and Shafi Ullah Khan, “Analysis of simple k-mean and parallel k-mean clustering for software products and organizational performance using education sector dataset,” Hindawi Scientific Programming Volume 2021, Article ID 9988318, 20 pages https://doi.org/10.1155/2021/9988318
28. Bao Chong, “K-means clustering algorithm: a brief review,” Academic Journal of Computing & Information Science ISSN 2616-5775 Vol. 4, Issue 5: 37-40, DOI: 10.25236/AJCIS.2021.040506
29. Said Abubakar Sheikh Ahmed, “Evaluating students’ performance of social work department using k-means and two-step cluster “a case study of mogadishu university”,” Mogadishu University Journal, Issue 7, 2021, ISSN 2519-9781
30. Zhihui Wang, “Higher education management and student achievement assessment method based on clustering algorithm,” Hindawi Computational Intelligence and Neuroscience Volume 2022, Article ID 4703975, 10 pages https://doi.org/10.1155/2022/4703975
31. Ahmad Fikri Mohamed Nafuri, Nor Samsiah Sani, Nur Fatin Aqilah Zainudin, Abdul Hadi Abd Rahman and Mohd Aliff, “Clustering analysis for classifying student academic performance in higher education,” Appl. Sci. 2022, 12, 9467. https://doi.org/10.3390/app12199467
32. Othman, F., Abdullah, R., Rashid, N. A. A., & Salam, R. A. (2004, December). Parallel k-means clustering algorithm on DNA dataset. In International Conference on Parallel and Distributed Computing: Applications and Technologies (pp. 248-251). Berlin, Heidelberg: Springer Berlin Heidelberg.
33. Zhao, W., Ma, H., & He, Q. (2009). Parallel k-means clustering based on mapreduce. In Cloud Computing: First International Conference, CloudCom 2009, Beijing, China, December 1-4, 2009. Proceedings 1 (pp. 674-679). Springer Berlin Heidelberg.
34. Kerdprasop, K., & Kerdprasop, N. (2010). A lightweight method to parallel k-means clustering. International Journal of Mathematics and Computers in Simulation, 4(4), 144-153.
35. Kumar, J., Mills, R. T., Hoffman, F. M., & Hargrove, W. W. (2011). Parallel k-means clustering for quantitative ecoregion delineation using large data sets. Procedia Computer Science, 4, 1602-1611.
36. Jin, S., Cui, Y., & Yu, C. (2016). A new parallelization method for K-means. arXiv preprint arXiv:1608.06347.
37. Alguliyev, R. M., Aliguliyev, R. M., & Sukhostat, L. V. (2021). Parallel batch k-means for Big data clustering. Computers & Industrial Engineering, 152, 107023.
38. Nigro, L. (2022). Performance of parallel K-means algorithms in Java. Algorithms, 15(4), 117.