
DETECTING AND EVALUATING FAKE WEBSITES USING PATTERN RECOGNITION ALGORITHMS

Author: Anvar Kabulov; Alisherbek Otakhanov
Publisher: Zenodo
DOI: 10.5281/zenodo.17295648
Source: https://zenodo.org/records/17295648/files/29_890-174-181-Otaxonov.pdf
"Al-Farg‘oniy avlodlari" elektron ilmiy jurnali ("Descendants of Al-Farghani" electronic scientific journal)
ISSN 2181-4252. Vol: 1 | Iss: 3 | 2025
https://al-fargoniy.uz/
Kabulov Anvar,
Professor of the National University of Uzbekistan named after Mirzo Ulugbek
Otakhanov Alisherbek,
Doctoral student of Fergana State University
Abstract: The increase in the creation of fake web pages by attackers is leading to a sharp increase in cyber attacks. Attackers use these fake websites to advertise products to Internet users, distribute malicious programs, or steal users' valuable logins and passwords. Traditional solutions for detecting such fake web addresses are not effective against newly created fake web addresses. In this article, we propose a new approach that combines several machine learning algorithms. To this end, we use various selected features to improve the accuracy of sorting and classifying web pages. Our experimental results show that, with the proposed approach, the Random Forest (RF) classifier achieved the best accuracy of 99%. The Random Forest classifier can therefore be considered more reliable than the others in detecting fake web addresses.
Keywords: Web Security, Machine Learning, Random Forest, Cyber Attacks, Fake Websites, URL.
Introduction. The rapid development, expansion, and increased accessibility of the Internet have led consumers to shift from traditional shopping to online shopping. However, this innovation has also increased cyber threats. Attackers use strategies such as fake websites to steal confidential account information from unsuspecting users. Fraudsters use fake websites to trick individuals into revealing sensitive information, including credit card numbers, passwords, and personal information [1].
The creation of fake websites has attracted significant attention from security researchers due to its potential to exploit users. Although users are able to detect these fake pages by carefully examining URLs, the busyness of online activities sometimes causes them to overlook such differences, which can lead them to fall prey to attackers. Fake websites affect trust in online shopping and cause financial losses to individuals. As a Verizon data breach study shows, the initial step of accessing a fake website is responsible for 90 percent of all frauds [2].
In addition to obtaining personal and confidential information, the purpose of modern fake websites is to infect victims' computers with various forms of malware [3], [4]. Communication channels such as the Internet, SMS, and email are used to distribute these fake websites. The Internet serves as a means for attackers to communicate with victims through email messages, fake websites, instant messaging, and social networks [5], [6].
In this study, we propose a comprehensive approach to detecting and preventing fraudulent websites, which includes the use of machine learning methods. This research work allows us to identify such websites more accurately. Our main goal is to develop a system with enhanced capabilities for detecting and classifying fraudulent websites and then blocking them. As input to the machine learning algorithms, we extract 27 features from websites, including URL and domain-based attributes, and attributes based on the directory and parameter parts of the URL. The proposed approach is aimed at strengthening the security of Internet users and protecting them from the breach of their personal and financial data. To evaluate the effectiveness of our approach, we perform a comparative analysis of 7 machine learning models.
These are: Naive Bayes, Decision Tree, K-Nearest Neighbor, Support Vector Machine, Random Forest, Logistic Regression, and AdaBoost. This research will provide an opportunity to improve website security and protect confidential information from malicious attacks. In this research, we also aim to increase knowledge about how to avoid being fooled by fake websites and how to distinguish them from real ones.
Related works. This section reviews various methods of detecting fake websites using machine learning algorithms and website features.
Chiew et al. [7] proposed a method that relies on website logos. They extracted the logo from a web page and fed it to Google's image search engine to learn how to detect suspicious websites. They were able to determine whether a website was legitimate or fake by comparing the website with the search engine results.
AlSabah et al. [8] present a content-agnostic method of predicting fake website domains based on certificate transparency logs and passive DNS records. The study demonstrates the usefulness of this analysis by training a classifier with unique features and achieving low false positive rates, as well as high accuracy and recall, in predicting fake website domains.
Heuristic solutions for detecting fake websites study various features such as non-content-based, content-based, and visual similarity-based features, as well as DNS information and the registries used to register the site [9]. The paper proposes a general solution based on heuristics over these features to predict fake websites.
Sinha and Sachan [10] propose a dataset of 198 features from fake websites as an empirical approach to detecting fake websites. The dataset is analyzed using machine learning and deep learning models, and Random Forest detects fake web pages with high accuracy. The paper emphasizes the importance of obtaining a diverse set of features for effective detection of fake websites.
Ranaldi et al. [11] use machine learning to detect fake websites and highlight the limitations of blacklists in detecting real-time attacks. The study focused on fake website URLs and domains associated with Italian organizations, and models based on pre-trained encoders and convolutional neural networks showed promising results.
In this paper, we present a method of detecting and preventing fake websites. We extract a set of attributes from the URL address of each website and then classify the website as legitimate or fake. Specifically, we rely on machine learning models to detect potential fake websites based on their URL address. We derive a complete set of 27 features, including URL-based, domain-based, directory-based, and parameter-based features, to further evaluate a website and determine its authenticity.
We compared different machine learning models to evaluate the effectiveness of our proposed approach. In this paper, websites were evaluated based on URL features using the Naive Bayes (NB), Decision Tree (DT), K-Nearest Neighbor (KNN), Support Vector Machine (SVM), Random Forest (RF), Logistic Regression (LR), and AdaBoost algorithms. Our results show that the proposed approach detects fake websites with high accuracy, up to 99% with the Random Forest classifier.
The proposed method. Our proposed approach extracts and analyzes various features from URLs to effectively detect fake websites. The main contribution of this paper is that the extracted feature sets are used together. We build a knowledge base from the extracted features. Based on this knowledge base, we check and predict whether the URLs of other web pages are real or fake. We propose the use of a rule set to improve the accuracy of detecting fake web pages.
Architecture of the proposed method. The general architecture of the proposed approach is divided into three stages. In the first stage, all the important features of the URLs are extracted. In the second stage, a knowledge base is formed based on the production logic. In the third stage, the extracted web pages are tested to determine whether they are fake or genuine. Fig. 1 illustrates the architecture of the proposed approach. The details of each stage in the architecture are described below.
Figure 1. General architecture of the proposed approach.
Create features. The feature set is created as follows. Our features are based on the URL of a web page. We extract the features from the URL using software written in C#. We divide the features into four groups according to the location where they are detected, as shown in Table 1. Specifically, features $f_1,\dots,f_9$ belong to the general URL, features $f_{10},\dots,f_{13}$ belong to the domain part of the URL, features $f_{14},\dots,f_{17}$ belong to the directory part of the URL, and features $f_{18},\dots,f_{27}$ belong to the parameter part of the URL. A detailed explanation of the detected features is given in the feature extraction section of this article.
Table 1. Features used in the proposed approach

| Category | Sign | Name of feature |
|---|---|---|
| URL-based features | $f_1, f_2, f_3, f_4, f_5, f_6, f_7, f_8, f_9$ | count_dot_url, count_slash_url, count_tld_url, length_url, count_at_url, count_hyphen_url, count_underline_url, count_equal_url, count_and_url |
| Domain-based features | $f_{10}, f_{11}, f_{12}, f_{13}$ | count_dot_domain, count_vowels_domain, domain_length, count_hyphen_domain |
| Directory-based features | $f_{14}, f_{15}, f_{16}, f_{17}$ | count_slash_directory, directory_length, count_dot_directory, count_hyphen_directory |
| Parameter-based features | $f_{18}, f_{19}, f_{20}, f_{21}, f_{22}, f_{23}, f_{24}, f_{25}, f_{26}, f_{27}$ | count_hyphen_params, count_at_params, count_dot_params, count_equal_params, count_and_params, params_length, count_params, count_underline_params, count_slash_params, count_questionmark_params |
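The counting features named in Table 1 can be illustrated with a short Python sketch. This is not the authors' C# extractor. It assumes that the "domain" is the host name, the "directory" is the URL path, and the "parameters" are the query string, and that `count_params` is the number of `&`-separated query fields. None of these definitions is spelled out in the text. The feature `count_tld_url` is left out because its exact definition is not given. The function name `extract_features` and the example URL are hypothetical.

```python
from urllib.parse import urlsplit

VOWELS = set("aeiou")


def extract_features(url: str) -> dict:
    """Return count- and length-based features for one URL.

    Assumptions (not stated in the article): domain = host name,
    directory = URL path, parameters = query string.
    """
    parts = urlsplit(url)
    domain = parts.hostname or ""   # urlsplit lower-cases the host name
    directory = parts.path
    params = parts.query            # query string without the leading "?"

    return {
        # URL-based features (count_tld_url omitted: definition not given)
        "count_dot_url": url.count("."),
        "count_slash_url": url.count("/"),
        "length_url": len(url),
        "count_at_url": url.count("@"),
        "count_hyphen_url": url.count("-"),
        "count_underline_url": url.count("_"),
        "count_equal_url": url.count("="),
        "count_and_url": url.count("&"),
        # Domain-based features
        "count_dot_domain": domain.count("."),
        "count_vowels_domain": sum(ch in VOWELS for ch in domain),
        "domain_length": len(domain),
        "count_hyphen_domain": domain.count("-"),
        # Directory-based features
        "count_slash_directory": directory.count("/"),
        "directory_length": len(directory),
        "count_dot_directory": directory.count("."),
        "count_hyphen_directory": directory.count("-"),
        # Parameter-based features
        "count_hyphen_params": params.count("-"),
        "count_at_params": params.count("@"),
        "count_dot_params": params.count("."),
        "count_equal_params": params.count("="),
        "count_and_params": params.count("&"),
        "params_length": len(params),
        "count_params": len(params.split("&")) if params else 0,
        "count_underline_params": params.count("_"),
        "count_slash_params": params.count("/"),
        "count_questionmark_params": params.count("?"),
    }


# Hypothetical URL used only to exercise the function.
example = extract_features(
    "http://secure-login.example.com/account/update.php?id=42&session=abc_1"
)
```

Splitting the URL once with `urlsplit` and counting characters per part keeps the four feature groups independent, so a different definition of any one part can be swapped in without touching the others.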
Vectorization of features. After feature extraction, feature vectorization is applied to create a feature vector for each URL. After that, a structured database is created. We divide the URL features into 4 groups and create the feature vector required to train the proposed approach. The combination of 9 features belonging to the general URL address forms a 9-dimensional feature vector $F_U = \langle f_1, f_2, f_3, \dots, f_9 \rangle$, while the combination of 4 features belonging to the domain part of the URL forms a 4-dimensional feature vector $F_D = \langle f_{10}, f_{11}, f_{12}, f_{13} \rangle$. The combination of 4 features belonging to the directory part of the web page URL produces a 4-dimensional feature vector $F_C = \langle f_{14}, f_{15}, f_{16}, f_{17} \rangle$, and finally, the combination of 10 features belonging to the parameter part of the URL produces a 10-dimensional feature vector $F_P = \langle f_{18}, f_{19}, f_{20}, \dots, f_{27} \rangle$. The above 4 feature vectors are combined to produce the final feature vector $F_V = F_U \cup F_D \cup F_C \cup F_P = \langle f_1, f_2, f_3, f_4, f_5, \dots, f_{27} \rangle$, and this feature vector is given as input to the machine learning algorithms, which then classify the website.
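The combination of the four group vectors into one 27-dimensional vector can be sketched as follows. Because the groups are ordered tuples, the union in the text is read here as concatenation in the order $f_1$ to $f_{27}$. This reading is an assumption. The helper name `to_vector` and the sample values are hypothetical, and the feature names are those listed in Table 1.

```python
# Feature names per group, in the order f1..f27 given in Table 1.
URL_FEATURES = [
    "count_dot_url", "count_slash_url", "count_tld_url", "length_url",
    "count_at_url", "count_hyphen_url", "count_underline_url",
    "count_equal_url", "count_and_url",
]
DOMAIN_FEATURES = [
    "count_dot_domain", "count_vowels_domain", "domain_length",
    "count_hyphen_domain",
]
DIRECTORY_FEATURES = [
    "count_slash_directory", "directory_length", "count_dot_directory",
    "count_hyphen_directory",
]
PARAM_FEATURES = [
    "count_hyphen_params", "count_at_params", "count_dot_params",
    "count_equal_params", "count_and_params", "params_length",
    "count_params", "count_underline_params", "count_slash_params",
    "count_questionmark_params",
]


def to_vector(features: dict) -> list:
    """Concatenate the group vectors F_U, F_D, F_C, F_P into F_V."""
    f_u = [features[name] for name in URL_FEATURES]
    f_d = [features[name] for name in DOMAIN_FEATURES]
    f_c = [features[name] for name in DIRECTORY_FEATURES]
    f_p = [features[name] for name in PARAM_FEATURES]
    return f_u + f_d + f_c + f_p


# Hypothetical record: feature f_i simply holds the value i.
all_names = URL_FEATURES + DOMAIN_FEATURES + DIRECTORY_FEATURES + PARAM_FEATURES
sample = {name: i for i, name in enumerate(all_names, start=1)}
f_v = to_vector(sample)
```

Fixing the order of the names once means every URL is mapped to a vector with the same layout, which is what the classifiers need.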
Detection module. The detection module involves building a powerful classifier using machine learning classifiers. As a result, the classifier is effective in detecting fake websites. The classifiers we use here apply features based on the whole URL $F_U$, features based on the domain part of the URL $F_D$, features related to the directory part $F_C$, and features based on the parameter part $F_P$, or a combined set of features. In the training phase, the classifiers are trained using the feature vector $F_U \cup F_D \cup F_C \cup F_P$ collected from each record in the training dataset. In the testing phase, the classifiers determine whether a given website is a fake or a real website. A detailed description is shown in Fig. 2.
Figure 2. Algorithm for detecting fake websites.
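The training and testing phases can be sketched with one of the seven classifiers compared in this study, K-Nearest Neighbor. It is written here in plain Python so that the sketch runs without external libraries. This is not the authors' implementation. The class name `KNNClassifier`, the value k = 3, the three-dimensional toy vectors (the study uses 27 dimensions), and the labels are all hypothetical.

```python
from collections import Counter
from math import dist


class KNNClassifier:
    """Minimal k-nearest-neighbour classifier using Euclidean distance."""

    def __init__(self, k: int = 3):
        self.k = k
        self.samples = []

    def fit(self, vectors, labels):
        """Training phase: store the labelled feature vectors."""
        self.samples = list(zip(vectors, labels))
        return self

    def predict(self, vector):
        """Testing phase: majority vote among the k nearest stored vectors."""
        nearest = sorted(self.samples, key=lambda s: dist(s[0], vector))[: self.k]
        votes = Counter(label for _, label in nearest)
        return votes.most_common(1)[0][0]


# Toy data (hypothetical): (length_url, count_dot_url, count_hyphen_url).
# Label 1 = fake website, label 0 = legitimate website.
train_x = [(20, 1, 0), (25, 2, 0), (22, 1, 1), (90, 5, 4), (110, 6, 3), (95, 4, 5)]
train_y = [0, 0, 0, 1, 1, 1]

model = KNNClassifier(k=3).fit(train_x, train_y)
label_long = model.predict((100, 5, 4))   # resembles the long, hyphen-heavy URLs
label_short = model.predict((23, 1, 0))   # resembles the short URLs
```

Any of the other classifiers in the comparison could take the place of `KNNClassifier` as long as it offers the same two steps: fit on labelled feature vectors, then predict a label for an unseen vector.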
The novelty of this methodology is the use of machine learning techniques to achieve high accuracy and efficiency in detecting fake websites. Machine learning techniques provide a powerful and flexible way to deal with new or unknown fake websites. This methodology can improve web security and protect users from fake web addresses.
Extracting features. We used 27 features to categorize websites, including URL-based features, domain-based features, directory-based features, and parameter-based features. Table 1 shows the features used in this study.
Mathematical and algorithmic formalization and evaluation methods. This section presents the algorithms used in the article and their mathematical formalization. In addition, information is provided about the machine learning algorithms used in the proposed model and their evaluation methods.
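Mathematical formalization based on Petri nets. The mathematical formalization [12] of a three-stage machine learning system based on Petri nets is constructed on the fundamental description of Petri nets:
$$PN = (P, T, F, W, \mu_0) \tag{1}$$
Here, $P = \{p_1, p_2, \dots, p_9\}$ is the set of positions, $T = \{t_1, t_2, \dots, t_7\}$ is the set of transitions, $F \subseteq (P \times T) \cup (T \times P)$ is the flow relation, $W: F \to \mathbb{N}^{+}$ assigns the weights, and $\mu_0: P \to \mathbb{N}$ is the initial marking.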
The positions and transitions in a Petri net [13] are named as follows:
Positions:
− $p_1$: incoming URLs
− $p_2$: the process of feature extraction
− $p_3$: vectorized features
− $p_4$: knowledge base
− $p_5$: model training stage
− $p_6$: model testing stage
− $p_7$: model analysis
− $p_8$: result (Legitimate or Fake)
− $p_9$: safety decision making
Transitions:
− $t_1$: extract features
− $t_2$: vectorize
− $t_3$: generate knowledge base
− $t_4$: train ML models
− $t_5$: input test URLs
− $t_6$: classify using ML algorithms
− $t_7$: security decision
Using the above positions and transitions, the flow relation $F$ is formed as follows:
$$F = \{ (p_1,t_1), (t_1,p_2), (p_2,t_2), (t_2,p_3), (p_3,t_3), (t_3,p_4), (p_4,t_4), (t_4,p_5), (p_6,t_5), (t_5,p_6), (p_5,t_6), (p_6,t_6), (t_6,p_7), (p_7,t_7), (t_7,p_8), (t_7,p_9) \}$$
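The firing rule behind the flow relation above can be sketched in Python. The sketch assumes that every arc has weight 1 (the weight function $W$ is not specified further in the text) and starts from the initial marking $\mu_0 = (1,0,0,0,0,1,0,0,0)$ used in Table 2, with one token in $p_1$ and one in $p_6$. The function name `fire` is hypothetical. Only the first four transitions, which cover feature extraction through model training, are fired here.

```python
PLACES = [f"p{i}" for i in range(1, 10)]

# Flow relation F as listed above: (source, target) arcs.
F = [
    ("p1", "t1"), ("t1", "p2"),
    ("p2", "t2"), ("t2", "p3"),
    ("p3", "t3"), ("t3", "p4"),
    ("p4", "t4"), ("t4", "p5"),
    ("p6", "t5"), ("t5", "p6"),
    ("p5", "t6"), ("p6", "t6"), ("t6", "p7"),
    ("p7", "t7"), ("t7", "p8"), ("t7", "p9"),
]


def fire(marking: dict, transition: str) -> dict:
    """Fire one transition, assuming every arc has weight 1."""
    inputs = [src for src, dst in F if dst == transition]
    outputs = [dst for src, dst in F if src == transition]
    if any(marking[p] < 1 for p in inputs):
        raise ValueError(f"{transition} is not enabled")
    new_marking = dict(marking)
    for p in inputs:       # consume one token from each input position
        new_marking[p] -= 1
    for p in outputs:      # produce one token in each output position
        new_marking[p] += 1
    return new_marking


# Initial marking: tokens in p1 (incoming URLs) and p6 (testing stage).
marking = dict(zip(PLACES, (1, 0, 0, 0, 0, 1, 0, 0, 0)))
trace = []
for t in ("t1", "t2", "t3", "t4"):
    marking = fire(marking, t)
    trace.append(tuple(marking[p] for p in PLACES))
```

Each entry of `trace` is the marking reached after one firing, written in the same position order $p_1, \dots, p_9$ as the marking vectors in the text.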
Based on the flow relation $F$, the active transitions [14] in Table 2 were generated.
Table 2. Workflow table of active transitions from previous state to new state

| State | Active transition | New state |
|---|---|---|
| $\mu_0 = (1,0,0,0,0,1,0,0,0)$ | $t_1$ | $\mu_1 = (0,1,0,0,0,1,0,0,0)$ |
| $\mu_1 = (0,1,0,0,0,1,0,0,0)$ | $t_2$ | $\mu_2 = (0,0,1,0,0,1,0,0,0)$ |
| $\mu_2 = (0,0,1,0,0,1,0,0,0)$ | $t_3$ | $\mu_3 = (0,0,0,1,0,1,0,0,0)$ |
| $\mu_3 = (0,0,0,1,0,1,0,0,0)$ | $t_4$ | $\mu_4 = (0,0,0,0,1,1,0,0,0)$ |
| $\mu_4 = (0,0,0,0,1,1,0,0,0)$ | $t_5$ | $\mu_5 = (0,0,0,0,1,0,1,0,0)$ |
| $\mu_5 = (0,0,0,0,1,0,1,0,0)$ | $t_6$ | $\mu_6 = (0,0,0,0,0,0,0,1,0)$ |
| $\mu_6 = (0,0,0,0,0,0,0,1,0)$ | $t_7$ | $\mu_7 = (0,0,0,0,0,0,0,0,1)$ |
The order of operation of the Petri net is described by the following sequence of transitions:
$$\sigma = t_1 \circ t_2 \circ t_3 \circ t_4 \circ t_5 \circ t_6 \circ t_7$$
Machine learning algorithms used in the article. In this study, we compared the performance of 7 classifiers used as machine learning methods for the proposed system: Naive Bayes (NB), Decision Tree (DT), K-Nearest Neighbor (KNN), Support Vector Machine (SVM), Random Forest (RF), Logistic Regression (LR), and AdaBoost.
About the dataset used in the article. The proposed method of classifying URLs as legitimate or fake web pages was tested using a dataset. The dataset used in the tests contains the URLs of 6000 web pages, some of which belong to fake web pages while the others are legitimate. We use the database of fake and real websites available on the Kaggle open data platform. We extract 27 attributes for each website that is part of the dataset. The list includes various features such as URL length, the number of special characters in the URL, and features related to the domain, parameter, and directory parts of the URL. The features of the dataset used for the experiments and evaluation are summarized in Table 1. We assign a label of 1 to each record in the dataset if it is a fake website and a label of 0 if it is a legitimate website.
Evaluation parameters used. In our experiment, we compared different classifiers using evaluation metrics such as false positive rate, false negative rate, precision, recall, F1-score, and accuracy to evaluate the performance of the proposed system. These parameters are calculated using the True Positive (TP), True Negative (TN), False Positive (FP), and False Negative (FN) fields of the confusion matrix shown in Table 3. The fields are defined in Table 4.
Table 3. Confusion matrix

| Class | Fake | Legitimate |
|---|---|---|
| Fake | TP | FP |
| Legitimate | FN | TN |
FPR (False Positive Rate): This is the proportion of legitimate websites that are falsely detected as fake, calculated according to (2):
$$FPR = \frac{FP}{FP + TN} \tag{2}$$
FNR (False Negative Rate): This is the proportion of fake websites that are falsely classified as legitimate, calculated according to (3):
$$FNR = \frac{FN}{TP + FN} \tag{3}$$
Precision: This assesses how accurate the positive predictions of the model are, that is, the share of websites flagged as fake that are actually fake. It is calculated according to (4):
$$Precision = \frac{TP}{TP + FP} \tag{4}$$
Recall: The ability of a model to correctly identify positives among all actual positives is indicated by the model's recall score, which is calculated by (5):
$$Recall = \frac{TP}{FN + TP} \tag{5}$$
F-measure: This is the harmonic mean of precision and recall. It provides a fast way to compare classifiers and lies between 0 and 1, calculated by (6):
$$F\text{-}measure = \frac{2 \cdot TP}{2 \cdot TP + FN + FP} \tag{6}$$
Accuracy (%): This is the percentage of legitimate and fake websites that are correctly identified, calculated according to (7):
$$Accuracy = \frac{TP + TN}{TP + FN + TN + FP} \cdot 100 \tag{7}$$
Table 4. Fields of the confusion matrix and their definition

| Field | Definition |
|---|---|
| TP | Number of websites correctly identified as fake |
| TN | Number of websites correctly identified as legitimate |
| FP | Number of legitimate websites incorrectly identified as fake websites |
| FN | Number of fake websites incorrectly identified as legitimate |
These fields are used to calculate a number of performance metrics, including accuracy (%), precision, recall, and F1 score. We use these metrics to evaluate the performance of the algorithm and then make the necessary changes to improve it.
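Equations (2) to (7) can be evaluated directly from the four confusion-matrix fields. The following Python sketch does this. The function name `evaluate` and the counts in the example are hypothetical and are not results from this study. The sketch assumes that no denominator is zero.

```python
def evaluate(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Compute the metrics of equations (2)-(7) from confusion-matrix counts."""
    return {
        "fpr": fp / (fp + tn),                              # equation (2)
        "fnr": fn / (tp + fn),                              # equation (3)
        "precision": tp / (tp + fp),                        # equation (4)
        "recall": tp / (fn + tp),                           # equation (5)
        "f_measure": 2 * tp / (2 * tp + fn + fp),           # equation (6)
        "accuracy": (tp + tn) / (tp + fn + tn + fp) * 100,  # equation (7), in %
    }


# Hypothetical counts: 100 fake and 100 legitimate test websites.
metrics = evaluate(tp=90, tn=95, fp=5, fn=10)
```

With these illustrative counts, 5 of the 100 legitimate websites are flagged as fake and 10 of the 100 fake websites are missed, so FPR and FNR come out at 0.05 and 0.10, and 185 of the 200 websites are classified correctly.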

Results and discussion. Classification results using the features (domain-based features, parameter-based features, URL-based features, and directory-based features) are shown in this section.
According to equations (2) to (7), the accuracy (%), precision, recall, F1 score, FPR, and FNR values for the recommended features are shown in Table 5. Seven classifiers were used to classify fake websites according to feature groups. Using the data in this table, we can understand how each feature affects the classification. The findings show that the RF algorithm provides the most accurate classification with the lowest FPR and FNR.
Table 5. Performance evaluation results of different classifiers
The confusion matrices of the 7 classifier algorithms we used in our study are shown in Fig. 3. As can be seen from this figure, the most efficient algorithm was the Random Forest classifier. The algorithm that achieved the lowest accuracy was the Naive Bayes classifier.
Figure 3. Confusion matrices of algorithms.
The performance of the recommended feature set was evaluated, and the results were summarized for the seven classifiers considered based on accuracy. Fig. 4 shows that RF, with an accuracy of 99%, and DT, with an accuracy of 98.17%, achieved the highest accuracies for this feature set.
Figure 4. Accuracy (%) results of algorithms.
Precision is defined as the ratio of true positive predictions to all positive predictions (including true positive and false positive predictions). Therefore, the number of false positive predictions decreases as the precision increases. In this case, the RF and DT algorithms have the highest precision scores, i.e., 99.22% and 98.33%, respectively. The results are shown in Fig. 5.
Figure 5. Precision results of algorithms.
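As a rough consistency check, the precision and accuracy values can be related to the FPR and FNR rates reported in the Conclusion for RF (0.78 and 1.22) and DT (1.67 and 2.00). The Python sketch below treats those rates as percentages and assumes a test set with equal numbers of fake and legitimate websites. That assumption is not stated in the text. The function name `from_rates` is hypothetical.

```python
def from_rates(fpr_pct: float, fnr_pct: float) -> tuple:
    """Precision and accuracy (in %) implied by FPR and FNR (in %).

    Assumption: the test set holds equally many fake and legitimate
    websites, so the class sizes cancel out of both ratios.
    """
    tpr = 100.0 - fnr_pct                      # share of fake sites detected
    tnr = 100.0 - fpr_pct                      # share of legitimate sites passed
    precision = 100.0 * tpr / (tpr + fpr_pct)  # TP / (TP + FP) with equal classes
    accuracy = (tpr + tnr) / 2.0               # (TP + TN) / all with equal classes
    return precision, accuracy


rf_precision, rf_accuracy = from_rates(0.78, 1.22)   # Random Forest
dt_precision, dt_accuracy = from_rates(1.67, 2.00)   # Decision Tree
```

Under this assumption the implied values agree, up to rounding, with the reported precision (99.22% and 98.33%) and accuracy (99% and 98.17%) of the two classifiers.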
The results in Fig. 6 show the recall values of the various machine learning algorithms used to classify websites into legitimate and fake websites. Recall measures how well the model identifies the actual positive cases (fake websites), that is, the share of fake websites that are detected as true positives.
Figure 6. Recall results of algorithms.
The F1 scores of the various machine learning algorithms for classifying websites as legitimate or fake are shown in Fig. 7. The RF algorithm is the most accurate method of classifying websites as legitimate or fake.
Figure 7. Evaluation of algorithm results using F1 score.
Fig. 8 below shows a comparison of the Accuracy (%), Precision, Recall, and F1 score results of the seven algorithms used to detect fake websites. It shows that the Random Forest algorithm is the best.
Figure 8. Comparison of evaluation algorithms.
In Fig. 9, we selected the important features of the above 3 models (DT, RF, AdaBoost) by mutual voting and list them in decreasing order of their common importance for the proposed model.
Figure 9. The most important features of the 3 models (DT, RF, AdaBoost).
Conclusion. In this study, we presented a powerful method of detecting fake websites using machine learning techniques. We used URL features of a web page, domain-based features, and features related to the directory and parameter parts of the URL to inspect the web page. The results of our research show that the RF classifier achieved the highest accuracy of 99%, with FPR and FNR rates of 0.78% and 1.22%, respectively. In addition, the decision tree model demonstrated excellent performance with an accuracy of 98.17%, despite slightly higher FPR and FNR rates of 1.67% and 2.00%.
Countering the sophisticated strategies of creating fake websites requires a proactive approach. As cybercriminals constantly improve their methods, it becomes necessary to develop increasingly powerful and effective systems. Our study is one of the important steps towards solving this problem.
References
1. Singla, S., Gandotra, E., Bansal, D., & Sofat, S. "A novel approach to malware detection using static classification", International Journal of Computer Science and Information, Vol.13, No.3, pp.1-5, 2015.
2. Enterprise, V. "Verizon 2018 data breach investigations report", 2018. [Online]. Available: https://verizon.com/business/resources/reports/2018-data-breach-digest.pdf
3. Gandotra, E., Bansal, D., & Sofat, S. "Malware intelligence: beyond malware analysis", International Journal of Advanced Intelligence Paradigms, Vol.13, No.1-2, pp.80-100, 2019. DOI: 10.1504/IJAIP.2019.099945
4. Sharma, A., Gandotra, E., Bansal, D., & Gupta, D. "Malware capability assessment using fuzzy logic", Cybernetics and Systems, Vol.50, No.4, pp.323-338, 2019. DOI: 10.1080/01969722.2018.1552906
5. Chiew, K.L., Yong, K.S.C., & Tan, C.L. "A survey of phishing attacks: Their types, vectors, and technical approaches", Expert Systems with Applications, Vol.106, pp.1-20, 2018. DOI: 10.1016/j.eswa.2018.03.050
6. Gandotra, E., & Sofat, S. "Tools & Techniques for Malware Analysis and Classification", International Journal of Next-Generation Computing, Vol.7, No.3, pp.176-197, 2016.
7. Chiew, K.L., Chang, E.H., & Tiong, W.K. "Utilization of website logo for phishing detection", Computers & Security, Vol.54, pp.16-26, 2015. DOI: 10.1016/j.cose.2015.07.006
8. AlSabah, M., Nabeel, M., Boshmaf, Y., & Choo, E. "Content-Agnostic Detection of Phishing Domains Using Certificate Transparency and Passive DNS", Proceedings of the 25th International Symposium on Research in Attacks, Intrusions, and Defenses, pp.446-459, 2022. DOI: 10.1145/3545948.3545958
9. Torrealba A, L. and Bustos-Jiménez, J. "Detecting Phishing in a Heuristic Way (Abstract)", 2021.
10. Sinha, J., & Sachan, M. "PhishX: An Empirical Approach to Phishing Detection", 2022. DOI: 10.1145/1122445.1122456
11. Ranaldi, L., Petito, M., Gerardi, M., Fallucchi, F., & Zanzotto, F.M. "Machine Learning Techniques for Italian Phishing Detection", in Italian Conference on Cybersecurity, Rome, Italy, 2022.
12. Kabulov, A., Yarashov, I., & Otakhonov, A. "Algorithmic Analysis of the System Based on the Functioning Table and Information Security", 2022 IEEE International IOT, Electronics and Mechatronics Conference (IEMTRONICS), Toronto, ON, Canada, pp.1-5, 2022. DOI: 10.1109/IEMTRONICS55184.2022.9795746
13. Normatov, I., Yarashov, I., Otakhonov, A., & Ergashev, B. "Construction of reliable well distribution functions based on the principle of invariance for convenient user access control", 2022 International Conference on Information Science and Communications Technologies (ICISCT), pp.1-5, IEEE, September 2022.
14. Toshmatov, S., Yarashov, I., Otakhonov, A., & Ismatillayev, A. "Designing an algorithmic formalization of threat actions based on a Functioning table", 2022 International Conference on Information Science and Communications Technologies (ICISCT), pp.1-5, IEEE, September 2022.