
Spam Filtering Security Evaluation Framework Using SVM, LR and MILR

Author: IJCSIT
Publisher: Zenodo
DOI: 10.5281/zenodo.17291309
Source: https://zenodo.org/records/17291309/files/3316ijcax02.pdf
International Journal of Computer-Aided Technologies (IJCAx) Vol.3, No. 2/3, July 2016
DOI:10.5121/ijcax.2016.3302
SPAM FILTERING SECURITY EVALUATION FRAMEWORK USING SVM, LR AND MILR
Kunjali Pawar¹ and Madhuri Patil²
¹ M.E. Student, Dr. D.Y.Patil School of Engg. And Technology, Lohegaon, Pune,
Savitribai Phule Pune University, India.
² Assistant Professor, Dr. D.Y.Patil School of Engg. And Technology, Lohegaon, Pune,
Savitribai Phule Pune University, India.
ABSTRACT
A pattern classification system assigns patterns to regions of a feature space delimited by a boundary.
Pattern classification systems are used in adversarial applications, for example spam filtering, Network
Intrusion Detection Systems (NIDS), and biometric authentication. Spam filtering is an adversarial
application in which humans can manipulate the data to undermine the intended operation. Numerous
machine learning systems have been used to appraise the security issues related to spam filtering. We
present a framework for the experimental evaluation of classifier security in adversarial environments
that combines and builds on the arms race and security by design, adversary modelling, and data
distribution under attack. Furthermore, we present SVM, LR and MILR classifiers that categorize emails as
legitimate (ham) or spam on the basis of three text samples.
KEYWORDS
Adversary Model, Multiple Instance Logistic Regression, Pattern Classification, Security Evaluation, Spam
Filtering
1. INTRODUCTION
Machine learning systems offer unparalleled flexibility in dealing with evolving input in a
variety of applications, such as Intrusion Detection Systems (IDS) [1] and the spam filtering of
e-mails. Whenever machine learning is used to prevent illegal or unsanctioned activity [2] and
there is an economic incentive, adversaries will attempt to circumvent the protection provided.
Constraints on how adversaries can manipulate the training data (TR) and test data (TS) of
classifiers used to detect suspicious behaviour make problems in this area tractable and
interesting. Pattern classification has gained prominence in different fields, including
security-related applications such as spam filtering, Network Intrusion Detection Systems (NIDS),
and biometric authentication, where it is used to distinguish between legitimate and malicious samples [3].
Specifically, three main open issues can be identified: (a) examining the weaknesses
(vulnerabilities) of classification algorithms and the corresponding attacks; (b) creating novel
methodologies to assess classifier security under these attacks, which is not possible using
classical performance evaluation methodologies; (c) establishing design methods [4] that
guarantee classifier security in an adversarial environment. The goal of an attacker is to defeat
the normal operation of spam filters so that spam can be delivered [5].
The rest of the paper is organized as follows. Section 2 examines the problem
definition for security evaluation. Section 3 discusses the proposed system framework for
spam filtering security evaluation. Section 4 examines the algorithms and evaluates
them using examples. Section 5 covers the expected results of the
classifiers. Section 6 summarizes the conclusion and the future scope.
2. PROBLEM STATEMENT
2.1. Problem statement
• Existing methods address one of the main open issues: evaluating the security of
pattern classifiers at the design phase.
• Even though the design of secure classifiers is a different issue from security
evaluation, the existing framework could be exploited to this end. For instance, for spam
filtering the existing system considers the SVM and LR classifiers.
• The aim is to apply an empirical security evaluation framework, provide security to the
spam filtering application, and use the best classifier in our framework.
2.2. Solving approach
The proposed system focuses on the multiple instance logistic regression (MILR) strategy. In this
strategy, emails are treated as bags of multiple instances [6], and a logistic model at
the instance level is learned indirectly by exploiting the bag-level binomial log-likelihood
function [14].
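To make the notion of a bag concrete, the following minimal sketch splits an email body into instances. The fixed-size splitting rule, the function name `split_into_instances`, and the sample email are assumptions made for illustration; the paper does not state which splitting method it uses.

```python
from collections import Counter


def split_into_instances(text, words_per_instance=5):
    """Split an email body into a bag of instances.

    Hypothetical splitting rule: consecutive chunks of a fixed number of
    words. Each instance is represented by its word-frequency counts.
    """
    words = text.lower().split()
    chunks = [words[i:i + words_per_instance]
              for i in range(0, len(words), words_per_instance)]
    return [Counter(chunk) for chunk in chunks]


# Illustrative email: a spam-like part followed by inserted "good" words.
email = "win a free prize now click here meeting agenda attached for review"
bag = split_into_instances(email, words_per_instance=6)
for j, instance in enumerate(bag):
    print(j, dict(instance))
```

With this representation the spam-like words and the inserted good words end up in different instances, so a learner that scores instances separately can still see the spam part.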
3. PROPOSED FRAMEWORK
The contributions of this paper are:
• Classification of email using the SVM, LR and MILR classifiers.
• To improve the classification results, a classifier called Multiple Instance Logistic
Regression (MILR) is used.
• MILR differs from single-instance supervised learning [7]: by splitting an
email into several instances, an MI learner is able to identify the spam part of a
text mail even when the mail has been injected with good words, which addresses the
effectiveness of the good word insertion (GWI) attack [14].
The data distribution supplies the training data and the testing data separately [8]. The testing data can
be generated manually by the user while composing emails. The assumptions are stated with
the help of the adversary model [9]. Modelling the adversary depends on the attack scenario. The
model consists of the goal, knowledge, capability, and strategy of the adversary, as shown in Figure 1.
The classification application is authenticated by an authenticator. It contains three classifiers,
SVM, LR and MILR, which act as the classification algorithms. Each classifier reports
whether an email is spam or legitimate (normal). The training data is
prepared in advance by the admin. The user can be an attacker or an authorized person. In this
application the user can apply the classification techniques (SVM, LR, and MILR) for
analysis. The training set consists of all types of mails, both spam and ham. Different classifiers
are used to classify the testing mail on the basis of the training set: SVM, LR and MILR
are used for classification and result analysis [11].
Figure 1. Security Evaluation Framework using Spam Filtering
The pattern classifier assigns a pattern (a combination of features
that characterizes an individual sample) to a region of the feature space, or word space, within a
boundary [10], [12], [13]. The goal is to partition the feature or word space into class-labeled
decision regions. For the purpose of model selection, we therefore consider that the developer
wants to choose among a Support Vector Machine (SVM) with a linear kernel, a Logistic Regression
(LR) classifier, and Multiple Instance Logistic Regression (MILR). In the proposed system, the
SVM is implemented with LibSVM, and Logistic Regression (LR) and
Multiple Instance Logistic Regression (MILR) are also used for practical analysis.
4. ALGORITHMS WITH EXAMPLE
First, a framework is presented for the empirical evaluation of classifiers based on the simulation of
potential attack scenarios. The existing system considers the SVM and LR classifiers [11]. The
proposed system focuses on the multiple instance logistic regression (MILR) strategy.
4.1. SVM Classifier
Algorithm: SVM classifier
Input:
Set of email data
(combination of positively and negatively labeled data)
Output: Classified email data (Spam/Ham)
Process:
Step 1: Distribute the positively and negatively labeled data according to features.
Step 2: Compute the mapping function Ø().
Step 3: Set the bias to 1 for every vector.
Step 4: Compute the dot (.) product equations.
Step 5: Calculate the values α1, α2, α3, ... of the discriminating hyperplane.
Step 6: Predict positive and negative samples.
Example:
Figure 2 shows the evaluation of the SVM classifier with an example.
Figure 2. SVM example
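Steps 3 to 6 of the process above can be traced on a small example. The sketch below is not the paper's example: the three support vectors and their labels are hypothetical, the mapping function of Step 2 is taken to be the identity (a linear kernel), and the textbook simplification is used in which the support vectors are already known and each α absorbs the sign of its label.

```python
from fractions import Fraction


def augment(vector):
    """Step 3: append a bias input of 1 to a feature vector."""
    return [Fraction(v) for v in vector] + [Fraction(1)]


def dot(u, v):
    """Step 4: dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def solve_linear_system(matrix, rhs):
    """Solve matrix * x = rhs by Gauss-Jordan elimination on exact fractions."""
    n = len(rhs)
    rows = [list(row) + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        # Find a row with a non-zero entry in this column and move it up.
        pivot = next(r for r in range(col, n) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        rows[col] = [x / rows[col][col] for x in rows[col]]
        for r in range(n):
            if r != col:
                factor = rows[r][col]
                rows[r] = [x - factor * y for x, y in zip(rows[r], rows[col])]
    return [row[-1] for row in rows]


# Hypothetical support vectors: one ham point (-1) and two spam points (+1).
support_vectors = [(1, 0), (3, 1), (3, -1)]
labels = [-1, 1, 1]

augmented = [augment(s) for s in support_vectors]
# Step 4: one equation per support vector, sum_j alpha_j * (s_j . s_i) = y_i.
gram = [[dot(si, sj) for sj in augmented] for si in augmented]
# Step 5: solve for alpha1, alpha2, alpha3.
alphas = solve_linear_system(gram, [Fraction(y) for y in labels])
# The augmented weight vector is the alpha-weighted sum of the support vectors;
# its last component is the bias term.
weights = [sum(a * s[k] for a, s in zip(alphas, augmented)) for k in range(3)]


def predict(point):
    """Step 6: the sign of the discriminant gives the predicted class."""
    return 1 if dot(weights, augment(point)) >= 0 else -1


print(alphas, weights, predict((4, 0)), predict((0, 0)))
```

Exact fractions are used so that the solved α values and the resulting hyperplane can be compared with a hand calculation without rounding error.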
4.2. LR Classifier
Algorithm: LR classifier
Consider y = 1 if the word is spam and y = 0 if the word is not spam (ham).
For spam filtering, words are classified as spam or ham.
The probability of a word is
The probability of spam per word is
The probability of ham per word is
And
Example:
Figure 3 shows the evaluation of the LR classifier with an example.
Figure 3. LR example
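As an illustration of the y = 1 (spam) and y = 0 (ham) convention, the sketch below fits a one-feature logistic model with the standard sigmoid response. It is a generic logistic regression, not a reconstruction of the paper's formulas: the feature (the share of training spam mails that contain a word), the toy data, and the names `train_logistic_regression`, `p_spam` and `p_ham` are all assumptions.

```python
import math


def sigmoid(z):
    """Logistic (sigmoid) response function."""
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(features, labels, learning_rate=0.5, epochs=2000):
    """Fit a weight and a bias by batch gradient ascent on the log-likelihood."""
    weight, bias = 0.0, 0.0
    n = len(features)
    for _ in range(epochs):
        errors = [y - sigmoid(weight * x + bias)
                  for x, y in zip(features, labels)]
        weight += learning_rate * sum(e * x for e, x in zip(errors, features)) / n
        bias += learning_rate * sum(errors) / n
    return weight, bias


# Hypothetical per-word feature: share of training spam mails containing the word.
word_features = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]
word_labels = [1, 1, 1, 0, 0, 0]  # y = 1 for spam words, y = 0 for ham words
weight, bias = train_logistic_regression(word_features, word_labels)


def p_spam(x):
    """Modelled probability that a word with feature x is spam (y = 1)."""
    return sigmoid(weight * x + bias)


def p_ham(x):
    """Modelled probability that the word is ham (y = 0)."""
    return 1.0 - p_spam(x)


print(round(p_spam(0.85), 3), round(p_ham(0.85), 3))
```

A word would then be labelled spam when `p_spam` exceeds 0.5 and ham otherwise.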
4.3. MILR Classifier
Algorithm: MILR classifier
Input:
Set of email data,
Probability that the i-th email is positive,
Probability that the i-th email is negative

A vector of the word frequency counts (or tf-idf weights) of the unique terms in every email.
Process:
Step 1: The binomial log-likelihood function is:
Step 2: In a single-instance setting, the probability has a sigmoid response function as:
Step 3: In a multiple-instance setting, the estimated instance-level class probabilities
have a sigmoidal response function as:
Here the instance term denotes the j-th instance in the i-th data, and the two model
parameters are the ones that need to be estimated.
Step 4: In a single-instance setting, the probability has a sigmoid response function as:
Step 5: In a multiple-instance setting, the estimated instance-level class probabilities
have a sigmoidal response function as:
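The steps above can be sketched under the usual multiple-instance assumption that a bag (email) is ham only if every one of its instances is ham. That assumption, the two-feature instance layout, and the parameter values are not taken from the paper; they follow the common formulation in the multiple-instance learning literature and are chosen for illustration.

```python
import math


def instance_spam_probability(instance, weights, bias):
    """Sigmoid response for one instance, given as a list of feature values."""
    z = sum(w * x for w, x in zip(weights, instance)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def bag_spam_probability(bag, weights, bias):
    """Bag-level probability, assuming a bag is ham only if all instances are ham."""
    prob_all_ham = 1.0
    for instance in bag:
        prob_all_ham *= 1.0 - instance_spam_probability(instance, weights, bias)
    return 1.0 - prob_all_ham


def bag_log_likelihood(bags, labels, weights, bias):
    """Binomial log-likelihood over bags (label 1 = spam, label 0 = ham)."""
    total = 0.0
    for bag, y in zip(bags, labels):
        p = bag_spam_probability(bag, weights, bias)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total


# Hypothetical features per instance: [spam-word count, good-word count].
weights, bias = [2.0, -1.0], -1.0

# A spam email padded with good words, split into two instances ...
attacked_bag = [[3, 0], [0, 4]]
# ... versus the same email scored as a single instance.
whole_email = [3, 4]
ham_bag = [[0, 2], [0, 3]]

p_bag = bag_spam_probability(attacked_bag, weights, bias)
p_single = instance_spam_probability(whole_email, weights, bias)
ll = bag_log_likelihood([attacked_bag, ham_bag], [1, 0], weights, bias)
print(round(p_bag, 4), round(p_single, 4), round(ll, 4))
```

In this toy setting the inserted good words lower the single-instance score, while the bag-level score stays high because one instance still looks strongly like spam. Training would adjust `weights` and `bias` to maximize `bag_log_likelihood`.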
5. EXPECTED RESULTS
The experimental results for the classification of legitimate (normal) and spam e-mails are
computed using the accuracy of the classifiers. The input of the proposed system is the number of
testing samples. The results are computed on different mails: some mails are used for training
before the test, and some new mails are also tested in the proposed system. The analysis shows that the MILR
approach performs better than the SVM and LR algorithms when parameters such as
time and mean absolute error are considered. The accuracy is calculated using the following formula.
Classification accuracy:
$$Ac = \frac{N_c}{N}$$
where $N_c$ is the number of words that are correctly classified and
$N$ is the total number of words.
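The accuracy ratio can be computed directly from predicted and true labels. The helper name `classification_accuracy` and the sample labels below are illustrative only.

```python
def classification_accuracy(predicted, actual):
    """Return the fraction of items whose predicted label matches the true label."""
    if not actual or len(predicted) != len(actual):
        raise ValueError("label lists must be non-empty and of equal length")
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return correct / len(actual)


# Hypothetical labels for four words.
predicted = ["spam", "ham", "spam", "ham"]
actual = ["spam", "ham", "ham", "ham"]
accuracy = classification_accuracy(predicted, actual)
print(f"{accuracy * 100:.2f}%")
```

Multiplying the ratio by 100 gives the accuracy as a percentage.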
In this experiment, the efficiency of SVM, LR and MILR [11], [14] is evaluated to test the ability
of the classifiers [15]. Figure 2 shows the accuracy obtained by each classifier. The system is built with the
Spring Tool Suite (STS), with WampServer for database connectivity. Table I shows the accuracy,
as a percentage (%), for the test mails M1, M2, M3 and M4.
M1: Code demonstrates how to get input from user.
M2: This is too dirty place. Please clean otherwise wash it.
M3: I hate you idiot. I will kill you idiot.
M4: The Indian Mujahideen claimed responsibility for the Jaipur bombings through an email sent
to Indian media and declared open war against India.
TABLE I: Classification Accuracy (%)
Classifier   M1      M2      M3      M4
SVM          58.93   61.24   74.49   58.93
LR           55.24   60.2    65.48   58.93
MILR         64.73   68.75   73.33   64.58
Figure 2. Graphical representation of classification accuracy
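The figures in Table I can be compared per mail and on average with a few lines of code. The values are copied from the table; the averaging is an added convenience, not a metric reported in the paper.

```python
# Accuracy values (%) copied from Table I.
accuracy_table = {
    "SVM": {"M1": 58.93, "M2": 61.24, "M3": 74.49, "M4": 58.93},
    "LR": {"M1": 55.24, "M2": 60.2, "M3": 65.48, "M4": 58.93},
    "MILR": {"M1": 64.73, "M2": 68.75, "M3": 73.33, "M4": 64.58},
}
mails = ["M1", "M2", "M3", "M4"]

# Classifier with the highest accuracy for each test mail.
best_per_mail = {
    mail: max(accuracy_table, key=lambda name: accuracy_table[name][mail])
    for mail in mails
}
# Mean accuracy of each classifier over the four mails.
mean_accuracy = {
    name: sum(scores.values()) / len(scores)
    for name, scores in accuracy_table.items()
}

print(best_per_mail)
print({name: round(value, 2) for name, value in mean_accuracy.items()})
```

On these numbers MILR has the highest accuracy for M1, M2 and M4 and the highest mean, while SVM is slightly ahead on M3.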
6. CONCLUSION
This paper focused on the experimental security evaluation of pattern classifiers, with the aim of improving the
prediction accuracy of a spam filtering application. Three classifiers, SVM, LR and MILR,
are used for classification and analysis. The proposed framework is based on a model of the
adversary and on a model of the data distribution; it provides an analytical approach to
generating the training and testing sets that enables security evaluation and can accommodate
application-specific techniques. In the future, we will extend the data classification algorithm to
improve the accuracy and performance of the system for spam detection.
ACKNOWLEDGEMENTS
We would like to thank the researchers and publishers for making their resources available
and our teachers for their guidance. We are thankful to the authorities of Savitribai Phule Pune
University and the concerned members for their constant guidance and support. We are also thankful
to the reviewers for their valuable suggestions, and we thank the college authorities for providing the
required infrastructure and support.
REFERENCES
[1] A. A. Cardenas, J.S. Baras, and K. Seamon, A Framework for the Evaluation of Intrusion Detection
Systems, Proc. IEEE Symp. Security and Privacy, pp. 63-77, 2006.
[2] D.B. Skillicorn, Adversarial Knowledge Discovery, IEEE Intelligent Systems, vol. 24, no. 6,
Nov./Dec. 2009.
[3] M. Barreno, B. Nelson, R. Sears, A.D. Joseph, and J.D. Tygar, Can Machine Learning be Secure?
Proc. ACM Symp. Information, Computer and Comm. Security (ASIACCS), pp. 16-25, 2006.
[4] Kunjali Pawar and Madhuri Patil, A Review on Security Evaluation for Pattern Classifier against
Attack, International Journal of Computer Applications (IJCA) Proceedings on National Conference
on Advances in Computing NCAC-2015(4): 19-22, December 2015. (ISSN: 0975-8887).
[5] Y. Song, Z. Zhuang, W. C. Lee, H. Li, C.L. Giles and J. Li Q. Zhao, Real-Time Automatic Tag
Recommendation, Proc. 31st Ann. Intl ACM SIGIR Conf. Research and Development in
Information Retrieval (SIGIR 08), pp. 515-522, 2008.
[6] A. Kolcz and C.H. Teo, Feature Weighting for Improved Classifier Robustness, Proc. Sixth Conf.
Email and Anti-Spam, 2009.
[7] P. Laskov and R. Lippmann, Machine Learning in Adversarial Environments, Machine Learning,
vol. 81, pp. 115-119, 2010.
[8] M. Barreno, B. Nelson, A. Joseph, and J. Tygar, The Security of Machine Learning, Machine
Learning, vol. 81, pp. 121-148, 2010.
[9] D.B. Skillicorn, Adversarial Knowledge Discovery, IEEE Intelligent Systems, vol. 24, no. 6,
Nov./Dec. 2009.
[10] P. Laskov and M. Kloft, A Framework for Quantitative Security Analysis of Machine Learning,
Proc. Second ACM Workshop Security and Artificial Intelligence, pp. 1-4, 2009.
[11] B. Biggio, G. Fumera, and F. Roli, Security Evaluation of Pattern Classifiers under Attack, IEEE
Transactions on Knowledge and Data Engg., vol. 26, no. 4, April 2014.
[12] D. Lowd and C. Meek, Good Word Attacks on Statistical Spam Filters, Proc. Second Conf. Email
and Anti-Spam, 2005.
[13] R.O. Duda, P.E. Hart, and D.G. Stork, Pattern Classification, Wiley-Interscience Publication, 2000.
[14] Z. Jorgensen, Y. Zhou, and M. Inge, A Multiple Instance Learning Strategy for Combating Good
Word Attacks on Spam Filters, J. Machine Learning Research, vol. 9, pp. 1115-1146, 2008.
[15] Kunjali Pawar and Madhuri Patil, Pattern Classification under Attack on Spam Filtering, IEEE
International Conference on Research in Computational Intelligence and Communication Networks
(ICRCICN-2015), November 2015.
AUTHORS
Ms. Kunjali Pawar received her Bachelor of Engineering degree in Computer Science &
Engineering in 2014 and is now pursuing post-graduation (M.E.) in the Department of
Computer Engineering at Dr. D.Y.Patil School of Engineering and Technology in
the current academic year 2015-16. She is studying the domain of Data Mining
and Information Retrieval, with research on Security Evaluation of Pattern
Classifier against attack using Spam filtering during the academic year.
Prof. Mrs. Madhuri Patil is presently working as an Assistant Professor in the
Department of Computer Engineering, Dr. D. Y. Patil School of Engineering and
Technology, Pune, Maharashtra, India. She has 6 years of experience in the teaching field, and
her research area is Data Mining and Information Retrieval.