
N-grams-based file signatures for malware detection

Author: Santos, Igor; Peña Landaburu, Yoseba; Devesa, Jaime; García Bringas, Pablo
Publisher: Scitepress
Year: 2009
DOI: 10.5220/0001863603170320
Source: https://addi.ehu.eus/bitstream/10810/75966/1/04%20n-grams%20paper_1.pdf
N-GRAMS-BASED FILE SIGNATURES FOR MALWARE
DETECTION
Igor Santos, Yoseba K. Penya, Jaime Devesa and Pablo G. Bringas
S3Lab, Deusto Technological Foundation, Bilbao, Spain
Keywords: Security, Computer viruses, Data-mining, Malware detection, Machine learning.
Abstract: Malware is any malicious code that has the potential to harm any computer or network. The amount of malware
is increasing faster every year and poses a serious security threat. Thus, malware detection is a critical topic in
computer security. Currently, signature-based detection is the most widespread method for detecting malware.
Although this method is still used in most popular commercial antivirus software, it can only achieve
detection once the virus has already caused damage and has been registered. Therefore, it fails to detect new
malware. Applying a methodology proven successful in similar problem domains, we propose the use of n-grams
(every substring of a larger string, of a fixed length n) as file signatures in order to detect unknown
malware whilst keeping a low false positive ratio. We show that n-gram signatures provide an effective way to
detect unknown malware.
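
The definition of an n-gram given above (every substring of fixed length n) can be illustrated with a minimal sketch. It assumes that a file is treated as a raw byte string and that the n-grams are simply counted; the function name `byte_ngrams` and the sample bytes are hypothetical and are not taken from the paper.

```python
from collections import Counter


def byte_ngrams(data: bytes, n: int) -> Counter:
    """Count every contiguous substring of length n found in data."""
    if n <= 0:
        raise ValueError("n must be positive")
    # Slide a window of width n over the bytes, one position at a time.
    # If data is shorter than n, the range is empty and so is the result.
    return Counter(data[i:i + n] for i in range(len(data) - n + 1))


# Hypothetical six-byte sample; a real file would be read with open(path, "rb").
sample = b"\x4d\x5a\x90\x00\x4d\x5a"
signature = byte_ngrams(sample, 2)
```

Here `signature` holds four distinct 2-grams, and the 2-gram `4d 5a` is counted twice because it occurs at both ends of the sample.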
1 INTRODUCTION
The term malware was coined to name any computer
program with malicious intentions, such as viruses,
worms, or Trojan horses. As one may expect, in parallel
with the growth of the Internet, the amount, power, and
variety of malware increase every year (Kaspersky,
2008), as does its ability to avoid all kinds of security
barriers.
The classic method to detect these threats consists
of waiting for a certain number of computers to be
infected, then determining a file signature for the virus
and finally finding a specific solution for it. In this
way, based on the list of signatures (also known as the
signature database (Morley, 2001)), the malware
detection software can provide protection against known
viruses (i.e. those on the list). This approach has
proved to be effective when the threats are known
beforehand, and is the most widespread solution within
antivirus software. Still, as already mentioned, it fails
when facing new ones.
Moreover, from the appearance of a new virus until the
corresponding file signature is obtained, mutations of
the original virus released in the meantime may
escape detection based on that signature.
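
The signature-list approach and its weakness against mutations can be sketched as follows. This is only an illustration under a loud assumption: the file signature is modelled as a SHA-256 digest of the whole file, which is a simplification and not the signature scheme described in the paper. The names `file_signature`, `signature_db` and `is_known_malware`, and the payload bytes, are hypothetical.

```python
import hashlib


def file_signature(content: bytes) -> str:
    """Assumed signature: the SHA-256 hex digest of the full file content."""
    return hashlib.sha256(content).hexdigest()


def is_known_malware(content: bytes, db: set) -> bool:
    """Report a file as malware only if its signature is already on the list."""
    return file_signature(content) in db


# Hypothetical payload standing in for a virus that has already been registered.
known_virus = b"hypothetical malicious payload"
signature_db = {file_signature(known_virus)}

# A mutation that differs from the registered virus by a single byte.
mutated_virus = known_virus + b"!"

detected_original = is_known_malware(known_virus, signature_db)
detected_mutation = is_known_malware(mutated_virus, signature_db)
```

Under this assumption the registered virus is detected, while the one-byte mutation is not, because its signature is absent from the list.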
These facts have led to a situation in which
malware writers develop new viruses and different
ways of hiding their code, while researchers design
new tools and strategies to detect them (Nachenberg,
1997). Such evolution makes it very difficult to
develop a universal malware detector. Thus, the task of
the researcher is to achieve very good results against
known malware and to make the attempt of writing a new
undetectable virus more difficult.
Generally, there are several indicators to evaluate
the effectiveness of a new malware detection system.
First, we have to look at its malware detection
ratio (i.e. the proportion of viruses in a sample that the
software detects). Second, we have to look at the
false positive ratio (i.e. the proportion of non-malicious
programs that are erroneously classified as malware),
since it determines how practical the method is to be
commercialised: a new system able to detect a lot of
malware but at the cost of a high rate of false positives
is not practical in the real world, where the system
must deal with a lot of benign software that should
not be classified as malware.
Therefore, antivirus companies usually prefer a
modest detection ratio with a low (or ideally zero) false
positive ratio rather than a notable detection ratio that
also comes with a high false positive ratio.
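
The two indicators described above can be computed as in the following minimal sketch. It assumes each program in the sample carries a ground-truth label and a detector verdict; the function names and the nine-program sample are hypothetical and are not results from the paper.

```python
from typing import Sequence


def detection_ratio(is_malware: Sequence[bool], flagged: Sequence[bool]) -> float:
    """Fraction of the malicious programs that the detector flags."""
    verdicts = [f for m, f in zip(is_malware, flagged) if m]
    if not verdicts:
        raise ValueError("sample contains no malware")
    return sum(verdicts) / len(verdicts)


def false_positive_ratio(is_malware: Sequence[bool], flagged: Sequence[bool]) -> float:
    """Fraction of the benign programs that are wrongly flagged as malware."""
    verdicts = [f for m, f in zip(is_malware, flagged) if not m]
    if not verdicts:
        raise ValueError("sample contains no benign programs")
    return sum(verdicts) / len(verdicts)


# Hypothetical sample: four malicious programs followed by five benign ones.
is_malware = [True, True, True, True, False, False, False, False, False]
flagged = [True, True, True, False, True, False, False, False, False]

dr = detection_ratio(is_malware, flagged)
fpr = false_positive_ratio(is_malware, flagged)
```

In this made-up sample three of the four malicious programs are flagged (a detection ratio of 0.75) and one of the five benign programs is flagged (a false positive ratio of 0.2).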
Language recognition is a research area that has
tackled a similar problem, since it also has to
deal with the retrieval of information that is hard to
see at first glance. The most widespread technique
against this problem has been the use of the so-called
Santos I., Penya Y., Devesa J. and Bringas P. (2009).
N-GRAMS-BASED FILE SIGNATURES FOR MALWARE DETECTION.
In Proceedings of the 11th International Conference on Enterprise Information Systems - Artificial Intelligence and Decision Support Systems, pages
317-320
DOI: 10.5220/0001863603170320
Copyright © SciTePress