Corresponding author: Ida Godwin Ogah.
Copyright © 2025 Author(s) retain the copyright of this article. This article is published under the terms of the Creative Commons Attribution License 4.0.
Investigating employee attrition using machine learning techniques
Ida Godwin Ogah *
Department of Computer Science, Faculty of Applied Sciences, WSB University, Dąbrowa Górnicza, Poland.
World Journal of Advanced Research and Reviews, 2025, 26(02), 2223-2239
Publication history: Received on 03 April 2025; revised on 11 May 2025; accepted on 13 May 2025
Article DOI: https://doi.org/10.30574/wjarr.2025.26.2.1845
Abstract
Introduction: This study investigates underlying issues that employees might not openly disclose in exit interviews by
leveraging machine learning techniques to explore the factors causing employee turnover, offering insights beyond
churn predictions and traditional exit interviews. The novelty of this research lies in the use of ML causal inference to
draw conclusions.
Methods: The machine learning algorithm was trained on 10 features of the dataset with 14,999 records. The feature
importance analysis and clustering highlighted the most influential factors in predicting attrition. Then, propensity
score matching was used to estimate the causal effect of these features on attrition by comparing similar groups of
employees who stayed and left.
Results: The model achieved an impressive accuracy of 95.25% and an F1-score of 96.0%, demonstrating the
robustness of the algorithm. Further analysis, including clustering and causal inference using propensity score
matching, revealed distinct patterns among departing employees, such as low, frustrated, and high performers.
Conclusion: By employing causal inference rather than merely prediction, this study offers a more objective
understanding of the causes of attrition. The causal model in this research provided greater transparency into the
decision-making process, allowing HR teams to visualize the factors driving attrition and make informed retention
policies.
Keywords: Machine Learning; Data Science; Causal Inference; Data Analytics; Human Resources; Employee Retention
1. Introduction
Attrition, also referred to as employee turnover, poses significant financial and operational challenges to
organizations worldwide, and traditional methods like exit interviews and machine learning predictions may not
always reveal the true reasons for employees leaving (Eades, 2022). Moreover, some employees themselves may not
even be fully conscious of all the factors contributing to their decision to quit the job. In exit interviews, quitting
employees may also be hesitant to share their true reasons for leaving, especially if they involve negative feedback
about the company or their managers. They might also fear repercussions or simply want to avoid conflict. However,
machine learning models can go beyond predictions to identify patterns and relationships in data that might not be
apparent to individuals. This study introduces a novel approach by leveraging ML causal inference to draw
conclusions, potentially uncovering hidden issues that contribute to employee turnover. The objective of this study is
to provide valuable insights to human resource (HR) professionals, enabling them to develop data-driven strategies
that mitigate attrition and enhance employee retention policies.
In 2021, the U.S. Bureau of Labor Statistics reported a record 4.5 million resignations in November alone, highlighting
the urgency of addressing employee attrition and its causes (JOLTS, 2021). Traditional approaches, such as exit
interviews, often fail to reveal the true reasons for employees' departure, as departing employees may withhold
information. Thus, predicting attrition using machine learning (ML) models allows for objective insights based on
employee data, bypassing subjective biases.
Machine learning's integration into human resource (HR) analytics is relatively new but rapidly growing. Research has
shown that voluntary attrition is often linked to job dissatisfaction and unfulfilled career goals (Tae et al., 2008). As
organizations strive for competitiveness, retaining employees and understanding the underlying factors leading to
voluntary resignations have become imperative. This research aims to build on existing work by applying decision tree
algorithms to predict attrition and identify the key variables affecting employee decisions to leave the host organization.
The study contributes to the growing body of literature by focusing on practical applications in HR management and
improving retention strategies through data-driven decision-making.
Key factors like employee satisfaction, evaluation scores, and time spent at the company are shown to strongly influence
turnover (Pettman, 1973). By accurately identifying employees at risk of leaving, HR departments can proactively
address these issues through tailored interventions, reducing turnover rates and minimizing operational disruptions.
This project explored predicting and understanding employee attrition using machine learning and statistical
techniques, replacing traditional exit interviews. Decision trees identified key factors like satisfaction level, last
evaluation, and time spent in the company as significant attrition drivers. Further analysis using clustering and
propensity score matching helped to reveal patterns among employees who left, providing insights into potential
causes, like low performance, frustration, or seeking better opportunities. This approach offers valuable data-driven
insights into attrition, guiding interventions and strategies for employee retention without relying on exit interviews.
This paper is divided into sections: Section 1 provides the background of the study. Section 2 presents a literature
review of existing work in this domain. Section 3 details the methodological approach employed. Section 4 presents the
results and discusses their implications. Finally, Section 5 offers conclusions and recommendations for future research
and practice.
2. Literature review
The integration of machine learning into human resource (HR) analytics is a relatively new but rapidly growing field.
Several studies have successfully demonstrated the use of the K-Nearest Neighbors (KNN) classifier in predicting
employee attrition (Yedida et al., 2018), with a focus on model performance and prediction accuracy.
According to research published in the Harvard Business Review, traditional evidence-based approaches to identifying
the causes and nature of attrition have limitations compared to machine learning algorithms (Klotz, 2019). HR
professionals and managers often rely on exit interviews and feedback from colleagues who were closest to the
departing employee to find the cause of attrition. However, research indicates that many employees do not disclose their
true reasons for leaving during exit interviews (König et al., 2022). Richard et al. (2021) suggest using decision tree
algorithms for classification and turnover prediction, but the research does not address causality.
By leveraging machine learning techniques such as decision tree feature importance models, a data-driven approach
can be implemented to identify key factors influencing attrition and establish causative relationships. Machine learning
offers a robust alternative to traditional qualitative and quantitative methods, particularly in handling complexity and
confidentiality (Binoy et al., 2010). These algorithms can outperform conventional statistical approaches while
mitigating biases introduced by employees withholding information. With decision trees and expert knowledge,
organizations can effectively analyze employee turnover without relying solely on exit interviews.
2.1. Employee retention statistics and trends
Employee retention is a concern for almost every firm, according to statistics from around the world. Even the most
successful CEOs have struggled to keep their top performers. Every manager should make every effort to keep people
on board. When you view the numbers below, you'll realize the importance of this. As contained in Employeepedia
(2017), the following statistics were gathered about attrition and retention in recent years:
• One-third of new hires quit their jobs within the first six months. This point is a crucial stage for every company
as they need to intensify their retention process.
• 73% of organizations are constantly revamping their onboarding processes to improve employee retention.
• 45% of referred employees will leave after two years. Improving your referral program can help you keep the
staff for longer.
• 78% of managers value employee retention. These are the companies that include employee retention in their
budgets way before they hire.
• 33% of recruits knew whether they were going to stay for the long-term or short-term within their first week.
• 50% of remote workers are less likely to quit as they are more satisfied with their working conditions. They
can work at their own pace in an environment in which they feel comfortable. Remote work is an added
advantage to companies as they will not have to work hard to retain their employees.
• 35% of employees will look for new positions if they don't get a pay rise in the next 12 months. If your company
provides no salary increments or bonuses, expect some possible retention problems. All employees, besides
the newbies, expect higher salaries to match their productivity.
• 9.32% of employers expect their employees to job-hop. This will act as mental preparation for most employers
who fail at retaining employees for an extended period. It is wise to keep such trends at the back of your mind
while you find out the cause.
• 33% of supervisors and managers are looking for new opportunities. No one in the company is immune to
leaving, including the senior leadership. The leadership will move if they feel undervalued by your organization.
Guard your business so it does not get to this point.
2.2. The Dynamics of Employee Attrition
A researcher once said that "the positive and negative effects of employees coming and going are two sides of the same
coin - employee turnover" (Mayhew, 2019). A good understanding of the factors that influence an employee's decision
to quit can help the organization better position itself. An employee's decision to quit can have either a positive or a
negative impact on the company. Most managers take the view that no employee is indispensable (Navlani, 2018).
That may be true, but it neglects the losses that the company suffers when an employee leaves. Employee turnover is
more than just the annual percentage of employees who left and those who stayed. An employee who leaves on bad
terms promotes a bad image of the company. In addition, there are the costs of hiring and training new employees, low
production due to labor shortages, lost sales, and a bad reputation for the company.
While churn predictions have been widely used in attrition studies, research suggests they often fail to capture the true
reasons behind employee departures. To address this limitation, our study applies causal inference, allowing for
conclusions beyond predictions and self-reported data during exit interviews, which may be biased.
3. Methodology
This study employs causal inference techniques to analyse attrition patterns, enhancing employee churn predictions
and eliminating the need for exit interviews, which are often unreliable.
This project employed a mixed-method approach to analyse employee attrition using a combination of causal inferences
and a machine learning algorithm. The algorithm was trained on 10 features of the dataset with 14,999 records. The
feature importance analysis was used to identify the most influential factors in predicting attrition. To further
understand the underlying causes, K-means clustering was applied to group employees who left based on their
satisfaction level and last evaluation, revealing distinct clusters like low performers, frustrated performers, and high
performers. Finally, propensity score matching was used to estimate the causal effect of these features on attrition by
comparing similar groups of employees who stayed and left.
This research offers an innovative solution compared to traditional methods of understanding employee attrition, such
as employee churn prediction, which offers no insight into the causes, and exit interviews, which often rely on
self-reported data and may not fully capture the actual underlying reasons for employees' departure.
3.1. Data Source and Transformation
The dataset, sourced from an HR dataset from Kaggle, comprises 14,999 employees, of which 24% had voluntarily left
(Ramin, 2021). This dataset included features such as employee satisfaction, last evaluation scores, average monthly
working hours, and salary levels, among others. The data preprocessing steps to ensure the model's effectiveness
involve cleaning, transforming categorical variables using one-hot encoding, and scaling continuous variables, among
others. Additionally, dummy variables were created to represent categorical data like department and salary categories.
These details are explained in the subsequent sections. Figure 1 below represents the overview of the historical HR
dataset.
Figure 1 Summary of employee dataset used in this study. Source: Kaggle, 2021
To prevent data leakage, the dataset was meticulously split into training and testing sets before any data preprocessing
or feature engineering steps were applied. This ensured that information from the testing set did not influence the
model's training and that the evaluation results accurately reflected its performance on unseen data. Careful attention
was paid to avoid including features that implicitly contained information about the target variable in the testing set,
thereby further mitigating the risk of data leakage (Shachar et al., 2011).
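The split-before-preprocessing idea can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the helper names, the 80/20 split, the random seed, and the small list of monthly hours are hypothetical and not taken from the study. The point it shows is that scaling parameters are learned from the training portion alone and then reused, unchanged, on the test portion.

```python
import random


def train_test_split_indices(n_records, test_fraction=0.2, seed=42):
    """Shuffle record indices and split them into train and test lists."""
    indices = list(range(n_records))
    random.Random(seed).shuffle(indices)
    n_test = int(round(n_records * test_fraction))
    return indices[n_test:], indices[:n_test]


def fit_min_max(values):
    """Learn scaling parameters from the training values only."""
    return min(values), max(values)


def apply_min_max(values, lo, hi):
    """Scale values with previously learned parameters."""
    span = hi - lo
    return [(v - lo) / span if span else 0.0 for v in values]


# Hypothetical monthly-hours values, used purely for illustration.
hours = [96, 156, 200, 245, 310, 180, 220, 130, 275, 160]

# 1. Split first ...
train_idx, test_idx = train_test_split_indices(len(hours))
train_hours = [hours[i] for i in train_idx]
test_hours = [hours[i] for i in test_idx]

# 2. ... then fit preprocessing on the training portion only ...
lo, hi = fit_min_max(train_hours)

# 3. ... and apply the same parameters to both portions.
train_scaled = apply_min_max(train_hours, lo, hi)
test_scaled = apply_min_max(test_hours, lo, hi)
```

Because the test values never contribute to `lo` and `hi`, a test value may fall slightly outside [0, 1]; that is expected and is the price of keeping the test set unseen.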
3.2. Feature Engineering
3.2.1. Mean Imputation
In order to prevent data loss and improve model robustness, the missing values in the dataset are filled using mean
substitution. In statistics, mean imputation is a method where missing values of a particular variable are replaced with
the mean of the observed values of that variable (Lin et al., 2020). This approach addresses individual missing values
(not entire records) and is often used when only some components of a dataset are missing (Waljee et al., 2013).
$$x' = \frac{1}{n}\sum_{i=1}^{n} x_i$$
Where
• $x'$ is the imputed value,
• $x_i$ are the known values in the feature (column),
• $n$ is the number of observed (non-missing) values in the column.
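A minimal sketch of mean imputation on a single column follows. It assumes missing entries are represented as `None`; the function name and the sample satisfaction values are hypothetical and only serve to illustrate the formula.

```python
def mean_impute(column):
    """Replace None entries with the mean of the observed (non-missing) values."""
    observed = [x for x in column if x is not None]
    if not observed:
        raise ValueError("column has no observed values to average")
    mean = sum(observed) / len(observed)  # x' = (1/n) * sum(x_i)
    return [mean if x is None else x for x in column]


# Hypothetical satisfaction scores with two missing entries.
satisfaction = [0.38, None, 0.80, 0.11, None, 0.72]
imputed = mean_impute(satisfaction)
```

Here the four observed values average to about 0.5025, which fills both gaps while leaving the observed entries unchanged.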
3.2.2. Dummy Trapping and One-Hot Encoding
To avoid the dummy variable trap, that is, multicollinearity among the dummy variables of a categorical feature, one
category of the department variable was removed as a reference point. The dropped category becomes the baseline or
reference level against which the other categories are compared. This improves stability and interpretation during
model training and allows the model to utilize the department information for predictions without redundancy.
Multicollinearity is a statistical issue that arises when two or more independent variables in a model exhibit a high
degree of correlation, signifying a strong linear relationship between the predictor variables (Chan et al., 2022).
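The reference-category idea can be illustrated with a small pure-Python encoder. This is a sketch, not the study's code: the function name, the department labels, and the choice of "hr" as the reference level are assumptions made for the example.

```python
def one_hot_drop_reference(values, reference=None):
    """One-hot encode a categorical column, dropping one reference category.

    Returns the kept category names and one 0/1 row per input value.
    Rows for the reference category are all zeros (the baseline).
    """
    categories = sorted(set(values))
    if reference is None:
        reference = categories[0]
    if reference not in categories:
        raise ValueError("reference category not present in the data")
    kept = [c for c in categories if c != reference]
    rows = [[1 if v == c else 0 for c in kept] for v in values]
    return kept, rows


# Hypothetical department values; "hr" is arbitrarily chosen as the baseline.
departments = ["sales", "technical", "hr", "sales", "support"]
columns, encoded = one_hot_drop_reference(departments, reference="hr")
```

With four distinct departments, only three indicator columns remain, so the columns no longer sum to a constant and the perfect linear dependence behind the dummy variable trap disappears.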
One-hot encoding was used based on the idea of indicator variables in statistics, representing categorical data as binary
vectors to avoid unintended ordinal relationships among categories (Powers, 2008).
$$\text{one-hot}(C_i) = \begin{cases} 1 & \text{if the value belongs to } C_i \\ 0 & \text{otherwise} \end{cases}$$
This is a transformation where each unique category ($C_i$) in the categorical feature becomes a new binary feature,
represented as a vector. To regulate high cardinality due to the explosion in the number of dimensions associated with
one-hot encoding, different strategies were deployed to reduce the impact on memory, computational efficiency, and
model performance.
• Feature Importance Analysis: Before applying one-hot encoding, we analyze important categories using feature
importance scores from models. This removes irrelevant or redundant categories, reducing dimensionality.
• Hybrid Encoding: Combine one-hot encoding for low-cardinality variables and target encoding or
frequency-based thresholding for high-cardinality variables. This helps to balance the trade-off between
interpretability and computational efficiency.
• Principal Component Analysis (PCA): Transform the high-dimensional binary data into a smaller set of principal
components while retaining most variance. This helps to maintain relationships between categories in the
reduced dimensions.
• Clustering Similar Categories: This method groups similar categories based on data-driven similarity metrics
by K-means clustering.
3.2.3. Min-Max Scaling (Normalization)
The dataset was rescaled to the range [0, 1] to ensure consistency across features and enhance model
performance. The formula for min-max scaling is:
$$x' = \frac{x - \min(x)}{\max(x) - \min(x)}$$
Where
• $x$ is the original value,
• $\min(x)$ is the minimum value of the feature,
• $\max(x)$ is the maximum value of the feature,
• $x'$ is the scaled value within the range [0, 1]
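As a worked illustration, consider the average_montly_hours column summarized in Table 1, which has a minimum of 96 and a maximum of 310. The choice of this column and of the median value of 200 hours is only for illustration. An employee at 200 hours would be scaled as:

$$x' = \frac{200 - 96}{310 - 96} = \frac{104}{214} \approx 0.486$$

The minimum (96) maps to 0 and the maximum (310) maps to 1, so every value of the feature lands inside [0, 1].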
3.3. Decision Tree Model Training
The decision tree algorithm was selected due to its interpretability and ability to handle non-linear relationships. A
decision tree's purpose is to divide the training set into homogeneous zones in which only one class is present (here,
employees who stayed or employees who left), based on the features provided. The tree is created iteratively from the
root to the last leaf.
The decision tree model implementation was done using the scikit-learn library in Python. The features and targets
selected for model training were based on feature importance calculation by the decision tree classifier. The features
with high importance scores were detected as major contributors to the predictive model. These features included
satisfaction level, last evaluation, tenure in the company, work accident, number of projects, average monthly hours,
department, and salary.
To prepare the data for machine learning, it is necessary to clearly define the target variable ("stayed_or_left") and the
feature variables (all other columns). This separation is crucial for training and evaluating the model's performance in
predicting employee attrition based on the provided features (Sharma, 2012). The decision tree model was trained on
the prepared training data, which consisted of the selected features (features_train) and corresponding target variables
(target_train), while features_test and target_test were reserved for testing.
• Target Variable: The "stayed_or_left" column was selected as the target variable because it is the variable the
model is trying to predict.
• Feature Variables: The remaining columns (except "stayed_or_left") were selected as the features used to make
predictions about the target variable. These features include employee satisfaction, salary, department, etc.
• Model training: By splitting the data into training and testing sets (target_train, target_test, features_train,
features_test), the model was trained on one portion of the data and then evaluated on a
separate, unseen portion. This helps us understand how well the model is likely to generalize to new data.
• Fitting Process: The fit method of the DecisionTreeClassifier object was called, passing the training data as
arguments. This process builds the decision tree structure by recursively splitting the data based on the
selected features and the Gini impurity criterion (default).
3.4. Gini Impurity
This study uses Gini impurity as the splitting criterion when building the decision tree model using
DecisionTreeClassifier. This helps the algorithm to create a tree that effectively separates the data and makes accurate
predictions. Gini impurity measures the likelihood of incorrectly classifying a randomly chosen element of a node if it
were labeled at random according to the distribution of labels in that node. The Gini impurity $G$ of a node is calculated as:
$$G = 1 - \sum_{i=1}^{C} p_i^2$$
Here, $C$ represents the total number of classes, and $p_i$ denotes the proportion of items in the node that belong to class $i$.
The Gini impurity $G$ ranges from 0 (indicating perfect purity, where all elements belong to a single class) to 0.5
(representing maximum impurity in binary classification).
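The Gini impurity of a node, and the weighted impurity of a candidate split, can be computed directly from class labels. The sketch below is a plain-Python illustration; the function names and the tiny label lists (1 = left, 0 = stayed) are hypothetical.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity G = 1 - sum(p_i^2) over the classes present in a node."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((count / n) ** 2 for count in counts.values())


def weighted_split_gini(left_labels, right_labels):
    """Impurity of a binary split: child impurities weighted by child size."""
    n = len(left_labels) + len(right_labels)
    return (len(left_labels) / n) * gini_impurity(left_labels) + (
        len(right_labels) / n
    ) * gini_impurity(right_labels)


pure_node = gini_impurity([0, 0, 0, 0])        # all employees stayed
mixed_node = gini_impurity([0, 1, 0, 1])       # evenly mixed node
split_score = weighted_split_gini([0, 0, 0], [1, 1, 0])
```

A pure node scores 0 and an evenly mixed binary node scores 0.5, matching the range stated above. A tree builder compares `weighted_split_gini` across candidate splits and keeps the one with the lowest value.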
The following assumptions were made while working with the decision tree algorithm.
• In the beginning, the whole training set is considered as the root.
• Feature values are preferred to be categorical.
• If the values are non-categorical, they are discretized or converted to dummy variables before building the
model.
• Records are distributed recursively based on attribute values.
• The order in which attributes are placed as root or internal nodes of the tree is determined using a statistical approach.
3.5. Hyperparameter Tuning with GridSearchCV
Hyperparameter tuning is crucial for optimizing machine learning models and finding the settings that yield the
best performance. The study employs GridSearchCV from the sklearn.model_selection module for this purpose. The
cross-validation (CV) technique is used to evaluate the model's performance by splitting the dataset into training and
validation sets multiple times. In k-fold cross-validation, where k is the number of folds, the dataset is split into
k equal parts or subsets. The model is trained k times, each time using k−1 folds for training and the remaining fold for
validation. The average cross-validation score was calculated by
$$CV_{score} = \frac{1}{k}\sum_{i=1}^{k} score_i$$
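The k-fold procedure can be sketched without any library. The helpers below are illustrative only: the function names are hypothetical, the folds are taken as contiguous index blocks for simplicity, and the per-fold scores are made-up numbers rather than results from the study.

```python
def k_fold_indices(n_records, k):
    """Yield (train_indices, validation_indices) for k contiguous folds."""
    fold_sizes = [n_records // k + (1 if i < n_records % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_records))
        yield train, validation
        start += size


def cv_score(fold_scores):
    """Average the per-fold scores: CV_score = (1/k) * sum(score_i)."""
    return sum(fold_scores) / len(fold_scores)


# Ten records split into five folds: each record is validated exactly once.
folds = list(k_fold_indices(10, 5))

# Hypothetical per-fold scores, one per fold.
fold_scores = [0.95, 0.96, 0.94, 0.97, 0.95]
average_score = cv_score(fold_scores)
```

Each fold serves once as the validation set while the other k−1 folds are used for training, and the final score is the plain mean of the k fold scores.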
The grid search systematically searches through a predefined set of hyperparameters to find the best combination for
a model. In this study, a five-fold grid search cross-validation was employed to fine-tune hyperparameters, helping to
prevent overfitting and enhance model generalizability. The cross-validation score was calculated as the average
performance across the five folds, given by
$$CV_{score} = \frac{1}{5}\sum_{i=1}^{5} score_i$$
After the search is complete, the best_params_ attribute of the param_search object stores the combination of
hyperparameters that produces the highest performance on the validation data.
Figure 2 A typical decision tree split using the Gini impurity criterion
3.6. Model Generalization
The goal of machine learning is to build models that can generalize well to new, unseen data. The train-test split helps
us assess the model's ability to generalize. Grid search uses cross-validation to evaluate each combination of
hyperparameters. For each set of hyperparameters, the model was trained and validated using cross-validation. This ensures
that the chosen hyperparameters generalize well to unseen data.
By tuning the hyperparameters using GridSearchCV, the study aims to improve the model's ability to generalize well
to unseen data (avoid overfitting) and enhance its performance on the minority class (fixing imbalance). The resulting
best parameters guide the creation of a final model that is expected to achieve better overall accuracy, recall, and
ROC/AUC scores.
3.7. Overfitting Control and Hyperparameter Tuning
Overfitting happens when a model learns the training data too well, including its noise and random fluctuations, and
performs poorly on unseen or new data. The study addresses overfitting using the following techniques:
• Limiting Max Tree Depth: Setting max_depth in the DecisionTreeClassifier prevents the tree from growing too
deep and becoming overly complex. This helps the model generalize better to unseen or new data. The study
experiments with different max_depth values (5 to 20) to find the optimal one.
• Limiting Min Sample Size on a Leaf: Setting min_samples_leaf ensures that a leaf node has a minimum number
of samples. This prevents the tree from creating leaves that are too specific to individual training instances,
thereby reducing overfitting. The study explores min_samples_leaf values ranging from 50 to 500 to identify
the minimum sample size.
• Grid Search: The study uses GridSearchCV to systematically search for the best combination of max_depth and
min_samples_leaf values. This helps to fine-tune the model and avoid overfitting. The best parameters were
found to be max_depth of 6 and min_samples_leaf of 50.
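A search of this kind might look like the following sketch using scikit-learn. Several things here are assumptions rather than details from the study: the data is a synthetic stand-in generated with make_classification (roughly 24% positives), the specific min_samples_leaf grid values, the recall scoring choice, the 80/20 split, and the random seeds. Only the ranges (max_depth 5 to 20, min_samples_leaf 50 to 500), the five folds, and the name param_search follow the text.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the HR data (assumption): 10 features, ~24% positives.
X, y = make_classification(
    n_samples=2000, n_features=10, weights=[0.76, 0.24], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Grid over the two hyperparameters discussed above; exact values are illustrative.
param_grid = {
    "max_depth": list(range(5, 21)),            # 5 to 20
    "min_samples_leaf": [50, 100, 250, 500],    # within 50 to 500
}

param_search = GridSearchCV(
    DecisionTreeClassifier(criterion="gini", class_weight="balanced", random_state=0),
    param_grid,
    cv=5,              # five-fold cross-validation
    scoring="recall",  # assumed metric: focus on the minority (left) class
)
param_search.fit(X_train, y_train)

best_params = param_search.best_params_
# Because scoring="recall", score() reports recall on the held-out test data.
test_recall = param_search.score(X_test, y_test)
```

The values selected on synthetic data will generally differ from the max_depth of 6 and min_samples_leaf of 50 reported for the real dataset; the sketch only shows the mechanics of the search.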
3.8. Balancing the Class Imbalance
Class imbalance occurs when one class has significantly more instances than another. This can lead to a model that is
biased towards the majority class. This study handles imbalance using the following techniques:
• Class Weighting: The study sets the class_weight parameter of the DecisionTreeClassifier to "balanced". This
automatically adjusts the weights of the classes during training, giving more importance to the minority class.
This helps the model to learn the patterns of the minority class better.
• Recall and ROC/AUC Scores: The study uses recall and ROC/AUC scores to evaluate the model's performance
on the minority class. These metrics are more informative than accuracy when classes are imbalanced. Focusing
on these metrics ensures that the model is not biased towards the majority class.
• Comparing Balanced and Imbalanced Models: The study compares the performance of balanced and
imbalanced models using recall and ROC/AUC scores. This helps to see the impact of class weighting on the
model's performance. The results show that the balanced model performs better on the minority class, as
expected.
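To make the effect of class weighting concrete, the sketch below reproduces the heuristic that scikit-learn documents for class_weight="balanced", namely n_samples / (n_classes * class_count). The function name and the 76/24 label split are illustrative assumptions chosen to mirror the roughly 24% attrition rate.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical labels: 76 employees stayed (0), 24 left (1).
labels = [0] * 76 + [1] * 24
weights = balanced_class_weights(labels)

# Total weight carried by each class after reweighting.
total_weight_stayed = weights[0] * 76
total_weight_left = weights[1] * 24
```

The minority class receives a weight of about 2.08 against about 0.66 for the majority, so both classes contribute the same total weight during training.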
3.9. Model Performance Evaluation using Confusion Matrix
The confusion matrix provides a detailed breakdown of the model's performance across true positives (TP), true
negatives (TN), false positives (FP), and false negatives (FN). Using these values, we can interpret how the model
performs for each class.
3.9.1. Interpretation
• True Positives (TP): Instances correctly predicted as positive.
• True Negatives (TN): Instances correctly predicted as negative.
• False Positives (FP): Instances incorrectly predicted as positive (Type I error).
• False Negatives (FN): Instances incorrectly predicted as negative (Type II error).
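$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
$$\text{Precision} = \frac{TP}{TP + FP}$$
$$\text{Recall} = \frac{TP}{TP + FN}$$
$$\text{F1-Score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$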
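The four metrics follow directly from the confusion-matrix counts. In the sketch below, the function name and the counts (90, 880, 20, 10) are hypothetical and are not the study's results; guards against empty denominators are an added convenience.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute accuracy, precision, recall, and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts: 90 leavers caught, 10 missed, 20 false alarms, 880 correct stays.
metrics = classification_metrics(tp=90, tn=880, fp=20, fn=10)
```

With these counts, accuracy is 0.97 while precision is about 0.82 and recall is 0.90, which illustrates why accuracy alone can look better than the minority-class performance it hides.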
Figure 3 below illustrates the confusion matrix used to evaluate predictions, with true positives and false
positives clearly defined.
Figure 3 Confusion matrix used to evaluate model predictions. Source: Author's work
In the figure above, our focus is on true positives and false positives. True-Positive (TP) is defined as the number of
employees who actually left and were correctly labeled as left. False-Positive (FP) is defined as the number of
employees who stayed but were wrongly labeled as left.
3.10. Methodology of Causal Inference
We use the propensity score to estimate the likelihood of an individual receiving the treatment based on their
characteristics. The higher the propensity score, the more likely they are to be in the treatment group. By matching
individuals with similar propensity scores, we aim to create comparable groups, reducing the influence of confounding
factors and allowing for a more accurate estimation of the treatment's causal effect.
3.10.1. Mathematically
$$P(X) = P(T = 1 \mid X)$$
Where:
• P(X) = Propensity score, that is, the predicted probability of receiving treatment.
• P = Probability function
• T = Treatment status (1 = treated = left, 0 = control = stayed)
• X = Observed covariates (confounding factors)
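Once propensity scores are available, the matching step can be sketched as greedy one-to-one nearest-neighbour matching within a caliper. This is an assumption-laden illustration, not the study's procedure: the source does not state the matching algorithm, and the function name, the caliper of 0.05, the employee identifiers, and the scores are all hypothetical. The scores are taken as given, for example from a separately fitted probability model.

```python
def match_on_propensity(treated, control, caliper=0.05):
    """Greedy 1:1 nearest-neighbour matching on propensity scores, without replacement.

    treated, control: dicts mapping a unit id to its propensity score P(T = 1 | X).
    Returns a list of (treated_id, control_id) pairs whose scores differ by at
    most `caliper`; treated units without a close enough control stay unmatched.
    """
    available = dict(control)
    pairs = []
    for t_id, t_score in sorted(treated.items(), key=lambda item: item[1]):
        if not available:
            break
        c_id = min(available, key=lambda cid: abs(available[cid] - t_score))
        if abs(available[c_id] - t_score) <= caliper:
            pairs.append((t_id, c_id))
            del available[c_id]  # each control is used at most once
    return pairs


# Hypothetical propensity scores for employees who left (T = 1) and stayed (T = 0).
left = {"L1": 0.81, "L2": 0.35, "L3": 0.60, "L4": 0.95}
stayed = {"S1": 0.33, "S2": 0.62, "S3": 0.10, "S4": 0.79}

pairs = match_on_propensity(left, stayed)
```

In this example L4 remains unmatched because no remaining stayer has a score within the caliper. Comparisons would then be made only across the matched pairs, which is what makes the two groups comparable on the observed covariates.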
4. Results and Discussion
This study successfully implements a machine learning model that uncovers hidden reasons for employee attrition. By
analyzing employee data and identifying key predictors, the decision tree model achieved high accuracy in predicting
factors leading to attrition. This suggests that the model can capture underlying issues that contribute to employees'
decisions to leave, even when those factors are not explicitly stated by the departing employees. The findings of this
study have important implications for HR professionals, enabling them to develop more targeted and effective retention
strategies that address the root causes of attrition, rather than relying solely on potentially biased or incomplete
information obtained through exit interviews.
The model demonstrated high performance, with an accuracy of 95.56%. It identified satisfaction level, last
evaluation, and time spent at the company as key factors influencing employees' decisions to
quit their jobs.
4.1. Descriptive Statistics Summary
The dataset comprised 14,999 employees, with an attrition rate of 24%. Table 1 highlights the overall distribution of
the employee dataset, while Figure 4 illustrates the proportion of employees who stayed with the company versus
those who departed.
Table 1 The key features exhibited by the dataset

| Feature | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| satisfaction_level | 14999 | 0.612834 | 0.248631 | 0.09 | 0.44 | 0.64 | 0.82 | 1 |
| last_evaluation | 14999 | 0.716102 | 0.171169 | 0.36 | 0.56 | 0.72 | 0.87 | 1 |
| number_project | 14999 | 3.803054 | 1.232592 | 2 | 3 | 4 | 5 | 7 |
| average_montly_hours | 14999 | 201.0503 | 49.9431 | 96 | 156 | 200 | 245 | 310 |
| time_spend_company | 14999 | 3.498233 | 1.460136 | 2 | 3 | 3 | 4 | 10 |
| Work_accident | 14999 | 0.14461 | 0.351719 | 0 | 0 | 0 | 0 | 1 |
| promotion_last_5years | 14999 | 0.021268 | 0.144281 | 0 | 0 | 0 | 0 | 1 |
| stayed_or_left | 14999 | 0.238083 | 0.425924 | 0 | 0 | 0 | 0 | 1 |
Statement of ethical approval
This research utilizes publicly available anonymized datasets for model development, ensuring compliance with ethical
standards and research integrity. As such, it does not require special approval or licensing.
Declaration of Interest Statement
There is no conflict of interest related to this research. There are no competing financial, professional, or personal
interests that could have influenced the conduct or findings of this research. This study was carried out independently,
with no external funding or conflicts of interest.
Funding sources
This research did not receive any specific grant from any funding agencies in the public, commercial, or not-for-profit
sectors.
Statement of Contribution
This research was conducted independently, with all conceptualization, data collection, analysis, and interpretation
carried out solely by me. My supervisor offered guidance and advice as needed, ensuring academic rigor and full
autonomy in executing the study.
References
[1] Eades, C. (2022). Using exit interviews to enhance police employee retention and hiring. Saint Leo University.
[2] U.S. Bureau of Labor Statistics. (2021). Employee turnover statistics. Job Openings and Labor Turnover Survey
(JOLTS). Retrieved from https://www.bls.gov/jlt.
[3] Tae, H. L. et al. (2008). Understanding Voluntary Turnover: Path-Specific Job Satisfaction Effects and The
Importance of Unsolicited Job Offers. Retrieved from https://doi.org/10.5465/amr.2008.33665124
[4] Yedida, R. (2018). Employee attrition prediction using machine learning. Journal of Human Resources Analytics,
13(2), 124-135.
[5] Klotz, A. C., and Bolino, M. C. (2019). Do you really know why employees leave your company? Retrieved from
Harvard Business Review. https://hbr.org/2019/07/do-you-really-know-why-employees-leave-your-company.
[6] Mayhew, Ruth (2019). Employee Turnover Definitions and Calculations.
[7] Employeepedia (2017). Statistics On Employee Retention.
[8] https://www.employeepedia.com/manage/retention/994-employee-retention-theories
[9] Richard, J., Shreyas, U., and Sanket, J. (2021). Employee attrition using machine learning and depression analysis.
IEEE.
[10] Pettman, B. O. (1973). Some factors influencing labour turnover: A review of research literature. Industrial
Relations Journal, 4(3), 43-61.
[11] Navlani, A. (2018). Decision tree classification in Python tutorial.
https://www.datacamp.com/community/tutorials/decision-tree-classification-python.
[12] Binoy, B., et al. (2010). A Genetic Algorithm Optimized Decision Tree-SVM-based Stock Market Trend Prediction
System.
[13] Guru, V., and Suresh, K. (2018). Employee attrition and employee retention:
[14] Challenges and suggestions. ResearchGate. Retrieved from
https://www.researchgate.net/publication/322896996.
[15] Scikit-learn Decision Trees User Guide. Retrieved from https://scikit-learn.org/stable/modules/tree.html
[16] Richer, Valentin (2019). Understanding Decision Trees.
[17] König, C. J., Richter, M., and Isak, I. (2022). Exit interviews as a tool to reduce parting employees' complaints about
their former employer and to ensure residual commitment. Management Research Review, 45(3), 381-397.
[18] Ramin Huseyn (2021). HR Analytics Data Set. Kaggle.
https://www.kaggle.com/datasets/raminhuseyn/hr-analytics-data-set
[19] Shachar K., Saharon R., Claudia P. (2011). Leakage in data mining: Formulation, detection, and avoidance.
[20] Chan, J. Y. L., Leow, S. M. H., Bea, K. T., Cheng, W. K., Phoong, S. W., Hong, Z. W., and Chen, Y. L. (2022). Mitigating
the multicollinearity problem and its machine learning approach: a review. Mathematics, 10(8), 1283.
[21] Lin, W. C., and Tsai, C. F. (2020). Missing value imputation: a review and analysis of the literature (2006–2017).
Artificial Intelligence Review, 53, 1487-1509.
[22] Waljee, A. K., Mukherjee, A., Singal, A. G., Zhang, Y., Warren, J., Balis, U., ... and Higgins, P. D. (2013). Comparison of
imputation methods for missing laboratory data in medicine. BMJ Open, 3(8), e002847.
[23] Powers, D., and Xie, Y. (2008). Statistical methods for categorical data analysis. Emerald Group Publishing.
[24] Gogtay, N. J., and Thatte, U. M. (2017). Principles of correlation analysis. Journal of the Association of Physicians
of India, 65(3), 78-81.
[25] Sharma, P., and Bhatia, A. P. R. (2012). Implementation of decision tree algorithm to analysis the performance.
[26] Sainju, B., Hartwell, C., and Edwards, J. (2021). Job satisfaction and employee turnover determinants in Fortune
50 companies: Insights from employee reviews from Indeed.com. Decision Support Systems, 148, 113582.
[27] Memon, M. A., Salleh, R., Mirza, M. Z., Cheah, J. H., Ting, H., Ahmad, M. S., and Tariq, A. (2021). Satisfaction matters:
the relationships between HRM practices, work engagement, and turnover intention. International Journal of
Manpower, 42(1), 21-50.