Career Point International Journal of Research (CPIJR)
©2022 CPIJR ǀ Volume 3 ǀ Issue 4 ǀ ISSN: 2583-1895
July-September 2025 | DOI: https://doi.org/10.5281/zenodo.17330380
A Framework for Credit Risk Analysis using Machine Learning
Shreeya Gupta1, Garima Tyagi2
1Student (BCA), School of Computer Application & Technology, Career Point University,
Kota (Raj.), India
2Professor, School of Computer Application & Technology, Career Point University, Kota
(Raj.), India
Abstract
Through credit risk prediction, this paper investigates how machine learning might help
banks make better lending decisions. We seek to categorize borrowers as either "good" or
"bad" credit risks using the IDBI Credit dataset, which comprises information from 1,000
applicants including age, employment status, loan details, and account history.
We first carefully explored the dataset and looked for trends that might compromise
creditworthiness. We visualized important trends, cleaned and preprocessed the data, and
made predictions using several models, including random forests, decision trees, and logistic
regression. Our results emphasize which elements most influence a customer's credit risk and
show that machine learning can be a useful tool for enhancing risk assessment.
Keywords: Credit Risk, Machine Learning, Risk Assessment, Predictive Analytics, Financial
Modeling, Data Mining, Credit Scoring
Introduction
Banks are under more pressure than ever to evaluate credit risk precisely in today's
fast-changing financial landscape. Making the correct lending decisions is as crucial to
preserving financial stability as it is to profitability. This project investigates how machine
learning might enable financial institutions to forecast the likelihood that a loan applicant
will default on payments.
We base our research on the well-known credit analysis resource, the IDBI Credit (Statlog)
dataset. There are 1,000 records in it, each one a distinct person identified as either a "good"
or "bad" credit risk. Age, job status, loan purpose, account balances, and credit history are
among the financial and personal elements in the dataset. With about 700 entries tagged "good"
and about 300 tagged "bad," the categorization is imbalanced.
We first did a comprehensive exploratory data analysis (EDA), looking at missing values,
feature distribution, and variable relationships. Visualizations including boxplots, count
graphs, and heatmaps helped us find helpful trends separating low-risk from high-risk
applicants.
We trained several classification models following data cleansing and preparation (handling
missing values, encoding categorical data, and scaling numerical features). These comprised
more complex models like Random Forests and simpler models like Logistic Regression and
Decision Trees. We evaluated their performance across accuracy, precision, recall, and
F1-score.
Beyond accuracy alone, we also concentrated on interpretability, a crucial consideration in
the financial industry. Decision trees and feature importance charts let us explain why a
model produced particular predictions, which strengthens our case.
Literature Review
Bezawada Brahmaiah (2022) conducted an empirical study on credit risk control in Indian
commercial banks between 2017 and 2021. Results indicate that private sector banks always
surpassed their public counterparts in terms of credit risk management. This better
performance was demonstrated by higher asset level and profitability. The study emphasized
the importance of systematic processes, including identifying risks, tracking, and control
mechanisms.
Liaqat Ali and Sonia Dhiman (2019) looked at the relation between public sector banks'
profitability and credit risk management from 2010 to 2017. Their research found that while
low liquidity and bad asset quality can hurt a bank's performance, capital adequacy and
earnings quality have a positive impact on ROA.
Punyata Butola and teammates (2022) studied a group of 38 scheduled commercial banks
from 2005 to 2019. They found that higher credit-to-deposit ratios, better operating profits,
and increased capital adequacy were all positively connected with bank profitability.
Conversely, a higher net interest margin and an increase in non-performing assets (NPAs)
were linked to worse financial performance.
Sunitha G. and V. Venu Madhav (2021) looked at how credit ratings work to control credit
risk. Their analysis shows that good ratings reduce banks' overall credit risk and raise
loan availability. The study underlined how crucial credit rating agencies are to maintaining
the health of the financial system.
Tisa Maria Antony and Suresh G. (2023) looked at 31 Indian commercial banks from 2012
to 2021 to find the factors that affect credit risk. Their results show that an improved
Return on Equity (ROE) generally decreases credit risk, while macroeconomic factors
like GDP growth and inflation, as well as bank-specific factors like age and the title
type, have a significant impact on the nature of credit risk.
Shahni Singh et al. (2023) compared the impact of credit risk and debt coverage
ratios on the earnings of banks in the public and private sectors. Their findings,
which revealed major differences between the two sectors, indicated that credit risk and
borrowing coverage were both important indicators of profitability.
Mani Bhushan Kumar (2023) highlighted the importance of operational risk management.
His research made clear that Indian banks need a strong operational risk framework in order
to maintain their financial stability and promote economic growth. He discussed
the challenges banks face when setting up these frameworks and proposed enhancements to
regulatory measures.
Das and Kumbhakar (2010) used a stochastic frontier approach to analyze how
well Indian banks manage the risk-return trade-off. They found that large banks are typically
more efficient. Interestingly, public sector banks were found to be more profit-efficient even
though they lagged behind private banks in terms of cost-efficiency.
Kaur and Gupta (2015) saw an increasing trend in the technical efficiency of Indian banks
over time. Their study found that private sector banks outperformed public banks, especially in
cost control, highlighting the constant need for public banks to improve their risk management
and operational strategies.
Research Gap
1. There is not enough study of advanced AI and machine learning models (e.g., XGBoost,
Neural Networks, and Ensemble Models) that could offer higher precision and
insights from high-dimensional data for credit risk predictions.
2. There is an absence of research on dynamic or real-time risk assessment models that
adapt to changing borrower behavior, stock markets, and economic conditions by
using current data streams.
3. There is inadequate study of the patterns of credit risk at the sector and regional levels
(for instance, MSMEs, housing loans, and agriculture).
4. The combination of macroeconomic and behavioral factors is understudied: few
risk assessment models integrate financial, behavioral, and macroeconomic
factors into one model.
Objectives
To identify and examine the primary factors impacting credit risk: The aim
is to find the financial and demographic factors that have
the most effect on a customer's classification as a good or bad credit risk, including age,
job type, credit amount, and account status.
To enhance model performance through solid data preprocessing techniques: Before
training the model, the dataset needs to be prepared by handling missing values,
encoding categorical variables, and scaling numerical features. These preprocessing
steps make sure the data is clean, consistent, and suitable for precise machine learning
predictions.
To visualize feature distributions and relationships using exploratory data analysis
(EDA): Using tools such as boxplots, histograms, count plots, and heatmaps, the study
seeks to find patterns and correlations in the dataset. The information provided may
clarify which factors have the greatest connection to credit risk outcomes.
To create an easy-to-understand model suitable for practical banking applications:
Predictive accuracy is important, but so are the final model's interpretability and
transparency. Models that offer clear reasoning, like decision trees or those with
visualized feature importance, are more likely to be adopted in financial institutions
where simplicity is a critical requirement.
The work will create and evaluate multiple classification models for credit risk prediction by
training and testing a range of models, including Random Forest, Decision Tree, and Logistic
Regression classifiers. Common evaluation metrics such as accuracy, precision, recall, and
F1-score will be used to assess each model's performance in order to determine which one
performs best.
Tool/Software          Purpose
Python                 Primary programming language
Jupyter Notebook       Code development and documentation
Pandas                 Data manipulation and cleaning
Matplotlib / Seaborn   Data visualization
Scikit-learn           Machine learning model building and evaluation
NumPy                  Numerical operations
Methodology
1. Techniques and Procedures
a. Exploratory Data Analysis (EDA)
Descriptive statistics and summary metrics
Visualization using boxplots, histograms, bar charts, and heatmaps
b. Data Preprocessing
Handling Missing Values: Dropping or imputing missing data
Encoding Categorical Variables: Using one-hot encoding or label encoding
Feature Scaling: Standardization/Normalization of numerical features
Train-Test Split: Dividing the dataset (80% training, 20% testing).
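The preprocessing steps listed above can be sketched as a single scikit-learn pipeline. This is a minimal illustration under stated assumptions, not the exact code used in the study: the toy DataFrame and its column names (age, credit_amount, housing, risk) are hypothetical stand-ins for the real dataset, and median/mode imputation is only one of the options the text mentions.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data; the column names are illustrative, not the real schema.
df = pd.DataFrame({
    "age": [25, 40, np.nan, 33, 52, 29, 46, 38, 61, 23] * 2,
    "credit_amount": [1200, 5400, 2300, 800, 9100, 1500, 3000, 4100, 700, 6600] * 2,
    "housing": ["own", "rent", "own", "free", "rent",
                "own", "own", "rent", "free", "own"] * 2,
    "risk": ["good", "bad", "good", "good", "bad",
             "good", "good", "bad", "good", "good"] * 2,
})

numeric_cols = ["age", "credit_amount"]
categorical_cols = ["housing"]
X = df[numeric_cols + categorical_cols]
y = df["risk"]

# 80/20 split, stratified so both sets keep the good/bad ratio.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

preprocess = ColumnTransformer([
    # Numeric columns: fill gaps with the median, then standardize.
    ("num", Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ]), numeric_cols),
    # Categorical columns: fill gaps with the mode, then one-hot encode.
    ("cat", Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("encode", OneHotEncoder(handle_unknown="ignore")),
    ]), categorical_cols),
])

# Fit on the training data only, then reuse the fitted transformer on the test set.
X_train_prepared = preprocess.fit_transform(X_train)
X_test_prepared = preprocess.transform(X_test)
print(X_train_prepared.shape, X_test_prepared.shape)
```

Fitting the transformer on the training split alone keeps test-set statistics out of the imputation and scaling parameters.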
c. Model Building
Algorithms Used:
Logistic Regression
Decision Tree Classifier
Random Forest Classifier
Model Evaluation Metrics:
o Accuracy
o Precision
o Recall
o F1-Score
o Confusion Matrix
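To make the metrics above concrete, the following sketch derives accuracy, precision, recall, and F1-score from the four cells of a confusion matrix. Treating "bad" as the positive class is our assumption for illustration, as are the function names and the ten toy labels.

```python
def confusion_counts(y_true, y_pred, positive="bad"):
    """Return (tp, fp, fn, tn), treating `positive` as the class of interest."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred, positive="bad"):
    """Compute accuracy, precision, recall, and F1 from the confusion counts."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Toy example: 3 truly bad applicants, 7 truly good ones.
y_true = ["bad", "bad", "bad", "good", "good", "good", "good", "good", "good", "good"]
y_pred = ["bad", "bad", "good", "bad", "good", "good", "good", "good", "good", "good"]

counts = confusion_counts(y_true, y_pred)
metrics = classification_metrics(y_true, y_pred)
print(counts)
print(metrics)
```

In this toy case the accuracy is 0.8 while precision and recall for the "bad" class are both 2/3, which shows why accuracy alone can look better than the model's ability to catch risky applicants.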
d. Model Comparison
Evaluate performance across models to select the most accurate and reliable one.
Analyze feature importance to identify the key predictors of credit risk.
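A comparison of this kind might look like the sketch below. It uses synthetic data from make_classification as a stand-in for the credit dataset (an assumption, since the real features are not reproduced here), and the hyperparameters are illustrative defaults rather than the study's settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the credit data: 1,000 rows, roughly 70/30 class balance.
X, y = make_classification(
    n_samples=1000, n_features=8, n_informative=4,
    weights=[0.7, 0.3], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(max_depth=5, random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    # F1 on the minority class (label 1), which stands in for "bad" credit here.
    scores[name] = f1_score(y_test, model.predict(X_test))

best_name = max(scores, key=scores.get)
print(scores, "best:", best_name)

# Rank features by the random forest's impurity-based importance.
importances = models["random_forest"].feature_importances_
ranking = sorted(enumerate(importances), key=lambda pair: pair[1], reverse=True)
print(ranking[:3])
```

Impurity-based importances are convenient but can favor high-cardinality features, so they are best read as a rough ranking rather than a precise attribution.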
Flowchart diagram
[Flowchart: Data Collection & Integration → Data Processing → Feature Engineering → Machine Learning & Model Development → Model Evaluation → Credit Risk Assessment]
Description
Gathering and Combining Data
The first step in this project was gathering relevant data that described various aspects
of a borrower's financial and personal profile. For this, we used the IDBI Credit
dataset, which brings together information from loan application records, demographic data,
and account histories. Combining this data into a single, logical format that ensured
consistency across variables laid a foundation for precise analysis and model
development.
Data Processing
After it was assembled, the raw dataset went through an extensive transformation and
cleaning process. This step included finding and correcting missing values,
normalizing continuous features, and converting categorical variables into numerical
format. Likewise, any inconsistent or extreme values that might skew the findings were
found using outlier detection techniques.
Feature Engineering
During this phase, new variables were created or current ones were modified to better
capture key trends in the data. Age groups and debt-to-income ratios, for example, are
derived features that helped uncover relationships that were not
apparent in the raw data. Feature selection was also used to remove
unnecessary or low-variance variables, with the goal of streamlining the model and
increasing predictive accuracy.
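The derived features and the low-variance filter described above can be illustrated with a small sketch. The field names (monthly_debt, monthly_income, foreign_worker), the age bands, and the variance threshold are all hypothetical choices made for this example, not values taken from the study.

```python
from statistics import pvariance

# Hypothetical applicant records; field names are illustrative only.
applicants = [
    {"age": 23, "monthly_debt": 300.0, "monthly_income": 1500.0, "foreign_worker": 1},
    {"age": 37, "monthly_debt": 900.0, "monthly_income": 3000.0, "foreign_worker": 1},
    {"age": 45, "monthly_debt": 400.0, "monthly_income": 4000.0, "foreign_worker": 1},
    {"age": 62, "monthly_debt": 250.0, "monthly_income": 2500.0, "foreign_worker": 1},
]


def age_group(age):
    """Bucket a raw age into a coarse band (band edges are an assumption)."""
    if age < 25:
        return "under_25"
    if age < 40:
        return "25_to_39"
    if age < 60:
        return "40_to_59"
    return "60_plus"


def add_derived_features(row):
    """Return a copy of the row with age_group and debt_to_income added."""
    enriched = dict(row)
    enriched["age_group"] = age_group(row["age"])
    # Assumes a strictly positive income; guard against zero in real data.
    enriched["debt_to_income"] = row["monthly_debt"] / row["monthly_income"]
    return enriched


def low_variance_columns(rows, columns, threshold=0.01):
    """List numeric columns whose population variance is at or below the threshold."""
    return [c for c in columns if pvariance([r[c] for r in rows]) <= threshold]


enriched = [add_derived_features(row) for row in applicants]
to_drop = low_variance_columns(
    applicants, ["age", "monthly_debt", "monthly_income", "foreign_worker"]
)
print(enriched[0]["age_group"], enriched[0]["debt_to_income"])
print(to_drop)
```

Because variance depends on the unit of measurement, a fixed threshold like this is only meaningful for columns on comparable scales, such as binary indicators.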
Development of Machine Learning Models
Once we had a clean, well-structured dataset, we went on to the model-building
phase. Several machine learning methods, including Random Forest classification
algorithms, Decision Trees, and Logistic Regression, were used. Each model was
trained using cross-validation techniques to ensure reliability and prevent
overfitting.
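As a rough illustration of the cross-validation step, the sketch below runs stratified 5-fold cross-validation with scikit-learn. The synthetic data, the choice of five folds, and the F1 scoring are assumptions for the example; the text does not state which settings were used.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the credit data with a roughly 70/30 class balance.
X, y = make_classification(
    n_samples=1000, n_features=8, weights=[0.7, 0.3], random_state=0
)

# Scaling sits inside the pipeline so it is refit on each training fold,
# which keeps information from the validation fold out of the scaler.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Stratified folds preserve the class ratio in every split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
fold_scores = cross_val_score(model, X, y, cv=cv, scoring="f1")
print(fold_scores.mean(), fold_scores.std())
```

A large spread between folds would suggest that the model's performance depends heavily on which applicants it happens to see, which is one practical sign of overfitting.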
Evaluation of the Model
The effectiveness of each model was thoroughly assessed using industry-standard
measurements, including accuracy, precision, recall, F1-score, and ROC-AUC. These
metrics offered a fair evaluation of the models' ability to classify both good and bad
credit risks. This evaluation step needed to be carefully weighed in order to identify
the model that provided the best balance between predictive performance and interpretability.
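ROC-AUC is less familiar than the other metrics, so a short sketch may help. It uses the standard interpretation of AUC as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one. The labels and scores are made-up values, and encoding "bad" credit as 1 is our assumption.

```python
def roc_auc(y_true, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly."""
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(positives) * len(negatives))


# 1 = "bad" credit (assumed positive class); scores are predicted default risks.
y_true = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]
auc = roc_auc(y_true, scores)
print(auc)
```

Here 9 of the 12 positive-negative pairs are ordered correctly, giving an AUC of 0.75. The pairwise loop is quadratic, so it is meant for explanation rather than large datasets.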
Credit Risk Assessment
In the final stage, the best-performing model was used to assess the credit risk levels
of applicants. The results of the model were used to categorize candidates into risk
groups, such as low, moderate, or high risk. These insights could benefit banks in
making decisions regarding loan approvals, interest rate assignments, and customer
management tactics.
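One simple way to turn model output into the risk groups described above is to threshold the predicted probability of default. The cutoffs of 0.2 and 0.5 and the applicant identifiers below are hypothetical; in practice a bank would calibrate the thresholds against its own loss tolerance.

```python
def risk_tier(p_default, low_cutoff=0.2, high_cutoff=0.5):
    """Map a predicted default probability to a coarse risk tier.

    The default cutoffs are illustrative assumptions, not values from the study.
    """
    if not 0.0 <= p_default <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    if p_default < low_cutoff:
        return "low"
    if p_default < high_cutoff:
        return "moderate"
    return "high"


# Hypothetical applicant IDs with predicted default probabilities.
predicted = {"A-001": 0.05, "A-002": 0.35, "A-003": 0.80}
tiers = {applicant: risk_tier(p) for applicant, p in predicted.items()}
print(tiers)
```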
Understanding the Data
The IDBI Credit dataset, which includes comprehensive records of 1,000 customers,
was used in this investigation. Every entry contains financial and personal data that is
necessary to assess credit risk. The dataset is made up of:
Numerical features: loan duration, age, and credit amount
Categorical features: gender, job type, housing situation, checking and savings account
status, and loan purpose
Target variable: Risk, classified as "Good" or "Bad". About 70% of the
cases are classified as having "Good" credit, and 30% are classified as having
"Bad" credit, according to our preliminary analysis. Because it can affect
evaluation metrics and training efficacy, this class imbalance must be taken
into account when developing the model.
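The effect of the 70/30 split can be quantified with a short sketch. It computes the accuracy of a trivial classifier that always predicts "Good", and the per-class weights given by the common "balanced" heuristic, n_samples / (n_classes * class_count). Using class weights is one possible remedy and an assumption on our part; the text does not say which technique was applied.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {label: n_samples / (n_classes * count) for label, count in counts.items()}


# Class counts matching the approximate 70/30 split reported above.
labels = ["good"] * 700 + ["bad"] * 300

weights = balanced_class_weights(labels)
# Accuracy of a model that always predicts the majority class.
majority_baseline_accuracy = max(Counter(labels).values()) / len(labels)

print(weights)
print(majority_baseline_accuracy)
```

A model must therefore beat 70% accuracy to add any value, and errors on "Bad" cases are weighted about 2.3 times as heavily as errors on "Good" cases under this heuristic.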
Preprocessing & Data Cleaning
Handling Missing Values: When null values showed up in some records, they
were either removed for the sake of simplicity or, if needed, the missing entries
were filled in using statistical methods such as mean or mode substitution.
Data Type Conversion and Encoding: Categorical variables were converted into
numerical format using one-hot encoding. This step is required for
machine learning algorithms to interpret these variables correctly.
Outlier Detection and Handling: Box plots were used to identify outliers in
numerical fields such as credit amount and loan duration. When extreme values
could skew model learning, they were either transformed or
excluded.
Feature Scaling: Standardization and normalization methods were used to ensure
that all numerical features were on a comparable scale.
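The box-plot view of outliers corresponds to the interquartile range (IQR) rule, sketched below together with a log transformation as one possible adjustment. The credit amounts are made-up values, and the multiplier k = 1.5 is the conventional box-plot whisker length rather than a setting confirmed by the study.

```python
import math
from statistics import quantiles


def iqr_bounds(values, k=1.5):
    """Return the (lower, upper) fences used by a standard box plot."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Hypothetical credit amounts with one extreme value.
credit_amounts = [1000, 1200, 1500, 1800, 2000, 2200, 2500, 3000, 18000]

lower, upper = iqr_bounds(credit_amounts)
outliers = [v for v in credit_amounts if v < lower or v > upper]

# Option 1: exclude values outside the fences.
trimmed = [v for v in credit_amounts if lower <= v <= upper]
# Option 2: keep every record but compress the long right tail.
log_amounts = [math.log1p(v) for v in credit_amounts]

print(outliers)
print(len(trimmed), round(max(log_amounts) - min(log_amounts), 2))
```

Transforming keeps all applicants in the training set, which matters when the extreme values belong to the minority "Bad" class.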
Exploratory Data Analysis (EDA)
Univariate Analysis: Bar plots and histograms were utilized to visualize the
distribution of various features. For instance, a significant number of applicants
were between the ages of 25 and 40, and most credit amounts were on the
smaller side of the range.
Bivariate Analysis: To look at the connection between features and the target
variable, box plots and count plots were used. It became clear that applicants
with "Bad" credit were more likely to have large loan amounts and longer loan
terms. Additionally, the "Bad" credit category had a high proportion of
customers without checking accounts.
Correlation Analysis: A heatmap was made to look at how numerical
variables related to one another. Credit was found to have a slight positive
correlation (r ≈ 0.62) with most features, showing low
Feature Selection and Model Preparation
Features with low variance or a poor relationship with the target variable were
removed.
Using feature importance scores from early models like Random Forests and
Decision Trees, the most important predictors were selected.
The final dataset was split 80/20 into training and test sets to allow for an
accurate assessment and verification of model performance.
Finding Patterns and Creating Insights
Credit risk had a high correlation with factors such as checking account status,
age, credit amount, and duration. Importantly, credit risk was linked to
trends in financial behavior rather than any one factor, emphasizing the
importance of multi-feature models.
The chance of default was greater for applicants with little or no balance in their
checking or savings accounts.
Findings
Unbalanced Distribution of Risk
o According to a preliminary analysis of the data, 70% of the applicants