Enhancing Accuracy Of Credit Scoring Classification With Imbalance Data Using Synthetic Minority Oversampling Technique-Support Vector Machine (SMOTE-SVM) Model

Bingamawa, Muhammad Tosan (2017) Enhancing Accuracy Of Credit Scoring Classification With Imbalance Data Using Synthetic Minority Oversampling Technique-Support Vector Machine (SMOTE-SVM) Model. Masters thesis, Universiti Teknikal Malaysia Melaka.

Enhancing Accuracy Of Credit Scoring Classification With Imbalance Data Using Synthetic Minority Oversampling Technique-Support Vector Machine (SMOTE-SVM) Model - 24 Pages.pdf - Submitted Version

Abstract

Credit is a business model that can drive significant growth. As the number of new credit applicants and financial markets grows, so does the likelihood of credit problems. It is therefore important for a financial institution to screen credit applicants in advance, and credit scoring is one of the models used for this preliminary selection of potential customers. One of the most common techniques for developing a credit scoring model is data mining classification. However, classification struggles with imbalanced data distributions: a classifier may assign all of the data to the majority class and perform poorly on the minority class. Credit data are also imbalanced, so classifying them with an inappropriate technique may lead to wrong decisions for a financial institution. This study identifies several methods for handling the imbalanced data problem and proposes an improved credit scoring model for imbalanced data in a financial institution using the SMOTE-SVM model. The study is conducted in five phases: data collection, data pre-processing, feature selection, classification, and validation and evaluation. The SMOTE-SVM experiments vary the data ratio and the number of nearest neighbours used in SMOTE. The experimental results show that accuracy and performance improve as the data are balanced using the SMOTE-SVM model.
Performance measurement using 10-fold cross-validation and a confusion matrix shows that the SMOTE-SVM model correctly classifies most of the data in each class, with good accuracy, class precision, and class recall. Based on this result, the SMOTE-SVM model is believed to be effective in handling imbalanced data for credit scoring classification.
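
The oversampling step that the abstract refers to can be illustrated with a minimal sketch. It is not the thesis's implementation: the function name `smote`, its parameters, and the toy data are assumptions made for illustration, the base sample is picked at random for each synthetic point (a common variant of SMOTE), and the SVM training and 10-fold cross-validation stages are omitted. Each synthetic point is placed on the line segment between a minority sample and one of its k nearest minority-class neighbours.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic points by interpolating between minority samples
    and their k nearest minority-class neighbours (Euclidean distance).

    Hypothetical helper for illustration only; not taken from the thesis.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than samples
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by distance to the base sample.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy data (assumed): 8 majority-class and 3 minority-class applicants,
# each described by two numeric features.
majority = [[float(i), float(i % 3)] for i in range(8)]
minority = [[10.0, 10.0], [11.0, 10.5], [10.5, 12.0]]

# Oversample the minority class until both classes have the same size.
new_points = smote(minority, n_synthetic=len(majority) - len(minority), k=2)
balanced_minority = minority + new_points
```

In this sketch the data ratio is controlled by `n_synthetic` and the neighbourhood size by `k`, which correspond to the two settings the abstract says were varied in the experiments. The balanced set would then be passed to a classifier such as an SVM.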

Item Type: Thesis (Masters)
Uncontrolled Keywords: Business intelligence, Data mining, Credit -- Management, Decision making -- Statistical methods
Subjects: Q Science > Q Science (General)
Q Science > QA Mathematics > QA76 Computer software
Divisions: Library > Tesis > FTMK
Depositing User: Nor Aini Md. Jali
Date Deposited: 25 Apr 2018 09:23
Last Modified: 25 Apr 2018 09:23
URI: http://eprints.utem.edu.my/id/eprint/20759