ACCURACY ENHANCEMENT OF MACHINE LEARNING MODEL BY HANDLING IMBALANCE DATA

Shaik Mohammed Imran, Dr. Angelina Geetha

Abstract

With the rapid growth of big data, a new era of scientific research has emerged. An uneven distribution of the response variable, known as class imbalance, is one of the most prevalent problems with raw data. This issue arises in many domains when the number of instances with negative labels is far higher than the number with positive labels, for example in fraud detection, medical diagnostics, and network intrusion detection. Machine Learning (ML) algorithms fail to deal with imbalanced data because they focus on reducing the error rate for the majority class while disregarding the minority class. This research aims to propose an effective method for handling data imbalance and improving the accuracy of ML models. We use an imbalanced churn prediction dataset obtained from Kaggle. The dataset initially contains missing values, irrelevant features, improper data formats, and class imbalance. To address these challenges, preprocessing is conducted. For data imbalance, we introduce a novel ensemble margin-based algorithm alongside established methods such as Tomek Links, Synthetic Minority Over-sampling Technique (SMOTE), and NearMiss. The balanced data from each method is then fed into ML models such as Support Vector Machine (SVM) and Naïve Bayes (NB). The performance of both models under the various techniques is evaluated using positive metrics. Experimental findings indicate that the proposed algorithm achieves the highest accuracy, 97.29% for the SVM model and 98.17% for the NB model.
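
As a rough illustration of the oversampling idea behind SMOTE, one of the baseline methods named in the abstract, the sketch below creates synthetic minority points by interpolating between a minority sample and one of its nearest minority neighbours. It is a minimal pure-Python sketch on made-up toy data, not the authors' implementation and not the proposed ensemble margin-based algorithm; the function name `smote_oversample` and its parameters `n_new`, `k`, and `seed` are hypothetical, and it assumes at least two minority points.

```python
import math
import random


def smote_oversample(minority, n_new, k=3, seed=0):
    """Return n_new synthetic points interpolated between minority neighbours.

    Hypothetical helper for illustration; assumes len(minority) >= 2.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point (excluding itself)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Toy data (invented): 8 majority points and 3 minority points.
majority = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3), (0.4, 0.2),
            (0.3, 0.5), (0.5, 0.1), (0.2, 0.4), (0.6, 0.3)]
minority = [(2.0, 2.0), (2.2, 2.1), (2.1, 2.4)]

# Generate just enough synthetic points to match the majority count.
new_points = smote_oversample(
    minority, n_new=len(majority) - len(minority), k=2
)
balanced_minority = minority + new_points
print(len(majority), len(balanced_minority))
```

Because every synthetic point lies on a segment between two real minority samples, the minority region is filled in rather than duplicated. In practice a maintained library implementation would normally be preferred over a hand-written one like this.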
