A Review Of Training Data Selection In Software Defect Prediction

Sinaga, Benyamin Langgu and Ahmad, Sabrina and Abal Abas, Zuraida (2020) A Review Of Training Data Selection In Software Defect Prediction. Journal of Theoretical and Applied Information Technology, 98 (12). pp. 2092-2108. ISSN 1992-8645

Official URL: http://www.jatit.org/volumes/Vol98No12/9Vol98No12....

Abstract

Publicly available datasets pose a challenge in selecting suitable data to train a defect prediction model that predicts defects in other projects. Using a cross-project training dataset without careful selection degrades defect prediction performance. Consequently, training data selection is an essential step in developing a defect prediction model. This paper aims to synthesize the state of the art in training data selection methods published from 2009 to 2019. The existing approaches to training data selection fall into three groups: nearest-neighbour, cluster-based, and evolutionary methods. According to the results in the literature, cluster-based methods tend to outperform nearest-neighbour methods. Research on evolutionary techniques, on the other hand, gives promising results but is still scarce. The review therefore concludes that open areas remain for further investigation in training data selection. We also present research directions within this area.

Item Type:	Article
Uncontrolled Keywords:	Software Defect Prediction, Training Data Selection, Nearest-Neighbor, Cluster-Based, Evolutionary-Based
Divisions:	Faculty of Information and Communication Technology
Depositing User:	Norfaradilla Idayu Ab. Ghafar
Date Deposited:	24 Feb 2021 23:48
Last Modified:	01 Mar 2021 10:21
URI:	http://eprints.utem.edu.my/id/eprint/24911
