Small Dataset Learning In Prediction Model Using Box-Whisker Data Transformation

Lateh, Masitah bdul (2020) Small Dataset Learning In Prediction Model Using Box-Whisker Data Transformation. Masters thesis, Universiti Teknikal Malaysia Melaka.

Abstract

Data mining comprises several tasks, such as classification, clustering, prediction and summarization. Among them, prediction is widely applied in real applications such as manufacturing, medicine and business, mainly for developing prediction models. However, building a robust prediction model requires a training set with many samples; learning from a small sample can yield an imprecise model. Enlarging the sample size to ensure sufficient learning is sometimes difficult or expensive, so the information gained from a small sample is deficient. The main reason a small sample makes it hard to extract valuable information is that information gaps exist. In a complete dataset these gaps would be filled with observations, but those observations are not available. As a result, most learning tools struggle to perform the prediction task, because a small sample does not provide sufficient information for the learning process, which leads to incorrect results. Previous studies have proposed improving learning accuracy and predictive capability by adding artificial data to the system using an artificial data generation approach. Hence, this study aims to propose a hybrid algorithm for generating artificial samples that adopts the Small Johnson Data Transformation and the Box-Whisker Plot, both introduced in previous studies. The proposed algorithm, named Box-Whisker Data Transformation, considers all samples contained in an MLCC dataset in order to generate artificial samples. This study also investigates the effectiveness of employing the artificial data generation approach in a prediction model. Initially, the quantiles of the raw samples are determined using the Box-Whisker Plot technique. Subsequently, the Small Johnson Data Transformation is employed to transform the raw samples to a Normal Distribution. Next, samples are generated from the Normal Distribution. To test the effectiveness of the proposed algorithm, the real and generated samples are added to the training phase to build a prediction model using an M5 Model Tree. The results show that sample quantiles are generated from a reasonable range. In addition, using all samples available in the dataset as training samples retains the properties of the original pattern behaviors. The improvement in learning performance is demonstrated by the fact that, as the number of artificial samples increases, the average mean absolute percentage error (AvgMAPE) of the M5 Model Tree decreases. This reveals that training size affects the accuracy of prediction models when the sample size is small.
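
The steps named in the abstract (box-whisker quantiles, sampling from a Normal Distribution, and a MAPE score) can be illustrated with a minimal sketch. It is not the thesis's algorithm. The abstract does not give the details of the Small Johnson Data Transformation, so a plain normal fit to the raw samples stands in for it here, and the 1.5 × IQR whisker rule is likewise an assumption. All function names are hypothetical, and the M5 Model Tree itself is not implemented.

```python
import random
import statistics


def box_whisker_summary(samples):
    """Return quartiles and whisker fences for a list of numbers.

    Assumption: fences use the common 1.5 * IQR rule; the thesis may
    define the whiskers differently.
    """
    q1, q2, q3 = statistics.quantiles(samples, n=4, method="inclusive")
    iqr = q3 - q1
    return {
        "q1": q1,
        "median": q2,
        "q3": q3,
        "lower": q1 - 1.5 * iqr,
        "upper": q3 + 1.5 * iqr,
    }


def generate_artificial(samples, count, seed=0):
    """Draw `count` artificial values that fall inside the whisker range.

    Assumption: a normal fit (sample mean and standard deviation) is a
    stand-in for the Small Johnson Data Transformation, whose details
    are not stated in the abstract.
    """
    rng = random.Random(seed)
    summary = box_whisker_summary(samples)
    mean = statistics.fmean(samples)
    stdev = statistics.stdev(samples)
    generated = []
    while len(generated) < count:
        value = rng.gauss(mean, stdev)
        # Keep only values inside the whiskers (rejection sampling).
        if summary["lower"] <= value <= summary["upper"]:
            generated.append(value)
    return generated


def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be non-zero)."""
    return 100.0 * statistics.fmean(
        abs((a - p) / a) for a, p in zip(actual, predicted)
    )
```

In such a setup, the generated values would be appended to the real samples before training, and `mape` would be averaged over repeated runs to obtain a figure comparable to the AvgMAPE mentioned above.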

Item Type: Thesis (Masters)
Uncontrolled Keywords: Data mining, Dataset Learning
Subjects: Q Science > QA Mathematics
Q Science > QA Mathematics > QA76 Computer software
Divisions: Library > Tesis > FTMK
Depositing User: F Haslinda Harun
Date Deposited: 27 Oct 2021 16:13
Last Modified: 27 Oct 2021 16:13
URI: http://eprints.utem.edu.my/id/eprint/25379