Barkah, Azhari Shouni (2025) Enhancing anomaly detection performance in imbalanced datasets using a deep generative model and Tomeklinks approach. Doctoral thesis, Universiti Teknikal Malaysia Melaka.
Text (24 Pages)
Enhancing anomaly detection performance in imbalanced datasets using a deep generative model and Tomeklinks approach (24 Pages).pdf - Submitted Version (932kB)
Text (Full Text)
Enhancing anomaly detection performance in imbalanced datasets using a deep generative model and Tomeklinks approach.pdf - Submitted Version, restricted to registered users only (3MB)
Abstract
Data imbalance is a common problem in machine learning classification: the classes are represented in disproportionate ratios. Imbalance degrades model quality, because a model can report high accuracy that reflects only the majority class while ignoring the minority class. Resampling, which includes oversampling and undersampling, is a widely used family of techniques for dealing with class imbalance. Both techniques aim to change the ratio between the majority and minority classes. By making the training data more balanced, resampling allows the different classes to have roughly the same influence on the results of the classification model. Oversampling, especially random oversampling and synthetic minority oversampling, is commonly used because it is independent of the classifier. However, random oversampling causes overfitting because it only duplicates minority-class data, and it also increases training time. Synthetic minority oversampling causes class overlap because it synthesizes new data from local information rather than from the distribution of the minority class as a whole. It also introduces noise into the samples because the separation between the majority and minority class groups is not clear. To address dataset imbalance and thereby improve the performance of anomaly detection on new and rare attacks, this research proposes an enhanced ANIDS model called DGT-RF, which combines a Conditional Generative Adversarial Network (CGAN) with TomekLinks and uses Random Forest as the classifier. According to the test and evaluation reports, DGT-RF successfully improves anomaly detection performance on new and rare attacks in extremely imbalanced minority classes.
The validation results show that this model outperforms previous work by an average of 7.62% in accuracy. In future work, to further improve the detection of new and rare attacks, other data balancing techniques, such as other variants of deep-learning-based synthetic data generation, will need to be considered.
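As a rough illustration of the TomekLinks cleaning step named in the abstract, the sketch below finds Tomek links (pairs of samples from different classes that are each other's nearest neighbour) and drops the majority-class member of each pair. It is not the thesis implementation: the function names, the toy data, and the choice to remove only the majority sample are assumptions made for illustration, and the CGAN oversampling and Random Forest stages of DGT-RF are not shown.

```python
import math


def nearest_neighbor(i, points):
    """Return the index of the point closest to points[i], excluding i itself."""
    best, best_dist = None, math.inf
    for j, candidate in enumerate(points):
        if j == i:
            continue
        dist = math.dist(points[i], candidate)
        if dist < best_dist:
            best, best_dist = j, dist
    return best


def tomek_links(points, labels):
    """Return pairs (i, j), i < j, that are mutual nearest neighbours with different labels."""
    nn = [nearest_neighbor(i, points) for i in range(len(points))]
    return [
        (i, j)
        for i, j in enumerate(nn)
        if i < j and nn[j] == i and labels[i] != labels[j]
    ]


def remove_majority_in_links(points, labels, majority_label):
    """Drop the majority-class sample of every Tomek link (an assumed cleaning policy)."""
    drop = {
        k
        for pair in tomek_links(points, labels)
        for k in pair
        if labels[k] == majority_label
    }
    kept = [k for k in range(len(points)) if k not in drop]
    return [points[k] for k in kept], [labels[k] for k in kept]


# Hypothetical toy data: label 0 is the majority class, label 1 the minority class.
# The majority point (4.0, 4.0) sits right next to the minority point (4.2, 4.0).
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.5), (4.0, 4.0),
          (4.2, 4.0), (6.0, 6.0), (6.0, 7.0)]
labels = [0, 0, 0, 0, 1, 1, 1]

links = tomek_links(points, labels)
clean_points, clean_labels = remove_majority_in_links(points, labels, majority_label=0)
print("Tomek links:", links)
print("Samples kept:", len(clean_points))
```

On this toy data the only link is the borderline pair at indices 3 and 4, so one majority sample is removed and the boundary between the two classes becomes cleaner. In practice a library implementation with an efficient neighbour search would replace the quadratic loop used here.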
Item Type: | Thesis (Doctoral) |
---|---|
Uncontrolled Keywords: | Imbalanced data, Anomaly detection, Deep generative model, Rare attacks, IDS |
Subjects: | Q Science Q Science > QA Mathematics |
Divisions: | Faculty of Information and Communication Technology |
Depositing User: | Norhairol Khalid |
Date Deposited: | 10 Oct 2025 07:58 |
Last Modified: | 10 Oct 2025 07:58 |
URI: | http://eprints.utem.edu.my/id/eprint/29011 |