Transformer-based sentiment analysis classification in natural language processing for Bahasa Melayu

Zulkalnain, Mohd Asyraf (2025) Transformer-based sentiment analysis classification in natural language processing for Bahasa Melayu. Masters thesis, Universiti Teknikal Malaysia Melaka.

Text (24 Pages)
Transformer-based sentiment analysis classification in natural language processing for Bahasa Melayu (24 pages).pdf - Submitted Version
Download (814 kB)

Text (Full Text)
Transformer-based sentiment analysis classification in natural language processing for Bahasa Melayu.pdf - Submitted Version
Restricted to Registered users only
Download (5 MB)

Abstract

Sentiment analysis in Bahasa Melayu leverages Natural Language Processing (NLP) to interpret the opinions and emotional tone expressed in Malay texts. This research investigates the application of transformer-based deep learning models, namely Bidirectional Encoder Representations from Transformers (BERT), DistilBERT, BERT-multilingual, ALBERT, and BERT-CNN, for sentiment classification into positive, negative, and neutral categories. The study addresses challenges in Bahasa Melayu sentiment analysis, including limited annotated resources, linguistic nuances, and the mixed-language usage common on platforms such as social media. To train and evaluate the models, a large-scale Malay dataset (the Malaya dataset) was used. Pretrained models from HuggingFace were fine-tuned using 10-fold cross-validation to improve generalization, and optimization methods such as data augmentation were also applied. Evaluation considered not only accuracy but also precision, recall, F1 score, and computational efficiency. Among the models, BERT-CNN achieved the best performance, with 96.30% accuracy and consistently high scores across all sentiment classes. BERT also performed well, particularly on neutral sentiment, reaching 89.5% accuracy, but showed slightly lower recall on the positive class. DistilBERT offered competitive performance (88.96% accuracy) while being faster and more lightweight, making it suitable for deployment in resource-limited environments. BERT-multilingual showed balanced results with a peak accuracy of 89.84%, and ALBERT, despite having fewer parameters, reached 88.76% accuracy but underperformed in positive-sentiment recall. The results demonstrate that transformer-based models outperform traditional machine learning and lexicon-based approaches, particularly in handling informal, mixed-language Malay text. The proposed models can support real-world applications such as analyzing consumer sentiment, public opinion, or social responses to policy. By offering comparative insights and effective model configurations, this study advances sentiment analysis for low-resource languages and lays a solid foundation for further research and practical deployment.
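For readers approaching the record from an implementation angle, the sketch below illustrates the kind of pipeline the abstract describes: loading a pretrained HuggingFace checkpoint, attaching a three-class classification head, and reporting accuracy, precision, recall, and F1. This is a minimal sketch under stated assumptions, not the thesis's actual code: the checkpoint name, file paths, and hyperparameters are placeholders, and the 10-fold cross-validation loop and data augmentation the thesis uses are omitted for brevity.

```python
import numpy as np
from datasets import load_dataset
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

# Checkpoint is an assumption: any of the compared model families
# (BERT, DistilBERT, BERT-multilingual, ALBERT) would slot in here.
MODEL_NAME = "bert-base-multilingual-cased"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(
    MODEL_NAME,
    num_labels=3,  # positive, negative, neutral
)

def tokenize(batch):
    # Fixed-length padding keeps the default data collator simple.
    return tokenizer(batch["text"], truncation=True,
                     padding="max_length", max_length=128)

def compute_metrics(eval_pred):
    # The same metric set the abstract reports: accuracy, precision, recall, F1.
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    precision, recall, f1, _ = precision_recall_fscore_support(
        labels, preds, average="macro", zero_division=0
    )
    return {"accuracy": accuracy_score(labels, preds),
            "precision": precision, "recall": recall, "f1": f1}

# Hypothetical CSV files with a "text" column and an integer "label"
# column (0/1/2); the thesis itself trains on the Malaya dataset.
dataset = load_dataset("csv", data_files={"train": "train.csv",
                                          "test": "test.csv"})
dataset = dataset.map(tokenize, batched=True)

# A single train/test split for brevity; the thesis wraps fine-tuning
# in 10-fold cross-validation to improve generalization.
trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", num_train_epochs=3,
                           per_device_train_batch_size=16),
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    compute_metrics=compute_metrics,
)
trainer.train()
print(trainer.evaluate())
```

Swapping MODEL_NAME for a DistilBERT checkpoint reproduces the speed-versus-size trade-off the abstract highlights for deployment in resource-limited environments.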

Item Type: Thesis (Masters)
Uncontrolled Keywords: Sentiment analysis, Bahasa Melayu, Transformer-Based Models, Deep Learning, Natural Language Processing (NLP)
Subjects: Q Science
Q Science > QA Mathematics
Divisions: Faculty Of Electronics And Computer Technology And Engineering
Depositing User: Norhairol Khalid
Date Deposited: 26 Dec 2025 07:59
Last Modified: 26 Dec 2025 07:59
URI: http://eprints.utem.edu.my/id/eprint/29320
