Malay language vowel classification using audio image profile via deep learning for speech disorder rehabilitation assessment

Ahmad Azhar, Nur Syahmina (2024) Malay language vowel classification using audio image profile via deep learning for speech disorder rehabilitation assessment. Masters thesis, Universiti Teknikal Malaysia Melaka.

Abstract

Communication impairments can result from various medical conditions, such as speech problems, hearing loss, brain injuries, strokes, and physical disabilities. These conditions can affect verbal and non-verbal communication and may require specific rehabilitation and therapy. Currently, speech rehabilitation and treatment are time-consuming and involve physical activity, with most facilities still performing the process manually. However, technological advancements such as Artificial Intelligence (AI) have opened up innovative solutions in speech rehabilitation. AI studies have focused on speech classification for various human languages, with the potential to revolutionize speech rehabilitation and make it more accessible to individuals worldwide. As computer vision has influenced this field, machine learning and deep learning have been applied in the medical and healthcare industries to enhance rehabilitation. Convolutional Neural Network (CNN) models have been shown in many studies to classify objects and speech accurately. This research analyzed the classification accuracy of several deep learning network models, namely a proposed network model and the comparative models VGG-Net, AlexNet, and Inception, and performed a complete comparative analysis to assess their accuracy and suitability for rehabilitation purposes. This thesis aims to develop a reliable, high-accuracy vowel classification system that can classify vowels in the normal person group, the post-stroke patient group with speech disorders, and the combination of both groups, using the two proposed image profiles: the Mel spectrogram and the Mel Frequency Cepstral Coefficients (MFCC).
According to the experimental results, the proposed network model, which used a batch size of six, 20 epochs, and Adam as the optimizer, outperformed the existing comparative network models. The highest accuracy obtained in the analyses conducted was 96.30% for the Mel spectrogram image profile and 98.77% for the MFCC image profile.
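
For readers unfamiliar with the two image profiles named in the abstract, the following is a minimal sketch of how log-mel energies and MFCCs are commonly derived from one audio frame. It is not the thesis's implementation, which the abstract does not describe. The mel formula (HTK-style), the sample rate, the frame length, the filter count, the number of coefficients, and all function names are assumptions chosen for illustration. Stacking the per-frame log-mel vectors over time would give a Mel spectrogram image, and stacking the per-frame MFCC vectors would give an MFCC image.

```python
import math

def hz_to_mel(f_hz):
    # HTK-style mel scale; one common convention, assumed here
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def power_spectrum(frame):
    # Naive DFT power spectrum for bins 0..N/2 (slow, but dependency-free)
    n = len(frame)
    spec = []
    for k in range(n // 2 + 1):
        re = sum(x * math.cos(2 * math.pi * k * i / n) for i, x in enumerate(frame))
        im = -sum(x * math.sin(2 * math.pi * k * i / n) for i, x in enumerate(frame))
        spec.append((re * re + im * im) / n)
    return spec

def mel_filterbank(n_filters, n_fft, sample_rate):
    # Triangular filters whose edges are equally spaced on the mel scale
    top = hz_to_mel(sample_rate / 2.0)
    edges_hz = [mel_to_hz(top * i / (n_filters + 1)) for i in range(n_filters + 2)]
    bins = [f * n_fft / sample_rate for f in edges_hz]  # fractional FFT bin positions
    bank = []
    for j in range(1, n_filters + 1):
        lo, mid, hi = bins[j - 1], bins[j], bins[j + 1]
        row = []
        for k in range(n_fft // 2 + 1):
            if lo < k <= mid:
                row.append((k - lo) / (mid - lo))   # rising slope
            elif mid < k < hi:
                row.append((hi - k) / (hi - mid))   # falling slope
            else:
                row.append(0.0)
        bank.append(row)
    return bank

def log_mel_energies(frame, bank):
    # One column of a log-mel spectrogram; small floor avoids log(0)
    spec = power_spectrum(frame)
    return [math.log(sum(w * p for w, p in zip(row, spec)) + 1e-10) for row in bank]

def mfcc(log_mel, n_coeffs):
    # Unnormalised DCT-II of the log-mel energies
    m = len(log_mel)
    return [
        sum(e * math.cos(math.pi * c * (j + 0.5) / m) for j, e in enumerate(log_mel))
        for c in range(n_coeffs)
    ]

# Hypothetical test signal: a 500 Hz tone sampled at 8 kHz, one 256-sample frame
sample_rate = 8000
n_fft = 256
frame = [math.sin(2 * math.pi * 500 * i / sample_rate) for i in range(n_fft)]

bank = mel_filterbank(20, n_fft, sample_rate)
log_mel = log_mel_energies(frame, bank)
coeffs = mfcc(log_mel, 13)
```

In practice a windowing function and a fast FFT routine would normally be used; they are omitted here to keep the sketch short and free of third-party dependencies.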

Item Type: Thesis (Masters)
Uncontrolled Keywords: Communication impairments, Speech therapy, Rehabilitation technology
Divisions: Library > Tesis > FTKEK
Depositing User: Muhamad Hafeez Zainudin
Date Deposited: 03 Apr 2025 09:44
Last Modified: 03 Apr 2025 09:44
URI: http://eprints.utem.edu.my/id/eprint/28630