Enhancement of text representation for Indonesian document summarization with deep sequential pattern mining

Dian Sa’adillah Maylawati (2023) Enhancement of text representation for Indonesian document summarization with deep sequential pattern mining. Doctoral thesis, Universiti Teknikal Malaysia Melaka.

Abstract

Readability remains a major challenge in text summarization research. Previous studies identify one key concern: minimizing the gap between the generated summary and the reader's understanding. A readable summary must preserve the meaning of the source text. However, every language has its own grammar and structural characteristics, and Indonesian is no exception: specific treatment is needed to capture the meaning of Indonesian text. The present study hypothesizes that readability can be achieved with a text representation that preserves the meaning of text documents well. The study therefore aims: (1) to improve Indonesian text summaries by enhancing the Sequence of Words (SoW) text representation using Sequential Pattern Mining (SPM) with the PrefixSpan algorithm, since SPM has proven useful for Indonesian text classification and clustering; (2) to combine SPM and Deep Learning (DeepSPM) for Indonesian text summarization, given the superior accuracy of Deep Learning when trained on large amounts of data; and (3) to evaluate the readability of Indonesian text summaries under several evaluation scenarios. Most text summarization research relies mainly on co-selection-based analysis to evaluate summaries, which appears insufficient for evaluating readability. This study therefore adds content-based analysis and human readability evaluation. First, to validate the performance of SPM, the study combines SPM with the Sentence Scoring method as a feature-based approach and with the Bellman-Ford algorithm as a graph-based approach. Second, the proposed SPM approach is combined with a Deep Belief Network (DBN), an unsupervised Deep Learning method, to form DeepSPM. The performance of the proposed methods in producing Indonesian text summaries is then evaluated with Recall-Oriented Understudy for Gisting Evaluation (ROUGE) as co-selection-based analysis; the Dwiyanto Djoko Pranowo metrics, the Gunning Fog Index (GFI), and the Flesch-Kincaid Grade Level (FKGL) as content-based analysis; and human readability evaluation. The experimental findings on the IndoSum dataset show that SPM can enhance the quality of summary results. DeepSPM achieves better results than DBN, with F-measure scores of 46.21% for ROUGE-1, 36.94% for ROUGE-2, and 41.01% for ROUGE-L. Furthermore, the readability evaluation using Dwiyanto’s metrics, GFI, and FKGL shows that the DeepSPM summaries are readable at a moderate level, consistent with the human evaluation conducted by two Indonesian language experts.
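
The abstract names PrefixSpan as the algorithm used to mine word sequences for the text representation. The following is only a minimal illustrative sketch of PrefixSpan over tokenized sentences, not the thesis's implementation: the function name `prefixspan`, the `min_support` parameter, the whitespace tokenization, and the three example sentences are all assumptions made for illustration, and each sequence element is treated as a single word rather than an itemset.

```python
from collections import defaultdict


def prefixspan(sequences, min_support):
    """Return {pattern: support} for word patterns found in at least
    min_support sequences. Patterns preserve word order; gaps are allowed."""
    results = {}

    def mine(prefix, projected):
        # projected holds (sequence_index, start_position) pairs: the suffix
        # of each sequence that follows the first match of the prefix.
        counts = defaultdict(set)
        for sid, start in projected:
            for item in set(sequences[sid][start:]):
                counts[item].add(sid)
        for item, sids in sorted(counts.items()):
            if len(sids) < min_support:
                continue
            pattern = prefix + (item,)
            results[pattern] = len(sids)
            # Build the projected database for the extended pattern.
            new_projected = []
            for sid, start in projected:
                try:
                    pos = sequences[sid].index(item, start)
                except ValueError:
                    continue
                new_projected.append((sid, pos + 1))
            mine(pattern, new_projected)

    mine((), [(i, 0) for i in range(len(sequences))])
    return results


# Hypothetical toy corpus (not taken from IndoSum).
sentences = [
    "saya suka makan nasi goreng".split(),
    "saya suka minum teh".split(),
    "dia suka makan nasi".split(),
]
patterns = prefixspan(sentences, min_support=2)
for pattern, support in sorted(patterns.items()):
    print(" ".join(pattern), support)
```

Unlike a bag-of-words representation, the mined patterns keep word order, for example "suka makan nasi", which is the property the abstract relies on for preserving meaning.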

Item Type: Thesis (Doctoral)
Uncontrolled Keywords: Data mining, Natural language processing (Computer science), Computer programs
Subjects: Q Science > Q Science (General)
Q Science > QA Mathematics
Divisions: Library > Tesis > FTMK
Depositing User: Unnamed user with email nuraina0324@gmail.com
Date Deposited: 19 Sep 2024 16:43
Last Modified: 04 Nov 2024 11:48
URI: http://eprints.utem.edu.my/id/eprint/27713