Cross document relation identification for multi document summarization based on enhanced case based reasoning framework

Yogan Jaya, Kumar (2014) Cross document relation identification for multi document summarization based on enhanced case based reasoning framework. PhD thesis, UTeM.

Abstract

Online search often presents readers with a large collection of texts. In the context of news documents, different news sources reporting on the same event usually contain common components that make up the main story of the news. This study aims to produce high quality multi document summaries by taking into account the generic components of a news story within a specific domain. Since this study involves multiple documents, the research further investigates the automatic identification of cross-document relations from unannotated text documents, for which a case based reasoning (CBR) classification model is proposed. Cross-document relations are used to identify highly relevant sentences to be included in the summary. To improve cross-document relation identification, a genetic algorithm (GA) is integrated to enhance the CBR classifier. The GA is used to scale the relevance of the data features used by the CBR classifier. Following that, this research proposes two new sentence scoring mechanisms based on the identified cross-document relations. The first approach is based on a voting technique named votCombMAX, which gives votes to sentences based on the relationship types between sentence pairs. The second approach investigates the benefits of fuzzy reasoning over the identified cross-document relations, since not all cross-document relation types have a positive effect on summary generation. In this study, the Document Understanding Conference (DUC) 2002 data sets are used, and the Recall-Oriented Understudy for Gisting Evaluation (ROUGE) metrics are used for evaluation. The evidence from this study showed that the proposed methods yield significant improvement over the mainstream methods.
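
As a rough illustration of how feature weights can change a case based classification, the following minimal sketch retrieves the nearest stored case under a weighted distance. It is not the method of the thesis: the feature vectors, the relation labels, the weight values, and the function names are all hypothetical, and the weights are fixed by hand here, whereas in the study they would be scaled by the GA.

```python
from math import sqrt


def weighted_distance(a, b, weights):
    """Weighted Euclidean distance between two feature vectors."""
    return sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))


def classify(query, case_base, weights):
    """Return the label of the stored case nearest to the query."""
    _, label = min(
        case_base,
        key=lambda case: weighted_distance(query, case[0], weights),
    )
    return label


# Hypothetical case base: (feature vector for a sentence pair, relation label).
# Both the two features and the label names are assumptions for illustration.
case_base = [
    ((0.9, 0.8), "overlap"),
    ((0.2, 0.9), "subsumption"),
    ((0.1, 0.1), "no_relation"),
]

query = (0.8, 0.2)

# Equal weights: both features count the same.
label_equal = classify(query, case_base, (1.0, 1.0))

# Weights that ignore the first feature: the nearest case changes.
label_reweighted = classify(query, case_base, (0.0, 1.0))
```

With equal weights the query is closest to the first case, while ignoring the first feature makes the third case the nearest one, which shows why the choice of feature weights matters to the classifier.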

Item Type: Thesis (PhD)
Uncontrolled Keywords: Soft computing, Case-based reasoning
Subjects: Q Science > QA Mathematics
Q Science > QA Mathematics > QA76 Computer software
Divisions: Library > Tesis > FTMK
Depositing User: Norziyana Hanipah
Date Deposited: 12 Aug 2015 07:27
Last Modified: 12 Aug 2015 07:27
URI: http://eprints.utem.edu.my/id/eprint/14843