An Enhanced Malay Named Entity Recognition Using Clustering and Classification Approach For Crime Textual Data Analysis

Salleh, Muhammad Sharilazlan (2018) An Enhanced Malay Named Entity Recognition Using Clustering and Classification Approach For Crime Textual Data Analysis. Masters thesis, Universiti Teknikal Malaysia Melaka.

Text (24 pages), Submitted Version

Abstract

Named Entity Recognition (NER) is one of the tasks in information extraction. NER extracts and classifies words or entities that belong to the proper-noun category in text, such as person names, locations, organizations and dates. Today, online media such as web pages, blogs, Facebook, Twitter, Instagram and online newspapers are among the major sources of data for information extraction. These resources contain various types of unstructured data, such as text. However, little work has been done to process this type of data for Malay Named Entity Recognition (MNER). This lack of Malay text analytics has made it difficult to extract information for decision making. This research presents an MNER technique for crime data analysis in the Malay language, using text extracted from the Polis Diraja Malaysia (PDRM) news web page. The proposed technique uses multiple stages of clustering and classification, namely Fuzzy C-Means and the K-Nearest Neighbors algorithm. The methods involve multi-layer feature extraction to recognize entities such as person name, location, organization, date and crime type. The multi-staged technique achieved 95.24% accuracy in recognizing named entities for text analysis, particularly in Malay. The proposed technique can improve the accuracy of named entity recognition on crime data when the selected features suit the Malay language.
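The abstract names Fuzzy C-Means and K-Nearest Neighbors but does not say how the two stages are connected. The following is a minimal sketch of one plausible arrangement, not the thesis implementation: Fuzzy C-Means assigns each token soft cluster memberships, and those memberships are appended to the token's features before a K-Nearest Neighbors vote. The four binary features, the toy tokens, the label names, the choice of two clusters and the initial centers are all assumptions made for illustration.

```python
import math
from collections import Counter


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def memberships_for(points, centers, m=2.0):
    """Fuzzy C-Means membership: u_ij = 1 / sum_k (d_ij / d_ik)^(2/(m-1))."""
    power = 2.0 / (m - 1.0)
    rows = []
    for p in points:
        # Clamp distances so a point lying exactly on a center does not divide by zero.
        dists = [max(euclidean(p, c), 1e-12) for c in centers]
        rows.append([1.0 / sum((dj / dk) ** power for dk in dists) for dj in dists])
    return rows


def fuzzy_c_means(points, centers, m=2.0, iterations=20):
    """Run a fixed number of FCM updates; return (centers, memberships)."""
    centers = [list(c) for c in centers]
    dims = len(points[0])
    for _ in range(iterations):
        u = memberships_for(points, centers, m)
        for j in range(len(centers)):
            weights = [row[j] ** m for row in u]
            total = sum(weights)
            # Each center is the mean of all points, weighted by u_ij^m.
            centers[j] = [
                sum(w * p[d] for w, p in zip(weights, points)) / total
                for d in range(dims)
            ]
    return centers, memberships_for(points, centers, m)


def knn_predict(train_x, train_y, query, k=3):
    """Majority label among the k training vectors closest to the query."""
    nearest = sorted(zip(train_x, train_y), key=lambda pair: euclidean(pair[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


# Hypothetical features per token (assumed, not taken from the thesis):
# [is_capitalized, follows_person_title, follows_location_preposition, contains_digit]
TRAIN_X = [
    [1, 1, 0, 0], [1, 1, 0, 0],  # e.g. a name after "Encik"
    [1, 0, 1, 0], [1, 0, 1, 0],  # e.g. a place after "di"
    [0, 0, 0, 1], [0, 0, 0, 1],  # e.g. a year
    [0, 0, 0, 0], [0, 0, 0, 0],  # ordinary words
]
TRAIN_Y = ["PERSON", "PERSON", "LOCATION", "LOCATION", "DATE", "DATE", "OTHER", "OTHER"]

# Stage 1: soft clustering into two groups, seeded from two training points.
CENTERS, MEMBERSHIPS = fuzzy_c_means(TRAIN_X, [TRAIN_X[0], TRAIN_X[6]])

# Stage 2: KNN over the features extended with the cluster memberships.
AUGMENTED_X = [x + u for x, u in zip(TRAIN_X, MEMBERSHIPS)]


def classify(features, k=3):
    u = memberships_for([features], CENTERS)[0]
    return knn_predict(AUGMENTED_X, TRAIN_Y, features + u, k)
```

With this toy data, `classify([1, 1, 0, 0])` votes among the three nearest augmented training vectors. A real system would need features derived from Malay text and a labelled corpus, neither of which the abstract specifies.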

Item Type: Thesis (Masters)
Uncontrolled Keywords: Natural language processing, Artificial intelligence, Machine learning, Malay Named Entity Recognition, Crime Textual Data Analysis
Subjects: Q Science > Q Science (General)
Q Science > QA Mathematics > QA76 Computer software
Divisions: Library > Tesis > FTMK
Depositing User: Mohd Hannif Jamaludin
Date Deposited: 30 Aug 2019 03:32
Last Modified: 17 Sep 2020 11:21
URI: http://eprints.utem.edu.my/id/eprint/23326