Kasmuri, Emaliana and Basiron, Halizah (2019) Building a Malay-English code-switching subjectivity corpus for sentiment analysis. International Journal of Advances in Soft Computing and its Applications, 11 (1). pp. 112-130. ISSN 2074-8523
Abstract
Combining local and foreign languages in a single utterance has become the norm in multi-ethnic regions. This phenomenon is known as code-switching. Code-switching poses a new challenge for sentiment analysis when Internet users express their opinions in blogs, reviews and social network sites. Resources for processing code-switching text in sentiment analysis are scarce, especially annotated corpora. This paper develops a guideline for building a code-switching subjectivity corpus for a mix of the Malay and English languages, known as MY-EN-CS. The guideline is suitable for any code-switching textual document. To demonstrate the guideline, this paper builds a new MY-EN-CS corpus. The corpus consists of opinionated and factual sentences constructed from a combination of words from these two languages. The sentences were retrieved from blogs, and the MY-EN-CS sentences were identified and annotated as either opinionated or factual. The annotation task yields a Kappa value of 0.83, which indicates the reliability of this corpus.
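The abstract reports agreement between annotators as a Kappa value but does not say which variant was used. As an illustration only, the sketch below computes Cohen's kappa for two annotators, which is an assumption about the setup rather than the authors' stated method. The function name `cohen_kappa`, the labels `"O"` (opinionated) and `"F"` (factual), and the toy annotations are all hypothetical and are not taken from the paper.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same sentences.

    Assumption: exactly two annotators and nominal labels. The paper's
    abstract only says "Kappa value", so this variant is a guess.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty set")

    n = len(labels_a)

    # Observed agreement: fraction of sentences given the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: for each label, the product of the two
    # annotators' marginal proportions, summed over all labels.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    all_labels = set(count_a) | set(count_b)
    expected = sum(count_a[c] * count_b[c] for c in all_labels) / (n * n)

    if expected == 1.0:
        # Both annotators used one identical label throughout.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical toy annotations: "O" = opinionated, "F" = factual.
annotator_1 = ["O", "O", "F", "F"]
annotator_2 = ["O", "O", "F", "O"]
print(cohen_kappa(annotator_1, annotator_2))  # 0.5
```

In this toy case the annotators agree on 3 of 4 sentences (0.75), chance agreement is 0.5, and kappa is (0.75 - 0.5) / (1 - 0.5) = 0.5. A value such as the reported 0.83 would correspond to much stronger agreement beyond chance.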
Item Type: | Article |
---|---|
Uncontrolled Keywords: | Annotation guideline, Code-switching corpus, Sentiment analysis, Subjectivity corpus |
Divisions: | Faculty of Information and Communication Technology |
Depositing User: | Sabariah Ismail |
Date Deposited: | 03 Dec 2020 16:27 |
Last Modified: | 12 Jul 2023 11:22 |
URI: | http://eprints.utem.edu.my/id/eprint/24469 |