Rumaisa, Fitrah and Saaya, Zurina and Khamis, Noorli and Basiron, Halizah (2019) Development of multilingual social media data corpus: Development and evaluation. International Journal Of Innovation, Creativity And Change, 6 (5). pp. 1-14. ISSN 2201-1323
Abstract
The purpose of this study is to manually annotate a corpus for Bahasa Indonesia and Bahasa Melayu. Corpora for both languages have been built by many researchers before, but this research focuses only on words that are shared by both vocabularies yet have very different meanings. The data were obtained from social media, so informal words are included. As many as 2100 words were identified for each language, and from these, 300 words shared by both vocabularies but with different meanings were randomly selected. The objective of this study was to confirm that this condition can influence the results of sentiment polarity. At the end of this paper, we show how the conditions of the two languages influence sentiment polarity. Following the manual annotation, an annotation agreement test was carried out with three Bahasa Indonesia annotators and three Bahasa Melayu annotators. The annotation results show that 63 of the 300 words have different polarity. The agreement scores among annotators for each language show that there was good agreement among the annotators during the annotation process.
Item Type: | Article |
---|---|
Uncontrolled Keywords: | Corpus, Bahasa Indonesia, Bahasa Melayu, Annotation, Social Media |
Divisions: | Faculty of Information and Communication Technology > Department of System and Computer Communication |
Depositing User: | Norfaradilla Idayu Ab. Ghafar |
Date Deposited: | 04 Dec 2020 16:21 |
Last Modified: | 08 Aug 2023 12:06 |
URI: | http://eprints.utem.edu.my/id/eprint/24401 |