Tweet Clustering in Indonesian Language Twitter Social Media using Naive Bayes Classifier Method

Tjut Adek, Rizal and Nasution, Sahlan (2018) Tweet Clustering in Indonesian Language Twitter Social Media using Naive Bayes Classifier Method. Eurasian Journal of Analytical Chemistry (Abbrev. Eurasian J Anal Chem. or EJAC), 13 (6). pp. 277-284. ISSN 1306-3057

Tweet Clustering in.pdf - Published Version
Official URL: http://www.eurasianjournals.com/

Abstract

Twitter is a social media platform widely used for many purposes, especially information sharing, communication, entertainment, and self-expression. It carries information on many topics, such as culture, sports, food, tourism, music, politics, and more. The purpose of this research is to build an application that groups tweets from Twitter into sports and non-sports categories using the Naive Bayes classifier method. Text mining is a technique used to handle classification, clustering, information extraction, and information retrieval problems, and classifying tweets automatically requires one of these text mining techniques. The probabilities learned during training are used to process tweet documents whose category is not yet known. Each tweet document goes through text pre-processing and is then split into unigrams (one word), bigrams (two words), and trigrams (three words). To determine the category of a tweet document whose category is not yet known, the category results obtained from the three n-gram groupings are compared. The system was tested with 100 to 2000 training items per category and 10 test items per category. Categorization accuracy was 60% with 100 training items, 65% with 200, and 90% with 2000. The conclusion is that the more training data is used for learning, the higher the success rate in assigning a tweet document to its category.
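
The approach described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. The names `NaiveBayes`, `ngrams`, and `classify_by_vote` are hypothetical, the toy English tweets are invented for illustration, whitespace tokenization stands in for the pre-processing step (which the abstract does not detail), add-one smoothing is an assumption, and a majority vote is only one possible reading of how the three n-gram results are compared.

```python
import math
from collections import Counter, defaultdict


def ngrams(text, n):
    """Split text on whitespace and return its word n-grams."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


class NaiveBayes:
    """Multinomial Naive Bayes over word n-grams of a fixed size n."""

    def __init__(self, n):
        self.n = n
        self.counts = defaultdict(Counter)  # category -> n-gram counts
        self.docs = Counter()               # category -> number of training tweets
        self.vocab = set()                  # all n-grams seen in training

    def train(self, text, category):
        grams = ngrams(text, self.n)
        self.counts[category].update(grams)
        self.docs[category] += 1
        self.vocab.update(grams)

    def classify(self, text):
        total_docs = sum(self.docs.values())
        best, best_score = None, -math.inf
        for cat in self.docs:
            # Log prior from the share of training tweets in this category.
            score = math.log(self.docs[cat] / total_docs)
            total = sum(self.counts[cat].values())
            # Assumption: add-one smoothing; the extra +1 in the denominator
            # reserves one slot for n-grams never seen in training.
            denom = total + len(self.vocab) + 1
            for g in ngrams(text, self.n):
                score += math.log((self.counts[cat][g] + 1) / denom)
            if score > best_score:
                best, best_score = cat, score
        return best


def classify_by_vote(models, text):
    """Assumption: combine the n-gram models by majority vote."""
    votes = Counter(m.classify(text) for m in models)
    return votes.most_common(1)[0][0]


# Invented toy training data, for illustration only.
training = [
    ("the team won the football match tonight", "sports"),
    ("great goal in the football match", "sports"),
    ("new restaurant serves great fried rice", "non-sports"),
    ("the concert tickets sold out tonight", "non-sports"),
]

models = [NaiveBayes(n) for n in (1, 2, 3)]  # unigram, bigram, trigram
for model in models:
    for text, category in training:
        model.train(text, category)

print(classify_by_vote(models, "football match tonight"))
```

With two categories and three models a majority always exists, which is one reason a vote is a convenient way to combine the unigram, bigram, and trigram results in a sketch like this.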

Item Type: Article
Subjects: T Technology & Engineering > TI Informatics, Information System
Divisions: Faculty of Engineering > Department of Informatics
Depositing User: Mr. Rizal Tjut Adek
Date Deposited: 22 Jun 2020 05:46
Last Modified: 07 Aug 2020 04:26
URI: http://repository.unimal.ac.id/id/eprint/5688
