Filtering spam text messages by using Twitter-LDA algorithm

Dani Gunawan, Romi Fadillah Rahmat, Arsandi Putra, Muhammad Fermi Pasha

Research output: Chapter in Book/Report/Conference proceeding › Conference Paper › Research

8 Citations (Scopus)

Abstract

Recently, the use of the short messaging service (SMS), or text messaging, has gradually shifted toward product and service promotion, and even fraud. Mobile phone users in Indonesia experience the same problem. A simple way to address this issue is to create a blacklist of phone numbers or of certain keywords and phrases. However, this approach is inefficient because a spammer can change the phone number or the content of the text message. Another approach is to use text classification methods such as Naive Bayes, k-Nearest Neighbor (kNN), and Support Vector Machine (SVM) to recognize patterns in the text messages. This research proposes the Twitter-LDA algorithm to identify spam text messages in Bahasa Indonesia. The dataset contains 985 text messages in total, of which 860 are spam and 125 are ham; 774 messages form the training dataset and 211 form the testing dataset. All text messages are pre-processed before training and testing. Five experiments yield an average F-score of 94.26% and accuracy of 96.49%. According to these results, the Twitter-LDA algorithm performs well in identifying spam text messages in Bahasa Indonesia.
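
As a reading aid for the two reported metrics, the following minimal sketch shows how an F-score and an accuracy can be computed from predicted and true labels. It is not the authors' evaluation code. It assumes the balanced F1 variant and treats "spam" as the positive class; the abstract states neither. The function name `f_score_and_accuracy` and the four example labels are hypothetical and are not taken from the paper's dataset.

```python
def f_score_and_accuracy(y_true, y_pred, positive="spam"):
    """Return (F1, accuracy) for two equal-length label sequences.

    Assumption: F1 (harmonic mean of precision and recall) with
    `positive` as the positive class. The paper may use another variant.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    # Guard against empty denominators so the function never divides by zero.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return f1, accuracy


# Hypothetical labels, for illustration only.
true_labels = ["spam", "spam", "ham", "ham"]
predicted = ["spam", "ham", "ham", "ham"]
f1, acc = f_score_and_accuracy(true_labels, predicted)
print(f"F1 = {f1:.4f}, accuracy = {acc:.4f}")
```

Because the dataset is imbalanced (860 spam versus 125 ham), accuracy alone can look high for a classifier that favors the majority class, which is one reason to read it together with the F-score.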

Original language: English
Title of host publication: 2018 IEEE International Conference on Communication, Networks and Satellite (Comnetsat) - Proceedings
Editors: Romi Fadillah Rahmat
Place of Publication: Piscataway NJ USA
Publisher: IEEE, Institute of Electrical and Electronics Engineers
Pages: 1-6
Number of pages: 6
ISBN (Electronic): 9781538667170, 9781538667163
ISBN (Print): 9781538667187
Publication status: Published - 2018
Event: IEEE International Conference on Communications, Networks, and Satellite (COMNETSAT) 2018 - Medan, Indonesia
Duration: 15 Nov 2018 to 17 Nov 2018
Conference number: 7th
https://ieeexplore.ieee.org/xpl/conhome/8681388/proceeding (Proceedings)

Conference

Conference: IEEE International Conference on Communications, Networks, and Satellite (COMNETSAT) 2018
Abbreviated title: COMNETSAT 2018
Country/Territory: Indonesia
City: Medan
Period: 15/11/18 to 17/11/18

Keywords

  • short messaging service
  • spam
  • spam filtering
  • spam text messages
  • twitter-lda
