Detecting spamming activities in twitter based on deep-learning technique

Tingmin Wu, Sheng Wen, Shigang Liu, Jun Zhang, Yang Xiang, Majed Alrubaian, Mohammad Mehedi Hassan

Research output: Contribution to journal › Article › Research › peer-review

14 Citations (Scopus)

Abstract

Twitter spam has long been a critical but difficult problem to address. So far, researchers have developed a series of machine learning–based methods and blacklisting techniques to detect spamming activities on Twitter. According to our investigation, current methods and techniques achieve an accuracy of around 87%. However, because of spam drift and information fabrication, these machine learning–based methods cannot efficiently detect spam activities in real-life scenarios. Meanwhile, blacklisting cannot keep up with the variations of spamming activities, as manually inspecting suspicious URLs is extremely time-consuming. In this paper, we propose a novel deep learning–based technique to address these challenges. The syntax of each tweet is learned through WordVector and trained by deep learning. We then construct a binary classifier to differentiate spam from regular tweets. In experiments, we collected and labeled a 10-day real tweet dataset as ground truth to evaluate the proposed method. We first conducted an empirical analysis with a series of comparisons: (1) the performance of different classifiers, (2) other existing text-based methods, and (3) non-text-based detection techniques. According to the experimental results, the proposed method largely outperformed previous methods. We further conducted principal component analysis on typical methods to justify theoretically why our method performs better. We extracted all kinds of features via dimensionality reduction and found that our features were the most distinct among all the detection methods, which supports the superior performance of our method.
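
The pipeline summarized in the abstract (word vectors for each tweet, followed by a binary spam classifier) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not the authors' implementation: the word vectors are hand-made 2-dimensional toy values rather than learned embeddings, the four training tweets are invented, and a plain logistic regression stands in for the deep network described in the paper.

```python
import math

# Hypothetical 2-dimensional word vectors (toy values, not learned).
# A real system would learn embeddings from a large tweet corpus.
WORD_VECTORS = {
    "free": [0.9, 0.1], "win": [0.8, 0.2], "click": [0.85, 0.05],
    "prize": [0.9, 0.15], "lunch": [0.1, 0.9], "meeting": [0.05, 0.8],
    "friends": [0.2, 0.85], "today": [0.5, 0.5],
}
DIM = 2


def tweet_vector(text):
    """Average the vectors of known words; unknown words are skipped."""
    vecs = [WORD_VECTORS[w] for w in text.lower().split() if w in WORD_VECTORS]
    if not vecs:
        return [0.0] * DIM
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(DIM)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(samples, labels, epochs=2000, lr=0.5):
    """Fit logistic regression (a stand-in for a deep model) by batch gradient descent."""
    w, b, n = [0.0] * DIM, 0.0, len(samples)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * DIM, 0.0
        for x, y in zip(samples, labels):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for i in range(DIM):
                grad_w[i] += err * x[i]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_spam(text, w, b):
    """Return True if the tweet is classified as spam."""
    x = tweet_vector(text)
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) >= 0.5


# Invented toy training set: label 1 = spam, 0 = regular tweet.
train_texts = ["win free prize click", "free prize today",
               "lunch meeting today", "friends lunch"]
train_labels = [1, 1, 0, 0]
w, b = train([tweet_vector(t) for t in train_texts], train_labels)

print(predict_spam("click win", w, b))
print(predict_spam("meeting friends", w, b))
```

Averaging word vectors discards word order, so this sketch captures only a bag-of-words view of a tweet; it is meant to show the shape of the embed-then-classify idea, not to reproduce the reported results.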

Original language: English
Article number: e4209
Number of pages: 11
Journal: Concurrency and Computation-Practice & Experience
Volume: 29
Issue number: 19
Publication status: Published - 20 Jun 2017
Externally published: Yes

Keywords

  • deep learning
  • social media security
  • twitter spam detection
