TY - JOUR
T1 - Detecting spamming activities in twitter based on deep-learning technique
AU - Wu, Tingmin
AU - Wen, Sheng
AU - Liu, Shigang
AU - Zhang, Jun
AU - Xiang, Yang
AU - Alrubaian, Majed
AU - Hassan, Mohammad Mehedi
PY - 2017/6/20
Y1 - 2017/6/20
N2 - Twitter spam has long been a critical but difficult problem to address. So far, researchers have developed a series of machine learning–based methods and blacklisting techniques to detect spamming activities on Twitter. According to our investigation, current methods and techniques achieve an accuracy of around 87%. However, because of spam drift and information fabrication, these machine learning–based methods cannot efficiently detect spam activities in real-life scenarios. Meanwhile, blacklisting cannot keep up with the variations of spamming activities, as manually inspecting suspicious URLs is extremely time-consuming. In this paper, we propose a novel deep learning–based technique to address the above challenges. The syntax of each tweet is learned through WordVector and trained by deep learning. We then construct a binary classifier to differentiate spam from regular tweets. In experiments, we collected and labeled a 10-day real tweet dataset as ground truth to evaluate our proposed method. We first conducted an empirical analysis with a series of comparisons: (1) the performance of different classifiers, (2) other existing text-based methods, and (3) non-text-based detection techniques. According to the experimental results, our proposed method largely outperformed previous methods. We further conducted principal component analysis on typical methods to theoretically justify the superior performance of our method. We extracted all kinds of features via dimensionality reduction and found that our features were the most distinct among all the detection methods, which supports the superior performance of our method.
AB - Twitter spam has long been a critical but difficult problem to address. So far, researchers have developed a series of machine learning–based methods and blacklisting techniques to detect spamming activities on Twitter. According to our investigation, current methods and techniques achieve an accuracy of around 87%. However, because of spam drift and information fabrication, these machine learning–based methods cannot efficiently detect spam activities in real-life scenarios. Meanwhile, blacklisting cannot keep up with the variations of spamming activities, as manually inspecting suspicious URLs is extremely time-consuming. In this paper, we propose a novel deep learning–based technique to address the above challenges. The syntax of each tweet is learned through WordVector and trained by deep learning. We then construct a binary classifier to differentiate spam from regular tweets. In experiments, we collected and labeled a 10-day real tweet dataset as ground truth to evaluate our proposed method. We first conducted an empirical analysis with a series of comparisons: (1) the performance of different classifiers, (2) other existing text-based methods, and (3) non-text-based detection techniques. According to the experimental results, our proposed method largely outperformed previous methods. We further conducted principal component analysis on typical methods to theoretically justify the superior performance of our method. We extracted all kinds of features via dimensionality reduction and found that our features were the most distinct among all the detection methods, which supports the superior performance of our method.
KW - deep learning
KW - social media security
KW - twitter spam detection
UR - http://www.scopus.com/inward/record.url?scp=85021148844&partnerID=8YFLogxK
U2 - 10.1002/cpe.4209
DO - 10.1002/cpe.4209
M3 - Article
AN - SCOPUS:85021148844
SN - 1532-0626
VL - 29
JO - Concurrency and Computation: Practice and Experience
JF - Concurrency and Computation: Practice and Experience
IS - 19
M1 - e4209
ER -