6 million spam tweets: a large ground truth for timely Twitter spam detection

Chao Chen, Jun Zhang, Xiao Chen, Yang Xiang, Wanlei Zhou

Research output: Chapter in Book/Report/Conference proceeding › Conference Paper › Research › peer-review

116 Citations (Scopus)

Abstract

Twitter has changed the way people communicate and get news in their daily lives in recent years. Meanwhile, due to its popularity, Twitter has also become a main target for spamming activities. In order to stop spammers, Twitter uses Google SafeBrowsing to detect and block spam links. Although blacklists can block malicious URLs embedded in tweets, their lag time hinders their ability to protect users in real time. Thus, researchers have begun to apply different machine learning algorithms to detect Twitter spam. However, there is no comprehensive evaluation of each algorithm's performance for real-time Twitter spam detection, due to the lack of a large ground truth. To carry out a thorough evaluation, we collected a large dataset of over 600 million public tweets. We further labelled around 6.5 million spam tweets and extracted 12 lightweight features, which can be used for online detection. In addition, we conducted a number of experiments on six machine learning algorithms under various conditions to better understand their effectiveness and weaknesses for timely Twitter spam detection. We will make our labelled dataset available to researchers who are interested in validating or extending our work.

Original language: English
Title of host publication: 2015 IEEE International Conference on Communications, ICC 2015
Editors: Nathan Gomes
Place of Publication: Piscataway NJ USA
Publisher: IEEE, Institute of Electrical and Electronics Engineers
Pages: 7065-7070
Number of pages: 6
ISBN (Electronic): 9781467364324, 9781467364317
Publication status: Published - 2015
Externally published: Yes
Event: IEEE International Conference on Communications 2015: Smart City & Smart World - London, United Kingdom
Duration: 8 Jun 2015 to 12 Jun 2015
http://icc2015.ieee-icc.org/ (Website)
http://ieeexplore.ieee.org/xpl/mostRecentIssue.jsp?punumber=7225357 (IEEE Conference proceedings)

Publication series

Name: IEEE International Conference on Communications
Publisher: IEEE, Institute of Electrical and Electronics Engineers
Volume: 2015-September
ISSN (Print): 1550-3607
ISSN (Electronic): 1938-1883

Conference

Conference: IEEE International Conference on Communications 2015
Abbreviated title: ICC 2015
Country/Territory: United Kingdom
City: London
Period: 8/06/15 to 12/06/15
Other: 2015 IEEE International Conference on Communications (ICC2015), held in the magnificent city of London, the UK’s capital. ICC2015 will be an excellent presentation, networking and publicity event, offering the opportunity for researchers, engineers and business people to meet and exchange ideas and information. In addition to the traditional keynote, symposium, workshop, tutorial and industry forum sessions, this year we are offering a new event, the Chief Technology Officer Forum, during the plenary session. Here, the future of mobile networks will be discussed.