P-GENT

Privacy-preserving GEocoding of Non-geotagged Tweets

Shuo Wang, Richard Sinnott, Surya Nepal

Research output: Chapter in Book/Report/Conference proceeding, Conference Paper, Research, peer-review

1 Citation (Scopus)

Abstract

With the widespread proliferation of location-aware devices and social media applications, more and more people share information on location-based social networks such as Twitter. Such data can help individuals plan and manage their activities and can support other social applications, e.g., location-based advertising or recommendation. However, only a very small proportion of tweets are geotagged, owing to privacy concerns or the lack of underlying positioning infrastructure. Hence it is meaningful to estimate the geographic information of non-geotagged tweets, i.e., geocoding, which can improve the applicability and utility of social media data. In contrast to existing geocoding approaches, this paper addresses the privacy risk while providing a fine-grained estimation. We propose Privacy-preserving GEocoding of Non-geotagged Tweets (P-GENT), which geocodes non-geotagged tweets at a fine granularity whilst protecting privacy. Our approach estimates the geographic location of a non-geotagged tweet from the similarity between the content of the tweet and the keyword lists of local events detected from archived geotagged tweets during the same time period. The approach implements a spatio-temporal clustering algorithm to discover local events at a fine granularity and an important-keyword extraction mechanism to describe each detected local event. In addition, a density-seed discovery approach is used to reduce the sparseness of geotagged tweets and the time complexity of the clustering approach. An experimental evaluation with real-world data demonstrates that our approach achieves at most 92% precision for a single time slot and retains 33-43% precision across all time slots after the privacy-preserving mechanisms are applied.
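The matching step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not name the similarity measure, so cosine similarity over token counts is assumed here, and the function names, the `min_similarity` threshold, the event keyword lists, and the centroid coordinates are all hypothetical. The clustering, keyword extraction, density-seed discovery, and privacy-preserving steps are not shown.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def estimate_location(tweet, events, min_similarity=0.1):
    """Return the centroid of the best-matching event, or None.

    `events` is a list of dicts with a keyword list and a centroid; this
    structure is assumed for illustration only.
    """
    tweet_vec = Counter(tokenize(tweet))
    best_centroid, best_score = None, min_similarity
    for event in events:
        score = cosine(tweet_vec, Counter(event["keywords"]))
        if score > best_score:
            best_centroid, best_score = event["centroid"], score
    return best_centroid


# Made-up events for one time slot; keywords and coordinates are illustrative.
EVENTS = [
    {"keywords": ["concert", "stadium", "encore", "band"],
     "centroid": (-37.8250, 144.9830)},
    {"keywords": ["marathon", "runners", "finish", "race"],
     "centroid": (-37.8136, 144.9631)},
]

print(estimate_location("Great encore by the band at the stadium tonight", EVENTS))
print(estimate_location("Having coffee at home", EVENTS))
```

In this sketch, a tweet that shares no keywords with any event in the time slot is left without a location estimate rather than being assigned to the nearest match.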

Original language: English
Title of host publication: Proceedings - The 17th IEEE International Conference on Trust, Security and Privacy in Computing and Communications (IEEE TrustCom 2018) - The 12th IEEE International Conference on Big Data Science and Engineering (IEEE BigDataSE 2018) - 2018 IEEE Trustcom/BigDataSE
Editors: Kim-Kwang Raymond Choo, Yongxin Zhu, Zongming Fei, Bhavani Thuraisingham, Yang Xiang
Place of Publication: Piscataway NJ USA
Publisher: IEEE, Institute of Electrical and Electronics Engineers
Pages: 972-983
Number of pages: 12
ISBN (Electronic): 9781538643884, 9781538643877
ISBN (Print): 9781538643891
DOIs: https://doi.org/10.1109/TrustCom/BigDataSE.2018.00137
Publication status: Published - 2018
Externally published: Yes
Event: IEEE International Conference on Trust, Security and Privacy in Computing and Communications and IEEE International Conference on Big Data Science and Engineering (TrustCom/BigDataSE) 2018 - New York, United States of America
Duration: 31 Jul 2018 to 3 Aug 2018
Conference number: 17th
http://www.cloud-conf.net/trustcom18/

Conference

Conference: IEEE International Conference on Trust, Security and Privacy in Computing and Communications and IEEE International Conference on Big Data Science and Engineering (TrustCom/BigDataSE) 2018
Abbreviated title: TrustCom 2018
Country: United States of America
City: New York
Period: 31/07/18 to 3/08/18
Internet address: http://www.cloud-conf.net/trustcom18/

Keywords

  • Differential privacy
  • event detection
  • location estimation
  • spatio temporal clustering

Cite this

Wang, S., Sinnott, R., & Nepal, S. (2018). P-GENT: Privacy-preserving GEocoding of Non-geotagged Tweets. In K-K. R. Choo, Y. Zhu, Z. Fei, B. Thuraisingham, & Y. Xiang (Eds.), Proceedings - The 17th IEEE International Conference on Trust, Security and Privacy in Computing and Communications (IEEE TrustCom 2018) - The 12th IEEE International Conference on Big Data Science and Engineering (IEEE BigDataSE 2018) - 2018 IEEE Trustcom/BigDataSE (pp. 972-983). [8456006] Piscataway NJ USA: IEEE, Institute of Electrical and Electronics Engineers. https://doi.org/10.1109/TrustCom/BigDataSE.2018.00137
Wang, Shuo ; Sinnott, Richard ; Nepal, Surya. / P-GENT : Privacy-preserving GEocoding of Non-geotagged Tweets. Proceedings - The 17th IEEE International Conference on Trust, Security and Privacy in Computing and Communications (IEEE TrustCom 2018) - The 12th IEEE International Conference on Big Data Science and Engineering (IEEE BigDataSE 2018) - 2018 IEEE Trustcom/BigDataSE. editor / Kim-Kwang Raymond Choo ; Yongxin Zhu ; Zongming Fei ; Bhavani Thuraisingham ; Yang Xiang. Piscataway NJ USA : IEEE, Institute of Electrical and Electronics Engineers, 2018. pp. 972-983
@inproceedings{ea2ef332d2014dccbb693b54d3e2ce9e,
title = "P-GENT: Privacy-preserving GEocoding of Non-geotagged Tweets",
keywords = "Differential privacy, event detection, location estimation, spatio temporal clustering",
author = "Shuo Wang and Richard Sinnott and Surya Nepal",
year = "2018",
doi = "10.1109/TrustCom/BigDataSE.2018.00137",
language = "English",
isbn = "9781538643891",
pages = "972--983",
editor = "Choo, {Kim-Kwang Raymond} and Zhu, Yongxin and Fei, Zongming and Thuraisingham, Bhavani and Xiang, Yang",
booktitle = "Proceedings - The 17th IEEE International Conference on Trust, Security and Privacy in Computing and Communications (IEEE TrustCom 2018) - The 12th IEEE International Conference on Big Data Science and Engineering (IEEE BigDataSE 2018) - 2018 IEEE Trustcom/BigDataSE",
publisher = "IEEE, Institute of Electrical and Electronics Engineers",
address = "United States of America",

}

Wang, S, Sinnott, R & Nepal, S 2018, P-GENT: Privacy-preserving GEocoding of Non-geotagged Tweets. in K-KR Choo, Y Zhu, Z Fei, B Thuraisingham & Y Xiang (eds), Proceedings - The 17th IEEE International Conference on Trust, Security and Privacy in Computing and Communications (IEEE TrustCom 2018) - The 12th IEEE International Conference on Big Data Science and Engineering (IEEE BigDataSE 2018) - 2018 IEEE Trustcom/BigDataSE., 8456006, IEEE, Institute of Electrical and Electronics Engineers, Piscataway NJ USA, pp. 972-983, IEEE International Conference on Trust, Security and Privacy in Computing and Communications and IEEE International Conference on Big Data Science and Engineering (TrustCom/BigDataSE) 2018, New York, United States of America, 31/07/18. https://doi.org/10.1109/TrustCom/BigDataSE.2018.00137

P-GENT : Privacy-preserving GEocoding of Non-geotagged Tweets. / Wang, Shuo; Sinnott, Richard; Nepal, Surya.

Proceedings - The 17th IEEE International Conference on Trust, Security and Privacy in Computing and Communications (IEEE TrustCom 2018) - The 12th IEEE International Conference on Big Data Science and Engineering (IEEE BigDataSE 2018) - 2018 IEEE Trustcom/BigDataSE. ed. / Kim-Kwang Raymond Choo; Yongxin Zhu; Zongming Fei; Bhavani Thuraisingham; Yang Xiang. Piscataway NJ USA : IEEE, Institute of Electrical and Electronics Engineers, 2018. p. 972-983 8456006.

TY - GEN

T1 - P-GENT

T2 - Privacy-preserving GEocoding of Non-geotagged Tweets

AU - Wang, Shuo

AU - Sinnott, Richard

AU - Nepal, Surya

PY - 2018

Y1 - 2018

KW - Differential privacy

KW - event detection

KW - location estimation

KW - spatio temporal clustering

UR - http://www.scopus.com/inward/record.url?scp=85054097599&partnerID=8YFLogxK

U2 - 10.1109/TrustCom/BigDataSE.2018.00137

DO - 10.1109/TrustCom/BigDataSE.2018.00137

M3 - Conference Paper

SN - 9781538643891

SP - 972

EP - 983

BT - Proceedings - The 17th IEEE International Conference on Trust, Security and Privacy in Computing and Communications (IEEE TrustCom 2018) - The 12th IEEE International Conference on Big Data Science and Engineering (IEEE BigDataSE 2018) - 2018 IEEE Trustcom/BigDataSE

A2 - Choo, Kim-Kwang Raymond

A2 - Zhu, Yongxin

A2 - Fei, Zongming

A2 - Thuraisingham, Bhavani

A2 - Xiang, Yang

PB - IEEE, Institute of Electrical and Electronics Engineers

CY - Piscataway NJ USA

ER -

Wang S, Sinnott R, Nepal S. P-GENT: Privacy-preserving GEocoding of Non-geotagged Tweets. In Choo K-KR, Zhu Y, Fei Z, Thuraisingham B, Xiang Y, editors, Proceedings - The 17th IEEE International Conference on Trust, Security and Privacy in Computing and Communications (IEEE TrustCom 2018) - The 12th IEEE International Conference on Big Data Science and Engineering (IEEE BigDataSE 2018) - 2018 IEEE Trustcom/BigDataSE. Piscataway NJ USA: IEEE, Institute of Electrical and Electronics Engineers. 2018. p. 972-983. 8456006 https://doi.org/10.1109/TrustCom/BigDataSE.2018.00137