Identifying active transport from spontaneous data source with natural language processing

Teng Li, Zhuo Chen, Alexa Delbosc

Research output: Chapter in Book/Report/Conference proceeding › Conference Paper › Other


Social media data (SMD) has emerged as a promising source of activity location information for travel behavior analysis. However, previous studies have neglected the incomplete-dataset issue that arises from collecting only the limited subset of SMD that is geotagged, which may lead to unreliable analyses. This study mitigates the issue by proposing a novel framework that employs a Bidirectional Encoder Representations from Transformers (BERT)-based classifier to identify relevant SMD, uses a named-entity-matching method to extract locations from SMD content, and integrates these locations with geotag locations through a reasoning scheme to provide more complete SMD for travel behavior analysis. The framework was applied to a case study in Greater Melbourne, Australia, where Twitter data was used to investigate variations in the travel behavior of active transporters related to the COVID-19 pandemic. The results indicated that the framework increased the extraction of location-containing Twitter data by 33.70%. Two changes in active transporters’ travel behavior were observed: 1) the decline in travel activity varied across the city; and 2) the clustering intensity of activity locations decreased. The contributions of this study include providing a spontaneous data source to overcome challenges in obtaining active travel data, simplifying the retrieval of historical travel records, and offering a solution to the incomplete-dataset issue. Furthermore, the natural language processing techniques used in this study can be transferred to future travel behavior studies to streamline data preparation and location information extraction.
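
The location-extraction and integration steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the gazetteer entries and their approximate coordinates, the function names, and the "prefer the geotag, otherwise fall back to the first place name matched in the text" rule are all assumptions made for illustration, since the abstract does not specify the matching method or the reasoning scheme in detail. The BERT-based relevance classifier is omitted; posts are assumed to have been filtered already.

```python
import re

# Hypothetical gazetteer: place name -> approximate (latitude, longitude).
# Entries and coordinates are illustrative only.
GAZETTEER = {
    "Federation Square": (-37.818, 144.969),
    "St Kilda": (-37.868, 144.981),
    "Yarra Trail": (-37.81, 145.0),
}


def extract_locations(text, gazetteer):
    """Return gazetteer names found in text, ordered by first appearance.

    Matching is case-insensitive and anchored on word boundaries, a simple
    stand-in for the named-entity-matching step.
    """
    found = []
    for name in gazetteer:
        pattern = r"\b" + re.escape(name) + r"\b"
        match = re.search(pattern, text, flags=re.IGNORECASE)
        if match:
            found.append((match.start(), name))
    return [name for _, name in sorted(found)]


def resolve_location(geotag, text, gazetteer):
    """Pick one location for a post and report where it came from.

    Assumed rule: trust the geotag when present; otherwise use the first
    place name mentioned in the text; otherwise report no location.
    """
    if geotag is not None:
        return geotag, "geotag"
    names = extract_locations(text, gazetteer)
    if names:
        return gazetteer[names[0]], "text"
    return None, "none"


if __name__ == "__main__":
    post = "Rode my bike from st kilda to Federation Square today"
    print(extract_locations(post, GAZETTEER))
    print(resolve_location(None, post, GAZETTEER))
```

Under this assumed rule, posts without a geotag can still contribute a location whenever their text names a known place, which is the mechanism by which a framework of this kind would enlarge the set of location-containing posts.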
Original language: English
Title of host publication: Annual Meeting of the US Transportation Research Board 2024
Publication status: Published - 2024
Event: Annual Meeting of the US Transportation Research Board 2024 - Walter E. Washington Convention Center, Washington DC, United States of America
Duration: 7 Jan 2024 - 11 Jan 2024
Conference number: 103rd


Conference: Annual Meeting of the US Transportation Research Board 2024
Abbreviated title: TRB 2024
Country/Territory: United States of America
City: Washington DC