Benchmarking Library Recognition in Tweets

  • Ting Zhang
  • Divya Prabha Chandrasekaran
  • Ferdian Thung
  • David Lo

Research output: Chapter in Book/Report/Conference proceeding > Conference Paper > Research > peer-review

Abstract

Software developers often use social media (such as Twitter) to share programming knowledge such as new tools, sample code snippets, and tips on programming. One of the topics they talk about is software libraries. Such tweets may contain useful information about a library. A good understanding of this information, e.g., developers' views regarding a library, can help in weighing the pros and cons of using the library and in gauging the general sentiment towards it. However, it is not trivial to recognize whether a word actually refers to a library or has another meaning. For example, a tweet mentioning the word 'pandas' may refer to the Python pandas library or to the animal. In this work, we created the first benchmark dataset and investigated the task of distinguishing whether a tweet refers to a programming library or something else. Recently, pre-trained Transformer models (PTMs) have achieved great success in the fields of natural language processing and computer vision. Therefore, we extensively evaluated a broad set of modern PTMs, including both general-purpose and domain-specific ones, on this programming library recognition task in tweets. Experimental results show that PTMs can outperform the best-performing baseline methods by 5% - 12% in terms of F1-score under within-, cross-, and mixed-library settings.
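
To make the task and its metric concrete, the following is a minimal sketch, not the authors' method or dataset. It treats library recognition as binary classification of tweets and computes the F1-score mentioned in the abstract. The keyword heuristic, the `CODE_HINTS` set, and the example tweets are all hypothetical and made up for illustration; the paper evaluates pre-trained Transformer models rather than a heuristic like this.

```python
from typing import List, Tuple

# Hypothetical context words hinting at a programming sense (an assumption,
# not taken from the paper).
CODE_HINTS = {"import", "python", "dataframe", "pip", "library", "code", "api"}


def predict_is_library(tweet: str) -> bool:
    """Toy stand-in classifier: True if any hint word appears in the tweet."""
    tokens = {tok.strip(".,!?#@()'\"").lower() for tok in tweet.split()}
    return bool(tokens & CODE_HINTS)


def f1_score(y_true: List[bool], y_pred: List[bool]) -> float:
    """F1 for the positive class: harmonic mean of precision and recall."""
    tp = sum(t and p for t, p in zip(y_true, y_pred))
    fp = sum((not t) and p for t, p in zip(y_true, y_pred))
    fn = sum(t and (not p) for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Made-up tweets with made-up labels (True = refers to the pandas library).
EXAMPLES: List[Tuple[str, bool]] = [
    ("Just learned how to merge a DataFrame in pandas, so handy!", True),
    ("The pandas at the zoo were sleeping all afternoon.", False),
    ("pip install pandas fixed my script", True),
    ("Baby pandas are the cutest animals", False),
    ("pandas 1.4 is out", True),  # no hint word, so the heuristic misses it
]

labels = [label for _, label in EXAMPLES]
preds = [predict_is_library(text) for text, _ in EXAMPLES]
score = f1_score(labels, preds)
print(f"F1 = {score:.2f}")
```

The last example shows why surface cues alone are brittle: a tweet can refer to the library without any obvious programming vocabulary, which is the kind of ambiguity the benchmark targets.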

Original language: English
Title of host publication: Proceedings - 30th IEEE/ACM International Conference on Program Comprehension, ICPC 2022
Editors: Ayushi Rastogi, Rosalia Tufano
Place of Publication: Piscataway NJ USA
Publisher: IEEE, Institute of Electrical and Electronics Engineers
Pages: 343-353
Number of pages: 11
ISBN (Electronic): 9781450392983
Publication status: Published - 2022
Externally published: Yes
Event: IEEE/ACM International Conference on Program Comprehension 2022 - Online, United States of America
Duration: 16 May 2022 to 17 May 2022
Conference number: 30th
https://dl.acm.org/doi/proceedings/10.1145/3524610 (Proceedings)
https://conf.researchr.org/home/icpc-2022 (Website)

Publication series

Name: IEEE International Conference on Program Comprehension
Publisher: IEEE, Institute of Electrical and Electronics Engineers
Volume: 2022-March
ISSN (Print): 2643-7147
ISSN (Electronic): 2643-7171

Conference

Conference: IEEE/ACM International Conference on Program Comprehension 2022
Abbreviated title: ICPC 2022
Country/Territory: United States of America
Period: 16/05/22 to 17/05/22

Keywords

  • benchmark study
  • disambiguation
  • software libraries
  • tweets
