Easy-to-deploy API extraction by multi-level feature embedding and transfer learning

Suyu Ma, Zhenchang Xing, Chunyang Chen, Cheng Chen, Lizhen Qu, Guoqiang Li

Research output: Contribution to journalArticleResearchpeer-review

4 Citations (Scopus)

Abstract

Application Programming Interfaces (APIs) have been widely discussed on social-technical platforms (e.g., Stack Overflow). Extracting API mentions from such informal software texts is the prerequisite for API-centric search and summarization of programming knowledge. Machine learning based API extraction has demonstrated superior performance than rule-based methods in informal software texts that lack consistent writing forms and annotations. However, machine learning based methods have a significant overhead in preparing training data and effective features. In this paper, we propose a multi-layer neural network based architecture for API extraction. Our architecture automatically learns character-, word- and sentence-level features from the input texts, thus removing the need for manual feature engineering and the dependence on advanced features (e.g., API gazzetter) beyond the input texts. We also propose to adopt transfer learning to adapt a source-library-trained model to a target-library, thus reducing the overhead of manual training-data labeling when the software text of multiple programming languages and libraries need to be processed. We conduct extensive experiments with six libraries of four programming languages which support diverse functionalities and have different API-naming and API-mention characteristics. Our experiments investigate the performance of our neural architecture for API extraction in informal software texts, the importance of different features, the effectiveness of transfer learning. Our results confirm not only the superior performance of our neural architecture than existing machine learning based methods for API extraction in informal software texts, but also the easy-to-deploy characteristic of our neural architecture.

Original languageEnglish
Number of pages15
JournalIEEE Transactions on Software Engineering
DOIs
Publication statusAccepted/In press - 11 Oct 2019

Keywords

  • API extraction
  • CNN
  • Computer architecture
  • Feature extraction
  • Libraries
  • LSTM
  • Machine learning
  • Manuals
  • Software
  • Training data
  • Transfer learning
  • Word embedding

Cite this