Improving automated documentation to code traceability by combining retrieval techniques

Xiaofan Chen, John Grundy

Research output: Chapter in Book/Report/Conference proceeding > Conference Paper > Research > peer-review

50 Citations (Scopus)

Abstract

Documentation written in natural language and source code are two of the major artifacts of a software system. Tracking a variety of traceability links between software documentation and source code assists software developers in comprehension, efficient development, and effective management of a system. Automated traceability systems to date have been faced with a major open research challenge: how to extract these links with both high precision and high recall. In this paper we introduce an approach that combines three supporting techniques, Regular Expression, Key Phrases, and Clustering, with a Vector Space Model (VSM) to improve the performance of automated traceability between documents and source code. This combination approach takes advantage of strengths of the three techniques to ameliorate limitations of VSM. Four case studies have been used to evaluate our combined technique approach. Experimental results indicate that our approach improves the performance of VSM, increases the precision of retrieved links, and recovers more true links than VSM alone.
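
To make the Vector Space Model baseline mentioned in the abstract concrete, the following is a minimal sketch of how candidate document-to-code links might be ranked by TF-IDF cosine similarity. It is an illustration only, not the authors' implementation: the function names, the camelCase tokenizer, the smoothed IDF formula, and the 0.1 threshold are all assumptions, and the three supporting techniques (Regular Expression, Key Phrases, Clustering) are not reproduced here.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Extract alphabetic words and split camelCase identifiers into lowercase terms."""
    terms = []
    for word in re.findall(r"[A-Za-z]+", text):
        terms.extend(re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])", word))
    return [t.lower() for t in terms]


def tfidf_vectors(texts):
    """Build one sparse TF-IDF vector (dict of term -> weight) per text."""
    token_lists = [tokenize(t) for t in texts]
    n = len(token_lists)
    # Document frequency: number of texts in which each term occurs.
    df = Counter(term for tokens in token_lists for term in set(tokens))
    vectors = []
    for tokens in token_lists:
        tf = Counter(tokens)
        # Smoothed IDF (an assumed weighting, chosen to avoid zero weights).
        vectors.append(
            {t: c * (math.log((1 + n) / (1 + df[t])) + 1.0) for t, c in tf.items()}
        )
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def candidate_links(docs, code, threshold=0.1):
    """Return (doc_name, code_name, score) tuples at or above the threshold, best first."""
    vecs = tfidf_vectors(list(docs.values()) + list(code.values()))
    doc_vecs = dict(zip(docs, vecs[: len(docs)]))
    code_vecs = dict(zip(code, vecs[len(docs):]))
    links = [
        (d, c, cosine(dv, cv))
        for d, dv in doc_vecs.items()
        for c, cv in code_vecs.items()
    ]
    kept = [link for link in links if link[2] >= threshold]
    return sorted(kept, key=lambda link: link[2], reverse=True)


if __name__ == "__main__":
    # Hypothetical inputs, for illustration only.
    docs = {"sec1": "The account manager opens and closes customer accounts."}
    code = {
        "AccountManager.java": "class AccountManager { void closeAccount(Account a) {} }",
        "Logger.java": "class Logger { void writeLine(String s) {} }",
    }
    for doc_name, code_name, score in candidate_links(docs, code):
        print(doc_name, "->", code_name, round(score, 3))
```

Raising the threshold in a sketch like this tends to trade recall for precision, which is the tension the abstract describes; the combined approach in the paper is aimed at easing that trade-off rather than tuning a single cut-off.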

Original language: English
Title of host publication: 2011 26th IEEE/ACM International Conference on Automated Software Engineering, ASE 2011, Proceedings
Pages: 223-232
Number of pages: 10
Publication status: Published - 2011
Externally published: Yes
Event: Automated Software Engineering Conference 2011 - Lawrence, United States of America
Duration: 6 Nov 2011 to 12 Nov 2011
Conference number: 26th
https://dl.acm.org/doi/proceedings/10.5555/2190078 (Proceedings)

Conference

Conference: Automated Software Engineering Conference 2011
Abbreviated title: ASE 2011
Country/Territory: United States of America
City: Lawrence
Period: 6/11/11 to 12/11/11
Other: 2011 26th IEEE/ACM International Conference on Automated Software Engineering ASE 2011

Keywords

  • Clustering
  • Key Phrases
  • Regular Expression
  • Traceability
  • Vector Space Model
