Locating latent design information in developer discussions: a study on pull requests

Giovanni Viviani, Michalis Famelis, Xin Xia, Calahan Janik-Jones, Gail C. Murphy

Research output: Contribution to journal › Article › Research › peer-review

1 Citation (Scopus)

Abstract

A software system’s design determines many of its properties, such as maintainability and performance. An understanding of design is needed to maintain system properties as changes to the system occur. Unfortunately, many systems do not have up-to-date design documentation, and approaches that have been developed to recover design often focus on how a system works by extracting structural and behaviour information, rather than on information about the desired design properties, such as robustness or performance. In this paper, we explore whether it is possible to automatically locate where design is discussed in on-line developer discussions. We investigate and introduce a classifier that locates paragraphs pertaining to design in pull request discussions, achieving an average AUC score of 0.87. We show that this classifier, when applied to projects on which it was not trained, agrees with human identification of design points, achieving an average AUC score of 0.79. We describe how this classifier could be used as the basis of tools to improve such tasks as reviewing code and implementing new features.
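The abstract reports results as average AUC scores, including a cross-project setting where the classifier is applied to projects it was not trained on. The sketch below is only an illustration of what those two ideas mean, under stated assumptions. It is not the authors' method: the abstract does not describe the classifier's features or algorithm, so `train_keyword_weights` is a hypothetical toy word-weight scorer. It also assumes that "projects on which it was not trained" corresponds to a leave-one-project-out split, and that "average" is an unweighted mean of per-project AUCs. AUC is computed directly from its definition: the probability that a randomly chosen design paragraph scores higher than a randomly chosen non-design paragraph, with ties counting as one half.

```python
from typing import Dict, List, Sequence, Tuple

# A labelled paragraph: (text, label), where label 1 = discusses design, 0 = does not.
Paragraph = Tuple[str, int]


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC as the probability that a randomly chosen positive paragraph
    is scored above a randomly chosen negative one (ties count as 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def train_keyword_weights(data: List[Paragraph]) -> Dict[str, float]:
    """Hypothetical toy scorer (NOT the paper's classifier): weight each word by the
    fraction of design paragraphs containing it minus the fraction of non-design ones."""
    counts: Dict[int, Dict[str, int]] = {0: {}, 1: {}}
    totals = {0: 0, 1: 0}
    for text, label in data:
        totals[label] += 1
        for word in set(text.lower().split()):
            counts[label][word] = counts[label].get(word, 0) + 1
    vocab = set(counts[0]) | set(counts[1])
    return {
        w: counts[1].get(w, 0) / max(totals[1], 1) - counts[0].get(w, 0) / max(totals[0], 1)
        for w in vocab
    }


def score(weights: Dict[str, float], text: str) -> float:
    """Sum the weights of the distinct words in a paragraph; unseen words score 0."""
    return sum(weights.get(w, 0.0) for w in set(text.lower().split()))


def leave_one_project_out(projects: Dict[str, List[Paragraph]]) -> Dict[str, float]:
    """Assumed cross-project protocol: train on all projects but one, then
    compute AUC on the held-out project's paragraphs."""
    results: Dict[str, float] = {}
    for held_out, test in projects.items():
        train = [p for name, paras in projects.items() if name != held_out for p in paras]
        weights = train_keyword_weights(train)
        results[held_out] = auc([score(weights, t) for t, _ in test], [y for _, y in test])
    return results


def mean_auc(per_project: Dict[str, float]) -> float:
    """Assumed aggregation: an unweighted mean of per-project AUCs."""
    return sum(per_project.values()) / len(per_project)


# Invented toy data for illustration only; not drawn from the study.
TOY_PROJECTS: Dict[str, List[Paragraph]] = {
    "alpha": [("we should refactor this interface for maintainability", 1),
              ("fixed typo in readme", 0)],
    "beta": [("this design hurts performance of the cache interface", 1),
             ("bump version number", 0)],
    "gamma": [("consider an interface to decouple the modules for maintainability", 1),
              ("typo fix", 0)],
}

if __name__ == "__main__":
    per_project = leave_one_project_out(TOY_PROJECTS)
    print(per_project, mean_auc(per_project))
```

With only one design and one non-design paragraph per toy project, each per-project AUC can only be 0, 0.5, or 1, so the toy numbers say nothing about real performance. A meaningful evaluation needs many human-labelled paragraphs per project. Holding out whole projects, rather than random paragraphs, is what makes the score reflect transfer to projects the model has never seen.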

Original language: English
Number of pages: 13
Journal: IEEE Transactions on Software Engineering
DOI: 10.1109/TSE.2019.2924006
Publication status: Accepted/In press - 20 Jun 2019

Keywords

  • Conversations
  • Design Discussions
  • Design Recovery
  • Latent Design
  • Prediction Model

Cite this

Viviani, Giovanni; Famelis, Michalis; Xia, Xin; Janik-Jones, Calahan; Murphy, Gail C. / Locating latent design information in developer discussions: a study on pull requests. In: IEEE Transactions on Software Engineering. 2019.
@article{4f368d2cb0f646f191e267dac4687903,
title = "Locating latent design information in developer discussions: a study on pull requests",
keywords = "Conversations, Design Discussions, Design Recovery, Latent Design, Prediction Model",
author = "Giovanni Viviani and Michalis Famelis and Xin Xia and Calahan Janik-Jones and Murphy, {Gail C.}",
year = "2019",
month = "6",
day = "20",
doi = "10.1109/TSE.2019.2924006",
language = "English",
journal = "IEEE Transactions on Software Engineering",
issn = "0098-5589",
publisher = "Publ by IEEE",

}

Locating latent design information in developer discussions: a study on pull requests. / Viviani, Giovanni; Famelis, Michalis; Xia, Xin; Janik-Jones, Calahan; Murphy, Gail C.

In: IEEE Transactions on Software Engineering, 20.06.2019.


TY  - JOUR
T1  - Locating latent design information in developer discussions
T2  - a study on pull requests
AU  - Viviani, Giovanni
AU  - Famelis, Michalis
AU  - Xia, Xin
AU  - Janik-Jones, Calahan
AU  - Murphy, Gail C.
PY  - 2019/6/20
Y1  - 2019/6/20
KW  - Conversations
KW  - Design Discussions
KW  - Design Recovery
KW  - Latent Design
KW  - Prediction Model
UR  - http://www.scopus.com/inward/record.url?scp=85067803067&partnerID=8YFLogxK
U2  - 10.1109/TSE.2019.2924006
DO  - 10.1109/TSE.2019.2924006
M3  - Article
AN  - SCOPUS:85067803067
JO  - IEEE Transactions on Software Engineering
JF  - IEEE Transactions on Software Engineering
SN  - 0098-5589
ER  -