FaCoY: a code-to-code search engine

Kisub Kim, Dongsun Kim, Tegawendé F. Bissyandé, Eunjong Choi, Li Li, Jacques Klein, Yves Le Traon

    Research output: Chapter in Book/Report/Conference proceeding; Conference Paper; Research

    Abstract

    Code search is an unavoidable activity in software development. Various approaches and techniques have been explored in the literature to support code search tasks. Most of these approaches focus on serving user queries provided as natural language free-form input. However, there exists a wide range of use-case scenarios where a code-to-code approach would be most beneficial. For example, research directions in code transplantation, code diversity, and patch recommendation can leverage a code-to-code search engine to find essential ingredients for their techniques. In this paper, we propose FaCoY, a novel approach for statically finding code fragments which may be semantically similar to user input code. FaCoY implements a query alternation strategy: instead of directly matching code query tokens with code in the search space, FaCoY first attempts to identify other tokens which may also be relevant in implementing the functional behavior of the input code. With various experiments, we show that (1) FaCoY is more effective than online code-to-code search engines; (2) FaCoY can detect more semantic code clones (i.e., Type-4) in BigCloneBench than the state of the art; (3) FaCoY, while static, can detect code fragments which are indeed similar with respect to runtime execution behavior; and (4) FaCoY can be useful in code/patch recommendation.
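
    The query alternation idea described in the abstract can be illustrated with a toy sketch. This is not FaCoY's implementation: the abstract does not say how the additional tokens are identified, so the hand-written `RELATED` table, the token splitter, and the overlap-based ranking below are all assumptions made purely for illustration.

```python
import re

# Hypothetical association table (an assumption, hand-written for this sketch):
# maps a query token to other tokens assumed to implement similar behavior.
RELATED = {
    "FileReader": {"Files", "readAllLines", "Scanner"},
    "readLine": {"readAllLines", "nextLine"},
}

def tokens(code):
    """Split a code fragment into its set of identifier-like tokens."""
    return set(re.findall(r"[A-Za-z_][A-Za-z0-9_]*", code))

def alternate(query_tokens):
    """Expand the query tokens with related tokens (the 'alternation' step)."""
    expanded = set(query_tokens)
    for token in query_tokens:
        expanded |= RELATED.get(token, set())
    return expanded

def search(query_code, corpus):
    """Rank corpus fragments by how many tokens they share with the expanded query."""
    expanded = alternate(tokens(query_code))
    scored = [(len(expanded & tokens(frag)), name) for name, frag in corpus.items()]
    ranked = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [name for score, name in ranked if score > 0]

corpus = {
    "a": "List<String> lines = Files.readAllLines(path);",
    "b": "int x = y + 1;",
}
query = "BufferedReader r = new BufferedReader(new FileReader(f)); r.readLine();"
print(search(query, corpus))
```

    In this sketch the query shares no token with fragment `a`, so direct token matching would miss it; only the expanded query reaches it through `Files` and `readAllLines`.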

    Original language: English
    Title of host publication: Proceedings
    Subtitle of host publication: 2018 ACM/IEEE 40th International Conference on Software Engineering - ICSE 2018
    Editors: Marsha Chechik, Mark Harman
    Place of Publication: New York, NY, USA
    Publisher: Association for Computing Machinery (ACM)
    Pages: 946-957
    Number of pages: 12
    ISBN (Print): 9781450356381
    DOI: https://doi.org/10.1145/3180155.3180187
    Publication status: Published - 27 May 2018
    Event: International Conference on Software Engineering 2018 - Gothenburg, Sweden
    Duration: 27 May 2018 - 3 Jun 2018
    Conference number: 40th
    https://www.icse2018.org/

    Conference

    Conference: International Conference on Software Engineering 2018
    Abbreviated title: ICSE 2018
    Country: Sweden
    City: Gothenburg
    Period: 27/05/18 - 3/06/18
    Internet address: https://www.icse2018.org/

    Cite this

    Kim, K., Kim, D., Bissyandé, T. F., Choi, E., Li, L., Klein, J., & Le Traon, Y. (2018). FaCoY: a code-to-code search engine. In M. Chechik, & M. Harman (Eds.), Proceedings: 2018 ACM/IEEE 40th International Conference on Software Engineering - ICSE 2018 (pp. 946-957). New York, NY, USA: Association for Computing Machinery (ACM). https://doi.org/10.1145/3180155.3180187