Automatic techniques for code example generation

HKUST Electronic Theses


by Xiaodong Gu

THESIS 2017

Ph.D. Computer Science and Engineering

xii, 94 pages : illustrations ; 30 cm

Abstract

Developers often wonder how to implement a program functionality. Code examples are very helpful in this regard. Over the years, many approaches have been proposed to generate code examples. The existing approaches often treat queries and source code as textual documents and utilize information retrieval models to retrieve relevant code snippets that match a given query.

However, conventional code example generation approaches face the following major challenges. First, they rely on a bag-of-words assumption and cannot recognize high-level features of queries and source code. Second, source code and natural language queries are heterogeneous. Existing approaches mainly rely on the textual similarity between source code and the natural language query, and lack a mapping of high-level semantics between the two. Moreover, the generated code examples may be redundant and project-specific, which calls for techniques that generate succinct, high-coverage code examples.

To address these challenges, in this thesis, we propose three machine learning based approaches to the generation of code examples. Instead of mapping keywords, our approaches learn the deep semantics of queries and code snippets.

We first propose DeepAPI, a technique that generates API usage sequences via deep learning. DeepAPI adapts a neural language model named RNN Encoder-Decoder [31]. Given a corpus of annotated API sequences, i.e., 〈API sequence, annotation〉 pairs, DeepAPI trains the language model to encode each sequence of words (annotation) into a fixed-length context vector and to decode an API sequence from that context vector. Then, in response to an API-related user query, it generates API sequences by consulting the trained model.
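The encode-then-decode flow described above can be illustrated with a minimal, untrained sketch. It is not the thesis implementation: the vocabularies, the vector size `H`, the plain tanh recurrent cells, and the greedy decoding loop are all assumptions chosen for illustration, and the weights are random rather than learned from 〈API sequence, annotation〉 pairs.

```python
import math
import random

random.seed(0)  # fixed seed so the untrained toy model is reproducible

# Hypothetical vocabularies, chosen only for illustration.
WORDS = ["read", "file", "parse", "xml"]
APIS = ["<eos>", "FileReader.new", "BufferedReader.readLine", "BufferedReader.close"]
H = 8  # size of the fixed-length context vector (assumed)


def rand_matrix(rows, cols):
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def one_hot(index, size):
    return [1.0 if i == index else 0.0 for i in range(size)]


# Randomly initialised parameters; training would fit these to the corpus.
W_in, U_enc = rand_matrix(H, len(WORDS)), rand_matrix(H, H)
W_prev, U_dec, C_dec = rand_matrix(H, len(APIS)), rand_matrix(H, H), rand_matrix(H, H)
W_out = rand_matrix(len(APIS), H)


def encode(words):
    """Fold a word sequence of any length into one H-dimensional context vector."""
    h = [0.0] * H
    for w in words:
        x = one_hot(WORDS.index(w), len(WORDS))
        h = [math.tanh(a + b) for a, b in zip(matvec(W_in, x), matvec(U_enc, h))]
    return h


def decode(context, max_len=5):
    """Greedily emit API tokens conditioned on the context vector until <eos>."""
    h, prev, out = list(context), 0, []  # index 0 (<eos>) doubles as the start token
    for _ in range(max_len):
        x = one_hot(prev, len(APIS))
        h = [math.tanh(a + b + c)
             for a, b, c in zip(matvec(W_prev, x), matvec(U_dec, h), matvec(C_dec, context))]
        scores = matvec(W_out, h)
        prev = scores.index(max(scores))  # greedy choice of the next API token
        if APIS[prev] == "<eos>":
            break
        out.append(APIS[prev])
    return out


api_sequence = decode(encode(["read", "file"]))
```

Because the weights are random, the emitted sequence is meaningless; the sketch only shows that queries of different lengths map to a context vector of the same size, from which the decoder produces API tokens one at a time.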

Furthermore, we propose DeepCodeHow, a technique that generates code examples by searching an existing code corpus. To bridge the lexical gap between queries and source code, DeepCodeHow jointly embeds code snippets and natural language descriptions into a high-dimensional vector space. With the unified vector representation, code snippets semantically related to a natural language query can be retrieved according to their vectors.
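Once code and queries live in one vector space, retrieval reduces to a nearest-neighbour lookup. The sketch below assumes cosine similarity as the ranking measure and uses hand-written three-dimensional vectors in place of learned high-dimensional embeddings; the snippet names, the vectors, and the `search` helper are hypothetical and do not come from the thesis.

```python
import math

# Hypothetical pre-computed embeddings. In a trained model, one encoder would
# map code snippets and another would map text into this shared space.
CODE_VECTORS = {
    "read_lines(path)": [0.9, 0.1, 0.0],
    "parse_xml(text)": [0.1, 0.9, 0.1],
    "send_http_get(url)": [0.0, 0.1, 0.9],
}


def cosine(a, b):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(query_vector, top_k=2):
    """Return the names of the top_k snippets closest to the query vector."""
    ranked = sorted(
        CODE_VECTORS,
        key=lambda name: cosine(query_vector, CODE_VECTORS[name]),
        reverse=True,
    )
    return ranked[:top_k]


# A query embedded near the first axis retrieves the file-reading snippet first.
best = search([0.8, 0.2, 0.1], top_k=1)
```

The point of the shared space is that the query and the snippet need not share any keywords; only their vectors are compared.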

Finally, to generate succinct and high-coverage examples, we design a code example selection technique named CodeKernel. CodeKernel leverages a machine learning technique named Graph Kernel. It represents code snippets as object usage graphs and embeds graphs into a high-level vector space. With the graph embedding, CodeKernel clusters similar graphs and selects a typical graph as the code example.
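The cluster-then-select idea can be sketched with a very simple graph kernel. The abstract does not say which kernel CodeKernel uses, so this example substitutes an edge-histogram kernel (counting shared labelled edges), a greedy threshold clustering, and a medoid rule for picking the typical graph; the graphs, the threshold, and all function names are assumptions.

```python
import math
from collections import Counter

# Hypothetical object usage graphs, each given as a non-empty list of labelled edges.
GRAPHS = {
    "s1": [("FileReader.new", "BufferedReader.new"), ("BufferedReader.new", "readLine"),
           ("readLine", "close")],
    "s2": [("FileReader.new", "BufferedReader.new"), ("BufferedReader.new", "readLine")],
    "s3": [("FileReader.new", "BufferedReader.new"), ("BufferedReader.new", "readLine"),
           ("readLine", "close"), ("close", "log")],
    "s4": [("URL.new", "openStream"), ("openStream", "read")],
}


def kernel(g1, g2):
    """Edge-histogram kernel: dot product of the two graphs' edge-count vectors."""
    c1, c2 = Counter(g1), Counter(g2)
    return sum(c1[e] * c2[e] for e in c1)


def similarity(g1, g2):
    """Kernel value normalised to [0, 1]; assumes both graphs have at least one edge."""
    return kernel(g1, g2) / math.sqrt(kernel(g1, g1) * kernel(g2, g2))


def cluster(graphs, threshold=0.5):
    """Greedily add each graph to the first cluster whose seed is similar enough."""
    clusters = []
    for name in graphs:
        for members in clusters:
            if similarity(graphs[name], graphs[members[0]]) >= threshold:
                members.append(name)
                break
        else:
            clusters.append([name])
    return clusters


def typical(members, graphs):
    """Pick the medoid: the member with the highest total similarity to its cluster."""
    return max(members, key=lambda m: sum(similarity(graphs[m], graphs[o]) for o in members))


clusters = cluster(GRAPHS)
examples = [typical(members, GRAPHS) for members in clusters]
```

Selecting one medoid per cluster is what keeps the output succinct (near-duplicates collapse into one example) while still covering each distinct usage pattern.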

We empirically evaluate our techniques on a large-scale code corpus collected from GitHub. The experimental results show that our proposed techniques effectively generate relevant code examples and outperform conventional IR-based approaches.


Copyrighted to the author. Reproduction is prohibited without the author’s prior written consent.

Details

Collection: HKUST Electronic Theses
Degree: Ph.D.
Department: Computer Science and Engineering
Supervisors: Kim, S.
Authors: Gu, Xiaodong
Subjects: Code generators; Coding theory; Machine learning
Language: English
Call number: Thesis CSED 2017 Gu
DOI: 10.14711/thesis-991012553967203412
