THESIS
2018
xiv, 98 pages : illustrations ; 30 cm
Abstract
Stack Overflow (SO) is a widely and actively used community question answering forum
for developers. Despite the huge amount of invaluable information residing in SO, taking full
advantage of it has been challenging. To effectively utilise this crowd wisdom, this thesis proposes
three different novel works that leverage the SO wisdom to help developers improve their
productivity.
The first work helps developers debug their code using the SO dataset.
Our approach reveals 189 warnings, the majority of which are confirmed by developers from
eight high-quality and well-maintained projects.
A recent survey revealed that many non-native English-speaking developers have trouble
understanding SO posts, and that they desire more visuals within the posts to help them understand
the content more easily. The second work highlights informative sentences in answer posts through
ensemble models of extractive summarization approaches. Our approach consistently outperforms
state-of-the-art extractive summarization methods, with relative improvements of between 13.41%
and 40.91% for problem-cause extractive summarization and between 4.12% and 40.28% for
solution summarization.
Existing works on generating SQL queries from natural language are conditioned on either
a given table schema or a relational database. We analyzed real-world developers’ data
management issues in SO and found that these scenarios are a tiny portion of the myriad
problems developers face. In the third work, we propose an end-to-end general-purpose
natural language to SQL (GP-NL2SQL) statement generation method using the SO dataset. Our method
also incorporates a denoising module that can be applied to correct ill-formed queries regardless
of the GP-NL2SQL generation model used. Experiments show that the proposed GP-NL2SQL
yields more well-formed queries (up to 43% more using a Seq2Seq model).