Abstract / Description of output
Descriptive names are a vital part of readable, and hence maintainable,
code. Recent progress on automatically suggesting names for
local variables tantalizes with the prospect of replicating that success
with method and class names. However, suggesting names for methods
and classes is much more difficult. This is because good method
and class names need to be functionally descriptive, but suggesting
such names requires that the model goes beyond local context. We
introduce a neural probabilistic language model for source code
that is specifically designed for the method naming problem. Our
model learns which names are semantically similar by assigning
them to locations, called embeddings, in a high-dimensional continuous
space, in such a way that names with similar embeddings tend
to be used in similar contexts. These embeddings seem to contain
semantic information about tokens, even though they are learned
only from statistical co-occurrences of tokens. Furthermore, we
introduce a variant of our model that is, to our knowledge, the first
that can propose neologisms, names that have not appeared in the
training corpus. We obtain state-of-the-art results on the method,
class, and even the simpler variable naming tasks. More broadly,
the continuous embeddings that are learned by our model have the
potential for wide application within software engineering.
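
As a rough illustration of the embedding idea described in the abstract, the sketch below ranks candidate method names by how close their embeddings lie to a vector summarizing the surrounding context. It is a minimal toy under stated assumptions, not the model from the paper: the three-dimensional vectors, the names in `NAME_EMBEDDINGS`, and the helpers `cosine` and `rank_names` are all hypothetical and chosen by hand, whereas the actual model learns its embeddings from a training corpus.

```python
import math

# Hypothetical, hand-picked 3-dimensional embeddings (illustration only).
# In a learned model these vectors would come from training on source code.
NAME_EMBEDDINGS = {
    "getSize": [0.9, 0.1, 0.0],
    "getLength": [0.8, 0.2, 0.1],
    "close": [0.0, 0.1, 0.9],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_names(context_vector, embeddings=NAME_EMBEDDINGS):
    """Return candidate names, most similar to the context vector first."""
    return sorted(
        embeddings,
        key=lambda name: cosine(context_vector, embeddings[name]),
        reverse=True,
    )


if __name__ == "__main__":
    # A context vector pointing along the first axis favors the "size-like" names.
    print(rank_names([1.0, 0.0, 0.0]))
```

In this toy setting, names used in similar contexts (`getSize`, `getLength`) sit near each other and are ranked together, while an unrelated name (`close`) falls to the bottom of the list.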
Original language | English |
---|---|
Title of host publication | Proceedings of the 2015 10th Joint Meeting on Foundations of Software Engineering |
Publisher | ACM |
Pages | 38-49 |
Number of pages | 12 |
ISBN (Electronic) | 978-1-4503-3675-8 |
DOIs | |
Publication status | Published - 2015 |
Fingerprint
Dive into the research topics of 'Suggesting Accurate Method and Class Names'. Together they form a unique fingerprint.
Projects
- 1 Finished
- Statistical Natural Language Processing Methods for Computer Program Source Code
  1/10/13 → 31/03/17
  Project: Research