Identification of topics in source code

Girish Maskeri Rama (Inventor), Kenneth Heafield (Inventor), Santonu Sarkar (Inventor)

Research output: Patent

Abstract

Topics in source code can be identified using Latent Dirichlet Allocation (LDA) by receiving source code, identifying domain-specific keywords from the source code, generating a keyword matrix, processing the keyword matrix and the source code using LDA, and outputting a list of topics. The list of topics is output as collections of domain-specific keywords. Probabilities of domain-specific keywords belonging to their respective topics can also be output. The keyword matrix comprises weighted sums of occurrences of domain-specific keywords in the source code.
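
The steps named in the abstract can be illustrated with a minimal Python sketch. It is not the patented method: the location weights in `WEIGHTS`, the stop-word list, the identifier-splitting rule, the two sample files, and the use of a small collapsed Gibbs sampler for LDA are all assumptions made for illustration. The sketch also feeds only the keyword matrix to LDA, which is a simplification of "processing the keyword matrix and the source code".

```python
import random
import re
from collections import Counter

# Hypothetical location weights; the abstract only says "weighted sums".
WEIGHTS = {"class": 3, "def": 3, "comment": 1, "other": 2}
# Hypothetical stop words (language keywords and filler words).
STOPWORDS = {"class", "def", "return", "self", "the", "and", "for"}

def split_identifier(name):
    """Split snake_case and camelCase identifiers into lowercase words."""
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+", name)
    return [p.lower() for p in parts]

def keyword_matrix(files):
    """Return (vocab, matrix); matrix[d][w] is a weighted occurrence sum."""
    rows = []
    for text in files.values():
        row = Counter()
        for line in text.splitlines():
            stripped = line.strip()
            if stripped.startswith("class "):
                kind = "class"
            elif stripped.startswith("def "):
                kind = "def"
            elif stripped.startswith("#"):
                kind = "comment"
            else:
                kind = "other"
            for token in re.findall(r"[A-Za-z_]+", stripped):
                for word in split_identifier(token):
                    if len(word) > 2 and word not in STOPWORDS:
                        row[word] += WEIGHTS[kind]
        rows.append(row)
    vocab = sorted(set().union(*rows))
    return vocab, [[row[w] for w in vocab] for row in rows]

def lda_gibbs(matrix, n_topics, iters=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA; returns P(keyword | topic) per topic."""
    rng = random.Random(seed)
    n_words = len(matrix[0])
    # Integer weighted sums are expanded into repeated keyword tokens.
    docs = [[w for w, c in enumerate(row) for _ in range(c)] for row in matrix]
    doc_topic = [[0] * n_topics for _ in docs]
    topic_word = [[0] * n_words for _ in range(n_topics)]
    topic_total = [0] * n_topics

    def bump(d, z, w, delta):
        doc_topic[d][z] += delta
        topic_word[z][w] += delta
        topic_total[z] += delta

    # Random initial topic assignment for every token.
    assign = [[rng.randrange(n_topics) for _ in doc] for doc in docs]
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            bump(d, assign[d][i], w, 1)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                bump(d, assign[d][i], w, -1)
                weights = [
                    (doc_topic[d][k] + alpha) * (topic_word[k][w] + beta)
                    / (topic_total[k] + beta * n_words)
                    for k in range(n_topics)
                ]
                assign[d][i] = rng.choices(range(n_topics), weights)[0]
                bump(d, assign[d][i], w, 1)

    return [
        [(topic_word[k][w] + beta) / (topic_total[k] + beta * n_words)
         for w in range(n_words)]
        for k in range(n_topics)
    ]

def top_keywords(vocab, phi, n=3):
    """List the n most probable (keyword, probability) pairs per topic."""
    return [sorted(zip(vocab, row), key=lambda p: -p[1])[:n] for row in phi]

# Two made-up source files used only to exercise the sketch.
files = {
    "loan.py": "class LoanAccount:\n    # interest on the loan\n"
               "    def compute_interest(self, rate):\n"
               "        return self.balance * rate\n",
    "auth.py": "class UserSession:\n    # login password check\n"
               "    def verify_password(self, user):\n"
               "        return user.password_hash\n",
}
vocab, matrix = keyword_matrix(files)
phi = lda_gibbs(matrix, n_topics=2)
topics = top_keywords(vocab, phi)
for k, topic in enumerate(topics):
    print(k, [(word, round(p, 3)) for word, p in topic])
```

Each printed line is one topic, shown as a collection of keywords with the probability of each keyword belonging to that topic. Expanding the weighted sums into repeated tokens keeps the sampler simple, but it requires integer weights; fractional weights would need a different inference scheme.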
Original language: English
Patent number: US8209665 B2
Priority date: 8/04/08
Publication status: Published - 26 Jun 2012
