Parameter-Free Probabilistic API Mining at GitHub Scale

Jaroslav Fowkes, Charles Sutton

Research output: Working paper

Abstract / Description of output

Existing API mining algorithms are not yet practical to use, as they require expensive parameter tuning and the returned set of API calls can be large, highly redundant and difficult to understand. To remedy these shortcomings, we present PAM (Probabilistic API Miner), a near parameter-free probabilistic algorithm for mining the most informative API call patterns. We show that PAM significantly outperforms both MAPO and UPMiner at retrieving relevant API call sequences from GitHub, achieving 70% test-set precision. Moreover, we focus on libraries for which the developers have explicitly provided code examples, yielding over 300,000 LOC of hand-written API example code from the 967 client projects in the data set. This evaluation suggests that the hand-written examples actually have limited coverage of real API usages.
Original language: English
Publisher: ArXiv
Number of pages: 12
Publication status: Published - 17 Dec 2015
