Abstract / Description of output
This paper introduces significant enhancements to RepoSim4Py and RepoSnipy, advanced semantic tools for deep analysis of software repositories. The RepoSim4Py command-line toolbox now supports multi-level embedding, encompassing code, documentation, requirements, README, and comprehensive repository analysis, which enables the understanding of repository dynamics. Concurrently, the RepoSnipy web-based search engine facilitates sophisticated repository similarity searches and introduces clustering based on both repository tags (topic cluster) and code embeddings (code cluster). We also introduce SimilarityCal, a novel binary classification model trained on these clusters, to predict and quantify repository similarities with high accuracy. These developments provide researchers and developers with powerful tools to navigate the complex landscape of software repositories, improving efficiency in software development and fostering innovation through better reuse of existing resources.
Original language | English |
---|---|
Number of pages | 10 |
DOIs | |
Publication status | Published - 23 Sept 2024 |
Event | IEEE eScience 2024 - Senri Life Science Center, Osaka, Japan. Duration: 16 Sept 2024 → 20 Sept 2024. https://www.escience-conference.org/2024/ |
Conference
Conference | IEEE eScience 2024 |
---|---|
Abbreviated title | eScience 2024 |
Country/Territory | Japan |
City | Osaka |
Period | 16/09/24 → 20/09/24 |
Internet address |
Keywords / Materials (for Non-textual outputs)
- repository similarity
- semantic analysis
- repository clustering
- code understanding
- multi-level embeddings
- pretrained language models
- GitHub
- mining software repositories