Practical Automated Detection of Malicious npm Packages

Adriana Sejfia, Max Schafer

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract / Description of output

The npm registry is one of the pillars of the JavaScript and Type-Script ecosystems, hosting over 1.7 million packages ranging from simple utility libraries to complex frameworks and entire applications. Each day, developers publish tens of thousands of updates as well as hundreds of new packages. Due to the overwhelming popularity of npm, it has become a prime target for malicious actors, who publish new packages or compromise existing packages to introduce malware that tampers with or exfiltrates sensitive data from users who install either these packages or any package that (transitively) depends on them. Defending against such attacks is essential to maintaining the integrity of the software supply chain, but the sheer volume of package updates makes comprehensive manual review infeasible. We present Amalfi, a machine-learning based approach for automatically detecting potentially malicious packages comprised of three complementary techniques. We start with classifiers trained on known examples of malicious and benign packages. If a package is flagged as malicious by a classifier, we then check whether it includes metadata about its source repository, and if so whether the package can be reproduced from its source code. Packages that are reproducible from source are not usually malicious, so this step allows us to weed out false positives. Finally, we also employ a simple textual clone-detection technique to identify copies of malicious packages that may have been missed by the classifiers, reducing the number of false negatives. Amalfi improves on the state of the art in that it is lightweight, requiring only a few seconds per package to extract features and run the classifiers, and gives good results in practice: running it on 96287 package versions published over the course of one week, we were able to identify 95 previously unknown malware samples, with a manageable number of false positives.

Original languageEnglish
Title of host publicationProceedings - 2022 ACM/IEEE 44th International Conference on Software Engineering, ICSE 2022
PublisherIEEE Computer Society Press
Pages1681-1692
Number of pages12
ISBN (Electronic)9781450392211
DOIs
Publication statusPublished - 5 Jul 2022
Event44th ACM/IEEE International Conference on Software Engineering, ICSE 2022 - Pittsburgh, United States
Duration: 22 May 202227 May 2022

Publication series

NameProceedings - International Conference on Software Engineering
Volume2022-May
ISSN (Print)0270-5257

Conference

Conference44th ACM/IEEE International Conference on Software Engineering, ICSE 2022
Country/TerritoryUnited States
CityPittsburgh
Period22/05/2227/05/22

Keywords / Materials (for Non-textual outputs)

  • malware detection
  • supply chain security

Fingerprint

Dive into the research topics of 'Practical Automated Detection of Malicious npm Packages'. Together they form a unique fingerprint.

Cite this