Abstract
In this article we outline a study of ‘adversarial scraping’ for academic research, which involves the collection of data from websites that implement defences against traditional web scraping tools. Although this is primarily a research methods article, it also constitutes a valuable systematic accounting of the different defensive techniques used by the administrators of illicit online services. Some of these administrators intentionally implement functionality that attempts to prevent web scrapers from gathering data from their site, and others unintentionally design their sites in ways that make data gathering harder. This is of particular importance for criminological research, where websites such as cryptomarkets and underground forums are publicly available (and hence there is an ethical case for data collection), but the illicit activity involved means that the administrators of these services limit scraping. We classify the anti-crawling techniques used by websites and outline the countermeasures we developed. Based on this, we evaluate which of these methods do and do not succeed at preventing data gathering from a website, as well as those which impact the scraper but do not necessarily prevent the data from being obtained. We find that there are some defences that, if used together, might thwart scraping. There is also a series of defences that are successful at slowing down scrapers, making historical scraping more difficult. On the other hand, we show that many defences are easy to work around and do not impact scraping.
Original language | English |
---|---|
Title of host publication | 2020 IEEE European Symposium on Security and Privacy Workshops (EuroS&PW) |
Publisher | Institute of Electrical and Electronics Engineers |
Number of pages | 10 |
ISBN (Electronic) | 9781728185972 |
ISBN (Print) | 9781728185989 |
Publication status | Published - 22 Oct 2020 |
Event | IEEE European Symposium on Security and Privacy 2020 - Duration: 7 Sept 2020 → 11 Sept 2020 https://www.ieee-security.org/TC/EuroSP2020/index.html |
Publication series
Name | IEEE European Symposium on Security and Privacy Workshops |
---|---|
Publisher | IEEE Xplore |
Volume | 5 |
Conference
Conference | IEEE European Symposium on Security and Privacy 2020 |
---|---|
Period | 7/09/20 → 11/09/20 |
Keywords / Materials (for Non-textual outputs)
- web scraping
- cybercrime
- web crawling
- underground forums
- chat channels
Fingerprint
Dive into the research topics of 'A tight scrape: methodological approaches to cybercrime research data collection in adversarial environments'. Together they form a unique fingerprint.
CrimeBB collaborations with the Cambridge Cybercrime Centre
Collier, B. (Principal Investigator)
Project: Research