A tight scrape: methodological approaches to cybercrime research data collection in adversarial environments

Kieron Turk, Sergio Pastrana, Ben Collier

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

We outline in this article a study of ‘adversarial scraping’ for academic research, which involves the collection of data from websites that implement defences against traditional web scraping tools. Although this is primarily a research methods article, it also constitutes a valuable systematic accounting of the different defensive techniques used by the administrators of illicit online services. Some of these administrators intentionally implement functionality which attempts to prevent web scrapers from gathering data from their site, and some will unintentionally design their sites in ways that make data gathering harder. This is of particular importance for criminological research, where websites such as cryptomarkets and underground forums are publicly available (and hence there is an ethical case for data collection), but the illicit activity involved means that the administrators of these services limit scraping. We classify different anti-crawling techniques taken by websites and outline our developed countermeasures. Based on this, we evaluate which of these methods do and do not succeed at preventing data gathering from a website, as well as those which impact the scraper but do not necessarily prevent the data from being obtained. We find that there are some defences that, if used together, might thwart scraping. There are also a series of defences that are successful at slowing down scrapers, making historical scraping more difficult. On the other hand, we show that many defences are easy to work around and do not impact scraping.
Original language: English
Title of host publication: 2020 IEEE European Symposium on Security and Privacy Workshops (EuroS&PW)
Publisher: Institute of Electrical and Electronics Engineers
Number of pages: 10
ISBN (Electronic): 9781728185972
ISBN (Print): 9781728185989
Publication status: Published - 22 Oct 2020
Event: IEEE European Symposium on Security and Privacy 2020
Duration: 7 Sept 2020 to 11 Sept 2020
https://www.ieee-security.org/TC/EuroSP2020/index.html

Publication series

Name: IEEE European Symposium on Security and Privacy Workshops
Publisher: IEEE Xplore
Volume: 5

Conference

Conference: IEEE European Symposium on Security and Privacy 2020
Period: 7/09/20 to 11/09/20

Keywords / Materials (for Non-textual outputs)

  • web scraping
  • cybercrime
  • web crawling
  • underground forums
  • chat channels
