Using Clustering Algorithms to Automatically Identify Phishing Campaigns

Kholoud Althobaiti, Kami Vaniea, Maria K. Wolters, Nawal Alsufyani

Research output: Contribution to journal › Article › peer-review

Abstract / Description of output

Attackers attempt to create successful phishing campaigns by sending out trustworthy-looking emails with a range of variations, such as adding the recipient's name to the subject line or changing URLs in the email body. These tactics are used to bypass filters and make it difficult for information system teams to block all emails even when they are aware of an ongoing attack. Little has been done to group emails into campaigns with the goal of better supporting staff who mitigate phishing using reported phishing emails. This paper explores the feasibility of using clustering algorithms to group emails into campaigns that IT staff would interpret as being similar. First, we applied the Mean Shift and DBSCAN algorithms with seven feature sets. Then, we evaluated the solutions with the Silhouette coefficient and homogeneity score and found that Mean Shift outperforms DBSCAN with email-origin and URL-based features. We then ran a user study to validate our clustering solution and found that clustering is a promising approach for campaign identification.
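
To make the idea of grouping emails into campaigns concrete, the following is a minimal sketch of density-based clustering (DBSCAN) over email-origin and URL tokens. It is an illustration under stated assumptions, not the authors' implementation: the toy emails, the `from:`/`url:` token scheme, the Jaccard distance, and the `eps` and `min_samples` values are all hypothetical choices, and Mean Shift and the evaluation metrics are not shown.

```python
from typing import FrozenSet, List

# Hypothetical toy data: each email is reduced to a set of origin/URL tokens.
# The token scheme and the .test domains are assumptions for illustration only.
EMAILS: List[FrozenSet[str]] = [
    frozenset({"from:mail.example-bank.test", "url:login.example-bank.test"}),
    frozenset({"from:mail.example-bank.test", "url:login.example-bank.test",
               "url:cdn.example.test"}),
    frozenset({"from:mail.example-bank.test", "url:login.example-bank.test"}),
    frozenset({"from:hr.example-corp.test", "url:payroll.example-corp.test"}),
    frozenset({"from:hr.example-corp.test", "url:payroll.example-corp.test",
               "url:img.example.test"}),
    frozenset({"from:lottery.example.test", "url:win.example.test"}),
]

NOISE = -1  # label for emails that belong to no campaign


def jaccard_distance(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Return 1 - |a & b| / |a | b|; two empty sets are treated as identical."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def region_query(points: List[FrozenSet[str]], i: int, eps: float) -> List[int]:
    """Indices of all points within eps of point i (including i itself)."""
    return [j for j in range(len(points))
            if jaccard_distance(points[i], points[j]) <= eps]


def dbscan(points: List[FrozenSet[str]], eps: float, min_samples: int) -> List[int]:
    """Plain DBSCAN: returns one cluster label per point, NOISE for outliers."""
    labels: List[object] = [None] * len(points)  # None means "not yet visited"
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_samples:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbours if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins cluster, not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbours = region_query(points, j, eps)
            if len(j_neighbours) >= min_samples:
                queue.extend(j_neighbours)  # j is a core point: keep expanding
    return [int(label) for label in labels]


if __name__ == "__main__":
    print(dbscan(EMAILS, eps=0.5, min_samples=2))
```

With these toy inputs, emails that share a sender and a landing URL fall into the same cluster, while the unrelated lottery message is left as noise. On real data the feature extraction and the distance threshold would need tuning against labelled campaigns.
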
Original language: English
Article number: 10235951
Pages (from-to): 96502-96513
Number of pages: 12
Journal: IEEE Access
Early online date: 31 Aug 2023
Publication status: Published - 11 Sept 2023

Keywords / Materials (for Non-textual outputs)

  • electronic mail
  • phishing
  • security
  • clustering algorithms
  • companies
  • uniform resource locators
  • servers
