Abstract / Description of output
Attackers attempt to create successful phishing campaigns by sending out trustworthy-looking emails with a range of variations, such as adding the recipient name in the subject line or changing URLs in email body. These tactics are used to bypass filters and make it difficult for the information system teams to block all emails even when they are aware of an ongoing attack. Little is done about grouping emails into campaigns with the goal of better supporting staff who mitigate phishing using reported phishing. This paper explores the feasibility of using clustering algorithms to group emails into campaigns that IT staff would interpret as being similar. First, we applied Meanshift and DBSCAN algorithms with seven feature sets. Then, we evaluated the solutions with the Silhouette coefficient and homogeneity score and find that Mean Shift outperforms DBSCAN with email origin and URLs based features. We then run a user study to validate our clustering solution and find that clustering is a promising approach for campaign identification.
Keywords / Materials (for Non-textual outputs)
- electronic mail
- clustering algorithms
- uniform resource locators