Abstract
The nature of people’s web navigation has changed significantly in recent years. The advent of smartphones and other handheld devices has given rise to web users consulting websites with more than one device, or using a shared device. As a result, large volumes of seemingly disjoint data are available, which, when analysed together, can support decision-making. The task of identifying web sessions by linking such data back to a specific person, however, is hard. Session stitching aims to overcome this by using machine learning inference to identify similar or identical users. Many such efforts use demographic data or device-based features to train matching algorithms. However, these variables are often not available for every dataset or are recorded differently, making a streamlined setup difficult. Moreover, they often result in vast feature spaces that are hard to use for actionable interpretation.
In this paper, we present an alternative approach based on the fingerprinting of web pages visited by users in a single session. By learning behavioral patterns from these sequences of page visits, we obtain features that can be used for matching without requiring sensitive user-agent data such as IP address, geolocation, or device details, as is common with other approaches. Using these sequential fingerprints does not rely on pre-defined features, but only requires the recording of web page visits, making our approach actionable. The approach is empirically tested on real-life web logs and compared with matching using regular user-agent features and state-of-the-art embedding techniques. Results in an e-commerce context show that sequential features can still obtain strong performance with fewer features, facilitating decision-making on session stitching and informing subsequent related activities such as marketing or customer analysis.
Original language | English |
---|---|
Article number | 113579 |
Journal | Decision Support Systems |
Early online date | 28 Apr 2021 |
DOIs | |
Publication status | E-pub ahead of print - 28 Apr 2021 |
Keywords / Materials (for Non-textual outputs)
- session stitching
- web analytics
- sequence mining
- session fingerprinting