TY - GEN
T1 - Evaluating the similarity of location-based corpora identified in Reddit comments
AU - Berragan, Cillian
AU - Singleton, Alex
AU - Calafiore, Alessia
AU - Morley, Jeremy
PY - 2023/4/2
Y1 - 2023/4/2
N2 - Social interaction is typically studied in the context of physical movement, where geographic distance and ease of connectivity influence the strength of interaction between regions. On social media networks, however, these limitations appear to persist even though interactions do not rely on physical movement, suggesting that non-physical geographic characteristics influence interaction between social communities. Unlike geotags, which provide explicit geographic information about social media users as coordinates, unstructured text presents an alternative perspective for the study of social interaction between regions, allowing comparison of the language used when locations are mentioned in context. Our paper analyses the corpora associated with major cities across the UK, first vectorising Reddit comments through transformer-based embeddings, which capture semantic information, then using these embeddings to form unsupervised clusters and measure the similarity between them. We find that distinct groups emerge which broadly conform with established regional identities of locations across the UK, but with interesting deviations.
KW - natural language processing
KW - social interaction
KW - social media
UR - https://www.scopus.com/pages/publications/85159694480
UR - https://ceur-ws.org/Vol-3385/
UR - https://geo-ext.github.io/
M3 - Conference contribution
AN - SCOPUS:85159694480
VL - 3385
T3 - CEUR Workshop Proceedings
SP - 1
EP - 6
BT - Proceedings of the First Workshop on Geographic Information Extraction from Texts (GeoExT 2023) co-located with The 45th European Conference on Information Retrieval (ECIR 2023)
A2 - Hu, Xuke
A2 - Hu, Yingjie
A2 - Resch, Bernd
A2 - Kersten, Jens
A2 - Stock, Kristin
PB - CEUR Workshop Proceedings
T2 - 1st Workshop on Geographic Information Extraction from Texts, GeoExT 2023
Y2 - 2 April 2023
ER -