Recency is good: expanding with fresh news improves event detection in Twitter

Sasa Petrovic, Miles Osborne, Victor Lavrenko

Research output: Contribution to journal › Article › peer-review

Abstract

Twitter is a popular microblogging site that is a good source of real-time information. Detecting events in Twitter is an ongoing research effort and a fundamental task is clustering tweets according to which (news) event they describe. Document expansion can improve this clustering, especially for Twitter, given that tweets are short. While document expansion using external corpora has been around for years [1], all previous work treats the external corpus as temporally static. We are the first to treat the external corpus (newswire articles in this case) as a time-synchronous stream, expanding tweets with words found in similar, temporally aligned newswire articles. Tweets are expanded with terms from the most similar newswire document, where the terms are weighted by the cosine similarity between the tweet and the newswire document [2]. Using the tweet corpus compiled by [3], and newswire data from the same time period, coming from eight major newswire sources (Reuters, CNN, BBC, New York Times, Google News, Guardian, Wired, The Register), we find that using timely newswire for expansion material improves event detection for Twitter more than using older newswire for the same purpose.
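
The expansion step described in the abstract could be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the whitespace tokenizer, the raw term-frequency weights, the fixed look-back `window`, and the names `tf_vector`, `cosine` and `expand_tweet` are all hypothetical choices that the abstract does not specify.

```python
import math
from collections import Counter


def tf_vector(text):
    """Bag-of-words term frequencies (assumed: lowercase, whitespace split)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def expand_tweet(tweet_text, tweet_time, articles, window):
    """Expand a tweet with terms from its most similar, recent article.

    articles: iterable of (timestamp, text) pairs.
    window:   assumed look-back period, in the same units as the timestamps;
              only articles with tweet_time - window <= timestamp <= tweet_time
              are considered "temporally aligned".
    """
    tweet_vec = tf_vector(tweet_text)
    best_sim, best_vec = 0.0, None
    for art_time, art_text in articles:
        if not (tweet_time - window <= art_time <= tweet_time):
            continue  # skip articles outside the time window
        art_vec = tf_vector(art_text)
        sim = cosine(tweet_vec, art_vec)
        if sim > best_sim:
            best_sim, best_vec = sim, art_vec

    expanded = {term: float(w) for term, w in tweet_vec.items()}
    if best_vec is not None:
        # Add the article's terms, each scaled by the tweet-article cosine.
        for term, weight in best_vec.items():
            expanded[term] = expanded.get(term, 0.0) + best_sim * weight
    return expanded
```

With a small `window`, only fresh articles can contribute expansion terms; widening it or shifting it into the past would correspond to the "older newswire" condition that the abstract compares against.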
Original language: English
Pages (from-to): 1-1
Number of pages: 1
Journal: Tiny Transactions on Computer Science (TinyToCS)
Volume: 2
Publication status: Published - 2013
