Abstract
Twitter is a popular microblogging site and a good source of real-time information. Detecting events in Twitter is an ongoing research effort, and a fundamental task is clustering tweets according to which (news) event they describe. Document expansion can improve this clustering, especially for Twitter, given that tweets are short. While document expansion using external corpora has been around for years [1], all previous work treats the external corpus as temporally static. We are the first to treat the external corpus (newswire articles in this case) as a time-synchronous stream, expanding tweets with words found in similar, temporally aligned newswire articles. Each tweet is expanded with terms from its most similar newswire document, with the terms weighted by the cosine similarity between the tweet and that document [2]. Using the tweet corpus compiled by [3] and newswire data from the same time period, drawn from eight major newswire sources (Reuters, CNN, BBC, New York Times, Google News, Guardian, Wired, The Register), we find that expanding tweets with timely newswire improves event detection for Twitter more than expanding them with older newswire.
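The expansion step described in the abstract can be illustrated with a minimal sketch. The abstract specifies only that a tweet is expanded with terms from its most similar, temporally aligned newswire document, weighted by cosine similarity; everything else here is an assumption made for illustration: the names `term_vector`, `cosine`, and `expand_tweet`, the raw term-frequency weighting, the simple tokenizer, the `window_seconds` parameter used to model temporal alignment, and the choice to add the weighted article terms to the tweet's own term weights.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Lowercased bag-of-words term-frequency vector (assumed weighting)."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine(u, v):
    """Cosine similarity between two sparse term vectors."""
    dot = sum(weight * v[term] for term, weight in u.items() if term in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def expand_tweet(tweet_text, tweet_time, articles, window_seconds=86400):
    """Expand a tweet with terms from its most similar recent article.

    articles: iterable of (timestamp_in_seconds, text) pairs.
    Returns (expanded term weights, similarity to the chosen article).
    """
    tweet_vec = term_vector(tweet_text)
    best_sim, best_vec = 0.0, None
    for article_time, article_text in articles:
        # Assumed notion of temporal alignment: the article was published
        # at most window_seconds before the tweet.
        if not (0 <= tweet_time - article_time <= window_seconds):
            continue
        article_vec = term_vector(article_text)
        sim = cosine(tweet_vec, article_vec)
        if sim > best_sim:
            best_sim, best_vec = sim, article_vec

    # Start from the tweet's own terms, then add the article's terms,
    # each scaled by the tweet-article cosine similarity.
    expanded = {term: float(w) for term, w in tweet_vec.items()}
    if best_vec is not None:
        for term, w in best_vec.items():
            expanded[term] = expanded.get(term, 0.0) + best_sim * w
    return expanded, best_sim
```

In this sketch, an article outside the time window is never considered, even if it matches the tweet exactly, which is one simple way to contrast timely expansion material with older material. The expanded vectors would then feed whatever clustering method is used for event detection.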
Original language | English |
---|---|
Pages (from-to) | 1-1 |
Number of pages | 1 |
Journal | Tiny Transactions on Computer Science (TinyToCS) |
Volume | 2 |
Publication status | Published - 2013 |