Extracting a Topic Specific Dataset from a Twitter Archive

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Datasets extracted from the microblogging service Twitter are often generated using specific query terms or hashtags. We describe how a dataset produced using the query term 'syria' can be expanded to include tweets on the topic of Syria that do not contain that query term. We compare three methods for this task: using the top hashtags from the set as search terms, using a hand-selected set of hashtags as search terms, and using LDA topic modelling to cluster tweets and then selecting the appropriate clusters. We describe an evaluation method for assessing the relevance and accuracy of the tweets returned.
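As a rough illustration of the first method (top hashtags as search terms), the following minimal sketch counts hashtags in the seed set and uses the most frequent ones to pull further tweets from a larger archive. It is not the authors' implementation: the function names `top_hashtags` and `expand_dataset`, the representation of tweets as plain strings, and the choice to count each hashtag once per tweet are all assumptions made for the example.

```python
import re
from collections import Counter

# Simple hashtag pattern; real tweet tokenisation is more involved (assumption).
HASHTAG = re.compile(r"#(\w+)")


def hashtags(text):
    """Return the set of lower-cased hashtags in one tweet."""
    return {tag.lower() for tag in HASHTAG.findall(text)}


def top_hashtags(seed_tweets, k=10, exclude=("syria",)):
    """Return the k most frequent hashtags in the seed set.

    Each hashtag is counted once per tweet, and the original
    query term is excluded because it is already covered.
    """
    counts = Counter()
    for text in seed_tweets:
        counts.update(hashtags(text) - set(exclude))
    return [tag for tag, _ in counts.most_common(k)]


def expand_dataset(seed_tweets, archive, k=10):
    """Add archive tweets that carry at least one top hashtag."""
    wanted = set(top_hashtags(seed_tweets, k))
    seen = set(seed_tweets)
    extra = [t for t in archive if t not in seen and hashtags(t) & wanted]
    return list(seed_tweets) + extra
```

The hand-selected variant would differ only in replacing the output of `top_hashtags` with a manually curated list, which trades recall for fewer off-topic matches.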
Original language: English
Title of host publication: Research and Advanced Technology for Digital Libraries
Subtitle of host publication: 9th International Conference on Theory and Practice of Digital Libraries, TPDL 2015, Poznań, Poland, September 14-18, 2015, Proceedings
Publisher: Springer International Publishing
Pages: 364-367
Number of pages: 4
ISBN (Electronic): 978-3-319-24592-8
ISBN (Print): 978-3-319-24591-1
Publication status: Published - 28 Nov 2015

Publication series

Name: Lecture Notes in Computer Science
Publisher: Springer International Publishing
Volume: 9316
ISSN (Print): 0302-9743