Towards Using Word Embedding Vector Space for Better Cohort Analysis

Mohamed Bahgat, Steve Wilson, Walid Magdy

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution


On websites like Reddit, users join communities where they discuss specific topics, which clusters them into possible cohorts. The authors within these cohorts have the opportunity to post more openly under the blanket of anonymity, and such openness provides a more accurate signal on the real issues individuals are facing. Some communities contain discussions about mental health struggles such as depression and suicidal ideation. To better understand and analyse these individuals, we propose to exploit the property of word embeddings that places related concepts close to each other in the embedding space. For the posts from each topically situated sub-community, we build a word embedding model and use handcrafted lexicons to identify emotions, values and psycholinguistically relevant concepts. We then extract insights into the ways users perceive these concepts by measuring distances between them and references made by users to themselves, to others, or to other things around them. We show how our proposed approach can extract meaningful signals that go beyond the kinds of analyses performed at the individual word level.
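The distance measurement described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the distance measure, so cosine similarity is assumed here, and the toy vectors, the word lists, and the helper names `cosine` and `mean_similarity` are all hypothetical. In practice the vectors would come from an embedding model trained on one community's posts.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two 1-D vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def mean_similarity(embeddings, lexicon, references):
    """Average cosine similarity over all (lexicon word, reference word)
    pairs in which both words are present in the embedding vocabulary."""
    sims = [
        cosine(embeddings[w], embeddings[r])
        for w in lexicon if w in embeddings
        for r in references if r in embeddings
    ]
    return sum(sims) / len(sims) if sims else float("nan")

# Toy 2-D vectors standing in for a model trained on one community's posts
# (assumption: real vectors would have far more dimensions).
toy_embeddings = {
    "i": np.array([1.0, 0.0]),
    "me": np.array([0.9, 0.1]),
    "they": np.array([0.0, 1.0]),
    "sad": np.array([0.8, 0.2]),
    "lonely": np.array([0.7, 0.3]),
}

# Hypothetical lexicon; "grief" is out of vocabulary and is skipped.
sadness_lexicon = ["sad", "lonely", "grief"]
self_refs = ["i", "me"]      # references users make to themselves
other_refs = ["they"]        # references users make to others

self_score = mean_similarity(toy_embeddings, sadness_lexicon, self_refs)
other_score = mean_similarity(toy_embeddings, sadness_lexicon, other_refs)
print(f"sadness vs self:   {self_score:.3f}")
print(f"sadness vs others: {other_score:.3f}")
```

Comparing the two scores within one community, or the same score across communities, is one way to read how closely a concept sits to self-references versus references to others. With these toy vectors the sadness lexicon lies nearer the self-references by construction.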
Original language: English
Title of host publication: Proceedings of the International AAAI Conference on Web and Social Media
Place of Publication: Palo Alto, California USA
Publisher: AAAI Press
Number of pages: 5
ISBN (Electronic): 978-1-57735-823-7
Publication status: Published - 26 May 2020
Event: 14th International Conference on Web and Social Media - Atlanta, United States
Duration: 8 Jun 2020 to 11 Jun 2020
Conference number: 14

Publication series

ISSN (Print): 2162-3449
ISSN (Electronic): 2334-0770


Conference: 14th International Conference on Web and Social Media
Abbreviated title: ICWSM 2020
Country/Territory: United States

