Improving Word Embedding Coverage in Less-Resourced Languages Through Multi-Linguality and Cross-Linguality: A Case Study with Aspect-Based Sentiment Analysis

Md Shad Akhtar, Palaash Sawant, Sukanta Sen, Asif Ekbal, Pushpak Bhattacharyya

Research output: Contribution to journal › Article › peer-review

Abstract

In the era of deep learning-based systems, efficient input representation is a primary requisite for solving problems in Natural Language Processing (NLP), data mining, text mining, and related fields. The absence of an adequate representation for an input introduces data sparsity, which makes the underlying problem considerably harder to solve. The problem is intensified for resource-poor languages, which lack the sufficiently large corpora required to train a word embedding model. In this work, we propose an effective method to improve word embedding coverage in less-resourced languages by leveraging bilingual word embeddings learned from different corpora. We train and evaluate a deep Long Short-Term Memory (LSTM)-based architecture and show the effectiveness of the proposed approach on two aspect-level sentiment analysis tasks: aspect term extraction and sentiment classification. The neural network architecture is further assisted by hand-crafted features for prediction. We apply the proposed model in two experimental setups, multi-lingual and cross-lingual. Experimental results show the effectiveness of the proposed approach against state-of-the-art methods.
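The abstract does not spell out how bilingual embeddings raise coverage, so the following is only an illustrative sketch of one plausible reading, not the authors' method. It assumes that bilingual embeddings place words of both languages in one shared vector space, and that a hypothetical bilingual lexicon maps a word of the less-resourced language to a word of the resource-rich language. All names (`lookup_vector`, `coverage`, `shared_lr`, `shared_rr`, `lexicon`) and the toy data are invented for illustration.

```python
from typing import Dict, List, Optional

Vector = List[float]


def lookup_vector(
    word: str,
    shared_lr: Dict[str, Vector],
    shared_rr: Dict[str, Vector],
    lexicon: Dict[str, str],
) -> Optional[Vector]:
    """Return a vector for `word`, or None if no vector can be found.

    shared_lr: vectors for the less-resourced language (assumed shared space).
    shared_rr: vectors for the resource-rich language (same assumed space).
    lexicon:   hypothetical dictionary from less-resourced to resource-rich words.
    """
    # Direct hit: the word already has an embedding.
    if word in shared_lr:
        return shared_lr[word]
    # Fallback: borrow the vector of the word's translation, which is only
    # meaningful because both tables are assumed to share one space.
    translation = lexicon.get(word)
    if translation is not None and translation in shared_rr:
        return shared_rr[translation]
    return None


def coverage(
    tokens: List[str],
    shared_lr: Dict[str, Vector],
    shared_rr: Dict[str, Vector],
    lexicon: Dict[str, str],
) -> float:
    """Fraction of tokens for which lookup_vector finds a vector."""
    if not tokens:
        return 0.0
    found = sum(
        lookup_vector(t, shared_lr, shared_rr, lexicon) is not None
        for t in tokens
    )
    return found / len(tokens)


if __name__ == "__main__":
    # Toy data, invented for this sketch.
    tokens = ["khana", "accha", "sewa", "xyz"]
    shared_lr = {"khana": [0.1, 0.2]}
    shared_rr = {"good": [0.3, 0.4], "service": [0.5, 0.6]}
    lexicon = {"accha": "good", "sewa": "service"}

    print(coverage(tokens, shared_lr, {}, {}))              # without fallback
    print(coverage(tokens, shared_lr, shared_rr, lexicon))  # with fallback
```

In this toy setup the fallback lets more tokens receive a vector than the less-resourced table alone, which is the sense of "coverage" assumed here; the paper itself should be consulted for the actual procedure.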
Original language: English
Article number: 15
Number of pages: 22
Journal: ACM Transactions on Asian Language Information Processing
Volume: 18
Issue number: 2
Early online date: 17 Dec 2018
Publication status: Published - 1 Feb 2019

Keywords

  • bilingual word embeddings
  • low-resourced languages
  • Indian languages
  • Aspect-Based Sentiment Analysis (ABSA)
  • deep learning
  • Sentiment analysis
  • data sparsity
  • Long Short Term Memory (LSTM)
  • cross-lingual sentiment analysis
