Abstract
In the era of deep learning-based systems, efficient input representation is one of the primary requisites for solving various problems in Natural Language Processing (NLP), data mining, text mining, and related fields. The absence of an adequate representation for an input introduces the problem of data sparsity, which poses a great challenge to solving the underlying task. The problem is further intensified for resource-poor languages, which lack a sufficiently large corpus for training a word embedding model. In this work, we propose an effective method to improve word embedding coverage in less-resourced languages by leveraging bilingual word embeddings learned from different corpora. We train and evaluate a deep Long Short-Term Memory (LSTM)-based architecture and show the effectiveness of the proposed approach for two aspect-level sentiment analysis tasks (i.e., aspect term extraction and sentiment classification). The neural network architecture is further assisted by hand-crafted features for prediction. We apply the proposed model in two experimental setups: multilingual and cross-lingual. Experimental results show the effectiveness of the proposed approach against state-of-the-art methods.
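As a rough illustration of the coverage idea, the sketch below falls back to a projected source-language vector for out-of-vocabulary (OOV) target-language words. The function names, the bilingual lexicon format, and the least-squares projection (in the style of Mikolov et al.'s translation matrix) are illustrative assumptions, not the paper's exact method.

```python
import numpy as np

def learn_projection(src_vecs, tgt_vecs, lexicon):
    """Fit a linear map W from the source to the target embedding space
    by least squares over translation pairs found in both vocabularies.
    `lexicon` is assumed to be a list of (source_word, target_word) pairs."""
    pairs = [(s, t) for s, t in lexicon if s in src_vecs and t in tgt_vecs]
    X = np.stack([src_vecs[s] for s, _ in pairs])  # (n_pairs, d_src)
    Y = np.stack([tgt_vecs[t] for _, t in pairs])  # (n_pairs, d_tgt)
    W, *_ = np.linalg.lstsq(X, Y, rcond=None)      # (d_src, d_tgt)
    return W

def lookup(word, tgt_vecs, src_vecs, translate, W, dim):
    """Return the target-language vector for `word`; for OOV words, fall
    back to the projected source-language vector of its translation."""
    if word in tgt_vecs:
        return tgt_vecs[word]
    src_word = translate.get(word)       # hypothetical target-to-source dictionary
    if src_word in src_vecs:
        return src_vecs[src_word] @ W    # projected bilingual fallback
    return np.zeros(dim)                 # unresolved OOV: sparsity remains
```

The design choice here is a fallback chain: use the monolingual vector when available, otherwise the projected vector of a bilingual translation, and only as a last resort a zero vector, so the downstream LSTM sees far fewer uninformative inputs.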
Field | Value |
---|---|
Original language | English |
Article number | 15 |
Number of pages | 22 |
Journal | ACM Transactions on Asian and Low-Resource Language Information Processing |
Volume | 18 |
Issue number | 2 |
Early online date | 17 Dec 2018 |
DOIs | |
Publication status | Published - 1 Feb 2019 |
Keywords
- bilingual word embeddings
- low-resourced languages
- Indian languages
- Aspect-Based Sentiment Analysis (ABSA)
- deep learning
- sentiment analysis
- data sparsity
- Long Short-Term Memory (LSTM)
- cross-lingual sentiment analysis