Electronic Health Record (EHR) narratives are a rich source of information, embedding high-resolution detail of value for secondary research use. However, because EHRs consist mostly of free-text natural language and are highly ambiguous, many natural language processing algorithms have been devised to extract meaningful structured information about clinical entities from them. The performance of these algorithms, however, varies widely depending on the training dataset and on how effectively background knowledge is used to steer the learning process.
In this paper we study the impact of initializing the training of a neural network natural language processing algorithm with pre-defined clinical word embeddings to improve feature extraction and relationship classification between entities. We add our embedding framework to a bi-directional long short-term memory (Bi-LSTM) neural network, and further study the effect of using attention weights in neural networks for sequence labelling tasks to extract knowledge of Adverse Drug Reactions (ADRs). We incorporate unsupervised word embeddings, trained with Word2Vec and GloVe, from widely available medical resources such as the Multiparameter Intelligent Monitoring in Intensive Care (MIMIC) II corpora and the Unified Medical Language System (UMLS), as well as a pharmaco-lexicon embedded from available EHRs. Evaluated on two datasets, our architecture outperforms both baseline Bi-LSTM networks and Bi-LSTM networks combined with linear-chain and skip-chain conditional random fields (CRFs).
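As context for the approach described above, the following sketch shows one common way to initialize a model's embedding layer from pre-trained clinical word vectors, with random initialization as a fallback for out-of-vocabulary terms. This is an illustrative assumption on our part, not the paper's actual code: the toy `pretrained` dictionary stands in for a Word2Vec/GloVe model trained on resources such as MIMIC, and the tiny dimensionality is for demonstration only.

```python
import numpy as np

EMB_DIM = 4  # toy dimensionality; real clinical embeddings are much larger (e.g. 200-d)

# Hypothetical pre-trained vectors (stand-in for a Word2Vec/GloVe model)
pretrained = {
    "aspirin": np.array([0.1, 0.2, 0.3, 0.4]),
    "rash":    np.array([0.5, 0.1, 0.0, 0.2]),
}

def build_embedding_matrix(vocab, pretrained, dim, seed=0):
    """Return a |vocab| x dim matrix whose row i is the vector for vocab[i]."""
    rng = np.random.default_rng(seed)
    matrix = np.empty((len(vocab), dim))
    for i, word in enumerate(vocab):
        if word in pretrained:
            matrix[i] = pretrained[word]               # copy pre-trained vector
        else:
            matrix[i] = rng.uniform(-0.25, 0.25, dim)  # random init for OOV words
    return matrix

vocab = ["aspirin", "caused", "rash"]
E = build_embedding_matrix(vocab, pretrained, EMB_DIM)
```

The resulting matrix would then be used to initialize the embedding layer of a sequence-labelling network such as a Bi-LSTM, where it is typically fine-tuned during training.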
Publication status: Published - Jul 2017
Event: DH '17: Proceedings of the 2017 International Conference on Digital Health
Duration: 2 Jul 2017 → 5 Jul 2017