Collaborative Recognition and Recovery of the Chinese Intercept Abbreviation

Jinshuo Liu, Yusen Chen, Juan Deng, Donghong Ji, Jeff Pan

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

A key task in information content security is evaluating the theme words of a text. The variety of Chinese expression, and of abbreviations in particular, makes the supervision of theme words harder. The goal of this paper is to quickly and accurately discover intercept abbreviations in text crawled over a short time period. The paper first segments the target texts and then uses a Support Vector Machine (SVM) to recognize abbreviation candidates among the wrongly segmented texts. Second, it presents the collaborative methods: an improved Conditional Random Fields (CRF) model predicts the word corresponding to each character of the abbreviation; and, to solve the problem of the 1:n relationship, the ranking list from the prediction step is merged with the matched results from a thesaurus of abbreviations. The experiments show that the recognition stage achieves 76.5% accuracy and a 77.8% recall rate. At the recovery stage, the accuracy is 62.1%, which is 20.8% higher than that of the method based on the Hidden Markov Model (HMM).
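The abstract does not say how the ranking list and the thesaurus matches are combined. The following is only a minimal sketch of one plausible merge, not the authors' method: it assumes the prediction step yields (expansion, score) pairs, that a thesaurus lookup yields a plain list of full forms, and that a thesaurus hit adds a fixed bonus to the score. The function name `merge_candidates`, the `bonus` parameter, and the example candidates are all hypothetical.

```python
from typing import Dict, List, Tuple


def merge_candidates(
    predicted: List[Tuple[str, float]],
    thesaurus_matches: List[str],
    bonus: float = 0.5,  # assumed weight for a thesaurus hit, not from the paper
) -> List[Tuple[str, float]]:
    """Merge model-ranked expansions with thesaurus matches into one ranking."""
    scores: Dict[str, float] = {}
    for expansion, score in predicted:
        # If the model proposes the same expansion twice, keep its best score.
        scores[expansion] = max(score, scores.get(expansion, float("-inf")))
    for expansion in thesaurus_matches:
        # A thesaurus hit boosts an existing candidate or introduces a new one.
        scores[expansion] = scores.get(expansion, 0.0) + bonus
    # Highest score first; ties are broken by the expansion string for stability.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Illustrative candidates for the abbreviation "北大" (scores are made up).
predicted = [("北京大学", 0.6), ("北京大楼", 0.3)]
thesaurus_matches = ["北京大学", "北方大学"]
for expansion, score in merge_candidates(predicted, thesaurus_matches):
    print(expansion, round(score, 2))
```

In this sketch a candidate supported by both sources rises to the top, while a candidate found only in the thesaurus still enters the list, which is one simple way to cope with an abbreviation that maps to several full forms.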
Original language: English
Title of host publication: Chinese Computational Linguistics and Natural Language Processing Based on Naturally Annotated Big Data
Subtitle of host publication: 16th China National Conference, CCL 2017, and 5th International Symposium, NLP-NABD 2017, Nanjing, China, October 13-15, 2017, Proceedings
Editors: Maosong Sun, Xiaojie Wang, Baobao Chang, Deyi Xiong
Place of Publication: Cham
Publisher: Springer
Pages: 224-236
Number of pages: 13
ISBN (Electronic): 978-3-319-69005-6
ISBN (Print): 978-3-319-69005-6
Publication status: Published - 7 Oct 2017
Event: The 16th China National Conference on Computational Linguistics & The 5th International Symposium on Natural Language Processing Based on Naturally Annotated Big Data - Nanjing, China
Duration: 13 Oct 2017 → 15 Oct 2017
http://www.cips-cl.org/static/CCL2017/en/home.html

Publication series

Name: Lecture Notes in Computer Science
Publisher: Springer, Cham
Volume: 10565
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: The 16th China National Conference on Computational Linguistics & The 5th International Symposium on Natural Language Processing Based on Naturally Annotated Big Data
Abbreviated title: CCL 2017 and NLP-NABD 2017
Country/Territory: China
City: Nanjing
Period: 13/10/17 → 15/10/17

Keywords / Materials (for Non-textual outputs)

  • Collaborative recovery
  • Improved CRF
  • Chinese abbreviation
