Integrating character representations into Chinese word embedding

Chen, Xingyuan, Jin, Peng, McCarthy, Diana Frances and Carroll, John (2016) Integrating character representations into Chinese word embedding. In: Dong, Minghui, Lin, Jingxia and Tang, Xuri (eds.) Chinese lexical semantics: 17th workshop, CLSW 2016, Singapore, Singapore, May 20–22, 2016, revised selected papers. Lecture notes in computer science, 10085. Springer International Publishing, pp. 335-349. ISBN 9783319495071

Full text not available from this repository.


In this paper we propose a novel word representation for Chinese based on a state-of-the-art word embedding approach. Our main contribution is to integrate distributional representations of Chinese characters into the word embedding. Recent related work on European languages has demonstrated that information from inflectional morphology can alleviate data sparsity and improve word representations. Chinese has very little inflectional morphology, but there is potential for incorporating character-level information. Chinese characters are drawn from a fixed set (just under four thousand are in common usage), but a major problem with using characters is their ambiguity. To address this problem, we disambiguate the characters according to groupings in a semantic hierarchy. When our character embeddings are coupled with word embeddings, we observe improved performance on the tasks of finding synonyms and rating word similarity compared with a model using word embeddings alone, especially for low-frequency words.
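The abstract does not say exactly how character and word vectors are combined, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes a simple averaging scheme (similar in spirit to character-enhanced word embedding models): a word's final vector is half its own vector plus half the mean of its character vectors. To reflect the disambiguation step, each character vector is keyed by a (character, semantic class) pair, so an ambiguous character such as 打 ("hit" vs. "make a call") has one vector per sense group. The tables `CHAR_SENSE_EMB` and `WORD_CHAR_SENSES`, the sense labels, and all vector values are hypothetical toy data; in practice the sense assignment would come from a semantic hierarchy and the vectors from training.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]

# Hypothetical toy character embeddings (not from the paper), keyed by
# (character, semantic class) so an ambiguous character has one vector per sense group.
CHAR_SENSE_EMB: Dict[Tuple[str, str], Vector] = {
    ("打", "hit"): [0.9, 0.1, 0.0],
    ("打", "make_call"): [0.0, 0.8, 0.2],
    ("电", "electric"): [0.1, 0.7, 0.3],
    ("话", "speech"): [0.0, 0.6, 0.5],
}

# Hypothetical sense assignment for the characters of each word,
# assumed to be derived from the word's category in a semantic hierarchy.
WORD_CHAR_SENSES: Dict[str, List[Tuple[str, str]]] = {
    "打电话": [("打", "make_call"), ("电", "electric"), ("话", "speech")],
}


def mean(vectors: Sequence[Vector]) -> Vector:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def compose(word: str, word_vec: Vector) -> Vector:
    """Return 0.5 * (word vector + mean of its disambiguated character vectors).

    Falls back to a copy of the word vector when no character senses are known.
    """
    keys = WORD_CHAR_SENSES.get(word, [])
    char_vecs = [CHAR_SENSE_EMB[k] for k in keys if k in CHAR_SENSE_EMB]
    if not char_vecs:
        return list(word_vec)
    char_mean = mean(char_vecs)
    return [0.5 * (w + c) for w, c in zip(word_vec, char_mean)]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity, as commonly used to rate word similarity; 0.0 for zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


if __name__ == "__main__":
    # Toy word vector for a low-frequency word; character information pulls it
    # toward the "make a call" sense rather than the "hit" sense of 打.
    combined = compose("打电话", [0.2, 0.4, 0.1])
    print(combined)
    print(cosine(combined, CHAR_SENSE_EMB[("打", "make_call")]))
```

Keying character vectors by sense group rather than by character alone is what lets the composition avoid mixing unrelated meanings of the same character; the equal 0.5 weighting is an arbitrary illustrative choice.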

Item Type: Book Section
Schools and Departments: School of Engineering and Informatics > Informatics
Research Centres and Groups: Data Science Research Group
Subjects: Q Science > QA Mathematics > QA0075 Electronic computers. Computer science
Depositing User: John Carroll
Date Deposited: 22 Feb 2017 15:38
Last Modified: 22 Feb 2017 15:38