Named Entity Recognition (NER) in Chinese social media has grown in importance with the development of the internet. Previous methods focus on in-domain supervised learning, which is limited by the scarcity of annotated data. However, ample corpora exist in formal domains, and massive amounts of unannotated in-domain text are available; both can be used to improve the task. This paper proposes a unified model that learns from out-of-domain corpora and in-domain unannotated text. The model comprises two functions: one for cross-domain learning and one for semi-supervised learning. The cross-domain learning function learns out-of-domain information based on domain similarity. The semi-supervised learning function learns from in-domain unannotated text by self-training. Both learning functions outperform existing methods for NER in Chinese social media. In experiments, the unified model improves recognition results and reduces the workload of manually annotating corpora.