Active Learning for Chinese Word Segmentation

Abstract

Currently, the best-performing models for Chinese word segmentation (CWS) are extremely resource-intensive in terms of the quantity of annotated data they require. One promising way to reduce the cost of data acquisition is active learning, which selects the most useful instances to annotate for learning. Active learning on CWS, however, remains challenging due to the inherent nature of the task. In this paper, we propose a Word Boundary Annotation (WBA) model to make effective active learning on CWS possible. This is achieved by annotating only the uncertain boundaries. In this way, the manual annotation cost is greatly reduced compared to annotating the whole character sequence. To further reduce the annotation effort, a diversity measurement among the instances is used to avoid duplicate annotation. Experimental results show that incorporating the WBA model and the diversity measurement into active learning on CWS saves substantial annotation cost with little loss in performance.
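
The selection idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the uncertainty or diversity measures, so this sketch assumes a hypothetical `predict(sentence, i)` function that returns the model's probability of a word boundary after character `i`, scores uncertainty as closeness of that probability to 0.5, and treats two boundaries as duplicates when the two characters around them are identical. The names `uncertainty`, `context`, and `select_boundaries` are illustrative only.

```python
from typing import Callable, List, Tuple

# A candidate is one gap between two adjacent characters:
# (sentence, i) means the gap between sentence[i] and sentence[i + 1].
Candidate = Tuple[str, int]


def uncertainty(p_boundary: float) -> float:
    """Assumed measure: 1.0 when the model is undecided (p = 0.5), 0.0 when it is sure."""
    return 1.0 - abs(2.0 * p_boundary - 1.0)


def context(sentence: str, i: int, width: int = 1) -> str:
    """Assumed similarity key: the `width` characters on each side of the gap."""
    return sentence[max(0, i - width + 1): i + 1 + width]


def select_boundaries(
    sentences: List[str],
    predict: Callable[[str, int], float],
    budget: int,
) -> List[Candidate]:
    """Pick up to `budget` uncertain gaps, skipping gaps whose context was already picked."""
    scored = []
    for s in sentences:
        for i in range(len(s) - 1):
            scored.append((uncertainty(predict(s, i)), s, i))
    # Most uncertain first; the sort is stable, so ties keep corpus order.
    scored.sort(key=lambda t: -t[0])

    chosen: List[Candidate] = []
    seen_contexts = set()
    for _, s, i in scored:
        if len(chosen) >= budget:
            break
        key = context(s, i)
        if key in seen_contexts:  # diversity filter: avoid duplicate annotation
            continue
        seen_contexts.add(key)
        chosen.append((s, i))
    return chosen
```

Only the gaps returned by `select_boundaries` would be shown to an annotator, who answers "boundary" or "not a boundary" for each; the remaining gaps keep the model's own predictions. A real system would likely replace the exact-match context key with a graded similarity score.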

Bibliographic record

Source
International Conference on Computational Linguistics | 2012 | pp. 683-692 | 10 pages
Authors
Shoushan Li; Guodong Zhou; Chu-Ren Huang
Original format: PDF
Keywords
Chinese Word Segmentation; Active Learning; Word Boundary Annotation

Similar literature

1. Hybrid Feature Fusion Learning Towards Chinese Chemical Literature Word Segmentation [J]. Xiang Li, Kewen Zhang, Quanyin Zhu, Quality Control, Transactions. 2021, No. 1
2. Learning Chinese Word Segmentation Based on Bidirectional GRU-CRF and CNN Network Model [J]. Chenghai Yu, Shupei Wang, Jiajun Guo. International journal of technology and human interaction. 2019, No. 3
3. Simple Semi-supervised Learning for Chinese Word Segmentation and Pos Tagging [J]. Xinxin Li, Xuan Wang, Muhammad Waqas Anwar. Information Technology Journal. 2013, No. 20
4. Active Learning for Chinese Word Segmentation on Judgements [C]. Qian Yan, Limin Wang, Shoushan Li, Natural language understanding and intelligent applications. 2017
5. Experimental comparison of discriminative learning approaches for Chinese word segmentation [D]. Song, Dong. 2008
6. Rapid Cortical Plasticity Induced by Active Associative Learning of Novel Words in Human Adults [O]. Alexandra M. Razorenova, Boris V. Chernyshev, Anastasia Yu Nikolaeva, 2020
7. Introduction to CKIP Chinese word segmentation system for the first international Chinese Word Segmentation Bakeoff [O]. Wei-yun Ma. 2003
