Using cross ambiguity model improves the effect of Vietnamese word segmentation

Niu Yitong; Xiong Mingming; Guo Jianyi; Mao Cunli; Xian Yantuan; Yu Zhengtao

Abstract

Ambiguity is widespread in Vietnamese sentences and reduces the accuracy of word segmentation. In this paper, we propose a Vietnamese word segmentation method based on conditional random fields (CRFs) and a cross ambiguity model, combined with Vietnamese lexical features so that essential characteristics of Vietnamese are incorporated into the CRF. In total, 5377 ambiguity fragments were extracted from the training corpus. Statistical features, features internal to the ambiguity field, and ambiguity context features were selected and fed into the maximum entropy model and the cross ambiguity model, which were then incorporated into the segmentation model. The training corpus was divided evenly into ten parts for cross-validation, and segmentation accuracy reached 96.55%. The experimental results suggest that the proposed method segments Vietnamese precisely and compares well with the Vietnamese segmentation tool VnTokenizer: precision and recall increase by 1.34% and 0.63%, respectively, over VnTokenizer, and the alignment error rate (AER) is reduced by 0.98%.
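The abstract does not spell out how ambiguity fragments are extracted, so the following is only a minimal sketch of the general idea of cross (overlapping) ambiguity, under stated assumptions: a sentence is a list of syllables, and a fragment is ambiguous when two lexicon words overlap without one containing the other. The function name `find_cross_ambiguities`, the `max_len` parameter, and the two-entry toy lexicon are hypothetical and are not taken from the paper.

```python
from typing import List, Set, Tuple


def find_cross_ambiguities(
    syllables: List[str], lexicon: Set[Tuple[str, ...]], max_len: int = 3
) -> List[Tuple[int, int]]:
    """Return (start, end) syllable spans covered by two overlapping lexicon words."""
    # Collect every multi-syllable lexicon word that occurs in the sentence.
    spans = []
    n = len(syllables)
    for i in range(n):
        for j in range(i + 2, min(n, i + max_len) + 1):
            if tuple(syllables[i:j]) in lexicon:
                spans.append((i, j))

    # Two spans cross when they share syllables but neither contains the other.
    fragments = []
    for a_start, a_end in spans:
        for b_start, b_end in spans:
            if a_start < b_start < a_end < b_end:
                fragments.append((a_start, b_end))
    return fragments


# Toy lexicon (assumption): "học sinh" (student) and "sinh học" (biology).
lexicon = {("học", "sinh"), ("sinh", "học")}
sentence = "học sinh học sinh học".split()
fragments = find_cross_ambiguities(sentence, lexicon)
for start, end in fragments:
    print(" ".join(sentence[start:end]))
```

In a pipeline like the one described, fragments found this way would be the candidates for which a disambiguation model chooses between the competing segmentations; the features and models the authors actually use are only named, not detailed, in the abstract.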

Bibliographic details

Source
International Journal of Computer Systems Science & Engineering | 2016, Issue 6 | pp. 475-484 | 10 pages
Authors
Niu Yitong; Xiong Mingming; Guo Jianyi; Mao Cunli; Xian Yantuan; Yu Zhengtao;
Author affiliations

Kunming Univ Sci & Technol, Sch Informat Engn & Automat, Kunming 650500, Peoples R China;

Kunming Univ Sci & Technol, Sch Informat Engn & Automat, Kunming 650500, Peoples R China;

Kunming Univ Sci & Technol, Sch Informat Engn & Automat, Kunming 650500, Peoples R China|Yunnan Coll, Key Lab Pattern Recognit & Intelligent Comp, Kunming 650500, Peoples R China;

Kunming Univ Sci & Technol, Sch Informat Engn & Automat, Kunming 650500, Peoples R China|Yunnan Coll, Key Lab Pattern Recognit & Intelligent Comp, Kunming 650500, Peoples R China;

Kunming Univ Sci & Technol, Sch Informat Engn & Automat, Kunming 650500, Peoples R China|Yunnan Coll, Key Lab Pattern Recognit & Intelligent Comp, Kunming 650500, Peoples R China;

Kunming Univ Sci & Technol, Sch Informat Engn & Automat, Kunming 650500, Peoples R China|Yunnan Coll, Key Lab Pattern Recognit & Intelligent Comp, Kunming 650500, Peoples R China;

Original format: PDF
Language: English
Keywords
Vietnamese corpus; CRFs; Vietnamese segmentation; Maximum Entropy; Cross ambiguity model; VnTokenizer;
