Journal: Pattern Recognition: The Journal of the Pattern Recognition Society

A LANGUAGE MODEL BASED ON SEMANTICALLY CLUSTERED WORDS IN A CHINESE CHARACTER RECOGNITION SYSTEM



Abstract

This paper presents a new method for clustering the words in a dictionary into word groups. A Chinese character recognition system can then use these groups in a language model to improve recognition accuracy, while the number of parameters that must be trained beforehand is kept to a reasonable value. The Chinese synonym dictionary Tong2yi4ci2 ci2lin2, which provides the semantic features, is used to calculate the weights of the semantic attributes of the character-based word classes. These weights are then updated according to the words of the Behavior dictionary, which has a rather complete word set. Next, the word classes are clustered into m groups by a greedy method according to a semantic measure, and the words in the Behavior dictionary are finally assigned to the m groups. The parameter space for the bigram contextual information of the character recognition system is therefore m^2. In the experiments, the recognition system with the proposed model performed better than one using a character-based bigram language model. © 1997 Pattern Recognition Society. Published by Elsevier Science Ltd. [References: 9]
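To illustrate why grouping words keeps the bigram parameter space at m^2, the sketch below estimates a group-level bigram table from a toy corpus. It is not the paper's method: the abstract does not specify the greedy clustering procedure or the semantic measure, so the word-to-group mapping, the toy sentences, the function names, and the add-alpha smoothing are all assumptions made for illustration.

```python
def train_group_bigram(sentences, word_to_group, m, alpha=1.0):
    """Estimate P(next_group | prev_group) over m groups.

    Add-alpha smoothing is an assumption of this sketch, not something
    stated in the abstract.
    """
    # One count per ordered pair of groups: an m x m table.
    counts = [[0.0] * m for _ in range(m)]
    for words in sentences:
        groups = [word_to_group[w] for w in words]
        for prev, nxt in zip(groups, groups[1:]):
            counts[prev][nxt] += 1.0

    probs = []
    for row in counts:
        total = sum(row) + alpha * m
        probs.append([(c + alpha) / total for c in row])
    return probs


def parameter_count(m):
    """Number of bigram parameters when contexts are groups, not words."""
    return m * m


# Hypothetical mapping of words to m = 3 groups (in the paper this would
# come from the semantic clustering step).
word_to_group = {"我": 0, "你": 0, "吃": 1, "喝": 1, "饭": 2, "茶": 2}
sentences = [["我", "吃", "饭"], ["你", "喝", "茶"]]

probs = train_group_bigram(sentences, word_to_group, m=3)
```

With this toy data, both sentences contribute a transition from group 0 to group 1, so `probs[0][1]` is (2 + 1) / (2 + 3) = 0.6, and the whole model has `parameter_count(3)` = 9 entries regardless of how many words the dictionary holds.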
