International Conference on Intelligent Computing

Segmentation of Mixed Chinese/English Documents Based on Chinese Radicals Recognition and Complexity Analysis in Local Segment Pattern


Abstract

Segmentation based on character recognition is one of the most popular methods for segmenting mixed Chinese/English documents. However, rejecting outliers has always been the bottleneck of this approach. This paper proposes a new method to alleviate the problem. We assign a language attribute to as many segments as possible and then merge or split segments according to that attribute. First, we construct a mixed OCR engine covering Chinese radicals, English characters, and some English character pairs. Next, the engine is trained with English negative samples to improve its ability to reject outliers. Finally, the language of each segment is determined using the mixed OCR engine together with a complexity analysis of the local segment pattern. Test results show encouraging performance.
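
The abstract does not give implementation details, so the following is only a minimal sketch of the two ideas it names: a complexity measure on a local pattern, and merging segments by language attribute. Everything in it is an assumption made for illustration: the `Segment` type, the `"zh"`/`"en"` labels, the use of background-to-foreground transitions per row as the complexity measure, and the `max_char_width` merging rule. None of these are taken from the paper.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Segment:
    """Hypothetical horizontal segment of a text line (pixel columns)."""
    left: int
    right: int
    lang: Optional[str]  # "zh", "en", or None when undecided


def transition_complexity(rows: List[List[int]]) -> float:
    """Mean number of 0 -> 1 transitions per row of a binary patch.

    Assumed stand-in for a "complexity" measure: dense Chinese strokes
    tend to cross a scanline more often than Latin letters do.
    """
    if not rows:
        return 0.0
    total = 0
    for row in rows:
        prev = 0
        for px in row:
            if px == 1 and prev == 0:
                total += 1
            prev = px
    return total / len(rows)


def merge_chinese_parts(segments: List[Segment],
                        max_char_width: int) -> List[Segment]:
    """Merge adjacent "zh" segments that fit within one character width.

    Illustrative rule only: radicals of one Chinese character are often
    cut apart, so neighbouring Chinese pieces are rejoined when the
    combined span stays within the assumed character width.
    """
    merged: List[Segment] = []
    for seg in segments:
        prev = merged[-1] if merged else None
        if (prev is not None and prev.lang == "zh" and seg.lang == "zh"
                and seg.right - prev.left <= max_char_width):
            merged[-1] = Segment(prev.left, seg.right, "zh")
        else:
            merged.append(seg)
    return merged
```

In this sketch, two neighbouring Chinese-labelled pieces spanning columns 0-10 and 11-22 would be joined into one segment when `max_char_width` is 24, while English-labelled segments are left untouched. A real system would also need the splitting step for touching English characters, which is omitted here.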
