
Partitioning of the degradation space for OCR training


Abstract

Optical character recognition algorithms generally perform better when presented with homogeneous data. This paper studies a method designed to increase the homogeneity of training data, based on an understanding of the types of degradation that occur during printing and scanning and of how these degradations affect the homogeneity of the data. Dividing the degradation space by edge spread has been shown to improve recognition accuracy over dividing it by threshold or point spread function width alone; the challenge lies in deciding how many partitions to use and at what values of edge spread to place the divisions. Clustering across different types of character features, fonts, sizes, resolutions, and noise levels shows that edge spread is indeed a strong indicator of the homogeneity of character data clusters.
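As a rough illustration of partitioning by edge spread, the sketch below maps each sample's threshold and point spread function width to a single edge-spread value and then splits the samples into equal-frequency partitions. It is not the paper's procedure. The edge-spread formula (width times the negated inverse of a Gaussian edge spread function evaluated at the threshold), the Gaussian point spread function, the equal-frequency splitting rule, and all function names are assumptions made for illustration only.

```python
from bisect import bisect_right
from statistics import NormalDist, quantiles


def edge_spread(width, threshold):
    """Assumed definition: delta_c = -w * ESF^{-1}(theta).

    ESF is taken to be the standard normal CDF, i.e. a Gaussian point
    spread function of width w. Both choices are assumptions here.
    """
    return -width * NormalDist().inv_cdf(threshold)


def partition_boundaries(values, k):
    """Return k - 1 cut points that split values into k equal-frequency bins.

    Equal-frequency splitting is an illustrative choice, not the paper's rule.
    """
    if k < 2:
        return []
    return quantiles(values, n=k)


def assign_partition(value, boundaries):
    """Return the index (0 .. len(boundaries)) of the bin containing value."""
    return bisect_right(boundaries, value)


if __name__ == "__main__":
    # Hypothetical (width, threshold) pairs describing degraded samples.
    samples = [(0.5, 0.3), (1.0, 0.4), (1.5, 0.5), (2.0, 0.6), (2.5, 0.7), (3.0, 0.2)]
    spreads = [edge_spread(w, t) for w, t in samples]
    cuts = partition_boundaries(spreads, k=3)
    for (w, t), d in zip(samples, spreads):
        print(f"w={w}, theta={t}, partition={assign_partition(d, cuts)}")
```

Each partition would then supply training data for its own classifier. How many partitions to use, and where to place the boundaries, is exactly the open question the abstract raises, so `k` and the splitting rule are left as free parameters.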
