Conference: IAPR International Workshop on Document Analysis Systems

Improving OCR Accuracy on Early Printed Books by Utilizing Cross Fold Training and Voting


Abstract

This paper introduces a method that significantly reduces the character error rate of OCR text produced by OCRopus models trained on early printed books. The method combines cross fold training with confidence-based voting. The available ground truth is first allocated to different subsets, and several training processes are then performed, each resulting in a specific OCR model. The OCR texts generated by these models are then combined by voting: the final output is determined by taking into account the recognized characters, their alternatives, and the confidence value assigned to each character. Experiments on seven early printed books show that the proposed method considerably outperforms the standard approach, reducing the number of errors in some cases by 50% or more.
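The two steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the number of folds, how the subsets are used in training, how the outputs of the models are aligned, or how confidences are combined. The sketch assumes five folds, one model per fold trained on the remaining folds, outputs that are already aligned position by position, and a vote that simply sums the confidences per character. All function names (`make_folds`, `training_sets`, `vote_position`, `vote_line`) are hypothetical.

```python
from collections import defaultdict


def make_folds(lines, n_folds=5):
    """Split ground-truth lines into n_folds disjoint subsets (round-robin).

    Assumption: the abstract only says the ground truth is allocated to
    "different subsets"; the round-robin split and n_folds=5 are illustrative.
    """
    return [lines[i::n_folds] for i in range(n_folds)]


def training_sets(folds):
    """Yield one (train, held_out) pair per fold.

    Assumption: each model is trained on all folds except one, and the
    held-out fold is available for validation or model selection.
    """
    for i, held_out in enumerate(folds):
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, held_out


def vote_position(candidates_per_model):
    """Pick one character for a single aligned position.

    candidates_per_model holds one dict per model, mapping each candidate
    character (top choice and alternatives) to its confidence.
    Assumption: confidences are summed per character and the largest sum wins.
    """
    totals = defaultdict(float)
    for candidates in candidates_per_model:
        for char, confidence in candidates.items():
            totals[char] += confidence
    return max(totals, key=totals.get)


def vote_line(aligned_positions):
    """Vote position by position over outputs that are already aligned."""
    return "".join(vote_position(p) for p in aligned_positions)


if __name__ == "__main__":
    folds = make_folds(list(range(10)), n_folds=5)
    for train, held_out in training_sets(folds):
        print("train on", train, "hold out", held_out)

    # Two models weakly prefer "e", one model strongly prefers "c".
    position = [{"e": 0.55, "c": 0.45}, {"e": 0.51, "c": 0.49}, {"c": 0.99}]
    print(vote_position(position))
```

In the last example a plain majority vote over the top choices would pick "e", while the confidence sums favor "c". This shows why the alternatives and their confidence values matter to the vote.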
