A Two-Step Approach for Automatic OCR Post-Correction

Abstract

The quality of Optical Character Recognition (OCR) is a key factor in the digitisation of historical documents. OCR errors are a major obstacle for downstream tasks and have hindered wider use of digitised documents. In this paper we present a two-step approach to automatic OCR post-correction. The first component detects erroneous sequences in a set of OCRed texts, and the second corrects the OCR errors in the sequences that were flagged. We show that, compared with a simple one-step correction model, running the detection model first reduces both the character error rate (CER) and the number of correct characters that are falsely changed.
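To make the evaluation metric and the detect-then-correct flow concrete, here is a minimal sketch. It assumes CER is the character-level Levenshtein distance divided by the reference length, which is the common definition; the abstract does not spell out the formula. The names `levenshtein`, `cer`, `two_step_correct`, `toy_detector`, and `toy_corrector` are hypothetical, and the lookup-table detector and corrector are toy stand-ins for illustration only, not the models used in the paper.

```python
from typing import Callable, List


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit cost for insertion, deletion, substitution."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def cer(hypothesis: str, reference: str) -> float:
    """Character error rate: edit distance normalised by reference length."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return levenshtein(hypothesis, reference) / len(reference)


def two_step_correct(
    sequences: List[str],
    is_erroneous: Callable[[str], bool],
    correct: Callable[[str], str],
) -> List[str]:
    """Step 1: flag sequences. Step 2: correct only the flagged ones.

    Unflagged sequences pass through untouched, so the corrector cannot
    alter characters in text the detector considered clean.
    """
    return [correct(s) if is_erroneous(s) else s for s in sequences]


# Toy stand-ins (assumption): a fixed table of known OCR confusions.
KNOWN_FIXES = {"qnick": "quick"}


def toy_detector(seq: str) -> bool:
    return any(tok in KNOWN_FIXES for tok in seq.split())


def toy_corrector(seq: str) -> str:
    return " ".join(KNOWN_FIXES.get(tok, tok) for tok in seq.split())


ocr_output = ["the qnick brown", "fox jumps"]
ground_truth = ["the quick brown", "fox jumps"]
corrected = two_step_correct(ocr_output, toy_detector, toy_corrector)
```

In this sketch the second sequence is never passed to the corrector, which illustrates why a detection step can lower the number of falsely changed correct characters: only flagged text is exposed to modification.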
