A Two-Step Approach for Automatic OCR Post-Correction

Abstract

The quality of Optical Character Recognition (OCR) is a key factor in the digitisation of historical documents. OCR errors are a major obstacle for downstream tasks and have hindered wider use of digitised documents. In this paper we present a two-step approach to automatic OCR post-correction. The first component detects erroneous sequences in a set of OCRed texts, while the second corrects the OCR errors in them. We show that running the detection model first reduces the character error rate (CER) relative to a simple one-step correction model, and also reduces the number of correct characters that are falsely changed.
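The abstract's two ideas, the CER metric and the detect-then-correct gating, can be illustrated with a minimal sketch. It assumes the common definition of CER as character-level edit distance divided by reference length. The names `two_step_correct`, `toy_detect`, `toy_correct` and `LEXICON` are hypothetical stand-ins: the abstract does not describe the actual detection and correction models, so the toy functions below only mimic their roles.

```python
from typing import Callable, List


def levenshtein(a: str, b: str) -> int:
    """Character-level edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(hypothesis: str, reference: str) -> float:
    """Character error rate: edit distance normalised by reference length."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return levenshtein(hypothesis, reference) / len(reference)


def two_step_correct(sequences: List[str],
                     detect: Callable[[str], bool],
                     correct: Callable[[str], str]) -> List[str]:
    """Apply the corrector only to sequences the detector flags as erroneous."""
    return [correct(s) if detect(s) else s for s in sequences]


# Toy stand-ins (assumptions, not the paper's models).
LEXICON = {"the", "past", "a", "fine", "day"}


def toy_detect(seq: str) -> bool:
    # Flag a sequence if any token is missing from the lexicon.
    return any(tok not in LEXICON for tok in seq.split())


def toy_correct(seq: str) -> str:
    # Over-eager corrector: rewrites every 'f' as 's'.
    return seq.replace("f", "s")


ocr_output = ["the paft", "a fine day"]
one_step = [toy_correct(s) for s in ocr_output]
two_step = two_step_correct(ocr_output, toy_detect, toy_correct)
```

In this toy setting the one-step corrector damages the already-correct sequence "a fine day", while the gated version leaves it untouched, which is the kind of falsely changed character the abstract refers to.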


Bibliographic record

Source
Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature, 2020, pp. 52-57 (6 pages)
Authors
Robin Schaefer; Clemens Neudecker
Original format: PDF

Similar literature

1. Toward the optimized crowdsourcing strategy for OCR post-correction [J]. Omri Suissa, Avshalom Elmalech, Maayan Zhitomirsky-Geffet. Aslib Proceedings. 2020, No. 2
2. A two-step approach for automatic microscopic image segmentation using fuzzy clustering and neural discrimination [J]. S. Colantonio, O. Salvetti, I. B. Gurevich. Pattern recognition and image analysis: advances in mathematical theory and applications in the USSR. 2007, No. 3
3. Automatic Identification of Bond Information Based on OCR and NLP [J]. Jizhe Dai, Zhengyan Ma. Journal of Computers. 2019, No. 6
4. From the Paft to the Fiiture: a Fully Automatic NMT and Word Embeddings Method for OCR Post-Correction [C]. Mika Haemaelaeinen, Simon Hengchen. International conference on recent advances in natural language processing. 2019
5. A multimodal fusion approach for automatic postal address recognition system using Optical Character Recognition (OCR) and Automatic Speech Recognition (ASR) techniques [D]. Singh, Amriteshwar. 2011
6. A two-step Convolutional Neural Network based Computer-aided detection scheme for automatically segmenting adipose tissue volume depicting on CT images [O]. Yunzhi Wang, Yuchen Qiu, Theresa Thai
7. From the Paft to the Fiiture: a Fully Automatic NMT and Word Embeddings Method for OCR Post-Correction [O]. Mika Hämäläinen, Simon Hengchen. 2019
8. Statistical Approach to the Generation of a Database for Evaluating OCR Software [R]. Brundick, F. S., Brodeen, A. E., Taylor, M. S. 2000
