Improving OCR Accuracy on Early Printed Books by Utilizing Cross Fold Training and Voting

Abstract

In this paper we introduce a method that significantly reduces the character error rate of OCR text obtained from OCRopus models trained on early printed books. The method combines cross-fold training with confidence-based voting. The available ground truth is first divided into subsets, and several training runs are performed, each yielding its own OCR model. The OCR outputs of these models are then combined by voting: the final output is determined from the recognized characters, their alternatives, and the confidence value assigned to each character. Experiments on seven early printed books show that the proposed method considerably outperforms the standard approach, reducing the number of errors by up to 50% and, in some cases, more.
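The two steps named in the abstract, splitting the ground truth into folds and voting on confidences, can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names (`make_folds`, `vote_position`, `vote_line`), the interleaved split, and the rule of summing confidences per candidate character are all assumptions made for illustration. The sketch also assumes the outputs of the different models have already been aligned position by position, which the abstract does not describe.

```python
from collections import defaultdict


def make_folds(line_ids, n_folds=5):
    """Split ground-truth line ids into n_folds (train, held_out) pairs.

    Assumption: a simple interleaved split; each fold would be used to
    train one OCR model on `train`, with `held_out` kept for evaluation.
    """
    folds = []
    for k in range(n_folds):
        held_out = line_ids[k::n_folds]
        train = [x for i, x in enumerate(line_ids) if i % n_folds != k]
        folds.append((train, held_out))
    return folds


def vote_position(candidates_per_model):
    """Pick one character for a single aligned position.

    candidates_per_model: one dict per model, mapping each candidate
    character (top choice and alternatives) to its confidence.
    Assumption: confidences are summed per character, highest sum wins.
    """
    totals = defaultdict(float)
    for candidates in candidates_per_model:
        for char, confidence in candidates.items():
            totals[char] += confidence
    return max(totals, key=totals.get)


def vote_line(aligned_positions):
    """Vote position by position over an already aligned line."""
    return "".join(vote_position(pos) for pos in aligned_positions)


# Hypothetical outputs of three models for a three-character word.
aligned = [
    [{"v": 0.6, "u": 0.4}, {"u": 0.9}, {"v": 0.55, "u": 0.45}],
    [{"n": 0.9}, {"n": 0.8}, {"n": 0.95}],
    [{"d": 0.9}, {"d": 0.7, "b": 0.3}, {"d": 0.85}],
]
result = vote_line(aligned)
```

In this made-up example two of the three models rank "v" first at the first position, so a plain majority vote over top choices would pick "v". Summing the confidences of the alternatives favours "u" instead, which shows why taking alternatives and confidence values into account can change the voted result.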

Bibliographic record

Source
IAPR International Workshop on Document Analysis Systems | 2018 | pp. 423-428 | 6 pages
Authors
Christian Reul; Uwe Springmann; Christoph Wick; Frank Puppe
Original format: PDF
Keywords
Optical character recognition software; Training; Engines; Character recognition; Tools; Text recognition; Data models

