Journal: Advances in Multimedia

Text Extraction from Historical Document Images by the Combination of Several Thresholding Techniques

Abstract

This paper presents a new technique for binarizing historical document images, whose deterioration and damage make automatic processing difficult at several levels. The proposed method is based on hybrid thresholding, which combines the advantages of global and local methods, and on a mixture of several binarization techniques. It has two stages. In the first stage, global thresholding is applied to the entire image: two different thresholds are determined, from which most image pixels are classified as foreground or background. In the second stage, the remaining pixels are assigned to the foreground or background class based on local analysis. In this stage, several local thresholding methods are combined, and the final binary value of each remaining pixel is chosen as the most probable one. The proposed technique was tested on a large collection of standard and synthetic documents and compared with well-known methods using standard measures; it was shown to be more powerful.
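The two-stage idea can be illustrated with a minimal sketch. The abstract does not say how the two global thresholds are obtained, which local methods are combined, or how the "most probable" value is computed, so everything specific here is an assumption made for illustration: the thresholds are taken as Otsu's threshold plus or minus a hypothetical `margin`, the local methods are Niblack, Sauvola, and a local mean with a fixed offset, and the combination is a simple majority vote. The function names and parameter values are likewise hypothetical and are not the authors' implementation.

```python
import numpy as np

def otsu_threshold(gray):
    """Otsu's global threshold for a uint8 grayscale image."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    prob = hist / hist.sum()
    omega = np.cumsum(prob)                 # probability of the darker class
    mu = np.cumsum(prob * np.arange(256))   # cumulative first moment
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu[-1] * omega - mu) ** 2 / (omega * (1.0 - omega))
    sigma_b = np.nan_to_num(sigma_b, nan=0.0, posinf=0.0, neginf=0.0)
    return int(np.argmax(sigma_b))          # maximises between-class variance

def local_mean_std(gray, window):
    """Mean and standard deviation over an odd window x window neighbourhood."""
    r = window // 2
    img = np.pad(gray.astype(float), r, mode="edge")

    def box_sum(a):
        # Summed-area table with a leading row and column of zeros.
        s = np.pad(a, ((1, 0), (1, 0))).cumsum(0).cumsum(1)
        return (s[window:, window:] - s[:-window, window:]
                - s[window:, :-window] + s[:-window, :-window])

    n = window * window
    mean = box_sum(img) / n
    var = np.maximum(box_sum(img * img) / n - mean ** 2, 0.0)
    return mean, np.sqrt(var)

def hybrid_binarize(gray, margin=30, window=15):
    """Return a uint8 mask: 1 = foreground (ink), 0 = background."""
    g = gray.astype(float)

    # Stage 1 (assumed form): two global thresholds around Otsu's value.
    t = otsu_threshold(gray)
    t_low, t_high = t - margin, t + margin
    out = np.zeros(gray.shape, dtype=np.uint8)
    out[g <= t_low] = 1                      # clearly dark: foreground
    uncertain = (g > t_low) & (g < t_high)   # pixels at or above t_high stay background

    # Stage 2 (assumed form): majority vote of three local methods.
    mean, std = local_mean_std(gray, window)
    votes = np.stack([
        g < mean - 0.2 * std,                          # Niblack, k = -0.2
        g < mean * (1.0 + 0.5 * (std / 128.0 - 1.0)),  # Sauvola, k = 0.5, R = 128
        g < mean - 5.0,                                # local mean minus offset
    ])
    majority = votes.sum(axis=0) >= 2
    out[uncertain] = majority[uncertain]
    return out
```

Calling `hybrid_binarize(page)` on a 2-D `uint8` array returns a mask of the same shape. The margin, window size, and voting rule are tuning choices in this sketch and would need to be matched to whatever the paper actually specifies.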
