Handwriting identification, matching, and indexing in noisy document images.

Abstract

Throughout history, handwriting has been the primary means of recording information that is preserved across time and space. With the arrival of the electronic document era, we face the challenge of making an enormous number of handwritten documents available for electronic access. Although many handwritten documents contain only handwriting, a growing number now mix handwriting with printed text, noise, and background patterns. This mixture of handwriting with other components poses a great challenge to making the original document electronically accessible.

Many handwritten documents come with a special background pattern: rule lines printed on the paper to guide writing. After digitization, rule lines touch the text and, unless they are detected and removed, cause problems for subsequent document image analysis. In this dissertation, we present a rule line detection algorithm based on hidden Markov model (HMM) decoding that achieves both high detection accuracy and a low false alarm rate. After detection, the lines are removed by line width thresholding.

Handwriting is often mixed with printed text, as with signatures and annotations on a business letter. Handwriting in a printed document often indicates corrections, additions, or other supplemental information that should be treated differently from the main content. The data set we process is noisy, which makes the problem more challenging. In this dissertation, we first segment the document at a suitable level and then classify each segmented block as machine-printed text, handwriting, or noise. Post-processing based on a Markov random field (MRF) then refines the classification results.

The identified handwriting can be analyzed further. In this dissertation, we propose a novel point-pattern-based handwriting matching technique and apply it to handwriting synthesis and retrieval. We formulate point matching as an optimization problem that seeks to preserve local neighborhood structures.
After establishing the correspondence between two handwriting samples, we warp one sample toward the other with the thin plate spline (TPS) deformation model to synthesize new handwriting samples. We also apply the matching algorithm to handwriting retrieval, since robust features are much easier to define from the matching results.
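The rule line detection step described in the abstract relies on HMM decoding. As a minimal sketch, assuming a toy two-state model (background row vs. rule-line row) decoded over the horizontal ink profile of a binary page, Viterbi decoding can label which rows belong to a rule line. The state set, the three-level quantization, the function names, and every probability below are illustrative assumptions, not the dissertation's model.

```python
import math

# Toy two-state HMM (assumed, not trained): state 0 = background row,
# state 1 = rule-line row. Observations are per-row ink fractions quantized
# into 3 symbols: 0 = sparse, 1 = medium, 2 = dense.
START = [0.9, 0.1]
TRANS = [[0.9, 0.1],    # background -> background / line
         [0.5, 0.5]]    # line -> background / line (lines are only a few rows thick)
EMIT = [[0.75, 0.24, 0.01],  # background rows are rarely dense
        [0.05, 0.15, 0.80]]  # rule-line rows are usually dense


def quantize(fraction):
    """Map a row's ink fraction in [0, 1] to an observation symbol."""
    if fraction < 0.2:
        return 0
    if fraction < 0.6:
        return 1
    return 2


def viterbi(obs):
    """Return the most likely state sequence for obs, computed in log space."""
    states = range(len(START))
    score = [math.log(START[s]) + math.log(EMIT[s][obs[0]]) for s in states]
    backptrs = []
    for o in obs[1:]:
        ptrs, new_score = [], []
        for s in states:
            # Best predecessor of state s at this step.
            prev = max(states, key=lambda p: score[p] + math.log(TRANS[p][s]))
            ptrs.append(prev)
            new_score.append(score[prev] + math.log(TRANS[prev][s]) + math.log(EMIT[s][o]))
        backptrs.append(ptrs)
        score = new_score
    # Trace the best final state back to the start.
    state = max(states, key=lambda s: score[s])
    path = [state]
    for ptrs in reversed(backptrs):
        state = ptrs[state]
        path.append(state)
    return path[::-1]


def detect_rule_rows(image):
    """image: list of rows of 0/1 pixels; returns indices of rows decoded as rule line."""
    obs = [quantize(sum(row) / len(row)) for row in image]
    return [i for i, s in enumerate(viterbi(obs)) if s == 1]


if __name__ == "__main__":
    page = [[0] * 10 for _ in range(6)]
    page[1][4] = 1          # a stray ink pixel
    page[3] = [1] * 10      # a full-width rule line
    print(detect_rule_rows(page))  # → [3]
```

Only the dense row is decoded as a rule line; the sparse row with a stray pixel stays in the background state because switching into and out of the line state costs more than it gains. A real detector would need trained parameters and richer observations than a single projection value per row.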
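The abstract says only that detected rule lines are removed by line width thresholding. One common reading, assumed here, is to erase ink on a detected rule-line row only where the vertical run of ink through that pixel is no taller than the expected line width, so that text strokes crossing the line survive. The function name `remove_rule_line` and the `max_width` default of 3 pixels are hypothetical.

```python
def remove_rule_line(image, line_rows, max_width=3):
    """Erase ink on detected rule-line rows by line width thresholding (assumed scheme).

    A pixel on a rule-line row is erased only if the vertical run of ink through
    it is at most max_width pixels tall; taller runs are assumed to be text
    strokes crossing the line and are kept. image is a list of rows of 0/1 ints.
    Returns a cleaned copy and leaves the input unchanged.
    """
    height = len(image)
    cleaned = [row[:] for row in image]
    for r in line_rows:
        for c, pixel in enumerate(image[r]):
            if not pixel:
                continue
            # Walk up and down from (r, c) in the original image to measure the run.
            top = r
            while top > 0 and image[top - 1][c]:
                top -= 1
            bottom = r
            while bottom < height - 1 and image[bottom + 1][c]:
                bottom += 1
            if bottom - top + 1 <= max_width:
                cleaned[r][c] = 0
    return cleaned


if __name__ == "__main__":
    page = [[0] * 5 for _ in range(7)]
    page[3] = [1] * 5               # rule line on row 3
    for r in range(7):
        page[r][2] = 1              # vertical stroke crossing the line
    print(remove_rule_line(page, [3])[3])  # → [0, 0, 1, 0, 0]
```

Run lengths are measured on the original image rather than the copy being edited, so erasing one row of a two-pixel-thick line does not change the measurement for the other row.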
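For the MRF post-processing of block labels, the sketch below assumes a Potts-style MRF over adjacent blocks, with unary costs taken from classifier probabilities, and minimizes its energy greedily with Iterated Conditional Modes (ICM). The block adjacency, the `beta` smoothing weight, the helper names, and the choice of ICM are all assumptions for illustration; the abstract does not say how the dissertation defines or optimizes its MRF.

```python
import math

# Label set taken from the abstract: machine-printed text, handwriting, noise.
LABELS = ("printed", "handwriting", "noise")


def local_energy(i, label, probs, labels, neighbors, beta):
    """Unary cost (-log classifier probability) plus a Potts penalty of beta
    for every neighboring block whose current label differs."""
    unary = -math.log(max(probs[i][label], 1e-9))
    disagreements = sum(1 for j in neighbors[i] if labels[j] != label)
    return unary + beta * disagreements


def icm_smooth(probs, neighbors, beta=1.0, max_iters=10):
    """Refine block labels with Iterated Conditional Modes (ICM).

    probs: one dict per block mapping each label in LABELS to a probability.
    neighbors: neighbors[i] lists the indices of blocks adjacent to block i.
    beta: assumed smoothing weight; beta=0 returns the classifier's labels.
    """
    labels = [max(LABELS, key=p.get) for p in probs]  # start from the classifier
    for _ in range(max_iters):
        changed = False
        for i in range(len(probs)):
            best = min(LABELS,
                       key=lambda lab: local_energy(i, lab, probs, labels, neighbors, beta))
            if best != labels[i]:
                labels[i] = best
                changed = True
        if not changed:  # reached a local minimum of the MRF energy
            break
    return labels


if __name__ == "__main__":
    confident_hw = {"printed": 0.1, "handwriting": 0.8, "noise": 0.1}
    weak_printed = {"printed": 0.45, "handwriting": 0.4, "noise": 0.15}
    probs = [confident_hw, weak_printed, confident_hw]
    neighbors = [[1], [0, 2], [1]]  # three blocks in a row
    print(icm_smooth(probs, neighbors))  # → ['handwriting', 'handwriting', 'handwriting']
```

With `beta=1.0`, the weakly "printed" middle block is relabeled to agree with its two confident handwriting neighbors. ICM only finds a local minimum; graph cuts or loopy belief propagation are common alternatives when a stronger optimizer is needed.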
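The synthesis step warps one sample toward another with a thin plate spline (TPS). Below is a minimal pure-Python sketch, assuming point correspondences are already known (for example, from the matching step): it fits an exactly interpolating 2-D TPS with kernel U(r) = r^2 log r^2 by solving the standard block linear system, and returns a function that warps any point. The helper names are hypothetical, and no regularization term is included.

```python
import math


def tps_kernel(r2):
    """TPS radial basis U(r) = r^2 log(r^2), written in terms of r2 = r^2; U(0) = 0."""
    return r2 * math.log(r2) if r2 > 0 else 0.0


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented copy; a is untouched
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_tps(src, dst):
    """Fit a 2-D TPS mapping each src point exactly onto its dst point.

    Needs at least 3 distinct, non-collinear src points. Returns warp(x, y).
    """
    n = len(src)
    # Block system [[K, P], [P^T, 0]] [w; a] = [v; 0] with P rows (1, x, y).
    system = [[0.0] * (n + 3) for _ in range(n + 3)]
    for i, (xi, yi) in enumerate(src):
        for j, (xj, yj) in enumerate(src):
            system[i][j] = tps_kernel((xi - xj) ** 2 + (yi - yj) ** 2)
        for k, value in enumerate((1.0, xi, yi)):
            system[i][n + k] = value
            system[n + k][i] = value
    # Solve once for the x-coordinates of dst and once for the y-coordinates.
    coeffs = [solve(system, [p[d] for p in dst] + [0.0, 0.0, 0.0]) for d in (0, 1)]

    def warp(x, y):
        result = []
        for c in coeffs:
            a0, ax, ay = c[n:]                     # affine part
            value = a0 + ax * x + ay * y
            value += sum(w * tps_kernel((x - px) ** 2 + (y - py) ** 2)
                         for w, (px, py) in zip(c[:n], src))  # bending part
            result.append(value)
        return tuple(result)

    return warp


if __name__ == "__main__":
    src = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.5)]
    dst = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.6, 0.5)]  # nudge the center
    warp = fit_tps(src, dst)
    print([round(v, 6) for v in warp(0.5, 0.5)])  # → [0.6, 0.5]
```

When the targets are an affine transform of the sources, the bending weights come out zero and the spline reduces to that affine map. Adding a positive constant to the diagonal of the kernel block trades exact interpolation for a smoother warp, which may help when correspondences are noisy.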