A multiscale domain-independent algorithm for document image segmentation.

Abstract

Document image segmentation is a crucial step in converting paper document images into electronic documents. Entities in a document image, such as text blocks, tables, and figures, must be separated before further document analysis and recognition can take place. Many document segmentation algorithms are designed exclusively for a few specific document types and rely on highly specialized document models.

This thesis presents a domain-independent segmenter that does not assume any specific document layout model. The segmenter uses a minimal amount of image domain knowledge: graphic and text entities are segmented purely on the basis of their geometric attributes and tonal values. The segmenter extracts entities from the document image as non-overlapping sub-images.

The segmenter is a general-purpose tool that can be used for segmentation tasks where domain-specific models would be inappropriate, for example, for image retrieval. Its output can also be used to identify the domain of a document; an algorithm specific to that domain can then be applied to the image to produce a refined segmentation. The segmenter can also act as a pre-segmenter that separates out document entities so that domain-specific segmenters can resegment them. Because of its general nature, it can also be used to segment natural images. Segmentation results are shown on a diverse set of test images.

Record details

  • Author

    Chen, Sean Jy-Shyang

  • Author affiliation

    Queen's University (Canada)

  • Degree-granting institution: Queen's University (Canada)
  • Discipline: Computer Science
  • Degree: M.Sc.
  • Year: 2003
  • Pages: 119
  • Original format: PDF
  • Language: English