
Document image segmentation and text area ordering


Abstract

A system for document image segmentation and text area ordering is described and applied to complex printed page layouts in both Japanese and English. The technique makes no assumption about the shape of blocks, so it can handle skewed images without skew correction as well as documents whose columns are not rectangular. It follows a bottom-up strategy: connected components are extracted from the reduced image and classified according to their local information. The connected components are merged into lines, and lines are merged into areas. Extracted text areas are classified as body, caption, header, or footer. A tree graph of the layout of the body text is built, and the reading order of the text is obtained by a preorder traversal of the graph. The authors introduce the influence range of each node, a procedure for the title part, and extraction of the white horizontal separator, which make it possible to get good results on various documents. The total system is fast and compact.
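The two central steps of the abstract, bottom-up merging and ordering by preorder traversal, can be illustrated with a minimal sketch. This is not the authors' implementation: the names `Box`, `Node`, `merge_into_lines`, and `reading_order` are hypothetical, the `max_gap` threshold is an assumed parameter, and the merging rule (vertical overlap plus a small horizontal gap) is a simple stand-in for whatever local criteria the paper uses. Component classification, the influence range of each node, and separator extraction are not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class Box:
    """Axis-aligned bounding box of a connected component or a merged line."""
    x0: int
    y0: int
    x1: int
    y1: int


def v_overlap(a: Box, b: Box) -> bool:
    """True if the two boxes share some vertical extent."""
    return min(a.y1, b.y1) - max(a.y0, b.y0) > 0


def merge_into_lines(boxes, max_gap=10):
    """Greedily merge component boxes into text lines (assumed rule).

    A box joins an existing line if it overlaps the line vertically and
    starts no more than max_gap pixels to the right of the line's right edge.
    """
    lines = []
    for b in sorted(boxes, key=lambda b: b.x0):
        for i, ln in enumerate(lines):
            if v_overlap(ln, b) and b.x0 - ln.x1 <= max_gap:
                # Grow the line's bounding box to include the new component.
                lines[i] = Box(min(ln.x0, b.x0), min(ln.y0, b.y0),
                               max(ln.x1, b.x1), max(ln.y1, b.y1))
                break
        else:
            lines.append(b)
    return lines


@dataclass
class Node:
    """Node of a layout tree; children are stored in intended reading order."""
    label: str
    children: list = field(default_factory=list)


def reading_order(root: Node):
    """Return node labels in preorder (parent first, then children in order)."""
    order, stack = [], [root]
    while stack:
        node = stack.pop()
        order.append(node.label)
        # Push children reversed so the first child is visited first.
        stack.extend(reversed(node.children))
    return order


components = [Box(0, 0, 5, 10), Box(7, 1, 12, 9), Box(14, 0, 20, 10), Box(0, 20, 6, 30)]
lines = merge_into_lines(components)

layout = Node("page", [Node("column 1", [Node("para 1"), Node("para 2")]),
                       Node("column 2")])
order = reading_order(layout)
```

Here the three components on the first row collapse into one line box while the fourth starts a new line, and the traversal visits `page`, `column 1`, `para 1`, `para 2`, `column 2`. Merging lines into areas would follow the same pattern with a vertical rather than horizontal gap test.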
