
Document layout analysis program, computer-readable storage medium storing document layout analysis program, document layout analysis method, and document layout analysis apparatus


Abstract

PROBLEM TO BE SOLVED: To provide a document layout analysis program capable of accurately extracting the document layout structure of an electronic document, a computer-readable storage medium storing the document layout analysis program, a document layout analysis method, and a document layout analysis apparatus.

SOLUTION: Coordinate information for each character in a document image is acquired, a character string in the document image is detected from the acquired coordinate information, and the characters in the detected character string are selected one by one. For each selected character, a rectangular inspection area is defined that has one corner at a predetermined corner of the rectangle circumscribing the character string and that includes the rectangle circumscribing the selected character. The characters are numbered so that this inspection area contains no character with a larger number than the selected character, and a character string is then built by adding the characters one by one in number order. If the rectangular inspection area covering the characters already added and the newly added character contains any character other than these, the newly added character is removed and the characters already added are combined and set as one sentence.

COPYRIGHT: (C)2005,JPO&NCIPI
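The procedure in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation, and everything the abstract leaves open is an assumption here: the names `Char`, `inspection_area`, `number_chars`, and `split_into_sentences` are hypothetical, the "predetermined corner" is taken to be the top-left corner of the string's bounding box, a character counts as "contained" in an area when its center lies inside it, the numbering uses a greedy rule with a top-to-bottom, left-to-right tie-break, and the characters left over after a split are treated as a new string. Only characters of the same detected string are checked.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Rect = Tuple[float, float, float, float]  # (x0, y0, x1, y1), y grows downward


@dataclass(frozen=True)
class Char:
    text: str
    x0: float
    y0: float
    x1: float
    y1: float


def top_left(chars: Sequence[Char]) -> Tuple[float, float]:
    # Assumed "predetermined corner": top-left of the string's bounding box.
    return (min(c.x0 for c in chars), min(c.y0 for c in chars))


def inspection_area(anchor: Tuple[float, float], chars: Sequence[Char]) -> Rect:
    # Rectangle with one corner at the anchor, stretched to cover `chars`.
    return (anchor[0], anchor[1],
            max(c.x1 for c in chars), max(c.y1 for c in chars))


def contains(area: Rect, c: Char) -> bool:
    # Assumption: a character is "contained" if its center is inside the area.
    cx, cy = (c.x0 + c.x1) / 2, (c.y0 + c.y1) / 2
    return area[0] <= cx <= area[2] and area[1] <= cy <= area[3]


def number_chars(chars: Sequence[Char]) -> List[Char]:
    # Greedy numbering (assumed rule): the next number goes to a character
    # whose own inspection area holds no character that is still unnumbered.
    anchor = top_left(chars)
    remaining = sorted(chars, key=lambda c: (c.y0, c.x0))
    ordered: List[Char] = []
    while remaining:
        for c in remaining:
            area = inspection_area(anchor, [c])
            if not any(contains(area, o) for o in remaining if o is not c):
                break
        else:
            c = remaining[0]  # no valid candidate: fall back to reading order
        ordered.append(c)
        remaining.remove(c)
    return ordered


def split_into_sentences(chars: Sequence[Char]) -> List[str]:
    sentences: List[str] = []
    pending = list(chars)
    while pending:
        ordered = number_chars(pending)
        anchor = top_left(ordered)
        added = [ordered[0]]
        for cand in ordered[1:]:
            area = inspection_area(anchor, added + [cand])
            others = [o for o in ordered if o not in added and o is not cand]
            if any(contains(area, o) for o in others):
                break  # a foreign character is inside: drop the candidate
            added.append(cand)
        sentences.append("".join(c.text for c in added))
        pending = ordered[len(added):]  # assumed: the rest forms a new string
    return sentences


if __name__ == "__main__":
    block = [
        Char("a", 0.0, 0.0, 1.0, 1.0), Char("b", 1.2, 0.0, 2.2, 1.0),
        Char("c", 0.0, 1.5, 1.0, 2.5), Char("d", 1.2, 1.5, 2.2, 2.5),
    ]
    print(split_into_sentences(block))
```

For the two-row block in the demo, adding "c" after "a" and "b" stretches the inspection area over "d", which has not been added yet, so "c" is dropped and "ab" is closed as one sentence. The remaining characters then form "cd", and the script prints `['ab', 'cd']`.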

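Bibliographic data

  • Publication number: JP4213558B2
  • Patent type:
  • Publication date: 2009-01-21
  • Original format: PDF
  • Applicant/patentee: 富士通株式会社 (Fujitsu Limited)
  • Application number: JP20030357941
  • Inventors: 武部 浩明; 藤本 克仁
  • Filing date: 2003-10-17
  • Classification: G06K9/20
  • Country: JP
  • Added to database: 2022-08-21 19:37:45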
