Digital organization of printed documents according to extracted semantic information

Abstract

A method of analyzing and organizing printed documents is performed at a computing system having one or more processors and memory. The method includes receiving one or more printed documents, each including one or more pages, and processing each page of each printed document. For each page, the method includes scanning the respective page to obtain an image file, determining a document class for the respective page by inputting the image file to one or more trained classifier models, and generating a semantic analyzer pipeline including at least an optical character recognition (OCR)-based semantic analyzer. The method also includes generating a preprocessed output page and applying the OCR-based semantic analyzer to the preprocessed output page to extract semantic information corresponding to the respective page. Finally, the method includes determining a digital organization for the respective printed document based on the extracted semantic information and the document class.
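The per-page flow in the abstract (scan, classify, build an analyzer pipeline, extract semantics, then organize) can be illustrated with a minimal Python sketch. Nothing here comes from the patent itself: every name (`classify`, `preprocess`, `ocr_extract`, `build_pipeline`, `organize`, `process_document`) is hypothetical, the "image file" is a plain string standing in for scanned pixels, the classifier is a keyword rule rather than a trained model, and taking the document class from the first page is an assumption made only for the example.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-in: a "scanned image file" is just the text that a
# real OCR engine would be expected to recover from the page.
ImageFile = str


def classify(image: ImageFile) -> str:
    """Stand-in for the trained classifier models (keyword rule only)."""
    return "invoice" if "invoice" in image.lower() else "letter"


def preprocess(image: ImageFile) -> str:
    """Stand-in preprocessing step: collapse runs of whitespace.
    A real system would more likely deskew or denoise the image."""
    return " ".join(image.split())


def ocr_extract(preprocessed: str) -> Dict[str, str]:
    """Stand-in OCR-based analyzer: read 'key: value' pairs split by ';'."""
    fields: Dict[str, str] = {}
    for part in preprocessed.split(";"):
        if ":" in part:
            key, value = part.split(":", 1)
            fields[key.strip().lower()] = value.strip()
    return fields


Pipeline = Tuple[Callable[[ImageFile], str], Callable[[str], Dict[str, str]]]


def build_pipeline() -> Pipeline:
    """Assumed pipeline shape: preprocessing followed by OCR extraction."""
    return (preprocess, ocr_extract)


def organize(doc_class: str, semantics: Dict[str, str]) -> str:
    """Assumed organization scheme: a folder path of class and vendor."""
    return f"{doc_class}/{semantics.get('vendor', 'unknown')}"


def process_document(scans: List[ImageFile]) -> str:
    """Process every page, merge the extracted fields, return a folder path."""
    classes: List[str] = []
    merged: Dict[str, str] = {}
    for image in scans:
        classes.append(classify(image))
        pre, ocr = build_pipeline()
        merged.update(ocr(pre(image)))
    # Assumption: the first page's class stands for the whole document.
    return organize(classes[0], merged)


if __name__ == "__main__":
    print(process_document(["INVOICE;  vendor: Acme ;  date: 2019-04-25"]))
```

In this sketch the organization step sees both the document class and the extracted fields, mirroring the abstract's final step; a real system would replace each stand-in with an actual scanner interface, trained model, and OCR engine.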

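Bibliographic details

  • Publication number: US10769503B1

  • Patent type:

  • Publication date: 2020-09-08

  • Original format: PDF

  • Applicant/patentee: ZORROA CORPORATION

  • Application number: US201916395151

  • Filing date: 2019-04-25

  • Classification: G06K9/72; G06K9; G06K9/62; G06K9/68; H04N1

  • Country: US

  • Database entry time: 2022-08-21 11:27:50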
