Digital organization of printed documents according to extracted semantic information
Abstract
A method of analyzing and organizing printed documents is performed at a computing system having one or more processors and memory. The method includes receiving one or more printed documents, each including one or more pages. For each page of each printed document, the method includes scanning the respective page to obtain an image file, determining a document class for the respective page by inputting the image file to one or more trained classifier models, and generating a semantic analyzer pipeline that includes at least an optical character recognition (OCR)-based semantic analyzer. The method also includes generating a preprocessed output page from the image file and applying the OCR-based semantic analyzer to the preprocessed output page to extract semantic information corresponding to the respective page. The method includes determining a digital organization for the respective printed document based on the extracted semantic information and the document class.
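The flow described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. Every name in it (`classify_page`, `preprocess`, `ocr_analyzer`, `organize`) is hypothetical, the "image file" is modeled as a plain string so the example runs without a scanner or OCR engine, and the keyword-based classifier merely stands in for the trained classifier models. Taking the document class from the first page and filing by a `customer` field are also assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List

# Assumption: a scanned page image is modeled as a string of "recognized" text.
PageImage = str


@dataclass
class PageResult:
    document_class: str
    semantic_info: Dict[str, str]


def classify_page(image: PageImage) -> str:
    """Hypothetical stand-in for the trained classifier models."""
    return "invoice" if "invoice" in image.lower() else "letter"


def preprocess(image: PageImage) -> PageImage:
    """Hypothetical preprocessing step; here it only normalizes whitespace."""
    return " ".join(image.split())


def ocr_analyzer(page: PageImage) -> Dict[str, str]:
    """Hypothetical OCR-based analyzer: pulls 'key: value' fields separated by ';'."""
    info: Dict[str, str] = {}
    for part in page.split(";"):
        if ":" in part:
            key, value = part.split(":", 1)
            info[key.strip().lower()] = value.strip()
    return info


def process_page(image: PageImage) -> PageResult:
    """Classify the page, preprocess it, then extract semantic information."""
    doc_class = classify_page(image)
    preprocessed = preprocess(image)
    return PageResult(doc_class, ocr_analyzer(preprocessed))


def organize(pages: List[PageImage]) -> str:
    """Derive a folder-like location from the document class and extracted fields.

    Assumption: the first page determines the document class, and a
    'customer' field (if present) determines the subfolder.
    """
    results = [process_page(p) for p in pages]
    merged: Dict[str, str] = {}
    for result in results:
        merged.update(result.semantic_info)
    owner = merged.get("customer", "unknown")
    return f"{results[0].document_class}/{owner}"


if __name__ == "__main__":
    scanned = ["Invoice   No: 17; Customer: Acme", "Total: 42"]
    print(organize(scanned))
```

In a real system the string stand-ins would be replaced by actual image data, a trained model, and an OCR engine; the sketch only shows how the document class and the extracted information jointly drive the final organization.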