SYSTEMS AND METHODS FOR ENABLING MANUAL CLASSIFICATION OF UNRECOGNIZED DOCUMENTS TO COMPLETE WORKFLOW FOR ELECTRONIC JOBS AND TO ASSIST MACHINE LEARNING OF A RECOGNITION SYSTEM USING AUTOMATICALLY EXTRACTED FEATURES OF UNRECOGNIZED DOCUMENTS

Abstract

A method in a document analysis system automatically extracts image and text features from each received electronic document and compares them with the feature set associated with each document category to determine whether the document is recognizable as belonging to one of those categories. If the document is recognized, the method classifies it as belonging to that category. If the document is unrecognized, the method submits it to a learning phase in which it is presented to a human trainer for manual classification into a document category. The method then uses the automatically extracted features of the manually classified document to automatically modify at least one of the features and the weights of the feature set of the corresponding document category.
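
The recognize-or-learn flow described in the abstract can be sketched as follows. This is only an illustrative sketch, not the patented implementation. The abstract does not specify how features are compared or how weights are updated, so the weighted-sum score, the recognition threshold, the additive update rule, and every name used here (`Category`, `score`, `recognize`, `learn`, `process`, `ask_trainer`) are assumptions. Feature extraction is out of scope: each document is assumed to arrive as a dictionary of already extracted feature values.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional

Features = Dict[str, float]  # assumed form: feature name -> extracted value


@dataclass
class Category:
    """A document category with a weighted feature set (hypothetical structure)."""
    name: str
    weights: Features = field(default_factory=dict)


def score(features: Features, category: Category) -> float:
    # Assumed comparison: weighted sum over the category's feature set.
    return sum(w * features.get(f, 0.0) for f, w in category.weights.items())


def recognize(features: Features, categories: Dict[str, Category],
              threshold: float) -> Optional[str]:
    # Pick the best-scoring category; recognized only if it reaches the threshold.
    best = max(categories.values(), key=lambda c: score(features, c), default=None)
    if best is not None and score(features, best) >= threshold:
        return best.name
    return None


def learn(features: Features, category: Category, rate: float = 0.5) -> None:
    # Assumed update rule: add new features and nudge existing weights
    # toward the extracted features of the manually classified document.
    for f, v in features.items():
        category.weights[f] = category.weights.get(f, 0.0) + rate * v


def process(features: Features, categories: Dict[str, Category],
            ask_trainer: Callable[[Features], str],
            threshold: float = 1.0) -> str:
    name = recognize(features, categories, threshold)
    if name is None:
        # Learning phase: a human trainer classifies the unrecognized document.
        name = ask_trainer(features)
        # Assumption: an unknown category name creates a new, empty category.
        category = categories.setdefault(name, Category(name))
        learn(features, category)
    return name
```

In this sketch, a document that the trainer has labeled once updates the weights of its category, so a later document with the same features can be recognized without going back to the trainer. The manual step therefore both completes the current job and supplies training signal.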