Conference: IEEE Winter Conference on Applications of Computer Vision

Global Table Extractor (GTE): A Framework for Joint Table Identification and Cell Structure Recognition Using Visual Context

Abstract

Documents are widely used to share and preserve knowledge in business and science, and the tables within them capture most of the critical data. Unfortunately, most documents are stored and distributed as PDFs or scanned images, which fail to preserve logical table structure. Recent vision-based deep learning approaches have been proposed to address this gap, but most still cannot achieve state-of-the-art results. We present Global Table Extractor (GTE), a vision-guided systematic framework for joint table detection and cell structure recognition that can be built on top of any object detection model. With GTE-Table, we introduce a new penalty based on the natural cell containment constraint of tables to train our table network, aided by cell location predictions. GTE-Cell is a new hierarchical cell detection network that leverages table styles. Further, we design a method to automatically label table and cell structure in existing documents to cheaply create a large corpus of training and test data. We use this method to enhance PubTabNet with cell labels and to create FinTabNet; together these are real-world, complex scientific and financial datasets with detailed table structure annotations to help train and test structure recognition. Our framework surpasses previous state-of-the-art results on the ICDAR 2013 and ICDAR 2019 table competitions in both table detection and cell structure recognition. Further experiments demonstrate a greater than 45% improvement in cell structure recognition compared with a vanilla RetinaNet object detection model on our new out-of-domain FinTabNet.
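The cell containment constraint mentioned in the abstract can be illustrated with a small sketch. The abstract does not give the penalty's formula, so what follows is only one plausible reading and not the GTE-Table loss itself: it assumes axis-aligned boxes in `(x0, y0, x1, y1)` form and scores how much of each predicted cell straddles a predicted table boundary. The names `Box`, `area`, `intersection`, and `containment_penalty` are hypothetical.

```python
from typing import List, Tuple

# Hypothetical box format (an assumption, not from the paper): (x0, y0, x1, y1)
Box = Tuple[float, float, float, float]


def area(b: Box) -> float:
    """Area of a box; degenerate or inverted boxes count as zero."""
    return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])


def intersection(a: Box, b: Box) -> float:
    """Area of the overlap between two boxes (zero if they are disjoint)."""
    overlap = (max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3]))
    return area(overlap)


def containment_penalty(table: Box, cells: List[Box]) -> float:
    """Illustrative penalty: mean fraction of cell area lying outside the table,
    taken over cells that overlap the table at all.

    A cell fully inside the table contributes 0; a cell straddling the table
    border contributes the fraction of its area that falls outside.
    Cells entirely outside the table are ignored.
    """
    penalties = []
    for cell in cells:
        cell_area = area(cell)
        if cell_area == 0.0:
            continue  # skip degenerate cells
        inside_fraction = intersection(table, cell) / cell_area
        if inside_fraction == 0.0:
            continue  # cell does not touch this table
        penalties.append(1.0 - inside_fraction)
    return sum(penalties) / len(penalties) if penalties else 0.0


if __name__ == "__main__":
    table = (0.0, 0.0, 10.0, 10.0)
    cells = [(1.0, 1.0, 3.0, 3.0), (8.0, 8.0, 12.0, 12.0), (20.0, 20.0, 22.0, 22.0)]
    print(containment_penalty(table, cells))
```

In this toy setup a table prediction whose border cuts through cells is penalized more than one that encloses them cleanly, which is the intuition behind using cell locations to guide table detection. How the actual framework weights or combines such a term with the detector's loss is not stated in the abstract.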