Conference: IADIS International Conference Information Systems

EMPIRICAL EVALUATION OF CRF-BASED BIBLIOGRAPHY EXTRACTION FROM RESEARCH PAPERS

Abstract

We previously proposed an automatic bibliography extraction method for scanned research papers with OCR markup. The method uses conditional random fields (CRFs) to label the sequence of OCRed text lines on an article's title page with the appropriate bibliographic element names. Although we achieved good extraction accuracy for some Japanese academic journals, extraction errors are inevitable. This paper therefore proposes three confidence measures for bibliography labeling that are intended to detect such errors. It also reports an empirical evaluation of CRF-based page analysis for research papers, judged not only by labeling accuracy but also by labeling error detection. We applied the three confidence measures to the labeling of three academic journals published in Japan. The experiments showed that the proposed confidence measures reflected labeling accuracy reasonably well and could be used for error detection.
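
The abstract does not say how the three confidence measures are defined. As a rough illustration only, the sketch below computes two confidence scores that are commonly derived from a linear-chain CRF: the probability of the best label sequence, and the marginal probability of each chosen label. These are assumptions for illustration, not necessarily the measures proposed in the paper. The label names, the score tables, and the function `crf_confidence` are hypothetical; in practice the scores would come from a trained model.

```python
import math

def logsumexp(values):
    """Numerically stable log(sum(exp(v))) for a list of floats."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def crf_confidence(emission, transition):
    """Decode a linear-chain CRF and score the result.

    emission[t][y]   : score of label y for text line t (hypothetical values)
    transition[a][b] : score of moving from label a to label b
    Returns (best_path, sequence_probability, per_line_marginals).
    """
    n, k = len(emission), len(emission[0])

    # Forward pass in log space; summing the last row gives log Z.
    alpha = [list(emission[0])]
    for t in range(1, n):
        alpha.append([
            emission[t][y]
            + logsumexp([alpha[t - 1][a] + transition[a][y] for a in range(k)])
            for y in range(k)
        ])
    log_z = logsumexp(alpha[-1])

    # Backward pass in log space.
    beta = [[0.0] * k for _ in range(n)]
    for t in range(n - 2, -1, -1):
        for a in range(k):
            beta[t][a] = logsumexp([
                transition[a][y] + emission[t + 1][y] + beta[t + 1][y]
                for y in range(k)
            ])

    # Viterbi decoding: highest-scoring label sequence.
    delta = [list(emission[0])]
    back = []
    for t in range(1, n):
        row, ptr = [], []
        for y in range(k):
            best = max(range(k), key=lambda a: delta[t - 1][a] + transition[a][y])
            row.append(delta[t - 1][best] + transition[best][y] + emission[t][y])
            ptr.append(best)
        delta.append(row)
        back.append(ptr)
    last = max(range(k), key=lambda y: delta[-1][y])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    path.reverse()

    # Confidence 1: probability of the whole decoded sequence.
    seq_prob = math.exp(delta[-1][last] - log_z)
    # Confidence 2: marginal probability of each decoded label.
    marginals = [
        math.exp(alpha[t][path[t]] + beta[t][path[t]] - log_z) for t in range(n)
    ]
    return path, seq_prob, marginals

# Hypothetical three-line title page with three candidate labels.
LABELS = ["title", "author", "abstract"]
emission = [[2.0, 0.1, 0.0], [0.3, 1.5, 0.2], [0.1, 0.4, 0.5]]
transition = [[0.2, 1.0, 0.0], [-1.0, 0.5, 1.0], [-1.0, -1.0, 1.0]]

path, seq_prob, marginals = crf_confidence(emission, transition)
for line_no, (y, p) in enumerate(zip(path, marginals)):
    print(f"line {line_no}: {LABELS[y]} (marginal {p:.3f})")
print(f"sequence probability: {seq_prob:.3f}")
```

A page whose sequence probability, or whose lowest per-line marginal, falls below a chosen threshold could then be routed to manual checking. The threshold itself would have to be tuned on held-out data.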