Conference proceedings: Document Recognition and Retrieval XVI

Locating and Parsing Bibliographical References in HTML Medical Articles

Abstract

Bibliographical references that appear in journal articles can provide valuable hints for subsequent information extraction. We describe our statistical machine learning algorithms for locating and parsing such references in HTML medical journal articles. Reference locating identifies the reference sections and then decomposes them into individual references. We formulate reference locating as a two-class classification problem based on text and geometric features. An evaluation conducted on 500 articles from 100 journals achieves near-perfect precision and recall for locating references. Reference parsing identifies the components of each individual reference, e.g., author, article title, and journal title. We implement and compare two reference parsing algorithms. One relies on sequence statistics and trains a Conditional Random Field. The other focuses on local feature statistics: it trains a Support Vector Machine to classify each individual word, and a search algorithm then systematically corrects low-confidence labels if the label sequence violates a set of predefined rules. The overall performance of the two reference parsing algorithms is about the same: above 99% accuracy at the word level and over 97% accuracy at the chunk level.
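
The second parsing approach (per-word classification followed by rule-based correction of low-confidence labels) can be illustrated with a minimal Python sketch. The abstract does not specify the rules, the confidence threshold, or the search strategy, so everything below is an assumption made for illustration: the three-label set, the single ordering rule (author, then title, then journal), the threshold of 0.6, the exhaustive search, and the names `violates_rules`, `correct_labels`, and `example_scores` are all hypothetical, and the per-word scores are hand-written stand-ins for classifier output.

```python
from itertools import product

# Hypothetical label set and rule: fields must appear in this order,
# so a label sequence may never step back to an earlier field.
LABELS = ["AUTHOR", "TITLE", "JOURNAL"]
ORDER = {label: rank for rank, label in enumerate(LABELS)}


def violates_rules(labels):
    """Return True if any word's label precedes the previous word's label."""
    return any(ORDER[b] < ORDER[a] for a, b in zip(labels, labels[1:]))


def correct_labels(scores, threshold=0.6):
    """Relabel low-confidence words so that the sequence satisfies the rule.

    scores: one dict per word, mapping each label to a confidence value.
    Only words whose top confidence is below `threshold` may be changed.
    The search is exhaustive over those words (exponential in their count),
    which is acceptable for a sketch but not for long sequences.
    """
    best = [max(s, key=s.get) for s in scores]
    if not violates_rules(best):
        return best  # nothing to correct

    low = [i for i, s in enumerate(scores) if s[best[i]] < threshold]
    best_fix, best_total = None, float("-inf")
    for combo in product(LABELS, repeat=len(low)):
        candidate = list(best)
        for i, label in zip(low, combo):
            candidate[i] = label
        if violates_rules(candidate):
            continue
        total = sum(s.get(lab, 0.0) for s, lab in zip(scores, candidate))
        if total > best_total:
            best_fix, best_total = candidate, total
    # Fall back to the uncorrected labels if no valid relabeling exists.
    return best_fix if best_fix is not None else best


# Hand-written scores for the words: Smith / Parsing / of / references / Med
example_scores = [
    {"AUTHOR": 0.90, "TITLE": 0.05, "JOURNAL": 0.05},
    {"AUTHOR": 0.05, "TITLE": 0.90, "JOURNAL": 0.05},
    {"AUTHOR": 0.15, "TITLE": 0.40, "JOURNAL": 0.45},  # low confidence
    {"AUTHOR": 0.05, "TITLE": 0.85, "JOURNAL": 0.10},
    {"AUTHOR": 0.05, "TITLE": 0.05, "JOURNAL": 0.90},
]

print(correct_labels(example_scores))
```

In this example the third word is initially labeled JOURNAL with low confidence, which places a journal word in the middle of the title. Because only that word falls below the threshold, the search considers three relabelings and keeps the one that satisfies the ordering rule with the highest total confidence.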