Locating and Parsing Bibliographical References in HTML Medical Articles

Abstract

Bibliographical references that appear in journal articles can provide valuable hints for subsequent information extraction. We describe our statistical machine learning algorithms for locating and parsing such references in HTML medical journal articles. Reference locating identifies the reference sections and then decomposes them into individual references. We formulate reference locating as a two-class classification problem based on text and geometric features. An evaluation conducted on 500 articles from 100 journals achieves near-perfect precision and recall rates for locating references. Reference parsing identifies the components of each individual reference, e.g. author, article title, and journal title. We implement and compare two reference parsing algorithms. One relies on sequence statistics and trains a Conditional Random Field. The other focuses on local feature statistics: it trains a Support Vector Machine to classify each individual word, and a search algorithm then systematically corrects low-confidence labels if the label sequence violates a set of predefined rules. The overall performance of these two reference parsing algorithms is about the same: above 99% accuracy at the word level, and over 97% accuracy at the chunk level.
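
The second parsing approach (per-word classification followed by a rule-driven correction search) can be illustrated with a minimal sketch. The abstract does not specify the rules, the search strategy, or the form of the confidence scores, so everything below is an assumption made for illustration: the per-word score dictionaries stand in for classifier confidences, the single rule (each field forms at most one contiguous chunk) is hypothetical, and the names `violates_rules`, `correct_labels`, and `max_flex` are invented here.

```python
from itertools import product


def chunks(labels):
    """Collapse a label sequence into its runs of consecutive equal labels."""
    runs = []
    for lab in labels:
        if not runs or runs[-1] != lab:
            runs.append(lab)
    return runs


def violates_rules(labels):
    """Hypothetical rule: each field may form at most one contiguous chunk."""
    runs = chunks(labels)
    return len(runs) != len(set(runs))


def correct_labels(scores, max_flex=3):
    """scores: one dict per word, mapping label -> confidence (assumed input)."""
    best = [max(s, key=s.get) for s in scores]
    if not violates_rules(best):
        return best
    # Only the words with the lowest top confidence are allowed to change.
    flex = sorted(range(len(scores)), key=lambda i: scores[i][best[i]])[:max_flex]
    candidates = []
    for combo in product(*(scores[i].keys() for i in flex)):
        trial = list(best)
        for i, lab in zip(flex, combo):
            trial[i] = lab
        if not violates_rules(trial):
            total = sum(scores[i][lab] for i, lab in zip(flex, combo))
            candidates.append((total, trial))
    if not candidates:
        return best  # no rule-satisfying relabeling found within the budget
    return max(candidates, key=lambda c: c[0])[1]


# Made-up scores for the words: "Zou", "J", "Locating", "references", "2009"
example_scores = [
    {"author": 0.90, "title": 0.10, "year": 0.00},
    {"author": 0.80, "title": 0.20, "year": 0.00},
    {"author": 0.10, "title": 0.90, "year": 0.00},
    {"author": 0.55, "title": 0.45, "year": 0.00},  # low-confidence mistake
    {"author": 0.00, "title": 0.05, "year": 0.95},
]
fixed = correct_labels(example_scores)
print(fixed)
```

In this toy input the fourth word is initially labeled `author`, which splits the author field into two chunks; the search relabels it as `title` because that is the highest-scoring assignment that satisfies the rule. The exhaustive search over the `max_flex` least confident words is a simplification chosen for brevity, not a description of the authors' search algorithm.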

Bibliographic record

Source
Document Recognition and Retrieval XVI | 2009 | 724708.1-724708.12 | 12 pages
Conference location: San Jose, CA (US)
Authors
Jie Zou; Daniel Le; George R. Thoma;
Author affiliations

Lister Hill National Center for Biomedical Communications, National Library of Medicine 8600 Rockville Pike, Bethesda, MD 20894;

Original format: PDF
Text language: English
Keywords
reference parsing; HTML document analysis; document object model (DOM); support vector machine (SVM); conditional random field (CRF);

Similar literature

1. Locating and parsing bibliographic references in HTML medical articles [J]. Jie Zou, Daniel Le, George R. Thoma. International Journal on Document Analysis and Recognition. 2010, No. 2
2. Relevance of bibliographic references in articles published in medical journals [J]. Aguirre C Marcela, CONICy Santiago Chile, Oyarzún G Manuel. Revista Chilena de Enfermedades Respiratorias. 2012, No. 2
3. Improved bibliographic reference parsing based on repeated patterns [J]. Guido Sautter, Klemens Boehm. International Journal on Digital Libraries. 2014, No. 1a2
4. Locating and Parsing Bibliographical References in HTML Medical Articles [C]. Jie Zou, Daniel Le, George R. Thoma. SPIE Conference on Document Recognition and Retrieval. 2009
5. A classified, annotated bibliography of trumpet articles from selected medical and science periodicals [D]. Leopold, Gary Adrian, Jr. 2008
6. Locating and parsing bibliographic references in HTML medical articles [O]. Jie Zou, Daniel Le, George R. Thoma
7. A review on humane endpoints in animal experimentation for biomedical research [O]. Houshang Najafi, Reza Zarei, Abbas Alimoradian. 2020. Physiol Pharmacol. 2021; 25(1): 1-6. DOI: 10.32598/ppj.25.1.10. URL: http://ppj.phypha.ir/article-1-1606-en.html
