Journal: Bioinformatics

Multi-dimensional classification of biomedical text: Toward automated, practical provision of high-utility text to diverse users


Abstract

Motivation: Much current research in biomedical text mining is concerned with serving biologists by extracting certain information from scientific text. We note that there is no average biologist client; different users have distinct needs. For instance, as noted in past evaluation efforts (BioCreative, TREC, KDD), database curators are often interested in sentences showing experimental evidence and methods. Conversely, lab scientists searching for known information about a protein may seek facts, typically stated with high confidence. Text-mining systems can target specific end-users and become more effective if they can first identify text regions rich in the type of scientific content that is of interest to the user, retrieve documents that have many such regions, and focus on fact extraction from these regions. Here, we study the ability to characterize and classify such text automatically. We have recently introduced a multi-dimensional categorization and annotation scheme, developed to be applicable to a wide variety of biomedical documents and scientific statements, while intended to support specific biomedical retrieval and extraction tasks.

Results: The annotation scheme was applied to a large corpus in a controlled effort by eight independent annotators, with each sentence tagged independently by three annotators. We then trained and tested machine learning classifiers to automatically categorize sentence fragments based on the annotation. We discuss here the issues involved in this task, and present an overview of the results. The latter strongly suggest that automatic annotation along most of the dimensions is highly feasible, and that this new framework for scientific sentence categorization is applicable in practice.
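
The abstract says each sentence was tagged by three annotators along several dimensions, but it does not say how the three sets of tags were reconciled into training labels. The sketch below illustrates one common option, a per-dimension majority vote. Both the voting rule and the dimension names (`focus`, `certainty`) are assumptions made for illustration; they are not taken from the abstract.

```python
from collections import Counter

# Hypothetical dimension names, used only for illustration.
DIMENSIONS = ("focus", "certainty")


def majority_label(labels):
    """Return the label chosen by more than half of the annotators, else None."""
    label, count = Counter(labels).most_common(1)[0]
    return label if count * 2 > len(labels) else None


def aggregate(annotations):
    """Combine per-annotator tags into one consensus tag per dimension.

    `annotations` is a list of dicts, one per annotator, each mapping a
    dimension name to the label that annotator assigned.
    """
    return {
        dim: majority_label([a[dim] for a in annotations])
        for dim in DIMENSIONS
    }


# Three annotators tagging the same (made-up) sentence.
example = [
    {"focus": "scientific", "certainty": "high"},
    {"focus": "scientific", "certainty": "low"},
    {"focus": "methodology", "certainty": "medium"},
]
consensus = aggregate(example)
print(consensus)
```

In this toy input two of three annotators agree on `focus`, so a consensus label exists, while all three disagree on `certainty`, so the function returns `None` for that dimension. Such unresolved cases would need a separate policy (for example, exclusion from training) before a classifier is trained per dimension.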
