Conference: International Conference on Intelligent Systems for Molecular Biology

Evaluation of text data mining for database curation: lessons learned from the KDD Challenge Cup

Abstract

Motivation: The biological literature is a major repository of knowledge. Many biological databases draw much of their content from careful curation of this literature. However, as the volume of literature increases, so does the burden of curation. Text mining may provide useful tools to assist in the curation process. To date, the lack of standards has made it impossible to determine whether text mining techniques are sufficiently mature to be useful.

Results: We report on a Challenge Evaluation task that we created for the Knowledge Discovery and Data Mining (KDD) Challenge Cup. We provided a training corpus of 862 journal articles curated in FlyBase, along with the associated lists of genes and gene products, as well as the relevant data fields from FlyBase. For the test, we provided a corpus of 213 new ('blind') articles; the 18 participating groups provided systems that flagged articles for curation, based on whether the article contained experimental evidence for gene expression products. We report on the evaluation results and describe the techniques used by the top performing groups.