International Symposium on Knowledge Exploration in Life Science Informatics (KELSI 2004); 25-26 November 2004; Milan, Italy

Analysis of Protein/Protein Interactions Through Biomedical Literature: Text Mining of Abstracts vs. Text Mining of Full Text Articles

Abstract

The challenge of knowledge management in the pharmaceutical industry is twofold. First, it has to address the integration of sequence data with the vast and growing body of data from functional analysis of genes and with the information in huge historical archival databases. Second, as the number of biomedical publications increases exponentially (Medline now contains more than 13 million records), researchers require assistance in order to broaden their vision and comprehension of scientific domains. Analogous to data mining in the sense that it uncovers relationships in information, text mining uncovers relationships in a text collection and leverages the creativity of the knowledge worker in the exploration of these relationships and in the discovery of new knowledge. We describe herein a text mining method to automatically detect protein interactions that are described across a large number of scientific publications. This method relies on natural language processing to identify protein names, their synonyms, and the various interactions they can have with other proteins. We then compared text mining analysis of abstracts with the same kind of analysis of full text articles to assess how much information is lost when only abstracts are processed. Our results show that: 1) LexiQuest Mine is a very versatile and accurate tool for mining biomedical literature to analyze interactions between proteins. 2) Mining only abstracts can be sufficient and time-saving for applications that do not require a high level of detail on a large scale, whereas mining full text articles is preferable for more exhaustive applications designed to address a specific issue. Availability: LexiQuest Mine is available for commercial licensing from SPSS, Inc.
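
To make the idea concrete, the following is a minimal illustrative sketch in Python of dictionary-based protein name normalization, sentence-level detection of interactions, and a comparison of what an abstract yields against what the full text yields. It is not the LexiQuest Mine implementation, whose internals are not described in the abstract. The synonym dictionary `SYNONYMS`, the verb list in `INTERACTION_PATTERN`, and the functions `extract_interactions` and `abstract_recall` are all hypothetical names chosen for illustration, and the co-occurrence heuristic is a deliberately simple stand-in for real natural language processing.

```python
import re
from itertools import combinations

# Hypothetical synonym dictionary: lowercase surface form -> canonical name.
# A real system would use a curated lexicon; these entries are illustrative.
SYNONYMS = {
    "p53": "TP53",
    "tp53": "TP53",
    "mdm2": "MDM2",
    "hdm2": "MDM2",
    "brca1": "BRCA1",
    "rad51": "RAD51",
}

# Hypothetical list of interaction verb stems (assumption, not from the paper).
INTERACTION_PATTERN = re.compile(
    r"\b(bind|interact|phosphorylat|inhibit|activat)\w*", re.IGNORECASE
)

TOKEN_PATTERN = re.compile(r"[A-Za-z0-9]+")


def extract_interactions(text):
    """Return the set of unordered canonical protein pairs that co-occur
    in a sentence containing at least one interaction verb."""
    pairs = set()
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        if not INTERACTION_PATTERN.search(sentence):
            continue
        # Map every recognized surface form to its canonical protein name.
        proteins = {
            SYNONYMS[tok.lower()]
            for tok in TOKEN_PATTERN.findall(sentence)
            if tok.lower() in SYNONYMS
        }
        # Sorting makes (A, B) and (B, A) the same pair.
        for a, b in combinations(sorted(proteins), 2):
            pairs.add((a, b))
    return pairs


def abstract_recall(abstract, full_text):
    """Fraction of pairs found in the full text that the abstract also yields."""
    full_pairs = extract_interactions(full_text)
    if not full_pairs:
        return 0.0
    return len(extract_interactions(abstract) & full_pairs) / len(full_pairs)


if __name__ == "__main__":
    abstract = "We show that p53 binds Hdm2."
    full_text = (
        "We show that p53 binds Hdm2. "
        "In addition, BRCA1 interacts with RAD51 in vivo."
    )
    print(extract_interactions(abstract))
    print(abstract_recall(abstract, full_text))
```

In this toy input the abstract mentions one of the two interacting pairs present in the full text, so the recall-style ratio gives a rough sense of how much information is lost when only abstracts are processed. Sentence-level co-occurrence will over-report interactions in practice, which is one reason a production tool would rely on deeper linguistic analysis.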