Published in: Fuzzy Systems and Knowledge Discovery (FSKD), 2008 Fifth International Conference on

Detection of Protein Subcellular Localization Based on a Full Syntactic Parser and Semantic Information


Abstract

A protein's subcellular localization is considered an essential part of the description of its associated biomolecular phenomena. As the volume of biomolecular reports has increased, there has been a great deal of research on text mining to detect protein subcellular localization information in documents. It has been argued that linguistic information, especially syntactic information, is useful for identifying the subcellular localizations of proteins of interest. However, previous systems for detecting protein subcellular localization information used only shallow syntactic parsers, and showed poor performance. Thus, there remains a need to use a full syntactic parser and to apply deep linguistic knowledge to the analysis of text for protein subcellular localization information. In addition, we have attempted to use semantic information from the WordNet thesaurus. To improve performance in detecting protein subcellular localization information, this paper proposes a three-step method based on a full syntactic dependency parser and semantic information. In the first step, we construct syntactic dependency paths from each protein to its location candidate. In the second step, we retrieve root information of the syntactic dependency paths. In the final step, we extract syn-semantic patterns of protein subtrees and location subtrees. From the root and subtree nodes, we extract syntactic category and syntactic direction as syntactic information, and synset offset of the WordNet thesaurus as semantic information. According to the root information and syn-semantic patterns of subtrees, we extract (protein, localization) pairs. Even with no biomolecular knowledge, our method shows reasonable performance in experimental results using Medline abstract data. In fact, our proposed method gave an F-measure of 74.53% for training data and 58.90% for test data, significantly outperforming previous methods, by 12–25%.
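
The first two steps described above (building a dependency path from a protein to a location candidate, then reading off the root of that path) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy sentence, the hand-written parse, the category labels, and the function names are hypothetical and do not come from the paper, whose actual parser and pattern format are not given in the abstract. Semantic features such as WordNet synset offsets are left out.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical hand-written dependency parse of the toy sentence
# "ProtA is localized in the nucleus".
# Each token maps to (head token, syntactic category); the sentence
# root has head None. Keying by word form assumes no repeated words.
PARSE: Dict[str, Tuple[Optional[str], str]] = {
    "localized": (None, "VERB"),
    "ProtA": ("localized", "NOUN"),
    "is": ("localized", "AUX"),
    "in": ("localized", "PREP"),
    "nucleus": ("in", "NOUN"),
    "the": ("nucleus", "DET"),
}


def path_to_top(token: str, parse: Dict[str, Tuple[Optional[str], str]]) -> List[str]:
    """Return the chain of tokens from `token` up to the sentence root."""
    chain = [token]
    while parse[chain[-1]][0] is not None:
        chain.append(parse[chain[-1]][0])
    return chain


def dependency_path(
    protein: str, location: str, parse: Dict[str, Tuple[Optional[str], str]]
) -> Tuple[str, List[str], List[str]]:
    """Return (path root, protein-side tokens, location-side tokens).

    The path root is the lowest token that dominates both the protein
    and the location candidate in the dependency tree.
    """
    up_protein = path_to_top(protein, parse)
    up_location = path_to_top(location, parse)
    protein_ancestors = set(up_protein)
    # First token on the location's upward chain that also lies on the
    # protein's upward chain is the lowest common ancestor.
    root = next(t for t in up_location if t in protein_ancestors)
    protein_side = up_protein[: up_protein.index(root)]
    location_side = up_location[: up_location.index(root)]
    return root, protein_side, location_side


def category_pattern(tokens: List[str], parse: Dict[str, Tuple[Optional[str], str]]) -> List[str]:
    """Map a list of tokens to their syntactic categories."""
    return [parse[t][1] for t in tokens]


if __name__ == "__main__":
    root, p_side, l_side = dependency_path("ProtA", "nucleus", PARSE)
    print(root, category_pattern(p_side, PARSE), category_pattern(l_side, PARSE))
```

In this sketch the root token and the category sequences on each side stand in for the root information and subtree patterns mentioned in the abstract. A real system would obtain the parse from a full dependency parser rather than from a hand-written table.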
