BioMed Research International

Improving Classification of Protein Interaction Articles Using Context Similarity-Based Feature Selection


Abstract

Protein interaction article classification is a text classification task in the biological domain that determines which articles describe protein-protein interactions. Because the feature space in text classification is high-dimensional, feature selection is widely used to reduce the dimensionality of the features and speed up computation without sacrificing classification performance. Many existing feature selection methods are based on the statistical measures of document frequency and term frequency. One potential drawback of these methods is that they treat features separately. We therefore first design a similarity measure between the context information of features, which takes the word co-occurrences and phrase chunks around the features into account. We then introduce this context similarity into the importance measure of the features, where it substitutes for the document and term frequency. The result is a set of new context similarity-based feature selection methods. Their performance is evaluated on two protein interaction article collections and compared against the frequency-based methods. The experimental results show that the context similarity-based methods perform better in terms of the F1 measure and the dimension reduction rate. By using the context information surrounding the features, the proposed methods can effectively select distinctive features for protein interaction article classification.
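The contrast between a frequency-based and a context-based importance signal can be illustrated with a minimal sketch. The abstract does not define the authors' similarity measure, so everything below is an assumption made for illustration: a term's context is taken to be the bag of words within a fixed window around each occurrence, contexts are compared by cosine similarity, and the per-class signal is the mean pairwise similarity across documents containing the term. Phrase chunks are not modeled, and the function names (`contexts`, `cosine`, `document_frequency`, `context_similarity`) and toy documents are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def contexts(tokens, term, window=2):
    """Bag of words within `window` positions of every occurrence of `term`."""
    bag = Counter()
    for i, tok in enumerate(tokens):
        if tok == term:
            lo, hi = max(0, i - window), i + window + 1
            bag.update(tokens[lo:i] + tokens[i + 1:hi])
    return bag


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def document_frequency(docs, term):
    """Frequency-based signal: number of documents that contain `term`."""
    return sum(term in d for d in docs)


def context_similarity(docs, term, window=2):
    """Assumed context-based signal: mean pairwise cosine similarity of the
    term's contexts across the documents that contain it."""
    bags = [contexts(d, term, window) for d in docs if term in d]
    pairs = list(combinations(bags, 2))
    if not pairs:
        return 0.0
    return sum(cosine(a, b) for a, b in pairs) / len(pairs)


# Toy tokenized documents (invented for illustration only).
positive = [
    "raf binds directly to ras".split(),
    "mek binds directly to erk".split(),
]
negative = [
    "the drug binds plasma albumin".split(),
    "this dye binds cellular membranes".split(),
]

for name, docs in (("positive", positive), ("negative", negative)):
    print(name, document_frequency(docs, "binds"), context_similarity(docs, "binds"))
```

In this toy data the term `binds` has the same document frequency in both classes, so a purely frequency-based score cannot separate them, while its contexts are more alike in the positive class than in the negative class. How the published methods turn such a similarity into a feature ranking is not stated in the abstract.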
