Journal: Computers in Biology and Medicine

A filter feature selection method based on the Maximal Information Coefficient and Gram-Schmidt Orthogonalization for biomedical data mining


Abstract

Filter feature selection techniques have been widely used to mine biomedical data. Recently, a risk was revealed in the classical filter method minimal-Redundancy-Maximal-Relevance (mRMR): a specific part of the redundancy, called irrelevant redundancy, may be involved in the minimal-redundancy component of the method. A few attempts have therefore been made to eliminate the irrelevant redundancy by attaching additional procedures to mRMR, such as Kernel Canonical Correlation Analysis based mRMR (KCCAmRMR). In the present study, a novel filter feature selection method based on the Maximal Information Coefficient (MIC) and Gram-Schmidt Orthogonalization (GSO), named Orthogonal MIC Feature Selection (OMICFS), is proposed to solve this problem. Unlike other improved approaches under the max-relevance and min-redundancy criterion, the proposed method uses the MIC to quantify the degree of relevance between the feature variables and the target variable, and uses the GSO to calculate the orthogonalized variable of a candidate feature with respect to the previously selected features. Max-relevance and min-redundancy are then optimized indirectly by maximizing the MIC relevance between the GSO-orthogonalized variable and the target. This orthogonalization strategy allows OMICFS to exclude the irrelevant redundancy without any additional procedures. To verify its performance, OMICFS was compared with other filter feature selection methods in terms of both classification accuracy and computational efficiency through classification experiments on two types of biomedical datasets. The results showed that OMICFS outperforms the other methods in most cases. In addition, the differences between these methods were analyzed, and the application of OMICFS to the mining of high-dimensional biomedical data was discussed. The Matlab code for the proposed method is available at https://github.com/lhqxinghun/bioinformatics/tree/master/OMICFS/ .

Highlights

  • A novel filter feature selection method named OMICFS is proposed.
  • The MIC statistic is employed to quantify the relevance between features and target.
  • An orthogonalization strategy is used to deal with the irrelevant redundancy risk.
  • The performance is compared in terms of both accuracy and efficiency.
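The selection loop described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' Matlab implementation, and it rests on several assumptions: the function names (`select_features`, `orthogonalize`, `abs_pearson`) are hypothetical, the toy data are made up for illustration, and the absolute Pearson correlation is used as a stand-in for MIC so that the example runs with the standard library only. A real MIC estimator could be passed in through the `relevance` parameter.

```python
import math


def center(v):
    """Subtract the mean so that dot products behave like covariances."""
    m = sum(v) / len(v)
    return [x - m for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def abs_pearson(x, y):
    """Stand-in relevance score in [0, 1]. ASSUMPTION: the paper uses MIC here."""
    xc, yc = center(x), center(y)
    denom = math.sqrt(dot(xc, xc) * dot(yc, yc))
    return abs(dot(xc, yc)) / denom if denom > 1e-12 else 0.0


def orthogonalize(v, basis):
    """Gram-Schmidt step: remove from v its projection onto each basis vector."""
    r = list(v)
    for b in basis:
        bb = dot(b, b)
        if bb > 1e-12:
            coef = dot(r, b) / bb
            r = [ri - coef * bi for ri, bi in zip(r, b)]
    return r


def select_features(columns, target, k, relevance=abs_pearson):
    """Greedy selection: orthogonalize every remaining candidate against the
    already selected (orthogonalized) features and keep the candidate whose
    residual is most relevant to the target."""
    centered = [center(c) for c in columns]
    selected, basis = [], []
    remaining = list(range(len(columns)))
    for _ in range(min(k, len(columns))):
        best_j, best_score, best_resid = None, -1.0, None
        for j in remaining:
            resid = orthogonalize(centered[j], basis)
            score = relevance(resid, target)
            if score > best_score:
                best_j, best_score, best_resid = j, score, resid
        selected.append(best_j)
        basis.append(best_resid)
        remaining.remove(best_j)
    return selected


# Toy data (made up): f1 is an exact rescaling of f0, so it adds nothing new.
y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
f0 = [1.0, 2.0, 3.0, 4.0, 5.0, 7.0]
f1 = [2.0 * x for x in f0]
f2 = [0.0, 1.0, 0.0, 1.0, 0.0, 1.0]

selected = select_features([f0, f1, f2], y, k=2)
print(selected)
```

In this toy run, once one of the two rescaled features has been chosen, the other has a zero residual after orthogonalization and therefore scores zero, so the second pick is the feature that still carries information not explained by the first. No separate redundancy term is computed, which is the point the abstract makes about optimizing relevance and redundancy indirectly.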
