Pacific Symposium on Biocomputing (PSB), January 5-9, 2009, Kohala Coast, HI (US)

SUPERVISED CLASSIFICATION OF ARRAY CGH DATA WITH HMM-BASED FEATURE SELECTION


Abstract

Motivation: For various tumour types, detailed knowledge of the molecular mechanisms involved in tumorigenesis is lacking. Looking for copy number variations (CNV) by Comparative Genomic Hybridization (CGH) can, however, help to determine key elements in this tumorigenesis. Because genome-wide array CGH makes it possible to evaluate CNV at high resolution, it produces a huge amount of data, which calls for adequate mathematical methods to carefully select and interpret these data. Results: Two groups of patients differing in cancer subtype were defined in two publicly available array CGH data sets as well as in our own data set on ovarian cancer. Chromosomal regions characterizing each group of patients were identified using recurrent hidden Markov Models (HMM). The differential regions were reduced to a subset of features for classification by integrating different univariate feature selection methods. Weighted Least Squares Support Vector Machines (LS-SVM), a supervised classification method that takes the class imbalance of data sets into account, yielded leave-one-out or 10-fold cross-validation accuracies ranging from 88% to 95.5%. Conclusion: The combination of recurrent HMMs for the detection of copy number alterations with LS-SVM classifiers offers a novel methodological approach for classification based on copy number alterations. Additionally, this approach limits the chromosomal regions that are necessary to classify patients according to cancer subtype.
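
To make the classification step more concrete, the following is a minimal sketch of a weighted LS-SVM classifier evaluated with leave-one-out cross-validation. It is not the authors' implementation: the RBF kernel, the class weights (inversely proportional to class size), the hyperparameters `gamma` and `sigma`, and the synthetic data standing in for selected array CGH features are all assumptions made for illustration.

```python
import numpy as np

def rbf_kernel(A, B, sigma):
    # Squared Euclidean distances between rows of A and rows of B.
    d2 = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma ** 2))

def fit_weighted_lssvm(X, y, gamma, sigma):
    # y holds labels in {-1, +1}. Assumed weighting scheme: each sample is
    # weighted by n / (2 * size of its class), so the smaller class counts more.
    n = len(y)
    n_pos = np.sum(y == 1)
    n_neg = n - n_pos
    v = np.where(y == 1, n / (2.0 * n_pos), n / (2.0 * n_neg))
    omega = np.outer(y, y) * rbf_kernel(X, X, sigma)
    # LS-SVM training reduces to one linear system in (b, alpha):
    # [[0, y^T], [y, Omega + diag(1 / (gamma * v))]] [b; alpha] = [0; 1]
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = y
    A[1:, 0] = y
    A[1:, 1:] = omega + np.diag(1.0 / (gamma * v))
    rhs = np.concatenate(([0.0], np.ones(n)))
    sol = np.linalg.solve(A, rhs)
    return sol[0], sol[1:]

def predict(X_train, y_train, b, alpha, X_new, sigma):
    # Decision function: sign(sum_i alpha_i * y_i * K(x, x_i) + b).
    K = rbf_kernel(X_new, X_train, sigma)
    return np.sign(K @ (alpha * y_train) + b)

def loo_accuracy(X, y, gamma, sigma):
    # Leave-one-out: train on all samples but one, test on the held-out one.
    idx = np.arange(len(y))
    correct = 0
    for i in idx:
        mask = idx != i
        b, alpha = fit_weighted_lssvm(X[mask], y[mask], gamma, sigma)
        pred = predict(X[mask], y[mask], b, alpha, X[i:i + 1], sigma)[0]
        correct += int(pred == y[i])
    return correct / len(y)

# Synthetic, unbalanced two-class data (30 vs. 10 samples, 5 features).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(1.5, 1.0, (30, 5)), rng.normal(-1.5, 1.0, (10, 5))])
y = np.concatenate([np.ones(30), -np.ones(10)])

b, alpha = fit_weighted_lssvm(X, y, gamma=10.0, sigma=3.0)
acc = loo_accuracy(X, y, gamma=10.0, sigma=3.0)
print(f"leave-one-out accuracy on synthetic data: {acc:.3f}")
```

In practice, the rows of `X` would be the per-patient values of the features retained after the HMM and univariate selection steps, and `gamma` and `sigma` would be tuned rather than fixed as they are here.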
