Finding multiple clustering structures in data, with applications to DNA microarrays.

Abstract

Cluster analysis is the art of discovering classes in data. Traditionally, the goal of cluster analysis has been to uncover the unknown clustering structure by partitioning the observations into a single set of clusters such that the observations within each cluster are more similar to one another than those assigned to different clusters. However, as the number of variables gets larger, it becomes increasingly unlikely for any pair of observations to be similar across all the variables simultaneously. In contrast, the observations tend to group better on small subsets of the variables. Moreover, different subsets of variables might induce different and potentially useful clustering structures of observations. In this work, the standard clustering problem of finding a single clustering structure of observations is first generalized to the problem of discovering multiple clustering structures and finding variables that induce them. Three dissimilarity measures based on entropy, empirical measures and interpoint-distance based graphs are proposed for clustering variables and their performance is compared to the widely used correlation-based dissimilarity. A procedure based on binning that makes the computation of the first two of these dissimilarity measures feasible is developed. We also propose a weighted distance two-way clustering method for discovering multiple clustering structures in the data and give a randomization test for similarity of clustering structures. The motivating application is to gene expression data.
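
The abstract does not give formulas for the three proposed dissimilarity measures, so the following is only a rough sketch of the general idea of comparing a binned, entropy-based dissimilarity between two variables with the correlation-based one. It assumes equal-width binning and uses the normalized variation of information, 1 - I(X;Y)/H(X,Y), as a stand-in entropy-based measure; the function names (`bin_values`, `entropy_dissimilarity`, `correlation_dissimilarity`) and the bin count are illustrative choices, not taken from the dissertation.

```python
import math
from collections import Counter


def bin_values(xs, n_bins=4):
    """Equal-width binning: map each value to a bin index in 0..n_bins-1."""
    lo, hi = min(xs), max(xs)
    if hi == lo:
        return [0] * len(xs)
    width = (hi - lo) / n_bins
    # The maximum value would land in bin n_bins, so clip it into the last bin.
    return [min(int((x - lo) / width), n_bins - 1) for x in xs]


def entropy(labels):
    """Empirical Shannon entropy (in nats) of a sequence of discrete labels."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def entropy_dissimilarity(x, y, n_bins=4):
    """Normalized variation of information of the binned variables.

    Assumed stand-in for an entropy-based measure: 0 when the binned
    variables determine each other, 1 when they share no information.
    """
    bx, by = bin_values(x, n_bins), bin_values(y, n_bins)
    h_joint = entropy(list(zip(bx, by)))
    if h_joint == 0:
        return 0.0
    mutual_info = entropy(bx) + entropy(by) - h_joint
    return 1.0 - mutual_info / h_joint


def correlation_dissimilarity(x, y):
    """1 - |Pearson correlation|, a common correlation-based dissimilarity."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return 1.0 - abs(sxy / math.sqrt(sxx * syy))


# Two variables with a strong but nonlinear relationship: y = x^2 on [-1, 1].
x = [-1 + 2 * i / 199 for i in range(200)]
y = [v * v for v in x]

print("entropy-based:    ", round(entropy_dissimilarity(x, y), 3))
print("correlation-based:", round(correlation_dissimilarity(x, y), 3))
```

On this toy pair the correlation-based dissimilarity is essentially at its maximum, because the relationship is symmetric and nonlinear, while the binned entropy-based value is noticeably smaller. This is one reason an entropy-based measure might group variables that correlation would keep apart. The choice of bin count affects the result and would need tuning on real expression data.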

Bibliographic record

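Author: Belitskaya, Ilana Yolyevna
Author affiliation: Stanford University
Degree-granting institution: Stanford University
Subject: Statistics
Degree: Ph.D.
Year: 2003
Page: p.1319
Total pages: 148
Original format: PDF
Language of text: English
Chinese Library Classification: Statistics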

