User-controlled iterative sub-clustering of large data sets guided by statistical heuristics
Abstract
The present invention relates to data analysis, and in particular to methods for cluster analysis. It provides a method that summarizes and illustrates an original data set by iteratively breaking it into sub-divisions that together form a hierarchical cluster structure. The method comprises at least the steps of collecting a parametrically predetermined number of samples from a given original data set, in which each data item is described by a vector of values, and iterating each of the following steps at least once: presenting to the user the hierarchical cluster structure built by the iterations completed so far, together with the list of variables specified by the data set, presented in a manner that indicates a heuristic for optimal distinctiveness within the cluster; receiving from the user a selection of a supercluster to be sub-divided and of a sub-dividing variable; collecting a sample of a fixed number of items from the original data set that fall within the union of interval values for each of the variables that defined the supercluster in previous iterations; and performing a sub-division of said cluster on said selected dividing variable.
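The loop described in the abstract can be sketched in Python as follows. This is only an illustration under stated assumptions, not the patented method: all names (`Cluster`, `draw_sample`, `rank_variables`, `subdivide`, `one_iteration`) are hypothetical, per-variable variance stands in for the unspecified distinctiveness heuristic, a median cut stands in for the unspecified sub-division, each cluster is assumed to be defined by one half-open interval per variable, and a callback takes the place of the interactive user.

```python
import random
import statistics
from dataclasses import dataclass, field

@dataclass
class Cluster:
    # Assumption: one half-open interval [low, high) per constrained variable.
    bounds: dict = field(default_factory=dict)
    children: list = field(default_factory=list)

def in_cluster(item, cluster):
    """True if the item satisfies every interval that defines the cluster."""
    return all(lo <= item[v] < hi for v, (lo, hi) in cluster.bounds.items())

def draw_sample(data, cluster, n, rng):
    """Collect up to n items of the original data set that fall in the cluster."""
    members = [x for x in data if in_cluster(x, cluster)]
    return rng.sample(members, min(n, len(members)))

def rank_variables(sample, n_vars):
    """Stand-in heuristic (assumption): order variables by sample variance."""
    scores = {
        v: statistics.pvariance([x[v] for x in sample]) if sample else 0.0
        for v in range(n_vars)
    }
    return sorted(scores, key=scores.get, reverse=True)

def subdivide(cluster, sample, var):
    """Stand-in sub-division (assumption): split at the sample median of var."""
    cut = statistics.median(x[var] for x in sample)
    lo, hi = cluster.bounds.get(var, (float("-inf"), float("inf")))
    cluster.children = [
        Cluster({**cluster.bounds, var: (lo, cut)}),
        Cluster({**cluster.bounds, var: (cut, hi)}),
    ]
    return cluster.children

def leaves(cluster):
    """Clusters of the hierarchy that have not been sub-divided yet."""
    if not cluster.children:
        return [cluster]
    return [leaf for child in cluster.children for leaf in leaves(child)]

def one_iteration(data, root, choose, n, rng):
    """Present the hierarchy, receive a selection, sample, and sub-divide."""
    n_vars = len(data[0])
    # "Presentation": each open cluster paired with its ranked variable list.
    view = [
        (leaf, rank_variables(draw_sample(data, leaf, n, rng), n_vars))
        for leaf in leaves(root)
    ]
    target, var = choose(view)  # the user's selection, modelled as a callback
    sample = draw_sample(data, target, n, rng)
    return subdivide(target, sample, var)

# Usage: a scripted "user" that always picks the first cluster and its
# top-ranked variable; the synthetic two-variable data is made up.
rng = random.Random(0)
data = [(rng.gauss(0 if i % 2 else 10, 1), rng.gauss(0, 0.1)) for i in range(1000)]
root = Cluster()
one_iteration(data, root, lambda view: (view[0][0], view[0][1][0]), n=200, rng=rng)
```

Sampling a fixed number of items per iteration keeps the cost of each step bounded by the sample size rather than by the size of the original data set, which appears to be the motivation for the sampling step in the abstract.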