
K-MEANS METHOD FOR CLUSTERING HOMOGENEOUS INFORMATION BY A DISTANCE BASED ON PREFERENCES AND RATIOS.


Abstract

The invention relates to a clustering method known as k-means, with innovations intended to customize and homogenize data mining. The user identifies the clustering attributes and sets a preference degree for each one (e.g., to demand a closer proximity among the members of a cluster, or to allow them more freedom). The user also sets the convergence threshold (i.e., the percentage of satisfaction required in the reassignment of members to clusters). The method provides initial values that characterize the centroids in symmetric regions with the same ratio (i.e., the minimum and maximum values of each region are symmetric with respect to its centroid). The method computes the Euclidean distance in a standardized and balanced way: because attribute values are expressed in heterogeneous units (e.g., hundredths, millions), the accumulated differences between centroids and attribute values are replaced by distance percentages (i.e., the difference between the centroid and the value of an attribute is divided among the centroid and all the values existing for that attribute in the information repository). These percentages are then magnified or reduced in proportion to the preference assigned to the attribute: the more relevant the attribute, the more its percentage grows (a factor greater than 1.0, e.g., 1.1 or 1.25); the less relevant the attribute, the more its percentage proportionally decreases (a factor less than 1.0, e.g., 0.9 or 0.75). Finally, the method ends the mining once the user-defined threshold is satisfied (i.e., it avoids further processing of members that were only recently reassigned to a new cluster).
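The abstract describes these steps only in prose. The following minimal Python sketch shows one possible reading of them; it is not the patented implementation. Several details are assumptions made for illustration: each difference is normalized by the attribute's range (max - min) over the whole repository; the initial centroids are the midpoints of k equal-width regions per attribute; and the convergence threshold is read as the fraction of members whose cluster did not change in the latest reassignment. All function names (`attribute_ranges`, `weighted_percentage_distance`, `symmetric_initial_centroids`, `preference_kmeans`) are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Row = Sequence[float]


def attribute_ranges(data: Sequence[Row]) -> List[float]:
    """Range (max - min) of each attribute over the whole repository."""
    ranges = []
    for column in zip(*data):
        span = max(column) - min(column)
        # ASSUMPTION: a constant attribute gets range 1.0 to avoid dividing by zero.
        ranges.append(span if span > 0 else 1.0)
    return ranges


def weighted_percentage_distance(x: Row, c: Row, ranges: Row, prefs: Row) -> float:
    """Euclidean distance over preference-weighted percentage differences.

    ASSUMPTION: each |x_j - c_j| is divided by the attribute's range, so values
    in hundredths and in millions contribute on the same scale. A preference
    factor > 1.0 magnifies the attribute; a factor < 1.0 shrinks it.
    """
    total = 0.0
    for xj, cj, rj, wj in zip(x, c, ranges, prefs):
        pct = abs(xj - cj) / rj
        total += (wj * pct) ** 2
    return math.sqrt(total)


def symmetric_initial_centroids(data: Sequence[Row], k: int) -> List[List[float]]:
    """ASSUMPTION: split each attribute's [min, max] into k equal-width regions
    and place centroid i at the midpoint of region i, so each region's minimum
    and maximum are symmetric with respect to its centroid."""
    lows = [min(col) for col in zip(*data)]
    highs = [max(col) for col in zip(*data)]
    return [
        [lo + (hi - lo) * (i + 0.5) / k for lo, hi in zip(lows, highs)]
        for i in range(k)
    ]


def preference_kmeans(
    data: Sequence[Row],
    k: int,
    prefs: Row,
    threshold: float = 0.95,
    max_iter: int = 100,
) -> Tuple[List[int], List[List[float]]]:
    """k-means that stops once `threshold` (a fraction in [0, 1]) of the members
    kept their cluster in the last reassignment (ASSUMED reading of the
    user-defined convergence threshold)."""
    ranges = attribute_ranges(data)
    centroids = symmetric_initial_centroids(data, k)
    labels = [-1] * len(data)
    for _ in range(max_iter):
        # Assign every member to its nearest centroid under the weighted distance.
        new_labels = [
            min(range(k), key=lambda i: weighted_percentage_distance(x, centroids[i], ranges, prefs))
            for x in data
        ]
        stable = sum(old == new for old, new in zip(labels, new_labels)) / len(data)
        labels = new_labels
        # Move each centroid to the mean of its members; empty clusters keep theirs.
        for i in range(k):
            members = [x for x, lab in zip(data, labels) if lab == i]
            if members:
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
        if stable >= threshold:
            break
    return labels, centroids


if __name__ == "__main__":
    # Two attributes in very different units: a rate and an amount in millions.
    data = [[0.01, 1_000_000], [0.02, 1_100_000], [0.90, 9_000_000], [0.95, 9_200_000]]
    labels, centroids = preference_kmeans(data, k=2, prefs=[1.25, 0.75])
    print(labels)  # → [0, 0, 1, 1]
```

With `prefs=[1.25, 0.75]`, the first attribute weighs more than the second even though its raw values are millions of times smaller; without the range normalization, the second attribute would dominate a plain Euclidean distance.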
