
A Discretization Algorithm of Continuous Attributes Based on Supervised Clustering


Abstract

Many machine learning algorithms can be applied only to data described by categorical attributes, so discretization of continuous attributes is an important preprocessing step in knowledge extraction. Traditional clustering-based discretization algorithms need a pre-determined number of clusters k and are typically applied in an unsupervised learning framework. This paper describes an algorithm called SX-means (Supervised X-means), a new clustering-based algorithm for supervised discretization of continuous attributes. The algorithm dynamically modifies clusters using knowledge of the class distribution, and this procedure continues until a proper k is found. Because the number of clusters k is not pre-determined by the user and the class distribution is taken into account, the randomness of the result is greatly reduced. Experimental evaluation of several discretization algorithms on six artificial data sets shows that the proposed algorithm is more efficient and generates a better discretization scheme. In a comparison of C4.5 output, the resulting tree is smaller, has fewer classification rules, and achieves higher classification accuracy.
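
The abstract does not give the details of SX-means, so the following is only a minimal sketch of the general idea it describes: clusters on a single continuous attribute are split repeatedly, the class distribution decides whether a split is kept, and the number of intervals k emerges from the procedure instead of being supplied by the user. The function names (`entropy`, `two_means_cut`, `supervised_cuts`), the entropy-gain acceptance test, and the `min_gain` and `min_size` parameters are all assumptions made for illustration, not the authors' method.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def two_means_cut(values, iters=50):
    """Run 1-D 2-means seeded at the min and max; return the boundary value.

    Returns None when all values are identical and no split is possible.
    """
    lo, hi = min(values), max(values)
    if lo == hi:
        return None
    for _ in range(iters):
        cut = (lo + hi) / 2.0
        left = [v for v in values if v <= cut]
        right = [v for v in values if v > cut]
        new_lo, new_hi = sum(left) / len(left), sum(right) / len(right)
        if (new_lo, new_hi) == (lo, hi):
            break  # centers stopped moving
        lo, hi = new_lo, new_hi
    return (lo + hi) / 2.0


def supervised_cuts(values, labels, min_gain=0.1, min_size=2):
    """Return sorted cut points; k = len(cuts) + 1 is not chosen in advance.

    Assumed acceptance rule (illustrative only): a 2-means split is kept
    when it lowers the weighted class entropy by at least `min_gain`.
    """
    cuts = []

    def split(points):
        ys = [y for _, y in points]
        parent = entropy(ys)
        # Stop on small or class-pure clusters.
        if len(points) < 2 * min_size or parent == 0.0:
            return
        cut = two_means_cut([x for x, _ in points])
        if cut is None:
            return
        left = [p for p in points if p[0] <= cut]
        right = [p for p in points if p[0] > cut]
        after = (len(left) * entropy([y for _, y in left])
                 + len(right) * entropy([y for _, y in right])) / len(points)
        if parent - after < min_gain:
            return  # the class distribution does not justify this split
        cuts.append(cut)
        split(left)
        split(right)

    split(list(zip(values, labels)))
    return sorted(cuts)


values = [1, 2, 3, 4, 5, 6, 20, 21, 22]
labels = ["a", "a", "a", "b", "b", "b", "c", "c", "c"]
print(supervised_cuts(values, labels))  # → [3.5, 12.25]
```

On this toy data the sketch produces three intervals, one per class, while a cluster that contains a single class is never split further. A purely geometric split can still cut through a class region, which is one reason a real supervised method needs a more careful criterion than this one.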
