International Conference on Artificial Intelligence IC-AI'2001, Vol. 2, Jun 25-28, 2001, Las Vegas, Nevada, USA

Discretization Algorithm that Uses Class-Attribute Interdependence Maximization



Abstract

Most existing machine learning algorithms can extract knowledge from databases that store discrete attributes (features). If the attributes are continuous, such algorithms can be paired with a discretization algorithm that transforms them into discrete attributes. This paper describes CAIM (class-attribute interdependence maximization), a discretization algorithm for continuous attributes designed to work with supervised learning algorithms. The algorithm maximizes the class-attribute interdependence and, at the same time, generates a possibly minimal number of discrete intervals. A major advantage is that, in contrast to many existing discretization algorithms, it does not require the user to predefine the number of intervals. The CAIM algorithm and five other state-of-the-art discretization algorithms were tested on well-known machine learning datasets with continuous and mixed-mode attributes. The tests show that, compared with the other algorithms, the proposed algorithm almost always generates discrete attributes with the highest class-attribute interdependency, while always generating the lowest number of intervals. The discretized datasets were used in conjunction with the CLIP4 machine learning algorithm. The accuracy of the rules generated by CLIP4 shows that the proposed algorithm significantly improves classification performance, and it performs best among the six discretization algorithms compared. The CAIM algorithm's speed is comparable to that of the simplest unsupervised algorithms and exceeds that of the other supervised discretization algorithms.
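The greedy scheme the abstract describes — score candidate cut points by class-attribute interdependence, keep adding the best cut until the criterion stops improving — can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the criterion formula (average over intervals of max-class-count squared divided by interval count) and the stopping rule follow the published CAIM definition, and all function names are our own.

```python
from collections import Counter

def caim_value(data, boundaries):
    """CAIM criterion for a candidate interval scheme.

    data: list of (value, class_label) pairs.
    boundaries: sorted interior cut points; intervals are
    (-inf, b1], (b1, b2], ..., (bk, +inf).
    CAIM = (1/n) * sum_r (max_r**2 / M_r), where n is the number of
    intervals, max_r the largest single-class count in interval r,
    and M_r the total count in interval r.
    """
    n = len(boundaries) + 1
    counts = [Counter() for _ in range(n)]
    for value, label in data:
        idx = sum(value > b for b in boundaries)  # index of the interval holding value
        counts[idx][label] += 1
    total = 0.0
    for ctr in counts:
        m = sum(ctr.values())
        if m:
            total += max(ctr.values()) ** 2 / m
    return total / n

def caim_discretize(data):
    """Greedy CAIM-style discretization of a single continuous attribute.

    Starting from one interval, repeatedly add the candidate cut point
    (midpoint between consecutive distinct values) that maximizes CAIM,
    and stop once CAIM no longer improves and there are at least as many
    intervals as classes -- so the user never predefines the interval count.
    """
    values = sorted({v for v, _ in data})
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
    n_classes = len({c for _, c in data})
    boundaries = []
    best = caim_value(data, boundaries)
    while candidates:
        score, cut = max((caim_value(data, sorted(boundaries + [c])), c)
                         for c in candidates)
        if score <= best and len(boundaries) + 1 >= n_classes:
            break  # no further gain and enough intervals: stop
        boundaries.append(cut)
        boundaries.sort()
        candidates.remove(cut)
        best = score
    return boundaries
```

For a toy attribute whose two classes separate cleanly (class `a` at values 1-3, class `b` at 6-8), `caim_discretize` returns the single cut `[4.5]`, illustrating how the criterion favors few, class-pure intervals.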
