Knowledge discovery in databases: An attribute-oriented rough set approach.


Abstract

Knowledge discovery systems face challenging problems with real-world databases, which tend to be very large, redundant, noisy and dynamic. In this thesis, we develop an attribute-oriented rough set approach for knowledge discovery in databases. The method adopts the artificial intelligence "learning from examples" paradigm combined with rough set theory and database operations. The learning procedure consists of two phases: data generalization and data reduction. In data generalization, our method generalizes the data by performing attribute-oriented concept tree ascension; as a result, some undesirable attributes are removed and a set of tuples may be generalized to the same generalized tuple. The goal of data reduction is to find a minimal subset of interesting attributes that carries all the essential information of the generalized relation, so that this minimal subset can be used in place of the entire attribute set of the generalized relation. By removing those attributes which are not important and/or essential, the rules generated are more concise and efficacious.

Our method integrates a variety of knowledge discovery algorithms, such as DBChar for deriving characteristic rules, DBClass for classification rules, DBDeci for decision rules, DBMaxi for maximal generalized rules, DMBkbs for multiple sets of knowledge rules and DBTrend for data trend regularities, which permit a user to discover various kinds of relationships and regularities in the data. This integration inherits the advantages of the attribute-oriented induction model and rough set theory. Our method makes some contributions to KDD. A generalized rough set model is formally defined with the ability to handle statistical information and to take into account the importance of attributes and objects in the databases. Our method is able to identify the essential subset of nonredundant attributes (factors) that determine the discovery task. It can learn different kinds of knowledge rules efficiently from large databases with noisy data and in a dynamic environment, and it can deal with databases with incomplete information.

A prototype system, DBROUGH, was constructed under a Unix/C/Sybase environment. Our system implements a number of novel ideas. We use attribute-oriented induction rather than tuple-oriented induction, thus greatly improving the learning efficiency. By integrating rough set techniques into the learning procedure, the derived knowledge rules are particularly concise and pertinent, since only the attributes (factors) that are relevant and/or important to the learning task are considered. The combination of a transition network and a concept hierarchy provides a mechanism for handling the dynamic characteristics of data in the databases. For applications with noisy data, our system can generate multiple sets of knowledge rules through a decision matrix to improve the learning accuracy. The experiments using the NSERC information system illustrate the promise of attribute-oriented rough set learning for knowledge discovery in databases. (Abstract shortened by UMI.)
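The two phases named in the abstract, data generalization and data reduction, can be illustrated with a minimal Python sketch. This is not the DBROUGH implementation, which the abstract says was built under Unix/C/Sybase. The toy table, the concept hierarchy, and the function names `generalize`, `positive_region` and `minimal_reducts` are all hypothetical. The reduction step uses the classical rough set positive-region criterion with brute-force enumeration, not the generalized rough set model defined in the thesis.

```python
from itertools import combinations

# Hypothetical concept hierarchy: attribute -> {raw value: parent concept}.
HIERARCHY = {
    "city": {"Regina": "Saskatchewan", "Saskatoon": "Saskatchewan",
             "Calgary": "Alberta", "Edmonton": "Alberta"},
    "field": {"AI": "Computing", "Databases": "Computing"},
}

# Hypothetical toy relation; "funded" plays the role of the decision attribute.
ROWS = [
    {"city": "Regina", "field": "AI", "size": "large", "funded": "yes"},
    {"city": "Saskatoon", "field": "AI", "size": "large", "funded": "yes"},
    {"city": "Calgary", "field": "AI", "size": "small", "funded": "no"},
    {"city": "Edmonton", "field": "Databases", "size": "small", "funded": "no"},
    {"city": "Regina", "field": "Databases", "size": "large", "funded": "yes"},
]


def generalize(rows, hierarchy):
    """Climb one level of the concept tree per attribute, then merge
    identical generalized tuples, recording how many rows each covers."""
    counts = {}
    for row in rows:
        lifted = tuple(sorted(
            (attr, hierarchy.get(attr, {}).get(value, value))
            for attr, value in row.items()
        ))
        counts[lifted] = counts.get(lifted, 0) + 1
    return [dict(lifted, votes=n) for lifted, n in counts.items()]


def positive_region(rows, attrs, decision):
    """Indices of rows whose equivalence class under `attrs` agrees on
    a single value of the decision attribute."""
    classes = {}
    for i, row in enumerate(rows):
        classes.setdefault(tuple(row[a] for a in attrs), []).append(i)
    region = set()
    for members in classes.values():
        if len({rows[i][decision] for i in members}) == 1:
            region.update(members)
    return region


def minimal_reducts(rows, conditions, decision):
    """Smallest attribute subsets that preserve the positive region of the
    full condition set (brute force; only practical for a few attributes)."""
    full = positive_region(rows, conditions, decision)
    for size in range(1, len(conditions) + 1):
        found = [set(subset) for subset in combinations(conditions, size)
                 if positive_region(rows, subset, decision) == full]
        if found:
            return found
    return [set(conditions)]


generalized = generalize(ROWS, HIERARCHY)
reducts = minimal_reducts(generalized, ["city", "field", "size"], "funded")
```

On this toy data the five rows collapse into two generalized tuples, and either `city` or `size` alone is enough to determine `funded`, while `field` becomes uninformative once every value has been lifted to the same concept. Rules derived from a reduct therefore mention fewer attributes, which is the conciseness argument the abstract makes.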
