Data mining using relational database management system.

Abstract

With huge amounts of data widely available and a pressing demand to transform raw data into useful information and knowledge, data mining has become an important research field in both the database and machine learning areas. Data mining is defined as the process of solving problems by analyzing data already present in a database and discovering knowledge in that data. Database systems provide efficient data storage, fast access structures, and a wide variety of indexing methods to speed up data retrieval. Machine learning provides theoretical support for most popular data mining algorithms. Weka-DB combines properties of these two areas to improve the scalability of Weka, an open-source machine learning software package. Weka implements most of its machine learning algorithms using main-memory-based data structures, so it cannot handle datasets that are too large to fit into main memory. Weka-DB stores its data in DB2 and accesses it from there, so it achieves better scalability than Weka. However, Weka-DB is much slower than Weka because secondary storage access is more expensive than main memory access. In this thesis we extend Weka-DB with a buffer management component to improve its performance. We also increase the scalability of Weka-DB further by putting additional data structures into the database, again using a buffer to access the data stored there. Finally, we explore another method for improving the speed of the algorithms, one that takes advantage of the data access properties of machine learning algorithms.
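The abstract does not describe how the buffer management component is built, so the following is only a minimal sketch of the general idea, not the thesis's implementation. It assumes an in-memory SQLite database as a stand-in for DB2, a hypothetical table `instances` with dense integer ids starting at 0, and a least-recently-used (LRU) replacement policy; the class name `BufferedInstances` and all parameters are invented for illustration.

```python
import sqlite3
from collections import OrderedDict


class BufferedInstances:
    """Hypothetical LRU page buffer over a database table of instances.

    Assumption: rows have dense ids 0..n-1, so row i lives on page
    i // page_size at offset i % page_size.
    """

    def __init__(self, conn, page_size=100, max_pages=4):
        self.conn = conn
        self.page_size = page_size
        self.max_pages = max_pages
        self.pages = OrderedDict()  # page number -> rows, oldest first
        self.db_reads = 0           # number of page fetches sent to the database

    def _load_page(self, page_no):
        # Fetch one whole page of rows with a single range query.
        start = page_no * self.page_size
        rows = self.conn.execute(
            "SELECT id, x, label FROM instances "
            "WHERE id >= ? AND id < ? ORDER BY id",
            (start, start + self.page_size),
        ).fetchall()
        self.db_reads += 1
        return rows

    def get(self, index):
        page_no, offset = divmod(index, self.page_size)
        if page_no in self.pages:
            # Buffer hit: mark the page as most recently used.
            self.pages.move_to_end(page_no)
        else:
            # Buffer miss: evict the least recently used page if full.
            if len(self.pages) >= self.max_pages:
                self.pages.popitem(last=False)
            self.pages[page_no] = self._load_page(page_no)
        return self.pages[page_no][offset]


def build_demo_db(n=1000):
    """Create a small in-memory table standing in for a DB2 dataset."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE instances (id INTEGER PRIMARY KEY, x REAL, label INTEGER)"
    )
    conn.executemany(
        "INSERT INTO instances VALUES (?, ?, ?)",
        ((i, i * 0.5, i % 2) for i in range(n)),
    )
    conn.commit()
    return conn


if __name__ == "__main__":
    conn = build_demo_db(1000)
    buf = BufferedInstances(conn, page_size=100, max_pages=2)
    for i in range(1000):  # one sequential pass over the data
        buf.get(i)
    print("database page reads:", buf.db_reads)
```

In this sketch, a sequential pass fetches each page from the database only once, while the same number of random accesses with a small buffer would trigger many more fetches. That contrast is one plausible reading of why the data access properties of an algorithm matter for speed, though the abstract does not say which properties the thesis exploits.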

Bibliographic record

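  • Author

    Ma, Xuesong

  • Author affiliation

    McGill University (Canada)

  • Degree-granting institution McGill University (Canada)
  • Discipline Computer Science
  • Degree M.Sc.
  • Year 2006
  • Pages 64 p.
  • Total pages 64
  • Original format PDF
  • Language of text English
  • Chinese Library Classification Automation technology, computer technology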
