
A computational environment for data preprocessing in supervised classification.


Abstract

In this thesis, a data preprocessing environment for supervised classification was created using the Windows version of R, the programming language and environment for statistical computing and graphics. The functions that make up the environment were selected on the basis of empirical studies of how the data preprocessing techniques investigated affect the misclassification error of well-known classifiers applied to real datasets. The environment also includes visualization techniques to support data exploration and data preprocessing decisions. The techniques considered in this thesis were applied to twelve real datasets from the Machine Learning Database Repository at the University of California, Irvine. These datasets range from 4 to 60 dimensions, from 150 to 4435 observations, and from 3 to 7 classes. Other existing studies on data preprocessing examine the effects of applying these techniques to the whole dataset, but not by class.

The functions that form the data preprocessing environment were placed in a package that can be downloaded into the R directory R_HOME/library and then loaded into the user's workspace, creating a data preprocessing environment for supervised classification applications. Future investigations may explore the use of these functions in data mining projects involving very high-dimensional and very large datasets.
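The abstract contrasts preprocessing applied to the whole dataset with preprocessing applied by class, but it does not say which techniques the environment contains. The sketch below is therefore only an assumed illustration: it uses mean imputation of missing values as the example technique, and the function names (`column_means`, `impute_whole`, `impute_by_class`) are hypothetical, not the API of the thesis package. It fills each gap either with the column mean over all rows or with the column mean within the row's class.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Optional, Sequence

# Hypothetical representation: None marks a missing value.
Row = List[Optional[float]]


def column_means(rows: Sequence[Row], fallback: Optional[List[float]] = None) -> List[float]:
    """Mean of the observed (non-None) values in each column.

    If a column has no observed values, use fallback[j] when given,
    otherwise raise ValueError.
    """
    means = []
    for j in range(len(rows[0])):
        observed = [r[j] for r in rows if r[j] is not None]
        if observed:
            means.append(mean(observed))
        elif fallback is not None:
            means.append(fallback[j])
        else:
            raise ValueError(f"column {j} has no observed values")
    return means


def impute_whole(rows: Sequence[Row]) -> List[List[float]]:
    """Fill each missing value with its column mean over the whole dataset."""
    means = column_means(rows)
    return [[means[j] if v is None else v for j, v in enumerate(r)] for r in rows]


def impute_by_class(rows: Sequence[Row], labels: Sequence[str]) -> List[List[float]]:
    """Fill each missing value with its column mean within the row's class.

    Falls back to the whole-dataset column mean when a class has no
    observed value in that column.
    """
    overall = column_means(rows)
    groups: Dict[str, List[Row]] = defaultdict(list)
    for r, label in zip(rows, labels):
        groups[label].append(r)
    class_means = {label: column_means(members, fallback=overall)
                   for label, members in groups.items()}
    return [[class_means[label][j] if v is None else v for j, v in enumerate(r)]
            for r, label in zip(rows, labels)]


# Small made-up example: two features, two classes.
X = [
    [1.0, 10.0],
    [3.0, None],
    [None, 30.0],
    [7.0, 50.0],
    [9.0, None],
]
y = ["a", "a", "a", "b", "b"]

print(impute_whole(X))
print(impute_by_class(X, y))
```

On this toy data the whole-dataset approach fills the missing second feature of both classes with the same value (30.0), while the by-class approach uses 20.0 for class `a` and 50.0 for class `b`. Because the by-class variant needs the class label, it applies to training data only; test observations, whose labels are unknown, would need a different rule.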

Bibliographic details

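  • Author

    Rodriguez, Caroline K.

  • Author affiliation

    University of Puerto Rico, Mayaguez (Puerto Rico)

  • Degree-granting institution: University of Puerto Rico, Mayaguez (Puerto Rico)
  • Subjects: Computer Science; Statistics
  • Degree: M.S.
  • Year: 2004
  • Pages: 175
  • Original format: PDF
  • Language of text: English
  • Chinese Library Classification: Automation technology, computer technology; Statistics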