Analysis of robust measures for random forest regression.

Abstract

Our approach is based on random forest regression (RFR), with two major differences: a robust prediction statistic and a robust error statistic. The current methodology uses the node mean for prediction and the mean squared error (MSE) to derive the in-node and overall error. Herein, we introduce and assess the use of the median (and other robust measures) for prediction and the mean absolute deviation (MAD) to derive the in-node and overall error. Extensive research has shown that, in the presence of large or unbounded outliers, the median is a better estimate of the center of a distribution than the mean. This is because the median is based on the ordered, central value(s) of the data and is therefore insensitive to the magnitude of those outliers.

Our research hypothesis is that robust methods should significantly improve the predictive performance of random forest methods for nonparametric regression when the data contain unbounded outliers and display heteroscedasticity. This hypothesis was tested using corrosion data and other datasets. Comparative performance among models was based on both the MSE and MAD statistics. We have shown that the robust RFR (RRFR) performs well under extreme conditions, namely on datasets that include unbounded outliers or heteroscedastic conditions.

The NDT (nondestructive testing) data were derived from eddy current (EC) scans of the United States Air Force (USAF) KC-135 aircraft. Although a link between NDT results and corrosion may be suspected, until now this link has not been formally established. Instead, the NDT data have been converted into false-color images that maintenance operators analyze visually. Our previous models suggest that, with appropriate data mining techniques, noisy data can be handled more effectively by more sophisticated models than by simpler ones. Moreover, a variety of modeling techniques can predict corrosion with reasonable accuracy, but regression trees are particularly effective at modeling the complex relationships between the eddy current measurements and the actual amount of corrosion. (Abstract shortened by UMI.)
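
The node-level change described in the abstract can be illustrated with a short Python sketch. It contrasts a conventional node (mean prediction, MSE in-node error) with a robust node (median prediction, MAD in-node error), then scores both with the overall MSE and MAD statistics. This is a minimal illustration under stated assumptions, not the dissertation's implementation. The function names are hypothetical. MAD is assumed to mean the average absolute difference from the node's prediction. The leaf and held-out values are made up for illustration only.

```python
import statistics
from typing import Sequence


def node_mean_mse(y: Sequence[float]) -> tuple[float, float]:
    """Conventional RFR node: predict with the mean, measure in-node error as MSE."""
    pred = statistics.fmean(y)
    err = statistics.fmean((v - pred) ** 2 for v in y)
    return pred, err


def node_median_mad(y: Sequence[float]) -> tuple[float, float]:
    """Robust node (assumed form): predict with the median and measure in-node
    error as the mean absolute deviation from that median."""
    pred = statistics.median(y)
    err = statistics.fmean(abs(v - pred) for v in y)
    return pred, err


def mse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Overall mean squared error between targets and predictions."""
    return statistics.fmean((a - p) ** 2 for a, p in zip(actual, predicted))


def mad(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Overall mean absolute deviation between targets and predictions."""
    return statistics.fmean(abs(a - p) for a, p in zip(actual, predicted))


if __name__ == "__main__":
    # Hypothetical training targets falling in one leaf; 100.0 acts as an unbounded outlier.
    leaf = [1.0, 2.0, 3.0, 4.0, 100.0]
    # Hypothetical held-out targets that land in the same leaf.
    held_out = [2.0, 3.0, 4.0]

    for name, node_fn in [("mean/MSE", node_mean_mse), ("median/MAD", node_median_mad)]:
        pred, in_node_err = node_fn(leaf)
        preds = [pred] * len(held_out)  # every point in the leaf gets the same prediction
        print(f"{name}: prediction={pred}, in-node error={in_node_err:.2f}, "
              f"test MSE={mse(held_out, preds):.2f}, test MAD={mad(held_out, preds):.2f}")
```

In this toy leaf, the single outlier pulls the mean prediction up to 22.0. The median prediction stays at 3.0, close to the bulk of the data, so both overall error statistics are far lower for the robust node.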

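Bibliographic record

  • Author

    Brence, John R.

  • Author affiliation

    University of Virginia

  • Degree-granting institution: University of Virginia
  • Subject: Engineering System Science; Operations Research
  • Degree: Ph.D.
  • Year: 2004
  • Pages: 249
  • Original format: PDF
  • Language: English
  • Chinese Library Classification: Systems science; Operations research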