Conference: Machine Learning and Data Mining in Pattern Recognition

Hot Deck Methods for Imputing Missing Data: The Effects of Limiting Donor Usage

Abstract

In data mining, the large volume of data limits how computationally complex missing-data methods can be. Among the imputation methods that are computationally simple yet effective are the hot deck procedures. Hot deck methods impute missing values in a data matrix using values that are available in the same matrix. The object from which these values are taken to impute the missing values of another object is called the donor. Because values are replicated, a single donor may be selected to serve multiple recipients. The inherent risk is that too many, or even all, missing values are imputed with the values of a single donor. To mitigate this risk, some hot deck variants limit the number of times any one donor may be selected to donate its values. This raises the question of the conditions under which such a limit is sensible. This study aims to answer that question through an extensive simulation. The results show fairly clear differences between hot deck imputations performed with different donor limits. Beyond these differences, the study identifies the factors that determine whether a donor limit is sensible.
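To make the donor-limit mechanism concrete, here is a minimal Python sketch of a nearest-neighbour hot deck with an optional cap on donor usage. It assumes that only complete rows act as donors, that distance is the sum of absolute differences on the recipient's observed variables, and that recipients are processed in row order. The function name `hot_deck_impute` and the parameter `donor_limit` are hypothetical; this is an illustration, not the procedure or code used in the study.

```python
from typing import List, Optional

Row = List[Optional[float]]


def hot_deck_impute(data: List[Row], donor_limit: Optional[int] = None) -> List[Row]:
    """Nearest-neighbour hot deck imputation with an optional donor limit.

    Illustrative assumptions (not taken from the paper):
    - only complete rows act as donors;
    - distance is the sum of absolute differences over the recipient's
      observed variables;
    - recipients are processed in row order, so under a limit, earlier
      recipients get first pick of the closest donors.
    """
    donors = [i for i, row in enumerate(data) if all(v is not None for v in row)]
    usage = {i: 0 for i in donors}  # how many times each donor has been used
    result = [list(row) for row in data]  # copy so the input is not modified

    for r, row in enumerate(data):
        missing = [j for j, v in enumerate(row) if v is None]
        if not missing:
            continue  # complete rows need no imputation
        observed = [j for j, v in enumerate(row) if v is not None]

        # Donors that have not yet reached the limit (all donors if no limit).
        candidates = [d for d in donors if donor_limit is None or usage[d] < donor_limit]
        if not candidates:
            raise ValueError(f"no donor left for row {r}; increase donor_limit")

        # Closest donor on the recipient's observed variables; ties go to the lower index.
        best = min(
            candidates,
            key=lambda d: (sum(abs(data[d][j] - row[j]) for j in observed), d),
        )
        for j in missing:
            result[r][j] = data[best][j]  # copy the donor's value into the gap
        usage[best] += 1

    return result


if __name__ == "__main__":
    data = [
        [1.0, 10.0],  # complete row (donor)
        [5.0, 50.0],  # complete row (donor)
        [1.1, None],  # recipient
        [1.2, None],  # recipient
    ]
    print(hot_deck_impute(data)[2:])                 # [[1.1, 10.0], [1.2, 10.0]]
    print(hot_deck_impute(data, donor_limit=1)[2:])  # [[1.1, 10.0], [1.2, 50.0]]
```

Without a limit, both recipients copy the same closest donor; with `donor_limit=1`, the second recipient must fall back to a more distant donor. This small case shows the trade-off the abstract describes: a limit prevents one donor from dominating the imputations, at the cost of sometimes accepting a worse match.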