
Comparacion de los metodos de imputacion con respecto al poder de separacion del modelo de regresion logistica (Spanish text).

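English translation: Comparison of imputation methods with respect to the separation power of the logistic regression model (Spanish text).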

Abstract

An MCAR (Missing Completely at Random) mechanism was used with different missing-data proportions to iteratively generate missing values in several data sets obtained from the Machine Learning Database Repository at the University of California, Irvine, in order to compare the efficiency of single, hot-deck, and multiple imputation techniques in a logistic regression model. The quantity of interest in these comparisons is the separation power of the logistic regression model, measured by the area under the Receiver Operating Characteristic (ROC) curve. The single imputation methods implemented are the unconditional and conditional mean, median, and mode (IMEAN, ICMEAN, IMED, ICMED, IMOD, ICMOD). For hot-deck imputation, we used unconditional and conditional random sampling of the observed values (IRS, ICRS) and k-nearest-neighbor imputation (KNN). The multiple imputation method is the FRITZ (Federal Reserve Imputation Technique Zeta) algorithm implemented by [Kennickell, 1991] on the SCF (Survey of Consumer Finances). For each imputation method, several iterations of the separation power were obtained by generating missing data at a given proportion and then filling in the missing values with that method. The average bias between the real separation power and the separation power across all iterations was calculated for every imputation method and for several missing-data proportions. These estimated biases were tested using non-parametric comparison procedures. From these tests we found that the ICRS technique generates the smallest bias in the area under the ROC curve. We also found that, under an MCAR mechanism, some imputation methods perform well at missing-data proportions higher than 15%.
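The experimental loop described in the abstract can be sketched in a few lines of standard-library Python. This is only an illustration under stated assumptions, not the thesis code: the toy data, the gradient-descent logistic fit, the in-sample AUC, the sign convention of the bias (imputed minus complete-data AUC), and all function names are hypothetical. Only two of the methods are shown, IMEAN and ICRS, with ICRS assumed to condition on the class label.

```python
import math
import random

def make_mcar(rows, proportion, rng):
    """MCAR: blank each value independently with the given probability."""
    return [[None if rng.random() < proportion else v for v in row] for row in rows]

def impute_mean(rows, labels, rng):
    """IMEAN: fill gaps with the column mean of the observed values."""
    means = []
    for col in zip(*rows):
        obs = [v for v in col if v is not None]
        means.append(sum(obs) / len(obs) if obs else 0.0)
    return [[means[j] if v is None else v for j, v in enumerate(row)] for row in rows]

def impute_conditional_random(rows, labels, rng):
    """ICRS (assumed form): draw from observed values of the same class and column.
    Assumes every class has at least one observed value in every column."""
    pools = {}
    for row, y in zip(rows, labels):
        for j, v in enumerate(row):
            if v is not None:
                pools.setdefault((y, j), []).append(v)
    return [[rng.choice(pools[(y, j)]) if v is None else v for j, v in enumerate(row)]
            for row, y in zip(rows, labels)]

def fit_logistic(rows, labels, steps=300, lr=0.1):
    """Plain batch gradient descent on the logistic loss; w[0] is the intercept."""
    n, p = len(rows), len(rows[0])
    w = [0.0] * (p + 1)
    for _ in range(steps):
        grad = [0.0] * (p + 1)
        for x, y in zip(rows, labels):
            z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))
            err = 1.0 / (1.0 + math.exp(-z)) - y
            grad[0] += err
            for j, xj in enumerate(x):
                grad[j + 1] += err * xj
        w = [wj - lr * g / n for wj, g in zip(w, grad)]
    return w

def auc(scores, labels):
    """Area under the ROC curve as the Mann-Whitney statistic (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def separation_power(rows, labels):
    """In-sample AUC of a logistic model fitted to the given data."""
    w = fit_logistic(rows, labels)
    scores = [w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)) for x in rows]
    return auc(scores, labels)

def average_bias(rows, labels, impute, proportion, iterations, seed=0):
    """Mean of (AUC after imputation - AUC on complete data) over repeated MCAR draws."""
    rng = random.Random(seed)
    full = separation_power(rows, labels)
    diffs = []
    for _ in range(iterations):
        holed = make_mcar(rows, proportion, rng)
        diffs.append(separation_power(impute(holed, labels, rng), labels) - full)
    return sum(diffs) / len(diffs)

def toy_data(n, seed):
    """Hypothetical two-feature, two-class Gaussian data standing in for a UCI set."""
    rng = random.Random(seed)
    labels = [i % 2 for i in range(n)]
    rows = [[rng.gauss(1.5 * y, 1.0), rng.gauss(-1.0 * y, 1.0)] for y in labels]
    return rows, labels

rows, labels = toy_data(120, seed=1)
for name, method in [("IMEAN", impute_mean), ("ICRS", impute_conditional_random)]:
    print(name, average_bias(rows, labels, method, proportion=0.15, iterations=5))
```

Sharing one signature across the imputation functions lets the same loop be rerun for each method and each missing-data proportion, which is the comparison the abstract describes; the non-parametric tests on the resulting biases are not sketched here.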

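Bibliographic record

  • Author

    Lopez Vazquez, Victor.;

  • Author affiliation

    University of Puerto Rico, Mayaguez (Puerto Rico).;

  • Degree-granting institution University of Puerto Rico, Mayaguez (Puerto Rico).;
  • Subject Statistics.; Mathematics.
  • Degree M.S.
  • Year 2006
  • Pages 174 p.
  • Total pages 174
  • Original format PDF
  • Language of text eng
  • Chinese Library Classification Statistics; Mathematics;
  • Keywords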
