
Assessing fit of item response models for performance assessments using Bayesian analysis.


Abstract

Assessing IRT model-fit and comparing different IRT models from a Bayesian perspective is gaining attention. This research evaluated the performance of Bayesian model-fit and model-comparison techniques in assessing the fit of unidimensional Graded Response (GR) models and in comparing different GR models for performance assessment applications.

The study explored the general performance of the posterior predictive model checking (PPMC) method and a variety of discrepancy measures (test-level, item-level, and pair-wise measures) in evaluating different aspects of fit for unidimensional GR models. Previous findings that the PPMC method is conservative were confirmed. In addition, PPMC was found to have adequate power to detect different aspects of misfit when paired with appropriate discrepancy measures. Pair-wise measures were more powerful than test-level and item-level measures in detecting violations of the unidimensionality and local independence assumptions, with Yen's Q3 measure performing best. The power of PPMC also increased as the degree of multidimensionality or local dependence among item responses increased. Two classical item-fit statistics proved effective for detecting item misfit arising from discrepancies with the GR model boundary curves.

The study also compared the relative effectiveness of three Bayesian model-comparison indices (DIC, CPO, and PPMC) for model selection. The results showed that these indices performed equally well in selecting a preferred model for an overall test. The advantage of PPMC, however, is that it can not only compare the relative fit of different models but also evaluate the absolute fit of each individual model, whereas the DIC and CPO indices only compare relative fit.

This study further applied the Bayesian model-fit and model-comparison methods to three real datasets from the QCAI performance assessment. The results indicated that these datasets were essentially unidimensional and exhibited local independence among items. A 2P GR model fit better than a 1P GR model, and a two-dimensional model was not preferred. These findings were consistent with previous studies, although Stone's fit statistics in the PPMC context flagged fewer misfitting items than in earlier work. Limitations and directions for future research on Bayesian applications to IRT are discussed.
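The abstract does not reproduce the computations themselves. As a minimal sketch of how a pair-wise PPMC check based on Yen's Q3 might look, shown here for the simpler dichotomous 2PL case rather than the polytomous GR model, with simulated data, fixed item parameters, and all function names hypothetical:

```python
import numpy as np

def irt_prob(theta, a, b):
    """2PL probability of a correct response (hypothetical helper)."""
    return 1.0 / (1.0 + np.exp(-a * (theta[:, None] - b)))

def q3_matrix(responses, probs):
    """Yen's Q3: correlations of item residuals across examinees.
    Pairwise values far from 0 suggest local dependence."""
    resid = responses - probs
    return np.corrcoef(resid, rowvar=False)

def ppp_value(observed_stat, replicated_stats):
    """Posterior predictive p-value: share of replicated discrepancies
    at least as large as the realized one."""
    return np.mean(np.asarray(replicated_stats) >= observed_stat)

# Simulate a small dataset from the model itself.
rng = np.random.default_rng(0)
n, j = 2000, 5
theta = rng.standard_normal(n)
a = rng.uniform(0.8, 1.6, j)
b = rng.uniform(-1.0, 1.0, j)
p = irt_prob(theta, a, b)
x = (rng.random((n, j)) < p).astype(float)

# Realized discrepancy: largest absolute off-diagonal Q3.
q3 = q3_matrix(x, p)
obs = np.abs(q3[np.triu_indices(j, 1)]).max()

# Replicated data drawn from the fitted model give the PPMC
# reference distribution for the same discrepancy measure.
reps = []
for _ in range(200):
    x_rep = (rng.random((n, j)) < p).astype(float)
    q3_rep = q3_matrix(x_rep, p)
    reps.append(np.abs(q3_rep[np.triu_indices(j, 1)]).max())

print("PPP-value:", ppp_value(obs, reps))
```

PPP-values near 0 or 1 flag misfit; values near 0.5 indicate the model reproduces that feature of the data. In the dissertation's setting, the 2PL probabilities would be replaced by GR-model category probabilities evaluated at MCMC draws from the posterior, rather than the fixed parameters used here.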

Bibliographic Information

  • Author: Zhu, Xiaowen
  • Affiliation: University of Pittsburgh
  • Degree-granting institution: University of Pittsburgh
  • Subjects: Quantitative psychology; Educational tests & measurements; Educational psychology
  • Degree: Ph.D.
  • Year: 2009
  • Pages: 304 p.
  • Format: PDF
  • Language: English
