Journal: BMC Medical Education

Assessment of examiner leniency and stringency ('hawk-dove effect') in the MRCP(UK) clinical examination (PACES) using multi-facet Rasch modelling

Abstract

Background: A potential problem in clinical examinations is the hawk-dove problem: some examiners are more stringent and require a higher performance than other examiners who are more lenient. Although the problem has been known qualitatively for at least a century, we know of no previous statistical estimation of the size of the effect in a large-scale, high-stakes examination. Here we use FACETS to carry out multi-facet Rasch modelling of the paired judgements made by examiners in the clinical examination (PACES) of MRCP(UK), where two examiners assess the same candidate in the same situation, allowing calculation of examiner stringency.

Methods: Data were analysed from the first nine diets of PACES, which were taken between June 2001 and March 2004 by 10,145 candidates. Each candidate was assessed by two examiners on each of seven separate tasks. In total, 1,259 examiners assessed the candidates, resulting in 142,030 marks. Examiner demographics were described in terms of age, sex, ethnicity, and total number of candidates examined.

Results: FACETS suggested that about 87% of main effect variance was due to candidate differences, 1% to station differences, and 12% to differences between examiners in leniency-stringency. Multiple regression suggested that greater examiner stringency was associated with greater examiner experience and with being from an ethnic minority. Male and female examiners showed no overall difference in stringency. Examination scores were adjusted for examiner stringency. At the present pass mark, the outcome for 95.9% of candidates would be unchanged using adjusted marks, whereas 2.6% of candidates would have passed even though they had failed on the basis of raw marks, and 1.5% would have failed despite passing on the basis of raw marks.

Conclusion: Examiners do differ in their leniency or stringency, and the effect can be estimated using Rasch modelling. The reasons for the differences are not clear, but there are some demographic correlates, and the effects appear to be reliable across time. Account can be taken of the differences either by adjusting marks or, perhaps more effectively and more justifiably, by pairing high and low stringency examiners, so that raw marks can be used in the determination of pass and fail.
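
To make the idea of examiner stringency concrete, the following is a minimal sketch of a rating-scale form of the many-facet Rasch model, in which the log-odds of a mark in category k rather than k-1 are modelled as candidate ability minus station difficulty minus examiner stringency minus a category threshold. The abstract does not give the exact model specification used in FACETS, so the function names, the four-category mark scale, the threshold values, and the ability and stringency values are all illustrative assumptions rather than estimates from the study. The final line only checks the mark count reported in the abstract.

```python
import math


def category_probabilities(ability, station_difficulty, examiner_stringency, thresholds):
    """Probabilities of marks 0..K under a rating-scale many-facet Rasch model.

    Assumed form: log(P_k / P_{k-1}) = ability - station_difficulty
                                       - examiner_stringency - thresholds[k-1]
    """
    eta = ability - station_difficulty - examiner_stringency
    # Cumulative log-numerators; category 0 is the reference with value 0.
    log_numerators = [0.0]
    for tau in thresholds:
        log_numerators.append(log_numerators[-1] + eta - tau)
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(log_numerators)
    weights = [math.exp(value - peak) for value in log_numerators]
    total = sum(weights)
    return [w / total for w in weights]


def expected_mark(ability, station_difficulty, examiner_stringency, thresholds):
    """Expected mark (category index weighted by its probability)."""
    probs = category_probabilities(
        ability, station_difficulty, examiner_stringency, thresholds
    )
    return sum(k * p for k, p in enumerate(probs))


# Hypothetical values in logits: three thresholds give four mark categories.
thresholds = [-1.0, 0.0, 1.0]
candidate_ability = 0.5
station_difficulty = 0.0

# Same candidate, same station, two examiners who differ only in stringency.
dove_mark = expected_mark(candidate_ability, station_difficulty, -0.5, thresholds)
hawk_mark = expected_mark(candidate_ability, station_difficulty, 0.5, thresholds)

# Mark count from the abstract: candidates x examiners per task x tasks.
total_marks = 10_145 * 2 * 7
```

In this sketch the more stringent examiner yields a lower expected mark for the same candidate, which is the kind of difference the adjustment of marks described in the abstract is meant to remove.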