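An Analysis of Essay Scoring Error and Rater Effects in Online Double Marking: The Case of Essay Scoring in a Large-Scale English Examination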

Li Meijuan (李美娟); Liu Hongyun (刘红云)

Abstract

Most large-scale examinations currently score essays by double marking. This study uses the many-facet Rasch model (MFRM) to analyze the sources of rater error, and the main factors influencing it, in the double-marked essay scoring of a large-scale English examination. An analysis of 2,427 essays scored by 57 raters found that: ① raters differed significantly in severity; ② about 22.8% of raters showed poor agreement with other raters, and about 3.5% showed excessively high agreement; ③ about 90% of raters were internally consistent, but 8.8% showed very poor self-consistency and about 2% showed excessively high self-consistency; ④ overall, raters differed in how severely they applied the different scoring criteria (dimensions) and the different score levels, while the rater-by-examinee interaction and the three-way rater-by-examinee-by-criterion interaction were not significant; ⑤ raters were equally severe toward male and female students.

This study investigates the extent to which second-language writing scores are influenced by rater effects in a large-scale assessment in China. Writing samples were obtained from 2,427 first-year junior high school students (1,491 female, 936 male). The 57 raters were all experienced specialists in teaching English as a second language. Each essay was scored by two randomly assigned raters against five criteria: ① Information, a 4-point scale measuring content; ② Gracture, a 4-point scale evaluating the sentences; ③ Mechanics, a 3-point scale for the overall structure; ④ Length, a 2-point scale measuring the number of words; and ⑤ Coherence, a 3-point scale for the expression. The MFRM analysis was run in the Facets software. Three facets (persons, raters, and rating criteria) were analyzed under the partial credit model. The findings indicate that: ① raters differed in severity or leniency; ② some raters could not follow the rating scale consistently, while others could not stay close to their own scoring standard; ③ raters maintained a constant level of severity across examinees, but not across the five criteria; ④ there was no differential rater functioning related to examinee gender, that is, raters maintained consistent severity or leniency across male and female examinees. The MFRM results have a number of implications for rating in L2 writing assessment. Individual feedback can improve the efficiency of rater training and help ensure the objectivity and fairness of writing performance assessment.
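
The abstract names the model but does not state it. As a rough illustration only, one commonly cited form of the many-facet Rasch model with a partial credit structure sets the log-odds of score k over score k-1 to B - C - D - F_k, where B is examinee ability, C is rater severity, D is criterion difficulty, and F_k is the k-th step threshold of that criterion. The Python sketch below computes category probabilities under that form. The function names, the parameter values, and the 0-to-3 score coding for a 4-point criterion are assumptions made for illustration; they are not taken from the study or from the Facets software.

```python
import math


def mfrm_category_probs(ability, severity, difficulty, thresholds):
    """Return P(score = 0..K) for one examinee, rater, and criterion.

    Assumed model (not taken from the article): the log-odds of score k
    versus k-1 equal ability - severity - difficulty - thresholds[k-1].
    """
    # Log-numerator of category 0 is 0; each higher category adds one step term.
    log_num = [0.0]
    for step in thresholds:
        log_num.append(log_num[-1] + ability - severity - difficulty - step)
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(log_num)
    weights = [math.exp(v - peak) for v in log_num]
    total = sum(weights)
    return [w / total for w in weights]


def expected_score(probs):
    """Expected score given category probabilities for scores 0..K."""
    return sum(k * p for k, p in enumerate(probs))


# Hypothetical values in logits: a 4-point criterion has three step thresholds.
steps = [-1.0, 0.0, 1.0]
lenient = mfrm_category_probs(0.5, -0.5, 0.0, steps)
severe = mfrm_category_probs(0.5, 0.5, 0.0, steps)
print(round(expected_score(lenient), 3), round(expected_score(severe), 3))
```

With the examinee and criterion held fixed, the more severe rater yields the lower expected score, which is the kind of difference the rater severity estimates are meant to capture.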

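Bibliographic details

Source
《中国考试》 | 2015, No. 2 | pp. 39-48 | 10 pages
Authors
Li Meijuan (李美娟); Liu Hongyun (刘红云)
Affiliations

Beijing Academy of Educational Sciences, Beijing 100191;

Beijing Normal University, Beijing 100875

Original format: PDF
Text language: Chinese (chi)
CLC classification: G405
Keywords
Subjective-item scoring; many-facet Rasch model; rater error analysis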
