Interrater reliability
Rubric
Reliability (semiconductor)
Psychology
Affect (linguistics)
Resolution (logic)
Statistics
Computer science
Rating scale
Mathematics education
Mathematics
Developmental psychology
Artificial intelligence
Power (physics)
Physics
Quantum mechanics
Communication
Authors
Robert L. Johnson, James Penny, Belita Gordon
Identifier
DOI: 10.1207/s15324818ame1302_1
Abstract
When the raters of constructed-response items, such as writing samples, disagree on the level of proficiency exhibited in an item, testing agencies must resolve the score discrepancy before computing an operational score for release to the public. Several forms of score resolution are used throughout the assessment industry. In this study, we selected 4 of the more common forms of score resolution that were reported in a national survey of testing agencies and investigated the effect that each form of resolution has on the interrater reliability associated with the resulting operational scores. It is shown that some forms of resolution can be associated with higher reliability than other forms and that some forms may be associated with artificially inflated interrater reliability. Moreover, it is shown that the choice of resolution method may affect the percentage of papers that are defined as passing in a high-stakes assessment.
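The abstract does not spell out the four resolution forms it compares, so the following is only an illustrative Python sketch, not the authors' procedure. It assumes two hypothetical resolution rules that are common in scoring practice (averaging the two readings, and replacing discrepant readings with a third, expert reading), simulates rater scores, and contrasts interrater statistics computed on raw readings versus resolved operational scores. All names and the simulated data are hypothetical.

```python
# Hypothetical sketch (not the study's code): two score-resolution rules and
# simple interrater statistics on raw vs. resolved (operational) scores.

from statistics import mean
import random

def resolve_average(r1, r2):
    """Rater-mean resolution: operational score is the mean of the two readings."""
    return mean([r1, r2])

def resolve_third_rater(r1, r2, r3):
    """Third-rater adjudication: a discrepant pair is replaced by an expert reading."""
    return r1 if r1 == r2 else r3

def pearson_r(x, y):
    """Pearson correlation, used here as a simple interrater reliability index."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

random.seed(0)
# Simulated 1-6 holistic scores for 200 essays; each rater adds small noise.
true_level = [random.randint(1, 6) for _ in range(200)]
rater1 = [max(1, min(6, t + random.choice([-1, 0, 0, 1]))) for t in true_level]
rater2 = [max(1, min(6, t + random.choice([-1, 0, 0, 1]))) for t in true_level]
rater3 = true_level  # stand-in for an expert adjudicator

exact_agreement = mean(1.0 if a == b else 0.0 for a, b in zip(rater1, rater2))
print(f"exact agreement between raters: {exact_agreement:.2f}")
print(f"interrater r (raw readings):    {pearson_r(rater1, rater2):.2f}")

averaged = [resolve_average(a, b) for a, b in zip(rater1, rater2)]
adjudicated = [resolve_third_rater(a, b, c) for a, b, c in zip(rater1, rater2, rater3)]

# Correlating a rater's readings with the operational score illustrates how
# some resolution rules can make reliability look higher than the raw readings warrant.
print(f"r(rater1, averaged score):      {pearson_r(rater1, averaged):.2f}")
print(f"r(rater1, adjudicated score):   {pearson_r(rater1, adjudicated):.2f}")
```

In this toy setup the adjudicated operational score tends to correlate more strongly with the simulated "true" level than either raw reading does, which mirrors the abstract's caution that the apparent reliability of resolved scores can be inflated relative to the agreement between the original raters.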