Toward a New Readability: A Mixed Model Approach

David F. Dufty (ddufty@memphis.edu)
Department of Psychology, University of Memphis, Memphis, TN 38152

Scott A. Crossley (scrossley@mail.psyc.memphis.edu)
Department of English, Mississippi State University, Starkville, MS 39759

Philip M. McCarthy (pmmccrth@memphis.edu)
Department of Psychology, University of Memphis, Memphis, TN 38152

Danielle S. McNamara (d.mcnamara@mail.psyc.memphis.edu)
Department of Psychology, University of Memphis, Memphis, TN 38152

Abstract

This study is a preliminary examination of the use of Coh-Metrix, a computational tool that measures cohesion and text difficulty at various levels of language, discourse, and conceptual analysis, as a means of measuring English text readability. The study uses three Coh-Metrix variables to analyze 32 academic reading texts and their corresponding readability scores. The results show that two indices, one measuring lexical co-referentiality and one measuring word frequency, combined with an estimate of syntactic complexity, yield a prediction of reading difficulty similar to that of traditional readability formulas. The study demonstrates that Coh-Metrix variables can contribute to a readability prediction that better reflects the psycholinguistic factors of reading comprehension.

Keywords: Readability; Corpus Linguistics; Cognitive Processing; Computational Linguistics; Discourse Analysis

Introduction

This study is an exploratory examination of the use of Coh-Metrix (Graesser, McNamara, Louwerse, & Cai, 2004) as an improved means of measuring text readability. While traditional readability formulas such as Flesch Reading Ease (Flesch, 1948) and Flesch-Kincaid (Kincaid et al., 1975) have been widely accepted by the reading research community, they have also been widely criticized by cognitive researchers for their inability to take into account textbase processing, situation model levels (Kintsch et al., 1990; McNamara et al., 1996), and cohesion (Graesser et al., 2004; McNamara et al., 1996). Coh-Metrix offers the prospect of addressing these limitations by providing detailed analyses of language, integrating lexicons, pattern classifiers, part-of-speech taggers, syntactic parsers, shallow semantic interpreters, and other components developed in the field of computational linguistics (Jurafsky & Martin, 2000). Among its cohesion indices, Coh-Metrix analyzes co-referential cohesion, causal cohesion, density of connectives, Latent Semantic Analysis metrics, and syntactic complexity. Because Coh-Metrix considers textbase processing and cohesion, it is well suited to address many of the criticisms of traditional readability formulas.

Classic Readability

Providing students with texts that are accessible and well matched to reader abilities has always been a challenge for educators. One solution to this problem has been the creation and use of readability formulas. Since the 1920s, more than 50 readability formulas have been produced in the hope of measuring text difficulty more accurately and efficiently, and of providing a greater understanding of optimal text readability. The majority of these formulas are based on factors representing two broad aspects of comprehension difficulty: lexical or semantic features and sentence or syntactic complexity (Chall & Dale, 1995). According to Chall and Dale (1995), formulas that depend on these variables are successful because they are related to text simplification. For instance, when a text is written for a beginning reading audience, it generally contains more frequent words and shorter sentences. Thus, measuring the word frequency and sentence length of a text should provide a basis for understanding how readable it is. However, traditional readability formulas are often based not on any theory of reading or reading comprehension but on empirical correlations. Their soundness is therefore strictly predictive, and they are often accused of having weak construct validity.

Regardless, a number of classic validation studies have found the formulas' predictive validity to be consistently high, correlating with observed difficulty in the r = .8 range and above (Chall, 1958; Chall & Dale, 1995; Fry, 1989). While the predictive validity of these measures seems strong, the validations are generally based on traditional student populations reading academic or instructional texts. This has led many proponents of readability formulas to caution against their use with literary or technical texts, or with texts written to conform to the formulas. However, the draw of the formulas' simple, mechanical assessments has led to their widespread use for assessing all sorts of texts, for a wide variety of readers and reading situations beyond those for which the formulas were invented. This widespread use in spite of restricted validity has inclined many researchers in the field of discourse processing to regard readability formulas with skepticism.
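The surface counts that drive the traditional formulas can be made concrete. As an illustrative sketch (not part of the original study), the Flesch Reading Ease and Flesch-Kincaid grade-level formulas can be computed from just two quantities, words per sentence and syllables per word; the syllable counter below is a crude vowel-group heuristic assumed here for illustration, whereas production tools use pronunciation dictionaries:

```python
import re

def count_syllables(word):
    # Crude heuristic: count runs of vowels, then discount a silent final 'e'.
    # Real readability tools use pronunciation dictionaries instead.
    groups = re.findall(r"[aeiouy]+", word.lower())
    n = len(groups)
    if word.lower().endswith("e") and n > 1:
        n -= 1
    return max(n, 1)

def flesch_scores(text):
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    wps = len(words) / len(sentences)                           # words per sentence
    spw = sum(count_syllables(w) for w in words) / len(words)   # syllables per word
    reading_ease = 206.835 - 1.015 * wps - 84.6 * spw
    grade_level = 0.39 * wps + 11.8 * spw - 15.59
    return reading_ease, grade_level
```

Note that both formulas see only sentence length and syllable counts: a scrambled text and a coherent text with the same words receive identical scores, which is precisely the limitation the cohesion-based approach targets.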
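The lexical co-referentiality index is, in essence, a measure of how often adjacent sentences mention the same entities. Coh-Metrix computes its co-reference indices with part-of-speech tagging and parsing; the function below is only a minimal stand-in for the underlying idea (the tokenization and stoplist are assumptions for illustration), scoring the proportion of adjacent sentence pairs that share at least one content word:

```python
import re

# Tiny illustrative stoplist; a real system would use a full function-word list.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "was", "it", "that"}

def content_words(sentence):
    return set(re.findall(r"[a-z']+", sentence.lower())) - STOPWORDS

def adjacent_overlap(text):
    # Proportion of adjacent sentence pairs sharing at least one content word,
    # a rough proxy for co-referential cohesion between sentences.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if len(sentences) < 2:
        return 0.0
    pairs = zip(sentences, sentences[1:])
    hits = sum(1 for a, b in pairs if content_words(a) & content_words(b))
    return hits / (len(sentences) - 1)
```

Unlike the Flesch counts, this proxy is sensitive to sentence order: shuffling the sentences of a cohesive passage lowers its score even though word and syllable counts are unchanged.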