Professional Performance, Reliability, Hierarchical Linear Models
During the past year I completed three chapters (two with Steve Schilling) for The Essential Brunswik.
Application of the Lens Model to the Evaluation of Professional Performance
In this chapter, I argue that the value systems of evaluators can be made explicit through idiographic modeling with multiple linear regression in a way that facilitates examination of those values. Clarifying both expressed and implemented values (inferred from subjective and empirical weights, respectively) can improve communication between evaluators and the persons whose professional performance is to be assessed. This is particularly important where the evaluator also serves as supervisor (e.g., a classroom teacher supervising practice teaching).
Inter-evaluator agreement is decomposed by the lens model equation into components that can guide training of evaluators to improve the consensus among them and, hence, the reliability of the assessment of professional performance.
Statistics obtained in the modeling process (e.g., the squared multiple correlation) and from the decomposition of inter-evaluator agreement (e.g., G) can be used as indices of the adequacy of the functioning of individual evaluators and as the basis of comparisons of groups of evaluators.
Finally, further examination of the evaluative consensus through cluster analytic methods can lead to the description of evaluative typologies consisting of evaluators with similar implemented value systems.
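The decomposition of inter-evaluator agreement mentioned above can be illustrated with a minimal sketch. It assumes two evaluators who rate the same cases on the same cues, fits each evaluator's ratings by ordinary least squares, and checks the standard form of the lens model equation, r_a = G R_a R_b + C sqrt(1 - R_a^2) sqrt(1 - R_b^2). The function names, the cue weights, and the simulated ratings are hypothetical and are not taken from the chapter.

```python
import numpy as np


def fit_linear(cues, ratings):
    """Fit ratings on cues by OLS (with intercept); return predictions and residuals."""
    X = np.column_stack([np.ones(len(ratings)), cues])
    coef, *_ = np.linalg.lstsq(X, ratings, rcond=None)
    predicted = X @ coef
    return predicted, ratings - predicted


def corr(a, b):
    """Pearson correlation between two 1-D arrays."""
    return float(np.corrcoef(a, b)[0, 1])


def lens_model_agreement(cues, ratings_a, ratings_b):
    """Components of agreement between two evaluators rating the same cases."""
    pred_a, res_a = fit_linear(cues, ratings_a)
    pred_b, res_b = fit_linear(cues, ratings_b)
    return {
        "r_a": corr(ratings_a, ratings_b),  # observed agreement
        "G": corr(pred_a, pred_b),          # agreement between the two linear models
        "R_a": corr(ratings_a, pred_a),     # multiple correlation, evaluator A
        "R_b": corr(ratings_b, pred_b),     # multiple correlation, evaluator B
        "C": corr(res_a, res_b),            # agreement between the residuals
    }


# Simulated data (hypothetical): 40 cases, 3 cues, two evaluators with different weights.
rng = np.random.default_rng(0)
cues = rng.normal(size=(40, 3))
ratings_a = cues @ np.array([0.6, 0.3, 0.1]) + rng.normal(scale=0.5, size=40)
ratings_b = cues @ np.array([0.2, 0.2, 0.6]) + rng.normal(scale=0.5, size=40)

parts = lens_model_agreement(cues, ratings_a, ratings_b)
reconstructed = (
    parts["G"] * parts["R_a"] * parts["R_b"]
    + parts["C"] * np.sqrt(1 - parts["R_a"] ** 2) * np.sqrt(1 - parts["R_b"] ** 2)
)
print(parts)
print("r_a reconstructed from components:", reconstructed)
```

In this sketch a low G with high R_a and R_b would point to evaluators who are each consistent but weight the cues differently, whereas low R values would point to inconsistent application of a policy.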
Assessing the Reliability of Judgments
Within the Social Judgment Theory (SJT) framework, the reliability of judgments is typically defined as the consistency of repeated ratings of the same cases by the same judges. This approach yields a separate test-retest correlation for each judge and is sensitive to temporal variation. Reliability could also be assessed in terms of agreement among the judges (interrater reliability), but this would yield a separate set of interrater reliabilities for each occasion. Averaging across occasions is possible, but doing so would ignore variance due to occasions.
Fortunately, generalizability theory (Cronbach, Gleser, Nanda, & Rajaratnam, 1972) provides a comprehensive analytic framework for estimating the magnitude of multiple sources of error in an SJT investigation. These estimates, in turn, permit the design of a subsequent study that will yield the required reliability of judgments.
In this chapter, Steve Schilling and I provide a brief overview of generalizability theory, relate it to idiographic and nomothetic levels of analysis and to representative design, and demonstrate its application to data collected in an SJT study of hail forecasting (Stewart, Moninger, Grassia, Brady, & Merrem, 1989).
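As a rough illustration of the kind of analysis generalizability theory supports, the sketch below estimates variance components for a fully crossed cases x judges x occasions design with one rating per cell, then projects a generalizability coefficient for a planned study. The design, the random-effects assumption, the function names, and the simulated ratings are all assumptions made for this example; the hail forecasting data are not reproduced here, and the chapter's own design may differ.

```python
import numpy as np


def variance_components(x):
    """Estimate variance components from x with shape (cases, judges, occasions).

    Assumes a fully crossed random-effects design with one rating per cell
    and uses the usual expected-mean-square equations.
    """
    n_p, n_j, n_o = x.shape
    grand = x.mean()
    m_p, m_j, m_o = x.mean(axis=(1, 2)), x.mean(axis=(0, 2)), x.mean(axis=(0, 1))
    m_pj, m_po, m_jo = x.mean(axis=2), x.mean(axis=1), x.mean(axis=0)

    # Mean squares for main effects.
    ms_p = n_j * n_o * np.sum((m_p - grand) ** 2) / (n_p - 1)
    ms_j = n_p * n_o * np.sum((m_j - grand) ** 2) / (n_j - 1)
    ms_o = n_p * n_j * np.sum((m_o - grand) ** 2) / (n_o - 1)

    # Mean squares for two-way interactions.
    dev_pj = m_pj - m_p[:, None] - m_j[None, :] + grand
    dev_po = m_po - m_p[:, None] - m_o[None, :] + grand
    dev_jo = m_jo - m_j[:, None] - m_o[None, :] + grand
    ms_pj = n_o * np.sum(dev_pj ** 2) / ((n_p - 1) * (n_j - 1))
    ms_po = n_j * np.sum(dev_po ** 2) / ((n_p - 1) * (n_o - 1))
    ms_jo = n_p * np.sum(dev_jo ** 2) / ((n_j - 1) * (n_o - 1))

    # Residual: three-way interaction confounded with error.
    resid = (x - m_pj[:, :, None] - m_po[:, None, :] - m_jo[None, :, :]
             + m_p[:, None, None] + m_j[None, :, None] + m_o[None, None, :] - grand)
    ms_res = np.sum(resid ** 2) / ((n_p - 1) * (n_j - 1) * (n_o - 1))

    estimates = {
        "p": (ms_p - ms_pj - ms_po + ms_res) / (n_j * n_o),
        "j": (ms_j - ms_pj - ms_jo + ms_res) / (n_p * n_o),
        "o": (ms_o - ms_po - ms_jo + ms_res) / (n_p * n_j),
        "pj": (ms_pj - ms_res) / n_o,
        "po": (ms_po - ms_res) / n_j,
        "jo": (ms_jo - ms_res) / n_p,
        "pjo,e": ms_res,
    }
    # Negative estimates are set to zero, a common convention.
    return {name: max(float(value), 0.0) for name, value in estimates.items()}


def g_coefficient(vc, n_judges, n_occasions):
    """Generalizability coefficient for relative decisions about cases."""
    relative_error = (vc["pj"] / n_judges + vc["po"] / n_occasions
                      + vc["pjo,e"] / (n_judges * n_occasions))
    return vc["p"] / (vc["p"] + relative_error)


# Simulated ratings (hypothetical): 30 cases, 5 judges, 2 occasions.
rng = np.random.default_rng(1)
case_effect = rng.normal(scale=2.0, size=(30, 1, 1))
judge_effect = rng.normal(scale=0.5, size=(1, 5, 1))
noise = rng.normal(scale=1.0, size=(30, 5, 2))
ratings = 5.0 + case_effect + judge_effect + noise

vc = variance_components(ratings)
print(vc)
print("Projected coefficient, 3 judges x 1 occasion:", g_coefficient(vc, 3, 1))
```

Varying `n_judges` and `n_occasions` in `g_coefficient` shows how many judges or occasions a follow-up study would need to reach a target reliability under these assumptions.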
Hierarchical Linear Models for the Nomothetic Aggregation of Idiographic Descriptions of Judgment
Steve Schilling and I hope that this chapter will provide a useful way to reconcile the apparent tension between idiographic and nomothetic approaches to the analysis of judgment data.
We begin with a summary of the distinction between the idiographic and nomothetic orientations and of Brunswik's view of the tension between them, and then note strategies that have previously been employed to aggregate idiographic judgment data.
Next, we provide a brief overview of hierarchical linear models (HLMs; Bryk & Raudenbush, 1992) and show how they can be used to analyze a set of judgments based on multiple cues.
Finally, we suggest that, when undertaken after the ratings of judges have been analyzed and understood at the individual level, HLMs provide a means to model judges simultaneously at the idiographic and nomothetic levels. In addition, HLMs clearly indicate whether quadratic relationships detected in idiographic descriptions of individual judges should be included in an overall, nomothetic description of the aggregated data. HLMs also lead to a parsimonious nomothetic model of the aggregated data and provide significance tests that can detect (a) differences among the judges with respect to the weights they assign individual cues (idiographic level of analysis) and (b) appreciable cue utilization in the population from which the sample of judges was drawn (nomothetic level of analysis).
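One common way to write a two-level model of this kind, given here only as a sketch, treats cases as nested within judges. The notation follows general HLM conventions and is an assumption, not necessarily the chapter's own: Y_ij is judge j's rating of case i, X_kij is the value of cue k for that case, and K is the number of cues.

```latex
% Level 1 (within judge j): the judge's own regression on the cues
\begin{align*}
Y_{ij} &= \beta_{0j} + \sum_{k=1}^{K} \beta_{kj} X_{kij} + r_{ij},
  & r_{ij} &\sim N(0, \sigma^2) \\
% Level 2 (between judges): each judge's weight varies around a population weight
\beta_{kj} &= \gamma_{k0} + u_{kj},
  & \operatorname{Var}(u_{kj}) &= \tau_{kk}, \quad k = 0, 1, \dots, K
\end{align*}
```

Under this formulation, test (a) above corresponds to asking whether tau_kk differs from zero (judges differ in the weight given to cue k), and test (b) to asking whether gamma_k0 differs from zero (cue k is used in the population of judges). A quadratic relationship would enter as an additional Level 1 term in the squared cue.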
Steve and I have also been working on an ever-expanding presentation of the material in the HLM chapter in a manuscript titled Multilevel Judgment and Reliability Analysis: Hierarchical Linear Models as a Bridge Between Generalizability Theory and the Lens Model Equation.
In this paper, we deal with such additional topics as modeling variability as a function of judge characteristics.