By Guillermo Solano-Flores
Workshop paper presented at the 1999 Meeting of the
National Association for Research in Science Teaching (NARST)
We propose a model for developing culturally-responsive science assessments as an alternative to the more common approach of adapting assessments originally developed for a mainstream student population. The model builds on knowledge we have gained both from developing science assessments for linguistic minorities and from constructing tools for effective science assessment development. In our model, two versions of the same assessment are developed concurrently: version A (mainstream culture or language) and version Alpha (specific linguistic or cultural minority). Throughout the process of assessment development, versions A and Alpha undergo the same number of review-tryout-revise iterations and are piloted with students the same number of times. The development of each version benefits from the development of the other: modifications and improvements made on version A are also made on version Alpha, and vice versa. Shells (blueprints that specify the structural and formal characteristics of the exercises) serve as documents that both enable developers to hold systematic discussions on the nature of the exercises and ensure that versions A and Alpha remain comparable throughout the entire process of assessment development. We are currently using our model to develop assessments in English and Spanish and will be able to share preliminary data on its effectiveness at the NARST conference in April.
Limitations of Conventional Approaches to Addressing Linguistic and Cultural Diversity in Science Assessment
The challenges of testing across cultural and linguistic groups have been discussed thoroughly in the past few years. One challenge has to do with construct validity (see Van de Vijver & Hambleton, 1996). If cultural differences are not properly taken into account, the construct measured by the test may not be the same across cultures.
As standardized testing at the state, national, and international levels increases (e.g., state exit exams, NAEP, and TIMSS, respectively), the need for effective procedures for properly addressing cultural and linguistic diversity in science assessment becomes evident. There is evidence that different methods for measuring knowledge tap into different aspects of academic achievement (Baxter & Shavelson, 1994; Dalton, Morocco, Tivnan, & Rawson, 1994; Ruiz-Primo & Shavelson, 1996). As a consequence, many assessment programs are rapidly incorporating new types of tasks (e.g., open-ended questions, hands-on tasks, concept maps, and computer simulations) in addition to traditional multiple-choice items. However, only gender and racial performance differences have been investigated for some of these tasks (e.g., Jovanovic, Solano-Flores, & Shavelson, 1994; Klein, Jovanovic, Stecher, McCaffrey, Shavelson, Haertel, Solano-Flores, & Comfort, 1997), and there is scant information on how different science tasks elicit different styles of thinking and problem-solving strategies from students depending on their cultural and linguistic backgrounds.
There is also evidence that student performance is extremely sensitive to the language used in science performance assessments. Wording has a tremendous influence on the way students respond to an exercise (see Baxter, Shavelson, Goldman, & Pine, 1992). If a student does not understand a word in an exercise, or understands it in a slightly different way because of her linguistic and cultural background, her performance may not accurately reflect her scientific knowledge and skills.
Precisely because of this sensitivity to language, adapting an instrument developed for mainstream students for use with different cultural or linguistic groups will fail to produce equitable, comparable assessments. Procedures commonly accepted as an indication of the adequacy of test adaptations may not effectively control for cultural and linguistic bias. For example, a study on the use of bilingual, English-and-Spanish formats in science assessment (Solano-Flores, Ruiz-Primo, Baxter, & Shavelson, 1992) found that some Latino, English-learning students were unfamiliar with words the researchers believed to be part of the students' everyday language. These words made their way into the Spanish version despite the use of an experienced translator; review by a panel of bilingual scholars, who also back-translated the Spanish version into English to monitor retention of the original meaning; and tryouts of the Spanish version with some students to make refinements. Although the translation was accurate and its grammar impeccable, it reflected the thinking and knowledge of the researchers rather than that of the students targeted.
From an assessment development perspective, the approach of adapting tests to address cultural and linguistic diversity has a serious methodological weakness that prevents testing from being equitable: the process used to develop the original version of the assessment (Version A, mainstream cultural or linguistic group) is different from the process used to develop the adapted version (Version Alpha, cultural or linguistic minority). The development process for Version A is cyclical (Figure 1a). During each iteration, developers observe the students or ask them to think aloud as they perform, in order to gain access to the reasoning they use to solve the problems. They also interview students to investigate how well they understand the problems posed and the kinds of knowledge and thinking skills they use in solving them. The developers discuss the findings and create a more refined version of the assessment at each iteration. Among other things, they may rephrase questions; add, replace, or eliminate words; or include examples in the directions for equipment use (Solano-Flores & Shavelson, 1997). By contrast, Version Alpha is created by simply adapting Version A (Figure 1b). The delicate process that allows assessment developers to refine the assessment by trying it out with pilot students does not take place for Version Alpha.
The Model for Culturally-Responsive Assessment Development
Given the complexity of the process of assessment development, we postulate that versions A and Alpha cannot be assumed to be equivalent unless they undergo the same development process. In seeking alternative ways to develop science assessments that ensure equity and fairness in testing, we devised a model for culturally-responsive assessment development (see Solano-Flores & Nelson-Barber, 1999) (Figure 2). In our model, versions A and Alpha are developed concurrently. Both versions undergo the same number of review-tryout-revise iterations and are piloted with students the same number of times, each with students from the population it intends to serve.
A key element to the effectiveness of our model is the use of shells as development tools that ensure the comparability of versions A and Alpha. To put it simply, shells are blueprints that provide assessment developers with directions for generating exercises of a given type (Solano-Flores, Jovanovic, Shavelson, & Bachman, 1999). Shells can also be thought of as documents that specify the characteristics, structure, and format of an exercise.
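To illustrate what a shell specifies, consider the following sketch, in which a shell is represented as a structured record in Python. The representation and all field names are our own hypothetical illustration, not the format of the published shells; the point is that a shell pins down the characteristics every exercise generated from it must have, so that all versions are constrained in the same way.

    # Hypothetical sketch: a shell as a structured record. The field names
    # are illustrative assumptions, not the published shell format.
    from dataclasses import dataclass

    @dataclass
    class ExerciseShell:
        """Blueprint specifying the structure and format of one type of exercise."""
        task_type: str             # e.g., "hands-on investigation"
        science_content: str       # content domain the exercise addresses
        prompt_components: list    # ordered parts every prompt must contain
        equipment: list            # materials provided to the student
        response_format: str       # e.g., "open-ended written response"
        scoring_dimensions: list   # what raters score, common to all versions

    # Versions A and Alpha are both generated from the same shell, so their
    # structures and appearances are comparable by construction.
    shell = ExerciseShell(
        task_type="hands-on investigation",
        science_content="factors that affect the period of a pendulum",
        prompt_components=["scenario", "question", "directions for equipment use"],
        equipment=["string", "washers", "stopwatch"],
        response_format="open-ended written procedure and conclusion",
        scoring_dimensions=["procedure", "use of evidence", "conclusion"],
    )

Because the shell, rather than either language version, is the authoritative description of the exercise, a change proposed for one version can be evaluated against the shell before it is propagated to the other.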
We have observed that independent teams of developers can produce science exercises with comparable structures and appearances only if the shells provide very specific directions and leave little room for interpretation (see Stecher, Klein, Solano-Flores, McCaffrey, Robyn, Shavelson, & Haertel, 1999; Solano-Flores, Jovanovic, Shavelson, & Bachman, 1999). We have also observed that shells help assessment developers to hold more systematic discussions (Schneider, Daehler, Hershbell, McCarthy, Shaw, & Solano-Flores, 1999). Throughout the cyclical process of assessment development, the shells themselves are updated based on the experience gained from trying out and revising the exercises.
In our model, shells are used in two ways: (1) as documents that provide formal and highly specific descriptions of the structure of the exercises, which ensures that versions A and Alpha have comparable structures and appearances throughout the entire process of assessment development; and (2) as interfaces for effective communication between developers. Developers are trained to use the shells as frames of reference for discussing and negotiating any proposed change. For example, if, based on the responses given by some pilot students to Version A of the assessment, a developer proposes to rephrase a prompt, the relevance of the change and the way to implement it are discussed with regard to both versions A and Alpha. Any modification is agreed upon by consensus and made to both versions. In addition to ensuring equal participation of cultural or linguistic groups, an unanticipated potential benefit of this approach is that the content of the assessment is enriched by giving consideration to two languages or cultures throughout the entire process of development.
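The discipline the model imposes can also be sketched as a toy procedure. The following runnable Python sketch is our own illustration under stated assumptions (try_out, discuss_to_consensus, and apply_change are simplified stand-ins for what are, in reality, tryouts with students and shell-based developer discussions); it shows the two properties the model guarantees: both versions go through the same number of iterations, and every consensus change is made to both versions.

    # Hypothetical, runnable toy sketch of the concurrent review-tryout-revise
    # cycle. The data structures and helper functions are illustrative
    # assumptions, not procedures from the paper.

    def try_out(version, population):
        """Stand-in for piloting a version with students from its own population."""
        return ["finding from " + population + " tryout of: " + version["prompt"]]

    def discuss_to_consensus(findings_a, findings_alpha, shell):
        """Stand-in for the shell-based discussion; returns the agreed changes."""
        if findings_a or findings_alpha:
            return [{"field": "prompt", "value": "Describe what you observe."}]
        return []

    def apply_change(version, change):
        revised = dict(version)
        revised[change["field"]] = change["value"]
        return revised

    def develop_concurrently(version_a, version_alpha, shell, n_iterations=3):
        for _ in range(n_iterations):
            # Each version is piloted with students from the population it serves.
            findings_a = try_out(version_a, "mainstream")
            findings_alpha = try_out(version_alpha, "minority")
            # Changes proposed from either tryout are negotiated against the
            # shell; upon consensus, every change is made to BOTH versions.
            # (In practice, each change would be realized appropriately in
            # each version's language, not copied verbatim as it is here.)
            for change in discuss_to_consensus(findings_a, findings_alpha, shell):
                version_a = apply_change(version_a, change)
                version_alpha = apply_change(version_alpha, change)
        return version_a, version_alpha

    version_a = {"prompt": "What happens to the pendulum?"}
    version_alpha = {"prompt": "¿Qué le pasa al péndulo?"}
    version_a, version_alpha = develop_concurrently(version_a, version_alpha, shell=None)

By contrast, the adaptation approach of Figure 1b would run this loop for Version A only and translate the result afterward, which is exactly the asymmetry the model is designed to remove.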
We are currently using the model to develop assessments in English and Spanish for a school district that serves both English learners and native English speakers. At the NARST conference in April, we will share preliminary data on the effectiveness of the model. We invite participants who need to develop assessments for students of different cultural and linguistic backgrounds to approach us and allow us to design for them assessment development strategies suited to their needs. This will allow us to refine our development model and evaluate the psychometric soundness and comparability of the assessments that can be generated with it.

References
Baxter, G. P. & Shavelson, R. J. (1994). Science performance assessments: benchmarks and surrogates. International Journal of Educational Research, 21(3), 279-298.
Baxter, G. P., Shavelson, R. J., Goldman, S. R., & Pine, J. (1992). Evaluation of procedure-based scoring for hands-on science assessment. Journal of Educational Measurement, 29(1), 1-17.
Dalton, B., Morocco, C. C., Tivnan, T., & Rawson, P. (1994). Effect of format on learning disabled and non-learning disabled students' performance on a hands-on science assessment. International Journal of Educational Research, 21(3), 299-316.
Jovanovic, J., Solano-Flores, G., & Shavelson, R. J. (1994). Performance-based assessments: Will gender differences in science achievement be eliminated? Education and Urban Society, 26(4), 352-366.
Klein, S. P., Jovanovic, J., Stecher, B. M., McCaffrey, D., Shavelson, R. J., Haertel, E., Solano-Flores, G., & Comfort, K. (1997). Gender and racial/ethnic differences on performance assessments in science. Educational Evaluation and Policy Analysis, 19(2), 83-97.
Ruiz-Primo, M. A. & Shavelson, R. J. (1996). Rhetoric and reality in science performance assessment. Journal of Research in Science Teaching, 33(10), 1045-1063.
Schneider, S., Daehler, K. R., Hershbell, K., McCarthy, J., Shaw, J., & Solano-Flores, G. (1999). Developing a national science assessment for teacher certification: Practical lessons learned. In L. Ingvarson (Ed.), Assessing teachers for professional certification: The first ten years of the National Board for Professional Teaching Standards. Greenwich, Connecticut: JAI Press, Inc.
Solano-Flores, G., Jovanovic, J., Shavelson, R. J., & Bachman, M. (1999). On the development and evaluation of a shell for generating science performance assessments. International Journal of Science Education, Vol., ( ), pp.-pp.
Solano-Flores, G., & Nelson-Barber, S. (1999). Promoting equity and fairness in testing from the start: Developing culturally-responsive assessments. Paper presented at the Annual Meeting of the Center for Research on Students Placed at Risk. El Paso, Texas, January 20-23.
Solano-Flores, G., Ruiz-Primo, M. A., Baxter, G. P., & Shavelson, R. J. (1992). Science performance assessments with language minority students. Unpublished manuscript, University of California, Santa Barbara.
Solano-Flores, G. & Shavelson, R. J. (1997). Development of performance assessments in science: Conceptual, practical, and logistical issues. Educational Measurement: Issues and Practice, 16(3), 16-25.
Stecher, B. M., Klein, S. P., Solano-Flores, G., McCaffrey, D., Robyn, A., Shavelson, R. J., & Haertel, E. (1999). The effects of content, format, and inquiry level on science performance assessment scores. (Under review).
Van de Vijver, F. & Hambleton, R. K. (1996). Translating tests: Some practical guidelines. European Psychologist, 1(2), 89-99.