
CEPI - Commonwealth Educational Policy Institute
Policy Issues - Standards / Assessment / Accountability

James McMillan, Editor

High Stakes Testing

Descriptive Context

Student assessment is at the heart of the current standards movement in public education.  Every state has some kind of large-scale testing program, and in forty-eight states the results of these assessments are used to monitor progress toward student attainment of standards and to evaluate schools.  Thirty-three states have instituted rewards and sanctions as part of a state testing accountability program, and 38 states test at all three grade levels (elementary, middle, secondary). Large-scale, high stakes assessments have become central to both school and student accountability.  For better or worse, student scores on statewide assessments are used to determine whether students graduate from high school, are promoted from one grade level to the next, or receive special diplomas, and whether schools are accredited, rewarded, or even face some form of state takeover.  In some states student assessments are used to determine whether teachers receive pay raises.

Because the direct consequences of performance on a single test are very significant, the term “high-stakes” is often used to describe these assessments.  It is almost as if standards have become secondary to assessments, since performance on the tests is what counts.  Many describe what is currently occurring across the country as assessment-driven reform rather than standards-based reform.

Granted, the ultimate purpose of any school reform movement is to improve student achievement.  But how can policy makers best conclude that student achievement has improved?  After all, standardized and classroom testing of students has been done for decades.  What is different in the current reform movement is the construction and administration of state-controlled assessments that are specifically aligned with “high” state standards, and the attachment of important consequences to how all students perform.  This marks a fundamental change in how student performance is evaluated.  What has traditionally been localized and based primarily on teacher evaluations, grades, and course-taking is now centralized and standardized.  School quality is judged on the basis of student performance rather than on resources, teacher credentials, and other “input” or “process” factors.  As a result of this different emphasis, high stakes assessment has had a profound impact on education.

Since decisions about students and schools are tied directly to how students perform on high stakes assessments, it is important to understand the nature of these tests, what we have learned from previous high stakes testing programs, what makes for a high quality large-scale assessment program, and the positive and negative, intended and unintended, consequences of such testing.  This discussion focuses on the nature of the assessments; consequences are considered separately in the CEPI issue Testing Consequences.

What is the Nature of High Stakes Statewide Assessment?

An assessment is a procedure for gathering evidence directly from students and using the evidence to evaluate student performance.  The process of gathering or collecting evidence is separate from evaluation.  In statewide assessment the evidence is gathered in a “standardized” manner in which the testing and scoring procedures used are the same for every student and school.  In this way current high stakes assessments are much like standardized achievement and aptitude tests.  The format for collecting information can vary.  Most assessments use selected-response items, such as multiple-choice (49 of 50 states).  These items are objectively scored, and so are often called objective items.  Open-response items, such as writing, completion, and performance-based assessments, are scored using professional judgment. Professional judgment is also used in objective tests, in creating the items and determining the correct responses.  Thus, when assessments are called “objective,” they are objective only in the way the answers are scored (e.g., machine scoreable). Thirty-eight state assessment systems use short-answer items. Only 7 states use essay or performance-based assessments (Quality Control, 2001).

Evaluation involves an interpretation of the information that has been gathered, in which value judgments are made about the performance.  Evaluation gets at what test scores mean.  In the current reform movement, standards provide the basis for evaluating performance on high stakes assessments.  Obviously, professional judgment is central to evaluation.  Who makes the judgments, and how they make them, are central to determining the overall results, such as the “pass” or “accredited” outcomes reported to the public.  What level of performance is necessary to “pass”?  Only professional judgment can answer this question.  In contrast to standardized achievement tests, which evaluate student performance based on how a student compares to a “norm” group (norm-referenced testing), most current state high-stakes assessments evaluate performance against pre-determined levels of achievement (criterion-referenced testing).

In previous decades most statewide testing was done with standardized, norm-referenced, “off-the-shelf” tests developed by national testing companies.  Current high stakes assessments, in contrast, are implemented by statewide assessment programs.  Given the increased stakes of these assessments, the wide scope typically covered (many grade levels and subjects), the demand for quick turnaround in reporting scores, and the political press for “immediate” positive benefits, state assessment resources are rarely able to meet these demands.  As a result, most states partner with one of a few national testing companies to develop the tests and report the scores.  Resources have been stretched at both the state and national levels, resulting in significant problems.  Just three companies develop these tests (Harcourt Educational Measurement, Riverside, CTB-McGraw-Hill), and industry executives have said that increasing demands could affect test quality and scoring accuracy.
For example, in 1999, thousands of New York City students were mistakenly required to attend summer school because of incorrect scores.  In Washington, 500,000 student writing samples had to be rescored, and Kentucky, Minnesota, and California have experienced miscoring or loss of student tests. In the past three years, at least 19 states have reported problems with test materials or mistakes in scoring.  This context is important because it affects what high stakes assessments can and cannot provide in a credible, high quality way that is accurate and fair.
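The contrast between norm-referenced and criterion-referenced evaluation can be made concrete with a small sketch. The Python below, using entirely hypothetical scores and a hypothetical cut score of 70, reports the same raw score two ways: as a percentile rank within a norm group, and as a pass/fail judgment against a fixed criterion. It illustrates the distinction only; it is not how any state computes its results.

```python
from bisect import bisect_left


def percentile_rank(score: float, norm_group: list[float]) -> float:
    """Norm-referenced: percent of the norm group scoring strictly below `score`."""
    ordered = sorted(norm_group)
    below = bisect_left(ordered, score)  # number of norm-group scores below `score`
    return 100.0 * below / len(ordered)


def meets_criterion(score: float, cut_score: float) -> bool:
    """Criterion-referenced: compare against a pre-determined level of achievement."""
    return score >= cut_score


# Hypothetical norm group and cut score, for illustration only.
NORM_GROUP = [52, 58, 61, 64, 66, 68, 71, 73, 77, 84]
CUT_SCORE = 70

for raw in (66, 73):
    print(raw, percentile_rank(raw, NORM_GROUP), meets_criterion(raw, CUT_SCORE))
```

A criterion-referenced judgment does not change if the norm group changes, whereas a percentile rank does: if every member of the norm group improved, the same raw score would earn a lower percentile but the same pass/fail result.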


Differing Perspectives

High stakes statewide assessments that are aligned with state content and performance standards can be used for many purposes.  It is important for policy-makers to understand the purposes of high stakes assessments, and to revisit them as needed, so that implementation and use of the results are consistent with those purposes.  Clearly articulated purposes also make clear to schools and the public what the overall goal is and what criteria should be used to evaluate how effectively the system meets its stated goals. Purposes and benefits of high stakes assessment programs include the following:

  • Assures that standards are taken seriously and motivates teaching of the standards.  Assessments are the carrots and sticks that force schools and teachers to change, “shaking up” parents, school boards, school administration, and teachers.  Parents, in particular, are motivated to take action to improve schools. Recent research shows that almost 80% of teachers believe that today’s curriculum, in comparison to 3 years ago, is “somewhat” or “a lot” more demanding (Quality Counts 2001).

  • Motivates students to learn.  When important consequences are attached to the results, students will be motivated to learn more in order to meet higher standards.

  • Provides the same basis of evaluation for all students. Grade inflation and different standards used in different schools can be minimized.  Research has shown that some schools serving low socioeconomic status students have “easier” standards for grading students than schools serving high socioeconomic status students.

  • Provides information that can inform policy-makers of the quality of education.  While a direct link between student performance and quality of education may not be warranted, the assessments provide meaningful data on what is being provided and on students’ opportunity to learn.

  • Monitors school improvement efforts.  Assessment results over time provide information about whether targeted school improvement efforts are working.  The assessments provide indicators to evaluate program effectiveness.

  • Identifies student strengths and weaknesses to target instruction.  Results may be used to group students, change the curriculum, examine teaching, or emphasize identified areas.

  • Provides the same high expectations for all students. Having the same assessments and criteria for passing provides greater equity between the “haves” and the “have nots” by adopting a philosophy that “all students can learn.” “All” students typically refers to all socioeconomic status levels, all schools, and may include students with disabilities and students whose primary language is not English.

  • Assures better preparation for higher education and employment. Employers are increasingly demanding more of new employees, as are colleges and universities that need to reduce or eliminate remedial courses.

  • Allows recognition of schools and teachers whose students perform well and/or improve their performance.  Assessments provide a basis for meaningful rewards tied to student achievement.

  • Holds schools accountable for student performance.  Assessment results provide for public accountability of schools and teachers for student performance.  The assessments provide a helpful external referent for accountability.

  • Increases the emphasis on student achievement.  Student achievement is one of many factors used in grading students.  High stakes assessments focus attention on student achievement, lessening the influence of effort, participation, and other non-achievement factors typically used in grading students.

  • Enhances professional development of teachers and administrators.  The assessments provide a basis for professional development.  Teachers have benefited greatly when involved in the development, scoring, and standard-setting processes, especially for performance-based assessments.

While high stakes assessment is used extensively, it is not without its limitations and detractors. Much of the criticism is concerned with the consequences of such testing on teachers, curriculum and students. These criticisms are briefly listed here, along with other limitations, to provide a sense of what arguments against high stakes testing are based on.

  • Too much too soon.  Many maintain that the tests are developed and implemented too quickly, resulting in poor questions and unreliable scoring.  Assessment expertise is thin, particularly at the state level, which may lead to an overreliance on “outside” experts.

  • Objective assessments measure too much simple knowledge.  Objectively scored tests, such as the multiple-choice tests used extensively in Virginia, are best used to measure student knowledge and simple skills.  These kinds of tests are not the best way to measure deep understanding, complex skills, reasoning, problem-solving, and critical thinking.

  • New assessments don’t tell us much that is new. What new information is gained from high stakes testing?  What more do we know about our students and schools?  Some maintain that high stakes tests don’t tell us very much that we don’t already know from existing sources of information (e.g., standardized achievement and aptitude test scores).

  • What you test and how you test it is what you get; what is not tested you do not get.  A potentially serious limitation with high stakes tests is not what they do, it is what they do not do.  High stakes assessments inform us about only some of what is important in public education, namely student proficiency on what is tested.  The tests are limited by what content is covered and by the nature of the assessments.  There is much that is not covered by the tests that is also very important to knowing whether students have had a high quality education and have developed needed skills.

  • Too much emphasis on a single test score.  Failure to achieve a passing score on high stakes assessments can be life-changing.  Many organizations, including national testing companies and national testing associations, maintain that such critical decisions as being unable to graduate should never be made on the basis of a single test score.  To do so uses the score in a way that is not appropriate given the limitations of what a single test can tell us.

  • Do higher test scores reflect true changes in student achievement?  Many contend that increases in test scores are mostly because students have become more sophisticated test takers (more test-wise) and because schools institute test review practices that raise scores without really raising achievement.  Research tends to support the notion of test score inflation (Shepard, 2000).

  • Statewide assessment leads to a more narrow statewide curriculum.  Because of high stakes, schools will narrow the curriculum to emphasize what is on the test.  The common test for all schools drives a common curriculum, resulting in less local autonomy and less curriculum that is tailored to local conditions, traditions, and context.

  • The tests are unfair to low socioeconomic students and schools.  We know that scores on standardized tests are based primarily on three influences:  1) student ability or aptitude, 2) what’s learned outside of school, and 3) what is learned in school.  Because low socioeconomic students tend to have lower abilities and live in environments that may not be conducive to academic learning, they are at a significant disadvantage. Further, teachers and principals in low socioeconomic schools that “fail” are penalized for factors that are outside their control. …

  • High stakes assessments are designed to make schools look bad.  Some maintain that one purpose of high stakes assessments, particularly those with very “high” performance standards, is to make public schools look like they are failing or are inadequate.  Some maintain further that the motivation for this is to enact policy to allow for vouchers to private schools.

  • High stakes assessments increase the centralized education bureaucracy.  States have learned that developing and operating a high stakes testing program requires considerable resources, leading to a larger education bureaucracy.

  • High stakes assessments drain resources from other programs.  Because assessments have high stakes it is essential to provide whatever resources are needed to assure needed quality.  Huge sums of money are needed to develop, administer, and score the tests.  Much of that money goes to a few national testing companies.  That is money that could go directly to school divisions to improve instruction.

  • High stakes assessments encourage motivation based on extrinsic rewards.  Motivation is enhanced when students are involved, when they want to learn to understand better, and when they can see a purpose to what they are learning.  This is referred to as “intrinsic” motivation. External, high stakes tests motivate students solely through the outcome, placing greater emphasis on extrinsic motivation.

  • High stakes assessments result in too much time preparing students to take the test.  The pressure to do well on the tests results in inordinate amounts of time devoted to practice tests and test-taking skills.  Time is diverted from more important learning.

  • High stakes assessments don’t provide information that can improve instruction.  By their very nature, high stakes statewide assessments cover a lot of content and provide only a few scores.  Unlike some criterion-referenced tests, they do not provide specific information that teachers can use to change instruction.  In the Virginia tests, even the “reporting category” scores provide only a general sense of student strengths and weaknesses.

  • Results of high stakes assessments are used to judge teachers and administrators.  Even though current tests are not designed to be used to evaluate teachers and administrators, the scores are often used in this way.  The tests are designed to measure student, not teacher or administrator competence.  Like schools, teachers and administrators have limited influence over important factors that determine the scores, such as ability, effort, and what students learn outside of school.

  • High stakes assessment results depend on student motivation.  It is critical that students try their hardest to do well on statewide assessments.  If students are not motivated to perform, low scores may suggest, in error, that appropriate instruction has not been provided.

While it is clear that there are champions and detractors of high-stakes testing, one group with relatively little political interest is the American Educational Research Association.  This association, which includes educational measurement and research experts and seeks to promote educational policy that scientific research has shown to be beneficial, has recently issued a statement about high-stakes testing (AERA, 2000).  The organization maintains that while high-stakes testing can improve education, there is the potential for serious harm if inadequate resources are invested or if technical requirements are insufficient for intended uses.  Under these conditions, it is likely that policymakers will be misled by test results.  AERA recommends that high-stakes testing programs meet the following conditions:

  1. Protection against high-stakes decisions based on a single test. According to one expert, “there is a definite limit to the amount of information that once-a-year assessments of limited duration...can provide” (Herman, 2001). Also see Gordon (2000) and recent statements from the American Association of School Administrators, which rejects accountability systems that rely only on testing.
  2. Adequate resources and opportunity to learn.
  3. Validation for each separate intended use.
  4. Full disclosure of likely negative consequences.
  5. Alignment between the test and the curriculum.
  6. Validity of passing scores and achievement levels.
  7. Opportunities for meaningful remediation for examinees who fail tests.
  8. Appropriate attention to language differences among examinees.
  9. Appropriate attention to students with disabilities.
  10. Careful adherence to explicit rules for determining which students are to be tested.
  11. Sufficient reliability for each intended use.
  12. Ongoing evaluation of intended and unintended effects.

 

Snapshots of Research and Court Decisions

  • In April 2001, a study funded by the Business Roundtable reported that Texas had achieved higher student achievement and narrower racial gaps without raising dropout levels.

  • In May 2001, more than half of Scarsdale NY eighth graders boycotted state tests.
  • High-stakes statewide assessment is occurring in 48 states.
  • A survey of Virginia registered voters conducted in August, 2000, found that 51 percent of the respondents said the SOL testing program “is not working”; 34 percent said it was working (Washington Post).  The strongest opposition was in Southwest Virginia.
  • A 2000 poll of 621 Minnesotans found that 48% thought the emphasis on statewide testing was a good thing for the state.
  • States with the most experience with high stakes testing are Texas, California, Kentucky, and Maryland.  Maryland is noted as a system that uses different types of items, including both multiple-choice and performance-based assessment.
  • Rocked by two straight years of student failure on statewide math tests, Arizona education officials have deemed the test too difficult and have delayed high-stakes consequences.
  • Each Virginia SOL test contains approximately 50 multiple-choice items, except for a writing test in grades 5 and 8.
  • Virginia SOL tests are given in grades 3, 5, 8, and in certain courses in high school in mathematics, science, English, and history and the social sciences.  SOL technology tests are given in grades 5 and 8.
  • In March 2001, North Carolina officials announced that writing tests would not, as initially planned, be factored into school ratings for the next three years.
  • First full administration of the SOL tests was completed in spring, 1998.
  • In October, 1998, the Virginia Board of Education adopted two passing scores for 27 SOL tests (pass/proficient and pass/advanced).
  • Significant resources are needed to develop, administer, and score high stakes statewide assessments.
  • In 1999, Wisconsin repealed its high-stakes test mandate.
  • Even though 86% of Texas 10th graders passed their state’s test, compared with 55% of Massachusetts 10th graders, Massachusetts students far outpaced Texas students on NAEP.
  • Only 15 states test students across consecutive grade levels as proposed by President Bush.

 

The Issue in Practice

Considerable variation exists in high stakes statewide testing programs.  The trend nationwide has been to shift from norm-referenced to criterion-referenced testing, from basic or minimum knowledge and skills to higher level understanding and thinking processes, and, for many states, to some type of performance-based assessment in which students are required to formulate an original response to a question and communicate an answer through some kind of constructed act.  Currently, most states have systems that provide a balance between objectively scored and performance-based assessments.

High stakes testing will be most effective if adequate attention has been given to three technical qualities associated with testing:  validity, reliability, and fairness (The Use of Tests as Part of High-Stakes Decision-Making for Students: A Resource Guide for Educators and Policy-Makers, 2000).

Validity

Validity is a professional judgment about the appropriateness of inferences, uses, and consequences that result from the assessment.  Validity is concerned with the soundness, trustworthiness, or legitimacy of the claims or inferences that are made from the scores.  Often the phrase “the validity of the test” is used, when more accurately it is “the validity of the interpretation, use, or inference.”  That is, it is the inference or use that is valid or invalid, not the test.  Thus, the same test scores can be used validly or invalidly.  For example, scores from high stakes statewide assessments may be valid as a general indicator of student knowledge, but would be invalid as a measure of teacher effectiveness.

Different kinds of evidence are used to make judgments about validity. The most common type of evidence in assessment-driven reform is related to the content covered in the test.  If the content of the test is representative of content contained in a larger domain, then inferences about student knowledge of the larger domain, based on the test results, will be reasonable.  Achievement tests always sample student knowledge. While it is sometimes difficult to comprehend, it is possible through appropriate sampling to make inferences about a larger domain on the basis of what seems like a relatively short test (e.g., 50 items that cover many months of content).
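The sampling argument can be illustrated with a deliberately simplified model. Assume, hypothetically, that a student can correctly answer a fixed proportion p of all items in the domain and that the test's items are drawn at random from that domain. The proportion correct on an n-item test then estimates p with a binomial standard error of sqrt(p(1 - p)/n). Real tests are built from blueprints rather than random draws, so this sketches the logic only, not any state's test design.

```python
import math


def sampling_standard_error(p: float, n_items: int) -> float:
    """Binomial standard error of the proportion correct on n randomly sampled items."""
    if not 0.0 <= p <= 1.0 or n_items <= 0:
        raise ValueError("p must be in [0, 1] and n_items must be positive")
    return math.sqrt(p * (1.0 - p) / n_items)


# Hypothetical student who has mastered 70% of the domain.
p_true = 0.70
for n in (10, 50, 200):
    se = sampling_standard_error(p_true, n)
    # Under a normal approximation, roughly 95% of n-item tests would give a
    # proportion correct within about 2 standard errors of p_true.
    print(f"{n:>3} items: SE = {se:.3f}, ~95% range {p_true - 2 * se:.2f} to {p_true + 2 * se:.2f}")
```

Quadrupling the number of items halves the standard error, which is one way to see why a 50-item test can support inferences about a broad domain while still leaving some uncertainty about any one student.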

Another kind of evidence is gathered when the scores from an assessment are correlated with scores from other tests or with other criteria.  If the scores are positively related, then there is greater confidence about what is being measured.  For example, one would expect that the best performing students in math would obtain the highest scores on the math assessment, while those who perform poorly would obtain low scores.   This kind of logic is important as a “common sense” kind of evaluation.
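Correlational evidence of this kind is usually summarized with a Pearson correlation coefficient. The sketch below computes it from scratch for two hypothetical lists: state math test scores and the same students' course grades on a 4-point scale. The data are invented for illustration. A value near +1 would support the “common sense” expectation that the strongest math students score highest.

```python
import math


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson product-moment correlation between two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length (at least 2 values)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical state math test scores and course grades for the same eight students.
test_scores = [412, 455, 389, 520, 478, 431, 502, 365]
math_grades = [2.7, 3.0, 2.3, 3.9, 3.3, 2.8, 3.7, 2.0]

print(round(pearson_r(test_scores, math_grades), 2))
```

Python 3.10 and later also provide `statistics.correlation`, which computes the same coefficient; the explicit version is shown so that the formula is visible.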

What is interesting from a validity standpoint in the current standards/assessment-driven reform movement is that educators and test specialists tend to have more conservative views about what test scores tell us than do policy-makers.  To the extent that test specialists, testing associations such as the National Council on Measurement in Education, and school teachers and administrators maintain that there are limits to what standardized tests can be used for, while policy-makers do not, conflicts and uneasy alliances are inevitable.

Reliability

Every test score that is reported has some degree of error in it.  We simply do not have perfect measures of student performance.  Sources of error include characteristics of students (such as their mood, health, or level of confidence on a particular day, or luck in guessing at answers), testing conditions (such as extreme heat or cold, or distractions), and characteristics of the test itself (such as poorly worded items).  The degree to which scores are free of such error is technically referred to as reliability.  Test scores with little error are highly reliable; scores with a great amount of error are unreliable.  While there are several types of evidence that can be used to estimate the degree of error, resulting in reliability coefficients, the important point is that consideration of error should be part of any decision made on the basis of the scores received.
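Classical test theory, one common framework for reasoning about this error (named here as background, not as the method any particular state uses), treats each observed score as a true score plus error and summarizes the typical size of that error with the standard error of measurement, SEM = SD x sqrt(1 - reliability). The sketch below applies that formula to hypothetical values (a score standard deviation of 50 scale points and a reliability coefficient of 0.90) and builds an approximate band around one student's score.

```python
import math


def standard_error_of_measurement(score_sd: float, reliability: float) -> float:
    """Classical test theory SEM: SD * sqrt(1 - reliability)."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    return score_sd * math.sqrt(1.0 - reliability)


def score_band(observed: float, sem: float, z: float = 1.0) -> tuple[float, float]:
    """Band observed +/- z*SEM (z=1 is about 68%, z=2 about 95%, if error is normal)."""
    return observed - z * sem, observed + z * sem


# Hypothetical scale: SD of 50 points, reliability coefficient of 0.90.
sem = standard_error_of_measurement(score_sd=50.0, reliability=0.90)
low, high = score_band(observed=405.0, sem=sem)
print(f"SEM = {sem:.1f}; approximate 68% band: {low:.0f} to {high:.0f}")
```

Under these assumptions, a student scoring 405 against a hypothetical cut score of 400 is within one SEM of the cut, which is why decisions near a passing line deserve particular caution.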

When scores are used to make dichotomous decisions, such as pass/fail, it is important to know how accurate those classifications are.   Some students who in reality do not know enough to pass will nonetheless obtain a passing score because of “positive” error (e.g., good luck in guessing), and are therefore misclassified.  Similarly, some students who do not receive a high enough score to pass actually have the required knowledge and/or skills.  They are also misclassified.  From a policy standpoint, it is critical to know what percentage of students are misclassified.
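The misclassification rate can be estimated by simulation. The sketch below assumes, hypothetically, normally distributed true scores, normally distributed measurement error of a given size, and a single cut score; it counts the share of simulated students whose observed pass/fail result differs from the result their true score would give. All numbers are illustrative, not estimates for any actual test.

```python
import random


def misclassification_rate(
    cut: float,
    true_mean: float,
    true_sd: float,
    error_sd: float,
    n_students: int = 100_000,
    seed: int = 1,
) -> tuple[float, float]:
    """Return (false-pass share, false-fail share) among simulated students."""
    rng = random.Random(seed)  # seeded so results are reproducible
    false_pass = false_fail = 0
    for _ in range(n_students):
        true_score = rng.gauss(true_mean, true_sd)
        observed = true_score + rng.gauss(0.0, error_sd)  # observed = true + error
        truly_passes = true_score >= cut
        observed_passes = observed >= cut
        if observed_passes and not truly_passes:
            false_pass += 1  # "positive" error pushed the score over the cut
        elif truly_passes and not observed_passes:
            false_fail += 1  # has the knowledge but scored below the cut
    return false_pass / n_students, false_fail / n_students


# Hypothetical scale: true scores with mean 420 and SD 50, cut score of 400.
for error_sd in (10.0, 20.0):
    fp, ff = misclassification_rate(cut=400, true_mean=420, true_sd=50, error_sd=error_sd)
    print(f"error SD {error_sd:>4}: false pass {fp:.1%}, false fail {ff:.1%}, total {fp + ff:.1%}")
```

Doubling the measurement error in this toy model raises the share of misclassified students, which is the policy-relevant link between reliability and pass/fail decisions.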

Fairness

Given the high stakes that characterize SOL testing, it is essential that the assessments are fair and nondiscriminatory, unbiased toward any particular group of examinees.  That is, a fair assessment is one that provides all students an equal opportunity to demonstrate achievement.  Six aspects of fairness need to be considered:

  1. It is important for the content of the assessments to be public, as well as any criteria that are used to score student constructed responses (such as what Virginia does with the writing test).  Students as well as teachers and administrators should know what will be tested and how answers will be scored.  This is made possible through clear test specifications and blueprints, and through release of sample items (Virginia’s system publishes test blueprints and sample items).
  2. Fair tests are ones that assess knowledge and skills students have had ample opportunity to learn.  Instruction that covers what is tested should be clearly and systematically documented.
  3. Students should only be tested on things that require prerequisite knowledge or skills that they possess.  This means that needed prerequisites should be clarified and documented.  This includes test-taking skills.
  4. Test questions and content should avoid stereotypes.
  5. Bias in assessment tasks and procedures should be avoided.
  6. By law, high stakes assessments must be designed to accommodate the special abilities of exceptional children.  If performance is influenced by a specific disability, the assessment must be modified so that the disabling trait is not a factor in the performance.

Additional characteristics often found in high quality assessment systems are listed below (“V” indicates a characteristic of Virginia’s program):

  • Using scores that make sense.
  • Using both constructed-response and selected-response test items.
  • Minimizing the time between taking the tests and receiving scores. (V)
  • Allowing adequate retesting. (V)
  • Standardizing administration procedures. (V)
  • Maintaining adequate test security. (V)
  • Providing adequate and timely technical information on the tests.
  • Clearly articulating appropriate and inappropriate uses of the scores.
  • Conducting research on the intended and unintended consequences of the assessment system.

 

Related Issues

There are a number of important issues related to high stakes testing.  One is the cost of developing, administering, and reporting test results, and of needed technical support.  Machine scoreable tests cost between $5 and $8 per student.  Policy decisions to make test forms available to teachers and the public will increase cost considerably because of the need for new tests.  Decisions to include constructed-response items will likewise increase costs.  Assessments that mix short answers with objective items cost two to three times more than machine-scoreable tests.  Hands-on performance-based assessments can cost between $30 and $70 per student.  In addition, there are operational costs to the Department of Education as well as costs incurred by local school divisions.  To date, there has been no estimate of the total cost of current high-stakes testing in Virginia. In Maryland, it costs about $30 per test per student.
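The per-student figures above can be turned into rough budget ranges with simple arithmetic. The sketch below multiplies the cost ranges quoted in this section by a hypothetical number of students tested (500,000, chosen only for illustration; no actual statewide count is implied). The “mixed” range applies the two-to-three-times multiplier to the $5 to $8 machine-scoreable range, which is one reading of the text rather than a separately reported figure.

```python
# Per-student cost ranges (low, high) in dollars, from the figures quoted in this section.
# The "mixed" entry applies the 2x-3x multiplier to the $5-$8 machine-scoreable range.
COST_RANGES = {
    "machine-scoreable": (5, 8),
    "mixed short-answer and objective": (5 * 2, 8 * 3),
    "hands-on performance-based": (30, 70),
}


def total_cost_range(per_student: tuple[int, int], n_students: int) -> tuple[int, int]:
    """Scale a per-student (low, high) cost range to a total for n_students."""
    low, high = per_student
    return low * n_students, high * n_students


N_STUDENTS = 500_000  # hypothetical number of students tested, for illustration only

for fmt, per_student in COST_RANGES.items():
    low, high = total_cost_range(per_student, N_STUDENTS)
    print(f"{fmt}: ${low:,} to ${high:,}")
```

These totals cover only per-student testing costs; the operational costs to the Department of Education and to local school divisions mentioned above would come on top.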

Policymakers can anticipate legal challenges to the technical quality of high stakes assessments.  Ensuring strong technical quality involves careful oversight of testing companies developing the tests, an adequately trained and supported state technical staff, and an independent, external group of testing experts that can provide recommendations upon review of current testing practices.

In Virginia, options have been provided to use substitute standardized tests in place of current end-of-course tests in high school (e.g., advanced placement and international baccalaureate tests).  While providing some choice of test may address concerns of unnecessary duplication, it is very important to conduct research that links the tests so that cut scores are comparable.
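One simple way to relate cut scores across two tests is linear (mean-sigma) linking, which maps a score on one test to the score on the other test that lies the same number of standard deviations from the mean, assuming comparable groups of students took both tests. The sketch below uses hypothetical means and standard deviations for an end-of-course test and a substitute test. Operational linking studies generally use more rigorous designs (for example, equipercentile methods), so this shows only the core idea of making cut scores comparable.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScaleStats:
    """Mean and standard deviation of scores on one test for the linking group."""
    mean: float
    sd: float


def linear_link(x: float, from_scale: ScaleStats, to_scale: ScaleStats) -> float:
    """Mean-sigma linking: keep the score's z-value, re-express it on the other scale."""
    z = (x - from_scale.mean) / from_scale.sd
    return to_scale.mean + z * to_scale.sd


# Hypothetical statistics for the same (or an equivalent) group of students.
end_of_course = ScaleStats(mean=430.0, sd=55.0)  # hypothetical end-of-course test scale
substitute = ScaleStats(mean=3.1, sd=1.1)        # hypothetical 1-5 substitute test scale

cut = 400.0  # hypothetical cut score on the end-of-course test
print(f"Comparable substitute-test cut: {linear_link(cut, end_of_course, substitute):.2f}")
```

Because the mapping is linear, linking a score forward and then back returns the original score, which is a useful sanity check on any implementation.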

Consideration has also been given to greater use of technology in the Virginia testing program, so that students could take tests online and schools could have immediate access to results.  When high stakes are involved, there needs to be equal access to technology.  There is also a need to conduct research to be sure that the technology itself is not affecting how well or poorly students perform.

The 1997 reauthorization of the Individuals with Disabilities Education Act (IDEA) requires that students with disabilities participate in large-scale assessments, with accommodations where appropriate.  Generally, these accommodations match what is deemed appropriate in a student’s Individualized Education Program (IEP).  As a result, there is a need to anticipate the development of policy to allow for special testing conditions and accommodations, such as different test formats, and for establishing cost estimates for such accommodations.  (See Mastergeorge & Miyoshi, 1999, Thurlow & Ysseldyke, 1996, and Thurlow, Elliott, & Ysseldyke, 1998, for comprehensive reviews of testing accommodations for students with disabilities.)

 

CEPI Summary

Because the SOL assessment system has significant consequences for students and affects the nature of instruction, it is important for policy makers to consider seriously both its limitations and its strengths in order to fashion a system that will meet high student and school performance goals.  It is critical for policy makers to ensure that the technical qualities of validity, reliability, and fairness are met.  Factors that affect these qualities and require monitoring include:

  • Adequate technical resources.
  • Monitoring of test development, scoring, and reporting.
  • Systematic collection of data on opportunity to learn, on the degree to which students are motivated to do their best, on the effect of technology on test-taking efficiency, on testing exceptional students, and on the consequences of the assessment program for instruction.
  • Monitoring the quality of reliability and validity evidence for each intended use of the results.
  • Use of constructed-response assessments that align well with certain standards, where feasible.  Currently 34 states include performance questions in their assessments.  While multiple-choice items are best suited to simple knowledge and skills, the more advanced knowledge and skills found in many of the SOL are better assessed through constructed-response items.

Like most other states, Virginia has made a strong commitment to using high-stakes testing.  While technical standards of quality, such as reliability and validity, are essential and must be met, the overall impact of high-stakes testing also needs to be evaluated.  This will require careful, apolitical research that provides sound data for decisions that advance the goals of public education.

 

Legislative History

A summary of recent Virginia legislative history on “High Stakes Testing” is available from CEPI.

 

Sources, Cites, Links

Print and Internet Resources

AERA position statement concerning high-stakes testing in PreK-12 education. (2000).  (Web site: www.aera.net/about/policy/stakes.htm)

Cizek, G. J. (1998).  Filling in the blanks:  Putting standardized tests to the test.  Fordham Report, 2(7).  (Web site:  www.edexcellence.net/library/cizek.pdf)

Claycomb, C., & Kysilko, D. (2000).  The purposes & elements of effective assessment systems.  The State Education Standard, 1(2), 7-11.

Gordon, B. M. (2000).  On high stakes testing.  AERA Division G News, Fall.

Herman, J. L. (2001).  Accountability bottom up.  The CRESST Line (Winter), 1-2, 8.

Heubert, J. P., & Hauser, R. M. (Eds.) (1999).  High stakes testing for tracking, promotion, and graduation.  Washington, DC:  National Academy Press.

High stakes testing:  Too much? Too soon?  (2000).  State Education Leader, 18(1).

Kahl, S. (2000).  Stakes, mistakes, & statewide testing. The State Education Standard, 1(2), 18-21.

Kifer, E. (2001).  Large-scale assessment:  Dimensions, dilemmas, and policy.  Thousand Oaks, CA:  Corwin Press, Inc.

Klein, S. P., & Hamilton, L. (1999).  Large-scale testing:  Current practices and new directions.  Santa Monica, CA: RAND Education.

Linn, R. L., & Herman, J. L. (1997).  A policymaker’s guide to standards-led assessment.  Denver, CO:  Education Commission of the States.

Mastergeorge, A. M., & Miyoshi, J. (1999).  Accommodations for students with disabilities:  A teacher’s guide.  Los Angeles:  National Center for Research on Evaluation, Standards, and Student Testing.

Neill, M. (2000).  State exams flunk test of quality. The State Education Standard, 1(2), 31-35.

Popham, W. J. (1999).  Why standardized tests don’t measure educational quality.  Educational Leadership, 8-15.

Quality counts ’99:  Rewarding results, punishing failure.  (1999).  Education Week, 18(17).  (Web site:  www.edweek.org/sreports/qc99)

Quality Counts 2001:  A better balance:  Standards, tests, and the tools to succeed.  (2001).  Education Week, 20(17).

Reed, S. (2000).  The too often neglected aspects of state assessment.  The State Education Standard, 1(2), 12-16.

Roeber, E. D. (2000).  How will we gather the data we need to inform policy makers?  Dover, MA:  Measured Progress.

Shepard, L. A. (2000).  The role of assessment in a learning culture.  Educational Researcher, 29(7), 4-14.

Standards for educational and psychological testing (3rd Ed.). (2000).  Washington, DC: American Educational Research Association.

Thurlow, M., Elliott, J., & Ysseldyke, J.  (1998).  Testing students with disabilities:  Practical strategies for complying with district and state requirements.  Thousand Oaks, CA:  Corwin Press, Inc.

Thurlow, M., & Ysseldyke, J.  (1996).  Assessment guidelines that maximize the participation of students with disabilities in large-scale assessments:  Characteristics and considerations.  Minneapolis:  National Center on Educational Outcomes.

The use of tests as part of high-stakes decision-making for students:  A resource guide for educators and policy-makers.  (2000).  Washington, DC:  U.S. Department of Education.

Organizations

Achieve, Inc., web site: www.Achieve.org

American Educational Research Association, web site: www.aera.net

Association of Test Publishers, web site: www.testpublishers.org

CCSSO State Collaborative on Assessment and Student Standards (SCASS), web site: www.ccsso.org

National Center for Research on Evaluation, Standards, and Student Testing (CRESST), web site: www.cse.ucla.edu

Education Commission of the States, web site: www.ecs.org

FairTest (National Center for Fair & Open Testing), web site: www.fairtest.org

Fordham Foundation, web site: www.edexcellence.net

National Center on Educational Outcomes, web site: www.coled.umn.edu/NCEO

National Council on Measurement in Education, web site: www.ncme.org

National Institute on Student Achievement, Curriculum, & Assessment, web site: www.ed.gov/offices/OERI/SAI

RAND Corporation, web site: www.Rand.org/centers/education

Education Week: Assessments, web site: www.edweek.org/context/topics/assess.htm

WestEd, web site: www.WestEd.org

Consortium for Policy Research in Education, web site: www.cpre.org/index_js.htm

 

E-mail Response

To provide comments or additional information, e-mail cepi@vcu.edu.  Please indicate the copyright source and contact information for any new inclusions.


Copyright © CEPI 2000
CEPI grants permission to reproduce this paper for noncommercial purposes if CEPI is credited.
