James McMillan, Editor

Student assessment is at the heart of the current standards
movement in public education. Every state has some kind of
large scale testing program, and for forty-eight states the
results of these assessments are used to monitor progress
toward student attainment of standards, and for evaluating
schools. Thirty-three states have instituted rewards and
sanctions as part of a state testing accountability program,
and 38 states test at all three grade levels (elementary,
middle, secondary). Large scale, high stakes assessments have
become central to both school and student accountability.
For better or worse, student scores on statewide assessments
are used to determine whether students graduate from high
school, are promoted from one grade level to the next, receive
special diplomas, and whether schools are accredited, rewarded,
or even face some form of state takeover. In some states
student assessments are used to determine whether teachers
receive pay raises.
Because the direct consequences of performance on a single
test are very significant, the term high-stakes
is often used to describe these assessments. It is almost as
if standards have become secondary to assessments, since performance
on the tests is what counts. Many describe what is currently
occurring across the country as assessment-driven reform rather
than standards-based reform.
Granted, the ultimate purpose of any school reform movement
is to improve student achievement. But how is it that policy
makers can best conclude that student achievement has improved?
After all, standardized and classroom testing of students
has been done for decades. What is different in the current
reform movement is the construction and administration of
state controlled assessments that are specifically aligned
with high state standards, with important consequences then
attached to how all students perform. This marks a fundamental change
in how student performance is evaluated. What has traditionally
been localized and based primarily on teacher evaluations,
grades, and course-taking, is now centralized and standardized.
School quality is judged on the basis of student performance
rather than on resources, teacher credentials, and other input
or process factors. As a result of this different
emphasis, high stakes assessment has had a profound impact
on education.
Since decisions about students and schools are tied directly
to how students perform on high stakes assessments, it is
important to understand the nature of these tests, what we
have learned from previous high stakes testing programs, what
makes for a high quality large-scale assessment program, and
both positive and negative intended and unintended consequences.
This discussion will focus on the nature of the assessments;
consequences are considered separately as CEPI issue Testing
Consequences.
What is the Nature of High Stakes Statewide Assessment?
An assessment is a procedure for gathering evidence directly
from students and using the evidence to evaluate student performance.
The process of gathering or collecting evidence is separate
from evaluation. In statewide assessment the evidence is
gathered in a standardized manner in which the
testing and scoring procedures used are the same for every
student and school. In this way current high stakes assessments
are much like standardized achievement and aptitude tests.
The format for collecting information can vary. Most assessments
use selected-response type items, such as multiple-choice
(49 of 50 states). These items are objectively scored, and
so are often called objective items. Open-response items,
such as writing, completion, and performance-based assessments
are scored using professional judgment. Professional judgment
is also used in objective tests in the creation of the item
and determination of the correct response. Thus, when assessments
are called objective, they are objective only
in the way the answers are scored (e.g., machine scoreable).
Thirty-eight state assessment systems utilize short-answer
items. Only 7 states use essay or performance-based assessments
(Quality Counts, 2001).
Evaluation involves an interpretation of the information
that has been gathered, in which value judgments are made
about the performance. Evaluation gets at what test scores
mean. In the current reform movement standards provide the
basis for evaluating performance on high stakes assessments.
Obviously professional judgment is central to evaluation.
Who makes the judgments and how they make them are central to
determining the overall results, including outcomes reported
to the public such as pass or accredited.
What level of performance is necessary for pass?
Only professional judgment can answer this question. In contrast
to standardized achievement tests that evaluate student performance
based on how a student compares to a norm group
(norm-referenced testing), most current state high-stakes
assessment evaluations are based on pre-determined levels
of achievement (criterion-referenced testing). In previous
decades most statewide testing was done with standardized,
norm-referenced, off-the-shelf tests developed
by national testing companies. Current high stakes assessments,
in contrast, are implemented by statewide assessment programs.
Given the increased stakes of these assessments, the wide
scope typically covered (many grade levels and subjects),
the demand for quick turn around for reporting scores, and
the political press for immediate positive benefits,
state assessment resources are rarely able to meet these demands.
As a result, most states partner with one of a few national
testing companies to develop the tests and report the scores.
Resources have been stretched, at both the state and national
level, resulting in significant problems. Just three companies
develop these tests (Harcourt Educational Measurement, Riverside,
CTB-McGraw-Hill), and industry executives have said increasing
demands could affect test quality and scoring accuracy. For
example, in 1999, thousands of New York City students were
mistakenly required to attend summer school based on incorrect
scores. In Washington, 500,000 student writing samples had
to be rescored, and Kentucky, Minnesota, and California have
experienced misscoring or loss of student tests. In the past
three years, at least 19 states have reported problems with
test materials or mistakes in scoring. This context is important
because it affects what high stakes assessments are and are
not able to provide in a credible, high quality way that will
be accurate and fair.
High stakes statewide assessments that are aligned with state
content and performance standards can be used for many purposes.
It is important for policy-makers to understand and revisit
as needed what the purposes for high stakes assessments are
so that implementation and use of the results are consistent
with the purposes. Clearly articulated purposes also clarify
to schools and the public what the overall goal is and what
criteria should be used to evaluate the effectiveness of the
system to meet stated goals. Purposes and benefits of high
stakes assessment programs include the following:
-
Assures that standards are taken seriously and motivates
teaching of the standards. Assessments are the carrots
and sticks that force schools and teachers to change,
shaking up parents, school boards, school
administration, and teachers. Parents, in particular,
are motivated to take action to improve schools. Recent
research shows that almost 80% of teachers believe that
today's curriculum, in comparison to 3 years ago,
is somewhat or a lot more demanding
(Quality Counts 2001).
-
Motivates students to learn. When important
consequences are attached, students will be motivated to
learn more given higher standards.
-
Provides the same basis of evaluation for all students.
Grade inflation and different standards used in different
schools can be minimized. Research has shown that some
schools serving low socioeconomic status students have
easier standards for grading students than
schools serving high socioeconomic status students.
-
Provides information that can inform policy-makers
of the quality of education. While a direct link
between student performance and quality of education may
not be warranted, the assessments provide meaningful data
on what is being provided and opportunity to learn.
-
Monitors school improvement efforts. Assessment
results over time provide information about whether targeted
school improvement efforts are working. The assessments
provide indicators to evaluate program effectiveness.
-
Identifies student strengths and weaknesses to target
instruction. Results may be used to group students,
change the curriculum, examine teaching, or emphasize
identified areas.
-
Provides the same high expectations for all students.
Having the same assessments and criteria for passing provides
greater equity between the haves and the have
nots by adopting a philosophy that all students
can learn. All students typically refers
to all socioeconomic status levels, all schools, and may
include students with disabilities and students whose
primary language is not English.
-
Assures better preparation for higher education and
employment. Employers are increasingly demanding more
of new employees, as are colleges and universities that
need to reduce or eliminate remedial courses.
-
Allows recognition of schools and teachers whose students
perform at high levels and/or improve performance. Assessments
provide for meaningful rewards based on student achievement.
-
Holds schools accountable for student performance.
Assessment results provide for public accountability of
schools and teachers for student performance. The assessments
provide a helpful external referent for accountability.
-
Increases the emphasis on student achievement.
Student achievement is one of many factors used in grading
students. High stakes assessments focus attention on
student achievement, lessening the influence of effort,
participation, and other non-achievement factors typically
used in grading students.
-
Enhances professional development of teachers and
administrators. The assessments provide a basis for
professional development. Teachers have benefited greatly
when involved in the development, scoring, and standard-setting
processes, especially for performance-based assessments.
While high stakes assessment is used extensively, it is not
without its limitations and detractors. Much of the criticism
is concerned with the consequences of such testing on teachers,
curriculum and students. These criticisms are briefly listed
here, along with other limitations, to provide a sense of
what arguments against high stakes testing are based on.
-
Too much too soon. Many maintain that the tests
are developed and implemented too quickly, resulting in
poor questions and unreliable scoring. Assessment expertise
is thin, particularly at the state level, which may lead
to an over reliance on outside experts.
-
Objective assessments measure too much simple knowledge.
Objectively scored tests, such as the multiple-choice
tests used extensively in Virginia, are best used to measure
student knowledge and simple skills. These kinds of tests
are not the best way to measure deep understanding, complex
skills, reasoning, problem-solving, and critical thinking.
-
New assessments don't tell us much that is new.
What new information is gained from high stakes testing?
What more do we know about our students and schools?
Some maintain that high stakes tests don't tell us very
much that we don't already know from existing sources
of information (e.g., standardized achievement and aptitude
test scores).
-
What you test and how you test it is what you get;
what is not tested you do not get. A potentially
serious limitation with high stakes tests is not what
they do, it is what they do not do. High stakes assessments
inform us about only some of what is important in public
education, namely student proficiency on what is tested.
The tests are limited by what content is covered and by
the nature of the assessments. There is much that is
not covered by the tests that is also very important to
knowing whether students have had a high quality education
and have developed needed skills.
-
Too much emphasis on a single test score. Failure
to achieve a passing score on high stakes assessments
can be life-changing. Many organizations, including national
testing companies and national testing associations, maintain
that such critical decisions as being unable to graduate
should never be made on the basis of a single test score.
To do so uses the score in a way that is not appropriate
given the limitations of what a single test can tell us.
-
Do higher test scores reflect true changes in student
achievement? Many contend that increases in test
scores are mostly because students have become more sophisticated
test takers (more test-wise) and because schools institute
test review practices that raise scores without really
raising achievement. Research tends to support the notion
of test score inflation (Shepard, 2000).
-
Statewide assessment leads to a more narrow statewide
curriculum. Because of high stakes, schools will
narrow the curriculum to emphasize what is on the test.
The common test for all schools drives a common curriculum,
resulting in less local autonomy and less curriculum that
is tailored to local conditions, traditions, and context.
-
The tests are unfair to low socioeconomic students
and schools. We know that scores on standardized
tests are based primarily on three influences: 1) student
ability or aptitude, 2) what is learned outside of
school, and 3) what is learned in school. Because low
socioeconomic students tend to have lower abilities and
live in environments that may not be conducive to academic
learning, they are at a significant disadvantage. Further,
teachers and principals in low socioeconomic schools that
fail are penalized for factors that are outside
their control. …
-
High stakes assessments are designed to make schools
look bad. Some maintain that one purpose of high
stakes assessments, particularly those with very high
performance standards, is to make public schools look
like they are failing or are inadequate. Some maintain
further that the motivation for this is to enact policy
to allow for vouchers to private schools.
-
High stakes assessments increase the centralized education
bureaucracy. States have learned that developing
and operating a high stakes testing program requires considerable
resources, leading to a larger education bureaucracy.
-
High stakes assessments drain resources from other
programs. Because assessments have high stakes it
is essential to provide whatever resources are needed
to assure needed quality. Huge sums of money are needed
to develop, administer, and score the tests. Much of
that money goes to a few national testing companies.
That is money that could go directly to school divisions
to improve instruction.
-
High stakes assessments encourage motivation based
on extrinsic rewards. Motivation is enhanced when
students are involved, when they want to learn to understand
better, when they can see a purpose to what they are learning.
This is referred to as intrinsic motivation.
External, high stakes tests motivate students solely on
the outcome. There is greater emphasis on extrinsic motivation.
-
High stakes assessments result in too much time preparing
students to take the test. The pressure to do well
on the tests results in inordinate amounts of time devoted
to practice tests and test-taking skills. Time is diverted
from more important learning.
-
High stakes assessments don't provide information
that can improve instruction. By their very nature,
high stakes statewide assessments cover a lot of content
and provide only a few scores. Unlike some criterion-referenced
tests, they do not provide specific information that teachers
can use to change instruction. In the Virginia tests,
even the reporting category scores give only
a general sense of student strengths and weaknesses.
-
Results of high stakes assessments are used to judge
teachers and administrators. Even though current
tests are not designed to be used to evaluate teachers
and administrators, the scores are often used in this
way. The tests are designed to measure student competence, not
teacher or administrator competence. Like schools, teachers and
administrators have limited influence over important factors
that determine the scores, such as ability, effort, and
what students learn outside of school.
-
High stakes assessment results depend on student motivation.
It is critical that students try their hardest to do well
on statewide assessments. If students are not motivated
to perform, low scores may suggest, in error, that appropriate
instruction has not been provided.
While it is clear that there are champions and detractors
of high-stakes testing, one group with relatively little political
interest is the American Educational Research Association.
This association, which includes educational measurement and
research experts and seeks to promote educational policy that
scientific research has shown to be beneficial, has recently
issued a statement about high-stakes testing (AERA, 2000).
The organization maintains that while high-stakes testing
can improve education, there is the potential for serious
harm if either inadequate resources are invested or if technical
requirements are insufficient for intended uses. Under these
conditions, it is likely that policymakers will be misled
by test results. AERA recommends that high-stakes testing
programs meet the following conditions:
- Protection against high-stakes decisions based on a single
test. According to one expert, there is a definite
limit to the amount of information that once-a-year assessments
of limited duration...can provide (Herman, 2001). Also
see Gordon, 2000, and recent statements from the American
Association of School Administrators, which rejects accountability
systems that rely only on testing.
- Adequate resources and opportunity to learn.
- Validation for each separate intended use.
- Full disclosure of likely negative consequences.
- Alignment between the test and the curriculum.
- Validity of passing scores and achievement levels.
- Opportunities for meaningful remediation for examinees
who fail tests.
- Appropriate attention to language differences among examinees.
- Appropriate attention to students with disabilities.
- Careful adherence to explicit rules for determining which
students are to be tested.
- Sufficient reliability for each intended use.
- Ongoing evaluation of intended and unintended effects.

- In April 2001, a study funded by The Business Round Table
reported higher student achievement and decreasing racial
gaps without raising dropout levels in Texas.
- In May 2001, more than half of Scarsdale, NY, eighth graders
boycotted state tests.
- High-stakes statewide assessment is occurring in 48 states.
- A survey of Virginia registered voters conducted in August,
2000, found that 51 percent of the respondents said the
SOL testing program is not working; 34 percent
said it was working (Washington Post). Strongest opposition
was in Southwest Virginia.
- A 2000 poll of 621 Minnesotans found that 48% thought
the emphasis on statewide testing was a good thing for the
state.
- States with the most experience with high stakes testing
are Texas, California, Kentucky, and Maryland. Maryland
is noted as a system that uses different types of items,
including both multiple-choice and performance-based assessment.
- Rocked by two straight years of student failure on statewide
math tests, Arizona education officials have deemed the tests too
difficult and have delayed high-stakes consequences.
- Each Virginia SOL test contains approximately 50 multiple-choice
items, except for a writing test in grades 5 and 8.
- Virginia SOL tests are given in grades 3, 5, 8, and in
certain courses in high school in mathematics, science,
English, and history and the social sciences. SOL technology
tests are given in grades 5 and 8.
- In March 2001, North Carolina officials announced that
writing tests would not, as initially planned, be factored
into school ratings for the next three years.
- The first full administration of the SOL tests was completed
in spring 1998.
- In October 1998, the Virginia Board of Education adopted
two passing scores for 27 SOL tests (pass/proficient and
pass/advanced).
- Significant resources are needed to develop, administer,
and score high stakes statewide assessments.
- In 1999, Wisconsin repealed its high-stakes test mandate.
- Even though 86% of Texas 10th graders and only 55% of
Massachusetts 10th graders passed their state tests, Massachusetts
students far outpaced Texas students on NAEP.
- Only 15 states test students across consecutive grade
levels as proposed by President Bush.

Considerable variation exists in high stakes statewide testing
programs. The trend nationwide has been to shift from norm-referenced
to criterion-referenced testing, from basic or minimum knowledge
and skills to higher level understanding and thinking processes,
and, for many states, to some type of performance-based assessment
in which students are required to formulate an original response
to a question and communicate an answer through some kind
of constructed act. Currently, most states have systems that
provide a balance between objectively scored and performance-based
assessments.
High stakes testing will be most effective if adequate attention
has been given to three technical qualities associated with
testing: validity, reliability, and fairness (The Use of
Tests as Part of High-Stakes Decision-Making for Students:
A Resource Guide for Educators and Policy-Makers, 2000).
Validity
Validity is a professional judgment about the appropriateness
of inferences, uses, and consequences that result from the
assessment. Validity is concerned with the soundness, trustworthiness,
or legitimacy of the claims or inferences that are made from
the scores. Often the phrase the validity of the test
is used, when more accurately it is the validity of
the interpretation, use, or inference. That is, it
is the inference or use that is valid or invalid, not the
test. Thus, the same test scores can be used validly or invalidly.
For example, scores from high stakes statewide assessments
may be valid as a general indicator of student knowledge,
but would be invalid as a measure of teacher effectiveness.
Different kinds of evidence are used to make judgments about
validity. The most common type of evidence in assessment-driven
reform is related to the content covered in the test. If
the content of the test is representative of content contained
in a larger domain, then inferences about student knowledge
of the larger domain, based on the test results, will be reasonable.
Achievement tests always sample student knowledge.
While it is sometimes difficult to comprehend, it is possible
through appropriate sampling to make inferences about a larger
domain on the basis of what seems like a relatively short
test (e.g., 50 items that cover many months of content).
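As a rough illustration of why a short test can support inferences about a larger domain, the sketch below computes the sampling error of a percent-correct score. It assumes the items are a simple random sample from the domain, which real test blueprints only approximate, and the student and the numbers are hypothetical.

```python
from math import sqrt


def sampling_standard_error(proportion_known, n_items):
    """Standard error of a percent-correct score, assuming items are a
    simple random sample from the content domain (an idealization)."""
    return sqrt(proportion_known * (1 - proportion_known) / n_items)


# Hypothetical student who knows 70% of the domain, tested with 50 items.
se = sampling_standard_error(0.70, 50)
print(f"Score on a 50-item test: about 70%, give or take {se * 100:.1f} points")
```

Under these assumptions, adding items narrows the margin, but with diminishing returns, since the error shrinks only with the square root of the number of items.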
Another kind of evidence is gathered when the scores from
an assessment are correlated to scores from other tests or
to other criteria. If the scores are positively related,
then there is greater confidence about what is being measured.
For example, one would expect that the best performing students
in math would obtain the highest scores on the math assessment,
while those who perform poorly would obtain low scores.
This kind of logic is important as a common sense
kind of evaluation.
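The correlation evidence described above can be sketched with a small calculation. The scores and grades below are made-up values for six hypothetical students, and the Pearson coefficient is only one of several statistics an analyst might choose.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical data: state math assessment scores and math course grades.
state_scores = [380, 410, 425, 450, 470, 500]
course_grades = [68, 74, 71, 82, 88, 93]

r = pearson_r(state_scores, course_grades)
print(f"Correlation between assessment scores and course grades: {r:.2f}")
```

A coefficient near 1 would support the claim that the assessment measures something close to what course grades measure. A coefficient near 0 would call that interpretation into question.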
What is interesting from a validity standpoint in the current
standards/assessment-driven reform movement is that educators
and test specialists tend to have more conservative views
about what test scores tell us than do policy-makers. To
the extent that test specialists, testing associations such
as the National Council on Measurement in Education, and school
teachers and administrators maintain that there are limits
to what standardized tests can be used for, in contrast to
policy-makers, it is inevitable that there are conflicts and
uneasy alliances.
Reliability
Every test score that is reported has some degree of error
in it. We simply do not have perfect measures of student
performance. The sources of error include characteristics
of students, such as their mood, health, or level of confidence
on a particular day; luck in guessing at answers; testing
conditions, such as extreme heat or cold and distractions; and
test characteristics, such as poorly worded items. The degree
to which scores are free of error is technically referred to
as reliability. Test
scores with little error are highly reliable; scores with
a great amount of error are unreliable. While there are several
types of evidence that can be used to estimate the degree
of error, resulting in reliability coefficients, the important
point is that consideration of error should be part of any
decision made on the basis of the scores received.
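One common way to bring error into a decision is the standard error of measurement, which classical test theory computes as the score standard deviation times the square root of one minus the reliability coefficient. The sketch below applies that formula. The score scale, standard deviation, reliability, and cut score are hypothetical and not taken from any state program.

```python
from math import sqrt


def standard_error_of_measurement(score_sd, reliability):
    """Classical test theory: SEM = SD * sqrt(1 - reliability)."""
    return score_sd * sqrt(1 - reliability)


def score_band(observed, score_sd, reliability, z=1.0):
    """Band of plus or minus z SEMs around an observed score."""
    sem = standard_error_of_measurement(score_sd, reliability)
    return observed - z * sem, observed + z * sem


# Hypothetical values: scale SD of 50, reliability of 0.90, cut score of 400.
low, high = score_band(observed=395, score_sd=50, reliability=0.90)
print(f"Observed 395; plausible range roughly {low:.0f} to {high:.0f}")
```

In this made-up case the band around a failing score of 395 includes the cut score of 400, which is the kind of situation in which a decision based on a single score is hardest to defend.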
When scores are used to make dichotomous decisions, such
as pass/fail, it is important to know the accuracy of making
the judgment. That is, some students who in reality do not
know enough to pass will obtain a passing score because of
positive error (e.g., good luck in guessing),
resulting in misclassification. Similarly, some students
who do not receive a high enough score to pass actually have
the knowledge and/or skills required. They are also misclassified.
From a policy standpoint, it is critical to know what percentage
of students is misclassified.
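The percentage of misclassified students can be approximated by simulation. The sketch below is a simplified model, not any state's actual procedure: it assumes each observed score is a true score plus independent, normally distributed error, and every number in it is hypothetical.

```python
import random


def misclassification_rates(n_students, cut_score, true_mean, true_sd,
                            error_sd, seed=0):
    """Simulate observed = true + error and return the shares of students
    who pass without the required knowledge (false pass) and who fail
    despite having it (false fail)."""
    rng = random.Random(seed)  # fixed seed so results are repeatable
    false_pass = 0
    false_fail = 0
    for _ in range(n_students):
        true_score = rng.gauss(true_mean, true_sd)
        observed = true_score + rng.gauss(0, error_sd)
        if true_score < cut_score <= observed:
            false_pass += 1
        elif observed < cut_score <= true_score:
            false_fail += 1
    return false_pass / n_students, false_fail / n_students


# Hypothetical program: cut score 400, true scores ~ N(420, 50), error SD 20.
fp, ff = misclassification_rates(10_000, 400, 420, 50, 20)
print(f"False passes: {fp:.1%}, false failures: {ff:.1%}")
```

Setting error_sd to 0 drives both rates to zero, while larger error, or a cut score placed where many students cluster, raises them.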
Fairness
Given the high stakes that characterize SOL testing, it is
essential that the assessments are fair, nondiscriminatory,
and unbiased toward any particular group of examinees. That is,
a fair assessment is one that provides all students an equal
opportunity to demonstrate achievement. Six aspects of fairness
need to be considered:
- It is important for the content of the assessments to
be public, as well as any criteria that are used to score
student constructed responses (such as what Virginia does
with the writing test). Students as well as teachers and
administrators should know what will be tested and how answers
will be scored. This is made possible through clear test
specifications and blueprints, and through release of sample
items (Virginia's system publishes test blueprints
and sample items).
- Fair tests are ones that assess knowledge and skills students
have had ample opportunity to learn. Instruction that covers
what is tested should be clearly and systematically documented.
- Students should only be tested on things that require
prerequisite knowledge or skills that they possess. This
means that needed prerequisites should be clarified and
documented. This includes test-taking skills.
- Test questions and content should avoid stereotypes.
- Bias in assessment tasks and procedures should be avoided.
- By law, high stakes assessments must be designed to accommodate
the special abilities of exceptional children. If performance
is influenced by a specific disability, the assessment must
be modified so that the disabling trait is not a factor
in the performance.
Additional characteristics often found in high quality assessment
systems include the following (V indicates a characteristic
of Virginia's program):
- Using scores that make sense.
- Using both constructed-response and selected-response
test items.
- Minimizing the time between taking the tests and receiving
scores. (V)
- Allowing adequate retesting. (V)
- Standardizing administration procedures. (V)
- Maintaining adequate test security. (V)
- Providing adequate and timely technical information on
the tests.
- Clearly articulating appropriate and inappropriate uses
of the scores.
- Conducting research on the intended and unintended consequences
of the assessment system.

There are a number of important issues related to high stakes
testing. One is the cost of developing, administering, and
reporting test results, and needed technical support. Machine
scoreable tests cost between $5 and $8 per student. Policy
decisions to make test forms available to teachers and the
public will increase cost considerably because of the need
for new tests. Decisions to include constructed-response
items will likewise increase costs. Assessments that mix
short answers with objective items cost two to three times
more than machine-scoreable tests. Hands-on performance-based
assessments can cost between $30 and $70 per student. In
addition, there are operational costs to the Department of
Education as well as costs incurred by local school divisions.
To date, there has been no estimate of the total cost of current
high-stakes testing in Virginia. In Maryland, it costs about
$30 per test per student.
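The per-student figures above can be turned into a rough budget range. The sketch below uses only the ranges cited in this section, treats the mixed-format range as two to three times the machine-scoreable range, and uses a hypothetical enrollment figure. It leaves out the state and local operational costs mentioned above.

```python
# Per-student cost ranges in dollars, taken from the figures cited above.
COST_PER_STUDENT = {
    "machine_scored": (5, 8),
    "mixed": (2 * 5, 3 * 8),   # two to three times machine-scoreable
    "performance": (30, 70),
}


def total_cost_range(test_format, n_students):
    """Return (low, high) total scoring cost for one test administration."""
    low, high = COST_PER_STUDENT[test_format]
    return low * n_students, high * n_students


# Hypothetical state testing 100,000 students once in one subject.
for test_format in COST_PER_STUDENT:
    low, high = total_cost_range(test_format, 100_000)
    print(f"{test_format}: ${low:,} to ${high:,}")
```

Multiplying these totals by the number of subjects and grade levels tested gives a first-order estimate of how quickly format decisions compound.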
Policymakers can anticipate legal challenges to the technical
quality of high stakes assessments. Ensuring strong technical
quality involves careful oversight of testing companies developing
the tests, an adequately trained and supported state technical
staff, and an independent, external group of testing experts
that can provide recommendations upon review of current testing
practices.
In Virginia, options have been provided to use substitute
standardized tests in place of current end-of-course tests
in high school (e.g., advanced placement and international
baccalaureate tests). While providing some choice of test
may address concerns of unnecessary duplication, it is very
important to conduct research that links the tests so that
cut scores are comparable.
Consideration has also been given to greater use of technology
in the Virginia testing program, so that students could take
tests online and schools could have immediate access to results.
When high stakes are involved, there needs to be equal access
to technology. There is also a need to conduct research to
be sure that the technology itself is not impacting how well
or poorly students perform.
The 1997 reauthorization of the Individuals with Disabilities
Education Act (IDEA) requires that students with disabilities
participate in large-scale assessments, with accommodations
where needed. Generally, these accommodations match what is deemed
appropriate in a student's Individualized Education Program (IEP). As
a result, there is a need to anticipate the development of
policy to allow for special testing conditions and accommodations,
such as different test formats, and for establishing cost
estimates for such accommodations. (See Mastergeorge & Miyoshi,
1999; Thurlow & Ysseldyke, 1996; and Thurlow, Elliott, & Ysseldyke,
1998, for comprehensive reviews of testing accommodations
for students with disabilities.)

Because the SOL assessment system has significant consequences
for students and impacts the nature of instruction, it is
important for policy makers to consider seriously both limitations
and strengths to fashion a system that will meet high
student and school performance goals. It is critical for
policy makers to make sure that technical qualities of validity,
reliability, and fairness are met. Factors that affect these
standards and require monitoring include:
- Adequate technical resources.
- Monitoring of test development, scoring, and reporting.
- Systematic collection of data on opportunity to learn,
on the degree to which students are motivated to do their
best, on the effect of technology on test taking efficiency,
on testing exceptional students, and on consequences of
the assessment program on instruction.
- Monitoring of quality of reliability and validity data
for each intended use of the results.
- Use of constructed-response assessments that align well
with certain standards if feasible. Currently 34 states
include performance questions in their assessments. While
multiple-choice items are best for simple knowledge and
skills, more advanced knowledge and skills, which are prevalent
in many of the SOL, are better assessed through constructed-response
items.
Like most other states, Virginia has made a strong commitment
to using high-stakes testing. While technical standards of
quality, such as reliability and validity, are essential and
must be met, the overall impact of high-stakes testing needs
to be evaluated. Doing so will require apolitical, reasoned
research and careful thinking that yield sound data for
decisions that advance the goals of public education.

Print and Internet Resources
AERA position statement concerning high-stakes testing in
prek-12 education. (2000).
www.aera.net/about/policy/stakes.htm
Cizek, G. J. (1998). Filling in the blanks: Putting standardized
tests to the test. Fordham Report, 2(7). (Web site:
www.edexcellence.net/library/cizek.pdf)
Claycomb, C., & Kysilko, D. (2000). The purposes & elements
of effective assessment systems. The State Education Standard,
1(2), 7-11.
Gordon, B. M. (2000). On high stakes testing. AERA Division
G News, Fall.
Herman, J. L. (2001). Accountability bottom up. The CRESST
Line (Winter), 1-2, 8.
Heubert, J. P., & Hauser, R. M. (Eds.) (1999). High stakes
testing for tracking, promotion, and graduation. Washington,
DC: National Academy Press.
High stakes testing: Too much? Too soon? (2000). State
Education Leader, 18(1).
Kahl, S. (2000). Stakes, mistakes, & statewide testing.
The State Education Standard, 1(2), 18-21.
Kifer, E. (2001). Large-scale assessment: Dimensions, dilemmas,
and policy. Thousand Oaks, CA: Corwin Press, Inc.
Klein, S. P., & Hamilton, L. (1999). Large-scale testing:
Current practices and new directions. Santa Monica, CA:
RAND Education.
Linn, R. L., & Herman, J. L. (1997). A policymaker's
guide to standards-led assessment. Denver, CO: Education
Commission of the States.
Mastergeorge, A. M., & Miyoshi, J. (1999). Accommodations
for students with disabilities: A teacher's guide. Los
Angeles: National Center for Research on Evaluation, Standards,
and Student Testing.
Neill, M. (2000). State exams flunk test of quality. The
State Education Standard, 1(2), 31-35.
Popham, W. J. (1999). Why standardized tests don't measure
educational quality. Educational Leadership, 8-15.
Quality counts '99: Rewarding results, punishing failure.
(1999). Education Week, 18(17). (www.edweek.org/sreports/qc99)
Quality counts 2001: A better balance: Standards, tests,
and the tools to succeed. (2001). Education Week, 20(17).
Reed, S. (2000). The too often neglected aspects of state
assessment. The State Education Standard, 1(2), 12-16.
Roeber, E. D. (2000). How will we gather the data we need
to inform policy makers? Dover, NH: Measured Progress.
Shepard, L. A. (2000). The role of assessment in a learning
culture. Educational Researcher, 29(7), 4-14.
Standards for educational and psychological testing (3rd
Ed.). (2000). Washington, DC: American Educational Research
Association.
Thurlow, M., Elliott, J., & Ysseldyke, R. (1998). Testing
students with disabilities: Practical strategies for complying
with district and state requirements. Thousand Oaks,
CA: Corwin Press, Inc.
Thurlow, M., & Ysseldyke, R. (1996). Assessment guidelines
that maximize the participation of students with disabilities
in large-scale assessments: Characteristics and considerations.
Minneapolis: National Center on Educational Outcomes.
The use of tests as part of high-stakes decision-making
for students: A resource guide for educators and policy-makers.
(2000). Washington, DC: U.S. Department of Education.
Organizations
Achieve, Inc., web site: www.Achieve.org
American Educational Research Association, web site: www.aera.net
Association of Test Publishers, web site: www.testpublishers.org
CCSSO State Collaborative on Assessment and Student Standards
(SCASS), web site: www.ccsso.org
National Center for Research on Evaluation, Standards, and
Student Testing (CRESST), web site: www.cse.ucla.edu
Education Commission of the States, web site: www.ecs.org
FairTest (National Center for Fair & Open Testing), web
site: www.fairtest.org
Fordham Foundation, web site: www.edexcellence.net
National Center on Educational Outcomes, web site: www.coled.umn.edu/NCEO
National Council on Measurement in Education, web site: www.ncme.org
National Institute on Student Achievement, Curriculum, &
Assessment, web site: www.ed.gov/offices/OERI/SAI
RAND Corporation, web site: www.Rand.org/centers/education
Education Week: Assessments, web site: www.edweek.org/context/topics/assess.htm
WestEd, web site: www.WestEd.org
Consortium for Policy Research in Education, web site: www.cpre.org/index_js.htm

Copyright © CEPI 2000
CEPI grants permission to reproduce this paper for noncommercial purposes if
CEPI is credited.
|