Tuesday, March 24, 2009

Test Scoring Scam Again in Texas TAKS

Thanks to Donna Garner for posting this on the EducationLoop Yahoo group. She
reveals that Texas scorers were not allowed to give out any "A" level
scores, no matter how good the papers were.

This was the same problem with the North Carolina writing project that I
scored. They would move definitions up and down day by day depending
on whether they had too many or too few scores. High score was a 4,
but you had to have permission to give out a 4. Published figures on
their website after the fact showed that fewer than 1% were given a 4.
Top 1% is good enough to get into Harvard or Yale, but not good enough
for an "A" on this test. If you said "integrity is a good thing" and
produced a New York Times article that re-worded a press release, but
said nothing new, that was a 1. If you said that "Martin Luther King
had integrity", that was a 2. If you gave a real person like GW Bush, a
person from the novel "Night", and a story about your mother, that was
a 3. If you wrote a Wall Street Journal op-ed piece, that MIGHT be
good enough for a 4, and in this TAKS example, would not be good
enough if no high scores were permitted.

Contrast that to my son's 10th grade writing WASL - 20% got a PERFECT
score, which would mean perfect convention and perfect content. This
also correlates to California CLAS where there were ZERO "4" math
scores statewide, and the woman in charge told Washington state
legislators that this was deliberate to "give room for improvement".
The entire process is designed to produce low scores at the start and
inflate them to produce high pass rates at the end. Washington

Republicans in my state are still blaming the killing of the WASL on liberals
who don't have standards. We have to keep on message that "standards"
is just another word for "outcomes" and that "outcomes based education"
was and still is a disaster. It's also how businesses are being run
when you have to rank people and pay them by "performance-based
evaluation" and produce "continuous improvement" or else.

Posted by: "Donna Garner"
Mon Mar 23, 2009 5:50 pm (PDT)
"TAKS Scoring by Peer Pressure"
by Donna Garner

March 24, 2009

From time to time, I receive anonymous e-mails from people who read my
reports that are posted on the Internet. Last week I received one such
e-mail. This person said he had seen my 4.15.08 article entitled "An
Exposé of the TAKS Tests." (Please see my three attached reports on
Texas' state-mandated TAKS tests.) He said my concerns echoed his own
personal concerns about TAKS scoring because he had worked as an
experienced scorer for Pearson, the company that has the contract with
the Texas Education Agency (TEA) to develop and score students' TAKS
tests. My anonymous source (i.e., "John Doe") confirms my long-held
concerns that having subjectively scored sections on high-stakes tests
is an open invitation for manipulation of scores.

John had been an educator for many years and had decided to work as a
scorer for Pearson on the English / Language Arts (ELA) TAKS scoring
project. He became very uncomfortable about the harsh and inconsistent
manner in which the scoring was done and finally quit the job, as did
many others. John said the scorers were forced to give low scores to
students who demonstrated exemplary writing skills but higher scores
to those students who were less deserving.

He said, "There was what I call an 'unspoken no 3 rule' on the
expository portion of a reading comprehension question [open-ended
response questions]. By unspoken, I mean that we weren't explicitly
told in so many words not to give a 3, but that we should obtain the
express approval of our supervisor before so doing. Whenever a scorer
would request permission to give a 3 on a particular paper, the
supervisors would not give their consent. In due course, many scorers
began to stop giving 3s altogether. I failed to see the logic in
this." [On the ELA-TAKS, Grade 11, Spring 2008 administration, 0% of
students in Texas made 3's on the open-response questions.]

John went on to say that an adaptation of a Reader's Digest article
instructed the students to explain why they thought a particular
person was a hero or was not a hero. John said that the prompt along
with scoring materials contained major problems which should have been
resolved prior to reaching the scorers' desks. "The anchor papers
(i.e., papers used as examples) and rubric all contained several
errors and inconsistencies. In some cases, the annotations under the
student-written portion were not illustrative or supportive of the
examples given. Some of the examples provided did not even reflect the
goals stated in our manual."

John told me that during the training session for the job, various
people questioned the Texas representatives about the problems with
the anchor papers. The scorers were told to ignore the problems and
score them anyway. One of these Texas Education Agency representatives
was Victoria Young. "We were told not to rely upon our anchor papers
but to use the rubric more. The problem was that the language used in
the rubric was very general and over-broad. This created too many
loopholes and led scorers to drift, either giving too many high scores
or too many low scores. This undermined the integrity of the scoring
altogether. Anchor papers were far more precise than the rubric and
were essential to accurate scoring, especially on the open-ended

John explained that many of the scorers expressed their concerns to
Pearson management about the inequities, but the managers simply
shifted the blame back to Texas and said they were powerless to do
anything about it. If the scorers did not "go along to get along" at
Pearson, they were considered to be renegades and were treated with
group disapproval. The scorers were heavily criticized if their scores
were too high or too low; and when they tried to explain their
rationale for giving the scores they did, they were treated like
"naughty children who refused to obey."

In addition, a spreadsheet was circulated around the scoring room
periodically so that John and the other scorers could compare their
scores with the rest of the scorers (e.g., to see if they were giving
too many 1's or 2's). If so, they were told they had to change; but
they did not know how to begin because no specific work samples were
provided for reference. In fact, John said the Pearson scoring system
was not based on accuracy but on general agreement and consensus of
opinions. Only about 20% of the papers scored were backread by
supervisors and/or directors. "It's basically a majority-rules system,
where conformity with your fellow scorers is of the essence, and
quality control plays little if any part at all."

One of the most disturbing statements made by John is that last year,
Pearson began a policy whereby bonuses were given to scorers based on
several criteria such as high validity scores, productivity levels
(number of papers scored per hour), and other subjective factors
determined by the supervisor and scoring director. John explained that
validity is one of the criteria used to monitor and gauge a scorer's
consistency in performance and/or application of the rules, but the
problem was there was no way to measure accuracy. He said it was
feasible that a scorer could be consistently wrong but still wind up
with a bonus. If a scorer made a mistake at the beginning of the
project and tried to correct it later on, he would receive a lower
validity score that affected his bonus. One supervisor actually told
one of the scorers that if he wanted a bonus, he should keep on
repeating what he had done at the beginning even if it was wrong!
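
The gap John describes between consistency and accuracy can be shown with a small sketch. Everything in it is an assumption made for illustration: the function names, the 0-3 scale, and the scores are invented, and nothing here reflects how Pearson actually computed validity. It only shows that a scorer who repeats the same mistake can look perfectly consistent, while a scorer who corrects the mistake looks less consistent.

```python
# Hypothetical illustration only: these names and numbers are invented
# and do not come from Pearson or the TEA.

def consistency(first_pass, second_pass):
    """Share of papers that received the same score on both passes."""
    same = sum(a == b for a, b in zip(first_pass, second_pass))
    return same / len(first_pass)


def accuracy(given, deserved):
    """Share of papers whose score matches the score the paper deserved."""
    right = sum(g == d for g, d in zip(given, deserved))
    return right / len(given)


# Made-up "deserved" scores for six papers on an assumed 0-3 scale.
deserved = [3, 2, 3, 1, 2, 3]

# Scorer A never gives a 3 and repeats that mistake on both passes.
scorer_a_first = [2, 2, 2, 1, 2, 2]
scorer_a_second = [2, 2, 2, 1, 2, 2]

# Scorer B starts with the same mistake, then corrects it.
scorer_b_first = [2, 2, 2, 1, 2, 2]
scorer_b_second = [3, 2, 3, 1, 2, 3]

print("A consistency:", consistency(scorer_a_first, scorer_a_second))
print("A accuracy:   ", accuracy(scorer_a_second, deserved))
print("B consistency:", consistency(scorer_b_first, scorer_b_second))
print("B accuracy:   ", accuracy(scorer_b_second, deserved))
```

In this made-up case Scorer A is fully consistent but right only half the time, while Scorer B ends up fully accurate but only half consistent. A bonus tied to consistency alone would favor Scorer A.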

I continue to say that subjectively assessed sections (including
portfolio assessments, open-response questions, and essays) on
high-stakes tests are ludicrous. People's opinions, peer pressure, and
manipulation take over; and the final scores then become meaningless
as measures of student academic performance. At least 80% of any
high-stakes test should be based upon objectively tested,
right-or-wrong answers. This is only common sense.

Donna Garner wgarner1 at hot.rr.com

Writer/Consultant for MyStudyHall.com

English Success Standards (K-12)




Anonymous said...

A while back, during the scoring of 3rd grade WASL math, we came across a reading comprehension question that was somehow accidentally inserted in the students' math booklet before it went to print. The prompt required the students to think of some questions a librarian would be likely to ask 3rd graders while ordering books for them. Many third graders were confused and tried to turn it into a math question like, "If a book costs $7.50 and the librarian needs to order 20 of them, how much money would she need to place the order?" Anyone who answered this way was marked wrong. Others simply stated that they didn't think the question had anything to do with math. Although they were correct, these responses were also given a 0. The desired answers were along the lines of: "What kind of books do you like?" "Who is your favorite author?" "What is your favorite genre?" "Do you like Dr. Seuss?" etc.

The fact that the question was in the wrong place and therefore very misleading was never taken into consideration by the scoring facility. In all fairness to the students, this question shouldn't have been counted at all, yet it was.

There was constant "tweaking" of the rules during the scoring of the WASL. The rules we learned during qualifying and training were repeatedly modified. Some were dropped altogether. Our scoring at the end of the project was entirely different from the way it began.

Scoring is certainly not rocket science; so much weight should *not* be given to any one test.

BlArthurHu said...

The public still has no appreciation for how bad a scam this was, to justify a huge injection of cash, time and effort for WORSE THAN NOTHING. The very idea that "all will succeed" is wrongheaded. The purpose of education is not to ensure success; it's to ensure everybody gets an education. You cannot guarantee success in a democracy, and it's a lie in a socialist society. You can test and increase spending until the cows come home, and you will not achieve "basic education" as the legislature is STILL trying to push through, and the people are still clueless enough to not notice.

Anonymous said...

I myself am also surprised at the lack of public outcry on this topic.

What's worse is that even though we in America test our school children far more than they do in other countries, America still lags behind the rest of the world in math and science. We have become far too complacent (and perhaps even a little too trusting) as a nation and must demand better from our elected state and local officials when it comes to school funding and administration.

I would have to agree that increased spending alone isn't the answer. These grossly inflated multi-million dollar contracts between the states and the for-profit-only national testing companies are a big joke and do nothing more than breed corruption starting at the top levels. That's where all the "backscratching" goes on, then it filters downwards from there.

We're putting far too much money into standardized testing and not enough into basic classroom resources, such as textbooks, computers, and other teaching tools. This expensive, elaborate testing only serves to ignore our most fundamental problem, and that is with the system itself.

RJ Hewitt