VALUE-ADDED TESTING IN TENNESSEE

Series of stories in the Memphis Commercial-Appeal
November 29-30, 1998

Measuring Schools: Value-Added Puts Tennessee On Map
Which Is Your School?
The Premise of Value-Added Assessment
UT Statistics Professor Didn't Count On This Reception
Accountability? For Schools, Yes; For Teachers, No
Editorial: School Value Accountability Tool Deserves Full, Fair Use


Measuring Schools: Value-Added Puts Tennessee On Map


By Mickie Anderson

One of the biggest movements in education is happening in Tennessee's public schools.

Value-added assessment, the brainchild of a University of Tennessee, Knoxville, professor, has churned up tons of data about school performance - enough to confirm a few common-sense theories about education, as well as debunk a few.

Value-added assessment is based on a complicated statistical theory, but its purpose is simple - to measure how much the students in a particular school or even a particular classroom improve over the course of a year compared with their peers.

It's viewed as a performance measure of teachers and schools as much as of students. And Tennessee is the only state doing it.

Dallas schools use a similar system, but Tennessee's, created by UT professor Bill Sanders, is considered more advanced.

"A lot of people around the country are taking notice of Sanders's research," said Mike Petrilli of the Fordham Foundation, an education reform group.

Not everyone embraces value-added assessment, though.

The state's largest teachers association has doubts. Some teachers and principals prefer less complicated assessment methods. And Tennessee Education Commissioner Jane Walters has been vocal in her stance against relying too much on test-based systems like Sanders's for analyzing the performance of schools or teachers.

Nor are value-added scores compiled for every Tennessee classroom or teacher. So far, they exist only for fourth- through eighth-grade teachers and secondary math teachers.

Despite the skepticism and limitations, some educators have dived into the tricky value-added assessment method and found weak spots in their schools that needed tweaking.

Snowden principal Catherine Battle, for example, said her school has used value-added assessment to target academic areas that need emphasis the next year.

"We don't use it as 'Oh, you're a crummy teacher,'" she says. "But let's say I've noticed a new third-, fourth- or fifth-grade teacher, and the class is on par in all areas except maybe social studies. Then I would use it to ask: How can I help you with this?"

After some initial skepticism, Memphis Supt. Gerry House has taken a keen interest in the scores since schools using her reform models showed substantial gains under the value-added system.

Shelby County Schools' assistant superintendent for instruction, Fred Johnson, said he hasn't studied value-added assessment deeply enough to comment about it.

Some educators are paying attention, however, especially in light of the trends uncovered by value-added assessment.

Since Sanders developed the method in the mid-1980s, researchers have been able to pin down a number of insights about the dynamics of classrooms, kids and teachers.

Among the biggest findings:

-- More than anything else, teachers matter.

"What we consistently find, and have found since the beginning, is that the difference in the effectiveness among teachers is the single largest factor affecting the academic growth of students," Sanders says.

In a study of fifth-grade math scores in two large Tennessee school systems, Sanders found that students who had three straight years of low-performing teachers had percentile scores 52 to 54 points lower than students who had high-performing teachers three years in a row.

A percentile score shows how well a student did compared with students around the country. A 75 would mean the student did as well as, or better than, 75 percent of the test-takers.

In test score terms, 50 percentile points is an astronomical difference, the kind that could mean a child being challenged in advanced courses or turning brain-dead in average ones.

"It's basically taking an engineering, math-based career off the table for them by the time they're in the fifth grade," says Dave Shearon of the Nashville school board.

Shearon, elected this summer, calls that luck-of-the-draw difference tragic - but avoidable.

Last month, he suggested his school system use teacher effectiveness data to plot where the most effective and least effective teachers work, using a system of green and red dots.

But Shearon tabled the plan after Nashville's teachers, smack in the middle of hot contract negotiations, went into battle mode. They mocked the plan, wearing red and green dot-stickers in protest.

Shearon says he'll bring the idea up again when negotiations cool.

He insists the data wouldn't be used to assign teachers to low-performing schools, saying he doesn't believe the teachers would be effective if they're unhappy.

"But my suspicion is we might see some of our best teachers volunteering to go back into those schools," he said.

-- Teachers matter - corollary two.

The negative effects of a bad teacher, especially on a child who suffers poor-performing teachers two or more years in a row, can linger long after the child leaves that classroom.

Just as important are the effects of a great teacher. A child who gets a high-performing teacher will enjoy academic benefits for years to come.

Residual academic effects from extremely effective and ineffective teachers were measurable three years later, Sanders's study showed.

-- Teacher effect - part three.

The good teacher effect isn't compensatory. That means a good teacher can make gains with her students, but she can't completely wipe out the lingering negative effects of a bad teacher.

-- You can't judge a good teacher by her school Zip Code.

A study of two large Tennessee school districts found that black schoolchildren were overrepresented in the worst teachers' classrooms by about 10 percent and underrepresented in the best teachers' classrooms by about the same amount.

But by and large, Sanders, his RS6000 computer and his team of statisticians have found effective teachers in all kinds of schools, from rich to poor.

-- The building-change phenomenon.

Researchers have found huge drops in test scores - more strikingly in Memphis's seventh-graders than almost anywhere else - the year after students move from their elementary school to middle school or junior high.

Sanders says researchers don't believe the trauma of switching schools, or even the trauma of early adolescence, is to blame.

"Our hypothesis is that the receiving schools do not usually have a good fix on where the feeder schools left off, so the receiving schools tend to re-teach a good chunk of old material until they get a fix on where the kids are," he says.

Schools often waste time re-teaching material, dulling students' scores on that year's achievement test, he believes.

The other two theories don't pan out because test scores of students who move from one school to another don't always drop. And children are being tested against their own age group, so teenage angst shouldn't come into play.

"This is one of the things schools might want to address and give some attention to, because those impacts are large and they're consistent," he said.

"It's been hurtful to Memphis, relative to its overall cumulative gains, because the seventh grade is nearly flat," Sanders says.

Memphis's seventh-grade scores, which have shown far less progress than the national average, have dragged down city scores overall, he said.

-- Test results show that in most public schools, both inner-city and rural, the pace of instruction is geared to the lowest-scoring kids.

Sanders says his studies have shown that if students with above-average test scores in their early school years are stuck in a series of classrooms where the pace is slow, they slow down, too.

"Then it becomes a self-fulfilling prophecy: That these early above-average kids, after being in that series of classrooms, will no longer be above average, but indeed will start regressing," he says.

Much of the debate surrounding value-added assessment mirrors the arguments that have raged for years about standardized testing in general.

"There is no statistical system that will allow you to look at a student's performance and tell you what the teacher quality is. Tests don't test what a student knows," said Al Mance of the Tennessee Education Association.

"(Teaching) isn't like a person standing around making widgets and you're only going to hire someone who can manufacture 20 widgets a day. When you're talking about human beings . . . a person can give input on one end, and the reaction to it can be quite different," Mance says.

The TEA doesn't oppose value-added assessment in general, Mance says, but the 45,000-member group does object to legislators attaching "high stakes" to the results, such as the portion of the Education Improvement Act that allows the state to put continually low-performing schools on probation.

John Stone, an East Tennessee State University professor who is outspoken in his criticism of the education establishment, says Mance's "widgets" analogy doesn't fly.

"Their whole modality of argument is to simply find fault," he says. "They damn the good because it isn't perfect. There is no perfect system of measuring what an individual teacher does with a group of students, and no matter what, to some extent, the method will always distort the outcome."

But as a method of evaluating schools, teachers and students, Stone says, value-added assessment is about as fair as it gets.

"Nobody would accept it if banks argued that you shouldn't do audits because it doesn't perfectly capture what they're doing with the money," he said. "But that's the implication in those arguments."

Which Is Your School?


Education researchers say they've been able to sketch several types of schools based on value-added assessment data.

"SHED" - This pattern is most prevalent in inner-city, urban schools and is also the most frequently noted pattern among schools statewide. The shed pattern is an indication that a school is tailoring lessons to the lowest-achieving students, while having less success with average- and high-achieving students.

"REVERSE SHED" - Most common in suburban districts, such as Shelby County Schools. Shows that a school is having the most success teaching its highest-achieving students, while average- and low-achieving students may not be faring as well. A prevalent pattern for private schools.

"TEEPEE" - A less frequent pattern among Tennessee schools, indicates that a school is having the most success with its average-achieving students, while low- and high-achieving students aren't gaining as much.

"V" - Also an infrequent pattern among Tennessee schools, a "V" pattern suggests a school is having success with its high- and low-achieving students, but is less successful with average kids.

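The four patterns can be sketched in a few lines of Python. This is only an illustration, not the state's method: the function name classify_pattern, the split into three achievement groups and the sample gain figures are all made up here. Each number stands for the average yearly gain of a school's low-, average- and high-achieving students.

```python
def classify_pattern(low_gain, avg_gain, high_gain):
    """Label a school by which achievement group gains the most (hypothetical rule)."""
    if avg_gain > low_gain and avg_gain > high_gain:
        return "teepee"        # average students gain the most
    if avg_gain < low_gain and avg_gain < high_gain:
        return "V"             # average students gain the least
    if low_gain >= avg_gain >= high_gain and low_gain > high_gain:
        return "shed"          # gains fall as prior achievement rises
    if high_gain >= avg_gain >= low_gain and high_gain > low_gain:
        return "reverse shed"  # gains rise with prior achievement
    return "flat"              # no group stands out


# Made-up gains for four imaginary schools.
examples = {
    "School A": classify_pattern(5, 3, 1),
    "School B": classify_pattern(1, 3, 5),
    "School C": classify_pattern(2, 5, 3),
    "School D": classify_pattern(5, 1, 4),
}
```

Under this rule School A comes out as a shed, School B as a reverse shed, School C as a teepee and School D as a V. The "flat" label for schools where all three groups gain equally is an addition for completeness, not one of the researchers' categories.
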
The Premise of Value-Added Assessment


The premise of value-added assessment is this: Looking at scores from standardized tests doesn't give a clear picture of what happens in a classroom during a school year.

Value-added assessment is meant to measure not how much a child knows but how much that child has learned over the course of a year - how much "value" the teacher and school have added.

The system was developed by University of Tennessee-Knoxville professor Bill Sanders.

To arrive at his assessments, Sanders uses a child's current scores on the Tennessee Comprehensive Assessment Program tests, up to five years' worth of the child's old test scores, and expected improvements based on the national average gain.
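
As a rough illustration of the idea, and not of Sanders's actual mixed-model calculation, the Python sketch below compares a class's average one-year gain with a national average gain. The function name value_added, the scale scores and the national figure of 18 points are hypothetical, and the sketch uses only one prior year of scores rather than up to five.

```python
from statistics import mean


def value_added(prior_scores, current_scores, national_avg_gain):
    """Class average gain minus the national average gain (simplified sketch)."""
    if len(prior_scores) != len(current_scores):
        raise ValueError("each student needs a prior and a current score")
    # One gain per student: this year's score minus last year's.
    gains = [cur - prev for prev, cur in zip(prior_scores, current_scores)]
    return mean(gains) - national_avg_gain


# Made-up scale scores for three students, last year and this year.
prior = [600, 640, 700]
current = [625, 660, 715]
class_value_added = value_added(prior, current, national_avg_gain=18)
```

A positive result means the class gained more than the assumed national norm; a negative one means it gained less, no matter how high the raw scores are.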

Each year the system generates a batch of reports to teachers, principals and administrators showing the progress made by students in school systems, schools and even individual classrooms.

The value-added system has become attractive to school systems around the country that spent the last decade or so jumping onto the school accountability bandwagon. As a measure of the effectiveness of schools and teachers, it adds more depth to the picture provided by raw test scores.

For example, a child who continually scores well on standardized tests might make her parents happy, but it may be that the child's teacher didn't push her far enough or fast enough. Maybe she didn't gain enough compared to other kids around the country.

And for children in the lowest-scoring schools, value-added assessment is a way to gauge whether the school is helping kids catch up with their better-scoring peers.

To catch up - no easy task - students' scores must show that they are outgaining students in other schools each year, what Sanders calls the "ratcheting effect."
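
The arithmetic behind catching up can be sketched in Python. The function years_to_catch_up and all the numbers are hypothetical, and the sketch assumes that both groups' gains stay constant from year to year, which is a simplification.

```python
import math


def years_to_catch_up(gap, own_gain, peer_gain):
    """Years needed to close `gap` points, or None if the gap never closes."""
    if gap <= 0:
        return 0  # already level with, or ahead of, the peer group
    yearly_closing = own_gain - peer_gain
    if yearly_closing <= 0:
        return None  # merely matching peers' gains never closes the gap
    return math.ceil(gap / yearly_closing)


# A school 20 points behind that gains 12 points a year while peers gain 8.
years_needed = years_to_catch_up(20, 12, 8)
```

With these made-up figures the gap closes by 4 points a year, so it takes five years. A school that only matches its peers' gains never catches up.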

Value-added assessment is calculated only for third- through eighth-grade students because they're the only students who take the TCAP tests.

But Sanders has developed a value-added assessment to measure the progress of seventh- through 12th-graders in five different math courses.

Based on scores from end-of-course tests, it measures the gain made by students against the statewide average. The same method soon may be used to assess secondary school students in as many as 14 other subjects.

Sanders notes that it's almost pointless to look at value-added scores without also looking at percentile scores. That's because a school with dreadfully low test scores might show deceptively high value-added scores.

The opposite also could be true.

A school that practically aces standardized tests might not show any value-added gains. The test scores alone would give parents a bogus picture about whether their child is working up to potential.


UT Statistics Professor Didn't Count On This Reception


By Mickie Anderson

Timing is everything.

For years, University of Tennessee professor Bill Sanders banged on Nashville doors like a salesman, pestering anyone who would listen to his method for measuring school effectiveness.

Now the world's at his door.

Sanders and his "value-added" method for evaluating schools are the biggest things going in education, the hot topic at educrat cocktail parties, the stuff school researchers would kill to sink their teeth into.

In education circles, it's put Tennessee, the only state with such a system, on the map.

It was a harmonic convergence - a mix of political timing and dumb luck - that pushed Sanders, an agriculture research statistician by trade, into the field of education.

It was the mid-1980s and Gov. Lamar Alexander's career ladder proposal was all the news. (Legislators froze the teacher merit pay plan last year.)

Sanders, long an advocate of a little-known brand of mathematics called "mixed-model statistics," was teaching a group of graduate students about the method.

Coming up with an on-the-fly sample problem, he began to show the class how one could use mixed-model statistics to evaluate how much progress a school made in a year's time.

"That was just pulled totally, completely out of the air," the 56-year-old statistician said.

His colleague, Robert McLean, corralled him after class.

McLean insisted the two write to the governor, outlining how a school evaluation system could work.

Sanders got the OK to try out his theory using Knox County schools' test data.

"I mean, to show you how naive I was, I thought the whole world was waiting for this, I really did," he recalls with amusement. "We were go-go-go-go-go."

"So, anyway, I got done, and I called up my contact in Nashville, and I said, 'I'm done.' And his question was 'With what?'

"That was my first clue that the whole world wasn't waiting on this," Sanders says, chuckling at the memory.

Sanders did his first three reports based on student test data from Knox County, Blount County and Chattanooga City Schools.

Despite some significant findings, Sanders couldn't get the time of day.

So he put it aside.

Satisfied that he'd proved it could be done, he figured an education researcher one day might pick up where he left off.

But then the scene shifts.

It's now 1989, and the small schools vs. big schools lawsuit has again made education a hot-button issue in Tennessee.

About a week after the November election that year, a brand-new legislator called Sanders at home. He had been on an airplane with a retired UT professor who had chatted the legislator up about Sanders's work.

In 1992, after months of hearings and discussion and debate, Sanders's value-added assessment was made part of the state's massive Education Improvement Act.

Sen. Andy Womack (D-Murfreesboro), one of the sponsors of the legislation that incorporated Sanders's plan, isn't quite convinced that Tennessee schools are using the value-added data as much as they could.

But so far, he says, the plan is on the right track, noting that he gets calls from legislators around the country interested in what Tennessee is up to.

"I think it has brought attention to the fact that the only way to evaluate school performance is whether value is being added or not," Womack said.

"It gives us a new criteria for judging and evaluating schools that never existed previously."

Sanders, who's quick with an analogy to explain his sometimes-confusing system, has logged more frequent-flier miles than a flight attendant, preaching his value-added gospel to interested school districts and groups.

With so much attention on the program, he's busy fielding offers and figuring how to handle his new fame.

"Sometimes I'm reminded of the dog that chased the car," he says.

"When the car stops, what do you do with it?"


Accountability? For Schools, Yes; For Teachers, No


Evaluations Not Tied To Tests, Critics Say

By Mickie Anderson The Commercial Appeal

When legislators were asked to raise the state's sales tax to generate more money for schools back in 1992, they held their breath and did it, insisting on one thing in exchange: accountability.

Indeed, school systems are held accountable on several fronts, including academic improvement as measured by the state's value-added assessment system.

But teachers aren't.

For one thing, the value-added assessment system is based on scores from the Tennessee Comprehensive Assessment Program, the annual standardized tests taken by students in grades three through eight.

That means there is no comprehensive value-added assessment for first- and second-graders or high school students. (There is an assessment that measures the progress of seventh- through 12th-graders in certain math courses.)

But even for fourth- through eighth-grade teachers, value-added assessment plays a limited role in their evaluations or accountability.

State law says value-added scores can be used in teachers' formal evaluations when three years' worth of data has been collected, a mark some teachers reached two years ago.

But new guidelines issued by the state Board of Education governing teacher evaluations ask just two things: what the teacher has learned from the value-added data and how the teacher intends to use the findings to improve.

The evaluation form used in Memphis refers several times to test scores. But there is no direct tie between student scores and the teacher's rating.

Nor can parents use value-added assessments to choose their children's teachers. By state law, reports that reflect on individual teachers are not public.

State legislators conducted weeks of hearings before adopting the value-added system six years ago. Some say state Education Commissioner Jane Walters's lack of oversight and enthusiasm for the system has yanked most of the teeth from their attempt to hold teachers accountable.

"I think the intent was very clear, and that was that it was to be used to identify teachers, who year after year, didn't have gains, then identify the teachers whose students were showing gains year after year," said Sen. Tommy Haun (R-Greeneville).

Walters has heard the "you-just-don't-want-to-get-rid-of-the-bad-teachers" criticism before. It's off-base, she says.

She favors the use of teacher effectiveness data in evaluations but is leery of putting significant emphasis on one score.

Instead of trying to go after the worst teachers, Walters says, energy would be better spent coaching the majority in the middle.

"Getting rid of that bottom 5 percent is not going to be as effective as boosting those in the middle," she said.

A lot of things can affect a teacher-effectiveness report, she said, such as a school spending more time fund-raising than teaching, or constant classroom interruptions.

While Walters says she does not oppose value-added assessment, she has suggested scrapping some of the data-gathering, and last year convinced legislators to end mandatory second-grade testing.

When the value-added system was approved, Ned McWherter was governor, and Charles Smith ran the state's Education department. Both were ardent supporters of the system.

Research based on data turned up through the massive assessment system has consistently found one thing: That teacher quality has more to do with how students will do on standardized achievement tests than anything else.

Analysis showed that students who had the least effective teachers three years straight had standardized test scores 52 to 54 percentile points lower than students lucky enough to have highly effective teachers three years in a row.

"That's a very powerful insight into how important it is to have good teachers," said Diane Ravitch of the Brookings Institution, a public policy think tank.

"The typical educator's response is, 'It's all in the home background,'" she said. "But this is very powerful evidence that besides all that, there's still something important going on in the classroom. And that's terrifying to many people."

Although many poor students face tough academic challenges, value-added assessment tracks individual students against their own progress, eliminating the poverty factor.

Haun said in carrot-and-stick terms, the legislation was never meant as a stick to rap on teachers' heads. The intent was to identify the best teachers as a way to help poor teachers improve. In fact, legislators ensured in the law that individual teachers' records wouldn't be public.

But if a teacher simply couldn't be salvaged, Haun said, then the value-added assessment scores could be used to help terminate the employee.

"If the attitude is, 'I'm not going to change,'" he said, "then (value-added assessment) could be used as a stick."

The state does hold school systems accountable for improvements, and the possibility of district probation is spelled out in the law.

That has happened only once, when Hancock County schools were put on probation last year. Besides having poor test scores, the district also was criticized for misusing funds, for using outdated textbooks and for its top-heavy administration.

The gap between school systems' accountability and teacher accountability troubles some.

"You find out that the law might not be meeting what your intent was," Haun said. "The downfall is in the oversight."

But when having a poor teacher for three years in a row is enough to "knock a kid out of the box," as Ravitch puts it, it shouldn't be an option.

"A lot of people don't want to know. They don't even want to ask the question," Ravitch says. "But you have to step back and say 'What do we have these schools for?' They're not employment agencies. These kids have to be protected."


School Value Accountability Tool Deserves Full, Fair Use


Editorial

THE USE of "value-added assessment" to measure the annual improvement - or lack of it - of Tennessee's public schools gives parents, taxpayers, lawmakers and school officials a potentially powerful tool to hold principals and teachers responsible for their performance. Its application should not be unduly limited by efforts to avoid such accountability.

Value-added assessment uses pupil scores on standardized achievement tests over several years to measure how a district, a school and even an individual classroom are progressing. It seeks to establish how much the district, school and classroom contribute each year to student learning, relative to other systems, schools and classrooms across the state and nation.

The method was developed by a professor of statistics at the University of Tennessee-Knoxville and is used more widely in our state than in any other. It permits more sophisticated measures of school improvement than do simple comparisons of aggregate test scores.

Of course no single statistical device can fully assess the effectiveness of a particular school or teacher, any more than a single test score should determine a student's admission to college. But used in combination with other types of evaluations, value-added assessment can suggest how well a school is educating the children in its charge, and how it can improve.

There seems no real basis, other than reflexive resistance to change, for the opposition the method has generated among Tennessee teacher unions and some state education officials. Self-interested objections to tying value-added assessment to classroom evaluations - and appropriately publicizing those evaluations - cannot provide the last word on the method's merits.

Many of the findings of value-added assessment seem self-evident: The quality of teaching, more than any other factor, determines how well pupils do on standardized tests. Bad teaching in early grades can place children at a disadvantage from which they may never recover.

By contrast, good teaching can provide a solid academic foundation that lasts for years. And the best teachers are not always found in the districts and schools that are perceived to be the most prestigious.

Value-added assessment measures have confirmed the success of Memphis City Schools reform initiatives. Several city schools have used the method's findings to identify weaknesses and define remedial measures.

The findings also raise important policy questions. Can curricula be coordinated better, to prevent the large declines in test scores that occur when students move from elementary school to middle or junior high school? What are the consequences for gifted students of policies that set the pace of classroom instruction to accommodate low-achieving pupils?

The method has its drawbacks. Because its database consists of scores on Tennessee Comprehensive Assessment Program tests, it generally excludes pupils in the earliest grades and in high school who do not take those tests, although new measurements are under development for secondary grades.

Many aspects of teaching quality are subjective and not susceptible to statistical measurement. Test scores may not take into account the unique obstacles a school or class faces.

Like other statistical measures, value-added scores also may be deceptive at the extremes - in this case, identifying improvements in the lowest- and highest-achieving schools. And any system that links rewards to test scores can encourage schools to "teach to the test" instead of teaching, period.

With those caveats, though, value-added assessment can help measure the performance of teachers and school principals. Suggesting that assessment data identifying consistently poor performance should not affect continued employment makes no more sense than arguing that the marks on a child's report card should have no bearing on whether he or she is promoted to the next grade.

Tennessee's Education Improvement Act - and the tax increase the state levied to pay for it - are based on the premise that school districts must be publicly accountable for the resources they receive.

That should apply to schools, teachers and administrators as well. Value-added assessment is one important way of imposing such accountability - if politicians and interest groups allow it to work as it is designed to.
