How Standardized Testing Is Like Airline Thinking
By Debbie Silver
Recently, I was scheduled to leave on American Airlines’ first and only daily flight from DFW to Bozeman, MT. Just before boarding time, the gate agent announced that maintenance needed to replace the tires on our plane. She explained that it was a long process and that we should be patient for the next few hours while we waited for an update.
I quickly dashed off a tweet to AA asking them why they didn’t change the tires the night before when maintenance could have done it without inconveniencing passengers.
They replied with a generic, “Our primary goal is to put passenger safety first.”
I responded, “I’m not arguing about keeping us safe, I am asking why you couldn’t have maintenance take care of this before time for us to leave?”
They wrote back, “Often our pilots don’t know there’s an engine problem until they turn on the plane and spot an irregularity.”
I countered, “I’m not talking about an engine problem, I am talking about an external tire problem that should have been reported when the plane landed and taken care of overnight.”
They answered, “Our primary goal is to put passenger safety first.” They obviously weren’t interested in hearing about or correcting an easily fixable problem.
Yes, there’s an analogy – keep reading!
I was so frustrated that I turned off my phone and returned to reading about using student data to evaluate school effectiveness. As I scanned the article, which described how states aggregate standardized test scores to justify all manner of assertions, I couldn’t help thinking how education decision-makers, like the airlines, often obscure individual problems with sweeping, top-down generalizations that have little or nothing to do with responding to the needs right in front of us. Whether it’s “passenger safety” or “school accountability,” all-purpose labeling often serves no good purpose at all.
I recall a story from several years ago about an award-winning music instructor – Nick Prior of Albuquerque, New Mexico – who was deemed “minimally effective” on his annual teaching evaluation because half of his points were determined by low “student achievement.”
This is a man whose choirs have won multiple state and national competitions. The irony is that the “student achievement” points were based on test scores in reading and math, not music, and they included many students he didn’t even teach. (Slate) Sound like airline thinking?
Unfortunately, the practice of holding teachers accountable for circumstances beyond their control is not uncommon. But if one purpose of data collection is to help teachers improve their practice, why present us with results from students who are not in our classes?
There are more examples of airline thinking. As a middle grades classroom teacher, I generally didn’t receive test results until months after the test takers had left my room. The overdue reports concentrated on comparing our school’s overall student scores with those of a “sister school” (assigned by the state as statistically similar, though in many ways the schools weren’t comparable).
The summaries were geared toward generalized accountability and were neither diagnostic nor instructive. I remember thinking, “How does this highly questionable, sweeping school comparison help me improve my teaching?” This practice still goes on, to varying degrees, all around the nation.
I understand that politicians and some other folks far removed from education believe the best (and cheapest) way to ensure fidelity in testing is to standardize and automate it. However, homogeneous multiple-choice tests yield little descriptive or prescriptive information for teachers. Data is not always about numbers. It should render actionable information to those responsible for delivering classroom instruction – the teachers.
A timely, effective alternative: Test-retest
In an age when educators are held accountable for how much value we add to student learning, why don’t we use a simple test-retest model? At the beginning of each school year, students could be assessed in both formal and informal contexts. Well-made tests are an excellent way to measure learning and diagnose weaknesses.
Data could quickly be shared with individual teachers so we could plan our curriculum choices and instructional strategies based on what we know about our actual students. End-of-year assessments would be given to those same students.
School districts (or states) could factor in the variables we know affect an individual’s predictable learning gains (e.g., socioeconomic status, identified learning disabilities, language challenges) and apply a formula for expected increases. Comparing the actual outcomes with the projected ones would give teachers, schools, districts, states, and the community a more accurate picture of what is happening in our classrooms.
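For readers who like to see the arithmetic, the projection-versus-actual comparison described above could be sketched in a few lines of code. This is only an illustration: the flat expected gain and the adjustment weights below are hypothetical placeholders, not figures from any real district or state formula.

```python
# Illustrative sketch of a test-retest, value-added comparison.
# All weights are hypothetical and chosen only to show the mechanics.

def expected_gain(low_ses=False, learning_disability=False,
                  language_challenge=False):
    """Project a student's expected fall-to-spring score gain,
    adjusted for factors known to affect predictable learning growth."""
    gain = 10.0  # hypothetical baseline expected gain, in scale-score points
    if low_ses:
        gain -= 2.0
    if learning_disability:
        gain -= 3.0
    if language_challenge:
        gain -= 2.5
    return gain

def value_added(fall_score, spring_score, **factors):
    """Actual gain minus projected gain; positive means above expectation."""
    return (spring_score - fall_score) - expected_gain(**factors)

# A class's value-added would be averaged over its actual students.
students = [
    {"fall": 320, "spring": 334, "low_ses": True},
    {"fall": 305, "spring": 312, "language_challenge": True},
    {"fall": 340, "spring": 351},
]
scores = [value_added(s.pop("fall"), s.pop("spring"), **s) for s in students]
print(round(sum(scores) / len(scores), 2))
```

The point of the sketch is simply that the comparison is teacher-facing: each number is tied to a specific student the teacher actually taught, unlike the aggregated school-level reports described earlier.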
Test-retest with teacher input
Both pre- and post-assessments should be designed, at least in part, by teachers of the grade levels and subject areas they measure. Content and performance standards should be meaningful and aligned to those of the district (or state). Formalized testing would be limited to the beginning and the end of the school year, with the primary purpose of giving teachers the information we need to improve our practice.
As one who used to get bored and play “connect the dots” on my answer sheet, I can attest that not all the things that need to be measured get measured. Machine-scored tests don’t always give a full picture of what a student knows. We need to move beyond multiple choice and include opportunities for students to demonstrate what they can do.
Why do we hold pep rallies and/or bribe students to do well on their assessments? Shouldn’t high test scores be the by-product of a good education rather than the result of a frenzied push one time a year? Well-constructed, meaningful assessments can inspire students to show what they have learned. Let’s shift our focus from how to increase performance scores to how to better understand our students.
Addressing the real issue
When I question decision makers about why we continue to test students and report data in such an unconstructive manner, I hear the party line, “The public wants accountability.” To me that sounds a lot like the American Airlines proclamation, “Our primary goal is to put passenger safety first.” It sounds great, but it doesn’t address the problem at hand.
Teachers want to understand what our students should know and be able to do, what they presently know and are able to do, and how to best bridge the gap between the two. Globalized, aggregated test data does little to inform us about how to improve teaching methods.
I realize my proposed test-retest method has critics and is not without its drawbacks, but at least teachers would have tangible data with which to improve our craft. I think teachers are the most creative, resourceful people on the planet (probably because we have to be), so let’s put our heads together and propose some alternatives to the high stakes testing system currently in place.
Please write and contribute your thoughts. Who knows, maybe we can come up with a plan that gives us a viable answer to the question, “Isn’t there a better use for student test data?”
Dr. Debbie Silver is a learning consultant and humorist with over 30 years of experience as a teacher, staff development facilitator, and university professor. As a middle grades classroom teacher, Debbie won numerous awards including the 1990 Louisiana Teacher of the Year award. She speaks worldwide on issues involving education and is a passionate advocate for students and teachers. She is the author and co-author of four bestselling books, including Deliberate Optimism: Reclaiming the Joy in Teaching.