Am I a geek or a nerd?


Is this a dead horse I’m beating?
February 19, 2012, 8:53 pm
Filed under: education law, evaluation

I belatedly finished Diane Ravitch’s Death and Life of the Great American School System (I link to B&N because I have a Nook…we actually have three Nooks in this house…for two people who are able to read…and a tablet…and a netbook…heavens…) and while I’m pretty sure I knew the basics of much of her position, the fact that she provides citations and research to debunk myths and illustrate our dangerous ways makes this a nerdy/geeky delight.

Read it yourself for the full poop; I’m still thinking about the prospect of using student test scores to evaluate teachers. Ravitch provides much info, but the piece I’m most struck by is this study: “Error Rates in Measuring Teacher and School Performance Based on Student Test Score Gains.” In short, the USDE commissioned a study that found that error rates in determining the effectiveness of individual teachers range anywhere from 25 to 35 per cent depending on the variables (the number of years scores are collected, for example). This was released in July 2010. One year prior to this study, Race to the Top was announced, and it in part required states to use student scores to evaluate teachers. Hmm. I poked through the study document and found citations dated 2010, so my assumption is that this was commissioned sometime in 2010. So USDE demanded an action of states and at the same time paid for a study to see if that action was really effective. Then it turns out that the action doesn’t seem to be a great option, but they keep on keepin’ on with the demands.

I get that scores will just be one part of evaluation for teachers here in NY, and I get that we don’t have a great system in place now (see here, here, here and I’m sure I’ve mentioned more than that and for some other great links and commentary on this, here), but I really don’t think I want any part of any evaluation to have an error rate of 25%.

I’m still trying to be unsurprised by the blindness of everyone barreling down this road.


Everything I could say about RTTT in NY has been said…almost.
January 26, 2012, 9:41 pm
Filed under: assessment, education law

Here

God bless the New York Times, this article sums up all of my questions and concerns better than I ever could. That said, I have something to add. The author questions the role of State Ed in determining how well 700 school districts are assessing students for growth in non-state tested subjects given the budget cuts at the state level and the number of students and teachers and assessments that would need independent review.

I think I know what’s going to happen. The assessment/evaluation police aren’t going to show up. In fact, it’s likely that no one will ever look closely at how districts meet the mandate. Experience tells me this. While I was working in Michigan as a Social Studies Consultant, the state adopted K-12 Social Studies expectations where there had previously been very little required. But the state also had just one Social Studies Supervisor for the entire state, and when she retired she was not replaced. State Ed funding and personnel were dramatically reduced. No one checked what you were teaching, or if you had created a curriculum that was freely available to parents, or if you were assessing students on the new expectations. Some schools even refused to align their grade level subjects to those required by the state. Because, really, what was the limbless state ed going to do about it?

I suspect RTTT in NYS will result in similar behavior in schools; do enough to make it look like you’re doing what they want, create a paper trail, but don’t really change much of anything in the end.



How Do I Know I’m Doing My Job Well?
November 12, 2010, 2:26 pm
Filed under: assessment, Geek

I’m a Social Studies Consultant for an intermediate school district in Michigan. That means I provide professional development services to our local school districts. Most people still don’t know what that means. What do I do all day? How do I know I’m doing it well? That I have trouble answering that concerns me.

Wow.

Education Week has recently been working on a series on Professional Development for teachers and its effectiveness, generally finding that not only is effectiveness not often measured, but that there are very few standards for what constitutes PD. EW suggests that PD needs to be targeted toward student weaknesses and how teachers can attend to those weaknesses. I agree, but I think there is more to PD as well.

Some of my job includes attending to specific requests of districts. I assume when a department head or administrator asks me to assist their teachers with something specific it is because there is a known shortcoming. Mostly this is true, but sometimes it isn’t, and as I don’t work with one building, one department, or one district I’m often at a loss to determine what teachers really need. For example, many administrators called me to assist with Social Studies data analysis of last year’s state exams but clearly were not aware that we should not and could not do this. Should not because the exams were slated to address different standards this year, so knowing how we performed on standards no longer in place wouldn’t assist us in preparing students for the next assessment. Could not because the state no longer releases specific social studies items; we can’t see more than the content strand of each question. Without specific question language, the percentages and scores provided by the state are as deep as we can go. Now had they asked me to help teachers create assessments and then analyze that data, we would have been on our way to some good PD.

Another portion of my job is creating PD opportunities based on what I know many teachers and districts are struggling with. For example, I know that county-wide our students struggle with supporting their assertions in writing. I’ve put together a workshop series to address this idea specifically in Social Studies. Do I know if it works? No. I don’t get to observe teachers and I’m fairly confident that the people doing the observing don’t know what PD teachers have attended. I also don’t know if the teachers who attend are those who most need this PD.

A third part of what I do is awareness. When the state adopted new content standards I did some PD about how to read them and adapt to them. I’ve since been part of a project to create a comprehensive K-12 SS curriculum based on these new standards; there has been a tremendous amount of PD for teachers to learn the progressions of learning, the units of instruction, and the various ways to implement lessons. None of this is specifically targeting weaknesses among our teachers, but it is necessary information for them to do their jobs.

On one hand, I would like to know that any PD I conduct results in improved instruction and student performance; on the other, I’m aware that some PD doesn’t manifest itself this way. Learning about changes in state requirements, for example, has less to do with improving student performance and more to do with learning the minimum requirements of your job. If I were locked into providing only services that could be measured by student test results, I wouldn’t be providing very broad or rich PD.



How Long Does It Take to Grade a Test?
November 11, 2010, 8:38 am
Filed under: assessment

In Michigan it takes several months

The bulk of MEAP testing is multiple choice with student responses recorded on bubble sheets. I have never understood why it takes two months to get these scores or why the scores released at two months are not for public consumption. Having scored exams in NYS, I know that a single teacher (me, for example) can run 800 electronic score sheets by hand (we didn’t have a fancy self fed machine…) in an hour or two. At the end, a summary sheet that indicates the percent of students who selected each answer is produced. I understand that Michigan does not allow in-house scoring, that the tests need to be packaged and sent back to the state, and that we are testing ALL students in selected grades, but is the state really working with a single machine? Does it have just one person scoring these? Does it have the very old scoring machines that don’t have a way to transfer data to a computer for sorting?
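For the geeks in the audience, here is roughly what that summary sheet boils down to. This is only a sketch: the function name, the four answer choices, and the sample answer sheets are all made up for illustration, and it is not how Michigan or NYS actually process their scans.

```python
from collections import Counter

def answer_distribution(sheets, choices="ABCD"):
    """Return, for each question, the percent of students choosing each answer.

    `sheets` is a list of strings, one per student, one character per question.
    The format is hypothetical; real scanners export something richer.
    """
    n_students = len(sheets)
    n_questions = len(sheets[0])
    summary = []
    for q in range(n_questions):
        # Tally which answer every student bubbled in for question q.
        counts = Counter(sheet[q] for sheet in sheets)
        summary.append(
            {c: round(100 * counts.get(c, 0) / n_students, 1) for c in choices}
        )
    return summary

# Four invented students answering four invented questions.
sheets = ["ABCD", "ABCA", "BBCD", "ABDD"]
summary = answer_distribution(sheets)

for number, row in enumerate(summary, start=1):
    print(f"Q{number}: {row}")
```

A laptop gets through this for 800 sheets in well under a second, which is part of why the two-month wait puzzles me.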

I’ll live with this. But now there is the added wait for scoring of written assessments. According to the letter linked above, they must be scored and then the state and SBE will determine cut scores. In my understanding of test development, the writing assessments should have been pilot tested on a variety of students and analyzed for validity and reliability, with cut scores determined prior to scoring the exams. Again, I’ll compare my NYS experience. Each year’s cut scores were different; before the test arrived at my building I could never say precisely to a student that he must answer 30 of 50 MC correctly and get a 4/5 on each essay to pass, but once we scored (remember, an hour for bubbles, and then 3 days for 1600 essays each read twice) we knew our students’ results immediately. No waiting for the whole state to be scored and then figuring out passing…

I’m sure there is a rationale to this. Thoughts out there?

As an added irritant, none of the items will be released. Schools get general scores, and percentages of student performances on various strands in the tested subjects, but teachers will never know exactly what each student they teach struggled with. They will never know if their students didn’t know a particular core concept or simply didn’t know some non-content vocabulary. If data analysis is one tool for school improvement, and state tests are the measuring stick for that improvement, how can schools know where to apply their efforts if they never see the actual measure? It’s like running a race without any knowledge of how long the race is…how do you pace yourself, determine what nutrition you need, which shoes to wear, how much water to drink.



Evaluation Time
June 11, 2010, 8:11 am
Filed under: assessment, Geek

My office doesn’t evaluate people. We don’t go into schools and evaluate teachers. It’s a strange position we’re in; schools and teachers require staff development (by law in some cases) and we have the responsibility to provide it, but we do not have the authority to evaluate its impact. That’s not to say we don’t collect numbers, but it’s not specific evaluation of people attending our staff development offerings. We get MEAP numbers, MME numbers, teacher cert numbers, drop out numbers, graduation numbers. One of my plans for next year is to actually try to observe teachers from our networks–not an evaluation, just an observation. 

We also don’t get evaluated ourselves. In some ways this is good. When I left my position as an assistant principal I left behind a boss who was outstanding at his job, but hard to please as well. His evaluations of me were sometimes confusing, in that he would suggest improvements in things that were simply unmeasurable. One I remember was Visibility. He had trouble articulating what he meant and I’m not sure I ever lived up to what was in his mind, but it was a fuzzy thing to be evaluated on. When I came here, I was accustomed to reporting my work constantly, keeping my boss in the loop so to speak. After about three weeks of my regular emails detailing various projects and proposals, my new boss (also outstanding at her job) replied that I had been hired to do a job and that I could just do it; I didn’t need her approval. Wow. So I roll along, creating staff development projects based on district requests and my own anticipation of needs.

Things might be changing. For the first time here, I’ve been asked to evaluate my administrative support. I could write pages on the awesomeness that is Carol–she not only keeps a babillion plates spinning at once, she also cleans up after my figurative messes (like forgetting contracts or mixing up dates or ordering the wrong book). She tolerates my potty mouth, saves me from sales reps, and lets me borrow her pickup truck. She came up with the ingenious way to track attendance at our networks. She manages grants in her sleep. And she has excellent penmanship. Okay, great eval on the way. Except when I look at this evaluation document, it doesn’t allow me to express that we make a great team, that she makes up for my shortcomings and I make up for hers. Instead, I have to check boxes ranking her as above average and so on for technical knowledge, accuracy, amount of work, cooperation, flexibility, and punctuality.

Not much of a picture of Carol, is it?

Reminds me of the teacher evaluations. You know, where all teachers are excellent.

Bringing me to my point. I know Carol is awesome; I watch her work every day. It’s only because I’m intimately familiar with her work style and work load that I can say she is great at what she does. I’ve had other secretaries; one was horrible, one was awesome. How do you measure their awesomeness, or lack of? It’s so much more than a check list. How do I quantify her creativity (for example, I don’t make my own workshop announcements any more; Carol has a great creative sensibility that allows her to whip these things up and they are always better than what I can do)? The same is true when we look at teachers. Ask any principal who his best teachers are and I’m sure you will get a quick response, but look at the paper trail of observations and you’ll find many more teachers seemingly just as good. How do these administrators know who is the best and why don’t they record it? They know it because they walk around their buildings all day, hear reports from kids and parents, see who is early to work and late to leave. Principal Walk Throughs were one of my best tools to learn who my go-to teachers were–a non-evaluative and unannounced visit can provide a glimpse of what goes on in our classrooms.

Public Impact has created a new website and a report dedicated to the idea that identification and better use of our outstanding teachers can create substantial change for our students. Their assessment of the change possible is inspiring, until you realize that they have jumped right over the GIGANTIC task of IDENTIFYING these amazing people. The report doesn’t clearly state how to determine who the high fliers are, but does give some interesting predictions for what will happen when we use them properly. Okay, so we just ask those principals, right? The ones who are only allowed to use the paper trail of observations as evaluative tools even if they know better. Or maybe test scores? Let’s use test scores, ’cause all kids are exactly the same so it should be a snap to see if the teachers achieved something with their widget children. Oh, I know, let’s look at training and credentials. That should tell us. Or not. Seems that credentials aren’t at all a predictor.

The task of evaluating teachers must include bits of all of these things. Just like my eval of Carol: it includes my day-to-day sense, examples of great work, a piece of paper ranking some technical aspects, honesty on my part, and openness to being evaluated on hers. Before we can use our great teachers (or heavens, make staffing and salary decisions) we need to overhaul the current system of check lists for evaluation.



Lonestar 70.3 and standardized testing
April 14, 2010, 2:09 pm
Filed under: Nerd

In an attempt to keep this education related but still express some frustration with triathlon I’m going to build a tenuous link between Standardized Testing and Wave Starts at Lonestar 70.3.

As with any ‘test’, you need to know what you are measuring. In this case, Lonestar measures my ability to complete a 1.2 mile swim, a 56 mile bike ride, and a 13.1 mile run. While any 70.3 course may have different challenges (hills, wind, currents), all competitors must complete the distance. The argument is made by many that the playing field at any given race is level; all the entrants deal with the same terrain and constraints of the given course. Most races have leveled the playing field further by using electronic timing chips so that each individual athlete’s completion time is precise–my chip timer electronically starts when my feet cross the start line, not when the first of the 1500 entrants crosses the start line. This is also a method to prevent cheating and provide data–at certain points on the course (swim finish, bike start, bike turnaround, bike finish, run start, run turnaround or lap, run finish) the athlete and his chip will pass electronic markers. In this way, the race officials can start 100 or so athletes at a time, every 5 minutes let’s say, instead of 1500 at once, and still have accurate completion statistics for each entrant. They do their best, the race organizers, to keep us all happy and out of each other’s way (that’s why the pro athletes start before the rest of us–we would simply get run over), but sometimes the standardization methods make the playing field not so level. Those for whom the event counts as work (sponsored athletes) get a very level field–they start first, with people who are of the same caliber. The rest of us are lumped by age with no concern for our ability or level of preparation.
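If the chip business sounds abstract, here is a small sketch of the arithmetic. The function names and every clock time are invented for illustration (only the race date and the 7AM first wave come from this post), and real timing systems certainly do more than this.

```python
from datetime import datetime, timedelta

RACE_DAY = "2010-04-25"  # race date; the clock times below are made up

def at(clock):
    """Turn an HH:MM:SS clock reading into a datetime on race day."""
    return datetime.strptime(f"{RACE_DAY} {clock}", "%Y-%m-%d %H:%M:%S")

def chip_time(start_mat, finish_mat):
    """Elapsed time from when this athlete's chip crossed the start mat."""
    return finish_mat - start_mat

def gun_time(first_wave_start, finish_mat):
    """Elapsed time measured from the very first wave's start."""
    return finish_mat - first_wave_start

first_wave = at("07:00:00")   # pros go off
my_start = at("08:15:04")     # hypothetical: my feet cross the mat
my_finish = at("14:00:04")    # hypothetical finish

my_chip = chip_time(my_start, my_finish)
my_gun = gun_time(first_wave, my_finish)

print("chip time:", my_chip)
print("gun time: ", my_gun)
```

The chip fixes the clock, so a late wave costs me nothing on paper. What it cannot fix is the conditions I race in.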

Take my Lonestar 70.3 wave. My group is last. I start at 8:15AM. Sounds fine, right? Except that there are 14 or so other groups starting before me: the pros at 7AM, the young guys at 7:05, etc. Again, you ask, what’s the big deal? There are three things here that can negatively affect my performance and that are out of my control.

1. The winds in Galveston get stronger as the day goes on. Starting 75 minutes later than others means I will probably face a windier and therefore more challenging course.

2. It’s also hot in Galveston, the later the day, the hotter it gets. Again, beginning my race 75 minutes after the official start means running in higher temps than some of my competitors.

3. I’m an above average swimmer. I always catch up to the average and below average swimmers in the previous waves and find myself either running into them or navigating around them. Certainly others in earlier waves have the same problem, but they may only have 1 or 2 or 3 groups in front of them. I have 12.

What’s the connection to standardized testing? I ask you if the playing field for the entrants is actually level. Does every ‘competitor’ face the same circumstances that are beyond his control, like quality of teacher, temperature of the building, instructional material, parental involvement, breakfast? While I can’t control my start time in these events, I am participating voluntarily. Our kids are not.

PS: you can track my progress on 4/25 by going to ironmanlonestar.com and following athlete number 1444!



Triathlon and Teaching Cross Paths Again-AYP, FTP, NCLB, ESEA…
March 17, 2010, 3:01 pm
Filed under: Nerd

Sometimes I’m dumbstruck by how closely the stuff I learn from my coach and triathlon training relates to things we do in the classroom. Chuckie V (over there on my blogroll) had a great post today that, to translate for educators, was essentially about what tests measure and what we can learn from them. To sum up: the only thing we know for sure about tests is that they measure the ability to perform on said test, and even then sometimes the results change for reasons that are unrelated to preparation for the test.

There is this number many of us in cycling/triathlon shoot for, consider it the AYP of the bicycle world, called Functional Threshold Power, or FTP. To get this number you ride your ass off for 60 minutes as hard as you can, or you extrapolate from a shorter, probably harder ride. This number then sets the basis for future training. Sort of like when we pre-test kids or do lots of practice tests for the real test, then see where they struggle and try to fix it. The goal being that next time the numbers are better. Unfortunately, we often use these test numbers as measurements of other things. For example, some might think that if I have a high FTP I must be in good shape and will do well at longer distances than someone with a lower FTP. Just like one might think that a school with a high proficiency percentage is a better school than one with a lower proficiency percentage. But there is the rub. The test measures the test circumstances but leaves everything else out.

Heavier riders have higher power numbers than lighter riders. Certainly sometimes I can push bigger numbers than out of shape fat guys, but in general, I’m never going to have numbers like the men, or like the women with 20 pounds on me. Doesn’t make me slower though, just means I’m using less wattage. My FTP for one hour also doesn’t indicate how I’ll do at 5 hours. Maybe I’ve just trained and trained to have high numbers for one hour. I can only really get information when I compare my old FTP to my new FTP and use it to plan my paces for future workouts. And then measure again…
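One common way riders compare across body sizes is watts per kilogram. Here is a quick sketch of that idea; both rider weights and the second rider’s FTP are invented for illustration, and the function name is my own.

```python
def watts_per_kg(ftp_watts, mass_kg):
    """Power-to-weight ratio, rounded to two decimals."""
    return round(ftp_watts / mass_kg, 2)

# Hypothetical riders: a lighter one with a lower raw number,
# a heavier one with a higher raw number.
lighter = watts_per_kg(235, 60)
heavier = watts_per_kg(280, 80)

print("lighter rider:", lighter, "W/kg")
print("heavier rider:", heavier, "W/kg")
```

With these made-up numbers the rider with the smaller raw wattage comes out ahead per kilogram. Same test, different story once you account for who took it.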

Sound familiar?

Like maybe kids have just trained and trained to get the test questions right. Or that maybe we should consider some other factors, like native language or poverty or transience. Or maybe even compare a student’s current numbers to his past numbers. And then plan our instruction around where the students are…

Will the NCLB-The Sequel do this or will it just mean more testing at more grades in more disciplines? Will we get away from days of test prep? Will we hold up meaningless numbers and say they represent performance?

I don’t know, but I’m off to do a workout based on my FTP of 235: 30 minutes at 145 watts, 15 minutes at 200 watts, 10 minutes at 145 watts, 30 minutes at 200+ watts and a cool down. Ideally next time, my FTP will go up. Who knows what that really means…
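For the number nerds, a small sketch of how those intervals relate to the FTP. I am treating the "200+" block as a flat 200 watts and leaving out the cool down, both of which are simplifying assumptions, and the function names are my own.

```python
FTP_WATTS = 235

# (minutes, target watts) for each block of the workout, cool down omitted.
WORKOUT = [(30, 145), (15, 200), (10, 145), (30, 200)]

def percent_of_ftp(watts, ftp=FTP_WATTS):
    """Express a target wattage as a whole-number percent of FTP."""
    return round(100 * watts / ftp)

def average_power(intervals):
    """Time-weighted average watts across all intervals."""
    total_minutes = sum(minutes for minutes, _ in intervals)
    total_work = sum(minutes * watts for minutes, watts in intervals)
    return round(total_work / total_minutes, 1)

intensities = [percent_of_ftp(watts) for _, watts in WORKOUT]
avg = average_power(WORKOUT)

for (minutes, watts), pct in zip(WORKOUT, intensities):
    print(f"{minutes} min at {watts} W = {pct}% of FTP")
print("average power:", avg, "W")
```

Every target in the workout is just a fraction of one test result, so if the test number is off, the whole plan is off with it.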
