Tuesday, June 10, 2014

How 97,000 AP Physics exams get scored

I was fortunate to be a reader this year for the AP Physics exams. To be considered, I had to have taught AP Physics for at least three years before filling in the application. It took another four years to be invited to be a reader.

Readers spend just over a week at their reading location, with all travel and expenses paid by the College Board. This year's physics reading occurred in Kansas City, Missouri, at the same time as the Psychology and European History readings. A total of 205 physics teachers from around the world came to grade over 90,000 exams.

New readers are called "Acorns" and are given an acorn logo on their name badge. While you might expect this to be an invitation for hazing, we were welcomed and treated quite nicely.  Name badges were required for entry into any of the reading areas to ensure security of test materials.

AP reader name badge with acorn indicating first-time reader

For seven consecutive days, we worked from 8:00 a.m. till 5:00 p.m., with an hour for lunch and two 15-minute breaks. Varied activities were scheduled most evenings, including receptions, professional development sessions and workshops given by the College Board.

Behind the Scenes

Before the reading begins, there is a remarkable process by which all exams are transported from the individual schools to the College Board offices in New Jersey. There, the exams are sorted, tagged and assembled into folders. Each folder contains 25 exams and a set of computer bubble-sheets, one for each question. Ten of these folders are packed into each of thousands of cardboard boxes and shipped to the respective reading locations, which this year included Louisville, KY; Cincinnati, OH; Salt Lake City, UT; and Kansas City, MO.
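The packing arithmetic above can be sketched in a few lines. This is only an illustration: the 97,000 figure comes from the title of this post, the function name is my own, and I am assuming every folder and box is filled completely, which may not match how the exams were actually packed.

```python
import math

EXAMS_PER_FOLDER = 25   # from the description above
FOLDERS_PER_BOX = 10    # from the description above


def packing(total_exams):
    """Return (folders, boxes) needed, assuming folders and boxes are filled in order."""
    folders = math.ceil(total_exams / EXAMS_PER_FOLDER)
    boxes = math.ceil(folders / FOLDERS_PER_BOX)
    return folders, boxes


# 97,000 is the exam count from the post title; full folders are an assumption.
folders, boxes = packing(97_000)
print(f"{folders} folders in {boxes} boxes")
```

Under those assumptions, a full box holds 250 exams, so the physics exams alone would fill a few hundred boxes.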

Each reader is assigned a specific question from one of the exams. I was grading the second free response question on the AP Physics B exam, referred to as "B2". There were an additional twelve readers assigned to my question, as well as two "Table Leaders" who arrived before the readers to fine-tune the grading rubric along with the Chief Reader.

The first day of reading began with a welcome meeting giving general instructions and introducing the various table leaders. We then each followed our table leaders to our reading locations. The large conference rooms (think hotel ballroom) were split into individual question rooms by portable curtains. Our room had tables up front where boxes of exams were assembled and where packets were picked up and dropped off for grading. On each side of these tables was a flip chart for keeping a tally of the number of folders each reader completed every session. Table leaders occupied two additional tables facing the room, and the remaining tables, each seating two readers, faced the table leaders.

We went through brief introductions and began training on our specific question. Sets of student responses were copied and distributed for us each to grade according to the rubric, after which we compared our scores on each of the four parts of the question. We discussed any discrepancies and continued with several sample packets and discussions to ensure that we were all interpreting and applying the rubric consistently.

Following lunch, the reading began.

In addition to the physics teachers flown in from around the world, there was a local workforce hired to manage the flow of boxes and exams between rooms. They would open boxes and prepare the packets for pick-up by pulling the proper question scan sheet from a pocket in the back of the folder and placing it in the folder in front of the pack of exams. They would also perform quality control on the returned packets, ensuring the scan sheets were properly bubbled and all information was complete before re-packaging the boxes of exams to be moved to another room for another question to be graded.

The scoring process consisted of picking up a packet from the front of the room, returning to my desk, filling in my reader number on the edge of the folder to indicate that question 2 had been read, and bubbling in my reader number on the scan sheet. I then would work through the 25 exams, fairly and consistently applying the rubric and bubbling in the question 2 score on the scan sheet. These are the actual packets the students wrote on during the test administration (I had thought these might have been scanned in and we'd be working from images). When complete, I'd bring the folder back to the return/quality control table, post a tally mark on the flip chart and grab another folder.

On the first day of the reading, every folder was dropped off with the table leader instead of at the return table, and the table leader would grade the same set of exams. Any discrepancies of more than a point would be discussed. Once we were getting the same scores, the remaining packets could be turned in to the return table. This grading quality control continued throughout the week, with table leaders requesting packets at various times through the day. Each scan sheet contained two columns for grading, and the completed scan sheets were processed for statistical control, both to keep track of each reader's overall productivity and to identify any deviation from table leader scores on identical packets or from the rest of the group in terms of overall average scores on the question. These statistics were examined daily and throughout the day by the reading leadership.
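I never saw how the scan-sheet statistics were actually computed, so the following is only my own sketch of the two checks described above: flagging exams where a reader and a table leader differ by more than a point, and comparing a reader's average with the group's. The function names and the sample scores are hypothetical.

```python
from statistics import mean


def flag_discrepancies(reader_scores, leader_scores, tolerance=1):
    """Return positions of exams where the two scores differ by more than `tolerance`."""
    if len(reader_scores) != len(leader_scores):
        raise ValueError("both score lists must cover the same exams")
    return [
        i
        for i, (r, l) in enumerate(zip(reader_scores, leader_scores))
        if abs(r - l) > tolerance
    ]


def mean_offset(reader_scores, group_mean):
    """How far a reader's average score sits above (+) or below (-) the group average."""
    return mean(reader_scores) - group_mean


# Hypothetical scores for four exams read by both a reader and a table leader.
reader = [3, 5, 7, 2]
leader = [3, 4, 4, 2]
print(flag_discrepancies(reader, leader))   # positions worth discussing
print(mean_offset(reader, group_mean=4.0))  # reader's average relative to the group
```

A one-point difference is tolerated here because only discrepancies of more than a point were discussed; the real process may well have used other thresholds or measures.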

Productivity

I kept track of my daily total of individual exams scored, while table leaders kept track of folders scored for each reading period. By the third day of reading, the pace of scoring had reached a point where a typical reader could complete about ten packets per session.

I was surprised to see productivity continue to increase as much as it did over the first few days of reading. I was so excited about my numbers that I was tweeting them to followers, and a former student took it upon himself to graph my progress.

Courtesy https://twitter.com/kywilli

By the end of the reading, I had personally scored a total of 5,620 exams.
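To put that total in perspective, here is a rough back-of-the-envelope calculation. It combines the 5,620 exams with the 25-exam folders described earlier and the 22.5-second average per exam that I mention in the conclusion; the function name is my own, and the result is an estimate of scoring time, not a record of hours actually worked.

```python
EXAMS_SCORED = 5_620       # my total for the reading
SECONDS_PER_EXAM = 22.5    # my reported average, including bubbling and walking
EXAMS_PER_FOLDER = 25      # folder size described earlier


def scoring_summary(exams, seconds_per_exam, exams_per_folder):
    """Derive rough totals from an exam count and an average time per exam."""
    total_seconds = exams * seconds_per_exam
    return {
        "hours": total_seconds / 3600,
        "exams_per_hour": 3600 / seconds_per_exam,
        "folders": exams / exams_per_folder,
        "minutes_per_folder": exams_per_folder * seconds_per_exam / 60,
    }


for name, value in scoring_summary(
    EXAMS_SCORED, SECONDS_PER_EXAM, EXAMS_PER_FOLDER
).items():
    print(f"{name}: {value:.3f}")
```

At that average, the total works out to roughly 35 hours of scoring, about 160 exams per hour and a little over nine minutes per folder.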


Conclusion

Grading the exams was a remarkable learning experience, both for the actual process and for the after-hours professional development and camaraderie with fellow physics teachers. But I also came away with several tips for students taking exams in the future:


  1. Read Directions: Many students were likely approaching problems and drawing diagrams just the way their teachers had taught them, but many of these approaches and drawings did not earn credit because they did not follow the clear directions stated in the problem.
  2. Don't Bother Writing an Irrelevant Essay: Students might think it is amusing to write long letters or fiction or complaints to the person grading the exam, but chances are your work will never be read. I took 22.5 seconds on average to grade each exam, which included bubbling the scan sheet and walking the test to the front of the room. I don't have time to read your essay, and if it is in the "justify" section, it'll just upset me if you waste my time. A brief display of your artistic talent, however, was appreciated on occasion, especially if you got the rest of the question right.
  3. Show Your Work: Many points were lost for "immaculate answers" or answers where the origin of the solution was unclear. This isn't a test about getting the right number; it is a test about demonstrating knowledge of physics.

I hope I get the chance to return every year to repeat this experience. It certainly will make me a better teacher, and I'll be better able to prepare my students to demonstrate their knowledge of physics.

Oh, and I also got a nifty new geeky physics tee shirt to commemorate the experience.


Disclaimer: Although I was paid by ETS for my participation in scoring AP exams, these opinions are my own and I am in no way authorized to represent the opinions of ETS. It is my understanding that none of the above discloses any proprietary information.