As stated above, the primary intent of the Hot Seat was to provide an opportunity for all students to earn points for meaningful class participation. Throughout the semester, I noticed a surprising new phenomenon at the end of class: Student groups were planning review sessions to prepare for their days in the Hot Seat! I was curious how this preparation before class would affect student learning. My investigation was guided by several questions listed as section headings below.
To gauge effectiveness at improving subject mastery, I analyzed responses on both the midterm and final exams. During the 2002–2003 and 2003–2004 academic years, instructors at Elon University used Universe: The Solar System (first edition) by Freedman & Kaufmann. The accompanying Test Bank for Universe Sixth Edition by Clark & Wilson (2002) was the source of nearly all of the midterm and final exam questions.
As another measure of student learning, I administered the Astronomy Diagnostic Test (ADT) Version 2.0 (Hufnagel 2002) at the beginning and end of the course. This commonly used conceptual exam is a descendant of the Project STAR Astronomy Concept Inventory (Sadler 1998) and the Misconceptions Measure (Zeilik, Schau, & Mattern 1998; see Bailey 2003 for a full history). The ADT covers a range of basic astronomical topics, including diurnal motion, lunar phases, and global warming. Deming (2002) has demonstrated both the reliability and validity of this test in a nationwide sample of 5,346 precourse scores and 3,842 postcourse scores.
Of the 21 astronomy questions on the ADT, only questions 2, 5, 6, 8, 14, 18, and 21 concerned material covered on a day in which students were in the Hot Seat. In Table I, I summarize the relevant parameters for all three exams (i.e., midterm, final, and ADT), including the number of students, the total number of questions, and the number of questions related to material covered by a group in the Hot Seat.
This first question is the most obvious one. In Table II, I list the percentage of correct responses in the following two subsets: (1) responses by students to questions related to their Hot Seat material, and (2) responses by the remaining students to these same questions. There is a significant difference between these two populations. Combining the midterm and final exams, there is only a 0.04% probability that the improved performance arose by chance. Curiously, most of this significance derives from the midterm (t=3.0, p=0.15%). The final exam results are also improved for the Hot Seat students but are nevertheless consistent with no gain (t=0.47, p=32%).
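The comparison above can be sketched as a standard two-proportion test: pool the fraction of correct responses under the null hypothesis of no difference, then convert the observed gap into a z-score and a one-tailed chance probability. The counts below are hypothetical, chosen only for illustration; they are not the values behind Table II.

```python
import math

def two_proportion_z(correct_a, total_a, correct_b, total_b):
    """Two-proportion z-test: is group A's fraction correct higher than B's?"""
    p_a = correct_a / total_a
    p_b = correct_b / total_b
    # Pooled proportion under the null hypothesis of no difference
    p_pool = (correct_a + correct_b) / (total_a + total_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / total_a + 1 / total_b))
    z = (p_a - p_b) / se
    # One-tailed chance probability from the normal distribution
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p_value

# Hypothetical counts for illustration only (not the paper's data):
z, p = two_proportion_z(80, 100, 260, 400)
print(f"z = {z:.2f}, one-tailed p = {p:.4f}")
```

With large samples this z-statistic behaves like the t-statistics quoted in the text; a full analysis would use the actual per-question response counts.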
As a second test of the effectiveness of the Hot Seat, I performed a similar analysis using the ADT. Here I compared gains by students on questions related to their Hot Seat topic (g=0.33) with the gains by the remaining students answering these same seven questions (g=0.16). The results for each question appear in Table III. Although the gain of the Hot Seat students is higher, with so few questions the difference is only marginally significant: 1.7 sigma, corresponding to a chance probability of 5.0%.
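The gain g reported here is the standard normalized gain: the fraction of the possible improvement that students actually realized between the pre- and postcourse ADT. A minimal sketch, using invented pre/post percentages chosen only to reproduce gains of 0.33 and 0.16 (the paper's actual pre/post scores are not given here):

```python
def normalized_gain(pre_pct, post_pct):
    """Hake's normalized gain: (post - pre) / (100 - pre),
    i.e., the fraction of the possible improvement realized."""
    return (post_pct - pre_pct) / (100.0 - pre_pct)

# Hypothetical pre/post percentages for illustration (not the paper's data):
g_hot_seat = normalized_gain(40.0, 60.0)
g_audience = normalized_gain(40.0, 49.6)
print(round(g_hot_seat, 2), round(g_audience, 2))  # -> 0.33 0.16
```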
Although it would be interesting to see whether students not in the Hot Seat benefited (or suffered) from other students being in the Hot Seat, there is little reason to expect such an effect. To examine this issue, I used the seven ADT questions related to Hot Seat material. The gains on these questions by students sitting in the Hot Seat audience (19.7±2.8%) were not significantly different from those of previous semesters (18.8±3.8% in spring 2003, and 22.7±5.3% in fall 2003). I conclude that there is no evidence that the Hot Seat system detracted from the learning experience of students not participating in it on a given day.
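The "not significantly different" claim can be checked directly: treat each semester's gain as an independent measurement with its quoted uncertainty, add the uncertainties in quadrature, and express the difference in sigma. Using the percentages from the text:

```python
import math

def sigma_difference(x1, err1, x2, err2):
    """Significance (in sigma) of the difference between two independent
    measurements, with uncertainties added in quadrature."""
    return abs(x1 - x2) / math.hypot(err1, err2)

# Gains on Hot Seat-related ADT questions (values from the text, in percent):
current = (19.7, 2.8)
spring_2003 = (18.8, 3.8)
fall_2003 = (22.7, 5.3)
print(round(sigma_difference(*current, *spring_2003), 2))
print(round(sigma_difference(*current, *fall_2003), 2))
```

Both comparisons fall well below one sigma, consistent with the conclusion that the audience's learning was unaffected.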
On both the midterm and final exams, nearly all of the students were included in both the Hot Seat population and the audience population, depending on the question. However, in the analysis of the ADT, only seven Hot Seat-related questions, corresponding to five groups, were used in the test sample. It is not unreasonable that these students could, by chance, have been intrinsically brighter. To test this, I compared their performance with that of the remaining students on the 14 ADT questions unrelated to any Hot Seat material. The Hot Seat population in fact scored lower, though the difference was not statistically significant (12.4±5.0% vs. 18.3±1.8%).
The midterm exam questions had an average difficulty of 0.69 and an average discrimination of 0.37; the final exam questions had an average difficulty of 0.71 and an average discrimination of 0.36. Although these averages are acceptable, they do not rule out that a nonnegligible fraction of individual items discriminates poorly.
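For readers unfamiliar with these item statistics, a minimal sketch of one common way to compute them follows. Difficulty is the fraction of students answering an item correctly; discrimination is taken here as the upper-minus-lower group difference using a 27% split on total score. This is one conventional definition, not necessarily the exact procedure used for the exams above, and the response matrix is invented for illustration.

```python
def item_stats(responses):
    """Classical item analysis on a 0/1 response matrix (rows = students,
    columns = items).  difficulty = fraction correct;
    discrimination = upper-group minus lower-group difficulty (27% split)."""
    n = len(responses)
    totals = [sum(row) for row in responses]
    order = sorted(range(n), key=lambda i: totals[i])
    k = max(1, round(0.27 * n))           # size of the upper/lower groups
    lower, upper = order[:k], order[-k:]
    stats = []
    for j in range(len(responses[0])):
        difficulty = sum(row[j] for row in responses) / n
        disc = (sum(responses[i][j] for i in upper) -
                sum(responses[i][j] for i in lower)) / k
        stats.append((difficulty, disc))
    return stats

# Tiny hypothetical response matrix (1 = correct), illustration only:
data = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
for d, D in item_stats(data):
    print(round(d, 2), round(D, 2))
```

An item with discrimination near zero (or negative) is answered correctly about as often by low scorers as by high scorers and contributes little to ranking students.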