As we enter the last week of September, school is starting to feel real again. But as everyone is locking in following their summer hiatus, there is one topic no one wants a refresher on: AI and the use of ChatGPT in the classroom. Ever since ChatGPT shook the academic world, teachers and administrators have grappled with the role, or lack thereof, that AI should play in the classroom. This year, however, AI may play a much bigger role than previously anticipated.
Last year, physics instructors Gregory Kestin and Kelly Miller led a study of their Physical Sciences 2 (PS2) course, a biology-based physics course, in which they implemented an AI tutor. In the resulting preprint, co-authored with Anna Klales, Timothy Milbourne, and Gregorio Ponti, students taught by the AI tutor outperformed students taught in an active learning lecture when both groups were tested after a week of lessons.
After a pre-test to gauge prior knowledge of the concepts covered, the students were tested again at the end of the week to measure their learning gains. As quoted in The Harvard Gazette, “Learning gains for students in the AI-tutored group were about double those for students in the in-class group.”
Inspired by the results from the PS2 research, Math 21A Course Head Eva Politou decided to conduct a more in-depth study this semester. “Last year, we had the component of the workshops in 21A where we had mixed feelings from students about it, [as] many of them say that it feels like an extra section of the week,” Politou said in an interview with the Independent.
This year, 21A students were given the choice between an AI tutor-led workshop and an in-person workshop led by undergraduate course assistants (CAs), each of which would meet once a week. These workshops were designed to increase student understanding of the course material with real-world applications. Politou’s study will examine student engagement with the instructors, both AI and CAs, by analyzing the quality of conversations generated in workshop discussions.
“If students are just kind of saying, ‘What did you get for that question? Oh, I got the same. Okay, let’s keep going’—That’s a less quality kind of conversation,” Megan Selbach-Allen, an education research scientist who is working alongside Politou for the study, explained. “Are they kind of engaging richly with the tutor? Or are they asking kind of quicker, shorter, I-just-want-to-get-this-done type questions?”
An important feature of this study is the duration and breadth of the data collection. Because Politou wants to explore the benefits of AI and human teaching beyond one or two weeks of lessons, the researchers seek to analyze students’ work and conversations inside and outside the classroom.
“There’s going to be a lot of different metrics that we’re looking at. So it’s going to be the [AI/in-person] logs themselves and the quality of those questions that we observe. It is going to be the reflections. It is going to be the kind of results on p-sets or exams,” Selbach-Allen said. From test scores to clips of student discussions, the 21A research team hopes to have a comprehensive understanding of student learning.
However, with any AI or technological innovation, it is also important to consider who might be displaced as a result. The main goal of the workshops is to promote deep thinking about mathematical concepts, and depending on how beneficial the AI tutor proves for students, CAs may or may not need additional training for future workshops.
“If we find that it doesn’t really make any difference, we might keep both for just the flexibility aspect,” Politou said about the results regarding AI tutors. “If it gives more, then maybe we will have to reconsider training for the CAs to allow people that still want to do it in person to be able to do it in person. So maybe we could have smaller sections for the in-person [workshops] with more trained people.” Course assistants can rest assured that their jobs are safe, for now.
Because the 21A study will take place over a longer period of time than the PS2 one, the results of Politou’s work will take at least a year, drawing on carefully reviewed data from filmed conversations, question logs, and written reflections. “That’s what we would call qualitative research. It’s not going to be as clearly quantified, but we probably will come up with some degree of quantification based on a rubric, then codes, maybe how many questions are happening, and some rating of the quality of those questions,” Selbach-Allen explained regarding the data analysis.
Even if there is no conclusive outcome from the study, Politou hopes to use the lessons learned to make the AI tool and workshops more efficient next term. “I’m not necessarily expecting an outcome; I’m very curious to see how this evolves.”
Caroline Stohrer ’28 (carolinestohrer@college.harvard.edu) dreams about the AI apocalypse when she falls asleep in class.