A Lesson For AI Tutors
Let's take a look at the literature...
EdTech Insider recently hosted a YouTube webinar on “The Future of AI Tutoring.” The panelists included executives from top organizations like OpenAI, Khan Academy, and Google’s DeepMind. These organizations have spoken very highly of tutoring:
DeepMind (December 2025): “One-to-one tutoring is widely considered the gold standard for personalized education.”
Sal Khan (TED Talk 2023): “If you were to give personal 1-to-1 tutoring for students, then you could actually get a distribution...a two standard-deviation improvement.”
Sam Altman, OpenAI (June 2023): “I’ve seen in the US there are big differences between classroom education and one-on-one tutoring, it’s two standard deviations.”
The webinar featured different representatives from these organizations than the people quoted above, but they, too, were enthusiastic about tutoring. One part in particular caught my attention. They mentioned studying “the literature” on human tutoring. One guest even said that understanding that literature was “the easy part.” In other words, the message was: we’ve done the studies, we know tutoring works, and now we have to figure out how AI can replicate this success.
Here’s something strange they didn’t mention: None of that literature says anything about professional tutors. By “professional” I mean a tutor who has spent at least 1,000 hours per year for at least 5 years in 1:1 student interactions.[1] Let’s run through some of the most famous studies, the ones you’ll find if you look on Khan Academy or ask an LLM for the most influential tutoring studies.
Bloom, B. S. (1984). The 2 Sigma Problem: The Search for Methods of Group Instruction as Effective as One-to-One Tutoring.
VanLehn, K. (2011). The Relative Effectiveness of Human Tutoring, Intelligent Tutoring Systems, and Other Tutoring Systems.
Chi, M. T. H., & Wylie, R. (2014). The ICAP Framework: Linking Cognitive Engagement to Active Learning Outcomes.
Wood, D., Bruner, J. S., & Ross, G. (1976). The role of tutoring in problem solving.
Lepper, M. R., Woolverton, M., Mumme, D. L., & Gurtner, J. L. (1993). Motivational techniques of expert human tutors: Lessons for the design of computer-based tutors.
I had the most recent models of ChatGPT, Gemini, and Claude read these and 5 other top studies. In every case, the tutors studied were undergrads, graduate students, or amateurs of some kind. I could not find[2] any studies of professional tutors.
Now, if you just read the first study in my list, you might wonder why we need professionals. Bloom writes about two classes in which the students reportedly improved by two standard deviations (2 sigma) after various interventions, including tutoring, by people who are amateurs by my definition (graduate students). This is the study Khan and Altman are referring to in those earlier quotes. That is a truly incredible increase, roughly equal to improving from an 1100 to a 1500 on the SAT. To achieve that result across a group of students is astonishing.
But interestingly, the two-sigma results of this study have never been replicated. The data from the study was actually taken from the work of two graduate students, neither of whom ever published anything else on tutoring. In fact, according to this excellent overview of Bloom’s study by U.T. Austin professor Paul T. von Hippel, “Among 96 tutoring studies the authors [of a 2020 study] reviewed, none produced a two-sigma effect.”
That’s not to say that there is no evidence to support the effectiveness of tutoring. The 2020 study von Hippel referred to reports that “tutoring programs yield consistent and substantial positive impacts on learning outcomes, with an overall pooled effect size estimate of 0.37 SD.”
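For a sense of scale, here is a rough sketch of what those two effect sizes would mean for a student who starts at the median. It assumes scores are normally distributed, and it assumes an SAT standard deviation of about 200 points, which is the figure implied by my 1100-to-1500 comparison above rather than an official number. The function name is my own, purely for illustration.

```python
from statistics import NormalDist


def percentile_after_gain(effect_size_sd, start_percentile=50.0):
    """Percentile a student reaches after gaining `effect_size_sd` standard
    deviations, assuming normally distributed scores (a simplification)."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    start_z = std_normal.inv_cdf(start_percentile / 100)
    return 100 * std_normal.cdf(start_z + effect_size_sd)


# Assumption: roughly 200 points per standard deviation on the SAT,
# implied by the 1100 -> 1500 comparison, not an official figure.
ASSUMED_SAT_SD = 200

effects = [
    ("Bloom's reported 2 sigma", 2.0),
    ("2020 pooled estimate", 0.37),
]

for label, effect in effects:
    pct = percentile_after_gain(effect)
    points = effect * ASSUMED_SAT_SD
    print(f"{label}: 50th -> {pct:.1f}th percentile, about {points:.0f} SAT points")
```

Under those assumptions, a two-sigma gain moves a median student to roughly the 98th percentile, while a 0.37 SD gain moves the same student to roughly the 64th. Both are real improvements, but they are very different claims.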
But how many of those 96 tutoring studies chose professionals as their subjects? Zero.
If teaching and learning were mechanical, straightforward processes, I don’t think this oversight would matter. But as I argued in my last post, math education is not mechanical. We don’t know exactly how the brain works. We have useful general theories, but they can only be applied with great skill and effort. If tutoring is the “gold standard” for education, doesn’t it make sense to study the very best practitioners? If you wanted to understand the best practices of medical experts, would you study medical students?
And, of course, the point of studying these experts is not merely academic: it’s to help people. You study the best doctors to help patients.
These companies dream of a world in which every student has an AI tutor. If they really want to help these students, they should have their AI tutors learn from professionals.
Note: Thanks to Mike Bergin of Tutor: The Newsletter for reading a draft of this post.
[1] This is a pretty low bar, by the way. There are many tutors with well over 10,000 hours of teaching experience.
[2] If you know of one, please send it to me!


