A Lesson For AI Tutors

Let's take a look at the literature...

Mar 11, 2026

EdTech Insider recently hosted a YouTube webinar on “The Future of AI Tutoring.” The panelists included executives from top organizations like OpenAI, Khan Academy, and Google’s DeepMind. These organizations have spoken very highly of tutoring:

DeepMind (December 2025): “One-to-one tutoring is widely considered the gold standard for personalized education.”
Sal Khan (TED Talk 2023): “If you were to give personal 1-to-1 tutoring for students, then you could actually get a distribution...a two standard-deviation improvement.”
Sam Altman, OpenAI (June 2023): “I’ve seen in the US there are big differences between classroom education and one-on-one tutoring, it’s two standard deviations.”

Different representatives appeared in the webinar, but they, too, were enthusiastic about tutoring. One part in particular caught my attention. They mentioned studying “the literature” on human tutoring. One guest even said that understanding that literature was “the easy part.” In other words, “We’ve done the studies. We know tutoring works. Now we have to figure out how AI can replicate this success.”

Here’s something strange they didn’t mention: None of that literature says anything about professional tutors. By “professional” I mean a tutor who has spent at least 1000 hours per year for at least 5 years in 1:1 student interactions.[1] Let’s run through some of the most famous studies, the ones you’ll find if you look on Khan Academy or ask an LLM for the most influential tutoring studies.

I had the most recent models of ChatGPT, Gemini, and Claude read these and 5 other top studies. In every case, the subjects were undergrads, graduate students, or amateurs of some kind. I could not find[2] any studies of professional tutors.

Now, if you just read the first study in my list, you might wonder why we need professionals. Bloom writes about two classes in which the students reportedly improved by two standard deviations (2 sigma) after various interventions, including tutoring, by people who are amateurs by my definition (graduate students). This is the study Khan and Altman are referring to in those earlier quotes. That is a truly incredible increase, roughly equal to improving from an 1100 to a 1500 on the SAT. To achieve that result across a group of students is astonishing.
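
To put that figure in context, here is a rough sketch of the arithmetic in Python. It assumes test scores are normally distributed and that the SAT has a standard deviation of about 200 points, which is what the 1100-to-1500 comparison implies. The names `percentile_after_gain` and `ASSUMED_SAT_SD` are made up for illustration.

```python
from statistics import NormalDist

# Assumption: an SAT standard deviation of about 200 points,
# implied by the 1100 -> 1500 comparison (the real figure varies by year).
ASSUMED_SAT_SD = 200


def percentile_after_gain(effect_size_sd, start_percentile=50.0):
    """Percentile a student reaches after gaining `effect_size_sd`
    standard deviations, assuming normally distributed scores."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    start_z = std_normal.inv_cdf(start_percentile / 100)
    return 100 * std_normal.cdf(start_z + effect_size_sd)


two_sigma_percentile = percentile_after_gain(2.0)
two_sigma_sat_score = 1100 + 2.0 * ASSUMED_SAT_SD

print(f"Average student after a 2 SD gain: percentile {two_sigma_percentile:.1f}")
print(f"An 1100 SAT score after a 2 SD gain: {two_sigma_sat_score:.0f}")
```

Under those assumptions, a two-sigma gain takes a student from the middle of the pack to roughly the 98th percentile.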

But interestingly, the two-sigma results of this study have never been replicated. The data from the study was actually taken from the work of two graduate students, neither of whom ever published anything else on tutoring. In fact, according to this excellent overview of Bloom’s study by U.T. Austin professor Paul T. von Hippel, “Among 96 tutoring studies the authors [of a 2020 study] reviewed, none produced a two-sigma effect.”

That’s not to say that there is no evidence to support the effectiveness of tutoring. The 2020 study von Hippel referred to reports that “tutoring programs yield consistent and substantial positive impacts on learning outcomes, with an overall pooled effect size estimate of 0.37 SD.”
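
For a sense of scale, the same kind of back-of-the-envelope conversion can be applied to the pooled estimate. This sketch assumes normally distributed scores and, purely for illustration, an SAT standard deviation of about 200 points. The function and constant names are hypothetical.

```python
from statistics import NormalDist

POOLED_EFFECT_SD = 0.37  # pooled estimate quoted from the 2020 study
BLOOM_EFFECT_SD = 2.0    # the "two sigma" claim
ASSUMED_SAT_SD = 200     # assumption for illustration, not from the study


def percentile_for_average_student(effect_size_sd):
    """Percentile reached by a 50th-percentile student after a gain of
    `effect_size_sd` standard deviations, assuming normal scores."""
    return 100 * NormalDist().cdf(effect_size_sd)


pooled_percentile = percentile_for_average_student(POOLED_EFFECT_SD)
pooled_sat_points = POOLED_EFFECT_SD * ASSUMED_SAT_SD
share_of_two_sigma = POOLED_EFFECT_SD / BLOOM_EFFECT_SD

print(f"Average student after a 0.37 SD gain: percentile {pooled_percentile:.1f}")
print(f"Rough SAT equivalent: about {pooled_sat_points:.0f} points")
print(f"Share of the two-sigma effect: {share_of_two_sigma:.1%}")
```

In other words, if these assumptions hold, the pooled effect moves an average student to about the 64th percentile: a real gain, but less than a fifth of two sigma.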

But how many of those 96 tutoring studies chose professionals as their subjects? Zero.

If teaching and learning were mechanical, straightforward processes, I don’t think this oversight would matter. But as I argued in my last post, math education is not mechanical. We don’t know exactly how the brain works. We have useful general theories, but they can only be applied with great skill and effort. If tutoring is the “gold standard” for education, doesn’t it make sense to study the very best practitioners? If you wanted to understand the best practices of medical experts, would you study medical students?

And, of course, the point of studying these experts is not merely academic: it’s to help people. You study the best doctors to help patients.

These companies dream of a world in which every student has an AI tutor. If they really want to help these students, they should have their AI tutors learn from professionals.

Note: Thanks to Mike Bergin of Tutor: The Newsletter for reading a draft of this post.

[1] This is a pretty low bar, by the way. There are many tutors with well over 10,000 hours of teaching experience.

[2] If you know of one, please send it to me!

PLG

Apr 3

This post hit home for me - I've got an 8 year old who we are trying to help with math. We have her in one of the reasonably well known enrichment/tutoring programs, which I see as a combination of curriculum, instruction and day care. I like the curriculum, but the instruction is average at best - it seems like they just hire moms who previously sent their kids there, and we wouldn't expect those people to be especially good teachers.

A friend and I are messing around with vibe coding an alternative - it's pretty easy to recreate sample problems that emulate the curriculum, but I'm trying to figure out how you could handle introducing concepts and helping students through challenges. It's not a high bar for the instruction to be better than the class she's in, but we're certainly not there yet.

I had wondered about researching how high quality tutors did it and whether there were best practices... interesting to find out that no one else has really done that either.

1 reply by Mike McGibbon

Jeff DeLisle

Mar 26

I find it hard to think of any of my good teachers, and separate out the content of what they taught from their personhood. The very best robot with the very best content, even if it could be constructed, would fail that test.

And why would you want to construct it anyhow? Would anyone be better off in a world of progressively fewer human encounters and progressively more encounters with machines?


Mathchops’s Substack

