I can tell you exactly how the Khanmigo Jefferson chatbot responds to questions about Sally Hemings, at least how it responded back in October when I wrote up my experience pretending to be a fifth grader looking to complete a homework assignment. It responds with an approximation of an encyclopedia entry. And if you ask it questions that border on the sexually explicit it refuses to answer.
As you point out, the first problem with using LLMs as a tool for history education is confabulation. The second, as you also point out, is that LLMs generate language that has emotional and moral impact, but unlike writers and editors of encyclopedias, they have no agency or ethical understanding.
The third reason is that, unlike watching Bill and Ted, talking to one is deadly boring.
Sadly, the first two reasons are not likely to stop people from using them as educational tools. The third reason might.
Do you have a link to your writeup you could post in a comment? I think people who come across this would really benefit from seeing it. I can add a note to the main text linking to it as well.
One of the patterns that seems to keep repeating with these genAI applications is an initial sense of wonder/novelty that is relatively quickly replaced with boredom. The Sora video application or those instant song generators blow you away at first, but once the novelty wears off you're left asking "who cares?"
Even in Khan's book the paces he puts his own bot through are dull as dirt. The excitement for these applications seems to be built on a fantasy where we imagine a version much more capable than what's on offer.
Very kind of you to invite me to post a link! The original was posted in September on LinkedIn, but I appended it to a general piece on Khanmigo in December on Substack. Here is an internal link to the Substack post that should go straight to the Jefferson stuff. https://ailogblog.substack.com/i/139726012/what-happens-when-you-ask-khanmigos-thomas-jefferson-chatbot-about-sally-hemings
They are an excellent technology for producing short videos that tell stories about the future arriving. But getting people excited about a demo and getting people to pay a monthly subscription for an entertainment service are two different things.
I wonder if your feelings about Khanmigo and its overall promise have evolved at all since the experience you wrote about in September. I'm a skeptic, as I made clear in my review of Khan's book (and here), so I'm interested in hearing from people who see more potential and what they're seeing.
My dial has been sliding toward the skeptical all year. I have always admired Khan Academy as a resource to help a kid and whoever the kid has helping with homework. Maybe not as highly as Jimmy Wales, but I rated Sal Khan pretty high on the list of heroes of the internet.
As I've grown more convinced that anthropomorphizing LLMs prevents us from understanding how they actually work, I've also realized how unhelpful it is when I'm trying to help my sixth-grader with homework. And hooking GPT-4o up to Khanmigo won't help. A video illustration is much better than an LLM.
I guess Khan was always part of Silicon Valley's deep commitment to decontextualized but technically personalized learning technology as the way to approach educational reform, but now he is the spokesmodel for the whole movement.
It's interesting how thoroughly besotted they become with the concept of this infinitely patient tutor as a solution to the problem of education, rather than sticking with making good educational content available. From what I've seen Khanmigo's presentations of topics are considerably worse, from a content perspective, than the video explanations. The interactivity not only seems to add nothing, it makes it less useful.
Ironically, my church once had a Thomas Jefferson re-enactor give a service. I remember someone in the congregation asking "Thomas Jefferson" about his views on women's rights (not high) and the re-enactor's response, which included context (Thomas Jefferson was limited in his ability to see women outside the roles of mothers, wives, and caretakers). This was in front of a congregation of mostly adults who understood the game. Letting children in particular be exposed to a simulacrum of some historic figure in an educational setting seems highly irresponsible.
It's uncanny how this is all converging. Tom Mullaney was recently included in my "Stage of Skeptics" (https://www.linkedin.com/pulse/ai-powered-tutors-oversold-underappreciated-thomas-ho-e4vgc/?trackingId=KCFabWBCSOeedUhNcBpNQg%3D%3D) when Vicki Davis interviewed him on her podcast (https://www.coolcatteacher.com/some-big-ai-problems-the-eliza-effect-and-more/?utm_campaign=coschedule&utm_source=linkedin&utm_medium=Vicki%20Davis&utm_content=Some%20Big%20AI%20Problems%3A%20The%20Eliza%20Effect%20and%20More).
I think this is inevitable once more people have specific exposure to the actual interface outside of the hype with which these products are introduced. Reading Khan's book, my primary response was "Is this it?"
One of the mistakes the boosters of this technology are making is forgetting that at some point people will actually use it, and when the experience fails to match the hype, disillusionment will arrive rather quickly.
I really appreciate this breakdown, John!
This use case has always felt ick (I know, a technical term) so I just never did it.
I get your point. This exercise creates a caricature of the person, but our students will see it as authoritative.
A lot to think about...
Very glad to know it was helpful to you. With a lot of this stuff I feel like the novelty gets ahead of full consideration of how it actually works.
You can put these chatbots in front of students without short-circuiting skepticism.
Have them interview the bot after they have received the content from you and then write an essay answering a prompt along these lines: "How effective was the Rembrandt bot at mimicking the Rembrandt you learned about from _____? How effective was the Rembrandt bot at furthering your understanding of the artist, his life, and his works?"
This forces the student to evaluate the bot against the historical texts that you provided. They will often come to the conclusion that the bots are worthless, thereby delivering the lesson you hoped to deliver while letting them draw the conclusion themselves.
I had my students do this with a Holden Caulfield chatbot last September. Many of them concluded the bot was either ineffective at mimicking Holden or ineffective at furthering their understanding. To me, this is the best kind of critical thinking. Let them evaluate the bot themselves.
https://www.edsurge.com/news/2024-02-09-how-a-holden-caulfield-chatbot-helped-my-students-develop-ai-literacy
Later in the year I found out that the members of this class all now thought Character.ai was dumb. My other class - who did not do this project - was hooked on the platform. My only regret was that I didn't put the bot in front of them too.
But why would I spend so much time having them interact with a bot? What is the end goal of this kind of learning experience? I don't intend to deliver any lesson about the bots because I don't think the bots are relevant or interesting when it comes to the experiences of learning.
There's definitely a use in familiarizing students with the tech and making them think critically about it in the way you describe, but if I do that once, mission accomplished. I'd be ready to move on to something genuinely more meaningful than bouncing my ideas and intellect off of something that cannot think, feel, or communicate with intention.
Why keep playing with the bots?
I see what you mean, but perhaps I should have told you about the students that didn't get it...
For some students you will see that their ability to critically review the bot-produced information and analyze it next to what you gave them simply is not there. They can't do it. They believe the bot when it says stupid stuff, or you see that they didn't actually understand Rembrandt at all. In those situations, it's a pretty good assessment technique for the content you delivered. Sometimes you can see right in the chat or in their review of the bot that they have no idea who Rembrandt is. In the age of ChatGPT, that's a better assessment than an at-home essay.
Beyond that, there are some kids who understood the content (Rembrandt) but didn't have the critical reviewing skills to cross-reference and/or compare it against the information they had on hand to draw the correct conclusions. They are kind of "on their way" but maybe at a B-level in terms of this particular skill. So, re-using the exercise is a good way to build the skill of engaging in comparative analysis with their own domain expertise, drawing conclusions, and communicating them clearly.
For the kids that "get it," you are right. There's not much else to say to them. But that would be like saying, "This kid got a 100 on this essay, so I'm not gonna teach him essays anymore." AI Literacy is a writing skill. The content the bots produce is different every time, which means those engagements are abstract experiences. They have to react in real time, akin to writing a different type of essay on a different type of topic. So, while the student might be bored (they probably will be), it's still an effective tool for reinforcing their learning.
So - I am not saying to use it "all the time," but you'd be surprised what you can learn about a student from "grading their chats" and asking them to critically review and reflect on the effectiveness of a tool.
I think that all sounds fine, but interacting with the bots isn't necessary to achieve any of those goals and we had the capacity to make those distinctions long before this technology arrived. Why do we now think these activities should be done by interacting with the bots? Isn't that a kind of failure of critical literacy by not questioning the employment of the technology in the first place, e.g., the moral injury of resurrecting the dead in the form of a bot?
I think where I probably disagree is that "AI literacy is a writing skill." This seems to presuppose that AI literacy is a necessity as part of a writing practice and I think we have no evidence of that, certainly not when it comes to learning to write.
I'm sure grading the chats reveals some insights, but are they demonstrably better than quality reflective writing or experiences? You've made the bot responsive to some of your pedagogical goals for sure, but I don't see anything unique to the bots that can't be (and hasn't been) accomplished without them.
I know some argue that we need to employ these things because the tech is "inevitable" but that's not a value I accept, personally, as I consider my own approaches.
All good points. I don't disagree with anything you've said.
A quote that comes to mind for me is, "We all want to save the world, we just disagree on how to do it."
It is definitely true that there are other ways to build the skills we've been talking about without using bots. But there is only one way to build critical AI Literacy: put the bots in front of students and ask them to evaluate them.
For me, that is the way to "save the world." I can keep them out of my classroom, sure, but I can't keep them out of my students' hands. So, I have a responsibility to teach this to them. If I can pair them with content understandings and evaluations, then I am really hitting the mark.
So, to be clear, I'm not using them because I think they are a great pedagogical tool. I'm using them because I feel like I have to. Sure, I can say "not in my classroom," but what good does that do my students? Based on your responses, I think that is the fundamental point on which we disagree (which is healthy and fine). Maybe I am beating a dead horse, but I think that is at the core.
I would also say that yes, in some cases, the chats that I graded were demonstrably better than the traditional reflective writing experiences I have employed. That may be a failure of my traditional writing assessments, but I did find (for some of my students) that the chats were stunning windows into their thinking processes that I never really had access to before as a teacher. (This is not limited to character bots, by the way.)
It's complicated. I think we agree that character bots are stupid and were not a good idea in the first place. We're just reacting to their presence differently. More than one way to skin a cat. I appreciate this chat, thank you for engaging in this way with me.
I agree that classrooms can't be walled off from this stuff, and the ways and whys of how AI can or should be engaged are infinite, but I really worry about giving in to the "inevitability" narrative out of the gate as a frame for critically engaging with the technology. One of the avenues available to students (and teachers, etc.) must be rejection, otherwise we're not allowing critical engagement. If the signal we send is that interacting with bots is inevitable (because jobs, or whatever), I think that's an abdication of our responsibility to give students freedom and agency over their lives.
I'm pretty convinced that this tech is hastening an environmental apocalypse and as the AI companies are racing toward AGI as a savior, we're going to drain our groundwater and exhaust our energy while cooking the planet in the process. If these aren't also questions put in front of students, IMO, we're doing the bidding of tech companies that see a future where our lives require us to interact with bots.
Not a bad point. The environmental factor is a creeping issue that no one wants to pay attention to. It feels like it is lurking in the corner.