I understand why tech people are enthused about generative AI-powered chatbots of historical figures - they’re a neat way to demonstrate the capabilities of the models - but I believe educators should be much less enthusiastic. In fact, I think there are numerous reasons they should be rejected as part of a teaching and learning experience.
Just say no to digital necromancy, I say.
Let me count the reasons.
Not actually a conversation
While Sal Khan of Khan Academy and the Khanmigo tutoring platform is not the only enthusiast for historical figure chatbots, he is the handiest example, since he touts these applications as part of Khanmigo and devotes a chapter to them, titled “Conversing with History,” in his recently published book, Brave New Words: How AI Will Revolutionize Education (and Why That’s a Good Thing).
Khan argues that “good history and civics teachers make the past interesting. Great history and civics teachers make the past come alive.”
A cliche, but fair enough. In the wrong hands, history can be an uninspiring slog through names and dates, disconnected from any larger meaning or the lives of students. History absolutely should be made fascinating because it is fascinating.
To that end, Khan believes that giving students chatbots representing historical figures to interact with could be transformative in allowing students to see history come alive. In the book, he gives the example of chatting with the master painter of shadow and light, Rembrandt van Rijn. This is the student/bot exchange as imagined by Khan:
I ask, what is the difference between this response from Rembrandt to a rather pedestrian question and a dry textbook description of his work, other than being rendered in the first person?
I am also struck by Khan’s impoverished view of student curiosity. If we truly believed that we could “chat” with a long dead historical figure, is this what a curious, engaged student would ask them?
The pattern in all of Khan’s examples is not a “conversation” in which there is a mutual exchange among parties, but a game of query and response where the student asks a question and the bot replies.
Is this actually engaging? Khan claims this is an “immersive” experience, but what’s immersive about it?
The danger of false authority and short-circuiting skepticism
Khan likens the chatbot simulations to “reenactment performers, such as those at Colonial Williamsburg,” but when we talk to historical reenactors, we clearly understand that we are inside a simulation. Khan’s own framing of the technology is that we are “having a chat” with a historical figure. Is this history coming alive or not?
The problem of presenting historical information through the mouths of historical figures to students without the knowledge or capacity to think critically about those utterances is an obvious red flag. What do teachers do when an LLM hallucination sneaks through in the voice of George Washington?
The historical figure chatbot is, at its heart, a relatively cheap and short-lived form of “engagement,” one likely to short-circuit the higher-order thinking skills rooted in critical engagement and skepticism. Writing in the journal Critical A.I., Maurice Wallace and Matthew Peeler argue, “By pretending historical authenticity, they endow their impersonations with an air of direct authority no skepticism can easily challenge.”
To make the chatbots useful and use them within the proper context, we have to simultaneously trust and be suspicious of their outputs. If the goal of the exercise were to explore the limits of generative AI chatbots, this could be a productive tension, but we’re supposed to be watching history come alive, not policing hallucinations.
Presenting the material through the persona of the historical figure will create far more confusion than clarity among students.
The Bill and Ted effect of yanking historical figures out of time
In Bill and Ted’s Excellent Adventure, one of the greatest masterpieces of film history, the titular characters use a time machine to collect various figures from history (Socrates, Napoleon, Freud, Joan of Arc, Abraham Lincoln, et al.) and bring them to the present so they can pass their history assignment. Once in the present, the historical figures start doing some crazy s__t.
When Freud tries to explain his theories, he’s mistaken for a predator. Joan of Arc takes over an aerobics class as though she’s preparing her army for battle. Genghis Khan goes wild in a sporting goods store. In the film, these things are played for laughs, but the temptation of juxtaposing the past with the present by querying these bots is inevitable.
Khanmigo’s creators have attempted to put guardrails around these issues, but in doing so they have also silenced these historical figures. A Washington Post journalist attempted to “interview” the Harriet Tubman bot, eventually asking the bot what it thought of reparations for slavery. Controlled by a guardrail, the bot refused to answer, saying that “reparations for slavery was not widely discussed during my lifetime,” a claim that Post journalist Gillian Brockell pushed back against, pointing out that Tubman lived well past emancipation and would’ve experienced the arguments going on during the early Reconstruction period.
Khan admits that controlling what these historical recreations say about contemporary topics is a difficult problem with “no right answer.” He uses another example of someone asking Thomas Jefferson about gay marriage, with Khan saying, “I would guess that he (Jefferson) would have found the idea to be far outside his comfort zone.”
Khan can’t even bring himself to confront the limits, using euphemistic language to describe Jefferson’s potential attitude. I wonder what a Jefferson bot would say about his “relationship” to Sally Hemings if one were to ask it if Jefferson ever committed sexual assault.
(Post-initial publication update: As you can see in the comments, Rob Nelson did this exact experiment with the Jefferson bot. You can read the results here.)
The thing is, these are potentially interesting and productive questions to be able to ask, questions which could help students illuminate the differences (and similarities) between the past and present for themselves. Looking at texts, thinking, asking questions, and answering them is the work of doing history.
Because of these guardrails (which are necessary), certain avenues of inquiry are shut down. This, paired with the problem of false authority, threatens to obscure more history than it illuminates.
The moral injury of resurrecting the departed and putting them inside next-token prediction algorithms
Following The Washington Post article, numerous people expressed outrage at the notion of embodying the life and spirit of Harriet Tubman in a chatbot. Yahoo! News gathered some examples, including journalist CiCi Adams who said, “Aside from the fact that this is unethical (in both journalism and tech), pay attention to the ways these AI advances are usually used in ways that disparage, exploit, and caricaturize Black folks.”
Novelist Kaitlyn Greenidge wrote, “The idea that an algorithm written by people who drink in Black death would ever be able to approximate the consciousness of one of our greatest liberators…”
We know that large language models cannot think, feel, or communicate with intention. They cannot judge truth or accuracy. An LLM is a technological marvel that is entirely unsuited to many things, and speaking for historical figures who once lived among us is one of them.
To pretend otherwise is to threaten to erase the real-world work and contributions of these figures. Khan’s belief that these tools will increase engagement is entirely unfounded, and yet we’re supposed to move full-steam ahead on experimenting with them.
The ethical and moral implications of integrating these tools into educational spaces are under-discussed, to say the least, and it is particularly troubling to me that those considerations are shunted aside for something that is, at least as of this time, a novelty with no proof of efficacy and many reasons to believe it may do harm.
I can tell you exactly how the Khanmigo Jefferson chatbot responds to questions about Sally Hemings, at least how it responded back in October when I wrote up my experience pretending to be a fifth grader looking to complete a homework assignment. It responds with an approximation of an encyclopedia entry. And if you ask it questions that border on the sexually explicit it refuses to answer.
As you point out, the first problem with using LLMs as a tool for history education is confabulation. The second, as you also point out, is that LLMs generate language that has emotional and moral impact, but unlike writers and editors of encyclopedias, they have no agency or ethical understanding.
The third reason is that, unlike watching Bill and Ted, talking to one is deadly boring.
Sadly, the first two reasons are not likely to stop people from using them as educational tools. The third reason might.
I really appreciate this breakdown, John!
This use case has always felt ick (I know, a technical term) so I just never did it.
I get your point. This exercise creates a caricature of the person, but our students will see it as authoritative.
A lot to think about...