The bad jokes of ChatGPT (and other chatbots)

--

Written by: Adriaan Odendaal, May 2024

When ChatGPT publicly launched at the end of 2022, we were all blown away by its seemingly human ability to respond eloquently and coherently to even the most colloquial or conversational prompts. The chatbots of this new LLM generation are so seemingly human-like (and increasingly so) that some people are even convinced that they are sentient. We are at a tipping point in the advancement of artificial intelligence, with ChatGPT’s creators at OpenAI even proclaiming that the holy grail of artificial general intelligence is just on the horizon.

Yet, ask ChatGPT to tell you a funny joke and you will be confronted with the same stale puns and cringey dad jokes that you’d get from older preprogrammed voice assistants like Alexa or Siri. If ChatGPT is so good at imitating human conversation, why does it seem so bad at telling jokes?

One of many rote ChatGPT jokes.

Why do chatbots try to be funny?

Being able to respond to verbal prompts to “tell a joke” has long been an integral component of chatbots and virtual assistants like Siri, Alexa, or Google Assistant.

Siri telling one of its many pun-based jokes.

Humour is a defining and integral part of human communication and social interaction. It thus makes a lot of sense for chatbots, in their efforts to emulate human conversation, to have a persistent ‘joke-telling’ functionality. As a Google product manager explains, these kinds of features give chatbots “personality” and allow them to seem “more conversational”. This makes the use of these technologies more intuitive and facilitates more seamless human-computer interaction.

It’s no accident that in science fiction, the lack of a sense of humour is often used as a trope for the breakdown of communication between humans and non-humans such as robots or androids.

Data, the famously humourless android from Star Trek.

However, in a behind-the-scenes interview with the erstwhile ‘Personality team’ involved in developing Google Assistant, tech columnist David Pogue writes: “It turns out that all you need to create a smart, funny, empathetic assistant is a roomful of smart, funny, empathetic people”.

With chatbots like Alexa or Siri, the jokes they tell on command are not computationally generated. Instead, they are written or curated from existing sources by teams of humans. The ability to tell a joke is thus explicitly preprogrammed and the jokes predefined. Siri parses the command “tell a joke” and then automatically retrieves and relays a string of characters from a human-written joke database. It is no wonder that the jokes of Alexa and Siri are the same formulaic puns, dad jokes, and one-liners that you’d find collected in books such as 1001 One-Liners and Short Jokes. It’s concise and formulaic humour that can fit into a system of subroutines and databases.
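
To make the distinction concrete, here is a minimal sketch of how such a preprogrammed joke feature works in principle. The intent matching and the jokes themselves are illustrative stand-ins, not the actual implementation of Siri or Alexa:

```python
import random

# A human-curated joke database, standing in for the kind that
# teams of writers maintain for assistants like Siri or Alexa.
JOKE_DATABASE = [
    "Why did the scarecrow win an award? Because he was outstanding in his field.",
    "I'm reading a book about anti-gravity. It's impossible to put down.",
    "What do you call a fake noodle? An impasta.",
]

def respond(utterance: str) -> str:
    """Match a 'tell a joke' intent and play back a canned joke."""
    if "joke" in utterance.lower():
        # No generation happens here: the bot simply retrieves a
        # predefined string written by a human.
        return random.choice(JOKE_DATABASE)
    return "Sorry, I didn't understand that."

print(respond("Hey, tell me a joke"))
```

Everything such a bot can ever say is already sitting in its database before the user speaks; the ‘wit’ is entirely the joke writers’.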

ChatGPT, as a generative AI technology, works in a fundamentally different manner. It can procedurally create novel sentences by iteratively predicting, based on vast troves of training data, what string of characters is likely to follow a given preceding string of characters (read a more complete description of how ChatGPT works here). It is a new frontier for conversational agents and AI technology, moving away from preprogrammed responses towards generated answers. Being able to respond to the prompt “tell a joke” is thus no longer merely a hardcoded affectation meant to improve human-computer interaction, but an innate ability of the model.
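
As a toy illustration of this generative principle, the sketch below builds a word-level bigram model: a drastic simplification of the transformer that ChatGPT actually uses, but one that shows the same core loop of predicting a plausible next token and feeding it back in. The two-sentence ‘training corpus’ is made of the Hedberg one-liners quoted later in this piece:

```python
import random
from collections import defaultdict

# A tiny 'training corpus': two Mitch Hedberg one-liners.
corpus = (
    "my fake plants died because i did not pretend to water them . "
    "i have not slept for ten days because that would be too long ."
).split()

# Learn which words follow which: a word-level bigram model,
# the crudest possible form of next-token prediction.
transitions = defaultdict(list)
for current_word, next_word in zip(corpus, corpus[1:]):
    transitions[current_word].append(next_word)

def generate(seed: str, max_words: int = 12) -> str:
    """Repeatedly sample a plausible next word given the previous one."""
    words = [seed]
    for _ in range(max_words):
        candidates = transitions.get(words[-1])
        if not candidates:
            break
        words.append(random.choice(candidates))
    return " ".join(words)

print(generate("i"))  # e.g. "i did not slept for ten days because ..."
```

Even at this miniature scale, the failure mode is visible: the model can stitch together fragments like “i did not slept for ten days” that are statistically plausible but not meaningful, which previews the ‘stochastic parrot’ critique discussed below.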

Innovation in computational humour

Throughout the history of AI, the ability to simulate different human capabilities has been used as an important milestone for innovation. Machines were seen to surpass us in logical reasoning with the achievements of IBM’s chess-playing Deep Blue, human-like intuition was simulated with Google’s AlphaGo, and recently the domain of creativity has been besieged by generative AI tools such as DALL-E and Midjourney. Humour is seen as yet another elusive, yet monumental, milestone for the advancement of AI.

Humour is a complex cognitive task that relies greatly on understanding nuance, context, and culturally contingent semantics. Processing (or ‘getting’) a joke relies on a lot of things happening all at once: recognizing the complex socio-cultural context of the joke, understanding the multiple meanings of words, picking up paralinguistic cues such as intonation, identifying incongruities within and between the context and the communication, and thinking abstractly. To make matters worse, humour is also highly subjective and takes many different forms and formats. To computer engineers, creating a computational agent that can perceive, appreciate, and produce humour thus signifies a profound achievement of computationally simulated cognitive flexibility, creativity, contextual and cultural awareness, and linguistic intelligence.

In pursuit of this achievement, Natural Language Processing researchers established a field of research known as ‘computational humour’. Computational humour researchers see humour as “one of the few capabilities that seemed to be reserved for human individuals thus far”, and thus focus on “the great complexity of automated humor and the limitations of current endeavors,” write Jentzsch and Kersting. As computer scientist Tony Veale writes: “A phenomenon as sprawling and amorphous as humor, one that touches on so many aspects of our lives, is not going to be squeezed into a single formula or equation”. Humour is seen as so complex a task that researchers such as Veale consider it ‘AI-complete’: computational humour would require artificial general intelligence to solve, and solving it can thus be taken as an indicator of artificial general intelligence.

In their research, Jentzsch and Kersting write: “With their innovative capabilities, GPT-based models have the potential to open a new chapter of [computational humour] research”. ChatGPT can generate limericks, songs, and poems, help write award-winning novels, and produce new artistic masterpieces. Can it then also tell good jokes?

An experiment: can ChatGPT write an original Mitch Hedberg one-liner?

Problem 1: Finding the formula

Mitch Hedberg is a famous comedian from the 90s who still enjoys widespread internet relevance. This is thanks to his timeless dry wit and defeatist, self-deprecating humour, but also to the short, meme-able format of his iconic one-liners, such as:

“My fake plants died because I did not pretend to water them.”

Or:

“I haven’t slept for ten days, because that would be too long.”

The jokes are simple and succinct and seem to have an easily replicable formula. Yet they are perceptibly witty and humorous. It thus seems like a fairly straightforward (if unscientific) test to put to ChatGPT: “Write some original Mitch Hedberg one liner jokes”.

These are some of the jokes ChatGPT came up with:

“I don’t have a fear of heights. I have a fear of widths. Like, have you seen how wide some people drive? Terrifying.”

“I’m not a fan of speed bumps. If I wanted my car to jump, I’d take it to the circus.”

“I used to play hide and seek. They never found me. Turns out, I was just really good at hiding. Or nobody was looking. Either way, I win.”

On the surface, these have a similar structure to Mitch Hedberg’s one-liners and don’t seem too far off the mark. They share the sense of surreal absurdism that makes Hedberg’s jokes work. Yet they don’t land as jokes in quite the same way; their absurdism tips over into something a bit nonsensical. ChatGPT can’t seem to find the formula of what exactly makes Mitch Hedberg’s jokes work, and instead stumbles its way through the perceptible structure of his jokes. “I don’t have a fear of heights. I have a fear of widths” sets up a typical Hedberg joke, yet ChatGPT’s predicted next sentence falls flat: “Like, have you seen how wide some people drive? Terrifying”.

In their empirical research on ChatGPT’s ability to perceive and reproduce humour, Jentzsch and Kersting found that the system “accurately explains valid jokes but also comes up with fictional explanations for invalid jokes”. Invalid jokes are those that “contain joke-like elements but [fail] to deliver a punch line”. In the same way, ChatGPT seems to come up only with invalid Mitch Hedberg jokes. This points to the critique often levelled at ChatGPT and other LLMs that they are ‘stochastic parrots’: although they are able to generate plausible language, they can’t understand the meaning of the language they process or reproduce. In linguist Emily Bender’s words, a stochastic parrot is an entity “for haphazardly stitching together sequences of linguistic forms … according to probabilistic information about how they combine, but without any reference to meaning”.

In contrast, despite the seeming simplicity of Hedberg’s jokes, they are the product of a well-honed literary and comedic craft. Hedberg was heralded for his ability to distil highly complex humour, semiotically layered and nuanced, into single simple sentences. Comedian Adam Hess wrote of Hedberg for The Guardian: “This joke is just ten words long, but ten words was enough for Hedberg to paint a hilarious and detailed picture that most people couldn’t paint with 100”. He was an “alchemist who turned sentences into comedy gold,” the comedian continued.

Hedberg’s one-liners, though they seem structurally like the one-liners and puns of Siri and Alexa, are qualitatively different in that they don’t follow easily replicable formulas. There is something almost inexplicable, seemingly alchemic, about just what makes Hedberg’s jokes funny. As Hess writes: “I’d never heard one-liners before which had no hint of pun-slinging or wordplay; just funny sentences”.

What complicates this further is, of course, the linguistic pragmatics of humour: how a joke is delivered, where it is told, and the paralinguistic communications of the comedian all matter in making something funny. Much of what made Hedberg a notable comic was his deadpan and seemingly listless standup routine. Perhaps, delivered by Hedberg on a stage, some of ChatGPT’s jokes would seem funnier. (Yet I somehow doubt it, looking at the disappointing findings of a group of San Franciscan comedians who tested some ChatGPT-written jokes on live audiences.) Moreover, listening to a Mitch Hedberg set, you also get the sense that what makes his jokes funny is that they spring from a unique point of view, a way of seeing the world that is particular to Mitch Hedberg. As one journalist writes: “He had this way of breaking down the absurdity of the world around us that was just… clever and inherently silly”.

Problem 2: Thinly veiled plagiarism

Despite the reservations made above, amongst the imitations of Mitch Hedberg, ChatGPT did deliver one joke that I thought was actually quite funny:

“I saw a sign that said ‘Watch for children.’ I thought, ‘That sounds like a fair trade.’ But nobody wanted to trade.”

However, a quick internet search revealed that this is a rephrasing of a well-known joke by Demetri Martin, another one-liner standup comic.

Another of the better jokes created by ChatGPT was likewise a rephrasing of a famous Mitch Hedberg joke. ChatGPT wrote:

“I like escalators because they can’t break, they can only become stairs. Sorry for the convenience.”

The original being:

“An escalator can never break: it can only become stairs. You should never see an Escalator Temporarily Out Of Order sign, just Escalator Temporarily Stairs. Sorry for the inconvenience.”

While these jokes again point to ChatGPT’s inability to identify and replicate the nuance of what made the original jokes work so well, they also point to a more fundamental limitation: ChatGPT remixes and recycles existing jokes and presents them as novel.

As Jentzsch and Kersting found in their empirical study:

“Our empirical evidence indicates that jokes are not hard-coded but mostly also not newly generated by the model. Over 90% of 1008 generated jokes were the same 25 Jokes.”

The researchers distinguish between ‘originally generated output’ (text “composed by the model”); ‘replicated output’ (text that is “memorized from training data and played back by the system in exact wording”); and ‘modified output’ (text that is “a mix of both”). They found that most of the jokes told by ChatGPT are either replicated or modified data, with only slight variations on existing jokes. Jentzsch and Kersting write: “This recurrence supports the initial impression that jokes are not originally generated. Presumably, the most frequent instances are explicitly learned and memorized from the model training”.
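
As a rough illustration of this three-way taxonomy (and emphatically not Jentzsch and Kersting’s actual methodology), one could classify a generated joke by its string similarity to a corpus of known jokes. The reference jokes and thresholds below are illustrative assumptions:

```python
from difflib import SequenceMatcher

# Stand-in reference corpus of known jokes. In a real study, this would
# be built from jokes traceable to the model's training data.
KNOWN_JOKES = [
    "An escalator can never break: it can only become stairs.",
    "My fake plants died because I did not pretend to water them.",
]

def classify(output: str, replicated_at: float = 0.95, modified_at: float = 0.6) -> str:
    """Label a joke as 'replicated', 'modified', or 'original' according
    to its closest string similarity to a known joke."""
    best = max(
        SequenceMatcher(None, output.lower(), joke.lower()).ratio()
        for joke in KNOWN_JOKES
    )
    if best >= replicated_at:
        return "replicated"  # played back near-verbatim
    if best >= modified_at:
        return "modified"    # a slight variation on an existing joke
    return "original"        # no close match in the reference corpus

# A verbatim known joke comes back as 'replicated'.
print(classify("An escalator can never break: it can only become stairs."))
```

A real analysis would have to compare against the training corpus itself, which outside researchers cannot inspect; the point is only that ‘replicated’ and ‘modified’ outputs are detectable as degrees of similarity to existing text.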

There is a reason why, when asked to tell a joke, ChatGPT often produces the same recycled puns as Siri or Alexa: its training data includes the countless joke collections and well-worn one-liners already posted and reposted all over the internet. These are the same collections from which many of Siri’s and Alexa’s jokes are curated. Thus, while technically not the same as Siri or Alexa, the ChatGPT model gives very similar outputs. As the San Franciscan comedians also found in a comedy ‘Turing Test’ conducted with ChatGPT: “[it] mostly turned up dad jokes lifted from the internet, making it easy to identify a generic punchline from an original”.

Though ChatGPT is a generative AI tool that can procedurally create novel sentences, the jokes it tells are usually written by humans in one way or another. As Tony Veale summarizes the state of affairs: “Machines can already tell jokes, or at least recite the ones that we train them to tell”.

What the humourlessness of ChatGPT tells us

Jentzsch and Kersting seem to suggest that AI will eventually solve the computational humour conundrum. They argue that, “in comparison to previous LLMs, [ChatGPT] can be considered a huge leap toward a general understanding of humor”. Other technologies also offer alternative avenues for innovation, such as Jon the Robot, which can determine, based on audience reactions, whether a joke has fallen flat, and can respond with self-referential humour about it.

At the same time, there are older chatbots that already proved to be quite humorous. Jabberwacky, one of the pioneering chatbots in computer history, was created in 1988 by Rollo Carpenter with the aim to “simulate natural human chat in an interesting, entertaining and humorous manner”. Ask it to tell you a joke, however, and it will most likely refuse. Jabberwacky is not so much a joke-telling chatbot as a chatbot we can project humour onto. It’s often nonsensical or snarky, somewhat glitchy even, which makes it easy for a user to see humour in its interactions.

Likewise, it is often the failures and errors of newer chatbots that make them funny. Alexa misinterpreting a voice command can produce more humour than its preprogrammed one-liners:

Or when ChatGPT falls for our tricks:

With this I am not trying to say that humour is somehow uniquely human, or that the technical inability of machines to reproduce jokes proves some kind of human exceptionalism. Rather, I want to make the point that paying attention to the limitations of chatbots (and specifically AI-powered chatbots) in convincingly reproducing humour allows us to see the technology more clearly for what it is. Seeing ChatGPT create invalid jokes reveals that it functions as a stochastic parrot: a statistical linguistic machine that doesn’t understand what it is generating. Recognizing familiar or well-worn jokes in its ‘original’ outputs helps us appreciate that it is trained on vast corpuses of existing textual data. Looking at the inability of chatbots to tell convincingly funny jokes allows us to remain aware of the ways these technologies are still machines, despite their increasingly uncanny ability to appear human.
