62 years later, passing the ‘Turing test’ remains elusive
It is 100 years this week since the birth of the revered wartime codebreaker Alan Turing, and 67 years since he was awarded an OBE for leading the team, in Bletchley Park’s Hut 8, that cracked the German navy’s Enigma code. It has also now been 60 years since he was convicted for gross indecency, after admitting to being in a consensual same-sex relationship, and sentenced to chemical castration by means of regular injections of oestrogen, as an alternative to time in prison. It’s 58 years to the month since he killed himself, and just less than three years since a British prime minister saw fit to issue an official apology for his treatment.
Though Turing is best known for the story of his wartime heroism and the appalling circumstances of his death, in academic circles his name carries other connotations. Among philosophers and computer scientists, he is known as the father of artificial intelligence, thanks in part to a single essay penned in 1950, asking the question, “Can machines think?” In the article, published in the philosophical journal Mind, Turing proposed a game capable of providing an answer: a competitive conversation in which a computer and a human each attempt to convince a judge that they are a conscious, feeling, thinking thing.
The game would come to be known as the “Turing test”. At the time, it was impossible to conduct: humans had yet to create the necessary networks and software; computer programs were nowhere near intelligent enough to simulate anything resembling conversation. It took another 40 years for Turing’s imagined game to become a reality, when in 1990 the American philanthropist Hugh Loebner founded the annual Loebner prize for artificial intelligence, “the first formal instantiation of the Turing test”.
The prize is not, by Loebner’s own admission, a rigorous academic test. The programs competing are also not necessarily the most impressive in the field: entrants tend to be enthusiasts’ passion projects, rather than multimillion-pound ventures, such as the iPhone’s talking assistant Siri.
Computers have not evolved quite as Turing expected them to, but Loebner has stayed determined to run the competition to the founding father’s precise specifications. To mark the centenary of Turing’s birth this year, the contest was held for the first time in its history at Bletchley Park, and I went along to see if a computer could manage to persuade a panel of humans that it was a real person.
“Your job,” explains the award’s colourful founder Loebner, to his four nervous volunteers, “is to convince the judges that you are the human.” Moments later, the four of them will sit down at their screens and begin the first of four competitive online chats. Their opponents hum quietly on the table next to them: four unmanned computers, each set up by a neutral engineer, each with a different conversational software program installed, known as a “chatbot”, designed by AI enthusiasts to be mistaken for a human being.
Across the hall, in Bletchley Park mansion’s cosy Morning Room, four judges sit at another bank of screens. In each of the competition’s 25-minute rounds, the judges will hold two online chats simultaneously – one with a volunteer and one with one of the chatbots. They have not been told in advance which is the person and which the computer. If a bot manages to fool two or more of the judges, it will win its creator a gold medal engraved with Turing’s image, and $100,000 (£64,000).
This is Loebner’s “grand prize”, which nobody has ever won. In fact, year on year, with very few exceptions, not a single judge is fooled. The last time a chatbot successfully “passed” – in a single round of the 2010 competition – it did so only because a volunteer didn’t follow instructions and chose to imitate a robot. When none of the judges are fooled, a $5,000 “bronze award” is given to the bot they rank “most human-like”.
Being here at Bletchley Park, says Loebner, is “like treading on hallowed ground”. But Turing might have been a little disappointed with the competitors. When he proposed the game, he predicted computers would be comfortably passing the test “in about 50 years’ time”. Yet 62 years on, Loebner is disparaging about the competitors. “These are rudimentary,” he says. “They have the intelligence of a two-year-old.”
It isn’t hard to see what he means. The first bot gives itself away just 10 seconds into its opening conversation. “Hi, how are you?” asks the judge in both windows. “I’m fine, thanks for asking,” comes one reply. The other: “Please rephrase as a proper question, instead of ‘Jim likes P.’” No prizes for spotting the human there.
Another bot blows its cover by asking: “Did you hold funerals for your relatives when they died?” (The judge’s response: “No, I normally cut up the bodies and buried them myself.”) A third bombards the judge with questions: “Have you recently been to a live theatre?”, “Have you recently been to the art gallery?”, “Do you want a hug? Do you have a child? Do you want a child? I can’t.”
One tries to confuse a judge by being petulant (“Do you have a point? I must have missed it”), while last year’s winner, Talking Angela, does its best to fool them by posing as a teenage girl: “I really like Lady Gaga. I think it’s the combination of the sound and the fashion-look that appeals to me,” before coming unstuck by claiming: “I’m a cat.”
As predicted, the judges aren’t taken in at all. “It became apparent quite quickly in all cases,” says volunteer judge Michael Stean, who is also a chess grandmaster, though he admits to being fooled by small patches of one or two of the conversations. “I think if you went through the conversations and you edited out the answers that were obviously wrong, it would be quite a close contest.”
David Levy, whose bots have won the bronze prize twice, has managed to fool a judge just once: “The first time I won was 1997. We stayed up and watched the news the night before, and I wrote a script based on that. The news was that Ellen DeGeneres came out as a lesbian.” Levy’s bot began all its conversations by asking the judge what they made of the news, and even shared its own opinions. “In the first section, one of the judges was completely fooled.”
Though he won that year, Levy is keen to stress the many practical applications of the technology. “I think there’s an absolute fortune to be made in this field,” he explains. “I think already there are areas of medical diagnosis where it’s been proven that computers can do better than doctors. The problem is there’s a huge amount of litigation. But the logical question is which would you rather be diagnosed by, a human doctor who’s 80% right or a computer doctor that’s 95% right?”
It’s not just medicine either. Levy is confident that in 30 or 40 years, “there will be robots that are very human-like that people will be forming friendships with, and having sex with, and falling in love with”.
For now though, even this year’s “most human-like computer” is unlikely to be receiving any love letters. With the rankings tallied, the $5,000 prize goes to American chatbot Chip Vivant, the same bot that told one judge, “Please rephrase as a proper question, instead of ‘Jim likes P’”. For its creator, American programming consultant Mohan Embar, it is success at the fifth attempt. “It feels wonderful, obviously. In the early 2000s, when reading transcripts of previous years’ competition, my mouth started to water and I knew I wanted to be a part of this.”
For Embar, creating his bot wasn’t about deceiving the judges so much as offering them a meaningful conversation. “I’m not interested in creating a chatbot that fools people, but rather one that can empathise with and provide comfort to people who can’t or don’t want to get it from a real person. I’ve become keenly aware of the futility of creating a program that comes anywhere close to fooling someone who knows what they’re doing.”
Before I leave I ask Loebner if he thinks anyone will ever manage it. “It’ll come,” he says. “Probably long after I die.”
[Android via Andrea Danti / Shutterstock]