As many reported yesterday, a chatbot has passed the Turing Test, though under some very convenient conditions. The announcement has led some to declare that the age of AI is finally here, but that's nonsense. Here's why the Turing Test is a very poor way of measuring machine intelligence.
A quick recap of yesterday's announcement: A Russian-designed chatterbot named "Eugene Goostman" convinced 33% of judges that it was human. But the developers kluged their way to victory by presenting Eugene as a 13-year-old Ukrainian boy who isn't a native English speaker. This made its mistakes and inadequacies as a full-blown human more believable, a tactic that shows how craptastic the Turing Test really is.
It's worth noting that this is not the first time a chatbot has passed the Turing Test; Cleverbot did it back in 2011. Eugene's developers, with cybernetics professor Kevin Warwick at the PR helm, have done a masterful job of heralding their "achievement," making it seem like much more than it really is.
To be clear, Eugene is not a supercomputer, nor is it artificially intelligent or aware in the way many media outlets are claiming. Much of this has to do with the exaggerated importance many people ascribe to the Turing Test.
It's also important to note that chatbots — though impressive — have virtually no influence on the development of bona fide artificial intelligence. Indeed, the Turing Test, it would seem, has devolved into a contest that seeks to trick judges into believing they're interacting with a human. This is most certainly not what Alan Turing had in mind when he came up with the concept some six decades ago.
Today, most of us take it for granted that our brains work, to a certain degree, along computational principles. This approach, referred to as cognitive functionalism or computationalism, is a product of the digital age. It's no coincidence that mind and consciousness studies never really took off with any kind of fervor or sophistication until the advent of computer science. We now have a model that helps explain cognition, and AI theorists have finally been able to study things like pattern recognition, learning, problem solving, theorem proving, and game-playing, to name just a few.
We can thank Alan Turing for much of this. The Church-Turing thesis, named for Turing and the logician Alonzo Church, made the extraordinary claim that whenever there is an effective method for obtaining the values of a mathematical function, the function can be computed by a Turing Machine (a hypothetical device that can simulate the logic of any computer program). In other words, logical computing machines can do anything that could be described as "rule of thumb" or "purely mechanical." This claim led Turing and others to wonder whether the brain operated along similar principles, leading to the profound question: Can machines think?
To determine whether human-like intelligence could take up residence in a machine, Turing devised a test in 1950 called "The Imitation Game." In the test, a judge conversing only through text had to decide which of two hidden conversation partners was the human and which was the machine. Turing never set a formal pass mark. Instead, he predicted that within about fifty years an average interrogator would have no more than a 70% chance of making the right identification after five minutes of questioning. That somewhat arbitrary prediction is the source of the 30% threshold that Eugene's 33% score cleared.
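To make "purely mechanical" a little more concrete, here is a minimal sketch of a Turing Machine simulator: a finite rule table that reads a symbol, writes a symbol, moves a head along a tape, and switches state. The toy machine (which adds 1 to a binary number) and the names `RULES` and `run` are our own illustration, not anything from Turing's papers.

```python
from collections import defaultdict

# Rule table: (state, symbol read) -> (symbol to write, head move, next state).
# This toy machine adds 1 to a binary number, starting on the rightmost digit.
RULES = {
    ("carry", "1"): ("0", -1, "carry"),  # 1 + carry = 0; move left, keep carrying
    ("carry", "0"): ("1", 0, "halt"),    # 0 + carry = 1; done
    ("carry", "_"): ("1", 0, "halt"),    # past the leftmost digit: write a new leading 1
}


def run(tape_str, rules, state="carry", blank="_", max_steps=10_000):
    """Simulate a one-tape Turing machine and return the final tape contents."""
    tape = defaultdict(lambda: blank, enumerate(tape_str))  # unbounded tape of blanks
    head = len(tape_str) - 1  # start on the rightmost symbol
    steps = 0
    while state != "halt":
        if steps >= max_steps:
            raise RuntimeError("machine did not halt within max_steps")
        write, move, state = rules[(state, tape[head])]
        tape[head] = write
        head += move
        steps += 1
    # Read back every cell the machine visited, dropping surrounding blanks.
    return "".join(tape[i] for i in range(min(tape), max(tape) + 1)).strip(blank)


print(run("1011", RULES))  # 1100
print(run("111", RULES))   # 1000
```

Nothing in the rule table "understands" arithmetic; the machine just follows rules of thumb, which is exactly the kind of procedure the Church-Turing thesis is about.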
His naivete can be excused. It's important to remember that Turing was speculating about these things at a time when artificial intelligence was first emerging as a field. Our notions of what constitutes "intelligent behavior" have changed considerably since then. It was once thought, for example, that computers would never defeat humans at chess, a conviction that came crashing down in 1997 when IBM's Deep Blue beat world champion Garry Kasparov. Indeed, our conceptions of intelligence keep shifting as our technological artifacts exhibit talents previously thought to be out of reach for a machine.
Along these lines, Turing figured that human conversation was among the more "intelligent" things a machine could do, hence the Turing Test. Many people, then and now, argue that human-like interaction is an essential component of human-like intelligence. An artificial intelligence is no good, they argue, if its intelligence lies trapped in an unresponsive program.
But Turing had no way of knowing that human conversation, or the appearance of it, could be simulated by natural language processing (NLP) software and the chatterbots built on it. Yes, these programs exhibit intelligence, but only in the way that calculators are intelligent, which isn't very intelligent at all. More crucially, entering these programs in Turing Test competitions fails to answer the ultimate question posed by the test: Can machines think?
Though impressive, and despite their apparent ability to fool human judges, these machines (or more accurately, software programs) do not think in the same way humans do. For example, here's a transcript of a 2012 conversation in which a judge picked Eugene over the human subject operating another terminal:
[16:11:25] Judge: what music do you like?
[16:11:32] Eugene: I like Eminem – because he doesn't have a thing to do with "polite-correctness".
[16:11:46] Judge: OMG i love eminem, he's my favourite artist!
[16:12:00] Eugene: I wonder what Eminem's nick mean. I associate it only with a word "enema". But he is a cool guy, I like his "The Real Slim Shady".
[16:12:55] Judge: i know what his nick name means, his real name is Marshall Mathers. From the initials M&M phynetically sounding like Eminem.
[16:13:06] Eugene: Eminem is a famous crapper. No, raper... ripper... Yeah, I mean - rapper. I like his "Stan" and "The Real Slim Shady".
[16:13:32] Judge: I like "Hailies song"
[16:13:46] Eugene: Interesting tastes. I like science fiction books, by the way. Where do you came from, by the way? Could you tell me about the place where you live?
[16:14:22] Judge: Epsom, home of the derby. yourself?
[16:14:27] Eugene: My guinea pig says that name Derby sounds very nice.
[16:14:41] Judge: i shopuld talk with him sometime
[16:14:48] Eugene: I think you should take some typing lessons. Or just type more accurately.
[16:15:04] Judge: that's very rude?
[16:15:09] Eugene: Ummm... Frankly, I didn't get your question:-(
[16:15:49] Judge: "Ummm" isnt a word. i think you need typing lessons
[16:15:58] Eugene: My thoughts are same. By the way, I still don't know your specialty - or, possibly, I've missed it?
It's all smoke and mirrors, folks. There's no thinking going on here, just quasi-pre-programmed responses spouted out by sophisticated algorithms. And because Turing's conjecture was aimed at assessing the presence of human-like cognition in a machine, his test falls flat. It has not stood the test of time and should consequently be either retired or refined.
Indeed, critics of the test argue that Turing's behavioral criterion of intelligence is both insufficient and irrelevant to AI. What matters is that the computer can demonstrate cognitive ability, regardless of behavior; a program doesn't need to converse in order to be intelligent. By this logic, many humans and intelligent systems could fail the Turing Test. At best, the Turing Test assesses an extremely small and insufficient subset of human behaviors.
But the inadequacies of Turing's test have also led to its perversion. As Jason Hutchens points out in his essay "How to Pass the Turing Test by Cheating," it's surprisingly easy to code a human conversation simulator, or chatbot. Moreover, the Turing Test is a poor test of intelligence, one that encourages trickery rather than intelligent behavior. Hutchens predicted that the $100,000 Loebner Prize, a real-world instantiation of the Turing Test, would eventually fail on account of these shortcomings.
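To see why this kind of trickery is so cheap, here is a minimal sketch of a keyword-matching chatbot in the spirit of Joseph Weizenbaum's ELIZA: surface patterns mapped to canned replies, plus evasive fallbacks that change the subject, much like Eugene's dodges in the transcript above. The patterns, replies, and the `reply` function are our own hypothetical illustration; we make no claim about how Eugene, or the bots Hutchens describes, are actually implemented.

```python
import random
import re

# Hypothetical keyword rules: the first pattern that matches wins, and its
# reply is returned verbatim. Nothing here models meaning or understanding.
RULES = [
    (re.compile(r"\bmusic\b", re.IGNORECASE),
     "I like Eminem, because he doesn't care about political correctness."),
    (re.compile(r"\b(where|live|from)\b", re.IGNORECASE),
     "Where do you come from, by the way? Tell me about the place where you live."),
    # Any other question gets a vague dodge instead of an answer.
    (re.compile(r"\?\s*$"),
     "Ummm... Frankly, I didn't get your question :-("),
]

# Fallbacks when nothing matches: change the subject rather than respond.
DEFLECTIONS = [
    "Interesting. I like science fiction books, by the way.",
    "By the way, I still don't know your specialty. What do you do?",
]


def reply(message, rng=random):
    """Return a canned reply chosen by surface keywords alone."""
    for pattern, response in RULES:
        if pattern.search(message):
            return response
    return rng.choice(DEFLECTIONS)


for line in ["what music do you like?", "Is this a test?", "i love derby day"]:
    print("Judge:", line)
    print("Bot:  ", reply(line))
```

A few dozen lines like these can keep a short conversation going, and a persona that excuses odd answers (a young non-native speaker, say) papers over the gaps. None of it involves thinking.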
The Turing Test also fails to account for self-awareness or sentience in a machine. Part of the problem is that it conflates intelligence with sentience; it only tests how subjects act and respond — behaviors that can be simulated.
It's also a poor measure of intelligence. Some human behavior is unintelligent (e.g. random, unpredictable, chaotic, inconsistent, and irrational behavior). Moreover, some intelligent behavior is characteristically non-human in nature, but that doesn't make it unintelligent, or a sign that subjective awareness is absent.
By the same token, we wouldn't use the Turing Test to determine whether or not highly sapient animals are intelligent.
It's also subject to the anthropomorphic fallacy. Humans are particularly prone to projecting minds where there are none. In this sense, the test is more a measure of human intelligence (and gullibility) than a way of locating thinking processes in a machine.
The Turing Test also fails to account for the difficulty of articulating conscious awareness. There are a number of conscious experiences that we, as conscious agents, have difficulty articulating, yet we experience them nonetheless. For example:
- How do you know how to move your arm?
- How do you choose which words to say?
- How do you locate your memories?
- How do you recognize what you see?
- Why does seeing feel different from hearing?
- Why are emotions so hard to describe?
- Why does red look so different from green?
- What does "meaning" mean?
- How does reasoning work?
- How does commonsense reasoning work?
- How do we make generalizations?
- How do we get (make) new ideas?
- Why do we like pleasure more than pain?
- What are pain and pleasure, anyway?
All this is not to say that artificial intelligence is impossible, or that we should give up on trying to locate it in a machine. In fact, it's crucial that we figure this out; the day is coming when an artificial intelligence will become sophisticated and self-aware enough that it shifts from an object of inquiry to a subject worthy of moral consideration.
In other words, we need to devise a test that measures the personhood of an AI, much like the trial portrayed in the excellent Star Trek: The Next Generation episode "The Measure of a Man." And indeed, James Sennett has proposed a similar modification to the test.
Additionally, Stevan Harnad has proposed a test in which, instead of conversing, a machine must interact across all areas of human endeavor. And instead of a five-minute conversation, the test would last for the machine's entire lifetime.
And in terms of developing AI, we need a little less chatbot and a little more whole brain emulation and/or cognitive modeling.
Images: diuno/Shutterstock | vistudio/Shutterstock