Nuance Communications has announced a new annual competition to develop artificially intelligent agents capable of solving the "Winograd Schema Challenge", a possible alternative to the Turing Test — and one that's intended to provide a more accurate measure of genuine machine intelligence.
Indeed, the recent proclamation that the "Eugene Goostman" chatbot had passed the Turing Test exposed that test's inadequacies. The chatbot's developers fooled judges into believing it was a 13-year-old human; they did not demonstrate genuinely human-like cognitive capacities.
As noted by Hector Levesque, a professor of Computer Science at the University of Toronto, and a developer of the new test, "Chatbots...can fool at least some judges into thinking it is human, but that likely reveals more about how easy it is to fool some humans, especially in the course of a short conversation, than the bot's intelligence."
Levesque's Winograd Schema Challenge (WSC) takes a different approach. Instead of basing the test on short free-form conversations, the WSC poses a set of multiple-choice questions that look like this:
I. The trophy would not fit in the brown suitcase because it was too big. What was too big?
Answer 0: the trophy
Answer 1: the suitcase
II. The town councilors refused to give the demonstrators a permit because they feared violence. Who feared violence?
Answer 0: the town councilors
Answer 1: the demonstrators
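To make the format concrete, questions like the two above could be represented programmatically as simple data records. This is only an illustrative sketch; the `WinogradSchema` class, its field names, and the `score` helper are hypothetical and not part of any official WSC tooling:

```python
# Hypothetical representation of Winograd schema items.
# Class and field names are illustrative assumptions, not an official format.
from dataclasses import dataclass

@dataclass
class WinogradSchema:
    sentence: str         # sentence containing the ambiguous pronoun
    question: str         # asks what the pronoun refers to
    candidates: tuple     # the two possible referents
    correct: int          # index of the answer obvious to humans

schemas = [
    WinogradSchema(
        sentence=("The trophy would not fit in the brown suitcase "
                  "because it was too big."),
        question="What was too big?",
        candidates=("the trophy", "the suitcase"),
        correct=0,
    ),
    WinogradSchema(
        sentence=("The town councilors refused to give the demonstrators "
                  "a permit because they feared violence."),
        question="Who feared violence?",
        candidates=("the town councilors", "the demonstrators"),
        correct=0,
    ),
]

def score(answers, schemas):
    """Fraction of schemas for which the chosen referent is correct."""
    return sum(a == s.correct for a, s in zip(answers, schemas)) / len(schemas)
```

A program that guessed the second candidate on the second question would score 0.5 here; the challenge is that picking the right index reliably requires the commonsense reasoning described below, not text lookup.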
The answers to these multiple-choice questions should be fairly obvious to the average person, but ambiguous to machines that lack human-like reasoning or intelligence. Indeed, humans draw on a surprising number of cognitive skills when answering them, including spatial and interpersonal reasoning, knowledge of the typical sizes of objects and of how political demonstrations unfold, and many other kinds of commonsense reasoning.
What's more, because the questions are deliberately ambiguous, a machine can't simply look up the answer on the Internet or in a database; it has to reason it out. So an expert system like IBM's Watson would have a very hard time with this test, as it was designed for different tasks altogether, namely natural language processing and data acquisition and analysis (which it handles by taking a probabilistic approach).
The WSC is being proposed as a way to measure and track progress in the development of automated human-like reasoning skills. To that end, a test will be administered annually at the Commonsense Reasoning Symposium, the first to be held at the 2015 AAAI Spring Symposia at Stanford University from March 23 to 25, 2015. Each year, the test will feature an entirely new set of questions. Researchers and students will be invited to design computer programs that simulate human intelligence. The first entrant to meet the baseline for human performance will receive a grand prize of $25,000.