Updating the Turing Test

Alan Turing proposed a simple test for machine intelligence. Basing it on a parlour game in which players try to tell whether a hidden person is a man or a woman just by passing notes, he suggested we define a computer as intelligent if people can't tell it from a human being through conversations with both over a teletype.

While this seemed like a great test (for those who accept that external equivalence is sufficient), in fact, to the surprise of many people, computers passed it long ago with ordinary, untrained examiners. Today there has been an implicit extension of the test: the computer must be able to fool a trained examiner, typically an AI researcher, an expert in brain sciences, or both.

I am going to propose updating it further, in two steps. Turing proposed his test perhaps because at the time, computer speech synthesis did not exist, and video was in the distant future. He probably didn't imagine that we would solve the problems of speech well before we got a handle on actual thought. Today a computer can, with a bit of care in programming inflections and such into the speech, sound very much like a human, and we're much closer to making that perfect than we are to getting a Turing-level intelligence. Speech recognition is a bit behind, but also getting closer.

So my first updated proposal is to cast aside the teletype and make it a phone conversation. It must be impossible to tell the computer from another human over the phone, or over an even higher-fidelity audio channel.

The second update is to add video. We're not as far along here, but again we see real progress, both in the generation of digital images of people and in video processing for object recognition, face-reading and the like. This stage requires that the computer be impossible to tell from a human in a high-fidelity video call. With 3-D goggles it might even be a 3-D virtual reality experience.

A third potential update is further away, requiring a fully realistic android body. In this case, however, we don't wish to constrain the designers too much, so the tester would probably not get to touch the body, weigh it, test whether it can eat, or demand that it stay away from a charging station for days, etc. What we're testing here is the being's "presence" -- fluidity of motion, body language and so on. I'm not sure we need this test, as we can probe these things in the high-fidelity video call too.

Why these updates, which may appear to depart from the "purity" of the text conversation? For one, things like body language, nuance of voice and facial expression are a large part of human communication and intelligence, so to truly accept that we have a being of human-level intelligence, we would want to include them.

Secondly, passing this test is far more convincing to the general public. While the public is not very sophisticated and can even be fooled by an instant messaging chatbot, the feeling of equivalence will be much stronger when more senses are involved. I believe, for example, that it takes a much more sophisticated AI to trick even an unskilled human when presented through video, and not simply because of the difficulty of rendering realistic video. It's because these communication channels are important, and in some cases felt more than they are examined. The public will understand this form of Turing test better, and more people will accept the consequences of declaring a being as having passed it -- which might include giving it rights, for example.

Though yes, the final test should still require a skilled tester.


You wrote: "Today a computer can, with a bit of care in programming inflections and such into the speech, [sound] very much like a human,..."

So true. I created a short video clip on this theme and posted it on YouTube, purely for entertainment. It isn't supposed to be an actual Turing test or a demonstration of one, but it is my way of showing how advanced software can communicate with humans -- a new way of human-computer interaction.

Hallie talks with Dave (a look into the future of Voice Recognition)

The software used in the demo is e-Speaking (http://www.e-Speaking.com) in case you're interested. The demo is a series of voice commands, with the computer responding to each of my commands with an elaborate, pre-programmed statement. Again, this isn't exactly Artificial Intelligence. But it does seem a little like a conversation between man and machine. ;-)

I believe Darwin was the first to suggest that visual expression of emotions, primarily through facial gestures, evolved to make it difficult to fake emotions, thus letting one know if someone else is REALLY sad, angry, etc.

Thus, your update of the Turing test is in a good tradition!

Nice idea, but something like this exists already. It's called the Loebner Prize, and it is actually even more demanding than your test, since fully winning it includes visual recognition as well.

I think that's inherent in any videophone test, because the human doing the testing will communicate visual cues to the computer, which it must be able to read.

My goal here has some similarity to Loebner's but at the core of what I propose is not so much making the test harder, as making the test more convincing to the general public. I believe we are likely to create good code for audio/visual recognition and generation before we develop AGI, so these goals don't make the test any harder to pass. They just make the public better able to understand what passing means.

In Turing Test Two, two players, A and B, are again questioned by a human interrogator, C. Before giving his own answer to a question (labeled aa), A is also required to guess how the other player, B, will answer the same question; this guess is labeled ab. Similarly, B gives her answer (labeled bb) and her guess at A's answer, ba. The answers aa and ba are grouped together as group a, and bb and ab as group b. The interrogator is first given the answers as two separate groups, with only the group labels (a and b) and without the individual labels (aa, ab, ba and bb). If C cannot tell correctly which of aa and ba is from player A and which is from player B, B gets a score of one. If C cannot tell which of bb and ab is from player B and which is from player A, A gets a score of one. All answers (with their individual labels) are then made available to all parties (A, B and C), and the game continues. At the end of the game, the player who scored more is considered to have won the game and to be the more "intelligent" of the two.
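The scoring bookkeeping in this protocol can be sketched in code. This is just a minimal illustration of the rules as described above; the function names (`score_round`, `play`) are mine, and each round is reduced to two booleans recording whether the interrogator identified each group correctly:

```python
# Sketch of "Turing Test Two" scoring. Group a holds A's own answer (aa)
# and B's guess of A's answer (ba); group b holds B's own answer (bb)
# and A's guess of B's answer (ab).

def score_round(c_correct_on_group_a, c_correct_on_group_b):
    """Return (a_score, b_score) for one round.

    c_correct_on_group_a: True if interrogator C correctly identified
        which of aa and ba came from player A.
    c_correct_on_group_b: likewise for bb and ab in group b.
    """
    # B scores when C fails on group a (B's guess of A was indistinguishable).
    b_score = 0 if c_correct_on_group_a else 1
    # A scores when C fails on group b.
    a_score = 0 if c_correct_on_group_b else 1
    return a_score, b_score

def play(rounds):
    """Tally scores over a game; rounds is a list of
    (c_correct_on_group_a, c_correct_on_group_b) pairs."""
    a_total = b_total = 0
    for correct_a, correct_b in rounds:
        a, b = score_round(correct_a, correct_b)
        a_total += a
        b_total += b
    return a_total, b_total

# Example game: C identifies group a correctly in the first two rounds
# but group b only in the last round, so A scores 2 and B scores 1.
print(play([(True, False), (True, False), (False, True)]))  # → (2, 1)
```

Note that a player scores by being impenetrable to C when imitated or imitating, which is what makes the higher scorer "more intelligent" under this proposal.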


While this test might reveal things about the analytic skills of the subjects, my revised test is designed to focus not just on a logical analysis of how well questions are answered, but on the "feel" of a conversation with a real, creative, thinking being. I find this distinction interesting because while we might readily conclude that an AI that can solve problems we can't solve is intelligent, it will take more to convince the public that it's not a "soulless machine." That will take people falling in love with AIs (as people often fall in love over just the phone) and more. Not just questions.

The Turing test has been criticised for being too focused on human-like intelligence: "We don't judge an airplane by whether we can tell it from a bird." And indeed, there will be areas of intelligence for which the TT is useless. But when it comes to convincing the public to give such beings human rights, it will go further.
