Moderated by Stephen Muggleton.

Donald Michie

Return of the Imitation Game

The article mentioned above has been submitted to the Electronic Transactions on Artificial Intelligence, and the present page contains the review discussion. Further explanations, and the webpage of the author, Donald Michie, are available from the ETAI site.
Overview of interactions

No.  Question                          Answer(s)  Continued discussion

1    15.10  Anonymous Referee 1

2    15.10  Anonymous Referee 2

Q1. Anonymous Referee 1 (15.10):

This paper describes both weak and strong versions of Turing's "imitation game" and some considerations about what might be necessary for a system to pass the strong version. It then goes on to describe the three steps of Turing's "Child-Machine" project and some aspects of what would be involved in achieving parts of these steps. The paper concludes by describing research undertaken by the author and his collaborators on their SOPHIE system. I found all these aspects of the paper -- the historical background, the abilities required to satisfy the strong form of the imitation game, and the description of the SOPHIE system -- very interesting and recommend the paper be accepted.
My comments below are divided into two parts. The first part contains points that I would be inclined to make as a reviewer of the published version, or as a commentator in open debate at a workshop. They stem more from my particular view as an AI researcher than from any consideration of how to improve the manuscript, which, apart from a few details that I mention in the second part, seems quite fine. I do not mean to imply that this first batch of arguments should necessarily influence the final version, but I give them here anyway. (The page numbers used in this review correspond to those that appeared on my Microsoft Word version of the paper.)
Part One -- Some Arguments
Many (alas not enough!) AI researchers still believe in Turing's goal of building a machine able to "compete with men in all purely intellectual fields." To attempt, as the author does on page 7, to substitute for Turing's goal that of cooperating "with men (and women) in all purely intellectual fields" sounds a bit like trying to steal the show. The author's "cooperation goal" is certainly worthy, and may in the end be the best route to achieving Turing's goal. But I think it would have been better to propose the cooperation goal as one that stands alongside Turing's rather than replacing it.
I think overmuch is made of "chat" in the paper. Being able to engage in chat-like conversation is, of course, important in its own right, and it is probably necessary in order to satisfy the strong form of the imitation game. It may even, as the author says, place "on the shoulders of AI a new responsibility." But I don't think being able to chat is anywhere near sufficient! The reader may rightly wonder whether the importance of chat is being unduly elevated because chat is one of the things SOPHIE is able to do (to some extent). The author does mention (in connection with Laird's work at the top of page 5) some of the other things besides chat that might be necessary, but there is not enough acknowledgement that these things must ultimately be added to SOPHIE's descendants if they are to pass the test that Turing thought might be passed in 2050. A system being "ever-ready to default to chat-mode to sustain rapport" seems to me to be putting the cart before the horse.
I like very much the emphasis on "rapport maintenance" discussed on pages 8 and 9. Some researchers might point out, though, that "real rapport" (which might be required to pass the strong version of the imitation game) will be a good deal harder to achieve than illusory or surface rapport (which I'm afraid is all that systems like SOPHIE are capable of).
Regarding parsing, even the developers of some of the commercial question-answering systems that do not presently do parsing admit that ultimately they will have to do some linguistic analysis.
I'm not as optimistic as the author is (on page 6) that "Step 1 does not look in too bad shape." To "acquire knowledge" will require (in my opinion) already having a great deal, and I'm not sure that even Doug Lenat thinks he is quite ready yet.
In terms of the steps (on page 6) of the child-machine programme, I take it that SOPHIE is intended as a contribution toward Step 2 (Integrate), which in itself is a pretty tall order. I would have liked to see an analysis of just which aspects of Step 2 SOPHIE is capable of and which are still beyond the grasp of SOPHIE-like systems. Although SOPHIE can engage a user in conversation at a superficial level and even answer a good number of detailed questions about specific domains, I doubt that it has (or could soon get) "sufficient language understanding to be educable, both by example and by precept." In making this conservative assessment, perhaps I have missed some features of SOPHIE, or plans that the author and his colleagues might have for adding features.
Of course, in order to achieve Step 3 (Educate), a lot more progress will need to be made on Step 2. Are there detailed plans for making this progress using something like SOPHIE as a platform?
Part 2 -- More Detailed Comments About the Paper
Page 2: NativeMinds should be written without a space between Native and Minds.
I am not sure I would describe the four applications listed on page 2 (plus the NativeMinds application) as examples of bluffing their way through an interactive session. Although the illusion of interacting with a human certainly depends on bluff and superficial mechanisms, the imparting of information in these applications is accomplished by somewhat deeper mechanisms. Perhaps a better job could be done of distinguishing which parts of these programs are bluff and which parts are real, perhaps specifically when describing SOPHIE.
Pages 4 and 5: Can the part about ascribing mental qualities to machines be more tightly connected to the rest of the paper? Clearly people do ascribe mental qualities to machines, as the author points out, but is such ascription necessary for useful human-machine interaction? If so, perhaps that point could be made explicitly.
Pages 9 and following: I would add something to distinguish SOPHIE from earlier conversation programs such as ELIZA, DOCTOR, and PARRY. ELIZA, at least, was offered as an example of how much could be done toward fooling a human using only very superficial mechanisms. The reader is in danger of concluding that SOPHIE is not much more sophisticated unless s(he) is given a peek at some of the details.

Q2. Anonymous Referee 2 (15.10):

Recommendation: The paper can eventually be accepted for ETAI. Another refereeing round is needed.
This is a fascinating paper that relates the classical topic of Turing's imitation game to the interactive situations that are offered on a very large scale by modern chat systems and Internet-based customer support systems. The basic message of the paper is clear and convincing.
I do feel, however, that the paper does not report solid results to the degree expected of a journal paper. It would seem to be quite easy to remedy this, in either or both of the following ways:
1) Explain how the program organizes the dialogue. It appears from the two examples given in the article that the program often uses S-R behavior where sentences of a certain kind constitute one kind of stimulus, but presumably it also has a 'goal' or 'direction' construct that provides coherence to the conversation. Is this so and how does it work? Does it maintain a current state for the dialogue agent; if so what are the most important components of that state? These are just a few examples of the questions that the reader is likely to ask.
2) Report the statistical outcome of experiments with the systems described in the article. This is particularly natural in view of the criterion that Turing himself set up and that is mentioned in the article: at most 70% correct identifications after 5 minutes of interaction. Would the present system meet that criterion? If the question is not an appropriate one to ask, why is it not?
I also find it a bit difficult to relate the chat dialogue described within the article to the customer service facility described in the appendix. It is plausible that the same chatting facility can be used for both, but how does the system switch back and forth between serious dialogue and chat? This would seem to be of major importance in any application. After all, you don't want your interlocutor to break into chatting at a significant point in a conversation, e.g. at the 'take it or leave it' point in a negotiation (unless the system does it on purpose in order to gain time, but even then it must be very aware of what it is doing in order to do it right). Also, getting back gracefully from chat mode to serious mode is an art in itself: how is it transmitted to the computer?
I hope these comments can help to further strengthen what is already at this point an interesting article.

Background: Review Protocol Pages and the ETAI
This Review Protocol Page (RPP) is part of the webpage structure for the Electronic Transactions on Artificial Intelligence, or ETAI. The ETAI is an electronic journal that uses the Internet medium not merely for distributing the articles, but also for a novel, two-stage review procedure. The first review phase is open and allows the peer community to ask the author questions and to create a discussion about the contribution. The second phase, called refereeing in the ETAI, is like conventional journal refereeing, except that the major part of the required feedback is supposed to have occurred already in the first (review) phase.
The referees make a recommendation whether the article is to be accepted or declined, as usual. The article and the discussion remain on-line regardless of whether the article was accepted or not. Additional questions and discussion after the acceptance decision are welcomed.
The Review Protocol Page is used as a working structure for the entire reviewing process. During the first (review) phase it accumulates the successive debate contributions. If the referees make specific comments about the article in the refereeing phase, then those comments are posted on the RPP as well, but without indicating the identity of the referee. (In many cases the referees may simply return an "accept" or "decline" recommendation, namely when sufficient feedback has already been obtained in the review phase.)