Tuesday, June 26, 2007

Awash in real interaction data

In the past I've lamented the lack of published VUI user data. If you go to the big design conferences (HFES, CHI, UPA, etc.) you'll find lots of data from studies on web navigation and mobile device use, but very little on speech interfaces. I know this because when I present a study at these conferences, the speech session (if there is one) is very small. Good science and practice are built on good data, and right now there is not much to go on.

There are, in fact, lots of data. Nearly all implementations go through at least one tuning cycle in which caller utterances are recorded, transcribed, and matched against the speech recognizer's responses. These data are then analyzed for things such as grammar coverage and the appropriateness of the prompts. Each tuning exercise generates a LOT of data. Unfortunately, the data are usually locked up by the company conducting the tuning exercise, never to see the light of day. There are legitimate reasons why companies don't share: publication is of little value to the company that produces the data, and competitors could take advantage of it without responding in kind. Even scrubbed data could be used to identify the company or business units for which they were produced. So I can see why companies are disinclined to distribute their data.
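To make "grammar coverage" concrete, here's a minimal sketch (in Python) of the kind of analysis a tuning cycle supports. The utterances, grammar, and numbers are invented for illustration - this isn't drawn from any real deployment.

```python
# Hypothetical tuning data: for each call, the human transcription of what
# the caller said, paired with the recognizer's hypothesis (None = rejected).
utterances = [
    ("balance",                  "balance"),
    ("checking account balance", "checking account balance"),
    ("talk to a person",         "account services"),  # misrecognition
    ("what's the weather like",  None),                # out of grammar
]

# Toy grammar: the phrases the application is actually built to handle.
grammar = {"balance", "checking account balance",
           "talk to a person", "account services"}

# Grammar coverage: the share of caller utterances the grammar covers.
in_grammar = [(t, h) for t, h in utterances if t in grammar]
coverage = len(in_grammar) / len(utterances)

# In-grammar accuracy: of the covered utterances, how many were
# recognized correctly.
correct = sum(1 for t, h in in_grammar if t == h)
accuracy = correct / len(in_grammar) if in_grammar else 0.0

print(f"Grammar coverage:    {coverage:.0%}")   # 75%
print(f"In-grammar accuracy: {accuracy:.0%}")   # 67%
```

Multiply this by the thousands of calls in a typical tuning cycle and you get a sense of how much real behavioral data is sitting in those archives.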

Still, I wonder what sort of intervention could be put in place to motivate companies to share their tuning data for the purpose of improving the best practices in the industry. If anyone has ideas, let me know.

Thursday, June 21, 2007

Deckard's Rules for VUI Development

In my last post I raved about the great science fiction movie Blade Runner. Based on some of the issues raised in the movie, I'd like to propose (only partly in jest) Deckard's Rules for VUI Development. The rules are named for the character Deckard rather than Tyrell because (1) Tyrell got it wrong and (2) Deckard had to clean up the mess.

Rule 1: Don't over-engineer your system
Replicants Roy and Leon were built as fighters, presumably to protect property on off-world colonies. They were also created smarter and stronger than humans, and could pass for humans on Earth. Is it a good idea to build autonomous agents that are smarter and stronger than you and give them the capacity and motive to kill? No. It's a bad idea. Build your VUI system to do what its users need it to do and nothing more. And don't try to build something that could pass for a human.

Corollary to Rule 1: Keep your eye on the ROI
The technically sophisticated replicants were designed to self-destruct after four years, out of fear that they would develop uncontrollable feelings and emotions if allowed to live any longer. Replacing advanced technology is extremely expensive - better to build your system to last, and maintain it as needed.

Rule 2: Words are powerful shapers of behavior - be brief and to the point
Deckard and Sebastian were manipulated into doing what others wanted them to do: Deckard into chasing replicants and Sebastian into setting up a meeting between Roy and Tyrell. The dialogs used to do this were terse, but gave the targets enough direction that they understood what needed to be done. Good VUI dialogs are short and give users direction on what they need to say, without resorting to the painful "In order to do x, say x" construction.
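To make the contrast concrete, here's a toy sketch (the banking menu and wording are my own invention, not quoted from any real application) showing the two prompt styles for the same three options:

```python
# Hypothetical three-option banking menu.
actions  = ["check your balance", "make a transfer", "speak to an agent"]
keywords = ["balance", "transfer", "agent"]

# The construction Rule 2 warns against: one long sentence per option.
verbose = " ".join(f"In order to {a}, say '{k}'."
                   for a, k in zip(actions, keywords))

# A terse prompt that still tells callers exactly what they can say.
terse = ("You can say " + ", ".join(f"'{k}'" for k in keywords[:-1])
         + f", or '{keywords[-1]}'.")

print(verbose)
# In order to check your balance, say 'balance'. In order to make a
# transfer, say 'transfer'. In order to speak to an agent, say 'agent'.
print(terse)
# You can say 'balance', 'transfer', or 'agent'.
```

The terse version delivers the same information in a fraction of the words, and callers don't have to hold a paragraph in memory to act on it.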

Rule 3: Our methods for evaluating advanced technology aren't good enough - we need better tools
The Voight-Kampff test for determining whether a subject was a replicant or a human was nearly obsolete: it took Deckard, a trained evaluator, 100 questions to determine Rachael's identity. Our evaluation methods for VUIs are in a similar state. We still rely mostly on usability tests and questionnaires that were developed to evaluate web pages and GUIs. I gave a presentation at a UPA workshop in 2002 pointing out that our old usability test protocols aren't sufficient for evaluating things like autonomous agents and speech systems. We need better evaluation methods.

Tuesday, June 19, 2007

Blade Runner: Director's Cut

I watched the 1982 movie Blade Runner the other night. I think it's one of the great science fiction movies of all time, but don't take my word for it - check out the online reviews. In the world of the future, dangerous work on other planets is done by "replicants," near-human artificial beings who form an exploited class. Harrison Ford plays a detective who must hunt down and destroy several replicants who have traveled back to Earth to escape their enslavement. How near the replicants are to humans is one of the movie's main concerns. The movie revisits, with a little twist, one of the oldest arguments in artificial intelligence: if a machine demonstrates behaviors that in a human would be acknowledged as "intelligent," can we call the machine intelligent? Many philosophers insist that no, intelligence is a property of humans alone; it can reside only in things that think with wetware - biological brains. Computers think with silicon and wires and thus can never be considered intelligent.

The little twist introduced in the movie is not over the replicants' supposed intelligence - the replicants clearly are intelligent. The movie even tweaks an old AI argument in one scene, when one of the replicants beats its creator at chess. At one time it was said that machines could never play a good game of chess because chess requires insight and intuition - very high-level forms of intelligence that machines could never possess. What's more, machines can only do what you program them to do, so how can you program something that's better than yourself? Of course, that argument has since been settled pretty decisively.

Instead, the movie explores whether the artificial beings are capable of emotion and feeling. The replicants show a great range of emotion: anger and cunning, loyalty, sadness at the death of one of their group, resentment over the treatment of others in their class, and fear of their impending deaths. In fact, they demonstrate far more emotion than the humans in the movie. Harrison Ford's character is a tough-guy detective type who drinks to suppress his emotions. His employer shows little concern for him despite his poor condition.

So, the question: if the replicants display what appears to humans as emotion, are they really feeling emotion? If so, do we need to be concerned about their exploitation? Were the replicants' emotions something they were programmed to display, or a side effect of having been made so complex? Lots of interesting questions, with no answers forthcoming in the movie. Certainly Ford's detective is affected by the replicants: he feels physically ill after destroying one, and it's suggested that the demands of the job are one reason he drinks so much. By the end of the movie, one's sympathies lie with the exploited replicants.

What does this have to do with VUI? Besides the obvious fact that the replicants have astonishing speech recognition capabilities (there's not a single "Sorry, I didn't understand" in the whole movie), there may be lessons here for those who concern themselves with the design of a VUI's persona. I'll discuss the lessons in my next posting.

Thursday, June 14, 2007

800 GOOG 411

I tried Google's speech application for finding business phone numbers and successfully found a business in Boulder, CO, on my first attempt. I'd need to work with it more before passing judgment on how well it works, but what I found remarkable was the presentation. Here's the initial greeting:

"Calls recorded for quality. GOOG 411 experimental. What city and state?"

It's quick and businesslike - no effort to be cute or fancy. Let me be the first to say (at least I think I'm the first) that Google has apparently tried to capture the visual presentation of its web page in an auditory presentation, and it succeeds. Its web page is just a white page with a logo, an input box, two buttons, and a small number of links. If you were trying to translate that visual presentation into a VUI, you couldn't do a better job than Google has.

By being simple, almost terse, Google has created a unique, differentiated experience. If the speech application's performance is as good as its web site's, Google will have a winner.

Sunday, June 3, 2007

VUI best practices Pt. II - good or evil?

In my last entry I discussed one of the disadvantages of following user interface design guidelines: if the guidelines are poorly written, the quality of the designs based on them will suffer (and the interface designers who must follow them will push back). That much should be obvious. But what about well-written guidelines? As pointed out in the previous entry, following even well-written design guidance has disadvantages.
  • The second danger of standards and guidelines: if taken as gospel, they constrain active thinking by the designers who rely on them.

Designers who make informed judgments about stepping outside of common practice to try something new are engaging in what Diego Rodriguez and Ryan Jacoby call "design thinking": the act of taking risks in order to produce distinctive and usable products. Rodriguez and Jacoby's excellent article on design thinking asserts that designers take risks in order to learn and to excel, but mitigate those risks with skills that should be in every designer's skill set: prototyping, storytelling, and the ability to actively listen to customers.

So where does that leave user interface guidelines?

In fact, knowing when to ignore the rules and push boundaries, and when to re-use portions of previous designs and practice, are both marks of a good designer. Well-written guidelines are one way that re-use is achieved, and re-use is generally a good thing. The best guidelines are created (designed) from user data and from practice; they're a way of capturing the experience designers have with their designs and putting it into a usable form. The trick is knowing when to re-use and when to take a chance on designing something truly novel.

The VUI design world, in particular, suffers from a lack of published, reliable data on which to base informed design decisions. Go to any CHI or HFES conference and you'll find loads of studies on web and GUI applications - those domains have been under investigation for years. Not so in VUI design. As some speakers at VUI conferences insist, we really do need better design guidelines. Before we get there, though, we need more and better data.

Take-home message: yes, "design thinking" is good, and guidelines can be misused if taken as gospel. But we need a way to re-use our successes without resorting to copying. That's design thinking as well.