Tuesday, September 15, 2009

Appropriateness as a criterion for speech IVR

Someone who is putting together a conference symposium on "the impact of speech technology on society" pinged me and a whole bunch of other opinionated people on this topic and asked for input. My quick effort to be an IVR/speech industry forward-thinker follows.

The IVR industry will move away from homegrown, do-everything speech recognition systems running on a company's own platform and toward highly specific, appropriate applications running on hosted platforms. A lot of companies have spent a ton of money on speech development tools and handed them to IT people who figure that speech is like every other technology: you just hack on it until you get it right. When it doesn't work, the companies blame the technology, and the IT department (or communications department) makes excuses.

Examples of highly specific, appropriate applications are name capture and city and address capture. The applications that do this well are built by speech recognition companies that understand what they are doing and that spend a long time designing, testing, and refining them. These small, self-contained applications may be embedded in larger apps that are DTMF-only.

Hosting allows the speech vendor to gather an enormous amount of data from many companies' callers and use that data to improve its product. IT departments at individual companies don't have the resources or expertise to do that. Another specific, useful application is speech-to-text transcription. Again, a few speech vendors will succeed at doing this, but most companies don't have the resources.

The idea of speech recognition systems as giant, expensive opportunities to push an auditory brand at customers who call will go away.

"Appropriateness" is a hard thing to talk to companies about. Once they've made up their minds about speech, based on conversations you weren't part of, it's very hard to pull them back and get them to think about whether speech is an appropriate modality for the IVR app they want to develop. I've tried to engage a number of people at companies, pointing out that a three-item menu of easily discriminable labels is a lot easier to implement and use if done in DTMF. No sale. Once the speech decision has been made, it's full steam ahead, and anyone who says otherwise is simply engaging in "analysis paralysis."

