However, the answer to the question, "Why is it taking so long for speech technology to catch on?" is simpler than the article lets on. For speech technology, there's really no "catching on" to do. People don't like to use speech systems because there are so many bad ones out there, and the likelihood that any given caller has had to deal with a bad one is pretty high. That sets expectations for the next time someone calls and encounters an unfamiliar speech system. If speech systems were uniformly good, people would use them. A question more to the point is, "Why are there so many bad speech systems out there?"
Part of the reason is the difficulty of implementing speech recognition systems properly. I don't mean being able to code up a small application that can pass a handful of test cases conducted under optimal conditions by one speaker in a test lab during QA. I'm talking about large, enterprise-critical applications that handle thousands of calls a day from callers all over the country under every condition imaginable. Getting big applications to function properly takes work and re-work and tuning and constant monitoring. It takes knowing all of the tricks that are available for improving speech recognition performance, and the willingness to implement those tricks despite the expense. Veteran IT people with no experience in speech recognition underestimate the amount of work it takes to get speech systems working right. People who work in speech recognition say that the technology is mature, and that's correct in a somewhat narrow sense. But there still isn't enough experience with the technology to prevent a lot of half-right systems from being released.
The larger problem, however, isn't with the technology. It's with the managers and stakeholders who try to do too much with speech recognition systems. Speech recognition applications are intended to handle simple, repetitive requests that don't require thinking: password resets, simple routing requests, caller identification, forms, rate information. The logic behind implementing speech applications is that the applications handle the simple stuff, and everything else gets passed off to an agent. Unfortunately, managers tend to get carried away with the possibilities of a speech system and insist on functionality that is difficult to implement properly, will rarely or never be used, or simply doesn't solve a business problem. It's Jetsonian thinking at work. More than that, though, it reflects an attitude that customers are only too willing to use anything you put in front of them.
I talked to Ballantine after one of his presentations. He'd made some good points about building systems to solve business problems, and he'd addressed something that had been bothering me: the insistence of some speech advocates and business owners on creating "natural sounding" systems. I said that I didn't think we should be trying to create systems that could pass the Turing Test. He readily agreed. We'll move the practice forward when we keep our focus on what customers want, and create our business cases to align with their needs.