Monday, July 30, 2007

Catching up on the Dragon Systems founders

Jim and Janet Baker founded Dragon Systems back in 1982. They were among the first to use Hidden Markov Models to represent spoken language. The company produced DragonDictate, the first speech recognition system for PCs, and Dragon NaturallySpeaking. They sold the company in 2000.

I read an interesting news item on Jim Baker recently. He's moving from Carnegie Mellon to Johns Hopkins University to work on a new research project for the Defense Department and the National Security Agency. The NSA wants to be able to conduct surveillance on millions of phone conversations, and today's speech recognition software isn't up to the task. Thus, they've funded one of the founders of the field to get speech recognition unstuck.

I'm conflicted about this. I admire Baker for the pioneering work he did on speech recognition. I understand that his initial research was funded by the DoD through ARPA, and much good came of that research eventually. Baker is good enough to be able to move the engineering of speech recognition forward. However, I don't trust the motives of the NSA, and if this is made to work properly, we'll have on our hands another Big Brother-style technology available to the government for eavesdropping on US citizens.

Maybe we'll get lucky. Maybe the funding will run out, the program will be seen as an expensive failure, and the NSA will say "so long and good luck." Then Baker will release some really groundbreaking research that will be of benefit to everyone working in speech. That's what I'd like to think.

Wednesday, July 25, 2007

Notes from the Home Office

I work at home, from an office just off the front door of my house in Durham, NC. I've joined a growing number of employees who work from home, so in that sense, at least, I'm at the front of the movement out of traditional office spaces and into alternative work environments. Companies like the arrangement because it saves them from moving employees and giving them office (or cubicle, or workroom, or anything else) space. I like the arrangement for a number of reasons.

  • I'm a VUI designer for telephony systems, so much of my work is well-suited for working by telephone.
  • I can focus on doing work. I don't spend much time on administrative stuff and office politics, the twin curses of my previous positions.
  • I spend no time at all on unproductive travel by car.
  • I can listen to my blues and old-time music and no one complains.
  • I've always been good about writing reports as evidence of my work, and that's a useful skill to have if you work remotely.

I guess I could list some shortcomings to this arrangement. One is that I don't get to meet some of the people I work with and rely on, so there's no chance to socialize and get to know them as people. Or at least it isn't as easy. Over the long term, you need to be in a traditional office setting in order to move up, but I'm not worried about moving up. If you have any experience with the pitfalls of working at home, let me know.

Tuesday, July 17, 2007

More GOOG 411 - call recordings

Here's an interesting item about 800 GOOG 411: Google is recording calls for the purpose of improving its application's performance. I've blogged about GOOG 411 before; the service is a simple way to find phone numbers of businesses in any area of the country.

There's nothing unique or objectionable about recording speech samples in order to improve your application performance. Nearly anyone who uses speech IVRs probably wishes they would work better than they do. The article that I linked to quotes Google's plain-language privacy policy, stating that Google is recording calls and, for good measure, collecting ANIs in order to "personalize" the caller's experience. The article goes on to state that the recordings are used for "phonemic analysis" and "voice prints," and conjures an "Orwellian" scenario out of this information.

Whoa. Let's take a breath here. Recordings of interactions between a caller and an IVR don't necessarily mean that they're being used for "phonemic analysis." I listen to recorded calls all the time as part of tuning exercises to improve an IVR application's performance, but there's no "phonemic analysis" involved. And as far as storing voice prints, for the amount of speech that GOOG 411 requires for a search, it would be a pretty ineffective way of collecting a voice print. Not to say that it couldn't be done, but it's not the way voice prints are usually collected.

There's no doubt about one thing: people get very concerned over voice prints and other types of biometrics. I've conducted research on consumers' perceptions of voice prints and what it takes to get people to trust the technology enough to use it. There is genuine mistrust of biometric technologies that companies who employ biometrics need to deal with.

However, I can't find any reference to voice prints in any of the information provided in this article. The author read "recordings" and thought "voice prints." If that's a typical response from a customer to a "calls recorded for quality" announcement, then we all need to do some serious customer education. If Google is, in fact, collecting voice prints, I'd sure like to know how they are doing it.

Friday, July 6, 2007

Why consumers won't use speech telephony systems

I love reading Bruce Balentine's stuff. His writing is not only correct from a technical perspective, but it's thought-provoking and entertaining as well. In the article "Why is it Taking So Long for Speech Technology to Catch On," Balentine describes an attitude towards technology that he calls "Jetsonian thinking," a naive belief in the goodness of technology to solve all business problems. I've seen a lot of Jetsonian thinking among business and IT people in the past, and it captures in a neat phrase the phenomenon of project sponsors charging forward on a technical solution without a solid business case.

However, the answer to the question, "Why is it taking so long for speech technology to catch on," is simpler than the article lets on. For speech technology, there's really no "catching on" to do. The reason people don't like to use speech systems is that there are so many bad ones out there, and the likelihood that people have had to deal with a bad one is pretty high. That sets expectations for the next time someone calls and encounters an unfamiliar speech system. If speech systems were uniformly good, then people would use them. A question more to the point is, "Why are there so many bad speech systems out there?"

Part of the reason is the difficulty of implementing speech recognition systems properly. I don't mean being able to code up a small application that can pass a handful of test cases conducted under optimal conditions by one speaker in a test lab during QA. I'm talking about large, enterprise-critical applications that handle thousands of calls a day from callers all over the country under every condition imaginable. Getting big applications to function properly takes work and re-work and tuning and constant monitoring. It takes knowing all of the tricks that are available for improving speech recognition performance, and the willingness to implement those tricks despite the expense. Veteran IT people with no experience in speech recognition underestimate the amount of work it takes to get speech systems working right. People who work in speech recognition say that the technology is mature, and that's correct in a somewhat narrow sense. But there still isn't enough experience with the technology to prevent a lot of half-right systems from being released.

The larger problem, however, isn't with the technology. It's with the managers and stakeholders who try to do too much with speech recognition systems. Speech recognition applications are intended to handle simple, repetitive requests that don't require thinking: password resets, simple routing requests, caller identification, forms, rate information. The logic of implementing speech applications is that the applications handle the simple stuff, and everything else gets passed off to an agent. Unfortunately, managers tend to get carried away with the possibilities of a speech system and insist on functionality that is difficult to implement properly, will rarely or never be used, or simply doesn't solve a business problem. It's Jetsonian thinking at work. More than that, though, it's an attitude that customers are only too willing to use anything you put in front of them.

I talked to Balentine after one of his presentations. He'd made some good points about building systems to solve business problems and addressed an issue that had been bothering me about some speech advocates' and business owners' insistence on creating "natural sounding" systems. I said that I didn't think we should be trying to create systems that could pass the Turing Test. He readily agreed. We'll move the practice forward when we keep our focus on what customers want, and create our business cases to align with their needs.

Thursday, July 5, 2007

Just for fun: Geek Techs IVR

I haven't written much about persona - yet. I think it's an important topic, but it gets altogether too much attention. There are lots of strongly held opinions by designers and managers alike, and not much data. There isn't even a consensus on a definition of persona, so a lot of arguments occur just over lack of agreement on terms.

With that said, I found a simple DTMF (touchtone only) application that has a persona that really hits the mark. Geek Techs (877 433-5835) IVR is just a simple routing application, but the voice perfectly evokes an image of an earnest, helpful, socially challenged computer geek with black horn-rimmed glasses held together by tape and a plastic pocket protector. The IVR also contains a bit of fun for callers who may already be pretty frustrated with their computers:

“To hear the sound of a computer becoming absolutely disintegrated by a 10 pound sledgehammer, press 5.”

Of course, if the application misroutes callers or they're left in queue for a long time, the goodwill created by the IVR's persona will disintegrate as well. There's a limit to what a good persona will buy you.

Monday, July 2, 2007

Local meetings - UX designers

I live in Durham, NC, part of the Research Triangle area, home to a great number of companies that employ user experience / human factors designers. We're fortunate to have enough people doing interface design work to support several active organizations of designers. I'll mention two: TriUPA, the local chapter of the Usability Professionals Association, and HFES Carolina Chapter, the local chapter for the Human Factors and Ergonomics Society.

I went to lunch last week with about eight others in the TriUPA group. We talked shop, traded business cards, and caught up on everyone's news. Chapter president Abe Crystal gave me a nice new book, The Myths of Innovation by Scott Berkun, on the condition that I write a review of it for the TriUPA website. I'll do that soon, I promise.

Having other designers around to talk to and trade ideas with is invaluable. I was in a town for many years that couldn't, for various reasons, support a community of designers, and it was a long, tough slog. If you're fortunate enough to be in a place that supports an active community, be grateful for what you have, and say thanks to the people who do the work to pull events together (thanks, Abe and Jackson). If you don't have that community, think about what you can do to build one. It's a lot of work, and I'm sorry to say that I haven't done my part, but it's a valuable thing, and credit accrues to those who make the effort.