Wednesday, July 30, 2008
Speech to text: Jott
A few weeks ago I blogged about the Spinvox demo that transcribes a short message and texts it to your cell phone. Then someone pointed me to Jott, which transcribes your message and emails it to someone in your phone book. I'm impressed. Voice transcription has come a long way since I started playing with IBM's ViaVoice years ago. The old transcription systems were speaker-dependent, and they needed a high-quality microphone and a lot of training to perform well. Spinvox and Jott are doing speaker-independent transcription, and doing it for callers on cell phones as well. There are a lot of potential applications out there, just waiting for someone to develop them.
Sunday, July 27, 2008
"Specialized" MBA programs
Here's a brief article in BusinessWeek on the rise of specialized MBA programs. Two students from the program at NC State are quoted in the article. I'm in the NC State part-time program, and I was glad to see the program get some attention, but I think the article missed the mark. NC State, like other MBA programs, offers various concentrations in areas like biopharma, services, finance, and innovation, but it doesn't grant a specialty degree like a Master of Health Administration, for example. A better example of a specialty degree is the Master of Global Innovation Management, where students take classes in Raleigh and in France. At any rate, BusinessWeek tends to report on the top-tier schools, so the article was a bit of a departure from the norm.
Sunday, July 20, 2008
Remarks on Free Agent Nation
Self-employed free agents are unaccounted for in official government statistics, but they form a large proportion of the US workforce. Daniel Pink, author of Free Agent Nation, estimated in 2004 that the number of free agents of various stripes was 33 million and growing. Many of these free agents work out of their homes, using the "free agent infrastructure" of the Internet, Starbucks, Kinkos, and FedEx to conduct their business.
Free Agent Nation is both a description of the free agent workforce and an argument for why workers should cut themselves free of large companies and work for themselves. It's a convincing argument if you have a strong set of skills and an entrepreneurial attitude. The book is easy to read and full of interesting data and opinions. This interview with the author gives a flavor of the book, which was first published in 2001 and reprinted in 2004. Recommended.
Sunday, July 13, 2008
Lessons learned: Educate the management, pt. 1
One of the first things I learned about speech projects is how important it is to educate management about speech recognition IVRs. It's not a pretty scene when you roll out a speech application after a year-long effort and the business managers don't understand what they've been given.
- Tuning. Speech IVRs aren't fully tested during user acceptance testing. They can't be. Full testing depends on having large numbers of real customers hitting the IVR, evaluating the data, and making tweaks to the prompting and grammars. That's tuning. Business managers who don't know about tuning are shocked when they learn that testing isn't complete until several weeks after roll out.
- Roll out strategy. Deploying a speech IVR isn't simply a matter of flipping a switch and exposing your customers to the IVR, then walking away. Do you really want everyone to hear version 1.0 (see tuning, above)? Can your CSRs answer the inevitable questions from customers about the new system? Who is going to be responsible for ongoing observation and maintenance (see next point, below)? There are a lot of issues that management needs to settle long before the speech IVR is rolled out.
- Ongoing observation and maintenance. Believe it or not, some business managers think that once the IVR is deployed the business can just walk away from it. That may be true for some limited-functionality DTMF IVRs (and we all know how much customers love those systems) but it certainly is not true for speech IVRs. There's work to do even after a successful tuning cycle, and management needs to plan for that in advance.
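To make the tuning point concrete for a business audience, here is a minimal sketch of the kind of tally a tuning pass might start from. Everything in it is an assumption for illustration: the log records, the outcome labels ("recognized", "no_match", "no_input"), the prompt names, and the 30% review threshold are hypothetical, not taken from any particular IVR platform.

```python
from collections import Counter

# Hypothetical log records: (prompt name, recognition outcome).
# The outcome labels are assumptions for illustration, not a vendor format.
CALL_LOG = [
    ("main_menu", "recognized"),
    ("main_menu", "no_match"),
    ("main_menu", "recognized"),
    ("main_menu", "no_input"),
    ("account_number", "recognized"),
    ("account_number", "recognized"),
    ("account_number", "recognized"),
    ("account_number", "no_match"),
]


def failure_rates(log):
    """Return {prompt: share of attempts that were not recognized}."""
    attempts = Counter()
    failures = Counter()
    for prompt, outcome in log:
        attempts[prompt] += 1
        if outcome != "recognized":
            failures[prompt] += 1
    return {prompt: failures[prompt] / attempts[prompt] for prompt in attempts}


def prompts_to_review(log, threshold=0.3):
    """List prompts whose failure rate exceeds the (assumed) threshold."""
    rates = failure_rates(log)
    return sorted(prompt for prompt, rate in rates.items() if rate > threshold)


if __name__ == "__main__":
    print(failure_rates(CALL_LOG))
    print(prompts_to_review(CALL_LOG))
```

The point of the sketch is that numbers like these only exist once real callers have used the system, which is why tuning can't be finished during user acceptance testing.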
Tuesday, July 8, 2008
Phone tree hell
This demo of an abusive IVR is too funny for words. Turn on your speakers and enjoy. I know it's an ad for some company's services, but it's great anyway. Thanks to Phil Shinn for forwarding it.
Saturday, July 5, 2008
Microsoft and voice search
I read two apparently unrelated news items about Microsoft recently. The first was its acquisition of Powerset, a natural language search engine for retrieving information from Wikipedia. One Microsoft blogger explains the logic of the Powerset acquisition.
The second item was released by Spinvox, a speech-to-text transcription engine that can be adapted for a number of applications. This news release, which hasn't been commented on much, said that its new senior director for its Microsoft relationship will be "charged with driving the co-development of Microsoft unified communications and enterprise applications with SpinVox services." So Spinvox is going to co-develop apps with Microsoft.
If you put these two technologies together, speaker-independent speech-to-text and natural language text search, you would have a pretty powerful way of searching Internet content from your mobile phone. Powerset, in particular, still needs some work, but it exists as a proof of concept that you can use natural language to get answers from large text corpora.
Microsoft has been challenging Google on search presented visually, so far without much success. These two recent developments could signal Microsoft's attempt to create a market for voice search.
Labels:
Google,
Microsoft,
Search,
Speech to Text,
Spinvox
Friday, July 4, 2008
Speech projects are special
The projects that I've worked on since 2002 have nearly all been implementations of speech recognition systems, or applied research on listener perceptions of speech reco systems. I've learned that speech reco projects have a lot of characteristics that distinguish them from typical software development projects. I've made note of those characteristics, and started to develop a lessons learned file to capture the kinds of things that must occur in order to ensure a successful project. I'll start posting those lessons over the next few months. Of course, if you have any favorite lesson learned from speech reco projects, please leave a comment.
(The title of this blog post is a take-off on the "speech is special" argument that some linguists make, based on phenomena like categorical perception that were first observed in speech experiments.)