On Wed, Nov 19, 2003 at 10:22:55AM +0800, Steve Underwood wrote:

That is essentially all that any voice recognition currently does. There is little meaningful context-directed recognition (a "phrase locked loop", to use an old in-joke) in anything available today.
Arnold Ligtvoet wrote:
Speech recognition accuracy is not great under ideal conditions. Doing what you suggest seems unlikely to achieve any meaningful accuracy. Speech recognition training systems require many occurrences of a word or phrase, clearly spoken, before their accuracy becomes useful. A one-shot utterance from Festival seems to fail on both counts :-)

Since I would like the user names to be auto-generated by the system, I would guess that this could best be done using Festival with a localized voice. I think there is a Dutch voice for Mbrola which should integrate into Festival (note to self: need bigger hard disk :-) )
Sphinx isn't doing general speech recognition; it is determining which of a list of patterns you said, like mobile phones do.
Many commercial on-line recognisers have serious trouble telling "yes" from "no" when those are the only two acceptable answers.

So it's fairly easy to tell "Jennifer" from "Frank" if there are no other options.
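Picking "which of a list of patterns you said" is a closed-set decision: whatever the audio, the answer is always one of the allowed labels. Sphinx itself uses HMM-based acoustic models, so the sketch below is not how Sphinx works. Instead it swaps in classic dynamic time warping (DTW) template matching, a simpler technique, to show the idea. The 1-D "feature" values are made up for illustration. Real systems use per-frame vectors such as MFCCs.

```python
from typing import Dict, Sequence


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Classic dynamic time warping distance between two 1-D feature sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # b[j-1] matched again
                                 cost[i][j - 1],      # a[i-1] matched again
                                 cost[i - 1][j - 1])  # advance both
    return cost[n][m]


def recognise(utterance: Sequence[float],
              templates: Dict[str, Sequence[float]]) -> str:
    """Return the label of the closest template (always one of the allowed labels)."""
    return min(templates, key=lambda label: dtw_distance(utterance, templates[label]))


if __name__ == "__main__":
    # Hypothetical toy templates, one per allowed name.
    templates = {
        "jennifer": [1.0, 3.0, 5.0, 3.0, 1.0],
        "frank": [5.0, 5.0, 1.0, 1.0],
    }
    # A slower, slightly noisy "jennifer".
    spoken = [1.2, 1.1, 2.9, 5.2, 4.8, 3.1, 0.9]
    print(recognise(spoken, templates))  # → jennifer
```

Because the decision is only "nearest of N", two very different names separate easily. The same code will still confidently return a name for a cough or for a word that is on no list at all.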
Beware that many of these systems are actually a human operator hiding behind an IVR. I've had people tell me about amazing automated directory enquiry systems in the US which turn out to be a human masquerading as an IVR. If the list is known to be short, that may not be the case here.

When you call directory assistance in Australia, the IVR asks you what name you want and gives you a suggestion from the top 100 or 200 names, which you can accept or reject. It makes for ridicule, but beats waiting on hold.
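The Australian directory flow described above (suggest a likely name, let the caller accept or reject) can be sketched as a confirm loop over the recogniser's ranked guesses. This is a hypothetical sketch. `confirm_loop`, `ask_caller` and `max_offers` are illustrative names, not part of any real IVR product. `ask_caller` stands in for "Did you mean X?" and is itself a yes/no recognition step, which may be better handled with DTMF keys.

```python
from itertools import islice
from typing import Callable, Iterable, Optional


def confirm_loop(ranked_names: Iterable[str],
                 ask_caller: Callable[[str], bool],
                 max_offers: int = 3) -> Optional[str]:
    """Offer the recogniser's hypotheses best-first; return the first accepted one.

    Returns None if the caller rejects every offer, so the IVR can fall
    back to a human operator instead of guessing.
    """
    for name in islice(ranked_names, max_offers):
        if ask_caller(name):
            return name
    return None


if __name__ == "__main__":
    # Hypothetical ranked output from a recogniser; the caller wants "Smith".
    ranked = ["Smyth", "Smith", "Schmidt"]
    print(confirm_loop(ranked, lambda n: n == "Smith"))  # → Smith
```

Capping the number of offers keeps a bad recognition from trapping the caller in an endless "Did you mean...?" loop.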
Only a speaker-trained system could even begin to approach these accuracies for general text input. The accuracies I gave are for phone-based systems expecting a very limited set of responses from an arbitrary caller.

Bottom line: the very best speech recognition still sucks. As a British speaker I never get more than about 40% accuracy speaking into a US-trained recogniser. I have never had better than about 70-80% accuracy on a British-trained recogniser. Strangely, my terrible Cantonese gets nearly 100% on a SpeechWorks recogniser. :-\
This is true for general speech recognition, where the computer
has a much larger dictionary to match the sound waves against.
Humans really don't do that much better at raw word recognition, but we heavily apply context to improve things.
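One standard way machines "apply context" is to combine the acoustic score with a prior from the surrounding context: pick the word sequence maximising log P(audio | words) + log P(words | context). The sketch below assumes both scores are already available. All numbers are hypothetical, and `rescore` is an illustrative name, not a real library API.

```python
import math
from typing import Dict


def rescore(acoustic_logp: Dict[str, float],
            context_prior: Dict[str, float],
            floor: float = 1e-6) -> str:
    """Pick the hypothesis maximising log P(audio|words) + log P(words|context).

    Hypotheses missing from the context model get a small floor
    probability rather than being ruled out entirely.
    """
    def total(hyp: str) -> float:
        return acoustic_logp[hyp] + math.log(context_prior.get(hyp, floor))
    return max(acoustic_logp, key=total)


if __name__ == "__main__":
    # Hypothetical scores: acoustically, the two phrases sound almost the same.
    acoustic = {"recognise speech": -10.5, "wreck a nice beach": -10.0}
    # Hypothetical context, e.g. a conversation about voice recognition.
    context = {"recognise speech": 0.2, "wreck a nice beach": 0.001}
    print(rescore(acoustic, context))  # → recognise speech
```

With a flat prior the acoustically better "wreck a nice beach" wins. The context prior is what tips the decision.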
Regards, Steve
_______________________________________________
Asterisk-Users mailing list
[EMAIL PROTECTED]
http://lists.digium.com/mailman/listinfo/asterisk-users
