Anthony Wood wrote:

On Wed, Nov 19, 2003 at 10:22:55AM +0800, Steve Underwood wrote:


Arnold Ligtvoet wrote:



Since I would like the user names to be auto-generated by the system, I
would guess that this could best be done using Festival with a localized
voice. I think there is a Dutch voice for Mbrola which should integrate into
Festival (note to self: need bigger hard disk :-) )




Speech recognition accuracy is not great even under ideal conditions. Doing what you suggest seems unlikely to achieve any meaningful accuracy. Speech recognition training systems require many occurrences of a word or phrase, clearly spoken, before their accuracy becomes useful. A one-shot utterance from Festival seems to fail on both counts :-)




Sphinx isn't doing general speech recognition; it is determining which of a list of known patterns you said, like mobile phones do.

That is essentially all that any voice recognition currently does. There is little meaningful context-directed recognition (a "phrase locked loop", to use an old in-joke) in anything available today.

So it's fairly easy to tell "Jennifer" from "Frank" if there
are no other options.
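As a rough illustration of the closed-list matching described above, here is a minimal sketch in Python. It is only an analogy: it compares text strings with the standard library's difflib, whereas a real recogniser such as Sphinx compares acoustic features, not spellings. The function name best_match and the misheard input "jenifur" are made up for the example.

```python
from difflib import SequenceMatcher


def best_match(heard, candidates):
    """Return (candidate, score) for the candidate most similar to `heard`.

    ASSUMPTION: string similarity stands in for acoustic similarity here.
    A real recogniser scores audio against acoustic models instead.
    """
    scored = [
        (SequenceMatcher(None, heard.lower(), c.lower()).ratio(), c)
        for c in candidates
    ]
    score, name = max(scored)  # highest similarity wins
    return name, score


# With only two dissimilar candidates, even a badly "misheard" input
# still lands on the right entry.
names = ["Jennifer", "Frank"]
name, score = best_match("jenifur", names)
print(name, round(score, 2))
```

The shorter and more dissimilar the candidate list, the more distortion the match tolerates; add similar-sounding entries and the margin between the top scores shrinks quickly.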

Many commercial on-line recognisers have serious trouble distinguishing "yes" from "no" when those are the only two acceptable answers.

When you call directory assistance in Australia, the IVR asks you what name
you want, and gives you a suggestion out of the top 100 or 200 names, which you
can accept or reject.  Makes for ridicule, but beats waiting on hold.

Beware that many of these systems are actually a human operator hiding behind an IVR. I've had people tell me about amazing automated directory enquiry systems in the US, which turn out to be a human masquerading as an IVR. If the list is known to be short, that may not be the case here.

Bottom line: the very best speech recognition still sucks. As a British speaker I never get more than about 40% accuracy speaking into a US-trained recogniser. I have never had better than about 70-80% accuracy on a British-trained recogniser. Strangely, my terrible Cantonese gets nearly 100% on a SpeechWorks recogniser. :-\



This is true for general speech recognition, where the computer
has a much larger dictionary to match the sound waves against.


Only a speaker-trained system could even begin to approach these accuracies for general text input. The accuracies I gave are for phone-based systems expecting a very limited set of responses from an arbitrary caller.

Humans really don't do that much better at raw word recognition, but we heavily apply context to improve things.
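To make the point about context concrete, here is a toy sketch in Python of how a context prior can overturn a marginal raw match. All the numbers are invented for illustration, and the function name rescore is hypothetical; this is not how any particular recogniser is implemented, just Bayes' rule applied to two candidate words.

```python
def rescore(acoustic, prior):
    """Combine raw per-word match scores with context priors.

    Multiplies each acoustic score by the prior for that word and
    normalises so the results sum to 1 (Bayes' rule).
    ASSUMPTION: both inputs are dicts of word -> probability-like score.
    """
    joint = {w: acoustic[w] * prior.get(w, 0.0) for w in acoustic}
    total = sum(joint.values())
    if total == 0:
        # Context rules out every candidate; fall back to the raw scores.
        return dict(acoustic)
    return {w: p / total for w, p in joint.items()}


# Made-up numbers: the raw match slightly prefers "no", but the dialogue
# context (say, confirming something the caller just asked for) makes
# "yes" far more likely.
acoustic = {"yes": 0.45, "no": 0.55}
prior = {"yes": 0.8, "no": 0.2}

posterior = rescore(acoustic, prior)
best = max(posterior, key=posterior.get)
print(best, posterior)
```

The raw word-level guess is wrong here, and it is only the prior that rescues it, which is roughly what a human listener does without noticing.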

Regards,
Steve


_______________________________________________ Asterisk-Users mailing list [EMAIL PROTECTED] http://lists.digium.com/mailman/listinfo/asterisk-users
