On 01/04/2016 00:03, David Hill wrote:
Dear Alex,
I have interpolated my reply in your email, in what follows.
On Mar 30, 2016, at 16:30 09PM, Farlie A wrote:
On 30/03/2016 21:14, David Hill wrote:
The platform independent /gnuspeechsa/ does not yet incorporate the
Monet facility though I believe Marcelo is working on that aspect,
judging by some of the image material he has previewed to me.
Thanks.
In order to get different accents, intonation and rhythm, as
required for your examples, you may have to get involved in
significant manual work, modifying the databases. For intonation,
you'd have to create the required intonation contour manually.
Hmm, as I am not a speech professional, this may be beyond my
level of expertise, other than marking notes in the script as to
intonation intent. Your note below about adding tonic feet is
something I was missing.
Something else that will need to be worked out is how to translate
between Gnuspeech's phoneme names and eSpeak's (which are based on
the Kirshenbaum encoding; see my other recent e-mail).
I think that would be a bad idea. The gnuspeech phonetic
representation is well described in the Monet manual. You shouldn't
need to arbitrarily change the set of phonetic symbols. That is likely
to cause problems and seems pointless. The input is punctuated, plain
English text. If you want to modify the phonetic script produced,
learn the symbols. They are very intuitive and are documented in the
manual.
My reasons for the comment were to do with the fact that eSpeak (and
eSpeak NG) have some interface code which allows their use with
Microsoft's SAPI speech API in their Windows port, making the voice
potentially callable from any application which supports SAPI. A
better idea might be to encourage the eSpeak NG developers to also
contribute to gnuspeech's development, effectively adding a similar
SAPI->Gnuspeech bridge. :)
In order to make the process easier and less trouble, the user and
application dictionaries should be added and made usable. Then
particular dictionaries (a lot smaller than the main dictionary)
could be set up for particular dialogue and accent requirements.
Hmm... Would some kind of "Unintophonic" (universal intonation
phonetic encoding), representing both sounds and intonation intent,
be worth considering? An older speech synth program I found, called
Superior Speech! (running under RISC OS 3 years ago), allowed for at
least 8 different (albeit fixed) intonation pitches on individual
phonemes, as well as some more advanced features for "singing"
phonemes at specific notes (something which I understand is an area
of current research by others). There are some possible encodings,
like X-SAMPA, which incorporate intonation advice. MBROLA (which is
non-free) stores intonation data in a format which deals at a much
lower level, so it is possible to do much more finely tuned
intonation contouring, if I understand what that means correctly.
(Thought: if there was a way to add MBROLA's PHO-style data to
Gnuspeech/eSpeak input files... hmm...)
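To make the PHO idea above concrete, here is a minimal, illustrative
sketch (plain Python; phoneme names and pitch values are made up for
the example) of the kind of low-level data MBROLA's .pho files hold:
one phoneme per line, a duration in milliseconds, and optional
(percent-of-duration, Hz) pitch targets through the phoneme.

```python
# Minimal sketch of emitting MBROLA-style .pho lines.
# Phoneme names and pitch values below are illustrative only.

def to_pho_line(phoneme, duration_ms, pitch_targets=()):
    """Format one phoneme as an MBROLA-style .pho line.

    pitch_targets is a sequence of (percent, hz) pairs giving
    fundamental-frequency targets at points through the phoneme.
    """
    parts = [phoneme, str(duration_ms)]
    for percent, hz in pitch_targets:
        parts.append(str(percent))
        parts.append(str(hz))
    return " ".join(parts)

# A short illustrative utterance with a falling pitch contour:
lines = [
    to_pho_line("_", 50),                           # leading silence
    to_pho_line("h", 60),
    to_pho_line("i", 150, [(0, 180), (100, 120)]),  # fall: 180 Hz -> 120 Hz
    to_pho_line("_", 50),                           # trailing silence
]
print("\n".join(lines))
```

This is what "a much lower level" means in practice: the contour is a
list of explicit pitch points per phoneme, rather than a high-level
intonation marking on the script.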
You really need to read the Monet manual. /I have just updated my
university web site/, specifically the page accessed through the
left-hand menu selection "Gnuspeech material", to include both the
TRAcT and Monet manuals, together with precompiled versions of both
TRAcT and Monet. Monet needs Mac OS X 10.10.x or better to run; TRAcT
will run on OS X 10.6 or higher. On that same page, in the list of
papers relevant to Gnuspeech, there's also a new historical view of
the work on intonation and rhythm that may be of interest (the first
paper in the list) and there's access to the early data on which the
rhythm model was based (the last item, which is a report that was
presented to an Acoustical Society of America conference in 1977).
There are a whole bunch more papers, less specific to Gnuspeech, but
undoubtedly some of interest, under the left-hand menu selection
"Published papers" which takes you to a new main page.
Duly bookmarked your page.
The cut-in and phrase echoing would have to be done by synthesising
the cut-in phrase and then mixing, or possibly in the future by
having two copies on Monet running.
That's what I thought the current situation was likely to need.
However, for audio drama this is less of an issue, given that the
generated speech audio will probably be edited together in a
non-linear way anyway. Marking the cut-ins then becomes a
partitioning(?) issue during the lexical parsing(?) and time-coding
in any automated scripts that would be generated to reassemble the
audio output. Muse (http://www.muse-sequencer.org/) is certainly
scriptable, and depending on programmer interest, it looks possible
that a future gnuspeech might be able to pipe output directly into
the tool via various audio interfaces like LV2, JACK etc...
Granted that 'scripted' semi-automated editing for cues is outside
your area of focus on the speech generation portion.
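The "synthesise the cut-in phrase and then mix" step discussed above
can be sketched minimally as summing two sample streams with an
offset (plain Python with illustrative names and float samples in
[-1.0, 1.0]; a real pipeline would go through an audio API such as
JACK):

```python
# Minimal sketch of mixing a synthesised cut-in phrase into a main
# phrase: overlapping samples are summed, then clipped to [-1.0, 1.0].

def mix(main, cut_in, offset, gain=1.0):
    """Mix cut_in into main, starting at sample index offset.

    Returns a new list long enough to hold both streams.
    """
    length = max(len(main), offset + len(cut_in))
    out = [0.0] * length
    for i, s in enumerate(main):
        out[i] = s
    for i, s in enumerate(cut_in):
        out[offset + i] += s * gain
    return [max(-1.0, min(1.0, s)) for s in out]

# Example: a quieter cut-in arriving two samples into the main phrase.
main = [0.5, 0.5, 0.5, 0.5]
cut_in = [0.5, 0.5, 0.5]
print(mix(main, cut_in, 2, gain=0.5))  # -> [0.5, 0.5, 0.75, 0.75, 0.25]
```

The same summation would apply whether the second stream comes from a
second synthesis pass or, in the future, from a second running copy of
Monet.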
Having access to the source code and databases, you could in
principle create any facilities you needed to facilitate the kinds
of dramatic dialogue for which you are looking. Do you have a
programmer with whom you could work? It would amount to creating a
"dramatic dialogue" application, based on /gnuspeech/.
I don't yet, but was considering asking around on projects like
Wikipedia/Wikisource/Wikiversity, given that certain aspects of it
are quite broad.
You could put out a request on the gnuspeech list (address in the
"Copy" field of this email). People reading the list are quite likely
to be interested.
I will consider that.
On a different but related topic... from some of your papers you
built an approximate tract model. This is presumably flexible enough
to cope with most human characteristics (including voices that "sound
like that guy from the trailer, that's been smoking since he was old
enough to buy them").
(another 'staged' voice type I will add to my earlier examples of
vocal types.).
If you read the Monet manual, you'll find that there are various
controls to change various aspects of the voice -- and yes, they
include changing the settings for the tube resonance model (TRM). You
can investigate the quality directly by using the TRAcT application to
play with the TRM, but it isn't dynamic. Monet is dynamic and can
speak and has an equivalent bunch of controls.
I'll have a read over the manual.
Humanoid-like aliens are also a possibility. The so-termed Nordic
types would probably have a voice closest to human (from a tract
model perspective, based on internet accounts of alleged
encounters). "Stage" aliens, such as in old radio/TV, are from what I
recall mostly accented human language, albeit with much modified
grammar or intonational rhythm. On the other hand, you may have
aliens that have "clicks" in their language (not sure what these
are called in speech/IPA terms) in addition to tonal and noise-based
phonemes.
Clicks are not yet in the repertoire! You'd have to generate them
somewhere else, for now, and edit them in. :-(
Noted. I'm not aware of many human languages that have them, so editing
them in manually seems fair. (In editing together an audio drama, I'd
eventually be adding in other non-human SFX anyway.)
As you say, I need to make some experiments when I am able to get
hold of the relevant platform's hardware.
Alex Farlie.
_______________________________________________
gnuspeech-contact mailing list
[email protected]
https://lists.gnu.org/mailman/listinfo/gnuspeech-contact