On 01/04/2016 00:03, David Hill wrote:
Dear Alex,
I have interpolated my reply in your email, in what follows.
On Mar 30, 2016, at 16:30 09PM, Farlie A wrote:
On 30/03/2016 21:14, David Hill wrote:
The platform independent /gnuspeechsa/ does not yet incorporate the
Monet facility though I believe Marcelo is working on that aspect,
judging by some of the image material he has previewed to me.
Thanks.
In order to get different accents, intonation and rhythm, as
required for your examples, you may have to get involved in
significant manual work, modifying the databases. For intonation,
you'd have to create the required intonation contour manually.
Hmm, as I am not a speech professional, this may be beyond my
level of expertise, other than marking notes in the script as to
intonation intent. Your note below about adding tonic feet is
something I was missing.
Something else that will need to be worked out is how to translate
between Gnuspeech's phoneme names and eSpeak's (which are based on
the Kirshenbaum encoding; see my other recent e-mail).
I think that would be a bad idea. The gnuspeech phonetic
representation is well described in the Monet manual. You shouldn't
need to arbitrarily change the set of phonetic symbols. That is likely
to cause problems and seems pointless. The input is punctuated, plain
English text. If you want to modify the phonetic script produced,
learn the symbols. They are very intuitive and are documented in the
manual.
My reasons for the comment were to do with the fact that eSpeak (and
eSpeak NG) have some interface code which allows their use with
Microsoft's SAPI speech API in their Windows port, making the voice
potentially callable from any application which supports SAPI. A
better idea might be to encourage the eSpeak NG developers to also
contribute to gnuspeech's development, effectively adding a similar
SAPI->Gnuspeech bridge. :)
In order to make the process easier and less trouble, the user and
application dictionaries should be added and made usable. Then
particular dictionaries (a lot smaller than the main dictionary)
could be set up for particular dialogue and accent requirements.
Hmm... Would some kind of "Unintophonic" (universal intonation
phonetic encoding), representing both sounds and intonation intent,
be worth considering? An older speech synth program I found, called
Superior Speech! (running under RISC OS 3 years ago), allowed for at
least 8 different (albeit fixed) intonation pitches on individual
phonemes, as well as some more advanced features for "singing"
phonemes at specific notes (something which I understand is an area
of current research by others). There are some possible encodings,
like X-SAMPA, which incorporate intonation advice. MBROLA (which is
non-free) stores intonation data in a format which deals at a much
lower level, so it is possible to do much more finely tuned
intonation contouring, if I understand what that means correctly.
(Thought: if there was a way to add MBROLA's PHO-style data to
Gnuspeech/eSpeak input files... hmm...)
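To make the PHO idea above concrete, here is a minimal, illustrative
sketch (plain Python; phoneme names and pitch values are made up for
the example) of the kind of low-level data MBROLA's .pho files hold:
one phoneme per line, a duration in milliseconds, and optional
(percent-of-duration, Hz) pitch targets through the phoneme.

```python
# Minimal sketch of emitting MBROLA-style .pho lines.
# Phoneme names and pitch values below are illustrative only.

def to_pho_line(phoneme, duration_ms, pitch_targets=()):
    """Format one phoneme as an MBROLA-style .pho line.

    pitch_targets is a sequence of (percent, hz) pairs giving
    fundamental-frequency targets at points through the phoneme.
    """
    parts = [phoneme, str(duration_ms)]
    for percent, hz in pitch_targets:
        parts.append(str(percent))
        parts.append(str(hz))
    return " ".join(parts)

# A short illustrative utterance with a falling pitch contour:
lines = [
    to_pho_line("_", 50),                           # leading silence
    to_pho_line("h", 60),
    to_pho_line("i", 150, [(0, 180), (100, 120)]),  # fall: 180 Hz -> 120 Hz
    to_pho_line("_", 50),                           # trailing silence
]
print("\n".join(lines))
```

This is what "a much lower level" means in practice: the contour is a
list of explicit pitch points per phoneme, rather than a high-level
intonation marking on the script.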
You really need to read the Monet manual. /I have just updated my
university web site/, specifically the page accessed through the
left-hand menu selection "Gnuspeech material", to include both the
TRAcT and Monet manuals, together with precompiled versions of both
TRAcT and Monet. Monet needs Mac OS X 10.10.x or better to run; TRAcT
will run on OS X 10.6 or higher. On that same page, in the list of
papers relevant to Gnuspeech, there's also a new historical view of
the work on intonation and rhythm that may be of interest (the first
paper in the list) and there's access to the early data on which the
rhythm model was based (the last item, which is a report that was
presented to an Acoustical Society of America conference in 1977).
There are a whole bunch more papers, less specific to Gnuspeech, but
undoubtedly some of interest, under the left-hand menu selection
"Published papers" which takes you to a new main page.
Duly bookmarked your page.
The cut-in and phrase echoing would have to be done by synthesising
the cut-in phrase and then mixing, or possibly in the future by
having two copies on Monet running.
That's what I thought the current situation was likely to need.
However, for audio drama this is less of an issue, given that the
generated speech audio will probably be edited together in a
non-linear way anyway. Marking the cut-ins then becomes a
partitioning(?) issue during the lexical parsing(?) and time-coding
in any automated scripts that would be generated to reassemble the
audio output. Muse (http://www.muse-sequencer.org/) is certainly
scriptable, and depending on programmer interest, it looks possible
that a future gnuspeech might be able to pipe output directly into
the tool via various audio interfaces like LV2, JACK etc...
Granted that 'scripted' semi-automated editing for cues is outside
your area of focus on the speech generation portion.
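The "synthesise the cut-in phrase and then mix" step discussed above
can be sketched minimally as summing two sample streams with an
offset (plain Python with illustrative names and float samples in
[-1.0, 1.0]; a real pipeline would go through an audio API such as
JACK):

```python
# Minimal sketch of mixing a synthesised cut-in phrase into a main
# phrase: overlapping samples are summed, then clipped to [-1.0, 1.0].

def mix(main, cut_in, offset, gain=1.0):
    """Mix cut_in into main, starting at sample index offset.

    Returns a new list long enough to hold both streams.
    """
    length = max(len(main), offset + len(cut_in))
    out = [0.0] * length
    for i, s in enumerate(main):
        out[i] = s
    for i, s in enumerate(cut_in):
        out[offset + i] += s * gain
    return [max(-1.0, min(1.0, s)) for s in out]

# Example: a quieter cut-in arriving two samples into the main phrase.
main = [0.5, 0.5, 0.5, 0.5]
cut_in = [0.5, 0.5, 0.5]
print(mix(main, cut_in, 2, gain=0.5))  # -> [0.5, 0.5, 0.75, 0.75, 0.25]
```

The same summation would apply whether the second stream comes from a
second synthesis pass or, in the future, from a second running copy of
Monet.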
Having access to the source code and databases, you could in
principle create any facilities you needed to facilitate the kinds
of dramatic dialogue for which you are looking. Do you have a
programmer with whom you could work? It would amount to creating a
"dramatic dialogue" application, based on /gnuspeech/.
I don't yet, but was considering asking around on projects like
Wikipedia/Wikisource/Wikiversity, given that certain aspects of it
are quite broad.
You could put out a request on the gnuspeech list (address in the
"Copy" field of this email). People reading the list are quite likely
to be interested.
I will consider that.
On a different but related topic... from some of your papers you
built an approximate tract model. This is presumably flexible enough
to cope with most human characteristics (including voices that "sound
like that guy from the trailer, that's been smoking since he was old
enough to buy them").
(another 'staged' voice type I will add to my earlier examples of
vocal types.).
If you read the Monet manual, you'll find that there are various
controls to change various aspects of the voice -- and yes, they
include changing the settings for the tube resonance model (TRM). You
can investigate the quality directly by using the TRAcT application to
play with the TRM, but it isn't dynamic. Monet is dynamic and can
speak and has an equivalent bunch of controls.
I'll have a read over the manual.
Humanoid-like aliens are also a possibility. The so-termed Nordic
types would probably have a voice closest to human (from a tract
model perspective, based on internet accounts of alleged
encounters). "Stage" aliens, such as in old radio/TV, are from what I
recall mostly accented human language, albeit with much modified
grammar or intonational rhythm. On the other hand, you may have
aliens that have "clicks" in their language (not sure what these
are called in speech/IPA terms) in addition to tonal and noise-based
phonemes.
Clicks are not yet in the repertoire! You'd have to generate them
somewhere else, for now, and edit them in. :-(
Noted. I'm not aware of many human languages that have them, so editing
them in manually seems fair. (In editing together an audio drama, I'd
eventually be adding in other non-human SFX anyway.)
As you say, I need to make some experiments when I am able to get
hold of the relevant platform's hardware.
Alex Farlie.
_______________________________________________
gnuspeech-contact mailing list
[email protected]
https://lists.gnu.org/mailman/listinfo/gnuspeech-contact