Hi John,
The original text-to-speech system on the NeXT, on which the port is
based, did address the "question" intonation pattern.
The intonation patterns are affected by the punctuation and
intonation control parameters. But, properly speaking, only questions
expecting the answer "Yes" or "No", or statements expressing
uncertainty, really have rising intonation at the end.
The rampant "up-talk" by the younger generation in Canada is an
exception -- everything in "up-talk" gets a rising intonation at the
end, perhaps a sign of insecurity in the speaker! :-).
Wh- questions don't show the rising intonation. The system did not
make allowance for this distinction -- it would have required some
grammatical analysis which we had not tackled, though it should be
done. It isn't just a matter of detecting the presence of words like
"why", "when", "who", "what", and "how", because it is fairly easy to
frame a "Yes/No" question that also contains one or more of these
words (for example: "Did you tell her when we were supposed to
meet?").
The system also had regular statements and emphatic statements.
There should have been a lot more, and the plan was to implement the
whole of Michael Halliday's description of the intonation of British
English (he wrote an excellent tutorial book, with accompanying taped
examples: "A course in spoken English: Intonation" -- Oxford U.
Press, 1970, SBN [sic] 19 453066 3).
The intonation system was tied to the metrical aspects of English
described by a number of British linguists -- most notably Professor
David Abercrombie, who was at Edinburgh University. We carried out
significant research at the U of Calgary on the rhythm and intonation
of British English and this was used when we spun off Trillium Sound
Research and built the original NeXT system. The rhythm and
intonation were regarded as especially effective features of the
text-to-speech system, even though the research results and
Halliday's description were only partially implemented. The speech
was found to be much less
tiring to listen to for long periods than, for example, DECtalk
(which was based on MITalk, developed at MIT: "From text to speech:
the MITalk system," Allen, Hunnicutt & Klatt, Cambridge University
Press, 1987, ISBN 0-521-30641-8).
Abercrombie's claim was that spoken British English had "a tendency
towards isochrony". Specifically, spoken phrases and sentences can
be split into "feet", rather like the bars in music, with the
rhythmic "beat" falling on the first syllable of each foot (the
stressed syllables dictate where the foot boundaries fall). A tendency
towards isochrony then asserts that the beats fall at more regular
intervals than would be expected from the differing number of
syllables in each foot, and this is because the syllables become
shorter as their number increases. American linguists
are skeptical about this idea but our analyses of a corpus of English
spoken for purposes of illustrating intonation revealed that such a
tendency definitely exists. You'd think it was an easy enough
question to resolve one way or the other, but if you think this you
don't know linguists! :-)
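To see what the claim amounts to numerically, here is a toy
illustration -- the base duration and the compression exponent are
invented for the purpose, not our corpus results. With no compression,
foot duration grows linearly with syllable count and the beats drift
apart; with compression, the beat intervals stay much closer to
regular:

#include <stdio.h>
#include <math.h>

int main(void) {
    double baseMs = 200.0;  // hypothetical one-syllable foot duration
    double alpha  = 0.55;   // compression exponent, purely illustrative
    for (int n = 1; n <= 5; n++) {
        double strict   = baseMs * n;             // fixed syllable length
        double isochron = baseMs * pow(n, alpha); // syllables shorten
        printf("%d-syllable foot: %4.0f ms uncompressed, %4.0f ms "
               "with compression\n", n, strict, isochron);
    }
    return 0;
}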
There are several descriptions of the rhythm work we did. The most
complete one, though very academic, is:
JASSEM, W., HILL, D.R. & WITTEN, I.H. (1984) Isochrony in English
speech: its statistical validity and linguistic relevance. In
Pattern, Process and Function in Discourse Phonology (collection ed.
Dafydd Gibbon), Berlin: de Gruyter, 203-225.
but there is a shorter version that summarises the actual research data:
HILL, D.R., WITTEN, I.H. & JASSEM, W. (1977) Some results from a
preliminary study of British English speech rhythm. Presented at the
94th Meeting of the Acoustical Society of America, Miami, Dec 12-16,
but it only appears as a summary in the proceedings. The full text is
available as U of Calgary Computer Science Dept. Report 78/26/5.
I could send you a draft electronic copy, as I am currently working
on putting one on the web; there is also a hard-copy version
published as a departmental report.
The intonation work is best accessed through Halliday's book, though
Craig Taube-Schock's thesis (for which he received the Governor
General of Canada's Gold Medal) reports the initial experimental work
we did to validate and extend Halliday's descriptions for purposes of
computer speech intonation:
"Synthesizing intonation for computer speech output" Craig-Richard
Taube-Schock. M.Sc. Thesis, Department of Computer Science, The
University of Calgary 1993, 109 pages.
It is available from ProQuest (who archive all university theses in
North America), though they have the date as 1994. In implementing the
intonation for the TextToSpeech kit, a number of improvements were
made that are not written up in the thesis, especially the smoothing
of contours.
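The smoothing itself was never written up, so I can't point you at
the exact algorithm; purely to show the kind of thing involved, here
is a toy three-point moving average over pitch targets (my own
sketch, not the kit's method):

#include <stdio.h>

// Toy contour smoothing: each interior pitch target is replaced by
// the mean of itself and its two neighbours; endpoints are left
// untouched. NOT the TextToSpeech kit's smoothing -- illustration only.
static void smoothContour(double *pitch, int count) {
    if (count < 3) return;
    double prev = pitch[0];
    for (int i = 1; i < count - 1; i++) {
        double current = pitch[i];
        pitch[i] = (prev + current + pitch[i + 1]) / 3.0;
        prev = current;  // use unsmoothed value for the next window
    }
}

int main(void) {
    double contour[] = { 110.0, 150.0, 120.0, 180.0, 95.0 };  // Hz
    smoothContour(contour, 5);
    for (int i = 0; i < 5; i++)
        printf("%.1f ", contour[i]);
    printf("\n");
    return 0;
}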
From the original Developer TextToSpeech kit manual:
The Parser Module takes the text supplied by the client application
(using the speakText: or speakStream: methods) and converts it into
an equivalent phonetic representation. The input text is parsed,
where possible, into sentences and tone groups. This subdivision is
done primarily by examining the punctuation. Each word or number or
symbol within a tone group is converted to a phoneme string which
indicates how the word is to be pronounced. The pronunciation is
retrieved from one of five pronunciation knowledge bases.
The Parser must also deal with text entered in any of the special
text modes. For example, a word may be marked in letter mode, which
means the word is to be spelled out a letter at a time, or in
emphasis mode, which means the word is to receive special emphasis by
lengthening it and altering its pitch. The Parser marks the phonetic
representation appropriately in these cases.
...
The system attempts to speak the text as a person would. Punctuation
is not pronounced, but is used as a guide to pronounce the text it
marks. For example, a period that marks the end of a sentence is not
pronounced, but does indicate that a pause occurs before proceeding
to the next sentence.
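To make that pipeline concrete, here is a schematic sketch of the
stages the manual describes -- my own illustration, not the kit's
code, with lookupPhonemes as a stand-in for the five pronunciation
knowledge bases:

#import <Foundation/Foundation.h>

// Stand-in for the five pronunciation knowledge bases.
static NSString *lookupPhonemes(NSString *word) {
    return [NSString stringWithFormat:@"/%@/", [word lowercaseString]];
}

// Split text into tone groups on punctuation, then convert each word
// to a phoneme string -- the first-order behaviour the manual gives.
static NSArray *parseToToneGroups(NSString *text) {
    NSMutableArray *groups = [NSMutableArray array];
    NSCharacterSet *punct =
        [NSCharacterSet characterSetWithCharactersInString:@".?!,;:"];
    for (NSString *piece in
         [text componentsSeparatedByCharactersInSet:punct]) {
        NSString *trimmed = [piece stringByTrimmingCharactersInSet:
            [NSCharacterSet whitespaceAndNewlineCharacterSet]];
        if ([trimmed length] == 0) continue;
        NSMutableArray *phonemes = [NSMutableArray array];
        for (NSString *word in
             [trimmed componentsSeparatedByString:@" "]) {
            if ([word length]) [phonemes addObject:lookupPhonemes(word)];
        }
        [groups addObject:[phonemes componentsJoinedByString:@" "]];
    }
    return groups;
}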
A question mark at the end of a sentence caused the rising intonation
of a question to be selected. Another special mode allowed
punctuation to be spoken, rather than used to control how the text
was spoken. I have put the whole manual on my university web site,
where it is easier to find than by digging through the Savannah
repository. It doesn't really address these issues completely, but it
is useful for many purposes and will give you helpful background. Go
to:
http://pages.cpsc.ucalgary.ca/~hill
Select "Published papers" from the left-hand menu, scroll down to
section "E. Other publications" and you'll find a whole lot of
Gnuspeech-related documents there. The sixth item is "Manual for the
original NeXT Developer TextToSpeech kit". Clicking the link will
allow you to download a .pdf file of the whole manual. The five
previous links in that section are also useful references for
Gnuspeech and will help you in your work on porting the server.
Many thanks for your willingness to get involved. Very much
appreciated. Feel free to bug me with any questions/problems that
come up.
HTH. All good wishes.
david
---------
David Hill
[email protected]
http://savannah.gnu.org/projects/gnuspeech
--------
The only function of economic forecasting is to make astrology look
respectable. (J.K. Galbraith)
--------
On Nov 4, 2009, at 6:21 PM, John Delaney wrote:
Here I was trying to implement a speech synthesis API for a
graduate musical synthesis class, and now I'm getting roped into
actually working on the project. I'll implement some sort of
Parameter class to hold the current intonation parameters; that
should be pretty simple.
Would it be possible for the synthesis engine to ramp up the
intonation at the end of a sentence whenever there is a question
mark? I don't think I have seen a synthesis engine do this yet, and
it seems like such a small/easy thing to do.
Perhaps I'll revisit this when I eventually take machine learning
classes.
Thank you,
John Delaney
On Wed, Nov 4, 2009 at 5:09 PM, Dalmazio Brisinda
<[email protected]> wrote:
Yes, you are correct. All those server methods are yet to be
implemented. Currently the server just supports speaking text with
the defaults that were taken from Monet. This is certainly one area
that could use some filling out, and any contribution would be more
than welcome.
Best,
Dalmazio
On 2009-11-04, at 5:54 PM, John Delaney wrote:
Thank you all for your help. I have switched to using the server
method because it's very easy and functional. Am I mistaken, though,
that many of the parameters, such as pitch and intonation, have not
yet been implemented in the server? I am looking at the server and
all the get/set methods return zero. I suppose I will need to
implement those if this is the case.
On Wed, Nov 4, 2009 at 12:37 PM, Dalmazio Brisinda
<[email protected]> wrote:
Have a look at the Linked Frameworks section in the Xcode Groups &
Files pane. I've found in the past that when setting up the project
on a different system, I've often had to remove the custom
frameworks (Tube and GnuSpeech) and then add them again, so Xcode
correctly picks up the new locations -- unless they're in standard
system Framework folders. If you would like additional information
on Xcode, have a look at the book "Xcode Unleashed" -- there may
be others.
[snip]
---------
_______________________________________________
gnuspeech-contact mailing list
[email protected]
http://lists.gnu.org/mailman/listinfo/gnuspeech-contact