Vocal Inflection, Part II

Feb 20th, 2008 | By Trevor Baca | Category: TTS | Text to Speech

In part I of this post we looked at the three most basic tones in English and we checked out the performance of the text-to-speech, or TTS, robot at AT&T Labs named “Mike”. We discovered that English does in fact have tones. And we discovered that tones are hard to get right in text-to-speech.

In this post we look at a different example of vocal inflection in English. And we see how tones interact with sentences. Listen to examples #1a and b, below.

Example #1a (falling then rising): “You downloaded the newest vèrsion, didn’t you?”

Example #1b (falling then falling again): “You downloaded the newest version, didn’t you?”

(The examples here follow the presentation of local meanings of rising tones in the second edition of Alan Cruttenden’s Intonation, which we introduced in the previous post.)

Same sentence, two different tone contours. (Accent marks help us see the tones.) Example #1a ends the first clause with a falling tone on “version” and then follows that with a rising tone on “didn’t you?”. This is a very common pattern in spoken English. The pattern seems to convey genuine uncertainty on the part of the speaker. “Umm, you know, I’m not really sure that you downloaded the newest version and in fact I think there’s a chance you may be looking at one of the old, outdated versions. So let me ask you to make sure: you downloaded the newest version, didn’t you?”

Example #1b starts off exactly the same as example #1a. But example #1b follows up the first clause with yet another falling tone on the “didn’t you” in the second clause. This is different. Example #1b seems to represent a much stronger degree of certainty on the part of the speaker. So much so that the speaker seems to be asking only for confirmation. “You did download the newest version and I’m quite certain about that fact; now confirm it for me so we can move on to more important things.”

So the meaning behind the vocal inflection in example #1a is something like “genuine question, uncertainty” whereas the meaning behind the vocal inflection in example #1b is more like “request for confirmation, relative certainty”. Exactly the same words. Only one tone differs.

But now let’s look at what happens when we let this exact same pair of tone contours interact with a different type of sentence.

Example #2a (falling then rising): “You downloaded the newest version, did you?”

Example #2b (falling then falling again): *“You downloaded the newest version, did you?” [wrong]

The sentences in examples #2a, b are almost exactly the same as the sentences in examples #1a, b. The only difference is the change from “didn’t” to “did” in the so-called “tag question” at the end of the sentence.

But notice what happens. Example #2a (falling then rising) is perfectly acceptable. But example #2b (falling then falling again) is not acceptable. At least not for a native speaker.

If we stop and think about this for a moment, we realize something quite astounding. The acceptability of English tones is somehow conditioned on (very slight) differences in syntax. Take away a “not” from examples #1a, b and you render one tone pattern valid and one completely unacceptable.
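To make the pattern concrete, here is a minimal sketch in Python that encodes only what examples #1 and #2 show. The names (FALL_RISE, ACCEPTABLE, is_acceptable) are hypothetical, the main clause is assumed to be positive as in the examples, and the table is nowhere near a full grammar of English tag questions.

```python
# Hypothetical sketch: encodes only the four judgments from examples #1 and #2.
# A contour is (tone on the main clause, tone on the tag question).
FALL_RISE = ("falling", "rising")
FALL_FALL = ("falling", "falling")

# Acceptable contours after a positive main clause, keyed by whether
# the tag question is negative ("didn't you?") or positive ("did you?").
ACCEPTABLE = {
    True: {FALL_RISE, FALL_FALL},   # examples #1a and #1b
    False: {FALL_RISE},             # example #2a only; #2b is rejected
}


def tag_is_negative(tag: str) -> bool:
    """Crude check for a contracted negative, with straight or curly apostrophe."""
    return "n't" in tag.lower().replace("\u2019", "'")


def is_acceptable(tag: str, contour: tuple) -> bool:
    """Return True if the contour is one the examples mark as acceptable."""
    return contour in ACCEPTABLE[tag_is_negative(tag)]


if __name__ == "__main__":
    for tag in ("didn't you?", "did you?"):
        for contour in (FALL_RISE, FALL_FALL):
            print(tag, contour, is_acceptable(tag, contour))
```

The point of the table is that the lookup key comes from the syntax of the tag, not from the speaker: the same contour passes or fails depending on a single “not”.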

There are a couple of take-aways here.

First, tones are by no means the exclusive province of speaker preference. Yes, when we listen to Clinton, Obama and McCain we hear wildly different patterns of vocal inflection (some probably much more interesting than others). But the choices that different speakers make when they select different patterns of vocal inflection are very strongly conditioned by rules that govern the interaction between tone and syntax.

Second, these rules that govern the interaction between tone and syntax are largely hidden. Sure, we teach kids and second-language learners to raise their voices at the end of a question. But our examples here give perfectly valid situations where you do exactly the opposite and lower the voice at the end of a question (to ask for confirmation rather than to exhibit doubt). How do we, as application designers interested in voice, capture these rules? Better yet, how do we as application designers pick different tone patterns for these sentences given that their written forms are exactly the same? Developers hate hidden rules.
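One possible answer, sketched here under heavy assumptions: because the written form does not carry the tone, the application has to pass the speaker's intent along with the text. The Python sketch below maps a hypothetical intent label to a tone on the tag question and wraps the tag in the `<prosody contour="...">` element from the W3C SSML specification. The function name, the intent labels and the pitch values are all illustrative guesses, not tuned settings, and whether any particular TTS engine honors `contour` is an assumption to check against that engine's documentation. The falling tone on the main clause is left to the engine's default.

```python
from xml.sax.saxutils import escape

# Illustrative SSML contour strings in "(time position, pitch change)" form.
# The numbers are untuned guesses, not recommended settings.
CONTOURS = {
    "rising": "(0%,+0%) (100%,+30%)",
    "falling": "(0%,+0%) (100%,-30%)",
}

# Hypothetical intent labels, following the meanings of examples #1a and #1b.
INTENT_TO_TONE = {
    "genuine_question": "rising",        # uncertainty (example #1a)
    "confirmation_request": "falling",   # relative certainty (example #1b)
}


def tag_question_ssml(clause: str, tag: str, intent: str) -> str:
    """Build an SSML fragment whose tag-question tone depends on intent."""
    tone = INTENT_TO_TONE[intent]
    return (
        f"{escape(clause)}, "
        f'<prosody contour="{CONTOURS[tone]}">{escape(tag)}</prosody>'
    )


if __name__ == "__main__":
    clause = "You downloaded the newest version"
    for intent in INTENT_TO_TONE:
        print(tag_question_ssml(clause, "didn't you?", intent))
```

Same words in, two different fragments out. The only thing that changes is the intent the caller supplies, which is exactly the information the written sentence hides.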
