Research in the Prosodic Structure of Mandarin

 Ivan Chow, Department of French, University of Toronto

1. Mandarin, a tone language

Mandarin has lexical tones.  Most morphemes are monosyllables, each associated with a lexical tone (Speer, Shih & Slowiaczek, 1989). There are four lexical tones in Mandarin: (1) a high, level tone, (2) a rising tone, (3) a low falling tone (with a rising tail in pre-pausal positions), and (4) a high falling tone. The following figure (based on Speer, Shih & Slowiaczek, 1989) shows the four tones with the corresponding tone characterizations (as used in Speer, Shih & Slowiaczek, 1989), tone contours and pitch values (cf. Chao 1968).

Figure 1. Representations of the four lexical tones in Mandarin



As the word “lexical” in “lexical tones” implies, the meaning of a morpheme depends on the register and shape of the tone associated with the monosyllable.  When the tone associated with a given syllable changes, the meaning of the morpheme changes completely.  Take the monosyllable “ma” as an example:


Syllable + Tone        Meaning
ma1                    “mother”
ma2                    “hemp”
ma3                    “horse”
ma4                    “to yell at”


Like speakers in any other language community, Mandarin speakers also whisper.  Although tonal register and shape are lost in whispering, the sequence of syllables alone still transmits enough information to make communication worthwhile.  However, occasional misunderstandings do happen:


(1) Xiao2 jie3, shui2 jiao3 yi1 wan3 duo1 shao3 qian2?


“Miss, how much is a bowl of dumplings?”


(2) Xiao2 jie3, shui4 jiao4 yi1 wan3 duo1 shao3 qian2?


“Miss, how much is it to sleep for one night?”
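The ambiguity in (1) and (2) can be illustrated with a minimal Python sketch. It treats whispering as the complete loss of the tone digits in the numbered transcription, which is a simplification; the function names `whisper` and `tone_differences` are hypothetical and chosen only for this illustration.

```python
import re

def whisper(numbered_pinyin: str) -> str:
    """Remove every tone digit (1-4) that directly follows a letter."""
    return re.sub(r"(?<=[A-Za-z])[1-4]", "", numbered_pinyin)

def tone_differences(a: str, b: str) -> list:
    """Return the (syllable_a, syllable_b) pairs that differ between two transcriptions."""
    tokens_a = re.findall(r"[A-Za-z]+[1-4]", a)
    tokens_b = re.findall(r"[A-Za-z]+[1-4]", b)
    return [(x, y) for x, y in zip(tokens_a, tokens_b) if x != y]

# Sentences (1) and (2) from the text.
dumplings = "Xiao2 jie3, shui2 jiao3 yi1 wan3 duo1 shao3 qian2?"
sleep = "Xiao2 jie3, shui4 jiao4 yi1 wan3 duo1 shao3 qian2?"

# Without tones the two sentences become the same string of syllables.
same_when_whispered = whisper(dumplings) == whisper(sleep)

# With tones, only two syllables distinguish them.
differing_syllables = tone_differences(dumplings, sleep)
```

Under this simplification, only the tones on "shui" and "jiao" keep the two questions apart.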

2. Manipulation of Tones with WinPitch


The acoustic correlate of the register and shape of the tones is the fundamental frequency (F0), measured in Hertz (Hz).  Using WinPitch, a pitch contour can be plotted as a continuous line of fundamental frequency against time.  Once the contour is generated, it can be altered by clicking on points on the contour and dragging them in different directions.  For example, the pitch contour associated with "ma" can be altered and re-synthesized to resemble any of the four lexical tones, or even tones that are not in the lexical tone repertoire of Mandarin.


As shown in the figure below, a pitch contour can be added to the original contour by specifying its beginning and end points.  The re-synthesizer then outputs the combination of the syllable "ma" with the new pitch contour.  Tone 1 can be re-synthesized by adding a level pitch contour at a higher fundamental frequency (raised from about 110 Hz to about 170 Hz).  Tone 2 is re-synthesized by associating the syllable with a rising pitch contour that goes from about 110 Hz at the beginning of the vowel segment to about 170 Hz at the end.  Tone 3 can be generated by adding a slightly falling contour from about 125 Hz to about 90 Hz, followed by a rising contour at the end from 90 Hz to 135 Hz.  Tone 4 can be generated by adding a falling contour from about 170 Hz at the beginning of the vowel segment to about 110 Hz at the end.
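The contours described above can be approximated as straight-line segments between a few breakpoints. The following Python sketch is not WinPitch code; it only tabulates the approximate frequencies given in the text. The level value of 170 Hz for tone 1, the position of the tone 3 turning point (60% of the vowel) and the names `TONE_BREAKPOINTS` and `f0_contour` are assumptions made for this illustration.

```python
from typing import Dict, List, Tuple

# Breakpoints as (relative time within the vowel, F0 in Hz).
# Frequencies follow the approximate values in the text; the tone 1 level
# (170 Hz) and the tone 3 turning point (0.6) are assumptions.
TONE_BREAKPOINTS: Dict[int, List[Tuple[float, float]]] = {
    1: [(0.0, 170.0), (1.0, 170.0)],               # high, level
    2: [(0.0, 110.0), (1.0, 170.0)],               # rising
    3: [(0.0, 125.0), (0.6, 90.0), (1.0, 135.0)],  # low falling, rising tail
    4: [(0.0, 170.0), (1.0, 110.0)],               # high falling
}

def f0_contour(tone: int, n_points: int = 11) -> List[float]:
    """Sample a tone's contour at n_points evenly spaced times by linear interpolation."""
    points = TONE_BREAKPOINTS[tone]
    contour = []
    for i in range(n_points):
        t = i / (n_points - 1)
        # Find the segment that contains t and interpolate within it.
        for (t0, f0), (t1, f1) in zip(points, points[1:]):
            if t0 <= t <= t1:
                contour.append(f0 + (f1 - f0) * (t - t0) / (t1 - t0))
                break
    return contour
```

A call such as `f0_contour(3)` returns eleven F0 values that fall from 125 Hz to 90 Hz and then rise to 135 Hz, which is the kind of target contour that would be drawn by hand in the tool.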


(3) Synthesis of Tones using WinPitch




3. Prosodic Structure


Prosodic structure is the hierarchical organization of the syllables of a sentence.   According to Martin (1987, 1997, 1999), prosodic structures are governed by three prosodic constraints and one syntax-related constraint, which have been shown to be valid in French, Italian and Brazilian Portuguese.  The three prosodic constraints and the syntax-related constraint are as follows:



a.        Stress clash condition: two stressed syllables must not be adjacent to each other. If they are, they must be separated by a space, or the stress on one of the syllables must be moved or cancelled.

b.       Maximal number of syllables condition: in a stress group (which contains one and only one stressed syllable), the number of consecutive unstressed syllables cannot exceed 7 or 8.

c.        Eurhythmic rule: (a) balance the number of syllables in the prosodic groups at the same level of the prosodic structure; (b) modify the rhythm of same-level prosodic groups by accelerating groups with more syllables and slowing down those with fewer syllables.

d.       Syntactic clash condition: two minimal prosodic units (i.e. stress groups) cannot be united in the same higher prosodic group if they are dominated in the syntactic tree by two distinct nodes.
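Conditions (a) and (b) lend themselves to a simple mechanical check. The Python sketch below represents a stretch of speech as a list of booleans (True for a stressed syllable) and is only an illustration of the two conditions, not Martin's formalization; the function names are hypothetical, and the default limit of 7 is an assumption, since the text gives "7 or 8".

```python
from typing import List

def stress_clashes(stressed: List[bool]) -> List[int]:
    """Return each index i where syllables i and i + 1 are both stressed."""
    return [i for i in range(len(stressed) - 1) if stressed[i] and stressed[i + 1]]

def longest_unstressed_run(stressed: List[bool]) -> int:
    """Return the length of the longest run of consecutive unstressed syllables."""
    run = best = 0
    for is_stressed in stressed:
        run = 0 if is_stressed else run + 1
        best = max(best, run)
    return best

def violates_max_unstressed(stressed: List[bool], limit: int = 7) -> bool:
    """True if more than `limit` unstressed syllables occur in a row (limit is an assumption)."""
    return longest_unstressed_run(stressed) > limit
```

For instance, `stress_clashes([False, True, True, False])` reports a clash at index 1, which would then have to be repaired by a separation or by moving or cancelling one of the stresses.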


In other words, the last constraint requires the boundaries of prosodic groups to be aligned with the boundaries between syntactic nodes.  It can also be represented as follows:


(5)                 *(m][n) 

where the parentheses denote prosodic boundaries and the square brackets denote syntactic boundaries.


Take French as an example. The following sentences consist of identical syllables, but their syntactic structures differ.  In this case, prosody is the only means of disambiguating the sentences, by aligning the prosodic boundary with the major syntactic boundary between the N and the V nodes.   Each sentence is divided into two prosodic words (each containing only one stressed syllable) according to the placement of the boundary.


(6) La belle ferme le voile.


Syntactic Structure:            

Prosodic Structure:

“The beautiful (girl) closes the veil.”



Syntactic Structure:            

Prosodic Structure:

“The beautiful farm hides it.”
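The alignment requirement *(m][n) can be sketched for the French example as a comparison of two sets of boundary positions. In the Python sketch below, a boundary k falls between unit k-1 and unit k; words are used as units instead of syllables for readability, and both this simplification and the function names are assumptions made for the illustration.

```python
from typing import List, Set

def prosodic_groups(units: List[str], boundaries: Set[int]) -> List[List[str]]:
    """Split units into groups; boundary k falls between units[k - 1] and units[k]."""
    groups, start = [], 0
    for k in sorted(boundaries):
        groups.append(units[start:k])
        start = k
    groups.append(units[start:])
    return groups

def syntactic_clashes(prosodic: Set[int], syntactic: Set[int]) -> Set[int]:
    """Return the syntactic boundaries that fall inside a prosodic group, i.e. *(m][n)."""
    return syntactic - prosodic

words = ["La", "belle", "ferme", "le", "voile"]

# Reading 1: "The beautiful (girl) closes the veil." Major boundary after "belle".
reading_1 = {2}
# Reading 2: "The beautiful farm hides it." Major boundary after "ferme".
reading_2 = {3}

groups_1 = prosodic_groups(words, reading_1)
groups_2 = prosodic_groups(words, reading_2)

# Pronouncing reading 1 with the prosodic boundary of reading 2 leaves the
# syntactic boundary at position 2 inside a prosodic group.
misaligned = syntactic_clashes(prosodic=reading_2, syntactic=reading_1)
```

An empty result from `syntactic_clashes` means that every major syntactic boundary is matched by a prosodic boundary.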





4. Prosodic Experiment in Mandarin


An acoustic experiment is being conducted to investigate the prosodic devices used by readers to disambiguate otherwise syntactically ambiguous sentences in Mandarin.  Eight native speakers are asked to read eight pairs of syntactically ambiguous sentences.  Six identical syllables are embedded in each pair. As shown in the French example above, the prosodic boundaries have to be aligned with the syntactic boundaries in order to clarify the meaning of the sentences.   Given the two different syntactic structures imposed on these six syllables, readers are to disambiguate the sentences by using prosodic devices to convey the aligned prosodic boundaries.


The following are some of the test sentences used in this experiment:



(a).          San1 qian1 / duo1 da4 / xue2 sheng1.

                “3000 University of Toronto students.”


                San1 qian1 duo1 / da4 xue2 sheng1.

                “More than 3000 university students.”


(b).          Lan2 se4 / zhi3 na2 / san1 zhang1.

                “Only take 3 sheets of blue ones.”


                Lan2 se4 zhi3 / na2 san1 zhang1.

                “Take 3 sheets of blue paper.”


(c).          Ta1 cong2 / xiao3 bian4 / zhi1 dao4 zhi1 ji2 you3 tang2 niao4 bing4.

                “He finds out from the urine that he has diabetes.”


                Ta1 cong2 xiao3 / bian4 zhi1 dao4 zhi1 ji2 you3 tang2 niao4 bing4.

                “He has known ever since he was a child, that he has diabetes.”


Data analysis of the recordings is done with WinPitch. The durations of boundary pauses, pre-boundary syllables and prosodic words within the 6-syllable test phrases are measured to determine the prosodic devices used to convey the boundaries.  These devices include pauses and lengthening of the pre-boundary syllable. Pitch is also measured in terms of fundamental frequency (F0).  The difference between the end point of the pre-boundary tone and the beginning point of the post-boundary tone is measured and compared between the two sentences of each pair in order to determine whether the speaker uses pitch neutralization to convey a given prosodic boundary.
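The comparison between the two readings of a pair can be sketched as follows. This Python fragment is not the analysis procedure used in the experiment; the `Junction` record, its field names and the `boundary_cues` function are hypothetical, and the numbers in the example are invented for illustration only.

```python
from dataclasses import dataclass

@dataclass
class Junction:
    """Measurements at one syllable junction in one reading (hypothetical record)."""
    pre_syllable_ms: float   # duration of the syllable before the junction
    pause_ms: float          # silent interval at the junction
    f0_end_pre_hz: float     # F0 at the end point of the pre-junction tone
    f0_start_post_hz: float  # F0 at the beginning point of the post-junction tone

    @property
    def f0_jump_hz(self) -> float:
        """F0 difference across the junction (post-junction start minus pre-junction end)."""
        return self.f0_start_post_hz - self.f0_end_pre_hz

def boundary_cues(with_boundary: Junction, without_boundary: Junction) -> dict:
    """Compare the same junction in the two readings of a sentence pair."""
    return {
        "lengthening_ms": with_boundary.pre_syllable_ms - without_boundary.pre_syllable_ms,
        "pause_ms": with_boundary.pause_ms - without_boundary.pause_ms,
        "f0_jump_change_hz": with_boundary.f0_jump_hz - without_boundary.f0_jump_hz,
    }

# Invented values, not data from the experiment.
boundary_reading = Junction(260.0, 80.0, 120.0, 150.0)
no_boundary_reading = Junction(200.0, 0.0, 120.0, 125.0)
cues = boundary_cues(boundary_reading, no_boundary_reading)
```

Positive values in `cues` would indicate that the syllable is longer, the pause is longer, or the pitch step is larger when the junction coincides with a prosodic boundary.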


A total of 192 possible prosodic boundaries (8 pairs of sentences x 3 possible boundaries x 8 speakers) are analyzed.  Pre-boundary lengthening is found in 144 instances (75%), pauses in 133 instances (69.3%), and pitch reset in 145 instances (75.5%).
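The total and the percentages follow directly from the reported counts, as this short Python check shows. The variable names are chosen for this illustration; the counts are those given in the text.

```python
# Design of the experiment as reported in the text.
pairs, boundaries_per_pair, speakers = 8, 3, 8
total = pairs * boundaries_per_pair * speakers

# Number of boundaries at which each cue was found, as reported in the text.
counts = {"pre-boundary lengthening": 144, "pause": 133, "pitch reset": 145}

# Share of all possible boundaries, rounded to one decimal place.
percentages = {cue: round(100 * n / total, 1) for cue, n in counts.items()}
```

The three counts sum to more than 192 because several cues can occur at the same boundary.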

Results show that, contrary to the observations of Streeter (1978) and Shen (1992), and in support of Swerts’ (1996) view, pitch range and tone do play an important role in defining prosodic boundaries in Mandarin Chinese. In addition to pauses and prolongation of pre-boundary syllables, pitch neutralization is also an important device for conveying prosodic boundaries.