Polytonia: a system for the automatic transcription of tonal aspects in speech corpora


This paper first proposes a labeling scheme for tonal aspects of speech and then describes an automatic annotation system that produces this transcription.

This fine-grained transcription provides labels indicating pitch level and pitch movement of individual syllables. Of the five pitch levels, three (low, mid, high) are defined on the basis of pitch changes in the local context and two (bottom, top) are defined relative to the speaker’s global pitch range boundaries. For pitch movements, both simple and compound, the transcription indicates direction (rise, fall, level) and size, using pitch interval categories adjusted relative to the speaker’s pitch range.
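
As an illustration of how a pitch movement could be labeled for direction and size relative to the speaker's pitch range, the following is a minimal sketch and not the system's actual rule set. The function names, the label strings, and the two threshold fractions (`level_frac`, `large_frac`) are hypothetical values chosen for the example; only the semitone conversion is standard.

```python
import math


def semitones(f_start_hz: float, f_end_hz: float) -> float:
    """Signed pitch interval in semitones between two frequencies (standard formula)."""
    return 12.0 * math.log2(f_end_hz / f_start_hz)


def classify_movement(f_start_hz, f_end_hz, range_st,
                      level_frac=0.1, large_frac=0.4):
    """Label an intra-syllabic pitch movement by direction and size.

    range_st is the speaker's pitch range in semitones.
    level_frac and large_frac are HYPOTHETICAL thresholds, expressed as
    fractions of that range; they are not taken from the paper.
    """
    interval = semitones(f_start_hz, f_end_hz)
    size = abs(interval)
    if size < level_frac * range_st:
        return ("level", None)  # movement too small to count as a rise or fall
    direction = "rise" if interval > 0 else "fall"
    magnitude = "large" if size >= large_frac * range_st else "small"
    return (direction, magnitude)
```

Because the thresholds scale with `range_st`, the same interval in hertz can receive a different size label for a speaker with a narrow range than for one with a wide range, which is the point of adjusting the interval categories to the speaker.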

The automatic tonal annotation system combines six processing steps: segmentation into syllable peaks, pause detection, pitch stylization, pitch range estimation, classification of the intra-syllabic pitch contour, and pitch level assignment. It uses a dedicated rule-based procedure, which, unlike widely used supervised learning techniques, does not require training on a labeled corpus.

A preliminary evaluation of the annotation system, intended to quantify annotation errors, is included for a reference corpus of nearly 14 minutes of spontaneous speech in French and Dutch. The results, expressed in terms of the standard measures of precision, recall, accuracy and F-measure, are encouraging. For the low, mid and high pitch levels, the F-measure ranges from 0.815 to 0.946; for pitch movements, it ranges from 0.708 to 1.
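
For reference, the evaluation measures named above can be computed per label from counts of true positives, false positives, false negatives and true negatives. The sketch below uses the standard definitions; the function names and the counts in the usage note are illustrative and are not data from the evaluation.

```python
def precision_recall_f(tp: int, fp: int, fn: int):
    """Precision, recall and F-measure (harmonic mean) for one label."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f_measure = 2.0 * precision * recall / (precision + recall)
    return precision, recall, f_measure


def accuracy(tp: int, fp: int, fn: int, tn: int) -> float:
    """Proportion of all decisions for one label that are correct."""
    total = tp + fp + fn + tn
    return (tp + tn) / total if total else 0.0
```

With made-up counts of 8 true positives, 2 false positives and 2 false negatives, precision and recall are both 0.8, so the F-measure is also 0.8.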

Additional modules to detect prominence and prosodic boundaries will enable the resulting annotation to serve as input for phonological annotation. 

