Volume 130, Issue 1, July 2011
Index of content:
- SPEECH PROCESSING AND COMMUNICATION SYSTEMS 
Automatic estimation of voice onset time for word-initial stops by applying random forest to onset detection130(2011); http://dx.doi.org/10.1121/1.3592233View Description Hide Description
The voice onset time (VOT) of a stop consonant is the interval between its burst onset and voicing onset. Among a variety of research topics on VOT, one that has been studied for years is how VOTs are efficiently measured. Manual annotation is a feasible way, but it becomes a time-consuming task when the corpus size is large. This paper proposes an automatic VOT estimation method based on an onset detection algorithm. At first, a forced alignment is applied to identify the locations of stop consonants. Then a random forest based onset detector searches each stop segment for its burst and voicing onsets to estimate a VOT. The proposed onset detection can detect the onsets in an efficient and accurate manner with only a small amount of training data. The evaluation data extracted from the TIMIT corpus were 2344 words with a word-initial stop. The experimental results showed that 83.4% of the estimations deviate less than 10 ms from their manually labeled values, and 96.5% of the estimations deviate by less than 20 ms. Some factors that influence the proposed estimation method, such as place of articulation, voicing of a stop consonant, and quality of succeeding vowel, were also investigated.