Volume 105, Issue 3, March 1999
Index of content:
- SPEECH PROCESSING AND COMMUNICATION SYSTEMS 
Model-based approach to envelope and positive instantaneous frequency estimation of signals with speech applications
105(1999); http://dx.doi.org/10.1121/1.426727
An analytic signal is modeled over a duration of T seconds by a pole-zero model obtained by considering its periodic extensions. This type of representation is analogous to that used in discrete-time systems theory, where the periodic frequency response of a system is characterized by a finite number of poles and zeros in the z-plane. In this case, however, the poles and zeros are located in the complex-time plane. Using this signal model, expressions are derived for the envelope, phase, and instantaneous frequency of the signal. In the special case of an analytic signal having poles and zeros in reciprocal complex conjugate locations about the unit circle in the complex-time plane, it is shown that its instantaneous frequency (IF) is always positive. This result paves the way for representing signals by positive envelopes and positive IF (PIF). An algorithm is proposed for decomposing an analytic signal into two analytic signals, one completely characterized by its envelope and the other having a positive IF. This algorithm is new and does not have a counterpart in the cepstral literature. It consists of two steps. In the first step, the envelope of the signal is approximated to a desired accuracy with a minimum-phase approximation, obtained using the dual of the autocorrelation method of linear prediction that is well known in spectral analysis. The criterion that is optimized is a waveform flatness measure, as opposed to the spectral flatness measure used in spectral analysis. This method is called linear prediction in spectral domain (LPSD). The resulting residual error signal is an all-phase or phase-only analytic signal. In the second step, the derivative of the error signal, which is the PIF, is computed. The two steps together provide a unique AM-FM or minimum-phase/all-phase decomposition of a signal. This method is then applied to synthetic signals and filtered speech signals.
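The two steps described in this abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `lpsd` and `instantaneous_frequency` are hypothetical, the input is assumed to be one period of nonzero complex analytic-signal samples, and the "dual of the autocorrelation method" is interpreted here as running the Levinson-Durbin recursion on the Fourier-series coefficients of the squared envelope |s(t)|^2, which take the place of autocorrelation lags. The IF is approximated by the phase increment between successive residual samples.

```python
import cmath
import math

def lpsd(s, order):
    """Sketch of linear prediction in the spectral domain (assumed form).

    s     : one period of complex analytic-signal samples (not all zero)
    order : number of prediction coefficients
    Returns (a, envelope, residual), where a[0] == 1 and
    h(t) = sum_k a[k] * exp(j*k*2*pi*t/T) is the time-domain error filter.
    """
    n = len(s)
    power = [abs(v) ** 2 for v in s]
    # Fourier-series coefficients of |s|^2 act as the "autocorrelation" lags.
    c = [sum(p * cmath.exp(-2j * math.pi * m * i / n)
             for i, p in enumerate(power)) / n
         for m in range(order + 1)]
    # Levinson-Durbin recursion for the Hermitian Toeplitz normal equations.
    a = [1 + 0j]
    err = c[0].real
    for j in range(order):
        gamma = c[j + 1] + sum(a[i] * c[j + 1 - i] for i in range(1, j + 1))
        refl = -gamma / err
        a = ([a[0]]
             + [a[i] + refl * a[j + 1 - i].conjugate() for i in range(1, j + 1)]
             + [refl])
        err *= 1 - abs(refl) ** 2
    # Evaluate h(t) on the sample grid and form the residual e(t) = s(t) h(t).
    h_t = [sum(a[k] * cmath.exp(2j * math.pi * k * i / n)
               for k in range(order + 1))
           for i in range(n)]
    residual = [v * h for v, h in zip(s, h_t)]
    gain = math.sqrt(err)  # RMS value of the residual
    envelope = [gain / abs(h) for h in h_t]
    return a, envelope, residual

def instantaneous_frequency(z):
    """Phase increment (radians per sample) between successive samples."""
    return [cmath.phase(z[i + 1] * z[i].conjugate()) for i in range(len(z) - 1)]

# Synthetic check signal (an assumption for illustration): a complex
# exponential at harmonic 3 with an all-pole envelope.
N = 256
theta = [2 * math.pi * i / N for i in range(N)]
signal = [cmath.exp(3j * t) / (1 - 0.5 * cmath.exp(1j * t)) for t in theta]
coeffs, env, res = lpsd(signal, 1)
pif = instantaneous_frequency(res)
```

For this synthetic signal the envelope is exactly first-order all-pole, so a first-order fit should recover it and leave a residual of constant magnitude whose phase increments are all positive. Real speech would need a higher order and a prior band-pass filtering and analytic-signal step, which are omitted here.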
105(1999); http://dx.doi.org/10.1121/1.426738
The dynamics of airflow during speech production may often result in some degree of turbulence, small or large. In this paper, the geometry of speech turbulence, as reflected in the fragmentation of the time signal, is quantified by using fractal models. An efficient algorithm for estimating the short-time fractal dimension of speech signals based on multiscale morphological filtering is described, and its potential for speech segmentation and phonetic classification is discussed. Also reported are experimental results on using the short-time fractal dimension of speech signals at multiple scales as additional features in an automatic speech-recognition system using hidden Markov models, which provide a modest improvement in speech-recognition performance.
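One common way to estimate a fractal dimension with multiscale morphological filtering is the morphological covering method, sketched below. This is an assumption about the general technique, not a reproduction of the paper's algorithm: the function names, the 3-sample flat structuring element, and the least-squares fit over scales 1..max_scale are illustrative choices. The signal is repeatedly dilated and eroded, the area A(eps) between the two curves is measured at each scale eps, and the dimension is taken as 2 minus the slope of log A(eps) against log eps.

```python
import math

def dilate(x):
    """Flat dilation with a 3-sample window (edges use a shorter window)."""
    return [max(x[max(i - 1, 0):i + 2]) for i in range(len(x))]

def erode(x):
    """Flat erosion with a 3-sample window (edges use a shorter window)."""
    return [min(x[max(i - 1, 0):i + 2]) for i in range(len(x))]

def fractal_dimension(x, max_scale=8):
    """Estimate the fractal dimension of a sampled signal.

    Assumes x is not constant, so the cover area is positive at every scale.
    """
    upper, lower = list(x), list(x)
    log_eps, log_area = [], []
    for eps in range(1, max_scale + 1):
        # After eps passes, the window effectively spans 2*eps + 1 samples.
        upper, lower = dilate(upper), erode(lower)
        area = sum(u - l for u, l in zip(upper, lower))
        log_eps.append(math.log(eps))
        log_area.append(math.log(area))
    # Least-squares slope of log(area) versus log(eps).
    mx = sum(log_eps) / len(log_eps)
    my = sum(log_area) / len(log_area)
    slope = (sum((a - mx) * (b - my) for a, b in zip(log_eps, log_area))
             / sum((a - mx) ** 2 for a in log_eps))
    return 2.0 - slope

def short_time_fractal_dimension(x, frame_len, hop, max_scale=8):
    """Apply the estimate to successive frames of frame_len samples."""
    return [fractal_dimension(x[start:start + frame_len], max_scale)
            for start in range(0, len(x) - frame_len + 1, hop)]
```

A smooth signal such as a straight ramp should give a value close to 1, while a highly fragmented signal such as white noise should give a value well above 1 and below 2. The per-frame values returned by `short_time_fractal_dimension` are the kind of quantity that could be appended to a recognizer's feature vector, though the frame length, hop, and scale range used here are placeholders.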