Joint source-filter optimization for robust glottal source estimation in the presence of shimmer and jitter
Prasanta Kumar Ghosh and Shrikanth Narayanan


Speech Communication, Elsevier, Volume 53, No. 1, January 2011, pp. 98-109

Abstract: We propose a glottal source estimation method robust to shimmer and jitter in the glottal flow. The proposed estimation method is based on a joint source-filter optimization technique. The glottal source is modeled by the Liljencrants-Fant (LF) model and the vocal tract filter is modeled by an autoregressive filter, which is common in the source-filter approach to speech production. The optimization estimates the parameters of the LF model, the amplitudes of the glottal flow in each pitch period, and the vocal tract filter coefficients so that the speech production model best describes the observed speech samples. Experiments with synthetic and real speech data show that the proposed estimation method is robust to different phonation types with varying shimmer and jitter characteristics.
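Illustration (not from the paper): the joint estimation idea can be sketched in a heavily simplified form. The sketch below assumes the pulse shape and the pitch-period boundaries are already known, and it replaces the LF model with a hypothetical stand-in pulse (`unit_pulse`). Under those assumptions the autoregressive coefficients and the per-period amplitudes enter the production model linearly, so a single least-squares solve recovers both. The method described in the abstract also optimizes the LF shape parameters, which this sketch does not attempt. All function names and numeric values here are made up for illustration.

```python
import numpy as np

def unit_pulse(period):
    """Stand-in glottal flow derivative for one pitch period (NOT the LF model)."""
    n_open = int(0.6 * period)  # assumed open-phase fraction, illustrative only
    flow = np.zeros(period)
    flow[:n_open] = np.sin(np.pi * np.arange(n_open) / n_open) ** 2
    return np.diff(flow, prepend=0.0)  # derivative of the flow, same length as the period

def synthesize(ar, periods, amps):
    """Drive an all-pole (autoregressive) filter with one scaled pulse per period."""
    source = np.concatenate([a * unit_pulse(p) for p, a in zip(periods, amps)])
    speech = np.zeros(len(source))
    for n in range(len(source)):
        past = sum(ar[k] * speech[n - 1 - k]
                   for k in range(len(ar)) if n - 1 - k >= 0)
        speech[n] = past + source[n]
    return speech

def joint_fit(speech, periods, order):
    """Least-squares fit of AR coefficients and per-period amplitudes together."""
    n_total = len(speech)
    cols = []
    # Columns 1..order: delayed copies of the observed speech (filter part).
    for k in range(1, order + 1):
        col = np.zeros(n_total)
        col[k:] = speech[:-k]
        cols.append(col)
    # One column per pitch period: the unit pulse placed in that period (source part).
    start = 0
    for p in periods:
        col = np.zeros(n_total)
        col[start:start + p] = unit_pulse(p)
        cols.append(col)
        start += p
    design = np.column_stack(cols)
    theta, *_ = np.linalg.lstsq(design, speech, rcond=None)
    return theta[:order], theta[order:]

# Synthetic example: unequal period lengths mimic jitter, unequal amplitudes mimic shimmer.
true_ar = np.array([1.3, -0.8])              # stable two-pole filter (illustrative values)
periods = [80, 84, 78, 82, 79]               # period lengths in samples
true_amps = np.array([1.0, 0.9, 1.1, 0.95, 1.05])

speech = synthesize(true_ar, periods, true_amps)
ar_hat, amps_hat = joint_fit(speech, periods, order=2)
print("AR coefficients:", np.round(ar_hat, 4))
print("Per-period amplitudes:", np.round(amps_hat, 4))
```

Because the synthetic signal is noise-free and generated by the same model, the fit should reproduce the chosen coefficients and amplitudes up to numerical precision. With real speech, the pulse shape and period boundaries would themselves have to be estimated, which is where a nonlinear search over source-model parameters would come in.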















