Signal Analysis and Interpretation Laboratory
Information-Theoretic Analysis of Acoustic-Articulatory Dependencies for Speech Recognition

Vivek Rangarajan, Jorge Siva, Viktor Rozgic and Shrikanth Narayanan
Signal Analysis and Interpretation Laboratory
University of Southern California

Problem Description:

At USC, we are currently collecting a unique corpus of articulatory data corresponding to spontaneous speech dialogs. Specifically, the data comprise one-dimensional, continuous-time signals that capture the relative movement of points at different positions along the vocal tract during the production of spontaneous speech. Magnetometer point-tracking technology (EMA) is used for this purpose. The resulting signals provide extremely high temporal resolution of moving points on the articulators. The EMA technique provides ideal temporal resolution for examining temporal perturbations that occur at phrase junctures and under focal accent, offers well-established analytical approaches to speech kinematics, and is accompanied by a high-quality audio signal. The resulting data allow a variety of articulatory geometric and kinematic features to be calculated in conjunction with speech features.

The general motivation of this research direction is to evaluate whether these production signals provide complementary information for discriminating speech content and, consequently, whether they can be used in a joint modeling framework for speech recognition. Furthermore, based on this joint modeling framework, the idea is to study potential relationships between the production signals and perceptual acoustic events, that is, acoustic-articulatory relations. A crucial phase is therefore to characterize a signal representation for the problem. We plan to use a general family of filter banks and to explore the proposed rate-distortion framework to define the optimal filter bank analysis for these signals. The idea is to take advantage of the tree structure of the filter bank analysis (the collection of possible basis sets in this scenario) to characterize algorithms that find the optimal filter bank based on local optimality conditions that could potentially be implemented in practice. This approach is motivated by results obtained in similar dynamic basis selection problems in the context of image compression using rate-distortion optimality criteria [1, 2]. This research problem is in the spirit of the "Human-like Speech Processing" research project, which focuses on formulating novel approaches to automatic speech recognition. The project is a collaborative, multidisciplinary effort between USC, the University of Washington and Stanford University.
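To make the tree-structured search concrete, the following is a minimal sketch of bottom-up basis selection on a binary filter bank tree, in the spirit of the rate-distortion pruning used for image compression in [1, 2]. It is an illustration only, not the method used in this project: the Haar two-channel split, the uniform quantizer, the entropy-based rate estimate, and the names `haar_split`, `lagrangian_cost`, `best_basis`, `step` and `lam` are all assumptions introduced here. Each node keeps its own band if its Lagrangian cost D + lam * R is no larger than the summed best costs of its two children, which is the kind of local condition the text refers to.

```python
import numpy as np

def haar_split(x):
    """One level of an orthonormal Haar two-channel filter bank (assumed here)."""
    even, odd = x[0::2], x[1::2]
    low = (even + odd) / np.sqrt(2.0)
    high = (even - odd) / np.sqrt(2.0)
    return low, high

def lagrangian_cost(coeffs, step, lam):
    """Return D + lam * R for uniform scalar quantization with the given step.

    D is the squared quantization error; R is estimated as the empirical
    entropy of the quantizer indices (bits per coefficient) times the
    number of coefficients. Both choices are illustrative assumptions.
    """
    idx = np.round(coeffs / step).astype(int)
    distortion = float(np.sum((coeffs - idx * step) ** 2))
    _, counts = np.unique(idx, return_counts=True)
    p = counts / counts.sum()
    rate = float(-np.sum(p * np.log2(p)) * len(coeffs))
    return distortion + lam * rate

def best_basis(x, max_depth, step=0.1, lam=0.01, path=""):
    """Recursively pick the cheapest set of subbands for a fixed lam.

    Returns (cost, leaves), where leaves lists the kept subbands as
    strings of 'L'/'H' branch labels ('root' means no split at all).
    """
    own = lagrangian_cost(x, step, lam)
    # Stop at the depth limit or when the band cannot be split evenly.
    if max_depth == 0 or len(x) < 2 or len(x) % 2:
        return own, [path or "root"]
    low, high = haar_split(x)
    c_lo, leaves_lo = best_basis(low, max_depth - 1, step, lam, path + "L")
    c_hi, leaves_hi = best_basis(high, max_depth - 1, step, lam, path + "H")
    # Local decision: split only if the children are strictly cheaper.
    if c_lo + c_hi < own:
        return c_lo + c_hi, leaves_lo + leaves_hi
    return own, [path or "root"]

if __name__ == "__main__":
    # Synthetic stand-in for a one-dimensional articulatory trajectory.
    rng = np.random.default_rng(0)
    t = np.arange(256) / 256.0
    signal = np.sin(2 * np.pi * 5 * t) + 0.05 * rng.standard_normal(256)
    cost, leaves = best_basis(signal, max_depth=3)
    print("selected subbands:", leaves)
    print("Lagrangian cost:", cost)
```

Because the decision at each node depends only on that node and the best costs of its two subtrees, the search visits every node once instead of enumerating all possible bases. Sweeping `lam` would trace out different operating points on the rate-distortion curve.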

 

Figure 1: Data acquisition
Figure 2: Articulatory signals

Some References

Signal Representation
K. Ramchandran and M. Vetterli, "Best wavelet packet bases in a rate-distortion sense," IEEE Trans. on Image Processing, vol. 2, no. 2, April 1993.
K. Ramchandran, M. Vetterli and C. Herley, "Wavelets, subband coding, and best bases," Proceedings of the IEEE, vol. 84, no. 4, April 1996.
C. Scott, "Tree pruning with sub-additive penalties," IEEE Trans. on Signal Processing, December 2005.

Information Theory
T. M. Cover and J. A. Thomas, Elements of Information Theory. New York: Wiley Interscience, 1991.
J. Liu and P. Moulin, "Information-theoretic analysis of interscale and intrascale dependencies between image wavelet coefficients," IEEE Trans. on Image Processing, vol. 10, no. 10, pp. 1647-1658, Nov. 2001.
A. T. Ihler, J. W. Fisher, and A. Willsky, "Nonparametric hypothesis test for statistical dependencies," IEEE Trans. on Signal Processing, vol. 52, no. 8, August 2004.
R. O. Duda and P. E. Hart, Pattern Classification and Scene Analysis. New York: Wiley, 1973.

Articulatory Modeling and Analysis
K. Kirchhoff, G. A. Fink and G. Sagerer, "Combining acoustic and articulatory feature information for robust speech recognition," Speech Communication, vol. 37, pp. 303-319, 2002.

K. Markov, J. Dang and S. Nakamura, "Integration of articulatory and spectrum features based on the hybrid HMM/BN modeling framework," Speech Communication, vol. 48, no. 2, pp. 111-232, February 2006.


Links

Signal Analysis and Interpretation Laboratory

 

 

Elsevier, Speech Communication, Information Fusion

IEEE, Transactions on Speech and Audio Processing
