Information-Theoretic Analysis of Acoustic-Articulatory Dependencies for Speech Recognition
Vivek Rangarajan, Jorge Silva, Viktor Rozgic, and Shrikanth Narayanan
Signal Analysis and Interpretation Laboratory
University of Southern California
Problem Description:
At USC, we have recently been collecting a unique corpus of articulatory
data corresponding to spontaneous speech dialogs. Specifically,
the data comprise one-dimensional, continuous-time signals that
capture the relative movement of points at different positions along
the vocal tract during spontaneous speech production. Magnetometer
point-tracking technology (electromagnetic articulography, EMA) is
used for this purpose. These signals track moving points on the
articulators with extremely high temporal resolution, which makes EMA
well suited to examining the temporal perturbations that occur at
phrase junctures and under focal accent. The technique also offers
well-established analytical approaches to speech kinematics and is
accompanied by a high-quality audio signal. Together, these recordings
provide data from which a variety of articulatory geometric and
kinematic features can be calculated in conjunction with speech features.
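As an illustration of the kinematic features mentioned above, the following Python sketch derives velocity and acceleration from a single one-dimensional EMA channel by numerical differentiation. It assumes the trajectory is uniformly sampled at a known rate and expressed in millimetres; the function name `kinematic_features`, the particular set of features, and the 200 Hz rate in the usage lines are hypothetical and are not taken from the project's actual processing.

```python
import numpy as np

def kinematic_features(position, fs):
    """Kinematic features of one uniformly sampled articulator trajectory.

    position: 1-D array of sensor positions (assumed to be in mm).
    fs: sampling rate in Hz (assumed known and constant).
    """
    x = np.asarray(position, dtype=float)
    dt = 1.0 / fs
    # np.gradient uses central differences in the interior and
    # one-sided differences at the two ends of the array.
    velocity = np.gradient(x, dt)              # mm/s
    acceleration = np.gradient(velocity, dt)   # mm/s^2
    return {
        "velocity": velocity,
        "acceleration": acceleration,
        "peak_speed": float(np.max(np.abs(velocity))),
        "displacement_range": float(x.max() - x.min()),
    }

if __name__ == "__main__":
    fs = 200.0                                        # hypothetical EMA sampling rate
    t = np.arange(100) / fs
    trajectory = 5.0 * np.sin(2 * np.pi * 4.0 * t)    # synthetic 4 Hz movement, 5 mm amplitude
    feats = kinematic_features(trajectory, fs)
    print(feats["peak_speed"], feats["displacement_range"])
```

Finite differences amplify measurement noise, so in practice the position signal would usually be low-pass filtered before differentiation; that step is omitted here for brevity.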
The general motivation of this research direction is to evaluate
whether these production signals provide complementary information
for discriminating speech content and, consequently, whether they can
be used in a joint modeling framework for speech recognition.
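One way to make "complementary information" precise is the conditional mutual information I(Y; A | X) between a speech label Y and articulatory features A given acoustic features X: it is positive only when A says something about Y that X does not. The sketch below uses plug-in (histogram) entropy estimates and assumes all three streams are time-aligned and already quantized to discrete symbols. The function names and toy data are hypothetical, and plug-in estimates are biased when the sample count is small relative to the number of symbol combinations, so this is illustrative rather than the estimator used in this project.

```python
import numpy as np

def entropy(*columns):
    """Plug-in joint entropy (bits) of one or more aligned discrete sequences."""
    joint = np.stack([np.asarray(c) for c in columns], axis=1)
    _, counts = np.unique(joint, axis=0, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def mutual_info(y, x):
    """I(Y; X) = H(Y) + H(X) - H(Y, X)."""
    return entropy(y) + entropy(x) - entropy(y, x)

def conditional_mutual_info(y, a, x):
    """I(Y; A | X) = H(Y, X) + H(A, X) - H(Y, A, X) - H(X)."""
    return entropy(y, x) + entropy(a, x) - entropy(y, a, x) - entropy(x)

# Hypothetical toy data: x is a quantized acoustic bit, a a quantized
# articulatory bit, and the label y is defined as x XOR a.
x = np.array([0, 0, 1, 1])
a = np.array([0, 1, 0, 1])
y = x ^ a
print(mutual_info(y, x))                  # 0.0: acoustics alone say nothing about y
print(conditional_mutual_info(y, a, x))   # 1.0: articulation adds one full bit
```

In this toy case the acoustic bit alone is useless for predicting the label while the pair determines it completely; a redundant articulatory stream (for example `a = x`) would give zero conditional information.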
Furthermore, based on this joint modeling framework, the idea
is to study the potential relationships between these production
signals and perceptual acoustic events (acoustic-articulatory
relations). A crucial phase is therefore to characterize a
signal representation for the problem. We plan to use a general
family of filter banks and to explore a rate-distortion framework
to define the optimal filter-bank analysis for these signals.
The idea is to take advantage of the tree structure of the
filter-bank analysis (the collection of possible basis sets in
this scenario) to characterize algorithms that find the optimal
filter bank from local optimality conditions and that could be
implemented in practice. This assumption is motivated by results
obtained for similar dynamic basis-selection problems in image
compression using rate-distortion optimality criteria [1, 2].
This research problem is in the spirit of the "Human-like Speech
Processing" research project, which focuses on formulating novel
approaches to automatic speech recognition. That project is a
collaborative, multidisciplinary effort among USC, the University
of Washington, and Stanford University.
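The tree-pruning idea described above can be sketched for a one-dimensional signal as follows. Each node of a full Haar wavelet-packet tree receives a Lagrangian cost J = D + λR, minimized over a small set of quantizer step sizes, and a node is split only if the summed costs of its two children are lower, in the spirit of the rate-distortion best-basis pruning of [1]. Several pieces are assumptions made for illustration: the Haar filter pair stands in for the general filter-bank family, the rate R is modeled as the number of coefficients times the empirical entropy of the quantizer indices, and the step sizes and λ are arbitrary. Because each Haar stage is orthonormal, squared error in the subbands adds up to squared error in the signal, which is what makes the comparison of parent and child costs valid.

```python
import numpy as np

def haar_split(x):
    """One stage of an orthonormal two-channel Haar filter bank (even length)."""
    even, odd = x[0::2], x[1::2]
    return (even + odd) / np.sqrt(2.0), (even - odd) / np.sqrt(2.0)

def leaf_cost(coeffs, lam, steps):
    """Lagrangian cost D + lam * R of coding one subband, minimized over quantizer steps.

    Assumed rate model: number of coefficients times the empirical entropy
    (bits) of the uniform quantizer indices.
    """
    best = np.inf
    for step in steps:
        q = np.round(coeffs / step)
        distortion = float(np.sum((coeffs - q * step) ** 2))
        _, counts = np.unique(q, return_counts=True)
        p = counts / counts.sum()
        rate = float(coeffs.size * -np.sum(p * np.log2(p)))
        best = min(best, distortion + lam * rate)
    return best

def best_basis(x, depth, lam, steps=(0.25, 0.5, 1.0, 2.0)):
    """Prune a Haar wavelet-packet tree of the given depth.

    Returns (cost, tree), where tree is "leaf" or a (low_tree, high_tree) tuple.
    Children are evaluated first, so decisions are made bottom-up; a node is
    split only if its children's total cost is strictly lower than its own.
    """
    x = np.asarray(x, dtype=float)
    here = leaf_cost(x, lam, steps)
    if depth == 0 or x.size < 2 or x.size % 2:
        return here, "leaf"
    low, high = haar_split(x)
    c_low, t_low = best_basis(low, depth - 1, lam, steps)
    c_high, t_high = best_basis(high, depth - 1, lam, steps)
    if c_low + c_high < here:
        return c_low + c_high, (t_low, t_high)
    return here, "leaf"

if __name__ == "__main__":
    # Hypothetical test signal, piecewise constant in pairs.
    x = np.array([0, 0, 4, 4, 0, 0, 4, 4], dtype=float)
    cost, tree = best_basis(x, depth=2, lam=1.0)
    print(tree)  # (('leaf', 'leaf'), 'leaf')
```

The nested tuple records which subbands survive pruning. Changing `lam` trades distortion against rate; sweeping it traces out different operating points, and a rate budget could be met by searching over `lam`.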
Figure 1: Data acquisition
Figure 2: Articulatory signals
Some References
Signal Representation
K. Ramchandran and M. Vetterli, "Best wavelet packet bases in a rate-distortion sense," IEEE Trans. on Image Processing, vol. 2, no. 2, April 1993.
K. Ramchandran, M. Vetterli, and C. Herley, "Wavelets, subband coding, and best bases," Proceedings of the IEEE, vol. 84, no. 4, April 1996.
C. Scott, "Tree pruning with sub-additive penalties," IEEE Trans. on Signal Processing, December 2005.
Information Theory
T. M. Cover and J. A. Thomas, Elements of Information Theory, Wiley-Interscience, New York, NY, 1991.
J. Liu and P. Moulin, "Information-theoretic analysis of interscale and intrascale dependencies between image wavelet coefficients," IEEE Trans. on Image Processing, vol. 10, no. 10, pp. 1647-1658, Nov. 2001.
A. T. Ihler, J. W. Fisher, and A. S. Willsky, "Nonparametric hypothesis tests for statistical dependency," IEEE Trans. on Signal Processing, vol. 52, no. 8, August 2004.
R. O. Duda and P. E. Hart, Pattern Classification and Scene Analysis, Wiley, New York, 1983.
Articulatory Modeling and Analysis
K. Kirchhoff, G. A. Fink, and G. Sagerer, "Combining acoustic and articulatory feature information for robust speech recognition," Speech Communication, vol. 37, pp. 303-319, 2002.
K. Markov, J. Dang, and S. Nakamura, "Integration of articulatory and spectrum features based on the hybrid HMM/BN modeling framework," Speech Communication, vol. 48, no. 2, pp. 111-232, February 2006.
Links
Signal Analysis and Interpretation Laboratory
Elsevier: Speech Communication, Information Fusion
IEEE Transactions on Speech and Audio Processing
© SAIL