Estimation of Information Theoretic Quantities based on Data-Dependent Partitions

Keywords: Data-dependent partitions, statistical learning theory, divergence and mutual information, complexity regularization, tree-structured partitions.

This research line explores the universal estimation of information theoretic quantities (e.g., divergence and mutual information) with a histogram-based approach, and in particular the role of data-dependent partitions. The problem remains largely unexplored: classical results are available mainly for product-type histogram constructions and for kernel plug-in estimates. Here, we are interested in non-product partition schemes because of their representation quality, which has been demonstrated in other statistical learning problems (classification, regression, and density estimation). On the technical side, characterizing strongly consistent estimates and rates of convergence is particularly important. Another interesting scenario is the role of data-driven tree-structured vector quantization (TSVQ), where connections with minimum-cost tree-pruning solutions to related complexity regularization problems are promising directions to study.
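To make the idea of a non-product, data-dependent partition concrete, the sketch below computes a plug-in mutual information estimate on a tree-structured partition of the plane. It is a minimal sketch under stated assumptions, not a method from this page: quadrant splitting at within-cell medians follows the adaptive-partitioning idea of Darbellay and Vajda, while the size-based stopping rule (`min_cell`) and the function name `adaptive_partition_mi` are hypothetical. Each leaf cell A x B contributes (n_c/n) log(n n_c / (n_A n_B)), where n_A and n_B count the samples whose x and y values fall in the cell's projections.

```python
import numpy as np


def adaptive_partition_mi(x, y, min_cell=64):
    """Plug-in mutual information estimate (nats) on a data-dependent partition.

    Hypothetical sketch: every cell [xlo, xhi) x [ylo, yhi) holding at least
    `min_cell` points is split into four rectangles at the empirical medians of
    x and y inside the cell; the size-based stopping rule is an assumption.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    cells = []  # (xlo, xhi, ylo, yhi, count) for every leaf cell

    def split(idx, xlo, xhi, ylo, yhi):
        if idx.size < min_cell:
            cells.append((xlo, xhi, ylo, yhi, idx.size))
            return
        mx, my = np.median(x[idx]), np.median(y[idx])
        left, low = x[idx] < mx, y[idx] < my
        children = [
            (idx[left & low], xlo, mx, ylo, my),
            (idx[left & ~low], xlo, mx, my, yhi),
            (idx[~left & low], mx, xhi, ylo, my),
            (idx[~left & ~low], mx, xhi, my, yhi),
        ]
        # Ties can leave every point in one child; stop instead of recursing forever.
        if max(child[0].size for child in children) == idx.size:
            cells.append((xlo, xhi, ylo, yhi, idx.size))
            return
        for child in children:
            split(*child)

    split(np.arange(n), -np.inf, np.inf, -np.inf, np.inf)

    mi = 0.0
    for xlo, xhi, ylo, yhi, nc in cells:
        if nc == 0:
            continue  # empty cells contribute nothing
        nx = np.count_nonzero((x >= xlo) & (x < xhi))  # samples in the x-projection
        ny = np.count_nonzero((y >= ylo) & (y < yhi))  # samples in the y-projection
        mi += (nc / n) * np.log(n * nc / (nx * ny))
    return mi


rng = np.random.default_rng(0)
x = rng.normal(size=4000)
print(adaptive_partition_mi(x, rng.normal(size=4000)))            # independent pair
print(adaptive_partition_mi(x, x + 0.1 * rng.normal(size=4000)))  # strongly dependent pair
```

Because the leaf rectangles tile the plane, the empirical product measure sums to one over the cells, so the estimate is an empirical divergence and never negative. For reference, the true mutual information of the dependent pair is 0.5 ln(101), about 2.31 nats. In practice a statistical test or a complexity penalty would replace the crude `min_cell` rule.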


Signal Representation for Pattern Recognition

Keywords: Signal representation, feature extraction, pattern recognition, Bayes decision approach, complexity regularization, statistical learning theory, decision trees, linear discriminant analysis, mutual information.

The focus of this research is to study the role of signal representation in pattern recognition from an information theoretic perspective, by establishing conceptual connections with related problems in rate-distortion theory for lossy compression and structural risk minimization (SRM) in statistical learning theory. In particular, the minimum probability of error signal representation principle (MPE-SR, Vasconcelos 2004) has been extended to a more general theoretical setting, which justifies formulating it as a complexity-regularized optimization problem. We are currently working on specializing the MPE-SR principle to the problem of optimal filter-bank selection within the tree-structured family of wavelet packets (WPs). In this context, the induced complexity-regularized optimization problem can be reduced to a minimum-cost tree-pruning problem of the type well understood for classification and regression trees (CART, Breiman et al. 1984) and tree-structured vector quantization (Chou et al. 1989). Our application focus is automatic speech recognition (ASR) and other pseudo-stationary time-series classification problems.
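The minimum-cost tree-pruning step mentioned above can be sketched for a binary tree such as a wavelet-packet decomposition, assuming the penalized cost is additive over the leaves: each node is either kept as a leaf, at cost `cost + lam`, or replaced by the best pruned subtrees of its children, whichever is cheaper. The per-node costs, the penalty weight `lam`, and the names `Node` and `prune` are hypothetical placeholders; they do not represent the MPE-SR cost itself.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    """A node of a binary analysis tree (e.g. one wavelet-packet sub-band)."""
    name: str
    cost: float  # hypothetical cost of using this node as a leaf
    children: List["Node"] = field(default_factory=list)


def prune(node: Node, lam: float) -> Tuple[float, List[str]]:
    """Bottom-up minimum-cost pruning of sum(leaf costs) + lam * (number of leaves)."""
    leaf_cost = node.cost + lam
    if not node.children:
        return leaf_cost, [node.name]
    split_cost, split_leaves = 0.0, []
    for child in node.children:
        cost, leaves = prune(child, lam)
        split_cost += cost
        split_leaves += leaves
    # Keep the split only if it strictly lowers the penalized cost.
    if split_cost < leaf_cost:
        return split_cost, split_leaves
    return leaf_cost, [node.name]


tree = Node("root", 10.0, [
    Node("a", 3.0, [Node("aa", 1.0), Node("ab", 1.0)]),
    Node("b", 4.0),
])
for lam in (0.0, 1.0, 5.0):
    print(lam, prune(tree, lam))
```

Larger penalties yield coarser trees: with these toy costs, lam = 0, 1, and 5 select the leaf sets {aa, ab, b}, {a, b}, and {root}. Because the cost is additive over leaves, each subtree can be optimized independently, which is what makes a single bottom-up pass optimal and linear in the number of nodes.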


Discrimination Measure for Hidden Markov Models

Keywords: Acoustic discrimination measures, Kullback-Leibler distance or divergence, hidden Markov models, speech recognition.

The objective of this work is to propose and evaluate new statistical discrimination measures for hidden Markov models (HMMs). The need to compare different HMMs through appropriate distance measures arises in a variety of contexts. In automatic speech recognition (ASR), examples include evaluation of the re-estimation process [4], redefinition of acoustic units [5], multilingual phoneme mapping [6], vocabulary selection [7], and pronunciation variation analysis [8].

We propose to extend the divergence and the Kullback-Leibler distance (KLD) [1, 3] to probabilistic functions of Markov chains (HMMs), with a focus on the transient behavior of the models [2]. Previous efforts to extend the KLD to HMMs focused only on their stationary behavior [4, 9]. However, the transient behavior of an HMM is crucial in speech recognition applications, because it is the part of the model that captures the relevant dynamic information of the process.
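For contrast with the transient approach, the stationary-style estimate of [4] can be sketched as a Monte Carlo approximation of the KLD rate: draw a long observation sequence O of length T from the first model and compute (1/T)[log P(O | model 1) - log P(O | model 2)] with the scaled forward algorithm. This minimal sketch assumes discrete-output HMMs given as (pi, A, B) arrays; the function names are hypothetical, and it is not the ADD.

```python
import numpy as np


def log_likelihood(obs, pi, A, B):
    """log P(obs | pi, A, B) for a discrete-output HMM via the scaled forward algorithm."""
    alpha = pi * B[:, obs[0]]
    log_p = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = (alpha @ A) * B[:, obs[t]]
        scale = alpha.sum()
        log_p += np.log(scale)  # accumulate log of the per-step normalizer
        alpha = alpha / scale
    return log_p


def sample_hmm(pi, A, B, T, rng):
    """Draw T observation symbols from the HMM (pi, A, B)."""
    obs = np.empty(T, dtype=int)
    state = rng.choice(len(pi), p=pi)
    for t in range(T):
        obs[t] = rng.choice(B.shape[1], p=B[state])  # emit from the current state
        state = rng.choice(len(pi), p=A[state])      # then move to the next state
    return obs


def kld_rate_estimate(model1, model2, T=5000, seed=0):
    """Monte Carlo estimate of the KLD rate between two HMMs, sampling from model1."""
    rng = np.random.default_rng(seed)
    obs = sample_hmm(*model1, T, rng)
    return (log_likelihood(obs, *model1) - log_likelihood(obs, *model2)) / T


pi = np.array([0.5, 0.5])
A = np.array([[0.9, 0.1], [0.1, 0.9]])
m1 = (pi, A, np.array([[0.9, 0.1], [0.1, 0.9]]))
m2 = (pi, A, np.full((2, 2), 0.5))
print(kld_rate_estimate(m1, m1), kld_rate_estimate(m1, m2))
```

The estimate is exactly zero for identical models and, for ergodic models, approaches the divergence rate as T grows. It averages over long-run behavior and carries no information specific to the transient part of the models.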

We proposed the Average Divergence Distance (ADD) as a new statistical discrimination measure between two HMMs. This measure is based on the transient behavior of the models and naturally extends the divergence concept to this context [1]. In addition, we developed an analytical justification of its definition based on the Viterbi decoding approach [10], formally proved that the ADD is well defined for the left-to-right HMM topology with a final non-emitting state (the standard model for basic sub-word units), and proposed a dynamic programming approach to compute it efficiently.
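As background on why quantities defined over the transient of such models can be finite, consider a left-to-right HMM whose last state is non-emitting and absorbing: the expected number of visits to each emitting state before absorption is given by the first row of the fundamental matrix N = (I - Q)^{-1}, where Q is the transition matrix restricted to the emitting states [2]. The sketch below computes it; the matrix layout (entry state first, final state as the last row and column) and the function name are assumptions, and it is not the proof referred to above.

```python
import numpy as np


def expected_occupancy(A):
    """Expected number of visits to each emitting state before absorption.

    Assumes A is the full transition matrix of a left-to-right HMM whose last
    state is the final non-emitting (absorbing) state and whose first state is
    the entry state.
    """
    A = np.asarray(A, dtype=float)
    Q = A[:-1, :-1]  # transitions among emitting states only
    k = Q.shape[0]
    start = np.zeros(k)
    start[0] = 1.0   # the model is entered in its first state
    # Row vector v = start @ (I - Q)^{-1}, obtained by solving (I - Q)^T v = start.
    return np.linalg.solve((np.eye(k) - Q).T, start)


A = np.array([
    [0.5, 0.5, 0.0, 0.0],
    [0.0, 0.8, 0.2, 0.0],
    [0.0, 0.0, 0.6, 0.4],
    [0.0, 0.0, 0.0, 1.0],  # final non-emitting state
])
visits = expected_occupancy(A)
print(visits, visits.sum())
```

Without skip transitions, each emitting state is entered exactly once and then held for a geometric time, so the entries reduce to 1/(1 - a_ii): here 2, 5, and 2.5 expected visits (up to rounding), and their sum is the expected number of emitted frames.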

Publications:
J. Silva and S. Narayanan, "Upper Bound Kullback-Leibler Divergence for Transient Hidden Markov Models," submitted to IEEE Transactions on Signal Processing, February 2007.
J. Silva and S. Narayanan, "Average Divergence Distance as a Statistical Discrimination Measure for Hidden Markov Models," IEEE Transactions on Audio, Speech and Language Processing, vol. 14, no. 3, pp. 890-906, May 2006.
J. Silva and S. Narayanan, "Upper Bound Kullback-Leibler Divergence for Hidden Markov Models with Application as Discrimination Measure for Speech Recognition," in Proc. IEEE International Symposium on Information Theory, Seattle, WA, 2006.
J. Silva and S. Narayanan, "A Statistical Discrimination Measure for Hidden Markov Models based on Divergence," in Proc. Interspeech (ICSLP), Jeju, Korea, 2004.

Principal References:
[1] S. Kullback, Information Theory and Statistics. New York: Wiley, 1958.
[2] J. R. Norris, Markov Chains. Cambridge Series in Statistical and Probabilistic Mathematics, 1999.
[3] R. M. Gray, Entropy and Information Theory. New York: Springer-Verlag, 1990.
[4] B. H. Juang and L. R. Rabiner, "A probabilistic distance measure for hidden Markov models," AT&T Technical Journal, vol. 64, no. 2, pp. 391-408, 1985.
[5] R. Singh, B. Raj, and R. Stern, "Structured redefinition of sound units by merging and splitting for improved speech recognition," in Proc. ICSLP, 2000.
[6] J. Kohler, "Multi-lingual phoneme recognition exploiting acoustic-phonetic similarities of sounds," in Proc. ICSLP, 1996.
[7] P. Geutner, M. Finke, and A. Waibel, "Selection criteria for hypothesis driven lexical adaptation," in Proc. ICASSP, 1999.
[8] M.-Y. Tsai and L.-S. Lee, "Pronunciation variations based on acoustic phonemic distance measures with application examples of Mandarin Chinese," in Proc. ASRU, Dec. 2003.
[9] M. N. Do, "Fast approximation of Kullback-Leibler distance for dependence trees and hidden Markov models," IEEE Signal Processing Letters, vol. 10, no. 4, pp. 115-118, Apr. 2003.
[10] A. J. Viterbi, "Error bounds for convolutional codes and an asymptotically optimum decoding algorithm," IEEE Trans. Information Theory, vol. IT-13, pp. 260-269, Apr. 1967.


Information Theoretic Analysis of Acoustic-Articulatory Dependencies for Speech Recognition

Keywords: Acoustic production process, information theoretic analysis.

At USC, we are currently collecting a unique corpus of articulatory data from spontaneous speech dialogs. Specifically, the data consist of one-dimensional, continuous-time signals that capture the movement of points at different positions along the vocal tract during spontaneous speech production. The signals are recorded with electromagnetic articulography (EMA), a magnetometer-based point-tracking technology that provides very high temporal resolution of moving points on the articulators. This resolution makes EMA well suited for examining temporal perturbations that occur at phrase junctures and under focal accent; the technique also offers well-established analytical approaches to speech kinematics and is accompanied by a high-quality audio signal. From these data, a variety of articulatory geometric and kinematic features can be computed in conjunction with speech features.

The general motivation of this research direction is to evaluate whether these production signals provide complementary information for discriminating speech content and, consequently, whether they can be used in a joint modeling framework for speech recognition. Building on this joint modeling framework, we also aim to study potential relationships between the production signals and perceptual acoustic events, i.e., acoustic-articulatory relations.

A crucial step is therefore to characterize an appropriate signal representation for the problem. We plan to use a general family of filter banks and to apply the proposed rate-distortion framework to define the optimal filter-bank analysis for these signals. The idea is to exploit the tree structure of the filter-bank analysis (the collection of possible basis sets in this scenario) to derive algorithms that find the optimal filter bank from local optimality conditions that can be implemented in practice. This approach is motivated by results obtained for similar dynamic basis-selection problems in image compression using rate-distortion optimality criteria. This research problem is in the spirit of the "Human-like Speech Processing" research project, which focuses on formulating novel approaches to automatic speech recognition. It is a collaborative, multidisciplinary project among USC, the University of Washington, and Stanford University.


