Jorge Silva and Shrikanth Narayanan, "Average Divergence Distance as a Statistical Discrimination Measure for Hidden Markov Models," IEEE Transactions on Audio, Speech, and Language Processing, vol. 14, no. 3, pp. 890-906, May 2006.

Abstract

The paper proposes and evaluates a new statistical discrimination measure for hidden Markov models (HMMs) that extends the notion of divergence, a measure of average discrimination information originally defined for two probability density functions. Similar distance measures have been proposed for HMMs, but they focus primarily on the stationary behavior of the models [1], [2]. In speech recognition applications, however, the transient aspects of the models play a principal role in the discrimination process, so capturing this information is crucial when formulating any discrimination indicator. This work proposes the Average Divergence Distance (ADD) as a statistical discrimination measure between two HMMs that accounts for the transient behavior of these models. The paper provides an analytical formulation of the proposed measure, a justification of its definition based on the Viterbi decoding approach, and a formal proof that this quantity is well defined for a left-to-right HMM topology with a final non-emitting state, a standard model for basic acoustic units in automatic speech recognition (ASR) systems. Experiments based on this measure show that ADD provides a coherent way to evaluate the discriminative dissimilarity between acoustic models.
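
For a concrete anchor, the sketch below (Python, standard library only) illustrates the two ingredients the abstract contrasts. It is not the ADD measure, whose formulation the abstract does not give. First, it computes Kullback's symmetric divergence J(p, q) = D(p||q) + D(q||p) [9] between two univariate Gaussian densities in closed form. Second, it estimates a per-frame log-likelihood-ratio distance between two HMMs by Monte Carlo, in the spirit of the Juang-Rabiner measure [1], for a hypothetical left-to-right topology with one Gaussian per emitting state and a final non-emitting exit state. The class LeftRightHMM, its parameters stay, means and variances, and every function name are illustrative assumptions.

```python
import math
import random

# Hypothetical illustration only: NOT the paper's ADD measure.


def gaussian_kl(mu1, var1, mu2, var2):
    """KL(p || q) for univariate Gaussians p = N(mu1, var1), q = N(mu2, var2)."""
    return 0.5 * (math.log(var2 / var1) + (var1 + (mu1 - mu2) ** 2) / var2 - 1.0)


def gaussian_divergence(mu1, var1, mu2, var2):
    """Kullback's symmetric divergence J(p, q) = KL(p || q) + KL(q || p)."""
    return gaussian_kl(mu1, var1, mu2, var2) + gaussian_kl(mu2, var2, mu1, var1)


def safe_log(p):
    return math.log(p) if p > 0.0 else -math.inf


def log_gauss(x, mu, var):
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mu) ** 2 / var)


def log_add(a, b):
    """log(exp(a) + exp(b)) without overflow; handles -inf inputs."""
    if a == -math.inf:
        return b
    if b == -math.inf:
        return a
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


class LeftRightHMM:
    """Hypothetical left-to-right HMM: emitting states 0..N-1 with Gaussian
    outputs; state i stays with probability stay[i] or moves to i + 1.
    Leaving state N-1 enters the final non-emitting (exit) state."""

    def __init__(self, stay, means, variances):
        assert len(stay) == len(means) == len(variances) > 0
        assert all(0.0 <= s < 1.0 for s in stay)  # exit is reached w.p. 1
        self.stay, self.means, self.vars = list(stay), list(means), list(variances)

    def sample(self, rng):
        """Draw one observation sequence; it ends when the exit state is entered."""
        obs, state = [], 0
        while state < len(self.stay):
            obs.append(rng.gauss(self.means[state], math.sqrt(self.vars[state])))
            if rng.random() >= self.stay[state]:
                state += 1  # from the last state this moves into the exit state
        return obs

    def log_likelihood(self, obs):
        """Log-domain forward algorithm: log P(obs | model), summed over paths
        that start in state 0 and enter the exit state right after obs[-1]."""
        n = len(self.stay)
        alpha = [-math.inf] * n
        alpha[0] = log_gauss(obs[0], self.means[0], self.vars[0])
        for x in obs[1:]:
            new = [-math.inf] * n
            for j in range(n):
                score = alpha[j] + safe_log(self.stay[j])  # self-loop
                if j > 0:  # forward move from state j - 1
                    score = log_add(score, alpha[j - 1] + safe_log(1.0 - self.stay[j - 1]))
                new[j] = score + log_gauss(x, self.means[j], self.vars[j])
            alpha = new
        return alpha[n - 1] + safe_log(1.0 - self.stay[n - 1])


def mc_log_ratio_rate(h1, h2, num_seqs, rng):
    """Mean of (1/T)[log P(O|h1) - log P(O|h2)] over sequences O drawn from h1.
    Assumes h2 gives every sampled sequence nonzero probability."""
    total = 0.0
    for _ in range(num_seqs):
        obs = h1.sample(rng)
        total += (h1.log_likelihood(obs) - h2.log_likelihood(obs)) / len(obs)
    return total / num_seqs


def symmetric_mc_distance(h1, h2, num_seqs=200, seed=0):
    """Symmetrized Monte Carlo estimate in the spirit of [1]."""
    rng = random.Random(seed)
    return 0.5 * (mc_log_ratio_rate(h1, h2, num_seqs, rng)
                  + mc_log_ratio_rate(h2, h1, num_seqs, rng))


if __name__ == "__main__":
    a = LeftRightHMM([0.6, 0.6, 0.6], [0.0, 1.0, 2.0], [1.0, 1.0, 1.0])
    b = LeftRightHMM([0.6, 0.6, 0.6], [0.0, 1.5, 2.0], [1.0, 1.0, 1.0])
    print(gaussian_divergence(1.0, 1.0, 1.5, 1.0))  # 0.25 for the middle states
    print(symmetric_mc_distance(a, b))
```

Because every self-loop probability is below 1, each sampled sequence reaches the exit state with probability 1, so every sample is finite. Averaging the per-frame ratio over whole sampled utterances is a choice made for this sketch; [1] defines its distance through the limit of (1/T)[log P(O_T|λ1) - log P(O_T|λ2)] as the sequence length T grows.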

Index Terms: Acoustic discrimination measures, Kullback-Leibler distance and divergence, hidden Markov models, speech recognition, information theory.

References

[1] B. H. Juang and L. R. Rabiner, "A probabilistic distance measure for hidden Markov models," AT&T Technical Journal, vol. 64, no. 2, pp. 391-408, 1985.
[2] M. N. Do, "Fast approximation of Kullback-Leibler distance for dependence trees and hidden Markov models," IEEE Signal Processing Letters, vol. 10, no. 4, pp. 115-118, Apr. 2003.
[3] M. N. Do and M. Vetterli, "Rotation invariant texture characterization and retrieval using steerable wavelet-domain hidden Markov models," IEEE Transactions on Multimedia, vol. 4, no. 4, pp. 517-527, Dec. 2002.
[4] M. N. Do and M. Vetterli, "Wavelet-based texture retrieval using generalized Gaussian density and Kullback-Leibler distance," IEEE Transactions on Image Processing, vol. 11, no. 2, pp. 146-158, Feb. 2002.
[5] Y. Singer and M. K. Warmuth, "Training algorithm for hidden Markov models using entropy based distance functions," in Advances in Neural Information Processing Systems 9, pp. 641-647, Morgan Kaufmann Publishers, 1996.
[6] M. Falkhausen, H. Reininger, and D. Wolf, "Calculation of distance measures between hidden Markov models," in Proceedings of Eurospeech 1995, pp. 1487-1490, 1995.
[7] M. Vihola, M. Harju, P. Salmela, J. Suontausta, and J. Savela, "Two dissimilarity measures for HMMs and their application in phoneme model clustering," in Proc. ICASSP 2002, pp. 933-936, May 2002.
[8] N. Vasconcelos, "On the efficient evaluation of probabilistic similarity functions for image retrieval," IEEE Transactions on Information Theory, vol. 50, no. 7, pp. 1482-1496, July 2004.
[9] S. Kullback, "Information Theory and Statistics," New York: Wiley, 1958.
[10] F. Jelinek, "Statistical Methods for Speech Recognition," MIT Press, 1997.
[11] D. J. C. MacKay, "Information Theory, Inference, and Learning Algorithms," Cambridge University Press, 2003.
[12] L. R. Rabiner, "A tutorial on hidden Markov models and selected applications in speech recognition," Proc. IEEE, vol. 77, no. 2, pp. 257-286, Feb 1989.
[13] M.-Y. Tsai and L.-S. Lee, "Pronunciation variations based on acoustic phonemic distance measures with application examples of Mandarin Chinese," in Proc. ASRU, Dec. 2003.
[14] H. Printz and P. Olsen, "Theory and Practice of Acoustic Confusability," in ISCA ITRW ASR2000, pp. 77-84, 2000.
[15] J.R. Norris, "Markov Chains," Cambridge series in Statistical and Probabilistic Mathematics, 1999.
[16] A. P. Dempster, N. M. Laird, and D. B. Rubin, "Maximum likelihood from incomplete data via the EM algorithm," Journal of the Royal Statistical Society, Series B, vol. 39, pp. 1-38, 1977.
[17] S. Young, J. Odell, D. Ollason, V. Valtchev, and P. Woodland, "The HTK Book," Cambridge Research Laboratory, 1997.
[18] P. Geutner, M. Finke, and A. Waibel, "Selection criteria for hypothesis driven lexical adaptation," in Proc. ICASSP, 1999.
[19] R. Singh, B. Raj, and R. Stern, "Structured redefinition of sound units by merging and splitting for improved speech recognition," in Proc. ICSLP, 2000.
[20] J. Kohler, "Multi-lingual phoneme recognition exploiting acoustic-phonetic similarities of sounds," in Proc. ICSLP, 1996.
[21] A. J. Viterbi, "Error bounds for convolutional codes and an asymptotically optimal decoding algorithm," IEEE Trans. Information Theory, vol. IT-13, pp. 260-269, Apr. 1967.
[22] J. Li, R. M. Gray, and R. A. Olshen, "Multiresolution image classification by hierarchical modeling with two-dimensional hidden Markov models," IEEE Trans. on Information Theory, vol. 46, no. 5, pp. 1826-1841, Aug. 2000.
[23] M. S. Crouse, R. D. Nowak, and R. G. Baraniuk, "Wavelet-based statistical signal processing using hidden Markov models," IEEE Trans. on Signal Processing, vol. 46, pp. 886-902, Apr. 1998.
[24] R. M. Gray, "Entropy and Information Theory," Springer-Verlag, New York, 1990.
[25] S. Chretien and A. O. Hero III, "Kullback proximal algorithms for maximum-likelihood estimation," IEEE Trans. on Information Theory, vol. 46, no. 5, pp. 1800-1810, Aug. 2000.
[26] G. D. Forney, "The Viterbi algorithm," Proceedings of the IEEE, vol. 61, no. 3, pp. 268-278, Mar. 1973.
[27] L. E. Baum, T. Petrie, G. Soules, and N. Weiss, "A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains," Ann. Math. Stat., vol. 41, pp. 164-171, 1970.
[28] L. R. Liporace, "Maximum likelihood estimation for multivariate observation of Markov sources," IEEE Trans. on Information Theory, vol. IT-28, pp. 729-734, Sept. 1982.
[29] S. Yildirim and S. Narayanan, "An information-theoretic analysis of developmental changes in speech," Proc. ICASSP, April 2003.
[30] L. R. Bahl, P. F. Brown, P. V. de Souza, and R. L. Mercer, "Maximum mutual information estimation of hidden Markov model parameters for speech recognition," in Proc. ICASSP 86, pp. 49-52, Apr. 1986.
[31] Y. Normandin, R. Cardin, and R. De Mori, "High-performance connected digit recognition using maximum mutual information estimation," IEEE Trans. Speech Audio Processing, vol. 2, pp. 299-311, 1994.
[32] B.-H. Juang, W. Chou, and C.-H. Lee, "Minimum classification error methods for speech recognition," IEEE Trans. Speech Audio Processing, vol. 5, no. 3, pp. 257-265, 1997.
[33] Y. Ephraim, A. Dembo and L. R. Rabiner, "A minimum discrimination information approach for hidden Markov modeling," IEEE Trans. on Information Theory, vol. 35, no. 5, pp. 1001-1013, Sept. 1989.
[34] C. E. Shannon, "A mathematical theory of communication," Bell Syst. Tech. J., vol. 27, pp. 379-423, 623-656, 1948.
[35] Y. Singer and M. K. Warmuth, "Batch and on-line parameter estimation of Gaussian mixtures based on the joint entropy," in Advances in Neural Information Processing Systems 11, pp. 578-584, 1998.
[36] O. Ronen, J. R. Rohlicek, and M. Ostendorf, "Parameter estimation of dependence tree models using the EM algorithm," IEEE Signal Processing Letters, vol. 2, no. 8, pp. 157-159, Aug. 1995.
[37] P. Smyth, D. Heckerman, and M. Jordan, "Probabilistic independence networks for hidden Markov models," Neural Computation, vol. 9, no. 2, pp. 227-269, 1997.
[38] P. Smyth, "Clustering sequences using hidden Markov models," in Advances in Neural Information Processing Systems 9, MIT Press, pp. 648-654, 1997.
[39] A. S. Willsky, "Multiresolution Markov models for signal and image processing," Proceedings of the IEEE, vol. 90, no. 8, Aug. 2002.
[40] H. Lucke, "Which stochastic models allow Baum-Welch training?," IEEE Transactions on Signal Processing, vol. 44, no. 11, Nov. 1996.
[41] T. M. Cover and J. A. Thomas, "Elements of Information Theory," Wiley Interscience, New York, NY, 1991.

