 |

Jorge
Silva and Shrikanth S. Narayanan,
"Upper Bound Kullback-Leibler Divergence for Transient
Hidden Markov Models," in Press, IEEE Tranactions on Signal
Processing.
Abstract
This
paper reports an upper bound for the Kullback-Leibler divergence
(KLD) for a general family of transient hidden Markov models
(HMMs). An upper bound KLD (UBKLD) expression for Gaussian
mixtures models (GMMs) is presented which is generalized
for the case of HMMs.Moreover, this formulation is extended
to the case of HMMs with non-emitting states, where under
some general assumptions, the UBKLD is proved to be well
defined for a general family of transient models. In particular,
the UBKLD has a computationally efficient closed-form for
HMMs with left-to-right topology and a final non-emitting
state, that we refer to as left-to-right transient HMMs.
Finally, the usefulness of the closed-form expression is
experimentally evaluated for automatic speech recognition
(ASR) applications, where left-to-right transient HMMs are
used to model basic acoustic-phonetic units. Results show
that the UBKLD is an accurate discrimination indicator for
comparing acoustic HMMs used for ASR.
Index
Terms:
Kullback-Leibler divergence, Gaussian mixture models, Markov
chains, hidden Markov processes and hidden Markov models,
transient processes, automatic speech recognition.
File:
manuscript

References
[1]
S. Kullback, Information theory and statistics. New York:
Wiley, 1958.
[2] J. Silva and S. Narayanan, Average divergence distance
as a statistical discrimination measure for hidden Markov
models, IEEE Transactions on Audio, Speech, and Language
Processing, vol. 14, no. 3, pp. 890906, May 2006.
[3] M. N. Do and M. Vetterli, Rotation invariant texture
characterization and retrieval using steerable wavelet-domain
hidden Markov model, IEEE Transactions on Multimedia,
vol. 4, no. 4, pp. 517527, December 2002.
[4] , Wavelet-based texture retrieval using
generalized gaussian density and kullback-leibler distance,
IEEE Transactions on Image Processing, vol. 11, no. 2, pp.
146158, February 2002.
[5] N. Vasconcelos, On the efficient evaluation of probabilistic
similarity functions for image retrieval, IEEE Transactions
on Information Theory, vol. 50, no. 7, pp. 14821496,
July 2004.
[6] P. Smyth, Clustering sequences using hidden Markov
models, in Advances in Neural Information Processing
9. MIT Press, 1997, pp. 648654.
[7] L. R. Rabiner, A tutorial on hidden Markov models
and selected applications in speech recognition, Proc.
of IEEE, vol. 77, no. 2, pp. 257286, February 1989.
[8] J. Li, R. M. Gray, and R. Olshen, Multiresolution
image classification by hierarchical modeling with two-dimensional
hidden Markov models, IEEE Transactions on Information
Theory, vol. 46, no. 5, pp. 18261841, 2000.
[9] M. S. Crouse, R. D. Nowak, and R. G. Baraniuk, Wavelet-based
statistical signal processing using hidden Markov models,
IEEE Transactions on Signal Processing, vol. 46, pp. 886902,
April 1998.
[10] A. Willsky, Multiresolution Markov models for signal
and image processing, Proc. of IEEE, vol. 90, no. 8,
August 2002.
[11] P. Smyth, D. Heckerman, and M. Jordan, Probabilistic
independence networks for hidden Markov models, Neural
Computation, vol. 9, no. 2, pp. 227269, 1997.
[12] M. I. Jordan, Graphical models, Statistical
Science:, vol. 19, pp. 140155, 2004.
[13] T. P. L.E. Baum, Statistical inferences for probabilistic
functions of finite state Markov chains, Ann. Math.
Stat., vol. 37, pp. 15541563, 1966.
[14] Y. Ephraim and N. Merhav, Hidden Markov processes,
IEEE Transactions on Information Theory, vol. 48, no. 6, pp.
15181569, June 2002.
[15] B. H. Juang and L. R. Rabiner, A probabilistic
distance measure for hidden Markov models, AT&T
Technical Journal, vol. 64, no. 2, pp. 391408, 1985.
[16] R. Singh, B. Raj, and R. Stern, Structured redefinition
of sound units by merging and splitting for improved speech
recognition, in ICSLP, 2000.
[17] M. N. Do, Fast approximation of Kullback - Leibler
distance for dependence trees and hidden markov models,
IEEE Signal Processing Letters, vol. 10, no. 4, pp. 115118,
April 2003.
[18] Y. Singer and M. Warmuth, Training algorithm for
hidden Markov models using entropy based distance functions,
in Advances in Neural Information Processing System 9. Morgan
Kaufmann Publishers, 1996, pp. 641647.
[19] Y. Singer and M. K. Warmuth, Batch and on-line
parameter estimation of gaussian mixtures based on the joint
entropy, in Advances in Neural Information Processing
System 11, 1998, pp. 578584.
[20] T. M. Cover and J. A. Thomas, Elements of information
theory. Wiley Interscience, New York, 1991.
[21] J. Norris, Markov Chains. Cambridge series in Statistical
and Probabilistic Mathematics, 1999.
[22] M. Falkhausen, H. Reininger, and D. Wolf, Calculation
of distance measures between hidden Markov models, in
Proceedings of Eurospeech, 1995, pp. 14871490.
[23] R. M. Gray, Entropy and information theory. Springer-Verlag,
New York, 1990.
[24] P. Halmos, Measure theory. D. Van Notrand Co., New York,
1950.
[25] F. den Hollander, Large Deviations. American Mathematical
Society, 2000.
[26] L. Breiman, Probability. Society of Industrial and Applied
Mathematics, 1968.
[27] F. Jelinek, Statistical methods for speech recognition.
MIT Press, 1997.
[28] S. Yildirim and S. Narayanan, An information-theoretic
analysis of developmental changes in speech, in Proceedings
ICASSP, April 2003.
[29] S. Young, J. Odell, D. Ollason, V. Valtchev, and P. Woodland,
Htk book, Cambridge Research Laboratory, Tech.
Rep., 1997.
[30] M. Vihola, M. Harju, P. Salmela, J. Suontausta, and J.
Savela, Two dissimilarity measures for HMMs and their
application in phoneme model clustering, in Proceedings
ICASSP, May 2002, pp. 933936.
[31] M.-Y. Tsai and L.-S. Lee, Pronunciation variations
based on acoustic phonemic distance measures with applications
examples of mandarin chinese, in ASRU, December 2003..
Home
Research
Publications Presentations
sail Links
|
 |