Estimation of Information-Theoretic Quantities Based on Data-Dependent Partitions
Keywords: Data-dependent partitions, statistical learning theory,
divergence and mutual information, complexity regularization,
tree-structured partitions.
This research line explores the universal estimation of
information-theoretic quantities with histogram-based methods,
and in particular the role of data-dependent partitions. The
problem is largely unexplored: the classical results available
are based on product-type histogram constructions and kernel
plug-in estimates. Here, we are interested in non-product
partition schemes because of their representation quality, which
has been demonstrated in other statistical learning problems
(classification, regression and density estimation).
On the technical side, characterizing strongly consistent
estimates and obtaining rate-of-convergence results are
particularly important. Data-driven tree-structured vector
quantization (TSVQ) schemes are also a very interesting scenario,
where the connection with minimum-cost tree-pruning solutions to
the related complexity-regularization problems is a promising
direction to study.
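As a rough illustration of a histogram-type estimate on a non-product, data-dependent partition, the sketch below estimates the mutual information of two scalar variables. The partition is grown by recursive median splits, and the plug-in sum is taken over its cells. The function names, the `min_points` stopping rule and the alternating-axis split are assumptions made for this example; they are not the construction studied in this research line.

```python
import numpy as np

def tree_partition(points, lo, hi, min_points, depth=0):
    """Split a rectangle at the sample median, alternating the split axis.

    Returns a list of cells (lo, hi, count); each cell is the half-open
    rectangle (lo, hi] and `count` is the number of samples it contains.
    """
    n = len(points)
    if n < 2 * min_points:
        return [(lo.copy(), hi.copy(), n)]
    axis = depth % points.shape[1]
    cut = np.median(points[:, axis])
    left = points[points[:, axis] <= cut]
    right = points[points[:, axis] > cut]
    if len(left) == 0 or len(right) == 0:  # degenerate split caused by ties
        return [(lo.copy(), hi.copy(), n)]
    hi_left = hi.copy()
    hi_left[axis] = cut
    lo_right = lo.copy()
    lo_right[axis] = cut
    return (tree_partition(left, lo, hi_left, min_points, depth + 1)
            + tree_partition(right, lo_right, hi, min_points, depth + 1))

def mutual_information(x, y, min_points=25):
    """Plug-in estimate (in nats) of I(X;Y) on the tree-structured partition."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    data = np.column_stack([x, y])
    n = len(data)
    cells = tree_partition(data, np.full(2, -np.inf), np.full(2, np.inf),
                           min_points)
    mi = 0.0
    for c_lo, c_hi, count in cells:
        if count == 0:
            continue
        p_xy = count / n
        # Empirical marginal mass of the cell's projections onto each axis.
        p_x = np.mean((x > c_lo[0]) & (x <= c_hi[0]))
        p_y = np.mean((y > c_lo[1]) & (y <= c_hi[1]))
        mi += p_xy * np.log(p_xy / (p_x * p_y))
    return mi

# Illustrative data: independent vs. strongly correlated Gaussian pairs.
rng = np.random.default_rng(0)
a = rng.standard_normal(4000)
b = rng.standard_normal(4000)
mi_indep = mutual_information(a, b)
mi_dep = mutual_information(a, 0.9 * a + np.sqrt(1 - 0.9 ** 2) * b)
print(mi_indep, mi_dep)
```

Because the cells tile the plane, the product of the empirical marginals sums to one over the partition, so the estimate is a divergence between two distributions on the cells and cannot be negative.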
Signal Representation for Pattern Recognition
Keywords: Signal representation, feature extraction, pattern
recognition, Bayes decision approach, complexity regularization,
statistical learning theory, decision trees, linear discriminant
analysis, mutual information.
The focus of this research is to study the role of signal
representation in pattern recognition from an information-theoretic
perspective, by finding conceptual connections with related
problems in rate-distortion theory for lossy compression and
structural risk minimization (SRM) in statistical learning theory.
In particular, the minimum probability of error signal
representation principle (MPE-SR, Vasconcelos 2004) has been
extended to a more general theoretical setting that justifies
addressing it as a complexity-regularized optimization problem. We
are currently working on specializing the MPE-SR principle to the
problem of optimal filter bank selection using the tree-structured
family of wavelet packets (WPs). In this context, the induced
complexity-regularized optimization problem can be reduced to the
type of minimum-cost tree-pruning problem that is well understood
in the context of classification and regression trees (CART,
Breiman et al. 1984) and tree-structured vector quantization
(Chou et al. 1989). Automatic speech recognition (ASR) and other
pseudo-stationary time series classification problems are our
application focus.
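The minimum-cost tree-pruning problem mentioned above can be sketched with a generic bottom-up dynamic program. The example assumes an additive cost over leaves and a penalty `lam` per leaf. The `Node` class, the cost values and `lam` are hypothetical and chosen only for illustration; this is not the MPE-SR objective itself.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class Node:
    cost: float                      # cost incurred if this node is a leaf
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def prune(node: Node, lam: float) -> Tuple[float, int, Node]:
    """Return (penalized cost, number of leaves, pruned copy of the subtree).

    Minimizes sum(leaf costs) + lam * (number of leaves) over all pruned
    subtrees that share the root of `node`.
    """
    leaf_cost = node.cost + lam
    if node.left is None or node.right is None:
        return leaf_cost, 1, Node(node.cost)
    left_cost, left_leaves, left_tree = prune(node.left, lam)
    right_cost, right_leaves, right_tree = prune(node.right, lam)
    split_cost = left_cost + right_cost
    if split_cost < leaf_cost:       # keep the split only if it pays off
        return (split_cost, left_leaves + right_leaves,
                Node(node.cost, left_tree, right_tree))
    return leaf_cost, 1, Node(node.cost)

# Hypothetical tree: the costs play the role of an empirical risk.
tree = Node(10.0,
            left=Node(4.0, left=Node(1.5), right=Node(2.0)),
            right=Node(1.0))

results = {lam: prune(tree, lam)[:2] for lam in (0.0, 1.0, 10.0)}
print(results)
```

Each node is visited once, so the cost of the search is linear in the size of the full tree. Larger values of `lam` yield smaller pruned trees.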
Discrimination Measure for Hidden Markov Models
Keywords: Acoustic discrimination measures, Kullback-Leibler
distance or divergence, hidden Markov models, speech recognition.
The objective of this work is to propose and evaluate new
statistical discrimination measures for hidden Markov models
(HMMs). The need to compare different HMMs through appropriate
distance measures arises in a variety of contexts. In automatic
speech recognition (ASR), examples include evaluation of the
re-estimation process [4], redefinition of acoustic units [5],
multilingual phoneme mapping [6], vocabulary selection [7], and
pronunciation variation analysis [8].
We propose to extend the divergence and the Kullback-Leibler
distance (KLD) [1, 3] to probabilistic functions of Markov chains
(HMMs), with a focus on the transient behavior of the models [2].
Previous efforts to extend the KLD concept to HMMs focused only on
their stationary behavior [4, 9]. However, the transient aspect of
an HMM is particularly crucial in speech recognition applications,
because it is the part of the model that captures all the relevant
dynamic information of the process.
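For context, a stationary-regime baseline in the spirit of the probabilistic distance in [4] can be sketched as a Monte Carlo estimate. A long observation sequence is drawn from the first model and the per-symbol log-likelihood difference is computed with the scaled forward recursion. This is not the ADD. The discrete-emission models, the function names and the sequence length `T` are assumptions made for this example.

```python
import numpy as np

def sample_hmm(pi, A, B, T, rng):
    """Draw T observations from a discrete-emission HMM (pi, A, B)."""
    obs = np.empty(T, dtype=int)
    s = rng.choice(len(pi), p=pi)
    for t in range(T):
        obs[t] = rng.choice(B.shape[1], p=B[s])
        s = rng.choice(len(pi), p=A[s])
    return obs

def log_likelihood(obs, pi, A, B):
    """log P(obs | model) computed with the scaled forward recursion."""
    alpha = pi * B[:, obs[0]]
    scale = alpha.sum()
    ll = np.log(scale)
    alpha = alpha / scale
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
        scale = alpha.sum()
        ll += np.log(scale)
        alpha = alpha / scale
    return ll

def mc_divergence(m1, m2, T=2000, seed=0):
    """Per-symbol log-likelihood ratio on a sequence sampled from m1."""
    rng = np.random.default_rng(seed)
    obs = sample_hmm(*m1, T, rng)
    return (log_likelihood(obs, *m1) - log_likelihood(obs, *m2)) / T

# Two hypothetical 2-state models over a 3-symbol alphabet.
m1 = (np.array([0.6, 0.4]),
      np.array([[0.9, 0.1], [0.2, 0.8]]),
      np.array([[0.8, 0.1, 0.1], [0.1, 0.1, 0.8]]))
m2 = (np.array([0.5, 0.5]),
      np.array([[0.5, 0.5], [0.5, 0.5]]),
      np.array([[0.4, 0.3, 0.3], [0.3, 0.3, 0.4]]))

d_same = mc_divergence(m1, m1)
d_diff = mc_divergence(m1, m2)
print(d_same, d_diff)
```

An estimate of this kind relies on long sequences from an ergodic model, so it reflects stationary behavior and does not apply directly to left-to-right topologies that terminate.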
We proposed the Average Divergence Distance (ADD) as a new
statistical discrimination measure between two HMMs. This measure
is based on the transient behavior of the models and naturally
extends the divergence concept to this context [1]. In addition,
we developed an analytical justification of its definition based
on the Viterbi decoding approach [10], formally proved that the
ADD is well defined for the left-to-right HMM topology with a
final non-emitting state (the standard model for basic sub-word
units), and proposed a dynamic programming approach to calculate
it efficiently.
Publications:
J. Silva and S. Narayanan, "Upper Bound Kullback-Leibler
Divergence for Transient Hidden Markov Models," submitted to IEEE
Transactions on Signal Processing, February 2007.
J. Silva and S. Narayanan, "Average Divergence Distance as a
Statistical Discrimination Measure for Hidden Markov Models," IEEE
Transactions on Audio, Speech and Language Processing, vol. 14,
issue 3, pp. 890-906, May 2006.
J. Silva and S. Narayanan, "Upper Bound Kullback-Leibler
Divergence for Hidden Markov Models with Application as
Discrimination Measure for Speech Recognition," in IEEE
International Symposium on Information Theory, Seattle, WA, 2006.
J. Silva and S. Narayanan, "A Statistical Discrimination Measure
for Hidden Markov Models based on Divergence," in Proc. of
InterSpeech ICSLP, Jeju, Korea, 2004.
Principal References:
[1] S. Kullback, "Information Theory and Statistics", New York:
Wiley, 1958.
[2] J. R. Norris, "Markov Chains", Cambridge Series in Statistical
and Probabilistic Mathematics, 1999.
[3] R. M. Gray, "Entropy and Information Theory", Springer-Verlag,
New York, 1990.
[4] B. H. Juang and L. R. Rabiner, "A probabilistic distance
measure for hidden Markov models", AT&T Technical Journal, vol. 64,
no. 2, pp. 391-408, 1985.
[5] R. Singh, B. Raj, and R. Stern, "Structured redefinition of
sound units by merging and splitting for improved speech
recognition", in ICSLP, 2000.
[6] J. Kohler, "Multi-lingual phoneme recognition exploiting
acoustic-phonetic similarities of sounds", in ICSLP, 1996.
[7] P. Geutner, M. Finke, and A. Waibel, "Selection Criteria for
hypothesis driven lexical adaptation", in ICASSP, 1999.
[8] Ming-Yi Tsai and Lin-Shan Lee, "Pronunciation Variations Based
on Acoustic Phonemic Distance Measures with Applications examples
of Mandarin Chinese", in ASRU, December 2003.
[9] M. N. Do, "Fast approximation of Kullback-Leibler distance for
dependence trees and hidden Markov models", IEEE Signal Processing
Lett., vol. 10, no. 4, pp. 115-118, Apr. 2003.
[10] A. J. Viterbi, "Error bounds for convolutional codes and an
asymptotically optimal decoding algorithm", IEEE Trans. Information
Theory, vol. IT-13, pp. 260-269, Apr. 1967.
Information-Theoretic Analysis of Acoustic-Articulatory Dependencies for Speech Recognition
Keywords: Acoustic production process, information-theoretic
analysis.
At USC, we are currently collecting a unique corpus of
articulatory data corresponding to spontaneous speech dialogs.
Specifically, the data comprise one-dimensional continuous-time
signals that capture the relative movement of points at different
positions along the vocal tract during the production of
spontaneous speech. Magnetometer point-tracking technology (EMA)
is used for this purpose. These signals provide extremely high
temporal resolution of moving points on the articulators. The EMA
technique provides ideal temporal resolution for examining
temporal perturbations that occur at phrase junctures and under
focal accent, offers well-established analytical approaches to
speech kinematics, and is accompanied by a high-quality audio
signal. The technique will thus provide data from which a variety
of articulatory geometric and kinematic features can be calculated
in conjunction with speech features.
The general motivation of this research direction is to evaluate
whether these production signals provide complementary information
for discriminating speech content and, consequently, whether they
can be used in a joint modeling framework for speech recognition.
Furthermore, based on this joint modeling framework, the idea is
to study potential relationships between the production signals
and perceptual acoustic events, that is, acoustic-articulatory
relations. A crucial phase is therefore to characterize a signal
representation for the problem. We plan to use a general family of
filter banks and to explore the proposed rate-distortion framework
to define the optimal filter bank analysis for these signals. The
idea is to take advantage of the tree structure of the filter bank
analysis (the collection of possible bases in this scenario) to
characterize algorithms that find the optimal filter bank based on
local optimality conditions and that can potentially be
implemented. This last assumption is motivated by results obtained
in similar dynamic basis selection problems in the context of
image compression using rate-distortion optimality criteria. This
research problem is in the spirit of the "Human-like Speech
Processing" research project, which is focused on formulating
novel approaches to automatic speech recognition. This project is
a collaborative multidisciplinary research project between USC,
the University of Washington and Stanford University. .....
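The tree-structured search over filter banks described above can be illustrated with a small best-basis sketch over a Haar wavelet packet tree. A parent node is kept whenever its cost does not exceed the summed cost of its two children, which is the kind of local condition the search relies on. The Haar filters, the entropy-type additive cost and the test signal are stand-ins chosen for this example; the rate-distortion criterion discussed in the text is not implemented here.

```python
import numpy as np

def haar_split(x):
    """One level of the orthonormal Haar filter bank: (low-pass, high-pass)."""
    even, odd = x[0::2], x[1::2]
    return (even + odd) / np.sqrt(2.0), (even - odd) / np.sqrt(2.0)

def cost(c):
    """Additive entropy-type cost of the coefficient energies."""
    e = c[c != 0] ** 2
    return float(-np.sum(e * np.log(e)))

def best_basis(x, max_depth):
    """Return (cost, [(path, coefficients), ...]) for the cheapest basis.

    `path` is a string of 'L'/'H' filter choices from the root of the tree.
    """
    def recurse(c, depth, path):
        own = cost(c)
        if depth == max_depth or len(c) < 2:
            return own, [(path, c)]
        lo, hi = haar_split(c)
        cost_lo, basis_lo = recurse(lo, depth + 1, path + "L")
        cost_hi, basis_hi = recurse(hi, depth + 1, path + "H")
        if cost_lo + cost_hi < own:  # local test: split only if cheaper
            return cost_lo + cost_hi, basis_lo + basis_hi
        return own, [(path, c)]
    return recurse(np.asarray(x, dtype=float), 0, "")

# Hypothetical test signal: a slow oscillation plus a small amount of noise.
rng = np.random.default_rng(0)
t = np.arange(64)
signal = np.sin(2 * np.pi * t / 32.0) + 0.05 * rng.standard_normal(64)

best_cost, basis = best_basis(signal, max_depth=3)
n_coeffs = sum(len(c) for _, c in basis)
energy = sum(float(np.sum(c ** 2)) for _, c in basis)
print(best_cost, [path for path, _ in basis])
```

Because the cost is additive across subbands, the locally optimal decisions combine into the globally cheapest basis within the tree, and every node is examined only once.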