In an overcomplete basis, the number of basis vectors is greater than
the dimensionality of the input, and the representation of an input is
not a unique combination of basis vectors. Overcomplete
representations have been advocated because they have greater
robustness in the presence of noise, can be more sparse, and can have
greater flexibility in matching structure in the data. Overcomplete
codes have also been proposed as a model of some of the response
properties of neurons in primary visual cortex. Previous work has
focused on finding the best representation of a signal using a fixed
overcomplete basis (or dictionary). We present an algorithm for
learning an overcomplete basis by viewing it as a probabilistic model of
the observed data. We show that overcomplete bases can yield a better
approximation of the underlying statistical distribution of the data
and can thus lead to greater coding efficiency. This approach can be viewed as
a generalization of the technique of independent component analysis
and provides a method for identification when there are more sources
than mixtures.
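
The non-uniqueness described above can be illustrated with a minimal sketch. It is not the learning algorithm presented in this work. The matrix `A`, the input `x`, and the coefficient vectors `s_dense` and `s_sparse` are hypothetical values chosen for illustration: three basis vectors span a two-dimensional input space, so more than one set of coefficients reproduces the same input exactly.

```python
import numpy as np

# Hypothetical overcomplete basis: 3 basis vectors (columns) for a 2-D input.
A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])
x = np.array([1.0, 1.0])

# Representation 1: minimum L2-norm coefficients via the pseudoinverse.
# This spreads the input across all three basis vectors.
s_dense = np.linalg.pinv(A) @ x

# Representation 2: a sparse alternative that uses only the third basis vector.
s_sparse = np.array([0.0, 0.0, 1.0])

# Both coefficient vectors reconstruct the same input exactly.
x_from_dense = A @ s_dense
x_from_sparse = A @ s_sparse

# Count the coefficients that are meaningfully non-zero in each representation.
nonzero_dense = int(np.sum(np.abs(s_dense) > 1e-12))
nonzero_sparse = int(np.sum(np.abs(s_sparse) > 1e-12))

# Compare the L1 norms, a common measure of how sparse a code is.
l1_dense = float(np.sum(np.abs(s_dense)))
l1_sparse = float(np.sum(np.abs(s_sparse)))
```

Because the representation is not unique, some additional criterion is needed to choose among the candidates. In this sketch the sparse code has the smaller L1 norm, which is the kind of preference a sparsity-favoring prior on the coefficients would express.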
Mike Lewicki's home page.