FINE: Information Embedding for Document Classification- Carter et al.

RR Papers published in 2008.

Citation:
Kevin M. Carter, Raviv Raich, and Alfred O. Hero III. “FINE: Information Embedding for Document Classification,” Proc. of 2008 IEEE International Conference on Acoustics, Speech, and Signal Processing, Las Vegas, NV, March 2008.

Full Paper:
http://tbayes.eecs.umich.edu/kmcarter/f ... ne_doc.pdf

Code/Data: [ZIP,1.4MB]
http://tbayes.eecs.umich.edu/kmcarter/f ... c_code.zip
This MATLAB code requires the LIBSVM package.

BibTeX:
http://ieeexplore.ieee.org/xpls/abs_all ... er=4517996

Copyright Notice:
Copyright holders include the journal/conference publisher.

Complementary URL:
http://tbayes.eecs.umich.edu/kmcarter/fine_doc/

Abstract
The problem of document classification considers categorizing or grouping of various document types. Each document can be represented as a ‘bag of words’, which has no straightforward Euclidean representation. Relative word counts form the basis for similarity metrics among documents. Endowing the vector of term frequencies with a Euclidean metric has no obvious justification. A more appropriate and commonly used assumption is that the data lie on a statistical manifold, or a manifold of probabilistic generative models. In this paper, we propose calculating a low-dimensional, information-based embedding of documents into Euclidean space. One component of our approach, motivated by information geometry, is the use of the Fisher information distance to define similarities between documents. The other component is the calculation of the Fisher metric over a lower-dimensional statistical manifold estimated in a nonparametric fashion from the data. We demonstrate that in the classification task, this information-driven embedding outperforms both a standard PCA embedding and other Euclidean embeddings of the term frequency vector.
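
Illustration (not part of the paper or its MATLAB code):
To make the idea of an information distance between documents concrete, here is a minimal Python sketch. It assumes each document is modeled as a multinomial distribution over a fixed vocabulary, for which the Fisher information distance has the known closed form 2*arccos(sum_i sqrt(p_i*q_i)). The paper instead estimates the manifold nonparametrically, so this is a simplified stand-in rather than the authors' method. The function names and the toy documents are hypothetical.

```python
import math
from collections import Counter


def term_pmf(tokens, vocabulary):
    """Relative term frequencies of one document over a fixed vocabulary."""
    counts = Counter(tokens)
    total = sum(counts[w] for w in vocabulary)
    if total == 0:
        raise ValueError("document shares no terms with the vocabulary")
    return [counts[w] / total for w in vocabulary]


def fisher_distance(p, q):
    """Fisher information (geodesic) distance between two multinomial PMFs."""
    # Bhattacharyya coefficient: inner product of the square-root PMFs.
    bc = sum(math.sqrt(a * b) for a, b in zip(p, q))
    # Clamp to [0, 1] so floating-point rounding cannot break acos.
    return 2.0 * math.acos(min(1.0, max(0.0, bc)))


# Hypothetical toy corpus: two signal-processing documents, one finance document.
docs = [
    "signal noise filter signal".split(),
    "signal filter noise noise".split(),
    "market stock price market".split(),
]
vocabulary = sorted({w for d in docs for w in d})
pmfs = [term_pmf(d, vocabulary) for d in docs]

# Pairwise distance matrix; a low-dimensional embedding (for example via
# multidimensional scaling) would take this matrix as its input.
distances = [[fisher_distance(p, q) for q in pmfs] for p in pmfs]

for row in distances:
    print(" ".join(f"{d:.3f}" for d in row))
```

Under this assumption the distance depends only on the term-frequency distributions, not on document length, and documents with no terms in common sit at the maximum possible distance.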

