Package: wnpp
Severity: wishlist

* Package name    : r-cran-tm
  Version         : 0.5-7.1
  Upstream Author : Ingo Feinerer <feine...@logic.at>
* URL             : http://tm.r-forge.r-project.org/
* License         : GPL-3+
  Programming Lang: R
  Description     : GNU R package for text mining

 The tm package offers functionality for managing text documents, abstracts
 the process of document manipulation, and eases the use of heterogeneous
 text formats in R. The package has integrated database backend support to
 minimize memory demands. Advanced metadata management is implemented for
 collections of text documents to ease the use of large, metadata-enriched
 document sets.
 .
 The package ships with native support for handling the Reuters-21578 data
 set, Gmane RSS feeds, e-mails, and several classic file formats (e.g. plain
 text, CSV text, or PDFs).
 .
 The data structures and algorithms can be extended to fit custom demands,
 since the package is designed in a modular way to enable easy integration
 of new file formats, readers, transformations and filter operations.
 .
 tm provides easy access to preprocessing and manipulation mechanisms such as
 whitespace removal, stemming, or conversion between file formats.
 Furthermore, a generic filter architecture is available to filter documents
 by certain criteria or to perform full-text search. The package supports
 exporting document collections to term-document matrices, and string
 kernels can easily be constructed from text documents.

---

I am in the process of reviewing O'Reilly's book "Machine Learning for
Email". With the recent uploads of ggplot2 and plyr, this is the last
package needed for all the packages used by the book to be officially
available in Debian (and, I hope, shortly thereafter in popular derivatives
such as Ubuntu and Linux Mint).

Regards,

-- 
Rogério Brito : rbrito@{ime.usp.br,gmail.com} : GPG key 4096R/BCFCAAAA
http://rb.doesntexist.org : Packages for LaTeX : algorithms.berlios.de
DebianQA: http://qa.debian.org/developer.php?login=rbrito%40ime.usp.br