Re: [R] Reading large sparse arff files into R

2012-01-01 Thread andy1234
Hi folks, Any ideas on this? This sounds like a fairly common situation: reading a large data file into R. Thanks. Andy -- View this message in context: http://r.789695.n4.nabble.com/Reading-large-sparse-arff-files-into-R-tp4249409p4252393.html Sent from the R help mailing list archive a

[R] Reading large sparse arff files into R

2011-12-31 Thread andy1234
Hi, I am trying to read a large, highly sparse ARFF file produced by WEKA into R. However, the 'RWeka' package just chokes on this file. The data set has about 40k observations and about 20k dimensions. Even after 1 hr, the read.arff method of RWeka is still trying to read in the file, w
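Note on the sparse ARFF layout mentioned above: in WEKA's sparse format each data row is written as `{index value, index value, ...}` with zero-based attribute indices, and any attribute not listed is taken to be 0. That means a reader only needs to keep (row, column, value) triplets and never has to build the dense 40k x 20k table. The following is a minimal sketch in Python (not R, and not how RWeka works), assuming numeric values only and attribute names without spaces or quotes; the function name `read_sparse_arff` is hypothetical.

```python
import io


def read_sparse_arff(lines):
    """Parse sparse ARFF text into (attribute_names, triplets, n_rows).

    Assumptions of this sketch: every value is numeric, and attribute
    names contain no spaces or quotes.
    """
    names = []
    triplets = []  # (row, zero-based column, value)
    n_rows = 0
    in_data = False
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("%"):  # skip blanks and comments
            continue
        if not in_data:
            lower = line.lower()
            if lower.startswith("@attribute"):
                names.append(line.split()[1])
            elif lower.startswith("@data"):
                in_data = True
            continue
        if not (line.startswith("{") and line.endswith("}")):
            raise ValueError("expected a sparse row in braces: " + line)
        body = line[1:-1].strip()
        if body:
            for pair in body.split(","):
                idx, value = pair.split(None, 1)
                triplets.append((n_rows, int(idx), float(value)))
        n_rows += 1
    return names, triplets, n_rows


sample = """@relation docs
@attribute w0 numeric
@attribute w1 numeric
@attribute w2 numeric
@data
{0 1, 2 3.5}
{}
{1 2}
"""
names, triplets, n_rows = read_sparse_arff(io.StringIO(sample))
print(names, triplets, n_rows)
```

The triplets map directly onto a sparse matrix constructor. In R, for instance, `Matrix::sparseMatrix(i, j, x, dims)` accepts the same three vectors once the indices are shifted to be 1-based.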

[R] Feature selection for text using R - Please help

2011-12-30 Thread andy1234
Hi all, I am new to R, and am trying to do feature selection on my text data that has about 30k observations and about 15k features. I am interested in using Chi-Squared and Mutual Information based feature selection. I tried using the FSelector package but found it too slow for my purposes. Are t
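Note on the chi-squared scoring mentioned above: for term presence or absence against one class, the statistic reduces to the 2x2 formula N(AD - BC)^2 / ((A+B)(C+D)(A+C)(B+D)), which only needs document counts and so can be computed in one pass over sparse data. The following is a minimal sketch in Python (not R, and not the FSelector implementation), assuming each document is a set of term ids and that a term's score is its maximum over one-vs-rest classes; the name `chi2_scores` is hypothetical.

```python
from collections import Counter


def chi2_scores(docs, labels):
    """Return {term: max one-vs-rest chi-squared score}.

    docs   : list of sets of term ids (presence/absence only)
    labels : list of class labels, same length as docs
    """
    n = len(docs)
    class_count = Counter(labels)
    term_count = Counter()   # documents containing the term
    joint = Counter()        # documents containing the term AND in the class
    for terms, label in zip(docs, labels):
        for t in terms:
            term_count[t] += 1
            joint[(t, label)] += 1

    scores = {}
    for t, n_t in term_count.items():
        best = 0.0
        for cls, n_c in class_count.items():
            a = joint[(t, cls)]      # term present, in class
            b = n_t - a              # term present, other classes
            c = n_c - a              # term absent, in class
            d = n - n_t - c          # term absent, other classes
            denom = (a + b) * (c + d) * (a + c) * (b + d)
            if denom:
                best = max(best, n * (a * d - b * c) ** 2 / denom)
        scores[t] = best
    return scores


docs = [{0, 1}, {0}, {1}, {1}]
labels = ["a", "a", "b", "b"]
print(chi2_scores(docs, labels))
```

Here term 0 appears only in class "a" documents and so scores higher than term 1, which appears in both classes.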

Re: [R] Classifying large text corpora using R

2011-09-03 Thread andy1234
Daniel Malter wrote:
> Take a look here: http://www.jstatsoft.org/v25/i05/paper
>
> HTH,
> Da.
>
> andy1234 wrote:
>> Dear everyone,
>>
>> I am new to R, and I am looking at doing text classification on a huge
>> collection o

[R] Classifying large text corpora using R

2011-09-02 Thread andy1234
Dear everyone, I am new to R, and I am looking at doing text classification on a huge collection of documents (>500,000) which are distributed among 300 classes (so basically, this is my training data). Would someone please be kind enough to let me know about the R packages to use and their scala

[R] R's handling of high dimensional data

2011-08-13 Thread andy1234
Hello all, I am looking at doing text classification on very high dimensional data (about 300,000 or more features) and up to 2000 documents. I am quite new to R though, and was just wondering if R and its libraries would scale to such high dimensions. Any thoughts will be much appreciated. Th

Re: [R] Entropy based feature selection in R

2011-08-13 Thread andy1234
Hello everyone, Any thoughts on this one, please? The only thing I found was the FSelector package (http://en.wikibooks.org/wiki/Data_Mining_Algorithms_In_R/Dimensionality_Reduction/Feature_Selection#Aviable_Feature_Ranking_Techniques_in_FSelector_Package). Unfortunately, though, it seems to be far

[R] Entropy based feature selection in R

2011-07-31 Thread andy1234
I need to use entropy-based feature selection to reduce the term space while doing text classification. Are there any R packages available that would help me do this? I can also make do with a chi-squared based algorithm, if there are packages for that. Thanks in advance. Andy
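Note on what "entropy based" usually means here: a common choice is information gain, the class entropy H(C) minus the conditional entropy of the class given whether a term is present. The following is a minimal sketch in Python (not R, and not taken from any package named in this thread), assuming each document is a set of term ids; the names `entropy` and `information_gain` are hypothetical.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy in bits of a collection of non-negative counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c)


def information_gain(docs, labels, term):
    """H(class) - H(class | term present or absent)."""
    n = len(docs)
    with_term = Counter(l for d, l in zip(docs, labels) if term in d)
    without_term = Counter(l for d, l in zip(docs, labels) if term not in d)
    h_class = entropy(Counter(labels).values())
    h_cond = 0.0
    for part in (with_term, without_term):
        size = sum(part.values())
        if size:  # skip an empty partition to avoid dividing by zero
            h_cond += size / n * entropy(part.values())
    return h_class - h_cond


docs = [{0, 1}, {0}, {1}, {1}]
labels = ["a", "a", "b", "b"]
for term in (0, 1):
    print(term, information_gain(docs, labels, term))
```

Ranking terms by this value and keeping the top k is one way to reduce the term space; the cutoff k is a tuning choice, not something the score itself determines.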