Also, see the nearZeroVar function in the caret package.
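In case it is useful, here is a minimal sketch of both routes on a made-up data frame. The data and the names x, newdata and newdata_caret are only for illustration, and the caret part runs only if the package is installed. It assumes that "all values the same" should ignore missing values.

```r
# Toy data frame (made up for illustration): 'b' and 'd' are constant.
x <- data.frame(a = 1:6,
                b = rep(5, 6),
                c = c(2.1, 3.4, 1.9, 4.2, 2.8, 3.3),
                d = rep(0, 6))

# Base R: flag columns with exactly one distinct non-missing value.
allsame <- sapply(x, function(y) length(unique(y[!is.na(y)])) == 1)

# drop = FALSE keeps a data frame even if only one column survives.
newdata <- x[, !allsame, drop = FALSE]

# caret, if available: saveMetrics = TRUE returns one row per column,
# including a logical 'zeroVar' flag for constant columns.
if (requireNamespace("caret", quietly = TRUE)) {
  metrics <- caret::nearZeroVar(x, saveMetrics = TRUE)
  newdata_caret <- x[, !metrics$zeroVar, drop = FALSE]
}
```

Without saveMetrics, nearZeroVar returns column positions instead. If nothing is flagged, that result is empty, and negative indexing with an empty vector selects no columns at all, so it is worth checking its length before using it to subset.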

On Fri, Mar 7, 2008 at 7:41 AM, Charilaos Skiadas <[EMAIL PROTECTED]> wrote:
>
> On Mar 7, 2008, at 2:17 AM, Oldrich Kruza wrote:
>
> > Hello Soumyadeep,
> >
> > if you store the data in a tabular file, then I suggest using standard
> > text-editing tools like cut (say your file is called data.csv, fields
> > are separated with commas and you want to get rid of the third and
> > sixth columns):
> >
> > $ cut --complement --delimiter="," --fields=3,6 < data.csv > data_cut.csv
> >
> > If you're not in a Unix environment but have perl, then you may use a
> > script like:
> >
> > open SRC, "data.csv" or die("couldn't open source");
> > open DST, ">data_cut.csv" or die("couldn't open destination");
> > while (<SRC>) {
> >     chomp;
> >     @fields = split /,/;    #substitute the comma for the delimiter you use
> >     splice @fields, 5, 1;   #get rid of sixth column first (indices are zero-based, thus 5 instead of 6)
> >     splice @fields, 2, 1;   #then the third column; removing it first would shift the sixth to index 4
> >     print DST join(",", @fields), "\n";
> > }
> >
> > If you need to do the selection within R, then you can do it by
> > indexing the data structure. Suppose you have the data in a data.frame
> > called data. Then:
> >
> > > data <- data[,-6]
> > > data <- data[,-3]
> >
> > might do the trick (but since I'm not much of an R hacker, this is
> > without guarantee). I think it might be better however to do the
> > preprocessing before the data get into R because then you avoid
> > loading the columns to discard into memory.
>
> I am guessing that the data is already in R, so it should be easier
> to do it in R, especially if he doesn't know which columns are the
> ones with all identical values. For instance, suppose the data set is
> called x. Then the following would return TRUE for the columns that
> have all values the same:
>
> allsame <- sapply(x,function(y) length(table(y))==1)
>
> and then the following will take them out
>
> newdata <- x[,!allsame]
>
> > Hope this helps
> > ~ Oldrich
>
> Haris Skiadas
> Department of Mathematics and Computer Science
> Hanover College
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

--
Max