On Thu, Dec 3, 2009 at 9:09 PM, Sharpie <ch...@sharpsteen.net> wrote:
>
> pengyu.ut wrote:
>>
>> I'm thinking of using external program 'grep' and pipe() to do so. But
>> I'm wondering if there is a more efficient way to do so purely in R.
>>
>
> I would just suck the whole table in using read.table(), locate the lines
> that I don't want using apply() and grepl(), and then reduce the data set:
>
>   dataSet <- read.table( "someData.txt" )
>
>   dataToDrop <- apply( dataSet, 1, function( row ){
>     return( any( grepl( "regex", row ) ) )
>   })
>
>   dataSet <- subset( dataSet, !dataToDrop )
>
> Since this solution executes entirely in R without resorting to system()
> calls, it should be portable between platforms.
This is not acceptable for my case. The original file, which is in .gz format, is about 100 MB, so its uncompressed size should be quite large, but I only need about 2% of the data in it. With your method, it takes a long time just to read the whole file in.
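One way to avoid parsing the full table is to stream the gzipped file through a connection and filter it chunk by chunk, so only the ~2% of lines you keep ever reaches read.table(). Here is a minimal sketch along those lines; the file name "someData.txt.gz", the chunk size, and the pattern "regex" are all placeholders you would replace with your own:

  ## Open the gzipped file as a text connection so it is
  ## decompressed on the fly, without reading it all at once.
  con  <- gzfile("someData.txt.gz", open = "r")
  keep <- character(0)

  repeat {
    ## Read a manageable chunk of lines at a time.
    chunk <- readLines(con, n = 10000)
    if (length(chunk) == 0) break
    ## Keep only the lines that do NOT match the pattern.
    keep <- c(keep, chunk[!grepl("regex", chunk)])
  }
  close(con)

  ## Parse just the surviving lines.
  dataSet <- read.table(textConnection(keep))

If the kept fraction really is small, the cost is dominated by scanning the raw lines with grepl(), which should be much cheaper than building a full data frame first. The pipe()/grep approach you mention would likely still be faster, but it is not portable to platforms without grep.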