On Thu, Dec 3, 2009 at 9:09 PM, Sharpie <ch...@sharpsteen.net> wrote:
>
> pengyu.ut wrote:
>>
>> I'm thinking of using external program 'grep' and pipe() to do so. But
>> I'm wondering if there is a more efficient way to do so purely in R.
>>
>
> I would just suck the whole table in using read.table(), locate the lines
> that I don't want using apply() and grepl(), and then reduce the data set:
>
>   dataSet <- read.table( "someData.txt" )
>
>   dataToDrop <- apply( dataSet, 1, function( row ){
>     return( any( grepl( "regex", row ) ) )
>   })
>
>   dataSet <- subset( dataSet, !dataToDrop )
>
> Since this solution executes entirely in R without resorting to system()
> calls, it should be portable between platforms.
This is not acceptable for my case. The original file, which is in .gz format, is about 100 MB, so its uncompressed size should be quite large, but I only need about 2% of the data in it. With your method, it takes a long time just to read the whole file in.
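One way to avoid parsing the full table is to stream the gzipped file through a connection and filter it chunk by chunk, so only the ~2% of lines you keep ever reaches read.table(). Here is a minimal sketch along those lines; the file name "someData.txt.gz", the chunk size, and the pattern "regex" are all placeholders you would replace with your own:

  ## Open the gzipped file as a text connection so it is
  ## decompressed on the fly, without reading it all at once.
  con  <- gzfile("someData.txt.gz", open = "r")
  keep <- character(0)

  repeat {
    ## Read a manageable chunk of lines at a time.
    chunk <- readLines(con, n = 10000)
    if (length(chunk) == 0) break
    ## Keep only the lines that do NOT match the pattern.
    keep <- c(keep, chunk[!grepl("regex", chunk)])
  }
  close(con)

  ## Parse just the surviving lines.
  dataSet <- read.table(textConnection(keep))

If the kept fraction really is small, the cost is dominated by scanning the raw lines with grepl(), which should be much cheaper than building a full data frame first. The pipe()/grep approach you mention would likely still be faster, but it is not portable to platforms without grep.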