Re: [R] how to load only lines that start with a particular symbol

2009-09-16 Thread Patrick Connolly
On Tue, 15-Sep-2009 at 02:44PM -0700, William Dunlap wrote:

|> perl can do more complicated processing and filtering than grep.

I once used perl to filter the useful 4 or 5 Mb out of a file that was over 250Mb. It took about 3 lines of perl code and about 40 seconds to run. Perl's not exactly

Re: [R] how to load only lines that start with a particular symbol

2009-09-15 Thread Gabor Grothendieck
In the Windows cmd shell ^ means escape the next character so try this (assuming the data you posted is in genetest.dat in the current directory):

> readLines(pipe("findstr/b ^> genetest.dat"))
[1] ">gene A;." ">gene B;"

and on UNIX replace "..." with the corresponding grep command making
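For the UNIX case mentioned above, a minimal sketch of the same pipe idea might look like the following. It assumes a Unix-like system with grep on the PATH; the temporary file and the variable names (fasta_path, cmd, headers) are made up for illustration and stand in for genetest.dat.

```r
# Sketch only: assumes a Unix-like shell with grep available.
# fasta_path is a hypothetical stand-in for genetest.dat.
fasta_path <- tempfile(fileext = ".dat")
writeLines(c(">gene A;.", "A", "T", "CCCTT", ">gene B;", "C", "G"), fasta_path)

# shQuote() protects the pattern and the path from the shell,
# so ">" is not treated as an output redirection.
cmd <- paste("grep", shQuote("^>"), shQuote(fasta_path))

con <- pipe(cmd)
headers <- readLines(con)  # only the lines grep lets through reach R
close(con)

print(headers)
```

Because the filtering happens in grep before R sees the data, only the matching lines are ever held in memory, which is the point of using pipe() here.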

Re: [R] how to load only lines that start with a particular symbol

2009-09-15 Thread Doran, Harold
-project.org] On Behalf Of William Dunlap > Sent: Tuesday, September 15, 2009 5:45 PM > To: J Chen; r-help@r-project.org > Subject: Re: [R] how to load only lines that start with a > particular symbol > > > -Original Message- > > From: r-help-boun...@r-project.or

Re: [R] how to load only lines that start with a particular symbol

2009-09-15 Thread William Dunlap
> -Original Message- > From: r-help-boun...@r-project.org > [mailto:r-help-boun...@r-project.org] On Behalf Of J Chen > Sent: Tuesday, September 15, 2009 2:00 PM > To: r-help@r-project.org > Subject: [R] how to load only lines that start with a > particular sy

Re: [R] how to load only lines that start with a particular symbol

2009-09-15 Thread jim holtman
read in the data with 'readLines' and then use 'grep'

> x
[1] ">gene A;." "A" "T" "CCCTT" ">gene B;" "C" "G"
> x <- x[grep("^>", x)]
> x
[1] ">gene A;." ">gene B;"
>

On Tue, Sep 15, 2009 at 4:59 PM, J Chen wrote: > > Dear all, > >
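As a self-contained version of the readLines/grep suggestion above, here is a small sketch that starts from a file rather than an existing vector. The temporary file and the names fasta_path, all_lines, headers and headers_alt are assumptions for illustration; the sample data is the one posted in the thread.

```r
# Sketch only: write the sample data from the thread to a temporary file
# (fasta_path is a hypothetical name used for this example).
fasta_path <- tempfile(fileext = ".dat")
writeLines(c(">gene A;.", "A", "T", "CCCTT", ">gene B;", "C", "G"), fasta_path)

# Read every line, then keep those that start with ">".
all_lines <- readLines(fasta_path)
headers <- grep("^>", all_lines, value = TRUE)  # value = TRUE returns the lines, not indices

# Equivalent filter using a logical mask instead of indices.
headers_alt <- all_lines[grepl("^>", all_lines)]

print(headers)
```

This reads the whole file into memory before filtering, so for very large files the pipe-based or external-filter approaches discussed elsewhere in the thread may be preferable.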

[R] how to load only lines that start with a particular symbol

2009-09-15 Thread J Chen
Dear all,

I have DNA sequence data which are fasta-formatted as

>gene A;.
A
T
CCCTT
>gene B;
C
G

I want to load only the lines that start with ">" where the annotation information for the gene is contained. In principle, I can remove the sequences bef