On Tue, 15-Sep-2009 at 02:44PM -0700, William Dunlap wrote:
|> perl can do more complicated processing and filtering than grep.
I once used Perl to filter the useful 4 or 5 MB out of a file that was
over 250 MB. It took about 3 lines of Perl code and about 40 seconds
to run. Perl's not exactly
In the Windows cmd shell, ^ escapes the next character,
so try this (assuming the data you posted
is in genetest.dat in the current directory):
> readLines(pipe("findstr/b ^> genetest.dat"))
[1] ">gene A;." ">gene B;"
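The same idea can be wrapped so that it picks the external filter by platform. This is only a sketch: read_header_lines is a made-up helper name, the sample file is written to a temporary path rather than genetest.dat, and it assumes findstr (Windows) or grep (Unix-alikes) is on the PATH.

```r
# Sketch: let an external filter select the ">" lines before R reads the file.
# Assumes `findstr` (Windows) or `grep` (Unix-alikes) is available on the PATH.
read_header_lines <- function(path) {
  cmd <- if (.Platform$OS.type == "windows") {
    # In cmd.exe, ^ escapes the following > so it is not taken as a redirection.
    paste("findstr /b ^>", shQuote(path, type = "cmd"))
  } else {
    # Single quotes stop the shell from treating > as a redirection.
    paste("grep '^>'", shQuote(path))
  }
  con <- pipe(cmd)
  on.exit(close(con))
  readLines(con)
}

# Sample data from the original question, written to a temporary file.
demo_file <- tempfile(fileext = ".dat")
writeLines(c(">gene A;.", "A", "T", "CCCTT", ">gene B;", "C", "G"), demo_file)

headers <- read_header_lines(demo_file)
print(headers)
```

Filtering outside R means the sequence lines never reach R's memory, which is the main attraction for very large files.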
and on UNIX replace "..." with the corresponding grep command
making
-project.org] On Behalf Of William Dunlap
> Sent: Tuesday, September 15, 2009 5:45 PM
> To: J Chen; r-help@r-project.org
> Subject: Re: [R] how to load only lines that start with a
> particular symbol
>
> > -Original Message-
> > From: r-help-boun...@r-project.or
> -Original Message-
> From: r-help-boun...@r-project.org
> [mailto:r-help-boun...@r-project.org] On Behalf Of J Chen
> Sent: Tuesday, September 15, 2009 2:00 PM
> To: r-help@r-project.org
> Subject: [R] how to load only lines that start with a
> particular sy
Read in the data with 'readLines' and then use 'grep':
> x
[1] ">gene A;." "A" "T" "CCCTT" ">gene B;" "C" "G"
> x <- x[grep("^>", x)]
> x
[1] ">gene A;." ">gene B;"
>
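For files too large to read comfortably in one go, the readLines-then-filter approach can be applied a chunk at a time. The following is a minimal sketch, not a tested solution for real FASTA files: read_matching_lines and chunk_size are illustrative names, and startsWith() assumes R 3.3.0 or later (grep("^>", chunk) would do the same job on older versions).

```r
# Sketch: keep only lines that start with a given prefix, reading the file
# in chunks so the whole file never has to be held in memory at once.
read_matching_lines <- function(path, prefix = ">", chunk_size = 10000L) {
  con <- file(path, open = "r")
  on.exit(close(con))
  kept <- list()
  repeat {
    chunk <- readLines(con, n = chunk_size)
    if (length(chunk) == 0L) break  # end of file reached
    # startsWith() matches the prefix literally, so no regex escaping is needed.
    kept[[length(kept) + 1L]] <- chunk[startsWith(chunk, prefix)]
  }
  # as.character() turns the NULL from an empty list into character(0).
  as.character(unlist(kept, use.names = FALSE))
}

# Sample data from the original question, written to a temporary file.
fasta_file <- tempfile(fileext = ".fa")
writeLines(c(">gene A;.", "A", "T", "CCCTT", ">gene B;", "C", "G"), fasta_file)

# A tiny chunk size is used here only to exercise the loop.
annotations <- read_matching_lines(fasta_file, chunk_size = 3L)
print(annotations)
```

The chunk size trades memory for the number of read calls; the default above is an arbitrary choice.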
On Tue, Sep 15, 2009 at 4:59 PM, J Chen wrote:
>
> Dear all,
>
>
Dear all,
I have DNA sequence data which are fasta-formatted as
>gene A;.
A
T
CCCTT
>gene B;
C
G
I want to load only the lines that start with ">" where the annotation
information for the gene is contained. In principle, I can remove the
sequences bef