> -----Original Message-----
> From: r-help-boun...@r-project.org
> [mailto:r-help-boun...@r-project.org] On Behalf Of J Chen
> Sent: Tuesday, September 15, 2009 2:00 PM
> To: r-help@r-project.org
> Subject: [R] how to load only lines that start with a particular symbol
>
> Dear all,
>
> I have DNA sequence data which are fasta-formatted as
>
> >gene A;.....
> AAAAACCCC
> TTTTTGGGG
> CCCTTTTTT
> >gene B;....
> CCCCCAAAA
> GGGGGTTTT
>
> I want to load only the lines that start with ">" where the annotation
> information for the gene is contained. In principle, I can remove the
> sequences before loading or after loading all the lines. I just wonder
> if there's a way to load only lines with a particular pattern. The skip
> argument in read.table() doesn't work for my purpose.

You could use pipe() to call an external program such as grep or perl
to filter the lines of interest out of the file, so that R's input
routine only has to allocate space for those lines. E.g., the following
makes a sample file, and the readLines(pipe(...)) call reads only the
lines starting with ">> " from it. (The example assumes grep is not on
your PATH, so it gives the full path to where grep is installed on my
Windows machine.)

   > tfile <- tempfile()
   > cat(file=tfile, sep="\n", c(">> Date", ">> Author", "columnA columnB", "1 2", "3 4"))
   > readLines(tfile)
   [1] ">> Date"         ">> Author"       "columnA columnB" "1 2"
   [5] "3 4"
   > readLines(pipe(paste("e:/cygwin/bin/grep \"^>> \" ", tfile)))
   [1] ">> Date"   ">> Author"

perl can do more complicated processing and filtering than grep.

Bill Dunlap
TIBCO Software Inc - Spotfire Division
wdunlap tibco.com

>
> Thanks in advance,
> Jimmy
>
> --
> View this message in context:
> http://www.nabble.com/how-to-load-only-lines-that-start-with-a-particular-symbol-tp25461693p25461693.html
> Sent from the R help mailing list archive at Nabble.com.
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
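
For the FASTA case in the original question, the same pipe() idea might be sketched roughly as below. This is only an illustration under stated assumptions: the helper name read_prefixed_lines, its arguments, and the sample file contents are made up for the example, the prefix is assumed to contain no regular-expression metacharacters, and grep is looked up with Sys.which() instead of a hard-coded path. If no grep is found, the sketch falls back to a different technique, reading the file in chunks in plain R and discarding non-matching lines as it goes.

```r
# Sample FASTA-style file modelled on the question (contents are made up).
fasta <- tempfile(fileext = ".fa")
writeLines(c(">gene A; first annotation",
             "AAAAACCCC",
             "TTTTTGGGG",
             ">gene B; second annotation",
             "CCCCCAAAA"),
           fasta)

# Hypothetical helper, not part of R: return the lines of `path` that
# start with `prefix`. Assumes `prefix` has no regex metacharacters.
read_prefixed_lines <- function(path, prefix = ">", chunk_size = 10000L) {
  grep_bin <- Sys.which("grep")
  if (nzchar(grep_bin)) {
    # grep filters the file, so R only allocates the matching lines.
    cmd <- paste(shQuote(grep_bin), shQuote(paste0("^", prefix)), shQuote(path))
    con <- pipe(cmd)
    on.exit(close(con))
    return(readLines(con))
  }
  # Fallback without grep: read in chunks and keep only matching lines.
  con <- file(path, open = "r")
  on.exit(close(con))
  keep <- character(0)
  repeat {
    chunk <- readLines(con, n = chunk_size)
    if (length(chunk) == 0L) break
    keep <- c(keep, chunk[startsWith(chunk, prefix)])
  }
  keep
}

headers <- read_prefixed_lines(fasta)
print(headers)  # only the two ">gene ..." annotation lines
```

The quoting via shQuote() is there so that file paths containing spaces survive the trip through the shell; on Windows the quoting rules differ, so the command string may need adjusting there.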