Along those lines, Python makes this kind of task easy. Sample code would be:

# Ask the user for the input and output file names
# (Python 3; on Python 2 use raw_input() instead of input())
filename = input("Please enter the name of the original file: ")
new_file = input("Enter the name of the file to output: ")

# Copy only the lines that start with '>' into the new file
with open(filename, 'r') as infile, open(new_file, 'w') as outfile:
    for line in infile:
        if line.startswith('>'):
            # line already ends with a newline, so write it unchanged
            outfile.write(line)

> -----Original Message-----
> From: r-help-boun...@r-project.org
> [mailto:r-help-boun...@r-project.org] On Behalf Of William Dunlap
> Sent: Tuesday, September 15, 2009 5:45 PM
> To: J Chen; r-help@r-project.org
> Subject: Re: [R] how to load only lines that start with a
> particular symbol
>
> > -----Original Message-----
> > From: r-help-boun...@r-project.org
> > [mailto:r-help-boun...@r-project.org] On Behalf Of J Chen
> > Sent: Tuesday, September 15, 2009 2:00 PM
> > To: r-help@r-project.org
> > Subject: [R] how to load only lines that start with a particular
> > symbol
> >
> >
> > Dear all,
> >
> > I have DNA sequence data which are fasta-formatted as
> >
> > >gene A;.....
> > AAAAACCCC
> > TTTTTGGGG
> > CCCTTTTTT
> > >gene B;....
> > CCCCCAAAA
> > GGGGGTTTT
> >
> > I want to load only the lines that start with ">" where the annotation
> > information for the gene is contained. In principle, I can remove the
> > sequences before loading or after loading all the lines. I just wonder
> > if there's a way to load only lines with a particular pattern. The
> > skip argument in read.table() doesn't work for my purpose.
>
> You could use pipe() to call an external program like grep or
> perl to filter the lines of interest from the file so R's
> input routine only has to allocate space for those. E.g.,
> the following makes a sample file and the readLines(pipe(...))
> call reads only the lines starting with ">> " from it. (It
> assumes you don't have grep in PATH and gives where it is
> installed on my Windows machine.)
>
> > tfile <- tempfile()
> > cat(file=tfile, sep="\n", c(">> Date", ">> Author",
>     "columnA columnB", "1 2", "3 4"))
> > readLines(tfile)
> [1] ">> Date"         ">> Author"       "columnA columnB" "1 2"
> [5] "3 4"
> > readLines(pipe(paste("e:/cygwin/bin/grep \"^>> \" ", tfile)))
> [1] ">> Date"   ">> Author"
>
> perl can do more complicated processing and filtering than grep.
>
> Bill Dunlap
> TIBCO Software Inc - Spotfire Division
> wdunlap tibco.com
>
> >
> > Thanks in advance,
> > Jimmy
> >
> > --
> > View this message in context:
> > http://www.nabble.com/how-to-load-only-lines-that-start-with-a-particular-symbol-tp25461693p25461693.html
> > Sent from the R help mailing list archive at Nabble.com.
> >
> > ______________________________________________
> > R-help@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
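
If the goal is to get the annotation lines into memory rather than into a second file, the same filter can be wrapped in a function. This is only a minimal sketch: the name read_header_lines and its prefix parameter are made up for illustration and do not come from the thread, and it assumes a plain-text file whose header lines fit comfortably in memory.

```python
def read_header_lines(path, prefix=">"):
    """Return the lines of the file at `path` that start with `prefix`.

    Hypothetical helper for illustration; trailing newlines are stripped.
    """
    headers = []
    with open(path, "r") as handle:
        # Iterate line by line so the sequence data is never held in memory
        for line in handle:
            if line.startswith(prefix):
                headers.append(line.rstrip("\n"))
    return headers
```

Calling read_header_lines("genes.fa") (the file name is just a placeholder) on the example data in the original question should give a list with the two ">gene ..." lines and none of the sequence lines.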