On Fri, Sep 10, 2010 at 4:20 PM, Duke <duke.li...@gmx.com> wrote:
> On 9/10/10 2:49 PM, Gabor Grothendieck wrote:
>>
>> On Fri, Sep 10, 2010 at 1:24 PM, Duke <duke.li...@gmx.com> wrote:
>>>
>>> Hi all,
>>>
>>> I have to filter a tab-delimited text file like below:
>>>
>>> "GeneNames" "value1" "value2" "log2(Fold_change)" "log2(Fold_change) normalized" "Signature(abs(log2(Fold_change) normalized)> 4)"
>>> ENSG00000209350 4 35 -3.81131293562629 -4.14357714689656 TRUE
>>> ENSG00000177133 142 2 5.46771720082336 5.13545298955309 FALSE
>>> ENSG00000116285 115 1669 -4.54130810709955 -4.87357231836982 TRUE
>>> ENSG00000009724 10 162 -4.69995182667858 -5.03221603794886 FALSE
>>> ENSG00000162460 3 31 -4.05126372834704 -4.38352793961731 TRUE
>>>
>>> based on the last column (TRUE), and then write to a new text file,
>>> meaning I should get something like below:
>>>
>>> "GeneNames" "value1" "value2" "log2(Fold_change)" "log2(Fold_change) normalized" "Signature(abs(log2(Fold_change) normalized)> 4)"
>>> ENSG00000209350 4 35 -3.81131293562629 -4.14357714689656 TRUE
>>> ENSG00000116285 115 1669 -4.54130810709955 -4.87357231836982 TRUE
>>> ENSG00000162460 3 31 -4.05126372834704 -4.38352793961731 TRUE
>>>
>>> I used read.table and write.table but I am still not very satisfied
>>> with the results. Here is what I did:
>>>
>>> expFC <- read.table( "test.txt", header=T, sep="\t" )
>>> expFC.TRUE <- expFC[expFC[dim(expFC)[2]]=="TRUE",]
>>> write.table (expFC.TRUE, file="test_TRUE.txt", row.names=FALSE, sep="\t" )
>>>
>>> Result:
>>>
>>> "GeneNames" "value1" "value2" "log2.Fold_change." "log2.Fold_change..normalized" "Signature.abs.log2.Fold_change..normalized....4."
>>> "ENSG00000209350" 4 35 -3.81131293562629 -4.14357714689656 TRUE
>>> "ENSG00000116285" 115 1669 -4.54130810709955 -4.87357231836982 TRUE
>>> "ENSG00000162460" 3 31 -4.05126372834704 -4.38352793961731 TRUE
>>>
>>> As you can see, there are two points:
>>>
>>> 1. The headers were altered. All the special characters were
>>>    converted to dot (.).
>>> 2. The gene names (first column) were quoted (which were not in the
>>>    original file).
>>>
>> This will copy input lines matching pattern as well as the header to
>> the output verbatim preserving all quotes, spacing, etc.
>>
>> myFilter <- function(infile, outfile, pattern = "TRUE$") {
>>   L <- readLines(infile)
>>   cat(L[1], "\n", file = outfile)
>>   L2 <- grep(pattern, L[-1], value = TRUE)
>>   for(el in L2) cat(el, "\n", file = outfile, append = TRUE)
>> }
>>
>> # e.g.
>> myFilter("infile.txt", "outfile.txt")
>>
>
> I love this the best! Even it is not as simple as the bash one liner
> (system( "cat infile.txt | grep -v FALSE > outfile.txt", wait=TRUE )), but I
> am very happy to learn that R does have other similar functions as in bash.
> If there is a document or a list of all such functions, that would be
> excellent.
>
> Thanks Gabor,
>

Check out these help files:

help.search(keyword = "character", package = "base")

-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
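
A side note on the two points raised in the quoted message (altered headers, quoted gene names): both can probably also be handled while staying with read.table and write.table. The following is only a minimal sketch, not the approach posted in this thread. The function name filterTrueRows is made up for illustration, and it assumes the last column holds TRUE/FALSE values. check.names = FALSE asks read.table to leave the column names as they are, and quote = FALSE asks write.table not to quote character fields.

```r
# Sketch only: filterTrueRows is a hypothetical helper, not from the thread.
filterTrueRows <- function(infile, outfile) {
  # check.names = FALSE keeps headers such as "log2(Fold_change)" unchanged
  d <- read.table(infile, header = TRUE, sep = "\t",
                  check.names = FALSE, stringsAsFactors = FALSE)
  # keep rows whose last column is TRUE; %in% drops any NA values
  keep <- d[d[[ncol(d)]] %in% TRUE, , drop = FALSE]
  # quote = FALSE leaves the gene names unquoted (the header is unquoted too)
  write.table(keep, file = outfile, sep = "\t",
              quote = FALSE, row.names = FALSE)
  invisible(keep)
}

# e.g.
# filterTrueRows("infile.txt", "outfile.txt")
```

Unlike the readLines/grep version, this parses and re-prints the values, so the output is not guaranteed to match the input byte for byte: the header loses its quotes, and numbers with more digits than R prints by default may be rounded.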