On 9/10/10 4:24 PM, Gabor Grothendieck wrote:
On Fri, Sep 10, 2010 at 4:20 PM, Duke <duke.li...@gmx.com> wrote:
On 9/10/10 2:49 PM, Gabor Grothendieck wrote:
On Fri, Sep 10, 2010 at 1:24 PM, Duke <duke.li...@gmx.com> wrote:
Hi all,
I have to filter a tab-delimited text file like the one below:
"GeneNames" "value1" "value2" "log2(Fold_change)" "log2(Fold_change) normalized" "Signature(abs(log2(Fold_change) normalized) > 4)"
ENSG00000209350 4 35 -3.81131293562629 -4.14357714689656 TRUE
ENSG00000177133 142 2 5.46771720082336 5.13545298955309 FALSE
ENSG00000116285 115 1669 -4.54130810709955 -4.87357231836982 TRUE
ENSG00000009724 10 162 -4.69995182667858 -5.03221603794886 FALSE
ENSG00000162460 3 31 -4.05126372834704 -4.38352793961731 TRUE
based on the last column (keeping only the TRUE rows), and then write the
result to a new text file, meaning I should get something like below:
"GeneNames" "value1" "value2" "log2(Fold_change)" "log2(Fold_change) normalized" "Signature(abs(log2(Fold_change) normalized) > 4)"
ENSG00000209350 4 35 -3.81131293562629 -4.14357714689656 TRUE
ENSG00000116285 115 1669 -4.54130810709955 -4.87357231836982 TRUE
ENSG00000162460 3 31 -4.05126372834704 -4.38352793961731 TRUE
I used read.table and write.table but I am still not very satisfied with
the results. Here is what I did:
expFC <- read.table("test.txt", header = TRUE, sep = "\t")
expFC.TRUE <- expFC[expFC[dim(expFC)[2]] == "TRUE", ]
write.table(expFC.TRUE, file = "test_TRUE.txt", row.names = FALSE, sep = "\t")
Result:
"GeneNames" "value1" "value2" "log2.Fold_change." "log2.Fold_change..normalized" "Signature.abs.log2.Fold_change..normalized....4."
"ENSG00000209350" 4 35 -3.81131293562629 -4.14357714689656 TRUE
"ENSG00000116285" 115 1669 -4.54130810709955 -4.87357231836982 TRUE
"ENSG00000162460" 3 31 -4.05126372834704 -4.38352793961731 TRUE
As you can see, there are two problems:
1. The headers were altered. All the special characters were converted to
dots (.).
2. The gene names (first column) were quoted, although they were not quoted
in the original file.
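Both points can probably be handled while staying with read.table and
write.table. The following is only a sketch, assuming the input looks like
the sample above (tab-separated, header on the first line, TRUE/FALSE in the
last column). check.names = FALSE stops read.table from rewriting the header
names, the header line is copied from the input as-is, and quote = FALSE
leaves the gene names unquoted. The function name filterTrue is made up for
illustration.

```r
# Hypothetical helper; the name and argument names are illustrative only.
filterTrue <- function(infile, outfile) {
  # check.names = FALSE keeps names such as "log2(Fold_change)" unchanged
  d <- read.table(infile, header = TRUE, sep = "\t", check.names = FALSE,
                  stringsAsFactors = FALSE)
  # keep the rows whose last column is TRUE (NA counts as not TRUE)
  keep <- d[d[[ncol(d)]] %in% TRUE, ]
  # copy the header line exactly as it appears in the input file
  writeLines(readLines(infile, n = 1), outfile)
  # append the data rows; quote = FALSE leaves the gene names unquoted
  write.table(keep, file = outfile, append = TRUE, sep = "\t",
              quote = FALSE, row.names = FALSE, col.names = FALSE)
  invisible(keep)
}

# e.g.
# filterTrue("test.txt", "test_TRUE.txt")
```

Note that the numeric columns still pass through R's number formatting on
the way out, so this is not guaranteed to be a byte-for-byte copy of the
selected rows.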
This will copy input lines matching pattern as well as the header to
the output verbatim preserving all quotes, spacing, etc.
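myFilter <- function(infile, outfile, pattern = "TRUE$") {
  L <- readLines(infile)
  # write the header; sep = "" stops cat() from adding a space before "\n"
  cat(L[1], "\n", file = outfile, sep = "")
  # keep the remaining lines that match the pattern
  L2 <- grep(pattern, L[-1], value = TRUE)
  for (el in L2) cat(el, "\n", file = outfile, append = TRUE, sep = "")
}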
# e.g.
myFilter("infile.txt", "outfile.txt")
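The same idea can also be written without the loop. This is only a sketch,
assuming the whole file fits in memory; the name myFilter2 is made up here.
It collects the header and the matching lines and hands them to writeLines
in one call, which writes each element followed by a newline and nothing
else.

```r
# Loop-free variant; myFilter2 is a hypothetical name, not from the thread.
myFilter2 <- function(infile, outfile, pattern = "TRUE$") {
  L <- readLines(infile)
  # TRUE for every non-header line that matches the pattern
  keep <- grepl(pattern, L[-1])
  # header first, then the matching lines, each written unchanged
  writeLines(c(L[1], L[-1][keep]), outfile)
  # return the number of data lines kept, invisibly
  invisible(sum(keep))
}

# e.g.
# myFilter2("infile.txt", "outfile.txt")
```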
I love this the best! Even though it is not as simple as the bash one-liner
(system( "cat infile.txt | grep -v FALSE > outfile.txt", wait=TRUE )), I
am very happy to learn that R has functions similar to those in bash.
If there is a document or a list of all such functions, that would be
excellent.
Thanks Gabor,
Check out these help files:
help.search(keyword = "character", package = "base")
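One such function is grep itself. As a rough sketch of how close it comes to
the bash one-liner quoted above: its invert argument plays the role of the
-v flag, and fixed = TRUE treats the pattern as a literal string. The
function name dropMatching is made up for illustration, and the sketch
assumes the whole file fits in memory.

```r
# Hypothetical helper mirroring "grep -v FALSE"; the name is illustrative.
dropMatching <- function(infile, outfile, pattern = "FALSE") {
  L <- readLines(infile)
  # invert = TRUE returns the lines that do NOT contain the pattern
  kept <- grep(pattern, L, value = TRUE, invert = TRUE, fixed = TRUE)
  writeLines(kept, outfile)
  invisible(length(kept))
}

# e.g.
# dropMatching("infile.txt", "outfile.txt")
```

Like grep -v, this drops every line that contains the pattern anywhere,
so a header line containing FALSE would be dropped as well.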
Great! Thanks so much Gabor.
D.
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help