[R] Removing funny characters from a column of a data frame

Bansal, Vikas Sun, 07 Aug 2011 12:58:31 -0700

Dear all,

The 5th column of my data frame looks like this:


.$.$.$.$.$,$,$...,,,,,.,,.,,...,,,,.,,....,,,T...,,,,,,,,,,,.,,,,,....,,...,,
,..,,....,,,,,...,,,..,,......,,,,,,,....,,,.,,,,....,,...G.,,,,,,,,...,,,,,,.,,
,t.,,c,,.a.,,,.A,,,,....,,,.....,,,,..........,,,,,..,,,.,,,....,,,,,...,,,$....
.,,,,..,,,...,,,,,..,,,,,,.............$..,,,,,,...,,..,,$,...,,,,,,,....,,,,,,.
,,,,......,,,,.,,.......,.....,,,,,,.,,..,,...,,,,,.,......,.......,,....,,,,..,,
,,,,.........,,,,,.....,,,,...,,,.....,,.....,,......,....,,......,.,,..,,,,...,,
H.,,,..,,.....,,,,..,,,,,,,,,^~.^~.^\".^~.^~.^~.^~,^~,^~,^~,"  

I want to keep only A, a, C, c, G, g, T, t, dot and comma in this column.

For example, the first row should become:

.....,,...,,,,,.,,.,,...,,,,.,,....,,,T...,,,,,,,,,,,.,,,,,....,,...,,


Currently I am using this code:

df$V5 <-  apply(df, 1, function(x) 
gsub("\\:|\\$|\\^|!|\\-|1|2|3|4|5|6|7|8|10|~|H", "",x[5]))

This use of gsub looks odd to me. Although the result comes out right, I
want something faster because the data is large. I want something like this:

delete everything except A, a, C, c, G, g, T, t, dot and comma.
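
A minimal sketch of that idea, not a tested solution for the real data: a
negated character class matches every character outside the allowed set, and
gsub() is vectorised over a character vector, so the apply() loop may not be
needed. The column name V5 follows the code above; the toy values are made up
for illustration.

```r
# Toy data frame; V5 mirrors the column name used above, the values are invented
df <- data.frame(V5 = c(".$.$,$,$..T^~.", "H.,,^~,a-1G"),
                 stringsAsFactors = FALSE)

# "[^ACGTacgt.,]" matches any character that is NOT one of
# A, C, G, T, a, c, g, t, dot or comma; inside [] the dot is literal.
# gsub() works on the whole column at once, so no row-wise apply() is used.
df$V5 <- gsub("[^ACGTacgt.,]", "", df$V5)

print(df$V5)
```

On the two toy strings this leaves only the bases, dots and commas. Whether
it is fast enough on the full data would still need to be timed.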

Any suggestions, please?


        
Thanking you,
Warm Regards
Vikas Bansal
Msc Bioinformatics
Kings College London
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
