On 21-May-09 16:56:23, retama wrote:
> Patrick Burns kindly provided an article about this issue called
> 'The R Inferno'. However, I will expand my question a little,
> because I think it was not clear, and improving the code will make
> it more understandable to other users reading these messages
> when I paste it :)
>
> In my example, I have a data frame with several hundred DNA
> sequences in the column data$sequences (each value is a long string
> written in an alphabet of four characters: A, C, T and G).
> I'm trying to compute the number of Gs plus Cs over the total,
> (G+C)/(A+T+C+G), for each sequence. For example, data$sequences[1]
> is something like AATTCCCGGGGGG but a little longer, and its G+C
> content is 0.69. I need to compute a vector with all G+C contents
> (in my example, data$GCsequence, in which data$GCsequence[1] is
> 0.69).
>
> So the question was whether using a loop and combining values with
> c() or cbind(), or using logical subscripts, is OK or not, and which
> approach should give better results in terms of efficiency (my
> script runs really slowly).
>
> Thank you,
> Retama

Perhaps the following could be the basis of your code for the
bigger problem:

  S <- unlist(strsplit("AATTCCCGGGGGG",""))
  S
  # [1] "A" "A" "T" "T" "C" "C" "C" "G" "G" "G" "G" "G" "G"
  sum((S=="C")|(S=="G"))
  # [1] 9
  sum((S=="C")|(S=="G"))/length(S)
  # [1] 0.6923077

You could build a function on those lines, to evaluate what you
want for any given string; and then sapply() it to the elements
(which are the separate character strings) of data$sequences
(which is presumably a vector of character strings). Use sapply()
rather than apply() here, since apply() expects a matrix or array
and will not accept a plain vector.

Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <ted.hard...@manchester.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 21-May-09                                       Time: 18:18:24
------------------------------ XFMail ------------------------------
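
As a rough sketch of the function suggested above: the name gc_content() and the small example data frame are made up here for illustration, and the code assumes data$sequences holds upper-case character strings (not a factor). It uses vapply(), a stricter relative of sapply(), so the result is always a plain numeric vector and nothing has to be grown with c() inside a loop.

```r
# Fraction of G and C characters in a single sequence string.
# gc_content is a hypothetical name, not part of any package.
gc_content <- function(s) {
  S <- unlist(strsplit(s, ""))          # split into single characters
  sum(S == "C" | S == "G") / length(S)  # count of G or C over total length
}

# Hypothetical stand-in for the real data frame from the question.
data <- data.frame(
  sequences = c("AATTCCCGGGGGG", "ATATATAT", "GGCC"),
  stringsAsFactors = FALSE
)

# Call gc_content() once per string; numeric(1) declares that each
# call returns exactly one number.
data$GCsequence <- vapply(data$sequences, gc_content, numeric(1),
                          USE.NAMES = FALSE)

round(data$GCsequence, 2)
# [1] 0.69 0.00 1.00
```

If data$sequences turns out to be a factor, wrapping it in as.character() before the vapply() call should be enough. Lower-case or ambiguous bases (such as N) would need extra handling that this sketch does not attempt.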