I think I am making this problem harder than it has to be, so I keep getting
stuck on what might be a trivial problem.
I have used the seqinr package to load a protein sequence alignment containing
15 protein sequences:
> library(seqinr)
> x = read.alignment("proteins.fasta", format = "fasta", forceToLower = FALSE)
This automatically loads a list of 4 elements, including the sequences and other
information.
I store the sequences in a new object:
> mylist = x$seq
which returns a character vector of 15 strings.
I have found that if I split the long character strings into individual
characters, it is easy to use lapply to loop over this list. So I use strsplit:
> list.2 = strsplit(mylist, split = NULL)
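A minimal sketch of this step, using a hypothetical toy vector of three short sequences in place of the real alignment (the sequences are invented for illustration). Stacking the split strings with do.call(rbind, ...) is one optional alternative to the list: it gives a character matrix with one row per sequence and one column per alignment position.

```r
# Hypothetical toy data standing in for x$seq (not the real alignment)
mylist <- c(seq1 = "MKTLLVAG",
            seq2 = "MKTALVAG",
            seq3 = "MRTLLVSG")

# split = NULL (a zero-length split) breaks each string into single characters
list.2 <- strsplit(mylist, split = NULL)

# In an alignment every sequence should have the same length
stopifnot(length(unique(lengths(list.2))) == 1)

# Optional: one row per sequence, one column per position
aln <- do.call(rbind, list.2)
dim(aln)    # sequences x positions
aln[, 2]    # residue at position 2 in every sequence
```

With the matrix form, a single column index replaces the lapply call when looking at one position.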
From this list I can determine which proteins have changes at certain
positions by using:
> lapply(list.2, "[", 10) == "L"
This returns a logical TRUE/FALSE vector for those elements of the list that
do/do not have the letter L at position 10.
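Because lapply() returns a list, vapply() is a slightly more direct way to get a plain character vector for one position before comparing it. This sketch uses hypothetical toy sequences, with position 4 standing in for position 10; which() then reports the indices of the sequences that lack the L.

```r
# Hypothetical toy sequences (invented for illustration)
mylist <- c(seq1 = "MKTLLVAG", seq2 = "MKTALVAG", seq3 = "MRTLLVSG")
list.2 <- strsplit(mylist, split = NULL)

pos <- 4  # stands in for position 10 in the real alignment

# vapply() returns a character vector: one residue per sequence
res_at_pos <- vapply(list.2, `[`, character(1), pos)

has_L <- res_at_pos == "L"   # logical vector, one entry per sequence
has_L
which(!has_L)                # sequences WITHOUT an L at this position
```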
Because each of the protein sequences contains 99 amino acids, I want to
automate this process so that I do not have to compare positions one at a
time. Most of the changes occur between positions 10 and 95. I have a standard
character vector of amino acids that I wish to use as the reference when
looping through the list.
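One way to automate this is to index each split sequence with the whole range of positions at once and compare it element-wise with the matching slice of the reference vector. A minimal sketch, assuming a hypothetical reference vector ref with one amino acid per alignment position; the toy data and the range 2:7 stand in for the real standard vector and positions 10 to 95.

```r
# Hypothetical toy data; replace with the real sequences and standard vector
mylist <- c(seq1 = "MKTLLVAG", seq2 = "MKTALVAG", seq3 = "MRTLLVSG")
list.2 <- strsplit(mylist, split = NULL)
ref    <- strsplit("MKTLLVAG", split = NULL)[[1]]  # reference residues

positions <- 2:7  # stands in for 10:95

# For each sequence: TRUE where it matches the reference at that position
matches <- lapply(list.2, function(s) {
  m <- s[positions] == ref[positions]
  names(m) <- positions   # label each result with its alignment position
  m
})

matches[[2]]   # second sequence: FALSE marks a change from the reference
```

The result is a list of logical vectors, one per sequence, each labelled by alignment position.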
Should I combine everything (the standard amino-acid vector and the list of
protein sequences) into one list, or is it better to keep them separate for
this comparison? I'm not sure what form the output should take, since I need to
use it in another statistical test. Would a list of logical vectors be the most
suitable output to return?
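One possible answer, offered only as a sketch: keep the reference separate and pass it in as an argument, and return a logical matrix (sequences in rows, positions in columns) rather than a list, because base functions such as colSums() and rowSums() then summarise it directly for a follow-up test. The helper name compare_to_ref() and the toy data are hypothetical.

```r
# Hypothetical helper: compare every sequence to a reference over a range of
# positions and return a logical matrix (TRUE = differs from the reference)
compare_to_ref <- function(seqs, ref, positions) {
  aln <- do.call(rbind, strsplit(seqs, split = NULL))  # sequences x positions
  # sweep() compares each row (sequence) with the reference slice
  diffs <- sweep(aln[, positions, drop = FALSE], 2, ref[positions], `!=`)
  colnames(diffs) <- positions
  diffs
}

# Hypothetical toy data
seqs <- c(seq1 = "MKTLLVAG", seq2 = "MKTALVAG", seq3 = "MRTLLVSG")
ref  <- strsplit("MKTLLVAG", split = NULL)[[1]]

d <- compare_to_ref(seqs, ref, 2:7)
colSums(d)   # number of sequences differing from ref at each position
rowSums(d)   # number of differences per sequence
```

A list of logical vectors holds the same information, but the matrix keeps positions aligned across sequences, which usually makes per-position counts or contingency tables simpler to build.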
______________________________________________
R-help mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.