Hi Debra,

As already noted by Boris, the right packages can be found in Bioconductor, 
namely Biostrings (for handling sets of sequences and pairwise alignments) and 
msa (for multiple alignments; a package I am maintaining). Your question does 
not yet clearly indicate to me whether pairwise or multiple sequence alignments 
are the right choice. If I understand you correctly, this is not the point 
anyway, since you want a solution how to find out in which positions two 
aligned sequences differ, right? The following code snippet demonstrates by 
means of a simple example with two DNA strings how to check where two strings 
in a pairwise alignment differ:

   library(Biostrings)
   seq1 <- DNAString("AGCGAGCGA")
   seq2 <- DNAString("AGGATCGA")
   aln <- pairwiseAlignment(seq1, seq2, type="global",
   substitutionMatrix=nucleotideSubstitutionMatrix(4, -1))
   which(strsplit(as.character(pattern(aln)), split="")[[1]] !=
   strsplit(as.character(subject(aln)), split="")[[1]])

Maybe there are more elegant solutions than this one. This is just a quick approach using basic R. Note that the positions are relative to the positions in the alignment, which need not be the same as the positions in your original "non-mutant" sequence if the alignment algorithm has inserted some gaps in this sequence. If you use multiple alignments, the approach is basically the same:

   library(msa)
   seqs <- DNAStringSet(c(seq1="AGCGAGCGA", seq2="AGGATCGA",
   seq3="AGCGAGC"))
   aln <- msaMuscle(seqs, order="input")
   ## seq1 against seq2:
   which(strsplit(as.character(aln)[1], split="")[[1]] !=
   strsplit(as.character(aln)[2], split="")[[1]])
   ## seq1 against seq3:
   which(strsplit(as.character(aln)[1], split="")[[1]] !=
   strsplit(as.character(aln)[3], split="")[[1]])

I hope this helps.

Best regards,
Ulrich

P.S.: I fully agree with Bert that the Bioconductor support forum would be a good place to discuss this.


On Nov 24, 2015, at 12:04 PM, debra ragland via R-help<r-help@r-project.org>  
wrote:

I have 15 protein sequences of 99 amino acids each. After doing some looking around I have found 
that there are several ways you can read sequences into R and do pairwise or multiple alignments. 
I, however, do not know how to probe changes at specific positions. For instance, I would like to 
know the best way to align a standard sequence with one(1) or several mutant sequences and probe 
each amino acid position that does not match the standard sequence. In other words seq1 = 
"standard amino acid seq" and seq2 = "mutant seq", align these 2 and then have 
a way to ask R to report whether there is a change at position 10, or 11, or 12 and so on such that 
R reports(for example) TRUE or FALSE for this question. Where all the sequences that have a 
reported TRUE for a change at position X can be grouped against those that do not have a change at 
this position.I'm not even sure that R is the best way to do this, but it's the only language I'm 
somewhat familiar with.I hope this make!
s sense.

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Reply via email to