Hi Debra,
As already noted by Boris, the right packages can be found in Bioconductor,
namely Biostrings (for handling sets of sequences and pairwise alignments) and
msa (for multiple alignments; a package I am maintaining). Your question does
not yet clearly indicate to me whether pairwise or multiple sequence alignments
are the right choice. If I understand you correctly, this is not the point
anyway, since you want a solution how to find out in which positions two
aligned sequences differ, right? The following code snippet demonstrates by
means of a simple example with two DNA strings how to check where two strings
in a pairwise alignment differ:
library(Biostrings)
seq1 <- DNAString("AGCGAGCGA")
seq2 <- DNAString("AGGATCGA")
aln <- pairwiseAlignment(seq1, seq2, type="global",
substitutionMatrix=nucleotideSubstitutionMatrix(4, -1))
which(strsplit(as.character(pattern(aln)), split="")[[1]] !=
strsplit(as.character(subject(aln)), split="")[[1]])
Maybe there are more elegant solutions than this one. This is just a
quick approach using basic R. Note that the positions are relative to
the positions in the alignment, which need not be the same as the
positions in your original "non-mutant" sequence if the alignment
algorithm has inserted some gaps in this sequence. If you use multiple
alignments, the approach is basically the same:
library(msa)
seqs <- DNAStringSet(c(seq1="AGCGAGCGA", seq2="AGGATCGA",
seq3="AGCGAGC"))
aln <- msaMuscle(seqs, order="input")
## seq1 against seq2:
which(strsplit(as.character(aln)[1], split="")[[1]] !=
strsplit(as.character(aln)[2], split="")[[1]])
## seq1 against seq3:
which(strsplit(as.character(aln)[1], split="")[[1]] !=
strsplit(as.character(aln)[3], split="")[[1]])
I hope this helps.
Best regards,
Ulrich
P.S.: I fully agree with Bert that the Bioconductor support forum would
be a good place to discuss this.
On Nov 24, 2015, at 12:04 PM, debra ragland via R-help<r-help@r-project.org>
wrote:
I have 15 protein sequences of 99 amino acids each. After doing some looking around I have found
that there are several ways you can read sequences into R and do pairwise or multiple alignments.
I, however, do not know how to probe changes at specific positions. For instance, I would like to
know the best way to align a standard sequence with one(1) or several mutant sequences and probe
each amino acid position that does not match the standard sequence. In other words seq1 =
"standard amino acid seq" and seq2 = "mutant seq", align these 2 and then have
a way to ask R to report whether there is a change at position 10, or 11, or 12 and so on such that
R reports(for example) TRUE or FALSE for this question. Where all the sequences that have a
reported TRUE for a change at position X can be grouped against those that do not have a change at
this position.I'm not even sure that R is the best way to do this, but it's the only language I'm
somewhat familiar with.I hope this make!
s sense.
______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.