Re: [R] Probing a protein sequence alignment in R

Ulrich Bodenhofer Wed, 25 Nov 2015 05:47:45 -0800

Hi Debra,

As already noted by Boris, the right packages can be found in Bioconductor, 
namely Biostrings (for handling sets of sequences and pairwise alignments) and 
msa (for multiple alignments; a package I am maintaining). Your question does 
not yet clearly indicate to me whether pairwise or multiple sequence alignments 
are the right choice. If I understand you correctly, this is not the point 
anyway, since you want a solution how to find out in which positions two 
aligned sequences differ, right? The following code snippet demonstrates by 
means of a simple example with two DNA strings how to check where two strings 
in a pairwise alignment differ:


   library(Biostrings)
   seq1 <- DNAString("AGCGAGCGA")
   seq2 <- DNAString("AGGATCGA")
   aln <- pairwiseAlignment(seq1, seq2, type="global",
   substitutionMatrix=nucleotideSubstitutionMatrix(4, -1))
   which(strsplit(as.character(pattern(aln)), split="")[[1]] !=
   strsplit(as.character(subject(aln)), split="")[[1]])

Maybe there are more elegant solutions than this one. This is just aquick approach using basic R. Note that the positions are relative tothe positions in the alignment, which need not be the same as thepositions in your original "non-mutant" sequence if the alignmentalgorithm has inserted some gaps in this sequence. If you use multiplealignments, the approach is basically the same:


   library(msa)
   seqs <- DNAStringSet(c(seq1="AGCGAGCGA", seq2="AGGATCGA",
   seq3="AGCGAGC"))
   aln <- msaMuscle(seqs, order="input")
   ## seq1 against seq2:
   which(strsplit(as.character(aln)[1], split="")[[1]] !=
   strsplit(as.character(aln)[2], split="")[[1]])
   ## seq1 against seq3:
   which(strsplit(as.character(aln)[1], split="")[[1]] !=
   strsplit(as.character(aln)[3], split="")[[1]])

I hope this helps.

Best regards,
Ulrich

P.S.: I fully agree with Bert that the Bioconductor support forum wouldbe a good place to discuss this.



On Nov 24, 2015, at 12:04 PM, debra ragland via R-help<r-help@r-project.org>  
wrote:

I have 15 protein sequences of 99 amino acids each. After doing some looking around I have found 
that there are several ways you can read sequences into R and do pairwise or multiple alignments. 
I, however, do not know how to probe changes at specific positions. For instance, I would like to 
know the best way to align a standard sequence with one(1) or several mutant sequences and probe 
each amino acid position that does not match the standard sequence. In other words seq1 = 
"standard amino acid seq" and seq2 = "mutant seq", align these 2 and then have 
a way to ask R to report whether there is a change at position 10, or 11, or 12 and so on such that 
R reports(for example) TRUE or FALSE for this question. Where all the sequences that have a 
reported TRUE for a change at position X can be grouped against those that do not have a change at 
this position.I'm not even sure that R is the best way to do this, but it's the only language I'm 
somewhat familiar with.I hope this make!

s sense.

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Probing a protein sequence alignment in R

Reply via email to