Hello! I need your help. I am trying to calculate the pairwise differences between sequences from several FASTA files. For each of my DNA alignments (FASTA files), I would like to calculate the pairwise differences and then:

1. combine the data from all files into one data set and plot one histogram (mismatch distribution);
2. calculate, for each number of differences, the mean over all files, and again make a mismatch distribution plot.
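To make the two goals concrete, here is a minimal base-R sketch. It assumes the pairwise differences of each file are already available as one numeric vector per file. The toy values and the names diffs_per_file, pooled, k_values and mean_freq are made up for illustration. It also reads goal 2 as "the mean relative frequency of each number of differences across files", which is an interpretation rather than something the goals state explicitly.

```r
# Toy input (made-up values): pairwise differences, one vector per FASTA file.
diffs_per_file <- list(
  file1 = c(0, 1, 1, 2, 3, 1),
  file2 = c(1, 2, 2, 0, 1, 1)
)

# Goal 1: pool the differences of all files into a single numeric vector.
pooled <- unlist(diffs_per_file, use.names = FALSE)

# Goal 2: relative frequency of k = 0, 1, ..., max(pooled) differences in each
# file (common factor levels keep the rows aligned), then the mean per k.
k_values <- 0:max(pooled)
freq_per_file <- sapply(diffs_per_file, function(d) {
  as.numeric(table(factor(d, levels = k_values))) / length(d)
})
mean_freq <- rowMeans(freq_per_file)

# Pooled mismatch distribution: one bar per integer number of differences.
hist(pooled, breaks = seq(-0.5, max(pooled) + 0.5, by = 1), prob = TRUE,
     main = "Pooled mismatch distribution", xlab = "Pairwise differences")

# Averaged mismatch distribution across files.
plot(k_values, mean_freq, type = "b",
     main = "Mean mismatch distribution", xlab = "Pairwise differences",
     ylab = "Mean relative frequency")
```

Using relative frequencies rather than raw counts in goal 2 keeps alignments with many sequences (and therefore many pairs) from dominating the average.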
Here is the script that I wrote:

    # Load the packages used below
    library("pegas")
    library("seqinr")
    library("ggplot2")

    # All files whose name contains "fas"
    Files <- list.files(pattern = "fas")
    nb_files <- length(Files)

    # For each alignment: read it, compute the pairwise differences,
    # then add them to the combined data
    for (i in 1:nb_files) {
        Dist <- as.numeric(dist.gene(read.dna(Files[i], "fasta"),
                                     method = "pairwise",
                                     pairwise.deletion = FALSE,
                                     variance = FALSE))
        Data <- merge(Data, Dist, by = c("x"), all = TRUE)
    }

    # Histogram of all pairwise differences with a density curve
    hist(Data, prob = TRUE)
    lines(density(Data), col = "blue", lwd = 2)

However, the script does not work, and I do not know what to change to make it work.

Thanks in advance for your help.

Myriam

--
Myriam Croze, PhD
Postdoctoral researcher
Division of EcoScience, Ewha Womans University
Seoul, South Korea
Email: myriam.croz...@gmail.com

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
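For comparison, a corrected version of the loop might look like the sketch below. The main changes: the accumulator (here all_diffs) is created before the loop, and each file's differences are appended with c() instead of merge(), which joins data frames on a key column and is not meant for concatenating numeric vectors; hist() and density() also need a plain numeric vector, not a data frame. So that the sketch runs without the FASTA files, the alignments are a hypothetical in-memory list of aligned sequence strings, and pairwise_diffs() is a hypothetical helper that simply counts mismatching positions (gaps and ambiguity codes are treated as ordinary characters, unlike the distance functions in ape).

```r
# Hypothetical helper: number of mismatching positions for every pair of
# aligned sequences (all strings must have the same length, at least 2 strings).
pairwise_diffs <- function(seqs) {
  chars <- strsplit(toupper(seqs), "")
  combn(length(chars), 2, function(p) sum(chars[[p[1]]] != chars[[p[2]]]))
}

# Hypothetical stand-in for the FASTA files: one character vector per alignment.
alignments <- list(
  file1 = c("ACGTAC", "ACGTTC", "ACTTTC"),
  file2 = c("AAGTAC", "ACGTAC")
)

all_diffs <- numeric(0)  # create the accumulator before the loop
for (aln in alignments) {
  # Append this alignment's differences to the pooled vector.
  all_diffs <- c(all_diffs, pairwise_diffs(aln))
}

# Mismatch distribution of the pooled differences (one bar per integer value).
hist(all_diffs, breaks = seq(-0.5, max(all_diffs) + 0.5, by = 1), prob = TRUE,
     xlab = "Pairwise differences", main = "Mismatch distribution")
lines(density(all_diffs), col = "blue", lwd = 2)
```

With the real files, list.files(pattern = "\\.fas$") would restrict the search to names ending in ".fas", and the loop body would read each file and compute its distances instead of calling pairwise_diffs(), for example with as.numeric(dist.dna(read.dna(f, format = "fasta"), model = "N")) from ape (attached by pegas), where model = "N" counts the number of differing sites.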