Hi Bogdan --

On 04/14/2010 08:19 PM, Bogdan Tanasa wrote:
> Dear all,
>
> please could you suggest any R functions or packages (or external
> programs), that

likely you'll have more luck on the Bioconductor mailing list,

  http://bioconductor.org/docs/mailList.html

but...

> a. take as input a large number (> 10 000) of short 20-30 nt
> sequences, and do sequence assembly, to reconstruct larger (extended)
> 30-50 sequences ?

I don't know of any sequence assemblers in R; velvet would be a first
stop third party tool but it sounds like you have some fairly specific
requirements....

> b. take as input a larger number of sequences (100 000 - 1 mil) and
> cluster these sequences in distinct classes based on the sequence
> similarity ?

The Biostrings package has various functions to calculate edit
distance, which might form the input to familiar R clustering
algorithms. See installation instructions at

  http://www.bioconductor.org/packages/release/bioc/html/Biostrings.html

This thread

  https://stat.ethz.ch/pipermail/bioconductor/2010-March/032580.html

might suggest some directions.

Martin

> thanks a lot,
>
> bogdan
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

-- 
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793
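
For what it's worth, here is a minimal sketch of the "edit distance,
then a familiar R clustering algorithm" idea from (b). It is only an
illustration under some assumptions: the six reads are made up, the
cut height of 5 is arbitrary, and base R's adist() stands in for the
Biostrings edit-distance functions so that the example runs without
installing anything. An all-pairs distance matrix grows with the
square of the number of sequences, so this approach as written would
not scale to 100 000 - 1 mil reads without some pre-grouping.

```r
# Toy reads (hypothetical data); real reads would come from a file.
reads <- c("AAAAACCCCCAAAAACCCCC",
           "AAAAACCCCCAAAAACCCCA",
           "AAAAACCACCAAAAACCCCC",
           "GGGGGTTTTTGGGGGTTTTT",
           "GGGGGTTTTTGGGGGTTTTG",
           "GGGGGTTGTTGGGGGTTTTT")

# Pairwise Levenshtein (edit) distances; adist() ships with base R (utils).
d <- as.dist(adist(reads))

# Average-linkage hierarchical clustering on the distance object.
hc <- hclust(d, method = "average")

# Cut the tree at an edit distance of 5 (arbitrary for this toy data)
# to assign each read to a class.
classes <- cutree(hc, h = 5)

# Reads grouped by class.
split(reads, classes)
```

Any function that returns a "dist" object could replace the adist()
line; the hclust() and cutree() steps would stay the same.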