Christopher Samuel wrote:

> I've heard 1TB of RAM quoted by the bioinformatics people
> as the amount of RAM needed to do de-novo reassembly of
> the human genome with Velvet (which is a single threaded
> application).
That would be an expensive thing to do, though; it is much more efficient to paint the new reads onto the existing consensus and then go back and deal with any discrepancies. Anyway, the Velvet memory usage estimator is described here:

http://listserver.ebi.ac.uk/pipermail/velvet-users/2009-July/000474.html

Velvet is for assembling from short reads. Short reads are easy to get in large numbers but are cruddy for fully assembling large genomes de novo, since genomes are full of high-copy DNA longer than the reads. So the de novo sequence would come out in a lot of disconnected chunks which would have to be mapped back onto the reference sequence anyway.

Assuming reads of 100 bp, a genome size of 3000 (in Mbases; the human genome rounded down to 3 billion bases), a hash length of 31, and

  numreads = (genome size / read size) * 20 / 1000000   (reads in millions, 20X over-sequenced)

plugging into that formula gives:

  -109635 + 18977*100 + 86326*3000 + 233353*(3000000000/100)*20/1000000 - 51092*31
  = 399194013 kB

or roughly 381 GB (if I didn't typo anything).

Most of that is in the hashes, where each base from a read is included in 31 different hashes, once in each of positions 1->31 (the hash is calculated on a sliding window that increments by 1, not by 31). Effectively the hashing takes the 60 Gbases of raw sequence and expands it 31X.

Regards,

David Mathog
mat...@caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech
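For anyone who wants to rerun the numbers, here is the same arithmetic as a short Python sketch (the coefficients are taken straight from the velvet-users estimator linked above; the function and variable names are just mine):

# Velvet RAM estimate for velvetg, per the velvet-users post linked above.
# The coefficients come from that post; the names here are only illustrative.
def velvet_ram_kb(read_size, genome_mbases, numreads_millions, k):
    """Estimated RAM in kB for the given read length, genome size,
    number of reads (in millions) and hash length k."""
    return (-109635
            + 18977 * read_size
            + 86326 * genome_mbases
            + 233353 * numreads_millions
            - 51092 * k)

read_size = 100        # read length in bases
genome_mbases = 3000   # human genome, rounded down to 3 Gbases
coverage = 20          # 20X over-sequencing
k = 31                 # hash (k-mer) length

# reads in millions = (genome size in bases / read size) * coverage / 1e6
numreads_millions = (genome_mbases * 1_000_000 / read_size) * coverage / 1_000_000  # 600

# Each read contributes read_size - k + 1 = 70 overlapping 31-mers, so an
# interior base ends up in as many as 31 different hash windows.
ram_kb = velvet_ram_kb(read_size, genome_mbases, numreads_millions, k)
print(f"{ram_kb:.0f} kB (~{ram_kb / 1024 / 1024:.0f} GB)")  # 399194013 kB (~381 GB)

Plug in your own read length, genome size, coverage and hash length as needed.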