On 01/09/2013 12:00 PM, Andrew Holway wrote:
> As its a single thread I doubt that faster memory is going to help you much.
> It's going to suck whatever you do.

Hi
I don't know anything about computational chemistry or its grid/mesh
requirements, but if you look at the makefile, you'll see that the code
is compiled with OpenMP:

CCFLAGS = -O3 -Wno-deprecated -ffor-scope -fopenmp
LFLAGS  = -O3 -fopenmp -static

Hence, it should work multi-threaded.
Whether the use of OpenMP in the code leads to good scaling
is a different story.
There are just a handful of OpenMP pragmas, restricted to
three source files only:

basin_eval-4.6.cxx:   #pragma omp parallel
basin_eval-4.6.cxx:   #pragma omp for
basin_eval-4.6.cxx:   #pragma omp parallel
basin_eval-4.6.cxx:   #pragma omp for
compute_wf-4.6.cxx:   #pragma omp parallel
compute_wf-4.6.cxx:   #pragma omp for
grid_util-4.6.cxx:    #pragma omp parallel
grid_util-4.6.cxx:    #pragma omp for private (ijk, position)

Who knows, if this covers the algorithm's core/expensive loops,
that may do it.
It may be worth testing the scaling on any multicore machine you
already have, before buying a bigger one (a minimal sketch of this
parallel-for pattern is appended after the quoted message below).

Also, the code seems to be all C++
(irregular-grid/adaptive-mesh folks seem to love it).
I couldn't find any good ol' Fortran or bona fide C.

I hope this helps,
Gus Correa

> On 9 Jan 2013, at 17:29, Jörg Saßmannshausen <[email protected]> wrote:
>
>> Dear all,
>>
>> many thanks for the quick reply and all the suggestions.
>>
>> The code we want to use is this one here:
>>
>> http://www.cpfs.mpg.de/~kohout/dgrid.html
>>
>> Feel free to download and dig into the code. I am no expert in Fortran,
>> so I won't be able to help you much if you have specific questions about
>> the code :-(
>> However, my understanding is that it will only run on one core/thread.
>>
>> As for the budget: that is where it gets a bit tricky. The ceiling is
>> 10k GBP. I know that machines with less memory, say 256 GB, are cheaper,
>> so one solution would be to get two of the beast, so we can do two
>> calculations at the same time. If there are enough slots free, we could
>> upgrade to 500 GB once we get another pot of money.
>>
>> I guess I would go for DDR3, simply because it is faster. Waiting 2 weeks
>> for a calculation is no fun, so if we can save a bit of time here (faster
>> RAM) we actually gain quite a bit.
>>
>> I am not convinced by the AMD Bulldozer, to be honest. From what I
>> understand, Sandy Bridge has the faster memory access (higher bandwidth).
>> Is that correct, or am I missing something here?
>>
>> I gather that the idea of just using one CPU is not a good one. So we need
>> to have a dual-CPU machine, which is fine with me.
>>
>> I am wondering about the vSMP / ScaleMP suggestion from Joe. If I am using
>> an InfiniBand network here, would I be able to spread the 'bottlenecks' a
>> bit better? What I am after is this: when I tested the InfiniBand on the
>> new cluster we got, I noticed that if you run a job in parallel between
>> nodes, the same number of cores is marginally faster. At the time I put
>> that down to slightly faster memory access, as there was no bottleneck to
>> the RAM.
>> I am not familiar with vSMP (i.e. I have never used it), but is it possible
>> to aggregate RAM from a number of nodes (say 40) and use it as a large
>> virtual SMP? So one node would be slaving away at the calculations and the
>> other nodes would only be doing memory IO. Is that possible with vSMP?
>> In a related context, how about NUMAScale?
>>
>> The idea of the aggregated SSDs is nice as well. I know some storage
>> vendors use a mixture of RAM and SSD for their metadata (fast access), and
>> that seems to work quite well.
>> So that would be a large swap file / partition, or is there another way to
>> use disc space as RAM? I need to read the NVMalloc paper, I suppose. Is
>> that actually used, or is it just a good idea for which we have a working
>> example here?
>>
>> I don't think there is much disc IO here. There is most certainly no
>> network-bound traffic, as it is a single thread. A fast CPU would be an
>> advantage as well; however, I have the gut feeling the trade-off would be
>> the memory access speed (bandwidth).
>>
>> I have tried to answer the questions raised. Let me know whether there are
>> still some unclear points.
>>
>> Thanks for all your help and suggestions so far. I will need to digest that.
>>
>> All the best from a sunny London
>>
>> Jörg
>>
>> --
>> *************************************************************
>> Jörg Saßmannshausen
>> University College London
>> Department of Chemistry
>> Gordon Street
>> London
>> WC1H 0AJ
>>
>> email: [email protected]
>> web: http://sassy.formativ.net
>>
>> Please avoid sending me Word or PowerPoint attachments.
>> See http://www.gnu.org/philosophy/no-word-attachments.html
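
P.S. For anyone who wants to try the quick scaling test mentioned above, here
is a minimal, self-contained sketch of the "#pragma omp parallel" /
"#pragma omp for" pattern that grep turned up. The file name, loop body and
variable names are made up for illustration; they are not taken from DGrid.

// scaling_sketch.cxx -- toy version of the parallel-for pattern seen in
// basin_eval, compute_wf and grid_util (illustrative only).
// Build:  g++ -O3 -fopenmp scaling_sketch.cxx -o scaling_sketch
// Run:    OMP_NUM_THREADS=1 ./scaling_sketch
//         OMP_NUM_THREADS=8 ./scaling_sketch
#include <cstdio>
#include <cmath>
#include <vector>
#include <omp.h>

int main()
{
    const long n = 50000000;                 // stand-in for a big grid sweep
    std::vector<double> grid(n, 1.0);

    double t0 = omp_get_wtime();

    #pragma omp parallel
    {
        #pragma omp for
        for (long i = 0; i < n; ++i)         // iterations split across threads
            grid[i] = std::sqrt(grid[i] + static_cast<double>(i));
    }

    double t1 = omp_get_wtime();
    std::printf("threads = %d  time = %.3f s\n",
                omp_get_max_threads(), t1 - t0);
    return 0;
}

If the wall-clock time drops roughly in proportion to OMP_NUM_THREADS on a
machine you already have, the pragmas probably cover the expensive loops; if
it barely moves, the single-thread assumption stands and per-core memory
bandwidth matters more than core count.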
