Re: [Beowulf] GPU Beowulf Clusters

C. Bergström Mon, 01 Feb 2010 11:10:22 -0800

Jon Forrest wrote:

On 1/30/2010 4:31 AM, Micha Feigin wrote:

It is recommended BTW, that you have at least the same amount of system memory
as GPU memory, so with tesla it is 4GB per GPU.


I'm not going to get Teslas, for several reasons:

1) This is a proof of concept cluster. Spending $1200
per graphics card means that the GPUs alone, assuming
2 GPUs, would cost as much as a whole node with
2 consumer-grade cards. (See below)

2) We know that the Fermi cards are coming out
soon. If we were going to spend big bucks
on GPUs, we'd wait for them. But, our funding
runs out before the Fermis will be available.
This is too bad but there's nothing I can do
about it.

See below for comments regarding CPUs and cores.

You use dedicated systems. Either one 1u pizza box for the CPU and a matched 1u
tesla s1070 pizza box which has 4 tesla GPUs


Since my first post I've learned about the Supermicro boxes
that have space for two GPUs

(http://www.supermicro.com/products/system/1U/6016/SYS-6016GT-TF.cfm?GPU=).

This looks like a good way to go for a proof-of-concept cluster. Plus,
since we have to pay $10/U/month at the Data Center, it's a good
way to use space.

The GPU that looks the most promising is the GeForce GTX275.
(http://www.evga.com/products/moreInfo.asp?pn=017-P3-1175-AR)
It has 1792MB of RAM and is only ~$300. I realize that there
are better cards but for this proof-of-concept cluster we
want to get the best bang for the buck. Later, after we've
ported our programs, and have some experience optimizing them,
then we'll consider something better, probably using whatever
the best Fermi-based card is.

The research group that will be purchasing this cluster does
molecular dynamics simulations that usually take 24 hours or more
to complete using quad-core Xeons. We hope to bring down this
time substantially.

It doesn't have a swap in/swap out mechanism, so the way it may timeshare is by
alternating kernels as long as there is enough memory. Shouldn't be done for
HPC (same with CPU by the way due to numa/l2 cache and context switching
issues).


Right. So this means 4 cores should be good enough for 2 GPUs.
I wish somebody made a motherboard that would allow 6-core
AMD Istanbuls, but they don't. Putting two 4-core CPUs on the
motherboard might not be worth the cost. I'm not sure.

The processes will be sharing the pci bus though for communications so you may
prefer to setup the system as 1 job per machine or at least a round robin
scheduler.


This is another reason not to go crazy with lots of cores.
They'll be sitting idle most of the time, unless I also
create queues for normal non-GPU jobs.
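
A minimal sketch of the "one job per machine, else round robin" idea, assuming it means spreading jobs across nodes so that no node gets a second job before every node has one. The job and node names are hypothetical, and a real cluster would leave this to its batch scheduler.

```python
from itertools import cycle


def assign_round_robin(jobs, nodes):
    """Map each job to a node, cycling through the nodes in order.

    With len(jobs) <= len(nodes) this gives one job per machine, so no
    two jobs compete for the same PCI bus.
    """
    if not nodes:
        raise ValueError("at least one node is required")
    node_cycle = cycle(nodes)
    return [(job, next(node_cycle)) for job in jobs]


if __name__ == "__main__":
    # Hypothetical job and node names, for illustration only.
    for job, node in assign_round_robin(["md1", "md2", "md3"], ["node1", "node2"]):
        print(f"{job} -> {node}")
```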

Take note that the s1070 is ~6k$ so you are talking at most two to three
machines here with your budget.


Ha, ha!! ~$6K should get me two compute nodes, complete
with graphics cards.
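
The budget figures traded in this exchange can be checked with a few lines of arithmetic. This is only a sketch using the approximate prices quoted in the thread; the 12-month hosting period and the assumption that each node occupies 1U are mine, not from the thread.

```python
# Approximate prices quoted in the thread (USD).
TESLA_CARD = 1200          # per Tesla card
GTX275_CARD = 300          # per GeForce GTX275
S1070_BOX = 6000           # 1U Tesla S1070 with 4 GPUs
TWO_NODE_BUDGET = 6000     # estimate for two complete nodes with cards
RACK_FEE_PER_U_MONTH = 10  # data center charge

GPUS_PER_NODE = 2


def gpu_cost_per_node(card_price, gpus=GPUS_PER_NODE):
    """Cost of the graphics cards alone for one node."""
    return card_price * gpus


def rack_fee(units, months):
    """Hosting fee for `units` rack units over `months` months."""
    return RACK_FEE_PER_U_MONTH * units * months


if __name__ == "__main__":
    print("Tesla cards per node:  ", gpu_cost_per_node(TESLA_CARD))
    print("GTX275 cards per node: ", gpu_cost_per_node(GTX275_CARD))
    print("Budget per whole node: ", TWO_NODE_BUDGET // 2)
    # Assumption: two 1U nodes hosted for 12 months.
    print("Rack fee, 2U x 12 mo:  ", rack_fee(2, 12))
```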

I appreciate everyone's comments, and I welcome more.

Hi Jon,

I must emphasize what David Mathog said about the importance of the gpu programming model.


My perspective (with hopefully not too much opinion added)

OpenCL vs CUDA - OpenCL is 1/10th as popular, lacks features, is more tedious to write and, in an effort to stay generic, loses the potential to fully exploit the gpu. At one point the performance of the drivers from Nvidia was not equivalent, but I think that's been fixed. (This does not mean all vendors are uniformly doing a good job.)

HMPP and everything else I'm far too biased to offer my comments publicly. (Feel free to email me offlist if curious)

Have you considered sharing access with another research lab that has already purchased something similar? (Some vendors may also be willing to let you run your codes in exchange for feedback.)


I'd not completely disregard the importance of the host processor.

1) Software thread synchronization chews up processor time.

2) Do you already know if your code has enough computational complexity to outweigh the memory access costs?

3) Do you know if the GTX275 has enough vram? Your benchmarks will suffer if you start going to gart and page faulting.

4) I can tell you 100% that not all gpus are created equal when it comes to handling cuda code. I don't have experience with the GTX275, but if you do hit issues I would be curious to hear about them.
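
Points 2 and 3 can be estimated on paper before buying anything. The sketch below is a rough back-of-the-envelope check, not a measurement: the 12 floats per atom, the 80% headroom, and the single-precision storage are all hypothetical, and a real molecular dynamics code also keeps neighbor lists and other buffers that are left out here. Only the 1792MB figure comes from the thread.

```python
GTX275_VRAM_BYTES = 1792 * 1024**2  # 1792MB, as quoted for the GTX275


def footprint_bytes(n_atoms, floats_per_atom=12, bytes_per_float=4):
    """Rough device-memory footprint of per-atom data.

    floats_per_atom is a hypothetical figure (for example position,
    velocity and force vectors plus padding); adjust it for your code.
    """
    return n_atoms * floats_per_atom * bytes_per_float


def fits_in_vram(n_atoms, vram_bytes=GTX275_VRAM_BYTES, headroom=0.8):
    """True if the footprint stays under `headroom` of the card's memory."""
    return footprint_bytes(n_atoms) <= headroom * vram_bytes


def arithmetic_intensity(flops, bytes_moved):
    """Floating-point operations per byte of memory traffic.

    A low ratio suggests the kernel will be limited by memory access
    rather than by computation.
    """
    return flops / bytes_moved


if __name__ == "__main__":
    for n in (1_000_000, 100_000_000):
        print(n, "atoms:", footprint_bytes(n), "bytes, fits:", fits_in_vram(n))
```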


Some questions in return..
Is your code currently C, C++ or Fortran?

Is there any interest in optimizations at the compiler level which could benefit molecular dynamics simulations?



Best,

./Christopher

_______________________________________________
Beowulf mailing list, [email protected] sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf
