I'm about to spend ~$20K on a new cluster that will be a proof of concept for GPU-based computing in one of the research groups here.
A GPU cluster differs from a traditional HPC cluster in several ways:

1) CPU speed and core count matter less, because most of the computing will be done inside the GPU.

2) Serious GPU boards are large enough that they don't fit easily into standard 1U pizza boxes, and they draw more power than the stock power supplies in such boxes can provide. I'm not familiar with the chassis that should be used instead in a GPU cluster.

3) Ideally, I'd like to put more than one GPU card in each compute node, but then I hit the issues in #2 even harder.

4) Assuming that a GPU can't be time-shared, I'll have to set up my batch engine to treat each GPU as a non-sharable resource (see the sketch after my signature). That means I'll only be able to run as many jobs on a compute node as the node has GPUs. It also means it would be wasteful to put CPUs in a compute node with more cores than the node has GPUs, assuming the jobs do nothing in parallel on the CPUs, only on the GPUs. Even if GPUs can be time-shared, the expense of copying between main memory and GPU memory means that sharing a GPU among several processes will degrade performance.

Are there any other issues I'm leaving out?

Cordially,

-- 
Jon Forrest
Research Computing Support
College of Chemistry
173 Tan Hall
University of California Berkeley
Berkeley, CA 94720-1460
510-643-1032
jlforr...@berkeley.edu
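P.S. For concreteness, here is a rough sketch of how a GPU could be modeled as a countable, non-sharable resource. I'm assuming Sun Grid Engine purely as an example (the post doesn't say which batch engine we'd use; Torque/Moab and others have equivalent consumable-resource mechanisms). Untested config, names are illustrative:

    # 1) Add a consumable complex named "gpu" (via qconf -mc):
    #name  shortcut  type  relop  requestable  consumable  default  urgency
    gpu    gpu       INT   <=     YES          YES         0        0

    # 2) On each execution host, declare how many GPUs it holds
    #    (via qconf -me <hostname>):
    complex_values   gpu=2

    # 3) Jobs request a GPU at submission time; the scheduler decrements
    #    the count, so a node with 2 GPUs never runs more than 2 such
    #    jobs at once:
    qsub -l gpu=1 my_gpu_job.sh

Note that the scheduler only counts GPUs here; it doesn't tell the job which GPU it was given, so the job still has to pick a free device itself (see the P.P.S.).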
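P.P.S. On the time-sharing question: NVIDIA's driver supports an "exclusive" compute mode (set with nvidia-smi -c 1; the exact flag syntax varies by driver version), in which only one process at a time can create a context on a given GPU. A job can then simply probe devices until it acquires a free one, which pairs naturally with the counting scheme above. A minimal, untested sketch of that probe in CUDA C, with error handling omitted:

    /* Probe for a free GPU, assuming the driver's compute mode has been
     * set to exclusive (e.g. "nvidia-smi -c 1"). With exclusive mode on,
     * context creation fails on a busy device, so the first device where
     * it succeeds is ours. */
    #include <stdio.h>
    #include <cuda_runtime.h>

    int main(void)
    {
        int n = 0, dev;
        cudaGetDeviceCount(&n);
        for (dev = 0; dev < n; dev++) {
            cudaSetDevice(dev);
            if (cudaFree(0) == cudaSuccess) {  /* forces context creation */
                printf("acquired GPU %d\n", dev);
                /* ... real work: cudaMalloc, kernel launches, etc. ... */
                return 0;
            }
            cudaThreadExit();  /* discard the failed context, try the next */
        }
        fprintf(stderr, "no free GPU on this node\n");
        return 1;
    }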