> > 1. One processor at each of the compute nodes
> > 2. Two processors (on one motherboard) at each of the compute nodes
> > 3. Two processors (each a dual-core processor, for a total of 4 cores on one motherboard) at each of the compute nodes
> > 4. Four processors (on one motherboard) at each of the compute nodes.
not considering a 4x2 configuration?

> > Initially, we are planning to use a Gigabit ethernet switch and 1 GB of
> > RAM at each node.

that seems like an odd choice. it's not much RAM, and gigabit is extremely
slow (relative to the alternatives, or in comparison to on-board memory
access.)

> I've heard many times that memory throughput is extremely important in
> CFD, and that using 1 CPU/1 core per node (or 2 single-core Opterons with
> independent memory channels) is in some cases better than any sharing of
> memory bus(es).

I've heard that too - it's a shame someone doesn't simply use the profiling
registers to look at cache hit-rates on these codes... but I'd be somewhat
surprised if modern CFD codes were entirely memory-bandwidth-dominated -
that is, if they made no use of the cache at all. my very general
observation is that it's becoming unusual to encounter code with as "flat"
a memory reference pattern as Stream - just iterating sequentially over
whole swaths of memory. advances such as mesh adaptation tend to make
memory references less sequential (more random, but also touching fewer
bytes overall, and thus possibly more cache-friendly - there's a small
sketch of the contrast at the end of this message.) of course, I'm just an
armchair CFD'er ;)

in short, it's important not to disregard memory bandwidth, but 6.4 GB/s is
quite a bit, and may not be a problem on a dual-core system where each core
has 1 MB of L2 to itself - especially since 1 GB/system implies that the
models are not huge in the first place.

that said, I find that CFDers tend not to aspire to running on large
numbers of processors, so a cluster of 4x2 machines (aiming to run mostly
<= 8p jobs on single nodes) might be very nice. there are nice side-effects
to having fatter nodes, especially if your workload is not embarrassingly
parallel.

(we should have terminology to describe other levels of parallel coupling -
"mortifyingly parallel", for instance. I think "shamefully parallel" is a
great description of people who gratuitously wrap a serial job in an MPI
wrapper, and how about "immodestly parallel" for coupled jobs that scale
well, but still somewhat sub-linearly?)
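for illustration, here's a toy sketch (mine, not from any real CFD code -
the array sizes, the shuffled index array, and the compile line are just
assumptions for the example) contrasting a Stream-triad-style sequential
sweep with the same arithmetic driven through an indirection array, which
is roughly how an unstructured or adapted mesh touches its data:

/* toy access-pattern sketch (illustrative only, not from a real CFD code)
 * build: gcc -O2 sweep.c -o sweep   (older glibc may also need -lrt)
 */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N (1 << 23)                 /* 8M doubles = 64 MB per array */

static double seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + 1e-9 * ts.tv_nsec;
}

int main(void)
{
    double *a = malloc(N * sizeof *a);
    double *b = malloc(N * sizeof *b);
    double *c = malloc(N * sizeof *c);
    size_t *idx = malloc(N * sizeof *idx);
    if (!a || !b || !c || !idx) return 1;

    for (size_t i = 0; i < N; i++) {
        a[i] = 1.0; b[i] = 2.0; c[i] = 0.0;
        idx[i] = i;
    }
    /* crude shuffle of the index array, to mimic an irregular mesh ordering */
    srand(42);
    for (size_t i = N - 1; i > 0; i--) {
        size_t j = rand() % (i + 1);
        size_t t = idx[i]; idx[i] = idx[j]; idx[j] = t;
    }

    /* "flat", Stream-like sweep: purely sequential, bandwidth-bound */
    double t0 = seconds();
    for (size_t i = 0; i < N; i++)
        c[i] = a[i] + 3.0 * b[i];
    double t_seq = seconds() - t0;

    /* same arithmetic through the indirection array: latency/cache-bound */
    t0 = seconds();
    for (size_t i = 0; i < N; i++)
        c[i] = a[idx[i]] + 3.0 * b[idx[i]];
    double t_gather = seconds() - t0;

    /* STREAM-style byte count: 3 arrays x 8 bytes per iteration */
    printf("sequential: %6.2f GB/s (%.3f s)\n",
           3.0 * 8.0 * N / t_seq / 1e9, t_seq);
    printf("gather:     %.3f s\n", t_gather);
    printf("checksum:   %g\n", c[0] + c[N - 1]);   /* keep the loops live */

    free(a); free(b); free(c); free(idx);
    return 0;
}

the point isn't the absolute numbers - on a 6.4 GB/s box the sequential
sweep should land somewhere near the machine's Stream figure, while the
gathered version spends much of its time waiting on latency, and a real
adapted mesh sits somewhere in between (and, if the working set shrinks
enough, back in cache).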