On Mon, Mar 18, 2013 at 1:04 PM, Mark Hahn <h...@mcmaster.ca> wrote:
>> flame-wars? The people in HPC who care about SP gflops are those who
>> understand the mathematics in their algorithms and don't want to waste
>> very precious memory bandwidth by unnecessarily promoting their
>
> I'm not disagreeing, but wonder if you'd mind talking about SP-friendly
> algorithms a bit. I can see it in principle, but always come up short
> when thinking about, eg simulation-type modeling with such low-precision
> numbers. does someone really comb through all the calculations, looking for
> iffy stuff? (iffy stuff can be pretty subtle - ordering and theoretically
> equivalent substitutions. maybe there are compiler-based
> tools that perform this sort of analysis automatically?)
>
> your mention of precious bandwidth is actually relevant to one of the
> earlier threads - that there is an upcoming opportunity for cpu vendors
> to integrate decent (much bigger than cache) amounts of wide storage
> within the package, using 2.5d integration. if there were enough fast
> in-package ram, it would presumably not be worthwhile to drive any
> off-package ram - any speculation on that threshold?
>
> regards, mark hahn.
Mark,

My experience has been with weather and ocean models. In the current weather business, we care about models that run out to about 14 days, so I am not talking about the 100s of years of simulation done in climate. Our next-generation weather and hurricane models are still single precision, with a scattering of double precision in the places where it is required. I don't have an exact answer as to how the computational scientists figure out where double precision is needed, but I know they do (I will go ask for some examples).

In my previous life (15 years ago) I did the same thing myself. I was working with an ocean model, tailoring it to do ocean tide prediction. The model came from a Cray, where all ops were DP to start with. While optimizing and porting (i.e. bug fixing), I found that on non-Cray systems the code ran just fine (within precision) in SP, except for one routine: the calculation of the tidal forcing. Comparisons to real data showed no significant difference between the mixed SP/DP version and the all-DP original. How did I find it? I was a grad student, so trial and error. Thinking back on it now, I probably could have started by thinking about which quantities were small and would have issues with SP precision (see the toy sketch in the P.S. below).

We have been working with GPUs, and now MIC, for the last 5 years to find more cost-effective architectures for our dwindling hardware budgets. We liked GPUs in the beginning because the internal memory bandwidth of a GPU was about 10x that of an Intel/AMD socket. As the years go on, that ratio keeps shrinking, and we get less and less out of the new architectures. Yes, the new stackable memory is something we are looking forward to, but I will wait until I can run on it to determine whether we get our memory bandwidth performance curve back (which I highly doubt). Our memory footprint generally is not that high, so keeping the memory in package can work (as it generally does with GPUs and MICs now). I will wait until I can actually run things before passing judgement.

I know that there are many domains where DP (or more) is going to be needed, and that as we extend our model from weather prediction (< 14 days) to climate prediction (~90-180 days), SP may not be adequate. My point was that we do real HPC, SP performance is something we still care about, and saying HPC == DP is not accurate.

Craig
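
P.S. Since Mark asked what an SP-friendly approach looks like in practice, here is a toy sketch in plain C of the small-quantity issue I meant above. None of this is from our actual codes; the field value, forcing value, and step count are made-up numbers chosen only to make the rounding effect visible. The idea is that the big state arrays stay in SP (half the memory traffic), and only the delicate term or accumulation gets promoted to DP.

/*
 * Toy example (made-up numbers, not from any real model): a small
 * per-step forcing added to a large field is rounded away entirely
 * in single precision, but preserved if just the accumulation is
 * promoted to double.
 */
#include <stdio.h>

int main(void)
{
    const int   nsteps   = 100000;
    float       field_sp = 1000.0f;   /* large field value kept in SP        */
    double      field_dp = 1000.0;    /* same field, accumulated in DP       */
    const float forcing  = 2.0e-5f;   /* small forcing: below half an SP ulp
                                         at 1000 (~3e-5), so the SP addition
                                         rounds it to nothing every step     */

    for (int i = 0; i < nsteps; i++) {
        field_sp += forcing;           /* 1000.0f + 2e-5f == 1000.0f          */
        field_dp += (double)forcing;   /* DP keeps every increment            */
    }

    /* The ~2.0 of total forcing survives in DP and vanishes in SP. */
    printf("SP result: %.6f\n", field_sp);
    printf("DP result: %.6f\n", field_dp);
    return 0;
}

In a real model you would keep the state arrays in SP for the bandwidth and compute or accumulate only the sensitive quantity (the tidal forcing, in my case) in DP.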