------- Comment #10 from whaley at cs dot utsa dot edu 2006-12-19 17:18 -------
Guys,
In the interests of full disclosure, I did some quick timings on the Core2Duo, and as I kind of suspected, scalar SSE crushed x87 there. I was pretty sure scalar SSE could achieve 2 flop/cycle, while Intel kept the x87 at 1 flop/cycle, and that's what my timings show. So it does appear likely that the only people using the x87 on Intel in the future will be those who need the extra precision (and those people would really like this fix, I will point out :). All other Intel archs (P4, PIII, etc.) do 1 flop/cycle for both scalar SSE and x87.

On the AMDs, both x87 and scalar SSE can achieve 2 flop/cycle, with x87 running somewhat faster: only a slight advantage in double precision, but a more commanding one in single. It looks like the next generation of AMDs will increase the maximal flop rate of vector SSE, but not that of scalar SSE, so this may continue to be the case going forward . . .

Cheers,
Clint

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30255