------- Additional Comments From tbptbp at gmail dot com 2005-01-31 22:21 -------

Oops, my bad. I thought pshufd mixed both operands à la shufps; I'm obviously not familiar with the integer side of SSE.
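For reference, here is a minimal scalar sketch of the difference that tripped me up. It only models lane selection on 32-bit values (shufps really works on floats, but the bit movement is the same). The names `model_pshufd` and `model_shufps` are made up for illustration and are not intrinsics or anything in gcc.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar model of pshufd: all four output lanes are picked
   from the single source operand, two selector bits per lane. */
void model_pshufd(uint32_t out[4], const uint32_t src[4], unsigned imm)
{
    uint32_t tmp[4];
    for (int i = 0; i < 4; ++i)
        tmp[i] = src[(imm >> (2 * i)) & 3u];
    memcpy(out, tmp, sizeof tmp);   /* temp copy keeps out == src safe */
}

/* Hypothetical scalar model of shufps: the two low output lanes come from
   the first operand (a), the two high output lanes from the second (b). */
void model_shufps(uint32_t out[4], const uint32_t a[4],
                  const uint32_t b[4], unsigned imm)
{
    uint32_t tmp[4];
    tmp[0] = a[imm & 3u];
    tmp[1] = a[(imm >> 2) & 3u];
    tmp[2] = b[(imm >> 4) & 3u];
    tmp[3] = b[(imm >> 6) & 3u];
    memcpy(out, tmp, sizeof tmp);   /* temp copy keeps out == a or b safe */
}
```

With a = {10, 11, 12, 13}, b = {20, 21, 22, 23} and an immediate of 0x1B, the pshufd model yields {13, 12, 11, 10} while the shufps model yields {13, 12, 21, 20}: same selector bits, but only shufps pulls lanes from the second operand.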
And yes, the combination is a loss, albeit a small one, around 3%. But I'm timing the whole thing, not just that intersection code. To give you an idea, here's the peak perf (in fps) for a given tree.

intersection inlined:
  ICC:              16.4
  gcc:              14.7 unpatched, 14.4 patched
  gcc & -fno-gcse:  14.8 unpatched, 14.5 patched

intersection not inlined:
  gcc:              14.9 unpatched, 13.7 patched
  gcc & -fno-gcse:  14.8 unpatched, 14.0 patched

The problem is that the surrounding code (kdtree search) also changes a lot, and gcc doesn't find an optimal way for that either. What's reassuring is that ICC isn't much smarter than gcc on the tree search. Each compiler comes up with a nice trick here and there, but none gets the whole picture right. Still, ICC doesn't suck as much on average. I'm going to replace all that tree search with asm to settle the issue someday.

BTW the difference between ICC and GCC is around 5% on the scalar path.

I'm going to try each patch in turn as requested.

-- 
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19680