GCC makes the problem is even worse if only SSE and not SSE 2 instructions are enabled. Since the integer bitwise instructions are only available with SSE 2, using casts instead of intrinsics causes GCC to expand the operation into a long series of instructions.
This was also a bug and a patch for this has been posted and approved. Paolo