The ps[rl]ldq instructions take an immediate operand that gives the number of bytes to shift. In gcc 3.4, __builtin_ia32_psrldqi128 and __builtin_ia32_pslldqi128 model those instructions. However, this patch

http://gcc.gnu.org/ml/gcc-patches/2005-01/msg00468.html

changes the operand to the number of bits to shift in gcc 4.0/4.1. As a result, __builtin_ia32_psrldqi128 and __builtin_ia32_pslldqi128 are incompatible between gcc 3.4 and gcc 4.x.

-- 
           Summary: [4.0/4.1 Regression]: __builtin_ia32_ps[rl]ldqi128 are changed
           Product: gcc
           Version: 4.0.3
            Status: UNCONFIRMED
          Severity: normal
          Priority: P2
         Component: target
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: hjl at lucon dot org

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=24392
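
To illustrate the incompatibility reported here, the following is a minimal, portable C sketch. It does not call the real builtins. The names model_psrldq_bytes and model_psrldq_bits are hypothetical helpers that model the byte-count convention (gcc 3.4) and the bit-count convention (gcc 4.x) as described in this report. It assumes a little-endian 16-byte array stands in for the 128-bit register, and that a bit count is simply divided by 8, which is an assumption about how non-multiples of 8 are treated.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model of psrldq with a BYTE count (the gcc 3.4 convention
   described in the report). src and dst are 128-bit values stored as
   little-endian byte arrays; vacated high bytes are filled with zeros. */
static void model_psrldq_bytes(uint8_t dst[16], const uint8_t src[16],
                               unsigned nbytes)
{
    memset(dst, 0, 16);
    if (nbytes < 16)
        memcpy(dst, src + nbytes, 16 - nbytes);
}

/* Hypothetical model of the same shift with a BIT count (the gcc 4.x
   convention described in the report). Assumption: the bit count is
   divided by 8, so values that are not multiples of 8 round down. */
static void model_psrldq_bits(uint8_t dst[16], const uint8_t src[16],
                              unsigned nbits)
{
    model_psrldq_bytes(dst, src, nbits / 8);
}
```

Under these assumptions, model_psrldq_bytes(dst, src, 4) and model_psrldq_bits(dst, src, 32) give the same result, while passing 4 to the bit-count version shifts by zero bytes. That is how code written against the gcc 3.4 builtins could silently change behavior when built with gcc 4.x.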