http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46419
Summary: xmmintrin.h: _mm_cvtpu16_ps (and hence _mm_cvtpu8_ps) returns false result in gcc >= 4.4
Product: gcc
Version: 4.4.5
Status: UNCONFIRMED
Severity: critical
Priority: P3
Component: c
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: release_candid...@yahoo.com

Created attachment 22367
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=22367
example code

Dear GCC developers,

I guess the patch set <http://gcc.gnu.org/viewcvs?view=revision&revision=134558> broke the _mm_cvtpu16_ps() and _mm_cvtpu8_ps() intrinsics.

For demonstration, please refer to the attached example. It is intended to convert four chars (1,2,3,4) into an SSE float vector type (__m128) by using the Intel intrinsics _mm_cvtpu8_ps() and _mm_setr_pi8().

The output of the program compiled with gcc-4.3 is:

image: 1 2 3 4
out4: 1 2 3 4

This result is correct, and agrees with Intel's intrinsic docs <http://software.intel.com/sites/products/documentation/studio/composer/en-us/2011/compiler_c/intref_cls/common/intref_mmx_set.htm> // <http://software.intel.com/sites/products/documentation/studio/composer/en-us/2011/compiler_c/intref_cls/common/intref_sse_conversion.htm>, as well as with the output of icc compilation.

The output of gcc-4.4 and gcc-4.5 compilation is:

image: 1 2 3 4
out4: 3 4 1 2

I was able to trace this back to the change set referred to above. If I include the old xmmintrin.h instead of the new header when using gcc-4.4, the result is correct again. I didn't study the changes of rev. 134558 in detail, and I do not know whether the new algorithm is theoretically correct at all.

Could you please fix this bug? I don't know about the other intrinsics touched by that patch. In this context, bug <http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37496> might also be worth considering.

Thanks,
Dirk
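
The attachment itself is not reproduced in this message. The following is only a minimal sketch of the kind of check described above, not the attached program. The function names (cvtpu8_ps_reference, cvtpu8_ps_intrinsic, cvtpu8_ps_matches_reference, print_lanes) and the HAVE_SSE_PATH macro are hypothetical and chosen for illustration. It assumes an x86 target with MMX and SSE enabled. On other targets the intrinsic path falls back to the scalar reference, so the comparison is then trivially true.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#if defined(__MMX__) && defined(__SSE__)
#include <xmmintrin.h>
#define HAVE_SSE_PATH 1   /* hypothetical macro, for this sketch only */
#else
#define HAVE_SSE_PATH 0
#endif

/* Scalar reference: lane i of the result holds (float)in[i]. */
void cvtpu8_ps_reference(const unsigned char in[4], float out[4])
{
    assert(in != NULL && out != NULL);
    for (int i = 0; i < 4; ++i)
        out[i] = (float)in[i];
}

/* Same conversion through _mm_setr_pi8() and _mm_cvtpu8_ps(). */
void cvtpu8_ps_intrinsic(const unsigned char in[4], float out[4])
{
    assert(in != NULL && out != NULL);
#if HAVE_SSE_PATH
    /* _mm_setr_pi8 takes its arguments in memory order (lowest byte first).
       The casts to char only matter for inputs above 127. */
    __m64 packed = _mm_setr_pi8((char)in[0], (char)in[1],
                                (char)in[2], (char)in[3], 0, 0, 0, 0);
    __m128 v = _mm_cvtpu8_ps(packed);  /* lower four bytes -> four floats */
    _mm_empty();                       /* leave MMX state before using floats */
    _mm_storeu_ps(out, v);
#else
    /* Assumption: no MMX/SSE available, so reuse the scalar model. */
    cvtpu8_ps_reference(in, out);
#endif
}

/* Returns 1 if the intrinsic result equals the scalar reference, else 0. */
int cvtpu8_ps_matches_reference(const unsigned char in[4])
{
    float expected[4], actual[4];
    cvtpu8_ps_reference(in, expected);
    cvtpu8_ps_intrinsic(in, actual);
    for (int i = 0; i < 4; ++i)
        if (expected[i] != actual[i])
            return 0;
    return 1;
}

/* Prints four lanes in the "label: a b c d" style quoted in the report. */
void print_lanes(const char *label, const float v[4])
{
    printf("%s: %g %g %g %g\n", label, v[0], v[1], v[2], v[3]);
}
```

Calling cvtpu8_ps_intrinsic() on the bytes {1, 2, 3, 4} and passing the result to print_lanes("out4", ...) gives a line in the same form as the outputs quoted above. With a compiler showing the behaviour described in this report, cvtpu8_ps_matches_reference() would be expected to return 0, since the two halves of the vector come out swapped.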