http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46419

           Summary: xmmintrin.h: _mm_cvtpu16_ps (and hence _mm_cvtpu8_ps)
                    returns false result in gcc >= 4.4
           Product: gcc
           Version: 4.4.5
            Status: UNCONFIRMED
          Severity: critical
          Priority: P3
         Component: c
        AssignedTo: unassig...@gcc.gnu.org
        ReportedBy: release_candid...@yahoo.com


Created attachment 22367
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=22367
example code

Dear GCC developers,

I suspect that the patch set
<http://gcc.gnu.org/viewcvs?view=revision&revision=134558> broke the
_mm_cvtpu16_ps() and _mm_cvtpu8_ps() intrinsics.

For a demonstration, please refer to the attached example. It converts four
chars (1,2,3,4) into an SSE float vector (__m128) using the Intel intrinsics
_mm_setr_pi8() and _mm_cvtpu8_ps().

The output of the program compiled with gcc-4.3 is:
    image: 1 2 3 4
    out4:  1 2 3 4
This result is correct. It complies with Intel's intrinsic docs
<http://software.intel.com/sites/products/documentation/studio/composer/en-us/2011/compiler_c/intref_cls/common/intref_mmx_set.htm>
and
<http://software.intel.com/sites/products/documentation/studio/composer/en-us/2011/compiler_c/intref_cls/common/intref_sse_conversion.htm>,
and it matches the output of the same program compiled with icc.

The output of gcc-4.4 and gcc-4.5 compilation is:
    image: 1 2 3 4
    out4:  3 4 1 2
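
The attachment itself is not reproduced in this message. As a rough sketch of
the kind of conversion it performs (not the attached code), something like the
following could be used. The helper names bytes_to_floats() and
lanes_in_order() are hypothetical and chosen only for illustration; the
intrinsics are the ones named above, plus _mm_storeu_ps() and _mm_empty(). It
assumes an x86 target with MMX and SSE enabled.

```c
#include <assert.h>
#include <stddef.h>
#include <xmmintrin.h>  /* SSE intrinsics; in GCC this also provides the MMX ones */

/* Hypothetical helper: convert four unsigned bytes to four floats. */
void bytes_to_floats(const unsigned char in[4], float out[4])
{
    assert(in != NULL && out != NULL);

    /* Put the four bytes into the low half of an MMX register, in order.
       The upper four bytes are zero; _mm_cvtpu8_ps() only reads the low four. */
    __m64 packed = _mm_setr_pi8((char)in[0], (char)in[1],
                                (char)in[2], (char)in[3], 0, 0, 0, 0);

    /* Convert the low four unsigned 8-bit integers to single-precision floats. */
    __m128 converted = _mm_cvtpu8_ps(packed);

    /* Unaligned store, so 'out' needs no special alignment. */
    _mm_storeu_ps(out, converted);

    /* Clear the MMX state before any later x87 floating-point code runs. */
    _mm_empty();
}

/* Hypothetical check: returns 1 if each output lane equals its input byte. */
int lanes_in_order(void)
{
    const unsigned char image[4] = {1, 2, 3, 4};
    float out4[4];
    int i;

    bytes_to_floats(image, out4);
    for (i = 0; i < 4; ++i) {
        if (out4[i] != (float)image[i])
            return 0;
    }
    return 1;
}
```

With a compiler that behaves as gcc-4.3 and icc do in this report,
lanes_in_order() should return 1. With the lane order shown for gcc-4.4 and
gcc-4.5 above, it would return 0.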


I was able to trace this back to the change set referred to above. If I include
the old xmmintrin.h instead of the new header when using gcc-4.4, the result is
correct again. I didn't study the changes of rev. 134558 in detail, and I do
not know whether the new algorithm is correct even in theory.


Could you please fix this bug?
I have not checked whether the other intrinsics touched by that patch are
affected as well. In this context, bug
<http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37496> might also be worth a look.


Thanks,

Dirk
