http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56253

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Target|                            |x86_64-*-*, i?86-*-*
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2013-02-08
          Component|tree-optimization           |target
     Ever Confirmed|0                           |1

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> 2013-02-08 12:38:50 UTC ---

Confirmed.  That's because we have

__m256 foo(__m256, __m256, __m256) (__m256 a, __m256 b, __m256 c)
{
  __m256 D.6689;
  __m256 D.6686;
  __m256 _5;
  __m256 _6;

  <bb 2>:
  _5 = __builtin_ia32_mulps256 (a_1(D), b_2(D));
  _6 = __builtin_ia32_addps256 (_5, c_3(D));
  return _6;
}

instead of

  _5 = a_1(D) * b_2(D);
  _6 = _5 + c_3(D);

Not sure why we use builtins for these basic operations.

_mm256_add_ps could for example be simply

extern __inline __m256 __attribute__((__gnu_inline__, __always_inline__, __artificial__))
_mm256_add_ps (__m256 __A, __m256 __B)
{
  return (__m256) ((__v8sf)__A + (__v8sf)__B);
}

with the caveat that this relies on a GNU extension (arithmetic operators on vector types).
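A minimal, portable sketch of the suggestion, assuming a GCC- or Clang-compatible compiler. It defines its own eight-float vector type in place of `__m256`/`__v8sf` (so it needs neither `<immintrin.h>` nor `-mavx`), and the wrapper names `sketch_mul_ps`/`sketch_add_ps` and the helper `check_foo` are hypothetical, not the real avxintrin.h definitions. Because `foo` is built from ordinary `*` and `+` on vector types, once the wrappers are inlined the tree-level dump should show a plain multiply and add, like the second fragment quoted above, rather than `__builtin_ia32_*ps256` calls.

```c
#include <assert.h>

/* Assumption: a local 8 x float GNU vector type standing in for
   __m256/__v8sf, so this compiles without <immintrin.h> or -mavx. */
typedef float v8sf __attribute__((vector_size(32)));

/* Hypothetical wrappers in the proposed style: plain vector operators
   instead of __builtin_ia32_mulps256 / __builtin_ia32_addps256.
   These are NOT the real avxintrin.h definitions. */
static inline v8sf sketch_mul_ps(v8sf a, v8sf b) { return a * b; }
static inline v8sf sketch_add_ps(v8sf a, v8sf b) { return a + b; }

/* Same shape as foo() in the dump: after inlining, the middle end sees
   an ordinary vector multiply followed by an ordinary vector add. */
static v8sf foo(v8sf a, v8sf b, v8sf c)
{
  return sketch_add_ps(sketch_mul_ps(a, b), c);
}

/* Usage check: every lane should equal a[i] * b[i] + c[i].  The inputs
   are small, exactly representable values, so fused and unfused
   evaluation produce the same result. */
static void check_foo(void)
{
  v8sf a = {1, 2, 3, 4, 5, 6, 7, 8};
  v8sf b = {2, 2, 2, 2, 2, 2, 2, 2};
  v8sf c = {0.5f, 0.5f, 0.5f, 0.5f, 0.5f, 0.5f, 0.5f, 0.5f};
  v8sf r = foo(a, b, c);
  for (int i = 0; i < 8; i++)
    assert(r[i] == a[i] * b[i] + c[i]);
}
```

Compiling with `-O2 -mavx -fdump-tree-optimized` is one way to compare the resulting GIMPLE against the builtin-based dump above. Without `-mavx`, GCC may emit a non-fatal `-Wpsabi` note about passing 32-byte vectors by value.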