https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93395

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization
             Target|                            |x86_64-*-*, i?86-*-*
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2020-01-23
                 CC|                            |hjl.tools at gmail dot com
     Ever confirmed|0                           |1
      Known to fail|                            |10.0

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
We expand from the following GIMPLE:

perm_missed_optimization (__m256d a)
{
  vector(4) double _3;

  <bb 2> [local count: 1073741824]:
  _3 = __builtin_ia32_permdf256 (a_2(D), 177); [tail call]
  return _3;

}

perm_pessimization (__m256d a)
{
  vector(4) double _3;

  <bb 2> [local count: 1073741824]:
  _3 = __builtin_ia32_vpermilpd256 (a_2(D), 5); [tail call]
  return _3;

}

perm_workaround (__m256d a)
{
  vector(4) double _3;

  <bb 2> [local count: 1073741824]:
  _3 = __builtin_ia32_shufpd256 (a_2(D), a_2(D), 5); [tail call]
  return _3;

}

where perm_pessimization ends up as

(insn 7 6 8 (set (reg:V4DF 84)
        (vec_select:V4DF (reg:V4DF 85)
            (parallel [
                    (const_int 1 [0x1])
                    (const_int 0 [0])
                    (const_int 3 [0x3])
                    (const_int 2 [0x2])
                ]))) "./include/avxintrin.h":651:20 -1
     (nil))

which is exactly the same RTL as for perm_missed_optimization.

perm_workaround instead expands to

(insn 8 7 9 (set (reg:V4DF 84)
        (vec_select:V4DF (vec_concat:V8DF (reg:V4DF 85)
                (reg:V4DF 86))
            (parallel [
                    (const_int 1 [0x1])
                    (const_int 4 [0x4])
                    (const_int 3 [0x3])
                    (const_int 6 [0x6])
                ]))) "./include/avxintrin.h":339:20 -1
     (nil))

So we seem to be missing a pattern that matches the earlier variant (the
single-operand vec_select) to vpermilpd.
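
To see why the first two functions expand to the same vec_select while the
workaround does not, it may help to decode the immediates by hand. The sketch
below is portable scalar C, not GCC code: the decode_* helpers are hypothetical
names introduced here for illustration, and they assume the usual immediate
encodings of vpermpd, vpermilpd and vshufpd. For the builtins and immediates
quoted above they yield the selectors seen in the RTL dumps.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: vpermpd-style decode, as assumed for
   __builtin_ia32_permdf256.  Result element i takes source element
   (imm >> 2*i) & 3, so elements may cross 128-bit lanes.  */
void decode_permdf256(unsigned imm, int sel[4])
{
  assert(sel != NULL);
  for (int i = 0; i < 4; i++)
    sel[i] = (imm >> (2 * i)) & 3;
}

/* Hypothetical helper: vpermilpd-style decode, as assumed for
   __builtin_ia32_vpermilpd256.  Bit i picks the low or high element
   of the 128-bit lane that result element i lives in.  */
void decode_vpermilpd256(unsigned imm, int sel[4])
{
  assert(sel != NULL);
  for (int i = 0; i < 4; i++)
    sel[i] = (i & ~1) + ((imm >> i) & 1);
}

/* Hypothetical helper: vshufpd-style decode, as assumed for
   __builtin_ia32_shufpd256.  Indices refer to the 8-element concat of
   the first operand (0..3) and the second operand (4..7); even result
   elements come from the first operand, odd ones from the second.  */
void decode_shufpd256(unsigned imm, int sel[4])
{
  assert(sel != NULL);
  for (int i = 0; i < 4; i++)
    sel[i] = (i & ~1) + ((imm >> i) & 1) + ((i & 1) ? 4 : 0);
}
```

With these helpers, decode_permdf256 (177, sel) and decode_vpermilpd256 (5, sel)
both give {1, 0, 3, 2}, while decode_shufpd256 (5, sel) gives {1, 4, 3, 6}.
Since both operands of the workaround are the same register, indices 4 and 6
name the same elements as 0 and 2, so all three forms describe one permutation.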
