https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93395
Richard Biener <rguenth at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization
             Target|                            |x86_64-*-*, i?86-*-*
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2020-01-23
                 CC|                            |hjl.tools at gmail dot com
     Ever confirmed|0                           |1
      Known to fail|                            |10.0
--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
We expand from the following GIMPLE:
perm_missed_optimization (__m256d a)
{
  vector(4) double _3;

  <bb 2> [local count: 1073741824]:
  _3 = __builtin_ia32_permdf256 (a_2(D), 177); [tail call]
  return _3;
}

perm_pessimization (__m256d a)
{
  vector(4) double _3;

  <bb 2> [local count: 1073741824]:
  _3 = __builtin_ia32_vpermilpd256 (a_2(D), 5); [tail call]
  return _3;
}

perm_workaround (__m256d a)
{
  vector(4) double _3;

  <bb 2> [local count: 1073741824]:
  _3 = __builtin_ia32_shufpd256 (a_2(D), a_2(D), 5); [tail call]
  return _3;
}
where perm_pessimization ends up as
(insn 7 6 8 (set (reg:V4DF 84)
        (vec_select:V4DF (reg:V4DF 85)
            (parallel [
                    (const_int 1 [0x1])
                    (const_int 0 [0])
                    (const_int 3 [0x3])
                    (const_int 2 [0x2])
                ]))) "./include/avxintrin.h":651:20 -1
     (nil))
which is exactly the same as what perm_missed_optimization expands to.
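To see why the two builtins end up with the same selector, it helps to decode the immediates by hand. The sketch below is only an illustration: `decode_vpermpd` and `decode_vpermilpd256` are hypothetical helper names, not GCC functions, and they assume the usual immediate encodings (two bits per result element for vpermpd, one bit per result element within its 128-bit lane for 256-bit vpermilpd).

```c
#include <stdint.h>

/* Hypothetical helper, not part of GCC.
   vpermpd (__builtin_ia32_permdf256): result element i is chosen by
   bits [2i+1:2i] of the immediate and may be any of the 4 source elements. */
static void decode_vpermpd(uint8_t imm, int sel[4])
{
    for (int i = 0; i < 4; i++)
        sel[i] = (imm >> (2 * i)) & 3;
}

/* Hypothetical helper, not part of GCC.
   256-bit vpermilpd (__builtin_ia32_vpermilpd256): bit i of the immediate
   picks the low (0) or high (1) element of the 128-bit lane that result
   element i lives in, so the selection never crosses a lane. */
static void decode_vpermilpd256(uint8_t imm, int sel[4])
{
    for (int i = 0; i < 4; i++) {
        int lane_base = i & ~1;   /* 0 for elements 0-1, 2 for elements 2-3 */
        sel[i] = lane_base + ((imm >> i) & 1);
    }
}
```

Calling `decode_vpermpd(177, sel)` and `decode_vpermilpd256(5, sel)` both fill `sel` with 1, 0, 3, 2, which is the `parallel` shown in the insn above.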
The workaround looks like
(insn 8 7 9 (set (reg:V4DF 84)
        (vec_select:V4DF (vec_concat:V8DF (reg:V4DF 85)
                (reg:V4DF 86))
            (parallel [
                    (const_int 1 [0x1])
                    (const_int 4 [0x4])
                    (const_int 3 [0x3])
                    (const_int 6 [0x6])
                ]))) "./include/avxintrin.h":339:20 -1
     (nil))
so we seem to be missing a pattern that matches the earlier variant (the single-operand vec_select) as vpermilpd.
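As a rough illustration of the two selector shapes discussed here, the following sketch decodes the shufpd immediate into indices over the concatenated operands, and checks whether a one-operand selector stays within its 128-bit lanes, which is the property a vpermilpd match would need. Both helpers are hypothetical names invented for this example; this is not how the i386 backend implements the check.

```c
#include <stdint.h>

/* Hypothetical helper, not part of GCC.
   256-bit vshufpd (__builtin_ia32_shufpd256): indices are into the
   8-element concatenation {a, b}, with a = 0..3 and b = 4..7.
   Even result elements come from a, odd ones from b; bit i of the
   immediate picks the low or high element of the matching 128-bit lane. */
static void decode_shufpd256(uint8_t imm, int sel[4])
{
    for (int i = 0; i < 4; i++) {
        int operand_base = (i & 1) ? 4 : 0;
        int lane_base = i & ~1;
        sel[i] = operand_base + lane_base + ((imm >> i) & 1);
    }
}

/* Hypothetical helper, not part of GCC.
   Returns the vpermilpd immediate for a one-operand selector, or -1 if
   some element would have to cross a 128-bit lane (not expressible). */
static int vpermilpd256_imm_for(const int sel[4])
{
    int imm = 0;
    for (int i = 0; i < 4; i++) {
        if (sel[i] < 0 || sel[i] > 3 || (sel[i] >> 1) != (i >> 1))
            return -1;
        imm |= (sel[i] & 1) << i;
    }
    return imm;
}
```

With immediate 5, `decode_shufpd256` yields 1, 4, 3, 6, matching the workaround insn. The selector 1, 0, 3, 2 from the earlier insn stays in-lane, so `vpermilpd256_imm_for` maps it back to immediate 5, whereas a lane-crossing selector such as 2, 3, 0, 1 is rejected.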