https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93395
Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization
             Target|                            |x86_64-*-*, i?86-*-*
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2020-01-23
                 CC|                            |hjl.tools at gmail dot com
     Ever confirmed|0                           |1
      Known to fail|                            |10.0

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
We expand from

perm_missed_optimization (__m256d a)
{
  vector(4) double _3;

  <bb 2> [local count: 1073741824]:
  _3 = __builtin_ia32_permdf256 (a_2(D), 177); [tail call]
  return _3;

}

perm_pessimization (__m256d a)
{
  vector(4) double _3;

  <bb 2> [local count: 1073741824]:
  _3 = __builtin_ia32_vpermilpd256 (a_2(D), 5); [tail call]
  return _3;

}

perm_workaround (__m256d a)
{
  vector(4) double _3;

  <bb 2> [local count: 1073741824]:
  _3 = __builtin_ia32_shufpd256 (a_2(D), a_2(D), 5); [tail call]
  return _3;

}

where perm_pessimization ends up as

(insn 7 6 8 (set (reg:V4DF 84)
        (vec_select:V4DF (reg:V4DF 85)
            (parallel [
                    (const_int 1 [0x1])
                    (const_int 0 [0])
                    (const_int 3 [0x3])
                    (const_int 2 [0x2])
                ]))) "./include/avxintrin.h":651:20 -1
     (nil))

exactly the same as perm_missed_optimization

workaround looks like

(insn 8 7 9 (set (reg:V4DF 84)
        (vec_select:V4DF (vec_concat:V8DF (reg:V4DF 85)
                (reg:V4DF 86))
            (parallel [
                    (const_int 1 [0x1])
                    (const_int 4 [0x4])
                    (const_int 3 [0x3])
                    (const_int 2 [0x2])
                    (const_int 6 [0x6])
                ]))) "./include/avxintrin.h":339:20 -1
     (nil))

so we seem to miss a pattern for the earlier variant matching vpermilpd.
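
To make the selector vectors in the RTL above easier to check by hand, here
is a minimal sketch in plain C that decodes the three immediates into element
indices.  The helper names are hypothetical and this is not GCC code; the
decoding rules are my reading of the vpermpd, vpermilpd and vshufpd immediate
encodings, with indices 0-3 meaning the first operand and 4-7 the second, as
in the vec_concat form.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helpers for illustration only, not part of GCC. */

/* vpermpd-style immediate: two bits per destination element. */
void decode_permpd256(unsigned imm, int idx[4])
{
    for (int i = 0; i < 4; i++)
        idx[i] = (imm >> (2 * i)) & 3;
}

/* vpermilpd-style immediate: one bit per element, selecting the low or
   high double of that element's own 128-bit lane. */
void decode_vpermilpd256(unsigned imm, int idx[4])
{
    for (int i = 0; i < 4; i++)
        idx[i] = (i & ~1) + ((imm >> i) & 1);
}

/* vshufpd-style immediate: same per-lane bit select, but even result
   elements come from the first operand and odd ones from the second
   (offset by 4 in the concatenated numbering). */
void decode_shufpd256(unsigned imm, int idx[4])
{
    for (int i = 0; i < 4; i++) {
        int in_lane = (i & ~1) + ((imm >> i) & 1);
        idx[i] = (i & 1) ? in_lane + 4 : in_lane;
    }
}

/* Check the immediates from this report: 177, 5 and 5. */
void check_selectors(void)
{
    int p[4], v[4], s[4];

    decode_permpd256(177, p);    /* {1, 0, 3, 2} */
    decode_vpermilpd256(5, v);   /* {1, 0, 3, 2} */
    decode_shufpd256(5, s);      /* {1, 4, 3, 6} */

    assert(memcmp(p, v, sizeof p) == 0);

    /* With both shufpd operands equal to a, indices 4-7 alias 0-3. */
    for (int i = 0; i < 4; i++)
        s[i] &= 3;
    assert(memcmp(v, s, sizeof v) == 0);
}
```

Calling check_selectors() passes without tripping an assert, which matches the
dumps: the first two builtins expand to the same single-operand vec_select
{1, 0, 3, 2}, and the workaround's {1, 4, 3, 6} is that same permutation once
both operands are the same register.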