https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84361
            Bug ID: 84361
           Summary: Fails to use vfmaddsub* for complex multiplication
           Product: gcc
           Version: 8.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: rguenth at gcc dot gnu.org
            Blocks: 53947
  Target Milestone: ---
            Target: x86_64-*-*, i?86-*-*

I see

        vfmadd132ps     %ymm12, %ymm8, %ymm2
        vfmsub132ps     %ymm12, %ymm8, %ymm7
        vblendps        $170, %ymm2, %ymm7, %ymm7

generated from

  _298 = -vect__174.663_871;
  vect__38.664_872 = vect__173.659_831 * vect__178.660_844 + _298;
  vect__38.665_873 = vect__173.659_831 * vect__178.660_844 + vect__174.663_871;
  _874 = VEC_PERM_EXPR <vect__38.664_872, vect__38.665_873, { 0, 9, 2, 11, 4, 13, 6, 15 }>;

which is similar to the addsub cases we already handle.  combine sees

(insn 391 390 392 21 (set (reg:V8SF 845 [ vect__38.664 ])
        (fma:V8SF (reg:V8SF 440 [ vect__173.659 ])
            (reg:V8SF 445 [ vect__178.660 ])
            (neg:V8SF (reg:V8SF 457 [ vect__174.663 ])))) 1886 {*fma_fmsub_v8sf}
     (nil))
(insn 392 391 393 21 (set (reg:V8SF 846 [ vect__38.665 ])
        (fma:V8SF (reg:V8SF 440 [ vect__173.659 ])
            (reg:V8SF 445 [ vect__178.660 ])
            (reg:V8SF 457 [ vect__174.663 ]))) 1842 {*fma_fmadd_v8sf}
     (expr_list:REG_DEAD (reg:V8SF 457 [ vect__174.663 ])
        (expr_list:REG_DEAD (reg:V8SF 445 [ vect__178.660 ])
            (expr_list:REG_DEAD (reg:V8SF 440 [ vect__173.659 ])
                (nil)))))
(insn 393 392 394 21 (set (reg:V8SF 460 [ _874 ])
        (vec_merge:V8SF (reg:V8SF 846 [ vect__38.665 ])
            (reg:V8SF 845 [ vect__38.664 ])
            (const_int 170 [0xaa]))) 3885 {avx_blendps256}
     (expr_list:REG_DEAD (reg:V8SF 846 [ vect__38.665 ])
        (expr_list:REG_DEAD (reg:V8SF 845 [ vect__38.664 ])
            (nil))))

I can find <avx512>_fmaddsub_<mode>_mask<round_name>, which looks like a
pattern for AVX512, but the 256-bit AVX case seems to be missing?
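
To make the lane selection above concrete, here is a minimal scalar sketch in C.
It assumes the usual RTL vec_merge convention (lane i is taken from the first
operand when bit i of the mask is set) and the documented vfmaddsub*ps
behaviour (subtract in even lanes, add in odd lanes).  The helper names
vec_merge, fma_lane and blend_matches_fmaddsub are hypothetical and only serve
this illustration; they are not GCC code, and fma_lane ignores the single
rounding of a real fused operation.

```c
#include <assert.h>

#define LANES 8  /* V8SF: eight single-precision lanes */

/* RTL vec_merge: lane i comes from a when bit i of mask is set, else from b. */
static void vec_merge(const float *a, const float *b, unsigned mask, float *out)
{
    assert(mask < (1u << LANES));
    for (int i = 0; i < LANES; i++)
        out[i] = ((mask >> i) & 1u) ? a[i] : b[i];
}

/* Stand-in for one FMA lane; rounding of a real fused op is ignored here. */
static float fma_lane(float a, float b, float c)
{
    return a * b + c;
}

/* Returns 1 if fmsub + fmadd + blend(0xaa) equals fmaddsub in every lane. */
int blend_matches_fmaddsub(void)
{
    float a[LANES], b[LANES], c[LANES];
    float sub[LANES], add[LANES], blended[LANES];

    for (int i = 0; i < LANES; i++) {
        /* Small exactly representable inputs, so comparisons are exact. */
        a[i] = 1.5f + (float)i;
        b[i] = 0.25f * (float)i - 1.0f;
        c[i] = 3.0f - (float)i;
        sub[i] = fma_lane(a[i], b[i], -c[i]);  /* insn 391, *fma_fmsub_v8sf */
        add[i] = fma_lane(a[i], b[i], c[i]);   /* insn 392, *fma_fmadd_v8sf */
    }
    vec_merge(add, sub, 0xaa, blended);        /* insn 393, avx_blendps256 */

    for (int i = 0; i < LANES; i++) {
        /* vfmaddsub*ps: subtract in even lanes, add in odd lanes. */
        float expected = fma_lane(a[i], b[i], (i & 1) ? c[i] : -c[i]);
        if (blended[i] != expected)
            return 0;
    }
    return 1;
}
```

Under these assumptions the three-instruction sequence computes, lane for
lane, what a single fmaddsub would compute.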
The non-fma patterns look like

(define_insn "avx_addsubv8sf3"
  [(set (match_operand:V8SF 0 "register_operand" "=x")
        (vec_merge:V8SF
          (minus:V8SF
            (match_operand:V8SF 1 "register_operand" "x")
            (match_operand:V8SF 2 "nonimmediate_operand" "xm"))
          (plus:V8SF (match_dup 1) (match_dup 2))
          (const_int 85)))]
  "TARGET_AVX"
  "vaddsubps\t{%2, %1, %0|%0, %1, %2}"

This occurs in polyhedron capacita in the hot loop in fourir.  If you build
with -Ofast -march=core-avx2 -fno-vect-cost-model you should see the above.


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
[Bug 53947] [meta-bug] vectorizer missed-optimizations
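
For readers without the Polyhedron sources, the kind of loop involved can be
sketched in C as a complex multiplication over interleaved (re, im) arrays.
This is a hypothetical reduction written for illustration, not the fourir
loop from capacita; the function name cmul_interleaved and the data layout
are assumptions, and whether it produces the fmsub/fmadd/blend sequence
depends on the GCC version and the flags used.

```c
#include <assert.h>
#include <stddef.h>

/* Multiply n complex numbers stored interleaved as re, im, re, im, ... */
void cmul_interleaved(float *restrict out, const float *restrict x,
                      const float *restrict y, size_t n)
{
    assert(out != NULL && x != NULL && y != NULL);
    for (size_t i = 0; i < n; i++) {
        float xr = x[2 * i], xi = x[2 * i + 1];
        float yr = y[2 * i], yi = y[2 * i + 1];
        out[2 * i]     = xr * yr - xi * yi;  /* even lane: multiply, subtract */
        out[2 * i + 1] = xr * yi + xi * yr;  /* odd lane: multiply, add */
    }
}
```

The even output elements need a multiply-subtract and the odd ones a
multiply-add, which is the alternating pattern that fmaddsub expresses in
one instruction.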