https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67612
            Bug ID: 67612
           Summary: Unable to vectorize DOT_PROD_EXPR (PMADDWD)
           Product: gcc
           Version: 6.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: dmalcolm at gcc dot gnu.org
  Target Milestone: ---

Created attachment 36346
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=36346&action=edit
Test case

The attached code is a reduced form of a loop that the user hoped would be
auto-vectorized to use PMADDWD, but no vectorization occurs (this came up
whilst investigating possible use of libgccjit for autovectorization).

With a recent gcc trunk (r227686), I get this for the reproducer at -O3:

0000000000000000 <test_muladd>:
   0:   31 c0                   xor    %eax,%eax
   2:   85 c9                   test   %ecx,%ecx
   4:   7e 37                   jle    3d <test_muladd+0x3d>
   6:   66 2e 0f 1f 84 00 00    nopw   %cs:0x0(%rax,%rax,1)
   d:   00 00 00
  10:   44 0f bf 04 82          movswl (%rdx,%rax,4),%r8d
  15:   44 0f bf 14 86          movswl (%rsi,%rax,4),%r10d
  1a:   44 0f bf 4c 82 02       movswl 0x2(%rdx,%rax,4),%r9d
  20:   45 0f af d0             imul   %r8d,%r10d
  24:   44 0f bf 44 86 02       movswl 0x2(%rsi,%rax,4),%r8d
  2a:   45 0f af c1             imul   %r9d,%r8d
  2e:   45 01 d0                add    %r10d,%r8d
  31:   44 89 04 87             mov    %r8d,(%rdi,%rax,4)
  35:   48 83 c0 01             add    $0x1,%rax
  39:   39 c1                   cmp    %eax,%ecx
  3b:   7f d3                   jg     10 <test_muladd+0x10>
  3d:   f3 c3                   repz retq

Building with -fdump-tree-vect-details to see why gcc -O3 fails to vectorize,
I see this in FILENAME.c.130t.vect:

(snip)
../../src/vector_dot_prod.c:11:3: note: ==> examining pattern statement: patt_91 = DOT_PROD_EXPR <_14, _18, _29>;
../../src/vector_dot_prod.c:11:3: note: vect_is_simple_use: operand _14
../../src/vector_dot_prod.c:11:3: note: def_stmt: _14 = *_13;
../../src/vector_dot_prod.c:11:3: note: type of def: internal
../../src/vector_dot_prod.c:11:3: note: not vectorized: relevant stmt not supported: patt_91 = DOT_PROD_EXPR <_14, _18, _29>;
../../src/vector_dot_prod.c:11:3: note: bad operation or unsupported loop bound.
../../src/vector_dot_prod.c:5:1: note: vectorized 0 loops in function.
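
For readers without the attachment: the scalar code in the disassembly is
consistent with a loop of roughly the following shape. This is a
reconstruction from the machine code, not the attached test case. Only the
name test_muladd comes from the symbol table; the parameter names and the
fixed-width types are assumptions.

```c
#include <stdint.h>

/* Hypothetical reconstruction from the -O3 disassembly, NOT the attached
   test case.  Under the x86-64 SysV ABI the arguments would arrive as
   dst = %rdi, a = %rsi, b = %rdx, n = %ecx.  Parameter names and the
   fixed-width types are assumptions. */
void test_muladd(int32_t *dst, const int16_t *a, const int16_t *b, int n)
{
    for (int i = 0; i < n; i++) {
        /* Two sign-extending 16-bit loads per input (movswl), two
           multiplies (imul), one add, one 32-bit store per iteration. */
        dst[i] = a[2 * i] * b[2 * i] + a[2 * i + 1] * b[2 * i + 1];
    }
}
```

Each output element combines one adjacent pair of 16-bit products, which is
the per-lane operation that PMADDWD performs.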

Stepping through gcc/tree-vect-stmts.c:vect_analyze_stmt for stmt:
  patt_91 = DOT_PROD_EXPR <_14, _18, _29>;
I see that vectorizable_operation returns false here:

4821      if (nunits_out != nunits_in)
4910        return false;

(gdb) p nunits_out
$16 = 4
(gdb) p nunits_in
$17 = 8

Should this be a vectorizable_operation?
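
The two values printed by gdb are what one would expect if the vectorizer is
working with 128-bit vectors: eight 16-bit input elements against four 32-bit
output elements. The sketch below spells out that arithmetic together with a
scalar model of a single 128-bit PMADDWD. The 128-bit vector size is an
assumption inferred from the numbers above, and VECTOR_BYTES, NUNITS_IN,
NUNITS_OUT and pmaddwd_model are illustrative names, not GCC identifiers.

```c
#include <stdint.h>

/* Assumption: 128-bit (16-byte) vectors, as with SSE2 on x86-64. */
enum { VECTOR_BYTES = 16 };
enum {
    NUNITS_IN  = VECTOR_BYTES / sizeof(int16_t),  /* 8 input lanes  */
    NUNITS_OUT = VECTOR_BYTES / sizeof(int32_t)   /* 4 output lanes */
};

/* Hypothetical helper: scalar model of one 128-bit PMADDWD.  Each 32-bit
   output lane is the sum of two adjacent signed 16x16-bit products.  The
   single wrapping corner case (all four inputs of a pair equal to -32768)
   is not modelled here. */
void pmaddwd_model(int32_t out[NUNITS_OUT],
                   const int16_t a[NUNITS_IN],
                   const int16_t b[NUNITS_IN])
{
    for (int i = 0; i < NUNITS_OUT; i++) {
        out[i] = (int32_t)a[2 * i] * b[2 * i]
               + (int32_t)a[2 * i + 1] * b[2 * i + 1];
    }
}
```

The operation narrows the lane count by a factor of two by design, so a check
requiring nunits_out == nunits_in will always reject it.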