https://gcc.gnu.org/bugzilla/show_bug.cgi?id=123002

            Bug ID: 123002
           Summary: WIDEN_MULT_EXPR are incorrectly vectorized since
                    gcc-15
           Product: gcc
           Version: 15.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: liam at seine dot email
  Target Milestone: ---

The tree vectorization pass has a bug that makes GCC produce incorrect code
for widening multiplications.

Incorrect code is produced for C inputs as simple as:

main.c:
```c
static unsigned int const enc_table_32[8][3] = {
    {513735U, 77223048U, 437087610U },
    {0U,      78508U,    646269101U },
    {0U,      0U,        11997U,    },
    {0U,      0U,        0U,        },
    {0U,      0U,        0U,        },
    {0U,      0U,        0U,        },
    {0U,      0U,        0U,        },
    {0U,      0U,        0U,        }};

int main() {
    unsigned long intermediate[3] = {0};

    for (unsigned long i = 0UL; i < 8; i++) {
        intermediate[0] += 2 * (unsigned long)(enc_table_32)[i][0];
        intermediate[1] += 2 * (unsigned long)(enc_table_32)[i][1];
        intermediate[2] += 2 * (unsigned long)(enc_table_32)[i][2];
    }

    if (intermediate[0] == 0xfad8e &&
        intermediate[1] == 0x9370e68 && intermediate[2] == 0x8125ca08) {
        return 0;
    } else {
        return 1;
    }
}
```
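The expected constants in the `if` follow directly from the table:
2 × 513735 = 1027470 (0xfad8e), 2 × (77223048 + 78508) = 154603112 (0x9370e68),
and 2 × (437087610 + 646269101 + 11997) = 2166737416 (0x8125ca08). As a
cross-check that does not depend on the vectorizer, here is a minimal sketch that
recomputes them column by column. The names `ref_table` and `reference_sums` are
hypothetical. `uint64_t` stands in for `unsigned long` so the sketch behaves the
same on LLP64 targets. The GCC `optimize` function attribute is an assumption
added only to keep the vectorizer out of this one function; it is not part of the
original reproducer.

```c
#include <stdint.h>

/* Copy of enc_table_32 from main.c. */
static const unsigned int ref_table[8][3] = {
    {513735U, 77223048U, 437087610U},
    {0U,      78508U,    646269101U},
    {0U,      0U,        11997U},
    {0U, 0U, 0U}, {0U, 0U, 0U}, {0U, 0U, 0U}, {0U, 0U, 0U}, {0U, 0U, 0U}};

/* out[c] = sum over all 8 rows of 2 * ref_table[i][c], computed one column at
   a time. Under GCC (not Clang), the optimize attribute disables the tree
   vectorizer for this function only. */
#if defined(__GNUC__) && !defined(__clang__)
__attribute__((optimize("no-tree-vectorize")))
#endif
void reference_sums(uint64_t out[3]) {
    for (int c = 0; c < 3; c++) {
        uint64_t sum = 0;
        for (int i = 0; i < 8; i++)
            sum += 2 * (uint64_t)ref_table[i][c];
        out[c] = sum;
    }
}
```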


Compiling with `gcc-14 -mavx2 -O3 main.c` correctly produces an a.out that
returns 0 when run; main folds to:
```
main:
        xor     eax, eax
        ret
```
Compiling with `gcc-15 -mavx2 -O3 main.c`, however, produces incorrect machine
code: `./a.out` returns 1, and main folds to:
```
main:
        mov     eax, 1
        ret
```
Both outputs, together with the dump of the faulty pass, can be compared side by side here:
https://godbolt.org/z/bbssefq39

The problem can be narrowed down to the tree vectorization pass. There,
WIDEN_MULT_LO_EXPR and WIDEN_MULT_HI_EXPR are selected for a "reduction" in a
way that is only valid in general when the reduction produces a single scalar
result, not when it produces an array/vector of results, as `intermediate[3]`
does here. Essentially, the pass confuses the cases described here:
https://gcc.gnu.org/cgit/gcc/tree/gcc/tree-vect-stmts.cc?id=d3e71b99194bff878d3bf3b35f9528a350d10df9#n14154
```
...
  vect__1.9_46 = MEM <const vector(8) unsigned int> [(unsigned int *)vectp_enc_table_32.7_54];
  vectp_enc_table_32.7_45 = vectp_enc_table_32.7_54 + 32;
  vect__1.10_44 = MEM <const vector(8) unsigned int> [(unsigned int *)vectp_enc_table_32.7_45];
  vectp_enc_table_32.7_43 = vectp_enc_table_32.7_54 + 64;
  vect__1.11_42 = MEM <const vector(8) unsigned int> [(unsigned int *)vectp_enc_table_32.7_43];
  vect_patt_57.12_41 = WIDEN_MULT_LO_EXPR <vect__1.9_46, { 2, 2, 2, 2, 2, 2, 2, 2 }>;
  vect_patt_57.12_40 = WIDEN_MULT_HI_EXPR <vect__1.9_46, { 2, 2, 2, 2, 2, 2, 2, 2 }>;
  vect_patt_57.12_39 = WIDEN_MULT_LO_EXPR <vect__1.10_44, { 2, 2, 2, 2, 2, 2, 2, 2 }>;
  vect_patt_57.12_37 = WIDEN_MULT_HI_EXPR <vect__1.10_44, { 2, 2, 2, 2, 2, 2, 2, 2 }>;
  vect_patt_57.12_35 = WIDEN_MULT_LO_EXPR <vect__1.11_42, { 2, 2, 2, 2, 2, 2, 2, 2 }>;
  vect_patt_57.12_32 = WIDEN_MULT_HI_EXPR <vect__1.11_42, { 2, 2, 2, 2, 2, 2, 2, 2 }>;
  vect__4.14_13 = vect_patt_57.12_41 + vect_intermediate$0_38.13_31;
  vect__4.14_51 = vect_patt_57.12_40 + vect_intermediate$0_38.13_30;
  vect__4.14_52 = vect_patt_57.12_39 + vect_intermediate$0_38.13_18;
  vect__4.14_49 = vect_patt_57.12_37 + vect_intermediate$0_38.13_17;
  vect__4.14_50 = vect_patt_57.12_35 + vect_intermediate$0_38.13_16;
  vect__4.14_47 = vect_patt_57.12_32 + vect_intermediate$0_38.13_14;
...
  _68 = BIT_FIELD_REF <vect__4.14_48, 64, 0>;
  _69 = BIT_FIELD_REF <vect__4.14_48, 64, 64>;
  _70 = BIT_FIELD_REF <vect__4.14_48, 64, 128>;
  _71 = BIT_FIELD_REF <vect__4.14_48, 64, 192>;
  _72 = BIT_FIELD_REF <vect__4.14_63, 64, 0>;
  _73 = BIT_FIELD_REF <vect__4.14_63, 64, 64>;
  _74 = BIT_FIELD_REF <vect__4.14_63, 64, 128>;
  _75 = BIT_FIELD_REF <vect__4.14_63, 64, 192>;
  _76 = BIT_FIELD_REF <vect__4.14_64, 64, 0>;
  _77 = BIT_FIELD_REF <vect__4.14_64, 64, 64>;
  _78 = BIT_FIELD_REF <vect__4.14_64, 64, 128>;
  _79 = BIT_FIELD_REF <vect__4.14_64, 64, 192>;
  _80 = BIT_FIELD_REF <vect__4.14_65, 64, 0>;
  _81 = BIT_FIELD_REF <vect__4.14_65, 64, 64>;
  _82 = BIT_FIELD_REF <vect__4.14_65, 64, 128>;
  _83 = BIT_FIELD_REF <vect__4.14_65, 64, 192>;
  _84 = BIT_FIELD_REF <vect__4.14_66, 64, 0>;
  _85 = BIT_FIELD_REF <vect__4.14_66, 64, 64>;
  _86 = BIT_FIELD_REF <vect__4.14_66, 64, 128>;
  _87 = BIT_FIELD_REF <vect__4.14_66, 64, 192>;
  _88 = BIT_FIELD_REF <vect__4.14_67, 64, 0>;
  _89 = BIT_FIELD_REF <vect__4.14_67, 64, 64>;
  _90 = BIT_FIELD_REF <vect__4.14_67, 64, 128>;
  _91 = BIT_FIELD_REF <vect__4.14_67, 64, 192>;
  _92 = _68 + _71;
  _93 = _69 + _72;
  _94 = _70 + _73;
  _95 = _92 + _74;
  _96 = _93 + _75;
  _97 = _94 + _76;
  _98 = _95 + _77;
  _99 = _96 + _78;
  _100 = _97 + _79;
  _101 = _98 + _80;
  _102 = _99 + _81;
  _103 = _100 + _82;
  _104 = _101 + _83;
  _105 = _102 + _84;
  _106 = _103 + _85;
  _107 = _104 + _86;
  _108 = _105 + _87;
  _109 = _106 + _88;
  _110 = _107 + _89;
  _111 = _108 + _90;
  _112 = _109 + _91;
  _25 = _111 == 154603112;
  _24 = _110 == 1027470;
  _23 = _24 & _25;
  _26 = _112 == 2166737416;
  _27 = _23 & _26;
  _28 = ~_27;
  _29 = (int) _28;
  return _29;
```
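The epilogue above (`_92 = _68 + _71`, `_95 = _92 + _74`, ...) adds every third
64-bit lane of the six concatenated accumulators: `_110` (lanes 0, 3, ..., 21) is
compared with 1027470, `_111` (lanes 1, 4, ..., 22) with 154603112, and `_112`
(lanes 2, 5, ..., 23) with 2166737416. It is therefore only correct if lane L
still holds table word L. The following is a small scalar model of that
assumption. It is hypothetical and is not GCC code: `words` and
`model_grouped_reduction` are illustrative names, and the even/odd lane split is
used only as one example of a reordering that a single scalar sum can tolerate.
It does not claim to be the exact reordering GCC 15 performs.

```c
#include <stdint.h>
#include <string.h>

/* The 24 table words in memory order: word 3*i + c is enc_table_32[i][c]. */
static const uint32_t words[24] = {
    513735U, 77223048U, 437087610U,
    0U,      78508U,    646269101U,
    0U,      0U,        11997U,
    0U, 0U, 0U, 0U, 0U, 0U, 0U, 0U, 0U, 0U, 0U, 0U, 0U, 0U, 0U};

/* Hypothetical model: each vector(8) of 32-bit words is widened into two
   vector(4) of 64-bit products, giving 6 accumulators of 4 lanes each.
   in_order = 1: the two halves keep memory order (lanes 0-3, then 4-7).
   in_order = 0: even/odd split (lanes 0,2,4,6, then 1,3,5,7). */
void model_grouped_reduction(int in_order, uint64_t out[3]) {
    uint64_t acc[6][4];
    memset(acc, 0, sizeof acc);
    for (int v = 0; v < 3; v++) {            /* the three vector(8) loads */
        const uint32_t *vec = &words[8 * v];
        for (int k = 0; k < 4; k++) {
            int first  = in_order ? k     : 2 * k;      /* feeds first half  */
            int second = in_order ? k + 4 : 2 * k + 1;  /* feeds second half */
            acc[2 * v][k]     += 2 * (uint64_t)vec[first];
            acc[2 * v + 1][k] += 2 * (uint64_t)vec[second];
        }
    }
    /* Epilogue as in the dump: concatenated lane L (0..23) is credited to
       intermediate[L % 3]. */
    out[0] = out[1] = out[2] = 0;
    for (int L = 0; L < 24; L++)
        out[L % 3] += acc[L / 4][L % 4];
}
```

With `in_order = 1`, the model yields the expected 1027470, 154603112 and
2166737416. With `in_order = 0`, it yields 1293565672, 1028621316 and 181010. The
grand total is unchanged, which is why such a reordering is harmless for a single
scalar reduction, but words end up credited to the wrong element of
`intermediate`.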

This is consistent with the bisection result. The bug is first triggered by
commit d3e71b99194bff878d3bf3b35f9528a350d10df9, the same revision as the
tree-vect-stmts.cc link above. Patch thread:
https://inbox.sourceware.org/gcc-patches/[email protected]/T/
