https://gcc.gnu.org/bugzilla/show_bug.cgi?id=123002
Bug ID: 123002
Summary: WIDEN_MULT_EXPR are incorrectly vectorized since gcc-15
Product: gcc
Version: 15.2.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: liam at seine dot email
Target Milestone: ---
The tree vectorization pass has a bug that leads gcc to produce incorrect
code for widening multiplications.
Incorrect code is produced for C inputs as simple as:
main.c:
```
static unsigned int const enc_table_32[8][3] = {
    {513735U, 77223048U, 437087610U },
    {0U, 78508U, 646269101U },
    {0U, 0U, 11997U, },
    {0U, 0U, 0U, },
    {0U, 0U, 0U, },
    {0U, 0U, 0U, },
    {0U, 0U, 0U, },
    {0U, 0U, 0U, }};
int main() {
    unsigned long intermediate[3] = {0};
    for (unsigned long i = 0UL; i < 8; i++) {
        intermediate[0] += 2 * (unsigned long)(enc_table_32)[i][0];
        intermediate[1] += 2 * (unsigned long)(enc_table_32)[i][1];
        intermediate[2] += 2 * (unsigned long)(enc_table_32)[i][2];
    }
    if (intermediate[0] == 0xfad8e &&
        intermediate[1] == 0x9370e68 && intermediate[2] == 0x8125ca08) {
        return 0;
    } else {
        return 1;
    }
}
```
`gcc-14 -mavx2 -O3 main.c`
will correctly produce an a.out that returns 0 when run:
```
main:
xor eax, eax
ret
```
`gcc-15 -mavx2 -O3 main.c`
however, produces incorrect machine code: `./a.out` returns 1.
```
main:
mov eax, 1
ret
```
This and the faulty pass are conveniently inspectable side-by-side here:
https://godbolt.org/z/bbssefq39
The problem can be narrowed down to the vectorization tree optimization pass,
where it selects WIDEN_MULT_LO_EXPR and WIDEN_MULT_HI_EXPR for a "reduction",
in a way that is only generally valid for a scalar result, not for an
array/vector. Essentially, it confuses the cases described here:
https://gcc.gnu.org/cgit/gcc/tree/gcc/tree-vect-stmts.cc?id=d3e71b99194bff878d3bf3b35f9528a350d10df9#n14154
```
...
vect__1.9_46 = MEM <const vector(8) unsigned int> [(unsigned int *)vectp_enc_table_32.7_54];
vectp_enc_table_32.7_45 = vectp_enc_table_32.7_54 + 32;
vect__1.10_44 = MEM <const vector(8) unsigned int> [(unsigned int *)vectp_enc_table_32.7_45];
vectp_enc_table_32.7_43 = vectp_enc_table_32.7_54 + 64;
vect__1.11_42 = MEM <const vector(8) unsigned int> [(unsigned int *)vectp_enc_table_32.7_43];
vect_patt_57.12_41 = WIDEN_MULT_LO_EXPR <vect__1.9_46, { 2, 2, 2, 2, 2, 2, 2, 2 }>;
vect_patt_57.12_40 = WIDEN_MULT_HI_EXPR <vect__1.9_46, { 2, 2, 2, 2, 2, 2, 2, 2 }>;
vect_patt_57.12_39 = WIDEN_MULT_LO_EXPR <vect__1.10_44, { 2, 2, 2, 2, 2, 2, 2, 2 }>;
vect_patt_57.12_37 = WIDEN_MULT_HI_EXPR <vect__1.10_44, { 2, 2, 2, 2, 2, 2, 2, 2 }>;
vect_patt_57.12_35 = WIDEN_MULT_LO_EXPR <vect__1.11_42, { 2, 2, 2, 2, 2, 2, 2, 2 }>;
vect_patt_57.12_32 = WIDEN_MULT_HI_EXPR <vect__1.11_42, { 2, 2, 2, 2, 2, 2, 2, 2 }>;
vect__4.14_13 = vect_patt_57.12_41 + vect_intermediate$0_38.13_31;
vect__4.14_51 = vect_patt_57.12_40 + vect_intermediate$0_38.13_30;
vect__4.14_52 = vect_patt_57.12_39 + vect_intermediate$0_38.13_18;
vect__4.14_49 = vect_patt_57.12_37 + vect_intermediate$0_38.13_17;
vect__4.14_50 = vect_patt_57.12_35 + vect_intermediate$0_38.13_16;
vect__4.14_47 = vect_patt_57.12_32 + vect_intermediate$0_38.13_14;
...
_68 = BIT_FIELD_REF <vect__4.14_48, 64, 0>;
_69 = BIT_FIELD_REF <vect__4.14_48, 64, 64>;
_70 = BIT_FIELD_REF <vect__4.14_48, 64, 128>;
_71 = BIT_FIELD_REF <vect__4.14_48, 64, 192>;
_72 = BIT_FIELD_REF <vect__4.14_63, 64, 0>;
_73 = BIT_FIELD_REF <vect__4.14_63, 64, 64>;
_74 = BIT_FIELD_REF <vect__4.14_63, 64, 128>;
_75 = BIT_FIELD_REF <vect__4.14_63, 64, 192>;
_76 = BIT_FIELD_REF <vect__4.14_64, 64, 0>;
_77 = BIT_FIELD_REF <vect__4.14_64, 64, 64>;
_78 = BIT_FIELD_REF <vect__4.14_64, 64, 128>;
_79 = BIT_FIELD_REF <vect__4.14_64, 64, 192>;
_80 = BIT_FIELD_REF <vect__4.14_65, 64, 0>;
_81 = BIT_FIELD_REF <vect__4.14_65, 64, 64>;
_82 = BIT_FIELD_REF <vect__4.14_65, 64, 128>;
_83 = BIT_FIELD_REF <vect__4.14_65, 64, 192>;
_84 = BIT_FIELD_REF <vect__4.14_66, 64, 0>;
_85 = BIT_FIELD_REF <vect__4.14_66, 64, 64>;
_86 = BIT_FIELD_REF <vect__4.14_66, 64, 128>;
_87 = BIT_FIELD_REF <vect__4.14_66, 64, 192>;
_88 = BIT_FIELD_REF <vect__4.14_67, 64, 0>;
_89 = BIT_FIELD_REF <vect__4.14_67, 64, 64>;
_90 = BIT_FIELD_REF <vect__4.14_67, 64, 128>;
_91 = BIT_FIELD_REF <vect__4.14_67, 64, 192>;
_92 = _68 + _71;
_93 = _69 + _72;
_94 = _70 + _73;
_95 = _92 + _74;
_96 = _93 + _75;
_97 = _94 + _76;
_98 = _95 + _77;
_99 = _96 + _78;
_100 = _97 + _79;
_101 = _98 + _80;
_102 = _99 + _81;
_103 = _100 + _82;
_104 = _101 + _83;
_105 = _102 + _84;
_106 = _103 + _85;
_107 = _104 + _86;
_108 = _105 + _87;
_109 = _106 + _88;
_110 = _107 + _89;
_111 = _108 + _90;
_112 = _109 + _91;
_25 = _111 == 154603112;
_24 = _110 == 1027470;
_23 = _24 & _25;
_26 = _112 == 2166737416;
_27 = _23 & _26;
_28 = ~_27;
_29 = (int) _28;
return _29;
```
This lines up with the bisect result where the bug is first triggered on this
commit: d3e71b99194bff878d3bf3b35f9528a350d10df9 /
https://inbox.sourceware.org/gcc-patches/[email protected]/T/