https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88064
--- Comment #2 from rsandifo at gcc dot gnu.org <rsandifo at gcc dot gnu.org> ---
(In reply to Jakub Jelinek from comment #0)
> Something I've discovered by code inspection:
>
> int a[64], b[64], c[64];
>
> void
> foo ()
> {
>   int i;
>   for (i = 0; i < 64; i++)
>     {
>       long long d = a[i];
>       long long e = b[i];
>       d += e;
>       c[i] = d;
>     }
> }
>
> with -O3 -fno-tree-forwprop -fno-tree-vrp might introduce UB into the source
> where there was none.
> In *.ifcvt we have:
>   _1 = a[i_14];
>   d_7 = (long long int) _1;
>   _2 = b[i_14];
>   e_8 = (long long int) _2;
>   d_9 = d_7 + e_8;
>   _3 = (int) d_9;
>   c[i_14] = _3;
> suppose int is 32-bit, long long 64-bit and a[i] is always INT_MAX and b[i]
> 1.
> The original program doesn't invoke UB, because the addition is done in long
> long int type.  Now, GCC 9 vectorizes this as:
>   vector(4) int vect__1.6;
>   vector(4) int vect__2.9;
>   vector(4) signed int vect_patt_25.10;
>   vector(4) int vect_patt_23.11;
>
>   vect__1.6_19 = MEM[(int *)vectp_a.4_21];
>   vect__2.9_16 = MEM[(int *)vectp_b.7_18];
>   vect_patt_25.10_13 = vect__1.6_19 + vect__2.9_16;
>   vect_patt_23.11_12 = VIEW_CONVERT_EXPR<vector(4) int>(vect_patt_25.10_13);
>   MEM[(int *)vectp_c.12_5] = vect_patt_23.11_12;
> The VCE is weird, why do we have special vectors of signed int vs. int?

This is because the types for the narrowed addition are created by
build_nonstandard_integer_type, which happens to give signed int
rather than int.

The final gimple IL gives the misleading impression that vect_patt_23
only exists to convert from "signed int" to "int", and wouldn't exist
if the two types were the same.  But vect_patt_23 is really just a
side-effect of the way pattern statements are handled.  Once we narrow
the addition, we still need *something* for the original demotion cast.
So what we use is a nop pattern statement.  The VCE is the vectorisation
of that nop.

We could teach vectorizable_conversion to avoid the VCE for nop casts.
But a nicer fix would be to allow the demotion statement to be replaced
directly by the result of the addition, rather than have a dummy
statement in between.  I've got some WIP patches for extending loads
and truncating stores that would allow this.

> But, more importantly, of course the addition in this case needs to be done
> in vector(4) unsigned int type, because we've demoted it from the original
> (unless we can prove e.g. through ranges that undefined behavior will not be
> triggered).

Testing a fix for this.
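
For reference, here is a scalar sketch of why the narrowed addition has to be unsigned. It is only an illustration, not the fix being tested. The helper names are made up for this example, and it assumes 32-bit int, 64-bit long long, and GCC's documented modulo-2^N behaviour when an out-of-range value is converted to a signed type.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper names for illustration; none of this is GCC code. */

/* Scalar form of the original loop body: add in long long, then truncate. */
int add_wide_then_truncate(int a, int b)
{
    long long d = a;
    long long e = b;
    d += e;              /* no overflow: the sum of two ints fits in 64 bits */
    return (int) d;      /* implementation-defined; GCC reduces modulo 2^32 */
}

/* Narrowed form: add in unsigned int, where wraparound is well defined. */
int add_narrow_unsigned(int a, int b)
{
    unsigned int sum = (unsigned int) a + (unsigned int) b;
    return (int) sum;    /* implementation-defined; GCC reduces modulo 2^32 */
}

/* Compare the two forms on a few boundary inputs (assumed sample set). */
void self_check(void)
{
    static const int samples[] = { INT_MIN, -1, 0, 1, INT_MAX };
    const int n = (int) (sizeof samples / sizeof samples[0]);
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            assert(add_wide_then_truncate(samples[i], samples[j])
                   == add_narrow_unsigned(samples[i], samples[j]));
}
```

With a[i] == INT_MAX and b[i] == 1, both helpers avoid signed overflow, whereas a narrowed addition written as plain `a + b` in int would overflow and so be undefined, which is the problem described in comment #0.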