[Bug tree-optimization/88064] [9 Regression] Incorrect vectorizer over_widening pattern handling

rsandifo at gcc dot gnu.org Sat, 01 Dec 2018 09:36:01 -0800

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88064


--- Comment #2 from rsandifo at gcc dot gnu.org <rsandifo at gcc dot gnu.org> ---
(In reply to Jakub Jelinek from comment #0)
> Something I've discovered by code inspection:
> 
> int a[64], b[64], c[64];
> 
> void
> foo ()
> {
>   int i;
>   for (i = 0; i < 64; i++)
>     {
>       long long d = a[i];
>       long long e = b[i];
>       d += e;
>       c[i] = d;
>     }
> }
> 
> with -O3 -fno-tree-forwprop -fno-tree-vrp might introduce UB into the source
> where there was none.
> In *.ifcvt we have:
>   _1 = a[i_14];
>   d_7 = (long long int) _1;
>   _2 = b[i_14];
>   e_8 = (long long int) _2;
>   d_9 = d_7 + e_8;
>   _3 = (int) d_9;
>   c[i_14] = _3;
> suppose int is 32-bit, long long 64-bit and a[i] is always INT_MAX and b[i]
> 1.
> The original program doesn't invoke UB, because the addition is done in long
> long int type.  Now, GCC 9 vectorizes this as:
>   vector(4) int vect__1.6;
>   vector(4) int vect__2.9;
>   vector(4) signed int vect_patt_25.10;
>   vector(4) int vect_patt_23.11;
> 
>   vect__1.6_19 = MEM[(int *)vectp_a.4_21];
>   vect__2.9_16 = MEM[(int *)vectp_b.7_18];
>   vect_patt_25.10_13 = vect__1.6_19 + vect__2.9_16;
>   vect_patt_23.11_12 = VIEW_CONVERT_EXPR<vector(4) int>(vect_patt_25.10_13);
>   MEM[(int *)vectp_c.12_5] = vect_patt_23.11_12;
> The VCE is weird, why do we have special vectors of signed int vs. int?
This is because the types for the narrowed addition are created by
build_nonstandard_integer_type, which happens to give signed int
rather than int.  The final gimple IL gives the misleading impression
that vect_patt_23 only exists to convert from "signed int" to "int",
and wouldn't exist if the two types were the same.  But vect_patt_23
is really just a side-effect of the way pattern statements are handled.
Once we narrow the addition, we still need *something* for the original
demotion cast.  So what we use is a nop pattern statement.  The VCE is
the vectorisation of that nop.

We could teach vectorizable_conversion to avoid the VCE for
nop casts.  But a nicer fix would be to allow the demotion
statement to be replaced directly by the result of the
addition, rather than have a dummy statement in between.
I've got some WIP patches for extending loads and truncating
stores that would allow this.

> But, more importantly, of course the addition in this case needs to be done
> in vector(4) unsigned int type, because we've demoted it from the original
> (unless we can prove e.g. through ranges that undefined behavior will not be
> triggered).
Testing a fix for this.
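
For reference, here is a scalar sketch of why the unsigned type is safe
for the narrowed addition.  It is only an illustration, not the
vectorizer change itself.  The function names are made up for this
example, and it assumes 32-bit int, 64-bit long long and GCC's documented
modulo behaviour when converting an out-of-range value to a signed type:

```c
#include <assert.h>
#include <limits.h>

/* Original semantics from the testcase: add in long long, then
   truncate to int.  The addition itself cannot overflow.  */
static int
add_wide (int x, int y)
{
  long long d = x;
  long long e = y;
  d += e;
  /* Implementation-defined conversion; GCC reduces modulo 2^32.  */
  return (int) d;
}

/* Narrowed form: add in unsigned int, where overflow wraps and is
   well defined, then convert back to int.  */
static int
add_narrow_unsigned (int x, int y)
{
  unsigned int u = (unsigned int) x + (unsigned int) y;
  return (int) u;
}

/* Hypothetical self-check: both forms agree on the problematic input
   from comment #0 (a[i] == INT_MAX, b[i] == 1).  */
static void
check_narrowing (void)
{
  assert (add_wide (INT_MAX, 1) == add_narrow_unsigned (INT_MAX, 1));
}
```

Doing the same addition directly in (signed) int would overflow for
INT_MAX + 1, which is the UB the narrowed pattern must not introduce.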
