On March 8, 2019 5:57:35 PM GMT+01:00, Martin Jambor <mjam...@suse.cz> wrote:
>Hi,
>
>BTW, we have dropped the gcc-patches from the CC quite some emails ago :-/
>
>On Fri, Mar 08 2019, Richard Biener wrote:
>> On Thu, 7 Mar 2019, Martin Jambor wrote:
>>
>>> Hello again,
>>>
>>> On Thu, Mar 07 2019, Martin Jambor wrote:
>>> > Hi,
>>> >
>>> > On Thu, Mar 07 2019, Richard Biener wrote:
>>> >>
>>> >> So possibly we should restrict total scalarization to the case
>>> >> where all accesses are layout compatible (that is,
>>> >> types_compatible_p)?
>>> >>
>>> >
>>> > That would be the patch below (currently undergoing testing).  It is
>>> > slightly more strict than mine but on the three testcases in question,
>>> > it gives the same results too.  People memcpying their aggregates
>>> > around, even when not doing anything silly like storing stuff in
>>> > padding, will find their code slower but arguably closer to what they
>>> > wanted.
>>>
>>> So, testing revealed that (unlike my previous version) this breaks two
>>> guality tests where we depend on total scalarization to convey to the
>>> debugger what is in bits of an aggregate.
>>>
>>> In gcc.dg/guality/pr54970.c, the array a is no longer totally scalarized
>>> because it is memcpied into, and gdb no longer knows what constant lurks
>>> in a[0], which is otherwise unused.  Probably not a big deal.
>>>
>>> In gcc.dg/guality/pr59776.c, where the main type and function are:
>>>
>>>   struct S { float f, g; };
>>>
>>>   __attribute__((noinline, noclone)) void
>>>   foo (struct S *p)
>>>   {
>>>     struct S s1, s2;                    /* { dg-final { gdb-test pr59776.c:17 "s1.f" "5.0" } } */
>>>     s1 = *p;                            /* { dg-final { gdb-test pr59776.c:17 "s1.g" "6.0" } } */
>>>     s2 = s1;                            /* { dg-final { gdb-test pr59776.c:17 "s2.f" "0.0" } } */
>>>     *(int *) &s2.f = 0;                 /* { dg-final { gdb-test pr59776.c:17 "s2.g" "6.0" } } */
>>>     asm volatile (NOP : : : "memory");  /* { dg-final { gdb-test pr59776.c:20 "s1.f" "5.0" } } */
>>>     asm volatile (NOP : : : "memory");  /* { dg-final { gdb-test pr59776.c:20 "s1.g" "6.0" } } */
>>>     s2 = s1;                            /* { dg-final { gdb-test pr59776.c:20 "s2.f" "5.0" } } */
>>>     asm volatile (NOP : : : "memory");  /* { dg-final { gdb-test pr59776.c:20 "s2.g" "6.0" } } */
>>>     asm volatile (NOP : : : "memory");
>>>   }
>>>
>>> the zeroing through an integer type caused the s2 variable to no longer be
>>> totally scalarized and thus somehow opaque to the debugger.  I don't
>>> think this particular test is a big concern either, but such scalar
>>> type-cast accesses were exactly the reason why I initially decided against
>>> completely disabling total scalarization when I see a types-incompatible
>>> access.
>>>
>>> In any event, although I would still slightly prefer my patch from
>>> yesterday (or at least tolerating scalar type-incompatible accesses), I
>>> am also happy with the one from today, provided I have permission to
>>> XFAIL the guality tests.
>>
>> I guess I'd play safe at this point and do fewer adjustments.  The
>> second patch looks nicer, with
>>
>>  static bool
>> -contains_vce_or_bfcref_p (const_tree ref)
>> +contains_vce_or_bfcref_p (const_tree ref, bool *type_changing_p = NULL)
>
>OK
>
>>
>> (a default for type_changing_p) there would be even fewer hunks.  Is
>> it functionally equivalent to the first variant when you change
>> the types_compatible_p back to the TYPE_MAIN_VARIANT compare?
>> That is, is this the only real difference?
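For reference, the pattern the stricter check rejects is a store whose effective type differs from the type of the underlying aggregate.  The following is only an illustrative sketch (not part of Martin's patch or the testsuite, and the GIMPLE spelling in the comment is approximate), modelled on the pr59776.c store quoted above:

  struct S { float f, g; };

  void
  zero_f (struct S *p)
  {
    struct S s2 = *p;
    *(int *) &s2.f = 0;   /* After folding this is roughly
                             MEM[(int *)&s2] = 0;  the MEM_REF type (int)
                             differs from the type of the referenced decl
                             (struct S), which is what the stricter variant
                             treats as layout-incompatible and what the
                             final patch merely records as type-changing.  */
    *p = s2;
  }
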
>
>I probably don't understand the question, because I'd say that the second
>guality failure shows there is a difference between being careful that we
>just don't lose any data in padding and disabling total scalarization
>altogether.
>
>To give a non-guality example, the following will regress with
>yesterday's variant compared to current trunk:
>
>  struct S {unsigned a,b;};
>
>  void foo (struct S *in, struct S *out)
>  {
>    struct S s;
>    s = *in;
>    *(int *)&s.b = -1;
>    *out = s;
>  }
>
>Currently s is totally scalarized away, but if we "restrict total
>scalarization to the case where all accesses are layout compatible," it
>of course isn't, and s survives all the way to the optimized dump, whereas
>currently we eliminate it even in early SRA.
>
>Even the assembly output changes from:
>
>  movl    (%rdi), %eax
>  movl    %eax, (%rsi)
>  movl    $-1, 4(%rsi)
>  ret
>
>to a bit unexpected:
>
>  movabsq $-4294967296, %rax
>  orq     (%rdi), %rax
>  movq    %rax, (%rsi)
>  ret
>
>So I guess now I am even more convinced about what I wrote yesterday,
>i.e. that we should at least allow the scalar incompatible types.
>The reason why this is safe is that when a scalar access is not a leaf
>of an access tree, scalarization of that portion of the aggregate fails
>(and thus total scalarization of the entire aggregate fails, which we
>notice because that flag is &= propagated up in the access tree).
>
>Therefore, for total scalarization to proceed, a scalar access in a
>weird type must either exactly match a field/element of the declared
>type of the variable or entirely fit into some padding.  And in the
>latter case it is recorded appropriately and data will be copied in/out
>carefully.
>
>A patch doing that is below (currently undergoing bootstrap/testing).
>Or we can just focus on the padding, which is the only thing known to
>cause problems, and go with the Wednesday version of the patch.
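To make the leaf argument above concrete, here is a sketch of my own (assuming the usual 32-bit unsigned / 64-bit long long layout) contrasting the foo above with an access that straddles both fields.  By the reasoning quoted above, the straddling store is not a leaf of the access tree, so scalarization of that portion, and hence total scalarization of s, should fail rather than silently lose data:

  struct S { unsigned a, b; };

  void
  bar (struct S *in, struct S *out)
  {
    struct S s;
    s = *in;
    *(long long *) &s = -1LL;   /* covers both a and b: a scalar access in
                                   a "weird" type that is not a leaf of the
                                   access tree, unlike *(int *)&s.b above,
                                   which exactly matches field b */
    *out = s;
  }
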
The patch below is OK.

Richard.

>Martin
>
>
>2019-03-08  Martin Jambor  <mjam...@suse.cz>
>
>	PR tree-optimization/85762
>	PR tree-optimization/87008
>	PR tree-optimization/85459
>	* tree-sra.c (contains_vce_or_bfcref_p): New parameter, set the bool
>	it points to if there is a type changing MEM_REF.  Adjust all callers.
>	(build_accesses_from_assign): Disable total scalarization if
>	contains_vce_or_bfcref_p returns true through the new parameter, for
>	both rhs and lhs.
>
>	testsuite/
>	* g++.dg/tree-ssa/pr87008.C: New test.
>	* gcc.dg/guality/pr54970.c: Xfail tests querying a[0] everywhere.
>---
> gcc/testsuite/g++.dg/tree-ssa/pr87008.C | 17 ++++++++++++
> gcc/testsuite/gcc.dg/guality/pr54970.c  |  6 ++---
> gcc/tree-sra.c                          | 36 ++++++++++++++++++-------
> 3 files changed, 47 insertions(+), 12 deletions(-)
> create mode 100644 gcc/testsuite/g++.dg/tree-ssa/pr87008.C
>
>diff --git a/gcc/testsuite/g++.dg/tree-ssa/pr87008.C b/gcc/testsuite/g++.dg/tree-ssa/pr87008.C
>new file mode 100644
>index 00000000000..eef521f9ad5
>--- /dev/null
>+++ b/gcc/testsuite/g++.dg/tree-ssa/pr87008.C
>@@ -0,0 +1,17 @@
>+/* { dg-do compile } */
>+/* { dg-options "-O2 -fdump-tree-optimized" } */
>+
>+extern void dontcallthis();
>+
>+struct A { long a, b; };
>+struct B : A {};
>+template<class T>void cp(T&a,T const&b){a=b;}
>+long f(B x){
>+  B y; cp<A>(y,x);
>+  B z; cp<A>(z,x);
>+  if (y.a - z.a)
>+    dontcallthis ();
>+  return 0;
>+}
>+
>+/* { dg-final { scan-tree-dump-not "dontcallthis" "optimized" } } */
>diff --git a/gcc/testsuite/gcc.dg/guality/pr54970.c b/gcc/testsuite/gcc.dg/guality/pr54970.c
>index 5d32af07c32..2e0bc5784a9 100644
>--- a/gcc/testsuite/gcc.dg/guality/pr54970.c
>+++ b/gcc/testsuite/gcc.dg/guality/pr54970.c
>@@ -8,17 +8,17 @@
> int
> main ()
> {
>-  int a[] = { 1, 2, 3 };        /* { dg-final { gdb-test .+4 "a\[0\]" "1" } } */
>+  int a[] = { 1, 2, 3 };        /* { dg-final { gdb-test .+4 "a\[0\]" "1" { xfail { *-*-* } } } } */
>   int *p = a + 2;               /* { dg-final { gdb-test .+3 "a\[1\]" "2" } } */
>   int *q = a + 1;               /* { dg-final { gdb-test .+2 "a\[2\]" "3" } } */
>                                 /* { dg-final { gdb-test .+1 "*p" "3" } } */
>   asm volatile (NOP);           /* { dg-final { gdb-test . "*q" "2" } } */
>-  *p += 10;                     /* { dg-final { gdb-test .+4 "a\[0\]" "1" } } */
>+  *p += 10;                     /* { dg-final { gdb-test .+4 "a\[0\]" "1" { xfail { *-*-* } } } } */
>                                 /* { dg-final { gdb-test .+3 "a\[1\]" "2" } } */
>                                 /* { dg-final { gdb-test .+2 "a\[2\]" "13" } } */
>                                 /* { dg-final { gdb-test .+1 "*p" "13" } } */
>   asm volatile (NOP);           /* { dg-final { gdb-test . "*q" "2" } } */
>-  *q += 10;                     /* { dg-final { gdb-test .+4 "a\[0\]" "1" } } */
>+  *q += 10;                     /* { dg-final { gdb-test .+4 "a\[0\]" "1" { xfail { *-*-* } } } } */
>                                 /* { dg-final { gdb-test .+3 "a\[1\]" "12" } } */
>                                 /* { dg-final { gdb-test .+2 "a\[2\]" "13" } } */
>                                 /* { dg-final { gdb-test .+1 "*p" "13" } } */
>diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
>index eeef31ba496..ca3858d5fc7 100644
>--- a/gcc/tree-sra.c
>+++ b/gcc/tree-sra.c
>@@ -1150,29 +1150,36 @@ contains_view_convert_expr_p (const_tree ref)
>   return false;
> }
>
>-/* Return true if REF contains a VIEW_CONVERT_EXPR or a MEM_REF that performs
>-   type conversion or a COMPONENT_REF with a bit-field field declaration.  */
>+/* Return true if REF contains a VIEW_CONVERT_EXPR or a COMPONENT_REF with a
>+   bit-field field declaration.  If TYPE_CHANGING_P is non-NULL, the bool it
>+   points to will be set if REF contains any of the above or a MEM_REF
>+   expression that effectively performs type conversion.  */
>
> static bool
>-contains_vce_or_bfcref_p (const_tree ref)
>+contains_vce_or_bfcref_p (const_tree ref, bool *type_changing_p = NULL)
> {
>   while (handled_component_p (ref))
>     {
>       if (TREE_CODE (ref) == VIEW_CONVERT_EXPR
>           || (TREE_CODE (ref) == COMPONENT_REF
>               && DECL_BIT_FIELD (TREE_OPERAND (ref, 1))))
>-        return true;
>+        {
>+          if (type_changing_p)
>+            *type_changing_p = true;
>+          return true;
>+        }
>       ref = TREE_OPERAND (ref, 0);
>     }
>
>-  if (TREE_CODE (ref) != MEM_REF
>+  if (!type_changing_p
>+      || TREE_CODE (ref) != MEM_REF
>       || TREE_CODE (TREE_OPERAND (ref, 0)) != ADDR_EXPR)
>     return false;
>
>   tree mem = TREE_OPERAND (TREE_OPERAND (ref, 0), 0);
>   if (TYPE_MAIN_VARIANT (TREE_TYPE (ref))
>       != TYPE_MAIN_VARIANT (TREE_TYPE (mem)))
>-    return true;
>+    *type_changing_p = true;
>
>   return false;
> }
>@@ -1368,15 +1375,26 @@ build_accesses_from_assign (gimple *stmt)
>       lacc->grp_assignment_write = 1;
>       if (storage_order_barrier_p (rhs))
>         lacc->grp_unscalarizable_region = 1;
>+
>+      if (should_scalarize_away_bitmap && !is_gimple_reg_type (lacc->type))
>+        {
>+          bool type_changing_p = false;
>+          contains_vce_or_bfcref_p (lhs, &type_changing_p);
>+          if (type_changing_p)
>+            bitmap_set_bit (cannot_scalarize_away_bitmap,
>+                            DECL_UID (lacc->base));
>+        }
>     }
>
>   if (racc)
>     {
>       racc->grp_assignment_read = 1;
>-      if (should_scalarize_away_bitmap && !gimple_has_volatile_ops (stmt)
>-          && !is_gimple_reg_type (racc->type))
>+      if (should_scalarize_away_bitmap && !is_gimple_reg_type (racc->type))
>         {
>-          if (contains_vce_or_bfcref_p (rhs))
>+          bool type_changing_p = false;
>+          contains_vce_or_bfcref_p (rhs, &type_changing_p);
>+
>+          if (type_changing_p || gimple_has_volatile_ops (stmt))
>             bitmap_set_bit (cannot_scalarize_away_bitmap,
>                             DECL_UID (racc->base));
>           else