Hi!

On 2022-10-17T15:58:47+0200, Aldy Hernandez <al...@redhat.com> wrote:
> On Mon, Oct 17, 2022 at 9:44 AM Thomas Schwinge <tho...@codesourcery.com> 
> wrote:
>> On 2022-10-11T10:31:37+0200, Aldy Hernandez via Gcc-patches 
>> <gcc-patches@gcc.gnu.org> wrote:
>> > When solving 0 = _15 & 1, we calculate _15 as:
>> >
>> >       [irange] int [-INF, -2][0, +INF] NONZERO 0xfffffffe
>> >
>> > The known value of _15 is [0, 1] NONZERO 0x1 which is intersected with
>> > the above, yielding:
>> >
>> >       [0, 1] NONZERO 0x0
>> >
>> > This eventually gets copied to a _Bool [0, 1] NONZERO 0x0.
>> >
>> > This is problematic because here we have a bool which is zero, but
>> > returns false for irange::zero_p, since the latter does not look at
>> > nonzero bits.  This causes logical_combine to assume the range is
>> > not-zero, and all hell breaks loose.
>> >
>> > I think we should just normalize a nonzero mask of 0 to [0, 0] at
>> > creation, thus avoiding all this.
>>
>> 1. This commit r13-3217-gc4d15dddf6b9eacb36f535807ad2ee364af46e04
>> "[PR107195] Set range to zero when nonzero mask is 0" broke a GCC/nvptx
>> offloading test case:
>>
>>     UNSUPPORTED: libgomp.oacc-c/../libgomp.oacc-c-c++-common/nvptx-sese-1.c 
>> -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O0
>>     PASS: libgomp.oacc-c/../libgomp.oacc-c-c++-common/nvptx-sese-1.c 
>> -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O2  
>> (test for excess errors)
>>     PASS: libgomp.oacc-c/../libgomp.oacc-c-c++-common/nvptx-sese-1.c 
>> -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O2  
>> execution test
>>     [-PASS:-]{+FAIL:+} 
>> libgomp.oacc-c/../libgomp.oacc-c-c++-common/nvptx-sese-1.c 
>> -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O2   
>> scan-nvptx-none-offload-rtl-dump mach "SESE regions:.* 
>> [0-9]+{[0-9]+->[0-9]+(\\.[0-9]+)+}"
>>
>> Same for C++.
>>
>> I'll later send a patch (for the test case!) to fix that up.
>>
>> 2. Looking into this, I found that this
>> commit r13-3217-gc4d15dddf6b9eacb36f535807ad2ee364af46e04
>> "[PR107195] Set range to zero when nonzero mask is 0" actually enables a
>> code transformation/optimization that GCC apparently has not been doing
>> before!  I've tried to capture that in the attached
>> "Add 'c-c++-common/torture/pr107195-1.c' [PR107195]".
>
> Nice.
>
>> Will you please verify that one?  In its current '#if 1' configuration,
>> it's all-PASS after commit
>> r13-3217-gc4d15dddf6b9eacb36f535807ad2ee364af46e04
>> "[PR107195] Set range to zero when nonzero mask is 0", whereas before, we
>> get two calls to 'foo', because GCC apparently didnn't understand the
>> relation (optimization opportunity) between 'r *= 2;' and the subsequent
>> 'if (r & 1)'.
>
> Yeah, that looks correct.  We keep better track of nonzero masks.

OK, next observation: this also works for split-up expressions
'if ((r & 2) && (r & 1))' (same rationale as for 'if (r & 1)' alone).
I've added such a variant in my test case.

But: it doesn't work for logically equal 'if (r & 3)'.  I've added such
an XFAILed variant in my test case.  Do you have guidance what needs to
be done to make such cases work, too?

>> I've left in the other '#if' variants in case you'd like to experiment
>> with these, but would otherwise clean that up before pushing.
>>
>> Where does one put such a test case?
>>
>> Should the file be named 'pr107195' or something else?
>
> The aforementioned patch already has:
>
>             * gcc.dg/tree-ssa/pr107195-1.c: New test.
>             * gcc.dg/tree-ssa/pr107195-2.c: New test.
>
> So I would just add a pr107195-3.c test.

But note that unlike yours in 'gcc.dg/tree-ssa/', I had put mine into
'c-c++-common/torture/'.  That's so that we get C and C++ testing, and
all torture testing flag variants.  (... where we do see the optimization
happen starting at '-O1'.)  Do you think that is excessive, and a single
'gcc.dg/tree-ssa/' test case, C only, '-O1' only is sufficient for this?
(I don't have much experience with test cases in such regions of GCC,
hence these questions.)

>> Do we scan 'optimized', or an earlier dump?
>>
>> At '-O1', the actual code transformation is visible already in the 'dom2'
>> dump:
>>
>>        <bb 3> [local count: 536870913]:
>>        gimple_assign <mult_expr, r_7, r_6(D), 2, NULL>
>>     +  gimple_assign <bit_and_expr, _11, r_7, 1, NULL>
>>     +  goto <bb 6>; [100.00%]
>>
>>     -  <bb 4> [local count: 1073741824]:
>>     -  # gimple_phi <r_4, r_6(D)(2), r_7(3)>
>>     +  <bb 4> [local count: 536870912]:
>>     +  # gimple_phi <r_4, r_6(D)(2)>
>>        gimple_assign <bit_and_expr, _2, r_4, 1, NULL>
>>        gimple_cond <ne_expr, _2, 0, NULL, NULL>
>>     -    goto <bb 5>; [50.00%]
>>     +    goto <bb 5>; [100.00%]
>>        else
>>     -    goto <bb 6>; [50.00%]
>>     +    goto <bb 6>; [0.00%]
>>
>>        <bb 5> [local count: 536870913]:
>>        gimple_call <foo, _3, r_4>
>>        gimple_assign <plus_expr, r_8, _3, r_4, NULL>
>>
>>        <bb 6> [local count: 1073741824]:
>>     -  # gimple_phi <r_5, r_4(4), r_8(5)>
>>     +  # gimple_phi <r_5, r_4(4), r_8(5), r_7(3)>
>>        gimple_return <r_5>
>>
>> And, the actual "avoid second call 'foo'" optimization is visiable
>> starting 'dom3':
>>
>>        <bb 3> [local count: 536870913]:
>>        gimple_assign <mult_expr, r_7, r_6(D), 2, NULL>
>>     +  goto <bb 6>; [100.00%]
>>
>>     -  <bb 4> [local count: 1073741824]:
>>     -  # gimple_phi <r_4, r_6(D)(2), r_7(3)>
>>     -  gimple_assign <bit_and_expr, _2, r_4, 1, NULL>
>>     +  <bb 4> [local count: 536870912]:
>>     +  gimple_assign <bit_and_expr, _2, r_6(D), 1, NULL>
>>        gimple_cond <ne_expr, _2, 0, NULL, NULL>
>>     -    goto <bb 5>; [50.00%]
>>     +    goto <bb 5>; [100.00%]
>>        else
>>     -    goto <bb 6>; [50.00%]
>>     +    goto <bb 6>; [0.00%]
>>
>>        <bb 5> [local count: 536870913]:
>>     -  gimple_call <foo, _3, r_4>
>>     -  gimple_assign <plus_expr, r_8, _3, r_4, NULL>
>>     +  gimple_assign <integer_cst, _3, 0, NULL, NULL>
>>     +  gimple_assign <ssa_name, r_8, r_6(D), NULL, NULL>
>>
>>        <bb 6> [local count: 1073741824]:
>>     -  # gimple_phi <r_5, r_4(4), r_8(5)>
>>     +  # gimple_phi <r_5, r_6(D)(4), r_6(D)(5), r_7(3)>
>>        gimple_return <r_5>
>>
>> ..., but I don't know if either of those would be stable/appropriate to
>> scan instead of 'optimized'?
>
> IMO, either dom3 or optimized is fine.

OK, I left it at 'optimized', as I don't have any rationale why exactly
it should happen in 'dom3' already.  ;-)


Grüße
 Thomas


-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955
>From 8b1a33933002ec9e9accdb41229b9b19a431b66a Mon Sep 17 00:00:00 2001
From: Thomas Schwinge <tho...@codesourcery.com>
Date: Mon, 17 Oct 2022 09:10:03 +0200
Subject: [PATCH] Add 'c-c++-common/torture/pr107195-1.c' [PR107195]

... to display optimization performed as of recent
commit r13-3217-gc4d15dddf6b9eacb36f535807ad2ee364af46e04
"[PR107195] Set range to zero when nonzero mask is 0".

	PR tree-optimization/107195
	gcc/testsuite/
	* c-c++-common/torture/pr107195-1.c: New.
---
 .../c-c++-common/torture/pr107195-1.c         | 78 +++++++++++++++++++
 1 file changed, 78 insertions(+)
 create mode 100644 gcc/testsuite/c-c++-common/torture/pr107195-1.c

diff --git a/gcc/testsuite/c-c++-common/torture/pr107195-1.c b/gcc/testsuite/c-c++-common/torture/pr107195-1.c
new file mode 100644
index 00000000000..477eed6e959
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/torture/pr107195-1.c
@@ -0,0 +1,78 @@
+/* Inspired by 'libgomp.oacc-c-c++-common/nvptx-sese-1.c'.  */
+
+/* { dg-additional-options -fdump-tree-optimized-raw }
+   { dg-skip-if {} { *-*-* } { {-flto -fno-fat-lto-objects} } { } } */
+
+extern int
+__attribute__((const))
+foo1 (int);
+
+int f1 (int r)
+{
+  if (foo1 (r)) /* If this first 'if' holds...  */
+    r *= 2;  /* ..., 'r' now has a zero-value lower-most bit...  */
+
+  if (r & 1) /* ..., so this second 'if' can never hold...  */
+    { /* ..., so this is unreachable.  */
+      /* In constrast, if the first 'if' does not hold ('foo1 (r) == 0'), the
+	 second 'if' may hold, but we know ('foo1' being 'const') that
+	 'foo1 (r) == 0', so don't have to re-evaluate it here: */
+      r += foo1 (r);
+      /* Thus, if optimizing, we only ever expect one call of 'foo1'.
+	 { dg-final { scan-tree-dump-times {gimple_call <foo1,} 1 optimized { target __OPTIMIZE__ } } }
+	 { dg-final { scan-tree-dump-times {gimple_call <foo1,} 2 optimized { target { ! __OPTIMIZE__ } } } }
+      */
+    }
+
+  return r;
+}
+
+
+extern int
+__attribute__((const))
+foo2 (int);
+
+int f2 (int r)
+{
+  if (foo2 (r)) /* If this first 'if' holds...  */
+    r *= 2;  /* ..., 'r' now has a zero-value lower-most bit...  */
+
+  if ((r & 2) && (r & 1)) /* ..., so 'r & 1' in this second 'if' can never hold...  */
+    { /* ..., so this is unreachable.  */
+      /* In constrast, if the first 'if' does not hold ('foo2 (r) == 0'), the
+	 second 'if' may hold, but we know ('foo2' being 'const') that
+	 'foo2 (r) == 0', so don't have to re-evaluate it here: */
+      r += foo2 (r);
+      /* Thus, if optimizing, we only ever expect one call of 'foo2'.
+	 { dg-final { scan-tree-dump-times {gimple_call <foo2,} 1 optimized { target __OPTIMIZE__ } } }
+	 { dg-final { scan-tree-dump-times {gimple_call <foo2,} 2 optimized { target { ! __OPTIMIZE__ } } } }
+      */
+    }
+
+  return r;
+}
+
+
+extern int
+__attribute__((const))
+foo3 (int);
+
+int f3 (int r)
+{
+  if (foo3 (r)) /* If this first 'if' holds...  */
+    r *= 2;  /* ..., 'r' now has a zero-value lower-most bit...  */
+
+  if (r & 3) /* ..., so this second 'if' can never hold...  */
+    { /* ..., so this is unreachable.  */
+      /* In constrast, if the first 'if' does not hold ('foo3 (r) == 0'), the
+	 second 'if' may hold, but we know ('foo3' being 'const') that
+	 'foo3 (r) == 0', so don't have to re-evaluate it here: */
+      r += foo3 (r);
+      /* Thus, if optimizing, we only ever expect one call of 'foo3'.
+	 { dg-final { scan-tree-dump-times {gimple_call <foo3,} 1 optimized { target __OPTIMIZE__ xfail *-*-* } } }
+	 { dg-final { scan-tree-dump-times {gimple_call <foo3,} 2 optimized { target { ! __OPTIMIZE__ } } } }
+      */
+    }
+
+  return r;
+}
-- 
2.25.1

Reply via email to