Hi! On 2022-10-17T15:58:47+0200, Aldy Hernandez <al...@redhat.com> wrote: > On Mon, Oct 17, 2022 at 9:44 AM Thomas Schwinge <tho...@codesourcery.com> > wrote: >> On 2022-10-11T10:31:37+0200, Aldy Hernandez via Gcc-patches >> <gcc-patches@gcc.gnu.org> wrote: >> > When solving 0 = _15 & 1, we calculate _15 as: >> > >> > [irange] int [-INF, -2][0, +INF] NONZERO 0xfffffffe >> > >> > The known value of _15 is [0, 1] NONZERO 0x1 which is intersected with >> > the above, yielding: >> > >> > [0, 1] NONZERO 0x0 >> > >> > This eventually gets copied to a _Bool [0, 1] NONZERO 0x0. >> > >> > This is problematic because here we have a bool which is zero, but >> > returns false for irange::zero_p, since the latter does not look at >> > nonzero bits. This causes logical_combine to assume the range is >> > not-zero, and all hell breaks loose. >> > >> > I think we should just normalize a nonzero mask of 0 to [0, 0] at >> > creation, thus avoiding all this. >> >> 1. This commit r13-3217-gc4d15dddf6b9eacb36f535807ad2ee364af46e04 >> "[PR107195] Set range to zero when nonzero mask is 0" broke a GCC/nvptx >> offloading test case: >> >> UNSUPPORTED: libgomp.oacc-c/../libgomp.oacc-c-c++-common/nvptx-sese-1.c >> -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O0 >> PASS: libgomp.oacc-c/../libgomp.oacc-c-c++-common/nvptx-sese-1.c >> -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O2 >> (test for excess errors) >> PASS: libgomp.oacc-c/../libgomp.oacc-c-c++-common/nvptx-sese-1.c >> -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O2 >> execution test >> [-PASS:-]{+FAIL:+} >> libgomp.oacc-c/../libgomp.oacc-c-c++-common/nvptx-sese-1.c >> -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O2 >> scan-nvptx-none-offload-rtl-dump mach "SESE regions:.* >> [0-9]+{[0-9]+->[0-9]+(\\.[0-9]+)+}" >> >> Same for C++. >> >> I'll later send a patch (for the test case!) to fix that up. >> >> 2. Looking into this, I found that this >> commit r13-3217-gc4d15dddf6b9eacb36f535807ad2ee364af46e04 >> "[PR107195] Set range to zero when nonzero mask is 0" actually enables a >> code transformation/optimization that GCC apparently has not been doing >> before! I've tried to capture that in the attached >> "Add 'c-c++-common/torture/pr107195-1.c' [PR107195]". > > Nice. > >> Will you please verify that one? In its current '#if 1' configuration, >> it's all-PASS after commit >> r13-3217-gc4d15dddf6b9eacb36f535807ad2ee364af46e04 >> "[PR107195] Set range to zero when nonzero mask is 0", whereas before, we >> get two calls to 'foo', because GCC apparently didnn't understand the >> relation (optimization opportunity) between 'r *= 2;' and the subsequent >> 'if (r & 1)'. > > Yeah, that looks correct. We keep better track of nonzero masks.
OK, next observation: this also works for split-up expressions 'if ((r & 2) && (r & 1))' (same rationale as for 'if (r & 1)' alone). I've added such a variant in my test case. But: it doesn't work for logically equal 'if (r & 3)'. I've added such an XFAILed variant in my test case. Do you have guidance what needs to be done to make such cases work, too? >> I've left in the other '#if' variants in case you'd like to experiment >> with these, but would otherwise clean that up before pushing. >> >> Where does one put such a test case? >> >> Should the file be named 'pr107195' or something else? > > The aforementioned patch already has: > > * gcc.dg/tree-ssa/pr107195-1.c: New test. > * gcc.dg/tree-ssa/pr107195-2.c: New test. > > So I would just add a pr107195-3.c test. But note that unlike yours in 'gcc.dg/tree-ssa/', I had put mine into 'c-c++-common/torture/'. That's so that we get C and C++ testing, and all torture testing flag variants. (... where we do see the optimization happen starting at '-O1'.) Do you think that is excessive, and a single 'gcc.dg/tree-ssa/' test case, C only, '-O1' only is sufficient for this? (I don't have much experience with test cases in such regions of GCC, hence these questions.) >> Do we scan 'optimized', or an earlier dump? >> >> At '-O1', the actual code transformation is visible already in the 'dom2' >> dump: >> >> <bb 3> [local count: 536870913]: >> gimple_assign <mult_expr, r_7, r_6(D), 2, NULL> >> + gimple_assign <bit_and_expr, _11, r_7, 1, NULL> >> + goto <bb 6>; [100.00%] >> >> - <bb 4> [local count: 1073741824]: >> - # gimple_phi <r_4, r_6(D)(2), r_7(3)> >> + <bb 4> [local count: 536870912]: >> + # gimple_phi <r_4, r_6(D)(2)> >> gimple_assign <bit_and_expr, _2, r_4, 1, NULL> >> gimple_cond <ne_expr, _2, 0, NULL, NULL> >> - goto <bb 5>; [50.00%] >> + goto <bb 5>; [100.00%] >> else >> - goto <bb 6>; [50.00%] >> + goto <bb 6>; [0.00%] >> >> <bb 5> [local count: 536870913]: >> gimple_call <foo, _3, r_4> >> gimple_assign <plus_expr, r_8, _3, r_4, NULL> >> >> <bb 6> [local count: 1073741824]: >> - # gimple_phi <r_5, r_4(4), r_8(5)> >> + # gimple_phi <r_5, r_4(4), r_8(5), r_7(3)> >> gimple_return <r_5> >> >> And, the actual "avoid second call 'foo'" optimization is visiable >> starting 'dom3': >> >> <bb 3> [local count: 536870913]: >> gimple_assign <mult_expr, r_7, r_6(D), 2, NULL> >> + goto <bb 6>; [100.00%] >> >> - <bb 4> [local count: 1073741824]: >> - # gimple_phi <r_4, r_6(D)(2), r_7(3)> >> - gimple_assign <bit_and_expr, _2, r_4, 1, NULL> >> + <bb 4> [local count: 536870912]: >> + gimple_assign <bit_and_expr, _2, r_6(D), 1, NULL> >> gimple_cond <ne_expr, _2, 0, NULL, NULL> >> - goto <bb 5>; [50.00%] >> + goto <bb 5>; [100.00%] >> else >> - goto <bb 6>; [50.00%] >> + goto <bb 6>; [0.00%] >> >> <bb 5> [local count: 536870913]: >> - gimple_call <foo, _3, r_4> >> - gimple_assign <plus_expr, r_8, _3, r_4, NULL> >> + gimple_assign <integer_cst, _3, 0, NULL, NULL> >> + gimple_assign <ssa_name, r_8, r_6(D), NULL, NULL> >> >> <bb 6> [local count: 1073741824]: >> - # gimple_phi <r_5, r_4(4), r_8(5)> >> + # gimple_phi <r_5, r_6(D)(4), r_6(D)(5), r_7(3)> >> gimple_return <r_5> >> >> ..., but I don't know if either of those would be stable/appropriate to >> scan instead of 'optimized'? > > IMO, either dom3 or optimized is fine. OK, I left it at 'optimized', as I don't have any rationale why exactly it should happen in 'dom3' already. ;-) Grüße Thomas ----------------- Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955
>From 8b1a33933002ec9e9accdb41229b9b19a431b66a Mon Sep 17 00:00:00 2001 From: Thomas Schwinge <tho...@codesourcery.com> Date: Mon, 17 Oct 2022 09:10:03 +0200 Subject: [PATCH] Add 'c-c++-common/torture/pr107195-1.c' [PR107195] ... to display optimization performed as of recent commit r13-3217-gc4d15dddf6b9eacb36f535807ad2ee364af46e04 "[PR107195] Set range to zero when nonzero mask is 0". PR tree-optimization/107195 gcc/testsuite/ * c-c++-common/torture/pr107195-1.c: New. --- .../c-c++-common/torture/pr107195-1.c | 78 +++++++++++++++++++ 1 file changed, 78 insertions(+) create mode 100644 gcc/testsuite/c-c++-common/torture/pr107195-1.c diff --git a/gcc/testsuite/c-c++-common/torture/pr107195-1.c b/gcc/testsuite/c-c++-common/torture/pr107195-1.c new file mode 100644 index 00000000000..477eed6e959 --- /dev/null +++ b/gcc/testsuite/c-c++-common/torture/pr107195-1.c @@ -0,0 +1,78 @@ +/* Inspired by 'libgomp.oacc-c-c++-common/nvptx-sese-1.c'. */ + +/* { dg-additional-options -fdump-tree-optimized-raw } + { dg-skip-if {} { *-*-* } { {-flto -fno-fat-lto-objects} } { } } */ + +extern int +__attribute__((const)) +foo1 (int); + +int f1 (int r) +{ + if (foo1 (r)) /* If this first 'if' holds... */ + r *= 2; /* ..., 'r' now has a zero-value lower-most bit... */ + + if (r & 1) /* ..., so this second 'if' can never hold... */ + { /* ..., so this is unreachable. */ + /* In constrast, if the first 'if' does not hold ('foo1 (r) == 0'), the + second 'if' may hold, but we know ('foo1' being 'const') that + 'foo1 (r) == 0', so don't have to re-evaluate it here: */ + r += foo1 (r); + /* Thus, if optimizing, we only ever expect one call of 'foo1'. + { dg-final { scan-tree-dump-times {gimple_call <foo1,} 1 optimized { target __OPTIMIZE__ } } } + { dg-final { scan-tree-dump-times {gimple_call <foo1,} 2 optimized { target { ! __OPTIMIZE__ } } } } + */ + } + + return r; +} + + +extern int +__attribute__((const)) +foo2 (int); + +int f2 (int r) +{ + if (foo2 (r)) /* If this first 'if' holds... */ + r *= 2; /* ..., 'r' now has a zero-value lower-most bit... */ + + if ((r & 2) && (r & 1)) /* ..., so 'r & 1' in this second 'if' can never hold... */ + { /* ..., so this is unreachable. */ + /* In constrast, if the first 'if' does not hold ('foo2 (r) == 0'), the + second 'if' may hold, but we know ('foo2' being 'const') that + 'foo2 (r) == 0', so don't have to re-evaluate it here: */ + r += foo2 (r); + /* Thus, if optimizing, we only ever expect one call of 'foo2'. + { dg-final { scan-tree-dump-times {gimple_call <foo2,} 1 optimized { target __OPTIMIZE__ } } } + { dg-final { scan-tree-dump-times {gimple_call <foo2,} 2 optimized { target { ! __OPTIMIZE__ } } } } + */ + } + + return r; +} + + +extern int +__attribute__((const)) +foo3 (int); + +int f3 (int r) +{ + if (foo3 (r)) /* If this first 'if' holds... */ + r *= 2; /* ..., 'r' now has a zero-value lower-most bit... */ + + if (r & 3) /* ..., so this second 'if' can never hold... */ + { /* ..., so this is unreachable. */ + /* In constrast, if the first 'if' does not hold ('foo3 (r) == 0'), the + second 'if' may hold, but we know ('foo3' being 'const') that + 'foo3 (r) == 0', so don't have to re-evaluate it here: */ + r += foo3 (r); + /* Thus, if optimizing, we only ever expect one call of 'foo3'. + { dg-final { scan-tree-dump-times {gimple_call <foo3,} 1 optimized { target __OPTIMIZE__ xfail *-*-* } } } + { dg-final { scan-tree-dump-times {gimple_call <foo3,} 2 optimized { target { ! __OPTIMIZE__ } } } } + */ + } + + return r; +} -- 2.25.1