[PATCH, rs6000] Do not enable pcrel-opt by default

2021-06-22 Thread Aaron Sawdey via Gcc-patches
SPEC2017 testing on p10 shows that this optimization does not have a positive impact on performance. So we are no longer going to enable it by default. The test cases for it needed to be updated so they always enable it to test it. OK for trunk and backport to 11 if bootstrap/regtest passes? Than

[PATCH,rs6000] Fix p10 fusion regtests

2021-06-18 Thread Aaron Sawdey via Gcc-patches
From: Aaron Sawdey Update the count of matches for the fusion combine patterns after the recent changes to them. At Segher's request, used \m and \M in the match patterns. Also I have grouped together all alternatives of each fusion insn, which should hopefully make this test a little

[PATCH] Add needed earlyclobber to fusion patterns

2021-06-16 Thread Aaron Sawdey via Gcc-patches
The add-logical and add-add fusion patterns all have constraint alternatives "=0,1,&r,r" for the output (3). The inputs 0 and 1 are used in the first fusion instruction and then either may be reused as a temp for the output of the first insn which is input to the second. However, if input 2 is the

[PATCH,rs6000] Do not check if SMS succeeds on powerpc

2021-06-11 Thread Aaron Sawdey via Gcc-patches
These tests have become unstable and SMS either succeeds or doesn't depending on things like changes in instruction latency. Removing the scan-rtl-dump-times checks for powerpc*-*-*. If bootstrap/regtest passes, ok for trunk and backport to 11? Thanks! Aaron gcc/testsuite * gcc.dg

[PATCH,rs6000] Fix operand order to subf for p10 fusion.

2021-06-02 Thread Aaron Sawdey via Gcc-patches
This certainly causes a bootstrap miscompare, and might also be responsible for PR/100820. The operands to subf were reversed in the logical-add/sub fusion patterns, and I screwed up my bootstrap test which is how it ended up getting committed. If bootstrap and regtest passes, ok for trunk (and ev

[PATCH,rs6000] Fix p10 fusion test cases for -m32

2021-05-26 Thread Aaron Sawdey via Gcc-patches
For some reason this never showed up on gcc-patches, trying again. Aaron Sawdey, Ph.D. saw...@linux.ibm.com IBM Linux on POWER Toolchain > Begin forwarded message: > > From: Aaron Sawdey > Subject: [PATCH,rs6000] Fix p10 fusion test cases for -m32 > Date: May 25, 2021 at 1:45

Re: [PATCH,rs6000 2/2] Fusion patterns for add-logical/logical-add

2021-05-24 Thread Aaron Sawdey via Gcc-patches
One last addendum to this. I discovered that that needs a "sort" in front of "keys %logicals_addsub" because otherwise you may get the operators in different orders sometimes which leads to fusion.md having the patterns in different orders which isn't helpful for sane debugging. Segher and I discu

Re: [PATCH,rs6000] Test cases for p10 fusion patterns

2021-05-11 Thread Aaron Sawdey via Gcc-patches
Ping. Aaron Sawdey, Ph.D. saw...@linux.ibm.com IBM Linux on POWER Toolchain > On Apr 26, 2021, at 2:00 PM, acsaw...@linux.ibm.com wrote: > > From: Aaron Sawdey > > This adds some test cases to make sure that the combine patterns for p10 > fusion are working. > >

Re: [PATCH,rs6000] Add insn types for fusion pairs

2021-05-11 Thread Aaron Sawdey via Gcc-patches
Ping. In answer to Will’s question — some of these are not immediately used but will be in other pending patches. Aaron Sawdey, Ph.D. saw...@linux.ibm.com IBM Linux on POWER Toolchain > On Apr 26, 2021, at 1:04 PM, acsaw...@linux.ibm.com wrote: > > From: Aaron Sawdey > >

Re: [PATCH,rs6000 0/2] p10 add-add and add-logical fusion series

2021-05-11 Thread Aaron Sawdey via Gcc-patches
Ping. Aaron Sawdey, Ph.D. saw...@linux.ibm.com IBM Linux on POWER Toolchain > On Apr 26, 2021, at 3:21 PM, acsaw...@linux.ibm.com wrote: > > From: Aaron Sawdey > > Two more sets of combine patterns for p10 fusion. These require > the "Add insn types for fusion pairs

Re: [PATCH,rs6000] Optimize pcrel access of globals [ping]

2021-01-18 Thread Aaron Sawdey via Gcc-patches
Ping. Aaron Sawdey, Ph.D. saw...@linux.ibm.com IBM Linux on POWER Toolchain > On Dec 9, 2020, at 11:04 AM, acsaw...@linux.ibm.com wrote: > > From: Aaron Sawdey > > Ping. I've folded in the changes to comments suggested by Will Schmidt. > > This patch implements a

Re: [PATCH,rs6000] Test cases for p10 fusion patterns

2021-01-18 Thread Aaron Sawdey via Gcc-patches
Ping. Aaron Sawdey, Ph.D. saw...@linux.ibm.com IBM Linux on POWER Toolchain > On Jan 3, 2021, at 2:44 PM, Aaron Sawdey wrote: > > Ping. > > Aaron Sawdey, Ph.D. saw...@linux.ibm.com > IBM Linux on POWER Toolchain > > >> On Dec 11, 2020, at 1:53 PM, acsaw...@li

Re: [PATCH,rs6000] Fusion patterns for logical-logical

2021-01-18 Thread Aaron Sawdey via Gcc-patches
Ping. Aaron Sawdey, Ph.D. saw...@linux.ibm.com IBM Linux on POWER Toolchain > On Jan 3, 2021, at 2:43 PM, Aaron Sawdey wrote: > > Ping. > > Aaron Sawdey, Ph.D. saw...@linux.ibm.com > IBM Linux on POWER Toolchain > > >> On Dec 10, 2020, at 8:41 PM, acsaw...@li

Re: [PATCH,rs6000] Combine patterns for p10 load-cmpi fusion

2021-01-18 Thread Aaron Sawdey via Gcc-patches
Ping. Aaron Sawdey, Ph.D. saw...@linux.ibm.com IBM Linux on POWER Toolchain > On Jan 3, 2021, at 2:42 PM, Aaron Sawdey wrote: > > Ping. > > I assume we’re going to want a separate patch for the new instruction type. > > Aaron Sawdey, Ph.D. saw...@linux.ibm.com

Re: [PATCH,rs6000] Test cases for p10 fusion patterns

2021-01-03 Thread Aaron Sawdey via Gcc-patches
Ping. Aaron Sawdey, Ph.D. saw...@linux.ibm.com IBM Linux on POWER Toolchain > On Dec 11, 2020, at 1:53 PM, acsaw...@linux.ibm.com wrote: > > From: Aaron Sawdey > > This adds some test cases to make sure that the combine patterns for p10 > fusion are working. > >

Re: [PATCH,rs6000] Fusion patterns for logical-logical

2021-01-03 Thread Aaron Sawdey via Gcc-patches
Ping. Aaron Sawdey, Ph.D. saw...@linux.ibm.com IBM Linux on POWER Toolchain > On Dec 10, 2020, at 8:41 PM, acsaw...@linux.ibm.com wrote: > > From: Aaron Sawdey > > This patch adds a new function to genfusion.pl to generate patterns for > logical-logical fusion. They are

Re: [PATCH,rs6000] Combine patterns for p10 load-cmpi fusion

2021-01-03 Thread Aaron Sawdey via Gcc-patches
Ping. I assume we’re going to want a separate patch for the new instruction type. Aaron Sawdey, Ph.D. saw...@linux.ibm.com IBM Linux on POWER Toolchain > On Dec 4, 2020, at 1:19 PM, acsaw...@linux.ibm.com wrote: > > From: Aaron Sawdey > > This patch adds the first batch

Re: [PATCH] Additional small changes to support opaque modes

2020-11-20 Thread Aaron Sawdey via Gcc-patches
> On Nov 20, 2020, at 4:57 AM, Aaron Sawdey via Gcc-patches > wrote: > > >> On Nov 20, 2020, at 3:55 AM, Richard Sandiford >> wrote: >> >> acsawdey--- via Gcc-patches writes: >>> @@ -16767,7 +16768,7 @@ loc_descriptor (rtx rtl, machine_mode mod

Re: [PATCH] Additional small changes to support opaque modes

2020-11-20 Thread Aaron Sawdey via Gcc-patches
think it deserves a comment at least. > > The rest looks good to me FWIW. > > Richard I should look at this again — since I originally put that in, I switched the target portion of what I’ve been doing to use an UNSPEC to remove all use of an opaque mode const_int from the rtf. This may not be needed any more. Thanks, Aaron Aaron Sawdey, Ph.D. saw...@linux.ibm.com IBM Linux on POWER Toolchain

[PATCH,rs6000] Make MMA builtins use opaque modes [v2]

2020-11-19 Thread Aaron Sawdey via Gcc-patches
For some reason this patch never showed up on gcc-patches. Aaron Sawdey, Ph.D. saw...@linux.ibm.com IBM Linux on POWER Toolchain > Begin forwarded message: > > From: acsaw...@linux.ibm.com > Subject: [PATCH,rs6000] Make MMA builtins use opaque modes [v2] > Date: November 19, 2

Re: [PATCH,rs6000] Add patterns for combine to support p10 fusion

2020-11-04 Thread Aaron Sawdey via Gcc-patches
Ping. Aaron Sawdey, Ph.D. saw...@linux.ibm.com IBM Linux on POWER Toolchain > On Oct 26, 2020, at 4:44 PM, acsaw...@linux.ibm.com wrote: > > From: Aaron Sawdey > > This patch adds the first couple patterns to support p10 fusion. These > will allow combine to create a sing

Re: [PATCH] [PATCH] PR rtl-optimization/96791 Check precision of partial modes

2020-11-02 Thread Aaron Sawdey via Gcc-patches
id it. There is no solution like that for the MMA builtins that use POImode and are (in theory) exposed to the same problem. So I ask again, how can we tell extract_low_bits() that POImode is off limits to its prying fingers? Thanks, Aaron Aaron Sawdey, Ph.D. saw...@linux.ibm.com IBM Lin

Ping: [PATCH] PR rtl-optimization/96791 Check precision of partial modes

2020-10-05 Thread Aaron Sawdey via Gcc-patches
Not exactly a patch ping, but I was hoping we could re-engage the discussion on this and figure out how we can make POImode work for powerpc. How does x86 solve this? There was some suggestion that it has some similar situations? Thanks, Aaron Sawdey, Ph.D. saw...@linux.ibm.com IBM Linux

[PATCH][PR96791] disable POImode ld/st for memcpy [committed]

2020-09-10 Thread Aaron Sawdey via Gcc-patches
This is a (hopefully temporary) fix to PR96791. This will make the default be -mno-block-ops-vector-pair even on power10, so we will not hit the issue of DSE trying to truncate a POImode register. I am still concerned it will be possible to hit this because the MMA builtins will also generate POImo

Re: [PATCH] [PATCH] PR rtl-optimization/96791 Check precision of partial modes

2020-09-10 Thread Aaron Sawdey via Gcc-patches
So, would it be legitimate for extract_low_bits to query if the truncate pattern it will likely use is actually available? Aaron Sawdey, Ph.D. saw...@linux.ibm.com IBM Linux on POWER Toolchain > On Sep 10, 2020, at 10:10 AM, Segher Boessenkool > wrote: > > Hi! > > On Th

Re: [PATCH] [PATCH] PR rtl-optimization/96791 Check precision of partial modes

2020-09-10 Thread Aaron Sawdey via Gcc-patches
If it feels like a hack, that would be because it is a hack. What I’d really like to discuss is how to accomplish the real goal: keep anything from trying to do other operations (zero/sign extend for one) to POImode. Is there an existing mechanism for this? Thanks, Aaron Aaron Sawdey, Ph.D

[PATCH] [PATCH] PR rtl-optimization/96791 Check precision of partial modes

2020-09-09 Thread Aaron Sawdey via Gcc-patches
Now that the documentation for partial modes says they have a known number of bits of precision, would it make sense for extract_low_bits to check this before attempting to extract the bits? This would solve the problem we have been having with POImode and extract_low_bits -- DSE tries to use it t

[committed] rs6000: unaligned VSX in memcpy/memmove expansion

2020-08-18 Thread Aaron Sawdey via Gcc-patches
I've modified slightly per Will & Segher's comments, re-regstrapped and posting what I've actually committed. Aaron This patch adds a few new instructions to inline expansion of memcpy/memmove. Generation of all these is controlled by the option -mblock-ops-unaligned-vsx which is set on by def

[PATCH] rs6000: unaligned VSX in memcpy/memmove expansion

2020-08-14 Thread Aaron Sawdey via Gcc-patches
This patch adds a few new instructions to inline expansion of memcpy/memmove. Generation of all these is controlled by the option -mblock-ops-unaligned-vsx which is set on by default if the target has TARGET_EFFICIENT_UNALIGNED_VSX. * unaligned vsx load/store (V2DImode) * unaligned vsx pair load/

[PATCH] rs6000: clean up testsuite power10_hw check

2020-07-13 Thread Aaron Sawdey via Gcc-patches
Because the check for power10_hw is not called check_effective_target_power10_hw, it needs to be looked for by is-effective-target-keyword. Also reorder things in is-effective-target to put power10_hw with the other ppc stuff. These little fixes for power10 dejagnu support were pre-approved for tr

[PATCH] rs6000: add effective-target test ppc_mma_hw

2020-07-10 Thread Aaron Sawdey via Gcc-patches
Add a test for dejagnu to determine if execution of MMA instructions is supported in the test environment. Add an execution test to make sure that __builtin_cpu_supports("mma") is true if we can execute MMA instructions. OK for trunk and backport to 10? Thanks! Aaron gcc/testsuite/ *

[PATCH] rs6000: Add execution tests for mma builtins [v4]

2020-07-10 Thread Aaron Sawdey via Gcc-patches
est environment correctly identifies itself, and that it can execute MMA code and get the right answer. A future patch will add an effective-target test for powerpc_mma_hw, which these mma tests will also need to check for. OK for trunk and backport to 10? 2020-06-30 Rajalakshmi Srinivasaraghavan

Re: [PATCH] expr: Move reduce_bit_field target mode check [PR96151]

2020-07-10 Thread Aaron Sawdey via Gcc-patches
This fixed the ICE I was seeing, thanks. Aaron Sawdey, Ph.D. saw...@linux.ibm.com IBM Linux on POWER Toolchain > On Jul 10, 2020, at 10:40 AM, Richard Sandiford > wrote: > > In some cases, expand_expr_real_2 prefers to use the mode of the > caller-suggested target instead of

[PATCH] rs6000: Add execution tests for mma builtins. [v3]

2020-07-07 Thread Aaron Sawdey via Gcc-patches
Aaron 2020-06-30 Rajalakshmi Srinivasaraghavan Aaron Sawdey gcc/testsuite/ * gcc.target/powerpc/p10-identify.c: New file. * gcc.target/powerpc/mma-single-test.c: New file. * gcc.target/powerpc/mma-double-test.c: New file. --- .../gcc.target/powerp

[PATCH] rs6000: fix power10_hw test [v2]

2020-07-07 Thread Aaron Sawdey via Gcc-patches
The code snippet for this test was returning 1 if power10 instructions executed correctly. It should return 0 if the test passes. Approved offline by Segher with slight change. Will push after posting. * lib/target-supports.exp (check_power10_hw_available): Return 0 for passing t

[PATCH] rs6000: fix power10_hw test

2020-07-07 Thread Aaron Sawdey via Gcc-patches
The code snippet for this test was returning 1 if power10 instructions executed correctly. It should return 0 if the test passes. OK for trunk and backport to 10? Thanks, Aaron * lib/target-supports.exp (check_power10_hw_available): Return 0 for passing test. --- gcc/testsuit

[PATCH] rs6000: Add execution tests for mma builtins.

2020-07-07 Thread Aaron Sawdey via Gcc-patches
Updated slightly, removed -Wno-psabi as requested and also fixed the fact that it wasn't actually checking __builtin_cpu_is or __builtin_cpu_supports. OK for trunk and backport to 10? Thanks, Aaron 2020-06-30 Rajalakshmi Srinivasaraghavan Aaron Sawdey gcc/test

[PATCH] rs6000: Add execution tests for mma builtins.

2020-06-30 Thread Aaron Sawdey via Gcc-patches
. Actually the power10_hw test I think requires current glibc to pick up the change that lets __builtin_cpu_is("power10") work. OK for trunk? Thanks, Aaron 2020-06-30 Rajalakshmi Srinivasaraghavan Aaron Sawdey gcc/testsuite/ * gcc.target/powerpc/mma-single-test.c

[PATCH] rs6000: Allow --with-cpu=power10

2020-06-23 Thread Aaron Sawdey via Gcc-patches
Update config.gcc so that we can use --with-cpu=power10. I've tested that this does do the expected thing with --with-cpu=power10 and also that it still builds and bootstraps correctly using --with-cpu=power9 on power9. If there isn't any other testing I need to do for this, ok for trunk? Thanks

Re: [pushed][PATCH] identify lfs prefixed case PR95347

2020-06-15 Thread Aaron Sawdey via Gcc-patches
Now that this has been in trunk for a bit with no issues, ok to back port to 10? Aaron Sawdey, Ph.D. saw...@linux.ibm.com IBM Linux on POWER Toolchain > On Jun 3, 2020, at 4:10 PM, Aaron Sawdey wrote: > > This passed regstrap and was approved offline by Segher, posting > th

[pushed][PATCH] identify lfs prefixed case PR95347

2020-06-03 Thread Aaron Sawdey via Gcc-patches
This passed regstrap and was approved offline by Segher, posting the final form (minus my debug code, oops). The same problem also arises for plfs where prefixed_load_p() doesn't recognize it so we get just lfs in the asm output with an @pcrel address. PR target/95347 * config/rs6

[PATCH] rs6000: identify lfs prefixed case PR95347

2020-06-02 Thread Aaron Sawdey via Gcc-patches
The same problem also arises for plfs where prefixed_load_p() doesn't recognize it so we get just lfs in the asm output with a @pcrel address. OK for trunk if regstrap on ppc64le passes? Thanks, Aaron PR target/95347 * config/rs6000/rs6000.c (is_stfs_insn): Rename to

[PATCH] rs6000: PR target/95347 Correctly identify stfs if prefixed

2020-05-29 Thread Aaron Sawdey via Gcc-patches
en if we use NON_PREFIXED_DEFAULT, address_to_insn_form() can see that it has the PCREL symbol ref. OK for trunk if regstrap on ppc64le passes? Thanks, Aaron 2020-05-29 Aaron Sawdey PR target/95347 * config/rs6000/rs6000.c (prefixed_store_p): Add special case for

[PATCH][v3], rs6000: Use plq/pstq for atomic_{load, store} (PR94622)

2020-04-21 Thread Aaron Sawdey via Gcc-patches
, also with the doubleword swap, which was wrong. While adding comments I realized we have exactly the same problem with pstq/stq so I have added fixes for that as well. Assuming that regstrap passes, OK for trunk? Thanks, Aaron 2020-04-20 Aaron Sawdey PR target/94622

Re: [PATCH][v2], rs6000, PR/target 94622, Be more careful with plq for atomic_load

2020-04-20 Thread Aaron Sawdey via Gcc-patches
with the doubleword swap, which was wrong. So, of course you can't use set_attr with an if_then_else. The below code actually builds and passes regstrap on ppc64le power9. OK for trunk? Thanks, Aaron 2020-04-20 Aaron Sawdey PR target/94622 * config/rs6000/sy

[PATCH], rs6000, PR/target 94622, Be more careful with plq for atomic_load

2020-04-20 Thread Aaron Sawdey via Gcc-patches
the doubleword swap, which was wrong. OK for trunk if regstrap passes on ppc64le power9? Thanks, Aaron 2020-04-20 Aaron Sawdey PR target/94622 * config/rs6000/sync.md (load_quadpti): Make this have attr prefixed if TARGET_PREFIXED. (atomic_load): Do not

[PATCH][rs6000][PR92379] fix UB shift of 64-bit type by 64 bits

2020-03-13 Thread Aaron Sawdey via Gcc-patches
This is a fix for PR92379. Passes regstrap on ppc64le. Pre-approved by Segher, committing after posting. 2020-03-13  Aaron Sawdey     PR target/92379     * config/rs6000/rs6000.c (num_insns_constant_multi) Don't shift a     64-bit value by 64 bits (UB). diff --git a/gcc/config/rs6000/r
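For readers unfamiliar with this class of bug: in C, shifting a 64-bit value by 64 or more bits is undefined behavior, so code that computes its shift count at run time has to handle that case explicitly. The following is only a minimal sketch of such a guard. The helper name `shift_right_u64` is hypothetical and is not taken from rs6000.c or from the patch.

```c
#include <stdint.h>

/* Hypothetical helper (not from rs6000.c): shift VALUE right by COUNT
   bits without invoking undefined behavior when COUNT is 64 or more.  */
uint64_t
shift_right_u64 (uint64_t value, unsigned int count)
{
  /* Shifting by an amount greater than or equal to the width of the
     type is undefined in C, so treat it as shifting every bit out.  */
  if (count >= 64)
    return 0;
  return value >> count;
}
```

A caller that may legitimately ask for a full-width shift can then use the helper instead of writing `value >> count` directly.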

Re: [PATCH] Use movmem optab to attempt inline expansion of __builtin_memmove()

2019-10-03 Thread Aaron Sawdey
On 10/2/19 5:44 PM, Aaron Sawdey wrote: > On 10/2/19 5:35 PM, Jakub Jelinek wrote: >> On Wed, Oct 02, 2019 at 09:21:23AM -0500, Aaron Sawdey wrote: >>>>> 2019-09-27 Aaron Sawdey >>>>> >>>>> * builtins.c (expand_builtin_memory_copy_args):

Re: [PATCH] Use movmem optab to attempt inline expansion of __builtin_memmove()

2019-10-02 Thread Aaron Sawdey
On 10/2/19 5:35 PM, Jakub Jelinek wrote: > On Wed, Oct 02, 2019 at 09:21:23AM -0500, Aaron Sawdey wrote: >>>> 2019-09-27 Aaron Sawdey >>>> >>>>* builtins.c (expand_builtin_memory_copy_args): Add might_overlap parm. >>>>

Re: [PATCH] Use movmem optab to attempt inline expansion of __builtin_memmove()

2019-10-02 Thread Aaron Sawdey
On 10/1/19 4:45 PM, Jeff Law wrote: > On 9/27/19 12:23 PM, Aaron Sawdey wrote: >> This is the third piece of my effort to improve inline expansion of memmove. >> The >> first two parts I posted back in June fixed the names of the optab entries >> involved so that opta

[PATCH, RS6000] Add movmemsi pattern for inline expansion of memmove()

2019-09-30 Thread Aaron Sawdey
ress on ppc64le (power9), if tests are ok, is this ok for trunk after the movmem optab patch posted last week is approved? Thanks! Aaron 2019-09-30 Aaron Sawdey * config/rs6000/rs6000-protos.h (expand_block_move): Change prototype. * config/rs6000/rs6000-string.c (expand_b

[PATCH] Use movmem optab to attempt inline expansion of __builtin_memmove()

2019-09-27 Thread Aaron Sawdey
e might_overlap case. Bootstrap/regtest passed on ppc64le, in progress on x86_64. If everything passes, is this ok for trunk? 2019-09-27 Aaron Sawdey * builtins.c (expand_builtin_memory_copy_args): Add might_overlap parm. (expand_builtin_memcpy): Us

[PATCH] Add movmem optab entry back in for overlapping moves

2019-07-02 Thread Aaron Sawdey
bootstrap/regtest on ppc64le and x86_64. Ok for trunk? 2019-07-02 Aaron Sawdey * optabs.def (movmem_optab): Add movmem back for memmove(). * doc/md.texi: Add description of movmem pattern for overlapping move. Index: gcc/doc/md.texi
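The need for a separate pattern for overlapping moves mirrors the difference between memcpy and memmove in C: when source and destination may overlap, the copy must run in a direction that never overwrites source bytes before they have been read. The sketch below illustrates only that idea. The function name `move_bytes` is hypothetical, and this is not how the movmem expander or any target pattern is implemented.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical illustration of an overlap-safe byte move.  Copies N
   bytes from SRC to DST and returns DST, like memmove.  */
void *
move_bytes (void *dst, const void *src, size_t n)
{
  unsigned char *d = dst;
  const unsigned char *s = src;

  if ((uintptr_t) d <= (uintptr_t) s)
    {
      /* Destination starts at or below the source: copy forward, so
         any source byte that gets overwritten has already been read.  */
      for (size_t i = 0; i < n; i++)
        d[i] = s[i];
    }
  else
    {
      /* Destination starts above the source: copy backward for the
         same reason.  */
      for (size_t i = n; i > 0; i--)
        d[i - 1] = s[i - 1];
    }
  return dst;
}
```

A plain forward-only copy is enough for the non-overlapping (memcpy-style) case, which is why the two cases can reasonably be expanded differently.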

Re: [PATCH 32/30] Document movmem/cpymem changes in gcc-10/changes.html

2019-06-27 Thread Aaron Sawdey
On 6/25/19 4:43 PM, Jeff Law wrote: > On 6/25/19 2:22 PM, acsaw...@linux.ibm.com wrote: >> From: Aaron Sawdey >> >> * builtins.c (get_memory_rtx): Fix comment. >> * optabs.def (movmem_optab): Change to cpymem_optab. >> * expr.c (emit_block_move_via

[PATCH 31/30] Update documentation for movmem to cpymem change

2019-06-26 Thread Aaron Sawdey
On 6/25/19 4:43 PM, Jeff Law wrote: > On 6/25/19 2:22 PM, acsaw...@linux.ibm.com wrote: >> From: Aaron Sawdey >> >> * builtins.c (get_memory_rtx): Fix comment. >> * optabs.def (movmem_optab): Change to cpymem_optab. >> * expr.c (emit_block_move_via

Re: [PATCH] sched-ebb.c: avoid moving table jumps (PR rtl-optimization/88423)

2019-02-18 Thread Aaron Sawdey
On 2/18/19 10:41 AM, Alexander Monakov wrote: > On Mon, 18 Feb 2019, Aaron Sawdey wrote: > >> The code in emit_case_dispatch_table() will very clearly always emit >> branch/label/jumptable_data/barrier >> so this does need to be handled. So, yes tablejump always looks

Re: [PATCH] sched-ebb.c: avoid moving table jumps (PR rtl-optimization/88423)

2019-02-18 Thread Aaron Sawdey
/ppc32), ok for trunk? 2019-02-18 Aaron Sawdey PR rtl-optimization/88347 * schedule-ebb.c (begin_move_insn): Apply Segher's patch to handle a jump table before the barrier. On 1/24/19 9:43 AM, Alexander Monakov wrote: > On Wed, 23 Jan 2019, Alexander Monakov wrote:

[PATCH] PR rtl-optimization/88308 Update LABEL_NUSES in move_insn_for_shrink_wrap

2019-02-13 Thread Aaron Sawdey
e (32/64) and x86_64? Thanks! Aaron 2019-02-13 Aaron Sawdey * shrink-wrap.c (move_insn_for_shrink_wrap): Fix LABEL_NUSES counts on copied instruction. Index: gcc/shrink-wrap.c === --- gcc/shrink-wrap.c (revi

Re: [PATCH, rs6000] PR target/89112 put branch probabilities on branches generated by inline expansion

2019-02-08 Thread Aaron Sawdey
Missed two more conditional branches created by inline expansion that should have had branch probability notes. 2019-02-08 Aaron Sawdey * config/rs6000/rs6000-string.c (expand_compare_loop, expand_block_compare): Insert REG_BR_PROB notes in inline expansion of memcmp

[PATCH, rs6000] PR target/89112 put branch probabilities on branches generated by inline expansion

2019-02-04 Thread Aaron Sawdey
, which is what caused the long branches in 89112. With this patch, the test case for 89112 does not have any long branches within the expansion of memcmp, and the code for each memcmp is contiguous. OK for trunk and 8 backport if bootstrap/regtest passes? Thanks! Aaron 2019-02-04 Aaron Sawdey

[PATCH, rs6000] PR target/89112 [8/9 Regression] fix bdnzt pattern for long branch case

2019-02-02 Thread Aaron Sawdey
backport to 8? Thanks! 2019-02-02 Aaron Sawdey * config/rs6000/rs6000.md (tf_): generate a local label for the long branch case. Index: gcc/config/rs6000/rs6000.md === --- gcc/config/rs6000/rs6000.md (revision 268403

Re: [PATCH][rs6000] avoid using unaligned vsx or lxvd2x/stxvd2x for memcpy/memmove inline expansion

2019-01-14 Thread Aaron Sawdey
The patch for this was committed to trunk as 267562 (see below). Is this also ok for backport to 8? Thanks, Aaron On 12/20/18 5:44 PM, Segher Boessenkool wrote: > On Thu, Dec 20, 2018 at 05:34:54PM -0600, Aaron Sawdey wrote: >> On 12/20/18 3:51 AM, Segher Boessenkool wrote: >

Re: [PATCH][rs6000] avoid using unaligned vsx or lxvd2x/stxvd2x for memcpy/memmove inline expansion

2018-12-20 Thread Aaron Sawdey
On 12/20/18 5:44 PM, Segher Boessenkool wrote: > On Thu, Dec 20, 2018 at 05:34:54PM -0600, Aaron Sawdey wrote: >> On 12/20/18 3:51 AM, Segher Boessenkool wrote: >>> On Wed, Dec 19, 2018 at 01:53:05PM -0600, Aaron Sawdey wrote: >>>> Because of POWER9 dd2.1 iss

Re: [PATCH][rs6000] avoid using unaligned vsx or lxvd2x/stxvd2x for memcpy/memmove inline expansion

2018-12-20 Thread Aaron Sawdey
On 12/20/18 3:51 AM, Segher Boessenkool wrote: > On Wed, Dec 19, 2018 at 01:53:05PM -0600, Aaron Sawdey wrote: >> Because of POWER9 dd2.1 issues with certain unaligned vsx instructions >> to cache inhibited memory, here is a patch that keeps memmove (and memcpy) >> inline

Re: [PATCH] -Wtautological-compare: fix comparison of macro expansions

2018-12-20 Thread Aaron Sawdey
it to trunk as r267299. > > Aaron, does this fix the issue you saw? > > Thanks, and sorry again about the breakage. > Dave > Dave, Thanks for the quick response, the build issue is fixed with r267299. Aaron -- Aaron Sawdey, Ph.D. acsaw...@linux.vnet.ibm.com 050-2/C113 (5

Re: [PATCH 2/2] v2: C++: improvements to binary operator diagnostics (PR c++/87504)

2018-12-19 Thread Aaron Sawdey
ee_code, tree, >> enum tree_code, tree, enum tree_code, tree); >> -extern void warn_tautological_cmp (location_t, enum tree_code, tree, tree); >> +extern void warn_tautological_cmp (const op_location_t &, enum tree_code, >> +   tree, tree); >>   extern void warn_logical_not_parentheses (location_t, enum tree_code, tree, >>     tree); >>   extern bool warn_if_unused_value (const_tree, location_t); >> diff --git a/gcc/c-family/c-warn.c b/gcc/c-family/c-warn.c >> index fc7f87c..fce9d84 100644 >> --- a/gcc/c-family/c-warn.c >> +++ b/gcc/c-family/c-warn.c >> @@ -322,7 +322,8 @@ find_array_ref_with_const_idx_r (tree *expr_p, int *, >> void *) >>       if ((TREE_CODE (expr) == ARRAY_REF >> || TREE_CODE (expr) == ARRAY_RANGE_REF) >> -  && TREE_CODE (TREE_OPERAND (expr, 1)) == INTEGER_CST) >> +  && (TREE_CODE (tree_strip_any_location_wrapper (TREE_OPERAND (expr, >> 1))) >> +  == INTEGER_CST)) >>   return integer_type_node; > > I think we want fold_for_warn here.  OK with that change (assuming it passes). > > Jason > -- Aaron Sawdey, Ph.D. acsaw...@linux.vnet.ibm.com 050-2/C113 (507) 253-7520 home: 507/263-0782 IBM Linux Technology Center - PPC Toolchain

[PATCH][rs6000] avoid using unaligned vsx or lxvd2x/stxvd2x for memcpy/memmove inline expansion

2018-12-19 Thread Aaron Sawdey
://patchwork.ozlabs.org/patch/814059/ OK for trunk if bootstrap/regtest ok? Thanks! Aaron 2018-12-19 Aaron Sawdey * config/rs6000/rs6000-string.c (expand_block_move): Don't use unaligned vsx and avoid lxvd2x/stxvd2x. (gen_lvx_v4si_move): New function. Index: gcc/config/r

Re: [PATCH][rs6000] better use of unaligned vsx in memset() expansion

2018-11-28 Thread Aaron Sawdey
align >= 128) - || (bytes >= 32 && TARGET_EFFICIENT_UNALIGNED_VSX))) + && (bytes >= 16 && ( align >= 128 || unaligned_vsx_ok))) { clear_bytes = 16; mode = V4SImode; On 11/26/18 4:29 PM, Segher Boessenkool wrote: > On Mon, Nov 26,
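The diff fragment above replaces a test that was re-evaluated against a shrinking byte count with a flag decided once. The following sketch shows why that matters. It assumes the flag is computed from the original size as `bytes >= 32 && efficient_unaligned_vsx`; that definition, the function name `count_vector_clears`, and the scalar fallback sizes are all assumptions made for illustration and are not copied from rs6000-string.c.

```c
#include <stdbool.h>

/* Hypothetical model of a block-clear loop.  Returns how many 16-byte
   vector stores would be used to clear BYTES bytes with the given
   alignment in bits.  */
unsigned int
count_vector_clears (unsigned int bytes, unsigned int align_bits,
                     bool efficient_unaligned_vsx)
{
  /* Decide once, from the original size, whether unaligned vector
     stores are acceptable.  Checking "bytes >= 32" inside the loop
     would stop using them for the tail of the block, because BYTES
     is decremented on every iteration.  */
  bool unaligned_vsx_ok = bytes >= 32 && efficient_unaligned_vsx;
  unsigned int vector_clears = 0;

  while (bytes > 0)
    {
      unsigned int clear_bytes;

      if (bytes >= 16 && (align_bits >= 128 || unaligned_vsx_ok))
        {
          clear_bytes = 16;
          vector_clears++;
        }
      else if (bytes >= 8)
        clear_bytes = 8;
      else if (bytes >= 4)
        clear_bytes = 4;
      else if (bytes >= 2)
        clear_bytes = 2;
      else
        clear_bytes = 1;

      bytes -= clear_bytes;
    }
  return vector_clears;
}
```

In this model a 48-byte unaligned clear uses three vector stores, while a 16-byte unaligned clear uses none, since the flag is false for blocks smaller than 32 bytes.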

Re: [PATCH][rs6000][8 backport] improve gpr inline expansion of str[n]cmp

2018-11-26 Thread Aaron Sawdey
gtest on a couple different ppc64 architectures (unless anyone has any objections). Thanks, Aaron 2018-11-26 Aaron Sawdey Backport from mainline 2018-10-25 Aaron Sawdey * config/rs6000/rs6000-string.c (expand_strncmp_gpr_sequence): Change to a shorter seq

[PATCH][rs6000] better use of unaligned vsx in memset() expansion

2018-11-26 Thread Aaron Sawdey
d vsx for the last 32 bytes of any block being cleared. So this change puts the test up front so it is not affected by the decrement of bytes. OK for trunk if regstrap passes? Thanks! Aaron 2018-11-26 Aaron Sawdey * config/rs6000/rs6000-string.c (expand_block_clear): Chang

Re: [PATCH][rs6000] inline expansion of memcmp using vsx

2018-11-15 Thread Aaron Sawdey
On 11/15/18 4:02 AM, Richard Biener wrote: > On Wed, Nov 14, 2018 at 5:43 PM Aaron Sawdey wrote: >> >> This patch generalizes some of the functions added earlier to do vsx expansion >> of strncmp >> so that they can also generate the code needed for memcmp. I reorganized

[PATCH][rs6000] inline expansion of memcmp using vsx

2018-11-14 Thread Aaron Sawdey
r than the gpr inline code if the strings are equal and is comparable if the strings have a 10% chance of being equal (spread across the string). Currently regtesting, ok for trunk if tests pass? Thanks! Aaron 2018-11-14 Aaron Sawdey * config/rs6000/rs6000-string.c (emit_vsx_zero

[PATCH][rs6000] use index form addresses more often for l[wh]brx/st[wh]brx

2018-11-05 Thread Aaron Sawdey
power8/power9, ok for trunk? Thanks! Aaron 2018-11-05 Aaron Sawdey * config/rs6000/rs6000.md (bswap2): Force address into register if not in indexed or indirect form. (bswap2_load): Change predicate to indexed_or_indirect_operand. (bswap2_store): Ditto

[PATCH][rs6000] fix ICE for strncmp expansion on power6

2018-11-02 Thread Aaron Sawdey
(load_mode, tmp_reg_src2, addr2, orig_src2); /* We must always left-align the data we read, and -- Aaron Sawdey, Ph.D. acsaw...@linux.vnet.ibm.com 050-2/C113 (507) 253-7520 home: 507/263-0782 IBM Linux Technology Center - PPC Toolchain

[PATCH][rs6000] cleanup and rename rs6000_address_for_fpconvert

2018-11-01 Thread Aaron Sawdey
incoming rtx which matches what the insns this is used to prepare for are using as their predicate. Bootstrap/regtest passes on ppc64le (power7, power9), ok for trunk? 2018-11-01 Aaron Sawdey * config/rs6000/rs6000-protos.h (rs6000_address_for_fpconvert): Remove prototype

Re: [PATCH][rs6000] use index form addresses more often for ldbrx/stdbrx

2018-10-30 Thread Aaron Sawdey
I had to make one more change to make this actually work. In rs6000_force_indexed_or_indirect_mem() it was necessary to return the updated rtx. Bootstrap/regtest passes on ppc64le (power7, power9), ok for trunk? Thanks! Aaron 2018-10-30 Aaron Sawdey * config/rs6000/rs6000.md

Re: [PATCH][rs6000] use index form addresses more often for ldbrx/stdbrx

2018-10-29 Thread Aaron Sawdey
On 10/27/18 12:52 PM, Segher Boessenkool wrote: > Hi Aaron, > > On Sat, Oct 27, 2018 at 11:20:01AM -0500, Aaron Sawdey wrote: >> --- gcc/config/rs6000/rs6000.md (revision 265393) >> +++ gcc/config/rs6000/rs6000.md (working copy) >> @@ -2512,9 +2512,27 @@

[PATCH][rs6000] use index form addresses more often for ldbrx/stdbrx

2018-10-27 Thread Aaron Sawdey
ut I have other cases where it will update them if there is more register pressure. in either case the code is more compact and makes full use of the indexed addressing of ldbrx. Bootstrap/regtest passed on ppc64le targeting power7/power8/power9, ok for trunk? Thanks! Aaron 2018-10-27 Aa
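As a rough illustration of the kind of source this affects, consider a byte-reversed 64-bit load from a base-plus-index address. The sketch below is an assumption-laden example, not a test case from the patch: the function name `load_reversed_u64` is hypothetical, `__builtin_bswap64` is the GCC/Clang builtin, and whether the compiler actually selects ldbrx with an indexed address depends on the target and options.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical example: load 8 bytes from BASE + INDEX and return them
   with the byte order reversed.  */
uint64_t
load_reversed_u64 (const unsigned char *base, size_t index)
{
  uint64_t value;

  /* memcpy keeps the load well defined for any alignment; compilers
     typically turn it into a single load.  */
  memcpy (&value, base + index, sizeof value);
  return __builtin_bswap64 (value);
}
```

Keeping the base and index as separate values in the source gives the compiler the chance to use both registers directly in an indexed-form instruction rather than adding them first.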

[PATCH][rs6000] improve gpr inline expansion of str[n]cmp

2018-10-25 Thread Aaron Sawdey
is faster for long strings that do not differ, but that isn't important because if vsx is enabled, the gpr sequence is only used for 15 bytes or less. Bootstrap/regtest passes on ppc64le (power8, power9), ppc64 (power8) and ppc32 (power8). Ok for trunk? Thanks, Aaron 2018-10-25 Aaron S

Re: [PATCH][rs6000][PR target/87474] fix strncmp expansion with -mno-power8-vector

2018-10-02 Thread Aaron Sawdey
On 10/2/18 3:38 AM, Segher Boessenkool wrote: > On Mon, Oct 01, 2018 at 11:09:44PM -0500, Aaron Sawdey wrote: >> PR/87474 happens because I didn't check that both vector and VSX instructions >> were enabled, so insns that are disabled get generated with >> -mno-po

[PATCH][rs6000][PR target/87474] fix strncmp expansion with -mno-power8-vector

2018-10-01 Thread Aaron Sawdey
PR/87474 happens because I didn't check that both vector and VSX instructions were enabled, so insns that are disabled get generated with -mno-power8-vector. Regstrap passes on ppc64le, ok for trunk? Thanks! Aaron 2018-10-01 Aaron Sawdey PR target/87474 * config/r

[PATCH, rs6000] inline expansion of str[n]cmp using vec/vsx instructions

2018-08-22 Thread Aaron Sawdey
ppc64le (power8 and power9). Ok for trunk? Thanks! Aaron 2018-08-22 Aaron Sawdey * config/rs6000/altivec.md (altivec_eq): Remove star. * config/rs6000/rs6000-string.c (do_load_for_compare): Support vector load modes. (expand_strncmp_vec_sequence): New function

[PATCH, rs6000] refactor/cleanup in rs6000-string.c

2018-07-31 Thread Aaron Sawdey
Just teasing things apart a bit more in this function so I can add vec/vsx code generation without making it enormous and incomprehensible. Bootstrap/regtest passes on powerpc64le, ok for trunk? Thanks, Aaron 2018-07-31 Aaron Sawdey * config/rs6000/rs6000-string.c

[PATCH, rs6000] don't use unaligned vsx for memset of less than 32 bytes

2018-06-25 Thread Aaron Sawdey
U2006 runs show the performance regression is fixed. Regstrap passes on powerpc64le, ok for trunk and backport to 8? Thanks, Aaron 2018-06-25 Aaron Sawdey * config/rs6000/rs6000-string.c (expand_block_clear): Don't use unaligned vsx for 16B memset. -- Aaron Sa

[PATCH, rs6000] PR target/86222 fix truncation issue with constants when compiling -m32

2018-06-21 Thread Aaron Sawdey
backport to 8? Thanks, Aaron 2018-06-19 Aaron Sawdey * config/rs6000/rs6000-string.c (expand_strn_compare): Handle -m32 correctly. -- Aaron Sawdey, Ph.D. acsaw...@linux.vnet.ibm.com 050-2/C113 (507) 253-7520 home: 507/263-0782 IBM Linux Technology Center - PPC

[PATCH, rs6000] cleanup/refactor in rs6000-string.c

2018-06-14 Thread Aaron Sawdey
-- ok for trunk? Thanks! Aaron 2018-06-14 Aaron Sawdey * config/rs6000/rs6000-string.c (select_block_compare_mode): Check TARGET_EFFICIENT_OVERLAPPING_UNALIGNED here instead of in caller. (do_and3, do_and3_mask, do_compb3, do_rotl3): New func

Re: [PATCH] rs6000 PR83660 fix ICE with vec_extract

2018-04-23 Thread Aaron Sawdey
This also affects gcc 7 and is fixed by the same patch. I've tested the backport to 7 on ppc64le and it causes no new fails. OK for backport to 7 (and 6 if it's also needed there)? Thanks, Aaron On Fri, 2018-04-13 at 15:37 -0500, Aaron Sawdey wrote: > Per the discussion on th

[PATCH] rs6000 PR83660 fix ICE with vec_extract

2018-04-13 Thread Aaron Sawdey
PR in there, it has side effects and this problem will not occur. Doing bootstrap/regtest on ppc64le with -mcpu=power7 since that is where this issue arises. OK for trunk if everything passes? Thanks, Aaron 2018-04-13 Aaron Sawdey PR target/83660 * config/rs6000/rs600

[PATCH, rs6000] PR85321 improve documentation of -mcall and -mtraceback=

2018-04-10 Thread Aaron Sawdey
invoke.texi. This is the last piece for 85321. Testing in progress on linux-ppc64le, ok for trunk if tests are ok? Thanks, Aaron 2018-04-10 Aaron Sawdey PR target/85321 * doc/invoke.texi (RS/6000 and PowerPC Options): Document options -mcall= and -mtraceback. Remove

[PATCH, committed] Update my MAINTAINERS entry

2018-04-10 Thread Aaron Sawdey
Update to my new email address. Committed as 259301. 2018-04-10 Aaron Sawdey * MAINTAINERS: Update my email address. -- Aaron Sawdey, Ph.D. acsaw...@linux.vnet.ibm.com 050-2/C113 (507) 253-7520 home: 507/263-0782 IBM Linux Technology Center - PPC Toolchain Index: MAINTAINERS

[PATCH] rs6000: document options (PR85321)

2018-04-10 Thread Aaron Sawdey
This updates invoke.texi to document -mblock-compare-inline-limit, -mblock-compare-inline-loop-limit, and -mstring-compare-inline-limit. Tested with "make pdf", ok for trunk? 2018-04-10 Aaron Sawdey PR target/85321 * doc/invoke.texi (RS/6000 and PowerPC Options)

[PATCH, rs6000] PR target/83822 fix redundant conditions

2018-03-29 Thread Aaron Sawdey
I've fixed the redundant conditions in the expressions pointed out by 83822. Bootstrap/regtest passes on ppc64le, ok for trunk? Aaron 2018-03-29 Aaron Sawdey PR target/83822 * config/rs6000/rs6000-string.c (expand_compare_loop): Fix redundant cond

PR target/84743 adjust reassociation widths for power8/power9

2018-03-12 Thread Aaron Sawdey
% Bottom line is net improvement for CPU2017 int compared with either current trunk, or disabling parallel reassociation. For CPU2017 fp, very small overall degradation. Currently doing regstrap on ppc64le, ok for trunk if results look good? Thanks! Aaron 2018-03-12 Aaron Sawdey

Re: [PATCH][AArch64] PR84114: Avoid reassociating FMA

2018-02-27 Thread Aaron Sawdey
so it would be nice to be able to avoid causing issues as a result of that. -- Aaron Sawdey, Ph.D. acsaw...@linux.vnet.ibm.com 050-2/C113 (507) 253-7520 home: 507/263-0782 IBM Linux Technology Center - PPC Toolchain

Re: [PATCH, rs6000][PR debug/83758] v2 rs6000_internal_arg_pointer should only return a register

2018-01-30 Thread Aaron Sawdey
otstrap, go tests run. Segher is currently regtesting on ppc64le power9. OK for trunk if tests pass? 2018-01-30 Aaron Sawdey * config/rs6000/rs6000.c (rs6000_internal_arg_pointer): Only return a reg rtx. -- Aaron Sawdey, Ph.D. acsaw...@linux.vnet.ibm.com 050-2/C113 (507) 253

Re: [PATCH][PR debug/83758] look more carefully for internal_arg_pointer in vt_add_function_parameter()

2018-01-30 Thread Aaron Sawdey
>args.internal_arg_pointer) > in var-tracking.c. > rs6000/powerpcspe with -fsplit-stack are the only cases where > crtl->args.internal_arg_pointer is not a REG, so just running libgo > testsuite on powerpc{,64,64le} should cover it all. I'll give this a try today when I get to the office. Thanks, Aaron > > Jakub > -- Aaron Sawdey, Ph.D. acsaw...@linux.vnet.ibm.com 050-2/C113 (507) 253-7520 home: 507/263-0782 IBM Linux Technology Center - PPC Toolchain

[PATCH][PR debug/83758] look more carefully for internal_arg_pointer in vt_add_function_parameter()

2018-01-29 Thread Aaron Sawdey
ok on ppc64le and x86_64, ok for trunk? 2018-01-29 Aaron Sawdey * var-tracking.c (vt_add_function_parameter): Fix comparison of rtx. Index: gcc/var-tracking.c === --- gcc/var-tracking.c (revision 257159) +++ gcc/var-tracking.
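The ChangeLog entry says only "Fix comparison of rtx", and the snippet is cut off before the diff, so the actual change is not shown here. As general background on why comparing expression nodes can go wrong, the sketch below contrasts pointer identity with structural equality on a toy expression type. The `expr` struct and `expr_equal` function are hypothetical and are not GCC's rtx representation or API.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy expression node, for illustration only.  */
enum expr_code { EXPR_REG, EXPR_CONST, EXPR_PLUS };

struct expr {
    enum expr_code code;
    long value;              /* register number or constant; unused for EXPR_PLUS */
    const struct expr *op0;  /* operands; NULL for leaf nodes */
    const struct expr *op1;
};

/* Structural equality: two distinct nodes compare equal when they
   have the same code and equal contents, recursively.  */
bool expr_equal(const struct expr *a, const struct expr *b)
{
    if (a == b)
        return true;                 /* same node, trivially equal */
    if (a == NULL || b == NULL)
        return false;
    if (a->code != b->code)
        return false;
    if (a->code != EXPR_PLUS)
        return a->value == b->value; /* leaves: compare payload */
    return expr_equal(a->op0, b->op0) && expr_equal(a->op1, b->op1);
}
```

A test written as `a == b` only succeeds when both sides are the very same node, so it can miss a match when one side is a separately built or wrapped expression; which of the two comparisons is appropriate depends on what the caller is trying to recognize.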

[PATCH] reduce runtime of gcc.dg/memcmp-1.c test

2018-01-10 Thread Aaron Sawdey
This brings it back not quite to where it was but a lot more reasonable than what I put into 256351. 2018-01-10 Aaron Sawdey * gcc.dg/memcmp-1.c: Reduce runtime to something reasonable. OK for trunk? Thanks, Aaron -- Aaron Sawdey, Ph.D. acsaw...@linux.vnet.ibm.com 050-2/C113

Re: [PATCH, rs6000] generate loop code for memcmp inline expansion

2018-01-10 Thread Aaron Sawdey
I'll check the runtime of that --- I added some test cases to memcmp-1.c and probably it is now taking too long. I will revise it so it's no longer than it was before. Aaron On Wed, 2018-01-10 at 14:25 +, Szabolcs Nagy wrote: > On 08/01/18 19:37, Aaron Sawdey wrote: >

Re: [PATCH, rs6000] generate loop code for memcmp inline expansion

2018-01-08 Thread Aaron Sawdey
On Tue, 2017-12-12 at 10:13 -0600, Segher Boessenkool wrote: > Please fix those trivialities, and it's okay for trunk (after the > rtlanal patch is approved too). Thanks! Here's the final version of this, which is committed as 256351. 2018-01-08 Aaron Sawdey * conf
