[PATCH, rs6000] Add two peephole2 patterns for mr. insn

2023-05-29 Thread HAO CHEN GUI via Gcc-patches
Hi, By checking the object files of SPECint, I found that two kinds of compare/move can't be combined to "mr." pattern as there is no register link between them. The patch adds two peephole2 patterns for them. Bootstrapped and tested on powerpc64-linux BE and LE with no regressions. Thanks Gu

[PATCHv2, rs6000] Add two peephole2 patterns for mr. insn

2023-06-11 Thread HAO CHEN GUI via Gcc-patches
Hi, This patch adds two peephole2 patterns which help convert certain insn sequences to "mr." instruction. These insn sequences can't be combined in combine pass. Compared to last version, it adds a new mode iterator "Q" which should be used for dot instruction. With "-m32/-mpowerpc64" set, th

[PATCHv3, rs6000] Add two peephole2 patterns for mr. insn

2023-06-13 Thread HAO CHEN GUI via Gcc-patches
Hi, This patch adds two peephole2 patterns which help convert certain insn sequences to "mr." instruction. These insn sequences can't be combined in combine pass. Compared to last version, it changes the new mode iterator name from "Q" to "WORD". Bootstrapped and tested on powerpc64-linux B

[PATCH-2, rs6000] Don't widen shift mode when target has rotate/mask instruction on original mode [PR93738]

2023-07-18 Thread HAO CHEN GUI via Gcc-patches
Hi, The patch relies on the first patch. The reason for the change is also described in the first patch. This patch implements the target hook have_rotate_and_mask. It also modifies some test cases. The regression of rlwimi-2.c is fixed. For rlwinm-0.c and rlwinm-2.c, one more 32bit rotate/mask ins

[PATCH-1, combine] Don't widen shift mode when target has rotate/mask instruction on original mode [PR93738]

2023-07-18 Thread HAO CHEN GUI via Gcc-patches
Hi, The shift mode will be widened in the combine pass if the operand has a normal subreg. But when the target already has rotate/mask/insert instructions on the narrow mode, it's unnecessary to widen the mode for lshiftrt. As the lshiftrt is commonly converted to rotate/mask insn, the widen mode block

Ping [PATCH v7, rs6000] Implemented f[min/max]_optab by xs[min/max]dp [PR103605]

2023-07-18 Thread HAO CHEN GUI via Gcc-patches
Hi, As the ticket (PR107013, adding fmin/max to RTL code) is suspended, I ping this patch. The unspec of fmin/max can be replaced with corresponding RTL code after that ticket is fixed. https://gcc.gnu.org/pipermail/gcc-patches/2022-September/602181.html Thanks Gui Haochen On 2022/9/26 11:35, H

Re: [PATCH-1, combine] Don't widen shift mode when target has rotate/mask instruction on original mode [PR93738]

2023-07-20 Thread HAO CHEN GUI via Gcc-patches
Hi Jeff, On 2023/7/21 5:27, Jeff Law wrote: > Wouldn't it make more sense to just try rotate/mask in the original mode > before trying a shift in a widened mode? I'm not sure why we need a target > hook here. There is no change to try rotate/mask with the original mode when expensive_optimizations

Re: [PATCH-1, combine] Don't widen shift mode when target has rotate/mask instruction on original mode [PR93738]

2023-07-20 Thread HAO CHEN GUI via Gcc-patches
Sorry for the typo s/change/chance On 2023/7/21 8:59, HAO CHEN GUI wrote: > Hi Jeff, > > On 2023/7/21 5:27, Jeff Law wrote: >> Wouldn't it make more sense to just try rotate/mask in the original mode >> before trying a shift in a widened mode? I'm not sure why we need a target >> hook here. > > There

[PATCHv2, rs6000] Generate mfvsrwz for all subtargets and remove redundant zero extend [PR106769]

2023-07-20 Thread HAO CHEN GUI via Gcc-patches
Hi, This patch modifies vsx extract expand and generates mfvsrwz/stxsiwx for all subtargets when the mode is V4SI and the index of extracted element is 1 for BE and 2 for LE. Also this patch adds an insn pattern for mfvsrwz which can help eliminate redundant zero extend. Compared to last versio

[PATCHv2, rs6000] Generate mfvsrwz for all subtargets and remove redundant zero extend [PR106769]

2023-07-24 Thread HAO CHEN GUI via Gcc-patches
Hi, This patch modifies vsx extract expand and generates mfvsrwz/stxsiwx for all subtargets when the mode is V4SI and the index of extracted element is 1 for BE and 2 for LE. Also this patch adds an insn pattern for mfvsrwz which helps eliminate redundant zero extend. Compared to last version,

[PATCH v3, rs6000] Disable TImode from Bool expanders [PR100694, PR93123]

2022-07-03 Thread HAO CHEN GUI via Gcc-patches
Hi, This patch fails TImode for all 128-bit logical operation expanders. So TImode splits to two DI registers during expand. Potential optimizations can be taken after expand pass. Originally, the TImode logical operations are split after reload pass. It's too late. The test case illustrates it.

Ping [PATCH v6, rs6000] Implemented f[min/max]_optab by xs[min/max]dp [PR103605]

2022-07-03 Thread HAO CHEN GUI via Gcc-patches
Hi, Gentle ping this: https://gcc.gnu.org/pipermail/gcc-patches/2022-June/597158.html Thanks. On 24/6/2022 10:02 AM, HAO CHEN GUI wrote: > Hi, > This patch implements optab f[min/max]_optab by xs[min/max]dp on rs6000. > Tests show that outputs of xs[min/max]dp are consistent with the standard

Re: [PATCH v2, rs6000] Use CC for BCD operations [PR100736]

2022-07-03 Thread HAO CHEN GUI via Gcc-patches
Hi, Gentle ping this: https://gcc.gnu.org/pipermail/gcc-patches/2022-June/597020.html Thanks. On 22/6/2022 4:26 PM, HAO CHEN GUI wrote: > Hi, > This patch uses CC instead of CCFP for all BCD operations. Thus, infinite > math flag has no impact on BCD operations. To support BCD overflow and >

[PATCH v2] Modify combine pattern by a pseudo AND with its nonzero bits [PR93453]

2022-07-07 Thread HAO CHEN GUI via Gcc-patches
Hi, This patch modifies the combine pattern after recog fails. With a helper - change_pseudo_and_mask, it converts a single pseudo to the pseudo AND with a mask when the outer operator is IOR/XOR/PLUS and inner operator is ASHIFT or AND. The conversion helps pattern to match rotate and mask insn

Re: [PATCH v2] Modify combine pattern by a pseudo AND with its nonzero bits [PR93453]

2022-07-10 Thread HAO CHEN GUI via Gcc-patches
Hi, Segher On 8/7/2022 1:31 AM, Segher Boessenkool wrote: >> --- a/gcc/testsuite/gcc.target/powerpc/rlwimi-2.c >> +++ b/gcc/testsuite/gcc.target/powerpc/rlwimi-2.c >> @@ -2,14 +2,14 @@ >> /* { dg-options "-O2" } */ >> >> /* { dg-final { scan-assembler-times {(?n)^\s+[a-z]} 14121 { target ilp32 }

Re: [PATCH v3, rs6000] Disable TImode from Bool expanders [PR100694, PR93123]

2022-07-18 Thread HAO CHEN GUI via Gcc-patches
Hi Segher, Thanks for your comments. On 13/7/2022 1:26 AM, Segher Boessenkool wrote: >> --- a/gcc/config/rs6000/rs6000.md >> +++ b/gcc/config/rs6000/rs6000.md >> @@ -7078,27 +7078,38 @@ (define_expand "subti3" >> }) >> >> ;; 128-bit logical operations expanders >> +;; Fail TImode in all 128

[PATCH v3] Modify combine pattern by a pseudo AND with its nonzero bits [PR93453]

2022-07-22 Thread HAO CHEN GUI via Gcc-patches
Hi, This patch creates a new function - change_pseudo_and_mask. If recog fails, the function converts a single pseudo to the pseudo AND with a mask if the outer operator is IOR/XOR/PLUS and inner operator is ASHIFT or AND. The conversion helps pattern to match rotate and mask insn on some targets

[PATCH, rs6000] Add multiply-add expand pattern [PR103109]

2022-07-24 Thread HAO CHEN GUI via Gcc-patches
Hi, This patch adds an expand and several insns for multiply-add with three 64bit operands. Bootstrapped and tested on powerpc64-linux BE and LE with no regressions. Is this okay for trunk? Any recommendations? Thanks a lot. ChangeLog 2022-07-22 Haochen Gui gcc/ PR target/103109

Ping [PATCH v3] Modify combine pattern by a pseudo AND with its nonzero bits [PR93453]

2022-07-31 Thread HAO CHEN GUI via Gcc-patches
Hi, Gentle ping this: https://gcc.gnu.org/pipermail/gcc-patches/2022-July/598685.html Thanks. On 22/7/2022 3:07 PM, HAO CHEN GUI wrote: > Hi, > This patch creates a new function - change_pseudo_and_mask. If recog fails, > the function converts a single pseudo to the pseudo AND with a mask if

Ping [PATCH, rs6000] Add multiply-add expand pattern [PR103109]

2022-07-31 Thread HAO CHEN GUI via Gcc-patches
Hi, Gentle ping this: https://gcc.gnu.org/pipermail/gcc-patches/2022-July/598744.html Thanks On 25/7/2022 1:11 PM, HAO CHEN GUI wrote: > Hi, > This patch adds an expand and several insns for multiply-add with > three 64bit operands. > > Bootstrapped and tested on powerpc64-linux BE and LE

Ping^2 [PATCH v2, rs6000] Use CC for BCD operations [PR100736]

2022-07-31 Thread HAO CHEN GUI via Gcc-patches
Hi, Gentle ping this: https://gcc.gnu.org/pipermail/gcc-patches/2022-June/597020.html Thanks. On 4/7/2022 2:33 PM, HAO CHEN GUI wrote: > Hi, > Gentle ping this: > https://gcc.gnu.org/pipermail/gcc-patches/2022-June/597020.html > Thanks. > > On 22/6/2022 4:26 PM, HAO CHEN GUI wrote: >> Hi,

Ping^2 [PATCH v6, rs6000] Implemented f[min/max]_optab by xs[min/max]dp [PR103605]

2022-07-31 Thread HAO CHEN GUI via Gcc-patches
Hi, Gentle ping this: https://gcc.gnu.org/pipermail/gcc-patches/2022-June/597158.html Thanks. On 4/7/2022 2:32 PM, HAO CHEN GUI wrote: > Hi, > Gentle ping this: > https://gcc.gnu.org/pipermail/gcc-patches/2022-June/597158.html > Thanks. > > On 24/6/2022 10:02 AM, HAO CHEN GUI wrote: >> Hi,

[PATCH, rs6000] TARGET_MADDLD should include TARGET_POWERPC64

2022-08-03 Thread HAO CHEN GUI via Gcc-patches
Hi, This patch changes the definition of TARGET_MADDLD and includes TARGET_POWERPC64, since maddld is a 64 bit instruction. maddld-1.c now checks "has_arch_ppc64". It depends on a patch which fixes empty TU problem. https://gcc.gnu.org/pipermail/gcc-patches/2022-July/598744.html Bootstrappe

Re: [PATCH, rs6000] TARGET_MADDLD should include TARGET_POWERPC64

2022-08-03 Thread HAO CHEN GUI via Gcc-patches
Hi Segher, On 4/8/2022 12:54 AM, Segher Boessenkool wrote: > Hrm. But the maddld insn is useful for SImode as well, in 32-bit mode, > it is just its name that is a bit confusing then. Sorry for confusing > things :-( > > Add a test for SImode maddld as well? Thanks for your comments. Just w

[PATCH, rs6000] Correct return value of check_p9modulo_hw_available

2022-08-04 Thread HAO CHEN GUI via Gcc-patches
Hi, This patch corrects return value of check_p9modulo_hw_available. It should return 0 when p9modulo is supported. Bootstrapped and tested on powerpc64-linux BE and LE with no regressions. Is this okay for trunk? Any recommendations? Thanks a lot. ChangeLog 2022-08-04 Haochen Gui gcc/tes

Re: [PATCH, rs6000] Correct return value of check_p9modulo_hw_available

2022-08-04 Thread HAO CHEN GUI via Gcc-patches
Hi Segher, Thanks so much for your explanation. Now I have a clear picture about the usage of return value. Patch was committed as r13-1971. Thanks Gui Haochen On 5/8/2022 1:09 AM, Segher Boessenkool wrote: > Hi! > > On Thu, Aug 04, 2022 at 05:55:20PM +0800, HAO CHEN GUI wrote: >> This patc

[PATCH v2, rs6000] Add multiply-add expand pattern [PR103109]

2022-08-07 Thread HAO CHEN GUI via Gcc-patches
Hi, This patch adds an expand and several insns for multiply-add with three 64bit operands. Compared with last version, the main changes are: 1 The "maddld" pattern is reused for the low-part generation. 2 A runnable testcase replaces the original compiling case. 3 Fixes indentation problems.

Re: [PATCH v2, rs6000] Add multiply-add expand pattern [PR103109]

2022-08-09 Thread HAO CHEN GUI via Gcc-patches
Hi Segher, Thanks for your comments. I checked the cost table. For P9 and P10, the cost of all mul* insn is the same, not relevant to the size of operand. I will split the test case to one compiling and one runnable case. Thanks. Gui Haochen On 10/8/2022 5:43 AM, Segher Boessenkool wrote: >

Re: [PATCH v3] Modify combine pattern by a pseudo AND with its nonzero bits [PR93453]

2022-08-10 Thread HAO CHEN GUI via Gcc-patches
Hi Segher, Really appreciate your review comments. On 11/8/2022 1:38 AM, Segher Boessenkool wrote: > Hi! > > Sorry for the tardiness. > > On Fri, Jul 22, 2022 at 03:07:55PM +0800, HAO CHEN GUI wrote: >> This patch creates a new function - change_pseudo_and_mask. If recog fails, >> the functi

[PATCHv4, rs6000] Generate mfvsrwz for all subtargets and remove redundant zero extend [PR106769]

2023-08-13 Thread HAO CHEN GUI via Gcc-patches
Hi, This patch modifies vsx extract expand and generates mfvsrwz/stxsiwx for all sub targets when the mode is V4SI and the extracted element is word 1 from BE order. Also this patch adds an insn pattern for mfvsrwz which helps eliminate redundant zero extend. Compared to last version, the main

Re: [PATCH, rs6000] Skip redundant vector extract if the element is first element of dword0 [PR110429]

2023-08-15 Thread HAO CHEN GUI via Gcc-patches
Committed after tweaking and testing. https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;h=d471bdb0453de7b738f49148b66d57cb5871937d Thanks Gui Haochen On 2023/7/28 17:32, Kewen.Lin wrote: > Hi Haochen, > > on 2023/7/5 11:22, HAO CHEN GUI wrote: >> Hi, >> This patch skips redundant vector extract insn to

Re: [PATCHv4, rs6000] Generate mfvsrwz for all subtargets and remove redundant zero extend [PR106769]

2023-08-15 Thread HAO CHEN GUI via Gcc-patches
Committed after fixing the comments. https://gcc.gnu.org/g:a79cf858b39e01c80537bc5d47a5e9004418c267 Thanks Gui Haochen On 2023/8/14 15:47, Kewen.Lin wrote: > Hi Haochen, > > on 2023/8/14 10:18, HAO CHEN GUI wrote: >> Hi, >> This patch modifies vsx extract expand and generates mfvsrwz/stxsiwx >> f

Re: [PATCH-1, combine] Don't widen shift mode when target has rotate/mask instruction on original mode [PR93738]

2023-08-20 Thread HAO CHEN GUI via Gcc-patches
Jeff, Thanks a lot for your comments. The widen shift mode is on i1/i2 before they're combined with i3 to newpat. The newpat matches rotate/mask pattern. The i1/i2 itself don't match rotate/mask pattern. I did an experiment to disable widen shift mode for lshiftrt. I tested it on powerpc/x8

[PATCHv2, rs6000] Extract the element in dword0 by mfvsrd and shift/mask [PR110331]

2023-08-21 Thread HAO CHEN GUI via Gcc-patches
Hi, This patch implements the vector element extraction by mfvsrd and shift/mask when the element is in dword0 of the vector. Originally, it generates vsplat/mfvsrd on P8 and li/vextract on P9. Since mfvsrd has lower latency than vextract and rldicl has lower latency than vsplat, the new sequence

[PATCH, rs6000] Generate mfvsrwz for all platforms and remove redundant zero extend [PR106769]

2023-06-18 Thread HAO CHEN GUI via Gcc-patches
Hi, This patch modifies vsx extract expander and generates mfvsrwz/stxsiwx for all platforms when the mode is V4SI and the index of extracted element is 1 for BE and 2 for LE. Also this patch adds an insn pattern for mfvsrwz which can help eliminate redundant zero extend. Bootstrapped and teste

Re: [PATCH, rs6000] Add two peephole2 patterns for mr. insn

2023-06-19 Thread HAO CHEN GUI via Gcc-patches
HP, It makes sense. I will update the patch. Thanks Gui Haochen On 2023/6/20 8:07, Hans-Peter Nilsson wrote: > On Tue, 30 May 2023, HAO CHEN GUI via Gcc-patches wrote: > >> +++ b/gcc/config/rs6000/rs6000.md >> @@ -7891,6 +7891,36 @@ (define_insn "*mov_internal2"

[PATCHv4, rs6000] Add two peephole2 patterns for mr. insn

2023-06-20 Thread HAO CHEN GUI via Gcc-patches
Hi, This patch adds two peephole2 patterns which help convert certain insn sequences to "mr." instruction. These insn sequences can't be combined in combine pass. Compared to last version, the empty constraint is removed and test cases run only on powerpc Linux as AIX doesn't support "-mregnam

[PATCHv4, rs6000] Splat vector small V2DI constants with ISA 2.07 instructions [PR104124]

2023-06-24 Thread HAO CHEN GUI via Gcc-patches
Hi, This patch adds a new insn for vector splat with small V2DI constants on P8. If the value of constant is in RANGE (-16, 15) and not 0 or -1, it can be loaded with vspltisw and vupkhsw on P8. It should be more efficient than loading vector from memory. Compared to last version, the main change i

[PATCH, rs6000] Extract the element in dword0 by mfvsrd and shift/mask [PR110331]

2023-07-02 Thread HAO CHEN GUI via Gcc-patches
Hi, This patch implements the vector element extraction by mfvsrd and shift/mask when the element is in dword0 of the vector. Originally, it generates vsplat/mfvsrd on P8 and li/vextract on P9. Since mfvsrd has lower latency than vextract and rldicl has lower latency than vsplat, the new sequence

[PATCH, rs6000] Skip redundant vector extract if the element is first element of dword0 [PR110429]

2023-07-04 Thread HAO CHEN GUI via Gcc-patches
Hi, This patch skips redundant vector extract insn to be generated when the extracted element is the first element of dword0 and the destination is a memory operand. Only one 'stxsi[hb]x' instruction is enough. The V4SImode is fixed in a previous patch. https://gcc.gnu.org/pipermail/gcc-patche

[PATCH, rs6000] Merge two vector shift when their sources are the same

2023-02-20 Thread HAO CHEN GUI via Gcc-patches
Hi, This patch merges two "vsldoi" insns when their sources are the same. Particularly, it is simplified to be one move if the total shift is a multiple of 16 bytes. Bootstrapped and tested on powerpc64-linux BE and LE with no regressions. Thanks Gui Haochen ChangeLog 2023-02-20 Haochen Gui

Ping [PATCH, rs6000] Split TImode for logical operations in expand pass [PR100694]

2023-02-20 Thread HAO CHEN GUI via Gcc-patches
Hi, Gently ping this: https://gcc.gnu.org/pipermail/gcc-patches/2023-February/611550.html Gui Haochen Thanks On 2023/2/8 13:08, HAO CHEN GUI wrote: > Hi, > The logical operations for TImode are split after reload pass right now. Some > potential optimizations miss as the split is too late. This

[PATCHv2, rs6000] Merge two vector shift when their sources are the same

2023-02-27 Thread HAO CHEN GUI via Gcc-patches
Hi, This patch merges two "vsldoi" insns when their sources are the same. Particularly, it is simplified to be one move if the total shift is a multiple of 16 bytes. Bootstrapped and tested on powerpc64-linux BE and LE with no regressions. Thanks Gui Haochen ChangeLog 2023-02-28 Haochen Gui

[PATCH, gfortran] Escalate failure when Hollerith constant to real conversion fails [PR103628]

2023-02-28 Thread HAO CHEN GUI via Gcc-patches
Hi, The patch escalates the failure when Hollerith constant to real conversion fails in native_interpret_expr. It finally reports an "Unclassifiable statement" error. The patch of pr95450 added a verification for decoding/encoding checking in native_interpret_expr. native_interpret_expr may fa

[PATCHv2, gfortran] Escalate failure when Hollerith constant to real conversion fails [PR103628]

2023-03-03 Thread HAO CHEN GUI via Gcc-patches
Hi, The patch escalates the failure when Hollerith constant to real conversion fails in native_interpret_expr. It finally reports an "Unclassifiable statement" error. The patch of pr95450 added a verification for decoding/encoding checking in native_interpret_expr. native_interpret_expr may fa

Re: [PATCHv2, gfortran] Escalate failure when Hollerith constant to real conversion fails [PR103628]

2023-03-03 Thread HAO CHEN GUI via Gcc-patches
Hi, The patch passed regression test on Power linux platforms. Sorry for missing the information. Gui Haochen On 2023/3/3 17:12, HAO CHEN GUI via Gcc-patches wrote: > Hi, > The patch escalates the failure when Hollerith constant to real conversion > fails in native_interpret_expr. I

Re: [PATCHv2, gfortran] Escalate failure when Hollerith constant to real conversion fails [PR103628]

2023-03-03 Thread HAO CHEN GUI via Gcc-patches
Hi Tobias, On 2023/3/3 17:29, Tobias Burnus wrote: > But could you also include the 'gcc/fortran/intrinsic.cc' change > proposed in > https://gcc.gnu.org/pipermail/gcc-patches/2023-March/613030.html (and > acknowledged by Steve)? Sure, I will merge it into the patch and do the regression test. Addi

[PATCHv3, gfortran] Escalate failure when Hollerith constant to real conversion fails [PR103628]

2023-03-07 Thread HAO CHEN GUI via Gcc-patches
Hi, The patch escalates the failure when Hollerith constant to real conversion fails in native_interpret_expr. It finally reports a "Cannot simplify expression" error in do_simplify method. The patch of pr95450 added a verification for decoding/encoding checking in native_interpret_expr. nati

Re: [PATCH] testsuite, rs6000: Adjust ppc-fortran.exp to support dg-{warning,error}

2023-03-10 Thread HAO CHEN GUI via Gcc-patches
Hi Kewen, I tested it with my fortran test case. It works. Thanks a lot. Gui Haochen On 2023/3/6 17:27, Kewen.Lin wrote: > Hi, > > According to Haochen's finding in [1], currently ppc-fortran.exp > doesn't support Fortran specific warning or error messages well. > By looking into it, it's due to t

[PATCH-1, rs6000] Put constant into pseudo at expand when it needs two insns [PR86106]

2023-03-15 Thread HAO CHEN GUI via Gcc-patches
Hi, Currently, rs6000 directly expands to 2 insns if an integer constant is the second operand and it needs two insns. For example, addi/addis and ori/oris. It may not benefit when the constant is used more than 2 times in an extended basic block, as the case in the PR shows. One possib

[PATCH-2, rs6000] Put constant into pseudo at expand when it needs two insns [PR86106]

2023-03-15 Thread HAO CHEN GUI via Gcc-patches
Hi, The background and motivation of the patch are listed in the note of PATCH-1. This patch changes the expander of ior/xor and forces the constant into a pseudo when it needs 2 insns. Also a combine and split pattern for ior/xor is defined. rtx_cost of ior insn is adjusted as now it may have 2 insns

Re: [PATCH-1, rs6000] Put constant into pseudo at expand when it needs two insns [PR86106]

2023-03-16 Thread HAO CHEN GUI via Gcc-patches
Hi Richard, On 2023/3/16 15:57, Richard Biener wrote: > So this is one way around the lack of CSE/PRE of constant operands. I'd > argue that a better spot for this _might_ be LRA (split the constant out if > there's a free register available), postreload-[g]cse (CSE the constants) and > then maybe cp

[PATCHv3, rs6000] Splat vector small V2DI constants with ISA 2.07 instructions [PR104124]

2023-05-25 Thread HAO CHEN GUI via Gcc-patches
Hi, This patch adds a new insn for vector splat with small V2DI constants on P8. If the value of constant is in RANGE (-16, 15) and not 0 or -1, it can be loaded with vspltisw and vupkhsw on P8. It should be more efficient than loading vector from memory. Compared to last version, the main change i

[PATCH, rs6000] Optimization for PowerPC 64bit constant generation [PR94395]

2021-01-28 Thread HAO CHEN GUI via Gcc-patches
Hi,    This patch tries to optimize PowerPC 64 bit constant generation when the constant can be transformed from a 32 bit or 16 bit constant by rotating, shifting and mask AND.    The attachments are the patch diff file and change log file.    Bootstrapped and tested on powerpc64le with no r

Re: [PATCH, rs6000] Optimization for PowerPC 64bit constant generation [PR94395]

2021-02-02 Thread HAO CHEN GUI via Gcc-patches
Alan, Thanks for your info. Just notice your patch. I will wait for your patch being reviewed. On 3/2/2021 10:32 AM, Alan Modra wrote: On Fri, Jan 29, 2021 at 11:11:23AM +0800, HAO CHEN GUI via Gcc-patches wrote: This patch tries to optimize PowerPC 64 bit constant generation when

[PATCH, rs6000] Disable gimple fold for float or double vec_minmax when fast-math is not set

2021-08-24 Thread HAO CHEN GUI via Gcc-patches
Hi    The patch disables gimple fold for float or double vec_min/max builtin when fast-math is not set. Two test cases are added to verify the patch.    The attachments are the patch diff and change log file.    Bootstrapped and tested on powerpc64le-linux with no regressions. Is this okay

Re: [PATCH, rs6000] Disable gimple fold for float or double vec_minmax when fast-math is not set

2021-08-25 Thread HAO CHEN GUI via Gcc-patches
Hi,     I refined the patch according to Bill's advice. I pasted the ChangeLog and diff file here. If it doesn't work, please let me know. Thanks. 2021-08-25 Haochen Gui gcc/     * config/rs6000/rs6000-call.c (rs6000_gimple_fold_builtin):     Modify the VSX_BUILTIN_XVMINDP, ALTIVEC_BUILTIN_

Re: [PATCH, rs6000] Disable gimple fold for float or double vec_minmax when fast-math is not set

2021-08-25 Thread HAO CHEN GUI via Gcc-patches
Hi Kewen, Thanks for your advice. On 25/8/2021 3:50 PM, Kewen.Lin wrote: Hi Haochen, on 2021/8/25 3:06 PM, HAO CHEN GUI via Gcc-patches wrote: Hi, I refined the patch according to Bill's advice. I pasted the ChangeLog and diff file here. If it doesn't work, please let me kn

Re: [PATCH, rs6000] Disable gimple fold for float or double vec_minmax when fast-math is not set

2021-08-25 Thread HAO CHEN GUI via Gcc-patches
On 25/8/2021 4:17 PM, HAO CHEN GUI via Gcc-patches wrote: Hi Kewen, Thanks for your advice. On 25/8/2021 3:50 PM, Kewen.Lin wrote: Hi Haochen, on 2021/8/25 3:06 PM, HAO CHEN GUI via Gcc-patches wrote: Hi, I refined the patch according to Bill's advice. I pasted the ChangeLo

Re: [PATCH, rs6000] Disable gimple fold for float or double vec_minmax when fast-math is not set

2021-08-25 Thread HAO CHEN GUI via Gcc-patches
Hi Bill,    Thanks for your comments. Hi Segher,    Here is the ChangeLog and patch diff. Thanks. 2021-08-25 Haochen Gui gcc/     * config/rs6000/rs6000-call.c (rs6000_gimple_fold_builtin):     Modify the VSX_BUILTIN_XVMINDP, ALTIVEC_BUILTIN_VMINFP,     VSX_BUILTIN_XVMAXDP, ALTIVEC_BUILTIN_V

[PATCH] Put absolute address jump table in data.rel.ro.local if targets support relocations

2020-09-13 Thread HAO CHEN GUI via Gcc-patches
Hi,   Jump tables are put into text or rodata section originally. On some platforms, it gains the performance benefit from absolute address jump tables. So I want to let absolute address jump table be relocatable.  This patch puts absolute jump table in read only relocation section if the tar

Re: Do we need to do a loop invariant motion after loop interchange ?

2020-09-21 Thread HAO CHEN GUI via Gcc-patches
Bin, I just tested your patch on current trunk.  Here is my summary. 1. About some iv aren't moved out of inner loop (Lijia mentioned in his last email)   [local count: 955630226]:   # l_32 = PHI <1(12), l_54(21)>   # ivtmp_165 = PHI <_446(12), ivtmp_155(21)>   _26 = (integer(kind=8)) l_32;  

Re: [PATCH, rs6000] Add non-relative jump table support on Power Linux

2020-09-27 Thread HAO CHEN GUI via Gcc-patches
Segher, Gentle ping this: https://gcc.gnu.org/pipermail/gcc-patches/2020-September/553486.html Thanks Gui Haochen On 9/9/2020 4:55 PM, HAO CHEN GUI wrote: Hi Segher, Thanks for your advice. I removed macros defined in linux64.h and linux.h. So they take relative jump tables b

Re: [PATCH] Put absolute address jump table in data.rel.ro.local if targets support relocations

2020-09-27 Thread HAO CHEN GUI via Gcc-patches
Hi, Gentle ping this: https://gcc.gnu.org/pipermail/gcc-patches/2020-September/553809.html Thanks Gui Haochen On 14/9/2020 11:01 AM, HAO CHEN GUI wrote: Hi, Jump tables are put into text or rodata section originally. On some platforms, it gains the performance benefit from absolute addre

[PATCH, rs6000] Splat vector small V2DI constants with ISA 2.07 instructions [PR104124]

2022-09-20 Thread HAO CHEN GUI via Gcc-patches
Hi, This patch adds a new insn for vector splat with small V2DI constants on P8. If the value of constant is in RANGE (-16, 15) and not 0 or -1, it can be loaded with vspltisw and vupkhsw on P8. It should be more efficient than loading vector from TOC. Bootstrapped and tested on powerpc64-linux BE

Ping^3 [PATCH v6, rs6000] Implemented f[min/max]_optab by xs[min/max]dp [PR103605]

2022-09-20 Thread HAO CHEN GUI via Gcc-patches
Hi, Gentle ping this: https://gcc.gnu.org/pipermail/gcc-patches/2022-June/597158.html Thanks. On 1/8/2022 10:03 AM, HAO CHEN GUI wrote: > Hi, > Gentle ping this: > https://gcc.gnu.org/pipermail/gcc-patches/2022-June/597158.html > Thanks. > > > On 4/7/2022 2:32 PM, HAO CHEN GUI wrote: >> H

Ping^3 [PATCH v2, rs6000] Use CC for BCD operations [PR100736]

2022-09-20 Thread HAO CHEN GUI via Gcc-patches
Hi, Gentle ping this: https://gcc.gnu.org/pipermail/gcc-patches/2022-June/597020.html Thanks. On 1/8/2022 10:02 AM, HAO CHEN GUI wrote: > Hi, > Gentle ping this: > https://gcc.gnu.org/pipermail/gcc-patches/2022-June/597020.html > Thanks. > > On 4/7/2022 2:33 PM, HAO CHEN GUI wrote: >> H

Ping [PATCH v3, rs6000] Change mode and insn condition for VSX scalar extract/insert instructions

2022-09-20 Thread HAO CHEN GUI via Gcc-patches
Hi, Gentle ping this: https://gcc.gnu.org/pipermail/gcc-patches/2022-September/601196.html Thanks. On 7/9/2022 3:44 PM, HAO CHEN GUI wrote: > Hi, > > For scalar extract/insert instructions, exponent field can be stored in a > 32-bit register. So this patch changes the mode of exponent fiel

Re: [PATCH v6, rs6000] Implemented f[min/max]_optab by xs[min/max]dp [PR103605]

2022-09-22 Thread HAO CHEN GUI via Gcc-patches
Hi Kewen & Segher, Thanks so much for your review comments. On 22/9/2022 10:28 AM, Kewen.Lin wrote: > on 2022/9/22 05:56, Segher Boessenkool wrote: >> Hi! >> >> On Fri, Jun 24, 2022 at 10:02:19AM +0800, HAO CHEN GUI wrote: >>> This patch also binds __builtin_vsx_xs[min/max]dp to fmin/max instea

[PATCH v7, rs6000] Implemented f[min/max]_optab by xs[min/max]dp [PR103605]

2022-09-25 Thread HAO CHEN GUI via Gcc-patches
Hi, This patch implements optab f[min/max]_optab by xs[min/max]dp on rs6000. Tests show that outputs of xs[min/max]dp are consistent with the standard of C99 fmin/max. This patch also binds __builtin_vsx_xs[min/max]dp to fmin/max instead of smin/max when fast-math is not set. While fast-math i

[PATCH-1, rs6000] Generate permute index directly for little endian target [PR100866]

2022-10-11 Thread HAO CHEN GUI via Gcc-patches
Hi, This patch modifies the helper function which generates permute index for vector byte reversion and generates permute index directly for little endian targets. It saves one "xxlnor" instruction on P8 little endian targets as the original process needs an "xxlnor" to calculate complement for th

[PATCH v3, rs6000] Enable absolute jump table for PPC AIX and Linux

2022-02-28 Thread HAO CHEN GUI via Gcc-patches
Hi, This patch enables absolute jump tables on PPC AIX and Linux. For AIX, the jump table is placed in data section. For Linux, it is placed in RELRO section when relocation is needed. Bootstrapped and tested on AIX, Linux BE and LE with no regressions. Is this okay for trunk? Any recommend

[PATCH, rs6000] Add V1TI into vector comparison expand [PR103316]

2022-03-09 Thread HAO CHEN GUI via Gcc-patches
Hi, This patch adds V1TI mode into mode iterator used in vector comparison expands. With the patch, both built-ins and direct comparison could generate P10 new V1TI comparison instructions. Bootstrapped and tested on ppc64 Linux BE and LE with no regressions. Is this okay for trunk? Any recom

Ping^1 [PATCH, rs6000] Add V1TI into vector comparison expand [PR103316]

2022-03-14 Thread HAO CHEN GUI via Gcc-patches
Hi, Gentle ping this: https://gcc.gnu.org/pipermail/gcc-patches/2022-March/591507.html Thanks On 10/3/2022 2:31 PM, HAO CHEN GUI wrote: > Hi, > This patch adds V1TI mode into mode iterator used in vector comparison > expands. With the patch, both built-ins and direct comparison could generat

Ping^1 [PATCH, rs6000] Correct match pattern in pr56605.c

2022-03-14 Thread HAO CHEN GUI via Gcc-patches
Hi, Gentle ping this: https://gcc.gnu.org/pipermail/gcc-patches/2022-February/590958.html Thanks On 28/2/2022 11:17 AM, HAO CHEN GUI wrote: > Hi, > This patch corrects the match pattern in pr56605.c. The former pattern > is wrong and test case fails with GCC11. It should match following insn

[PATCHv2, rs6000] Add V1TI into vector comparison expand [PR103316]

2022-03-16 Thread HAO CHEN GUI via Gcc-patches
Hi, This patch adds V1TI mode into a new mode iterator used in vector comparison expands. With the patch, both built-ins and direct comparison could generate P10 new V1TI comparison instructions. Bootstrapped and tested on ppc64 Linux BE and LE with no regressions. Is this okay for trunk? Any

[PATCH v3, rs6000] Add V1TI into vector comparison expand [PR103316]

2022-03-20 Thread HAO CHEN GUI via Gcc-patches
Hi, This patch adds V1TI mode into a new mode iterator used in vector comparison expands. Without the patch, comparisons between two vector __int128 values are converted to scalar comparisons with branches, which is suboptimal. The patch fixes the issue. Now all comparisons between two vector __in

Ping^2 [PATCH, rs6000] Correct match pattern in pr56605.c

2022-04-07 Thread HAO CHEN GUI via Gcc-patches
Hi, Gentle ping this: https://gcc.gnu.org/pipermail/gcc-patches/2022-February/590958.html Thanks On 15/3/2022 10:06 AM, HAO CHEN GUI wrote: > Hi, > Gentle ping this: > https://gcc.gnu.org/pipermail/gcc-patches/2022-February/590958.html > Thanks > > On 28/2/2022 11:17 AM, HAO CHEN GUI wro

Re: [PATCH, rs6000] Correct match pattern in pr56605.c

2022-04-10 Thread HAO CHEN GUI via Gcc-patches
Hi, On 9/4/2022 12:48 AM, will schmidt wrote: > On Mon, 2022-02-28 at 11:17 +0800, HAO CHEN GUI via Gcc-patches wrote: >> Hi, >> This patch corrects the match pattern in pr56605.c. The former pattern >> is wrong and test case fails with GCC11. It should match following insn

Re: [PATCH, rs6000] Correct match pattern in pr56605.c

2022-04-10 Thread HAO CHEN GUI via Gcc-patches
Hi, On 9/4/2022 3:36 AM, Segher Boessenkool wrote: > Hi! > > On Mon, Feb 28, 2022 at 11:17:27AM +0800, HAO CHEN GUI wrote: >> This patch corrects the match pattern in pr56605.c. The former pattern >> is wrong and test case fails with GCC11. It should match following insn on >> each subtarget af

Re: Ping^2 [PATCH, rs6000] Correct match pattern in pr56605.c

2022-04-19 Thread HAO CHEN GUI via Gcc-patches
11, 2022 at 08:54:14PM -0300, Alexandre Oliva wrote: >> On Apr 7, 2022, HAO CHEN GUI via Gcc-patches >> wrote: >> >>> Gentle ping this: >>>https://gcc.gnu.org/pipermail/gcc-patches/2022-February/590958.html >>> Thanks >> >>>>

Re: Ping^2 [PATCH, rs6000] Correct match pattern in pr56605.c

2022-04-19 Thread HAO CHEN GUI via Gcc-patches
Hi Segher, Yes, the old committed patch caused it to match two insns. So I submitted the new patch which fixes the problem. Here is the new patch. https://gcc.gnu.org/pipermail/gcc-patches/2022-February/590958.html The new pattern is: /* { dg-final { scan-rtl-dump-times {\(compare:CC \(and:SI \(

[PATCH, rs6000] Add a combine pattern for CA minus one [PR95737]

2022-01-18 Thread HAO CHEN GUI via Gcc-patches
Hi, This patch adds a combine pattern for "CA minus one". As CA only has two values (0 or 1), we could convert following pattern (sign_extend:DI (plus:SI (reg:SI 98 ca) (const_int -1 [0x] to (plus:DI (reg:DI 98 ca) (const_int -1 [0xfff
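The reasoning in this snippet can be checked with a small sketch in C. It is not taken from the patch: the function names are hypothetical, and it only models the arithmetic, assuming CA is restricted to 0 or 1 as the message states. Under that assumption, computing CA - 1 in 32 bits and sign-extending the result gives the same value as computing CA - 1 directly in 64 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Models (sign_extend:DI (plus:SI ca -1)): subtract in 32 bits,
   then sign-extend to 64 bits. Hypothetical helper, not GCC code. */
int64_t ca_minus_one_narrow(uint32_t ca)
{
    int32_t sum = (int32_t)ca - 1;  /* ca is 0 or 1, so this cannot overflow */
    return (int64_t)sum;            /* widening a signed value sign-extends */
}

/* Models (plus:DI ca -1): subtract directly in 64 bits.
   Hypothetical helper, not GCC code. */
int64_t ca_minus_one_wide(uint64_t ca)
{
    return (int64_t)ca - 1;
}

/* Checks that both forms agree for the only two values CA can take. */
void check_ca_minus_one(void)
{
    for (uint32_t ca = 0; ca <= 1; ca++)
        assert(ca_minus_one_narrow(ca) == ca_minus_one_wide(ca));
}
```

Calling check_ca_minus_one() from a main function exercises both inputs. The equivalence relies on the restricted range of CA; it would not hold for an arbitrary 32-bit value.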

Re: [PATCH, rs6000] Add a combine pattern for CA minus one [PR95737]

2022-01-19 Thread HAO CHEN GUI via Gcc-patches
On 19/1/2022 3:52 PM, Andrew Pinski wrote: > On Tue, Jan 18, 2022 at 11:13 PM HAO CHEN GUI via Gcc-patches > wrote: >> >> Hi, >> This patch adds a combine pattern for "CA minus one". As CA only has two >> values (0 or 1), we could convert following p

[PATCH v2, rs6000] Add a combine pattern for CA minus one [PR95737]

2022-01-19 Thread HAO CHEN GUI via Gcc-patches
Hi, This patch adds a combine pattern for "CA minus one". As CA only has two values (0 or 1), we could convert following pattern (sign_extend:DI (plus:SI (reg:SI 98 ca) (const_int -1 [0x] to (plus:DI (reg:DI 98 ca) (const_int -1 [0xfff

Re: [PATCH v2, rs6000] Add a combine pattern for CA minus one [PR95737]

2022-01-20 Thread HAO CHEN GUI via Gcc-patches
Thanks so much for your advice. Please see my comments. On 21/1/2022 5:42 AM, Segher Boessenkool wrote: > Hi! > > On Thu, Jan 20, 2022 at 01:46:48PM -0500, David Edelsohn wrote: >> On Thu, Jan 20, 2022 at 2:36 AM HAO CHEN GUI wrote: >>> This patch adds a combine pattern for "CA minus one". As

[PATCH v3, rs6000] Add a combine pattern for CA minus one [PR95737]

2022-01-21 Thread HAO CHEN GUI via Gcc-patches
Hi, This patch adds a combine pattern for "CA minus one". As CA only has two values (0 or 1), we could convert following pattern (sign_extend:DI (plus:SI (reg:SI 98 ca) (const_int -1 [0x] to (plus:DI (reg:DI 98 ca) (const_int -1 [0xfff

[PATCH, rs6000] Remove TImode from mode iterator BOOL_128 [PR100694]

2022-02-08 Thread HAO CHEN GUI via Gcc-patches
Hi, This patch removes TImode from mode iterator BOOL_128. Thus, bool operations (AND, IOR, XOR, NOT) on TImode will be split to the relevant operations on word mode during expand (in optabs.c). Potential optimizations can be implemented after the split. The former practice splits it after the

Ping^1 [PATCH, rs6000] Remove TImode from mode iterator BOOL_128 [PR100694, PR93123]

2022-02-13 Thread HAO CHEN GUI via Gcc-patches
Hi, Gentle ping this: https://gcc.gnu.org/pipermail/gcc-patches/2022-February/590057.html Thanks On 9/2/2022 10:43 AM, HAO CHEN GUI wrote: > Hi, > This patch removes TImode from mode iterator BOOL_128. Thus, bool > operations (AND, IOR, XOR, NOT) > on TImode will be split to the relevant ope

Ping^1 [PATCH v3, rs6000] Add a combine pattern for CA minus one [PR95737]

2022-02-13 Thread HAO CHEN GUI via Gcc-patches
Hi, Gentle ping this: https://gcc.gnu.org/pipermail/gcc-patches/2022-January/589006.html Thanks On 21/1/2022 5:28 PM, HAO CHEN GUI wrote: > Hi, > This patch adds a combine pattern for "CA minus one". As CA only has two > values (0 or 1), we could convert following pattern > (sign_extend

Ping^2 [PATCH, rs6000] Fix ICE on expand bcd__ [PR100736]

2022-02-13 Thread HAO CHEN GUI via Gcc-patches
Hi, Gentle ping this: https://gcc.gnu.org/pipermail/gcc-patches/2021-December/587253.html Thanks On 10/1/2022 11:14 AM, HAO CHEN GUI wrote: > Hi, > > Gentle ping this: > > https://gcc.gnu.org/pipermail/gcc-patches/2021-December/587253.html > > Thanks > > On 21/12/2021 4:19 PM, HAO

Re: [PATCH, rs6000] Remove TImode from mode iterator BOOL_128 [PR100694]

2022-02-14 Thread HAO CHEN GUI via Gcc-patches
Segher, Thanks for your comments. Here are my comments and questions. Thanks. On 15/2/2022 5:36 AM, Segher Boessenkool wrote: > Hi! > > On Wed, Feb 09, 2022 at 10:43:17AM +0800, HAO CHEN GUI wrote: >> This patch removes TImode from mode iterator BOOL_128. Thus, bool >> operations (AND, IOR, X

Re: [PATCH, rs6000] Remove TImode from mode iterator BOOL_128 [PR100694]

2022-02-16 Thread HAO CHEN GUI via Gcc-patches
Hi, On 15/2/2022 10:56 PM, Segher Boessenkool wrote: > On Tue, Feb 15, 2022 at 11:01:03AM +0800, HAO CHEN GUI wrote: > Hi! > >> On 15/2/2022 5:36 AM, Segher Boessenkool wrote: >>> On Wed, Feb 09, 2022 at 10:43:17AM +0800, HAO CHEN GUI wrote: >>> All that are arguments for expanding to split form,

[PATCH v2, rs6000] Enable absolute jump table for PPC AIX and Linux

2022-02-16 Thread HAO CHEN GUI via Gcc-patches
Hi, This patch enables absolute jump tables on PPC AIX and Linux. For AIX, the jump table is placed in the data section. For Linux, it is placed in the RELRO section when relocation is needed. Bootstrapped and tested on AIX, Linux BE and LE with no regressions. Is this okay for trunk? Any recommend

[PATCH v2, rs6000] Disable TImode from Bool expanders [PR100694, PR93123]

2022-02-20 Thread HAO CHEN GUI via Gcc-patches
Hi, This patch disables TImode for Bool expanders. Thus a TI register can be split into two DI registers during expand. Potential optimizations can be implemented after the split. The new test case illustrates it. Bootstrapped and tested on powerpc64-linux BE and LE with no regressions. Is this oka

Re: [PATCH v2, rs6000] Enable absolute jump table for PPC AIX and Linux

2022-02-21 Thread HAO CHEN GUI via Gcc-patches
Kewen, Thanks so much for your advice. On 21/2/2022 5:42 PM, Kewen.Lin wrote: > Hi Haochen, > > Some minor comments are inlined. > > on 2022/2/16 4:42 PM, HAO CHEN GUI via Gcc-patches wrote: >> Hi, >> This patch enables absolute jump tables on PPC AIX and L

[PATCH, rs6000] Correct match pattern in pr56605.c

2022-02-27 Thread HAO CHEN GUI via Gcc-patches
Hi, This patch corrects the match pattern in pr56605.c. The former pattern is wrong and the test case fails with GCC11. It should match the following insn on each subtarget after mode promotion is disabled. The patch needs to be backported to GCC11. //gimple _17 = (unsigned int) _20; prolog_loop_niters.

Re: [PATCH 0/3] Add zero cycle move support

2021-11-22 Thread HAO CHEN GUI via Gcc-patches
Bill and David, Currently, the absolute jump table is not enabled by default. It can be enabled by the undocumented option "-mno-relative-jumptables". If the target supports named sections (have_named_sections), the feature can be enabled. We plan to enable the feature by default in GCC12 and th

Re: [PATCH, rs6000] optimization for vec_reve builtin [PR100868]

2021-11-23 Thread HAO CHEN GUI via Gcc-patches
Thanks for your review. Committed as r12-5463. On 22/11/2021 10:56 AM, David Edelsohn wrote: > On Wed, Nov 17, 2021 at 3:28 AM HAO CHEN GUI wrote: >> Hi, >> >> The patch optimized for vec_reve builtin on rs6000. For V2DI and V2DF, it >> is implemented by xxswapd on all targets. For V16QI, V8HI

[PATCH] Modify combine pattern by anding a pseudo with its nonzero bits

2021-11-30 Thread HAO CHEN GUI via Gcc-patches
Hi, This patch modifies the combine pattern with a helper, change_pseudo_and_mask, when recog fails. The helper converts a single pseudo to the pseudo ANDed with a mask if the outer operator is IOR/XOR/PLUS and the inner operator is ASHIFT/LSHIFTRT/AND. The conversion helps match shift + ior
