Re: [PATCH, rs6000] Enable vector compare for 16-byte memory equality compare [PR111449]

2023-09-29 Thread HAO CHEN GUI
Richard, On 2023/9/28 21:39, Richard Sandiford wrote: > That looks easily solvable though. I've posted a potential fix as: > > https://gcc.gnu.org/pipermail/gcc-patches/2023-September/631595.html > > Is that the only blocker to doing this in generic code? Thanks so much for your patch. It works

[PATCH-1, expand] Enable vector mode for compare_by_pieces [PR111449]

2023-10-08 Thread HAO CHEN GUI
Hi, Vector mode instructions are efficient on some targets (e.g. ppc64). This patch enables vector mode for compare_by_pieces. The non-member function widest_fixed_size_mode_for_size takes by_pieces_operation as the second argument and decides whether vector mode is enabled or not by the type of o

[PATCH-2, rs6000] Enable vector mode for memory equality compare [PR111449]

2023-10-08 Thread HAO CHEN GUI
Hi, This patch enables vector mode for memory equality compare by adding and implementing a new expand, cbranchv16qi4. The corresponding CC reg and compare code are also set in rs6000_generate_compare. With the patch, a 16-byte equality compare can be implemented by one vector compare instruction

Re: [PATCH-2, rs6000] Enable vector mode for memory equality compare [PR111449]

2023-10-10 Thread HAO CHEN GUI
Hi David, Thanks for your review comments. On 2023/10/9 23:42, David Edelsohn wrote: > #define MOVE_MAX (! TARGET_POWERPC64 ? 4 : 8) > #define MAX_MOVE_MAX 8 > +#define MOVE_MAX_PIECES (!TARGET_POWERPC64 ? 4 : 16) > +#define COMPARE_MAX_PIECES (!TARGET_POWERPC64 ? 4 : 16) > > >

[PATCH-1v2, expand] Enable vector mode for compare_by_pieces [PR111449]

2023-10-11 Thread HAO CHEN GUI
Hi, Vector mode instructions are efficient on some targets (e.g. ppc64). This patch enables vector mode for compare_by_pieces. The non-member function widest_fixed_size_mode_for_size takes by_pieces_operation as the second argument and decides whether vector mode is enabled or not by the type of o

[PATCH-2v2, rs6000] Enable vector mode for memory equality compare [PR111449]

2023-10-11 Thread HAO CHEN GUI
Hi, This patch enables vector mode for memory equality compare by adding and implementing a new expand, cbranchv16qi4. The corresponding CC reg and compare code are also set in rs6000_generate_compare. With the patch, a 16-byte equality compare can be implemented by one vector compare instruction

Re: [PATCH-2, rs6000] Enable vector mode for memory equality compare [PR111449]

2023-10-11 Thread HAO CHEN GUI
Hi David, On 2023/10/10 20:44, David Edelsohn wrote: > Are you stating that although PPC32 supports V16QImode in VSX, the > move_by_pieces support also requires TImode, which is not available on PPC32? > Yes. By setting MOVE_MAX_PIECES to 16, TImode compare might be generated as it checks vector mo

[PATCH-1v3, expand] Enable vector mode for compare_by_pieces [PR111449]

2023-10-13 Thread HAO CHEN GUI
Hi, Vector mode instructions are efficient for compare on some targets. This patch enables vector mode for compare_by_pieces. Currently, vector mode is enabled for compare, set and clear. The helper function "qi_vector_p" decides whether vector mode is enabled for a given by pieces operation. optabs_check

Re: [PATCH-2, rs6000] Enable vector mode for memory equality compare [PR111449]

2023-10-19 Thread HAO CHEN GUI
Kewen & David, Thanks for your comments. On 2023/10/17 10:19, Kewen.Lin wrote: > I think David raised a good question, it sounds to me that the current > handling simply consider that if MOVE_MAX_PIECES is set to 16, the > required operations for this optimization on TImode are always available, > b

[PATCH-1v4, expand] Enable vector mode for compare_by_pieces [PR111449]

2023-10-20 Thread HAO CHEN GUI
Hi, Vector mode instructions are efficient for compare on some targets. This patch enables vector mode for compare_by_pieces. Two helper functions are added: one checks whether vector mode is available for certain by pieces operations, and the other checks whether optabs exist for the mode and those operations. O

Re: [PATCH-1v4, expand] Enable vector mode for compare_by_pieces [PR111449]

2023-10-22 Thread HAO CHEN GUI
Committed as r14-4835. https://gcc.gnu.org/g:f08ca5903c7a02b450b93143467f70b9fd8e0085 Thanks Gui Haochen On 2023/10/20 16:49, Richard Sandiford wrote: > HAO CHEN GUI writes: >> Hi, >> Vector mode instructions are efficient for compare on some targets. >> This patch en

Re: [PATCH-1v4, expand] Enable vector mode for compare_by_pieces [PR111449]

2023-10-24 Thread HAO CHEN GUI
; Sent: Tuesday, October 24, 2023 4:43 PM > To: HAO CHEN GUI ; Richard Sandiford > > Cc: gcc-patches > Subject: RE: [PATCH-1v4, expand] Enable vector mode for compare_by_pieces > [PR111449] > > Hi Haochen Gui, > > It seems that the commit caused lots of test case fai

Re: [PATCH-1v4, expand] Enable vector mode for compare_by_pieces [PR111449]

2023-10-25 Thread HAO CHEN GUI
Hi Haochen, The regression cases are caused by the "targetm.scalar_mode_supported_p" check I added for scalar modes. XImode, OImode and TImode (with -m32) are not enabled in ix86_scalar_mode_supported_p, so they're excluded from by pieces operations on i386. The original code doesn't do a check

[PATCH, expand] Checking available optabs for scalar modes in by pieces operations

2023-10-27 Thread HAO CHEN GUI
Hi, This patch checks available optabs for scalar modes used in by pieces operations. It fixes the regression cases caused by the previous patch. Now both scalar and vector modes are examined by the same approach. Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no regressions. Is

Re: [PATCH, expand] Checking available optabs for scalar modes in by pieces operations

2023-10-29 Thread HAO CHEN GUI
Committed as r14-5001. Thanks Gui Haochen On 2023/10/27 17:29, Richard Sandiford wrote: > HAO CHEN GUI writes: >> Hi, >> This patch checks available optabs for scalar modes used in by >> pieces operations. It fixes the regression cases caused by previous >> patch. Now b

[PATCH-2, rs6000] Enable vector mode for by pieces equality compare [PR111449]

2023-11-05 Thread HAO CHEN GUI
Hi, This patch enables vector mode for by pieces equality compare. It adds a new expand pattern, cbranchv16qi4, and sets MOVE_MAX_PIECES and COMPARE_MAX_PIECES to 16 bytes when P8 vector is enabled. The compare relies on both move and compare instructions, so both macros are changed. The vector load/stor

[PATCH-3, rs6000] Enable 16-byte by pieces move [PR111449]

2023-11-05 Thread HAO CHEN GUI
Hi, Patch 2 enables 16-byte by pieces move on rs6000. This patch fixes the regression cases caused by the previous patch. For sra-17/18, the long array with 4 elements can be loaded by one 16-byte by pieces move on a 32-bit platform, so the array is not constructed in LC0 and SRA optimization is

[PATCH-3v2, rs6000] Enable 16-byte by pieces move [PR111449]

2023-11-06 Thread HAO CHEN GUI
Hi, Patch 2 enables 16-byte by pieces move on rs6000. This patch fixes the regression cases caused by the previous patch. For sra-17/18, the long array with 4 elements can be loaded by one 16-byte by pieces move on a 32-bit platform, so the array is not constructed in LC0 and SRA optimization is

[PATCH, rs6000] Add subreg patterns for SImode rotate and mask insert

2024-02-29 Thread HAO CHEN GUI
Hi, This patch fixes regression cases in gcc.target/powerpc/rlwimi-2.c. In the combine pass, an SImode (subreg from DImode) lshiftrt is converted to a DImode lshiftrt with an outer AND. It matches a DImode rotate and mask insert on rs6000. Trying 2 -> 7: 2: r122:DI=r129:DI REG_DEAD r129:DI 7:

Re: [PATCH] fwprop: Avoid volatile defines to be propagated

2024-03-03 Thread HAO CHEN GUI
Hi Jeff, Thanks for your comments. On 2024/3/4 6:02, Jeff Law wrote: > Why specifically are you worried here? Propagation of a volatile shouldn't > in and of itself cause a problem. We're not changing the number of volatile > accesses or anything like that -- we're just moving them around a bit.

Re: [PATCH] fwprop: Avoid volatile defines to be propagated

2024-03-04 Thread HAO CHEN GUI
Hi Jeff, On 2024/3/4 11:37, Jeff Law wrote: > Can the same thing happen with a volatile memory load? I don't think that > will be caught by the volatile_insn_p check. Yes, I think so. If the define rtx contains volatile memory references, it may hit the same problem. We may use volatile_refs_p inst

[PATCHv2] fwprop: Avoid volatile defines to be propagated

2024-03-04 Thread HAO CHEN GUI
Hi, This patch tries to fix a potential problem raised by the patch for PR111267. With that patch, fwprop tries to propagate a volatile asm operand into a single set insn. The volatile asm operand might then be executed multiple times if the define insn isn't eliminated after prop

[PATCHv2, rs6000] Add subreg patterns for SImode rotate and mask insert

2024-03-08 Thread HAO CHEN GUI
Hi, This patch fixes regression cases in gcc.target/powerpc/rlwimi-2.c. In the combine pass, an SImode (subreg from DImode) lshiftrt is converted to a DImode lshiftrt with an outer AND. It matches a DImode rotate and mask insert on rs6000. Trying 2 -> 7: 2: r122:DI=r129:DI REG_DEAD r129:DI 7:

[PATCH, RFC] combine: Don't truncate const operand of AND if it's no benefits

2024-03-10 Thread HAO CHEN GUI
Hi, This patch tries to fix the problem where a canonical form brings no benefit on a specific target. In the combine pass, the const operand of AND is ANDed with the nonzero bits of the other operand. It's a canonical form, but it brings no benefit on targets that have rotate and mask insns. As the mask is

Re: [PATCH, RFC] combine: Don't truncate const operand of AND if it's no benefits

2024-03-18 Thread HAO CHEN GUI
Hi, Gently ping this: https://gcc.gnu.org/pipermail/gcc-patches/2024-March/647533.html Thanks Gui Haochen On 2024/3/11 13:41, HAO CHEN GUI wrote: > Hi, > This patch tries to fix the problem when a canonical form doesn't benefit > on a specific target. The const operand of AND i

[PATCH] Value Range: Add range op for builtin isinf

2024-03-24 Thread HAO CHEN GUI
Hi, The builtin isinf is not folded at the front end if the corresponding optab exists. This causes range evaluation to fail on targets that have optab_isinf. For instance, range-sincos.c will fail on targets that have optab_isinf, as it calls builtin_isinf. This patch fixes the problem by

[patch, rs6000] Implement optab_isinf for SFmode, DFmode and TFmode [PR97786]

2024-03-24 Thread HAO CHEN GUI
Hi, This patch implements optab_isinf for SF/DF/TFmode with rs6000 test data class instructions. Bootstrapped and tested on powerpc64-linux BE and LE with no regressions. Is it OK for next stage 1? Thanks Gui Haochen ChangeLog rs6000: Implement optab_isinf for SFmode, DFmode and TFmode gcc/

[Patch] Builtin: Fold builtin_isinf on IBM long double to builtin_isinf on double [PR97786]

2024-03-27 Thread HAO CHEN GUI
Hi, This patch folds builtin_isinf on IBM long double to builtin_isinf on double type. The former patch https://gcc.gnu.org/pipermail/gcc-patches/2024-March/648304.html implemented the DFmode isinf_optab. Bootstrapped and tested on powerpc64-linux BE and LE with no regressions. Is it OK for ne
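Why this fold is safe can be sketched in C under one assumption: an IBM long double is the unevaluated sum of two doubles, hi + lo, where hi is that sum rounded to double. The struct and function below are hypothetical models of the representation, not GCC internals. Since hi already carries the magnitude, testing hi alone decides isinf.

```c
#include <math.h>
#include <stdbool.h>

/* Hypothetical model of an IBM double-double long double:
   value = hi + lo, with hi == (double)(hi + lo), so |lo| <= ulp(hi)/2. */
typedef struct {
    double hi;
    double lo;
} ibm_ldouble;

/* isinf on the pair reduces to isinf on the high double,
   mirroring the fold from long double isinf to double isinf. */
bool ibm_isinf(ibm_ldouble x)
{
    return isinf(x.hi) != 0;
}
```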

[Patch, rs6000] Enable overlap memory store for block memory clear

2024-02-25 Thread HAO CHEN GUI
Hi, This patch enables overlapping memory stores for block memory clear, which reduces the number of store instructions. The expander calls widest_fixed_size_mode_for_block_clear to get the mode for looped block clear and calls widest_fixed_size_mode_for_block_clear to get the mode for last overlapped cl
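The saving can be illustrated with a simplified store-counting model. It assumes stores are power-of-two sized, at most max_piece bytes, and never write outside the block. Without overlap, the tail is split into progressively smaller stores. With overlap, the final store is shifted back so that it ends at the last byte and clears a few bytes a second time. Both functions are hypothetical, and they ignore alignment and the expander's actual mode selection.

```c
#include <stddef.h>

/* Stores needed to clear len bytes when the tail is split into
   smaller power-of-two stores (max_piece must be a power of two). */
size_t stores_without_overlap(size_t len, size_t max_piece)
{
    size_t n = 0;
    for (size_t piece = max_piece; piece > 0; piece /= 2) {
        n += len / piece;
        len %= piece;
    }
    return n;
}

/* Stores needed when the last store may overlap the previous one:
   use the widest piece that fits inside the block, and let the final
   store end exactly at the last byte. */
size_t stores_with_overlap(size_t len, size_t max_piece)
{
    if (len == 0)
        return 0;
    size_t piece = max_piece;
    while (piece > len)
        piece /= 2;
    return (len + piece - 1) / piece;
}
```

For a 31-byte clear with 16-byte stores, this model gives 5 stores without overlap (16 + 8 + 4 + 2 + 1) and 2 with overlap (bytes 0-15 and bytes 15-30).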

[PATCH] fwprop: Avoid volatile defines to be propagated

2024-02-25 Thread HAO CHEN GUI
Hi, This patch tries to fix a potential problem raised by the patch for PR111267. With that patch, fwprop tries to propagate a volatile asm operand into a single set insn. This is risky because the resulting behavior is wrong. Currently set_src_cost comparison can reject such propagatio

[PATCH, rs6000] Refactor expand_compare_loop and split it to two functions

2024-01-09 Thread HAO CHEN GUI
Hi, This patch refactors expand_compare_loop by splitting it into two functions, one for fixed length and one for variable length. The two functions share some low-level common helper functions. Besides the above changes, the patch also does: 1. Don't generate load and compare loop w

[Patch, rs6000] Eliminate unnecessary byte swaps for block clear on P8 LE [PR113325]

2024-01-11 Thread HAO CHEN GUI
Hi, This patch eliminates unnecessary byte swaps for block clear on P8 LE. For block clear, all the bytes are set to zero, so byte order doesn't matter. The alignment of the destination can therefore be set to the store mode size instead of 1 byte to eliminate unnecessary byte swap instruc

Re: [Patch, rs6000] Eliminate unnecessary byte swaps for block clear on P8 LE [PR113325]

2024-01-11 Thread HAO CHEN GUI
Hi Richard, Thanks so much for your comments. >> patch.diff >> diff --git a/gcc/config/rs6000/rs6000-string.cc >> b/gcc/config/rs6000/rs6000-string.cc >> index 7f777666ba9..4c9b2cbeefc 100644 >> --- a/gcc/config/rs6000/rs6000-string.cc >> +++ b/gcc/config/rs6000/rs6000-string.cc >> @@ -140,7

[PATCH, rs6000] Enable block compare expand on P9 with m32 and mpowerpc64

2024-01-11 Thread HAO CHEN GUI
Hi, On P9, "setb" is used to set the result of block compare, so it works with m32 and mpowerpc64. On P8, the carry bit is used, so it can't work with m32 and mpowerpc64. This patch enables block compare expand for m32 and mpowerpc64 on P9. Bootstrapped and tested on x86 and powerpc64-linux BE and

Re: [PATCH, rs6000] Refactor expand_compare_loop and split it to two functions

2024-01-15 Thread HAO CHEN GUI
Hi Kewen, On 2024/1/15 14:16, Kewen.Lin wrote: > Considering it's stage 4 now and the impact of this patch, let's defer > this to next stage 1, if possible could you organize the above changes > into patches: > > 1) Refactor expand_compare_loop by splitting into two functions without >any functio

[PATCH, expand] Add const0 move checking for CLEAR_BY_PIECES optabs

2024-01-15 Thread HAO CHEN GUI
Hi, This patch adds const0 move checking for CLEAR_BY_PIECES. The original vec_duplicate check handles duplicates of non-constant inputs, but 0 is a constant. So even if a platform doesn't support vec_duplicate, it can still do clear by pieces if it supports a const0 move in that mode. The test cases w

[PATCH-1] fwprop: Replace rtx_cost with insn_cost in try_fwprop_subst_pattern [PR113325]

2024-01-25 Thread HAO CHEN GUI
Hi, This patch replaces rtx_cost with insn_cost in forward propagation. In the PR, a constant vector should be propagated to replace a pseudo in a store insn if we know it's a duplicated constant vector. This reduces the insn cost but not the rtx cost. In this case, the kind of destination operand (

[Patch-2, rs6000] Eliminate unnecessary byte swaps for duplicated constant vector store [PR113325]

2024-01-25 Thread HAO CHEN GUI
Hi, This patch creates an insn_and_split pattern which helps the duplicated constant vector replace the source pseudo of the store insn in the fwprop pass. Thus the store can be implemented by a single stxvd2x, which eliminates the unnecessary byte swap insn on P8 LE. The test case shows the optimization

[Patchv2, rs6000] Correct definition of macro of fixed point efficient unaligned

2023-12-17 Thread HAO CHEN GUI
Hi, The patch corrects the definition of TARGET_EFFICIENT_OVERLAPPING_UNALIGNED and replaces it with a call to slow_unaligned_access. Compared with the last version, https://gcc.gnu.org/pipermail/gcc-patches/2023-December/640076.html the main change is to replace the macro with slow_unaligned_acc

[Patchv2, rs6000] Clean up pre-checkings of expand_block_compare

2023-12-17 Thread HAO CHEN GUI
Hi, This patch cleans up the pre-checks of expand_block_compare. It does the following: 1. Assert that only P7 and above can enter this function, as this is already guarded by the expand. 2. Return false when optimizing for size. 3. Remove the P7 processor test, as only P7 and above can enter this function and P7 LE is excluded by targ

[Patchv3, rs6000] Correct definition of macro of fixed point efficient unaligned

2023-12-20 Thread HAO CHEN GUI
Hi, The patch corrects the definition of TARGET_EFFICIENT_OVERLAPPING_UNALIGNED and replaces it with a call to slow_unaligned_access. Compared with the last version, https://gcc.gnu.org/pipermail/gcc-patches/2023-December/640832.html the main change is to pass alignment measured in bits to slow_u

[Patch, rs6000] Call library for block memory compare when optimizing for size

2023-12-20 Thread HAO CHEN GUI
Hi, This patch calls the library function for block memory compare when optimizing for size. Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no regressions. Is this OK for trunk? Thanks Gui Haochen ChangeLog rs6000: Call library for block memory compare when optimizing for s

[Patchv3, rs6000] Clean up pre-checkings of expand_block_compare

2023-12-20 Thread HAO CHEN GUI
Hi, This patch cleans up the pre-checks of expand_block_compare. It does the following: 1. Assert that only P7 and above can enter this function, as this is already guarded by the expand. 2. Remove the P7 processor test, as only P7 and above can enter this function and P7 LE is excluded by targetm.slow_unaligned_access. On P7 BE, the p

[patch-2, rs6000] guard fctid on PPC64 and powerpc 476 [PR112707]

2023-11-30 Thread HAO CHEN GUI
Hi, The "fctid" instruction is supported on 64-bit Power processors and powerpc 476, so it needs a guard that checks for them. The patch fixes the issue. Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no regressions. Is this OK for trunk? Thanks Gui Haochen ChangeLog rs6000: guard fctid on PPC64 a

[patch-1, rs6000] enable fctiw on old archs [PR112707]

2023-11-30 Thread HAO CHEN GUI
Hi, SImode in float registers is supported only on P7 and above. This means "fctiw" can't be generated on old 32-bit processors, as the output operand of the fctiw insn is an SImode value in a float/double register. This patch fixes the problem by adding an expand and an insn pattern for fctiw. The output of new pattern is

[patch-1v2, rs6000] enable fctiw on old archs [PR112707]

2023-12-06 Thread HAO CHEN GUI
Hi, SImode in float registers is supported only on P7 and above. This means "fctiw" can't be generated on old 32-bit processors, as the output operand of the fctiw insn is an SImode value in a float/double register. This patch fixes the problem by adding one expand and one insn pattern for fctiw. The output of new patte

[patch-2v2, rs6000] guard fctid on PPC64 and powerpc 476 [PR112707]

2023-12-06 Thread HAO CHEN GUI
Hi, The "fctid" instruction is supported on 64-bit Power processors and powerpc 476, so it needs a guard that checks for them. The patch fixes the issue. Compared with the last version, https://gcc.gnu.org/pipermail/gcc-patches/2023-December/638859.html the main change is to define TARGET_FCTID to POWERPC64 or PPC476. Als

Re: [patch-2v3, rs6000] Guard fctid on PowerPC64 and PowerPC476 [PR112707]

2023-12-07 Thread HAO CHEN GUI
Hi, The "fctid" instruction is supported on 64-bit Power processors and PowerPC476, so it needs a guard that checks for them. The patch fixes the issue. Compared with the last version, https://gcc.gnu.org/pipermail/gcc-patches/2023-December/639536.html the main change is to change the target requirement in pr88558*.c.

[Patch, rs6000] Correct definition of macro of fixed point efficient unaligned

2023-12-10 Thread HAO CHEN GUI
Hi, The patch corrects the definition of TARGET_EFFICIENT_OVERLAPPING_UNALIGNED and gives it a more comprehensible name. Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no regressions. Is this OK for trunk? Thanks Gui Haochen ChangeLog rs6000: Correct definition of mac

[Patch, rs6000] Clean up pre-checking of expand_block_compare

2023-12-10 Thread HAO CHEN GUI
Hi, This patch cleans up the pre-checks of expand_block_compare. It does the following: 1. Assert that only P7 and above can enter this function, as this is already guarded by the expand. 2. Return false when optimizing for size. 3. Remove the P7 CPU test, as only P7 and above can enter this function and P7 LE is excluded by targetm.slo

Re: [PATCH-2, rs6000] Enable vector mode for by pieces equality compare [PR111449]

2023-11-06 Thread HAO CHEN GUI
Hi Kewen, Thanks for your review comments. Just one question on the following comment. On 2023/11/7 10:40, Kewen.Lin wrote: > Nit: has_arch_pwr8 would make it un-tested on Power7 default env, I'd prefer > to remove this "has_arch_pwr8" and append "-mdejagnu-cpu=power8" to > dg-options. My original

[PATCH-2v2, rs6000] Enable vector mode for by pieces equality compare [PR111449]

2023-11-08 Thread HAO CHEN GUI
Hi, This patch enables vector mode for by pieces equality compare. It adds a new expand pattern, cbranchv16qi4, and sets MOVE_MAX_PIECES and COMPARE_MAX_PIECES to 16 bytes when P8 vector is enabled. The compare relies on both move and compare instructions, so both macros are changed. As the vector load/s

[PATCH-3v3, rs6000] Fix regression cases caused 16-byte by pieces move [PR111449]

2023-11-08 Thread HAO CHEN GUI
Hi, Originally, a 16-byte memory-to-memory move is expanded via a pattern. expand_block_move does an optimization on P8 LE that uses V2DI reversed load/store for memory-to-memory moves. Now the move is done by a 16-byte by pieces move, and the optimization is lost. This patch adds an insn_and_split pattern to retak

[PATCH, expand] Call misaligned memory reference in expand_builtin_return [PR112417]

2023-11-08 Thread HAO CHEN GUI
Hi, This patch modifies expand_builtin_return to make it call expand_misaligned_mem_ref to load unaligned memory. The memory referenced by a void* pointer might be unaligned, so expanding it with unaligned move optabs is safe. The new test case illustrates the problem. rs6000 doesn't ha

Re: [PATCH, expand] Call misaligned memory reference in expand_builtin_return [PR112417]

2023-11-09 Thread HAO CHEN GUI
Hi Richard, Thanks so much for your comments. On 2023/11/9 19:41, Richard Biener wrote: > I'm not sure if the testcase is valid though? > > @defbuiltin{{void} __builtin_return (void *@var{result})} > This built-in function returns the value described by @var{result} from > the containing function.

[PATCH-3v4, rs6000] Fix regression cases caused 16-byte by pieces move [PR111449]

2023-11-10 Thread HAO CHEN GUI
Hi, Originally, a 16-byte memory-to-memory move is expanded via a pattern. expand_block_move does an optimization on P8 LE that uses V2DI reversed load/store for memory-to-memory moves. Now the move is done by a 16-byte by pieces move, and the optimization is lost. This patch adds an insn_and_split pattern to retak

Re: [PATCH, expand] Call misaligned memory reference in expand_builtin_return [PR112417]

2023-11-10 Thread HAO CHEN GUI
Hi Richard, On 2023/11/10 17:06, Richard Biener wrote: > On Fri, Nov 10, 2023 at 8:52 AM HAO CHEN GUI wrote: >> >> Hi Richard, >> Thanks so much for your comments. >> >> On 2023/11/9 19:41, Richard Biener wrote: >>> I'm not sure if the test

Re: Fwd: [PATCH, expand] Call misaligned memory reference in expand_builtin_return [PR112417]

2023-11-13 Thread HAO CHEN GUI
Sorry, forgot to cc gcc-patches. On 2023/11/13 16:05, HAO CHEN GUI wrote: > Andrew, > Could you kindly inform us what's the functionality of __objc_forward? > Does it change the memory content pointed by args? Thanks a lot. > > Thanks > Gui Haochen > > > libob

[PATCH] Clean up

2023-11-14 Thread HAO CHEN GUI
Hi, This patch cleans up by_pieces_ninsns and does the following: 1. Do the length and alignment adjustment for by pieces compare when overlap operation is enabled. 2. Remove unnecessary mov_optab checks. Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no regressions. Is t

[PATCH] Clean up by_pieces_ninsns

2023-11-14 Thread HAO CHEN GUI
Hi, This patch cleans up by_pieces_ninsns and does the following: 1. Do the length and alignment adjustment for by pieces compare when overlap operation is enabled. 2. Remove unnecessary mov_optab checks. Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no regressions. Is t

[PATCHv2] Clean up by_pieces_ninsns

2023-11-22 Thread HAO CHEN GUI
Hi, This patch cleans up by_pieces_ninsns and does the following: 1. Do the length and alignment adjustment for by pieces compare when overlap operation is enabled. 2. Replace unnecessary mov_optab checks with gcc assertions. Compared to the last version, the main change is to replace unnecessa

[PATCH] Expand: Pass down equality only flag to cmpmem expand

2023-11-27 Thread HAO CHEN GUI
Hi, This patch passes down the equality-only flag from emit_block_cmp_hints to the cmpmem optab so that the target-specific expand can generate optimized insns for equality-only compares. Targets (e.g. rs6000) can generate a more efficient insn sequence if the block compare is equality-only. Bootstr
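Here is a portable C sketch of why an equality-only flag helps. When only equal/not-equal matters, the expansion does not have to find the first differing byte or honor byte order. Differences can instead be accumulated with XOR/OR over word-sized chunks and tested once at the end. This is an illustrative model under those assumptions, not the rs6000 expansion, and block_equal is a hypothetical name.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Equality-only block compare: XOR corresponding chunks and OR the
   results together; any nonzero bit means the blocks differ. Byte
   order is irrelevant because no ordering result is produced. */
bool block_equal(const void *a, const void *b, size_t len)
{
    const unsigned char *pa = a, *pb = b;
    uint64_t diff = 0;
    size_t i = 0;

    for (; i + 8 <= len; i += 8) {
        uint64_t x, y;
        memcpy(&x, pa + i, 8);   /* unaligned-safe 8-byte loads */
        memcpy(&y, pb + i, 8);
        diff |= x ^ y;
    }
    for (; i < len; i++)         /* remaining tail bytes */
        diff |= (uint64_t)(pa[i] ^ pb[i]);

    return diff == 0;
}
```

An ordered compare (memcmp returning less/equal/greater) cannot merge chunks like this, because it has to report which byte differs first.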

[PATCH, rs6000] Enable vector compare for 16-byte memory equality compare [PR111449]

2023-09-20 Thread HAO CHEN GUI
Hi, This patch enables vector compare for 16-byte memory equality compare. The 16-byte memory equality compare can be efficiently implemented by the instruction "vcmpequb." It saves one branch and one compare compared with a sequence of two 8-byte compares. 16-byte vector compare is not enabled on 32b
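For comparison, here is a scalar C sketch of the two 8-byte compare sequence that the single vector compare replaces. memcpy is used for the loads to sidestep alignment and aliasing issues. The function name is hypothetical, and the claim about branches describes typical generated code, not any particular compiler output.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Scalar model of a 16-byte equality-only memory compare split into
   two 8-byte pieces: two loads per operand, two compares, and an
   early-exit branch between them. */
bool eq16_two_pieces(const void *a, const void *b)
{
    const unsigned char *pa = a, *pb = b;
    uint64_t a0, a1, b0, b1;

    memcpy(&a0, pa, 8);
    memcpy(&b0, pb, 8);
    if (a0 != b0)              /* first compare and branch */
        return false;

    memcpy(&a1, pa + 8, 8);
    memcpy(&b1, pb + 8, 8);
    return a1 == b1;           /* second compare */
}
```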

Re: [PATCH-1v2, rs6000] Enable SImode in FP registers on P7 [PR88558]

2023-09-24 Thread HAO CHEN GUI
Hi Kewen, On 2023/9/18 15:34, Kewen.Lin wrote: > Thanks for checking! So for P7, this patch looks neutral, but for P8 and > later, it may cause some few differences in code gen. I'm curious that how > many total object files and different object files were checked and found > on P8? P8 with -O2, fo

[PATCH-2v3, rs6000] Implement 32bit inline lrint [PR88558]

2023-09-24 Thread HAO CHEN GUI
Hi, This patch implements 32-bit inline lrint with "fctiw". It depends on patch 1 to do SImode moves from FP registers on P7. Compared to the last version, the main change is to add some test cases. https://gcc.gnu.org/pipermail/gcc-patches/2023-September/629187.html Bootstrapped and tested on p

Re: [PATCH, rs6000] Enable vector compare for 16-byte memory equality compare [PR111449]

2023-09-28 Thread HAO CHEN GUI
Kewen and Richard, Thanks for your comments. Please let me clarify it. On 2023/9/27 19:10, Richard Sandiford wrote: > Yeah, I agree there doesn't seem to be a good reason to exclude vectors. > Sorry to dive straight into details, but maybe we should have something > called bitwise_mode_for_size that

[PATCH-1v4, rs6000] Implement optab_isinf for SFDF and IEEE128

2024-06-26 Thread HAO CHEN GUI
Hi, This patch implements optab_isinf for SFDF and IEEE128 with test data class instructions. Compared with the previous version, the main change is to define and use the constant mask for test data class insns. https://gcc.gnu.org/pipermail/gcc-patches/2024-May/652593.html Bootstrapped and test

[PATCH-3v4, rs6000] Implement optab_isnormal for SFDF and IEEE128

2024-06-26 Thread HAO CHEN GUI
Hi, This patch implements optab_isnormal for SFDF and IEEE128 with test data class instructions. Compared with the previous version, the main change is to use the constant mask for test data class insns. https://gcc.gnu.org/pipermail/gcc-patches/2024-May/652595.html Bootstrapped and tested on po

[PATCH-2v4, rs6000] Implement optab_isfinite for SFDF and IEEE128

2024-06-26 Thread HAO CHEN GUI
Hi, This patch implements optab_isfinite for SFDF and IEEE128 with test data class instructions. Compared with the previous version, the main change is to use the constant mask for test data class insns. https://gcc.gnu.org/pipermail/gcc-patches/2024-May/652594.html Bootstrapped and tested on po
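The isinf, isnormal and isfinite patches above each reduce to one class-membership test against a constant mask. Below is a software model in C. The class bit values are assumed to follow the 7-bit data class mask of the Power ISA test data class instructions (NaN 0x40, +Inf 0x20, -Inf 0x10, +0 0x08, -0 0x04, +denormal 0x02, -denormal 0x01); check them against the ISA before relying on them. test_data_class and the tdc_* functions are hypothetical names, and fpclassify stands in for the hardware instruction.

```c
#include <math.h>
#include <stdbool.h>

/* Assumed data class mask bits (see lead-in). */
enum {
    DC_NAN      = 0x40,
    DC_POS_INF  = 0x20,
    DC_NEG_INF  = 0x10,
    DC_POS_ZERO = 0x08,
    DC_NEG_ZERO = 0x04,
    DC_POS_DEN  = 0x02,
    DC_NEG_DEN  = 0x01
};

/* Software stand-in for "is x in any class selected by mask?". */
bool test_data_class(double x, unsigned mask)
{
    unsigned cls;
    switch (fpclassify(x)) {
    case FP_NAN:       cls = DC_NAN; break;
    case FP_INFINITE:  cls = signbit(x) ? DC_NEG_INF : DC_POS_INF; break;
    case FP_ZERO:      cls = signbit(x) ? DC_NEG_ZERO : DC_POS_ZERO; break;
    case FP_SUBNORMAL: cls = signbit(x) ? DC_NEG_DEN : DC_POS_DEN; break;
    default:           cls = 0; break;  /* normal numbers match no bit */
    }
    return (cls & mask) != 0;
}

/* Each predicate is a single class test with a constant mask. */
bool tdc_isinf(double x)    { return test_data_class(x, DC_POS_INF | DC_NEG_INF); }
bool tdc_isfinite(double x) { return !test_data_class(x, DC_NAN | DC_POS_INF | DC_NEG_INF); }
bool tdc_isnormal(double x) { return !test_data_class(x, 0x7f); }
```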

Ping^3 [PATCH-1v3] Value Range: Add range op for builtin isinf

2024-07-01 Thread HAO CHEN GUI
Hi, Gently ping it. https://gcc.gnu.org/pipermail/gcc-patches/2024-May/653096.html Thanks Gui Haochen On 2024/6/24 9:40, HAO CHEN GUI wrote: > Hi, > Gently ping it. > https://gcc.gnu.org/pipermail/gcc-patches/2024-May/653096.html > > Thanks > Gui Haochen > > On 2024/6/2

Re: [PATCH] RISC-V: use fclass insns to implement isfinite and isnormal builtins

2024-07-01 Thread HAO CHEN GUI
>   return std::isfinite (x); > } > > generating the new seq > > .LFB4: >     fclass.d    a0,fa0 >     andi    a0,a0,126 >     snez    a0,a0 >     ret > > vs. > >     li    a0,1 >     ret > > I have a hunch this requires the pending value range patch from Hao Chen > GUI. > > Thx, > -Vineet > > [1] https://gcc.gnu.org/pipermail/gcc-patches/2024-May/653094.html
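In the quoted RISC-V sequence, fclass.d sets exactly one of ten class bits. The RISC-V F/D extensions define bit 0 as -inf, bits 1-3 as negative normal, negative subnormal and -0, bits 4-6 as +0, positive subnormal and positive normal, bit 7 as +inf, and bits 8-9 as signaling and quiet NaN. The mask 126 (0x7e) therefore selects bits 1 to 6, i.e. exactly the finite classes. Below is a C model of that test. fclass_d and isfinite_via_fclass are hypothetical names, and the model reports every NaN as quiet.

```c
#include <math.h>
#include <stdbool.h>

/* Software model of the fclass.d result (one-hot, 10 bits).
   Signaling and quiet NaNs are not distinguished: all NaNs map to
   bit 9. */
unsigned fclass_d(double x)
{
    bool neg = signbit(x) != 0;
    switch (fpclassify(x)) {
    case FP_INFINITE:  return neg ? 1u << 0 : 1u << 7;
    case FP_NORMAL:    return neg ? 1u << 1 : 1u << 6;
    case FP_SUBNORMAL: return neg ? 1u << 2 : 1u << 5;
    case FP_ZERO:      return neg ? 1u << 3 : 1u << 4;
    default:           return 1u << 9;  /* NaN */
    }
}

/* Mirrors "fclass.d a0,fa0; andi a0,a0,126; snez a0,a0". */
bool isfinite_via_fclass(double x)
{
    return (fclass_d(x) & 126u) != 0;
}
```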

Ping^3 [PATCH-3v2] Value Range: Add range op for builtin isnormal

2024-07-01 Thread HAO CHEN GUI
Hi, Gently ping it. https://gcc.gnu.org/pipermail/gcc-patches/2024-May/653095.html Thanks Gui Haochen On 2024/6/24 9:41, HAO CHEN GUI wrote: > Hi, > Gently ping it. > https://gcc.gnu.org/pipermail/gcc-patches/2024-May/653095.html > > Thanks > Gui Haochen > > On 2024/6/2

Ping^3 [PATCH-2v4] Value Range: Add range op for builtin isfinite

2024-07-01 Thread HAO CHEN GUI
Hi, Gently ping it. https://gcc.gnu.org/pipermail/gcc-patches/2024-May/653094.html Thanks Gui Haochen On 2024/6/24 9:41, HAO CHEN GUI wrote: > Hi, > Gently ping it. > https://gcc.gnu.org/pipermail/gcc-patches/2024-May/653094.html > > Thanks > Gui Haochen > > On 2024/6/2

Ping^2 [PATCHv2, rs6000] Optimize vector construction with two vector doubleword loads [PR103568]

2024-07-01 Thread HAO CHEN GUI
Hi, Gently ping it. https://gcc.gnu.org/pipermail/gcc-patches/2024-May/653180.html Thanks Gui Haochen On 2024/6/20 15:01, HAO CHEN GUI wrote: > Hi, > Gently ping it. > https://gcc.gnu.org/pipermail/gcc-patches/2024-May/653180.html > > Thanks > Gui Haochen > > On 2024/5/3

[PATCH-1v5, rs6000] Implement optab_isinf for SFDF and IEEE128

2024-07-09 Thread HAO CHEN GUI
Hi, This patch implements optab_isinf for SFDF and IEEE128 with test data class instructions. Compared with the previous version, the main changes are: 1. Define three mode attributes, used for predicate, constraint and asm print selection. They help merge the sp/dp/qp patterns into one. 2. Remove ori

Re: [PATCH v2] RISC-V: use fclass insns to implement isfinite and isnormal builtins

2024-07-09 Thread HAO CHEN GUI
Hi, On 2024/7/10 8:04, Vineet Gupta wrote: > So it seems initial versions of the patch didn't specify anything about > output mode. Richi asked for it in review and in v4 Hao added it. > But I don't see anyone asking specifically for SImode. > I guess that can be relaxed. Hao do you have any inputs he

Re: [PATCH] Expand: Pass down equality only flag to cmpmem expand

2024-07-09 Thread HAO CHEN GUI
Hi Jeff, On 2024/7/10 7:35, Jeff Law wrote: > Is this patch still relevant? It was submitted after stage1 closed for > gcc-14. With the trunk open for development, you should probably rebase and > repost if the patch is still relevant/useful. > > Conceptually knowing that we just want to do an eq

[PATCH, rs6000] Add TARGET_FLOAT128_HW guard for quad-precision insns

2024-07-10 Thread HAO CHEN GUI
Hi, This patch adds TARGET_FLOAT128_HW to the pattern conditions for quad-precision insns. It also removes the FLOAT128_IEEE_P check from pattern conditions when the pattern's mode is IEEE128, as the IEEE128 mode iterator already checks FLOAT128_IEEE_P. For test case float128-cmp2-runnable.c,

[PATCH-1v4] Value Range: Add range op for builtin isinf

2024-07-11 Thread HAO CHEN GUI
Hi, The builtin isinf is not folded at the front end if the corresponding optab exists. This causes range evaluation to fail on targets that have optab_isinf. For instance, range-sincos.c will fail on targets that have optab_isinf, as it calls builtin_isinf. This patch fixes the problem by
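Here is a much simplified sketch of what a range operator for isinf can conclude. It assumes a floating-point range is just an interval [lo, hi] plus a "may be NaN" flag. frange_t, irange_t and fold_range_isinf are hypothetical stand-ins, not GCC's range classes. If neither endpoint is infinite, the result folds to 0; an operand known to lie in [-1, 1], as a sin or cos result does, is one example. If the operand is a single infinity, the result folds to 1. Otherwise both results are possible.

```c
#include <math.h>
#include <stdbool.h>

/* Hypothetical, simplified floating-point range: every non-NaN value
   lies in [lo, hi]; maybe_nan says whether NaN is also possible. */
typedef struct {
    double lo, hi;
    bool maybe_nan;
} frange_t;

/* Result range [min, max] of an int-valued expression. */
typedef struct {
    int min, max;
} irange_t;

/* Range of isinf(x) given the range of x. NaN is never infinite,
   so maybe_nan can only rule out the "always 1" answer. */
irange_t fold_range_isinf(frange_t x)
{
    bool can_be_inf = isinf(x.lo) || isinf(x.hi);
    bool must_be_inf = !x.maybe_nan && x.lo == x.hi && isinf(x.lo);

    if (must_be_inf)
        return (irange_t){ 1, 1 };
    if (!can_be_inf)
        return (irange_t){ 0, 0 };
    return (irange_t){ 0, 1 };
}
```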

Re: Ping^3 [PATCH-1v3] Value Range: Add range op for builtin isinf

2024-07-11 Thread HAO CHEN GUI
Hi Ruoyao, Thanks for your info. I updated my patch and sent it for review. Thanks Gui Haochen On 2024/7/10 22:01, Xi Ruoyao wrote: > On Wed, 2024-07-10 at 21:54 +0800, Xi Ruoyao wrote: >> On Mon, 2024-07-01 at 09:11 +0800, HAO CHEN GUI wrote: >>> Hi, >>> Gently ping

Re: [PATCH, expand] Add const0 move checking for CLEAR_BY_PIECES optabs

2024-07-11 Thread HAO CHEN GUI
Hi Jeff, On 2024/7/11 6:25, Jeff Law wrote: > OK. But given this patch is several months old, can you re-bootstrap & test > before committing to the trunk. Thanks. I will rebase the patch and test it again. Thanks Gui Haochen

Re: [PATCH-1v4] Value Range: Add range op for builtin isinf

2024-07-11 Thread HAO CHEN GUI
Hi Jeff, Thanks for your comments. On 2024/7/12 6:13, Jeff Law wrote: > > > On 7/11/24 1:32 AM, HAO CHEN GUI wrote: >> Hi, >>    The builtin isinf is not folded at front end if the corresponding optab >> exists. It causes the range evaluation failed on the targets wh

[PATCHv2, rs6000] Add TARGET_FLOAT128_HW guard for quad-precision insns

2024-07-14 Thread HAO CHEN GUI
Hi, This patch adds TARGET_FLOAT128_HW into pattern conditions for quad-precision insns. Some qp patterns were originally guarded by TARGET_P9_VECTOR, so this patch replaces that guard with "TARGET_FLOAT128_HW". Test case float128-cmp2-runnable.c should be guarded with ppc_float128_hw as it calls qp insns

[PATCH, rs6000] Remove redundant guard for float128 mode patterns

2024-07-14 Thread HAO CHEN GUI
Hi, This patch removes the FLOAT128_IEEE_P guard when the mode of the pattern is IEEE128, and the FLOAT128_IBM_P guard when the mode of the pattern is IBM128. The mode iterators already perform these checks, so the guards are redundant. Bootstrapped and tested on powerpc64-linux BE and LE with no regressions. Is it OK for trunk

[PATCH-2v5, rs6000] Implement optab_isfinite for SFDF and IEEE128

2024-07-17 Thread HAO CHEN GUI
Hi, This patch implements optab_isfinite for SFDF and IEEE128 using test data class instructions. Compared with the previous version, the main change is merging the patterns of SFDF and IEEE128 into one. https://gcc.gnu.org/pipermail/gcc-patches/2024-June/655780.html Bootstrapped and tested on p
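
A test data class instruction reports whether its operand falls into any of the value classes selected by an immediate mask, so isfinite and isnormal can each be expressed as one such test followed by an inversion. The C sketch below emulates that idea in software with fpclassify. The DC_* bit values and the function names are hypothetical and chosen only for illustration; consult the Power ISA for the real mask encoding of the rs6000 test data class instructions.

```c
#include <math.h>
#include <stdbool.h>

/* Hypothetical class-selection bits (illustrative, not an ISA encoding). */
enum {
    DC_NAN      = 1 << 6,
    DC_POS_INF  = 1 << 5,
    DC_NEG_INF  = 1 << 4,
    DC_POS_ZERO = 1 << 3,
    DC_NEG_ZERO = 1 << 2,
    DC_POS_DEN  = 1 << 1,   /* positive subnormal ("denormal") */
    DC_NEG_DEN  = 1 << 0    /* negative subnormal */
};

/* Software model of a test data class: true if x is in any class in mask.
   Normal numbers belong to none of the classes above. */
bool test_data_class(double x, unsigned mask)
{
    bool neg = signbit(x);
    unsigned cls;
    switch (fpclassify(x)) {
    case FP_NAN:       cls = DC_NAN; break;
    case FP_INFINITE:  cls = neg ? DC_NEG_INF : DC_POS_INF; break;
    case FP_ZERO:      cls = neg ? DC_NEG_ZERO : DC_POS_ZERO; break;
    case FP_SUBNORMAL: cls = neg ? DC_NEG_DEN : DC_POS_DEN; break;
    default:           cls = 0; break;   /* FP_NORMAL */
    }
    return (cls & mask) != 0;
}

/* isfinite: x is neither NaN nor an infinity. */
bool isfinite_via_class(double x)
{
    return !test_data_class(x, DC_NAN | DC_POS_INF | DC_NEG_INF);
}

/* isnormal: x is in none of the special classes. */
bool isnormal_via_class(double x)
{
    return !test_data_class(x, DC_NAN | DC_POS_INF | DC_NEG_INF
                               | DC_POS_ZERO | DC_NEG_ZERO
                               | DC_POS_DEN | DC_NEG_DEN);
}
```

The same single test with a different mask also covers isinf (select only the two infinity bits, without inversion).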

[PATCH-3v5, rs6000] Implement optab_isnormal for SFDF and IEEE128

2024-07-17 Thread HAO CHEN GUI
Hi, This patch implements optab_isnormal for SFDF and IEEE128 using test data class instructions. Compared with the previous version, the main change is merging the patterns of SFDF and IEEE128 into one. https://gcc.gnu.org/pipermail/gcc-patches/2024-June/655781.html Bootstrapped and tested on p

Re: [PATCHv2] Value range: Add range op for __builtin_isfinite

2024-05-14 Thread HAO CHEN GUI
Hi Mikael, Thanks for your comments. On 2024/5/9 16:03, Mikael Morin wrote: > I think the canonical API behaviour sets R to varying and returns true > instead of just returning false if nothing is known about the range. > > I'm not sure whether it makes any difference; Aldy can probably tell. But

Re: [PATCHv2] Value range: Add range op for __builtin_isfinite

2024-05-14 Thread HAO CHEN GUI
Hi Jakub, Thanks for your review comments. On 2024/5/14 23:57, Jakub Jelinek wrote: > BUILT_IN_ISFINITE is just one of many BUILT_IN_IS... builtins, > would be nice to handle the others as well. > > E.g. isnormal/isnan/isinf, fpclassify etc. > Yes, I already sent the patches which add range op for

Re: [PATCHv2] Value range: Add range op for __builtin_isfinite

2024-05-14 Thread HAO CHEN GUI
Hi Andrew, Thanks so much for your explanation. I got it. I will address the issue. Thanks Gui Haochen On 2024/5/15 2:45, Andrew MacLeod wrote: > > On 5/9/24 04:47, HAO CHEN GUI wrote: >> Hi Mikael, >> >>    Thanks for your comments. >> >> On 2024/5/9 16:

Re: [PATCH-4, rs6000] Implement optab_isnormal for SFmode, DFmode and TFmode [PR97786]

2024-05-16 Thread HAO CHEN GUI
Hi Segher, Thanks for your review comments. I will modify it and resend. Just one question on the insn condition. On 2024/5/17 1:25, Segher Boessenkool wrote: >> +(define_expand "isnormal2" >> + [(use (match_operand:SI 0 "gpc_reg_operand")) >> +(use (match_operand:SFDF 1 "gpc_reg_operand"))] >>

[PATCH-1v2, rs6000] Implement optab_isinf for SFDF and IEEE128

2024-05-19 Thread HAO CHEN GUI
Hi, This patch implements optab_isinf for SFDF and IEEE128 using test data class instructions. Compared with the previous version, the main change is modifying the dg-options and dg-final directives of the test cases according to the reviewer's advice. https://gcc.gnu.org/pipermail/gcc-patches/2024-March/648304.html

[PATCH-2v2, rs6000] Implement optab_isfinite for SFDF and IEEE128

2024-05-19 Thread HAO CHEN GUI
Hi, This patch implements optab_isfinite for SFDF and IEEE128 using test data class instructions. Compared with the previous version, the main changes are no longer testing whether a pseudo can be created in the expander, and modifying the dg-options and dg-final directives of the test cases according to the reviewer's advice. https://gcc.gnu.or

[PATCH-3v2, rs6000] Implement optab_isnormal for SFDF and IEEE128

2024-05-19 Thread HAO CHEN GUI
Hi, This patch implements optab_isnormal for SFDF and IEEE128 using test data class instructions. Compared with the previous version, the main changes are no longer testing whether a pseudo can be created in the expander, and modifying the dg-options and dg-final directives of the test cases according to the reviewer's advice. https://gcc.gnu.or

Re: [PATCH] Optab: add isfinite_optab for __builtin_isfinite

2024-05-19 Thread HAO CHEN GUI
Hi Andrew, On 2024/5/19 3:42, Andrew Pinski wrote: > This is missing adding documentation for the new optab. > It should be documented in md.texi under `Standard Pattern Names For > Generation` section. Thanks for the reminder. I will add documentation for all the patches. Thanks Gui Haochen

[PATCHv2] Optab: add isfinite_optab for __builtin_isfinite

2024-05-20 Thread HAO CHEN GUI
Hi, This patch adds an optab for __builtin_isfinite. The finite check can be implemented on rs6000 by a single instruction, but it needs an optab so that it can be expanded to that specific instruction sequence. Subsequent patches will implement the expander on rs6000. Compared to the previous version, the
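
For comparison, without an isfinite optab the check has to be open-coded from more basic operations. A common portable formulation is sketched below; it is an illustration, not necessarily the exact sequence GCC emits. It uses signbit and the C99 quiet comparison macros so that a NaN operand does not raise the invalid exception. isfinite_generic and isnormal_generic are hypothetical names.

```c
#include <float.h>
#include <math.h>
#include <stdbool.h>

/* |x| computed with signbit and negation, both of which are quiet
   (no FP exception), unlike an ordinary x < 0 test on a NaN. */
static double quiet_abs(double x)
{
    return signbit(x) ? -x : x;
}

/* Finite: |x| <= DBL_MAX.  islessequal is false for NaN and for both
   infinities, without raising FE_INVALID. */
bool isfinite_generic(double x)
{
    return islessequal(quiet_abs(x), DBL_MAX);
}

/* Normal: DBL_MIN <= |x| <= DBL_MAX, which rejects zeros, subnormals,
   infinities and NaN. */
bool isnormal_generic(double x)
{
    double ax = quiet_abs(x);
    return isgreaterequal(ax, DBL_MIN) && islessequal(ax, DBL_MAX);
}
```

Each function needs an absolute value plus one or two comparisons, whereas rs6000 can answer the same question with a single test data class instruction, which is the motivation the patch gives for the optab.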

[PATCHv2] Optab: add isnormal_optab for __builtin_isnormal

2024-05-20 Thread HAO CHEN GUI
Hi, This patch adds an optab for __builtin_isnormal. The normal check can be implemented on rs6000 by a single instruction, but it needs an optab so that it can be expanded to that specific instruction sequence. Subsequent patches will implement the expander on rs6000. Compared to the previous version, the

[PATCH-1v2] Value Range: Add range op for builtin isinf

2024-05-20 Thread HAO CHEN GUI
Hi, The builtin isinf is not folded at the front end if the corresponding optab exists. This causes range evaluation to fail on targets that have optab_isinf. For instance, range-sincos.c will fail on such targets, as it calls builtin_isinf. This patch fixes the problem by

[PATCH-2v3] Value Range: Add range op for builtin isfinite

2024-05-20 Thread HAO CHEN GUI
Hi, This patch adds the range op for builtin isfinite. Compared to the previous version, the main change is setting the result to varying if nothing is known about the range. https://gcc.gnu.org/pipermail/gcc-patches/2024-May/650857.html Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no reg
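
The "set varying" change can be pictured with a simplified model. In this sketch (hypothetical names, not GCC's range-op interface; a range is [lo, hi] with lo <= hi and no NaN endpoints, plus a may-be-NaN flag), fold_isfinite reports FOLD_VARYING instead of giving up when the range does not settle the answer, and narrow_on_isfinite_true shows the reverse direction: once isfinite(x) is known to be true, x can be neither NaN nor infinite.

```c
#include <float.h>
#include <math.h>
#include <stdbool.h>

/* Hypothetical simplified range: [lo, hi] plus NaN if maybe_nan. */
struct frange_model { double lo, hi; bool maybe_nan; };
enum fold_result { FOLD_FALSE, FOLD_TRUE, FOLD_VARYING };

/* Forward direction: what does isfinite(x) evaluate to for x in r? */
enum fold_result fold_isfinite(struct frange_model r)
{
    /* Finite endpoints and no NaN: every value is finite. */
    if (!isinf(r.lo) && !isinf(r.hi) && !r.maybe_nan)
        return FOLD_TRUE;
    /* A single infinity: never finite.  NaN is not finite either,
       so a possible NaN does not change the answer. */
    if (r.lo == r.hi && isinf(r.lo))
        return FOLD_FALSE;
    /* Nothing is known: report varying rather than failing. */
    return FOLD_VARYING;
}

/* Reverse direction: isfinite(x) is known true, so clamp the endpoints
   to the finite range and drop the NaN possibility. */
struct frange_model narrow_on_isfinite_true(struct frange_model r)
{
    if (r.lo < -DBL_MAX)
        r.lo = -DBL_MAX;
    if (r.hi > DBL_MAX)
        r.hi = DBL_MAX;
    r.maybe_nan = false;
    return r;
}
```

If narrowing produces lo > hi, the range is empty, meaning that path cannot be taken.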

[PATCH-3] Value Range: Add range op for builtin isnormal

2024-05-20 Thread HAO CHEN GUI
Hi, This patch adds the range op for builtin isnormal. It also adds two helper functions in frange to detect a range of normal floating-point values and a range of subnormal values or zero. Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no regressions. Is it OK for the trunk? Thanks Gui Haochen
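
A simplified sketch of the two kinds of helper checks and how they could drive folding of isnormal. struct frange_model, range_all_normal, range_subnormal_or_zero and fold_isnormal are hypothetical names for illustration; the actual helper names and signatures in GCC's frange are not shown in this excerpt. The model assumes lo <= hi with neither endpoint NaN.

```c
#include <float.h>
#include <stdbool.h>

/* Hypothetical simplified range: [lo, hi] plus NaN if maybe_nan. */
struct frange_model { double lo, hi; bool maybe_nan; };
enum fold_result { FOLD_FALSE, FOLD_TRUE, FOLD_VARYING };

/* Every value is normal: no NaN, and the interval lies entirely within
   [DBL_MIN, DBL_MAX] or entirely within [-DBL_MAX, -DBL_MIN]. */
bool range_all_normal(struct frange_model r)
{
    if (r.maybe_nan)
        return false;
    bool pos = r.lo >= DBL_MIN && r.hi <= DBL_MAX;
    bool neg = r.lo >= -DBL_MAX && r.hi <= -DBL_MIN;
    return pos || neg;
}

/* The numeric part of the range holds only zeros and subnormals,
   i.e. lies strictly inside (-DBL_MIN, DBL_MIN).  Any NaN is ignored. */
bool range_subnormal_or_zero(struct frange_model r)
{
    return r.lo > -DBL_MIN && r.hi < DBL_MIN;
}

enum fold_result fold_isnormal(struct frange_model r)
{
    if (range_all_normal(r))
        return FOLD_TRUE;
    /* NaN is not normal either, so a possible NaN keeps the answer false. */
    if (range_subnormal_or_zero(r))
        return FOLD_FALSE;
    return FOLD_VARYING;
}
```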

Re: [PATCH-1v2, rs6000] Implement optab_isinf for SFDF and IEEE128

2024-05-23 Thread HAO CHEN GUI
Hi Peter, Thanks for your comments. On 2024/5/23 5:58, Peter Bergner wrote: > Is there a reason not to use the vsx_register_operand predicate for op1 > which matches the predicate for the operand of the xststdcp pattern > we're passing op1 to? No, I will fix them. Thanks Gui Haochen
