Re: [PATCH v2] rs6000: Change bitwise xor to an equality operator [PR106907]

2023-10-16 Thread Kewen.Lin
Hi, on 2023/10/11 19:50, jeevitha wrote: > Hi All, > > The following patch has been bootstrapped and regtested on powerpc64le-linux. > > PR106907 has a few warnings spotted from cppcheck. These warnings > are related to the need of precedence clarification. Instead of using xor, > it has been ch
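A minimal sketch of why an equality operator can stand in for a bitwise xor, assuming both operands have first been normalized to 0 or 1. The helper names differ_xor and differ_ne are hypothetical and are not taken from the patch, whose actual hunks are not shown in the preview above.

```c
#include <stdbool.h>

/* Hypothetical helpers for illustration only; not code from the patch. */

/* Xor form: correct, but the reader has to recall how `!` and `^`
   bind relative to each other. */
static bool differ_xor(int a, int b)
{
    return (!a) ^ (!b);
}

/* Inequality form: the same truth table once both sides are 0 or 1,
   with no precedence question left open. */
static bool differ_ne(int a, int b)
{
    return (!a) != (!b);
}
```

Both helpers return true exactly when one argument is zero and the other is not, so the rewrite should be behavior-preserving under the stated assumption.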

Re: [PATCH-2, rs6000] Enable vector mode for memory equality compare [PR111449]

2023-10-16 Thread Kewen.Lin
Hi, on 2023/10/10 16:18, HAO CHEN GUI wrote: > Hi David, > > Thanks for your review comments. > > On 2023/10/9 23:42, David Edelsohn wrote: >>  #define MOVE_MAX (! TARGET_POWERPC64 ? 4 : 8) >>  #define MAX_MOVE_MAX 8 >> +#define MOVE_MAX_PIECES (!TARGET_POWERPC64 ? 4 : 16) >> +#def

[PATCH] vect: Cost adjacent vector loads/stores together [PR111784]

2023-10-17 Thread Kewen.Lin
Hi, As comments[1][2], this patch is to change the costing way on some adjacent vector loads/stores from costing one by one to costing them together with the total number once. It helps to fix the exposed regression PR111784 on aarch64, as aarch64 specific costing could make different decisions a

Re: [PATCH] vect: Cost adjacent vector loads/stores together [PR111784]

2023-10-22 Thread Kewen.Lin
Hi Richard, on 2023/10/20 06:12, Richard Sandiford wrote: > "Kewen.Lin" writes: >> Hi, >> >> As comments[1][2], this patch is to change the costing way >> on some adjacent vector loads/stores from costing one by >> one to costing them together with t

Re: [PATCH 1/3]rs6000: update num_insns_constant for 2 insns

2023-10-24 Thread Kewen.Lin
Hi, on 2023/10/25 10:00, Jiufu Guo wrote: > Hi, > > Trunk gcc supports more constants to be built via two instructions: e.g. > "li/lis; xori/xoris/rldicl/rldicr/rldic". > And then num_insns_constant should also be updated. > Thanks for updating this. > Bootstrap & regtest pass ppc64{,le}. > Is

Re: [PATCH 2/3]rs6000: using 'pli' to load 34bit-constant

2023-10-24 Thread Kewen.Lin
on 2023/10/25 10:00, Jiufu Guo wrote: > Hi, > > For constants with 16bit values, 'li or lis' can be used to generate > the value. For 34bit constant, 'pli' is ok to generate the value. > > Bootstrap&regtest pass on ppc64{,le}. > Is this ok for trunk? > > BR, > Jeff (Jiufu Guo) > > gcc/ChangeLog:
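A rough sketch of the range checks behind the quoted description, assuming li takes a signed 16-bit immediate, lis places a signed 16-bit immediate in bits 16..31 (sign-extended), and pli takes a signed 34-bit immediate. The names fits_signed, load_kind and classify_constant are hypothetical; this is not the rs6000 backend's own logic.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if VALUE is representable as a signed immediate of BITS bits.
   BITS is assumed to be in the range 1..63. */
static bool fits_signed(int64_t value, unsigned bits)
{
    int64_t lo = -((int64_t)1 << (bits - 1));
    int64_t hi = ((int64_t)1 << (bits - 1)) - 1;
    return value >= lo && value <= hi;
}

/* Hypothetical classification, for illustration only. */
enum load_kind { LOAD_LI, LOAD_LIS, LOAD_PLI, LOAD_OTHER };

static enum load_kind classify_constant(int64_t value)
{
    if (fits_signed(value, 16))
        return LOAD_LI;                 /* one li */
    if ((value & 0xffff) == 0 && fits_signed(value, 32))
        return LOAD_LIS;                /* one lis: low 16 bits are zero */
    if (fits_signed(value, 34))
        return LOAD_PLI;                /* one prefixed pli */
    return LOAD_OTHER;                  /* needs a longer sequence */
}
```

For instance, classify_constant(0x10000) falls in the lis case, while 32768 is just outside the li range and has nonzero low bits, so it falls through to pli.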

Re: [PATCH 3/3]rs6000: split complicate constant to constant pool

2023-10-24 Thread Kewen.Lin
Hi, on 2023/10/25 10:00, Jiufu Guo wrote: > Hi, > > Sometimes, a complicated constant is built via 3(or more) > instructions to build. Generally speaking, it would not be > as faster as loading it from the constant pool (as a few > discussions in PR63281). I may miss some previous discussions, b

[PATCH v3] sched: Change no_real_insns_p to no_real_nondebug_insns_p [PR108273]

2023-10-24 Thread Kewen.Lin
Hi, This is almost a repost for v2 which was posted at[1] in March excepting for: 1) rebased from r14-4810 which is relatively up-to-date, some conflicts on "int to bool" return type change have been resolved; 2) adjust commit log a bit; 3) fix misspelled "articial" with "artificia

PING^5 [PATCH 0/9] rs6000: Rework rs6000_emit_vector_compare

2023-10-24 Thread Kewen.Lin
Hi, Gentle ping this series: https://gcc.gnu.org/pipermail/gcc-patches/2022-November/607146.html BR, Kewen on 2022/11/24 17:15, Kewen Lin wrote: > Hi, > > Following Segher's suggestion, this patch series is to rework > function rs6000_emit_vector_compare for vector float an

PING^3 [PATCH v2] rs6000: Don't use optimize_function_for_speed_p too early [PR108184]

2023-10-24 Thread Kewen.Lin
Hi, Gentle ping this: https://gcc.gnu.org/pipermail/gcc-patches/2023-January/609993.html BR, Kewen >> on 2023/1/16 17:08, Kewen.Lin via Gcc-patches wrote: >>> Hi, >>> >>> As Honza pointed out in [1], the current uses of function >

Re: [PATCH 3/3]rs6000: split complicate constant to constant pool

2023-10-25 Thread Kewen.Lin
on 2023/10/25 16:14, Jiufu Guo wrote: > > Hi, > > "Kewen.Lin" writes: > >> Hi, >> >> on 2023/10/25 10:00, Jiufu Guo wrote: >>> Hi, >>> >>> Sometimes, a complicated constant is built via 3(or more) >>> instructions

[PATCH] rs6000: Consider inline asm as safe if no assembler complains [PR111828]

2023-10-29 Thread Kewen.Lin
Hi, As discussed in PR111828, rs6000_update_ipa_fn_target_info is quite conservative: currently, for any non-empty inline asm, without any parsing, it assumes the inline asm could have HTM insns. It means for one function attributed with power8 having inline asm, even if it has no HTM insns, we don'

Re: [PATCH] rs6000, Add missing overloaded bcd builtin tests

2023-10-30 Thread Kewen.Lin
Hi Carl, on 2023/10/31 08:08, Carl Love wrote: > GCC maintainers: > > The following patch adds tests for two of the rs6000 overloaded built- > ins that do not have tests. Additionally the GCC documentation file I just found that actually they have the test coverage, because we have #define __b

[PATCH 1/3][rs6000] Replace vsx_xvcdpsp by vsx_xvcvdpsp

2019-10-23 Thread Kewen.Lin
Hi, I noticed that vsx_xvcdpsp and vsx_xvcvdpsp are almost the same, and vsx_xvcdpsp looks replaceable with vsx_xvcvdpsp since it's only called by gen_*. Bootstrapped and regress tested on powerpc64le-linux-gnu. gcc/ChangeLog 2019-10-23 Kewen Lin * config/rs6000/vsx.md (vsx_xvcdpsp

[PATCH 2/3][rs6000] vector conversion RTL pattern update for same unit size

2019-10-23 Thread Kewen.Lin
Hi, For those fixed point <-> floating point vector conversion with same element unit size, such as: SP <-> SI, DP <-> DI, it's fine to use the existing RTL operations like any_fix/any_float for them. This patch is to update them with any_fix/any_float. Bootstrapped and regress tested on powerpc

[PATCH 3/3][rs6000] vector conversion RTL pattern update for diff unit size

2019-10-23 Thread Kewen.Lin
Hi, Following the previous one 2/3, this patch is to update the vector conversions between fixed point and floating point with different element unit sizes, such as: SP <-> DI, DP <-> SI. Bootstrap and regression testing just launched. gcc/ChangeLog 2019-10-23 Kewen Lin * config/rs

[PATCH rs6000]Fix PR92132

2019-10-25 Thread Kewen.Lin
Hi, To support full condition reduction vectorization, we have to define vec_cmp_* and vcond_mask_*. This patch is to add related expands. Add vector_{ungt,unge,unlt,unle} for unique vector_* interface support. Regression testing just launched. gcc/ChangeLog 2019-10-25 Kewen Lin PR

Re: [PATCH rs6000]Fix PR92132

2019-10-28 Thread Kewen.Lin
Fixed one place without consistent mode. Bootstrapped and regress testing passed on powerpc64le-linux. Thanks! Kewen --- gcc/ChangeLog 2019-10-25 Kewen Lin PR target/92132 * config/rs6000/rs6000.md (one_cmpl3_internal): Expose name. * config/rs6000/vector.md (fpcm

[PATCH, rs6000] Fix PR92127

2019-10-30 Thread Kewen.Lin
Hi, As PR92127 shows, recent commit r276645 enables more unrollings, two ppc vectorization cost model test cases are fragile and failed after the change. This patch is to disable unrolling for the loops of interest to make test cases more robust. Verified on ppc64-redhat-linux. Should be fine o

[PATCH 3/3 V2][rs6000] vector conversion RTL pattern update for diff unit size

2019-10-31 Thread Kewen.Lin
Hi Segher, Thanks a lot for the comments. on 2019/10/31 2:49 AM, Segher Boessenkool wrote: > Hi! > > On Wed, Oct 23, 2019 at 05:42:45PM +0800, Kewen.Lin wrote: >> Following the previous one 2/3, this patch is to update the >> vector conversions between fixed point and f

Re: [PATCH 3/3 V2][rs6000] vector conversion RTL pattern update for diff unit size

2019-10-31 Thread Kewen.Lin
Hi Segher, on 2019/11/1 2:49 AM, Segher Boessenkool wrote: > Hi! > > On Thu, Oct 31, 2019 at 05:35:22PM +0800, Kewen.Lin wrote: >>>> +/* Half VMX/VSX vector (for select) */ >>>> +VECTOR_MODE (FLOAT, SF, 2); /* V2SF */ >>>> +VECTOR_M

[PATCH, rs6000] Make load cost more in vectorization cost for P8/P9

2019-11-03 Thread Kewen.Lin
Hi, To align with rs6000_insn_cost costing more for load type insns, this patch is to make load insns cost more in vectorization cost function. Considering that the result of load usually is used somehow later (true-dep) but store won't, we keep the store as before. The SPEC2017 performance eval

Re: [PATCH V3] rs6000: Refine small loop unroll in loop_unroll_adjust hook

2019-11-04 Thread Kewen.Lin
Hi Jeff, Thanks for the patch, I learned a lot from it. Some nits embedded. on 2019/11/4 2:31 PM, Jiufu Guo wrote: > Hi, > > In this patch, loop unroll adjust hook is introduced for powerpc. We can do > target related heuristic adjustment in this hook. In this patch, small loops > is unrolled 2

Re: [PATCH, rs6000] Make load cost more in vectorization cost for P8/P9

2019-11-04 Thread Kewen.Lin
Hi Segher, Thanks for the comments! on 2019/11/5 4:21 AM, Segher Boessenkool wrote: > Hi! > > On Mon, Nov 04, 2019 at 03:16:06PM +0800, Kewen.Lin wrote: >> To align with rs6000_insn_cost costing more for load type insns, > > (Which itself has history in rs6000_rtx_costs). &

Re: [PATCH v2] PR92090: Fix testcase failures by r276469

2019-11-04 Thread Kewen.Lin
on 2019/11/5 6:57 AM, Joseph Myers wrote: > On Mon, 4 Nov 2019, luoxhu wrote: > >> -finline-functions is enabled by default for O2 since r276469, update the >> test cases with -fno-inline-functions. >> >> v2: disable inlining for the failed cases. Add two more failed cases >> not listed in BZ. Te

Re: [PATCH rs6000]Fix PR92132

2019-11-05 Thread Kewen.Lin
Hi Segher, Thanks for the comments! on 2019/11/2 7:17 AM, Segher Boessenkool wrote: > On Tue, Oct 29, 2019 at 01:16:53PM +0800, Kewen.Lin wrote: >> (vcond_mask_): New expand. > > Say for which mode please? Like > (vcond_mask_ for VEC_I and VEC_I): New expand.

[PATCH, rs6000 v2] Make load cost more in vectorization cost for P8/P9

2019-11-06 Thread Kewen.Lin
Hi Segher, on 2019/11/7 1:38 AM, Segher Boessenkool wrote: > Hi! > > On Tue, Nov 05, 2019 at 10:14:46AM +0800, Kewen.Lin wrote: >>>> + benefits were observed on Power8 and up, we can unify it if similar >>>> + profits are measured on Power6 and Power7.

Re: [PATCH rs6000]Fix PR92132

2019-11-07 Thread Kewen.Lin
Hi Segher, on 2019/11/7 7:49 AM, Segher Boessenkool wrote: > > The expander named "one_cmpl3": > > Erm. 2, not 3 :-) > > (define_expand "one_cmpl2" > [(set (match_operand:BOOL_128 0 "vlogical_operand") > (not:BOOL_128 (match_operand:BOOL_128 1 "vlogical_operand")))] > "" > "") >

Re: [PATCH, rs6000 v2] Make load cost more in vectorization cost for P8/P9

2019-11-07 Thread Kewen.Lin
Hi Segher, on 2019/11/8 6:36 AM, Segher Boessenkool wrote: > On Thu, Nov 07, 2019 at 11:22:12AM +0800, Kewen.Lin wrote: >> One updated patch to enable it everywhere attached. > >> 2019-11-07 Kewen Lin >> >> * config/rs6000/rs6000.c (rs6000_bui

Re: [PATCH rs6000]Fix PR92132

2019-11-07 Thread Kewen.Lin
Hi Segher, on 2019/11/8 8:07 AM, Segher Boessenkool wrote: > Hi! > >>> Half are pretty simple: >>> >>> lt(a,b) = gt(b,a) >>> gt(a,b) = gt(a,b) >>> eq(a,b) = eq(a,b) >>> le(a,b) = ge(b,a) >>> ge(a,b) = ge(a,b) >>> >>> ltgt(a,b) = ge(a,b) ^ ge(b,a) >>> ord(a,b) = ge(a,b) | ge(b,a) >>> >>> The other
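The quoted identities can be checked on scalars. The sketch below is only a scalar model, assuming IEEE semantics where any comparison involving a NaN is false; the real patterns operate on vector masks, and the helper names are hypothetical.

```c
#include <math.h>
#include <stdbool.h>

/* The two primitive comparisons everything else is built from. */
static bool gt(double a, double b) { return a > b; }
static bool ge(double a, double b) { return a >= b; }

/* Derived forms, following the identities quoted above. */
static bool lt_via_gt(double a, double b)   { return gt(b, a); }
static bool le_via_ge(double a, double b)   { return ge(b, a); }
static bool ltgt_via_ge(double a, double b) { return ge(a, b) ^ ge(b, a); }
static bool ord_via_ge(double a, double b)  { return ge(a, b) | ge(b, a); }

/* Compare each derived form against the C operator or <math.h> macro
   with the same meaning. */
static bool quoted_identities_hold(double a, double b)
{
    return lt_via_gt(a, b) == (a < b)
        && le_via_ge(a, b) == (a <= b)
        && ltgt_via_ge(a, b) == (bool) islessgreater(a, b)
        && ord_via_ge(a, b) == !isunordered(a, b);
}
```

With a NaN operand both ge calls are false, so ltgt and ord both come out false, which matches their unordered-aware definitions.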

Re: [PATCH rs6000]Fix PR92132

2019-11-10 Thread Kewen.Lin
Hi Segher, on 2019/11/9 1:36 AM, Segher Boessenkool wrote: > Hi! > > On Fri, Nov 08, 2019 at 10:38:13AM +0800, Kewen.Lin wrote: >>>> + [(set (match_operand: 0 "vint_operand") >>>> + (match_operator 1 "comparison_operator" >>> &g

[PATCH, rs6000] Refactor FP vector comparison operators

2019-11-10 Thread Kewen.Lin
Hi, This is a subsequent patch to refactor the existing float point vector comparison operator supports. The patch to fix PR92132 supplemented vector float point comparison by exposing the names for unordered/ordered/uneq/ltgt and adding ungt/unge/unlt/unle/ ne. As Segher pointed out, some patte

Re: [PATCH, rs6000] Refactor FP vector comparison operators

2019-11-12 Thread Kewen.Lin
Hi Segher, on 2019/11/11 8:51 PM, Segher Boessenkool wrote: > Hi! > >> pattern 1: >> lt(a,b) = gt(b,a) >> le(a,b) = ge(b,a) > > This is done by swap_condition normally. Nice! Done. > >> pattern 2: >> unge(a,b) = ~gt(b,a) >> unle(a,b) = ~gt(a,b) >> ne(a,b) = ~eq(a,b) >> ungt(a,b)
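A scalar sketch of the "pattern 2" negations that are visible in the preview, assuming IEEE NaN semantics. Logical `!` stands in for the vector mask complement `~`, and all helper names are hypothetical rather than taken from the patch.

```c
#include <math.h>
#include <stdbool.h>

static bool cmp_gt(double a, double b) { return a > b; }
static bool cmp_eq(double a, double b) { return a == b; }

/* unge = "unordered or >=", which is exactly "not (b > a)". */
static bool unge_via_not_gt(double a, double b) { return !cmp_gt(b, a); }
/* unle = "unordered or <=", which is exactly "not (a > b)". */
static bool unle_via_not_gt(double a, double b) { return !cmp_gt(a, b); }
/* ne is true for unordered operands too, like "not eq". */
static bool ne_via_not_eq(double a, double b)   { return !cmp_eq(a, b); }

/* Check the negated forms against their direct definitions. */
static bool negation_patterns_hold(double a, double b)
{
    bool unordered = isunordered(a, b);
    return unge_via_not_gt(a, b) == (unordered || a >= b)
        && unle_via_not_gt(a, b) == (unordered || a <= b)
        && ne_via_not_eq(a, b) == (a != b);
}
```

The negation works because an ordered comparison is false on NaN, so its complement is automatically true for unordered operands.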

[PATCH, testsuite] Fix PR92464 by adjust test case loop bound

2019-11-12 Thread Kewen.Lin
Hi, As PR92464 shows, the recent vectorization cost adjustment on load insns is responsible for this regression. It leads the profitable min iteration count to change from 19 to 12. The case happens to hit the threshold. By actual runtime performance evaluation, the vectorized version perform o

[PATCH] Fix typo and avoid possible memory leak

2020-01-12 Thread Kewen.Lin
Hi, Function average_num_loop_insns forgets to free loop body in early return. Besides, overflow comparison checks 1000000 (e6) but the return value is 100000 (e5), I guess it's unexpected, a typo? Bootstrapped and regress tested on powerpc64le-linux-gnu. I guess this should go to GCC11? Is i

Re: [PATCH] Fix typo and avoid possible memory leak

2020-01-14 Thread Kewen.Lin
on 2020/1/13 6:46 PM, Richard Sandiford wrote: > "Kewen.Lin" writes: >> Hi, >> >> Function average_num_loop_insns forgets to free loop body in early return. >> Besides, overflow comparison checks 1000000 (e6) but the return value is >> 1

[PATCH 0/4 GCC11] IVOPTs consider step cost for different forms when unrolling

2020-01-16 Thread Kewen.Lin
Hi, As we discussed in the thread https://gcc.gnu.org/ml/gcc-patches/2020-01/msg00196.html Original: https://gcc.gnu.org/ml/gcc-patches/2020-01/msg00104.html, I'm working to teach IVOPTs to consider D-form group access during unrolling. The difference on D-form and other forms during unrolling is

[PATCH 1/4 GCC11] Add middle-end unroll factor estimation

2020-01-16 Thread Kewen.Lin
gcc/ChangeLog 2020-01-16 Kewen Lin * cfgloop.h (struct loop): New field estimated_uf. * config/rs6000/rs6000.c (TARGET_LOOP_UNROLL_ADJUST_TREE): New macro. (rs6000_loop_unroll_adjust_tree): New function. * doc/tm.texi: Regenerate. * doc/tm.texi.in (TARGE

[PATCH 2/4 GCC11] Add target hook stride_dform_valid_p

2020-01-16 Thread Kewen.Lin
gcc/ChangeLog 2020-01-16 Kewen Lin * config/rs6000/rs6000.c (TARGET_STRIDE_DFORM_VALID_P): New macro. (rs6000_stride_dform_valid_p): New function. * doc/tm.texi: Regenerate. * doc/tm.texi.in (TARGET_STRIDE_DFORM_VALID_P): New hook. * target.def (stride_

[PATCH 3/4 GCC11] IVOPTs Consider cost_step on different forms during unrolling

2020-01-16 Thread Kewen.Lin
gcc/ChangeLog 2020-01-16 Kewen Lin * tree-ssa-loop-ivopts.c (struct iv_group): New field dform_p. (struct iv_cand): New field dform_p. (struct ivopts_data): New field mark_dform_p. (record_group): Initialize dform_p. (mark_dform_groups): New function.

[PATCH 4/4 GCC11] rs6000: P9 D-form test cases

2020-01-16 Thread Kewen.Lin
gcc/testsuite/ChangeLog 2020-01-16 Kelvin Nilsen Kewen Lin * gcc.target/powerpc/p9-dform-0.c: New test. * gcc.target/powerpc/p9-dform-1.c: New test. * gcc.target/powerpc/p9-dform-2.c: New test. * gcc.target/powerpc/p9-dform-3.c: New test.

Re: [PATCH 0/4 GCC11] IVOPTs consider step cost for different forms when unrolling

2020-02-09 Thread Kewen.Lin
Hi Segher, on 2020/1/20 8:33 PM, Segher Boessenkool wrote: > Hi! > > On Thu, Jan 16, 2020 at 05:36:52PM +0800, Kewen.Lin wrote: >> As we discussed in the thread >> https://gcc.gnu.org/ml/gcc-patches/2020-01/msg00196.html >> Original: https://gcc.gnu.org/ml/gcc-pa

[PATCH 1/4 v2 GCC11] Add middle-end unroll factor estimation

2020-02-09 Thread Kewen.Lin
(tree_average_num_loop_insns): New function. * tree-ssa-loop.h (tree_average_num_loop_insns): New declare. BR, Kewen on 2020/1/20 9:02 PM, Segher Boessenkool wrote: > Hi! > > On Thu, Jan 16, 2020 at 05:39:40PM +0800, Kewen.Lin wrote: >> --- a/gcc/cfgloop.h >> +++ b/gcc/cfgloop.

[PATCH 4/4 v2 GCC11] rs6000: P9 D-form test cases

2020-02-09 Thread Kewen.Lin
-dform-2.c: New test. * gcc.target/powerpc/p9-dform-3.c: New test. * gcc.target/powerpc/p9-dform-4.c: New test. * gcc.target/powerpc/p9-dform-generic.h: New test. on 2020/1/20 9:19 PM, Segher Boessenkool wrote: > Hi! > > On Thu, Jan 16, 2020 at 05:42:41PM +0800,

Re: [PATCH 0/4 GCC11] IVOPTs consider step cost for different forms when unrolling

2020-02-10 Thread Kewen.Lin
on 2020/2/11 5:29 AM, Segher Boessenkool wrote: > Hi! > > On Mon, Feb 10, 2020 at 02:17:04PM +0800, Kewen.Lin wrote: >> on 2020/1/20 8:33 PM, Segher Boessenkool wrote: >>> On Thu, Jan 16, 2020 at 05:36:52PM +0800, Kewen.Lin wrote: >>>> As we discussed in the th

Re: [PATCH 1/4 v2 GCC11] Add middle-end unroll factor estimation

2020-02-10 Thread Kewen.Lin
Hi Jeff, on 2020/2/11 10:14 AM, Jiufu Guo wrote: > "Kewen.Lin" writes: > >> Hi Segher, >> >> Thanks for your comments! Updated to v2 as below: >> >> 1) Removed unnecessary hook loop_unroll_adjust_tree. >> 2) Updated estimated_uf to est

[PATCH 1/4 v3 GCC11] Add middle-end unroll factor estimation

2020-02-10 Thread Kewen.Lin
w declaration. * tree-ssa-loop.c (tree_average_num_loop_insns): New function. * tree-ssa-loop.h (tree_average_num_loop_insns): New declaration. on 2020/2/11 7:34 AM, Segher Boessenkool wrote: > Hi! > > On Mon, Feb 10, 2020 at 02:20:17PM +0800, Kewen.Lin wrote: >>

[PATCH, IRA] Fix PR91052 by skipping multiple_sets insn in combine_and_move_insns

2020-02-11 Thread Kewen.Lin
Hi, As PR91052's comments show, commit r272731 exposed one issue in function combine_and_move_insns. Function combine_and_move_insns perform the below unexpected transformation. ** Before: ** 67: NOTE_INSN_BASIC_BLOCK 8 ... 59: {r184:SF=[sfp:SI-0x190];r121:SI=sfp:SI-0x190;} ==> move obje

Re: [PATCH, IRA] Fix PR91052 by skipping multiple_sets insn in combine_and_move_insns

2020-02-11 Thread Kewen.Lin
on 2020/2/12 12:24 AM, Vladimir Makarov wrote: > On 2/11/20 3:01 AM, Kewen.Lin wrote: >> Hi, >> >> As PR91052's comments show, commit r272731 exposed one issue in function >> combine_and_move_insns.  Function combine_and_move_insns perform the >> below unex

[PATCH, rs6000] Adjust vectorization cost for scalar COND_EXPR

2019-12-11 Thread Kewen.Lin
Hi, We found that the vectorization cost modeling on scalar COND_EXPR is a bit off on rs6000. One typical case is 548.exchange2_r, -Ofast -mcpu=power9 -mrecip -fvect-cost-model=unlimited is better than -Ofast -mcpu=power9 -mrecip (the default is -fvect-cost-model=dynamic) by 1.94%. Scalar COND_E

[RFC/PATCH] IVOPTs select cand with preferred D-form access

2020-01-06 Thread Kewen.Lin
Hi all, Recently I've been investigating an issue related to using D-form/X-form vector memory access; it's the same as what the patch https://gcc.gnu.org/ml/gcc-patches/2019-10/msg01879.html was intended to deal with. Power9 introduces DQ-form instructions for vector memory access, we prefer to use

Re: [RFC] IVOPTs select cand with preferred D-form access

2020-01-07 Thread Kewen.Lin
on 2020/1/7 5:14 PM, Richard Biener wrote: > On Mon, 6 Jan 2020, Kewen.Lin wrote: > >> We are thinking whether it can be handled in IVOPTs instead of one RTL pass. >> >> During IVOPTs selecting IV cands, it doesn't know the loop will be unrolled >> so >> i

Re: [RFC] IVOPTs select cand with preferred D-form access

2020-01-07 Thread Kewen.Lin
on 2020/1/7 7:25 PM, Richard Biener wrote: > On Tue, 7 Jan 2020, Kewen.Lin wrote: > >> on 2020/1/7 5:14 PM, Richard Biener wrote: >>> On Mon, 6 Jan 2020, Kewen.Lin wrote: >>> >>>> We are thinking whether it can be handled in IVOPTs instead of one RTL >&

Re: [RFC] IVOPTs select cand with preferred D-form access

2020-01-08 Thread Kewen.Lin
Hi Bin, > I am a bit worried that would make IVOPTs heavy too, it might be > possible to compute heuristics whether loop should be unrolled as a > post-IVOPTs transformation. Of course the transformation needs to do > more work than simply unrolling in order to take advantage of > aforementioned

Re: [PATCH, rs6000] Add subreg patterns for SImode rotate and mask insert

2024-03-06 Thread Kewen.Lin
Hi, on 2024/3/1 10:41, HAO CHEN GUI wrote: > Hi, > This patch fixes regression cases in gcc.target/powerpc/rlwimi-2.c. In > combine pass, SImode (subreg from DImode) lshiftrt is converted to DImode > lshiftrt with an out AND. It matches a DImode rotate and mask insert on > rs6000. > > Trying 2

Re: [PATCH V3] rs6000: Don't ICE when compiling the __builtin_vsx_splat_2di built-in [PR113950]

2024-03-06 Thread Kewen.Lin
Hi, on 2024/3/4 02:55, jeevitha wrote: > Hi All, > > The following patch has been bootstrapped and regtested on powerpc64le-linux. > > When we expand the __builtin_vsx_splat_2di function, we were allowing > immediate

Re: [PATCH] fix PowerPC < 7 w/ Altivec not to default to power7

2024-03-10 Thread Kewen.Lin
Hi, on 2024/3/8 19:33, Rene Rebe wrote: > This might not be the best timing -shortly before a major release-, > however, Sam just commented on the bug I filed years ago [1], so here > we go: > > Glibc uses .machine to determine assembler optimizations to use. > However, since reworking the rs6000

Re: [PATCH V3] rs6000: Don't ICE when compiling the __builtin_vsx_splat_2di built-in [PR113950]

2024-03-17 Thread Kewen.Lin
Hi, on 2024/3/16 04:34, Peter Bergner wrote: > On 3/6/24 3:27 AM, Kewen.Lin wrote: >> on 2024/3/4 02:55, jeevitha wrote: >>> The following patch has been bootstrapped and regtested on >>> powerpc64le-linux. >>>

Re: [PATCH] rs6000: Fix up setup_incoming_varargs [PR114175]

2024-03-19 Thread Kewen.Lin
Hi Jakub, on 2024/3/19 01:21, Jakub Jelinek wrote: > Hi! > > The c23-stdarg-8.c test (as well as the new test below added to cover even > more cases) FAIL on powerpc64le-linux and presumably other powerpc* targets > as well. > Like in the r14-9503-g218d174961 change on x86-64 we need to advance >

Re: [PATCH v1] rs6000: Stackoverflow in optimized code on PPC [PR100799]

2024-04-01 Thread Kewen.Lin
Hi! on 2024/3/22 17:36, Jakub Jelinek wrote: > On Fri, Mar 22, 2024 at 02:55:43PM +0530, Ajit Agarwal wrote: >> rs6000: Stackoverflow in optimized code on PPC [PR100799] >> >> When using FlexiBLAS with OpenBLAS we noticed corruption of >> the parameters passed to OpenBLAS functions. FlexiBLAS >> b

Re: [PATCH v2] rs6000: Stackoverflow in optimized code on PPC [PR100799]

2024-04-01 Thread Kewen.Lin
Hi! on 2024/3/24 02:37, Ajit Agarwal wrote: > > > On 23/03/24 9:33 pm, Peter Bergner wrote: >> On 3/23/24 4:33 AM, Ajit Agarwal wrote: > - else if (align_words < GP_ARG_NUM_REG) > + else if (align_words < GP_ARG_NUM_REG > +|| (cum->hidden_string_length > +

Re: [PATCH v2] rs6000: Stackoverflow in optimized code on PPC [PR100799]

2024-04-02 Thread Kewen.Lin
Hi Jakub, on 2024/4/2 16:03, Jakub Jelinek wrote: > On Tue, Apr 02, 2024 at 02:12:04PM +0800, Kewen.Lin wrote: >>>>>> The old code for the unused hidden parameter (which was the 9th param) >>>>>> would >>>>>> fall thru to the "retu

Re: [PATCH v2] rs6000: Stackoverflow in optimized code on PPC [PR100799]

2024-04-03 Thread Kewen.Lin
Hi Jakub, on 2024/4/3 16:35, Jakub Jelinek wrote: > On Wed, Apr 03, 2024 at 01:18:54PM +0800, Kewen.Lin wrote: >>> I'd prefer not to remove DECL_ARGUMENTS chains, they are valid arguments >>> that just some >>> invalid code doesn't pass. By remo

Re: [PATCH v2] rs6000: Stackoverflow in optimized code on PPC [PR100799]

2024-04-03 Thread Kewen.Lin
Hi! on 2024/4/3 17:23, Jakub Jelinek wrote: > On Wed, Apr 03, 2024 at 05:02:40PM +0800, Kewen.Lin wrote: >> on 2024/4/3 16:35, Jakub Jelinek wrote: >>> On Wed, Apr 03, 2024 at 01:18:54PM +0800, Kewen.Lin wrote: >>>>> I'd prefer not to remove DECL_ARGU

Re: [PATCH v2] rs6000: Stackoverflow in optimized code on PPC [PR100799]

2024-04-03 Thread Kewen.Lin
on 2024/4/3 19:18, Jakub Jelinek wrote: > On Wed, Apr 03, 2024 at 07:01:50PM +0800, Kewen.Lin wrote: >> Thanks for the details on debugging support, but IIUC with this workaround >> being adopted, the debuggability on hidden args are already broken, aren't? > > No. >

Re: [PATCH] rs6000: Replace OPTION_MASK_DIRECT_MOVE with OPTION_MASK_P8_VECTOR [PR101865]

2024-04-08 Thread Kewen.Lin
Hi Peter, on 2024/4/6 06:28, Peter Bergner wrote: > This is a cleanup patch in preparation to fixing the real bug in PR101865. > TARGET_DIRECT_MOVE is redundant with TARGET_P8_VECTOR, so alias it to that. > Also replace all usages of OPTION_MASK_DIRECT_MOVE with OPTION_MASK_P8_VECTOR > and delete

Re: [PATCH 3/3] Add -mcpu=power11 tests

2024-04-08 Thread Kewen.Lin
Hi Mike, on 2024/3/20 12:16, Michael Meissner wrote: > This patch adds some simple tests for -mcpu=power11 support. In order to run > these tests, you need an assembler that supports the appropriate option for > supporting the Power11 processor (-mpower11 under Linux or -mpwr11 under AIX). > > I

[PATCH] testsuite: Add profile_update_atomic check to gcov-20.c [PR114614]

2024-04-08 Thread Kewen.Lin
Hi, As PR114614 shows, the newly added test case gcov-20.c by commit r14-9789-g08a52331803f66 failed on targets which do not support atomic profile update, there would be a message like: warning: target does not support atomic profile update, single mode is selected Since the test c

[PATCH] rs6000: Fix wrong align passed to build_aligned_type [PR88309]

2024-04-08 Thread Kewen.Lin
Hi, As the comments in PR88309 show, there are two oversights in rs6000_gimple_fold_builtin that pass align in bytes to build_aligned_type but which actually requires align in bits, it causes unexpected ICE or hanging in function is_miss_rate_acceptable due to zero align_unit value. This patch is

Re: [PATCH] rs6000: Fix wrong align passed to build_aligned_type [PR88309]

2024-04-08 Thread Kewen.Lin
on 2024/4/8 18:47, Richard Biener wrote: > On Mon, Apr 8, 2024 at 11:22 AM Kewen.Lin wrote: >> >> Hi, >> >> As the comments in PR88309 show, there are two oversights >> in rs6000_gimple_fold_builtin that pass align in bytes to >> build_aligned_type but which

Re: [PATCH] testsuite: Add profile_update_atomic check to gcov-20.c [PR114614]

2024-04-08 Thread Kewen.Lin
on 2024/4/8 18:47, Richard Biener wrote: > On Mon, Apr 8, 2024 at 11:23 AM Kewen.Lin wrote: >> >> Hi, >> >> As PR114614 shows, the newly added test case gcov-20.c by >> commit r14-9789-g08a52331803f66 failed on targets which do >> not support atomic p

Re: [PATCH] rs6000: Replace OPTION_MASK_DIRECT_MOVE with OPTION_MASK_P8_VECTOR [PR101865]

2024-04-08 Thread Kewen.Lin
Hi Peter, on 2024/4/8 21:21, Peter Bergner wrote: > On 4/8/24 3:55 AM, Kewen.Lin wrote: >> on 2024/4/6 06:28, Peter Bergner wrote: >>> +mno-direct-move >>> +Target Undocumented WarnRemoved >>> + >>> mdirect-move >>> -Target Undocument

Re: [PATCH] rs6000: Replace OPTION_MASK_DIRECT_MOVE with OPTION_MASK_P8_VECTOR [PR101865]

2024-04-08 Thread Kewen.Lin
on 2024/4/9 11:20, Peter Bergner wrote: > On 4/8/24 9:37 PM, Kewen.Lin wrote: >> on 2024/4/8 21:21, Peter Bergner wrote: >> I prefer to remove it completely, that is: >> >>> -mdirect-move >>> -Target Undocumented Mask(DIRECT_MOVE) Var(rs6000_isa_flags) War

[PATCH] testsuite: Adjust pr113359-2_*.c with unsigned long long [PR114662]

2024-04-09 Thread Kewen.Lin
Hi, pr113359-2_*.c define a struct having unsigned long type members ay and az which have 4 bytes size at -m32, while the related constants CL1 and CL2 used for equality check are always 8 bytes, it makes compiler consider the below 69 if (a.ay != CL1) 70 __builtin_abort (); always to
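A small sketch of the width mismatch described above, using fixed-width types so it behaves the same on every target: uint32_t models a 4-byte unsigned long at -m32 and uint64_t models unsigned long long. WIDE_CONSTANT is a made-up value, not the test's CL1 or CL2, and the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical 8-byte constant with nonzero upper bits. */
#define WIDE_CONSTANT 0x0123456789abcdefULL

/* A 4-byte member keeps only the low 32 bits, so comparing it with the
   full 8-byte constant can never succeed. */
static bool narrow_member_matches(void)
{
    uint32_t member = (uint32_t) WIDE_CONSTANT;
    return member == WIDE_CONSTANT;
}

/* An 8-byte member holds the whole constant, so the check is meaningful. */
static bool wide_member_matches(void)
{
    uint64_t member = WIDE_CONSTANT;
    return member == WIDE_CONSTANT;
}
```

Since the narrow comparison has a result known at compile time, a compiler is free to fold it, which is consistent with switching the members to unsigned long long as the subject line says.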

Re: [PATCH] testsuite: Adjust pr113359-2_*.c with unsigned long long [PR114662]

2024-04-10 Thread Kewen.Lin
on 2024/4/10 15:11, Richard Biener wrote: > On Wed, Apr 10, 2024 at 8:24 AM Kewen.Lin wrote: >> >> Hi, >> >> pr113359-2_*.c define a struct having unsigned long type >> members ay and az which have 4 bytes size at -m32, while >> the related constants CL1

Re: Repost [PATCH 4/6] PowerPC: Make MMA insns support DMR registers.

2024-02-03 Thread Kewen.Lin
Hi Mike, on 2024/1/6 07:39, Michael Meissner wrote: > This patch changes the MMA instructions to use either FPR registers > (-mcpu=power10) or DMRs (-mcpu=future). In this patch, the existing MMA > instruction names are used. > > A macro (__PPC_DMR__) is defined if the MMA instructions use the D

Re: Repost [PATCH 5/6] PowerPC: Switch to dense math names for all MMA operations.

2024-02-03 Thread Kewen.Lin
Hi Mike, on 2024/1/6 07:40, Michael Meissner wrote: > This patch changes the assembler instruction names for MMA instructions from > the original name used in power10 to the new name when used with the dense > math > system. I.e. xvf64gerpp becomes dmxvf64gerpp. The assembler will emit the > sa

Re: Repost [PATCH 6/6] PowerPC: Add support for 1,024 bit DMR registers.

2024-02-04 Thread Kewen.Lin
Hi Mike, on 2024/1/6 07:42, Michael Meissner wrote: > This patch is a preliminary patch to add the full 1,024 bit dense math > registers (DMRs) for -mcpu=future. The MMA 512-bit accumulators map onto the > top of the > DMR register. > > This patch only adds the new 1,024 bit register support.

Re: [PATCH v2] rs6000: Rework option -mpowerpc64 handling [PR106680]

2024-02-05 Thread Kewen.Lin
Hi Sebastian, on 2024/2/5 18:38, Sebastian Huber wrote: > Hello, > > On 27.12.22 11:16, Kewen.Lin via Gcc-patches wrote: >> Hi Segher, >> >> on 2022/12/24 04:26, Segher Boessenkool wrote: >>> Hi! >>> >>> On Wed, Oct 12, 2022 at 04:12:21PM

Re: Repost [PATCH 1/6] Add -mcpu=future

2024-02-07 Thread Kewen.Lin
on 2024/2/6 14:01, Michael Meissner wrote: > On Tue, Jan 23, 2024 at 04:44:32PM +0800, Kewen.Lin wrote: ... >>> diff --git a/gcc/config/rs6000/rs6000-opts.h >>> b/gcc/config/rs6000/rs6000-opts.h >>> index 33fd0efc936..25890ae3034 100644 >>> --- a/gcc/co

Re: Repost [PATCH 3/6] PowerPC: Add support for accumulators in DMR registers.

2024-02-07 Thread Kewen.Lin
on 2024/2/7 08:06, Michael Meissner wrote: > On Thu, Jan 25, 2024 at 05:28:49PM +0800, Kewen.Lin wrote: >> Hi Mike, >> >> on 2024/1/6 07:38, Michael Meissner wrote: >>> The MMA subsystem added the notion of accumulator registers as an optional >>> feature o

Re: [PATCH] rs6000: Neuter option -mpower{8,9}-vector [PR109987]

2024-02-20 Thread Kewen.Lin
Hi Segher, Thanks for the review comments! on 2024/2/20 02:45, Segher Boessenkool wrote: > Hi! > > On Tue, Jan 16, 2024 at 10:50:01AM +0800, Kewen.Lin wrote: >> As PR109987 and its duplicated bugs show, -mno-power8-vector >> (and -mno-power9-vector) cause some problems and

Re: [PATCH] rs6000: Update instruction counts due to combine changes [PR112103]

2024-02-20 Thread Kewen.Lin
Hi Peter, on 2024/2/20 06:35, Peter Bergner wrote: > rs6000: Update instruction counts due to combine changes [PR112103] > > The PR91865 combine fix changed instruction counts slightly for rlwinm-0.c. > Adjust expected instruction counts accordingly. > > This passed on both powerpc64le-linux and

Re: Repost [PATCH 1/6] Add -mcpu=future

2024-02-20 Thread Kewen.Lin
Hi Mike, Sorry for late reply (just back from vacation). on 2024/2/8 03:58, Michael Meissner wrote: > On Wed, Feb 07, 2024 at 05:21:10PM +0800, Kewen.Lin wrote: >> on 2024/2/6 14:01, Michael Meissner wrote: >> Sorry for the possible confusion here, the "tune_proc" tha

Re: [PATCH] rs6000: Neuter option -mpower{8,9}-vector [PR109987]

2024-02-20 Thread Kewen.Lin
on 2024/2/20 19:19, Segher Boessenkool wrote: > On Tue, Feb 20, 2024 at 05:27:07PM +0800, Kewen.Lin wrote: >> Good question, it mainly follows the practice of option direct-move here. >> IMHO at least for power8-vector we want WarnRemoved for now as it's >> documented b

Re: [PATCH] rs6000: Neuter option -mpower{8,9}-vector [PR109987]

2024-02-20 Thread Kewen.Lin
on 2024/2/21 09:37, Peter Bergner wrote: > On 2/20/24 3:27 AM, Kewen.Lin wrote: >> on 2024/2/20 02:45, Segher Boessenkool wrote: >>> On Tue, Jan 16, 2024 at 10:50:01AM +0800, Kewen.Lin wrote: >>>> it consists of some aspects: >>>> - effective

Re: Repost [PATCH 1/6] Add -mcpu=future

2024-02-26 Thread Kewen.Lin
on 2024/2/21 15:19, Michael Meissner wrote: > On Tue, Feb 20, 2024 at 06:35:34PM +0800, Kewen.Lin wrote: >> Hi Mike, >> >> Sorry for late reply (just back from vacation). >> >> on 2024/2/8 03:58, Michael Meissner wrote: >>> On Wed, Feb 07, 2024 at 05:21:1

Re: [PATCH] rs6000: Don't allow immediate value in the vsx_splat pattern [PR113950]

2024-02-26 Thread Kewen.Lin
Hi, on 2024/2/26 14:18, jeevitha wrote: > Hi All, > > The following patch has been bootstrapped and regtested on powerpc64le-linux. > > There is no immediate value splatting instruction in powerpc. Currently that > needs to be stored in a register or memory. For addressing this I have updated >

Re: [PATCH] rs6000: Don't allow immediate value in the vsx_splat pattern [PR113950]

2024-02-26 Thread Kewen.Lin
on 2024/2/26 23:07, Peter Bergner wrote: > On 2/26/24 4:49 AM, Kewen.Lin wrote: >> on 2024/2/26 14:18, jeevitha wrote: >>> Hi All, >>> diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md >>> index 6111cc90eb7..e5688ff972a 100644 >>> --- a/

Re: [PATCH] rs6000: Don't allow immediate value in the vsx_splat pattern [PR113950]

2024-02-26 Thread Kewen.Lin
on 2024/2/27 10:13, Peter Bergner wrote: > On 2/26/24 7:55 PM, Kewen.Lin wrote: >> on 2024/2/26 23:07, Peter Bergner wrote: >>> so I think we should use both Jeevitha's predicate change and >>> your operands[1] change. >> >> Since either the original p

Re: [PATCH 01/11] rs6000, Fix __builtin_vsx_cmple* args and documentation, builtins

2024-02-28 Thread Kewen.Lin
Hi, on 2024/2/21 01:55, Carl Love wrote: > > GCC maintainers: > > This patch fixes the arguments and return type for the various > __builtin_vsx_cmple* built-ins. They were defined as signed but should have > been defined as unsigned. > > The patch has been tested on Power 10 with no regress

Re: [PATCH 02/11] rs6000, fix arguments, add documentation for vector, element conversions

2024-02-28 Thread Kewen.Lin
Hi, on 2024/2/21 01:56, Carl Love wrote: > > GCC maintainers: > > This patch fixes the return type for the __builtin_vsx_xvcvdpuxws and > __builtin_vsx_xvcvspuxds built-ins. They were defined as signed but should > have been defined as unsigned. > > The patch has been tested on Power 10 wit

Re: [PATCH 04/11] rs6000, Update comment for the __builtin_vsx_vper*, built-ins.

2024-02-28 Thread Kewen.Lin
Hi, on 2024/2/21 01:56, Carl Love wrote: > GCC maintainers: > > The patch expands an existing comment to document that the duplicates are > covered by an overloaded built-in. I am wondering if we should just go ahead > and remove the duplicates? As the below comments Bill placed before, I thi

Re: [PATCH 03/11] rs6000, remove duplicated built-ins

2024-02-28 Thread Kewen.Lin
on 2024/2/21 01:56, Carl Love wrote: > GCC maintainers: > > There are a number of undocumented built-ins that are duplicates of other > documented built-ins. This patch removes the duplicates so users will only > use the documented built-in. > > The patch has been tested on Power 10 with no re

Re: [PATCH 05/11] rs6000, __builtin_vsx_xvneg[sp,dp] add documentation, and test cases

2024-02-28 Thread Kewen.Lin
Hi, on 2024/2/21 01:56, Carl Love wrote: > GCC maintainers: > > The patch adds documentation and test cases for the __builtin_vsx_xvnegsp, > __builtin_vsx_xvnegdp built-ins. > > The patch has been tested on Power 10 with no regressions. > > Please let me know if this patch is acceptable for ma

Re: [PATCH 06/11] rs6000, __builtin_vsx_xxpermdi_1ti add documentation, and test case

2024-02-28 Thread Kewen.Lin
Hi Carl, on 2024/2/21 01:57, Carl Love wrote: > GCC maintainers: > > The patch adds documentation and test case for the __builtin_vsx_xxpermdi_1ti > built-in. > > The patch has been tested on Power 10 with no regressions. > > Please let me know if this patch is acceptable for mainline. Thanks

Re: [PATCH 07/11] rs6000, __builtin_vsx_xvcmpeq[sp, dp, sp_p] add, documentation and test case

2024-02-28 Thread Kewen.Lin
Hi Carl, on 2024/2/21 01:57, Carl Love wrote: > > GCC maintainers: > > The patch adds documentation and test case for the __builtin_vsx_xvcmpeq[sp, > dp, sp_p] built-ins. > > The patch has been tested on Power 10 with no regressions. > > Please let me know if this patch is acceptable for ma

Re: [PATCH 09/11] rs6000, add test cases for the vec_cmpne built-ins

2024-02-28 Thread Kewen.Lin
Hi, on 2024/2/21 01:57, Carl Love wrote: > GCC maintainers: > > The patch adds test cases for the vec_cmpne built-ins. > > The patch has been tested on Power 10 with no regressions. > > Please let me know if this patch is acceptable for mainline. Thanks. > > Carl > -

Re: [PATCH 11/11] rs6000, make test vec-cmpne.c a runnable test

2024-02-28 Thread Kewen.Lin
Hi, on 2024/2/21 01:58, Carl Love wrote: > GCC maintainers: > > The patch changes the vec-cmpne.c from a compile only test to a runnable > test. The macros to create the functions needed to test the built-ins and > verify the results are all there in the include file. The .c file just > n

Re: [PATCH 08/11] rs6000, add tests and documentation for various, built-ins

2024-02-28 Thread Kewen.Lin
Hi, on 2024/2/21 01:57, Carl Love wrote: > > GCC maintainers: > > The patch adds documentation for a number of built-ins. > > The patch has been tested on Power 10 with no regressions. > > Please let me know if this patch is acceptable for mainline. Thanks. > > Carl > --
