Re: [PATCH] strub: Only unbias stack point for SPARC_STACK_BOUNDARY_HACK [PR113100]

2024-01-09 Thread Kewen.Lin
on 2024/1/8 19:44, Richard Biener wrote: > On Mon, Jan 8, 2024 at 3:35 AM Kewen.Lin wrote: >> >> Hi, >> >> As PR113100 shows, the unbiasing introduced by r14-6737 can >> cause the scrubbing to overrun and screw some critical data >> on stack like saved

Re: [PATCH] PR target/112886, Add %S to print_operand for vector pair support

2024-01-10 Thread Kewen.Lin
Hi Mike, on 2024/1/6 06:18, Michael Meissner wrote: > In looking at support for load vector pair and store vector pair for the > PowerPC in GCC, I noticed that we were missing a print_operand output modifier > if you are dealing with vector pairs to print the 2nd register in the vector > pair. >

Re: [PATCH] strub: Only unbias stack point for SPARC_STACK_BOUNDARY_HACK [PR113100]

2024-01-11 Thread Kewen.Lin
Hi Alexandre, on 2024/1/11 17:05, Alexandre Oliva wrote: > On Jan 7, 2024, "Kewen.Lin" wrote: > >> As PR113100 shows, the unbiasing introduced by r14-6737 can >> cause the scrubbing to overrun and screw some critical data >> on stack like saved toc base conse

Re: [PATCH] strub: Only unbias stack point for SPARC_STACK_BOUNDARY_HACK [PR113100]

2024-01-14 Thread Kewen.Lin
on 2024/1/12 19:03, Alexandre Oliva wrote: > On Jan 12, 2024, "Kewen.Lin" wrote: > >>>> By checking PR112917, IMHO we should keep this unbiasing >>>> guarded under SPARC_STACK_BOUNDARY_HACK (TARGET_ARCH64 && >>>> TARGET_STACK_BIAS), sim

Re: [PATCH, rs6000] Refactor expand_compare_loop and split it to two functions

2024-01-14 Thread Kewen.Lin
Hi Haochen, on 2024/1/10 09:35, HAO CHEN GUI wrote: > Hi, > This patch refactors function expand_compare_loop and splits it into two > functions. One is for fixed length and the other is for variable length. > These two functions share some common low-level helper functions. I'm expecting refactoring

Re: [PATCH, rs6000] Enable block compare expand on P9 with m32 and mpowerpc64

2024-01-14 Thread Kewen.Lin
Hi Haochen, on 2024/1/12 14:48, HAO CHEN GUI wrote: > Hi, > On P9, "setb" is used to set the result of block compare, so it works > with m32 and mpowerpc64. On P8, the carry bit is used, so it can't work > with m32 and mpowerpc64. This patch enables block compare expand for > m32 and mpowerpc64 on P9

Re: [Patch, rs6000] Eliminate unnecessary byte swaps for block clear on P8 LE [PR113325]

2024-01-14 Thread Kewen.Lin
Hi Haochen, on 2024/1/11 16:28, HAO CHEN GUI wrote: > Hi, > This patch eliminates unnecessary byte swaps for block clear on P8 > LE. For block clear, all the bytes are set to zero, so the byte order > doesn't matter. So the alignment of the destination could be set to > the store mode size instead

[PATCH] testsuite: Fix vect_long_mult on Power [PR109705]

2024-01-15 Thread Kewen.Lin
Hi, As pointed out by the discussion in PR109705, the current vect_long_mult effective target check on Power is broken. This patch fixes it accordingly. With an additional change adding a vect_long_mult guard in gcc.dg/vect/pr25413a.c, it's tested well on Power{8,9} LE & BE (also on Power10

Re: [PATCH V1] rs6000: New pass for replacement of adjacent (load) lxv with lxvp

2024-01-16 Thread Kewen.Lin
on 2024/1/16 06:22, Ajit Agarwal wrote: > Hello Richard: > > On 15/01/24 6:25 pm, Ajit Agarwal wrote: >> >> >> On 15/01/24 6:14 pm, Ajit Agarwal wrote: >>> Hello Richard: >>> >>> On 15/01/24 3:03 pm, Richard Biener wrote: On Sun, Jan 14, 2024 at 4:29 PM Ajit Agarwal wrote: > >

[committed] testsuite, rs6000: Adjust fold-vec-extract-char.p7.c [PR111850]

2024-01-17 Thread Kewen.Lin
Hi, As PR101169 comment #c4 shows, the previous addi count update on fold-vec-extract-char.p7.c covered up a sub-optimal code gen issue. On trunk, the fold-mem-offsets pass helps recover the best code sequence, so this patch reverts the count to the original value, which matches the optimal addi

Re: [PATCH] strub: Only unbias stack point for SPARC_STACK_BOUNDARY_HACK [PR113100]

2024-01-17 Thread Kewen.Lin
aix is: make check-gcc RUNTESTFLAGS="--target_board=unix'{-m64,-m32}' dg.exp=strub-unsupported*.c" BR, Kewen > Thanks, David > > > On Wed, Jan 17, 2024 at 8:06 PM Alexandre Oliva <ol...@adacore.com> wrote: > > David, >

Re: Repost [PATCH 1/6] Add -mcpu=future

2024-01-23 Thread Kewen.Lin
Hi Mike, on 2024/1/6 07:35, Michael Meissner wrote: > This patch implements support for a potential future PowerPC cpu. Features > added with -mcpu=future may or may not be added to new PowerPC processors. > > This patch adds support for the -mcpu=future option. If you use -mcpu=future, > the

Re: Repost [PATCH 2/6] PowerPC: Make -mcpu=future enable -mblock-ops-vector-pair.

2024-01-23 Thread Kewen.Lin
on 2024/1/6 07:37, Michael Meissner wrote: > This patch re-enables generating load and store vector pair instructions when > doing certain memory copy operations when -mcpu=future is used. > > During power10 development, it was determined that using store vector pair > instructions was problemati

Re: [PATCH, V2] PR target/112886, Add %S to print_operand for vector pair support.

2024-01-23 Thread Kewen.Lin
Hi Mike, on 2024/1/12 01:29, Michael Meissner wrote: > This is version 2 of the patch. The only difference is I made the test case > simpler to read. > > In looking at support for load vector pair and store vector pair for the > PowerPC in GCC, I noticed that we were missing a print_operand outp

Re: [PATCH, V2] PR target/112886, Add %S to print_operand for vector pair support.

2024-01-23 Thread Kewen.Lin
on 2024/1/24 11:11, Peter Bergner wrote: > On 1/23/24 8:30 PM, Kewen.Lin wrote: >>> - output_operand_lossage ("invalid %%x value"); >>> + output_operand_lossage ("invalid %%%c value", (code == 'S' ? 'S' : >>> 'x'
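The "invalid %%%c value" format in the hunk relies on standard printf escaping: "%%" emits a literal percent sign and "%c" then emits the operand-code letter, so a single string covers both %S and %x. A minimal sketch with plain snprintf, assuming output_operand_lossage follows the same printf-style rules (it is GCC-internal and not called here); format_lossage is a hypothetical name:

```c
#include <stdio.h>

/* Hypothetical helper mirroring the quoted hunk: "%%" emits a literal
   '%', then "%c" emits the operand-code letter ('S' or 'x').
   Returns the number of characters snprintf would have written.  */
static int format_lossage (char *buf, size_t len, char code)
{
  return snprintf (buf, len, "invalid %%%c value",
                   code == 'S' ? 'S' : 'x');
}
```

For example, format_lossage (buf, sizeof buf, 'S') leaves "invalid %S value" in buf, and any other code letter yields "invalid %x value".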

Re: [PATCH, V2] PR target/112886, Add %S to print_operand for vector pair support.

2024-01-24 Thread Kewen.Lin
on 2024/1/24 23:51, Peter Bergner wrote: > On 1/24/24 12:04 AM, Kewen.Lin wrote: >> on 2024/1/24 11:11, Peter Bergner wrote: >>> But not with this. The -mdejagnu-cpu=power10 option already enables -mvsx. >>> If the user explicitly forces -mno-vsx via RUNTESTFLAGS, the

Re: [PATCH] testsuite: Make pr104992.c irrelated to target vector feature [PR113418]

2024-01-24 Thread Kewen.Lin
Hi, Thanks for adjusting this. on 2024/1/24 19:42, Xi Ruoyao wrote: > On Wed, 2024-01-24 at 19:08 +0800, chenxiaolong wrote: >> At 19:00 +0800 on Wednesday, 2024-01-24, Xi Ruoyao wrote: >>> On Wed, 2024-01-24 at 18:32 +0800, chenxiaolong wrote: On 20:09 +0800 on Tuesday, 2024-01-23, Xi Ruoya

Re: Repost [PATCH 3/6] PowerPC: Add support for accumulators in DMR registers.

2024-01-25 Thread Kewen.Lin
Hi Mike, on 2024/1/6 07:38, Michael Meissner wrote: > The MMA subsystem added the notion of accumulator registers as an optional > feature of ISA 3.1 (power10). In ISA 3.1, these accumulators overlapped with > the traditional floating point registers 0..31, but logically the accumulator > registe

Re: [PATCH] testsuite: Fix vect_long_mult on Power [PR109705]

2024-01-28 Thread Kewen.Lin
on 2024/1/27 06:42, Andrew Pinski wrote: > On Mon, Jan 15, 2024 at 6:43 PM Kewen.Lin wrote: >> >> Hi, >> >> As pointed out by the discussion in PR109705, the current >> vect_long_mult effective target check on Power is broken. >> This patch is to fix it ac

[PATCH] sel-sched: Verify change before replacing dest in EXPR_INSN_RTX [PR112995]

2023-12-15 Thread Kewen.Lin
Hi, PR112995 exposed an issue in the current try_replace_dest_reg: the resulting rtx insn after replace_dest_with_reg_in_expr may be unable to match any constraints. Although there are some checks on the changes to the dest or src of orig_insn, none is performed on the EXPR_INSN_RTX. This patch

Re: [Patchv2, rs6000] Correct definition of macro of fixed point efficient unaligned

2023-12-18 Thread Kewen.Lin
Hi Haochen, on 2023/12/18 10:43, HAO CHEN GUI wrote: > Hi, > The patch corrects the definition of > TARGET_EFFICIENT_OVERLAPPING_UNALIGNED and replaces it with a call to > slow_unaligned_access. > > Compared with last version, > https://gcc.gnu.org/pipermail/gcc-patches/2023-December/640076.

Re: [Patchv2, rs6000] Clean up pre-checkings of expand_block_compare

2023-12-18 Thread Kewen.Lin
Hi Haochen, on 2023/12/18 10:44, HAO CHEN GUI wrote: > Hi, > This patch cleans up pre-checkings of expand_block_compare. It does > 1. Assert that only P7 and above can enter this function, as it's already guarded > by the expand. > 2. Return false when optimizing for size. > 3. Remove P7 processor test as o

[PATCH] sched: Don't skip empty block by removing no_real_insns_p [PR108273]

2023-12-20 Thread Kewen.Lin
Hi, This patch follows Richi's suggestion "scheduling shouldn't special case empty blocks as they usually do not appear" in [1]: it removes the function no_real_insns_p and all of its uses. There are some cases where a block previously had only one INSN_P, but while scheduling some other blocks th

Re: PING^1 [PATCH] sched: Remove debug counter sched_block

2023-12-20 Thread Kewen.Lin
Hi Jeff, on 2023/12/21 04:43, Jeff Law wrote: > > > On 12/11/23 23:17, Kewen.Lin wrote: >> Hi, >> >> Gentle ping this: >> >> https://gcc.gnu.org/pipermail/gcc-patches/2023-November/636597.html >> >> BR, >> Kewen >> >> on 2

Re: [PATCH] sel-sched: Verify change before replacing dest in EXPR_INSN_RTX [PR112995]

2023-12-20 Thread Kewen.Lin
Hi Jeff, on 2023/12/21 04:30, Jeff Law wrote: > > > On 12/15/23 01:52, Kewen.Lin wrote: >> Hi, >> >> PR112995 exposed one issue in current try_replace_dest_reg >> that the result rtx insn after replace_dest_with_reg_in_expr >> is probably unable to match

Re: [Patchv3, rs6000] Correct definition of macro of fixed point efficient unaligned

2023-12-20 Thread Kewen.Lin
Hi Haochen, on 2023/12/20 16:51, HAO CHEN GUI wrote: > Hi, > The patch corrects the definition of > TARGET_EFFICIENT_OVERLAPPING_UNALIGNED and replaces it with a call to > slow_unaligned_access. > > Compared with last version, > https://gcc.gnu.org/pipermail/gcc-patches/2023-December/640832.

Re: [Patch, rs6000] Call library for block memory compare when optimizing for size

2023-12-20 Thread Kewen.Lin
Hi Haochen, on 2023/12/20 16:56, HAO CHEN GUI wrote: > Hi, > This patch calls a library function for block memory compare when > optimizing for size. > > Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no > regressions. Is this OK for trunk? > > Thanks > Gui Haochen > > C

Re: [Patchv3, rs6000] Clean up pre-checkings of expand_block_compare

2023-12-20 Thread Kewen.Lin
Hi, on 2023/12/21 09:37, HAO CHEN GUI wrote: > Hi, > This patch cleans up pre-checkings of expand_block_compare. It does > 1. Assert that only P7 and above can enter this function, as it's already guarded > by the expand. > 2. Remove P7 processor test as only P7 and above can enter this function and > P7 LE is

[PATCH] strub: Only unbias stack point for SPARC_STACK_BOUNDARY_HACK [PR113100]

2024-01-07 Thread Kewen.Lin
Hi, As PR113100 shows, the unbiasing introduced by r14-6737 can cause the scrubbing to overrun and corrupt some critical data on the stack, such as the saved TOC base, and consequently cause a segfault on Power. After checking PR112917, IMHO we should keep this unbiasing guarded under SPARC_STACK_BOUNDARY_HACK (TARGET_A

[PATCH] testsuite, rs6000: Adjust pcrel-sibcall-1.c with noipa [PR112751]

2024-01-07 Thread Kewen.Lin
Hi, As PR112751 shows, commit r14-5628 caused pcrel-sibcall-1.c to fail, as it enables ipa-vrp, which makes the return values of functions {x,y,xx} known and propagated. This patch adjusts the test with noipa to make it less fragile. Tested well on powerpc64-linux-gnu P8/P9 and powerpc64le-linux-gnu
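noipa is a documented GCC function attribute: callers are optimized as if the function body were unavailable, so interprocedural passes such as IPA-VRP cannot propagate its return value into them. The message does not show the test's function bodies, so x, y and xx below are hypothetical stand-ins that only sketch how the attribute is applied:

```c
/* Hypothetical stand-ins for the test's x, y and xx.  noipa stops
   callers from using anything these bodies compute, including their
   constant return values.  */
__attribute__ ((noipa)) int x (void) { return 1; }
__attribute__ ((noipa)) int y (void) { return 2; }
__attribute__ ((noipa)) int xx (void) { return x () + y (); }

/* Without noipa, IPA-VRP could treat xx () as returning a known
   constant; with it, this stays an ordinary call in tail position,
   presumably what a sibcall test needs to observe.  */
int call_xx (void) { return xx (); }
```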

[PATCH] rs6000: Eliminate zext fed by vclzlsbb [PR111480]

2024-01-07 Thread Kewen.Lin
Hi, As PR111480 shows, commit r14-4079 only optimizes the vctzlsbb case but not the similar vclzlsbb case. This patch handles vclzlsbb as well, avoiding the failure on the reported test case. It also simplifies the patterns with an iterator and an attribute. Bootstrapped and regtested on pow

[PATCH] rs6000: Make copysign (x, -1) back to -abs (x) for IEEE128 float [PR112606]

2024-01-07 Thread Kewen.Lin
Hi, I noticed that commit r14-6192 can't help PR112606 #c3, as it only handles SF/DF while TF/KF can still suffer from the issue. Similar to commit r14-6192, this patch handles copysign3 with IEEE128 as well. Bootstrapped and regtested on powerpc64-linux-gnu P8/P9 and powerpc64le-linux-

Re: [PATCH v2] rs6000: Add new pass for replacement of contiguous addresses vector load lxv with lxvp

2023-12-03 Thread Kewen.Lin
Hi Ajit, on 2023/12/1 17:10, Ajit Agarwal wrote: > Hello Kewen: > > On 24/11/23 3:01 pm, Kewen.Lin wrote: >> Hi Ajit, >> >> Don't forget to CC David (CC-ed) :), some comments are inlined below. >> >> on 2023/10/8 03:04, Ajit Agarwal wrote: >

Re: [patch-2, rs6000] guard fctid on PPC64 and powerpc 476 [PR112707]

2023-12-04 Thread Kewen.Lin
Hi Haochen, on 2023/12/1 10:42, HAO CHEN GUI wrote: > Hi, > The "fctid" is supported on 64-bit Power processors and powerpc 476. It > needs a guard to check it. The patch fixes the issue. > > Bootstrapped and tested on x86 and powerpc64-linux BE and LE with > no regressions. Is this OK for tru

Re: [PATCH] rs6000: Canonicalize copysign (x, -1) back to -abs (x) in the backend [PR112606]

2023-12-04 Thread Kewen.Lin
Hi Jakub, on 2023/11/25 18:17, Jakub Jelinek wrote: > Hi! > > The middle-end has been changed quite recently to canonicalize > -abs (x) to copysign (x, -1) rather than the other way around. > While I agree with that at GIMPLE level, since it matches the GIMPLE > goal of as few operations as possi
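For finite values, signed zeros and infinities, copysign (x, -1) and -abs (x) both give the magnitude of x with the sign bit set, which is why either form can serve as the canonical one. A small C illustration using the standard <math.h> functions (the helper names are hypothetical):

```c
#include <math.h>

/* Canonical middle-end spelling: magnitude of v, sign forced negative.  */
static double neg_mag_copysign (double v)
{
  return copysign (v, -1.0);
}

/* Backend-preferred spelling after the patch: negated absolute value.  */
static double neg_mag_fabs (double v)
{
  return -fabs (v);
}
```

Both helpers return -2.5 for an input of 2.5 or -2.5, and both return -0.0 for an input of 0.0.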

[PATCH] range: Workaround different type precision issue between _Float128 and long double [PR112788]

2023-12-04 Thread Kewen.Lin
Hi, As PR112788 shows, on rs6000 with -mabi=ieeelongdouble, type _Float128 has a different type precision (128) from that of type long double (127), but they actually have the same underlying mode, so they have the same precision, as the mode indicates the same real type format ieee_quad_format. I

PING^7 [PATCH 0/9] rs6000: Rework rs6000_emit_vector_compare

2023-12-04 Thread Kewen.Lin
Hi, Gentle ping this series: https://gcc.gnu.org/pipermail/gcc-patches/2022-November/607146.html BR, Kewen > >> on 2022/11/24 17:15, Kewen Lin wrote: >>> Hi, >>> >>> Following Segher's suggestion, this patch series is to rework >>> function rs6000_emit_vector_compare for ve

PING^5 [PATCH v2] rs6000: Don't use optimize_function_for_speed_p too early [PR108184]

2023-12-04 Thread Kewen.Lin
Hi, Gentle ping this: https://gcc.gnu.org/pipermail/gcc-patches/2023-January/609993.html BR, Kewen > >>>> on 2023/1/16 17:08, Kewen.Lin via Gcc-patches wrote: >>>>> Hi, >>>>> >>>>> As Honza pointed out in [1], the cur

Re: [patch-1, rs6000] enable fctiw on old archs [PR112707]

2023-12-04 Thread Kewen.Lin
Hi Haochen, on 2023/12/1 10:41, HAO CHEN GUI wrote: > Hi, > SImode in float register is supported on P7 above. It causes "fctiw" > can be generated on old 32-bit processors as the output operand of typo? I guess you meant to say "can NOT"? > fctiw insn is a SImode in float/double register. Th

Re: [PATCH v2] rs6000: Add new pass for replacement of contiguous addresses vector load lxv with lxvp

2023-12-05 Thread Kewen.Lin
on 2023/12/6 02:01, Ajit Agarwal wrote: > Hello Kewen: > > > On 05/12/23 7:13 pm, Ajit Agarwal wrote: >> Hello Kewen: >> >> On 04/12/23 7:31 am, Kewen.Lin wrote: >>> Hi Ajit, >>> >>> on 2023/12/1 17:10, Ajit Agarwal wrote: >>

Re: [PATCH V3 1/3]rs6000: update num_insns_constant for 2 insns

2023-12-06 Thread Kewen.Lin
Hi Jeff, on 2023/12/6 13:24, Jiufu Guo wrote: > Hi, > > Trunk gcc supports more constants to be built via two instructions: > e.g. "li/lis; xori/xoris/rldicl/rldicr/rldic". > And then num_insns_constant should also be updated. > > Function "rs6000_emit_set_long_const" is used to build complicate

Re: [PATCH V3 2/3] Using pli for constant splitting

2023-12-06 Thread Kewen.Lin
Hi Jeff, on 2023/12/6 13:24, Jiufu Guo wrote: > Hi, > > For constant building e.g. r120=0x, which does not fit 'li or lis', > 'pli' is used to build this constant via 'emit_move_insn'. > > While for a complicated constant, e.g. 0xULL, when using > 'rs6000_emit_set_long_co

Re: [patch-2v2, rs6000] guard fctid on PPC64 and powerpc 476 [PR112707]

2023-12-06 Thread Kewen.Lin
Hi Haochen, on 2023/12/6 16:13, HAO CHEN GUI wrote: > Hi, > The "fctid" is supported on 64-bit Power processors and powerpc 476. It > needs a guard to check it. The patch fixes the issue. > > Compared with last version, > https://gcc.gnu.org/pipermail/gcc-patches/2023-December/638859.html > th

Re: [patch-1v2, rs6000] enable fctiw on old archs [PR112707]

2023-12-06 Thread Kewen.Lin
Hi, on 2023/12/6 16:13, HAO CHEN GUI wrote: > Hi, > SImode in float register is supported on P7 and above. As a result, "fctiw" > can't be generated on old 32-bit processors, as the output operand of > fctiw insn is an SImode in float/double register. This patch fixes the > problem by adding one expand

Re: [PATCH v2] rs6000: Add new pass for replacement of contiguous addresses vector load lxv with lxvp

2023-12-06 Thread Kewen.Lin
on 2023/12/6 13:09, Michael Meissner wrote: > On Wed, Dec 06, 2023 at 10:22:57AM +0800, Kewen.Lin wrote: >> I'd expect you use UNSPEC_MMA_EXTRACT to extract V16QI from the result of >> lxvp, >> the current define_insn_and_split "*vsx_disassemble_pair" shou

Re: [patch-2v3, rs6000] Guard fctid on PowerPC64 and PowerPC476 [PR112707]

2023-12-07 Thread Kewen.Lin
Hi Haochen, on 2023/12/8 09:58, HAO CHEN GUI wrote: > Hi, > The "fctid" is supported on 64-bit Power processors and PowerPC476. It > needs a guard to check it. The patch fixes the issue. > > Compared with last version, > https://gcc.gnu.org/pipermail/gcc-patches/2023-December/639536.html > the

Re: [PATCH v2] rs6000: Add new pass for replacement of contiguous addresses vector load lxv with lxvp

2023-12-08 Thread Kewen.Lin
Hi Ajit, on 2023/12/8 16:01, Ajit Agarwal wrote: > Hello Kewen: > > On 07/12/23 4:31 pm, Ajit Agarwal wrote: >> Hello Kewen: >> >> On 06/12/23 7:52 am, Kewen.Lin wrote: >>> on 2023/12/6 02:01, Ajit Agarwal wrote: >>>> Hello Kewen: >

Re: [PATCH V4 1/3]rs6000: accurate num_insns_constant_gpr

2023-12-11 Thread Kewen.Lin
Hi Jeff, on 2023/12/11 11:26, Jiufu Guo wrote: > Hi, > > Trunk gcc supports more constants to be built via two instructions: > e.g. "li/lis; xori/xoris/rldicl/rldicr/rldic". > And then num_insns_constant should also be updated. > > Function "rs6000_emit_set_long_const" is used to build complicat

Re: [PATCH V4 2/3] Using pli for constant splitting

2023-12-11 Thread Kewen.Lin
Hi, on 2023/12/11 11:26, Jiufu Guo wrote: > Hi, > > For constant building e.g. r120=0x, which does not fit 'li or lis', > 'pli' is used to build this constant via 'emit_move_insn'. > > While for a complicated constant, e.g. 0xULL, when using > 'rs6000_emit_set_long_const'

Re: [Patch, rs6000] Correct definition of macro of fixed point efficient unaligned

2023-12-11 Thread Kewen.Lin
Hi, on 2023/12/11 09:49, HAO CHEN GUI wrote: > Hi, > The patch corrects the definition of > TARGET_EFFICIENT_OVERLAPPING_UNALIGNED and change its name to a > comprehensible name. > > Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no > regressions. Is this OK for trunk? > >

Re: [Patch, rs6000] Clean up pre-checking of expand_block_compare

2023-12-11 Thread Kewen.Lin
Hi, on 2023/12/11 10:54, HAO CHEN GUI wrote: > Hi, > This patch cleans up pre-checking of expand_block_compare. It does > 1. Assert that only P7 and above can enter this function, as it's already guarded > by the expand. > 2. Return false when optimizing for size. > 3. Remove P7 CPU test as only P7 and above ca

PING^1 [PATCH] range: Workaround different type precision issue between _Float128 and long double [PR112788]

2023-12-11 Thread Kewen.Lin
Hi, Gentle ping this: https://gcc.gnu.org/pipermail/gcc-patches/2023-December/639140.html BR, Kewen on 2023/12/4 17:49, Kewen.Lin wrote: > Hi, > > As PR112788 shows, on rs6000 with -mabi=ieeelongdouble type _Float128 > has the different type precision (128) from that (127)

PING^8 [PATCH 0/9] rs6000: Rework rs6000_emit_vector_compare

2023-12-11 Thread Kewen.Lin
Hi, Gentle ping this series: https://gcc.gnu.org/pipermail/gcc-patches/2022-November/607146.html BR, Kewen >>> on 2022/11/24 17:15, Kewen Lin wrote: Hi, Following Segher's suggestion, this patch series is to rework function rs6000_emit_vector_compare for

PING^6 [PATCH v2] rs6000: Don't use optimize_function_for_speed_p too early [PR108184]

2023-12-11 Thread Kewen.Lin
Hi, Gentle ping this: https://gcc.gnu.org/pipermail/gcc-patches/2023-January/609993.html BR, Kewen >>>>> on 2023/1/16 17:08, Kewen.Lin via Gcc-patches wrote: >>>>>> Hi, >>>>>> >>>>>> As Honza pointed out in [1],

PING^1 [PATCH] rs6000: New pass to mitigate SP float load perf issue on Power10

2023-12-11 Thread Kewen.Lin
Hi, Gentle ping: https://gcc.gnu.org/pipermail/gcc-patches/2023-November/636599.html BR, Kewen on 2023/11/15 17:16, Kewen.Lin wrote: > Hi, > > As Power ISA defines, when loading a scalar single precision (SP) > floating point from memory, we have the double precision (DP) format

PING^1 [PATCH] sched: Remove debug counter sched_block

2023-12-11 Thread Kewen.Lin
Hi, Gentle ping this: https://gcc.gnu.org/pipermail/gcc-patches/2023-November/636597.html BR, Kewen on 2023/11/15 17:01, Kewen.Lin wrote: > Hi, > > on 2023/11/10 01:40, Alexander Monakov wrote: > >> I agree with the concern. I hoped that solving the problem by skipping

Re: [PATCH v2] rs6000: Add new pass for replacement of contiguous addresses vector load lxv with lxvp

2023-12-11 Thread Kewen.Lin
Hi Ajit, on 2023/12/8 16:01, Ajit Agarwal wrote: > Hello Kewen: > [snip...] > With UNSPEC_MMA_EXTRACT I could generate the register pair but functionally > here is the > below code which is incorrect. > > llxvp %vs0,0(%r4) > xxlor %vs32,%vs0,%vs0 > xvf32ger 0,%vs34,%vs32

[PATCH draft v2] sched: Don't skip empty block in scheduling [PR108273]

2023-12-11 Thread Kewen.Lin
Hi, on 2023/11/22 17:30, Kewen.Lin wrote: > on 2023/11/17 20:55, Alexander Monakov wrote: >> >> On Fri, 17 Nov 2023, Kewen.Lin wrote: >>>> I don't think you can run cleanup_cfg after sched_init. I would suggest >>>> to put it early in schedule_insns. >

Re: PING^1 [PATCH] range: Workaround different type precision issue between _Float128 and long double [PR112788]

2023-12-12 Thread Kewen.Lin
Hi Jakub & Andrew, on 2023/12/12 22:42, Jakub Jelinek wrote: > On Tue, Dec 12, 2023 at 09:33:38AM -0500, Andrew MacLeod wrote: >> I leave this for the release managers, but I am not opposed to it for this >> release... It would be nice to remove it for the next release > > I can live with it for

Re: [PATCH] rs6000, testcase: Add require-effective-target has_arch_ppc64 to pr106550_1.c

2023-11-06 Thread Kewen.Lin
Hi, on 2023/11/6 15:20, Jiufu Guo wrote: > Hi, > > With the latest trunk, case pr106550_1.c fails on ppc under -m32. > However, the case is testing 64bit constant building, so "has_arch_ppc64" > is required. Please also mention that it failed with an ICE initially due to PR111971, now tha

Re: [PATCH-2, rs6000] Enable vector mode for by pieces equality compare [PR111449]

2023-11-06 Thread Kewen.Lin
Hi Haochen, on 2023/11/6 10:36, HAO CHEN GUI wrote: > Hi, > This patch enables vector mode for by pieces equality compare. It > adds a new expand pattern - cbranchv16qi4 and sets MOVE_MAX_PIECES > and COMPARE_MAX_PIECES to 16 bytes when P8 vector is enabled. The compare > relies on both move and compar
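An equality-only compare never needs to know which operand is larger, so it can work in wide pieces and combine XOR results without caring about byte order. The portable C model below is an illustration under that assumption, not the rs6000 expansion; the 16-byte piece width mirrors the COMPARE_MAX_PIECES value mentioned in the message, and equal_by_pieces is a hypothetical name:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Equality-only compare of n bytes, 16 bytes per piece.  */
static bool equal_by_pieces (const void *a, const void *b, size_t n)
{
  const unsigned char *pa = a, *pb = b;

  /* Each 16-byte piece is loaded as two 64-bit halves; memcpy avoids
     alignment and aliasing problems.  */
  while (n >= 16)
    {
      uint64_t a0, a1, b0, b1;
      memcpy (&a0, pa, 8);
      memcpy (&a1, pa + 8, 8);
      memcpy (&b0, pb, 8);
      memcpy (&b1, pb + 8, 8);
      /* Any differing bit makes the OR non-zero; byte order is irrelevant.  */
      if ((a0 ^ b0) | (a1 ^ b1))
        return false;
      pa += 16;
      pb += 16;
      n -= 16;
    }

  /* Byte-wise tail for the remaining n < 16 bytes.  */
  for (; n > 0; n--)
    if (*pa++ != *pb++)
      return false;
  return true;
}
```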

Re: [PATCH-3v2, rs6000] Enable 16-byte by pieces move [PR111449]

2023-11-06 Thread Kewen.Lin
Hi, on 2023/11/6 17:47, HAO CHEN GUI wrote: > Hi, > Patch 2 enables 16-byte by pieces move on rs6000. This patch fixes > the regression cases caused by the previous patch. For sra-17/18, the long > array with 4 elements can be loaded by one 16-byte by pieces move on a 32-bit > platform. So the arr

Re: [PATCH-2, rs6000] Enable vector mode for by pieces equality compare [PR111449]

2023-11-06 Thread Kewen.Lin
Hi, on 2023/11/7 11:24, HAO CHEN GUI wrote: > Hi Kewen, > >Thanks for your review comments. Just one question on the following > comment. > > On 2023/11/7 10:40, Kewen.Lin wrote: >> Nit: has_arch_pwr8 would make it untested in the Power7 default env; I'd prefer >>

PING^1 [PATCH v3] sched: Change no_real_insns_p to no_real_nondebug_insns_p [PR108273]

2023-11-07 Thread Kewen.Lin
Hi, Gentle ping this: https://gcc.gnu.org/pipermail/gcc-patches/2023-October/634201.html BR, Kewen on 2023/10/25 10:45, Kewen.Lin wrote: > Hi, > > This is almost a repost of v2, which was posted at [1] in March, > except for: > 1) rebased from r14-4810 which is relati

PING^6 [PATCH 0/9] rs6000: Rework rs6000_emit_vector_compare

2023-11-07 Thread Kewen.Lin
Hi, Gentle ping this series: https://gcc.gnu.org/pipermail/gcc-patches/2022-November/607146.html BR, Kewen > on 2022/11/24 17:15, Kewen Lin wrote: >> Hi, >> >> Following Segher's suggestion, this patch series is to rework >> function rs6000_emit_vector_compare for vector flo

PING^4 [PATCH v2] rs6000: Don't use optimize_function_for_speed_p too early [PR108184]

2023-11-07 Thread Kewen.Lin
Hi, Gentle ping this: https://gcc.gnu.org/pipermail/gcc-patches/2023-January/609993.html BR, Kewen >>> on 2023/1/16 17:08, Kewen.Lin via Gcc-patches wrote: >>>> Hi, >>>> >>>> As Honza pointed out in [1], the current uses of f

Re: PING^1 [PATCH v3] sched: Change no_real_insns_p to no_real_nondebug_insns_p [PR108273]

2023-11-09 Thread Kewen.Lin
Hi Maxim and Alexander, Thanks a lot for the review comments! on 2023/11/10 01:40, Alexander Monakov wrote: > > On Thu, 9 Nov 2023, Maxim Kuvyrkov wrote: > >> Hi Kewen, >> >> Below are my comments. I don't want to override Alexander's review, and if >> the patch looks good to him, it's fine to

Re: [PATCH-2v2, rs6000] Enable vector mode for by pieces equality compare [PR111449]

2023-11-09 Thread Kewen.Lin
Hi, on 2023/11/9 09:31, HAO CHEN GUI wrote: > Hi, > This patch enables vector mode for by pieces equality compare. It > adds a new expand pattern - cbranchv16qi4 and sets MOVE_MAX_PIECES > and COMPARE_MAX_PIECES to 16 bytes when P8 vector is enabled. The compare > relies on both move and compare instru

Re: [PATCH-3v4, rs6000] Fix regression cases caused 16-byte by pieces move [PR111449]

2023-11-13 Thread Kewen.Lin
Hi, on 2023/11/10 17:22, HAO CHEN GUI wrote: > Hi, > Originally, a 16-byte memory to memory move is expanded via a pattern. > expand_block_move does an optimization on P8 LE to leverage V2DI reversed > load/store for memory to memory move. Now it's done by 16-byte by pieces > move and the optimization is

Re: [PATCH] rs6000: Disable PCREL for unsupported targets [PR111045]

2023-11-13 Thread Kewen.Lin
Hi Peter, on 2023/11/11 07:51, Peter Bergner wrote: > On 8/27/23 9:06 PM, Kewen.Lin wrote: >> Assuming we only have ELFv2_ABI_CHECK in PCREL_SUPPORTED_BY_OS, we >> can have either TARGET_PCREL or !TARGET_PCREL after the checking. >> For the latter, it's fine and don't

Re: [PATCH] rs6000: Only enable PCREL on supported ABIs [PR111045]

2023-11-15 Thread Kewen.Lin
Hi, on 2023/11/15 11:01, Peter Bergner wrote: > PCREL data accesses are only officially supported on ELFv2. We currently > incorrectly enable PCREL on all Power10 compiles in which prefix instructions > are also enabled. Rework the option override code so we only enable PCREL > for those ABIs th

Re: [PATCH] Clean up by_pieces_ninsns

2023-11-15 Thread Kewen.Lin
Hi, on 2023/11/15 10:26, HAO CHEN GUI wrote: > Hi, > This patch cleans up by_pieces_ninsns and does the following things. > 1. Do the length and alignment adjustment for by pieces compare when > overlap operation is enabled. > 2. Remove unnecessary mov_optab checks. > > Bootstrapped and tested on

[PATCH] sched: Remove debug counter sched_block

2023-11-15 Thread Kewen.Lin
Hi, on 2023/11/10 01:40, Alexander Monakov wrote: > I agree with the concern. I hoped that solving the problem by skipping the BB > like the (bit-rotted) debug code needs to would be a minor surgery. As things > look now, it may be better to remove the non-working sched_block debug counter > enti

Re: PING^1 [PATCH v3] sched: Change no_real_insns_p to no_real_nondebug_insns_p [PR108273]

2023-11-15 Thread Kewen.Lin
Hi Alexander/Richard/Jeff, Thanks for the insightful comments! on 2023/11/10 22:41, Alexander Monakov wrote: > > On Fri, 10 Nov 2023, Richard Biener wrote: > >> On Fri, Nov 10, 2023 at 3:18 PM Alexander Monakov wrote: >>> >>> >>> On Fri, 10 Nov 2023, Richard Biener wrote: >>> > I'm afraid

[PATCH] rs6000: New pass to mitigate SP float load perf issue on Power10

2023-11-15 Thread Kewen.Lin
Hi, As the Power ISA defines, when a scalar single precision (SP) floating point value is loaded from memory, the target register holds it converted to double precision (DP) format. This is unlike some other architectures, which support SP and DP in registers with their separate formats. The scalar SP i

Re: PING^1 [PATCH v3] sched: Change no_real_insns_p to no_real_nondebug_insns_p [PR108273]

2023-11-17 Thread Kewen.Lin
on 2023/11/15 17:43, Alexander Monakov wrote: > > On Wed, 15 Nov 2023, Kewen.Lin wrote: > >>>> And I suppose it would be OK to do that. Empty BBs are usually removed by >>>> CFG cleanup so the situation should only happen in rare corner cases where >>

Re: [PATCH V2 1/3]rs6000: update num_insns_constant for 2 insns

2023-11-22 Thread Kewen.Lin
Hi, on 2023/11/15 11:02, Jiufu Guo wrote: > Hi, > > Trunk gcc supports more constants to be built via two instructions: e.g. > "li/lis; xori/xoris/rldicl/rldicr/rldic". > And then num_insns_constant should also be updated. > > Function "rs6000_emit_set_long_const" is used to build complicate > c

Re: [PATCH V2 2/3] Using pli to split 34bits constant

2023-11-22 Thread Kewen.Lin
Hi, on 2023/11/15 11:02, Jiufu Guo wrote: > Hi, > > For constants with 16bit values, 'li or lis' can be used to generate > the value. For a 34bit constant, 'pli' can generate the value. > For example: 0xULL, "pli 3,1717986918; rldimi 3,3,32,0" > can be used. Since now if emit
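The quoted sequence "pli 3,1717986918; rldimi 3,3,32,0" can be checked by hand: 1717986918 is 0x66666666, and rldimi with SH=32, MB=0 rotates r3 left by 32 and inserts it under a mask covering the upper word, replicating the low 32 bits. A small C model written for illustration only; the function names are hypothetical, and only the non-wrapping rldimi mask case (MB <= 63 - SH) is modelled:

```c
#include <stdint.h>

/* Rotate left by n bits; the "& 63" keeps n == 0 well defined.  */
static uint64_t rotl64 (uint64_t v, unsigned n)
{
  return (v << (n & 63)) | (v >> ((64 - n) & 63));
}

/* Hypothetical model of "pli rt,imm": the immediate is a signed 34-bit
   value sign-extended to 64 bits (caller must keep it in range).  */
static uint64_t pli_model (int64_t imm)
{
  return (uint64_t) imm;
}

/* Hypothetical model of "rldimi ra,rs,sh,mb": rotate rs left by sh and
   insert it into ra under MASK(mb, 63 - sh), IBM bit 0 being the MSB.  */
static uint64_t rldimi_model (uint64_t ra, uint64_t rs, unsigned sh,
                              unsigned mb)
{
  unsigned me = 63 - sh;                      /* mask end, IBM numbering */
  uint64_t mask = (~0ULL >> mb) & (~0ULL << (63 - me));
  return (rotl64 (rs, sh) & mask) | (ra & ~mask);
}

/* "pli 3,1717986918; rldimi 3,3,32,0" from the quoted message.  */
static uint64_t quoted_sequence (void)
{
  uint64_t r3 = pli_model (1717986918);     /* 0x66666666 */
  return rldimi_model (r3, r3, 32, 0);       /* copy low word into high word */
}
```

Under this model, quoted_sequence () evaluates to 0x6666666666666666.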

Re: PING^1 [PATCH v3] sched: Change no_real_insns_p to no_real_nondebug_insns_p [PR108273]

2023-11-22 Thread Kewen.Lin
on 2023/11/17 20:55, Alexander Monakov wrote: > > On Fri, 17 Nov 2023, Kewen.Lin wrote: >>> I don't think you can run cleanup_cfg after sched_init. I would suggest >>> to put it early in schedule_insns. >> >> Thanks for the suggestion, I placed it at the

Re: PING^1 [PATCH v3] sched: Change no_real_insns_p to no_real_nondebug_insns_p [PR108273]

2023-11-22 Thread Kewen.Lin
on 2023/11/22 18:25, Richard Biener wrote: > On Wed, Nov 22, 2023 at 10:31 AM Kewen.Lin wrote: >> >> on 2023/11/17 20:55, Alexander Monakov wrote: >>> >>> On Fri, 17 Nov 2023, Kewen.Lin wrote: >>>>> I don't think you can run cleanup_cfg after

Re: PING^1 [PATCH v3] sched: Change no_real_insns_p to no_real_nondebug_insns_p [PR108273]

2023-11-23 Thread Kewen.Lin
on 2023/11/23 16:20, Richard Biener wrote: > On Thu, Nov 23, 2023 at 4:02 AM Kewen.Lin wrote: >> >> on 2023/11/22 18:25, Richard Biener wrote: >>> On Wed, Nov 22, 2023 at 10:31 AM Kewen.Lin wrote: >>>> >>>> on 2023/11/17 20:55, Alexander Monakov w

Re: [PATCH] rs6000: Disassemble opaque modes using subregs to allow optimizations [PR109116]

2023-11-24 Thread Kewen.Lin
Hi Peter, on 2023/11/16 07:50, Peter Bergner wrote: > PR109116 exposes an issue where using unspecs to access each vector component > of an opaque mode variable leads to unneeded register copies, because our rtl > optimizers cannot handle unspecs. Instead, use subregs to access each vector > comp

Re: [PATCH v2] rs6000: Add new pass for replacement of contiguous addresses vector load lxv with lxvp

2023-11-24 Thread Kewen.Lin
Hi Ajit, Don't forget to CC David (CC-ed) :), some comments are inlined below. on 2023/10/8 03:04, Ajit Agarwal wrote: > Hello All: > > This patch adds a new pass to replace contiguous-address vector loads lxv with > the mma instruction > lxvp. IMHO the current binding lxvp (and lxvpx, stxvp{x,}) to

Re: [PATCH 0/4] Add vector pair support to PowerPC attribute((vector_size(32)))

2023-11-24 Thread Kewen.Lin
on 2023/11/20 16:56, Michael Meissner wrote: > On Mon, Nov 20, 2023 at 08:24:35AM +0100, Richard Biener wrote: >> I wouldn't expose the "fake" larger modes to the vectorizer but rather >> adjust m_suggested_unroll_factor (which you already do to some extent). > > Thanks. I figure I first need to

Re: [PATCH v2] rs6000: Add new pass for replacement of contiguous addresses vector load lxv with lxvp

2023-11-28 Thread Kewen.Lin
Hi Mike, on 2023/11/28 12:34, Michael Meissner wrote: > On Fri, Nov 24, 2023 at 05:31:20PM +0800, Kewen.Lin wrote: >> Hi Ajit, >> >> Don't forget to CC David (CC-ed) :), some comments are inlined below. >> >> on 2023/10/8 03:04, Ajit Agarwal wrote: >>

Re: [PATCH v2] rs6000: Add new pass for replacement of contiguous addresses vector load lxv with lxvp

2023-11-28 Thread Kewen.Lin
on 2023/11/28 15:05, Michael Meissner wrote: > I tried using this patch to compare with the vector size attribute patch I > posted. I could not build it as a cross compiler on my x86_64 because the > assembler gives the following error: > > Error: operand out of domain (11 is not a multiple of 2)

Re: [PATCH] Expand: Pass down equality only flag to cmpmem expand

2023-11-28 Thread Kewen.Lin
Hi Haochen, on 2023/11/28 15:43, HAO CHEN GUI wrote: > Hi, > This patch passes down the equality only flags from > emit_block_cmp_hints to cmpmem optab so that the target specific expand > can generate optimized insns for equality only compare. Targets > (e.g. rs6000) can generate more efficient

Re: [PATCH/RFC 08/10] aarch64: Don't use CEIL for vector_store in aarch64_stp_sequence_cost

2023-09-19 Thread Kewen.Lin
Hi, on 2023/9/18 16:53, Richard Biener wrote: > On Mon, Sep 18, 2023 at 10:41 AM Richard Sandiford > wrote: >> >> Kewen Lin writes: >>> This costing adjustment patch series exposes one issue in >>> aarch64 specific costing adjustment for STP sequence. It >>> causes the below test cases to fail:

Re: [PATCH, rs6000] Enable vector compare for 16-byte memory equality compare [PR111449]

2023-09-24 Thread Kewen.Lin
Hi, on 2023/9/20 16:49, HAO CHEN GUI wrote: > Hi, > This patch enables vector compare for 16-byte memory equality compare. > The 16-byte memory equality compare can be efficiently implemented by > instruction "vcmpequb." It reduces one branch and one compare compared > with two 8-byte compare se

Re: [PATCH-1v2, rs6000] Enable SImode in FP registers on P7 [PR88558]

2023-09-26 Thread Kewen.Lin
Hi, on 2023/9/25 09:57, HAO CHEN GUI wrote: > Hi Kewen, > > On 2023/9/18 15:34, Kewen.Lin wrote: >> Thanks for checking! So for P7, this patch looks neutral, but for P8 and >> later, it may cause a few differences in code gen. I'm curious how >> many total o

Re: [PATCH-2v3, rs6000] Implement 32bit inline lrint [PR88558]

2023-09-26 Thread Kewen.Lin
Hi, on 2023/9/25 10:05, HAO CHEN GUI wrote: > Hi, > This patch implements 32bit inline lrint by "fctiw". It depends on > the patch1 to do SImode move from FP registers on P7. > > Compared to last version, the main change is to add some test cases. > https://gcc.gnu.org/pipermail/gcc-patches/2

[PATCH] rs6000: Make 32 bit stack_protect support prefixed insn [PR111367]

2023-09-26 Thread Kewen.Lin
Hi, As PR111367 shows, with prefixed insns supported, some of the checks consider it possible to leverage prefixed insns for stack-protect-related loads/stores, but since we don't actually change the emitted assembly for 32 bit, it can cause the assembler error as exposed. Mike's commit r10-4547-gce6a6c

[PATCH] testsuite: Avoid uninit var in pr60510.f [PR111427]

2023-09-26 Thread Kewen.Lin
Hi, The uninitialized variable a in pr60510.f can cause some random failures as exposed in PR111427, see the details there. This patch is to make it initialized accordingly. As verified, it can fix the reported -m32 failures on P7 and P8 BE. It's also tested well on powerpc64-linux-gnu P9 and p

Re: [PATCH V4 1/2] rs6000: optimize moving to sf from highpart di

2023-09-27 Thread Kewen.Lin
Hi Jeff, on 2023/8/30 15:43, Jiufu Guo wrote: > Hi, > > Currently, we have the pattern "movsf_from_si2" which was trying > to support moving high part DI to SF. > > The pattern looks like: XX:SF=bitcast:SF(subreg(YY:DI>>32),0) > It only accepts the "ashiftrt" for ">>", but "lshiftrt" is also ok.
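The reason "lshiftrt" works as well as "ashiftrt" is sketched below, under the assumption that the pattern only consumes the original high 32 bits of YY, which end up in the low 32 bits of the shifted value. The two shifts differ only in the upper 32 bits they fill in (sign copies versus zeros), so the bits that get bitcast to SF are identical. The helper names are hypothetical and only model the RTL operations.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Logical shift right by 32 (models lshiftrt:DI). */
static uint64_t lshr32(uint64_t x)
{
  return x >> 32;
}

/* Arithmetic shift right by 32 (models ashiftrt:DI), written with unsigned
   operations so it does not rely on implementation-defined signed shifts. */
static uint64_t ashr32(uint64_t x)
{
  uint64_t fill = (x >> 63) ? UINT64_C(0xFFFFFFFF00000000) : 0;
  return (x >> 32) | fill;
}

/* Bitcast the low 32 bits of the shifted value (the original high part
   of the DI value) to a 32-bit IEEE float. */
static float low32_to_sf(uint64_t x)
{
  uint32_t bits = (uint32_t) x;  /* keep only the low 32 bits */
  float f;
  assert(sizeof f == sizeof bits);
  memcpy(&f, &bits, sizeof f);
  return f;
}
```

For a negative input such as `0xBF800000DEADBEEF`, the two shifts produce different 64-bit results, but both have `0xBF800000` (that is, `-1.0f`) in their low word.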

Re: [PATCH V4 2/2] rs6000: use mtvsrws to move sf from si p9

2023-09-27 Thread Kewen.Lin
Hi Jeff, on 2023/8/30 15:43, Jiufu Guo wrote: > Hi, > > As mentioned in PR108338, on p9, we could use mtvsrws to implement > the bitcast from SI to SF (or lowpart DI to SF). > > For code: > *(long long*)buff = di; > float f = *(float*)(buff); > > "sldi 9,3,32 ; mtvsrd 1,9 ; xscvspdpn 1,1" i
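The quoted C snippet reinterprets bits through a buffer cast, which breaks strict aliasing in standard C. A portable way to express the same SI-to-SF (or lowpart DI-to-SF) bitcast is `memcpy`, sketched below. The function names are hypothetical. The sketch only illustrates the semantics that the mtvsrws sequence implements, not the code GCC generates.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret the 32 bits of an SImode value as an SFmode float.
   memcpy avoids the aliasing problem of *(float *)buff. */
static float si_to_sf(uint32_t si)
{
  float f;
  assert(sizeof f == sizeof si);  /* assumes a 32-bit IEEE single float */
  memcpy(&f, &si, sizeof f);
  return f;
}

/* Bitcast the low 32 bits of a DImode value to SFmode. */
static float lowpart_di_to_sf(long long di)
{
  return si_to_sf((uint32_t) di);  /* conversion keeps the low 32 bits */
}
```

With a good enough target pattern, GCC can usually lower such a `memcpy` to a direct register move. That is the kind of sequence the patch aims to shorten on P9.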

[RFC/PATCH] isel: Fold more in gimple_expand_vec_cond_expr with andc/iorc

2024-07-01 Thread Kewen.Lin
Hi, As PR115659 shows, assuming c = x CMP y, there are some folding chances for patterns r = c ? 0/z : z/-1: - For r = c ? 0 : z, it can be folded into r = ~c & z. - For r = c ? z : -1, it can be folded into r = ~c | z. But BIT_AND/BIT_IOR applied on one BIT_NOT operand is a compound operatio
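The andc/iorc identities can be checked with a scalar, per-lane C model. The model assumes, as the fold does, that every lane of c is a comparison result: all ones (-1) or all zeros (0). `vcond` and `check_andc_iorc` are hypothetical helpers, not GCC internals. The real transform works on GIMPLE VEC_COND_EXPRs and depends on optab support.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Per-lane VEC_COND_EXPR: c must be a comparison mask lane (0 or -1). */
static int32_t vcond(int32_t c, int32_t a, int32_t b)
{
  assert(c == 0 || c == -1);
  return c ? a : b;
}

/* Return 1 if, on every lane,
     c ? 0 : z   equals  ~c & z   (andc form), and
     c ? z : -1  equals  ~c | z   (iorc form).  */
static int check_andc_iorc(const int32_t *c, const int32_t *z, size_t n)
{
  for (size_t i = 0; i < n; i++)
    {
      if (vcond(c[i], 0, z[i]) != (~c[i] & z[i]))
        return 0;
      if (vcond(c[i], z[i], -1) != (~c[i] | z[i]))
        return 0;
    }
  return 1;
}
```

The `assert` in `vcond` encodes the mask assumption. With an arbitrary nonzero "true" value such as 1, the bitwise forms would no longer match the select.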

[PATCH] isel: Fold more in gimple_expand_vec_cond_expr [PR115659]

2024-07-01 Thread Kewen.Lin
Hi, As PR115659 shows, assuming c = x CMP y, there are some folding chances for patterns r = c ? -1/z : z/0. For r = c ? -1 : z, it can be folded into: - r = c | z (with ior_optab supported) - or r = c ? c : z while for r = c ? z : 0, it can be folded into: - r = c & z (with and_optab supp
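The and/ior identities can be checked with a scalar, per-lane C model. The model assumes that each lane of c is a comparison result, either -1 or 0. `vcond` and `check_ior_and` are hypothetical helpers, not GCC internals. In GCC the fold applies to VEC_COND_EXPRs only when the corresponding optab is supported.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Per-lane VEC_COND_EXPR: c must be a comparison mask lane (0 or -1). */
static int32_t vcond(int32_t c, int32_t a, int32_t b)
{
  assert(c == 0 || c == -1);
  return c ? a : b;
}

/* Return 1 if, on every lane,
     c ? -1 : z  equals  c | z  and also  c ? c : z, and
     c ? z : 0   equals  c & z.  */
static int check_ior_and(const int32_t *c, const int32_t *z, size_t n)
{
  for (size_t i = 0; i < n; i++)
    {
      int32_t ior_sel = vcond(c[i], -1, z[i]);
      if (ior_sel != (c[i] | z[i]) || ior_sel != vcond(c[i], c[i], z[i]))
        return 0;
      if (vcond(c[i], z[i], 0) != (c[i] & z[i]))
        return 0;
    }
  return 1;
}
```

The `c ? c : z` form holds because a true lane of c already equals -1. This lets the select reuse c instead of materializing an all-ones constant.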

[PATCH] sparc: define SPARC_LONG_DOUBLE_TYPE_SIZE for vxworks [PR115739]

2024-07-01 Thread Kewen.Lin
Hi, Commit r15-1594 removed the define of LONG_DOUBLE_TYPE_SIZE in sparc.cc. It's based on the assumption that each OS has its own define (see the comments in sparc.h), but it exposes an issue on vxworks, which lacks the define. We can bring back the default SPARC_LONG_DOUBLE_TYPE_SIZE to sparc.cc,

Re: [PATCH] isel: Fold more in gimple_expand_vec_cond_expr [PR115659]

2024-07-02 Thread Kewen.Lin
on 2024/7/1 22:28, Richard Biener wrote: > On Mon, Jul 1, 2024 at 8:16 AM Kewen.Lin wrote: >> >> Hi, >> >> As PR115659 shows, assuming c = x CMP y, there are some >> folding chances for patterns r = c ? -1/z : z/0. >> >> For r = c ? -1 : z, it can be

Re: [RFC/PATCH] isel: Fold more in gimple_expand_vec_cond_expr with andc/iorc

2024-07-02 Thread Kewen.Lin
Hi! on 2024/7/2 04:28, Segher Boessenkool wrote: > On Mon, Jul 01, 2024 at 04:36:44PM +0200, Richard Biener wrote: >> On Mon, Jul 1, 2024 at 8:17 AM Kewen.Lin wrote: >>> As PR115659 shows, assuming c = x CMP y, there are some >>> folding chances for patterns r = c ? 0/
