Re: [PATCH] testsuite: add the case to cover vectorization of A[(i+x)*stride] [PR114322]

2024-03-20 Thread Hao Liu OS
> So - OK with using { target vect_int } instead. Sure, it's much better to be target independent. Refactored and committed in r14-9569-g4c276896 Thanks, - Hao From: Richard Biener Sent: Wednesday, March 20, 2024 16:21 To: Hao Liu OS Cc: GCC

[PATCH] testsuite: add the case to cover vectorization of A[(i+x)*stride] [PR114322]

2024-03-20 Thread Hao Liu OS
Hi Richard, As mentioned in the comments of PR114322 (which has been fixed by PR114151 r14-9540-ge0e9499a), this patch is to cover the case. Bootstrapped and regression tested on aarch64-linux-gnu, OK for trunk? gcc/testsuite/ChangeLog: PR tree-optimization/114322 * gcc.dg/vect/

Re: [PATCH] tree-optimization/PR112774 - SCEV: extend the chrec tree with a nonwrapping flag

2023-12-07 Thread Hao Liu OS
/gcc.gnu.org/g:2efe3a7de0107618397264017fb045f237764cc7 Thanks, Hao. From: Richard Biener Sent: Thursday, December 7, 2023 22:12 To: Hao Liu OS Cc: GCC-patches@gcc.gnu.org Subject: Re: [PATCH] tree-optimization/PR112774 - SCEV: extend the chrec tree with a nonwrapping flag On Thu, De

Re: [PATCH] tree-optimization/PR112774 - SCEV: extend the chrec tree with a nonwrapping flag

2023-12-07 Thread Hao Liu OS
s with such unsigned access pattern, and it can get huge improvements. Thanks, Hao From: Richard Biener Sent: Wednesday, December 6, 2023 19:49 To: Hao Liu OS Cc: GCC-patches@gcc.gnu.org Subject: Re: [PATCH] tree-optimization/PR112774 - SCEV: extend the chr

Re: [PATCH] tree-optimization/PR112774 - SCEV: extend the chrec tree with a nonwrapping flag

2023-12-06 Thread Hao Liu OS
AND (POLYNOMIAL_CHREC_CHECK (NODE), 1) +#define CHREC_VARIABLE(NODE)POLYNOMIAL_CHREC_CHECK (NODE)->base.u.chrec_var +/* Nonzero if this chrec doesn't overflow (i.e., nonwrapping). */ +#define CHREC_NOWRAP(NODE) POLYNOMIAL_CHREC_CHECK (NODE)->base.nothrow_flag /* LABEL_EXPR acc

[PATCH] tree-optimization/PR112774 - SCEV: extend the chrec tree with a nonwrapping flag

2023-12-04 Thread Hao Liu OS
Loop vectorization cannot optimize the following case because of the "SCEV is not affine" failure (i + offset may overflow): int A[1024 * 2]; int foo (unsigned offset, unsigned N) { int sum = 0; for (unsigned i = 0; i < N; i++) sum += A[i + offset]; return sum; }
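
A minimal sketch of why `i + offset` is a problem here, not taken from the patch: unsigned addition wraps modulo UINT_MAX + 1, so the index can jump from UINT_MAX back to 0 in the middle of the loop and is not a plain affine function of `i`. The helper name `wrapped_index` is hypothetical and exists only for illustration.

```c
#include <limits.h>

/* Hypothetical helper (not part of GCC or the patch): computes the array
   index exactly as the loop body does, in unsigned arithmetic.  */
unsigned
wrapped_index (unsigned i, unsigned offset)
{
  /* Unsigned overflow is well defined in C: the result wraps around.  */
  return i + offset;
}
```

With `offset == UINT_MAX`, iteration 0 yields UINT_MAX while iteration 1 yields 0, so consecutive iterations do not access consecutive elements unless the compiler can prove the addition never wraps.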

Re: [PATCH] AArch64: Do not increase the vect reduction latency by multiplying count [PR110625]

2023-08-03 Thread Hao Liu OS via Gcc-patches
Gentle ping. Is it OK for master? I'm afraid the ICE may cause trouble and hope it can be fixed ASAP. Thanks, Hao From: Hao Liu OS Sent: Wednesday, August 2, 2023 11:45 To: Richard Sandiford Cc: Richard Biener; GCC-patches@gcc.gnu.org Subjec

Re: [PATCH] AArch64: Do not increase the vect reduction latency by multiplying count [PR110625]

2023-08-01 Thread Hao Liu OS via Gcc-patches
< 4; i++) +sum += tmp[i]; + + return (unsigned int) sum >> 1; +} -- 2.34.1 ________ From: Hao Liu OS Sent: Tuesday, August 1, 2023 17:43 To: Richard Sandiford Cc: Richard Biener; GCC-patches@gcc.gnu.org Subject: Re: [PATCH] AArch64: Do not increase the vect

Re: [PATCH] AArch64: Do not increase the vect reduction latency by multiplying count [PR110625]

2023-08-01 Thread Hao Liu OS via Gcc-patches
-- 2.40.0 From: Richard Sandiford Sent: Monday, July 31, 2023 17:11 To: Hao Liu OS Cc: Richard Biener; GCC-patches@gcc.gnu.org Subject: Re: [PATCH] AArch64: Do not increase the vect reduction latency by multiplying count [PR110625] Hao Liu OS writes: >> Which test case do you see this for

Re: [PATCH] AArch64: Do not increase the vect reduction latency by multiplying count [PR110625]

2023-07-31 Thread Hao Liu OS via Gcc-patches
Sure, the helper makes the code simpler. I'll test the new patch and push if there is no other issue. Thanks, Hao From: Richard Sandiford Sent: Monday, July 31, 2023 17:11 To: Hao Liu OS Cc: Richard Biener; GCC-patches@gcc.gnu.org Subject: Re: [

Re: [PATCH] AArch64: Do not increase the vect reduction latency by multiplying count [PR110625]

2023-07-30 Thread Hao Liu OS via Gcc-patches
<-- This is not live, may be caused by the below type cast stmt. res_15 = (short int) _7; i_16 = i_20 + 1; if (n_11(D) > i_16) goto ; else goto ; : goto ; Thanks, -Hao From: Richard Sandiford Sent: Saturday, July 29,

Re: [PATCH] AArch64: Do not increase the vect reduction latency by multiplying count [PR110625]

2023-07-28 Thread Hao Liu OS via Gcc-patches
tency should be multiplied by the count for + single_defuse_cycle. */ + +long +f (long res, short *ptr1, short *ptr2, int n) +{ + for (int i = 0; i < n; ++i) +res += (long) ptr1[i] << ptr2[i]; + return res; +} -- 2.34.1 From: Hao Liu OS Sent

Re: [PATCH] AArch64: Do not increase the vect reduction latency by multiplying count [PR110625]

2023-07-26 Thread Hao Liu OS via Gcc-patches
s->reduction_latency, base * count); else ops->reduction_latency = MAX (ops->reduction_latency, base); Thanks, Hao ____ From: Richard Sandiford Sent: Wednesday, July 26, 2023 17:14 To: Richard Biener Cc: Hao Liu OS; GCC-patches@gcc.gnu.org S
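
The two MAX updates quoted above can be sketched in isolation as follows. This is a hedged illustration only: the condition selecting the first branch is cut off in the snippet, so the `multiply_by_count` parameter, the `struct vector_ops` stand-in, and the function name are all assumptions rather than the backend's actual code.

```c
#define MAX(a, b) ((a) > (b) ? (a) : (b))

/* Hypothetical stand-in for the cost-model record; only the one field
   mentioned in the snippet is modelled.  */
struct vector_ops
{
  unsigned reduction_latency;
};

/* Assumed shape of the update: the latency is scaled by COUNT only when
   the caller asks for it, otherwise the base latency is used as is.  */
void
update_reduction_latency (struct vector_ops *ops, unsigned base,
                          unsigned count, int multiply_by_count)
{
  if (multiply_by_count)
    ops->reduction_latency = MAX (ops->reduction_latency, base * count);
  else
    ops->reduction_latency = MAX (ops->reduction_latency, base);
}
```

Either branch only ever raises the recorded latency, so the order in which statements are costed does not matter.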

Re: [PATCH] AArch64: Do not increase the vect reduction latency by multiplying count [PR110625]

2023-07-25 Thread Hao Liu OS via Gcc-patches
VECTYPE (stmt_info): 0x0 Thanks, Hao From: Richard Sandiford Sent: Tuesday, July 25, 2023 17:44 To: Hao Liu OS Cc: GCC-patches@gcc.gnu.org Subject: Re: [PATCH] AArch64: Do not increase the vect reduction latency by multiplying count [PR110625] Hao Liu OS writes: > Hi, > > Tha

Re: [PATCH] AArch64: Do not increase the vect reduction latency by multiplying count [PR110625]

2023-07-25 Thread Hao Liu OS via Gcc-patches
ied by the count for + single_defuse_cycle. */ + +long +f (long res, short *ptr1, short *ptr2, int n) +{ + for (int i = 0; i < n; ++i) +res += (long) ptr1[i] << ptr2[i]; + return res; +} -- 2.34.1 From: Richard Sandiford Sent: Monda

Re: [PATCH] AArch64: Do not increase the vect reduction latency by multiplying count [PR110625]

2023-07-23 Thread Hao Liu OS via Gcc-patches
Hi Richard, Gentle ping. Is it OK for trunk? Or will you have a patch covering this fix? Thanks, -Hao From: Hao Liu OS Sent: Wednesday, July 19, 2023 12:33 To: GCC-patches@gcc.gnu.org Cc: richard.sandif...@arm.com Subject: [PATCH] AArch64: Do not

[PATCH] AArch64: Do not increase the vect reduction latency by multiplying count [PR110625]

2023-07-18 Thread Hao Liu OS via Gcc-patches
This only affects the new costs in the aarch64 backend. Currently, the reduction latency of the vector body is too large because it is multiplied by the stmt count. As the scalar reduction latency is small, the new cost model may think "scalar code would issue more quickly" and increase the vector body cost a lo

Re: [PATCH] Vect: use a small step to calculate induction for the unrolled loop (PR tree-optimization/110449)

2023-07-06 Thread Hao Liu OS via Gcc-patches
Biener; Hao Liu OS Cc: GCC-patches@gcc.gnu.org Subject: Re: [PATCH] Vect: use a small step to calculate induction for the unrolled loop (PR tree-optimization/110449) On 7/6/23 06:44, Richard Biener via Gcc-patches wrote: > On Wed, Jul 5, 2023 at 8:44 AM Hao Liu OS via Gcc-patches > wrote:

[PATCH] Vect: select small VF for epilog of unrolled loop (PR tree-optimization/110474)

2023-07-05 Thread Hao Liu OS via Gcc-patches
Hi, If a loop is unrolled during vectorization (i.e. suggested_unroll_factor > 1), the VFs of both main and epilog loop are enlarged. The epilog vect loop is specific for a loop with small iteration counts, so a large VF may hurt performance. This patch unscales the main loop VF by suggested_unr

[PATCH] Vect: use a small step to calculate induction for the unrolled loop (PR tree-optimization/110449)

2023-07-04 Thread Hao Liu OS via Gcc-patches
Hi, If a loop is unrolled n times during vectorization, two steps are used to calculate the induction variable: - The small step for the unrolled ith-copy: vec_1 = vec_iv + (VF/n * Step) - The large step for the whole loop: vec_loop = vec_iv + (VF * Step) This patch calculates an extra vec_
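
The two step formulas can be checked with a scalar sketch that tracks only the first lane of each vector. The function names are hypothetical, and generalizing the small step to the k-th copy as `k * (VF/n) * Step` is an assumption drawn from the formula above, not from the patch itself.

```c
/* First-lane start value of the k-th unrolled copy, assuming each copy
   advances by the small step (VF / n) * step.  */
long
copy_start (long vec_iv, long vf, long n, long step, long k)
{
  return vec_iv + k * (vf / n) * step;
}

/* First-lane start value for the next iteration of the whole unrolled
   loop, using the large step VF * step.  */
long
next_iteration_start (long vec_iv, long vf, long step)
{
  return vec_iv + vf * step;
}
```

With VF = 8, n = 2 and Step = 1, the second copy starts at 4 and the next loop iteration at 8, so n small steps add up to one large step whenever n divides VF.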

[PATCH] Vect: avoid using uninitialized variable (PR tree-optimization/110531)

2023-07-04 Thread Hao Liu OS via Gcc-patches
slp_done_for_suggested_uf is used in vect_analyze_loop_2 without initialization, which is undefined behavior. Initialize it to false according to the discussion. gcc/ChangeLog: PR tree-optimization/110531 * tree-vect-loop.cc (vect_analyze_loop_1): initialize slp_done_for_s
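
The class of bug described here can be sketched as below. Only the variable name `slp_done_for_suggested_uf` comes from the message; the functions `analyze_once` and `driver` are hypothetical stand-ins and do not reflect the real bodies in tree-vect-loop.cc.

```c
#include <stdbool.h>

/* Hypothetical callee: reads the flag before it may write it, so the
   caller must supply a determinate value.  */
static bool
analyze_once (bool *slp_done)
{
  if (*slp_done)        /* Reading an uninitialized bool here is the bug.  */
    return true;
  *slp_done = true;
  return false;
}

/* Hypothetical caller showing the fix: initialize the flag to false
   before passing its address down.  */
bool
driver (void)
{
  bool slp_done_for_suggested_uf = false;
  return analyze_once (&slp_done_for_suggested_uf);
}
```

Without the `= false` initializer the first read has an indeterminate value, so the branch taken can differ between builds or runs.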

RE: Add libcody

2020-12-21 Thread Hao Liu OS via Gcc-patches
Hi Nathan, This patch causes a build failure on CentOS. More information: https://gcc.gnu.org/bugzilla//show_bug.cgi?id=98318#c3 Thanks, -Hao > -Original Message- > From: Gcc-patches On Behalf Of Nathan > Sidwell > Sent: Tuesday, December 15, 2020 11:46 PM > To: GCC Patches > Subject

Re: [PATCH] extend cselim to check non-trapping for more references (PR tree-optimizaton/89430)

2020-06-03 Thread Hao Liu OS via Gcc-patches
Sent: Thursday, June 4, 2020 12:55 To: Hao Liu OS Cc: Richard Biener; gcc-patches@gcc.gnu.org Subject: Re: [PATCH] extend cselim to check non-trapping for more references (PR tree-optimizaton/89430) On Thu, Jun 04, 2020 at 04:47:43AM +0000, Hao Liu OS wrote: > The patch is refactored a little a

Re: [PATCH] extend cselim to check non-trapping for more references (PR tree-optimizaton/89430)

2020-06-03 Thread Hao Liu OS via Gcc-patches
r2bb->bb = bb; } else { - n2bb = XNEW (struct name_to_bb); - n2bb->ssa_name_ver = SSA_NAME_VERSION (name); - n2bb->phase = nt_call_phase; - n2bb->bb = bb; - n2bb->store = store; - n

RE: [PATCH] extend cselim to check non-trapping for more references (PR tree-optimizaton/89430)

2020-05-29 Thread Hao Liu OS via Gcc-patches
r2bb->bb = bb; } else { - n2bb = XNEW (struct name_to_bb); - n2bb->ssa_name_ver = SSA_NAME_VERSION (name); - n2bb->phase = nt_call_phase; - n2bb->bb = bb; - n2bb->store = store; - n2b

[PATCH] extend cselim to check non-trapping for more references (PR tree-optimizaton/89430)

2020-05-26 Thread Hao Liu OS via Gcc-patches
Hi all, Previously, the fix for PR89430 was reverted by PR94734 due to a bug. The root cause is a missing non-trapping check with dominating LOAD/STORE. This patch extends the cselim non-tra