[Committed] RISC-V: Some minor tweaks to dynamic LMUL cost model

2023-12-26 Thread Juzhe-Zhong
Tweak some code in the dynamic LMUL cost model to make the computation more
predictable and accurate.

Tested on both RV32 and RV64 with no regressions.

Committed.

PR target/113112

gcc/ChangeLog:

* config/riscv/riscv-vector-costs.cc (compute_estimated_lmul): Tweak 
LMUL estimation.
(has_unexpected_spills_p): Ditto.
(costs::record_potential_unexpected_spills): Ditto.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-1.c: Add more checks.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-2.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-3.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-4.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-5.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-6.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-7.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-1.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-2.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-3.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-4.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-5.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-1.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-2.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-3.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-5.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-6.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-7.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-8.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-1.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-10.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-11.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-2.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-3.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-4.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-5.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-6.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-7.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-8.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-9.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-12.c: New test.
* gcc.dg/vect/costmodel/riscv/rvv/pr113112-2.c: New test.

---
 gcc/config/riscv/riscv-vector-costs.cc| 42 +--
 .../costmodel/riscv/rvv/dynamic-lmul1-1.c |  3 ++
 .../costmodel/riscv/rvv/dynamic-lmul1-2.c |  3 ++
 .../costmodel/riscv/rvv/dynamic-lmul1-3.c |  3 ++
 .../costmodel/riscv/rvv/dynamic-lmul1-4.c |  3 ++
 .../costmodel/riscv/rvv/dynamic-lmul1-5.c |  3 ++
 .../costmodel/riscv/rvv/dynamic-lmul1-6.c |  3 ++
 .../costmodel/riscv/rvv/dynamic-lmul1-7.c |  3 ++
 .../costmodel/riscv/rvv/dynamic-lmul2-1.c |  3 ++
 .../costmodel/riscv/rvv/dynamic-lmul2-2.c |  3 ++
 .../costmodel/riscv/rvv/dynamic-lmul2-3.c |  3 ++
 .../costmodel/riscv/rvv/dynamic-lmul2-4.c |  3 ++
 .../costmodel/riscv/rvv/dynamic-lmul2-5.c |  3 ++
 .../costmodel/riscv/rvv/dynamic-lmul4-1.c |  3 ++
 .../costmodel/riscv/rvv/dynamic-lmul4-2.c |  3 ++
 .../costmodel/riscv/rvv/dynamic-lmul4-3.c |  3 ++
 .../costmodel/riscv/rvv/dynamic-lmul4-5.c |  3 ++
 .../costmodel/riscv/rvv/dynamic-lmul4-6.c |  5 ++-
 .../costmodel/riscv/rvv/dynamic-lmul4-7.c |  3 ++
 .../costmodel/riscv/rvv/dynamic-lmul4-8.c |  5 ++-
 .../costmodel/riscv/rvv/dynamic-lmul8-1.c |  3 ++
 .../costmodel/riscv/rvv/dynamic-lmul8-10.c|  3 ++
 .../costmodel/riscv/rvv/dynamic-lmul8-11.c|  3 ++
 .../costmodel/riscv/rvv/dynamic-lmul8-12.c| 25 +++
 .../costmodel/riscv/rvv/dynamic-lmul8-2.c |  3 ++
 .../costmodel/riscv/rvv/dynamic-lmul8-3.c |  3 ++
 .../costmodel/riscv/rvv/dynamic-lmul8-4.c |  3 ++
 .../costmodel/riscv/rvv/dynamic-lmul8-5.c |  3 ++
 .../costmodel/riscv/rvv/dynamic-lmul8-6.c |  3 ++
 .../costmodel/riscv/rvv/dynamic-lmul8-7.c |  3 ++
 .../costmodel/riscv/rvv/dynamic-lmul8-8.c |  3 ++
 .../costmodel/riscv/rvv/dynamic-lmul8-9.c |  3 ++
 .../vect/costmodel/riscv/rvv/pr113112-2.c | 20 +
 33 files changed, 166 insertions(+), 15 deletions(-)
 create mode 100644 
gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-12.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/pr113112-2.c

diff --git a/gcc/config/riscv/riscv-vector-costs.cc 
b/gcc/config/riscv/riscv-vector-costs.cc
index 7b837b08f9e..74b8e86a5e1 100644
--- a/gcc/config/riscv/riscv-vector-costs.cc
+++ b/gcc/config/riscv/riscv-vector-costs.cc
@@ -394,21 +394,32 @@ compute_estimated_lmul (loop_vec_info loop_vinfo, 
machine_mode mode)

[PATCH v2] RISC-V: XFAIL pr30957-1.c when loop vectorized with variable factor

2023-12-26 Thread pan2 . li
From: Pan Li 

This patch XFAILs the test case pr30957-1.c for RVV when the ELF is built
with certain configurations (listed at the end of this log).  In those
configurations the loop is vectorized during vect_transform_loop with a
variable factor.  Such a loop won't benefit from unrolling/peeling, so
loop->unroll is marked as 1, and unroll_loops then does nothing for it.

aarch64_sve may have a similar issue, but it initializes the constant
`0.0 / -5.0` in the test file to `+0.0` before passing it to the function foo,
so it passes the execution test.

aarch64:
movi v0.2s, #0x0
stp x29, x30, [sp, #-16]!
mov w0, #0xa
mov x29, sp
bl  400280  <== s0 is +0.0

Unfortunately, riscv initializes the constant `0.0 / -5.0` to `-0.0` and
then passes it to the function foo, so the execution test fails.

riscv:
flw fa0,388(gp) # 1299c <__SDATA_BEGIN__+0x4>
addi sp,sp,-16
li  a0,10
sd  ra,8(sp)
jal 101fc   <== fa0 is -0.0
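The sign difference can be reproduced in isolation. The following is a minimal C sketch, not taken from the test case: the helper names `neg_zero_quotient` and `is_negative_zero` are hypothetical, and it assumes IEEE 754 arithmetic compiled without `-fno-signed-zeros`.

```c
#include <math.h>

/* Hypothetical helper: compute 0.0 / -5.0 at run time.  The volatile
   operands keep the compiler from folding the division away.  */
static float
neg_zero_quotient (void)
{
  volatile float num = 0.0f;
  volatile float den = -5.0f;
  return num / den;
}

/* Return nonzero if X is a zero with the sign bit set.  */
static int
is_negative_zero (float x)
{
  return x == 0.0f && signbit (x) != 0;
}
```

Under these assumptions the quotient compares equal to `0.0f` but still carries the sign bit, which is why a test that inspects the sign can pass on one target and fail on another.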

After this patch, loops vectorized with a variable factor on RVV are
treated as XFAIL, based on the tree dump, when both riscv_v and
variable_vect_length apply.

The below configurations are validated as XFAIL for RV64.

* riscv-sim/-march=rv64gcv/-mabi=lp64d/-mcmodel=medlow/--param=riscv-autovec-preference=fixed-vlmax
* riscv-sim/-march=rv64gcv/-mabi=lp64d/-mcmodel=medlow/--param=riscv-autovec-lmul=m2/--param=riscv-autovec-preference=fixed-vlmax
* riscv-sim/-march=rv64gcv/-mabi=lp64d/-mcmodel=medlow/--param=riscv-autovec-lmul=m4/--param=riscv-autovec-preference=fixed-vlmax
* riscv-sim/-march=rv64gcv/-mabi=lp64d/-mcmodel=medlow/--param=riscv-autovec-lmul=m8/--param=riscv-autovec-preference=fixed-vlmax
* riscv-sim/-march=rv64gcv/-mabi=lp64d/-mcmodel=medlow/--param=riscv-autovec-lmul=dynamic/--param=riscv-autovec-preference=fixed-vlmax
* riscv-sim/-march=rv64gcv_zvl256b/-mabi=lp64d/-mcmodel=medlow/--param=riscv-autovec-preference=fixed-vlmax
* riscv-sim/-march=rv64gcv_zvl256b/-mabi=lp64d/-mcmodel=medlow/--param=riscv-autovec-lmul=m2/--param=riscv-autovec-preference=fixed-vlmax
* riscv-sim/-march=rv64gcv_zvl256b/-mabi=lp64d/-mcmodel=medlow/--param=riscv-autovec-lmul=m4/--param=riscv-autovec-preference=fixed-vlmax
* riscv-sim/-march=rv64gcv_zvl256b/-mabi=lp64d/-mcmodel=medlow/--param=riscv-autovec-lmul=m8/--param=riscv-autovec-preference=fixed-vlmax
* riscv-sim/-march=rv64gcv_zvl256b/-mabi=lp64d/-mcmodel=medlow/--param=riscv-autovec-lmul=dynamic/--param=riscv-autovec-preference=fixed-vlmax
* riscv-sim/-march=rv64gcv_zvl512b/-mabi=lp64d/-mcmodel=medlow/--param=riscv-autovec-preference=fixed-vlmax
* riscv-sim/-march=rv64gcv_zvl512b/-mabi=lp64d/-mcmodel=medlow/--param=riscv-autovec-lmul=m2/--param=riscv-autovec-preference=fixed-vlmax
* riscv-sim/-march=rv64gcv_zvl512b/-mabi=lp64d/-mcmodel=medlow/--param=riscv-autovec-lmul=m4/--param=riscv-autovec-preference=fixed-vlmax
* riscv-sim/-march=rv64gcv_zvl512b/-mabi=lp64d/-mcmodel=medlow/--param=riscv-autovec-lmul=m8/--param=riscv-autovec-preference=fixed-vlmax
* riscv-sim/-march=rv64gcv_zvl512b/-mabi=lp64d/-mcmodel=medlow/--param=riscv-autovec-lmul=dynamic/--param=riscv-autovec-preference=fixed-vlmax
* riscv-sim/-march=rv64gcv_zvl1024b/-mabi=lp64d/-mcmodel=medlow/--param=riscv-autovec-preference=fixed-vlmax
* riscv-sim/-march=rv64gcv_zvl1024b/-mabi=lp64d/-mcmodel=medlow/--param=riscv-autovec-lmul=m2/--param=riscv-autovec-preference=fixed-vlmax
* riscv-sim/-march=rv64gcv_zvl1024b/-mabi=lp64d/-mcmodel=medlow/--param=riscv-autovec-lmul=m4/--param=riscv-autovec-preference=fixed-vlmax
* riscv-sim/-march=rv64gcv_zvl1024b/-mabi=lp64d/-mcmodel=medlow/--param=riscv-autovec-lmul=m8/--param=riscv-autovec-preference=fixed-vlmax
* riscv-sim/-march=rv64gcv_zvl1024b/-mabi=lp64d/-mcmodel=medlow/--param=riscv-autovec-lmul=dynamic/--param=riscv-autovec-preference=fixed-vlmax

gcc/testsuite/ChangeLog:

* gcc.dg/pr30957-1.c: Add XFAIL for RVV when vectorized with
variable length.

Signed-off-by: Pan Li 
---
 gcc/testsuite/gcc.dg/pr30957-1.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.dg/pr30957-1.c b/gcc/testsuite/gcc.dg/pr30957-1.c
index 564410913ab..7a7242ec16d 100644
--- a/gcc/testsuite/gcc.dg/pr30957-1.c
+++ b/gcc/testsuite/gcc.dg/pr30957-1.c
@@ -3,7 +3,7 @@
where each addition is a library call.  */
 /* { dg-require-effective-target hard_float } */
 /* -fassociative-math requires -fno-trapping-math and -fno-signed-zeros. */
-/* { dg-options "-O2 -funroll-loops -fassociative-math -fno-trapping-math 
-fno-signed-zeros -fvariable-expansion-in-unroller -fdump-rtl-loop2_unroll" } */
+/* { dg-options "-O2 -funroll-loops -fassociative-math -fno-trapping-math 
-fno-signed-zeros -fvariable-expansion-in-unroller -fdump-rtl-loop2_unroll 
-fdump-tree-vect-details" } */
 
 extern void abort (void);
 extern void exit (int);
@@ -34,3 +34,4 @@ main ()
 }
 
 /* { dg-final { scan-rtl-dump "Expanding Accumulator" "loop2_unroll" { xfail 
mmix-*-* } } } */
+/* { dg-f

[Committed] RISC-V: Fix typo

2023-12-26 Thread Juzhe-Zhong
gcc/testsuite/ChangeLog:

* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-10.c: Fix typo.

---
 .../gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-10.c  | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-10.c 
b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-10.c
index f3c2315c2c5..e47af25aa9b 100644
--- a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-10.c
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-10.c
@@ -19,5 +19,5 @@ bar (int *x, int a, int b, int n)
 
 /* { dg-final { scan-assembler {e32,m4} } } */
 /* { dg-final { scan-assembler-not {jr} } } */
-/* { dg-final { scan-assembler-times {ret} 2 } } *
+/* { dg-final { scan-assembler-times {ret} 2 } } */
 /* { dg-final { scan-tree-dump-times "Preferring smaller LMUL loop because it 
has unexpected spills" 1 "vect" } } */
-- 
2.36.3



Re: [PATCH v7 1/1] c++: Initial support for P0847R7 (Deducing This) [PR102609]

2023-12-26 Thread Jason Merrill

On 12/23/23 02:10, waffl3x wrote:

On Friday, December 22nd, 2023 at 10:26 AM, Jason Merrill  
wrote:



On 12/22/23 04:01, waffl3x wrote:


int n = 0;
auto f = [](this Self){
static_assert(__is_same (decltype(n), int));
decltype((n)) a; // { dg-error {is not captured} }
};
f();

Could you clarify if this error being removed was intentional. I do
recall that Patrick Palka wanted to remove this error in his patch, but
it seemed to me like you stated it would be incorrect to allow it.
Since the error is no longer present I assume I am misunderstanding the
exchange.

In any case, let me know if I need to modify my test case or if this
error needs to be added back in.



Removing the error was correct under
https://eel.is/c++draft/expr.prim#id.unqual-3
Naming n in that lambda would not refer to a capture by copy, so the
decltype is the same as outside the lambda.


Alright, I've fixed my tests to reflect that.

I've got defaulting assignment operators working. Defaulting equality
and comparison operators seemed to work out of the box somehow, so I
just have to make some fleshed out tests for those cases.

There can always be more tests, I have a few ideas for what still needs
to be covered, mostly with dependent lambdas. Tests for xobj conversion
operators definitely need to be more fleshed out. I also need to
formulate some tests to make sure constraints are not being taken into
account when the object parameters should not correspond, but that's a
little more tough to test for than the valid cases.

Other than tests though, is there anything you can think of that the
patch is missing? Other than the aforementioned tests, I'm pretty
confident everything is done.

To recap, I have CWG2789 implemented on my end with the change we
discussed to require corresponding object parameters instead of the
same type, and I have CWG2586 implemented. I can't recall what other
outstanding issues we had, and my notes don't mention anything other
than tests. So I'm assuming everything is good.


Sounds good!  Did you mean to include the updated patch?

Jason



Re: [PATCH v2 2/2] libstdc++: implement std::generator

2023-12-26 Thread Will Hawkins
On Thu, Dec 21, 2023 at 4:26 PM Arsen Arsenović  wrote:
>
> libstdc++-v3/ChangeLog:
>

... snip ...

> +   void
> +   _M_jump_in(_Coro_handle __rest, _Coro_handle __new) noexcept
> +   {
> + __glibcxx_assert(&__new.promise()._M_nest == this);
> + __glibcxx_assert(this->_M_is_bottom());
> + // We're bottom.  We're also top of top is unset (note that this is

Should this read:

// We're bottom.  We're also top if top is unset (note that this is

?

Very impressive work -- I learned a ton by reading your implementation.
Will


Re: [PATCH] cse: Fix handling of fake vec_select sets [PR111702]

2023-12-26 Thread Prathamesh Kulkarni
On Thu, 21 Dec 2023 at 00:00, Richard Sandiford
 wrote:
>
> If cse sees:
>
>   (set (reg R) (const_vector [A B ...]))
>
> it creates fake sets of the form:
>
>   (set R[0] A)
>   (set R[1] B)
>   ...
>
> (with R[n] replaced by appropriate rtl) and then adds them to the tables
> in the same way as for normal sets.  This allows a sequence like:
>
>   (set (reg R2) A)
>   ...(reg R2)...
>
> to try to use R[0] instead of (reg R2).
>
> But the pass was taking the analogy too far, and was trying to simplify
> these fake sets based on costs.  That is, if there was an earlier:
>
>   (set (reg T) A)
>
> the pass would go to considerable effort trying to work out whether:
>
>   (set R[0] A)
>
> or:
>
>   (set R[0] (reg T))
>
> was more profitable.  This included running validate*_change on the sets,
> which has no meaning given that the sets are not part of the insn.
>
> In this example, the equivalence A == T is already known, and the
> purpose of the fake sets is to add A == T == R[0].  We can do that
> just as easily (or, as the PR shows, more easily) if we keep the
> original form of the fake set, with A instead of T.
>
> The problem in the PR occurred if we had:
>
> (1) something that establishes an equivalence between a vector V1 of
> M-bit scalar integers and a hard register H
>
> (2) something that establishes an equivalence between a vector V2 of
> N-bit scalar integers, where N instances of V1[0]
>
> (1) established an equivalence between V1[0] and H in M bits.
> (2) then triggered a search for an equivalence of V1[0] in N bits.
> This included:
>
>   /* See if we have a CONST_INT that is already in a register in a
>  wider mode.  */
>
> which (correctly) found that the low N bits of H contain the right value.
> But because it came from a wider mode, this equivalence between N-bit H
> and N-bit V1[0] was not yet in the hash table.  It therefore survived
> the purge in:
>
>   /* At this point, ELT, if nonzero, points to a class of expressions
>  equivalent to the source of this SET and SRC, SRC_EQV, SRC_FOLDED,
>  and SRC_RELATED, if nonzero, each contain additional equivalent
>  expressions.  Prune these latter expressions by deleting expressions
>  already in the equivalence class.
>
> And since more than 1 set found the same N-bit equivalence between
> H and V1[0], the pass tried to add it more than once.
>
> Things were already wrong at this stage, but an ICE was only triggered
> later when trying to merge this N-bit equivalence with another one.
>
> We could avoid the double registration by adding:
>
>   for (elt = classp; elt; elt = elt->next_same_value)
> if (rtx_equal_p (elt->exp, x))
>   return elt;
>
> to insert_with_costs, or by making cse_insn check whether previous
> sets have recorded the same equivalence.  The latter seems more
> appealing from a compile-time perspective.  But in this case,
> doing that would be adding yet more spurious work to the handling
> of fake sets.
>
> The handling of fake sets therefore seems like the more fundamental bug.
>
> While there, the patch also makes sure that we don't apply REG_EQUAL
> notes to these fake sets.  They only describe the "real" (first) set.
Hi Richard,
Thanks for the detailed explanation and fix!

Thanks,
Prathamesh
>
> gcc/
> PR rtl-optimization/111702
> * cse.cc (set::mode): Move earlier.
> (set::src_in_memory, set::src_volatile): Convert to bitfields.
> (set::is_fake_set): New member variable.
> (add_to_set): Add an is_fake_set parameter.
> (find_sets_in_insn): Update calls accordingly.
> (cse_insn): Do not apply REG_EQUAL notes to fake sets.  Do not
> try to optimize them either, or validate changes to them.
>
> gcc/
> PR rtl-optimization/111702
> * gcc.dg/rtl/aarch64/pr111702.c: New test.
> ---
>  gcc/cse.cc  | 38 +++---
>  gcc/testsuite/gcc.dg/rtl/aarch64/pr111702.c | 43 +
>  2 files changed, 67 insertions(+), 14 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/rtl/aarch64/pr111702.c
>
> diff --git a/gcc/cse.cc b/gcc/cse.cc
> index f9603fdfd43..9fd51ca2832 100644
> --- a/gcc/cse.cc
> +++ b/gcc/cse.cc
> @@ -4128,13 +4128,17 @@ struct set
>unsigned dest_hash;
>/* The SET_DEST, with SUBREG, etc., stripped.  */
>rtx inner_dest;
> +  /* Original machine mode, in case it becomes a CONST_INT.  */
> +  ENUM_BITFIELD(machine_mode) mode : MACHINE_MODE_BITSIZE;
>/* Nonzero if the SET_SRC is in memory.  */
> -  char src_in_memory;
> +  unsigned int src_in_memory : 1;
>/* Nonzero if the SET_SRC contains something
>   whose value cannot be predicted and understood.  */
> -  char src_volatile;
> -  /* Original machine mode, in case it becomes a CONST_INT.  */
> -  ENUM_BITFIELD(machine_mode) mode : MACHINE_MODE_BITSIZE;
> +  unsigned int src_volatile : 1;
> +  /* Nonzero if RTL is an artifical set that has been c

Re: [PATCH 2/3][RFC] RISC-V: Add vector related reservations

2023-12-26 Thread Edwin Lu




On 12/20/2023 2:55 PM, Edwin Lu wrote:

On 12/20/2023 10:57 AM, Jeff Law wrote:



On 12/15/23 11:53, Edwin Lu wrote:

This patch copies the vector reservations from generic-ooo.md and
inserts them into generic.md and sifive.md. The vector pipelines are
necessary to avoid an ICE from the assert


I forgot to get clarification earlier but this patch would introduce 
many scan-dump failures for both vector and non-vector targets 
(https://github.com/ewlu/gcc-precommit-ci/issues/950#issuecomment-1858392181). 
I haven't identified any execution errors that would be introduced on 
mtune=rocket aside from one ICE which I'm currently working on fixing.


Additionally, as mentioned in PR113035, there are significant testsuite 
differences between mtune=rocket and mtune=sifive-7-series. I haven't 
gone through all of the differences and I don't know if they are a 
problem with the patch or a result of the cost modeling assumptions.


Is there a problem with the current way mtune=rocket is modeled with 
generic.md?


Edwin


[PATCH] LoongArch: Fix infinite secondary reloading of FCCmode [PR113148]

2023-12-26 Thread Xi Ruoyao
The GCC internal doc says:

 X might be a pseudo-register or a 'subreg' of a pseudo-register,
 which could either be in a hard register or in memory.  Use
 'true_regnum' to find out; it will return -1 if the pseudo is in
 memory and the hard register number if it is in a register.

So "MEM_P (x)" is not enough for checking whether we are reloading from/to
memory.  Since r14-6814, this bug has caused the reload pass to stall and
finally ICE with "maximum number of generated reload insns per insn
achieved".

Also check whether "true_regnum (x)" is -1, in addition to "MEM_P (x)", to
fix the issue.
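
The effect of the condition change can be illustrated with a toy model. This is only a sketch: `struct toy_operand`, `old_needs_gr_scratch`, and `new_needs_gr_scratch` are hypothetical names rather than GCC interfaces, and the model covers just the final FP_REGS check, assuming `regno` holds the result of `true_regnum (x)`.

```c
#include <stdbool.h>

/* Toy stand-in for an rtx operand; not a GCC type.  */
struct toy_operand
{
  bool is_mem;     /* models MEM_P (x) */
  int true_regnum; /* models true_regnum (x): -1 if the value is in memory */
};

/* Old check: only an explicit MEM is treated as memory.  */
static bool
old_needs_gr_scratch (bool rclass_is_fp, struct toy_operand x)
{
  return rclass_is_fp && x.is_mem;
}

/* New check: a pseudo that ended up in memory counts as well.  */
static bool
new_needs_gr_scratch (bool rclass_is_fp, struct toy_operand x)
{
  return rclass_is_fp && (x.true_regnum == -1 || x.is_mem);
}
```

In this model a spilled pseudo (`is_mem` false, `true_regnum` -1) is missed by the old check and caught by the new one, which is the case the doc excerpt above warns about.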

gcc/ChangeLog:

PR target/113148
* config/loongarch/loongarch.cc (loongarch_secondary_reload):
Check if regno == -1 besides MEM_P (x) for reloading FCCmode
from/to FPR to/from memory.

gcc/testsuite/ChangeLog:

PR target/113148
* gcc.target/loongarch/pr113148.c: New test.
---

Bootstrapped & regtested on loongarch64-linux-gnu.  Ok for trunk?

 gcc/config/loongarch/loongarch.cc |  3 +-
 gcc/testsuite/gcc.target/loongarch/pr113148.c | 44 +++
 2 files changed, 46 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/loongarch/pr113148.c

diff --git a/gcc/config/loongarch/loongarch.cc 
b/gcc/config/loongarch/loongarch.cc
index 5ffd06ce9be..c0a0af3dda5 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -6951,7 +6951,8 @@ loongarch_secondary_reload (bool in_p ATTRIBUTE_UNUSED, 
rtx x,
  return NO_REGS;
}
 
-  if (reg_class_subset_p (rclass, FP_REGS) && MEM_P (x))
+  if (reg_class_subset_p (rclass, FP_REGS)
+ && (regno == -1 || MEM_P (x)))
return GR_REGS;
 
   return NO_REGS;
diff --git a/gcc/testsuite/gcc.target/loongarch/pr113148.c 
b/gcc/testsuite/gcc.target/loongarch/pr113148.c
new file mode 100644
index 000..cf48e552053
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/pr113148.c
@@ -0,0 +1,44 @@
+/* PR 113148: ICE caused by infinite reloading */
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=la464 -mfpu=64 -mabi=lp64d" } */
+
+struct bound
+{
+  double max;
+} drawQuadrant_bound;
+double w4, innerXfromXY_y, computeBound_right_0;
+struct arc_def
+{
+  double w, h;
+  double a0, a1;
+};
+static void drawQuadrant (struct arc_def *);
+static void
+computeBound (struct arc_def *def, struct bound *bound)
+{
+  double ellipsex_1, ellipsex_0;
+  bound->max = def->a1 ?: __builtin_sin (w4) * def->h;
+  if (def->a0 == 5 && def->w == def->h)
+;
+  else
+ellipsex_0 = def->a0 == 0.0 ?: __builtin_cos (w4);
+  if (def->a1 == 5 && def->w == def->h)
+ellipsex_1 = bound->max;
+  __builtin_sqrt (ellipsex_1 * innerXfromXY_y * innerXfromXY_y * w4);
+  computeBound_right_0 = ellipsex_0;
+}
+void
+drawArc ()
+{
+  struct arc_def foo;
+  for (;;)
+drawQuadrant (&foo);
+}
+void
+drawQuadrant (struct arc_def *def)
+{
+  int y, miny;
+  computeBound (def, &drawQuadrant_bound);
+  while (y >= miny)
+;
+}
-- 
2.43.0



Re: [PATCH v2 2/2] libstdc++: implement std::generator

2023-12-26 Thread Arsen Arsenović
Hi Will,

Will Hawkins  writes:

> On Thu, Dec 21, 2023 at 4:26 PM Arsen Arsenović  wrote:
>>
>> libstdc++-v3/ChangeLog:
>>
>
> ... snip ...
>
>> +   void
>> +   _M_jump_in(_Coro_handle __rest, _Coro_handle __new) noexcept
>> +   {
>> + __glibcxx_assert(&__new.promise()._M_nest == this);
>> + __glibcxx_assert(this->_M_is_bottom());
>> + // We're bottom.  We're also top of top is unset (note that this is
>
> Should this read:
>
> // We're bottom.  We're also top if top is unset (note that this is
>
> ?

Yes, I am decently sure so - well spotted.  I'll leave your message
unread to correct this typo later.  Thanks!

> Very impressive work -- I learned a ton by reading your implementation.
> Will

Happy to hear that :-)

Have a lovely night!
--
Arsen Arsenović




[PING][PATCH] enable ATOMIC_COMPARE_EXCHANGE opt for floating type or types contains padding

2023-12-26 Thread xndcn
Ping, thanks.

I did some benchmarks: there is a significant speedup for float/double
types, and no regression for the long double type.



[PATCH] RISC-V: Disallow transformation into VLMAX AVL for cond_len_xxx when length is in range [0, 31]

2023-12-26 Thread Juzhe-Zhong
Notice we have the following situation:

vsetivli zero,4,e32,m1,ta,ma
vlseg4e32.v v4,(a5)
vlseg4e32.v v12,(a3)
vsetvli a5,zero,e32,m1,tu,ma ---> This is redundant since VLMAX AVL = 4 when it is fixed-vlmax
vfadd.vf v3,v13,fa0
vfadd.vf v1,v12,fa1
vfmul.vv v17,v3,v5
vfmul.vv v16,v1,v5

The root cause is that we blindly transform COND_LEN_xxx into VLMAX AVL
whenever len == NUNITS.  However, we don't need to transform all of them:
when len is in the range [0, 31], it can be used as an immediate AVL, so no
scalar register is consumed.
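
The new decision can be sketched with plain integers. This is a simplified model, not the GCC code: it assumes a fixed (non-poly) NUNITS, and the names `fits_vsetivli_imm` and `use_vlmax_avl` are hypothetical. The range [0, 31] is the 5-bit unsigned immediate that `vsetivli` accepts as AVL.

```c
#include <stdbool.h>

/* True if LEN fits the 5-bit unsigned AVL immediate of vsetivli.  */
static bool
fits_vsetivli_imm (long len)
{
  return len >= 0 && len <= 31;
}

/* Model of the new check: switch to VLMAX AVL only when LEN equals
   NUNITS and LEN cannot be encoded as an immediate.  */
static bool
use_vlmax_avl (long len, long nunits)
{
  return len == nunits && !fits_vsetivli_imm (len);
}
```

With len == NUNITS == 4, as in the example above, the model keeps the immediate form, so no extra `vsetvli` with a scalar register would be needed.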

After this patch:

vsetivli zero,4,e32,m1,tu,ma
addi a4,a5,400
vlseg4e32.v v12,(a3)
vfadd.vf v3,v13,fa0
vfadd.vf v1,v12,fa1
vlseg4e32.v v4,(a4)
vfadd.vf v2,v14,fa1
vfmul.vv v17,v3,v5
vfmul.vv v16,v1,v5

Tested on both RV32 and RV64 with no regressions.

Ok for trunk ?

gcc/ChangeLog:

* config/riscv/riscv-v.cc (is_vlmax_len_p): New function.
(expand_load_store): Disallow transformation into VLMAX when len is in 
range of [0,31]
(expand_cond_len_op): Ditto.
(expand_gather_scatter): Ditto.
(expand_lanes_load_store): Ditto.
(expand_fold_extract_last): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/post-ra-avl.c: Adapt test.
* gcc.target/riscv/rvv/base/vf_avl-2.c: New test.

---
 gcc/config/riscv/riscv-v.cc   | 21 +--
 .../riscv/rvv/autovec/post-ra-avl.c   |  2 +-
 .../gcc.target/riscv/rvv/base/vf_avl-2.c  | 21 +++
 3 files changed, 37 insertions(+), 7 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/vf_avl-2.c

diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 038ab084a37..0cc7af58da6 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -68,6 +68,16 @@ imm_avl_p (machine_mode mode)
   : false;
 }
 
+/* Return true if LEN is equal to NUNITS that outbounds range of [0, 31].  */
+static bool
+is_vlmax_len_p (machine_mode mode, rtx len)
+{
+  poly_int64 value;
+  return poly_int_rtx_p (len, &value)
+&& known_eq (value, GET_MODE_NUNITS (mode))
+&& !satisfies_constraint_K (len);
+}
+
 /* Helper functions for insn_flags && insn_types */
 
 /* Return true if caller need pass mask operand for insn pattern with
@@ -3776,7 +3786,7 @@ expand_load_store (rtx *ops, bool is_load)
   rtx len = ops[3];
   machine_mode mode = GET_MODE (ops[0]);
 
-  if (poly_int_rtx_p (len, &value) && known_eq (value, GET_MODE_NUNITS (mode)))
+  if (is_vlmax_len_p (mode, len))
 {
   /* If the length operand is equal to VF, it is VLMAX load/store.  */
   if (is_load)
@@ -3842,8 +3852,7 @@ expand_cond_len_op (unsigned icode, insn_flags op_type, 
rtx *ops, rtx len)
   machine_mode mask_mode = GET_MODE (mask);
   poly_int64 value;
   bool is_dummy_mask = rtx_equal_p (mask, CONSTM1_RTX (mask_mode));
-  bool is_vlmax_len
-= poly_int_rtx_p (len, &value) && known_eq (value, GET_MODE_NUNITS (mode));
+  bool is_vlmax_len = is_vlmax_len_p (mode, len);
 
   unsigned insn_flags = HAS_DEST_P | HAS_MASK_P | HAS_MERGE_P | op_type;
   if (is_dummy_mask)
@@ -4012,7 +4021,7 @@ expand_gather_scatter (rtx *ops, bool is_load)
   unsigned inner_offsize = GET_MODE_BITSIZE (inner_idx_mode);
   poly_int64 nunits = GET_MODE_NUNITS (vec_mode);
   poly_int64 value;
-  bool is_vlmax = poly_int_rtx_p (len, &value) && known_eq (value, nunits);
+  bool is_vlmax = is_vlmax_len_p (vec_mode, len);
 
   /* Extend the offset element to address width.  */
   if (inner_offsize < BITS_PER_WORD)
@@ -4199,7 +4208,7 @@ expand_lanes_load_store (rtx *ops, bool is_load)
   rtx reg = is_load ? ops[0] : ops[1];
   machine_mode mode = GET_MODE (ops[0]);
 
-  if (poly_int_rtx_p (len, &value) && known_eq (value, GET_MODE_NUNITS (mode)))
+  if (is_vlmax_len_p (mode, len))
 {
   /* If the length operand is equal to VF, it is VLMAX load/store.  */
   if (is_load)
@@ -4252,7 +4261,7 @@ expand_fold_extract_last (rtx *ops)
   rtx slide_vect = gen_reg_rtx (mode);
   insn_code icode;
 
-  if (poly_int_rtx_p (len, &value) && known_eq (value, GET_MODE_NUNITS (mode)))
+  if (is_vlmax_len_p (mode, len))
 len = NULL_RTX;
 
   /* Calculate the number of 1-bit in mask. */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/post-ra-avl.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/post-ra-avl.c
index f3d12bac7cd..c77b2d187fe 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/post-ra-avl.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/post-ra-avl.c
@@ -13,4 +13,4 @@ int foo() {
   return a;
 }
 
-/* { dg-final { scan-assembler-times {vsetvli\s+[a-x0-9]+,\s*zero} 1 } } */
+/* { dg-final { scan-assembler-not {vsetvli} } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/vf_avl-2.c 
b/gcc/tests

[PATCH V2] RISC-V: Disallow transformation into VLMAX AVL for cond_len_xxx when length is in range [0, 31]

2023-12-26 Thread Juzhe-Zhong
Notice we have the following situation:

vsetivli zero,4,e32,m1,ta,ma
vlseg4e32.v v4,(a5)
vlseg4e32.v v12,(a3)
vsetvli a5,zero,e32,m1,tu,ma ---> This is redundant since VLMAX AVL = 4 when it is fixed-vlmax
vfadd.vf v3,v13,fa0
vfadd.vf v1,v12,fa1
vfmul.vv v17,v3,v5
vfmul.vv v16,v1,v5

The root cause is that we blindly transform COND_LEN_xxx into VLMAX AVL
whenever len == NUNITS.  However, we don't need to transform all of them:
when len is in the range [0, 31], it can be used as an immediate AVL, so no
scalar register is consumed.

After this patch:

vsetivli zero,4,e32,m1,tu,ma
addi a4,a5,400
vlseg4e32.v v12,(a3)
vfadd.vf v3,v13,fa0
vfadd.vf v1,v12,fa1
vlseg4e32.v v4,(a4)
vfadd.vf v2,v14,fa1
vfmul.vv v17,v3,v5
vfmul.vv v16,v1,v5

Tested on both RV32 and RV64 with no regressions.

Ok for trunk ?

gcc/ChangeLog:

* config/riscv/riscv-v.cc (is_vlmax_len_p): New function.
(expand_load_store): Disallow transformation into VLMAX when len is in 
range of [0,31]
(expand_cond_len_op): Ditto.
(expand_gather_scatter): Ditto.
(expand_lanes_load_store): Ditto.
(expand_fold_extract_last): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/post-ra-avl.c: Adapt test.
* gcc.target/riscv/rvv/base/vf_avl-2.c: New test.

---
 gcc/config/riscv/riscv-v.cc   | 21 +--
 .../riscv/rvv/autovec/post-ra-avl.c   |  2 +-
 .../gcc.target/riscv/rvv/base/vf_avl-2.c  | 21 +++
 3 files changed, 37 insertions(+), 7 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/vf_avl-2.c

diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 038ab084a37..0cc7af58da6 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -68,6 +68,16 @@ imm_avl_p (machine_mode mode)
   : false;
 }
 
+/* Return true if LEN is equal to NUNITS that outbounds range of [0, 31].  */
+static bool
+is_vlmax_len_p (machine_mode mode, rtx len)
+{
+  poly_int64 value;
+  return poly_int_rtx_p (len, &value)
+&& known_eq (value, GET_MODE_NUNITS (mode))
+&& !satisfies_constraint_K (len);
+}
+
 /* Helper functions for insn_flags && insn_types */
 
 /* Return true if caller need pass mask operand for insn pattern with
@@ -3776,7 +3786,7 @@ expand_load_store (rtx *ops, bool is_load)
   rtx len = ops[3];
   machine_mode mode = GET_MODE (ops[0]);
 
-  if (poly_int_rtx_p (len, &value) && known_eq (value, GET_MODE_NUNITS (mode)))
+  if (is_vlmax_len_p (mode, len))
 {
   /* If the length operand is equal to VF, it is VLMAX load/store.  */
   if (is_load)
@@ -3842,8 +3852,7 @@ expand_cond_len_op (unsigned icode, insn_flags op_type, 
rtx *ops, rtx len)
   machine_mode mask_mode = GET_MODE (mask);
   poly_int64 value;
   bool is_dummy_mask = rtx_equal_p (mask, CONSTM1_RTX (mask_mode));
-  bool is_vlmax_len
-= poly_int_rtx_p (len, &value) && known_eq (value, GET_MODE_NUNITS (mode));
+  bool is_vlmax_len = is_vlmax_len_p (mode, len);
 
   unsigned insn_flags = HAS_DEST_P | HAS_MASK_P | HAS_MERGE_P | op_type;
   if (is_dummy_mask)
@@ -4012,7 +4021,7 @@ expand_gather_scatter (rtx *ops, bool is_load)
   unsigned inner_offsize = GET_MODE_BITSIZE (inner_idx_mode);
   poly_int64 nunits = GET_MODE_NUNITS (vec_mode);
   poly_int64 value;
-  bool is_vlmax = poly_int_rtx_p (len, &value) && known_eq (value, nunits);
+  bool is_vlmax = is_vlmax_len_p (vec_mode, len);
 
   /* Extend the offset element to address width.  */
   if (inner_offsize < BITS_PER_WORD)
@@ -4199,7 +4208,7 @@ expand_lanes_load_store (rtx *ops, bool is_load)
   rtx reg = is_load ? ops[0] : ops[1];
   machine_mode mode = GET_MODE (ops[0]);
 
-  if (poly_int_rtx_p (len, &value) && known_eq (value, GET_MODE_NUNITS (mode)))
+  if (is_vlmax_len_p (mode, len))
 {
   /* If the length operand is equal to VF, it is VLMAX load/store.  */
   if (is_load)
@@ -4252,7 +4261,7 @@ expand_fold_extract_last (rtx *ops)
   rtx slide_vect = gen_reg_rtx (mode);
   insn_code icode;
 
-  if (poly_int_rtx_p (len, &value) && known_eq (value, GET_MODE_NUNITS (mode)))
+  if (is_vlmax_len_p (mode, len))
 len = NULL_RTX;
 
   /* Calculate the number of 1-bit in mask. */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/post-ra-avl.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/post-ra-avl.c
index f3d12bac7cd..bff6dcb1c38 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/post-ra-avl.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/post-ra-avl.c
@@ -13,4 +13,4 @@ int foo() {
   return a;
 }
 
-/* { dg-final { scan-assembler-times {vsetvli\s+[a-x0-9]+,\s*zero} 1 } } */
+/* { dg-final { scan-assembler-not {vsetvli\s+[a-x0-9]+,\s*zero} } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/vf_

Re: [PATCH] RISC-V: Disallow transformation into VLMAX AVL for cond_len_xxx when length is in range [0, 31]

2023-12-26 Thread juzhe.zh...@rivai.ai
Sent V2 with a test tweak:

https://gcc.gnu.org/pipermail/gcc-patches/2023-December/641447.html 




juzhe.zh...@rivai.ai
 
From: Juzhe-Zhong
Date: 2023-12-27 09:52
To: gcc-patches
CC: kito.cheng; kito.cheng; jeffreyalaw; rdapp.gcc; Juzhe-Zhong
Subject: [PATCH] RISC-V: Disallow transformation into VLMAX AVL for 
cond_len_xxx when length is in range [0,31]
Consider the following situation:
 
vsetivli zero,4,e32,m1,ta,ma
vlseg4e32.v v4,(a5)
vlseg4e32.v v12,(a3)
vsetvli a5,zero,e32,m1,tu,ma ---> This is redundant since VLMAX AVL = 4 when it is fixed-vlmax
vfadd.vf v3,v13,fa0
vfadd.vf v1,v12,fa1
vfmul.vv v17,v3,v5
vfmul.vv v16,v1,v5
 
The root cause is that we blindly transform COND_LEN_xxx into VLMAX AVL 
whenever len == NUNITS.
However, we don't need to transform all of them: when len is in the range 
[0,31], it can be encoded as an immediate, so we don't need to
consume a scalar register.
 
After this patch:
 
vsetivli zero,4,e32,m1,tu,ma
addi a4,a5,400
vlseg4e32.v v12,(a3)
vfadd.vf v3,v13,fa0
vfadd.vf v1,v12,fa1
vlseg4e32.v v4,(a4)
vfadd.vf v2,v14,fa1
vfmul.vv v17,v3,v5
vfmul.vv v16,v1,v5
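
A minimal sketch of the check this patch introduces, written as standalone C rather than against GCC internals. `fits_uimm5` is a hypothetical stand-in for `satisfies_constraint_K` (assumed here to accept the immediate range [0, 31] named in the patch comment), and plain integers replace `poly_int64` and `rtx`:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for satisfies_constraint_K: true when LEN fits
   the immediate range [0, 31] mentioned in the patch.  */
static bool
fits_uimm5 (int64_t len)
{
  return len >= 0 && len <= 31;
}

/* Simplified model of is_vlmax_len_p: treat LEN as VLMAX only when it
   equals NUNITS and cannot be encoded as an immediate AVL.  */
static bool
is_vlmax_len_sketch (int64_t len, int64_t nunits)
{
  return len == nunits && !fits_uimm5 (len);
}
```

Under these assumptions, a fixed-VLMAX mode with NUNITS = 4 keeps the immediate form (`vsetivli zero,4,...`), while a length of 64 equal to NUNITS is still treated as VLMAX.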
 
Tested on both RV32 and RV64 with no regressions.
 
Ok for trunk?
 
gcc/ChangeLog:
 
* config/riscv/riscv-v.cc (is_vlmax_len_p): New function.
(expand_load_store): Disallow transformation into VLMAX when len is in range of 
[0,31]
(expand_cond_len_op): Ditto.
(expand_gather_scatter): Ditto.
(expand_lanes_load_store): Ditto.
(expand_fold_extract_last): Ditto.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/autovec/post-ra-avl.c: Adapt test.
* gcc.target/riscv/rvv/base/vf_avl-2.c: New test.
 
---
gcc/config/riscv/riscv-v.cc   | 21 +--
.../riscv/rvv/autovec/post-ra-avl.c   |  2 +-
.../gcc.target/riscv/rvv/base/vf_avl-2.c  | 21 +++
3 files changed, 37 insertions(+), 7 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/vf_avl-2.c
 
diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 038ab084a37..0cc7af58da6 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -68,6 +68,16 @@ imm_avl_p (machine_mode mode)
   : false;
}
+/* Return true if LEN is equal to NUNITS that outbounds range of [0, 31].  */
+static bool
+is_vlmax_len_p (machine_mode mode, rtx len)
+{
+  poly_int64 value;
+  return poly_int_rtx_p (len, &value)
+ && known_eq (value, GET_MODE_NUNITS (mode))
+ && !satisfies_constraint_K (len);
+}
+
/* Helper functions for insn_flags && insn_types */
/* Return true if caller need pass mask operand for insn pattern with
@@ -3776,7 +3786,7 @@ expand_load_store (rtx *ops, bool is_load)
   rtx len = ops[3];
   machine_mode mode = GET_MODE (ops[0]);
-  if (poly_int_rtx_p (len, &value) && known_eq (value, GET_MODE_NUNITS (mode)))
+  if (is_vlmax_len_p (mode, len))
 {
   /* If the length operand is equal to VF, it is VLMAX load/store.  */
   if (is_load)
@@ -3842,8 +3852,7 @@ expand_cond_len_op (unsigned icode, insn_flags op_type, 
rtx *ops, rtx len)
   machine_mode mask_mode = GET_MODE (mask);
   poly_int64 value;
   bool is_dummy_mask = rtx_equal_p (mask, CONSTM1_RTX (mask_mode));
-  bool is_vlmax_len
-= poly_int_rtx_p (len, &value) && known_eq (value, GET_MODE_NUNITS (mode));
+  bool is_vlmax_len = is_vlmax_len_p (mode, len);
   unsigned insn_flags = HAS_DEST_P | HAS_MASK_P | HAS_MERGE_P | op_type;
   if (is_dummy_mask)
@@ -4012,7 +4021,7 @@ expand_gather_scatter (rtx *ops, bool is_load)
   unsigned inner_offsize = GET_MODE_BITSIZE (inner_idx_mode);
   poly_int64 nunits = GET_MODE_NUNITS (vec_mode);
   poly_int64 value;
-  bool is_vlmax = poly_int_rtx_p (len, &value) && known_eq (value, nunits);
+  bool is_vlmax = is_vlmax_len_p (vec_mode, len);
   /* Extend the offset element to address width.  */
   if (inner_offsize < BITS_PER_WORD)
@@ -4199,7 +4208,7 @@ expand_lanes_load_store (rtx *ops, bool is_load)
   rtx reg = is_load ? ops[0] : ops[1];
   machine_mode mode = GET_MODE (ops[0]);
-  if (poly_int_rtx_p (len, &value) && known_eq (value, GET_MODE_NUNITS (mode)))
+  if (is_vlmax_len_p (mode, len))
 {
   /* If the length operand is equal to VF, it is VLMAX load/store.  */
   if (is_load)
@@ -4252,7 +4261,7 @@ expand_fold_extract_last (rtx *ops)
   rtx slide_vect = gen_reg_rtx (mode);
   insn_code icode;
-  if (poly_int_rtx_p (len, &value) && known_eq (value, GET_MODE_NUNITS (mode)))
+  if (is_vlmax_len_p (mode, len))
 len = NULL_RTX;
   /* Calculate the number of 1-bit in mask. */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/post-ra-avl.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/post-ra-avl.c
index f3d12bac7cd..c77b2d187fe 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/post-ra-avl.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/post-ra-avl.c
@@ -13,4 +13,4 @@ int foo() {
   return a;
}
-/* { dg-final { scan-assembler-times {vsetvli\s+[a-x0-9]+,\s*zero} 1 } } */

Re: [PATCH v3 1/6] RISC-V: Refactor riscv-vector-builtins-bases.cc

2023-12-26 Thread joshua
Hi Jeff,

Perhaps fold_fault_load cannot be moved to riscv-protos.h since
gimple_folder is declared in riscv-vector-builtins.h. It's not reasonable
to include riscv-vector-builtins.h in riscv-protos.h. 

In fact, fold_fault_load is defined specifically for some builtin functions, and
it would be better to just prototype it in riscv-vector-builtins-bases.h.

Joshua







--
From: Jeff Law 
Sent: Thursday, December 21, 2023, 02:14
To: "Jun Sha (Joshua)"; "gcc-patches"
Cc: "jim.wilson.gcc"; palmer; andrew; "philipp.tomsich"; 
"christoph.muellner"; "juzhe.zhong"; Jin Ma; Xianmiao Qu
Subject: Re: [PATCH v3 1/6] RISC-V: Refactor riscv-vector-builtins-bases.cc




On 12/20/23 05:25, Jun Sha (Joshua) wrote:
> This patch moves the definition of the enums lst_type and
> frm_op_type into riscv-vector-builtins-bases.h and removes
> the static visibility of fold_fault_load(), so these
> can be used in other compile units.
> 
> gcc/ChangeLog:
> 
>  * config/riscv/riscv-vector-builtins-bases.cc (enum lst_type):
>  (enum frm_op_type): move to riscv-vector-builtins-bases.h
>  * config/riscv/riscv-vector-builtins-bases.h
>  (GCC_RISCV_VECTOR_BUILTINS_BASES_H): Add header files.
>  (enum lst_type): move from
>  (enum frm_op_type): riscv-vector-builtins-bases.cc
>  (fold_fault_load): riscv-vector-builtins-bases.cc
I'm largely hoping to leave the heavy review lifting here to Juzhe who 
knows GCC's RV vector bits as well as anyone.

Just one small issue.  Would it be better to prototype fold_fault_load 
elsewhere and avoid the gimple.h inclusion in 
riscv-vector-builtins-bases.h?  Perhaps riscv-protos.h?

You might consider prefixing the function name with riscv_.  It's not 
strictly necessary, but it appears to be relatively common in the RISC-V port.

Thanks,
Jeff

Re: [PATCH] RISC-V: Add crypto machine descriptions

2023-12-26 Thread Kito Cheng
Thanks Feng, the patch LGTM from my side. I am happy to accept the
vector crypto work for GCC 14: it is mostly intrinsics, and the
few non-intrinsic parts (e.g. vrol, vctz) are also low-risk enough.


On Fri, Dec 22, 2023 at 10:04 AM Feng Wang  wrote:
>
> 2023-12-22 09:59 Feng Wang  wrote:
>
> Sorry for forgetting to add the patch version number. It should be [PATCH v8 
> 2/3]
>
> >Patch v8: Remove unused iterator and add newline at the end.
> >Patch v7: Remove mode of const_int_operand and typo. Add
> >  newline at the end and comment at the beginning.
> >Patch v6: Swap the operator order of vandn.vv
> >Patch v5: Add vec_duplicate operator.
> >Patch v4: Add process of SEW=64 in RV32 system.
> >Patch v3: Modify constraints for crypto vector.
> >Patch v2: Add crypto vector ins into RATIO attr and use vr as
> >destination register.
> >
> >This patch adds the crypto machine descriptions (vector-crypto.md) and
> >some new iterators which are used by crypto vector ext.
> >
> >Co-Authored by: Songhe Zhu 
> >Co-Authored by: Ciyan Pan 
> >gcc/ChangeLog:
> >
> >   * config/riscv/iterators.md: Add rotate insn name.
> >   * config/riscv/riscv.md: Add new insns name for crypto vector.
> >   * config/riscv/vector-iterators.md: Add new iterators for crypto vector.
> >   * config/riscv/vector.md: Add the corresponding attr for crypto vector.
> >   * config/riscv/vector-crypto.md: New file. The machine descriptions for crypto vector.
> >---
> > gcc/config/riscv/iterators.md        |   4 +-
> > gcc/config/riscv/riscv.md            |  33 +-
> > gcc/config/riscv/vector-crypto.md    | 654 +++
> > gcc/config/riscv/vector-iterators.md |  36 ++
> > gcc/config/riscv/vector.md           |  55 ++-
> > 5 files changed, 761 insertions(+), 21 deletions(-)
> > create mode 100755 gcc/config/riscv/vector-crypto.md
> >
> >diff --git a/gcc/config/riscv/iterators.md b/gcc/config/riscv/iterators.md
> >index ecf033f2fa7..f332fba7031 100644
> >--- a/gcc/config/riscv/iterators.md
> >+++ b/gcc/config/riscv/iterators.md
> >@@ -304,7 +304,9 @@
> >(umax "maxu")
> >(clz "clz")
> >(ctz "ctz")
> >-   (popcount "cpop")])
> >+   (popcount "cpop")
> >+   (rotate "rol")
> >+   (rotatert "ror")])
> >
> > ;; ---
> > ;; Int Iterators.
> >diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
> >index ee8b71c22aa..88019a46a53 100644
> >--- a/gcc/config/riscv/riscv.md
> >+++ b/gcc/config/riscv/riscv.md
> >@@ -427,6 +427,34 @@
> > ;; vcompress vector compress instruction
> > ;; vmov whole vector register move
> > ;; vector   unknown vector instruction
> >+;; 17. Crypto Vector instructions
> >+;; vandn    crypto vector bitwise and-not instructions
> >+;; vbrev    crypto vector reverse bits in elements instructions
> >+;; vbrev8   crypto vector reverse bits in bytes instructions
> >+;; vrev8    crypto vector reverse bytes instructions
> >+;; vclz     crypto vector count leading Zeros instructions
> >+;; vctz     crypto vector count lrailing Zeros instructions
> >+;; vrol     crypto vector rotate left instructions
> >+;; vror     crypto vector rotate right instructions
> >+;; vwsll    crypto vector widening shift left logical instructions
> >+;; vclmul   crypto vector carry-less multiply - return low half instructions
> >+;; vclmulh  crypto vector carry-less multiply - return high half instructions
> >+;; vghsh    crypto vector add-multiply over GHASH Galois-Field instructions
> >+;; vgmul    crypto vector multiply over GHASH Galois-Field instrumctions
> >+;; vaesef   crypto vector AES final-round encryption instructions
> >+;; vaesem   crypto vector AES middle-round encryption instructions
> >+;; vaesdf   crypto vector AES final-round decryption instructions
> >+;; vaesdm   crypto vector AES middle-round decryption instructions
> >+;; vaeskf1  crypto vector AES-128 Forward KeySchedule generation instructions
> >+;; vaeskf2  crypto vector AES-256 Forward KeySchedule generation instructions
> >+;; vaesz    crypto vector AES round zero encryption/decryption instructions
> >+;; vsha2ms  crypto vector SHA-2 message schedule instructions
> >+;; vsha2ch

Re: [PATCH v3 2/6] RISC-V: Split csr_operand in predicates.md for vector patterns.

2023-12-26 Thread joshua
Hi Jeff,

Yes, I will change something in vector_csr_operand in the following
patches.

Constraints will be added so that the AVL cannot be encoded as an
immediate for the xtheadvector vsetvl.

Joshua







--
From: Jeff Law 
Sent: Thursday, December 21, 2023, 02:16
To: "Jun Sha (Joshua)"; "gcc-patches"
Cc: "jim.wilson.gcc"; palmer; andrew; "philipp.tomsich"; 
"christoph.muellner"; "juzhe.zhong"; Jin Ma; Xianmiao Qu
Subject: Re: [PATCH v3 2/6] RISC-V: Split csr_operand in predicates.md for vector 
patterns.




On 12/20/23 05:27, Jun Sha (Joshua) wrote:
> This patch splits the definition of csr_operand in predicates.md.
> The newly defined vector_csr_operand has the same functionality
> as csr_operand but can only be used in vector patterns, so that
> changes for vector will not affect scalar patterns in files
> like riscv.md.
> 
> gcc/ChangeLog:
> 
>  * config/riscv/predicates.md (vector_csr_operand):
>  Define vector_csr_opeand for vector.
>  * config/riscv/vector.md:
>  Use newly defined csr_operand for vector.
So do you envision changing something in vector_csr_operand?  If not, 
then this doesn't make much sense.

Jeff

Re: Ping: [PATCH] LoongArch: Replace -mexplicit-relocs=auto simple-used address peephole2 with combine

2023-12-26 Thread chenglulu


On 2023/12/23 at 6:44 PM, Xi Ruoyao wrote:

On Sat, 2023-12-23 at 10:29 +0800, chenglulu wrote:

The performance drop has nothing to do with this patch. I found that the h264 
performance compiled
by r14-6787 compared to r14-6421 dropped by 6.4%.

Then I guess we should create a bug report...

The h264 score with r14-6818 is the same as with r14-6421.



  But there is a problem. My regression test run (based on r14-6787) has the 
following two failures:
+FAIL: gcc.dg/cpp/_Pragma3.c (test for excess errors)
+FAIL: gcc.dg/pr86617.c scan-rtl-dump-times final "mem/v" 6

Strange.  I didn't see them on r14-6650 (with or without the patch).



+FAIL: gcc.dg/pr86617.c scan-rtl-dump-times final "mem/v" 6

In r14-6818 the issue persists. I traced through the code and found that the 
problem looks like this:
  volatile unsigned char u8;

  void test (void)
  {
    u8 = u8 + u8;
    u8 = u8 - u8;
  }

$./gcc/cc1 test.c -o test.s -fdump-rtl-all-all -fdiagnostics-plain-output  -Os 
-fdump-rtl-final -ffat-lto-objects

test.c.301r.outof_cfglayout

 (insn 7 6 9 2 (set (reg:DI 80 [ u8.0_1 ])
(zero_extend:DI*(mem/v/c*:QI (symbol_ref:DI ("*.LANCHOR0") [flags 0x182]) [0 
u8D.2193+0 S1 A8]))) "volatile.c":5:11 459 {simple_load_uextdiqidi}
 (nil))

test.c.302r.split1

(insn 27 6 28 2 (set (reg:DI 98)
(unspec:DI [
(symbol_ref:DI ("*.LANCHOR0") [flags 0x182])
] UNSPEC_PCALAU12I_GR)) "volatile.c":5:11 -1
 (nil))
(insn 28 27 9 2 (set (reg:DI 80 [ u8.0_1 ])
(zero_extend:DI*(mem:*QI (lo_sum:DI (reg:DI 98)
(symbol_ref:DI ("*.LANCHOR0") [flags 0x182])) [0  S1 A8]))) 
"volatile.c":5:11 -1
 (nil))

The volatile flag (/v) on the mem is gone in the split1 dump, so the test fails.



Re: [PATCH] LoongArch: Fix infinite secondary reloading of FCCmode [PR113148]

2023-12-26 Thread chenglulu



On 2023/12/27 at 6:37 AM, Xi Ruoyao wrote:

The GCC internal doc says:

  X might be a pseudo-register or a 'subreg' of a pseudo-register,
  which could either be in a hard register or in memory.  Use
  'true_regnum' to find out; it will return -1 if the pseudo is in
  memory and the hard register number if it is in a register.

So "MEM_P (x)" is not enough for checking whether we are reloading from/to
memory.  Since r14-6814, this bug has caused the reload pass to stall and
finally ICE, complaining with "maximum number of generated reload insns per
insn achieved".

Check if "true_regnum (x)" is -1 besides "MEM_P (x)" to fix the issue.
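
The condition described above can be sketched in standalone C. This is an illustration only, not the GCC implementation: the enum and function names are hypothetical stand-ins, and `regno` models the result of `true_regnum (x)`, where -1 means the pseudo lives in memory:

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the two register classes involved.  */
enum reg_class_sketch { SKETCH_NO_REGS, SKETCH_GR_REGS };

/* Model of the fixed check: a general register is needed as the
   intermediate when the target class is within FP_REGS and X is in
   memory, either as an explicit MEM or as a spilled pseudo
   (regno == -1).  */
static enum reg_class_sketch
fp_secondary_reload_sketch (bool rclass_in_fp_regs, bool x_is_mem, int regno)
{
  if (rclass_in_fp_regs && (regno == -1 || x_is_mem))
    return SKETCH_GR_REGS;
  return SKETCH_NO_REGS;
}
```

The case the old `MEM_P (x)`-only test missed is the one where `x_is_mem` is false but `regno` is -1.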

gcc/ChangeLog:

PR target/113148
* config/loongarch/loongarch.cc (loongarch_secondary_reload):
Check if regno == -1 besides MEM_P (x) for reloading FCCmode
from/to FPR to/from memory.

gcc/testsuite/ChangeLog:

PR target/113148
* gcc.target/loongarch/pr113148.c: New test.
---

Bootstrapped & regtested on loongarch64-linux-gnu.  Ok for trunk?


LGTM!

Thanks!



  gcc/config/loongarch/loongarch.cc |  3 +-
  gcc/testsuite/gcc.target/loongarch/pr113148.c | 44 +++
  2 files changed, 46 insertions(+), 1 deletion(-)
  create mode 100644 gcc/testsuite/gcc.target/loongarch/pr113148.c

diff --git a/gcc/config/loongarch/loongarch.cc 
b/gcc/config/loongarch/loongarch.cc
index 5ffd06ce9be..c0a0af3dda5 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -6951,7 +6951,8 @@ loongarch_secondary_reload (bool in_p ATTRIBUTE_UNUSED, 
rtx x,
  return NO_REGS;
}
  
-  if (reg_class_subset_p (rclass, FP_REGS) && MEM_P (x))

+  if (reg_class_subset_p (rclass, FP_REGS)
+ && (regno == -1 || MEM_P (x)))
return GR_REGS;
  
return NO_REGS;

diff --git a/gcc/testsuite/gcc.target/loongarch/pr113148.c 
b/gcc/testsuite/gcc.target/loongarch/pr113148.c
new file mode 100644
index 000..cf48e552053
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/pr113148.c
@@ -0,0 +1,44 @@
+/* PR 113148: ICE caused by infinite reloading */
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=la464 -mfpu=64 -mabi=lp64d" } */
+
+struct bound
+{
+  double max;
+} drawQuadrant_bound;
+double w4, innerXfromXY_y, computeBound_right_0;
+struct arc_def
+{
+  double w, h;
+  double a0, a1;
+};
+static void drawQuadrant (struct arc_def *);
+static void
+computeBound (struct arc_def *def, struct bound *bound)
+{
+  double ellipsex_1, ellipsex_0;
+  bound->max = def->a1 ?: __builtin_sin (w4) * def->h;
+  if (def->a0 == 5 && def->w == def->h)
+;
+  else
+ellipsex_0 = def->a0 == 0.0 ?: __builtin_cos (w4);
+  if (def->a1 == 5 && def->w == def->h)
+ellipsex_1 = bound->max;
+  __builtin_sqrt (ellipsex_1 * innerXfromXY_y * innerXfromXY_y * w4);
+  computeBound_right_0 = ellipsex_0;
+}
+void
+drawArc ()
+{
+  struct arc_def foo;
+  for (;;)
+drawQuadrant (&foo);
+}
+void
+drawQuadrant (struct arc_def *def)
+{
+  int y, miny;
+  computeBound (def, &drawQuadrant_bound);
+  while (y >= miny)
+;
+}




Re: [PATCH v2] LoongArch: Expand left rotate to right rotate with negated amount

2023-12-26 Thread chenglulu

LGTM!

Thanks!

On 2023/12/24 at 8:33 PM, Xi Ruoyao wrote:

gcc/ChangeLog:

* config/loongarch/loongarch.md (rotl3):
New define_expand.
* config/loongarch/simd.md (vrotl3): Likewise.
(rotl3): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/rotl-with-rotr.c: New test.
* gcc.target/loongarch/rotl-with-vrotr-b.c: New test.
* gcc.target/loongarch/rotl-with-vrotr-h.c: New test.
* gcc.target/loongarch/rotl-with-vrotr-w.c: New test.
* gcc.target/loongarch/rotl-with-vrotr-d.c: New test.
* gcc.target/loongarch/rotl-with-xvrotr-b.c: New test.
* gcc.target/loongarch/rotl-with-xvrotr-h.c: New test.
* gcc.target/loongarch/rotl-with-xvrotr-w.c: New test.
* gcc.target/loongarch/rotl-with-xvrotr-d.c: New test.
---

Change from [v1]:
- Wrap the negated wrapping amount with subreg: for
   rotl, to avoid an ICE left rotating QI and HI vectors.
- Add tests for QI, HI, and DI vectors.

[v1]:https://gcc.gnu.org/pipermail/gcc-patches/2023-December/640872.html

Bootstrapped and regtested on loongarch64-linux-gnu.  Ok for trunk?
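
The identity behind this patch (a left rotate equals a right rotate by the negated amount, modulo the element width) can be checked with a small standalone C sketch. The function names are illustrative and do not come from the patch; 32-bit elements are assumed:

```c
#include <stdint.h>

/* Right rotate of a 32-bit value; the amount is reduced modulo 32 so
   that no shift count reaches the type width.  */
static uint32_t
rotr32 (uint32_t x, uint32_t n)
{
  n &= 31;
  return (x >> n) | (x << ((32 - n) & 31));
}

/* Left rotate expressed as a right rotate by the negated amount.
   Unsigned negation wraps, and rotr32 masks the result to 0..31.  */
static uint32_t
rotl32_via_rotr (uint32_t x, uint32_t n)
{
  return rotr32 (x, 0u - n);
}
```

For the vector case with a scalar amount, the same identity lets the negation happen once on the scalar before it is broadcast, which is the ordering the simd.md comment in the patch describes.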

  gcc/config/loongarch/loongarch.md | 12 
  gcc/config/loongarch/simd.md  | 29 +++
  .../gcc.target/loongarch/rotl-with-rotr.c |  9 ++
  .../gcc.target/loongarch/rotl-with-vrotr-b.c  |  7 +
  .../gcc.target/loongarch/rotl-with-vrotr-d.c  |  7 +
  .../gcc.target/loongarch/rotl-with-vrotr-h.c  |  7 +
  .../gcc.target/loongarch/rotl-with-vrotr-w.c  | 28 ++
  .../gcc.target/loongarch/rotl-with-xvrotr-b.c |  7 +
  .../gcc.target/loongarch/rotl-with-xvrotr-d.c |  7 +
  .../gcc.target/loongarch/rotl-with-xvrotr-h.c |  7 +
  .../gcc.target/loongarch/rotl-with-xvrotr-w.c |  7 +
  11 files changed, 127 insertions(+)
  create mode 100644 gcc/testsuite/gcc.target/loongarch/rotl-with-rotr.c
  create mode 100644 gcc/testsuite/gcc.target/loongarch/rotl-with-vrotr-b.c
  create mode 100644 gcc/testsuite/gcc.target/loongarch/rotl-with-vrotr-d.c
  create mode 100644 gcc/testsuite/gcc.target/loongarch/rotl-with-vrotr-h.c
  create mode 100644 gcc/testsuite/gcc.target/loongarch/rotl-with-vrotr-w.c
  create mode 100644 gcc/testsuite/gcc.target/loongarch/rotl-with-xvrotr-b.c
  create mode 100644 gcc/testsuite/gcc.target/loongarch/rotl-with-xvrotr-d.c
  create mode 100644 gcc/testsuite/gcc.target/loongarch/rotl-with-xvrotr-h.c
  create mode 100644 gcc/testsuite/gcc.target/loongarch/rotl-with-xvrotr-w.c

diff --git a/gcc/config/loongarch/loongarch.md 
b/gcc/config/loongarch/loongarch.md
index 30025bf1908..939432b83e0 100644
--- a/gcc/config/loongarch/loongarch.md
+++ b/gcc/config/loongarch/loongarch.md
@@ -2903,6 +2903,18 @@ (define_insn "rotrsi3_extend"
[(set_attr "type" "shift,shift")
 (set_attr "mode" "SI")])
  
+;; Expand left rotate to right rotate.

+(define_expand "rotl3"
+  [(set (match_dup 3)
+   (neg:SI (match_operand:SI 2 "register_operand")))
+   (set (match_operand:GPR 0 "register_operand")
+   (rotatert:GPR (match_operand:GPR 1 "register_operand")
+ (match_dup 3)))]
+  ""
+  {
+operands[3] = gen_reg_rtx (SImode);
+  });
+
  ;; The following templates were added to generate "bstrpick.d + alsl.d"
  ;; instruction pairs.
  ;; It is required that the values of const_immalsl_operand and
diff --git a/gcc/config/loongarch/simd.md b/gcc/config/loongarch/simd.md
index 13202f79bee..93fb39abcf5 100644
--- a/gcc/config/loongarch/simd.md
+++ b/gcc/config/loongarch/simd.md
@@ -268,6 +268,35 @@ (define_insn "vrotr3"
[(set_attr "type" "simd_int_arith")
 (set_attr "mode" "")])
  
+;; Expand left rotate to right rotate.

+(define_expand "vrotl3"
+  [(set (match_dup 3)
+   (neg:IVEC (match_operand:IVEC 2 "register_operand")))
+   (set (match_operand:IVEC 0 "register_operand")
+   (rotatert:IVEC (match_operand:IVEC 1 "register_operand")
+  (match_dup 3)))]
+  ""
+  {
+operands[3] = gen_reg_rtx (mode);
+  });
+
+;; Expand left rotate with a scalar amount to right rotate: negate the
+;; scalar before broadcasting it because scalar negation is cheaper than
+;; vector negation.
+(define_expand "rotl3"
+  [(set (match_dup 3)
+   (neg:SI (match_operand:SI 2 "register_operand")))
+   (set (match_dup 4)
+   (vec_duplicate:IVEC (subreg: (match_dup 3) 0)))
+   (set (match_operand:IVEC 0 "register_operand")
+   (rotatert:IVEC (match_operand:IVEC 1 "register_operand")
+  (match_dup 4)))]
+  ""
+  {
+operands[3] = gen_reg_rtx (SImode);
+operands[4] = gen_reg_rtx (mode);
+  });
+
  ;; vrotri.{b/h/w/d}
  
  (define_insn "rotr3"

diff --git a/gcc/testsuite/gcc.target/loongarch/rotl-with-rotr.c 
b/gcc/testsuite/gcc.target/loongarch/rotl-with-rotr.c
new file mode 100644
index 000..84cc53cecaf
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/rotl-with-rotr.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options

Re: [pushed][PATCH v1] LoongArch: Fix ICE when passing two same vector argument consecutively

2023-12-26 Thread chenglulu

Pushed to r14-6849.

On 2023/12/22 at 4:18 PM, Chenghui Pan wrote:

The following code causes an ICE on the LoongArch target:

   #include 

   extern void bar (__m128i, __m128i);

   __m128i a;

   void
   foo ()
   {
 bar (a, a);
   }

It is caused by a missing constraint definition in mov_lsx. This
patch fixes the template and removes the unnecessary processing from
the loongarch_split_move () function.

This patch also cleans up the redundant definitions from
loongarch_split_move () and loongarch_split_move_p ().

gcc/ChangeLog:

* config/loongarch/lasx.md: Use loongarch_split_move and
  loongarch_split_move_p directly.
* config/loongarch/loongarch-protos.h
(loongarch_split_move): Remove unnecessary argument.
(loongarch_split_move_insn_p): Delete.
(loongarch_split_move_insn): Delete.
* config/loongarch/loongarch.cc
(loongarch_split_move_insn_p): Delete.
(loongarch_load_store_insns): Use loongarch_split_move_p
  directly.
(loongarch_split_move): remove the unnecessary processing.
(loongarch_split_move_insn): Delete.
* config/loongarch/lsx.md: Use loongarch_split_move and
  loongarch_split_move_p directly.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/vector/lsx/lsx-mov-1.c: New test.
---
  gcc/config/loongarch/lasx.md  |  4 +-
  gcc/config/loongarch/loongarch-protos.h   |  4 +-
  gcc/config/loongarch/loongarch.cc | 49 +--
  gcc/config/loongarch/lsx.md   | 10 ++--
  .../loongarch/vector/lsx/lsx-mov-1.c  | 14 ++
  5 files changed, 24 insertions(+), 57 deletions(-)
  create mode 100644 gcc/testsuite/gcc.target/loongarch/vector/lsx/lsx-mov-1.c

diff --git a/gcc/config/loongarch/lasx.md b/gcc/config/loongarch/lasx.md
index eeac8cd984b..6418ff52fe5 100644
--- a/gcc/config/loongarch/lasx.md
+++ b/gcc/config/loongarch/lasx.md
@@ -912,10 +912,10 @@ (define_split
[(set (match_operand:LASX 0 "nonimmediate_operand")
(match_operand:LASX 1 "move_operand"))]
"reload_completed && ISA_HAS_LASX
-   && loongarch_split_move_insn_p (operands[0], operands[1])"
+   && loongarch_split_move_p (operands[0], operands[1])"
[(const_int 0)]
  {
-  loongarch_split_move_insn (operands[0], operands[1], curr_insn);
+  loongarch_split_move (operands[0], operands[1]);
DONE;
  })
  
diff --git a/gcc/config/loongarch/loongarch-protos.h b/gcc/config/loongarch/loongarch-protos.h

index c66ab932d67..7bf21a45c69 100644
--- a/gcc/config/loongarch/loongarch-protos.h
+++ b/gcc/config/loongarch/loongarch-protos.h
@@ -82,11 +82,9 @@ extern rtx loongarch_legitimize_call_address (rtx);
  
  extern rtx loongarch_subword (rtx, bool);

  extern bool loongarch_split_move_p (rtx, rtx);
-extern void loongarch_split_move (rtx, rtx, rtx);
+extern void loongarch_split_move (rtx, rtx);
  extern bool loongarch_addu16i_imm12_operand_p (HOST_WIDE_INT, machine_mode);
  extern void loongarch_split_plus_constant (rtx *, machine_mode);
-extern bool loongarch_split_move_insn_p (rtx, rtx);
-extern void loongarch_split_move_insn (rtx, rtx, rtx);
  extern void loongarch_split_128bit_move (rtx, rtx);
  extern bool loongarch_split_128bit_move_p (rtx, rtx);
  extern void loongarch_split_256bit_move (rtx, rtx);
diff --git a/gcc/config/loongarch/loongarch.cc 
b/gcc/config/loongarch/loongarch.cc
index 390e3206a17..98709123770 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -2562,7 +2562,6 @@ loongarch_split_const_insns (rtx x)
return low + high;
  }
  
-bool loongarch_split_move_insn_p (rtx dest, rtx src);

  /* Return one word of 128-bit value OP, taking into account the fixed
 endianness of certain registers.  BYTE selects from the byte address.  */
  
@@ -2602,7 +2601,7 @@ loongarch_load_store_insns (rtx mem, rtx_insn *insn)

  {
set = single_set (insn);
if (set
- && !loongarch_split_move_insn_p (SET_DEST (set), SET_SRC (set)))
+ && !loongarch_split_move_p (SET_DEST (set), SET_SRC (set)))
might_split_p = false;
  }
  
@@ -4220,7 +4219,7 @@ loongarch_split_move_p (rtx dest, rtx src)

 SPLIT_TYPE describes the split condition.  */
  
  void

-loongarch_split_move (rtx dest, rtx src, rtx insn_)
+loongarch_split_move (rtx dest, rtx src)
  {
rtx low_dest;
  
@@ -4258,33 +4257,6 @@ loongarch_split_move (rtx dest, rtx src, rtx insn_)

   loongarch_subword (src, true));
}
  }
-
-  /* This is a hack.  See if the next insn uses DEST and if so, see if we
- can forward SRC for DEST.  This is most useful if the next insn is a
- simple store.  */
-  rtx_insn *insn = (rtx_insn *) insn_;
-  struct loongarch_address_info addr = {};
-  if (insn)
-{
-  rtx_insn *next = next_nonnote_nondebug_insn_bb (insn);
-  if (next)
-   {
- rtx set = single_set (next);
- if (set && SET_SRC (set) == dest)
-   {
- 

Re: [pushed ][PATCH v1] LoongArch: Fix insn output of vec_concat templates for LASX.

2023-12-26 Thread chenglulu

Pushed to r14-6848.

On 2023/12/22 at 4:22 PM, Chenghui Pan wrote:

When investigating the failure of gcc.dg/vect/slp-reduc-sad.c, we found that
the following instruction block is generated by vec_concatv32qi (which is
generated by vec_initv32qiv16qi) at the entrance of the foo() function:

   vldx        $vr3,$r5,$r6
   vld         $vr2,$r5,0
   xvpermi.q   $xr2,$xr3,0x20

This reverses the high and low 128-bit parts of the
vec_initv32qiv16qi operation.

According to the similar implementations in other targets and the LSX
implementation of the following RTL representation, the current definition
of "vec_concat" in lasx.md is wrong:

   (set (op0) (vec_concat (op1) (op2)))

For correct behavior, the last argument of xvpermi.q should be 0x02
instead of 0x20. This patch fixes this issue and cleans up the vec_concat
template implementation.
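
To see why the immediate matters, here is a rough C model of the selection. It rests on an assumption about xvpermi.q that is not spelled out in this thread: bits [1:0] of the immediate choose the low 128-bit half of the result and bits [5:4] choose the high half, with selector values 0/1 meaning the low/high half of the source register and 2/3 meaning the low/high half of the destination register. All names are hypothetical, and each 128-bit half is reduced to a single tag value:

```c
#include <stdint.h>

/* half[0] tags the low 128 bits, half[1] the high 128 bits.  */
typedef struct { uint64_t half[2]; } vec256_sketch;

/* Assumed selector encoding: bit 1 picks the register (destination if
   set, source otherwise), bit 0 picks its low or high half.  */
static uint64_t
pick_half (vec256_sketch xd, vec256_sketch xj, unsigned sel)
{
  const vec256_sketch *src = (sel & 2) ? &xd : &xj;
  return src->half[sel & 1];
}

/* Model of "xvpermi.q xd, xj, imm" under the assumption above.  */
static vec256_sketch
xvpermi_q_sketch (vec256_sketch xd, vec256_sketch xj, unsigned imm)
{
  vec256_sketch r;
  r.half[0] = pick_half (xd, xj, imm & 3);
  r.half[1] = pick_half (xd, xj, (imm >> 4) & 3);
  return r;
}

/* vec_concat (op1, op2) wants op1 in the low half and op2 in the high
   half.  op1 is tied to the destination (constraint "0"), op2 is the
   source.  Returns 1 when IMM yields that layout.  */
static int
concat_matches (unsigned imm)
{
  vec256_sketch op1 = { { 1, 0 } };  /* tag 1 in the low half */
  vec256_sketch op2 = { { 2, 0 } };  /* tag 2 in the low half */
  vec256_sketch r = xvpermi_q_sketch (op1, op2, imm);
  return r.half[0] == 1 && r.half[1] == 2;
}
```

Under this model, 0x02 gives the vec_concat layout while 0x20 swaps the two halves, which matches the reversal described above.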

gcc/ChangeLog:

* config/loongarch/lasx.md (vec_concatv4di): Delete.
(vec_concatv8si): Delete.
(vec_concatv16hi): Delete.
(vec_concatv32qi): Delete.
(vec_concatv4df): Delete.
(vec_concatv8sf): Delete.
(vec_concat): New template with insn output fixed.
---
  gcc/config/loongarch/lasx.md | 74 
  1 file changed, 7 insertions(+), 67 deletions(-)

diff --git a/gcc/config/loongarch/lasx.md b/gcc/config/loongarch/lasx.md
index eeac8cd984b..a9d948bb606 100644
--- a/gcc/config/loongarch/lasx.md
+++ b/gcc/config/loongarch/lasx.md
@@ -590,77 +590,17 @@ (define_insn "lasx_xvinsgr2vr_"
[(set_attr "type" "simd_insert")
 (set_attr "mode" "")])
  
-(define_insn "vec_concatv4di"

-  [(set (match_operand:V4DI 0 "register_operand" "=f")
-   (vec_concat:V4DI
- (match_operand:V2DI 1 "register_operand" "0")
- (match_operand:V2DI 2 "register_operand" "f")))]
-  "ISA_HAS_LASX"
-{
-  return "xvpermi.q\t%u0,%u2,0x20";
-}
-  [(set_attr "type" "simd_splat")
-   (set_attr "mode" "V4DI")])
-
-(define_insn "vec_concatv8si"
-  [(set (match_operand:V8SI 0 "register_operand" "=f")
-   (vec_concat:V8SI
- (match_operand:V4SI 1 "register_operand" "0")
- (match_operand:V4SI 2 "register_operand" "f")))]
-  "ISA_HAS_LASX"
-{
-  return "xvpermi.q\t%u0,%u2,0x20";
-}
-  [(set_attr "type" "simd_splat")
-   (set_attr "mode" "V4DI")])
-
-(define_insn "vec_concatv16hi"
-  [(set (match_operand:V16HI 0 "register_operand" "=f")
-   (vec_concat:V16HI
- (match_operand:V8HI 1 "register_operand" "0")
- (match_operand:V8HI 2 "register_operand" "f")))]
-  "ISA_HAS_LASX"
-{
-  return "xvpermi.q\t%u0,%u2,0x20";
-}
-  [(set_attr "type" "simd_splat")
-   (set_attr "mode" "V4DI")])
-
-(define_insn "vec_concatv32qi"
-  [(set (match_operand:V32QI 0 "register_operand" "=f")
-   (vec_concat:V32QI
- (match_operand:V16QI 1 "register_operand" "0")
- (match_operand:V16QI 2 "register_operand" "f")))]
-  "ISA_HAS_LASX"
-{
-  return "xvpermi.q\t%u0,%u2,0x20";
-}
-  [(set_attr "type" "simd_splat")
-   (set_attr "mode" "V4DI")])
-
-(define_insn "vec_concatv4df"
-  [(set (match_operand:V4DF 0 "register_operand" "=f")
-   (vec_concat:V4DF
- (match_operand:V2DF 1 "register_operand" "0")
- (match_operand:V2DF 2 "register_operand" "f")))]
-  "ISA_HAS_LASX"
-{
-  return "xvpermi.q\t%u0,%u2,0x20";
-}
-  [(set_attr "type" "simd_splat")
-   (set_attr "mode" "V4DF")])
-
-(define_insn "vec_concatv8sf"
-  [(set (match_operand:V8SF 0 "register_operand" "=f")
-   (vec_concat:V8SF
- (match_operand:V4SF 1 "register_operand" "0")
- (match_operand:V4SF 2 "register_operand" "f")))]
+(define_insn "vec_concat"
+  [(set (match_operand:LASX 0 "register_operand" "=f")
+   (vec_concat:LASX
+ (match_operand: 1 "register_operand" "0")
+ (match_operand: 2 "register_operand" "f")))]
"ISA_HAS_LASX"
  {
-  return "xvpermi.q\t%u0,%u2,0x20";
+  return "xvpermi.q\t%u0,%u2,0x02";
  }
[(set_attr "type" "simd_splat")
-   (set_attr "mode" "V4DI")])
+   (set_attr "mode" "")])
  
  ;; xshuf.w

  (define_insn "lasx_xvperm_"




Re: [pushed][PATCH v1] LoongArch: Fixed bug in *bstrins__for_ior_mask template.

2023-12-26 Thread chenglulu

Pushed to r14-6847.

On 2023/12/25 at 11:20 AM, Li Wei wrote:

We found that a recently built GCC causes a miscompare error when running
the SPEC2006 400.perlbench test with -flto turned on.  After testing, we
found that only the LoongArch architecture reports errors.
git bisect located the first bad commit as r14-3773-g5b857e87201335.
Debugging showed that the split condition of the *bstrins__for_ior_mask
template was empty, when it should actually be consistent with the insn
condition.

gcc/ChangeLog:

* config/loongarch/loongarch.md: Adjust.
---
  gcc/config/loongarch/loongarch.md | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/loongarch/loongarch.md 
b/gcc/config/loongarch/loongarch.md
index 7021105b241..2b0609f2f31 100644
--- a/gcc/config/loongarch/loongarch.md
+++ b/gcc/config/loongarch/loongarch.md
@@ -1489,7 +1489,7 @@ (define_insn_and_split "*bstrins__for_ior_mask"
"loongarch_pre_reload_split () && \
 loongarch_use_bstrins_for_ior_with_mask (mode, operands)"
"#"
-  ""
+  "&& true"
[(set (match_dup 0) (match_dup 1))
 (set (zero_extract:GPR (match_dup 0) (match_dup 2) (match_dup 4))
(match_dup 3))]