[Committed] RISC-V: Some minor tweaks to dynamic LMUL cost model
Tweak some code in the dynamic LMUL cost model to make the computation more predictable and accurate.

Tested on both RV32 and RV64 with no regressions.

Committed.

	PR target/113112

gcc/ChangeLog:

	* config/riscv/riscv-vector-costs.cc (compute_estimated_lmul): Tweak LMUL estimation.
	(has_unexpected_spills_p): Ditto.
	(costs::record_potential_unexpected_spills): Ditto.

gcc/testsuite/ChangeLog:

	* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-1.c: Add more checks.
	* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-2.c: Ditto.
	* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-3.c: Ditto.
	* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-4.c: Ditto.
	* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-5.c: Ditto.
	* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-6.c: Ditto.
	* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-7.c: Ditto.
	* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-1.c: Ditto.
	* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-2.c: Ditto.
	* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-3.c: Ditto.
	* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-4.c: Ditto.
	* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-5.c: Ditto.
	* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-1.c: Ditto.
	* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-2.c: Ditto.
	* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-3.c: Ditto.
	* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-5.c: Ditto.
	* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-6.c: Ditto.
	* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-7.c: Ditto.
	* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-8.c: Ditto.
	* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-1.c: Ditto.
	* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-10.c: Ditto.
	* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-11.c: Ditto.
	* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-2.c: Ditto.
	* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-3.c: Ditto.
	* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-4.c: Ditto.
	* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-5.c: Ditto.
	* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-6.c: Ditto.
	* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-7.c: Ditto.
	* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-8.c: Ditto.
	* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-9.c: Ditto.
	* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-12.c: New test.
	* gcc.dg/vect/costmodel/riscv/rvv/pr113112-2.c: New test.

---
 gcc/config/riscv/riscv-vector-costs.cc     | 42 +--
 .../costmodel/riscv/rvv/dynamic-lmul1-1.c  |  3 ++
 .../costmodel/riscv/rvv/dynamic-lmul1-2.c  |  3 ++
 .../costmodel/riscv/rvv/dynamic-lmul1-3.c  |  3 ++
 .../costmodel/riscv/rvv/dynamic-lmul1-4.c  |  3 ++
 .../costmodel/riscv/rvv/dynamic-lmul1-5.c  |  3 ++
 .../costmodel/riscv/rvv/dynamic-lmul1-6.c  |  3 ++
 .../costmodel/riscv/rvv/dynamic-lmul1-7.c  |  3 ++
 .../costmodel/riscv/rvv/dynamic-lmul2-1.c  |  3 ++
 .../costmodel/riscv/rvv/dynamic-lmul2-2.c  |  3 ++
 .../costmodel/riscv/rvv/dynamic-lmul2-3.c  |  3 ++
 .../costmodel/riscv/rvv/dynamic-lmul2-4.c  |  3 ++
 .../costmodel/riscv/rvv/dynamic-lmul2-5.c  |  3 ++
 .../costmodel/riscv/rvv/dynamic-lmul4-1.c  |  3 ++
 .../costmodel/riscv/rvv/dynamic-lmul4-2.c  |  3 ++
 .../costmodel/riscv/rvv/dynamic-lmul4-3.c  |  3 ++
 .../costmodel/riscv/rvv/dynamic-lmul4-5.c  |  3 ++
 .../costmodel/riscv/rvv/dynamic-lmul4-6.c  |  5 ++-
 .../costmodel/riscv/rvv/dynamic-lmul4-7.c  |  3 ++
 .../costmodel/riscv/rvv/dynamic-lmul4-8.c  |  5 ++-
 .../costmodel/riscv/rvv/dynamic-lmul8-1.c  |  3 ++
 .../costmodel/riscv/rvv/dynamic-lmul8-10.c |  3 ++
 .../costmodel/riscv/rvv/dynamic-lmul8-11.c |  3 ++
 .../costmodel/riscv/rvv/dynamic-lmul8-12.c | 25 +++
 .../costmodel/riscv/rvv/dynamic-lmul8-2.c  |  3 ++
 .../costmodel/riscv/rvv/dynamic-lmul8-3.c  |  3 ++
 .../costmodel/riscv/rvv/dynamic-lmul8-4.c  |  3 ++
 .../costmodel/riscv/rvv/dynamic-lmul8-5.c  |  3 ++
 .../costmodel/riscv/rvv/dynamic-lmul8-6.c  |  3 ++
 .../costmodel/riscv/rvv/dynamic-lmul8-7.c  |  3 ++
 .../costmodel/riscv/rvv/dynamic-lmul8-8.c  |  3 ++
 .../costmodel/riscv/rvv/dynamic-lmul8-9.c  |  3 ++
 .../vect/costmodel/riscv/rvv/pr113112-2.c  | 20 +
 33 files changed, 166 insertions(+), 15 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-12.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/pr113112-2.c

diff --git a/gcc/config/riscv/riscv-vector-costs.cc b/gcc/config/riscv/riscv-vector-costs.cc
index 7b837b08f9e..74b8e86a5e1 100644
--- a/gcc/config/riscv/riscv-vector-costs.cc
+++ b/gcc/config/riscv/riscv-vector-costs.cc
@@ -394,21 +394,32 @@ compute_estimated_lmul (loop_vec_info loop_vinfo, machine_mode mode)
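The commit tunes how the cost model estimates LMUL and detects unexpected spills. The sketch below illustrates the trade-off that compute_estimated_lmul and has_unexpected_spills_p reason about. It is a hypothetical C++ model, not GCC's code.

The sketch assumes the only input is the peak number of simultaneously live vector values, counted at LMUL = 1. That is a deliberate simplification of whatever the real model counts. RVV has 32 vector registers, and a value at LMUL = k occupies a group of k of them. The sketch therefore picks the largest LMUL whose peak register demand still fits.

```cpp
#include <array>
#include <cassert>

// Hypothetical, simplified model of dynamic LMUL selection (not GCC's
// compute_estimated_lmul). RVV has 32 architectural vector registers, and a
// value computed at LMUL = k occupies a group of k of them.
constexpr int kNumVectorRegs = 32;

// max_live_m1: peak number of simultaneously live vector values in the loop,
// counted as if every value fit in one register (LMUL = 1).
int estimate_lmul(int max_live_m1) {
  assert(max_live_m1 >= 0);
  const std::array<int, 4> candidates = {8, 4, 2, 1};
  for (int lmul : candidates) {
    // Largest LMUL whose peak register demand still fits: no spills expected.
    if (max_live_m1 * lmul <= kNumVectorRegs)
      return lmul;
  }
  // Even LMUL = 1 overflows the register file, so spills are unavoidable.
  return 1;
}
```

For example, 5 live values fit at LMUL = 4 (20 registers) but not at LMUL = 8 (40 registers). A cost model that prefers the smaller LMUL in that case is the behaviour the "Preferring smaller LMUL loop because it has unexpected spills" dump message refers to.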
[PATCH v2] RISC-V: XFAIL pr30957-1.c when loop vectorized with variable factor
From: Pan Li

This patch XFAILs the test case pr30957-1.c for RVV when the ELF is built with certain configurations (listed at the end of this log).

With these configurations, the loop is vectorized by vect_transform_loop with a variable factor. Such a loop does not benefit from unrolling/peeling, so loop->unroll is set to 1, and unroll_loops then does nothing for it.

aarch64_sve may have a similar issue. However, it initializes the constant `0.0 / -5.0` in the test file to `+0.0` before passing it to the function foo, so the execution test passes.

aarch64:
  movi v0.2s, #0x0
  stp x29, x30, [sp, #-16]!
  mov w0, #0xa
  mov x29, sp
  bl 400280 <== s0 is +0.0

Unfortunately, riscv initializes the constant `0.0 / -5.0` to `-0.0` and then passes it to the function foo, so the execution test fails.

riscv:
  flw fa0,388(gp) # 1299c <__SDATA_BEGIN__+0x4>
  addi sp,sp,-16
  li a0,10
  sd ra,8(sp)
  jal 101fc <== fa0 is -0.0

After this patch, when the target satisfies riscv_v and variable_vect_length, loops that RVV vectorizes with a variable factor are treated as XFAIL based on the tree dump. The configurations below have been validated as XFAIL for RV64.
* riscv-sim/-march=rv64gcv/-mabi=lp64d/-mcmodel=medlow/--param=riscv-autovec-preference=fixed-vlmax
* riscv-sim/-march=rv64gcv/-mabi=lp64d/-mcmodel=medlow/--param=riscv-autovec-lmul=m2/--param=riscv-autovec-preference=fixed-vlmax
* riscv-sim/-march=rv64gcv/-mabi=lp64d/-mcmodel=medlow/--param=riscv-autovec-lmul=m4/--param=riscv-autovec-preference=fixed-vlmax
* riscv-sim/-march=rv64gcv/-mabi=lp64d/-mcmodel=medlow/--param=riscv-autovec-lmul=m8/--param=riscv-autovec-preference=fixed-vlmax
* riscv-sim/-march=rv64gcv/-mabi=lp64d/-mcmodel=medlow/--param=riscv-autovec-lmul=dynamic/--param=riscv-autovec-preference=fixed-vlmax
* riscv-sim/-march=rv64gcv_zvl256b/-mabi=lp64d/-mcmodel=medlow/--param=riscv-autovec-preference=fixed-vlmax
* riscv-sim/-march=rv64gcv_zvl256b/-mabi=lp64d/-mcmodel=medlow/--param=riscv-autovec-lmul=m2/--param=riscv-autovec-preference=fixed-vlmax
* riscv-sim/-march=rv64gcv_zvl256b/-mabi=lp64d/-mcmodel=medlow/--param=riscv-autovec-lmul=m4/--param=riscv-autovec-preference=fixed-vlmax
* riscv-sim/-march=rv64gcv_zvl256b/-mabi=lp64d/-mcmodel=medlow/--param=riscv-autovec-lmul=m8/--param=riscv-autovec-preference=fixed-vlmax
* riscv-sim/-march=rv64gcv_zvl256b/-mabi=lp64d/-mcmodel=medlow/--param=riscv-autovec-lmul=dynamic/--param=riscv-autovec-preference=fixed-vlmax
* riscv-sim/-march=rv64gcv_zvl512b/-mabi=lp64d/-mcmodel=medlow/--param=riscv-autovec-preference=fixed-vlmax
* riscv-sim/-march=rv64gcv_zvl512b/-mabi=lp64d/-mcmodel=medlow/--param=riscv-autovec-lmul=m2/--param=riscv-autovec-preference=fixed-vlmax
* riscv-sim/-march=rv64gcv_zvl512b/-mabi=lp64d/-mcmodel=medlow/--param=riscv-autovec-lmul=m4/--param=riscv-autovec-preference=fixed-vlmax
* riscv-sim/-march=rv64gcv_zvl512b/-mabi=lp64d/-mcmodel=medlow/--param=riscv-autovec-lmul=m8/--param=riscv-autovec-preference=fixed-vlmax
* riscv-sim/-march=rv64gcv_zvl512b/-mabi=lp64d/-mcmodel=medlow/--param=riscv-autovec-lmul=dynamic/--param=riscv-autovec-preference=fixed-vlmax
* riscv-sim/-march=rv64gcv_zvl1024b/-mabi=lp64d/-mcmodel=medlow/--param=riscv-autovec-preference=fixed-vlmax
* riscv-sim/-march=rv64gcv_zvl1024b/-mabi=lp64d/-mcmodel=medlow/--param=riscv-autovec-lmul=m2/--param=riscv-autovec-preference=fixed-vlmax
* riscv-sim/-march=rv64gcv_zvl1024b/-mabi=lp64d/-mcmodel=medlow/--param=riscv-autovec-lmul=m4/--param=riscv-autovec-preference=fixed-vlmax
* riscv-sim/-march=rv64gcv_zvl1024b/-mabi=lp64d/-mcmodel=medlow/--param=riscv-autovec-lmul=m8/--param=riscv-autovec-preference=fixed-vlmax
* riscv-sim/-march=rv64gcv_zvl1024b/-mabi=lp64d/-mcmodel=medlow/--param=riscv-autovec-lmul=dynamic/--param=riscv-autovec-preference=fixed-vlmax

gcc/testsuite/ChangeLog:

	* gcc.dg/pr30957-1.c: Add XFAIL for RVV when vectorized with variable length.

Signed-off-by: Pan Li
---
 gcc/testsuite/gcc.dg/pr30957-1.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.dg/pr30957-1.c b/gcc/testsuite/gcc.dg/pr30957-1.c
index 564410913ab..7a7242ec16d 100644
--- a/gcc/testsuite/gcc.dg/pr30957-1.c
+++ b/gcc/testsuite/gcc.dg/pr30957-1.c
@@ -3,7 +3,7 @@ where each addition is a library call. /
 /* { dg-require-effective-target hard_float } */
 /* -fassociative-math requires -fno-trapping-math and -fno-signed-zeros. */
-/* { dg-options "-O2 -funroll-loops -fassociative-math -fno-trapping-math -fno-signed-zeros -fvariable-expansion-in-unroller -fdump-rtl-loop2_unroll" } */
+/* { dg-options "-O2 -funroll-loops -fassociative-math -fno-trapping-math -fno-signed-zeros -fvariable-expansion-in-unroller -fdump-rtl-loop2_unroll -fdump-tree-vect-details" } */
 extern void abort (void);
 extern void exit (int);
@@ -34,3 +34,4 @@ main ()
 }
 /* { dg-final { scan-rtl-dump "Expanding Accumulator" "loop2_unroll" { xfail mmix-*-* } } } */
+/* { dg-f
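The test divides 0.0 by -5.0. Under IEEE 754 the exact result is -0.0, which matches what the riscv code above loads into fa0. The test is compiled with -fno-signed-zeros, so the compiler may treat the sign of zero as insignificant; presumably that is why aarch64 is free to materialize +0.0 instead.

The small C++ sketch below is built with default floating-point flags, i.e. assuming no -fno-signed-zeros or -ffast-math. It shows that the two zeros compare equal yet remain distinguishable, which is enough to change the result of an execution test.

```cpp
#include <cassert>
#include <cmath>

// 0.0 / -5.0 is -0.0 under IEEE 754: it compares equal to +0.0, but its sign
// bit is set. volatile keeps the division from being folded at compile time.
bool quotient_is_negative_zero() {
  volatile double num = 0.0;
  volatile double den = -5.0;
  double q = num / den;
  return q == 0.0 && std::signbit(q);
}

// The sign stays observable: 1 / -0.0 is -infinity, 1 / +0.0 is +infinity.
bool reciprocals_differ_in_sign() {
  volatile double neg_zero = -0.0;
  volatile double pos_zero = 0.0;
  double a = 1.0 / neg_zero;
  double b = 1.0 / pos_zero;
  return std::isinf(a) && std::isinf(b) && a < 0 && b > 0;
}
```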
[Committed] RISC-V: Fix typo
gcc/testsuite/ChangeLog:

	* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-10.c: Fix typo.
---
 .../gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-10.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-10.c b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-10.c
index f3c2315c2c5..e47af25aa9b 100644
--- a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-10.c
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-10.c
@@ -19,5 +19,5 @@ bar (int *x, int a, int b, int n)
 /* { dg-final { scan-assembler {e32,m4} } } */
 /* { dg-final { scan-assembler-not {jr} } } */
-/* { dg-final { scan-assembler-times {ret} 2 } } *
+/* { dg-final { scan-assembler-times {ret} 2 } } */
 /* { dg-final { scan-tree-dump-times "Preferring smaller LMUL loop because it has unexpected spills" 1 "vect" } } */
-- 
2.36.3
Re: [PATCH v7 1/1] c++: Initial support for P0847R7 (Deducing This) [PR102609]
On 12/23/23 02:10, waffl3x wrote:
> On Friday, December 22nd, 2023 at 10:26 AM, Jason Merrill wrote:
>> On 12/22/23 04:01, waffl3x wrote:
>>>   int n = 0;
>>>   auto f = [](this Self){
>>>     static_assert(__is_same (decltype(n), int));
>>>     decltype((n)) a; // { dg-error {is not captured} }
>>>   };
>>>   f();
>>>
>>> Could you clarify whether removing this error was intentional? I do recall that Patrick Palka wanted to remove this error in his patch, but it seemed to me that you stated it would be incorrect to allow it. Since the error is no longer present, I assume I am misunderstanding the exchange. In any case, let me know if I need to modify my test case or if this error needs to be added back in.
>>
>> Removing the error was correct under https://eel.is/c++draft/expr.prim#id.unqual-3
>>
>> Naming n in that lambda would not refer to a capture by copy, so the decltype is the same as outside the lambda.
>
> Alright, I've fixed my tests to reflect that.
>
> I've got defaulting assignment operators working. Defaulting equality and comparison operators seemed to work out of the box somehow, so I just have to write some fleshed-out tests for those cases. There can always be more tests; I have a few ideas for what still needs to be covered, mostly with dependent lambdas. Tests for xobj conversion operators definitely need to be fleshed out more. I also need to formulate some tests to make sure constraints are not being taken into account when the object parameters should not correspond, but that's a little tougher to test for than the valid cases.
>
> Other than tests, though, is there anything you can think of that the patch is missing? Other than the aforementioned tests, I'm pretty confident everything is done. To recap, I have CWG2789 implemented on my end, with the change we discussed to require corresponding object parameters instead of the same type, and I have CWG2586 implemented. I can't recall what other outstanding issues we had, and my notes don't mention anything other than tests, so I'm assuming everything is good.
Sounds good! Did you mean to include the updated patch? Jason
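The reply cites [expr.prim.id.unqual]/3: the type of the parenthesized expression (n) changes only when n would refer to a capture by copy. The minimal C++11 sketch below shows both sides of that rule. It deliberately avoids explicit object parameters so it builds without C++23 support, and the function name capture_demo is purely illustrative.

With no capture-default, the quoted wording leaves decltype((n)) as int&, the same as outside the lambda. The sketch does not assert that case inside the lambda, because older compilers may reject it with an "is not captured" error like the one in the test above.

```cpp
#include <cassert>
#include <type_traits>

// Illustrates [expr.prim.id.unqual]/3 with ordinary C++11 lambdas
// (no explicit object parameter). capture_demo is a hypothetical name.
int capture_demo() {
  int n = 0;

  // Outside any lambda, (n) is an lvalue of type int.
  static_assert(std::is_same<decltype((n)), int&>::value, "");

  // No capture-default: naming n in an unevaluated operand does not capture
  // it, and decltype(n) is simply the declared type.
  auto no_default = [] {
    static_assert(std::is_same<decltype(n), int>::value, "");
    return 0;
  };

  // With [=], (n) refers to the by-copy capture. The call operator of a
  // non-mutable lambda is const, so the parenthesized form is const int&.
  auto copy_default = [=] {
    static_assert(std::is_same<decltype((n)), const int&>::value, "");
    return n;  // an odr-use, so n really is captured by copy
  };

  return no_default() + copy_default();
}
```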
Re: [PATCH v2 2/2] libstdc++: implement std::generator
On Thu, Dec 21, 2023 at 4:26 PM Arsen Arsenović wrote:
>
> libstdc++-v3/ChangeLog:
>
... snip ...
> +      void
> +      _M_jump_in(_Coro_handle __rest, _Coro_handle __new) noexcept
> +      {
> +        __glibcxx_assert(&__new.promise()._M_nest == this);
> +        __glibcxx_assert(this->_M_is_bottom());
> +        // We're bottom. We're also top of top is unset (note that this is

Should this read:

// We're bottom. We're also top if top is unset (note that this is

?

Very impressive work -- I learned a ton by reading your implementation.

Will
Re: [PATCH] cse: Fix handling of fake vec_select sets [PR111702]
On Thu, 21 Dec 2023 at 00:00, Richard Sandiford wrote:
>
> If cse sees:
>
>   (set (reg R) (const_vector [A B ...]))
>
> it creates fake sets of the form:
>
>   (set R[0] A)
>   (set R[1] B)
>   ...
>
> (with R[n] replaced by appropriate rtl) and then adds them to the tables
> in the same way as for normal sets. This allows a sequence like:
>
>   (set (reg R2) A)
>   ...(reg R2)...
>
> to try to use R[0] instead of (reg R2).
>
> But the pass was taking the analogy too far, and was trying to simplify
> these fake sets based on costs. That is, if there was an earlier:
>
>   (set (reg T) A)
>
> the pass would go to considerable effort trying to work out whether:
>
>   (set R[0] A)
>
> or:
>
>   (set R[0] (reg T))
>
> was more profitable. This included running validate*_change on the sets,
> which has no meaning given that the sets are not part of the insn.
>
> In this example, the equivalence A == T is already known, and the
> purpose of the fake sets is to add A == T == R[0]. We can do that
> just as easily (or, as the PR shows, more easily) if we keep the
> original form of the fake set, with A instead of T.
>
> The problem in the PR occurred if we had:
>
> (1) something that establishes an equivalence between a vector V1 of
>     M-bit scalar integers and a hard register H
>
> (2) something that establishes an equivalence between a vector V2 of
>     N-bit scalar integers, where N instances of V1[0]
>
> (1) established an equivalence between V1[0] and H in M bits.
> (2) then triggered a search for an equivalence of V1[0] in N bits.
> This included:
>
>   /* See if we have a CONST_INT that is already in a register in a
>      wider mode. */
>
> which (correctly) found that the low N bits of H contain the right value.
> But because it came from a wider mode, this equivalence between N-bit H
> and N-bit V1[0] was not yet in the hash table. It therefore survived
> the purge in:
>
>   /* At this point, ELT, if nonzero, points to a class of expressions
>      equivalent to the source of this SET and SRC, SRC_EQV, SRC_FOLDED,
>      and SRC_RELATED, if nonzero, each contain additional equivalent
>      expressions. Prune these latter expressions by deleting expressions
>      already in the equivalence class.
>
> And since more than 1 set found the same N-bit equivalence between
> H and V1[0], the pass tried to add it more than once.
>
> Things were already wrong at this stage, but an ICE was only triggered
> later when trying to merge this N-bit equivalence with another one.
>
> We could avoid the double registration by adding:
>
>   for (elt = classp; elt; elt = elt->next_same_value)
>     if (rtx_equal_p (elt->exp, x))
>       return elt;
>
> to insert_with_costs, or by making cse_insn check whether previous
> sets have recorded the same equivalence. The latter seems more
> appealing from a compile-time perspective. But in this case,
> doing that would be adding yet more spurious work to the handling
> of fake sets.
>
> The handling of fake sets therefore seems like the more fundamental bug.
>
> While there, the patch also makes sure that we don't apply REG_EQUAL
> notes to these fake sets. They only describe the "real" (first) set.

Hi Richard,
Thanks for the detailed explanation and fix!

Thanks,
Prathamesh

>
> gcc/
>         PR rtl-optimization/111702
>         * cse.cc (set::mode): Move earlier.
>         (set::src_in_memory, set::src_volatile): Convert to bitfields.
>         (set::is_fake_set): New member variable.
>         (add_to_set): Add an is_fake_set parameter.
>         (find_sets_in_insn): Update calls accordingly.
>         (cse_insn): Do not apply REG_EQUAL notes to fake sets. Do not
>         try to optimize them either, or validate changes to them.
>
> gcc/
>         PR rtl-optimization/111702
>         * gcc.dg/rtl/aarch64/pr111702.c: New test.
> ---
>  gcc/cse.cc                                  | 38 +++---
>  gcc/testsuite/gcc.dg/rtl/aarch64/pr111702.c | 43 +
>  2 files changed, 67 insertions(+), 14 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/rtl/aarch64/pr111702.c
>
> diff --git a/gcc/cse.cc b/gcc/cse.cc
> index f9603fdfd43..9fd51ca2832 100644
> --- a/gcc/cse.cc
> +++ b/gcc/cse.cc
> @@ -4128,13 +4128,17 @@ struct set
>    unsigned dest_hash;
>    /* The SET_DEST, with SUBREG, etc., stripped. */
>    rtx inner_dest;
> +  /* Original machine mode, in case it becomes a CONST_INT. */
> +  ENUM_BITFIELD(machine_mode) mode : MACHINE_MODE_BITSIZE;
>    /* Nonzero if the SET_SRC is in memory. */
> -  char src_in_memory;
> +  unsigned int src_in_memory : 1;
>    /* Nonzero if the SET_SRC contains something
>       whose value cannot be predicted and understood. */
> -  char src_volatile;
> -  /* Original machine mode, in case it becomes a CONST_INT. */
> -  ENUM_BITFIELD(machine_mode) mode : MACHINE_MODE_BITSIZE;
> +  unsigned int src_volatile : 1;
> +  /* Nonzero if RTL is an artifical set that has been c
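The message considers, and then sets aside, a guard in insert_with_costs that returns an existing class member instead of registering an equal expression a second time. The toy C++ sketch below models that guard to make the double-registration problem concrete.

The sketch assumes strings stand in for rtx and `==` stands in for rtx_equal_p. Both are simplifications, and EquivClass is a hypothetical type, not one of cse.cc's data structures. The committed patch takes a different route: it stops simplifying and validating fake sets at all, so this models only the alternative discussed.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy model (hypothetical, much simpler than cse.cc): an equivalence class
// is the list of expressions known to hold the same value.
struct EquivClass {
  std::vector<std::string> members;

  // Mirrors the guard discussed for insert_with_costs: if an equal
  // expression is already in the class, return it instead of adding a
  // second copy. Returns the index of the entry for x.
  std::size_t insert(const std::string &x) {
    for (std::size_t i = 0; i < members.size(); ++i)
      if (members[i] == x)
        return i;
    members.push_back(x);
    return members.size() - 1;
  }
};
```

Without the early return, two sets that discover the same N-bit equivalence would each push an entry. That is the duplicated state that later tripped the merge in the PR.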
Re: [PATCH 2/3][RFC] RISC-V: Add vector related reservations
On 12/20/2023 2:55 PM, Edwin Lu wrote:
On 12/20/2023 10:57 AM, Jeff Law wrote:
On 12/15/23 11:53, Edwin Lu wrote:

This patch copies the vector reservations from generic-ooo.md and inserts them into generic.md and sifive.md. The vector pipelines are necessary to avoid an ICE from the assert

I forgot to get clarification earlier, but this patch would introduce many scan-dump failures for both vector and non-vector targets (https://github.com/ewlu/gcc-precommit-ci/issues/950#issuecomment-1858392181). I haven't identified any execution errors that would be introduced on mtune=rocket aside from one ICE, which I'm currently working on fixing.

Additionally, as mentioned in PR113035, there are significant testsuite differences between mtune=rocket and mtune=sifive-7-series. I haven't gone through all of the differences, and I don't know whether they are a problem with the patch or a result of the cost-modeling assumptions. Is there a problem with the current way mtune=rocket is modeled with generic.md?

Edwin
[PATCH] LoongArch: Fix infinite secondary reloading of FCCmode [PR113148]
The GCC internal doc says:

  X might be a pseudo-register or a 'subreg' of a pseudo-register, which could either be in a hard register or in memory. Use 'true_regnum' to find out; it will return -1 if the pseudo is in memory and the hard register number if it is in a register.

So "MEM_P (x)" is not enough to check whether we are reloading from/to memory. Since r14-6814, this bug has caused the reload pass to stall and finally ICE, complaining "maximum number of generated reload insns per insn achieved".

Check whether "true_regnum (x)" is -1, in addition to "MEM_P (x)", to fix the issue.

gcc/ChangeLog:

	PR target/113148
	* config/loongarch/loongarch.cc (loongarch_secondary_reload): Check if regno == -1 besides MEM_P (x) for reloading FCCmode from/to FPR to/from memory.

gcc/testsuite/ChangeLog:

	PR target/113148
	* gcc.target/loongarch/pr113148.c: New test.
---
Bootstrapped & regtested on loongarch64-linux-gnu. Ok for trunk?

 gcc/config/loongarch/loongarch.cc             |  3 +-
 gcc/testsuite/gcc.target/loongarch/pr113148.c | 44 +++
 2 files changed, 46 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/loongarch/pr113148.c

diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc
index 5ffd06ce9be..c0a0af3dda5 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -6951,7 +6951,8 @@ loongarch_secondary_reload (bool in_p ATTRIBUTE_UNUSED, rtx x,
 	return NO_REGS;
     }
 
-  if (reg_class_subset_p (rclass, FP_REGS) && MEM_P (x))
+  if (reg_class_subset_p (rclass, FP_REGS)
+      && (regno == -1 || MEM_P (x)))
     return GR_REGS;
 
   return NO_REGS;
diff --git a/gcc/testsuite/gcc.target/loongarch/pr113148.c b/gcc/testsuite/gcc.target/loongarch/pr113148.c
new file mode 100644
index 000..cf48e552053
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/pr113148.c
@@ -0,0 +1,44 @@
+/* PR 113148: ICE caused by infinite reloading */
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=la464 -mfpu=64 -mabi=lp64d" } */
+
+struct bound
+{
+  double max;
+} drawQuadrant_bound;
+double w4, innerXfromXY_y, computeBound_right_0;
+struct arc_def
+{
+  double w, h;
+  double a0, a1;
+};
+static void drawQuadrant (struct arc_def *);
+static void
+computeBound (struct arc_def *def, struct bound *bound)
+{
+  double ellipsex_1, ellipsex_0;
+  bound->max = def->a1 ?: __builtin_sin (w4) * def->h;
+  if (def->a0 == 5 && def->w == def->h)
+    ;
+  else
+    ellipsex_0 = def->a0 == 0.0 ?: __builtin_cos (w4);
+  if (def->a1 == 5 && def->w == def->h)
+    ellipsex_1 = bound->max;
+  __builtin_sqrt (ellipsex_1 * innerXfromXY_y * innerXfromXY_y * w4);
+  computeBound_right_0 = ellipsex_0;
+}
+void
+drawArc ()
+{
+  struct arc_def foo;
+  for (;;)
+    drawQuadrant (&foo);
+}
+void
+drawQuadrant (struct arc_def *def)
+{
+  int y, miny;
+  computeBound (def, &drawQuadrant_bound);
+  while (y >= miny)
+    ;
+}
-- 
2.43.0
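To see why MEM_P (x) alone misses a case, it helps to model the operand explicitly. A spilled pseudo is not a MEM rtx, yet true_regnum reports -1 for it because it lives in memory.

The C++ sketch below is a hypothetical, simplified model of the changed condition, not GCC's types or code. It assumes, as the commit message indicates, that regno holds true_regnum (x). The fixed flag toggles between the old and the patched test.

```cpp
#include <cassert>

// Hypothetical, simplified model of the check changed in
// loongarch_secondary_reload; not GCC's code or types.
enum class RegClass { NO_REGS, GR_REGS };

struct Operand {
  bool is_mem;     // stands in for MEM_P (x)
  int true_regno;  // stands in for true_regnum (x): -1 if the pseudo is in memory
};

// Decide whether moving a value between an FP-class register and X needs a
// general-register intermediate. FIXED selects the patched condition.
RegClass secondary_reload_class(bool rclass_in_fp_regs, const Operand &x,
                                bool fixed) {
  bool in_memory = x.is_mem || (fixed && x.true_regno == -1);
  if (rclass_in_fp_regs && in_memory)
    return RegClass::GR_REGS;
  return RegClass::NO_REGS;
}
```

With the old condition, a spilled pseudo gets NO_REGS, so no GR_REGS intermediate is requested. The patched condition catches that case, which is the scenario behind the "maximum number of generated reload insns" ICE.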
Re: [PATCH v2 2/2] libstdc++: implement std::generator
Hi Will,

Will Hawkins writes:

> On Thu, Dec 21, 2023 at 4:26 PM Arsen Arsenović wrote:
>>
>> libstdc++-v3/ChangeLog:
>>
>
> ... snip ...
>
>> +      void
>> +      _M_jump_in(_Coro_handle __rest, _Coro_handle __new) noexcept
>> +      {
>> +        __glibcxx_assert(&__new.promise()._M_nest == this);
>> +        __glibcxx_assert(this->_M_is_bottom());
>> +        // We're bottom. We're also top of top is unset (note that this is
>
> Should this read:
>
> // We're bottom. We're also top if top is unset (note that this is
>
> ?

Yes, I am decently sure so - well spotted. I'll leave your message unread to correct this typo later. Thanks!

> Very impressive work -- I learned a ton by reading your implementation.
> Will

Happy to hear that :-) Have a lovely night!
-- 
Arsen Arsenović
[PING][PATCH] enable ATOMIC_COMPARE_EXCHANGE opt for floating type or types contains padding
Ping, thanks.

I ran some benchmarks: there is a significant speedup for float/double types, and no regression for the long double type.
[PATCH] RISC-V: Disallow transformation into VLMAX AVL for cond_len_xxx when length is in range [0, 31]
Notice the following situation:

  vsetivli zero,4,e32,m1,ta,ma
  vlseg4e32.v v4,(a5)
  vlseg4e32.v v12,(a3)
  vsetvli a5,zero,e32,m1,tu,ma ---> This is redundant since VLMAX AVL = 4 when it is fixed-vlmax
  vfadd.vf v3,v13,fa0
  vfadd.vf v1,v12,fa1
  vfmul.vv v17,v3,v5
  vfmul.vv v16,v1,v5

The root cause is that we blindly transform COND_LEN_xxx into VLMAX AVL whenever len == NUNITS. However, we don't need to transform all of them: when len is in the range [0, 31], it fits in the 5-bit AVL immediate of vsetivli, so no scalar register needs to be consumed.

After this patch:

  vsetivli zero,4,e32,m1,tu,ma
  addi a4,a5,400
  vlseg4e32.v v12,(a3)
  vfadd.vf v3,v13,fa0
  vfadd.vf v1,v12,fa1
  vlseg4e32.v v4,(a4)
  vfadd.vf v2,v14,fa1
  vfmul.vv v17,v3,v5
  vfmul.vv v16,v1,v5

Tested on both RV32 and RV64 with no regressions. OK for trunk?

gcc/ChangeLog:

	* config/riscv/riscv-v.cc (is_vlmax_len_p): New function.
	(expand_load_store): Disallow transformation into VLMAX when len is in range of [0,31]
	(expand_cond_len_op): Ditto.
	(expand_gather_scatter): Ditto.
	(expand_lanes_load_store): Ditto.
	(expand_fold_extract_last): Ditto.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/rvv/autovec/post-ra-avl.c: Adapt test.
	* gcc.target/riscv/rvv/base/vf_avl-2.c: New test.

--- gcc/config/riscv/riscv-v.cc | 21 +-- .../riscv/rvv/autovec/post-ra-avl.c | 2 +- .../gcc.target/riscv/rvv/base/vf_avl-2.c | 21 +++ 3 files changed, 37 insertions(+), 7 deletions(-) create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/vf_avl-2.c diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc index 038ab084a37..0cc7af58da6 100644 --- a/gcc/config/riscv/riscv-v.cc +++ b/gcc/config/riscv/riscv-v.cc @@ -68,6 +68,16 @@ imm_avl_p (machine_mode mode) : false; } +/* Return true if LEN is equal to NUNITS that outbounds range of [0, 31].
*/ +static bool +is_vlmax_len_p (machine_mode mode, rtx len) +{ + poly_int64 value; + return poly_int_rtx_p (len, &value) +&& known_eq (value, GET_MODE_NUNITS (mode)) +&& !satisfies_constraint_K (len); +} + /* Helper functions for insn_flags && insn_types */ /* Return true if caller need pass mask operand for insn pattern with @@ -3776,7 +3786,7 @@ expand_load_store (rtx *ops, bool is_load) rtx len = ops[3]; machine_mode mode = GET_MODE (ops[0]); - if (poly_int_rtx_p (len, &value) && known_eq (value, GET_MODE_NUNITS (mode))) + if (is_vlmax_len_p (mode, len)) { /* If the length operand is equal to VF, it is VLMAX load/store. */ if (is_load) @@ -3842,8 +3852,7 @@ expand_cond_len_op (unsigned icode, insn_flags op_type, rtx *ops, rtx len) machine_mode mask_mode = GET_MODE (mask); poly_int64 value; bool is_dummy_mask = rtx_equal_p (mask, CONSTM1_RTX (mask_mode)); - bool is_vlmax_len -= poly_int_rtx_p (len, &value) && known_eq (value, GET_MODE_NUNITS (mode)); + bool is_vlmax_len = is_vlmax_len_p (mode, len); unsigned insn_flags = HAS_DEST_P | HAS_MASK_P | HAS_MERGE_P | op_type; if (is_dummy_mask) @@ -4012,7 +4021,7 @@ expand_gather_scatter (rtx *ops, bool is_load) unsigned inner_offsize = GET_MODE_BITSIZE (inner_idx_mode); poly_int64 nunits = GET_MODE_NUNITS (vec_mode); poly_int64 value; - bool is_vlmax = poly_int_rtx_p (len, &value) && known_eq (value, nunits); + bool is_vlmax = is_vlmax_len_p (vec_mode, len); /* Extend the offset element to address width. */ if (inner_offsize < BITS_PER_WORD) @@ -4199,7 +4208,7 @@ expand_lanes_load_store (rtx *ops, bool is_load) rtx reg = is_load ? ops[0] : ops[1]; machine_mode mode = GET_MODE (ops[0]); - if (poly_int_rtx_p (len, &value) && known_eq (value, GET_MODE_NUNITS (mode))) + if (is_vlmax_len_p (mode, len)) { /* If the length operand is equal to VF, it is VLMAX load/store. 
*/ if (is_load) @@ -4252,7 +4261,7 @@ expand_fold_extract_last (rtx *ops) rtx slide_vect = gen_reg_rtx (mode); insn_code icode; - if (poly_int_rtx_p (len, &value) && known_eq (value, GET_MODE_NUNITS (mode))) + if (is_vlmax_len_p (mode, len)) len = NULL_RTX; /* Calculate the number of 1-bit in mask. */ diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/post-ra-avl.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/post-ra-avl.c index f3d12bac7cd..c77b2d187fe 100644 --- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/post-ra-avl.c +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/post-ra-avl.c @@ -13,4 +13,4 @@ int foo() { return a; } -/* { dg-final { scan-assembler-times {vsetvli\s+[a-x0-9]+,\s*zero} 1 } } */ +/* { dg-final { scan-assembler-not {vsetvli} } } */ diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/vf_avl-2.c b/gcc/tests
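The new is_vlmax_len_p predicate combines two tests. The length must equal the number of units, and it must not satisfy constraint K, RISC-V's 5-bit unsigned immediate range [0, 31]. That range is also what vsetivli can encode as its AVL immediate.

The C++ sketch below is a simplified model that assumes plain integer lengths instead of GCC's poly_int64 and GET_MODE_NUNITS. That matches the fixed-vlmax case in the example above; the function names are illustrative, not GCC's.

```cpp
#include <cassert>
#include <cstdint>

// Simplified model of is_vlmax_len_p: GCC works with poly_int64 lengths and
// GET_MODE_NUNITS; here both are plain constants, as in the fixed-vlmax case.

// vsetivli encodes the AVL as a 5-bit unsigned immediate, i.e. the range
// [0, 31] accepted by constraint K, so such lengths need no scalar register.
bool fits_vsetivli_avl(std::int64_t len) { return len >= 0 && len <= 31; }

// Only fall back to a VLMAX AVL (vsetvli rd, zero, ...) when LEN equals the
// number of units and cannot be encoded as an immediate.
bool is_vlmax_len(std::int64_t len, std::int64_t nunits) {
  return len == nunits && !fits_vsetivli_avl(len);
}
```

For the loop in the commit message, len == NUNITS == 4, so the length stays an immediate AVL and the redundant `vsetvli a5,zero,...` disappears. A length of 32 or more equal to NUNITS still takes the VLMAX path.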
[PATCH V2] RISC-V: Disallow transformation into VLMAX AVL for cond_len_xxx when length is in range [0, 31]
Notice the following situation:

  vsetivli zero,4,e32,m1,ta,ma
  vlseg4e32.v v4,(a5)
  vlseg4e32.v v12,(a3)
  vsetvli a5,zero,e32,m1,tu,ma ---> This is redundant since VLMAX AVL = 4 when it is fixed-vlmax
  vfadd.vf v3,v13,fa0
  vfadd.vf v1,v12,fa1
  vfmul.vv v17,v3,v5
  vfmul.vv v16,v1,v5

The root cause is that we blindly transform COND_LEN_xxx into VLMAX AVL whenever len == NUNITS. However, we don't need to transform all of them: when len is in the range [0, 31], it fits in the 5-bit AVL immediate of vsetivli, so no scalar register needs to be consumed.

After this patch:

  vsetivli zero,4,e32,m1,tu,ma
  addi a4,a5,400
  vlseg4e32.v v12,(a3)
  vfadd.vf v3,v13,fa0
  vfadd.vf v1,v12,fa1
  vlseg4e32.v v4,(a4)
  vfadd.vf v2,v14,fa1
  vfmul.vv v17,v3,v5
  vfmul.vv v16,v1,v5

Tested on both RV32 and RV64 with no regressions. OK for trunk?

gcc/ChangeLog:

	* config/riscv/riscv-v.cc (is_vlmax_len_p): New function.
	(expand_load_store): Disallow transformation into VLMAX when len is in range of [0,31]
	(expand_cond_len_op): Ditto.
	(expand_gather_scatter): Ditto.
	(expand_lanes_load_store): Ditto.
	(expand_fold_extract_last): Ditto.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/rvv/autovec/post-ra-avl.c: Adapt test.
	* gcc.target/riscv/rvv/base/vf_avl-2.c: New test.

--- gcc/config/riscv/riscv-v.cc | 21 +-- .../riscv/rvv/autovec/post-ra-avl.c | 2 +- .../gcc.target/riscv/rvv/base/vf_avl-2.c | 21 +++ 3 files changed, 37 insertions(+), 7 deletions(-) create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/vf_avl-2.c diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc index 038ab084a37..0cc7af58da6 100644 --- a/gcc/config/riscv/riscv-v.cc +++ b/gcc/config/riscv/riscv-v.cc @@ -68,6 +68,16 @@ imm_avl_p (machine_mode mode) : false; } +/* Return true if LEN is equal to NUNITS that outbounds range of [0, 31].
*/ +static bool +is_vlmax_len_p (machine_mode mode, rtx len) +{ + poly_int64 value; + return poly_int_rtx_p (len, &value) +&& known_eq (value, GET_MODE_NUNITS (mode)) +&& !satisfies_constraint_K (len); +} + /* Helper functions for insn_flags && insn_types */ /* Return true if caller need pass mask operand for insn pattern with @@ -3776,7 +3786,7 @@ expand_load_store (rtx *ops, bool is_load) rtx len = ops[3]; machine_mode mode = GET_MODE (ops[0]); - if (poly_int_rtx_p (len, &value) && known_eq (value, GET_MODE_NUNITS (mode))) + if (is_vlmax_len_p (mode, len)) { /* If the length operand is equal to VF, it is VLMAX load/store. */ if (is_load) @@ -3842,8 +3852,7 @@ expand_cond_len_op (unsigned icode, insn_flags op_type, rtx *ops, rtx len) machine_mode mask_mode = GET_MODE (mask); poly_int64 value; bool is_dummy_mask = rtx_equal_p (mask, CONSTM1_RTX (mask_mode)); - bool is_vlmax_len -= poly_int_rtx_p (len, &value) && known_eq (value, GET_MODE_NUNITS (mode)); + bool is_vlmax_len = is_vlmax_len_p (mode, len); unsigned insn_flags = HAS_DEST_P | HAS_MASK_P | HAS_MERGE_P | op_type; if (is_dummy_mask) @@ -4012,7 +4021,7 @@ expand_gather_scatter (rtx *ops, bool is_load) unsigned inner_offsize = GET_MODE_BITSIZE (inner_idx_mode); poly_int64 nunits = GET_MODE_NUNITS (vec_mode); poly_int64 value; - bool is_vlmax = poly_int_rtx_p (len, &value) && known_eq (value, nunits); + bool is_vlmax = is_vlmax_len_p (vec_mode, len); /* Extend the offset element to address width. */ if (inner_offsize < BITS_PER_WORD) @@ -4199,7 +4208,7 @@ expand_lanes_load_store (rtx *ops, bool is_load) rtx reg = is_load ? ops[0] : ops[1]; machine_mode mode = GET_MODE (ops[0]); - if (poly_int_rtx_p (len, &value) && known_eq (value, GET_MODE_NUNITS (mode))) + if (is_vlmax_len_p (mode, len)) { /* If the length operand is equal to VF, it is VLMAX load/store. 
*/ if (is_load) @@ -4252,7 +4261,7 @@ expand_fold_extract_last (rtx *ops) rtx slide_vect = gen_reg_rtx (mode); insn_code icode; - if (poly_int_rtx_p (len, &value) && known_eq (value, GET_MODE_NUNITS (mode))) + if (is_vlmax_len_p (mode, len)) len = NULL_RTX; /* Calculate the number of 1-bit in mask. */ diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/post-ra-avl.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/post-ra-avl.c index f3d12bac7cd..bff6dcb1c38 100644 --- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/post-ra-avl.c +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/post-ra-avl.c @@ -13,4 +13,4 @@ int foo() { return a; } -/* { dg-final { scan-assembler-times {vsetvli\s+[a-x0-9]+,\s*zero} 1 } } */ +/* { dg-final { scan-assembler-not {vsetvli\s+[a-x0-9]+,\s*zero} } } */ diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/vf_
Re: [PATCH] RISC-V: Disallow transformation into VLMAX AVL for cond_len_xxx when length is in range [0, 31]
Sent V2 with a test tweak: https://gcc.gnu.org/pipermail/gcc-patches/2023-December/641447.html

juzhe.zh...@rivai.ai

From: Juzhe-Zhong
Date: 2023-12-27 09:52
To: gcc-patches
CC: kito.cheng; kito.cheng; jeffreyalaw; rdapp.gcc; Juzhe-Zhong
Subject: [PATCH] RISC-V: Disallow transformation into VLMAX AVL for cond_len_xxx when length is in range [0,31]

Notice that we have the following situation:

        vsetivli zero,4,e32,m1,ta,ma
        vlseg4e32.v v4,(a5)
        vlseg4e32.v v12,(a3)
        vsetvli a5,zero,e32,m1,tu,ma   ---> This is redundant since VLMAX AVL = 4 when it is fixed-vlmax
        vfadd.vf v3,v13,fa0
        vfadd.vf v1,v12,fa1
        vfmul.vv v17,v3,v5
        vfmul.vv v16,v1,v5

The root cause is that we blindly transform COND_LEN_xxx into VLMAX AVL whenever len == NUNITS. We don't need to transform all of them. When len is in the range [0,31], it can be encoded as an immediate AVL (vsetivli), so no scalar register needs to be consumed.

After this patch:

        vsetivli zero,4,e32,m1,tu,ma
        addi a4,a5,400
        vlseg4e32.v v12,(a3)
        vfadd.vf v3,v13,fa0
        vfadd.vf v1,v12,fa1
        vlseg4e32.v v4,(a4)
        vfadd.vf v2,v14,fa1
        vfmul.vv v17,v3,v5
        vfmul.vv v16,v1,v5

Tested on both RV32 and RV64, no regression.

Ok for trunk?

gcc/ChangeLog:

        * config/riscv/riscv-v.cc (is_vlmax_len_p): New function.
        (expand_load_store): Disallow transformation into VLMAX when len is in range of [0,31]
        (expand_cond_len_op): Ditto.
        (expand_gather_scatter): Ditto.
        (expand_lanes_load_store): Ditto.
        (expand_fold_extract_last): Ditto.

gcc/testsuite/ChangeLog:

        * gcc.target/riscv/rvv/autovec/post-ra-avl.c: Adapt test.
        * gcc.target/riscv/rvv/base/vf_avl-2.c: New test.
--- gcc/config/riscv/riscv-v.cc | 21 +-- .../riscv/rvv/autovec/post-ra-avl.c | 2 +- .../gcc.target/riscv/rvv/base/vf_avl-2.c | 21 +++ 3 files changed, 37 insertions(+), 7 deletions(-) create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/vf_avl-2.c diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc index 038ab084a37..0cc7af58da6 100644 --- a/gcc/config/riscv/riscv-v.cc +++ b/gcc/config/riscv/riscv-v.cc @@ -68,6 +68,16 @@ imm_avl_p (machine_mode mode) : false; } +/* Return true if LEN is equal to NUNITS that outbounds range of [0, 31]. */ +static bool +is_vlmax_len_p (machine_mode mode, rtx len) +{ + poly_int64 value; + return poly_int_rtx_p (len, &value) + && known_eq (value, GET_MODE_NUNITS (mode)) + && !satisfies_constraint_K (len); +} + /* Helper functions for insn_flags && insn_types */ /* Return true if caller need pass mask operand for insn pattern with @@ -3776,7 +3786,7 @@ expand_load_store (rtx *ops, bool is_load) rtx len = ops[3]; machine_mode mode = GET_MODE (ops[0]); - if (poly_int_rtx_p (len, &value) && known_eq (value, GET_MODE_NUNITS (mode))) + if (is_vlmax_len_p (mode, len)) { /* If the length operand is equal to VF, it is VLMAX load/store. 
*/ if (is_load) @@ -3842,8 +3852,7 @@ expand_cond_len_op (unsigned icode, insn_flags op_type, rtx *ops, rtx len) machine_mode mask_mode = GET_MODE (mask); poly_int64 value; bool is_dummy_mask = rtx_equal_p (mask, CONSTM1_RTX (mask_mode)); - bool is_vlmax_len -= poly_int_rtx_p (len, &value) && known_eq (value, GET_MODE_NUNITS (mode)); + bool is_vlmax_len = is_vlmax_len_p (mode, len); unsigned insn_flags = HAS_DEST_P | HAS_MASK_P | HAS_MERGE_P | op_type; if (is_dummy_mask) @@ -4012,7 +4021,7 @@ expand_gather_scatter (rtx *ops, bool is_load) unsigned inner_offsize = GET_MODE_BITSIZE (inner_idx_mode); poly_int64 nunits = GET_MODE_NUNITS (vec_mode); poly_int64 value; - bool is_vlmax = poly_int_rtx_p (len, &value) && known_eq (value, nunits); + bool is_vlmax = is_vlmax_len_p (vec_mode, len); /* Extend the offset element to address width. */ if (inner_offsize < BITS_PER_WORD) @@ -4199,7 +4208,7 @@ expand_lanes_load_store (rtx *ops, bool is_load) rtx reg = is_load ? ops[0] : ops[1]; machine_mode mode = GET_MODE (ops[0]); - if (poly_int_rtx_p (len, &value) && known_eq (value, GET_MODE_NUNITS (mode))) + if (is_vlmax_len_p (mode, len)) { /* If the length operand is equal to VF, it is VLMAX load/store. */ if (is_load) @@ -4252,7 +4261,7 @@ expand_fold_extract_last (rtx *ops) rtx slide_vect = gen_reg_rtx (mode); insn_code icode; - if (poly_int_rtx_p (len, &value) && known_eq (value, GET_MODE_NUNITS (mode))) + if (is_vlmax_len_p (mode, len)) len = NULL_RTX; /* Calculate the number of 1-bit in mask. */ diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/post-ra-avl.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/post-ra-avl.c index f3d12bac7cd..c77b2d187fe 100644 --- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/post-ra-avl.c +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/post-ra-avl.c @@ -13,4 +13,4 @@ int foo() { return a; } -/* { dg-final { scan-assembler-times {vsetvli\s+[a-x0-9]+,\s*zero} 1 } } */
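The decision made by the new `is_vlmax_len_p` helper can be modeled outside GCC with plain integers. This is a hypothetical sketch that assumes a fixed-VLMAX configuration, where NUNITS is a compile-time constant. The real helper works on `rtx` and `poly_int64` values. The names `fits_uimm5` and `use_vlmax_avl` are illustrative only. Constraint `K` accepts a 5-bit unsigned immediate (0 to 31), which is exactly what `vsetivli` can encode.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for satisfies_constraint_K: true when the AVL
   fits the 5-bit unsigned immediate field of vsetivli (0..31). */
bool fits_uimm5 (int64_t len)
{
  return len >= 0 && len <= 31;
}

/* Model of is_vlmax_len_p on a fixed-VLMAX target: rewrite the length
   to VLMAX only when it equals NUNITS *and* cannot be encoded as an
   immediate, because an immediate AVL needs no scalar register. */
bool use_vlmax_avl (int64_t len, int64_t nunits)
{
  return len == nunits && !fits_uimm5 (len);
}
```

With NUNITS = 4, as in the `vlseg4e32.v` example, `use_vlmax_avl (4, 4)` is false, so the immediate AVL is kept. Only lengths that equal NUNITS and exceed 31 still fall back to a VLMAX `vsetvli`.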
Re: [PATCH v3 1/6] RISC-V: Refactor riscv-vector-builtins-bases.cc
Hi Jeff,

Perhaps fold_fault_load cannot be moved to riscv-protos.h. gimple_folder is declared in riscv-vector-builtins.h, and it's not reasonable to include riscv-vector-builtins.h in riscv-protos.h. In fact, fold_fault_load is defined specifically for some builtin functions, so it would be better to just prototype it in riscv-vector-builtins-bases.h.

Joshua

--
From: Jeff Law
Sent: Thursday, December 21, 2023 02:14
To: "Jun Sha (Joshua)"; "gcc-patches"
Cc: "jim.wilson.gcc"; palmer; andrew; "philipp.tomsich"; "christoph.muellner"; "juzhe.zhong"; Jin Ma; Xianmiao Qu
Subject: Re: [PATCH v3 1/6] RISC-V: Refactor riscv-vector-builtins-bases.cc

On 12/20/23 05:25, Jun Sha (Joshua) wrote:
> This patch moves the definition of the enums lst_type and
> frm_op_type into riscv-vector-builtins-bases.h and removes
> the static visibility of fold_fault_load(), so these
> can be used in other compile units.
>
> gcc/ChangeLog:
>
>         * config/riscv/riscv-vector-builtins-bases.cc (enum lst_type):
>         (enum frm_op_type): move to riscv-vector-builtins-bases.h
>         * config/riscv/riscv-vector-builtins-bases.h
>         (GCC_RISCV_VECTOR_BUILTINS_BASES_H): Add header files.
>         (enum lst_type): move from
>         (enum frm_op_type): riscv-vector-builtins-bases.cc
>         (fold_fault_load): riscv-vector-builtins-bases.cc

I'm largely hoping to leave the heavy review lifting here to Juzhe, who knows GCC's RV vector bits as well as anyone.

Just one small issue. Would it be better to prototype fold_fault_load elsewhere and avoid the gimple.h inclusion in riscv-vector-builtins-bases.h? Perhaps riscv-protos.h?

You might consider prefixing the function name with riscv_. It's not strictly necessary, but it appears to be relatively common in the risc-v port.

Thanks,
Jeff
Re: [PATCH] RISC-V: Add crypto machine descriptions
Thanks Feng, the patch LGTM from my side. I am happy to accept the vector crypto work for GCC 14. It's mostly intrinsic stuff, and the few non-intrinsic parts are also low-risk enough (e.g. vrol, vctz).

On Fri, Dec 22, 2023 at 10:04 AM Feng Wang wrote:
>
> 2023-12-22 09:59 Feng Wang wrote:
>
> Sorry for forgetting to add the patch version number. It should be [PATCH v8 2/3]
>
> > Patch v8: Remove unused iterator and add newline at the end.
> > Patch v7: Remove mode of const_int_operand and typo. Add
> >           newline at the end and comment at the beginning.
> > Patch v6: Swap the operator order of vandn.vv
> > Patch v5: Add vec_duplicate operator.
> > Patch v4: Add process of SEW=64 in RV32 system.
> > Patch v3: Modify constraints for crypto vector.
> > Patch v2: Add crypto vector ins into RATIO attr and use vr as
> >           destination register.
> >
> > This patch adds the crypto machine descriptions (vector-crypto.md) and
> > some new iterators which are used by the crypto vector extension.
> >
> > Co-Authored by: Songhe Zhu
> > Co-Authored by: Ciyan Pan
> > gcc/ChangeLog:
> >
> >         * config/riscv/iterators.md: Add rotate insn name.
> >         * config/riscv/riscv.md: Add new insns name for crypto vector.
> >         * config/riscv/vector-iterators.md: Add new iterators for crypto vector.
> >         * config/riscv/vector.md: Add the corresponding attr for crypto vector.
> >         * config/riscv/vector-crypto.md: New file. The machine descriptions for crypto vector.
> > ---
> >  gcc/config/riscv/iterators.md        |   4 +-
> >  gcc/config/riscv/riscv.md            |  33 +-
> >  gcc/config/riscv/vector-crypto.md    | 654 +++
> >  gcc/config/riscv/vector-iterators.md |  36 ++
> >  gcc/config/riscv/vector.md           |  55 ++-
> >  5 files changed, 761 insertions(+), 21 deletions(-)
> >  create mode 100755 gcc/config/riscv/vector-crypto.md
> >
> > diff --git a/gcc/config/riscv/iterators.md b/gcc/config/riscv/iterators.md
> > index ecf033f2fa7..f332fba7031 100644
> > --- a/gcc/config/riscv/iterators.md
> > +++ b/gcc/config/riscv/iterators.md
> > @@ -304,7 +304,9 @@
> >                 (umax "maxu")
> >                 (clz "clz")
> >                 (ctz "ctz")
> > -               (popcount "cpop")])
> > +               (popcount "cpop")
> > +               (rotate "rol")
> > +               (rotatert "ror")])
> >
> >  ;; ---
> >  ;; Int Iterators.
> > diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
> > index ee8b71c22aa..88019a46a53 100644
> > --- a/gcc/config/riscv/riscv.md
> > +++ b/gcc/config/riscv/riscv.md
> > @@ -427,6 +427,34 @@
> >  ;; vcompress    vector compress instruction
> >  ;; vmov         whole vector register move
> >  ;; vector       unknown vector instruction
> > +;; 17. Crypto Vector instructions
> > +;; vandn        crypto vector bitwise and-not instructions
> > +;; vbrev        crypto vector reverse bits in elements instructions
> > +;; vbrev8       crypto vector reverse bits in bytes instructions
> > +;; vrev8        crypto vector reverse bytes instructions
> > +;; vclz         crypto vector count leading zeros instructions
> > +;; vctz         crypto vector count trailing zeros instructions
> > +;; vrol         crypto vector rotate left instructions
> > +;; vror         crypto vector rotate right instructions
> > +;; vwsll        crypto vector widening shift left logical instructions
> > +;; vclmul       crypto vector carry-less multiply - return low half instructions
> > +;; vclmulh      crypto vector carry-less multiply - return high half instructions
> > +;; vghsh        crypto vector add-multiply over GHASH Galois-Field instructions
> > +;; vgmul        crypto vector multiply over GHASH Galois-Field instructions
> > +;; vaesef       crypto vector AES final-round encryption instructions
> > +;; vaesem       crypto vector AES middle-round encryption instructions
> > +;; vaesdf       crypto vector AES final-round decryption instructions
> > +;; vaesdm       crypto vector AES middle-round decryption instructions
> > +;; vaeskf1      crypto vector AES-128 Forward KeySchedule generation instructions
> > +;; vaeskf2      crypto vector AES-256 Forward KeySchedule generation instructions
> > +;; vaesz        crypto vector AES round zero encryption/decryption instructions
> > +;; vsha2ms      crypto vector SHA-2 message schedule instructions
> > +;; vsha2ch
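As a reference point for the `vclmul`/`vclmulh` entries above, here is a scalar per-element model of carry-less multiplication for 64-bit elements. It is a sketch of the arithmetic only, not of GCC's patterns, and the function names are hypothetical. A carry-less product is ordinary shift-and-add long multiplication with the additions replaced by XOR. `vclmul` keeps the low 64 bits of the 128-bit result and `vclmulh` keeps the high 64 bits.

```c
#include <stdint.h>

/* Low 64 bits of the carry-less (GF(2)[x]) product of a and b:
   each set bit i of b contributes a << i, combined with XOR. */
uint64_t clmul_lo (uint64_t a, uint64_t b)
{
  uint64_t r = 0;
  for (int i = 0; i < 64; i++)
    if ((b >> i) & 1)
      r ^= a << i;
  return r;
}

/* High 64 bits of the same 128-bit product: bit j of a, multiplied by
   bit i of b, lands at position i + j; positions >= 64 go here. */
uint64_t clmul_hi (uint64_t a, uint64_t b)
{
  uint64_t r = 0;
  for (int i = 1; i < 64; i++)   /* i == 0 never reaches the high half */
    if ((b >> i) & 1)
      r ^= a >> (64 - i);
  return r;
}
```

For example, (x + 1)^2 = x^2 + 1 over GF(2), so `clmul_lo (3, 3)` is 5 rather than the integer product 9.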
Re: [PATCH v3 2/6] RISC-V: Split csr_operand in predicates.md for vector patterns.
Hi Jeff,

Yes, I will change something in vector_csr_operand in the following patches. A constraint will be added so that the AVL cannot be encoded as an immediate for the xtheadvector vsetvl.

Joshua

--
From: Jeff Law
Sent: Thursday, December 21, 2023 02:16
To: "Jun Sha (Joshua)"; "gcc-patches"
Cc: "jim.wilson.gcc"; palmer; andrew; "philipp.tomsich"; "christoph.muellner"; "juzhe.zhong"; Jin Ma; Xianmiao Qu
Subject: Re: [PATCH v3 2/6] RISC-V: Split csr_operand in predicates.md for vector patterns.

On 12/20/23 05:27, Jun Sha (Joshua) wrote:
> This patch splits the definition of csr_operand in predicates.md.
> The newly defined vector_csr_operand has the same functionality
> as csr_operand but can only be used in vector patterns, so that
> changes for vector will not affect scalar patterns in files
> like riscv.md.
>
> gcc/ChangeLog:
>
>         * config/riscv/predicates.md (vector_csr_operand):
>         Define vector_csr_operand for vector.
>         * config/riscv/vector.md:
>         Use newly defined csr_operand for vector.

So do you envision changing something in vector_csr_operand? If not, then this doesn't make much sense.

Jeff
Re: Ping: [PATCH] LoongArch: Replace -mexplicit-relocs=auto simple-used address peephole2 with combine
On 2023/12/23 6:44 PM, Xi Ruoyao wrote:
> On Sat, 2023-12-23 at 10:29 +0800, chenglulu wrote:
>> The performance drop has nothing to do with this patch. I found that the h264 performance compiled by r14-6787 compared to r14-6421 dropped by 6.4%.
> Then I guess we should create a bug report...

The h264 score in r14-6818 is the same as that of r14-6421.

>> But there is a problem. My regression test has the following two fail items (based on r14-6787):
>>
>> +FAIL: gcc.dg/cpp/_Pragma3.c (test for excess errors)
>> +FAIL: gcc.dg/pr86617.c scan-rtl-dump-times final "mem/v" 6
> Strange. I didn't see them on r14-6650 (with or without the patch).

>> +FAIL: gcc.dg/pr86617.c scan-rtl-dump-times final "mem/v" 6

In r14-6818 the issue persists. I chased the code a bit and found that the problem looks like this:

volatile unsigned char u8;

void test (void)
{
  u8 = u8 + u8;
  u8 = u8 - u8;
}

$ ./gcc/cc1 test.c -o test.s -fdump-rtl-all-all -fdiagnostics-plain-output -Os -fdump-rtl-final -ffat-lto-objects

test.c.301r.outof_cfglayout:

(insn 7 6 9 2 (set (reg:DI 80 [ u8.0_1 ])
        (zero_extend:DI (mem/v/c:QI (symbol_ref:DI ("*.LANCHOR0") [flags 0x182]) [0 u8D.2193+0 S1 A8]))) "volatile.c":5:11 459 {simple_load_uextdiqidi}
     (nil))

test.c.302r.split1:

(insn 27 6 28 2 (set (reg:DI 98)
        (unspec:DI [
                (symbol_ref:DI ("*.LANCHOR0") [flags 0x182])
            ] UNSPEC_PCALAU12I_GR)) "volatile.c":5:11 -1
     (nil))
(insn 28 27 9 2 (set (reg:DI 80 [ u8.0_1 ])
        (zero_extend:DI (mem:QI (lo_sum:DI (reg:DI 98)
                    (symbol_ref:DI ("*.LANCHOR0") [flags 0x182])) [0 S1 A8]))) "volatile.c":5:11 -1
     (nil))

The volatile flag of the mem (the "/v" in mem/v/c) is lost after split1, so the test fails.
Re: [PATCH] LoongArch: Fix infinite secondary reloading of FCCmode [PR113148]
On 2023/12/27 6:37 AM, Xi Ruoyao wrote:
> The GCC internal doc says:
>
>     X might be a pseudo-register or a 'subreg' of a pseudo-register,
>     which could either be in a hard register or in memory.  Use
>     'true_regnum' to find out; it will return -1 if the pseudo is in
>     memory and the hard register number if it is in a register.
>
> So "MEM_P (x)" is not enough for checking if we are reloading
> from/to the memory.  Since r14-6814, this bug has caused the reload
> pass to stall and finally ICE, complaining "maximum number of
> generated reload insns per insn achieved".
>
> Check if "true_regnum (x)" is -1 besides "MEM_P (x)" to fix the issue.
>
> gcc/ChangeLog:
>
>         PR target/113148
>         * config/loongarch/loongarch.cc (loongarch_secondary_reload):
>         Check if regno == -1 besides MEM_P (x) for reloading FCCmode
>         from/to FPR to/from memory.
>
> gcc/testsuite/ChangeLog:
>
>         PR target/113148
>         * gcc.target/loongarch/pr113148.c: New test.
> ---
> Bootstrapped & regtested on loongarch64-linux-gnu.  Ok for trunk?

LGTM! Thanks!

>  gcc/config/loongarch/loongarch.cc             |  3 +-
>  gcc/testsuite/gcc.target/loongarch/pr113148.c | 44 +++
>  2 files changed, 46 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.target/loongarch/pr113148.c
>
> diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc
> index 5ffd06ce9be..c0a0af3dda5 100644
> --- a/gcc/config/loongarch/loongarch.cc
> +++ b/gcc/config/loongarch/loongarch.cc
> @@ -6951,7 +6951,8 @@ loongarch_secondary_reload (bool in_p ATTRIBUTE_UNUSED, rtx x,
>         return NO_REGS;
>      }
>
> -  if (reg_class_subset_p (rclass, FP_REGS) && MEM_P (x))
> +  if (reg_class_subset_p (rclass, FP_REGS)
> +      && (regno == -1 || MEM_P (x)))
>      return GR_REGS;
>
>    return NO_REGS;
> diff --git a/gcc/testsuite/gcc.target/loongarch/pr113148.c b/gcc/testsuite/gcc.target/loongarch/pr113148.c
> new file mode 100644
> index 000..cf48e552053
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/loongarch/pr113148.c
> @@ -0,0 +1,44 @@
> +/* PR 113148: ICE caused by infinite reloading */
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -march=la464 -mfpu=64 -mabi=lp64d" } */
> +
> +struct bound
> +{
> +  double max;
> +} drawQuadrant_bound;
> +double w4, innerXfromXY_y, computeBound_right_0;
> +struct arc_def
> +{
> +  double w, h;
> +  double a0, a1;
> +};
> +static void drawQuadrant (struct arc_def *);
> +static void
> +computeBound (struct arc_def *def, struct bound *bound)
> +{
> +  double ellipsex_1, ellipsex_0;
> +  bound->max = def->a1 ?: __builtin_sin (w4) * def->h;
> +  if (def->a0 == 5 && def->w == def->h)
> +    ;
> +  else
> +    ellipsex_0 = def->a0 == 0.0 ?: __builtin_cos (w4);
> +  if (def->a1 == 5 && def->w == def->h)
> +    ellipsex_1 = bound->max;
> +  __builtin_sqrt (ellipsex_1 * innerXfromXY_y * innerXfromXY_y * w4);
> +  computeBound_right_0 = ellipsex_0;
> +}
> +void
> +drawArc ()
> +{
> +  struct arc_def foo;
> +  for (;;)
> +    drawQuadrant (&foo);
> +}
> +void
> +drawQuadrant (struct arc_def *def)
> +{
> +  int y, miny;
> +  computeBound (def, &drawQuadrant_bound);
> +  while (y >= miny)
> +    ;
> +}
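The reasoning from the quoted internals documentation fits in a few lines of C. This is a toy sketch with hypothetical types (`toy_operand`, `toy_class`), not GCC's `rtx` API. The `is_mem` field stands in for `MEM_P (x)` and `true_regno` stands in for `true_regnum (x)`. The sketch also leaves out the real function's other conditions, such as the FP_REGS class check and the FCCmode handling.

```c
#include <stdbool.h>

/* Hypothetical, simplified view of the reload operand X. */
struct toy_operand
{
  bool is_mem;     /* what MEM_P (x) would report */
  int true_regno;  /* what true_regnum (x) would report; -1 if in memory */
};

enum toy_class { TOY_NO_REGS, TOY_GR_REGS };

struct toy_operand make_operand (bool is_mem, int true_regno)
{
  struct toy_operand op = { is_mem, true_regno };
  return op;
}

/* Old check: only a literal MEM gets a GPR intermediate. */
enum toy_class old_fcc_reload (struct toy_operand x)
{
  return x.is_mem ? TOY_GR_REGS : TOY_NO_REGS;
}

/* Fixed check: a pseudo whose home is a stack slot is also in memory. */
enum toy_class new_fcc_reload (struct toy_operand x)
{
  return (x.true_regno == -1 || x.is_mem) ? TOY_GR_REGS : TOY_NO_REGS;
}
```

A pseudo spilled to the stack is a REG, not a MEM. Only the fixed check therefore asks for a GPR intermediate in that case, which is the case the old code kept missing.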
Re: [PATCH v2] LoongArch: Expand left rotate to right rotate with negated amount
LGTM! Thanks!

On 2023/12/24 8:33 PM, Xi Ruoyao wrote:
> gcc/ChangeLog:
>
>         * config/loongarch/loongarch.md (rotl3): New define_expand.
>         * config/loongarch/simd.md (vrotl3): Likewise.
>         (rotl3): Likewise.
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.target/loongarch/rotl-with-rotr.c: New test.
>         * gcc.target/loongarch/rotl-with-vrotr-b.c: New test.
>         * gcc.target/loongarch/rotl-with-vrotr-h.c: New test.
>         * gcc.target/loongarch/rotl-with-vrotr-w.c: New test.
>         * gcc.target/loongarch/rotl-with-vrotr-d.c: New test.
>         * gcc.target/loongarch/rotl-with-xvrotr-b.c: New test.
>         * gcc.target/loongarch/rotl-with-xvrotr-h.c: New test.
>         * gcc.target/loongarch/rotl-with-xvrotr-w.c: New test.
>         * gcc.target/loongarch/rotl-with-xvrotr-d.c: New test.
> ---
>
> Change from [v1]:
>
> - Wrap the negated wrapping amount with subreg: for rotl, to avoid an ICE when left rotating QI and HI vectors.
> - Add tests for QI, HI, and DI vectors.
>
> [v1]:https://gcc.gnu.org/pipermail/gcc-patches/2023-December/640872.html
>
> Bootstrapped and regtested on loongarch64-linux-gnu. Ok for trunk?
gcc/config/loongarch/loongarch.md | 12 gcc/config/loongarch/simd.md | 29 +++ .../gcc.target/loongarch/rotl-with-rotr.c | 9 ++ .../gcc.target/loongarch/rotl-with-vrotr-b.c | 7 + .../gcc.target/loongarch/rotl-with-vrotr-d.c | 7 + .../gcc.target/loongarch/rotl-with-vrotr-h.c | 7 + .../gcc.target/loongarch/rotl-with-vrotr-w.c | 28 ++ .../gcc.target/loongarch/rotl-with-xvrotr-b.c | 7 + .../gcc.target/loongarch/rotl-with-xvrotr-d.c | 7 + .../gcc.target/loongarch/rotl-with-xvrotr-h.c | 7 + .../gcc.target/loongarch/rotl-with-xvrotr-w.c | 7 + 11 files changed, 127 insertions(+) create mode 100644 gcc/testsuite/gcc.target/loongarch/rotl-with-rotr.c create mode 100644 gcc/testsuite/gcc.target/loongarch/rotl-with-vrotr-b.c create mode 100644 gcc/testsuite/gcc.target/loongarch/rotl-with-vrotr-d.c create mode 100644 gcc/testsuite/gcc.target/loongarch/rotl-with-vrotr-h.c create mode 100644 gcc/testsuite/gcc.target/loongarch/rotl-with-vrotr-w.c create mode 100644 gcc/testsuite/gcc.target/loongarch/rotl-with-xvrotr-b.c create mode 100644 gcc/testsuite/gcc.target/loongarch/rotl-with-xvrotr-d.c create mode 100644 gcc/testsuite/gcc.target/loongarch/rotl-with-xvrotr-h.c create mode 100644 gcc/testsuite/gcc.target/loongarch/rotl-with-xvrotr-w.c diff --git a/gcc/config/loongarch/loongarch.md b/gcc/config/loongarch/loongarch.md index 30025bf1908..939432b83e0 100644 --- a/gcc/config/loongarch/loongarch.md +++ b/gcc/config/loongarch/loongarch.md @@ -2903,6 +2903,18 @@ (define_insn "rotrsi3_extend" [(set_attr "type" "shift,shift") (set_attr "mode" "SI")]) +;; Expand left rotate to right rotate. +(define_expand "rotl3" + [(set (match_dup 3) + (neg:SI (match_operand:SI 2 "register_operand"))) + (set (match_operand:GPR 0 "register_operand") + (rotatert:GPR (match_operand:GPR 1 "register_operand") + (match_dup 3)))] + "" + { +operands[3] = gen_reg_rtx (SImode); + }); + ;; The following templates were added to generate "bstrpick.d + alsl.d" ;; instruction pairs. 
;; It is required that the values of const_immalsl_operand and diff --git a/gcc/config/loongarch/simd.md b/gcc/config/loongarch/simd.md index 13202f79bee..93fb39abcf5 100644 --- a/gcc/config/loongarch/simd.md +++ b/gcc/config/loongarch/simd.md @@ -268,6 +268,35 @@ (define_insn "vrotr3" [(set_attr "type" "simd_int_arith") (set_attr "mode" "")]) +;; Expand left rotate to right rotate. +(define_expand "vrotl3" + [(set (match_dup 3) + (neg:IVEC (match_operand:IVEC 2 "register_operand"))) + (set (match_operand:IVEC 0 "register_operand") + (rotatert:IVEC (match_operand:IVEC 1 "register_operand") + (match_dup 3)))] + "" + { +operands[3] = gen_reg_rtx (mode); + }); + +;; Expand left rotate with a scalar amount to right rotate: negate the +;; scalar before broadcasting it because scalar negation is cheaper than +;; vector negation. +(define_expand "rotl3" + [(set (match_dup 3) + (neg:SI (match_operand:SI 2 "register_operand"))) + (set (match_dup 4) + (vec_duplicate:IVEC (subreg: (match_dup 3) 0))) + (set (match_operand:IVEC 0 "register_operand") + (rotatert:IVEC (match_operand:IVEC 1 "register_operand") + (match_dup 4)))] + "" + { +operands[3] = gen_reg_rtx (SImode); +operands[4] = gen_reg_rtx (mode); + }); + ;; vrotri.{b/h/w/d} (define_insn "rotr3" diff --git a/gcc/testsuite/gcc.target/loongarch/rotl-with-rotr.c b/gcc/testsuite/gcc.target/loongarch/rotl-with-rotr.c new file mode 100644 index 000..84cc53cecaf --- /dev/null +++ b/gcc/testsuite/gcc.target/loongarch/rotl-with-rotr.c @@ -0,0 +1,9 @@ +/* { dg-do compile } */ +/* { dg-options
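The new left-rotate expanders rely on the identity rotl(x, n) == rotr(x, -n), taking the amount modulo the element width. Here is a minimal scalar sketch for 32-bit values. The function names are hypothetical. The sketch also assumes that the hardware right-rotate uses only the low log2(width) bits of the amount, which is modeled here with `& 31`.

```c
#include <stdint.h>

/* Right rotate of a 32-bit value; the amount is reduced modulo 32. */
uint32_t rotr32 (uint32_t x, uint32_t n)
{
  n &= 31;
  return n ? (x >> n) | (x << (32 - n)) : x;
}

/* Left rotate expressed as a right rotate by the negated amount:
   (-n) mod 32 == (32 - n mod 32) mod 32, so one negation replaces a
   dedicated rotate-left instruction.  Unsigned negation wraps, so
   -n is well defined for every n. */
uint32_t rotl32_via_rotr (uint32_t x, uint32_t n)
{
  return rotr32 (x, -n);
}
```

This is also why the scalar-amount vector expander negates the scalar before broadcasting it. One scalar `neg` is cheaper than negating every lane, and each lane only needs the low bits of the amount.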
Re: [pushed][PATCH v1] LoongArch: Fix ICE when passing two same vector argument consecutively
Pushed to r14-6849.

On 2023/12/22 4:18 PM, Chenghui Pan wrote:
> The following code causes an ICE on the LoongArch target:
>
>   #include
>
>   extern void bar (__m128i, __m128i);
>
>   __m128i a;
>
>   void
>   foo ()
>   {
>     bar (a, a);
>   }
>
> It is caused by a missing constraint definition in mov_lsx. This patch fixes the template and removes the unnecessary processing from the loongarch_split_move () function.
>
> This patch also cleans up the redundant definitions around loongarch_split_move () and loongarch_split_move_p ().
>
> gcc/ChangeLog:
>
>         * config/loongarch/lasx.md: Use loongarch_split_move and
>         loongarch_split_move_p directly.
>         * config/loongarch/loongarch-protos.h
>         (loongarch_split_move): Remove unnecessary argument.
>         (loongarch_split_move_insn_p): Delete.
>         (loongarch_split_move_insn): Delete.
>         * config/loongarch/loongarch.cc
>         (loongarch_split_move_insn_p): Delete.
>         (loongarch_load_store_insns): Use loongarch_split_move_p
>         directly.
>         (loongarch_split_move): Remove the unnecessary processing.
>         (loongarch_split_move_insn): Delete.
>         * config/loongarch/lsx.md: Use loongarch_split_move and
>         loongarch_split_move_p directly.
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.target/loongarch/vector/lsx/lsx-mov-1.c: New test.
--- gcc/config/loongarch/lasx.md | 4 +- gcc/config/loongarch/loongarch-protos.h | 4 +- gcc/config/loongarch/loongarch.cc | 49 +-- gcc/config/loongarch/lsx.md | 10 ++-- .../loongarch/vector/lsx/lsx-mov-1.c | 14 ++ 5 files changed, 24 insertions(+), 57 deletions(-) create mode 100644 gcc/testsuite/gcc.target/loongarch/vector/lsx/lsx-mov-1.c diff --git a/gcc/config/loongarch/lasx.md b/gcc/config/loongarch/lasx.md index eeac8cd984b..6418ff52fe5 100644 --- a/gcc/config/loongarch/lasx.md +++ b/gcc/config/loongarch/lasx.md @@ -912,10 +912,10 @@ (define_split [(set (match_operand:LASX 0 "nonimmediate_operand") (match_operand:LASX 1 "move_operand"))] "reload_completed && ISA_HAS_LASX - && loongarch_split_move_insn_p (operands[0], operands[1])" + && loongarch_split_move_p (operands[0], operands[1])" [(const_int 0)] { - loongarch_split_move_insn (operands[0], operands[1], curr_insn); + loongarch_split_move (operands[0], operands[1]); DONE; }) diff --git a/gcc/config/loongarch/loongarch-protos.h b/gcc/config/loongarch/loongarch-protos.h index c66ab932d67..7bf21a45c69 100644 --- a/gcc/config/loongarch/loongarch-protos.h +++ b/gcc/config/loongarch/loongarch-protos.h @@ -82,11 +82,9 @@ extern rtx loongarch_legitimize_call_address (rtx); extern rtx loongarch_subword (rtx, bool); extern bool loongarch_split_move_p (rtx, rtx); -extern void loongarch_split_move (rtx, rtx, rtx); +extern void loongarch_split_move (rtx, rtx); extern bool loongarch_addu16i_imm12_operand_p (HOST_WIDE_INT, machine_mode); extern void loongarch_split_plus_constant (rtx *, machine_mode); -extern bool loongarch_split_move_insn_p (rtx, rtx); -extern void loongarch_split_move_insn (rtx, rtx, rtx); extern void loongarch_split_128bit_move (rtx, rtx); extern bool loongarch_split_128bit_move_p (rtx, rtx); extern void loongarch_split_256bit_move (rtx, rtx); diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc index 390e3206a17..98709123770 100644 --- a/gcc/config/loongarch/loongarch.cc 
+++ b/gcc/config/loongarch/loongarch.cc @@ -2562,7 +2562,6 @@ loongarch_split_const_insns (rtx x) return low + high; } -bool loongarch_split_move_insn_p (rtx dest, rtx src); /* Return one word of 128-bit value OP, taking into account the fixed endianness of certain registers. BYTE selects from the byte address. */ @@ -2602,7 +2601,7 @@ loongarch_load_store_insns (rtx mem, rtx_insn *insn) { set = single_set (insn); if (set - && !loongarch_split_move_insn_p (SET_DEST (set), SET_SRC (set))) + && !loongarch_split_move_p (SET_DEST (set), SET_SRC (set))) might_split_p = false; } @@ -4220,7 +4219,7 @@ loongarch_split_move_p (rtx dest, rtx src) SPLIT_TYPE describes the split condition. */ void -loongarch_split_move (rtx dest, rtx src, rtx insn_) +loongarch_split_move (rtx dest, rtx src) { rtx low_dest; @@ -4258,33 +4257,6 @@ loongarch_split_move (rtx dest, rtx src, rtx insn_) loongarch_subword (src, true)); } } - - /* This is a hack. See if the next insn uses DEST and if so, see if we - can forward SRC for DEST. This is most useful if the next insn is a - simple store. */ - rtx_insn *insn = (rtx_insn *) insn_; - struct loongarch_address_info addr = {}; - if (insn) -{ - rtx_insn *next = next_nonnote_nondebug_insn_bb (insn); - if (next) - { - rtx set = single_set (next); - if (set && SET_SRC (set) == dest) - { -
Re: [pushed][PATCH v1] LoongArch: Fix insn output of vec_concat templates for LASX.
Pushed to r14-6848.

On 2023/12/22 4:22 PM, Chenghui Pan wrote:
> When investigating the failure of gcc.dg/vect/slp-reduc-sad.c, I found that the following instruction block is generated by vec_concatv32qi (which is generated by vec_initv32qiv16qi) at the entrance of the foo() function:
>
>   vldx      $vr3,$r5,$r6
>   vld       $vr2,$r5,0
>   xvpermi.q $xr2,$xr3,0x20
>
> This swaps the high and low 128-bit parts of the vec_initv32qiv16qi result.
>
> Based on other targets' similar implementations and on the LSX implementation of the following RTL representation, the current definition of "vec_concat" in lasx.md is wrong:
>
>   (set (op0) (vec_concat (op1) (op2)))
>
> For correct behavior, the last argument of xvpermi.q should be 0x02 instead of 0x20. This patch fixes this issue and cleans up the vec_concat template implementation.
>
> gcc/ChangeLog:
>
>         * config/loongarch/lasx.md (vec_concatv4di): Delete.
>         (vec_concatv8si): Delete.
>         (vec_concatv16hi): Delete.
>         (vec_concatv32qi): Delete.
>         (vec_concatv4df): Delete.
>         (vec_concatv8sf): Delete.
>         (vec_concat): New template with insn output fixed.
> ---
>  gcc/config/loongarch/lasx.md | 74
>  1 file changed, 7 insertions(+), 67 deletions(-)

diff --git a/gcc/config/loongarch/lasx.md b/gcc/config/loongarch/lasx.md index eeac8cd984b..a9d948bb606 100644 --- a/gcc/config/loongarch/lasx.md +++ b/gcc/config/loongarch/lasx.md @@ -590,77 +590,17 @@ (define_insn "lasx_xvinsgr2vr_" [(set_attr "type" "simd_insert") (set_attr "mode" "")]) -(define_insn "vec_concatv4di" - [(set (match_operand:V4DI 0 "register_operand" "=f") - (vec_concat:V4DI - (match_operand:V2DI 1 "register_operand" "0") - (match_operand:V2DI 2 "register_operand" "f")))] - "ISA_HAS_LASX" -{ - return "xvpermi.q\t%u0,%u2,0x20"; -} - [(set_attr "type" "simd_splat") - (set_attr "mode" "V4DI")]) - -(define_insn "vec_concatv8si" - [(set (match_operand:V8SI 0 "register_operand" "=f") - (vec_concat:V8SI - (match_operand:V4SI 1 "register_operand" "0") - (match_operand:V4SI 2 "register_operand" "f")))] - "ISA_HAS_LASX" -{ - return "xvpermi.q\t%u0,%u2,0x20"; -} - [(set_attr "type" "simd_splat") - (set_attr
"mode" "V4DI")]) - -(define_insn "vec_concatv16hi" - [(set (match_operand:V16HI 0 "register_operand" "=f") - (vec_concat:V16HI - (match_operand:V8HI 1 "register_operand" "0") - (match_operand:V8HI 2 "register_operand" "f")))] - "ISA_HAS_LASX" -{ - return "xvpermi.q\t%u0,%u2,0x20"; -} - [(set_attr "type" "simd_splat") - (set_attr "mode" "V4DI")]) - -(define_insn "vec_concatv32qi" - [(set (match_operand:V32QI 0 "register_operand" "=f") - (vec_concat:V32QI - (match_operand:V16QI 1 "register_operand" "0") - (match_operand:V16QI 2 "register_operand" "f")))] - "ISA_HAS_LASX" -{ - return "xvpermi.q\t%u0,%u2,0x20"; -} - [(set_attr "type" "simd_splat") - (set_attr "mode" "V4DI")]) - -(define_insn "vec_concatv4df" - [(set (match_operand:V4DF 0 "register_operand" "=f") - (vec_concat:V4DF - (match_operand:V2DF 1 "register_operand" "0") - (match_operand:V2DF 2 "register_operand" "f")))] - "ISA_HAS_LASX" -{ - return "xvpermi.q\t%u0,%u2,0x20"; -} - [(set_attr "type" "simd_splat") - (set_attr "mode" "V4DF")]) - -(define_insn "vec_concatv8sf" - [(set (match_operand:V8SF 0 "register_operand" "=f") - (vec_concat:V8SF - (match_operand:V4SF 1 "register_operand" "0") - (match_operand:V4SF 2 "register_operand" "f")))] +(define_insn "vec_concat" + [(set (match_operand:LASX 0 "register_operand" "=f") + (vec_concat:LASX + (match_operand: 1 "register_operand" "0") + (match_operand: 2 "register_operand" "f")))] "ISA_HAS_LASX" { - return "xvpermi.q\t%u0,%u2,0x20"; + return "xvpermi.q\t%u0,%u2,0x02"; } [(set_attr "type" "simd_splat") - (set_attr "mode" "V4DI")]) + (set_attr "mode" "")]) ;; xshuf.w (define_insn "lasx_xvperm_"
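A small C model of the lane selection shows why the immediate matters. The model rests on one assumption: that `xvpermi.q xd, xj, imm` takes the new low 128-bit half from imm[1:0] and the new high half from imm[5:4], choosing among xj.lo (0), xj.hi (1), xd.lo (2) and xd.hi (3). Other immediate bits are ignored. The `xreg` type and both functions are hypothetical. In the template, operand 1 is tied to the destination (constraint "0") and operand 2 is passed as xj. For `vec_concat (op1, op2)` to be correct, op1 must end up in the low half and op2 in the high half.

```c
#include <stdint.h>

/* A 256-bit LASX register modeled as two 128-bit halves; each half is
   an opaque tag so that only the lane movement is visible. */
typedef struct { uint32_t lo, hi; } xreg;

/* Assumed selector encoding: 0 = xj.lo, 1 = xj.hi, 2 = xd.lo, 3 = xd.hi. */
static uint32_t pick_half (xreg xd, xreg xj, unsigned sel)
{
  switch (sel & 3)
    {
    case 0: return xj.lo;
    case 1: return xj.hi;
    case 2: return xd.lo;
    default: return xd.hi;
    }
}

/* imm[1:0] chooses the result's low half, imm[5:4] its high half. */
xreg xvpermi_q (xreg xd, xreg xj, unsigned imm)
{
  xreg r = { pick_half (xd, xj, imm), pick_half (xd, xj, imm >> 4) };
  return r;
}
```

Under this model, with xd holding op1 and xj holding op2, 0x02 produces {op1, op2} while 0x20 produces {op2, op1}. That matches the swapped halves described in the patch.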
Re: [pushed][PATCH v1] LoongArch: Fixed bug in *bstrins__for_ior_mask template.
Pushed to r14-6847.

On 2023/12/25 11:20 AM, Li Wei wrote:
> We found that the latest GCC causes a miscompare error when running the SPEC2006 400.perlbench test with -flto enabled. After testing, we found that only the LoongArch architecture reports errors. Using git bisect, we located the first bad commit as r14-3773-g5b857e87201335.
>
> Debugging showed that the split condition of the *bstrins__for_ior_mask template was empty, whereas it should be consistent with the insn condition.
>
> gcc/ChangeLog:
>
>         * config/loongarch/loongarch.md: Adjust.
> ---
>  gcc/config/loongarch/loongarch.md | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/gcc/config/loongarch/loongarch.md b/gcc/config/loongarch/loongarch.md
> index 7021105b241..2b0609f2f31 100644
> --- a/gcc/config/loongarch/loongarch.md
> +++ b/gcc/config/loongarch/loongarch.md
> @@ -1489,7 +1489,7 @@ (define_insn_and_split "*bstrins__for_ior_mask"
>    "loongarch_pre_reload_split () && \
>     loongarch_use_bstrins_for_ior_with_mask (mode, operands)"
>    "#"
> -  ""
> +  "&& true"
>    [(set (match_dup 0) (match_dup 1))
>     (set (zero_extract:GPR (match_dup 0) (match_dup 2) (match_dup 4))
>          (match_dup 3))]
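The one-line fix depends on how `define_insn_and_split` combines its two conditions. Per the GCC internals manual, a split condition that begins with `&&` is and-ed with the insn condition. An empty split condition puts no condition on the splitter at all, so the splitter is no longer gated by the insn condition. The C below is only a toy model of those two spellings. The enum and function names are hypothetical.

```c
#include <stdbool.h>

/* The two split-condition spellings seen in the patch. */
enum split_cond_spelling
{
  SPLIT_COND_EMPTY,         /* ""        : no condition on the splitter */
  SPLIT_COND_AND_INSN_COND  /* "&& true" : insn condition && true       */
};

/* May the splitter fire, given the current value of the insn condition
   (for example, whether loongarch_pre_reload_split () still holds)? */
bool split_may_fire (enum split_cond_spelling s, bool insn_cond)
{
  switch (s)
    {
    case SPLIT_COND_EMPTY:
      return true;              /* ignores insn_cond entirely */
    case SPLIT_COND_AND_INSN_COND:
      return insn_cond && true; /* inherits the insn condition */
    }
  return false;
}
```

With `"&& true"`, the split can only happen while the pattern's own condition still holds. With `""` it can happen whenever the RTL matches.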