Re: [PATCH v8] RISC-V: Add the 'zfa' extension, version 0.2.

2023-05-06 Thread Jin Ma via Gcc-patches




On 4/19/23 03:57, Jin Ma wrote:
> This patch adds the 'Zfa' extension for riscv, which is based on:
>https://github.com/riscv/riscv-isa-manual/commits/zfb
>
https://github.com/riscv/riscv-isa-manual/commit/1f038182810727f5feca311072e630d6baac51da
> 
> The binutils-gdb for 'Zfa' extension:

>https://github.com/a4lg/binutils-gdb/commits/riscv-zfa
> 
> What needs special explanation is:

> 1, The immediate number of the instructions FLI.H/S/D is represented in the assembly as a
>floating-point value, in scientific notation when rs1 is 1,2, and as decimal numbers for
>the rest.
> 
>Related llvm link:

>  https://reviews.llvm.org/D145645
>Related discussion link:
>  https://github.com/riscv/riscv-isa-manual/issues/980
Right.  I think the goal right now is to get the bulk of this reviewed.
Ideally we'll get to the point where the only outstanding issue is 
the interface between the assembler & gcc.


I will send a new version referring to the latest binutils(v5) in the near 
future:
https://sourceware.org/pipermail/binutils/2023-April/127060.html



> 
> 2, According to riscv-spec, "The FCVTMOD.W.D instruction was added principally to
>accelerate the processing of JavaScript Numbers.", so it seems that no implementation
>is required.
Fair enough.  There seems to be a general desire to wire up builtins 
for many things that aren't directly usable by the compiler.  So 
consider such a change as a follow-up.   I don't think something like 
this should hold up the bulk of Zfa.


> 
> 3, The instructions FMINM and FMAXM correspond to C23 library function fminimum and fmaximum.

>Therefore, this patch has simply implemented the pattern of 
fminm3 and
>fmaxm3 to prepare for later.
Sounds good.
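
For readers unfamiliar with the C23 functions mentioned in point 3, here is a minimal reference sketch of the semantics that fminimum and fmaximum are expected to have (NaN propagates, and -0.0 orders below +0.0). The names ref_fminimum and ref_fmaximum are hypothetical and are not part of the patch or of any library.

```c
#include <math.h>

/* Hypothetical reference model of C23 fminimum: any NaN input gives NaN,
   and -0.0 is treated as smaller than +0.0.  */
double ref_fminimum(double a, double b)
{
    if (isnan(a) || isnan(b))
        return NAN;
    if (a == 0.0 && b == 0.0)
        return signbit(a) ? a : b;   /* prefer the negative zero */
    return a < b ? a : b;
}

/* Hypothetical reference model of C23 fmaximum, the mirror image.  */
double ref_fmaximum(double a, double b)
{
    if (isnan(a) || isnan(b))
        return NAN;
    if (a == 0.0 && b == 0.0)
        return signbit(a) ? b : a;   /* prefer the positive zero */
    return a > b ? a : b;
}
```

The difference from the older fmin/fmax is the NaN handling: those return the non-NaN operand when only one input is NaN.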


> 
> gcc/ChangeLog:
> 
> 	* common/config/riscv/riscv-common.cc: Add zfa extension version.

>* config/riscv/constraints.md (Zf): Constrain the floating point number 
that the
>instructions FLI.H/S/D can load.
>((TARGET_XTHEADFMV || TARGET_ZFA) ? FP_REGS : NO_REGS): enable FMVP.D.X 
and FMVH.X.D.
>* config/riscv/iterators.md (ceil): New.
>* config/riscv/riscv-protos.h (riscv_float_const_rtx_index_for_fli): New.
>* config/riscv/riscv.cc (find_index_in_array): New.
>(riscv_float_const_rtx_index_for_fli): Get the index of the floating-point 
number that
>the instructions FLI.H/S/D can mov.
>(riscv_cannot_force_const_mem): If instruction FLI.H/S/D can be used, 
memory is not applicable.
>(riscv_const_insns): The cost of FLI.H/S/D is 3.
>(riscv_legitimize_const_move): Likewise.
>(riscv_split_64bit_move_p): If instruction FLI.H/S/D can be used, no split 
is required.
>(riscv_output_move): Output the mov instructions in zfa extension.
>(riscv_print_operand): Output the floating-point value of the FLI.H/S/D 
immediate in assembly
>(riscv_secondary_memory_needed): Likewise.
>* config/riscv/riscv.h (GP_REG_RTX_P): New.
>* config/riscv/riscv.md (fminm3): New.
> 


> index c448e6b37e9..62d9094f966 100644
> --- a/gcc/config/riscv/constraints.md
> +++ b/gcc/config/riscv/constraints.md
> @@ -118,6 +118,13 @@ (define_constraint "T"
> (and (match_operand 0 "move_operand")
>  (match_test "CONSTANT_P (op)")))
>   
> +;; Zfa constraints.

> +
> +(define_constraint "Zf"
> +  "A floating point number that can be loaded using instruction `fli` in 
zfa."
> +  (and (match_code "const_double")
> +   (match_test "(riscv_float_const_rtx_index_for_fli (op) != -1)")))
> +
>   ;; Vector constraints.
>   
>   (define_register_constraint "vr" "TARGET_VECTOR ? V_REGS : NO_REGS"

> @@ -183,8 +190,8 @@ (define_memory_constraint "Wdm"
>   
>   ;; Vendor ISA extension constraints.
>   
> -(define_register_constraint "th_f_fmv" "TARGET_XTHEADFMV ? FP_REGS : NO_REGS"

> +(define_register_constraint "th_f_fmv" "(TARGET_XTHEADFMV || TARGET_ZFA) ? 
FP_REGS : NO_REGS"
> "A floating-point register for XTheadFmv.")
>   
> -(define_register_constraint "th_r_fmv" "TARGET_XTHEADFMV ? GR_REGS : NO_REGS"

> +(define_register_constraint "th_r_fmv" "(TARGET_XTHEADFMV || TARGET_ZFA) ? 
GR_REGS : NO_REGS"
> "An integer register for XTheadFmv.")
I think Christoph had good suggestions on the constraints.  So let's go 
with his suggestions.


You might consider a follow-up patch where you use negation of one of 
the predefined constants for synthesis.  I would not be surprised at all 
if that's as efficient on some cores as loading the negated constants 
out of the constant pool.  But I don't think it has to be a part of this 
patch.
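
To make the follow-up idea concrete, here is a rough sketch of the decision it implies: load the constant directly if it is in the FLI table, otherwise check whether its negation is in the table and synthesize it with an extra negate. The table below is only an illustrative subset, and fli_index, fli_classify and the enum names are hypothetical; this is not the code in the patch.

```c
#include <stddef.h>

/* Illustrative subset only; the real FLI table is defined by the Zfa
   specification and is not reproduced here.  */
static const double fli_subset[] = { 0.25, 0.5, 1.0, 2.0, 4.0 };

enum fli_plan { FLI_NONE = -1, FLI_DIRECT = 0, FLI_THEN_FNEG = 1 };

/* Return the index of V in the table, or -1 if it is absent.  */
int fli_index(double v)
{
    for (size_t i = 0; i < sizeof fli_subset / sizeof fli_subset[0]; i++)
        if (fli_subset[i] == v)
            return (int) i;
    return -1;
}

/* Decide how V could be materialized: directly, via its negation,
   or not at all (fall back to the constant pool).  */
enum fli_plan fli_classify(double v)
{
    if (fli_index(v) != -1)
        return FLI_DIRECT;
    if (fli_index(-v) != -1)
        return FLI_THEN_FNEG;
    return FLI_NONE;
}
```

Whether the two-instruction sequence beats a constant-pool load is core-dependent, as noted above, so a real implementation would presumably gate this on costs.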




I also think Christoph is right, and I will revise it according to his 
suggestion.





> diff --git a/gcc/config/riscv/iterators.md b/gcc/config/riscv/iterators.md
> index 9b767038452..c81b08e3cc5 100644
> --- a/gcc/config/riscv/iterators.md
> +++ b/gcc/config/riscv/iterators.md
> @@ -288,3 +288,8 @@ (define_int_iterator QUIET_COMPARISON [UNSPEC_FLT_QUIET 
UNSPEC_FLE_QUIET])
>   (define_in

[PATCH 0/2] RISC-V: support Zcmp extension

2023-05-06 Thread Fei Gao
Before implementing Zcmp, I did some optimizations and restructures to 
save-restore.
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=a5b2a3bff8152aa34408d8ce40add82f4d22ff87
https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=60524be1e3929d83e15fceac6e2aa053c8a6fb20
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=a782346757c54a5a3cfb9f416a7ebe3554a617d7

Then Zcmp can share the same logic as save-restore in stack allocation:
pre-allocation by cm.push, step 1 and step 2.

Please note that cm.push pushes ra, s0-s11 in the reverse order of what
save-restore does, so the .cfi directives have been adapted in my patch.
A discussion can be found here:
https://github.com/riscv/riscv-code-size-reduction/issues/182

A few weeks ago, Jiawei also posted Zcmp support in
https://gcc.gnu.org/pipermail/gcc-patches/2023-April/615287.html:
[PATCH 0/5] RISC-V: Support ZC* extensions.   Jiawei
[PATCH 1/5] RISC-V: Minimal support for ZC extensions.   Jiawei
[PATCH 2/5] RISC-V: Enable compressible features when use ZC* extensions.   Jiawei
[PATCH 3/5] RISC-V: Add ZC* test for march args being passed.   Jiawei
[PATCH 4/5] RISC-V: Add Zcmp extension supports.   Jiawei
[PATCH 5/5] RISC-V: Add ZCMP push/pop testcases.   Jiawei

I tested his code and observed some issues in [PATCH 4/5],
so I plan to post my code as an alternative to Jiawei's [PATCH 4/5].

My Zcmp switch code is almost the same as Jiawei's,
so I avoid repeating it in my patch series,
but please pick up Jiawei's [PATCH 1/5] before picking up my patch series.


Here are some comparisons.
The left side is the reference (REF) output from Jiawei's patch and the right side is from my patch.

1. REF fails to generate zcmp insns.
TC rv32e_zcmp.c
foo:                                    foo:
        addi    sp,sp,-12                       cm.push {ra}, -16
        sw      ra,8(sp)
        call    f1                              call    f1
        lw      ra,8(sp)                        cm.pop  {ra}, 16
        addi    sp,sp,12
        tail    f2                              tail    f2

2. REF fails to restore regs.
TC rv32i_zcmp.c
test_f0:                                test_f0:
        cm.push {ra,s0},-32                     cm.push {ra, s0}, -32
        fsw     fs0,12(sp)                      fsw     fs0,12(sp)
        call    my_getchar                      call    my_getchar
        mv      s0,a0                           mv      s0,a0
        call    getf                            call    getf
        fmv.s   fs0,fa0                         fmv.s   fs0,fa0
        call    my_getchar                      call    my_getchar
        fcvt.s.w fa5,s0                         fcvt.s.w fa5,s0
        fcvt.s.w fa4,a0                         fcvt.s.w fa4,a0
        fadd.s  fa0,fa5,fs0                     fadd.s  fa0,fa5,fs0
        flw     fs0,-20(sp) //issue in restoring fs0
                                                flw     fs0,12(sp)
        fadd.s  fa0,fa0,fa4                     fadd.s  fa0,fa0,fa4
        fcvt.w.s a0,fa0,rtz                     fcvt.w.s a0,fa0,rtz
        cm.popret {ra,s0},32                    cm.popret {ra, s0}, 32

3. REF accesses incorrect address of incoming para.
TC: zcmp_stack_alignment.c
fool_rv32e:                             fool_rv32e:
        cm.push {ra,s0-s1},-32                  cm.push {ra, s0-s1}, -32
                                                mv      s0,a0
        sw      a1,12(sp)                       sw      a1

[PATCH 1/2] [RISC-V] disable shrink-wrap-separate if zcmp enabled.

2023-05-06 Thread Fei Gao
zcmp aims to reduce code size, while shrink-wrap-separate prefers
speed to code size. So disable shrink-wrap-separate if zcmp
enabled, just like what save-restore has done.

author: Zhangjin Liao liaozhang...@eswincomputing.com

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_get_separate_components): Return early
when Zcmp is enabled.
---
 gcc/config/riscv/riscv.cc | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 45a63cab9c9..629e5e45cac 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -5729,7 +5729,8 @@ riscv_get_separate_components (void)
 
   if (riscv_use_save_libcall (&cfun->machine->frame)
   || cfun->machine->interrupt_handler_p
-  || !cfun->machine->frame.gp_sp_offset.is_constant ())
+  || !cfun->machine->frame.gp_sp_offset.is_constant ()
+  || TARGET_ZCMP)
 return components;
 
   offset = cfun->machine->frame.gp_sp_offset.to_constant ();
-- 
2.17.1



[PATCH 2/2] [RISC-V] support cm.push cm.pop cm.popret in zcmp

2023-05-06 Thread Fei Gao
Zcmp can share the same logic as save-restore in stack allocation:
pre-allocation by cm.push, step 1 and step 2.

Please note that cm.push pushes ra, s0-s11 in the reverse order of what
save-restore does, so the .cfi directives have been adapted in my patch.

gcc/ChangeLog:

* config/riscv/predicates.md (gpr_multi_push_operation): predicates for 
cm.push
* config/riscv/riscv-protos.h (riscv_output_gpr_multi_push_pop): 
declaration
(riscv_gpr_multi_push_operation_p): likewise
(riscv_gen_multi_push_insn):likewise
* config/riscv/riscv.cc (struct riscv_frame_info): add zcmp info
(GPR_SAVE_REG_ORDER_SKIP_T0T1): skip t0 t1
(riscv_output_gpr_multi_push_pop): output zcmp asm
(riscv_avoid_multi_push): helper function of riscv_use_multi_push
(riscv_use_multi_push): true if multi push is used
(riscv_multi_push_sregs_count): num of sregs in multi-push
(riscv_calculate_rlist): calculate rlist based on frame mask
(riscv_16bytes_align): align to 16 bytes
(riscv_stack_align): moved to a better place 
(riscv_save_libcall_count): no change
(riscv_compute_frame_info): add zcmp frame info
(riscv_adjust_multi_push_cfi_prologue): adjust cfi for cm.push
(riscv_expand_prologue): pre-allocate stack by cm.push
(riscv_adjust_multi_pop_cfi_epilogue): adjust cfi for cm.pop[ret]
(riscv_expand_epilogue): deallocate stack by cm.pop[ret]
(riscv_gen_multi_push_insn): generate rtl for cm.push
(riscv_gpr_multi_push_operation_p): true if pattern matches cm.push
* config/riscv/riscv.h (RETURN_ADDR_MASK): mask of ra
(S0_MASK): likewise
(S1_MASK): likewise
(S2_MASK): likewise
(S3_MASK): likewise
(S4_MASK): likewise
(S5_MASK): likewise
(S6_MASK): likewise
(S7_MASK): likewise
(S8_MASK): likewise
(S9_MASK): likewise
(S10_MASK): likewise
(S11_MASK): likewise
(MULTI_PUSH_GPR_MASK): GPR_MASK that cm.push can cover at most
(ZCMP_MAX_SPIMM): max spimm value
(ZCMP_SP_INC_STEP): zcmp sp increment step
(ZCMP_MAX_RLIST): max rlist value
(ZCMP_MIN_RLIST): min rlist value
(ZCMP_INVALID_S0S10_SREGS_COUNTS): num of s0-s10
(ZCMP_S0S11_SREGS_COUNTS): num of s0-s11
(ZCMP_REG_LIST_RA_S0S11): rlist of ra,s0-s11
(ZCMP_RLIST_OFFSET_TO_SREGS_COUNTS): offset rlist to sregs num
* config/riscv/riscv.md: include zc.md
* config/riscv/zc.md: New file.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rv32e_zcmp.c: New test.
* gcc.target/riscv/rv32i_zcmp.c: New test.
* gcc.target/riscv/zcmp_stack_alignment.c: New test.
---
 gcc/config/riscv/predicates.md|   6 +
 gcc/config/riscv/riscv-protos.h   |   3 +
 gcc/config/riscv/riscv.cc | 400 --
 gcc/config/riscv/riscv.h  |  26 ++
 gcc/config/riscv/riscv.md |   7 +
 gcc/config/riscv/zc.md|  55 +++
 gcc/testsuite/gcc.target/riscv/rv32e_zcmp.c   | 239 +++
 gcc/testsuite/gcc.target/riscv/rv32i_zcmp.c   | 239 +++
 .../gcc.target/riscv/zcmp_stack_alignment.c   |  23 +
 9 files changed, 958 insertions(+), 40 deletions(-)
 create mode 100644 gcc/config/riscv/zc.md
 create mode 100644 gcc/testsuite/gcc.target/riscv/rv32e_zcmp.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rv32i_zcmp.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zcmp_stack_alignment.c

diff --git a/gcc/config/riscv/predicates.md b/gcc/config/riscv/predicates.md
index e5adf06fa25..4d3d95f56f0 100644
--- a/gcc/config/riscv/predicates.md
+++ b/gcc/config/riscv/predicates.md
@@ -226,6 +226,12 @@
   return riscv_gpr_save_operation_p (op);
 })
 
+(define_special_predicate "gpr_multi_push_operation"
+  (match_code "parallel")
+{
+  return riscv_gpr_multi_push_operation_p (op);
+})
+
 ;; Predicates for the ZBS extension.
 (define_predicate "single_bit_mask_operand"
   (and (match_code "const_int")
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 7760a9cac8d..7d269d7e837 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -56,6 +56,9 @@ extern bool riscv_split_64bit_move_p (rtx, rtx);
 extern void riscv_split_doubleword_move (rtx, rtx);
 extern const char *riscv_output_move (rtx, rtx);
 extern const char *riscv_output_return ();
+extern void riscv_output_gpr_multi_push_pop (const char *, bool, rtx, rtx);
+extern bool riscv_gpr_multi_push_operation_p (rtx);
+extern rtx riscv_gen_multi_push_insn (struct riscv_frame_info *, int, int);
 
 #ifdef RTX_CODE
 extern void riscv_expand_int_scc (rtx, enum rtx_code, rtx, rtx);
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 629e5e45cac..98482e5eef1 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -1

Re: Re: [PATCH 4/5] RISC-V: Add Zcmp extension supports.

2023-05-06 Thread Fei Gao
On 2023-05-05 23:57  Sinan  wrote:
>
>> hi Jiawei
>>
>> Please ignore my previous reply. I accidentally sent the email before I 
>> finished it.
>> Sorry for that!
>>
>> I downloaded the series of patches from you and found in some cases
>> it fails to generate zcmp push and pop insns.
>>
>> TC:
>>
>> char my_getchar();
>> int test_s0()
>> {
>>
>> int a = my_getchar();
>> int b = my_getchar();
>> return a+b;
>> }
>>
>> cc1 -fno-shrink-wrap-separate -O2 -march=rv32e_zca_zcmp -mabi=ilp32e 
>> -mcmodel=medlow test.c
>>
>> -fno-shrink-wrap-separate is used here to avoid the impact from 
>> shrink-wrap-separate that is by default
>> enabled in O2.
>>
>> As I'm also interested in Zc*, I made some changes, mainly in the prologue and 
>> epilogue pass, quite similar to
>> what has been done for save and restore, except for the CFI directives, because
>> zcmp pushes and pops ra and the s regs in the reverse order of what save and
>> restore do.
>>
>> I will refine and share the code soon for your review.
>>
>> BR
>> Fei
>Hi Fei,
>In the current implementation, cm.push will not increase the original 
>adjustment size of the stack pointer. As cm.push uses a minimum adjustment 
>size of 16, and in your example, the adjustment size of sp is 12, so cm.push 
>will not be generated.
>You can find the check in riscv_use_push_pop:
>> > + */
>> > + if (base_size > frame_size)
>> > + return false;
>> > +
>And if this check is removed, then you can get the output that you expect.
>```
> cm.push {ra,s0},-16
> call my_getchar
> mv s0,a0
> call my_getchar
> add a0,s0,a0
> cm.popret {ra,s0},16
>```
>In many scenarios of rv32e, cm.push cannot be generated as a result. Perhaps 
>we can remove this check? I haven't tested if it is ok to remove this check, 
>and CC jiawei to help test it.
>BR,
>Sinan 
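
The check quoted above can be modeled roughly as follows. This is only a minimal sketch under the assumption that the cm.push adjustment is the saved-register area rounded up to 16 bytes; push_allowed and grown_frame_size are hypothetical helper names, not functions from either patch series.

```c
#include <stdbool.h>

/* Round N up to the next multiple of 16 (assumed cm.push granularity).  */
static int round_up_16(int n)
{
    return (n + 15) & ~15;
}

/* Hypothetical model of the quoted check: cm.push is rejected when its
   smallest possible adjustment exceeds the frame the function needs.  */
bool push_allowed(int saved_regs, int reg_bytes, int frame_size)
{
    int base_size = round_up_16(saved_regs * reg_bytes);
    return base_size <= frame_size;
}

/* Hypothetical alternative with the check removed: grow the frame to
   whatever cm.push needs instead of giving up on it.  */
int grown_frame_size(int saved_regs, int reg_bytes, int frame_size)
{
    int base_size = round_up_16(saved_regs * reg_bytes);
    return frame_size > base_size ? round_up_16(frame_size) : base_size;
}
```

With the test case from this thread (ra and s0 saved, 4-byte registers, a 12-byte frame), push_allowed returns false, while grown_frame_size gives 16, which matches the `cm.push {ra,s0},-16` output shown above.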

hi Sinan

Thanks for your reply. 
I posted my code at 
https://www.mail-archive.com/gcc-patches@gcc.gnu.org/msg306921.html
In the cover letter, I did some comparisons. 
Could you please review?

Thanks & BR, 
Fei

>--
>Sender:Fei Gao 
>Sent At:2023 Apr. 25 (Tue.) 18:12
>Recipient:jiawei 
>Cc:gcc-patches 
>Subject:[PATCH 4/5] RISC-V: Add Zcmp extension supports.
>hi Jiawei
>Please ignore my previous reply. I accidentally sent the email before I finished 
>it.
>Sorry for that!
>I downloaded the series of patches from you and found in some cases
>it fails to generate zcmp push and pop insns.
>TC:
>char my_getchar();
>int test_s0()
>{
> int a = my_getchar();
> int b = my_getchar();
> return a+b;
>}
>cc1 -fno-shrink-wrap-separate -O2 -march=rv32e_zca_zcmp -mabi=ilp32e 
>-mcmodel=medlow test.c
>-fno-shrink-wrap-separate is used here to avoid the impact from 
>shrink-wrap-separate that is by default
>enabled in O2.
>As I'm also interested in Zc*, I made some changes, mainly in the prologue and 
>epilogue pass, quite similar to
>what has been done for save and restore, except for the CFI directives, because
>zcmp pushes and pops ra and the s regs in the reverse order of what save and
>restore do.
>I will refine and share the code soon for your review.
>BR
>Fei
>On Thu Apr 6 06:21:17 GMT 2023 Jiawei jia...@iscas.ac.cn wrote:
>>
>>Add Zcmp extension instructions support. Generate push/pop
>>with the following steps:
>>
>> 1. preprocessing:
>> 1.1. if there is no push rtx, then just return. e.g.
>> (note 5 1 22 2 [bb 2] NOTE_INSN_BASIC_BLOCK)
>> (insn/f 22 5 23 2 (set (reg/f:SI 2 sp)
>> (plus:SI (reg/f:SI 2 sp)
>> (const_int -32 [0xffe0])))
>> (nil))
>> (note 23 22 2 2 NOTE_INSN_PROLOGUE_END)
>> 1.2. if push rtx exists, then we compute the number of
>> pushed s-registers, n_sreg.
>>
>> push rtx should be found before the NOTE_INSN_PROLOGUE_END tag
>>
>> [2 and 3 happen simultaneously]
>>
>> 2. find valid move pattern, mv sN, aN, where N < n_sreg,
>> and aN is not used the move pattern, and sN is not
>> defined before the move pattern (from prologue to the
>> position of move pattern).
>>
>> 3. analyze the use and reach of every instruction from the prologue
>> to the position of the move pattern.
>> if any sN is used, then we mark the corresponding argument list
>> candidate as invalid.
>> e.g.
>> push {ra,s0-s3}, {}, -32
>> sw s0,44(sp) # s0 is used, then argument list is invalid
>> mv a0,a5 # a0 is defined, then argument list is invalid
>> ...
>> mv s0,a0
>> mv s1,a1
>> mv s2,a2
>>
>> 4. if there is a valid argument list, then replace the pop
>> push parallel insn, and delete mv pattern.
>> if not, skip.
>>
>>All "zcmpe" means Zcmp with RVE extension.
>>The push/pop instruction implementation was mostly done by Sinan Lin.
>>
>>Co-Authored by: Sinan Lin 
>>Co-Authored by: Simon Cook 
>>Co-Authored by: Shihua Liao 
>>
>>gcc/ChangeLog:
>>
>> * config.gcc: New object.
>> * config/riscv/predicates.md (riscv_stack_push_operation):
>> New predicate.
>> (riscv_stack_pop_operation): Ditto.
>> (pop_return_value_constant): Ditto.
>> * config/riscv/riscv-passes.def (INSERT_PASS_AFTER): New pass.
>> * config/riscv/riscv-protos.h (riscv_output_popr

[PATCH] RISC-V: Optimize vsetvli of LCM INSERTED edge for user vsetvli [PR 109743]

2023-05-06 Thread juzhe . zhong
From: Juzhe-Zhong 

This patch fixes https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109743.

This issue happens because we are currently very conservative in the
optimization of user vsetvli.

Consider the following case:

bb 1:
  vsetvli a5,a4... (demand AVL = a4).
bb 2:
  RVV insn use a5 (demand AVL = a5).

LCM will hoist the vsetvl of bb 2 into bb 1.
We don't do AVL propagation for this situation since it is complicated: we
would have to analyze the code sequence between the vsetvli in bb 1 and the
RVV insn in bb 2, and they are not necessarily consecutive blocks.

This patch does the optimization after LCM: we check the vsetvli on each
LCM-inserted edge and eliminate it if it is redundant. This approach is much
simpler and safer.
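
As a rough illustration of the redundancy test described above, the sketch below models a vsetvli with just four fields and treats the inserted one as redundant when its AVL is the VL result of the reaching vsetvli and SEW/LMUL are unchanged. The struct and function names are hypothetical, tail/mask policy is ignored, and this is much simpler than the vector_insn_info logic in riscv-vsetvl.cc.

```c
#include <stdbool.h>

/* Hypothetical, simplified view of a vsetvli; not GCC's vector_insn_info.  */
struct vsetvl_model {
    int vl_dest;   /* register receiving VL, 0 for "zero" */
    int avl_src;   /* register supplying AVL */
    int sew;       /* element width in bits */
    int lmul;      /* register group multiplier */
};

/* The inserted vsetvli is considered redundant when it only re-applies
   the VL that the reaching vsetvli already computed, with the same
   SEW and LMUL (so VLMAX is unchanged).  */
bool inserted_vsetvl_redundant(const struct vsetvl_model *pred,
                               const struct vsetvl_model *inserted)
{
    return pred->vl_dest != 0
           && inserted->avl_src == pred->vl_dest
           && inserted->sew == pred->sew
           && inserted->lmul == pred->lmul;
}
```

In the example from this mail, the predecessor is `vsetvli a5,a2,e32,m1` and the inserted instruction is `vsetvli zero,a5,e32,m1`, so the predicate would hold.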

code:
void
foo2 (int32_t *a, int32_t *b, int n)
{
  if (n <= 0)
    return;
  int i = n;
  size_t vl = __riscv_vsetvl_e32m1 (i);

  for (; i >= 0; i--)
    {
      vint32m1_t v = __riscv_vle32_v_i32m1 (a, vl);
      __riscv_vse32_v_i32m1 (b, v, vl);

      if (i >= vl)
        continue;

      if (i == 0)
        return;

      vl = __riscv_vsetvl_e32m1 (i);
    }
}

Before this patch:
foo2:
.LFB2:
        .cfi_startproc
        ble     a2,zero,.L1
        mv      a4,a2
        li      a3,-1
        vsetvli a5,a2,e32,m1,ta,mu
        vsetvli zero,a5,e32,m1,ta,ma  <- can be eliminated.
.L5:
        vle32.v v1,0(a0)
        vse32.v v1,0(a1)
        bgeu    a4,a5,.L3
.L10:
        beq     a2,zero,.L1
        vsetvli a5,a4,e32,m1,ta,mu
        addi    a4,a4,-1
        vsetvli zero,a5,e32,m1,ta,ma  <- can be eliminated.
        vle32.v v1,0(a0)
        vse32.v v1,0(a1)
        addiw   a2,a2,-1
        bltu    a4,a5,.L10
.L3:
        addiw   a2,a2,-1
        addi    a4,a4,-1
        bne     a2,a3,.L5
.L1:
        ret

After this patch:
f:
        ble     a2,zero,.L1
        mv      a4,a2
        li      a3,-1
        vsetvli a5,a2,e32,m1,ta,ma
.L5:
        vle32.v v1,0(a0)
        vse32.v v1,0(a1)
        bgeu    a4,a5,.L3
.L10:
        beq     a2,zero,.L1
        vsetvli a5,a4,e32,m1,ta,ma
        addi    a4,a4,-1
        vle32.v v1,0(a0)
        vse32.v v1,0(a1)
        addiw   a2,a2,-1
        bltu    a4,a5,.L10
.L3:
        addiw   a2,a2,-1
        addi    a4,a4,-1
        bne     a2,a3,.L5
.L1:
        ret

PR target/109743

gcc/ChangeLog:

* config/riscv/riscv-vsetvl.cc (pass_vsetvl::commit_vsetvls): Add 
optimization for LCM inserted edge.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/vsetvl/pr109743-1.c: New test.
* gcc.target/riscv/rvv/vsetvl/pr109743-2.c: New test.
* gcc.target/riscv/rvv/vsetvl/pr109743-3.c: New test.
* gcc.target/riscv/rvv/vsetvl/pr109743-4.c: New test.

---
 gcc/config/riscv/riscv-vsetvl.cc  | 42 +++
 .../gcc.target/riscv/rvv/vsetvl/pr109743-1.c  | 26 
 .../gcc.target/riscv/rvv/vsetvl/pr109743-2.c  | 27 
 .../gcc.target/riscv/rvv/vsetvl/pr109743-3.c  | 28 +
 .../gcc.target/riscv/rvv/vsetvl/pr109743-4.c  | 28 +
 5 files changed, 151 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr109743-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr109743-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr109743-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr109743-4.c

diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
index f55907a410e..fcee7fdf323 100644
--- a/gcc/config/riscv/riscv-vsetvl.cc
+++ b/gcc/config/riscv/riscv-vsetvl.cc
@@ -3834,6 +3834,48 @@ pass_vsetvl::commit_vsetvls (void)
  const vector_insn_info *require
= m_vector_manager->vector_exprs[i];
  gcc_assert (require->valid_or_dirty_p ());
+
+ /* Here we optimize the VSETVL is hoisted by LCM:
+
+Before LCM:
+  bb 1:
+vsetvli a5,a2,e32,m1,ta,mu
+  bb 2:
+vsetvli zero,a5,e32,m1,ta,mu
+...
+
+After LCM:
+  bb 1:
+vsetvli a5,a2,e32,m1,ta,mu
+LCM INSERTED: vsetvli zero,a5,e32,m1,ta,mu --> eliminate
+  bb 2:
+...
+  */
+ const basic_block pred_cfg_bb = eg->src;
+ const auto block_info
+   = m_vector_manager->vector_block_infos[pred_cfg_bb->index];
+ const insn_info *pred_insn = block_info.reaching_out.get_insn ();
+ if (pred_insn && vsetvl_insn_p (pred_insn->rtl ())
+ && require->get_avl_source ()
+ && require->get_avl_source ()->insn ()
+ && require->skip_avl_compatible_p (block_info.reaching_out))
+   {
+ vector_insn_info new_info = *require;
+ new_info.set_avl_info (
+   block_info.reaching_out.get_avl_info ());
+ new_info
+ 

[PATCH] hurd: Add multilib paths for gnu-x86_64

2023-05-06 Thread Samuel Thibault via Gcc-patches
We need the multilib paths in gcc to find e.g. glibc crt files on
Debian.  This is essentially based on t-linux64 version.

gcc/ChangeLog:

* gcc/config/i386/t-gnu64: New file.
* gcc/config.gcc (x86_64-*-gnu*): Add i386/t-gnu64 to
tmake_file.

diff --git a/gcc/config.gcc b/gcc/config.gcc
index 671c7e3b018..6b1939b9f09 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -5828,6 +5828,9 @@ case ${target} in
visium-*-*)
target_cpu_default2="TARGET_CPU_$with_cpu"
;;
+   x86_64-*-gnu*)
+   tmake_file="$tmake_file i386/t-gnu64"
+   ;;
 esac
 
 t=
diff --git a/gcc/config/i386/t-gnu64 b/gcc/config/i386/t-gnu64
index e69de29bb2d..23ee6823d65 100644
--- a/gcc/config/i386/t-gnu64
+++ b/gcc/config/i386/t-gnu64
@@ -0,0 +1,38 @@
+# Copyright (C) 2002-2023 Free Software Foundation, Inc.
+#
+# This file is part of GCC.
+#
+# GCC is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3, or (at your option)
+# any later version.
+#
+# GCC is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with GCC; see the file COPYING3.  If not see
+# <http://www.gnu.org/licenses/>.
+
+# On Debian, Ubuntu and other derivative distributions, the 32bit libraries
+# are found in /lib32 and /usr/lib32, /lib64 and /usr/lib64 are symlinks to
+# /lib and /usr/lib, while other distributions install libraries into /lib64
+# and /usr/lib64.  The LSB does not enforce the use of /lib64 and /usr/lib64,
+# it doesn't tell anything about the 32bit libraries on those systems.  Set
+# MULTILIB_OSDIRNAMES according to what is found on the target.
+
+# To support i386, x86-64 and x32 libraries, the directory structrue
+# should be:
+#
+#  /lib has i386 libraries.
+#  /lib64 has x86-64 libraries.
+#  /libx32 has x32 libraries.
+#
+comma=,
+MULTILIB_OPTIONS= $(subst $(comma),/,$(TM_MULTILIB_CONFIG))
+MULTILIB_DIRNAMES   = $(patsubst m%, %, $(subst /, ,$(MULTILIB_OPTIONS)))
+MULTILIB_OSDIRNAMES = m64=../lib64$(call if_multiarch,:x86_64-gnu)
+MULTILIB_OSDIRNAMES+= m32=$(if $(wildcard $(shell echo 
$(SYSTEM_HEADER_DIR))/../../usr/lib32),../lib32,../lib)$(call 
if_multiarch,:i386-gnu)
+MULTILIB_OSDIRNAMES+= mx32=../libx32$(call if_multiarch,:x86_64-gnux32)


[PATCH] hurd: Add default-pie and static-pie support

2023-05-06 Thread Samuel Thibault via Gcc-patches
This fixes the Hurd spec in the default-pie case, and adds static-pie
support.

gcc/ChangeLog:

* gcc/config/i386/gnu.h: Use PIE_SPEC, add static-pie case.
* gcc/config/i386/gnu64.h: Use PIE_SPEC, add static-pie case.

diff --git a/gcc/config/i386/gnu.h b/gcc/config/i386/gnu.h
index 8dc6d9ee4e3..e776144f96c 100644
--- a/gcc/config/i386/gnu.h
+++ b/gcc/config/i386/gnu.h
@@ -27,12 +27,12 @@ along with GCC.  If not, see .
 #undef STARTFILE_SPEC
 #if defined HAVE_LD_PIE
 #define STARTFILE_SPEC \
-  "%{!shared: 
%{pg|p|profile:%{static:gcrt0.o%s;:gcrt1.o%s};pie:Scrt1.o%s;static:crt0.o%s;:crt1.o%s}}
 \
-   crti.o%s %{static:crtbeginT.o%s;shared|pie:crtbeginS.o%s;:crtbegin.o%s}"
+  "%{!shared: 
%{pg|p|profile:%{static-pie:grcrt0.o%s;static:gcrt0.o%s;:gcrt1.o%s};static-pie:rcrt0.o%s;static:crt0.o%s;"
 PIE_SPEC ":Scrt1.o%s;:crt1.o%s}} \
+   crti.o%s %{static:crtbeginT.o%s;shared|static-pie|" PIE_SPEC 
":crtbeginS.o%s;:crtbegin.o%s}"
 #else
 #define STARTFILE_SPEC \
   "%{!shared: 
%{pg|p|profile:%{static:gcrt0.o%s;:gcrt1.o%s};static:crt0.o%s;:crt1.o%s}} \
-   crti.o%s %{static:crtbeginT.o%s;shared|pie:crtbeginS.o%s;:crtbegin.o%s}"
+   crti.o%s %{static:crtbeginT.o%s;shared:crtbeginS.o%s;:crtbegin.o%s}"
 #endif
 
 #ifdef TARGET_LIBC_PROVIDES_SSP
diff --git a/gcc/config/i386/gnu64.h b/gcc/config/i386/gnu64.h
index a411f0e802a..332372fa067 100644
--- a/gcc/config/i386/gnu64.h
+++ b/gcc/config/i386/gnu64.h
@@ -31,10 +31,10 @@ along with GCC.  If not, see .
 #undef STARTFILE_SPEC
 #if defined HAVE_LD_PIE
 #define STARTFILE_SPEC \
-  "%{!shared: 
%{pg|p|profile:%{static:gcrt0.o%s;:gcrt1.o%s};pie:Scrt1.o%s;static:crt0.o%s;:crt1.o%s}}
 \
-   crti.o%s %{static:crtbeginT.o%s;shared|pie:crtbeginS.o%s;:crtbegin.o%s}"
+  "%{!shared: 
%{pg|p|profile:%{static-pie:grcrt0.o%s;static:gcrt0.o%s;:gcrt1.o%s};static-pie:rcrt0.o%s;static:crt0.o%s;"
 PIE_SPEC ":Scrt1.o%s;:crt1.o%s}} \
+   crti.o%s %{static:crtbeginT.o%s;shared|static-pie|" PIE_SPEC 
":crtbeginS.o%s;:crtbegin.o%s}"
 #else
 #define STARTFILE_SPEC \
   "%{!shared: 
%{pg|p|profile:%{static:gcrt0.o%s;:gcrt1.o%s};static:crt0.o%s;:crt1.o%s}} \
-   crti.o%s %{static:crtbeginT.o%s;shared|pie:crtbeginS.o%s;:crtbegin.o%s}"
+   crti.o%s %{static:crtbeginT.o%s;shared|static-pie|" PIE_SPEC 
":crtbeginS.o%s;:crtbegin.o%s}"
 #endif


Re: [PATCH v8] RISC-V: Add the 'zfa' extension, version 0.2.

2023-05-06 Thread jinma via Gcc-patches
> > > diff --git a/gcc/config/riscv/iterators.md b/gcc/config/riscv/iterators.md
> > > index 9b767038452..c81b08e3cc5 100644
> > > --- a/gcc/config/riscv/iterators.md
> > > +++ b/gcc/config/riscv/iterators.md
> > > @@ -288,3 +288,8 @@ (define_int_iterator QUIET_COMPARISON 
> > > [UNSPEC_FLT_QUIET UNSPEC_FLE_QUIET])
> > >   (define_int_attr quiet_pattern [(UNSPEC_FLT_QUIET "lt") 
> > > (UNSPEC_FLE_QUIET "le")])
> > >   (define_int_attr QUIET_PATTERN [(UNSPEC_FLT_QUIET "LT") 
> > > (UNSPEC_FLE_QUIET "LE")])
> > >   
> > > +(define_int_iterator ROUND [UNSPEC_ROUND UNSPEC_FLOOR UNSPEC_CEIL 
> > > UNSPEC_BTRUNC UNSPEC_ROUNDEVEN UNSPEC_NEARBYINT])
> > > +(define_int_attr round_pattern [(UNSPEC_ROUND "round") (UNSPEC_FLOOR 
> > > "floor") (UNSPEC_CEIL "ceil")
> > > + (UNSPEC_BTRUNC "btrunc") (UNSPEC_ROUNDEVEN 
> > > "roundeven") (UNSPEC_NEARBYINT "nearbyint")])
> > > +(define_int_attr round_rm [(UNSPEC_ROUND "rmm") (UNSPEC_FLOOR "rdn") 
> > > (UNSPEC_CEIL "rup")
> > > +(UNSPEC_BTRUNC "rtz") (UNSPEC_ROUNDEVEN "rne") 
> > > (UNSPEC_NEARBYINT "dyn")])
> > Do we really need to use unspecs for all these cases?  I would expect 
> > some correspond to the trunc, round, ceil, nearbyint, etc well known RTX 
> > codes.
> > 
> > In general, we should try to avoid unspecs when there is a clear 
> > semantic match between the instruction and GCC's RTX opcodes.  So please 
> > review the existing RTX code semantics to see if any match the new 
> > instructions.  If there are matches, use those RTX codes rather than 
> > UNSPECs.
> 
> I'll try, thanks.


I ran into some confusion here. I checked GCC's documentation and
found no RTX codes that correspond to round, ceil, nearbyint, etc.
Only "(fix:m x)" seems to correspond to trunc, which can be expressed
as rounding towards zero; I have not found matches for the others.
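
To summarize what is being matched, the sketch below restates the round_pattern to round_rm mapping from the quoted iterators.md hunk as a C lookup, plus a one-line model of why "(fix:m x)" lines up with trunc: a C float-to-int cast also rounds towards zero. The function names round_rm_for and fix_like are hypothetical and exist only for illustration.

```c
#include <string.h>

/* Mapping copied from the round_rm int attribute quoted above:
   each standard pattern name and the rounding mode it selects.  */
const char *round_rm_for(const char *pattern)
{
    static const char *const map[][2] = {
        { "round", "rmm" },     { "floor", "rdn" },
        { "ceil", "rup" },      { "btrunc", "rtz" },
        { "roundeven", "rne" }, { "nearbyint", "dyn" },
    };
    for (size_t i = 0; i < sizeof map / sizeof map[0]; i++)
        if (strcmp(map[i][0], pattern) == 0)
            return map[i][1];
    return NULL;   /* no rounding mode associated with this name */
}

/* A C cast truncates towards zero, which is the behaviour that
   "(fix:m x)" is being compared to in this message.  */
int fix_like(double x)
{
    return (int) x;
}
```

Note that the cast produces an integer, whereas the fround patterns keep a floating-point result, which is part of why a direct RTX match is not obvious.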


In addition, I found that other architectures also seem to use
unspecs for all these cases on the latest master branch.
arm: 
https://github.com/gcc-mirror/gcc/commit/1dd4fe1fd892458ce29f15f3ca95125a11b2534f#diff-159a39276c509272adfaeef91c2110f54f65c38f7fd1ab2f1e750af0a7f86377R1251
rs6000: 
https://github.com/gcc-mirror/gcc/commit/7042fe5ef83ff0585eb91144817105f26d566d4c#diff-1a2d4976d867ead4556899cab1dbb39f5069574276e06a2976fb62b771ece2e3R6995
i386: 
https://github.com/gcc-mirror/gcc/commit/3e8c4b925a9825fdb8c81f47b621f63108894362#diff-f00b14a8846eb6aaeb981077e36ac3668160d7dabb490beeb1f62792afa83281R23332

Can you give me some advice?

> > > @@ -1580,6 +1609,26 @@ (define_insn 
> > > "l2"
> > > [(set_attr "type" "fcvt")
> > >  (set_attr "mode" "")])
> > >   
> > > +(define_insn "2"
> > > +  [(set (match_operand:ANYF 0 "register_operand" "=f")
> > > + (unspec:ANYF
> > > + [(match_operand:ANYF 1 "register_operand" " f")]
> > > + ROUND))]
> > > +  "TARGET_HARD_FLOAT && TARGET_ZFA"
> > > +  "fround.\t%0,%1,"
> > > +  [(set_attr "type" "fcvt")
> > > +   (set_attr "mode" "")])
> > > +
> > > +(define_insn "rint2"
> > > +  [(set (match_operand:ANYF 0 "register_operand" "=f")
> > > + (unspec:ANYF
> > > + [(match_operand:ANYF 1 "register_operand" " f")]
> > > + UNSPEC_RINT))]
> > > +  "TARGET_HARD_FLOAT && TARGET_ZFA"
> > > +  "froundnx.\t%0,%1"
> > > +  [(set_attr "type" "fcvt")
> > > +   (set_attr "mode" "")])
> > Please review the existing RTX codes and their semantics in the 
> > internals manual and if any of the new instructions match those existing 
> > primitives, implement them using those RTX codes rather than with an UNSPEC.
> >
> 
> I'll try, thanks.
> 

thanks.

Jin Ma

[PATCH] Don't call emit_clobber in lower-subreg.cc's resolve_simple_move.

2023-05-06 Thread Roger Sayle

Following up on posts/reviews by Segher and Uros, there's some question
over why the middle-end's lower subreg pass emits a clobber (of a
multi-word register) into the instruction stream before emitting the
sequence of moves of the word-sized parts.  This clobber interferes
with (LRA) register allocation, preventing the multi-word pseudo from
remaining in the same hard registers.  This patch eliminates this
(presumably superfluous) clobber and thereby improves register allocation.

A concrete example of the observed improvement is PR target/43644.
For the test case:
__int128 foo(__int128 x, __int128 y) { return x+y; }

on x86_64-pc-linux-gnu, gcc -O2 currently generates:

foo:    movq    %rsi, %rax
        movq    %rdi, %r8
        movq    %rax, %rdi
        movq    %rdx, %rax
        movq    %rcx, %rdx
        addq    %r8, %rax
        adcq    %rdi, %rdx
        ret

with this patch, we now generate the much improved:

foo:    movq    %rdx, %rax
        movq    %rcx, %rdx
        addq    %rdi, %rax
        adcq    %rsi, %rdx
        ret

This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32} with
no new failures.  OK for mainline?
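
For readers less familiar with the addq/adcq pair in the assembly above, here
is a minimal C sketch of a 128-bit addition carried out on word-sized parts,
which is roughly what remains once the multi-word register has been split.
The type u128_parts and the function add128_by_parts are illustrative names
only; they do not appear in the patch.

```c
#include <stdint.h>

/* A 128-bit value held as two 64-bit words.  The struct name and
   layout are hypothetical, chosen only for this illustration.  */
typedef struct { uint64_t lo, hi; } u128_parts;

/* Add two 128-bit values word by word.  The low-word addition plays
   the role of addq, the high-word addition with carry that of adcq.  */
static u128_parts add128_by_parts(u128_parts x, u128_parts y)
{
    u128_parts r;
    r.lo = x.lo + y.lo;             /* wraps modulo 2^64 */
    uint64_t carry = r.lo < x.lo;   /* 1 if the low word overflowed */
    r.hi = x.hi + y.hi + carry;
    return r;
}
```

Each word is an independent 64-bit quantity linked only through the carry,
so the two halves can stay in whichever hard registers already hold them.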


2023-05-06  Roger Sayle  

gcc/ChangeLog
PR target/43644
* lower-subreg.cc (resolve_simple_move): Don't emit a clobber
immediately before moving a multi-word register by parts.

gcc/testsuite/ChangeLog
PR target/43644
* gcc.target/i386/pr43644.c: New test case.


Thanks in advance,
Roger
--

diff --git a/gcc/lower-subreg.cc b/gcc/lower-subreg.cc
index 81fc5380..7c9cc3c 100644
--- a/gcc/lower-subreg.cc
+++ b/gcc/lower-subreg.cc
@@ -1086,9 +1086,6 @@ resolve_simple_move (rtx set, rtx_insn *insn)
 {
   unsigned int i;
 
-  if (REG_P (dest) && !HARD_REGISTER_NUM_P (REGNO (dest)))
-   emit_clobber (dest);
-
   for (i = 0; i < words; ++i)
{
  rtx t = simplify_gen_subreg_concatn (word_mode, dest,
diff --git a/gcc/testsuite/gcc.target/i386/pr43644.c 
b/gcc/testsuite/gcc.target/i386/pr43644.c
new file mode 100644
index 000..ffdf31c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr43644.c
@@ -0,0 +1,11 @@
+/* { dg-do compile { target int128 } } */
+/* { dg-options "-O2" } */
+
+__int128 foo(__int128 x, __int128 y)
+{
+  return x+y;
+}
+
+/* { dg-final { scan-assembler-times "movq" 2 } } */
+/* { dg-final { scan-assembler-not "push" } } */
+/* { dg-final { scan-assembler-not "pop" } } */


Re: [PATCH V7] RISC-V: Enable basic RVV auto-vectorization support.

2023-05-06 Thread Kito Cheng via Gcc-patches
LGTM, thanks :)

On Sat, May 6, 2023 at 11:44, wrote:

> From: Juzhe-Zhong 
>
> gcc/ChangeLog:
>
> * config/riscv/riscv-protos.h (preferred_simd_mode): New function.
> * config/riscv/riscv-v.cc (autovec_use_vlmax_p): Ditto.
> (preferred_simd_mode): Ditto.
> * config/riscv/riscv.cc (riscv_get_arg_info): Handle RVV type in
> function arg.
> (riscv_convert_vector_bits): Adjust for RVV auto-vectorization.
> (riscv_preferred_simd_mode): New function.
> (TARGET_VECTORIZE_PREFERRED_SIMD_MODE): New target hook support.
> * config/riscv/vector.md: Add autovec.md.
> * config/riscv/autovec.md: New file.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/rvv.exp: Add testcases for RVV
> auto-vectorization.
> * gcc.target/riscv/rvv/autovec/fixed-vlmax-1.c: New test.
> * gcc.target/riscv/rvv/autovec/partial/single_rgroup-1.c: New test.
> * gcc.target/riscv/rvv/autovec/partial/single_rgroup-1.h: New test.
> * gcc.target/riscv/rvv/autovec/partial/single_rgroup_run-1.c: New
> test.
> * gcc.target/riscv/rvv/autovec/scalable-1.c: New test.
> * gcc.target/riscv/rvv/autovec/template-1.h: New test.
> * gcc.target/riscv/rvv/autovec/v-1.c: New test.
> * gcc.target/riscv/rvv/autovec/v-2.c: New test.
> * gcc.target/riscv/rvv/autovec/zve32f-1.c: New test.
> * gcc.target/riscv/rvv/autovec/zve32f-2.c: New test.
> * gcc.target/riscv/rvv/autovec/zve32f-3.c: New test.
> * gcc.target/riscv/rvv/autovec/zve32f_zvl128b-1.c: New test.
> * gcc.target/riscv/rvv/autovec/zve32f_zvl128b-2.c: New test.
> * gcc.target/riscv/rvv/autovec/zve32x-1.c: New test.
> * gcc.target/riscv/rvv/autovec/zve32x-2.c: New test.
> * gcc.target/riscv/rvv/autovec/zve32x-3.c: New test.
> * gcc.target/riscv/rvv/autovec/zve32x_zvl128b-1.c: New test.
> * gcc.target/riscv/rvv/autovec/zve32x_zvl128b-2.c: New test.
> * gcc.target/riscv/rvv/autovec/zve64d-1.c: New test.
> * gcc.target/riscv/rvv/autovec/zve64d-2.c: New test.
> * gcc.target/riscv/rvv/autovec/zve64d-3.c: New test.
> * gcc.target/riscv/rvv/autovec/zve64d_zvl128b-1.c: New test.
> * gcc.target/riscv/rvv/autovec/zve64d_zvl128b-2.c: New test.
> * gcc.target/riscv/rvv/autovec/zve64f-1.c: New test.
> * gcc.target/riscv/rvv/autovec/zve64f-2.c: New test.
> * gcc.target/riscv/rvv/autovec/zve64f-3.c: New test.
> * gcc.target/riscv/rvv/autovec/zve64f_zvl128b-1.c: New test.
> * gcc.target/riscv/rvv/autovec/zve64f_zvl128b-2.c: New test.
> * gcc.target/riscv/rvv/autovec/zve64x-1.c: New test.
> * gcc.target/riscv/rvv/autovec/zve64x-2.c: New test.
> * gcc.target/riscv/rvv/autovec/zve64x-3.c: New test.
> * gcc.target/riscv/rvv/autovec/zve64x_zvl128b-1.c: New test.
> * gcc.target/riscv/rvv/autovec/zve64x_zvl128b-2.c: New test.
>
> ---
>  gcc/config/riscv/autovec.md   |  49 
>  gcc/config/riscv/riscv-protos.h   |   1 +
>  gcc/config/riscv/riscv-v.cc   |  51 +
>  gcc/config/riscv/riscv.cc |  31 -
>  gcc/config/riscv/vector.md|   4 +-
>  .../riscv/rvv/autovec/fixed-vlmax-1.c |  24 
>  .../rvv/autovec/partial/single_rgroup-1.c |   8 ++
>  .../rvv/autovec/partial/single_rgroup-1.h | 106 ++
>  .../rvv/autovec/partial/single_rgroup_run-1.c |  19 
>  .../gcc.target/riscv/rvv/autovec/scalable-1.c |  17 +++
>  .../gcc.target/riscv/rvv/autovec/template-1.h |  68 +++
>  .../gcc.target/riscv/rvv/autovec/v-1.c|  11 ++
>  .../gcc.target/riscv/rvv/autovec/v-2.c|   6 +
>  .../gcc.target/riscv/rvv/autovec/zve32f-1.c   |   6 +
>  .../gcc.target/riscv/rvv/autovec/zve32f-2.c   |   6 +
>  .../gcc.target/riscv/rvv/autovec/zve32f-3.c   |   6 +
>  .../riscv/rvv/autovec/zve32f_zvl128b-1.c  |   6 +
>  .../riscv/rvv/autovec/zve32f_zvl128b-2.c  |   6 +
>  .../gcc.target/riscv/rvv/autovec/zve32x-1.c   |   6 +
>  .../gcc.target/riscv/rvv/autovec/zve32x-2.c   |   6 +
>  .../gcc.target/riscv/rvv/autovec/zve32x-3.c   |   6 +
>  .../riscv/rvv/autovec/zve32x_zvl128b-1.c  |   6 +
>  .../riscv/rvv/autovec/zve32x_zvl128b-2.c  |   6 +
>  .../gcc.target/riscv/rvv/autovec/zve64d-1.c   |   6 +
>  .../gcc.target/riscv/rvv/autovec/zve64d-2.c   |   6 +
>  .../gcc.target/riscv/rvv/autovec/zve64d-3.c   |   6 +
>  .../riscv/rvv/autovec/zve64d_zvl128b-1.c  |   6 +
>  .../riscv/rvv/autovec/zve64d_zvl128b-2.c  |   6 +
>  .../gcc.target/riscv/rvv/autovec/zve64f-1.c   |   6 +
>  .../gcc.target/riscv/rvv/autovec/zve64f-2.c   |   6 +
>  .../gcc.target/riscv/rvv/autovec/zve64f-3.c   |   6 +
>  .../riscv/rvv/autovec/zve64f_zvl128b-1.c  |   6 +
>  .../riscv/rvv/autovec/zve64f_zvl128b-2.c  |   6 +
>  .../gcc.target/riscv/rvv/autovec/zve64x-1.c   |   6 +
>  .../gcc.

Re: [PATCH] hurd: Add multilib paths for gnu-x86_64

2023-05-06 Thread Samuel Thibault via Gcc-patches
(and it'd be useful to have it backported to the 13 branch)

Samuel Thibault, on Sat 06 May 2023 13:50:36 +0200, wrote:
> We need the multilib paths in gcc to find e.g. glibc crt files on
> Debian.  This is essentially based on t-linux64 version.
> 
> gcc/ChangeLog:
> 
>   * gcc/config/i386/t-gnu64: New file.
>   * gcc/config.gcc [x86_64-*-gnu*): Add i386/t-gnu64 to
>   tmake_file.
> 
> diff --git a/gcc/config.gcc b/gcc/config.gcc
> index 671c7e3b018..6b1939b9f09 100644
> --- a/gcc/config.gcc
> +++ b/gcc/config.gcc
> @@ -5828,6 +5828,9 @@ case ${target} in
>   visium-*-*)
>   target_cpu_default2="TARGET_CPU_$with_cpu"
>   ;;
> + x86_64-*-gnu*)
> + tmake_file="$tmake_file i386/t-gnu64"
> + ;;
>  esac
>  
>  t=
> diff --git a/gcc/config/i386/t-gnu64 b/gcc/config/i386/t-gnu64
> index e69de29bb2d..23ee6823d65 100644
> --- a/gcc/config/i386/t-gnu64
> +++ b/gcc/config/i386/t-gnu64
> @@ -0,0 +1,38 @@
> +# Copyright (C) 2002-2023 Free Software Foundation, Inc.
> +#
> +# This file is part of GCC.
> +#
> +# GCC is free software; you can redistribute it and/or modify
> +# it under the terms of the GNU General Public License as published by
> +# the Free Software Foundation; either version 3, or (at your option)
> +# any later version.
> +#
> +# GCC is distributed in the hope that it will be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +# GNU General Public License for more details.
> +#
> +# You should have received a copy of the GNU General Public License
> +# along with GCC; see the file COPYING3.  If not see
> +# .
> +
> +# On Debian, Ubuntu and other derivative distributions, the 32bit libraries
> +# are found in /lib32 and /usr/lib32, /lib64 and /usr/lib64 are symlinks to
> +# /lib and /usr/lib, while other distributions install libraries into /lib64
> +# and /usr/lib64.  The LSB does not enforce the use of /lib64 and /usr/lib64,
> +# it doesn't tell anything about the 32bit libraries on those systems.  Set
> +# MULTILIB_OSDIRNAMES according to what is found on the target.
> +
> +# To support i386, x86-64 and x32 libraries, the directory structrue
> +# should be:
> +#
> +#/lib has i386 libraries.
> +#/lib64 has x86-64 libraries.
> +#/libx32 has x32 libraries.
> +#
> +comma=,
> +MULTILIB_OPTIONS= $(subst $(comma),/,$(TM_MULTILIB_CONFIG))
> +MULTILIB_DIRNAMES   = $(patsubst m%, %, $(subst /, ,$(MULTILIB_OPTIONS)))
> +MULTILIB_OSDIRNAMES = m64=../lib64$(call if_multiarch,:x86_64-gnu)
> +MULTILIB_OSDIRNAMES+= m32=$(if $(wildcard $(shell echo 
> $(SYSTEM_HEADER_DIR))/../../usr/lib32),../lib32,../lib)$(call 
> if_multiarch,:i386-gnu)
> +MULTILIB_OSDIRNAMES+= mx32=../libx32$(call if_multiarch,:x86_64-gnux32)

-- 
Samuel
---
For an independent, transparent and rigorous evaluation!
I support Inria's Evaluation Committee.


[x86_64 PATCH] Introduce insvti_highpart define_insn_and_split.

2023-05-06 Thread Roger Sayle

Hi Uros,
This is a repost/respin of a patch that was conditionally approved:
https://gcc.gnu.org/pipermail/gcc-patches/2023-January/609470.html

This patch adds a convenient post-reload splitter for setting/updating
the highpart of a TImode variable, using i386's previously added
split_double_concat infrastructure.

For the new test case below:

__int128 foo(__int128 x, unsigned long long y)
{
  __int128 t = (__int128)y << 64;
  __int128 r = (x & ~0ull) | t;
  return r;
}

mainline GCC with -O2 currently generates:

foo:    movq    %rdi, %rcx
        xorl    %eax, %eax
        xorl    %edi, %edi
        orq     %rcx, %rax
        orq     %rdi, %rdx
        ret

with this patch, GCC instead now generates the much better:

foo:    movq    %rdi, %rcx
        movq    %rcx, %rax
        ret

It turns out that the -m32 equivalent of this testcase already
avoids using explicit orl/xor instructions, as it gets optimized
(in combine) by a completely different path.  Given that this idiom
isn't seen in 32-bit code (so this pattern doesn't match with -m32),
and also that the shorter 32-bit AND bitmask is represented as a
CONST_INT rather than a CONST_WIDE_INT, this new define_insn_and_split
is implemented for just TARGET_64BIT rather than contort a "generic"
implementation using DWI mode iterators.
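
As a rough illustration of the idiom being matched, the sketch below restates
the test case with unsigned types and shows that it amounts to concatenating
y (as the high word) with the low word of x.  It also shows the two 64-bit
words of the widened AND mask, which is what the CONST_WIDE_INT_ELT checks in
the pattern condition look at.  The function names are hypothetical, and the
code assumes a compiler that provides __int128 (GCC or Clang on a 64-bit
target).

```c
#include <stdint.h>

/* The idiom from the test case, using unsigned types.  */
static unsigned __int128 set_highpart(unsigned __int128 x, uint64_t y)
{
    unsigned __int128 t = (unsigned __int128)y << 64;
    return (x & ~0ull) | t;   /* keep the low 64 bits of x, replace the high 64 */
}

/* What the idiom amounts to: hi:lo concatenation of two 64-bit words.  */
static unsigned __int128 concat_words(uint64_t hi, uint64_t lo)
{
    return ((unsigned __int128)hi << 64) | lo;
}

/* The mask ~0ull widened to 128 bits has an all-ones low word and a
   zero high word.  */
static int mask_has_expected_words(void)
{
    unsigned __int128 mask = ~0ull;
    return (int64_t)(uint64_t)mask == -1 && (uint64_t)(mask >> 64) == 0;
}
```

Seen this way, only the high word needs to be written, which is why a single
move can suffice when the low word is already in place.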

This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32},
with no new failures.  Ok for mainline now that we're back in stage 1?


2023-05-06  Roger Sayle  

gcc/ChangeLog
* config/i386/i386.md (any_or_plus): Move definition earlier.
(*insvti_highpart_1): New define_insn_and_split to overwrite
(insv) the highpart of a TImode register/memory.

gcc/testsuite/ChangeLog
* gcc.target/i386/insvti_highpart-1.c: New test case.


Thanks again,
Roger
--

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index d49f1cd..62cafe7 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -3479,6 +3479,31 @@
   "mov{b}\t{%h1, %h0|%h0, %h1}"
   [(set_attr "type" "imov")
(set_attr "mode" "QI")])
+
+(define_code_iterator any_or_plus [plus ior xor])
+(define_insn_and_split "*insvti_highpart_1"
+  [(set (match_operand:TI 0 "nonimmediate_operand" "=ro,r,r,&r")
+   (any_or_plus:TI
+ (and:TI
+   (match_operand:TI 1 "nonimmediate_operand" "r,m,r,m")
+   (match_operand:TI 3 "const_scalar_int_operand" "n,n,n,n"))
+ (ashift:TI
+   (zero_extend:TI
+ (match_operand:DI 2 "nonimmediate_operand" "r,r,m,m"))
+   (const_int 64))))]
+  "TARGET_64BIT
+   && CONST_WIDE_INT_P (operands[3])
+   && CONST_WIDE_INT_NUNITS (operands[3]) == 2
+   && CONST_WIDE_INT_ELT (operands[3], 0) == -1
+   && CONST_WIDE_INT_ELT (operands[3], 1) == 0"
+  "#"
+  "&& reload_completed"
+  [(clobber (const_int 0))]
+{
+  operands[4] = gen_lowpart (DImode, operands[1]);
+  split_double_concat (TImode, operands[0], operands[4], operands[2]);
+  DONE;
+})
 
 ;; Floating point push instructions.
 
@@ -11573,7 +11598,6 @@
(set_attr "mode" "QI")])
 
 ;; Split DST = (HI<<32)|LO early to minimize register usage.
-(define_code_iterator any_or_plus [plus ior xor])
 (define_insn_and_split "*concat3_1"
   [(set (match_operand: 0 "nonimmediate_operand" "=ro,r")
(any_or_plus:
diff --git a/gcc/testsuite/gcc.target/i386/insvti_highpart-1.c 
b/gcc/testsuite/gcc.target/i386/insvti_highpart-1.c
new file mode 100644
index 000..4ae9ccf
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/insvti_highpart-1.c
@@ -0,0 +1,12 @@
+/* { dg-do compile { target int128 } } */
+/* { dg-options "-O2" } */
+
+__int128 foo(__int128 x, unsigned long long y)
+{
+  __int128 t = (__int128)y << 64;
+  __int128 r = (x & ~0ull) | t;
+  return r;
+}
+
+/* { dg-final { scan-assembler-not "xorl" } } */
+/* { dg-final { scan-assembler-not "orq" } } */


Re: [PATCH] RISC-V: Fix CTZ unnecessary sign extension [PR #106888]

2023-05-06 Thread Jeff Law via Gcc-patches




On 5/4/23 11:14, Raphael Moreira Zinsly wrote:

We were not able to match the CTZ sign extend pattern on RISC-V
because it gets optimized to zero extend and/or to ANDI patterns.
For the ANDI case, combine scrambles the RTL and generates the
extension by using subregs.

So to provide a few more details here.

Coming into combine we have:

(insn 2 4 3 2 (set (reg/v:DI 136 [ i ])
(reg:DI 10 a0 [ i ])) "j.c":3:1 179 {*movdi_64bit}
 (expr_list:REG_DEAD (reg:DI 10 a0 [ i ])
(nil)))
(note 3 2 6 2 NOTE_INSN_FUNCTION_BEG)
(insn 6 3 7 2 (set (reg:SI 137)
(ctz:SI (subreg/u:SI (reg/v:DI 136 [ i ]) 0))) "j.c":4:13 345 {*ctzsi2}
 (expr_list:REG_DEAD (reg/v:DI 136 [ i ])
(nil)))
(insn 7 6 12 2 (set (reg/v:DI 135 [  ])
(sign_extend:DI (reg:SI 137))) "j.c":4:13 116 {extendsidi2}
 (expr_list:REG_DEAD (reg:SI 137)
(nil)))



The first key being we're starting with an SImode CTZ.  So we have an 
extension on the result and the original argument is in DImode, so we 
have a subreg to get the input into SImode.  That allows us to match the 
standard ctzw pattern.


Of course we know the result of the ctz is in the range 0..32 because 
it's an SImode operand. Thus the extension is redundant and we'd like to 
remove it.


Even though insn 7 is a SIGN_EXTEND, combine knows the SImode sign bit 
is always zero.  As a result it'll canonicalize to ZERO_EXTEND (there's 
a larger discussion around that in the context of aarch64 that I'm not 
going to wade into at the moment).
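
To make the "sign bit is always zero" point concrete, here is a small C
sketch.  It assumes the 32-bit count returns 32 for a zero input (the GCC
builtin __builtin_ctz is undefined there, so the helper handles that case
explicitly), and the helper names are made up for this example.

```c
#include <stdint.h>

/* 32-bit count of trailing zeros, returning 32 for a zero input.
   That zero-input behaviour is an assumption of this sketch.  */
static int32_t ctz32(uint32_t x)
{
    return x ? (int32_t)__builtin_ctz(x) : 32;
}

/* Widen a 32-bit result to 64 bits in the two possible ways.  */
static uint64_t widen_signed(int32_t v)   { return (uint64_t)(int64_t)v; }
static uint64_t widen_unsigned(int32_t v) { return (uint64_t)(uint32_t)v; }
```

For any value in 0..32 both widenings give the same 64-bit result; they only
differ once bit 31 is set, which cannot happen for a count.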


So combine ultimately tries to match this:



Trying 6 -> 7:
6: r137:SI=ctz(r139:DI#0)
  REG_DEAD r139:DI
7: r135:DI=sign_extend(r137:SI)
  REG_DEAD r137:SI
Successfully matched this instruction:
(set (reg/v:DI 135 [  ])
(and:DI (subreg:DI (ctz:SI (subreg/u:SI (reg:DI 139) 0)) 0)
(const_int 127 [0x7f])))


The inner subreg is (of course) still there and must remain so that we 
can continue to distinguish between an SI and DI mode ctz which generate 
different assembler codes on riscv.


Combine has turned the zero extension into a masking operation.  Of 
course the masking operation has to happen in DImode, hence the new subreg 
wrapping the result of the ctz so that it can be used in a DImode operation.









gcc/ChangeLog:
PR target/106888
* config/riscv/bitmanip.md
(disi2): Match with any_extend.
(disi2_sext): New pattern to match
with sign extend using an ANDI instruction.

gcc/testsuite/ChangeLog:
PR target/106888
* gcc.target/riscv/pr106888.c: New test.
* gcc.target/riscv/zbbw.c: Check for ANDI.
---
  gcc/config/riscv/bitmanip.md  | 14 +-
  gcc/testsuite/gcc.target/riscv/pr106888.c | 12 
  gcc/testsuite/gcc.target/riscv/zbbw.c |  1 +
  3 files changed, 26 insertions(+), 1 deletion(-)
  create mode 100644 gcc/testsuite/gcc.target/riscv/pr106888.c

diff --git a/gcc/config/riscv/bitmanip.md b/gcc/config/riscv/bitmanip.md
index a27fc3e34a1..8dc3e85a338 100644
--- a/gcc/config/riscv/bitmanip.md
+++ b/gcc/config/riscv/bitmanip.md
@@ -246,13 +246,25 @@
  
  (define_insn "*disi2"

[(set (match_operand:DI 0 "register_operand" "=r")
-(sign_extend:DI
+(any_extend:DI
(clz_ctz_pcnt:SI (match_operand:SI 1 "register_operand" "r"))))]
"TARGET_64BIT && TARGET_ZBB"
"w\t%0,%1"
[(set_attr "type" "")
 (set_attr "mode" "SI")])
  
+;; A SImode clz_ctz_pcnt may be extended to DImode via subreg.

+(define_insn "*disi2_sext"
+  [(set (match_operand:DI 0 "register_operand" "=r")
+(and:DI (subreg:DI
+  (clz_ctz_pcnt:SI (subreg:SI
+ (match_operand:DI 1 "register_operand" "r") 0)) 0)
+  (match_operand:DI 2 "const_int_operand")))]
+  "TARGET_64BIT && TARGET_ZBB && ((INTVAL (operands[2]) & 0x3f) == 0x3f)"
+  "w\t%0,%1"
+  [(set_attr "type" "bitmanip")
+   (set_attr "mode" "SI")])
Looking at this again after a few months away, I'm pretty sure we can 
eliminate that explicit (subreg:SI ...).  Instead, just use 
(match_operand:SI ...) just like the existing pattern already did.


So the pattern just needs the (subreg:DI ...) on the result to allow us 
to mask in DImode...  So something like this:



;; A SImode clz_ctz_pcnt may be extended to DImode via subreg.
(define_insn "*disi2_sext"
  [(set (match_operand:DI 0 "register_operand" "=r")
(and:DI (subreg:DI
  (clz_ctz_pcnt:SI (match_operand:SI 1 "register_operand" "r")) 0)
  (match_operand:DI 2 "const_int_operand")))]
  "TARGET_64BIT && TARGET_ZBB && ((INTVAL (operands[2]) & 0x3f) == 0x3f)"
  "w\t%0,%1"
  [(set_attr "type" "bitmanip")
   (set_attr "mode" "SI")])



I lightly tested this locally and it seems to work just as well as your 
original, but is slightly simpler and avoids the explicit subreg.
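
A short C sketch of why the insn condition on the mask is enough: a count in
0..32 needs six bits, so any mask that keeps the low six bits leaves it
unchanged and the AND can be dropped.  The function names below are
hypothetical and only mirror the arithmetic of the condition, not the GCC
internals.

```c
#include <stdbool.h>
#include <stdint.h>

/* Mirrors the condition (INTVAL (operands[2]) & 0x3f) == 0x3f:
   the mask must keep at least the low six bits.  */
static bool mask_keeps_count(int64_t mask)
{
    return (mask & 0x3f) == 0x3f;
}

/* Apply the 64-bit AND to a count known to lie in 0..32.  */
static uint64_t apply_mask(uint64_t count, int64_t mask)
{
    return count & (uint64_t)mask;
}
```

With the 0x7f mask that combine produced, apply_mask is the identity on every
possible count; a narrower mask such as 0x1f would turn 32 into 0, so the
pattern must not match it.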


OK with that change.

jeff


Re: [PATCH V7] RISC-V: Enable basic RVV auto-vectorization support.

2023-05-06 Thread Jeff Law via Gcc-patches




On 5/5/23 21:43, juzhe.zh...@rivai.ai wrote:

From: Juzhe-Zhong 

gcc/ChangeLog:

 * config/riscv/riscv-protos.h (preferred_simd_mode): New function.
 * config/riscv/riscv-v.cc (autovec_use_vlmax_p): Ditto.
 (preferred_simd_mode): Ditto.
 * config/riscv/riscv.cc (riscv_get_arg_info): Handle RVV type in 
function arg.
 (riscv_convert_vector_bits): Adjust for RVV auto-vectorization.
 (riscv_preferred_simd_mode): New function.
 (TARGET_VECTORIZE_PREFERRED_SIMD_MODE): New target hook support.
 * config/riscv/vector.md: Add autovec.md.
 * config/riscv/autovec.md: New file.

gcc/testsuite/ChangeLog:

 * gcc.target/riscv/rvv/rvv.exp: Add testcases for RVV 
auto-vectorization.
 * gcc.target/riscv/rvv/autovec/fixed-vlmax-1.c: New test.
 * gcc.target/riscv/rvv/autovec/partial/single_rgroup-1.c: New test.
 * gcc.target/riscv/rvv/autovec/partial/single_rgroup-1.h: New test.
 * gcc.target/riscv/rvv/autovec/partial/single_rgroup_run-1.c: New test.
 * gcc.target/riscv/rvv/autovec/scalable-1.c: New test.
 * gcc.target/riscv/rvv/autovec/template-1.h: New test.
 * gcc.target/riscv/rvv/autovec/v-1.c: New test.
 * gcc.target/riscv/rvv/autovec/v-2.c: New test.
 * gcc.target/riscv/rvv/autovec/zve32f-1.c: New test.
 * gcc.target/riscv/rvv/autovec/zve32f-2.c: New test.
 * gcc.target/riscv/rvv/autovec/zve32f-3.c: New test.
 * gcc.target/riscv/rvv/autovec/zve32f_zvl128b-1.c: New test.
 * gcc.target/riscv/rvv/autovec/zve32f_zvl128b-2.c: New test.
 * gcc.target/riscv/rvv/autovec/zve32x-1.c: New test.
 * gcc.target/riscv/rvv/autovec/zve32x-2.c: New test.
 * gcc.target/riscv/rvv/autovec/zve32x-3.c: New test.
 * gcc.target/riscv/rvv/autovec/zve32x_zvl128b-1.c: New test.
 * gcc.target/riscv/rvv/autovec/zve32x_zvl128b-2.c: New test.
 * gcc.target/riscv/rvv/autovec/zve64d-1.c: New test.
 * gcc.target/riscv/rvv/autovec/zve64d-2.c: New test.
 * gcc.target/riscv/rvv/autovec/zve64d-3.c: New test.
 * gcc.target/riscv/rvv/autovec/zve64d_zvl128b-1.c: New test.
 * gcc.target/riscv/rvv/autovec/zve64d_zvl128b-2.c: New test.
 * gcc.target/riscv/rvv/autovec/zve64f-1.c: New test.
 * gcc.target/riscv/rvv/autovec/zve64f-2.c: New test.
 * gcc.target/riscv/rvv/autovec/zve64f-3.c: New test.
 * gcc.target/riscv/rvv/autovec/zve64f_zvl128b-1.c: New test.
 * gcc.target/riscv/rvv/autovec/zve64f_zvl128b-2.c: New test.
 * gcc.target/riscv/rvv/autovec/zve64x-1.c: New test.
 * gcc.target/riscv/rvv/autovec/zve64x-2.c: New test.
 * gcc.target/riscv/rvv/autovec/zve64x-3.c: New test.
 * gcc.target/riscv/rvv/autovec/zve64x_zvl128b-1.c: New test.
 * gcc.target/riscv/rvv/autovec/zve64x_zvl128b-2.c: New test.
I went ahead and committed this.  Hopefully we can get things unblocked 
for everyone this weekend :-)


jeff


[PATCH] nvptx: Add support for __builtin_nvptx_brev intrinsic.

2023-05-06 Thread Roger Sayle
 

This patch adds support for (a pair of) bit reversal intrinsics
__builtin_nvptx_brev and __builtin_nvptx_brevll which perform 32-bit
and 64-bit bit reversal (using nvptx's brev instruction) matching
the __brev and __brevll intrinsics provided by NVidia's nvcc compiler.
https://docs.nvidia.com/cuda/cuda-math-api/group__CUDA__MATH__INTRINSIC__INT.html

This patch has been tested on nvptx-none with make and make -k check
with no new failures.  Ok for mainline?
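
For reference, a portable C sketch of what a 32-bit and a 64-bit bit reversal
compute is given below.  It is not taken from the patch or its test cases;
ref_brev32 and ref_brev64 are hypothetical names, and the swap-by-groups
technique is just one common way to write the operation in plain C.

```c
#include <stdint.h>

/* Reverse the bit order of a 32-bit value by swapping progressively
   larger groups: single bits, pairs, nibbles, bytes, half-words.  */
static uint32_t ref_brev32(uint32_t x)
{
    x = ((x >> 1) & 0x55555555u) | ((x & 0x55555555u) << 1);
    x = ((x >> 2) & 0x33333333u) | ((x & 0x33333333u) << 2);
    x = ((x >> 4) & 0x0f0f0f0fu) | ((x & 0x0f0f0f0fu) << 4);
    x = ((x >> 8) & 0x00ff00ffu) | ((x & 0x00ff00ffu) << 8);
    return (x >> 16) | (x << 16);
}

/* 64-bit reversal: reverse each 32-bit half and exchange the halves.  */
static uint64_t ref_brev64(uint64_t x)
{
    uint64_t lo = ref_brev32((uint32_t)x);
    uint64_t hi = ref_brev32((uint32_t)(x >> 32));
    return (lo << 32) | hi;
}
```

A run-time test could compare the builtin's result against a reference of
this kind for a handful of inputs.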

 

 

2023-05-06  Roger Sayle

gcc/ChangeLog
* config/nvptx/nvptx.cc (nvptx_expand_brev): Expand target
builtin for bit reversal using brev instruction.
(enum nvptx_builtins): Add NVPTX_BUILTIN_BREV and
NVPTX_BUILTIN_BREVLL.
(nvptx_init_builtins): Define "brev" and "brevll".
(nvptx_expand_builtin): Expand NVPTX_BUILTIN_BREV and
NVPTX_BUILTIN_BREVLL via nvptx_expand_brev function.
* doc/extend.texi (Nvidia PTX Built-in Functions): New
section, document __builtin_nvptx_brev{,ll}.

gcc/testsuite/ChangeLog
* gcc.target/nvptx/brev-1.c: New 32-bit test case.
* gcc.target/nvptx/brev-2.c: Likewise.
* gcc.target/nvptx/brevll-1.c: New 64-bit test case.
* gcc.target/nvptx/brevll-2.c: Likewise.


Thanks in advance,
Roger
--


diff --git a/gcc/config/nvptx/nvptx.cc b/gcc/config/nvptx/nvptx.cc
index 89349da..1b99fca 100644
--- a/gcc/config/nvptx/nvptx.cc
+++ b/gcc/config/nvptx/nvptx.cc
@@ -6047,6 +6047,29 @@ nvptx_expand_shuffle (tree exp, rtx target, machine_mode 
mode, int ignore)
   return target;
 }
 
+/* Expander for the bit reverse builtins.  */
+
+static rtx
+nvptx_expand_brev (tree exp, rtx target, machine_mode mode, int ignore)
+{
+  if (ignore)
+return target;
+  
+  rtx arg = expand_expr (CALL_EXPR_ARG (exp, 0),
+NULL_RTX, mode, EXPAND_NORMAL);
+  if (!REG_P (arg))
+arg = copy_to_mode_reg (mode, arg);
+  if (!target)
+target = gen_reg_rtx (mode);
+  rtx pat;
+  if (mode == SImode)
+pat = gen_bitrevsi2 (target, arg);
+  else
+pat = gen_bitrevdi2 (target, arg);
+  emit_insn (pat);
+  return target;
+}
+
 const char *
 nvptx_output_red_partition (rtx dst, rtx offset)
 {
@@ -6164,6 +6187,8 @@ enum nvptx_builtins
   NVPTX_BUILTIN_BAR_RED_AND,
   NVPTX_BUILTIN_BAR_RED_OR,
   NVPTX_BUILTIN_BAR_RED_POPC,
+  NVPTX_BUILTIN_BREV,
+  NVPTX_BUILTIN_BREVLL,
   NVPTX_BUILTIN_MAX
 };
 
@@ -6292,6 +6317,9 @@ nvptx_init_builtins (void)
   DEF (BAR_RED_POPC, "bar_red_popc",
(UINT, UINT, UINT, UINT, UINT, NULL_TREE));
 
+  DEF (BREV, "brev", (UINT, UINT, NULL_TREE));
+  DEF (BREVLL, "brevll", (LLUINT, LLUINT, NULL_TREE));
+
 #undef DEF
 #undef ST
 #undef UINT
@@ -6339,6 +6367,10 @@ nvptx_expand_builtin (tree exp, rtx target, rtx 
ARG_UNUSED (subtarget),
 case NVPTX_BUILTIN_BAR_RED_POPC:
   return nvptx_expand_bar_red (exp, target, mode, ignore);
 
+case NVPTX_BUILTIN_BREV:
+case NVPTX_BUILTIN_BREVLL:
+  return nvptx_expand_brev (exp, target, mode, ignore);
+
 default: gcc_unreachable ();
 }
 }
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index ac47680..871f0cf 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -14682,6 +14682,7 @@ instructions, but allow the compiler to schedule those 
calls.
 * Other MIPS Built-in Functions::
 * MSP430 Built-in Functions::
 * NDS32 Built-in Functions::
+* Nvidia PTX Built-in Functions::
 * Basic PowerPC Built-in Functions::
 * PowerPC AltiVec/VSX Built-in Functions::
 * PowerPC Hardware Transactional Memory Built-in Functions::
@@ -17941,6 +17942,20 @@ Enable global interrupt.
 Disable global interrupt.
 @enddefbuiltin
 
+@node Nvidia PTX Built-in Functions
+@subsection Nvidia PTX Built-in Functions
+
+These built-in functions are available for the Nvidia PTX target:
+
+@defbuiltin{unsigned int __builtin_nvptx_brev (unsigned int @var{x})}
+Reverse the bit order of a 32-bit unsigned integer.
+Disable global interrupt.
+@enddefbuiltin
+
+@defbuiltin{unsigned long long __builtin_nvptx_brevll (unsigned long long 
@var{x})}
+Reverse the bit order of a 64-bit unsigned integer.
+@enddefbuiltin
+
 @node Basic PowerPC Built-in Functions
 @subsection Basic PowerPC Built-in Functions
 
diff --git a/gcc/testsuite/gcc.target/nvptx/brev-1.c 
b/gcc/testsuite/gcc.target/nvptx/brev-1.c
new file mode 100644
index 000..fbb4fff
--- /dev/null
+++ b/gcc/testsuite/gcc.target/nvptx/brev-1.c
@@ -0,0 +1,8 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+unsigned int foo(unsigned int x)
+{
+  return __builtin_nvptx_brev(x);
+}
+
+/* { dg-final { scan-assembler "brev.b32" } } */
diff --git a/gcc/testsuite/gcc.target/nvptx/brev-2.c 
b/gcc/testsuite/gcc.target/nvptx/brev-2.c
new file mode 100644
index 000..9d0defe
--- /dev/null
+++ b/gcc/testsuite/gcc.target/nvptx/brev-2.c
@@ -0,0 +1,94 @@
+/* { dg-do run } */
+/* { dg-options "-O2" } */
+unsigned int bitreverse32(unsigned int x)
+{
+  return __builtin_nvptx

[PATCH] Add RTX codes for BITREVERSE and COPYSIGN.

2023-05-06 Thread Roger Sayle

An analysis of backend UNSPECs reveals that two of the most common UNSPECs
across target backends are for copysign and bit reversal.  This patch
adds RTX codes for these expressions to allow their representation to
be standardized, and them to be optimized by the middle-end RTL optimizers.

This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32} with
no new failures.  Ok for mainline?
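
The unary BITREVERSE simplifications rest on a few simple identities, which
the C sketch below checks for a given 32-bit value.  The helper names are
invented for this example, the loop-based reversal is only a reference
implementation, and __builtin_popcount and __builtin_parity are the usual
GCC builtins.

```c
#include <stdint.h>

/* Straightforward bit reversal: shift bits out of x and into r.  */
static uint32_t bitreverse_u32(uint32_t x)
{
    uint32_t r = 0;
    for (int i = 0; i < 32; i++) {
        r = (r << 1) | (x & 1u);
        x >>= 1;
    }
    return r;
}

/* Returns 1 when all four identities hold for x:
   NOT commutes with reversal, popcount and parity are unchanged,
   and reversing twice gives back the original value.  */
static int bitreverse_identities_hold(uint32_t x)
{
    uint32_t rx = bitreverse_u32(x);
    return ~rx == bitreverse_u32(~x)
        && __builtin_popcount(rx) == __builtin_popcount(x)
        && __builtin_parity(rx) == __builtin_parity(x)
        && bitreverse_u32(rx) == x;
}
```

Reversal only permutes bit positions, so anything that depends solely on how
many bits are set, or that acts on each bit independently, is unaffected.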


2023-05-06  Roger Sayle  

gcc/ChangeLog
* doc/rtl.texi (bitreverse, copysign): Document new RTX codes.
* rtl.def (BITREVERSE, COPYSIGN): Define new RTX codes.
* simplify-rtx.cc (simplify_unary_operation_1): Optimize
NOT (BITREVERSE x) as BITREVERSE (NOT x).
Optimize POPCOUNT (BITREVERSE x) as POPCOUNT x.
Optimize PARITY (BITREVERSE x) as PARITY x.
Optimize BITREVERSE (BITREVERSE x) as x.
(simplify_const_unary_operation) : Evaluate
BITREVERSE of a constant integer at compile-time.
(simplify_binary_operation_1) :  Optimize
COPYSIGN (x, x) as x.  Optimize COPYSIGN (x, C) as ABS x
or NEG (ABS x) for constant C.  Optimize COPYSIGN (ABS x, y)
and COPYSIGN (NEG x, y) as COPYSIGN (x, y).  Optimize
COPYSIGN (x, ABS y) as ABS x.
Optimize COPYSIGN (COPYSIGN (x, y), z) as COPYSIGN (x, z).
Optimize COPYSIGN (x, COPYSIGN (y, z)) as COPYSIGN (x, z).
(simplify_const_binary_operation): Evaluate COPYSIGN of constant
arguments at compile-time.
* wide-int.cc (wide_int_storage::bitreverse): Provide a
wide_int implementation, based upon bswap implementation.
* wide-int.g (wide_int_storage::bitreverse): Prototype here.
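
The COPYSIGN entries above can be sanity-checked at the bit level.  The
sketch below assumes IEEE 754 binary64 doubles and models copysign, abs and
neg purely as operations on the sign bit; copysign_ref, abs_ref, neg_ref and
the other helpers are hypothetical names, and the conditions under which the
compiler actually applies each rule (for example around NaNs) are not
modelled here.

```c
#include <stdint.h>
#include <string.h>

#define SIGN_BIT (UINT64_C(1) << 63)

/* Move between a double and its bit pattern without aliasing issues.  */
static uint64_t bits_of(double d) { uint64_t u; memcpy(&u, &d, sizeof u); return u; }
static double from_bits(uint64_t u) { double d; memcpy(&d, &u, sizeof d); return d; }

/* Magnitude of x combined with the sign of y.  */
static double copysign_ref(double x, double y)
{
    return from_bits((bits_of(x) & ~SIGN_BIT) | (bits_of(y) & SIGN_BIT));
}

static double abs_ref(double x) { return from_bits(bits_of(x) & ~SIGN_BIT); }
static double neg_ref(double x) { return from_bits(bits_of(x) ^ SIGN_BIT); }
static int same(double a, double b) { return bits_of(a) == bits_of(b); }

/* Returns 1 when the listed identities hold for x, y and z.  */
static int copysign_identities_hold(double x, double y, double z)
{
    return same(copysign_ref(x, x), x)
        && same(copysign_ref(abs_ref(x), y), copysign_ref(x, y))
        && same(copysign_ref(neg_ref(x), y), copysign_ref(x, y))
        && same(copysign_ref(x, abs_ref(y)), abs_ref(x))
        && same(copysign_ref(copysign_ref(x, y), z), copysign_ref(x, z))
        && same(copysign_ref(x, copysign_ref(y, z)), copysign_ref(x, z));
}
```

In this model the result's sign comes only from the second operand and its
magnitude only from the first, which is why sign-only changes to the first
operand and magnitude-only changes to the second drop out.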


Thanks in advance,
Roger
--

diff --git a/gcc/doc/rtl.texi b/gcc/doc/rtl.texi
index 1de2494..76aeafb 100644
--- a/gcc/doc/rtl.texi
+++ b/gcc/doc/rtl.texi
@@ -2742,6 +2742,17 @@ integer of mode @var{m}.  The mode of @var{x} must be 
@var{m} or
 Represents the value @var{x} with the order of bytes reversed, carried out
 in mode @var{m}, which must be a fixed-point machine mode.
 The mode of @var{x} must be @var{m} or @code{VOIDmode}.
+
+@findex bitreverse
+@item (bitreverse:@var{m} @var{x})
+Represents the value @var{x} with the order of bits reversed, carried out
+in mode @var{m}, which must be a fixed-point machine mode.
+The mode of @var{x} must be @var{m} or @code{VOIDmode}.
+
+@findex copysign
+@item (copysign:@var{m} @var{x} @var{y})
+Represents the value @var{x} with the sign of @var{y}.
+Both @var{x} and @var{y} must have floating point machine mode @var{m}.
 @end table
 
 @node Comparisons
diff --git a/gcc/rtl.def b/gcc/rtl.def
index 6ddbce3..88e2b19 100644
--- a/gcc/rtl.def
+++ b/gcc/rtl.def
@@ -664,6 +664,9 @@ DEF_RTL_EXPR(POPCOUNT, "popcount", "e", RTX_UNARY)
 /* Population parity (number of 1 bits modulo 2).  */
 DEF_RTL_EXPR(PARITY, "parity", "e", RTX_UNARY)
 
+/* Reverse bits.  */
+DEF_RTL_EXPR(BITREVERSE, "bitreverse", "e", RTX_UNARY)
+
 /* Reference to a signed bit-field of specified size and position.
Operand 0 is the memory unit (usually SImode or QImode) which
contains the field's first bit.  Operand 1 is the width, in bits.
@@ -753,6 +756,9 @@ DEF_RTL_EXPR(US_TRUNCATE, "us_truncate", "e", RTX_UNARY)
 /* Floating point multiply/add combined instruction.  */
 DEF_RTL_EXPR(FMA, "fma", "eee", RTX_TERNARY)
 
+/* Floating point copysign.  Operand 0 with the sign of operand 1.  */
+DEF_RTL_EXPR(COPYSIGN, "copysign", "ee", RTX_BIN_ARITH)
+
 /* Information about the variable and its location.  */
 DEF_RTL_EXPR(VAR_LOCATION, "var_location", "te", RTX_EXTRA)
 
diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc
index d4aeebc..26fa2b9 100644
--- a/gcc/simplify-rtx.cc
+++ b/gcc/simplify-rtx.cc
@@ -1040,10 +1040,10 @@ simplify_context::simplify_unary_operation_1 (rtx_code 
code, machine_mode mode,
}
 
   /* (not (bswap x)) -> (bswap (not x)).  */
-  if (GET_CODE (op) == BSWAP)
+  if (GET_CODE (op) == BSWAP || GET_CODE (op) == BITREVERSE)
{
  rtx x = simplify_gen_unary (NOT, mode, XEXP (op, 0), mode);
- return simplify_gen_unary (BSWAP, mode, x, mode);
+ return simplify_gen_unary (GET_CODE (op), mode, x, mode);
}
   break;
 
@@ -1419,6 +1419,7 @@ simplify_context::simplify_unary_operation_1 (rtx_code 
code, machine_mode mode,
   switch (GET_CODE (op))
{
case BSWAP:
+   case BITREVERSE:
  /* (popcount (bswap )) = (popcount ).  */
  return simplify_gen_unary (POPCOUNT, mode, XEXP (op, 0),
 GET_MODE (XEXP (op, 0)));
@@ -1448,6 +1449,7 @@ simplify_context::simplify_unary_operation_1 (rtx_code 
code, machine_mode mode,
{
case NOT:
case BSWAP:
+   case BITREVERSE:
  return simplify_gen_unary (PARITY, mode, XEXP (op, 0),
 GET_MODE (XEXP (op, 0)));
 
@@ -1481,6 +1483,12 @

Re: [PATCH v6 1/9] RISC-V: autovec: Add new predicates and function prototypes

2023-05-06 Thread Jeff Law via Gcc-patches




On 5/5/23 09:45, Michael Collison wrote:

2023-04-24  Michael Collison  
Juzhe Zhong  

* config/riscv/riscv-protos.h
(riscv_vector_preferred_simd_mode): New.
(riscv_vector_mask_mode_p): Ditto.
(riscv_vector_get_mask_mode): Ditto.
(emit_vlmax_vsetvl): Ditto.
(get_mask_policy_no_pred): Ditto.
(get_tail_policy_no_pred): Ditto.
(vlmul_field_enum): Ditto.
* config/riscv/riscv-v.cc (emit_vlmax_vsetvl):
Remove static scope.
* config/riscv/riscv-opts.h (riscv_vector_lmul_enum): New enum.
---
  gcc/config/riscv/riscv-opts.h   | 10 ++
  gcc/config/riscv/riscv-protos.h |  9 +
  2 files changed, 19 insertions(+)

diff --git a/gcc/config/riscv/riscv-opts.h b/gcc/config/riscv/riscv-opts.h
index 4207db240ea..00c4ab222ae 100644
--- a/gcc/config/riscv/riscv-opts.h
+++ b/gcc/config/riscv/riscv-opts.h
@@ -67,6 +67,7 @@ enum stack_protector_guard {
SSP_GLOBAL  /* global canary */
  };
  
+

  /* RISC-V auto-vectorization preference.  */
  enum riscv_autovec_preference_enum {
NO_AUTOVEC,

Extraneous change.  Removed.


@@ -82,6 +83,15 @@ enum riscv_autovec_lmul_enum {
RVV_M8 = 8
  };
  
+/* vectorization factor.  */

+enum riscv_vector_lmul_enum
+{
+  RVV_LMUL1 = 1,
+  RVV_LMUL2 = 2,
+  RVV_LMUL4 = 4,
+  RVV_LMUL8 = 8
+};
+
  #define MASK_ZICSR(1 << 0)
  #define MASK_ZIFENCEI (1 << 1)
  
I ack'd this hunk earlier, but Kito asked for it to be removed.  Given I 
don't see any uses of LMUL in the series, I'm just going to remove this 
for now.  We can always add it back at the point where we need it.




diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 33eb574aadc..fb39b856735 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -243,4 +243,13 @@ th_mempair_output_move (rtx[4], bool, machine_mode, 
RTX_CODE);
  #endif
  
  extern bool riscv_use_divmod_expander (void);

+/* Routines implemented in riscv-v.cc.  */
+
+namespace riscv_vector {
+extern machine_mode riscv_vector_preferred_simd_mode (scalar_mode mode);

This prototype is on the trunk now.


+extern bool riscv_vector_mask_mode_p (machine_mode);
+extern opt_machine_mode riscv_vector_get_mask_mode (machine_mode mode);
+extern rtx get_mask_policy_no_pred ();
+extern rtx get_tail_policy_no_pred ();
I'll go ahead and commit these.  I think that's all that's left from 
this patch.  Going forward, the right time to add the prototypes is in 
the same patch that adds the function.




Jeff




Pushed: [PATCH] build: Use -nostdinc generating macro_list [PR109522]

2023-05-06 Thread Xi Ruoyao via Gcc-patches
On Sat, 2023-04-29 at 12:05 -0600, Jeff Law wrote:
> 
> 
> On 4/15/23 06:01, Xi Ruoyao via Gcc-patches wrote:
> > This prevents a spurious message building a cross-compiler when
> > target
> > libc is not installed yet:
> > 
> >  cc1: error: no include path in which to search for stdc-
> > predef.h
> > 
> > As stdc-predef.h was added to define __STDC_* macros by libc, it's
> > unlikely the header will ever contain some bad definitions w/o "__"
> > prefix so it should be safe.
> > 
> > gcc/ChangeLog:
> > 
> > PR other/109522
> > * Makefile.in (s-macro_list): Pass -nostdinc to
> > $(GCC_FOR_TARGET).
> OK.  Thanks.
> 
> jeff

Pushed r14-544.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Pushed: [PATCH v2] LoongArch: Enable shrink wrapping

2023-05-06 Thread Xi Ruoyao via Gcc-patches
On Wed, 2023-04-26 at 21:29 +0800, Xi Ruoyao via Gcc-patches wrote:
> > 
> >    Do you have any questions about the test cases mentioned by
> > Guo
> > Jie? If there is no problem, modify the test case,
> > 
> > I think the code can be merged into the main branch.
> 
> I'll rewrite the test and commit in a few days (now I'm occupied with
> something :( ).

The patch has been pushed as the following (with test updated). 
Unfortunately I forgot to modify the change log to include the SPEC
result and the change for test case :(.

-- >8 --

From d90eed13ae655fbb4adb173fdae392b082e82a56 Mon Sep 17 00:00:00 2001
From: Xi Ruoyao 
Date: Sun, 23 Apr 2023 20:52:22 +0800
Subject: [PATCH] LoongArch: Enable shrink wrapping

This commit implements the target macros for shrink wrapping of function
prologues/epilogues on LoongArch.

Bootstrapped and regtested on loongarch64-linux-gnu.  I don't have
access to SPEC CPU so I hope the reviewer can perform a benchmark to see
if there is real benefit.

gcc/ChangeLog:

* config/loongarch/loongarch.h (struct machine_function): Add
reg_is_wrapped_separately array for register wrapping
information.
* config/loongarch/loongarch.cc
(loongarch_get_separate_components): New function.
(loongarch_components_for_bb): Likewise.
(loongarch_disqualify_components): Likewise.
(loongarch_process_components): Likewise.
(loongarch_emit_prologue_components): Likewise.
(loongarch_emit_epilogue_components): Likewise.
(loongarch_set_handled_components): Likewise.
(TARGET_SHRINK_WRAP_GET_SEPARATE_COMPONENTS): Define.
(TARGET_SHRINK_WRAP_COMPONENTS_FOR_BB): Likewise.
(TARGET_SHRINK_WRAP_DISQUALIFY_COMPONENTS): Likewise.
(TARGET_SHRINK_WRAP_EMIT_PROLOGUE_COMPONENTS): Likewise.
(TARGET_SHRINK_WRAP_EMIT_EPILOGUE_COMPONENTS): Likewise.
(TARGET_SHRINK_WRAP_SET_HANDLED_COMPONENTS): Likewise.
(loongarch_for_each_saved_reg): Skip registers that are wrapped
separately.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/shrink-wrap.c: New test.
---
 gcc/config/loongarch/loongarch.cc | 179 +-
 gcc/config/loongarch/loongarch.h  |   2 +
 .../gcc.target/loongarch/shrink-wrap.c|  19 ++
 3 files changed, 197 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/loongarch/shrink-wrap.c

diff --git a/gcc/config/loongarch/loongarch.cc 
b/gcc/config/loongarch/loongarch.cc
index d808cb3a5ae..7f4e0e59573 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -64,6 +64,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "builtins.h"
 #include "rtl-iter.h"
 #include "opts.h"
+#include "function-abi.h"
 
 /* This file should be included last.  */
 #include "target-def.h"
@@ -1017,19 +1018,23 @@ loongarch_for_each_saved_reg (HOST_WIDE_INT sp_offset,
   for (int regno = GP_REG_FIRST; regno <= GP_REG_LAST; regno++)
 if (BITSET_P (cfun->machine->frame.mask, regno - GP_REG_FIRST))
   {
-   loongarch_save_restore_reg (word_mode, regno, offset, fn);
+   if (!cfun->machine->reg_is_wrapped_separately[regno])
+ loongarch_save_restore_reg (word_mode, regno, offset, fn);
+
offset -= UNITS_PER_WORD;
   }
 
   /* This loop must iterate over the same space as its companion in
  loongarch_compute_frame_info.  */
   offset = cfun->machine->frame.fp_sp_offset - sp_offset;
+  machine_mode mode = TARGET_DOUBLE_FLOAT ? DFmode : SFmode;
+
   for (int regno = FP_REG_FIRST; regno <= FP_REG_LAST; regno++)
 if (BITSET_P (cfun->machine->frame.fmask, regno - FP_REG_FIRST))
   {
-   machine_mode mode = TARGET_DOUBLE_FLOAT ? DFmode : SFmode;
+   if (!cfun->machine->reg_is_wrapped_separately[regno])
+ loongarch_save_restore_reg (word_mode, regno, offset, fn);
 
-   loongarch_save_restore_reg (mode, regno, offset, fn);
offset -= GET_MODE_SIZE (mode);
   }
 }
@@ -6633,6 +6638,151 @@ loongarch_asan_shadow_offset (void)
   return TARGET_64BIT ? (HOST_WIDE_INT_1 << 46) : 0;
 }
 
+static sbitmap
+loongarch_get_separate_components (void)
+{
+  HOST_WIDE_INT offset;
+  sbitmap components = sbitmap_alloc (FIRST_PSEUDO_REGISTER);
+  bitmap_clear (components);
+  offset = cfun->machine->frame.gp_sp_offset;
+
+  /* The stack should be aligned to 16-bytes boundary, so we can make the use
+ of ldptr instructions.  */
+  gcc_assert (offset % UNITS_PER_WORD == 0);
+
+  for (unsigned int regno = GP_REG_FIRST; regno <= GP_REG_LAST; regno++)
+if (BITSET_P (cfun->machine->frame.mask, regno - GP_REG_FIRST))
+  {
+   /* We can wrap general registers saved at [sp, sp + 32768) using the
+  ldptr/stptr instructions.  For large offsets a pseudo register
+  might be needed which cannot be created during the shrink
+  wrapping pass.
+
+  TODO: This may need a revi

Re: [PATCH] LoongArch: Enable shrink wrapping

2023-05-06 Thread Xi Ruoyao via Gcc-patches
On Wed, 2023-04-26 at 18:21 +0800, WANG Xuerui wrote:
> On 2023/4/26 18:14, Lulu Cheng wrote:
> > 
> > On 2023/4/26 6:02 PM, WANG Xuerui wrote:
> > > 
> > > On 2023/4/26 17:53, Lulu Cheng wrote:
> > > > Hi, ruoyao:
> > > > 
> > > >   The performance of spec2006 is finished. The fixed-point 
> > > > 400.perlbench has about 3% performance improvement,
> > > > 
> > > > and the other basics have not changed, and the floating-point tests 
> > > > have basically remained the same.
> > > Nice to know!
> > > > 
> > > >   Do you have any questions about the test cases mentioned by 
> > > > Guo Jie? If there is no problem, modify the test case,
> > > > 
> > > > I think the code can be merged into the main branch.
> > > > 
> > > > 
> > > BTW what about the previous function/loop alignment patches? The LLVM 
> > > changes are also waiting for such results. ;-)
> > Well, there are many combinations in this align test, so the test time 
> > will be very long. I will reply the result as soon as the test results 
> > come out.:-)
> > 
> Oh, I got. Thanks very much for all the tests and take your time!

Sorry if it's noisy, but I hope there is some (maybe preliminary)
result: now I finally have some spare time to rebuild the system with
GCC 13 and I'd like to use some -falign-functions= in my CFLAGS :).

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH v6 2/9] RISC-V: autovec: Export policy functions to global scope

2023-05-06 Thread Jeff Law via Gcc-patches




On 5/5/23 09:46, Michael Collison wrote:

2023-03-02  Michael Collison  
Juzhe Zhong  

* config/riscv/riscv-vector-builtins.cc (get_tail_policy_for_pred):
Remove static declaration to make externally visible.
(get_mask_policy_for_pred): Ditto.
* config/riscv/riscv-vector-builtins.h (get_tail_policy_for_pred):
New external declaration.
(get_mask_policy_for_pred): Ditto.

Thanks.  I've pushed this to the trunk.
jeff


[libgcc PATCH] Add bit reversal functions __bitrev[qhsd]i2.

2023-05-06 Thread Roger Sayle

This patch proposes adding run-time library support for bit reversal,
by adding a __bitrevsi2 function to libgcc.  Thoughts/opinions?
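
For readers unfamiliar with the technique, here is a minimal C sketch of
a 32-bit bit reversal done by swapping progressively larger groups of
bits.  It is only an illustration: the name bitrev32_sketch is
hypothetical, it works on an unsigned type, and it finishes with a manual
half-word swap rather than calling a byte-swap routine.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper for illustration; not the libgcc entry point.
   Reverse the order of the 32 bits in U.  */
static uint32_t
bitrev32_sketch (uint32_t u)
{
  /* Swap adjacent bits, then bit pairs, then nibbles.  */
  u = ((u >> 1) & 0x55555555u) | ((u & 0x55555555u) << 1);
  u = ((u >> 2) & 0x33333333u) | ((u & 0x33333333u) << 2);
  u = ((u >> 4) & 0x0f0f0f0fu) | ((u & 0x0f0f0f0fu) << 4);
  /* Swap bytes within each half-word, then the two half-words.  */
  u = ((u >> 8) & 0x00ff00ffu) | ((u & 0x00ff00ffu) << 8);
  return (u >> 16) | (u << 16);
}
```

Reversing twice gives back the original value, which makes a convenient
sanity check for any implementation of this kind.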

I'm also tempted to add __popcount[qh]i2 and __parity[qh]i2 to libgcc,
to allow the RTL optimizers to perform narrowing operations, but I'm
curious to hear whether QImode and HImode support, though more efficient,
is frowned upon by the libgcc maintainers/philosophy.

This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32} and
on nvptx-none, with no new regressions.  Ok for mainline?


2023-05-06  Roger Sayle  

gcc/ChangeLog
* doc/libgcc.texi (__bitrevqi2): Document bit reversal run-time
functions; __bitrevqi2, __bitrevhi2, __bitrevsi2 and __bitrevdi2.

libgcc/ChangeLog
* Makefile.in (lib2funcs): Add __bitrev[qhsd]i2.
* libgcc-std.ver.in (GCC_14.0.0): Add __bitrev[qhsd]i2.
* libgcc2.c (__bitrevqi2): New function.
(__bitrevhi2): Likewise.
(__bitrevsi2): Likewise.
(__bitrevdi2): Likewise.
* libgcc2.h (__bitrevqi2): Prototype here.
(__bitrevhi2): Likewise.
(__bitrevsi2): Likewise.
(__bitrevdi2): Likewise.

Thanks in advance,
Roger
--

diff --git a/gcc/doc/libgcc.texi b/gcc/doc/libgcc.texi
index 73aa803..7611347 100644
--- a/gcc/doc/libgcc.texi
+++ b/gcc/doc/libgcc.texi
@@ -218,6 +218,13 @@ These functions return the number of bits set in @var{a}.
 These functions return the @var{a} byteswapped.
 @end deftypefn
 
+@deftypefn {Runtime Function} int8_t __bitrevqi2 (int8_t @var{a})
+@deftypefnx {Runtime Function} int16_t __bitrevhi2 (int16_t @var{a})
+@deftypefnx {Runtime Function} int32_t __bitrevsi2 (int32_t @var{a})
+@deftypefnx {Runtime Function} int64_t __bitrevdi2 (int64_t @var{a})
+These functions return the bit reversed @var{a}.
+@end deftypefn
+
 @node Soft float library routines
 @section Routines for floating point emulation
 @cindex soft float library
diff --git a/libgcc/Makefile.in b/libgcc/Makefile.in
index 6c4dc79..67c54df 100644
--- a/libgcc/Makefile.in
+++ b/libgcc/Makefile.in
@@ -446,7 +446,7 @@ lib2funcs = _muldi3 _negdi2 _lshrdi3 _ashldi3 _ashrdi3 
_cmpdi2 _ucmpdi2\
_paritysi2 _paritydi2 _powisf2 _powidf2 _powixf2 _powitf2  \
_mulhc3 _mulsc3 _muldc3 _mulxc3 _multc3 _divhc3 _divsc3\
_divdc3 _divxc3 _divtc3 _bswapsi2 _bswapdi2 _clrsbsi2  \
-   _clrsbdi2
+   _clrsbdi2 _bitrevqi2 _bitrevhi2 _bitrevsi2 _bitrevdi2
 
 # The floating-point conversion routines that involve a single-word integer.
 # XX stands for the integer mode.
diff --git a/libgcc/libgcc-std.ver.in b/libgcc/libgcc-std.ver.in
index c4f87a5..2198b0e 100644
--- a/libgcc/libgcc-std.ver.in
+++ b/libgcc/libgcc-std.ver.in
@@ -1944,3 +1944,12 @@ GCC_7.0.0 {
   __PFX__divmoddi4
   __PFX__divmodti4
 }
+
+%inherit GCC_14.0.0 GCC_7.0.0
+GCC_14.0.0 {
+  # bit reversal functions
+  __PFX__bitrevqi2
+  __PFX__bitrevhi2
+  __PFX__bitrevsi2
+  __PFX__bitrevdi2
+}
diff --git a/libgcc/libgcc2.c b/libgcc/libgcc2.c
index e0017d1..2bef2a1 100644
--- a/libgcc/libgcc2.c
+++ b/libgcc/libgcc2.c
@@ -488,6 +488,54 @@ __bswapdi2 (DItype u)
  | (((u) & 0x00000000000000ffull) << 56));
 }
 #endif
+
+#ifdef L_bitrevqi2
+QItype
+__bitrevqi2 (QItype x)
+{
+  UQItype u = x;
+  u = (((u) >> 1) & 0x55) | (((u) & 0x55) << 1);
+  u = (((u) >> 2) & 0x33) | (((u) & 0x33) << 2);
+  return ((u) >> 4) | ((u) << 4);
+}
+#endif
+#ifdef L_bitrevhi2
+HItype
+__bitrevhi2 (HItype x)
+{
+  UHItype u = x;
+  u = (((u) >> 1) & 0x5555) | (((u) & 0x5555) << 1);
+  u = (((u) >> 2) & 0x3333) | (((u) & 0x3333) << 2);
+  u = (((u) >> 4) & 0x0f0f) | (((u) & 0x0f0f) << 4);
+  return ((u) >> 8) | ((u) << 8);
+}
+#endif
+#ifdef L_bitrevsi2
+SItype
+__bitrevsi2 (SItype x)
+{
+  USItype u = x;
+  u = (((u) >> 1) & 0x55555555) | (((u) & 0x55555555) << 1);
+  u = (((u) >> 2) & 0x33333333) | (((u) & 0x33333333) << 2);
+  u = (((u) >> 4) & 0x0f0f0f0f) | (((u) & 0x0f0f0f0f) << 4);
+  return __bswapsi2 (u);
+}
+#endif
+#ifdef L_bitrevdi2
+DItype
+__bitrevdi2 (DItype x)
+{
+  UDItype u = x;
+  u = (((u) >> 1) & 0x5555555555555555ll)
+  | (((u) & 0x5555555555555555ll) << 1);
+  u = (((u) >> 2) & 0x3333333333333333ll)
+  | (((u) & 0x3333333333333333ll) << 2);
+  u = (((u) >> 4) & 0x0f0f0f0f0f0f0f0fll)
+  | (((u) & 0x0f0f0f0f0f0f0f0fll) << 4);
+  return __bswapdi2 (u);
+}
+#endif
+
 #ifdef L_ffssi2
 #undef int
 int
diff --git a/libgcc/libgcc2.h b/libgcc/libgcc2.h
index 3ec9bbd..e1abc0d 100644
--- a/libgcc/libgcc2.h
+++ b/libgcc/libgcc2.h
@@ -338,6 +338,10 @@ typedef int shift_count_type __attribute__((mode 
(__libgcc_shift_count__)));
 #define __udiv_w_sdiv  __N(udiv_w_sdiv)
 #define __clear_cache  __N(clear_cache)
 #define __enable_execute_stack __N(enable_execute_stack)
+#define __bitrevqi2            __N(bitrevqi2)
+#define __bitrevhi2            __N(bitrevhi2)
+#define __bitrevsi2  

Re: [libgcc PATCH] Add bit reversal functions __bitrev[qhsd]i2.

2023-05-06 Thread Andrew Pinski via Gcc-patches
On Sat, May 6, 2023 at 10:26 AM Roger Sayle  wrote:
>
>
> This patch proposes adding run-time library support for bit reversal,
> by adding a __bitrevsi2 function to libgcc.  Thoughts/opinions?

Are you going to add a builtin for these functions too? If so that is
recorded as https://gcc.gnu.org/bugzilla/show_bug.cgi?id=50481 .

Thanks,
Andrew

>
> I'm also tempted to add __popcount[qh]i2 and __parity[qh]i2 to libgcc,
> to allow the RTL optimizers to perform narrowing operations, but I'm
> curious to hear whether QImode and HImode support, though more efficient,
> is frowned upon by the libgcc maintainers/philosophy.
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32} and
> on nvptx-none, with no new regressions.  Ok for mainline?
>
>
> 2023-05-06  Roger Sayle  
>
> gcc/ChangeLog
> * doc/libgcc.texi (__bitrevqi2): Document bit reversal run-time
> functions; __bitrevqi2, __bitrevhi2, __bitrevsi2 and __bitrevdi2.
>
> libgcc/ChangeLog
> * Makefile.in (lib2funcs): Add __bitrev[qhsd]i2.
> * libgcc-std.ver.in (GCC_14.0.0): Add __bitrev[qhsd]i2.
> * libgcc2.c (__bitrevqi2): New function.
> (__bitrevhi2): Likewise.
> (__bitrevsi2): Likewise.
> (__bitrevdi2): Likewise.
> * libgcc2.h (__bitrevqi2): Prototype here.
> (__bitrevhi2): Likewise.
> (__bitrevsi2): Likewise.
> (__bitrevdi2): Likewise.
>
> Thanks in advance,
> Roger
> --
>


Re: [PATCH v6 3/9] RISC-V:autovec: Add auto-vectorization support functions

2023-05-06 Thread Jeff Law via Gcc-patches




On 5/5/23 09:46, Michael Collison wrote:

2023-04-24  Michael Collison  
Juzhe Zhong  

* config/riscv/riscv-v.cc
(riscv_vector_preferred_simd_mode): New function.
(get_mask_policy_no_pred): Ditto.
(get_tail_policy_no_pred): Ditto.
(riscv_vector_mask_mode_p): Ditto.
(riscv_vector_get_mask_mode): Ditto.

Didn't include the addition of autovec_use_vlmax_p.  Fixed.


---


  
+/* SCALABLE means that the vector-length is agnostic (run-time invariant and

+   compile-time unknown). FIXED meands that the vector-length is specific
+   (compile-time known). Both RVV_SCALABLE and RVV_FIXED_VLMAX are doing
+   auto-vectorization using VLMAX vsetvl configuration.  */
+static bool
+autovec_use_vlmax_p (void)
+{
+  return riscv_autovec_preference == RVV_SCALABLE
+|| riscv_autovec_preference == RVV_FIXED_VLMAX;
When a line gets wrapped, add parens and adjust indentation accordingly. 
 I've fixed it this time in the interests of getting this stuff unblocked.




+}
+
+/* Return the vectorization machine mode for RVV according to LMUL.  */
+machine_mode
+riscv_vector_preferred_simd_mode (scalar_mode mode)
+{
+  /* We only enable auto-vectorization when TARGET_MIN_VLEN >= 128 &&
+ riscv_autovec_lmul < RVV_M2. Since GCC loop vectorizer report ICE
+ when we enable -march=rv64gc_zve32* and -march=rv32gc_zve64*.
+ in the 'can_duplicate_and_interleave_p' of tree-vect-slp.cc. Since we have
+ VNx1SImode in -march=*zve32* and VNx1DImode in -march=*zve64*, they are
+ enabled in targetm. vector_mode_supported_p and SLP vectorizer will try to
+ use them. Currently, we can support auto-vectorization in
+ -march=rv32_zve32x_zvl128b. Wheras, -march=rv32_zve32x_zvl32b or
+ -march=rv32_zve32x_zvl64b are disabled.
+ */
Another nit.  Go ahead and close the comment on the last line of text. 
I think my question from last week still stands.




+  if (autovec_use_vlmax_p ())
+{
+  /* If TARGET_MIN_VLEN < 128, we don't allow LMUL < 2
+auto-vectorization since Loop Vectorizer may use VNx1SImode or
+VNx1DImode to vectorize which will create ICE in the
+'can_duplicate_and_interleave_p' of tree-vect-slp.cc.  */

Seems redundant with outer conditional.  Removed.



+  if (TARGET_MIN_VLEN < 128 && riscv_autovec_lmul < RVV_M2)
+   return word_mode;
+  /* We use LMUL = 1 as base bytesize which is BYTES_PER_RISCV_VECTOR and
+riscv_autovec_lmul as multiply factor to calculate the the NUNITS to
+get the auto-vectorization mode.  */
+  poly_uint64 nunits;
+  poly_uint64 vector_size
+   = BYTES_PER_RISCV_VECTOR * ((int) riscv_autovec_lmul);
+  poly_uint64 scalar_size = GET_MODE_SIZE (mode);
+  gcc_assert (multiple_p (vector_size, scalar_size, &nunits));
+  machine_mode rvv_mode;
+  if (get_vector_mode (mode, nunits).exists (&rvv_mode))
+   return rvv_mode;
+}
+  /* TODO: We will support minimum length VLS auto-vectorization in the future.
+   */

Rewrapped to avoid having the close comment on a line by itself.


@@ -430,6 +482,45 @@ get_avl_type_rtx (enum avl_type type)
return gen_int_mode (type, Pmode);
  }
  
+/* Return the mask policy for no predication.  */

+rtx
+get_mask_policy_no_pred ()
+{
+  return get_mask_policy_for_pred (PRED_TYPE_none);
+}
+
+/* Return the tail policy for no predication.  */
+rtx
+get_tail_policy_no_pred ()
+{
+  return get_tail_policy_for_pred (PRED_TYPE_none);
+}

Added explicit "void" to the argument list for those two functions.




+/* Return the appropriate mask mode for MODE.  */
+
+opt_machine_mode
+riscv_vector_get_mask_mode (machine_mode mode)
+{
+  machine_mode mask_mode;
+  int nf = 1;
+
+  FOR_EACH_MODE_IN_CLASS (mask_mode, MODE_VECTOR_BOOL)
+  if (GET_MODE_INNER (mask_mode) == BImode
+  && known_eq (GET_MODE_NUNITS (mask_mode) * nf, GET_MODE_NUNITS (mode))
+  && riscv_vector_mask_mode_p (mask_mode))
+return mask_mode;
Presumably the IF is part of the loop, meaning it needs to be indented 
to show that relationship.  Fixed.


Pushed to the trunk with the above fixes.
jeff
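
As a rough illustration of the NUNITS computation in the quoted
riscv_vector_preferred_simd_mode, here is a minimal C sketch using plain
integers.  This is an assumption-laden simplification: in GCC the sizes
are poly_uint64 values and the divisibility check is a gcc_assert on
multiple_p, whereas the hypothetical nunits_sketch below takes
compile-time-known byte counts and returns 0 when the division is not
exact.

```c
#include <assert.h>

/* Hypothetical stand-in for the NUNITS calculation.  BYTES_PER_VECTOR
   plays the role of the LMUL = 1 vector size, LMUL is the multiply
   factor, and SCALAR_SIZE is the element size in bytes.  Returns the
   element count, or 0 if the vector size is not a multiple of the
   element size.  */
static unsigned
nunits_sketch (unsigned bytes_per_vector, unsigned lmul,
	       unsigned scalar_size)
{
  unsigned vector_size = bytes_per_vector * lmul;

  if (scalar_size == 0 || vector_size % scalar_size != 0)
    return 0;
  return vector_size / scalar_size;
}
```

For example, a 16-byte base vector with LMUL = 2 and 8-byte elements
yields 4 elements under these assumptions.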


Re: [patch, fortran] PR109662 Namelist input with comma after name accepted

2023-05-06 Thread Harald Anlauf via Gcc-patches

Hi Jerry, Steve,

I think I have to pour a little water into the wine.

The patch fixes the reported issue only for a comma after
the namelist name, but we still accept a few other illegal
characters, e.g. ';', because:

#define is_separator(c) (c == '/' ||  c == ',' || c == '\n' || c == ' ' \
 || c == '\t' || c == '\r' || c == ';' || \
 (dtp->u.p.namelist_mode && c == '!'))

We don't want that in standard conformance mode, or do we?
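
To make the concern concrete, here is the macro above written as a small
C function.  This is only a sketch: is_separator_sketch is a hypothetical
name, and its namelist_mode parameter stands in for
dtp->u.p.namelist_mode.

```c
#include <assert.h>
#include <stdbool.h>

/* Function form of the is_separator macro quoted above.
   NAMELIST_MODE is a hypothetical stand-in for
   dtp->u.p.namelist_mode.  */
static bool
is_separator_sketch (int c, bool namelist_mode)
{
  return c == '/' || c == ',' || c == '\n' || c == ' '
	 || c == '\t' || c == '\r' || c == ';'
	 || (namelist_mode && c == '!');
}
```

With this predicate both ',' and ';' count as separators regardless of
mode, which is why a check for the comma alone leaves ';' accepted.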

Cheers,
Harald

On 5/6/23 06:02, Steve Kargl via Gcc-patches wrote:

On Fri, May 05, 2023 at 08:41:48PM -0700, Jerry D via Fortran wrote:

The attached patch adds a check for the invalid comma and emits a runtime
error if -std=f95,f2003,f2018 are specified at compile time.

Attached patch includes a new test case.

Regression tested on x86_64-linux-gnu.

OK for mainline?



Yes.  Thanks for the fix.  It's been a long time since
I looked at libgfortran code and couldn't quite determine
where to start to fix this.





Re: [PATCH v6 4/9] RISC-V:autovec: Add target vectorization hooks

2023-05-06 Thread Jeff Law via Gcc-patches




On 5/5/23 09:46, Michael Collison wrote:

2023-04-24  Michael Collison  
Juzhe Zhong  

* config/riscv/riscv.cc
(riscv_estimated_poly_value): Implement
TARGET_ESTIMATED_POLY_VALUE.
(riscv_preferred_simd_mode): Implement
TARGET_VECTORIZE_PREFERRED_SIMD_MODE.
(riscv_get_mask_mode): Implement TARGET_VECTORIZE_GET_MASK_MODE.
(riscv_empty_mask_is_expensive): Implement
TARGET_VECTORIZE_EMPTY_MASK_IS_EXPENSIVE.
(riscv_vectorize_create_costs): Implement
TARGET_VECTORIZE_CREATE_COSTS.
(riscv_support_vector_misalignment): Implement
TARGET_VECTORIZE_SUPPORT_VECTOR_MISALIGNMENT.
(TARGET_ESTIMATED_POLY_VALUE): Register target macro.
(TARGET_VECTORIZE_GET_MASK_MODE): Ditto.
(TARGET_VECTORIZE_EMPTY_MASK_IS_EXPENSIVE): Ditto.
(TARGET_VECTORIZE_SUPPORT_VECTOR_MISALIGNMENT): Ditto.
Thanks.  I removed the duplicated preferred_simd_mode definition and 
related macro and pushed this to the trunk.

jeff


Re: [PATCH v6 7/9] RISC-V: autovec: Verify that GET_MODE_NUNITS is a multiple of 2.

2023-05-06 Thread Jeff Law via Gcc-patches




On 5/5/23 09:46, Michael Collison wrote:

While working on autovectorizing for the RISC-V port I encountered an issue
where can_duplicate_and_interleave_p assumes that GET_MODE_NUNITS is
evenly divisible by two. The RISC-V target has vector modes (e.g. VNx1DImode)
where GET_MODE_NUNITS is equal to one.
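
The shape of the check can be sketched in plain C as follows.  This is
an illustration only: halve_nunits_sketch is a hypothetical name, and it
uses ordinary unsigned integers where the vectorizer works with
poly-int element counts.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the guard: only halve an element count
   when it is a multiple of 2.  On success store the half in *HALF and
   return true; otherwise leave *HALF untouched and return false.  */
static bool
halve_nunits_sketch (unsigned nunits, unsigned *half)
{
  if (nunits % 2 != 0)
    return false;
  *half = nunits / 2;
  return true;
}
```

A mode with a single element, as in the VNx1DImode example, is rejected
by such a guard instead of being split.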

Tested on RISCV and x86_64-linux-gnu. Okay?

2023-03-09  Michael Collison  

* tree-vect-slp.cc (can_duplicate_and_interleave_p):
Check that GET_MODE_NUNITS is a multiple of 2.
I've pushed this to the trunk given it was acked by Richard S and he 
explicitly indicated it need not wait for all the patches in this kit.


jeff


Re: [PATCH] Don't call emit_clobber in lower-subreg.cc's resolve_simple_move.

2023-05-06 Thread Jeff Law via Gcc-patches




On 5/6/23 06:57, Roger Sayle wrote:


Following up on posts/reviews by Segher and Uros, there's some question
over why the middle-end's lower subreg pass emits a clobber (of a
multi-word register) into the instruction stream before emitting the
sequence of moves of the word-sized parts.  This clobber interferes
with (LRA) register allocation, preventing the multi-word pseudo from
remaining in the same hard registers.  This patch eliminates this
(presumably superfluous) clobber and thereby improves register allocation.
Those clobbers used to help dataflow analysis know that a multi-word 
register was fully assigned by a subsequent sequence.  I suspect they 
haven't been terribly useful in quite a while.





A concrete example of the observed improvement is PR target/43644.
For the test case:
__int128 foo(__int128 x, __int128 y) { return x+y; }

on x86_64-pc-linux-gnu, gcc -O2 currently generates:

foo:    movq    %rsi, %rax
        movq    %rdi, %r8
        movq    %rax, %rdi
        movq    %rdx, %rax
        movq    %rcx, %rdx
        addq    %r8, %rax
        adcq    %rdi, %rdx
        ret

with this patch, we now generate the much improved:

foo:    movq    %rdx, %rax
        movq    %rcx, %rdx
        addq    %rdi, %rax
        adcq    %rsi, %rdx
        ret

This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32} with
no new failures.  OK for mainline?


2023-05-06  Roger Sayle  

gcc/ChangeLog
 PR target/43644
 * lower-subreg.cc (resolve_simple_move): Don't emit a clobber
 immediately before moving a multi-word register by parts.

gcc/testsuite/ChangeLog
 PR target/43644
 * gcc.target/i386/pr43644.c: New test case.
OK for the trunk.  I won't be at all surprised to see fallout in the 
various target tests.  We can fault in fixes as needed.  More 
importantly I think we want as much soak time for this change as we can 
in case there are unexpected consequences.


jeff


Re: Support parallel testing in libgomp, part I [PR66005]

2023-05-06 Thread Bernhard Reutner-Fischer via Gcc-patches
On Fri, 5 May 2023 10:55:41 +0200
Thomas Schwinge  wrote:

> So I recently had re-created this patch independently, before remembering
> that Rainer had -- just eight years ago... ;-) -- already submitted this.

thanks to you both :)

> etc. (where "normal" is a libstdc++ detail), and regarding:
> 
> >> > with a minimal change
> >> > to libgomp.exp so the generated libgomp-test-support.exp file is found
> >> > in both the sequential and parallel cases.  This isn't an issue in
> >> > libstdc++ since all necessary variables are stored in a single
> >> > site.exp.  
> 
> ... in 'libgomp/testsuite/lib/libgomp.exp', I've changed:
> 
> -load_file libgomp-test-support.exp
> +# Search in both .. and . to support parallel and sequential testing.
> +load_file -1 ../libgomp-test-support.exp libgomp-test-support.exp
> 
> ... into the more explicit:
> 
> -load_file libgomp-test-support.exp
> +# Search in '..' vs. '.' to support parallel vs. sequential testing.
> +if [info exists ::env(GCC_RUNTEST_PARALLELIZE_DIR)] {
> +load_file ../libgomp-test-support.exp
> +} else {
> +load_file libgomp-test-support.exp
> +}

Do we have to re-read those? Otherwise this would be load_lib:

We have libdirs in the minimum deja we require.

Speaking of which, IIRC I additionally deleted all load_gcc_lib:
https://gcc.gnu.org/legacy-ml/fortran/2012-03/msg00094.html
in lib{atomic,ffi,go,gomp,itm,phobos,stdc++-v3,vtv}

> And, for now, I hard-code the number of parallel slots to one.  This
> means that libgomp for 'make -j' now does use the parallel testing code
> paths, but is restricted to just one slot.  That is, no actual change in
> behavior, other than that 'libgomp.sum' then is filtered through
> 'contrib/dg-extract-results.sh'.
> 
> OK to push the attached
> "Support parallel testing in libgomp, part I [PR66005]"?

Some cosmetic nits.
See Jakubs one_to_.

+   @test ! -f $*/site.exp || mv $*/site.exp $*/site.bak
that's twisted

+ rm -rf libgomp-parallel || true; \

just || :; \
I count 4 times.

There seems to be a mixture of ${PWD_COMMAND} and am__cd && pwd:
+   @objdir=`${PWD_COMMAND}`/$*; \
+   srcdir=`$(am__cd) $(srcdir) && pwd`; export srcdir; \

+   runtest=$(_RUNTEST); \
+   if [ -z "$$runtest" ]; then runtest=runtest; fi; \
I think I have plain $${RUNTEST-runtest}
off the default wildcard $(top_srcdir)/../dejagnu/runtest

> >> It is far from trivial though.
> >> The point is that most of the OpenMP tests are parallelized with the
> >> default OMP_NUM_THREADS, so running the tests in parallel oversubscribes 
> >> the
> >> machine a lot, the higher number of hw threads the more.  
> >
> > Do you agree that we have two classes of test cases in libgomp: 1) test
> > cases that don't place a considerably higher load on the machine compared
> > to "normal" (single-threaded) execution tests, because they're just
> > testing some functionality that is not expected to actively depend
> > on/interfere with parallelism.  If needed, and/or if not already done,
> > such test cases can be parameterized (OMP_NUM_THREADS, OpenACC num_gangs,
> > num_workers, vector_length clauses, and so on) for low parallelism
> > levels.  And, 2) test cases that place a considerably higher load on the
> > machine compared to "normal" (single-threaded) execution tests, because
> > they're testing some functionality that actively depends on/interferes
> > with some kind of parallelism.  What about marking such tests specially,
> > such that DejaGnu will only ever schedule one of them for execution at
> > the same time?  For example, a new dg-* directive to run them wrapped
> > through »flock [libgomp/testsuite/serial.lock] [a.out]« or some such?

I think we all agree on that, yes.

> >  
> >> If we go forward with some parallelization of the tests, we at least should
> >> try to export something like OMP_WAIT_POLICY=passive so that the
> >> oversubscribed machine would at least not spend too much time in spinning. 
> >>  
> >
> > (Will again have the problem that DejaGnu doesn't provide infrastructure
> > to communicate environment variables to boards in remote testing.)

Are you sure? I'm pretty confident that this worked fine at least at
one point in the past for certain targets.

The rest of these 2 patches LGTM. Let's see what others have to say.
thanks,


[committed] Partial revert of recent changes

2023-05-06 Thread Jeff Law via Gcc-patches


It turns out a couple of bits submitted by Michael had already been 
pushed to the trunk.  These two patches remove the duplicated bits.


Jeff

commit b9b7981f3d6919518372daf4c7e8c40dfc58f49d
Author: Jeff Law 
Date:   Sat May 6 11:36:37 2023 -0600

Remove duplicated definition in risc-v vector support.

gcc/

* config/riscv/riscv-v.cc (autovec_use_vlmax_p): Remove
duplicate definition.

diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 6f8b4abf46d..9d699d455b0 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -1015,17 +1015,6 @@ expand_tuple_move (machine_mode mask_mode, rtx *ops)
 }
 }
 
-/* SCALABLE means that the vector-length is agnostic (run-time invariant and
-   compile-time unknown). FIXED meands that the vector-length is specific
-   (compile-time known). Both RVV_SCALABLE and RVV_FIXED_VLMAX are doing
-   auto-vectorization using VLMAX vsetvl configuration.  */
-static bool
-autovec_use_vlmax_p (void)
-{
-  return riscv_autovec_preference == RVV_SCALABLE
-|| riscv_autovec_preference == RVV_FIXED_VLMAX;
-}
-
 /* Return the vectorization machine mode for RVV according to LMUL.  */
 machine_mode
 preferred_simd_mode (scalar_mode mode)
commit 4c05f966a098744db9fa1e73074d7c08ace446fd
Author: Jeff Law 
Date:   Sat May 6 13:28:33 2023 -0600

Delete duplicated riscv definition.

gcc/
* config/riscv/riscv-v.cc (riscv_vector_preferred_simd_mode): 
Delete.

diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 9d699d455b0..8c7f3206771 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -190,44 +190,6 @@ autovec_use_vlmax_p (void)
  || riscv_autovec_preference == RVV_FIXED_VLMAX);
 }
 
-/* Return the vectorization machine mode for RVV according to LMUL.  */
-machine_mode
-riscv_vector_preferred_simd_mode (scalar_mode mode)
-{
-  /* We only enable auto-vectorization when TARGET_MIN_VLEN >= 128 &&
- riscv_autovec_lmul < RVV_M2. Since GCC loop vectorizer report ICE
- when we enable -march=rv64gc_zve32* and -march=rv32gc_zve64*.
- in the 'can_duplicate_and_interleave_p' of tree-vect-slp.cc. Since we have
- VNx1SImode in -march=*zve32* and VNx1DImode in -march=*zve64*, they are
- enabled in targetm. vector_mode_supported_p and SLP vectorizer will try to
- use them. Currently, we can support auto-vectorization in
- -march=rv32_zve32x_zvl128b. Wheras, -march=rv32_zve32x_zvl32b or
- -march=rv32_zve32x_zvl64b are disabled.  */
-  if (autovec_use_vlmax_p ())
-{
-  /* If TARGET_MIN_VLEN < 128, we don't allow LMUL < 2
-auto-vectorization since Loop Vectorizer may use VNx1SImode or
-VNx1DImode to vectorize which will create ICE in the
-'can_duplicate_and_interleave_p' of tree-vect-slp.cc.  */
-  if (TARGET_MIN_VLEN < 128 && riscv_autovec_lmul < RVV_M2)
-   return word_mode;
-  /* We use LMUL = 1 as base bytesize which is BYTES_PER_RISCV_VECTOR and
-riscv_autovec_lmul as multiply factor to calculate the the NUNITS to
-get the auto-vectorization mode.  */
-  poly_uint64 nunits;
-  poly_uint64 vector_size
-   = BYTES_PER_RISCV_VECTOR * ((int) riscv_autovec_lmul);
-  poly_uint64 scalar_size = GET_MODE_SIZE (mode);
-  gcc_assert (multiple_p (vector_size, scalar_size, &nunits));
-  machine_mode rvv_mode;
-  if (get_vector_mode (mode, nunits).exists (&rvv_mode))
-   return rvv_mode;
-}
-  /* TODO: We will support minimum length VLS auto-vectorization in the future.
-   */
-  return word_mode;
-}
-
 /* Emit an RVV unmask && vl mov from SRC to DEST.  */
 static void
 emit_pred_op (unsigned icode, rtx mask, rtx dest, rtx src, rtx len,


Re: [PATCH] riscv: Allow vector constants in riscv_const_insns.

2023-05-06 Thread Jeff Law via Gcc-patches




On 5/3/23 23:07, juzhe.zh...@rivai.ai wrote:

The idea of this patch looks good to me.
But I think this patch should be able to handle more cases (not only -16 
~ 15) in case of CONST_VECTOR initialization.


Case 1 (Other constant value that is not -16 ~ 15):
void vmv_m##VAL (TYPE dst[], int n) \
{ \
     for (int i = 0; i < n; i++) \
       dst[i] = 100; \
   }

I guess for const_vector:100 is not optimal currently so far, I think 
you may try (and add testcases).

Such code can be:

Codegen 1:
  li a5,100
  vmv.v.x v1, a5

Codegen 2:
  vlse.v v24, (a5), zero ;; the memory at address a5 holds the value 100.

I am not sure codegen 1 or codegen 2, which one is better. I think you 
can decide it.
But my point is that this patch should handle and test not only the 
constant values -16 ~ 15 but also other constant values.


Case 2 (Constant value *within 32bit* for INT64 in *RV32* system):

This is a special case:

void vmv_i64 (TYPE dst[], int n)
{
     for (int i = 0; i < n; i++)
       dst[i] = *0x*;
  }

In this case, the codegen should be similar to Case 1 since each 
scalar register can hold the whole constant value.



Case 3 (Constant value over* 32bit* for INT64 in *RV32* system):

This is a special case:

void vmv_i64 (TYPE dst[], int n)
{
     for (int i = 0; i < n; i++)
       dst[i] = *0xA*;
  }

In this case, each scalar register can only hold a 32-bit value, which 
is not the whole constant value (*0xA*), so I think we can only use 
vlse.v...

Would you refine this patch more? Thanks.
I think we can add those as distinct patches.  The [-16..15] change is 
simple, stands on its own and I don't see any strong reason to make it 
wait for handling additional cases.


Remember, there are multiple engineers working in this space now.  So 
things which are clearly correct should move forward quickly so that we 
don't end up duplicating work.


The additional cases can be handled as a distinct patch on their 
own.


Jeff
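
The three cases discussed in this thread can be summarised as a small
decision function. The following is only a sketch: the enum and function
names are hypothetical and are not GCC identifiers, the choice of
"li + vmv.v.x" for Case 1 is one of the two options the thread leaves open,
and the RV32 check assumes that a scalar operand narrower than the element
width is sign-extended, so only constants in the signed 32-bit range count
as fitting in one register.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical strategy names, for illustration only.  */
enum splat_strategy
{
  SPLAT_VMV_V_I,  /* fits the 5-bit signed immediate: vmv.v.i */
  SPLAT_VMV_V_X,  /* fits one scalar register: li + vmv.v.x */
  SPLAT_VLSE_V    /* wider than a scalar register: vlse.v from memory */
};

/* Pick a strategy for broadcasting VALUE into 64-bit elements when
   scalar registers are XLEN bits wide (32 for RV32, 64 for RV64).  */
static enum splat_strategy
classify_splat (int64_t value, int xlen)
{
  assert (xlen == 32 || xlen == 64);

  /* The range handled by the patch under review.  */
  if (value >= -16 && value <= 15)
    return SPLAT_VMV_V_I;

  /* Case 1 and Case 2: the whole constant fits in one register.  */
  if (xlen == 64 || (value >= INT32_MIN && value <= INT32_MAX))
    return SPLAT_VMV_V_X;

  /* Case 3: a 64-bit constant on RV32 that needs more than 32 bits.  */
  return SPLAT_VLSE_V;
}
```

For example, 100 on RV32 maps to the register path, while a constant such
as 0x100000000 maps to the memory path on RV32 and to the register path on
RV64.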


Re: [PATCH v5 03/10] RISC-V:autovec: Add auto-vectorization support functions

2023-05-06 Thread Jeff Law via Gcc-patches




On 5/3/23 11:31, Michael Collison wrote:

HI Kito,

I see there have been many comments on the 
"riscv_vector_preferred_simd_mode" hook, is there an updated version?
I think there's a version on the trunk now.  So if there's updates to 
do, let's do them relative to what's on the trunk.


Jeff


Re: [PATCH] i386: Allow -mlarge-data-threshold with -mcmodel=large

2023-05-06 Thread Fangrui Song via Gcc-patches
On Thu, Apr 27, 2023 at 5:47 PM Fangrui Song  wrote:
>
> When using -mcmodel=medium, large data is placed into .l* sections.  GNU ld
> places .l* sections into separate output sections.  If small and medium
> code model object files are mixed, the .l* sections won't cause
> relocation overflow pressure on sections in -mcmodel=small object files.
>
> However, when using -mcmodel=large, -mlarge-data-threshold doesn't apply.
> This means that the .rodata/.data/.bss sections may cause relocation
> overflow pressure on sections in -mcmodel=small object files.
>
> This patch allows -mcmodel=large to generate .l* sections.
>
> Signed-off-by: Fangrui Song 
> ---
> [...]

Ping https://gcc.gnu.org/pipermail/gcc-patches/2023-April/616947.html :)
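
A small input file for trying the behaviour described above might look like
the following. This is a sketch under assumptions: the object names are
made up, and the expectation that the large zero-initialised array lands in
a .l* section (such as .lbss) applies when compiling for x86-64 with
-mcmodel=medium and a threshold below the array size; the section headers
can be inspected with readelf -S or objdump -h to confirm what a given
compiler actually does.

```c
#include <assert.h>
#include <stddef.h>

/* 1 MiB of zero-initialised data, well above a 64 KiB threshold.  */
static char big_buffer[1 << 20];

/* A small object that should stay in the ordinary data sections.  */
static int small_counter;

/* Report the size of the large object.  */
static size_t
big_buffer_size (void)
{
  return sizeof big_buffer;
}

/* Write to both objects so that neither is optimised away; returns the
   number of calls made so far.  */
int
touch (size_t i)
{
  assert (i < sizeof big_buffer);
  big_buffer[i] = 1;
  small_counter++;
  return small_counter;
}
```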


Re: [PATCH] LoongArch: Enable shrink wrapping

2023-05-06 Thread chenglulu



On 2023/5/7 at 1:07 AM, Xi Ruoyao wrote:

On Wed, 2023-04-26 at 18:21 +0800, WANG Xuerui wrote:

On 2023/4/26 18:14, Lulu Cheng wrote:

On 2023/4/26 at 6:02 PM, WANG Xuerui wrote:

On 2023/4/26 17:53, Lulu Cheng wrote:

Hi, ruoyao:

   The SPEC2006 performance run is finished. The fixed-point
400.perlbench shows about a 3% performance improvement, the other
fixed-point tests are basically unchanged, and the floating-point tests
have basically remained the same.

Nice to know!

   Do you have any questions about the test cases mentioned by
Guo Jie? If there is no problem, please modify the test case; then
I think the code can be merged into the main branch.



BTW what about the previous function/loop alignment patches? The LLVM
changes are also waiting for such results. ;-)

Well, there are many combinations in this align test, so the test time
will be very long. I will reply with the results as soon as they
come out. :-)


Oh, I got it. Thanks very much for all the tests and take your time!

Sorry if it's noisy, but I hope there is some (maybe preliminary)
result: now I finally have some spare time to rebuild the system with
GCC 13 and I'd like to use some -falign-functions= in my CFLAGS :).

The test is still ongoing, and I will reply with the results by email 
after the test is completed. :-)




Re: Pushed: [PATCH v2] LoongArch: Enable shrink wrapping

2023-05-06 Thread chenglulu



On 2023/5/7 at 1:05 AM, Xi Ruoyao wrote:

On Wed, 2023-04-26 at 21:29 +0800, Xi Ruoyao via Gcc-patches wrote:

    Do you have any questions about the test cases mentioned by
Guo Jie? If there is no problem, please modify the test case; then
I think the code can be merged into the main branch.

I'll rewrite the test and commit in a few days (now I'm occupied with
something:(  ).

The patch has been pushed as the following (with test updated).
Unfortunately I forgot to modify the change log to include the SPEC
result and the change for test case:(.

It's a bit of a shame, but thank you for the changes.:-)


Re: Re: [PATCH] riscv: Allow vector constants in riscv_const_insns.

2023-05-06 Thread 钟居哲
OK, you can go ahead and commit the patch.
I am going to send another patch to fix this.

Besides, I saw that you have committed some redundant, incorrect code. I will 
clean it up in another patch.



juzhe.zh...@rivai.ai
 
From: Jeff Law
Date: 2023-05-07 04:11
To: juzhe.zh...@rivai.ai; Robin Dapp; gcc-patches; kito.cheng; Kito.cheng; 
palmer; collison
Subject: Re: [PATCH] riscv: Allow vector constants in riscv_const_insns.
 
 
On 5/3/23 23:07, juzhe.zh...@rivai.ai wrote:
> The idea of this patch looks good to me.
> But I think this patch should be able to handle more cases (not only -16 
> ~ 15) in case of CONST_VECTOR initialization.
> 
> Case 1 (Other constant value that is not -16 ~ 15):
> void vmv_m##VAL (TYPE dst[], int n) \
> { \
>  for (int i = 0; i < n; i++) \
>dst[i] = 100; \
>}
> 
> I guess for const_vector:100 is not optimal currently so far, I think 
> you may try (and add testcases).
> Such code can be:
> 
> Codegen 1:
>   li a5,100
>   vmv.v.x v1, a5
>
> Codegen 2:
>   vlse.v v24, (a5), zero ;; the memory at address a5 holds the value 100.
> 
> I am not sure codegen 1 or codegen 2, which one is better. I think you 
> can decide it.
> But my point is that this patch should handle and test not only the 
> constant values -16 ~ 15 but also other constant values.
> 
> Case 2 (Constant value *within 32bit* for INT64 in *RV32* system):
> 
> This is a special case:
> 
> void vmv_i64 (TYPE dst[], int n)
> {
>  for (int i = 0; i < n; i++)
>dst[i] = *0x*;
>   }
> 
> In this case, the codegen should be similar to Case 1 since each 
> scalar register can hold the whole constant value.
> 
> 
> Case 3 (Constant value over* 32bit* for INT64 in *RV32* system):
> 
> This is a special case:
> 
> void vmv_i64 (TYPE dst[], int n)
> {
>  for (int i = 0; i < n; i++)
>dst[i] = *0xA*;
>   }
> 
> In this case, since each scalar register can only hold 32bit value that 
> is not the whole constant value (*0xA)*
> I think in this case, we can only use vlse.v...
> 
> Would you refine this patch more? Thanks.
I think we can add those as distinct patches.  The [-16..15] change is 
simple, stands on its own and I don't see any strong reason to make it 
wait for handling additional cases.
 
Remember, there are multiple engineers working in this space now.  So 
things which are clearly correct should move forward quickly so that we 
don't end up duplicating work.
 
Handling the additional cases can be handled as a distinct patch on its 
own.
 
Jeff
 


RE: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit

2023-05-06 Thread Li, Pan2 via Gcc-patches
It looks like we cannot simply swap the code and mode in rtx_def; the code may 
have to have the same number of bits as the tree_code in tree_base. Otherwise 
we hit an ICE like the one below.

rtx_def code 16 => 8 bits.
rtx_def mode 8 => 16 bits.

static inline decl_or_value
dv_from_value (rtx value)
{
  decl_or_value dv;
  dv = value;
  gcc_checking_assert (dv_is_value_p (dv));  <=  ICE 
  return dv;
}

Thus we also need to apply the same bit-width change to the tree_code, as below. 
Unfortunately, 8 bits may not be sufficient, judging by the compile log 
"../../gcc/tree-core.h:1034:28: warning: ‘tree_base::code’ is too small to hold 
all values of ‘enum tree_code’".

tree_base code 16 => 8 bits.

So one possible approach for the bit adjustment may look like the following. I 
am not sure whether it is reasonable. Any ideas about this? Thank you all 
in advance 😉.

rtx_def code 16 => 12 bits.
rtx_def mode 8 => 12 bits.
tree_base code 16 => 12 bits.

Pan
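
To make the proposed 12/12 split concrete, here is a minimal sketch using a
stand-in structure. The struct and helper names are hypothetical and this is
not the real rtx_def or tree_base layout; it only illustrates that a 12-bit
field holds values up to 4095 and that a value needing more bits is
truncated on store, which is the kind of overflow the quoted warning is
about.

```c
#include <assert.h>

/* Hypothetical stand-in for the first 32 bits of a node header.  */
struct header_12_12
{
  unsigned int code : 12;   /* up to 4096 distinct codes */
  unsigned int mode : 12;   /* up to 4096 distinct machine modes */
  unsigned int flags : 8;   /* remaining bits for one-bit flags */
};

/* Largest value an unsigned bit-field of WIDTH bits can hold.  */
static unsigned int
max_for_width (unsigned int width)
{
  assert (width > 0 && width < 32);
  return (1u << width) - 1;
}

/* Store CODE and MODE in a header and report whether both survive the
   round trip, i.e. whether neither value was truncated.  */
static int
fits_in_header (unsigned int code, unsigned int mode)
{
  struct header_12_12 h = { 0, 0, 0 };
  h.code = code;
  h.mode = mode;
  return h.code == code && h.mode == mode;
}
```

With this layout an 8-bit field tops out at 255 and a 12-bit field at 4095,
so both a code and a mode with more than 256 values fit while the three
fields still add up to 32 bits.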


-Original Message-
From: Li, Pan2 
Sent: Saturday, May 6, 2023 10:49 AM
To: 'Kito Cheng' 
Cc: 'juzhe.zh...@rivai.ai' ; 'rguenther' 
; 'richard.sandiford' ; 
'jeffreyalaw' ; 'gcc-patches' ; 
'palmer' ; 'jakub' 
Subject: RE: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 
16-bit

I picked all the changes mentioned previously into a single patch, attached. 
Please help review it for any mistakes.

Pan

-Original Message-
From: Li, Pan2
Sent: Saturday, May 6, 2023 10:20 AM
To: Kito Cheng 
Cc: juzhe.zh...@rivai.ai; rguenther ; richard.sandiford 
; jeffreyalaw ; gcc-patches 
; palmer ; jakub 
Subject: RE: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 
16-bit

Yes, that makes sense, will have a try and keep you posted.

Pan

-Original Message-
From: Kito Cheng 
Sent: Saturday, May 6, 2023 10:19 AM
To: Li, Pan2 
Cc: juzhe.zh...@rivai.ai; rguenther ; richard.sandiford 
; jeffreyalaw ; gcc-patches 
; palmer ; jakub 
Subject: Re: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 
16-bit

I think x86 first? The major thing we want to make sure is that this change 
won't affect those targets which do not really require 16 bit machine_mode too 
much.


On Sat, May 6, 2023 at 10:12 AM Li, Pan2 via Gcc-patches 
 wrote:
>
> Sure thing, I will pick them all together and trigger the test again (I will 
> send out the overall diff before starting, to make sure my understanding is 
> correct). BTW which target do we prefer first? X86 or RISC-V?
>
> Pan
>
> From: juzhe.zh...@rivai.ai 
> Sent: Saturday, May 6, 2023 10:00 AM
> To: kito.cheng ; Li, Pan2 
> Cc: rguenther ; richard.sandiford 
> ; jeffreyalaw ; 
> gcc-patches ; palmer ; 
> jakub 
> Subject: Re: Re: [PATCH] machine_mode type size: Extend enum size from 
> 8-bit to 16-bit
>
> Yeah, you should also swap mode and code in rtx_def according to 
> Richard's suggestion since it will not change the rtx_def data structure.
>
> I think the only problem is the mode in tree data structure.
> 
> juzhe.zh...@rivai.ai
>
> From: Kito Cheng
> Date: 2023-05-06 09:53
> To: Li, Pan2
> CC: Richard Biener; 钟居哲; richard.sandiford; Jeff Law; gcc-patches;
> palmer; jakub
> Subject: Re: Re: [PATCH] machine_mode type size: Extend enum size from 
> 8-bit to 16-bit
>
> Hi Pan:
>
> Could you try to apply the following diff and measure again? This 
> makes tree_type_common size unchanged.
>
>
> sizeof tree_type_common= 128 (mode = 8 bit)
> sizeof tree_type_common= 136 (mode = 16 bit)
> sizeof tree_type_common= 128 (mode = 8 bit w/ this diff)
>
> diff --git a/gcc/tree-core.h b/gcc/tree-core.h
> index af795aa81f98..b8ccfa407ed9 100644
> --- a/gcc/tree-core.h
> +++ b/gcc/tree-core.h
> @@ -1680,6 +1680,8 @@ struct GTY(()) tree_type_common {
>   tree attributes;
>   unsigned int uid;
>
> +  ENUM_BITFIELD(machine_mode) mode : 16;
> +
>   unsigned int precision : 10;
>   unsigned no_force_blk_flag : 1;
>   unsigned needs_constructing_flag : 1;
> @@ -1687,7 +1689,6 @@ struct GTY(()) tree_type_common {
>   unsigned restrict_flag : 1;
>   unsigned contains_placeholder_bits : 2;
>
> -  ENUM_BITFIELD(machine_mode) mode : 16;
>
>   /* TYPE_STRING_FLAG for INTEGER_TYPE and ARRAY_TYPE.
>  TYPE_CXX_ODR_P for RECORD_TYPE and UNION_TYPE.  */
> @@ -1712,7 +1713,7 @@ struct GTY(()) tree_type_common {
>   unsigned empty_flag : 1;
>   unsigned indivisible_p : 1;
>   unsigned no_named_args_stdarg_p : 1;
> -  unsigned spare : 15;
> +  unsigned spare : 7;
>
>   alias_set_type alias_set;
>   tree pointer_to;
>
> On Sat, May 6, 2023 at 9:10 AM Li, Pan2 via Gcc-patches 
> <gcc-patches@gcc.gnu.org> wrote:
> >
> > Yes, totally agree the number cannot be very accurate up to a point. Update 
> > the correlated memory b

New Croatian PO file for 'gcc' (version 13.1.0)

2023-05-06 Thread Translation Project Robot
Hello, gentle maintainer.

This is a message from the Translation Project robot.

A revised PO file for textual domain 'gcc' has been submitted
by the Croatian team of translators.  The file is available at:

https://translationproject.org/latest/gcc/hr.po

(This file, 'gcc-13.1.0.hr.po', has just now been sent to you in
a separate email.)

All other PO files for your package are available in:

https://translationproject.org/latest/gcc/

Please consider including all of these in your next release, whether
official or a pretest.

Whenever you have a new distribution with a new version number ready,
containing a newer POT file, please send the URL of that distribution
tarball to the address below.  The tarball may be just a pretest or a
snapshot, it does not even have to compile.  It is just used by the
translators when they need some extra translation context.

The following HTML page has been updated:

https://translationproject.org/domain/gcc.html

If any question arises, please contact the translation coordinator.

Thank you for all your work,

The Translation Project robot, in the
name of your translation coordinator.




[PATCH 3/3] PHIOPT: factor out unary operations instead of just conversions

2023-05-06 Thread Andrew Pinski via Gcc-patches
After using factor_out_conditional_conversion with diamond bb,
we should be able to use it for all normal unary gimple operations and not
just conversions. This allows optimizing PR 59424, for example.
This is also a start on optimizing PR 64700 and a few others.

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

An example of this is:
```
static inline unsigned long long g(int t)
{
  unsigned t1 = t;
  return t1;
}
static int abs1(int a)
{
  if (a < 0)
a = -a;
  return a;
}
unsigned long long f(int c, int d, int e)
{
  unsigned long long t;
  if (d > e)
t = g(abs1(d));
  else
t = g(abs1(e));
  return t;
}
```

Which should be optimized to:
  _9 = MAX_EXPR ;
  _4 = ABS_EXPR <_9>;
  t_3 = (long long unsigned intD.16) _4;
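
As a sanity check on the transformation, the branchy source and a
hand-written version of the expected result can be compared directly. This
is only a sketch: f_branchy, f_factored and forms_agree are illustrative
names, the unused parameter c from the example is dropped, and INT_MIN is
excluded because negating it overflows.

```c
#include <assert.h>
#include <limits.h>

static inline unsigned long long g (int t)
{
  unsigned t1 = t;
  return t1;
}

static int abs1 (int a)
{
  if (a < 0)
    a = -a;
  return a;
}

/* The branchy source form from the example.  */
static unsigned long long f_branchy (int d, int e)
{
  unsigned long long t;
  if (d > e)
    t = g (abs1 (d));
  else
    t = g (abs1 (e));
  return t;
}

/* Hand-written counterpart of the expected result: MAX first, then ABS,
   then a single conversion.  */
static unsigned long long f_factored (int d, int e)
{
  assert (d != INT_MIN && e != INT_MIN);
  int m = d > e ? d : e;
  int a = m < 0 ? -m : m;
  return (unsigned long long) a;
}

/* Return 1 if both forms agree for every pair in [-LIMIT, LIMIT].  */
static int forms_agree (int limit)
{
  assert (limit >= 0 && limit < INT_MAX);
  for (int d = -limit; d <= limit; d++)
    for (int e = -limit; e <= limit; e++)
      if (f_branchy (d, e) != f_factored (d, e))
        return 0;
  return 1;
}
```

The single conversion is safe here because the ABS result is non-negative,
so sign extension and zero extension give the same value.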

gcc/ChangeLog:

* tree-ssa-phiopt.cc (factor_out_conditional_conversion): Rename to ...
(factor_out_conditional_operation): This and add support for all unary
operations.
(pass_phiopt::execute): Update call to factor_out_conditional_conversion
to call factor_out_conditional_operation instead.

PR tree-optimization/109424
PR tree-optimization/59424

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/abs-2.c: Update tree scan for
details change in wording.
* gcc.dg/tree-ssa/minmax-17.c: Likewise.
* gcc.dg/tree-ssa/pr103771.c: Likewise.
* gcc.dg/tree-ssa/minmax-18.c: New test.
* gcc.dg/tree-ssa/minmax-19.c: New test.
---
 gcc/testsuite/gcc.dg/tree-ssa/abs-2.c |  2 +-
 gcc/testsuite/gcc.dg/tree-ssa/minmax-17.c |  2 +-
 gcc/testsuite/gcc.dg/tree-ssa/minmax-18.c | 27 +++
 gcc/testsuite/gcc.dg/tree-ssa/minmax-19.c | 10 
 gcc/testsuite/gcc.dg/tree-ssa/pr103771.c  |  2 +-
 gcc/tree-ssa-phiopt.cc| 56 +--
 6 files changed, 71 insertions(+), 28 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/minmax-18.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/minmax-19.c

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/abs-2.c b/gcc/testsuite/gcc.dg/tree-ssa/abs-2.c
index 328b1802541..f8bbeb43237 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/abs-2.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/abs-2.c
@@ -16,5 +16,5 @@ test_abs(int *cur)
 
 /* We should figure out that test_abs has an ABS_EXPR in it. */
 /* { dg-final { scan-tree-dump " = ABS_EXPR" "phiopt1"} } */
-/* { dg-final { scan-tree-dump-times "changed to factor conversion out from" 1 "phiopt1"} } */
+/* { dg-final { scan-tree-dump-times "changed to factor operation out from" 1 "phiopt1"} } */
 
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/minmax-17.c b/gcc/testsuite/gcc.dg/tree-ssa/minmax-17.c
index bd737e6b4cb..7c76cfc62a9 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/minmax-17.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/minmax-17.c
@@ -18,4 +18,4 @@ unsigned long long test_max(int c, int d, int e)
 
 /* We should figure out that test_max has an MAX_EXPR in it. */
 /* { dg-final { scan-tree-dump " = MAX_EXPR" "phiopt1"} } */
-/* { dg-final { scan-tree-dump-times "changed to factor conversion out from" 2 "phiopt1"} } */
+/* { dg-final { scan-tree-dump-times "changed to factor operation out from" 2 "phiopt1"} } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/minmax-18.c b/gcc/testsuite/gcc.dg/tree-ssa/minmax-18.c
new file mode 100644
index 000..c8e1670f64a
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/minmax-18.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-options "-O -fdump-tree-phiopt1-details" } */
+
+static inline unsigned long long g(int t)
+{
+  unsigned t1 = t;
+  return t1;
+}
+static inline int abs1(int a)
+{
+  if (a < 0)
+a = -a;
+  return a;
+}
+unsigned long long f(int c, int d, int e)
+{
+  unsigned long long t;
+  if (d > e)
+t = g(abs1(d));
+  else
+t = g(abs1(e));
+  return t;
+}
+
+/* { dg-final { scan-tree-dump " = MAX_EXPR" "phiopt1"} } */
+/* { dg-final { scan-tree-dump-times " = ABS_EXPR" 2 "phiopt1"} } */
+/* { dg-final { scan-tree-dump-times "changed to factor operation out from" 3 "phiopt1"} } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/minmax-19.c b/gcc/testsuite/gcc.dg/tree-ssa/minmax-19.c
new file mode 100644
index 000..5ed55fe2e23
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/minmax-19.c
@@ -0,0 +1,10 @@
+/* PR tree-optimization/109424 */
+/* { dg-do compile } */
+/* { dg-options "-O -fdump-tree-phiopt1-details" } */
+
+int f2(int x, int y)
+{
+return (x > y) ? ~x : ~y;
+}
+/* { dg-final { scan-tree-dump " = MAX_EXPR" "phiopt1"} } */
+/* { dg-final { scan-tree-dump-times "changed to factor operation out from" 1 "phiopt1"} } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr103771.c b/gcc/testsuite/gcc.dg/tree-ssa/pr103771.c
index 97c9db846cb..8faa45a8222 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr103771.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr103771.c
@@ -1,6 +1,6 @@
 /* { dg-do compile } */
 /* { dg-options "-O3 -fdump-tree-phiopt1-details" } */
-/* { dg-final { scan-tree-dump-times "changed to facto

[PATCH 2/3] PHIOPT: Loop over calling factor_out_conditional_conversion

2023-05-06 Thread Andrew Pinski via Gcc-patches
After adding diamond shaped bb support to factor_out_conditional_conversion,
we can get a case where two conversions need to be factored out
before another phiopt can happen.
An example is:
```
static inline unsigned long long g(int t)
{
  unsigned t1 = t;
  return t1;
}
unsigned long long f(int c, int d, int e)
{
  unsigned long long t;
  if (c > d)
t = g(c);
  else
t = g(d);
  return t;
}
```
In this case we should get a MAX_EXPR in phiopt1 with two casts.
Before this patch, we would just factor out the outer cast and then
wait till phiopt2 to factor out the inner cast.
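
The effect of looping can be shown with a toy model. This is only a sketch
and not how tree-ssa-phiopt.cc represents anything: each arm of the PHI is
modelled as a string of unary operation codes, outermost first, and
factor_once and factor_until_fixed_point are hypothetical helpers. One call
peels one matching outer operation; the loop keeps calling until nothing
more can be factored, mirroring the while (newphi) loop added by the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of one PHI argument: the unary operations still wrapped
   around it, outermost first, one character per operation.  */
struct arm
{
  const char *ops;
};

/* Factor out one matching outermost operation from both arms.
   Returns 1 if something was factored, 0 otherwise.  */
static int
factor_once (struct arm *a, struct arm *b)
{
  assert (a != NULL && b != NULL && a->ops != NULL && b->ops != NULL);
  if (a->ops[0] == '\0' || a->ops[0] != b->ops[0])
    return 0;
  a->ops++;
  b->ops++;
  return 1;
}

/* Keep factoring until no further progress is made and return the
   number of operations factored out.  */
static int
factor_until_fixed_point (struct arm *a, struct arm *b)
{
  int count = 0;
  while (factor_once (a, b))
    count++;
  return count;
}
```

With both arms holding the same two casts, a single call to factor_once
leaves the inner cast in place, while the loop removes both in one pass,
which matches the two "changed to factor conversion out from" messages the
new test expects in phiopt1.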

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

gcc/ChangeLog:

* tree-ssa-phiopt.cc (pass_phiopt::execute): Loop
over factor_out_conditional_conversion.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/minmax-17.c: New test.
---
 gcc/testsuite/gcc.dg/tree-ssa/minmax-17.c | 21 ++
 gcc/tree-ssa-phiopt.cc| 27 +--
 2 files changed, 36 insertions(+), 12 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/minmax-17.c

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/minmax-17.c b/gcc/testsuite/gcc.dg/tree-ssa/minmax-17.c
new file mode 100644
index 000..bd737e6b4cb
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/minmax-17.c
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-options "-O -fdump-tree-phiopt1-details" } */
+
+static inline unsigned long long g(int t)
+{
+  unsigned t1 = t;
+  return t1;
+}
+unsigned long long test_max(int c, int d, int e)
+{
+  unsigned long long t;
+  if (c > d)
+t = g(c);
+  else
+t = g(d);
+  return t;
+}
+
+/* We should figure out that test_max has an MAX_EXPR in it. */
+/* { dg-final { scan-tree-dump " = MAX_EXPR" "phiopt1"} } */
+/* { dg-final { scan-tree-dump-times "changed to factor conversion out from" 2 "phiopt1"} } */
diff --git a/gcc/tree-ssa-phiopt.cc b/gcc/tree-ssa-phiopt.cc
index 41fea78dc8d..7fe088b13ff 100644
--- a/gcc/tree-ssa-phiopt.cc
+++ b/gcc/tree-ssa-phiopt.cc
@@ -4085,20 +4085,23 @@ pass_phiopt::execute (function *)
  node.  */
   gcc_assert (arg0 != NULL_TREE && arg1 != NULL_TREE);
 
-  gphi *newphi;
   if (single_pred_p (bb1)
- && EDGE_COUNT (merge->preds) == 2
- && (newphi = factor_out_conditional_conversion (e1, e2, phi,
- arg0, arg1,
- cond_stmt)))
+ && EDGE_COUNT (merge->preds) == 2)
{
- phi = newphi;
- /* factor_out_conditional_conversion may create a new PHI in
-BB2 and eliminate an existing PHI in BB2.  Recompute values
-that may be affected by that change.  */
- arg0 = gimple_phi_arg_def (phi, e1->dest_idx);
- arg1 = gimple_phi_arg_def (phi, e2->dest_idx);
- gcc_assert (arg0 != NULL_TREE && arg1 != NULL_TREE);
+ gphi *newphi = phi;
+ while (newphi)
+   {
+ phi = newphi;
+ /* factor_out_conditional_conversion may create a new PHI in
+BB2 and eliminate an existing PHI in BB2.  Recompute values
+that may be affected by that change.  */
+ arg0 = gimple_phi_arg_def (phi, e1->dest_idx);
+ arg1 = gimple_phi_arg_def (phi, e2->dest_idx);
+ gcc_assert (arg0 != NULL_TREE && arg1 != NULL_TREE);
+ newphi = factor_out_conditional_conversion (e1, e2, phi,
+ arg0, arg1,
+ cond_stmt);
+   }
}
 
   /* Do the replacement of conditional if it can be done.  */
-- 
2.31.1



[PATCH 1/3] PHIOPT: Add diamond bb form to factor_out_conditional_conversion

2023-05-06 Thread Andrew Pinski via Gcc-patches
The function factor_out_conditional_conversion already supports
diamond shaped bb forms; it just needs to be called for them.

harden-cond-comp.c needed to be changed because we now optimize out the
conversion, so the compare hardening no longer needs to split the block,
which is what the test was checking. Change the test so that there
is no chance of this optimization.

Also add two testcases that showed the improvement. PR 103771 is
solved in ifconvert also for the vectorizer but now it is solved
in a general sense.
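
For the abs-2.c case, the diamond form and the form with the conversion
factored out of both arms can be written side by side. This is a sketch
with illustrative function names, not compiler output; INT_MIN is excluded
because negating it overflows.

```c
#include <assert.h>
#include <limits.h>

/* Diamond form, as in abs-2.c: both arms assign and convert.  */
static unsigned long
sad_diamond (int v)
{
  assert (v != INT_MIN);
  unsigned long sad;
  if (v > 0)
    sad = v;
  else
    sad = -v;
  return sad;
}

/* Factored form: select in int, then convert once after the join.  */
static unsigned long
sad_factored (int v)
{
  assert (v != INT_MIN);
  int t = v > 0 ? v : -v;
  return (unsigned long) t;
}
```

Once the conversion sits after the join, the remaining select has the shape
of an absolute value, which is what the new test looks for as ABS_EXPR.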

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

PR tree-optimization/49959
PR tree-optimization/103771

gcc/ChangeLog:

* tree-ssa-phiopt.cc (pass_phiopt::execute): Support
diamond shaped bb form for factor_out_conditional_conversion.

gcc/testsuite/ChangeLog:

* c-c++-common/torture/harden-cond-comp.c: Change testcase
slightly to avoid the new phiopt optimization.
* gcc.dg/tree-ssa/abs-2.c: New test.
* gcc.dg/tree-ssa/pr103771.c: New test.
---
 .../c-c++-common/torture/harden-cond-comp.c   |  6 +++---
 gcc/testsuite/gcc.dg/tree-ssa/abs-2.c | 20 +++
 gcc/testsuite/gcc.dg/tree-ssa/pr103771.c  | 18 +
 gcc/tree-ssa-phiopt.cc|  2 +-
 4 files changed, 42 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/abs-2.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr103771.c

diff --git a/gcc/testsuite/c-c++-common/torture/harden-cond-comp.c b/gcc/testsuite/c-c++-common/torture/harden-cond-comp.c
index 5aad890a1d3..dcf364ee993 100644
--- a/gcc/testsuite/c-c++-common/torture/harden-cond-comp.c
+++ b/gcc/testsuite/c-c++-common/torture/harden-cond-comp.c
@@ -1,11 +1,11 @@
 /* { dg-do compile } */
 /* { dg-options "-fharden-conditional-branches -fharden-compares -fdump-tree-hardcbr -fdump-tree-hardcmp -ffat-lto-objects" } */
 
-int f(int i, int j) {
+int f(int i, int j, int k, int l) {
   if (i == 0)
-return j != 0;
+return (j != 0) + l;
   else
-return i * j != 0;
+return (i * j != 0) * k;
 }
 
 /* { dg-final { scan-tree-dump-times "Splitting edge" 2 "hardcbr" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/abs-2.c b/gcc/testsuite/gcc.dg/tree-ssa/abs-2.c
new file mode 100644
index 000..328b1802541
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/abs-2.c
@@ -0,0 +1,20 @@
+/* PR tree-optimization/49959 */
+/* { dg-do compile } */
+/* { dg-options "-O -fdump-tree-phiopt1-details" } */
+
+#define ABS(X)(((X)>0)?(X):-(X))
+unsigned long
+test_abs(int *cur)
+{
+  unsigned long sad = 0;
+  if (cur[0] > 0)
+sad = cur[0];
+  else
+sad = -cur[0];
+  return sad;
+}
+
+/* We should figure out that test_abs has an ABS_EXPR in it. */
+/* { dg-final { scan-tree-dump " = ABS_EXPR" "phiopt1"} } */
+/* { dg-final { scan-tree-dump-times "changed to factor conversion out from" 1 "phiopt1"} } */
+
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr103771.c b/gcc/testsuite/gcc.dg/tree-ssa/pr103771.c
new file mode 100644
index 000..97c9db846cb
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr103771.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -fdump-tree-phiopt1-details" } */
+/* { dg-final { scan-tree-dump-times "changed to factor conversion out from COND_EXPR." 1 "phiopt1" } } */
+
+typedef unsigned char uint8_t;
+
+static uint8_t x264_clip_uint8 (int x)
+{
+  return x & (~255) ? (-x) >> 31 : x;
+}
+
+void
+mc_weight (uint8_t* __restrict dst, uint8_t* __restrict src,
+  int i_width,int i_scale)
+{
+  for(int x = 0; x < i_width; x++)
+dst[x] = x264_clip_uint8 (src[x] * i_scale);
+}
diff --git a/gcc/tree-ssa-phiopt.cc b/gcc/tree-ssa-phiopt.cc
index f14b7e8b7e6..41fea78dc8d 100644
--- a/gcc/tree-ssa-phiopt.cc
+++ b/gcc/tree-ssa-phiopt.cc
@@ -4087,7 +4087,7 @@ pass_phiopt::execute (function *)
 
   gphi *newphi;
   if (single_pred_p (bb1)
- && !diamond_p
+ && EDGE_COUNT (merge->preds) == 2
  && (newphi = factor_out_conditional_conversion (e1, e2, phi,
  arg0, arg1,
  cond_stmt)))
-- 
2.31.1



Contents of PO file 'cpplib-13.1-b20230212.ru.po'

2023-05-06 Thread Translation Project Robot


cpplib-13.1-b20230212.ru.po.gz
Description: Binary data
The Translation Project robot, in the
name of your translation coordinator.



New Russian PO file for 'cpplib' (version 13.1-b20230212)

2023-05-06 Thread Translation Project Robot
Hello, gentle maintainer.

This is a message from the Translation Project robot.

A revised PO file for textual domain 'cpplib' has been submitted
by the Russian team of translators.  The file is available at:

https://translationproject.org/latest/cpplib/ru.po

(This file, 'cpplib-13.1-b20230212.ru.po', has just now been sent to you in
a separate email.)

All other PO files for your package are available in:

https://translationproject.org/latest/cpplib/

Please consider including all of these in your next release, whether
official or a pretest.

Whenever you have a new distribution with a new version number ready,
containing a newer POT file, please send the URL of that distribution
tarball to the address below.  The tarball may be just a pretest or a
snapshot, it does not even have to compile.  It is just used by the
translators when they need some extra translation context.

The following HTML page has been updated:

https://translationproject.org/domain/cpplib.html

If any question arises, please contact the translation coordinator.

Thank you for all your work,

The Translation Project robot, in the
name of your translation coordinator.