[PATCH] RISC-V: Enable more tests of "vect" for RVV

2023-10-07 Thread Juzhe-Zhong
This patch enables almost full coverage of the vectorization tests for RVV, except for the
following tests (not yet enabled):

1. Will enable soon:

check_effective_target_vect_call_lrint
check_effective_target_vect_call_btrunc
check_effective_target_vect_call_btruncf
check_effective_target_vect_call_ceil
check_effective_target_vect_call_ceilf
check_effective_target_vect_call_floor
check_effective_target_vect_call_floorf
check_effective_target_vect_call_lceil
check_effective_target_vect_call_lfloor
check_effective_target_vect_call_nearbyint
check_effective_target_vect_call_nearbyintf
check_effective_target_vect_call_round
check_effective_target_vect_call_roundf

2. Not sure whether these need to be enabled:

check_effective_target_vect_complex_*
check_effective_target_vect_simd_clones
check_effective_target_vect_bswap
check_effective_target_vect_widen_shift
check_effective_target_vect_widen_mult_*
check_effective_target_vect_widen_sum_*
check_effective_target_vect_unpack
check_effective_target_vect_interleave
check_effective_target_vect_extract_even_odd
check_effective_target_vect_pack_trunc
check_effective_target_vect_check_ptrs
check_effective_target_vect_sdiv_pow2_si
check_effective_target_vect_usad_*
check_effective_target_vect_udot_*
check_effective_target_vect_sdot_*
check_effective_target_vect_gather_load_ifn
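Each `check_effective_target_vect_*` keyword names a vector capability that gates a group of tests. As a rough illustration (an assumption about the general shape of such tests, not a copy of any testsuite file), a test gated on `vect_bswap` expects the vectorizer to handle a loop like the hypothetical `bswap32_loop` below, which byte-swaps every element:

```c
#include <stdint.h>

/* Hypothetical example loop: on a target with vect_bswap support the
   vectorizer is expected to turn the per-element byte swap into a
   vector operation.  */
void
bswap32_loop (uint32_t *restrict dst, const uint32_t *restrict src, int n)
{
  for (int i = 0; i < n; i++)
    dst[i] = __builtin_bswap32 (src[i]);
}
```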

After this patch, we will have the following additional FAILs:
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s1115.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s1115.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s114.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s114.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s1161.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s1161.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s1232.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s1232.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s124.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s124.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s1279.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s1279.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s161.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s161.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s253.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s253.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s257.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s257.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s271.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s271.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s2711.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s2711.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s2712.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s2712.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s272.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s272.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s273.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s273.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s274.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s274.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s276.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s276.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s278.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s278.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s279.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s279.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s3111.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s3111.c scan-

Re: [PATCH] LoongArch: Reimplement multilib build option handling.

2023-10-07 Thread Xi Ruoyao
On Sat, 2023-10-07 at 11:41 +0800, Yang Yujie wrote:
> Thanks for the testing!
> 
> This error seems to be difficult to reproduce since it is a makefile
> dependency problem.  I think appending loongarch-multilib.h to $(GTM_H)
> instead of $(TM_H) could help.

FWIW such issues are easier to reproduce with a high -j number.  I can
easily reproduce it with -j32 on a 3C5000-based server.

> > And when this is fixed, it might be a nice idea to have a
> > --with-multilib-list config in ./contrib/config-list.mk .
> 
> Thanks, will add this later too.
> 
> P.S. Currently support for "f32" is not active, and it should probably be
> avoided if you want to build a working rootfs.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH V5 2/2] rs6000: use mtvsrws to move sf from si p9

2023-10-07 Thread Jiufu Guo


Hi,

David Edelsohn  writes:

> On Thu, Oct 5, 2023 at 12:14 AM Jiufu Guo  wrote:
>
>  Hi,
>
>  As mentioned in PR108338, on p9, we could use mtvsrws to implement
>  the bitcast from SI to SF (or lowpart DI to SF).
>
>  For example:
>*(long long*)buff = di;
>float f = *(float*)(buff);
>
>  Currently "sldi 9,3,32 ; mtvsrd 1,9 ; xscvspdpn 1,1" is generated.
>  A better sequence would be "mtvsrws 1,3 ; xscvspdpn 1,1".
>
>  Compared with the previous patch:
>  https://gcc.gnu.org/pipermail/gcc-patches/2023-August/628791.html
>  Following the review comments, this version refines the commit message
>  and the wording of the comments, and also updates the test case.
>
>  Passes bootstrap and regtest on ppc64{,le}.
>  Is this ok for trunk?
>
> Okay.

Thank you! Committed as r14-4445.

BR,
Jeff.

>
> Thanks, David
>  
>  
>  BR,
>  Jeff (Jiufu Guo)
>
>  PR target/108338
>
>  gcc/ChangeLog:
>
>  * config/rs6000/rs6000.md (movsf_from_si): Update to generate mtvsrws
>  for P9.
>
>  gcc/testsuite/ChangeLog:
>
>  * gcc.target/powerpc/pr108338.c: Updated to check mtvsrws for p9.
>
>  ---
>   gcc/config/rs6000/rs6000.md | 25 -
>   gcc/testsuite/gcc.target/powerpc/pr108338.c | 21 ++---
>   2 files changed, 37 insertions(+), 9 deletions(-)
>
>  diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
>  index 56bd8bc1147..d6dfb25cea0 100644
>  --- a/gcc/config/rs6000/rs6000.md
>  +++ b/gcc/config/rs6000/rs6000.md
>  @@ -8283,13 +8283,26 @@ (define_insn_and_split "movsf_from_si"
>   {
> rtx op0 = operands[0];
> rtx op1 = operands[1];
>  -  rtx op2 = operands[2];
>  -  rtx op1_di = gen_rtx_REG (DImode, REGNO (op1));
>
>  -  /* Move SF value to upper 32-bits for xscvspdpn.  */
>  -  emit_insn (gen_ashldi3 (op2, op1_di, GEN_INT (32)));
>  -  emit_insn (gen_p8_mtvsrd_sf (op0, op2));
>  -  emit_insn (gen_vsx_xscvspdpn_directmove (op0, op0));
>  +  /* Move lowpart 32-bits from register for SFmode.  */
>  +  if (TARGET_P9_VECTOR)
>  +{
>  +  /* Using mtvsrws;xscvspdpn.  */
>  +  rtx op0_v = gen_rtx_REG (V4SImode, REGNO (op0));
>  +  emit_insn (gen_vsx_splat_v4si (op0_v, op1));
>  +  emit_insn (gen_vsx_xscvspdpn_directmove (op0, op0));
>  +}
>  +  else
>  +{
>  +  rtx op2 = operands[2];
>  +  rtx op1_di = gen_rtx_REG (DImode, REGNO (op1));
>  +
>  +  /* Using sldi;mtvsrd;xscvspdpn.  */
>  +  emit_insn (gen_ashldi3 (op2, op1_di, GEN_INT (32)));
>  +  emit_insn (gen_p8_mtvsrd_sf (op0, op2));
>  +  emit_insn (gen_vsx_xscvspdpn_directmove (op0, op0));
>  +}
>  +
> DONE;
>   }
> [(set_attr "length"
>  diff --git a/gcc/testsuite/gcc.target/powerpc/pr108338.c b/gcc/testsuite/gcc.target/powerpc/pr108338.c
>  index bd83c0b3ad8..5f2f62866ee 100644
>  --- a/gcc/testsuite/gcc.target/powerpc/pr108338.c
>  +++ b/gcc/testsuite/gcc.target/powerpc/pr108338.c
>  @@ -3,9 +3,12 @@
>   /* { dg-options "-O2 -save-temps" } */
>
>   /* Under lp64, parameter 'v' is in DI regs, then bitcast sub DI to SF. */
>  -/* { dg-final { scan-assembler-times {\mxscvspdpn\M} 1 { target { lp64 && has_arch_pwr8 } } } } */
>  -/* { dg-final { scan-assembler-times {\mmtvsrd\M} 1 { target { lp64 && has_arch_pwr8 } } } } */
>  +/* { dg-final { scan-assembler-times {\mxscvspdpn\M} 2 { target { lp64 && has_arch_pwr8 } } } } */
>  +/* { dg-final { scan-assembler-times {\mmtvsrd\M} 2 { target { lp64 && { has_arch_pwr8 && { ! has_arch_pwr9 } } } } } } */
>  +/* { dg-final { scan-assembler-times {\mmtvsrd\M} 1 { target { lp64 && has_arch_pwr9 } } } } */
>  +/* { dg-final { scan-assembler-times {\mmtvsrws\M} 1 { target { lp64 && has_arch_pwr9 } } } } */
>   /* { dg-final { scan-assembler-times {\mrldicr\M} 1 { target { lp64 && has_arch_pwr8 } } } } */
>  +/* { dg-final { scan-assembler-times {\msldi\M} 1 { target { lp64 && { has_arch_pwr8 && { ! has_arch_pwr9 } } } } } } */
>
>   struct di_sf_sf
>   {
>  @@ -22,16 +25,28 @@ sf_from_high32bit_di (struct di_sf_sf v)
>   #endif
>   }
>
>  +float __attribute__ ((noipa))
>  +sf_from_low32bit_di (struct di_sf_sf v)
>  +{
>  +#ifdef __LITTLE_ENDIAN__
>  +  return v.f1;
>  +#else
>  +  return v.f2;
>  +#endif
>  +}
>  +
>   int main()
>   {
> struct di_sf_sf v;
> v.f1 = v.f2 = 0.0f;
>   #ifdef __LITTLE_ENDIAN__
>  +  v.f1 = 1.0f;
> v.f2 = 2.0f;
>   #else
> v.f1 = 2.0f;
>  +  v.f2 = 1.0f;
>   #endif
>  -  if (sf_from_high32bit_di (v) != 2.0f)
>  +  if (sf_from_high32bit_di (v) != 2.0f || sf_from_low32bit_di (v) != 1.0f)
>   __builtin_abort ();
> return 0;
>   }
>  -- 
>  2.25.1
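The two bitcasts that pr108338.c exercises can be illustrated in portable C. The sketch below (all helper names are hypothetical) reads a `float` from the low or the high 32 bits of a 64-bit integer using `memcpy`, which sidesteps the strict-aliasing problem of the `*(float*)(buff)` form quoted above. On P9 the low-part case is the one the patch now expands to "mtvsrws; xscvspdpn", although the exact code GCC emits for this portable form is not guaranteed.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret a 32-bit pattern as an IEEE-754 single-precision float.  */
static float
float_from_bits (uint32_t bits)
{
  float f;
  memcpy (&f, &bits, sizeof f);
  return f;
}

/* Hypothetical helper: SF value held in the low 32 bits of a DI value.  */
static float
sf_from_low32 (uint64_t di)
{
  return float_from_bits ((uint32_t) di);
}

/* Hypothetical helper: SF value held in the high 32 bits of a DI value.  */
static float
sf_from_high32 (uint64_t di)
{
  return float_from_bits ((uint32_t) (di >> 32));
}
```

For example, with `di = 0x400000003F800000`, `sf_from_low32` yields 1.0f and `sf_from_high32` yields 2.0f, mirroring the values checked in the test's `main`.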


Re: [PATCH V5 1/2] rs6000: optimize moving to sf from highpart di

2023-10-07 Thread Jiufu Guo


Hi,

David Edelsohn  writes:
>  
> On Thu, Oct 5, 2023 at 12:50 AM Jiufu Guo  wrote:
>
>  Hi,
>
>  Currently, we have the pattern "movsf_from_si2", which tries
>  to support moving the high part of a DI to SF.
>
>  But the current pattern only accepts "ashiftrt":
>  XX:SF=bitcast:SF(subreg(YY:DI>>32),0), but actually "lshiftrt" should
>  also be OK.
>  Also, the current pattern only supports BE.
>
>  This patch updates the pattern to support BE and "lshiftrt".
>
>  Compared with the previous version:
>  https://gcc.gnu.org/pipermail/gcc-patches/2023-August/628790.html
>  This version refines the code slightly and updates the test case
>  according to review comments.
>
>  Passes bootstrap and regtest on ppc64{,le}.
>  Is this ok for trunk?
>
> Okay.

Thank you! Committed as r14-.

BR,
Jeff.

>
> Thanks, David
>  
>  
>  BR,
>  Jeff (Jiufu Guo)
>
>  PR target/108338
>
>  gcc/ChangeLog:
>
>  * config/rs6000/predicates.md (lowpart_subreg_operator): New
>  define_predicate.
>  * config/rs6000/rs6000.md (any_shiftrt): New code_iterator.
>  (movsf_from_si2): Rename to ...
>  (movsf_from_si2_): ... this.
>
>  gcc/testsuite/ChangeLog:
>
>  * gcc.target/powerpc/pr108338.c: New test.
>
>  ---
>   gcc/config/rs6000/predicates.md |  5 +++
>   gcc/config/rs6000/rs6000.md | 12 ---
>   gcc/testsuite/gcc.target/powerpc/pr108338.c | 37 +
>   3 files changed, 49 insertions(+), 5 deletions(-)
>   create mode 100644 gcc/testsuite/gcc.target/powerpc/pr108338.c
>
>  diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
>  index 925f69cd3fc..ef7d3f214c4 100644
>  --- a/gcc/config/rs6000/predicates.md
>  +++ b/gcc/config/rs6000/predicates.md
>  @@ -2098,3 +2098,8 @@ (define_predicate "macho_pic_address"
> else
>   return false;
>   })
>  +
>  +(define_predicate "lowpart_subreg_operator"
>  +  (and (match_code "subreg")
>  +   (match_test "subreg_lowpart_offset (mode, GET_MODE (SUBREG_REG (op)))
>  +   == SUBREG_BYTE (op)")))
>  diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
>  index 1a9a7b1a479..56bd8bc1147 100644
>  --- a/gcc/config/rs6000/rs6000.md
>  +++ b/gcc/config/rs6000/rs6000.md
>  @@ -643,6 +643,9 @@ (define_code_iterator any_extend[sign_extend zero_extend])
>   (define_code_iterator any_fix  [fix unsigned_fix])
>   (define_code_iterator any_float[float unsigned_float])
>
>  +; Shift right.
>  +(define_code_iterator any_shiftrt  [ashiftrt lshiftrt])
>  +
>   (define_code_attr u  [(sign_extend "")
>(zero_extend  "u")
>(fix  "")
>  @@ -8303,14 +8306,13 @@ (define_insn_and_split "movsf_from_si"
>   ;; {%1:SF=unspec[r122:DI>>0x20#0] 86;clobber scratch;}
>   ;; split it before reload with "and mask" to avoid generating shift right
>   ;; 32 bit then shift left 32 bit.
>  -(define_insn_and_split "movsf_from_si2"
>  +(define_insn_and_split "movsf_from_si2_"
> [(set (match_operand:SF 0 "gpc_reg_operand" "=wa")
>  (unspec:SF
>  -[(subreg:SI
>  -  (ashiftrt:DI
>  +[(match_operator:SI 3 "lowpart_subreg_operator"
>  +  [(any_shiftrt:DI
>  (match_operand:DI 1 "input_operand" "r")
>  -   (const_int 32))
>  -  0)]
>  +   (const_int 32))])]
>   UNSPEC_SF_FROM_SI))
> (clobber (match_scratch:DI 2 "=r"))]
> "TARGET_NO_SF_SUBREG"
>  diff --git a/gcc/testsuite/gcc.target/powerpc/pr108338.c b/gcc/testsuite/gcc.target/powerpc/pr108338.c
>  new file mode 100644
>  index 000..bd83c0b3ad8
>  --- /dev/null
>  +++ b/gcc/testsuite/gcc.target/powerpc/pr108338.c
>  @@ -0,0 +1,37 @@
>  +/* { dg-do run } */
>  +/* { dg-require-effective-target hard_float } */
>  +/* { dg-options "-O2 -save-temps" } */
>  +
>  +/* Under lp64, parameter 'v' is in DI regs, then bitcast sub DI to SF. */
>  +/* { dg-final { scan-assembler-times {\mxscvspdpn\M} 1 { target { lp64 && has_arch_pwr8 } } } } */
>  +/* { dg-final { scan-assembler-times {\mmtvsrd\M} 1 { target { lp64 && has_arch_pwr8 } } } } */
>  +/* { dg-final { scan-assembler-times {\mrldicr\M} 1 { target { lp64 && has_arch_pwr8 } } } } */
>  +
>  +struct di_sf_sf
>  +{
>  +  float f1; float f2; long long l;
>  +};
>  +
>  +float __attribute__ ((noipa))
>  +sf_from_high32bit_di (struct di_sf_sf v)
>  +{
>  +#ifdef __LITTLE_ENDIAN__
>  +  return v.f2;
>  +#else
>  +  return v.f1;
>  +#endif
>  +}
>  +
>  +int main()
>  +{
>  +  struct di_sf_sf v;
>  +  v.f1 = v.f2 = 0.0f;
>  +#ifdef __LITTLE_ENDIAN__
>  +  v.f2 = 2.0f;
>  +#else
>  +  v.f1 = 2.0f;
>  +#endif
>  +  if (sf_from_high32bit_di (v) != 2.0f)
>  +__builtin_abort ();
>  +  return 0;
>  +}
>  -- 
>  2.25.1
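Accepting `lshiftrt` as well as `ashiftrt` is safe here because the two shifts differ only in the bits they fill in at the top, and the low-part SImode subreg discards those bits. The small C sketch below (hypothetical helper names; it relies on GCC implementing `>>` on negative signed integers as an arithmetic shift) checks this. It also computes the byte offset of the low 32 bits within a 64-bit object in memory, which is a memory analogue of the endianness-dependent offset that the new `lowpart_subreg_operator` predicate compares against `SUBREG_BYTE`.

```c
#include <stdint.h>
#include <string.h>

/* Low 32 bits of a logical right shift by 32 (lshiftrt).  */
static uint32_t
low32_of_lshiftrt (uint64_t x)
{
  return (uint32_t) (x >> 32);
}

/* Low 32 bits of an arithmetic right shift by 32 (ashiftrt).  GCC
   implements >> on negative signed values as an arithmetic shift.  */
static uint32_t
low32_of_ashiftrt (int64_t x)
{
  return (uint32_t) (x >> 32);
}

/* Byte offset of the low-order 32 bits within a 64-bit object in memory:
   0 on little-endian hosts, 4 on big-endian hosts.  */
static unsigned
lowpart_offset_32_in_64 (void)
{
  uint64_t probe = 1;
  unsigned char bytes[sizeof probe];
  memcpy (bytes, &probe, sizeof bytes);
  return bytes[0] == 1 ? 0 : 4;
}
```

With a value whose top bit is set, such as `0xC000000012345678`, both shift helpers return `0xC0000000`, even though the full 64-bit shift results differ.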


Re: [PATCH] LoongArch: Reimplement multilib build option handling.

2023-10-07 Thread Jan-Benedict Glaw
Hi!

On Sat, 2023-10-07 15:08:34 +0800, Xi Ruoyao  wrote:
> On Sat, 2023-10-07 at 11:41 +0800, Yang Yujie wrote:
> > Thanks for the testing!
> > 
> > This error seems to be difficult to reproduce since it is a makefile
> > dependency problem.  I think appending loongarch-multilib.h to $(GTM_H)
> > instead of $(TM_H) could help.
> 
> FWIW such issues are easier to reproduce with a high -j number.  I can
> easily reproduce it with -j32 on a 3C5000-based server.

That's interesting. It showed up in all of my CI builds, and I don't do
parallel builds at all (I'd like to be able to `diff` the build
logs after sanitizing e.g. timestamps and the build path).

Regards, JBG



Re: [PATCH 01/13] [APX EGPR] middle-end: Add insn argument to base_reg_class

2023-10-07 Thread Hongyu Wang
> It would be nice to add to the documentation that INSN_BASE_REG_CLASS,
> INSN_INDEX_REG_CLASS, and REGNO_OK_FOR_INSN_BASE_P if defined have
> priority over older corresponding macros as it is already documented for
> REGNO_MODE_CODE_OK_FOR_BASE_P relating to REGNO_OK_FOR_BASE_P. But this
> small issue can be addressed later.
>

Thanks, I will add a description like the following when committing.

+@defmac INSN_BASE_REG_CLASS (@var{insn})
+A C expression whose value is the register class to which a valid
+base register for a specified @var{insn} must belong.  This macro is
+used when some backend insns can use only a subset of the base
+registers that other insns accept.  If you define this macro, the
+compiler will use it instead of all other defined macros that relate
+to @code{BASE_REG_CLASS}.
+@end defmac
+

+@defmac REGNO_OK_FOR_INSN_BASE_P (@var{num}, @var{insn})
+A C expression which is nonzero if register number @var{num} is
+suitable for use as a base register in operand addresses for a specified
+@var{insn}.  This macro is used when some backend insns can use only a
+subset of the base registers that other insns accept.  If you define
+this macro, the compiler will use it instead of all other defined macros
+that relate to @code{REGNO_OK_FOR_BASE_P}.
+@end defmac
+

+@defmac INSN_INDEX_REG_CLASS (@var{insn})
+A C expression whose value is the register class to which a valid
+index register for a specified @var{insn} must belong. This macro is
+used when some backend insns may have limited usage of index register
+compared with other insns. If you defined this macro, the compiler
+will use it instead of @code{INDEX_REG_CLASS}.
+@end defmac
+
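The precedence these entries describe (the insn-specific macro, when defined, wins over the older insn-independent one) can be sketched with the preprocessor. Everything below is hypothetical: the register-class enum, the stand-in `struct insn` and the `base_class_for_insn` helper are illustrative only and are not GCC's internal `base_reg_class` implementation, which also takes mode, address space and code-based macros into account.

```c
/* Hypothetical register classes and insn representation.  */
enum reg_class { NO_REGS, LEGACY_BASE_REGS, GENERAL_REGS };
struct insn { int restricted_addressing; };

/* Target-wide default.  */
#define BASE_REG_CLASS GENERAL_REGS

/* Hypothetical target definition: some insns may only use a subset
   of the base registers that other insns can use.  */
#define INSN_BASE_REG_CLASS(INSN) \
  ((INSN)->restricted_addressing ? LEGACY_BASE_REGS : GENERAL_REGS)

/* Pick the base register class for INSN, preferring the insn-specific
   macro whenever the target defines it.  */
static enum reg_class
base_class_for_insn (const struct insn *insn)
{
#ifdef INSN_BASE_REG_CLASS
  return INSN_BASE_REG_CLASS (insn);
#else
  (void) insn;
  return BASE_REG_CLASS;
#endif
}
```

Removing the `INSN_BASE_REG_CLASS` definition makes `base_class_for_insn` fall back to `BASE_REG_CLASS` for every insn.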


[PATCH] LoongArch: Adjust makefile dependency for loongarch headers.

2023-10-07 Thread Yang Yujie
gcc/ChangeLog:

* config.gcc: Add loongarch-driver.h to tm_files.
* config/loongarch/loongarch.h: Do not include loongarch-driver.h.
* config/loongarch/t-loongarch: Append loongarch-multilib.h to $(GTM_H)
instead of $(TM_H) for building generator programs.
---
 gcc/config.gcc   | 2 +-
 gcc/config/loongarch/loongarch.h | 3 ---
 gcc/config/loongarch/t-loongarch | 3 ++-
 3 files changed, 3 insertions(+), 5 deletions(-)

diff --git a/gcc/config.gcc b/gcc/config.gcc
index ee46d96bf62..9cb600ca006 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -2524,7 +2524,7 @@ riscv*-*-freebsd*)
 
 loongarch*-*-linux*)
tm_file="elfos.h gnu-user.h linux.h linux-android.h glibc-stdint.h ${tm_file}"
-   tm_file="${tm_file} loongarch/gnu-user.h loongarch/linux.h"
+   tm_file="${tm_file} loongarch/gnu-user.h loongarch/linux.h loongarch/loongarch-driver.h"
extra_options="${extra_options} linux-android.opt"
tmake_file="${tmake_file} loongarch/t-multilib loongarch/t-linux"
gnu_ld=yes
diff --git a/gcc/config/loongarch/loongarch.h b/gcc/config/loongarch/loongarch.h
index d357e32e414..19a18fb5f1b 100644
--- a/gcc/config/loongarch/loongarch.h
+++ b/gcc/config/loongarch/loongarch.h
@@ -49,9 +49,6 @@ along with GCC; see the file COPYING3.  If not see
 
 #define TARGET_LIBGCC_SDATA_SECTION ".sdata"
 
-/* Driver native functions for SPEC processing in the GCC driver.  */
-#include "loongarch-driver.h"
-
 /* This definition replaces the formerly used 'm' constraint with a
different constraint letter in order to avoid changing semantics of
the 'm' constraint when accepting new address formats in
diff --git a/gcc/config/loongarch/t-loongarch b/gcc/config/loongarch/t-loongarch
index 9b06fa84bcc..667a6bb3b50 100644
--- a/gcc/config/loongarch/t-loongarch
+++ b/gcc/config/loongarch/t-loongarch
@@ -16,7 +16,8 @@
 # along with GCC; see the file COPYING3.  If not see
 # .
 
-TM_H += loongarch-multilib.h $(srcdir)/config/loongarch/loongarch-driver.h
+
+GTM_H += loongarch-multilib.h
 OPTIONS_H_EXTRA += $(srcdir)/config/loongarch/loongarch-def.h \
   $(srcdir)/config/loongarch/loongarch-tune.h
 
-- 
2.42.0



Re: [PATCH] LoongArch: Adjust makefile dependency for loongarch headers.

2023-10-07 Thread Yang Yujie
Unfortunately, I was unable to reproduce the problem mentioned in
https://gcc.gnu.org/pipermail/gcc-patches/2023-October/631933.html

Here's a possible fix (untested).  Please tell me whether it works.

On Sat, Oct 07, 2023 at 04:50:14PM +0800, Yang Yujie wrote:
> -TM_H += loongarch-multilib.h $(srcdir)/config/loongarch/loongarch-driver.h
> +
> +GTM_H += loongarch-multilib.h
>  OPTIONS_H_EXTRA += $(srcdir)/config/loongarch/loongarch-def.h \
>  $(srcdir)/config/loongarch/loongarch-tune.h


(I forgot to modify tm_files for the loongarch-elf* targets, so a v2 is probably needed.)



Re: [PATCH]middle-end match.pd: optimize fneg (fabs (x)) to x | (1 << signbit(x)) [PR109154]

2023-10-07 Thread Richard Sandiford
Richard Biener  writes:
> On Thu, 5 Oct 2023, Tamar Christina wrote:
>
>> > I suppose the idea is that -abs(x) might be easier to optimize with other
>> > patterns (consider a - copysign(x,...), optimizing to a + abs(x)).
>> > 
>> > For abs vs copysign it's a canonicalization, but (negate (abs @0)) is less
>> > canonical than copysign.
>> > 
>> > > Should I try removing this?
>> > 
>> > I'd say yes (and put the reverse canonicalization next to this pattern).
>> > 
>> 
>> This patch transforms fneg (fabs (x)) into copysign (x, -1) which is more
>> canonical and allows a target to expand this sequence efficiently.  Such
>> sequences are common in scientific code working with gradients.
>> 
>> Various optimizations in match.pd only happened on COPYSIGN but not COPYSIGN_ALL,
>> which means they exclude IFN_COPYSIGN.  COPYSIGN however is restricted to only
>
> That's not true:
>
> (define_operator_list COPYSIGN
> BUILT_IN_COPYSIGNF
> BUILT_IN_COPYSIGN
> BUILT_IN_COPYSIGNL
> IFN_COPYSIGN)
>
> but they miss the extended float builtin variants like
> __builtin_copysignf16.  Also see below
>
>> the C99 builtins and so doesn't work for vectors.
>> 
>> The patch expands these optimizations to work on COPYSIGN_ALL.
>> 
>> There is an existing canonicalization of copysign (x, -1) to fneg (fabs (x))
>> which I remove since this is a less efficient form.  The testsuite is also
>> updated in light of this.
>> 
>> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
>> 
>> Ok for master?
>> 
>> Thanks,
>> Tamar
>> 
>> gcc/ChangeLog:
>> 
>>  PR tree-optimization/109154
>>  * match.pd: Add new neg+abs rule, remove inverse copysign rule and
>>  expand existing copysign optimizations.
>> 
>> gcc/testsuite/ChangeLog:
>> 
>>  PR tree-optimization/109154
>>  * gcc.dg/fold-copysign-1.c: Updated.
>>  * gcc.dg/pr55152-2.c: Updated.
>>  * gcc.dg/tree-ssa/abs-4.c: Updated.
>>  * gcc.dg/tree-ssa/backprop-6.c: Updated.
>>  * gcc.dg/tree-ssa/copy-sign-2.c: Updated.
>>  * gcc.dg/tree-ssa/mult-abs-2.c: Updated.
>>  * gcc.target/aarch64/fneg-abs_1.c: New test.
>>  * gcc.target/aarch64/fneg-abs_2.c: New test.
>>  * gcc.target/aarch64/fneg-abs_3.c: New test.
>>  * gcc.target/aarch64/fneg-abs_4.c: New test.
>>  * gcc.target/aarch64/sve/fneg-abs_1.c: New test.
>>  * gcc.target/aarch64/sve/fneg-abs_2.c: New test.
>>  * gcc.target/aarch64/sve/fneg-abs_3.c: New test.
>>  * gcc.target/aarch64/sve/fneg-abs_4.c: New test.
>> 
>> --- inline copy of patch ---
>> 
>> diff --git a/gcc/match.pd b/gcc/match.pd
>> index 
>> 4bdd83e6e061b16dbdb2845b9398fcfb8a6c9739..bd6599d36021e119f51a4928354f580ffe82c6e2
>>  100644
>> --- a/gcc/match.pd
>> +++ b/gcc/match.pd
>> @@ -1074,45 +1074,43 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>>  
>>  /* cos(copysign(x, y)) -> cos(x).  Similarly for cosh.  */
>>  (for coss (COS COSH)
>> - copysigns (COPYSIGN)
>> - (simplify
>> -  (coss (copysigns @0 @1))
>> -   (coss @0)))
>> + (for copysigns (COPYSIGN_ALL)
>
> So this ends up generating for example the match
> (cosf (copysignl ...)) which doesn't make much sense.
>
> The lock-step iteration did
> (cosf (copysignf ..)) ... (ifn_cos (ifn_copysign ...))
> which is leaner but misses the case of
> (cosf (ifn_copysign ..)) - that's probably what you are
> after with this change.
>
> That said, there isn't a nice solution (without altering the match.pd
> IL).  There's the explicit solution, spelling out all combinations.
>
> So if we want to go with yout pragmatic solution changing this
> to use COPYSIGN_ALL isn't necessary, only changing the lock-step
> for iteration to a cross product for iteration is.
>
> Changing just this pattern to
>
> (for coss (COS COSH)
>  (for copysigns (COPYSIGN)
>   (simplify
>(coss (copysigns @0 @1))
>(coss @0))))
>
> increases the total number of gimple-match-x.cc lines from
> 234988 to 235324.

I guess the difference between this and the later suggestions is that
this one allows builtin copysign to be paired with ifn cos, which would
be potentially useful in other situations.  (It isn't here because
ifn_cos is rarely provided.)  How much of the growth is due to that,
and how much of it is from nonsensical combinations like
(builtin_cosf (builtin_copysignl ...))?

If it's mostly from nonsensical combinations then would it be possible
to make genmatch drop them?
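To make the size trade-off concrete, the sketch below enumerates the `(cos (copysign ...))` pairs that lock-step and nested (cross-product) `for` iteration produce over two four-entry lists shaped like the `COPYSIGN` list quoted above (three builtins plus the type-generic IFN). The `COS` list contents and the "type-consistent" rule (same float type, or either side being the IFN) are assumptions made for illustration; this is not genmatch code.

```c
#include <stdio.h>

#define NLIST 4
#define IFN_INDEX 3 /* Last entry is the type-generic internal function.  */

/* Assumed operator lists, mirroring the shape of COPYSIGN.  */
static const char *const cos_list[NLIST]
  = { "BUILT_IN_COSF", "BUILT_IN_COS", "BUILT_IN_COSL", "IFN_COS" };
static const char *const copysign_list[NLIST]
  = { "BUILT_IN_COPYSIGNF", "BUILT_IN_COPYSIGN", "BUILT_IN_COPYSIGNL",
      "IFN_COPYSIGN" };

/* A pair makes sense if both builtins use the same float type, or if
   either operator is the type-generic IFN.  */
static int
type_consistent (int i, int j)
{
  return i == j || i == IFN_INDEX || j == IFN_INDEX;
}

/* Print every (cos (copysign ...)) pattern a 'for' would generate and
   return how many are type-consistent; *TOTAL receives the pattern count.
   CROSS nonzero models nested 'for' (cross product); zero models
   lock-step iteration, which pairs only entries with the same index.  */
static int
enumerate_patterns (int cross, int *total)
{
  int consistent = 0;
  *total = 0;
  for (int i = 0; i < NLIST; i++)
    for (int j = 0; j < NLIST; j++)
      {
        if (!cross && i != j)
          continue;
        int ok = type_consistent (i, j);
        ++*total;
        consistent += ok;
        printf ("(%s (%s ...))%s\n", cos_list[i], copysign_list[j],
                ok ? "" : "   <- nonsensical");
      }
  return consistent;
}
```

Under these assumptions, lock-step iteration yields 4 patterns, all consistent, while the cross product yields 16, of which 6 pair builtins of different float types, such as `(BUILT_IN_COSF (BUILT_IN_COPYSIGNL ...))`.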

> The alternative is to do
>
> (for coss (COS COSH)
>  copysigns (COPYSIGN)
>  (simplify
>   (coss (copysigns @0 @1))
>(coss @0))
>  (simplify
>   (coss (IFN_COPYSIGN @0 @1))
>(coss @0)))
>
> which will properly diagnose a duplicate pattern.  There are
> currently no operator lists defined with just builtins (that
> could be fixed, see gencfn-macros.cc); supposing we had
> COS_C, we could do
>
> (for coss (COS_C COSH_C IFN_COS IFN_COSH)
>  copysigns (COPYSIGN_C COPYSIGN_C IFN_COPYSIGN IFN_COPYSIGN 
> IFN_COPYSIGN IFN_COPYSIGN IFN_COPYSIGN IFN_COPYSIGN IFN_COPYSIGN 
> 
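The identity behind the subject line can be checked in C: negating `fabs (x)`, calling `copysign (x, -1)`, and OR-ing the sign bit into the representation of `x` all produce the same value, including for signed zeros and infinities. The sketch assumes `float` is IEEE-754 binary32 (sign in bit 31); the helper names are hypothetical, and NaNs are left out because `==` cannot compare them.

```c
#include <math.h>
#include <stdint.h>
#include <string.h>

/* fneg (fabs (x)).  */
static float
neg_abs (float x)
{
  return -fabsf (x);
}

/* copysign (x, -1): the canonical form proposed in the patch.  */
static float
copysign_neg (float x)
{
  return copysignf (x, -1.0f);
}

/* x | signbit: set bit 31 of the binary32 representation.  */
static float
or_signbit (float x)
{
  uint32_t bits;
  memcpy (&bits, &x, sizeof bits);
  bits |= UINT32_C (1) << 31;
  memcpy (&x, &bits, sizeof x);
  return x;
}

/* Compare two floats by representation, so -0.0f and 0.0f differ.  */
static int
same_bits (float a, float b)
{
  uint32_t ua, ub;
  memcpy (&ua, &a, sizeof ua);
  memcpy (&ub, &b, sizeof ub);
  return ua == ub;
}
```

Comparing with `same_bits` rather than `==` matters for zero: `-0.0f == 0.0f` is true, but only the bit comparison confirms that all three forms produce negative zero.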

[PATCH] TEST: Fix XPASS of TSVC testsuites for RVV

2023-10-07 Thread Juzhe-Zhong
Fix the following XPASS FAILs of TSVC for RVV:

XPASS: gcc.dg/vect/tsvc/vect-tsvc-s1115.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s1115.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s114.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s114.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s1161.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s1161.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s1232.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s1232.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s124.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s124.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s1279.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s1279.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s161.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s161.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s253.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s253.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s257.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s257.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s271.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s271.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s2711.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s2711.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s2712.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s2712.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s272.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s272.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s273.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s273.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s274.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s274.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s276.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s276.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s278.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s278.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s279.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s279.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s3111.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s3111.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s353.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s353.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s441.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s441.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s443.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s443.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-vif.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-vif.c scan-tree-dump vect "vectorized 1 loops"

gcc/testsuite/ChangeLog:

* gcc.dg/vect/tsvc/vect-tsvc-s1115.c: Fix TSVC XPASS.
* gcc.dg/vect/tsvc/vect-tsvc-s114.c: Ditto.
* gcc.dg/vect/tsvc/vect-tsvc-s1161.c: Ditto.
* gcc.dg/vect/tsvc/vect-tsvc-s1232.c: Ditto.
* gcc.dg/vect/tsvc/vect-tsvc-s124.c: Ditto.
* gcc.dg/vect/tsvc/vect-tsvc-s1279.c: Ditto.
* gcc.dg/vect/tsvc/vect-tsvc-s161.c: Ditto.
* gcc.dg/vect/tsvc/vect-tsvc-s253.c: Ditto.
* gcc.dg/vect/tsvc/vect-tsvc-s257.c: Ditto.
* gcc.dg/vect/tsvc/vect-tsvc-

Re: [PATCH v1] RISC-V: Add more run test for FP rounding autovec

2023-10-07 Thread juzhe.zh...@rivai.ai
These testcases cause multiple FAILs.

I think you should use:
/* { dg-do run { target { riscv_v && riscv_zvfh_hw && riscv_zfh_ok } } } */



juzhe.zh...@rivai.ai
 
From: pan2.li
Date: 2023-10-07 14:25
To: gcc-patches
CC: juzhe.zhong; pan2.li; yanzhang.wang; kito.cheng
Subject: [PATCH v1] RISC-V: Add more run test for FP rounding autovec
From: Pan Li 
 
For _Float16 types, add run tests for:
* ceil
* floor
* nearbyint
* rint
* round
* roundeven
* trunc
 
For float and double, add run tests for:
* roundeven
 
The zfa extension is required for these run test cases; for rv64, the
simulation target_board may look like this:
 
target_board="riscv-sim/-march=rv64gcv_zfa_zfh/-mabi=lp64d/-mcmodel=medlow"
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/rvv.exp: Add zfa for building.
* gcc.target/riscv/rvv/autovec/unop/math-ceil-run-0.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-floor-run-0.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-nearbyint-run-0.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-rint-run-0.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-round-run-0.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-roundeven-run-0.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-roundeven-run-1.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-roundeven-run-2.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-trunc-run-0.c: New test.
 
Signed-off-by: Pan Li 
---
.../riscv/rvv/autovec/unop/math-ceil-run-0.c  | 39 +++
.../riscv/rvv/autovec/unop/math-floor-run-0.c | 39 +++
.../rvv/autovec/unop/math-nearbyint-run-0.c   | 48 +++
.../riscv/rvv/autovec/unop/math-rint-run-0.c  | 48 +++
.../riscv/rvv/autovec/unop/math-round-run-0.c | 39 +++
.../rvv/autovec/unop/math-roundeven-run-0.c   | 39 +++
.../rvv/autovec/unop/math-roundeven-run-1.c   | 39 +++
.../rvv/autovec/unop/math-roundeven-run-2.c   | 39 +++
.../riscv/rvv/autovec/unop/math-trunc-run-0.c | 39 +++
gcc/testsuite/gcc.target/riscv/rvv/rvv.exp|  4 +-
10 files changed, 371 insertions(+), 2 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-ceil-run-0.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-floor-run-0.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-nearbyint-run-0.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-rint-run-0.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-round-run-0.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-roundeven-run-0.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-roundeven-run-1.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-roundeven-run-2.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-trunc-run-0.c
 
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-ceil-run-0.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-ceil-run-0.c
new file mode 100644
index 000..70cba3602bb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-ceil-run-0.c
@@ -0,0 +1,39 @@
+/* { dg-do run { target { riscv_v } } } */
+/* { dg-additional-options "-std=c99 -O3 -ftree-vectorize -fno-vect-cost-model -ffast-math" } */
+
+#include "test-math.h"
+
+#define ARRAY_SIZE 128
+
+_Float16 in[ARRAY_SIZE];
+_Float16 out[ARRAY_SIZE];
+_Float16 ref[ARRAY_SIZE];
+
+TEST_UNARY_CALL (_Float16, __builtin_ceilf16)
+TEST_ASSERT (_Float16)
+
+TEST_INIT (_Float16, 1.2, 2.0, 1)
+TEST_INIT (_Float16, -1.2, -1.0, 2)
+TEST_INIT (_Float16, 3.0, 3.0, 3)
+TEST_INIT (_Float16, 1023.5, 1024.0, 4)
+TEST_INIT (_Float16, 1024.0, 1024.0, 5)
+TEST_INIT (_Float16, 0.0, 0.0, 6)
+TEST_INIT (_Float16, -0.0, -0.0, 7)
+TEST_INIT (_Float16, -1023.5, -1023.0, 8)
+TEST_INIT (_Float16, -1024.0, -1024.0, 9)
+
+int
+main ()
+{
+  RUN_TEST (_Float16, 1, __builtin_ceilf16, in, out, ref, ARRAY_SIZE);
+  RUN_TEST (_Float16, 2, __builtin_ceilf16, in, out, ref, ARRAY_SIZE);
+  RUN_TEST (_Float16, 3, __builtin_ceilf16, in, out, ref, ARRAY_SIZE);
+  RUN_TEST (_Float16, 4, __builtin_ceilf16, in, out, ref, ARRAY_SIZE);
+  RUN_TEST (_Float16, 5, __builtin_ceilf16, in, out, ref, ARRAY_SIZE);
+  RUN_TEST (_Float16, 6, __builtin_ceilf16, in, out, ref, ARRAY_SIZE);
+  RUN_TEST (_Float16, 7, __builtin_ceilf16, in, out, ref, ARRAY_SIZE);
+  RUN_TEST (_Float16, 8, __builtin_ceilf16, in, out, ref, ARRAY_SIZE);
+  RUN_TEST (_Float16, 9, __builtin_ceilf16, in, out, ref, ARRAY_SIZE);
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-floor-run-0.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-floor-run-0.c
new file mode 100644
index 000..c542278c1f5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-floor-run-0.c
@@ -0,0 +1,39 @@
+/* { dg-do run { target { riscv_v } } } */
+/* { dg-additional-option

Re: [PATCH]AArch64 Add SVE implementation for cond_copysign.

2023-10-07 Thread Richard Sandiford
Richard Biener  writes:
> On Thu, Oct 5, 2023 at 10:46 PM Tamar Christina  wrote:
>>
>> > -Original Message-
>> > From: Richard Sandiford 
>> > Sent: Thursday, October 5, 2023 9:26 PM
>> > To: Tamar Christina 
>> > Cc: gcc-patches@gcc.gnu.org; nd ; Richard Earnshaw
>> > ; Marcus Shawcroft
>> > ; Kyrylo Tkachov 
>> > Subject: Re: [PATCH]AArch64 Add SVE implementation for cond_copysign.
>> >
>> > Tamar Christina  writes:
>> > >> -Original Message-
>> > >> From: Richard Sandiford 
>> > >> Sent: Thursday, October 5, 2023 8:29 PM
>> > >> To: Tamar Christina 
>> > >> Cc: gcc-patches@gcc.gnu.org; nd ; Richard Earnshaw
>> > >> ; Marcus Shawcroft
>> > >> ; Kyrylo Tkachov
>> > >> Subject: Re: [PATCH]AArch64 Add SVE implementation for cond_copysign.
>> > >>
>> > >> Tamar Christina  writes:
>> > >> > Hi All,
>> > >> >
>> > >> > This adds an implementation for masked copysign along with an
>> > >> > optimized pattern for masked copysign (x, -1).
>> > >>
>> > >> It feels like we're ending up with a lot of AArch64-specific code
>> > >> that just hard-codes the observation that changing the sign is
>> > >> equivalent to changing the top bit.  We then need to make sure that
>> > >> we choose the best way of changing the top bit for any given situation.
>> > >>
>> > >> Hard-coding the -1/negative case is one instance of that.  But it
>> > >> looks like we also fail to use the best sequence for SVE2.  E.g.
>> > >> [https://godbolt.org/z/ajh3MM5jv]:
>> > >>
>> > >> #include 
>> > >>
>> > >> void f(double *restrict a, double *restrict b) {
>> > >> for (int i = 0; i < 100; ++i)
>> > >> a[i] = __builtin_copysign(a[i], b[i]); }
>> > >>
>> > >> void g(uint64_t *restrict a, uint64_t *restrict b, uint64_t c) {
>> > >> for (int i = 0; i < 100; ++i)
>> > >> a[i] = (a[i] & ~c) | (b[i] & c); }
>> > >>
>> > >> gives:
>> > >>
>> > >> f:
>> > >> mov x2, 0
>> > >> mov w3, 100
>> > >> whilelo p7.d, wzr, w3
>> > >> .L2:
>> > >> ld1d z30.d, p7/z, [x0, x2, lsl 3]
>> > >> ld1d z31.d, p7/z, [x1, x2, lsl 3]
>> > >> and z30.d, z30.d, #0x7fff
>> > >> and z31.d, z31.d, #0x8000
>> > >> orr z31.d, z31.d, z30.d
>> > >> st1d z31.d, p7, [x0, x2, lsl 3]
>> > >> incd x2
>> > >> whilelo p7.d, w2, w3
>> > >> b.any   .L2
>> > >> ret
>> > >> g:
>> > >> mov x3, 0
>> > >> mov w4, 100
>> > >> mov z29.d, x2
>> > >> whilelo p7.d, wzr, w4
>> > >> .L6:
>> > >> ld1d z30.d, p7/z, [x0, x3, lsl 3]
>> > >> ld1d z31.d, p7/z, [x1, x3, lsl 3]
>> > >> bsl z31.d, z31.d, z30.d, z29.d
>> > >> st1d z31.d, p7, [x0, x3, lsl 3]
>> > >> incd x3
>> > >> whilelo p7.d, w3, w4
>> > >> b.any   .L6
>> > >> ret
>> > >>
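
With c set to the sign-bit mask, the body of g above computes exactly what f computes: the magnitude comes from a[i] and the sign from b[i]. That is the sense in which f misses the single BSL sequence that g gets. A scalar C sketch of the equivalence (not from the thread; it assumes IEEE-754 binary64 with the sign in bit 63, and the helper names are hypothetical):

```c
#include <stdint.h>
#include <string.h>

/* Sign bit of an IEEE-754 binary64 value (assumed layout).  */
#define SIGN_MASK UINT64_C (0x8000000000000000)

/* Reinterpret a double's bits and back without aliasing problems.  */
static uint64_t to_bits (double d)
{
  uint64_t u;
  memcpy (&u, &d, sizeof u);
  return u;
}

static double from_bits (uint64_t u)
{
  double d;
  memcpy (&d, &u, sizeof d);
  return d;
}

/* The body of g: bits from b where c is set, bits from a elsewhere.  */
static uint64_t bitselect (uint64_t a, uint64_t b, uint64_t c)
{
  return (a & ~c) | (b & c);
}

/* The body of f: copysign (a, b) as a bit-select on the sign mask.  */
static double copysign_via_bitselect (double a, double b)
{
  return from_bits (bitselect (to_bits (a), to_bits (b), SIGN_MASK));
}
```

For example, copysign_via_bitselect (2.5, -1.0) yields -2.5 and copysign_via_bitselect (0.0, -1.0) yields -0.0.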
>> > >> I saw that you originally tried to do this in match.pd and that the
>> > >> decision was to fold to copysign instead.  But perhaps there's a
>> > >> compromise where isel does something with the (new) copysign canonical form?
>> > >> I.e. could we go with your new version of the match.pd patch, and add
>> > >> some isel stuff as a follow-on?
>> > >>
>> > >
>> > > Sure if that's what's desired But..
>> > >
>> > > The example you posted above is for instance worse for x86
>> > > https://godbolt.org/z/x9ccqxW6T where the first operation has a
>> > > dependency chain of 2 and the latter of 3.  It's likely any open coding of this
>> > > operation is going to hurt a target.
>> > >
>> > > So I'm unsure what isel would transform this into...
>> >
>> > I didn't mean that we should go straight to using isel for the general case, just
>> > for the new case.  The example above was instead trying to show the general
>> > point that hiding the logic ops in target code is a double-edged sword.
>>
>> I see, but the problem here is that transforming copysign (x, -1) into
>> (x | 0x800) would require an integer operation on an FP value.  I'm happy to
>> do it, but it seems like it'll be an AArch64-only thing anyway.
>>
>> If we want to do this, we need to check can_change_mode_class or a hook.
>> Most targets, including x86, reject the conversion, so it'll effectively
>> just be an AArch64 thing.
>>
>> You're right that the actual equivalent transformation is this 
>> https://godbolt.org/z/KesfrMv5z
>> But the target won't allow it.
>>
>> >
>> > The x86_64 example for the -1 case would be
>> > https://godbolt.org/z/b9s6MaKs8 where the isel change would be an
>> > improvement.  Without that, I guess
>> > x86_64 will need to have a similar patch to the AArch64 one.
>> >
>>
>> I think that's to be expected.  I think it's logical that every target just
>> needs to implement their optabs optimally.
>>
>> > That said, https://godbolt.org/z/e6nqoqbMh suggests that powerpc64 is
>> > probably relying on the current copysign -> neg/abs tran

Re: Re: [PATCH v1] RISC-V: Add more run test for FP rounding autovec

2023-10-07 Thread juzhe.zh...@rivai.ai
Also, I have reverted your commit:
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=066a43ce72ab6559ba14af9628df19daa0b85cdf

Please test the patch and verify that it doesn't cause any FAILs if the toolchain
doesn't have "zvfh_zfh".




juzhe.zh...@rivai.ai
 
From: juzhe.zh...@rivai.ai
Date: 2023-10-07 17:49
To: pan2.li; gcc-patches
CC: pan2.li; yanzhang.wang; kito.cheng
Subject: Re: [PATCH v1] RISC-V: Add more run test for FP rounding autovec
These testcases cause multiple FAILs:

I think you should use:
/* { dg-do run { target { riscv_v && riscv_zvfh_hw && riscv_zfh_ok } } } */



juzhe.zh...@rivai.ai
 

Re: [PATCH]middle-end match.pd: optimize fneg (fabs (x)) to x | (1 << signbit(x)) [PR109154]

2023-10-07 Thread Richard Biener



> On 07.10.2023 at 11:23, Richard Sandiford wrote:
> 
> Richard Biener  writes:
>> On Thu, 5 Oct 2023, Tamar Christina wrote:
>> 
 I suppose the idea is that -abs(x) might be easier to optimize with other
 patterns (consider a - copysign(x,...), optimizing to a + abs(x)).
 
 For abs vs copysign it's a canonicalization, but (negate (abs @0)) is less
 canonical than copysign.
 
> Should I try removing this?
 
 I'd say yes (and put the reverse canonicalization next to this pattern).
 
>>> 
>>> This patch transforms fneg (fabs (x)) into copysign (x, -1) which is more
>>> canonical and allows a target to expand this sequence efficiently.  Such
>>> sequences are common in scientific code working with gradients.
>>> 
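
The identity behind this patch is that -fabs (x), copysign (x, -1) and x with its sign bit forced on are the same value, including for signed zeros. A minimal scalar C sketch, assuming IEEE-754 binary64 with the sign in bit 63; the function names are hypothetical and this is not the match.pd implementation:

```c
#include <math.h>
#include <stdint.h>
#include <string.h>

/* -fabs (x), i.e. copysign (x, -1), by forcing the sign bit on.
   Assumes IEEE-754 binary64 with the sign in bit 63.  */
static double neg_abs_via_or (double x)
{
  uint64_t u;
  memcpy (&u, &x, sizeof u);
  u |= UINT64_C (1) << 63;      /* x | (1 << sign-bit position) */
  memcpy (&x, &u, sizeof x);
  return x;
}

/* Reference -|x| without a libm call: signbit is a macro.  */
static double neg_abs_ref (double x)
{
  return signbit (x) ? x : -x;
}
```

Both functions map 2.5 and -2.5 to -2.5, and both map +0.0 to -0.0.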
>>> various optimizations in match.pd only happened on COPYSIGN but not COPYSIGN_ALL
>>> which means they exclude IFN_COPYSIGN.  COPYSIGN however is restricted to only
>> 
>> That's not true:
>> 
>> (define_operator_list COPYSIGN
>>BUILT_IN_COPYSIGNF
>>BUILT_IN_COPYSIGN
>>BUILT_IN_COPYSIGNL
>>IFN_COPYSIGN)
>> 
>> but they miss the extended float builtin variants like
>> __builtin_copysignf16.  Also see below
>> 
>>> the C99 builtins and so doesn't work for vectors.
>>> 
>>> The patch expands these optimizations to work on COPYSIGN_ALL.
>>> 
>>> There is an existing canonicalization of copysign (x, -1) to fneg (fabs (x))
>>> which I remove since this is a less efficient form.  The testsuite is also
>>> updated in light of this.
>>> 
>>> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
>>> 
>>> Ok for master?
>>> 
>>> Thanks,
>>> Tamar
>>> 
>>> gcc/ChangeLog:
>>> 
>>>PR tree-optimization/109154
>>>* match.pd: Add new neg+abs rule, remove inverse copysign rule and
>>>expand existing copysign optimizations.
>>> 
>>> gcc/testsuite/ChangeLog:
>>> 
>>>PR tree-optimization/109154
>>>* gcc.dg/fold-copysign-1.c: Updated.
>>>* gcc.dg/pr55152-2.c: Updated.
>>>* gcc.dg/tree-ssa/abs-4.c: Updated.
>>>* gcc.dg/tree-ssa/backprop-6.c: Updated.
>>>* gcc.dg/tree-ssa/copy-sign-2.c: Updated.
>>>* gcc.dg/tree-ssa/mult-abs-2.c: Updated.
>>>* gcc.target/aarch64/fneg-abs_1.c: New test.
>>>* gcc.target/aarch64/fneg-abs_2.c: New test.
>>>* gcc.target/aarch64/fneg-abs_3.c: New test.
>>>* gcc.target/aarch64/fneg-abs_4.c: New test.
>>>* gcc.target/aarch64/sve/fneg-abs_1.c: New test.
>>>* gcc.target/aarch64/sve/fneg-abs_2.c: New test.
>>>* gcc.target/aarch64/sve/fneg-abs_3.c: New test.
>>>* gcc.target/aarch64/sve/fneg-abs_4.c: New test.
>>> 
>>> --- inline copy of patch ---
>>> 
>>> diff --git a/gcc/match.pd b/gcc/match.pd
>>> index 4bdd83e6e061b16dbdb2845b9398fcfb8a6c9739..bd6599d36021e119f51a4928354f580ffe82c6e2 100644
>>> --- a/gcc/match.pd
>>> +++ b/gcc/match.pd
>>> @@ -1074,45 +1074,43 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>>> 
>>> /* cos(copysign(x, y)) -> cos(x).  Similarly for cosh.  */
>>> (for coss (COS COSH)
>>> - copysigns (COPYSIGN)
>>> - (simplify
>>> -  (coss (copysigns @0 @1))
>>> -   (coss @0)))
>>> + (for copysigns (COPYSIGN_ALL)
>> 
>> So this ends up generating for example the match
>> (cosf (copysignl ...)) which doesn't make much sense.
>> 
>> The lock-step iteration did
>> (cosf (copysignf ..)) ... (ifn_cos (ifn_copysign ...))
>> which is leaner but misses the case of
>> (cosf (ifn_copysign ..)) - that's probably what you are
>> after with this change.
>> 
>> That said, there isn't a nice solution (without altering the match.pd
>> IL).  There's the explicit solution, spelling out all combinations.
>> 
>> So if we want to go with your pragmatic solution, changing this
>> to use COPYSIGN_ALL isn't necessary, only changing the lock-step
>> for iteration to a cross product for iteration is.
>> 
>> Changing just this pattern to
>> 
>> (for coss (COS COSH)
>> (for copysigns (COPYSIGN)
>>  (simplify
>>   (coss (copysigns @0 @1))
>>   (coss @0))))
>> 
>> increases the total number of gimple-match-x.cc lines from
>> 234988 to 235324.
> 
> I guess the difference between this and the later suggestions is that
> this one allows builtin copysign to be paired with ifn cos, which would
> be potentially useful in other situations.  (It isn't here because
> ifn_cos is rarely provided.)  How much of the growth is due to that,
> and how much of it is from nonsensical combinations like
> (builtin_cosf (builtin_copysignl ...))?
> 
> If it's mostly from nonsensical combinations then would it be possible
> to make genmatch drop them?
> 
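
The difference between lock-step `for` iteration and the nested (cross-product) form discussed above can be modeled outside genmatch. The following hypothetical C sketch uses simplified four-entry stand-ins for COS and COPYSIGN (COSH is ignored). It treats a pairing as well-typed when the suffixes match or either side is the type-generic internal function. That rule is an illustrative assumption, not a genmatch rule. Under it, lock-step iteration yields 4 patterns, while the cross product yields 16, of which 6 pair mismatched builtin types such as (cosf (copysignl ...)). The useful (cosf (ifn_copysign ...)) case appears only in the cross product.

```c
#include <stdio.h>
#include <string.h>

/* Simplified stand-ins for the COS and COPYSIGN operator lists (COSH is
   ignored).  Position i of each list has the type suffix suffix[i];
   "*" marks the type-generic internal function.  */
static const char *const cos_ops[] = { "cosf", "cos", "cosl", "ifn_cos" };
static const char *const copysign_ops[] = { "copysignf", "copysign",
                                            "copysignl", "ifn_copysign" };
static const char *const suffix[] = { "f", "", "l", "*" };
enum { NOPS = 4 };

/* Illustrative rule (an assumption, not a genmatch rule): a pairing is
   well-typed if the suffixes match or either side is type-generic.  */
static int well_typed (int i, int j)
{
  return strcmp (suffix[i], suffix[j]) == 0
         || strcmp (suffix[i], "*") == 0
         || strcmp (suffix[j], "*") == 0;
}

/* Count the (coss (copysigns @0 @1)) patterns produced by lock-step
   iteration (CROSS == 0) or by nested, cross-product iteration
   (CROSS != 0), optionally printing each one to LOG.  */
static int count_patterns (int cross, int *mismatched, FILE *log)
{
  int total = 0;
  *mismatched = 0;
  for (int i = 0; i < NOPS; i++)
    for (int j = 0; j < NOPS; j++)
      {
        if (!cross && i != j)
          continue;             /* lock-step pairs entries by position */
        int ok = well_typed (i, j);
        total++;
        if (!ok)
          ++*mismatched;
        if (log)
          fprintf (log, "(%s (%s @0 @1))%s\n", cos_ops[i], copysign_ops[j],
                   ok ? "" : "   <- mismatched types");
      }
  return total;
}
```

Passing stdout as LOG lists every generated pattern and marks the mismatched ones.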
>> The alternative is to do
>> 
>> (for coss (COS COSH)
>> copysigns (COPYSIGN)
>> (simplify
>>  (coss (copysigns @0 @1))
>>   (coss @0))
>> (simplify
>>  (coss (IFN_COPYSIGN @0 @1))
>>   (coss @0)))
>> 
>> which will properly diagnose a duplicate pattern.  There are
>> currently no operator lists with just builtins defined (that
>> could be fixed, see gencfn-macros.cc), supposed we'd have
>> COS_C we could do
>> 
>> (for coss (C

[PATCH] RISC-V: add static-pie support

2023-10-07 Thread yanzhang . wang
From: Yanzhang Wang 

We only need to pass options to the linker when -static-pie is passed.
There is a separate patch to enable static-pie in glibc, and static-pie
has to be enabled in GCC first.

gcc/ChangeLog:

* config/riscv/linux.h: Pass the static-pie specific options to
  the linker.

Signed-off-by: Yanzhang Wang 
---

Have tested with glibc enabled and no regression of gcc found.

 gcc/config/riscv/linux.h | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/gcc/config/riscv/linux.h b/gcc/config/riscv/linux.h
index 7323ff30f70..8901671ddf6 100644
--- a/gcc/config/riscv/linux.h
+++ b/gcc/config/riscv/linux.h
@@ -55,9 +55,10 @@ along with GCC; see the file COPYING3.  If not see
 %{shared} \
   %{!shared: \
 %{!static: \
-  %{rdynamic:-export-dynamic} \
-  -dynamic-linker " GNU_USER_DYNAMIC_LINKER "} \
-%{static:-static}}"
+  %{!static-pie: \
+   %{rdynamic:-export-dynamic} \
+   -dynamic-linker " GNU_USER_DYNAMIC_LINKER "}} \
+%{static:-static} %{static-pie:-static -pie --no-dynamic-linker -z text}}"
 
 #define STARTFILE_PREFIX_SPEC  \
"/lib" XLEN_SPEC "/" ABI_SPEC "/ "  \
-- 
2.42.0
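
The nested conditions in the LINK_SPEC hunk above can be read as ordinary control flow. The following hypothetical C model is not GCC's spec machinery. It covers only the options in this hunk, and DYNLINKER stands in for GNU_USER_DYNAMIC_LINKER.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Driver options that the patched LINK_SPEC fragment tests.  */
struct link_opts
{
  bool shared, is_static, static_pie, rdynamic;
};

/* Append S to OUT, separated by a space; silently truncates.  */
static void append (char *out, size_t len, const char *s)
{
  size_t used = strlen (out);
  if (used + 1 < len)
    snprintf (out + used, len - used, "%s%s", used ? " " : "", s);
}

/* Hypothetical model of the patched spec.  Assumes LEN > 0.  */
static void model_link_spec (struct link_opts o, char *out, size_t len)
{
  out[0] = '\0';
  if (o.shared)                         /* %{shared} */
    {
      append (out, len, "-shared");
      return;                           /* %{!shared: ...} adds nothing */
    }
  if (!o.is_static && !o.static_pie)    /* %{!static: %{!static-pie: ...}} */
    {
      if (o.rdynamic)
        append (out, len, "-export-dynamic");
      append (out, len, "-dynamic-linker DYNLINKER");
    }
  if (o.is_static)                      /* %{static:-static} */
    append (out, len, "-static");
  if (o.static_pie)                     /* %{static-pie:...} */
    append (out, len, "-static -pie --no-dynamic-linker -z text");
}
```

With -static-pie alone, the model emits "-static -pie --no-dynamic-linker -z text" and no -dynamic-linker, which is the behaviour the hunk adds. With -shared it emits only -shared.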



Re: [PATCH]middle-end match.pd: optimize fneg (fabs (x)) to x | (1 << signbit(x)) [PR109154]

2023-10-07 Thread Richard Sandiford
Richard Biener  writes:
>> On 07.10.2023 at 11:23, Richard Sandiford wrote:
>> >> Richard Biener  writes:
>>> On Thu, 5 Oct 2023, Tamar Christina wrote:
>>> 
> I suppose the idea is that -abs(x) might be easier to optimize with other
> patterns (consider a - copysign(x,...), optimizing to a + abs(x)).
> 
> For abs vs copysign it's a canonicalization, but (negate (abs @0)) is less
> canonical than copysign.
> 
>> Should I try removing this?
> 
> I'd say yes (and put the reverse canonicalization next to this pattern).
> 
 
 This patch transforms fneg (fabs (x)) into copysign (x, -1) which is more
 canonical and allows a target to expand this sequence efficiently.  Such
 sequences are common in scientific code working with gradients.
 
 various optimizations in match.pd only happened on COPYSIGN but not COPYSIGN_ALL
 which means they exclude IFN_COPYSIGN.  COPYSIGN however is restricted to only
>>> 
>>> That's not true:
>>> 
>>> (define_operator_list COPYSIGN
>>>BUILT_IN_COPYSIGNF
>>>BUILT_IN_COPYSIGN
>>>BUILT_IN_COPYSIGNL
>>>IFN_COPYSIGN)
>>> 
>>> but they miss the extended float builtin variants like
>>> __builtin_copysignf16.  Also see below
>>> 
 the C99 builtins and so doesn't work for vectors.
 
 The patch expands these optimizations to work on COPYSIGN_ALL.
 
 There is an existing canonicalization of copysign (x, -1) to fneg (fabs (x))
 which I remove since this is a less efficient form.  The testsuite is also
 updated in light of this.
 
 Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
 
 Ok for master?
 
 Thanks,
 Tamar
 
 gcc/ChangeLog:
 
PR tree-optimization/109154
* match.pd: Add new neg+abs rule, remove inverse copysign rule and
expand existing copysign optimizations.
 
 gcc/testsuite/ChangeLog:
 
PR tree-optimization/109154
* gcc.dg/fold-copysign-1.c: Updated.
* gcc.dg/pr55152-2.c: Updated.
* gcc.dg/tree-ssa/abs-4.c: Updated.
* gcc.dg/tree-ssa/backprop-6.c: Updated.
* gcc.dg/tree-ssa/copy-sign-2.c: Updated.
* gcc.dg/tree-ssa/mult-abs-2.c: Updated.
* gcc.target/aarch64/fneg-abs_1.c: New test.
* gcc.target/aarch64/fneg-abs_2.c: New test.
* gcc.target/aarch64/fneg-abs_3.c: New test.
* gcc.target/aarch64/fneg-abs_4.c: New test.
* gcc.target/aarch64/sve/fneg-abs_1.c: New test.
* gcc.target/aarch64/sve/fneg-abs_2.c: New test.
* gcc.target/aarch64/sve/fneg-abs_3.c: New test.
* gcc.target/aarch64/sve/fneg-abs_4.c: New test.
 
 --- inline copy of patch ---
 
 diff --git a/gcc/match.pd b/gcc/match.pd
 index 4bdd83e6e061b16dbdb2845b9398fcfb8a6c9739..bd6599d36021e119f51a4928354f580ffe82c6e2 100644
 --- a/gcc/match.pd
 +++ b/gcc/match.pd
 @@ -1074,45 +1074,43 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 
 /* cos(copysign(x, y)) -> cos(x).  Similarly for cosh.  */
 (for coss (COS COSH)
 - copysigns (COPYSIGN)
 - (simplify
 -  (coss (copysigns @0 @1))
 -   (coss @0)))
 + (for copysigns (COPYSIGN_ALL)
>>> 
>>> So this ends up generating for example the match
>>> (cosf (copysignl ...)) which doesn't make much sense.
>>> 
>>> The lock-step iteration did
>>> (cosf (copysignf ..)) ... (ifn_cos (ifn_copysign ...))
>>> which is leaner but misses the case of
>>> (cosf (ifn_copysign ..)) - that's probably what you are
>>> after with this change.
>>> 
>>> That said, there isn't a nice solution (without altering the match.pd
>>> IL).  There's the explicit solution, spelling out all combinations.
>>> 
>>> So if we want to go with your pragmatic solution, changing this
>>> to use COPYSIGN_ALL isn't necessary, only changing the lock-step
>>> for iteration to a cross product for iteration is.
>>> 
>>> Changing just this pattern to
>>> 
>>> (for coss (COS COSH)
>>> (for copysigns (COPYSIGN)
>>>  (simplify
>>>   (coss (copysigns @0 @1))
>>>   (coss @0))))
>>> 
>>> increases the total number of gimple-match-x.cc lines from
>>> 234988 to 235324.
>> 
>> I guess the difference between this and the later suggestions is that
>> this one allows builtin copysign to be paired with ifn cos, which would
>> be potentially useful in other situations.  (It isn't here because
>> ifn_cos is rarely provided.)  How much of the growth is due to that,
>> and how much of it is from nonsensical combinations like
>> (builtin_cosf (builtin_copysignl ...))?
>> 
>> If it's mostly from nonsensical combinations then would it be possible
>> to make genmatch drop them?
>> 
>>> The alternative is to do
>>> 
>>> (for coss (COS COSH)
>>> copysigns (COPYSIGN)
>>> (simplify
>>>  (coss (copysigns @0 @1))
>>>   (coss @0))
>>> (simplify
>>>  (coss (IFN_COPYSIGN @0 @1))
>>>   (coss @0)))
>>> 
>>> which properly will diagnose a duplicate pattern.  Ther ar

[PATCH] TEST: Fix vect_cond_arith_* dump checks for RVV

2023-10-07 Thread Juzhe-Zhong
This patch fixes the following dump FAILs:
FAIL: gcc.dg/vect/vect-cond-arith-2.c -flto -ffat-lto-objects  scan-tree-dump optimized " = \\.COND_SUB"
FAIL: gcc.dg/vect/vect-cond-arith-2.c -flto -ffat-lto-objects  scan-tree-dump vect " = \\.COND_ADD"
FAIL: gcc.dg/vect/vect-cond-arith-2.c scan-tree-dump optimized " = \\.COND_SUB"
FAIL: gcc.dg/vect/vect-cond-arith-2.c scan-tree-dump vect " = \\.COND_ADD"
FAIL: gcc.dg/vect/vect-cond-arith-4.c -flto -ffat-lto-objects  scan-tree-dump optimized " = \\.COND_ADD"
FAIL: gcc.dg/vect/vect-cond-arith-4.c -flto -ffat-lto-objects  scan-tree-dump optimized " = \\.COND_MUL"
FAIL: gcc.dg/vect/vect-cond-arith-4.c -flto -ffat-lto-objects  scan-tree-dump optimized " = \\.COND_RDIV"
FAIL: gcc.dg/vect/vect-cond-arith-4.c -flto -ffat-lto-objects  scan-tree-dump optimized " = \\.COND_SUB"
FAIL: gcc.dg/vect/vect-cond-arith-4.c scan-tree-dump optimized " = \\.COND_ADD"
FAIL: gcc.dg/vect/vect-cond-arith-4.c scan-tree-dump optimized " = \\.COND_MUL"
FAIL: gcc.dg/vect/vect-cond-arith-4.c scan-tree-dump optimized " = \\.COND_RDIV"
FAIL: gcc.dg/vect/vect-cond-arith-4.c scan-tree-dump optimized " = \\.COND_SUB"
FAIL: gcc.dg/vect/vect-cond-arith-5.c -flto -ffat-lto-objects  scan-tree-dump optimized " = \\.COND_ADD"
FAIL: gcc.dg/vect/vect-cond-arith-5.c -flto -ffat-lto-objects  scan-tree-dump optimized " = \\.COND_MUL"
FAIL: gcc.dg/vect/vect-cond-arith-5.c -flto -ffat-lto-objects  scan-tree-dump optimized " = \\.COND_RDIV"
FAIL: gcc.dg/vect/vect-cond-arith-5.c -flto -ffat-lto-objects  scan-tree-dump optimized " = \\.COND_SUB"
FAIL: gcc.dg/vect/vect-cond-arith-5.c scan-tree-dump optimized " = \\.COND_ADD"
FAIL: gcc.dg/vect/vect-cond-arith-5.c scan-tree-dump optimized " = \\.COND_MUL"
FAIL: gcc.dg/vect/vect-cond-arith-5.c scan-tree-dump optimized " = \\.COND_RDIV"
FAIL: gcc.dg/vect/vect-cond-arith-5.c scan-tree-dump optimized " = \\.COND_SUB"
FAIL: gcc.dg/vect/vect-cond-arith-6.c -flto -ffat-lto-objects  scan-tree-dump-times optimized " = \\.COND_ADD" 1
FAIL: gcc.dg/vect/vect-cond-arith-6.c -flto -ffat-lto-objects  scan-tree-dump-times optimized " = \\.COND_MUL" 1
FAIL: gcc.dg/vect/vect-cond-arith-6.c -flto -ffat-lto-objects  scan-tree-dump-times optimized " = \\.COND_RDIV" 1
FAIL: gcc.dg/vect/vect-cond-arith-6.c -flto -ffat-lto-objects  scan-tree-dump-times optimized " = \\.COND_SUB" 1
FAIL: gcc.dg/vect/vect-cond-arith-6.c scan-tree-dump-times optimized " = \\.COND_ADD" 1
FAIL: gcc.dg/vect/vect-cond-arith-6.c scan-tree-dump-times optimized " = \\.COND_MUL" 1
FAIL: gcc.dg/vect/vect-cond-arith-6.c scan-tree-dump-times optimized " = \\.COND_RDIV" 1
FAIL: gcc.dg/vect/vect-cond-arith-6.c scan-tree-dump-times optimized " = \\.COND_SUB" 1

For RVV, the expected dump IR uses the COND_LEN_* patterns.
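
COND_ADD and COND_LEN_ADD differ in their operands. The length form also takes a length and a bias, so lanes at or beyond len + bias are inactive regardless of the mask. This matches how RVV bounds a loop with its vector length. The following scalar C sketch is modeled on the semantics GCC documents for the cond_add and cond_len_add optabs. The function names are hypothetical, and here inactive lanes simply receive the else value for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Scalar model of .COND_ADD (mask, a, b, else): the mask alone decides
   which lanes are added.  */
static void cond_add (size_t nunits, const bool *mask, const double *a,
                      const double *b, const double *els, double *res)
{
  for (size_t i = 0; i < nunits; i++)
    res[i] = mask[i] ? a[i] + b[i] : els[i];
}

/* Scalar model of .COND_LEN_ADD (mask, a, b, else, len, bias): a lane is
   added only if its index is below len + bias and its mask bit is set.  */
static void cond_len_add (size_t nunits, const bool *mask, const double *a,
                          const double *b, const double *els, long len,
                          long bias, double *res)
{
  for (size_t i = 0; i < nunits; i++)
    res[i] = ((long) i < len + bias && mask[i]) ? a[i] + b[i] : els[i];
}
```

With mask {1, 0, 1, 1}, len 2 and bias 0, cond_len_add only adds lane 0, while cond_add with the same mask adds lanes 0, 2 and 3.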

Also, we are still failing these checks:

FAIL: gcc.dg/vect/vect-cond-arith-2.c scan-tree-dump optimized " = \\.COND_LEN_SUB"
FAIL: gcc.dg/vect/vect-cond-arith-2.c -flto -ffat-lto-objects  scan-tree-dump optimized " = \\.COND_LEN_SUB"

because of a known bug in GIMPLE_FOLD that Robin is working on.

@Robin: Please make sure vect-cond-arith-2.c passes with this patch and your bug
fix patch.

OK for trunk?

gcc/testsuite/ChangeLog:

* gcc.dg/vect/vect-cond-arith-2.c: Fix dump check for RVV.
* gcc.dg/vect/vect-cond-arith-4.c: Ditto.
* gcc.dg/vect/vect-cond-arith-5.c: Ditto.
* gcc.dg/vect/vect-cond-arith-6.c: Ditto.

---
 gcc/testsuite/gcc.dg/vect/vect-cond-arith-2.c |  6 ++++--
 gcc/testsuite/gcc.dg/vect/vect-cond-arith-4.c | 12 
 gcc/testsuite/gcc.dg/vect/vect-cond-arith-5.c | 12 
 gcc/testsuite/gcc.dg/vect/vect-cond-arith-6.c | 12 
 4 files changed, 28 insertions(+), 14 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/vect/vect-cond-arith-2.c b/gcc/testsuite/gcc.dg/vect/vect-cond-arith-2.c
index 38994ea82a5..7bddc122037 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-cond-arith-2.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-cond-arith-2.c
@@ -41,5 +41,7 @@ neg_xi (double *x)
   return res_3;
 }
 
-/* { dg-final { scan-tree-dump { = \.COND_ADD} "vect" { target { vect_double_cond_arith && vect_fully_masked } } } } */
-/* { dg-final { scan-tree-dump { = \.COND_SUB} "optimized" { target { vect_double_cond_arith && vect_fully_masked } } } } */
+/* { dg-final { scan-tree-dump { = \.COND_ADD} "vect" { target { vect_double_cond_arith && { vect_fully_masked && { ! riscv_v } } } } } } */
+/* { dg-final { scan-tree-dump { = \.COND_SUB} "optimized" { target { vect_double_cond_arith && { vect_fully_masked && { ! riscv_v } } } } } } */
+/* { dg-final { scan-tree-dump { = \.COND_LEN_ADD} "vect" { target { vect_double_cond_arith && { vect_fully_masked && riscv_v } } } } } */
+/* { dg-final { scan-tree-dump { = \.COND_LEN_SUB} "optimized" { target { vect_double_cond_arith && { vect_fully_masked && riscv_v } } } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-cond-arith-4.c 
b/gcc/testsuite/gcc.dg/vect/vect-cond-arith-4.c
index 1af0fe6

Re: [PATCH 6/6] aarch64: Add front-end argument type checking for target builtins

2023-10-07 Thread Richard Sandiford
Richard Earnshaw  writes:
> On 03/10/2023 16:18, Victor Do Nascimento wrote:
>> In implementing the ACLE read/write system register builtins it was
>> observed that leaving argument type checking to be done at expand-time
>> meant that poorly-formed function calls were being "fixed" by certain
>> optimization passes, meaning bad code wasn't being properly picked up
>> in checking.
>> 
>> Example:
>> 
>>const char *regname = "amcgcr_el0";
>>long long a = __builtin_aarch64_rsr64 (regname);
>> 
>> is reduced by the ccp1 pass to
>> 
>>long long a = __builtin_aarch64_rsr64 ("amcgcr_el0");
>> 
>> As these functions require an argument of STRING_CST type, there needs
>> to be a check carried out by the front-end capable of picking this up.
>> 
>> The introduced `check_general_builtin_call' function will be called by
>> the TARGET_CHECK_BUILTIN_CALL hook whenever a call to a builtin
>> belonging to the AARCH64_BUILTIN_GENERAL category is encountered,
>> carrying out any appropriate checks associated with a particular
>> builtin function code.
>
> Doesn't this prevent reasonable wrapping of the __builtin... names with 
> something more palatable?  Eg:
>
> static inline __attribute__((always_inline)) long long get_sysreg_ll
> (const char *regname)
> {
>return __builtin_aarch64_rsr64 (regname);
> }
>
> ...
>long long x = get_sysreg_ll("amcgcr_el0");
> ...

I think it's a case of picking your poison.  If we didn't do this,
and only checked later, then it's unlikely that GCC and Clang would
be consistent about when a constant gets folded soon enough.

But yeah, it means that the above would need to be a macro in C.
Enlightened souls using C++ could instead do:

  template
  long long get_sysreg_ll()
  {
    return __builtin_aarch64_rsr64(regname);
  }

  ... get_sysreg_ll<"amcgcr_el0">() ...

Or at least I hope so.  Might be nice to have a test for this.

Thanks,
Richard
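
The point that the wrapper would have to become a macro in C can be sketched as follows. Writing the argument as "" regname only compiles when regname is a string literal, so the literal reaches the call unchanged, which is what the new front-end check requires. read_sysreg_stub is a hypothetical placeholder for __builtin_aarch64_rsr64 so that the sketch builds on any target.

```c
#include <string.h>

/* Hypothetical stand-in for __builtin_aarch64_rsr64, which only exists
   in AArch64 GCC; it just returns the length of the register name.  */
static long long read_sysreg_stub (const char *regname)
{
  return (long long) strlen (regname);
}

/* "" regname is only valid when regname is a string literal (adjacent
   literals are concatenated), so a const char * variable is rejected at
   compile time instead of being folded into a literal later.  */
#define get_sysreg_ll(regname) read_sysreg_stub ("" regname)
```

get_sysreg_ll ("amcgcr_el0") compiles, whereas passing a const char * variable fails at the "" regname concatenation.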


Re: [PATCH] sso-string@gnu-versioned-namespace [PR83077]

2023-10-07 Thread François Dumont

Hi

Here is a rebased version of this patch.

There are a few test failures when running 'make check-c++' but nothing new.

Still, there are 2 patches awaiting validation that fix some of them, a fix
for PR c++/111524 should address another bunch, and I fear that we will
have to live with the others.


    libstdc++: [_GLIBCXX_INLINE_VERSION] Use cxx11 abi [PR83077]

    Use the cxx11 abi when activating versioned namespace mode. To do so, support
    a new configuration mode where !_GLIBCXX_USE_DUAL_ABI and _GLIBCXX_USE_CXX11_ABI.

    The main change is that std::__cow_string is now defined whenever
    _GLIBCXX_USE_DUAL_ABI or _GLIBCXX_USE_CXX11_ABI is true. The implementation
    uses the available std::string in the dual abi case and a subset of it otherwise.

    On the other hand, std::__sso_string is defined only when _GLIBCXX_USE_DUAL_ABI
    is true and _GLIBCXX_USE_CXX11_ABI is false. This means that std::__sso_string is
    a typedef for the cow std::string implementation when dual abi is disabled and
    cow string is being used.


    libstdc++-v3/ChangeLog:

    PR libstdc++/83077
    * acinclude.m4 [GLIBCXX_ENABLE_LIBSTDCXX_DUAL_ABI]: Default to
    "new" libstdcxx abi when enable_symvers is gnu-versioned-namespace.
    * config/locale/dragonfly/monetary_members.cc
    [!_GLIBCXX_USE_DUAL_ABI]: Define money_base members.
    * config/locale/generic/monetary_members.cc
    [!_GLIBCXX_USE_DUAL_ABI]: Likewise.
    * config/locale/gnu/monetary_members.cc [!_GLIBCXX_USE_DUAL_ABI]:
    Likewise.
    * config/locale/gnu/numeric_members.cc
    [!_GLIBCXX_USE_DUAL_ABI](__narrow_multibyte_chars): Define.
    * configure: Regenerate.
    * include/bits/c++config
    [_GLIBCXX_INLINE_VERSION](_GLIBCXX_NAMESPACE_CXX11,
    _GLIBCXX_BEGIN_NAMESPACE_CXX11): Define empty.
    [_GLIBCXX_INLINE_VERSION](_GLIBCXX_END_NAMESPACE_CXX11,
    _GLIBCXX_DEFAULT_ABI_TAG): Likewise.
    * include/bits/cow_string.h [!_GLIBCXX_USE_CXX11_ABI]: Define a
    light version of COW basic_string as __std_cow_string for use in
    stdexcept.
    * include/std/stdexcept [_GLIBCXX_USE_CXX11_ABI]: Define
    __cow_string.
    (__cow_string(const char*)): New.
    (__cow_string::c_str()): New.
    * python/libstdcxx/v6/printers.py (StdStringPrinter::__init__):
    Set self.new_string to True when std::__8::basic_string type is
    found.
    * src/Makefile.am
    [ENABLE_SYMVERS_GNU_NAMESPACE](ldbl_alt128_compat_sources): Define
    empty.
    * src/Makefile.in: Regenerate.
    * src/c++11/Makefile.am (cxx11_abi_sources): Rename into...
    (dual_abi_sources): ...this. Also move cow-locale_init.cc,
    cxx11-hash_tr1.cc, cxx11-ios_failure.cc entries to...
    (sources): ...this.
    (extra_string_inst_sources): Move cow-fstream-inst.cc,
    cow-sstream-inst.cc, cow-string-inst.cc, cow-string-io-inst.cc,
    cow-wstring-inst.cc, cow-wstring-io-inst.cc, cxx11-locale-inst.cc,
    cxx11-wlocale-inst.cc entries to...
    (inst_sources): ...this.
    * src/c++11/Makefile.in: Regenerate.
    * src/c++11/cow-fstream-inst.cc [_GLIBCXX_USE_CXX11_ABI]: Skip
    definitions.
    * src/c++11/cow-locale_init.cc [_GLIBCXX_USE_CXX11_ABI]: Skip
    definitions.
    * src/c++11/cow-sstream-inst.cc [_GLIBCXX_USE_CXX11_ABI]: Skip
    definitions.
    * src/c++11/cow-stdexcept.cc [_GLIBCXX_USE_CXX11_ABI]: Include .
    [_GLIBCXX_USE_DUAL_ABI || _GLIBCXX_USE_CXX11_ABI](__cow_string):
    Redefine before including . Define
    _GLIBCXX_DEFINE_STDEXCEPT_INSTANTIATIONS so that __cow_string
    definition in  is skipped.
    [_GLIBCXX_USE_CXX11_ABI]: Skip Transactional Memory TS definitions.
    * src/c++11/string-inst.cc: Add sizeof/alignof static_assert on
    stdexcept __cow_string definition.
    (_GLIBCXX_DEFINING_CXX11_ABI_INSTANTIATIONS): Define following
    _GLIBCXX_USE_CXX11_ABI value.
    [_GLIBCXX_USE_CXX11_ABI && !_GLIBCXX_DEFINING_CXX11_ABI_INSTANTIATIONS]:
    Define _GLIBCXX_DEFINING_COW_STRING_INSTANTIATIONS. Include .
    Define basic_string as __std_cow_string for the current translation
    unit.
    * src/c++11/cow-string-inst.cc [_GLIBCXX_USE_CXX11_ABI]: Skip
    definitions.
    * src/c++11/cow-string-io-inst.cc [_GLIBCXX_USE_CXX11_ABI]: Skip
    definitions.
    * src/c++11/cow-wstring-inst.cc [_GLIBCXX_USE_CXX11_ABI]: Skip
    definitions.
    * src/c++11/cow-wstring-io-inst.cc [_GLIBCXX_USE_CXX11_ABI]: Skip
    definitions.
    * src/c++11/cxx11-hash_tr1.cc [!_GLIBCXX_USE_CXX11_ABI]: Skip
    definitions.
    * src/c++11/cxx11-ios_failure.cc [!_GLIBCXX_USE_CXX11_ABI]: Skip
    definitions.
    [!_GLIBCXX_USE_DUAL_ABI] (__ios_failure): Remove.
    * src/c++11/cxx11-l

Re: [PATCH v2][GCC] aarch64: Enable Cortex-X4 CPU

2023-10-07 Thread Richard Sandiford
Saurabh Jha  writes:
> On 10/6/2023 2:24 PM, Saurabh Jha wrote:
>> Hey,
>>
>> This patch adds support for the Cortex-X4 CPU to GCC.
>>
>> Regression tested on the aarch64-none-elf target and found no regressions.
>>
>> Okay for gcc-master? I don't have commit access so if it looks okay, 
>> could someone please help me commit this?
>>
>>
>> Thanks,
>>
>> Saurabh
>>
>>
>> gcc/ChangeLog
>>
>>   * config/aarch64/aarch64-cores.def (AARCH64_CORE): Add support for 
>> cortex-x4 core.
>>   * config/aarch64/aarch64-tune.md: Regenerated.
>>   * doc/invoke.texi: Add command-line option for cortex-x4 core.
>
> Apologies, I forgot to add the patch file on my previous email.

Thanks, pushed to trunk.

Richard


Re: [PATCH] RFC: Add late-combine pass [PR106594]

2023-10-07 Thread Richard Sandiford
Robin Dapp  writes:
> Hi Richard,
>
> cool, thanks.  I just gave it a try with my test cases and it does what
> it is supposed to do, at least if I disable the register pressure check :)
> A cursory look over the test suite showed no major regressions and just
> some overly specific tests.
>
> My test case only works before split, though, as the UNSPEC predicates will
> prevent further combination afterwards.
>
> Right now the (pre-RA) code combines every instance disregarding the actual
> pressure and just checking if the "new" value does not occupy more registers
> than the old one.
>
> - Shouldn't the "pressure" also depend on the number of available hard regs
> (i.e. an nregs = 2 is not necessarily worse than nregs = 1 if we have 32
> hard regs in the new class vs 16 in the old one)?

Right, that's what I meant by extending/tweaking the pressure heuristics
for your case.
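Robin's first point can be made concrete by comparing the fraction of a register class that each alternative occupies: 2 registers out of 32 is the same fraction as 1 out of 16. The sketch below is a hypothetical C++ illustration of such a relaxed check, not GCC's actual pressure code; the function name and parameters are invented for illustration.

```cpp
#include <cassert>

// Hypothetical heuristic: a replacement is "no worse" for register pressure
// if it uses no larger a fraction of its register class than the original,
// i.e. new_nregs / new_class_size <= old_nregs / old_class_size.
// Cross-multiplying keeps the comparison in integer arithmetic.
bool pressure_no_worse (unsigned old_nregs, unsigned old_class_size,
                        unsigned new_nregs, unsigned new_class_size)
{
  assert (old_class_size > 0 && new_class_size > 0);
  return static_cast<unsigned long long> (new_nregs) * old_class_size
         <= static_cast<unsigned long long> (old_nregs) * new_class_size;
}
```

A real heuristic would also need to look at how many registers are already live at the point of use, which this ratio test deliberately ignores.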

> - I assume/hope you expected my (now obsolete) fwprop change could be re-used?

Yeah, I was hoping you'd be able to apply similar heuristics to the new pass.
(I didn't find time to look at the old heuristics in detail, though, sorry.)

I suppose the point of comparison would then be "new pass with current
heuristics" vs. "new pass with relaxed heuristics".

It'd be a good/interesting test of the new heuristics to apply them
without any constraint on the complexity of the SET_SRC.

> Otherwise we wouldn't want to unconditionally "propagate" into a loop for 
> example?
> For my test case the combination of the vec_duplicate into all insns leads
> to "high" register pressure that we could avoid.
>
> How should we continue here?  I suppose you'll first want to get this version
> to the trunk before complicating it further.

Yeah, that'd probably be best.  I need to split the patch up into a
proper submission sequence, do more testing, and make it RFA quality.
Jeff has also found a couple of regressions that I need to look at.

But the substance probably won't change much, so I don't think you'd
be wasting your time if you developed the heuristics based on the
current version.  I'd be happy to review them on that basis too
(though time is short at the moment).

Thanks,
Richard


Re: [PATCH] TEST: Fix XPASS of TSVC testsuites for RVV

2023-10-07 Thread Jeff Law




On 10/7/23 03:23, Juzhe-Zhong wrote:

Fix the following XPASS FAILs of TSVC for RVV:

XPASS: gcc.dg/vect/tsvc/vect-tsvc-s1115.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s1115.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s114.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s114.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s1161.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s1161.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s1232.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s1232.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s124.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s124.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s1279.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s1279.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s161.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s161.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s253.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s253.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s257.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s257.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s271.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s271.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s2711.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s2711.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s2712.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s2712.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s272.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s272.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s273.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s273.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s274.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s274.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s276.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s276.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s278.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s278.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s279.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s279.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s3111.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s3111.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s353.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s353.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s441.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s441.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s443.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s443.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-vif.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-vif.c scan-tree-dump vect "vectorized 1 loops"

gcc/testsuite/ChangeLog:

* gcc.dg/vect/tsvc/vect-tsvc-s1115.c: Fix TSVC XPASS.
* gcc.dg/vect/tsvc/vect-tsvc-s114.c: Ditto.
* gcc.dg/vect/tsvc/vect-tsvc-s1161.c: Ditto.
* gcc.dg/vect/tsvc/vect-tsvc-s1232.c: Ditto.
* gcc.dg/vect/tsvc/vect-tsvc-s124.c: Ditto.
* gcc.dg/vect/tsvc/vect-tsvc-s1279.c: Ditto.
* gcc.dg/vect/tsvc/vect-tsvc-s161.c: Ditto.
* gcc.dg/vect/tsvc/vect-tsvc-s253.c: Ditto.
* gcc.dg/vect/tsvc/vect-tsvc-s257.c: Di

Re: [PATCH] RISC-V: Enable more tests of "vect" for RVV

2023-10-07 Thread Jeff Law




On 10/7/23 01:04, Juzhe-Zhong wrote:

This patch enables almost full coverage of the vectorization tests for RVV,
except for the following tests (not enabled yet):

1. Will enable soon:

check_effective_target_vect_call_lrint
check_effective_target_vect_call_btrunc
check_effective_target_vect_call_btruncf
check_effective_target_vect_call_ceil
check_effective_target_vect_call_ceilf
check_effective_target_vect_call_floor
check_effective_target_vect_call_floorf
check_effective_target_vect_call_lceil
check_effective_target_vect_call_lfloor
check_effective_target_vect_call_nearbyint
check_effective_target_vect_call_nearbyintf
check_effective_target_vect_call_round
check_effective_target_vect_call_roundf

2. Not sure whether we need to enable these or not:

check_effective_target_vect_complex_*
check_effective_target_vect_simd_clones
check_effective_target_vect_bswap
check_effective_target_vect_widen_shift
check_effective_target_vect_widen_mult_*
check_effective_target_vect_widen_sum_*
check_effective_target_vect_unpack
check_effective_target_vect_interleave
check_effective_target_vect_extract_even_odd
check_effective_target_vect_pack_trunc
check_effective_target_vect_check_ptrs
check_effective_target_vect_sdiv_pow2_si
check_effective_target_vect_usad_*
check_effective_target_vect_udot_*
check_effective_target_vect_sdot_*
check_effective_target_vect_gather_load_ifn

After this patch, we will have the following additional FAILs:
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s1115.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s1115.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s114.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s114.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s1161.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s1161.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s1232.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s1232.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s124.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s124.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s1279.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s1279.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s161.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s161.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s253.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s253.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s257.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s257.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s271.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s271.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s2711.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s2711.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s2712.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s2712.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s272.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s272.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s273.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s273.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s274.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s274.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s276.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s276.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s278.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s278.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s279.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s279.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s3111.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS

Re: [PATCH] TEST: Fix vect_cond_arith_* dump checks for RVV

2023-10-07 Thread Jeff Law




On 10/7/23 05:45, Juzhe-Zhong wrote:

This patch fixes the following dump FAILs:
FAIL: gcc.dg/vect/vect-cond-arith-2.c -flto -ffat-lto-objects  scan-tree-dump optimized " = \\.COND_SUB"
FAIL: gcc.dg/vect/vect-cond-arith-2.c -flto -ffat-lto-objects  scan-tree-dump vect " = \\.COND_ADD"
FAIL: gcc.dg/vect/vect-cond-arith-2.c scan-tree-dump optimized " = \\.COND_SUB"
FAIL: gcc.dg/vect/vect-cond-arith-2.c scan-tree-dump vect " = \\.COND_ADD"
FAIL: gcc.dg/vect/vect-cond-arith-4.c -flto -ffat-lto-objects  scan-tree-dump optimized " = \\.COND_ADD"
FAIL: gcc.dg/vect/vect-cond-arith-4.c -flto -ffat-lto-objects  scan-tree-dump optimized " = \\.COND_MUL"
FAIL: gcc.dg/vect/vect-cond-arith-4.c -flto -ffat-lto-objects  scan-tree-dump optimized " = \\.COND_RDIV"
FAIL: gcc.dg/vect/vect-cond-arith-4.c -flto -ffat-lto-objects  scan-tree-dump optimized " = \\.COND_SUB"
FAIL: gcc.dg/vect/vect-cond-arith-4.c scan-tree-dump optimized " = \\.COND_ADD"
FAIL: gcc.dg/vect/vect-cond-arith-4.c scan-tree-dump optimized " = \\.COND_MUL"
FAIL: gcc.dg/vect/vect-cond-arith-4.c scan-tree-dump optimized " = \\.COND_RDIV"
FAIL: gcc.dg/vect/vect-cond-arith-4.c scan-tree-dump optimized " = \\.COND_SUB"
FAIL: gcc.dg/vect/vect-cond-arith-5.c -flto -ffat-lto-objects  scan-tree-dump optimized " = \\.COND_ADD"
FAIL: gcc.dg/vect/vect-cond-arith-5.c -flto -ffat-lto-objects  scan-tree-dump optimized " = \\.COND_MUL"
FAIL: gcc.dg/vect/vect-cond-arith-5.c -flto -ffat-lto-objects  scan-tree-dump optimized " = \\.COND_RDIV"
FAIL: gcc.dg/vect/vect-cond-arith-5.c -flto -ffat-lto-objects  scan-tree-dump optimized " = \\.COND_SUB"
FAIL: gcc.dg/vect/vect-cond-arith-5.c scan-tree-dump optimized " = \\.COND_ADD"
FAIL: gcc.dg/vect/vect-cond-arith-5.c scan-tree-dump optimized " = \\.COND_MUL"
FAIL: gcc.dg/vect/vect-cond-arith-5.c scan-tree-dump optimized " = \\.COND_RDIV"
FAIL: gcc.dg/vect/vect-cond-arith-5.c scan-tree-dump optimized " = \\.COND_SUB"
FAIL: gcc.dg/vect/vect-cond-arith-6.c -flto -ffat-lto-objects  scan-tree-dump-times optimized " = \\.COND_ADD" 1
FAIL: gcc.dg/vect/vect-cond-arith-6.c -flto -ffat-lto-objects  scan-tree-dump-times optimized " = \\.COND_MUL" 1
FAIL: gcc.dg/vect/vect-cond-arith-6.c -flto -ffat-lto-objects  scan-tree-dump-times optimized " = \\.COND_RDIV" 1
FAIL: gcc.dg/vect/vect-cond-arith-6.c -flto -ffat-lto-objects  scan-tree-dump-times optimized " = \\.COND_SUB" 1
FAIL: gcc.dg/vect/vect-cond-arith-6.c scan-tree-dump-times optimized " = \\.COND_ADD" 1
FAIL: gcc.dg/vect/vect-cond-arith-6.c scan-tree-dump-times optimized " = \\.COND_MUL" 1
FAIL: gcc.dg/vect/vect-cond-arith-6.c scan-tree-dump-times optimized " = \\.COND_RDIV" 1
FAIL: gcc.dg/vect/vect-cond-arith-6.c scan-tree-dump-times optimized " = \\.COND_SUB" 1

For RVV, the expected dump IR is the COND_LEN_* pattern.

Also, we are still failing at this check:

FAIL: gcc.dg/vect/vect-cond-arith-2.c scan-tree-dump optimized " = \\.COND_LEN_SUB"
FAIL: gcc.dg/vect/vect-cond-arith-2.c -flto -ffat-lto-objects  scan-tree-dump optimized " = \\.COND_LEN_SUB"

This is due to a known bug in GIMPLE_FOLD that Robin is working on.

@Robin: Please make sure vect-cond-arith-2.c passes with both this patch
and your bug-fix patch applied.

Ok for trunk ?

gcc/testsuite/ChangeLog:

* gcc.dg/vect/vect-cond-arith-2.c: Fix dump check for RVV.
* gcc.dg/vect/vect-cond-arith-4.c: Ditto.
* gcc.dg/vect/vect-cond-arith-5.c: Ditto.
* gcc.dg/vect/vect-cond-arith-6.c: Ditto.
Would it make more sense to adjust the regexp so that it matched the 
standard form as well as the LEN form?  So for example we could have a 
regexp that matched COND_ADD and COND_LEN_ADD.


Just wondering if that'll be better from a long term maintenance standpoint.
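Jeff's suggestion could look like this: an optional `LEN_` group lets a single pattern accept both forms, e.g. ` = \\.COND_(LEN_)?ADD` in the quoting style of the dump checks above. The C++ sketch below checks that pattern against sample dump lines using `std::regex`. DejaGnu itself uses Tcl regular expressions, which should treat this simple pattern the same way, but that is an assumption worth confirming in the testsuite; the sample GIMPLE lines are made up for illustration.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Returns true if a dump line contains an assignment from either .COND_ADD
// or .COND_LEN_ADD.  The optional "(LEN_)?" group is what lets one pattern
// accept both the standard and the length-controlled form.
bool matches_cond_add (const std::string &dump_line)
{
  static const std::regex pattern (R"( = \.COND_(LEN_)?ADD)");
  return std::regex_search (dump_line, pattern);
}
```

The same `(LEN_)?` group would extend to COND_SUB, COND_MUL and COND_RDIV, so each check keeps a single directive instead of a per-target variant.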

Jeff


[Patch] Fortran/OpenMP: Fix handling of strictly structured blocks

2023-10-07 Thread Tobias Burnus

Strictly structured blocks are '!$omp ' directly
followed by 'BLOCK ... END BLOCK', i.e. a Fortran block construct.

I ran into this issue because 'integer :: n; n = 5; !$omp ...;
block; integer :: A(n)' was not accepted.

Well, it turned out that was because the BLOCK handling was not quite right.

In an unrelated patch, I got an ICE for an empty labelled BLOCK - but
only without -fopenmp. I was not quite sure whether we had a testcase for
it - my 'grep' attempt did not find one, although we use BLOCK plenty.
Hence, I added another BLOCK testcase.

Comments, remarks, suggestions?

If not, I will later commit it.

Tobias
Fortran/OpenMP: Fix handling of strictly structured blocks

For strictly structured blocks, a BLOCK was created but the code
was placed after the BLOCK, in the outer structured block. Additionally,
labelled blocks were mishandled. As the code is now properly placed
inside the BLOCK, this also solves additional issues.

gcc/fortran/ChangeLog:

	* parse.cc (parse_omp_structured_block): Make the user code end
	up inside of BLOCK construct for strictly structured blocks;
	fix fallout for 'section' and 'teams'.
	* openmp.cc (resolve_omp_target): Fix changed BLOCK handling
	for teams in target checking.

libgomp/ChangeLog:

	* testsuite/libgomp.fortran/strictly-structured-block-1.f90: New test.

gcc/testsuite/ChangeLog:

	* gfortran.dg/block_17.f90: New test.
	* gfortran.dg/gomp/strictly-structured-block-5.f90: New test.

 gcc/fortran/openmp.cc  |  2 +
 gcc/fortran/parse.cc   | 22 +--
 gcc/testsuite/gfortran.dg/block_17.f90 |  9 +++
 .../gomp/strictly-structured-block-5.f90   | 77 ++
 .../strictly-structured-block-1.f90| 22 +++
 5 files changed, 127 insertions(+), 5 deletions(-)

diff --git a/gcc/fortran/openmp.cc b/gcc/fortran/openmp.cc
index dc0c8013c3d..79b5ae0e4bd 100644
--- a/gcc/fortran/openmp.cc
+++ b/gcc/fortran/openmp.cc
@@ -11245,6 +11245,8 @@ resolve_omp_target (gfc_code *code)
   if (!code->ext.omp_clauses->contains_teams_construct)
     return;
   gfc_code *c = code->block->next;
+  if (c->op == EXEC_BLOCK)
+    c = c->ext.block.ns->code;
   if (code->ext.omp_clauses->target_first_st_is_teams
   && ((GFC_IS_TEAMS_CONSTRUCT (c->op) && c->next == NULL)
 	  || (c->op == EXEC_BLOCK
diff --git a/gcc/fortran/parse.cc b/gcc/fortran/parse.cc
index 58386805ffe..444baf42cbd 100644
--- a/gcc/fortran/parse.cc
+++ b/gcc/fortran/parse.cc
@@ -5814,7 +5814,7 @@ parse_omp_structured_block (gfc_statement omp_st, bool workshare_stmts_only)
 {
   gfc_statement st, omp_end_st, first_st;
   gfc_code *cp, *np;
-  gfc_state_data s;
+  gfc_state_data s, s2;
 
   accept_statement (omp_st);
 
@@ -5915,13 +5915,21 @@ parse_omp_structured_block (gfc_statement omp_st, bool workshare_stmts_only)
   gfc_notify_std (GFC_STD_F2008, "BLOCK construct at %C");
 
   my_ns = gfc_build_block_ns (gfc_current_ns);
-  gfc_current_ns = my_ns;
-  my_parent = my_ns->parent;
-
   new_st.op = EXEC_BLOCK;
   new_st.ext.block.ns = my_ns;
   new_st.ext.block.assoc = NULL;
   accept_statement (ST_BLOCK);
+
+  push_state (&s2, COMP_BLOCK, my_ns->proc_name);
+  gfc_current_ns = my_ns;
+  my_parent = my_ns->parent;
+  if (omp_st == ST_OMP_SECTIONS
+	  || omp_st == ST_OMP_PARALLEL_SECTIONS)
+	{
+	  np = new_level (cp);
+	  np->op = cp->op;
+	}
+
   first_st = next_statement ();
   st = parse_spec (first_st);
 }
@@ -5937,6 +5945,8 @@ parse_omp_structured_block (gfc_statement omp_st, bool workshare_stmts_only)
   case ST_OMP_TEAMS_LOOP:
 	{
 	  gfc_state_data *stk = gfc_state_stack->previous;
+	  if (stk->state == COMP_OMP_STRICTLY_STRUCTURED_BLOCK)
+	    stk = stk->previous;
 	  stk->tail->ext.omp_clauses->target_first_st_is_teams = true;
 	  break;
 	}
@@ -6035,8 +6045,10 @@ parse_omp_structured_block (gfc_statement omp_st, bool workshare_stmts_only)
   else if (block_construct && st == ST_END_BLOCK)
 	{
 	  accept_statement (st);
+	  gfc_current_ns->code = gfc_state_stack->head;
 	  gfc_current_ns = my_parent;
-	  pop_state ();
+	  pop_state ();  /* Inner BLOCK */
+	  pop_state ();  /* Outer COMP_OMP_STRICTLY_STRUCTURED_BLOCK */
 
 	  st = next_statement ();
 	  if (st == omp_end_st)
diff --git a/gcc/testsuite/gfortran.dg/block_17.f90 b/gcc/testsuite/gfortran.dg/block_17.f90
new file mode 100644
index 000..6ab3106ebd0
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/block_17.f90
@@ -0,0 +1,9 @@
+subroutine foo()
+  block
+  end block
+end
+
+subroutine bar()
+  my_name: block
+  end block my_name
+end
diff --git a/gcc/testsuite/gfortran.dg/gomp/strictly-structured-block-5.f90 b/gcc/testsuite/gfortran.dg/gomp/

Re: [PATCH] RISC-V: add static-pie support

2023-10-07 Thread Jeff Law




On 10/7/23 05:32, yanzhang.w...@intel.com wrote:

From: Yanzhang Wang 

We only need to pass options to the linker when static-pie is passed.
There's another patch to enable static-pie in glibc, but it needs to be
enabled in GCC first.

gcc/ChangeLog:

* config/riscv/linux.h: Pass the static-pie specific options to
  the linker.

OK.
jeff


Re: [PATCH] Support g++ 4.8 as a host compiler.

2023-10-07 Thread Jeff Law




On 10/4/23 16:19, Roger Sayle wrote:


The recent patch to remove poly_int_pod triggers a bug in g++ 4.8.5's
C++11 support, which mistakenly believes poly_uint16 has a non-trivial
constructor.  This in turn prohibits it from being used as a member in
a union (rtxunion) that is constructed statically, resulting in a (fatal)
error during stage 1.  A workaround is to add an explicit constructor
to the problematic union, which allows mainline to be bootstrapped with
the system compiler on older Red Hat 7 systems.
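A minimal C++ sketch of this kind of workaround, under stated assumptions: `nontrivial_member` and `payload` are hypothetical stand-ins for poly_uint16 and rtx_def::u, and here the member really does have a non-trivial default constructor, which is the situation g++ 4.8.5 wrongly believes it is in. The actual patch may differ in detail.

```cpp
#include <cassert>

// Hypothetical member type with a user-provided (hence non-trivial)
// default constructor, plus a constexpr converting constructor so that it
// remains a literal type.
struct nontrivial_member
{
  unsigned short value;
  nontrivial_member () : value (0) {}
  constexpr nontrivial_member (unsigned short v) : value (v) {}
};

// In C++11, a union with a variant member that has a non-trivial default
// constructor gets a deleted implicit default constructor, so it cannot be
// initialized without a constructor of its own.  An explicit constexpr
// constructor that initializes exactly one member restores static
// initialization.
union payload
{
  int ival;
  nontrivial_member mval;

  constexpr payload (int v) : ival (v) {}
};

// Constant (static) initialization through the explicit constructor.
constexpr payload static_payload (42);
```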

This patch has been tested on x86_64-pc-linux-gnu where it allows a
bootstrap to complete when using g++ 4.8.5 as the host compiler.
Ok for mainline?


2023-10-04  Roger Sayle  

gcc/ChangeLog
* rtl.h (rtx_def::u): Add explicit constructor to work around an
issue when using g++ 4.8 as a host compiler.
I think the bigger question is whether or not we're going to step 
forward on the minimum build requirements.


My recollection was we settled on gcc-4.8 for the benefit of RHEL 7 and
CentOS 7, which are rapidly approaching EOL (June 2024).


I would certainly support stepping forward to a more modern compiler for 
the build requirements, which might make this patch obsolete.


Jeff


[PATCH v2] rs6000: Add new pass for replacement of contiguous addresses vector load lxv with lxvp

2023-10-07 Thread Ajit Agarwal
Hello All:

This patch adds a new pass to replace vector loads (lxv) from contiguous
addresses with the mma instruction lxvp. This patch addresses one regression
failure in ARM architecture.

Bootstrapped and regtested with powerpc64-linux-gnu.

Thanks & Regards
Ajit


rs6000: Add new pass for replacement of contiguous lxv with lxvp.

New pass to replace lxv loads from contiguous addresses with lxvp. This
pass is registered after the ree RTL pass.
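The core test such a pass needs can be sketched in C++ on a deliberately simplified model (not GCC's RTL): two 16-byte loads off the same base register whose displacements differ by 16 are candidates for one 32-byte paired load. The even/odd destination-register rule and the 16-byte-aligned displacement are assumptions standing in for lxvp's operand constraints, and the sketch ignores endianness, which in reality decides which register of the pair receives which half. All names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical, simplified model of a 16-byte vector load:
// dest_reg = mem[base_reg + offset].
struct vec_load
{
  int base_reg;   // register holding the base address
  long offset;    // byte displacement
  int dest_reg;   // destination vector register number
};

// Return index pairs (i, j) of loads that could be merged into one paired
// load: same base, load j reads the 16 bytes right after load i, the first
// displacement is 16-byte aligned, and the destinations form an even/odd
// register pair.  Each load is used in at most one pair.
std::vector<std::pair<std::size_t, std::size_t>>
find_pair_candidates (const std::vector<vec_load> &loads)
{
  std::vector<std::pair<std::size_t, std::size_t>> pairs;
  std::vector<bool> used (loads.size (), false);
  for (std::size_t i = 0; i < loads.size (); ++i)
    for (std::size_t j = 0; j < loads.size (); ++j)
      {
        if (i == j || used[i] || used[j])
          continue;
        const vec_load &a = loads[i];
        const vec_load &b = loads[j];
        if (a.base_reg == b.base_reg
            && b.offset == a.offset + 16
            && a.offset % 16 == 0
            && a.dest_reg % 2 == 0
            && b.dest_reg == a.dest_reg + 1)
          {
            pairs.emplace_back (i, j);
            used[i] = true;
            used[j] = true;
          }
      }
  return pairs;
}
```

A real pass also has to prove that nothing between the two loads clobbers the base register or stores to the loaded memory, which this model leaves out.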

2023-10-07  Ajit Kumar Agarwal  

gcc/ChangeLog:

* config/rs6000/rs6000-passes.def: Register vecload pass.
* config/rs6000/rs6000-vecload-opt.cc: Add new pass.
* config.gcc: Add new object file.
* config/rs6000/rs6000-protos.h: Add new prototype for vecload
pass.
* config/rs6000/rs6000.cc: Add new prototype for vecload pass.
* config/rs6000/t-rs6000: Add new rule.

gcc/testsuite/ChangeLog:

* g++.target/powerpc/vecload.C: New test.
---
 gcc/config.gcc |   4 +-
 gcc/config/rs6000/rs6000-passes.def|   1 +
 gcc/config/rs6000/rs6000-protos.h  |   2 +
 gcc/config/rs6000/rs6000-vecload-opt.cc| 234 +
 gcc/config/rs6000/rs6000.cc|   3 +-
 gcc/config/rs6000/t-rs6000 |   4 +
 gcc/testsuite/g++.target/powerpc/vecload.C |  15 ++
 7 files changed, 260 insertions(+), 3 deletions(-)
 create mode 100644 gcc/config/rs6000/rs6000-vecload-opt.cc
 create mode 100644 gcc/testsuite/g++.target/powerpc/vecload.C

diff --git a/gcc/config.gcc b/gcc/config.gcc
index ee46d96bf62..482ab094b89 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -515,7 +515,7 @@ or1k*-*-*)
;;
 powerpc*-*-*)
cpu_type=rs6000
-   extra_objs="rs6000-string.o rs6000-p8swap.o rs6000-logue.o"
+   extra_objs="rs6000-string.o rs6000-p8swap.o rs6000-logue.o rs6000-vecload-opt.o"
extra_objs="${extra_objs} rs6000-call.o rs6000-pcrel-opt.o"
extra_objs="${extra_objs} rs6000-builtins.o rs6000-builtin.o"
extra_headers="ppc-asm.h altivec.h htmintrin.h htmxlintrin.h"
@@ -552,7 +552,7 @@ riscv*)
;;
 rs6000*-*-*)
extra_options="${extra_options} g.opt fused-madd.opt rs6000/rs6000-tables.opt"
-   extra_objs="rs6000-string.o rs6000-p8swap.o rs6000-logue.o"
+   extra_objs="rs6000-string.o rs6000-p8swap.o rs6000-logue.o rs6000-vecload-opt.o"
extra_objs="${extra_objs} rs6000-call.o rs6000-pcrel-opt.o"
target_gtfiles="$target_gtfiles \$(srcdir)/config/rs6000/rs6000-logue.cc \$(srcdir)/config/rs6000/rs6000-call.cc"
target_gtfiles="$target_gtfiles \$(srcdir)/config/rs6000/rs6000-pcrel-opt.cc"
diff --git a/gcc/config/rs6000/rs6000-passes.def b/gcc/config/rs6000/rs6000-passes.def
index ca899d5f7af..9ecf8ce6a9c 100644
--- a/gcc/config/rs6000/rs6000-passes.def
+++ b/gcc/config/rs6000/rs6000-passes.def
@@ -28,6 +28,7 @@ along with GCC; see the file COPYING3.  If not see
  The power8 does not have instructions that automaticaly do the byte swaps
  for loads and stores.  */
   INSERT_PASS_BEFORE (pass_cse, 1, pass_analyze_swaps);
+  INSERT_PASS_AFTER (pass_ree, 1, pass_analyze_vecload);
 
   /* Pass to do the PCREL_OPT optimization that combines the load of an
  external symbol's address along with a single load or store using that
diff --git a/gcc/config/rs6000/rs6000-protos.h b/gcc/config/rs6000/rs6000-protos.h
index f70118ea40f..9c44bae33d3 100644
--- a/gcc/config/rs6000/rs6000-protos.h
+++ b/gcc/config/rs6000/rs6000-protos.h
@@ -91,6 +91,7 @@ extern int mems_ok_for_quad_peep (rtx, rtx);
 extern bool gpr_or_gpr_p (rtx, rtx);
 extern bool direct_move_p (rtx, rtx);
 extern bool quad_address_p (rtx, machine_mode, bool);
+extern bool mode_supports_dq_form (machine_mode);
 extern bool quad_load_store_p (rtx, rtx);
 extern bool fusion_gpr_load_p (rtx, rtx, rtx, rtx);
 extern void expand_fusion_gpr_load (rtx *);
@@ -344,6 +345,7 @@ class rtl_opt_pass;
 
 extern rtl_opt_pass *make_pass_analyze_swaps (gcc::context *);
 extern rtl_opt_pass *make_pass_pcrel_opt (gcc::context *);
+extern rtl_opt_pass *make_pass_analyze_vecload (gcc::context *);
 extern bool rs6000_sum_of_two_registers_p (const_rtx expr);
 extern bool rs6000_quadword_masked_address_p (const_rtx exp);
 extern rtx rs6000_gen_lvx (enum machine_mode, rtx, rtx);
diff --git a/gcc/config/rs6000/rs6000-vecload-opt.cc b/gcc/config/rs6000/rs6000-vecload-opt.cc
new file mode 100644
index 000..63ee733af89
--- /dev/null
+++ b/gcc/config/rs6000/rs6000-vecload-opt.cc
@@ -0,0 +1,234 @@
+/* Subroutines used to replace lxv with lxvp
+   for p10 little-endian VSX code.
+   Copyright (C) 2020-2023 Free Software Foundation, Inc.
+   Contributed by Ajit Kumar Agarwal .
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published
+   by the Free Software Foundation; either version 3, or (at your
+   option) any later version.
+
+   GCC is distribut

Re: [PATCH] sso-string@gnu-versioned-namespace [PR83077]

2023-10-07 Thread François Dumont
I've been told that the previous patch, generated with 'git diff -b', did not
apply properly, so here is the same patch again generated with a plain 'git diff'.



On 07/10/2023 14:25, François Dumont wrote:

Hi

Here is a rebased version of this patch.

There are a few test failures when running 'make check-c++', but nothing
new.


Still, there are 2 patches awaiting validation that fix some of them, PR
c++/111524 fixes another bunch, and I fear that we will have to live
with the others.


    libstdc++: [_GLIBCXX_INLINE_VERSION] Use cxx11 abi [PR83077]

    Use the cxx11 ABI when activating versioned namespace mode. To do so,
    support a new configuration mode where !_GLIBCXX_USE_DUAL_ABI and
    _GLIBCXX_USE_CXX11_ABI.

    The main change is that std::__cow_string is now defined whenever
    _GLIBCXX_USE_DUAL_ABI or _GLIBCXX_USE_CXX11_ABI is true. The
    implementation uses the available std::string in the dual ABI case
    and a subset of it when it is not.

    On the other hand, std::__sso_string is defined only when
    _GLIBCXX_USE_DUAL_ABI is true and _GLIBCXX_USE_CXX11_ABI is false.
    This means that std::__sso_string is a typedef for the COW
    std::string implementation when the dual ABI is disabled and the COW
    string is being used.


    libstdc++-v3/ChangeLog:

    PR libstdc++/83077
    * acinclude.m4 [GLIBCXX_ENABLE_LIBSTDCXX_DUAL_ABI]: Default to "new"
    libstdcxx abi when enable_symvers is gnu-versioned-namespace.
    * config/locale/dragonfly/monetary_members.cc [!_GLIBCXX_USE_DUAL_ABI]:
    Define money_base members.
    * config/locale/generic/monetary_members.cc [!_GLIBCXX_USE_DUAL_ABI]:
    Likewise.
    * config/locale/gnu/monetary_members.cc [!_GLIBCXX_USE_DUAL_ABI]:
    Likewise.
    * config/locale/gnu/numeric_members.cc
    [!_GLIBCXX_USE_DUAL_ABI](__narrow_multibyte_chars): Define.
    * configure: Regenerate.
    * include/bits/c++config
    [_GLIBCXX_INLINE_VERSION](_GLIBCXX_NAMESPACE_CXX11,
    _GLIBCXX_BEGIN_NAMESPACE_CXX11): Define empty.
    [_GLIBCXX_INLINE_VERSION](_GLIBCXX_END_NAMESPACE_CXX11,
    _GLIBCXX_DEFAULT_ABI_TAG): Likewise.
    * include/bits/cow_string.h [!_GLIBCXX_USE_CXX11_ABI]: Define a light
    version of COW basic_string as __std_cow_string for use in stdexcept.
    * include/std/stdexcept [_GLIBCXX_USE_CXX11_ABI]: Define __cow_string.
    (__cow_string(const char*)): New.
    (__cow_string::c_str()): New.
    * python/libstdcxx/v6/printers.py (StdStringPrinter::__init__): Set
    self.new_string to True when std::__8::basic_string type is found.
    * src/Makefile.am
    [ENABLE_SYMVERS_GNU_NAMESPACE](ldbl_alt128_compat_sources): Define
    empty.
    * src/Makefile.in: Regenerate.
    * src/c++11/Makefile.am (cxx11_abi_sources): Rename into...
    (dual_abi_sources): ...this. Also move cow-locale_init.cc,
    cxx11-hash_tr1.cc, cxx11-ios_failure.cc entries to...
    (sources): ...this.
    (extra_string_inst_sources): Move cow-fstream-inst.cc,
    cow-sstream-inst.cc, cow-string-inst.cc, cow-string-io-inst.cc,
    cow-wstring-inst.cc, cow-wstring-io-inst.cc, cxx11-locale-inst.cc,
    cxx11-wlocale-inst.cc entries to...
    (inst_sources): ...this.
    * src/c++11/Makefile.in: Regenerate.
    * src/c++11/cow-fstream-inst.cc [_GLIBCXX_USE_CXX11_ABI]: Skip
    definitions.
    * src/c++11/cow-locale_init.cc [_GLIBCXX_USE_CXX11_ABI]: Skip
    definitions.
    * src/c++11/cow-sstream-inst.cc [_GLIBCXX_USE_CXX11_ABI]: Skip
    definitions.
    * src/c++11/cow-stdexcept.cc [_GLIBCXX_USE_CXX11_ABI]: Include .
    [_GLIBCXX_USE_DUAL_ABI || _GLIBCXX_USE_CXX11_ABI](__cow_string):
    Redefine before including . Define
    _GLIBCXX_DEFINE_STDEXCEPT_INSTANTIATIONS so that the __cow_string
    definition in  is skipped.
    [_GLIBCXX_USE_CXX11_ABI]: Skip Transactional Memory TS definitions.
    * src/c++11/string-inst.cc: Add sizeof/alignof static_assert on
    stdexcept __cow_string definition.
    (_GLIBCXX_DEFINING_CXX11_ABI_INSTANTIATIONS): Define following
    _GLIBCXX_USE_CXX11_ABI value.
    [_GLIBCXX_USE_CXX11_ABI && !_GLIBCXX_DEFINING_CXX11_ABI_INSTANTIATIONS]:
    Define _GLIBCXX_DEFINING_COW_STRING_INSTANTIATIONS. Include .
    Define basic_string as __std_cow_string for the current translation
    unit.
    * src/c++11/cow-string-inst.cc [_GLIBCXX_USE_CXX11_ABI]: Skip
    definitions.
    * src/c++11/cow-string-io-inst.cc [_GLIBCXX_USE_CXX11_ABI]: Skip
    definitions.
    * src/c++11/cow-wstring-inst.cc [_GLIBCXX_USE_CXX11_ABI]: Skip
    definitions.
    * src/c++11/cow-wstring-io-inst.cc [_GLIBCXX_USE_CXX11_ABI]: Skip
    definitions.
    * src/c++11/cxx11-hash_tr1.cc [!_GLIBCXX_USE_CXX11_ABI]: Skip

Re: [PATCH] Support g++ 4.8 as a host compiler.

2023-10-07 Thread Sam James


Jeff Law  writes:

> On 10/4/23 16:19, Roger Sayle wrote:
>> The recent patch to remove poly_int_pod triggers a bug in g++ 4.8.5's
>> C++11 support, which mistakenly believes poly_uint16 has a non-trivial
>> constructor.  This in turn prohibits it from being used as a member in
>> a union (rtxunion) that is constructed statically, resulting in a (fatal)
>> error during stage 1.  A workaround is to add an explicit constructor
>> to the problematic union, which allows mainline to be bootstrapped with
>> the system compiler on older Red Hat 7 systems.
>> This patch has been tested on x86_64-pc-linux-gnu, where it allows a
>> bootstrap to complete when using g++ 4.8.5 as the host compiler.
>> OK for mainline?
>> 2023-10-04  Roger Sayle
>> gcc/ChangeLog
>>  * rtl.h (rtx_def::u): Add explicit constructor to work around an
>>  issue using g++ 4.8 as a host compiler.
> I think the bigger question is whether or not we're going to step
> forward on the minimum build requirements.
>
> My recollection was we settled on gcc-4.8 for the benefit of RHEL 7
> and CentOS 7, which are rapidly approaching EOL (June 2024).
>
> I would certainly support stepping forward to a more modern compiler
> for the build requirements, which might make this patch obsolete.

See also richi and jakub's comments at 
https://inbox.sourceware.org/gcc-patches/mpt5y3ppio0@arm.com/T/#m985295bedaadb47aa0b9ba63b7cb69a660a108bb.

>
> Jeff



Re: Re: [PATCH] TEST: Fix vect_cond_arith_* dump checks for RVV

2023-10-07 Thread 钟居哲
Do you mean changing it like this?

/* { dg-final { scan-tree-dump-times { = \.COND_L?E?N?_?RDIV} 1 "optimized" { target vect_double_cond_arith } } } */



juzhe.zh...@rivai.ai
 
From: Jeff Law
Date: 2023-10-07 23:09
To: Juzhe-Zhong; gcc-patches
CC: rguenther; rdapp.gcc
Subject: Re: [PATCH] TEST: Fix vect_cond_arith_* dump checks for RVV
 
 
On 10/7/23 05:45, Juzhe-Zhong wrote:
> This patch fixes the following dump FAILs:
> FAIL: gcc.dg/vect/vect-cond-arith-2.c -flto -ffat-lto-objects  scan-tree-dump optimized " = \\.COND_SUB"
> FAIL: gcc.dg/vect/vect-cond-arith-2.c -flto -ffat-lto-objects  scan-tree-dump vect " = \\.COND_ADD"
> FAIL: gcc.dg/vect/vect-cond-arith-2.c scan-tree-dump optimized " = \\.COND_SUB"
> FAIL: gcc.dg/vect/vect-cond-arith-2.c scan-tree-dump vect " = \\.COND_ADD"
> FAIL: gcc.dg/vect/vect-cond-arith-4.c -flto -ffat-lto-objects  scan-tree-dump optimized " = \\.COND_ADD"
> FAIL: gcc.dg/vect/vect-cond-arith-4.c -flto -ffat-lto-objects  scan-tree-dump optimized " = \\.COND_MUL"
> FAIL: gcc.dg/vect/vect-cond-arith-4.c -flto -ffat-lto-objects  scan-tree-dump optimized " = \\.COND_RDIV"
> FAIL: gcc.dg/vect/vect-cond-arith-4.c -flto -ffat-lto-objects  scan-tree-dump optimized " = \\.COND_SUB"
> FAIL: gcc.dg/vect/vect-cond-arith-4.c scan-tree-dump optimized " = \\.COND_ADD"
> FAIL: gcc.dg/vect/vect-cond-arith-4.c scan-tree-dump optimized " = \\.COND_MUL"
> FAIL: gcc.dg/vect/vect-cond-arith-4.c scan-tree-dump optimized " = \\.COND_RDIV"
> FAIL: gcc.dg/vect/vect-cond-arith-4.c scan-tree-dump optimized " = \\.COND_SUB"
> FAIL: gcc.dg/vect/vect-cond-arith-5.c -flto -ffat-lto-objects  scan-tree-dump optimized " = \\.COND_ADD"
> FAIL: gcc.dg/vect/vect-cond-arith-5.c -flto -ffat-lto-objects  scan-tree-dump optimized " = \\.COND_MUL"
> FAIL: gcc.dg/vect/vect-cond-arith-5.c -flto -ffat-lto-objects  scan-tree-dump optimized " = \\.COND_RDIV"
> FAIL: gcc.dg/vect/vect-cond-arith-5.c -flto -ffat-lto-objects  scan-tree-dump optimized " = \\.COND_SUB"
> FAIL: gcc.dg/vect/vect-cond-arith-5.c scan-tree-dump optimized " = \\.COND_ADD"
> FAIL: gcc.dg/vect/vect-cond-arith-5.c scan-tree-dump optimized " = \\.COND_MUL"
> FAIL: gcc.dg/vect/vect-cond-arith-5.c scan-tree-dump optimized " = \\.COND_RDIV"
> FAIL: gcc.dg/vect/vect-cond-arith-5.c scan-tree-dump optimized " = \\.COND_SUB"
> FAIL: gcc.dg/vect/vect-cond-arith-6.c -flto -ffat-lto-objects  scan-tree-dump-times optimized " = \\.COND_ADD" 1
> FAIL: gcc.dg/vect/vect-cond-arith-6.c -flto -ffat-lto-objects  scan-tree-dump-times optimized " = \\.COND_MUL" 1
> FAIL: gcc.dg/vect/vect-cond-arith-6.c -flto -ffat-lto-objects  scan-tree-dump-times optimized " = \\.COND_RDIV" 1
> FAIL: gcc.dg/vect/vect-cond-arith-6.c -flto -ffat-lto-objects  scan-tree-dump-times optimized " = \\.COND_SUB" 1
> FAIL: gcc.dg/vect/vect-cond-arith-6.c scan-tree-dump-times optimized " = \\.COND_ADD" 1
> FAIL: gcc.dg/vect/vect-cond-arith-6.c scan-tree-dump-times optimized " = \\.COND_MUL" 1
> FAIL: gcc.dg/vect/vect-cond-arith-6.c scan-tree-dump-times optimized " = \\.COND_RDIV" 1
> FAIL: gcc.dg/vect/vect-cond-arith-6.c scan-tree-dump-times optimized " = \\.COND_SUB" 1
>
> For RVV, the expected dump IR uses the COND_LEN_* pattern.
>
> Also, we are still failing at this check:
>
> FAIL: gcc.dg/vect/vect-cond-arith-2.c scan-tree-dump optimized " = \\.COND_LEN_SUB"
> FAIL: gcc.dg/vect/vect-cond-arith-2.c -flto -ffat-lto-objects  scan-tree-dump optimized " = \\.COND_LEN_SUB"
>
> This is due to a known bug in GIMPLE_FOLD that Robin is working on.
>
> @Robin: Please make sure vect-cond-arith-2.c passes with this patch and your
> bug fix patch.
>
> OK for trunk?
> 
> gcc/testsuite/ChangeLog:
> 
> * gcc.dg/vect/vect-cond-arith-2.c: Fix dump check for RVV.
> * gcc.dg/vect/vect-cond-arith-4.c: Ditto.
> * gcc.dg/vect/vect-cond-arith-5.c: Ditto.
> * gcc.dg/vect/vect-cond-arith-6.c: Ditto.
Would it make more sense to adjust the regexp so that it matched the 
standard form as well as the LEN form?  So for example we could have a 
regexp that matched COND_ADD and COND_LEN_ADD.
 
Just wondering if that'll be better from a long term maintenance standpoint.
 
Jeff
 


RE: [PATCH] RISC-V: Enable more tests of "vect" for RVV

2023-10-07 Thread Li, Pan2
Committed, thanks Jeff.

Pan

-----Original Message-----
From: Jeff Law  
Sent: Saturday, October 7, 2023 10:48 PM
To: Juzhe-Zhong ; gcc-patches@gcc.gnu.org
Cc: kito.ch...@gmail.com; kito.ch...@sifive.com; rdapp@gmail.com
Subject: Re: [PATCH] RISC-V: Enable more tests of "vect" for RVV



On 10/7/23 01:04, Juzhe-Zhong wrote:
> This patch enables almost full coverage of the vectorization tests for RVV,
> except for the following tests (not enabled yet):
> 
> 1. Will enable soon:
> 
> check_effective_target_vect_call_lrint
> check_effective_target_vect_call_btrunc
> check_effective_target_vect_call_btruncf
> check_effective_target_vect_call_ceil
> check_effective_target_vect_call_ceilf
> check_effective_target_vect_call_floor
> check_effective_target_vect_call_floorf
> check_effective_target_vect_call_lceil
> check_effective_target_vect_call_lfloor
> check_effective_target_vect_call_nearbyint
> check_effective_target_vect_call_nearbyintf
> check_effective_target_vect_call_round
> check_effective_target_vect_call_roundf
> 
> 2. Not sure whether we need to enable these or not:
> 
> check_effective_target_vect_complex_*
> check_effective_target_vect_simd_clones
> check_effective_target_vect_bswap
> check_effective_target_vect_widen_shift
> check_effective_target_vect_widen_mult_*
> check_effective_target_vect_widen_sum_*
> check_effective_target_vect_unpack
> check_effective_target_vect_interleave
> check_effective_target_vect_extract_even_odd
> check_effective_target_vect_pack_trunc
> check_effective_target_vect_check_ptrs
> check_effective_target_vect_sdiv_pow2_si
> check_effective_target_vect_usad_*
> check_effective_target_vect_udot_*
> check_effective_target_vect_sdot_*
> check_effective_target_vect_gather_load_ifn
> 
> After this patch, we will have the following additional FAILs:
> XPASS: gcc.dg/vect/tsvc/vect-tsvc-s1115.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
> XPASS: gcc.dg/vect/tsvc/vect-tsvc-s1115.c scan-tree-dump vect "vectorized 1 loops"
> XPASS: gcc.dg/vect/tsvc/vect-tsvc-s114.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
> XPASS: gcc.dg/vect/tsvc/vect-tsvc-s114.c scan-tree-dump vect "vectorized 1 loops"
> XPASS: gcc.dg/vect/tsvc/vect-tsvc-s1161.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
> XPASS: gcc.dg/vect/tsvc/vect-tsvc-s1161.c scan-tree-dump vect "vectorized 1 loops"
> XPASS: gcc.dg/vect/tsvc/vect-tsvc-s1232.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
> XPASS: gcc.dg/vect/tsvc/vect-tsvc-s1232.c scan-tree-dump vect "vectorized 1 loops"
> XPASS: gcc.dg/vect/tsvc/vect-tsvc-s124.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
> XPASS: gcc.dg/vect/tsvc/vect-tsvc-s124.c scan-tree-dump vect "vectorized 1 loops"
> XPASS: gcc.dg/vect/tsvc/vect-tsvc-s1279.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
> XPASS: gcc.dg/vect/tsvc/vect-tsvc-s1279.c scan-tree-dump vect "vectorized 1 loops"
> XPASS: gcc.dg/vect/tsvc/vect-tsvc-s161.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
> XPASS: gcc.dg/vect/tsvc/vect-tsvc-s161.c scan-tree-dump vect "vectorized 1 loops"
> XPASS: gcc.dg/vect/tsvc/vect-tsvc-s253.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
> XPASS: gcc.dg/vect/tsvc/vect-tsvc-s253.c scan-tree-dump vect "vectorized 1 loops"
> XPASS: gcc.dg/vect/tsvc/vect-tsvc-s257.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
> XPASS: gcc.dg/vect/tsvc/vect-tsvc-s257.c scan-tree-dump vect "vectorized 1 loops"
> XPASS: gcc.dg/vect/tsvc/vect-tsvc-s271.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
> XPASS: gcc.dg/vect/tsvc/vect-tsvc-s271.c scan-tree-dump vect "vectorized 1 loops"
> XPASS: gcc.dg/vect/tsvc/vect-tsvc-s2711.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
> XPASS: gcc.dg/vect/tsvc/vect-tsvc-s2711.c scan-tree-dump vect "vectorized 1 loops"
> XPASS: gcc.dg/vect/tsvc/vect-tsvc-s2712.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
> XPASS: gcc.dg/vect/tsvc/vect-tsvc-s2712.c scan-tree-dump vect "vectorized 1 loops"
> XPASS: gcc.dg/vect/tsvc/vect-tsvc-s272.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
> XPASS: gcc.dg/vect/tsvc/vect-tsvc-s272.c scan-tree-dump vect "vectorized 1 loops"
> XPASS: gcc.dg/vect/tsvc/vect-tsvc-s273.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
> XPASS: gcc.dg/vect/tsvc/vect-tsvc-s273.c scan-tree-dump vect "vectorized 1 loops"
> XPASS: gcc.dg/vect/tsvc/vect-tsvc-s274.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
> XPASS: gcc.dg/vect/tsvc/vect-tsvc-s274.c scan-tree-dump vect "vectorized 1 loops"
> XPASS: gcc.dg/vect/tsvc/vect-tsvc-s276.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
> XPASS: gcc.dg/vect/tsvc/vect-tsvc-s276.c scan-tree-dump vect "vectorized 1 loops"

RE: [PATCH] TEST: Fix XPASS of TSVC testsuites for RVV

2023-10-07 Thread Li, Pan2
Committed, thanks Jeff.

Pan

-----Original Message-----
From: Jeff Law  
Sent: Saturday, October 7, 2023 10:44 PM
To: Juzhe-Zhong ; gcc-patches@gcc.gnu.org
Cc: rguent...@suse.de
Subject: Re: [PATCH] TEST: Fix XPASS of TSVC testsuites for RVV



On 10/7/23 03:23, Juzhe-Zhong wrote:
> Fix the following XPASS FAILs of TSVC for RVV:
>
> XPASS: gcc.dg/vect/tsvc/vect-tsvc-s1115.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
> XPASS: gcc.dg/vect/tsvc/vect-tsvc-s1115.c scan-tree-dump vect "vectorized 1 loops"
> XPASS: gcc.dg/vect/tsvc/vect-tsvc-s114.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
> XPASS: gcc.dg/vect/tsvc/vect-tsvc-s114.c scan-tree-dump vect "vectorized 1 loops"
> XPASS: gcc.dg/vect/tsvc/vect-tsvc-s1161.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
> XPASS: gcc.dg/vect/tsvc/vect-tsvc-s1161.c scan-tree-dump vect "vectorized 1 loops"
> XPASS: gcc.dg/vect/tsvc/vect-tsvc-s1232.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
> XPASS: gcc.dg/vect/tsvc/vect-tsvc-s1232.c scan-tree-dump vect "vectorized 1 loops"
> XPASS: gcc.dg/vect/tsvc/vect-tsvc-s124.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
> XPASS: gcc.dg/vect/tsvc/vect-tsvc-s124.c scan-tree-dump vect "vectorized 1 loops"
> XPASS: gcc.dg/vect/tsvc/vect-tsvc-s1279.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
> XPASS: gcc.dg/vect/tsvc/vect-tsvc-s1279.c scan-tree-dump vect "vectorized 1 loops"
> XPASS: gcc.dg/vect/tsvc/vect-tsvc-s161.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
> XPASS: gcc.dg/vect/tsvc/vect-tsvc-s161.c scan-tree-dump vect "vectorized 1 loops"
> XPASS: gcc.dg/vect/tsvc/vect-tsvc-s253.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
> XPASS: gcc.dg/vect/tsvc/vect-tsvc-s253.c scan-tree-dump vect "vectorized 1 loops"
> XPASS: gcc.dg/vect/tsvc/vect-tsvc-s257.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
> XPASS: gcc.dg/vect/tsvc/vect-tsvc-s257.c scan-tree-dump vect "vectorized 1 loops"
> XPASS: gcc.dg/vect/tsvc/vect-tsvc-s271.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
> XPASS: gcc.dg/vect/tsvc/vect-tsvc-s271.c scan-tree-dump vect "vectorized 1 loops"
> XPASS: gcc.dg/vect/tsvc/vect-tsvc-s2711.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
> XPASS: gcc.dg/vect/tsvc/vect-tsvc-s2711.c scan-tree-dump vect "vectorized 1 loops"
> XPASS: gcc.dg/vect/tsvc/vect-tsvc-s2712.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
> XPASS: gcc.dg/vect/tsvc/vect-tsvc-s2712.c scan-tree-dump vect "vectorized 1 loops"
> XPASS: gcc.dg/vect/tsvc/vect-tsvc-s272.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
> XPASS: gcc.dg/vect/tsvc/vect-tsvc-s272.c scan-tree-dump vect "vectorized 1 loops"
> XPASS: gcc.dg/vect/tsvc/vect-tsvc-s273.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
> XPASS: gcc.dg/vect/tsvc/vect-tsvc-s273.c scan-tree-dump vect "vectorized 1 loops"
> XPASS: gcc.dg/vect/tsvc/vect-tsvc-s274.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
> XPASS: gcc.dg/vect/tsvc/vect-tsvc-s274.c scan-tree-dump vect "vectorized 1 loops"
> XPASS: gcc.dg/vect/tsvc/vect-tsvc-s276.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
> XPASS: gcc.dg/vect/tsvc/vect-tsvc-s276.c scan-tree-dump vect "vectorized 1 loops"
> XPASS: gcc.dg/vect/tsvc/vect-tsvc-s278.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
> XPASS: gcc.dg/vect/tsvc/vect-tsvc-s278.c scan-tree-dump vect "vectorized 1 loops"
> XPASS: gcc.dg/vect/tsvc/vect-tsvc-s279.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
> XPASS: gcc.dg/vect/tsvc/vect-tsvc-s279.c scan-tree-dump vect "vectorized 1 loops"
> XPASS: gcc.dg/vect/tsvc/vect-tsvc-s3111.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
> XPASS: gcc.dg/vect/tsvc/vect-tsvc-s3111.c scan-tree-dump vect "vectorized 1 loops"
> XPASS: gcc.dg/vect/tsvc/vect-tsvc-s353.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
> XPASS: gcc.dg/vect/tsvc/vect-tsvc-s353.c scan-tree-dump vect "vectorized 1 loops"
> XPASS: gcc.dg/vect/tsvc/vect-tsvc-s441.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
> XPASS: gcc.dg/vect/tsvc/vect-tsvc-s441.c scan-tree-dump vect "vectorized 1 loops"
> XPASS: gcc.dg/vect/tsvc/vect-tsvc-s443.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
> XPASS: gcc.dg/vect/tsvc/vect-tsvc-s443.c scan-tree-dump vect "vectorized 1 loops"
> XPASS: gcc.dg/vect/tsvc/vect-tsvc-vif.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
> XPASS: gcc.dg/vect/tsvc/vect-tsvc-vif.c scan-tree-dump vect "vectorized 1 loops"
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.dg/vect/tsvc/vect-tsvc-s11

RE: [PATCH] RISC-V: add static-pie support

2023-10-07 Thread Li, Pan2
Committed, thanks Jeff.

Pan

-----Original Message-----
From: Jeff Law  
Sent: Sunday, October 8, 2023 12:13 AM
To: Wang, Yanzhang ; gcc-patches@gcc.gnu.org
Cc: juzhe.zh...@rivai.ai; kito.ch...@sifive.com; Li, Pan2 
Subject: Re: [PATCH] RISC-V: add static-pie support



On 10/7/23 05:32, yanzhang.w...@intel.com wrote:
> From: Yanzhang Wang 
> 
> We only need to pass options to the linker when static-pie is passed.
> There is another patch to enable static-pie in glibc, and we need to
> enable it in GCC first.
> 
> gcc/ChangeLog:
> 
>   * config/riscv/linux.h: Pass the static-pie specific options to
>   the linker.
OK.
jeff


[PATCH V2] TEST: Fix vect_cond_arith_* dump checks for RVV

2023-10-07 Thread Juzhe-Zhong
This patch fixes the following dump FAILs:
FAIL: gcc.dg/vect/vect-cond-arith-2.c -flto -ffat-lto-objects  scan-tree-dump optimized " = \\.COND_SUB"
FAIL: gcc.dg/vect/vect-cond-arith-2.c -flto -ffat-lto-objects  scan-tree-dump vect " = \\.COND_ADD"
FAIL: gcc.dg/vect/vect-cond-arith-2.c scan-tree-dump optimized " = \\.COND_SUB"
FAIL: gcc.dg/vect/vect-cond-arith-2.c scan-tree-dump vect " = \\.COND_ADD"
FAIL: gcc.dg/vect/vect-cond-arith-4.c -flto -ffat-lto-objects  scan-tree-dump optimized " = \\.COND_ADD"
FAIL: gcc.dg/vect/vect-cond-arith-4.c -flto -ffat-lto-objects  scan-tree-dump optimized " = \\.COND_MUL"
FAIL: gcc.dg/vect/vect-cond-arith-4.c -flto -ffat-lto-objects  scan-tree-dump optimized " = \\.COND_RDIV"
FAIL: gcc.dg/vect/vect-cond-arith-4.c -flto -ffat-lto-objects  scan-tree-dump optimized " = \\.COND_SUB"
FAIL: gcc.dg/vect/vect-cond-arith-4.c scan-tree-dump optimized " = \\.COND_ADD"
FAIL: gcc.dg/vect/vect-cond-arith-4.c scan-tree-dump optimized " = \\.COND_MUL"
FAIL: gcc.dg/vect/vect-cond-arith-4.c scan-tree-dump optimized " = \\.COND_RDIV"
FAIL: gcc.dg/vect/vect-cond-arith-4.c scan-tree-dump optimized " = \\.COND_SUB"
FAIL: gcc.dg/vect/vect-cond-arith-5.c -flto -ffat-lto-objects  scan-tree-dump optimized " = \\.COND_ADD"
FAIL: gcc.dg/vect/vect-cond-arith-5.c -flto -ffat-lto-objects  scan-tree-dump optimized " = \\.COND_MUL"
FAIL: gcc.dg/vect/vect-cond-arith-5.c -flto -ffat-lto-objects  scan-tree-dump optimized " = \\.COND_RDIV"
FAIL: gcc.dg/vect/vect-cond-arith-5.c -flto -ffat-lto-objects  scan-tree-dump optimized " = \\.COND_SUB"
FAIL: gcc.dg/vect/vect-cond-arith-5.c scan-tree-dump optimized " = \\.COND_ADD"
FAIL: gcc.dg/vect/vect-cond-arith-5.c scan-tree-dump optimized " = \\.COND_MUL"
FAIL: gcc.dg/vect/vect-cond-arith-5.c scan-tree-dump optimized " = \\.COND_RDIV"
FAIL: gcc.dg/vect/vect-cond-arith-5.c scan-tree-dump optimized " = \\.COND_SUB"
FAIL: gcc.dg/vect/vect-cond-arith-6.c -flto -ffat-lto-objects  scan-tree-dump-times optimized " = \\.COND_ADD" 1
FAIL: gcc.dg/vect/vect-cond-arith-6.c -flto -ffat-lto-objects  scan-tree-dump-times optimized " = \\.COND_MUL" 1
FAIL: gcc.dg/vect/vect-cond-arith-6.c -flto -ffat-lto-objects  scan-tree-dump-times optimized " = \\.COND_RDIV" 1
FAIL: gcc.dg/vect/vect-cond-arith-6.c -flto -ffat-lto-objects  scan-tree-dump-times optimized " = \\.COND_SUB" 1
FAIL: gcc.dg/vect/vect-cond-arith-6.c scan-tree-dump-times optimized " = \\.COND_ADD" 1
FAIL: gcc.dg/vect/vect-cond-arith-6.c scan-tree-dump-times optimized " = \\.COND_MUL" 1
FAIL: gcc.dg/vect/vect-cond-arith-6.c scan-tree-dump-times optimized " = \\.COND_RDIV" 1
FAIL: gcc.dg/vect/vect-cond-arith-6.c scan-tree-dump-times optimized " = \\.COND_SUB" 1

For RVV, the expected dump IR uses the COND_LEN_* pattern.

Also, we are still failing at this check:

FAIL: gcc.dg/vect/vect-cond-arith-2.c scan-tree-dump optimized " = \\.COND_LEN_SUB"
FAIL: gcc.dg/vect/vect-cond-arith-2.c -flto -ffat-lto-objects  scan-tree-dump optimized " = \\.COND_LEN_SUB"

This is due to a known bug in GIMPLE_FOLD that Robin is working on.

@Robin: Please make sure vect-cond-arith-2.c passes with this patch and your
bug fix patch.

OK for trunk?

gcc/testsuite/ChangeLog:

* gcc.dg/vect/vect-cond-arith-2.c: Fix dump check for RVV.
* gcc.dg/vect/vect-cond-arith-4.c: Ditto.
* gcc.dg/vect/vect-cond-arith-5.c: Ditto.
* gcc.dg/vect/vect-cond-arith-6.c: Ditto.

---
 gcc/testsuite/gcc.dg/vect/vect-cond-arith-2.c | 4 ++--
 gcc/testsuite/gcc.dg/vect/vect-cond-arith-4.c | 8 
 gcc/testsuite/gcc.dg/vect/vect-cond-arith-5.c | 8 
 gcc/testsuite/gcc.dg/vect/vect-cond-arith-6.c | 8 
 4 files changed, 14 insertions(+), 14 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/vect/vect-cond-arith-2.c b/gcc/testsuite/gcc.dg/vect/vect-cond-arith-2.c
index 38994ea82a5..3832a660023 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-cond-arith-2.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-cond-arith-2.c
@@ -41,5 +41,5 @@ neg_xi (double *x)
   return res_3;
 }
 
-/* { dg-final { scan-tree-dump { = \.COND_ADD} "vect" { target { vect_double_cond_arith && vect_fully_masked } } } } */
-/* { dg-final { scan-tree-dump { = \.COND_SUB} "optimized" { target { vect_double_cond_arith && vect_fully_masked } } } } */
+/* { dg-final { scan-tree-dump { = \.COND_L?E?N?_?ADD} "vect" { target { vect_double_cond_arith && vect_fully_masked } } } } */
+/* { dg-final { scan-tree-dump { = \.COND_L?E?N?_?SUB} "optimized" { target { vect_double_cond_arith && vect_fully_masked } } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-cond-arith-4.c b/gcc/testsuite/gcc.dg/vect/vect-cond-arith-4.c
index 1af0fe642a0..5bb75206a68 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-cond-arith-4.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-cond-arith-4.c
@@ -52,8 +52,8 @@ main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump { = \.COND_ADD} "optimized" { target vect_double_cond_arith } } } */
-/* { dg-final { scan-tree-dump { = \.COND_SUB} "opt

Re: Re: [PATCH] TEST: Fix vect_cond_arith_* dump checks for RVV

2023-10-07 Thread juzhe.zh...@rivai.ai
Hi, Jeff.

I have addressed your comments in V2:
https://gcc.gnu.org/pipermail/gcc-patches/2023-October/632239.html 

I think it looks reasonably good for long-term maintenance now.

OK for trunk?



juzhe.zh...@rivai.ai


[PATCH] [i386] Fix apx test fails on 32bit target

2023-10-07 Thread Hongyu Wang
Since -mapxf, like -muintr, emits an error for 32-bit targets, add a
! ia32 target guard to the APX-related tests.

Committed as an obvious fix after testing.

gcc/testsuite/ChangeLog:

* gcc.target/i386/apx-egprs-names.c: Compile for non-ia32.
* gcc.target/i386/apx-inline-gpr-norex2.c: Likewise.
* gcc.target/i386/apx-interrupt-1.c: Likewise.
* gcc.target/i386/apx-legacy-insn-check-norex2-asm.c: Likewise.
* gcc.target/i386/apx-legacy-insn-check-norex2.c: Likewise.
---
 gcc/testsuite/gcc.target/i386/apx-egprs-names.c | 2 +-
 gcc/testsuite/gcc.target/i386/apx-inline-gpr-norex2.c   | 2 +-
 gcc/testsuite/gcc.target/i386/apx-interrupt-1.c | 2 +-
 .../gcc.target/i386/apx-legacy-insn-check-norex2-asm.c  | 2 +-
 gcc/testsuite/gcc.target/i386/apx-legacy-insn-check-norex2.c| 2 +-
 5 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/gcc/testsuite/gcc.target/i386/apx-egprs-names.c b/gcc/testsuite/gcc.target/i386/apx-egprs-names.c
index 445bcf2c250..f0517e47c33 100644
--- a/gcc/testsuite/gcc.target/i386/apx-egprs-names.c
+++ b/gcc/testsuite/gcc.target/i386/apx-egprs-names.c
@@ -1,4 +1,4 @@
-/* { dg-do compile } */
+/* { dg-do compile { target { ! ia32 } } } */
 /* { dg-options "-mapxf -m64" } */
 /* { dg-final { scan-assembler "r31" } } */
 /* { dg-final { scan-assembler "r30" } } */
diff --git a/gcc/testsuite/gcc.target/i386/apx-inline-gpr-norex2.c b/gcc/testsuite/gcc.target/i386/apx-inline-gpr-norex2.c
index ffd8f954500..208d53dc774 100644
--- a/gcc/testsuite/gcc.target/i386/apx-inline-gpr-norex2.c
+++ b/gcc/testsuite/gcc.target/i386/apx-inline-gpr-norex2.c
@@ -1,4 +1,4 @@
-/* { dg-do compile } */
+/* { dg-do compile { target { ! ia32 } } } */
 /* { dg-options "-O2 -mapxf -m64" } */
 
 typedef unsigned int u32;
diff --git a/gcc/testsuite/gcc.target/i386/apx-interrupt-1.c b/gcc/testsuite/gcc.target/i386/apx-interrupt-1.c
index 441dbf04bf2..dc1fc3fe373 100644
--- a/gcc/testsuite/gcc.target/i386/apx-interrupt-1.c
+++ b/gcc/testsuite/gcc.target/i386/apx-interrupt-1.c
@@ -1,4 +1,4 @@
-/* { dg-do compile } */
+/* { dg-do compile { target { ! ia32 } } } */
 /* { dg-options "-mapxf -m64 -O2 -mgeneral-regs-only -mno-cld -mno-push-args -maccumulate-outgoing-args" } */
 
 extern void foo (void *) __attribute__ ((interrupt));
diff --git a/gcc/testsuite/gcc.target/i386/apx-legacy-insn-check-norex2-asm.c b/gcc/testsuite/gcc.target/i386/apx-legacy-insn-check-norex2-asm.c
index 7ecc861435f..fb0f62e83e8 100644
--- a/gcc/testsuite/gcc.target/i386/apx-legacy-insn-check-norex2-asm.c
+++ b/gcc/testsuite/gcc.target/i386/apx-legacy-insn-check-norex2-asm.c
@@ -1,4 +1,4 @@
-/* { dg-do assemble { target apxf } } */
+/* { dg-do assemble { target { apxf && { ! ia32 } } } } */
 /* { dg-options "-O1 -mapxf -m64 -DDTYPE32" } */
 
 #include "apx-legacy-insn-check-norex2.c"
diff --git a/gcc/testsuite/gcc.target/i386/apx-legacy-insn-check-norex2.c b/gcc/testsuite/gcc.target/i386/apx-legacy-insn-check-norex2.c
index 771bcb078e1..641feafa27f 100644
--- a/gcc/testsuite/gcc.target/i386/apx-legacy-insn-check-norex2.c
+++ b/gcc/testsuite/gcc.target/i386/apx-legacy-insn-check-norex2.c
@@ -1,4 +1,4 @@
-/* { dg-do compile } */
+/* { dg-do compile { target { ! ia32 } } } */
 /* { dg-options "-O3 -mapxf -m64 -DDTYPE32" } */
 
 #include 
-- 
2.31.1



[PATCH 1/2] [x86] Support smin/smax for V2HF/V4HF

2023-10-07 Thread liuhongt
Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.
Ready to push to trunk.

gcc/ChangeLog:

* config/i386/mmx.md (VHF_32_64): New mode iterator.
(3): New define_expand, merged from ..
(v4hf3): .. this and
(v2hf3): .. this.
(movd_v2hf_to_sse_reg): New define_expand, split from ..
(movd_v2hf_to_sse): .. this.
(3): New define_expand.

gcc/testsuite/ChangeLog:

* gcc.target/i386/part-vect-vminmaxph-1.c: New test.
* gcc.target/i386/avx512fp16-64-32-vecop-1.c: Scan-assembler
only for { target { ! ia32 } }.
---
 gcc/config/i386/mmx.md| 74 +++
 .../i386/avx512fp16-64-32-vecop-1.c   |  8 +-
 .../gcc.target/i386/part-vect-vminmaxph-1.c   | 36 +
 3 files changed, 83 insertions(+), 35 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/part-vect-vminmaxph-1.c

diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index ef578222945..77f1db265ab 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -1936,25 +1936,7 @@ (define_expand "lroundv2sfv2si2"
 ;;
 ;
 
-(define_expand "v4hf3"
-  [(set (match_operand:V4HF 0 "register_operand")
-   (plusminusmult:V4HF
- (match_operand:V4HF 1 "nonimmediate_operand")
- (match_operand:V4HF 2 "nonimmediate_operand")))]
-  "TARGET_AVX512FP16 && TARGET_AVX512VL && ix86_partial_vec_fp_math"
-{
-  rtx op2 = gen_reg_rtx (V8HFmode);
-  rtx op1 = gen_reg_rtx (V8HFmode);
-  rtx op0 = gen_reg_rtx (V8HFmode);
-
-  emit_insn (gen_movq_v4hf_to_sse (op2, operands[2]));
-  emit_insn (gen_movq_v4hf_to_sse (op1, operands[1]));
-
-  emit_insn (gen_v8hf3 (op0, op1, op2));
-
-  emit_move_insn (operands[0], lowpart_subreg (V4HFmode, op0, V8HFmode));
-  DONE;
-})
+(define_mode_iterator VHF_32_64 [V2HF (V4HF "TARGET_MMX_WITH_SSE")])
 
 (define_expand "divv4hf3"
   [(set (match_operand:V4HF 0 "register_operand")
@@ -1976,39 +1958,50 @@ (define_expand "divv4hf3"
   DONE;
 })
 
+(define_mode_attr mov_to_sse_suffix [(V2HF "d") (V4HF "q")])
 (define_expand "movd_v2hf_to_sse"
   [(set (match_operand:V8HF 0 "register_operand")
(vec_merge:V8HF
  (vec_duplicate:V8HF
(match_operand:V2HF 1 "nonimmediate_operand"))
- (match_operand:V8HF 2 "reg_or_0_operand")
+ (match_dup 2)
  (const_int 3)))]
   "TARGET_SSE"
 {
-  if (!flag_trapping_math && operands[2] == CONST0_RTX (V8HFmode))
+  if (!flag_trapping_math)
   {
 rtx op1 = force_reg (V2HFmode, operands[1]);
 emit_move_insn (operands[0], lowpart_subreg (V8HFmode, op1, V2HFmode));
 DONE;
   }
+  operands[2] = CONST0_RTX (V8HFmode);
 })
 
-(define_expand "v2hf3"
-  [(set (match_operand:V2HF 0 "register_operand")
-   (plusminusmult:V2HF
- (match_operand:V2HF 1 "nonimmediate_operand")
- (match_operand:V2HF 2 "nonimmediate_operand")))]
+(define_expand "movd_v2hf_to_sse_reg"
+  [(set (match_operand:V8HF 0 "register_operand")
+   (vec_merge:V8HF
+ (vec_duplicate:V8HF
+   (match_operand:V2HF 1 "nonimmediate_operand"))
+ (match_operand:V8HF 2 "register_operand")
+ (const_int 3)))]
+  "TARGET_SSE")
+
+(define_expand "3"
+  [(set (match_operand:VHF_32_64 0 "register_operand")
+   (plusminusmult:VHF_32_64
+ (match_operand:VHF_32_64 1 "nonimmediate_operand")
+ (match_operand:VHF_32_64 2 "nonimmediate_operand")))]
   "TARGET_AVX512FP16 && TARGET_AVX512VL && ix86_partial_vec_fp_math"
 {
   rtx op2 = gen_reg_rtx (V8HFmode);
   rtx op1 = gen_reg_rtx (V8HFmode);
   rtx op0 = gen_reg_rtx (V8HFmode);
 
-  emit_insn (gen_movd_v2hf_to_sse (op2, operands[2], CONST0_RTX (V8HFmode)));
-  emit_insn (gen_movd_v2hf_to_sse (op1, operands[1], CONST0_RTX (V8HFmode)));
+  emit_insn (gen_mov__to_sse (op2, operands[2]));
+  emit_insn (gen_mov__to_sse (op1, operands[1]));
   emit_insn (gen_v8hf3 (op0, op1, op2));
 
-  emit_move_insn (operands[0], lowpart_subreg (V2HFmode, op0, V8HFmode));
+  emit_move_insn (operands[0], lowpart_subreg (mode, op0, V8HFmode));
   DONE;
 })
 
@@ -2023,15 +2016,34 @@ (define_expand "divv2hf3"
   rtx op1 = gen_reg_rtx (V8HFmode);
   rtx op0 = gen_reg_rtx (V8HFmode);
 
-  emit_insn (gen_movd_v2hf_to_sse (op2, operands[2],
+  emit_insn (gen_movd_v2hf_to_sse_reg (op2, operands[2],
  force_reg (V8HFmode, CONST1_RTX (V8HFmode))));
-  emit_insn (gen_movd_v2hf_to_sse (op1, operands[1], CONST0_RTX (V8HFmode)));
+  emit_insn (gen_movd_v2hf_to_sse (op1, operands[1]));
   emit_insn (gen_divv8hf3 (op0, op1, op2));
 
   emit_move_insn (operands[0], lowpart_subreg (V2HFmode, op0, V8HFmode));
   DONE;
 })
 
+(define_expand "3"
+  [(set (match_operand:VHF_32_64 0 "register_operand")
+   (smaxmin:VHF_32_64
+ (match_operand:VHF_32_64 1 "nonimmediate_operand")
+ (match_operand:VHF_32_64 2 "nonimmediate_operand")))]
+  "TARGET_AVX512FP16 && TARG

[PATCH 2/2] Support signbit/xorsign/copysign/abs/neg/and/xor/ior/andn for V2HF/V4HF.

2023-10-07 Thread liuhongt
Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.
Ready to push to trunk.

gcc/ChangeLog:

* config/i386/i386.cc (ix86_build_const_vector): Handle V2HF
and V4HFmode.
(ix86_build_signbit_mask): Ditto.
* config/i386/mmx.md (mmxintvecmode): Ditto.
(2): New define_expand.
(*mmx_): New define_insn_and_split.
(*mmx_nabs2): Ditto.
(*mmx_andnot3): New define_insn.
(3): Ditto.
(copysign3): New define_expand.
(xorsign3): Ditto.
(signbit2): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/i386/part-vect-absneghf.c: New test.
* gcc.target/i386/part-vect-copysignhf.c: New test.
* gcc.target/i386/part-vect-xorsignhf.c: New test.
---
 gcc/config/i386/i386.cc   |   4 +
 gcc/config/i386/mmx.md| 114 +-
 .../gcc.target/i386/part-vect-absneghf.c  |  91 ++
 .../gcc.target/i386/part-vect-copysignhf.c|  60 +
 .../gcc.target/i386/part-vect-vminmaxph-1.c   |   4 +-
 .../gcc.target/i386/part-vect-xorsignhf.c |  60 +
 6 files changed, 330 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/part-vect-absneghf.c
 create mode 100644 gcc/testsuite/gcc.target/i386/part-vect-copysignhf.c
 create mode 100644 gcc/testsuite/gcc.target/i386/part-vect-xorsignhf.c

diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index 9557bffd092..46326d3c82e 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -15752,6 +15752,8 @@ ix86_build_const_vector (machine_mode mode, bool vect, rtx value)
 case E_V2DImode:
   gcc_assert (vect);
   /* FALLTHRU */
+case E_V2HFmode:
+case E_V4HFmode:
 case E_V8HFmode:
 case E_V16HFmode:
 case E_V32HFmode:
@@ -15793,6 +15795,8 @@ ix86_build_signbit_mask (machine_mode mode, bool vect, bool invert)
 
   switch (mode)
 {
+case E_V2HFmode:
+case E_V4HFmode:
 case E_V8HFmode:
 case E_V16HFmode:
 case E_V32HFmode:
diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index 77f1db265ab..c68a3d6fe43 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -99,7 +99,8 @@ (define_mode_attr mmxdoublemode
 
 ;; Mapping of vector float modes to an integer mode of the same size
 (define_mode_attr mmxintvecmode
-  [(V2SF "V2SI") (V2SI "V2SI") (V4HI "V4HI") (V8QI "V8QI")])
+  [(V2SF "V2SI") (V2SI "V2SI") (V4HI "V4HI") (V8QI "V8QI")
+   (V4HF "V4HI") (V2HF "V2HI")])
 
 (define_mode_attr mmxintvecmodelower
   [(V2SF "v2si") (V2SI "v2si") (V4HI "v4hi") (V8QI "v8qi")])
@@ -2045,6 +2046,117 @@ (define_expand "3"
   DONE;
 })
 
+(define_expand "2"
+  [(set (match_operand:VHF_32_64 0 "register_operand")
+   (absneg:VHF_32_64
+ (match_operand:VHF_32_64 1 "register_operand")))]
+  "TARGET_SSE"
+  "ix86_expand_fp_absneg_operator (, mode, operands); DONE;")
+
+(define_insn_and_split "*mmx_"
+  [(set (match_operand:VHF_32_64 0 "register_operand" "=x,x,x")
+   (absneg:VHF_32_64
+ (match_operand:VHF_32_64 1 "register_operand" "0,x,x")))
+   (use (match_operand:VHF_32_64 2 "register_operand" "x,0,x"))]
+  "TARGET_SSE"
+  "#"
+  "&& reload_completed"
+  [(set (match_dup 0)
+   (: (match_dup 1) (match_dup 2)))]
+{
+  if (!TARGET_AVX && operands_match_p (operands[0], operands[2]))
+std::swap (operands[1], operands[2]);
+}
+  [(set_attr "isa" "noavx,noavx,avx")])
+
+(define_insn_and_split "*mmx_nabs2"
+  [(set (match_operand:VHF_32_64 0 "register_operand" "=x,x,x")
+   (neg:VHF_32_64
+ (abs:VHF_32_64
+   (match_operand:VHF_32_64 1 "register_operand" "0,x,x"))))
+   (use (match_operand:VHF_32_64 2 "register_operand" "x,0,x"))]
+  "TARGET_SSE"
+  "#"
+  "&& reload_completed"
+  [(set (match_dup 0)
+   (ior: (match_dup 1) (match_dup 2)))])
+
+;
+;;
+;; Parallel half-precision floating point logical operations
+;;
+;
+
+(define_insn "*mmx_andnot3"
+  [(set (match_operand:VHF_32_64 0 "register_operand""=x,x")
+   (and:VHF_32_64
+ (not:VHF_32_64
+   (match_operand:VHF_32_64 1 "register_operand" "0,x"))
+ (match_operand:VHF_32_64 2 "register_operand"   "x,x")))]
+  "TARGET_SSE"
+  "@
+   andnps\t{%2, %0|%0, %2}
+   vandnps\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "isa" "noavx,avx")
+   (set_attr "type" "sselog")
+   (set_attr "prefix" "orig,vex")
+   (set_attr "mode" "V4SF")])
+
+(define_insn "3"
+  [(set (match_operand:VHF_32_64 0 "register_operand"   "=x,x")
+   (any_logic:VHF_32_64
+ (match_operand:VHF_32_64 1 "register_operand" "%0,x")
+ (match_operand:VHF_32_64 2 "register_operand" " x,x")))]
+  "TARGET_SSE"
+  "@
+   ps\t{%2, %0|%0, %2}
+   vps\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "isa" "noavx,avx")
+   (set_attr "type" "sselog,sselog")
+   (set_attr "prefix" 

Re: [PATCH] Support g++ 4.8 as a host compiler.

2023-10-07 Thread Jeff Law




On 10/7/23 15:30, Sam James wrote:


Jeff Law  writes:


On 10/4/23 16:19, Roger Sayle wrote:

The recent patch to remove poly_int_pod triggers a bug in g++ 4.8.5's
C++11 support, which mistakenly believes poly_uint16 has a non-trivial
constructor.  This in turn prohibits it from being used as a member in
a union (rtxunion) that is constructed statically, resulting in a (fatal)
error during stage 1.  A workaround is to add an explicit constructor
to the problematic union, which allows mainline to be bootstrapped with
the system compiler on older Red Hat 7 systems.
This patch has been tested on x86_64-pc-linux-gnu where it allows a
bootstrap to complete when using g++ 4.8.5 as the host compiler.
Ok for mainline?
2023-10-04  Roger Sayle  
gcc/ChangeLog
* rtl.h (rtx_def::u): Add explicit constructor to work around an
issue when using g++ 4.8 as a host compiler.

I think the bigger question is whether or not we're going to step
forward on the minimum build requirements.

My recollection was we settled on gcc-4.8 for the benefit of RHEL 7
and CentOS 7, which are rapidly approaching EOL (June 2024).

I would certainly support stepping forward to a more modern compiler
for the build requirements, which might make this patch obsolete.


See also richi and jakub's comments at 
https://inbox.sourceware.org/gcc-patches/mpt5y3ppio0@arm.com/T/#m985295bedaadb47aa0b9ba63b7cb69a660a108bb.
Yea.  As Jakub notes, there's the cfarm situation, but I've had good
success with DTS on CentOS 7 systems (I have to support some of those
internally within Ventana).  It quite literally "just works", though
users would have to enable it.


Alternately, update the cfarm hosts?

Jeff