Re: [PATCHv2 0/2] Changes to libiberty buildargv

2024-04-27 Thread Andrew Burgess


Ping!

Any thoughts on these patches?

Thanks,
Andrew


Andrew Burgess  writes:

> I realise that these patches are not going to get merged until GCC is
> back in stage 1, but thought I'd post my latest set of changes for the
> libiberty buildargv function.
>
> Patch #1 is unchanged from V1.
>
> Patch #2 is new in V2.
>
> Thanks,
> Andrew
>
> ---
>
> Andrew Burgess (2):
>   libiberty/buildargv: POSIX behaviour for backslash handling
>   libiberty/buildargv: handle input consisting of only white space
>
>  libiberty/argv.c  | 104 
>  libiberty/testsuite/test-expandargv.c | 170 ++
>  2 files changed, 200 insertions(+), 74 deletions(-)
>
>
> base-commit: cff174fabd6c980c09aee95db1d9d5c22421761f
> -- 
> 2.25.4



[PATCH] testsuite: Verify r0-r3 are extended with CMSE

2024-04-27 Thread Torbjörn SVENSSON
Add regression test to the existing zero/sign extend tests for CMSE to
verify that r0, r1, r2 and r3 are properly extended, not just r0.

Test is done using -O0 to ensure the instructions are in a predictable
order.
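
As an illustration only (this is not part of the patch), the following minimal C sketch models what the expected `uxtb`/`uxth` sequence achieves. The helper names `model_uxtb`, `model_uxth` and `model_index` are hypothetical, and the register-to-type mapping simply follows the expected assembly in the test: a non-secure caller may leave garbage in the upper bits of r0-r3, and the secure entry function has to discard it before using the narrow arguments.

```c
#include <assert.h>
#include <stdint.h>

/* Model of UXTB: keep only the low 8 bits of a 32-bit register value.  */
uint32_t
model_uxtb (uint32_t reg)
{
  return reg & 0xffu;
}

/* Model of UXTH: keep only the low 16 bits of a 32-bit register value.  */
uint32_t
model_uxth (uint32_t reg)
{
  return reg & 0xffffu;
}

/* Hypothetical model of the index computation in the new test function,
   after all four argument registers have been sanitized.  */
uint32_t
model_index (uint32_t r0, uint32_t r1, uint32_t r2, uint32_t r3)
{
  return model_uxtb (r0) + model_uxtb (r1) + model_uxth (r2) + model_uxtb (r3);
}
```

If only r0 were extended, the garbage in the upper bits of r1-r3 would flow into the index, which is the situation the added test is meant to catch.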

gcc/testsuite/ChangeLog:

* gcc.target/arm/cmse/extend-param.c: Add regression test.

Signed-off-by: Torbjörn SVENSSON 
---
 .../gcc.target/arm/cmse/extend-param.c| 20 ++-
 1 file changed, 19 insertions(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/arm/cmse/extend-param.c 
b/gcc/testsuite/gcc.target/arm/cmse/extend-param.c
index 01fac786238..b8b8ecbff56 100644
--- a/gcc/testsuite/gcc.target/arm/cmse/extend-param.c
+++ b/gcc/testsuite/gcc.target/arm/cmse/extend-param.c
@@ -93,4 +93,22 @@ __attribute__((cmse_nonsecure_entry)) char boolSecureFunc 
(bool index) {
 return 0;
   return array[index];
 
-}
\ No newline at end of file
+}
+
+/*
+**__acle_se_boolCharShortEnumSecureFunc:
+** ...
+** uxtb r0, r0
+** uxtb r1, r1
+** uxth r2, r2
+** uxtb r3, r3
+** ...
+*/
+__attribute__((cmse_nonsecure_entry,optimize(0))) char 
boolCharShortEnumSecureFunc (bool a, unsigned char b, unsigned short c, enum 
offset d) {
+
+  size_t index = a + b + c + d;
+  if (index >= ARRAY_SIZE)
+return 0;
+  return array[index];
+
+}
-- 
2.25.1



Re: [PATCH wwwdocs 1/1] gcc-14: document P1689R5 scanning output support

2024-04-27 Thread Ben Boeckel
On Sat, Jan 06, 2024 at 14:17:14 +0100, Arsen Arsenović wrote:
> Hi Ben,
> 
> Ben Boeckel  writes:
> 
> > Ping? Is this the right place to submit this patch?
> 
> Yes, this is the correct list, though it is usually recommended to use
> --subject-prefix='PATCH wwwdocs' or such, to catch the right eyes.  See:
> https://gcc.gnu.org/contribute.html#webchanges
> 
> I've added it to my subject, hopefully that works.

No bites yet… Anyone willing to review this patch so that it gets
mentioned on the website?

Thanks,

--Ben


[PATCH v1] RISC-V: Fix ICE for legitimize move on subreg const_poly_move

2024-04-27 Thread pan2 . li
From: Pan Li 

When we build with isl, there will be an ICE in graphite for both
C/C++ and Fortran.  The legitimize move cannot handle the RTL
below.

(set (subreg:DI (reg:TI 237) 8) (subreg:DI (const_poly_int:TI [4, 2]) 8))

Then we get an ICE similar to the one below:

internal compiler error: in extract_insn, at recog.cc:2812.

This patch handles the above RTL.  Given that the value of a
const_poly_int can hardly exceed the maximum of int64, we can simply
treat the highest 8 bytes of the TImode value as zero and set the dest
to (const_int 0).
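
The reasoning above can be sketched in plain C (an illustration under stated assumptions, not code from the patch). A const_poly_int such as [4, 2] stands for 4 + 2 * x, where x is only known at run time. The names `ti_value`, `mul_64x64` and `eval_poly_ti` are hypothetical, and the sketch only covers non-negative coefficients and a little-endian split of the 128-bit value.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a TImode value: two 64-bit halves, matching
   (subreg:DI ... 0) and (subreg:DI ... 8) on a little-endian target.  */
struct ti_value
{
  uint64_t lo; /* bytes 0..7 */
  uint64_t hi; /* bytes 8..15 */
};

/* Full 64x64 -> 128 bit unsigned multiply built from 32-bit pieces.  */
struct ti_value
mul_64x64 (uint64_t a, uint64_t b)
{
  uint64_t a_lo = a & 0xffffffffu, a_hi = a >> 32;
  uint64_t b_lo = b & 0xffffffffu, b_hi = b >> 32;
  uint64_t p0 = a_lo * b_lo;
  uint64_t p1 = a_lo * b_hi;
  uint64_t p2 = a_hi * b_lo;
  uint64_t p3 = a_hi * b_hi;
  uint64_t mid = (p0 >> 32) + (p1 & 0xffffffffu) + (p2 & 0xffffffffu);
  struct ti_value r;

  r.lo = (mid << 32) | (p0 & 0xffffffffu);
  r.hi = p3 + (p1 >> 32) + (p2 >> 32) + (mid >> 32);
  return r;
}

/* Evaluate the polynomial c0 + c1 * x as an unsigned 128-bit value.  */
struct ti_value
eval_poly_ti (uint64_t c0, uint64_t c1, uint64_t x)
{
  struct ti_value r = mul_64x64 (c1, x);
  uint64_t lo = r.lo + c0;

  r.hi += (lo < r.lo); /* carry out of the low half */
  r.lo = lo;
  return r;
}
```

For realistic inputs such as `eval_poly_ti (4, 2, 16)` the high half is 0, which is the case the patch relies on. The high half only becomes non-zero once the value no longer fits in 64 bits, for example `eval_poly_ti (0, 2, (uint64_t) 1 << 63)`.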

This patch fixes the test cases below.

C:
FAIL: gcc.dg/graphite/pr111878.c (internal compiler error: in
extract_insn, at recog.cc:2812)
FAIL: gcc.dg/graphite/pr111878.c (test for excess errors)

Fortran:
FAIL: gfortran.dg/graphite/vect-pr40979.f90   -O  (internal compiler
error: in extract_insn, at recog.cc:2812)
FAIL: gfortran.dg/graphite/pr29832.f90   -O3 -fomit-frame-pointer
-funroll-loops -fpeel-loops -ftracer -finline-functions  (internal
compiler error: in extract_insn, at recog.cc:2812)
FAIL: gfortran.dg/graphite/pr29581.f90   -O3 -g  (test for excess
errors)
FAIL: gfortran.dg/graphite/pr14741.f90   -O  (test for excess errors)
FAIL: gfortran.dg/graphite/pr29581.f90   -O3 -fomit-frame-pointer
-funroll-loops -fpeel-loops -ftracer -finline-functions  (test for
excess errors)
FAIL: gfortran.dg/graphite/vect-pr40979.f90   -O  (test for excess
errors)
FAIL: gfortran.dg/graphite/id-27.f90   -O  (internal compiler error: in
extract_insn, at recog.cc:2812)
FAIL: gfortran.dg/graphite/pr29832.f90   -O3 -g  (internal compiler
error: in extract_insn, at recog.cc:2812)
FAIL: gfortran.dg/graphite/pr29832.f90   -O3 -g  (test for excess
errors)
FAIL: gfortran.dg/graphite/id-27.f90   -O  (test for excess errors)
FAIL: gfortran.dg/graphite/pr29832.f90   -O3 -fomit-frame-pointer
-funroll-loops -fpeel-loops -ftracer -finline-functions  (test for
excess errors)
FAIL: gfortran.dg/graphite/pr29581.f90   -O3 -fomit-frame-pointer
-funroll-loops -fpeel-loops -ftracer -finline-functions  (internal
compiler error: in extract_insn, at recog.cc:2812)
FAIL: gfortran.dg/graphite/pr14741.f90   -O  (internal compiler error:
in extract_insn, at recog.cc:2812)
FAIL: gfortran.dg/graphite/pr29581.f90   -O3 -g  (internal compiler
error: in extract_insn, at recog.cc:2812)

The test suites below pass with this patch:
* The rv64gcv full regression test.
* The rv64gc full regression test.

I tried to write an RTL test for this, but it did not work well with the
existing test cases, so the failures above serve as the test cases.
Please note that graphite requires GCC to be built with isl.

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_legitimize_subreg_const_poly_move): New
function to handle a subreg of (const_poly_int:TI) at byte offset 8.
(riscv_legitimize_move): Handle a subreg of const_poly_int.

Signed-off-by: Pan Li 
---
 gcc/config/riscv/riscv.cc | 43 +++
 1 file changed, 43 insertions(+)

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 0519e0679ed..bad23ea487f 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -2786,6 +2786,44 @@ riscv_v_adjust_scalable_frame (rtx target, poly_int64 
offset, bool epilogue)
   REG_NOTES (insn) = dwarf;
 }
 
+/* Take care below subreg const_poly_int move:
+
+   1. (set (subreg:DI (reg:TI 237) 8)
+  (subreg:DI (const_poly_int:TI [4, 2]) 8))
+  =>
+  (set (subreg:DI (reg:TI 237) 8)
+  (const_int 0)) */
+
+static bool
+riscv_legitimize_subreg_const_poly_move (machine_mode mode, rtx dest, rtx src)
+{
+  gcc_assert (SUBREG_P (src) && CONST_POLY_INT_P (SUBREG_REG (src)));
+  gcc_assert (SUBREG_BYTE (src).is_constant ());
+
+  int byte_offset = SUBREG_BYTE (src).to_constant ();
+  rtx const_poly = SUBREG_REG (src);
+  machine_mode subreg_mode = GET_MODE (const_poly);
+
+  if (subreg_mode != TImode) /* Only TImode is needed for now.  */
+return false;
+
+  if (byte_offset == 8)
+{ /* The const_poly_int cannot exceed int64, just set zero here.  */
+  emit_move_insn (dest, CONST0_RTX (mode));
+  return true;
+}
+
+  /* The below transform will be covered in somewhere else.
+ Thus, ignore this here.
+   1. (set (subreg:DI (reg:TI 237) 0)
+  (subreg:DI (const_poly_int:TI [4, 2]) 0))
+  =>
+  (set (subreg:DI (reg:TI 237) 0)
+  (const_poly_int:DI [4, 2])) */
+
+  return false;
+}
+
 /* If (set DEST SRC) is not a valid move instruction, emit an equivalent
sequence that is valid.  */
 
@@ -2839,6 +2877,11 @@ riscv_legitimize_move (machine_mode mode, rtx dest, rtx 
src)
}
   return true;
 }
+
+  if (SUBREG_P (src) && CONST_POLY_INT_P (SUBREG_REG (src))
+&& riscv_legitimize_subreg_const_poly_move (mode, dest, src))
+return true;
+
   /* Expand
(set (reg:DI target) (subreg:DI (reg:V8QI reg) 0))
  Expand this data movement instead of simply forbid it since
-- 
2.34.1



Re: [PATCH v1] RISC-V: Fix ICE for legitimize move on subreg const_poly_move

2024-04-27 Thread juzhe.zh...@rivai.ai
LGTM from my side, but give Kito more time to chime in.



juzhe.zh...@rivai.ai
 
From: pan2.li
Date: 2024-04-28 11:53
To: gcc-patches
CC: juzhe.zhong; kito.cheng; Pan Li
Subject: [PATCH v1] RISC-V: Fix ICE for legitimize move on subreg 
const_poly_move
From: Pan Li 
 

[PATCH] [x86] Adjust alternative *k to ?k for avx512 mask in zero_extend patterns

2024-04-27 Thread liuhongt
With this change, when both the source operand and the destination
operand require AVX512 MASK_REGS, the RA can allocate a MASK_REGS
register instead of a GPR, which avoids reloading the value from a GPR
into MASK_REGS.  This is similar to what was done for the logic patterns.

Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
Ok for trunk?

gcc/ChangeLog:

* config/i386/i386.md (zero_extendsidi2): Adjust
alternative *k to ?k.
(zero_extenddi2): Ditto.
(*zero_extendsi2): Ditto.
(*zero_extendqihi2): Ditto.
---
 gcc/config/i386/i386.md   | 16 +++
 .../gcc.target/i386/zero_extendkmask.c| 43 +++
 2 files changed, 51 insertions(+), 8 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/zero_extendkmask.c

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index d4ce3809e6d..f2ab7fdcd58 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -4567,10 +4567,10 @@ (define_expand "zero_extendsidi2"
 
 (define_insn "*zero_extendsidi2"
   [(set (match_operand:DI 0 "nonimmediate_operand"
-   "=r,?r,?o,r   ,o,?*y,?!*y,$r,$v,$x,*x,*v,*r,*k")
+   "=r,?r,?o,r   ,o,?*y,?!*y,$r,$v,$x,*x,*v,?r,?k")
(zero_extend:DI
 (match_operand:SI 1 "x86_64_zext_operand"
-   "0 ,rm,r ,rmWz,0,r  ,m   ,v ,r ,m ,*x,*v,*k,*km")))]
+   "0 ,rm,r ,rmWz,0,r  ,m   ,v ,r ,m ,*x,*v,?k,?km")))]
   ""
 {
   switch (get_attr_type (insn))
@@ -4703,9 +4703,9 @@ (define_mode_attr kmov_isa
   [(QI "avx512dq") (HI "avx512f") (SI "avx512bw") (DI "avx512bw")])
 
 (define_insn "zero_extenddi2"
-  [(set (match_operand:DI 0 "register_operand" "=r,*r,*k")
+  [(set (match_operand:DI 0 "register_operand" "=r,?r,?k")
(zero_extend:DI
-(match_operand:SWI12 1 "nonimmediate_operand" "m,*k,*km")))]
+(match_operand:SWI12 1 "nonimmediate_operand" "m,?k,?km")))]
   "TARGET_64BIT"
   "@
movz{l|x}\t{%1, %k0|%k0, %1}
@@ -4758,9 +4758,9 @@ (define_insn_and_split "zero_extendsi2_and"
(set_attr "mode" "SI")])
 
 (define_insn "*zero_extendsi2"
-  [(set (match_operand:SI 0 "register_operand" "=r,*r,*k")
+  [(set (match_operand:SI 0 "register_operand" "=r,?r,?k")
(zero_extend:SI
- (match_operand:SWI12 1 "nonimmediate_operand" "m,*k,*km")))]
+ (match_operand:SWI12 1 "nonimmediate_operand" "m,?k,?km")))]
   "!(TARGET_ZERO_EXTEND_WITH_AND && optimize_function_for_speed_p (cfun))"
   "@
movz{l|x}\t{%1, %0|%0, %1}
@@ -4813,8 +4813,8 @@ (define_insn_and_split "zero_extendqihi2_and"
 
 ; zero extend to SImode to avoid partial register stalls
 (define_insn "*zero_extendqihi2"
-  [(set (match_operand:HI 0 "register_operand" "=r,*r,*k")
-   (zero_extend:HI (match_operand:QI 1 "nonimmediate_operand" 
"qm,*k,*km")))]
+  [(set (match_operand:HI 0 "register_operand" "=r,?r,?k")
+   (zero_extend:HI (match_operand:QI 1 "nonimmediate_operand" 
"qm,?k,?km")))]
   "!(TARGET_ZERO_EXTEND_WITH_AND && optimize_function_for_speed_p (cfun))"
   "@
movz{bl|x}\t{%1, %k0|%k0, %1}
diff --git a/gcc/testsuite/gcc.target/i386/zero_extendkmask.c 
b/gcc/testsuite/gcc.target/i386/zero_extendkmask.c
new file mode 100644
index 000..6b18980bbd1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/zero_extendkmask.c
@@ -0,0 +1,43 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-march=x86-64-v4 -O2" } */
+/* { dg-final { scan-assembler-not {(?n)shr[bwl]} } } */
+/* { dg-final { scan-assembler-not {(?n)movz[bw]} } } */
+
+#include
+
+__m512
+foo (__m512d a, __m512d b, __m512 c, __m512 d)
+{
+  return _mm512_mask_mov_ps (c, (__mmask16) (_mm512_cmpeq_pd_mask (a, b) >> 
1), d);
+}
+
+
+__m512i
+foo1 (__m512d a, __m512d b, __m512i c, __m512i d)
+{
+  return _mm512_mask_mov_epi16 (c, (__mmask32) (_mm512_cmpeq_pd_mask (a, b) >> 
1), d);
+}
+
+__m512i
+foo2 (__m512d a, __m512d b, __m512i c, __m512i d)
+{
+  return _mm512_mask_mov_epi8 (c, (__mmask64) (_mm512_cmpeq_pd_mask (a, b) >> 
1), d);
+}
+
+__m512i
+foo3 (__m512 a, __m512 b, __m512i c, __m512i d)
+{
+  return _mm512_mask_mov_epi16 (c, (__mmask32) (_mm512_cmpeq_ps_mask (a, b) >> 
1), d);
+}
+
+__m512i
+foo4 (__m512 a, __m512 b, __m512i c, __m512i d)
+{
+  return _mm512_mask_mov_epi8 (c, (__mmask64) (_mm512_cmpeq_ps_mask (a, b) >> 
1), d);
+}
+
+__m512i
+foo5 (__m512i a, __m512i b, __m512i c, __m512i d)
+{
+  return _mm512_mask_mov_epi8 (c, (__mmask64) (_mm512_cmp_epi16_mask (a, b, 5) 
>> 1), d);
+}
-- 
2.31.1



[PATCH] Update libbid according to the latest Intel Decimal Floating-Point Math Library.

2024-04-27 Thread liuhongt
The Intel Decimal Floating-Point Math Library is available as open-source on 
Netlib[1].

[1] https://www.netlib.org/misc/intel/.

Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
Ok for trunk?

libgcc/config/libbid/ChangeLog:

* bid128_fma.c (add_and_round): Fix bug: the result
of (+5E+368)*(+10E-34)+(-10E+369) was returning
-99E+336 instead of expected
result -10E+337.
(bid128_ext_fma): Ditto.
(bid64qqq_fma): Ditto.
* bid128_noncomp.c: Change return type of bid128_class from
int to class_t.
* bid128_round_integral.c: Add default case to avoid compiler
warning.
* bid128_string.c (bid128_to_string): Replace 0x30 with '0'
for zero digit.
(bid128_from_string): Ditto.
* bid32_to_bid128.c (bid128_to_bid32): Fix bug.  In addition
to the INEXACT flag, the UNDERFLOW flag needs to be set (and
was not) when converting an input such as
+6931674235302037148946035460357709E+1857 to +100E-101
* bid32_to_bid64.c (bid64_to_bid32): Fix bug.  In addition to
the INEXACT flag, the UNDERFLOW flag needs to be set (and was
not) when converting an input such as +9991E-111
to +100E-101. Furthermore, significant bits of NaNs are
set correctly now. For example,  0x7c3b9aca was
returning 0x7c02 instead of 0x 7c000100.
* bid64_noncomp.c: Change return type of bid64_class from int
to class_t.
* bid64_round_integral.c (bid64_round_integral_exact): Add
default case to avoid compiler warning.
* bid64_string.c (bid64_from_string): Fix bug for rounding
up. The input string "1" was returning
+1001E+1 instead of +1000E+1.
* bid64_to_bid128.c (bid128_to_bid64): Fix bug, in addition to
the INEXACT flag, the UNDERFLOW flag needs to be set (and was
not) when converting an input such as
+99E-417 to
+1000E-398.
* bid_binarydecimal.c (bid32_to_binary64): Fix bug for
conversion between binary and bid types. For example,
0x7c0F4240 was returning 0x7FFFA120 instead of
expected double precision 0x7FF8.
(binary64_to_bid32): Ditto.
(binary80_to_bid32): Ditto.
(binary128_to_bid32): Ditto.
(binary80_to_bid64): Ditto.
(binary128_to_bid64): Ditto.
* bid_conf.h (BID_HIGH_128W): New macro.
(BID_LOW_128W): Ditto.
* bid_functions.h (__ENABLE_BINARY80__): Ditto.
(ALIGN): Ditto.
* bid_inline_add.h (get_add128): Add default case to avoid compiler
warning.
* bid_internal.h (get_BID64): Ditto.
(fast_get_BID64_check_OF): Ditto.
(ALIGN): New macro.

Co-authored-by: Anderson, Cristina S 
Co-authored-by: Akkas, Ahmet 
Co-authored-by: Cornea, Marius 
---
 libgcc/config/libbid/bid128_fma.c| 188 ++-
 libgcc/config/libbid/bid128_noncomp.c|   2 +-
 libgcc/config/libbid/bid128_round_integral.c |   2 +
 libgcc/config/libbid/bid128_string.c |   7 +-
 libgcc/config/libbid/bid32_to_bid128.c   |   3 -
 libgcc/config/libbid/bid32_to_bid64.c|  11 +-
 libgcc/config/libbid/bid64_noncomp.c |   2 +-
 libgcc/config/libbid/bid64_round_integral.c  |   2 +
 libgcc/config/libbid/bid64_string.c  |  21 ++-
 libgcc/config/libbid/bid64_to_bid128.c   |   3 -
 libgcc/config/libbid/bid_binarydecimal.c | 167 ++--
 libgcc/config/libbid/bid_conf.h  |   8 +
 libgcc/config/libbid/bid_functions.h |  23 ++-
 libgcc/config/libbid/bid_inline_add.h|   2 +
 libgcc/config/libbid/bid_internal.h  |  17 +-
 15 files changed, 220 insertions(+), 238 deletions(-)

diff --git a/libgcc/config/libbid/bid128_fma.c 
b/libgcc/config/libbid/bid128_fma.c
index 67233193a42..cbcf225546f 100644
--- a/libgcc/config/libbid/bid128_fma.c
+++ b/libgcc/config/libbid/bid128_fma.c
@@ -417,13 +417,12 @@ add_and_round (int q3,
   R128.w[1] = R256.w[1];
   R128.w[0] = R256.w[0];
 }
+if (e4 + x0 < expmin) { // for all rounding modes
+  is_tiny = 1;
+}
 // the rounded result has p34 = 34 digits
 e4 = e4 + x0 + incr_exp;
-if (rnd_mode == ROUNDING_TO_NEAREST) {
-  if (e4 < expmin) {
-is_tiny = 1; // for other rounding modes apply correction
-  }
-} else {
+if (rnd_mode != ROUNDING_TO_NEAREST) {
   // for RM, RP, RZ, RA apply correction in order to determine tininess
   // but do not save the result; apply the correction to 
   // (-1)^p_sign * significand * 10^0
@@ -434,10 +433,6 @@ add_and_round (int q3,
   is_inexact_gt_midpoint, is_midpoint_lt_even,
   i

[PATCH 2/2] Extend usdot_prodv*qi with vpmaddwd when AVXVNNI/AVX512VNNI is not available.

2024-04-27 Thread liuhongt
Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
Ready to push to trunk.

gcc/ChangeLog:

* config/i386/sse.md (usdot_prodv*qi): Extend to VI1_AVX512
with vpmaddwd when avxvnni/avx512vnni is not available.
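
As a rough scalar illustration of the fallback (not code from the patch), the sketch below widens the unsigned and signed bytes to 16 bits and then uses a pmaddwd-style pairwise multiply-add. All function names are hypothetical. It assumes, as I understand the dot-product optabs, that only the sum over all result lanes has to match, since the emulated lane layout differs from the vpdpbusd one.

```c
#include <assert.h>
#include <stdint.h>

/* Reference semantics with the vpdpbusd lane layout: each 32-bit lane
   accumulates four unsigned-by-signed byte products.  */
void
usdot_ref (const uint8_t *a, const int8_t *b, const int32_t *acc, int32_t *out)
{
  for (int i = 0; i < 4; i++)
    {
      int32_t s = acc[i];
      for (int j = 0; j < 4; j++)
        s += (int32_t) a[4 * i + j] * (int32_t) b[4 * i + j];
      out[i] = s;
    }
}

/* Model of a signed 16-bit dot product (pmaddwd plus accumulate): each
   32-bit lane gets the sum of two adjacent 16-bit products.  */
void
sdot_v8hi (const int16_t *x, const int16_t *y, const int32_t *acc, int32_t *out)
{
  for (int i = 0; i < 4; i++)
    out[i] = acc[i] + (int32_t) x[2 * i] * y[2 * i]
             + (int32_t) x[2 * i + 1] * y[2 * i + 1];
}

/* Emulation: unpack low and high halves to 16 bits, run two 16-bit dot
   products (one with a zero accumulator, one with ACC), add the results.  */
void
usdot_emulated (const uint8_t *a, const int8_t *b, const int32_t *acc,
                int32_t *out)
{
  int16_t a_lo[8], a_hi[8], b_lo[8], b_hi[8];
  int32_t zero[4] = { 0, 0, 0, 0 }, res1[4], res2[4];

  for (int i = 0; i < 8; i++)
    {
      a_lo[i] = a[i];      /* zero-extend the unsigned operand */
      a_hi[i] = a[i + 8];
      b_lo[i] = b[i];      /* sign-extend the signed operand */
      b_hi[i] = b[i + 8];
    }
  sdot_v8hi (a_lo, b_lo, zero, res1);
  sdot_v8hi (a_hi, b_hi, acc, res2);
  for (int i = 0; i < 4; i++)
    out[i] = res1[i] + res2[i];
}

/* Sum of the four 32-bit lanes.  */
int32_t
lane_sum (const int32_t *v)
{
  return v[0] + v[1] + v[2] + v[3];
}

/* Return 1 if reference and emulation agree on the total over all lanes.  */
int
usdot_totals_agree (void)
{
  uint8_t a[16];
  int8_t b[16];
  int32_t acc[4] = { 1, -2, 3, -4 };
  int32_t ref[4], emu[4];

  for (int i = 0; i < 16; i++)
    {
      a[i] = (uint8_t) (200 + i);   /* above 127, exercises zero-extension */
      b[i] = (int8_t) (7 * i - 50); /* mix of negative and positive values */
    }
  usdot_ref (a, b, acc, ref);
  usdot_emulated (a, b, acc, emu);
  return lane_sum (ref) == lane_sum (emu);
}
```

The individual lanes of `ref` and `emu` generally differ; only the reduction over all lanes is compared here.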
---
 gcc/config/i386/sse.md | 55 +++---
 1 file changed, 41 insertions(+), 14 deletions(-)

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 1bf50726e83..f57f36ae380 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -29955,21 +29955,48 @@ (define_insn "vpshldv__maskz_1"
 
 (define_expand "usdot_prod"
   [(match_operand: 0 "register_operand")
-   (match_operand:VI1_AVX512VNNI 1 "register_operand")
-   (match_operand:VI1_AVX512VNNI 2 "register_operand")
+   (match_operand:VI1_AVX512 1 "register_operand")
+   (match_operand:VI1_AVX512 2 "register_operand")
(match_operand: 3 "register_operand")]
-  "(( == 64 && TARGET_EVEX512)
-|| ((TARGET_AVX512VNNI && TARGET_AVX512VL)
-   || TARGET_AVXVNNI))"
-{
-  operands[1] = lowpart_subreg (mode,
-   force_reg (mode, operands[1]),
-   mode);
-  operands[2] = lowpart_subreg (mode,
-   force_reg (mode, operands[2]),
-   mode);
-  emit_insn (gen_vpdpbusd_ (operands[0], operands[3],
- operands[1], operands[2]));
+  "TARGET_SSE2"
+{
+  if ( == 64
+ ? TARGET_AVX512VNNI
+ : ((TARGET_AVX512VNNI && TARGET_AVX512VL) || TARGET_AVXVNNI))
+{
+  operands[1] = lowpart_subreg (mode,
+   force_reg (mode, operands[1]),
+   mode);
+  operands[2] = lowpart_subreg (mode,
+   force_reg (mode, operands[2]),
+   mode);
+  emit_insn (gen_vpdpbusd_ (operands[0], operands[3],
+ operands[1], operands[2]));
+}
+  else
+{
+  /* Emulate with vpdpwssd.  */
+  rtx op1_lo = gen_reg_rtx (mode);
+  rtx op1_hi = gen_reg_rtx (mode);
+  rtx op2_lo = gen_reg_rtx (mode);
+  rtx op2_hi = gen_reg_rtx (mode);
+
+  emit_insn (gen_vec_unpacku_lo_ (op1_lo, operands[1]));
+  emit_insn (gen_vec_unpacks_lo_ (op2_lo, operands[2]));
+  emit_insn (gen_vec_unpacku_hi_ (op1_hi, operands[1]));
+  emit_insn (gen_vec_unpacks_hi_ (op2_hi, operands[2]));
+
+  rtx res1 = gen_reg_rtx (mode);
+  rtx res2 = gen_reg_rtx (mode);
+  rtx sum = gen_reg_rtx (mode);
+
+  emit_move_insn (sum, CONST0_RTX (mode));
+  emit_insn (gen_sdot_prod (res1, op1_lo,
+   op2_lo, sum));
+  emit_insn (gen_sdot_prod (res2, op1_hi,
+   op2_hi, operands[3]));
+  emit_insn (gen_add3 (operands[0], res1, res2));
+}
   DONE;
 })
 
-- 
2.31.1



[PATCH 1/2] [x86] Support dot_prod optabs for 64-bit vector.

2024-04-27 Thread liuhongt
Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
Ready to push to trunk.

gcc/ChangeLog:

PR target/113079
* config/i386/mmx.md (usdot_prodv8qi): New expander.
(sdot_prodv8qi): Ditto.
(udot_prodv8qi): Ditto.
(usdot_prodv4hi): Ditto.
(udot_prodv4hi): Ditto.
(sdot_prodv4hi): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr113079.c: New test.
* gcc.target/i386/pr113079-2.c: New test.
* gcc.target/i386/sse4-pr113079-2.c: New test.
---
 gcc/config/i386/mmx.md| 195 ++
 gcc/testsuite/gcc.target/i386/pr113079-2.c| 161 +++
 gcc/testsuite/gcc.target/i386/pr113079.c  |  57 +
 .../gcc.target/i386/sse4-pr113079-2.c | 158 ++
 4 files changed, 571 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr113079-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr113079.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse4-pr113079-2.c

diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index 9a8d6030d8b..5f342497885 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -6342,6 +6342,201 @@ (define_expand "usadv8qi"
   DONE;
 })
 
+(define_expand "usdot_prodv8qi"
+  [(match_operand:V2SI 0 "register_operand")
+   (match_operand:V8QI 1 "register_operand")
+   (match_operand:V8QI 2 "register_operand")
+   (match_operand:V2SI 3 "register_operand")]
+  "TARGET_MMX_WITH_SSE && TARGET_SSE4_1"
+{
+  operands[1] = force_reg (V8QImode, operands[1]);
+  operands[2] = force_reg (V8QImode, operands[2]);
+  operands[3] = force_reg (V2SImode, operands[3]);
+
+  if ((TARGET_AVX512VNNI && TARGET_AVX512VL)
+ || TARGET_AVXVNNI)
+{
+  rtx op1 = lowpart_subreg (V16QImode, operands[1], V8QImode);
+  rtx op2 = lowpart_subreg (V16QImode, operands[2], V8QImode);
+  rtx op3 = lowpart_subreg (V4SImode, operands[3], V2SImode);
+  rtx op0 = gen_reg_rtx (V4SImode);
+
+  emit_insn (gen_usdot_prodv16qi (op0, op1, op2, op3));
+  emit_move_insn (operands[0], lowpart_subreg (V2SImode, op0, V4SImode));
+ }
+   else
+ {
+  rtx op1 = gen_reg_rtx (V8HImode);
+  rtx op2 = gen_reg_rtx (V8HImode);
+  rtx op3 = gen_reg_rtx (V4SImode);
+  rtx op0 = gen_reg_rtx (V4SImode);
+  rtx op0_1 = gen_reg_rtx (V4SImode);
+
+  emit_move_insn (op3, CONST0_RTX (V4SImode));
+  emit_insn (gen_zero_extendv8qiv8hi2 (op1, operands[1]));
+  emit_insn (gen_extendv8qiv8hi2 (op2, operands[2]));
+  emit_insn (gen_sdot_prodv8hi (op0, op1, op2, op3));
+
+  /* vec_perm (op0, 2, 3, 0, 1);  */
+  emit_insn (gen_sse2_pshufd (op0_1, op0, GEN_INT (78)));
+  emit_insn (gen_addv4si3 (op0, op0, op0_1));
+  emit_insn (gen_addv2si3 (operands[0], operands[3],
+  lowpart_subreg (V2SImode, op0, V4SImode)));
+ }
+DONE;
+})
+
+(define_expand "sdot_prodv8qi"
+  [(match_operand:V2SI 0 "register_operand")
+   (match_operand:V8QI 1 "register_operand")
+   (match_operand:V8QI 2 "register_operand")
+   (match_operand:V2SI 3 "register_operand")]
+  "TARGET_MMX_WITH_SSE && TARGET_SSE4_1"
+{
+  operands[1] = force_reg (V8QImode, operands[1]);
+  operands[2] = force_reg (V8QImode, operands[2]);
+  operands[3] = force_reg (V2SImode, operands[3]);
+
+  if (TARGET_AVXVNNIINT8)
+{
+  rtx op1 = lowpart_subreg (V16QImode, operands[1], V8QImode);
+  rtx op2 = lowpart_subreg (V16QImode, operands[2], V8QImode);
+  rtx op3 = lowpart_subreg (V4SImode, operands[3], V2SImode);
+  rtx op0 = gen_reg_rtx (V4SImode);
+
+  emit_insn (gen_sdot_prodv16qi (op0, op1, op2, op3));
+  emit_move_insn (operands[0], lowpart_subreg (V2SImode, op0, V4SImode));
+}
+  else
+{
+  rtx op1 = gen_reg_rtx (V8HImode);
+  rtx op2 = gen_reg_rtx (V8HImode);
+  rtx op3 = gen_reg_rtx (V4SImode);
+  rtx op0 = gen_reg_rtx (V4SImode);
+  rtx op0_1 = gen_reg_rtx (V4SImode);
+
+  emit_move_insn (op3, CONST0_RTX (V4SImode));
+  emit_insn (gen_extendv8qiv8hi2 (op1, operands[1]));
+  emit_insn (gen_extendv8qiv8hi2 (op2, operands[2]));
+  emit_insn (gen_sdot_prodv8hi (op0, op1, op2, op3));
+
+  /* vec_perm (op0, 2, 3, 0, 1);  */
+  emit_insn (gen_sse2_pshufd (op0_1, op0, GEN_INT (78)));
+  emit_insn (gen_addv4si3 (op0, op0, op0_1));
+  emit_insn (gen_addv2si3 (operands[0], operands[3],
+  lowpart_subreg (V2SImode, op0, V4SImode)));
+}
+  DONE;
+
+})
+
+(define_expand "udot_prodv8qi"
+  [(match_operand:V2SI 0 "register_operand")
+   (match_operand:V8QI 1 "register_operand")
+   (match_operand:V8QI 2 "register_operand")
+   (match_operand:V2SI 3 "register_operand")]
+  "TARGET_MMX_WITH_SSE && TARGET_SSE4_1"
+{
+  operands[1] = force_reg (V8QImode, operands[1]);
+  operands[2] = force_reg (V8QImode, operands[2]);
+  operands[3] = force_reg (V2SImode, operands[3]);
+
+  if (TARGET_AVXVNNIINT8)
+{
+  rtx op1 =

[PATCH] [x86] Optimize 64-bit vector permutation with punpcklqdq + 128-bit vector pshuf.

2024-04-27 Thread liuhongt
Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}
Ready to push to trunk.

gcc/ChangeLog:

PR target/113090
* config/i386/i386-expand.cc
(expand_vec_perm_punpckldq_pshuf): New function.
(ix86_expand_vec_perm_const_1): Try
expand_vec_perm_punpckldq_pshuf for sequence of 2
instructions.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr113090.c: New test.
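
The idea named in the subject can be sketched on plain arrays (an illustration with hypothetical names, not the expander itself): the two 64-bit inputs are first concatenated into one wider vector, and a two-operand permutation of the inputs then becomes a one-operand shuffle of that wider vector, of which only the low half is kept.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_N 8 /* at most 8 elements per 64-bit vector (V8QI) */

/* Reference two-operand permute: out[i] = concat (a, b)[perm[i]].  */
void
perm2_ref (const int *a, const int *b, const unsigned *perm, size_t n, int *out)
{
  for (size_t i = 0; i < n; i++)
    out[i] = perm[i] < n ? a[perm[i]] : b[perm[i] - n];
}

/* Same result via concatenation plus a one-operand shuffle.
   Returns 0 on success, -1 if N is too large for the scratch buffers.  */
int
perm2_via_concat (const int *a, const int *b, const unsigned *perm, size_t n,
                  int *out)
{
  int wide[2 * MAX_N], shuffled[2 * MAX_N];
  unsigned wide_perm[2 * MAX_N];

  if (n > MAX_N)
    return -1;

  /* Step 1: concatenate A and B into one double-width vector.  */
  for (size_t i = 0; i < n; i++)
    {
      wide[i] = a[i];
      wide[i + n] = b[i];
    }

  /* Step 2: reuse the original indices for both halves of the wide
     permutation; the upper half of the result is never read.  */
  for (size_t i = 0; i < n; i++)
    {
      wide_perm[i] = perm[i];
      wide_perm[i + n] = perm[i];
    }
  for (size_t i = 0; i < 2 * n; i++)
    shuffled[i] = wide[wide_perm[i]];

  /* Step 3: keep only the low half.  */
  for (size_t i = 0; i < n; i++)
    out[i] = shuffled[i];
  return 0;
}

/* Return 1 if both variants agree on the indices used by foo1 in the
   new test, { 2, 3, 4, 5 }.  */
int
perm_demo_matches (void)
{
  int a[4] = { 10, 11, 12, 13 };
  int b[4] = { 20, 21, 22, 23 };
  unsigned perm[4] = { 2, 3, 4, 5 };
  int ref[4], got[4];

  perm2_ref (a, b, perm, 4, ref);
  if (perm2_via_concat (a, b, perm, 4, got) != 0)
    return 0;
  for (int i = 0; i < 4; i++)
    if (ref[i] != got[i])
      return 0;
  return got[0] == 12 && got[1] == 13 && got[2] == 20 && got[3] == 21;
}
```

Because the indices of a two-operand permutation already address the concatenation of both inputs, they can be reused unchanged for the wide shuffle.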
---
 gcc/config/i386/i386-expand.cc   | 71 
 gcc/testsuite/gcc.target/i386/pr113090.c | 25 +
 2 files changed, 96 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr113090.c

diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
index 8bb8f21e686..fd49d866004 100644
--- a/gcc/config/i386/i386-expand.cc
+++ b/gcc/config/i386/i386-expand.cc
@@ -20813,6 +20813,74 @@ expand_vec_perm_pshuflw_pshufhw (struct 
expand_vec_perm_d *d)
   return true;
 }
 
+/* Try to permute 2 64-bit vectors by punpckldq + 128-bit vector shuffle.  */
+static bool
+expand_vec_perm_punpckldq_pshuf (struct expand_vec_perm_d *d)
+{
+  if (GET_MODE_BITSIZE (d->vmode) != 64
+  || !TARGET_MMX_WITH_SSE
+  || d->one_operand_p)
+return false;
+
+  machine_mode widen_vmode;
+  switch (d->vmode)
+{
+/* pshufd.  */
+case E_V2SImode:
+  widen_vmode = V4SImode;
+  break;
+
+/* pshufd.  */
+case E_V2SFmode:
+  widen_vmode = V4SFmode;
+  break;
+
+case E_V4HImode:
+  widen_vmode = V8HImode;
+  /* pshufb.  */
+  if (!TARGET_SSSE3)
+   return false;
+  break;
+
+case E_V8QImode:
+  /* pshufb.  */
+  widen_vmode = V16QImode;
+  if (!TARGET_SSSE3)
+   return false;
+  break;
+
+default:
+  return false;
+}
+
+  if (d->testing_p)
+return true;
+
+  struct expand_vec_perm_d dperm;
+  dperm.target = gen_reg_rtx (widen_vmode);
+  rtx op0 = gen_reg_rtx (widen_vmode);
+  emit_move_insn (op0, gen_rtx_VEC_CONCAT (widen_vmode, d->op0, d->op1));
+  dperm.op0 = op0;
+  dperm.op1 = op0;
+  dperm.vmode = widen_vmode;
+  unsigned nelt = GET_MODE_NUNITS (widen_vmode);
+  dperm.nelt = nelt;
+  dperm.one_operand_p = true;
+  dperm.testing_p = false;
+
+  for (unsigned i = 0; i != nelt / 2; i++)
+{
+  dperm.perm[i] = d->perm[i];
+  dperm.perm[i + nelt / 2] = d->perm[i];
+}
+
+  gcc_assert (expand_vec_perm_1 (&dperm));
+  emit_move_insn (d->target, lowpart_subreg (d->vmode,
+dperm.target,
+dperm.vmode));
+  return true;
+}
+
 /* A subroutine of ix86_expand_vec_perm_const_1.  Try to simplify
the permutation using the SSSE3 palignr instruction.  This succeeds
when all of the elements in PERM fit within one vector and we merely
@@ -23325,6 +23393,9 @@ ix86_expand_vec_perm_const_1 (struct expand_vec_perm_d 
*d)
   if (expand_vec_perm_shufps_shufps (d))
 return true;
 
+  if (expand_vec_perm_punpckldq_pshuf (d))
+return true;
+
   /* Try sequences of three instructions.  */
 
   if (expand_vec_perm_even_odd_pack (d))
diff --git a/gcc/testsuite/gcc.target/i386/pr113090.c 
b/gcc/testsuite/gcc.target/i386/pr113090.c
new file mode 100644
index 000..0f0b7cc0084
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr113090.c
@@ -0,0 +1,25 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O2 -msse4.1" } */
+/* { dg-final { scan-assembler-times "pshufd" 3 } } */
+
+typedef int v2si __attribute__((vector_size(8)));
+typedef short v4hi __attribute__((vector_size(8)));
+typedef char v8qi __attribute__((vector_size(8)));
+
+v2si
+foo (v2si a, v2si b)
+{
+return __builtin_shufflevector (a, b, 1, 2);
+}
+
+v4hi
+foo1 (v4hi a, v4hi b)
+{
+  return __builtin_shufflevector (a, b, 2, 3, 4, 5);
+}
+
+v8qi
+foo2 (v8qi a, v8qi b)
+{
+  return __builtin_shufflevector (a, b, 4, 5, 6, 7, 8, 9, 10, 11);
+}
-- 
2.31.1



[PATCH 1/2] MATCH: change single_non_singleton_phi_for_edges for singleton phis

2024-04-27 Thread Andrew Pinski
I noticed that single_non_singleton_phi_for_edges could
return a phi whose entries are all the same for the edges.
This happens only if there was a single phi in the first place.
Also, gimple_seq_singleton_p walks the sequence to see if it has exactly
one element, so removing that check actually
reduces the number of pointer walks needed.

Bootstrapped and tested on x86_64-linux-gnu with no regressions.

gcc/ChangeLog:

* tree-ssa-phiopt.cc (single_non_singleton_phi_for_edges):
Remove the special case of gimple_seq_singleton_p.

Signed-off-by: Andrew Pinski 
---
 gcc/tree-ssa-phiopt.cc | 8 
 1 file changed, 8 deletions(-)

diff --git a/gcc/tree-ssa-phiopt.cc b/gcc/tree-ssa-phiopt.cc
index d1746c4b468..f1e07502b02 100644
--- a/gcc/tree-ssa-phiopt.cc
+++ b/gcc/tree-ssa-phiopt.cc
@@ -62,14 +62,6 @@ single_non_singleton_phi_for_edges (gimple_seq seq, edge e0, 
edge e1)
 {
   gimple_stmt_iterator i;
   gphi *phi = NULL;
-  if (gimple_seq_singleton_p (seq))
-{
-  phi = as_a  (gsi_stmt (gsi_start (seq)));
-  /* Never return virtual phis.  */
-  if (virtual_operand_p (gimple_phi_result (phi)))
-   return NULL;
-  return phi;
-}
   for (i = gsi_start (seq); !gsi_end_p (i); gsi_next (&i))
 {
   gphi *p = as_a  (gsi_stmt (i));
-- 
2.43.0



[PATCH 2/2] PHI-OPT: speed up value_replacement slightly

2024-04-27 Thread Andrew Pinski
This adds a few early outs to value_replacement that I noticed
while rewriting it to use match-and-simplify but that could be committed
separately.
* virtual operands won't change so return early for them
* special case `A ? B : B` as that is already just `B`

This also moves the check for NE/EQ earlier, as calculating
empty_or_with_defined_p is an IR walk over a BB, which might be big.

Bootstrapped and tested on x86_64-linux-gnu with no regressions.

gcc/ChangeLog:

* tree-ssa-phiopt.cc (value_replacement): Move check for
NE/EQ earlier.

Signed-off-by: Andrew Pinski 
---
 gcc/tree-ssa-phiopt.cc | 22 +++---
 1 file changed, 15 insertions(+), 7 deletions(-)

diff --git a/gcc/tree-ssa-phiopt.cc b/gcc/tree-ssa-phiopt.cc
index f1e07502b02..a2bdcb5eae8 100644
--- a/gcc/tree-ssa-phiopt.cc
+++ b/gcc/tree-ssa-phiopt.cc
@@ -1131,6 +1131,21 @@ value_replacement (basic_block cond_bb, basic_block 
middle_bb,
   enum tree_code code;
   bool empty_or_with_defined_p = true;
 
+  /* Virtual operands don't need to be handled. */
+  if (virtual_operand_p (arg1))
+return 0;
+
+  /* Special case A ? B : B as this will always simplify to B. */
+  if (operand_equal_for_phi_arg_p (arg0, arg1))
+return 0;
+
+  gcond *cond = as_a  (*gsi_last_bb (cond_bb));
+  code = gimple_cond_code (cond);
+
+  /* This transformation is only valid for equality comparisons.  */
+  if (code != NE_EXPR && code != EQ_EXPR)
+return 0;
+
   /* If the type says honor signed zeros we cannot do this
  optimization.  */
   if (HONOR_SIGNED_ZEROS (arg1))
@@ -1161,13 +1176,6 @@ value_replacement (basic_block cond_bb, basic_block 
middle_bb,
empty_or_with_defined_p = false;
 }
 
-  gcond *cond = as_a  (*gsi_last_bb (cond_bb));
-  code = gimple_cond_code (cond);
-
-  /* This transformation is only valid for equality comparisons.  */
-  if (code != NE_EXPR && code != EQ_EXPR)
-return 0;
-
   /* We need to know which is the true edge and which is the false
   edge so that we know if have abs or negative abs.  */
   extract_true_false_edges_from_block (cond_bb, &true_edge, &false_edge);
-- 
2.43.0