Re: [PATCH] i386: Improve ix86_expand_int_movcc [PR105338]

2022-04-23 Thread Uros Bizjak via Gcc-patches
On Sat, Apr 23, 2022 at 8:53 AM Jakub Jelinek  wrote:
>
> Hi!
>
> The following testcase regressed on x86_64 on the trunk, due to some GIMPLE
> pass changes (r12-7687) we end up with an *.optimized dump difference of:
> @@ -8,14 +8,14 @@ int foo (int i)
>
>    <bb 2> [local count: 1073741824]:
>    if (i_2(D) != 0)
> -    goto <bb 4>; [35.00%]
> +    goto <bb 3>; [35.00%]
>    else
> -    goto <bb 3>; [65.00%]
> +    goto <bb 4>; [65.00%]
>
> -  <bb 3> [local count: 697932184]:
> +  <bb 3> [local count: 375809640]:
>
>    <bb 4> [local count: 1073741824]:
> -  # iftmp.0_1 = PHI <5(2), i_2(D)(3)>
> +  # iftmp.0_1 = PHI <5(3), i_2(D)(2)>
>    return iftmp.0_1;
>
>  }
> and similarly for the other functions.  That is functionally equivalent and
> there is no canonical form for those.  The reason for i_2(D) in the PHI
> argument as opposed to 0 is the uncprop pass, that is in many cases
> beneficial for expansion as we don't need to load the value into some pseudo
> in one of the if blocks.
> Now, for the 11.x ordering we have the pseudo = i insn in the extended basic
> block (it comes first) and so fwprop1 undoes what uncprop does by
> propagating constant 0 there.  But for the 12.x ordering, the extended basic
> block contains pseudo = 5 and pseudo = i is in the other bb and so fwprop1
> doesn't change it.
> During the ce1 pass, we attempt to emit a conditional move and we have very
> nice code for the cases where both last operands of ?: are constant, and yet
> another for !TARGET_CMOVE if at least one of them is.
>
> The following patch will undo the uncprop behavior during
> ix86_expand_int_movcc, but just for those spots that can benefit from both
> or at least one operands being constant, leaving the pure cmov case as is
> (because then it is useful not to have to load a constant into a pseudo
> as it already is in one).  We can do that in the
> op0 == op1 ? op0 : op3
> or
> op0 != op1 ? op2 : op0
> cases if op1 is a CONST_INT by pretending it is
> op0 == op1 ? op1 : op3
> or
> op0 != op1 ? op2 : op1
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> 2022-04-23  Jakub Jelinek  
>
> PR target/105338
> * config/i386/i386-expand.cc (ix86_expand_int_movcc): Handle
> op0 == cst1 ? op0 : op3 like op0 == cst1 ? cst1 : op3 for the non-cmov
> cases.
>
> * gcc.target/i386/pr105338.c: New test.

OK.

Thanks,
Uros.

>
> --- gcc/config/i386/i386-expand.cc.jj   2022-04-13 15:42:39.0 +0200
> +++ gcc/config/i386/i386-expand.cc  2022-04-22 14:18:27.347135185 +0200
> @@ -3136,6 +3136,8 @@ ix86_expand_int_movcc (rtx operands[])
>bool sign_bit_compare_p = false;
>rtx op0 = XEXP (operands[1], 0);
>rtx op1 = XEXP (operands[1], 1);
> +  rtx op2 = operands[2];
> +  rtx op3 = operands[3];
>
>if (GET_MODE (op0) == TImode
>|| (GET_MODE (op0) == DImode
> @@ -3153,17 +3155,29 @@ ix86_expand_int_movcc (rtx operands[])
>|| (op1 == constm1_rtx && (code == GT || code == LE)))
>  sign_bit_compare_p = true;
>
> +  /* op0 == op1 ? op0 : op3 is equivalent to op0 == op1 ? op1 : op3,
> + but if op1 is a constant, the latter form allows more optimizations,
> + either through the last 2 ops being constant handling, or the one
> + constant and one variable cases.  On the other side, for cmov the
> + former might be better as we don't need to load the constant into
> + another register.  */
> +  if (code == EQ && CONST_INT_P (op1) && rtx_equal_p (op0, op2))
> +op2 = op1;
> +  /* Similarly for op0 != op1 ? op2 : op0 and op0 != op1 ? op2 : op1.  */
> +  else if (code == NE && CONST_INT_P (op1) && rtx_equal_p (op0, op3))
> +op3 = op1;
> +
>/* Don't attempt mode expansion here -- if we had to expand 5 or 6
>   HImode insns, we'd be swallowed in word prefix ops.  */
>
>if ((mode != HImode || TARGET_FAST_PREFIX)
>&& (mode != (TARGET_64BIT ? TImode : DImode))
> -  && CONST_INT_P (operands[2])
> -  && CONST_INT_P (operands[3]))
> +  && CONST_INT_P (op2)
> +  && CONST_INT_P (op3))
>  {
>rtx out = operands[0];
> -  HOST_WIDE_INT ct = INTVAL (operands[2]);
> -  HOST_WIDE_INT cf = INTVAL (operands[3]);
> +  HOST_WIDE_INT ct = INTVAL (op2);
> +  HOST_WIDE_INT cf = INTVAL (op3);
>HOST_WIDE_INT diff;
>
>diff = ct - cf;
> @@ -3559,6 +3573,9 @@ ix86_expand_int_movcc (rtx operands[])
>if (BRANCH_COST (optimize_insn_for_speed_p (), false) <= 2)
> return false;
>
> +  operands[2] = op2;
> +  operands[3] = op3;
> +
>/* If one of the two operands is an interesting constant, load a
>  constant with the above and mask it in with a logical operation.  */
>
> --- gcc/testsuite/gcc.target/i386/pr105338.c.jj 2022-04-22 16:14:35.827045371 +0200
> +++ gcc/testsuite/gcc.target/i386/pr105338.c    2022-04-22 16:20:43.579913630 +0200
> @@ -0,0 +1,26 @@
> +/* PR target/105338 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fno-ipa-icf -masm=att" } */
> +/* { d
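
For readers following the thread, the equivalence the patch relies on can be sketched at the C source level. This is only an illustration, not the truncated testcase above: the function names are made up, and the branch-free form is just one possible lowering when both arms are constants, not necessarily the sequence GCC emits.

```c
/* op0 != op1 ? op2 : op0, the form left behind by uncprop:
   the "else" arm reuses the variable that is known to equal 0.  */
int select_var (int i)
{
  return i != 0 ? 5 : i;
}

/* op0 != op1 ? op2 : op1, the form the patch pretends to see:
   both arms are now constants.  */
int select_cst (int i)
{
  return i != 0 ? 5 : 0;
}

/* With two constant arms a branch-free form becomes possible:
   (i != 0) is 0 or 1, its negation is 0 or -1 (all bits set),
   and masking with 5 yields 0 or 5.  */
int select_branchless (int i)
{
  return -(i != 0) & 5;
}
```

All three functions return the same value for every input; only the shape presented to the expander differs.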

[PATCH] ppc: testsuite: float128-hw{,4}.c need -mlong-double-128 (was: [PATCH] ppc: testsuite: pr79004 needs -mlong-double-128)

2022-04-23 Thread Alexandre Oliva via Gcc-patches
On Apr 14, 2022, Alexandre Oliva  wrote:

>   * gcc.target/powerpc/pr79004.c: Add -mlong-double-128.

Just like pr79004, float128-hw.c requires -mlong-double-128 for some of
the expected asm opcodes to be output on target variants that have
64-bit long doubles.  That's because their expanders,
e.g. floatsi2 for FLOAT128 modes, are conditioned on
TARGET_LONG_DOUBLE_128, which is not set on target variants that use
64-bit long double.

float128-hw4.c doesn't even compile without -mlong-double-128, on
64-bit long double target variants.  The error is "invalid parameter
combination for AltiVec intrinsic" in get_float128_exponent,
get_float128_mantissa, and set_float128_exponent_float128, presumably
caused by rs6000_builtin_type_compatible's refusal to consider
_Float128 compatible when TARGET_LONG_DOUBLE_128 is not set.

Since these are compile tests, -mlong-double-128 doesn't hurt even on
target variants that use 64-bit long double, and enables both tests to
pass.

Tested on x86_64-linux-gnu x ppc64-vx7r2 with gcc-11.  Ok to install?


for  gcc/testsuite/ChangeLog

* gcc.target/powerpc/float128-hw.c: Add -mlong-double-128.
* gcc.target/powerpc/float128-hw4.c: Likewise.
---
 gcc/testsuite/gcc.target/powerpc/float128-hw.c  |2 +-
 gcc/testsuite/gcc.target/powerpc/float128-hw4.c |2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.target/powerpc/float128-hw.c b/gcc/testsuite/gcc.target/powerpc/float128-hw.c
index 8c9beafa90ad0..284d744c00978 100644
--- a/gcc/testsuite/gcc.target/powerpc/float128-hw.c
+++ b/gcc/testsuite/gcc.target/powerpc/float128-hw.c
@@ -1,7 +1,7 @@
 /* { dg-do compile { target lp64 } } */
 /* { dg-require-effective-target powerpc_p9vector_ok } */
 /* { dg-require-effective-target float128 } */
-/* { dg-options "-mpower9-vector -O2" } */
+/* { dg-options "-mpower9-vector -O2 -mlong-double-128" } */
 
 #ifndef TYPE
 #define TYPE _Float128
diff --git a/gcc/testsuite/gcc.target/powerpc/float128-hw4.c b/gcc/testsuite/gcc.target/powerpc/float128-hw4.c
index fc149169bc632..d86eadbcc7d06 100644
--- a/gcc/testsuite/gcc.target/powerpc/float128-hw4.c
+++ b/gcc/testsuite/gcc.target/powerpc/float128-hw4.c
@@ -1,7 +1,7 @@
 /* { dg-do compile { target lp64 } } */
 /* { dg-require-effective-target powerpc_p9vector_ok } */
 /* { dg-require-effective-target float128 } */
-/* { dg-options "-mpower9-vector -O2 -mabi=ieeelongdouble -Wno-psabi" } */
+/* { dg-options "-mpower9-vector -O2 -mlong-double-128 -mabi=ieeelongdouble -Wno-psabi" } */
 
 /* Insure that the ISA 3.0 IEEE 128-bit floating point built-in functions can
be used with long double when the default is IEEE 128-bit.  */


-- 
Alexandre Oliva, happy hacker                https://FSFLA.org/blogs/lxo/
   Free Software Activist   GNU Toolchain Engineer
Disinformation flourishes because many people care deeply about injustice
but very few check the facts.  Ask me about 


Re: [PATCH] PR target/89125

2022-04-23 Thread Steve Kargl via Gcc-patches
ping.

On Fri, Apr 15, 2022 at 09:23:43AM -0700, Steve Kargl wrote:
> Can someone, anyone, please commit the attached patch, which is
> also attached to https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89125
> where one can read the audit trail.  The original patch was 
> submitted 2 years ago, and required manual intervention due to
> the recent *.c to *.cc rename.
> 
> Back story: When GCC is configured and built on non-glibc platforms,
> it seems very little to no effort is made to enumerate the available
> C99 libm functions.  It is all or nothing for C99 libm.  The patch
> introduces a new function, used on only FreeBSD, to inform gcc that
> it has C99 libm functions (minus a few which clearly GCC does not check
> nor test).
> 
> The patch introduces no regression on x86_64-*-freebsd while
> allowing an additional 31 new passes.
> 
> === gcc Summary ===
>                                 w/o patch  w patch
> # of expected passes            175405     175434
> # of unexpected failures        1081       1051
> # of unexpected successes       20         20
> # of expected failures          1459       1459
> # of unresolved testcases       10         10
> # of unsupported tests          3252       3252
> 
> === g++ Summary ===
>                                 w/o patch  w patch
> # of expected passes            225338     225341
> # of unexpected failures        678        676
> # of expected failures          2071       2071
> # of unresolved testcases       11         11
> # of unsupported tests          10353      10353
> 
> === gfortran Summary ===
>                                 w/o patch  w patch
> # of expected passes            65901      65901
> # of unexpected failures        12         12
> # of expected failures          272        272
> # of unsupported tests          100        100
> 
> 
> 2022-04-15  Steven G. Kargl  
> 
>   PR target/89125
>   * config/freebsd.h: Define TARGET_LIBC_HAS_FUNCTION to be
>   bsd_libc_has_function.
>   * gcc/targhooks.cc (bsd_libc_has_function): New function.
>   Expand the supported math functions to include C99 libm.
>   * gcc/targhooks.h: Prototype for bsd_libc_has_function.
> 
> -- 
> Steve

> diff --git a/gcc/config/freebsd.h b/gcc/config/freebsd.h
> index 28ebcad88d4..d89ee7dfc97 100644
> --- a/gcc/config/freebsd.h
> +++ b/gcc/config/freebsd.h
> @@ -55,7 +55,7 @@ along with GCC; see the file COPYING3.  If not see
>  #endif
>  
>  #undef TARGET_LIBC_HAS_FUNCTION
> -#define TARGET_LIBC_HAS_FUNCTION no_c99_libc_has_function
> +#define TARGET_LIBC_HAS_FUNCTION bsd_libc_has_function
>  
>  /* Use --as-needed -lgcc_s for eh support.  */
>  #ifdef HAVE_LD_AS_NEEDED
> diff --git a/gcc/targhooks.cc b/gcc/targhooks.cc
> index e22bc66a6c8..ff127763cf2 100644
> --- a/gcc/targhooks.cc
> +++ b/gcc/targhooks.cc
> @@ -1843,6 +1843,20 @@ no_c99_libc_has_function (enum function_class fn_class ATTRIBUTE_UNUSED,
>return false;
>  }
>  
> +/* Assume some c99 functions are present at the runtime including sincos.  */
> +bool
> +bsd_libc_has_function (enum function_class fn_class,
> +tree type ATTRIBUTE_UNUSED)
> +{
> +  if (fn_class == function_c94
> +  || fn_class == function_c99_misc
> +  || fn_class == function_sincos)
> +return true;
> +
> +  return false;
> +}
> +
> +
>  tree
>  default_builtin_tm_load_store (tree ARG_UNUSED (type))
>  {
> diff --git a/gcc/targhooks.h b/gcc/targhooks.h
> index ecfa11287ef..ecce55ebe79 100644
> --- a/gcc/targhooks.h
> +++ b/gcc/targhooks.h
> @@ -212,6 +212,7 @@ extern bool default_libc_has_function (enum function_class, tree);
>  extern bool default_libc_has_fast_function (int fcode);
>  extern bool no_c99_libc_has_function (enum function_class, tree);
>  extern bool gnu_libc_has_function (enum function_class, tree);
> +extern bool bsd_libc_has_function (enum function_class, tree);
>  
>  extern tree default_builtin_tm_load_store (tree);
>  


-- 
Steve
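
As a reading aid, the behaviour of the new hook can be modelled outside of GCC. The sketch below is a standalone approximation under stated assumptions: the enum is a stand-in for GCC's `enum function_class` (only `function_c94`, `function_c99_misc` and `function_sincos` appear in the patch; `function_c99_math_complex` is added here as an assumed example of a class the hook does not claim), the `tree type` parameter is dropped, and the `_model` names are hypothetical.

```c
#include <stdbool.h>

/* Stand-in for GCC's enum function_class; not the real definition.  */
enum function_class_model
{
  function_c94,
  function_c99_misc,
  function_c99_math_complex,   /* assumed: a class the patch leaves out */
  function_sincos
};

/* Models the previous FreeBSD setting: no class is reported as available.  */
bool no_c99_libc_has_function_model (enum function_class_model fn_class)
{
  (void) fn_class;
  return false;
}

/* Models the hook added by the patch: report the C94, C99 misc and
   sincos classes as available, and nothing else.  */
bool bsd_libc_has_function_model (enum function_class_model fn_class)
{
  return fn_class == function_c94
         || fn_class == function_c99_misc
         || fn_class == function_sincos;
}
```

With this model, switching `TARGET_LIBC_HAS_FUNCTION` from the first function to the second changes the answer for three classes and leaves the remaining one unchanged.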


Re: [PATCH v2] fortran: Detect duplicate unlimited polymorphic types [PR103662]

2022-04-23 Thread Harald Anlauf via Gcc-patches

Hi Mikael,

On 22.04.22 at 12:53, Mikael Morin wrote:

On 21/04/2022 at 23:14, Mikael Morin wrote:

Hello,

this is a fix for PR103662, a TBAA issue with unlimited polymorphic
types.

I attached a draft patch to the PR which was accumulating all unlimited
polymorphic symbols to a single namespace, avoiding duplicate symbols
and thus eliminating the problem.

After reviewing the code more in detail, I was afraid that some symbols
could still end up in the local namespace, and that the problem would
remain for them after all.

Despite not being able to generate a testcase where it happened, I
decided to produce a patch based on Jakub’s analysis in the PR audit
trail, as that way supports duplicates by design.

On top of Jakub’s patch, there are a couple more type registrations
just in case (they handle duplicates so that’s fine), and the type
comparison fix that he was too fortran-uncomfortable to do.

The testcase had to be fixed as we found out in the PR audit trail.

Regression tested on x86_64-pc-linux-gnu. OK for master?

Mikael


I have read Jakub’s analysis again, and it says the type registration is
useless for unlimited polymorphic fake symbols, as they are all
translated as ptr_type_node.
So it can be dropped, which brings this v2 patch closer to Jakub’s
original.

Regression tested again. OK?


LGTM.

Thanks for the patch!

Harald


[PATCH v1] RISC-V: Implement C[LT]Z_DEFINED_VALUE_AT_ZERO

2022-04-23 Thread Philipp Tomsich
The Zbb support has introduced ctz and clz to the backend, but some
transformations in GCC need to know what the value of c[lt]z at zero
is. This affects how the optab is generated and may suppress use of
CLZ/CTZ in tree passes.

Among other things, this is needed for the transformation of
table-based ctz-implementations, such as in deepsjeng, to work
(see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90838).

Prior to this change, the test case from PR90838 would compile to the
following on RISC-V targets with Zbb:
  myctz:
lui a4,%hi(.LC0)
ld  a4,%lo(.LC0)(a4)
neg a5,a0
and a5,a5,a0
mul a5,a5,a4
lui a4,%hi(.LANCHOR0)
addi a4,a4,%lo(.LANCHOR0)
srli a5,a5,58
sh2add  a5,a5,a4
lw  a0,0(a5)
ret

After this change, we get:
  myctz:
ctz a0,a0
andi a0,a0,63
ret
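
Why a mask follows the ctz can be illustrated with a 32-bit analogue in plain C. This is a sketch under assumptions: `ctz32_defined_at_zero` is a hypothetical software model of a count-trailing-zeros operation that yields the operand width (32) for a zero input, which is the kind of value the new macros describe; the table and multiplier are the ones from `ctz1` in the patch's test file.

```c
#include <stdint.h>

/* Hypothetical model of a ctz that is defined at zero and returns
   the bit width (32) for x == 0.  */
int ctz32_defined_at_zero (uint32_t x)
{
  int n = 0;
  if (x == 0)
    return 32;
  while ((x & 1u) == 0)
    {
      x >>= 1;
      n++;
    }
  return n;
}

/* Table-based ctz, as in ctz1 from the test: isolate the lowest set
   bit, multiply by a de Bruijn constant, use the top 5 bits as index.
   For x == 0 the index is 0, so the result is table[0] == 0.  */
int ctz32_table (uint32_t x)
{
  static const char table[32] =
    {
      0, 1, 28, 2, 29, 14, 24, 3, 30, 22, 20, 15, 25, 17, 4, 8,
      31, 27, 13, 23, 21, 19, 16, 7, 26, 12, 18, 6, 11, 5, 10, 9
    };
  return table[((uint32_t) ((x & -x) * 0x077CB531U)) >> 27];
}

/* Replacement form: ctz followed by a mask.  The mask maps the
   value at zero (32) to 0, matching table[0].  */
int ctz32_masked (uint32_t x)
{
  return ctz32_defined_at_zero (x) & 31;
}
```

The two forms agree on zero only because the value of ctz at zero is known, which is the information the compiler needs before it can replace the table lookup.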

Testing this with deepsjeng_r (from SPEC 2017) against QEMU, this
shows a clear reduction in dynamic instruction count:
 - before  1961888067076
 - after   1907928279874 (2.75% reduction)

gcc/ChangeLog:

* config/riscv/riscv.h (CLZ_DEFINED_VALUE_AT_ZERO): Implement.
(CTZ_DEFINED_VALUE_AT_ZERO): Same.

gcc/testsuite/ChangeLog:

* gcc.dg/pr90838.c: Add additional flags (dg-additional-options)
  when compiling for riscv64.
* gcc.target/riscv/zbb-ctz.c: New test.

Signed-off-by: Philipp Tomsich 
Signed-off-by: Manolis Tsamis 
Co-developed-by: Manolis Tsamis 

---
 gcc/config/riscv/riscv.h|  5 ++
 gcc/testsuite/gcc.dg/pr90838.c  |  2 +
 gcc/testsuite/gcc.target/riscv/zbb-ctz-32.c | 65 
 gcc/testsuite/gcc.target/riscv/zbb-ctz.c| 66 +
 4 files changed, 138 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/zbb-ctz-32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zbb-ctz.c

diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h
index 4210e252255..95f72e2fd3f 100644
--- a/gcc/config/riscv/riscv.h
+++ b/gcc/config/riscv/riscv.h
@@ -1019,4 +1019,9 @@ extern void riscv_remove_unneeded_save_restore_calls (void);
 
 #define HARD_REGNO_RENAME_OK(FROM, TO) riscv_hard_regno_rename_ok (FROM, TO)
 
+#define CLZ_DEFINED_VALUE_AT_ZERO(MODE, VALUE) \
+  ((VALUE) = GET_MODE_UNIT_BITSIZE (MODE), 2)
+#define CTZ_DEFINED_VALUE_AT_ZERO(MODE, VALUE) \
+  ((VALUE) = GET_MODE_UNIT_BITSIZE (MODE), 2)
+
 #endif /* ! GCC_RISCV_H */
diff --git a/gcc/testsuite/gcc.dg/pr90838.c b/gcc/testsuite/gcc.dg/pr90838.c
index 41c5dab9a5c..162bd6f51d0 100644
--- a/gcc/testsuite/gcc.dg/pr90838.c
+++ b/gcc/testsuite/gcc.dg/pr90838.c
@@ -1,5 +1,6 @@
 /* { dg-do compile } */
 /* { dg-options "-O2 -fdump-tree-forwprop2-details" } */
+/* { dg-additional-options "-march=rv64gc_zbb" { target riscv64*-*-* } } */
 
 int ctz1 (unsigned x)
 {
@@ -57,3 +58,4 @@ int ctz4 (unsigned long x)
 }
 
 /* { dg-final { scan-tree-dump-times {= \.CTZ} 4 "forwprop2" { target aarch64*-*-* } } } */
+/* { dg-final { scan-tree-dump-times {= \.CTZ} 4 "forwprop2" { target riscv64*-*-* } } } */
diff --git a/gcc/testsuite/gcc.target/riscv/zbb-ctz-32.c b/gcc/testsuite/gcc.target/riscv/zbb-ctz-32.c
new file mode 100644
index 000..b903517197a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/zbb-ctz-32.c
@@ -0,0 +1,65 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv32gc_zbb -mabi=ilp32" } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-Og" } } */
+
+int ctz1 (unsigned x)
+{
+  static const char table[32] =
+{
+  0, 1, 28, 2, 29, 14, 24, 3, 30, 22, 20, 15, 25, 17, 4, 8,
+  31, 27, 13, 23, 21, 19, 16, 7, 26, 12, 18, 6, 11, 5, 10, 9
+};
+
+  return table[((unsigned)((x & -x) * 0x077CB531U)) >> 27];
+}
+
+int ctz2 (unsigned x)
+{
+#define u 0
+  static short table[64] =
+{
+  32, 0, 1,12, 2, 6, u,13, 3, u, 7, u, u, u, u,14,
+  10, 4, u, u, 8, u, u,25, u, u, u, u, u,21,27,15,
+  31,11, 5, u, u, u, u, u, 9, u, u,24, u, u,20,26,
+  30, u, u, u, u,23, u,19,29, u,22,18,28,17,16, u
+};
+
+  x = (x & -x) * 0x0450FBAF;
+  return table[x >> 26];
+}
+
+int ctz3 (unsigned x)
+{
+  static int table[32] =
+{
+  0, 1, 2,24, 3,19, 6,25, 22, 4,20,10,16, 7,12,26,
+  31,23,18, 5,21, 9,15,11,30,17, 8,14,29,13,28,27
+};
+
+  if (x == 0) return 32;
+  x = (x & -x) * 0x04D7651F;
+  return table[x >> 27];
+}
+
+static const unsigned long long magic = 0x03f08c5392f756cdULL;
+
+static const char table[64] = {
+ 0,  1, 12,  2, 13, 22, 17,  3,
+14, 33, 23, 36, 18, 58, 28,  4,
+62, 15, 34, 26, 24, 48, 50, 37,
+19, 55, 59, 52, 29, 44, 39,  5,
+63, 11, 21, 16, 32, 35, 57, 27,
+61, 25, 47, 49, 54, 51, 43, 38,
+10, 20, 31, 56, 60, 46, 53, 42,
+ 9, 30, 45, 41,  8, 40,  7,  6,
+};
+
+int ctz4 (unsigned long x)
+{
+  unsigned long lsb = x & -x;
+  return table[(lsb * magic) >> 58];
+}
+
+/* { dg-final { scan-assembler-times "ctz\t" 3 } } */
+/* { dg-final { scan-assembler-times "andi\t" 1 } } 

Re: *PING* [PATCH 0/4] Use pointer arithmetic for array references [PR102043]

2022-04-23 Thread Jerry D via Gcc-patches

Yes, Thank you Mikael!

On 4/22/22 6:59 AM, Thomas Koenig via Fortran wrote:


Hi Mikael,

Ping for the four patches starting at 
https://gcc.gnu.org/pipermail/fortran/2022-April/057759.html :

https://gcc.gnu.org/pipermail/fortran/2022-April/057757.html
https://gcc.gnu.org/pipermail/fortran/2022-April/057760.html
https://gcc.gnu.org/pipermail/fortran/2022-April/057758.html
https://gcc.gnu.org/pipermail/fortran/2022-April/057761.html

Richi accepted the general direction and the middle-end interaction.
I need a fortran frontend ack as well.


Looks good to me.

Thanks a lot for taking this on! This would have been a serious
regression if released with gcc 12.

Best regards

Thomas




Re: [PATCH] AVX512F: Add missing macro for mask(z?)_scalf_s[sd] [PR 105339]

2022-04-23 Thread Hongtao Liu via Gcc-patches
On Fri, Apr 22, 2022 at 8:43 PM Hongyu Wang  wrote:
>
> > Please add the corresponding intrinsic test in sse-14.c
>
> Sorry for forgetting this part. Updated patch. Thanks.
>
LGTM.
> On Fri, Apr 22, 2022 at 16:49, Hongtao Liu via Gcc-patches wrote:
> >
> > On Fri, Apr 22, 2022 at 4:12 PM Hongyu Wang via Gcc-patches
> >  wrote:
> > >
> > > Hi,
> > >
> > > Add missing macro under O0 and adjust macro format for scalf
> > > intrinsics.
> > >
> > Please add the corresponding intrinsic test in sse-14.c.
> > > Bootstrapped/regtested on x86_64-pc-linux-gnu{-m32,}.
> > >
> > > Ok for master and backport to GCC 9/10/11?
> > >
> > > gcc/ChangeLog:
> > >
> > > PR target/105339
> > > * config/i386/avx512fintrin.h (_mm512_scalef_round_pd):
> > > > Add parentheses for parameters and adjust format.
> > > (_mm512_mask_scalef_round_pd): Ditto.
> > > (_mm512_maskz_scalef_round_pd): Ditto.
> > > (_mm512_scalef_round_ps): Ditto.
> > > (_mm512_mask_scalef_round_ps): Ditto.
> > > (_mm512_maskz_scalef_round_ps): Ditto.
> > > (_mm_scalef_round_sd): Use _mm_undefined_pd.
> > > (_mm_scalef_round_ss): Use _mm_undefined_ps.
> > > (_mm_mask_scalef_round_sd): New macro.
> > > (_mm_mask_scalef_round_ss): Ditto.
> > > (_mm_maskz_scalef_round_sd): Ditto.
> > > (_mm_maskz_scalef_round_ss): Ditto.
> > > ---
> > >  gcc/config/i386/avx512fintrin.h | 76 -
> > >  1 file changed, 56 insertions(+), 20 deletions(-)
> > >
> > > > diff --git a/gcc/config/i386/avx512fintrin.h b/gcc/config/i386/avx512fintrin.h
> > > index 29511fd2831..6dc69ff0234 100644
> > > --- a/gcc/config/i386/avx512fintrin.h
> > > +++ b/gcc/config/i386/avx512fintrin.h
> > > @@ -3286,31 +3286,67 @@ _mm_maskz_scalef_round_ss (__mmask8 __U, __m128 
> > > __A, __m128 __B, const int __R)
> > >   (__mmask8) __U, 
> > > __R);
> > >  }
> > >  #else
> > > > -#define _mm512_scalef_round_pd(A, B, C)\
> > > > -(__m512d)__builtin_ia32_scalefpd512_mask(A, B, (__v8df)_mm512_undefined_pd(), -1, C)
> > > > -
> > > > -#define _mm512_mask_scalef_round_pd(W, U, A, B, C) \
> > > > -(__m512d)__builtin_ia32_scalefpd512_mask(A, B, W, U, C)
> > > > -
> > > > -#define _mm512_maskz_scalef_round_pd(U, A, B, C)   \
> > > > -(__m512d)__builtin_ia32_scalefpd512_mask(A, B, (__v8df)_mm512_setzero_pd(), U, C)
> > > > +#define _mm512_scalef_round_pd(A, B, C)   \
> > > +  ((__m512d)   \
> > > +   __builtin_ia32_scalefpd512_mask((A), (B),   \
> > > +  (__v8df) _mm512_undefined_pd(),  \
> > > +  -1, (C)))
> > > +
> > > +#define _mm512_mask_scalef_round_pd(W, U, A, B, C) \
> > > +  ((__m512d) __builtin_ia32_scalefpd512_mask((A), (B), (W), (U), (C)))
> > > +
> > > +#define _mm512_maskz_scalef_round_pd(U, A, B, C)   \
> > > +  ((__m512d)   \
> > > +   __builtin_ia32_scalefpd512_mask((A), (B),   \
> > > +  (__v8df) _mm512_setzero_pd(),\
> > > +  (U), (C)))
> > > +
> > > > +#define _mm512_scalef_round_ps(A, B, C)   \
> > > +  ((__m512)\
> > > +   __builtin_ia32_scalefps512_mask((A), (B),   \
> > > +  (__v16sf) _mm512_undefined_ps(), \
> > > +  -1, (C)))
> > > +
> > > +#define _mm512_mask_scalef_round_ps(W, U, A, B, C) \
> > > +  ((__m512) __builtin_ia32_scalefps512_mask((A), (B), (W), (U), (C)))
> > > +
> > > +#define _mm512_maskz_scalef_round_ps(U, A, B, C)   \
> > > +  ((__m512)\
> > > +   __builtin_ia32_scalefps512_mask((A), (B),   \
> > > +  (__v16sf) _mm512_setzero_ps(),   \
> > > +  (U), (C)))
> > > +
> > > +#define _mm_scalef_round_sd(A, B, C)   \
> > > +  ((__m128d)   \
> > > +   __builtin_ia32_scalefsd_mask_round ((A), (B),   \
> > > +  (__v2df) _mm_undefined_pd (),\
> > > +  -1, (C)))
> > >
> > > -#define _mm512_scalef_round_ps(A, B, C)\
> > > > -(__m512)__builtin_ia32_scalefps512_mask(A, B, (__v16sf)_mm512_undefined_ps(), -1, C)
> > > +#define _mm_scalef_round_ss(A, B, C)   \
> > > +  ((__m128)   
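
The "Add parentheses for parameters" item in the ChangeLog addresses a general hazard of function-like macros, which a small generic example can show. The macro and function names below are hypothetical and unrelated to the intrinsics header; the point is only how an unparenthesized parameter interacts with operator precedence.

```c
/* Parameter not parenthesized: the argument's operators can mix
   with the operators inside the macro body.  */
#define DOUBLE_UNSAFE(X) (X * 2)

/* Parameter parenthesized: the argument is evaluated as a unit.  */
#define DOUBLE_SAFE(X) ((X) * 2)

int unsafe_result (void)
{
  /* Expands to (1 + 2 * 2), i.e. 5, not 6.  */
  return DOUBLE_UNSAFE (1 + 2);
}

int safe_result (void)
{
  /* Expands to ((1 + 2) * 2), i.e. 6.  */
  return DOUBLE_SAFE (1 + 2);
}
```

Wrapping each parameter and the whole expansion in parentheses, as the patch does, keeps a macro usable inside larger expressions.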

[PATCH 0/1] RISC-V: Fix canonical extension order (K and J)

2022-04-23 Thread Tsukasa OI via Gcc-patches
**note**

My copyright assignment to the FSF has not yet started (it will start just
after sending this patch).  Please take care of the assignment status.



This patch fixes RISC-V's canonical extension order...
from: "J" -> "K"
to  : "K" -> "J"
as per the RISC-V ISA Manual draft-20210402-1271737 or later.

This bug in GCC is currently harmless because neither J nor
Zj* extensions are implemented.  The intention of this commit is
future-proofing.

This patch corresponds to the following patch for GNU Binutils:

[My copyright assignment is done on GNU Binutils]

References:






Tsukasa OI (1):
  RISC-V: Fix canonical extension order (K and J)

 gcc/common/config/riscv/riscv-common.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)


base-commit: ab54f6007c79711fc2192098d4ccc3c24e95f3e6
-- 
2.32.0



[PATCH 1/1] RISC-V: Fix canonical extension order (K and J)

2022-04-23 Thread Tsukasa OI via Gcc-patches
This commit fixes canonical extension order to follow the RISC-V ISA
Manual draft-20210402-1271737 or later.

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc (riscv_supported_std_ext):
Fix "K" extension prefix to be placed before "J".
---
 gcc/common/config/riscv/riscv-common.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/common/config/riscv/riscv-common.cc b/gcc/common/config/riscv/riscv-common.cc
index 1501242e296..0b0ec2c4ec5 100644
--- a/gcc/common/config/riscv/riscv-common.cc
+++ b/gcc/common/config/riscv/riscv-common.cc
@@ -594,7 +594,7 @@ riscv_subset_list::lookup (const char *subset, int major_version,
 static const char *
 riscv_supported_std_ext (void)
 {
-  return "mafdqlcbjktpvn";
+  return "mafdqlcbkjtpvn";
 }
 
 /* Parsing subset version.
-- 
2.32.0
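
To make the effect of the string concrete, here is a minimal sketch of how an ordering string like the one returned by riscv_supported_std_ext could be used to check single-letter extension order. The helpers `ext_rank` and `in_canonical_order` are hypothetical and are not GCC's parser; only the string itself is taken from the patch.

```c
#include <string.h>

/* The order string after the patch ("k" before "j").  */
const char *supported_std_ext (void)
{
  return "mafdqlcbkjtpvn";
}

/* Hypothetical helper: position of a single-letter extension in the
   canonical order, or -1 if the letter is not listed.  */
int ext_rank (char c)
{
  const char *order = supported_std_ext ();
  const char *p = (c != '\0') ? strchr (order, c) : NULL;
  return p ? (int) (p - order) : -1;
}

/* Hypothetical helper: returns 1 if every letter is known and the
   letters appear in strictly increasing canonical position.  */
int in_canonical_order (const char *exts)
{
  int prev = -1;
  for (; *exts != '\0'; exts++)
    {
      int r = ext_rank (*exts);
      if (r < 0 || r <= prev)
        return 0;
      prev = r;
    }
  return 1;
}
```

Under this model, "kj" is accepted and "jk" is rejected after the patch; with the previous string the result would be the reverse.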



Re: [PATCH 1/1] RISC-V: Fix canonical extension order (K and J)

2022-04-23 Thread Andrew Waterman
Neither K nor J is an extension that exists, and so it doesn't make
sense to mandate any particular ordering.  The better change would be
to delete the letters `k' and `j' from that string, so that we aren't
enforcing constraints that don't serve a useful purpose.

cf. 
https://github.com/riscv/riscv-isa-manual/commit/f5f9c27010b69a015958ffebe1ac5a34f8776dff

On Sat, Apr 23, 2022 at 10:26 PM Tsukasa OI via Gcc-patches
 wrote:
>
> This commit fixes canonical extension order to follow the RISC-V ISA
> Manual draft-20210402-1271737 or later.
>
> gcc/ChangeLog:
>
> * common/config/riscv/riscv-common.cc (riscv_supported_std_ext):
> Fix "K" extension prefix to be placed before "J".
> ---
>  gcc/common/config/riscv/riscv-common.cc | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/gcc/common/config/riscv/riscv-common.cc b/gcc/common/config/riscv/riscv-common.cc
> index 1501242e296..0b0ec2c4ec5 100644
> --- a/gcc/common/config/riscv/riscv-common.cc
> +++ b/gcc/common/config/riscv/riscv-common.cc
> @@ -594,7 +594,7 @@ riscv_subset_list::lookup (const char *subset, int major_version,
>  static const char *
>  riscv_supported_std_ext (void)
>  {
> -  return "mafdqlcbjktpvn";
> +  return "mafdqlcbkjtpvn";
>  }
>
>  /* Parsing subset version.
> --
> 2.32.0
>


Re: [PATCH 1/1] RISC-V: Fix canonical extension order (K and J)

2022-04-23 Thread Tsukasa OI via Gcc-patches
Hello,

> Neither K nor J is an extension that exists,

That is correct.

> and so it doesn't make
> sense to mandate any particular ordering.

No. It affects Z* extension ordering...


On 2022/04/24 14:36, Andrew Waterman wrote:
> Neither K nor J is an extension that exists, and so it doesn't make
> sense to mandate any particular ordering.  The better change would be
> to delete the letters `k' and `j' from that string, so that we aren't
> enforcing constraints that don't serve a useful purpose.
> 
> cf. 
> https://github.com/riscv/riscv-isa-manual/commit/f5f9c27010b69a015958ffebe1ac5a34f8776dff

Wait... so you keep ordering constraints for existing single letters
(Zi -> Zv), but no longer for non-existing single letters (Zk -> Zj,
Zj -> Zk)?

That's a completely unexpected move, but it also makes sense.

Let me check your intentions and details: do we need to place Z[CH]*
extensions without single-letter extension [CH] after all existing ones
(like Zv*)? Or, Z[CH]* extensions without single-letter extension [CH]
have no constraints as long as all Z* extensions are grouped together?

> 
> On Sat, Apr 23, 2022 at 10:26 PM Tsukasa OI via Gcc-patches
>  wrote:
>>
>> This commit fixes canonical extension order to follow the RISC-V ISA
>> Manual draft-20210402-1271737 or later.
>>
>> gcc/ChangeLog:
>>
>> * common/config/riscv/riscv-common.cc (riscv_supported_std_ext):
>> Fix "K" extension prefix to be placed before "J".
>> ---
>>  gcc/common/config/riscv/riscv-common.cc | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/gcc/common/config/riscv/riscv-common.cc b/gcc/common/config/riscv/riscv-common.cc
>> index 1501242e296..0b0ec2c4ec5 100644
>> --- a/gcc/common/config/riscv/riscv-common.cc
>> +++ b/gcc/common/config/riscv/riscv-common.cc
>> @@ -594,7 +594,7 @@ riscv_subset_list::lookup (const char *subset, int major_version,
>>  static const char *
>>  riscv_supported_std_ext (void)
>>  {
>> -  return "mafdqlcbjktpvn";
>> +  return "mafdqlcbkjtpvn";
>>  }
>>
>>  /* Parsing subset version.
>> --
>> 2.32.0
>>
>