[PATCH V2] rs6000: re-enable web and rnreg with -funroll-loops

2019-12-23 Thread Jiufu Guo
Jiufu Guo  writes:

> Segher Boessenkool  writes:
>
>> Hi!
>>
>> On Fri, Dec 20, 2019 at 02:34:05PM +0800, Jiufu Guo wrote:
>>> Previously, limited unrolling was enabled at O2 for powerpc in r278034.
>>> At that time, -fweb and -frename-registers were not enabled together with
>>> -funroll-loops, even for -O3.  After that, we noticed some performance
>>> degradation on SPEC2006fp caused by the lack of web and rnreg.
>>
>> And 2017 was fine on all tests.  Annoying :-(
>>
>>> This patch enables -fweb and -frename-registers for -O3, to align with
>>> the original behavior before r278034.
>>
>> Okay.  Hopefully we can find a way to determine in what circumstances web
>> and rnreg help instead of hurt, but until then, the old behaviour is
>> certainly the safe choice.
>>
>>> --- a/gcc/testsuite/gcc.dg/torture/stackalign/builtin-return-1.c
>>> +++ b/gcc/testsuite/gcc.dg/torture/stackalign/builtin-return-1.c
>>> @@ -2,6 +2,7 @@
>>>  /* Originator: Andrew Church  */
>>>  /* { dg-do run } */
>>>  /* { dg-require-effective-target untyped_assembly } */
>>> +/* { dg-additional-options "-fno-rename-registers" { target { powerpc*-*-* } } } */
>>
>> What is this for?  What happens without it?
> The reason for this failure is that -frename-registers does not work well
> with __builtin_return/__builtin_apply, which need to save and restore
> registers that must not be renamed.
>
> When this case runs at -O3 with this patch applied, -frename-registers is
> enabled.  Originally, -frename-registers was enabled together with
> -funroll-loops rather than with plain -O3.  This change causes the case to
> fail at -O3.
>
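For reference, a minimal sketch of the kind of code builtin-return-1.c
exercises (illustrative only, not the actual testcase):

double bar (double x) { return x + 1.0; }

double foo (double x)
{
  /* Forward the incoming arguments to bar and return its return value
     without knowing the prototype.  This relies on the register state
     captured by __builtin_apply_args/__builtin_apply being preserved,
     which is exactly what register renaming can break.  */
  __builtin_return (__builtin_apply ((void (*) ()) bar,
                                     __builtin_apply_args (), 16));
}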

To align better with the original behavior, I updated the patch and attached
it at the end of this mail.
The updated patch also passes bootstrap and regression tests.

Is this patch ok for trunk?

Thanks,
Jiufu

>>
>> The rs6000/ parts are okay for trunk.  Thanks!
>>
>>
>> Segher

gcc/
2019-12-23  Jiufu Guo  

* gcc/config/rs6000/rs6000.c
(rs6000_option_override_internal): Enable -fweb and -frename-registers
with -funroll-loops.

---
 gcc/config/rs6000/rs6000.c | 15 +++
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 23b6d99..dfba6b4 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -4538,12 +4538,19 @@ rs6000_option_override_internal (bool global_init_p)
   param_sched_pressure_algorithm,
   SCHED_PRESSURE_MODEL);
 
-  /* Explicit -funroll-loops turns -munroll-only-small-loops off.  */
-  if (((global_options_set.x_flag_unroll_loops && flag_unroll_loops)
+  /* Explicit -funroll-loops turns -munroll-only-small-loops off, and
+turns -fweb and -frename-registers on.  */
+  if ((global_options_set.x_flag_unroll_loops && flag_unroll_loops)
   || (global_options_set.x_flag_unroll_all_loops
   && flag_unroll_all_loops))
- && !global_options_set.x_unroll_only_small_loops)
-   unroll_only_small_loops = 0;
+   {
+ if (!global_options_set.x_unroll_only_small_loops)
+   unroll_only_small_loops = 0;
+ if (!global_options_set.x_flag_rename_registers)
+   flag_rename_registers = 1;
+ if (!global_options_set.x_flag_web)
+   flag_web = 1;
+   }
 
   /* If using typedef char *va_list, signal that
 __builtin_va_start (&ap, 0) can be optimized to
-- 
2.7.4



Re: [PATCH] enable -fweb and -frename-registers at -O3 for rs6000

2019-12-23 Thread Jiufu Guo
Jiufu Guo  writes:

> Segher Boessenkool  writes:
>
>> Hi!
>>
[...]
>>> --- a/gcc/testsuite/gcc.dg/torture/stackalign/builtin-return-1.c
>>> +++ b/gcc/testsuite/gcc.dg/torture/stackalign/builtin-return-1.c
>>> @@ -2,6 +2,7 @@
>>>  /* Originator: Andrew Church  */
>>>  /* { dg-do run } */
>>>  /* { dg-require-effective-target untyped_assembly } */
>>> +/* { dg-additional-options "-fno-rename-registers" { target { powerpc*-*-* } } } */
>>
>> What is this for?  What happens without it?
> The reason for this failure is that -frename-registers does not work well
> with __builtin_return/__builtin_apply, which need to save and restore
> registers that must not be renamed.
For this issue, I opened a bug
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93047.

Thanks,
Jiufu.

>
> When this case runs at -O3 with this patch applied, -frename-registers is
> enabled.  Originally, -frename-registers was enabled together with
> -funroll-loops rather than with plain -O3.  This change causes the case to
> fail at -O3.
>
>>
>> The rs6000/ parts are okay for trunk.  Thanks!
>>
>>
>> Segher


Re: [PATCH] libstdcxx: Update ctype_base.h from NetBSD upstream

2019-12-23 Thread Jonathan Wakely
On Sat, 21 Dec 2019 at 23:37, Gerald Pfeifer  wrote:
>
> Hi Matthew,
>
> On Mon, 4 Feb 2019, Matthew Bauer wrote:
> > The ctype_base.h file in libstdc++-v3 is out of date for NetBSD.  They
> > have changed their ctype.h definition.  It was updated in their in-tree
> > libstdc++-v3 but not in the GCC one.  My understanding is this is a
> > straightforward rewrite.  I've attached my own patch, but the file can
> > be obtained directly here:
> >
> > http://cvsweb.netbsd.org/bsdweb.cgi/src/external/gpl3/gcc/dist/libstdc%2b%2b-v3/config/os/bsd/netbsd/ctype_base.h
> >
> > With the attached patch, libstdc++-v3 can successfully be built with
> > NetBSD headers (along with --disable-libcilkrts).
>
I noticed this has not been applied yet (nor has it seen a follow-up?), and
also noticed it went to the gcc-patches list, but not to libstd...@gcc.gnu.org.

That's why it was ignored then.

> Let me re-address this to libstd...@gcc.gnu.org in the hope the
> maintainers there will have a look.

I'll take a look ASAP.


Check mask argument's type when vectorising conditional functions

2019-12-23 Thread Richard Sandiford
We can't yet vectorise conditional internal functions whose boolean
condition is fed by a data access (or more generally, by a tree of logic
ops in which all the leaves are data accesses).  Although we should add
that eventually, we'd need further work to generate good-quality code.

Unlike vectorizable_load and vectorizable_store, vectorizable_call
wasn't checking whether the mask had a suitable type, leading to an
ICE on the testcases.
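For illustration, the problematic shape is a loop like the one in the new
gcc.dg/vect/vect-cond-arith-8.c test below (my annotation; the internal
function spelling is approximate):

/* After if-conversion and pattern recognition, the multiplication becomes
   a conditional internal function call, roughly
   .COND_MUL (mask, y[i], 1.0e+2, y[i]), whose boolean mask is loaded from
   memory -- the operand that vectorizable_call must now check via
   vect_check_scalar_mask.  */
void
f (float *x, _Bool *cond, float *y)
{
  for (int i = 0; i < 100; ++i)
    x[i] = cond[i] ? y[i] * 100 : y[i];
}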

Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

Richard


2019-12-23  Richard Sandiford  

gcc/
* tree-vect-stmts.c (vect_check_load_store_mask): Rename to...
(vect_check_scalar_mask): ...this.
(vectorizable_store, vectorizable_load): Update call accordingly.
(vectorizable_call): Use vect_check_scalar_mask to check the mask
argument in calls to conditional internal functions.

gcc/testsuite/
* gcc.dg/vect/vect-cond-arith-8.c: New test.
* gcc.target/aarch64/sve/cond_fmul_5.c: Likewise.

Index: gcc/tree-vect-stmts.c
===
--- gcc/tree-vect-stmts.c   2019-12-10 12:20:03.569816222 +
+++ gcc/tree-vect-stmts.c   2019-12-23 14:41:40.761667889 +
@@ -2534,14 +2534,14 @@ get_load_store_type (stmt_vec_info stmt_
 }
 
 /* Return true if boolean argument MASK is suitable for vectorizing
-   conditional load or store STMT_INFO.  When returning true, store the type
+   conditional operation STMT_INFO.  When returning true, store the type
of the definition in *MASK_DT_OUT and the type of the vectorized mask
in *MASK_VECTYPE_OUT.  */
 
 static bool
-vect_check_load_store_mask (stmt_vec_info stmt_info, tree mask,
-   vect_def_type *mask_dt_out,
-   tree *mask_vectype_out)
+vect_check_scalar_mask (stmt_vec_info stmt_info, tree mask,
+   vect_def_type *mask_dt_out,
+   tree *mask_vectype_out)
 {
   vec_info *vinfo = stmt_info->vinfo;
   if (!VECT_SCALAR_BOOLEAN_TYPE_P (TREE_TYPE (mask)))
@@ -3262,6 +3262,14 @@ vectorizable_call (stmt_vec_info stmt_in
   for (i = 0; i < nargs; i++)
 {
   op = gimple_call_arg (stmt, i);
+
+  if ((int) i == mask_opno)
+   {
+ if (!vect_check_scalar_mask (stmt_info, op, &dt[i], &vectypes[i]))
+   return false;
+ continue;
+   }
+
   if (!vect_is_simple_use (op, vinfo, &dt[i], &vectypes[i]))
{
  if (dump_enabled_p ())
@@ -3270,11 +3278,6 @@ vectorizable_call (stmt_vec_info stmt_in
  return false;
}
 
-  /* Skip the mask argument to an internal function.  This operand
-has been converted via a pattern if necessary.  */
-  if ((int) i == mask_opno)
-   continue;
-
   /* We can only handle calls with arguments of the same type.  */
   if (rhs_type
  && !types_compatible_p (rhs_type, TREE_TYPE (op)))
@@ -3544,12 +3547,6 @@ vectorizable_call (stmt_vec_info stmt_in
  continue;
}
 
- if (mask_opno >= 0 && !vectypes[mask_opno])
-   {
- gcc_assert (modifier != WIDEN);
- vectypes[mask_opno] = truth_type_for (vectype_in);
-   }
-
  for (i = 0; i < nargs; i++)
{
  op = gimple_call_arg (stmt, i);
@@ -7378,8 +7375,8 @@ vectorizable_store (stmt_vec_info stmt_i
   if (mask_index >= 0)
{
  mask = gimple_call_arg (call, mask_index);
- if (!vect_check_load_store_mask (stmt_info, mask, &mask_dt,
-  &mask_vectype))
+ if (!vect_check_scalar_mask (stmt_info, mask, &mask_dt,
+  &mask_vectype))
return false;
}
 }
@@ -8598,8 +8595,8 @@ vectorizable_load (stmt_vec_info stmt_in
   if (mask_index >= 0)
{
  mask = gimple_call_arg (call, mask_index);
- if (!vect_check_load_store_mask (stmt_info, mask, &mask_dt,
-  &mask_vectype))
+ if (!vect_check_scalar_mask (stmt_info, mask, &mask_dt,
+  &mask_vectype))
return false;
}
 }
Index: gcc/testsuite/gcc.dg/vect/vect-cond-arith-8.c
===
--- /dev/null   2019-09-17 11:41:18.176664108 +0100
+++ gcc/testsuite/gcc.dg/vect/vect-cond-arith-8.c   2019-12-23 14:41:40.757667915 +
@@ -0,0 +1,8 @@
+/* { dg-do compile } */
+
+void
+f (float *x, _Bool *cond, float *y)
+{
+  for (int i = 0; i < 100; ++i)
+x[i] = cond[i] ? y[i] * 100 : y[i];
+}
Index: gcc/testsuite/gcc.target/aarch64/sve/cond_fmul_5.c
===
--- /dev/null   2019-09-17 11:41:18.176664108 +0100
+++ gcc/testsuite/gcc.target/aarch64/sve/cond_fmul_5.c  2019-12-23 14:41:40.757667915 +
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-opti

Re: [PATCH, AArch64 v4 4/6] aarch64: Add out-of-line functions for LSE atomics

2019-12-23 Thread Roman Zhuykov

This caused:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93053

--
Roman

Richard Henderson wrote 18.09.2019 04:58:

This is the libgcc part of the interface -- providing the functions.
Rationale is provided at the top of libgcc/config/aarch64/lse.S.
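As a usage illustration (my example; the compiler-side option name is from
memory, -moutline-atomics in the committed series), ordinary atomic code is
unchanged -- it simply calls into these helpers instead of expanding an
inline LL/SC loop, and the helpers use LSE instructions iff
__aarch64_have_lse_atomics is set at run time:

#include <stdatomic.h>

/* Nothing LSE-specific in the source: with outline atomics enabled this
   compiles to a call to a helper defined in lse.S rather than to an
   inline LDXR/STXR loop.  */
int
fetch_add (atomic_int *counter)
{
  return atomic_fetch_add_explicit (counter, 1, memory_order_acq_rel);
}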

* config/aarch64/lse-init.c: New file.
* config/aarch64/lse.S: New file.
* config/aarch64/t-lse: New file.
* config.host: Add t-lse to all aarch64 tuples.
---
 libgcc/config/aarch64/lse-init.c |  45 ++
 libgcc/config.host   |   4 +
 libgcc/config/aarch64/lse.S  | 235 +++
 libgcc/config/aarch64/t-lse  |  44 ++
 4 files changed, 328 insertions(+)
 create mode 100644 libgcc/config/aarch64/lse-init.c
 create mode 100644 libgcc/config/aarch64/lse.S
 create mode 100644 libgcc/config/aarch64/t-lse

diff --git a/libgcc/config/aarch64/lse-init.c b/libgcc/config/aarch64/lse-init.c
new file mode 100644
index 000..51fb21d45c9
--- /dev/null
+++ b/libgcc/config/aarch64/lse-init.c
@@ -0,0 +1,45 @@
+/* Out-of-line LSE atomics for AArch64 architecture, Init.
+   Copyright (C) 2018 Free Software Foundation, Inc.
+   Contributed by Linaro Ltd.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+.  */
+
+/* Define the symbol gating the LSE implementations.  */
+_Bool __aarch64_have_lse_atomics
+  __attribute__((visibility("hidden"), nocommon));
+
+/* Disable initialization of __aarch64_have_lse_atomics during bootstrap.  */

+#ifndef inhibit_libc
+# include 
+
+/* Disable initialization if the system headers are too old.  */
+# if defined(AT_HWCAP) && defined(HWCAP_ATOMICS)
+
+static void __attribute__((constructor))
+init_have_lse_atomics (void)
+{
+  unsigned long hwcap = getauxval (AT_HWCAP);
+  __aarch64_have_lse_atomics = (hwcap & HWCAP_ATOMICS) != 0;
+}
+
+# endif /* HWCAP */
+#endif /* inhibit_libc */
diff --git a/libgcc/config.host b/libgcc/config.host
index 728e543ea39..122113fc519 100644
--- a/libgcc/config.host
+++ b/libgcc/config.host
@@ -350,12 +350,14 @@ aarch64*-*-elf | aarch64*-*-rtems*)
extra_parts="$extra_parts crtbegin.o crtend.o crti.o crtn.o"
extra_parts="$extra_parts crtfastmath.o"
tmake_file="${tmake_file} ${cpu_type}/t-aarch64"
+   tmake_file="${tmake_file} ${cpu_type}/t-lse t-slibgcc-libgcc"
tmake_file="${tmake_file} ${cpu_type}/t-softfp t-softfp t-crtfm"
md_unwind_header=aarch64/aarch64-unwind.h
;;
 aarch64*-*-freebsd*)
extra_parts="$extra_parts crtfastmath.o"
tmake_file="${tmake_file} ${cpu_type}/t-aarch64"
+   tmake_file="${tmake_file} ${cpu_type}/t-lse t-slibgcc-libgcc"
tmake_file="${tmake_file} ${cpu_type}/t-softfp t-softfp t-crtfm"
md_unwind_header=aarch64/freebsd-unwind.h
;;
@@ -367,12 +369,14 @@ aarch64*-*-netbsd*)
;;
 aarch64*-*-fuchsia*)
tmake_file="${tmake_file} ${cpu_type}/t-aarch64"
+   tmake_file="${tmake_file} ${cpu_type}/t-lse t-slibgcc-libgcc"
tmake_file="${tmake_file} ${cpu_type}/t-softfp t-softfp"
;;
 aarch64*-*-linux*)
extra_parts="$extra_parts crtfastmath.o"
md_unwind_header=aarch64/linux-unwind.h
tmake_file="${tmake_file} ${cpu_type}/t-aarch64"
+   tmake_file="${tmake_file} ${cpu_type}/t-lse t-slibgcc-libgcc"
tmake_file="${tmake_file} ${cpu_type}/t-softfp t-softfp t-crtfm"
;;
 alpha*-*-linux*)
diff --git a/libgcc/config/aarch64/lse.S b/libgcc/config/aarch64/lse.S
new file mode 100644
index 000..c24a39242ca
--- /dev/null
+++ b/libgcc/config/aarch64/lse.S
@@ -0,0 +1,235 @@
+/* Out-of-line LSE atomics for AArch64 architecture.
+   Copyright (C) 2018 Free Software Foundation, Inc.
+   Contributed by Linaro Ltd.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PA

Re: [GCC][PATCH][Aarch64] Add Bfloat16_t scalar type, vector types and machine modes to Aarch64 back-end [1/2]

2019-12-23 Thread Stam Markianos-Wright


On 12/19/19 10:01 AM, Richard Sandiford wrote:
> Stam Markianos-Wright  writes:
>> [...]
>> @@ -659,6 +666,8 @@ aarch64_simd_builtin_std_type (machine_mode mode,
>> return float_type_node;
>>   case E_DFmode:
>> return double_type_node;
>> +case E_BFmode:
>> +  return aarch64_bf16_type_node;
>>   default:
>> gcc_unreachable ();
>>   }
>> @@ -750,6 +759,11 @@ aarch64_init_simd_builtin_types (void)
>> aarch64_simd_types[Float64x1_t].eltype = double_type_node;
>> aarch64_simd_types[Float64x2_t].eltype = double_type_node;
>>   
>> +
>> +/* Init Bfloat vector types with underlying uint types.  */
>> +  aarch64_simd_types[Bfloat16x4_t].eltype = aarch64_bf16_type_node;
>> +  aarch64_simd_types[Bfloat16x8_t].eltype = aarch64_bf16_type_node;
> 
> Formatting nits: too many blank lines, comment should be indented
> to match the code.

Done :)

> 
>> +
>> for (i = 0; i < nelts; i++)
>>   {
>> tree eltype = aarch64_simd_types[i].eltype;
>> @@ -1059,6 +1073,19 @@ aarch64_init_fp16_types (void)
>> aarch64_fp16_ptr_type_node = build_pointer_type (aarch64_fp16_type_node);
>>   }
>>   
>> +/* Initialize the backend REAL_TYPE type supporting bfloat types.  */
>> +static void
>> +aarch64_init_bf16_types (void)
>> +{
>> +  aarch64_bf16_type_node = make_node (REAL_TYPE);
>> +  TYPE_PRECISION (aarch64_bf16_type_node) = 16;
>> +  SET_TYPE_MODE (aarch64_bf16_type_node, BFmode);
>> +  layout_type (aarch64_bf16_type_node);
>> +
>> +  (*lang_hooks.types.register_builtin_type) (aarch64_bf16_type_node, 
>> "__bf16");
> 
> This style is mostly a carry-over from pre-ANSI days.  New code
> can just use "lang_hooks.types.register_builtin_type (...)".

Ahh good to know, thanks! Done

> 
>> +  aarch64_bf16_ptr_type_node = build_pointer_type (aarch64_bf16_type_node);
>> +}
>> +
>>   /* Pointer authentication builtins that will become NOP on legacy platform.
>>  Currently, these builtins are for internal use only (libgcc EH 
>> unwinder).  */
>>   
>> [...]
>> diff --git a/gcc/config/aarch64/aarch64-simd-builtin-types.def 
>> b/gcc/config/aarch64/aarch64-simd-builtin-types.def
>> index b015694293c..3b387377f38 100644
>> --- a/gcc/config/aarch64/aarch64-simd-builtin-types.def
>> +++ b/gcc/config/aarch64/aarch64-simd-builtin-types.def
>> @@ -50,3 +50,5 @@
>> ENTRY (Float32x4_t, V4SF, none, 13)
>> ENTRY (Float64x1_t, V1DF, none, 13)
>> ENTRY (Float64x2_t, V2DF, none, 13)
>> +  ENTRY (Bfloat16x4_t, V4BF, none, 15)
>> +  ENTRY (Bfloat16x8_t, V8BF, none, 15)
> 
> Should be 14 (number of characters + 2 for "__").  Would be good to have
> a test for correct C++ mangling.

Done, thank you for pointing it out!!
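For illustration, such a mangling test could look roughly like this (a C++
test; the directives and the expected mangled string are my assumptions,
not the committed test):

/* { dg-do compile } */
/* { dg-additional-options "-march=armv8.2-a+bf16" } */
#include <arm_neon.h>

/* With a mangling length of 14, f(bfloat16x4_t) should mangle as
   _Z1f14__Bfloat16x4_t.  */
void f (bfloat16x4_t x) {}

/* { dg-final { scan-assembler "_Z1f14__Bfloat16x4_t" } } */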

> 
>> [...]
>> @@ -101,10 +101,10 @@
>> [(set_attr "type" "neon_dup")]
>>   )
>>   
>> -(define_insn "*aarch64_simd_mov"
>> -  [(set (match_operand:VD 0 "nonimmediate_operand"
>> +(define_insn "*aarch64_simd_mov"
>> +  [(set (match_operand:VDMOV 0 "nonimmediate_operand"
>>  "=w, m,  m,  w, ?r, ?w, ?r, w")
>> -(match_operand:VD 1 "general_operand"
>> +(match_operand:VDMOV 1 "general_operand"
>>  "m,  Dz, w,  w,  w,  r,  r, Dn"))]
>> "TARGET_SIMD
>>  && (register_operand (operands[0], mode)
>> @@ -126,13 +126,14 @@
>>   }
>> [(set_attr "type" "neon_load1_1reg, store_8, neon_store1_1reg,\
>>   neon_logic, neon_to_gp, f_mcr,\
>> - mov_reg, neon_move")]
>> + mov_reg, neon_move")
>> +(set_attr "arch" "*,notbf16,*,*,*,*,*,notbf16")]
>>   )
> 
> Together with the changes to the arch attribute:
> 
>> @@ -378,6 +378,12 @@
>>  (and (eq_attr "arch" "fp16")
>>   (match_test "TARGET_FP_F16INST"))
>>   
>> +(and (eq_attr "arch" "fp16_notbf16")
>> + (match_test "TARGET_FP_F16INST && !TARGET_BF16_FP"))
>> +
>> +(and (eq_attr "arch" "notbf16")
>> + (match_test "!TARGET_BF16_SIMD"))
>> +
>>  (and (eq_attr "arch" "sve")
>>   (match_test "TARGET_SVE")))
>>   (const_string "yes")
> 
> this will disable the second and final alternatives for all VDMOV modes
> when bf16 is enabled.  E.g. enabling bf16 will disable those alternatives
> for V4HI as well as V4BF.
> 
> If you want to disable some alternatives for V4BF then it'd be better to
> use define_mode_attr instead.  But are you sure we need to disable them?
> The m<-Dz alternative should work for V4BF as well.  The w<-Dn alternative
> should work too -- it's up to aarch64_simd_valid_immediate to decide
> which immediates are valid.

Oh yes, I see what you mean about blocking it for V4HI and everything else
under VDMOV as well...
Yeah, it was on the principle of doing what we can to block any internal
Bfloat processing, Bfloat immediates, Bfloat constants, etc., but I wasn't
sure what should/shouldn't be allowed, so I was blocking anything that might
allow for unintended operations to happen in BFmode.  But I've got a better
idea of this now, so, agreed, looking these basicall

Re: [GCC][PATCH][Aarch64] Add Bfloat16_t scalar type, vector types and machine modes to Aarch64 back-end [1/2]

2019-12-23 Thread Richard Sandiford
Stam Markianos-Wright  writes:
> On 12/19/19 10:01 AM, Richard Sandiford wrote:
>>> +
>>> +#pragma GCC push_options
>>> +#pragma GCC target ("arch=armv8.2-a+bf16")
>>> +#ifdef __ARM_FEATURE_BF16_SCALAR_ARITHMETIC
>>> +
>>> +typedef __bf16 bfloat16_t;
>>> +
>>> +
>>> +#endif
>>> +#pragma GCC pop_options
>>> +
>>> +#endif
>> 
>> Are you sure we need the #ifdef?  The target pragma should guarantee
>> that the macro's defined.
>> 
>> But the validity of the typedef shouldn't depend on target options,
>> so AFAICT this should just be:
>> 
>> typedef __bf16 bfloat16_t;
>
> Ok, so it's a case of "what do we want to happen if the user tries to use
> bfloats without +bf16 enabled?".
>
> So the intent of the ifdef was to not have bfloat16_t be visible if the
> macro wasn't defined (i.e. not having any bf16 support), but I see now that
> this was being negated by the target pragma anyway!  Oops, my bad for not
> really understanding that, sorry!
>
> If we have the types always visible, then the user may use them, resulting
> in an ICE.
>
> But even if the #ifdef worked, this still doesn't stop the user from trying
> to use __bf16 or __Bfloat16x4_t, __Bfloat16x8_t, which would still produce
> an ICE, so it's not a perfect solution anyway...

Right.  Or they could use #pragma GCC target to switch to a different
non-bf16 target after including arm_bf16.h.
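For example, something along these lines (my sketch, not from the patch)
leaves the typedef visible even though the selected target no longer has
bf16:

#include <arm_bf16.h>                  /* typedef made visible under the
                                          header's own target pragma */

#pragma GCC target ("arch=armv8.2-a")  /* switch to a target without +bf16 */

bfloat16_t x;                          /* still accepted by the front end,
                                          even though the target no longer
                                          supports BFmode */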

> One other thing I tried was the below change to aarch64-builtins.c which 
> stops 
> __bf16 or the vector types from being registered at all:
>
> --- a/gcc/config/aarch64/aarch64-builtins.c
> +++ b/gcc/config/aarch64/aarch64-builtins.c
> @@ -759,26 +759,32 @@ aarch64_init_simd_builtin_types (void)
>  aarch64_simd_types[Float64x1_t].eltype = double_type_node;
>  aarch64_simd_types[Float64x2_t].eltype = double_type_node;
>
> -  /* Init Bfloat vector types with underlying __bf16 type.  */
> -  aarch64_simd_types[Bfloat16x4_t].eltype = aarch64_bf16_type_node;
> -  aarch64_simd_types[Bfloat16x8_t].eltype = aarch64_bf16_type_node;
> +  if (TARGET_BF16_SIMD)
> +{
> +  /* Init Bfloat vector types with underlying __bf16 type.  */
> +  aarch64_simd_types[Bfloat16x4_t].eltype = aarch64_bf16_type_node;
> +  aarch64_simd_types[Bfloat16x8_t].eltype = aarch64_bf16_type_node;
> +}
>
>  for (i = 0; i < nelts; i++)
>{
>  tree eltype = aarch64_simd_types[i].eltype;
>  machine_mode mode = aarch64_simd_types[i].mode;
>
> -  if (aarch64_simd_types[i].itype == NULL)
> +  if (eltype != NULL)
>   {
> - aarch64_simd_types[i].itype
> -   = build_distinct_type_copy
> - (build_vector_type (eltype, GET_MODE_NUNITS (mode)));
> - SET_TYPE_STRUCTURAL_EQUALITY (aarch64_simd_types[i].itype);
> -   }
> + if (aarch64_simd_types[i].itype == NULL)
> +   {
> + aarch64_simd_types[i].itype
> +   = build_distinct_type_copy
> +   (build_vector_type (eltype, GET_MODE_NUNITS (mode)));
> + SET_TYPE_STRUCTURAL_EQUALITY (aarch64_simd_types[i].itype);
> +   }
>
> -  tdecl = add_builtin_type (aarch64_simd_types[i].name,
> -   aarch64_simd_types[i].itype);
> -  TYPE_NAME (aarch64_simd_types[i].itype) = tdecl;
> + tdecl = add_builtin_type (aarch64_simd_types[i].name,
> +   aarch64_simd_types[i].itype);
> + TYPE_NAME (aarch64_simd_types[i].itype) = tdecl;
> +   }
>}
>
>#define AARCH64_BUILD_SIGNED_TYPE(mode)  \
> @@ -1240,7 +1246,8 @@ aarch64_general_init_builtins (void)
>
>  aarch64_init_fp16_types ();
>
> -  aarch64_init_bf16_types ();
> +  if (TARGET_BF16_FP)
> +aarch64_init_bf16_types ();
>
>  if (TARGET_SIMD)
>aarch64_init_simd_builtins ();
>
>
>
> But the problem in that case was that the types could not be re-enabled
> using a target pragma like:
>
> #pragma GCC push_options
> #pragma GCC target ("+bf16")
>
> Inside the test.
>
> (i.e. the pragma caused the ifdef to be TRUE, but __bf16 was still not being 
> enabled afaict?)
>
> So I'm not sure what to do; presumably we do want some guard around the
> type so as not to just ICE if the type is used without +bf16?

Other header files work both ways: you get the same definitions regardless
of what the target was when the header file was included.  Then we need
to raise an error if the user tries to do something that the current
target doesn't support.

I suppose for bf16 we could either (a) try to raise an error whenever
BF-related moves are emitted without the required target feature or
(b) handle __bf16 types like __fp16 types.  The justification for
(b) is that there aren't really any new instructions for moves;
__bf16 is mostly a software construct as far as this specific
patch goes.  (It's a different story for the intrinsics patch
of course.)

I don't know which of (a) or (b) is better.  Whichever we go for,
it would be good if cla

Re: [PATCH V2] rs6000: re-enable web and rnreg with -funroll-loops

2019-12-23 Thread Segher Boessenkool
On Mon, Dec 23, 2019 at 04:11:35PM +0800, Jiufu Guo wrote:
> To align better with the original behavior, I updated the patch and attached
> it at the end of this mail.
> The updated patch also passes bootstrap and regression tests.
> 
> Is this patch ok for trunk?

If this performs well, okay for trunk.  Thanks!


Segher


> 2019-12-23  Jiufu Guo  
> 
>   * gcc/config/rs6000/rs6000.c
>   (rs6000_option_override_internal): Enable -fweb and -frename-registers
>   with -funroll-loops.


Re: [GCC][PATCH][AArch64] ACLE intrinsics bfmmla and bfmlal for AArch64 AdvSIMD

2019-12-23 Thread Richard Sandiford
Thanks for the patch, looks good.
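For anyone reading along, a quick usage sketch of one of the new intrinsics
(my example; the signature follows the ACLE,
float32x4_t vbfmmlaq_f32 (float32x4_t, bfloat16x8_t, bfloat16x8_t)):

#include <arm_neon.h>

/* Multiply the 2x4 BF16 matrix in A by the 4x2 BF16 matrix in B and
   accumulate the 2x2 FP32 result into ACC.  Build with bf16 enabled,
   e.g. -march=armv8.2-a+bf16.  */
float32x4_t
mmla_acc (float32x4_t acc, bfloat16x8_t a, bfloat16x8_t b)
{
  return vbfmmlaq_f32 (acc, a, b);
}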

Delia Burduv  writes:
> This patch adds the ARMv8.6 ACLE intrinsics for bfmmla, bfmlalb and bfmlalt 
> as part of the BFloat16 extension.
> (https://developer.arm.com/architectures/instruction-sets/simd-isas/neon/intrinsics)
> The intrinsics are declared in arm_neon.h and the RTL patterns are defined in 
> aarch64-simd.md.
> Two new tests are added to check assembler output.
>
> This patch depends on the two AArch64 back-end patches.
> (https://gcc.gnu.org/ml/gcc-patches/2019-12/msg01323.html and 
> https://gcc.gnu.org/ml/gcc-patches/2019-12/msg01324.html)
>
> Tested for regression on aarch64-none-elf and aarch64_be-none-elf. I don't 
> have commit rights, so if this is ok can someone please commit it for me?
>
> gcc/ChangeLog:
>
> 2019-10-29  Delia Burduv  
>
> * config/aarch64/aarch64-simd-builtins.def
>   (bfmmla): New built-in function.
>   (bfmlalb): New built-in function.
>   (bfmlalt): New built-in function.
>   (bfmlalb_lane): New built-in function.
>   (bfmlalt_lane): New built-in function.
>   (bfmlalb_laneq): New built-in function.
>   (bfmlalt_laneq): New built-in function
> * config/aarch64/aarch64-simd.md (bfmmla): New pattern.
>   (bfmlal): New patterns.
> * config/aarch64/arm_neon.h (vbfmmlaq_f32): New intrinsic.
>   (vbfmlalbq_f32): New intrinsic.
>   (vbfmlaltq_f32): New intrinsic.
>   (vbfmlalbq_lane_f32): New intrinsic.
>   (vbfmlaltq_lane_f32): New intrinsic.
>   (vbfmlalbq_laneq_f32): New intrinsic.
>   (vbfmlaltq_laneq_f32): New intrinsic.
> * config/aarch64/iterators.md (UNSPEC_BFMMLA): New UNSPEC.
>   (UNSPEC_BFMLALB): New UNSPEC.
>   (UNSPEC_BFMLALT): New UNSPEC.
>   (BF_MLA): New int iterator.
>   (bt): Added UNSPEC_BFMLALB, UNSPEC_BFMLALT.
> * config/arm/types.md (bf_mmla): New type.
>   (bf_mla): New type.
>
> gcc/testsuite/ChangeLog:
>
> 2019-10-29  Delia Burduv  
>
> * gcc.target/aarch64/advsimd-intrinsics/bfmlalbt-compile.c: New test.
> * gcc.target/aarch64/advsimd-intrinsics/bfmmla-compile.c: New test.
> * 
> gcc.target/aarch64/advsimd-intrinsics/vbfmlalbt_lane_f32_indices_1.c:
>   New test.

Formatting nit: continuation lines should only be indented by a tab,
rather than a tab and two spaces.  (I agree the above looks nicer,
but the policy is not to be flexible over this kind of thing...)

> diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def 
> b/gcc/config/aarch64/aarch64-simd-builtins.def
> index 
> f4ca35a59704c761fe2ac2b6d401fff7c8aba80d..5e9f50f090870d0c63916540a48f5ac132d2630d
>  100644
> --- a/gcc/config/aarch64/aarch64-simd-builtins.def
> +++ b/gcc/config/aarch64/aarch64-simd-builtins.def
> @@ -682,3 +682,14 @@
>BUILTIN_VSFDF (UNOP, frint32x, 0)
>BUILTIN_VSFDF (UNOP, frint64z, 0)
>BUILTIN_VSFDF (UNOP, frint64x, 0)
> +
> +  /* Implemented by aarch64_bfmmlaqv4sf  */
> +  VAR1 (TERNOP, bfmmlaq, 0, v4sf)
> +
> +  /* Implemented by aarch64_bfmlal{_lane{q}}v4sf  */
> +  VAR1 (TERNOP, bfmlalb, 0, v4sf)
> +  VAR1 (TERNOP, bfmlalt, 0, v4sf)
> +  VAR1 (QUADOP_LANE, bfmlalb_lane, 0, v4sf)
> +  VAR1 (QUADOP_LANE, bfmlalt_lane, 0, v4sf)
> +  VAR1 (QUADOP_LANE, bfmlalb_laneq, 0, v4sf)
> +  VAR1 (QUADOP_LANE, bfmlalt_laneq, 0, v4sf)
> diff --git a/gcc/config/aarch64/aarch64-simd.md 
> b/gcc/config/aarch64/aarch64-simd.md
> index 
> 55660ae248f4fa75d35ba2949cd4b9d5139ff5f5..66a6c4116a1fdd26dd4eec8b0609e28eb2c38fa1
>  100644
> --- a/gcc/config/aarch64/aarch64-simd.md
> +++ b/gcc/config/aarch64/aarch64-simd.md
> @@ -7027,3 +7027,57 @@
>"xtn\t%0., %1."
>[(set_attr "type" "neon_shift_imm_narrow_q")]
>  )
> +
> +;; bfmmla
> +(define_insn "aarch64_bfmmlaqv4sf"
> +  [(set (match_operand:V4SF 0 "register_operand" "=w")
> +(plus:V4SF (match_operand:V4SF 1 "register_operand" "0")
> +   (unspec:V4SF [(match_operand:V8BF 2 "register_operand" 
> "w")
> + (match_operand:V8BF 3 "register_operand" 
> "w")]
> +UNSPEC_BFMMLA)))]
> +  "TARGET_BF16_SIMD"
> +  "bfmmla\\t%0.4s, %2.8h, %3.8h"
> +  [(set_attr "type" "neon_mla_s_q")]

Looks like this should be neon_fp_mla_s_q instead.

> +)
> +
> +;; bfmlal
> +(define_insn "aarch64_bfmlalv4sf"
> +  [(set (match_operand:V4SF 0 "register_operand" "=w")
> +(plus: V4SF (match_operand:V4SF 1 "register_operand" "0")
> +(unspec:V4SF [(match_operand:V8BF 2 "register_operand" 
> "w")
> +  (match_operand:V8BF 3 "register_operand" 
> "w")]
> + BF_MLA)))]
> +  "TARGET_BF16_SIMD"
> +  "bfmlal\\t%0.4s, %2.8h, %3.8h"
> +  [(set_attr "type" "neon_fp_mla_s")]
> +)

Maybe we should have _q here too.  All the vectors are 128-bit vectors,
we just happen to ignore even or odd elements of the BF inputs.

> +(define_insn "aarch64_bfmlal_lanev4

Re: [GCC][PATCH][AArch64] ACLE intrinsics for BFCVTN, BFCVTN2 (AArch64 AdvSIMD) and BFCVT (AArch64 FP)

2019-12-23 Thread Richard Sandiford
Some of the comments on the BFMMLA/BFMLA[LT] patch apply here too.

Delia Burduv  writes:
> This patch adds the Armv8.6-a ACLE intrinsics for bfmmla, bfmlalb and 
> bfmlalt as part of the BFloat16 extension.

That's the other patch :-)

> [...]
> diff --git a/gcc/config/aarch64/aarch64-simd.md 
> b/gcc/config/aarch64/aarch64-simd.md
> index 
> 55660ae248f4fa75d35ba2949cd4b9d5139ff5f5..ff7a1f5f34a19b05eba48dba96c736dfdfdf7bac
>  100644
> --- a/gcc/config/aarch64/aarch64-simd.md
> +++ b/gcc/config/aarch64/aarch64-simd.md
> @@ -7027,3 +7027,32 @@
>"xtn\t%0., %1."
>[(set_attr "type" "neon_shift_imm_narrow_q")]
>  )
> +
> +;; bfcvtn
> +(define_insn "aarch64_bfcvtn"
> +  [(set (match_operand:V4SF_TO_BF 0 "register_operand" "=w")
> +(unspec:V4SF_TO_BF [(match_operand:V4SF 1 "register_operand" "w")]
> +UNSPEC_BFCVTN))]
> +  "TARGET_BF16_SIMD"
> +  "bfcvtn\\t%0.4h, %1.4s"
> +  [(set_attr "type" "f_cvt")]
> +)
> +

If I've understood the naming convention correctly, the closest type
seems to be "neon_fp_cvt_narrow_s_q".

> +(define_insn "aarch64_bfcvtn2v8bf"
> +  [(set (match_operand:V8BF 0 "register_operand" "=w")
> +(unspec:V8BF [(match_operand:V8BF 1 "register_operand" "w")
> +  (match_operand:V4SF 2 "register_operand" "w")]
> +  UNSPEC_BFCVTN2))]
> +  "TARGET_BF16_SIMD"
> +  "bfcvtn2\\t%0.8h, %2.4s"
> +  [(set_attr "type" "f_cvt")]
> +)

Same here.

The constraint on operand 1 needs to be "0", otherwise operands 1 and 0
could end up in different registers.  You could test for this using
something like:

bfloat16x8_t test_bfcvtnq2_untied (bfloat16x8_t unused, bfloat16x8_t inactive,
   float32x4_t a)
{
  return vcvtq_high_bf16_f32 (inactive, a);
}

which when compiled at -O should produce something like:

/*
**test_bfcvtnq2_untied:
**  mov v0\.8h, v1\.8h
**  bfcvtn2 v0\.8h, v2\.4s
**  ret
*/

(Completely untested, the code above is probably wrong.)

> +
> +(define_insn "aarch64_bfcvtbf"
> +  [(set (match_operand:BF 0 "register_operand" "=w")
> +(unspec:BF [(match_operand:SF 1 "register_operand" "w")]
> +UNSPEC_BFCVT))]
> +  "TARGET_BF16_SIMD"

I think this just needs the scalar macro rather than *_SIMD.

> +  "bfcvt\\t%h0, %s1"
> +  [(set_attr "type" "f_cvt")]
> +)
> diff --git a/gcc/config/aarch64/arm_bf16.h b/gcc/config/aarch64/arm_bf16.h
> index 
> aedb0972735ce549fac1870bacd1ef3101e8fd26..1b9ab3690d35e153cd4f24b9e3bbb5b4cc4b4f4d
>  100644
> --- a/gcc/config/aarch64/arm_bf16.h
> +++ b/gcc/config/aarch64/arm_bf16.h
> @@ -34,7 +34,15 @@
>  #ifdef __ARM_FEATURE_BF16_SCALAR_ARITHMETIC
>  
>  typedef __bf16 bfloat16_t;
> -
> +typedef float float32_t;
> +
> +__extension__ extern __inline bfloat16_t
> +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> +vcvth_bf16_f32 \
> +  (float32_t __a)

No need for the line break here.

> +{
> +  return __builtin_aarch64_bfcvtbf (__a);
> +}
>  
>  #endif
>  #pragma GCC pop_options
> diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
> index 
> 6cdbf381f0156ed993f03b847228b36ebbdd14f8..120f4b7d8827aee51834e75aeaa6ab8f8451980e
>  100644
> --- a/gcc/config/aarch64/arm_neon.h
> +++ b/gcc/config/aarch64/arm_neon.h
> @@ -34610,6 +34610,35 @@ vrnd64xq_f64 (float64x2_t __a)
>  
>  #include "arm_bf16.h"
>  
> +#pragma GCC push_options
> +#pragma GCC target ("arch=armv8.2-a+bf16")
> +#ifdef __ARM_FEATURE_BF16_VECTOR_ARITHMETIC
> +
> +__extension__ extern __inline bfloat16x4_t
> +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> +vcvt_bf16_f32 (float32x4_t __a)
> +{
> +  return __builtin_aarch64_bfcvtnv4bf (__a);
> +
> +}

Nit: extra blank line.

> +
> +__extension__ extern __inline bfloat16x8_t
> +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> +vcvtq_low_bf16_f32 (float32x4_t __a)
> +{
> +  return __builtin_aarch64_bfcvtn_qv8bf (__a);
> +}
> +
> +__extension__ extern __inline bfloat16x8_t
> +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> +vcvtq_high_bf16_f32 (bfloat16x8_t __inactive, float32x4_t __a)
> +{
> +  return __builtin_aarch64_bfcvtn2v8bf (__inactive, __a);
> +}
> +
> +#endif
> +#pragma GCC pop_options
> +
>  #pragma GCC pop_options
>  
>  #undef __aarch64_vget_lane_any
> diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
> index 
> 931166da5e47302afe810498eea9c8c2ab89b9de..f9f0bafb1eca4da42e564224fca1fd43d89f6ed1
>  100644
> --- a/gcc/config/aarch64/iterators.md
> +++ b/gcc/config/aarch64/iterators.md
> @@ -431,6 +431,9 @@
>  ;; SVE predicate modes that control 16-bit, 32-bit or 64-bit elements.
>  (define_mode_iterator PRED_HSD [VNx8BI VNx4BI VNx2BI])
>  
> +;; Bfloat16 modes to which V4SF can be converted
> +(define_mode_iterator V4SF_TO_BF [V4BF V8BF])
> +
>  ;; --
>  ;; Unspec enumerations for Advance SIMD. The

Re: [PATCH 09/13] OpenACC 2.6 deep copy: C and C++ front-end parts

2019-12-23 Thread Thomas Schwinge
Hi!

On 2019-12-17T22:03:49-0800, Julian Brown  wrote:
> --- /dev/null
> +++ b/gcc/testsuite/c-c++-common/goacc/mdc-1.c
> @@ -0,0 +1,55 @@
> +/* Test OpenACC's support for manual deep copy, including the attach
> +   and detach clauses.  */
> +
> +/* { dg-do compile { target int32 } } */
> +/* { dg-additional-options "-fdump-tree-omplower" } */
> +
> +void
> +t1 ()
> +{
> +  struct foo {
> +int *a, *b, c, d, *e;
> +  } s;
> +
> +  int *a, *z;

These data types...

> +#pragma acc enter data copyin(s)
> +[...]

..., and these uses...

> +/* { dg-final { scan-tree-dump-times "pragma omp target oacc_enter_exit_data map.to:s .len: 32.." 1 "omplower" } } */
> +[...]

..., and these tree dump scanning directives don't match up: a lot of
FAILs for anything that doesn't use 64-bit pointers, such as x86_64
GNU/Linux's '-m32' multilib.  This will need further tweaking to enable
tree dump scanning for all configurations, but for now, see attached
"Restrict 'c-c++-common/goacc/mdc-1.c' to LP64, LLP64"; committed to
trunk in r279720.
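(For reference, assuming the usual ABIs: with 8-byte pointers the test's
struct foo { int *a, *b, c, d, *e; } occupies 3*8 + 2*4 = 32 bytes, matching
the "len: 32" scan; with 4-byte pointers it is only 3*4 + 2*4 = 20 bytes, so
the directive can never match.)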


Grüße
 Thomas


From f8f558f90d9fe859e44258103486389d026321fa Mon Sep 17 00:00:00 2001
From: tschwinge 
Date: Mon, 23 Dec 2019 20:20:29 +
Subject: [PATCH] Restrict 'c-c++-common/goacc/mdc-1.c' to LP64, LLP64

The tree dump scanning has certain expectations.

	gcc/testsuite/
	* c-c++-common/goacc/mdc-1.c: Restrict to LP64, LLP64.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@279720 138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/testsuite/ChangeLog  | 4 
 gcc/testsuite/c-c++-common/goacc/mdc-1.c | 3 ++-
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index 59fc56447ac..1a659dba269 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,7 @@
+2019-12-23  Thomas Schwinge  
+
+	* c-c++-common/goacc/mdc-1.c: Restrict to LP64, LLP64.
+
 2019-12-23  Richard Sandiford  
 
 	PR c++/92789
diff --git a/gcc/testsuite/c-c++-common/goacc/mdc-1.c b/gcc/testsuite/c-c++-common/goacc/mdc-1.c
index 6c6a81ea73a..fb5841a709d 100644
--- a/gcc/testsuite/c-c++-common/goacc/mdc-1.c
+++ b/gcc/testsuite/c-c++-common/goacc/mdc-1.c
@@ -1,7 +1,8 @@
 /* Test OpenACC's support for manual deep copy, including the attach
and detach clauses.  */
 
-/* { dg-do compile { target int32 } } */
+/* TODO The tree dump scanning has certain expectations.
+   { dg-do compile { target { lp64 || llp64 } } } */
 /* { dg-additional-options "-fdump-tree-omplower" } */
 
 void
-- 
2.17.1





libgo patch committed: Hurd portability patches

2019-12-23 Thread Ian Lance Taylor
This libgo patch by Svante Signell adds some Hurd portability patches
to libgo.  These are from GCC PR 93020.  Bootstrapped and ran Go
testsuite on x86_64-pc-linux-gnu.  Committed to mainline.

Ian
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 279398)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-85641a0f26061f7c98db42a2adb3250c07ce504e
+393957c8b68e370504209eb901aa0c3874e256d4
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: libgo/go/internal/poll/errno_unix.go
===
--- libgo/go/internal/poll/errno_unix.go(revision 279398)
+++ libgo/go/internal/poll/errno_unix.go(working copy)
@@ -2,7 +2,7 @@
 // Use of this source code is governed by a BSD-style
 // license that can be found in the LICENSE file.
 
-// +build aix darwin dragonfly freebsd linux netbsd openbsd solaris
+// +build aix darwin dragonfly freebsd hurd linux netbsd openbsd solaris
 
 package poll
 
Index: libgo/go/os/export_unix_test.go
===
--- libgo/go/os/export_unix_test.go (revision 279398)
+++ libgo/go/os/export_unix_test.go (working copy)
@@ -2,7 +2,7 @@
 // Use of this source code is governed by a BSD-style
 // license that can be found in the LICENSE file.
 
-// +build aix darwin dragonfly freebsd js,wasm linux nacl netbsd openbsd solaris
+// +build aix darwin dragonfly freebsd hurd js,wasm linux nacl netbsd openbsd solaris
 
 package os
 
Index: libgo/go/runtime/os_hurd.go
===
--- libgo/go/runtime/os_hurd.go (revision 279398)
+++ libgo/go/runtime/os_hurd.go (working copy)
@@ -112,7 +112,7 @@ func semawakeup(mp *m) {
 }
 
 func getncpu() int32 {
-   n := int32(sysconf(_SC_NPROCESSORS_ONLN))
+   n := int32(sysconf(__SC_NPROCESSORS_ONLN))
if n < 1 {
return 1
}
Index: libgo/go/syscall/export_unix_test.go
===
--- libgo/go/syscall/export_unix_test.go(revision 279398)
+++ libgo/go/syscall/export_unix_test.go(working copy)
@@ -2,7 +2,7 @@
 // Use of this source code is governed by a BSD-style
 // license that can be found in the LICENSE file.
 
-// +build aix darwin dragonfly freebsd linux netbsd openbsd solaris
+// +build aix darwin dragonfly freebsd hurd linux netbsd openbsd solaris
 
 package syscall
 


[PATCH] Remove redundant builtins for avx512f scalar instructions.

2019-12-23 Thread Hongyu Wang
Hi:
  For avx512f scalar instructions, current builtin functions like
__builtin_ia32_*{sd,ss}_round can be replaced by
__builtin_ia32_*{sd,ss}_mask_round with the mask parameter set to -1.  This
patch does the replacement and removes the corresponding redundant
builtins.
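
As an illustration of the change, a simplified sketch of the
avx512fintrin.h side (the exact casts and the ignored-source operand are
from memory and may differ from the actual patch):

/* Before: a dedicated rounding builtin.  */
extern __inline __m128d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
_mm_add_round_sd (__m128d __A, __m128d __B, const int __R)
{
  return (__m128d) __builtin_ia32_addsd_round ((__v2df) __A,
                                               (__v2df) __B, __R);
}

/* After: reuse the masked builtin with an all-ones mask, so the dedicated
   builtin can be removed.  */
extern __inline __m128d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
_mm_add_round_sd (__m128d __A, __m128d __B, const int __R)
{
  return (__m128d) __builtin_ia32_addsd_mask_round ((__v2df) __A,
                                                    (__v2df) __B,
                                                    (__v2df) _mm_setzero_pd (),
                                                    (__mmask8) -1, __R);
}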

  Bootstrap is OK, and make check is OK for the i386 target.
  Ok for trunk?

Changelog

gcc/
* config/i386/avx512fintrin.h
(_mm_add_round_sd, _mm_add_round_ss): Use
 __builtin_ia32_adds?_mask_round builtins instead of
__builtin_ia32_adds?_round.
(_mm_sub_round_sd, _mm_sub_round_ss,
_mm_mul_round_sd, _mm_mul_round_ss,
_mm_div_round_sd, _mm_div_round_ss,
_mm_getexp_sd, _mm_getexp_ss,
_mm_getexp_round_sd, _mm_getexp_round_ss,
_mm_getmant_sd, _mm_getmant_ss,
_mm_getmant_round_sd, _mm_getmant_round_ss,
_mm_max_round_sd, _mm_max_round_ss,
_mm_min_round_sd, _mm_min_round_ss,
_mm_fmadd_round_sd, _mm_fmadd_round_ss,
_mm_fmsub_round_sd, _mm_fmsub_round_ss,
_mm_fnmadd_round_sd, _mm_fnmadd_round_ss,
_mm_fnmsub_round_sd, _mm_fnmsub_round_ss): Likewise.
* config/i386/i386-builtin.def
(__builtin_ia32_addsd_round, __builtin_ia32_addss_round,
__builtin_ia32_subsd_round, __builtin_ia32_subss_round,
__builtin_ia32_mulsd_round, __builtin_ia32_mulss_round,
__builtin_ia32_divsd_round, __builtin_ia32_divss_round,
__builtin_ia32_getexpsd128_round, __builtin_ia32_getexpss128_round,
__builtin_ia32_getmantsd_round, __builtin_ia32_getmantss_round,
__builtin_ia32_maxsd_round, __builtin_ia32_maxss_round,
__builtin_ia32_minsd_round, __builtin_ia32_minss_round,
__builtin_ia32_vfmaddsd3_round,
__builtin_ia32_vfmaddss3_round): Remove.
* config/i386/i386-expand.c
(ix86_expand_round_builtin): Remove corresponding case.

gcc/testsuite/
* lib/target-supports.exp
(check_effective_target_avx512f): Use
__builtin_ia32_getmantsd_mask_round builtins instead of
__builtin_ia32_getmantsd_round.
* gcc.target/i386/avx-1.c
(__builtin_ia32_addsd_round, __builtin_ia32_addss_round,
__builtin_ia32_subsd_round, __builtin_ia32_subss_round,
__builtin_ia32_mulsd_round, __builtin_ia32_mulss_round,
__builtin_ia32_divsd_round, __builtin_ia32_divss_round,
__builtin_ia32_getexpsd128_round, __builtin_ia32_getexpss128_round,
__builtin_ia32_getmantsd_round, __builtin_ia32_getmantss_round,
__builtin_ia32_maxsd_round, __builtin_ia32_maxss_round,
__builtin_ia32_minsd_round, __builtin_ia32_minss_round,
__builtin_ia32_vfmaddsd3_round,
__builtin_ia32_vfmaddss3_round): Remove.
* gcc.target/i386/sse-13.c: Ditto.
* gcc.target/i386/sse-23.c: Ditto.


Regards,
Hongyu Wang
From 9cc4928aad5770c53ff580f5c996092cdaf2f9ba Mon Sep 17 00:00:00 2001
From: hongyuw1 
Date: Wed, 18 Dec 2019 14:52:54 +
Subject: [PATCH] Remove redundant round builtins for avx512f scalar
 instructions

Changelog

gcc/
	* config/i386/avx512fintrin.h
	(_mm_add_round_sd, _mm_add_round_ss): Use
	 __builtin_ia32_adds?_mask_round builtins instead of
	__builtin_ia32_adds?_round.
	(_mm_sub_round_sd, _mm_sub_round_ss,
	_mm_mul_round_sd, _mm_mul_round_ss,
	_mm_div_round_sd, _mm_div_round_ss,
	_mm_getexp_sd, _mm_getexp_ss,
	_mm_getexp_round_sd, _mm_getexp_round_ss,
	_mm_getmant_sd, _mm_getmant_ss,
	_mm_getmant_round_sd, _mm_getmant_round_ss,
	_mm_max_round_sd, _mm_max_round_ss,
	_mm_min_round_sd, _mm_min_round_ss,
	_mm_fmadd_round_sd, _mm_fmadd_round_ss,
	_mm_fmsub_round_sd, _mm_fmsub_round_ss,
	_mm_fnmadd_round_sd, _mm_fnmadd_round_ss,
	_mm_fnmsub_round_sd, _mm_fnmsub_round_ss): Likewise.
	* config/i386/i386-builtin.def
	(__builtin_ia32_addsd_round, __builtin_ia32_addss_round,
	__builtin_ia32_subsd_round, __builtin_ia32_subss_round,
	__builtin_ia32_mulsd_round, __builtin_ia32_mulss_round,
	__builtin_ia32_divsd_round, __builtin_ia32_divss_round,
	__builtin_ia32_getexpsd128_round, __builtin_ia32_getexpss128_round,
	__builtin_ia32_getmantsd_round, __builtin_ia32_getmantss_round,
	__builtin_ia32_maxsd_round, __builtin_ia32_maxss_round,
	__builtin_ia32_minsd_round, __builtin_ia32_minss_round,
	__builtin_ia32_vfmaddsd3_round,
	__builtin_ia32_vfmaddss3_round): Remove.
	* config/i386/i386-expand.c
	(ix86_expand_round_builtin): Remove corresponding case.

gcc/testsuite/
	* lib/target-supports.exp
	(check_effective_target_avx512f): Use
	__builtin_ia32_getmantsd_mask_round builtins instead of
	__builtin_ia32_getmantsd_round.
	* gcc.target/i386/avx-1.c
	(__builtin_ia32_addsd_round, __builtin_ia32_addss_round,
	__builtin_ia32_subsd_round, __builtin_ia32_subss_round,
	__builtin_ia32_mulsd_round, __builtin_ia32_mulss_round,
	__builtin_ia32_divsd_round, __builtin_ia32_divss_round,
	__builtin_ia32_getexpsd128_round, __builtin_ia32_getexpss128_round,
	__builtin_ia32_getmantsd_round, __builtin_ia32_getmantss_round,
	__bui