[PATCH] Fix PIE on netbsd (PR target/87221)

2018-11-09 Thread coypu
Re-sending because my patch doesn't seem to appear on the archive


This matches what NetBSD is doing with its own copy of GCC;
it could probably be simpler.

PR target/87221:
	* config/netbsd-elf.h (NETBSD_STARTFILE_SPEC): Use crtbeginS.o for PIE.
	(NETBSD_ENDFILE_SPEC): Use crtendS.o for PIE.

---
 gcc/config/netbsd-elf.h | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/gcc/config/netbsd-elf.h b/gcc/config/netbsd-elf.h
index 4dc2aa757..26e5d996e 100644
--- a/gcc/config/netbsd-elf.h
+++ b/gcc/config/netbsd-elf.h
@@ -40,8 +40,11 @@ along with GCC; see the file COPYING3.  If not see
%{!p:crt0%O%s}}}\
%:if-exists(crti%O%s)   \
%{static:%:if-exists-else(crtbeginT%O%s crtbegin%O%s)} \
-   %{!static: \
- %{!shared:crtbegin%O%s} %{shared:crtbeginS%O%s}}"
+   %{!static:   \
+ %{!shared: \
+   %{!pie:crtbegin%O%s} \
+   %{pie:crtbeginS%O%s}}\
+ %{shared:crtbeginS%O%s}}"
 
 #undef STARTFILE_SPEC
 #define STARTFILE_SPEC NETBSD_STARTFILE_SPEC
@@ -52,7 +55,10 @@ along with GCC; see the file COPYING3.  If not see
C++ file-scope static objects deconstructed after exiting "main".  */
 
 #define NETBSD_ENDFILE_SPEC\
-  "%{!shared:crtend%O%s} %{shared:crtendS%O%s} \
+  "%{!shared:   \
+%{!pie:crtend%O%s}  \
+%{pie:crtendS%O%s}} \
+   %{shared:crtendS%O%s}\
%:if-exists(crtn%O%s)"
 
 #undef ENDFILE_SPEC
-- 
2.19.1


Fix handling of TYPE_USER_ALIGN

2018-11-09 Thread Jan Hubicka
Hi,
this patch fixes handling of TYPE_USER_ALIGN.  I tried to drop it
completely from LTO streaming and clear it in free lang data but it
breaks one ubsan testcase:

FAIL: c-c++-common/ubsan/pr63802.c   -O2 -flto -fno-use-linker-plugin
-flto-partition=none  output pattern test
FAIL: c-c++-common/ubsan/pr63802.c   -O2 -flto -fuse-linker-plugin
-fno-fat-lto-objects  output pattern test

lto-bootstrapped/regtested x86_64-linux, OK?

Honza

* tree.c (fld_type_variant_equal_p): Test user align flag.
	(fld_type_variant): Copy user align flag.
(fld_incomplete_type_of): Clear it.
Index: tree.c
===================================================================
--- tree.c  (revision 265914)
+++ tree.c  (working copy)
@@ -5109,7 +5109,8 @@ fld_type_variant_equal_p (tree t, tree v
   /* We want to match incomplete variants with complete types.
 In this case we need to ignore alignment.   */
   || ((!RECORD_OR_UNION_TYPE_P (t) || COMPLETE_TYPE_P (v))
- && TYPE_ALIGN (t) != TYPE_ALIGN (v))
+ && (TYPE_ALIGN (t) != TYPE_ALIGN (v)
+ || TYPE_USER_ALIGN (t) != TYPE_USER_ALIGN (v)))
   || fld_simplified_type_name (t) != fld_simplified_type_name (v)
   || !attribute_list_equal (TYPE_ATTRIBUTES (t),
TYPE_ATTRIBUTES (v)))
@@ -5140,7 +5141,10 @@ fld_type_variant (tree first, tree t, st
   /* Variants of incomplete types should have alignment 
  set to BITS_PER_UNIT.  Do not copy the actual alignment.  */
   if (!RECORD_OR_UNION_TYPE_P (v) || COMPLETE_TYPE_P (v))
-SET_TYPE_ALIGN (v, TYPE_ALIGN (t));
+{
+  SET_TYPE_ALIGN (v, TYPE_ALIGN (t));
+  TYPE_USER_ALIGN (v) = TYPE_USER_ALIGN (t);
+}
   gcc_checking_assert (fld_type_variant_equal_p (t,v));
   add_tree_to_fld_list (v, fld);
   return v;
@@ -5194,6 +5198,7 @@ fld_incomplete_type_of (tree t, struct f
  TYPE_SIZE (copy) = NULL;
  SET_TYPE_MODE (copy, VOIDmode);
  SET_TYPE_ALIGN (copy, BITS_PER_UNIT);
+ TYPE_USER_ALIGN (copy) = 0;
  TYPE_SIZE_UNIT (copy) = NULL;
  TYPE_CANONICAL (copy) = TYPE_CANONICAL (t);
  TYPE_TYPELESS_STORAGE (copy) = 0;


Re: Fix handling of TYPE_USER_ALIGN

2018-11-09 Thread Richard Biener
On November 9, 2018 10:00:08 AM GMT+01:00, Jan Hubicka  wrote:
>Hi,
>this patch fixes handling of TYPE_USER_ALIGN.  I tried to drop it
>completely from LTO streaming and clear it in free lang data but it
>breaks one ubsan testcase:
>
>FAIL: c-c++-common/ubsan/pr63802.c   -O2 -flto -fno-use-linker-plugin
>-flto-partition=none  output pattern test
>FAIL: c-c++-common/ubsan/pr63802.c   -O2 -flto -fuse-linker-plugin
>-fno-fat-lto-objects  output pattern test
>
>lto-bootstrapped/regtested x86_64-linux, OK?

OK. 

Richard. 

>Honza
>
>   * tree.c (fld_type_variant_equal_p): Test user align flag.
>   (fld_type_variant): Copy user align flag.
>   (fld_incomplete_type_of): Clear it.
>Index: tree.c
>===================================================================
>--- tree.c (revision 265914)
>+++ tree.c (working copy)
>@@ -5109,7 +5109,8 @@ fld_type_variant_equal_p (tree t, tree v
>   /* We want to match incomplete variants with complete types.
>In this case we need to ignore alignment.   */
>   || ((!RECORD_OR_UNION_TYPE_P (t) || COMPLETE_TYPE_P (v))
>-&& TYPE_ALIGN (t) != TYPE_ALIGN (v))
>+&& (TYPE_ALIGN (t) != TYPE_ALIGN (v)
>+|| TYPE_USER_ALIGN (t) != TYPE_USER_ALIGN (v)))
>   || fld_simplified_type_name (t) != fld_simplified_type_name (v)
>   || !attribute_list_equal (TYPE_ATTRIBUTES (t),
>   TYPE_ATTRIBUTES (v)))
>@@ -5140,7 +5141,10 @@ fld_type_variant (tree first, tree t, st
>   /* Variants of incomplete types should have alignment 
>  set to BITS_PER_UNIT.  Do not copy the actual alignment.  */
>   if (!RECORD_OR_UNION_TYPE_P (v) || COMPLETE_TYPE_P (v))
>-SET_TYPE_ALIGN (v, TYPE_ALIGN (t));
>+{
>+  SET_TYPE_ALIGN (v, TYPE_ALIGN (t));
>+  TYPE_USER_ALIGN (v) = TYPE_USER_ALIGN (t);
>+}
>   gcc_checking_assert (fld_type_variant_equal_p (t,v));
>   add_tree_to_fld_list (v, fld);
>   return v;
>@@ -5194,6 +5198,7 @@ fld_incomplete_type_of (tree t, struct f
> TYPE_SIZE (copy) = NULL;
> SET_TYPE_MODE (copy, VOIDmode);
> SET_TYPE_ALIGN (copy, BITS_PER_UNIT);
>+TYPE_USER_ALIGN (copy) = 0;
> TYPE_SIZE_UNIT (copy) = NULL;
> TYPE_CANONICAL (copy) = TYPE_CANONICAL (t);
> TYPE_TYPELESS_STORAGE (copy) = 0;



Re: [PATCH] Revert libsanitizer r318802 as we don't use Scudo allocator (PR sanitizer/87892).

2018-11-09 Thread Martin Liška
On 11/8/18 9:50 PM, Jeff Law wrote:
> On 11/8/18 1:48 PM, Jakub Jelinek wrote:
>> On Thu, Nov 08, 2018 at 01:43:29PM -0700, Jeff Law wrote:
>>> On 11/8/18 1:27 AM, Martin Liška wrote:
 libsanitizer/ChangeLog:

 2018-11-08  Martin Liska  

PR sanitizer/87892
* (all files): Revert upstream r318802.
>>> Is it causing a build failure or somesuch?  ie, why specifically are you
>>> wanting to remove it?
>>
>> Yes.  But perhaps it would be enough to just guard the CPU_COUNT use with
>> #ifdef CPU_COUNT and have some fallback, including say returning just 1.
>> If we care about Scudo or whatever is (what would be needed for that on the
>> compiler side?), then we'd need a proper implementation, one that doesn't
>> fail if a machine has more CPUs than fit into cpu_set_t, or if old glibc is
>> used and CPU_COUNT isn't defined, or even if the kernel doesn't have
>> affinity stuff at all.
> The obvious idea being to disable it with a more minimal patch making
> future merges easier -- if there's a less intrusive way to disable the
> bits, then that's fine with me.
> 
> jeff
> 

Ok, then I'm going to install following patch.

Martin
>From 40766a658e15019c8fd03f678a988ce53c2495b1 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Fri, 9 Nov 2018 09:44:15 +0100
Subject: [PATCH] Fallback in libsanitizer for scudo sanitizer (PR
 sanitizer/87892).

libsanitizer/ChangeLog:

2018-11-09  Martin Liska  

	PR sanitizer/87892
	* sanitizer_common/sanitizer_linux_libcdep.cc (GetNumberOfCPUs): Return
	1 when CPU_COUNT macro is not defined.
---
 libsanitizer/sanitizer_common/sanitizer_linux_libcdep.cc | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/libsanitizer/sanitizer_common/sanitizer_linux_libcdep.cc b/libsanitizer/sanitizer_common/sanitizer_linux_libcdep.cc
index 32f335eaf23..28360f5656a 100644
--- a/libsanitizer/sanitizer_common/sanitizer_linux_libcdep.cc
+++ b/libsanitizer/sanitizer_common/sanitizer_linux_libcdep.cc
@@ -694,9 +694,13 @@ u32 GetNumberOfCPUs() {
 #elif SANITIZER_SOLARIS
   return sysconf(_SC_NPROCESSORS_ONLN);
 #else
+#if defined(CPU_COUNT)
   cpu_set_t CPUs;
   CHECK_EQ(sched_getaffinity(0, sizeof(cpu_set_t), &CPUs), 0);
   return CPU_COUNT(&CPUs);
+#else
+  return 1;
+#endif
 #endif
 }
 
-- 
2.19.1



Re: [PATCH] Revert libsanitizer r318802 as we don't use Scudo allocator (PR sanitizer/87892).

2018-11-09 Thread Jakub Jelinek
On Fri, Nov 09, 2018 at 10:12:32AM +0100, Martin Liška wrote:
> Ok, then I'm going to install following patch.

Thanks.

> 2018-11-09  Martin Liska  
> 
>   PR sanitizer/87892
>   * sanitizer_common/sanitizer_linux_libcdep.cc (GetNumberOfCPUs): Return
>   1 when CPU_COUNT macro is not defined.
> ---
>  libsanitizer/sanitizer_common/sanitizer_linux_libcdep.cc | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/libsanitizer/sanitizer_common/sanitizer_linux_libcdep.cc 
> b/libsanitizer/sanitizer_common/sanitizer_linux_libcdep.cc
> index 32f335eaf23..28360f5656a 100644
> --- a/libsanitizer/sanitizer_common/sanitizer_linux_libcdep.cc
> +++ b/libsanitizer/sanitizer_common/sanitizer_linux_libcdep.cc
> @@ -694,9 +694,13 @@ u32 GetNumberOfCPUs() {
>  #elif SANITIZER_SOLARIS
>return sysconf(_SC_NPROCESSORS_ONLN);
>  #else
> +#if defined(CPU_COUNT)
>cpu_set_t CPUs;
>CHECK_EQ(sched_getaffinity(0, sizeof(cpu_set_t), &CPUs), 0);
>return CPU_COUNT(&CPUs);
> +#else
> +  return 1;
> +#endif
>  #endif
>  }
>  
> -- 
> 2.19.1
> 

Jakub


Re: [PATCH] x86: Optimize VFIXUPIMM* patterns with multiple-alternative constraints

2018-11-09 Thread Wei Xiao
Hi Uros

Thanks for the remarks!
I have improved the patch, as attached, to address the issues you mentioned:
1. No changes to substs any more.
2. Adopt the established approach (e.g. "rcp14") to handle zero masks.

I'd like to explain our motivation for combining the vfixupimm patterns: there
will be many new x86 instructions with both masking and rounding, like
vfixupimm, in the future, but we still want to keep the x86 MD as short as
possible and don't want to write two patterns for each of these new
instructions, which would also raise the code review cost for the maintainer.
With this patch we want to make sure the new pattern paradigm is acceptable to
the x86 maintainer.

Wei
Uros Bizjak  于2018年11月7日周三 下午3:24写道:
>
> On Tue, Nov 6, 2018 at 11:16 AM Wei Xiao  wrote:
> >
> > Hi maintainers,
> >
> > The attached patch intends to optimize VFIXUPIMM* patterns with
> > multiple-alternative constraints and
> > 4 patterns are combined into 2 patterns. Tested with bootstrap and
> > regression tests on x86_64. No regressions.
> >
> > Is it OK for trunk?
>
> I'm not convinced that this particular optimization is a good idea.
> Looking at the patch, you have to add a whole bunch of substs just to
> merge two pattern sets. Also, the approach diverges from established
> approach of handling zero masks. The later raises maintenance costs
> for no compelling reason.
>
> I'd say to leave these patterns the way they are.
>
> Uros.


combine-vfixupimm-v2.diff
Description: Binary data


Re: Implement {get,set}_range_info() variants that work with value_range's

2018-11-09 Thread Aldy Hernandez




On 11/8/18 4:47 PM, Martin Sebor wrote:

On 11/08/2018 04:52 AM, Aldy Hernandez wrote:

get/set_range_info() currently returns the extremes of a range.  I have
implemented overloaded variants that return a proper range.  In the
future we should use actual ranges throughout, and not depend on range
extremes, as depending on this behavior could cause us to lose
precision.


I am also including changes to size_must_be_zero_p() to show how we
should be using the range API, as opposed to performing error prone
ad-hoc calculations on ranges and anti-ranges.

Martin, I'm not saying your code is wrong.  There are numerous other
places in the compiler where we manipulate ranges/anti-ranges directly,
all of which should be adapted in the future.  Everywhere there is a
mention of VR_RANGE/VR_ANTI_RANGE in the compiler is suspect.  We should
ideally be using intersect/union/may_contain_p/null_p, etc.

OK pending another round of tests?  (As I had tested this patch along
with a whole slew of other upcoming changes ;-)).


I'm all for simplification! :)

I'd love it if the API made it possible to write these expressions
using native operators and intuitive names, without having to worry
about details like what types they are represented in or how to
convert between them.  It would be great if a function like
size_must_be_zero_p could be coded along the lines similar to:

   return intersect (size, value_range (ssize_type_node) - 0) == 0;

It shouldn't have to twiddle bits to compute the wide_int value
of SSIZE_MAX or wrestle with convertibility warts by calling
build_int_cst and wide_int_to_tree.

Martin

PS I think we should be able to get close to the syntax above
with a few value_range ctors:

   value_range::value_range (tree type_or_expression);


Would this create the range for [type, type] or [-MIN, type] or what?



   template <typename IntegerType>
   value_range::value_range (const IntegerType&);

a non-member minus operator to compute the difference:

   value_range operator- (const value_range&, const value_range&);

a non-member intersect function like this:

   value_range intersect (const value_range&, const value_range&);

and a non-member equality operator:

   bool operator== (const value_range&, const value_range&);

With the above, because a tree and any integer (including wide_int,
offset_int, and all the other flavors du jour) implicitly convert
to a value_range (possibly representing a single value),
the subtraction, intersection, and equality would apply to any and
all of those types as well without callers having to spell out or
even know what representation they're dealing with.


I like it.

Aldy


Re: [PATCH] x86: Optimize VFIXUPIMM* patterns with multiple-alternative constraints

2018-11-09 Thread Uros Bizjak
On Fri, Nov 9, 2018 at 10:54 AM Wei Xiao  wrote:
>
> Hi Uros
>
> Thanks for the remarks!
> I improve the patch as attached to address the issues you mentioned:
> 1. No changes to substs any more.
> 2. Adopt established approach (e.g "rcp14") to
> handle zero masks.
>
> I'd like to explain our motivation of combining vfixupimm patterns: there will
> be a lot of new x86 instructions with both masking and rounding like vfixupimm
> in the future but we still want to keep x86 MD as short as possible and don't
> want to write 2 patterns for each of these new instructions, which will also
> raise code review cost for maintainer. We want to make sure the new pattern
> paradigm is ok for x86 maintainer through this patch.

Yes, the patch looks much nicer now.

+2018-11-09 Wei Xiao 
+ * config/i386/sse.md: Combine VFIXUPIMM* patterns.
+ (_fixupimm_maskz): Update.
+ (_fixupimm): Update.
+ (_fixupimm_mask): Remove.
+ (avx512f_sfixupimm_maskz): Update.
+ (avx512f_sfixupimm): Update.
+ (avx512f_sfixupimm_mask): Remove.

(In future, please add the ChangeLog entry to the text of the mail.)

OK for mainline.

Thanks,
Uros.


Re: [PR86438] compare-elim: cope with set of in_b

2018-11-09 Thread Alexandre Oliva
On Nov  8, 2018, Jeff Law  wrote:

>> PR rtl-optimization/86438
>> * gcc.dg/torture/pr86438.c: New.
> OK.

Thanks.  I ended up tweaking the test a bit further, when it occurred to
me that I might be able to trigger the same problem with -m32, but no
such luck.  Here's the test I'm installing, instead of the one I'd
posted.

diff --git a/gcc/testsuite/gcc.dg/torture/pr86438.c 
b/gcc/testsuite/gcc.dg/torture/pr86438.c
new file mode 100644
index ..3e95515ae6a6
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr86438.c
@@ -0,0 +1,29 @@
+/* { dg-do run } */
+
+typedef unsigned int u32;
+#if __SIZEOF_INT128__
+typedef unsigned long long u64;
+typedef unsigned __int128 u128;
+#else
+typedef unsigned long u64;
+typedef unsigned long long u128;
+#endif
+
+u128 g;
+
+static __attribute__ ((noinline, noclone))
+void check (u64 a, u64 b)
+{
+  if (a != 0 || b != 4)
+__builtin_abort ();
+}
+
+int
+main (void)
+{
+  u64 d = (g ? 5 : 4);
+  u32 f = __builtin_sub_overflow_p (d, (u128) d, (u64) 0);
+  u128 x = g + f + d;
+  check (x >> (sizeof (u64) * __CHAR_BIT__), x);
+  return 0;
+}


-- 
Alexandre Oliva, freedom fighter   https://FSFLA.org/blogs/lxo
Be the change, be Free! FSF Latin America board member
GNU Toolchain EngineerFree Software Evangelist
Hay que enGNUrecerse, pero sin perder la terGNUra jamás-GNUChe


Re: [PATCH] combine: Do not combine moves from hard registers

2018-11-09 Thread Richard Earnshaw (lists)
On 07/11/2018 23:02, Segher Boessenkool wrote:
> Hi!
> 
> On Mon, Nov 05, 2018 at 06:16:16PM +, Renlin Li wrote:
>> Sorry, this is not correct. Instructions scheduled between x and x+1 
>> directly use hard register r1.
>> It is not IRA/LRA assigning r1 to the operands.
>>
>>
>> To reproduce this particular case, you could use:
>> cc1  -O3 -marm -march=armv7-a -mfpu=vfpv3-d16 -mfloat-abi=softfp 
>> gcc.c-torture/execute/builtins/memcpy-chk.c
>>
>> This insn is been splitted.
>>
>> (insn 152 150 154 11 (set (mem/c:QI (plus:SI (reg/f:SI 266)
>> (const_int 24 [0x18])) [0 MEM[(void *)&p + 20B]+4 S1 A32])
>> (reg:QI 1 r1)) "memcpy-chk-reduce.c":48:3 189 {*arm_movqi_insn}
>>  (expr_list:REG_DEAD (reg:QI 1 r1)
>> (nil)))
> 
> Okay, I see what is going on.  The arm port often expands movmem to use
> hard registers (so that it can use load/store multiple?).  This does not
> work well with scheduling and RA, and this combine feature makes it worse.
> 

Yes, GCC doesn't have any way of describing the constraint "I need 
registers in ascending order of register number".  So the only way
around this is to nail down which registers are used.  In practice this
has worked pretty well for the last 20 years or so.

R.

> I don't know what to do about it in combine.
> 
> 
> Segher
> 



Re: [C++ PATCH 1/3] Fix various latent issues revealed by P0732 work.

2018-11-09 Thread Matthew Malcomson
(Third attempt to put this on the mailing list now -- today is not a good day 
for my email skills :-[)

Hi there,

This patch has caused a few g++ and libstdc++ regression test failures on arm,

I've included the g++ failures below.

Do you mind looking into this?
Cheers,

Matthew


dg-cmp-results.sh: Verbosity is 2, Variant is ""

Older log file: just-before-problemcase-g++.sum
Test Run By matthew on Thu Nov  8 16:45:40 2018
Target is arm-none-eabi
Host   is x86_64-pc-linux-gnu

Newer log file: g++-after-problem.sum
Test Run By matthew on Thu Nov  8 21:36:14 2018
Target is arm-none-eabi
Host   is x86_64-pc-linux-gnu

PASS->FAIL: g++.dg/cpp0x/constexpr-static12.C  -std=c++14 scan-assembler-not 
_ZNSt10unique_ptrC1Ei
PASS->FAIL: g++.dg/cpp0x/constexpr-static12.C  -std=c++17 scan-assembler-not 
_ZNSt10unique_ptrC1Ei
PASS->FAIL: g++.dg/cpp0x/constexpr-static3.C  -std=gnu++14 scan-assembler-not 
static_initialization
PASS->FAIL: g++.dg/cpp0x/constexpr-static3.C  -std=gnu++17 scan-assembler-not 
static_initialization
PASS->FAIL: g++.dg/cpp0x/constexpr-static.C  -std=gnu++14 scan-assembler-not 
static_initialization
PASS->FAIL: g++.dg/cpp0x/constexpr-static.C  -std=gnu++17 scan-assembler-not 
static_initialization
PASS->FAIL: g++.dg/cpp0x/implicit13.C  -std=c++14 scan-assembler-not _ZN1BC1Ev
PASS->FAIL: g++.dg/cpp0x/implicit13.C  -std=c++17 scan-assembler-not _ZN1BC1Ev
NA->FAIL: g++.dg/cpp0x/pr83734.C  -std=gnu++14 (internal compiler error)
PASS->FAIL: g++.dg/cpp0x/pr83734.C  -std=gnu++14 (test for excess errors)
NA->FAIL: g++.dg/cpp0x/pr83734.C  -std=gnu++17 (internal compiler error)
PASS->FAIL: g++.dg/cpp0x/pr83734.C  -std=gnu++17 (test for excess errors)
PASS->FAIL: g++.dg/cpp1y/constexpr-empty3.C  -std=c++14 scan-assembler-not 
static_init
PASS->FAIL: g++.dg/cpp1y/constexpr-empty3.C  -std=c++17 scan-assembler-not 
static_init


On 05/11/18 02:06, Jason Merrill wrote:
The initialized_type hunk fixes handling of void AGGR_INIT_EXPRs that call a
non-constructor; an AGGR_INIT_EXPR can have void type if its initialization
semantics are more complicated than just expanding the call.

The cxx_eval_vec_init_1 hunk corrects AGGR_INIT_EXPRs that were
nonsensically built to initialize an object of void type.  And the
build_aggr_init_expr hunk makes sure we don't do that again.

The ocp_convert and cxx_eval_outermost_constant_expr hunks deal with making
sure that a constant CONSTRUCTOR has the right type.

Tested x86_64-pc-linux-gnu, applying to trunk.

* cvt.c (ocp_convert): Don't wrap a CONSTRUCTOR in a NOP_EXPR.
* constexpr.c (initialized_type): Fix AGGR_INIT_EXPR handling.
(cxx_eval_vec_init_1): Correct type of AGGR_INIT_EXPR.
(cxx_eval_outermost_constant_expr): Make sure a CONSTRUCTOR has the
right type.  Don't wrap a CONSTRUCTOR if one was passed in.
* tree.c (build_aggr_init_expr): Check for void.
---
  gcc/cp/constexpr.c | 22 +-
  gcc/cp/cvt.c   | 10 +-
  gcc/cp/tree.c  |  2 ++
  gcc/cp/ChangeLog   |  7 +++
  4 files changed, 31 insertions(+), 10 deletions(-)

diff --git a/gcc/cp/constexpr.c b/gcc/cp/constexpr.c
index 7692b1727da..4fb1ba527e3 100644
--- a/gcc/cp/constexpr.c
+++ b/gcc/cp/constexpr.c
@@ -2778,8 +2778,10 @@ initialized_type (tree t)
  {
if (TYPE_P (t))
  return t;
-  tree type = cv_unqualified (TREE_TYPE (t));
-  if (TREE_CODE (t) == CALL_EXPR || TREE_CODE (t) == AGGR_INIT_EXPR)
+  tree type = TREE_TYPE (t);
+  if (!VOID_TYPE_P (type))
+/* No need to look deeper.  */;
+  else if (TREE_CODE (t) == CALL_EXPR)
  {
/* A constructor call has void type, so we need to look deeper.  */
tree fn = get_function_named_in_call (t);
@@ -2787,7 +2789,9 @@ initialized_type (tree t)
&& DECL_CXX_CONSTRUCTOR_P (fn))
  type = DECL_CONTEXT (fn);
  }
-  return type;
+  else if (TREE_CODE (t) == AGGR_INIT_EXPR)
+type = TREE_TYPE (AGGR_INIT_EXPR_SLOT (t));
+  return cv_unqualified (type);
  }
/* We're about to initialize element INDEX of an array or class from VALUE.
@@ -3000,7 +3004,7 @@ cxx_eval_vec_init_1 (const constexpr_ctx *ctx, tree 
atype, tree init,
  &argvec, elttype, LOOKUP_NORMAL,
  complain);
release_tree_vector (argvec);
-  init = build_aggr_init_expr (TREE_TYPE (init), init);
+  init = build_aggr_init_expr (elttype, init);
pre_init = true;
  }
  @@ -5089,7 +5093,7 @@ cxx_eval_outermost_constant_expr (tree t, bool 
allow_non_constant,
  r = build_nop (TREE_TYPE (r), r);
TREE_CONSTANT (r) = false;
  }
-  else if (non_constant_p || r == t)
+  else if (non_constant_p)
  return t;
  if (should_unshare)
@@ -5097,18 +5101,18 @@ cxx_eval_outermost_constant_expr (tree t, bool 
allow_non_constant,
  if (TREE_CODE (r) == CONSTRUCTOR && CLASS_TYPE_P (TREE_TYPE (r)))
  {
+  r = adjust_temp_type (type, r);
if (TREE_CODE (t) == TARGET_EXPR
&& TARGET_EXPR_INITIAL (t) == r)
  return 

Re: [C++ PATCH 1/3] Fix various latent issues revealed by P0732 work.

2018-11-09 Thread Kyrill Tkachov



On 09/11/18 10:28, Matthew Malcomson wrote:

(Third attempt to put this on the mailing list now -- today is not a good day 
for my email skills :-[)

Hi there,

This patch has caused a few g++ and libstdc++ regression test failures on arm,

I've included the g++ failures below.

Do you mind looking into this?
Cheers,

Matthew


dg-cmp-results.sh: Verbosity is 2, Variant is ""

Older log file: just-before-problemcase-g++.sum
Test Run By matthew on Thu Nov  8 16:45:40 2018
Target is arm-none-eabi
Host   is x86_64-pc-linux-gnu

Newer log file: g++-after-problem.sum
Test Run By matthew on Thu Nov  8 21:36:14 2018
Target is arm-none-eabi
Host   is x86_64-pc-linux-gnu

PASS->FAIL: g++.dg/cpp0x/constexpr-static12.C  -std=c++14 scan-assembler-not 
_ZNSt10unique_ptrC1Ei
PASS->FAIL: g++.dg/cpp0x/constexpr-static12.C  -std=c++17 scan-assembler-not 
_ZNSt10unique_ptrC1Ei
PASS->FAIL: g++.dg/cpp0x/constexpr-static3.C -std=gnu++14 scan-assembler-not 
static_initialization
PASS->FAIL: g++.dg/cpp0x/constexpr-static3.C -std=gnu++17 scan-assembler-not 
static_initialization
PASS->FAIL: g++.dg/cpp0x/constexpr-static.C  -std=gnu++14 scan-assembler-not 
static_initialization
PASS->FAIL: g++.dg/cpp0x/constexpr-static.C  -std=gnu++17 scan-assembler-not 
static_initialization
PASS->FAIL: g++.dg/cpp0x/implicit13.C  -std=c++14 scan-assembler-not _ZN1BC1Ev
PASS->FAIL: g++.dg/cpp0x/implicit13.C  -std=c++17 scan-assembler-not _ZN1BC1Ev
NA->FAIL: g++.dg/cpp0x/pr83734.C  -std=gnu++14 (internal compiler error)
PASS->FAIL: g++.dg/cpp0x/pr83734.C  -std=gnu++14 (test for excess errors)
NA->FAIL: g++.dg/cpp0x/pr83734.C  -std=gnu++17 (internal compiler error)
PASS->FAIL: g++.dg/cpp0x/pr83734.C  -std=gnu++17 (test for excess errors)
PASS->FAIL: g++.dg/cpp1y/constexpr-empty3.C  -std=c++14 scan-assembler-not 
static_init
PASS->FAIL: g++.dg/cpp1y/constexpr-empty3.C  -std=c++17 scan-assembler-not 
static_init




I believe these are tracked in 
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87935 FWIW.

Kyrill


On 05/11/18 02:06, Jason Merrill wrote:
The initialized_type hunk fixes handling of void AGGR_INIT_EXPRs that call a
non-constructor; an AGGR_INIT_EXPR can have void type if its initialization
semantics are more complicated than just expanding the call.

The cxx_eval_vec_init_1 hunk corrects AGGR_INIT_EXPRs that were
nonsensically built to initialize an object of void type. And the
build_aggr_init_expr hunk makes sure we don't do that again.

The ocp_convert and cxx_eval_outermost_constant_expr hunks deal with making
sure that a constant CONSTRUCTOR has the right type.

Tested x86_64-pc-linux-gnu, applying to trunk.

* cvt.c (ocp_convert): Don't wrap a CONSTRUCTOR in a NOP_EXPR.
* constexpr.c (initialized_type): Fix AGGR_INIT_EXPR handling.
(cxx_eval_vec_init_1): Correct type of AGGR_INIT_EXPR.
(cxx_eval_outermost_constant_expr): Make sure a CONSTRUCTOR has the
right type.  Don't wrap a CONSTRUCTOR if one was passed in.
* tree.c (build_aggr_init_expr): Check for void.
---
  gcc/cp/constexpr.c | 22 +-
  gcc/cp/cvt.c   | 10 +-
  gcc/cp/tree.c  |  2 ++
  gcc/cp/ChangeLog   |  7 +++
  4 files changed, 31 insertions(+), 10 deletions(-)

diff --git a/gcc/cp/constexpr.c b/gcc/cp/constexpr.c
index 7692b1727da..4fb1ba527e3 100644
--- a/gcc/cp/constexpr.c
+++ b/gcc/cp/constexpr.c
@@ -2778,8 +2778,10 @@ initialized_type (tree t)
  {
if (TYPE_P (t))
  return t;
-  tree type = cv_unqualified (TREE_TYPE (t));
-  if (TREE_CODE (t) == CALL_EXPR || TREE_CODE (t) == AGGR_INIT_EXPR)
+  tree type = TREE_TYPE (t);
+  if (!VOID_TYPE_P (type))
+/* No need to look deeper.  */;
+  else if (TREE_CODE (t) == CALL_EXPR)
  {
/* A constructor call has void type, so we need to look deeper.  */
tree fn = get_function_named_in_call (t);
@@ -2787,7 +2789,9 @@ initialized_type (tree t)
&& DECL_CXX_CONSTRUCTOR_P (fn))
  type = DECL_CONTEXT (fn);
  }
-  return type;
+  else if (TREE_CODE (t) == AGGR_INIT_EXPR)
+type = TREE_TYPE (AGGR_INIT_EXPR_SLOT (t));
+  return cv_unqualified (type);
  }
/* We're about to initialize element INDEX of an array or class from VALUE.
@@ -3000,7 +3004,7 @@ cxx_eval_vec_init_1 (const constexpr_ctx *ctx, tree 
atype, tree init,
  &argvec, elttype, LOOKUP_NORMAL,
  complain);
release_tree_vector (argvec);
-  init = build_aggr_init_expr (TREE_TYPE (init), init);
+  init = build_aggr_init_expr (elttype, init);
pre_init = true;
  }
  @@ -5089,7 +5093,7 @@ cxx_eval_outermost_constant_expr (tree t, bool 
allow_non_constant,
  r = build_nop (TREE_TYPE (r), r);
TREE_CONSTANT (r) = false;
  }
-  else if (non_constant_p || r == t)
+  else if (non_constant_p)
  return t;
  if (should_unshare)
@@ -5097,18 +5101,18 @@ cxx_eval_outermost_constant_expr (tree t, bool 
allow_non_constant,
  if (TREE_CODE (r) == CONSTRUCTOR && CLASS_TYPE_P (TREE_TYPE (r

Re: [committed 0/4] (Partial) OpenMP 5.0 support for GCC 9

2018-11-09 Thread Dominique d'Humières
Hi Jakub,

Bootstrapping r265942 on darwin failed with

In file included from 
/Applications/Xcode-6.2.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.9.sdk/usr/include/stdio.h:490,
 from ../../../work/libgomp/affinity-fmt.c:28:
../../../work/libgomp/affinity-fmt.c: In function 'gomp_display_affinity':
../../../work/libgomp/affinity-fmt.c:369:17: error: format '%x' expects 
argument of type 'unsigned int', but argument 5 has type 'long unsigned int' 
[-Werror=format=]
  369 |   sprintf (buf, "0x%x", (uintptr_t) handle);
  | ^~  ~~
  | |
  | long unsigned int
../../../work/libgomp/affinity-fmt.c:369:21: note: format string is defined here
  369 |   sprintf (buf, "0x%x", (uintptr_t) handle);
  |~^
  | |
  | unsigned int
  |%lx
cc1: all warnings being treated as errors

I have managed to bootstrap with the following hack:

--- ../_clean/libgomp/affinity-fmt.c2018-11-08 19:03:37.0 +0100
+++ libgomp/affinity-fmt.c  2018-11-09 01:00:16.0 +0100
@@ -362,11 +362,11 @@ gomp_display_affinity (char *buffer, siz
  char buf[3 * (sizeof (handle) + sizeof (int)) + 4];
 
  if (sizeof (handle) == sizeof (long))
-   sprintf (buf, "0x%lx", (long) handle);
+   sprintf (buf, "0x%lx", (long) (uintptr_t) handle);
  else if (sizeof (handle) == sizeof (long long))
-   sprintf (buf, "0x%llx", (long long) handle);
+   sprintf (buf, "0x%llx", (long long) (uintptr_t) handle);
  else
-   sprintf (buf, "0x%x", (int) handle);
+   sprintf (buf, "0x%x", (int) (uintptr_t) handle);
  gomp_display_num (buffer, size, &ret, zero, right, sz, buf);
  break;
}

which is certainly wrong, but allowed me to bootstrap.

TIA

Dominique

PS I can file a PR if necessary.



[PATCH][cunroll] Add unroll-known-loop-iterations-only param and use it in aarch64

2018-11-09 Thread Kyrill Tkachov

Hi all,

In this testcase the codegen for VLA SVE is worse than it could be due to 
unrolling:

fully_peel_me:
mov x1, 5
ptrue   p1.d, all
whilelo p0.d, xzr, x1
ld1dz0.d, p0/z, [x0]
faddz0.d, z0.d, z0.d
st1dz0.d, p0, [x0]
cntdx2
addvl   x3, x0, #1
whilelo p0.d, x2, x1
beq .L1
ld1dz0.d, p0/z, [x0, #1, mul vl]
faddz0.d, z0.d, z0.d
st1dz0.d, p0, [x3]
cntwx2
incbx0, all, mul #2
whilelo p0.d, x2, x1
beq .L1
ld1dz0.d, p0/z, [x0]
faddz0.d, z0.d, z0.d
st1dz0.d, p0, [x0]
.L1:
ret

In this case, due to the vector-length-agnostic nature of SVE the compiler 
doesn't know the loop iteration count.
For such loops we don't want to unroll if we don't end up eliminating branches 
as this just bloats code size
and hurts icache performance.

This patch introduces a new unroll-known-loop-iterations-only param that 
disables cunroll when the loop iteration
count is unknown (SCEV_NOT_KNOWN). This case occurs much more often for SVE VLA 
code, but it does help some
Advanced SIMD cases as well where loops with an unknown iteration count are not 
unrolled when it doesn't eliminate
the branches.

So for the above testcase we generate now:
fully_peel_me:
mov x2, 5
mov x3, x2
mov x1, 0
whilelo p0.d, xzr, x2
ptrue   p1.d, all
.L2:
ld1dz0.d, p0/z, [x0, x1, lsl 3]
faddz0.d, z0.d, z0.d
st1dz0.d, p0, [x0, x1, lsl 3]
incdx1
whilelo p0.d, x1, x3
bne .L2
ret

Not perfect still, but it's preferable to the original code.
The new param is enabled by default on aarch64 but disabled for other targets, 
leaving their behaviour unchanged
(until other target people experiment with it and set it, if appropriate).

Bootstrapped and tested on aarch64-none-linux-gnu.
Benchmarked on SPEC2017 on a Cortex-A57 and there are no differences in 
performance.

Ok for trunk?
Thanks,
Kyrill


2018-11-09  Kyrylo Tkachov  

* params.def (PARAM_UNROLL_KNOWN_LOOP_ITERATIONS_ONLY): Define.
* tree-ssa-loop-ivcanon.c (try_unroll_loop_completely): Use above to
disable unrolling on unknown iteration count.
* config/aarch64/aarch64.c (aarch64_override_options_internal): Set
PARAM_UNROLL_KNOWN_LOOP_ITERATIONS_ONLY to 1.
* doc/invoke.texi (--param unroll-known-loop-iterations-only):
Document.

2018-11-09  Kyrylo Tkachov  

* gcc.target/aarch64/sve/unroll-1.c: New test.

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index f0e54eda80656829528c018357dde2e1e87f6ebd..34d08a075221fd4c098e9b5e8fabd8fe3948d285 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -10993,6 +10993,12 @@ aarch64_override_options_internal (struct gcc_options *opts)
 			 opts->x_param_values,
 			 global_options_set.x_param_values);
 
+  /* Don't unroll loops where the exact iteration count is not known at
+ compile-time.  */
+  maybe_set_param_value (PARAM_UNROLL_KNOWN_LOOP_ITERATIONS_ONLY, 1,
+			 opts->x_param_values,
+			 global_options_set.x_param_values);
+
   /* If the user hasn't changed it via configure then set the default to 64 KB
  for the backend.  */
   maybe_set_param_value (PARAM_STACK_CLASH_PROTECTION_GUARD_SIZE,
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 802cc642453aef2d2c516bcbda22246252ec87c1..74e2aeda27d718264188761cf522d6c9f8025e07 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -10732,6 +10732,9 @@ The maximum number of branches on the hot path through the peeled sequence.
 @item max-completely-peeled-insns
 The maximum number of insns of a completely peeled loop.
 
+@item unroll-known-loop-iterations-only
+Only completely unroll loops where the iteration count is known.
+
 @item max-completely-peel-times
 The maximum number of iterations of a loop to be suitable for complete peeling.
 
diff --git a/gcc/params.def b/gcc/params.def
index 4a5f2042dac72bb457488ac8bc35d09df94c929c..07946552232058cee41303e81ed694f7f0bb615e 100644
--- a/gcc/params.def
+++ b/gcc/params.def
@@ -344,6 +344,11 @@ DEFPARAM(PARAM_MAX_UNROLL_ITERATIONS,
 	 "The maximum depth of a loop nest we completely peel.",
 	 8, 0, 0)
 
+DEFPARAM(PARAM_UNROLL_KNOWN_LOOP_ITERATIONS_ONLY,
+	 "unroll-known-loop-iterations-only",
+	 "Only completely unroll loops where the iteration count is known",
+	 0, 0, 1)
+
 /* The maximum number of insns of an unswitched loop.  */
 DEFPARAM(PARAM_MAX_UNSWITCH_INSNS,
 	"max-unswitch-insns",
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/unroll-1.c b/gcc/testsuite/gcc.target/aarch64/sve/unroll-1.c
new file mode 100644
index ..7f53d20cbf8e18a4389b86c037f56f024bac22a5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/unroll-1.c
@@ -0,0 +1

Re: introduce --enable-mingw-full32 to default to --large-address-aware

2018-11-09 Thread Alexandre Oliva
On Nov  8, 2018, JonY <10wa...@gmail.com> wrote:

> No, no. By quick I just mean using -Wl,--large-address-aware on an
> existing gcc install, nothing complex. Sorry about not making it clear.

Ah, good!

> I now understand the problem, thanks for the clarification about the
> patch. Patch is OK.

Thanks, here's what I'm installing (with comments indicating which
configuration each setting applies to, and why one of them won't work):


large-addr-aware for biarch non-w64 mingw32

for  gcc/ChangeLog

* config/i386/mingw32.h (LINK_SPEC_LARGE_ADDR_AWARE): Adjust
for non-w64 x86_64 biarch.
---
 gcc/config/i386/mingw32.h |   19 +--
 1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/gcc/config/i386/mingw32.h b/gcc/config/i386/mingw32.h
index c9d8a7a31f30..5759d8d16aef 100644
--- a/gcc/config/i386/mingw32.h
+++ b/gcc/config/i386/mingw32.h
@@ -114,11 +114,26 @@ along with GCC; see the file COPYING3.  If not see
 #define SUBTARGET_EXTRA_SPECS  \
   { "shared_libgcc_undefs", SHARED_LIBGCC_UNDEFS_SPEC }
 
-#if MINGW_DEFAULT_LARGE_ADDR_AWARE
+#if ! MINGW_DEFAULT_LARGE_ADDR_AWARE
+/* This is used without --enable-large-address-aware.  */
+# define LINK_SPEC_LARGE_ADDR_AWARE ""
+#elif ! TARGET_BI_ARCH
+/* This is used on i686-pc-mingw32 with --enable-large-address-aware.  */
 # define LINK_SPEC_LARGE_ADDR_AWARE \
   "%{!shared:%{!mdll:--large-address-aware}}"
+#elif TARGET_64BIT_DEFAULT
+/* This is used on x86_64-pc-mingw32 with --enable-large-address-aware.
+   ??? It probably doesn't work, because the linker emulation defaults
+   to i386pep, the 64-bit mode that does not support
+   --large-address-aware, and x86_64-pc-mingw32 does not override the
+   emulation to i386pe for -m32, unlike x86_64-w64-mingw32.  */
+# define LINK_SPEC_LARGE_ADDR_AWARE \
+  "%{!shared:%{!mdll:%{m32:--large-address-aware}}}"
 #else
-# define LINK_SPEC_LARGE_ADDR_AWARE ""
+/* This would only be used if someone introduced a biarch
+   configuration that defaulted to 32-bit.  */
+# define LINK_SPEC_LARGE_ADDR_AWARE \
+  "%{!shared:%{!mdll:%{!m64:--large-address-aware}}}"
 #endif
 
 #define LINK_SPEC "%{mwindows:--subsystem windows} \


-- 
Alexandre Oliva, freedom fighter   https://FSFLA.org/blogs/lxo
Be the change, be Free! FSF Latin America board member
GNU Toolchain EngineerFree Software Evangelist
Hay que enGNUrecerse, pero sin perder la terGNUra jamás-GNUChe


arm - use the new CPU alias option to simplify the list of CPUs.

2018-11-09 Thread Richard Earnshaw (lists)
This patch simplifies the table of CPUs supported in GCC by making
use of the new alias feature.  Most of the changes are fairly
straightforward:
- arm7tdmi and arm7tdmi-s are the same thing.
- arm710t, arm720t and arm740t differ only in features external to the core.
- arm920 and arm920t are the same thing; arm922t and arm940t differ from
  arm920t only in features external to the core; ep9312 is an
  arm920t-derived core that we continue to recognize for legacy reasons.
- arm10tdmi and arm1020t differ only in features external to the core.
- arm9e, arm946e-s, arm966e-s and arm968e-s differ only in features
  external to the core.
- arm10e, arm1020e and arm1022e differ only in features external to the core.

The arm10e/arm1020e/arm1022e change is the only one which changes
behaviour of the compiler slightly.  Previously, and for no reason
that I can remember, the scheduler for arm1020e/arm1022e was not used
for arm10e: this was probably an oversight.  The unification means
that the same scheduler is now used for all three cores.

* config/arm/arm-cpus.in (arm7tdmi): Add an alias for arm7tdmi-s.
(arm7tdmi-s): Delete CPU.
(arm710t): Add aliases for arm720t and arm740t.
(arm720t, arm740t): Delete CPUs.
(arm920t): Add aliases for arm920, arm922t and arm940t.
(arm920, arm922t, arm940t): Delete CPUs.
(arm10tdmi): Add alias for arm1020t.
(arm1020t): Delete CPU.
(arm9e): Add aliases for arm946e-s, arm966e-s and arm968e-s.
(arm946e-s, arm966e-s, arm968e-s): Delete CPUs.
(arm10e): Add aliases for arm1020e and arm1022e.
(arm1020e, arm1022e): Delete CPUs.
* config/arm/arm.md (generic_sched): Remove entries that are now
handled by aliases.
(generic_vfp): Likewise.
* config/arm/arm1020e.md: Simplify tuning selection based on alias
changes.
* config/arm/arm-tune.md: Regenerated.
* config/arm/arm-tables.opt: Regenerated.

Applied to trunk.
diff --git a/gcc/config/arm/arm-cpus.in b/gcc/config/arm/arm-cpus.in
index 1def1cace68..7189fc9210d 100644
--- a/gcc/config/arm/arm-cpus.in
+++ b/gcc/config/arm/arm-cpus.in
@@ -683,36 +683,19 @@ end cpu fa626
 
 # V4T Architecture Processors
 begin cpu arm7tdmi
+ alias arm7tdmi-s
  tune flags CO_PROC
  architecture armv4t
  costs fastmul
 end cpu arm7tdmi
 
-begin cpu arm7tdmi-s
- cname arm7tdmis
- tune flags CO_PROC
- architecture armv4t
- costs fastmul
-end cpu arm7tdmi-s
-
 begin cpu arm710t
+ alias arm720t arm740t
  tune flags WBUF
  architecture armv4t
  costs fastmul
 end cpu arm710t
 
-begin cpu arm720t
- tune flags WBUF
- architecture armv4t
- costs fastmul
-end cpu arm720t
-
-begin cpu arm740t
- tune flags WBUF
- architecture armv4t
- costs fastmul
-end cpu arm740t
-
 begin cpu arm9
  tune flags LDSCHED
  architecture armv4t
@@ -725,105 +708,41 @@ begin cpu arm9tdmi
  costs fastmul
 end cpu arm9tdmi
 
-begin cpu arm920
- tune flags LDSCHED
- architecture armv4t
- costs fastmul
-end cpu arm920
-
 begin cpu arm920t
+ alias arm920 arm922t arm940t ep9312
  tune flags LDSCHED
  architecture armv4t
  costs fastmul
 end cpu arm920t
 
-begin cpu arm922t
- tune flags LDSCHED
- architecture armv4t
- costs fastmul
-end cpu arm922t
-
-begin cpu arm940t
- tune flags LDSCHED
- architecture armv4t
- costs fastmul
-end cpu arm940t
-
-begin cpu ep9312
- tune flags LDSCHED
- architecture armv4t
- costs fastmul
-end cpu ep9312
-
 
 # V5T Architecture Processors
 # These used VFPv1 which isn't supported by GCC
 begin cpu arm10tdmi
+ alias arm1020t
  tune flags LDSCHED
  architecture armv5t
  costs fastmul
 end cpu arm10tdmi
 
-begin cpu arm1020t
- tune flags LDSCHED
- architecture armv5t
- costs fastmul
-end cpu arm1020t
-
 
 # V5TE Architecture Processors
 begin cpu arm9e
+ alias arm946e-s arm966e-s arm968e-s
  tune flags LDSCHED
  architecture armv5te+fp
  option nofp remove ALL_FP
  costs 9e
 end cpu arm9e
 
-begin cpu arm946e-s
- cname arm946es
- tune flags LDSCHED
- architecture armv5te+fp
- option nofp remove ALL_FP
- costs 9e
-end cpu arm946e-s
-
-begin cpu arm966e-s
- cname arm966es
- tune flags LDSCHED
- architecture armv5te+fp
- option nofp remove ALL_FP
- costs 9e
-end cpu arm966e-s
-
-begin cpu arm968e-s
- cname arm968es
- tune flags LDSCHED
- architecture armv5te+fp
- option nofp remove ALL_FP
- costs 9e
-end cpu arm968e-s
-
 begin cpu arm10e
+ alias arm1020e arm1022e
  tune flags LDSCHED
  architecture armv5te+fp
  option nofp remove ALL_FP
  costs fastmul
 end cpu arm10e
 
-begin cpu arm1020e
- tune flags LDSCHED
- architecture armv5te+fp
- option nofp remove ALL_FP
- costs fastmul
-end cpu arm1020e
-
-begin cpu arm1022e
- tune flags LDSCHED
- architecture armv5te+fp
- option nofp remove ALL_FP
- costs fastmul
-end cpu arm1022e
-
 begin cpu xscale
  tune flags LDSCHED XSCALE
  architecture armv5te
diff --git a/gcc/config/arm/arm-tables.opt b/gcc/config/arm/arm-tables.opt
index cd496366cec..b314ec30ede 100644
--- a/gcc/config/arm/arm

Re: [PATCH 3/3][GCC][AARCH64] Add support for pointer authentication B key

2018-11-09 Thread Sam Tebbs
On 11/02/2018 06:01 PM, Sam Tebbs wrote:

> On 11/02/2018 05:35 PM, Sam Tebbs wrote:
>
>> Hi all,
>>
>> This patch adds support for the Armv8.3-A pointer authentication instructions
>> that use the B-key (pacib*, autib* and retab). This required adding builtins
>> for pacib1716 and autib1716, adding the "b-key" feature to the
>> -mbranch-protection option, and required emitting a new CFI directive
>> ".cfi_b_key_frame" which
>> causes GAS to add 'B' to the CIE augmentation string. I also had to add a new
>> hook called ASM_POST_CFI_STARTPROC which is triggered when the .cfi_startproc
>> directive is emitted.
>>
>> The libgcc stack unwinder has been amended to authenticate return addresses
>> with the B key when the function has been signed with the B key.
>>
>> The previous patch in this series is here:
>> https://gcc.gnu.org/ml/gcc-patches/2018-11/msg00104.html
>>
>> Bootstrapped successfully and regression tested on aarch64-none-elf.
>>
>> OK for trunk?
>>
>> gcc/
>> 2018-11-02  Sam Tebbs  
>>
>>  * config/aarch64/aarch64-builtins.c (aarch64_builtins): Add
>>  AARCH64_PAUTH_BUILTIN_AUTIB1716 and AARCH64_PAUTH_BUILTIN_PACIB1716.
>>  * config/aarch64/aarch64-builtins.c (aarch64_init_pauth_hint_builtins):
>>  Add autib1716 and pacib1716 initialisation.
>>  * config/aarch64/aarch64-builtins.c (aarch64_expand_builtin): Add checks
>>  for autib1716 and pacib1716.
>>  * config/aarch64/aarch64-protos.h (aarch64_key_type,
>>  aarch64_post_cfi_startproc): Define.
>>  * config/aarch64/aarch64-protos.h (aarch64_ra_sign_key): Define extern.
>>  * config/aarch64/aarch64.c (aarch64_return_address_signing_enabled): Add
>>  check for b-key, remove frame.laid_out assertion.
>>  * config/aarch64/aarch64.c (aarch64_ra_sign_key,
>>  aarch64_post_cfi_startproc, aarch64_handle_pac_ret_b_key): Define.
>>  * config/aarch64/aarch64.h (TARGET_ASM_POST_CFI_STARTPROC): Define.
>>  * config/aarch64/aarch64.c (aarch64_pac_ret_subtypes): Add "b-key".
>>  * config/aarch64/aarch64.md (unspec): Add UNSPEC_AUTIA1716,
>>  UNSPEC_AUTIB1716, UNSPEC_AUTIASP, UNSPEC_AUTIBSP, UNSPEC_PACIA1716,
>>  UNSPEC_PACIB1716, UNSPEC_PACIASP, UNSPEC_PACIBSP.
>>  * config/aarch64/aarch64.md (do_return): Add check for b-key.
>>  * config/aarch64/aarch64.md (sp): Add check for
>>  signing key and scope selected.
>>  * config/aarch64/aarch64.md (1716): Add check for
>>  signing key and scope selected.
>>  * config/aarch64/aarch64.opt (msign-return-address=): Deprecate.
>>  * config/aarch64/iterators.md (PAUTH_LR_SP): Add UNSPEC_AUTIASP,
>>  UNSPEC_AUTIBSP, UNSPEC_PACIASP, UNSPEC_PACIBSP.
>>  * config/aarch64/iterators.md (PAUTH_17_16): Add UNSPEC_AUTIA1716,
>>  UNSPEC_AUTIB1716, UNSPEC_PACIA1716, UNSPEC_PACIB1716.
>>  * config/aarch64/iterators.md (pauth_mnem_prefix): Add UNSPEC_AUTIA1716,
>>  UNSPEC_AUTIB1716, UNSPEC_PACIA1716, UNSPEC_PACIB1716, UNSPEC_AUTIASP,
>>  UNSPEC_AUTIBSP, UNSPEC_PACIASP, UNSPEC_PACIBSP.
>>  * config/aarch64/iterators.md (pauth_hint_num_a): Replace
>>  UNSPEC_PACI1716 and UNSPEC_AUTI1716 with UNSPEC_PACIA1716 and
>>  UNSPEC_AUTIA1716 respectively.
>>  * config/aarch64/iterators.md (pauth_hint_num_b): New int attribute.
>>
>> gcc/testsuite
>> 2018-11-02  Sam Tebbs  
>>
>>  * gcc.target/aarch64/return_address_sign_1.c (dg-final): Replace
>>  "autiasp" and "paciasp" with "hint\t29 // autisp" and
>>  "hint\t25 // pacisp" respectively.
>>  * gcc.target/aarch64/return_address_sign_2.c (dg-final): Replace
>>  "paciasp" with "hint\t25 // pacisp".
>>  * gcc.target/aarch64/return_address_sign_3.c (dg-final): Replace
>>  "paciasp" and "autiasp" with "pacisp" and "autisp" respectively.
>>  * gcc.target/aarch64/return_address_sign_b_1.c: New file.
>>  * gcc.target/aarch64/return_address_sign_b_2.c: New file.
>>  * gcc.target/aarch64/return_address_sign_b_3.c: New file.
>>  * gcc.target/aarch64/return_address_sign_b_exception.c: New file.
>>  * gcc.target/aarch64/return_address_sign_builtin.c: New file
>>
>> libgcc/
>> 2018-11-02  Sam Tebbs  
>>
>>  * config/aarch64/aarch64-unwind.h (aarch64_cie_signed_with_b_key): New
>>  function.
>>  * config/aarch64/aarch64-unwind.h (aarch64_post_extract_frame_addr,
>>  aarch64_post_frob_eh_handler_addr): Add check for b-key.
>>  * unwind-dw2-fde.c (get_cie_encoding): Add check for 'B' in augmentation
>>  string.
>>  * unwind-dw2.c (extract_cie_info): Add check for 'B' in augmentation
>>  string.
> Attached is an updated patch rebased on an improvement to the
> -mbranch-protection option documentation.

ping


Re: [committed 0/4] (Partial) OpenMP 5.0 support for GCC 9

2018-11-09 Thread Jakub Jelinek
On Fri, Nov 09, 2018 at 11:34:48AM +0100, Dominique d'Humières wrote:
> Bootstrapping r265942 on darwin failed with
> 
> In file included from 
> /Applications/Xcode-6.2.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.9.sdk/usr/include/stdio.h:490,
>  from ../../../work/libgomp/affinity-fmt.c:28:
> ../../../work/libgomp/affinity-fmt.c: In function 'gomp_display_affinity':
> ../../../work/libgomp/affinity-fmt.c:369:17: error: format '%x' expects 
> argument of type 'unsigned int', but argument 5 has type 'long unsigned int' 
> [-Werror=format=]
>   369 |   sprintf (buf, "0x%x", (uintptr_t) handle);
>   | ^~  ~~

Weird, the above prints a line which just isn't there, line 369 is:
sprintf (buf, "0x%x", (int) handle);
not
sprintf (buf, "0x%x", (uintptr_t) handle);

What is pthread_t on your platform?  An integral type (which one), pointer
or something different (e.g. an aggregate)?

I wonder if I shouldn't use __builtin_choose_expr to wrap it up and select
at compile time the only variant that applies.

Something like:

--- libgomp/affinity-fmt.c  2018-10-08 13:27:41.021061844 +0200
+++ libgomp/affinity-fmt.c  2018-11-09 12:01:17.048151087 +0100
@@ -356,37 +356,39 @@ gomp_display_affinity (char *buffer, siz
  goto do_int;
case 'i':
 #if defined(LIBGOMP_USE_PTHREADS) && defined(__GNUC__)
- /* Handle integral pthread_t.  */
- if (__builtin_classify_type (handle) == 1)
-   {
- char buf[3 * (sizeof (handle) + sizeof (int)) + 4];
-
- if (sizeof (handle) == sizeof (long))
-   sprintf (buf, "0x%lx", (long) handle);
- else if (sizeof (handle) == sizeof (long long))
-   sprintf (buf, "0x%llx", (long long) handle);
- else
-   sprintf (buf, "0x%x", (int) handle);
- gomp_display_num (buffer, size, &ret, zero, right, sz, buf);
- break;
-   }
- /* And pointer pthread_t.  */
- else if (__builtin_classify_type (handle) == 5)
-   {
- char buf[3 * (sizeof (uintptr_t) + sizeof (int)) + 4];
-
- if (sizeof (uintptr_t) == sizeof (long))
-   sprintf (buf, "0x%lx", (long) (uintptr_t) handle);
- else if (sizeof (uintptr_t) == sizeof (long long))
-   sprintf (buf, "0x%llx", (long long) (uintptr_t) handle);
- else
-   sprintf (buf, "0x%x", (int) (uintptr_t) handle);
- gomp_display_num (buffer, size, &ret, zero, right, sz, buf);
- break;
-   }
-#endif
+ {
+   char buf[3 * (sizeof (handle) + sizeof (uintptr_t) + sizeof (int))
++ 4];
+   /* Handle integral pthread_t.  */
+   __builtin_choose_expr (__builtin_classify_type (handle) == 1,
+  sizeof (handle) == sizeof (long)
+  ? sprintf (buf, "0x%lx", (long) handle)
+  : sizeof (handle) == sizeof (long long)
+  ? sprintf (buf, "0x%llx",
+ (long long) handle)
+  : sprintf (buf, "0x%x", (int) handle),
+  0);
+   /* And pointer pthread_t.  */
+   __builtin_choose_expr (__builtin_classify_type (handle) == 5,
+  sizeof (uintptr_t) == sizeof (long)
+  ? sprintf (buf, "0x%lx",
+ (long) (uintptr_t) handle)
+  : sizeof (uintptr_t) == sizeof (long long)
+  ? sprintf (buf, "0x%llx",
+ (long long) (uintptr_t) handle)
+  : sprintf (buf, "0x%x",
+ (int) (uintptr_t) handle),
+  0);
+   if (__builtin_classify_type (handle) != 1
+   && __builtin_classify_type (handle) != 5)
+ strcpy (buf, "0");
+   gomp_display_num (buffer, size, &ret, zero, right, sz, buf);
+   break;
+ }
+#else
  val = 0;
  goto do_int;
+#endif
case 'A':
  if (sz == (size_t) -1)
gomp_display_affinity_place (buffer, size, &ret,



Jakub


Re: [PATCH 2/3][GCC][AARCH64] Add new -mbranch-protection option to combine pointer signing and BTI

2018-11-09 Thread Sam Tebbs
On 11/02/2018 05:31 PM, Sam Tebbs wrote:

> Hi all,
>
> The -mbranch-protection option combines the functionality of
> -msign-return-address and the BTI features new in Armv8.5 to better reflect
> their relationship. This new option therefore supersedes and deprecates the
> existing -msign-return-address option.
>
> -mbranch-protection=[none|standard|] - Turns on different types of branch
> protection available where:
>
>   * "none": Turn off all types of branch protection
>   * "standard" : Turns on all the types of protection to their respective
> standard levels.
>   *  can be "+" separated protection types:
>
>   * "bti" : Branch Target Identification Mechanism.
>   * "pac-ret{+leaf+b-key}": Return Address Signing. The default return
> address signing is enabled by signing functions that save the return
> address to memory (non-leaf functions will practically always do this)
> using the a-key. The optional tuning arguments allow the user to
> extend the scope of return address signing to include leaf functions
> and to change the key to b-key. The tuning arguments must follow the
> protection type "pac-ret".
>
> Thus -mbranch-protection=standard -> -mbranch-protection=bti+pac-ret.
>
> Its mapping to -msign-return-address is as follows:
>
>   * -mbranch-protection=none -> -msign-return-address=none
>   * -mbranch-protection=standard -> -msign-return-address=leaf
>   * -mbranch-protection=pac-ret -> -msign-return-address=non-leaf
>   * -mbranch-protection=pac-ret+leaf -> -msign-return-address=all
>
> This patch implements the option's skeleton and the "none", "standard" and
> "pac-ret" types (along with its "leaf" subtype).
>
> The previous patch in this series is here:
> https://gcc.gnu.org/ml/gcc-patches/2018-11/msg00103.html
>
> Bootstrapped successfully and tested on aarch64-none-elf with no regressions.
>
> OK for trunk?
>
> gcc/ChangeLog:
>
> 2018-11-02  Sam Tebbs
>
>   * config/aarch64/aarch64.c (BRANCH_PROTEC_STR_MAX,
>   aarch64_parse_branch_protection,
>   struct aarch64_branch_protec_type,
>   aarch64_handle_no_branch_protection,
>   aarch64_handle_standard_branch_protection,
>   aarch64_validate_mbranch_protection,
>   aarch64_handle_pac_ret_protection,
>   aarch64_handle_attr_branch_protection,
>   accepted_branch_protection_string,
>   aarch64_pac_ret_subtypes,
>   aarch64_branch_protec_types,
>   aarch64_handle_pac_ret_leaf): Define.
>   (aarch64_override_options_after_change_1): Add check for
>   accepted_branch_protection_string.
>   (aarch64_override_options): Add check for
>   accepted_branch_protection_string.
>   (aarch64_option_save): Save accepted_branch_protection_string.
>   (aarch64_option_restore): Save
>   accepted_branch_protection_string.
>   * config/aarch64/aarch64.c (aarch64_attributes): Add branch-protection.
>   * config/aarch64/aarch64.opt: Add mbranch-protection. Deprecate
>   msign-return-address.
>   * doc/invoke.texi: Add mbranch-protection.
>
> gcc/testsuite/ChangeLog:
>
> 2018-11-02  Sam Tebbs
>
>   * (gcc.target/aarch64/return_address_sign_1.c,
>   gcc.target/aarch64/return_address_sign_2.c,
>   gcc.target/aarch64/return_address_sign_3.c (__attribute__)): Change
>   option to -mbranch-protection.
>

ping



Re: [committed 0/4] (Partial) OpenMP 5.0 support for GCC 9

2018-11-09 Thread Jakub Jelinek
On Fri, Nov 09, 2018 at 12:04:54PM +0100, Jakub Jelinek wrote:
> On Fri, Nov 09, 2018 at 11:34:48AM +0100, Dominique d'Humières wrote:
> > Bootstrapping r265942 on darwin failed with
> > 
> > In file included from 
> > /Applications/Xcode-6.2.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.9.sdk/usr/include/stdio.h:490,
> >  from ../../../work/libgomp/affinity-fmt.c:28:
> > ../../../work/libgomp/affinity-fmt.c: In function 'gomp_display_affinity':
> > ../../../work/libgomp/affinity-fmt.c:369:17: error: format '%x' expects 
> > argument of type 'unsigned int', but argument 5 has type 'long unsigned 
> > int' [-Werror=format=]
> >   369 |   sprintf (buf, "0x%x", (uintptr_t) handle);
> >   | ^~  ~~
> 
> Weird, the above prints a line which just isn't there, line 369 is:
> sprintf (buf, "0x%x", (int) handle);
> not
> sprintf (buf, "0x%x", (uintptr_t) handle);
> 
> What is pthread_t on your platform?  An integral type (which one), pointer
> or something different (e.g. an aggregate)?

The "but argument 5" in there is also weird, sprintf has 3 arguments.
Doesn't current gcc emit warnings like:
/tmp/j.c: In function ‘main’:
/tmp/j.c:8:21: warning: format ‘%x’ expects argument of type ‘unsigned int’, 
but argument 3 has type ‘long unsigned int’ [-Wformat=]
8 |   sprintf (buf, "0x%x", l);
  |~^   ~
  | |   |
  | |   long unsigned int
  | unsigned int
  |%lx
?
So, what exact compiler is compiling this code?

Jakub


[PATCH, libgfortran] Replace sync builtins with atomic builtins

2018-11-09 Thread Janne Blomqvist
The old __sync builtins have been deprecated for a long time now in
favor of the __atomic builtins following the C++11/C11 memory model.
This patch converts libgfortran to use the modern __atomic builtins.

At the same time I weakened the consistency to acquire-release as that
should be enough for this particular case.  Jakub, as the original
author of the algorithm, do you concur?

Regtested on x86_64-pc-linux-gnu, Ok for trunk?

libgfortran/ChangeLog:

2018-11-09  Janne Blomqvist  

* acinclude.m4 (LIBGFOR_CHECK_ATOMIC_FETCH_ADD): Rename and test
presence of atomic builtins instead of sync builtins.
* configure.ac (LIBGFOR_CHECK_ATOMIC_FETCH_ADD): Call new test.
* io/io.h (inc_waiting_locked): Use __atomic_fetch_add.
(predec_waiting_locked): Use __atomic_add_fetch.
(dec_waiting_unlocked): Use __atomic_fetch_add.
* config.h.in: Regenerated.
* configure: Regenerated.
---
 libgfortran/acinclude.m4 | 20 ++--
 libgfortran/configure.ac |  4 ++--
 libgfortran/io/io.h  | 12 ++--
 3 files changed, 18 insertions(+), 18 deletions(-)

diff --git a/libgfortran/acinclude.m4 b/libgfortran/acinclude.m4
index dd5429ac0d2..5b0c094e716 100644
--- a/libgfortran/acinclude.m4
+++ b/libgfortran/acinclude.m4
@@ -59,17 +59,17 @@ extern void bar(void) __attribute__((alias("foo")));]],
   [Define to 1 if the target supports __attribute__((alias(...))).])
   fi])
 
-dnl Check whether the target supports __sync_fetch_and_add.
-AC_DEFUN([LIBGFOR_CHECK_SYNC_FETCH_AND_ADD], [
-  AC_CACHE_CHECK([whether the target supports __sync_fetch_and_add],
-libgfor_cv_have_sync_fetch_and_add, [
+dnl Check whether the target supports __atomic_fetch_add.
+AC_DEFUN([LIBGFOR_CHECK_ATOMIC_FETCH_ADD], [
+  AC_CACHE_CHECK([whether the target supports __atomic_fetch_add],
+libgfor_cv_have_atomic_fetch_add, [
   AC_LINK_IFELSE([AC_LANG_PROGRAM([[int foovar = 0;]], [[
-if (foovar <= 0) return __sync_fetch_and_add (&foovar, 1);
-if (foovar > 10) return __sync_add_and_fetch (&foovar, -1);]])],
- libgfor_cv_have_sync_fetch_and_add=yes, 
libgfor_cv_have_sync_fetch_and_add=no)])
-  if test $libgfor_cv_have_sync_fetch_and_add = yes; then
-AC_DEFINE(HAVE_SYNC_FETCH_AND_ADD, 1,
- [Define to 1 if the target supports __sync_fetch_and_add])
+if (foovar <= 0) return __atomic_fetch_add (&foovar, 1, __ATOMIC_ACQ_REL);
+if (foovar > 10) return __atomic_add_fetch (&foovar, -1, 
__ATOMIC_ACQ_REL);]])],
+ libgfor_cv_have_atomic_fetch_add=yes, 
libgfor_cv_have_atomic_fetch_add=no)])
+  if test $libgfor_cv_have_atomic_fetch_add = yes; then
+AC_DEFINE(HAVE_ATOMIC_FETCH_ADD, 1,
+ [Define to 1 if the target supports __atomic_fetch_add])
   fi])
 
 dnl Check for pragma weak.
diff --git a/libgfortran/configure.ac b/libgfortran/configure.ac
index 76007d38f6f..30ff8734760 100644
--- a/libgfortran/configure.ac
+++ b/libgfortran/configure.ac
@@ -608,8 +608,8 @@ fi
 LIBGFOR_CHECK_ATTRIBUTE_VISIBILITY
 LIBGFOR_CHECK_ATTRIBUTE_ALIAS
 
-# Check out sync builtins support.
-LIBGFOR_CHECK_SYNC_FETCH_AND_ADD
+# Check out atomic builtins support.
+LIBGFOR_CHECK_ATOMIC_FETCH_ADD
 
 # Check out #pragma weak.
 LIBGFOR_GTHREAD_WEAK
diff --git a/libgfortran/io/io.h b/libgfortran/io/io.h
index 902eb412848..36245cbff1f 100644
--- a/libgfortran/io/io.h
+++ b/libgfortran/io/io.h
@@ -961,8 +961,8 @@ internal_proto(free_ionml);
 static inline void
 inc_waiting_locked (gfc_unit *u)
 {
-#ifdef HAVE_SYNC_FETCH_AND_ADD
-  (void) __sync_fetch_and_add (&u->waiting, 1);
+#ifdef HAVE_ATOMIC_FETCH_ADD
+  (void) __atomic_fetch_add (&u->waiting, 1, __ATOMIC_ACQ_REL);
 #else
   u->waiting++;
 #endif
@@ -971,8 +971,8 @@ inc_waiting_locked (gfc_unit *u)
 static inline int
 predec_waiting_locked (gfc_unit *u)
 {
-#ifdef HAVE_SYNC_FETCH_AND_ADD
-  return __sync_add_and_fetch (&u->waiting, -1);
+#ifdef HAVE_ATOMIC_FETCH_ADD
+  return __atomic_add_fetch (&u->waiting, -1, __ATOMIC_ACQ_REL);
 #else
   return --u->waiting;
 #endif
@@ -981,8 +981,8 @@ predec_waiting_locked (gfc_unit *u)
 static inline void
 dec_waiting_unlocked (gfc_unit *u)
 {
-#ifdef HAVE_SYNC_FETCH_AND_ADD
-  (void) __sync_fetch_and_add (&u->waiting, -1);
+#ifdef HAVE_ATOMIC_FETCH_ADD
+  (void) __atomic_fetch_add (&u->waiting, -1, __ATOMIC_ACQ_REL);
 #else
   __gthread_mutex_lock (&unit_lock);
   u->waiting--;
-- 
2.17.1



Re: record_ranges_from_incoming_edge: use value_range API for creating new range

2018-11-09 Thread Richard Biener
On Thu, Nov 8, 2018 at 4:27 PM Jeff Law  wrote:
>
> On 11/8/18 8:14 AM, Richard Biener wrote:
> > On Thu, Nov 8, 2018 at 4:00 PM Aldy Hernandez  wrote:
> >>
> >>
> >>
> >> On 11/8/18 9:56 AM, Richard Biener wrote:
> >>> On Thu, Nov 8, 2018 at 3:54 PM Aldy Hernandez  wrote:
> 
> 
> 
>  On 11/8/18 9:49 AM, Richard Biener wrote:
> > On Thu, Nov 8, 2018 at 3:31 PM Aldy Hernandez  wrote:
> >>
> >>
> >>
> >> On 11/8/18 9:24 AM, Richard Biener wrote:
> >>> On Thu, Nov 8, 2018 at 1:17 PM Aldy Hernandez  
> >>> wrote:
> 
>  This one's rather obvious and does not depend on any get_range_info 
>  API
>  change.
> 
>  OK for trunk?
> >>>
> >>> Hmm, no - that's broken.  IIRC m_equiv are shared bitmaps if you
> >>> do tem = *old_vr so you modify it in place with equiv_clear().
> >>
> >> Good point.
> >>
> >>>
> >>> Thus, operator= should be really deleted or mapped to 
> >>> value_range::set()
> >>> in which case tem = *old_vr would do useless bitmap allocation and
> >>> copying that you then clear.
> >>
> >> Actually, I was thinking that perhaps the assignment and equality
> >> operators should disregard the equivalence bitmap.  In this case, copy
> >> everything EXCEPT the bitmap and set the resulting equivalence bitmap 
> >> to
> >> NULL.
> >
> > I think that's equally confusing.
> 
>  I don't think so.  When you ask whether range A and range B are equal,
>  you're almost always interested in the range itself, not the equivalence
>  table behind it.
> 
>  We could also get rid of the == operator and just provide:
> 
>   bool equal_p(bool ignore_equivs);
> 
>  How does this sound?
> >>>
> >>> Good.
> >>>
> >
> >> It's also annoying to use ::ignore_equivs_equal_p().  Since that seems
> >> to be the predominant way of comparing ranges, perhaps it should be the
> >> default.
> >
> > I think a good approach would be to isolate m_equiv more because it is
> > really an implementation detail of the propagator.  Thus, make
> >
> > class value_range_with_equiv : public value_range
> > {
> > ... all the equiv stuff..
> > }
> >
> > make the lattice of type value_range_with_equiv and see what tickles
> > down.
> >
> > value_range_with_equiv wouldn't implement copy and assignment
> > (too expensive) and value_range can do with the trivial implementation.
> >
> > And most consumers/workers can just work on the equiv-less variants.
> 
>  I like this.  Unfortunately, not feasible for this cycle (for me
>  anyhow-- patches welcome though :)).  How about equal_p() as described
>  above?
> >>>
> >>> Works for me but you still need to sort out the copying, so if you think
> >>> splitting is not feasible (I'll give it a try) then please disable 
> >>> assingment
> >>> and copy operators and fixup code.
> >>
> >> Are you saying you'll try implementing value_range_with_equiv :
> >> value_range?  That would be of great help!
> >
> > Yes, I'll experiment with this.  Meanwhile noticed bogus
> >
> >   /* We're going to need to unwind this range.  We can
> >  not use VR as that's a stack object.  We have to allocate
> >  a new range and push the old range onto the stack.  We
> >  also have to be very careful about sharing the underlying
> >  bitmaps.  Ugh.  */
> >   value_range *new_vr = vr_values->allocate_value_range ();
> >   *new_vr = vr;
> >   new_vr->equiv_clear ();
> I hate this hunk of code.  In fact, it's probably the biggest wart from
> the classification & simplification work last year.
>
> The problem is the local stack object can't be pushed onto the unwinding
> stack -- it'll be destroyed when we leave scope and we end up popping
> total junk later when we restore the range.
>
> You also have to be careful because of the bitmap sharing.

_And_ the abstracted "allocator" doesn't allow initialization.  _And_
the ->set methods are private.

Bummer.

Stupid C++.

Richard.

>
> Jeff


Re: implement value_range::domain_p()

2018-11-09 Thread Richard Biener
On Thu, Nov 8, 2018 at 4:30 PM Aldy Hernandez  wrote:
>
>
>
> On 11/8/18 10:24 AM, Richard Biener wrote:
> > On Thu, Nov 8, 2018 at 4:05 PM Aldy Hernandez  wrote:
> >>
> >>
> >>
> >> On 11/8/18 9:53 AM, Richard Biener wrote:
> >>> On Thu, Nov 8, 2018 at 3:40 PM Aldy Hernandez  wrote:
> 
> 
> 
>  On 11/8/18 9:34 AM, Richard Biener wrote:
> > On Thu, Nov 8, 2018 at 1:52 PM Aldy Hernandez  wrote:
> >>
> >> I believe I've seen this idiom more than once.  I know for sure I've
> >> used it in our ssa-range branch :).  I'll hunt for the other uses and
> >> adjust accordingly.
> >
> > domain_p?!  Isn't that the same as varying_p()?  Also
> 
>  Sigh, yes.  If we kept normalized value_ranges around where [MIN,MAX]
>  degraded into VR_VARYING, then yes.  But alas, we have two ways of
>  representing the entire domain.  Don't look at me.  That crap was
>  already there :).
> 
>  Another approach would be to call ::set_and_canonicalize() before
>  checking varying_p() and teach the canonicalize function that [MIN, MAX]
>  is VR_VARYING.  How does that sound?
> >>>
> >>> But that's still broken for the case where it has equivalences.  I fear 
> >>> that
> >>> if you look at users we'll end up with three or more different
> >>> varying_p/domain_p
> >>> things all being subtly different...
> >>
> >> Bah, I give up.  I don't have time to look into possible subtleties wrt
> >> equivalences right now.  I'll drop this patch.
> >>
> >>>
> >>> As said in the other thread we should see to separate equivs out
> >>> of the way.
> >>
> >> And as I meant to say in the other thread, I'll buy you many beers if
> >> you can do this for this release :).
> >
> > Well, yall made a mess out of the nicely contained VRP, and topped
> > it with C++.
> >
> > Now it's ... a mess.
>
> You wish!  VRP has been a mess for quite a long time.
>
> >
> > And for whatever reason all the C++-ification and refactoring had to happen
> > for GCC 9 :/
>
> You're gonna absolutely love what we have in store for GCC 10 ;-).

I'll believe it when I see it.  Oh wait - that was sarcastic!

Richard.

> Aldy


Re: [C++ PATCH 1/3] Fix various latent issues revealed by P0732 work.

2018-11-09 Thread Christophe Lyon
On Fri, 9 Nov 2018 at 11:32, Kyrill Tkachov  wrote:
>
>
> On 09/11/18 10:28, Matthew Malcomson wrote:
> > (Third attempt to put this on the mailing list now -- today is not a good 
> > day for my email skills :-[)
> >
> > Hi there,
> >
> > This patch has caused a few g++ and libstdc++ regression test failures on 
> > arm,
> >
> > I've included the g++ failures below.
> >
> > Do you mind looking into this?
> > Cheers,
> >
> > Matthew
> >
> >
> > dg-cmp-results.sh: Verbosity is 2, Variant is ""
> >
> > Older log file: just-before-problemcase-g++.sum
> > Test Run By matthew on Thu Nov  8 16:45:40 2018
> > Target is arm-none-eabi
> > Host   is x86_64-pc-linux-gnu
> >
> > Newer log file: g++-after-problem.sum
> > Test Run By matthew on Thu Nov  8 21:36:14 2018
> > Target is arm-none-eabi
> > Host   is x86_64-pc-linux-gnu
> >
> > PASS->FAIL: g++.dg/cpp0x/constexpr-static12.C  -std=c++14 
> > scan-assembler-not _ZNSt10unique_ptrC1Ei
> > PASS->FAIL: g++.dg/cpp0x/constexpr-static12.C  -std=c++17 
> > scan-assembler-not _ZNSt10unique_ptrC1Ei
> > PASS->FAIL: g++.dg/cpp0x/constexpr-static3.C -std=gnu++14 
> > scan-assembler-not static_initialization
> > PASS->FAIL: g++.dg/cpp0x/constexpr-static3.C -std=gnu++17 
> > scan-assembler-not static_initialization
> > PASS->FAIL: g++.dg/cpp0x/constexpr-static.C  -std=gnu++14 
> > scan-assembler-not static_initialization
> > PASS->FAIL: g++.dg/cpp0x/constexpr-static.C  -std=gnu++17 
> > scan-assembler-not static_initialization
> > PASS->FAIL: g++.dg/cpp0x/implicit13.C  -std=c++14 scan-assembler-not 
> > _ZN1BC1Ev
> > PASS->FAIL: g++.dg/cpp0x/implicit13.C  -std=c++17 scan-assembler-not 
> > _ZN1BC1Ev
> > NA->FAIL: g++.dg/cpp0x/pr83734.C  -std=gnu++14 (internal compiler error)
> > PASS->FAIL: g++.dg/cpp0x/pr83734.C  -std=gnu++14 (test for excess errors)
> > NA->FAIL: g++.dg/cpp0x/pr83734.C  -std=gnu++17 (internal compiler error)
> > PASS->FAIL: g++.dg/cpp0x/pr83734.C  -std=gnu++17 (test for excess errors)
> > PASS->FAIL: g++.dg/cpp1y/constexpr-empty3.C  -std=c++14 scan-assembler-not 
> > static_init
> > PASS->FAIL: g++.dg/cpp1y/constexpr-empty3.C  -std=c++17 scan-assembler-not 
> > static_init
> >
> >
>
> I believe these are tracked in 
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87935 FWIW.
>

Yes, that was my intention.

> Kyrill
>
> > On 05/11/18 02:06, Jason Merrill wrote:
> > The initialized_type hunk fixes handling of void AGGR_INIT_EXPRs that call a
> > non-constructor; an AGGR_INIT_EXPR can have void type if its initialization
> > semantics are more complicated than just expanding the call.
> >
> > The cxx_eval_vec_init_1 hunk corrects AGGR_INIT_EXPRs that were
> > nonsensically built to initialize an object of void type. And the
> > build_aggr_init_expr hunk makes sure we don't do that again.
> >
> > The ocp_convert and cxx_eval_outermost_constant_expr hunks deal with making
> > sure that a constant CONSTRUCTOR has the right type.
> >
> > Tested x86_64-pc-linux-gnu, applying to trunk.
> >
> > * cvt.c (ocp_convert): Don't wrap a CONSTRUCTOR in a NOP_EXPR.
> > * constexpr.c (initialized_type): Fix AGGR_INIT_EXPR handling.
> > (cxx_eval_vec_init_1): Correct type of AGGR_INIT_EXPR.
> > (cxx_eval_outermost_constant_expr): Make sure a CONSTRUCTOR has the
> > right type.  Don't wrap a CONSTRUCTOR if one was passed in.
> > * tree.c (build_aggr_init_expr): Check for void.
> > ---
> >   gcc/cp/constexpr.c | 22 +-
> >   gcc/cp/cvt.c   | 10 +-
> >   gcc/cp/tree.c  |  2 ++
> >   gcc/cp/ChangeLog   |  7 +++
> >   4 files changed, 31 insertions(+), 10 deletions(-)
> >
> > diff --git a/gcc/cp/constexpr.c b/gcc/cp/constexpr.c
> > index 7692b1727da..4fb1ba527e3 100644
> > --- a/gcc/cp/constexpr.c
> > +++ b/gcc/cp/constexpr.c
> > @@ -2778,8 +2778,10 @@ initialized_type (tree t)
> >   {
> > if (TYPE_P (t))
> >   return t;
> > -  tree type = cv_unqualified (TREE_TYPE (t));
> > -  if (TREE_CODE (t) == CALL_EXPR || TREE_CODE (t) == AGGR_INIT_EXPR)
> > +  tree type = TREE_TYPE (t);
> > +  if (!VOID_TYPE_P (type))
> > +/* No need to look deeper.  */;
> > +  else if (TREE_CODE (t) == CALL_EXPR)
> >   {
> > /* A constructor call has void type, so we need to look deeper.  */
> > tree fn = get_function_named_in_call (t);
> > @@ -2787,7 +2789,9 @@ initialized_type (tree t)
> > && DECL_CXX_CONSTRUCTOR_P (fn))
> >   type = DECL_CONTEXT (fn);
> >   }
> > -  return type;
> > +  else if (TREE_CODE (t) == AGGR_INIT_EXPR)
> > +type = TREE_TYPE (AGGR_INIT_EXPR_SLOT (t));
> > +  return cv_unqualified (type);
> >   }
> > /* We're about to initialize element INDEX of an array or class from 
> > VALUE.
> > @@ -3000,7 +3004,7 @@ cxx_eval_vec_init_1 (const constexpr_ctx *ctx, tree 
> > atype, tree init,
> >   &argvec, elttype, LOOKUP_NORMAL,
> >   complain);
> > release_tree_vector (argvec);
> > -  init = build_aggr_init_expr (TREE_TYPE (init

Re: cleanups and unification of value_range dumping code

2018-11-09 Thread Richard Biener
On Thu, Nov 8, 2018 at 5:03 PM Aldy Hernandez  wrote:
>
>
>
> On 11/8/18 9:39 AM, Richard Biener wrote:
> > On Thu, Nov 8, 2018 at 2:32 PM Aldy Hernandez  wrote:
> >>
> >> [Richard, you're right.  An overloaded debug() is better than
> >> this->dump().  Anyone who thinks different is wrong.  End of discussion.]
> >>
> >> There's no reason to have multiple ways of dumping a value range.  And
> >> I'm not even talking about ipa-prop.* which decided that they should
> >> implement their own version of value_ranges basically to annoy me.
> >> Attached is a patch to remedy this.
> >>
> >> I have made three small user-visible changes to the dumping code--
> >> cause, well... you know me... I can't leave things alone.
> >>
> >> 1. It *REALLY* irks me that bools are dumped with -INF/+INF.  Everyone
> >> knows that bools should be 0/1.  It's in the Bible.  Look it up.
> >>
> >> 2. Analogous to #1, what's up with signed 1-bit fields?  Who understands
> >> [0, +INF] as [0, 0]?  No one; that's right.  (It's still beyond me the
> >> necessity of signed bit fields in the standard period, but I'll pick my
> >> battles.)
> >>
> >> 3. I find it quite convenient to display the tree type along with the
> >> range as:
> >>
> >>  [1, 1] int EQUIVALENCES { me, myself, I }
> >
> > the type inbetween the range and equivalences?  seriously?  If so then
> > _please_ dump the type before the range:
> >
> >int [1, 1] EQUIV { ... }
>
> Done in attached patch.
>
> >
> >> Finally.. As mentioned, I've implement an overloaded debug() because
> >> it's *so* nice to use gdb and an alias of:
> >>
> >>  define dd
> >>call debug($arg0)
> >>  end
> >>
> >> ...and have stuff just work.
> >>
> >> I do realize that we still have dump() lying around.  At some point, we
> >> should start dropping external access to the dump methods for objects
> >> that are never dumped by the compiler (but by the user at debug time).
> >
> > +DEBUG_FUNCTION void
> > +debug (const value_range *vr)
> > +{
> > +  dump_value_range (stderr, vr);
> > +}
> > +
> > +DEBUG_FUNCTION void
> > +debug (const value_range vr)
> >
> > ^^^ missing a & (const reference?)
> >
> > +{
> > +  dump_value_range (stderr, &vr);
> > +}
>
> Fixed.
>
> >
> > also
> >
> > @@ -2122,21 +2122,11 @@ dump_ssaname_info (pretty_printer *buffer,
> > tree node, int spc)
> > if (!POINTER_TYPE_P (TREE_TYPE (node))
> > && SSA_NAME_RANGE_INFO (node))
> >   {
> > -  wide_int min, max, nonzero_bits;
> > -  value_range_kind range_type = get_range_info (node, &min, &max);
> > +  value_range vr;
> >
> > -  if (range_type == VR_VARYING)
> > -   pp_printf (buffer, "# RANGE VR_VARYING");
> > -  else if (range_type == VR_RANGE || range_type == VR_ANTI_RANGE)
> > -   {
> > - pp_printf (buffer, "# RANGE ");
> > - pp_printf (buffer, "%s[", range_type == VR_RANGE ? "" : "~");
> > - pp_wide_int (buffer, min, TYPE_SIGN (TREE_TYPE (node)));
> > - pp_printf (buffer, ", ");
> > - pp_wide_int (buffer, max, TYPE_SIGN (TREE_TYPE (node)));
> > - pp_printf (buffer, "]");
> > -   }
> > -  nonzero_bits = get_nonzero_bits (node);
> > +  get_range_info (node, vr);
> >
> > this is again allocating INTEGER_CSTs for no good reason.  dumping really
> > should avoid possibly messing with the GC state.
>
> Hmm, I'd really hate to have two distinct places that dump value
> ranges.  Is this really a problem?  Cause the times I've run into GC
> problems I'd had to resort to printf() anyhow...

Well, if you can avoid GC avoid it.  If you are in the midst of debugging
you certainly do not want your inferior calls mess with the GC state.

> And while we're on the subject of multiple implementations, I'm thinking
> of rewriting ipa-prop to actually use value_range's, not this:
>
> struct GTY(()) ipa_vr
> {
>/* The data fields below are valid only if known is true.  */
>bool known;
>enum value_range_kind type;
>wide_int min;
>wide_int max;
> };
>
> so perhaps:
>
> struct { bool known; value_range vr; }

Size is of great matter here, so m_equiv stands in the way.  Also that
would exchange wide-ints for trees.  Were trees not bad?  Andrew?!?!

> Remember that trees are cached, whereas wide_int's are not, and folks
> (not me :)) like to complain that wide_int's take a lot of space.

There's trailing-wide-ints for this issue.

> Would you be viscerally opposed to implementing ipa_vr with
> value_range's?  I really hate having to deal with yet another
> value_range variant.

Well, fact is that the SSA annotations have integer constant ranges
only, and those are transferred by IPA.  Exchanging that for the
fat internal representation of VRP is stupid.

You started separating out that wide-int-range stuff and _this_ is
what the SSA annotations / IPA should use.

The value_range class from VRP should become more internal again.

I guess I'm not seeing the end of the road you are currently w

[PATCH v2] [aarch64] Correct the maximum shift amount for shifted operands.

2018-11-09 Thread christoph . muellner
From: Christoph Muellner 

The aarch64 ISA specification allows a left shift amount to be applied
after extension in the range of 0 to 4 (encoded in the imm3 field).

This is true for at least the following instructions:

 * ADD (extend register)
 * ADDS (extended register)
 * SUB (extended register)

The result of this patch can be seen, when compiling the following code:

uint64_t myadd(uint64_t a, uint64_t b)
{
  return a+(((uint8_t)b)<<4);
}

Without the patch the following sequence will be generated:

 :
   0:   d37c1c21        ubfiz   x1, x1, #4, #8
   4:   8b20            add     x0, x1, x0
   8:   d65f03c0        ret

With the patch the ubfiz will be merged into the add instruction:

 :
   0:   8b211000        add     x0, x0, w1, uxtb #4
   4:   d65f03c0        ret

Tested with "make check" and no regressions found.

*** gcc/ChangeLog ***

2018-xx-xx  Christoph Muellner  
Philipp Tomsich  

* config/aarch64/aarch64.c (aarch64_uxt_size): Correct the maximum
shift amount for shifted operands.

*** gcc/testsuite/ChangeLog ***

2018-xx-xx  Christoph Muellner  
Philipp Tomsich  

* gcc.target/aarch64/extend.c: Adjust the testcases to cover
the changed shift amount.
---
 gcc/config/aarch64/aarch64.c  |  2 +-
 gcc/testsuite/gcc.target/aarch64/extend.c | 16 
 2 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index c82c7b6..c85988a 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -8190,7 +8190,7 @@ aarch64_output_casesi (rtx *operands)
 int
 aarch64_uxt_size (int shift, HOST_WIDE_INT mask)
 {
-  if (shift >= 0 && shift <= 3)
+  if (shift >= 0 && shift <= 4)
 {
   int size;
   for (size = 8; size <= 32; size *= 2)
diff --git a/gcc/testsuite/gcc.target/aarch64/extend.c 
b/gcc/testsuite/gcc.target/aarch64/extend.c
index f399e55..7986c5b 100644
--- a/gcc/testsuite/gcc.target/aarch64/extend.c
+++ b/gcc/testsuite/gcc.target/aarch64/extend.c
@@ -32,8 +32,8 @@ ldr_sxtw0 (char *arr, int i)
 unsigned long long
 adddi_uxtw (unsigned long long a, unsigned int i)
 {
-  /* { dg-final { scan-assembler "add\tx\[0-9\]+,.*uxtw #?3" } } */
-  return a + ((unsigned long long)i << 3);
+  /* { dg-final { scan-assembler "add\tx\[0-9\]+,.*uxtw #?4" } } */
+  return a + ((unsigned long long)i << 4);
 }
 
 unsigned long long
@@ -46,8 +46,8 @@ adddi_uxtw0 (unsigned long long a, unsigned int i)
 long long
 adddi_sxtw (long long a, int i)
 {
-  /* { dg-final { scan-assembler "add\tx\[0-9\]+,.*sxtw #?3" } } */
-  return a + ((long long)i << 3);
+  /* { dg-final { scan-assembler "add\tx\[0-9\]+,.*sxtw #?4" } } */
+  return a + ((long long)i << 4);
 }
 
 long long
@@ -60,8 +60,8 @@ adddi_sxtw0 (long long a, int i)
 unsigned long long
 subdi_uxtw (unsigned long long a, unsigned int i)
 {
-  /* { dg-final { scan-assembler "sub\tx\[0-9\]+,.*uxtw #?3" } } */
-  return a - ((unsigned long long)i << 3);
+  /* { dg-final { scan-assembler "sub\tx\[0-9\]+,.*uxtw #?4" } } */
+  return a - ((unsigned long long)i << 4);
 }
 
 unsigned long long
@@ -74,8 +74,8 @@ subdi_uxtw0 (unsigned long long a, unsigned int i)
 long long
 subdi_sxtw (long long a, int i)
 {
-  /* { dg-final { scan-assembler "sub\tx\[0-9\]+,.*sxtw #?3" } } */
-  return a - ((long long)i << 3);
+  /* { dg-final { scan-assembler "sub\tx\[0-9\]+,.*sxtw #?4" } } */
+  return a - ((long long)i << 4);
 }
 
 long long
-- 
2.9.5



Re: [RFC][PATCH]Merge VEC_COND_EXPR into MASK_STORE after loop vectorization

2018-11-09 Thread Richard Biener
On Thu, Nov 8, 2018 at 5:55 PM Renlin Li  wrote:
>
> Hi Richard,
>
> On 11/08/2018 12:09 PM, Richard Biener wrote:
> > On Thu, Nov 8, 2018 at 12:02 PM Renlin Li  wrote:
> >>
> >> Hi all,
> >>
> >> When allow-store-data-races is enabled, ifcvt would prefer to generated
> >> conditional select and unconditional store to convert certain if statement
> >> into:
> >>
> >> _ifc_1 = val
> >> _ifc_2 = A[i]
> >> val = cond? _ifc_1 : _ifc_2
> >> A[i] = val
> >>
> >> When the loop got vectorized, this pattern will be turned into
> >> MASK_LOAD, VEC_COND_EXPR and MASK_STORE. This could be improved.
> >
> > I'm somewhat confused - the vectorizer doesn't generate a masked store when
> > if-conversion didn't create one in the first place
> >
> > In particular with allow-store-data-races=1 (what your testcase uses)
> > there are no
> > masked loads/stores generated at all.   So at least you need a better 
> > testcase
> > to motivate (one that doesn't load from array[i] so that we know the 
> > conditional
> > stores might trap).
>
> Thanks for trying this. The test case is a little bit simple and artificial.
> ifcvt won't generate a mask_store; instead it will generate an unconditional
> store with allow-store-data-races=1.
>
> My build is based on 25th Oct. I got the following IR from ifcvt with
> aarch64-none-elf-gcc -S -march=armv8-a+sve -O2 -ftree-vectorize --param 
> allow-store-data-races=1
>
> [local count: 1006632961]:
># i_20 = PHI 
># ivtmp_18 = PHI 
>a_10 = array[i_20];
>_1 = a_10 & 1;
>_2 = a_10 + 1;
>_ifc__32 = array[i_20];
>_ifc__33 = _2;
>_ifc__34 = _1 != 0 ? _ifc__33 : _ifc__32;
>array[i_20] = _ifc__34;
>prephitmp_8 = _1 != 0 ? _2 : a_10;
>_4 = a_10 + 2;
>_ifc__35 = array[i_20];
>_ifc__36 = _4;
>_ifc__37 = prephitmp_8 > 10 ? _ifc__36 : _ifc__35;
>array[i_20] = _ifc__37;
>i_13 = i_20 + 1;
>ivtmp_5 = ivtmp_18 - 1;
>if (ivtmp_5 != 0)
>  goto ; [93.33%]
>else
>  goto ; [6.67%]
>
> *However*, after I rebased my patch on the latest trunk.
> Got the following dump from ifcvt:
> [local count: 1006632961]:
># i_20 = PHI 
># ivtmp_18 = PHI 
>a_10 = array[i_20];
>_1 = a_10 & 1;
>_2 = a_10 + 1;
>_ifc__34 = _1 != 0 ? _2 : a_10;
>array[i_20] = _ifc__34;
>_4 = a_10 + 2;
>_ifc__37 = _ifc__34 > 10 ? _4 : _ifc__34;
>array[i_20] = _ifc__37;
>i_13 = i_20 + 1;
>ivtmp_5 = ivtmp_18 - 1;
>if (ivtmp_5 != 0)
>  goto ; [93.33%]
>else
>  goto ; [6.67%]
>
> the redundant load is not generated, but you could still see the 
> unconditional store.

Yes, I fixed the redundant loads recently and indeed dead stores
remain (for the particular
testcase they would be easy to remove).

> After loop vectorization, the following is generated (without my change):

Huh.  But that's not because of if-conversion but because SVE needs to
mask _all_
loop operations that are not safe to execute with the loop_mask!

>vect_a_10.6_6 = .MASK_LOAD (vectp_array.4_35, 4B, loop_mask_7);
>a_10 = array[i_20];
>vect__1.7_39 = vect_a_10.6_6 & vect_cst__38;
>_1 = a_10 & 1;
>vect__2.8_41 = vect_a_10.6_6 + vect_cst__40;
>_2 = a_10 + 1;
>vect__ifc__34.9_43 = VEC_COND_EXPR  vect__2.8_41, vect_a_10.6_6>;
>_ifc__34 = _1 != 0 ? _2 : a_10;
>.MASK_STORE (vectp_array.10_45, 4B, loop_mask_7, vect__ifc__34.9_43);
>vect__4.12_49 = vect_a_10.6_6 + vect_cst__48;
>_4 = a_10 + 2;
>vect__ifc__37.13_51 = VEC_COND_EXPR  vect_cst__50, 
> vect__4.12_49, vect__ifc__34.9_43>;
>_ifc__37 = _ifc__34 > 10 ? _4 : _ifc__34;
>.MASK_STORE (vectp_array.14_53, 4B, loop_mask_7, vect__ifc__37.13_51);
>
> With the old ifcvt code, my change here could improve it a little bit, 
> eliminate some redundant load.
> With the new code, it could not improved it further. I'll adjust the patch 
> based on the latest trunk.

So what does the patch change the above to?  The code has little to no
comments apart from a
small picture with code _before_ the transform...

I was wondering whether we can implement

  l = [masked]load;
  tem = cond ? x : l;
  masked-store = tem;

pattern matching in a regular pass - forwprop for example.  Note the
load doesn't need to be masked,
correct?  In fact if it is masked you need to make sure the
conditional never accesses parts that
are masked in the load, no?  Or require the mask to be the same as
that used by the store.  But then
you still cannot simply replace the store mask with a new mask
generated from the conditional?

Richard.

>
> >
> > So what I see (with store data races not allowed) from ifcvt is
>
> when store data races are not allowed, we won't generate an unconditional
> store. Instead ifcvt
> generates a predicated store. That's what you showed here.
>
> As I mentioned, we could always make ifcvt generate mask_store, as it should
> always be safe.
> But I don't know the performance implication on other targets (I assume there 
> must be reasons why
> people write

Re: record_ranges_from_incoming_edge: use value_range API for creating new range

2018-11-09 Thread Aldy Hernandez
Patches welcome!

On Fri, Nov 9, 2018, 12:30 Richard Biener  wrote:
> On Thu, Nov 8, 2018 at 4:27 PM Jeff Law  wrote:
> >
> > On 11/8/18 8:14 AM, Richard Biener wrote:
> > > On Thu, Nov 8, 2018 at 4:00 PM Aldy Hernandez 
> wrote:
> > >>
> > >>
> > >>
> > >> On 11/8/18 9:56 AM, Richard Biener wrote:
> > >>> On Thu, Nov 8, 2018 at 3:54 PM Aldy Hernandez 
> wrote:
> > 
> > 
> > 
> >  On 11/8/18 9:49 AM, Richard Biener wrote:
> > > On Thu, Nov 8, 2018 at 3:31 PM Aldy Hernandez 
> wrote:
> > >>
> > >>
> > >>
> > >> On 11/8/18 9:24 AM, Richard Biener wrote:
> > >>> On Thu, Nov 8, 2018 at 1:17 PM Aldy Hernandez 
> wrote:
> > 
> >  This one's rather obvious and does not depend on any
> get_range_info API
> >  change.
> > 
> >  OK for trunk?
> > >>>
> > >>> Hmm, no - that's broken.  IIRC m_equiv are shared bitmaps if you
> > >>> do tem = *old_vr so you modify it in place with equiv_clear().
> > >>
> > >> Good point.
> > >>
> > >>>
> > >>> Thus, operator= should be really deleted or mapped to
> value_range::set()
> > >>> in which case tem = *old_vr would do useless bitmap allocation
> and
> > >>> copying that you then clear.
> > >>
> > >> Actually, I was thinking that perhaps the assignment and equality
> > >> operators should disregard the equivalence bitmap.  In this case,
> copy
> > >> everything EXCEPT the bitmap and set the resulting equivalence
> bitmap to
> > >> NULL.
> > >
> > > I think that's equally confusing.
> > 
> >  I don't think so.  When you ask whether range A and range B are
> equal,
> >  you're almost always interested in the range itself, not the
> equivalence
> >  table behind it.
> > 
> >  We could also get rid of the == operator and just provide:
> > 
> >   bool equal_p(bool ignore_equivs);
> > 
> >  How does this sound?
> > >>>
> > >>> Good.
> > >>>
> > >
> > >> It's also annoying to use ::ignore_equivs_equal_p().  Since that
> seems
> > >> to be the predominant way of comparing ranges, perhaps it should
> be the
> > >> default.
> > >
> > > I think a good approach would be to isolate m_equiv more because
> it is
> > > really an implementation detail of the propagator.  Thus, make
> > >
> > > class value_range_with_equiv : public value_range
> > > {
> > > ... all the equiv stuff..
> > > }
> > >
> > > make the lattice of type value_range_with_equiv and see what
> tickles
> > > down.
> > >
> > > value_range_with_equiv wouldn't implement copy and assignment
> > > (too expensive) and value_range can do with the trivial
> implementation.
> > >
> > > And most consumers/workers can just work on the equiv-less
> variants.
> > 
> >  I like this.  Unfortunately, not feasible for this cycle (for me
> >  anyhow-- patches welcome though :)).  How about equal_p() as
> described
> >  above?
> > >>>
> > >>> Works for me but you still need to sort out the copying, so if you
> think
> > >>> splitting is not feasible (I'll give it a try) then please disable
> assingment
> > >>> and copy operators and fixup code.
> > >>
> > >> Are you saying you'll try implementing value_range_with_equiv :
> > >> value_range?  That would be of great help!
> > >
> > > Yes, I'll experiment with this.  Meanwhile noticed bogus
> > >
> > >   /* We're going to need to unwind this range.  We can
> > >  not use VR as that's a stack object.  We have to
> allocate
> > >  a new range and push the old range onto the stack.  We
> > >  also have to be very careful about sharing the
> underlying
> > >  bitmaps.  Ugh.  */
> > >   value_range *new_vr = vr_values->allocate_value_range ();
> > >   *new_vr = vr;
> > >   new_vr->equiv_clear ();
> > I hate this hunk of code.  In fact, it's probably the biggest wart from
> > the classification & simplification work last year.
> >
> > The problem is the local stack object can't be pushed onto the unwinding
> > stack -- it'll be destroyed when we leave scope and we end up popping
> > total junk later when we restore the range.
> >
> > You also have to be careful because of the bitmap sharing.
>
> _And_ the abstracted "allocator" doesn't allow initialization.  _And_
> the ->set methods are private.
>
> Bummer.
>
> Stupid C++.
>
> Richard.
>
> >
> > Jeff
>


Re: PR83750: CSE erf/erfc pair

2018-11-09 Thread Richard Biener
On Fri, Nov 9, 2018 at 8:11 AM Prathamesh Kulkarni
 wrote:
>
> On Tue, 6 Nov 2018 at 16:04, Richard Biener  
> wrote:
> >
> > On Mon, Nov 5, 2018 at 3:11 PM Prathamesh Kulkarni
> >  wrote:
> > >
> > > On Mon, 5 Nov 2018 at 18:14, Richard Biener  
> > > wrote:
> > > >
> > > > On Mon, Nov 5, 2018 at 1:11 PM Prathamesh Kulkarni
> > > >  wrote:
> > > > >
> > > > > On Mon, 5 Nov 2018 at 15:10, Richard Biener 
> > > > >  wrote:
> > > > > >
> > > > > > On Fri, Nov 2, 2018 at 10:37 AM Prathamesh Kulkarni
> > > > > >  wrote:
> > > > > > >
> > > > > > > Hi,
> > > > > > > This patch adds two transforms to match.pd to CSE the erf/erfc pair.
> > > > > > > erfc(x) is canonicalized to 1 - erf(x), and 1 - erf(x) is reversed
> > > > > > > back to erfc(x) when canonicalization is disabled and the result of
> > > > > > > erf(x) has a single use within 1 - erf(x).
> > > > > > >
> > > > > > > The patch regressed builtin-nonneg-1.c. The following test-case
> > > > > > > reproduces the issue with patch:
> > > > > > >
> > > > > > > void test(double d1) {
> > > > > > >   if (signbit(erfc(d1)))
> > > > > > > link_failure_erfc();
> > > > > > > }
> > > > > > >
> > > > > > > ssa dump:
> > > > > > >
> > > > > > >:
> > > > > > >   _5 = __builtin_erf (d1_4(D));
> > > > > > >   _1 = 1.0e+0 - _5;
> > > > > > >   _6 = _1 < 0.0;
> > > > > > >   _2 = (int) _6;
> > > > > > >   if (_2 != 0)
> > > > > > > goto ; [INV]
> > > > > > >   else
> > > > > > > goto ; [INV]
> > > > > > >
> > > > > > >:
> > > > > > >   link_failure_erfc ();
> > > > > > >
> > > > > > >:
> > > > > > >   return;
> > > > > > >
> > > > > > > As can be seen, erfc(d1) is folded to 1 - erf(d1).
> > > > > > > forwprop then transforms the if condition from _2 != 0
> > > > > > > to _5 > 1.0e+0, and that defeats DCE, resulting in a link failure
> > > > > > > (an undefined reference to link_failure_erfc()).
> > > > > > >
> > > > > > > So, the patch adds another transform erf(x) > 1 -> 0.
> > > > > >
> > > > > > Ick.
> > > > > >
> > > > > > Why not canonicalize erf (x) to 1-erfc(x) instead?
> > > > > Sorry I didn't quite follow, won't this cause similar issue with erf ?
> > > > > I changed the pattern to canonicalize erf(x) -> 1 - erfc(x)
> > > > > and 1 - erfc(x) -> erf(x) after canonicalization is disabled.
> > > > >
> > > > > This caused undefined reference to link_failure_erf() in following 
> > > > > test-case:
> > > > >
> > > > > extern int signbit(double);
> > > > > extern void link_failure_erf(void);
> > > > > extern double erf(double);
> > > > >
> > > > > void test(double d1) {
> > > > >   if (signbit(erf(d1)))
> > > > > link_failure_erf();
> > > > > }
> > > >
> > > > But that's already not optimized without any canonicalization
> > > > because erf returns sth in range [-1, 1].
> > > >
> > > > I suggested the change because we have limited support for FP
> > > > value-ranges and nonnegative is one thing we can compute
> > > > (and erfc as opposed to erf is nonnegative).
> > > Ah right, thanks for the explanation.
> > > Unfortunately this still regresses builtin-nonneg-1.c, which can be
> > > reproduced with following test-case:
> > >
> > > extern int signbit(double);
> > > extern void link_failure_erf(void);
> > > extern double erf(double);
> > > extern double fabs(double);
> > >
> > > void test(double d1) {
> > > >   if (signbit(erf(fabs(d1))))
> > > link_failure_erf();
> > > }
> > >
> > > signbit(erf(fabs(d1))) is transformed to 0 without the patch, but with the patch
> > > it gets canonicalized to signbit(1 - erfc(fabs(d1))) which similarly
> > > defeats DCE.
> > >
> > > forwprop1 shows:
> > >  :
> > >   _1 = ABS_EXPR ;
> > >   _6 = __builtin_erfc (_1);
> > >   _2 = 1.0e+0 - _6;
> > >   _7 = _6 > 1.0e+0;
> > >   _3 = (int) _7;
> > >   if (_6 > 1.0e+0)
> > > goto ; [INV]
> > >   else
> > > goto ; [INV]
> > >
> > >:
> > >   link_failure_erf ();
> > >
> > >:
> > >   return;
> > >
> > > I assume we would need to somehow tell gcc that the canonicalized
> > > expression 1 - erfc(x) would not exceed 1.0 ?
> > > Is there a better way to do that apart from defining pattern (1 -
> > > erfc(x)) > 1.0 -> 0
> > > which I agree doesn't look ideal to add in match.pd ?
> >
> > You could handle a MINUS_EXPR of 1 and erfc() in
> > gimple_assign_nonnegative_warnv_p (I wouldn't bother
> > to do it for tree_binary_nonnegative_warnv_p)
> >
> Um won't the value range of 1 - erfc(x) be [-1.0, 1.0] (same as
> erf(x)), so would adding it to gimple_assign_nonnegative_warnv_p be
> correct ?

Under the same condition as erf, when the argument is positive.
This is what the testcase is testing.

> erf(fabs(x)) is non-negative and between [0.0, 1.0] although I am not
> sure if 1 - erfc(fabs(x)) would be always non-negative too if
> erfc(fabs(x)) can exceed 1.0 for some value of x ?
> AFAIU for the above case, gcc doesn't know that 1 - erfc(x) cannot
> exceed 1.0, and that's because we don't propagate value range for
> floats as you pointed out.
>
> Thanks,
> Prathamesh
> > This is of cours

Re: [PATCH][cunroll] Add unroll-known-loop-iterations-only param and use it in aarch64

2018-11-09 Thread Richard Biener
On Fri, Nov 9, 2018 at 11:47 AM Kyrill Tkachov
 wrote:
>
> Hi all,
>
> In this testcase the codegen for VLA SVE is worse than it could be due to 
> unrolling:
>
> fully_peel_me:
>  mov x1, 5
>  ptrue   p1.d, all
>  whilelo p0.d, xzr, x1
>  ld1d    z0.d, p0/z, [x0]
>  fadd    z0.d, z0.d, z0.d
>  st1d    z0.d, p0, [x0]
>  cntd    x2
>  addvl   x3, x0, #1
>  whilelo p0.d, x2, x1
>  beq .L1
>  ld1d    z0.d, p0/z, [x0, #1, mul vl]
>  fadd    z0.d, z0.d, z0.d
>  st1d    z0.d, p0, [x3]
>  cntw    x2
>  incb    x0, all, mul #2
>  whilelo p0.d, x2, x1
>  beq .L1
>  ld1d    z0.d, p0/z, [x0]
>  fadd    z0.d, z0.d, z0.d
>  st1d    z0.d, p0, [x0]
> .L1:
>  ret
>
> In this case, due to the vector-length-agnostic nature of SVE the compiler 
> doesn't know the loop iteration count.
> For such loops we don't want to unroll if we don't end up eliminating 
> branches as this just bloats code size
> and hurts icache performance.
>
> This patch introduces a new unroll-known-loop-iterations-only param that 
> disables cunroll when the loop iteration
> count is unknown (SCEV_NOT_KNOWN). This case occurs much more often for SVE 
> VLA code, but it does help some
> Advanced SIMD cases as well where loops with an unknown iteration count are 
> not unrolled when it doesn't eliminate
> the branches.
>
> So for the above testcase we generate now:
> fully_peel_me:
>  mov x2, 5
>  mov x3, x2
>  mov x1, 0
>  whilelo p0.d, xzr, x2
>  ptrue   p1.d, all
> .L2:
>  ld1d    z0.d, p0/z, [x0, x1, lsl 3]
>  fadd    z0.d, z0.d, z0.d
>  st1d    z0.d, p0, [x0, x1, lsl 3]
>  incd    x1
>  whilelo p0.d, x1, x3
>  bne .L2
>  ret
>
> Not perfect still, but it's preferable to the original code.
> The new param is enabled by default on aarch64 but disabled for other 
> targets, leaving their behaviour unchanged
> (until other target people experiment with it and set it, if appropriate).
>
> Bootstrapped and tested on aarch64-none-linux-gnu.
> Benchmarked on SPEC2017 on a Cortex-A57 and there are no differences in 
> performance.
>
> Ok for trunk?

Hum.  Why introduce a new --param and not simply key on
flag_peel_loops instead?  That is
enabled by default at -O3 and with FDO but you of course can control
that in your target's
post-option-processing hook.

It might also make sense to have more fine-grained control for this
and allow a target
to say whether it wants to peel a specific loop or not when the
middle-end thinks that
would be profitable.

Richard.

> Thanks,
> Kyrill
>
>
> 2018-11-09  Kyrylo Tkachov  
>
> * params.def (PARAM_UNROLL_KNOWN_LOOP_ITERATIONS_ONLY): Define.
> * tree-ssa-loop-ivcanon.c (try_unroll_loop_completely): Use above to
> disable unrolling on unknown iteration count.
> * config/aarch64/aarch64.c (aarch64_override_options_internal): Set
> PARAM_UNROLL_KNOWN_LOOP_ITERATIONS_ONLY to 1.
> * doc/invoke.texi (--param unroll-known-loop-iterations-only):
> Document.
>
> 2018-11-09  Kyrylo Tkachov  
>
> * gcc.target/aarch64/sve/unroll-1.c: New test.
>


Re: introduce --enable-mingw-full32 to default to --large-address-aware

2018-11-09 Thread JonY
On 11/09/2018 10:48 AM, Alexandre Oliva wrote:
> On Nov  8, 2018, JonY <10wa...@gmail.com> wrote:
> 
>> No, no. By quick I just mean using -Wl,--large-address-aware on an
>> existing gcc install, nothing complex. Sorry about not making it clear.
> 
> Ah, good!
> 
>> I now understand the problem, thanks for the clarification about the
>> patch. Patch is OK.
> 
> Thanks, here's what I'm installing (with comments indicating which
> configuration each setting applies to, and why one of them won't work)
> 
> 
> large-addr-aware for biarch non-w64 mingw32
> 
> for  gcc/ChangeLog
> 
>   * config/i386/mingw32.h (LINK_SPEC_LARGE_ADDR_AWARE): Adjust
> for non-w64 x86_64 biarch.


This patch is certainly better with the explanation, OK for trunk.







[PATCH] Fix PR87953

2018-11-09 Thread Richard Biener


Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2018-11-09  Richard Biener  

PR tree-optimization/87953
* tree-vect-loop.c (vectorizable_reduction): For analysis
always pass ops[0] to vectorizable_condition.

diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index 22bed26609b..dad6524ee2b 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -6525,9 +6525,10 @@ vectorizable_reduction (stmt_vec_info stmt_info, gimple_stmt_iterator *gsi,
   if (code == COND_EXPR)
 {
   /* Only call during the analysis stage, otherwise we'll lose
-STMT_VINFO_TYPE.  */
+STMT_VINFO_TYPE.  We'll pass ops[0] as reduc_op, it's only
+used as a flag during analysis.  */
   if (!vec_stmt && !vectorizable_condition (stmt_info, gsi, NULL,
-   ops[reduc_index], 0, NULL,
+   ops[0], 0, NULL,
cost_vec))
 {
   if (dump_enabled_p ())


Re: [ARM/FDPIC v3 02/21] [ARM] FDPIC: Handle arm*-*-uclinuxfdpiceabi in configure scripts

2018-11-09 Thread Christophe Lyon
On Mon, 15 Oct 2018 at 10:12, Christophe Lyon
 wrote:
>
>
>
> On Fri, 12 Oct 2018 at 11:54, Richard Earnshaw (lists) 
>  wrote:
>>
>> On 11/10/18 14:34, Christophe Lyon wrote:
>> > The new arm-uclinuxfdpiceabi target behaves pretty much like
>> > arm-linux-gnueabi. In order the enable the same set of features, we
>> > have to update several configure scripts that generally match targets
>> > like *-*-linux*: in most places, we add *-uclinux* where there is
>> > already *-linux*, or uclinux* when there is already linux*.
>> >
>> > In gcc/config.gcc and libgcc/config.host we use *-*-uclinuxfdpiceabi
>> > because there is already a different behaviour for *-*uclinux* target.
>> >
>> > In libtool.m4, we use uclinuxfdpiceabi in cases where ELF shared
>> > libraries support is required, as uclinux does not guarantee that.
>> >
>> > 2018-XX-XX  Christophe Lyon  
>> >
>> >   config/
>> >   * futex.m4: Handle *-uclinux*.
>> >   * tls.m4 (GCC_CHECK_TLS): Likewise.
>> >
>> >   gcc/
>> >   * config.gcc: Handle *-*-uclinuxfdpiceabi.
>> >
>> >   libatomic/
>> >   * configure.tgt: Handle arm*-*-uclinux*.
>> >   * configure: Regenerate.
>> >
>> >   libgcc/
>> >   * config.host: Handle *-*-uclinuxfdpiceabi.
>> >
>> >   libitm/
>> >   * configure.tgt: Handle *-*-uclinux*.
>> >   * configure: Regenerate.
>> >
>> >   libstdc++-v3/
>> >   * acinclude.m4: Handle uclinux*.
>> >   * configure: Regenerate.
>> >   * configure.host: Handle uclinux*
>> >
>> >   * libtool.m4: Handle uclinux*.
>>
>> What testing have you done to ensure that these new uclinux* changes
>> have not affected existing uclinux configurations?
>>
>> Also, do you really need to use uclinuxfdpiceabi (which is quite
>> Arm-specific) everywhere, or would uclinuxfdpic* be better and ease work
>> for other fdpic targets?
>>

Thanks to the help I received in
https://gcc.gnu.org/ml/gcc/2018-10/msg00154.html
https://gcc.gnu.org/ml/gcc/2018-11/msg00012.html
I managed to build an xtensa-buildroot-uclinux-uclibc toolchain
(r265582), and run 'make check'.
Then I applied this patch, rebuilt the toolchain (from scratch) and
ran 'make check' again.
compare_test says:
Tests that now fail, but worked before (54 tests):
[...]
Tests that now work, but didn't before (2 tests):
23_containers/map/requirements/exception/propagation_consistent.cc
execution test
ext/mt_allocator/deallocate_global_thread-3.cc execution test

New tests that FAIL (4 tests):
27_io/filesystem/path/native/string.cc (test for excess errors)
30_threads/condition_variable_any/50862.cc execution test
30_threads/condition_variable_any/53830.cc execution test
experimental/filesystem/path/native/string.cc (test for excess errors)

New tests that PASS (301 tests):
[...]

(all the diffs are in libstdc++ obviously)
I've attached the details to this email.

>
> This patch became necessary when I was asked to change the target name from  
> arm-linux-uclibceabi to arm-uclinuxfdpiceabi.
> Changing it implied that many features were disabled and tests regressed 
> because the new target name didn't match the regexps in configure scripts.
>
> I iterated over the regressions to see which features were now missing, and I 
> updated the configure scripts accordingly.
>
> When the feature being tested was generic, I used the general *-*-uclinux* 
> form, because it seemed reasonable that it was OK for other targets.
> When the test was arm-related, I used the stricter uclinuxfdpiceabi form, and 
> specifically not uclinuxfdpic* to avoid breaking other fdpic targets. When in 
> doubt, I preferred to stay on the safe side.
>
> To answer you first question, I do not have the setup to test other uclinux 
> configs, I'm not even sure of the list.
> So I tested arm-uclinuxfdpiceabi, made sure I got the same results as with 
> our previous target name, and that
> the whole series didn't regress on arm-linux-gnueabi*.
>
> I hope other uclinux/fdpic target maintainers comment on this patch if 
> something looks wrong to them.
>
> Christophe
>
>> R.
>> >
>> > Change-Id: I6a1fdcd9847d8a82179a214612a3474c1f492916
>> >
>> > diff --git a/config/futex.m4 b/config/futex.m4
>> > index e95144d..4dffe15 100644
>> > --- a/config/futex.m4
>> > +++ b/config/futex.m4
>> > @@ -9,7 +9,7 @@ AC_DEFUN([GCC_LINUX_FUTEX],[dnl
>> >  GCC_ENABLE(linux-futex,default, ,[use the Linux futex system call],
>> >  permit yes|no|default)
>> >  case "$target" in
>> > -  *-linux*)
>> > +  *-linux* | *-uclinux*)
>> >  case "$enable_linux_futex" in
>> >default)
>> >   # If headers don't have gettid/futex syscalls definition, then
>> > diff --git a/config/tls.m4 b/config/tls.m4
>> > index 4e170c8..5a8676e 100644
>> > --- a/config/tls.m4
>> > +++ b/config/tls.m4
>> > @@ -76,7 +76,7 @@ AC_DEFUN([GCC_CHECK_TLS], [
>> > dnl Shared library options may depend on the host; this check
>> > dnl is only known to be needed for GNU/Linux.
>> > case $host in
>> > - 

Re: cleanups and unification of value_range dumping code

2018-11-09 Thread Aldy Hernandez




On 11/9/18 6:37 AM, Richard Biener wrote:

On Thu, Nov 8, 2018 at 5:03 PM Aldy Hernandez  wrote:




On 11/8/18 9:39 AM, Richard Biener wrote:

On Thu, Nov 8, 2018 at 2:32 PM Aldy Hernandez  wrote:


[Richard, you're right.  An overloaded debug() is better than
this->dump().  Anyone who thinks different is wrong.  End of discussion.]

There's no reason to have multiple ways of dumping a value range.  And
I'm not even talking about ipa-prop.* which decided that they should
implement their own version of value_ranges basically to annoy me.
Attached is a patch to remedy this.

I have made three small user-visible changes to the dumping code--
cause, well... you know me... I can't leave things alone.

1. It *REALLY* irks me that bools are dumped with -INF/+INF.  Everyone
knows that bools should be 0/1.  It's in the Bible.  Look it up.

2. Analogous to #1, what's up with signed 1-bit fields?  Who understands
[0, +INF] as [0, 0]?  No one; that's right.  (The necessity of signed
bit fields in the standard is still beyond me, period, but I'll pick my
battles.)

3. I find it quite convenient to display the tree type along with the
range as:

  [1, 1] int EQUIVALENCES { me, myself, I }


the type in between the range and equivalences?  seriously?  If so then
_please_ dump the type before the range:

int [1, 1] EQUIV { ... }


Done in attached patch.




Finally.. As mentioned, I've implement an overloaded debug() because
it's *so* nice to use gdb and an alias of:

  define dd
call debug($arg0)
  end

...and have stuff just work.

I do realize that we still have dump() lying around.  At some point, we
should start dropping external access to the dump methods for objects
that are never dumped by the compiler (but by the user at debug time).


+DEBUG_FUNCTION void
+debug (const value_range *vr)
+{
+  dump_value_range (stderr, vr);
+}
+
+DEBUG_FUNCTION void
+debug (const value_range vr)

^^^ missing a & (const reference?)

+{
+  dump_value_range (stderr, &vr);
+}


Fixed.



also

@@ -2122,21 +2122,11 @@ dump_ssaname_info (pretty_printer *buffer,
tree node, int spc)
 if (!POINTER_TYPE_P (TREE_TYPE (node))
 && SSA_NAME_RANGE_INFO (node))
   {
-  wide_int min, max, nonzero_bits;
-  value_range_kind range_type = get_range_info (node, &min, &max);
+  value_range vr;

-  if (range_type == VR_VARYING)
-   pp_printf (buffer, "# RANGE VR_VARYING");
-  else if (range_type == VR_RANGE || range_type == VR_ANTI_RANGE)
-   {
- pp_printf (buffer, "# RANGE ");
- pp_printf (buffer, "%s[", range_type == VR_RANGE ? "" : "~");
- pp_wide_int (buffer, min, TYPE_SIGN (TREE_TYPE (node)));
- pp_printf (buffer, ", ");
- pp_wide_int (buffer, max, TYPE_SIGN (TREE_TYPE (node)));
- pp_printf (buffer, "]");
-   }
-  nonzero_bits = get_nonzero_bits (node);
+  get_range_info (node, vr);

this is again allocating INTEGER_CSTs for no good reason.  dumping really
shuld avoid possibly messing with the GC state.


Hmm, I'd really hate to have two distinct places that dump value
ranges.  Is this really a problem?  Cause the times I've run into GC
problems I'd had to resort to printf() anyhow...


Well, if you can avoid GC avoid it.  If you are in the midst of debugging
you certainly do not want your inferior calls mess with the GC state.


I personally haven't had to deal with this in 20 years, and I believe 
the convenience of having one dumping routine makes up for the 
theoretical possibility that we'd run into this in the future.





And while we're on the subject of multiple implementations, I'm thinking
of rewriting ipa-prop to actually use value_range's, not this:

struct GTY(()) ipa_vr
{
/* The data fields below are valid only if known is true.  */
bool known;
enum value_range_kind type;
wide_int min;
wide_int max;
};

so perhaps:

struct { bool known; value_range vr; }


Size is of great matter here so m_equiv stands in the way here.  Also that
would exchange wide-ints for trees.  Were trees not bad?  Andrew?!?!


Remember that trees are cached, whereas wide_int's are not, and folks
(not me :)) like to complain that wide_int's take a lot of space.


There's trailing-wide-ints for this issue.


Had we been using trailing wide ints, it would've been a great argument, 
but alas we aren't.  As it stands, we'd use 44% less memory by using 
value_range's-- even with the m_equiv waste:


struct GTY(()) ipa_with_value_range
{
  bool known;
  value_range vr;
};

sizeof(ipa_vr)  => 72 bytes
sizeof(ipa_with_value_range)=> 40 bytes

Also, won't your upcoming value_range_without_equivs work further reduce 
these 40 bytes?


Aldy


[PATCH] MAINTAINERS: add myself as or1k maintainer

2018-11-09 Thread Stafford Horne
ChangeLog:

-mm-dd  Stafford Horne  

* MAINTAINERS (CPU Port Maintainers): Add myself for or1k.
---
 MAINTAINERS | 1 +
 1 file changed, 1 insertion(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index ba8c8040967..cc7d4bddee8 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -90,6 +90,7 @@ nds32 port Shiva Chen

 nios2 port Chung-Lin Tang  
 nios2 port Sandra Loosemore
 nvptx port Tom de Vries
+or1k port  Stafford Horne  
 pdp11 port Paul Koning 
 powerpcspe port Andrew Jenner

 riscv port Kito Cheng  
-- 
2.17.2



[committed] Add testsuite coverage for ordered and doacross worksharing loops with task reductions

2018-11-09 Thread Jakub Jelinek
Hi!

Tested on x86_64-linux, committed to trunk.

2018-11-09  Jakub Jelinek  

* gcc.dg/gomp/workshare-reduction-1.c: New test.
* gcc.dg/gomp/workshare-reduction-2.c: New test.
* gcc.dg/gomp/workshare-reduction-3.c: New test.
* gcc.dg/gomp/workshare-reduction-4.c: New test.
* gcc.dg/gomp/workshare-reduction-5.c: New test.
* gcc.dg/gomp/workshare-reduction-6.c: New test.
* gcc.dg/gomp/workshare-reduction-7.c: New test.
* gcc.dg/gomp/workshare-reduction-8.c: New test.
* gcc.dg/gomp/workshare-reduction-9.c: New test.
* gcc.dg/gomp/workshare-reduction-10.c: New test.
* gcc.dg/gomp/workshare-reduction-11.c: New test.
* gcc.dg/gomp/workshare-reduction-12.c: New test.
* gcc.dg/gomp/workshare-reduction-13.c: New test.
* gcc.dg/gomp/workshare-reduction-14.c: New test.
* gcc.dg/gomp/workshare-reduction-15.c: New test.
* gcc.dg/gomp/workshare-reduction-16.c: New test.
* gcc.dg/gomp/workshare-reduction-17.c: New test.
* gcc.dg/gomp/workshare-reduction-18.c: New test.
* gcc.dg/gomp/workshare-reduction-19.c: New test.
* gcc.dg/gomp/workshare-reduction-20.c: New test.
* gcc.dg/gomp/workshare-reduction-21.c: New test.
* gcc.dg/gomp/workshare-reduction-22.c: New test.
* gcc.dg/gomp/workshare-reduction-23.c: New test.
* gcc.dg/gomp/workshare-reduction-24.c: New test.
* gcc.dg/gomp/workshare-reduction-25.c: New test.
* gcc.dg/gomp/workshare-reduction-26.c: New test.
* gcc.dg/gomp/workshare-reduction-27.c: New test.
* gcc.dg/gomp/workshare-reduction-28.c: New test.
* gcc.dg/gomp/workshare-reduction-29.c: New test.
* gcc.dg/gomp/workshare-reduction-30.c: New test.
* gcc.dg/gomp/workshare-reduction-31.c: New test.
* gcc.dg/gomp/workshare-reduction-32.c: New test.
* gcc.dg/gomp/workshare-reduction-33.c: New test.
* gcc.dg/gomp/workshare-reduction-34.c: New test.
* gcc.dg/gomp/workshare-reduction-35.c: New test.
* gcc.dg/gomp/workshare-reduction-36.c: New test.
* gcc.dg/gomp/workshare-reduction-37.c: New test.
* gcc.dg/gomp/workshare-reduction-38.c: New test.
* gcc.dg/gomp/workshare-reduction-39.c: New test.
* gcc.dg/gomp/workshare-reduction-40.c: New test.
* gcc.dg/gomp/workshare-reduction-41.c: New test.
* gcc.dg/gomp/workshare-reduction-42.c: New test.
* gcc.dg/gomp/workshare-reduction-43.c: New test.
* gcc.dg/gomp/workshare-reduction-44.c: New test.
* gcc.dg/gomp/workshare-reduction-45.c: New test.
* gcc.dg/gomp/workshare-reduction-46.c: New test.
* gcc.dg/gomp/workshare-reduction-47.c: New test.
* gcc.dg/gomp/workshare-reduction-48.c: New test.
* gcc.dg/gomp/workshare-reduction-49.c: New test.
* gcc.dg/gomp/workshare-reduction-50.c: New test.
* gcc.dg/gomp/workshare-reduction-51.c: New test.
* gcc.dg/gomp/workshare-reduction-52.c: New test.
* gcc.dg/gomp/workshare-reduction-53.c: New test.
* gcc.dg/gomp/workshare-reduction-54.c: New test.
* gcc.dg/gomp/workshare-reduction-55.c: New test.
* gcc.dg/gomp/workshare-reduction-56.c: New test.
* gcc.dg/gomp/workshare-reduction-57.c: New test.
* gcc.dg/gomp/workshare-reduction-58.c: New test.
libgomp/
* testsuite/libgomp.c-c++-common/task-reduction-13.c: New test.
* testsuite/libgomp.c-c++-common/task-reduction-14.c: New test.

--- gcc/testsuite/gcc.dg/gomp/workshare-reduction-1.c.jj 2018-11-09 13:53:25.626769600 +0100
+++ gcc/testsuite/gcc.dg/gomp/workshare-reduction-1.c 2018-11-09 13:39:37.933450741 +0100
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fopenmp -fdump-tree-optimized" } */
/* { dg-final { scan-tree-dump-times "__builtin_GOMP_loop_start \[^\n\r]*, 0, 0, " 1 "optimized" } } */
/* { dg-final { scan-tree-dump-times "__builtin_GOMP_loop_end " 1 "optimized" } } */
/* { dg-final { scan-tree-dump-times "__builtin_GOMP_loop_maybe_nonmonotonic_runtime_next " 1 "optimized" } } */
/* { dg-final { scan-tree-dump-times "__builtin_GOMP_workshare_task_reduction_unregister \\(0\\)" 1 "optimized" } } */
/* { dg-final { scan-tree-dump-times "__builtin_GOMP_parallel " 1 "optimized" } } */
+
+int j;
+void bar (int *);
+
+void
+foo (int a, int b, int c)
+{
+  int i;
+  #pragma omp parallel
+  #pragma omp for reduction (task, *: j) schedule (runtime)
+  for (i = a; i < b; i += c)
+{
+  j++;
+  bar (&j);
+}
+}
--- gcc/testsuite/gcc.dg/gomp/workshare-reduction-2.c.jj 2018-11-09 13:53:25.648769236 +0100
+++ gcc/testsuite/gcc.dg/gomp/workshare-reduction-2.c 2018-11-09 13:39:37.936450691 +0100
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fopenmp -fdump-tree-optimized" } */
+/* { dg-final { scan-tree-dump-times "__builtin_GOMP_loop_start \

[committed] json.cc: fix comment

2018-11-09 Thread David Malcolm
I've committed this obvious fix to trunk as r265968 (a copy&paste error).

gcc/ChangeLog:
* json.cc (selftest::test_writing_literals): Fix comment.
---
 gcc/json.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/json.cc b/gcc/json.cc
index a0c43956..46b6ef6 100644
--- a/gcc/json.cc
+++ b/gcc/json.cc
@@ -288,7 +288,7 @@ test_writing_strings ()
   assert_print_eq (contains_quotes, "\"before \\\"quoted\\\" after\"");
 }
 
-/* Verify that JSON strings are written correctly.  */
+/* Verify that JSON literals are written correctly.  */
 
 static void
 test_writing_literals ()
-- 
1.8.5.3



Re: [PATCH] Come up with htab_hash_string_vptr and use string-specific if possible.

2018-11-09 Thread Michael Matz
Hi,

On Thu, 8 Nov 2018, Martin Liška wrote:

> > That seems better.  But still, why declare this in system.h?  It seems 
> > hash-table.h seems more appropriate.
> 
> I need to declare it before I'll poison it. As system.h is included very 
> early, one can guarantee that there will be no usage before the 
> poisoning happens.

Yes but it's also included everywhere, so adding anything to it comes at a 
cost, and conceptually it simply doesn't belong there.

There's no fundamental reason why we can't poison identifiers in other 
headers.  Indeed we do in vec.h.  So move the whole thing including 
poisoning to hash-table.h?


Ciao,
Michael.

Re: [PATCH, arm] Backport -- Fix ICE during thunk generation with -mlong-calls

2018-11-09 Thread Mihail Ionescu

Hi Sudi

On 11/08/2018 06:53 PM, Sudakshina Das wrote:

Hi Mihail

On 08/11/18 10:02, Ramana Radhakrishnan wrote:

On 07/11/2018 17:49, Mihail Ionescu wrote:

Hi All,

This is a backport from trunk for GCC 8 and 7.

SVN revision: r264595.

Regression tested on arm-none-eabi.


gcc/ChangeLog

2018-11-02  Mihail Ionescu  

Backport from mainline
2018-09-26  Eric Botcazou  

* config/arm/arm.c (arm_reorg): Skip Thumb reorg pass for thunks.
(arm32_output_mi_thunk): Deal with long calls.

gcc/testsuite/ChangeLog

2018-11-02  Mihail Ionescu  

Backport from mainline
2018-09-17  Eric Botcazou  

* g++.dg/other/thunk2a.C: New test.
* g++.dg/other/thunk2b.C: Likewise.


If everything is ok, could someone commit it on my behalf?

Best regards,
   Mihail



It is a regression since my rewrite of this code.

Ok to backport to the release branches, it's been on trunk for a while
and not shown any issues - please give the release managers a day or so
to object.

regards
Ramana



Does this fix PR87867 you reported? If yes, then it would be easier
to add the PR tag in the ChangeLog so that the ticket gets updated once
committed.

Thanks
Sudi



I've seen the patch has been committed (Ramana added the PR tag), but I
now know it has to be added and I will do it in the future.

Regards
Mihail


Re: [PATCH v3] S/390: Allow relative addressing of literal pool entries

2018-11-09 Thread Ulrich Weigand
Ilya Leoshkevich wrote:

> +  /* Return unannotated constant pool references, so that the corresponding
> + entries are added to the back-end-managed pool.  Not doing so would result
> + in those entries being placed in the middle-end-managed pool, which would
> + in turn prevent merging them with identical ones in the back-end-managed
> + pool.  */

I'm still not convinced this is necessary; how frequently does the case occur
that a constant is duplicated, and is it worth the extra code complexity?
(Also, note that the middle-end pool is shared across the whole translation
unit, while the back-end pool is duplicated in every function ... so if the
same constant is used by many functions, forcing it into the back-end pool
may *increase* overall size.)

In any case, does this even happen with your current code?  You find the
unannotated reference here, and therefore copy the constant into the local
pool, but then replace_constant_pool_ref will still ignore the insn, and
thus the insn will actually continue to refer to the middle-end pool
constant, *not* the copy created in the local pool.  Right?

>if (DISP_IN_RANGE (INTVAL (frame_off)))
>   {
> -   insn = gen_rtx_SET (frame_pointer,
> -   gen_rtx_PLUS (Pmode, frame_pointer, frame_off));
> -   insn = emit_insn (insn);
> +   insn = emit_insn (copy_rtx (cfa));
>   }

This seems unrelated?

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



[PATCH][ARM] Update max_cond_insns settings

2018-11-09 Thread Wilco Dijkstra
The existing setting of max_cond_insns for most cores is non-optimal.
Thumb-2 IT has a maximum limit of 4, so 5 means emitting 2 IT sequences.
Such long sequences of conditional instructions can also increase the
number of executed instructions significantly, so 5 is a poor value for
max_cond_insns.

Previous benchmarking showed that setting max_cond_insn to 2 was the best value
for Cortex-A15 and Cortex-A57.  All ARMv8-A cores use 2 - apart from Cortex-A35
and Cortex-A53.  Given that using 5 is worse, set it to 2.  This also has the
advantage of producing more uniform code.

Bootstrap and regress OK on arm-none-linux-gnueabihf.

OK for stage 1?

ChangeLog:
2017-04-12  Wilco Dijkstra  

    * gcc/config/arm/arm.c (arm_cortex_a53_tune): Set max_cond_insns to 2.
    (arm_cortex_a35_tune): Likewise.
---

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 29e8d1d07d918fbb2a627a653510dfc8587ee01a..1a6d552aa322114795acbb3667c6ea36963bf193 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -1967,7 +1967,7 @@ const struct tune_params arm_cortex_a35_tune =
   arm_default_branch_cost,
   &arm_default_vec_cost,
   1,   /* Constant limit.  */
-  5,   /* Max cond insns.  */
+  2,   /* Max cond insns.  */
   8,   /* Memset max inline.  */
   1,   /* Issue rate.  */
   ARM_PREFETCH_NOT_BENEFICIAL,
@@ -1990,7 +1990,7 @@ const struct tune_params arm_cortex_a53_tune =
   arm_default_branch_cost,
   &arm_default_vec_cost,
   1,   /* Constant limit.  */
-  5,   /* Max cond insns.  */
+  2,   /* Max cond insns.  */
   8,   /* Memset max inline.  */
   2,   /* Issue rate.  */
   ARM_PREFETCH_NOT_BENEFICIAL,


[PATCH][AArch64] PR79262: Adjust vector cost

2018-11-09 Thread Wilco Dijkstra
PR79262 has been fixed for almost all AArch64 CPUs, however the example
is still vectorized in a few cases, resulting in lower performance.
Increase the cost of vector-to-scalar moves so it is more similar to the
other vector costs.  As a result -mcpu=cortex-a53 no longer vectorizes
the testcase - libquantum and SPECv6 performance improves.

OK for commit?

ChangeLog:
2018-01-22  Wilco Dijkstra  

    PR target/79262
    * config/aarch64/aarch64.c (generic_vector_cost): Adjust 
vec_to_scalar_cost.
--

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index c6a83c881038873d8b68e36f906783be63ddde56..43f5b7162152ca92a916f4febee01f624c375202 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -403,7 +403,7 @@ static const struct cpu_vector_cost generic_vector_cost =
   1, /* vec_int_stmt_cost  */
   1, /* vec_fp_stmt_cost  */
   2, /* vec_permute_cost  */
-  1, /* vec_to_scalar_cost  */
+  2, /* vec_to_scalar_cost  */
   1, /* scalar_to_vec_cost  */
   1, /* vec_align_load_cost  */
   1, /* vec_unalign_load_cost  */


[PATCH][AArch64] Set SLOW_BYTE_ACCESS

2018-11-09 Thread Wilco Dijkstra
Contrary to all documentation, SLOW_BYTE_ACCESS simply means accessing
bitfields by their declared type, which results in better code
generation on practically any target.

I'm thinking we should completely remove all trace of SLOW_BYTE_ACCESS
from GCC as it's confusing and useless.

OK for commit until we get rid of it?

ChangeLog:
2017-11-17  Wilco Dijkstra  

    gcc/
    * config/aarch64/aarch64.h (SLOW_BYTE_ACCESS): Set to 1.
--
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index 056110afb228fb919e837c04aa5e5552a4868ec3..d8f4d129a02fb89eb00d256aba8c4764d6026078 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -769,14 +769,9 @@ typedef struct
    if given data not on the nominal alignment.  */
 #define STRICT_ALIGNMENT    TARGET_STRICT_ALIGN
 
-/* Define this macro to be non-zero if accessing less than a word of
-   memory is no faster than accessing a word of memory, i.e., if such
-   accesses require more than one instruction or if there is no
-   difference in cost.
-   Although there's no difference in instruction count or cycles,
-   in AArch64 we don't want to expand to a sub-word to a 64-bit access
-   if we don't have to, for power-saving reasons.  */
-#define SLOW_BYTE_ACCESS   0
+/* Contrary to all documentation, this enables wide bitfield accesses,
+   which results in better code when accessing multiple bitfields.  */
+#define SLOW_BYTE_ACCESS   1
 
 #define NO_FUNCTION_CSE 1



[PATCH] Add value_range_base (w/o equivs)

2018-11-09 Thread Richard Biener


This adds value_range_base, a base class of class value_range
with all members but the m_equiv one.

I have looked into the sole GC user, IPA propagation, and replaced
the value_range use there with value_range_base since it also
asserted the equiv member is always NULL.

This in turn means I have written down that GC users can only use
value_range_base (and fixed the accessibility issue by adding a bunch
of friends).

I have moved all methods that are required for this single user so far
(without checking whether others are trivially movable because they do
not look at equivs - that's for a followup).  I ended up basically
duplicating the ::union_ wrapper around union_ranges (boo).  Some more
refactoring might save us a few lines here.

There are two places where IPA prop calls extract_range_from_unary_expr.
I've temporarily made it use value_range temporaries there given
I need to think about the few cases of ->deep_copy () we have in
there.  IMHO equiv handling would need to move to the callers or
rather "copies" shouldn't be handled by extract_range_from_unary_expr.
Well, I need to think about it.  (no, I don't want to make that
function virtual)

The other workers might be even easier to massage to work on 
value_range_base only.

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

If that goes well I want to apply this even in its incomplete form.
Unless there are any complaints?

Thanks,
Richard.

From 525e2e74bbb666399fafcffe8f17e928f45ca898 Mon Sep 17 00:00:00 2001
From: Richard Guenther 
Date: Fri, 9 Nov 2018 15:00:50 +0100
Subject: [PATCH] add-value_range_base

* tree-vrp.h (class value_range_base): New base class for
value_range containing all but the m_equiv member.
(dump_value_range_base): Add.
(range_includes_zero_p): Work on value_range_base.
* tree-vrp.c (value_range_base::set): Split out base handling
from...
(value_range::set): this.
(value_range::set_equiv): New.
(value_range_base::value_range_base): New constructors.
(value_range_base::check): Split out base handling from...
(value_range::check): this.
(value_range::equal_p): Refactor in terms of
ignore_equivs_equal_p which is now member of the base.
(value_range_base::set_undefined): New.
(value_range_base::set_varying): Likewise.
(value_range_base::dump):Split out base handling from...
(value_range::dump): this.
(value_range_base::set_and_canonicalize): Split out base handling
from...
(value_range::set_and_canonicalize): this.
(value_range_base::union_): New.

* ipa-prop.h (struct ipa_jump_func): Use value_range_base *
for m_vr.
* ipa-cp.c (class ipcp_vr_lattice): Use value_range_base
instead of value_range everywhere.
(ipcp_vr_lattice::print): Use dump_value_range_base.
(ipcp_vr_lattice::meet_with): Adjust.
(ipcp_vr_lattice::meet_with_1): Likewise.
(ipa_vr_operation_and_type_effects): Likewise.
(propagate_vr_across_jump_function): Likewise.
* ipa-prop.c (struct ipa_vr_ggc_hash_traits): Likewise.
(ipa_get_value_range): Likewise.
(ipa_set_jfunc_vr): Likewise.
(ipa_compute_jump_functions_for_edge): Likewise.

diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c
index 4471bae11c7..4f147eb37cc 100644
--- a/gcc/ipa-cp.c
+++ b/gcc/ipa-cp.c
@@ -307,18 +307,18 @@ private:
 class ipcp_vr_lattice
 {
 public:
-  value_range m_vr;
+  value_range_base m_vr;
 
   inline bool bottom_p () const;
   inline bool top_p () const;
   inline bool set_to_bottom ();
-  bool meet_with (const value_range *p_vr);
+  bool meet_with (const value_range_base *p_vr);
   bool meet_with (const ipcp_vr_lattice &other);
   void init () { gcc_assert (m_vr.undefined_p ()); }
   void print (FILE * f);
 
 private:
-  bool meet_with_1 (const value_range *other_vr);
+  bool meet_with_1 (const value_range_base *other_vr);
 };
 
 /* Structure containing lattices for a parameter itself and for pieces of
@@ -522,7 +522,7 @@ ipcp_bits_lattice::print (FILE *f)
 void
 ipcp_vr_lattice::print (FILE * f)
 {
-  dump_value_range (f, &m_vr);
+  dump_value_range_base (f, &m_vr);
 }
 
 /* Print all ipcp_lattices of all functions to F.  */
@@ -909,7 +909,7 @@ ipcp_vr_lattice::meet_with (const ipcp_vr_lattice &other)
lattice.  */
 
 bool
-ipcp_vr_lattice::meet_with (const value_range *p_vr)
+ipcp_vr_lattice::meet_with (const value_range_base *p_vr)
 {
   return meet_with_1 (p_vr);
 }
@@ -918,7 +918,7 @@ ipcp_vr_lattice::meet_with (const value_range *p_vr)
OTHER_VR lattice.  Return TRUE if anything changed.  */
 
 bool
-ipcp_vr_lattice::meet_with_1 (const value_range *other_vr)
+ipcp_vr_lattice::meet_with_1 (const value_range_base *other_vr)
 {
   if (bottom_p ())
 return false;
@@ -926,7 +926,7 @@ ipcp_vr_lattice::meet_with_1 (const value_range *other_vr)
   if (other_vr->varying_p ())
 return set_to_bottom ();
 

Ping: [PATCH, testsuite] ignore some "conflicting types for built-in" messages

2018-11-09 Thread Paul Koning
Ping.

I'd like to commit this.  The discussion seems to have ended up with the 
conclusion that this is a reasonable approach.

paul


> On Nov 1, 2018, at 3:13 PM, Paul Koning  wrote:
> 
> A number of test cases contain declarations like:
>  void *memcpy();
> which currently are silently accepted on most platforms but not on all; pdp11 
> (and possibly some others) generate a "conflicting types for built-in 
> function" warning.
> 
> It was suggested to prune those messages because the test cases where these 
> occur are not looking for the message but are testing some other issue, so 
> the message is not relevant.  The attached patch adds dg-prune-output 
> directives to do so.
> 
> Ok for trunk?
> 
>   paul
> 
> ChangeLog:
> 
> 2018-11-01  Paul Koning  
> 
>   * gcc.dg/Walloca-16.c: Ignore conflicting types for built-in
>   warnings.
>   * gcc.dg/Wrestrict-4.c: Ditto.
>   * gcc.dg/Wrestrict-5.c: Ditto.
>   * gcc.dg/pr83463.c: Ditto.
>   * gcc.dg/torture/pr55890-2.c: Ditto.
>   * gcc.dg/torture/pr55890-3.c: Ditto.
>   * gcc.dg/torture/pr71816.c: Ditto.
> 
> Index: gcc.dg/Walloca-16.c
> ===
> --- gcc.dg/Walloca-16.c   (revision 265727)
> +++ gcc.dg/Walloca-16.c   (working copy)
> @@ -1,5 +1,6 @@
> /* PR tree-optimization/84224 */
> /* { dg-do compile } */
> +/* { dg-prune-output "conflicting types for built-in" } */
> /* { dg-options "-O0 -Walloca" } */
> 
> void *alloca ();
> Index: gcc.dg/Wrestrict-4.c
> ===
> --- gcc.dg/Wrestrict-4.c  (revision 265727)
> +++ gcc.dg/Wrestrict-4.c  (working copy)
> @@ -3,6 +3,7 @@
>Test to verify that invalid calls to built-in functions declared
>without a prototype don't cause an ICE.
>{ dg-do compile }
> +   { dg-prune-output "conflicting types for built-in" }
>{ dg-options "-O2 -Warray-bounds -Wrestrict" } */
> 
> void* memcpy ();
> Index: gcc.dg/Wrestrict-5.c
> ===
> --- gcc.dg/Wrestrict-5.c  (revision 265727)
> +++ gcc.dg/Wrestrict-5.c  (working copy)
> @@ -2,6 +2,7 @@
>functions declared with no prototype are checked for overlap, and that
>invalid calls are ignored.
>   { dg-do compile }
> +  { dg-prune-output "conflicting types for built-in" }
>   { dg-options "-O2 -Wrestrict" }  */
> 
> typedef __SIZE_TYPE__ size_t;
> Index: gcc.dg/pr83463.c
> ===
> --- gcc.dg/pr83463.c  (revision 265727)
> +++ gcc.dg/pr83463.c  (working copy)
> @@ -1,5 +1,6 @@
> /* PR middle-end/83463 */
> /* { dg-do compile } */
> +/* { dg-prune-output "conflicting types for built-in" } */
> /* { dg-options "-O2 -Wrestrict -Wno-pointer-to-int-cast" } */
> 
> int *a;
> Index: gcc.dg/torture/pr55890-2.c
> ===
> --- gcc.dg/torture/pr55890-2.c(revision 265727)
> +++ gcc.dg/torture/pr55890-2.c(working copy)
> @@ -1,4 +1,5 @@
> /* { dg-do compile } */
> +/* { dg-prune-output "conflicting types for built-in" } */
> 
> extern void *memcpy();
> int main() { memcpy(); }
> Index: gcc.dg/torture/pr55890-3.c
> ===
> --- gcc.dg/torture/pr55890-3.c(revision 265727)
> +++ gcc.dg/torture/pr55890-3.c(working copy)
> @@ -1,4 +1,5 @@
> /* { dg-do compile } */
> +/* { dg-prune-output "conflicting types for built-in" } */
> 
> void *memmove ();
> 
> Index: gcc.dg/torture/pr71816.c
> ===
> --- gcc.dg/torture/pr71816.c  (revision 265727)
> +++ gcc.dg/torture/pr71816.c  (working copy)
> @@ -1,4 +1,5 @@
> /* { dg-do compile } */
> +/* { dg-prune-output "conflicting types for built-in" } */
> 
> void *ext2fs_resize_mem_p;
> struct ext2_icount_el {
> 



[PATCH] Fix up affinity-fmt.c (was Re: [committed 0/4] (Partial) OpenMP 5.0 support for GCC 9)

2018-11-09 Thread Jakub Jelinek
Hi!

The earlier patch doesn't work, because there were still expressions
where handle could still be cast to an integer of a different size if it
happened to be a pointer, or an invalid cast of e.g. an aggregate to
an integer.

The following patch, so far only tested on simplified code, should
handle it properly though: gomp_integral should yield an integral type
without a warning (0 for aggregates etc., but that is what I wanted to
print; if pthread_t is an aggregate I have no idea what it contains, so
I don't know what else to print).  Plus I've added some portability
stuff for mingw %llx vs. %I64x.

Can you please give it a whirl on Darwin and see what the
display-affinity-1.c testcase prints (in libgomp/testsuite/libgomp.log) ?

Thanks.

2018-11-09  Jakub Jelinek  

* affinity-fmt.c: Include inttypes.h if HAVE_INTTYPES_H.
(gomp_display_affinity): Use __builtin_choose_expr to properly
handle arguments having integral, pointer or some other
type.  If inttypes.h is available and PRIx64 is defined, use PRIx64
with uint64_t type instead of %llx and unsigned long long.

--- libgomp/affinity-fmt.c.jj   2018-11-08 18:08:01.412987460 +0100
+++ libgomp/affinity-fmt.c  2018-11-09 15:24:52.049169494 +0100
@@ -30,6 +30,9 @@
 #ifdef HAVE_UNISTD_H
 #include <unistd.h>
 #endif
+#ifdef HAVE_INTTYPES_H
+# include <inttypes.h>  /* For PRIx64.  */
+#endif
 #ifdef HAVE_UNAME
 #include <sys/utsname.h>
 #endif
@@ -356,37 +359,42 @@ gomp_display_affinity (char *buffer, siz
  goto do_int;
case 'i':
 #if defined(LIBGOMP_USE_PTHREADS) && defined(__GNUC__)
- /* Handle integral pthread_t.  */
- if (__builtin_classify_type (handle) == 1)
-   {
- char buf[3 * (sizeof (handle) + sizeof (int)) + 4];
-
- if (sizeof (handle) == sizeof (long))
-   sprintf (buf, "0x%lx", (long) handle);
- else if (sizeof (handle) == sizeof (long long))
-   sprintf (buf, "0x%llx", (long long) handle);
- else
-   sprintf (buf, "0x%x", (int) handle);
- gomp_display_num (buffer, size, &ret, zero, right, sz, buf);
- break;
-   }
- /* And pointer pthread_t.  */
- else if (__builtin_classify_type (handle) == 5)
-   {
- char buf[3 * (sizeof (uintptr_t) + sizeof (int)) + 4];
-
- if (sizeof (uintptr_t) == sizeof (long))
-   sprintf (buf, "0x%lx", (long) (uintptr_t) handle);
- else if (sizeof (uintptr_t) == sizeof (long long))
-   sprintf (buf, "0x%llx", (long long) (uintptr_t) handle);
- else
-   sprintf (buf, "0x%x", (int) (uintptr_t) handle);
- gomp_display_num (buffer, size, &ret, zero, right, sz, buf);
- break;
-   }
+ {
+   char buf[3 * (sizeof (handle) + sizeof (uintptr_t) + sizeof (int))
++ 4];
+   /* This macro returns expr unmodified for integral or pointer
+  types and 0 for anything else (e.g. aggregates).  */
+#define gomp_nonaggregate(expr) \
+  __builtin_choose_expr (__builtin_classify_type (expr) == 1   \
+|| __builtin_classify_type (expr) == 5, expr, 0)
+   /* This macro returns expr unmodified for integral types,
+  (uintptr_t) (expr) for pointer types and 0 for anything else
+  (e.g. aggregates).  */
+#define gomp_integral(expr) \
+  __builtin_choose_expr (__builtin_classify_type (expr) == 5,  \
+(uintptr_t) gomp_nonaggregate (expr),  \
+gomp_nonaggregate (expr))
+
+   if (sizeof (gomp_integral (handle)) == sizeof (unsigned long))
+ sprintf (buf, "0x%lx", (unsigned long) gomp_integral (handle));
+#if defined (HAVE_INTTYPES_H) && defined (PRIx64)
+   else if (sizeof (gomp_integral (handle)) == sizeof (uint64_t))
+ sprintf (buf, "0x%" PRIx64, (uint64_t) gomp_integral (handle));
+#else
+   else if (sizeof (gomp_integral (handle))
+== sizeof (unsigned long long))
+ sprintf (buf, "0x%llx",
+  (unsigned long long) gomp_integral (handle));
 #endif
+   else
+ sprintf (buf, "0x%x", (unsigned int) gomp_integral (handle));
+   gomp_display_num (buffer, size, &ret, zero, right, sz, buf);
+   break;
+ }
+#else
  val = 0;
  goto do_int;
+#endif
case 'A':
  if (sz == (size_t) -1)
gomp_display_affinity_place (buffer, size, &ret,


Jakub


[PATCH][AArch64] Fix symbol offset limit

2018-11-09 Thread Wilco Dijkstra
In aarch64_classify_symbol symbols are allowed full-range offsets on
relocations.  This means the offset can use all of the +/-4GB offset,
leaving no offset available for the symbol itself.  This results in
relocation overflow and link-time errors for simple expressions like
&global_char + 0xffffff00.

To avoid this, limit the offset to +/-1MB so that the symbol needs to be
within a 3.9GB offset from its references.  For the tiny code model use a
64KB offset, allowing most of the 1MB range for code/data between the
symbol and its references.

Bootstrapped on AArch64, passes regress, OK for commit?

ChangeLog:
2018-11-09  Wilco Dijkstra  

gcc/
* config/aarch64/aarch64.c (aarch64_classify_symbol):
Apply reasonable limit to symbol offsets.

testsuite/
* gcc.target/aarch64/symbol-range.c (foo): Set new limit.
* gcc.target/aarch64/symbol-range-tiny.c (foo): Likewise.

---

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 
b6ea6d3f14b212b69f76d21b72527b6a8ea8cb0e..be03aeea8cd9bab07a01a161c4a91283bd25c99d
 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -11794,26 +11794,26 @@ aarch64_classify_symbol (rtx x, HOST_WIDE_INT offset)
 the offset does not cause overflow of the final address.  But
 we have no way of knowing the address of symbol at compile time
 so we can't accurately say if the distance between the PC and
-symbol + offset is outside the addressible range of +/-1M in the
-TINY code model.  So we rely on images not being greater than
-1M and cap the offset at 1M and anything beyond 1M will have to
-be loaded using an alternative mechanism.  Furthermore if the
-symbol is a weak reference to something that isn't known to
-resolve to a symbol in this module, then force to memory.  */
+symbol + offset is outside the addressible range of +/-1MB in the
+TINY code model.  So we limit the maximum offset to +/-64KB and
+assume the offset to the symbol is not larger than +/-(1MB - 64KB).
+Furthermore force to memory if the symbol is a weak reference to
+something that doesn't resolve to a symbol in this module.  */
  if ((SYMBOL_REF_WEAK (x)
   && !aarch64_symbol_binds_local_p (x))
- || !IN_RANGE (offset, -1048575, 1048575))
+ || !IN_RANGE (offset, -0x10000, 0x10000))
return SYMBOL_FORCE_TO_MEM;
+
  return SYMBOL_TINY_ABSOLUTE;
 
case AARCH64_CMODEL_SMALL:
  /* Same reasoning as the tiny code model, but the offset cap here is
-4G.  */
+1MB, allowing +/-3.9GB for the offset to the symbol.  */
  if ((SYMBOL_REF_WEAK (x)
   && !aarch64_symbol_binds_local_p (x))
- || !IN_RANGE (offset, HOST_WIDE_INT_C (-4294967263),
-   HOST_WIDE_INT_C (4294967264)))
+ || !IN_RANGE (offset, -0x100000, 0x100000))
return SYMBOL_FORCE_TO_MEM;
+
  return SYMBOL_SMALL_ABSOLUTE;
 
case AARCH64_CMODEL_TINY_PIC:
diff --git a/gcc/testsuite/gcc.target/aarch64/symbol-range-tiny.c 
b/gcc/testsuite/gcc.target/aarch64/symbol-range-tiny.c
index 
d7e46b059e41f2672b3a1da5506fa8944e752e01..d49ff4dbe5786ef6d343d2b90052c09676dd7fe5
 100644
--- a/gcc/testsuite/gcc.target/aarch64/symbol-range-tiny.c
+++ b/gcc/testsuite/gcc.target/aarch64/symbol-range-tiny.c
@@ -1,12 +1,12 @@
-/* { dg-do compile } */
+/* { dg-do link } */
 /* { dg-options "-O3 -save-temps -mcmodel=tiny" } */
 
-int fixed_regs[0x00200000];
+char fixed_regs[0x00200000];
 
 int
-foo()
+main ()
 {
-  return fixed_regs[0x00080000];
+  return fixed_regs[0x000ff000];
 }
 
 /* { dg-final { scan-assembler-not "adr\tx\[0-9\]+, fixed_regs\\\+" } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/symbol-range.c 
b/gcc/testsuite/gcc.target/aarch64/symbol-range.c
index 
6574cf4310430b847e77ea56bf8f20ef312d53e4..75c87c12f08004c153efc5192e5cfab566c089db
 100644
--- a/gcc/testsuite/gcc.target/aarch64/symbol-range.c
+++ b/gcc/testsuite/gcc.target/aarch64/symbol-range.c
@@ -1,12 +1,12 @@
-/* { dg-do compile } */
+/* { dg-do link } */
 /* { dg-options "-O3 -save-temps -mcmodel=small" } */
 
-int fixed_regs[0x200000000ULL];
+char fixed_regs[0x200000000ULL];
 
 int
-foo()
+main ()
 {
-  return fixed_regs[0x100000000ULL];
+  return fixed_regs[0xfffff000ULL];
 }
 
 /* { dg-final { scan-assembler-not "adrp\tx\[0-9\]+, fixed_regs\\\+" } } */


Re: [committed 0/4] (Partial) OpenMP 5.0 support for GCC 9

2018-11-09 Thread David Malcolm
On Fri, 2018-11-09 at 12:11 +0100, Jakub Jelinek wrote:
> On Fri, Nov 09, 2018 at 12:04:54PM +0100, Jakub Jelinek wrote:
> > On Fri, Nov 09, 2018 at 11:34:48AM +0100, Dominique d'Humières
> > wrote:
> > > Bootstrapping r265942 on darwin failed with
> > > 
> > > In file included from /Applications/Xcode-
> > > 6.2.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SD
> > > Ks/MacOSX10.9.sdk/usr/include/stdio.h:490,
> > >  from ../../../work/libgomp/affinity-fmt.c:28:
> > > ../../../work/libgomp/affinity-fmt.c: In function
> > > 'gomp_display_affinity':
> > > ../../../work/libgomp/affinity-fmt.c:369:17: error: format '%x'
> > > expects argument of type 'unsigned int', but argument 5 has type
> > > 'long unsigned int' [-Werror=format=]
> > >   369 |   sprintf (buf, "0x%x", (uintptr_t) handle);
> > >   | ^~  ~~
> > 
> > Weird, the above prints a line which just isn't there, line 369 is:
> > sprintf (buf, "0x%x", (int) handle);
> > not
> > sprintf (buf, "0x%x", (uintptr_t) handle);
> > 
> > What is pthread_t on your platform?  An integral type (which one),
> > pointer
> > or something different (e.g. an aggregate)?
> 
> The "but argument 5" in there is also weird, sprintf has 3 arguments.
> Doesn't current gcc emit warnings like:
> /tmp/j.c: In function ‘main’:
> /tmp/j.c:8:21: warning: format ‘%x’ expects argument of type
> ‘unsigned int’, but argument 3 has type ‘long unsigned int’ [-
> Wformat=]
> 8 |   sprintf (buf, "0x%x", l);
>   |~^   ~
>   | |   |
>   | |   long unsigned int
>   | unsigned int
>   |%lx
> ?
> So, what exact compiler is compiling this code?

I agree that that looks weird.  I wonder if this is showing a bug in
the substring-locations.c or c-format.c code?  (I've changed these in
the last couple of months; maybe I messed up?)  Or is the compiler
picking up the wrong code?

The fact that it fails to use the location *within* the string in the
initial error but then prints a supporting "note" with it suggests to
me that there are macros involved.  Is sprintf a macro on this
configuration?  Maybe that's got something to do with it.

Dave


Re: [PATCH][AArch64] PR79262: Adjust vector cost

2018-11-09 Thread James Greenhalgh
On Fri, Nov 09, 2018 at 08:14:27AM -0600, Wilco Dijkstra wrote:
> PR79262 has been fixed for almost all AArch64 cpus, however the example is 
> still
> vectorized in a few cases, resulting in lower performance.  Increase the cost 
> of
> vector-to-scalar moves so it is more similar to the other vector costs. As a 
> result
> -mcpu=cortex-a53 no longer vectorizes the testcase - libquantum and SPECv6
> performance improves.
> 
> OK for commit?

No.

We have 7 unique target tuning structures in the AArch64 backend, of which
only one has a 2x ratio between scalar_int_cost and vec_to_scalar_cost. Other
ratios are 1, 3, 8, 3, 4, 6.

What makes this choice correct? What makes it more correct than what we
have now? On which of the 28 entries in config/aarch64/aarch64-cores.def does
performance improve? Are the Spec benchmarks sufficiently representative to
change the generic vectorisation costs?

Please validate the performance effect of this patch, which changes default
code generation for everyone, on more than one testcase in a bug report.

Thanks,
James

> ChangeLog:
> 2018-01-22  Wilco Dijkstra  
> 
>     PR target/79262
>     * config/aarch64/aarch64.c (generic_vector_cost): Adjust 
> vec_to_scalar_cost.
> --


Re: [PATCH][AArch64] PR79262: Adjust vector cost

2018-11-09 Thread James Greenhalgh
On Mon, Jan 22, 2018 at 09:22:27AM -0600, Richard Biener wrote:
> On Mon, Jan 22, 2018 at 4:01 PM, Wilco Dijkstra  
> wrote:
> > PR79262 has been fixed for almost all AArch64 cpus, however the example is 
> > still
> > vectorized in a few cases, resulting in lower performance.  Increase the 
> > cost of
> > vector-to-scalar moves so it is more similar to the other vector costs. As 
> > a result
> > -mcpu=cortex-a53 no longer vectorizes the testcase - libquantum and SPECv6
> > performance improves.
> >
> > OK for commit?
> 
> It would be better to dissect this cost into vec_to_scalar and vec_extract 
> where
> vec_to_scalar really means getting at the scalar value of a vector of
> uniform values
> which most targets can do without any instruction (just use a subreg).
> 
> I suppose we could also make vec_to_scalar equal to vector extraction and 
> remove
> the uses for the other case (reduction vector result to scalar reg).

I have dug up Richard's comments from last year, which you appear to have
ignored and made no reference to when resubmitting the patch.

Please don't do that. Carefully consider Richard's review feedback before
resubmitting this patch.

To reiterate, it is not OK for trunk.

Thanks,
James

> 
> Richard.
> 
> > ChangeLog:
> > 2018-01-22  Wilco Dijkstra  
> >
> > PR target/79262
> > * config/aarch64/aarch64.c (generic_vector_cost): Adjust 
> > vec_to_scalar_cost.


[RFC] support --with-multilib-list=@/path/name

2018-11-09 Thread Alexandre Oliva
Richard,

Olivier tells me he talked to you briefly at the Cauldron about allowing
custom multilib sets to be configured from custom Makefile fragments
supplied at configure time, and that you were somewhat receptive to the
idea.  I have implemented this as follows, for ARM targets only so far,
using "@/path/name" in the argument to --with-multilib-list to mean
/path/name is to be used as a multilib-list Makefile fragment.
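By way of illustration, a custom fragment and the matching configure invocation might look like the following. The fragment contents, variable names and path are hypothetical, not taken from the patch:

```shell
# Write a hypothetical custom multilib fragment; any MULTILIB_* settings
# the multilib framework understands could go here.
cat > /tmp/t-my-multilibs <<'EOF'
MULTILIB_OPTIONS  = mfloat-abi=hard/mfloat-abi=soft
MULTILIB_DIRNAMES = hard soft
EOF

# Configure would then be invoked as:
#   .../gcc/configure --target=arm-none-eabi \
#       --with-multilib-list=@/tmp/t-my-multilibs
# The leading '@' marks the argument as a file rather than a profile
# name, and it must be an absolute path or config.gcc errors out.
echo "fragment written: /tmp/t-my-multilibs"
```

Multiple `@/path/name` entries can be mixed with the existing profile names in the comma-separated list, since each list element is classified independently in config.gcc.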

Does this look like an acceptable interface?  Is it ok as far as ARM
maintainers are concerned?

Any objections from maintainers of other targets that support
--with-multilib-list (or from anyone else) to the syntax, to the
implementation, or to extending this possibility to their ports?

for  gcc/ChangeLog

* config.gcc (tmake_file): Add /path/name to tmake_file for
each @/path/name in --with-multilib-list on arm-*-* targets.
* configure.ac: Accept full pathnames in tmake_file.
* configure: Rebuilt.
* doc/install.texi (with-multilib-list): Document it.
---
 gcc/config.gcc   |   13 +
 gcc/configure|9 ++---
 gcc/configure.ac |5 -
 gcc/doc/install.texi |   22 +++---
 4 files changed, 38 insertions(+), 11 deletions(-)

diff --git a/gcc/config.gcc b/gcc/config.gcc
index 7578ff03825e..f1363c41f989 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -3998,6 +3998,19 @@ case "${target}" in
aprofile|rmprofile)

tmake_profile_file="arm/t-multilib"
;;
+   @/*)
+   ml=`echo "X$arm_multilib" | sed '1s,^X@,,'`
+   if test -f "${ml}"; then
+   tmake_file="${tmake_file} ${ml}"
+   else
+   echo "Error: ${ml} does not exist" >&2
+   exit 1
+   fi
+   ;;
+   @*)
+   echo "Error: multilib config file must start with /" >&2
+   exit 1
+   ;;
*)
echo "Error: --with-multilib-list=${with_multilib_list} not supported." 1>&2
exit 1
diff --git a/gcc/configure b/gcc/configure
index b814484ea25b..5f15c7a1ff02 100755
--- a/gcc/configure
+++ b/gcc/configure
@@ -12244,7 +12244,10 @@ done
 tmake_file_=
 for f in ${tmake_file}
 do
-   if test -f ${srcdir}/config/$f
+   if test -n `echo "X$f" | sed -n '1s,^X/.*,/,p'` && test -f "$f"
+   then
+   tmake_file_="${tmake_file_} $f"
+   elif test -f ${srcdir}/config/$f
then
tmake_file_="${tmake_file_} \$(srcdir)/config/$f"
fi
@@ -18572,7 +18575,7 @@ else
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 18575 "configure"
+#line 18578 "configure"
 #include "confdefs.h"
 
 #if HAVE_DLFCN_H
@@ -18678,7 +18681,7 @@ else
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 18681 "configure"
+#line 18684 "configure"
 #include "confdefs.h"
 
 #if HAVE_DLFCN_H
diff --git a/gcc/configure.ac b/gcc/configure.ac
index 59585912556b..99a3e6f8f52f 100644
--- a/gcc/configure.ac
+++ b/gcc/configure.ac
@@ -1940,7 +1940,10 @@ done
 tmake_file_=
 for f in ${tmake_file}
 do
-   if test -f ${srcdir}/config/$f
+   if test -n `echo "X$f" | sed -n '1s,^X/.*,/,p'` && test -f "$f"
+   then
+   tmake_file_="${tmake_file_} $f"
+   elif test -f ${srcdir}/config/$f
then
tmake_file_="${tmake_file_} \$(srcdir)/config/$f"
fi
diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi
index be9b07b5d23b..fd19fc590ec8 100644
--- a/gcc/doc/install.texi
+++ b/gcc/doc/install.texi
@@ -1078,13 +1078,21 @@ values and meaning for each target is given below.
 
 @table @code
 @item arm*-*-*
-@var{list} is a comma separated list of @code{aprofile} and @code{rmprofile}
-to build multilibs for A or R and M architecture profiles respectively.  Note
-that, due to some limitation of the current multilib framework, using the
-combined @code{aprofile,rmprofile} multilibs selects in some cases a less
-optimal multilib than when using the multilib profile for the architecture
-targetted.  The special value @code{default} is also accepted and is equivalent
-to omitting the option, ie. only the default run-time library will be enable

Re: [PATCH 2/3] Add a pass to automatically add ptwrite instrumentation

2018-11-09 Thread Richard Biener
On Sun, Nov 4, 2018 at 7:33 AM Andi Kleen  wrote:
>
> From: Andi Kleen 
>
> Add a new pass to automatically instrument changes to variables
> with the new PTWRITE instruction on x86. PTWRITE writes a 4 or 8 byte
> field into a Processor Trace log, which allows low-overhead
> logging of information.
>
> This allows reconstructing values later, which can be useful for
> debugging or other analysis of the program behavior. With the compiler
> support this can be done without having to manually add instrumentation
> to the code.
>
> Using dwarf information this can be later mapped back to the variables.
>
> There are new options to enable instrumentation for different types,
> and also a new attribute to control analysis fine grained per
> function or variable level. The attributes can be set on both
> the variable and the type level, and also on structure fields.
> This allows enabling tracing only for specific code in large
> programs.
>
> The pass is generic, but only the x86 backend enables the necessary
> hooks. When the backend enables the necessary hooks (with -mptwrite)
> there is an additional pass that looks through the code for
> attribute vartrace enabled functions or variables.
>
> The -fvartrace-locals options is experimental: it works, but it
> generates redundant ptwrites because the pass doesn't use
> the SSA information to minimize instrumentation. This could be optimized
> later.
>
> Currently the code can be tested with SDE, or on an Intel
> Gemini Lake system with a new enough Linux kernel (v4.10+)
> that supports PTWRITE for PT. Linux perf can be used to
> record the values
>
> perf record -e intel_pt/ptw=1,branch=0/ program
> perf script --itrace=crw -F +synth ...
>
> I have an experimental version of perf that can also use
> dwarf information to symbolize many[1] values back to their variable
> names. So far it is not in standard perf, but available at
>
> https://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-misc.git/log/?h=perf/var-resolve-4
>
> It is currently not able to decode all variable locations to names,
> but a large subset.
>
> Longer term hopefully gdb will support this information too.
>
> The CPU can potentially generate very data high bandwidths when
> code doing a lot of computation is heavily instrumented.
> This can cause some data loss in both the CPU and also in perf
> logging the data when the disk cannot keep up.
>
> Running some larger workloads most workloads do not cause
> CPU level overflows, but I've seen it with -fvartrace
> with crafty, and with more workloads with -fvartrace-locals.
>
> Recommendation is to not fully instrument programs,
> but only areas of interest either at the file level or using
> the attributes.
>
> The other thing is that perf and the disk often cannot keep up
> with the data bandwidth for longer computations. In this case
> it's possible to use perf snapshot mode (add --snapshot
> to the command line above). The data will be only logged to
> a memory ring buffer then, and only dump the buffers on events
> of interest by sending SIGUSR2 to the perf binrary.
>
> In the future this will be hopefully better supported with
> core files and gdb.
>
> Passes bootstrap and test suite on x86_64-linux, also
> bootstrapped and tested gcc itself with full -fvartrace
> and -fvartrace-locals instrumentation.

So how is this supposed to be used?  I guess in an
edit-debug cycle and not for production code?

What do you actually write with PTWRITE?  I suppose
you need to keep a ID to something mapping somewhere
so you can make sense of the perf records?

Few comments inline below, but I'm not sure if this
whole thing is interesting for GCC (as opposed to being
a instrumentation plugin)

> gcc/:
>
> 2018-11-03  Andi Kleen  
>
> * Makefile.in: Add tree-vartrace.o.
> * common.opt: Add -fvartrace, -fvartrace-returns,
> -fvartrace-args, -fvartrace-reads, -fvartrace-writes,
> -fvartrace-locals
> * config/i386/i386.c (ix86_vartrace_func): Add.
> (TARGET_VARTRACE_FUNC): Add.
> * doc/extend.texi: Document vartrace/no_vartrace
> attributes.
> * doc/invoke.texi: Document -fvartrace, -fvartrace-returns,
> -fvartrace-args, -fvartrace-reads, -fvartrace-writes,
> -fvartrace-locals
> * doc/tm.texi (TARGET_VARTRACE_FUNC): Add.
> * passes.def: Add vartrace pass.
> * target.def (vartrace_func): Add.
> * tree-pass.h (make_pass_vartrace): Add.
> * tree-vartrace.c: New file to implement vartrace pass.
>
> gcc/c-family/:
>
> 2018-11-03  Andi Kleen  
>
> * c-attribs.c (handle_vartrace_attribute): New function.
>
> config/:
>
> 2018-11-03  Andi Kleen  
>
> * bootstrap-vartrace.mk: New.
> * bootstrap-vartrace-locals.mk: New.
> ---
>  config/bootstrap-vartrace-locals.mk |   3 +
>  config/bootstrap-vartrace.mk|   3 +
>  gcc/Makefile.in |   1 +
>  gcc/c-family/c-attribs.c|  23 

libgo patch committed: Change RLIM_INFINITY from 0xffffffffffffffff to -1

2018-11-09 Thread Ian Lance Taylor
This libgo patch changes RLIM_INFINITY from 0xffffffffffffffff to -1
on GNU/Linux.  0xffffffffffffffff is arguably the correct value, but
in the gc toolchain's syscall package the value is -1.  So we are
compatible.  New programs should be using the golang.org/x/sys/unix
package anyhow.  This fixes https://golang.org/issue/28665.
Bootstrapped and ran Go tests on x86_64-pc-linux-gnu.  Committed to
trunk and to 7 and 8 branches.

Ian
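The transformation the patch makes can be sketched in isolation. The `gen-sysinfo.go` fragment below is fabricated for illustration; the grep/sed extraction matches the style used in mksysinfo.sh:

```shell
# Fake gen-sysinfo.go fragment standing in for the generated file.
cat > gen-sysinfo.go <<'EOF'
const _RLIM64_INFINITY = 0xffffffffffffffff
EOF

# Extract the raw constant value the way mksysinfo.sh does ...
rliminf=`grep '^const _RLIM64_INFINITY ' gen-sysinfo.go | sed -e 's/.* //'`

# ... and map the all-ones value to -1 for gc syscall compatibility,
# passing any other value through unchanged.
if test "$rliminf" = "0xffffffffffffffff"; then
  echo "const RLIM_INFINITY = -1"
elif test -n "$rliminf"; then
  echo "const RLIM_INFINITY = $rliminf"
fi
```

On a 64-bit Linux system this prints `const RLIM_INFINITY = -1`, matching what the gc toolchain's syscall package defines.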
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 265820)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-5fcfe352ad91945a4f4d0dcfb6309df9bd072c7d
+da8c968474690d1e77442ac3361b2302ea8e1f36
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: libgo/mksysinfo.sh
===
--- libgo/mksysinfo.sh  (revision 265710)
+++ libgo/mksysinfo.sh  (working copy)
@@ -1152,10 +1152,17 @@ grep '^const _RLIMIT_' gen-sysinfo.go |
 grep '^const _RLIM_' gen-sysinfo.go |
 grep -v '^const _RLIM_INFINITY ' |
 sed -e 's/^\(const \)_\(RLIM_[^= ]*\)\(.*\)$/\1\2 = _\2/' >> ${OUT}
+rliminf=""
if test "${rlimit}" = "_rlimit64" && grep '^const _RLIM64_INFINITY ' gen-sysinfo.go > /dev/null 2>&1; then
-  echo 'const RLIM_INFINITY = _RLIM64_INFINITY' >> ${OUT}
-elif grep '^const _RLIM_INFINITY ' gen-sysinfo.go > /dev/null 2>&1; then
-  echo 'const RLIM_INFINITY = _RLIM_INFINITY' >> ${OUT}
+  rliminf=`grep '^const _RLIM64_INFINITY ' gen-sysinfo.go | sed -e 's/.* //'`
+else
+  rliminf=`grep '^const _RLIM_INFINITY ' gen-sysinfo.go | sed -e 's/.* //'`
+fi
+# For compatibility with the gc syscall package, treat 0xffffffffffffffff as -1.
+if test "$rliminf" = "0xffffffffffffffff"; then
+  echo "const RLIM_INFINITY = -1" >> ${OUT}
+elif test -n "$rliminf"; then
+  echo "const RLIM_INFINITY = $rliminf" >> ${OUT}
 fi
 
 # The sysinfo struct.


[PATCH, GCC, ARM] Enable armv8.5-a and add +sb and +predres for previous ARMv8-a in ARM

2018-11-09 Thread Sudakshina Das
Hi

This patch adds -march=armv8.5-a to the Arm backend.
(https://developer.arm.com/products/architecture/cpu-architecture/a-profile/exploration-tools)
Armv8.5-A also adds two new security features:
- Speculation Barrier instruction
- Execution and Data Prediction Restriction Instructions
These are made optional for all older Armv8-A versions. Thus we are
adding two new options "+sb" and "+predres" to all older Armv8-A. These
are passed on to the assembler and have no code generation effects and
have already gone in the trunk of binutils.

Bootstrapped and regression tested with arm-none-linux-gnueabihf.

Is this ok for trunk?
Sudi

*** gcc/ChangeLog ***

2018-xx-xx  Sudakshina Das  

* config/arm/arm-cpus.in (armv8_5, sb, predres): New features.
(ARMv8_5a): New fgroup.
(armv8.5-a): New arch.
(armv8-a, armv8.1-a, armv8.2-a, armv8.3-a, armv8.4-a): New
options sb and predres.
* config/arm/arm-tables.opt: Regenerate.
* config/arm/t-aprofile: Add matching rules for -march=armv8.5-a
* config/arm/t-arm-elf (all_v8_archs): Add armv8.5-a.
* config/arm/t-multilib (v8_5_a_simd_variants): New variable.
Add matching rules for -march=armv8.5-a and extensions.
* doc/invoke.texi (ARM options): Document -march=armv8.5-a.
Add sb and predres to all armv8-a except armv8.5-a.

*** gcc/testsuite/ChangeLog ***

2018-xx-xx  Sudakshina Das  

* gcc.target/arm/multilib.exp: Add some -march=armv8.5-a
combination tests.
diff --git a/gcc/config/arm/arm-cpus.in b/gcc/config/arm/arm-cpus.in
index d82e95a226659948e59b317f07e0fd386ed674a2..e6bcc3c720b64f4c80d9bff101e756de82d760e6 100644
--- a/gcc/config/arm/arm-cpus.in
+++ b/gcc/config/arm/arm-cpus.in
@@ -114,6 +114,9 @@ define feature armv8_3
 # Architecture rel 8.4.
 define feature armv8_4
 
+# Architecture rel 8.5.
+define feature armv8_5
+
 # M-Profile security extensions.
 define feature cmse
 
@@ -174,6 +177,14 @@ define feature quirk_cm3_ldrd
 # (Very) slow multiply operations.  Should probably be a tuning bit.
 define feature smallmul
 
+# Speculation Barrier Instruction for v8-A architectures, added by
+# default to v8.5-A
+define feature sb
+
+# Execution and Data Prediction Restriction Instruction for
+# v8-A architectures, added by default from v8.5-A
+define feature predres
+
 # Feature groups.  Conventionally all (or mostly) upper case.
 # ALL_FPU lists all the feature bits associated with the floating-point
 # unit; these will all be removed if the floating-point unit is disabled
@@ -235,6 +246,7 @@ define fgroup ARMv8_1aARMv8a crc32 armv8_1
 define fgroup ARMv8_2aARMv8_1a armv8_2
 define fgroup ARMv8_3aARMv8_2a armv8_3
 define fgroup ARMv8_4aARMv8_3a armv8_4
+define fgroup ARMv8_5aARMv8_4a armv8_5 sb predres
 define fgroup ARMv8m_base ARMv6m armv8 cmse tdiv
 define fgroup ARMv8m_main ARMv7m armv8 cmse
 define fgroup ARMv8r  ARMv8a
@@ -505,6 +517,8 @@ begin arch armv8-a
  option crypto add FP_ARMv8 CRYPTO
  option nocrypto remove ALL_CRYPTO
  option nofp remove ALL_FP
+ option sb add sb
+ option predres add predres
 end arch armv8-a
 
 begin arch armv8.1-a
@@ -517,6 +531,8 @@ begin arch armv8.1-a
  option crypto add FP_ARMv8 CRYPTO
  option nocrypto remove ALL_CRYPTO
  option nofp remove ALL_FP
+ option sb add sb
+ option predres add predres
 end arch armv8.1-a
 
 begin arch armv8.2-a
@@ -532,6 +548,8 @@ begin arch armv8.2-a
  option nocrypto remove ALL_CRYPTO
  option nofp remove ALL_FP
  option dotprod add FP_ARMv8 DOTPROD
+ option sb add sb
+ option predres add predres
 end arch armv8.2-a
 
 begin arch armv8.3-a
@@ -547,6 +565,8 @@ begin arch armv8.3-a
  option nocrypto remove ALL_CRYPTO
  option nofp remove ALL_FP
  option dotprod add FP_ARMv8 DOTPROD
+ option sb add sb
+ option predres add predres
 end arch armv8.3-a
 
 begin arch armv8.4-a
@@ -560,8 +580,23 @@ begin arch armv8.4-a
  option crypto add FP_ARMv8 CRYPTO DOTPROD
  option nocrypto remove ALL_CRYPTO
  option nofp remove ALL_FP
+ option sb add sb
+ option predres add predres
 end arch armv8.4-a
 
+begin arch armv8.5-a
+ tune for cortex-a53
+ tune flags CO_PROC
+ base 8A
+ profile A
+ isa ARMv8_5a
+ option simd add FP_ARMv8 DOTPROD
+ option fp16 add fp16 fp16fml FP_ARMv8 DOTPROD
+ option crypto add FP_ARMv8 CRYPTO DOTPROD
+ option nocrypto remove ALL_CRYPTO
+ option nofp remove ALL_FP
+end arch armv8.5-a
+
 begin arch armv8-m.base
  tune for cortex-m23
  base 8M_BASE
diff --git a/gcc/config/arm/arm-tables.opt b/gcc/config/arm/arm-tables.opt
index eacee746a39912d04aa03c636f9a95e0e72ce43b..dde6e137db5598d92df6a1e69a63140146bf7372 100644
--- a/gcc/config/arm/arm-tables.opt
+++ b/gcc/config/arm/arm-tables.opt
@@ -377,19 +377,22 @@ EnumValue
 Enum(arm_arch) String(armv8.4-a) Value(24)
 
 EnumValue
-Enum(arm_arch) String(armv8-m.base) Value(25)
+Enum(arm_arch) String(armv8.5-a) Value(25)
 
 EnumValue
-Enum(arm_arch) String(armv8-m.main) Value(26)
+Enum(arm_arch) String(armv8-m.base)

[PATCH][RFC] Come up with -flive-patching master option.

2018-11-09 Thread Martin Liška
Hi.

After I added 2 new options, I would like to include a new master option.
It's a minimal version which only disables optimizations that we are aware
can potentially cause problems for live-patching.

Martin
From dd52cd0249fc30cf6d7bf01a8826323277817b78 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Wed, 7 Nov 2018 12:41:19 +0100
Subject: [PATCH] Come up with -fvectorized-functions.

---
 gcc/fortran/decl.c| 33 +++
 gcc/fortran/gfortran.h|  2 ++
 gcc/fortran/match.h   |  1 +
 gcc/fortran/parse.c   |  3 +++
 gcc/fortran/trans-intrinsic.c | 43 +++
 5 files changed, 82 insertions(+)

diff --git a/gcc/fortran/decl.c b/gcc/fortran/decl.c
index 2b77d950abb..938c35c508f 100644
--- a/gcc/fortran/decl.c
+++ b/gcc/fortran/decl.c
@@ -98,6 +98,9 @@ bool gfc_matching_function;
 /* Set upon parsing a !GCC$ unroll n directive for use in the next loop.  */
 int directive_unroll = -1;
 
+/* List middle-end built-ins that should be vectorized.  */
+vec vectorized_builtins;
+
 /* If a kind expression of a component of a parameterized derived type is
parameterized, temporarily store the expression here.  */
 static gfc_expr *saved_kind_expr = NULL;
@@ -11243,3 +11246,33 @@ gfc_match_gcc_unroll (void)
   gfc_error ("Syntax error in !GCC$ UNROLL directive at %C");
   return MATCH_ERROR;
 }
+
+/* Match a !GCC$ builtin b attributes flags form:
+
+   The parameter b is name of a middle-end built-in.
+   Flags are one of:
+ - omp-simd-notinbranch.
+
+   When we come here, we have already matched the !GCC$ builtin string.  */
+match
+gfc_match_gcc_builtin (void)
+{
+  char builtin[GFC_MAX_SYMBOL_LEN + 1];
+
+  if (gfc_match_name (builtin) != MATCH_YES)
+return MATCH_ERROR;
+
+  gfc_gobble_whitespace ();
+  if (gfc_match ("attributes") != MATCH_YES)
+return MATCH_ERROR;
+
+  gfc_gobble_whitespace ();
+  if (gfc_match ("omp_simd_notinbranch") != MATCH_YES)
+return MATCH_ERROR;
+
+  char *r = XNEWVEC (char, strlen (builtin) + 32);
+  sprintf (r, "__builtin_%s", builtin);
+  vectorized_builtins.safe_push (r);
+
+  return MATCH_YES;
+}
diff --git a/gcc/fortran/gfortran.h b/gcc/fortran/gfortran.h
index d8ef35d9d6c..bb7f4dd0c4b 100644
--- a/gcc/fortran/gfortran.h
+++ b/gcc/fortran/gfortran.h
@@ -2763,6 +2763,7 @@ gfc_finalizer;
 bool gfc_in_match_data (void);
 match gfc_match_char_spec (gfc_typespec *);
 extern int directive_unroll;
+extern vec<char *> vectorized_builtins;
 
 /* Handling Parameterized Derived Types  */
 bool gfc_insert_kind_parameter_exprs (gfc_expr *);
@@ -3501,5 +3502,6 @@ bool gfc_is_reallocatable_lhs (gfc_expr *);
 /* trans-decl.c */
 
 void finish_oacc_declare (gfc_namespace *, gfc_symbol *, bool);
+void gfc_adjust_builtins (void);
 
 #endif /* GCC_GFORTRAN_H  */
diff --git a/gcc/fortran/match.h b/gcc/fortran/match.h
index 418542bd5a6..f25ed860c06 100644
--- a/gcc/fortran/match.h
+++ b/gcc/fortran/match.h
@@ -247,6 +247,7 @@ match gfc_match_dimension (void);
 match gfc_match_external (void);
 match gfc_match_gcc_attributes (void);
 match gfc_match_gcc_unroll (void);
+match gfc_match_gcc_builtin (void);
 match gfc_match_import (void);
 match gfc_match_intent (void);
 match gfc_match_intrinsic (void);
diff --git a/gcc/fortran/parse.c b/gcc/fortran/parse.c
index 13cc6f5fccd..56d0d050bc3 100644
--- a/gcc/fortran/parse.c
+++ b/gcc/fortran/parse.c
@@ -1072,6 +1072,7 @@ decode_gcc_attribute (void)
 
   match ("attributes", gfc_match_gcc_attributes, ST_ATTR_DECL);
   match ("unroll", gfc_match_gcc_unroll, ST_NONE);
+  match ("builtin", gfc_match_gcc_builtin, ST_NONE);
 
   /* All else has failed, so give up.  See if any of the matchers has
  stored an error message of some sort.  */
@@ -5663,6 +5664,8 @@ parse_progunit (gfc_statement st)
   gfc_state_data *p;
   int n;
 
+  gfc_adjust_builtins ();
+
   if (gfc_new_block
   && gfc_new_block->abr_modproc_decl
   && gfc_new_block->attr.function)
diff --git a/gcc/fortran/trans-intrinsic.c b/gcc/fortran/trans-intrinsic.c
index 4ae2b3252b5..0417e039a39 100644
--- a/gcc/fortran/trans-intrinsic.c
+++ b/gcc/fortran/trans-intrinsic.c
@@ -597,7 +597,50 @@ define_quad_builtin (const char *name, tree type, bool is_const)
   return fndecl;
 }
 
+/* Add SIMD attribute for FNDECL built-in if the built-in
+   name is in VECTORIZED_BUILTINS.  */
 
+static void
+add_simd_flag_for_built_in (tree fndecl)
+{
+  if (fndecl == NULL_TREE)
+return;
+
+  const char *name = IDENTIFIER_POINTER (DECL_NAME (fndecl));
+  for (unsigned i = 0; i < vectorized_builtins.length (); i++)
+if (strcmp (vectorized_builtins[i], name) == 0)
+  {
+	DECL_ATTRIBUTES (fndecl)
+	  = tree_cons (get_identifier ("omp declare simd"),
+		   build_tree_list (NULL_TREE,
+	build_omp_clause (UNKNOWN_LOCATION,
+			  OMP_CLAUSE_NOTINBRANCH)),
+		   DECL_ATTRIBUTES (fndecl));
+	return;
+  }
+}
+
+void
+gfc_adjust_builtins (void)
+{
+  gfc_intrinsic_map_t *m;
+  for (m = gfc_intrinsi

Re: [PATCH] Fix up affinity-fmt.c (was Re: [committed 0/4] (Partial) OpenMP 5.0 support for GCC 9)

2018-11-09 Thread Iain Sandoe
Hi Jakub,


> On 9 Nov 2018, at 06:37, Jakub Jelinek  wrote:

> +#if defined (HAVE_INTTYPES_H) && defined (PRIx64)
> + else if (sizeof (gomp_integral (handle)) == sizeof (uint64_t))
> +   sprintf (buf, "0x" PRIx64, (uint64_t) gomp_integral (handle));

s/0x/0x%/?

Iain
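[For reference, the PRIx64 macro expands to just the length-modified conversion letters (e.g. "llx"); the '%' is not part of it, which is why the s/0x/0x%/ fix above is needed. A minimal sketch (the helper name is hypothetical):]

```cpp
#include <cassert>
#include <cinttypes>
#include <cstdio>
#include <cstring>

// PRIx64 is a bare conversion-letter string literal; the '%' must be
// written explicitly before it in the format string.
static void format_handle (char *buf, uint64_t handle)
{
  sprintf (buf, "0x%" PRIx64, handle);
}
```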



Re: Implement {get,set}_range_info() variants that work with value_range's

2018-11-09 Thread Martin Sebor

On 11/09/2018 03:02 AM, Aldy Hernandez wrote:



On 11/8/18 4:47 PM, Martin Sebor wrote:

On 11/08/2018 04:52 AM, Aldy Hernandez wrote:

get/set_range_info() currently returns the extremes of a range.  I have
implemented overloaded variants that return a proper range.  In the
future we should use actual ranges throughout, and not depend on range
extremes, as depending on this behavior could cause us to lose
precision.

I am also including changes to size_must_be_zero_p() to show how we
should be using the range API, as opposed to performing error prone
ad-hoc calculations on ranges and anti-ranges.

Martin, I'm not saying your code is wrong.  There are numerous other
places in the compiler where we manipulate ranges/anti-ranges directly,
all of which should be adapted in the future.  Everywhere there is a
mention of VR_RANGE/VR_ANTI_RANGE in the compiler is suspect.  We should
ideally be using intersect/union/may_contain_p/null_p, etc.

OK pending another round of tests?  (As I had tested this patch along
with a whole slew of other upcoming changes ;-)).


I'm all for simplification! :)

I'd love it if the API made it possible to write these expressions
using native operators and intuitive names, without having to worry
about details like what types they are represented in or how to
convert between them.  It would be great if a function like
size_must_be_zero_p could be coded along the lines similar to:

   return intersect (size, value_range (ssize_type_node) - 0) == 0;

It shouldn't have to twiddle bits to compute the wide_int value
of SSIZE_MAX or wrestle with convertibility warts by calling
build_int_cst and and wide_int_to_tree.

Martin

PS I think we should be able to get close to the syntax above
with a few value_range ctors:

   value_range::value_range (tree type_or_expression);


Would this create the range for [type, type] or [-MIN, type] or what?


For a TYPE it would create [TYPE_MIN, TYPE_MAX].


   template <typename IntegerType>
   value_range::value_range (const IntegerType&);

a non-member minus operator to compute the difference:

   value_range operator- (const value_range&, const value_range&);

a non-member intersect function like this:

   value_range intersect (const value_range&, const value_range&);

and a non-member equality operator:

   bool operator== (const value_range&, const value_range&);

With the above, because a tree and any integer (including wide_int,
offset_int, and all the other flavors du jour) implicitly convert
to a value_range (possibly representing a single value),
the subtraction, intersection, and equality would apply to any and
all of those types as well without callers having to spell out or
even know what representation they're dealing with.


I like it.


I'm glad!  When can we get it? ;-)  Seriously, if this is
something we would like to see I'm happy to help put it
together.

Martin


Re: [PATCH] Fix up affinity-fmt.c (was Re: [committed 0/4] (Partial) OpenMP 5.0 support for GCC 9)

2018-11-09 Thread Jakub Jelinek
On Fri, Nov 09, 2018 at 07:33:44AM -0800, Iain Sandoe wrote:
> > On 9 Nov 2018, at 06:37, Jakub Jelinek  wrote:
> 
> > +#if defined (HAVE_INTTYPES_H) && defined (PRIx64)
> > +   else if (sizeof (gomp_integral (handle)) == sizeof (uint64_t))
> > + sprintf (buf, "0x" PRIx64, (uint64_t) gomp_integral (handle));
> 
> s/0x/0x%/?

Oops, consider it adjusted.  Haven't started my bootstrap/regtest of it
yet...

Thanks.

Jakub


Re: [RFC] support --with-multilib-list=@/path/name

2018-11-09 Thread Richard Earnshaw (lists)
On 09/11/2018 15:07, Alexandre Oliva wrote:
> Richard,
> 
> Olivier tells me he talked to you briefly at the Cauldron about allowing
> custom multilib sets to be configured from custom Makefile fragments
> supplied at configure time, and that you were somewhat receptive to the
> idea.  I have implemented this as follows, for ARM targets only so far,
> using "@/path/name" in the argument to --with-multilib-list to mean
> /path/name is to be used as a multilib-list Makefile fragment.
> 
> Does this look like an acceptable interface?  Is it ok as far as ARM
> maintainers are concerned?
> 
> Any objections from maintainers of other targets that support
> --with-multilib-list (or from anyone else) to the syntax, to the
> implementation, or to extending this possibility to their ports?
> 
> for  gcc/ChangeLog
> 
>   * config.gcc (tmake_file): Add /path/name to tmake_file for
> each @/path/name in --with-multilib-list on arm-*-* targets.
>   * configure.ac: Accept full pathnames in tmake_file.
>   * configure: Rebuilt.
>   * doc/install.texi (with-multilib-list): Document it.

I'm not opposed to such an extension, but I do have some caveats on the
current implementation (mostly on the implied contract between the
compiler and the person building GCC):

- I'm not sure if we really want to allow combinations of an arbitrary
multilib config with the builtin configurations.  Those are tricky
enough to get right as it is, and requests to support additional libs as
well as those with changes to these makefile fragments might be an
issue.  As such, I think I'd want to say that you either use the builtin
lists *or* you supply your own fragment.

- I'd also be concerned about implying that this interface into the
compiler build system is in any way stable, so I think we'd want to
document explicitly that makefile fragments supplied this way may have
to be tailored to a specific release of GCC.

Given the second point, there's nothing to stop a user copying the
internal makefile fragments into their own fragment and then adjusting
it to work with their additional features; so I don't think the first
restriction is too onerous.

R.

> ---
>  gcc/config.gcc   |   13 +
>  gcc/configure|9 ++---
>  gcc/configure.ac |5 -
>  gcc/doc/install.texi |   22 +++---
>  4 files changed, 38 insertions(+), 11 deletions(-)
> 
> diff --git a/gcc/config.gcc b/gcc/config.gcc
> index 7578ff03825e..f1363c41f989 100644
> --- a/gcc/config.gcc
> +++ b/gcc/config.gcc
> @@ -3998,6 +3998,19 @@ case "${target}" in
>   aprofile|rmprofile)
>   
> tmake_profile_file="arm/t-multilib"
>   ;;
> + @/*)
> + ml=`echo "X$arm_multilib" | sed 
> '1s,^X@,,'`
> + if test -f "${ml}"; then
> + 
> tmake_file="${tmake_file} ${ml}"
> + else
> + echo "Error: ${ml} does 
> not exist" >&2
> + exit 1
> + fi
> + ;;
> + @*)
> + echo "Error: multilib config 
> file must start with /" >&2
> + exit 1
> + ;;
>   *)
>   echo "Error: 
> --with-multilib-list=${with_multilib_list} not supported." 1>&2
>   exit 1
> diff --git a/gcc/configure b/gcc/configure
> index b814484ea25b..5f15c7a1ff02 100755
> --- a/gcc/configure
> +++ b/gcc/configure
> @@ -12244,7 +12244,10 @@ done
>  tmake_file_=
>  for f in ${tmake_file}
>  do
> - if test -f ${srcdir}/config/$f
> + if test -n `echo "X$f" | sed -n '1s,^X/.*,/,p'` && test -f "$f"
> + then
> + tmake_file_="${tmake_file_} $f"
> + elif test -f ${srcdir}/config/$f
>   then
>   tmake_file_="${tmake_file_} \$(srcdir)/config/$f"
>   fi
> @@ -18572,7 +18575,7 @@ else
>lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
>lt_status=$lt_dlunknown
>cat > conftest.$ac_ext <<_LT_EOF
> -#line 18575 "configure"
> +#line 18578 "configure"
>  #include "confdefs.h"
>  
>  #if HAVE_DLFCN_H
> @@ -18678,7 +18681,7 @@ else
>lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
>lt_status=$lt_dlunknown
>cat > conftest.$ac_ext <<_LT_EOF
> -#line 18681 "configure"
> +#line 18684 "configure"
>  #include "confdefs.h"
>  
>  #if HAVE_DLFCN_H
> diff --git a/gcc/configure.ac b/gcc/configure.ac
> index 5958591255

Re: [RFC][PATCH]Merge VEC_COND_EXPR into MASK_STORE after loop vectorization

2018-11-09 Thread Renlin Li

Hi Richard,

On 11/09/2018 11:48 AM, Richard Biener wrote:

On Thu, Nov 8, 2018 at 5:55 PM Renlin Li  wrote:


Hi Richard,


*However*, after I rebased my patch on the latest trunk.
Got the following dump from ifcvt:
 [local count: 1006632961]:
# i_20 = PHI 
# ivtmp_18 = PHI 
a_10 = array[i_20];
_1 = a_10 & 1;
_2 = a_10 + 1;
_ifc__34 = _1 != 0 ? _2 : a_10;
array[i_20] = _ifc__34;
_4 = a_10 + 2;
_ifc__37 = _ifc__34 > 10 ? _4 : _ifc__34;
array[i_20] = _ifc__37;
i_13 = i_20 + 1;
ivtmp_5 = ivtmp_18 - 1;
if (ivtmp_5 != 0)
  goto ; [93.33%]
else
  goto ; [6.67%]

the redundant load is not generated, but you could still see the unconditional 
store.


Yes, I fixed the redundant loads recently and indeed dead stores
remain (for the particular
testcase they would be easy to remove).


Right.




After loop vectorization, the following is generated (without my change):


Huh.  But that's not because of if-conversion but because SVE needs to
mask _all_
loop operations that are not safe to execute with the loop_mask!


vect_a_10.6_6 = .MASK_LOAD (vectp_array.4_35, 4B, loop_mask_7);
a_10 = array[i_20];
vect__1.7_39 = vect_a_10.6_6 & vect_cst__38;
_1 = a_10 & 1;
vect__2.8_41 = vect_a_10.6_6 + vect_cst__40;
_2 = a_10 + 1;
vect__ifc__34.9_43 = VEC_COND_EXPR ;
_ifc__34 = _1 != 0 ? _2 : a_10;
.MASK_STORE (vectp_array.10_45, 4B, loop_mask_7, vect__ifc__34.9_43);
vect__4.12_49 = vect_a_10.6_6 + vect_cst__48;
_4 = a_10 + 2;
vect__ifc__37.13_51 = VEC_COND_EXPR <vect__ifc__34.9_43 > vect_cst__50, 
vect__4.12_49, vect__ifc__34.9_43>;
_ifc__37 = _ifc__34 > 10 ? _4 : _ifc__34;
.MASK_STORE (vectp_array.14_53, 4B, loop_mask_7, vect__ifc__37.13_51);

With the old ifcvt code, my change here could improve it a little bit by 
eliminating some redundant loads.
With the new code, it could not improve things further.  I'll adjust the patch 
based on the latest trunk.


So what does the patch change the above to?  The code has little to no
comments apart from a
small picture with code _before_ the transform...

It is like this:
  vect_a_10.6_6 = .MASK_LOAD (vectp_array.4_35, 4B, loop_mask_7);
  a_10 = array[i_20];
  vect__1.7_39 = vect_a_10.6_6 & vect_cst__38;
  _1 = a_10 & 1;
  vect__2.8_41 = vect_a_10.6_6 + vect_cst__40;
  _2 = a_10 + 1;
  _60 = vect__1.7_39 != vect_cst__42;
  vect__ifc__34.9_43 = VEC_COND_EXPR <_60, vect__2.8_41, vect_a_10.6_6>;
  _ifc__34 = _1 != 0 ? _2 : a_10;
  vec_mask_and_61 = _60 & loop_mask_7;
  .MASK_STORE (vectp_array.10_45, 4B, vec_mask_and_61, vect__2.8_41);
  vect__4.12_49 = vect_a_10.6_6 + vect_cst__48;
  _4 = a_10 + 2;
  vect__ifc__37.13_51 = VEC_COND_EXPR <vect__ifc__34.9_43 > vect_cst__50, 
vect__4.12_49, vect__ifc__34.9_43>;
  _ifc__37 = _ifc__34 > 10 ? _4 : _ifc__34;
  .MASK_STORE (vectp_array.14_53, 4B, loop_mask_7, vect__ifc__37.13_51);

As the loaded value is used later, it could not be removed.

With the change, ideally, less data is stored.
However, it might generate more instructions.

1, The load is not eliminable.  Apparently, your change eliminates most of the 
redundant loads.
   The rest are necessary or not easy to remove.
2, additional AND instruction.

With a simpler test case like this:

static int array[100];
int test (int a, int i)
{
  for (unsigned i = 0; i < 16; i++)
{
  if (a & 1)
array[i] = a + 1;
}
  return array[i];
}

The new code-gen will be:
  vect__2.4_29 = vect_cst__27 + vect_cst__28;
  _44 = vect_cst__34 != vect_cst__35;
  vec_mask_and_45 = _44 & loop_mask_32;
  .MASK_STORE (vectp_array.9_37, 4B, vec_mask_and_45, vect__2.4_29);

While the old one is:

  vect__2.4_29 = vect_cst__27 + vect_cst__28;
  vect__ifc__24.7_33 = .MASK_LOAD (vectp_array.5_30, 4B, loop_mask_32);
  vect__ifc__26.8_36 = VEC_COND_EXPR ;
  .MASK_STORE (vectp_array.9_37, 4B, loop_mask_32, vect__ifc__26.8_36);




I was wondering whether we can implement

   l = [masked]load;
   tem = cond ? x : l;
   masked-store = tem;

pattern matching in a regular pass - forwprop for example.  Note the
load doesn't need to be masked,
correct?  In fact if it is masked you need to make sure the
conditional never accesses parts that
are masked in the load, no?  Or require the mask to be the same as
that used by the store.  But then
you still cannot simply replace the store mask with a new mask
generated from the conditional?


Yes, this would require that the masks for the load and the store are the same.
This matches the pattern before loop vectorization.
The mask here is the loop mask, to ensure we are bounded by the number of 
iterations.

The new mask is the (original mask & condition mask) (example shown above).
In this case, fewer lanes will be stored.
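[A scalar, lane-wise model of this rewrite (a hypothetical illustration, not GCC internals) shows why narrowing the store mask removes the need for the load:]

```cpp
#include <array>
#include <cassert>

constexpr int N = 4;
using lanes = std::array<int, N>;
using mask  = std::array<bool, N>;

// Old form: masked load + VEC_COND_EXPR blend + masked store under the
// loop mask.  The blend needs the old memory value in false-condition lanes.
static void store_blend (lanes &mem, const lanes &val,
                         const mask &cond, const mask &loop_mask)
{
  for (int i = 0; i < N; i++)
    if (loop_mask[i])
      mem[i] = cond[i] ? val[i] : mem[i];
}

// New form: store only where (loop_mask & cond) holds; no load of mem is
// required, and fewer lanes are written.
static void store_narrow (lanes &mem, const lanes &val,
                          const mask &cond, const mask &loop_mask)
{
  for (int i = 0; i < N; i++)
    if (loop_mask[i] && cond[i])
      mem[i] = val[i];
}
```

Both forms leave memory in the same state, which is why the transform is a pure win when the load has no other uses.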

It is possible we do that in forwprop.
I could try to integrate the change into it if it is the correct place to go.

As the pattern is initially generated by loop vectorizer, I did the change 
right after it before it got
converted into other forms. For example, forwprop will transform the original 
c

Re: [PATCH][GCC] Make DR_TARGET_ALIGNMENT compile time variable

2018-11-09 Thread Andre Vieira (lists)
On 05/11/18 12:41, Richard Biener wrote:
> On Mon, Nov 5, 2018 at 1:07 PM Andre Vieira (lists)
>  wrote:
>>
>>
>> Hi,
>>
Hi,

Thank you for the quick response! See inline responses below.

>> This patch enables targets to describe DR_TARGET_ALIGNMENT as a
>> compile-time variable.  It does so by turning the variable into a
>> 'poly_uint64'.  This should not affect the current code-generation for
>> any target.
>>
>> We hope to use this in the near future for SVE using the
>> current_vector_size as the preferred target alignment for vectors.  In
>> fact I have a patch to do just this, but I am still trying to figure out
>> whether and when it is beneficial to peel for alignment with a runtime
>> misalignment.
> 
> In fact in most cases I have seen the issue is that it's not visible whether
> peeling will be able to align _all_ references and doing peeling only to
> align some is hardly beneficial.  To improve things the vectorizer would
> have to version the loop for the case where peeling can reach alignment
> for a group of DRs and then vectorize one copy with peeling for alignment
> and one copy with unaligned accesses.


So I have seen code being peeled for alignment even when it only knows
how to align one of a group (only checked 2 or 3) and I think this may
still be beneficial in some cases.  I am more worried about cases where
the number of iterations isn't enough to justify the initial peeling
cost or when the loop isn't memory bound, i.e. very arithmetic heavy
loops.  This is a bigger vectorization problem though, that would
require some kind of cost-model.

> 
>>  The patch I am working on will change the behavior of
>> auto-vectorization for SVE when building vector-length agnostic code for
>> targets that benefit from aligned vector loads/stores.  The patch will
>> result in  the generation of a runtime computation of misalignment and
>> the construction of a corresponding mask for the first iteration of the
>> loop.
>>
>> I have decided to not offer support for prolog/epilog peeling when the
>> target alignment is not compile-time constant, as this didn't seem
>> useful, this is why 'vect_do_peeling' returns early if
>> DR_TARGET_ALIGNMENT is not constant.
>>
>> I bootstrapped and tested this on aarch64 and x86 basically
>> bootstrapping one target that uses this hook and one that doesn't.
>>
>> Is this OK for trunk?
> 
> The patch looks good but I wonder wheter it is really necessary at this
> point.

The goal of this patch is really to enable future work; on its own it
does nothing.  I am working on a small target-specific patch to enable
this for SVE, but I need to do a bit more analysis and benchmarking to
be able to determine whether it's beneficial, which I will not be able to
finish before end of stage 1. That is why I split them up and sent this
one upstream to see if I could get the middle-end change in.

> 
> Thanks,
> Richard.
> 
>> Cheers,
>> Andre
>>
>> 2018-11-05  Andre Vieira  
>>
>> * config/aarch64/aarch64.c 
>> (aarch64_vectorize_preferred_vector_alignment):
>> Change return type to poly_uint64.
>> (aarch64_simd_vector_alignment_reachable): Adapt to preferred vector
>> alignment being a poly int.
>> * doc/tm.texi (TARGET_VECTORIZE_PREFERRED_VECTOR_ALIGNMENT): Change 
>> return
>> type to poly_uint64.
>> * target.def (default_preferred_vector_alignment): Likewise.
>> * targhooks.c (default_preferred_vector_alignment): Likewise.
>> * targhooks.h (default_preferred_vector_alignment): Likewise.
>> * tree-vect-data-refs.c
>> (vect_calculate_target_alignment): Likewise.
>> (vect_compute_data_ref_alignment): Adapt to vector alignment
>> being a poly int.
>> (vect_update_misalignment_for_peel): Likewise.
>> (vect_enhance_data_refs_alignment): Likewise.
>> (vect_find_same_alignment_drs): Likewise.
>> (vect_duplicate_ssa_name_ptr_info): Likewise.
>> (vect_setup_realignment): Likewise.
>> (vect_can_force_dr_alignment_p): Change alignment parameter type to
>> poly_uint64.
>> * tree-vect-loop-manip.c (get_misalign_in_elems): Learn to construct 
>> a mask
>> with a compile time variable vector alignment.
>> (vect_gen_prolog_loop_niters): Adapt to vector alignment being a 
>> poly int.
>> (vect_do_peeling): Exit early if vector alignment is not constant.
>> * tree-vect-stmts.c (ensure_base_align): Adapt to vector alignment 
>> being a
>> poly int.
>> (vectorizable_store): Likewise.
>> (vectorizable_load): Likweise.
>> * tree-vectorizer.h (struct dr_vec_info): Make target_alignment 
>> field a
>> poly_uint64.
>> (vect_known_alignment_in_bytes): Adapt to vector alignment being a 
>> poly
>> int.
>> (vect_can_force_dr_alignment_p): Change alignment parameter type to
>> poly_uint64.



Re: [committed 0/4] (Partial) OpenMP 5.0 support for GCC 9

2018-11-09 Thread David Edelsohn
> 2018-11-08  Rainer Orth  
>
> * affinity.c: Include , .
> (gomp_display_affinity_place): Remove cpusetp.
> * teams.c: Include .

This fixed AIX bootstrap failure also.

Thanks, David


Re: [PATCH v3] S/390: Allow relative addressing of literal pool entries

2018-11-09 Thread Ilya Leoshkevich
> Am 09.11.2018 um 14:43 schrieb Ulrich Weigand :
> 
> Ilya Leoshkevich wrote:
> 
>> +  /* Return unannotated constant pool references, so that the corresponding
>> + entries are added to the back-end-managed pool.  Not doing so would 
>> result
>> + in those entries being placed in the middle-end-managed pool, which 
>> would
>> + in turn prevent merging them with identical ones in the 
>> back-end-managed
>> + pool.  */
> 
> I'm still not convinced this is necessary; how frequently does the case occur
> that a constant is duplicated, and is it worth the extra code complexity?
> (Also, note that the middle-end pool is shared across the whole translation
> unit, while the back-end pool is duplicated in every function ... so if the
> same constant is used by many functions, forcing it into the back-end pool
> may *increase* overall size.)
> 
> In any case, does this even happen with your current code?  You find the
> unannotated reference here, and therefore copy the constant into the local
> pool, but then replace_constant_pool_ref will still ignore the insn, and
> thus the insn will actually continue to refer to the middle-end pool
> constant, *not* the copy created in the local pool.  Right?

Whoops, that’s right.  This code must be currently achieving nothing at
all!  I did not run into such cases and I don’t have a test case for
them, so they are hypothetical.  So let’s keep it simple here.

> 
>>   if (DISP_IN_RANGE (INTVAL (frame_off)))
>>  {
>> -  insn = gen_rtx_SET (frame_pointer,
>> -  gen_rtx_PLUS (Pmode, frame_pointer, frame_off));
>> -  insn = emit_insn (insn);
>> +  insn = emit_insn (copy_rtx (cfa));
>>  }
> 
> This seems unrelated?

Now that insn is of type rtx_insn *, gen_rtx_SET () result cannot be
assigned to it.  Let me rewrite this by creating an rtx temporary,
which should be more readable.


Re: [PATCH] Fix not properly nul-terminated string constants in JIT

2018-11-09 Thread Bernd Edlinger
Hi Dave,

is the patch OK, or do you still have questions?


Thanks
Bernd.

On 11/2/18 10:48 PM, Bernd Edlinger wrote:
> On 11/2/18 9:40 PM, David Malcolm wrote:
>> On Sun, 2018-08-05 at 16:59 +, Bernd Edlinger wrote:
>>> Hi!
>>>
>>>
>>> My other patch with adds assertions to varasm.c regarding correct
>>> nul termination of sting literals did make these incorrect string
>>> constants in JIT frontend fail.
>>>
>>> The string constants are not nul terminated if their length exceeds
>>> 200 characters.  The test cases do not use strings of that size where
>>> that would make a difference.  But using a fixed index type is
>>> clearly
>>> wrong.
>>>
>>> This patch removes the fixed char[200] array type from
>>> playback::context,
>>> and uses build_string_literal instead of using build_string directly.
>>>
>>>
>>> Bootstrapped and reg-tested on x86_64-pc-linux-gnu.
>>> Is it OK for trunk?
>>
>> Sorry for the belated response.
>>
>> Was this tested with --enable-host-shared and --enable-languages=jit ?
>> Note that "jit" is not included in --enable-languages=all.
>>
> 
> Yes, of course.  The test suite contains a few string constants, just
> all of them are shorter than 200 characters.  But I think removing this
> artificial limit enables the existing test cases to test that the
> shorter string is in fact zero terminated.
> 
>> The patch seems reasonable, but I'm a little confused over the meaning
>> of "len" in build_string_literal and build_string: does it refer to the
>> length or the size of the string?
>>
> 
> build_string_literal:
> For languages that use zero-terminated strings, len is strlen(str)+1, and
> str is a zero terminated single-byte character string.
> For languages that don't use zero-terminated strings, len is the size of
> the string and str is not zero terminated.
> 
> build_string:
> constructs a STRING_CST tree object, which is usable as is in some contexts,
> like for asm constraints, but as a string literal it is incomplete, and
> needs an index type.  The index type defines the memory size which must
> be larger than the string precision.  Excess memory is implicitly cleared.
> 
> This means currently all jit strings shorter than 200 characters
> are filled with zero up to the limit of 200 chars as imposed by
> m_char_array_type_node.  Strings of exactly 200 chars are not zero terminated,
> and larger strings should result in an assertion (excess precision was 
> previously
> allowed, but no zero termination was appended, when that is not part of
> the original string constant).
> 
> Previously it was allowed to have memory size less than the string len, which
> had complicated the STRING_CST semantics in the middle-end, but with the
> string_cst semantic rework I did for gcc-9 this is no longer allowed and
> results in (checking) assertions in varasm.c.
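[The strlen+1 convention described above can be illustrated with plain C string semantics (a trivial sketch, independent of GCC's tree representation; the helper name is made up):]

```cpp
#include <cassert>
#include <cstring>

// For a zero-terminated literal, the length handed to build_string_literal
// is strlen (str) + 1, i.e. it counts the trailing NUL -- the same count
// that sizeof gives for a string literal.
static size_t literal_len (const char *str)
{
  return strlen (str) + 1;
}
```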
> 
>>> @@ -617,16 +616,9 @@ playback::rvalue *
>>>   playback::context::
>>>   new_string_literal (const char *value)
>>>   {
>>> -  tree t_str = build_string (strlen (value), value);
>>> -  gcc_assert (m_char_array_type_node);
>>> -  TREE_TYPE (t_str) = m_char_array_type_node;
>>> -
>>> -  /* Convert to (const char*), loosely based on
>>> - c/c-typeck.c: array_to_pointer_conversion,
>>> - by taking address of start of string.  */
>>> -  tree t_addr = build1 (ADDR_EXPR, m_const_char_ptr, t_str);
>>> +  tree t_str = build_string_literal (strlen (value) + 1, value);
>>> -  return new rvalue (this, t_addr);
>>> +  return new rvalue (this, t_str);
>>>   }
>>
>> In the above, the call to build_string with strlen is replaced with
>> build_string_literal with strlen + 1.
>>
>>
>> build_string's comment says:
>>
>> "Note that for a C string literal, LEN should include the trailing
>> NUL."
>>
>> but has:
>>
>>    length = len + offsetof (struct tree_string, str) + 1;
>>
>> and:
>>
>>    TREE_STRING_LENGTH (s) = len;
>>    memcpy (s->string.str, str, len);
>>    s->string.str[len] = '\0';
>>
>> suggesting that the "len" parameter is in fact the length *without* the
>> trailing NUL, and that a trailing NUL is added by build_string.
>>
> 
> Yes, string constants in tree objects have another zero termination,
> but varasm.c does something different; there the index range takes
> precedence.
> The index range is built in build_string_literal as follows:
> 
>    elem = build_type_variant (char_type_node, 1, 0);
>    index = build_index_type (size_int (len - 1));
>    type = build_array_type (elem, index);
> 
> therefore the string constant has the type char[0..len-1]
> thus only len bytes are significant for code generation, the extra
> nul is just for "convenience".
> 
>> However build_string_literal has:
>>
>>    t = build_string (len, str);
>>    elem = build_type_variant (char_type_node, 1, 0);
>>    index = build_index_type (size_int (len - 1));
>>
>> suggesting that the len is passed directly to build_string (and thus
>> ought to be strlen), but the build_index_type uses len - 1 (which
>> suggests that len is the size of the strin

[PATCH v4] S/390: Allow relative addressing of literal pool entries

2018-11-09 Thread Ilya Leoshkevich
Bootstrapped and regtested on s390x-redhat-linux.

r265490 allowed the compiler to choose in a more flexible way whether to
use load or load-address-relative-long (LARL) instruction.  When it
chose LARL for literal pool references, the latter ones were rewritten
by pass_s390_early_mach to use UNSPEC_LTREF, which assumes base register
usage, which in turn is not compatible with LARL.  The end result was an
ICE because of unrecognizable insn.

UNSPEC_LTREF and friends are necessary in order to communicate the
dependency on the base register to pass_sched2.  When relative
addressing is used, no base register is necessary, so in such cases the
rewrite must be avoided.

gcc/ChangeLog:

2018-11-05  Ilya Leoshkevich  

PR target/87762
* config/s390/predicates.md (larl_operand): Use
s390_symbol_relative_long_p () to reduce code duplication.
* config/s390/s390-protos.h (s390_symbol_relative_long_p): New
function.
* config/s390/s390.c (s390_safe_relative_long_p): Likewise.
(annotate_constant_pool_refs): Skip insns which support relative
addressing.
(annotate_constant_pool_refs_1): New helper function.
(s390_symbol_relative_long_p): New function.
(find_constant_pool_ref): Assert that unannotated constant pool
references must use relative addressing.
(replace_constant_pool_ref): Skip insns which support relative
addressing.
(replace_constant_pool_ref_1): New helper function.
(s390_mainpool_finish): Adjust to the new signature of
replace_constant_pool_ref ().
(s390_chunkify_finish): Likewise.
(pass_s390_early_mach::execute): Likewise.
(s390_prologue_plus_offset): Likewise.
(s390_emit_prologue): Likewise.
(s390_emit_epilogue): Likewise.
---
 gcc/config/s390/predicates.md |  8 +--
 gcc/config/s390/s390-protos.h |  1 +
 gcc/config/s390/s390.c| 93 ++-
 3 files changed, 72 insertions(+), 30 deletions(-)

diff --git a/gcc/config/s390/predicates.md b/gcc/config/s390/predicates.md
index 97f717c558d..bb66c8a6bcb 100644
--- a/gcc/config/s390/predicates.md
+++ b/gcc/config/s390/predicates.md
@@ -151,9 +151,7 @@
   if (GET_CODE (op) == LABEL_REF)
 return true;
   if (SYMBOL_REF_P (op))
-return (!SYMBOL_FLAG_NOTALIGN2_P (op)
-   && SYMBOL_REF_TLS_MODEL (op) == 0
-   && s390_rel_address_ok_p (op));
+return s390_symbol_relative_long_p (op);
 
   /* Everything else must have a CONST, so strip it.  */
   if (GET_CODE (op) != CONST)
@@ -176,9 +174,7 @@
   if (GET_CODE (op) == LABEL_REF)
 return true;
   if (SYMBOL_REF_P (op))
-return (!SYMBOL_FLAG_NOTALIGN2_P (op)
-   && SYMBOL_REF_TLS_MODEL (op) == 0
-   && s390_rel_address_ok_p (op));
+return s390_symbol_relative_long_p (op);
 
 
   /* Now we must have a @GOTENT offset or @PLT stub
diff --git a/gcc/config/s390/s390-protos.h b/gcc/config/s390/s390-protos.h
index 96fa705f879..d3c2cc55e28 100644
--- a/gcc/config/s390/s390-protos.h
+++ b/gcc/config/s390/s390-protos.h
@@ -157,6 +157,7 @@ extern void s390_indirect_branch_via_thunk (unsigned int 
regno,
rtx comparison_operator,
enum s390_indirect_branch_type 
type);
 extern void s390_indirect_branch_via_inline_thunk (rtx execute_target);
+extern bool s390_symbol_relative_long_p (rtx);
 #endif /* RTX_CODE */
 
 /* s390-c.c routines */
diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c
index 1be85727b73..58f3163b274 100644
--- a/gcc/config/s390/s390.c
+++ b/gcc/config/s390/s390.c
@@ -2731,6 +2731,17 @@ s390_safe_attr_type (rtx_insn *insn)
 return TYPE_NONE;
 }
 
+/* Return true if the relative_long attribute of INSN is set.  */
+
+static bool
+s390_safe_relative_long_p (rtx_insn *insn)
+{
+  if (recog_memoized (insn) >= 0)
+return get_attr_relative_long (insn) == RELATIVE_LONG_YES;
+  else
+return false;
+}
+
 /* Return true if DISP is a valid short displacement.  */
 
 static bool
@@ -8116,11 +8127,8 @@ s390_first_cycle_multipass_dfa_lookahead (void)
   return 4;
 }
 
-/* Annotate every literal pool reference in X by an UNSPEC_LTREF expression.
-   Fix up MEMs as required.  */
-
 static void
-annotate_constant_pool_refs (rtx *x)
+annotate_constant_pool_refs_1 (rtx *x)
 {
   int i, j;
   const char *fmt;
@@ -8199,16 +8207,39 @@ annotate_constant_pool_refs (rtx *x)
 {
   if (fmt[i] == 'e')
{
- annotate_constant_pool_refs (&XEXP (*x, i));
+ annotate_constant_pool_refs_1 (&XEXP (*x, i));
}
   else if (fmt[i] == 'E')
{
  for (j = 0; j < XVECLEN (*x, i); j++)
-   annotate_constant_pool_refs (&XVECEXP (*x, i, j));
+   annotate_constant_pool_refs_1 (&XVECEXP (*x, i, j));
}
 }
 }
 
+/* Annotate every literal pool reference in INSN by an UNSPEC_LTREF expression.
+   Fix up MEMs as required.
+   Skip insns whic

[PATCH] -fsave-optimization-record: compress the output using zlib

2018-11-09 Thread David Malcolm
One of the concerns noted at Cauldron about -fsave-optimization-record
was the size of the output files.

This file implements compression of the -fsave-optimization-record
output, using zlib.

I did some before/after testing of this patch, using SPEC 2017's
502.gcc_r with -O3, looking at the sizes of the generated
FILENAME.opt-record.json[.gz] files.

The largest file was for insn-attrtab.c:
  before:  171736285 bytes (164M)
  after: 5304015 bytes (5.1M)

Smallest file was for vasprintf.c:
  before:  30567 bytes
  after:4485 bytes

Median file by size before was lambda-mat.c:
  before:2266738 bytes (2.2M)
  after:   75988 bytes (15K)

Total of all files in the benchmark:
  before: 2041720713 bytes (1.9G)
  after:66870770 bytes (63.8M)

...so clearly compression is a big win in terms of file size, at the
cost of making the files slightly more awkward to work with. [1]
I also wonder if we want to support any pre-filtering of the output
(FWIW roughly half of the biggest file seems to be "Adding assert for "
messages from tree-vrp.c).

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.

OK for trunk?

[1] I've updated my optrecord.py module to deal with this, which
simplifies things; it's still not clear to me if that should live
in "contrib/" or not.
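For anyone post-processing these files outside of optrecord.py, reading the compressed output is straightforward: it is a single gzip-compressed JSON document. A minimal sketch in Python — the file name and the record shape used in the demo are illustrative only, since (as the docs say) the schema is experimental and subject to change:

```python
import gzip
import json

def load_opt_records(path):
    """Load a -fsave-optimization-record output file
    (a gzip-compressed JSON document)."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return json.load(f)

# Round-trip demo with a stand-in record (not the real schema).
sample = [{"kind": "success", "pass": "vect", "message": "loop vectorized"}]
with gzip.open("demo.opt-record.json.gz", "wt", encoding="utf-8") as f:
    json.dump(sample, f)

records = load_opt_records("demo.opt-record.json.gz")
print(records[0]["pass"])  # -> vect
```

This is also roughly what any pre-filtering tool would start from: load, filter the list of records, and re-dump with gzip.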

gcc/ChangeLog:
* doc/invoke.texi (-fsave-optimization-record): Note that the
output is compressed.
* optinfo-emit-json.cc: Include <zlib.h>.
(optrecord_json_writer::write): Compress the output.
---
 gcc/doc/invoke.texi  |  6 +++---
 gcc/optinfo-emit-json.cc | 35 +++
 2 files changed, 30 insertions(+), 11 deletions(-)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 4ff3a150..f26ada0 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -14408,11 +14408,11 @@ dumps from the vectorizer about missed opportunities.
 
 @item -fsave-optimization-record
 @opindex fsave-optimization-record
-Write a SRCFILE.opt-record.json file detailing what optimizations
+Write a SRCFILE.opt-record.json.gz file detailing what optimizations
 were performed, for those optimizations that support @option{-fopt-info}.
 
-This option is experimental and the format of the data within the JSON
-file is subject to change.
+This option is experimental and the format of the data within the
+compressed JSON file is subject to change.
 
 It is roughly equivalent to a machine-readable version of
 @option{-fopt-info-all}, as a collection of messages with source file,
diff --git a/gcc/optinfo-emit-json.cc b/gcc/optinfo-emit-json.cc
index 31029ad..6d4502c 100644
--- a/gcc/optinfo-emit-json.cc
+++ b/gcc/optinfo-emit-json.cc
@@ -45,6 +45,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "pass_manager.h"
 #include "selftest.h"
 #include "dump-context.h"
+#include <zlib.h>
 
 /* A class for writing out optimization records in JSON format.  */
 
@@ -133,16 +134,34 @@ optrecord_json_writer::~optrecord_json_writer ()
 void
 optrecord_json_writer::write () const
 {
-  char *filename = concat (dump_base_name, ".opt-record.json", NULL);
-  FILE *outfile = fopen (filename, "w");
-  if (outfile)
+  pretty_printer pp;
+  m_root_tuple->print (&pp);
+
+  bool emitted_error = false;
+  char *filename = concat (dump_base_name, ".opt-record.json.gz", NULL);
+  gzFile outfile = gzopen (filename, "w");
+  if (outfile == NULL)
 {
-  m_root_tuple->dump (outfile);
-  fclose (outfile);
+  error_at (UNKNOWN_LOCATION, "cannot open file %qs for writing optimization records",
+   filename); // FIXME: more info?
+  goto cleanup;
 }
-  else
-error_at (UNKNOWN_LOCATION, "unable to write optimization records to %qs",
- filename); // FIXME: more info?
+
+  if (gzputs (outfile, pp_formatted_text (&pp)) <= 0)
+{
+  int tmp;
+  error_at (UNKNOWN_LOCATION, "error writing optimization records to %qs: %s",
+   filename, gzerror (outfile, &tmp));
+  emitted_error = true;
+}
+
+ cleanup:
+  if (outfile)
+if (gzclose (outfile) != Z_OK)
+  if (!emitted_error)
+   error_at (UNKNOWN_LOCATION, "error closing optimization records %qs",
+ filename);
+
   free (filename);
 }
 
-- 
1.8.5.3



Re: [PATCH] Simplify floating point comparisons

2018-11-09 Thread Wilco Dijkstra
Richard Biener wrote:
>Marc Glisse wrote:
>> Let's try with C = DBL_MIN and x = ±DBL_MAX. I don't believe it involves
>> signed zeros or infinities, just an underflow. First, the result depends on
>> the rounding mode. And in the default round-to-nearest, both divisions give
>> 0, and thus compare the same with 0, but we replace that with a sign test on
>> x, where they clearly give opposite answers.
>>
>> What would be the proper flag to test to check if we care about underflow?
>
> We have none specific so this makes it flag_unsafe_math_optimizations.

Right I have added the unsafe math check again like in the previous version:


The patch implements some of the optimizations discussed in
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71026.

Simplify (C / x >= 0.0) into x >= 0.0 with -funsafe-math-optimizations
(since C / x can underflow to zero if x is huge, it's not safe otherwise).
If C is negative the comparison is reversed.


Simplify (x * C1) > C2 into x > (C2 / C1) with -funsafe-math-optimizations.
If C1 is negative the comparison is reversed.
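To see concretely why the first simplification needs -funsafe-math-optimizations, Marc's underflow counterexample can be replayed in Python (whose floats are IEEE doubles); the variable names here are mine:

```python
import sys

C = sys.float_info.min     # DBL_MIN, smallest positive normal double
x = -sys.float_info.max    # -DBL_MAX

quotient = C / x           # underflows far past the subnormal range to -0.0
before = quotient >= 0.0   # -0.0 >= 0.0 is True under IEEE 754
after = x >= 0.0           # the simplified sign test on x: False

print(before, after)       # -> True False
```

So in default round-to-nearest the original comparison and the simplified sign test disagree, which is why the transform is only valid under unsafe math.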

OK for commit?

ChangeLog
2018-11-09  Wilco Dijkstra
Jackson Woodruff  

gcc/
PR tree-optimization/71026
* match.pd: Simplify floating point comparisons.

gcc/testsuite/
PR tree-optimization/71026
* gcc.dg/div-cmp-1.c: New test.
* gcc.dg/div-cmp-2.c: New test.

--
diff --git a/gcc/match.pd b/gcc/match.pd
index 94fbab841f5e36bd33fda849a686fd80886ee1ff..f6c76510f95be2485e5bacd07edab336705cbd25 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -405,6 +405,21 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
  (rdiv @0 (negate @1))
  (rdiv (negate @0) @1))
 
+(if (flag_unsafe_math_optimizations)
+ /* Simplify (C / x op 0.0) to x op 0.0 for C != 0, C != Inf/Nan.
+Since C / x may underflow to zero, do this only for unsafe math.  */
+ (for op (lt le gt ge)
+  neg_op (gt ge lt le)
+  (simplify
+   (op (rdiv REAL_CST@0 @1) real_zerop@2)
+   (if (!HONOR_SIGNED_ZEROS (@1) && !HONOR_INFINITIES (@1))
+(switch
+ (if (real_less (&dconst0, TREE_REAL_CST_PTR (@0)))
+  (op @1 @2))
+ /* For C < 0, use the inverted operator.  */
+ (if (real_less (TREE_REAL_CST_PTR (@0), &dconst0))
+  (neg_op @1 @2)))
+
 /* Optimize (X & (-A)) / A where A is a power of 2, to X >> log2(A) */
 (for div (trunc_div ceil_div floor_div round_div exact_div)
  (simplify
@@ -4049,6 +4064,22 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
(rdiv @2 @1))
(rdiv (op @0 @2) @1)))
 
+ (for cmp (lt le gt ge)
+  neg_cmp (gt ge lt le)
+  /* Simplify (x * C1) cmp C2 -> x cmp (C2 / C1), where C1 != 0.  */
+  (simplify
+   (cmp (mult @0 REAL_CST@1) REAL_CST@2)
+   (with
+{ tree tem = const_binop (RDIV_EXPR, type, @2, @1); }
+(if (tem
+&& !(REAL_VALUE_ISINF (TREE_REAL_CST (tem))
+ || (real_zerop (tem) && !real_zerop (@1
+ (switch
+  (if (real_less (&dconst0, TREE_REAL_CST_PTR (@1)))
+   (cmp @0 { tem; }))
+  (if (real_less (TREE_REAL_CST_PTR (@1), &dconst0))
+   (neg_cmp @0 { tem; })))
+
  /* Simplify sqrt(x) * sqrt(y) -> sqrt(x*y).  */
  (for root (SQRT CBRT)
   (simplify
diff --git a/gcc/testsuite/gcc.dg/div-cmp-1.c b/gcc/testsuite/gcc.dg/div-cmp-1.c
new file mode 100644
index 
..cd1a5cd3d6fee5a10e9859ca99b344fa3fdb7f5f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/div-cmp-1.c
@@ -0,0 +1,29 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -funsafe-math-optimizations -fdump-tree-optimized-raw" } */
+
+int
+cmp_mul_1 (float x)
+{
+  return x * 3 <= 100;
+}
+
+int
+cmp_mul_2 (float x)
+{
+  return x * -5 > 100;
+}
+
+int
+div_cmp_1 (float x, float y)
+{
+  return x / 3 <= y;
+}
+
+int
+div_cmp_2 (float x, float y)
+{
+  return x / 3 <= 1;
+}
+
+/* { dg-final { scan-tree-dump-times "mult_expr" 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-not "rdiv_expr" "optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/div-cmp-2.c b/gcc/testsuite/gcc.dg/div-cmp-2.c
new file mode 100644
index 
..f4ac42a196a804747d0b578e0aa2131671c8d3cf
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/div-cmp-2.c
@@ -0,0 +1,29 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -funsafe-math-optimizations -ffinite-math-only -fdump-tree-optimized-raw" } */
+
+int
+cmp_1 (float x)
+{
+  return 5 / x >= 0;
+}
+
+int
+cmp_2 (float x)
+{
+  return 1 / x <= 0;
+}
+
+int
+cmp_3 (float x)
+{
+  return -2 / x >= 0;
+}
+
+int
+cmp_4 (float x)
+{
+  return -5 / x <= 0;
+}
+
+/* { dg-final { scan-tree-dump-not "rdiv_expr" "optimized" } } */
+

Re: PING [PATCH] use MAX_OFILE_ALIGNMENT to validate attribute aligned (PR 87795)

2018-11-09 Thread Martin Sebor

On 11/07/2018 03:43 PM, Jeff Law wrote:

On 11/6/18 5:06 PM, Martin Sebor wrote:

Ping: https://gcc.gnu.org/ml/gcc-patches/2018-10/msg02081.html

I thought I'd already ACK's this one...


OK.


You did but I made changes afterwards in response to Joseph's
comment.  I've committed the final patch in r265977.

Thanks
Martin


[PATCH, GCC, AArch64] Branch Dilution Pass

2018-11-09 Thread Sudakshina Das
Hi

I am posting this patch on behalf of Carey (cc'ed). I also have some
review comments that I will make as a reply to this later.


This implements a new AArch64 specific back-end pass that helps optimize
branch-dense code, which can be a bottleneck for performance on some Arm
cores. This is achieved by padding out the branch-dense sections of the
instruction stream with nops.

This has proven to show up to a ~2.61% improvement on the Cortex A-72
(SPEC CPU 2006: sjeng).

The implementation includes the addition of a new RTX instruction class
FILLER_INSN, which has been white listed to allow placement of NOPs
outside of a basic block. This is to allow padding after unconditional
branches. This is favorable so that any performance gained from
diluting branches is not paid straight back via excessive eating of nops.

It was deemed that a new RTX class was less invasive than modifying
behavior in regards to standard UNSPEC nops.

## Command Line Options

Three new target-specific options are provided:
- mbranch-dilution
- mbranch-dilution-granularity={num}
- mbranch-dilution-max-branches={num}

A number of cores known to be able to benefit from this pass have been
given default tuning values for their granularity and max-branches.
Each affected core has a very specific granule size and associated
max-branch limit. This is a microarchitecture specific optimization. 
Typical usage should be -mbranch-dilution with a specified -mcpu. Cores 
with a granularity tuned to 0 will be ignored. Options are provided for 
experimentation.

## Algorithm and Heuristic

The pass takes a very simple 'sliding window' approach to the problem. 
We crawl through each instruction (starting at the first branch) and 
keep track of the number of branches within the current "granule" (or 
window). When this exceeds the max-branch value, the pass will dilute 
the current granule, inserting nops to push out some of the branches. 
The heuristic will favour unconditional branches (for performance 
reasons), or branches that are between two other branches (in order to 
decrease the likelihood of another dilution call being needed).
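The sliding-window idea can be sketched abstractly in Python — all names, the granule/branch constants, and the padding policy below are simplified stand-ins (the real pass works on RTL and applies the branch-preference heuristic described above):

```python
GRANULE_SIZE = 8    # instructions per granule (window); illustrative value
MAX_BRANCHES = 2    # tolerated branches per granule; illustrative value

def dilute(insns):
    """Pad a stream of instructions ("b" = branch, anything else =
    non-branch) with "nop"s so that no trailing GRANULE_SIZE-wide
    window ever holds more than MAX_BRANCHES branches."""
    out = []
    for insn in insns:
        out.append(insn)
        # While the current granule is over budget, insert nops just
        # before the newest branch, sliding older branches out of it.
        while out[-GRANULE_SIZE:].count("b") > MAX_BRANCHES:
            out.insert(len(out) - 1, "nop")
    return out

diluted = dilute(["b", "b", "b", "b"])
print(diluted.count("b"), diluted.count("nop"))  # -> 4 6
```

A real implementation would additionally choose *which* branch to push out (favouring unconditional ones) rather than always padding before the newest, but the window bookkeeping is the same.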

Each branch type required a different method for nop insertion due to 
RTL/basic_block restrictions:

- Returning calls do not end a basic block so can be handled by emitting
a generic nop.
- Unconditional branches must be the end of a basic block, and nops 
cannot be outside of a basic block.
   Thus the need for FILLER_INSN, which allows placement outside of a 
basic block - and translates to a nop.
- For most conditional branches we've taken a simple approach and only 
handle the fallthru edge for simplicity,
   which we do by inserting a "nop block" of nops on the fallthru edge, 
mapping that back to the original destination block.
- asm gotos and pcsets are going to be tricky to analyse from a dilution 
perspective so are ignored at present.


## Changelog

gcc/testsuite/ChangeLog:

2018-11-09  Carey Williams  

* gcc.target/aarch64/branch-dilution-off.c: New test.
* gcc.target/aarch64/branch-dilution-on.c: New test.


gcc/ChangeLog:

2018-11-09  Carey Williams  

* cfgbuild.c (inside_basic_block_p): Add FILLER_INSN case.
* cfgrtl.c (rtl_verify_bb_layout): Whitelist FILLER_INSN outside
basic blocks.
* config.gcc (extra_objs): Add aarch64-branch-dilution.o.
* config/aarch64/aarch64-branch-dilution.c: New file.
* config/aarch64/aarch64-passes.def (branch-dilution): Register
pass.
* config/aarch64/aarch64-protos.h (struct tune_params): Declare
tuning parameters bdilution_gsize and bdilution_maxb.
(make_pass_branch_dilution): New declaration.
* config/aarch64/aarch64.c (generic_tunings,cortexa35_tunings,
cortexa53_tunings,cortexa57_tunings,cortexa72_tunings,
cortexa73_tunings,exynosm1_tunings,thunderxt88_tunings,
thunderx_tunings,tsv110_tunings,xgene1_tunings,
qdf24xx_tunings,saphira_tunings,thunderx2t99_tunings):
Provide default tunings for bdilution_gsize and bdilution_maxb.
* config/aarch64/aarch64.md (filler_insn): Define new insn.
* config/aarch64/aarch64.opt (mbranch-dilution,
mbranch-dilution-granularity,
mbranch-dilution-max-branches): Define new branch dilution
options.
* config/aarch64/t-aarch64 (aarch64-branch-dilution.c): New rule
for aarch64-branch-dilution.c.
* coretypes.h (rtx_filler_insn): New rtx class.
* doc/invoke.texi (mbranch-dilution,
mbranch-dilution-granularity,
mbranch-dilution-max-branches): Document branch dilution
options.
* emit-rtl.c (emit_filler_after): New emit function.
* rtl.def (FILLER_INSN): New RTL EXPR of type RTX_INSN.
* rtl.h (class GTY): New class for rtx_filler_insn.
(is_a_helper ::test): New test helper for rtx_filler_insn.
(macro FILLER_INSN_P(X)): New predicate.
* target-insns.def (f

Re: [PATCH v4] S/390: Allow relative addressing of literal pool entries

2018-11-09 Thread Ulrich Weigand
Ilya Leoshkevich wrote:

>gcc_assert (GET_CODE (x) != SYMBOL_REF
> -   || !CONSTANT_POOL_ADDRESS_P (x));
> +   || !CONSTANT_POOL_ADDRESS_P (x)
> +   || s390_symbol_relative_long_p (x));

Hmm, it's a bit weird that this routine now uses a different check
than the other two.  It would look more straightforward for
find_constant_pool_ref to use the same s390_safe_relative_long_p
check as the others.

(This would then also make s390_symbol_relative_long_p redundant.)

If we do that, it might even make sense to pull the 
s390_safe_relative_long_p check up into the callers:

E.g. in s390_mainpool_finish, replace

  /* Replace all literal pool references.  */

  for (rtx_insn *insn = get_insns (); insn; insn = NEXT_INSN (insn))
{
  if (NONJUMP_INSN_P (insn) || CALL_P (insn))
{

with

  if ((NONJUMP_INSN_P (insn) || CALL_P (insn))
  && !s390_safe_relative_long_p (insn))

(This last change is just a suggestion, only if it makes the
overall code simpler and more readable.)

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



arm - minor cleanups for CPU alias support

2018-11-09 Thread Richard Earnshaw (lists)
A couple of very minor issues with the new support for CPU
aliases.

* config/arm/parsecpu.awk (/alias/): Tighten invisible alias
matching criteria.  Remove unused array initializer.

Committed.
diff --git a/gcc/config/arm/parsecpu.awk b/gcc/config/arm/parsecpu.awk
index ba2dee5fdcb..87411847ec6 100644
--- a/gcc/config/arm/parsecpu.awk
+++ b/gcc/config/arm/parsecpu.awk
@@ -684,7 +684,7 @@ BEGIN {
 for (n = 2; n <= alias_count; n++) {
 	visible = "true"
 	alias = $n
-	if (alias ~ /!.*/) {
+	if (alias ~ /^!.*/) {
 	visible = "false"
 	gsub(/^!/, "", alias)
 	}
@@ -700,7 +700,6 @@ BEGIN {
 	}
 	cpu_all_aliases[alias] = cpu_name
 }
-cpu_has_alias[cpu_name] = 1
 parse_ok = 1
 }
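The effect of the anchoring fix can be illustrated with an equivalent check in Python (awk's `~` is an unanchored match, like `re.search`; the alias string "neon!fp" is contrived purely to show the difference):

```python
import re

def invisible_old(alias):
    # old test: /!.*/ matches a '!' anywhere in the alias
    return re.search(r"!.*", alias) is not None

def invisible_new(alias):
    # new test: /^!.*/ matches only a leading '!'
    return re.search(r"^!.*", alias) is not None

print(invisible_old("neon!fp"), invisible_new("neon!fp"))  # -> True False
print(invisible_old("!neon"), invisible_new("!neon"))      # -> True True
```

With the old pattern an alias containing '!' anywhere would be marked invisible even though `gsub(/^!/, ...)` would never strip the marker; anchoring makes the test and the strip agree.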
 


Re: [PATCH][AArch64] PR79262: Adjust vector cost

2018-11-09 Thread Wilco Dijkstra
Hi James,

> We have 7 unique target tuning structures in the AArch64 backend, of which
> only one has a 2x ratio between scalar_int_cost and vec_to_scalar_cost. Other
> ratios are 1, 3, 8, 3, 4, 6.

I wouldn't read too much in the exact value here - the costs are simply 
relative to
other values for the same tuning, ie. cores that use 4 or 6 also tend to use 
larger
values for the other entries.

> What makes this choice correct? What makes it more correct than what we
> have now? On which of the 28 entries in config/aarch64/aarch64-cores.def does
> performance improve? 

It's correct since it's the minimum value that stops vectorization for this 
particular
bug without increasing costs too much and accidentally blocking useful loops 
from
being vectorized (as confirmed by benchmarking). Given that all other vector 
cost
tunings already block vectorization for this case, this is clearly what is 
required
for best performance. So this improves performance for all 28 entries.

> Please validate the performance effect of this patch, which changes default
> code generation for everyone, on more than one testcase in a bug report.

I did validate the performance like I mentioned in the patch. Since it has been 
a
while, I can easily rerun the benchmarks using latest trunk to verify it's 
still a gain.

> Are the Spec benchmarks sufficiently representative to change the generic
> vectorisation costs?

SPEC is sufficiently large and complex to show there is a gain without 
regressions.
Do you have reason to believe other benchmarks might not see the same gain?

Wilco

Re: [PATCH] detect attribute mismatches in alias declarations (PR 81824)

2018-11-09 Thread Martin Sebor

+/* Handle the "copy" attribute by copying the set of attributes
+   from the symbol referenced by ARGS to the declaration of *NODE.  */
+
+static tree
+handle_copy_attribute (tree *node, tree name, tree args,
+  int flags, bool *no_add_attrs)
+{
+  /* Break cycles in circular references.  */
+  static hash_set attr_copy_visited;

Does this really need to be static?


The variable was intended to break cycles in recursive calls to
the function for self-referential applications of attribute copy
but since the attribute itself is not applied (anymore) such cycles
can no longer form.  I have removed the variable and simplified
the handlers (there are tests to verify this works correctly).


diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index cfe6a8e..8ffb0cd 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 5c95f67..c027acd 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi

[ ... ]


+
+In C++, the warning is issued when an explicitcspecialization of a primary

"explicitcspecialization" ? :-)



Fixed.



Looks pretty good.  There's the explicit specialization nit and the
static vs auto question for attr_copy_visited.  Otherwise it's OK.


Thanks.  I've retested a revision with the changes discussed here
and committed it as r265980.

Martin


Re: [PATCH][AArch64] PR79262: Adjust vector cost

2018-11-09 Thread Wilco Dijkstra
Hi James,

>On Mon, Jan 22, 2018 at 09:22:27AM -0600, Richard Biener wrote:
>> It would be better to dissect this cost into vec_to_scalar and vec_extract 
>> where
>> vec_to_scalar really means getting at the scalar value of a vector of
>> uniform values
>> which most targets can do without any instruction (just use a subreg).
>> 
>> I suppose we could also make vec_to_scalar equal to vector extraction and 
>> remove
>> the uses for the other case (reduction vector result to scalar reg).
>
> I have dug up Richard's comments from last year, which you appear to have
> ignored and made no reference to when resubmitting the patch.
>
> Please don't do that. Carefully consider Richard's review feedback before
> resubmitting this patch.

I seem to have missed this comment... However I can't see how it relates to my
patch, particularly since Richard claimed in PR79262 that this PR is a backend
issue. Sure it *would* be great to fix the vector cost infrastructure, but 
that's a lot
of work, and wasn't my goal here, nor is it required to fix PR79262. The 
existing
costs allow the issue to be solved, just like the other targets/tunings have 
done
already, so that's clearly the best approach for this PR.

Wilco

Re: [PATCH][RFC] Come up with -flive-patching master option.

2018-11-09 Thread Qing Zhao
Hi, Martin,

thanks a lot for the previous two new options for live-patching. 


I have two more questions below:

1. do we still need new options to disable the following:
   A. unreachable code/variable removal? 
   B. Visibility changes with -flto and/or -fwhole-program?

2. for this new patch, could you please explain a little bit more on the 
problem?

Thanks.

Qing

> On Nov 9, 2018, at 9:33 AM, Martin Liška  wrote:
> 
> Hi.
> 
> After I added 2 new options, I would like to include a new master option.
> It's minimal version which only disables optimizations that we are aware of
> and can potentially cause problems for live-patching.
> 
> Martin
> <0001-Come-up-with-fvectorized-functions.patch>



Re: [PATCH] Add value_range_base (w/o equivs)

2018-11-09 Thread Aldy Hernandez

On 11/9/18 9:19 AM, Richard Biener wrote:


This adds value_range_base, a base class of class value_range
with all members but the m_equiv one.


First of all, thanks so much for doing this!



I have looked into the sole GC user, IPA propagation, and replaced
the value_range use there with value_range_base since it also
asserted the equiv member is always NULL.

This in turn means I have written down that GC users only can
use value_range_base (and fixed the accessability issue with
adding a bunch of friends).



+
 /* Range of values that can be associated with an SSA_NAME after VRP
-   has executed.  */
-class GTY((for_user)) value_range
+   has executed.  Note you may only use value_range_base with GC memory.  */
+class GTY((for_user)) value_range_base
+{


GC users cannot use the derived value_range?  Either way could you 
document the "why" this is the case above?


And thanks for adding those accessibility friends, they're a pain to get 
right.  It's a pity we have to go to such lengths to deal with this shit.




I have moved all methods that are required for this single user
sofar (without looking with others are trivially movable because
they do not look at equivs - that's for a followup).  I ended


That would be great.


up basically duplicating the ::union_ wrapper around union_ranges
(boo).  Some more refactoring might save us a few lines here.

There are two places where IPA prop calls extract_range_from_unary_expr.
I've temporarily made it use value_range temporaries there given
I need to think about the few cases of ->deep_copy () we have in
there.  IMHO equiv handling would need to move to the callers or
rather "copies" shouldn't be handled by extract_range_from_unary_expr.
Well, I need to think about it.  (no, I don't want to make that
function virtual)

The other workers might be even easier to massage to work on
value_range_base only.


If value_range_base is the predominant idiom, perhaps it should be 
called value_range and the derived class be called 
value_range_with_equiv or some such?




Bootstrap and regtest running on x86_64-unknown-linux-gnu.

If that goes well I want to apply this even in its incomplete form.
Unless there are any complaints?


Fine by me.  Again, thanks.
Aldy


[patch, libgfortran] Patch committed as obvious

2018-11-09 Thread Jerry DeLisle
Committed as obvious. I had inserted this line to check effects of the 
-fdec flags and forgot to delete it before my previous commit.


Author: jvdelisle
Date: Fri Nov  9 17:29:33 2018
New Revision: 265979

URL: https://gcc.gnu.org/viewcvs?rev=265979&root=gcc&view=rev
Log:
2018-11-09  Jerry DeLisle  

PR libfortran/78351
* io/transfer.c (read_sf_internal): Delete leftover
debug code.

Modified:
trunk/libgfortran/ChangeLog
trunk/libgfortran/io/transfer.c


Re: [PATCH v4] S/390: Allow relative addressing of literal pool entries

2018-11-09 Thread Ilya Leoshkevich
> Am 09.11.2018 um 18:30 schrieb Ulrich Weigand :
> 
> Ilya Leoshkevich wrote:
> 
>>   gcc_assert (GET_CODE (x) != SYMBOL_REF
>> -  || !CONSTANT_POOL_ADDRESS_P (x));
>> +  || !CONSTANT_POOL_ADDRESS_P (x)
>> +  || s390_symbol_relative_long_p (x));
> 
> Hmm, it's a bit weird that this routine now uses a different check
> than the other two.  It would look more straightforward for
> find_constant_pool_ref to use the same s390_safe_relative_long_p
> check as the others.
> 
> (This would then also make s390_symbol_relative_long_p redundant.)

Ok, sounds good.

> If we do that, it might even make sense to pull the 
> s390_safe_relative_long_p check up into the callers:
> 
> E.g. in s390_mainpool_finish, replace
> 
>  /* Replace all literal pool references.  */
> 
>  for (rtx_insn *insn = get_insns (); insn; insn = NEXT_INSN (insn))
>{
>  if (NONJUMP_INSN_P (insn) || CALL_P (insn))
>{
> 
> with
> 
>  if ((NONJUMP_INSN_P (insn) || CALL_P (insn))
>  && !s390_safe_relative_long_p (insn))
> 
> (This last change is just a suggestion, only if it makes the
> overall code simpler and more readable.)

I would have to do that in 8 places (4 for find/replace and 4 for 
annotate), so it’s probably better to keep the current 3 checks.


Re: [PATCH][cunroll] Add unroll-known-loop-iterations-only param and use it in aarch64

2018-11-09 Thread Kyrill Tkachov

On 09/11/18 12:18, Richard Biener wrote:

On Fri, Nov 9, 2018 at 11:47 AM Kyrill Tkachov
 wrote:


Hi all,

In this testcase the codegen for VLA SVE is worse than it could be due to 
unrolling:

fully_peel_me:
 mov x1, 5
 ptrue   p1.d, all
 whilelo p0.d, xzr, x1
 ld1dz0.d, p0/z, [x0]
 faddz0.d, z0.d, z0.d
 st1dz0.d, p0, [x0]
 cntdx2
 addvl   x3, x0, #1
 whilelo p0.d, x2, x1
 beq .L1
 ld1dz0.d, p0/z, [x0, #1, mul vl]
 faddz0.d, z0.d, z0.d
 st1dz0.d, p0, [x3]
 cntwx2
 incbx0, all, mul #2
 whilelo p0.d, x2, x1
 beq .L1
 ld1dz0.d, p0/z, [x0]
 faddz0.d, z0.d, z0.d
 st1dz0.d, p0, [x0]
.L1:
 ret

In this case, due to the vector-length-agnostic nature of SVE the compiler 
doesn't know the loop iteration count.
For such loops we don't want to unroll if we don't end up eliminating branches 
as this just bloats code size
and hurts icache performance.

This patch introduces a new unroll-known-loop-iterations-only param that 
disables cunroll when the loop iteration
count is unknown (SCEV_NOT_KNOWN). This case occurs much more often for SVE VLA 
code, but it does help some
Advanced SIMD cases as well where loops with an unknown iteration count are not 
unrolled when it doesn't eliminate
the branches.

So for the above testcase we generate now:
fully_peel_me:
 mov x2, 5
 mov x3, x2
 mov x1, 0
 whilelo p0.d, xzr, x2
 ptrue   p1.d, all
.L2:
 ld1dz0.d, p0/z, [x0, x1, lsl 3]
 faddz0.d, z0.d, z0.d
 st1dz0.d, p0, [x0, x1, lsl 3]
 incdx1
 whilelo p0.d, x1, x3
 bne .L2
 ret

Not perfect still, but it's preferable to the original code.
The new param is enabled by default on aarch64 but disabled for other targets, 
leaving their behaviour unchanged
(until other target people experiment with it and set it, if appropriate).

Bootstrapped and tested on aarch64-none-linux-gnu.
Benchmarked on SPEC2017 on a Cortex-A57 and there are no differences in 
performance.

Ok for trunk?


Hum.  Why introduce a new --param and not simply key on
flag_peel_loops instead?  That is
enabled by default at -O3 and with FDO but you of course can control
that in your targets
post-option-processing hook.


You mean like this?
It's certainly a simpler patch, but I was just a bit hesitant to make this 
change for all targets :)
But I suppose it's a reasonable change.



It might also make sense to have more fine-grained control for this
and allow a target
to say whether it wants to peel a specific loop or not when the
middle-end thinks that
would be profitable.


Can be worth looking at as a follow-up. Do you envisage the target analysing
the gimple statements of the loop to figure out its cost?

Thanks,
Kyrill


2018-11-09  Kyrylo Tkachov  

* tree-ssa-loop-ivcanon.c (try_unroll_loop_completely): Do not unroll
loop when number of iterations is not known and flag_peel_loops is in
effect.

2018-11-09  Kyrylo Tkachov  

* gcc.target/aarch64/sve/unroll-1.c: New test.

diff --git a/gcc/testsuite/gcc.target/aarch64/sve/unroll-1.c b/gcc/testsuite/gcc.target/aarch64/sve/unroll-1.c
new file mode 100644
index ..7f53d20cbf8e18a4389b86c037f56f024bac22a5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/unroll-1.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-O3" } */
+
+/* Check that simple loop is not fully unrolled.  */
+
+void
+fully_peel_me (double *x)
+{
+  for (int i = 0; i < 5; i++)
+x[i] = x[i] * 2;
+}
+
+/* { dg-final { scan-assembler-times {\tld1d\tz[0-9]+\.d, p[0-7]/z, \[.+]\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tst1d\tz[0-9]+\.d, p[0-7], \[.+\]\n} 1 } } */
diff --git a/gcc/tree-ssa-loop-ivcanon.c b/gcc/tree-ssa-loop-ivcanon.c
index c2953059fb9218d4bc4cf12fe9277a552b4a04bd..daeddb384254775d6482ed5580e8262e4b26a87f 100644
--- a/gcc/tree-ssa-loop-ivcanon.c
+++ b/gcc/tree-ssa-loop-ivcanon.c
@@ -883,6 +883,16 @@ try_unroll_loop_completely (struct loop *loop,
 			 loop->num);
 	  return false;
 	}
+	  else if (TREE_CODE (niter) == SCEV_NOT_KNOWN
+		   && flag_peel_loops)
+	{
+	  if (dump_enabled_p ())
+		dump_printf (MSG_NOTE, "Not unrolling loop %d: "
+			 "exact number of iterations not known "
+			 "(-fpeel-loops).\n",
+			 loop->num);
+	  return false;
+	}
 	}
 
   initialize_original_copy_tables ();


Re: Small typo in iconv.m4

2018-11-09 Thread Eric Gallager
On 11/7/18, Jeff Law  wrote:
> On 11/6/18 9:37 AM, Hafiz Abid Qadeer wrote:
>> Hi All,
>> I was investigating a character set related problem with windows hosted
>> GDB and I tracked it down to a typo in iconv.m4. This typo caused
>> libiconv detection to fail and related support was not built into gdb.
>>
>> The problem is with the following line.
>> CPPFLAGS="$LIBS $INCICONV"
>> which should have been
>> CPPFLAGS="$CPPFLAGS $INCICONV"
>>
>> OK to commit the attached patch?
>>
>> 2018-11-06  Hafiz Abid Qadeer  
>>
>>  * config/iconv.m4 (AM_ICONV_LINK): Don't overwrite CPPFLAGS.
>>  Append $INCICONV to it.
>>  * gcc/configure: Regenerate.
>>  * libcpp/configure: Likewise.
>>  * libstdc++-v3/configure: Likewise.
>>  * intl/configure: Likewise.
>>
>> Thanks,
>>
> THanks.  I wasn't sure if you had commit privs, so I went ahead and
> installed the patch.
>
> Jeff
>

Does this have any effect on GCC bug 78251?
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78251
Or any of the related bugs under "See Also" for that matter?


Re: [patch] allow target config to state r18 is fixed on aarch64

2018-11-09 Thread Olivier Hainque
Hello Wilco,

Would you have further thoughts on the patches proposed in 

 https://gcc.gnu.org/ml/gcc-patches/2018-10/msg01453.html

?

There was:

1)  * config/aarch64/aarch64.c (PROBE_STACK_FIRST_REG) : Redefine as
R9_REGNUM instead of 9.
(PROBE_STACK_SECOND_REG): Redefine as R10_REGNUM instead of 10.


2)  * config/aarch64/aarch64.h (FIXED_R18): New internal
configuration macro, defaulted to 0.
(FIXED_REGISTERS): Use it.
(STATIC_CHAIN_REGNUM): Use r11 instead of r18.

and

3)  * config/aarch64/aarch64.c (aarch64_override_options): Once
arch, cpu and tune were validated, insert SUBTARGET_OVERRIDE_OPTIONS
   if defined.



Thanks a lot in advance!

With Kind Regards,

Olivier


> On 23 Oct 2018, at 19:13, Olivier Hainque  wrote:
> 
> Hi Wilco,
> 
>> On 18 Oct 2018, at 19:08, Wilco Dijkstra  wrote:
> 
>>> I wondered if we could set it to R11 unconditionally and picked
>>> the way ensuring no change for !vxworks ports, especially since I
>>> don't have means to test more than what I described above.
>> 
>> Yes it should always be the same register, there is no gain in switching
>> it dynamically. I'd suggest to use X9 since X8 is the last register used for
>> arguments (STATIC_CHAIN_REGNUM is passed when calling a nested
>> function) and some of the higher registers may be used as temporaries in
>> prolog/epilog.
> 
> Thanks for your feedback!  I ported the patches
> to gcc-8 and was able to get a functional toolchain
> for aarch64-wrs-vxworks7 and aarch64-elf, passing
> full Acats for a couple of runtime variants on VxWorks
> (compiled for RTP or kernel mode) as well as a small
> internal testsuite we have, dedicated to cross configurations.
> 
> All the patches apply directly on mainline.
> 
> As for the original patch, I also sanity checked that
> "make all-gcc" passes (self tests included) there for
> --target=aarch64-elf --enable-languages=c
> 
> 
> There are three changes to the common aarch64 port files.
> 
> It turns out that X9 doesn't work for STATIC_CHAIN_REGNUM
> because this conflicts with the registers used for -fstack-check:
> 
>  /* The pair of scratch registers used for stack probing.  */
>  #define PROBE_STACK_FIRST_REG  9
>  #define PROBE_STACK_SECOND_REG 10
> 
> I didn't find that immediately (read, I first experienced a few
> badly crashing test runs) because I searched for R9_REGNUM to check
> for other uses, so the first patchlet attached simply adjusts
> the two #defines above to use R9/R10_REGNUM.
> 
> 
> 2018-10-23  Olivier Hainque  
> 
>* config/aarch64/aarch64.c (PROBE_STACK_FIRST_REG) : Redefine as
>R9_REGNUM instead of 9.
>(PROBE_STACK_SECOND_REG): Redefine as R10_REGNUM instead of 10.
> 
> 
> The second patch is the one which I proposed a few days ago
> to allow a subtarget (in my case, the VxWorks port) to state that
> R18 is to be considered fixed. Two changes compared to the original
> patch: a comment attached to the default definition of FIXED_R18,
> and the unconditional use of R11_REGNUM as an alternate STATIC_CHAIN_REGNUM.
> 
> I suppose the latter could require extra testing than what I was
> able to put in (since this is also changing for !vxworks configurations),
> which Sam very kindly did on the first instance.
> 
> I didn't touch CALL_SAVED_REGISTERS since this is 1 for r18 already.
> I also didn't see a strong reason to move to a more dynamic scheme,
> through conditional_register_usage.
> 
> 2018-03-18  Olivier Hainque  
> 
>   * config/aarch64/aarch64.h (FIXED_R18): New internal
>   configuration macro, defaulted to 0.
>   (FIXED_REGISTERS): Use it.
>   (STATIC_CHAIN_REGNUM): Use r11 instead of r18.
> 
> 
> The third patch proposes the introduction of support for a
> conditional SUBTARGET_OVERRIDE_OPTIONS macro, as many other
> architectures have, and which is needed by all VxWorks ports.
> 
> In the current state, this one could possibly impact only
> VxWorks, as no other config file would define the macro.
> 
> I'm not 100% clear on the possible existence of rules regarding
> the placement of this within the override_options functions. We used
> something similar to what other ports do, and it worked just fine
> for VxWorks.
> 
> 2018-10-23  Olivier Hainque  
> 
>* config/aarch64/aarch64.c (aarch64_override_options): Once
>arch, cpu and tune were validated, insert SUBTARGET_OVERRIDE_OPTIONS
>if defined.
> 
> I'm happy to adjust any of all this if needed of course.
> 
> Thanks in advance for your feedback!
> 
> With Kind Regards,
> 
> Olivier
> 
> 
> <0001-Use-R9-R10_REGNUM-to-designate-stack-checking-regist.patch.txt><0002-Introduce-FIXED_R18-in-aarch64.h.patch.txt><0003-Support-for-SUBTARGET_OVERRIDE_OPTIONS-in-aarch64.c.patch.txt>



Re: [PATCH 2/3] Add a pass to automatically add ptwrite instrumentation

2018-11-09 Thread Andi Kleen
Hi Richard,

On Fri, Nov 09, 2018 at 04:27:22PM +0100, Richard Biener wrote:
> > Passes bootstrap and test suite on x86_64-linux, also
> > bootstrapped and tested gcc itself with full -fvartrace
> > and -fvartrace-locals instrumentation.
> 
> So how is this supposed to be used?  I guess in a
> edit-debug cycle and not for production code?

It can actually be used for production code.

When processor trace is disabled, the PTWRITE
instructions act as nops, so the only cost is a
slightly larger code footprint. Since the instrumentation
only logs values which are already computed,
it normally doesn't cause any additional code generation.

Even when tracing is enabled, the primary overhead is the
additional memory bandwidth, since the CPU can
do the logging in parallel to other code. As long
as the instrumentation is not so excessive that it generates
too much memory bandwidth, it can actually be
quite reasonable to keep the logging on even for
production code, and use it as a "flight recorder"
which is dumped on failures.

This would also be the model in gdb, once we have support
in it. You would run the program in the debugger
and it just logs the data to a memory buffer,
but when stopping the value history can be examined.

There's also some ongoing work to add (optional) support
for PT to Linux crash dumps, so eventually that will
work without having to always run the debugger.

Today it can be done by running perf in the background
to record the PT, however there the setup is a bit
more complicated.

The primary use case I was envisioning was to set
the attribute on some critical functions/structures/types
of interest and then have a very low-overhead logging
option for them (generally cheaper than
equivalent software instrumentation). They then
automatically get logged without the programmer
needing to add lots of instrumentation code to
catch every instance. So think of it as a
"hardware accelerated printf".

> 
> What do you actually write with PTWRITE?  I suppose
> you need to keep a ID to something mapping somewhere
> so you can make sense of the perf records?

PTWRITE writes 32-bit/64-bit values. The CPU reports the
IP of PTWRITE in the log, either explicitly, or implicitly if branch
trace is enabled too. The IP can then be used to look up
the DWARF scope for that IP. Then the decoder
decodes the operand of PTWRITE and maps it back using
the DWARF information. So it all works using
existing debugger infrastructure, and a quite simple
instruction decoder.

I'll clarify that in the description.

> 
> Few comments inline below, but I'm not sure if this
> whole thing is interesting for GCC (as opposed to being
> a instrumentation plugin)

I'm biased, but I think automatic data tracing is a very exciting
use case, so hopefully it can be considered for mainstream gcc.

> >   
> > handle_no_profile_instrument_function_attribute,
> >   NULL },
> > @@ -767,6 +775,21 @@ handle_no_sanitize_undefined_attribute (tree *node, 
> > tree name, tree, int,
> >return NULL_TREE;
> >  }
> >
> > +/* Handle "vartrace"/"no_vartrace" attributes; arguments as in
> > +   struct attribute_spec.handler.  */
> > +
> > +static tree
> > +handle_vartrace_attribute (tree *node, tree, tree, int flags,
> > +  bool *)
> > +{
> > +  if (TYPE_P (*node) && !(flags & (int) ATTR_FLAG_TYPE_IN_PLACE))
> > +*node = build_variant_type_copy (*node);
> 
> I don't think you want the attribute on types.  As far as I understood your
> descriptions it should only be on variables and functions.

The idea was that it's possible to trace all instances of a type,
especially structure members. Otherwise it will be harder for
the programmer to hunt down every instance.

For example if I have a structure that is used over a program,
and one member gets the wrong value.

I can do then in the header:

struct foo {
    int member __attribute__((vartrace));
};

and then recompile the program. Every instance of writing to
member will then be automatically instrumented (assuming
the program stays type safe)

Makes sense?

[BTW I considered adding an address trace
too for pointer writes, to hunt down the non-type-safe
instances and possibly some other use cases.
That might be possible follow-on work.]

> > +
> >  #undef TARGET_GIMPLIFY_VA_ARG_EXPR
> >  #define TARGET_GIMPLIFY_VA_ARG_EXPR ix86_gimplify_va_arg
> >
> > diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
> > index 1eca009e255..08286aa4591 100644
> > --- a/gcc/doc/extend.texi
> > +++ b/gcc/doc/extend.texi
> > @@ -3193,6 +3193,13 @@ the standard C library can be guaranteed not to 
> > throw an exception
> >  with the notable exceptions of @code{qsort} and @code{bsearch} that
> >  take function pointer arguments.
> >
> > +@item no_vartrace
> > +@cindex @code{no_vartrace} function or variable attribute
> > +Disable data tracing for the function or variable or structured field
> > +marked with this attribute. Applies to types. Currentl

Re: [PATCH, GCC, ARM] Enable armv8.5-a and add +sb and +predres for previous ARMv8-a in ARM

2018-11-09 Thread Kyrill Tkachov

Hi Sudi,

On 09/11/18 15:33, Sudakshina Das wrote:

Hi

This patch adds -march=armv8.5-a to the Arm backend.
(https://developer.arm.com/products/architecture/cpu-architecture/a-profile/exploration-tools)
Armv8.5-A also adds two new security features:
- Speculation Barrier instruction
- Execution and Data Prediction Restriction Instructions
These are made optional to all older Armv8-A versions. Thus we are
adding two new options "+sb" and "+predres" to all older Armv8-A. These
are passed on to the assembler and have no code generation effects and
have already gone in the trunk of binutils.

Bootstrapped and regression tested with arm-none-linux-gnueabihf.

Is this ok for trunk?
Sudi

*** gcc/ChangeLog ***

2018-xx-xx  Sudakshina Das  

* config/arm/arm-cpus.in (armv8_5, sb, predres): New features.
(ARMv8_5a): New fgroup.
(armv8.5-a): New arch.
(armv8-a, armv8.1-a, armv8.2-a, armv8.3-a, armv8.4-a): New
options sb and predres.
* config/arm/arm-tables.opt: Regenerate.
* config/arm/t-aprofile: Add matching rules for -march=armv8.5-a
* config/arm/t-arm-elf (all_v8_archs): Add armv8.5-a.
* config/arm/t-multilib (v8_5_a_simd_variants): New variable.
Add matching rules for -march=armv8.5-a and extensions.
* doc/invoke.texi (ARM options): Document -march=armv8.5-a.
Add sb and predres to all armv8-a except armv8.5-a.

*** gcc/testsuite/ChangeLog ***

2018-xx-xx  Sudakshina Das  

* gcc.target/arm/multilib.exp: Add some -march=armv8.5-a
combination tests.



This is ok modulo a typo fix below.

Thanks,
Kyrill



index 
25788ad09851daf41038b1578307bf23b7f34a94..eba038f9d20bc54bef7bdb7fa1c0e7028d954ed7
 100644
--- a/gcc/config/arm/t-multilib
+++ b/gcc/config/arm/t-multilib
@@ -70,7 +70,8 @@ v8_a_simd_variants:= $(call all_feat_combs, simd crypto)
 v8_1_a_simd_variants   := $(call all_feat_combs, simd crypto)
 v8_2_a_simd_variants   := $(call all_feat_combs, simd fp16 fp16fml crypto 
dotprod)
 v8_4_a_simd_variants   := $(call all_feat_combs, simd fp16 crypto)
-v8_r_nosimd_variants   := +crc
+v8_5_a_simd_variants   := $(call all_feat_combs, simd fp16 crypto)
+v8_r_nosimd_variants   := +cr5
 


Typo, should be +crc





Re: [PATCH v2] [aarch64] Correct the maximum shift amount for shifted operands.

2018-11-09 Thread Kyrill Tkachov

Hi Christoph,


On 09/11/18 11:38, christoph.muell...@theobroma-systems.com wrote:

From: Christoph Muellner 

The aarch64 ISA specification allows a left shift amount to be applied
after extension in the range of 0 to 4 (encoded in the imm3 field).

This is true for at least the following instructions:

  * ADD (extend register)
  * ADDS (extended register)
  * SUB (extended register)

The result of this patch can be seen, when compiling the following code:

uint64_t myadd(uint64_t a, uint64_t b)
{
   return a+(((uint8_t)b)<<4);
}

Without the patch the following sequence will be generated:

0000000000000000 <myadd>:
   0:	d37c1c21 	ubfiz	x1, x1, #4, #8
   4:	8b000020 	add	x0, x1, x0
   8:	d65f03c0 	ret

With the patch the ubfiz will be merged into the add instruction:

0000000000000000 <myadd>:
   0:	8b211000 	add	x0, x0, w1, uxtb #4
   4:	d65f03c0 	ret

Tested with "make check" and no regressions found.

*** gcc/ChangeLog ***

2018-xx-xx  Christoph Muellner  
Philipp Tomsich  

* config/aarch64/aarch64.c (aarch64_uxt_size): Correct the maximum
shift amount for shifted operands.

*** gcc/testsuite/ChangeLog ***

2018-xx-xx  Christoph Muellner  
Philipp Tomsich  

* gcc.target/aarch64/extend.c: Adjust the testcases to cover
the changed shift amount.


Thanks, this looks good to me.
You'll still need a maintainer to approve.

Kyrill



---
  gcc/config/aarch64/aarch64.c  |  2 +-
  gcc/testsuite/gcc.target/aarch64/extend.c | 16 
  2 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index c82c7b6..c85988a 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -8190,7 +8190,7 @@ aarch64_output_casesi (rtx *operands)
  int
  aarch64_uxt_size (int shift, HOST_WIDE_INT mask)
  {
-  if (shift >= 0 && shift <= 3)
+  if (shift >= 0 && shift <= 4)
  {
int size;
for (size = 8; size <= 32; size *= 2)
diff --git a/gcc/testsuite/gcc.target/aarch64/extend.c 
b/gcc/testsuite/gcc.target/aarch64/extend.c
index f399e55..7986c5b 100644
--- a/gcc/testsuite/gcc.target/aarch64/extend.c
+++ b/gcc/testsuite/gcc.target/aarch64/extend.c
@@ -32,8 +32,8 @@ ldr_sxtw0 (char *arr, int i)
  unsigned long long
  adddi_uxtw (unsigned long long a, unsigned int i)
  {
-  /* { dg-final { scan-assembler "add\tx\[0-9\]+,.*uxtw #?3" } } */
-  return a + ((unsigned long long)i << 3);
+  /* { dg-final { scan-assembler "add\tx\[0-9\]+,.*uxtw #?4" } } */
+  return a + ((unsigned long long)i << 4);
  }
  
  unsigned long long

@@ -46,8 +46,8 @@ adddi_uxtw0 (unsigned long long a, unsigned int i)
  long long
  adddi_sxtw (long long a, int i)
  {
-  /* { dg-final { scan-assembler "add\tx\[0-9\]+,.*sxtw #?3" } } */
-  return a + ((long long)i << 3);
+  /* { dg-final { scan-assembler "add\tx\[0-9\]+,.*sxtw #?4" } } */
+  return a + ((long long)i << 4);
  }
  
  long long

@@ -60,8 +60,8 @@ adddi_sxtw0 (long long a, int i)
  unsigned long long
  subdi_uxtw (unsigned long long a, unsigned int i)
  {
-  /* { dg-final { scan-assembler "sub\tx\[0-9\]+,.*uxtw #?3" } } */
-  return a - ((unsigned long long)i << 3);
+  /* { dg-final { scan-assembler "sub\tx\[0-9\]+,.*uxtw #?4" } } */
+  return a - ((unsigned long long)i << 4);
  }
  
  unsigned long long

@@ -74,8 +74,8 @@ subdi_uxtw0 (unsigned long long a, unsigned int i)
  long long
  subdi_sxtw (long long a, int i)
  {
-  /* { dg-final { scan-assembler "sub\tx\[0-9\]+,.*sxtw #?3" } } */
-  return a - ((long long)i << 3);
+  /* { dg-final { scan-assembler "sub\tx\[0-9\]+,.*sxtw #?4" } } */
+  return a - ((long long)i << 4);
  }
  
  long long




Re: [PATCH] Fix up affinity-fmt.c (was Re: [committed 0/4] (Partial) OpenMP 5.0 support for GCC 9)

2018-11-09 Thread Iain Sandoe
Hi Jakub,

> On 9 Nov 2018, at 07:40, Jakub Jelinek  wrote:
> 
> On Fri, Nov 09, 2018 at 07:33:44AM -0800, Iain Sandoe wrote:
>>> On 9 Nov 2018, at 06:37, Jakub Jelinek  wrote:
>> 
>>> +#if defined (HAVE_INTTYPES_H) && defined (PRIx64)
>>> +   else if (sizeof (gomp_integral (handle)) == sizeof (uint64_t))
>>> + sprintf (buf, "0x" PRIx64, (uint64_t) gomp_integral (handle));
>> 
>> s/0x/0x%/?
> 
> Oops, consider it adjusted.  Haven't started my bootstrap/regtest of it
> yet…

Works For Me (as amended, bootstrap succeeds):

display-affinity-1.c:
libgomp: Affinity not supported on this configuration
L:0%0>xx.local   xx.local   

Re: Small typo in iconv.m4

2018-11-09 Thread Jeff Law
On 11/9/18 10:57 AM, Eric Gallager wrote:
> On 11/7/18, Jeff Law  wrote:
>> On 11/6/18 9:37 AM, Hafiz Abid Qadeer wrote:
>>> Hi All,
>>> I was investigating a character set related problem with windows hosted
>>> GDB and I tracked it down to a typo in iconv.m4. This typo caused
>>> libiconv detection to fail and related support was not built into gdb.
>>>
>>> The problem is with the following line.
>>> CPPFLAGS="$LIBS $INCICONV"
>>> which should have been
>>> CPPFLAGS="$CPPFLAGS $INCICONV"
>>>
>>> OK to commit the attached patch?
>>>
>>> 2018-11-06  Hafiz Abid Qadeer  
>>>
>>> * config/iconv.m4 (AM_ICONV_LINK): Don't overwrite CPPFLAGS.
>>> Append $INCICONV to it.
>>> * gcc/configure: Regenerate.
>>> * libcpp/configure: Likewise.
>>> * libstdc++-v3/configure: Likewise.
>>> * intl/configure: Likewise.
>>>
>>> Thanks,
>>>
>> Thanks.  I wasn't sure if you had commit privs, so I went ahead and
>> installed the patch.
>>
>> Jeff
>>
> 
> Does this have any effect on GCC bug 78251?
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78251
> Or any of the related bugs under "See Also" for that matter?
Certainly looks related.  Though I think we're still going to see
pollution, just in a slightly different way.

jeff


Re: [RFT PATCH, i386]: Fix PR87928, ICE in ix86_compute_frame_layout

2018-11-09 Thread Iain Sandoe
Hi Uros,

> On 8 Nov 2018, at 23:53, Uros Bizjak  wrote:
> 

> Attached patch fixes PR87928, where we ICE in ix86_compute_frame_layout in


> I will commit the patch to mainline SVN early next week to allow
> Darwin and Ming/Cygwin maintainers some time to test the patch on
> their targets.

Bootstrap succeeds and the new test passes for Darwin.

This does not alter the fact that Darwin has breakage in this area
(https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78444);
see https://gcc.gnu.org/ml/gcc-patches/2018-08/msg00884.html and HJ's reply.

Essentially, determining any case that demands anything other than 128-bit
alignment on Darwin looks hard to reason about.***

(I will try to get to more investigation there when time permits)

Iain

*** Quotes from "Mac OS X ABI Function Call Guide"

m32:

The function calling conventions used in the IA-32 environment are the same as 
those used in the System V IA-32 ABI, with the following exceptions: 

   …

• The stack is 16-byte aligned at the point of function calls
…

m64:

Calls out to the x86_64 psABI





Re: [PATCH][RFC] Come up with -flive-patching master option.

2018-11-09 Thread Bernhard Reutner-Fischer
On 9 November 2018 16:33:22 CET, "Martin Liška"  wrote:
>Hi.
>
>After I added 2 new options, I would like to include a new master
>option.
>It's a minimal version which only disables optimizations that we are
>aware of and that can potentially cause problems for live-patching.

I think you attached the wrong patch; AFAICS you attached the gfortran
vectorized_builtins patch.

thanks,


Re: [PATCH] MIPS: Add `-mfix-r5900' option for the R5900 short loop erratum

2018-11-09 Thread Fredrik Noring
Many thanks for your prompt review, Maciej!

> > * gcc/config/mips/mips.c (mips_reorg_process_insns)
> >   (mips_option_override): Default to working around R5900
> >   errata only if the processor was selected explicitly.
> 
>  I think this only describes the `mips_option_override' part.  ChangeLog 
> entries are best concise, so how about just:
> 
>   * config/mips/mips.c (mips_reorg_process_insns)
>   (mips_option_override): Handle `-mfix-r5900'.
> 
> ?

Done!

> > * gcc/config/mips/mips.h: Declare `mfix-r5900' and
> >   `mno-fix-r5900'.
> 
>  This has to be:
> 
>   * config/mips/mips.h (ASM_SPEC): Add `mfix-r5900' and 
>   `mno-fix-r5900'.
> 
> as you're adding to the ASM_SPEC definition.

Yes, done!

> Also your patch lines are 
> way off and I had to seach for the place this change applies to -- have 
> you been using current upstream trunk with this change?

Hmm... I thought so, but obviously 2 November was a week old. I'm sorry
about that, I have rebased v2 now. [ As mentioned in the commit message,
the test was done with GCC 8.2.0 since there was an unrelated problem
compiling some sanitizer component at the time. I will try to test a
current GCC version during the weekend. ]

> > * gcc/config/mips/mips.opt: Define MASK_FIX_R5900.
> 
>  Likewise:
> 
>   * config/mips/mips.opt (mfix-r5900): New option.
> 
> as it's an entry for `mfix-r5900' that you're adding.

Done!

> > * gcc/doc/invoke.texi: Document the R5900, `mfix-r5900' and
> >   `mno-fix-r5900'.
> 
>  It looks like missing bits, did you mean:
> 
>   * doc/invoke.texi: Document the `r5900' processor name, and 
>   `-mfix-r5900' and `-mno-fix-r5900' options.
> 
> ?

Yes, done!

>  All these entries apply to the gcc/ChangeLog rather than the top-level 
> one, so please drop the leading gcc/ part of the paths, and indicate it at 
> the beginning instead.

Done!

>  Please also fix indentation throughout; the subsequent lines of entries 
> are horizontally aligned with the asterisk above and not indented any 
> further.

Ah, yes, of course.

>  Please resubmit your change with the ChangeLog entry updated.  
> Regrettably I have no authority to approve your submission, but hopefully 
> a maintainer will so that it can make it to GCC 9.

I will post v2 shortly. Let's hope for the best. Thanks again!

Fredrik


Re: [PATCH 20/25] GCN libgcc.

2018-11-09 Thread Jeff Law
On 9/5/18 5:52 AM, a...@codesourcery.com wrote:
> This patch contains the GCN port of libgcc.  I've broken it out just to keep
> both parts more manageable.
> 
> We have the usual stuff, plus a "gomp_print" implementation intended to 
> provide
> a means to output text to console without using the full printf.  Originally
> this was because we did not have a working Newlib port, but now it provides 
> the
> underlying mechanism for printf.  It's also much lighter than printf, and
> therefore more suitable for debugging offload kernels (for which there is no
> debugger, yet).
> 
> In order to work in offload kernels the same function must be present in both
> host and GCN toolchains.  Therefore it needs to live in libgomp (hence the
> name).  However, having found it also useful in stand alone testing I have
> moved the GCN implementation to libgcc.
> 
> It was also necessary to provide a means to disable EMUTLS.
> 
> 2018-09-05  Andrew Stubbs  
>   Kwok Cheung Yeung  
>   Julian Brown  
>   Tom de Vries  
> 
>   libgcc/
>   * Makefile.in: Don't add emutls.c when --enable-emutls is "no".
>   * config.host: Recognize amdgcn*-*-amdhsa.
>   * config/gcn/crt0.c: New file.
>   * config/gcn/gomp_print.c: New file.
>   * config/gcn/lib2-divmod-hi.c: New file.
>   * config/gcn/lib2-divmod.c: New file.
>   * config/gcn/lib2-gcn.h: New file.
>   * config/gcn/reduction.c: New file.
>   * config/gcn/sfp-machine.h: New file.
>   * config/gcn/t-amdgcn: New file.
> ---
> 
> 
> 0020-GCN-libgcc.patch
> 
> diff --git a/libgcc/Makefile.in b/libgcc/Makefile.in
> index 0c5b264..6f68257 100644
> --- a/libgcc/Makefile.in
> +++ b/libgcc/Makefile.in
> @@ -429,9 +429,11 @@ LIB2ADD += enable-execute-stack.c
>  # While emutls.c has nothing to do with EH, it is in LIB2ADDEH*
>  # instead of LIB2ADD because that's the way to be sure on some targets
>  # (e.g. *-*-darwin*) only one copy of it is linked.
> +ifneq ($(enable_emutls),no)
>  LIB2ADDEH += $(srcdir)/emutls.c
>  LIB2ADDEHSTATIC += $(srcdir)/emutls.c
>  LIB2ADDEHSHARED += $(srcdir)/emutls.c
> +endif
Why is this needed? Are you just trying to cut out stuff you don't need
in the quest for smaller code or does this cause a more direct problem?


> diff --git a/libgcc/config/gcn/crt0.c b/libgcc/config/gcn/crt0.c
> new file mode 100644
> index 000..f4f367b
> --- /dev/null
> +++ b/libgcc/config/gcn/crt0.c
> @@ -0,0 +1,23 @@
> +/* Copyright (C) 2017 Free Software Foundation, Inc.
> +
> +   This file is free software; you can redistribute it and/or modify it
> +   under the terms of the GNU General Public License as published by the
> +   Free Software Foundation; either version 3, or (at your option) any
> +   later version.
> +
> +   This file is distributed in the hope that it will be useful, but
> +   WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   General Public License for more details.
> +
> +   Under Section 7 of GPL version 3, you are granted additional
> +   permissions described in the GCC Runtime Library Exception, version
> +   3.1, as published by the Free Software Foundation.
> +
> +   You should have received a copy of the GNU General Public License and
> +   a copy of the GCC Runtime Library Exception along with this program;
> +   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
> +   .  */
> +
> +/* Provide an entry point symbol to silence a linker warning.  */
> +void _start() {}
This seems wrong.   I realize you're trying to quiet a linker warning
here, but for the case where you're creating GCN executables (testing?)
this should probably be handled by the C-runtime or linker script.


> diff --git a/libgcc/config/gcn/gomp_print.c b/libgcc/config/gcn/gomp_print.c
> new file mode 100644
> index 000..41f50c3
> --- /dev/null
> +++ b/libgcc/config/gcn/gomp_print.c
[ ... ]
Would this be better in libgomp?  Oh, you addressed that in the
prologue.  Feels like libgomp would be better to me, but I can
understand the rationale behind wanting it in libgcc.


I won't comment on the static sizes since this apparently has to match
something already in existence.



> +
> +void
> +gomp_print_string (const char *msg, const char *value)
> +{
> +  struct printf_data *output = reserve_print_slot ();
> +  output->type = 2; /* String.  */
> +
> +  strncpy (output->msg, msg, 127);
> +  output->msg[127] = '\0';
> +  strncpy (output->text, value, 127);
> +  output->text[127] = '\0';
> +
> +  asm ("" ::: "memory");
> +  output->written = 1;
> +}
I'm not familiar with the GCN memory model, but your asm is really just
a barrier for the compiler.  Do you need any kind of hardware fencing
here?  Similarly for other instances.

All these functions probably need a little comment on their purpose and
arguments.

Note some of the divmod stuff recently changed.  You may need minor
updates.

libgo patch committed: Fix typo in gccgo name mangling in cgo

2018-11-09 Thread Ian Lance Taylor
This patch by Than McIntosh fixes a typo in cmd/cgo in the gccgo name
mangling recipe.  The code to implement new-style gccgo name mangling
had a recipe that didn't quite match the one in the compiler
(incorrect handling for '.'). This showed up as a failure in the
gotools cgo test if the directory containing the test run included a
"." character.  Bootstrapped and ran Go tests on x86_64-pc-linux-gnu.
Committed to mainline.

Ian
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 265974)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-da8c968474690d1e77442ac3361b2302ea8e1f36
+559fae430b81595efe151222385192a07a9fc3c3
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: libgo/go/cmd/cgo/out.go
===
--- libgo/go/cmd/cgo/out.go (revision 265710)
+++ libgo/go/cmd/cgo/out.go (working copy)
@@ -1261,7 +1261,7 @@ func gccgoPkgpathToSymbolNew(ppath strin
for _, c := range []byte(ppath) {
switch {
case 'A' <= c && c <= 'Z', 'a' <= c && c <= 'z',
-   '0' <= c && c <= '9', '_' == c:
+   '0' <= c && c <= '9', c == '_', c == '.':
bsl = append(bsl, c)
default:
changed = true


Re: [PATCH 21/25] GCN Back-end (part 1/2).

2018-11-09 Thread Jeff Law
On 9/5/18 7:40 AM, Andrew Stubbs wrote:
> This part initially failed to send due to size.
> 
> This is the main portion of the GCN back-end, plus the configuration
> adjustments needed to build it.
> 
> The config.sub patch is here so people can try it, but I'm aware that
> needs to
> be committed elsewhere first.
> 
> The back-end contains various bits that support OpenACC and OpenMP, but the
> middle-end and libgomp patches are missing.  I included them here because
> they're harmless and carving up the files seems like unnecessary effort.
>  The
> remaining offload support will be posted at a later date.
> 
> The gcn-run.c is a separate tool that can run a GCN program on a GPU using
> the ROCm drivers and HSA runtime libraries.
> 
> 2018-09-05  Andrew Stubbs  
> 	Kwok Cheung Yeung  
> 	Julian Brown  
> 	Tom de Vries  
> 	Jan Hubicka  
> 	Martin Jambor  
> 
> 	* config.sub: Recognize amdgcn*-*-amdhsa.
> 	* configure.ac: Likewise.
> 	* configure: Regenerate.
> 
> 	gcc/
> 	* common/config/gcn/gcn-common.c: New file.
> 	* config.gcc: Add amdgcn*-*-amdhsa configuration.
> 	* config/gcn/constraints.md: New file.
> 	* config/gcn/driver-gcn.c: New file.
> 	* config/gcn/gcn-builtins.def: New file.
> 	* config/gcn/gcn-hsa.h: New file.
> 	* config/gcn/gcn-modes.def: New file.
> 	* config/gcn/gcn-opts.h: New file.
> 	* config/gcn/gcn-passes.def: New file.
> 	* config/gcn/gcn-protos.h: New file.
> 	* config/gcn/gcn-run.c: New file.
> 	* config/gcn/gcn-tree.c: New file.
> 	* config/gcn/gcn-valu.md: New file.
> 	* config/gcn/gcn.c: New file.
> 	* config/gcn/gcn.h: New file.
> 	* config/gcn/gcn.md: New file.
> 	* config/gcn/gcn.opt: New file.
> 	* config/gcn/mkoffload.c: New file.
> 	* config/gcn/offload.h: New file.
> 	* config/gcn/t-gcn-hsa: New file.
> 
> 0021-gcn-port-pt1.patch
> 

> +amdgcn-*-amdhsa)
> + tm_file="dbxelf.h elfos.h gcn/gcn-hsa.h gcn/gcn.h newlib-stdint.h"
Please consider killing dbxelf.h :-)  I assume your default debugging
format is dwarf2, but do you really need to support embedded stabs?



> +
> +/* FIXME: review debug info settings */
> +#define PREFERRED_DEBUGGING_TYPE   DWARF2_DEBUG
> +#define DWARF2_DEBUGGING_INFO  1
> +#define DWARF2_ASM_LINE_DEBUG_INFO 1
> +#define EH_FRAME_THROUGH_COLLECT2  1
These look reasonable.  Essentially you're doing dwarf2 by default.
Maybe just look at EH_FRAME_THROUGH_COLLECT2 more closely to make sure
it still makes sense and isn't a remnant of early port hackery to get
things stumbling along.


> diff --git a/gcc/config/gcn/gcn-run.c b/gcc/config/gcn/gcn-run.c
> new file mode 100644
> index 000..3dea343
> --- /dev/null
> +++ b/gcc/config/gcn/gcn-run.c
I'm going to assume this is largely correct.  It looks like all the glue
code to run kernels on the unit.  It loads the code to be run AFAICT, so
it doesn't need an exception clause as it's not linked against the code
that is to be run IIUC.



> diff --git a/gcc/config/gcn/gcn-tree.c b/gcc/config/gcn/gcn-tree.c
> new file mode 100644
> index 000..0365baf
> --- /dev/null
> +++ b/gcc/config/gcn/gcn-tree.c
> @@ -0,0 +1,715 @@
> +/* Copyright (C) 2017-2018 Free Software Foundation, Inc.
> +
> +   This file is part of GCC.
> +   
> +   GCC is free software; you can redistribute it and/or modify it under
> +   the terms of the GNU General Public License as published by the Free
> +   Software Foundation; either version 3, or (at your option) any later
> +   version.
> +   
> +   GCC is distributed in the hope that it will be useful, but WITHOUT ANY
> +   WARRANTY; without even the implied warranty of MERCHANTABILITY or
> +   FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
> +   for more details.
> +   
> +   You should have received a copy of the GNU General Public License
> +   along with GCC; see the file COPYING3.  If not see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +/* {{{ Includes.  */
> +
> +#include "config.h"
> +#include "system.h"
> +#include "coretypes.h"
> +#include "backend.h"
> +#include "target.h"
> +#include "tree.h"
> +#include "gimple.h"
> +#include "tree-pass.h"
> +#include "gimple-iterator.h"
> +#include "cfghooks.h"
> +#include "cfgloop.h"
> +#include "tm_p.h"
> +#include "stringpool.h"
> +#include "fold-const.h"
> +#include "varasm.h"
> +#include "omp-low.h"
> +#include "omp-general.h"
> +#include "internal-fn.h"
> +#include "tree-vrp.h"
> +#include "tree-ssanames.h"
> +#include "tree-ssa-operands.h"
> +#include "gimplify.h"
> +#include "tree-phinodes.h"
> +#include "cgraph.h"
> +#include "targhooks.h"
> +#include "langhooks-def.h"
> +
> +/* }}}  */
> +/* {{{ OMP GCN pass.  */
> +
> +unsigned int
> +execute_omp_gcn (void)
So some documentation about what this pass is supposed to be doing would be helpful.

[PATCH v2] MIPS: Add `-mfix-r5900' option for the R5900 short loop erratum

2018-11-09 Thread Fredrik Noring
The short loop bug under certain conditions causes loops to
execute only once or twice, due to a hardware bug in the R5900 chip.

`-march=r5900' already enables the R5900 short loop workaround.
However, the R5900 ISA and most other MIPS ISAs are mutually
exclusive since R5900-specific instructions are generated as well.

The `-mfix-r5900' option can be used in combination with e.g.
`-mips2' or `-mips3' to generate generic MIPS binaries that also
work with the R5900 target.  The workaround is implemented by GAS
rather than by GCC.

The following small `shortloop.c' file has been used as a test
with GCC 8.2.0:

void shortloop(void)
{
__asm__ __volatile__ (
"   li $3, 300\n"
"loop:\n"
"   addi $3, -1\n"
"   addi $4, -1\n"
"   bne $3, $0, loop\n"
"   li $4, 3\n"
::);
}

The following six combinations have been tested:

% mipsr5900el-unknown-linux-gnu-gcc -O1 -c shortloop.c
% mipsr5900el-unknown-linux-gnu-gcc -O1 -c shortloop.c -mfix-r5900
% mipsr5900el-unknown-linux-gnu-gcc -O1 -c shortloop.c -mno-fix-r5900

% mipsr4000el-unknown-linux-gnu-gcc -O1 -c shortloop.c
% mipsr4000el-unknown-linux-gnu-gcc -O1 -c shortloop.c -mfix-r5900
% mipsr4000el-unknown-linux-gnu-gcc -O1 -c shortloop.c -mno-fix-r5900

The R5900 short loop erratum is corrected in exactly three cases:

1. for the target `mipsr5900el' by default;

2. for the target `mipsr5900el' with `-mfix-r5900';

3. for any other MIPS target (e.g. `mipsr4000el') with `-mfix-r5900'.

In all other cases the correction is not made.

gcc/
* config/mips/mips.c (mips_reorg_process_insns)
(mips_option_override): Handle `-mfix-r5900'.
* config/mips/mips.h (ASM_SPEC): Add `mfix-r5900' and
`mno-fix-r5900'.
* config/mips/mips.opt (mfix-r5900): New option.
* doc/invoke.texi: Document the `r5900' processor name, and
`-mfix-r5900' and `-mno-fix-r5900' options.
---
Thanks a lot for your review, Maciej!

Changes in v2:
- ChangeLog entry corrections
- Rebase to origin/trunk
---
 gcc/config/mips/mips.c   | 14 ++
 gcc/config/mips/mips.h   |  1 +
 gcc/config/mips/mips.opt |  4 
 gcc/doc/invoke.texi  | 14 +-
 4 files changed, 28 insertions(+), 5 deletions(-)

diff --git a/gcc/config/mips/mips.c b/gcc/config/mips/mips.c
index d78e2056ec2..04de85a3ba2 100644
--- a/gcc/config/mips/mips.c
+++ b/gcc/config/mips/mips.c
@@ -18906,13 +18906,13 @@ mips_reorg_process_insns (void)
   if (crtl->profile)
 cfun->machine->all_noreorder_p = false;
 
-  /* Code compiled with -mfix-vr4120, -mfix-rm7000 or -mfix-24k can't be
- all noreorder because we rely on the assembler to work around some
- errata.  The R5900 too has several bugs.  */
+  /* Code compiled with -mfix-vr4120, -mfix-r5900, -mfix-rm7000 or
+ -mfix-24k can't be all noreorder because we rely on the assembler
+ to work around some errata.  The R5900 target has several bugs.  */
   if (TARGET_FIX_VR4120
   || TARGET_FIX_RM7000
   || TARGET_FIX_24K
-  || TARGET_MIPS5900)
+  || TARGET_FIX_R5900)
 cfun->machine->all_noreorder_p = false;
 
   /* The same is true for -mfix-vr4130 if we might generate MFLO or
@@ -20287,6 +20287,12 @@ mips_option_override (void)
   && strcmp (mips_arch_info->name, "r4400") == 0)
 target_flags |= MASK_FIX_R4400;
 
+  /* Default to working around R5900 errata only if the processor
+ was selected explicitly.  */
+  if ((target_flags_explicit & MASK_FIX_R5900) == 0
+  && strcmp (mips_arch_info->name, "r5900") == 0)
+target_flags |= MASK_FIX_R5900;
+
   /* Default to working around R10000 errata only if the processor
 was selected explicitly.  */
   if ((target_flags_explicit & MASK_FIX_R10000) == 0
diff --git a/gcc/config/mips/mips.h b/gcc/config/mips/mips.h
index 11ca364d752..d2205f08972 100644
--- a/gcc/config/mips/mips.h
+++ b/gcc/config/mips/mips.h
@@ -1411,6 +1411,7 @@ struct mips_cpu_info {
 %{mloongson-ext2} %{mno-loongson-ext2} \
 %{msmartmips} %{mno-smartmips} \
 %{mmt} %{mno-mt} \
+%{mfix-r5900} %{mno-fix-r5900} \
 %{mfix-rm7000} %{mno-fix-rm7000} \
 %{mfix-vr4120} %{mfix-vr4130} \
 %{mfix-24k} \
diff --git a/gcc/config/mips/mips.opt b/gcc/config/mips/mips.opt
index 16c33d12e22..75332100435 100644
--- a/gcc/config/mips/mips.opt
+++ b/gcc/config/mips/mips.opt
@@ -165,6 +165,10 @@ mfix-r4400
 Target Report Mask(FIX_R4400)
 Work around certain R4400 errata.
 
+mfix-r5900
+Target Report Mask(FIX_R5900)
+Work around the R5900 short loop erratum.
+
 mfix-rm7000
 Target Report Var(TARGET_FIX_RM7000)
 Work around certain RM7000 errata.
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index e071f125efe..44ecc241fd7 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -945,6 +945,7 @@ Objective-C and Objective-C++ Dialects}.
 -mmad  -mno-mad  -mimadd  -mno-imadd  -mfused-madd  -mno-fused-madd  -nocpp 
@gol
 -mfix-24k  -mno-fix-24k @gol
 -mfix-r4000  -mno-fix-r4000  -m

Re: [PATCH 21/25] GCN Back-end (part 2/2).

2018-11-09 Thread Jeff Law
On 9/5/18 7:42 AM, Andrew Stubbs wrote:
> This part initially failed to send due to size.
> 
> Here's part 2.
> 
> 0021-gcn-port-pt2.patch
[ ... ]
You've already addressed Joseph's comments in a follow-up.


> 
> diff --git a/gcc/config/gcn/gcn.c b/gcc/config/gcn/gcn.c
> new file mode 100644
> index 0000000..7e59b06
> --- /dev/null
> +++ b/gcc/config/gcn/gcn.c
> @@ -0,0 +1,6161 @@
> +/* Copyright (C) 2016-2018 Free Software Foundation, Inc.
> +
> +   This file is free software; you can redistribute it and/or modify it under
> +   the terms of the GNU General Public License as published by the Free
> +   Software Foundation; either version 3 of the License, or (at your option)
> +   any later version.
> +
> +   This file is distributed in the hope that it will be useful, but WITHOUT
> +   ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> +   FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
> +   for more details.
> +
> +   You should have received a copy of the GNU General Public License
> +   along with GCC; see the file COPYING3.  If not see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +/* {{{ Includes.  */
> +
> +/* We want GET_MODE_SIZE et al to return integers, please.  */

> +
> +static tree
> +gcn_handle_amdgpu_hsa_kernel_attribute (tree *node, tree name,
> + tree args, int, bool *no_add_attrs)
> +{
> +  if (TREE_CODE (*node) != FUNCTION_TYPE
> +  && TREE_CODE (*node) != METHOD_TYPE
> +  && TREE_CODE (*node) != METHOD_TYPE
> +  && TREE_CODE (*node) != FIELD_DECL
> +  && TREE_CODE (*node) != TYPE_DECL)
METHOD_TYPE tested twice here.  Might as well use FUNC_OR_METHOD_TYPE_P.


>> +
> +/* Return true if REG is a valid place to store a pointer,
> +   for instructions that require an SGPR.
> +   FIXME rename. */
> +
> +static bool
> +gcn_address_register_p (rtx reg, machine_mode mode, bool strict)
> +{
> +  if (GET_CODE (reg) == SUBREG)
> +reg = SUBREG_REG (reg);
> +
> +  if (!REG_P (reg))
> +return false;
> +
> +  if (GET_MODE (reg) != mode)
> +return false;
> +
> +  int regno = REGNO (reg);
> +
> +  if (regno >= FIRST_PSEUDO_REGISTER)
> +{
> +  if (!strict)
> + return true;
> +
> +  if (!reg_renumber)
> + return false;
> +
> +  regno = reg_renumber[regno];
> +}
> +
> +  return (regno < 102 || regno == M0_REG
> +   || regno == ARG_POINTER_REGNUM || regno == FRAME_POINTER_REGNUM);
Consider using a symbolic name for "102" :-)



> +
> +/* Generate epilogue.  Called from gen_epilogue during pro_and_epilogue pass.
> +
> +   See gcn_expand_prologue for stack details.  */
> +
> +void
> +gcn_expand_epilogue (void)
You probably need a barrier in here to ensure that the scheduler doesn't
move an aliased memory reference into the local stack beyond the stack
adjustment.

You're less likely to run into it because you eliminate frame pointers
fairly aggressively, but it's still the right thing to do.

> +
> +/* Implement TARGET_LEGITIMATE_COMBINED_INSN.
> +
> +   Return false if the instruction is not appropriate as a combination of two
> +   or more instructions.  */
> +
> +bool
> +gcn_legitimate_combined_insn (rtx_insn *insn)
> +{
> +  rtx pat = PATTERN (insn);
> +
> +  /* The combine pass tends to strip (use (exec)) patterns from insns.  This
> + means it basically switches everything to use the *_scalar form of the
> + instructions, which is not helpful.  So, this function disallows such
> + combinations.  Unfortunately, this also disallows combinations of 
> genuine
> + scalar-only patterns, but those only come from explicit expand code.
> +
> + Possible solutions:
> + - Invent TARGET_LEGITIMIZE_COMBINED_INSN.
> + - Remove all (use (EXEC)) and rely on md_reorg with "exec" attribute.
> +   */
This seems a bit hokey.  Why specifically is combine removing the USE?



> +
> +/* If INSN sets the EXEC register to a constant value, return the value,
> +   otherwise return zero.  */
> +
> +static
> +int64_t gcn_insn_exec_value (rtx_insn *insn)
Nit.  Make sure the function's name is in column 0 by moving the return
type to the previous line.


> +{
> +  if (!NONDEBUG_INSN_P (insn))
> +return 0;
> +
> +  rtx pattern = PATTERN (insn);
> +
> +  if (GET_CODE (pattern) == SET)
> +{
> +  rtx dest = XEXP (pattern, 0);
> +  rtx src = XEXP (pattern, 1);
> +
> +  if (GET_MODE (dest) == DImode
> +   && REG_P (dest) && REGNO (dest) == EXEC_REG
> +   && CONST_INT_P (src))
> + return INTVAL (src);
> +}
> +
> +  return 0;
> +}
> +
> +/* Sets the EXEC register before INSN to the value that it had after
> +   LAST_EXEC_DEF.  The constant value of the EXEC register is returned if
> +   known, otherwise it returns zero.  */
> +
> +static
> +int64_t gcn_restore_exec (rtx_insn *insn, rtx_insn *last_exec_def,
> +   int64_t curr_exec, bool curr_exec_known,
> +   bool &last_exec_def_saved)
Similarly.  

Re: [PATCH] add simple attribute introspection

2018-11-09 Thread Jeff Law
On 10/31/18 10:27 AM, Martin Sebor wrote:
> Ping: https://gcc.gnu.org/ml/gcc-patches/2018-10/msg01473.html
> 
> With the C++ bits approved I'm still looking for a review/approval
> of the remaining changes: the C front end and the shared c-family
> handling of the new built-in.
I thought I acked those with just a couple minor doc nits:

> diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
> index 8ffb0cd..dcf4747 100644
> --- a/gcc/doc/extend.texi
> +++ b/gcc/doc/extend.texi
> @@ -2649,8 +2649,9 @@ explicit @code{externally_visible} attributes
are still necessary.
>  @cindex @code{flatten} function attribute
>  Generally, inlining into a function is limited.  For a function
marked with
>  this attribute, every call inside this function is inlined, if possible.
> -Whether the function itself is considered for inlining depends on its
size and
> -the current inlining parameters.
> +Functions declared with attribute @code{noinline} and similar are not
> +inlined.  Whether the function itself is considered for inlining depends
> +on its size and the current inlining parameters.
Guessing this was from another doc patch that I think has already been
approved


> @@ -11726,6 +11728,33 @@ check its compatibility with @var{size}.
>
>  @end deftypefn
>
> +@deftypefn {Built-in Function} bool __builtin_has_attribute (@var{type-or-expression}, @var{attribute})
> +The @code{__builtin_has_attribute} function evaluates to an integer constant
> +expression equal to @code{true} if the symbol or type referenced by
> +the @var{type-or-expression} argument has been declared with
> +the @var{attribute} referenced by the second argument.  Neither argument
> +is valuated.  The @var{type-or-expression} argument is subject to the same
s/valuated/evaluated/ ?


Did the implementation change significantly requiring another review
iteration?

jeff


Re: [PATCH] Fix PR 86572

2018-11-09 Thread Jeff Law
On 11/5/18 4:20 PM, Bernd Edlinger wrote:
> On 11/5/18 1:28 AM, H.J. Lu wrote:
>> On Sun, Nov 4, 2018 at 10:02 AM Jeff Law  wrote:
>>> On 10/22/18 9:08 AM, Bernd Edlinger wrote:
 Hi!

 This makes c_strlen avoid an unsafe strlen folding of const arguments
 with non-const offset.  Currently a negative out of bounds offset
 makes the strlen function return an extremely large number, and
 at the same time, prevents the VRP machinery, to determine the correct
 range if the strlen function in this case.

 Fixed by doing the whole computation in size_t and casting the
 result back to ssize_t.


 Bootstrapped and reg-tested on x86_64-pc-linux-gnu.
 Is it OK for trunk?


 Thanks
 Bernd.


 patch-pr86572.diff

 gcc:
 2018-10-22  Bernd Edlinger  

PR tree-optimization/86572
* builtins.c (c_strlen): Handle negative offsets in a safe way.

 testsuite:
 2018-10-22  Bernd Edlinger  

PR tree-optimization/86572
* gcc.dg/pr86572.c: New test.
>>> OK.
>>> jeff
>> This caused:
>>
>> /export/gnu/import/git/gcc-test-ia32/src-trunk/gcc/testsuite/gcc.dg/warn-strlen-no-nul.c:56:1:
>> internal compiler error: verify_gimple failed
>> 0x8922dc4 verify_gimple_in_seq(gimple*)
>>  ../../src-trunk/gcc/tree-cfg.c:5082
>> 0x86899d7 gimplify_body(tree_node*, bool)
>>  ../../src-trunk/gcc/gimplify.c:12859
>> 0x8689b8b gimplify_function_tree(tree_node*)
>>  ../../src-trunk/gcc/gimplify.c:12949
>> 0x84f7690 cgraph_node::analyze()
>>  ../../src-trunk/gcc/cgraphunit.c:667
>> 0x84fa1d8 analyze_functions
>>  ../../src-trunk/gcc/cgraphunit.c:1126
>> 0x84fadd3 symbol_table::finalize_compilation_unit()
>>  ../../src-trunk/gcc/cgraphunit.c:2833
>> Please submit a full bug report,
>> with preprocessed source if appropriate.
>> Please include the complete backtrace with any bug report.
>> See <https://gcc.gnu.org/bugs/> for instructions.
>> compiler exited with status 1
>> FAIL: gcc.dg/warn-strlen-no-nul.c (internal compiler error)
>>
>> on i386.
>>
> Ah yes thanks.
> 
> This is caused by an incorrect folding in string_constant.
> After stripping the type casts in the POINTER_PLUS_EXPR
> we add the offset which is sizetype to what is left
> over from arg1, which is probably even a correctness issue,
> if the type cast was a narrowing one.
> 
> Bootstrapped and reg-tested on x86_64-pc-linux-gnu (this time
> tested with RUNTESTFLAGS="--target_board=unix\{-m32,\}")
> Is it OK for trunk?
> 
> 
> Thanks
> Bernd.
> 
> 
> 
> patch-fix-string-cst.diff
> 
> 2018-11-05  Bernd Edlinger  
> 
>   * expr.c (string_constant): Don't strip NOPS in subexpressions.
>   Fold PLUS_EXPR correctly.
OK.
jeff


Re: [PATCH] Add sinh(tanh(x)) and cosh(tanh(x)) rules

2018-11-09 Thread Jeff Law
On 11/8/18 6:33 AM, Wilco Dijkstra wrote:
> Hi,
> 
>> But the max. error in sinh/cosh/atanh is less than 2 ULP, with some math
>> libraries.  It could be < 1 ULP, in theory, so sinh(atanh(x)) less than
>> 2 ULP even.
> 
> You can't add ULP errors in general - a tiny difference in the input can 
> make a huge difference in the result if the derivative is > 1. 
> 
> Even with perfect implementations of 0.501ULP on easy functions with
> no large derivatives you could get a 2ULP total error if the perfectly rounded
> and actual result end up rounding in different directions in the 2nd 
> function...
> 
> So you have to measure ULP error since it is quite counter intuitive.
> 
>> And signed zeroes.  Yeah.  I think it would have to be
>> flag_unsafe_math_optimizations + some more.
> 
> Indeed.
So we need to give Giuliano some clear guidance on guarding.  This is
out of my area of expertise, so looking to y'all to help here.

jeff


[committed] Minor testsuite cleanup from summer's v850 work

2018-11-09 Thread Jeff Law

I've had this lying around.  Recent changes in that file reminded me to
push it.

Essentially the v850 pushes arguments for variadic functions rather than
passing them in registers, similarly to the nds32.  So we need to guard the
test in a similar manner.

Committed to the trunk.

Jeff
commit cc9a0d23a784738d5121c8eb9f097acb0cf476bf
Author: law 
Date:   Fri Nov 9 20:08:20 2018 +

gcc.dg/torture/stackalign/builtin-apply-2.c: Skip on v850.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@265982 
138bc75d-0d04-0410-961f-82ee72b054a4

diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index 1556e8a7d19..285d0e1384c 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,7 @@
+2018-11-09  Jeff Law  
+
+   gcc.dg/torture/stackalign/builtin-apply-2.c: Skip on v850.
+
 2018-11-09  Martin Sebor  
 
PR middle-end/81824
diff --git a/gcc/testsuite/gcc.dg/torture/stackalign/builtin-apply-2.c b/gcc/testsuite/gcc.dg/torture/stackalign/builtin-apply-2.c
index 3f8d350ba8f..dde2600c4d1 100644
--- a/gcc/testsuite/gcc.dg/torture/stackalign/builtin-apply-2.c
+++ b/gcc/testsuite/gcc.dg/torture/stackalign/builtin-apply-2.c
@@ -10,7 +10,7 @@
    avr: Variadic funcs don't pass arguments in registers, while normal funcs
    do.  */
 /* { dg-skip-if "Variadic funcs use different argument passing from normal funcs" { arm_hf_eabi || { avr-*-* riscv*-*-* or1k*-*-* } } } */
-/* { dg-skip-if "Variadic funcs have all args on stack. Normal funcs have args in registers." { nds32*-*-* } } */
+/* { dg-skip-if "Variadic funcs have all args on stack. Normal funcs have args in registers." { nds32*-*-* } { v850*-*-* } } */
 /* { dg-require-effective-target untyped_assembly } */

 


Re: [RFC] support --with-multilib-list=@/path/name

2018-11-09 Thread Palmer Dabbelt

On Fri, 09 Nov 2018 07:07:06 PST (-0800), ol...@adacore.com wrote:

Richard,

Olivier tells me he talked to you briefly at the Cauldron about allowing
custom multilib sets to be configured from custom Makefile fragments
supplied at configure time, and that you were somewhat receptive to the
idea.  I have implemented this as follows, for ARM targets only so far,
using "@/path/name" in the argument to --with-multilib-list to mean
/path/name is to be used as a multilib-list Makefile fragment.

Does this look like an acceptable interface?  Is it ok as far as ARM
maintainers are concerned?

Any objections from maintainers of other targets that support
--with-multilib-list (or from anyone else) to the syntax, to the
implementation, or to extending this possibility to their ports?


We've had requests for this feature from users of the RISC-V port, but haven't 
had time to look into it.  One wrinkle in RISC-V land is that we have a script 
in GCC that generates our fragments, so I'm not sure if this is exactly what we 
want.  It's certainly better than our current "just patch GCC" flow, though!



for  gcc/ChangeLog

* config.gcc (tmake_file): Add /path/name to tmake_file for
each @/path/name in --with-multilib-list on arm-*-* targets.
* configure.ac: Accept full pathnames in tmake_file.
* configure: Rebuilt.
* doc/install.texi (with-multilib-list): Document it.
---
 gcc/config.gcc   |   13 +
 gcc/configure|9 ++---
 gcc/configure.ac |5 -
 gcc/doc/install.texi |   22 +++---
 4 files changed, 38 insertions(+), 11 deletions(-)

diff --git a/gcc/config.gcc b/gcc/config.gcc
index 7578ff03825e..f1363c41f989 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -3998,6 +3998,19 @@ case "${target}" in
aprofile|rmprofile)

tmake_profile_file="arm/t-multilib"
;;
+   @/*)
+				ml=`echo "X$arm_multilib" | sed '1s,^X@,,'`
+				if test -f "${ml}"; then
+					tmake_file="${tmake_file} ${ml}"
+				else
+					echo "Error: ${ml} does not exist" >&2
+					exit 1
+				fi
+				;;
+			@*)
+				echo "Error: multilib config file must start with /" >&2
+				exit 1
+				;;
 			*)
 				echo "Error: --with-multilib-list=${with_multilib_list} not supported." 1>&2
 				exit 1
diff --git a/gcc/configure b/gcc/configure
index b814484ea25b..5f15c7a1ff02 100755
--- a/gcc/configure
+++ b/gcc/configure
@@ -12244,7 +12244,10 @@ done
 tmake_file_=
 for f in ${tmake_file}
 do
-   if test -f ${srcdir}/config/$f
+	if test -n `echo "X$f" | sed -n '1s,^X/.*,/,p'` && test -f "$f"
+   then
+   tmake_file_="${tmake_file_} $f"
+   elif test -f ${srcdir}/config/$f
then
tmake_file_="${tmake_file_} \$(srcdir)/config/$f"
fi
@@ -18572,7 +18575,7 @@ else
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 18575 "configure"
+#line 18578 "configure"
 #include "confdefs.h"
 
 #if HAVE_DLFCN_H

@@ -18678,7 +18681,7 @@ else
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 18681 "configure"
+#line 18684 "configure"
 #include "confdefs.h"
 
 #if HAVE_DLFCN_H

diff --git a/gcc/configure.ac b/gcc/configure.ac
index 59585912556b..99a3e6f8f52f 100644
--- a/gcc/configure.ac
+++ b/gcc/configure.ac
@@ -1940,7 +1940,10 @@ done
 tmake_file_=
 for f in ${tmake_file}
 do
-   if test -f ${srcdir}/config/$f
+	if test -n `echo "X$f" | sed -n '1s,^X/.*,/,p'` && test -f "$f"
+   then
+   tmake_file_="${tmake_file_} $f"
+   elif test -f ${srcdir}/config/$f
then
tmake_file_="${tmake_file_} \$(srcdir)/config/$f"
fi
diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi
index be9b07b5d23b..fd19fc590ec8 100644
--- a/gcc/doc/install.texi
+++ b/gcc/doc/install.texi
@@ -1078,13 +1078,21 @@ values and meaning for each target is given below.
 
 @table @code

 @item arm*-*-*
-@var{list} is a comma separated list of @code{aprofile} and @code{rmprofile}
-to build multilibs for A or R and M architecture profiles respectively.
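[Editorial note: a custom fragment passed as `--with-multilib-list=@/path/t-custom` would simply be appended to `tmake_file`.  The contents below are a hypothetical illustration of what such a fragment might hold, not part of the patch.]

```make
# Hypothetical multilib Makefile fragment, e.g. /path/t-custom, selected
# with --with-multilib-list=@/path/t-custom (the path must be absolute
# and the file must exist, per the config.gcc validation above).
MULTILIB_OPTIONS  = mthumb
MULTILIB_DIRNAMES = thumb
MULTILIB_REQUIRED = mthumb
```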

Re: Relocation (= move+destroy)

2018-11-09 Thread Jonathan Wakely

On 25/10/18 13:36 +0100, Jonathan Wakely wrote:

On 25/10/18 14:30 +0200, Marc Glisse wrote:

On Tue, 23 Oct 2018, Jonathan Wakely wrote:


+  template
+inline void
+__relocate_a(_Tp* __dest, _Up* __orig, _Allocator& __alloc)


I find it a little surprising that this overload for single objects
uses the memmove argument ordering (dest, source) but the range overload
below uses the STL ordering (source_begin, source_end, dest).

But I wouldn't be surprised if we're already doing that somewhere that
I've forgotten about.

Would it make sense to either rename this overload, or to use
consistent argument ordering for the two __relocate_a overloads?


The functions were not meant as overloads, it just happened that 
I arrived at the same name for both, but it would make perfect 
sense to give them different names. I started from 
__relocate(dest, source) for one element, and later added an 
allocator to it. The other one corresponds to 
__uninitialized_move_a, and naming it __uninitialized_relocate_a 
would be silly since "uninitialized" is included in the 
definition of relocate.


Yes, my first thought was to add "uninitialized" and I rejected it for
that same reason.

I think I'd rather rename than change the order. Do you have 
suggestions? __relocate_range_a?


I was thinking the single object one could be __relocate_1_a with the
_1 being like copy_n, fill_n etc. but that's going to be confusing
with __relocate_a_1 instead!

It seems unfortunate to have to put "range" in the name when no other
algos that work on ranges bother to say that in the name.

Maybe the single object one could be __relocate_single_a? Or
__do_relocate_a? __relocate_obj_a? None of them really makes me happy.


I went with __relocate_object_a, but that's easy to change.


I'm OK with that. If anybody thinks of a better name we can change it.

We may want to specialize / overload __relocate_object_a for some 
specific types in the future, although we would more likely have 
__relocate_object_a call __relocate_object (no allocator) when it 
receives the default allocator, and specialize that one. And this 
should anyway be less common than specializing 
__is_trivially_relocatable.


I just realized that __relocate_a and construct don't take the 
allocator argument in the same position... I can switch if it helps.


No, leaving it at the end is consistent with __uninitialized_copy_a
et al, and it's closer in spirit to them.

OK for trunk, thanks!


Here's the fix for the regression this introduced.

Tested x86_64-linux, committed to trunk.


commit caf2adcb563d7dd62c3de7bb49f528753f958ba3
Author: Jonathan Wakely 
Date:   Fri Nov 9 19:19:16 2018 +

PR libstdc++/87787 fix UBsan error in std::vector

PR libstdc++/87787
* include/bits/stl_uninitialized.h (__relocate_a_1): Do not call
memmove when there's nothing to copy (and pointers could be null).

diff --git a/libstdc++-v3/include/bits/stl_uninitialized.h b/libstdc++-v3/include/bits/stl_uninitialized.h
index 94c7e151e29..8839bfdcc90 100644
--- a/libstdc++-v3/include/bits/stl_uninitialized.h
+++ b/libstdc++-v3/include/bits/stl_uninitialized.h
@@ -904,7 +904,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 		   _Tp* __result, allocator<_Up>& __alloc)
 {
   ptrdiff_t __count = __last - __first;
-  __builtin_memmove(__result, __first, __count * sizeof(_Tp));
+  if (__count > 0)
+	__builtin_memmove(__result, __first, __count * sizeof(_Tp));
   return __result + __count;
 }
 


  1   2   >