Re: [libgcc] Use v2 map syntax in libgcc-unwind.map if Solaris ld supports it

2018-09-18 Thread Rainer Orth
Hi Jeff,

> On 9/16/18 5:28 AM, Rainer Orth wrote:
>> Currently, the libgcc-unwind.map file generated for use with Solaris ld
>> 
>>  http://gcc.gnu.org/ml/gcc-patches/2014-01/msg01088.html
>> 
>> uses the v1 linker map file syntax because that's supported
>> everywhere.  However, with ld -z guidance, newer versions of ld warn
>> about this:
>> 
>> ld: guidance: version 2 mapfile syntax recommended: ./libgcc-unwind.map
>> 
>> Since it is easy to detect if ld supports v2 map syntax (introduced in
>> Solaris 11 and later backported to some Solaris 10 patches) and the
>> mapfile is generated at build time, the following patch performs this
>> check and generates a v2 mapfile if ld supports it.
>> 
>> While testing the patch, I found that the arg to AC_TRY_COMMAND needed
>> quoting to keep the embedded commas in -Wl,-M,... from ending the command.
>> Shouldn't the other uses of AC_TRY_COMMAND receive the same quoting for
>> safety and consistency?
>> 
>> Bootstrapped on i386-pc-solaris2.10 (with older v1-only ld) and
>> i386-pc-solaris2.11 without regressions.
>> 
>> Ok for mainline?
> OK with a suitable ChangeLog.

thanks, installed.  However, the ChangeLog entry was already in the
original patch submission.
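
For readers comparing the two dialects, the difference is essentially the
version directive and the explicit scope block.  A minimal illustration
(not the actual libgcc-unwind.map contents):

```
# v1 syntax, accepted by every Solaris ld:
{
  local:
    *;
};

# v2 syntax, requires a v2-capable ld:
$mapfile_version 2
SYMBOL_SCOPE {
  local:
    *;
};
```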

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [PATCH]: Allow TARGET_SCHED_ADJUST_PRIORITY hook to reduce priority

2018-09-18 Thread Andreas Schwab
On Sep 17 2018, Jeff Law  wrote:

> On 9/17/18 3:08 AM, Andreas Schwab wrote:
>>  PR rtl-optimization/85458
>>  * sel-sched.c (sel_target_adjust_priority): Remove wrong
>>  assertion.
> Under what conditions is the new priority negative?  Without digging
> deep into the ia64 port that just seems wrong.

It is created in create_speculation_check:

Starting program: /daten/src/gcc/c-ia64/gcc/cc1 -O3 -funroll-loops 
../../gcc/gcc/testsuite/gcc.c-torture/compile/20010102-1.c
 _obstack_newchunk
Analyzing compilation unit
Performing interprocedural optimizations
 <*free_lang_data>
 Streaming LTO

Assembling 
functions:
  _obstack_newchunk
Breakpoint 6, create_speculation_check (c_expr=0x7fffd250, check_ds=47, 
orig_insn=0x76b6b4c0) at ../../gcc/gcc/sel-sched.c:1820
1820  if (recovery_block != NULL)
$18 = 13
(gdb) c
Continuing.

Breakpoint 6, create_speculation_check (c_expr=0x7fffd250, check_ds=47, 
orig_insn=0x76b6b680) at ../../gcc/gcc/sel-sched.c:1820
1820  if (recovery_block != NULL)
$19 = 11
(gdb) 
Continuing.

Breakpoint 6, create_speculation_check (c_expr=0x7fffd250, check_ds=47, 
orig_insn=0x76b6ba00) at ../../gcc/gcc/sel-sched.c:1820
1820  if (recovery_block != NULL)
$20 = 7
(gdb) 
Continuing.

Breakpoint 6, create_speculation_check (c_expr=0x7fffd250, check_ds=26, 
orig_insn=0x76bb1940) at ../../gcc/gcc/sel-sched.c:1820
1820  if (recovery_block != NULL)
$21 = 1
(gdb) 
Continuing.

Breakpoint 6, create_speculation_check (c_expr=0x7fffd250, 
check_ds=253978, orig_insn=0x76b6b300)
at ../../gcc/gcc/sel-sched.c:1820
1820  if (recovery_block != NULL)
$22 = 15
(gdb) 
Continuing.

Breakpoint 6, create_speculation_check (c_expr=0x7fffd250, 
check_ds=253952, orig_insn=0x76b55240)
at ../../gcc/gcc/sel-sched.c:1820
1820  if (recovery_block != NULL)
$23 = 0
(gdb) 
Continuing.

Breakpoint 6, create_speculation_check (c_expr=0x7fffd250, 
check_ds=253999, orig_insn=0x76b554c0)
at ../../gcc/gcc/sel-sched.c:1820
1820  if (recovery_block != NULL)
$24 = 0
(gdb) 
Continuing.

Breakpoint 6, create_speculation_check (c_expr=0x7fffd250, 
check_ds=253987, orig_insn=0x76b55700)
at ../../gcc/gcc/sel-sched.c:1820
1820  if (recovery_block != NULL)
$25 = 0
(gdb) 
Continuing.

Breakpoint 6, create_speculation_check (c_expr=0x7fffd250, 
check_ds=253978, orig_insn=0x76b55940)
at ../../gcc/gcc/sel-sched.c:1820
1820  if (recovery_block != NULL)
$26 = 0
(gdb) 
Continuing.

Breakpoint 6, create_speculation_check (c_expr=0x7fffd250, 
check_ds=253978, orig_insn=0x76b55b80)
at ../../gcc/gcc/sel-sched.c:1820
1820  if (recovery_block != NULL)
$27 = 0
(gdb) 
Continuing.

Breakpoint 6, create_speculation_check (c_expr=0x7fffd250, 
check_ds=253952, orig_insn=0x76b58540)
at ../../gcc/gcc/sel-sched.c:1820
1820  if (recovery_block != NULL)
$28 = 0
(gdb) 
Continuing.

Breakpoint 6, create_speculation_check (c_expr=0x7fffd250, 
check_ds=253952, orig_insn=0x76b586c0)
at ../../gcc/gcc/sel-sched.c:1820
1820  if (recovery_block != NULL)
$29 = -1
(gdb) 
Continuing.

Breakpoint 1, fancy_abort (file=0x13ead27 "../../gcc/gcc/sel-sched.c", 
line=, 
function=0x13ebe30  
"sel_target_adjust_priority") at ../../gcc/gcc/diagnostic.c:1559
1559{

Andreas.

-- 
Andreas Schwab, SUSE Labs, sch...@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."


Re: [PATCH] Introduce libgcov.so run-time library (PR gcov-profile/84107).

2018-09-18 Thread Martin Liška
On 9/17/18 1:39 PM, Martin Liška wrote:
> On 9/12/18 4:48 PM, Richard Biener wrote:
>> On Wed, Sep 12, 2018 at 11:27 AM Martin Liška  wrote:
>>>
>>> On 9/11/18 5:08 PM, Alexander Monakov wrote:
>>>> On Tue, 11 Sep 2018, Martin Liška wrote:
>>>>> I've discussed the topic with Alexander at the Cauldron and we hoped
>>>>> that the issue with the unique __gcov_root could be handled with
>>>>> DECL_COMMON in each DSO.  Apparently this approach does not work, as
>>>>> all DSOs in an executable eventually have just a single instance of
>>>>> __gcov_root, which is bad.
>>>>
>>>> No, that happens only if they have default visibility.  Emitting them
>>>> with "hidden" visibility solves this.
>>>
>>> Ah, ok, thanks. So theoretically that way should work.
>>>
>>>>> So I went back to tracking down the root of the failure.  It shows
>>>>> that even with -Wl,--dynamic-list-data, __gcov_indirect_call_callee
>>>>> points to a different variable among the DSOs.  The reason is that
>>>>> the variable is TLS:
>>>>
>>>> (no, the variable being TLS is not causing the bug; the issue is that
>>>> libraries carry public definitions of just one or both variables
>>>> depending on whether they have indirect calls or not, and the library
>>>> state becomes "diverged" when one variable gets interposed while the
>>>> other does not)
>>>
>>> I see, I'm attaching a patch that does that. I can confirm your
>>> test-case works fine w/o -Wl,--dynamic-list-data.
>>>
>>> I'm wondering if it will work as well with dlopen/dlsym machinery? Or
>>> will the linker flag now be needed?
>>>
>>>>> That said, I would like to go this direction. I would be happy to
>>>>> receive feedback. Then I'll test the patch.
>>>>
>>>> Hm, this TLS change is attacking the issue from the wrong direction.
>>>> What I suggested at the Cauldron as a simple and backportable way of
>>>> solving this is to consolidate the two TLS variables into one TLS
>>>> struct, which is then either interposed or not as a whole.
>>>
>>> Got it; currently I prefer that solution.
>>
>> Sounds good.  I notice the code had this before but...
>>
>> +  TREE_PUBLIC (ic_tuple_var) = 1;
>> +  DECL_EXTERNAL (ic_tuple_var) = 1;
>> +  TREE_STATIC (ic_tuple_var) = 1;
>>
>> that's one flag too much?!  I suggest to drop DECL_EXTERNAL.
>
> I've done that. I'm sending the updated version of the patch.
> I'm currently testing it.  Ready to commit after it finishes tests?
>
> Martin
>
>> Richard.
>>
>>> Martin
>>>
>>>> Alexander

Hi.

This is the tested version, where DECL_EXTERNAL is needed for LTO
partitioning to work properly.  Note that the previous variables emitted
in tree-profile.c were also DECL_EXTERNAL.
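
For reference, the consolidation Alexander suggested boils down to
something like this sketch (field names follow the libgcc ChangeLog
below; the exact layout and types in libgcov.h are assumptions):

```
/* Stand-in for the libgcov typedef; the real one lives in libgcov.h.  */
typedef long long gcov_type;

/* Both pointers now live in one object, so symbol interposition picks
   either both values or neither and per-DSO state cannot diverge.  */
struct indirect_call_tuple
{
  /* Address of the indirect-call callee.  */
  void *callee;
  /* Pointer to the caller's counters.  */
  gcov_type *counters;
};

__thread struct indirect_call_tuple __gcov_indirect_call;
```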

Patch survives tests on ppc64le-linux-gnu.

Martin
From 181c59e8e6474edadb23bcc7d3b387a854eb73a9 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Wed, 12 Sep 2018 11:24:59 +0200
Subject: [PATCH] Fix divergence in indirect profiling (PR gcov-profile/84107).

gcc/ChangeLog:

2018-09-17  Martin Liska  

	PR gcov-profile/84107
	* tree-profile.c (init_ic_make_global_vars):
	Remove ic_void_ptr_var and ic_gcov_type_ptr_var.
	Come up with new ic_tuple* variables.  Emit
	__gcov_indirect_call{,_topn} variables.
	(gimple_gen_ic_profiler): Access the variable
	and emit gimple.
	(gimple_gen_ic_func_profiler): Access
	__gcov_indirect_call.callee field.

libgcc/ChangeLog:

2018-09-17  Martin Liska  

	PR gcov-profile/84107
	* libgcov-profiler.c (__gcov_indirect_call):
	Change type to indirect_call_tuple.
	(struct indirect_call_tuple): New struct.
	(__gcov_indirect_call_topn_profiler): Change type.
	(__gcov_indirect_call_profiler_v2): Use the new
	variables.
	* libgcov.h (struct indirect_call_tuple): New struct
	definition.
---
 gcc/tree-profile.c        | 79 +++
 libgcc/libgcov-profiler.c | 25 -
 libgcc/libgcov.h          |  9 +
 3 files changed, 63 insertions(+), 50 deletions(-)

diff --git a/gcc/tree-profile.c b/gcc/tree-profile.c
index f96bd4b9704..8e1b966be1e 100644
--- a/gcc/tree-profile.c
+++ b/gcc/tree-profile.c
@@ -53,6 +53,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "stringpool.h"
 #include "attribs.h"
 #include "tree-pretty-print.h"
+#include "langhooks.h"
+#include "stor-layout.h"
 
 static GTY(()) tree gcov_type_node;
 static GTY(()) tree tree_interval_profiler_fn;
@@ -64,8 +66,9 @@ static GTY(()) tree tree_ior_profiler_fn;
 static GTY(()) tree tree_time_profiler_counter;
 
 
-static GTY(()) tree ic_void_ptr_var;
-static GTY(()) tree ic_gcov_type_ptr_var;
+static GTY(()) tree ic_tuple_var;
+static GTY(()) tree ic_tuple_counters_field;
+static GTY(()) tree ic_tuple_callee_field;
 static GTY(()) tree ptr_void;
 
 /* Do initialization work for the edge profiler.  */
@@ -81,38 +84,36 @@ init_ic_make_global_vars (void)
   tree gcov_type_ptr;
 
   ptr_void = build_pointer_type (void_type_node);
+  gcov_type_ptr = build_pointer_type (get_gcov_type ());
 
-  ic_void_ptr_var
-= build_decl (UNKNOWN_LOCATION, VAR_D

Re: [PATCH] rtlanal: Fix nonzero_bits for non-load paradoxical subregs (PR85925)

2018-09-18 Thread Eric Botcazou
> Do you have any progress with this?

Once I'm done with the recent regressions, I'll get back to this older one.

-- 
Eric Botcazou


Re: [GCC][PATCH][Aarch64] Added pattern to match zero extended bfxil

2018-09-18 Thread Kyrill Tkachov

Hi all,

On 18/09/18 00:00, Ramana Radhakrishnan wrote:

On Mon, 17 Sep 2018, 23:56 Christophe Lyon, 
wrote:

> On Fri, 14 Sep 2018 at 12:04, Sam Tebbs  wrote:
> >
> >
> >
> > On 08/28/2018 11:54 PM, James Greenhalgh wrote:
> >
> > 
> > >
> > > OK once the other one is approved, with the obvious rebase over the
> renamed
> > > function.
> > >
> > > James
> >
> > Here is the rebased patch. Still OK for me to commit to trunk now that
> > the other patch has been committed?
> >
> > Sam
> >
> > >
> > >> gcc/
> > >> 2018-07-31  Sam Tebbs 
> > >>
> > >>   PR target/85628
> > >>   * config/aarch64/aarch64.md (*aarch64_bfxilsi_uxtw): Define.
> > >>
> > >> gcc/testsuite
> > >> 2018-07-31  Sam Tebbs 
> > >>
> > >>   PR target/85628
> > >>   * gcc.target/aarch64/combine_bfxil.c
> > >> (combine_zero_extended_int, foo6):
> > >>   New functions.
> >
>
> Hi Sam,
>
> This patch causes a regression when using -mabi=ilp32:
> FAIL: gcc.target/aarch64/combine_bfxil.c scan-assembler-not uxtw\\t
>
> How much do we care about ilp32?
>


Let's keep the testsuite clean, please.




This is due to the foo* functions in the test that take a pointer argument and 
store results into it.
In ILP32 the callee clears the top bits with a uxtw. This is irrelevant to the 
pattern this test is exercising.
I'll fix up the test.

Thanks,
Kyrill


Ramana

>
> Christophe
>




Re: [PATCH] PR libstdc++/87135 Rehash only when necessary (LWG2156)

2018-09-18 Thread Jonathan Wakely

On 13/09/18 07:49 +0200, François Dumont wrote:

All changes limited to hashtable_c++0x.cc file.

_Prime_rehash_policy::_M_next_bkt now really does what its comment says,
that is:

  // Return a prime no smaller than n.

_Prime_rehash_policy::_M_need_rehash now rehashes only when _M_next_size
is exceeded, not when it is merely reached.
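
In user-visible terms, the LWG 2156 guarantee that the adapted test
checks is that a reserve is honored exactly: inserting up to the
reserved number of elements must not trigger a rehash.  A minimal
sketch of that guarantee:

```
#include <unordered_map>
#include <cassert>

int main()
{
  std::unordered_map<int, int> m;
  m.reserve(1000);
  const auto buckets = m.bucket_count();
  for (int i = 0; i < 1000; ++i)
    m.emplace(i, i);
  // No rehash may have happened while size() <= the reserved count.
  assert(m.bucket_count() == buckets);
}
```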


    PR libstdc++/87135
    * src/c++11/hashtable_c++0x.cc:
    (_Prime_rehash_policy::_M_next_bkt): Return a prime no smaller than
    requested size, but not necessarily greater.
    (_Prime_rehash_policy::_M_need_rehash): Rehash only if target size is
    strictly greater than next resize threshold.
    * testsuite/23_containers/unordered_map/modifiers/reserve.cc:
    Adapt test to validate that there is no rehash as long as the number
    of insertions is lower than or equal to the reserved number of
    elements.

unordered_map tests successful; OK to commit once all other tests have
completed?


François



diff --git a/libstdc++-v3/src/c++11/hashtable_c++0x.cc 
b/libstdc++-v3/src/c++11/hashtable_c++0x.cc
index a776a8506fe..ec6031b3f5b 100644
--- a/libstdc++-v3/src/c++11/hashtable_c++0x.cc
+++ b/libstdc++-v3/src/c++11/hashtable_c++0x.cc
@@ -46,10 +46,10 @@ namespace __detail
  {
// Optimize lookups involving the first elements of __prime_list.
// (useful to speed-up, eg, constructors)
-static const unsigned char __fast_bkt[13]
-  = { 2, 2, 3, 5, 5, 7, 7, 11, 11, 11, 11, 13, 13 };
+static const unsigned char __fast_bkt[]
+  = { 2, 2, 2, 3, 5, 5, 7, 7, 11, 11, 11, 11, 13, 13 };

-if (__n <= 12)
+if (__n < sizeof(__fast_bkt) / sizeof(unsigned char))


sizeof(unsigned char) is defined to be 1, always.

I think this should be just sizeof(__fast_bkt), or if you're trying to
guard against the type of __fast_bkt changing, then use
sizeof(__fast_bkt) / sizeof(__fast_bkt[0])
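
A standalone illustration of the difference (a sketch, not libstdc++
code):

```
#include <cstddef>

static const unsigned char fast_bkt[] = { 2, 2, 2, 3, 5, 5, 7 };

// Both yield the element count (7) here, but for different reasons:
const std::size_t n1 = sizeof(fast_bkt);
// ... equals the count only because sizeof(unsigned char) == 1.
const std::size_t n2 = sizeof(fast_bkt) / sizeof(fast_bkt[0]);
// ... stays correct even if the element type ever changes.
```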

OK for trunk with either of those, instead of sizeof(unsigned char).

Thanks.




Re: [PATCH 13/25] Create TARGET_DISABLE_CURRENT_VECTOR_SIZE

2018-09-18 Thread Andrew Stubbs

On 17/09/18 20:28, Richard Sandiford wrote:

>> This patch simply disables the cache so that it must ask the backend for the
>> preferred mode for every type.
>
> TBH I'm surprised this works.  Obviously it does, otherwise you wouldn't
> have posted it, but it seems like an accident.  Various parts of the
> vectoriser query current_vector_size and expect it to be stable for
> the current choice of vector size.

Indeed, this is why this remains only a half-baked patch: I wasn't
confident it was the correct or whole solution.

It works in so much as it fixes the immediate problem that I saw -- "no
vector type" -- and makes a bunch of vect.exp testcases happy.

It's quite possible that something else is unhappy with this.

> The underlying problem also affects (at least) base AArch64, SVE and x86_64.
> We try to choose vector types on the fly based only on the type of a given
> scalar value, but in reality, the type we want for a 32-bit element (say)
> often depends on whether the vectorisation region also has smaller or
> larger elements.  And in general we only know that after
> vect_mark_stmts_to_be_vectorized, but we want to know the vector types
> earlier, such as in pattern recognition and while building SLP trees.
> It's a bit of a chicken-and-egg problem...

I don't understand why the number of bits in a vector is the key
information here?

It would make sense if you were to say that the number of elements has
to be fixed in a given region, because obviously that's tied to loop
strides and such, but why the size?

It seems like there is an architecture where you don't want to mix
instruction types (SSE vs. AVX?) and that makes sense for that
architecture, but if that's the case then we need to be able to turn it
off for other architectures.

For GCN, vectors are fully maskable, so we almost want such
considerations to be completely ignored.  We basically want it to act
like it can have any size vector it likes, up to 64 elements.


Andrew


Re: [ARM] Fix ICE during thunk generation with -mlong-calls

2018-09-18 Thread Eric Botcazou
> this seems to contradict your statement above about having to work
> harder to fix up minipools.

Why?  Fixing up minipools is done in the generic ARM reorg pass, not in the 
Thumb reorg pass(es).

> Why do we need a barrier here unconditionally (ie in the non-longcall case)?

We don't, but it doesn't hurt to put one there either.  For example, the
x86, PowerPC and SPARC ports always do it.

-- 
Eric Botcazou


[PATCH][GCC][AARCH64] enable STLUR use: Use STLUR in atomic_store

2018-09-18 Thread Matthew Malcomson
[PATCH][GCC][AARCH64] Use STLUR for atomic_store

Use the STLUR instruction introduced in Armv8.4-a.
This instruction has the store-release semantic like STLR but can take a
9-bit unscaled signed immediate offset.

Example test case:
```
#include <stdint.h>
#include <stdlib.h>
#include <stdatomic.h>

void
foo ()
{
  int32_t *atomic_vals = calloc (4, sizeof (int32_t));
  atomic_store_explicit (atomic_vals + 1, 2, memory_order_release);
}
```

Before patch generates
```
foo:
stp x29, x30, [sp, -16]!
mov x1, 4
mov x0, x1
mov x29, sp
bl  calloc
mov w1, 2
add x0, x0, 4
stlr    w1, [x0]
ldp x29, x30, [sp], 16
ret
```

After patch generates
```
foo:
stp x29, x30, [sp, -16]!
mov x1, 4
mov x0, x1
mov x29, sp
bl  calloc
mov w1, 2
stlur   w1, [x0, 4]
ldp x29, x30, [sp], 16
ret
```

We introduce a new feature flag to indicate the presence of this instruction.
The feature flag is called AARCH64_ISA_RCPC8_4 and is included when targeting
armv8.4 architecture.

We also introduce a new "arch" attribute value, "rcpc8_4", which is checked
against this feature flag.

Full bootstrap and regression test done on aarch64-none-linux-gnu.
Ok for trunk?

gcc/

2018-09-18  Matthew Malcomson  

* config/aarch64/aarch64-protos.h
(aarch64_offset_9bit_signed_unscaled_p): New declaration.
* config/aarch64/aarch64.md (arches): New "rcpc8_4" attribute value.
(arch_enabled): Add check for "rcpc8_4" attribute value of "arch".
* config/aarch64/aarch64.h (AARCH64_FL_RCPC8_4): New bitfield.
(AARCH64_FL_FOR_ARCH8_4): Include AARCH64_FL_RCPC8_4.
(AARCH64_FL_PROFILE): Move index so flags are ordered.
(AARCH64_ISA_RCPC8_4): New flag.
* config/aarch64/aarch64.c (offset_9bit_signed_unscaled_p): Renamed
to aarch64_offset_9bit_signed_unscaled_p.
* config/aarch64/atomics.md (atomic_store<mode>): Allow offset
and use stlur.
* config/aarch64/constraints.md (Ust): New constraint.
* config/aarch64/predicates.md
(aarch64_9bit_offset_memory_operand): New predicate.

gcc/testsuite/

2018-09-18  Matthew Malcomson  

* gcc.target/aarch64/atomic-store.c: New.


### Attachment also inlined for ease of reply###


diff --git a/gcc/config/aarch64/aarch64-protos.h 
b/gcc/config/aarch64/aarch64-protos.h
index 
ef95fc829b83886e2ff00e4664e31af916e99b0c..7a6254e46893fb36dc2ae57e7cfe78af67fb0e49
 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -393,6 +393,7 @@ void aarch64_split_add_offset (scalar_int_mode, rtx, rtx, 
rtx, rtx, rtx);
 bool aarch64_mov_operand_p (rtx, machine_mode);
 rtx aarch64_reverse_mask (machine_mode, unsigned int);
 bool aarch64_offset_7bit_signed_scaled_p (machine_mode, poly_int64);
+bool aarch64_offset_9bit_signed_unscaled_p (machine_mode, poly_int64);
 char *aarch64_output_sve_cnt_immediate (const char *, const char *, rtx);
 char *aarch64_output_sve_addvl_addpl (rtx, rtx, rtx);
 char *aarch64_output_sve_inc_dec_immediate (const char *, rtx);
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index 
c1218503bab19323eee1cca8b7e4bea8fbfcf573..cc21e1656b75b4ed1e94d0eb4b2b3af0039ba47e
 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -157,9 +157,10 @@ extern unsigned aarch64_architecture_version;
 #define AARCH64_FL_SM4   (1 << 17)  /* Has ARMv8.4-A SM3 and SM4.  */
 #define AARCH64_FL_SHA3  (1 << 18)  /* Has ARMv8.4-a SHA3 and 
SHA512.  */
 #define AARCH64_FL_F16FML (1 << 19)  /* Has ARMv8.4-a FP16 extensions.  */
+#define AARCH64_FL_RCPC8_4   (1 << 20)  /* Has ARMv8.4-a RCPC extensions.  */
 
 /* Statistical Profiling extensions.  */
-#define AARCH64_FL_PROFILE   (1 << 20)
+#define AARCH64_FL_PROFILE   (1 << 21)
 
 /* Has FP and SIMD.  */
 #define AARCH64_FL_FPSIMD (AARCH64_FL_FP | AARCH64_FL_SIMD)
@@ -178,7 +179,7 @@ extern unsigned aarch64_architecture_version;
   (AARCH64_FL_FOR_ARCH8_2 | AARCH64_FL_V8_3)
 #define AARCH64_FL_FOR_ARCH8_4 \
   (AARCH64_FL_FOR_ARCH8_3 | AARCH64_FL_V8_4 | AARCH64_FL_F16FML \
-   | AARCH64_FL_DOTPROD)
+   | AARCH64_FL_DOTPROD | AARCH64_FL_RCPC8_4)
 
 /* Macros to test ISA flags.  */
 
@@ -199,6 +200,7 @@ extern unsigned aarch64_architecture_version;
 #define AARCH64_ISA_SM4   (aarch64_isa_flags & AARCH64_FL_SM4)
 #define AARCH64_ISA_SHA3  (aarch64_isa_flags & AARCH64_FL_SHA3)
 #define AARCH64_ISA_F16FML(aarch64_isa_flags & AARCH64_FL_F16FML)
+#define AARCH64_ISA_RCPC8_4   (aarch64_isa_flags & AARCH64_FL_RCPC8_4)
 
 /* Crypto is an optional extension to AdvSIMD.  */
 #define TARGET_CRYPTO (TARGET_SIMD && AARCH64_ISA_CRYPTO)
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 
0d7ca9998466d8d4f9e79faf451a281f8d154d7d..b1a963689

Re: GCC 8 backports

2018-09-18 Thread Martin Liška
Hi.

One more tested patch.

Martin
From ee960f3dcad03652cd133b8598131aed488c11cb Mon Sep 17 00:00:00 2001
From: marxin 
Date: Mon, 17 Sep 2018 10:19:02 +
Subject: [PATCH] Backport r264363

gcc/ChangeLog:

2018-09-17  Martin Liska  

	PR gcov-profile/85871
	* gcov.c (output_intermediate_file): Fix out of bounds
	access.
---
 gcc/gcov.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/gcov.c b/gcc/gcov.c
index 7f06cf66d0c..c7c52ce3629 100644
--- a/gcc/gcov.c
+++ b/gcc/gcov.c
@@ -1068,7 +1068,8 @@ output_intermediate_file (FILE *gcov_file, source_info *src)
 	}
 
   /* Follow with lines associated with the source file.  */
-  output_intermediate_line (gcov_file, &src->lines[line_num], line_num);
+  if (line_num < src->lines.size ())
+	output_intermediate_line (gcov_file, &src->lines[line_num], line_num);
 }
 }
 
-- 
2.18.0



[Patch, fortran] PR87336 [8/9 regression] wrong output for pointer dummy associated to target actual argument

2018-09-18 Thread Paul Richard Thomas
This one came up on clf yesterday. Thanks to Juergen Reuter for posting the PR.

This is further fallout from the array descriptor changes. I decided
not to pick out the special case involved but to set the 'span' field
for all new (ie. 'parm') descriptors. A bit of fiddling around with
gfc_get_array_span was needed to punt on incomplete types, unless a
length could be found in the gfc_expr.

Bootstraps and regtests on FC28/x86_64 - OK for 8- and 9-branches?

Paul

2018-09-18  Paul Thomas  

PR fortran/87336
* trans-array.c (gfc_get_array_span): Try to get the element
length of incomplete types. Return NULL_TREE otherwise.
(gfc_conv_expr_descriptor): Only set the 'span' field if the
above does not return NULL_TREE. Set 'span' field if possible
for all new descriptors.

2018-09-18  Paul Thomas  

PR fortran/87336
* gfortran.dg/pointer_array_10.f90 : New test.
* gfortran.dg/assign_10.f90 : Increase 'parm' count to 20.
* gfortran.dg/transpose_optimization_2.f90 : Increase 'parm'
count to 72.
Index: gcc/fortran/trans-array.c
===
*** gcc/fortran/trans-array.c	(revision 264364)
--- gcc/fortran/trans-array.c	(working copy)
*** gfc_get_array_span (tree desc, gfc_expr
*** 849,858 
else
  {
/* If none of the fancy stuff works, the span is the element
! 	 size of the array.  */
tmp = gfc_get_element_type (TREE_TYPE (desc));
!   tmp = fold_convert (gfc_array_index_type,
! 			  size_in_bytes (tmp));
  }
return tmp;
  }
--- 849,870 
else
  {
/* If none of the fancy stuff works, the span is the element
! 	 size of the array. Attempt to deal with unbounded character
! 	 types if possible. Otherwise, return NULL_TREE.  */
tmp = gfc_get_element_type (TREE_TYPE (desc));
!   if (tmp && TREE_CODE (tmp) == ARRAY_TYPE
! 	  && TYPE_MAX_VALUE (TYPE_DOMAIN (tmp)) == NULL_TREE)
! 	{
! 	  if (expr->expr_type == EXPR_VARIABLE
! 	  && expr->ts.type == BT_CHARACTER)
! 	tmp = fold_convert (gfc_array_index_type,
! gfc_get_expr_charlen (expr));
! 	  else
! 	tmp = NULL_TREE;
! 	}
!   else
! 	tmp = fold_convert (gfc_array_index_type,
! 			size_in_bytes (tmp));
  }
return tmp;
  }
*** gfc_conv_expr_descriptor (gfc_se *se, gf
*** 7074,7080 

  	  /* and set the span field.  */
  	  tmp = gfc_get_array_span (desc, expr);
! 	  gfc_conv_descriptor_span_set (&se->pre, se->expr, tmp);
  	}
  	  else if (se->want_pointer)
  	{
--- 7086,7093 

  	  /* and set the span field.  */
  	  tmp = gfc_get_array_span (desc, expr);
! 	  if (tmp != NULL_TREE)
! 		gfc_conv_descriptor_span_set (&se->pre, se->expr, tmp);
  	}
  	  else if (se->want_pointer)
  	{
*** gfc_conv_expr_descriptor (gfc_se *se, gf
*** 7344,7356 
desc = info->descriptor;
if (se->direct_byref && !se->byref_noassign)
  	{
! 	  /* For pointer assignments we fill in the destination  */
  	  parm = se->expr;
  	  parmtype = TREE_TYPE (parm);
-
- 	  /* and set the span field.  */
- 	  tmp = gfc_get_array_span (desc, expr);
- 	  gfc_conv_descriptor_span_set (&loop.pre, parm, tmp);
  	}
else
  	{
--- 7357,7365 
desc = info->descriptor;
if (se->direct_byref && !se->byref_noassign)
  	{
! 	  /* For pointer assignments we fill in the destination.  */
  	  parm = se->expr;
  	  parmtype = TREE_TYPE (parm);
  	}
else
  	{
*** gfc_conv_expr_descriptor (gfc_se *se, gf
*** 7388,7393 
--- 7397,7407 
  	}
  	}

+   /* Set the span field.  */
+   tmp = gfc_get_array_span (desc, expr);
+   if (tmp != NULL_TREE)
+ 	gfc_conv_descriptor_span_set (&loop.pre, parm, tmp);
+
offset = gfc_index_zero_node;

/* The following can be somewhat confusing.  We have two
Index: gcc/testsuite/gfortran.dg/assign_10.f90
===
*** gcc/testsuite/gfortran.dg/assign_10.f90	(revision 264364)
--- gcc/testsuite/gfortran.dg/assign_10.f90	(working copy)
*** end
*** 23,27 
  ! cases will all yield a temporary, so that atmp appears 18 times.
  ! Note that it is the kind conversion that generates the temp.
  !
! ! { dg-final { scan-tree-dump-times "parm" 18 "original" } }
  ! { dg-final { scan-tree-dump-times "atmp" 18 "original" } }
--- 23,27 
  ! cases will all yield a temporary, so that atmp appears 18 times.
  ! Note that it is the kind conversion that generates the temp.
  !
! ! { dg-final { scan-tree-dump-times "parm" 20 "original" } }
  ! { dg-final { scan-tree-dump-times "atmp" 18 "original" } }
Index: gcc/testsuite/gfortran.dg/pointer_array_10.f90
===
*** gcc/testsuite/gfortran.dg/pointer_array_10.f90	(nonexistent)
--- gcc/testsuite/gfortran.dg/pointer_array_10.f90	(working co

Re: [PATCH v3] combine: perform jump threading at the end

2018-09-18 Thread Ilya Leoshkevich



> Am 17.09.2018 um 19:11 schrieb Segher Boessenkool 
> :
> 
> On Mon, Sep 17, 2018 at 10:50:58AM +0200, Ilya Leoshkevich wrote:
>>> Am 14.09.2018 um 23:35 schrieb Segher Boessenkool 
>>> :
>>> Could you please show generated code before and after this patch?
>>> I mean generated assembler code.  What -S gives you.
>> 
>> Before:
>> 
>> foo4:
>> .LFB0:
>>  .cfi_startproc
>>  lt  %r1,0(%r2)
>>  jne .L2
>>  lhi %r3,1
>>  cs  %r1,%r3,0(%r2)
>> .L2:
>>  jne .L5
>>  br  %r14
>> .L5:
>>  jg  bar
>> 
>> After:
>> 
>> foo4:
>> .LFB0:
>>  .cfi_startproc
>>  lt  %r1,0(%r2)
>>  jne .L4
>>  lhi %r3,1
>>  cs  %r1,%r3,0(%r2)
>>  jne .L4
>>  br  %r14
>> .L4:
>>  jg  bar
> 
> Ah.  And a compiler of some weeks old gives
> 
> foo4:
> .LFB0:
>   .cfi_startproc
>   lhi %r3,0
>   lhi %r4,1
>   cs  %r3,%r4,0(%r2)
>   jne .L4
>   br  %r14
> .L4:
>   jg  bar
> 
> so this is all caused by the recent optimisation that avoids the "cs" if
> it can.

Could you please try building with -march=z13?  I don’t see the „lt“ 
instruction in your output.  On z196+ we try to speed up the code by
jumping around the „cs“ when possible.
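
(The source of foo4 is not quoted in this thread; a hypothetical shape
consistent with the assembly above, for readers following along, would
be:)

```
/* Hypothetical reconstruction -- inferred from the assembly, not taken
   from the actual testcase.  */
void bar (void);

void
foo4 (int *p)
{
  int expected = 0;
  if (!__atomic_compare_exchange_n (p, &expected, 1, 0,
                                    __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST))
    bar ();
}
```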




Re: [ARM] Fix ICE during thunk generation with -mlong-calls

2018-09-18 Thread Richard Earnshaw (lists)
On 18/09/18 10:00, Eric Botcazou wrote:
>> this seems to contradict your statement above about having to work
>> harder to fix up minipools.
> 
> Why?  Fixing up minipools is done in the generic ARM reorg pass, not in the 
> Thumb reorg pass(es).
> 

Ah!  But that still doesn't explain why you want to skip these passes
when building thunks.

>> Why do we need a barrier here unconditionally (ie in the non-longcall case)?
> 
> We don't, but it doesn't hurt to put one there either.  For example, the
> x86, PowerPC and SPARC ports always do it.
> 

So is the barrier correct, or isn't it?  There's really no two ways
about this.  I don't like arbitrary changes that are justified solely on
'that's what another port does'.

R.


Re: [PATCH] gcov: emit hotness colors to easily find hot code.

2018-09-18 Thread Martin Liška
On 9/17/18 8:41 PM, David Malcolm wrote:
> On Wed, 2018-09-12 at 14:36 +0200, Martin Liška wrote:
>> Hi.
>>
>> This is follow-up of:
>> https://gcc.gnu.org/ml/gcc/2018-08/msg00162.html
>>
>> I'm suggesting to introduce using colors in order to indicate hotness
>> of lines. Legend is printed at the very beginning of the output file.
>> Example: https://pasteboard.co/HDxK4Nm.png
> 
> One comment: color seems to be being used for two different purposes:
> (a) in the left-hand column (presumably the profile count), where e.g.
> lines that are 0 are being marked as red
> (b) in the middle column (the line number), lines that are > 50% are
> being marked as red.
> 
> So red seems to be being used both for very hot lines (for the line
> number), and for unexecuted lines (for the profile count).

Thanks for review David.

Yes, it's used for 2 purposes.

> 
> Perhaps the "hotness legend" could instead say something like:
> 
> Colorization: profile count: blah blah blah
> Colorization: line numbers: hotness: > 50 % > 20% > 10%
> 
> to explain both colorization schemes?  (I'm not in love with the above
> wording, but hopefully the idea makes sense).

I welcome that, now one can see:

-:0:Colorization: profile count: zero coverage (exceptional) zero 
coverage (unexceptional) unexecuted block
-:0:Colorization: line numbers: hotness: > 50% > 20% > 10%
...

Hope it's an improvement.

Martin

> 
>> Patch survives gcov.exp test-suite. Will install next week if no
>> objections.
>>
>> Martin
>>
>> gcc/ChangeLog:
>>
>> 2018-09-12  Martin Liska  
>>
>>  * doc/gcov.texi: Document new option --use-hotness-colors.
>>  * gcov.c (struct source_info): Declare new field.
>>  (source_info::source_info): Set default for maximum_count.
>>  (print_usage): Add new -q option.
>>  (process_args): Process it.
>>  (accumulate_line_info): Save src->maximum_count.
>>  (output_line_beginning): Make color line number if
>>  flag_use_hotness_colors is set.
>>  (output_line_details): Pass default argument value.
>>  (output_lines): Pass src->maximum_count.
>> ---
>>  gcc/doc/gcov.texi |  8 ++-
>>  gcc/gcov.c| 56 +--
>> 
>>  2 files changed, 56 insertions(+), 8 deletions(-)
>>
>>

From 3e826e7b4b09a7130ee88122d54fb35ad41b044f Mon Sep 17 00:00:00 2001
From: marxin 
Date: Tue, 18 Sep 2018 12:48:16 +0200
Subject: [PATCH] Improve colorization legend in gcov reports.

gcc/ChangeLog:

2018-09-18  Martin Liska  

	* gcov.c (output_lines): Print colorization legend
	for both flag_use_colors and flag_use_hotness_colors.
	Reword the help.
---
 gcc/gcov.c | 14 --
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/gcc/gcov.c b/gcc/gcov.c
index c09d5060053..0de14dc52af 100644
--- a/gcc/gcov.c
+++ b/gcc/gcov.c
@@ -2935,9 +2935,19 @@ output_lines (FILE *gcov_file, const source_info *src)
   FILE *source_file;
   const char *retval;
 
-  /* Print legend of color hotness syntax.  */
+  /* Print colorization legend.  */
+  if (flag_use_colors)
+fprintf (gcov_file, "%s",
+	 DEFAULT_LINE_START "Colorization: profile count: " \
+	 SGR_SEQ (COLOR_BG_CYAN) "zero coverage (exceptional)" SGR_RESET \
+	 " " \
+	 SGR_SEQ (COLOR_BG_RED) "zero coverage (unexceptional)" SGR_RESET \
+	 " " \
+	 SGR_SEQ (COLOR_BG_MAGENTA) "unexecuted block" SGR_RESET "\n");
+
   if (flag_use_hotness_colors)
-fprintf (gcov_file, "%s", DEFAULT_LINE_START "Hotness legend: " \
+fprintf (gcov_file, "%s",
+	 DEFAULT_LINE_START "Colorization: line numbers: hotness: " \
 	 SGR_SEQ (COLOR_BG_RED) "> 50%" SGR_RESET " " \
 	 SGR_SEQ (COLOR_BG_YELLOW) "> 20%" SGR_RESET " " \
 	 SGR_SEQ (COLOR_BG_GREEN) "> 10%" SGR_RESET "\n");
-- 
2.18.0



Re: [PATCH 13/25] Create TARGET_DISABLE_CURRENT_VECTOR_SIZE

2018-09-18 Thread Richard Sandiford
Andrew Stubbs  writes:
> On 17/09/18 20:28, Richard Sandiford wrote:
>>> This patch simply disables the cache so that it must ask the backend for the
>>> preferred mode for every type.
>> 
>> TBH I'm surprised this works.  Obviously it does, otherwise you wouldn't
>> have posted it, but it seems like an accident.  Various parts of the
>> vectoriser query current_vector_size and expect it to be stable for
>> the current choice of vector size.
>
> Indeed, this is why this remains only a half-baked patch: I wasn't 
> confident it was the correct or whole solution.
>
> It works in so much as it fixes the immediate problem that I saw -- "no 
> vector type" -- and makes a bunch of vect.exp testcases happy.
>
> It's quite possible that something else is unhappy with this.
>
>> The underlying problem also affects (at least) base AArch64, SVE and x86_64.
>> We try to choose vector types on the fly based only on the type of a given
>> scalar value, but in reality, the type we want for a 32-bit element (say)
>> often depends on whether the vectorisation region also has smaller or
>> larger elements.  And in general we only know that after
>> vect_mark_stmts_to_be_vectorized, but we want to know the vector types
>> earlier, such as in pattern recognition and while building SLP trees.
>> It's a bit of a chicken-and-egg problem...
>
> I don't understand why the number of bits in a vector is the key 
> information here?

Arguably it shouldn't be, and it's really just a proxy for the vector
(sub)architecture.  But this is "should be" vs. "is" :-)

> It would make sense if you were to say that the number of elements has 
> to be fixed in a given region, because obviously that's tied to loop 
> strides and such, but why the size?
>
> It seems like there is an architecture where you don't want to mix 
> instruction types (SSE vs. AVX?) and that makes sense for that 
> architecture, but if that's the case then we need to be able to turn it 
> off for other architectures.

It's not about trying to avoid mixing vector sizes: from what Jakub
said earlier in the year, even x86 wants to do that (but can't yet).
The idea is instead to try the available possibilities.

E.g. for AArch64 we want to try SVE, 128-bit Advanced SIMD and
64-bit Advanced SIMD.  With something like:

  int *ip;
  short *sp;
  for (int i = 0; i < n; ++i)
    ip[i] = sp[i];

there are three valid choices for Advanced SIMD:

(1) use 1 128-bit vector of sp and 2 128-bit vectors of ip
(2) use 1 64-bit vector of sp and 2 64-bit vectors of ip
(3) use 1 64-bit vector of sp and 1 128-bit vector of ip

At the moment we only try (1) and (2), but in practice, (3) should be
better than (2) in most cases.  I guess in some ways trying all three
would be best, but if we only try two, trying (1) and (3) is better
than trying (1) and (2).
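
In concrete-type terms, choice (3) could look like the sketch below,
written with GNU vector extensions purely for illustration (the type
names and the function are assumptions, not vectoriser output):

```
typedef short v4hi __attribute__ ((vector_size (8)));   /* 4 x 16 bits */
typedef int   v4si __attribute__ ((vector_size (16)));  /* 4 x 32 bits */

void
copy4 (int *restrict ip, const short *restrict sp)
{
  v4hi s = *(const v4hi *) sp;          /* one 64-bit load of sp */
  v4si d = { s[0], s[1], s[2], s[3] };  /* widen each lane */
  *(v4si *) ip = d;                     /* one 128-bit store to ip */
}
```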

For:

  for (int i = 0; i < n; ++i)
    ip[i] += 1;

there are two valid choices for Advanced SIMD:

(4) use 1 128-bit vector of ip
(5) use 1 64-bit vector of ip

The problem for the current autovec set-up is that the ip type for
64-bit Advanced SIMD varies between (3) and (5): for (3) it's a
128-bit vector type and for (5) it's a 64-bit vector type.
So the type we want for a given vector subarchitecture is partly
determined by the other types in the region: it isn't simply a
function of the subarchitecture and the element type.

This is why the current autovec code only supports (1), (2),
(4) and (5).  And I think this is essentially the same limitation
that you're hitting.

> For GCN, vectors are fully maskable, so we almost want such 
> considerations to be completely ignored.  We basically want it to act 
> like it can have any size vector it likes, up to 64 elements.

SVE is similar.  But even for SVE there's an equivalent trade-off
between (1) and (3):

(1') use 1 fully-populated vector for sp and 2 fully-populated
 vectors for ip
(3') use 1 half-populated vector for sp and 1 fully-populated
 vector for ip

Which is best for more complicated examples depends on the balance
between ip-based work and sp-based work.  The packing and unpacking
in (1') has a cost, but it would pay off if there was much more
sp work than ip work, since in that case (3') would spend most
of its time operating on partially-populated vectors.

Would the same be useful for GCN, or do you basically always
want a VF of 64?

None of this is a fundamental restriction in theory.  It's just
something that needs to be fixed.

One approach would be to get the loop vectoriser to iterate over the
number of lanes the target supports instead of all possible vector
sizes.  The problem is that on its own this would mean trying 4
lane counts even on targets with a single supported vector size.
So we'd need to do something a bit smarter...

Richard


Re: [PATCH v3] combine: perform jump threading at the end

2018-09-18 Thread Ilya Leoshkevich
> Am 17.09.2018 um 19:11 schrieb Segher Boessenkool 
> :
> 
> On Mon, Sep 17, 2018 at 10:50:58AM +0200, Ilya Leoshkevich wrote:
>>> Am 14.09.2018 um 23:35 schrieb Segher Boessenkool 
>>> :
> 
>>> Why does the existing jump threading not work for you; should it happen
>>> at another time?
>> 
>> We call cleanup_cfg (CLEANUP_THREADING) only once - during the „jump“
>> pass, which happens before combine.  There is also „jump2“ pass, which
>> happens afterwards, and after discussion with Ulrich Weigand I tried to
>> move jump threading there.  While this change had the desired effect on
>> the testcase, the code got worse in another places.
> 
> Yeah, jump2 is quite late.  I think you should do it before postreload_cse
> or similar.

Thanks, I will give it a try.

> 
> Btw, what percentage of files (or subroutines) does jump threading run
> after combine, with your patch?
> 

I checked this on SPEC CPU 2006.

Turns out it’s not as good as I expected: the additional jump threading
is performed on 64% of the functions.  I think this is because I only
check whether PATTERNs have any side-effects, but I don’t look at
INSN_CODEs.  I'll try to change this and see whether I'll get a better
number.

Here are some other figures:

- The total number of cleanup_cfg calls: +3.3%.
- The total run time of cleanup_cfg:     +5.5%.
- The average run time of cleanup_cfg:   +2.1%.



Re: [PATCH] Cleanup strcpy/stpcpy no nul warning code

2018-09-18 Thread Bernd Edlinger
On 09/18/18 07:31, Jeff Law wrote:
> On 9/17/18 1:18 PM, Bernd Edlinger wrote:
>> On 09/17/18 20:32, Jeff Law wrote:
>>> On 9/17/18 12:20 PM, Bernd Edlinger wrote:
 On 09/17/18 19:33, Jeff Law wrote:
> On 9/16/18 1:58 PM, Bernd Edlinger wrote:
>> Hi,
>>
>> this is a cleanup of the recently added strlen/strcpy/stpcpy
>> no nul warning code.
>>
>> Most importantly it moves the SSA_NAME handling from
>> unterminated_array to string_constant, thereby fixing
>> another round of xfails in the strlen and stpcpy test cases.
>>
>> I need to say that the fix for bug 86622 is relying in
>> type info on the pointer which is probably not safe in
>> GIMPLE in the light of the recent discussion.
>>
>> I had to add two further exceptions, which should
>> be safe in general: that is a pointer arithmentic on a string
>> literal is okay, and arithmetic on a string constant
>> that is exactly the size of the whole DECL, cannot
>> access an adjacent member.
>>
>>
>> Bootstrapped and reg-tested on x86_64-pc-linux-gnu.
>> Is it OK for trunk?
>>
>>
>> Thanks
>> Bernd.
>>
>>
>> patch-cleanup-no-nul.diff
>>
>> gcc:
>> 2018-09-16  Bernd Edlinger  
>>
>>  * builtins.h (unterminated_array): Remove prototype.
>>  * builtins.c (unterminated_array): Simplify.  Make static.
>>  (expand_builtin_strcpy_args): Don't use TREE_NO_WARNING here.
>>  (expand_builtin_stpcpy_1): Remove warn_string_no_nul without effect.
>>  * expr.c (string_constant): Handle SSA_NAME.  Add more exceptions
>>  where pointer arithmetic is safe.
>>  * gimple-fold.c (get_range_strlen): Handle nonstr like in c_strlen.
>>  (get_max_strlen): Remove the unnecessary mynonstr handling.
>>  (gimple_fold_builtin_strcpy): Simplify.
>>  (gimple_fold_builtin_stpcpy): Simplify.
>>  (gimple_fold_builtin_sprintf): Remove NO_WARNING propagation
>>  without effect.
>>  (gimple_fold_builtin_strlen): Simplify.
>>
>> testsuite:
>> 2018-09-16  Bernd Edlinger  
>>
>>  * gcc.dg/warn-stpcpy-no-nul.c: Remove xfails.
>>  * gcc.dg/warn-strlen-no-nul.c: Remove xfails.
>>
>> Index: gcc/builtins.c
>> ===
>> --- gcc/builtins.c   (revision 264342)
>> +++ gcc/builtins.c   (working copy)
>> @@ -567,31 +567,12 @@ warn_string_no_nul (location_t loc, const char *fn
>>the declaration of the object of which the array is a member or
>>element.  Otherwise return null.  */
>> 
>> -tree
>> +static tree
>> unterminated_array (tree exp)
>> {
>> -  if (TREE_CODE (exp) == SSA_NAME)
>> -{
>> -  gimple *stmt = SSA_NAME_DEF_STMT (exp);
>> -  if (!is_gimple_assign (stmt))
>> -return NULL_TREE;
>> -
>> -  tree rhs1 = gimple_assign_rhs1 (stmt);
>> -  tree_code code = gimple_assign_rhs_code (stmt);
>> -  if (code == ADDR_EXPR
>> -  && TREE_CODE (TREE_OPERAND (rhs1, 0)) == ARRAY_REF)
>> -rhs1 = rhs1;
>> -  else if (code != POINTER_PLUS_EXPR)
>> -return NULL_TREE;
>> -
>> -  exp = rhs1;
>> -}
>> -
>>   tree nonstr = NULL;
>> -  if (c_strlen (exp, 1, &nonstr, 1) == NULL && nonstr)
>> -return nonstr;
>> -
>> -  return NULL_TREE;
>> +  c_strlen (exp, 1, &nonstr);
>> +  return nonstr;
>> }
> Sigh.  This is going to conflict with patch #6 from Martin's kit.
>

 Sorry, it just feels wrong to do this folding here instead of in
 string_constant.

 I think the handling of POINTER_PLUS_EXPR above, is faulty,
 because it is ignoring the offset parameter, which typically
 contains the offset part, may add an offset to a different
 structure member or another array index.  That is what happened
 in PR 86622.

 So you are likely looking at the wrong index, or even the wrong
 structure member.
>>> I'm not disagreeing that something's wrong here -- the whole concept
>>> that we extract the rhs and totally ignore the offset seems wrong.  I've
>>> stumbled over it working through issues with either patch #4 or #6 from
>>> Martin.  But I felt I needed to go back and reevaluate any assumptions I
>>> had about how the code was supposed to be used before I called it out.
>>>
>>>
>>> Your expr.c changes may be worth pushing through independently of the
>>> rest.AFAICT they're really just exposing more cases where we can
>>> determine that we're working with a stirng constant.
>>>
>>
>> I think that moves just the folding from here to expr.c,
>> I don't see how the change in part #6 on unterminated_array
>> is supposed to work, I quote it here and add some comments:
>>
> Essentially there's a couple of concepts he wants to get in

Re: [PATCH] S/390: Fix conditional returns

2018-09-18 Thread Ilya Leoshkevich
> Am 18.09.2018 um 02:57 schrieb Jeff Law :
> 
> On 9/5/18 2:34 AM, Ilya Leoshkevich wrote:
>> --- a/gcc/cfgcleanup.c
>> +++ b/gcc/cfgcleanup.c
>> @@ -2624,7 +2624,7 @@ bb_is_just_return (basic_block bb, rtx_insn **ret, 
>> rtx_insn **use)
>>   {
>>  rtx pat = PATTERN (insn);
>> 
>> -if (!*ret && ANY_RETURN_P (pat))
>> +if (!*ret && (ANY_RETURN_P (pat) || return_in_parallel (pat)))
>>*ret = insn;
>>  else if (!*ret && !*use && GET_CODE (pat) == USE
>>  && REG_P (XEXP (pat, 0))
> So what else is in the return insn that requires you to test for
> return_in_parallel?  If we're going to allow a return in a parallel,
> here I think we need to tighten down its form given the intended ways
> bb_is_just_return is supposed to be used.  Essentially other side
> effects would seem to be forbidden in the parallel.  ie, you could have
> a PARALLEL with a return and use inside, but not a return with anything
> else inside (such as a clobber).

Yes, it’s RETURN+USE.  I allowed CLOBBERs, because bb_is_just_return
already allows them, but I don’t think it's necessary for the S/390 use
case, so I can make it more restrictive if needed.

> 
> Why do you need to make a copy of the parallel RTX in redirect_exp_1?
> We don't do it for other cases -- why can't we just use validate_change
> like all the other RTXs?
> 
> 
>> /* Throughout LOC, redirect OLABEL to NLABEL.  Treat null OLABEL or
>>NLABEL as a return.  Accrue modifications into the change group.  */
>> 
>> @@ -1437,9 +1457,22 @@ redirect_exp_1 (rtx *loc, rtx olabel, rtx nlabel, 
>> rtx_insn *insn)
>>   if ((code == LABEL_REF && label_ref_label (x) == olabel)
>>   || x == olabel)
>> {
>> -  x = redirect_target (nlabel);
>> -  if (GET_CODE (x) == LABEL_REF && loc == &PATTERN (insn))
>> -x = gen_rtx_SET (pc_rtx, x);
>> +  rtx *nret = return_in_parallel (nlabel);
>> +
>> +  if (nret)
>> +{
>> +  rtx npat;
>> +
>> +  x = *nret;
>> +  npat = copy_update_parallel (nlabel, nret, PATTERN (insn));
>> +  validate_change (insn, &PATTERN (insn), npat, 1);
>> +}
>> +  else
>> +{
>> +  x = redirect_target (nlabel);
>> +  if (GET_CODE (x) == LABEL_REF && loc == &PATTERN (insn))
>> +x = gen_rtx_SET (pc_rtx, x);
>> +}
>>   validate_change (insn, loc, x, 1);
> Why the need to use copy_update_parallel here?  Is there a reason why
> validate_change is insufficient?

The original RTL has the following form:

(jump_insn 1 (set %pc (if_then_else (ne %cc 0) (label_ref 2) %pc)))
...
(code_label 2)
(jump_insn 3 (parallel [(return) (use %r14)]))

and the goal is to transform (jump_insn 1) to:

(jump_insn 1
 (parallel
  [(set %pc (if_then_else (ne %cc 0) (return) %pc))
   (use %r14)]))

while keeping (code_label 2) and (jump_insn 3) intact.  So I have to
create a new PARALLEL based on the original one.

> 
> 
>>   return;
>> }
>> @@ -1551,10 +1584,15 @@ void
>> redirect_jump_2 (rtx_jump_insn *jump, rtx olabel, rtx nlabel, int 
>> delete_unused,
>>   int invert)
>> {
>> +  rtx *ret;
>>   rtx note;
>> 
>>   gcc_assert (JUMP_LABEL (jump) == olabel);
>> 
>> +  ret = return_in_parallel (nlabel);
>> +  if (ret)
>> +nlabel = *ret;
> Why does return_in_parallel return an rtx *?  Can't you just return the
> rtx and avoid the unnecessary dereferencing?  I guess this ultimately
> comes back to why can't you use validate_change like everyone else in
> redirect_exp_1?

Right, this is related.  This is to indicate to copy_update_parallel
which of the side-effects needs to be updated.



RE: [PATCH][GCC][Arm] Fix subreg crash in different way by enabling the FP16 pattern unconditionally.

2018-09-18 Thread Tamar Christina
Hi All,

I'm looking for a backport of this patch to GCC8.

Ok for backport?

Thanks,
Tamar

> -Original Message-
> From: Kyrill Tkachov 
> Sent: Wednesday, August 15, 2018 15:25
> To: Tamar Christina ; Thomas Preudhomme
> 
> Cc: gcc-patches@gcc.gnu.org; nd ; Ramana Radhakrishnan
> ; Richard Earnshaw
> ; ni...@redhat.com
> Subject: Re: [PATCH][GCC][Arm] Fix subreg crash in different way by
> enabling the FP16 pattern unconditionally.
> 
> Hi Tamar,
> 
> On 26/07/18 12:01, Tamar Christina wrote:
> > Hi Thomas,
> >
> > > -Original Message-
> > > From: Thomas Preudhomme 
> > > Sent: Thursday, July 26, 2018 09:29
> > > To: Tamar Christina 
> > > Cc: gcc-patches@gcc.gnu.org; nd ; Ramana
> Radhakrishnan
> > > ; Richard Earnshaw
> > > ; ni...@redhat.com; Kyrylo Tkachov
> > > 
> > > Subject: Re: [PATCH][GCC][Arm] Fix subreg crash in different way by
> > > enabling the FP16 pattern unconditionally.
> > >
> > > Hi Tamar,
> > >
> > > On Wed, 25 Jul 2018 at 16:28, Tamar Christina
> > > 
> > > wrote:
> > > >
> > > > Hi Thomas,
> > > >
> > > > Thanks for the review!
> > > >
> > > > > >
> > > > > > I don't believe the TARGET_FP16 guard to be needed, because
> > > > > > the pattern doesn't actually generate code and requires
> > > > > > another pattern for that, and a reg to reg move should always
> > > > > > be possible anyway. So allowing the force to register here is
> > > > > > safe and it allows the compiler to generate a correct error
> > > > > > instead of ICEing in an
> > > infinite loop.
> > > > >
> > > > > How about subreg to subreg move? Doesn't that expand to more
> > > > > insns (subreg to reg and reg to subreg)? Couldn't you improve
> > > > > the logic to check that there is actually a mode change so that
> > > > > if there isn't (like moving from one subreg to another) just expand to
> a single move?
> > > > >
> > > >
> > > > Yes, but that is not a new issue. My patch is simply removing the
> > > > TARGET_FP16 restrictions and merging two patterns that should be
> > > > one
> > > using an iterator and nothing more.
> > > >
> > > > The redundant mov is already there and a different issue than the
> > > > ICE I'm
> > > trying to fix.
> > >
> > > It's there for movv4hf and movv6hf but your patch extends this
> > > problem to movv2sf and movv4sf as well.
> >
> > I don't understand how it can. My patch just replaces one pattern for
> > V4HF and one for V8HF with one pattern operating on VH.
> >
> > ;; Vector modes for 16-bit floating-point support.
> > (define_mode_iterator VH [V8HF V4HF])
> >
> > My pattern has absolutely no effect on V2SF and V4SF or any of the other
> modes.
> >
> > >
> > > >
> > > > None of the code inside the expander is needed at all, the code
> > > > really only has an effect on subreg to subreg moves, as
> > > > `force_reg` doesn't do
> > > anything when it's argument is already a reg.
> > > >
> > > > The comment in the expander (which was already there) is wrong.
> > > > The
> > > > *reason* the ICE is fixed isn't because of the `force_reg`. It's
> > > > because of the mere presence of the expander itself. The expander
> > > > matches the standard mov$a optab and so this prevents
> > > emit_move_insn_1 from doing the move by subwords as it finds a
> > > pattern that's able to do the move.
> > >
> > > Could you then fix the comment in your patch as well? I hadn't
> > > understood the force_reg was not key here. You might want to update
> > > the following sentence from your patch description if you are going
> > > to include it in your commit message:
> >
> > I'll update the comment in the patch. The cover letter won't be
> > included in the commit, but it does accurately reflect the current
> > state of affairs. The patch will do the force_reg, it's just not the
> > reason it works.
> >
> > >
> > > The way this is worked around in the back-end is that we have move
> > > patterns in neon.md that usually just force the register instead of
> > > checking with the back-end.
> > >
> > > "The way this is worked around (..) that just force the register" is
> > > what led me to believe the force_reg was important.
> > >
> > > >
> > > > The expander however always falls through and doesn’t stop RTL
> > > > generation. You could remove all the code in there and have it
> > > > properly match the *neon_mov instructions which will do the right
> > > > thing later at code generation time and avoid the redundant moves.
> > > > My
> > > guess is the original `force_reg` was copied from the other patterns
> > > like `movti` and the existing `mov<mode>`. There it makes sense
> > > because the operands can be MEM or anything general_operand.
> > > >
> > > > However the redundant moves are a different problem than what I'm
> > > > trying to solve here. So I think that's another patch which
> > > > requires further
> > > testing.
> > >
> > > I was just thinking of restricting when does the force_reg happens
> > > but if it can be removed completely I agree it should probably be
> > > done in a

Re: [C++ Patch PING] [C++ Patch] PR 85065 ("[concepts] ICE with invalid use of a concept")

2018-09-18 Thread Jason Merrill
On Mon, Sep 17, 2018 at 1:53 PM, Paolo Carlini  wrote:
> Hi again,
>
> On 9/3/18 10:59 PM, Paolo Carlini wrote:
>>
>> in this error-recovery ICE, upon the error make_constrained_auto assigns
>> error_mark_node to PLACEHOLDER_TYPE_CONSTRAINTS (type) which then causes a
>> crash later when hash_placeholder_constraint is called on it. I think we
>> should cope with this somehow, I believe that consistency with the way we
>> use error_mark_node in this part of the front-end prevents us from avoiding
>> to assign the error_mark_node in the first place and, for the reasons
>> explained in my previous patch, we want to unconditionally call
>> make_constrained_auto. This said, catching in practice the error_mark_node
>> would normally mean renouncing to the pattern 'if (tree c = ...)' which we
>> lately appear to like a lot and seems indeed neat. Thus I'm wondering if we
>> want instead to add a macro like ERROR_AS_NULL, which of course would be
>> also useful in many other places - essentially in all the circumstances
>> where we want to check for a kosher node, thus neither null nor
>> error_mark_node. What do you think? What about the name, in case? Tested
>> x86_64-linux.
>
>
> Today I reviewed again this issue, for which I sent a tentative patch a
> couple of weeks ago. All in all, I still believe that is the right place to
> catch the error_mark_node and avoid ICE-ing later, the quick rationale being
> that PLACEHOLDER_TYPE_CONSTRAINTS can be error_mark_node for other reasons
> too. As regards the ERROR_AS_NULL idea, I'm still not sure, on one hand it
> would allow for more compact and neat code in some cases, on the other hand
> could be seen as some sort of obfuscation - well, some people out there
> consider an obfuscation the very 'if (c =...)' pattern ;) Anyway, I'm
> attaching the normal version of the fix, which, per a recent message from
> Jason, is probably almost obvious...

Hmm, I do kind of like the ERROR_AS_NULL idea.  I might call it
NON_ERROR, though.  OK with that change.

Jason
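
A minimal sketch of the idea, assuming a straightforward definition
(an assumption for illustration, not the committed patch):

```
/* Hypothetical definition: map error_mark_node to NULL_TREE so the
   "if (tree c = ...)" pattern also filters out erroneous nodes.  */
#define NON_ERROR(NODE) ((NODE) == error_mark_node ? NULL_TREE : (NODE))

/* At the crash site discussed above, the usual pattern keeps working: */
if (tree c = NON_ERROR (PLACEHOLDER_TYPE_CONSTRAINTS (type)))
  /* ... use c, now neither NULL_TREE nor error_mark_node ... */;
```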


Re: [PATCH][GCC][Arm] Fix subreg crash in different way by enabling the FP16 pattern unconditionally.

2018-09-18 Thread Kyrill Tkachov

Hi Tamar,

On 18/09/18 13:55, Tamar Christina wrote:

Hi All,

I'm looking for a backport of this patch to GCC8.

Ok for backport?



Ok if testing on that branch shows no problems.

Thanks,
Kyrill


Thanks,
Tamar

> -Original Message-
> From: Kyrill Tkachov 
> Sent: Wednesday, August 15, 2018 15:25
> To: Tamar Christina ; Thomas Preudhomme
> 
> Cc: gcc-patches@gcc.gnu.org; nd ; Ramana Radhakrishnan
> ; Richard Earnshaw
> ; ni...@redhat.com
> Subject: Re: [PATCH][GCC][Arm] Fix subreg crash in different way by
> enabling the FP16 pattern unconditionally.
>
> Hi Tamar,
>
> On 26/07/18 12:01, Tamar Christina wrote:
> > Hi Thomas,
> >
> > > -Original Message-
> > > From: Thomas Preudhomme 
> > > Sent: Thursday, July 26, 2018 09:29
> > > To: Tamar Christina 
> > > Cc: gcc-patches@gcc.gnu.org; nd ; Ramana
> Radhakrishnan
> > > ; Richard Earnshaw
> > > ; ni...@redhat.com; Kyrylo Tkachov
> > > 
> > > Subject: Re: [PATCH][GCC][Arm] Fix subreg crash in different way by
> > > enabling the FP16 pattern unconditionally.
> > >
> > > Hi Tamar,
> > >
> > > On Wed, 25 Jul 2018 at 16:28, Tamar Christina
> > > 
> > > wrote:
> > > >
> > > > Hi Thomas,
> > > >
> > > > Thanks for the review!
> > > >
> > > > > >
> > > > > > I don't believe the TARGET_FP16 guard to be needed, because
> > > > > > the pattern doesn't actually generate code and requires
> > > > > > another pattern for that, and a reg to reg move should always
> > > > > > be possible anyway. So allowing the force to register here is
> > > > > > safe and it allows the compiler to generate a correct error
> > > > > > instead of ICEing in an
> > > infinite loop.
> > > > >
> > > > > How about subreg to subreg move? Doesn't that expand to more
> > > > > insns (subreg to reg and reg to subreg)? Couldn't you improve
> > > > > the logic to check that there is actually a mode change so that
> > > > > if there isn't (like moving from one subreg to another) just expand to
> a single move?
> > > > >
> > > >
> > > > Yes, but that is not a new issue. My patch is simply removing the
> > > > TARGET_FP16 restrictions and merging two patterns that should be
> > > > one
> > > using an iterator and nothing more.
> > > >
> > > > The redundant mov is already there and a different issue than the
> > > > ICE I'm
> > > trying to fix.
> > >
> > > It's there for movv4hf and movv6hf but your patch extends this
> > > problem to movv2sf and movv4sf as well.
> >
> > I don't understand how it can. My patch just replaces one pattern for
> > V4HF and one for V8HF with one pattern operating on VH.
> >
> > ;; Vector modes for 16-bit floating-point support.
> > (define_mode_iterator VH [V8HF V4HF])
> >
> > My pattern has absolutely no effect on V2SF and V4SF or any of the other
> modes.
> >
> > >
> > > >
> > > > None of the code inside the expander is needed at all, the code
> > > > really only has an effect on subreg to subreg moves, as
> > > > `force_reg` doesn't do
> > > anything when it's argument is already a reg.
> > > >
> > > > The comment in the expander (which was already there) is wrong.
> > > > The
> > > > *reason* the ICE is fixed isn't because of the `force_reg`. It's
> > > > because of the mere presence of the expander itself. The expander
> > > > matches the standard mov$a optab and so this prevents
> > > > emit_move_insn_1 from doing the move by subwords as it finds a
> > > > pattern that's able to do the move.
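To make that concrete, here is the relevant dispatch in emit_move_insn_1,
paraphrased from expr.c as an editorial sketch rather than quoted verbatim:

    /* If the target provides a mov expander for this mode, use it
       directly; only when there is none is the move split into
       subwords.  */
    enum insn_code code = optab_handler (mov_optab, mode);
    if (code != CODE_FOR_nothing)
      return emit_insn (GEN_FCN (code) (x, y));
    /* ... CONCAT and subreg special cases elided ...  */
    return emit_move_multi_word (mode, x, y);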
> > >
> > > Could you then fix the comment in your patch as well? I hadn't
> > > understood the force_reg was not key here. You might want to update
> > > the following sentence from your patch description if you are going
> > > to include it in your commit message:
> >
> > I'll update the comment in the patch. The cover letter won't be
> > included in the commit, but it does accurately reflect the current
> > state of affairs.  The patch will do the force_reg; it's just not the
> > reason it works.
> >
> > >
> > > The way this is worked around in the back-end is that we have move
> > > patterns in neon.md that usually just force the register instead of
> > > checking with the back-end.
> > >
> > > "The way this is worked around (..) that just force the register" is
> > > what led me to believe the force_reg was important.
> > >
> > > >
> > > > The expander however always falls through and doesn’t stop RTL
> > > > generation. You could remove all the code in there and have it
> > > > properly match the *neon_mov instructions which will do the right
> > > > thing later at code generation time and avoid the redundant moves.
> > > > My
> > > > guess is the original `force_reg` was copied from the other patterns
> > > > like `movti` and the existing `mov`.  There it makes sense
> > > > because the operands can be MEM or anything general_operand.
> > > >
> > > > However the redundant moves are a different problem than what I'm
> > > > trying to solve here. So I think that's another patch which
> > > > requires further
> > > > testing.
> > >
> > > I was just thinking of restricting wh

Re: [PATCH] Fix PR63155

2018-09-18 Thread Richard Biener
On Mon, 17 Sep 2018, Richard Biener wrote:

> 
> The following fixes the memory use at -O0 from out-of-SSA coalescing
> (conflict computation in particular) with lots of abnormals (setjmp
> calls).  After reviewing I found no reason to not use
> compute_optimized_partition_bases since we populate the coalesce lists
> and used_in_copy pieces correctly at -O0 and using 
> compute_optimized_partition_bases avoids having gigantic bases for
> type-equivalent variables, specifically:
> 
> -   /* This restricts what anonymous SSA names we can coalesce
> -  as it restricts the sets we compute conflicts for.
> -  Using TREE_TYPE to generate sets is the easiest as
> -  type equivalency also holds for SSA names with the same
> -  underlying decl.
> -
> -  Check gimple_can_coalesce_p when changing this code.  */
> -   m->base.from = (TYPE_CANONICAL (TREE_TYPE (var))
> -   ? TYPE_CANONICAL (TREE_TYPE (var))
> -   : TREE_TYPE (var));
> 
> but instead uses bases that only include things that we desire to
> coalesce later.  Memory use of the conflict bitmaps is quadratic
> in the sizes of the base sets, so splitting into more bases is
> desirable (and doesn't change code generation).
> 
> {,-O0} Bootstrap and regtest running on x86_64-unknown-linux-gnu.
> 
> I still have one other memory/compile-time improvement patch but
> I lack a testcase that shows a measurable difference.
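For intuition on the quadratic claim above (an editorial sketch, not from
the thread): with base sets of sizes n_1, ..., n_k summing to N, the
conflict bitmaps cost on the order of

    \sum_i n_i^2 \;\le\; \Bigl(\sum_i n_i\Bigr)^2 = N^2

with strict inequality as soon as two bases are non-empty, so splitting
into more, smaller bases strictly lowers the bound without changing which
names may coalesce.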

Re-bootstrapped and tested the following which adds a small
cleanup to gimple_can_coalesce_p, simplifying it a bit.

Committed to trunk.  If there's no fallout I plan to backport this.

Richard.

2018-09-17  Richard Biener  

PR middle-end/63155
* tree-ssa-coalesce.c (tree_int_map_hasher): Remove.
(compute_samebase_partition_bases): Likewise.
(coalesce_ssa_name): Always use compute_optimized_partition_bases.
(gimple_can_coalesce_p): Simplify.

Index: gcc/tree-ssa-coalesce.c
===
--- gcc/tree-ssa-coalesce.c (revision 264369)
+++ gcc/tree-ssa-coalesce.c (working copy)
@@ -1579,22 +1579,9 @@ gimple_can_coalesce_p (tree name1, tree
 
   /* If the types are not the same, see whether they are compatible.  This
  (for example) allows coalescing when the types are fundamentally the
- same, but just have different names.
-
- In the non-optimized case, we must first test TYPE_CANONICAL because
- we use it to compute the partition_to_base_index of the map.  */
-  if (flag_tree_coalesce_vars)
-{
-  if (types_compatible_p (t1, t2))
-   goto check_modes;
-}
-  else
-{
-  if (TYPE_CANONICAL (t1)
- && TYPE_CANONICAL (t1) == TYPE_CANONICAL (t2)
- && types_compatible_p (t1, t2))
-   goto check_modes;
-}
+ same, but just have different names.  */
+  if (types_compatible_p (t1, t2))
+goto check_modes;
 
   return false;
 }
@@ -1720,89 +1707,6 @@ compute_optimized_partition_bases (var_m
   partition_delete (tentative);
 }
 
-/* Hashtable helpers.  */
-
-struct tree_int_map_hasher : nofree_ptr_hash <tree_int_map>
-{
-  static inline hashval_t hash (const tree_int_map *);
-  static inline bool equal (const tree_int_map *, const tree_int_map *);
-};
-
-inline hashval_t
-tree_int_map_hasher::hash (const tree_int_map *v)
-{
-  return tree_map_base_hash (v);
-}
-
-inline bool
-tree_int_map_hasher::equal (const tree_int_map *v, const tree_int_map *c)
-{
-  return tree_int_map_eq (v, c);
-}
-
-/* This routine will initialize the basevar fields of MAP with base
-   names.  Partitions will share the same base if they have the same
-   SSA_NAME_VAR, or, being anonymous variables, the same type.  This
-   must match gimple_can_coalesce_p in the non-optimized case.  */
-
-static void
-compute_samebase_partition_bases (var_map map)
-{
-  int x, num_part;
-  tree var;
-  struct tree_int_map *m, *mapstorage;
-
-  num_part = num_var_partitions (map);
-  hash_table<tree_int_map_hasher> tree_to_index (num_part);
-  /* We can have at most num_part entries in the hash tables, so it's
- enough to allocate so many map elements once, saving some malloc
- calls.  */
-  mapstorage = m = XNEWVEC (struct tree_int_map, num_part);
-
-  /* If a base table already exists, clear it, otherwise create it.  */
-  free (map->partition_to_base_index);
-  map->partition_to_base_index = (int *) xmalloc (sizeof (int) * num_part);
-
-  /* Build the base variable list, and point partitions at their bases.  */
-  for (x = 0; x < num_part; x++)
-{
-  struct tree_int_map **slot;
-  unsigned baseindex;
-  var = partition_to_var (map, x);
-  if (SSA_NAME_VAR (var)
- && (!VAR_P (SSA_NAME_VAR (var))
- || !DECL_IGNORED_P (SSA_NAME_VAR (var))))
-   m->base.from = SSA_NAME_VAR (var);
-  else
-   /* This restricts what anonymous SSA names we can coalesce
-  as it restricts the sets we compute conflicts for.
-  Using

[AArch64][testsuite][committed] Fix gcc.target/aarch64/combine_bfxil.c for -mabi=ilp32

2018-09-18 Thread Kyrill Tkachov

Hi all,

As described in https://gcc.gnu.org/ml/gcc-patches/2018-09/msg00963.html this 
test generates UXTW instructions with -mabi=ilp32
because the foo* functions take pointers and store results into them. In ILP32 
the callee clears the top bits with a UXTW.
This trips the scan-assembler-not UXTW test that checks that the zero_extend 
form of the BFXIL pattern is used, which it is.

This patch avoids this problem by not passing pointers to the results, but 
instead using global variables for which the foo* functions
will synthesise the address using ADRP, avoiding the UXTW instructions.

With this patch the test PASSes fully with -mabi=ilp32 and still PASSes on LP64.

Committing to trunk as obvious.

Thanks,
Kyrill

2018-09-18  Kyrylo Tkachov  

* gcc.target/aarch64/combine_bfxil.c: Avoid passing pointers to
functions.
diff --git a/gcc/testsuite/gcc.target/aarch64/combine_bfxil.c b/gcc/testsuite/gcc.target/aarch64/combine_bfxil.c
index 98f6159fbda0978dd863f16d5d88dbff8c4b343f..84e5377ce9a10953f50b7c13ed06563bef014a55 100644
--- a/gcc/testsuite/gcc.target/aarch64/combine_bfxil.c
+++ b/gcc/testsuite/gcc.target/aarch64/combine_bfxil.c
@@ -19,7 +19,7 @@ combine_balanced (unsigned long long a, unsigned long long b)
 unsigned long long
 combine_minimal (unsigned long long a, unsigned long long b)
 {
-  return (a & 0xfffe) | (b & 0x0001);
+  return (a & 0xfffell) | (b & 0x0001ll);
 }
 
 unsigned long long
@@ -40,77 +40,77 @@ combine_unbalanced_int (unsigned int a, unsigned int b)
   return (a & 0xff00ll) | (b & 0x00ffll);
 }
 
+unsigned long long c, d;
+
 __attribute__ ((noinline)) void
-foo (unsigned long long a, unsigned long long b, unsigned long long *c,
-  unsigned long long *d)
+foo (unsigned long long a, unsigned long long b)
 {
-  *c = combine_minimal (a, b);
-  *d = combine_minimal (b, a);
+  c = combine_minimal (a, b);
+  d = combine_minimal (b, a);
 }
 
 __attribute__ ((noinline)) void
-foo2 (unsigned long long a, unsigned long long b, unsigned long long *c,
-  unsigned long long *d)
+foo2 (unsigned long long a, unsigned long long b)
 {
-  *c = combine_balanced (a, b);
-  *d = combine_balanced (b, a);
+  c = combine_balanced (a, b);
+  d = combine_balanced (b, a);
 }
 
 __attribute__ ((noinline)) void
-foo3 (unsigned long long a, unsigned long long b, unsigned long long *c,
-  unsigned long long *d)
+foo3 (unsigned long long a, unsigned long long b)
 {
-  *c = combine_unbalanced (a, b);
-  *d = combine_unbalanced (b, a);
+  c = combine_unbalanced (a, b);
+  d = combine_unbalanced (b, a);
 }
 
+unsigned int ic, id;
+
 void
-foo4 (unsigned int a, unsigned int b, unsigned int *c, unsigned int *d)
+foo4 (unsigned int a, unsigned int b)
 {
-  *c = combine_balanced_int (a, b);
-  *d = combine_balanced_int (b, a);
+  ic = combine_balanced_int (a, b);
+  id = combine_balanced_int (b, a);
 }
 
 void
-foo5 (unsigned int a, unsigned int b, unsigned int *c, unsigned int *d)
+foo5 (unsigned int a, unsigned int b)
 {
-  *c = combine_unbalanced_int (a, b);
-  *d = combine_unbalanced_int (b, a);
+  ic = combine_unbalanced_int (a, b);
+  id = combine_unbalanced_int (b, a);
 }
 
 void
-foo6 (unsigned int a, unsigned int b, unsigned long long *c, unsigned long long *d)
+foo6 (unsigned int a, unsigned int b)
 {
-  *c = combine_zero_extended_int(a, b);
-  *d = combine_zero_extended_int(b, a);
+  c = combine_zero_extended_int(a, b);
+  d = combine_zero_extended_int(b, a);
 }
 
 int
 main (void)
 {
-  unsigned long long a = 0x0123456789ABCDEF, b = 0xFEDCBA9876543210, c, d;
-  foo3 (a, b, &c, &d);
+  unsigned long long a = 0x0123456789ABCDEF, b = 0xFEDCBA9876543210;
+  foo3 (a, b);
   if (c != 0x0123456789543210) abort ();
   if (d != 0xfedcba9876abcdef) abort ();
-  foo2 (a, b, &c, &d);
+  foo2 (a, b);
   if (c != 0x0123456776543210) abort ();
   if (d != 0xfedcba9889abcdef) abort ();
-  foo (a, b, &c, &d);
+  foo (a, b);
   if (c != 0x0123456789abcdee) abort ();
   if (d != 0xfedcba9876543211) abort ();
 
-  unsigned int a2 = 0x01234567, b2 = 0xFEDCBA98, c2, d2;
-  foo4 (a2, b2, &c2, &d2);
-  if (c2 != 0x0123ba98) abort ();
-  if (d2 != 0xfedc4567) abort ();
-  foo5 (a2, b2, &c2, &d2);
-  if (c2 != 0x01234598) abort ();
-  if (d2 != 0xfedcba67) abort ();
-
-  unsigned long long c3, d3;
-  foo6 (a2, b2, &c3, &d3);
-  if (c3 != 0x0123ba98) abort ();
-  if (d3 != 0xfedc4567) abort ();
+  unsigned int a2 = 0x01234567, b2 = 0xFEDCBA98;
+  foo4 (a2, b2);
+  if (ic != 0x0123ba98) abort ();
+  if (id != 0xfedc4567) abort ();
+  foo5 (a2, b2);
+  if (ic != 0x01234598) abort ();
+  if (id != 0xfedcba67) abort ();
+
+  foo6 (a2, b2);
+  if (c != 0x0123ba98) abort ();
+  if (d != 0xfedc4567) abort ();
   return 0;
 }
 


[PATCH][AArch64][committed] Fix gcc.target/aarch64/spellcheck_1.c and spellcheck_4.c

2018-09-18 Thread Kyrill Tkachov

Hi all,

These two tests started failing after commit r264335 that adjusted the cutoff 
point at which the diagnostic suggestions machinery decides a suggestion is 
meaningful.
For these tests it means we no longer suggest anything as an alternative to 
"armv8-a-typo" as an "arch=" pargma value. We do still list the valid options, 
we just don't prefer one particular value over the others.

When I first wrote this test it wasn't with a particular architecture 
suggestion in mind, but rather to test that the suggestion machinery is being 
sanely invoked.
So this patch changes the dg-message check to treat the "did you mean...?" hunk 
as optional (in case the heuristics in the suggestions machinery change again).

With this patch the two tests PASS again on aarch64.

Committing to trunk as obvious.

Thanks,
Kyrill

2018-09-18  Kyrylo Tkachov  

* gcc.target/aarch64/spellcheck_1.c:
Make architecture suggestion optional.
* gcc.target/aarch64/spellcheck_4.c:
Likewise.
diff --git a/gcc/testsuite/gcc.target/aarch64/spellcheck_1.c b/gcc/testsuite/gcc.target/aarch64/spellcheck_1.c
index f57e0c5463299ea1c2b369fb00ca3fc005cb567b..a0795c1cc532208fcb04ceefa6877deac4a2d749 100644
--- a/gcc/testsuite/gcc.target/aarch64/spellcheck_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/spellcheck_1.c
@@ -3,7 +3,7 @@
 __attribute__((target ("arch=armv8-a-typo"))) void
 foo ()
 {
-  /* { dg-message "valid arguments are: \[^\n\r]*; did you mean 'armv8-a'?"  "" { target *-*-* } .-1 } */
+  /* { dg-message "valid arguments are: \[^\n\r]*(; did you mean 'armv*'?)?"  "" { target *-*-* } .-1 } */
   /* { dg-error "invalid name \\(\"armv8-a-typo\"\\) in 'target\\(\"arch=\"\\)' pragma or attribute"  "" { target *-*-* } .-2 } */
   /* { dg-error "pragma or attribute 'target\\(\"arch=armv8-a-typo\"\\)' is not valid"  "" { target *-*-* } .-3 } */
 }
diff --git a/gcc/testsuite/gcc.target/aarch64/spellcheck_4.c b/gcc/testsuite/gcc.target/aarch64/spellcheck_4.c
index 6f66fdc98e28d32dec46d79747cb31a94e9d7529..37c9d3c4d61607803641a42d24a135d28bf375bb 100644
--- a/gcc/testsuite/gcc.target/aarch64/spellcheck_4.c
+++ b/gcc/testsuite/gcc.target/aarch64/spellcheck_4.c
@@ -8,4 +8,4 @@ foo ()
 }
 
 /* { dg-error "unknown value 'armv8-a-typo' for -march"  "" { target *-*-* } 0 } */
-/* { dg-message "valid arguments are: \[^\n\r]*; did you mean 'armv8-a'?"  "" { target *-*-* } 0 } */
+/* { dg-message "valid arguments are: \[^\n\r]*(; did you mean 'armv*'?)?"  "" { target *-*-* } 0 } */


[PATCH] Fix gcc.dg/warn-abs-1.c for arm and aarch64-none-elf

2018-09-18 Thread Kyrill Tkachov

Hi all,

This new test has some difficulties on the fabsl function.
On arm this is because we don't support the _Float128 type which the test uses.
This is handled in the patch by requiring a float128 target selector.

On aarch64-none-elf, a Newlib target, it fails because fabsl is not available.
long double support is known to be incomplete in newlib, and the fabsl function 
is not available
for targets where long double is larger than a double.
Therefore this patch skips the test on such targets.

Ok for trunk?

Thanks,
Kyrill

2018-09-18  Kyrylo Tkachov  

* gcc.dg/warn-abs-1.c: Require float128 target.
Skip if large_long_double newlib target.
diff --git a/gcc/testsuite/gcc.dg/warn-abs-1.c b/gcc/testsuite/gcc.dg/warn-abs-1.c
index 6aa937c3a2e9921e90969911550eebf2965ffdb4..129a3af8ac69a93596e98c6e50089fe9b74fe3d0 100644
--- a/gcc/testsuite/gcc.dg/warn-abs-1.c
+++ b/gcc/testsuite/gcc.dg/warn-abs-1.c
@@ -1,4 +1,5 @@
-/* { dg-do compile } */
+/* { dg-do compile { target float128 } } */
+/* { dg-skip-if "incomplete long double support" { { newlib } && large_long_double } }  */
 /* { dg-options "-Wabsolute-value" } */
 
 #include 


[PATCH c++/86881] -Wshadow-local-compatible ICE

2018-09-18 Thread Nathan Sidwell
This ICE happens when we try and figure if an unresolved auto type is 
compatible with a previous binding.  With the addition of auto, we're 
now checking too early, but that's a harder problem.   This fixes the 
ICE, but the downside is we don't warn at all if the auto resolves to a 
compatible type.  But that's not a new problem.


Applying to trunk and gcc-8

nathan
--
Nathan Sidwell
2018-09-18  Nathan Sidwell  

	PR c++/86881
	cp/
	* name-lookup.c (check_local_shadow): Ignore auto types.

	testsuite/
	* g++.dg/warn/pr86881.C: New.

Index: gcc/cp/name-lookup.c
===
--- gcc/cp/name-lookup.c	(revision 264371)
+++ gcc/cp/name-lookup.c	(working copy)
@@ -2764,6 +2764,13 @@ check_local_shadow (tree decl)
 	   && (same_type_p (TREE_TYPE (old), TREE_TYPE (decl))
 		   || (!dependent_type_p (TREE_TYPE (decl))
 		   && !dependent_type_p (TREE_TYPE (old))
+		   /* If the new decl uses auto, we don't yet know
+			  its type (the old type cannot be using auto
+			  at this point, without also being
+			  dependent).  This is an indication we're
+			  (now) doing the shadow checking too
+			  early.  */
+		   && !type_uses_auto (TREE_TYPE (decl))
 		   && can_convert (TREE_TYPE (old), TREE_TYPE (decl),
				       tf_none))))
	warning_code = OPT_Wshadow_compatible_local;
Index: gcc/testsuite/g++.dg/warn/pr86881.C
===
--- gcc/testsuite/g++.dg/warn/pr86881.C	(revision 0)
+++ gcc/testsuite/g++.dg/warn/pr86881.C	(working copy)
@@ -0,0 +1,20 @@
+// PR c++/86881 ICE with shadow warning
+// { dg-do compile { target c++11 } }
+// { dg-additional-options { -Wshadow-compatible-local } }
+
+void a() {
+  auto b([] {});
+  {
+auto b = 0;
+  }
+}
+
+struct Proxy { };
+
+void Two ()
+{
+  auto my = Proxy ();
+  {
+auto my = Proxy (); // { dg-warning "shadows" "" { xfail *-*-* } }
+  };
+}


Re: [PATCH]: Allow TARGET_SCHED_ADJUST_PRIORITY hook to reduce priority

2018-09-18 Thread Jeff Law
On 9/18/18 1:52 AM, Andreas Schwab wrote:
> On Sep 17 2018, Jeff Law  wrote:
> 
>> On 9/17/18 3:08 AM, Andreas Schwab wrote:
>>> PR rtl-optimization/85458
>>> * sel-sched.c (sel_target_adjust_priority): Remove wrong
>>> assertion.
>> Under what conditions is the new priority negative?  Without digging
>> deep into the ia64 port that just seems wrong.
> 
> It is created in create_speculation_check:
But that says nothing about why this happens or whether or not it's a
valid state.



 /* Decrease priority of check by difference of load/check instruction
 latencies.  */
  EXPR_PRIORITY (INSN_EXPR (insn))
    -= (sel_vinsn_cost (INSN_VINSN (orig_insn))
	- sel_vinsn_cost (INSN_VINSN (insn)));

There's nothing inherently wrong with having one cost be higher than the
other.  So I don't think this is a problem with computing costs in the
target files.  This feels a lot more like a bug in sel-sched.c


My inclination would be to declare negative priorities invalid and clamp
the value in sel-sched.c.  There are other places in sel-sched.c that look
fishy as well.  Of course my other inclination would be to kill
sel-sched completely.
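For illustration, a minimal sketch of that clamping idea (names assumed;
not a committed fix):

    /* Hypothetical: saturate the adjusted priority at zero instead of
       asserting, treating negative values as invalid.  */
    static int
    clamped_priority (int priority, int delta)
    {
      int new_priority = priority - delta;  /* delta may exceed priority */
      return new_priority < 0 ? 0 : new_priority;
    }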

Jeff


Re: [PATCH] Fix gcc.dg/warn-abs-1.c for arm and aarch64-none-elf

2018-09-18 Thread Jeff Law
On 9/18/18 7:45 AM, Kyrill Tkachov wrote:
> Hi all,
> 
> This new test has some difficulties on the fabsl function.
> On arm this is because we don't support the _Float128 type which the
> test uses.
> This is handled in the patch by requiring a float128 target selector.
> 
> On aarch64-none-elf, a Newlib target, it fails because fabsl is not
> available.
> long double support is known to be incomplete in newlib, and the fabsl
> function is not available
> for targets where long double is larger than a double.
> Therefore this patch skips the test on such targets.
> 
> Ok for trunk?
> 
> Thanks,
> Kyrill
> 
> 2018-09-18  Kyrylo Tkachov  
> 
>     * gcc.dg/warn-abs-1.c: Require float128 target.
>     Skip if large_long_double newlib target.

OK
jeff


Re: [PATCH] Fix gcc.dg/warn-abs-1.c for arm and aarch64-none-elf

2018-09-18 Thread Rainer Orth
Hi Kyrill,

> This new test has some difficulties on the fabsl function.
> On arm this is because we don't support the _Float128 type which the test 
> uses.
> This is handled in the patch by requiring a float128 target selector.
>
> On aarch64-none-elf, a Newlib target, it fails because fabsl is not available.
> long double support is known to be incomplete in newlib, and the fabsl
> function is not available
> for targets where long double is larger than a double.
> Therefore this patch skips the test on such targets.

this is PR testsuite/87339.  Please fix the ChangeLog accordingly.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [PATCH] S/390: Fix conditional returns

2018-09-18 Thread Segher Boessenkool
On Wed, Sep 05, 2018 at 10:34:48AM +0200, Ilya Leoshkevich wrote:
> S/390 epilogue ends with (parallel [(return) (use %r14)]) instead of
> the more usual (return) or (simple_return).  This sequence is not
> recognized by the conditional return logic in try_optimize_cfg ().

Why does it need this?  Other targets with a link register make
EPILOGUE_USES handle this.

If you really need a parallel, can you make ANY_RETURN_P recognise it?
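For illustration, a hedged sketch of that second suggestion (helper name
assumed; not an actual patch):

    /* Recognise (parallel [(return) (use %r14)]) as a return by looking
       at the first element of the PARALLEL.  */
    static bool
    parallel_return_p (const_rtx x)
    {
      if (GET_CODE (x) == PARALLEL && XVECLEN (x, 0) > 0)
        x = XVECEXP (x, 0, 0);
      return GET_CODE (x) == RETURN || GET_CODE (x) == SIMPLE_RETURN;
    }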

> +/* Create a copy of PARALLEL with side-effect OSIDE replaced by NSIDE.  */
> +static rtx
> +copy_update_parallel (rtx par, rtx *oside, rtx nside)
> +{
> +  rtx npar;
> +  int i;
> +
> +  npar = gen_rtx_PARALLEL (GET_MODE (par), rtvec_alloc (XVECLEN (par, 0)));
> +  for (i = XVECLEN (par, 0) - 1; i >= 0; i--)
> +{
> +  rtx *side_effect = &XVECEXP (par, 0, i);
> +
> +  if (side_effect == oside)
> + XVECEXP (npar, 0, i) = nside;
> +  else
> + XVECEXP (npar, 0, i) = copy_rtx (*side_effect);
> +}
> +  return npar;
> +}

This doesn't work if nside is used anywhere else.  But the only caller
uses the previous instruction pattern; maybe make a function specialised
to that only?  You could give it a better name, too ;-)  (It is especially
surprising because the function is called copy_* but it does _not_ copy
its argument!)


Segher


Re: [PATCH][GCC][AARCH64] enable STLUR use: Use STLUR in atomic_store

2018-09-18 Thread Richard Earnshaw (lists)
On 18/09/18 10:15, Matthew Malcomson wrote:
> [PATCH][GCC][AARCH64] Use STLUR for atomic_store
> 
> Use the STLUR instruction introduced in Armv8.4-a.
> This instruction has the store-release semantic like STLR but can take a
> 9-bit unscaled signed immediate offset.
> 
> Example test case:
> ```
> void
> foo ()
> {
> int32_t *atomic_vals = calloc (4, sizeof (int32_t));
> atomic_store_explicit (atomic_vals + 1, 2, memory_order_release);
> }
> ```
> 
> Before patch generates
> ```
> foo:
>   stp x29, x30, [sp, -16]!
>   mov x1, 4
>   mov x0, x1
>   mov x29, sp
>   bl  calloc
>   mov w1, 2
>   add x0, x0, 4
>   stlrw1, [x0]
>   ldp x29, x30, [sp], 16
>   ret
> ```
> 
> After patch generates
> ```
> foo:
>   stp x29, x30, [sp, -16]!
>   mov x1, 4
>   mov x0, x1
>   mov x29, sp
>   bl  calloc
>   mov w1, 2
>   stlur   w1, [x0, 4]
>   ldp x29, x30, [sp], 16
>   ret
> ```
> 
> We introduce a new feature flag to indicate the presence of this instruction.
> The feature flag is called AARCH64_ISA_RCPC8_4 and is included when targeting
> armv8.4 architecture.
> 
> We also introduce an "arch" attribute to be checked called "rcpc8_4" after 
> this
> feature flag.
> 
> Full bootstrap and regression test done on aarch64-none-linux-gnu.
> Ok for trunk?
> 
> gcc/
> 
> 2018-09-18  Matthew Malcomson  
> 
>   * config/aarch64/aarch64-protos.h
>   (aarch64_offset_9bit_signed_unscaled_p): New declaration.
>   * config/aarch64/aarch64.md (arches): New "rcpc8_4" attribute value.
>   (arch_enabled): Add check for "rcpc8_4" attribute value of "arch".
>   * config/aarch64/aarch64.h (AARCH64_FL_RCPC8_4): New bitfield.
>   (AARCH64_FL_FOR_ARCH8_4): Include AARCH64_FL_RCPC8_4.
>   (AARCH64_FL_PROFILE): Move index so flags are ordered.
>   (AARCH64_ISA_RCPC8_4): New flag.
>   * config/aarch64/aarch64.c (offset_9bit_signed_unscaled_p): Renamed
>   to aarch64_offset_9bit_signed_unscaled_p.
>   * config/aarch64/atomics.md (atomic_store<mode>): Allow offset
>   and use stlur.
>   * config/aarch64/constraints.md (Ust): New constraint.
>   * config/aarch64/predicates.md
>   (aarch64_9bit_offset_memory_operand): New predicate.
> 
> gcc/testsuite/
> 
> 2018-09-18  Matthew Malcomson  
> 
>   * gcc.target/aarch64/atomic-store.c: New.
> 
> 
> ### Attachment also inlined for ease of reply
> ###
> 
> 
> diff --git a/gcc/config/aarch64/aarch64-protos.h 
> b/gcc/config/aarch64/aarch64-protos.h
> index 
> ef95fc829b83886e2ff00e4664e31af916e99b0c..7a6254e46893fb36dc2ae57e7cfe78af67fb0e49
>  100644
> --- a/gcc/config/aarch64/aarch64-protos.h
> +++ b/gcc/config/aarch64/aarch64-protos.h
> @@ -393,6 +393,7 @@ void aarch64_split_add_offset (scalar_int_mode, rtx, rtx, 
> rtx, rtx, rtx);
>  bool aarch64_mov_operand_p (rtx, machine_mode);
>  rtx aarch64_reverse_mask (machine_mode, unsigned int);
>  bool aarch64_offset_7bit_signed_scaled_p (machine_mode, poly_int64);
> +bool aarch64_offset_9bit_signed_unscaled_p (machine_mode, poly_int64);
>  char *aarch64_output_sve_cnt_immediate (const char *, const char *, rtx);
>  char *aarch64_output_sve_addvl_addpl (rtx, rtx, rtx);
>  char *aarch64_output_sve_inc_dec_immediate (const char *, rtx);
> diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
> index 
> c1218503bab19323eee1cca8b7e4bea8fbfcf573..cc21e1656b75b4ed1e94d0eb4b2b3af0039ba47e
>  100644
> --- a/gcc/config/aarch64/aarch64.h
> +++ b/gcc/config/aarch64/aarch64.h
> @@ -157,9 +157,10 @@ extern unsigned aarch64_architecture_version;
>  #define AARCH64_FL_SM4 (1 << 17)  /* Has ARMv8.4-A SM3 and SM4.  
> */
>  #define AARCH64_FL_SHA3(1 << 18)  /* Has ARMv8.4-a SHA3 and 
> SHA512.  */
>  #define AARCH64_FL_F16FML (1 << 19)  /* Has ARMv8.4-a FP16 extensions.  
> */
> +#define AARCH64_FL_RCPC8_4(1 << 20)  /* Has ARMv8.4-a RCPC extensions.  
> */
>  
>  /* Statistical Profiling extensions.  */
> -#define AARCH64_FL_PROFILE(1 << 20)
> +#define AARCH64_FL_PROFILE(1 << 21)
>  
>  /* Has FP and SIMD.  */
>  #define AARCH64_FL_FPSIMD (AARCH64_FL_FP | AARCH64_FL_SIMD)
> @@ -178,7 +179,7 @@ extern unsigned aarch64_architecture_version;
>(AARCH64_FL_FOR_ARCH8_2 | AARCH64_FL_V8_3)
>  #define AARCH64_FL_FOR_ARCH8_4   \
>(AARCH64_FL_FOR_ARCH8_3 | AARCH64_FL_V8_4 | AARCH64_FL_F16FML \
> -   | AARCH64_FL_DOTPROD)
> +   | AARCH64_FL_DOTPROD | AARCH64_FL_RCPC8_4)
>  
>  /* Macros to test ISA flags.  */
>  
> @@ -199,6 +200,7 @@ extern unsigned aarch64_architecture_version;
>  #define AARCH64_ISA_SM4 (aarch64_isa_flags & AARCH64_FL_SM4)
>  #define AARCH64_ISA_SHA3(aarch64_isa_flags & AARCH64_FL_SHA3)
>  #define AARCH64_ISA_F16FML  (aarch64_isa_flags & AARCH64_FL_F16FML)
> +#define AARCH64_ISA_RCPC8_4 (aarch64_isa_flags & AARCH

Re: [PATCH] Fix gcc.dg/warn-abs-1.c for arm and aarch64-none-elf

2018-09-18 Thread Kyrill Tkachov



On 18/09/18 14:59, Rainer Orth wrote:

Hi Kyrill,


This new test has some difficulties on the fabsl function.
On arm this is because we don't support the _Float128 type which the test uses.
This is handled in the patch by requiring a float128 target selector.

On aarch64-none-elf, a Newlib target, it fails because fabsl is not available.
long double support is known to be incomplete in newlib, and the fabsl
function is not available
for targets where long double is larger than a double.
Therefore this patch skips the test on such targets.

this is PR testsuite/87339.  Please fix the ChangeLog accordingly.


Ah, thanks. I've adjusted the ChangeLog entry with r264393.

Kyrill


Rainer





[PATCH] PR other/87353 fix formatting and grammar in manual

2018-09-18 Thread Jonathan Wakely

The changes to invoke.texi in r242433 left some unwanted spaces that
texi2pod.pl interprets as verbatim formatting. There are also some
grammatical errors due to the removal of references to GCJ, where the
G++ driver is referred to in the plural.

PR other/87353
* doc/invoke.texi (Link Options): Fix formatting and grammar.

Committing to trunk as obvious. I'll also backport it to gcc-7 and
gcc-8 branches.

commit ebf978995cadb23e91d112d859f0e3f5dfb055ad
Author: Jonathan Wakely 
Date:   Tue Sep 18 15:07:45 2018 +0100

PR other/87353 fix formatting and grammar in manual

The changes to invoke.texi in r242433 left some unwanted spaces that
texi2pod.pl interprets as verbatim formatting. There are also some
grammatical errors due to the removal of references to GCJ, where the
G++ driver is referred to in the plural.

PR other/87353
* doc/invoke.texi (Link Options): Fix formatting and grammar.

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 94304c314cf..685c211e176 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -12625,9 +12625,9 @@ of these is when the application wishes to throw and 
catch exceptions
 across different shared libraries.  In that case, each of the libraries
 as well as the application itself should use the shared @file{libgcc}.
 
-Therefore, the G++ and driver automatically adds @option{-shared-libgcc}
- whenever you build a shared library or a main executable, because C++
- programs typically use exceptions, so this is the right thing to do.
+Therefore, the G++ driver automatically adds @option{-shared-libgcc}
+whenever you build a shared library or a main executable, because C++
+programs typically use exceptions, so this is the right thing to do.
 
 If, instead, you use the GCC driver to create shared libraries, you may
 find that they are not always linked with the shared @file{libgcc}.
@@ -12641,8 +12641,7 @@ propagate through such shared libraries, without 
incurring relocation
 costs at library load time.
 
 However, if a library or main executable is supposed to throw or catch
-exceptions, you must link it using the G++ driver, as appropriate
-for the languages used in the program, or using the option
+exceptions, you must link it using the G++ driver, or using the option
 @option{-shared-libgcc}, such that it is linked with the shared
 @file{libgcc}.
 


Re: [PATCH v3] combine: perform jump threading at the end

2018-09-18 Thread Segher Boessenkool
On Tue, Sep 18, 2018 at 12:22:14PM +0200, Ilya Leoshkevich wrote:
> 
> 
> > Am 17.09.2018 um 19:11 schrieb Segher Boessenkool 
> > :
> > 
> > On Mon, Sep 17, 2018 at 10:50:58AM +0200, Ilya Leoshkevich wrote:
> >>> Am 14.09.2018 um 23:35 schrieb Segher Boessenkool 
> >>> :
> >>> Could you please show generated code before and after this patch?
> >>> I mean generated assembler code.  What -S gives you.
> >> 
> >> Before:
> >> 
> >> foo4:
> >> .LFB0:
> >>.cfi_startproc
> >>lt  %r1,0(%r2)
> >>jne .L2
> >>lhi %r3,1
> >>cs  %r1,%r3,0(%r2)
> >> .L2:
> >>jne .L5
> >>br  %r14
> >> .L5:
> >>jg  bar
> >> 
> >> After:
> >> 
> >> foo4:
> >> .LFB0:
> >>.cfi_startproc
> >>lt  %r1,0(%r2)
> >>jne .L4
> >>lhi %r3,1
> >>cs  %r1,%r3,0(%r2)
> >>jne .L4
> >>br  %r14
> >> .L4:
> >>jg  bar
> > 
> > Ah.  And a compiler of some weeks old gives
> > 
> > foo4:
> > .LFB0:
> > .cfi_startproc
> > lhi %r3,0
> > lhi %r4,1
> > cs  %r3,%r4,0(%r2)
> > jne .L4
> > br  %r14
> > .L4:
> > jg  bar
> > 
> > so this is all caused by the recent optimisation that avoids the "cs" if
> > it can.
> 
> Could you please try building with -march=z13?  I don’t see the „lt“ 
> instruction in your output.  On z196+ we try to speed up the code by
> jumping around the „cs“ when possible.

This was a few weeks old source (because it didn't bootstrap elsewhere).
This is s390-linux-gcc -Wall -W -march=z196 -O2 as in the PR (and the
compiler has largely default options).

-march=z13 has identical output on that compiler.



Segher


Re: [PATCH v3] combine: perform jump threading at the end

2018-09-18 Thread Segher Boessenkool
On Tue, Sep 18, 2018 at 01:29:56PM +0200, Ilya Leoshkevich wrote:
> > Yeah, jump2 is quite late.  I think you should do it before postreload_cse
> > or similar.
> 
> Thanks, I will give it a try.
> 
> > Btw, what percentage of files (or subroutines) does jump threading run
> > after combine, with your patch?
> 
> I checked this on SPEC CPU 2006.
> 
> Turns out it’s not as good as I expected: the additional jump threading
> is performed on 64% of the functions.  I think this is because I only
> check whether PATTERNs have any side-effects, but I don’t look at
> INSN_CODEs.  I'll try to change this and see whether I'll get a better
> number.
> 
> Here are some other figures:
> 
> - The total number of cleanup_cfg calls: +3.3%.
> - The total run time of cleanup_cfg: +5.5%.
> - The average run time of cleanup_cfg:   +2.1%.

Thanks for all those numbers!

So IMO it is not worth doing this only _some_ of the time after combine,
given it doesn't help much and requires awful code, and awful code
duplication (setting state no less, eww).

It is also clear that jump threading indeed is quite a bit more expensive
than the average cleanup_cfg.

But it is also clear that in the grand scheme of things doing it always
does not cost much.

I wonder if it also helps on other targets to do an extra jump threading
(say right before postreload_cse)?  That would pretty much seal the deal
I'd say.


Segher


Re: [rs6000] Do not generate sibling call to nested function

2018-09-18 Thread Segher Boessenkool
Hi!

On Mon, Sep 17, 2018 at 11:11:53AM +0200, Eric Botcazou wrote:
> more precisely, to a nested function that requires a static chain.  The 
> reason 
> is that the sibling call epilogue may clobber the static chain register r11.
> 
> Tested on PowerPC/Linux, OK for the mainline?
> 
> 
> 2018-09-17  Eric Botcazou  
> 
>   * config/rs6000/rs6000.c (rs6000_function_ok_for_sibcall): Return false
>   if the call takes a static chain.


> --- config/rs6000/rs6000.c(revision 264342)
> +++ config/rs6000/rs6000.c(working copy)
> @@ -24332,6 +24332,10 @@ rs6000_function_ok_for_sibcall (tree dec
>  {
>tree fntype;
>  
> +  /* The sibcall epilogue may clobber the static chain register.  */
> +  if (CALL_EXPR_STATIC_CHAIN (exp))
> +return false;
> +
>if (decl)
>  fntype = TREE_TYPE (decl);
>else

We could probably make sibcalls work with static chain, but that is most
likely not worth it.  Could you change the comment to say something like
that?

Approved for trunk and backports.  Thanks!


Segher


Re: C++ PATCH to implement P1064R0, Virtual Function Calls in Constant Expressions (v4)

2018-09-18 Thread Marek Polacek
On Mon, Sep 17, 2018 at 11:28:06PM -0400, Jason Merrill wrote:
> On Mon, Sep 17, 2018 at 5:39 PM, Marek Polacek  wrote:
> > On Fri, Sep 14, 2018 at 04:45:22PM -0400, Marek Polacek wrote:
> >> On Fri, Sep 14, 2018 at 04:30:46PM -0400, Jason Merrill wrote:
> >> > On Fri, Sep 14, 2018 at 1:19 PM, Marek Polacek  
> >> > wrote:
> >> > > This patch implements another bit of C++20, virtual calls in constant
> >> > > expression:
> >> > > 
> >> > > The basic idea is that since in a constant expression we know the 
> >> > > dynamic
> >> > > type (to detect invalid code etc.), the restriction that prohibits 
> >> > > virtual
> >> > > calls is unnecessary.
> >> > >
> >> > > Handling virtual function calls turned out to be fairly easy (as 
> >> > > anticipated);
> >> > > I simply let the constexpr machinery figure out the dynamic type and 
> >> > > then
> >> > > OBJ_TYPE_REF_TOKEN gives us the index into the virtual function table. 
> >> > >  That
> >> > > way we get the function decl we're interested in, and 
> >> > > cxx_eval_call_expression
> >> > > takes it from there.
> >> > >
> >> > > But handling pointer-to-virtual-member-functions doesn't work like 
> >> > > that.
> >> > > get_member_function_from_ptrfunc creates a COND_EXPR which looks like
> >> > > if (pf.__pfn & 1) // is it a virtual function?
> >> > >   // yes, find the pointer in the vtable
> >> > > else
> >> > >   // no, just return the pointer
> >> > > so ideally we want to evaluate the then-branch.  Eventually it'll 
> >> > > evaluate it
> >> > > to something like _ZTV2X2[2], but the vtable isn't constexpr so we'd 
> >> > > end up
> >> > > with "not a constant expression" error.
> >> >
> >> > Then let's mark the vtable as constexpr, there's no reason for it not to 
> >> > be.
> >
> > Done.  But then I had to add indexes to the vtable's ctor (because 
> > find_array_ctor_elt
> > expects it), which broke an assert in gimple_get_virt_method_for_vtable.  
> > But I
> > don't need the array_ref hack anymore!
> 
> > Also, I had to set DECL_DECLARED_CONSTEXPR_P after maybe_commonize_var,
> > otherwise we run into the sorry in that function with -fno-weak...
> 
> Hmm, we shouldn't give that sorry for DECL_ARTIFICIAL variables.
> 
> Looking more closely, it seems that the call to maybe_commonize_var
> from initialize_artificial_var did nothing before this change, since
> the vtable is DECL_ARTIFICIAL, so it didn't pass the condition at the
> top.  I suppose we should extend the !DECL_ARTIFICIAL check in
> maybe_commonize_var to the inline variable case as well.

Done.  And then I could move setting DECL_DECLARED_CONSTEXPR_P to
initialize_artificial_var, which is where I think it belongs.

> >> Ok, unfortunately it wasn't as easy as merely marking it 
> >> DECL_DECLARED_CONSTEXPR_P
> >> in initialize_artificial_var because then I saw "used in its own 
> >> initializer"
> >> error.  Which I don't know why, but now that I know you agree with this 
> >> direction
> >> I can dig deeper.
> >>
> >> > > Since the vtable initializer is
> >> > > a compile-time constant, I thought we could make it work by a hack as 
> >> > > the one
> >> > > in cxx_eval_array_reference.  We'll then let cxx_eval_call_expression 
> >> > > do its
> >> > > job and everything is hunky-dory.
> >> > >
> >> > > Except when it isn't: I noticed that the presence of _vptr doesn't 
> >> > > make the
> >> > > class non-empty, and when cxx_eval_constant_expression saw a decl with 
> >> > > an empty
> >> > > class type, it just evaluated it to { }.  But such a class still had 
> >> > > gotten an
> >> > > initializer that looks like {.D.2082 = {._vptr.X2 = &_ZTV2X2 + 16}}.  
> >> > > So
> >> > > replacing it with { } will lose the proper initializer whereupon we 
> >> > > fail.
> >> > > The check I've added to cxx_eval_constant_expression won't win any 
> >> > > beauty
> >> > > contests but unfortunately EMPTY_CONSTRUCTOR_P doesn't work there.
> >> >
> >> > Perhaps we should check !TYPE_POLYMORPHIC_P as well as
> >> > is_really_empty_class.  Perhaps there should be a predicate for that,
> >> > say, is_really_nearly_empty_class...
> >
> > For now I've only added the !TYPE_POLYMORPHIC_P check, which works just 
> > fine.
> >
> > Bootstrapped/regtested on x86_64-linux, ok for trunk?
> >
> > 2018-09-17  Marek Polacek  
> >
> > P1064R0 - Allowing Virtual Function Calls in Constant Expressions
> > * call.c (build_over_call): Add FIXME.
> > * class.c (initialize_vtable): Mark the vtable as constexpr.
> > (build_vtbl_initializer): Build vtable's constructor with indexes.
> > * constexpr.c (cxx_eval_constant_expression): Don't ignore _vptr's
> > initializer.  Handle OBJ_TYPE_REF.
> > (potential_constant_expression_1): Handle OBJ_TYPE_REF.
> > * decl.c (grokdeclarator): Change error to pedwarn.  Only warn when
> > pedantic and not C++2a.
> >
> > * gimple-fold.c (gimpl
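To make the feature concrete, a minimal C++2a illustration of what P1064R0
permits (an editorial sketch, not code from the patch):

    // The dynamic type is known during constant evaluation, so the
    // virtual call can be resolved at compile time.
    struct B { virtual constexpr int f () const { return 1; } };
    struct D : B { constexpr int f () const override { return 2; } };

    constexpr int
    call_through_base ()
    {
      D d;
      const B &b = d;   // static type B, dynamic type D
      return b.f ();    // resolves to D::f in a constant expression
    }
    static_assert (call_through_base () == 2, "");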

Re: [PATCH][GCC][AARCH64] enable STLUR use: Use STLUR in atomic_store

2018-09-18 Thread Matthew Malcomson




diff --git a/gcc/config/aarch64/atomics.md b/gcc/config/aarch64/atomics.md
index 
36c06756a1f94cadae097b3aad654fbeba1cf2f3..73078e412d01a43c05195f01488b95a2bc7a20ec
 100644
--- a/gcc/config/aarch64/atomics.md
+++ b/gcc/config/aarch64/atomics.md
@@ -481,9 +481,9 @@
  )
  
  (define_insn "atomic_store"

-  [(set (match_operand:ALLI 0 "aarch64_sync_memory_operand" "=Q")
+  [(set (match_operand:ALLI 0 "aarch64_9bit_offset_memory_operand" "=Q,Ust")

This is less than ideal because on earlier architectures the predicate
will allow the offset variants but register allocation will then have to
undo that to match the first alternative.

I think what we should do is define a wrapped variant of
aarch64_9bit_offset_memory_operand which uses that function but only
allows the offset when RCPC8_4 is enabled.

Something like

aarch64_rcpc_memory_operand (...)
{
   if (TARGET_RCPC8_4)
 return aarch64_9bit_offset_memory_operand (...);
   return aarch64_sync_memory_operand (...);
}

OK with that change.

R.




Is defining that in the predicates.md file like the below OK?
 (just to keep the new predicates together and so it can be found in 
predicates.md)



(define_predicate "aarch64_rcpc_memory_operand"
  (if_then_else (match_test "AARCH64_ISA_RCPC8_4")
    (match_operand 0 "aarch64_9bit_offset_memory_operand")
    (match_operand 0 "aarch64_sync_memory_operand")))





Re: [PATCH][GCC][AARCH64] enable STLUR use: Use STLUR in atomic_store

2018-09-18 Thread Richard Earnshaw (lists)
On 18/09/18 16:36, Matthew Malcomson wrote:
> 
>>> diff --git a/gcc/config/aarch64/atomics.md
>>> b/gcc/config/aarch64/atomics.md
>>> index
>>> 36c06756a1f94cadae097b3aad654fbeba1cf2f3..73078e412d01a43c05195f01488b95a2bc7a20ec
>>> 100644
>>> --- a/gcc/config/aarch64/atomics.md
>>> +++ b/gcc/config/aarch64/atomics.md
>>> @@ -481,9 +481,9 @@
>>>   )
>>>   (define_insn "atomic_store<mode>"
>>> -  [(set (match_operand:ALLI 0 "aarch64_sync_memory_operand" "=Q")
>>> +  [(set (match_operand:ALLI 0 "aarch64_9bit_offset_memory_operand"
>>> "=Q,Ust")
>> This is less than ideal because on earlier architectures the predicate
>> will allow the offset variants but register allocation will then have to
>> undo that to match the first alternative.
>>
>> I think what we should do is define a wrapped variant of
>> aarch64_9bit_offset_memory_operand which uses that function but only
>> allows the offset when RCPC8_4 is enabled.
>>
>> Something like
>>
>> aarch64_rcpc_memory_operand (...)
>> {
>>    if (TARGET_RCPC8_4)
>>  return aarch64_9bit_offset_memory_operand (...);
>>    return aarch64_sync_memory_operand (...);
>> }
>>
>> OK with that change.
>>
>> R.
>>
>>
> 
> Is defining that in the predicates.md file like the below OK?
>  (just to keep the new predicates together and so it can be found in
> predicates.md)
> 
> 
> (define_predicate "aarch64_rcpc_memory_operand"
>   (if_then_else (match_test "AARCH64_ISA_RCPC8_4")
>     (match_operand 0 "aarch64_9bit_offset_memory_operand")
>     (match_operand 0 "aarch64_sync_memory_operand")))
> 
> 
> 

Sure.

R.


Re: [PATCH 4/5] Add pp_humanized_limit and pp_humanized_range

2018-09-18 Thread Martin Sebor

On 09/14/2018 12:17 PM, David Malcolm wrote:

gcc/ChangeLog:
* pretty-print.c (get_power_of_two): New function.
(struct relative_to_power_of_2): New struct.
(pp_humanized_limit): New function.
(pp_humanized_range): New function.
(selftest::assert_pp_humanized_limit): New function.
(ASSERT_PP_HUMANIZED_LIMIT): New macro.
(selftest::assert_pp_humanized_range): New function.
(ASSERT_PP_HUMANIZED_RANGE): New macro.
(selftest::test_get_power_of_two): New function.
(selftest::test_humanized_printing): New function.
(selftest::pretty_print_c_tests): Call them.
* pretty-print.h (pp_humanized_limit): New function.
(pp_humanized_range): New function.

...

+static void
+test_humanized_printing ()
+{
+  ASSERT_PP_HUMANIZED_LIMIT ((1<<16) + 1, "(1<<16)+1");
+  ASSERT_PP_HUMANIZED_LIMIT (10, "10");
+  ASSERT_PP_HUMANIZED_LIMIT (1<<17, "1<<17");
+  ASSERT_PP_HUMANIZED_LIMIT ((1<<17) - 1, "(1<<17)-1");
+  ASSERT_PP_HUMANIZED_LIMIT ((1<<17) + 1, "(1<<17)+1");
+  ASSERT_PP_HUMANIZED_LIMIT (4294967295, "(1<<32)-1");
+
+  ASSERT_PP_HUMANIZED_RANGE (0, 0, "0");
+  ASSERT_PP_HUMANIZED_RANGE (0, 1, "0...1");
+  ASSERT_PP_HUMANIZED_RANGE ((1<<16), (1<<17), "65536...1<<17");


I didn't comment on this aspect of the change yesterday: besides
making very large numbers easier for humans to understand, my hope
was to also help avoid data model dependencies in the test suite.

When testing on any given host it's easy to forget that a constant
may have a different value in a different data model and hardcode
a fixed number.  The test then fails for targets where the constant
has a different value.

As it is, the sprintf (and other) tests deal with the problem like
this:

  T (-1, "%*cX", INT_MAX, '1'); /* { dg-warning "output of \[0-9\]+ 
bytes causes result to exceed .INT_MAX." } */


I.e., by accepting any sequence of digits where a large constant
like INT_MAX is expected.  This works fine but doesn't verify that
the value printed is correct.

One approach to dealing with this is to use manifest constants
in the diagnostic output that are independent of the data model,
like INT_MAX, SIZE_MAX, etc.

Using shift expressions instead doesn't solve this problem,
but even beyond that, I wouldn't consider shift expressions
like "(1 << 16) + 1" to be significantly more readable than
their value.  IMO, it overestimates the capacity of most
programmers to do shift arithmetic in their heads ;-)

Do you see a problem with using the <limits.h> constants?

Martin
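For reference, a self-contained approximation of the rendering scheme under
discussion (an editorial sketch with assumed behaviour, not the patch's
implementation):

    #include <cstdio>
    #include <cstdint>

    /* Print V verbatim when small, else relative to the nearest power
       of two: "1<<P", "(1<<P)+D" or "(1<<P)-D".  */
    static void
    humanize (uint64_t v)
    {
      if (v < 1024)
        {
          std::printf ("%llu", (unsigned long long) v);
          return;
        }
      int p = 63 - __builtin_clzll (v);
      uint64_t lo = UINT64_C (1) << p;     /* power of two at or below V */
      uint64_t hi = p < 63 ? lo << 1 : 0;  /* power of two above V */
      if (v == lo)
        std::printf ("1<<%d", p);
      else if (hi != 0 && hi - v <= v - lo)
        std::printf ("(1<<%d)-%llu", p + 1, (unsigned long long) (hi - v));
      else
        std::printf ("(1<<%d)+%llu", p, (unsigned long long) (v - lo));
    }

This reproduces the selftest expectations quoted above, e.g. 4294967295
renders as "(1<<32)-1" and 65537 as "(1<<16)+1".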


Re: [PATCH,nvptx] Remove use of CUDA unified memory in libgomp

2018-09-18 Thread Cesar Philippidis
On 08/01/2018 04:12 AM, Tom de Vries wrote:
> On 07/31/2018 05:27 PM, Cesar Philippidis wrote:

>>/* Copy the (device) pointers to arguments to the device (dp and hp might 
>> in
>>   fact have the same value on a unified-memory system).  */
> 
> This comment needs to be updated, right?
> 
>> -  CUDA_CALL_ASSERT (cuMemcpy, (CUdeviceptr) dp, (CUdeviceptr) hp,
>> +  CUDA_CALL_ASSERT (cuMemcpyHtoD, dp, hp,
>>  mapnum * sizeof (void *));
>>GOMP_PLUGIN_debug (0, "  %s: kernel %s: launch"
>>   " gangs=%u, workers=%u, vectors=%u\n",
>> -- 2.7.4
>>
> 
> Otherwise OK.

Thanks. I've committed the attached patch to trunk.

Cesar
[nvptx] Remove use of CUDA unified memory in libgomp

2018-09-18  Cesar Philippidis  

	libgomp/
	* plugin/plugin-nvptx.c (struct cuda_map): New.
	(struct ptx_stream): Replace d, h, h_begin, h_end, h_next, h_prev,
	h_tail with (cuda_map *) map.
	(cuda_map_create): New function.
	(cuda_map_destroy): New function.
	(map_init): Update to use a linked list of cuda_map objects.
	(map_fini): Likewise.
	(map_pop): Likewise.
	(map_push): Likewise.  Return CUdeviceptr instead of void.
	(init_streams_for_device): Remove stale references to ptx_stream
	members.
	(select_stream_for_async): Likewise.
	(nvptx_exec): Update call to map_init.

(cherry picked from gomp-4_0-branch r242614)
---
 libgomp/plugin/plugin-nvptx.c | 170 ++
 1 file changed, 91 insertions(+), 79 deletions(-)

diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
index bae1b05..6492e5f 100644
--- a/libgomp/plugin/plugin-nvptx.c
+++ b/libgomp/plugin/plugin-nvptx.c
@@ -192,20 +192,20 @@ cuda_error (CUresult r)
 static unsigned int instantiated_devices = 0;
 static pthread_mutex_t ptx_dev_lock = PTHREAD_MUTEX_INITIALIZER;
 
+struct cuda_map
+{
+  CUdeviceptr d;
+  size_t size;
+  bool active;
+  struct cuda_map *next;
+};
+
 struct ptx_stream
 {
   CUstream stream;
   pthread_t host_thread;
   bool multithreaded;
-
-  CUdeviceptr d;
-  void *h;
-  void *h_begin;
-  void *h_end;
-  void *h_next;
-  void *h_prev;
-  void *h_tail;
-
+  struct cuda_map *map;
   struct ptx_stream *next;
 };
 
@@ -217,101 +217,114 @@ struct nvptx_thread
   struct ptx_device *ptx_dev;
 };
 
+static struct cuda_map *
+cuda_map_create (size_t size)
+{
+  struct cuda_map *map = GOMP_PLUGIN_malloc (sizeof (struct cuda_map));
+
+  assert (map);
+
+  map->next = NULL;
+  map->size = size;
+  map->active = false;
+
+  CUDA_CALL_ERET (NULL, cuMemAlloc, &map->d, size);
+  assert (map->d);
+
+  return map;
+}
+
+static void
+cuda_map_destroy (struct cuda_map *map)
+{
+  CUDA_CALL_ASSERT (cuMemFree, map->d);
+  free (map);
+}
+
+/* The following map_* routines manage the CUDA device memory that
+   contains the data mapping arguments for cuLaunchKernel.  Each
+   asynchronous PTX stream may have multiple pending kernel
+   invocations, which are launched in a FIFO order.  As such, the map
+   routines maintain a queue of cuLaunchKernel arguments.
+
+   Calls to map_push and map_pop must be guarded by ptx_event_lock.
+   Likewise, calls to map_init and map_fini are guarded by
+   ptx_dev_lock inside GOMP_OFFLOAD_init_device and
+   GOMP_OFFLOAD_fini_device, respectively.  */
+
 static bool
 map_init (struct ptx_stream *s)
 {
   int size = getpagesize ();
 
   assert (s);
-  assert (!s->d);
-  assert (!s->h);
-
-  CUDA_CALL (cuMemAllocHost, &s->h, size);
-  CUDA_CALL (cuMemHostGetDevicePointer, &s->d, s->h, 0);
 
-  assert (s->h);
+  s->map = cuda_map_create (size);
 
-  s->h_begin = s->h;
-  s->h_end = s->h_begin + size;
-  s->h_next = s->h_prev = s->h_tail = s->h_begin;
-
-  assert (s->h_next);
-  assert (s->h_end);
   return true;
 }
 
 static bool
 map_fini (struct ptx_stream *s)
 {
-  CUDA_CALL (cuMemFreeHost, s->h);
+  assert (s->map->next == NULL);
+  assert (!s->map->active);
+
+  cuda_map_destroy (s->map);
+
   return true;
 }
 
 static void
 map_pop (struct ptx_stream *s)
 {
-  assert (s != NULL);
-  assert (s->h_next);
-  assert (s->h_prev);
-  assert (s->h_tail);
-
-  s->h_tail = s->h_next;
-
-  if (s->h_tail >= s->h_end)
-s->h_tail = s->h_begin + (int) (s->h_tail - s->h_end);
+  struct cuda_map *next;
 
-  if (s->h_next == s->h_tail)
-s->h_prev = s->h_next;
+  assert (s != NULL);
 
-  assert (s->h_next >= s->h_begin);
-  assert (s->h_tail >= s->h_begin);
-  assert (s->h_prev >= s->h_begin);
+  if (s->map->next == NULL)
+{
+  s->map->active = false;
+  return;
+}
 
-  assert (s->h_next <= s->h_end);
-  assert (s->h_tail <= s->h_end);
-  assert (s->h_prev <= s->h_end);
+  next = s->map->next;
+  cuda_map_destroy (s->map);
+  s->map = next;
 }
 
-static void
-map_push (struct ptx_stream *s, size_t size, void **h, void **d)
+static CUdeviceptr
+map_push (struct ptx_stream *s, size_t size)
 {
-  int left;
-  int offset;
+  struct cuda_map *map = NULL, *t = NULL;
 
-  assert (s != NULL);
+  assert (s);
+  assert (s->map);
 
-  left = s->h_end

Re: [PATCH, RFC, rs6000, v3] enable early gimple-folding of vec_splat

2018-09-18 Thread Will Schmidt
On Wed, 2018-09-12 at 08:23 -0500, Segher Boessenkool wrote: 
> Hi!

> > +   unsigned int n_elts = VECTOR_CST_NELTS (arg0);
> > +   if (TREE_CODE (arg1) != INTEGER_CST
> > +   || TREE_INT_CST_LOW (arg1) > (n_elts -1))
> > + return false;
> 
> I think you should check lower bound 0 as well.  Maybe use IN_RANGE?

I'm comparing the IN_RANGE usage here with elsewhere in rs6000.c.
Do I need the sext_hwi() adjustment for arg1 within the IN_RANGE check?
As reference, this is for the vec_splat intrinsic, where arg1 is defined
as 'const int', and we want to return false if arg1 is outside the range
of 0..#elements . 
vector bool char vec_splat (vector bool char, const int);

So.. this?
unsigned int n_elts = VECTOR_CST_NELTS (arg0);
if (TREE_CODE (arg1) != INTEGER_CST
    || !IN_RANGE (TREE_INT_CST_LOW (arg1), 0, (n_elts - 1)))
  return false;

OR something like this with sext_hwi(...) ?:
unsigned int n_elts = VECTOR_CST_NELTS (arg0);
if (TREE_CODE (arg1) != INTEGER_CST
|| !IN_RANGE (sext_hwi (TREE_INT_CST_LOW (arg1), n_elts),
  0, (n_elts - 1)))
  return false;


thanks
-Will


> 
> The rest looks fine, thanks!  Okay for trunk (with the trivialities fixed).
> 
> 
> Segher
> 





[PATCH, i386]: Macroize *extendxf2

2018-09-18 Thread Uros Bizjak
2018-09-18  Uros Bizjak  

* config/i386/i386.md (*extendxf2): Macroize insn from
*extendsfxf2 and *extenddfxf2 using MODEF mode iterator.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Committed to mainline SVN.

Uros.
Index: config/i386/i386.md
===
--- config/i386/i386.md (revision 264373)
+++ config/i386/i386.md (working copy)
@@ -9837,24 +9837,15 @@
   [(set_attr "type" "fsgn")
(set_attr "mode" "DF")])
 
-(define_insn "*extendsfxf2"
+(define_insn "*extendxf2"
   [(set (match_operand:XF 0 "register_operand" "=f")
(absneg:XF (float_extend:XF
-		     (match_operand:SF 1 "register_operand" "0"))))]
+		     (match_operand:MODEF 1 "register_operand" "0"))))]
   "TARGET_80387"
   "f<absneg_mnemonic>"
   [(set_attr "type" "fsgn")
(set_attr "mode" "XF")])
 
-(define_insn "*extenddfxf2"
-  [(set (match_operand:XF 0 "register_operand" "=f")
-	(absneg:XF (float_extend:XF
-		     (match_operand:DF 1 "register_operand" "0"))))]
-  "TARGET_80387"
-  "f<absneg_mnemonic>"
-  [(set_attr "type" "fsgn")
-   (set_attr "mode" "XF")])
-
 ;; Copysign instructions
 
 (define_mode_iterator CSGNMODE [SF DF TF])


[PATCH] Handle CLOBBER in reg_overlap_mentioned_p (PR86882)

2018-09-18 Thread Segher Boessenkool
Combine will put CLOBBER (with a non-void mode) anywhere in a pattern
to poison it.  reg_overlap_mentioned_p did not handle this.  This patch
fixes that.

Tested on the PR testcase on x86_64-linux; also bootstrapped and
regression checked on powerpc64-linux {-m32,-m64}.  Is this okay for
trunk and backports?

What should I do with the testcase?  I don't think tool-generated
testcases are useful at all (instead, we should ship the tool!); but
if people want it anyway, should this be gcc.target/x86_64, or more
general somehow?


Segher


2018-09-18  Segher Boessenkool  

PR rtl-optimization/86882
* rtlanal.c (reg_overlap_mentioned_p): Handle CLOBBER.

---
 gcc/rtlanal.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/rtlanal.c b/gcc/rtlanal.c
index 6c620a4..e8b6b9c 100644
--- a/gcc/rtlanal.c
+++ b/gcc/rtlanal.c
@@ -1815,6 +1815,7 @@ reg_overlap_mentioned_p (const_rtx x, const_rtx in)
  recurse:
   switch (GET_CODE (x))
 {
+case CLOBBER:
 case STRICT_LOW_PART:
 case ZERO_EXTRACT:
 case SIGN_EXTRACT:
-- 
1.8.3.1



Re: unique_ptr's new static_assert check whether deleter can be invoked fails

2018-09-18 Thread Jonathan Wakely

On 17/09/18 16:41 +0100, Jonathan Wakely wrote:

On 17/09/18 16:41 +0200, Stephan Bergmann wrote:
Compiling LibreOffice with recent GCC trunk, I came across an error 
like the one below with this reproducer:



$ cat test14.cc
#include <memory>
struct S1;
struct S2;
struct D { void operator ()(S1 *); };
class C {
  std::unique_ptr<S2, D> m;
  C();
  C(C const &);
  ~C();
  C & operator =(C const &);
};

$ gcc/inst/bin/g++ -c test14.cc
In file included from /home/sbergman/gcc/inst/include/c++/9.0.0/memory:80,
   from test14.cc:1:
/home/sbergman/gcc/inst/include/c++/9.0.0/bits/unique_ptr.h: In instantiation of 
‘class std::__uniq_ptr_impl<S2, D>’:
/home/sbergman/gcc/inst/include/c++/9.0.0/bits/unique_ptr.h:172:33:   required from 
‘class std::unique_ptr<S2, D>’
test14.cc:6:28:   required from here
/home/sbergman/gcc/inst/include/c++/9.0.0/bits/unique_ptr.h:145:22: error: 
static assertion failed: unique_ptr's deleter must be invocable with a pointer
145 |   static_assert( __is_invocable<_Dp&, pointer&>::value,
  |  ^~


where S2 is an incomplete type, so the static_assert in unique_ptr 
(added recently with  
"Implement LWG 2905 changes to constrain unique_ptr constructors") 
can't determine that D can be invoked with an S2* (which it can if 
S2 is derived from S1).
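For completeness, an editorial illustration of that last point: once S2's
definition is visible, the deleter is invocable via the derived-to-base
pointer conversion.

    #include <type_traits>
    struct S1 { };
    struct S2 : S1 { };
    struct D { void operator() (S1 *); };
    static_assert (std::is_invocable_v<D &, S2 *&>,
                   "S2* converts to S1*, so D is invocable");

The static_assert in __uniq_ptr_impl simply cannot see this while S2 is
incomplete.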


Oh dear, this is another example of the problem described in
https://cplusplus.github.io/LWG/lwg-active.html#3099

I think strictly speaking, your code is undefined according to
[unique.ptr.single] p1. That requires the is_invocable condition to be
true. I think that should be considered a defect in the standard, so
I've requested a new issue to fix that.

I'm not sure whether this is a bug in LibreOffice (which would need 
to make sure S2 is a complete type when defining C),


Or you can define D::pointer as S1* and then the assertion passes.

struct D {
using pointer = S1*;
void operator ()(S1 *);
};



or a too aggressive static_assert in libstdc++?


Yes, I'll have to remove it.


I've committed the attached patch (and requested a new issue to fix
the standard). Thanks for bringing this to our attention.


commit d9451e907ff44cd4203da7dda76d41d01577d00b
Author: Jonathan Wakely 
Date:   Tue Sep 18 15:25:51 2018 +0100

Fix location of invocable check for unique_ptr deleter

The deleter only needs to be invocable when the unique_ptr destructor
and reset member function are instantiated. In other contexts it might
not be possible to pass unique_ptr::pointer to the deleter, if
that requires a derived-to-base conversion from T* and T is incomplete.

* include/bits/unique_ptr.h (__uniq_ptr_impl): Remove static assertion
checking invocable condition.
(unique_ptr::~unique_ptr, unique_ptr::reset): Restore static assertion
here, where types must be complete. Pass pointer to deleter as an
rvalue.
* testsuite/20_util/unique_ptr/requirements/incomplete.cc: New test.

diff --git a/libstdc++-v3/include/bits/unique_ptr.h b/libstdc++-v3/include/bits/unique_ptr.h
index ddc6ae0e233..0717c1e2728 100644
--- a/libstdc++-v3/include/bits/unique_ptr.h
+++ b/libstdc++-v3/include/bits/unique_ptr.h
@@ -142,8 +142,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   static_assert( !is_rvalue_reference<_Dp>::value,
 		 "unique_ptr's deleter type must be a function object type"
 		 " or an lvalue reference type" );
-  static_assert( __is_invocable<_Dp&, pointer&>::value,
-	 "unique_ptr's deleter must be invocable with a pointer" );
 
   __uniq_ptr_impl() = default;
   __uniq_ptr_impl(pointer __p) : _M_t() { _M_ptr() = __p; }
@@ -282,9 +280,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   /// Destructor, invokes the deleter if the stored pointer is not null.
   ~unique_ptr() noexcept
   {
+	static_assert(__is_invocable<deleter_type&, pointer>::value,
+		  "unique_ptr's deleter must be invocable with a pointer");
 	auto& __ptr = _M_t._M_ptr();
 	if (__ptr != nullptr)
-	  get_deleter()(__ptr);
+	  get_deleter()(std::move(__ptr));
 	__ptr = pointer();
   }
 
@@ -389,10 +389,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   void
   reset(pointer __p = pointer()) noexcept
   {
+	static_assert(__is_invocable<deleter_type&, pointer>::value,
+		  "unique_ptr's deleter must be invocable with a pointer");
 	using std::swap;
 	swap(_M_t._M_ptr(), __p);
 	if (__p != pointer())
-	  get_deleter()(__p);
+	  get_deleter()(std::move(__p));
   }
 
   /// Exchange the pointer and deleter with another object.
diff --git a/libstdc++-v3/testsuite/20_util/unique_ptr/requirements/incomplete.cc b/libstdc++-v3/testsuite/20_util/unique_ptr/requirements/incomplete.cc
new file mode 100644
index 000..96e02e4c903
--- /dev/null
+++ b/libstdc++-v3/testsuite/20_util/unique_ptr/requirements/incomplete.cc
@@ -0,0 +1,33 @@
+// Copyright (C) 2018 Free Software Foundation, Inc.
+//
+// This fi

Re: [PATCH] Handle CLOBBER in reg_overlap_mentioned_p (PR86882)

2018-09-18 Thread Richard Sandiford
Segher Boessenkool  writes:
> Combine will put CLOBBER (with a non-void mode) anywhere in a pattern
> to poison it.  reg_overlap_mentioned_p did not handle this.  This patch
> fixes that.
>
> Tested on the PR testcase on x86_64-linux; also bootstrapped and
> regression checked on powerpc64-linux {-m32,-m64}.  Is this okay for
> trunk and backports?

Yes, thanks.

> What should I do with the testcase?  I don't think tool-generated
> testcases are useful at all (instead, we should ship the tool!); but
> if people want it anyway, should this be gcc.target/x86_64, or more
> general somehow?

Seems obvious enough not to need a testcase IMO.

Richard

> Segher
>
>
> 2018-09-18  Segher Boessenkool  
>
>   PR rtl-optimization/86882
>   * rtlanal.c (reg_overlap_mentioned_p): Handle CLOBBER.

>
> ---
>  gcc/rtlanal.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/gcc/rtlanal.c b/gcc/rtlanal.c
> index 6c620a4..e8b6b9c 100644
> --- a/gcc/rtlanal.c
> +++ b/gcc/rtlanal.c
> @@ -1815,6 +1815,7 @@ reg_overlap_mentioned_p (const_rtx x, const_rtx in)
>   recurse:
>switch (GET_CODE (x))
>  {
> +case CLOBBER:
>  case STRICT_LOW_PART:
>  case ZERO_EXTRACT:
>  case SIGN_EXTRACT:


Re: [PATCH, RFC, rs6000, v3] enable early gimple-folding of vec_splat

2018-09-18 Thread Segher Boessenkool
On Tue, Sep 18, 2018 at 10:52:17AM -0500, Will Schmidt wrote:
> On Wed, 2018-09-12 at 08:23 -0500, Segher Boessenkool wrote: 
> > Hi!
> 
> > > + unsigned int n_elts = VECTOR_CST_NELTS (arg0);
> > > + if (TREE_CODE (arg1) != INTEGER_CST
> > > + || TREE_INT_CST_LOW (arg1) > (n_elts -1))
> > > +   return false;
> > 
> > I think you should check lower bound 0 as well.  Maybe use IN_RANGE?
> 
> I'm comparing the IN_RANGE usage here with elsewhere in rs6000.c.
> Do I need the sext_hwi() adjustment for arg1 within the IN_RANGE check?

No, it will work fine (it casts to unsigned HWI internally).

> As reference, this is for the vec_splat intrinsic, where arg1 is defined
> as 'const int', and we want to return false if arg1 is outside the range
> of 0..#elements . 
>   vector bool char vec_splat (vector bool char, const int);
> 
> So.. this?
>   unsigned int n_elts = VECTOR_CST_NELTS (arg0);
>   if (TREE_CODE (arg1) != INTEGER_CST
>   || !IN_RANGE (TREE_INT_CST_LOW (arg1), 0, (n_elts - 1))
> return false;

That looks fine, but you can lose the parens around "n_elts - 1".

...

It turns out TREE_INT_CST_LOW returns an unsigned HWI, so this is all
moot, sorry for the misdirection; just the > is fine, as you had.

But write:

if (TREE_CODE (arg1) != INTEGER_CST
|| TREE_INT_CST_LOW (arg1) >= n_elts)

or use tree_to_uhwi perhaps...  Ask someone who knows trees :-)


Segher
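
For reference, a hedged sketch of the tree_to_uhwi variant mentioned
above (illustrative, not the committed code; tree_fits_uhwi_p rejects
constants that do not fit an unsigned HWI, such as negative values):

  unsigned int n_elts = VECTOR_CST_NELTS (arg0);
  if (TREE_CODE (arg1) != INTEGER_CST
      || !tree_fits_uhwi_p (arg1)
      || tree_to_uhwi (arg1) >= n_elts)
    return false;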


Re: [PATCH v2] libgcc: m68k: avoid TEXTRELs in shared library (PR 86224)

2018-09-18 Thread Andreas Schwab
On Sep 17 2018, Jeff Law  wrote:

> Hmm, I thought Andreas NAK'd V1.

Consider that retracted.  My misunderstanding.

Andreas.

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."


PING^3 [PATCH] i386: Add pass_remove_partial_avx_dependency

2018-09-18 Thread H.J. Lu
On Tue, Sep 11, 2018 at 9:01 AM, H.J. Lu  wrote:
> On Tue, Sep 4, 2018 at 9:01 AM, H.J. Lu  wrote:
>> On Tue, Aug 28, 2018 at 11:04 AM, H.J. Lu  wrote:
>>> With -mavx, for
>>>
>>> [hjl@gnu-cfl-1 skx-2]$ cat foo.i
>>> extern float f;
>>> extern double d;
>>> extern int i;
>>>
>>> void
>>> foo (void)
>>> {
>>>   d = f;
>>>   f = i;
>>> }
>>>
>>> we need to generate
>>>
>>> vxorp[ds]   %xmmN, %xmmN, %xmmN
>>> ...
>>> vcvtss2sd   f(%rip), %xmmN, %xmmX
>>> ...
>>> vcvtsi2ss   i(%rip), %xmmN, %xmmY
>>>
>>> to avoid partial XMM register stall.  This patch adds a pass to generate
>>> a single
>>>
>>> vxorps  %xmmN, %xmmN, %xmmN
>>>
>>> at function entry, which is shared by all SF and DF conversions, instead
>>> of generating one
>>>
>>> vxorp[ds]   %xmmN, %xmmN, %xmmN
>>>
>>> for each SF/DF conversion.
>>>
>>> Performance impacts on SPEC CPU 2017 rate with 1 copy using
>>>
>>> -Ofast -march=native -mfpmath=sse -fno-associative-math -funroll-loops
>>>
>>> are
>>>
>>> 1. On Broadwell server:
>>>
>>> 500.perlbench_r (-0.82%)
>>> 502.gcc_r (0.73%)
>>> 505.mcf_r (-0.24%)
>>> 520.omnetpp_r (-2.22%)
>>> 523.xalancbmk_r (-1.47%)
>>> 525.x264_r (0.31%)
>>> 531.deepsjeng_r (0.27%)
>>> 541.leela_r (0.85%)
>>> 548.exchange2_r (-0.11%)
>>> 557.xz_r (-0.34%)
>>> Geomean: (-0.23%)
>>>
>>> 503.bwaves_r (0.00%)
>>> 507.cactuBSSN_r (-1.88%)
>>> 508.namd_r (0.00%)
>>> 510.parest_r (-0.56%)
>>> 511.povray_r (0.49%)
>>> 519.lbm_r (-1.28%)
>>> 521.wrf_r (-0.28%)
>>> 526.blender_r (0.55%)
>>> 527.cam4_r (-0.20%)
>>> 538.imagick_r (2.52%)
>>> 544.nab_r (-0.18%)
>>> 549.fotonik3d_r (-0.51%)
>>> 554.roms_r (-0.22%)
>>> Geomean: (0.00%)
>>>
>>> 2. On Skylake client:
>>>
>>> 500.perlbench_r (-0.29%)
>>> 502.gcc_r (-0.36%)
>>> 505.mcf_r (1.77%)
>>> 520.omnetpp_r (-0.26%)
>>> 523.xalancbmk_r (-3.69%)
>>> 525.x264_r (-0.32%)
>>> 531.deepsjeng_r (0.00%)
>>> 541.leela_r (-0.46%)
>>> 548.exchange2_r (0.00%)
>>> 557.xz_r (0.00%)
>>> Geomean: (-0.34%)
>>>
>>> 503.bwaves_r (0.00%)
>>> 507.cactuBSSN_r (-0.56%)
>>> 508.namd_r (0.87%)
>>> 510.parest_r (0.00%)
>>> 511.povray_r (-0.73%)
>>> 519.lbm_r (0.84%)
>>> 521.wrf_r (0.00%)
>>> 526.blender_r (-0.81%)
>>> 527.cam4_r (-0.43%)
>>> 538.imagick_r (2.55%)
>>> 544.nab_r (0.28%)
>>> 549.fotonik3d_r (0.00%)
>>> 554.roms_r (0.32%)
>>> Geomean: (0.12%)
>>>
>>> 3. On Skylake server:
>>>
>>> 500.perlbench_r (-0.55%)
>>> 502.gcc_r (0.69%)
>>> 505.mcf_r (0.00%)
>>> 520.omnetpp_r (-0.33%)
>>> 523.xalancbmk_r (-0.21%)
>>> 525.x264_r (-0.27%)
>>> 531.deepsjeng_r (0.00%)
>>> 541.leela_r (0.00%)
>>> 548.exchange2_r (-0.11%)
>>> 557.xz_r (0.00%)
>>> Geomean: (0.00%)
>>>
>>> 503.bwaves_r (0.58%)
>>> 507.cactuBSSN_r (0.00%)
>>> 508.namd_r (0.00%)
>>> 510.parest_r (0.18%)
>>> 511.povray_r (-0.58%)
>>> 519.lbm_r (0.25%)
>>> 521.wrf_r (0.40%)
>>> 526.blender_r (0.34%)
>>> 527.cam4_r (0.19%)
>>> 538.imagick_r (5.87%)
>>> 544.nab_r (0.17%)
>>> 549.fotonik3d_r (0.00%)
>>> 554.roms_r (0.00%)
>>> Geomean: (0.62%)
>>>
>>> On Skylake client, impacts on 538.imagick_r are
>>>
>>> size before:
>>>
>>>    text    data    bss     dec     hex filename
>>> 2555577   10876   5576 2572029  273efd imagick_r.exe
>>>
>>> size after:
>>>
>>>    text    data    bss     dec     hex filename
>>> 2511825   10876   5576 2528277  269415 imagick_r.exe
>>>
>>> number of vxorp[ds]:
>>>
>>> before  after   difference
>>> 14570   4515    -69%
>>>
>>> OK for trunk?
>>>
>>> Thanks.
>>>
>>>
>>> H.J.
>>> ---
>>> gcc/
>>>
>>> 2018-08-28  H.J. Lu  
>>> Sunil K Pandey  
>>>
>>> PR target/87007
>>> * config/i386/i386-passes.def: Add
>>> pass_remove_partial_avx_dependency.
>>> * config/i386/i386-protos.h
>>> (make_pass_remove_partial_avx_dependency): New.
>>> * config/i386/i386.c (make_pass_remove_partial_avx_dependency):
>>> New function.
>>> (pass_data_remove_partial_avx_dependency): New.
>>> (pass_remove_partial_avx_dependency): Likewise.
>>> (make_pass_remove_partial_avx_dependency): Likewise.
>>> * config/i386/i386.md (SF/DF conversion splitters): Disabled
>>> for TARGET_AVX.
>>>
>>> gcc/testsuite/
>>>
>>> 2018-08-28  H.J. Lu  
>>> Sunil K Pandey  
>>>
>>> PR target/87007
>>> * gcc.target/i386/pr87007.c: New file.
>>
>>
>> PING:
>>
>> https://gcc.gnu.org/ml/gcc-patches/2018-08/msg01781.html
>>
>
> PING.
>

Hi Kirill, Jakub, Jan,

Can you take a look?

Thanks.

-- 
H.J.


Re: [PATCH] look harder for MEM_REF operand equality to avoid -Wstringop-truncation (PR 84561)

2018-09-18 Thread Martin Sebor

On 09/17/2018 05:09 PM, Jeff Law wrote:

On 9/14/18 4:11 PM, Martin Sebor wrote:

On 09/14/2018 03:35 PM, Jeff Law wrote:

On 9/12/18 11:46 AM, Martin Sebor wrote:

On 08/31/2018 04:07 AM, Richard Biener wrote:

On Thu, Aug 30, 2018 at 7:39 PM Martin Sebor  wrote:


On 08/30/2018 11:22 AM, Richard Biener wrote:

On August 30, 2018 6:54:21 PM GMT+02:00, Martin Sebor
 wrote:

On 08/30/2018 02:35 AM, Richard Biener wrote:

On Thu, Aug 30, 2018 at 2:12 AM Martin Sebor 

wrote:


The attached patch adds code to work harder to determine whether
the destination of an assignment involving MEM_REF is the same
as the destination of a prior strncpy call.  The included test
case demonstrates when this situation comes up.  During ccp,
dstbase and lhsbase returned by get_addr_base_and_unit_offset()
end up looking like this:


"During CCP" means exactly when?  The CCP lattice tracks copies
so CCP should already know that _1 == _8.  I suppose during
substitute_and_fold then?  But that replaces uses before folding
the stmt.


Yes, when ccp_finalize() performs the final substitution during
substitute_and_fold().


But then you shouldn't need the loop but at most look at the pointer
SSA Def to get at the non-invariant ADDR_EXPR.


I don't follow.   Are you suggesting to compare
SSA_NAME_DEF_STMT (dstbase) to SSA_NAME_DEF_STMT (lhsbase) for
equality?  They're not equal.


No.


The first loop iterates once and retrieves

   1.  _8 = &pb_3(D)->a;

The second loop iterates three times and retrieves:

   1.  _1 = _9
   2.  _9 = _8
   3.  _8 = &pb_3(D)->a;

How do I get from _1 to &pb_3(D)->a without iterating?  Or are
you saying to still iterate but compare the SSA_NAME_DEF_STMT?


I say you should retrieve _8 = &pb_3(D)->a immediately since the
copies should be
propagated out at this stage.


The warning is issued as the strncpy call is being folded (during
the dom walk in substitute_and_fold_engine::substitute_and_fold)
but before the subsequent statements have been folded (during
the subsequent loop to eliminate statements).  So at the point
of the strncpy folding the three assignments above are still
there.

I can't think of a good way to solve this problem that's not
overly intrusive.  Unless you have some suggestions for how
to deal with it, is the patch okay as is?

In what pass do you see the naked copies?  In general those should
have been propagated away.


As I said above, this happens during the dom walk in the ccp
pass:

My bad.  Sigh. CCP doesn't track copies, just constants, so there's not
going to be any data structure you can exploit.  And I don't think
there's a value number you can use to determine the two objects are the
same.

Hmm, let's back up a bit, what does the relevant part of the IL look
like before CCP?  Is the real problem here that we have unpropagated
copies lying around in the IL?  Hmm, more likely the IL looks like:

   _8 = &pb_3(D)->a;
   _9 = _8;
   _1 = _9;
   strncpy (MEM_REF (&pb_3(D)->a), ...);
   MEM[(struct S *)_1].a[n_7] = 0;


Yes, that is what the folder sees while the strncpy call is
being transformed/folded by ccp.  The MEM_REF is folded just
after the strncpy call and that's when it's transformed into

  MEM[(struct S *)_8].a[n_7] = 0;

(The assignments to _1 and _9 don't get removed until after
the dom walk finishes).



If we were to propagate the copies out we'd at best have:

   _8 = &pb_3(D)->a;
   strncpy (MEM_REF (&pb_3(D)->a), ...);
   MEM[(struct S *)_8].a[n_7] = 0;


Is that in a form you can handle?  Or would we also need to forward
propagate the address computation into the use of _8?


The above works as long as we look at the def_stmt of _8 in
the MEM_REF (we currently don't).  That's also what the last
iteration of the loop does.  In this case (with _8) it would
be discovered in the first iteration, so the loop could be
replaced by a simple if statement.

But I'm not sure I understand the concern with the loop.  Is
it that we are looping at all, i.e., the cost?  Or that ccp
is doing something wrong or suboptimal? (Should have
propagated the value of _8 earlier?)

Martin
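
For reference, a hedged reconstruction of the kind of source that
produces the IL discussed above (not the actual PR 84561 testcase):

struct S { char a[8]; };

void
f (struct S *pb, const char *s, unsigned n)
{
  struct S *p = pb;                 /* the copy that becomes _1/_9/_8  */
  __builtin_strncpy (pb->a, s, n);  /* the call being folded           */
  p->a[n] = '\0';                   /* MEM[(struct S *)_1].a[n_7] = 0  */
}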


[PATCH] rs6000: Remove old "Cygnus sibcall" comment

2018-09-18 Thread Segher Boessenkool
This comment is quite cryptic and very out-of-date by now.  Committing.


2018-09-18  Segher Boessenkool  

* config/rs6000/rs6000.md: Remove old "Cygnus sibcall" comment.

---
 gcc/config/rs6000/rs6000.md | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index b0889df..4ce24d5 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -13062,8 +13062,7 @@ (define_insn "*lmw"
(set_attr "indexed" "yes")
(set_attr "cell_micro" "always")])
 
-; FIXME: This would probably be somewhat simpler if the Cygnus sibcall
-; stuff was in GCC.  Oh, and "any_parallel_operand" is a bit flexible...
+; FIXME: "any_parallel_operand" is a bit flexible...
 
 ; The following comment applies to:
 ; save_gpregs_*
-- 
1.8.3.1



Re: [PATCH 4/4] rs6000: Delete old add+cmp patterns

2018-09-18 Thread Segher Boessenkool
On Fri, Aug 17, 2018 at 09:29:01AM +0200, Steven Bosscher wrote:
> On the topic of archaeology: Perhaps the time also finally has come to
> revisit this comment in rs6000.md:
> 
> 13065; FIXME: This would probably be somewhat simpler if the Cygnus sibcall
> 13066; stuff was in GCC. <...>

Done in r264403 now.  I never understood what it was supposed to mean :-)


Segher


[patch, fortran] Inline BLAS calls to support conjg(transpose(a))

2018-09-18 Thread Thomas Koenig

Hello world,

this patch generates direct calls to *GEMM when -fexternal-blas is
specified. This makes it possible to handle conjugated and transposed
arguments, which is quite a common use case.

While looking at the code, I found that the inline limit checks were not
correctly handled for cases other than A2B2. This is also fixed.

In order to check all cases at runtime, I simply copied the reference
BLAS routines into the test cases, which is why they are *.f instead of *.f90.

Regarding the bounds checking: I added three new test cases, but
as for checking everything, that would be a bit too much. The code
is clear enough that I think the other cases should be OK.

OK for trunk?

Regards

Thomas

2018-09-18  Thomas Koenig  

PR fortran/29550
* gfortran.h (gfc_expr): Add external_blas flag.
* frontend-passes.c (matrix_case): Add case A2TB2T.
(optimize_namespace): Handle flag_external_blas by
calling call_external_blas.
(get_array_inq_function): Add argument okind. If
it is nonzero, use it as the kind of argument
to be used.
(inline_limit_check): Remove m_case argument, add
limit argument instead.  Remove assert about m_case.
Set the limit for inlining from the limit argument.
(matmul_lhs_realloc): Handle case A2TB2T.
(inline_matmul_assign): Handle inline limit for other cases with
two rank-two matrices.  Remove no-op calls to inline_limit_check.
(call_external_blas): New function.
* trans-intrinsic.c (gfc_conv_intrinsic_funcall): Do not add
argument to external BLAS if external_blas is already set.

2018-09-18  Thomas Koenig  

PR fortran/29550
* gfortran.dg/inline_matmul_13.f90: Adjust count for
_gfortran_matmul.
* gfortran.dg/inline_matmul_16.f90: Likewise.
* gfortran.dg/promotion_2.f90: Add -fblas-matmul-limit=1.  Scan
for dgemm instead of dgemm_.  Add call to random_number to make
standard conforming.
* gfortran.dg/matmul_blas_1.f90: New test.
* gfortran.dg/matmul_bounds_14.f: New test.
* gfortran.dg/matmul_bounds_15.f: New test.
* gfortran.dg/matmul_bounds_16.f: New test.
Index: fortran/frontend-passes.c
===
--- fortran/frontend-passes.c	(Revision 264349)
+++ fortran/frontend-passes.c	(Arbeitskopie)
@@ -53,6 +53,7 @@ static gfc_code * create_do_loop (gfc_expr *, gfc_
   char *vname=NULL);
 static gfc_expr* check_conjg_transpose_variable (gfc_expr *, bool *,
 		 bool *);
+static int call_external_blas (gfc_code **, int *, void *);
 static bool has_dimen_vector_ref (gfc_expr *);
 static int matmul_temp_args (gfc_code **, int *,void *data);
 static int index_interchange (gfc_code **, int*, void *);
@@ -131,7 +132,7 @@ static int var_num = 1;
 
 /* What sort of matrix we are dealing with when inlining MATMUL.  */
 
-enum matrix_case { none=0, A2B2, A2B1, A1B2, A2B2T, A2TB2 };
+enum matrix_case { none=0, A2B2, A2B1, A1B2, A2B2T, A2TB2, A2TB2T };
 
 /* Keep track of the number of expressions we have inserted so far
using create_var.  */
@@ -1428,7 +1429,7 @@ optimize_namespace (gfc_namespace *ns)
   gfc_code_walker (&ns->code, convert_elseif, dummy_expr_callback, NULL);
   gfc_code_walker (&ns->code, cfe_code, cfe_expr_0, NULL);
   gfc_code_walker (&ns->code, optimize_code, optimize_expr, NULL);
-  if (flag_inline_matmul_limit != 0)
+  if (flag_inline_matmul_limit != 0 || flag_external_blas)
 	{
 	  bool found;
 	  do
@@ -1441,9 +1442,15 @@ optimize_namespace (gfc_namespace *ns)
 
 	  gfc_code_walker (&ns->code, matmul_temp_args, dummy_expr_callback,
 			   NULL);
-	  gfc_code_walker (&ns->code, inline_matmul_assign, dummy_expr_callback,
-			   NULL);
 	}
+
+  if (flag_external_blas)
+	gfc_code_walker (&ns->code, call_external_blas, dummy_expr_callback,
+			 NULL);
+
+  if (flag_inline_matmul_limit != 0)
+	gfc_code_walker (&ns->code, inline_matmul_assign, dummy_expr_callback,
+			 NULL);
 }
 
   if (flag_frontend_loop_interchange)
@@ -2938,7 +2945,7 @@ matmul_temp_args (gfc_code **c, int *walk_subtrees
dim is zero-based.  */
 
 static gfc_expr *
-get_array_inq_function (gfc_isym_id id, gfc_expr *e, int dim)
+get_array_inq_function (gfc_isym_id id, gfc_expr *e, int dim, int okind = 0)
 {
   gfc_expr *fcn;
   gfc_expr *dim_arg, *kind;
@@ -2964,8 +2971,12 @@ static gfc_expr *
 }
 
   dim_arg =  gfc_get_int_expr (gfc_default_integer_kind, &e->where, dim);
-  kind = gfc_get_int_expr (gfc_default_integer_kind, &e->where,
-			   gfc_index_integer_kind);
+  if (okind != 0)
+    kind = gfc_get_int_expr (gfc_default_integer_kind, &e->where,
+			     okind);
+  else
+    kind = gfc_get_int_expr (gfc_default_integer_kind, &e->where,
+			     gfc_index_integer_kind);
 
   ec = gfc_copy_expr (e);
 
@@ -3026,7 +3037,7 @@ get_operand (gfc_intrinsic_op op, gfc_expr *e1, gf
removed by D

Re: [Patch, fortran] PR87336] [8/9 regression] wrong output for pointer dummy assiocated to target actual argument

2018-09-18 Thread Thomas Koenig

Hi Paul,


Bootstraps and regtests on FC28/x86_64 - OK for 8- and 9-branches?


OK.  Thanks for the patch!

Regards

Thomas


Re: [Patch, fortran] PR87336] [8/9 regression] wrong output for pointer dummy assiocated to target actual argument

2018-09-18 Thread Paul Richard Thomas
Thanks Thomas,

I'll take a look at yours in a few minutes.

Cheers

Paul

On 18 September 2018 at 18:50, Thomas Koenig  wrote:
> Hi Paul,
>
>> Bootstraps and regtests on FC28/x86_64 - OK for 8- and 9-branches?
>
>
> OK.  Thanks for the patch!
>
> Regards
>
> Thomas



-- 
"If you can't explain it simply, you don't understand it well enough"
- Albert Einstein


Re: [patch, fortran] Inline BLAS calls to support conjg(transpose(a))

2018-09-18 Thread Paul Richard Thomas
Hi Thomas,

This is fine, except for one niggle. Rather than having the blas source
in each testcase, why don't you put all the functions in a blas file
and use the dg-additional-sources mechanism?

Cheers

Paul


On 18 September 2018 at 18:46, Thomas Koenig  wrote:
> Hello world,
>
> this patch generates direct calls to *GEMM when -fexternal-blas is
> specified. This makes it possible to handle conjugated and transposed
> arguments, which is quite a common use case.
>
> While looking at the code, I found that the inline limit checks were not
> correctly handled for cases other than A2B2. This is also fixed.
>
> In order to check all cases at runtime, I simply copied the reference
> BLAS routines into the test cases, which is why they are *.f instead of *.f90.
>
> Regarding the bounds checking: I added three new test cases, but
> as for checking everything, that would be a bit too much. The code
> is clear enough that I think the other cases should be OK.
>
> OK for trunk?
>
> Regards
>
> Thomas
>
> 2018-09-18  Thomas Koenig  
>
> PR fortran/29550
> * gfortran.h (gfc_expr): Add external_blas flag.
> * frontend-passes.c (matrix_case): Add case A2TB2T.
> (optimize_namespace): Handle flag_external_blas by
> calling call_external_blas.
> (get_array_inq_function): Add argument okind. If
> it is nonzero, use it as the kind of argument
> to be used.
> (inline_limit_check): Remove m_case argument, add
> limit argument instead.  Remove assert about m_case.
> Set the limit for inlining from the limit argument.
> (matmul_lhs_realloc): Handle case A2TB2T.
> (inline_matmul_assign): Handle inline limit for other cases with
> two rank-two matrices.  Remove no-op calls to inline_limit_check.
> (call_external_blas): New function.
> * trans-intrinsic.c (gfc_conv_intrinsic_funcall): Do not add
> argument to external BLAS if external_blas is already set.
>
> 2018-09-18  Thomas Koenig  
>
> PR fortran/29550
> * gfortran.dg/inline_matmul_13.f90: Adjust count for
> _gfortran_matmul.
> * gfortran.dg/inline_matmul_16.f90: Likewise.
> * gfortran.dg/promotion_2.f90: Add -fblas-matmul-limit=1.  Scan
> for dgemm instead of dgemm_.  Add call to random_number to make
> standard conforming.
> * gfortran.dg/matmul_blas_1.f90: New test.
> * gfortran.dg/matmul_bounds_14.f: New test.
> * gfortran.dg/matmul_bounds_15.f: New test.
> * gfortran.dg/matmul_bounds_16.f: New test.



-- 
"If you can't explain it simply, you don't understand it well enough"
- Albert Einstein


Re: C++ PATCH to implement P1064R0, Virtual Function Calls in Constant Expressions (v4)

2018-09-18 Thread Jason Merrill
On Tue, Sep 18, 2018 at 11:25 AM, Marek Polacek  wrote:
> On Mon, Sep 17, 2018 at 11:28:06PM -0400, Jason Merrill wrote:
>> On Mon, Sep 17, 2018 at 5:39 PM, Marek Polacek  wrote:
>> > On Fri, Sep 14, 2018 at 04:45:22PM -0400, Marek Polacek wrote:
>> >> On Fri, Sep 14, 2018 at 04:30:46PM -0400, Jason Merrill wrote:
>> >> > On Fri, Sep 14, 2018 at 1:19 PM, Marek Polacek  
>> >> > wrote:
>> >> > > This patch implements another bit of C++20, virtual calls in constant
>> >> > > expression:
>> >> > > 
>> >> > > The basic idea is that since in a constant expression we know the 
>> >> > > dynamic
>> >> > > type (to detect invalid code etc.), the restriction that prohibits 
>> >> > > virtual
>> >> > > calls is unnecessary.
>> >> > >
>> >> > > Handling virtual function calls turned out to be fairly easy (as 
>> >> > > anticipated);
>> >> > > I simply let the constexpr machinery figure out the dynamic type and 
>> >> > > then
>> >> > > OBJ_TYPE_REF_TOKEN gives us the index into the virtual function 
>> >> > > table.  That
>> >> > > way we get the function decl we're interested in, and 
>> >> > > cxx_eval_call_expression
>> >> > > takes it from there.
>> >> > >
>> >> > > But handling pointer-to-virtual-member-functions doesn't work like 
>> >> > > that.
>> >> > > get_member_function_from_ptrfunc creates a COND_EXPR which looks like
>> >> > > if (pf.__pfn & 1) // is it a virtual function?
>> >> > >   // yes, find the pointer in the vtable
>> >> > > else
>> >> > >   // no, just return the pointer
>> >> > > so ideally we want to evaluate the then-branch.  Eventually it'll 
>> >> > > evaluate it
>> >> > > to something like _ZTV2X2[2], but the vtable isn't constexpr so we'd 
>> >> > > end up
>> >> > > with "not a constant expression" error.
>> >> >
>> >> > Then let's mark the vtable as constexpr, there's no reason for it not 
>> >> > to be.
>> >
>> > Done.  But then I had to add indexes to the vtable's ctor (because 
>> > find_array_ctor_elt
>> > expects it), which broke an assert in gimple_get_virt_method_for_vtable.  
>> > But I
>> > don't need the array_ref hack anymore!
>>
>> > Also, I had to set DECL_DECLARED_CONSTEXPR_P after maybe_commonize_var,
>> > otherwise we run into the sorry in that function with -fno-weak...
>>
>> Hmm, we shouldn't give that sorry for DECL_ARTIFICIAL variables.
>>
>> Looking more closely, it seems that the call to maybe_commonize_var
>> from initialize_artificial_var did nothing before this change, since
>> the vtable is DECL_ARTIFICIAL, so it didn't pass the condition at the
>> top.  I suppose we should extend the !DECL_ARTIFICIAL check in
>> maybe_commonize_var to the inline variable case as well.
>
> Done.  And then I could move setting DECL_DECLARED_CONSTEXPR_P to
> initialize_artificial_var, which is where I think it belongs.
>
>> >> Ok, unfortunately it wasn't as easy as merely marking it 
>> >> DECL_DECLARED_CONSTEXPR_P
>> >> in initialize_artificial_var because then I saw "used in its own 
>> >> initializer"
>> >> error.  Which I don't know why, but now that I know you agree with this 
>> >> direction
>> >> I can dig deeper.
>> >>
>> >> > > Since the vtable initializer is
>> >> > > a compile-time constant, I thought we could make it work by a hack as 
>> >> > > the one
>> >> > > in cxx_eval_array_reference.  We'll then let cxx_eval_call_expression 
>> >> > > do its
>> >> > > job and everything is hunky-dory.
>> >> > >
>> >> > > Except when it isn't: I noticed that the presence of _vptr doesn't 
>> >> > > make the
>> >> > > class non-empty, and when cxx_eval_constant_expression saw a decl 
>> >> > > with an empty
>> >> > > class type, it just evaluated it to { }.  But such a class still had 
>> >> > > gotten an
>> >> > > initializer that looks like {.D.2082 = {._vptr.X2 = &_ZTV2X2 + 16}}.  
>> >> > > So
>> >> > > replacing it with { } will lose the proper initializer whereupon we 
>> >> > > fail.
>> >> > > The check I've added to cxx_eval_constant_expression won't win any 
>> >> > > beauty
>> >> > > contests but unfortunately EMPTY_CONSTRUCTOR_P doesn't work there.
>> >> >
>> >> > Perhaps we should check !TYPE_POLYMORPHIC_P as well as
>> >> > is_really_empty_class.  Perhaps there should be a predicate for that,
>> >> > say, is_really_nearly_empty_class...
>> >
>> > For now I've only added the !TYPE_POLYMORPHIC_P check, which works just 
>> > fine.
>> >
>> > Bootstrapped/regtested on x86_64-linux, ok for trunk?
>> >
>> > 2018-09-17  Marek Polacek  
>> >
>> > P1064R0 - Allowing Virtual Function Calls in Constant Expressions
>> > * call.c (build_over_call): Add FIXME.
>> > * class.c (initialize_vtable): Mark the vtable as constexpr.
>> > (build_vtbl_initializer): Build vtable's constructor with indexes.
>> > * constexpr.c (cxx_eval_constant_expression): Don't ignore _vptr's
>> > initializer.  Handle OBJ_TYPE_REF.
>> > (potential_constant_expression_1): Handle OBJ_

Re: [PATCH] Cleanup strcpy/stpcpy no nul warning code

2018-09-18 Thread Jeff Law
On 9/18/18 5:44 AM, Bernd Edlinger wrote:
>>
> 
> Hmm, you know, I wrote a while ago the following:
> https://gcc.gnu.org/ml/gcc-patches/2018-09/msg00411.html
> 
> Where I suggested to change c_strlen parameter nonstr to a
> structure, where additional information could go.
It certainly seems better than continually adding more parameters and/or
having a given parameter have multiple overloaded meanings.  We'd fill
in everything we can and let the caller determine what bits they need.

Martin has suggested something similar.

> 
> Well, I have the impression that the 6/6 patch should be split
> up in a part that uses c_strlen in its current form, and a
> follow-up patch that adds any additional info.
> I prefer doing things step-by-step, you know.
I'm pretty sure it's not going to be able to use c_strlen in its current
form.  There are times when it really wants the length of the string,
even if it's not terminated.


> 
> Note that the example given is probably unsafe:
>> T (&b[3][1] + v0, bsz);
> 
> This does not pass the check in string_constant because it
> looks like _2 = (char*)&b + _1; _3 = strlen (_2);
> I have no way to prove that v0 is not larger than sizeof (b[3]).
That's OK.   The existence of the variable part turns "must" warnings
into "may" warnings precisely because we don't know how v0 impacts things.

> 
> By the way test test case for pr87053 is an example
> of a false positive warning (it comes twice even though
> TREE_NO_WARNING is used).
> 
> gcc pr87053.c
> pr87053.c: In function 'main':
> pr87053.c:15:26: warning: 'strlen' argument missing terminating nul 
> [-Wstringop-overflow=]
> 15 |   if (__builtin_strlen (u.z) != 7)
> | ~^~
> pr87053.c:11:3: note: referenced argument declared here
> 11 | } u = {{"1234", "567"}};
> |   ^
> pr87053.c:15:26: warning: 'strlen' argument missing terminating nul 
> [-Wstringop-overflow=]
> 15 |   if (__builtin_strlen (u.z) != 7)
> | ~^~
> pr87053.c:11:3: note: referenced argument declared here
> 11 | } u = {{"1234", "567"}};
> |   ^
> 
> I think the duplicate warnings will not go away unless
> this warning is only diagnosed in the tree-ssa-strlen
> pass instead of the folding code, which happens repeatedly.
Please file a bug.  FWIW, I wouldn't necessarily rely on the strlen
pass not being repeated either.

jeff
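
To make the "structure instead of more parameters" idea concrete, a
sketch of what such an interface could look like (the field names are
hypothetical, not an actual GCC interface):

/* Results of a c_strlen query; callers read only the fields they need.  */
struct c_strlen_data
{
  tree len;      /* string length, when it can be computed         */
  tree nonstr;   /* declaration of a non-terminated array, if any  */
  tree maxbound; /* bound of the accessed object                   */
};

/* Usage sketch:

     c_strlen_data data = { };
     tree len = c_strlen (src, 1, &data);
     if (data.nonstr)
       ... warn about the missing terminating nul ...  */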


Re: C++ PATCH to implement P1064R0, Virtual Function Calls in Constant Expressions (v4)

2018-09-18 Thread Marek Polacek
On Tue, Sep 18, 2018 at 02:29:16PM -0400, Jason Merrill wrote:
> > --- gcc/cp/class.c
> > +++ gcc/cp/class.c
> > @@ -9266,6 +9266,7 @@ build_vtbl_initializer (tree binfo,
> >tree vcall_index;
> >tree fn, fn_original;
> >tree init = NULL_TREE;
> > +  tree idx = build_int_cst (size_type_node, jx++);
> 
> Let's use size_int here, like in process_init_constructor_array.

Done.
 
> OK with that change.

Thanks for quick reviews!  Next one: explicit(bool).

Applying to trunk.

2018-09-18  Marek Polacek  

P1064R0 - Allowing Virtual Function Calls in Constant Expressions
* call.c (build_over_call): No longer check if we're outside a template
function.
* class.c (build_vtbl_initializer): Build vtable's constructor with
indexes.
* constexpr.c (cxx_eval_constant_expression): Don't ignore _vptr's
initializer.  Handle OBJ_TYPE_REF.
(potential_constant_expression_1): Handle OBJ_TYPE_REF.
* decl.c (maybe_commonize_var): Bail out for any DECL_ARTIFICIAL.
(initialize_artificial_var): Mark the variable as constexpr.
(grokdeclarator): Change error to pedwarn.  Only warn when
pedantic and not C++2a.

* gimple-fold.c (gimple_get_virt_method_for_vtable): Adjust assert.

* g++.dg/cpp0x/constexpr-virtual5.C: Adjust dg-error.
* g++.dg/cpp2a/constexpr-virtual1.C: New test.
* g++.dg/cpp2a/constexpr-virtual2.C: New test.
* g++.dg/cpp2a/constexpr-virtual3.C: New test.
* g++.dg/cpp2a/constexpr-virtual4.C: New test.
* g++.dg/cpp2a/constexpr-virtual5.C: New test.
* g++.dg/cpp2a/constexpr-virtual6.C: New test.
* g++.dg/cpp2a/constexpr-virtual7.C: New test.
* g++.dg/cpp2a/constexpr-virtual8.C: New test.
* g++.dg/cpp2a/constexpr-virtual9.C: New test.
* g++.dg/diagnostic/virtual-constexpr.C: Skip for C++2a.  Use
-pedantic-errors.  Adjust dg-error.

diff --git gcc/cp/call.c gcc/cp/call.c
index 69503ca7920..ddf0ed044a0 100644
--- gcc/cp/call.c
+++ gcc/cp/call.c
@@ -8399,10 +8399,7 @@ build_over_call (struct z_candidate *cand, int flags, 
tsubst_flags_t complain)
   && DECL_BUILT_IN_CLASS (fn) == BUILT_IN_NORMAL)
 maybe_warn_class_memaccess (input_location, fn, args);
 
-  if (DECL_VINDEX (fn) && (flags & LOOKUP_NONVIRTUAL) == 0
-  /* Don't mess with virtual lookup in instantiate_non_dependent_expr;
-virtual functions can't be constexpr.  */
-  && !in_template_function ())
+  if (DECL_VINDEX (fn) && (flags & LOOKUP_NONVIRTUAL) == 0)
 {
   tree t;
   tree binfo = lookup_base (TREE_TYPE (TREE_TYPE (argarray[0])),
diff --git gcc/cp/class.c gcc/cp/class.c
index e950a7423f7..9ca46441871 100644
--- gcc/cp/class.c
+++ gcc/cp/class.c
@@ -9266,6 +9266,7 @@ build_vtbl_initializer (tree binfo,
   tree vcall_index;
   tree fn, fn_original;
   tree init = NULL_TREE;
+  tree idx = size_int (jx++);
 
   fn = BV_FN (v);
   fn_original = fn;
@@ -9369,7 +9370,7 @@ build_vtbl_initializer (tree binfo,
  int i;
  if (init == size_zero_node)
for (i = 0; i < TARGET_VTABLE_USES_DESCRIPTORS; ++i)
- CONSTRUCTOR_APPEND_ELT (*inits, NULL_TREE, init);
+ CONSTRUCTOR_APPEND_ELT (*inits, idx, init);
  else
for (i = 0; i < TARGET_VTABLE_USES_DESCRIPTORS; ++i)
  {
@@ -9377,11 +9378,11 @@ build_vtbl_initializer (tree binfo,
 fn, build_int_cst (NULL_TREE, i));
TREE_CONSTANT (fdesc) = 1;
 
-   CONSTRUCTOR_APPEND_ELT (*inits, NULL_TREE, fdesc);
+   CONSTRUCTOR_APPEND_ELT (*inits, idx, fdesc);
  }
}
   else
-   CONSTRUCTOR_APPEND_ELT (*inits, NULL_TREE, init);
+   CONSTRUCTOR_APPEND_ELT (*inits, idx, init);
 }
 }
 
diff --git gcc/cp/constexpr.c gcc/cp/constexpr.c
index 88c73787961..aa33319875f 100644
--- gcc/cp/constexpr.c
+++ gcc/cp/constexpr.c
@@ -4209,7 +4209,11 @@ cxx_eval_constant_expression (const constexpr_ctx *ctx, 
tree t,
 CONST_DECL for aggregate constants.  */
   if (lval)
return t;
+  /* is_really_empty_class doesn't take into account _vptr, so initializing
+otherwise empty class with { } would overwrite the initializer that
+initialize_vtable created for us.  */
   if (COMPLETE_TYPE_P (TREE_TYPE (t))
+ && !TYPE_POLYMORPHIC_P (TREE_TYPE (t))
  && is_really_empty_class (TREE_TYPE (t)))
{
  /* If the class is empty, we aren't actually loading anything.  */
@@ -4778,7 +4782,6 @@ cxx_eval_constant_expression (const constexpr_ctx *ctx, 
tree t,
 case MODOP_EXPR:
   /* GCC internal stuff.  */
 case VA_ARG_EXPR:
-case OBJ_TYPE_REF:
 case NON_DEPENDENT_EXPR:
 case BASELINK:
 case OFFSET_REF:
@@ -4788,6 +4791,34 @@ cxx_eval_constant_expression (const constexpr_ctx *ctx, 
tree t,
   *non_constant_p
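
To make the feature concrete, a minimal hedged example of what P1064R0
permits (hypothetical code in the spirit of the constexpr-virtual tests
above, compiled with -std=c++2a):

struct X  { virtual constexpr int f () const { return 1; } };
struct X2 : X { constexpr int f () const override { return 2; } };

constexpr int call (const X &x) { return x.f (); }  // virtual dispatch

static_assert (call (X2{}) == 2);  // dynamic type known at compile time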

[PATCH][Middle-end][Version 3]Add a new option to control inlining only on static functions

2018-09-18 Thread Qing Zhao
Hi,

this is the 3rd version of the patch, the major change is to address Andrew’s 
concern on the documentation part.

I updated the documentation of this option as follows:

'-finline-only-static'
 By default, GCC inlines functions without considering whether they
 are static or not.  This flag guides inliner to only inline static
 functions.  This option has any effect only when inlining itself is
 turned on by the -finline-functions or -finline-small-fiunctions.

 Off by default.

all other changes keep the same as version 2.

please take a look again, and let me know of any comments and suggestions.

thanks.

Qing

gcc/ChangeLog

+2018-09-18  Qing Zhao  
+
+   * cif-code.def (FUNCTION_EXTERN): New CIFCODE.
+   * common.opt (-finline-only-static): New option.
+   * doc/invoke.texi: Document -finline-only-static.
+   * ipa-inline.c (can_inline_edge_p): Control inlining based on
+   function's visibility. 

gcc/testsuite/ChangeLog

+2018-09-18  Qing Zhao  
+
+   * gcc.dg/inline_only_static.c: New test.
+



New-inline-only-static.patch
Description: Binary data
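
As an illustration of the intended behavior (a hypothetical example,
assuming -O2 -finline-functions -finline-only-static as described):

static int add1 (int x) { return x + 1; } /* internal linkage: may be inlined */
int add2 (int x) { return x + 2; }  /* external linkage: kept out of line */

int
caller (int x)
{
  return add1 (x) + add2 (x);
}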


Re: [PATCH] look harder for MEM_REF operand equality to avoid -Wstringop-truncation (PR 84561)

2018-09-18 Thread Jeff Law
On 9/18/18 11:12 AM, Martin Sebor wrote:

>> My bad.  Sigh. CCP doesn't track copies, just constants, so there's not
>> going to be any data structure you can exploit.  And I don't think
>> there's a value number you can use to determine the two objects are the
>> same.
>>
>> Hmm, let's back up a bit, what does the relevant part of the IL look
>> like before CCP?  Is the real problem here that we have unpropagated
>> copies lying around in the IL?  Hmm, more likely the IL looks like:
>>
>>    _8 = &pb_3(D)->a;
>>    _9 = _8;
>>    _1 = _9;
>>    strncpy (MEM_REF (&pb_3(D)->a), ...);
>>    MEM[(struct S *)_1].a[n_7] = 0;
> 
> Yes, that is what the folder sees while the strncpy call is
> being transformed/folded by ccp.  The MEM_REF is folded just
> after the strncpy call and that's when it's transformed into
> 
>   MEM[(struct S *)_8].a[n_7] = 0;
> 
> (The assignments to _1 and _9 don't get removed until after
> the dom walk finishes).
> 
>>
>> If we were to propagate the copies out we'd at best have:
>>
>>    _8 = &pb_3(D)->a;
>>    strncpy (MEM_REF (&pb_3(D)->a), ...);
>>    MEM[(struct S *)_8].a[n_7] = 0;
>>
>>
>> Is that in a form you can handle?  Or would we also need to forward
>> propagate the address computation into the use of _8?
> 
> The above works as long as we look at the def_stmt of _8 in
> the MEM_REF (we currently don't).  That's also what the last
> iteration of the loop does.  In this case (with _8) it would
> be discovered in the first iteration, so the loop could be
> replaced by a simple if statement.
> 
> But I'm not sure I understand the concern with the loop.  Is
> it that we are looping at all, i.e., the cost?  Or that ccp
> is doing something wrong or suboptimal? (Should have
> propagated the value of _8 earlier?)
I suspect it's more a concern that things like copies are typically
propagated away.   So their existence in the IL (and consequently your
need to handle them) raises the question "has something else failed to
do its job earlier".

During which of the CCP passes is this happening?  Can we pull the
warning out of the folder (even if that means having a distinct warning
pass over the IL?)


Jeff


Re: [PATCH] PR86957

2018-09-18 Thread Indu Bhagat



On 09/17/2018 03:52 AM, Jan Hubicka wrote:

On 09/11/2018 02:21 AM, Martin Liška wrote:

--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -811,6 +811,10 @@ Wcoverage-mismatch
  Common Var(warn_coverage_mismatch) Init(1) Warning
  Warn in case profiles in -fprofile-use do not match.
  
+Wmissing-profile

+Common Var(warn_missing_profile) Init(1) Warning
+Warn in case profiles in -fprofile-use do not exist.

Maybe 'Warn about missing profile for a function in -fprofile-use build.' ?


Since it also warns when the feedback file is missing for a compilation unit,
the suggested text above would be too restrictive. So I did not change it.

Perhaps we also want to have a reference from the -fprofile-use documentation
so users notice this option.

Honza

Thanks

Yes, I had added that in the patch.  The following additional text will
appear at the end of -fprofile-use:

+ Additionally, by default, GCC also emits a warning message if
+the feedback profiles do not exist (See @option{-Wmissing-profile}).


[Patch, fortran] PR87239 - ICE in deferred-length string

2018-09-18 Thread Paul Richard Thomas
This is sufficiently 'obvious'  that I have committed the patch to
trunk as revision 264409.

My inclination is to commit it to 8-branch as well since deferred
character bugs were one of the complaints in the fortran standards
survey. OK?

Paul

2018-09-18  Paul Thomas  

PR fortran/87239
* trans-expr.c (gfc_trans_assignment_1): The rse.pre for the
assignment of deferred character elemental function results to
	a reallocatable lhs must not be added to the exterior block but
must go to the loop body.

2018-09-18  Paul Thomas  

PR fortran/87239
* gfortran.dg/elemental_function_2.f90 : New test.
Index: gcc/fortran/trans-expr.c
===
*** gcc/fortran/trans-expr.c	(revision 264406)
--- gcc/fortran/trans-expr.c	(working copy)
*** gfc_trans_assignment_1 (gfc_expr * expr1
*** 10283,10290 
if (flag_realloc_lhs
&& expr2->ts.type == BT_CHARACTER && expr1->ts.deferred
&& !(lss != gfc_ss_terminator
! 	   && ((expr2->expr_type == EXPR_OP
! 		&& expr2->value.op.op == INTRINSIC_CONCAT)
  	   || (expr2->expr_type == EXPR_FUNCTION
  		   && expr2->value.function.isym != NULL
  		   && expr2->value.function.isym->id == GFC_ISYM_CONVERSION
--- 10283,10293 
if (flag_realloc_lhs
&& expr2->ts.type == BT_CHARACTER && expr1->ts.deferred
&& !(lss != gfc_ss_terminator
! 	   && ((expr2->expr_type == EXPR_FUNCTION
! 		&& expr2->value.function.esym != NULL
! 		   && expr2->value.function.esym->attr.elemental)
! 	   || (expr2->expr_type == EXPR_OP
! 		   && expr2->value.op.op == INTRINSIC_CONCAT)
  	   || (expr2->expr_type == EXPR_FUNCTION
  		   && expr2->value.function.isym != NULL
  		   && expr2->value.function.isym->id == GFC_ISYM_CONVERSION
Index: gcc/testsuite/gfortran.dg/elemental_function_2.f90
===
*** gcc/testsuite/gfortran.dg/elemental_function_2.f90	(nonexistent)
--- gcc/testsuite/gfortran.dg/elemental_function_2.f90	(working copy)
***
*** 0 
--- 1,40 
+ ! { dg-do compile }
+ !
+ ! Test the fix for PR87239 in which the call to the elemental function
+ ! 'gettwo' was being added before the scalarization loop in the assignment.
+ ! Since the result temporary was being declared in the loop body, this
+ ! drove the gimplifier crazy. It is sufficient to compile this testcase
+ ! since it used to ICE.
+ !
+ ! Contributed by Juergen Reuter  
+ !
+ module test
+   implicit none
+ contains
+ 
+   elemental function gettwo( s ) result( res )
+ character(*), intent(in) :: s
+ character(len(s)) :: res
+ 
+ res = s( 1 : 2 )
+   endfunction gettwo
+ 
+ endmodule test
+ 
+ program main
+   use test
+   implicit none
+   character(10) :: inp( 5 )
+   integer :: i
+ 
+   ! character(10), allocatable :: out(:) ! this works
+   character(:), allocatable :: out(:) ! this was stuffed
+ 
+   inp = [ 'aaa', 'bbb', 'ccc', 'ddd', 'eee' ]
+ 
+   out = gettwo( inp )
+ 
+   do i = 1, size (out, 1)
+ if (trim (out(i)) .ne. inp(i)(1:2)) stop 1
+   end do
+ endprogram main


Re: [PATCH] look harder for MEM_REF operand equality to avoid -Wstringop-truncation (PR 84561)

2018-09-18 Thread Martin Sebor

On 09/18/2018 12:58 PM, Jeff Law wrote:

On 9/18/18 11:12 AM, Martin Sebor wrote:


My bad.  Sigh. CCP doesn't track copies, just constants, so there's not
going to be any data structure you can exploit.  And I don't think
there's a value number you can use to determine the two objects are the
same.

Hmm, let's back up a bit, what does the relevant part of the IL look
like before CCP?  Is the real problem here that we have unpropagated
copies lying around in the IL?  Hmm, more likely the IL looks like:

   _8 = &pb_3(D)->a;
   _9 = _8;
   _1 = _9;
   strncpy (MEM_REF (&pb_3(D)->a), ...);
   MEM[(struct S *)_1].a[n_7] = 0;


Yes, that is what the folder sees while the strncpy call is
being transformed/folded by ccp.  The MEM_REF is folded just
after the strncpy call and that's when it's transformed into

  MEM[(struct S *)_8].a[n_7] = 0;

(The assignments to _1 and _9 don't get removed until after
the dom walk finishes).



If we were to propagate the copies out we'd at best have:

   _8 = &pb_3(D)->a;
   strncpy (MEM_REF (&pb_3(D)->a), ...);
   MEM[(struct S *)_8].a[n_7] = 0;


Is that in a form you can handle?  Or would we also need to forward
propagate the address computation into the use of _8?


The above works as long as we look at the def_stmt of _8 in
the MEM_REF (we currently don't).  That's also what the last
iteration of the loop does.  In this case (with _8) it would
be discovered in the first iteration, so the loop could be
replaced by a simple if statement.

But I'm not sure I understand the concern with the loop.  Is
it that we are looping at all, i.e., the cost?  Or that ccp
is doing something wrong or suboptimal? (Should have
propagated the value of _8 earlier?)

I suspect it's more a concern that things like copies are typically
propagated away.   So their existence in the IL (and consequently your
need to handle them) raises the question "has something else failed to
do its job earlier".

During which of the CCP passes is this happening?  Can we pull the
warning out of the folder (even if that means having a distinct warning
pass over the IL?)


It happens during the third run of the pass.

The only way to do what you suggest that I could think of is
to defer the strncpy to memcpy transformation until after
the warning pass.  That was also my earlier suggestion: defer
both it and the warning until the tree-ssa-strlen pass (where
the warning is implemented to begin with -- the folder calls
into it).

Martin


Re: [patch, fortran] Inline BLAS calls to support conjg(transpose(a))

2018-09-18 Thread Thomas Koenig

Hi Paul,


This is fine, except for one niggle. Rather than having the blas source
in each testcase, why don't you put all the functions in a blas file
and use the dg-additional-sources mechanism?


Good idea, I have added this. Committed as r264411.

Regards

Thomas


[nvptx] vector length patch series

2018-09-18 Thread Cesar Philippidis
Hi Tom,

Here is a link to our nvptx vector length patches on github:

  https://github.com/cesarjp/gcc/tree/trunk-og8-vl-private

Specifically, the code lives in the trunk-og8-vl-private branch. There
are a couple of outstanding dependency patches:

  * Teach gfortran to lower OpenACC routine dims
https://gcc.gnu.org/ml/gcc-patches/2018-09/msg00368.html
b186c651f37 [openacc] Make GFC default to -1 for OpenACC routine dims

  * Add target hook TARGET_GOACC_ADJUST_PARALLELISM
https://gcc.gnu.org/ml/gcc-patches/2018-09/msg00369.html
49b2039013e [openacc] Add target hook TARGET_GOACC_ADJUST_PARALLELISM

  * Enable firstprivate OpenACC reductions
https://gcc.gnu.org/ml/gcc-patches/2018-09/msg00370.html
1f70cdb7cf0 (HEAD -> trunk-og8-vl-private,
github/trunk-og8-vl-private) [OpenACC] Enable firstprivate OpenACC
reductions

  * Adjust offsets for present data clauses
https://gcc.gnu.org/ml/gcc-patches/2018-07/msg01213.html
8bcda2f1a2b [libgomp, OpenACC] Adjust offsets for present data clauses

Of the patches in trunk-og8-vl-private, the following are just general
refactors and cleanups which do not change any functionality:

7eb378e9b0c [nvptx] Generalize state propagation and synchronization
10aa1f74d5a [nvptx] Use MAX, MIN, ROUND_UP macros
9dfe611f3d8 [nvptx] Use TARGET_SET_CURRENT_FUNCTION
4fbe0e812bd [nvptx] Add axis_dim
fbe43dac79f [nvptx] Add thread count parm to bar.sync
57d3f8c88ff [nvptx] only use one bar.sync barriers in OpenACC offloaded code
f14d0e882eb [nvptx] Fix whitespace in nvptx_single and nvptx_neuter_pars
82d81fffb0f [nvptx] make nvptx state propagation function names more generic
95703737e09 [nvptx] consolidate offloaded function attributes into
struct offload_attrs
8c9e897c36d [nvptx] Rename worker_bcast variables oacc_bcast.
45147e7e3f3 [nvptx] update openacc dim macros
caa641ecfb4 [nvptx] Update insufficient launch message to accommodate
large vectors

The following patches actually implement the new vector length
functionality. Note that trunk doesn't support missing arguments between
colons in -fopenacc-dim like -fopenacc-dim=::64, so I had to remove or
adjust a couple of your test cases from og8.

591973d3c3a [nvptx] use user-defined vectors when possible
fb9cefa5b17 [nvptx] Handle large vector reductions
5154d363d07 [nvptx] Force vl32 if calling vector-partitionable routines
f62e3afcf6a [nvptx, openacc] Don't emit barriers for empty loops
4cc408658fb [PR85246] [nvptx] Fix propagation of branch cond in
vw-neutered code
d97ed5fc580 [nvptx] Simplifly logic in nvptx_single
62f0c5df3dd [nvptx] Enable worker partitioning with warp-sized vector_length
f2cf96b0df3 [nvptx] Handle large vectors in libgomp
eba014c260c [nvptx] Enable large vectors
f31d8b98ca1 [nvptx] Add vector_length 128 testcases

Let me know if you encounter any problems with that github branch.

This branch has recently been rebased against trunk. Further, I
bootstrapped and regtested it on an x86_64 Linux target with nvptx
offloading.

Thanks,
Cesar


Re: [PATCH 13/25] Create TARGET_DISABLE_CURRENT_VECTOR_SIZE

2018-09-18 Thread Andrew Stubbs

On 18/09/18 12:21, Richard Sandiford wrote:

Would the same be useful for GCN, or do you basically always
want a VF of 64?


Always 64; the vector size varies between 512-bit and 4096-bit, as needed.


None of this is a fundamental restriction in theory.  It's just
something that needs to be fixed.

One approach would be to get the loop vectoriser to iterate over the
number of lanes the target supports insteaad of all possible vector
sizes.  The problem is that on its own this would mean trying 4
lane counts even on targets with a single supported vector size.
So we'd need to do something a bit smarter...


Yeah, that sounds like an interesting project, but way more than I think 
I need. Basically, we don't need to iterate over anything; there's only 
one option for each mode.


For the purposes of this patch, might it be enough to track down all the 
places that use the current_vector_size and fix them up, somehow?


Obviously, I'm not sure what that means just yet ...

Andrew


[PATCH, rs6000] Fix PR86952 (p8-vec-xl-xst-v2.c)

2018-09-18 Thread Will Schmidt
Hi,

The expected codegen for this testcase with target {le} and
option -mcpu=power8 is lxvd2x and stxvd2x.  It was initially
committed with contents as seen on builds for P7, which was
incorrect.  Update as is appropriate.

[testsuite]

2018-09-18  Will Schmidt  

PR testsuite/86952
* p8-vec-xl-xst-v2.c: Add and update expected codegen.

diff --git a/gcc/testsuite/gcc.target/powerpc/p8-vec-xl-xst-v2.c 
b/gcc/testsuite/gcc.target/powerpc/p8-vec-xl-xst-v2.c
index cc68ceb..6f48d72 100644
--- a/gcc/testsuite/gcc.target/powerpc/p8-vec-xl-xst-v2.c
+++ b/gcc/testsuite/gcc.target/powerpc/p8-vec-xl-xst-v2.c
@@ -57,8 +57,7 @@ void
 bartle (vector unsigned short x, unsigned short * address)
 {
   vec_xst (x, 0, address);
 }
 
-/* { dg-final { scan-assembler-times "lvx" 4 } } */
-/* { dg-final { scan-assembler-times "stvx"  4 } } */
-/* { dg-final { scan-assembler-times "xxpermdi" 0 } } */
+/* { dg-final { scan-assembler-times {\mlxvd2x\M} 4 } } */
+/* { dg-final { scan-assembler-times {\mstxvd2x\M} 4 } } */




Re: [PATCH, rs6000] Fix PR86592 (p8-vec-xl-xst-v2.c)

2018-09-18 Thread Will Schmidt
On Tue, 2018-09-18 at 15:27 -0500, Will Schmidt wrote:
> Hi,
> 
> The expected codegen for this testcase with target {le} and
> option -mcpu=power8 is lxvd2x and stxvd2x.  It was initially
> committed with contents as seen on builds for P7, which was
> incorrect.  Update as is appropriate.
> 
> [testsuite]
> 
> 2018-09-18  Will Schmidt  
> 
>   PR testsuite/86952

nuts, sorry, this should have been pr testsuite 86592.
let me try this again




[PATCH, rs6000] Fix PR86592 (p8-vec-xl-xst-v2.c)

2018-09-18 Thread Will Schmidt
Hi,

The expected codegen for this testcase with target {le} and
option -mcpu=power8 is lxvd2x and stxvd2x.  It was initially
committed with contents as seen on builds for P7, which was
incorrect.  Update as is appropriate.

[testsuite]

2018-09-18  Will Schmidt  

PR testsuite/86592
* p8-vec-xl-xst-v2.c: Add and update expected codegen.

diff --git a/gcc/testsuite/gcc.target/powerpc/p8-vec-xl-xst-v2.c 
b/gcc/testsuite/gcc.target/powerpc/p8-vec-xl-xst-v2.c
index cc68ceb..6f48d72 100644
--- a/gcc/testsuite/gcc.target/powerpc/p8-vec-xl-xst-v2.c
+++ b/gcc/testsuite/gcc.target/powerpc/p8-vec-xl-xst-v2.c
@@ -57,8 +57,7 @@ void
 bartle (vector unsigned short x, unsigned short * address)
 {
   vec_xst (x, 0, address);
 }
 
-/* { dg-final { scan-assembler-times "lvx" 4 } } */
-/* { dg-final { scan-assembler-times "stvx"  4 } } */
-/* { dg-final { scan-assembler-times "xxpermdi" 0 } } */
+/* { dg-final { scan-assembler-times {\mlxvd2x\M} 4 } } */
+/* { dg-final { scan-assembler-times {\mstxvd2x\M} 4 } } */




Re: [PATCH] PR libstdc++/87135 Rehash only when necessary (LWG2156)

2018-09-18 Thread François Dumont

On 09/18/2018 10:41 AM, Jonathan Wakely wrote:

On 13/09/18 07:49 +0200, François Dumont wrote:

All changes are limited to the hashtable_c++0x.cc file.

_Prime_rehash_policy::_M_next_bkt now really does what its comment
says, that is:

  // Return a prime no smaller than n.

_Prime_rehash_policy::_M_need_rehash now rehashes only when _M_next_size is
exceeded, not when it is merely reached.


    PR libstdc++/87135
    * src/c++11/hashtable_c++0x.cc:
    (_Prime_rehash_policy::_M_next_bkt): Return a prime no smaller than
    requested size, but not necessarily greater.
    (_Prime_rehash_policy::_M_need_rehash): Rehash only if target size is
    strictly greater than next resize threshold.
    * testsuite/23_containers/unordered_map/modifiers/reserve.cc: Adapt test
    to validate that there is no rehash as long as number of insertions is
    lower or equal to the reserved number of elements.

unordered_map tests successful, ok to commit once all other tests 
completed ?


François


diff --git a/libstdc++-v3/src/c++11/hashtable_c++0x.cc 
b/libstdc++-v3/src/c++11/hashtable_c++0x.cc

index a776a8506fe..ec6031b3f5b 100644
--- a/libstdc++-v3/src/c++11/hashtable_c++0x.cc
+++ b/libstdc++-v3/src/c++11/hashtable_c++0x.cc
@@ -46,10 +46,10 @@ namespace __detail
  {
    // Optimize lookups involving the first elements of __prime_list.
    // (useful to speed-up, eg, constructors)
-    static const unsigned char __fast_bkt[13]
-  = { 2, 2, 3, 5, 5, 7, 7, 11, 11, 11, 11, 13, 13 };
+    static const unsigned char __fast_bkt[]
+  = { 2, 2, 2, 3, 5, 5, 7, 7, 11, 11, 11, 11, 13, 13 };

-    if (__n <= 12)
+    if (__n < sizeof(__fast_bkt) / sizeof(unsigned char))


sizeof(unsigned char) is defined to be 1, always.


I also thought it was overkill but there are so many platforms that I
prefer to be cautious. Good to know that it can't be otherwise.




I think this should be just sizeof(__fast_bkt), or if you're trying to
guard against the type of __fast_bkt changing, then use
sizeof(__fast_bkt) / sizeof(__fast_bkt[0]))


I committed with simply sizeof(__fast_bkt)
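
For illustration, the guarantee the adapted reserve.cc test validates
looks roughly like this (a hedged sketch, not the testsuite file itself):

#include <unordered_map>
#include <cassert>

int main()
{
  std::unordered_map<int, int> m;
  m.reserve(1000);
  const auto bkts = m.bucket_count();
  for (int i = 0; i < 1000; ++i)
    m.emplace(i, i);
  assert(m.bucket_count() == bkts);  // no rehash up to the reserved count
}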



Re: [PATCH 16/25] Fix IRA ICE.

2018-09-18 Thread Andrew Stubbs

On 17/09/18 10:22, Richard Sandiford wrote:

 writes:

The IRA pass makes an assumption that any pseudos created after the pass begins
were created explicitly by the pass itself and therefore will have
corresponding entries in its other tables.

The GCN back-end, however, often creates additional pseudos, in expand
patterns, to represent the necessary EXEC value, and these break IRA's
assumption and cause ICEs.

This patch simply has IRA skip unknown pseudos, and the problem goes away.

Presumably, it's not ideal that these registers have not been processed by IRA,
but it does not appear to do any real harm.


Could you go into more detail about how this happens?  Other targets
also create pseudos in their move patterns.


Here's a simplified snippet from the machine description:

(define_expand "mov<mode>"
  [(set (match_operand:VEC_REG_MODE 0 "nonimmediate_operand")
	(match_operand:VEC_REG_MODE 1 "general_operand"))]
  ""
  {
    [...]
    if (can_create_pseudo_p ())
      {
	rtx exec = gcn_full_exec_reg ();
	rtx undef = gcn_gen_undef (<MODE>mode);
	[...]
	emit_insn (gen_mov<mode>_vector (operands[0], operands[1], exec,
					 undef));
	[...]
	DONE;
      }
  })

gcn_full_exec_reg creates a new pseudo. It gets used as the mask 
parameter of a vec_merge.


These registers then trip the asserts in ira.c.

In the case of setup_preferred_alternate_classes_for_new_pseudos it's 
because they have numbers greater than "start" but have not been 
initialized with different ORIGINAL_REGNO (why would they have been?)


In the case of move_unallocated_pseudos it's because the table 
pseudo_replaced_reg only has entries for the new pseudos directly 
created by find_moveable_pseudos, not the ones created indirectly.


Andrew


Re: [GCC][PATCH v2][Aarch64] Exploiting BFXIL when OR-ing two AND-operations with appropriate bitmasks

2018-09-18 Thread Christophe Lyon
On Thu, 13 Sep 2018 at 11:49, Kyrill Tkachov
 wrote:
>
>
> On 13/09/18 10:25, Sam Tebbs wrote:
> >
> > On 09/11/2018 04:20 PM, James Greenhalgh wrote:
> > > On Tue, Sep 04, 2018 at 10:13:43AM -0500, Sam Tebbs wrote:
> > >> Hi James,
> > >>
> > >> Thanks for the feedback. Here is an update with the changes you proposed
> > >> and an updated changelog.
> > >>
> > >> gcc/
> > >> 2018-09-04  Sam Tebbs  
> > >>
> > >>   PR target/85628
> > >>   * config/aarch64/aarch64.md (*aarch64_bfxil):
> > >>   Define.
> > >>   * config/aarch64/constraints.md (Ulc): Define
> > >>   * config/aarch64/aarch64-protos.h 
> > >> (aarch64_high_bits_all_ones_p):
> > >>   Define.
> > >>   * config/aarch64/aarch64.c (aarch64_high_bits_all_ones_p): New 
> > >> function.
> > >>
> > >> gcc/testsuite
> > >> 2018-09-04  Sam Tebbs  
> > >>
> > >>   PR target/85628
> > >>   * gcc.target/aarch64/combine_bfxil.c: New file.
> > >>   * gcc.target/aarch64/combine_bfxil_2.c: New file.
> > >>
> > >>
> > > 
> > >
> > >> +/* Return true if I's bits are consecutive ones from the MSB.  */
> > >> +bool
> > >> +aarch64_high_bits_all_ones_p (HOST_WIDE_INT i)
> > >> +{
> > >> +  return exact_log2(-i) != HOST_WIDE_INT_M1;
> > >> +}
> > > You need a space in here between the function name and the bracket:
> > >
> > >exact_log2 (-i)
> > >
> > >
> > >> +extern void abort(void);
> > > The same comment applies multiple places in this file.
> > >
> > > Likewise; if (
> > >
> > > Otherwise, OK, please apply with those fixes.
> > >
> > > Thanks,
> > > James
> >
> > Thanks for noticing that, here's the fixed version.
> >
>
> Thanks Sam, I've committed the patch on your behalf with r264264.
> If you want to get write-after-approval access to the SVN repo to commit 
> patches yourself in the future
> please fill out the form at https://sourceware.org/cgi-bin/pdw/ps_form.cgi 
> putting my address from the MAINTAINERS file as the approver.
>

Hi,

You've probably already noticed by now, since you fixed the
combine_bfi_1 issue introduced by this commit, but it added another
regression:
FAIL: gcc.target/aarch64/copysign-bsl.c scan-assembler b(sl|it|if)\tv[0-9]

Christophe

> Kyrill
>
> > Sam
>


Re: [PATCH][Middle-end][Version 3]Add a new option to control inlining only on static functions

2018-09-18 Thread Martin Sebor

On 09/18/2018 12:58 PM, Qing Zhao wrote:

Hi,

This is the 3rd version of the patch; the major change addresses Andrew’s 
concern about the documentation.

I updated the documentation of this option as follows:

'-finline-only-static'
 By default, GCC inlines functions without considering whether they
 are static or not.  This flag guides inliner to only inline static
 functions.  This option has any effect only when inlining itself is
 turned on by the -finline-functions or -finline-small-fiunctions.

 Off by default.
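
For example (an illustration of the intended behavior, not text from the
patch), given:

  static int f1 (int x) { return x + 1; }  /* static: eligible for inlining */
  int f2 (int x) { return x + 2; }         /* external linkage: excluded */

  int use (int x) { return f1 (x) + f2 (x); }

with -O2 -finline-functions -finline-only-static, only the call to f1 may
be inlined into use; f2 keeps its out-of-line call.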

All other changes are the same as in version 2.

Please take another look, and let me know of any comments or suggestions.


Just a few minor spelling issues:



thanks.

Qing

gcc/ChangeLog

+2018-09-18  Qing Zhao  
+
+   * cif-code.def (FUNCTION_EXTERN): New CIFCODE.
+   * common.opt (-finline-only-static): New option.
+   * doc/invoke.texi: Document -finline-only-static.
+   * ipa-inline.c (can_inline_edge_p): Control inlining based on
+   function's visibility.


Probably "linkage" would be a more fitting term here.



gcc/testsuite/ChangeLog

+2018-09-18  Qing Zhao  
+
+   * gcc.dg/inline_only_static.c: New test.
+



diff --git a/gcc/cif-code.def b/gcc/cif-code.def
index 19a7621..64b2b1a 100644
--- a/gcc/cif-code.def
+++ b/gcc/cif-code.def
@@ -132,6 +132,12 @@
 DEFCIFCODE(USES_COMDAT_LOCAL, CIF_FINAL_ERROR,
 DEFCIFCODE(ATTRIBUTE_MISMATCH, CIF_FINAL_ERROR,
   N_("function attribute mismatch"))

+/* We can't inline because the user requests only inlining static function
+   but the function is external visible.  */

I suspect you meant: "only static functions" (plural) and
"the function has external linkage" (as defined in the C and
C++ standards).

+DEFCIFCODE(FUNCTION_EXTERN, CIF_FINAL_ERROR,
+  N_("function is external visible when the user requests only"
+ " inlining static"))
+

Here as well: either "function has external linkage" or "function
is extern."

===
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index ec12711..b6b0db5 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -8066,6 +8067,15 @@
 having large chains of nested wrapper functions.

 Enabled by default.

+@item -finline-only-static
+@opindex finline-only-static
+By default, GCC inlines functions without considering whether they are static
+or not. This flag guides inliner to only inline static functions.

Guides "the inliner" (missing article).

+This option has any effect only when inlining itself is turned on by the
+-finline-functions or -finline-small-fiunctions.

"by the -f... options."  (Missing "options") and
-finline-small-functions (note the spelling of functions).

+
+Off by default.

I think the customary way to word it is: "Disabled by default."
or "The finline-only-static option/flag is disabled/off by default"

Martin


[PATCH] dump_printf: use %T and %G throughout

2018-09-18 Thread David Malcolm
As promised at Cauldron, this patch uses %T and %G with dump_printf and
dump_printf_loc calls to eliminate calls to

  dump_generic_expr (MSG_*, arg, TDF_SLIM)  (via %T)

and

  dump_gimple_stmt (MSG_*, TDF_SLIM, stmt, 0)  (via %G)

throughout the middle-end, simplifying numerous dump callsites.

A few calls to these functions didn't match the above pattern; I didn't
touch these.  I wasn't able to use %E anywhere.
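
For example, a typical callsite of the form (illustrative; the real sites
vary):

  dump_printf_loc (MSG_NOTE, vect_location, "examining statement: ");
  dump_gimple_stmt (MSG_NOTE, TDF_SLIM, stmt, 0);

becomes simply:

  dump_printf_loc (MSG_NOTE, vect_location, "examining statement: %G", stmt);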

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.

OK for trunk?

gcc/ChangeLog:
* tree-data-ref.c (runtime_alias_check_p): Use formatted printing
with %T in place of calls to dump_generic_expr.
(prune_runtime_alias_test_list): Likewise.
(create_runtime_alias_checks): Likewise.
* tree-vect-data-refs.c (vect_check_nonzero_value): Likewise.
(vect_analyze_data_ref_dependence): Likewise.
(vect_slp_analyze_data_ref_dependence): Likewise.
(vect_record_base_alignment): Likewise.  Use %G in place of call
to dump_gimple_stmt.
(vect_compute_data_ref_alignment): Likewise.
(verify_data_ref_alignment): Likewise.
(vect_find_same_alignment_drs): Likewise.
(vect_analyze_group_access_1): Likewise.
(vect_analyze_data_ref_accesses): Likewise.
(dependence_distance_ge_vf): Likewise.
(dump_lower_bound): Likewise.
(vect_prune_runtime_alias_test_list): Likewise.
(vect_find_stmt_data_reference): Likewise.
(vect_analyze_data_refs): Likewise.
(vect_create_addr_base_for_vector_ref): Likewise.
(vect_create_data_ref_ptr): Likewise.
* tree-vect-loop-manip.c (vect_set_loop_condition): Likewise.
(vect_can_advance_ivs_p): Likewise.
(vect_update_ivs_after_vectorizer): Likewise.
(vect_gen_prolog_loop_niters): Likewise.
(vect_prepare_for_masked_peels): Likewise.
* tree-vect-loop.c (vect_determine_vf_for_stmt): Likewise.
(vect_determine_vectorization_factor): Likewise.
(vect_is_simple_iv_evolution): Likewise.
(vect_analyze_scalar_cycles_1): Likewise.
(vect_analyze_loop_operations): Likewise.
(report_vect_op): Likewise.
(vect_is_slp_reduction): Likewise.
(check_reduction_path): Likewise.
(vect_is_simple_reduction): Likewise.
(vect_create_epilog_for_reduction): Likewise.
(vect_finalize_reduction:): Likewise.
(vectorizable_induction): Likewise.
(vect_transform_loop_stmt): Likewise.
(vect_transform_loop): Likewise.
(optimize_mask_stores): Likewise.
* tree-vect-patterns.c (vect_pattern_detected): Likewise.
(vect_split_statement): Likewise.
(vect_recog_over_widening_pattern): Likewise.
(vect_recog_average_pattern): Likewise.
(vect_determine_min_output_precision_1): Likewise.
(vect_determine_precisions_from_range): Likewise.
(vect_determine_precisions_from_users): Likewise.
(vect_mark_pattern_stmts): Likewise.
(vect_pattern_recog_1): Likewise.
* tree-vect-slp.c (vect_get_and_check_slp_defs): Likewise.
(vect_record_max_nunits): Likewise.
(vect_build_slp_tree_1): Likewise.
(vect_build_slp_tree_2): Likewise.
(vect_print_slp_tree): Likewise.
(vect_analyze_slp_instance): Likewise.
(vect_detect_hybrid_slp_stmts): Likewise.
(vect_detect_hybrid_slp_1): Likewise.
(vect_slp_analyze_operations): Likewise.
(vect_slp_analyze_bb_1): Likewise.
(vect_transform_slp_perm_load): Likewise.
(vect_schedule_slp_instance): Likewise.
* tree-vect-stmts.c (vect_mark_relevant): Likewise.
(vect_mark_stmts_to_be_vectorized): Likewise.
(vect_init_vector_1): Likewise.
(vect_get_vec_def_for_operand): Likewise.
(vect_finish_stmt_generation_1): Likewise.
(vect_check_load_store_mask): Likewise.
(vectorizable_call): Likewise.
(vectorizable_conversion): Likewise.
(vectorizable_operation): Likewise.
(vectorizable_load): Likewise.
(vect_analyze_stmt): Likewise.
(vect_is_simple_use): Likewise.
(vect_get_vector_types_for_stmt): Likewise.
(vect_get_mask_type_for_stmt): Likewise.
* tree-vectorizer.c (increase_alignment): Likewise.
---
 gcc/tree-data-ref.c|  48 ++---
 gcc/tree-vect-data-refs.c  | 481 +++--
 gcc/tree-vect-loop-manip.c |  36 ++--
 gcc/tree-vect-loop.c   | 215 ++--
 gcc/tree-vect-patterns.c   | 106 --
 gcc/tree-vect-slp.c| 242 ---
 gcc/tree-vect-stmts.c  | 290 ---
 gcc/tree-vectorizer.c  |   4 +-
 8 files changed, 442 insertions(+), 980 deletions(-)

diff --git a/gcc/tree-data-ref.c b/gcc/tree-data-ref.c
index a8c6872..bf30a61 100644
--- a/gcc/tree-data-ref.c
+++ b/gcc/tree-data-ref.c
@@ -1322,13 +1322,9 @@ bool
 runtim

C++ PATCH for c++/87357, missing -Wconversion warning

2018-09-18 Thread Marek Polacek
As I mentioned in the PR, [class.conv.fct] says "A conversion function is never
used to convert a (possibly cv-qualified) object to the (possibly cv-qualified)
same object type (or a reference to it), to a (possibly cv-qualified) base class
of that type (or a reference to it), or to (possibly cv-qualified) void." but I
noticed we weren't warning about 

struct X {
  operator X();
};

(and there are no tests for this case) because comparing types directly only
works for TYPE_CANONICAL, otherwise we have to use same_type_*.

I'm also changing the warning about 'void' since we can't form a reference to
void.

I'll also note that I think we should warn about this by default, not only
when -Wconversion is in effect.

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2018-09-18  Marek Polacek  

PR c++/87357 - missing -Wconversion warning
* decl.c (grok_op_properties): Remove diagnostic parts mentioning
a conversion to a reference to void.  Use
same_type_ignoring_top_level_qualifiers_p rather than comparing types
directly.

* g++.dg/warn/Wconversion5.C: New test.

diff --git gcc/cp/decl.c gcc/cp/decl.c
index 827c1720335..503b433cbd1 100644
--- gcc/cp/decl.c
+++ gcc/cp/decl.c
@@ -13553,15 +13553,11 @@ grok_op_properties (tree decl, bool complain)
t = TYPE_MAIN_VARIANT (TREE_TYPE (t));
 
   if (VOID_TYPE_P (t))
-   warning_at (loc, OPT_Wconversion,
-   ref
-   ? G_("conversion to a reference to void "
-"will never use a type conversion operator")
-   : G_("conversion to void "
-"will never use a type conversion operator"));
+   warning_at (loc, OPT_Wconversion, "conversion to void "
+   "will never use a type conversion operator");
   else if (class_type)
{
- if (t == class_type)
+ if (same_type_ignoring_top_level_qualifiers_p (t, class_type))
warning_at (loc, OPT_Wconversion,
ref
? G_("conversion to a reference to the same type "
diff --git gcc/testsuite/g++.dg/warn/Wconversion5.C gcc/testsuite/g++.dg/warn/Wconversion5.C
index e69de29bb2d..00b1ddab188 100644
--- gcc/testsuite/g++.dg/warn/Wconversion5.C
+++ gcc/testsuite/g++.dg/warn/Wconversion5.C
@@ -0,0 +1,20 @@
+// PR c++/87357
+// { dg-do compile }
+// { dg-options "-Wconversion" }
+
+struct B { };
+
+struct X : public B {
+  operator X(); // { dg-warning "3:conversion to the same type will never use a type conversion operator" }
+  operator X&(); // { dg-warning "3:conversion to a reference to the same type will never use a type conversion operator" }
+  operator X() const; // { dg-warning "3:conversion to the same type will never use a type conversion operator" }
+  operator const X(); // { dg-warning "3:conversion to the same type will never use a type conversion operator" }
+
+  operator B(); // { dg-warning "3:conversion to a base class will never use a type conversion operator" }
+  operator B&(); // { dg-warning "3:conversion to a reference to a base class will never use a type conversion operator" }
+  operator B() const; // { dg-warning "3:conversion to a base class will never use a type conversion operator" }
+  operator const B(); // { dg-warning "3:conversion to a base class will never use a type conversion operator" }
+
+  operator void(); // { dg-warning "3:conversion to void will never use a type conversion operator" }
+  operator void() const; // { dg-warning "3:conversion to void will never use a type conversion operator" }
+};


Re: [PATCH] Backport gettext fixes to get rid of warnings on macOS

2018-09-18 Thread Jeff Law
On 9/6/18 4:36 AM, Simon Marchi wrote:
> On 2018-08-01 03:58 PM, Simon Marchi wrote:
>> This patch was tested to build binutils-gdb on GNU/Linux and macOS.  It can 
>> be
>> applied to the gcc repo too, after fixing some trivial merge conflicts 
>> (someone
>> else will need to do it, as I don't have push access to gcc).  Although I 
>> think
>> it is relatively low-risk, building gcc on macOS was not tested with this
>> patch, so if somebody that has already a macOS build can do it, it would be
>> appreciated.
>>
>> Two fixes were committed recently to the gettext repo in order to make
>> gdb build warning-free on macOS.  This patch backports them both:
>>
>>   - Make the format_arg attribute effective also in the case 
>> _INTL_REDIRECT_INLINE.
>> 113893dce80358a4ae0d9463ce73c5670c81cf0c
>> 
>> http://git.savannah.gnu.org/cgit/gettext.git/commit/?id=113893dce80358a4ae0d9463ce73c5670c81cf0c
>>
>>   - Enable the format_arg attribute also on clang on Mac OS X.
>> bd6a52241c7c83c90e043ace2082a2508d273f55
>> 
>> http://git.savannah.gnu.org/cgit/gettext.git/commit/?id=bd6a52241c7c83c90e043ace2082a2508d273f55
>>
>> intl/ChangeLog:
>>
>>  * libgnuintl.h (_INTL_MAY_RETURN_STRING_ARG, gettext, dgettext,
>>  dcgettext, ngettext, dngettext, dcngettext): Backport changes
>>  from upstream gettext.
>> ---
>>  intl/libgnuintl.h | 35 +++
>>  1 file changed, 23 insertions(+), 12 deletions(-)
>>
>> diff --git a/intl/libgnuintl.h b/intl/libgnuintl.h
>> index acc9093..7616d6f 100644
>> --- a/intl/libgnuintl.h
>> +++ b/intl/libgnuintl.h
>> @@ -115,7 +115,7 @@ extern "C" {
>>  /* _INTL_MAY_RETURN_STRING_ARG(n) declares that the given function may 
>> return
>> its n-th argument literally.  This enables GCC to warn for example about
>> printf (gettext ("foo %y")).  */
>> -#if __GNUC__ >= 3 && !(__APPLE_CC__ > 1 && defined __cplusplus)
>> +#if defined __GNUC__ && __GNUC__ >= 3 && !(defined __APPLE_CC__ && 
>> __APPLE_CC__ > 1 && !(defined __clang__ && __clang__ && __clang_major__ >= 
>> 3) && defined __cplusplus)
>>  # define _INTL_MAY_RETURN_STRING_ARG(n) __attribute__ ((__format_arg__ (n)))
>>  #else
>>  # define _INTL_MAY_RETURN_STRING_ARG(n)
>> @@ -127,7 +127,9 @@ extern "C" {
>>  #ifdef _INTL_REDIRECT_INLINE
>>  extern char *libintl_gettext (const char *__msgid)
>> _INTL_MAY_RETURN_STRING_ARG (1);
>> -static inline char *gettext (const char *__msgid)
>> +static inline
>> +_INTL_MAY_RETURN_STRING_ARG (1)
>> +char *gettext (const char *__msgid)
>>  {
>>return libintl_gettext (__msgid);
>>  }
>> @@ -145,7 +147,9 @@ extern char *gettext _INTL_PARAMS ((const char *__msgid))
>>  #ifdef _INTL_REDIRECT_INLINE
>>  extern char *libintl_dgettext (const char *__domainname, const char 
>> *__msgid)
>> _INTL_MAY_RETURN_STRING_ARG (2);
>> -static inline char *dgettext (const char *__domainname, const char *__msgid)
>> +static inline
>> +_INTL_MAY_RETURN_STRING_ARG (2)
>> +char *dgettext (const char *__domainname, const char *__msgid)
>>  {
>>return libintl_dgettext (__domainname, __msgid);
>>  }
>> @@ -165,8 +169,9 @@ extern char *dgettext _INTL_PARAMS ((const char 
>> *__domainname,
>>  extern char *libintl_dcgettext (const char *__domainname, const char 
>> *__msgid,
>>  int __category)
>> _INTL_MAY_RETURN_STRING_ARG (2);
>> -static inline char *dcgettext (const char *__domainname, const char 
>> *__msgid,
>> -   int __category)
>> +static inline
>> +_INTL_MAY_RETURN_STRING_ARG (2)
>> +char *dcgettext (const char *__domainname, const char *__msgid, int 
>> __category)
>>  {
>>return libintl_dcgettext (__domainname, __msgid, __category);
>>  }
>> @@ -188,8 +193,10 @@ extern char *dcgettext _INTL_PARAMS ((const char 
>> *__domainname,
>>  extern char *libintl_ngettext (const char *__msgid1, const char *__msgid2,
>> unsigned long int __n)
>> _INTL_MAY_RETURN_STRING_ARG (1) _INTL_MAY_RETURN_STRING_ARG (2);
>> -static inline char *ngettext (const char *__msgid1, const char *__msgid2,
>> -  unsigned long int __n)
>> +static inline
>> +_INTL_MAY_RETURN_STRING_ARG (1) _INTL_MAY_RETURN_STRING_ARG (2)
>> +char *ngettext (const char *__msgid1, const char *__msgid2,
>> +unsigned long int __n)
>>  {
>>return libintl_ngettext (__msgid1, __msgid2, __n);
>>  }
>> @@ -210,8 +217,10 @@ extern char *ngettext _INTL_PARAMS ((const char 
>> *__msgid1,
>>  extern char *libintl_dngettext (const char *__domainname, const char 
>> *__msgid1,
>>  const char *__msgid2, unsigned long int __n)
>> _INTL_MAY_RETURN_STRING_ARG (2) _INTL_MAY_RETURN_STRING_ARG (3);
>> -static inline char *dngettext (const char *__domainname, const char 
>> *__msgid1,
>> -   const char *__msgid2, unsigned long int __n)
>> +static inline
>> +_INTL_MAY_RETURN_STRING_ARG (2) _INTL_MAY_RETURN_STRING_ARG (

Re: [PATCH] S/390: Fix conditional returns

2018-09-18 Thread Jeff Law
On 9/18/18 8:04 AM, Segher Boessenkool wrote:
> On Wed, Sep 05, 2018 at 10:34:48AM +0200, Ilya Leoshkevich wrote:
>> S/390 epilogue ends with (parallel [(return) (use %r14)]) instead of
>> the more usual (return) or (simple_return).  This sequence is not
>> recognized by the conditional return logic in try_optimize_cfg ().
> 
> Why does it need this?  Other targets with a link register make
> EPILOGUE_USES handle this.
I think because he's trying to optimize a conditional jump to a return
insn into a conditional return insn.  I don't think we do that on other
targets, though I have pondered it from time to time.

Jeff


Re: [PATCH] look harder for MEM_REF operand equality to avoid -Wstringop-truncation (PR 84561)

2018-09-18 Thread Jeff Law
On 9/18/18 1:46 PM, Martin Sebor wrote:
> On 09/18/2018 12:58 PM, Jeff Law wrote:
>> On 9/18/18 11:12 AM, Martin Sebor wrote:
>>
 My bad.  Sigh. CCP doesn't track copies, just constants, so there's not
 going to be any data structure you can exploit.  And I don't think
 there's a value number you can use to determine the two objects are the
 same.

 Hmm, let's back up a bit: what does the relevant part of the IL look
 like before CCP?  Is the real problem here that we have unpropagated
 copies lying around in the IL?  Hmm, more likely the IL looks like:

    _8 = &pb_3(D)->a;
    _9 = _8;
    _1 = _9;
    strncpy (MEM_REF (&pb_3(D)->a), ...);
    MEM[(struct S *)_1].a[n_7] = 0;
>>>
>>> Yes, that is what the folder sees while the strncpy call is
>>> being transformed/folded by ccp.  The MEM_REF is folded just
>>> after the strncpy call and that's when it's transformed into
>>>
>>>   MEM[(struct S *)_8].a[n_7] = 0;
>>>
>>> (The assignments to _1 and _9 don't get removed until after
>>> the dom walk finishes).
>>>

 If we were to propagate the copies out we'd at best have:

    _8 = &pb_3(D)->a;
    strncpy (MEM_REF (&pb_3(D)->a), ...);
    MEM[(struct S *)_8].a[n_7] = 0;


 Is that in a form you can handle?  Or would we also need to forward
 propagate the address computation into the use of _8?
>>>
>>> The above works as long as we look at the def_stmt of _8 in
>>> the MEM_REF (we currently don't).  That's also what the last
>>> iteration of the loop does.  In this case (with _8) it would
>>> be discovered in the first iteration, so the loop could be
>>> replaced by a simple if statement.
>>>
>>> But I'm not sure I understand the concern with the loop.  Is
>>> it that we are looping at all, i.e., the cost?  Or that ccp
>>> is doing something wrong or suboptimal? (Should have
>>> propagated the value of _8 earlier?)
>> I suspect it's more a concern that things like copies are typically
>> propagated away.   So their existence in the IL (and consequently your
>> need to handle them) raises the question "has something else failed to
>> do its job earlier".
>>
>> During which of the CCP passes is this happening?  Can we pull the
>> warning out of the folder (even if that means having a distinct warning
>> pass over the IL?)
> 
> It happens during the third run of the pass.
> 
> The only way to do what you suggest that I could think of is
> to defer the strncpy to memcpy transformation until after
> the warning pass.  That was also my earlier suggestion: defer
> both it and the warning until the tree-ssa-strlen pass (where
> the warning is implemented to begin with -- the folder calls
> into it).
If it's happening that late (CCP3) in general, then ISTM we ought to be
able to get the warning out of the folder.  We just have to pick the
right spot.

warn_restrict runs before fold_all_builtins, but after dom/vrp so we
should have the IL in pretty good shape.  That seems like about the
right time.

I wonder if we could generalize warn_restrict to be a more generic
warning pass over the IL and place it right before fold_builtins.
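
Schematically, in passes.def terms (just a sketch of that placement;
neighbouring passes elided, pass names from memory):

      /* ... dom and vrp run earlier ... */
      NEXT_PASS (pass_warn_restrict);
      /* a generalized warning pass could slot in about here */
      NEXT_PASS (pass_fold_builtins);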

Jeff


Re: Transform assertion into optimization hints

2018-09-18 Thread François Dumont

On 09/18/2018 08:11 AM, Marc Glisse wrote:

On Tue, 18 Sep 2018, François Dumont wrote:

If your concern is rather that the condition gets evaluated, which will 
eventually slow down execution, then I need to check the generated code. 
Any good link explaining how to get a clear view of the generated code?


This.

For a file like

#include <map>

void f();
void g(std::map<int,int> const&m){
  if(m.count(42)>0)__builtin_unreachable();
  f();
}

compiled with g++ file.c -O3 -S -fdump-tree-optimized, I get a
file.c.228t.optimized that contains quite a lot of code, not just a call
to f.
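
By contrast, with a trivial test (same flags, an equally minimal example):

void g2(int i){
  if(i<0)__builtin_unreachable();
  f();
}

the optimized dump should reduce to essentially just the call to f: the
test has no side effects, so the compiler can drop it and keep only the
range information on i.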


Too bad, and thanks for the tip on how to generate this.



Maybe a 'if __builtin_constant_p(_Condition)' could help if gcc 
doesn't do it itself already.


How would you use that precisely?


I wouldn't; I wasn't fully awake when I proposed this, it makes no 
sense.




It may be easiest to use a different macro for trivial tests that can go
with __builtin_unreachable and for expensive tests that cannot.

Even if I think that doing this kind of operation in an assert call is 
too much, I agree that it makes this change invalid.


I'll see if I need to introduce a macro the day I want to add a 
__builtin_unreachable.
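
Something like this minimal sketch, say (purely hypothetical names, not
an actual libstdc++ proposal):

// Cheap checks may become optimization hints in release mode; costly
// checks must vanish entirely.
#if defined(NDEBUG) && defined(__GNUC__)
# define ASSERT_CHEAP(C) do { if (!(C)) __builtin_unreachable(); } while (false)
# define ASSERT_COSTLY(C) do { } while (false)
#else
# include <cassert>
# define ASSERT_CHEAP(C) assert(C)
# define ASSERT_COSTLY(C) assert(C)
#endif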


François