[PATCH] Invoke maybe_warn_nonstring_arg for strcpy/stpcpy builtins.

2018-04-11 Thread Andreas Krebbel
c-c++-common/attr-nonstring-3.c fails on IBM Z. The reason appears to be
that we provide builtin implementations for strcpy and stpcpy.  The
warnings currently will only be emitted when expanding these as normal
calls.
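
For readers unfamiliar with the attribute, here is a minimal sketch of the kind of code the warning targets. The struct, the NONSTRING macro and the helper name are made up for illustration and are not taken from attr-nonstring-3.c; the exact diagnostic text depends on the GCC version.

```c
#include <string.h>

/* Use the attribute only where the compiler knows it (an assumption:
   compilers without __has_attribute simply get an empty macro).  */
#if defined(__has_attribute)
# if __has_attribute(nonstring)
#  define NONSTRING __attribute__ ((nonstring))
# endif
#endif
#ifndef NONSTRING
# define NONSTRING
#endif

/* Hypothetical record with a fixed-width field that is not
   necessarily nul-terminated.  */
struct record
{
  char tag[4] NONSTRING;
};

/* Copy the tag into DST (at least 5 bytes) as a proper C string.
   Writing strcpy (dst, r->tag) instead is the kind of call the
   nonstring warning is meant to flag, since TAG may lack a nul.  */
static void
tag_to_string (char *dst, const struct record *r)
{
  memcpy (dst, r->tag, sizeof r->tag);
  dst[sizeof r->tag] = '\0';
}
```

The point of the patch is that such a strcpy/stpcpy call should be diagnosed the same way whether it is expanded as a builtin or as a normal call.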

Bootstrapped and regression tested on x86_64 and s390x.

Ok?

gcc/ChangeLog:

2018-04-11  Andreas Krebbel  

* builtins.c (expand_builtin_strcpy): Invoke
maybe_warn_nonstring_arg.
(expand_builtin_stpcpy): Likewise.
---
 gcc/builtins.c | 12 
 1 file changed, 12 insertions(+)

diff --git a/gcc/builtins.c b/gcc/builtins.c
index ababee5..83bbb70 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -3770,6 +3770,12 @@ expand_builtin_strcpy (tree exp, rtx target)
   tree dest = CALL_EXPR_ARG (exp, 0);
   tree src = CALL_EXPR_ARG (exp, 1);
 
+  /* Check to see if the argument was declared attribute nonstring
+ and if so, issue a warning since at this point it's not known
+ to be nul-terminated.  */
+  tree fndecl = get_callee_fndecl (exp);
+  maybe_warn_nonstring_arg (fndecl, exp);
+
   if (warn_stringop_overflow)
 {
   tree destsize = compute_objsize (dest, warn_stringop_overflow - 1);
@@ -3828,6 +3834,12 @@ expand_builtin_stpcpy (tree exp, rtx target, 
machine_mode mode)
   tree len, lenp1;
   rtx ret;
 
+  /* Check to see if the argument was declared attribute nonstring
+and if so, issue a warning since at this point it's not known
+to be nul-terminated.  */
+  tree fndecl = get_callee_fndecl (exp);
+  maybe_warn_nonstring_arg (fndecl, exp);
+
   /* Ensure we get an actual string whose length can be evaluated at
 compile-time, not an expression containing a string.  This is
 because the latter will potentially produce pessimized code
-- 
2.9.1



Re: [PATCH] Invoke maybe_warn_nonstring_arg for strcpy/stpcpy builtins.

2018-04-11 Thread Jakub Jelinek
On Wed, Apr 11, 2018 at 09:48:05AM +0200, Andreas Krebbel wrote:
> c-c++-common/attr-nonstring-3.c fails on IBM Z. The reason appears to be
> that we provide builtin implementations for strcpy and stpcpy.  The
> warnings currently will only be emitted when expanding these as normal
> calls.
> 
> Bootstrapped and regression tested on x86_64 and s390x.
> 
> Ok?
> 
> gcc/ChangeLog:
> 
> 2018-04-11  Andreas Krebbel  
> 
>   * builtins.c (expand_builtin_strcpy): Invoke
>   maybe_warn_nonstring_arg.
>   (expand_builtin_stpcpy): Likewise.

Don't you then warn twice if builtin implementations for strcpy and stpcpy
aren't available or can't be used, once here and once in calls.c?

> --- a/gcc/builtins.c
> +++ b/gcc/builtins.c
> @@ -3770,6 +3770,12 @@ expand_builtin_strcpy (tree exp, rtx target)
>tree dest = CALL_EXPR_ARG (exp, 0);
>tree src = CALL_EXPR_ARG (exp, 1);
>  
> +  /* Check to see if the argument was declared attribute nonstring
> + and if so, issue a warning since at this point it's not known
> + to be nul-terminated.  */
> +  tree fndecl = get_callee_fndecl (exp);
> +  maybe_warn_nonstring_arg (fndecl, exp);
> +
>if (warn_stringop_overflow)
>  {
>tree destsize = compute_objsize (dest, warn_stringop_overflow - 1);
> @@ -3828,6 +3834,12 @@ expand_builtin_stpcpy (tree exp, rtx target, 
> machine_mode mode)
>tree len, lenp1;
>rtx ret;
>  
> +  /* Check to see if the argument was declared attribute nonstring
> +  and if so, issue a warning since at this point it's not known
> +  to be nul-terminated.  */
> +  tree fndecl = get_callee_fndecl (exp);
> +  maybe_warn_nonstring_arg (fndecl, exp);
> +
>/* Ensure we get an actual string whose length can be evaluated at
>compile-time, not an expression containing a string.  This is
>because the latter will potentially produce pessimized code
> -- 
> 2.9.1

Jakub


Re: [AARCH64] Neon vld1_*_x3, vst1_*_x2 and vst1_*_x3 intrinsics

2018-04-11 Thread Sameera Deshpande
On 10 April 2018 at 20:07, Sudakshina Das  wrote:
> Hi Sameera
>
>
> On 10/04/18 11:20, Sameera Deshpande wrote:
>>
>> On 7 April 2018 at 01:25, Christophe Lyon 
>> wrote:
>>>
>>> Hi,
>>>
>>> 2018-04-06 12:15 GMT+02:00 Sameera Deshpande
>>> :

 Hi Christophe,

 Please find attached the updated patch with testcases.

 Ok for trunk?
>>>
>>>
>>> Thanks for the update.
>>>
>>> Since the new intrinsics are only available on aarch64, you want to
>>> prevent the tests from running on arm.
>>> Indeed gcc.target/aarch64/advsimd-intrinsics/ is shared between the two
>>> targets.
>>> There are several examples on how to do that in that directory.
>>>
>>> I have also noticed that the tests fail at execution on aarch64_be.
>>>
>>> I didn't look at the patch in detail.
>>>
>>> Christophe
>>>
>>>

 - Thanks and regards,
Sameera D.

 2017-12-14 22:17 GMT+05:30 Christophe Lyon :
>
> 2017-12-14 9:29 GMT+01:00 Sameera Deshpande
> :
>>
>> Hi!
>>
>> Please find attached the patch implementing vld1_*_x3, vst1_*_x2 and
>> vst1_*_x3 intrinsics as defined by Neon document.
>>
>> Ok for trunk?
>>
>> - Thanks and regards,
>>Sameera D.
>>
>> gcc/Changelog:
>>
>> 2017-11-14  Sameera Deshpande  
>>
>>
>>  * config/aarch64/aarch64-simd-builtins.def (ld1x3): New.
>>  (st1x2): Likewise.
>>  (st1x3): Likewise.
>>  * config/aarch64/aarch64-simd.md
>> (aarch64_ld1x3): New pattern.
>>  (aarch64_ld1_x3_): Likewise
>>  (aarch64_st1x2): Likewise
>>  (aarch64_st1_x2_): Likewise
>>  (aarch64_st1x3): Likewise
>>  (aarch64_st1_x3_): Likewise
>>  * config/aarch64/arm_neon.h (vld1_u8_x3): New function.
>>  (vld1_s8_x3): Likewise.
>>  (vld1_u16_x3): Likewise.
>>  (vld1_s16_x3): Likewise.
>>  (vld1_u32_x3): Likewise.
>>  (vld1_s32_x3): Likewise.
>>  (vld1_u64_x3): Likewise.
>>  (vld1_s64_x3): Likewise.
>>  (vld1_fp16_x3): Likewise.
>>  (vld1_f32_x3): Likewise.
>>  (vld1_f64_x3): Likewise.
>>  (vld1_p8_x3): Likewise.
>>  (vld1_p16_x3): Likewise.
>>  (vld1_p64_x3): Likewise.
>>  (vld1q_u8_x3): Likewise.
>>  (vld1q_s8_x3): Likewise.
>>  (vld1q_u16_x3): Likewise.
>>  (vld1q_s16_x3): Likewise.
>>  (vld1q_u32_x3): Likewise.
>>  (vld1q_s32_x3): Likewise.
>>  (vld1q_u64_x3): Likewise.
>>  (vld1q_s64_x3): Likewise.
>>  (vld1q_f16_x3): Likewise.
>>  (vld1q_f32_x3): Likewise.
>>  (vld1q_f64_x3): Likewise.
>>  (vld1q_p8_x3): Likewise.
>>  (vld1q_p16_x3): Likewise.
>>  (vld1q_p64_x3): Likewise.
>>  (vst1_s64_x2): Likewise.
>>  (vst1_u64_x2): Likewise.
>>  (vst1_f64_x2): Likewise.
>>  (vst1_s8_x2): Likewise.
>>  (vst1_p8_x2): Likewise.
>>  (vst1_s16_x2): Likewise.
>>  (vst1_p16_x2): Likewise.
>>  (vst1_s32_x2): Likewise.
>>  (vst1_u8_x2): Likewise.
>>  (vst1_u16_x2): Likewise.
>>  (vst1_u32_x2): Likewise.
>>  (vst1_f16_x2): Likewise.
>>  (vst1_f32_x2): Likewise.
>>  (vst1_p64_x2): Likewise.
>>  (vst1q_s8_x2): Likewise.
>>  (vst1q_p8_x2): Likewise.
>>  (vst1q_s16_x2): Likewise.
>>  (vst1q_p16_x2): Likewise.
>>  (vst1q_s32_x2): Likewise.
>>  (vst1q_s64_x2): Likewise.
>>  (vst1q_u8_x2): Likewise.
>>  (vst1q_u16_x2): Likewise.
>>  (vst1q_u32_x2): Likewise.
>>  (vst1q_u64_x2): Likewise.
>>  (vst1q_f16_x2): Likewise.
>>  (vst1q_f32_x2): Likewise.
>>  (vst1q_f64_x2): Likewise.
>>  (vst1q_p64_x2): Likewise.
>>  (vst1_s64_x3): Likewise.
>>  (vst1_u64_x3): Likewise.
>>  (vst1_f64_x3): Likewise.
>>  (vst1_s8_x3): Likewise.
>>  (vst1_p8_x3): Likewise.
>>  (vst1_s16_x3): Likewise.
>>  (vst1_p16_x3): Likewise.
>>  (vst1_s32_x3): Likewise.
>>  (vst1_u8_x3): Likewise.
>>  (vst1_u16_x3): Likewise.
>>  (vst1_u32_x3): Likewise.
>>  (vst1_f16_x3): Likewise.
>>  (vst1_f32_x3): Likewise.
>>  (vst1_p64_x3): Likewise.
>>  (vst1q_s8_x3): Likewise.
>>  (vst1q_p8_x3): Likewise.
>>  (vst1q_s16_x3): Likewise.

Re: [PATCH, GCC/ARM] Fix PR85261: ICE with FPSCR setter builtin

2018-04-11 Thread Kyrill Tkachov

Hi Thomas,

On 09/04/18 15:29, Thomas Preudhomme wrote:

Hi Ramana,

On 06/04/18 17:17, Thomas Preudhomme wrote:
>
>
> On 06/04/18 17:08, Ramana Radhakrishnan wrote:
>> On 06/04/2018 16:54, Thomas Preudhomme wrote:
>>> Instruction pattern for setting the FPSCR expects the input value to be
>>> in a register. However, __builtin_arm_set_fpscr expander does not ensure
>>> that this is the case and as a result GCC ICEs when the builtin is
>>> called with a constant literal.
>>>
>>> This commit fixes the builtin to force the input value into a register.
>>> It also removes the unneeded volatile in the existing fpscr test and
>>> fixes the function prototype.
>>>
>>> ChangeLog entries are as follows:
>>>
>>> *** gcc/ChangeLog ***
>>>
>>> 2018-04-06  Thomas Preud'homme 
>>>
>>> PR target/85261
>>> * config/arm/arm-builtins.c (arm_expand_builtin): Force input operand
>>> into register.
>>>
>>> *** gcc/testsuite/ChangeLog ***
>>>
>>> 2018-04-06  Thomas Preud'homme 
>>>
>>> PR target/85261
>>> * gcc.target/arm/fpscr.c: Add call to __builtin_arm_set_fpscr with
>>> literal value.  Expect 2 MCR instruction. Fix function prototype.
>>> Remove volatile keyword.
>>>
>>> Testing: Built an arm-none-eabi GCC cross-compiler and testsuite shows
>>> no regression.
>>>
>>> Is this ok for stage4?
>>>
>>> Best regards,
>>>
>>> Thomas
>>>
>>
>> (sorry about the duplicate for those who get it)
>>
>>
>> LGTM, though in this case I would prefer a bootstrap and regression run
>> as this is automatically exercised most with gcc.dg/atomic_*.c and you
>> really need this tested on linux rather than just bare-metal as I'm not sure
>> how this gets tested on arm-none-eabi.
>
> Oh it is indeed. Didn't realize it was used anywhere. Will start bootstrap
> right away.

Done with --with-arch=armv8-a --with-mode=thumb --with-fpu=neon-vfpv4
--with-float=hard --enable-languages=c,c++,fortran --with-system-zlib
--enable-plugins --enable-bootstrap. Testsuite for that GCC does not show any
regression either.

Ok to commit?



Thanks for doing this.
This is ok for trunk.


>
>>
>> What about earlier branches, have you looked ? This is a silly target
>> bug and fixes should go back to older branches in this particular case
>> after baking this on trunk for some time.
>
> GCC 6 and 7 are affected as well and a backport will be done once it has baked
> long enough of course.

Will now bootstrap and regtest against GCC 6 and 7. Will let you know once that
is finished.


Thanks,
Kyrill



Best regards,

Thomas




[og7] backported "[nvptx, PR84041] Add memory_barrier insn"

2018-04-11 Thread Tom de Vries

On 04/09/2018 03:19 PM, Tom de Vries wrote:

Hi,

we've been having hanging OpenMP tests for nvptx offloading: 
for-{3,5,6}.c and the corresponding C++ test-cases.


The failures have now been analyzed down to gomp_ptrlock_get in 
libgomp/config/nvptx/ptrlock.h:

...
  static inline void *gomp_ptrlock_get (gomp_ptrlock_t *ptrlock)
{
   uintptr_t v = (uintptr_t) __atomic_load_n (ptrlock, MEMMODEL_ACQUIRE);
   if (v > 2)
     return (void *) v;

   if (v == 0
   && __atomic_compare_exchange_n (ptrlock, &v, 1, false,
   MEMMODEL_ACQUIRE,
   MEMMODEL_ACQUIRE))
     return NULL;

   while (v == 1)
     v = (uintptr_t) __atomic_load_n (ptrlock, MEMMODEL_ACQUIRE);

   return (void *) v;
}
...

There's no atomic load insn defined for nvptx, and also no memory 
barrier insn, so the atomic load ends up generating a normal load. The 
JIT compiler does loop-invariant code motion, and moves the load out of 
the loop, which turns the while into an eternal loop.
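
To make the failure mode concrete, here is a host-side sketch of the same ptrlock protocol using the GCC __atomic builtins. The ptrlock_t type and the ptrlock_set helper are stand-ins invented for this illustration and are not the libgomp definitions; the sketch only shows why the load inside the while loop has to be re-executed on every iteration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for gomp_ptrlock_t.  0 means empty, 1 means
   "being initialized", any other value is the published pointer.  */
typedef uintptr_t ptrlock_t;

static void *
ptrlock_get (ptrlock_t *lock)
{
  uintptr_t v = __atomic_load_n (lock, __ATOMIC_ACQUIRE);
  if (v > 2)
    return (void *) v;

  /* First caller claims the slot and is expected to publish a value.  */
  if (v == 0
      && __atomic_compare_exchange_n (lock, &v, 1, false,
				      __ATOMIC_ACQUIRE, __ATOMIC_ACQUIRE))
    return NULL;

  /* Everyone else spins.  If this load is treated as loop-invariant
     and hoisted, V never changes and the loop never terminates.  */
  while (v == 1)
    v = __atomic_load_n (lock, __ATOMIC_ACQUIRE);

  return (void *) v;
}

/* Publish PTR and release the waiters (illustrative helper).  */
static void
ptrlock_set (ptrlock_t *lock, void *ptr)
{
  __atomic_store_n (lock, (uintptr_t) ptr, __ATOMIC_RELEASE);
}
```

On a target with a real atomic load or memory barrier the compiler may not move the load out of the loop, which is the property the nvptx port was missing.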



Fix conservatively by defining the memory_barrier insn. This can 
possibly be fixed more optimally by implementing an atomic load 
operation in nvptx.


Build x86_64 with nvptx accelerator and reg-tested libgomp.

Committed to stage4 trunk.



And back-ported to og7 branch.

Thanks,
- Tom


0001-nvptx-Add-memory_barrier-insn.patch


[nvptx] Add memory_barrier insn

2018-04-09  Tom de Vries  

PR target/84041
* config/nvptx/nvptx.md (define_c_enum "unspecv"): Add UNSPECV_MEMBAR.
(define_expand "memory_barrier"): New define_expand.
(define_insn "*memory_barrier"): New insn.

---
  gcc/config/nvptx/nvptx.md | 22 ++
  1 file changed, 22 insertions(+)

diff --git a/gcc/config/nvptx/nvptx.md b/gcc/config/nvptx/nvptx.md
index 4f4453d..68bba36 100644
--- a/gcc/config/nvptx/nvptx.md
+++ b/gcc/config/nvptx/nvptx.md
@@ -55,6 +55,7 @@
 UNSPECV_CAS
 UNSPECV_XCHG
 UNSPECV_BARSYNC
+   UNSPECV_MEMBAR
 UNSPECV_DIM_POS
  
 UNSPECV_FORK

@@ -1459,6 +1460,27 @@
"\\tbar.sync\\t%0;"
[(set_attr "predicable" "false")])
  
+(define_expand "memory_barrier"

+  [(set (match_dup 0)
+   (unspec_volatile:BLK [(match_dup 0)] UNSPECV_MEMBAR))]
+  ""
+{
+  operands[0] = gen_rtx_MEM (BLKmode, gen_rtx_SCRATCH (Pmode));
+  MEM_VOLATILE_P (operands[0]) = 1;
+})
+
+;; Ptx defines the memory barriers membar.cta, membar.gl and membar.sys
+;; (corresponding to cuda functions threadfence_block, threadfence and
+;; threadfence_system).  For the insn memory_barrier we use membar.sys.  This
+;; may be overconservative, but before using membar.gl instead we'll need to
+;; explain in detail why it's safe to use.  For now, use membar.sys.
+(define_insn "*memory_barrier"
+  [(set (match_operand:BLK 0 "" "")
+   (unspec_volatile:BLK [(match_dup 0)] UNSPECV_MEMBAR))]
+  ""
+  "\\tmembar.sys;"
+  [(set_attr "predicable" "false")])
+
  (define_insn "nvptx_nounroll"
[(unspec_volatile [(const_int 0)] UNSPECV_NOUNROLL)]
""





Re: [nvptx] propagating conditionals in worker-vector partitioned loops

2018-04-11 Thread Tom de Vries

On 10/27/2016 12:29 AM, Cesar Philippidis wrote:

Currently, the nvptx backend is only neutering the worker axis when
propagating variables used in conditional expressions across the worker
and vector axes. That's a problem with the worker-state spill and fill
propagation implementation because all of the vector threads in worker 0
write to the same address location being spilled. As the attached
test case demonstrates, this might cause an infinite loop depending on
the values in the vector threads being propagated.

This patch fixes this issue by introducing a new worker-vector
predicate, so that both the worker and vector threads can be predicated
together, not separately. I.e., instead of first neutering worker axis,
then neutering the vector axis, this patch uses a single predicate for
tid.x == 0 && tid.y == 0.

Is this patch ok for trunk?


Hi Cesar,

Please, when encountering a bug on trunk or release branch always file a PR.

I accidentally found this bug recently, filed it as PR85204 - "[nvptx] 
infinite loop generated", and then fixed it here: 
https://gcc.gnu.org/ml/gcc-patches/2018-04/msg00232.html .


The patch you propose is not correct because it introduces a diverging 
branch marked with .uni.


Thanks,
- Tom


Re: [PATCH] Improve IPA-CP handling of self-recursive calls

2018-04-11 Thread Jan Hubicka


2018-04-08  Martin Jambor  

PR ipa/84149
* ipa-cp.c (propagate_vals_across_pass_through): Expand comment.
(cgraph_edge_brings_value_p): New parameter dest_val, check if it is
not the same as the source val.
(cgraph_edge_brings_value_p): New parameter.
(gather_edges_for_value): Pass destination value to
cgraph_edge_brings_value_p.
(perhaps_add_new_callers): Likewise.
(get_info_about_necessary_edges): Likewise and exclude values brought
only by self-recursive edges.
(create_specialized_node): Redirect only clones of self-calling edges.
(self_recursive_pass_through_p): New function.
(find_more_scalar_values_for_callers_subset): Use it.
(find_aggregate_values_for_callers_subset): Likewise.
(known_aggs_to_agg_replacement_list): Removed.
(decide_whether_version_node): Re-calculate known constants for all
remaining context clones.


OK.
thanks!
Honza



[PATCH] Clean up attribute value comparison in lto-symtab.c.

2018-04-11 Thread Martin Liška
Hi.

This is a small clean-up which Jakub suggested.

Patch can bootstrap on x86_64-linux-gnu and survives regression tests.

Ready to be installed?
Martin


gcc/lto/ChangeLog:

2018-04-11  Martin Liska  

* lto-symtab.c (lto_symtab_merge_p): Use attribute_value_equal
function.
---
 gcc/lto/lto-symtab.c | 8 ++--
 1 file changed, 2 insertions(+), 6 deletions(-)


diff --git a/gcc/lto/lto-symtab.c b/gcc/lto/lto-symtab.c
index 37c4f45eb0b..2660542300e 100644
--- a/gcc/lto/lto-symtab.c
+++ b/gcc/lto/lto-symtab.c
@@ -580,9 +580,7 @@ lto_symtab_merge_p (tree prevailing, tree decl)
   tree prev_attr = lookup_attribute ("error", DECL_ATTRIBUTES (prevailing));
   tree attr = lookup_attribute ("error", DECL_ATTRIBUTES (decl));
   if ((prev_attr == NULL) != (attr == NULL)
-	  || (prev_attr
-	  && TREE_VALUE (TREE_VALUE (prev_attr))
-		 != TREE_VALUE (TREE_VALUE (attr))))
+	  || (prev_attr && !attribute_value_equal (prev_attr, attr)))
 	{
   if (symtab->dump_file)
 	fprintf (symtab->dump_file, "Not merging decls; "
@@ -593,9 +591,7 @@ lto_symtab_merge_p (tree prevailing, tree decl)
   prev_attr = lookup_attribute ("warning", DECL_ATTRIBUTES (prevailing));
   attr = lookup_attribute ("warning", DECL_ATTRIBUTES (decl));
   if ((prev_attr == NULL) != (attr == NULL)
-	  || (prev_attr
-	  && TREE_VALUE (TREE_VALUE (prev_attr))
-		 != TREE_VALUE (TREE_VALUE (attr))))
+	  || (prev_attr && !attribute_value_equal (prev_attr, attr)))
 	{
   if (symtab->dump_file)
 	fprintf (symtab->dump_file, "Not merging decls; "



Re: [PATCH] Clean up attribute value comparison in lto-symtab.c.

2018-04-11 Thread Jakub Jelinek
On Wed, Apr 11, 2018 at 11:26:26AM +0200, Martin Liška wrote:
> This is a small clean-up which Jakub suggested.
> 
> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
> 
> Ready to be installed?
> Martin
> 
> 
> gcc/lto/ChangeLog:
> 
> 2018-04-11  Martin Liska  
> 
>   * lto-symtab.c (lto_symtab_merge_p): Use attribute_value_equal
>   function.

Ok, thanks.

Jakub


Re: [PATCH] Fix __atomic to not implement atomic loads with CAS.

2018-04-11 Thread Tom de Vries

On 01/30/2017 07:54 PM, Torvald Riegel wrote:

This patch fixes the __atomic builtins to not implement supposedly
lock-free atomic loads based on just a compare-and-swap operation.


Hi,

The internals doc still lists CAS ( 
https://gcc.gnu.org/onlinedocs/gccint/Standard-Names.html#index-atomic_005floadmode-instruction-pattern 
):

...
‘atomic_loadmode’

This pattern implements an atomic load operation with memory model 
semantics. Operand 1 is the memory address being loaded from. Operand 0 
is the result of the load. Operand 2 is the memory model to be used for 
the load operation.


If not present, the __atomic_load built-in function will either 
resort to a normal load with memory barriers, or a compare-and-swap 
operation if a normal load would not be atomic.

...

Thanks,
- Tom


Re: [PATCH PR85190]Adjust pointer for aligned access

2018-04-11 Thread Richard Biener
On Tue, Apr 10, 2018 at 6:28 PM, Bin.Cheng  wrote:
> On Tue, Apr 10, 2018 at 3:58 PM, Bin.Cheng  wrote:
>> On Tue, Apr 10, 2018 at 2:26 PM, Jakub Jelinek  wrote:
>>> On Tue, Apr 10, 2018 at 09:55:35AM +, Bin Cheng wrote:
 Hi Rainer, could you please help me double check that this solves the 
 issue?

 Thanks,
 bin

 gcc/testsuite
 2018-04-10  Bin Cheng  

   PR testsuite/85190
   * gcc.dg/vect/pr81196.c: Adjust pointer for aligned access.
>>>
 diff --git a/gcc/testsuite/gcc.dg/vect/pr81196.c 
 b/gcc/testsuite/gcc.dg/vect/pr81196.c
 index 46d7a9e..15320ae 100644
 --- a/gcc/testsuite/gcc.dg/vect/pr81196.c
 +++ b/gcc/testsuite/gcc.dg/vect/pr81196.c
 @@ -4,14 +4,14 @@

  void f(short*p){
p=(short*)__builtin_assume_aligned(p,64);
 -  short*q=p+256;
 +  short*q=p+255;
for(;p!=q;++p,--q){
  short t=*p;*p=*q;*q=t;
>>>
>>> This is UB then though, because p will never be equal to q.
>
> Hmm, though it's UB in this case, is it OK for niter analysis to give
> the results below?
>
> Analyzing # of iterations of loop 1
>   exit condition [126, + , 18446744073709551615] != 0
>   bounds on difference of bases: -126 ... -126
>   result:
> # of iterations 126, bounded by 126
>
> I don't really follow the last piece of code in number_of_iterations_ne:
>
>   /* Let nsd (step, size of mode) = d.  If d does not divide c, the loop
>  is infinite.  Otherwise, the number of iterations is
>  (inverse(s/d) * (c/d)) mod (size of mode/d).  */
>   bits = num_ending_zeros (s);
>   bound = build_low_bits_mask (niter_type,
>(TYPE_PRECISION (niter_type)
> - tree_to_uhwi (bits)));
>
>   d = fold_binary_to_constant (LSHIFT_EXPR, niter_type,
>build_int_cst (niter_type, 1), bits);
>   s = fold_binary_to_constant (RSHIFT_EXPR, niter_type, s, bits);
>
>   if (!exit_must_be_taken)
> {
>   /* If we cannot assume that the exit is taken eventually, record the
>  assumptions for divisibility of c.  */
>   assumption = fold_build2 (FLOOR_MOD_EXPR, niter_type, c, d);
>   assumption = fold_build2 (EQ_EXPR, boolean_type_node,
> assumption, build_int_cst (niter_type, 0));
>   if (!integer_nonzerop (assumption))
> niter->assumptions = fold_build2 (TRUTH_AND_EXPR, boolean_type_node,
>   niter->assumptions, assumption);
> }
>
>   c = fold_build2 (EXACT_DIV_EXPR, niter_type, c, d);
>   tmp = fold_build2 (MULT_EXPR, niter_type, c, inverse (s, bound));
>   niter->niter = fold_build2 (BIT_AND_EXPR, niter_type, tmp, bound);
>   return true;
>
> Though infinite niters is mentioned, I don't see where it's handled?

number_of_iterations_ne_max seems to compute this based on the
fact that pointer overflow is undefined.  This means that 126 is
as good as any other number, given that the testcase is undefined...
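
As a worked illustration of the formula in the quoted comment, here is a standalone sketch for a 64-bit type. It is not the GCC implementation: the function names are invented, trees are replaced by plain uint64_t, and the "infinite" case is reported by returning false instead of recording an assumption.

```c
#include <stdbool.h>
#include <stdint.h>

/* Multiplicative inverse of odd A modulo 2^64 by Newton iteration.  */
static uint64_t
inverse_mod_2_64 (uint64_t a)
{
  uint64_t x = a;		/* a * a == 1 (mod 8): 3 correct bits.  */
  for (int i = 0; i < 5; i++)
    x *= 2 - a * x;		/* Each step doubles the correct bits.  */
  return x;
}

/* Find the smallest N with S * N == C (mod 2^64), following
   (inverse (s/d) * (c/d)) mod (size of mode / d), where d is the
   largest power of two dividing S.  Returns false when no such N
   exists, i.e. the loop would be infinite.  S == 0 is treated as
   unsupported in this sketch.  */
static bool
niter_ne (uint64_t c, uint64_t s, uint64_t *niter)
{
  if (s == 0)
    return false;

  unsigned bits = (unsigned) __builtin_ctzll (s);
  uint64_t d = UINT64_C (1) << bits;
  if (c % d != 0)
    return false;		/* d does not divide c.  */

  /* Mask for the low (64 - bits) bits, i.e. "size of mode / d".  */
  uint64_t bound = bits == 0 ? UINT64_MAX
			     : (UINT64_C (1) << (64 - bits)) - 1;
  *niter = ((c / d) * inverse_mod_2_64 (s >> bits)) & bound;
  return true;
}
```

With c = 126 and s = 1 this gives 126 iterations, matching the dump above; with c = 6 and s = 4 no iteration count exists, which is the divisibility assumption the quoted code records.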

Richard.

> Thanks,
> bin
>> Sorry I already checked in, will try to correct it in another patch.
>>
>> Thanks,
>> bin
>>>
}
  }
  void b(short*p){
p=(short*)__builtin_assume_aligned(p,64);
 -  short*q=p+256;
 +  short*q=p+255;
for(;p<q;++p,--q){
  short t=*p;*p=*q;*q=t;
>>>
>>> This one is fine, sure.
>>>
>>> Jakub


Re: [PATCH, GCC/ARM] Fix PR85203: cmse_nonsecure_caller returns wrong result

2018-04-11 Thread Thomas Preudhomme

Hi Kyrill,

One week went by so I've committed the change to GCC 7 as announced.

Best regards,

Thomas

On 05/04/18 16:36, Kyrill Tkachov wrote:


On 05/04/18 16:13, Thomas Preudhomme wrote:

Hi Kyrill,

On 04/04/18 18:20, Thomas Preudhomme wrote:

Hi Kyrill,

On 04/04/18 18:19, Kyrill Tkachov wrote:

Hi Thomas,

On 04/04/18 18:03, Thomas Preudhomme wrote:

Hi,

__builtin_cmse_nonsecure_caller implementation returns true in almost
all cases due to 2 separate bugs:

* gen_addsi is used instead of gen_andsi to retrieve the lsb
* the lsb boolean value is not negated but the specification [1] says
   the intrinsic should return true for a nonsecure caller and a
   nonsecure caller is characterized with LR's lsb being 0
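
A small C model of the two bugs may help here. It is only a sketch of the arithmetic, not the RTL expansion: the function names are invented and LR is passed in as a plain integer.

```c
#include <stdbool.h>
#include <stdint.h>

/* Model of the intended behavior: isolate the lsb of LR with a
   bitwise AND, then negate, so lsb == 0 means nonsecure caller.  */
static bool
nonsecure_caller_fixed (uint32_t lr)
{
  return !(lr & 1u);
}

/* Model of the old behavior: an addition instead of an AND and no
   negation.  The result is nonzero for every LR except 0xffffffff,
   which is why the builtin returned true in almost all cases.  */
static bool
nonsecure_caller_buggy (uint32_t lr)
{
  return (uint32_t) (lr + 1u) != 0;
}
```

The two models disagree for any LR value with the lsb set, i.e. for a secure caller.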

This was not caught due to (1) lack of runtime test and (2) the existing
RTL scan not taking into account that '.' matches newline in Tcl regular
expressions.

This patch fixes the implementation issues and improves testing of
cmse_nonsecure_caller by (1) adding a runtime test for the secure caller
case and (2) looking for a SET insn of an AND expression in the right
function. This leaves the nonsecure caller case only partly tested
since the exact value being ANDed and the negation are not covered by the
scan and the existing test infrastructure does not allow 2 separate
compilations and links to be performed. It is enough though to catch the
current incorrect behavior.

The patch also reorganizes the scan directives in cmse-1.c to more easily
identify what function they are intended to test in the file.

ChangeLog entry is as follows:

*** gcc/ChangeLog ***

2018-04-04  Thomas Preud'homme 

    PR target/85203
    * config/arm/arm-builtins.c (arm_expand_builtin): Change
    expansion to perform a bitwise AND of the argument followed by a
    boolean negation of the result.

*** gcc/testsuite/ChangeLog ***

2018-04-04  Thomas Preud'homme 

    PR target/85203
    * gcc.target/arm/cmse/cmse-1.c: Tighten cmse_nonsecure_caller RTL scan
    to match a single insn of the baz function.  Move scan directives at
    the end of the file below the functions they are trying to test for
    better readability.
    * gcc.target/arm/cmse/cmse-16.c: New testcase.

Testing: No bootstrap since only M profile builtin code has been changed
but regression testing for arm-none-eabi targeting Arm Cortex-M23 and
Cortex-M33 shows no regression.

Is this ok for stage4?



Ok, thanks for fixing this.
Does this need backporting to the branches?


Yes to gcc-7-branch only.


The patch applies cleanly on gcc-7-branch and the same testing shows no 
regression. Ok to apply to gcc-7-branch once the patch has baked for 7 days in 
trunk?



Yes, thanks.
Kyrill


Best regards,

Thomas




Re: [PATCH] Fix -gsplit-dwarf ICE (PR debug/85302)

2018-04-11 Thread Richard Biener
On Tue, 10 Apr 2018, Jakub Jelinek wrote:

> Hi!
> 
> The r257510 change broke -gsplit-dwarf support by introducing a circular
> dependency.  Before that revision index_location_lists used to do:
> /* Don't index an entry that has already been indexed
>or won't be output.  */
> if (curr->begin_entry != NULL
> || (strcmp (curr->begin, curr->end) == 0 && !curr->force))
>   continue;
> r257510 introduced a function which does that
> (strcmp (curr->begin, curr->end) == 0 && !curr->force)
> part extended for LVUs, but also calls size_of_locs.  In dwarf4 and earlier
> we really can't emit location expressions >= 64KB in size, so we just
> ignored those during output and the helper function used by the output and
> now this spot uses that too.  The problem is that size_of_locs needs
> (in some cases) the split dwarf address table indexes to be computed (the
> indexes are uleb128s and thus need to be sized accurately), but at the point
> index_location_lists is called, those aren't computed and can't easily be,
> because that very function adds new address table entries and the current
> logic requires that once indexes are assigned the hash table is immutable,
> during output we assert we go through the same indexes as were assigned.
> In theory we could introduce another hash table to hold the new table
> entries that didn't have indexes assigned yet, but given that >= 64KB locs
> are extremely rare, I think it is just wasted effort.
> 
> This patch just makes sure we create address table entry without computing
> the size_of_locs and so rarely could add an entry nothing will really use;
> it is just an address though, so not a big deal IMHO.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK.

Richard.

> 2018-04-10  Jakub Jelinek  
> 
>   PR debug/85302
>   * dwarf2out.c (skip_loc_list_entry): Don't call size_of_locs if
>   SIZEP is NULL.
>   (output_loc_list): Pass address of a dummy size variable even in the
>   locview handling loop.
>   (index_location_lists): Add comment on why skip_loc_list_entry can't
>   call size_of_locs.
> 
>   * g++.dg/debug/dwarf2/pr85302.C: New test.
> 
> --- gcc/dwarf2out.c.jj2018-04-06 19:28:24.0 +0200
> +++ gcc/dwarf2out.c   2018-04-10 10:21:21.352384405 +0200
> @@ -10032,18 +10032,22 @@ maybe_gen_llsym (dw_loc_list_ref list)
>gen_llsym (list);
>  }
>  
> -/* Determine whether or not to skip loc_list entry CURR.  If we're not
> +/* Determine whether or not to skip loc_list entry CURR.  If SIZEP is
> +   NULL, don't consider size of the location expression.  If we're not
> to skip it, and SIZEP is non-null, store the size of CURR->expr's
> representation in *SIZEP.  */
>  
>  static bool
> -skip_loc_list_entry (dw_loc_list_ref curr, unsigned long *sizep = 0)
> +skip_loc_list_entry (dw_loc_list_ref curr, unsigned long *sizep = NULL)
>  {
>/* Don't output an entry that starts and ends at the same address.  */
>if (strcmp (curr->begin, curr->end) == 0
>&& curr->vbegin == curr->vend && !curr->force)
>  return true;
>  
> +  if (!sizep)
> +return false;
> +
>unsigned long size = size_of_locs (curr->expr);
>  
>/* If the expression is too large, drop it on the floor.  We could
> @@ -10053,8 +10057,7 @@ skip_loc_list_entry (dw_loc_list_ref cur
>if (dwarf_version < 5 && size > 0x)
>  return true;
>  
> -  if (sizep)
> -*sizep = size;
> +  *sizep = size;
>  
>return false;
>  }
> @@ -10121,7 +10124,9 @@ output_loc_list (dw_loc_list_ref list_he
>for (dw_loc_list_ref curr = list_head; curr != NULL;
>  curr = curr->dw_loc_next)
>   {
> -   if (skip_loc_list_entry (curr))
> +   unsigned long size;
> +
> +   if (skip_loc_list_entry (curr, &size))
>   continue;
>  
> vcount++;
> @@ -30887,7 +30892,14 @@ index_location_lists (dw_die_ref die)
>  for (curr = list; curr != NULL; curr = curr->dw_loc_next)
>{
>  /* Don't index an entry that has already been indexed
> -   or won't be output.  */
> +or won't be output.  Make sure skip_loc_list_entry doesn't
> +call size_of_locs, because that might cause circular dependency,
> +index_location_lists requiring address table indexes to be
> +computed, but adding new indexes through add_addr_table_entry
> +and address table index computation requiring no new additions
> +to the hash table.  In the rare case of DWARF[234] >= 64KB
> +location expression, we'll just waste unused address table entry
> +for it.  */
>  if (curr->begin_entry != NULL
>  || skip_loc_list_entry (curr))
>continue;
> --- gcc/testsuite/g++.dg/debug/dwarf2/pr85302.C.jj2018-04-10 
> 10:29:22.994385218 +0200
> +++ gcc/testsuite/g++.dg/debug/dwarf2/pr8530

Re: [PATCH] sched-rgn: run add_branch_dependencies for sel-sched (PR 84301)

2018-04-11 Thread Andrey Belevantsev
On 10.04.2018 14:09, Alexander Monakov wrote:
> Hi,
> 
> The add_branch_dependences function is fairly unusual in that it creates
> dependence edges "out of thin air" for all sorts of instructions preceding
> BB end. I think that is really unfortunate (explicit barriers in RTL would
> be more natural), but I've already complained about that in the PR.
> 
> The bug/regression is that this function was not run for sel-sched, but the
> testcase uncovers that moving a USE away from the return insn can break
> assumptions in mode-switching code.
> 
> Solve this by running the first part of add_branch_dependences where it
> sets CANT_MOVE flags on immovable non-branch insns.
> 
> Bootstrapped/regtested on x86_64 with sel-sched active. OK to apply?

Looks fine to me but I cannot approve -- maybe Vladimir can take a look?

Andrey

> 
> Alexander
> 
> 
>   PR target/84301
>   * sched-rgn.c (add_branch_dependences): Move sel_sched_p check here...
>   (compute_block_dependences): ... from here.
> 
>   * gcc.target/i386/pr84301.c: New test.
> 
> diff --git a/gcc/sched-rgn.c b/gcc/sched-rgn.c
> index 8c3a740b70e..3c67fccb9b1 100644
> --- a/gcc/sched-rgn.c
> +++ b/gcc/sched-rgn.c
> @@ -2497,6 +2497,11 @@ add_branch_dependences (rtx_insn *head, rtx_insn *tail)
>while (insn != head && DEBUG_INSN_P (insn));
>  }
>  
> +  /* Selective scheduling handles control dependencies by itself, and
> + CANT_MOVE flags ensure that other insns will be kept in place.  */
> +  if (sel_sched_p ())
> +return;
> +
>/* Make sure these insns are scheduled last in their block.  */
>insn = last;
>if (insn != 0)
> @@ -2725,9 +2730,7 @@ compute_block_dependences (int bb)
>  
>sched_analyze (&tmp_deps, head, tail);
>  
> -  /* Selective scheduling handles control dependencies by itself.  */
> -  if (!sel_sched_p ())
> -add_branch_dependences (head, tail);
> +  add_branch_dependences (head, tail);
>  
>if (current_nr_blocks > 1)
>  propagate_deps (bb, &tmp_deps);
> diff --git a/gcc/testsuite/gcc.target/i386/pr84301.c 
> b/gcc/testsuite/gcc.target/i386/pr84301.c
> index e69de29bb2d..f1708b8ea6c 100644
> --- a/gcc/testsuite/gcc.target/i386/pr84301.c
> +++ b/gcc/testsuite/gcc.target/i386/pr84301.c
> @@ -0,0 +1,15 @@
> +/* PR target/84301 */
> +/* { dg-do compile } */
> +/* { dg-options "-march=bdver1 -O1 -fexpensive-optimizations 
> -fschedule-insns -fselective-scheduling -fno-dce -fno-tree-dce --param 
> max-pending-list-length=0 --param selsched-max-lookahead=2" } */
> +
> +int lr;
> +long int xl;
> +
> +int
> +v4 (void)
> +{
> +  int mp;
> +
> +  ++xl;
> +  mp = (lr - xl) > 1;
> +}
> 



Re: [PATCH] sched-deps: respect deps->readonly in macro-fusion (PR 84566)

2018-04-11 Thread Andrey Belevantsev
On 10.04.2018 13:40, Alexander Monakov wrote:
> Hi,
> 
> this fixes a simple "regression" under the qsort_chk umbrella: sched-deps
> analysis has a deps->readonly flag, but the macro-fusion code does not respect
> it and mutates instructions. This breaks an assumption in sel_rank_for_schedule
> and manifests as a qsort checking error.
> 
> Since sched_macro_fuse_insns is only called to set SCHED_GROUP_P on suitable
> insns, guard the call with !deps->readonly.
> 
> Bootstrapped/regtested on x86_64 with sel-sched active and
> --with-cpu=sandybridge to exercise macro-fusion code and verified on aarch64
> cross-compiler that the failing testcase given in the PR is fixed.
> 
> OK to apply?

Fine with me, but you need a scheduler maintainer's approval.

Andrey

> 
> Thanks.
> Alexander
> 
>   PR rtl-optimization/84566
>   * sched-deps.c (sched_analyze_insn): Check deps->readonly when invoking
>   sched_macro_fuse_insns.
> 
> diff --git a/gcc/sched-deps.c b/gcc/sched-deps.c
> index 9a5cbebea40..120b5f0ddc1 100644
> --- a/gcc/sched-deps.c
> +++ b/gcc/sched-deps.c
> @@ -2897,7 +2897,8 @@ sched_analyze_insn (struct deps_desc *deps, rtx x, 
> rtx_insn *insn)
>&& code == SET);
>  
>/* Group compare and branch insns for macro-fusion.  */
> -  if (targetm.sched.macro_fusion_p
> +  if (!deps->readonly
> +  && targetm.sched.macro_fusion_p
>&& targetm.sched.macro_fusion_p ())
>  sched_macro_fuse_insns (insn);
>  
> 



Re: [AARCH64] Neon vld1_*_x3, vst1_*_x2 and vst1_*_x3 intrinsics

2018-04-11 Thread Sudakshina Das

Hi Sameera

On 11/04/18 09:04, Sameera Deshpande wrote:

On 10 April 2018 at 20:07, Sudakshina Das  wrote:

Hi Sameera


On 10/04/18 11:20, Sameera Deshpande wrote:


On 7 April 2018 at 01:25, Christophe Lyon 
wrote:


Hi,

2018-04-06 12:15 GMT+02:00 Sameera Deshpande
:


Hi Christophe,

Please find attached the updated patch with testcases.

Ok for trunk?



Thanks for the update.

Since the new intrinsics are only available on aarch64, you want to
prevent the tests from running on arm.
Indeed gcc.target/aarch64/advsimd-intrinsics/ is shared between the two
targets.
There are several examples on how to do that in that directory.

I have also noticed that the tests fail at execution on aarch64_be.

I didn't look at the patch in detail.

Christophe




- Thanks and regards,
Sameera D.

2017-12-14 22:17 GMT+05:30 Christophe Lyon :


2017-12-14 9:29 GMT+01:00 Sameera Deshpande
:


Hi!

Please find attached the patch implementing vld1_*_x3, vst1_*_x2 and
vst1_*_x3 intrinsics as defined by Neon document.

Ok for trunk?

- Thanks and regards,
Sameera D.

gcc/Changelog:

2017-11-14  Sameera Deshpande  


  * config/aarch64/aarch64-simd-builtins.def (ld1x3): New.
  (st1x2): Likewise.
  (st1x3): Likewise.
  * config/aarch64/aarch64-simd.md
(aarch64_ld1x3): New pattern.
  (aarch64_ld1_x3_): Likewise
  (aarch64_st1x2): Likewise
  (aarch64_st1_x2_): Likewise
  (aarch64_st1x3): Likewise
  (aarch64_st1_x3_): Likewise
  * config/aarch64/arm_neon.h (vld1_u8_x3): New function.
  (vld1_s8_x3): Likewise.
  (vld1_u16_x3): Likewise.
  (vld1_s16_x3): Likewise.
  (vld1_u32_x3): Likewise.
  (vld1_s32_x3): Likewise.
  (vld1_u64_x3): Likewise.
  (vld1_s64_x3): Likewise.
  (vld1_fp16_x3): Likewise.
  (vld1_f32_x3): Likewise.
  (vld1_f64_x3): Likewise.
  (vld1_p8_x3): Likewise.
  (vld1_p16_x3): Likewise.
  (vld1_p64_x3): Likewise.
  (vld1q_u8_x3): Likewise.
  (vld1q_s8_x3): Likewise.
  (vld1q_u16_x3): Likewise.
  (vld1q_s16_x3): Likewise.
  (vld1q_u32_x3): Likewise.
  (vld1q_s32_x3): Likewise.
  (vld1q_u64_x3): Likewise.
  (vld1q_s64_x3): Likewise.
  (vld1q_f16_x3): Likewise.
  (vld1q_f32_x3): Likewise.
  (vld1q_f64_x3): Likewise.
  (vld1q_p8_x3): Likewise.
  (vld1q_p16_x3): Likewise.
  (vld1q_p64_x3): Likewise.
  (vst1_s64_x2): Likewise.
  (vst1_u64_x2): Likewise.
  (vst1_f64_x2): Likewise.
  (vst1_s8_x2): Likewise.
  (vst1_p8_x2): Likewise.
  (vst1_s16_x2): Likewise.
  (vst1_p16_x2): Likewise.
  (vst1_s32_x2): Likewise.
  (vst1_u8_x2): Likewise.
  (vst1_u16_x2): Likewise.
  (vst1_u32_x2): Likewise.
  (vst1_f16_x2): Likewise.
  (vst1_f32_x2): Likewise.
  (vst1_p64_x2): Likewise.
  (vst1q_s8_x2): Likewise.
  (vst1q_p8_x2): Likewise.
  (vst1q_s16_x2): Likewise.
  (vst1q_p16_x2): Likewise.
  (vst1q_s32_x2): Likewise.
  (vst1q_s64_x2): Likewise.
  (vst1q_u8_x2): Likewise.
  (vst1q_u16_x2): Likewise.
  (vst1q_u32_x2): Likewise.
  (vst1q_u64_x2): Likewise.
  (vst1q_f16_x2): Likewise.
  (vst1q_f32_x2): Likewise.
  (vst1q_f64_x2): Likewise.
  (vst1q_p64_x2): Likewise.
  (vst1_s64_x3): Likewise.
  (vst1_u64_x3): Likewise.
  (vst1_f64_x3): Likewise.
  (vst1_s8_x3): Likewise.
  (vst1_p8_x3): Likewise.
  (vst1_s16_x3): Likewise.
  (vst1_p16_x3): Likewise.
  (vst1_s32_x3): Likewise.
  (vst1_u8_x3): Likewise.
  (vst1_u16_x3): Likewise.
  (vst1_u32_x3): Likewise.
  (vst1_f16_x3): Likewise.
  (vst1_f32_x3): Likewise.
  (vst1_p64_x3): Likewise.
  (vst1q_s8_x3): Likewise.
  (vst1q_p8_x3): Likewise.
  (vst1q_s16_x3): Likewise.
  (vst1q_p16_x3): Likewise.
  (vst1q_s32_x3): Likewise.
  (vst1q_s64_x3): Likewise.
  (vst1q_u8_x3): Likewise.
  (vst1q_u16_x3): Likewise.
  (vst1q_u32_x3): Likewise.
  (vst1q_u64_x3): Likewise.
  (vst1q_f16_x3): Likewise.
  (vst1q_f32_x3): Likewise.
  (vst1q_f64_x3): Likewise.
  (vst1q_p64_x3): Likewise.



Hi,
I'm not a maintainer, but I suspect you should add some tests.

Christophe





--
- Thanks and regards,
Sameera D.



Hi Christophe,

Please find attached the updated patch. Similar to the testcase
vld1x2.c, I have updated the testcases to mark them XFAIL for ARM, as
the intrinsics are not implemented yet. I

Re: [wwwdocs] Document libstdc++ changes in GCC 8

2018-04-11 Thread Jonathan Wakely
On 11 April 2018 at 07:42, Bernd Edlinger wrote:
>> Let me know if I've forgotten anything we should document.
>
> Not your fault, but -Wclass-memaccess comes rather often and is not in the 
> changes.

And not a libstdc++ change :-)

There are lots of things missing, I'm only adding what I maintain.


[og7] Backport "[nvptx] Fix neutering of bb with only cond jump"

2018-04-11 Thread Tom de Vries
[ was: Re: [gomp4] propagating conditionals in worker-vector partitioned 
loops ]


On 10/28/2016 06:33 PM, Cesar Philippidis wrote:

I've applied the patch to gomp-4_0-branch to correct an issue involving
the propagation of variables used in conditional expressions to worker
and vector partitioned loops. More details regarding this patch can be
found here


I've reverted this patch on og7, and backported the fix for PR85204.

Thanks,
- Tom
Backport "[nvptx] Fix neutering of bb with only cond jump"

2018-04-11  Tom de Vries  

	backport from trunk:
	2018-04-05  Tom de Vries  

	PR target/85204
	* config/nvptx/nvptx.c (nvptx_single): Fix neutering of bb with only
	cond jump.

	revert:
	2016-10-28  Cesar Philippidis  

	* config/nvptx/nvptx.c (nvptx_single): Use a single predicate
	for loops partitioned across both worker and vector axes.

---
 gcc/config/nvptx/nvptx.c | 35 ++-
 1 file changed, 6 insertions(+), 29 deletions(-)

diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
index cd89d17..3c48c14 100644
--- a/gcc/config/nvptx/nvptx.c
+++ b/gcc/config/nvptx/nvptx.c
@@ -4196,38 +4196,12 @@ nvptx_single (unsigned mask, basic_block from, basic_block to)
   /* Insert the vector test inside the worker test.  */
   unsigned mode;
   rtx_insn *before = tail;
-  rtx wvpred = NULL_RTX;
-  bool skip_vector = false;
-
-  /* Create a single predicate for loops containing both worker and
- vectors.  */
-  if (cond_branch
-  && (GOMP_DIM_MASK (GOMP_DIM_WORKER) & mask)
-  && (GOMP_DIM_MASK (GOMP_DIM_VECTOR) & mask))
-{
-  rtx regx = gen_reg_rtx (SImode);
-  rtx regy = gen_reg_rtx (SImode);
-  rtx tmp = gen_reg_rtx (SImode);
-  wvpred = gen_reg_rtx (BImode);
-
-  emit_insn_before (gen_oacc_dim_pos (regx, const1_rtx), head);
-  emit_insn_before (gen_oacc_dim_pos (regy, const2_rtx), head);
-  emit_insn_before (gen_rtx_SET (tmp, gen_rtx_IOR (SImode, regx, regy)),
-			head);
-  emit_insn_before (gen_rtx_SET (wvpred, gen_rtx_NE (BImode, tmp,
-			 const0_rtx)),
-			head);
-
-  skip_mask &= ~(GOMP_DIM_MASK (GOMP_DIM_VECTOR));
-  skip_vector = true;
-}
-
+  rtx_insn *neuter_start = NULL;
   for (mode = GOMP_DIM_WORKER; mode <= GOMP_DIM_VECTOR; mode++)
 if (GOMP_DIM_MASK (mode) & skip_mask)
   {
 	rtx_code_label *label = gen_label_rtx ();
-	rtx pred = skip_vector ? wvpred
-	  : cfun->machine->axis_predicate[mode - GOMP_DIM_WORKER];
+	rtx pred = cfun->machine->axis_predicate[mode - GOMP_DIM_WORKER];
 
 	if (!pred)
 	  {
@@ -4240,7 +4214,10 @@ nvptx_single (unsigned mask, basic_block from, basic_block to)
 	  br = gen_br_true (pred, label);
 	else
 	  br = gen_br_true_uni (pred, label);
-	emit_insn_before (br, head);
+	if (neuter_start)
+	  neuter_start = emit_insn_after (br, neuter_start);
+	else
+	  neuter_start = emit_insn_before (br, head);
 
 	LABEL_NUSES (label)++;
 	if (tail_branch)


[PATCH] Use --push-state --as-needed and --pop-state instead of --as-needed and --no-as-needed for libgcc

2018-04-11 Thread Jakub Jelinek
Hi!

As discussed, using --as-needed and --no-as-needed is dangerous, because
it results in --no-as-needed even for libraries after -lgcc_s, even when the
default is --as-needed or --as-needed has been specified earlier on the
command line.

If the linker supports --push-state/--pop-state, we should IMHO use it.
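
To make the problem concrete, here is a minimal sketch in C of how the
as-needed flag evolves while a linker scans its command line. This is a toy
model, not ld code: as_needed_after and the two *_scheme_* helpers are
hypothetical names, and only the four options discussed here are modelled.

```c
#include <assert.h>
#include <string.h>

/* Toy model only: tracks the as-needed flag the way a linker would while
   scanning its arguments left to right.  All names are hypothetical.  */

#define MAX_DEPTH 8

/* Return the as-needed state in effect after all of ARGV has been
   processed, i.e. the state seen by any library that follows.  */
static int
as_needed_after (int default_as_needed, const char *const *argv, int argc)
{
  int state = default_as_needed;
  int stack[MAX_DEPTH];
  int depth = 0;

  for (int i = 0; i < argc; i++)
    {
      if (strcmp (argv[i], "--as-needed") == 0)
	state = 1;
      else if (strcmp (argv[i], "--no-as-needed") == 0)
	state = 0;
      else if (strcmp (argv[i], "--push-state") == 0)
	{
	  /* Save the current state so it can be restored later.  */
	  if (depth < MAX_DEPTH)
	    stack[depth++] = state;
	}
      else if (strcmp (argv[i], "--pop-state") == 0)
	{
	  /* Restore whatever was in effect before the matching push.  */
	  if (depth > 0)
	    state = stack[--depth];
	}
      /* Anything else (e.g. -lgcc_s) does not change the flag.  */
    }
  return state;
}

/* Old scheme: --as-needed -lgcc_s --no-as-needed.  */
static int
old_scheme_state_after_libgcc (int default_as_needed)
{
  static const char *const args[]
    = { "--as-needed", "-lgcc_s", "--no-as-needed" };
  return as_needed_after (default_as_needed, args,
			  (int) (sizeof args / sizeof args[0]));
}

/* New scheme: --push-state --as-needed -lgcc_s --pop-state.  */
static int
new_scheme_state_after_libgcc (int default_as_needed)
{
  static const char *const args[]
    = { "--push-state", "--as-needed", "-lgcc_s", "--pop-state" };
  return as_needed_after (default_as_needed, args,
			  (int) (sizeof args / sizeof args[0]));
}
```

With as-needed already in effect (argument 1), the old scheme leaves later
libraries in no-as-needed mode, while the push/pop variant hands back
whatever state was active before -lgcc_s.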

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for stage1?

Or is this something we want in GCC8 too?

2018-04-11  Jakub Jelinek  

* configure.ac (LD_AS_NEEDED_OPTION, LD_NO_AS_NEEDED_OPTION): Use
--push-state --as-needed and --pop-state instead of --as-needed and
--no-as-needed if ld supports it.
* configure: Regenerated.

--- gcc/configure.ac.jj 2018-03-21 21:18:32.470351282 +0100
+++ gcc/configure.ac 2018-04-10 13:31:25.448060053 +0200
@@ -5517,11 +5517,21 @@ if test $in_tree_ld = yes ; then
   if test "$gcc_cv_gld_major_version" -eq 2 -a "$gcc_cv_gld_minor_version" -ge 
16 -o "$gcc_cv_gld_major_version" -gt 2 \
  && test $in_tree_ld_is_elf = yes; then
 gcc_cv_ld_as_needed=yes
+if test "$gcc_cv_gld_major_version" -eq 2 -a "$gcc_cv_gld_minor_version" 
-ge 28; then
+  gcc_cv_ld_as_needed_option='--push-state --as-needed'
+  gcc_cv_ld_no_as_needed_option='--pop-state'
+fi
   fi
 elif test x$gcc_cv_ld != x; then
   # Check if linker supports --as-needed and --no-as-needed options
   if $gcc_cv_ld --help 2>&1 | grep as-needed > /dev/null; then
 gcc_cv_ld_as_needed=yes
+if $gcc_cv_ld --help 2>&1 | grep push-state > /dev/null; then
+  if $gcc_cv_ld --help 2>&1 | grep pop-state > /dev/null; then
+   gcc_cv_ld_as_needed_option='--push-state --as-needed'
+   gcc_cv_ld_no_as_needed_option='--pop-state'
+  fi
+fi
   fi
   case "$target:$gnu_ld" in
 *-*-solaris2*:no)
--- gcc/configure.jj 2018-03-21 21:18:30.187351579 +0100
+++ gcc/configure   2018-04-10 13:47:57.652298798 +0200
@@ -28733,11 +28733,21 @@ if test $in_tree_ld = yes ; then
   if test "$gcc_cv_gld_major_version" -eq 2 -a "$gcc_cv_gld_minor_version" -ge 
16 -o "$gcc_cv_gld_major_version" -gt 2 \
  && test $in_tree_ld_is_elf = yes; then
 gcc_cv_ld_as_needed=yes
+if test "$gcc_cv_gld_major_version" -eq 2 -a "$gcc_cv_gld_minor_version" 
-ge 28; then
+  gcc_cv_ld_as_needed_option='--push-state --as-needed'
+  gcc_cv_ld_no_as_needed_option='--pop-state'
+fi
   fi
 elif test x$gcc_cv_ld != x; then
   # Check if linker supports --as-needed and --no-as-needed options
   if $gcc_cv_ld --help 2>&1 | grep as-needed > /dev/null; then
 gcc_cv_ld_as_needed=yes
+if $gcc_cv_ld --help 2>&1 | grep push-state > /dev/null; then
+  if $gcc_cv_ld --help 2>&1 | grep pop-state > /dev/null; then
+   gcc_cv_ld_as_needed_option='--push-state --as-needed'
+   gcc_cv_ld_no_as_needed_option='--pop-state'
+  fi
+fi
   fi
   case "$target:$gnu_ld" in
 *-*-solaris2*:no)

Jakub


[PATCH] libgcc/CET: Skip signal frames when unwinding shadow stack

2018-04-11 Thread H.J. Lu
When -fcf-protection -mcet is used, I got

FAIL: g++.dg/eh/sighandle.C

(gdb) bt
 #0  _Unwind_RaiseException (exc=exc@entry=0x416ed0)
at /export/gnu/import/git/sources/gcc/libgcc/unwind.inc:140
 #1  0x77d9936b in __cxxabiv1::__cxa_throw (obj=,
tinfo=0x403dd0 , dest=0x0)
at /export/gnu/import/git/sources/gcc/libstdc++-v3/libsupc++/eh_throw.cc:90
 #2  0x00401255 in sighandler (signo=11, si=0x7fffd6f8,
uc=0x7fffd5c0)
at /export/gnu/import/git/sources/gcc/gcc/testsuite/g++.dg/eh/sighandle.C:9
 #3    Signal frame which isn't on shadow stack
 #4  dosegv ()
at /export/gnu/import/git/sources/gcc/gcc/testsuite/g++.dg/eh/sighandle.C:14
 #5  0x004012e3 in main ()
at /export/gnu/import/git/sources/gcc/gcc/testsuite/g++.dg/eh/sighandle.C:30
(gdb) p frames
$6 = 5
(gdb)

The frame count should be 4, not 5.  This patch skips signal frames when
unwinding the shadow stack.
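
The counting rule can be sketched in isolation as follows. This is only an
illustration under stated assumptions: struct toy_frame and both helpers are
hypothetical stand-ins, not libgcc types, and the five-entry array is a
made-up walk with one signal frame rather than an exact mapping of the
backtrace above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an unwind context; the real unwinder asks
   _Unwind_IsSignalFrame (context) instead of reading a field.  */
struct toy_frame
{
  int is_signal_frame;
};

/* Count the frames that occupy a shadow stack slot.  Signal frames are
   skipped because they are not on the shadow stack.  */
static unsigned long
count_shadow_stack_frames (const struct toy_frame *frames, size_t n)
{
  unsigned long count = 0;
  for (size_t i = 0; i < n; i++)
    if (!frames[i].is_signal_frame)
      count++;
  return count;
}

/* Illustrative walk: five frames, the third of which is a signal frame.  */
static unsigned long
example_frame_count (void)
{
  static const struct toy_frame walked[] = { {0}, {0}, {1}, {0}, {0} };
  return count_shadow_stack_frames (walked,
				    sizeof walked / sizeof walked[0]);
}
```

An unconditional frames++ would report 5 for this walk; skipping the signal
frame gives 4.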

Tested on i686 and x86-64.  OK for trunk?

H.J.

PR libgcc/85334
* unwind-generic.h (_Unwind_Frames_Increment): New.
* config/i386/shadow-stack-unwind.h (_Unwind_Frames_Increment):
Likewise.
* unwind.inc (_Unwind_RaiseException_Phase2): Increment frame
count with _Unwind_Frames_Increment.
(_Unwind_ForcedUnwind_Phase2): Likewise.
---
 libgcc/config/i386/shadow-stack-unwind.h | 5 +
 libgcc/unwind-generic.h  | 3 +++
 libgcc/unwind.inc| 6 --
 3 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/libgcc/config/i386/shadow-stack-unwind.h 
b/libgcc/config/i386/shadow-stack-unwind.h
index 40f48df2aec..a32f3e74b52 100644
--- a/libgcc/config/i386/shadow-stack-unwind.h
+++ b/libgcc/config/i386/shadow-stack-unwind.h
@@ -49,3 +49,8 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If 
not, see
}   \
 }  \
 while (0)
+
+/* Increment frame count.  Skip signal frames.  */
+#undef _Unwind_Frames_Increment
+#define _Unwind_Frames_Increment(context, frames) \
+  if (!_Unwind_IsSignalFrame (context)) frames++
diff --git a/libgcc/unwind-generic.h b/libgcc/unwind-generic.h
index b5e3568e1bc..639c96f438e 100644
--- a/libgcc/unwind-generic.h
+++ b/libgcc/unwind-generic.h
@@ -291,4 +291,7 @@ EXCEPTION_DISPOSITION _GCC_specific_handler 
(PEXCEPTION_RECORD, void *,
 /* Additional actions to unwind number of stack frames.  */
 #define _Unwind_Frames_Extra(frames)
 
+/* Increment frame count.  */
+#define _Unwind_Frames_Increment(context, frames) frames++
+
 #endif /* unwind.h */
diff --git a/libgcc/unwind.inc b/libgcc/unwind.inc
index 68c08964d30..b49f8797009 100644
--- a/libgcc/unwind.inc
+++ b/libgcc/unwind.inc
@@ -72,8 +72,9 @@ _Unwind_RaiseException_Phase2(struct _Unwind_Exception *exc,
   /* Don't let us unwind past the handler context.  */
   gcc_assert (!match_handler);
 
+  _Unwind_Frames_Increment (context, frames);
+
   uw_update_context (context, &fs);
-  frames++;
 }
 
   *frames_p = frames;
@@ -187,10 +188,11 @@ _Unwind_ForcedUnwind_Phase2 (struct _Unwind_Exception 
*exc,
return _URC_FATAL_PHASE2_ERROR;
}
 
+  _Unwind_Frames_Increment (context, frames);
+
   /* Update cur_context to describe the same frame as fs, and discard
 the previous context if necessary.  */
   uw_advance_context (context, &fs);
-  frames++;
 }
 
   *frames_p = frames;
-- 
2.14.3



Re: [PATCH] Fix some broadcasts in -masm=intel mode (PR target/85281)

2018-04-11 Thread Kirill Yukhin
Hello Jakub!
On 09 Apr 20:29, Jakub Jelinek wrote:
> Hi!
> 
> As the following testcase shows, we emit an incorrect PTR prefix in a
> vpbroadcastb instruction in -masm=intel mode; gas accepts and the manual
> documents that the input operand is xmm2/m8 for vpbroadcastb and
> xmm2/m16 for vpbroadcastw, so we need to use BYTE PTR and WORD PTR instead
> of XMMWORD PTR.
> 
> The first two hunks are just a simplification, the only reason we couldn't
> use  used in many other spots is that it wasn't covering the 512-bit
> floating point vectors.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
LGTM. Sorry for delay.

--
Thanks, K


[wwwdocs] [COMMITTED] ARC gcc8 changes entry

2018-04-11 Thread Claudiu Zissulescu
Hi,

Please find the ARC's gcc8 changes entry section as committed to wwwdocs.

Thank you,
Claudiu


changes.patch
Description: changes.patch


[PATCH] Fix PR85339, bogus early debug

2018-04-11 Thread Richard Biener

The following fixes the missing .debug_line in the early LTO debug
DWARF which makes all DW_AT_decl_file invalid.

LTO bootstrapped on x86_64-unknown-linux-gnu, LTO bootstrap with -g3
still running, so is regtesting.

I verified it works with thin and fat LTO and that the early DWARF
has a proper .debug_line including the reference from the CU.  This
then also makes dwz happy with simple objects that do not contain
DW_OP_GNU_variable_value.

Ok for trunk if the rest of testing succeeds?

Thanks,
Richard.

2018-04-11  Richard Biener  

PR lto/85339
* dwarf2out.c (dwarf2out_finish): Remove DW_AT_stmt_list attribute
from early DWARF output.
(dwarf2out_early_finish): Output line info unconditionally into
early DWARF and add reference to it.

Index: gcc/dwarf2out.c
===
--- gcc/dwarf2out.c (revision 259308)
+++ gcc/dwarf2out.c (working copy)
@@ -31045,7 +31046,8 @@ dwarf2out_finish (const char *)
   /* Reset die CU symbol so we don't output it twice.  */
   comp_unit_die ()->die_id.die_symbol = NULL;
 
-  /* Remove DW_AT_macro from the early output.  */
+  /* Remove DW_AT_macro and DW_AT_stmt_list from the early output.  */
+  remove_AT (comp_unit_die (), DW_AT_stmt_list);
   if (have_macinfo)
remove_AT (comp_unit_die (), DEBUG_MACRO_ATTRIBUTE);
 
@@ -31681,6 +31683,7 @@ static void
 dwarf2out_early_finish (const char *filename)
 {
   set_early_dwarf s;
+  char dl_section_ref[MAX_ARTIFICIAL_LABEL_BYTES];
 
   /* PCH might result in DW_AT_producer string being restored from the
  header compilation, so always fill it with empty string initially
@@ -31829,6 +31836,16 @@ dwarf2out_early_finish (const char *file
ctnode != NULL; ctnode = ctnode->next)
 add_sibling_attributes (ctnode->root_die);
 
+  /* AIX Assembler inserts the length, so adjust the reference to match the
+ offset expected by debuggers.  */
+  strcpy (dl_section_ref, debug_skeleton_line_section_label);
+  if (XCOFF_DEBUGGING_INFO)
+strcat (dl_section_ref, DWARF_INITIAL_LENGTH_SIZE_STR);
+
+  if (debug_info_level >= DINFO_LEVEL_TERSE)
+add_AT_lineptr (comp_unit_die (), DW_AT_stmt_list,
+   dl_section_ref);
+
   if (have_macinfo)
 add_AT_macptr (comp_unit_die (), DEBUG_MACRO_ATTRIBUTE,
   macinfo_section_label);
@@ -31898,11 +31915,6 @@ dwarf2out_early_finish (const char *file
   output_macinfo (debug_skeleton_line_section_label, true);
   dw2_asm_output_data (1, 0, "End compilation unit");
 
-  /* Emit a skeleton debug_line section.  */
-  switch_to_section (debug_skeleton_line_section);
-  ASM_OUTPUT_LABEL (asm_out_file, debug_skeleton_line_section_label);
-  output_line_info (true);
-
   if (flag_fat_lto_objects)
{
  vec_free (macinfo_table);
@@ -31910,6 +31922,10 @@ dwarf2out_early_finish (const char *file
}
 }
 
+  /* Emit a skeleton debug_line section.  */
+  switch_to_section (debug_skeleton_line_section);
+  ASM_OUTPUT_LABEL (asm_out_file, debug_skeleton_line_section_label);
+  output_line_info (true);
 
   /* If we emitted any indirect strings, output the string table too.  */
   if (debug_str_hash || skeleton_debug_str_hash)


Re: [AARCH64] Neon vld1_*_x3, vst1_*_x2 and vst1_*_x3 intrinsics

2018-04-11 Thread Sameera Deshpande
On 11 April 2018 at 15:53, Sudakshina Das  wrote:
> Hi Sameera
>
>
> On 11/04/18 09:04, Sameera Deshpande wrote:
>>
>> On 10 April 2018 at 20:07, Sudakshina Das  wrote:
>>>
>>> Hi Sameera
>>>
>>>
>>> On 10/04/18 11:20, Sameera Deshpande wrote:


 On 7 April 2018 at 01:25, Christophe Lyon 
 wrote:
>
>
> Hi,
>
> 2018-04-06 12:15 GMT+02:00 Sameera Deshpande
> :
>>
>>
>> Hi Christophe,
>>
>> Please find attached the updated patch with testcases.
>>
>> Ok for trunk?
>
>
>
> Thanks for the update.
>
> Since the new intrinsics are only available on aarch64, you want to
> prevent the tests from running on arm.
> Indeed gcc.target/aarch64/advsimd-intrinsics/ is shared between the two
> targets.
> There are several examples on how to do that in that directory.
>
> I have also noticed that the tests fail at execution on aarch64_be.
>
> I didn't look at the patch in detail.
>
> Christophe
>
>
>>
>> - Thanks and regards,
>> Sameera D.
>>
>> 2017-12-14 22:17 GMT+05:30 Christophe Lyon
>> :
>>>
>>>
>>> 2017-12-14 9:29 GMT+01:00 Sameera Deshpande
>>> :


 Hi!

 Please find attached the patch implementing vld1_*_x3, vst1_*_x2 and
 vst1_*_x3 intrinsics as defined by Neon document.

 Ok for trunk?

 - Thanks and regards,
 Sameera D.

 gcc/Changelog:

 2017-11-14  Sameera Deshpande  


   * config/aarch64/aarch64-simd-builtins.def (ld1x3): New.
   (st1x2): Likewise.
   (st1x3): Likewise.
   * config/aarch64/aarch64-simd.md
 (aarch64_ld1x3): New pattern.
   (aarch64_ld1_x3_): Likewise
   (aarch64_st1x2): Likewise
   (aarch64_st1_x2_): Likewise
   (aarch64_st1x3): Likewise
   (aarch64_st1_x3_): Likewise
   * config/aarch64/arm_neon.h (vld1_u8_x3): New function.
   (vld1_s8_x3): Likewise.
   (vld1_u16_x3): Likewise.
   (vld1_s16_x3): Likewise.
   (vld1_u32_x3): Likewise.
   (vld1_s32_x3): Likewise.
   (vld1_u64_x3): Likewise.
   (vld1_s64_x3): Likewise.
   (vld1_fp16_x3): Likewise.
   (vld1_f32_x3): Likewise.
   (vld1_f64_x3): Likewise.
   (vld1_p8_x3): Likewise.
   (vld1_p16_x3): Likewise.
   (vld1_p64_x3): Likewise.
   (vld1q_u8_x3): Likewise.
   (vld1q_s8_x3): Likewise.
   (vld1q_u16_x3): Likewise.
   (vld1q_s16_x3): Likewise.
   (vld1q_u32_x3): Likewise.
   (vld1q_s32_x3): Likewise.
   (vld1q_u64_x3): Likewise.
   (vld1q_s64_x3): Likewise.
   (vld1q_f16_x3): Likewise.
   (vld1q_f32_x3): Likewise.
   (vld1q_f64_x3): Likewise.
   (vld1q_p8_x3): Likewise.
   (vld1q_p16_x3): Likewise.
   (vld1q_p64_x3): Likewise.
   (vst1_s64_x2): Likewise.
   (vst1_u64_x2): Likewise.
   (vst1_f64_x2): Likewise.
   (vst1_s8_x2): Likewise.
   (vst1_p8_x2): Likewise.
   (vst1_s16_x2): Likewise.
   (vst1_p16_x2): Likewise.
   (vst1_s32_x2): Likewise.
   (vst1_u8_x2): Likewise.
   (vst1_u16_x2): Likewise.
   (vst1_u32_x2): Likewise.
   (vst1_f16_x2): Likewise.
   (vst1_f32_x2): Likewise.
   (vst1_p64_x2): Likewise.
   (vst1q_s8_x2): Likewise.
   (vst1q_p8_x2): Likewise.
   (vst1q_s16_x2): Likewise.
   (vst1q_p16_x2): Likewise.
   (vst1q_s32_x2): Likewise.
   (vst1q_s64_x2): Likewise.
   (vst1q_u8_x2): Likewise.
   (vst1q_u16_x2): Likewise.
   (vst1q_u32_x2): Likewise.
   (vst1q_u64_x2): Likewise.
   (vst1q_f16_x2): Likewise.
   (vst1q_f32_x2): Likewise.
   (vst1q_f64_x2): Likewise.
   (vst1q_p64_x2): Likewise.
   (vst1_s64_x3): Likewise.
   (vst1_u64_x3): Likewise.
   (vst1_f64_x3): Likewise.
   (vst1_s8_x3): Likewise.

Re: [PATCH] Invoke maybe_warn_nonstring_arg for strcpy/stpcpy builtins.

2018-04-11 Thread Andreas Krebbel
On 04/11/2018 10:02 AM, Jakub Jelinek wrote:
> On Wed, Apr 11, 2018 at 09:48:05AM +0200, Andreas Krebbel wrote:
>> c-c++-common/attr-nonstring-3.c fails on IBM Z. The reason appears to be
>> that we provide builtin implementations for strcpy and stpcpy.  The
>> warnings currently will only be emitted when expanding these as normal
>> calls.
>>
>> Bootstrapped and regression tested on x86_64 and s390x.
>>
>> Ok?
>>
>> gcc/ChangeLog:
>>
>> 2018-04-11  Andreas Krebbel  
>>
>>  * builtins.c (expand_builtin_strcpy): Invoke
>>  maybe_warn_nonstring_arg.
>>  (expand_builtin_stpcpy): Likewise.
> 
> Don't you then warn twice if builtin implementations for strcpy and stpcpy
> aren't available or can't be used, once here and once in calls.c?

Looks like this could happen if the expander is present but rejects expansion.
I basically copied this from the strcmp builtin, which looks like it could run
into the same problem:

  /* Check to see if the argument was declared attribute nonstring
 and if so, issue a warning since at this point it's not known
 to be nul-terminated.  */
  tree fndecl = get_callee_fndecl (exp);
  maybe_warn_nonstring_arg (fndecl, exp);

  if (result)
{
  /* Return the value in the proper mode for this function.  */
  machine_mode mode = TYPE_MODE (TREE_TYPE (exp));
  if (GET_MODE (result) == mode)
return result;
  if (target == 0)
return convert_to_mode (mode, result, 0);
  convert_move (target, result, 0);
  return target;
}

  /* Expand the library call ourselves using a stabilized argument
 list to avoid re-evaluating the function's arguments twice.  */
  tree fn = build_call_nofold_loc (EXPR_LOCATION (exp), fndecl, 2, arg1, arg2);
  gcc_assert (TREE_CODE (fn) == CALL_EXPR);
  CALL_EXPR_TAILCALL (fn) = CALL_EXPR_TAILCALL (exp);
  return expand_call (fn, target, target == const0_rtx);

-Andreas-



Re: [PATCH] Fix PR85339, bogus early debug

2018-04-11 Thread Jason Merrill
OK.

On Wed, Apr 11, 2018 at 7:09 AM, Richard Biener  wrote:
>
> The following fixes the missing .debug_line in the early LTO debug
> DWARF which makes all DW_AT_decl_file invalid.
>
> LTO bootstrapped on x86_64-unknown-linux-gnu, LTO bootstrap with -g3
> still running, so is regtesting.
>
> I verified it works with thin and fat LTO and that the early DWARF
> has a proper .debug_line including the reference from the CU.  This
> then also makes dwz happy with simple objects that do not contain
> DW_OP_GNU_variable_value.
>
> Ok for trunk if the rest of testing succeeds?
>
> Thanks,
> Richard.
>
> 2018-04-11  Richard Biener  
>
> PR lto/85339
> * dwarf2out.c (dwarf2out_finish): Remove DW_AT_stmt_list attribute
> from early DWARF output.
> (dwarf2out_early_finish): Output line info unconditionally into
> early DWARF and add reference to it.
>
> Index: gcc/dwarf2out.c
> ===
> --- gcc/dwarf2out.c (revision 259308)
> +++ gcc/dwarf2out.c (working copy)
> @@ -31045,7 +31046,8 @@ dwarf2out_finish (const char *)
>/* Reset die CU symbol so we don't output it twice.  */
>comp_unit_die ()->die_id.die_symbol = NULL;
>
> -  /* Remove DW_AT_macro from the early output.  */
> +  /* Remove DW_AT_macro and DW_AT_stmt_list from the early output.  */
> +  remove_AT (comp_unit_die (), DW_AT_stmt_list);
>if (have_macinfo)
> remove_AT (comp_unit_die (), DEBUG_MACRO_ATTRIBUTE);
>
> @@ -31681,6 +31683,7 @@ static void
>  dwarf2out_early_finish (const char *filename)
>  {
>set_early_dwarf s;
> +  char dl_section_ref[MAX_ARTIFICIAL_LABEL_BYTES];
>
>/* PCH might result in DW_AT_producer string being restored from the
>   header compilation, so always fill it with empty string initially
> @@ -31829,6 +31836,16 @@ dwarf2out_early_finish (const char *file
> ctnode != NULL; ctnode = ctnode->next)
>  add_sibling_attributes (ctnode->root_die);
>
> +  /* AIX Assembler inserts the length, so adjust the reference to match the
> + offset expected by debuggers.  */
> +  strcpy (dl_section_ref, debug_skeleton_line_section_label);
> +  if (XCOFF_DEBUGGING_INFO)
> +strcat (dl_section_ref, DWARF_INITIAL_LENGTH_SIZE_STR);
> +
> +  if (debug_info_level >= DINFO_LEVEL_TERSE)
> +add_AT_lineptr (comp_unit_die (), DW_AT_stmt_list,
> +   dl_section_ref);
> +
>if (have_macinfo)
>  add_AT_macptr (comp_unit_die (), DEBUG_MACRO_ATTRIBUTE,
>macinfo_section_label);
> @@ -31898,11 +31915,6 @@ dwarf2out_early_finish (const char *file
>output_macinfo (debug_skeleton_line_section_label, true);
>dw2_asm_output_data (1, 0, "End compilation unit");
>
> -  /* Emit a skeleton debug_line section.  */
> -  switch_to_section (debug_skeleton_line_section);
> -  ASM_OUTPUT_LABEL (asm_out_file, debug_skeleton_line_section_label);
> -  output_line_info (true);
> -
>if (flag_fat_lto_objects)
> {
>   vec_free (macinfo_table);
> @@ -31910,6 +31922,10 @@ dwarf2out_early_finish (const char *file
> }
>  }
>
> +  /* Emit a skeleton debug_line section.  */
> +  switch_to_section (debug_skeleton_line_section);
> +  ASM_OUTPUT_LABEL (asm_out_file, debug_skeleton_line_section_label);
> +  output_line_info (true);
>
>/* If we emitted any indirect strings, output the string table too.  */
>if (debug_str_hash || skeleton_debug_str_hash)


[PATCH] Make redirection only for target_clones (PR ipa/85329).

2018-04-11 Thread Martin Liška
Hi.

The following restricts cgraph redirection done in multiple_target.c just
to clones that are created in the IPA pass.

The patch bootstraps on x86_64-linux-gnu and survives regression tests.

Ready to be installed?
Martin

gcc/ChangeLog:

2018-04-11  Martin Liska  

PR ipa/85329
* multiple_target.c (create_dispatcher_calls): Rename to ...
(redirect_target_clone_callers): ... this.
(expand_target_clones): Record created clones.
(ipa_target_clone): Call redirect_target_clone_callers only
for these clones.
---
 gcc/multiple_target.c | 25 +++--
 1 file changed, 15 insertions(+), 10 deletions(-)


diff --git a/gcc/multiple_target.c b/gcc/multiple_target.c
index b006a5ab6ec..5bd88fbd83c 100644
--- a/gcc/multiple_target.c
+++ b/gcc/multiple_target.c
@@ -58,11 +58,11 @@ replace_function_decl (tree *op, int *walk_subtrees, void *data)
   return NULL;
 }
 
-/* If the call in NODE has multiple target attribute with multiple fields,
-   replace it with dispatcher call and create dispatcher (once).  */
+/* If the call in NODE is a target_clone attribute clone, redirect all
+   edges and references that point to cgraph NODE.  */
 
 static void
-create_dispatcher_calls (struct cgraph_node *node)
+redirect_target_clone_callers (struct cgraph_node *node)
 {
   ipa_ref *ref;
 
@@ -300,10 +300,12 @@ create_target_clone (cgraph_node *node, bool definition, char *name)
 }
 
 /* If the function in NODE has multiple target attributes
-   create the appropriate clone for each valid target attribute.  */
+   create the appropriate clone for each valid target attribute.
+   TO_REDIRECT is a vector where we place all newly created clones.  */
 
 static bool
-expand_target_clones (struct cgraph_node *node, bool definition)
+expand_target_clones (struct cgraph_node *node, bool definition,
+		  auto_vec  &to_redirect)
 {
   int i;
   /* Parsing target attributes separated by comma.  */
@@ -404,6 +406,8 @@ expand_target_clones (struct cgraph_node *node, bool definition)
   before->next = after;
   after->prev = before;
   DECL_FUNCTION_VERSIONED (new_node->decl) = 1;
+
+  to_redirect.safe_push (new_node);
 }
 
   XDELETEVEC (attrs);
@@ -420,6 +424,8 @@ expand_target_clones (struct cgraph_node *node, bool definition)
 = targetm.target_option.valid_attribute_p (node->decl, NULL,
 	   TREE_VALUE (attributes), 0);
   input_location = saved_loc;
+  to_redirect.safe_push (node);
+
   return ret;
 }
 
@@ -428,13 +434,12 @@ ipa_target_clone (void)
 {
   struct cgraph_node *node;
 
-  bool target_clone_pass = false;
+  auto_vec  nodes_to_redirect;
   FOR_EACH_FUNCTION (node)
-target_clone_pass |= expand_target_clones (node, node->definition);
+expand_target_clones (node, node->definition, nodes_to_redirect);
 
-  if (target_clone_pass)
-FOR_EACH_FUNCTION (node)
-  create_dispatcher_calls (node);
+  for (unsigned i = 0; i < nodes_to_redirect.length (); i++)
+redirect_target_clone_callers (nodes_to_redirect[i]);
 
   return 0;
 }



Re: [PATCH] Fix VEC_PERM_EXPR folding (PR tree-optimization/85331)

2018-04-11 Thread Richard Biener
On Wed, 11 Apr 2018, Jakub Jelinek wrote:

> Hi!
> 
> We ICE on the following testcase, because VEC_PERM_EXPR indexes are supposed
> to be clamped into the 0 .. 2 * nelts - 1 range, but if some index is a very
> large constant (greater than or equal to HOST_WIDE_INT_1 << 33), then clamp
> doesn't actually perform any clamping.
> This is because can_div_trunc_p stores the quotient into an int variable, and
> in that case the quotient doesn't fit into int.  As element_type is poly_int64,
> it should fit into HOST_WIDE_INT.
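
The width issue can be illustrated with plain 64-bit integers. This is a
simplified sketch, not the vec_perm_indices code: quotient_fits_int32 and
clamp_index are hypothetical helpers, int64_t stands in for poly_int64 and
HOST_WIDE_INT, and a 32-bit int is assumed.

```c
#include <assert.h>
#include <stdint.h>

/* Return 1 if the truncating quotient ELT / LIMIT can be stored in a
   32-bit int, 0 otherwise.  */
static int
quotient_fits_int32 (int64_t elt, int64_t limit)
{
  int64_t q = elt / limit;
  return q >= INT32_MIN && q <= INT32_MAX;
}

/* Reduce ELT into the range [0, LIMIT) while keeping the quotient in a
   64-bit variable, so large indices are still clamped.  */
static int64_t
clamp_index (int64_t elt, int64_t limit)
{
  int64_t q = elt / limit;
  int64_t r = elt - q * limit;
  /* C division truncates toward zero, so fix up negative remainders.  */
  if (r < 0)
    r += limit;
  return r;
}
```

For example, with limit 4 the index 1 << 33 has quotient 1 << 31, which is
one more than INT32_MAX, so an int quotient cannot represent it, while the
64-bit version still reduces the index to 0.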
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK.

Richard.

> 2018-04-11  Jakub Jelinek  
> 
>   PR tree-optimization/85331
>   * vec-perm-indices.h (vec_perm_indices::clamp): Change input type
>   from int to HOST_WIDE_INT.
> 
>   * gcc.c-torture/execute/pr85331.c: New test.
> 
> --- gcc/vec-perm-indices.h.jj 2018-01-03 10:19:54.969533927 +0100
> +++ gcc/vec-perm-indices.h2018-04-11 09:48:02.043153054 +0200
> @@ -119,7 +119,7 @@ inline vec_perm_indices::element_type
>  vec_perm_indices::clamp (element_type elt) const
>  {
>element_type limit = input_nelts (), elem_within_input;
> -  int input;
> +  HOST_WIDE_INT input;
>if (!can_div_trunc_p (elt, limit, &input, &elem_within_input))
>  return elt;
>  
> --- gcc/testsuite/gcc.c-torture/execute/pr85331.c.jj  2018-04-11 
> 09:54:02.044206856 +0200
> +++ gcc/testsuite/gcc.c-torture/execute/pr85331.c 2018-04-11 
> 09:53:22.359200922 +0200
> @@ -0,0 +1,22 @@
> +/* PR tree-optimization/85331 */
> +
> +typedef double V __attribute__((vector_size (2 * sizeof (double))));
> +typedef long long W __attribute__((vector_size (2 * sizeof (long long))));
> +
> +__attribute__((noipa)) void
> +foo (V *r)
> +{
> +  V y = { 1.0, 2.0 };
> +  W m = { 101LL, 0LL };
> +  *r = __builtin_shuffle (y, m);
> +}
> +
> +int
> +main ()
> +{
> +  V r;
> +  foo (&r);
> +  if (r[0] != 2.0 || r[1] != 1.0)
> +__builtin_abort ();
> +  return 0;
> +}
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)


Re: [PATCH PR85190]Adjust pointer for aligned access

2018-04-11 Thread Bin.Cheng
On Wed, Apr 11, 2018 at 10:46 AM, Richard Biener
 wrote:
> On Tue, Apr 10, 2018 at 6:28 PM, Bin.Cheng  wrote:
>> On Tue, Apr 10, 2018 at 3:58 PM, Bin.Cheng  wrote:
>>> On Tue, Apr 10, 2018 at 2:26 PM, Jakub Jelinek  wrote:
 On Tue, Apr 10, 2018 at 09:55:35AM +, Bin Cheng wrote:
> Hi Rainer, could you please help me double check that this solves the 
> issue?
>
> Thanks,
> bin
>
> gcc/testsuite
> 2018-04-10  Bin Cheng  
>
>   PR testsuite/85190
>   * gcc.dg/vect/pr81196.c: Adjust pointer for aligned access.

> diff --git a/gcc/testsuite/gcc.dg/vect/pr81196.c 
> b/gcc/testsuite/gcc.dg/vect/pr81196.c
> index 46d7a9e..15320ae 100644
> --- a/gcc/testsuite/gcc.dg/vect/pr81196.c
> +++ b/gcc/testsuite/gcc.dg/vect/pr81196.c
> @@ -4,14 +4,14 @@
>
>  void f(short*p){
>p=(short*)__builtin_assume_aligned(p,64);
> -  short*q=p+256;
> +  short*q=p+255;
>for(;p!=q;++p,--q){
>  short t=*p;*p=*q;*q=t;

 This is UB then though, because p will never be equal to q.
>>
>> Hmm, though it's UB in this case, is it OK for niter analysis to give the
>> results below?
>>
>> Analyzing # of iterations of loop 1
>>   exit condition [126, + , 18446744073709551615] != 0
>>   bounds on difference of bases: -126 ... -126
>>   result:
>> # of iterations 126, bounded by 126
>>
>> I don't really follow the last piece of code in number_of_iterations_ne:
>>
>>   /* Let nsd (step, size of mode) = d.  If d does not divide c, the loop
>>  is infinite.  Otherwise, the number of iterations is
>>  (inverse(s/d) * (c/d)) mod (size of mode/d).  */
>>   bits = num_ending_zeros (s);
>>   bound = build_low_bits_mask (niter_type,
>>(TYPE_PRECISION (niter_type)
>> - tree_to_uhwi (bits)));
>>
>>   d = fold_binary_to_constant (LSHIFT_EXPR, niter_type,
>>build_int_cst (niter_type, 1), bits);
>>   s = fold_binary_to_constant (RSHIFT_EXPR, niter_type, s, bits);
>>
>>   if (!exit_must_be_taken)
>> {
>>   /* If we cannot assume that the exit is taken eventually, record the
>>  assumptions for divisibility of c.  */
>>   assumption = fold_build2 (FLOOR_MOD_EXPR, niter_type, c, d);
>>   assumption = fold_build2 (EQ_EXPR, boolean_type_node,
>> assumption, build_int_cst (niter_type, 0));
>>   if (!integer_nonzerop (assumption))
>> niter->assumptions = fold_build2 (TRUTH_AND_EXPR, boolean_type_node,
>>   niter->assumptions, assumption);
>> }
>>
>>   c = fold_build2 (EXACT_DIV_EXPR, niter_type, c, d);
>>   tmp = fold_build2 (MULT_EXPR, niter_type, c, inverse (s, bound));
>>   niter->niter = fold_build2 (BIT_AND_EXPR, niter_type, tmp, bound);
>>   return true;
>>
>> Though infinite niters is mentioned, I don't see where it's handled?
>
> number_of_iterations_ne_max computes this, it seems, based on the
> fact that pointer overflow is undefined.  This means that 126 is
> as good as any other number, given that the testcase is undefined...

Okay, in that case, I simply removed the function with UB from the test case.
Is it OK?

Thanks,
bin

gcc/testsuite
2018-04-11  Bin Cheng  

PR testsuite/85190
* gcc.dg/vect/pr81196.c: Remove function with undefined behavior.


>
> Richard.
>
>> Thanks,
>> bin
>>> Sorry I already checked in, will try to correct it in another patch.
>>>
>>> Thanks,
>>> bin

>}
>  }
>  void b(short*p){
>p=(short*)__builtin_assume_aligned(p,64);
> -  short*q=p+256;
> +  short*q=p+255;
>for(;p<q;++p,--q){
>  short t=*p;*p=*q;*q=t;

 This one is fine, sure.

 Jakub
diff --git a/gcc/testsuite/gcc.dg/vect/pr81196.c 
b/gcc/testsuite/gcc.dg/vect/pr81196.c
index 15320ae..97d40a0 100644
--- a/gcc/testsuite/gcc.dg/vect/pr81196.c
+++ b/gcc/testsuite/gcc.dg/vect/pr81196.c
@@ -2,13 +2,6 @@
 /* { dg-require-effective-target vect_int } */
 /* { dg-require-effective-target vect_perm_short } */
 
-void f(short*p){
-  p=(short*)__builtin_assume_aligned(p,64);
-  short*q=p+255;
-  for(;p!=q;++p,--q){
-short t=*p;*p=*q;*q=t;
-  }
-}
 void b(short*p){
   p=(short*)__builtin_assume_aligned(p,64);
   short*q=p+255;
@@ -16,4 +9,4 @@ void b(short*p){
 short t=*p;*p=*q;*q=t;
   }
 }
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 2 "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
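
Editorial illustration of the point made in the review above: with an odd
distance between the two pointers (p+255), a loop that steps both ends
toward each other and exits on p!=q never sees the pointers coincide, while
the p<q form still terminates. The sketch below is a minimal model using
array indices instead of pointers so that it stays well defined; the
function names ne_loop_meets and lt_loop_iterations are hypothetical and do
not appear in the test case.

```c
#include <stddef.h>

/* Model of "for (; p != q; ++p, --q)": step two indices toward each
   other and report whether they ever become equal.  Stepping stops as
   soon as the indices have crossed, because the real loop would run
   out of bounds from there on.  */
static int ne_loop_meets(size_t distance)
{
    size_t i = 0, j = distance;
    while (i < j) {
        ++i;
        --j;
    }
    return i == j;
}

/* Model of "for (; p < q; ++p, --q)": count how many swaps are done.  */
static size_t lt_loop_iterations(size_t distance)
{
    size_t i = 0, j = distance, count = 0;
    while (i < j) {
        ++count;
        ++i;
        --j;
    }
    return count;
}
```

With a distance of 256 the indices meet in the middle, so the != exit is
reached; with 255 they cross without ever being equal, which is why
function f had undefined behavior after the first patch and was removed.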


Re: [PATCH PR85190]Adjust pointer for aligned access

2018-04-11 Thread Richard Biener
On Wed, Apr 11, 2018 at 3:15 PM, Bin.Cheng  wrote:
> On Wed, Apr 11, 2018 at 10:46 AM, Richard Biener
>  wrote:
>> On Tue, Apr 10, 2018 at 6:28 PM, Bin.Cheng  wrote:
>>> On Tue, Apr 10, 2018 at 3:58 PM, Bin.Cheng  wrote:
 On Tue, Apr 10, 2018 at 2:26 PM, Jakub Jelinek  wrote:
> On Tue, Apr 10, 2018 at 09:55:35AM +, Bin Cheng wrote:
>> Hi Rainer, could you please help me double check that this solves the 
>> issue?
>>
>> Thanks,
>> bin
>>
>> gcc/testsuite
>> 2018-04-10  Bin Cheng  
>>
>>   PR testsuite/85190
>>   * gcc.dg/vect/pr81196.c: Adjust pointer for aligned access.
>
>> diff --git a/gcc/testsuite/gcc.dg/vect/pr81196.c 
>> b/gcc/testsuite/gcc.dg/vect/pr81196.c
>> index 46d7a9e..15320ae 100644
>> --- a/gcc/testsuite/gcc.dg/vect/pr81196.c
>> +++ b/gcc/testsuite/gcc.dg/vect/pr81196.c
>> @@ -4,14 +4,14 @@
>>
>>  void f(short*p){
>>p=(short*)__builtin_assume_aligned(p,64);
>> -  short*q=p+256;
>> +  short*q=p+255;
>>for(;p!=q;++p,--q){
>>  short t=*p;*p=*q;*q=t;
>
> This is UB then though, because p will never be equal to q.
>>>
>>> Hmm, though it's UB in this case, is it OK for niter analysis to give the
>>> results below?
>>>
>>> Analyzing # of iterations of loop 1
>>>   exit condition [126, + , 18446744073709551615] != 0
>>>   bounds on difference of bases: -126 ... -126
>>>   result:
>>> # of iterations 126, bounded by 126
>>>
>>> I don't really follow the last piece of code in number_of_iterations_ne:
>>>
>>>   /* Let nsd (step, size of mode) = d.  If d does not divide c, the loop
>>>  is infinite.  Otherwise, the number of iterations is
>>>  (inverse(s/d) * (c/d)) mod (size of mode/d).  */
>>>   bits = num_ending_zeros (s);
>>>   bound = build_low_bits_mask (niter_type,
>>>(TYPE_PRECISION (niter_type)
>>> - tree_to_uhwi (bits)));
>>>
>>>   d = fold_binary_to_constant (LSHIFT_EXPR, niter_type,
>>>build_int_cst (niter_type, 1), bits);
>>>   s = fold_binary_to_constant (RSHIFT_EXPR, niter_type, s, bits);
>>>
>>>   if (!exit_must_be_taken)
>>> {
>>>   /* If we cannot assume that the exit is taken eventually, record the
>>>  assumptions for divisibility of c.  */
>>>   assumption = fold_build2 (FLOOR_MOD_EXPR, niter_type, c, d);
>>>   assumption = fold_build2 (EQ_EXPR, boolean_type_node,
>>> assumption, build_int_cst (niter_type, 0));
>>>   if (!integer_nonzerop (assumption))
>>> niter->assumptions = fold_build2 (TRUTH_AND_EXPR, boolean_type_node,
>>>   niter->assumptions, assumption);
>>> }
>>>
>>>   c = fold_build2 (EXACT_DIV_EXPR, niter_type, c, d);
>>>   tmp = fold_build2 (MULT_EXPR, niter_type, c, inverse (s, bound));
>>>   niter->niter = fold_build2 (BIT_AND_EXPR, niter_type, tmp, bound);
>>>   return true;
>>>
>>> Though infinite niters is mentioned, I don't see where it's handled?
>>
>> number_of_iterations_ne_max computes this, it seems, based on the
>> fact that pointer overflow is undefined.  This means that 126 is
>> as good as any other number, given that the testcase is undefined...
>
> Okay, in that case, I simply removed the function with UB from the test case.
> Is it OK?

OK.

Richard.

> Thanks,
> bin
>
> gcc/testsuite
> 2018-04-11  Bin Cheng  
>
> PR testsuite/85190
> * gcc.dg/vect/pr81196.c: Remove function with undefined behavior.
>
>
>>
>> Richard.
>>
>>> Thanks,
>>> bin
 Sorry I already checked in, will try to correct it in another patch.

 Thanks,
 bin
>
>>}
>>  }
>>  void b(short*p){
>>p=(short*)__builtin_assume_aligned(p,64);
>> -  short*q=p+256;
>> +  short*q=p+255;
>>for(;p<q;++p,--q){
>>  short t=*p;*p=*q;*q=t;
>
> This one is fine, sure.
>
> Jakub


[PATCH] Fix non-AVX512VL handling of lo extraction from AVX512F xmm16+ (PR target/85328)

2018-04-11 Thread Jakub Jelinek
Hi!

In lots of patterns we assume that we never see xmm16+ hard registers
with 128-bit and 256-bit vector modes when not -mavx512vl, because
HARD_REGNO_MODE_OK refuses those.
Unfortunately, as this testcase and patch show, the vec_extract_lo*
splitters work as a loophole around this: we happily create instructions
like (set (reg:V32QI xmm5) (reg:V32QI xmm16)) and then hard register
propagation can propagate the V32QI xmm16 into other insns like vpand.

The following patch fixes it by making sure we never create such registers,
just emit (set (reg:V64QI xmm5) (reg:V64QI xmm16)) instead, which by copying
all the 512 bits also copies the low bits, and as the destination is
originally V32QI which is not HARD_REGNO_MODE_OK in xmm16+, this should be
fine.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2018-04-11  Jakub Jelinek  

PR target/85328
* config/i386/sse.md
(avx512dq_vextract64x2_1 split,
avx512f_vextract32x4_1 split,
vec_extract_lo_ split, vec_extract_lo_v32hi,
vec_extract_lo_v64qi): For non-AVX512VL if input is xmm16+ reg
and output is a reg, avoid creating invalid lowpart subreg, but
instead split into a 512-bit move.

* gcc.target/i386/pr85328.c: New test.

--- gcc/config/i386/sse.md.jj   2018-04-10 14:37:02.092801344 +0200
+++ gcc/config/i386/sse.md  2018-04-11 12:00:44.296840287 +0200
@@ -7362,7 +7362,15 @@ (define_split
  (parallel [(const_int 0) (const_int 1)])))]
   "TARGET_AVX512DQ && reload_completed"
   [(set (match_dup 0) (match_dup 1))]
-  "operands[1] = gen_lowpart (mode, operands[1]);")
+{
+  if (!TARGET_AVX512VL
+  && REG_P (operands[0])
+  && EXT_REX_SSE_REG_P (operands[1]))
+operands[0]
+  = lowpart_subreg (mode, operands[0], mode);
+  else
+operands[1] = gen_lowpart (mode, operands[1]);
+})
 
 (define_insn "avx512f_vextract32x4_1"
   [(set (match_operand: 0 "" 
"=")
@@ -7395,7 +7403,15 @@ (define_split
 (const_int 2) (const_int 3)])))]
   "TARGET_AVX512F && reload_completed"
   [(set (match_dup 0) (match_dup 1))]
-  "operands[1] = gen_lowpart (mode, operands[1]);")
+{
+  if (!TARGET_AVX512VL
+  && REG_P (operands[0])
+  && EXT_REX_SSE_REG_P (operands[1]))
+operands[0]
+  = lowpart_subreg (mode, operands[0], mode);
+  else
+operands[1] = gen_lowpart (mode, operands[1]);
+})
 
 (define_mode_attr extract_type_2
   [(V16SF "avx512dq") (V16SI "avx512dq") (V8DF "avx512f") (V8DI "avx512f")])
@@ -7655,7 +7671,15 @@ (define_split
   "TARGET_AVX512F && !(MEM_P (operands[0]) && MEM_P (operands[1]))
&& reload_completed"
   [(set (match_dup 0) (match_dup 1))]
-  "operands[1] = gen_lowpart (mode, operands[1]);")
+{
+  if (!TARGET_AVX512VL
+  && REG_P (operands[0])
+  && EXT_REX_SSE_REG_P (operands[1]))
+operands[0]
+  = lowpart_subreg (mode, operands[0], mode);
+  else
+operands[1] = gen_lowpart (mode, operands[1]);
+})
 
 (define_insn "vec_extract_lo_"
   [(set (match_operand: 0 "" "=v,m")
@@ -7830,7 +7854,14 @@ (define_insn_and_split "vec_extract_lo_v
   "#"
   "&& reload_completed"
   [(set (match_dup 0) (match_dup 1))]
-  "operands[1] = gen_lowpart (V16HImode, operands[1]);")
+{
+  if (!TARGET_AVX512VL
+  && REG_P (operands[0])
+  && EXT_REX_SSE_REG_P (operands[1]))
+operands[0] = lowpart_subreg (V32HImode, operands[0], V16HImode);
+  else
+operands[1] = gen_lowpart (V16HImode, operands[1]);
+})
 
 (define_insn "vec_extract_hi_v32hi"
   [(set (match_operand:V16HI 0 "nonimmediate_operand" "=v,m")
@@ -7915,7 +7946,14 @@ (define_insn_and_split "vec_extract_lo_v
   "#"
   "&& reload_completed"
   [(set (match_dup 0) (match_dup 1))]
-  "operands[1] = gen_lowpart (V32QImode, operands[1]);")
+{
+  if (!TARGET_AVX512VL
+  && REG_P (operands[0])
+  && EXT_REX_SSE_REG_P (operands[1]))
+operands[0] = lowpart_subreg (V64QImode, operands[0], V32QImode);
+  else
+operands[1] = gen_lowpart (V32QImode, operands[1]);
+})
 
 (define_insn "vec_extract_hi_v64qi"
   [(set (match_operand:V32QI 0 "nonimmediate_operand" "=v,m")
--- gcc/testsuite/gcc.target/i386/pr85328.c.jj  2018-04-11 12:07:15.044933408 
+0200
+++ gcc/testsuite/gcc.target/i386/pr85328.c 2018-04-11 10:45:17.269733600 
+0200
@@ -0,0 +1,18 @@
+/* PR target/85328 */
+/* { dg-do assemble { target avx512f } } */
+/* { dg-options "-O3 -fno-caller-saves -mavx512f" } */
+
+typedef char U __attribute__((vector_size (64)));
+typedef int V __attribute__((vector_size (64)));
+U a, b;
+
+extern void bar (void);
+
+V
+foo (V f)
+{
+  b <<= (U){(V){}[63]} & 7;
+  bar ();
+  a = (U)f & 7;
+  return (V)b;
+}

Jakub


[PATCH] Fix VEC_PERM_EXPR folding (PR tree-optimization/85331)

2018-04-11 Thread Jakub Jelinek
Hi!

We ICE on the following testcase, because VEC_PERM_EXPR indexes are supposed
to be clamped into the 0 .. 2 * nelts - 1 range, but if some index is a very
large constant (greater than or equal to HOST_WIDE_INT_1 << 33), then clamp
doesn't actually perform any clamping.
This is because can_div_trunc_p stores the quotient into an int variable,
and in that case the quotient doesn't fit into int.  As element_type is
poly_int64, it should fit into HOST_WIDE_INT.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2018-04-11  Jakub Jelinek  

PR tree-optimization/85331
* vec-perm-indices.h (vec_perm_indices::clamp): Change input type
from int to HOST_WIDE_INT.

* gcc.c-torture/execute/pr85331.c: New test.

--- gcc/vec-perm-indices.h.jj   2018-01-03 10:19:54.969533927 +0100
+++ gcc/vec-perm-indices.h  2018-04-11 09:48:02.043153054 +0200
@@ -119,7 +119,7 @@ inline vec_perm_indices::element_type
 vec_perm_indices::clamp (element_type elt) const
 {
   element_type limit = input_nelts (), elem_within_input;
-  int input;
+  HOST_WIDE_INT input;
   if (!can_div_trunc_p (elt, limit, &input, &elem_within_input))
 return elt;
 
--- gcc/testsuite/gcc.c-torture/execute/pr85331.c.jj2018-04-11 
09:54:02.044206856 +0200
+++ gcc/testsuite/gcc.c-torture/execute/pr85331.c   2018-04-11 
09:53:22.359200922 +0200
@@ -0,0 +1,22 @@
+/* PR tree-optimization/85331 */
+
+typedef double V __attribute__((vector_size (2 * sizeof (double))));
+typedef long long W __attribute__((vector_size (2 * sizeof (long long))));
+
+__attribute__((noipa)) void
+foo (V *r)
+{
+  V y = { 1.0, 2.0 };
+  W m = { 101LL, 0LL };
+  *r = __builtin_shuffle (y, m);
+}
+
+int
+main ()
+{
+  V r;
+  foo (&r);
+  if (r[0] != 2.0 || r[1] != 1.0)
+__builtin_abort ();
+  return 0;
+}

Jakub
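
Editorial illustration of the failure mode described in this message: when
the quotient of index / nelts is kept in a 32-bit variable, an index of
HOST_WIDE_INT_1 << 33 or more loses its high bits and the "clamped" result
comes back unchanged. The sketch below is a simplified model for
non-negative indices only. It is not GCC's code: clamp_narrow, clamp_wide
and their parameters are hypothetical names, and the way the remainder is
derived from the stored quotient is an assumption about how the truncation
propagates.

```c
#include <stdint.h>

/* Old behaviour (modelled): the quotient is stored in 32 bits.  The cast
   through uint32_t drops the high bits; converting values above
   INT32_MAX back to int32_t is implementation-defined, which is part of
   why a narrow quotient is a bad idea here.  */
static int64_t clamp_narrow(int64_t elt, int64_t nelts, int64_t ninputs)
{
    int32_t input = (int32_t)(uint32_t)(elt / nelts);
    int64_t within = elt - (int64_t)input * nelts;  /* wrong if truncated */
    return within + (input % ninputs) * nelts;
}

/* Fixed behaviour (modelled): the quotient has the full 64-bit width,
   so the result always lands in 0 .. ninputs * nelts - 1.  */
static int64_t clamp_wide(int64_t elt, int64_t nelts, int64_t ninputs)
{
    int64_t input = elt / nelts;
    int64_t within = elt - input * nelts;
    return within + (input % ninputs) * nelts;
}
```

For small indices such as the 101 used in the new test, both variants
agree; they diverge only once the quotient no longer fits in 32 bits.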


Re: [PATCH] Make redirection only for target_clones (PR ipa/85329).

2018-04-11 Thread Martin Liška
On 04/11/2018 03:12 PM, Martin Liška wrote:
> Hi.
> 
> Following restricts cgraph redirection done in multiple_target.c just
> to clones that are created in the IPA pass.
> 
> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
> 
> Ready to be installed?
> Martin
> 
> gcc/ChangeLog:
> 
> 2018-04-11  Martin Liska  
> 
>   PR ipa/85329
>   * multiple_target.c (create_dispatcher_calls): Rename to ...
>   (redirect_target_clone_callers): ... this.
>   (expand_target_clones): Record created clones.
>   (ipa_target_clone): Call redirect_target_clone_callers only
>   for these clones.
> ---
>  gcc/multiple_target.c | 25 +++--
>  1 file changed, 15 insertions(+), 10 deletions(-)
> 
> 

So the patch is not sufficient for the original test-case. I'm reducing that
and will enhance the patch.

Martin


[PATCH 0/2] MIPS/GCC/testsuite: Fixes for data-sym-pool.c

2018-04-11 Thread Maciej W. Rozycki
Hi,

 Paul has recently reported a regression test failure with data-sym-pool.c 
at -O0 in his configuration.  After some tweaking I was able to reproduce 
it with mine.  I also realised we need an n64 variant, hence this has 
become a mini patch series.

 As these changes are test suite updates only, not affecting code 
generation in any way and so limited in scope, I propose to have them 
included in GCC 8 despite it being so close to release.

 See individual changes for details.

  Maciej


[PATCH 1/2] MIPS/GCC/testsuite: Fix data-sym-pool.c for SVR4 model at -O0

2018-04-11 Thread Maciej W. Rozycki
With GCC configurations that use the SVR4 rather than the PLT dynamic 
executable model, together with the o32 ABI, the data-sym-pool.c test case 
produces code like this:

	.file	1 "data-sym-pool.c"
	.section .mdebug.abi32
	.previous
	.nan	legacy
	.module	fp=xx
	.module	nooddspreg
	.abicalls
	.text
	.align	2
	.globl	frob
	.set	mips16
	.set	nomicromips
	.ent	frob
	.type	frob, @function
frob:
	.frame	$17,8,$31		# vars= 0, regs= 1/0, args= 0, gp= 0
	.mask	0x00020000,-4
	.fmask	0x00000000,0
	save	8,$17
	move	$17,$sp
	lw	$2,$L4
	move	$sp,$17
	restore	8,$17
	jr	$31
	.type	__pool_frob_3, @object
__pool_frob_3:
	.align	2
$L3:
	.word	__gnu_local_gp
$L4:
	.word	305419896
	.type	__pend_frob_3, @function
__pend_frob_3:
	.insn
	.end	frob
	.size	frob, .-frob
	.ident	"GCC: (GNU) 8.0.1 20180410 (experimental)"

causing a failure due to the unexpected `__gnu_local_gp' entry in the 
constant pool, even though there is nothing wrong with it as far as the 
annotation being examined is concerned.

Given that the SVR4 vs PLT code model consideration is irrelevant for 
this test case, rather than rewriting the regular expression to match 
this variant of code, just enforce the PLT model by using the `-mplt' 
option.  It is safe to use this option unconditionally, as it is silently 
ignored with configurations that do not support this model, e.g. bare 
metal ELF.

gcc/testsuite/
* gcc.target/mips/data-sym-pool.c (dg-options): Add `-mplt'.
---
Hi,

 I have regression-tested this with the `mips-mti-linux-gnu' target and 
the o32, n32 and n64 ABIs.  The latter two are demoted to o64 by the test 
framework due to the lack of MIPS16 support for the hard-float variants of 
these ABIs and I don't have soft-float multilibs configured, so instead I 
have verified n32/soft-float and n64/soft-float variants by hand.  The 
latter revealed the need for 2/2.

 Finally I do not have a bare metal ELF configuration available for 
regression-testing right now, so I only verified that `-mplt' is silently 
ignored.  Code generated is expected to be the same as in the PLT mode.

 OK to apply?

  Maciej
---
 gcc/testsuite/gcc.target/mips/data-sym-pool.c |8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

gcc-mips16-test-data-sym-pool-extra-entries.diff
Index: gcc/gcc/testsuite/gcc.target/mips/data-sym-pool.c
===
--- gcc.orig/gcc/testsuite/gcc.target/mips/data-sym-pool.c  2016-11-17 
21:24:46.0 +
+++ gcc/gcc/testsuite/gcc.target/mips/data-sym-pool.c   2018-04-10 
23:27:49.226719338 +0100
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-mips16 -mcode-readable=yes" } */
+/* { dg-options "-mips16 -mcode-readable=yes -mplt" } */
 
 int
 frob (void)
@@ -20,6 +20,10 @@ $L3: # The label 
must match.
 __pend_frob_3: # The symbol must match.
.insn
 
-   that is `__pool_*'/`__pend_*' symbols inserted around a constant pool.  */
+   that is `__pool_*'/`__pend_*' symbols inserted around a constant pool.
+
+   This code is built with `-mplt' to prevent the special `__gnu_local_gp'
+   symbol from being placed in the constant pool at `-O0' for SVR4 code
+   and consequently interfering with test expectations.  */
 
 /* { dg-final { scan-assembler 
"\tlw\t\\\$\[0-9\]+,(.L(\[0-9\]+))\n.*\t\\.type\t(__pool_frob_\\2), 
@object\n\\3:\n\t\\.align\t2\n\\1:\n\t\\.word\t305419896\n\t\\.type\t(__pend_frob_\\2),
 @function\n\\4:\n\t\\.insn\n" } } */


[PATCH 2/2] MIPS/GCC/testsuite: Fix data-sym-pool.c for n64 code

2018-04-11 Thread Maciej W. Rozycki
With the soft-float n64 ABI, the data-sym-pool.c test case produces code 
like this:

	.file	1 "data-sym-pool.c"
	.section .mdebug.abi64
	.previous
	.nan	legacy
	.module	softfloat
	.module	oddspreg
	.abicalls
	.option	pic0
	.text
	.align	2
	.globl	frob
	.set	mips16
	.set	nomicromips
	.ent	frob
	.type	frob, @function
frob:
	.frame	$17,16,$31		# vars= 0, regs= 1/0, args= 0, gp= 0
	.mask	0x00020000,-8
	.fmask	0x00000000,0
	daddiu	$sp,-16
	sd	$17,8($sp)
	move	$17,$sp
	ld	$2,.L3
	move	$sp,$17
	ld	$17,8($sp)
	daddiu	$sp,16
	jr	$31
	.type	__pool_frob_3, @object
__pool_frob_3:
	.align	3
.L3:
	.dword	305419896
	.type	__pend_frob_3, @function
__pend_frob_3:
	.insn
	.end	frob
	.size	frob, .-frob
	.ident	"GCC: (GNU) 8.0.1 20180410 (experimental)"

(we have no support for hard-float n64 MIPS16 code generation), which 
means that the test case will fail, as the regular expression pattern 
expects `lw' and `.word' rather than `ld' and `.dword' respectively to 
appear in the generated assembly code.  Correct the pattern in the obvious 
way, making it accept both instructions and both pseudo-ops.

gcc/testsuite/
* gcc.target/mips/data-sym-pool.c (dg-options): Match `ld' and
`.dword' in addition to `lw' and `.word'.
---
Hi,

 I have regression-tested this with the `mips-mti-linux-gnu' target and 
the o32 ABI.  I don't have an n64 soft-float multilib configured, so the 
manually generated assembly file included with the description serves as 
the proof for what needs to be matched.

 OK to apply?

  Maciej
---
 gcc/testsuite/gcc.target/mips/data-sym-pool.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

gcc-mips16-test-data-sym-pool-dword.diff
Index: gcc/gcc/testsuite/gcc.target/mips/data-sym-pool.c
===
--- gcc.orig/gcc/testsuite/gcc.target/mips/data-sym-pool.c  2018-04-10 
23:27:49.0 +0100
+++ gcc/gcc/testsuite/gcc.target/mips/data-sym-pool.c   2018-04-10 
23:37:05.357453843 +0100
@@ -26,4 +26,4 @@ __pend_frob_3:# The 
symbol must mat
symbol from being placed in the constant pool at `-O0' for SVR4 code
and consequently interfering with test expectations.  */
 
-/* { dg-final { scan-assembler 
"\tlw\t\\\$\[0-9\]+,(.L(\[0-9\]+))\n.*\t\\.type\t(__pool_frob_\\2), 
@object\n\\3:\n\t\\.align\t2\n\\1:\n\t\\.word\t305419896\n\t\\.type\t(__pend_frob_\\2),
 @function\n\\4:\n\t\\.insn\n" } } */
+/* { dg-final { scan-assembler 
"\tl\[wd\]\t\\\$\[0-9\]+,(.L(\[0-9\]+))\n.*\t\\.type\t(__pool_frob_\\2), 
@object\n\\3:\n\t\\.align\t2\n\\1:\n\t\\.d?word\t305419896\n\t\\.type\t(__pend_frob_\\2),
 @function\n\\4:\n\t\\.insn\n" } } */
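
Editorial illustration of the pattern change in 2/2: the character class
l[wd] accepts both `lw' and `ld', and the optional d in \.d?word accepts
both `.word' and `.dword'. The sketch below checks shortened fragments of
the pattern with POSIX extended regular expressions. This is an assumption
made for illustration only: the testsuite itself uses Tcl regular
expressions through scan-assembler, and the names matches, load_re and
word_re are hypothetical.

```c
#include <regex.h>
#include <stddef.h>

/* Shortened fragments of the dg-final pattern.  The tab characters are
   literal tabs produced by the C string escapes; "\\$" and "\\." reach
   the regex engine as escaped '$' and '.'.  The unescaped '.' before
   'L' matches any character, as in the original pattern, so both
   "$L4" and ".L3" labels are accepted.  */
static const char load_re[] = "\tl[wd]\t\\$[0-9]+,.L[0-9]+";
static const char word_re[] = "\t\\.d?word\t305419896";

/* Return 1 if TEXT contains a match for PATTERN, 0 if it does not,
   and -1 if PATTERN fails to compile.  */
static int matches(const char *pattern, const char *text)
{
    regex_t re;
    int found;

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    found = regexec(&re, text, 0, NULL, 0) == 0;
    regfree(&re);
    return found;
}
```

Both the o32 form (lw with .word) and the n64 form (ld with .dword) from
the listings above satisfy these fragments, while an unrelated load such as
lb does not.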


Re: [PATCH, rs6000] PR85321 improve documentation of -mcall and -mtraceback=

2018-04-11 Thread Segher Boessenkool
Hi!

On Tue, Apr 10, 2018 at 06:49:55PM -0500, Aaron Sawdey wrote:
> Another update to document -mcall- and -mtraceback= options. Cleanup to
> remove -mabi={no-,}spe from the RS/6000 and PowerPC section. And a trim
> to the help text for -mblock-compare-* and -mstring-compare-inline-
> limit so they are not excessively long. The complete description for
> those is now in invoke.texi. This is the last piece for 85321.

Some comments, most total nits, but please fix the sysv4.opt problem:

>   * doc/invoke.texi (RS/6000 and PowerPC Options): Document options
>   -mcall= and -mtraceback. Remove options -mabi=spe and -mabi=no-spe
>   from PowerPC section.

-mcall-
-mtraceback=

>   * config/rs6000/sysv4.opt (mcall): Improve help text.

mcall-


> --- doc/invoke.texi   (revision 259302)
> +++ doc/invoke.texi   (working copy)
> @@ -1076,7 +1076,10 @@
>  -mprioritize-restricted-insns=@var{priority} @gol
>  -msched-costly-dep=@var{dependence_type} @gol
>  -minsert-sched-nops=@var{scheme} @gol
> --mcall-sysv  -mcall-netbsd @gol
> +-mcall-aixdesc  -mcall-eabi  -mcall-freebsd  @gol
> +-mcall-linux  -mcall-netbsd  -mcall-openbsd  @gol
> +-mcall-sysv  -mcall-sysv-eabi  -mcall-sysv-noeabi @gol
> +-mtraceback=@var{traceback_type} @gol
>  -maix-struct-return  -msvr4-struct-return @gol
>  -mabi=@var{abi-type}  -msecure-plt  -mbss-plt @gol
>  -mblock-move-inline-limit=@var{num} @gol

So in xx-type we sometimes use underscores...  I wonder what is preferred.

> @@ -23957,6 +23960,12 @@
>  On System V.4 and embedded PowerPC systems compile code for the
>  OpenBSD operating system.
>  
> +@item -mtraceback=@var{traceback_type}
> +@opindex mtraceback
> +Select the type of traceback table. Valid values for @var{traceback_type}
> +are @samp{full}, @samp{part},
> +and @samp{no}.

Join the last two physical lines?

> --- config/rs6000/sysv4.opt   (revision 259301)
> +++ config/rs6000/sysv4.opt   (working copy)
> @@ -21,7 +21,7 @@
>  
>  mcall-
>  Target RejectNegative Joined Var(rs6000_abi_name)
> -Select ABI calling convention.
> +-mcall=ABI   Select ABI calling convention.

-mcall-ABI  instead.

> --- config/rs6000/rs6000.opt  (revision 259301)
> +++ config/rs6000/rs6000.opt  (working copy)
> @@ -335,15 +335,15 @@
>  
>  mblock-compare-inline-limit=
>  Target Report Var(rs6000_block_compare_inline_limit) Init(31) RejectNegative 
> Joined UInteger Save
> -Specify the maximum number of bytes to compare inline with non-looping code. 
> If this is set to 0, all inline expansion (non-loop and loop) of memcmp is 
> disabled.
> +Specify the maximum number of bytes to compare inline with non-looping code.

You can make it shorter still, so it actually fits on a line in the --help
display, like

"Max number of bytes to compare without loops."

>  mblock-compare-inline-loop-limit=
>  Target Report Var(rs6000_block_compare_inline_loop_limit) Init(-1) 
> RejectNegative Joined UInteger Save
> -Specify the maximum number of bytes to compare inline with loop code 
> generation.  If the length is not known at compile time, memcmp will be 
> called after this many bytes are compared. By default, a length will be 
> picked depending on the tuning target.
> +Specify the maximum number of bytes to compare inline with loop code 
> generation.

"Max number of bytes to compare with loops."

>  mstring-compare-inline-limit=
>  Target Report Var(rs6000_string_compare_inline_limit) Init(8) RejectNegative 
> Joined UInteger Save
> -Specify the maximum number pairs of load instructions that should be 
> generated inline for the compare.  If the number needed exceeds the limit, a 
> call to strncmp will be generated instead.
> +Specify the maximum number pairs of load instructions that should be 
> generated for inline compares.

This one is harder; that's because the actual number you set isn't so
user-friendly.

"Max number of pairs of load insns for compare without loops."


Okay for trunk with the sysv4.opt fix, and whatever polishing you want
to do.  Thanks!


Segher


Re: [PATCH] sched-rgn: run add_branch_dependencies for sel-sched (PR 84301)

2018-04-11 Thread Vladimir Makarov



On 04/11/2018 06:15 AM, Andrey Belevantsev wrote:
> On 10.04.2018 14:09, Alexander Monakov wrote:
>> Hi,
>>
>> The add_branch_dependencies function is fairly unusual in that it creates
>> dependence edges "out of thin air" for all sorts of instructions preceding
>> BB end. I think that is really unfortunate (explicit barriers in RTL would
>> be more natural), but I've already complained about that in the PR.
>>
>> The bug/regression is that this function was not run for sel-sched, but the
>> testcase uncovers that moving a USE away from the return insn can break
>> assumptions in mode-switching code.
>>
>> Solve this by running the first part of add_branch_dependencies where it
>> sets CANT_MOVE flags on immovable non-branch insns.
>>
>> Bootstrapped/regtested on x86_64 with sel-sched active. OK to apply?
>
> Looks fine to me but I cannot approve -- maybe Vladimir can take a look?

The patch is ok for me.  Thanks for working on the PR.


Re: [PATCH] sched-deps: respect deps->readonly in macro-fusion (PR 84566)

2018-04-11 Thread Vladimir Makarov



On 04/11/2018 06:19 AM, Andrey Belevantsev wrote:
> On 10.04.2018 13:40, Alexander Monakov wrote:
>> Hi,
>>
>> this fixes a simple "regression" under the qsort_chk umbrella: sched-deps
>> analysis has deps->readonly flag, but macro-fusion code does not respect it
>> and mutates instructions. This breaks an assumption in sel_rank_for_schedule
>> and manifests as qsort checking error.
>>
>> Since sched_macro_fuse_insns is only called to set SCHED_GROUP_P on suitable
>> insns, guard the call with !deps->readonly.
>>
>> Bootstrapped/regtested on x86_64 with sel-sched active and
>> --with-cpu=sandybridge to exercise macro-fusion code and verified on aarch64
>> cross-compiler that the failing testcase given in the PR is fixed.
>>
>> OK to apply?
>
> Fine with me but you need a scheduler maintainer approval.

Ok with me.  Thanks for the patch.


Re: [PATCH][explow] PR target/85173: validize memory before passing it on to target probe_stack

2018-04-11 Thread Uros Bizjak
On Tue, Apr 10, 2018 at 11:14 AM, Uros Bizjak  wrote:
> On Tue, Apr 10, 2018 at 10:45 AM, Richard Earnshaw (lists)
>  wrote:
>
 Alpha should be fixed -- the docs clearly state that the operand is "the
 memory reference in the stack that needs to be probed".  Just passing in
 the offset seems wrong.
>>>
>>> This pattern has to be renamed to not clash with the standard pattern name.
>>>
>>> I'm testing the attached patch.
>>>
>>
>> Ugh!  Two different APIs, one called gen_stack_probe and one
>> gen_probe_stack?  That has to be a recipe for disaster!
>
> It is just an unfortunately named helper expander. Maybe
> "stack_probe_internal" is indeed a better name.

Now committed with the above change and following ChangeLog entry:

2018-04-11  Uros Bizjak  

* config/alpha/alpha.md (stack_probe_internal): Rename
from "probe_stack".  Update all callers.

Bootstrapped and regression tested on alphaev68-linux-gnu.

Uros.


Re: [doc PATCH] fix up C++ option references (PR 71283)

2018-04-11 Thread Jason Merrill

On 04/05/2018 07:28 PM, Martin Sebor wrote:

Attached is the final version of the patch to adjust the lists
of options (C++ Language Options and -Wall) to include missing
C++ options, reference the forms of options that aren't
the default, and use TexInfo tables for the lists of options
in -Wall and -Wextra to address Nathan's comment.  The patch
also fixes bug 71283.



 -Wnoexcept  -Wnoexcept-type  -Wclass-memaccess @gol

...

+-Wclass-memaccess -Wclobbered  -Wcomment  -Wconditionally-supported @gol


-Wclass-memaccess is already in the C++ options summary, I don't think 
we need to also add it to the diagnostic options summary.



-@item -Wclass-memaccess @r{(C++ and Objective-C++ only)}
+@item -Wclass-memaccess @r{(C++ only)}


You removed "and Objective-C++" only for this option?  Let's either 
change all of them or none.



-@item -Wsubobject-linkage @r{(C++ and Objective-C++ only)}
+@item -Wno-subobject-linkage @r{(C++ and Objective-C++ only)}
 @opindex Wsubobject-linkage
 @opindex Wno-subobject-linkage
 Warn if a class type has a base or a field whose type uses the anonymous



-@item -Wdelete-incomplete @r{(C++ and Objective-C++ only)}
+@item -Wno-delete-incomplete @r{(C++ and Objective-C++ only)}
 @opindex Wdelete-incomplete
 @opindex Wno-delete-incomplete
 Warn when deleting a pointer to incomplete type, which may cause


If you're reversing the sense of the flag, please adjust the 
documentation to match.


Jason


Re: [patch, fortran] Remove parallell annotation from DO CONCURRENT

2018-04-11 Thread Jakub Jelinek
On Tue, Apr 10, 2018 at 11:50:44PM +0200, Thomas Koenig wrote:
> Hi Jakub,
> 
> 
> > The new test FAILs everywhere, gfortran.dg doesn't have infrastructure to
> > run -fopenmp, -fopenacc nor -ftree-parallelize-loops= tests.
> > You need to put such tests into libgomp/testsuite/libgomp.fortran/
> 
> I put the test case in the attached form into the libgomp.fortran
> directory, but it failed execution, without error message.
> 
> Anything I could have done differently?

Avoid using that much stack?  ulimit -s is usually around 8MB on Linux, on
other OSes it can be as low as 2MB or even less, so using 160MB edof array
is way too much.  Also, even if you e.g. allocated it from heap rather than
stack (still for some targets it would be too much), isn't that just too
expensive for the testsuite?
Can you reproduce say even with ne = 20 (with the fix reverted)?
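
The stack-size concern above can be checked programmatically.  What follows is only a minimal sketch in C, assuming a POSIX system that provides getrlimit; the helper names fits_under_stack_limit and int64_array_bytes are hypothetical, and the 20,000,000-element count in the usage note is an assumption chosen because it yields the 160MB figure mentioned above.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/resource.h>

/* Hypothetical helper: size in bytes of an array of COUNT 64-bit
   integers.  The assert guards against overflow of the product.  */
static uint64_t
int64_array_bytes (uint64_t count)
{
  assert (count <= UINT64_MAX / sizeof (int64_t));
  return count * (uint64_t) sizeof (int64_t);
}

/* Hypothetical helper: returns 1 if an automatic (stack) object of
   BYTES bytes is smaller than the current soft stack limit, 0 if it
   is not, and -1 if the limit cannot be queried.  */
static int
fits_under_stack_limit (uint64_t bytes)
{
  struct rlimit rl;

  if (getrlimit (RLIMIT_STACK, &rl) != 0)
    return -1;
  if (rl.rlim_cur == RLIM_INFINITY)
    return 1;
  return bytes < (uint64_t) rl.rlim_cur;
}
```

For instance, int64_array_bytes (20000000) is 160,000,000 bytes, which fits_under_stack_limit would reject under an 8MB soft limit.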

Also, are you going to make the suggested change?  (can_be_parallel
is something only autopar cares about, but annot_expr_parallel_kind,
like annot_expr_ivdep_kind, also sets safelen to INT_MAX):
--- gcc/fortran/trans-stmt.c.jj 2018-04-10 08:52:25.467790554 +0200
+++ gcc/fortran/trans-stmt.c	2018-04-11 17:42:40.670493050 +0200
@@ -3643,12 +3643,13 @@ gfc_trans_forall_loop (forall_info *fora
   cond = fold_build2_loc (input_location, LE_EXPR, logical_type_node,
  count, build_int_cst (TREE_TYPE (count), 0));
 
-  /* PR 83064 means that we cannot use the annotation if the
-autoparallelizer is active.  */
-  if (forall_tmp->do_concurrent && ! flag_tree_parallelize_loops)
+  /* PR 83064 means that we cannot use the annot_expr_parallel_kind
+annotation until autopar is taught to handle local variables
+in loops annotated that way.  */
+  if (forall_tmp->do_concurrent)
cond = build3 (ANNOTATE_EXPR, TREE_TYPE (cond), cond,
   build_int_cst (integer_type_node,
- annot_expr_parallel_kind),
+ annot_expr_ivdep_kind),
   integer_zero_node);
 
   tmp = build1_v (GOTO_EXPR, exit_label);
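
For readers more familiar with C, the closest user-visible counterpart of the ivdep annotation discussed above is, as far as I understand it, "#pragma GCC ivdep".  The following is a minimal sketch under that assumption; the function bump_indexed is hypothetical and is not part of the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: add 1.0 to every element of DATA selected by
   IDX.  The pragma is the programmer's assertion that there are no
   loop-carried dependences, which holds only if IDX contains no
   repeated index.  */
static void
bump_indexed (double *data, const size_t *idx, size_t n)
{
  assert (data != NULL && idx != NULL);
#pragma GCC ivdep
  for (size_t i = 0; i < n; i++)
    data[idx[i]] += 1.0;
}
```

The pragma only permits vectorization; it does not ask the autoparallelizer to split the loop across threads, which is the distinction the suggested change relies on.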


> ! { dg-do run }
> ! PR 83064 - this used to give wrong results.
> ! { dg-additional-options "-O3 -ftree-parallelize-loops=2" }
> ! Original test case by Christian Felter
> 
> program main
> use, intrinsic :: iso_fortran_env
> implicit none
> 
> integer, parameter :: nsplit = 4
> integer(int64), parameter :: ne = 2000
> integer(int64) :: stride, low(nsplit), high(nsplit), edof(ne), i
> real(real64), dimension(nsplit) :: pi

Jakub


Re: [C++ Patch] PR 70808 ("Spurious -Wzero-as-null-pointer-constant for nullptr_t")

2018-04-11 Thread Jakub Jelinek
On Tue, Apr 10, 2018 at 08:57:22PM +0200, Paolo Carlini wrote:
> 2018-04-10  Paolo Carlini  
> 
>   PR c++/70808
>   * g++.dg/warn/Wzero-as-null-pointer-constant-7.C: New.

The testcase FAILs in -std=c++98 mode for obvious reasons.

I've committed following after testing it with
make check-c++-all RUNTESTFLAGS=dg.exp=Wzero-as-null-pointer-constant-7.C

2018-04-11  Jakub Jelinek  

PR c++/70808
* g++.dg/warn/Wzero-as-null-pointer-constant-7.C: Require c++11
effective target.

--- gcc/testsuite/g++.dg/warn/Wzero-as-null-pointer-constant-7.C	(revision 259324)
+++ gcc/testsuite/g++.dg/warn/Wzero-as-null-pointer-constant-7.C	(revision 259325)
@@ -1,4 +1,5 @@
 // PR c++/70808
+// { dg-do compile { target c++11 } }
 // { dg-options "-Wzero-as-null-pointer-constant" }
 
 int* no_warn = {};


Jakub


Re: [PATCH] Use --push-state --as-needed and --pop-state instead of --as-needed and --no-as-needed for libgcc

2018-04-11 Thread Matthias Klose
On 11.04.2018 12:31, Jakub Jelinek wrote:
> Hi!
> 
> As discussed, using --as-needed and --no-as-needed is dangerous, because
> it results in --no-as-needed even for libraries after -lgcc_s, even when the
> default is --as-needed or --as-needed has been specified earlier on the
> command line.
> 
> If the linker supports --push-state/--pop-state, we should IMHO use it.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for stage1?
> 
> Or is this something we want in GCC8 too?

this is problematic for binutils versions with --push-state/--pop-state support
in the BFD linker but not in gold, and then using -fuse-ld=gold.  So maybe the
version check for the BFD linker should only succeed for the first binutils
version which also has --push-state/--pop-state support in gold.

The BFD linker is only able to save exactly one state, and nested --push-state
calls override the state (binutils PR23043).  Otoh, there is not much linked
after libgcc, so maybe this is not an issue.

Matthias


Re: [AARCH64] Neon vld1_*_x3, vst1_*_x2 and vst1_*_x3 intrinsics

2018-04-11 Thread Sudakshina Das

Hi Sameera

On 11/04/18 13:05, Sameera Deshpande wrote:

On 11 April 2018 at 15:53, Sudakshina Das  wrote:

Hi Sameera


On 11/04/18 09:04, Sameera Deshpande wrote:


On 10 April 2018 at 20:07, Sudakshina Das  wrote:


Hi Sameera


On 10/04/18 11:20, Sameera Deshpande wrote:



On 7 April 2018 at 01:25, Christophe Lyon wrote:



Hi,

2018-04-06 12:15 GMT+02:00 Sameera Deshpande:



Hi Christophe,

Please find attached the updated patch with testcases.

Ok for trunk?




Thanks for the update.

Since the new intrinsics are only available on aarch64, you want to
prevent the tests from running on arm.
Indeed gcc.target/aarch64/advsimd-intrinsics/ is shared between the two
targets.
There are several examples on how to do that in that directory.

I have also noticed that the tests fail at execution on aarch64_be.

I didn't look at the patch in details.

Christophe




- Thanks and regards,
 Sameera D.

2017-12-14 22:17 GMT+05:30 Christophe Lyon:



2017-12-14 9:29 GMT+01:00 Sameera Deshpande:



Hi!

Please find attached the patch implementing vld1_*_x3, vst1_*_x2 and
vst1_*_x3 intrinsics as defined by Neon document.

Ok for trunk?

- Thanks and regards,
 Sameera D.

gcc/Changelog:

2017-11-14  Sameera Deshpande  


   * config/aarch64/aarch64-simd-builtins.def (ld1x3): New.
   (st1x2): Likewise.
   (st1x3): Likewise.
   * config/aarch64/aarch64-simd.md
(aarch64_ld1x3): New pattern.
   (aarch64_ld1_x3_): Likewise
   (aarch64_st1x2): Likewise
   (aarch64_st1_x2_): Likewise
   (aarch64_st1x3): Likewise
   (aarch64_st1_x3_): Likewise
   * config/aarch64/arm_neon.h (vld1_u8_x3): New function.
   (vld1_s8_x3): Likewise.
   (vld1_u16_x3): Likewise.
   (vld1_s16_x3): Likewise.
   (vld1_u32_x3): Likewise.
   (vld1_s32_x3): Likewise.
   (vld1_u64_x3): Likewise.
   (vld1_s64_x3): Likewise.
   (vld1_fp16_x3): Likewise.
   (vld1_f32_x3): Likewise.
   (vld1_f64_x3): Likewise.
   (vld1_p8_x3): Likewise.
   (vld1_p16_x3): Likewise.
   (vld1_p64_x3): Likewise.
   (vld1q_u8_x3): Likewise.
   (vld1q_s8_x3): Likewise.
   (vld1q_u16_x3): Likewise.
   (vld1q_s16_x3): Likewise.
   (vld1q_u32_x3): Likewise.
   (vld1q_s32_x3): Likewise.
   (vld1q_u64_x3): Likewise.
   (vld1q_s64_x3): Likewise.
   (vld1q_f16_x3): Likewise.
   (vld1q_f32_x3): Likewise.
   (vld1q_f64_x3): Likewise.
   (vld1q_p8_x3): Likewise.
   (vld1q_p16_x3): Likewise.
   (vld1q_p64_x3): Likewise.
   (vst1_s64_x2): Likewise.
   (vst1_u64_x2): Likewise.
   (vst1_f64_x2): Likewise.
   (vst1_s8_x2): Likewise.
   (vst1_p8_x2): Likewise.
   (vst1_s16_x2): Likewise.
   (vst1_p16_x2): Likewise.
   (vst1_s32_x2): Likewise.
   (vst1_u8_x2): Likewise.
   (vst1_u16_x2): Likewise.
   (vst1_u32_x2): Likewise.
   (vst1_f16_x2): Likewise.
   (vst1_f32_x2): Likewise.
   (vst1_p64_x2): Likewise.
   (vst1q_s8_x2): Likewise.
   (vst1q_p8_x2): Likewise.
   (vst1q_s16_x2): Likewise.
   (vst1q_p16_x2): Likewise.
   (vst1q_s32_x2): Likewise.
   (vst1q_s64_x2): Likewise.
   (vst1q_u8_x2): Likewise.
   (vst1q_u16_x2): Likewise.
   (vst1q_u32_x2): Likewise.
   (vst1q_u64_x2): Likewise.
   (vst1q_f16_x2): Likewise.
   (vst1q_f32_x2): Likewise.
   (vst1q_f64_x2): Likewise.
   (vst1q_p64_x2): Likewise.
   (vst1_s64_x3): Likewise.
   (vst1_u64_x3): Likewise.
   (vst1_f64_x3): Likewise.
   (vst1_s8_x3): Likewise.
   (vst1_p8_x3): Likewise.
   (vst1_s16_x3): Likewise.
   (vst1_p16_x3): Likewise.
   (vst1_s32_x3): Likewise.
   (vst1_u8_x3): Likewise.
   (vst1_u16_x3): Likewise.
   (vst1_u32_x3): Likewise.
   (vst1_f16_x3): Likewise.
   (vst1_f32_x3): Likewise.
   (vst1_p64_x3): Likewise.
   (vst1q_s8_x3): Likewise.
   (vst1q_p8_x3): Likewise.
   (vst1q_s16_x3): Likewise.
   (vst1q_p16_x3): Likewise.
   (vst1q_s32_x3): Likewise.
   (vst1q_s64_x3): Likewise.
   (vst1q_u8_x3): Likewise.
   (vst1q_u16_x3): Likewise.
   (vst1q_u32_x3): Likewise.
   (vst1q_u64_x3): Likewise.
   (vst1q_f16_x3): Likewise.
   (vst1q_f32_x3): Likewise.
   (vst1q_f64_x3): Likewise.
   (vst1q_p64_x3): Likewise.




Hi,
I'm not a maintainer, but I suspect you should add some tests.

Christophe






--
- Thanks and re

Re: [PATCH] Fix PR85339, bogus early debug

2018-04-11 Thread Jakub Jelinek
On Wed, Apr 11, 2018 at 01:09:39PM +0200, Richard Biener wrote:
> 2018-04-11  Richard Biener  
> 
>   PR lto/85339
>   * dwarf2out.c (dwarf2out_finish): Remove DW_AT_stmt_list attribute
>   from early DWARF output.
>   (dwarf2out_early_finish): Output line info unconditionally into
>   early DWARF and add reference to it.
> 
> +  if (debug_info_level >= DINFO_LEVEL_TERSE)
> +add_AT_lineptr (comp_unit_die (), DW_AT_stmt_list,
> + dl_section_ref);

No reason to wrap the above line, it fits nicely on one line.

Otherwise LGTM.

Jakub


Re: [PATCH] [PR c++/85039] no type definitions in builtin offsetof

2018-04-11 Thread Jason Merrill
On Thu, Apr 5, 2018 at 9:33 AM, Jason Merrill  wrote:
> On Wed, Apr 4, 2018 at 12:25 PM, Alexandre Oliva  wrote:
>> On Apr  4, 2018, Jason Merrill  wrote:
>>
>>> On Tue, Apr 3, 2018 at 11:25 PM, Alexandre Oliva  wrote:
>>>> I still think we could attempt to retain the extension as it is, parsing
>>>> types introduced in data member initializers within the scope of the
>>>> class containing the data member, like we do, but, when the class is
>>>> already complete, recording it if not in TYPE_FIELDS, in some additional
>>>> field consulted for name mangling purposes and, in order to retain
>>>> compatibility, if the type is not a closure or anonymous, also recording
>>>> it in the enclosing namespace, so that it is found by lookup as in the
>>>> quoted snippet.
>>>>
>>>> Is that a terrible idea?
>>
>>> It sounds like a lot of work to support a very questionable pattern.
>>
>> It's not so much work (the simple patch below does just that, and its
>> testing is almost done); I agree it's questionable, and it's limited (it
>> never worked in initializers of members of template classes, as the -4
>> testcase, so we don't have to worry about retaining temporary
>> compatibility with that), but it's there, so I think we'd be better off
>> deprecating it, if that's the direction we want to go.  The patch below
>> has just the right spot for a deprecation warning, even ;-)
>>
>> We could recommend users to use a closure that returns the offsetof
>> instead of the unadorned offsetof.  That would work portably, but we
>> shouldn't make the transformation ourselves: it would change the
>> ABI-defined numbering of closure types.
>>
>>> Perhaps we should just disallow defining a type in offsetof if the
>>> current scope is a class.
>>
>> Even anonymous types?  I suspect this could break a lot of existing
>> code, with anonymous types hiding in macros.
>
> It seems unlikely to me that such a use of macros would occur at class
> scope; there's no C compatibility issue there.

I raised this issue with the C++ committee, and it seems that nobody
expects defining a type here to work.  So let's go back to your first
patch, removing the offending part of semicolon3.C.

Jason


Re: [doc PATCH] fix up C++ option references (PR 71283)

2018-04-11 Thread Martin Sebor

On 04/11/2018 09:44 AM, Jason Merrill wrote:

On 04/05/2018 07:28 PM, Martin Sebor wrote:

Attached is the final version of the patch to adjust the lists
of options (C++ Language Options and -Wall) to include missing
C++ options, reference the forms of options that aren't
the default, and use TexInfo tables for the lists of options
in -Wall and -Wextra to address Nathan's comment.  The patch
also fixes bug 71283.



 -Wnoexcept  -Wnoexcept-type  -Wclass-memaccess @gol

...

+-Wclass-memaccess -Wclobbered  -Wcomment  -Wconditionally-supported @gol


-Wclass-memaccess is already in the C++ options summary, I don't think
we need to also add it to the diagnostic options summary.


Some of these C++-only options are listed in 3.8 Options to
Request or Suppress Warnings which the See Options to Request
or Suppress Warnings link under Warning Options points to.

I would expect all the warning options mentioned anywhere in
3.8 to be listed in the Warning Options summary.  That would
include all C++-only options in -Wall and -Wextra but not
other C++-only options (at least not for now).  That's what
I'm aiming for with the patch; I may have missed some.

It seems to me that the most intuitive organization might
actually be to list all warning options in the Warning Summary
section, even if some of them are specific to just a subset of
languages and also listed in language-specific sections.  (At
least for the C family.)

I often have 3.8 Options to Request or Suppress Warnings open
in my browser and use it to search for all warning options.
I find it inconvenient (and prone to error) to have to remember
to also open 3.5 Options Controlling C++ Dialect to look for
C++-only options that aren't listed in 3.5.

Does the approach sound like an improvement to you?


-@item -Wclass-memaccess @r{(C++ and Objective-C++ only)}
+@item -Wclass-memaccess @r{(C++ only)}


You removed "and Objective-C++" only for this option?  Let's either
change all of them or none.


I think that was a mistake.  The option is valid for ObjC++
so let me put it back.


-@item -Wsubobject-linkage @r{(C++ and Objective-C++ only)}
+@item -Wno-subobject-linkage @r{(C++ and Objective-C++ only)}
 @opindex Wsubobject-linkage
 @opindex Wno-subobject-linkage
 Warn if a class type has a base or a field whose type uses the anonymous



-@item -Wdelete-incomplete @r{(C++ and Objective-C++ only)}
+@item -Wno-delete-incomplete @r{(C++ and Objective-C++ only)}
 @opindex Wdelete-incomplete
 @opindex Wno-delete-incomplete
 Warn when deleting a pointer to incomplete type, which may cause


If you're reversing the sense of the flag, please adjust the
documentation to match.


Not sure I understand what part you think needs adjusting.
I changed it to -Wno- to reflect that the option is enabled
by default.  Can you elaborate?

Martin


Re: RFC: Disable asan tests under ulimit -v

2018-04-11 Thread Jason Merrill
On Tue, Apr 3, 2018 at 1:23 PM, Jason Merrill  wrote:
> On Tue, Apr 3, 2018 at 12:56 PM, Jason Merrill  wrote:
>> On Mon, Mar 26, 2018 at 4:01 PM, Jason Merrill  wrote:
>>>
>>> On Mon, Mar 26, 2018 at 2:55 PM, Andreas Schwab 
>>> wrote:
>>> > On Mär 26 2018, Jakub Jelinek  wrote:
>>> >> On Mon, Mar 26, 2018 at 08:33:41PM +0200, Andreas Schwab wrote:
>>> >>> On Mär 26 2018, Jason Merrill  wrote:
>>> >>>
>>> >>> > if [catch {exec sh ulimit -v} ulimit_v] {
>>> >>>
>>> >>> expect1.1> exec sh ulimit -v
>>> >>> sh: ulimit: No such file or directory
>>> >>> while executing
>>> >>> "exec sh ulimit -v"
>>> >>
>>> >> Perhaps
>>> >>   if [catch {exec sh -c ulimit -v} ulimit_v] {
>>> >
>>> > expect1.1> exec sh -c ulimit -v
>>> > unlimited
>>> > expect1.2> exec sh -c {ulimit -v}
>>> > 4194304
>>>
>>> OK, so
>>>
>>> if ![is_remote target] {
>>> if [catch {exec sh -c "ulimit -v"} ulimit_v] {
>>> # failed to get ulimit
>>> } elseif [regexp {^[0-9]+$} $ulimit_v] {
>>> # ulimit -v gave a numeric limit
>>> return
>>> }
>>> }
>>
>> This version adds a warning.  OK for trunk?

And this one puts the check in asan_init rather than its users.  OK?
commit 8b2e4c11607171426da477d6a81225e333c0b735
Author: Jason Merrill 
Date:   Fri Mar 23 11:14:50 2018 -0400

* lib/asan-dg.exp (asan_init): Don't run tests if ulimit -v is set.

diff --git a/gcc/testsuite/lib/asan-dg.exp b/gcc/testsuite/lib/asan-dg.exp
index 25f1de45879..11a96ad000a 100644
--- a/gcc/testsuite/lib/asan-dg.exp
+++ b/gcc/testsuite/lib/asan-dg.exp
@@ -89,6 +89,17 @@ proc asan_init { args } {
 global asan_saved_TEST_ALWAYS_FLAGS
 global asan_saved_ALWAYS_CXXFLAGS
 
+# asan doesn't work if there's a ulimit on virtual memory.
+if ![is_remote target] {
+	if [catch {exec sh -c "ulimit -v"} ulimit_v] {
+	# failed to get ulimit
+	} elseif [regexp {^[0-9]+$} $ulimit_v] {
+	# ulimit -v gave a numeric limit
+	warning "skipping asan tests due to ulimit -v"
+	return -code return
+	}
+}
+
 set link_flags ""
 if ![is_remote host] {
 	if [info exists TOOL_OPTIONS] {


Re: [C++ Patch] PR 70808 ("Spurious -Wzero-as-null-pointer-constant for nullptr_t")

2018-04-11 Thread Paolo Carlini

Hi,

On 11/04/2018 17:57, Jakub Jelinek wrote:

On Tue, Apr 10, 2018 at 08:57:22PM +0200, Paolo Carlini wrote:

2018-04-10  Paolo Carlini  

PR c++/70808
* g++.dg/warn/Wzero-as-null-pointer-constant-7.C: New.

The testcase FAILs in -std=c++98 mode for obvious reasons.

Indeed, thanks Jakub and sorry about the stupid mistake. I'm moving the
new testcase to g++.dg/cpp0x/Wzero-as-null-pointer-constant-3.C, more 
consistent with how we placed these testcases in the past.


Paolo.


Re: RFC: C++ PATCH for c++/69763, making C++ alignof match C _Alignof

2018-04-11 Thread Jason Merrill
On Tue, Apr 10, 2018 at 1:12 PM, Jason Merrill  wrote:
> But really this is beside the point: the x86 ABI says that the
> alignment of double is 4, so alignof(double) should be 4 regardless of
> what GCC wants to do internally.  And I think the same is true of
> __alignof__.

Particularly since the description of __alignof__ has been clarified:

"Some machines never actually require alignment; they allow reference
to any data type even at an odd address.  For these machines,
'__alignof__' reports the smallest alignment that GCC gives the data
type, usually as mandated by the target ABI."

To that end, here's a patch that makes __alignof__ agree with
_Alignof, fixing bug 10360:
commit 3528a49567a23c0fab5916119e8c4f2dcddd8f3c
Author: Jason Merrill 
Date:   Wed Apr 11 11:09:38 2018 -0400

PR c/10360 - __alignof__(double) on x86.

PR c++/69763 - wrong alignof(double) on x86.
gcc/c-family/
* c-common.c (c_sizeof_or_alignof_type): Always return minimum
alignment.
* c-common.h: Remove parameter from prototype.
(c_sizeof, c_alignof): Remove argument.
gcc/c/
* c-parser.c (c_parser_alignas_specifier)
(c_parser_alignof_expression): Drop min_align argument.
gcc/cp/
* typeck.c (cxx_sizeof_or_alignof_type): Drop min_align argument.

diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c
index 7e6905e791e..5da030916cb 100644
--- a/gcc/c-family/c-common.c
+++ b/gcc/c-family/c-common.c
@@ -3570,7 +3570,7 @@ c_common_get_alias_set (tree t)
 
 tree
 c_sizeof_or_alignof_type (location_t loc,
-			  tree type, bool is_sizeof, bool min_alignof,
+			  tree type, bool is_sizeof,
 			  int complain)
 {
   const char *op_name;
@@ -3637,10 +3637,8 @@ c_sizeof_or_alignof_type (location_t loc,
 	value = size_binop_loc (loc, CEIL_DIV_EXPR, TYPE_SIZE_UNIT (type),
 size_int (TYPE_PRECISION (char_type_node)
 	  / BITS_PER_UNIT));
-  else if (min_alignof)
+  else
 	value = size_int (min_align_of_type (type));
-  else
-	value = size_int (TYPE_ALIGN_UNIT (type));
 }
 
   /* VALUE will have the middle-end integer type sizetype.
diff --git a/gcc/c-family/c-common.h b/gcc/c-family/c-common.h
index 6cf7614f682..db04ce6cf6a 100644
--- a/gcc/c-family/c-common.h
+++ b/gcc/c-family/c-common.h
@@ -827,7 +827,7 @@ extern tree c_fully_fold (tree, bool, bool *, bool = false);
 extern tree c_wrap_maybe_const (tree, bool);
 extern tree c_common_truthvalue_conversion (location_t, tree);
 extern void c_apply_type_quals_to_decl (int, tree);
-extern tree c_sizeof_or_alignof_type (location_t, tree, bool, bool, int);
+extern tree c_sizeof_or_alignof_type (location_t, tree, bool, int);
 extern tree c_alignof_expr (location_t, tree);
 /* Print an error message for invalid operands to arith operation CODE.
NOP_EXPR is used as a special case (see truthvalue_conversion).  */
@@ -854,8 +854,8 @@ extern tree fold_for_warn (tree);
 extern tree c_common_get_narrower (tree, int *);
 extern bool get_nonnull_operand (tree, unsigned HOST_WIDE_INT *);
 
-#define c_sizeof(LOC, T)  c_sizeof_or_alignof_type (LOC, T, true, false, 1)
-#define c_alignof(LOC, T) c_sizeof_or_alignof_type (LOC, T, false, false, 1)
+#define c_sizeof(LOC, T)  c_sizeof_or_alignof_type (LOC, T, true, 1)
+#define c_alignof(LOC, T) c_sizeof_or_alignof_type (LOC, T, false, 1)
 
 /* Subroutine of build_binary_op, used for certain operations.  */
 extern tree shorten_binary_op (tree result_type, tree op0, tree op1, bool bitwise);
diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
index 47720861d3f..53e120071d9 100644
--- a/gcc/c/c-parser.c
+++ b/gcc/c/c-parser.c
@@ -3513,7 +3513,7 @@ c_parser_alignas_specifier (c_parser * parser)
   struct c_type_name *type = c_parser_type_name (parser);
   if (type != NULL)
 	ret = c_sizeof_or_alignof_type (loc, groktypename (type, NULL, NULL),
-	false, true, 1);
+	false, 1);
 }
   else
 ret = c_parser_expr_no_commas (parser, NULL).value;
@@ -7406,7 +7406,7 @@ c_parser_alignof_expression (c_parser *parser)
   in_alignof--;
   ret.value = c_sizeof_or_alignof_type (loc, groktypename (type_name,
 			   NULL, NULL),
-	false, is_c11_alignof, 1);
+	false, 1);
   ret.original_code = ERROR_MARK;
   ret.original_type = NULL;
   set_c_expr_source_range (&ret, start_loc, end_loc);
diff --git a/gcc/cp/typeck.c b/gcc/cp/typeck.c
index 1e6996cde09..bf71d7458a8 100644
--- a/gcc/cp/typeck.c
+++ b/gcc/cp/typeck.c
@@ -1641,8 +1641,7 @@ cxx_sizeof_or_alignof_type (tree type, enum tree_code op, bool complain)
 }
 
   return c_sizeof_or_alignof_type (input_location, complete_type (type),
-   op == SIZEOF_EXPR, true,
-   complain);
+   op == SIZEOF_EXPR, complain);
 }
 
 /* Return the size of the type, without producing any warnings for
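
The difference between the ABI-mandated alignment and a larger preferred alignment, which this message is about, can be observed from user code.  The following is a minimal sketch in C; struct align_probe and the three helper functions are hypothetical names introduced only for illustration, and the sketch assumes a compiler that accepts both C11 _Alignof and the GNU __alignof__ extension.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical probe type: the offset of D is the alignment the ABI
   requires for a double that is a struct member.  */
struct align_probe
{
  char c;
  double d;
};

/* Alignment of double as laid out inside a struct.  */
static size_t
member_align_of_double (void)
{
  return offsetof (struct align_probe, d);
}

/* Alignment reported by the C11 operator.  */
static size_t
c11_align_of_double (void)
{
  return _Alignof (double);
}

/* Alignment reported by the GNU extension.  */
static size_t
gnu_align_of_double (void)
{
  /* The extension should never report less than the C11 operator.  */
  assert (__alignof__ (double) >= _Alignof (double));
  return __alignof__ (double);
}
```

On a target where the two operators disagree, gnu_align_of_double returns the larger value while the other two helpers return the same, smaller one.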


Re: [patch, fortran] Remove parallell annotation from DO CONCURRENT

2018-04-11 Thread Thomas Koenig

On 11.04.2018 at 17:44, Jakub Jelinek wrote:

On Tue, Apr 10, 2018 at 11:50:44PM +0200, Thomas Koenig wrote:

Hi Jakub,



The new test FAILs everywhere, gfortran.dg doesn't have infrastructure to
run -fopenmp, -fopenacc nor -ftree-parallelize-loops= tests.
You need to put such tests into libgomp/testsuite/libgomp.fortran/


I put the test case in the attached form into the libgomp.fortran
directory, but it failed execution, without error message.

Anything I could have done differently?


Avoid using that much stack?


Well, I don't think stack use is excessive :-)

$ gfortran -S -Ofast do_concurrent_5.f90
$ fgrep ', %rsp' do_concurrent_5.s
subq	$136, %rsp
addq	$136, %rsp

I do see your point about total memory consumption, though.

Computation time of the test case I committed is around 1 s, which was
also not too bad.

I have attached updated patch which moves the test case to
gfortran.dg/gomp (where it actually passes).

Also, the patch below implements the suggestion of using
annot_expr_ivdep_kind.

OK for trunk?

Regards

Thomas

2018-04-11  Thomas Koenig  

PR fortran/83064
PR testsuite/85346
* trans-stmt.c (gfc_trans_forall_loop): Use annot_expr_ivdep_kind
for annotation and remove dependence on -ftree-parallelize-loops.

2018-04-11  Thomas Koenig  

PR fortran/83064
PR testsuite/85346
* gfortran.dg/do_concurrent_5.f90: Reduce memory consumption and
move test to
* gfortran.dg/gomp/do_concurrent_5.f90: New location.
* gfortran.dg/do_concurrent_6.f90: New test.
Index: fortran/trans-stmt.c
===
--- fortran/trans-stmt.c	(revision 259326)
+++ fortran/trans-stmt.c	(working copy)
@@ -3643,12 +3643,12 @@
   cond = fold_build2_loc (input_location, LE_EXPR, logical_type_node,
 			  count, build_int_cst (TREE_TYPE (count), 0));
 
-  /* PR 83064 means that we cannot use the annotation if the
-	 autoparallelizer is active.  */
-  if (forall_tmp->do_concurrent && ! flag_tree_parallelize_loops)
+  /* PR 83064 means that we cannot use annot_expr_parallel_kind until
+   the autoparallelizer can handle this.  */
+  if (forall_tmp->do_concurrent)
 	cond = build3 (ANNOTATE_EXPR, TREE_TYPE (cond), cond,
 		   build_int_cst (integer_type_node,
-  annot_expr_parallel_kind),
+  annot_expr_ivdep_kind),
 		   integer_zero_node);
 
   tmp = build1_v (GOTO_EXPR, exit_label);
Index: testsuite/gfortran.dg/do_concurrent_5.f90
===
--- testsuite/gfortran.dg/do_concurrent_5.f90	(revision 259258)
+++ testsuite/gfortran.dg/do_concurrent_5.f90	(nonexistent)
@@ -1,70 +0,0 @@
-! { dg-do  run }
-! PR 83064 - this used to give wrong results.
-! { dg-options "-O3 -ftree-parallelize-loops=2" }
-! Original test case by Christian Felter
-
-program main
-use, intrinsic :: iso_fortran_env
-implicit none
-
-integer, parameter :: nsplit = 4
-integer(int64), parameter :: ne = 2000
-integer(int64) :: stride, low(nsplit), high(nsplit), edof(ne), i
-real(real64), dimension(nsplit) :: pi
-
-edof(1::4) = 1
-edof(2::4) = 2
-edof(3::4) = 3
-edof(4::4) = 4
-
-stride = ceiling(real(ne)/nsplit)
-do i = 1, nsplit
-high(i) = stride*i
-end do
-do i = 2, nsplit
-low(i) = high(i-1) + 1
-end do
-low(1) = 1
-high(nsplit) = ne
-
-pi = 0
-do concurrent (i = 1:nsplit)
-pi(i) = sum(compute( low(i), high(i) ))
-end do
-if (abs (sum(pi) - atan(1.0d0)) > 1e-5) call abort
-
-contains
-
-pure function compute( low, high ) result( ttt )
-integer(int64), intent(in) :: low, high
-real(real64), dimension(nsplit) :: ttt
-integer(int64) :: j, k
-
-ttt = 0
-
-! Unrolled loop
-! do j = low, high, 4
-! k = 1
-! ttt(k) = ttt(k) + (-1)**(j+1) / real( 2*j-1 )
-! k = 2
-! ttt(k) = ttt(k) + (-1)**(j+2) / real( 2*j+1 )
-! k = 3
-! ttt(k) = ttt(k) + (-1)**(j+3) / real( 2*j+3 )
-! k = 4
-! ttt(k) = ttt(k) + (-1)**(j+4) / real( 2*j+5 )
-! end do
-
-! Loop with modulo operation
-! do j = low, high
-! k = mod( j, nsplit ) + 1
-! ttt(k) = ttt(k) + (-1)**(j+1) / real( 2*j-1 )
-! end do
-
-! Loop with subscripting via host association
-do j = low, high
-k = edof(j)
-ttt(k) = ttt(k) + (-1.0_real64)**(j+1) / real( 2*j-1 )
-end do
-end function
-
-end program main
Index: testsuite/gfortran.dg/do_concurrent_

Re: RFC: Disable asan tests under ulimit -v

2018-04-11 Thread Jason Merrill
On Wed, Apr 11, 2018 at 2:07 PM, Jakub Jelinek  wrote:
> On Wed, Apr 11, 2018 at 01:59:40PM -0400, Jason Merrill wrote:
>> And this one puts the check in asan_init rather than its users.  OK?
>
> tsan tests have the same problem.

Hmm, tsan tests work fine for me under ulimit -v.

> Wouldn't it be better to have a helper procedure for this and use it next to
> if [check_effective_target_fsanitize_address]

Ah, of course, that's where it belongs.  Done below.

> What exactly does return -code return?

It forces the caller to return as well.
commit f980f806f93982ba54390d45ac5ccc8b350b160c
Author: Jason Merrill 
Date:   Fri Mar 23 11:14:50 2018 -0400

* lib/asan-dg.exp: Don't run tests if ulimit -v is set.

diff --git a/gcc/testsuite/lib/asan-dg.exp b/gcc/testsuite/lib/asan-dg.exp
index 25f1de45879..39451b98a60 100644
--- a/gcc/testsuite/lib/asan-dg.exp
+++ b/gcc/testsuite/lib/asan-dg.exp
@@ -18,9 +18,24 @@
 # code, 0 otherwise.
 
 proc check_effective_target_fsanitize_address {} {
-return [check_no_compiler_messages fsanitize_address executable {
+if ![check_no_compiler_messages fsanitize_address executable {
 	int main (void) { return 0; }
-}]
+}] {
+	return 0;
+}
+
+# asan doesn't work if there's a ulimit on virtual memory.
+if ![is_remote target] {
+	if [catch {exec sh -c "ulimit -v"} ulimit_v] {
+	# failed to get ulimit
+	} elseif [regexp {^[0-9]+$} $ulimit_v] {
+	# ulimit -v gave a numeric limit
+	warning "skipping asan tests due to ulimit -v"
+	return 0;
+	}
+}
+
+return 1;
 }
 
 proc asan_include_flags {} {
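
The same check that the Tcl code above performs through "ulimit -v" can be expressed in C.  This is only a minimal sketch, assuming a POSIX system where the shell's "ulimit -v" corresponds to RLIMIT_AS; the names virtual_memory_is_limited and should_run_asan_tests are hypothetical and do not exist in the testsuite.

```c
#include <assert.h>
#include <sys/resource.h>

/* Hypothetical helper: returns 1 if the soft address-space limit is
   finite, 0 if it is unlimited, and -1 if it cannot be queried.  */
static int
virtual_memory_is_limited (void)
{
  struct rlimit rl;

  if (getrlimit (RLIMIT_AS, &rl) != 0)
    return -1;
  return rl.rlim_cur != RLIM_INFINITY;
}

/* Hypothetical helper mirroring the Tcl logic: skip the tests only
   when a numeric limit is known to be in force; a failed query does
   not cause a skip.  */
static int
should_run_asan_tests (void)
{
  int limited = virtual_memory_is_limited ();

  assert (limited >= -1 && limited <= 1);
  return limited != 1;
}
```

As in the Tcl version, failing to obtain the limit is treated the same as having no limit.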


Re: [patch, fortran] Remove parallell annotation from DO CONCURRENT

2018-04-11 Thread Jakub Jelinek
On Wed, Apr 11, 2018 at 08:18:35PM +0200, Thomas Koenig wrote:
> On 11.04.2018 at 17:44, Jakub Jelinek wrote:
> > On Tue, Apr 10, 2018 at 11:50:44PM +0200, Thomas Koenig wrote:
> > > Hi Jakub,
> > > 
> > > 
> > > > The new test FAILs everywhere, gfortran.dg doesn't have infrastructure 
> > > > to
> > > > run -fopenmp, -fopenacc nor -ftree-parallelize-loops= tests.
> > > > You need to put such tests into libgomp/testsuite/libgomp.fortran/
> > > 
> > > I put the test case in the attached form into the libgomp.fortran
> > > directory, but it failed execution, without error message.
> > > 
> > > Anything I could have done differently?
> > 
> > Avoid using that much stack?
> 
> Well, I don't think stack use is excessive :-)
> 
> $ gfortran -S -Ofast do_concurrent_5.f90
> $ fgrep ', %rsp' do_concurrent_5.s
> subq	$136, %rsp
> addq	$136, %rsp

The test is not compiled with those options in the testsuite though, but
with -fopenmp -O0 -O3 -ftree-parallelize-loops=2 to select the important
ones.  And with these options
grep ', %rsp' do_concurrent_5.s  | sort -u
addq	$16176, %rsp
addq	$8, %rsp
subq	$16176, %rsp
subq	$8, %rsp
-fopenmp is added in the default flags and implies -frecursive and thus
-fautomatic.  You could add -fno-openmp to dg-additional-options if it is
ok for the large vars to be static.  Another thing which can be seen from
the above "-O0 -O3" is that libgomp.fortran/ tests cycle through optimization
options; if you want -O3 only, then better just dg-skip-if if it isn't -O3,
instead of effectively running the test with -O3 six or however many times.
Or if you want to test all optimization levels, take -O3 out of the
dg-additional-options.

> I have attached updated patch which moves the test case to
> gfortran.dg/gomp (where it actually passes).

How could it pass there?  dg-do run tests don't belong into g*.dg/gomp/,
nothing adds the -B etc. options needed to find libgomp.spec or libgomp
as a library, or adds it to LD_LIBRARY_PATH etc.
There are zero dg-do run tests in gfortran.dg/gomp/, there are 4
dg-do run tests in c-c++-common/gomp/, but those work fine because they
use -fopenmp-simd option rather than
-fopenmp/-fopenacc/-ftree-parallelize-loops= etc.

Jakub


Re: RFC: Disable asan tests under ulimit -v

2018-04-11 Thread Jakub Jelinek
On Wed, Apr 11, 2018 at 01:59:40PM -0400, Jason Merrill wrote:
> And this one puts the check in asan_init rather than its users.  OK?

tsan tests have the same problem.  Wouldn't it be better to have a helper
procedure for this and use it next to
if [check_effective_target_fsanitize_address] {
or fsanitize_thread
in g*.dg/[at]san/[at]san.exp?
What exactly does return -code return?  asan_init is invoked after dg-init
and so I'd be afraid dg-finish which should be done will not be invoked.

> commit 8b2e4c11607171426da477d6a81225e333c0b735
> Author: Jason Merrill 
> Date:   Fri Mar 23 11:14:50 2018 -0400
> 
> * lib/asan-dg.exp (asan_init): Don't run tests if ulimit -v is 
> set.
> 
> diff --git a/gcc/testsuite/lib/asan-dg.exp b/gcc/testsuite/lib/asan-dg.exp
> index 25f1de45879..11a96ad000a 100644
> --- a/gcc/testsuite/lib/asan-dg.exp
> +++ b/gcc/testsuite/lib/asan-dg.exp
> @@ -89,6 +89,17 @@ proc asan_init { args } {
>  global asan_saved_TEST_ALWAYS_FLAGS
>  global asan_saved_ALWAYS_CXXFLAGS
>  
> +# asan doesn't work if there's a ulimit on virtual memory.
> +if ![is_remote target] {
> + if [catch {exec sh -c "ulimit -v"} ulimit_v] {
> + # failed to get ulimit
> + } elseif [regexp {^[0-9]+$} $ulimit_v] {
> + # ulimit -v gave a numeric limit
> + warning "skipping asan tests due to ulimit -v"
> + return -code return
> + }
> +}
> +
>  set link_flags ""
>  if ![is_remote host] {
>   if [info exists TOOL_OPTIONS] {


Jakub


[PATCH] Fix non-AVX512VL handling of lo extraction from AVX512F xmm16+ (PR target/85328, take 2)

2018-04-11 Thread Jakub Jelinek
On Wed, Apr 11, 2018 at 03:27:28PM +0200, Jakub Jelinek wrote:
> In lots of patterns we assume that we never see xmm16+ hard registers
> with 128-bit and 256-bit vector modes when not -mavx512vl, because
> HARD_REGNO_MODE_OK refuses those.
> Unfortunately, as this testcase and patch shows, the vec_extract_lo*
> splitters work as a loophole around this, we happily create instructions
> like (set (reg:V32QI xmm5) (reg:V32QI xmm16)) and then hard register
> propagation can propagate the V32QI xmm16 into other insns like vpand.
> 
> The following patch fixes it by making sure we never create such registers,
> just emit (set (reg:V64QI xmm5) (reg:V64QI xmm16)) instead, which by copying
> all the 512 bits also copies the low bits, and as the destination is
> originally V32QI which is not HARD_REGNO_MODE_OK in xmm16+, this should be
> fine.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Actually, thinking about it more (not that I have managed to come up with a
testcase), if output is a MEM and input is xmm16+, then we really need to
give up in the splitters and instead emit the v*extract* instructions,
because simple vmovdqa and vmovap[sd] require AVX512VL for the EVEX
encodings.

So, here is an updated patch, bootstrapped/regtested on x86_64-linux and
i686-linux, is this one ok for trunk instead?

Tried e.g.
#include 

__m256d f1 (__m512d x) { register __m512d a __asm ("zmm16"); __asm ("" : "=v" (a) : "0" (x)); return _mm512_extractf64x4_pd (a, 0); }
void f2 (__m256d *p, __m512d x) { register __m512d a __asm ("zmm16"); __asm ("" : "=v" (a) : "0" (x)); *p = _mm512_extractf64x4_pd (a, 0); }
__m256d f3 (__m512d x, __m256d y) { register __m512d a __asm ("zmm16"); __asm ("" : "=v" (a) : "0" (x)); return y + _mm512_extractf64x4_pd (a, 0); }
__m128 f4 (__m512 x) { register __m512 a __asm ("zmm16"); __asm ("" : "=v" (a) : "0" (x)); return _mm512_extractf32x4_ps (a, 0); }
void f5 (__m128 *p, __m512 x) { register __m512 a __asm ("zmm16"); __asm ("" : "=v" (a) : "0" (x)); *p = _mm512_extractf32x4_ps (a, 0); }
__m128 f6 (__m512 x, __m128 y) { register __m512 a __asm ("zmm16"); __asm ("" : "=v" (a) : "0" (x)); return y + _mm512_extractf32x4_ps (a, 0); }
__m256i f7 (__m512i x) { register __m512i a __asm ("zmm16"); __asm ("" : "=v" (a) : "0" (x)); return _mm512_extracti64x4_epi64 (a, 0); }
void f8 (__m256i *p, __m512i x) { register __m512i a __asm ("zmm16"); __asm ("" : "=v" (a) : "0" (x)); *p = _mm512_extracti64x4_epi64 (a, 0); }
__m256i f9 (__m512i x, __m256i y) { register __m512i a __asm ("zmm16"); __asm ("" : "=v" (a) : "0" (x)); return y + _mm512_extracti64x4_epi64 (a, 0); }
__m128i f10 (__m512i x) { register __m512i a __asm ("zmm16"); __asm ("" : "=v" (a) : "0" (x)); return _mm512_extracti32x4_epi32 (a, 0); }
void f11 (__m128i *p, __m512i x) { register __m512i a __asm ("zmm16"); __asm ("" : "=v" (a) : "0" (x)); *p = _mm512_extracti32x4_epi32 (a, 0); }
__m128i f12 (__m512i x, __m128i y) { register __m512i a __asm ("zmm16"); __asm ("" : "=v" (a) : "0" (x)); return y + _mm512_extracti32x4_epi32 (a, 0); }
but couldn't reproduce it.

2018-04-11  Jakub Jelinek  

PR target/85328
* config/i386/sse.md
(avx512dq_vextract64x2_1 split,
avx512f_vextract32x4_1 split,
vec_extract_lo_ split, vec_extract_lo_v32hi,
vec_extract_lo_v64qi): For non-AVX512VL if input is xmm16+ reg
and output is a reg, avoid creating invalid lowpart subreg, but
instead split into a 512-bit move.  Don't split if not AVX512VL,
input is xmm16+ reg and output is a mem.
(vec_extract_lo_, vec_extract_lo_v32hi,
vec_extract_lo_v64qi): Don't require split if not AVX512VL, input is
xmm16+ reg and output is a mem.

* gcc.target/i386/pr85328.c: New test.
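
The decision described in the ChangeLog entry above can be summarized as a small C model.  This is a sketch only: the enum and function names are hypothetical, the three boolean inputs stand in for TARGET_AVX512VL, REG_P (operands[0]) and EXT_REX_SSE_REG_P (operands[1]), and it follows the condition visible in the first define_split hunk of the patch rather than the actual machine description code.

```c
#include <stdbool.h>

/* Hypothetical outcomes for a "extract the low part" pattern.  */
enum lo_extract_action
{
  KEEP_VEXTRACT,	/* Do not split; emit a v*extract* instruction.  */
  MOVE_512BIT,		/* Split into a full 512-bit register move.  */
  MOVE_LOWPART		/* Split into a move from the lowpart of the input.  */
};

static enum lo_extract_action
choose_lo_extract_action (bool have_avx512vl, bool dest_is_reg,
			  bool src_is_xmm16_plus)
{
  if (!have_avx512vl && src_is_xmm16_plus)
    {
      /* A narrow move involving xmm16+ would need an EVEX encoding
	 that is only available with AVX512VL.  */
      if (!dest_is_reg)
	return KEEP_VEXTRACT;	/* Output is a mem: give up on splitting.  */
      return MOVE_512BIT;	/* Copy all 512 bits instead.  */
    }

  return MOVE_LOWPART;		/* The pre-existing behaviour.  */
}
```

For example, choose_lo_extract_action (false, false, true) models the new "output is a MEM and input is xmm16+" case and yields KEEP_VEXTRACT.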

--- gcc/config/i386/sse.md.jj   2018-04-11 13:36:29.368015262 +0200
+++ gcc/config/i386/sse.md  2018-04-11 17:15:56.175746606 +0200
@@ -7361,9 +7361,21 @@ (define_split
(vec_select:
  (match_operand:V8FI 1 "register_operand")
  (parallel [(const_int 0) (const_int 1)])))]
-  "TARGET_AVX512DQ && reload_completed"
+  "TARGET_AVX512DQ
+   && reload_completed
+   && (TARGET_AVX512VL
+   || REG_P (operands[0])
+   || !EXT_REX_SSE_REG_P (operands[1]))"
   [(set (match_dup 0) (match_dup 1))]
-  "operands[1] = gen_lowpart (mode, operands[1]);")
+{
+  if (!TARGET_AVX512VL
+  && REG_P (operands[0])
+  && EXT_REX_SSE_REG_P (operands[1]))
+operands[0]
+  = lowpart_subreg (mode, operands[0], mode);
+  else
+operands[1] = gen_lowpart (mode, operands[1]);
+})
 
 (define_insn "avx512f_vextract32x4_1"
   [(set (match_operand: 0 "" 
"=")
@@ -7394,9 +7406,21 @@ (define_split
  (match_operand:V16FI 1 "register_operand")
  (parallel [(const_int 0) (const_int 1)
 (const_int 2) (const_int 3)])))]
-  "TARGET_AVX512F && reload_completed"
+  "TARGET_AVX512F
+   && reload_completed
+   && (TARGET_AVX512VL
+

Re: [wwwdocs] document new options in gcc-8/changes.html

2018-04-11 Thread Martin Sebor

Ping: https://gcc.gnu.org/ml/gcc-patches/2018-04/msg00219.html

There's one outstanding typo that Paolo noticed since I posted
the update.  I'll fix that before committing.

On 04/04/2018 04:28 PM, Martin Sebor wrote:

Attached is an updated diff rebased on top of the latest revision
of the file.  This new version fixes the typos Paolo pointed out
(thanks) and adds a few more options:

-Wmissing-attributes, -Wif-not-aligned, and -Wpacked-not-aligned.

I used a spell-checker this time to (hopefully) minimize the typos.

The rest of the changes are described here:
https://gcc.gnu.org/ml/gcc-patches/2018-04/msg00121.html

Martin




Re: [PATCH] Handle empty infinite loops in OpenACC for PR84955

2018-04-11 Thread Cesar Philippidis
On 04/09/2018 04:31 AM, Richard Biener wrote:
> On Fri, 6 Apr 2018, Jakub Jelinek wrote:
> 
>> On Fri, Apr 06, 2018 at 06:48:52AM -0700, Cesar Philippidis wrote:
>>> 2018-04-06  Cesar Philippidis  
>>>
>>> PR middle-end/84955
>>>
>>> gcc/
>>> * cfgloop.c (flow_loops_find): Add assert.
>>> * omp-expand.c (expand_oacc_for): Add dummy false branch for
>>> tiled basic blocks without omp continue statements.
>>> * tree-cfg.c (execute_fixup_cfg): Handle calls to internal
>>> functions like regular functions.
>>>
>>> libgomp/
>>> * testsuite/libgomp.oacc-c-c++-common/pr84955.c: New test.
>>> * testsuite/libgomp.oacc-fortran/pr84955.f90: New test.
>>
>> I'd like to defer the cfgloop.c and tree-cfg.c changes to Richard, just want to
>> mention that:
>>
>>> --- a/gcc/tree-cfg.c
>>> +++ b/gcc/tree-cfg.c
>>> @@ -9586,10 +9586,7 @@ execute_fixup_cfg (void)
>>>for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi);)
>>> {
>>>   gimple *stmt = gsi_stmt (gsi);
>>> - tree decl = is_gimple_call (stmt)
>>> - ? gimple_call_fndecl (stmt)
>>> - : NULL;
>>> - if (decl)
>>> + if (is_gimple_call (stmt))
>>
>> This change doesn't affect just internal functions, but also all indirect
>> calls through function pointers with const, pure or noreturn attributes.
> 
> I think the change is desirable nevertheless.  The question is if we
> want to do it at this point in time.
> 
> The description of the problem sounds more like LTO writing out
> loops without previously fixing up state.  So something like the following,
> which I'd prefer at this stage (the above hunk is ok for stage1 then).

OK, I'll save that hunk for stage 1.

> Index: gcc/lto-streamer-out.c
> ===
> --- gcc/lto-streamer-out.c  (revision 259227)
> +++ gcc/lto-streamer-out.c  (working copy)
> @@ -2084,6 +2151,9 @@ output_function (struct cgraph_node *nod
>/* Set current_function_decl and cfun.  */
>push_cfun (fn);
>  
> +  /* Fixup loops if required to match discovery done in the reader.  */
> +  loop_optimizer_init (AVOID_CFG_MODIFICATIONS);
> +
>/* Make string 0 be a NULL string.  */
>streamer_write_char_stream (ob->string_stream, 0);
>  
> @@ -2176,12 +2246,13 @@ output_function (struct cgraph_node *nod
>streamer_write_record_start (ob, LTO_null);
>  
>output_cfg (ob, fn);
> -
> -  pop_cfun ();
> }
>else
>  streamer_write_uhwi (ob, 0);
>  
> +  loop_optimizer_finalize ();
> +  pop_cfun ();
> +
>/* Create a section to hold the pickled output of this function.   */
>produce_asm (ob, function);

That worked. Is this patch OK for trunk, GCC 6 and GCC 7?
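
The shape of the quoted lto-streamer-out.c hunk can be sketched as a toy C model: loop state is set up right after push_cfun, and the teardown is placed after the if/else so that both paths run it.  All names below are hypothetical counters standing in for GCC's real state; this is not the streamer code itself.

```c
#include <stdbool.h>

/* Toy bookkeeping: how many scopes are currently open.  */
struct stream_state
{
  int cfun_depth;	/* push_cfun calls not yet matched by pop_cfun.  */
  int loops_depth;	/* loop_optimizer_init calls not yet finalized.  */
};

static void
model_output_function (struct stream_state *s, bool has_body)
{
  s->cfun_depth++;	/* push_cfun (fn) */
  s->loops_depth++;	/* loop_optimizer_init (AVOID_CFG_MODIFICATIONS) */

  if (has_body)
    {
      /* ... stream out the body and the CFG ... */
    }
  else
    {
      /* ... stream out a 0 marker ... */
    }

  /* Both paths reach the teardown, in reverse order of the setup.  */
  s->loops_depth--;	/* loop_optimizer_finalize () */
  s->cfun_depth--;	/* pop_cfun () */
}
```

In this model the state is balanced on return whether or not the function has a body.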

Thanks,
Cesar

2018-04-11  Cesar Philippidis  
	Richard Biener  

	PR middle-end/84955

	gcc/
	* cfgloop.c (flow_loops_find): Add assert.
	* lto-streamer-out.c (output_function): Fix CFG loop state before
	streaming out.
	* omp-expand.c (expand_oacc_for): Add dummy false branch for
	tiled basic blocks without omp continue statements.

	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/pr84955.c: New test.
	* testsuite/libgomp.oacc-fortran/pr84955.f90: New test.


diff --git a/gcc/cfgloop.c b/gcc/cfgloop.c
index 8af793c6015..6e68639452c 100644
--- a/gcc/cfgloop.c
+++ b/gcc/cfgloop.c
@@ -462,6 +462,9 @@ flow_loops_find (struct loops *loops)
 	{
 	  struct loop *loop;
 
+	  if (!from_scratch)
+	gcc_assert (header->loop_father != NULL);
+
 	  /* The current active loop tree has valid loop-fathers for
 	 header blocks.  */
 	  if (!from_scratch
diff --git a/gcc/lto-streamer-out.c b/gcc/lto-streamer-out.c
index 1d2ab9757f1..fd6788a69b0 100644
--- a/gcc/lto-streamer-out.c
+++ b/gcc/lto-streamer-out.c
@@ -2084,6 +2084,9 @@ output_function (struct cgraph_node *node)
   /* Set current_function_decl and cfun.  */
   push_cfun (fn);
 
+  /* Fixup loops if required to match discovery done in the reader.  */
+  loop_optimizer_init (AVOID_CFG_MODIFICATIONS);
+
   /* Make string 0 be a NULL string.  */
   streamer_write_char_stream (ob->string_stream, 0);
 
@@ -2176,12 +2179,13 @@ output_function (struct cgraph_node *node)
   streamer_write_record_start (ob, LTO_null);
 
   output_cfg (ob, fn);
-
-  pop_cfun ();
}
   else
 streamer_write_uhwi (ob, 0);
 
+  loop_optimizer_finalize ();
+  pop_cfun ();
+
   /* Create a section to hold the pickled output of this function.   */
   produce_asm (ob, function);
 
diff --git a/gcc/omp-expand.c b/gcc/omp-expand.c
index bb204906ea6..c7d30ea3964 100644
--- a/gcc/omp-expand.c
+++ b/gcc/omp-expand.c
@@ -5439,6 +5439,14 @@ expand_oacc_for (struct omp_region *region, struct omp_for_data *fd)
 
 	  split->flags ^= EDGE_FALLTHRU | EDGE_TRUE_VALUE;
 
+	  /* Add a dummy exit for the tiled block when cont_bb is missing.  */
+	  if (cont_bb == NULL)
+	{
+	  edge e = make_edge (body_bb, exit_bb, EDGE_FALSE_VALUE);
+	  e->probability = profile_probability::even 

Re: RFC: Disable asan tests under ulimit -v

2018-04-11 Thread Jakub Jelinek
On Wed, Apr 11, 2018 at 02:28:09PM -0400, Jason Merrill wrote:
> On Wed, Apr 11, 2018 at 2:07 PM, Jakub Jelinek  wrote:
> > On Wed, Apr 11, 2018 at 01:59:40PM -0400, Jason Merrill wrote:
> >> And this one puts the check in asan_init rather than its users.  OK?
> >
> > tsan tests have the same problem.
> 
> Hmm, tsan tests work fine for me under ulimit -v.

Weird.

> > Wouldn't it be better to have a helper procedure for this and use it next to
> > if [check_effective_target_fsanitize_address]
> 
> Ah, of course, that's where it belongs.  Done below.
> 
> > What exactly does return -code return?
> 
> It forces the caller to return as well.

> commit f980f806f93982ba54390d45ac5ccc8b350b160c
> Author: Jason Merrill 
> Date:   Fri Mar 23 11:14:50 2018 -0400
> 
> * lib/asan-dg.exp: Don't run tests if ulimit -v is set.

Ok.

Jakub


[PATCH] Fix copyprop_hardreg_forward_1 (PR rtl-optimization/85342)

2018-04-11 Thread Jakub Jelinek
Hi!

When switching regcprop.c to use validate_* and apply_change_group,
I have added code to restore recog_data.operands[i] if they have been
replaced after apply_change_group failure.  That is bogus though, when
apply_change_group fails, recog_data.insn is NULL and rest of recog_data
structure is complete garbage; and nothing in copyprop_hardreg_forward_1
seems to use it afterwards anyway, just will call extract_insn on the next
insn.  Furthermore, the "fixups" were only for the recog_data structure
operands itself, nothing else, the instruction itself has been already
corrected by cancel_changes.

Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for
trunk?

2018-04-11  Jakub Jelinek  

PR rtl-optimization/85342
* regcprop.c (copyprop_hardreg_forward_1): Remove replaced array, use
a bool scalar var inside of the loop instead.  Don't try to update
recog_data.operand after failed apply_change_group.

* gcc.target/i386/pr85342.c: New test.

--- gcc/regcprop.c.jj   2018-01-04 00:43:17.996703342 +0100
+++ gcc/regcprop.c  2018-04-11 16:17:29.883575142 +0200
@@ -751,7 +751,6 @@ copyprop_hardreg_forward_1 (basic_block
   bool is_asm, any_replacements;
   rtx set;
   rtx link;
-  bool replaced[MAX_RECOG_OPERANDS];
   bool changed = false;
   struct kill_set_value_data ksvd;
 
@@ -934,7 +933,7 @@ copyprop_hardreg_forward_1 (basic_block
 eldest live copy that's in an appropriate register class.  */
   for (i = 0; i < n_ops; i++)
{
- replaced[i] = false;
+ bool replaced = false;
 
  /* Don't scan match_operand here, since we've no reg class
 information to pass down.  Any operands that we could
@@ -951,26 +950,26 @@ copyprop_hardreg_forward_1 (basic_block
  if (recog_data.operand_type[i] == OP_IN)
{
  if (op_alt[i].is_address)
-   replaced[i]
+   replaced
  = replace_oldest_value_addr (recog_data.operand_loc[i],
   alternative_class (op_alt, i),
   VOIDmode, ADDR_SPACE_GENERIC,
   insn, vd);
  else if (REG_P (recog_data.operand[i]))
-   replaced[i]
+   replaced
  = replace_oldest_value_reg (recog_data.operand_loc[i],
  alternative_class (op_alt, i),
  insn, vd);
  else if (MEM_P (recog_data.operand[i]))
-   replaced[i] = replace_oldest_value_mem (recog_data.operand[i],
-   insn, vd);
+   replaced = replace_oldest_value_mem (recog_data.operand[i],
+insn, vd);
}
  else if (MEM_P (recog_data.operand[i]))
-   replaced[i] = replace_oldest_value_mem (recog_data.operand[i],
-   insn, vd);
+   replaced = replace_oldest_value_mem (recog_data.operand[i],
+insn, vd);
 
  /* If we performed any replacement, update match_dups.  */
- if (replaced[i])
+ if (replaced)
{
  int j;
  rtx new_rtx;
@@ -989,13 +988,6 @@ copyprop_hardreg_forward_1 (basic_block
{
  if (! apply_change_group ())
{
- for (i = 0; i < n_ops; i++)
-   if (replaced[i])
- {
-   rtx old = *recog_data.operand_loc[i];
-   recog_data.operand[i] = old;
- }
-
  if (dump_file)
fprintf (dump_file,
 "insn %u: reg replacements not verified\n",
--- gcc/testsuite/gcc.target/i386/pr85342.c.jj  2018-04-11 16:25:50.564848408 +0200
+++ gcc/testsuite/gcc.target/i386/pr85342.c 2018-04-11 16:26:05.534856581 +0200
@@ -0,0 +1,29 @@
+/* PR rtl-optimization/85342 */
+/* { dg-do compile { target int128 } } */
+/* { dg-options "-O2 -mavx512vl" } */
+
+typedef unsigned char U __attribute__((vector_size (64)));
+typedef unsigned int V __attribute__((vector_size (64)));
+typedef unsigned __int128 W __attribute__((vector_size (64)));
+int i;
+V g, h, z, k, l, m;
+U j;
+
+W
+bar (W o, W p)
+{
+  U q;
+  o |= (W){q[0]} >= o;
+  o += 1 < o;
+  j |= (U){} == j;
+  return i + (W)q + (W)g + (W)h + (W)z + o + (W)j + (W)k + (W)l + (W)m + p;
+}
+
+W
+foo (U u)
+{
+  U q;
+  W r = bar ((W)(U){0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
~0}, (W)q);
+  u += (U)bar ((W){~0}, r);
+  return (W)u;
+}

Jakub


Re: [PATCH] Use --push-state --as-needed and --pop-state instead of --as-needed and --no-as-needed for libgcc

2018-04-11 Thread Jakub Jelinek
On Wed, Apr 11, 2018 at 06:07:17PM +0200, Matthias Klose wrote:
> On 11.04.2018 12:31, Jakub Jelinek wrote:
> > Hi!
> > 
> > As discussed, using --as-needed and --no-as-needed is dangerous, because
> > it results in --no-as-needed even for libraries after -lgcc_s, even when the
> > default is --as-needed or --as-needed has been specified earlier on the
> > command line.
> > 
> > If the linker supports --push-state/--pop-state, we should IMHO use it.
> > 
> > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for stage1?
> > 
> > Or is this something we want in GCC8 too?
> 
> this is problematic for binutils versions with --push-state/--pop-state support
> in the BFD linker but not in gold, and then using -fuse-ld=gold.  So maybe the
> version check for the BFD linker should only succeed for the first binutils
> version which also has --push-state/--pop-state support in gold.

Does anybody use -fuse-ld=gold?

> The BFD linker is only able to save exactly one state, and nested --push-state
> calls override the state (binutils PR23043).  Otoh, there is not much linked
> after libgcc, so maybe this is not an issue.

In any case, here is updated patch that will use it only for 2.28+ which
should have both ld.bfd and ld.gold --push-state support.

Bootstrapped/regtested on x86_64-linux and i686-linux.
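
To illustrate why the push/pop form is safer, here is a toy C model of how the as-needed setting evolves as linker options are processed left to right.  It is a sketch under simplifying assumptions: the function name as_needed_after and the fixed-size stack are hypothetical, only the four options discussed in this thread are interpreted, and real linkers track more state than this single flag.

```c
#include <stdbool.h>
#include <string.h>

#define MAX_SAVED 8

/* Return the as-needed setting in effect after processing ARGS,
   starting from INITIAL (the default or whatever the user asked for
   earlier on the command line).  */
static bool
as_needed_after (bool initial, const char *const *args, int nargs)
{
  bool as_needed = initial;
  bool saved[MAX_SAVED];
  int depth = 0;

  for (int i = 0; i < nargs; i++)
    {
      if (strcmp (args[i], "--as-needed") == 0)
	as_needed = true;
      else if (strcmp (args[i], "--no-as-needed") == 0)
	as_needed = false;
      else if (strcmp (args[i], "--push-state") == 0)
	{
	  if (depth < MAX_SAVED)
	    saved[depth++] = as_needed;	/* Remember the current setting.  */
	}
      else if (strcmp (args[i], "--pop-state") == 0)
	{
	  if (depth > 0)
	    as_needed = saved[--depth];	/* Restore it.  */
	}
      /* Anything else, e.g. -lgcc_s, leaves the flag alone.  */
    }

  return as_needed;
}
```

With the sequence --as-needed -lgcc_s --no-as-needed the function returns false even when INITIAL is true, which is the problem described above; with --push-state --as-needed -lgcc_s --pop-state it returns INITIAL unchanged.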

2018-04-11  Jakub Jelinek  

* configure.ac (LD_AS_NEEDED_OPTION, LD_NO_AS_NEEDED_OPTION): Use
--push-state --as-needed and --pop-state instead of --as-needed and
--no-as-needed if ld supports it.
* configure: Regenerated.

--- gcc/configure.ac.jj 2018-04-10 14:35:49.764788806 +0200
+++ gcc/configure.ac    2018-04-11 18:26:16.745563830 +0200
@@ -5517,11 +5517,25 @@ if test $in_tree_ld = yes ; then
   if test "$gcc_cv_gld_major_version" -eq 2 -a "$gcc_cv_gld_minor_version" -ge 16 -o "$gcc_cv_gld_major_version" -gt 2 \
  && test $in_tree_ld_is_elf = yes; then
 gcc_cv_ld_as_needed=yes
+if test "$gcc_cv_gld_major_version" -eq 2 -a "$gcc_cv_gld_minor_version" -ge 28; then
+  gcc_cv_ld_as_needed_option='--push-state --as-needed'
+  gcc_cv_ld_no_as_needed_option='--pop-state'
+fi
   fi
 elif test x$gcc_cv_ld != x; then
   # Check if linker supports --as-needed and --no-as-needed options
   if $gcc_cv_ld --help 2>&1 | grep as-needed > /dev/null; then
 gcc_cv_ld_as_needed=yes
+if $gcc_cv_ld --help 2>&1 | grep push-state > /dev/null \
+   && $gcc_cv_ld --help 2>&1 | grep pop-state > /dev/null \
+   && echo "$ld_ver" | grep GNU > /dev/null \
+   && test "$ld_vers_major" -eq 2 -a "$ld_vers_minor" -ge 28; then
+  # Use these options only when both ld.bfd and ld.gold support
+  # --push-state/--pop-state, which unfortunately wasn't added
+  # at the same time.
+  gcc_cv_ld_as_needed_option='--push-state --as-needed'
+  gcc_cv_ld_no_as_needed_option='--pop-state'
+fi
   fi
   case "$target:$gnu_ld" in
 *-*-solaris2*:no)
--- gcc/configure.jj    2018-04-10 14:35:49.875788826 +0200
+++ gcc/configure   2018-04-11 18:26:28.162568402 +0200
@@ -28733,11 +28733,25 @@ if test $in_tree_ld = yes ; then
   if test "$gcc_cv_gld_major_version" -eq 2 -a "$gcc_cv_gld_minor_version" -ge 16 -o "$gcc_cv_gld_major_version" -gt 2 \
  && test $in_tree_ld_is_elf = yes; then
 gcc_cv_ld_as_needed=yes
+if test "$gcc_cv_gld_major_version" -eq 2 -a "$gcc_cv_gld_minor_version" -ge 28; then
+  gcc_cv_ld_as_needed_option='--push-state --as-needed'
+  gcc_cv_ld_no_as_needed_option='--pop-state'
+fi
   fi
 elif test x$gcc_cv_ld != x; then
   # Check if linker supports --as-needed and --no-as-needed options
   if $gcc_cv_ld --help 2>&1 | grep as-needed > /dev/null; then
 gcc_cv_ld_as_needed=yes
+if $gcc_cv_ld --help 2>&1 | grep push-state > /dev/null \
+   && $gcc_cv_ld --help 2>&1 | grep pop-state > /dev/null \
+   && echo "$ld_ver" | grep GNU > /dev/null \
+   && test "$ld_vers_major" -eq 2 -a "$ld_vers_minor" -ge 28; then
+  # Use these options only when both ld.bfd and ld.gold support
+  # --push-state/--pop-state, which unfortunately wasn't added
+  # at the same time.
+  gcc_cv_ld_as_needed_option='--push-state --as-needed'
+  gcc_cv_ld_no_as_needed_option='--pop-state'
+fi
   fi
   case "$target:$gnu_ld" in
 *-*-solaris2*:no)


Jakub


Re: [patch, fortran] Remove parallell annotation from DO CONCURRENT

2018-04-11 Thread Thomas Koenig

On 11.04.2018 at 20:33, Jakub Jelinek wrote:


I have attached updated patch which moves the test case to
gfortran.dg/gomp (where it actually passes).


How could it pass there?  dg-do run tests don't belong into g*.dg/gomp/,
nothing adds the -B etc. options needed to find libgomp.spec or libgomp
as a library, or adds it to LD_LIBRARY_PATH etc.
There are zero dg-do run tests in gfortran.dg/gomp/, there are 4
dg-do run tests in c-c++-common/gomp/, but those work fine because they
use -fopenmp-simd option rather than
-fopenmp/-fopenacc/-ftree-parallelize-loops= etc.


So, where should the test go?

The suggestion in PR 85346, to put it into
libgomp/testsuite/libgomp.fortran/, does not work:

Running ../../../../trunk/libgomp/testsuite/libgomp.fortran/fortran.exp ...
FAIL: libgomp.fortran/do_concurrent_5.f90   -O  execution test

even when ne (the array size) has been reduced to 2**20, far below
reasonable memory limits.  The test passes when given the
-O1 -ftree-parallelize-loops=2 options by hand.

So, what's the idea? Is there actually a directory which works,
or are we left with a wrong-code bug for which no test case is
possible? That would be quite bad, I think.

Thomas


Re: [PATCH] Invoke maybe_warn_nonstring_arg for strcpy/stpcpy builtins.

2018-04-11 Thread Martin Sebor

On 04/11/2018 06:47 AM, Andreas Krebbel wrote:

On 04/11/2018 10:02 AM, Jakub Jelinek wrote:

On Wed, Apr 11, 2018 at 09:48:05AM +0200, Andreas Krebbel wrote:

c-c++-common/attr-nonstring-3.c fails on IBM Z. The reason appears to be
that we provide builtin implementations for strcpy and stpcpy.  The
warnings currently will only be emitted when expanding these as normal
calls.

Bootstrapped and regression tested on x86_64 and s390x.

Ok?

gcc/ChangeLog:

2018-04-11  Andreas Krebbel  

* builtins.c (expand_builtin_strcpy): Invoke
maybe_warn_nonstring_arg.
(expand_builtin_stpcpy): Likewise.


Don't you then warn twice if builtin implementations for strcpy and stpcpy
aren't available or can't be used, once here and once in calls.c?


Looks like this could happen if the expander is present but rejects expansion.
I basically copied this from the strcmp builtin, which looks like it possibly
runs into the same problem:


I tried to avoid the problem in the other instances of the call
to maybe_warn_nonstring_arg (e.g., expand_builtin_strlen or
expand_builtin_strcmp).  I don't know if the expander can fail
after the maybe_warn_nonstring_arg() call and so I have no
tests for it.

In your patch the expander failing seems more likely than in
the others (in fact, on x86_64 it always fails because the call
to targetm.have_movstr () in expand_movstr() returns false).

That said, I see two warnings for a call to strcmp() with
a nonstring argument even without the expander failing, so
what I did isn't quite right either.  I opened bug 85359 for
it.
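
For readers unfamiliar with the attribute under discussion, here is a minimal C sketch of the kind of code involved.  The struct and function names are hypothetical; the NONSTRING macro is only a portability guard so the example also compiles with compilers that lack the attribute.

```c
#include <string.h>

/* Use attribute nonstring where the compiler knows it.  */
#if defined (__has_attribute)
# if __has_attribute (nonstring)
#  define NONSTRING __attribute__ ((nonstring))
# endif
#endif
#ifndef NONSTRING
# define NONSTRING
#endif

struct record
{
  /* Fixed-width field that is not necessarily nul-terminated.  */
  char tag[4] NONSTRING;
};

static void
record_set_tag (struct record *r, const char *s)
{
  /* Filling the whole field without a terminator is the intended use.  */
  strncpy (r->tag, s, sizeof r->tag);
}

static void
record_get_tag (const struct record *r, char out[5])
{
  /* Calling strcpy (out, r->tag) here is the sort of call the warning
     in this thread targets, because r->tag is not known to be
     nul-terminated.  Copy a bounded length and terminate instead.  */
  memcpy (out, r->tag, sizeof r->tag);
  out[sizeof r->tag] = '\0';
}
```

Whether a diagnostic for the commented-out strcpy call is emitted once or twice is exactly the question raised above, so the sketch avoids relying on any particular warning output.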

Martin



  /* Check to see if the argument was declared attribute nonstring
 and if so, issue a warning since at this point it's not known
 to be nul-terminated.  */
  tree fndecl = get_callee_fndecl (exp);
  maybe_warn_nonstring_arg (fndecl, exp);

  if (result)
{
  /* Return the value in the proper mode for this function.  */
  machine_mode mode = TYPE_MODE (TREE_TYPE (exp));
  if (GET_MODE (result) == mode)
return result;
  if (target == 0)
return convert_to_mode (mode, result, 0);
  convert_move (target, result, 0);
  return target;
}

  /* Expand the library call ourselves using a stabilized argument
 list to avoid re-evaluating the function's arguments twice.  */
  tree fn = build_call_nofold_loc (EXPR_LOCATION (exp), fndecl, 2, arg1, arg2);
  gcc_assert (TREE_CODE (fn) == CALL_EXPR);
  CALL_EXPR_TAILCALL (fn) = CALL_EXPR_TAILCALL (exp);
  return expand_call (fn, target, target == const0_rtx);

-Andreas-





Re: [PATCH] sel-sched: run cleanup_cfg just before loop_optimizer_init (PR 84659)

2018-04-11 Thread Alexander Monakov
As noted in PR 85354, we cannot simply invoke cleanup_cfg after dominators are
computed, because they may become invalid without being freed or recomputed,
which trips checking in flow_loops_find.

We can move cleanup_cfg earlier (and run it for all sel-sched invocations, not
only when pipelining).

Bootstrapped/regtested on x86_64 and ppc64 (my previous testing missed this
issue: the testcase requires graphite, but libisl wasn't present).

PR rtl-optimization/85354
* sel-sched-ir.c (sel_init_pipelining): Move cleanup_cfg call...
* sel-sched.c (sel_global_init): ... here.

diff --git a/gcc/sel-sched-ir.c b/gcc/sel-sched-ir.c
index 50a7daafba6..ee970522890 100644
--- a/gcc/sel-sched-ir.c
+++ b/gcc/sel-sched-ir.c
@@ -30,7 +30,6 @@ along with GCC; see the file COPYING3.  If not see
 #include "cfgrtl.h"
 #include "cfganal.h"
 #include "cfgbuild.h"
-#include "cfgcleanup.h"
 #include "insn-config.h"
 #include "insn-attr.h"
 #include "recog.h"
@@ -6122,9 +6121,6 @@ make_regions_from_loop_nest (struct loop *loop)
 void
 sel_init_pipelining (void)
 {
-  /* Remove empty blocks: their presence can break assumptions elsewhere,
- e.g. the logic to invoke update_liveness_on_insn in sel_region_init.  */
-  cleanup_cfg (0);
   /* Collect loop information to be used in outer loops pipelining.  */
   loop_optimizer_init (LOOPS_HAVE_PREHEADERS
| LOOPS_HAVE_FALLTHRU_PREHEADERS
diff --git a/gcc/sel-sched.c b/gcc/sel-sched.c
index cd29df35666..59762964c6e 100644
--- a/gcc/sel-sched.c
+++ b/gcc/sel-sched.c
@@ -28,6 +28,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tm_p.h"
 #include "regs.h"
 #include "cfgbuild.h"
+#include "cfgcleanup.h"
 #include "insn-config.h"
 #include "insn-attr.h"
 #include "params.h"
@@ -7661,6 +7662,10 @@ sel_sched_region (int rgn)
 static void
 sel_global_init (void)
 {
+  /* Remove empty blocks: their presence can break assumptions elsewhere,
+ e.g. the logic to invoke update_liveness_on_insn in sel_region_init.  */
+  cleanup_cfg (0);
+
   calculate_dominance_info (CDI_DOMINATORS);
   alloc_sched_pools ();


Re: [doc PATCH] fix up C++ option references (PR 71283)

2018-04-11 Thread Jason Merrill
On Wed, Apr 11, 2018 at 1:51 PM, Martin Sebor  wrote:
> On 04/11/2018 09:44 AM, Jason Merrill wrote:
>> On 04/05/2018 07:28 PM, Martin Sebor wrote:
>>>
>>> Attached is the final version of the patch to adjust the lists
>>> of options (C++ Language Options and -Wall) to include missing
>>> C++ options, reference the forms of options that aren't
>>> the default, and use TexInfo tables for the lists of options
>>> in -Wall and -Wextra to address Nathan's comment.  The patch
>>> also fixes bug 71283.
>>
>>
>>>  -Wnoexcept  -Wnoexcept-type  -Wclass-memaccess @gol
>>
>> ...
>>>
>>> +-Wclass-memaccess -Wclobbered  -Wcomment  -Wconditionally-supported @gol
>>
>>
>> -Wclass-memaccess is already in the C++ options summary, I don't think
>> we need to also add it to the diagnostic options summary.
>
>
> Some of these C++-only options are listed in 3.8 Options to
> Request or Suppress Warnings which the See Options to Request
> or Suppress Warnings link under Warning Options points to.
>
> I would expect all the warning options mentioned anywhere in
> 3.8 to be listed in the Warning Options summary.  That would
> include all C++-only options in -Wall and -Wextra but not
> other C++-only options (at least not for now).  That's what
> I'm aiming for with the patch; I may have missed some.
>
> It seems to me that the most intuitive organization might
> actually be to list all warning options in the Warning Summary
> section, even if some of them are specific to just a subset of
> languages and also listed in language-specific sections.  (At
> least for the C family.)

> I often have 3.8 Options to Request or Suppress Warnings open
> in my browser and use it to search for all warning options.
> I find it inconvenient (and prone to error) to have to remember
> to also open 3.5 Options Controlling C++ Dialect to look for
> C++-only options that aren't listed in 3.5.

> Does the approach sound like an improvement to you?

I agree that we want to list the options in both 3.5 and 3.8, but the
argument seems weaker for having them under two headings in the same
summary.  Certainly the current state of having some C++ warnings in
the C++ section and some in the warning section is wrong.

>>> -@item -Wsubobject-linkage @r{(C++ and Objective-C++ only)}
>>> +@item -Wno-subobject-linkage @r{(C++ and Objective-C++ only)}
>>>  @opindex Wsubobject-linkage
>>>  @opindex Wno-subobject-linkage
>>>  Warn if a class type has a base or a field whose type uses the anonymous
>>
>>> -@item -Wdelete-incomplete @r{(C++ and Objective-C++ only)}
>>> +@item -Wno-delete-incomplete @r{(C++ and Objective-C++ only)}
>>>  @opindex Wdelete-incomplete
>>>  @opindex Wno-delete-incomplete
>>>  Warn when deleting a pointer to incomplete type, which may cause
>>
>> If you're reversing the sense of the flag, please adjust the
>> documentation to match.
>
> Not sure I understand what part you think needs adjusting.
> I changed it to -Wno- to reflect that the option is enabled
> by default.  Can you elaborate?

-Wno-* doesn't mean "Warn...", it means "Don't warn..."

Jason


Re: [wwwdocs] document new options in gcc-8/changes.html

2018-04-11 Thread Jason Merrill
OK.

On Wed, Apr 11, 2018 at 3:13 PM, Martin Sebor  wrote:
> Ping: https://gcc.gnu.org/ml/gcc-patches/2018-04/msg00219.html
>
> There's one outstanding typo that Paolo noticed since I posted
> the update.  I'll fix that before committing.
>
>
> On 04/04/2018 04:28 PM, Martin Sebor wrote:
>>
>> Attached is an updated diff rebased on top of the latest revision
>> of the file.  This new version fixes the typos Paolo pointed out
>> (thanks) and adds a few more options:
>>
>> -Wmissing-attributes, -Wif-not-aligned, and -Wpacked-not-aligned.
>>
>> I used a spell-checker this time to (hopefully) minimize the typos.
>>
>> The rest of the changes are described here:
>> https://gcc.gnu.org/ml/gcc-patches/2018-04/msg00121.html
>>
>> Martin
>
>


Re: [wwwdocs] document new options in gcc-8/changes.html

2018-04-11 Thread Martin Sebor

On 04/04/2018 05:03 PM, Paolo Carlini wrote:

Hi Martin

On 05/04/2018 00:28, Martin Sebor wrote:

+  implementations do suppresses the warning.

suppress


I was about to fix this but re-reading the full sentence made
me realize it's correct as is:

  Note that due to GCC bug 82944, defining strncat, strncpy, or
  stpncpy as a macro in a system header as some implementations
  do suppresses the warning.

I've added a comma after the suppresses to make it clearer and
checked in revision 1.63.

Martin