[PATCH] Fix handling of ZERO_EXTRACT lhs with REG_EQUAL note in the combiner (PR target/69442)

2016-01-26 Thread Jakub Jelinek
Hi!

In the middle of last year, Kugan defined REG_EQUAL notes even for the
case when SET_DEST of the single set is a ZERO_EXTRACT; before that, I believe
they were defined only for REG/SUBREG and STRICT_LOW_PART thereof.
But, as for STRICT_LOW_PART, the REG_EQUAL note describes the whole
contained register rather than just the bits that are being set.
Unfortunately, neither the documentation nor the combiner was changed
for this, so if, say, on
(insn 7 4 8 2 (set (reg:SI 117)
(const_int 65535 [0xffff])) pr69442.c:7 614 {*arm_movsi_vfp}
 (nil))
...
(insn 16 14 17 2 (set (reg:SI 123)
(const_int 0 [0])) pr69442.c:9 614 {*arm_movsi_vfp}
 (nil))
(insn 17 16 18 2 (set (zero_extract:SI (reg:SI 123)
(const_int 16 [0x10])
(const_int 16 [0x10]))
(reg:SI 117)) pr69442.c:9 83 {insv_t2}
 (expr_list:REG_DEAD (reg:SI 117)
(expr_list:REG_EQUAL (const_int -65536 [0xffffffffffff0000])
(nil))))
the combiner would replace the SET_SRC of the set with
-65536, but that has a different meaning: it says that the upper 16 bits
of a register containing zero are supposed to be set to -65536 (and as we
care about just 16 bits, that is in fact 0).  So, either we could shift
the constant around and say that the zero extract is set to
-65536 >> 16 in this case, but then we don't say anything about the other
bits and if e.g. there are multiple setters of the earlier 123 pseudo value,
we'd give up; or, as the patch does, we say the whole inner register
is set to the given constant.  This setting doesn't have to be recognized as
valid; the whole point of the temporary assignment is just to improve
try_combine, and it is reverted afterwards.
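
To make the arithmetic above concrete, here is a minimal sketch in plain C of what the insv does to the register value.  The helper insert_bits is a hypothetical stand-in for the (set (zero_extract ...)) semantics, not a GCC function, and it assumes bit positions are counted from the least significant bit.

```c
#include <stdint.h>

/* Hypothetical model of
     (set (zero_extract:SI dest (const_int width) (const_int pos)) src)
   assuming positions are counted from the least significant bit:
   replace WIDTH bits of DEST starting at bit POS with the low WIDTH
   bits of SRC and return the new register value.  */
uint32_t insert_bits(uint32_t dest, unsigned width, unsigned pos, uint32_t src)
{
  /* Mask with the low WIDTH bits set; avoid shifting by 32.  */
  uint32_t field = (width >= 32) ? 0xffffffffu : ((1u << width) - 1u);
  return (dest & ~(field << pos)) | ((src & field) << pos);
}
```

With these assumptions, insert_bits (0, 16, 16, 65535) yields 0xffff0000, which is -65536 as a 32-bit value and matches the REG_EQUAL note for the whole register.  Inserting the note's constant instead, insert_bits (0, 16, 16, (uint32_t) -65536), yields 0, which is the wrong value the combiner used to produce.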

Bootstrapped/regtested on x86_64-linux and i686-linux, Kugan has kindly
tested it on arm too (my armhfp bootstrap/regtest of this is progressing too
slowly).  Ok for trunk?

2016-01-26  Jakub Jelinek  

PR target/69442
* combine.c (combine_instructions): For REG_EQUAL note with
SET_DEST being ZERO_EXTRACT, also temporarily set SET_DEST
to the underlying register.
* doc/rtl.texi (REG_EQUAL): Document the behavior of
REG_EQUAL/REG_EQUIV notes if SET_DEST is ZERO_EXTRACT.

* gcc.dg/pr69442.c: New test.

--- gcc/combine.c.jj   2016-01-14 16:10:24.0 +0100
+++ gcc/combine.c   2016-01-25 21:03:35.572594150 +0100
@@ -1454,15 +1454,21 @@ combine_instructions (rtx_insn *f, unsig
  && ! unmentioned_reg_p (note, SET_SRC (set))
  && (GET_MODE (note) == VOIDmode
  ? SCALAR_INT_MODE_P (GET_MODE (SET_DEST (set)))
- : GET_MODE (SET_DEST (set)) == GET_MODE (note)))
+ : (GET_MODE (SET_DEST (set)) == GET_MODE (note)
+&& (GET_CODE (SET_DEST (set)) != ZERO_EXTRACT
+|| (GET_MODE (XEXP (SET_DEST (set), 0))
+== GET_MODE (note))))))
{
  /* Temporarily replace the set's source with the
 contents of the REG_EQUAL note.  The insn will
 be deleted or recognized by try_combine.  */
- rtx orig = SET_SRC (set);
+ rtx orig_src = SET_SRC (set);
+ rtx orig_dest = SET_DEST (set);
+ if (GET_CODE (SET_DEST (set)) == ZERO_EXTRACT)
+   SET_DEST (set) = XEXP (SET_DEST (set), 0);
  SET_SRC (set) = note;
  i2mod = temp;
- i2mod_old_rhs = copy_rtx (orig);
+ i2mod_old_rhs = copy_rtx (orig_src);
  i2mod_new_rhs = copy_rtx (note);
  next = try_combine (insn, i2mod, NULL, NULL,
  &new_direct_jump_p,
@@ -1473,7 +1479,8 @@ combine_instructions (rtx_insn *f, unsig
  statistics_counter_event (cfun, "insn-with-note combine", 1);
  goto retry;
}
- SET_SRC (set) = orig;
+ SET_SRC (set) = orig_src;
+ SET_DEST (set) = orig_dest;
}
}
 
--- gcc/doc/rtl.texi.jj 2016-01-04 14:55:58.0 +0100
+++ gcc/doc/rtl.texi   2016-01-25 18:51:23.813258380 +0100
@@ -3915,9 +3915,9 @@ indicates that that register will be equ
 scope of this equivalence differs between the two types of notes.  The
 value which the insn explicitly copies into the register may look
 different from @var{op}, but they will be equal at run time.  If the
-output of the single @code{set} is a @code{strict_low_part} expression,
-the note refers to the register that is contained in @code{SUBREG_REG}
-of the @code{subreg} expression.
+output of the single @code{set} is a @code{strict_low_part} or
+@code{zero_extract} expression, the note refers to the register that
+is contained in its first operand.
 
 For @code{REG_EQUIV}, the register is equivalent to @var{op} through

Re: [PATCH] PR target/68986: [5/6 Regression] internal compiler error: Segmentation fault

2016-01-26 Thread Uros Bizjak
On Tue, Jan 26, 2016 at 1:58 AM, H.J. Lu  wrote:
> Stack alignment adjustment for __tls_get_addr should be done in
> ix86_update_stack_boundary, not ix86_compute_frame_layout.  Also
> there is no need to over-align stack for __tls_get_addr and function
> with __tls_get_addr call isn't a leaf function.
>
> Tested on x86-64 with -m32 on testsuite.  OK for trunk?

OK, but please write the second part without extra parentheses as:

  unsigned int stack_realign
= (incoming_stack_boundary
   < (crtl->is_leaf && !ix86_current_function_calls_tls_descriptor
  ? crtl->max_used_stack_slot_alignment
  : crtl->stack_alignment_needed));

Thanks,
Uros.

>
> Thanks.
>
> H.J.
> ---
> gcc/
>
> PR target/68986
> * config/i386/i386.c (ix86_compute_frame_layout): Move stack
> alignment adjustment to ...
> (ix86_update_stack_boundary): Here.  Don't over-align stack for
> __tls_get_addr.
> (ix86_finalize_stack_realign_flags): Use stack_alignment_needed
> if __tls_get_addr is called.
>
> gcc/testsuite/
>
> PR target/68986
> * gcc.target/i386/pr68986-1.c: New test.
> * gcc.target/i386/pr68986-2.c: Likewise.
> * gcc.target/i386/pr68986-3.c: Likewise.
> ---
>  gcc/config/i386/i386.c| 24 +++-
>  gcc/testsuite/gcc.target/i386/pr68986-1.c | 11 +++
>  gcc/testsuite/gcc.target/i386/pr68986-2.c | 13 +
>  gcc/testsuite/gcc.target/i386/pr68986-3.c | 13 +
>  4 files changed, 48 insertions(+), 13 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr68986-1.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr68986-2.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr68986-3.c
>
> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> index 34b57a4..9c27ea9 100644
> --- a/gcc/config/i386/i386.c
> +++ b/gcc/config/i386/i386.c
> @@ -11360,18 +11360,6 @@ ix86_compute_frame_layout (struct ix86_frame *frame)
>crtl->preferred_stack_boundary = 128;
>crtl->stack_alignment_needed = 128;
>  }
> -  /* preferred_stack_boundary is never updated for call
> - expanded from tls descriptor. Update it here. We don't update it in
> - expand stage because according to the comments before
> - ix86_current_function_calls_tls_descriptor, tls calls may be optimized
> - away.  */
> -  else if (ix86_current_function_calls_tls_descriptor
> -  && crtl->preferred_stack_boundary < PREFERRED_STACK_BOUNDARY)
> -{
> -  crtl->preferred_stack_boundary = PREFERRED_STACK_BOUNDARY;
> -  if (crtl->stack_alignment_needed < PREFERRED_STACK_BOUNDARY)
> -   crtl->stack_alignment_needed = PREFERRED_STACK_BOUNDARY;
> -}
>
>stack_alignment_needed = crtl->stack_alignment_needed / BITS_PER_UNIT;
>preferred_alignment = crtl->preferred_stack_boundary / BITS_PER_UNIT;
> @@ -12043,6 +12031,15 @@ ix86_update_stack_boundary (void)
>&& cfun->stdarg
>&& crtl->stack_alignment_estimated < 128)
>  crtl->stack_alignment_estimated = 128;
> +
> +  /* __tls_get_addr needs to be called with 16-byte aligned stack.  */
> +  if (ix86_tls_descriptor_calls_expanded_in_cfun
> +  && crtl->preferred_stack_boundary < 128)
> +{
> +  crtl->preferred_stack_boundary = 128;
> +  if (crtl->stack_alignment_needed < 128)
> +   crtl->stack_alignment_needed = 128;
> +}
>  }
>
>  /* Handle the TARGET_GET_DRAP_RTX hook.  Return NULL if no DRAP is
> @@ -12506,7 +12503,8 @@ ix86_finalize_stack_realign_flags (void)
>  = (crtl->parm_stack_boundary > ix86_incoming_stack_boundary
> ? crtl->parm_stack_boundary : ix86_incoming_stack_boundary);
>unsigned int stack_realign = (incoming_stack_boundary
> -   < (crtl->is_leaf
> +   < ((crtl->is_leaf
> +   && !ix86_current_function_calls_tls_descriptor)
>? crtl->max_used_stack_slot_alignment
>: crtl->stack_alignment_needed));
>
> diff --git a/gcc/testsuite/gcc.target/i386/pr68986-1.c b/gcc/testsuite/gcc.target/i386/pr68986-1.c
> new file mode 100644
> index 000..998f34f
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr68986-1.c
> @@ -0,0 +1,11 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target tls_native } */
> +/* { dg-require-effective-target fpic } */
> +/* { dg-options "-fPIC -mno-accumulate-outgoing-args -mpreferred-stack-boundary=5 -mincoming-stack-boundary=4" } */
> +
> +extern __thread int msgdata;
> +int
> +foo ()
> +{
> +  return msgdata;
> +}
> diff --git a/gcc/testsuite/gcc.target/i386/pr68986-2.c b/gcc/testsuite/gcc.target/i386/pr68986-2.c
> new file mode 100644
> index 000..23f9a52
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr68986-2.c
> @@ -0,0 +1,13 @@
> +/* { dg-do compile { target ia32 } } */
> +/* { dg-require-effective-target tls_n

Re: [PATCH] jit: Fix missing references to pthread in jit-playback.c

2016-01-26 Thread Iain Buclaw
On 26 January 2016 at 01:33, David Malcolm  wrote:
>
> On Sat, 2016-01-23 at 19:08 +0100, Iain Buclaw wrote:
> > Hi,
> >
> > I noticed when building from 2016-01-17 snapshot that the JIT frontend
> > failed to build.
> >
> > ---
> > jit-playback.c:2075:36: error: ‘PTHREAD_MUTEX_INITIALIZER’ was not
> > declared in this scope
> > jit-playback.c: In member function ‘void
> > gcc::jit::playback::context::acquire_mutex()’:
> > jit-playback.c:2086:33: error: ‘pthread_mutex_lock’ was not declared
> > in this scope
> > jit-playback.c: In member function ‘void
> > gcc::jit::playback::context::release_mutex()’:
> > jit-playback.c:2100:35: error: ‘pthread_mutex_unlock’ was not declared
> > in this scope
> > ---
> >
> > I'm not sure if this is something environmental on my side, or some
> > reorder/removals were done in the gcc headers included by the JIT
> > frontend, however this was needed in order to continue.
>
> Thanks.  Doko just reported the same issue, and I now see it (with
> r232813) so this isn't just at your end.
>
> OK for trunk.
>
> Dave
>

Thanks, I've committed this in.

Iain.


Re: [Patch, fortran] PR69385 - [6 regression] ICE on valid with -fcheck=mem

2016-01-26 Thread Paul Richard Thomas
Hi Janus,

Thanks!

In fact I was asking you if you would submit/commit in advance of my
return :-) I'll do the business tonight.

Cheers

Paul

On 25 January 2016 at 22:13, Janus Weil  wrote:
> Hi Paul,
>
> seems we were pretty well-synchronized in posting this (in the PR it
> sounded as if you wanted me to submit it ...)
>
> In any case, the patch is ok for my taste.
>
> Thanks!
>
> Cheers,
> Janus
>
>
>
> 2016-01-25 22:02 GMT+01:00 Paul Richard Thomas 
> :
>> Dear All,
>>
>> The initial report concerns initialization assignments that should be
>> excluded from the check for assignment of scalars to unallocated
>> arrays. This part is so trivial that it does not require a test. On
>> the other hand, the block that implemented the check was plain and
>> simple wrong and the rest of the patch corrects this. It is commented
>> such as to be fully comprehensible.
>>
>> Bootstrapped and regtested on FC21/x86_64 - OK for trunk and for
>> 5-branch when all the wrinkles (PR69422 and 69423) are sorted out?
>>
>> Cheers
>>
>> Paul
>>
>> 2016-01-25  Paul Thomas  
>>
>> PR fortran/69385
>> * trans-expr.c (gfc_trans_assignment_1): Exclude initialization
>> assignments from check on assignment of scalars to unassigned
>> arrays and correct wrong code within the corresponding block.
>>
>> 2016-01-25  Paul Thomas  
>>
>> PR fortran/69385
>> * gfortran.dg/allocate_error_6.f90: New test.



-- 
The difference between genius and stupidity is; genius has its limits.

Albert Einstein


Re: [PATCH][ARM,AARCH64] target/PR68674: relayout vector_types in expand_expr

2016-01-26 Thread Kyrill Tkachov


On 25/01/16 19:06, Christophe Lyon wrote:

On 22 January 2016 at 12:56, Richard Biener  wrote:

On Fri, Jan 22, 2016 at 12:41 PM, Christian Bruel
 wrote:


On 01/19/2016 04:18 PM, Richard Biener wrote:

maybe just if (currently_expanding_to_rtl)?

But yes, this looks like a safe variant of the fix.

Richard.


thanks, currently_expanding_to_rtl works perfectly. So the final version.
I added a test for each target.

Ok.


Hi,

This small patch is needed to make the new test pass on arm hard-float
targets (eg. arm-none-linux-gnueabihf).

I'm not sure it counts as obvious, so here it is.
OK?


Ok.

Thanks,
Kyrill


Christophe.

DATE  Christophe Lyon  

 * gcc.target/arm/pr68674.c: Check and use arm_fp effective target.



Thanks,
Richard.


bootstrapped / tested for :
 unix/-m32/-march=i586
 unix

 arm-qemu/
 arm-qemu//-mfpu=neon
 arm-qemu//-mfpu=neon-fp-armv8

 aarch64-qemu



Re: Speedup configure and build with system.h

2016-01-26 Thread Uros Bizjak
On Mon, Jan 25, 2016 at 2:53 PM, Michael Matz  wrote:
> Hi,
>
> On Mon, 25 Jan 2016, Uros Bizjak wrote:
>
>> This patch caused bootstrap failure on non-c++11 bootstrap compiler
>> [1], e.g. CentOS 5.11.
>>
>> The problem is with std::swap, which was defined in header <algorithm>
>> until c++11 [2].
>>
>> [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69464
>> [2] http://en.cppreference.com/w/cpp/algorithm/swap
>
> Meh.  Can you try the attached patch with a configure test (it includes
> the generated files)?  It works for me with 4.3.4, and should make your
> build include  always.

Yes, this patch works for me and allows bootstrap with gcc-4.1.2 to finish.

Thanks,
Uros.


Re: [PATCH] Fix handling of ZERO_EXTRACT lhs with REG_EQUAL note in the combiner (PR target/69442)

2016-01-26 Thread Bernd Schmidt

On 01/26/2016 09:39 AM, Jakub Jelinek wrote:


PR target/69442
* combine.c (combine_instructions): For REG_EQUAL note with
SET_DEST being ZERO_EXTRACT, also temporarily set SET_DEST
to the underlying register.
* doc/rtl.texi (REG_EQUAL): Document the behavior of
REG_EQUAL/REG_EQUIV notes if SET_DEST is ZERO_EXTRACT.

* gcc.dg/pr69442.c: New test.


Ok.


Bernd



(Non-)offloading diagnostics (was: [hsa 0/10] Merge of HSA branch)

2016-01-26 Thread Thomas Schwinge
Hi!

On Thu, 10 Dec 2015 18:51:48 +0100, Martin Jambor  wrote:
> On Mon, Dec 07, 2015 at 12:46:45PM +0100, Jakub Jelinek wrote:
> > On Mon, Dec 07, 2015 at 12:17:58PM +0100, Martin Jambor wrote:
> > > [...]  There are no failing
> > > testcases if HSA is not configured.  If it is, there are some, all of
> > > which fall into one of the following categories:
> > > 
> > >   1) HSA cannot compile a function for one reason or another (most
> > >  common cause is inability of HSA to take an address of a function
> > >  or make an indirect call) and gives a warning, which is regarded
> > >  as an "excess error" by dejagnu.

Confirmed:

[...]/gcc/testsuite/c-c++-common/gomp/clauses-1.c: In function 'bar._omp_fn.26.hsa.31':
cc1: warning: could not emit HSAIL for the function [-Whsa]
cc1: note: support for HSA does not implement non-gridified OpenMP parallel constructs.
[...]

..., and many more.  So, with --enable-offload-targets=[...],hsa we
regress (PASS -> FAIL; "test for excess errors") such compile tests.

> > It would be good if there is a -W* switch to turn such warnings off.
> > Not just for the purposes of dejagnu libgomp testing, but say one
> > might try to compile a program primarily say for XeonPhi or PTX offloading,
> > but have HSA enabled too, but care primarily about the former two, etc.
> 
> All these warnings are in the -Whsa group and can be suppressed with
> -Wno-hsa.

These compile tests are done without any -W* flags; -Whsa is enabled by
default.  How to address this mismatch?  Put -Wno-hsa into all regressing
test case files individually?  Run the affected testsuites with -Wno-hsa?
Not enable -Whsa by default (but I agree it's useful to users)?
(Instead, enable with -Wall, which any sane user should be specifying?)


A very similar problem also exists for nvptx offloading (Nathan CCed),
where we emit similar warnings (enabled by default).  As nvptx offloading
happens during link-time (not compile-time, as with hsa offloading),
these don't affect GCC's compile tests, but need to be worked around in
libgomp test cases.


Grüße
 Thomas




Re: [PATCH, libstdc++-v3] Fix import of wide character related symbols in stdlib.h wrapper

2016-01-26 Thread Jonathan Wakely

On 26/01/16 06:40 +0200, Andris Pavenis wrote:

include/c_compatibility/stdlib.h imports wide character related symbols
into the global namespace unconditionally, which causes the libstdc++-v3 build
to fail when one or both of _GLIBCXX_USE_WCHAR_T and _GLIBCXX_HAVE_MBSTATE_T
are not defined.

The included patch changes it to import them into the global namespace only
when they are defined in cstdlib.


OK for trunk, thanks.


Re: [PATCH] PR other/69006: fix extra newlines after diagnostics (v2)

2016-01-26 Thread Bernd Schmidt

On 01/25/2016 09:13 PM, David Malcolm wrote:

Here's an updated version of the patch.


Thanks!


Instead of testing one particular kind of output via a plugin,
this version of the patch adds code to gcc-dg-prune to issue a
FAIL for any testcase containing blank lines, with a new
   dg-allow-blank-lines-in-output
directive for those test cases that legitimately emit blank lines.
Examples of the latter include a test using -ftime-report, another
using -fdump-tree-cunrolli-details=stderr, and a Fortran test
using -fdump-fortran-original.


Is the =stderr test really necessary, or does it somehow predate the 
ability to scan dumps?



OK for trunk in stage 4?  I regard PR 69006 as a regression, and it
affects all diagnostics we output (unless caret-printing is
disabled).


Yes, I think so. Ok.


+  for (int row = layout.get_first_line ();
+   row <= last_line;
+   row++)


While you're fixing the layout here, see if that doesn't all fit on one 
line.



Bernd


Re: (Non-)offloading diagnostics (was: [hsa 0/10] Merge of HSA branch)

2016-01-26 Thread Alexander Monakov
On Tue, 26 Jan 2016, Thomas Schwinge wrote:
> A very similar problem also exists for nvptx offloading (Nathan CCed),
> where we emit similar warnings (enabled by default).  As nvptx offloading
> happens during link-time (not compile-time, as with hsa offloading),
> these don't affect GCC's compile tests, but need to be worked around in
> libgomp test cases.

Can you mention some examples please?  It's not clear to me what exactly you
have in mind.

Thanks.
Alexander


Re: [PATCH] pr69477 - attribute aligned documentation misleading

2016-01-26 Thread Bernd Schmidt

On 01/25/2016 11:13 PM, Martin Sebor wrote:

The attached patch adjusts the documentation of attribute aligned
and attribute pack so as to prevent misreading the text of the
former attribute as if it had read:

   Specifying attribute aligned for struct and union types is
   equivalent to specifying the packed attribute on each of
   the structure or union members. ...


This is OK. It looks like Sandra moved this by accident in r222714, it 
used to be in the right place in gcc-4.8. Probably the @opindex 
fshort-enums caused confusion.



Bernd


Re: (Non-)offloading diagnostics

2016-01-26 Thread Thomas Schwinge
Hi!

On Tue, 26 Jan 2016 14:18:31 +0300 (MSK), Alexander Monakov wrote:
> On Tue, 26 Jan 2016, Thomas Schwinge wrote:
> > A very similar problem also exists for nvptx offloading (Nathan CCed),
> > where we emit similar warnings (enabled by default).  As nvptx offloading
> > happens during link-time (not compile-time, as with hsa offloading),
> > these don't affect GCC's compile tests, but need to be worked around in
> > libgomp test cases.
> 
> Can you mention some examples please?  It's not clear to me what exactly you
> have in mind.

$ git grep -A3 warning upstream/gomp-4_0-branch -- gcc/config/nvptx/
upstream/gomp-4_0-branch:gcc/config/nvptx/nvptx.c:  warning_at (DECL_SOURCE_LOCATION (decl), 0,
upstream/gomp-4_0-branch:gcc/config/nvptx/nvptx.c-  "OpenACC kernels construct will be executed "
upstream/gomp-4_0-branch:gcc/config/nvptx/nvptx.c-  "sequentially; will by default avoid offloading "
upstream/gomp-4_0-branch:gcc/config/nvptx/nvptx.c-  "to prevent data copy penalty");
--
upstream/gomp-4_0-branch:gcc/config/nvptx/nvptx.c:  warning_at (decl ? DECL_SOURCE_LOCATION (decl) : UNKNOWN_LOCATION, 0,
upstream/gomp-4_0-branch:gcc/config/nvptx/nvptx.c-  dims[GOMP_DIM_VECTOR]
upstream/gomp-4_0-branch:gcc/config/nvptx/nvptx.c-  ? "using vector_length (%d), ignoring %d"
upstream/gomp-4_0-branch:gcc/config/nvptx/nvptx.c-  : "using vector_length (%d), ignoring runtime setting",
--
upstream/gomp-4_0-branch:gcc/config/nvptx/nvptx.c:  warning_at (decl ? DECL_SOURCE_LOCATION (decl) : UNKNOWN_LOCATION, 0,
upstream/gomp-4_0-branch:gcc/config/nvptx/nvptx.c-  "using num_workers (%d), ignoring %d",
upstream/gomp-4_0-branch:gcc/config/nvptx/nvptx.c-  PTX_WORKER_LENGTH, dims[GOMP_DIM_WORKER]);
upstream/gomp-4_0-branch:gcc/config/nvptx/nvptx.c-  dims[GOMP_DIM_WORKER] = PTX_WORKER_LENGTH;

(The latter two are present in trunk already.)


Grüße
 Thomas




[PATCH] Fix PR69452

2016-01-26 Thread Richard Biener

The following fixes PR69452 - we were using dom order to hoist
stmts and PHIs and expected that to preserve proper def order.
That obviously doesn't work - the following makes us use RPO order
instead.
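
As an illustration of the ordering property relied on here, the following is a minimal sketch of a reverse post-order computation on a toy CFG.  The struct and function names are hypothetical and unrelated to GCC's own pre_and_rev_post_order_compute_fn; the point is only that, ignoring back edges, RPO visits a block after all of its predecessors, whereas a dominator-tree walk may reach a join block before some of them.

```c
#define MAX_BLOCKS 8

/* Hypothetical toy CFG: succ[a][b] != 0 means there is an edge a -> b.  */
struct cfg {
  int n_blocks;
  int succ[MAX_BLOCKS][MAX_BLOCKS];
};

/* Depth-first search that records blocks in post-order.  */
static void dfs_post(const struct cfg *g, int bb, int *visited,
                     int *post, int *n_post)
{
  visited[bb] = 1;
  for (int s = 0; s < g->n_blocks; s++)
    if (g->succ[bb][s] && !visited[s])
      dfs_post(g, s, visited, post, n_post);
  post[(*n_post)++] = bb;
}

/* Fill RPO with the reverse post-order of the blocks reachable from
   ENTRY and return how many blocks were stored.  */
int compute_rpo(const struct cfg *g, int entry, int *rpo)
{
  int visited[MAX_BLOCKS] = { 0 };
  int post[MAX_BLOCKS];
  int n = 0;

  dfs_post(g, entry, visited, post, &n);
  for (int i = 0; i < n; i++)
    rpo[i] = post[n - 1 - i];
  return n;
}
```

For a diamond 0 -> 1, 0 -> 2, 1 -> 3, 2 -> 3, the join block 3 comes last in the result, after both 1 and 2.  In the dominator tree, 3 is a direct child of 0, so a dom walk is free to visit it before 1 or 2.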

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2016-01-26  Richard Biener  

PR tree-optimization/69452
* tree-ssa-loop-im.c (move_computations_dom_walker): Remove.
(move_computations_dom_walker::before_dom_children): Rename
to ...
(move_computations_worker): This.
(move_computations): Perform an RPO rather than a DOM walk.

* gcc.dg/torture/pr69452.c: New testcase.

Index: gcc/tree-ssa-loop-im.c
===
*** gcc/tree-ssa-loop-im.c  (revision 232792)
--- gcc/tree-ssa-loop-im.c  (working copy)
*** public:
*** 1112,1126 
 data stored in LIM_DATA structures associated with each statement.  
Callback
 for walk_dominator_tree.  */
  
! edge
! move_computations_dom_walker::before_dom_children (basic_block bb)
  {
struct loop *level;
unsigned cost = 0;
struct lim_aux_data *lim_data;
  
if (!loop_outer (bb->loop_father))
! return NULL;
  
for (gphi_iterator bsi = gsi_start_phis (bb); !gsi_end_p (bsi); )
  {
--- 1112,1127 
 data stored in LIM_DATA structures associated with each statement.  
Callback
 for walk_dominator_tree.  */
  
! unsigned int
! move_computations_worker (basic_block bb)
  {
struct loop *level;
unsigned cost = 0;
struct lim_aux_data *lim_data;
+   unsigned int todo = 0;
  
if (!loop_outer (bb->loop_father))
! return todo;
  
for (gphi_iterator bsi = gsi_start_phis (bb); !gsi_end_p (bsi); )
  {
*** move_computations_dom_walker::before_dom
*** 1171,1177 
  gimple_cond_lhs (cond), gimple_cond_rhs (cond));
  new_stmt = gimple_build_assign (gimple_phi_result (stmt),
  COND_EXPR, t, arg0, arg1);
! todo_ |= TODO_cleanup_cfg;
}
if (INTEGRAL_TYPE_P (TREE_TYPE (gimple_assign_lhs (new_stmt)))
  && (!ALWAYS_EXECUTED_IN (bb)
--- 1172,1178 
  gimple_cond_lhs (cond), gimple_cond_rhs (cond));
  new_stmt = gimple_build_assign (gimple_phi_result (stmt),
  COND_EXPR, t, arg0, arg1);
! todo |= TODO_cleanup_cfg;
}
if (INTEGRAL_TYPE_P (TREE_TYPE (gimple_assign_lhs (new_stmt)))
  && (!ALWAYS_EXECUTED_IN (bb)
*** move_computations_dom_walker::before_dom
*** 1266,1272 
else
gsi_insert_on_edge (e, stmt);
  }
!   return NULL;
  }
  
  /* Hoist the statements out of the loops prescribed by data stored in
--- 1267,1274 
else
gsi_insert_on_edge (e, stmt);
  }
! 
!   return todo;
  }
  
  /* Hoist the statements out of the loops prescribed by data stored in
*** move_computations_dom_walker::before_dom
*** 1275,1288 
  static unsigned int
  move_computations (void)
  {
!   move_computations_dom_walker walker (CDI_DOMINATORS);
!   walker.walk (cfun->cfg->x_entry_block_ptr);
  
gsi_commit_edge_inserts ();
if (need_ssa_update_p (cfun))
  rewrite_into_loop_closed_ssa (NULL, TODO_update_ssa);
  
!   return walker.todo_;
  }
  
  /* Checks whether the statement defining variable *INDEX can be hoisted
--- 1277,1296 
  static unsigned int
  move_computations (void)
  {
!   int *rpo = XNEWVEC (int, last_basic_block_for_fn (cfun));
!   int n = pre_and_rev_post_order_compute_fn (cfun, NULL, rpo, false);
!   unsigned todo = 0;
! 
!   for (int i = 0; i < n; ++i)
! todo |= move_computations_worker (BASIC_BLOCK_FOR_FN (cfun, rpo[i]));
! 
!   free (rpo);
  
gsi_commit_edge_inserts ();
if (need_ssa_update_p (cfun))
  rewrite_into_loop_closed_ssa (NULL, TODO_update_ssa);
  
!   return todo;
  }
  
  /* Checks whether the statement defining variable *INDEX can be hoisted
Index: gcc/testsuite/gcc.dg/torture/pr69452.c
===
*** gcc/testsuite/gcc.dg/torture/pr69452.c  (revision 0)
--- gcc/testsuite/gcc.dg/torture/pr69452.c  (working copy)
***
*** 0 
--- 1,35 
+ /* { dg-do compile } */
+ 
+ short a, f, h;
+ struct S0 {
+ int f0;
+ } b;
+ char c, d, e, j, k;
+ int g;
+ char fn1(char p1, int p2) { return 7 >> p2 ? p1 : p2; }
+ void fn2() {
+ int l, m, n;
+ struct S0 o = {0};
+ for (;;) {
+   int p = 1, r = e;
+   unsigned q = 6;
+   l = r == 0 ? q : q % r;
+   n = l;
+   c = f;
+   k = fn1(p, n ^ e);
+   char s = k;
+   j = s / 6;
+   if (j) {
+   int t = d, u = m = d ? t : t / d;
+   h = a || u;
+   b.f0 = h;
+   for (; d;)
+ ;
+   } else {
+   b = o;
+   if (d != g)
+ for (;;)

[PATCH] Fix PR69467

2016-01-26 Thread Richard Biener

The following guards X * CST CMP 0 similar to how we guarded other
compare patterns.

Yuri confirmed this solves the performance regression observed.
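
For illustration only, here is a small C sketch of the two situations the single_use guard distinguishes.  The function names are made up for this example, and the comments describe what the pattern is allowed to do under the guard, not what any particular GCC version actually emits.

```c
/* The product is used twice: in the comparison and as the return
   value.  With the single_use (@3) guard the comparison is left as
   t > 0, since the multiplication has to be computed anyway.  */
int scaled_if_positive(int x)
{
  int t = x * 4;   /* plays the role of (mult@3 @0 INTEGER_CST@1) */
  if (t > 0)       /* plays the role of (cmp ... integer_zerop@2) */
    return t;
  return 0;
}

/* The product is used only in the comparison, so (with signed
   overflow undefined) the pattern may still rewrite x * 4 > 0
   into x > 0.  */
int is_scaled_positive(int x)
{
  return x * 4 > 0;
}
```

Presumably the unguarded rewrite in the first function kept both x and the product live without saving any work, but that reading of the regression is an assumption on my part.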

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

2016-01-26  Richard Biener  

PR middle-end/69467
* match.pd: Guard X * CST CMP 0 pattern with single_use.

Index: gcc/match.pd
===
--- gcc/match.pd(revision 232792)
+++ gcc/match.pd(working copy)
@@ -1821,12 +1821,13 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 (for cmp (simple_comparison)
  scmp (swapped_simple_comparison)
  (simplify
-  (cmp (mult @0 INTEGER_CST@1) integer_zerop@2)
+  (cmp (mult@3 @0 INTEGER_CST@1) integer_zerop@2)
   /* Handle unfolded multiplication by zero.  */
   (if (integer_zerop (@1))
(cmp @1 @2)
(if (ANY_INTEGRAL_TYPE_P (TREE_TYPE (@0))
-   && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@0)))
+   && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@0))
+   && single_use (@3))
 /* If @1 is negative we swap the sense of the comparison.  */
 (if (tree_int_cst_sgn (@1) < 0)
  (scmp @0 @2)


[IA-64] Fix ICE on gcc.dg/vect/vect-cond-11.c

2016-01-26 Thread Eric Botcazou
I happened to note that there is an ICE in the C testsuite on IA-64 and that 
it is trivial to fix, so here is the result.

Tested on ia64-suse-linux, applied on the mainline and 5 branch as obvious.


2016-01-26  Eric Botcazou  

* config/ia64/ia64.c (ia64_expand_vecint_compare): Use gen_int_mode.

-- 
Eric Botcazou
Index: config/ia64/ia64.c
===
--- config/ia64/ia64.c	(revision 232773)
+++ config/ia64/ia64.c	(working copy)
@@ -1908,7 +1908,7 @@ ia64_expand_vecint_compare (enum rtx_cod
 
 	/* Subtract (-(INT MAX) - 1) from both operands to make
 	   them signed.  */
-	mask = GEN_INT (0x80000000);
+	mask = gen_int_mode (0x80000000, SImode);
 	mask = gen_rtx_CONST_VECTOR (V2SImode, gen_rtvec (2, mask, mask));
 	mask = force_reg (mode, mask);
 	t1 = gen_reg_rtx (mode);
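
A rough sketch of why the mode-aware constructor matters: a constant for a 32-bit mode is expected to be stored sign-extended in the wider host integer, so the mask with only bit 31 set (the (-(INT MAX) - 1) value from the comment) has to be stored as a negative number.  The helper below is a hypothetical stand-in for that canonicalization, fixed to 32 bits; it is not GCC's trunc_int_for_mode.

```c
#include <stdint.h>

/* Hypothetical helper: reduce VALUE to 32 bits and sign-extend the
   result into a 64-bit host integer.  */
int64_t canonical_si(int64_t value)
{
  uint32_t low = (uint32_t) value;   /* keep the low 32 bits */
  /* Flip bit 31, then subtract 2^31: a portable sign extension.  */
  return (int64_t) (low ^ 0x80000000u) - (int64_t) 0x80000000;
}
```

Under these assumptions canonical_si (0x80000000) is -2147483648, not the positive 2147483648 that the raw constant would give on a 64-bit host.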


Re: Patch RFA: Add option -fcollectible-pointers, use it in ivopts

2016-01-26 Thread Bernd Schmidt

On 01/25/2016 05:03 PM, Ian Lance Taylor wrote:

On Mon, Jan 25, 2016 at 3:39 AM, Bernd Schmidt  wrote:

On 01/23/2016 12:52 AM, Ian Lance Taylor wrote:


2016-01-22  Ian Lance Taylor  

* common.opt (fkeep-gc-roots-live): New option.
* tree-ssa-loop-ivopts.c (add_candidate_1): If
-fkeep-gc-roots-live, skip pointers.
(add_iv_candidate_for_biv): Handle add_candidate_1 returning
NULL.
* doc/invoke.texi (Optimize Options): Document
-fkeep-gc-roots-live.

gcc/testsuite/ChangeLog:

2016-01-22  Ian Lance Taylor  

* gcc.dg/tree-ssa/ivopt_5.c: New test.



Patch not attached?


The patch is there in the mailing list.  See the attachment on
https://gcc.gnu.org/ml/gcc-patches/2016-01/msg01781.html .


That seems to be the old patch. At least it doesn't seem to match the 
ChangeLog quoted above.



Bernd


Re: Wonly-top-basic-asm

2016-01-26 Thread Bernd Schmidt

On 01/26/2016 01:29 AM, Segher Boessenkool wrote:


In my opinion we should not warn for any asm that means the same both
as basic and as extended asm.  The problem then becomes, what *is* the
meaning of a basic asm, what does it clobber.


I think this may be too hard to figure out in general without parsing 
the asm string, which we don't really want to do.



Bernd


Re: [PATCH, PR69110] Don't return NULL access_fns in dr_analyze_indices

2016-01-26 Thread Tom de Vries

On 24/01/16 09:04, Richard Biener wrote:

On January 23, 2016 7:44:23 PM GMT+01:00, Sebastian Pop wrote:

On Sat, Jan 23, 2016 at 12:28 PM, Tom de Vries wrote:

That was my original patch, and Richard commented: 'I think avoiding a NULL
access_fns is ok but it should be done unconditionally, not only for the
DECL_P case'. In other words, he asked me to do the exact opposite of the
change you now propose.



In the case of a DECL_P it is correct to say that it has an access
function of 0.
In the graphite testcase it is not correct to say that the access
function for a given data reference is zero:
we only initialize access_fns in the case of a polynomial chrec:

  if (TREE_CODE (ref) == MEM_REF)
{
  op = TREE_OPERAND (ref, 0);
  access_fn = analyze_scalar_evolution (loop, op);
  access_fn = instantiate_scev (before_loop, loop, access_fn);
  if (TREE_CODE (access_fn) == POLYNOMIAL_CHREC)
{
[...]
   access_fns.safe_push (access_fn);
}
}

In all other cases we may not have a representation of the access
functions.
It is incorrect to initialize to "A[0]" all those data references that
cannot be analyzed.


But does it matter as the base will not be equal with one that can be analyzed?



I'd like to propose a different fix.

I think the root cause of the problem is as follows:

The semantics of DDR_ARE_DEPENDENT is:
...
when "ARE_DEPENDENT == NULL_TREE", there exist a dependence
relation between A and B, and the description of this relation
is given in the SUBSCRIPTS array
...

When A and B have DR_NUM_DIMENSIONS == 0, 
initialize_data_dependence_relation can create a ddr with 
DDR_NUM_SUBSCRIPTS == 0, and in the case of our test-case, it does.


I think this is the root cause: initialize_data_dependence_relation 
creates a ddr with DDR_ARE_DEPENDENT (ddr) == NULL_TREE and 
DDR_NUM_SUBSCRIPTS (ddr) == 0, which violates the semantics of 
DDR_ARE_DEPENDENT (ddr) == NULL_TREE.


[ There is the case of non-loop dependence analysis (tested for by 
loop_nest.exists ()), where DR_NUM_DIMENSIONS == 0 for all data 
references, that seems to be an exception. ]


The patch fixes the root cause of the problem by handling 
DR_NUM_DIMENSIONS == 0 in initialize_data_dependence_relation.
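
The intended invariant can be sketched with a toy model.  All names below are hypothetical and are not GCC's data structures, and the model collapses the two places the patch touches into a single check: inside a loop nest, a reference with zero dimensions is never reported as "dependent, described by subscripts", because there would be no subscripts to describe it.

```c
/* Toy model, not GCC code.  */
enum ddr_result {
  DDR_DESCRIBED_BY_SUBSCRIPTS,  /* models DDR_ARE_DEPENDENT == NULL_TREE */
  DDR_DONT_KNOW                 /* models DDR_ARE_DEPENDENT == chrec_dont_know */
};

struct dr_model {
  int num_dimensions;           /* models DR_NUM_DIMENSIONS */
  int base_invariant_in_loop;   /* models object_address_invariant_in_loop_p */
};

/* HAVE_LOOP_NEST models loop_nest.exists ().  */
enum ddr_result init_ddr_model(const struct dr_model *a, int have_loop_nest)
{
  if (have_loop_nest
      && (!a->base_invariant_in_loop || a->num_dimensions == 0))
    return DDR_DONT_KNOW;
  return DDR_DESCRIBED_BY_SUBSCRIPTS;
}
```

In this model a scalar such as the global i from the test case (zero dimensions, invariant base) gives DDR_DONT_KNOW inside a loop nest, while the non-loop case mentioned above is left alone.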


OK for trunk, 5.0, 4.9, if bootstrap/reg-test succeeds?

Thanks,
- Tom


Handle DR_NUM_DIMENSIONS == 0 in initialize_data_dependence_relation

2016-01-12  Tom de Vries  

	* tree-data-ref.c (initialize_data_dependence_relation): Handle
	DR_NUM_DIMENSIONS == 0.

	* gcc.dg/autopar/pr69110.c: New test.

	* testsuite/libgomp.c/pr69110.c: New test.

---
 gcc/testsuite/gcc.dg/autopar/pr69110.c | 17 +
 gcc/tree-data-ref.c| 10 ++
 libgomp/testsuite/libgomp.c/pr69110.c  | 26 ++
 3 files changed, 49 insertions(+), 4 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/autopar/pr69110.c b/gcc/testsuite/gcc.dg/autopar/pr69110.c
new file mode 100644
index 000..27cdae5
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/autopar/pr69110.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-O1 -ftree-parallelize-loops=2 -fno-tree-loop-im -fdump-tree-parloops2-details" } */
+
+#define N 1000
+
+unsigned int i = 0;
+
+void
+foo (void)
+{
+  unsigned int z;
+  for (z = 0; z < N; ++z)
+    ++i;
+}
+
+/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 0 "parloops2" } } */
+/* { dg-final { scan-tree-dump-times "FAILED: data dependencies exist across iterations" 1 "parloops2" } } */
diff --git a/gcc/tree-data-ref.c b/gcc/tree-data-ref.c
index a40f40d..4c29fc2 100644
--- a/gcc/tree-data-ref.c
+++ b/gcc/tree-data-ref.c
@@ -1510,8 +1510,9 @@ initialize_data_dependence_relation (struct data_reference *a,
   if (operand_equal_p (DR_REF (a), DR_REF (b), 0))
 {
  if (loop_nest.exists ()
-&& !object_address_invariant_in_loop_p (loop_nest[0],
-   	DR_BASE_OBJECT (a)))
+	 && (!object_address_invariant_in_loop_p (loop_nest[0],
+		  DR_BASE_OBJECT (a))
+	 || DR_NUM_DIMENSIONS (a) == 0))
   {
 DDR_ARE_DEPENDENT (res) = chrec_dont_know;
 return res;
@@ -1548,8 +1549,9 @@ initialize_data_dependence_relation (struct data_reference *a,
  analyze it.  TODO -- in fact, it would suffice to record that there may
  be arbitrary dependences in the loops where the base object varies.  */
   if (loop_nest.exists ()
-  && !object_address_invariant_in_loop_p (loop_nest[0],
- 	  DR_BASE_OBJECT (a)))
+  && (!object_address_invariant_in_loop_p (loop_nest[0],
+	   DR_BASE_OBJECT (a))
+	  || DR_NUM_DIMENSIONS (a) == 0))
 {
   DDR_ARE_DEPENDENT (res) = chrec_dont_know;
   return res;
diff --git a/libgomp/testsuite/libgomp.c/pr69110.c b/libgomp/testsuite/libgomp.c/pr69110.c
new file mode 100644
index 000..0d9e5ca
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c/pr69110.c
@@ -0,0 +1,26 @@
+/* { dg-do run } */
+/* { dg-options "-ftree-parallelize-loops=

Re: [PING][PATCH] Mark symbols in offload tables with force_output in read_offload_tables

2016-01-26 Thread Tom de Vries

On 25/01/16 14:27, Ilya Verbin wrote:

Hi!

On Tue, Jan 05, 2016 at 15:56:15 +0100, Tom de Vries wrote:

diff --git a/gcc/lto-cgraph.c b/gcc/lto-cgraph.c
index 62e5454..cdaee41 100644
--- a/gcc/lto-cgraph.c
+++ b/gcc/lto-cgraph.c
@@ -1911,6 +1911,11 @@ input_offload_tables (void)
  tree fn_decl
= lto_file_decl_data_get_fn_decl (file_data, decl_index);
  vec_safe_push (offload_funcs, fn_decl);
+
+ /* Prevent IPA from removing fn_decl as unreachable, since there
+may be no refs from the parent function to child_fn in offload
+LTO mode.  */
+ cgraph_node::get (fn_decl)->mark_force_output ();
}
  else if (tag == LTO_symtab_variable)
{
@@ -1918,6 +1923,10 @@ input_offload_tables (void)
  tree var_decl
= lto_file_decl_data_get_var_decl (file_data, decl_index);
  vec_safe_push (offload_vars, var_decl);
+
+ /* Prevent IPA from removing var_decl as unused, since there
+may be no refs to var_decl in offload LTO mode.  */
+ varpool_node::get (var_decl)->force_output = 1;
}


This doesn't work when there is more than one LTO partition, because only the
first partition contains the full offload table (to maintain the correct order),
but cgraph and varpool nodes aren't necessarily created for the first partition.
To reproduce:

$ make check-target-libgomp RUNTESTFLAGS="c.exp=for-* --target_board=unix/-flto"
FAIL: libgomp.c/for-3.c (internal compiler error)
FAIL: libgomp.c/for-5.c (internal compiler error)
FAIL: libgomp.c/for-6.c (internal compiler error)
$ make check-target-libgomp RUNTESTFLAGS="c++.exp=for-* --target_board=unix/-flto"
FAIL: libgomp.c++/for-11.C (internal compiler error)
FAIL: libgomp.c++/for-13.C (internal compiler error)
FAIL: libgomp.c++/for-14.C (internal compiler error)


This works for me.

OK for trunk?

Thanks,
- Tom

Check that cgraph/varpool_node exists before use in input_offload_tables

2016-01-26  Tom de Vries  

	* lto-cgraph.c (input_offload_tables): Check that cgraph/varpool_node
	exists before use.

---
 gcc/lto-cgraph.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/gcc/lto-cgraph.c b/gcc/lto-cgraph.c
index 0634779..f4bcbaa 100644
--- a/gcc/lto-cgraph.c
+++ b/gcc/lto-cgraph.c
@@ -1915,7 +1915,9 @@ input_offload_tables (void)
 	  /* Prevent IPA from removing fn_decl as unreachable, since there
 		 may be no refs from the parent function to child_fn in offload
 		 LTO mode.  */
-	  cgraph_node::get (fn_decl)->mark_force_output ();
+	  cgraph_node *node = cgraph_node::get (fn_decl);
+	  if (node)
+		node->mark_force_output ();
 	}
 	  else if (tag == LTO_symtab_variable)
 	{
@@ -1926,7 +1928,9 @@ input_offload_tables (void)
 
 	  /* Prevent IPA from removing var_decl as unused, since there
 		 may be no refs to var_decl in offload LTO mode.  */
-	  varpool_node::get (var_decl)->force_output = 1;
+	  varpool_node *node = varpool_node::get (var_decl);
+	  if (node)
+		node->force_output = 1;
 	}
 	  else
 	fatal_error (input_location,


Re: [PATCH 4/4] Un-XFAIL ssa-dom-cse-2.c for most platforms

2016-01-26 Thread Dominik Vogt
On Mon, Dec 21, 2015 at 01:13:28PM +, Alan Lawrence wrote:
> ...the test passes with --param sra-max-scalarization-size-Ospeed.
> 
> Verified on aarch64 and with stage1 compiler for hppa, powerpc, sparc, s390.

How did you test this on s390?  For me, the test still fails
unless I add -march=z13 (s390x).

> gcc/testsuite/ChangeLog:
> 
>   * gcc.dg/tree-ssa/ssa-dom-cse-2.c: Remove XFAIL for powerpc(32), hppa,
>   aarch64, sparc, s390. Add --param sra-max-scalarization-size-Ospeed.
> ---
>  gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-2.c | 9 +
>  1 file changed, 5 insertions(+), 4 deletions(-)
> 
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-2.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-2.c
> index 9eccdc9..748448e 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-2.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-2.c
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-options "-O3 -fno-tree-fre -fno-tree-pre -fdump-tree-optimized" } */
> +/* { dg-options "-O3 -fno-tree-fre -fno-tree-pre -fdump-tree-optimized 
> --param sra-max-scalarization-size-Ospeed=32" } */
>  
>  int
>  foo ()
> @@ -17,7 +17,8 @@ foo ()
>  /* After late unrolling the above loop completely DOM should be
> able to optimize this to return 28.  */
>  
> -/* See PR63679 and PR64159, if the target forces the initializer to memory 
> then
> -   DOM is not able to perform this optimization.  */
> +/* On alpha, the vectorizer generates writes of two vector elements at once,
> +   but the loop reads only one element at a time, and DOM cannot resolve 
> these.
> +   The same happens on powerpc depending on the SIMD support available.  */
>  
> -/* { dg-final { scan-tree-dump "return 28;" "optimized" { xfail aarch64*-*-* 
> alpha*-*-* hppa*-*-* powerpc*-*-* sparc*-*-* s390*-*-* } } } */
> +/* { dg-final { scan-tree-dump "return 28;" "optimized" { xfail alpha*-*-* 
> powerpc64*-*-* } } } */
> -- 
> 1.9.1


Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany



[Fortran, gcc-5, patch, pr69268, v1] [5 Regression] Sourced allocation calls function twice

2016-01-26 Thread Andre Vehreschild
Hi all,

please find attached a patch that fixes the source= expression of an
allocate() being evaluated twice in gcc-5. The patch is a combination
and partial backport of several mainline PRs (among others pr44672 and
pr65548).
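
The double evaluation can be illustrated with a small C analogue. This is only a sketch of the general idea (evaluate the source once into a temporary and reuse it); the toy_* names are hypothetical and the code does not mirror what gfc_trans_allocate actually generates.

```c
#include <assert.h>

/* Counts how often the "source expression" has been evaluated.  */
static int toy_calls;

/* Stand-in for a source= expression with a visible side effect.  */
static int
toy_source (void)
{
  return ++toy_calls;
}

struct toy_obj
{
  int len;
  int value;
};

/* Problematic shape: the source is evaluated once for the length and
   a second time for the value.  */
static void
toy_allocate_twice (struct toy_obj *o)
{
  o->len = toy_source ();
  o->value = toy_source ();
}

/* Fixed shape: evaluate once into a temporary, then reuse it.  */
static void
toy_allocate_once (struct toy_obj *o)
{
  int tmp = toy_source ();
  o->len = tmp;
  o->value = tmp;
}
```

The second variant calls toy_source exactly once, so the length and the value are guaranteed to come from the same evaluation.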

The patch required adapting the counts of builtin_mallocs/frees in
allocatable_scalar_13. There are now fewer calls to the
memory management routines. Valgrind does not report any memory issues
in the modified code, but that does not mean there aren't any. I am
happy to hear about any issues this patch causes (I am still having
trouble getting the sanitizer to work).

Bootstrapped and regtested on x86_64-linux-gnu/F23.

OK for gcc-5-branch?

Regards,
Andre
-- 
Andre Vehreschild * Email: vehre ad gmx dot de 
gcc/testsuite/ChangeLog:

2016-01-26  Andre Vehreschild  

* gfortran.dg/allocatable_scalar_13.f90: Fixing counts of malloc/
free to fit the actual number of calls.
* gfortran.dg/allocate_with_source_16.f90: New test.


gcc/fortran/ChangeLog:

2016-01-26  Andre Vehreschild  

* trans-stmt.c (gfc_trans_allocate): Make sure the source=
expression is evaluated once only. Use gfc_trans_assignment ()
instead of explicitly calling gfc_trans_string_copy () to
reduce the code complexity in trans_allocate.

diff --git a/gcc/fortran/trans-stmt.c b/gcc/fortran/trans-stmt.c
index 68601f6..0be92cd 100644
--- a/gcc/fortran/trans-stmt.c
+++ b/gcc/fortran/trans-stmt.c
@@ -5108,7 +5108,7 @@ tree
 gfc_trans_allocate (gfc_code * code)
 {
   gfc_alloc *al;
-  gfc_expr *expr;
+  gfc_expr *expr, *e3rhs = NULL;
   gfc_se se, se_sz;
   tree tmp;
   tree parm;
@@ -5130,6 +5130,7 @@ gfc_trans_allocate (gfc_code * code)
   stmtblock_t post;
   tree nelems;
   bool upoly_expr, tmp_expr3_len_flag = false, al_len_needs_set;
+  gfc_symtree *newsym = NULL;
 
   if (!code->ext.alloc.list)
 return NULL_TREE;
@@ -5239,16 +5240,28 @@ gfc_trans_allocate (gfc_code * code)
 	 false, false);
 	  gfc_add_block_to_block (&block, &se.pre);
 	  gfc_add_block_to_block (&post, &se.post);
-	  /* Prevent aliasing, i.e., se.expr may be already a
-		 variable declaration.  */
+
 	  if (!VAR_P (se.expr))
 		{
+		  tree var;
+
 		  tmp = build_fold_indirect_ref_loc (input_location,
 		 se.expr);
-		  tmp = gfc_evaluate_now (tmp, &block);
+
+		  /* We need a regular (non-UID) symbol here, therefore give a
+		 prefix.  */
+		  var = gfc_create_var (TREE_TYPE (tmp), "source");
+		  if (GFC_DESCRIPTOR_TYPE_P (TREE_TYPE (tmp)))
+		{
+		  gfc_allocate_lang_decl (var);
+		  GFC_DECL_SAVED_DESCRIPTOR (var) = GFC_DECL_SAVED_DESCRIPTOR (tmp);
+		}
+		  gfc_add_modify_loc (input_location, &block, var, tmp);
+		  tmp = var;
 		}
 	  else
 		tmp = se.expr;
+
 	  if (!code->expr3->mold)
 		expr3 = tmp;
 	  else
@@ -5357,6 +5370,71 @@ gfc_trans_allocate (gfc_code * code)
 	  else
 	expr3_esize = TYPE_SIZE_UNIT (
 		  gfc_typenode_for_spec (&code->expr3->ts));
+
+	  /* The routine gfc_trans_assignment () already implements all
+	 techniques needed.  Unfortunately we may have a temporary
+	 variable for the source= expression here.  When that is the
+	 case convert this variable into a temporary gfc_expr of type
+	 EXPR_VARIABLE and used it as rhs for the assignment.  The
+	 advantage is, that we get scalarizer support for free,
+	 don't have to take care about scalar to array treatment and
+	 will benefit of every enhancements gfc_trans_assignment ()
+	 gets.
+	 Exclude variables since the following block does not handle
+	 array sections.  In any case, there is no harm in sending
+	 variables to gfc_trans_assignment because there is no
+	 evaluation of variables.  */
+	  if (code->expr3->expr_type != EXPR_VARIABLE
+	  && code->expr3->mold != 1 && expr3 != NULL_TREE
+	  && DECL_P (expr3) && DECL_ARTIFICIAL (expr3))
+	{
+	  /* Build a temporary symtree and symbol.  Do not add it to
+		 the current namespace to prevent accidently modifying
+		 a colliding symbol's as.  */
+	  newsym = XCNEW (gfc_symtree);
+	  /* The name of the symtree should be unique, because
+		 gfc_create_var () took care about generating the
+		 identifier.  */
+	  newsym->name = gfc_get_string (IDENTIFIER_POINTER (
+	   DECL_NAME (expr3)));
+	  newsym->n.sym = gfc_new_symbol (newsym->name, NULL);
+	  /* The backend_decl is known.  It is expr3, which is inserted
+		 here.  */
+	  newsym->n.sym->backend_decl = expr3;
+	  e3rhs = gfc_get_expr ();
+	  e3rhs->ts = code->expr3->ts;
+	  e3rhs->rank = code->expr3->rank;
+	  e3rhs->symtree = newsym;
+	  /* Mark the symbol referenced or gfc_trans_assignment will
+		 bug.  */
+	  newsym->n.sym->attr.referenced = 1;
+	  e3rhs->expr_type = EXPR_VARIABLE;
+	  e3rhs->where = code->expr3->where;
+	  /* Set the symbols type, upto it was BT_UNKNOWN.  */

[PING^3][PATCH, 12/16] Handle acc loop directive

2016-01-26 Thread Tom de Vries

On 18/01/16 15:27, Tom de Vries wrote:

On 24/11/15 13:26, Tom de Vries wrote:

On 09/11/15 21:06, Tom de Vries wrote:

On 09/11/15 16:35, Tom de Vries wrote:

Hi,

this patch series for stage1 trunk adds support to:
- parallelize oacc kernels regions using parloops, and
- map the loops onto the oacc gang dimension.

The patch series contains these patches:

  1  Insert new exit block only when needed in
     transform_to_exit_first_loop_alt
  2  Make create_parallel_loop return void
  3  Ignore reduction clause on kernels directive
  4  Implement -foffload-alias
  5  Add in_oacc_kernels_region in struct loop
  6  Add pass_oacc_kernels
  7  Add pass_dominator_oacc_kernels
  8  Add pass_ch_oacc_kernels
  9  Add pass_parallelize_loops_oacc_kernels
 10  Add pass_oacc_kernels pass group in passes.def
 11  Update testcases after adding kernels pass group
 12  Handle acc loop directive
 13  Add c-c++-common/goacc/kernels-*.c
 14  Add gfortran.dg/goacc/kernels-*.f95
 15  Add libgomp.oacc-c-c++-common/kernels-*.c
 16  Add libgomp.oacc-fortran/kernels-*.f95

The first 9 patches are more or less independent, but patches 10-16 are
intended to be committed at the same time.

Bootstrapped and reg-tested on x86_64.

Built and reg-tested with nvidia accelerator, in combination with a
patch that enables accelerator testing (which is submitted at
https://gcc.gnu.org/ml/gcc-patches/2015-10/msg01771.html ).

I'll post the individual patches in reply to this message.


this patch deals with loops in an oacc kernels region which are
annotated using "#pragma acc loop". It expands such a loop as a normal
loop, which has the effect of ignoring the "#pragma acc loop".







Ping^3. ( https://gcc.gnu.org/ml/gcc-patches/2015-11/msg01089.html )

Thanks,
- Tom




Re: [PING^3][PATCH, 12/16] Handle acc loop directive

2016-01-26 Thread Jakub Jelinek
On Tue, Jan 26, 2016 at 01:38:39PM +0100, Tom de Vries wrote:
> Ping^3. ( https://gcc.gnu.org/ml/gcc-patches/2015-11/msg01089.html )

First of all, I wonder if it wouldn't be far easier to handle these during
gimplification rather than during omp expansion or during parsing.  Inside
kernels, do you need to honor any clauses on the acc loop, like
privatization etc., or can you just ignore it altogether (after parsing them
to ensure they are valid)?
Handling this in expand_omp_for_generic is not really nice, because it will
make an already very complicated function even more complex.
   gomp_ordered *ord_stmt;
+
+  /* True if this is nested inside an OpenACC kernels construct.  */
+  bool inside_kernels_p;
 };

is bad placement, there are other bool/unsigned char fields earlier and the
smaller fields should be adjacent for smaller padding of the struct.
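
The padding argument can be checked with a small sketch. The two structs below are hypothetical and unrelated to the real struct in omp-low.c; they only contrast small fields scattered between pointers with the same fields kept adjacent.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Small fields scattered between pointer-sized ones: each group of
   bools is typically followed by padding up to pointer alignment.  */
struct toy_scattered
{
  void *a;
  bool f1;
  bool f2;
  void *b;
  bool f3;
};

/* Same fields, with the small ones adjacent so that they can share
   a single padded slot.  */
struct toy_grouped
{
  void *a;
  void *b;
  bool f1;
  bool f2;
  bool f3;
};

/* Bytes saved by grouping; the exact value depends on the target ABI.  */
static size_t
toy_padding_saved (void)
{
  return sizeof (struct toy_scattered) - sizeof (struct toy_grouped);
}
```

On a typical LP64 target this is likely 32 versus 24 bytes, but the exact numbers are ABI-dependent; the grouped layout should never come out larger.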

Jakub


[PATCH ARM 0/2] Add new mexecute-only arm option.

2016-01-26 Thread Mickael Guene
 Hi everybody,

  This is a proposal for a patch set that adds a new -mexecute-only arm
option for profile M targets.
  Some STM32 MCUs implement a security feature called 'Proprietary Code
Read-Out Protection' aka PCROP that forbids data read access to some
code areas (only fetch access is allowed).
This protection prevents usage of literal pools (since one cannot load
data from code sections), so compilers have to use a specific code
sequence to generate constants.
 The first patch adds generic support for the new binutils section letter
'y', which marks a section as non-readable.
 The second patch adds the new -mexecute-only arm option. This option disables
all memory reads from the text section, and takes care to emit the corresponding
code in 'y' sections.
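
To illustrate what generating a constant without a literal pool can look like, here is a sketch in plain C. It is an assumption about the general technique (build the value byte by byte from 8-bit immediates, shifts and adds), not the sequence the patch emits; toy_build_const is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Rebuild VALUE using only 8-bit immediates, left shifts and adds,
   i.e. operations that need no data load from the text section.
   *N_OPS receives the number of modelled instructions.  */
static uint32_t
toy_build_const (uint32_t value, unsigned *n_ops)
{
  uint32_t reg = (value >> 24) & 0xff;	/* mov reg, #imm8 */
  unsigned ops = 1;

  for (int shift = 16; shift >= 0; shift -= 8)
    {
      reg <<= 8;			/* lsl reg, reg, #8 */
      reg += (value >> shift) & 0xff;	/* add reg, #imm8 */
      ops += 2;
    }

  *n_ops = ops;
  return reg;
}
```

This naive form always uses seven steps for a 32-bit value; a real code generator would presumably skip zero bytes and prefer shorter sequences where the target allows them.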

 Unit tests have been written to check for correct code generation.

 No regressions have been observed for aarch64-none-elf, aarch64-none-linux-gnu,
aarch64_be-none-elf, arm-none-eabi, arm-none-linux-gnueabi,
arm-none-linux-gnueabihf and armeb-none-linux-gnueabihf.

Mickael Guene (2):
  Add support for section attribute letter 'y' when available
  Add -mexecute-only option.

 gcc/config.in  |   6 ++
 gcc/config/arm/arm-protos.h|   1 +
 gcc/config/arm/arm.c   | 114 +++--
 gcc/config/arm/arm.md  |   2 +-
 gcc/config/arm/arm.opt |   4 +
 gcc/config/arm/thumb1.md   |  71 +++--
 gcc/configure  |  34 +-
 gcc/configure.ac   |   6 ++
 gcc/doc/invoke.texi|   7 ++
 gcc/output.h   |   3 +-
 .../gcc.target/arm/thumb1-execute-only-switch.c|  23 +
 gcc/testsuite/gcc.target/arm/thumb1-execute-only.c |  69 +
 .../gcc.target/arm/thumb2-execute-only-switch.c|  23 +
 gcc/testsuite/gcc.target/arm/thumb2-execute-only.c |  68 
 gcc/varasm.c   |   6 +-
 15 files changed, 420 insertions(+), 17 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/arm/thumb1-execute-only-switch.c
 create mode 100644 gcc/testsuite/gcc.target/arm/thumb1-execute-only.c
 create mode 100644 gcc/testsuite/gcc.target/arm/thumb2-execute-only-switch.c
 create mode 100644 gcc/testsuite/gcc.target/arm/thumb2-execute-only.c

-- 
2.7.0.rc3



[PATCH ARM 1/2] Add support for section attribute letter 'y' when available

2016-01-26 Thread Mickael Guene
gcc/ChangeLog:

* configure.ac: Add detection of letter y support in assembler.
* config.in: Regenerate.
* configure: Regenerate.
* output.h (SECTION_NOREAD): Add new bit flag.
* varasm.c (default_elf_asm_named_section): Set y letter when we detect
SECTION_NOREAD.
---
 gcc/config.in|  6 ++
 gcc/configure| 34 +-
 gcc/configure.ac |  6 ++
 gcc/output.h |  3 ++-
 gcc/varasm.c |  6 +-
 5 files changed, 52 insertions(+), 3 deletions(-)

diff --git a/gcc/config.in b/gcc/config.in
index 1796e1d..2aa2d1a 100644
--- a/gcc/config.in
+++ b/gcc/config.in
@@ -1266,6 +1266,12 @@
 #endif
 
 
+/* Define if your assembler supports specifying the section flag y. */
+#ifndef USED_FOR_TARGET
+#undef HAVE_GAS_SECTION_NOREAD
+#endif
+
+
 /* Define 0/1 if your assembler supports marking sections with SHF_MERGE flag.
*/
 #ifndef USED_FOR_TARGET
diff --git a/gcc/configure b/gcc/configure
index ff646e8..e543732 100755
--- a/gcc/configure
+++ b/gcc/configure
@@ -22365,7 +22365,6 @@ else
 $as_echo "$gcc_cv_readelf" >&6; }
 fi
 
-# Figure out what assembler alignment features are present.
 { $as_echo "$as_me:${as_lineno-$LINENO}: checking assembler flags" >&5
 $as_echo_n "checking assembler flags... " >&6; }
 if test "${gcc_cv_as_flags+set}" = set; then :
@@ -22392,6 +22391,39 @@ fi
 { $as_echo "$as_me:${as_lineno-$LINENO}: result: $gcc_cv_as_flags" >&5
 $as_echo "$gcc_cv_as_flags" >&6; }
 
+{ $as_echo "$as_me:${as_lineno-$LINENO}: checking assembler for .section with 
y" >&5
+$as_echo_n "checking assembler for .section with y... " >&6; }
+if test "${gcc_cv_as_section_has_y+set}" = set; then :
+  $as_echo_n "(cached) " >&6
+else
+  gcc_cv_as_section_has_y=no
+  if test x$gcc_cv_as != x; then
+$as_echo '.section foo1,"y"
+.byte 0,0,0,0' > conftest.s
+if { ac_try='$gcc_cv_as $gcc_cv_as_flags  -o conftest.o conftest.s >&5'
+  { { eval echo "\"\$as_me\":${as_lineno-$LINENO}: \"$ac_try\""; } >&5
+  (eval $ac_try) 2>&5
+  ac_status=$?
+  $as_echo "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5
+  test $ac_status = 0; }; }
+then
+   gcc_cv_as_section_has_y=yes
+else
+  echo "configure: failed program was" >&5
+  cat conftest.s >&5
+fi
+rm -f conftest.o conftest.s
+  fi
+fi
+{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $gcc_cv_as_section_has_y" >&5
+$as_echo "$gcc_cv_as_section_has_y" >&6; }
+if test $gcc_cv_as_section_has_y = yes; then
+
+$as_echo "#define HAVE_GAS_SECTION_NOREAD 1" >>confdefs.h
+
+fi
+
+# Figure out what assembler alignment features are present.
 { $as_echo "$as_me:${as_lineno-$LINENO}: checking assembler for .balign and 
.p2align" >&5
 $as_echo_n "checking assembler for .balign and .p2align... " >&6; }
 if test "${gcc_cv_as_balign_and_p2align+set}" = set; then :
diff --git a/gcc/configure.ac b/gcc/configure.ac
index 4dc7c10..d1717bf 100644
--- a/gcc/configure.ac
+++ b/gcc/configure.ac
@@ -2446,6 +2446,12 @@ else
AC_MSG_RESULT($gcc_cv_readelf)
 fi
 
+gcc_GAS_CHECK_FEATURE([.section with y], gcc_cv_as_section_has_y,,,
+[.section foo1,"y"
+.byte 0,0,0,0],,
+[AC_DEFINE(HAVE_GAS_SECTION_NOREAD, 1,
+  [Define if your assembler supports specifying the section flag y.])])
+
 # Figure out what assembler alignment features are present.
 gcc_GAS_CHECK_FEATURE([.balign and .p2align], gcc_cv_as_balign_and_p2align,
  [2,6,0],,
diff --git a/gcc/output.h b/gcc/output.h
index 0924499..1df3088 100644
--- a/gcc/output.h
+++ b/gcc/output.h
@@ -381,7 +381,8 @@ extern void no_asm_to_stream (FILE *);
 #define SECTION_COMMON   0x80  /* contains common data */
 #define SECTION_RELRO   0x100  /* data is readonly after relocation processing */
 #define SECTION_EXCLUDE  0x200 /* discarded by the linker */
-#define SECTION_MACH_DEP 0x400 /* subsequent bits reserved for target */
+#define SECTION_NOREAD   0x400 /* section cannot be read but can be executed */
+#define SECTION_MACH_DEP 0x800 /* subsequent bits reserved for target */
 
 /* This SECTION_STYLE is used for unnamed sections that we can switch
to using a special assembler directive.  */
diff --git a/gcc/varasm.c b/gcc/varasm.c
index 3a3573e..c0499b1 100644
--- a/gcc/varasm.c
+++ b/gcc/varasm.c
@@ -6233,7 +6233,7 @@ void
 default_elf_asm_named_section (const char *name, unsigned int flags,
   tree decl)
 {
-  char flagchars[10], *f = flagchars;
+  char flagchars[11], *f = flagchars;
 
   /* If we have already declared this section, we can use an
  abbreviated form to switch back to it -- unless this section is
@@ -6266,6 +6266,10 @@ default_elf_asm_named_section (const char *name, 
unsigned int flags,
 *f++ = TLS_SECTION_ASM_FLAG;
   if (HAVE_COMDAT_GROUP && (flags & SECTION_LINKONCE))
 *f++ = 'G';
+#if defined (HAVE_GAS_SECTION_NOREAD) && HAVE_GAS_SECTION_NOREAD == 1
+  if (flags & SECTION_NOREAD)
+*f++ = 'y';
+#endif
   *f = '\0';
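
The flag-letter logic in the hunk above can be sketched in isolation. The TOY_* bit values and the set of letters below are hypothetical and chosen only for illustration; the sketch shows why the buffer needs one more byte once another letter may be appended.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical flag bits, not GCC's real SECTION_* values.  */
#define TOY_SECTION_CODE    0x1
#define TOY_SECTION_WRITE   0x2
#define TOY_SECTION_NOREAD  0x400

/* Largest string produced below: "awxy" plus the terminating NUL.  */
#define TOY_FLAGCHARS_SIZE 5

/* Write one letter per applicable flag into BUF, which must hold at
   least TOY_FLAGCHARS_SIZE bytes.  The 'y' letter is emitted only if
   the assembler is known to accept it.  */
static void
toy_section_flagchars (unsigned flags, int assembler_has_y, char *buf)
{
  char *f = buf;

  *f++ = 'a';
  if (flags & TOY_SECTION_WRITE)
    *f++ = 'w';
  if (flags & TOY_SECTION_CODE)
    *f++ = 'x';
  if (assembler_has_y && (flags & TOY_SECTION_NOREAD))
    *f++ = 'y';
  *f = '\0';
}
```

Every letter that can be emitted adds one byte to the worst case, which is the reason the buffer size and the letter list have to be kept in sync.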
 

[PATCH ARM 2/2] Add -mexecute-only option.

2016-01-26 Thread Mickael Guene
gcc/ChangeLog:

* config/arm/arm-protos.h (arm_modes_tieable_p): New.
* config/arm/arm.c (arm_function_section): New.
(arm_section_type_flags): New.
(TARGET_ASM_FUNCTION_SECTION): Define.
(TARGET_SECTION_TYPE_FLAGS): Define.
(arm_option_override): Add mexecute-only new option.
(thumb1_gen_const_int): New.
(thumb1_legitimate_address_p): Disallow constant pool usage for thumb1
for arm_disable_literal_pool.
(thumb1_rtx_costs): Update cost for arm_disable_literal_pool.
(thumb1_size_rtx_costs): Likewise.
(arm_output_mi_thunk): Avoid literal usage for target_execute_only.
* config/arm/arm.md (casesi): Disable for target_execute_only.
* config/arm/arm.opt (target_execute_only): New option.
* config/arm/thumb1.md (define_insn "thumb1_movsi_symbol_ref"): New.
(define_insn "*thumb1_movsi_const_int"): New.
(define_split for generate integer constant): New.
(define_insn "*thumb1_movsi_insn"): Set use_literal_pool attribute so
it's enabled/disabled according to arm_disable_literal_pool.
(define_expand "tablejump"): Disable for target_execute_only.
* doc/invoke.texi (mexecute-only): New.

gcc/testsuite/ChangeLog:

* gcc.target/arm/thumb1-execute-only-switch.c: New.
* gcc.target/arm/thumb1-execute-only.c: New.
* gcc.target/arm/thumb2-execute-only-switch.c: New.
* gcc.target/arm/thumb2-execute-only.c: New.

 This option generates code that does not load any data from the text section.
---
 gcc/config/arm/arm-protos.h|   1 +
 gcc/config/arm/arm.c   | 114 +++--
 gcc/config/arm/arm.md  |   2 +-
 gcc/config/arm/arm.opt |   4 +
 gcc/config/arm/thumb1.md   |  71 +++--
 gcc/doc/invoke.texi|   7 ++
 .../gcc.target/arm/thumb1-execute-only-switch.c|  23 +
 gcc/testsuite/gcc.target/arm/thumb1-execute-only.c |  69 +
 .../gcc.target/arm/thumb2-execute-only-switch.c|  23 +
 gcc/testsuite/gcc.target/arm/thumb2-execute-only.c |  68 
 10 files changed, 368 insertions(+), 14 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/arm/thumb1-execute-only-switch.c
 create mode 100644 gcc/testsuite/gcc.target/arm/thumb1-execute-only.c
 create mode 100644 gcc/testsuite/gcc.target/arm/thumb2-execute-only-switch.c
 create mode 100644 gcc/testsuite/gcc.target/arm/thumb2-execute-only.c

diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index 28f2263..e08842e 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -59,6 +59,7 @@ extern bool arm_modes_tieable_p (machine_mode, machine_mode);
 extern int const_ok_for_arm (HOST_WIDE_INT);
 extern int const_ok_for_op (HOST_WIDE_INT, enum rtx_code);
 extern int const_ok_for_dimode_op (HOST_WIDE_INT, enum rtx_code);
+extern void thumb1_gen_const_int (rtx, HOST_WIDE_INT);
 extern int arm_split_constant (RTX_CODE, machine_mode, rtx,
   HOST_WIDE_INT, rtx, rtx, int);
 extern int legitimate_pic_operand_p (rtx);
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index f152afa..fe8e018 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -300,6 +300,12 @@ static void arm_canonicalize_comparison (int *code, rtx 
*op0, rtx *op1,
 static unsigned HOST_WIDE_INT arm_asan_shadow_offset (void);
 
 static void arm_sched_fusion_priority (rtx_insn *, int, int *, int*);
+
+static section *arm_function_section (tree decl, enum node_frequency freq,
+ bool startup, bool exit);
+
+static unsigned int arm_section_type_flags (tree, const char *, int);
+
 
 /* Table of machine attributes.  */
 static const struct attribute_spec arm_attribute_table[] =
@@ -735,6 +741,12 @@ static const struct attribute_spec arm_attribute_table[] =
 #undef TARGET_SCHED_FUSION_PRIORITY
 #define TARGET_SCHED_FUSION_PRIORITY arm_sched_fusion_priority
 
+#undef  TARGET_ASM_FUNCTION_SECTION
+#define TARGET_ASM_FUNCTION_SECTION arm_function_section
+
+#undef  TARGET_SECTION_TYPE_FLAGS
+#define TARGET_SECTION_TYPE_FLAGS arm_section_type_flags
+
 struct gcc_target targetm = TARGET_INITIALIZER;
 
 /* Obstack for minipool constant handling.  */
@@ -3428,6 +3440,15 @@ arm_option_override (void)
   if (target_slow_flash_data)
 arm_disable_literal_pool = true;
 
+  /* We only support -mexecute-only on M-profile targets.  */
+  if (target_execute_only && (flag_pic || !(!arm_arch_notm || arm_arch7em)))
+error ("-mexecute-only only supports non-pic code on M-profile targets");
+
+  /* In execute-only mode we don't want any memory read into text section and
+ so we disable literal pool.  */
+  if (target_execute_only)
+arm_disable_literal_pool = true;
+
   /* Disable scheduling fusion by default if it's not armv

Re: [PATCH] Handle -fsanitize=* in lto-wrapper (PR lto/69254)

2016-01-26 Thread Bernd Schmidt

On 01/25/2016 09:30 PM, Jakub Jelinek wrote:


Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?


I've been staring at it for a while, and on the whole I think I can make 
sense of this. However - it does not have test coverage. Can this be 
added? Also, is this a regression?



(parse_sanitizer_options): New function.
(common_handle_option): Use parse_sanitizer_options.


I think this can go in, it just moves code around, right? That'll make 
followup patches smaller.



--- gcc/lto-opts.c.jj   2016-01-23 00:13:00.897015402 +0100
+++ gcc/lto-opts.c  2016-01-25 14:06:31.834127398 +0100
@@ -199,9 +199,11 @@ lto_write_options (void)
/* Also drop all options that are handled by the driver as well,
 which includes things like -o and -v or -fhelp for example.
 We do not need those.  The only exception is -foffload option, if we
-write it in offload_lto section.  Also drop all diagnostic options.  */
+write it in offload_lto section.  Also drop all diagnostic options,
+and -fsanitize=.  */
if ((cl_options[option->opt_index].flags & (CL_DRIVER|CL_WARNING))
- && (!lto_stream_offload_p || option->opt_index != OPT_foffload_))
+ && (!lto_stream_offload_p || option->opt_index != OPT_foffload_)
+ && option->opt_index != OPT_fsanitize_)
continue;


This one puzzles me, doesn't it mean that no sanitizer options make it 
into the LTO stream, which would mean the new code in lto-wrapper 
doesn't trigger?



+/* Set *OPT to decoded -f{,no-}sanitize=shift.  */


I think I'd like some sort of comment about how this is an arbitrary 
choice, just designed to enable DEF_SANITIZER_BUILTIN (IIUC). Also, why 
use shift and not just sanitize=undefined?



@@ -392,6 +415,24 @@ merge_and_complain (struct cl_decoded_op
  break;
}
  }
+
+  /* If FDECODED_OPTIONS requested any ubsan sanitization, pass through


I think we want to add "and DECODED_OPTIONS doesn't", right?


Bernd


Re: [PING][PATCH] Mark symbols in offload tables with force_output in read_offload_tables

2016-01-26 Thread Ilya Verbin
On Tue, Jan 26, 2016 at 13:21:57 +0100, Tom de Vries wrote:
> On 25/01/16 14:27, Ilya Verbin wrote:
> >On Tue, Jan 05, 2016 at 15:56:15 +0100, Tom de Vries wrote:
> >>>diff --git a/gcc/lto-cgraph.c b/gcc/lto-cgraph.c
> >>>index 62e5454..cdaee41 100644
> >>>--- a/gcc/lto-cgraph.c
> >>>+++ b/gcc/lto-cgraph.c
> >>>@@ -1911,6 +1911,11 @@ input_offload_tables (void)
> >>> tree fn_decl
> >>>   = lto_file_decl_data_get_fn_decl (file_data, decl_index);
> >>> vec_safe_push (offload_funcs, fn_decl);
> >>>+
> >>>+/* Prevent IPA from removing fn_decl as unreachable, since there
> >>>+   may be no refs from the parent function to child_fn in offload
> >>>+   LTO mode.  */
> >>>+cgraph_node::get (fn_decl)->mark_force_output ();
> >>>   }
> >>> else if (tag == LTO_symtab_variable)
> >>>   {
> >>>@@ -1918,6 +1923,10 @@ input_offload_tables (void)
> >>> tree var_decl
> >>>   = lto_file_decl_data_get_var_decl (file_data, decl_index);
> >>> vec_safe_push (offload_vars, var_decl);
> >>>+
> >>>+/* Prevent IPA from removing var_decl as unused, since there
> >>>+   may be no refs to var_decl in offload LTO mode.  */
> >>>+varpool_node::get (var_decl)->force_output = 1;
> >>>   }
> >
> >This doesn't work when there is more than one LTO partition, because only
> >the first partition contains the full offload table (to maintain the correct
> >order), but cgraph and varpool nodes aren't necessarily created for the
> >first partition.  To reproduce:
> >
> >$ make check-target-libgomp RUNTESTFLAGS="c.exp=for-* --target_board=unix/-flto"
> >FAIL: libgomp.c/for-3.c (internal compiler error)
> >FAIL: libgomp.c/for-5.c (internal compiler error)
> >FAIL: libgomp.c/for-6.c (internal compiler error)
> >$ make check-target-libgomp RUNTESTFLAGS="c++.exp=for-* --target_board=unix/-flto"
> >FAIL: libgomp.c++/for-11.C (internal compiler error)
> >FAIL: libgomp.c++/for-13.C (internal compiler error)
> >FAIL: libgomp.c++/for-14.C (internal compiler error)
> 
> This works for me.
> 
> OK for trunk?
> 
> Thanks,
> - Tom
> 

> Check that cgraph/varpool_node exists before use in input_offload_tables
> 
> 2016-01-26  Tom de Vries  
> 
>   * lto-cgraph.c (input_offload_tables): Check that cgraph/varpool_node
>   exists before use.

In this case they will not be marked as force_output in the other partitions
(only in the first one).
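
The concern can be modelled with a toy sketch. It assumes, as described above, that the offload table is only walked while reading the first partition; all toy_* names are hypothetical and the code is not the lto-cgraph.c implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy symbol-table node.  */
struct toy_node
{
  int decl_id;
  bool force_output;
};

/* Toy partition: the nodes that were created for it.  */
struct toy_partition
{
  struct toy_node *nodes;
  size_t n_nodes;
};

/* Plays the role of a node lookup that may fail: returns NULL when
   the partition has no node for DECL_ID.  */
static struct toy_node *
toy_get_node (struct toy_partition *p, int decl_id)
{
  for (size_t i = 0; i < p->n_nodes; i++)
    if (p->nodes[i].decl_id == decl_id)
      return &p->nodes[i];
  return NULL;
}

/* Walk the offload TABLE while reading partition P and mark every
   node that exists there.  Returns the number of nodes marked.  */
static size_t
toy_mark_offload_table (struct toy_partition *p, const int *table,
			size_t n)
{
  size_t marked = 0;

  for (size_t i = 0; i < n; i++)
    {
      struct toy_node *node = toy_get_node (p, table[i]);
      if (node)
	{
	  node->force_output = true;
	  marked++;
	}
    }
  return marked;
}
```

The NULL check avoids the crash, but an entry whose node lives only in another partition is silently skipped and so is never marked.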

  -- Ilya


Re: [PATCH] ARM PR68620 (ICE with FP16 on armeb)

2016-01-26 Thread Kyrill Tkachov

Hi Christophe,

On 20/01/16 21:10, Christophe Lyon wrote:

On 19 January 2016 at 15:51, Alan Lawrence  wrote:

On 19/01/16 11:15, Christophe Lyon wrote:


For neon_vdupn, I chose to implement neon_vdup_nv4hf and
neon_vdup_nv8hf instead of updating the VX iterator because I thought
it was not desirable to impact neon_vrev32.


Well, the same instruction will suffice for vrev32'ing vectors of HF just as
well as vectors of HI, so I think I'd argue that's harmless enough. To gain the
benefit, we'd need to update arm_evpc_neon_vrev with a few new cases, though.


Since this is more intrusive, I'd rather leave that part for later. OK?


Sure.


+#ifdef __ARM_BIG_ENDIAN
+  /* Here, 3 is (4-1) where 4 is the number of lanes. This is also the
+ right value for vectors with 8 lanes.  */
+#define __arm_lane(__vec, __idx) (__idx ^ 3)
+#else
+#define __arm_lane(__vec, __idx) __idx
+#endif
+


Looks right, but sounds... my concern here is that I'm hoping at some point we
will move the *other* vget/set_lane intrinsics to use GCC vector extensions
too. At which time (unlike __aarch64_lane which can be used everywhere) this
will be the wrong formula. Can we name (and/or comment) it to avoid misleading
anyone? The key characteristic seems to be that it is for vectors of 16-bit
elements only.


I'm not sure I follow here. Looking at the patterns for
neon_vget_lane_*internal in neon.md,
I can see 2 flavours: one for VD, one for VQ2. The latter uses "halfelts".

Do you prefer that I create 2 macros (say __arm_lane and __arm_laneq),
that would be similar to the aarch64 ones (by computing the number of
lanes of the input vector), but the "q" one would use half the total
number of lanes instead?


That works for me! Something like:

#define __arm_lane(__vec, __idx) NUM_LANES(__vec) - __idx
#define __arm_laneq(__vec, __idx) (__idx & (NUM_LANES(__vec)/2)) +
(NUM_LANES(__vec)/2 - __idx)
//or similarly
#define __arm_laneq(__vec, __idx) (__idx ^ (NUM_LANES(__vec)/2 - 1))

Alternatively I'd been thinking

#define __arm_lane_32xN(__idx) __idx ^ 1
#define __arm_lane_16xN(__idx) __idx ^ 3
#define __arm_lane_8xN(__idx) __idx ^ 7

Bear in mind PR64893 that we had on AArch64 :-(
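
To make the index mappings discussed above concrete, here is a minimal sketch of two of them as plain C functions. The function names and the explicit `nlanes` parameter are hypothetical (the macros in the thread take the vector and derive the lane count through a `NUM_LANES` helper), and which mapping is right for a given intrinsic is exactly what is being discussed, so this is an illustration under those assumptions rather than the final arm_neon.h code.

```c
#include <assert.h>

/* Hypothetical helpers, not part of arm_neon.h.  NLANES must be a
   power of two so that XOR with a mask flips the index as intended.  */

/* Full reversal: lane i maps to lane NLANES - 1 - i.  */
static unsigned
lane_reverse_full (unsigned nlanes, unsigned idx)
{
  assert (nlanes != 0 && (nlanes & (nlanes - 1)) == 0 && idx < nlanes);
  return idx ^ (nlanes - 1);
}

/* Reversal within each half of the vector: the half an index belongs
   to is preserved, only the position inside that half is flipped.  */
static unsigned
lane_reverse_halves (unsigned nlanes, unsigned idx)
{
  assert (nlanes >= 2 && (nlanes & (nlanes - 1)) == 0 && idx < nlanes);
  return idx ^ (nlanes / 2 - 1);
}
```

For a 4-lane vector `lane_reverse_full (4, idx)` is `idx ^ 3`, and for an 8-lane vector `lane_reverse_halves (8, idx)` is also `idx ^ 3`, which is consistent with the patch comment saying that 3 is also the right value for vectors with 8 lanes.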


Here is a new version, based on the comments above.
I've also removed the addition of arm_fp_ok effective target since I
added that in my other testsuite patch.

OK now?

Thanks,

Christophe



diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 3588b83..b1f408c 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -12370,6 +12370,10 @@ neon_valid_immediate (rtx op, machine_mode mode, int inverse,
   if (!vfp3_const_double_rtx (el0) && el0 != CONST0_RTX (GET_MODE (el0)))
 return -1;
 
+  /* FP16 vectors cannot be represented.  */

+  if (innersize == 2)
+   return -1;
+
   r0 = CONST_DOUBLE_REAL_VALUE (el0);


I think it'd be clearer to write "if (GET_MODE_INNER (mode) == HFmode)"

+(define_expand "movv4hf"
+  [(set (match_operand:V4HF 0 "s_register_operand")
+   (match_operand:V4HF 1 "s_register_operand"))]
+  "TARGET_NEON && TARGET_FP16"
+{
+  if (can_create_pseudo_p ())
+{
+  if (!REG_P (operands[0]))
+   operands[1] = force_reg (V4HFmode, operands[1]);
+}
+})

Can you please add a comment saying why you need the force_reg here?
IIRC it's because of CANNOT_CHANGE_MODE_CLASS on big-endian that causes an
ICE during expand with subregs.

I've tried this patch out and it does indeed fix the ICE on armeb.
So ok for trunk with the changes above.
Thanks,
Kyrill




Re: [PATCH] Handle -fsanitize=* in lto-wrapper (PR lto/69254)

2016-01-26 Thread Jakub Jelinek
On Tue, Jan 26, 2016 at 01:54:52PM +0100, Bernd Schmidt wrote:
> On 01/25/2016 09:30 PM, Jakub Jelinek wrote:
> 
> >Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> 
> I've been staring at it for a while, and on the whole I think I can make
> sense of this. However - it does not have test coverage. Can this be added?

I have trouble creating those.  This needs to pass different compile and
linker options, so I think only the lto.exp framework supports it, but it
also needs options (-fopenmp, -fopenacc, -fcilkplus, -fsanitize=) that
usually aren't unconditionally available and need some library and/or target
support, and thus such tests generally go into test subdirectories where all
that is tested.

> Also, is this a regression?

Andrew Pinski said so, but because further compiler emitted sanitizers have
been added over time into -fsanitize=undefined, it very likely is a
regression at least in regards to those sanitizers (compile with
-c -flto -fsanitize=undefined something that didn't result in any
sanitization in say 4.9, but now has some of the newer sanitizers, e.g.
__attribute__((nonnull)) char *foo (char *);
char *bar (char *p) { return foo (p); }
and then link without -fsanitize=undefined or with -fno-sanitize=undefined).

> > (parse_sanitizer_options): New function.
> > (common_handle_option): Use parse_sanitizer_options.
> 
> I think this can go in, it just moves code around, right? That'll make
> followup patches smaller.

Yeah, committed.
> 
> >--- gcc/lto-opts.c.jj2016-01-23 00:13:00.897015402 +0100
> >+++ gcc/lto-opts.c   2016-01-25 14:06:31.834127398 +0100
> >@@ -199,9 +199,11 @@ lto_write_options (void)
> >/* Also drop all options that are handled by the driver as well,
> >  which includes things like -o and -v or -fhelp for example.
> >  We do not need those.  The only exception is -foffload option, if we
> >- write it in offload_lto section.  Also drop all diagnostic options.  */
> >+ write it in offload_lto section.  Also drop all diagnostic options,
> >+ and -fsanitize=.  */
> >if ((cl_options[option->opt_index].flags & (CL_DRIVER|CL_WARNING))
> >-  && (!lto_stream_offload_p || option->opt_index != OPT_foffload_))
> >+  && (!lto_stream_offload_p || option->opt_index != OPT_foffload_)
> >+  && option->opt_index != OPT_fsanitize_)
> > continue;
> 
> This one puzzles me, doesn't it mean that no sanitizer options make it into
> the LTO stream, which would mean the new code in lto-wrapper doesn't
> trigger?

Sounds like an error in the comment change only to me, I meant to say that
the only exceptions are now
1) -foffload if we write it in offload_lto section
2) -fsanitize= (regardless where we write it)
The continue causes the given option not to be written into the LTO opts
subsection, which is the case here for CL_DRIVER/CL_WARNING options
(-fsanitize= is CL_DRIVER), with the exception of -foffload= under some
condition and with the exception of -fsanitize= (newly).  I.e. the change
is that -fsanitize= used to be ignored, not written into LTO opts, while with
the change it is written in there.
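
The skip condition described above can be modelled in isolation. This is a minimal sketch, assuming toy stand-ins for the option flags and indices: the enum values are arbitrary and `skip_option_p` is a hypothetical helper, with only the names and the shape of the condition taken from the quoted hunk.

```c
#include <stdbool.h>

/* Toy stand-ins for GCC's option flags and indices; the values are
   arbitrary and only the names mirror the patch.  */
enum { CL_DRIVER = 1, CL_WARNING = 2 };
enum toy_opt { OPT_o, OPT_Wall, OPT_foffload_, OPT_fsanitize_, OPT_O2 };

/* Return true if the option would be skipped, i.e. not written into
   the LTO options section.  Mirrors the condition guarding the
   `continue' in the quoted hunk.  */
static bool
skip_option_p (unsigned flags, enum toy_opt opt, bool stream_offload_p)
{
  return (flags & (CL_DRIVER | CL_WARNING)) != 0
	 && (!stream_offload_p || opt != OPT_foffload_)
	 && opt != OPT_fsanitize_;
}
```

In this model a driver option such as `OPT_o` is skipped, while `OPT_fsanitize_` is never skipped and `OPT_foffload_` is kept only when the offload section is being written.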

> >+/* Set *OPT to decoded -f{,no-}sanitize=shift.  */
> 
> I think I'd like some sort of comment about how this is an arbitrary choice,

Ok, will try to write some comment.

> just designed to enable DEF_SANITIZER_BUILTIN (IIUC). Also, why use shift
> and not just sanitize=undefined?

Because -fsanitize=undefined is a large collection of individual sanitizers,
and at least some of them affect also post-IPA code (e.g.
-fsanitize=unreachable).  The goal is to pick one of the sanitizers that are
handled solely pre-IPA only, after that just are present in form of a
builtin call in the IL (and thus all that lto1 needs to do for that option
is initialize the builtins).

> >@@ -392,6 +415,24 @@ merge_and_complain (struct cl_decoded_op
> >   break;
> > }
> >  }
> >+
> >+  /* If FDECODED_OPTIONS requested any ubsan sanitization, pass through
> 
> I think we want to add "and DECODED_OPTIONS doesn't", right?

Maybe.  It also has to check the option->value though.  The thing is,
because I didn't want to memmove options around when processing the first
TU, it will canonicalize all the -f{,no-}sanitize= options depending on if
in the end some ubsan sanitizers were enabled or not to
-f{,no-}sanitize=shift.  And merge_and_complain thus should ignore
all the -fno-sanitize=shift options in DECODED_OPTIONS, if it finds any
-fsanitize=shift in DECODED_OPTIONS (optionally preceded by some
-fno-sanitize=shift), then nothing needs to be added, otherwise
-fsanitize=shift is added.
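
As a rough illustration of the per-TU canonicalization described above, the sketch below applies a sequence of -fsanitize=/-fno-sanitize= lists to a bit mask and then asks whether any ubsan-style sanitizer is still enabled at the end. All names (`toy_apply`, `toy_needs_shift`, the `TOY_*` bits) are hypothetical and the table is a small subset; the real parsing is done by parse_sanitizer_options with GCC's SANITIZE_* flags.

```c
#include <stdbool.h>
#include <string.h>

/* Toy bit assignments; GCC's real SANITIZE_* values differ.  */
enum {
  TOY_ADDRESS = 1 << 0,
  TOY_SHIFT = 1 << 1,
  TOY_DIVIDE = 1 << 2,
  TOY_RETURN = 1 << 3,
  TOY_UNDEFINED = TOY_SHIFT | TOY_DIVIDE | TOY_RETURN
};

static const struct { const char *name; unsigned flag; } toy_opts[] = {
  { "address", TOY_ADDRESS },
  { "shift", TOY_SHIFT },
  { "integer-divide-by-zero", TOY_DIVIDE },
  { "return", TOY_RETURN },
  { "undefined", TOY_UNDEFINED },
};

/* Apply one -fsanitize=LIST (VALUE != 0) or -fno-sanitize=LIST
   (VALUE == 0) to MASK.  Unknown names are silently ignored here.  */
static unsigned
toy_apply (unsigned mask, const char *list, int value)
{
  while (*list)
    {
      size_t len = strcspn (list, ",");
      for (size_t i = 0; i < sizeof toy_opts / sizeof toy_opts[0]; i++)
	if (strlen (toy_opts[i].name) == len
	    && memcmp (toy_opts[i].name, list, len) == 0)
	  mask = value ? (mask | toy_opts[i].flag)
		       : (mask & ~toy_opts[i].flag);
      list += len;
      if (*list == ',')
	list++;
    }
  return mask;
}

/* True if, after all options, some ubsan-style sanitizer is still on;
   in that case the TU would be summarized as -fsanitize=shift.  */
static bool
toy_needs_shift (unsigned mask)
{
  return (mask & TOY_UNDEFINED) != 0;
}
```

The later options win: enabling shift and return, disabling undefined, then enabling integer-divide-by-zero leaves one ubsan bit set, so that TU would still be summarized as -fsanitize=shift in this model.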

Jakub


Re: Patch RFA: Add option -fcollectible-pointers, use it in ivopts

2016-01-26 Thread Ian Lance Taylor
On Tue, Jan 26, 2016 at 4:10 AM, Bernd Schmidt  wrote:
>
>>> On 01/23/2016 12:52 AM, Ian Lance Taylor wrote:
>>>
 2016-01-22  Ian Lance Taylor  

 * common.opt (fkeep-gc-roots-live): New option.
 * tree-ssa-loop-ivopts.c (add_candidate_1): If
 -fkeep-gc-roots-live, skip pointers.
 (add_iv_candidate_for_biv): Handle add_candidate_1 returning
 NULL.
 * doc/invoke.texi (Optimize Options): Document
 -fkeep-gc-roots-live.

 gcc/testsuite/ChangeLog:

 2016-01-22  Ian Lance Taylor  

 * gcc.dg/tree-ssa/ivopt_5.c: New test.
>>>
>>>
>>>
>>> Patch not attached?
>>
>>
>> The patch is there in the mailing list.  See the attachment on
>> https://gcc.gnu.org/ml/gcc-patches/2016-01/msg01781.html .
>
>
> That seems to be the old patch. At least it doesn't seem to match the
> ChangeLog quoted above.

I'm sorry, you're quite right.  That's odd.  Here is the actual patch.

Ian
Index: common.opt
===
--- common.opt  (revision 232580)
+++ common.opt  (working copy)
@@ -1380,6 +1380,10 @@
 Enable hoisting adjacent loads to encourage generating conditional move
 instructions.
 
+fkeep-gc-roots-live
+Common Report Var(flag_keep_gc_roots_live) Optimization
+Always keep a pointer to a live memory block
+
 floop-parallelize-all
 Common Report Var(flag_loop_parallelize_all) Optimization
 Mark all loops as parallel.
Index: doc/invoke.texi
===
--- doc/invoke.texi (revision 232580)
+++ doc/invoke.texi (working copy)
@@ -359,7 +359,7 @@
 -fno-ira-share-spill-slots @gol
 -fisolate-erroneous-paths-dereference -fisolate-erroneous-paths-attribute @gol
 -fivopts -fkeep-inline-functions -fkeep-static-functions @gol
--fkeep-static-consts -flive-range-shrinkage @gol
+-fkeep-static-consts -fkeep-gc-roots-live -flive-range-shrinkage @gol
 -floop-block -floop-interchange -floop-strip-mine @gol
 -floop-unroll-and-jam -floop-nest-optimize @gol
 -floop-parallelize-all -flra-remat -flto -flto-compression-level @gol
@@ -6621,6 +6621,17 @@
 If you use @option{-Wunsafe-loop-optimizations}, the compiler warns you
 if it finds this kind of loop.
 
+@item -fkeep-gc-roots-live
+@opindex fkeep-gc-roots-live
+This option tells the compiler that a garbage collector will be used,
+and that therefore the compiled code must retain a live pointer into
+all memory blocks.  The compiler is permitted to construct a pointer
+that is outside the bounds of a memory block, but it must ensure that
+given a pointer into memory, some pointer into that memory remains
+live in the compiled code whenever it is live in the source code.
+This option is disabled by default for most languages, enabled by
+default for languages that use garbage collection.
+
 @item -fcrossjumping
 @opindex fcrossjumping
 Perform cross-jumping transformation.
Index: testsuite/gcc.dg/tree-ssa/ivopt_5.c
===
--- testsuite/gcc.dg/tree-ssa/ivopt_5.c (revision 0)
+++ testsuite/gcc.dg/tree-ssa/ivopt_5.c (working copy)
@@ -0,0 +1,23 @@
+/* { dg-options "-O2 -fdump-tree-ivopts -fkeep-gc-roots-live" } */
+
+/* Only integer ivopts here when using -fkeep-gc-roots-live.   */
+
+void foo (char *pstart, int n)
+{
+  char *p;
+  char *pend = pstart + n;
+
+  for (p = pstart; p < pend; p++)
+*p = 1;
+}
+
+void foo1 (char *pstart, int n)
+{
+  char *p;
+  char *pend = pstart + n;
+
+  for (p = pstart; p != pend; p++)
+*p = 1;
+}
+
+/* { dg-final { scan-tree-dump-times "ivtmp.\[0-9_\]* = PHI <\[^0\]" 0 "ivopts"} } */
Index: tree-ssa-loop-ivopts.c
===
--- tree-ssa-loop-ivopts.c  (revision 232580)
+++ tree-ssa-loop-ivopts.c  (working copy)
@@ -2815,6 +2815,16 @@
   struct iv_cand *cand = NULL;
   tree type, orig_type;
 
+  /* -fkeep-gc-roots-live means that we have to keep a real pointer
+ live, but the ivopts code may replace a real pointer with one
+ pointing before or after the memory block that is then adjusted
+ into the memory block during the loop.  FIXME: It would likely be
+ better to actually force the pointer live and still use ivopts;
+ for example, it would be enough to write the pointer into memory
+ and keep it there until after the loop.  */
+  if (flag_keep_gc_roots_live && POINTER_TYPE_P (TREE_TYPE (base)))
+return NULL;
+
   /* For non-original variables, make sure their values are computed in a type
  that does not invoke undefined behavior on overflows (since in general,
  we cannot prove that these induction variables are non-wrapping).  */
@@ -3083,8 +3093,11 @@
  cand = add_candidate_1 (data,
  iv->base, iv->step, true, IP_ORIGINAL, NULL,
  SSA_NAME_DEF_STMT (def));
- cand->var_before = iv->ssa_name;
- cand->var_after = def;
+ if (cand
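
The comment in the new test says that only integer induction variables are expected under -fkeep-gc-roots-live. As a rough source-level picture (hand-written, not compiler output), the two functions below do the same work with a pointer induction variable and with an integer one. The function names are hypothetical; the first mirrors foo () from the test.

```c
#include <assert.h>
#include <stddef.h>

/* Pointer induction variable, as in foo () from the new test.  */
static void
fill_pointer_iv (char *pstart, int n)
{
  char *pend = pstart + n;
  for (char *p = pstart; p < pend; p++)
    *p = 1;
}

/* Integer induction variable: every access is expressed as an offset
   from PSTART, so PSTART itself is the pointer used inside the loop.  */
static void
fill_integer_iv (char *pstart, int n)
{
  assert (n >= 0);
  for (size_t i = 0; i < (size_t) n; i++)
    pstart[i] = 1;
}
```

In the integer form the only pointer involved in the loop is the original one into the block, which is the property the option is meant to preserve; what ivopts actually generates for either form is not shown in this thread.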

Re: [PATCH] rs6000: Put back the 's' output modifier

2016-01-26 Thread David Edelsohn
On Mon, Jan 25, 2016 at 9:39 PM, Segher Boessenkool
 wrote:
> It turns out the 's' output modifier is used in some glibc math code,
> and is in an installed header even.  So let's put it back, it is much
> less of a burden supporting it a bit longer than to deal with the fallout.
> (It is also being fixed for glibc.)
>
> Tested on powerpc64-linux-gcc; is this okay for mainline?

Okay.

Thanks, David


[C++ PATCH] Handle error_mark_node in cp_fold (PR c++/68357)

2016-01-26 Thread Jakub Jelinek
Hi!

Some errors (e.g. in this particular PR in a backend machine builtin)
are detected only during folding and the recursive cp_fold* call can then
return error_mark_node.  Passing that to fold_build*_loc is undesirable
though, the gimplifiers as well as other places in the compiler don't expect
error_mark_node to be operand of NOP_EXPR and various other trees,
historically the FEs would then just not create the expression at all
and use error_mark_node instead of the whole expression.

The following patch handles those in cp_fold.  Bootstrapped/regtested on
x86_64-linux and i686-linux, and tested on the testcase from the PR (which
is in the testsuite already) using cross to darwin; ok for trunk?

Another alternative would be to make sure tree folders don't introduce
error_mark_node (if it wasn't there already), but instead fold the call say
to build_int_cst (returntype, 0).  The known cases that would need to change
are at least darwin_build_constant_cfstring and darwin_fold_builtin, but
maybe others.
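
The approach taken by the patch (propagate the error sentinel instead of building a node around it) can be sketched on a toy expression tree. Everything below is hypothetical and unrelated to GCC's real tree API: `toy_node`, `toy_fold` and `toy_error_mark` only play the roles of tree, cp_fold and error_mark_node.

```c
#include <assert.h>
#include <stddef.h>

/* Toy expression nodes; this is not GCC's tree representation.  */
enum toy_code { TOY_CONST, TOY_NEGATE, TOY_PLUS, TOY_ERROR };

struct toy_node
{
  enum toy_code code;
  long value;			/* Used by TOY_CONST only.  */
  struct toy_node *op0, *op1;
};

/* Single shared sentinel, playing the role of error_mark_node.  */
static struct toy_node toy_error_mark = { TOY_ERROR, 0, NULL, NULL };

/* Fold X in place.  If any operand folds to the sentinel, return the
   sentinel itself rather than a node that has it as an operand.  */
static struct toy_node *
toy_fold (struct toy_node *x)
{
  struct toy_node *op0, *op1;

  assert (x != NULL);
  switch (x->code)
    {
    case TOY_CONST:
      return x;
    case TOY_ERROR:
      return &toy_error_mark;
    case TOY_NEGATE:
      op0 = toy_fold (x->op0);
      if (op0 == &toy_error_mark)
	return &toy_error_mark;
      x->op0 = op0;
      if (op0->code == TOY_CONST)
	{
	  x->code = TOY_CONST;
	  x->value = -op0->value;
	}
      return x;
    case TOY_PLUS:
      op0 = toy_fold (x->op0);
      op1 = toy_fold (x->op1);
      if (op0 == &toy_error_mark || op1 == &toy_error_mark)
	return &toy_error_mark;
      x->op0 = op0;
      x->op1 = op1;
      if (op0->code == TOY_CONST && op1->code == TOY_CONST)
	{
	  x->code = TOY_CONST;
	  x->value = op0->value + op1->value;
	}
      return x;
    }
  return x;
}
```

Callers then only have to compare the result against the sentinel once, instead of every later pass having to cope with an error node buried inside an otherwise valid expression.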

2016-01-26  Jakub Jelinek  

PR c++/68357
* cp-gimplify.c (cp_fold): If some operand folds to error_mark_node,
return error_mark_node instead of building trees with error_mark_node
operands.

--- gcc/cp/cp-gimplify.c.jj 2016-01-20 10:55:15.0 +0100
+++ gcc/cp/cp-gimplify.c2016-01-26 11:42:34.966038507 +0100
@@ -1954,7 +1954,12 @@ cp_fold (tree x)
   op0 = cp_fold_maybe_rvalue (TREE_OPERAND (x, 0), rval_ops);
 
   if (op0 != TREE_OPERAND (x, 0))
-x = fold_build1_loc (loc, code, TREE_TYPE (x), op0);
+   {
+ if (op0 == error_mark_node)
+   x = error_mark_node;
+ else
+   x = fold_build1_loc (loc, code, TREE_TYPE (x), op0);
+   }
   else
x = fold (x);
 
@@ -1986,7 +1991,12 @@ cp_fold (tree x)
   op0 = cp_fold_maybe_rvalue (TREE_OPERAND (x, 0), rval_ops);
 
   if (op0 != TREE_OPERAND (x, 0))
-x = fold_build1_loc (loc, code, TREE_TYPE (x), op0);
+   {
+ if (op0 == error_mark_node)
+   x = error_mark_node;
+ else
+   x = fold_build1_loc (loc, code, TREE_TYPE (x), op0);
+   }
   else
x = fold (x);
 
@@ -2043,7 +2053,12 @@ cp_fold (tree x)
   op1 = cp_fold_rvalue (TREE_OPERAND (x, 1));
 
   if (op0 != TREE_OPERAND (x, 0) || op1 != TREE_OPERAND (x, 1))
-   x = fold_build2_loc (loc, code, TREE_TYPE (x), op0, op1);
+   {
+ if (op0 == error_mark_node || op1 == error_mark_node)
+   x = error_mark_node;
+ else
+   x = fold_build2_loc (loc, code, TREE_TYPE (x), op0, op1);
+   }
   else
x = fold (x);
 
@@ -2066,7 +2081,14 @@ cp_fold (tree x)
   if (op0 != TREE_OPERAND (x, 0)
  || op1 != TREE_OPERAND (x, 1)
  || op2 != TREE_OPERAND (x, 2))
-   x = fold_build3_loc (loc, code, TREE_TYPE (x), op0, op1, op2);
+   {
+ if (op0 == error_mark_node
+ || op1 == error_mark_node
+ || op2 == error_mark_node)
+   x = error_mark_node;
+ else
+   x = fold_build3_loc (loc, code, TREE_TYPE (x), op0, op1, op2);
+   }
   else
x = fold (x);
 
@@ -2093,9 +2115,18 @@ cp_fold (tree x)
  {
r = cp_fold (CALL_EXPR_ARG (x, i));
if (r != CALL_EXPR_ARG (x, i))
- changed = 1;
+ {
+   if (r == error_mark_node)
+ {
+   x = error_mark_node;
+   break;
+ }
+   changed = 1;
+ }
CALL_EXPR_ARG (x, i) = r;
  }
+   if (x == error_mark_node)
+ break;
 
optimize = nw;
r = fold (x);
@@ -2143,7 +2174,15 @@ cp_fold (tree x)
constructor_elt e = { p->index, op };
nelts->quick_push (e);
if (op != p->value)
- changed = true;
+ {
+   if (op == error_mark_node)
+ {
+   x = error_mark_node;
+   changed = false;
+   break;
+ }
+   changed = true;
+ }
  }
if (changed)
  x = build_constructor (TREE_TYPE (x), nelts);
@@ -2188,9 +2227,19 @@ cp_fold (tree x)
   op2 = cp_fold (TREE_OPERAND (x, 2));
   op3 = cp_fold (TREE_OPERAND (x, 3));
 
-  if (op0 != TREE_OPERAND (x, 0) || op1 != TREE_OPERAND (x, 1)
- || op2 != TREE_OPERAND (x, 2) || op3 != TREE_OPERAND (x, 3))
-   x = build4_loc (loc, code, TREE_TYPE (x), op0, op1, op2, op3);
+  if (op0 != TREE_OPERAND (x, 0)
+ || op1 != TREE_OPERAND (x, 1)
+ || op2 != TREE_OPERAND (x, 2)
+ || op3 != TREE_OPERAND (x, 3))
+   {
+ if (op0 == error_mark_node
+ || op1 == error_mark_node
+ || op2 == error_mark_node
+ || op3 == error_mark_node)
+   x = error_mark_node;
+ else
+   x = b

Re: [PATCH] Handle -fsanitize=* in lto-wrapper (PR lto/69254)

2016-01-26 Thread Richard Biener
On Mon, 25 Jan 2016, Jakub Jelinek wrote:

> Hi!
> 
> Here is an attempt to handle -f{,no-}sanitize= options in LTO wrapper.
> In addition to that I've noticed ICEs e.g. if some OpenMP code is compiled
> with -c -flto -fopenmp, but final link is -fno-openmp, similarly for
> openacc, -fcilkplus is similar but used to be handled even less.
> 
> The intended behavior for -f{,no-}sanitize= is that for the ubsan
> sanitizers which are typically lowered before IPA, but are often using
> builtins that need initialization even at the LTO level, we collect
> from each TU info on whether any ubsan sanitizers have been enabled
> (note, this needs parsing of the options, because we can e.g. have 
> -fsanitize=shift,return -fno-sanitize=undefined 
> -fsanitize=integer-divide-by-zero
> ) and turn that into -fsanitize=shift from all the TUs if any of them
> needed any (randomly chosen sanitizer that is handled by FEs only).
> For address or thread sanitizers, which are handled solely post IPA,
> the choice whether to sanitize is left to the linker command line.
> And finally we need to ensure that e.g. -fno-sanitize=address,shift
> doesn't turn off the ubsan sanitizers.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Can you split out the non -fsanitize part?  It is ok.

I'm somewhat confused that you drop -fsanitize options from
the LTO options section writing in lto-opts.c but then add code to
parse it from there in lto-wrapper.c.  The code there also looks
somewhat duplicated - why not just canonicalize any -fsanitize=
option coming in to the first in merge_and_complain
and special-case it in append_compiler_options again by say

  case OPT_fsanitize_:
 obstack_ptr_grow (argv_obstack, "-fsanitize=shift");

?

Thanks,
Richard.

> 2016-01-25  Jakub Jelinek  
> 
>   PR lto/69254
>   * opts.h (parse_sanitizer_options): New prototype.
>   * opts.c (sanitizer_opts): New array.
>   (parse_sanitizer_options): New function.
>   (common_handle_option): Use parse_sanitizer_options.
>   * lto-opts.c (lto_write_options): Write also -f{,no-}sanitize=
>   options.
>   * lto-wrapper.c (sanitize_shift_decoded_opt): New function.
>   (merge_and_complain): Determine if any -fsanitize= options
>   enabled at the end any undefined behavior sanitizers, and
>   append -fsanitize=shift if needed.  Handle -fcilkplus.
>   (append_compiler_options): Handle -fcilkplus and -fsanitize=.
>   (append_linker_options): Ignore -fno-{openmp,openacc,cilkplus}.
>   (find_and_merge_options): Canonicalize -fsanitize= options.
>   (run_gcc): Append -fsanitize=shift if compiler options set it
>   and linker options might override it.
> 
> --- gcc/opts.h.jj 2016-01-23 00:13:00.714017906 +0100
> +++ gcc/opts.h2016-01-25 14:06:31.833127411 +0100
> @@ -372,6 +372,8 @@ extern void control_warning_option (unsi
>  extern char *write_langs (unsigned int mask);
>  extern void print_ignored_options (void);
>  extern void handle_common_deferred_options (void);
> +unsigned int parse_sanitizer_options (const char *, location_t, int,
> +   unsigned int, int, bool);
>  extern bool common_handle_option (struct gcc_options *opts,
> struct gcc_options *opts_set,
> const struct cl_decoded_option *decoded,
> --- gcc/opts.c.jj 2016-01-23 00:13:00.662018617 +0100
> +++ gcc/opts.c2016-01-25 14:06:31.834127398 +0100
> @@ -1433,6 +1433,104 @@ enable_fdo_optimizations (struct gcc_opt
>  opts->x_flag_tree_loop_distribute_patterns = value;
>  }
>  
> +/* -f{,no-}sanitize{,-recover}= suboptions.  */
> +static const struct sanitizer_opts_s
> +{
> +  const char *const name;
> +  unsigned int flag;
> +  size_t len;
> +} sanitizer_opts[] =
> +{
> +#define SANITIZER_OPT(name, flags) { #name, flags, sizeof #name - 1 }
> +  SANITIZER_OPT (address, SANITIZE_ADDRESS | SANITIZE_USER_ADDRESS),
> +  SANITIZER_OPT (kernel-address, SANITIZE_ADDRESS | SANITIZE_KERNEL_ADDRESS),
> +  SANITIZER_OPT (thread, SANITIZE_THREAD),
> +  SANITIZER_OPT (leak, SANITIZE_LEAK),
> +  SANITIZER_OPT (shift, SANITIZE_SHIFT),
> +  SANITIZER_OPT (integer-divide-by-zero, SANITIZE_DIVIDE),
> +  SANITIZER_OPT (undefined, SANITIZE_UNDEFINED),
> +  SANITIZER_OPT (unreachable, SANITIZE_UNREACHABLE),
> +  SANITIZER_OPT (vla-bound, SANITIZE_VLA),
> +  SANITIZER_OPT (return, SANITIZE_RETURN),
> +  SANITIZER_OPT (null, SANITIZE_NULL),
> +  SANITIZER_OPT (signed-integer-overflow, SANITIZE_SI_OVERFLOW),
> +  SANITIZER_OPT (bool, SANITIZE_BOOL),
> +  SANITIZER_OPT (enum, SANITIZE_ENUM),
> +  SANITIZER_OPT (float-divide-by-zero, SANITIZE_FLOAT_DIVIDE),
> +  SANITIZER_OPT (float-cast-overflow, SANITIZE_FLOAT_CAST),
> +  SANITIZER_OPT (bounds, SANITIZE_BOUNDS),
> +  SANITIZER_OPT (bounds-strict, SANITIZE_BOUNDS | SANITIZE_BOUNDS_STRICT),
> +  SANITIZER_OPT (alignment, SANITIZE_ALIGNMENT),
> +  SANITIZER_OPT (nonnull-a

Re: [PING][PATCH] Mark symbols in offload tables with force_output in read_offload_tables

2016-01-26 Thread Richard Biener
On Tue, 26 Jan 2016, Tom de Vries wrote:

> On 25/01/16 14:27, Ilya Verbin wrote:
> > Hi!
> > 
> > On Tue, Jan 05, 2016 at 15:56:15 +0100, Tom de Vries wrote:
> > > > diff --git a/gcc/lto-cgraph.c b/gcc/lto-cgraph.c
> > > > index 62e5454..cdaee41 100644
> > > > --- a/gcc/lto-cgraph.c
> > > > +++ b/gcc/lto-cgraph.c
> > > > @@ -1911,6 +1911,11 @@ input_offload_tables (void)
> > > >   tree fn_decl
> > > > = lto_file_decl_data_get_fn_decl (file_data,
> > > > decl_index);
> > > >   vec_safe_push (offload_funcs, fn_decl);
> > > > +
> > > > + /* Prevent IPA from removing fn_decl as unreachable, 
> > > > since there
> > > > +may be no refs from the parent function to child_fn in
> > > > offload
> > > > +LTO mode.  */
> > > > + cgraph_node::get (fn_decl)->mark_force_output ();
> > > > }
> > > >   else if (tag == LTO_symtab_variable)
> > > > {
> > > > @@ -1918,6 +1923,10 @@ input_offload_tables (void)
> > > >   tree var_decl
> > > > = lto_file_decl_data_get_var_decl (file_data,
> > > > decl_index);
> > > >   vec_safe_push (offload_vars, var_decl);
> > > > +
> > > > + /* Prevent IPA from removing var_decl as unused, since 
> > > > there
> > > > +may be no refs to var_decl in offload LTO mode.  */
> > > > + varpool_node::get (var_decl)->force_output = 1;
> > > > }
> > 
> > This doesn't work when there is more than one LTO partition, because only
> > first
> > partition contains full offload table to maintain correct order, but cgraph
> > and
> > varpool nodes aren't necessarily created for the first partition.  To
> > reproduce:
> > 
> > > $ make check-target-libgomp RUNTESTFLAGS="c.exp=for-* --target_board=unix/-flto"
> > FAIL: libgomp.c/for-3.c (internal compiler error)
> > FAIL: libgomp.c/for-5.c (internal compiler error)
> > FAIL: libgomp.c/for-6.c (internal compiler error)
> > > $ make check-target-libgomp RUNTESTFLAGS="c++.exp=for-* --target_board=unix/-flto"
> > FAIL: libgomp.c++/for-11.C (internal compiler error)
> > FAIL: libgomp.c++/for-13.C (internal compiler error)
> > FAIL: libgomp.c++/for-14.C (internal compiler error)
> 
> This works for me.
> 
> OK for trunk?

Ok.

Thanks,
Richard.


[PATCH] Fix up ICE with initializer containing address of invalid var (PR tree-optimization/69483)

2016-01-26 Thread Jakub Jelinek
Hi!

If as in the testcase below a VAR_DECL has error_mark_node type
(and that unfortunately happens (and has to) quite late, at the end of
parsing the TU), canonicalize_constructor_val can ICE on that, because it
will try to fold convert something to error_mark_node type.

Fixed by giving up in that case.
The patch also cleans up the change that introduced the error_mark_node in
there, to use FOR_EACH_VEC_ELT.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2016-01-26  Jakub Jelinek  

PR tree-optimization/69483
* gimple-fold.c (canonicalize_constructor_val): Return NULL
if base has error_mark_node type.

* c-parser.c (c_parser_translation_unit): Use FOR_EACH_VEC_ELT.

* gcc.dg/pr69483.c: New test.
* g++.dg/opt/pr69483.C: New test.

--- gcc/gimple-fold.c.jj2016-01-08 21:48:36.0 +0100
+++ gcc/gimple-fold.c   2016-01-26 10:54:12.142355308 +0100
@@ -195,6 +195,8 @@ canonicalize_constructor_val (tree cval,
   || TREE_CODE (base) == FUNCTION_DECL)
  && !can_refer_decl_in_current_unit_p (base, from_decl))
return NULL_TREE;
+  if (TREE_TYPE (base) == error_mark_node)
+   return NULL_TREE;
   if (TREE_CODE (base) == VAR_DECL)
TREE_ADDRESSABLE (base) = 1;
   else if (TREE_CODE (base) == FUNCTION_DECL)
--- gcc/c/c-parser.c.jj 2016-01-21 00:41:47.0 +0100
+++ gcc/c/c-parser.c2016-01-26 10:59:30.104941374 +0100
@@ -1431,15 +1431,14 @@ c_parser_translation_unit (c_parser *par
   while (c_parser_next_token_is_not (parser, CPP_EOF));
 }
 
-  for (unsigned i = 0; i < incomplete_record_decls.length (); ++i)
-{
-  tree decl = incomplete_record_decls[i];
-  if (DECL_SIZE (decl) == NULL_TREE && TREE_TYPE (decl) != error_mark_node)
-   {
- error ("storage size of %q+D isn%'t known", decl);
- TREE_TYPE (decl) = error_mark_node;
-   }
-}
+  unsigned int i;
+  tree decl;
+  FOR_EACH_VEC_ELT (incomplete_record_decls, i, decl)
+if (DECL_SIZE (decl) == NULL_TREE && TREE_TYPE (decl) != error_mark_node)
+  {
+   error ("storage size of %q+D isn%'t known", decl);
+   TREE_TYPE (decl) = error_mark_node;
+  }
 }
 
 /* Parse an external declaration (C90 6.7, C99 6.9).
--- gcc/testsuite/gcc.dg/pr69483.c.jj   2016-01-26 11:02:41.152289108 +0100
+++ gcc/testsuite/gcc.dg/pr69483.c  2016-01-26 11:02:20.0 +0100
@@ -0,0 +1,6 @@
+/* PR tree-optimization/69483 */
+/* { dg-do compile } */
+
+struct T { struct S *a; };
+struct S b; /* { dg-error "storage size of 'b' isn't known" } */
+struct T c = { &b };
--- gcc/testsuite/g++.dg/opt/pr69483.C.jj   2016-01-26 11:06:03.375481313 +0100
+++ gcc/testsuite/g++.dg/opt/pr69483.C  2016-01-26 11:03:20.0 +0100
@@ -0,0 +1,6 @@
+// PR tree-optimization/69483
+// { dg-do compile }
+
+struct T { struct S *a; };
+struct S b; // { dg-error "aggregate 'S b' has incomplete type and cannot be defined" }
+struct T c = { &b };

Jakub


Re: [PATCH] Handle -fsanitize=* in lto-wrapper (PR lto/69254)

2016-01-26 Thread Bernd Schmidt

On 01/26/2016 02:24 PM, Jakub Jelinek wrote:



just designed to enable DEF_SANITIZER_BUILTIN (IIUC). Also, why use shift
and not just sanitize=undefined?


Because -fsanitize=undefined is a large collection of individual sanitizers,
and at least some of them affect also post-IPA code (e.g.
-fsanitize=unreachable).  The goal is to pick one of the sanitizers that are
handled solely pre-IPA only, after that just are present in form of a
builtin call in the IL (and thus all that lto1 needs to do for that option
is initialize the builtins).


Ok. That should also go into one of the comments describing the choice 
of -fsanitize=shift.



Maybe.  It also has to check the option->value though.  The thing is,
because I didn't want to memmove options around when processing the first
TU, it will canonicalize all the -f{,no-}sanitize= options depending on if
in the end some ubsan sanitizers were enabled or not to
-f{,no-}sanitize=shift.  And merge_and_complain thus should ignore
all the -fno-sanitize=shift options in DECODED_OPTIONS, if it finds any
-fsanitize=shift in DECODED_OPTIONS (optionally preceded by some
-fno-sanitize=shift), then nothing needs to be added, otherwise
-fsanitize=shift is added.


Again that sort of thing might be worthwhile to have in a comment.


Bernd


[committed, PATCH] Remove -m32 from gcc.target/i386/pr68986-2.c

2016-01-26 Thread H.J. Lu
Index: ChangeLog
===
--- ChangeLog   (revision 232829)
+++ ChangeLog   (working copy)
@@ -1,5 +1,9 @@
 2016-01-26  H.J. Lu  
 
+   * gcc.target/i386/pr68986-2.c: Remove -m32.
+
+2016-01-26  H.J. Lu  
+
PR target/68986
* gcc.target/i386/pr68986-1.c: New test.
* gcc.target/i386/pr68986-2.c: Likewise.
Index: gcc.target/i386/pr68986-2.c
===
--- gcc.target/i386/pr68986-2.c (revision 232829)
+++ gcc.target/i386/pr68986-2.c (working copy)
@@ -1,7 +1,7 @@
 /* { dg-do compile { target ia32 } } */
 /* { dg-require-effective-target tls_native } */
 /* { dg-require-effective-target fpic } */
-/* { dg-options "-fPIC -mno-accumulate-outgoing-args -mpreferred-stack-boundary=2 -m32" } */
+/* { dg-options "-fPIC -mno-accumulate-outgoing-args -mpreferred-stack-boundary=2" } */
 
 extern __thread int msgdata;
 int


RFA: Fix for cygwin/mingw PR 66655

2016-01-26 Thread Nick Clifton
Hi Guys,

  The patch below is offered as a fix for PR 66655.  In testing it
  appears that the patch does work, and does not break building
  libstdc++-v3 for cygwin or mingw.  (Unlike the earlier version...)

  Due to my brain being so small, I have already checked the patch in,
  without receiving proper authorisation.  I apologise for this.  If the
  patch does prove suitable and is approved today, then I will leave it
  in.  Otherwise I will revert the change and wait for proper approval.

  The patch itself is also slightly dubious in that I am not sure if I
  have all the correct terms in the if-statement.  I was going for
  minimal impact on the current code, so I went for a set of selectors
  that matched the testcase for PR 66655, but nothing else.  In
  particular I did not check to see if a similar problem exists for
  methods or virtual functions.

  My theory was that if it does turn out that these kinds of functions can
  also trigger this kind of bug, then the patch could be extended
  later.  Plus a new bug report is likely to include a new testcase that
  can be added to the testsuite.

  So ... OK to apply ?

Cheers
  Nick

gcc/ChangeLog
2016-01-26  Nick Clifton  

PR target/66655
* config/i386/winnt.c (i386_pe_binds_local_p): If a function has
been marked as DECL_ONE_ONLY but we do not have the means to make it
so, then do not allow it to bind locally.

Index: gcc/config/i386/winnt.c
===
--- gcc/config/i386/winnt.c (revision 232784)
+++ gcc/config/i386/winnt.c (working copy)
@@ -341,6 +341,20 @@
   && TREE_PUBLIC (exp)
   && DECL_EXTERNAL (exp))
 return true;
+
+#ifndef MAKE_DECL_ONE_ONLY
+  /* PR target/66655: If a function has been marked as DECL_ONE_ONLY
+ but we do not the means to make it so, then do not allow it to
+ bind locally.  */
+  if (DECL_P (exp)
+  && TREE_CODE (exp) == FUNCTION_DECL
+  && TREE_PUBLIC (exp)
+  && DECL_ONE_ONLY (exp)
+  && ! DECL_EXTERNAL (exp)
+  && DECL_DECLARED_INLINE_P (exp))
+return false;
+#endif
+  
   return default_binds_local_p_1 (exp, 0);
 }
 


Re: [PATCH] Handle -fsanitize=* in lto-wrapper (PR lto/69254)

2016-01-26 Thread Jakub Jelinek
On Tue, Jan 26, 2016 at 03:06:43PM +0100, Richard Biener wrote:
> > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> 
> Can you split out the non -fsanitize part?  It is ok.

Ok, I've committed following patch:

2016-01-26  Jakub Jelinek  

PR lto/69254
* lto-wrapper.c (merge_and_complain): Handle -fcilkplus.
(append_compiler_options): Handle -fcilkplus.
(append_linker_options): Ignore -fno-{openmp,openacc,cilkplus}.

--- gcc/lto-wrapper.c.jj2016-01-23 00:13:00.632019027 +0100
+++ gcc/lto-wrapper.c   2016-01-25 15:59:49.778877313 +0100
@@ -277,6 +293,7 @@ merge_and_complain (struct cl_decoded_op
case OPT_fwrapv:
case OPT_fopenmp:
case OPT_fopenacc:
+   case OPT_fcilkplus:
case OPT_fcheck_pointer_bounds:
  /* For selected options we can merge conservatively.  */
  for (j = 0; j < *decoded_options_count; ++j)
@@ -505,6 +546,7 @@ append_compiler_options (obstack *argv_o
case OPT_fwrapv:
case OPT_fopenmp:
case OPT_fopenacc:
+   case OPT_fcilkplus:
case OPT_ftrapv:
case OPT_fstrict_overflow:
case OPT_foffload_abi_:
@@ -558,6 +601,15 @@ append_linker_options (obstack *argv_obs
 ???  We fail to diagnose a possible mismatch here.  */
  continue;
 
+   case OPT_fopenmp:
+   case OPT_fopenacc:
+   case OPT_fcilkplus:
+ /* Ignore -fno-XXX form of these options, as otherwise
+corresponding builtins will not be enabled.  */
+ if (option->value == 0)
+   continue;
+ break;
+
default:
  break;
}
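
The effect of the last hunk can be sketched as a small predicate. This is a toy model with hypothetical names (`toy_option`, `toy_forward_linker_option_p`); the real code works on struct cl_decoded_option inside append_linker_options, and only the rule itself (drop the -fno-XXX spelling of these three options, pass everything else through) is taken from the patch.

```c
#include <stdbool.h>

/* Toy option indices; TOY_other stands for any unrelated option.  */
enum toy_opt { TOY_fopenmp, TOY_fopenacc, TOY_fcilkplus, TOY_other };

struct toy_option
{
  enum toy_opt index;
  int value;			/* 0 models the -fno-XXX spelling.  */
};

/* Return true if a link-time option would be forwarded in this model.
   The negative form of the three listed options is dropped so that it
   cannot disable builtins that the compile-time options enabled.  */
static bool
toy_forward_linker_option_p (const struct toy_option *option)
{
  switch (option->index)
    {
    case TOY_fopenmp:
    case TOY_fopenacc:
    case TOY_fcilkplus:
      return option->value != 0;
    default:
      return true;
    }
}
```

So a link line containing -fno-openmp no longer overrides objects that were compiled with -fopenmp, while -fopenmp given at link time is still passed on.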


Jakub


Re: [PATCH][AArch64] Add vector permute cost

2016-01-26 Thread Wilco Dijkstra

ping


From: Wilco Dijkstra
Sent: 16 December 2015 11:37
To: Richard Biener; James Greenhalgh
Cc: GCC Patches; nd
Subject: RE: [PATCH][AArch64] Add vector permute cost

Richard Biener wrote:
> On Wed, Dec 16, 2015 at 10:32 AM, James Greenhalgh
>  wrote:
> > On Tue, Dec 15, 2015 at 11:35:45AM +, Wilco Dijkstra wrote:
> >>
> >> Add support for vector permute cost since various permutes can expand into 
> >> a complex
> >> sequence of instructions.  This fixes major performance regressions due to 
> >> recent changes
> >> in the SLP vectorizer (which now vectorizes more aggressively and emits 
> >> many complex
> >> permutes).
> >>
> >> Set the cost to > 1 for all microarchitectures so that the number of 
> >> permutes is usually zero
> >> and regressions disappear.  An example of the kind of code that might be 
> >> emitted for
> >> VEC_PERM_EXPR {0, 3} where registers happen to be in the wrong order:
> >>
> >> adrp x4, .LC16
> >> ldr q5, [x4, #:lo12:.LC16]
> >> eor v1.16b, v1.16b, v0.16b
> >> eor v0.16b, v1.16b, v0.16b
> >> eor v1.16b, v1.16b, v0.16b
> >> tbl v0.16b, {v0.16b - v1.16b}, v5.16b
> >>
> >> Regress passes. This fixes regressions that were introduced recently, so 
> >> OK for commit?
> >>
> >>
> >> ChangeLog:
> >> 2015-12-15  Wilco Dijkstra  
> >>
> >>   * gcc/config/aarch64/aarch64.c (generic_vector_cost):
> >>   Set vec_permute_cost.
> >>   (cortexa57_vector_cost): Likewise.
> >>   (exynosm1_vector_cost): Likewise.
> >>   (xgene1_vector_cost): Likewise.
> >>   (aarch64_builtin_vectorization_cost): Use vec_permute_cost.
> >>   * gcc/config/aarch64/aarch64-protos.h (cpu_vector_cost):
> >>   Add vec_permute_cost entry.
> >>
> >>
> >> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> >> index 
> >> 10754c88c0973d8ef3c847195b727f02b193bbd8..2584f16d345b3d015d577dd28c08a73ee3e0b0fb
> >>  100644
> >> --- a/gcc/config/aarch64/aarch64.c
> >> +++ b/gcc/config/aarch64/aarch64.c
> >> @@ -314,6 +314,7 @@ static const struct cpu_vector_cost 
> >> generic_vector_cost =
> >>1, /* scalar_load_cost  */
> >>1, /* scalar_store_cost  */
> >>1, /* vec_stmt_cost  */
> >> +  2, /* vec_permute_cost  */
> >>1, /* vec_to_scalar_cost  */
> >>1, /* scalar_to_vec_cost  */
> >>1, /* vec_align_load_cost  */
> >
> > Is there any reasoning behind making this 2? Do we now miss vectorization
> > for some of the cheaper permutes? Across the cost models/pipeline
> > descriptions that have been contributed to GCC I think that this is a
> > sensible change to the generic costs, but I just want to check there
> > was some reasoning/experimentation behind the number you picked.

Yes, it blocks vectorization if they have a high percentage of permutes, even 
if trivial.
But that is exactly the goal - when we vectorize we want to minimize data 
shuffling
that just adds latency and does not contribute to actual work.

The value 2 was the minimal value that avoided the regressions I noticed due to the
improved SLP vectorization. I tried other possibilities like increasing the 
statement cost,
but that affects other cases, so this is the simplest change.

Note that I'm not too convinced about any of the existing vector costs, they 
seem
fairly random numbers - the A57 vector costs for example block many simple
vectorization cases that are beneficial. The goal of this is simply to fix the 
regressions
rather than tuning vector costs (which is only possible if we have a better 
cost model).

> > As permutes can have such wildly different costs, this all seems like a good
> > candidate for some future much more involved hook from the vectorizer to the
> > back-end specifying the candidate permute operation and requesting a cost
> > (part of the bigger gimple costs framework?).
>
> Yes, the vectorizer side also needs to improve here.  Not sure if it is 
> possible
> to represent this kind of complex cost queries with a single gimple cost hook.
> After all we don't really want to generate the full gimple stmt just to query
> its cost ...
>
> To better represent permute cost in the short term we'd need another 
> vectorizer
> specific hook, not sth for stage3 unless we face some serious regressions
> on real-world code (thus not microbenchmarks only)

Yes, we need a better cost model for the vectorizer. The sorts of things that are
important are the vector mode (so we can differentiate between 2x, 4x, 8x, 16x
vectorization etc), the specific permute for permutes, the size and type of
load/store (as some do permutes) etc.

Wilco
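
To make the effect of the permute cost concrete, here is a toy profitability
check. It is only a sketch under strong assumptions: the struct, its field
subset and the function toy_vectorization_profitable_p are hypothetical, and
the real vectorizer cost model accounts for much more (prologue/epilogue
costs, loads, stores and so on). It only shows why raising the permute cost
from 1 to 2 can tip a permute-heavy candidate back to scalar code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified cost record; the field names echo the
   cpu_vector_cost entries quoted above but this is not that struct.  */
struct toy_vector_cost
{
  int scalar_stmt_cost;
  int vec_stmt_cost;
  int vec_permute_cost;
};

/* Return nonzero if the vector version is estimated to be strictly
   cheaper than the scalar version.  */
int
toy_vectorization_profitable_p (const struct toy_vector_cost *c,
				int n_scalar_stmts, int n_vec_stmts,
				int n_permutes)
{
  assert (c != NULL);

  int scalar_cost = n_scalar_stmts * c->scalar_stmt_cost;
  int vector_cost = (n_vec_stmts * c->vec_stmt_cost
		     + n_permutes * c->vec_permute_cost);
  return vector_cost < scalar_cost;
}
```

With six scalar statements replaced by three vector statements plus two
permutes, a permute cost of 1 gives 5 against 6 (vectorize), while a permute
cost of 2 gives 7 against 6 (keep the scalar code).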



[AArch64] Disable pcrelative_literal_loads with fix-cortex-a53-843419

2016-01-26 Thread Christophe Lyon
Hi,

This is a followup to PR63304.

As discussed in bugzilla, this patch disables pcrelative_literal_loads
when -mfix-cortex-a53-843419 (or its default configure option) is
used.

I copied the behavior of -mfix-cortex-a53-835769 (e.g. in
aarch64_can_inline_p), and I have tested by building the Linux kernel
using -mfix-cortex-a53-843419 and checked that
R_AARCH64_ADR_PREL_PG_HI21 relocations are not emitted anymore (under
CONFIG_ARM64_ERRATUM_843419).

For reference, this is motivated by:
https://bugs.linaro.org/show_bug.cgi?id=1994
and further details on Launchpad:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1533009

OK for trunk?

Thanks,

Christophe.
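
The tri-state handling that the patch copies from -mfix-cortex-a53-835769 can
be sketched as follows. This is an illustration only: the function names are
hypothetical, and the only behaviour taken from the patch is that an option
value of 2 means "not set on the command line" (so the configure-time default
applies) and that the "no pc-relative literal loads" default is taken only
when the erratum workaround is not in effect.

```c
#include <assert.h>

/* Resolve a tri-state option: 0 is explicitly off, 1 is explicitly on,
   2 means the option was not given and CONFIGURED_DEFAULT applies.  */
int
toy_resolve_tristate (int option_value, int configured_default)
{
  assert (option_value >= 0 && option_value <= 2);
  return option_value == 2 ? configured_default : option_value;
}

/* Mirror the condition in the patched hunk: default to "no pc-relative
   literal loads" only if that option was left unset (2) and the 843419
   workaround resolves to off.  */
int
toy_default_no_pcrel_literal_loads (int literal_loads_opt,
				    int fix_843419_opt, int fix_default)
{
  return (literal_loads_opt == 2
	  && !toy_resolve_tristate (fix_843419_opt, fix_default));
}
```

Using a plain 0/1 default (rather than a macro that may be undefined) is what
lets the check be written as an ordinary "if" instead of an "#ifdef".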


Re: [AArch64] Disable pcrelative_literal_loads with fix-cortex-a53-843419

2016-01-26 Thread Christophe Lyon
With the attachment


On 26 January 2016 at 15:42, Christophe Lyon  wrote:
> Hi,
>
> This is a followup to PR63304.
>
> As discussed in bugzilla, this patch disables pcrelative_literal_loads
> when -mfix-cortex-a53-843419 (or its default configure option) is
> used.
>
> I copied the behavior of -mfix-cortex-a53-835769 (e.g. in
> aarch64_can_inline_p), and I have tested by building the Linux kernel
> using -mfix-cortex-a53-843419 and checked that
> R_AARCH64_ADR_PREL_PG_HI21 relocations are not emitted anymore (under
> CONFIG_ARM64_ERRATUM_843419).
>
> For reference, this is motivated by:
> https://bugs.linaro.org/show_bug.cgi?id=1994
> and further details on Launchpad:
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1533009
>
> OK for trunk?
>
> Thanks,
>
> Christophe.
2016-01-26  Christophe Lyon  

* config/aarch64/aarch64.h (TARGET_FIX_ERR_A53_843419_DEFAULT):
Always define to 0 or 1.
(TARGET_FIX_ERR_A53_843419): New macro.
* config/aarch64/aarch64-elf-raw.h
(TARGET_FIX_ERR_A53_843419_DEFAULT): Update for above changes.
* config/aarch64/aarch64-linux.h: Likewise.
* config/aarch64/aarch64.c
(aarch64_override_options_after_change_1): Do not default
aarch64_nopcrelative_literal_loads to true if Cortex-A53 erratum
843419 is on.
(aarch64_attributes): Handle fix-cortex-a53-843419.
(aarch64_can_inline_p): Likewise.
* config/aarch64/aarch64.opt (aarch64_fix_a53_err843419): Save.
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index 8b463c9..ec96ce3 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -179,6 +179,20 @@ extern unsigned aarch64_architecture_version;
   ((aarch64_fix_a53_err835769 == 2)\
   ? TARGET_FIX_ERR_A53_835769_DEFAULT : aarch64_fix_a53_err835769)
 
+/* Make sure this is always defined so we don't have to check for ifdefs
+   but rather use normal ifs.  */
+#ifndef TARGET_FIX_ERR_A53_843419_DEFAULT
+#define TARGET_FIX_ERR_A53_843419_DEFAULT 0
+#else
+#undef TARGET_FIX_ERR_A53_843419_DEFAULT
+#define TARGET_FIX_ERR_A53_843419_DEFAULT 1
+#endif
+
+/* Apply the workaround for Cortex-A53 erratum 843419.  */
+#define TARGET_FIX_ERR_A53_843419  \
+  ((aarch64_fix_a53_err843419 == 2)\
+  ? TARGET_FIX_ERR_A53_843419_DEFAULT : aarch64_fix_a53_err843419)
+
 /* ARMv8.1 Adv.SIMD support.  */
 #define TARGET_SIMD_RDMA (TARGET_SIMD && AARCH64_ISA_RDMA)
 
diff --git a/gcc/config/aarch64/aarch64-elf-raw.h 
b/gcc/config/aarch64/aarch64-elf-raw.h
index 2dcb6d4..9097017 100644
--- a/gcc/config/aarch64/aarch64-elf-raw.h
+++ b/gcc/config/aarch64/aarch64-elf-raw.h
@@ -35,7 +35,7 @@
   " %{mfix-cortex-a53-835769:--fix-cortex-a53-835769}"
 #endif
 
-#ifdef TARGET_FIX_ERR_A53_843419_DEFAULT
+#if TARGET_FIX_ERR_A53_843419_DEFAULT
 #define CA53_ERR_843419_SPEC \
   " %{!mno-fix-cortex-a53-843419:--fix-cortex-a53-843419}"
 #else
diff --git a/gcc/config/aarch64/aarch64-linux.h 
b/gcc/config/aarch64/aarch64-linux.h
index 6064b26..5fcaa59 100644
--- a/gcc/config/aarch64/aarch64-linux.h
+++ b/gcc/config/aarch64/aarch64-linux.h
@@ -53,7 +53,7 @@
   " %{mfix-cortex-a53-835769:--fix-cortex-a53-835769}"
 #endif
 
-#ifdef TARGET_FIX_ERR_A53_843419_DEFAULT
+#if TARGET_FIX_ERR_A53_843419_DEFAULT
 #define CA53_ERR_843419_SPEC \
   " %{!mno-fix-cortex-a53-843419:--fix-cortex-a53-843419}"
 #else
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 03bc1b9..3bea61e 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -8062,9 +8062,11 @@ aarch64_override_options_after_change_1 (struct 
gcc_options *opts)
   if (opts->x_nopcrelative_literal_loads == 1)
 aarch64_nopcrelative_literal_loads = false;
 
-  /* If it is not set on the command line, we default to no
- pc relative literal loads.  */
-  if (opts->x_nopcrelative_literal_loads == 2)
+  /* If it is not set on the command line, we default to no pc
+ relative literal loads, unless the workaround for Cortex-A53
+ erratum 843419 is in effect.  */
+  if (opts->x_nopcrelative_literal_loads == 2
+  && !TARGET_FIX_ERR_A53_843419)
 aarch64_nopcrelative_literal_loads = true;
 
   /* In the tiny memory model it makes no sense
@@ -8748,6 +8750,8 @@ static const struct aarch64_attribute_info 
aarch64_attributes[] =
  OPT_mgeneral_regs_only },
   { "fix-cortex-a53-835769", aarch64_attr_bool, true, NULL,
  OPT_mfix_cortex_a53_835769 },
+  { "fix-cortex-a53-843419", aarch64_attr_bool, true, NULL,
+ OPT_mfix_cortex_a53_843419 },
   { "cmodel", aarch64_attr_enum, false, NULL, OPT_mcmodel_ },
   { "strict-align", aarch64_attr_mask, false, NULL, OPT_mstrict_align },
   { "omit-leaf-frame-pointer", aarch64_attr_bool, true, NULL,
@@ -9162,6 +9166,12 @@ aarch64_can_inline_p (tree caller, tree callee)
  2, TARGET_FIX_ERR_A53_835769_DEFAULT))
 return false;
 
+  if (!aarch64_tribools_ok_for_inlining_p (
+ caller_opts->x_aarc

Re: [AArch64] Disable pcrelative_literal_loads with fix-cortex-a53-843419

2016-01-26 Thread Kyrill Tkachov


On 26/01/16 14:42, Christophe Lyon wrote:

Hi,

This is a followup to PR63304.

As discussed in bugzilla, this patch disables pcrelative_literal_loads
when -mfix-cortex-a53-843419 (or its default configure option) is
used.

I copied the behavior of -mfix-cortex-a53-835769 (e.g. in
aarch64_can_inline_p), and I have tested by building the Linux kernel
using -mfix-cortex-a53-843419 and checked that
R_AARCH64_ADR_PREL_PG_HI21 relocations are not emitted anymore (under
CONFIG_ARM64_ERRATUM_843419).

For reference, this is motivated by:
https://bugs.linaro.org/show_bug.cgi?id=1994
and further details on Launchpad:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1533009

OK for trunk?

Thanks,

Christophe.



ENOPATCH.

Thanks,
Kyrill


Re: [PATCH][AArch64] Add vector permute cost

2016-01-26 Thread James Greenhalgh
On Tue, Dec 15, 2015 at 11:35:45AM +, Wilco Dijkstra wrote:
> 
> Add support for vector permute cost since various permutes can expand into a 
> complex
> sequence of instructions.  This fixes major performance regressions due to 
> recent changes
> in the SLP vectorizer (which now vectorizes more aggressively and emits many 
> complex 
> permutes).
> 
> Set the cost to > 1 for all microarchitectures so that the number of permutes 
> is usually zero
> and regressions disappear.  An example of the kind of code that might be 
> emitted for
> VEC_PERM_EXPR {0, 3} where registers happen to be in the wrong order:
> 
> adrp x4, .LC16
> ldr q5, [x4, #:lo12:.LC16]
> eor v1.16b, v1.16b, v0.16b
> eor v0.16b, v1.16b, v0.16b
> eor v1.16b, v1.16b, v0.16b
> tbl v0.16b, {v0.16b - v1.16b}, v5.16b
> 
> Regress passes. This fixes regressions that were introduced recently, so OK 
> for commit?

OK.

Thanks,
James

> ChangeLog:
> 2015-12-15  Wilco Dijkstra  
> 
>   * gcc/config/aarch64/aarch64.c (generic_vector_cost):
>   Set vec_permute_cost.
>   (cortexa57_vector_cost): Likewise.
>   (exynosm1_vector_cost): Likewise.
>   (xgene1_vector_cost): Likewise.
>   (aarch64_builtin_vectorization_cost): Use vec_permute_cost.
>   * gcc/config/aarch64/aarch64-protos.h (cpu_vector_cost):
>   Add vec_permute_cost entry.
 


Re: [PATCH], PowerPC IEEE 128-bit fp, #12 (default -mfloat128 on PowerPC-Linux)

2016-01-26 Thread David Edelsohn
On Thu, Jan 21, 2016 at 4:25 PM, Michael Meissner
 wrote:
> This is the final patch (at least so far) that turns on -mfloat128 by default
> for PowerPC Linux systems where the VSX instruction set is enabled.  As I
> mentioned in the last email, because we don't build the __float128 emulator on
> other systems, I didn't think it would be useful to make it the default.
>
> I did a bootstrap build/check with no regressions on a little endian power8
> system.  Are the patches ok to check in?
>
> [gcc]
> 2016-01-21   Michael Meissner  
>
> * config/rs6000/rs6000.c (rs6000_option_override_internal): Enable
> -mfloat128 by default on PowerPC Linux systems with the VSX
> instruction enabled.
>
> [gcc/testsuite]
> 2016-01-21   Michael Meissner  
>
> * gcc.target/powerpc/float128-1.c: New test for IEEE 128-bit
> floating point support.
> * gcc.target/powerpc/float128-2.c: Likewise.

No.  This is too risky a change during Stage 4.

- David


Re: [PATCH] ARM PR68620 (ICE with FP16 on armeb)

2016-01-26 Thread Christophe Lyon
On 26 January 2016 at 14:20, Kyrill Tkachov  wrote:
> Hi Christophe,
>
> On 20/01/16 21:10, Christophe Lyon wrote:
>>
>> On 19 January 2016 at 15:51, Alan Lawrence 
>> wrote:
>>>
>>> On 19/01/16 11:15, Christophe Lyon wrote:
>>>
>> For neon_vdupn, I chose to implement neon_vdup_nv4hf and
>> neon_vdup_nv8hf instead of updating the VX iterator because I thought
>> it was not desirable to impact neon_vrev32.
>
>
> Well, the same instruction will suffice for vrev32'ing vectors of HF
> just
> as
> well as vectors of HI, so I think I'd argue that's harmless enough. To
> gain the
> benefit, we'd need to update arm_evpc_neon_vrev with a few new cases,
> though.
>
 Since this is more intrusive, I'd rather leave that part for later. OK?
>>>
>>>
>>> Sure.
>>>
>> +#ifdef __ARM_BIG_ENDIAN
>> +  /* Here, 3 is (4-1) where 4 is the number of lanes. This is also
>> the
>> + right value for vectors with 8 lanes.  */
>> +#define __arm_lane(__vec, __idx) (__idx ^ 3)
>> +#else
>> +#define __arm_lane(__vec, __idx) __idx
>> +#endif
>> +
>
>
> Looks right, but sounds... my concern here is that I'm hoping at some
> point we
> will move the *other* vget/set_lane intrinsics to use GCC vector
> extensions
> too. At which time (unlike __aarch64_lane which can be used everywhere)
> this
> will be the wrong formula. Can we name (and/or comment) it to avoid
> misleading
> anyone? The key characteristic seems to be that it is for vectors of
> 16-bit
> elements only.
>
 I'm not sure I follow, here. Looking at the patterns for
 neon_vget_lane_*internal in neon.md,
 I can see 2 flavours: one for VD, one for VQ2. The latter uses
 "halfelts".

 Do you prefer that I create 2 macros (say __arm_lane and __arm_laneq),
 that would be similar to the aarch64 ones (by computing the number of
 lanes of the input vector), but the "q" one would use half the total
 number of lanes instead?
>>>
>>>
>>> That works for me! Sthg like:
>>>
>>> #define __arm_lane(__vec, __idx) NUM_LANES(__vec) - __idx
>>> #define __arm_laneq(__vec, __idx) (__idx & (NUM_LANES(__vec)/2)) +
>>> (NUM_LANES(__vec)/2 - __idx)
>>> //or similarly
>>> #define __arm_laneq(__vec, __idx) (__idx ^ (NUM_LANES(__vec)/2 - 1))
>>>
>>> Alternatively I'd been thinking
>>>
>>> #define __arm_lane_32xN(__idx) __idx ^ 1
>>> #define __arm_lane_16xN(__idx) __idx ^ 3
>>> #define __arm_lane_8xN(__idx) __idx ^ 7
>>>
>>> Bear in mind PR64893 that we had on AArch64 :-(
>>>
>> Here is a new version, based on the comments above.
>> I've also removed the addition of arm_fp_ok effective target since I
>> added that in my other testsuite patch.
>>
>> OK now?
>>
>> Thanks,
>>
>> Christophe
>>
>
> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
> index 3588b83..b1f408c 100644
> --- a/gcc/config/arm/arm.c
> +++ b/gcc/config/arm/arm.c
> @@ -12370,6 +12370,10 @@ neon_valid_immediate (rtx op, machine_mode mode,
> int inverse,
>if (!vfp3_const_double_rtx (el0) && el0 != CONST0_RTX (GET_MODE
> (el0)))
>  return -1;
>  +  /* FP16 vectors cannot be represented.  */
> +  if (innersize == 2)
> +   return -1;
> +
>r0 = CONST_DOUBLE_REAL_VALUE (el0);
>
>
> I think it'd be clearer to write "if (GET_MODE_INNER (mode) == HFmode)"
>
> +(define_expand "movv4hf"
> +  [(set (match_operand:V4HF 0 "s_register_operand")
> +   (match_operand:V4HF 1 "s_register_operand"))]
> +  "TARGET_NEON && TARGET_FP16"
> +{
> +  if (can_create_pseudo_p ())
> +{
> +  if (!REG_P (operands[0]))
> +   operands[1] = force_reg (V4HFmode, operands[1]);
> +}
> +})
>
> Can you please add a comment saying why you need the force_reg here?
> IIRC it's because of CANNOT_CHANGE_MODE_CLASS on big-endian that causes an
> ICE during expand with subregs.
>
> I've tried this patch out and it does indeed fix the ICE on armeb.
> So ok for trunk with the changes above.
> Thanks,
> Kyrill
>
>

OK thanks, here is what I have committed (r232832).


Christophe.
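
For reference, the big-endian lane remapping discussed in the quoted review
can be sketched as plain functions. This is not the committed arm_neon.h
code (which uses macros); the names toy_lane_d and toy_lane_q are
hypothetical. The D-register form reverses the lane order, written here as
num_lanes - 1 - idx so the result stays in range; the Q-register form
reverses within each 64-bit half, which for a power-of-two lane count is the
XOR variant suggested above.

```c
#include <assert.h>

/* 64-bit (D register) vector: reverse the lane order.  For 4 lanes this
   equals idx ^ 3.  */
unsigned
toy_lane_d (unsigned num_lanes, unsigned idx)
{
  assert (idx < num_lanes);
  return num_lanes - 1 - idx;
}

/* 128-bit (Q register) vector: reverse the lanes within each 64-bit
   half.  NUM_LANES is assumed to be a power of two, so the XOR with
   (half - 1) is a reversal inside the half and leaves the half itself
   unchanged.  */
unsigned
toy_lane_q (unsigned num_lanes, unsigned idx)
{
  assert (num_lanes >= 2 && (num_lanes & (num_lanes - 1)) == 0);
  assert (idx < num_lanes);
  return idx ^ (num_lanes / 2 - 1);
}
```

For 16-bit elements both forms reduce to idx ^ 3 (4 lanes in a D register,
8 lanes in a Q register), which matches the comment in the original macro.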
Index: gcc/config/arm/arm.c
===
--- gcc/config/arm/arm.c    (revision 232831)
+++ gcc/config/arm/arm.c    (working copy)
@@ -12381,6 +12381,10 @@
   if (!vfp3_const_double_rtx (el0) && el0 != CONST0_RTX (GET_MODE (el0)))
 return -1;
 
+  /* FP16 vectors cannot be represented.  */
+  if (GET_MODE_INNER (mode) == HFmode)
+   return -1;
+
   r0 = CONST_DOUBLE_REAL_VALUE (el0);
 
   for (i = 1; i < n_elts; i++)
Index: gcc/config/arm/arm_neon.h
===
--- gcc/config/arm/arm_neon.h   (revision 232831)
+++ gcc/config/arm/arm_neon.h   (working copy)
@@ -5302,16 +5302,28 @@
were marked always-inline so there were no call sites, the declaration
would nonetheless raise an error.  Hence, we must use a macro instead. 

Re: [C PATCH] Fix -Wunused-function (PR debug/66869)

2016-01-26 Thread Richard Biener
On Mon, Jan 25, 2016 at 9:38 PM, Jakub Jelinek  wrote:
> Hi!
>
> The early-debug changes moved warnings about unused functions into cgraph.
> The problem is that if we have just unused declarations, they aren't
> sometimes even registered with cgraph and therefore we no longer warn.
>
> Here is an attempt to register those with cgraph anyway to get the warning,
> for C FE only (no idea where to do that in C++ FE).  Or anyone has better
> suggestions what to do?
>
> Bootstrapped/regtested on x86_64-linux and i686-linux.
>
> 2016-01-25  Jakub Jelinek  
>
> PR debug/66869
> * c-decl.c (c_write_global_declarations_1): For warn_unused_function,
> ensure creation of cgraph node even if there is no definition.
>
> * gcc.dg/pr66869.c: New test.
>
> --- gcc/c/c-decl.c.jj   2016-01-21 00:41:47.0 +0100
> +++ gcc/c/c-decl.c  2016-01-25 16:36:31.973504082 +0100
> @@ -10741,11 +10741,19 @@ c_write_global_declarations_1 (tree glob
>if (TREE_CODE (decl) == FUNCTION_DECL
>   && DECL_INITIAL (decl) == 0
>   && DECL_EXTERNAL (decl)
> - && !TREE_PUBLIC (decl)
> - && C_DECL_USED (decl))
> + && !TREE_PUBLIC (decl))
> {
> - pedwarn (input_location, 0, "%q+F used but never defined", decl);
> - TREE_NO_WARNING (decl) = 1;
> + if (C_DECL_USED (decl))
> +   {
> + pedwarn (input_location, 0, "%q+F used but never defined", 
> decl);
> + TREE_NO_WARNING (decl) = 1;
> +   }
> + /* For -Wunused-function push the unused statics into cgraph,
> +so that check_global_declaration emits the warning.  */
> + else if (warn_unused_function
> +  && ! DECL_ARTIFICIAL (decl)
> +  && ! TREE_NO_WARNING (decl))
> +   cgraph_node::get_create (decl);

Err, so why not warn here directly?

Richard.

> }
>
>wrapup_global_declaration_1 (decl);
> --- gcc/testsuite/gcc.dg/pr66869.c.jj   2016-01-25 16:38:39.037758657 +0100
> +++ gcc/testsuite/gcc.dg/pr66869.c  2016-01-25 16:39:42.346888954 +0100
> @@ -0,0 +1,6 @@
> +/* PR debug/66869 */
> +/* { dg-do compile } */
> +/* { dg-options "-Wunused-function" } */
> +
> +static void test (void); /* { dg-warning "'test' declared 'static' but never 
> defined" } */
> +int i;
>
> Jakub


Re: [PATCH] Fix up ICE with initializer containing address of invalid var (PR tree-optimization/69483)

2016-01-26 Thread Richard Biener
On Tue, Jan 26, 2016 at 3:17 PM, Jakub Jelinek  wrote:
> Hi!
>
> If as in the testcase below a VAR_DECL has error_mark_node type
> (and that unfortunately happens (and has to) quite late, at the end of
> parsing the TU), canonicalize_constructor_val can ICE on that, because it
> will try to fold convert something to error_mark_node type.
>
> Fixed by giving up in that case.
> The patch also cleans up the change that introduced the error_mark_node in
> there, to use FOR_EACH_VEC_ELT.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Ok.

Thanks,
Richard.

> 2016-01-26  Jakub Jelinek  
>
> PR tree-optimization/69483
> * gimple-fold.c (canonicalize_constructor_val): Return NULL
> if base has error_mark_node type.
>
> * c-parser.c (c_parser_translation_unit): Use FOR_EACH_VEC_ELT.
>
> * gcc.dg/pr69483.c: New test.
> * g++.dg/opt/pr69483.C: New test.
>
> --- gcc/gimple-fold.c.jj    2016-01-08 21:48:36.0 +0100
> +++ gcc/gimple-fold.c   2016-01-26 10:54:12.142355308 +0100
> @@ -195,6 +195,8 @@ canonicalize_constructor_val (tree cval,
>|| TREE_CODE (base) == FUNCTION_DECL)
>   && !can_refer_decl_in_current_unit_p (base, from_decl))
> return NULL_TREE;
> +  if (TREE_TYPE (base) == error_mark_node)
> +   return NULL_TREE;
>if (TREE_CODE (base) == VAR_DECL)
> TREE_ADDRESSABLE (base) = 1;
>else if (TREE_CODE (base) == FUNCTION_DECL)
> --- gcc/c/c-parser.c.jj 2016-01-21 00:41:47.0 +0100
> +++ gcc/c/c-parser.c    2016-01-26 10:59:30.104941374 +0100
> @@ -1431,15 +1431,14 @@ c_parser_translation_unit (c_parser *par
>while (c_parser_next_token_is_not (parser, CPP_EOF));
>  }
>
> -  for (unsigned i = 0; i < incomplete_record_decls.length (); ++i)
> -{
> -  tree decl = incomplete_record_decls[i];
> -  if (DECL_SIZE (decl) == NULL_TREE && TREE_TYPE (decl) != 
> error_mark_node)
> -   {
> - error ("storage size of %q+D isn%'t known", decl);
> - TREE_TYPE (decl) = error_mark_node;
> -   }
> -}
> +  unsigned int i;
> +  tree decl;
> +  FOR_EACH_VEC_ELT (incomplete_record_decls, i, decl)
> +if (DECL_SIZE (decl) == NULL_TREE && TREE_TYPE (decl) != error_mark_node)
> +  {
> +   error ("storage size of %q+D isn%'t known", decl);
> +   TREE_TYPE (decl) = error_mark_node;
> +  }
>  }
>
>  /* Parse an external declaration (C90 6.7, C99 6.9).
> --- gcc/testsuite/gcc.dg/pr69483.c.jj   2016-01-26 11:02:41.152289108 +0100
> +++ gcc/testsuite/gcc.dg/pr69483.c  2016-01-26 11:02:20.0 +0100
> @@ -0,0 +1,6 @@
> +/* PR tree-optimization/69483 */
> +/* { dg-do compile } */
> +
> +struct T { struct S *a; };
> +struct S b; /* { dg-error "storage size of 'b' isn't known" } */
> +struct T c = { &b };
> --- gcc/testsuite/g++.dg/opt/pr69483.C.jj   2016-01-26 11:06:03.375481313 
> +0100
> +++ gcc/testsuite/g++.dg/opt/pr69483.C  2016-01-26 11:03:20.0 +0100
> @@ -0,0 +1,6 @@
> +// PR tree-optimization/69483
> +// { dg-do compile }
> +
> +struct T { struct S *a; };
> +struct S b; // { dg-error "aggregate 'S b' has incomplete type and cannot be 
> defined" }
> +struct T c = { &b };
>
> Jakub


Re: [PATCH] PR other/69006: fix extra newlines after diagnostics (v2)

2016-01-26 Thread David Malcolm
On Tue, 2016-01-26 at 12:18 +0100, Bernd Schmidt wrote:
> On 01/25/2016 09:13 PM, David Malcolm wrote:
> > Here's an updated version of the patch.
> 
> Thanks!
> 
> > Instead of testing one particular kind of output via a plugin,
> > this version of the patch adds code to gcc-dg-prune to issue a
> > FAIL for any testcase containing blank lines, with a new
> >dg-allow-blank-lines-in-output
> > directive for those test cases that legitimately emit blank lines.
> > Examples of the latter include a test using -ftime-report, another
> > using -fdump-tree-cunrolli-details=stderr, and a Fortran test
> > using -fdump-fortran-original.
> 
> Is the =stderr test really necessary, or does it somehow predate the 
> ability to scan dumps?

It was deliberate.

Specifically:  this was in gcc/testsuite/gcc.dg/unroll-2.c

Looking backwards through history, it was renamed from
  gcc/testsuite/gcc.dg/unroll_1.c
in r219675 (2015-01-15)

It was changed to:
  -fdump-tree-cunrolli-details=stderr
from:
  -fdump-rtl-loop2_unroll=stderr
in r216238 (2014-10-14)

It was changed to:
  -fdump-rtl-loop2_unroll=stderr
from:
  -fdump-rtl-loop2_unroll
in r202260:

2013-09-04  Teresa Johnson  

* dumpfile.c (dump_finish): Don't close stderr/stdout.

* testsuite/gcc.dg/unroll_1.c: Test dumping to stderr.

which seems to have been this discussion:
  https://gcc.gnu.org/ml/gcc-patches/2013-09/msg00151.html
which was part of this thread:
  https://gcc.gnu.org/ml/gcc-patches/2013-08/msg00247.html
  "[PATCH] Convert more passes to new dump framework"

Hence it appears to be deliberate: to give us test coverage for dumping
to stderr, during a rewrite of dumping.


> > OK for trunk in stage 4?  I regard PR 69006 as a regression, and it
> > affects all diagnostics we output (unless caret-printing is
> > disabled).
> 
> Yes, I think so. Ok.

Thanks.

> > +  for (int row = layout.get_first_line ();
> > +   row <= last_line;
> > +   row++)
> 
> While you're fixing the layout here, see if that doesn't all fit on one 
> line.

It does.  Re-running bootstrap&regrtest now with that tweak; I'll commit
it if it passes.

Dave



Re: [PATCH] Handle -fsanitize=* in lto-wrapper (PR lto/69254)

2016-01-26 Thread Jakub Jelinek
On Tue, Jan 26, 2016 at 03:06:43PM +0100, Richard Biener wrote:
> I'm somewhat confused about that you drop -fsanitize options from
> the LTO options section writing in lto-opts.c but then add code to
> parse it from there in lto-wrapper.c.  The code there also looks

Sorry, as I said to Bernd, that was just a thinko in the comment change;
the code was doing what I meant it to.

> somewhat duplicated - why not just canonicalize any -fsanitize=
> option coming in to the first in merge_and_complain
> and special-case it in append_compiler_options again by say
> 
>   case OPT_fsanitize_:
>  obstack_ptr_grow (argv_obstack, "-fsanitize=shift");

It is certainly much easier to just propagate around a bool flag
whether we need to add -fsanitize=shift after the linker options.
The reason I haven't done that is that I didn't want to pass around
yet another argument, furthermore so specific to a particular option.

But looking around, it seems we are already passing around pairs,
the pointer to array of struct cl_decoded_option and pointer to the count of
elements in that array.  By sticking the two into a structure and adding
the sanitize_undefined flag for now, if in the future we need to handle
some other option similarly, we can just handle it in 3-4 spots and not
everywhere among so many different functions.

So, what do you think about the untested patch below (the test with
-c -flto -fsanitize=undefined
-flto -fno-sanitize=undefined
passes)?  The patch is large, because it changed many places from
a pair of pointer and unsigned int (or a pair of pointer to pointer and
pointer to unsigned int) to pointer to this new struct.
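
A minimal sketch of that refactoring idea, with hypothetical toy_* names and
a stripped-down option record rather than the real struct cl_decoded_option:
the array pointer, the element count and the auxiliary flag travel together,
so a helper needs a single parameter and a future flag only touches the
struct and the few places that set or read it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical, stripped-down stand-in for a decoded option.  */
struct toy_decoded_option
{
  int opt_index;
  int value;
};

/* Decoded options, their number, and auxiliary state, bundled so that
   callers pass one pointer instead of a pointer/count pair.  */
struct toy_decoded_options
{
  struct toy_decoded_option *opt;
  unsigned int count;
  bool sanitize_undefined;
};

/* Append a copy of *O to OPTS, growing the array by one element.  */
void
toy_append_option (struct toy_decoded_options *opts,
		   const struct toy_decoded_option *o)
{
  assert (opts != NULL && o != NULL);

  struct toy_decoded_option *grown
    = realloc (opts->opt, (opts->count + 1) * sizeof *grown);
  if (grown == NULL)
    abort ();
  opts->opt = grown;
  opts->opt[opts->count++] = *o;
}

/* Record whether any undefined-behaviour sanitizer ended up enabled;
   ENABLED_MASK is a hypothetical bit mask of such sanitizers.  */
void
toy_note_sanitize (struct toy_decoded_options *opts, unsigned enabled_mask)
{
  assert (opts != NULL);
  opts->sanitize_undefined = enabled_mask != 0;
}
```

The caller owns the array and is expected to free opts->opt when done.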

2016-01-26  Jakub Jelinek  

PR lto/69254
* lto-opts.c (lto_write_options): Write also -f{,no-}sanitize=
options.
* lto-wrapper.c (struct lto_decoded_options): New type.
(append_option, merge_and_complain, append_compiler_options,
append_linker_options, append_offload_options,
compile_offload_image, compile_images_for_offload_targets,
find_and_merge_options): Pass around options
in struct lto_decoded_options instead of struct cl_decoded_option
pointer and count pair.
(get_options_from_collect_gcc_options): Likewise.  Parse -fsanitize=
options and if in the end any ub sanitizers are enabled, set
decoded_opts->sanitize_undefined to true.
(run_gcc): Adjust callers of these functions.  If
fdecoded_options.sanitize_undefined is true, append
-fsanitize=shift after the linker options.

--- gcc/lto-opts.c.jj   2016-01-25 22:33:11.477029666 +0100
+++ gcc/lto-opts.c  2016-01-26 15:41:02.937040062 +0100
@@ -198,10 +198,13 @@ lto_write_options (void)
 
   /* Also drop all options that are handled by the driver as well,
 which includes things like -o and -v or -fhelp for example.
-We do not need those.  The only exception is -foffload option, if we
-write it in offload_lto section.  Also drop all diagnostic options.  */
+We do not need those.  The only exceptions are:
+1) -foffload option, if we write it in offload_lto section
+2) -f{,no-}sanitize=
+Also drop all diagnostic options.  */
   if ((cl_options[option->opt_index].flags & (CL_DRIVER|CL_WARNING))
- && (!lto_stream_offload_p || option->opt_index != OPT_foffload_))
+ && (!lto_stream_offload_p || option->opt_index != OPT_foffload_)
+ && option->opt_index != OPT_fsanitize_)
continue;
 
   for (j = 0; j < option->canonical_option_num_elements; ++j)
--- gcc/lto-wrapper.c.jj    2016-01-26 15:24:10.457845617 +0100
+++ gcc/lto-wrapper.c   2016-01-26 16:38:16.784303735 +0100
@@ -118,6 +118,16 @@ maybe_unlink (const char *file)
 /* Template of LTRANS dumpbase suffix.  */
 #define DUMPBASE_SUFFIX ".ltrans18446744073709551615"
 
+/* Structure containing decoded options, number of them and auxiliary
+   state from the options handling.  */
+
+struct lto_decoded_options
+{
+  struct cl_decoded_option *opt;
+  unsigned int count;
+  bool sanitize_undefined;
+};
+
 /* Create decoded options from the COLLECT_GCC and COLLECT_GCC_OPTIONS
environment according to LANG_MASK.  */
 
@@ -125,13 +135,14 @@ static void
 get_options_from_collect_gcc_options (const char *collect_gcc,
  const char *collect_gcc_options,
  unsigned int lang_mask,
- struct cl_decoded_option **decoded_options,
- unsigned int *decoded_options_count)
+ struct lto_decoded_options *decoded_opts)
 {
   struct obstack argv_obstack;
   char *argv_storage;
   const char **argv;
   int j, k, argc;
+  unsigned int i;
+  int sanitize = 0;
 
   argv_storage = xstrdup (collect_gcc_options);
   obstack_init (&argv_obstack);
@@ -166,9 +177,30 @@ get_options_from_collect_gcc_options (co
   ar

Re: [PATCH] PR c++/69399: Add HAVE_WORKING_CXX_BUILTIN_CONSTANT_P

2016-01-26 Thread Richard Biener
On Mon, Jan 25, 2016 at 5:25 PM, H.J. Lu  wrote:
> On Mon, Jan 25, 2016 at 4:40 AM, Richard Biener
>  wrote:
>> On Fri, Jan 22, 2016 at 7:55 PM, H.J. Lu  wrote:
>>> Without the fix for PR 65656, g++ miscompiles __builtin_constant_p in
>>> wi::lrshift in wide-int.h.  Add a check with PR 65656 testcase to verify
>>> that C++ __builtin_constant_p works properly.
>>>
>>> Tested on x86-64 with working GCC:
>>>
>>> gcc/auto-host.h:/* #undef HAVE_WORKING_CXX_BUILTIN_CONSTANT_P */
>>> prev-gcc/auto-host.h:/* #undef HAVE_WORKING_CXX_BUILTIN_CONSTANT_P */
>>> stage1-gcc/auto-host.h:#define HAVE_WORKING_CXX_BUILTIN_CONSTANT_P 1
>>>
>>> and broken GCC:
>>>
>>> gcc/auto-host.h:/* #undef HAVE_WORKING_CXX_BUILTIN_CONSTANT_P */
>>> prev-gcc/auto-host.h:/* #undef HAVE_WORKING_CXX_BUILTIN_CONSTANT_P */
>>> stage1-gcc/auto-host.h:/* #undef HAVE_WORKING_CXX_BUILTIN_CONSTANT_P */
>>>
>>> Ok for trunk?
>>
>> I have a hard time seeing how we are "miscompiling"
>>
>>   if (STATIC_CONSTANT_P (xi.precision > HOST_BITS_PER_WIDE_INT)
>>   ? xi.len == 1 && xi.val[0] >= 0
>>   : xi.precision <= HOST_BITS_PER_WIDE_INT)
>>
>> anything that relies on __builtin_constant_p () for semantics is fishy so why
>> is this not simply an lrshift implementation bug?
>>
>
>
> We hit this via:
>
> Breakpoint 1, wi::lrshift
>>, generic_wide_int > > (x=..., y=...)
> at /export/gnu/import/git/sources/gcc-release/gcc/wide-int.h:2898
> 2898  val[0] = xi.to_uhwi () >> shift;
> (gdb) bt
> #0  wi::lrshift >,
> generic_wide_int > > (x=..., y=...)
> at /export/gnu/import/git/sources/gcc-release/gcc/wide-int.h:2898
> #1  0x009e7bbe in
> wi::rshift >,
> generic_wide_int > > (sgn=,
> y=..., x=...)
> at /export/gnu/import/git/sources/gcc-release/gcc/wide-int.h:2947
> #2  bit_value_binop_1 (code=code@entry=RSHIFT_EXPR,
> type=type@entry=0x7fffefe82dc8, val=val@entry=0x7fffd7c0,
> mask=mask@entry=0x7fffd790, r1type=0x7fffefe82dc8, r1val=...,
> r1mask=..., r2type=0x7fffefd6b690, r2val=..., r2mask=...)
> at /export/gnu/import/git/sources/gcc-release/gcc/tree-ssa-ccp.c:1348
> #3  0x009e9e7b in bit_value_binop (code=code@entry=RSHIFT_EXPR,
> type=0x7fffefe82dc8, rhs1=rhs1@entry=0x7fffefd71708, rhs2=)
> at /export/gnu/import/git/sources/gcc-release/gcc/tree-ssa-ccp.c:1549
> #4  0x009eb520 in evaluate_stmt (stmt=stmt@entry=0x7fffefe9a160)
> at /export/gnu/import/git/sources/gcc-release/gcc/tree-ssa-ccp.c:1785
> #5  0x009ec8d2 in visit_assignment (stmt=stmt@entry=0x7fffefe9a160,
> output_p=output_p@entry=0x7fffdba0)
> at /export/gnu/import/git/sources/gcc-release/gcc/tree-ssa-ccp.c:2258
> #6  0x009ec9c2 in ccp_visit_stmt (stmt=0x7fffefe9a160,
> taken_edge_p=0x7fffdba8, output_p=0x7fffdba0)
> at /export/gnu/import/git/sources/gcc-release/gcc/tree-ssa-ccp.c:2336
> #7  0x00a4efcf in simulate_stmt (stmt=stmt@entry=0x7fffefe9a160)
> at /export/gnu/import/git/sources/gcc-release/gcc/tree-ssa-propagate.c:348
> #8  0x00a50f79 in simulate_block (block=)
> at /export/gnu/import/git/sources/gcc-release/gcc/tree-ssa-propagate.c:471
> #9  ssa_propagate (
> visit_stmt=visit_stmt@entry=0x9ec937  edge*, tree*)>, visit_phi=visit_phi@entry=0x9e6aa5
> )
> at /export/gnu/import/git/sources/gcc-release/gcc/tree-ssa-propagate.c:888
> #10 0x009e6295 in do_ssa_ccp ()
> at /export/gnu/import/git/sources/gcc-release/gcc/tree-ssa-ccp.c:2382
> #11 (anonymous namespace)::pass_ccp::execute (this=)
> at /export/gnu/import/git/sources/gcc-release/gcc/tree-ssa-ccp.c:2415
> #12 0x0089ca0c in execute_one_pass (pass=pass@entry=0x19b4bf0)
> at /export/gnu/import/git/sources/gcc-release/gcc/passes.c:2330
> #13 0x0089cd62 in execute_pass_list_1 (pass=0x19b4bf0)
> at /export/gnu/import/git/sources/gcc-release/gcc/passes.c:2382
> #14 0x0089cd7f in execute_pass_list_1 (pass=0x19b4a70,
> pass@entry=0x19b48f0)
> at /export/gnu/import/git/sources/gcc-release/gcc/passes.c:2383
> #15 0x0089cd9c in execute_pass_list (fn=0x7fffefe98000, 
> pass=0x19b48f0)
> at /export/gnu/import/git/sources/gcc-release/gcc/passes.c:2393
> #16 0x0089ba57 in do_per_function_toporder (
> callback=callback@entry=0x89cd83  opt_pass*)>,
> data=0x19b48f0) at 
> /export/gnu/import/git/sources/gcc-release/gcc/passes.c:1728
> #17 0x0089d3e3 in execute_ipa_pass_list (pass=0x19b4890)
> at /export/gnu/import/git/sources/gcc-release/gcc/passes.c:2736
> #18 0x0066f3ac in ipa_passes ()
> at /export/gnu/import/git/sources/gcc-release/gcc/cgraphunit.c:2172
> #19 symbol_table::compile (this=this@entry=0x7fffefd6b000)
> at /export/gnu/import/git/sources/gcc-release/gcc/cgraphunit.c:2313
> #20 0x00670be5 in symbol_table::finalize_compilation_unit (
> this=0x7fffefd6b000)
> at /export/gnu/imp

Re: [PATCH][ARM] Fix PR target/69245 Rewrite arm_set_current_function

2016-01-26 Thread Kyrill Tkachov

Hi Christian,

On 26/01/16 15:29, Christian Bruel wrote:



On 01/25/2016 05:37 PM, Kyrill Tkachov wrote:


So this is ok for trunk with the testcase changed as discussed above and using
-O2 optimisation level and with a couple comment fixes below.

Hi Kyrill,

I realized afterwards that my implementation had a flaw in the handling of
#pragma GCC reset. When both the old and new TREE_TARGET_OPTION are NULL, we
didn't save or restore the global flags, to save compute time. The problem is
that with #pragma GCC reset we also fall into this situation, and exit without
calling target_reinit :-(

Handling this situation doesn't complicate the code much, because I factored
the calls to target_reinit + restore_target_globals into a new function
(save_restore_target_globals). This function is called from the pragma context
when arm_reset_previous_fndecl resets the state to the default, and from
arm_set_current_function when setting a new target. This is only done when
there is a change of the target flags, so there is no extra computing cost.


Same testing as with previous patch:
arm-qemu/
arm-qemu//-mfpu=neon-fp-armv8
arm-qemu//-mfpu=neon

Still OK ?



+/* Restore the TREE_TARGET_GLOBALS from previous state, or save it.  */
+static void
+save_restore_target_globals (tree new_tree)
+{
+  /* If we have a previous state, use it.  */
+  if (TREE_TARGET_GLOBALS (new_tree))
+restore_target_globals (TREE_TARGET_GLOBALS (new_tree));
+  else if (new_tree == target_option_default_node)
+restore_target_globals (&default_target_globals);
+  else
+{
+  /* Call target_reinit and save the state for TARGET_GLOBALS.  */
+  TREE_TARGET_GLOBALS (new_tree) = save_target_globals_default_opts ();
+}
+
+  arm_option_params_internal ();
+}

Space before the function comment and signature. Also, you need to document
what the NEW_TREE parameter is.

 /* Invalidate arm_previous_fndecl.  */
 void
-arm_reset_previous_fndecl (void)
+arm_reset_previous_fndecl (tree new_tree)
 {
+  if (new_tree)
+save_restore_target_globals (new_tree);
+
   arm_previous_fndecl = NULL_TREE;
 }

I'm a bit wary of complicating this function. Suddenly it doesn't just reset
the previous fndecl but also restores globals and can save stuff into its
argument. It's suddenly not clear what its purpose is.
I think it would be clearer if you just added save_restore_target_globals to 
arm_protos.h and called
it explicitly from arm_pragma_target_parse when appropriate.

+
+  /* If nothing to do return. #pragma GCC reset or #pragma GCC pop to
+ the default have been handled by save_restore_target_globals from
+ arm_pragma_target_parse.  */

Two spaces between fullstop and "#pragma GCC".

Thanks,
Kyrill



Re: Patch RFA: Add option -fcollectible-pointers, use it in ivopts

2016-01-26 Thread David Malcolm
On Tue, 2016-01-26 at 05:35 -0800, Ian Lance Taylor wrote:
[...]

> Index: common.opt
> ===
> --- common.opt  (revision 232580)
> +++ common.opt  (working copy)
> @@ -1380,6 +1380,10 @@
>  Enable hoisting adjacent loads to encourage generating conditional move
>  instructions.
>  
> +fkeep-gc-roots-live
> +Common Report Var(flag_keep_gc_roots_live) Optimization
> +Always keep a pointer to a live memory block
> +
>  floop-parallelize-all
>  Common Report Var(flag_loop_parallelize_all) Optimization
>  Mark all loops as parallel.
> Index: doc/invoke.texi
> ===
> --- doc/invoke.texi (revision 232580)
> +++ doc/invoke.texi (working copy)
> @@ -359,7 +359,7 @@
>  -fno-ira-share-spill-slots @gol
>  -fisolate-erroneous-paths-dereference -fisolate-erroneous-paths-attribute 
> @gol
>  -fivopts -fkeep-inline-functions -fkeep-static-functions @gol
> --fkeep-static-consts -flive-range-shrinkage @gol
> +-fkeep-static-consts -fkeep-gc-roots-live -flive-range-shrinkage @gol
>  -floop-block -floop-interchange -floop-strip-mine @gol
>  -floop-unroll-and-jam -floop-nest-optimize @gol
>  -floop-parallelize-all -flra-remat -flto -flto-compression-level @gol
> @@ -6621,6 +6621,17 @@
>  If you use @option{-Wunsafe-loop-optimizations}, the compiler warns you
>  if it finds this kind of loop.
>  
> +@item -fkeep-gc-roots-live
> +@opindex fkeep-gc-roots-live
> +This option tells the compiler that a garbage collector will be used,
> +and that therefore the compiled code must retain a live pointer into
> +all memory blocks.  The compiler is permitted to construct a pointer
> +that is outside the bounds of a memory block, but it must ensure that
> +given a pointer into memory, some pointer into that memory remains
> +live in the compiled code whenever it is live in the source code.
> +This option is disabled by default for most languages, enabled by
> +default for languages that use garbage collection.

Is the patch missing some logic to make the option be enabled by default
for gc-using languages?  (presumably go, and maybe java?)

[...snip...]



[Patch AArch64] Restrict 16-bit sqrdml{sa}h instructions to FP_LO_REGS

2016-01-26 Thread James Greenhalgh

Hi,

In their forms using 16-bit lanes, the sqrdmlah and sqrdmlsh instructions
available when compiling with -march=armv8.1-a are only usable with
a register number in the range 0 to 15 for operand 3, as gas will point
out:

  Error: register number out of range 0 to 15 at
operand 3 -- `sqrdmlsh v2.4h,v4.4h,v23.h[5]'

This patch teaches GCC to avoid registers outside of this range when
appropriate, in the same fashion as we do for other instructions with
this limitation.

Tested on an internal testsuite targeting Neon intrinsics.

OK?

Thanks,
James

---
2016-01-25  James Greenhalgh  

* config/aarch64/aarch64-simd.md
(aarch64_sqrdmlh_lane): Fix register
constraints for operand 3.
(aarch64_sqrdmlh_laneq): Likewise.

diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index e1f5682..0b46e78 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -3240,7 +3240,7 @@
 	  [(match_operand:VDQHS 1 "register_operand" "0")
 	   (match_operand:VDQHS 2 "register_operand" "w")
 	   (vec_select:
-	 (match_operand: 3 "register_operand" "w")
+	 (match_operand: 3 "register_operand" "")
 	 (parallel [(match_operand:SI 4 "immediate_operand" "i")]))]
 	  SQRDMLH_AS))]
"TARGET_SIMD_RDMA"
@@ -3258,7 +3258,7 @@
 	  [(match_operand:SD_HSI 1 "register_operand" "0")
 	   (match_operand:SD_HSI 2 "register_operand" "w")
 	   (vec_select:
-	 (match_operand: 3 "register_operand" "w")
+	 (match_operand: 3 "register_operand" "")
 	 (parallel [(match_operand:SI 4 "immediate_operand" "i")]))]
 	  SQRDMLH_AS))]
"TARGET_SIMD_RDMA"
@@ -3278,7 +3278,7 @@
 	  [(match_operand:VDQHS 1 "register_operand" "0")
 	   (match_operand:VDQHS 2 "register_operand" "w")
 	   (vec_select:
-	 (match_operand: 3 "register_operand" "w")
+	 (match_operand: 3 "register_operand" "")
 	 (parallel [(match_operand:SI 4 "immediate_operand" "i")]))]
 	  SQRDMLH_AS))]
"TARGET_SIMD_RDMA"
@@ -3296,7 +3296,7 @@
 	  [(match_operand:SD_HSI 1 "register_operand" "0")
 	   (match_operand:SD_HSI 2 "register_operand" "w")
 	   (vec_select:
-	 (match_operand: 3 "register_operand" "w")
+	 (match_operand: 3 "register_operand" "")
 	 (parallel [(match_operand:SI 4 "immediate_operand" "i")]))]
 	  SQRDMLH_AS))]
"TARGET_SIMD_RDMA"


Re: Wonly-top-basic-asm

2016-01-26 Thread Segher Boessenkool
On Tue, Jan 26, 2016 at 01:11:36PM +0100, Bernd Schmidt wrote:
> On 01/26/2016 01:29 AM, Segher Boessenkool wrote:
> 
> >In my opinion we should not warn for any asm that means the same both
> >as basic and as extended asm.  The problem then becomes, what *is* the
> >meaning of a basic asm, what does it clobber.
> 
> I think this may be too hard to figure out in general without parsing 
> the asm string, which we don't really want to do.

That depends on the semantics of basic asm.  With the currently implemented
semantics, it is trivial.


Segher


Re: [PATCH] PR c++/69399: Add HAVE_WORKING_CXX_BUILTIN_CONSTANT_P

2016-01-26 Thread Jakub Jelinek
On Tue, Jan 26, 2016 at 04:54:43PM +0100, Richard Biener wrote:
> > Somehow PR 65656 miscompiled:
> >
> >   if (STATIC_CONSTANT_P (xi.precision > HOST_BITS_PER_WIDE_INT)
> >   ? xi.len == 1 && xi.val[0] >= 0
> >   : xi.precision <= HOST_BITS_PER_WIDE_INT)
> >
> > which turned the expression into true and hit
> >
> >   val[0] = xi.to_uhwi () >> shift;
> >   result.set_len (1);
> 
> I think we need a better analysis as we use __builtin_constant_p
> elsewhere as well.

Yeah, it would be nice to understand the somehow and what exactly is going
on.
So, what is xi.precision, is it variable or constant, what value does it
have, has __builtin_constant_p returned 1 when it was supposed to return 0?
But then there would be still the precision check.
What is xi.len and xi.val[0]?
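
To make the concern concrete, here is a minimal sketch of the usual
contract for a __builtin_constant_p shortcut: the guarded branch may only
be a faster way of computing what the general branch computes, so the
result must not change with the builtin's answer.  The macro definition is
assumed to match the one in GCC's system.h, and toy_int / toy_lrshift are
hypothetical names invented for this illustration; this is not the
wide-int.h code.

```cpp
#include <cassert>
#include <cstdint>

// Assumed definition, modelled on GCC's system.h; illustrative only.
#define STATIC_CONSTANT_P(X) (__builtin_constant_p (X) && (X))

// Hypothetical stand-in for a wide integer: PRECISION significant bits
// stored in a single 64-bit word.
struct toy_int
{
  unsigned precision;
  std::uint64_t val;
};

// Logical right shift.  Both branches must agree on every input, so the
// outcome cannot depend on what __builtin_constant_p returns.
inline std::uint64_t
toy_lrshift (toy_int x, unsigned shift)
{
  assert (x.precision >= 1 && x.precision <= 64);
  if (shift >= x.precision)
    return 0;

  std::uint64_t mask = ~std::uint64_t (0);
  if (x.precision < 64)
    mask = (std::uint64_t (1) << x.precision) - 1;

  // Shortcut: with a full 64-bit precision the mask is all ones, so
  // masking is a no-op and can be skipped.
  if (STATIC_CONSTANT_P (x.precision == 64))
    return x.val >> shift;

  // General path: drop bits above the precision, then shift.
  return (x.val & mask) >> shift;
}
```

If the two branches ever disagree, the function has a bug of its own,
whatever the compiler does with the builtin.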

Jakub


Re: [C PATCH] Fix -Wunused-function (PR debug/66869)

2016-01-26 Thread Jakub Jelinek
On Tue, Jan 26, 2016 at 04:21:08PM +0100, Richard Biener wrote:
> > --- gcc/c/c-decl.c.jj   2016-01-21 00:41:47.0 +0100
> > +++ gcc/c/c-decl.c  2016-01-25 16:36:31.973504082 +0100
> > @@ -10741,11 +10741,19 @@ c_write_global_declarations_1 (tree glob
> >if (TREE_CODE (decl) == FUNCTION_DECL
> >   && DECL_INITIAL (decl) == 0
> >   && DECL_EXTERNAL (decl)
> > - && !TREE_PUBLIC (decl)
> > - && C_DECL_USED (decl))
> > + && !TREE_PUBLIC (decl))
> > {
> > - pedwarn (input_location, 0, "%q+F used but never defined", decl);
> > - TREE_NO_WARNING (decl) = 1;
> > + if (C_DECL_USED (decl))
> > +   {
> > + pedwarn (input_location, 0, "%q+F used but never defined", 
> > decl);
> > + TREE_NO_WARNING (decl) = 1;
> > +   }
> > + /* For -Wunused-function push the unused statics into cgraph,
> > +so that check_global_declaration emits the warning.  */
> > + else if (warn_unused_function
> > +  && ! DECL_ARTIFICIAL (decl)
> > +  && ! TREE_NO_WARNING (decl))
> > +   cgraph_node::get_create (decl);
> 
> Err, so why not warn here directly?

You mean check if it has a cgraph node (i.e. get instead of get_create) and
if it doesn't, warn?  What I'm worried about in that case is that it might have a
cgraph node created later on for whatever reason and that we'll get double
warning (from here and from cgraphunit.c (check_global_declaration)).
I can try it though.

Jakub


[PATCH] Fix PR c++/69139 (deduction failure with trailing return type)

2016-01-26 Thread Patrick Palka
This patch makes the parser more robust in determining whether an 'auto'
specifier that appears in a parameter declaration corresponds to a
placeholder for a late return type, or corresponds to an implicit
template parameter as for an abbreviated function template.

Bootstrap + regtest in progress on x86_64-pc-linux-gnu, will also test
this change against Boost.  OK to commit if testing succeeds?  What
about for GCC 4.9/5?
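
As a rough model of the new disambiguation, the sketch below scans the
tokens that follow 'auto' until the declarator ends, skips balanced
parenthesised groups, and reports whether a "->" was seen.  It works on
plain strings rather than on the real lexer, and has_trailing_return is a
hypothetical name; treat it as an illustration of the idea, not as the
parser.c code.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// TOKENS holds the tokens following 'auto' in a parameter declaration.
// Returns true if "->" appears before '=', ',', ')' or the end of input,
// ignoring anything inside balanced parentheses.
inline bool
has_trailing_return (const std::vector<std::string> &tokens)
{
  std::size_t i = 0;
  while (i < tokens.size ()
         && tokens[i] != "=" && tokens[i] != "," && tokens[i] != ")")
    {
      if (tokens[i] == "(")
        {
          // Skip to just past the matching close parenthesis.
          int depth = 0;
          do
            {
              if (tokens[i] == "(")
                ++depth;
              else if (tokens[i] == ")")
                --depth;
              ++i;
            }
          while (i < tokens.size () && depth > 0);
          continue;
        }

      if (tokens[i] == "->")
        return true;

      ++i;
    }
  return false;
}
```

For "auto (*f)(int) -> char" the scan skips "(*f)" and "(int)" and then
finds "->"; for "auto x = 1" it stops at "=" and reports no trailing
return type.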

gcc/cp/ChangeLog:

PR c++/69139
* parser.c (cp_parser_simple_type_specifier): Make the check
for disambiguating between an 'auto' placeholder and an implicit
template parameter more robust.

gcc/testsuite/ChangeLog:

PR c++/69139
* g++.dg/cpp0x/auto47.C: New test.
---
 gcc/cp/parser.c | 33 +++--
 gcc/testsuite/g++.dg/cpp0x/auto47.C | 20 
 2 files changed, 43 insertions(+), 10 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/auto47.C

diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index d03b0c9..56c834f 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -16032,20 +16032,33 @@ cp_parser_simple_type_specifier (cp_parser* parser,
  /* The 'auto' might be the placeholder return type for a function decl
 with trailing return type.  */
  bool have_trailing_return_fn_decl = false;
- if (cp_lexer_peek_nth_token (parser->lexer, 2)->type
- == CPP_OPEN_PAREN)
+
+ cp_parser_parse_tentatively (parser);
+ cp_lexer_consume_token (parser->lexer);
+ while (cp_lexer_next_token_is_not (parser->lexer, CPP_EQ)
+&& cp_lexer_next_token_is_not (parser->lexer, CPP_COMMA)
+&& cp_lexer_next_token_is_not (parser->lexer, CPP_CLOSE_PAREN)
+&& cp_lexer_next_token_is_not (parser->lexer, CPP_EOF))
{
- cp_parser_parse_tentatively (parser);
- cp_lexer_consume_token (parser->lexer);
- cp_lexer_consume_token (parser->lexer);
- if (cp_parser_skip_to_closing_parenthesis (parser,
+ if (cp_lexer_next_token_is (parser->lexer, CPP_OPEN_PAREN))
+   {
+ cp_lexer_consume_token (parser->lexer);
+ cp_parser_skip_to_closing_parenthesis (parser,
 /*recovering*/false,
 /*or_comma*/false,
-/*consume_paren*/true))
-   have_trailing_return_fn_decl
- = cp_lexer_next_token_is (parser->lexer, CPP_DEREF);
- cp_parser_abort_tentative_parse (parser);
+/*consume_paren*/true);
+ continue;
+   }
+
+ if (cp_lexer_next_token_is (parser->lexer, CPP_DEREF))
+   {
+ have_trailing_return_fn_decl = true;
+ break;
+   }
+
+ cp_lexer_consume_token (parser->lexer);
}
+ cp_parser_abort_tentative_parse (parser);
 
  if (have_trailing_return_fn_decl)
{
diff --git a/gcc/testsuite/g++.dg/cpp0x/auto47.C b/gcc/testsuite/g++.dg/cpp0x/auto47.C
new file mode 100644
index 000..08adf31
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/auto47.C
@@ -0,0 +1,20 @@
+// PR c++/69139
+// { dg-do compile { target c++11 } }
+
+auto get(int) -> int { return {}; }
+template  int f(auto (*)(int) -> R) { return {}; }
+int i = f(get);
+
+int foo1 (auto (int) -> char);
+
+int foo2 (auto f(int) -> char);
+
+int foo2 (auto (f)(int) -> char);
+
+int foo3 (auto (*f)(int) -> char);
+
+int foo4 (auto (*const **&f)(int) -> char);
+
+int foo5 (auto (*const **&f)(int, int *) -> char);
+
+int foo6 (auto (int) const -> char); // { dg-error "const" }
-- 
2.7.0.134.gf5046bd.dirty



[C++ PATCH] Handle error_mark_node in cp_fold (alt; PR c++/68357)

2016-01-26 Thread Jakub Jelinek
On Tue, Jan 26, 2016 at 02:56:24PM +0100, Jakub Jelinek wrote:
> Another alternative would be to make sure tree folders don't introduce
> error_mark_node (if it wasn't there already), but instead fold the call say
> to build_int_cst (returntype, 0).  The known cases that would need to change
> are at least darwin_build_constant_cfstring and darwin_fold_builtin, but
> maybe others.

Here is the alternative (but it is unclear if other targets don't have
similar issues in their folders).

I have no access to Darwin, so all I've done was test it on the preprocessed
source from the PR.

2016-01-26  Jakub Jelinek  

PR c++/68357
* config/darwin.c (darwin_fold_builtin): For erroneous use
of the __builtin___CFStringMakeConstantString builtin return
constant 0 in the right type rather than error_mark_node.

--- gcc/config/darwin.c.jj  2016-01-04 14:55:54.0 +0100
+++ gcc/config/darwin.c 2016-01-26 17:28:12.489018588 +0100
@@ -3345,19 +3345,17 @@ darwin_fold_builtin (tree fndecl, int n_
   if (fcode == darwin_builtin_cfstring)
 {
   if (!darwin_constant_cfstrings)
+   error ("built-in function %qD requires the" 
+  " %<-mconstant-cfstrings%> flag", fndecl);
+  else if (n_args != 1)
+   error ("built-in function %qD takes one argument only", fndecl);
+  else
{
- error ("built-in function %qD requires the" 
-" %<-mconstant-cfstrings%> flag", fndecl);
- return error_mark_node;
+ tree ret = darwin_build_constant_cfstring (*argp);
+ if (ret != error_mark_node)
+   return ret;
}
-
-  if (n_args != 1)
-   {
- error ("built-in function %qD takes one argument only", fndecl);
- return error_mark_node;
-   }
-
-  return darwin_build_constant_cfstring (*argp);
+  return build_int_cst (TREE_TYPE (TREE_TYPE (fndecl)), 0);
 }
 
   return NULL_TREE;


Jakub


Re: Speedup configure and build with system.h

2016-01-26 Thread Michael Matz
Hi,

On Tue, 26 Jan 2016, Uros Bizjak wrote:

> > Meh.  Can you try the attached patch with a configure test (it 
> > includes the generated files)?  It works for me with 4.3.4, and should 
> > make your build include  always.
> 
> Yes, this patch works for me and allows bootstrap with gcc-4.1.2 to 
> finish.

Thanks for checking.  r232836 now.


Ciao,
Michael.


Re: [PATCH, PR69110] Don't return NULL access_fns in dr_analyze_indices

2016-01-26 Thread Sebastian Pop
Tom de Vries wrote:
> diff --git a/gcc/tree-data-ref.c b/gcc/tree-data-ref.c
> index a40f40d..4c29fc2 100644
> --- a/gcc/tree-data-ref.c
> +++ b/gcc/tree-data-ref.c
> @@ -1510,8 +1510,9 @@ initialize_data_dependence_relation (struct data_reference *a,
>if (operand_equal_p (DR_REF (a), DR_REF (b), 0))
>  {
>   if (loop_nest.exists ()
> -&& !object_address_invariant_in_loop_p (loop_nest[0],
> - DR_BASE_OBJECT (a)))
> +  && (!object_address_invariant_in_loop_p (loop_nest[0],
> +   DR_BASE_OBJECT (a))
> +  || DR_NUM_DIMENSIONS (a) == 0))

Also please fix the indentation of all this if stmt.

>{
>  DDR_ARE_DEPENDENT (res) = chrec_dont_know;
>  return res;
> @@ -1548,8 +1549,9 @@ initialize_data_dependence_relation (struct data_reference *a,
>   analyze it.  TODO -- in fact, it would suffice to record that there may
>   be arbitrary dependences in the loops where the base object varies.  */
>if (loop_nest.exists ()
> -  && !object_address_invariant_in_loop_p (loop_nest[0],
> -   DR_BASE_OBJECT (a)))
> +  && (!object_address_invariant_in_loop_p (loop_nest[0],
> +DR_BASE_OBJECT (a))
> +   || DR_NUM_DIMENSIONS (a) == 0))
>  {
>DDR_ARE_DEPENDENT (res) = chrec_dont_know;
>return res;

Let's check for DR_NUM_DIMENSIONS (a) == 0 independently of loop_nest.exists ().
We check for the loop_nest because we need to access the outer loop loop_nest[0]
to analyze the base object of a.

Otherwise the change looks good to me.

Thanks,
Sebastian


Re: [PATCH][ARM] Fix PR target/69245 Rewrite arm_set_current_function

2016-01-26 Thread Kyrill Tkachov


On 26/01/16 16:56, Christian Bruel wrote:



On 01/26/2016 04:58 PM, Kyrill Tkachov wrote:

Hi Christian,

On 26/01/16 15:29, Christian Bruel wrote:


On 01/25/2016 05:37 PM, Kyrill Tkachov wrote:


So this is ok for trunk with the testcase changed as discussed above and using
-O2 optimisation level and with a couple comment fixes below.

Hi Kyrill,

I realized afterwards that my implementation had a flaw in the handling of
#pragma GCC reset. When both the old and new TREE_TARGET_OPTION are NULL, we
didn't save or restore the global flags, to save compute time. The problem is
that with #pragma GCC reset we also fall into this situation, and exit without
calling target_reinit :-(

Handling this situation doesn't complicate the code much, because I factored
the calls to target_reinit + restore_target_globals into a new function
(save_restore_target_globals). This function is called from the pragma context
when arm_reset_previous_fndecl resets the state to the default, and from
arm_set_current_function when setting a new target. This is only done when
there is a change of the target flags, so there is no extra computing cost.

Same testing as with previous patch:
 arm-qemu/
 arm-qemu//-mfpu=neon-fp-armv8
 arm-qemu//-mfpu=neon

Still OK ?


+/* Restore the TREE_TARGET_GLOBALS from previous state, or save it.  */
+static void
+save_restore_target_globals (tree new_tree)
+{
+  /* If we have a previous state, use it.  */
+  if (TREE_TARGET_GLOBALS (new_tree))
+restore_target_globals (TREE_TARGET_GLOBALS (new_tree));
+  else if (new_tree == target_option_default_node)
+restore_target_globals (&default_target_globals);
+  else
+{
+  /* Call target_reinit and save the state for TARGET_GLOBALS.  */
+  TREE_TARGET_GLOBALS (new_tree) = save_target_globals_default_opts ();
+}
+
+  arm_option_params_internal ();
+}

Space before the function comment and signature.
What do you mean? A new line between the comment and the function signature?
I usually mimic what's done in the other arm.c declarations, which sometimes
have a new line and sometimes do not. It's not clear to me what's mandatory
here, even in the other parts of the compiler.




Yes, I meant new line, sorry. A new line is the rule, though there are some
functions that don't follow it. I guess we can fix those as we encounter them.


  Also, you need to document what the NEW_TREE
parameter is.

   /* Invalidate arm_previous_fndecl.  */
   void
-arm_reset_previous_fndecl (void)
+arm_reset_previous_fndecl (tree new_tree)
   {
+  if (new_tree)
+save_restore_target_globals (new_tree);
+
 arm_previous_fndecl = NULL_TREE;
   }

I'm a bit wary of complicating this function. Suddenly it doesn't just reset
the previous fndecl but also restores globals and can save stuff into its
argument. It's suddenly not clear what its purpose is.
I think it would be clearer if you just added save_restore_target_globals to 
arm_protos.h and called
it explicitly from arm_pragma_target_parse when appropriate.


sure, like this one attached (sanity checked) ?




This is ok if it passes a proper testing round.

Thanks,
Kyrill




+
+  /* If nothing to do return. #pragma GCC reset or #pragma GCC pop to
+ the default have been handled by save_restore_target_globals from
+ arm_pragma_target_parse.  */

Two spaces between fullstop and "#pragma GCC".


thanks,

Christian





[PR69315] enable finish_function to recurse for constexpr functions

2016-01-26 Thread Alexandre Oliva
We don't want finish_function to be called recursively from mark_used.
However, it's desirable and necessary to call itself recursively when
performing delayed folding, because that may have to instantiate and
evaluate constexpr template functions.

So, arrange for finish_function to accept being called recursively
during delayed folding, save and restore the controlling variables,
and process the deferred mark_used calls only when the outermost call
completes.
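
The save/restore scheme can be modelled outside the compiler.  The sketch
below is a simplified assumption-laden model, not the decl.c code: the
globals mimic defer_mark_used_calls, is_folding_function and
deferred_mark_used_calls, mark and finish are hypothetical stand-ins for
mark_used and finish_function, and the constexpr check in the real
assertion is left out.

```cpp
#include <cassert>
#include <vector>

static bool defer_calls = false;     // models defer_mark_used_calls
static bool is_folding = false;      // models is_folding_function
static std::vector<int> deferred;    // models deferred_mark_used_calls
static std::vector<int> processed;   // order in which requests were handled

// Models mark_used: queue the request while a finish is in progress.
void
mark (int id)
{
  if (defer_calls)
    deferred.push_back (id);
  else
    processed.push_back (id);
}

// Models finish_function.  If NESTED is non-negative, the folding step
// finishes that function recursively, as constexpr evaluation might.
void
finish (int id, int nested)
{
  // Recursion is tolerated only while folding.
  assert (!defer_calls || is_folding);
  defer_calls = true;
  bool save_folding = is_folding;
  is_folding = false;

  mark (id);

  is_folding = true;
  if (nested >= 0)
    finish (nested, -1);
  // The nested call must have restored both controlling variables.
  assert (is_folding && defer_calls);
  is_folding = false;

  // Restored to true only when we were called from within folding.
  is_folding = save_folding;
  defer_calls = is_folding;
  if (!defer_calls)
    {
      // Outermost call: drain everything deferred along the way.
      std::vector<int> todo;
      todo.swap (deferred);
      for (int d : todo)
        processed.push_back (d);
    }
}
```

Calling finish (1, 2) defers both requests during the nested call and
handles them, in order, only when the outermost call returns.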

Regstrapped on x86_64-linux-gnu and i686-linux-gnu.  Ok to install?

for  gcc/cp/ChangeLog

PR c++/69315
* decl.c (is_folding_function): New variable.
(finish_function): Test, save and set it.

for  gcc/testsuite/ChangeLog

PR c++/69315
* g++.dg/pr69315.C: New.
---
 gcc/cp/decl.c  |   31 +--
 gcc/testsuite/g++.dg/pr69315.C |   34 ++
 2 files changed, 59 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/pr69315.C

diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c
index f4604b6..65eff2f 100644
--- a/gcc/cp/decl.c
+++ b/gcc/cp/decl.c
@@ -227,10 +227,14 @@ struct GTY((for_user)) named_label_entry {
function, two inside the body of a function in a local class, etc.)  */
 int function_depth;
 
-/* To avoid unwanted recursion, finish_function defers all mark_used calls
-   encountered during its execution until it finishes.  */
+/* To avoid unwanted recursion, finish_function defers all mark_used
+   calls encountered during its execution until it finishes.
+   finish_function refuses to be called recursively, unless the
+   recursion occurs during folding, which often requires instantiating
+   and evaluating template functions.  */
 bool defer_mark_used_calls;
 vec *deferred_mark_used_calls;
+static bool is_folding_function;
 
 /* States indicating how grokdeclarator() should handle declspecs marked
with __attribute__((deprecated)).  An object declared as
@@ -14528,8 +14532,11 @@ finish_function (int flags)
   if (c_dialect_objc ())
 objc_finish_function ();
 
-  gcc_assert (!defer_mark_used_calls);
+  gcc_assert (!defer_mark_used_calls
+ || (is_folding_function && DECL_DECLARED_CONSTEXPR_P (fndecl)));
   defer_mark_used_calls = true;
+  bool save_folding_function = is_folding_function;
+  is_folding_function = false;
 
   record_key_method_defined (fndecl);
 
@@ -14636,7 +14643,14 @@ finish_function (int flags)
 
   /* Perform delayed folding before NRV transformation.  */
   if (!processing_template_decl)
-cp_fold_function (fndecl);
+{
+  is_folding_function = true;
+  cp_fold_function (fndecl);
+  /* Check that our controlling variables were restored to the
+expected state.  */
+  gcc_assert (is_folding_function && defer_mark_used_calls);
+  is_folding_function = false;
+}
 
   /* Set up the named return value optimization, if we can.  Candidate
  variables are selected in check_return_expr.  */
@@ -14780,8 +14794,13 @@ finish_function (int flags)
   /* Clean up.  */
   current_function_decl = NULL_TREE;
 
-  defer_mark_used_calls = false;
-  if (deferred_mark_used_calls)
+  is_folding_function = save_folding_function;
+  /* Iff we were called recursively for a constexpr function,
+ is_folding_function was just restored to TRUE.  If we weren't
+ called recursively, it was restored to FALSE.  That's just how
+ defer_mark_used_calls ought to be set.  */
+  defer_mark_used_calls = is_folding_function;
+  if (!defer_mark_used_calls && deferred_mark_used_calls)
 {
   unsigned int i;
   tree decl;
diff --git a/gcc/testsuite/g++.dg/pr69315.C b/gcc/testsuite/g++.dg/pr69315.C
new file mode 100644
index 000..28975b6
--- /dev/null
+++ b/gcc/testsuite/g++.dg/pr69315.C
@@ -0,0 +1,34 @@
+// { dg-do compile }
+// { dg-options "-std=c++11" }
+
+// Template instantiation and evaluation for folding within
+// finish_function may call finish_function recursively.
+// Make sure we don't reject or delay that sort of recursion.
+
+template  struct Iter;
+
+struct Arg {
+  Iter begin();
+  Iter end();
+};
+
+template  struct Iter {
+  int operator*();
+  Iter operator++();
+  template  friend constexpr bool operator==(Iter, Iter);
+  template  friend constexpr bool operator!=(Iter, Iter);
+};
+
+void func(Arg a) {
+  for (auto ch : a) {
+a.begin() == a.end();
+  }
+}
+
+template  constexpr bool operator==(Iter, Iter) {
+  return true;
+}
+
+template  constexpr bool operator!=(Iter a, Iter b) {
+  return a == b;
+}

-- 
Alexandre Oliva, freedom fighter    http://FSFLA.org/~lxoliva/
You must be the change you wish to see in the world. -- Gandhi
Be Free! -- http://FSFLA.org/   FSF Latin America board member
Free Software Evangelist|Red Hat Brasil GNU Toolchain Engineer


Re: [PATCH, 69217]: [6 Regression] ICE at var-tracking.c:5038 Segmentation fault

2016-01-26 Thread Alexandre Oliva
On Jan 23, 2016, Iain Buclaw  wrote:

>   PR rtl-optimization/69217
>   * var-tracking.c (tracked_record_parameter_p): Don't segfault if there
>   are no TYPE_FIELDS set for the record type.

This looks good to me, thanks.



[PR 69355] Correct hole detection when total_scalarization fails

2016-01-26 Thread Martin Jambor
Hi,

PR 69355 has revealed that when SRA attempts total scalarization of an
aggregate but this fails because the user type-casts a scalar field
and stores a smaller aggregate into it (and the scalar field is not
written to, whether directly or as a part of an aggregate store), the
pass can lose track of unscalarized data there.

I think that this can happen only when violating strict aliasing rules
but with -fno-strict-aliasing it should work.

Fixed thusly with the patch below (the condition is there to avoid
detecting padding after aggregate-fields in totally-scalarized
aggregates as unscalarized data).  Bootstrapped and tested on
x86_64-linux.  OK for trunk?  And the gcc-5 branch?

Thanks,

Martin


2016-01-26  Martin Jambor  

PR tree-optimization/69355
* tree-sra.c (analyze_access_subtree): Correct hole detection when
total_scalarization fails.

testsuite/
* gcc.dg/tree-ssa/pr69355.c: New test.

---
 gcc/testsuite/gcc.dg/tree-ssa/pr69355.c | 44 +
 gcc/tree-sra.c  |  2 +-
 2 files changed, 45 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr69355.c

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr69355.c b/gcc/testsuite/gcc.dg/tree-ssa/pr69355.c
new file mode 100644
index 000..f515c21
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr69355.c
@@ -0,0 +1,44 @@
+/* { dg-do run } */
+/* { dg-options "-O -fno-strict-aliasing" } */
+
+struct S
+{
+  void *a;
+  long double b;
+};
+
+struct Z
+{
+  long long l;
+  short s;
+} __attribute__((packed));
+
+struct S __attribute__((noclone, noinline))
+foo (void *v, struct Z *z)
+{
+  struct S t;
+  t.a = v;
+  *(struct Z *) &t.b = *z;
+  return t;
+}
+
+struct Z gz;
+
+int
+main (int argc, char **argv)
+{
+  struct S s;
+
+  if (sizeof (long double) < sizeof (struct Z))
+return 0;
+
+  gz.l = 0xbeef;
+  gz.s = 0xab;
+
+  s = foo ((void *) 0, &gz);
+
+  if ((((struct Z *) &s.b)->l != gz.l)
+  || (((struct Z *) &s.b)->s != gz.s))
+__builtin_abort ();
+  return 0;
+}
diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
index 740542f..b0e737a 100644
--- a/gcc/tree-sra.c
+++ b/gcc/tree-sra.c
@@ -2421,7 +2421,7 @@ analyze_access_subtree (struct access *root, struct access *parent,
 
   if (covered_to < limit)
hole = true;
-  if (scalar)
+  if (scalar || !allow_replacements)
root->grp_total_scalarization = 0;
 }
 
-- 
2.7.0



Re: Patch RFA: Add option -fcollectible-pointers, use it in ivopts

2016-01-26 Thread Ian Lance Taylor
On Tue, Jan 26, 2016 at 8:03 AM, David Malcolm  wrote:
>
> Is the patch missing some logic to make the option be enabled by default
> for gc-using languages?  (presumably go, and maybe java?)

I am intentionally leaving that to a separate patch, yes.  I think
this option is useful by itself for C/C++ programs that intend to use
something like the Boehm collector.  If the option is not useful by
itself, then a different approach may be appropriate.

Ian


[trans-mem, committed] Fix 60908

2016-01-26 Thread Richard Henderson
Just a silly think-o in building the tm region tree,
which resulted in the one region being found twice.
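
For illustration only, a minimal sketch of this kind of think-o (not the trans-mem.c code itself): a worklist walk that queues the start block without marking it visited will queue it a second time as soon as a back edge reaches it. The Graph type and count_visits function are hypothetical names invented for this sketch.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a CFG: succs[i] lists the successors of block i.
using Graph = std::vector<std::vector<std::size_t>>;

// Count how many times each block is popped from the worklist.
// If mark_start is false, the start block is queued without being marked
// visited, which mimics the mistake described above.
std::vector<int>
count_visits (const Graph &succs, std::size_t start, bool mark_start)
{
  std::vector<int> pops (succs.size (), 0);
  std::vector<bool> visited (succs.size (), false);
  std::vector<std::size_t> queue;

  queue.push_back (start);
  if (mark_start)
    visited[start] = true;

  while (!queue.empty ())
    {
      std::size_t bb = queue.back ();
      queue.pop_back ();
      ++pops[bb];
      for (std::size_t s : succs[bb])
        if (!visited[s])
          {
            visited[s] = true;
            queue.push_back (s);
          }
    }
  return pops;
}
```

With a two-block loop (0 -> 1 -> 0), block 0 is popped twice when mark_start is false and once when it is true, which is the shape of the while (1) loop in the new test.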


r~
PR middle-end/60908
* trans-mem.c (tm_region_init): Mark entry block as visited.

testsuite/
* gcc.dg/tm/pr60908.c: New test.



diff --git a/gcc/testsuite/gcc.dg/tm/pr60908.c 
b/gcc/testsuite/gcc.dg/tm/pr60908.c
new file mode 100644
index 000..773438d
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tm/pr60908.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-fgnu-tm" } */
+
+int t, v;
+
+int
+foo (void)
+{
+  while (1)
+{
+  __transaction_atomic { v++; }
+  if (t)
+return 0;
+}
+}
diff --git a/gcc/trans-mem.c b/gcc/trans-mem.c
index b204760..500071f 100644
--- a/gcc/trans-mem.c
+++ b/gcc/trans-mem.c
@@ -2039,16 +2039,17 @@ tm_region_init (struct tm_region *region)
   struct tm_region *old_region;
   auto_vec bb_regions;
 
-  all_tm_regions = region;
-  bb = single_succ (ENTRY_BLOCK_PTR_FOR_FN (cfun));
-
   /* We could store this information in bb->aux, but we may get called
  through get_all_tm_blocks() from another pass that may be already
  using bb->aux.  */
   bb_regions.safe_grow_cleared (last_basic_block_for_fn (cfun));
 
+  all_tm_regions = region;
+  bb = single_succ (ENTRY_BLOCK_PTR_FOR_FN (cfun));
   queue.safe_push (bb);
+  bitmap_set_bit (visited_blocks, bb->index);
   bb_regions[bb->index] = region;
+
   do
 {
   bb = queue.pop ();


Re: [Fortran, gcc-5, patch, pr69268, v1] [5 Regression] Sourced allocation calls function twice

2016-01-26 Thread Paul Richard Thomas
Dear Andre,

The patch looks fine to me. OK for 5-branch.

Thanks for the patch.

Paul

On 26 January 2016 at 13:28, Andre Vehreschild  wrote:
> Hi all,
>
> please find attached a patch to solve the issue of evaluating a source=
> expression of an allocate() twice in gcc-5. The patch is a combination
> and partial back port of several prs of the mainline (namely, but not
> the complete list: pr44672, pr65548).
>
> The patch needed the counts of builtin_mallocs/frees in
> allocatable_scalar_13 to be adapted. There are now fewer calls to the
> memory management routines. Valgrind does not report any memory issues
> in the modified code, but that does not mean there aren't any. I am
> happy to hear about any issue this patch causes (I am still having trouble
> getting the sanitizer to work).
>
> Bootstrapped and regtested on x86_64-linux-gnu/F23.
>
> Ok, for gcc-5-branch?
>
> Regards,
> Andre
> --
> Andre Vehreschild * Email: vehre ad gmx dot de



-- 
The difference between genius and stupidity is; genius has its limits.

Albert Einstein


Re: [PATCH][AArch64] Add TARGET_IRA_CHANGE_PSEUDO_ALLOCNO_CLASS

2016-01-26 Thread Wilco Dijkstra
ping (note the regressions discussed below are addressed by 
https://gcc.gnu.org/ml/gcc-patches/2016-01/msg01761.html)


From: Wilco Dijkstra
Sent: 17 December 2015 13:37
To: James Greenhalgh
Cc: gcc-patches@gcc.gnu.org; nd
Subject: RE: [PATCH][AArch64] Add TARGET_IRA_CHANGE_PSEUDO_ALLOCNO_CLASS

James Greenhalgh wrote:
> On Wed, Dec 16, 2015 at 01:05:21PM +, Wilco Dijkstra wrote:
> > James Greenhalgh wrote:
> > > On Tue, Dec 15, 2015 at 10:54:49AM +, Wilco Dijkstra wrote:
> > > > ping
> > > >
> > > > > -Original Message-
> > > > > From: Wilco Dijkstra [mailto:wilco.dijks...@arm.com]
> > > > > Sent: 06 November 2015 20:06
> > > > > To: 'gcc-patches@gcc.gnu.org'
> > > > > Subject: [PATCH][AArch64] Add TARGET_IRA_CHANGE_PSEUDO_ALLOCNO_CLASS
> > > > >
> > > > > This patch adds support for the TARGET_IRA_CHANGE_PSEUDO_ALLOCNO_CLASS
> > > > > hook. When the cost of GENERAL_REGS and FP_REGS is identical, the 
> > > > > register
> > > > > allocator always uses ALL_REGS even when it has a much higher cost. 
> > > > > The
> > > > > hook changes the class to either FP_REGS or GENERAL_REGS depending on 
> > > > > the
> > > > > mode of the register. This results in better register allocation 
> > > > > overall,
> > > > > fewer spills and reduced codesize - particularly in SPEC2006 gamess.
> > > > >
> > > > > GCC regression passes with several minor fixes.
> > > > >
> > > > > OK for commit?
> > > > >
> > > > > ChangeLog:
> > > > > 2015-11-06  Wilco Dijkstra  
> > > > >
> > > > >   * gcc/config/aarch64/aarch64.c
> > > > >   (TARGET_IRA_CHANGE_PSEUDO_ALLOCNO_CLASS): New define.
> > > > >   (aarch64_ira_change_pseudo_allocno_class): New function.
> > > > >   * gcc/testsuite/gcc.target/aarch64/cvtf_1.c: Build with -O2.
> > > > >   * gcc/testsuite/gcc.target/aarch64/scalar_shift_1.c
> > > > >   (test_corners_sisd_di): Improve force to SIMD register.
> > > > >   (test_corners_sisd_si): Likewise.
> > > > >   * gcc/testsuite/gcc.target/aarch64/vdup_lane_2.c: Build with 
> > > > > -O2.
> > > > >   * gcc/testsuite/gcc.target/aarch64/vect-ld1r-compile-fp.c:
> > > > >   Remove scan-assembler check for ldr.
> > >
> > > Drop the gcc/ from the ChangeLog.
> > >
> > > > > --
> > > > >  gcc/config/aarch64/aarch64.c   | 22 
> > > > > ++
> > > > >  gcc/testsuite/gcc.target/aarch64/cvtf_1.c  |  2 +-
> > > > >  gcc/testsuite/gcc.target/aarch64/scalar_shift_1.c  |  4 ++--
> > > > >  gcc/testsuite/gcc.target/aarch64/vdup_lane_2.c |  2 +-
> > > > >  .../gcc.target/aarch64/vect-ld1r-compile-fp.c  |  1 -
> > >
> > > These testsuite changes concern me a bit, and you don't mention them 
> > > beyond
> > > saying they are minor fixes...
> >
> > Well any changes to register allocator preferencing would cause fallout in
> > tests that are assuming which register is allocated, especially if they use
> > nasty inline assembler hacks to do so...
>
> Sure, but the testcases here each operate on data that should live in
> FP_REGS given the initial conditions that the nasty hacks try to mimic -
> that's what makes the regressions notable.
>
> >
> > > > >  #define FCVTDEF(ftype,itype) \
> > > > >  void \
> > > > > diff --git a/gcc/testsuite/gcc.target/aarch64/scalar_shift_1.c 
> > > > > b/gcc/testsuite/gcc.target/aarch64/scalar_shift_1.c
> > > > > index 363f554..8465c89 100644
> > > > > --- a/gcc/testsuite/gcc.target/aarch64/scalar_shift_1.c
> > > > > +++ b/gcc/testsuite/gcc.target/aarch64/scalar_shift_1.c
> > > > > @@ -186,9 +186,9 @@ test_corners_sisd_di (Int64x1 b)
> > > > >  {
> > > > >force_simd_di (b);
> > > > >b = b >> 63;
> > > > > +  force_simd_di (b);
> > > > >b = b >> 0;
> > > > >b += b >> 65; /* { dg-warning "right shift count >= width of type" 
> > > > > } */
> > > > > -  force_simd_di (b);
> > >
> > > This one I don't understand, but seems to say that we've decided to move
> > > b out of FP_REGS after getting it in there for b = b << 63; ? So this is
> > > another register allocator regression?
> >
> > No, basically the register allocator is now making better decisions as to
> > where to allocate integer variables. It will only allocate them to FP
> > registers if they are primarily used by other FP operations. The
> > force_simd_di inline assembler tries to mimic FP uses, and if there are
> > enough of them at the right places then everything works as expected.  If
> > however you do 3 consecutive integer operations then the allocator will now
> > correctly prefer to allocate them to the integer registers (while previously
> > it wouldn't, which is inefficient).
>
> I'm not sure I understand this argument in the abstract (though I believe
> it for some of the supported cores for the AArch64 target). At an abstract
> level, given a set of operations which can execute in either FP_REGS or
> GENERAL_REGS and initial and post conditions that allocate all input and
> output register

Re: [PATCH][ARM] Enable fusion of AES instructions

2016-01-26 Thread Wilco Dijkstra

ping

> -Original Message-
> From: Wilco Dijkstra [mailto:wilco.dijks...@arm.com]
> Sent: 19 November 2015 18:12
> To: gcc-patches@gcc.gnu.org
> Subject: [PATCH][ARM] Enable fusion of AES instructions
>
> Enable instruction fusion of AES instructions on ARM for Cortex-A53 and 
> Cortex-A57.
>
> OK for commit?
>
> ChangeLog:
> 2015-11-20  Wilco Dijkstra  
>
>   * gcc/config/arm/arm.c (arm_cortex_a53_tune): Add AES fusion.
>   (arm_cortex_a57_tune): Likewise.
>   (aarch_macro_fusion_pair_p): Add support for AES fusion.
>   * gcc/config/arm/arm-protos.h (fuse_ops): Add FUSE_AES_AESMC.
>
> ---
>  gcc/config/arm/arm-protos.h | 5 +++--
>  gcc/config/arm/arm.c| 9 +++--
>  2 files changed, 10 insertions(+), 4 deletions(-)
>
> diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
> index f9b1276..4801bb8 100644
> --- a/gcc/config/arm/arm-protos.h
> +++ b/gcc/config/arm/arm-protos.h
> @@ -302,8 +302,9 @@ struct tune_params
>enum fuse_ops
>{
>  FUSE_NOTHING   = 0,
> -FUSE_MOVW_MOVT = 1 << 0
> -  } fusible_ops: 1;
> +FUSE_MOVW_MOVT = 1 << 0,
> +FUSE_AES_AESMC = 1 << 1
> +  } fusible_ops: 2;
>/* Depth of scheduling queue to check for L2 autoprefetcher.  */
>enum {SCHED_AUTOPREF_OFF, SCHED_AUTOPREF_RANK, SCHED_AUTOPREF_FULL}
>  sched_autopref: 2;
> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
> index 02f5dc3..7077199 100644
> --- a/gcc/config/arm/arm.c
> +++ b/gcc/config/arm/arm.c
> @@ -1969,7 +1969,7 @@ const struct tune_params arm_cortex_a53_tune =
>tune_params::DISPARAGE_FLAGS_NEITHER,
>tune_params::PREF_NEON_64_FALSE,
>tune_params::PREF_NEON_STRINGOPS_TRUE,
> -  FUSE_OPS (tune_params::FUSE_MOVW_MOVT),
> +  FUSE_OPS (tune_params::FUSE_MOVW_MOVT | tune_params::FUSE_AES_AESMC),
>tune_params::SCHED_AUTOPREF_OFF
>  };
>
> @@ -1992,7 +1992,7 @@ const struct tune_params arm_cortex_a57_tune =
>tune_params::DISPARAGE_FLAGS_ALL,
>tune_params::PREF_NEON_64_FALSE,
>tune_params::PREF_NEON_STRINGOPS_TRUE,
> -  FUSE_OPS (tune_params::FUSE_MOVW_MOVT),
> +  FUSE_OPS (tune_params::FUSE_MOVW_MOVT | tune_params::FUSE_AES_AESMC),
>tune_params::SCHED_AUTOPREF_FULL
>  };
>
> @@ -29668,6 +29668,11 @@ aarch_macro_fusion_pair_p (rtx_insn* prev, rtx_insn* 
> curr)
>  && REGNO (SET_DEST (curr_set)) == REGNO (SET_DEST (prev_set)))
>return true;
>  }
> +
> +  if (current_tune->fusible_ops & tune_params::FUSE_AES_AESMC
> +  && aarch_crypto_can_dual_issue (prev, curr))
> +return true;
> +
>return false;
>  }
>
> --
> 1.9.1
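
A side note on the fusible_ops: 1 -> fusible_ops: 2 change in the quoted patch: the new flag is 1 << 1, so it does not fit in a one-bit field. The sketch below shows the effect with plain unsigned bit-fields. The flag values mirror the patch, but the struct and function names are hypothetical, and the real field is an enum bit-field rather than unsigned.

```cpp
// Flag values as in the quoted patch.
constexpr unsigned FUSE_NOTHING = 0;
constexpr unsigned FUSE_MOVW_MOVT = 1u << 0;
constexpr unsigned FUSE_AES_AESMC = 1u << 1;

// Hypothetical stand-ins for the old and new field widths.
struct narrow_params { unsigned fusible_ops : 1; };
struct wide_params { unsigned fusible_ops : 2; };

// Store the flags in a 1-bit field and report whether the AES bit survived.
bool
aes_bit_survives_narrow (unsigned flags)
{
  narrow_params p;
  p.fusible_ops = flags;   // only the low bit is kept
  return (p.fusible_ops & FUSE_AES_AESMC) != 0;
}

// Same check with the widened 2-bit field.
bool
aes_bit_survives_wide (unsigned flags)
{
  wide_params p;
  p.fusible_ops = flags;
  return (p.fusible_ops & FUSE_AES_AESMC) != 0;
}
```

Passing FUSE_MOVW_MOVT | FUSE_AES_AESMC, the narrow variant loses the AES bit while the wide one keeps it.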



[PATCH] Fix up wi::lrshift (PR c++/69399)

2016-01-26 Thread Jakub Jelinek
Hi!

On Tue, Jan 26, 2016 at 04:54:43PM +0100, Richard Biener wrote:
> > Somehow PR 65656 miscompiled:
> >
> >   if (STATIC_CONSTANT_P (xi.precision > HOST_BITS_PER_WIDE_INT)
> >   ? xi.len == 1 && xi.val[0] >= 0
> >   : xi.precision <= HOST_BITS_PER_WIDE_INT)
> >
> > which turned the expression into true and hit
> >
> >   val[0] = xi.to_uhwi () >> shift;
> >   result.set_len (1);
> 
> I think we need a better analysis as we use __builtin_constant_p
> elsewhere as well.

So, I had a look at this bug and it seems clearly a bug on the wide-int.h
side.  wi::lrshift right now doesn't do what the comment says (which says
that for the larger precision fixed types it only cares about constant
shift).
Compared to wi::lshift, for the
STATIC_CONSTANT_P (xi.precision > HOST_BITS_PER_WIDE_INT)
case lrshift doesn't bother to check the value of shift.
While for xi.precision <= HOST_BITS_PER_WIDE_INT, at least conforming code
should not perform out of bounds shifts and thus there is no need to check
the value of shift, for xi.precision > HOST_BITS_PER_WIDE_INT it is
completely valid to have large shift (in between HOST_BITS_PER_WIDE_INT
inclusive and xi.precision exclusive), and then we just trigger undefined
behavior for that case.

The question is, shall we do what wi::lshift does and have the fast path
only for the constant shifts, as the untested patch below does, or shall we
check shift dynamically (thus use
shift < HOST_BITS_PER_WIDE_INT
instead of
STATIC_CONSTANT_P (shift < HOST_BITS_PER_WIDE_INT)
in the patch), or shall we have another case for such shifts and set
val[0] = 0; then?
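
To make the third option concrete, here is a rough sketch of a single-block logical right shift that checks the shift amount dynamically and returns zero when everything is shifted out. It is a simplified model, not wide-int.h code: HOST_BITS stands in for HOST_BITS_PER_WIDE_INT and lrshift_one_block is a made-up name.

```cpp
#include <cstdint>

// Hypothetical model: HOST_BITS stands in for HOST_BITS_PER_WIDE_INT.
constexpr unsigned HOST_BITS = 64;

// Logical right shift of a non-negative value that occupies a single
// 64-bit block but may have a wider nominal precision.
std::uint64_t
lrshift_one_block (std::uint64_t low, unsigned shift)
{
  // Shifting a 64-bit value by 64 or more is undefined behavior in C++,
  // so check the shift amount first; all bits are shifted out anyway.
  if (shift >= HOST_BITS)
    return 0;
  return low >> shift;
}
```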

The __builtin_constant_p change affects whether we trigger this bug
only for fixed large precision instantiations (with trunk
__builtin_constant_p) or also by mistake for variable large precision
instantiations (with gcc 5 __builtin_constant_p), but the fixed large
precision instantiations are broken in any case.

2016-01-26  Jakub Jelinek  

PR tree-optimization/69399
* wide-int.h (wi::lrshift): For larger precisions, only
use fast path if shift is known to be < HOST_BITS_PER_WIDE_INT.

--- gcc/wide-int.h.jj   2016-01-04 14:55:50.0 +0100
+++ gcc/wide-int.h  2016-01-26 19:05:20.715269366 +0100
@@ -2909,7 +2909,9 @@ wi::lrshift (const T1 &x, const T2 &y)
 For variable-precision integers like wide_int, handle HWI
 and sub-HWI integers inline.  */
   if (STATIC_CONSTANT_P (xi.precision > HOST_BITS_PER_WIDE_INT)
- ? xi.len == 1 && xi.val[0] >= 0
+ ? (STATIC_CONSTANT_P (shift < HOST_BITS_PER_WIDE_INT)
+&& xi.len == 1
+&& xi.val[0] >= 0)
  : xi.precision <= HOST_BITS_PER_WIDE_INT)
{
  val[0] = xi.to_uhwi () >> shift;


Jakub


Re: [PATCH] Fix up wi::lrshift (PR c++/69399)

2016-01-26 Thread Mike Stump
On Jan 26, 2016, at 10:21 AM, Jakub Jelinek  wrote
> The question is, shall we do what wi::lshift does and have the fast path
> only for the constant shifts, as the untested patch below does, or shall we
> check shift dynamically (thus use
> shift < HOST_BITS_PER_WIDE_INT
> instead of
> STATIC_CONSTANT_P (shift < HOST_BITS_PER_WIDE_INT)
> in the patch),

Hum…  I think I prefer the dynamic check.  The reasoning is that on the fast
path we can tolerate a conditional branch, since that condition will usually
be true.  If the compiler has trouble knowing the usual truth value of the
expression, we can hint that it will be true and so influence the static
prediction of the branch.  This lets us take the fast path almost all the
time in the non-constant, but small enough, case.  For known shifts there is
no code gen difference, so it doesn’t matter.
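
A sketch of what such a hinted dynamic check could look like, assuming GCC's or Clang's __builtin_expect; the helper name is hypothetical and the slow path is reduced to returning 0 for brevity.

```cpp
#include <cstdint>

// Hypothetical helper, not the wide-int.h implementation.
std::uint64_t
lrshift_hinted (std::uint64_t low, unsigned shift)
{
  // __builtin_expect is a GCC/Clang builtin; the second argument is the
  // value the condition is expected to have.
  if (__builtin_expect (shift < 64, 1))
    return low >> shift;   // fast path, expected to be taken
  return 0;                // rare case: every bit is shifted out
}
```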

Re: [C++ PATCH] Handle error_mark_node in cp_fold (alt; PR c++/68357)

2016-01-26 Thread Mike Stump
On Jan 26, 2016, at 8:39 AM, Jakub Jelinek  wrote:
> On Tue, Jan 26, 2016 at 02:56:24PM +0100, Jakub Jelinek wrote:
>> Another alternative would be to make sure tree folders don't introduce
>> error_mark_node (if it wasn't there already), but instead fold the call say
>> to build_int_cst (returntype, 0).  The known cases that would need to change
>> are at least darwin_build_constant_cfstring and darwin_fold_builtin, but
>> maybe others.
> 
> Here is the alternative (but it is unclear if other targets don't have
> similar issues in their folders).

So, Jason needs to make the decision on which patch he prefers.

If he prefers that error mark node not be generated this late, then the below 
patch is Ok.

[PATCH] Partial fix for PR target/68662

2016-01-26 Thread Jakub Jelinek
Hi!

As Alan mentioned in the PR, there is some other issue still around, but
by the time I've noticed that, I already had this patch being
bootstrapped/regtested on powerpc64{,le}-linux (which just passed).
Ok for trunk and deal with the rest incrementally?

2016-01-26  Jakub Jelinek  

PR target/68662
* config/rs6000/rs6000.c (rs6000_option_override_internal): Initialize
toc_label_name unconditionally.
(rs6000_emit_load_toc_table): Call ggc_strdup on toc_label_name for
SYMBOL_REF string.  Use toc_label_name instead of constructing
LCTOC1.
(rs6000_elf_declare_function_name): Use toc_label_name instead of
constructing LCTOC1.

--- gcc/config/rs6000/rs6000.c.jj   2016-01-25 22:33:17.0 +0100
+++ gcc/config/rs6000/rs6000.c  2016-01-26 13:05:18.600072073 +0100
@@ -4560,8 +4560,7 @@ rs6000_option_override_internal (bool gl
   if (TARGET_LONG_DOUBLE_128 && !TARGET_IEEEQUAD)
REAL_MODE_FORMAT (TFmode) = &ibm_extended_format;
 
-  if (TARGET_TOC)
-   ASM_GENERATE_INTERNAL_LABEL (toc_label_name, "LCTOC", 1);
+  ASM_GENERATE_INTERNAL_LABEL (toc_label_name, "LCTOC", 1);
 
   /* We can only guarantee the availability of DI pseudo-ops when
 assembling for 64-bit targets.  */
@@ -23983,7 +23982,7 @@ rs6000_emit_load_toc_table (int fromprol
   ASM_GENERATE_INTERNAL_LABEL (buf, "L", CODE_LABEL_NUMBER (lab));
   lab = gen_rtx_SYMBOL_REF (Pmode, ggc_strdup (buf));
   if (flag_pic == 2)
-   got = gen_rtx_SYMBOL_REF (Pmode, toc_label_name);
+   got = gen_rtx_SYMBOL_REF (Pmode, ggc_strdup (toc_label_name));
   else
got = rs6000_got_sym ();
   tmp1 = tmp2 = dest;
@@ -24027,7 +24026,7 @@ rs6000_emit_load_toc_table (int fromprol
{
  rtx tocsym, lab;
 
- tocsym = gen_rtx_SYMBOL_REF (Pmode, toc_label_name);
+ tocsym = gen_rtx_SYMBOL_REF (Pmode, ggc_strdup (toc_label_name));
  lab = gen_label_rtx ();
  emit_insn (gen_load_toc_v4_PIC_1b (tocsym, lab));
  emit_move_insn (dest, gen_rtx_REG (Pmode, LR_REGNO));
@@ -24040,10 +24039,7 @@ rs6000_emit_load_toc_table (int fromprol
   else if (TARGET_ELF && !TARGET_AIX && flag_pic == 0 && TARGET_MINIMAL_TOC)
 {
   /* This is for AIX code running in non-PIC ELF32.  */
-  char buf[30];
-  rtx realsym;
-  ASM_GENERATE_INTERNAL_LABEL (buf, "LCTOC", 1);
-  realsym = gen_rtx_SYMBOL_REF (Pmode, ggc_strdup (buf));
+  rtx realsym = gen_rtx_SYMBOL_REF (Pmode, ggc_strdup (toc_label_name));
 
   emit_insn (gen_elf_high (dest, realsym));
   emit_insn (gen_elf_low (dest, dest, realsym));
@@ -31726,9 +31722,8 @@ rs6000_elf_declare_function_name (FILE *
 
   (*targetm.asm_out.internal_label) (file, "LCL", rs6000_pic_labelno);
 
-  ASM_GENERATE_INTERNAL_LABEL (buf, "LCTOC", 1);
   fprintf (file, "\t.long ");
-  assemble_name (file, buf);
+  assemble_name (file, toc_label_name);
   putc ('-', file);
   ASM_GENERATE_INTERNAL_LABEL (buf, "LCF", rs6000_pic_labelno);
   assemble_name (file, buf);

Jakub


Re: [PATCH] Fix up wi::lrshift (PR c++/69399)

2016-01-26 Thread Jakub Jelinek
On Tue, Jan 26, 2016 at 11:00:52AM -0800, Mike Stump wrote:
> On Jan 26, 2016, at 10:21 AM, Jakub Jelinek  wrote
> > The question is, shall we do what wi::lshift does and have the fast path
> > only for the constant shifts, as the untested patch below does, or shall we
> > check shift dynamically (thus use
> > shift < HOST_BITS_PER_WIDE_INT
> > instead of
> > STATIC_CONSTANT_P (shift < HOST_BITS_PER_WIDE_INT)
> > in the patch),
> 
> Hum…  I think I prefer the dynamic check.  The reasoning is that when we
> fast path, we can tolerate the conditional branch to retain the fast path,
> as most of the time, that condition will usually be true.  If the compiler
> had troubles with knowing the usual truth value of the expression, seems
> like we can hint that it will be true and influence the static prediction
> of the branch.  This permits us to fast path almost all the time in the
> non-constant, but small enough case.  For known shifts, there is no code
> gen difference, so it doesn’t matter.

Ok, I've cancelled my pending bootstrap/regtest and am testing this instead:

2016-01-26  Jakub Jelinek  

PR tree-optimization/69399
* wide-int.h (wi::lrshift): For larger precisions, only
use fast path if shift is known to be < HOST_BITS_PER_WIDE_INT.

--- gcc/wide-int.h.jj   2016-01-04 18:50:34.656471663 +0100
+++ gcc/wide-int.h  2016-01-26 20:07:03.147054988 +0100
@@ -2909,7 +2909,9 @@ wi::lrshift (const T1 &x, const T2 &y)
 For variable-precision integers like wide_int, handle HWI
 and sub-HWI integers inline.  */
   if (STATIC_CONSTANT_P (xi.precision > HOST_BITS_PER_WIDE_INT)
- ? xi.len == 1 && xi.val[0] >= 0
+ ? (shift < HOST_BITS_PER_WIDE_INT
+&& xi.len == 1
+&& xi.val[0] >= 0)
  : xi.precision <= HOST_BITS_PER_WIDE_INT)
{
  val[0] = xi.to_uhwi () >> shift;


Jakub


Re: [PATCH] Partial fix for PR target/68662

2016-01-26 Thread David Edelsohn
On Tue, Jan 26, 2016 at 2:15 PM, Jakub Jelinek  wrote:
> Hi!
>
> As Alan mentioned in the PR, there is some other issue still around, but
> by the time I've noticed that, I already had this patch being
> bootstrapped/regtested on powerpc64{,le}-linux (which just passed).
> Ok for trunk and deal with the rest incrementally?
>
> 2016-01-26  Jakub Jelinek  
>
> PR target/68662
> * config/rs6000/rs6000.c (rs6000_option_override_internal): Initialize
> toc_label_name unconditionally.
> (rs6000_emit_load_toc_table): Call ggc_strdup on toc_label_name for
> SYMBOL_REF string.  Use toc_label_name instead of constructing
> LCTOC1.
> (rs6000_elf_declare_function_name): Use toc_label_name instead of
> constructing LCTOC1.

This is okay as an incremental fix.

Thanks, David


Re: PING^1: [PATCH] Add TYPE_EMPTY_RECORD for C++ empty class

2016-01-26 Thread Jason Merrill

On 12/14/2015 05:08 PM, H.J. Lu wrote:

+  if (abi_version_at_least (10))
+TYPE_EMPTY_RECORD (t) = is_really_empty_class (t);


This should use is_empty_class or CLASSTYPE_EMPTY_P.  We don't want to 
change how classes with just a vptr are passed.


Otherwise, it looks OK to me.

Jason



Re: [C++ PATCH] Handle error_mark_node in cp_fold (PR c++/68357)

2016-01-26 Thread Jason Merrill

On 01/26/2016 08:56 AM, Jakub Jelinek wrote:

PR c++/68357
* cp-gimplify.c (cp_fold): If some operand folds to error_mark_node,
return error_mark_node instead of building trees with error_mark_node
operands.


OK.

Jason




[patch] libstdc++/69478 Fix assertions for move assignment of trivial types

2016-01-26 Thread Jonathan Wakely

The PR shows that we are incorrectly asserting that types are
copy-assignable when we are going to move-assign them.

Tested powerpc64-linux, committed to trunk. Branch commits to follow
shortly.
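
As a standalone illustration of the idea (not the stl_algobase.h code), the check can be selected with std::conditional so that it matches the operation actually performed. The alias name assignable_for and the type move_only are hypothetical.

```cpp
#include <type_traits>

// Hypothetical helper mirroring the idea of the fix: pick the trait that
// matches the operation actually performed.
template<bool IsMove, typename T>
using assignable_for
  = typename std::conditional<IsMove,
                              std::is_move_assignable<T>,
                              std::is_copy_assignable<T>>::type;

// A move-only type that is still trivially copyable.
struct move_only
{
  move_only() = default;
  move_only(move_only&&) = default;
  move_only& operator=(move_only&&) = default;
};

static_assert(std::is_trivially_copyable<move_only>::value,
              "deleted copy operations do not make it non-trivial");
static_assert(assignable_for<true, move_only>::value, "moving is fine");
static_assert(!assignable_for<false, move_only>::value, "copying is not");
```

Asserting is_copy_assignable unconditionally would reject move_only even when the algorithm only ever move-assigns.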

commit d11631c7c7f6630e83fdbe7f8e16f55eea2dd773
Author: Jonathan Wakely 
Date:   Tue Jan 26 13:38:33 2016 +

Fix assertions for move assignment of trivial types

	PR libstdc++/69478
	* include/bits/stl_algobase.h (__copy_move<_IsMove, true,
	random_access_iterator_tag>): Check is_move_assignable when moving.
	(__copy_move_backwards<_IsMove, true, random_access_iterator_tag>):
	Likewise.
	* testsuite/25_algorithms/copy/move_iterators/69478.cc: New.
	* testsuite/25_algorithms/copy_backward/move_iterators/69478.cc: New.
	* testsuite/25_algorithms/move/69478.cc: New.
	* testsuite/25_algorithms/move_backward/69478.cc: New.

diff --git a/libstdc++-v3/include/bits/stl_algobase.h b/libstdc++-v3/include/bits/stl_algobase.h
index 4a618be..d95ea51 100644
--- a/libstdc++-v3/include/bits/stl_algobase.h
+++ b/libstdc++-v3/include/bits/stl_algobase.h
@@ -357,9 +357,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 __copy_m(const _Tp* __first, const _Tp* __last, _Tp* __result)
 {
 #if __cplusplus >= 201103L
+	  using __assignable = conditional<_IsMove,
+	   is_move_assignable<_Tp>,
+	   is_copy_assignable<_Tp>>;
 	  // trivial types can have deleted assignment
-	  static_assert( is_copy_assignable<_Tp>::value,
-	 "type is not assignable" );
+	  static_assert( __assignable::type::value, "type is not assignable" );
 #endif
 	  const ptrdiff_t _Num = __last - __first;
 	  if (_Num)
@@ -557,9 +559,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 __copy_move_b(const _Tp* __first, const _Tp* __last, _Tp* __result)
 {
 #if __cplusplus >= 201103L
+	  using __assignable = conditional<_IsMove,
+	   is_move_assignable<_Tp>,
+	   is_copy_assignable<_Tp>>;
 	  // trivial types can have deleted assignment
-	  static_assert( is_copy_assignable<_Tp>::value,
-	 "type is not assignable" );
+	  static_assert( __assignable::type::value, "type is not assignable" );
 #endif
 	  const ptrdiff_t _Num = __last - __first;
 	  if (_Num)
diff --git a/libstdc++-v3/testsuite/25_algorithms/copy/move_iterators/69478.cc b/libstdc++-v3/testsuite/25_algorithms/copy/move_iterators/69478.cc
new file mode 100644
index 000..707b273
--- /dev/null
+++ b/libstdc++-v3/testsuite/25_algorithms/copy/move_iterators/69478.cc
@@ -0,0 +1,40 @@
+// { dg-options "-std=gnu++11" }
+// { dg-do compile }
+
+// Copyright (C) 2016 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// .
+
+// PR libstdc++/69478
+
+#include 
+#include 
+
+void
+test01()
+{
+  // A move-only type that is also a trivial class.
+  struct trivial_rvalstruct
+  {
+trivial_rvalstruct() = default;
+trivial_rvalstruct(trivial_rvalstruct&&) = default;
+trivial_rvalstruct& operator=(trivial_rvalstruct&&) = default;
+  };
+  static_assert(std::is_trivial<trivial_rvalstruct>::value, "");
+
+  trivial_rvalstruct a[1], b[1];
+  copy(std::make_move_iterator(a), std::make_move_iterator(a + 1), b);
+}
diff --git a/libstdc++-v3/testsuite/25_algorithms/copy_backward/move_iterators/69478.cc b/libstdc++-v3/testsuite/25_algorithms/copy_backward/move_iterators/69478.cc
new file mode 100644
index 000..e00d146
--- /dev/null
+++ b/libstdc++-v3/testsuite/25_algorithms/copy_backward/move_iterators/69478.cc
@@ -0,0 +1,40 @@
+// { dg-options "-std=gnu++11" }
+// { dg-do compile }
+
+// Copyright (C) 2016 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// .
+
+

Re: PING^1: [PATCH] Add TYPE_EMPTY_RECORD for C++ empty class

2016-01-26 Thread H.J. Lu
On Tue, Jan 26, 2016 at 11:27 AM, Jason Merrill  wrote:
> On 12/14/2015 05:08 PM, H.J. Lu wrote:
>>
>> +  if (abi_version_at_least (10))
>> +TYPE_EMPTY_RECORD (t) = is_really_empty_class (t);
>
>
> This should use is_empty_class or CLASSTYPE_EMPTY_P.  We don't want to
> change how classes with just a vptr are passed.
>
> Otherwise, it looks OK to me.

Is true_type an empty class here?  is_empty_class returns false
on this:

[hjl@gnu-skl-1 gcc]$ cat x.cc
struct dummy { };
struct true_type { struct dummy i; };

extern true_type y;
extern void xxx (true_type c);

void
yyy (void)
{
  xxx (y);
}
[hjl@gnu-skl-1 gcc]$
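
For comparison, the standard std::is_empty trait gives the same kind of answer on these shapes. This is only an analogy at the library level: it is an assumption here that the trait lines up with is_empty_class, and with_vptr is a hypothetical type added to show the vptr-only case.

```cpp
#include <type_traits>

struct dummy { };
struct true_type { dummy i; };               // same shape as x.cc above
struct with_vptr { virtual void f () { } };  // hypothetical: only a vptr

static_assert (std::is_empty<dummy>::value, "no data members");
static_assert (!std::is_empty<true_type>::value,
               "has a member, even though that member is empty");
static_assert (!std::is_empty<with_vptr>::value,
               "virtual functions rule it out");
```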


-- 
H.J.


Re: [PATCH] Fix up wi::lrshift (PR c++/69399)

2016-01-26 Thread H.J. Lu
On Tue, Jan 26, 2016 at 11:17 AM, Jakub Jelinek  wrote:
> On Tue, Jan 26, 2016 at 11:00:52AM -0800, Mike Stump wrote:
>> On Jan 26, 2016, at 10:21 AM, Jakub Jelinek  wrote
>> > The question is, shall we do what wi::lshift does and have the fast path
>> > only for the constant shifts, as the untested patch below does, or shall we
>> > check shift dynamically (thus use
>> > shift < HOST_BITS_PER_WIDE_INT
>> > instead of
>> > STATIC_CONSTANT_P (shift < HOST_BITS_PER_WIDE_INT)
>> > in the patch),
>>
>> Hum…  I think I prefer the dynamic check.  The reasoning is that when we
>> fast path, we can tolerate the conditional branch to retain the fast path,
>> as most of the time, that condition will usually be true.  If the compiler
>> had troubles with knowing the usual truth value of the expression, seems
>> like we can hint that it will be true and influence the static prediction
>> of the branch.  This permits us to fast path almost all the time in the
>> non-constant, but small enough case.  For known shifts, there is no code
>> gen difference, so it doesn’t matter.
>
> Ok, I've cancelled my pending bootstrap/regtest and am testing this instead:
>
> 2016-01-26  Jakub Jelinek  
>
> PR tree-optimization/69399
> * wide-int.h (wi::lrshift): For larger precisions, only
> use fast path if shift is known to be < HOST_BITS_PER_WIDE_INT.
>
> --- gcc/wide-int.h.jj   2016-01-04 18:50:34.656471663 +0100
> +++ gcc/wide-int.h  2016-01-26 20:07:03.147054988 +0100
> @@ -2909,7 +2909,9 @@ wi::lrshift (const T1 &x, const T2 &y)
>  For variable-precision integers like wide_int, handle HWI
>  and sub-HWI integers inline.  */
>if (STATIC_CONSTANT_P (xi.precision > HOST_BITS_PER_WIDE_INT)
> - ? xi.len == 1 && xi.val[0] >= 0
> + ? (shift < HOST_BITS_PER_WIDE_INT
> +&& xi.len == 1
> +&& xi.val[0] >= 0)
>   : xi.precision <= HOST_BITS_PER_WIDE_INT)
> {
>   val[0] = xi.to_uhwi () >> shift;
>
>
> Jakub

Can you add the testcase in PR 69399 to gcc.dg/torture?

Thanks.


-- 
H.J.


RFA (tree.c): PATCH for c++/68782 (wrong TREE_CONSTANT flag on C++ CONSTRUCTOR)

2016-01-26 Thread Jason Merrill
The problem in this bug was that the constexpr code builds a lot of 
CONSTRUCTORs and then fills in the elements later without ever going 
back and updating TREE_CONSTANT and TREE_SIDE_EFFECTS.


This patch adds middle end functions recompute_constructor_flags and 
verify_constructor_flags, and fixes the constexpr code to be more 
careful about updating the flags.
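
A toy model of the flag problem, for illustration only: node is a hypothetical, simplified stand-in for a tree, and the recursion is a simplification that the real recompute_constructor_flags need not share. It shows how an aggregate's flags go stale when elements are filled in later, and how recomputing from the elements repairs them.

```cpp
#include <vector>

// Hypothetical, simplified stand-in for a tree node; not GCC's tree type.
struct node
{
  bool constant = true;
  bool side_effects = false;
  std::vector<node> elts;   // empty for leaves
};

// Recompute the flags of an aggregate from its elements, bottom-up.
// Leaves keep whatever flags they already have.
void
recompute_flags (node &n)
{
  if (n.elts.empty ())
    return;
  bool constant_p = true;
  bool side_effects_p = false;
  for (node &e : n.elts)
    {
      recompute_flags (e);
      if (!e.constant)
        constant_p = false;
      if (e.side_effects)
        side_effects_p = true;
    }
  n.constant = constant_p;
  n.side_effects = side_effects_p;
}
```

An aggregate created as constant and then given a non-constant element still claims to be constant until recompute_flags is called on it.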


Tested x86_64-pc-linux-gnu. Are the tree.c changes OK for trunk?
commit 2ffc171465931c8de27a8f5afd2963df63d8d6e5
Author: Jason Merrill 
Date:   Tue Jan 26 15:12:27 2016 -0500

	PR c++/68782

gcc/
	* tree.c (recompute_constructor_flags): Split out from
	build_constructor.
	(verify_constructor_flags): New.
	* tree.h: Declare them.
gcc/cp/
	* constexpr.c (cxx_eval_bare_aggregate): Update TREE_CONSTANT
	and TREE_SIDE_EFFECTS.
	(cxx_eval_constant_expression) [CONSTRUCTOR]: Call
	verify_constructor_flags.

diff --git a/gcc/cp/constexpr.c b/gcc/cp/constexpr.c
index 6b0e5a8..263ef38 100644
--- a/gcc/cp/constexpr.c
+++ b/gcc/cp/constexpr.c
@@ -2214,6 +2214,9 @@ cxx_eval_bare_aggregate (const constexpr_ctx *ctx, tree t,
   vec **p = &CONSTRUCTOR_ELTS (ctx->ctor);
   vec_alloc (*p, vec_safe_length (v));
 
+  bool constant_p = true;
+  bool side_effects_p = false;
+
   unsigned i; tree index, value;
   FOR_EACH_CONSTRUCTOR_ELT (v, i, index, value)
 {
@@ -2231,6 +2234,11 @@ cxx_eval_bare_aggregate (const constexpr_ctx *ctx, tree t,
 	break;
   if (elt != value)
 	changed = true;
+
+  if (!TREE_CONSTANT (elt))
+	constant_p = false;
+  if (TREE_SIDE_EFFECTS (elt))
+	side_effects_p = true;
   if (index && TREE_CODE (index) == COMPONENT_REF)
 	{
 	  /* This is an initialization of a vfield inside a base
@@ -2264,6 +2272,8 @@ cxx_eval_bare_aggregate (const constexpr_ctx *ctx, tree t,
   /* We're done building this CONSTRUCTOR, so now we can interpret an
  element without an explicit initializer as value-initialized.  */
   CONSTRUCTOR_NO_IMPLICIT_ZERO (t) = false;
+  TREE_CONSTANT (t) = constant_p;
+  TREE_SIDE_EFFECTS (t) = side_effects_p;
   if (VECTOR_TYPE_P (TREE_TYPE (t)))
 t = fold (t);
   return t;
@@ -2826,6 +2836,8 @@ cxx_eval_store_expression (const constexpr_ctx *ctx, tree t,
 }
   type = TREE_TYPE (object);
   bool no_zero_init = true;
+
+  vec *ctors = make_tree_vector();
   while (!refs->is_empty())
 {
   if (*valp == NULL_TREE)
@@ -2837,6 +2849,8 @@ cxx_eval_store_expression (const constexpr_ctx *ctx, tree t,
 	 subobjects will also be zero-initialized.  */
   no_zero_init = CONSTRUCTOR_NO_IMPLICIT_ZERO (*valp);
 
+  vec_safe_push (ctors, *valp);
+
   enum tree_code code = TREE_CODE (type);
   type = refs->pop();
   tree index = refs->pop();
@@ -2889,14 +2903,35 @@ cxx_eval_store_expression (const constexpr_ctx *ctx, tree t,
   /* The hash table might have moved since the get earlier.  */
   valp = ctx->values->get (object);
   if (TREE_CODE (init) == CONSTRUCTOR)
-	/* An outer ctx->ctor might be pointing to *valp, so just replace
-	   its contents.  */
-	CONSTRUCTOR_ELTS (*valp) = CONSTRUCTOR_ELTS (init);
+	{
+	  /* An outer ctx->ctor might be pointing to *valp, so replace
+	 its contents.  */
+	  CONSTRUCTOR_ELTS (*valp) = CONSTRUCTOR_ELTS (init);
+	  TREE_CONSTANT (*valp) = TREE_CONSTANT (init);
+	  TREE_SIDE_EFFECTS (*valp) = TREE_SIDE_EFFECTS (init);
+	}
   else
 	*valp = init;
 }
   else
-*valp = init;
+{
+  *valp = init;
+
+  /* Update TREE_CONSTANT and TREE_SIDE_EFFECTS on enclosing
+	 CONSTRUCTORs.  */
+  unsigned i; tree elt;
+  bool c = TREE_CONSTANT (init);
+  bool s = TREE_SIDE_EFFECTS (init);
+  if (!c || s)
+	FOR_EACH_VEC_SAFE_ELT (ctors, i, elt)
+	  {
+	if (!c)
+	  TREE_CONSTANT (elt) = false;
+	if (s)
+	  TREE_SIDE_EFFECTS (elt) = true;
+	  }
+}
+  release_tree_vector (ctors);
 
   if (*non_constant_p)
 return t;
@@ -3579,9 +3614,16 @@ cxx_eval_constant_expression (const constexpr_ctx *ctx, tree t,
 
 case CONSTRUCTOR:
   if (TREE_CONSTANT (t))
-	/* Don't re-process a constant CONSTRUCTOR, but do fold it to
-	   VECTOR_CST if applicable.  */
-	return fold (t);
+	{
+	  /* Don't re-process a constant CONSTRUCTOR, but do fold it to
+	 VECTOR_CST if applicable.  */
+	  if (CHECKING_P)
+	verify_constructor_flags (t);
+	  else
+	recompute_constructor_flags (t);
+	  if (TREE_CONSTANT (t))
+	return fold (t);
+	}
   r = cxx_eval_bare_aggregate (ctx, t, lval,
    non_constant_p, overflow_p);
   break;
diff --git a/gcc/testsuite/g++.dg/cpp0x/constexpr-aggr2.C b/gcc/testsuite/g++.dg/cpp0x/constexpr-aggr2.C
new file mode 100644
index 000..805d026
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/constexpr-aggr2.C
@@ -0,0 +1,27 @@
+// PR c++/68782
+// { dg-do compile { target c++11 } }
+
+#define assert(X) do { if (!(X)) __builtin_abort(); } while (0)
+
+struct holder { int& value; };
+
+constexpr holder from_value(int& val

Re: [PATCH] Fix up wi::lrshift (PR c++/69399)

2016-01-26 Thread Richard Sandiford
Jakub Jelinek  writes:
> On Tue, Jan 26, 2016 at 11:00:52AM -0800, Mike Stump wrote:
>> On Jan 26, 2016, at 10:21 AM, Jakub Jelinek  wrote
>> > The question is, shall we do what wi::lshift does and have the fast path
>> > only for the constant shifts, as the untested patch below does, or shall we
>> > check shift dynamically (thus use
>> > shift < HOST_BITS_PER_WIDE_INT
>> > instead of
>> > STATIC_CONSTANT_P (shift < HOST_BITS_PER_WIDE_INT)
>> > in the patch),
>> 
>> Hum…  I think I prefer the dynamic check.  The reasoning is that when we
>> fast path, we can tolerate the conditional branch to retain the fast path,
>> as most of the time, that condition will usually be true.  If the compiler
>> had troubles with knowing the usual truth value of the expression, seems
>> like we can hint that it will be true and influence the static prediction
>> of the branch.  This permits us to fast path almost all the time in the
>> non-constant, but small enough case.  For known shifts, there is no code
>> gen difference, so it doesn’t matter.
>
> Ok, I've cancelled my pending bootstrap/regtest and am testing this instead:
>
> 2016-01-26  Jakub Jelinek  
>
>   PR tree-optimization/69399
>   * wide-int.h (wi::lrshift): For larger precisions, only
>   use fast path if shift is known to be < HOST_BITS_PER_WIDE_INT.
>
> --- gcc/wide-int.h.jj 2016-01-04 18:50:34.656471663 +0100
> +++ gcc/wide-int.h	2016-01-26 20:07:03.147054988 +0100
> @@ -2909,7 +2909,9 @@ wi::lrshift (const T1 &x, const T2 &y)
>For variable-precision integers like wide_int, handle HWI
>and sub-HWI integers inline.  */
>if (STATIC_CONSTANT_P (xi.precision > HOST_BITS_PER_WIDE_INT)
> -   ? xi.len == 1 && xi.val[0] >= 0
> +   ? (shift < HOST_BITS_PER_WIDE_INT
> +  && xi.len == 1
> +  && xi.val[0] >= 0)
> : xi.precision <= HOST_BITS_PER_WIDE_INT)
>   {
> val[0] = xi.to_uhwi () >> shift;

LGTM, thanks.

Richard


Re: PING^1: [PATCH] Add TYPE_EMPTY_RECORD for C++ empty class

2016-01-26 Thread Marc Glisse

On Tue, 26 Jan 2016, H.J. Lu wrote:


> On Tue, Jan 26, 2016 at 11:27 AM, Jason Merrill wrote:
>> On 12/14/2015 05:08 PM, H.J. Lu wrote:
>>>
>>> +  if (abi_version_at_least (10))
>>> +    TYPE_EMPTY_RECORD (t) = is_really_empty_class (t);
>>
>> This should use is_empty_class or CLASSTYPE_EMPTY_P.  We don't want to
>> change how classes with just a vptr are passed.
>>
>> Otherwise, it looks OK to me.
>
> Is true_type an empty class here?  is_empty_class returns false
> on this:

It isn't empty in the usual C++ sense (we can't apply the empty base 
optimization to something that derives from it, for instance), or the one 
described in the itanium ABI (the relevant one here I guess). On the other 
hand, it is rather useless to pass it by value, so a different notion of 
empty might have been useful when the ABI was defined...
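
For illustration only (this sketch is not part of the patch under review): 
the standard std::is_empty trait follows the usual C++ notion of "empty" 
mentioned above, so it accepts dummy but rejects true_type.  The two 
structs are the ones from the x.cc test case; the constant names 
dummy_is_empty and wrapper_is_empty are made up for the example.

```cpp
#include <type_traits>

struct dummy { };
struct true_type { struct dummy i; };

// std::is_empty is true only for a class with no non-static data members
// (other than zero-width bit-fields), no virtual functions, and no
// virtual or non-empty base classes.
constexpr bool dummy_is_empty = std::is_empty<dummy>::value;
constexpr bool wrapper_is_empty = std::is_empty<true_type>::value;

static_assert (dummy_is_empty, "dummy has no data members");
static_assert (!wrapper_is_empty,
               "true_type has a data member, so it is not empty in this sense");
```

Whether such a wrapper should nevertheless be treated as "empty" for 
argument passing is exactly the ABI question raised in this thread; the 
trait says nothing about that.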



> [hjl@gnu-skl-1 gcc]$ cat x.cc
> struct dummy { };
> struct true_type { struct dummy i; };
>
> extern true_type y;
> extern void xxx (true_type c);
>
> void
> yyy (void)
> {
>  xxx (y);
> }
> [hjl@gnu-skl-1 gcc]$


--
Marc Glisse


Re: PING^1: [PATCH] Add TYPE_EMPTY_RECORD for C++ empty class

2016-01-26 Thread H.J. Lu
On Tue, Jan 26, 2016 at 12:23 PM, Marc Glisse  wrote:
> On Tue, 26 Jan 2016, H.J. Lu wrote:
>
>> On Tue, Jan 26, 2016 at 11:27 AM, Jason Merrill  wrote:
>>>
>>> On 12/14/2015 05:08 PM, H.J. Lu wrote:


>>>> +  if (abi_version_at_least (10))
>>>> +    TYPE_EMPTY_RECORD (t) = is_really_empty_class (t);
>>>
>>>
>>>
>>> This should use is_empty_class or CLASSTYPE_EMPTY_P.  We don't want to
>>> change how classes with just a vptr are passed.
>>>
>>> Otherwise, it looks OK to me.
>>
>>
>> Is true_type an empty class here?  is_empty_class returns false
>> on this:
>
>
> It isn't empty in the usual C++ sense (we can't apply the empty base
> optimization to something that derives from it, for instance), or the one
> described in the itanium ABI (the relevant one here I guess). On the other
> hand, it is rather useless to pass it by value, so a different notion of

llvm/clang treats it as empty class and I think it should be treated
as "empty" class.

> empty might have been useful when the ABI was defined...

I proposed to update x86-64 psABI:

https://groups.google.com/forum/#!topic/x86-64-abi/VTE-LJ9VnDk

>
>> [hjl@gnu-skl-1 gcc]$ cat x.cc
>> struct dummy { };
>> struct true_type { struct dummy i; };
>>
>> extern true_type y;
>> extern void xxx (true_type c);
>>
>> void
>> yyy (void)
>> {
>>  xxx (y);
>> }
>> [hjl@gnu-skl-1 gcc]$
>
>
> --
> Marc Glisse



-- 
H.J.


Re: RFA (tree.c): PATCH for c++/68782 (wrong TREE_CONSTANT flag on C++ CONSTRUCTOR)

2016-01-26 Thread Jakub Jelinek
On Tue, Jan 26, 2016 at 03:20:04PM -0500, Jason Merrill wrote:
> Tested x86_64-pc-linux-gnu. Are the tree.c changes OK for trunk?

The tree.c changes are ok.  But I have nits and one bigger issue in
constexpr.c:

> --- a/gcc/cp/constexpr.c
> +++ b/gcc/cp/constexpr.c
> @@ -2214,6 +2214,9 @@ cxx_eval_bare_aggregate (const constexpr_ctx *ctx, tree 
> t,
>vec **p = &CONSTRUCTOR_ELTS (ctx->ctor);
>vec_alloc (*p, vec_safe_length (v));
>  
> +  bool constant_p = true;
> +  bool side_effects_p = false;
> +
>unsigned i; tree index, value;

I think vars of different types shouldn't be declared on the same line.
And perhaps the empty line in between two sets of declarations isn't needed.

>FOR_EACH_CONSTRUCTOR_ELT (v, i, index, value)
>  {
> @@ -2826,6 +2836,8 @@ cxx_eval_store_expression (const constexpr_ctx *ctx, 
> tree t,
>  }
>type = TREE_TYPE (object);
>bool no_zero_init = true;
> +
> +  vec *ctors = make_tree_vector();

Missing space before (.

>while (!refs->is_empty())
>  {
>if (*valp == NULL_TREE)
> @@ -2837,6 +2849,8 @@ cxx_eval_store_expression (const constexpr_ctx *ctx, 
> tree t,
>subobjects will also be zero-initialized.  */
>no_zero_init = CONSTRUCTOR_NO_IMPLICIT_ZERO (*valp);
>  
> +  vec_safe_push (ctors, *valp);
> +
>enum tree_code code = TREE_CODE (type);
>type = refs->pop();
>tree index = refs->pop();
> @@ -2889,14 +2903,35 @@ cxx_eval_store_expression (const constexpr_ctx *ctx, 
> tree t,
>/* The hash table might have moved since the get earlier.  */
>valp = ctx->values->get (object);
>if (TREE_CODE (init) == CONSTRUCTOR)
> - /* An outer ctx->ctor might be pointing to *valp, so just replace
> -its contents.  */
> - CONSTRUCTOR_ELTS (*valp) = CONSTRUCTOR_ELTS (init);
> + {
> +   /* An outer ctx->ctor might be pointing to *valp, so replace
> +  its contents.  */
> +   CONSTRUCTOR_ELTS (*valp) = CONSTRUCTOR_ELTS (init);
> +   TREE_CONSTANT (*valp) = TREE_CONSTANT (init);
> +   TREE_SIDE_EFFECTS (*valp) = TREE_SIDE_EFFECTS (init);
> + }
>else
>   *valp = init;
>  }
>else
> -*valp = init;
> +{
> +  *valp = init;
> +
> +  /* Update TREE_CONSTANT and TREE_SIDE_EFFECTS on enclosing
> +  CONSTRUCTORs.  */
> +  unsigned i; tree elt;
> +  bool c = TREE_CONSTANT (init);
> +  bool s = TREE_SIDE_EFFECTS (init);
> +  if (!c || s)
> + FOR_EACH_VEC_SAFE_ELT (ctors, i, elt)
> +   {
> + if (!c)
> +   TREE_CONSTANT (elt) = false;
> + if (s)
> +   TREE_SIDE_EFFECTS (elt) = true;
> +   }
> +}
> +  release_tree_vector (ctors);
>  
>if (*non_constant_p)
>  return t;
> @@ -3579,9 +3614,16 @@ cxx_eval_constant_expression (const constexpr_ctx 
> *ctx, tree t,
>  
>  case CONSTRUCTOR:
>if (TREE_CONSTANT (t))
> - /* Don't re-process a constant CONSTRUCTOR, but do fold it to
> -VECTOR_CST if applicable.  */
> - return fold (t);
> + {
> +   /* Don't re-process a constant CONSTRUCTOR, but do fold it to
> +  VECTOR_CST if applicable.  */
> +   if (CHECKING_P)
> + verify_constructor_flags (t);
> +   else
> + recompute_constructor_flags (t);

But I don't understand this.  Either the flags are supposed to be already
correct here, then I'd expect to see
  if (CHECKING_P)
verify_constructor_flags (t);
only, or they are not guaranteed to be correct, and then I'd expect
unconditional
  recompute_constructor_flags (t).

> +   if (TREE_CONSTANT (t))
> + return fold (t);
> + }
>r = cxx_eval_bare_aggregate (ctx, t, lval,
>  non_constant_p, overflow_p);
>break;

Jakub


[wwwdocs][PATCH] Add notes on -Wmisleading-indentation to GCC 6 porting guide

2016-01-26 Thread David Malcolm
htdocs/gcc-6/porting_to.html is looking rather empty right now.  The
attached patch starts fleshing it out by adding some notes on
-Wmisleading-indentation.
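
As a small illustration of the kind of code the warning is aimed at (a
sketch written for this note, not taken from the porting guide; the
function names are hypothetical), the first function below increments the
counter unconditionally even though the indentation suggests otherwise.
The pragmas assume a compiler that knows the option name, e.g. GCC 6 or
later.

```cpp
// Silence the warning for the deliberately misleading function only.
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wmisleading-indentation"

int
reset_if_negative_misleading (int x, int *resets)
{
  if (x < 0)
    x = 0;
    *resets += 1;   // Indented as if guarded by the 'if', but always runs.
  return x;
}

#pragma GCC diagnostic pop

// Same function with the block structure made explicit by braces.
int
reset_if_negative_fixed (int x, int *resets)
{
  if (x < 0)
    {
      x = 0;
      *resets += 1;
    }
  return x;
}
```

Calling both with a non-negative argument shows the difference: only the
misleading variant bumps the counter.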

[see the notes at https://gcc.gnu.org/ml/gcc/2016-01/msg00224.html on
what -Wmisleading-indentation ran into on a mass-rebuild of Debian]

I put it apart from the existing headings as it relates to both C and to
C++.

Validates.

OK to commit?
Dave
Index: htdocs/gcc-6/porting_to.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-6/porting_to.html,v
retrieving revision 1.1
diff -u -p -r1.1 porting_to.html
--- htdocs/gcc-6/porting_to.html	20 Jan 2016 11:53:52 -	1.1
+++ htdocs/gcc-6/porting_to.html	26 Jan 2016 20:35:05 -
@@ -33,6 +33,69 @@ manner. Additions and suggestions for im
 
 C++ language issues
 
+-Wmisleading-indentation
+
+A new warning -Wmisleading-indentation was added
+to -Wall, warning about places where the indentation of
+the code might mislead a human reader about the control flow:
+
+
+
+sslKeyExchange.c: In function 'SSLVerifySignedServerKeyExchange':
+sslKeyExchange.c:631:8: warning: statement is indented as if it were guarded by... [-Wmisleading-indentation]
+goto fail;
+^~~~
+sslKeyExchange.c:629:4: note: ...this 'if' clause, but it is not
+if ((err = SSLHashSHA1.update(&hashCtx, &signedParams)) != 0)
+^~
+
+
+
+This has highlighted genuine bugs, often due to missing braces, but it
+sometimes reports warnings for poorly-indented files, or on projects
+with unusual indentation.  This may cause build errors if you
+have -Wall -Werror in your project.
+
+
+
+The best fix is usually to fix the indentation of the code to match
+the block structure, or to fix the block structure by adding missing
+braces.  If changing the source is not practical or desirable (e.g. for
+autogenerated code, or to avoid churn in the source history), the
+warning can be disabled by adding -Wno-misleading-indentation
+to the build flags.  Alternatively, you can disable it for just one part of
+a source file or function using pragmas:
+
+
+
+#pragma GCC diagnostic push
+#pragma GCC diagnostic ignored "-Wmisleading-indentation"
+
+/* (code for which the warning is to be disabled)  */
+
+#pragma GCC diagnostic pop
+
+
+
+Source files with mixed tabs and spaces that don't use 8-space tabs
+may lead to warnings.  A real-world example was for such a source file, which
+contained an Emacs directive to view tabs to be 4 spaces wide:
+
+
+
+  /* -*- Mode: C; tab-width: 4; indent-tabs-mode: nil; c-basic-offset: 4 -*- */
+
+
+
+The mixture of tabs and spaces did correctly reflect the block
+structure when viewed in Emacs, but not in other editors, or in an
+HTML view of the source repository.
+By default, -Wmisleading-indentation assumes tabs to
+be 8 spaces wide.  It would have been possible to avoid this warning
+by adding -ftabstop=4 to the build flags for this file,
+but given that the code was confusing when viewed in other editors,
+the indentation of the source was fixed instead.
+
 
 Links
 


[Patch, MIPS] Patch for PR 68400, a mips16 bug

2016-01-26 Thread Steve Ellcey
Here is a patch for PR 68400.  The problem is that and_operands_ok was checking
one operand to see if it was a memory_operand but MIPS16 addressing is more
restrictive than what the general memory_operand allows.  The fix was to
call mips_classify_address if TARGET_MIPS16 is set because it will do a
more complete mips16 addressing check and reject operands that do not meet
the more restrictive requirements.

I ran the GCC testsuite with no regressions and have included a test case as
part of this patch.

OK to check in?

Steve Ellcey
sell...@imgtec.com


2016-01-26  Steve Ellcey  

PR target/68400
* config/mips/mips.c (and_operands_ok): Add MIPS16 check.



diff --git a/gcc/config/mips/mips.c b/gcc/config/mips/mips.c
index dd54d6a..adeafa3 100644
--- a/gcc/config/mips/mips.c
+++ b/gcc/config/mips/mips.c
@@ -8006,9 +8006,18 @@ mask_low_and_shift_p (machine_mode mode, rtx mask, rtx 
shift, int maxlen)
 bool
 and_operands_ok (machine_mode mode, rtx op1, rtx op2)
 {
-  return (memory_operand (op1, mode)
- ? and_load_operand (op2, mode)
- : and_reg_operand (op2, mode));
+
+  if (memory_operand (op1, mode))
+{
+  if (TARGET_MIPS16) {
+   struct mips_address_info addr;
+   if (!mips_classify_address (&addr, op1, mode, false))
+ return false;
+  }
+  return and_load_operand (op2, mode);
+}
+  else
+return and_reg_operand (op2, mode);
 }
 
 /* The canonical form of a mask-low-and-shift-left operation is




2016-01-26  Steve Ellcey  

PR target/68400
* gcc.target/mips/mips.exp (mips_option_groups): Add stack-protector.
* gcc.target/mips/pr68400.c: New test.


diff --git a/gcc/testsuite/gcc.target/mips/mips.exp 
b/gcc/testsuite/gcc.target/mips/mips.exp
index f191331..ff9c99a 100644
--- a/gcc/testsuite/gcc.target/mips/mips.exp
+++ b/gcc/testsuite/gcc.target/mips/mips.exp
@@ -257,6 +257,7 @@ set mips_option_groups {
 lsa "(|!)HAS_LSA"
 section_start "-Wl,--section-start=.*"
 frame-header "-mframe-header-opt|-mno-frame-header-opt"
+stack-protector "-fstack-protector"
 }
 
 for { set option 0 } { $option < 32 } { incr option } {
diff --git a/gcc/testsuite/gcc.target/mips/pr68400.c 
b/gcc/testsuite/gcc.target/mips/pr68400.c
index e69de29..1099568 100644
--- a/gcc/testsuite/gcc.target/mips/pr68400.c
+++ b/gcc/testsuite/gcc.target/mips/pr68400.c
@@ -0,0 +1,28 @@
+/* PR target/pr68400
+   This was triggering an ICE in change_address_1 when compiled with -Os.  */
+
+/* { dg-do compile } */
+/* { dg-options "-fstack-protector -mips16" } */
+
+typedef struct s {
+ unsigned long long d;
+ long long t;
+} p;
+
+int sh(int x, unsigned char *buf)
+{
+ p *uhdr = (p *)buf;
+ unsigned int i = 0;
+ uhdr->d = ((uhdr->d & 0xff00LL) >> 56)
+| ((uhdr->d & 0xff00LL) >> 24)
+| ((uhdr->d & 0xff00LL) << 8)
+| ((uhdr->d & 0x00ffLL) << 56);
+ uhdr->t = ((uhdr->t & 0xff00LL) >> 56)
+| ((uhdr->t & 0xff00LL) >> 24)
+| ((uhdr->t & 0x00ffLL) >> 8)
+| ((uhdr->t & 0xff00LL) << 8)
+| ((uhdr->t & 0xff00LL) << 40)
+| ((uhdr->t & 0x00ffLL) << 56);
+ i += 4;
+ if (x < i) return 0; else return 1;
+}


Re: PING^1: [PATCH] Add TYPE_EMPTY_RECORD for C++ empty class

2016-01-26 Thread Marc Glisse

On Tue, 26 Jan 2016, H.J. Lu wrote:


> On Tue, Jan 26, 2016 at 12:23 PM, Marc Glisse wrote:
>> On Tue, 26 Jan 2016, H.J. Lu wrote:
>>> On Tue, Jan 26, 2016 at 11:27 AM, Jason Merrill wrote:
>>>> On 12/14/2015 05:08 PM, H.J. Lu wrote:
>>>>>
>>>>> +  if (abi_version_at_least (10))
>>>>> +    TYPE_EMPTY_RECORD (t) = is_really_empty_class (t);
>>>>
>>>> This should use is_empty_class or CLASSTYPE_EMPTY_P.  We don't want to
>>>> change how classes with just a vptr are passed.
>>>>
>>>> Otherwise, it looks OK to me.
>>>
>>> Is true_type an empty class here?  is_empty_class returns false
>>> on this:
>>
>> It isn't empty in the usual C++ sense (we can't apply the empty base
>> optimization to something that derives from it, for instance), or the one
>> described in the itanium ABI (the relevant one here I guess). On the other
>> hand, it is rather useless to pass it by value, so a different notion of
>
> llvm/clang treats it as empty class and I think it should be treated
> as "empty" class.


Is it still empty if there are several empty members? Is there a clear 
definition somewhere of what empty means? I guess it makes sense to 
recursively allow "empty" members for this purpose.



>> empty might have been useful when the ABI was defined...
>
> I proposed to update x86-64 psABI:
>
> https://groups.google.com/forum/#!topic/x86-64-abi/VTE-LJ9VnDk


Does the full document have a definition of empty anywhere?


>>> [hjl@gnu-skl-1 gcc]$ cat x.cc
>>> struct dummy { };
>>> struct true_type { struct dummy i; };
>>>
>>> extern true_type y;
>>> extern void xxx (true_type c);
>>>
>>> void
>>> yyy (void)
>>> {
>>>  xxx (y);
>>> }
>>> [hjl@gnu-skl-1 gcc]$
>>
>> --
>> Marc Glisse


--
Marc Glisse


Re: [PATCH] Fix up wi::lrshift (PR c++/69399)

2016-01-26 Thread Richard Biener
On January 26, 2016 8:00:52 PM GMT+01:00, Mike Stump  
wrote:
>On Jan 26, 2016, at 10:21 AM, Jakub Jelinek  wrote
>> The question is, shall we do what wi::lshift does and have the fast
>path
>> only for the constant shifts, as the untested patch below does, or
>shall we
>> check shift dynamically (thus use
>> shift < HOST_BITS_PER_WIDE_INT
>> instead of
>> STATIC_CONSTANT_P (shift < HOST_BITS_PER_WIDE_INT)
>> in the patch),
>
>Hum…  I think I prefer the dynamic check.  The reasoning is that when
>we fast path, we can tolerate the conditional branch to retain the fast
>path, as most of the time, that condition will usually be true.  If the
>compiler had troubles with knowing the usual truth value of the
>expression, seems like we can hint that it will be true and influence
>the static prediction of the branch.  This permits us to fast path
>almost all the time in the non-constant, but small enough case.  For
>known shifts, there is no code gen difference, so it doesn’t matter.

The original reasoning was to inline only the fast path if it is known at 
compile time and otherwise have a call. Exactly to avoid bloating callers with 
inlined conditionals.
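
To make the two options concrete, here is a minimal stand-alone sketch
(not the wide-int.h code).  Everything in it is an assumption made for
illustration: TOY_STATIC_CONSTANT_P is a local macro modeled on
STATIC_CONSTANT_P, toy_lrshift_slow stands in for the out-of-line
lrshift_large call, and a plain 64-bit value replaces the wide-int
representation.

```cpp
#include <cstdint>

// Local stand-in for STATIC_CONSTANT_P: true only when the compiler can
// prove at compile time that X holds.
#if defined (__GNUC__)
#define TOY_STATIC_CONSTANT_P(X) (__builtin_constant_p (X) && (X))
#else
#define TOY_STATIC_CONSTANT_P(X) (false && (X))
#endif

const unsigned TOY_HOST_BITS = 64;

// Out-of-line general case; it also handles shifts >= TOY_HOST_BITS.
std::uint64_t
toy_lrshift_slow (std::uint64_t x, unsigned shift)
{
  return shift >= TOY_HOST_BITS ? 0 : x >> shift;
}

// Variant 1: inline the fast path only when the bound is known at
// compile time; otherwise emit just the call.
inline std::uint64_t
toy_lrshift_static (std::uint64_t x, unsigned shift)
{
  if (TOY_STATIC_CONSTANT_P (shift < TOY_HOST_BITS))
    return x >> shift;
  return toy_lrshift_slow (x, shift);
}

// Variant 2: test the bound at run time; callers get an inlined
// conditional plus the call.
inline std::uint64_t
toy_lrshift_dynamic (std::uint64_t x, unsigned shift)
{
  if (shift < TOY_HOST_BITS)
    return x >> shift;
  return toy_lrshift_slow (x, shift);
}
```

Both variants compute the same values; they differ only in how much code
ends up in each caller, which is the trade-off being discussed.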

Richard.




Re: RFA (tree.c): PATCH for c++/68782 (wrong TREE_CONSTANT flag on C++ CONSTRUCTOR)

2016-01-26 Thread Jason Merrill

On 01/26/2016 03:32 PM, Jakub Jelinek wrote:

>+ if (CHECKING_P)
>+   verify_constructor_flags (t);
>+ else
>+   recompute_constructor_flags (t);



But I don't understand this.  Either the flags are supposed to be already
correct here, then I'd expect to see
   if (CHECKING_P)
 verify_constructor_flags (t);
only, or they are not guaranteed to be correct, and then I'd expect
unconditional
   recompute_constructor_flags (t).



They are supposed to be correct, so when --enable-checking, we check for 
that.  The recompute is for better fault-tolerance in release compilers 
in case the patch doesn't catch everything.
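
A toy model of that policy, for readers following along (this is a sketch 
under stated assumptions, not the tree.c implementation): toy_elt and 
toy_ctor are hypothetical stand-ins for CONSTRUCTOR elements and the 
CONSTRUCTOR itself, and the checking build is modeled by a bool argument 
that makes the function report a mismatch instead of aborting.

```cpp
#include <vector>

struct toy_elt { bool constant; bool side_effects; };
struct toy_ctor
{
  bool constant;
  bool side_effects;
  std::vector<toy_elt> elts;
};

// A constructor is constant only if every element is; it has side
// effects if any element does.
static void
toy_compute_flags (const toy_ctor &c, bool *constant_p, bool *side_effects_p)
{
  *constant_p = true;
  *side_effects_p = false;
  for (const toy_elt &e : c.elts)
    {
      if (!e.constant)
        *constant_p = false;
      if (e.side_effects)
        *side_effects_p = true;
    }
}

// Models the verify step: do the stored flags match the elements?
bool
toy_flags_ok (const toy_ctor &c)
{
  bool constant_p, side_effects_p;
  toy_compute_flags (c, &constant_p, &side_effects_p);
  return c.constant == constant_p && c.side_effects == side_effects_p;
}

// Models the recompute step: overwrite the stored flags.
void
toy_recompute_flags (toy_ctor &c)
{
  toy_compute_flags (c, &c.constant, &c.side_effects);
}

// Checking builds only diagnose stale flags (a real checking build would
// abort here); release builds silently repair them.
bool
toy_check_or_repair (toy_ctor &c, bool checking)
{
  if (checking)
    return toy_flags_ok (c);
  toy_recompute_flags (c);
  return true;
}
```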


Jason



Re: RFA (tree.c): PATCH for c++/68782 (wrong TREE_CONSTANT flag on C++ CONSTRUCTOR)

2016-01-26 Thread Jakub Jelinek
On Tue, Jan 26, 2016 at 03:46:50PM -0500, Jason Merrill wrote:
> On 01/26/2016 03:32 PM, Jakub Jelinek wrote:
> >>>+if (CHECKING_P)
> >>>+  verify_constructor_flags (t);
> >>>+else
> >>>+  recompute_constructor_flags (t);
> 
> >But I don't understand this.  Either the flags are supposed to be already
> >correct here, then I'd expect to see
> >   if (CHECKING_P)
> > verify_constructor_flags (t);
> >only, or they are not guaranteed to be correct, and then I'd expect
> >unconditional
> >   recompute_constructor_flags (t).
> >
> 
> They are supposed to be correct, so when --enable-checking, we check for
> that.  The recompute is for better fault-tolerance in release compilers in
> case the patch doesn't catch everything.

Ah, ok.  But please make sure to remove it after GCC 6 branches. 

Jakub


[PATCH] [graphite] handle isl_ast_op_select

2016-01-26 Thread Sebastian Pop
2016-01-26  Abderrazek Zaafrani  
Sebastian Pop  

* graphite-isl-ast-to-gimple.c (ternary_op_to_tree): Handle
isl_ast_op_cond and isl_ast_op_select.
(gcc_expression_from_isl_expr_op): Same.

* gcc.dg/graphite/isl-ast-op-select.c: New.
---
 gcc/graphite-isl-ast-to-gimple.c  | 18 +++---
 gcc/testsuite/gcc.dg/graphite/isl-ast-op-select.c | 29 +++
 2 files changed, 37 insertions(+), 10 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/graphite/isl-ast-op-select.c

diff --git a/gcc/graphite-isl-ast-to-gimple.c b/gcc/graphite-isl-ast-to-gimple.c
index 0f58503..81ed304 100644
--- a/gcc/graphite-isl-ast-to-gimple.c
+++ b/gcc/graphite-isl-ast-to-gimple.c
@@ -689,22 +689,20 @@ tree
 translate_isl_ast_to_gimple::
 ternary_op_to_tree (tree type, __isl_take isl_ast_expr *expr, ivs_params &ip)
 {
-  gcc_assert (isl_ast_expr_get_op_type (expr) == isl_ast_op_minus);
+  enum isl_ast_op_type t = isl_ast_expr_get_op_type (expr);
+  gcc_assert (t == isl_ast_op_cond || t == isl_ast_op_select);
   isl_ast_expr *arg_expr = isl_ast_expr_get_op_arg (expr, 0);
-  tree tree_first_expr
-= gcc_expression_from_isl_expression (type, arg_expr, ip);
+  tree a = gcc_expression_from_isl_expression (type, arg_expr, ip);
   arg_expr = isl_ast_expr_get_op_arg (expr, 1);
-  tree tree_second_expr
-= gcc_expression_from_isl_expression (type, arg_expr, ip);
+  tree b = gcc_expression_from_isl_expression (type, arg_expr, ip);
   arg_expr = isl_ast_expr_get_op_arg (expr, 2);
-  tree tree_third_expr
-= gcc_expression_from_isl_expression (type, arg_expr, ip);
+  tree c = gcc_expression_from_isl_expression (type, arg_expr, ip);
   isl_ast_expr_free (expr);
 
   if (codegen_error)
 return NULL_TREE;
-  return fold_build3 (COND_EXPR, type, tree_first_expr,
- tree_second_expr, tree_third_expr);
+
+  return fold_build3 (COND_EXPR, type, a, b, c);
 }
 
 /* Converts a unary isl_ast_expr_op expression E to a GCC expression tree of
@@ -791,7 +789,6 @@ gcc_expression_from_isl_expr_op (tree type, __isl_take 
isl_ast_expr *expr,
 case isl_ast_op_call:
 case isl_ast_op_and_then:
 case isl_ast_op_or_else:
-case isl_ast_op_select:
   gcc_unreachable ();
 
 case isl_ast_op_max:
@@ -822,6 +819,7 @@ gcc_expression_from_isl_expr_op (tree type, __isl_take 
isl_ast_expr *expr,
   return unary_op_to_tree (type, expr, ip);
 
 case isl_ast_op_cond:
+case isl_ast_op_select:
   return ternary_op_to_tree (type, expr, ip);
 
 default:
diff --git a/gcc/testsuite/gcc.dg/graphite/isl-ast-op-select.c 
b/gcc/testsuite/gcc.dg/graphite/isl-ast-op-select.c
new file mode 100644
index 000..688176e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/graphite/isl-ast-op-select.c
@@ -0,0 +1,29 @@
+/* { dg-options "-O2 -floop-nest-optimize" } */
+
+static void kernel_gemm(int ni, int nj, int nk, double alpha, double beta, 
double C[1024][1024], double A[1024][1024], double B[1024][1024])
+{
+ int i, j, k;
+ for (i = 0; i < ni; i++)
+   for (j = 0; j < nj; j++)
+ {
+   C[i][j] *= beta;
+   for (k = 0; k < nk; ++k)
+ C[i][j] += alpha * A[i][k] * B[k][j];
+ }
+}
+
+void *polybench_alloc_data (int, int);
+
+int main(int argc, char** argv) {
+  int ni = 1024;
+  int nj = 1024;
+  int nk = 1024;
+  double alpha;
+  double beta;
+  double (*C)[1024][1024];
+  C = (double(*)[1024][1024])polybench_alloc_data ((1024) * (1024), 
sizeof(double));
+  double (*A)[1024][1024];
+  A = (double(*)[1024][1024])polybench_alloc_data ((1024) * (1024), 
sizeof(double));
+  double (*B)[1024][1024];
+  kernel_gemm (ni, nj, nk, alpha, beta, *C, *A, *B);
+}
-- 
2.5.0



Re: [PATCH] Handle -fsanitize=* in lto-wrapper (PR lto/69254)

2016-01-26 Thread Jakub Jelinek
On Tue, Jan 26, 2016 at 04:44:39PM +0100, Jakub Jelinek wrote:
> 2016-01-26  Jakub Jelinek  
> 
>   PR lto/69254
>   * lto-opts.c (lto_write_options): Write also -f{,no-}sanitize=
>   options.
>   * lto-wrapper.c (struct lto_decoded_options): New type.
>   (append_option, merge_and_complain, append_compiler_options,
>   append_linker_options, append_offload_options,
>   compile_offload_image, compile_images_for_offload_targets,
>   find_and_merge_options): Pass around options
>   in struct lto_decoded_options instead of struct cl_decoded_option
>   pointer and count pair.
>   (get_options_from_collect_gcc_options): Likewise.  Parse -fsanitize=
>   options and if in the end any ub sanitizers are enabled, set
>   decoded_opts->sanitize_undefined to true.
>   (run_gcc): Adjust callers of these functions.  If
>   fdecoded_options.sanitize_undefined is true, append
>   -fsanitize=shift after the linker options.

Now successfully bootstrapped/regtested on x86_64-linux and i686-linux.

Jakub


Is it OK for rtx_addr_can_trap_p_1 to attempt to compute the frame layout? (was Re: [PATCH] Skip re-computing the mips frame info after reload completed)

2016-01-26 Thread Richard Sandiford
[cc-ing Eric as RTL maintainer]

Matthew Fortune  writes:
> Bernd Edlinger  writes:
>> Matthew Fortune  writes:
>> > Has the patch been tested beyond just building GCC? I can do a
>> > test run for you if you don't have things set up to do one yourself.
>> 
>> I built a cross-gcc with all languages and a cross-glibc, but I have
>> not set up an emulation environment, so if you could give it a test
>> that would be highly welcome.
>
> mipsel-linux-gnu test results are the same before and after this patch.
>
> Please go ahead and commit.

I still object to this.  And it feels like the patch was posted
as though it was a new one in order to avoid answering the objections
that were raised when it was last posted:

  https://gcc.gnu.org/ml/gcc-patches/2015-12/msg02218.html

IMO the problem is that rtx_addr_can_trap_p_1 duplicates a large
bit of LRA/reload logic:


/* Compute an approximation for the offset between the register
   FROM and TO for the current function, as it was at the start
   of the routine.  */

static HOST_WIDE_INT
get_initial_register_offset (int from, int to)
{
#ifdef ELIMINABLE_REGS
  static const struct elim_table_t
  {
const int from;
const int to;
  } table[] = ELIMINABLE_REGS;
  HOST_WIDE_INT offset1, offset2;
  unsigned int i, j;

  if (to == from)
return 0;

  /* It is not safe to call INITIAL_ELIMINATION_OFFSET
 before the reload pass.  We need to give at least
 an estimation for the resulting frame size.  */
  if (! reload_completed)
{
  offset1 = crtl->outgoing_args_size + get_frame_size ();
#if !STACK_GROWS_DOWNWARD
  offset1 = - offset1;
#endif
  if (to == STACK_POINTER_REGNUM)
return offset1;
  else if (from == STACK_POINTER_REGNUM)
return - offset1;
  else
return 0;
 }

  for (i = 0; i < ARRAY_SIZE (table); i++)
  if (table[i].from == from)
{
  if (table[i].to == to)
{
  INITIAL_ELIMINATION_OFFSET (table[i].from, table[i].to,
  offset1);
  return offset1;
}
  for (j = 0; j < ARRAY_SIZE (table); j++)
{
  if (table[j].to == to
  && table[j].from == table[i].to)
{
  INITIAL_ELIMINATION_OFFSET (table[i].from, table[i].to,
  offset1);
  INITIAL_ELIMINATION_OFFSET (table[j].from, table[j].to,
  offset2);
  return offset1 + offset2;
}
  if (table[j].from == to
  && table[j].to == table[i].to)
{
  INITIAL_ELIMINATION_OFFSET (table[i].from, table[i].to,
  offset1);
  INITIAL_ELIMINATION_OFFSET (table[j].from, table[j].to,
  offset2);
  return offset1 - offset2;
}
}
}
  else if (table[i].to == from)
{
  if (table[i].from == to)
{
  INITIAL_ELIMINATION_OFFSET (table[i].from, table[i].to,
  offset1);
  return - offset1;
}
  for (j = 0; j < ARRAY_SIZE (table); j++)
{
  if (table[j].to == to
  && table[j].from == table[i].from)
{
  INITIAL_ELIMINATION_OFFSET (table[i].from, table[i].to,
  offset1);
  INITIAL_ELIMINATION_OFFSET (table[j].from, table[j].to,
  offset2);
  return - offset1 + offset2;
}
  if (table[j].from == to
  && table[j].to == table[i].from)
{
  INITIAL_ELIMINATION_OFFSET (table[i].from, table[i].to,
  offset1);
  INITIAL_ELIMINATION_OFFSET (table[j].from, table[j].to,
  offset2);
  return - offset1 - offset2;
}
}
}

  /* If the requested register combination was not found,
 try a different more simple combination.  */
  if (from == ARG_POINTER_REGNUM)
return get_initial_register_offset (HARD_FRAME_POINTER_REGNUM, to);
  else if (to == ARG_POINTER_REGNUM)
return get_initial_register_offset (from, HARD_FRAME_POINTER_REGNUM);
  else if (from == HARD_FRAME_POINTER_REGNUM)
return get_initial_register_offset (FRAME_POINTER_REGNUM, to);
  else if (to == HARD_FRAME_POINTER_REGNUM)
return get_initial_register_offset (from, FRAME_POINTER_REGNUM);
  else
return 0;

#else
  HOST_WIDE_INT offset;

  if (to == from)
return 0;

  if (reload_completed)
{
  

Re: PING^1: [PATCH] Add TYPE_EMPTY_RECORD for C++ empty class

2016-01-26 Thread H.J. Lu
On Tue, Jan 26, 2016 at 12:44 PM, Marc Glisse  wrote:
> On Tue, 26 Jan 2016, H.J. Lu wrote:
>
>> On Tue, Jan 26, 2016 at 12:23 PM, Marc Glisse 
>> wrote:
>>>
>>> On Tue, 26 Jan 2016, H.J. Lu wrote:
>>>
>>>> On Tue, Jan 26, 2016 at 11:27 AM, Jason Merrill
>>>> wrote:
>>>>>
>>>>> On 12/14/2015 05:08 PM, H.J. Lu wrote:
>>>>>>
>>>>>> +  if (abi_version_at_least (10))
>>>>>> +    TYPE_EMPTY_RECORD (t) = is_really_empty_class (t);
>>>>>
>>>>> This should use is_empty_class or CLASSTYPE_EMPTY_P.  We don't want to
>>>>> change how classes with just a vptr are passed.
>>>>>
>>>>> Otherwise, it looks OK to me.
>>>>
>>>> Is true_type an empty class here?  is_empty_class returns false
>>>> on this:
>>>
>>>
>>>
>>> It isn't empty in the usual C++ sense (we can't apply the empty base
>>> optimization to something that derives from it, for instance), or the one
>>> described in the itanium ABI (the relevant one here I guess). On the
>>> other
>>> hand, it is rather useless to pass it by value, so a different notion of
>>
>>
>> llvm/clang treats it as empty class and I think it should be treated
>> as "empty" class.
>
>
> Is it still empty if there are several empty members? Is there a clear
> definition somewhere of what empty means? I guess it makes sense to
> recursively allow "empty" members for this purpose.

Like this:

/* Returns true if TYPE is POD for the purpose of layout and an empty
   class or a class with empty classes.  */

static bool
is_empty_record (tree type)
{
  if (type == error_mark_node)
return false;

  if (!CLASS_TYPE_P (type))
return false;

  if (CLASSTYPE_NON_LAYOUT_POD_P (type))
return false;

  gcc_assert (COMPLETE_TYPE_P (type));

  if (CLASSTYPE_EMPTY_P (type))
return true;

  tree field;

  for (field = TYPE_FIELDS (type); field; field = DECL_CHAIN (field))
if (TREE_CODE (field) == FIELD_DECL
&& !DECL_ARTIFICIAL (field)
&& !is_empty_record (TREE_TYPE (field)))
  return false;

  return true;
}
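
The recursion can be tried outside the compiler with a toy model.  This is
only a sketch: toy_type and its fields are hypothetical stand-ins for the
tree accessors used above (is_class for CLASS_TYPE_P, layout_pod for
!CLASSTYPE_NON_LAYOUT_POD_P, fields for the non-artificial FIELD_DECLs),
and a class with no fields plays the role of CLASSTYPE_EMPTY_P.

```cpp
#include <vector>

struct toy_type
{
  bool is_class;                          // models CLASS_TYPE_P
  bool layout_pod;                        // models !CLASSTYPE_NON_LAYOUT_POD_P
  std::vector<const toy_type *> fields;   // types of non-artificial data members
};

// True if T is a layout-POD class whose data members, if any, are all
// themselves empty records.
bool
toy_is_empty_record (const toy_type &t)
{
  if (!t.is_class)
    return false;
  if (!t.layout_pod)
    return false;
  for (const toy_type *field : t.fields)
    if (!toy_is_empty_record (*field))
      return false;
  return true;
}
```

Under this model a class with several empty members still counts as
empty, while a single scalar member anywhere in the nesting makes the
whole record non-empty.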

-- 
H.J.


Re: [PATCH] Fix up wi::lrshift (PR c++/69399)

2016-01-26 Thread Jakub Jelinek
On Tue, Jan 26, 2016 at 09:45:19PM +0100, Richard Biener wrote:
> On January 26, 2016 8:00:52 PM GMT+01:00, Mike Stump  
> wrote:
> >On Jan 26, 2016, at 10:21 AM, Jakub Jelinek  wrote
> >> The question is, shall we do what wi::lshift does and have the fast
> >path
> >> only for the constant shifts, as the untested patch below does, or
> >shall we
> >> check shift dynamically (thus use
> >> shift < HOST_BITS_PER_WIDE_INT
> >> instead of
> >> STATIC_CONSTANT_P (shift < HOST_BITS_PER_WIDE_INT)
> >> in the patch),
> >
> >Hum…  I think I prefer the dynamic check.  The reasoning is that when
> >we fast path, we can tolerate the conditional branch to retain the fast
> >path, as most of the time, that condition will usually be true.  If the
> >compiler had troubles with knowing the usual truth value of the
> >expression, seems like we can hint that it will be true and influence
> >the static prediction of the branch.  This permits us to fast path
> >almost all the time in the non-constant, but small enough case.  For
> >known shifts, there is no code gen difference, so it doesn’t matter.
> 
> The original reasoning was to inline only the fast path if it is known at 
> compile time and otherwise have a call. Exactly to avoid bloating callers 
> with inlined conditionals.
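
To make the two options concrete, here is a rough sketch of a logical right shift on a two-limb value with either a dynamic or a static guard on the fast path. Everything in it is hypothetical: U128, LIMB_BITS, lrshift_slow, lrshift_dynamic and lrshift_static are illustration names, lrshift_slow merely stands in for an out-of-line routine like wi::lrshift_large, and SKETCH_STATIC_CONSTANT_P is my assumption of how a STATIC_CONSTANT_P-style macro can be built on __builtin_constant_p. It is not the wide-int.h code.

```cpp
#include <cstdint>

// Hypothetical two-limb unsigned value; not GCC's wide_int representation.
struct U128 { std::uint64_t lo, hi; };

constexpr unsigned LIMB_BITS = 64;

// Assumed shape of a STATIC_CONSTANT_P-like macro: true only when the
// compiler can prove the condition at compile time.
#if defined(__GNUC__)
#define SKETCH_STATIC_CONSTANT_P(X) (__builtin_constant_p(X) && (X))
#else
#define SKETCH_STATIC_CONSTANT_P(X) (false && (X))
#endif

// Out-of-line general case, standing in for a routine like lrshift_large.
U128 lrshift_slow(U128 x, unsigned shift) {
  if (shift >= 2 * LIMB_BITS) return U128{0, 0};
  if (shift >= LIMB_BITS) return U128{x.hi >> (shift - LIMB_BITS), 0};
  if (shift == 0) return x;
  return U128{(x.lo >> shift) | (x.hi << (LIMB_BITS - shift)), x.hi >> shift};
}

// Dynamic check: every caller gets an inlined compare and branch, and
// small non-constant shifts of single-limb values stay on the fast path.
inline U128 lrshift_dynamic(U128 x, unsigned shift) {
  if (x.hi == 0 && shift < LIMB_BITS)
    return U128{x.lo >> shift, 0};
  return lrshift_slow(x, shift);
}

// Static check: the fast path is only taken when the shift is known to be
// small at compile time; otherwise the caller contains just the call.
inline U128 lrshift_static(U128 x, unsigned shift) {
  if (SKETCH_STATIC_CONSTANT_P(shift < LIMB_BITS) && x.hi == 0)
    return U128{x.lo >> shift, 0};
  return lrshift_slow(x, shift);
}
```

Both variants compute the same result; they differ only in how much code ends up inlined into each caller, which is the trade-off under discussion.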

I'm now also bootstrapping/regtesting the following patch (the previous one
passed bootstrap/regtest), and will do a cc1plus size comparison afterwards.
That said, I have done a quick check: I believe that unless xi and
shift are both compile time constants, for the
STATIC_CONSTANT_P (xi.precision > HOST_BITS_PER_WIDE_INT)
case there should be some comparison and the lrshift_large call with
the non-STATIC_CONSTANT_P variant, but in the bootstrapped (non-LTO)
cc1plus I only see 14 calls to lrshift_large, thus I bet it will likely affect
only <= 14 places right now:

00776990 <_ZL11build_new_1PP3vecIP9tree_node5va_gc8vl_embedES1_S1_S6_bi>:
  777ca4:   e8 17 28 91 00  callq  108a4c0 <_ZN2wi13lrshift_largeEPlPKl>
--
008b8bc0 <_ZL31adjust_offset_for_component_refP9tree_nodePbPl.part.91>:
  8b8cc2:   e8 f9 17 7d 00  callq  108a4c0 <_ZN2wi13lrshift_largeEPlPKl>
--
00a7b550 <_ZN2wi7rrotateIPK9tree_node16generic_wide_intI16wide_int_storageEEENS_12unary_traitsIT_E11result_typeERKS8_RKT0_j>:
  a7baea:   e8 d1 e9 60 00  callq  108a4c0 <_ZN2wi13lrshift_largeEPlPKl>
00a7bd90 <_ZN2wi7lrotateIPK9tree_node16generic_wide_intI16wide_int_storageEEENS_12unary_traitsIT_E11result_typeERKS8_RKT0_j>:
  a7c457:   e8 64 e0 60 00  callq  108a4c0 <_ZN2wi13lrshift_largeEPlPKl>
--
00a7c710 <_ZN2wi7lrshiftIP9tree_nodelEENS_12unary_traitsIT_E11result_typeERKS4_RKT0_>:
  a7c851:   e8 6a dc 60 00  callq  108a4c0 <_ZN2wi13lrshift_largeEPlPKl>
--
00a7d280 <_ZN2wi7lrshiftIPK9tree_node16generic_wide_intI16wide_int_storageEEENS_12unary_traitsIT_E11result_typeERKS8_RKT0_>:
  a7d3b9:   e8 02 d1 60 00  callq  108a4c0 <_ZN2wi13lrshift_largeEPlPKl>
--
00cc1370 <_Z15real_to_integerPK10real_valuePbi>:
  cc1752:   e8 69 8d 3c 00  callq  108a4c0 <_ZN2wi13lrshift_largeEPlPKl>
--
00cc27f0 <_Z17real_from_integerP10real_value13format_helperRK16generic_wide_intI20wide_int_ref_storageILb0EEE6signop>:
  cc2f05:   e8 b6 75 3c 00  callq  108a4c0 <_ZN2wi13lrshift_largeEPlPKl>
--
00d6c420 <_Z31simplify_const_binary_operation8rtx_code12machine_modeP7rtx_defS2_>:
  d6f5a5:   e8 16 af 31 00  callq  108a4c0 <_ZN2wi13lrshift_largeEPlPKl>
--
00d7ca40 <_ZN2wi7lrotateISt4pairIP7rtx_def12machine_modeES5_EENS_12unary_traitsIT_E11result_typeERKS7_RKT0_j>:
  d7d17a:   e8 41 d3 30 00  callq  108a4c0 <_ZN2wi13lrshift_largeEPlPKl>
00d7d310 <_ZN2wi7rrotateISt4pairIP7rtx_def12machine_modeES5_EENS_12unary_traitsIT_E11result_typeERKS7_RKT0_j>:
  d7d99d:   e8 1e cb 30 00  callq  108a4c0 <_ZN2wi13lrshift_largeEPlPKl>
--
00ea3c40 <_ZN2wi7lrshiftI16generic_wide_intI22fixed_wide_int_storageILi192EEES4_EENS_12unary_traitsIT_E11result_typeERKS6_RKT0_>:
  ea3ca7:   e8 14 68 1e 00  callq  108a4c0 <_ZN2wi13lrshift_largeEPlPKl>
--
00f435a0 <_ZL27copy_reference_ops_from_refP9tree_nodeP3vecI22vn_reference_op_struct7va_heap6vl_ptrE>:
  f444b2:   e8 09 60 14 00  callq  108a4c0 <_ZN2wi13lrshift_largeEPlPKl>
--
0161c840 <_ZL21restructure_referencePP9tree_nodeS1_P16generic_wide_intI22fixed_wide_int_storageILi192EEES1_.constprop.133>:
 161cb51:   e8 6a d9 a6 ff  callq  108a4c0 <_ZN2wi13lrshift_largeEPlPKl>

2016-01-26  Jakub Jelinek  

PR tree-optimization/69399
* wide-int.h (wi::lrshift): For larger precisions, only
use fast path if shift is known to be < HOST_BITS_PER_WIDE_INT.

* gcc.dg/torture/pr69399.c: New test.

--- gcc/wide-int.h.jj   2016-01-04 14:5

Re: PING^1: [PATCH] Add TYPE_EMPTY_RECORD for C++ empty class

2016-01-26 Thread Jakub Jelinek
On Tue, Jan 26, 2016 at 01:21:52PM -0800, H.J. Lu wrote:
> Like this:
> 
> /* Returns true if TYPE is POD for the purpose of layout and an empty
>    class or a class with empty classes.  */
>
> static bool
> is_empty_record (tree type)
> {
>   if (type == error_mark_node)
>     return false;
>
>   if (!CLASS_TYPE_P (type))
>     return false;
>
>   if (CLASSTYPE_NON_LAYOUT_POD_P (type))
>     return false;
>
>   gcc_assert (COMPLETE_TYPE_P (type));
>
>   if (CLASSTYPE_EMPTY_P (type))
>     return true;
>
>   tree field;
>
>   for (field = TYPE_FIELDS (type); field; field = DECL_CHAIN (field))
>     if (TREE_CODE (field) == FIELD_DECL
>         && !DECL_ARTIFICIAL (field)
>         && !is_empty_record (TREE_TYPE (field)))
>       return false;
>
>   return true;
> }

So you say that K1 in e.g.:

struct A1 {}; struct A2 {};
struct B1 { A1 a; A2 b; }; struct B2 { A1 a; A2 b; };
struct C1 { B1 a; B2 b; }; struct C2 { B1 a; B2 b; };
struct D1 { C1 a; C2 b; }; struct D2 { C1 a; C2 b; };
struct E1 { D1 a; D2 b; }; struct E2 { D1 a; D2 b; };
struct F1 { E1 a; E2 b; }; struct F2 { E1 a; E2 b; };
struct G1 { F1 a; F2 b; }; struct G2 { F1 a; F2 b; };
struct H1 { G1 a; G2 b; }; struct H2 { G1 a; G2 b; };
struct I1 { H1 a; H2 b; }; struct I2 { H1 a; H2 b; };
struct J1 { I1 a; I2 b; }; struct J2 { I1 a; I2 b; };
struct K1 { J1 a; J2 b; };
int v;
__attribute__((noinline, noclone))
K1 foo (int a, K1 x, int b)
{
  v = a + b;
  return x;
}
K1 k, m;
void
bar (void)
{
  m = foo (1, k, 2);
}

is an empty class?  What does clang do with this?
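
To put a number on the testcase above, here is a small sketch that models the nesting with a template instead of the hand-written A1..K1 chain. Nest and leaf_count are hypothetical names; unlike the testcase, both members at each level have the same type, which keeps the sketch short but is a simplification. It only asserts a lower bound on the size, since the exact layout is up to the ABI.

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical stand-in for the A1..K1 chain: Nest<N> holds two Nest<N-1>.
template <int N>
struct Nest { Nest<N - 1> a; Nest<N - 1> b; };

template <>
struct Nest<0> {};  // the only class here that is empty in the C++ sense

// Number of empty leaf objects contained in a Nest<N>, i.e. 2^N.
constexpr std::size_t leaf_count(int n) {
  return n == 0 ? 1 : 2 * leaf_count(n - 1);
}

static_assert(std::is_empty<Nest<0>>::value, "the leaf is empty");
static_assert(!std::is_empty<Nest<1>>::value, "one level up is not");

// Each member object occupies at least one byte of its own, so every
// level at least doubles the size: ten levels give at least 1024 bytes.
static_assert(sizeof(Nest<10>) >= leaf_count(10),
              "at least one byte per leaf");
```

So a type ten levels deep, like K1, carries no data at all yet is at least a kilobyte in size, which is why it matters whether it counts as "empty" for argument passing.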

Jakub

