Re: [PATCH 7/7] Libsanitizer merge from upstream r249633.

2015-10-19 Thread Jakub Jelinek
On Fri, Oct 16, 2015 at 02:29:08PM +0300, Maxim Ostapenko wrote:
> On 14/10/15 15:12, Jakub Jelinek wrote:
> >On Wed, Oct 14, 2015 at 03:02:22PM +0300, Maxim Ostapenko wrote:
> >>On 14/10/15 14:06, Jakub Jelinek wrote:
> >>>On Wed, Oct 14, 2015 at 01:51:44PM +0300, Maxim Ostapenko wrote:
> Ok, got it. The first solution would require changes in libsanitizer because
> the heuristic doesn't work for GCC, so perhaps a new UBSan entry point should
> go upstream, right? Or may this be implemented as a local patch for GCC?
> >>>No.  The heuristic relies on:
> >>>1) either it is old style float cast overflow without location
> >>>2) or it is new style float cast with location, but the location must:
> >>>a) not have NULL filename
> >>>b) the filename must not be ""
> >>>c) the filename must not be "\1"
> >>>So, my proposal was to emit in GCC the old style float cast overflow if
> >>>a), b) or c) is true, otherwise the new style.  I have no idea what you
> >>>mean by "heuristic doesn't work for GCC" after that.
> >>I mean that there are some cases where (FilenameOrTypeDescriptor[0] +
> >>FilenameOrTypeDescriptor[1] < 2) is not sufficient to determine if we should
> >>use old style. I actually caught this on the float-cast-overflow-10.c testcase.
> >Ah, ok, in that case the heuristic is flawed.  If they want to keep it,
> >they should check if MaybeFromTypeKind is either < 2 or equal to 0x1fe.
> >Can you report it upstream?  If that is changed, we'd need to change the
> >above and also add
> >   d) the filename must not start with "\xff\xff"
> >to the rules.
> >
> >I think it would be better to just add a whole new entrypoint, but if they
> >think the heuristic is good enough, they should at least fix it up.
> >
> > Jakub
> >
> 
> Done. I've realized that we could just set loc to input_location if loc ==
> UNKNOWN_LOCATION. In this case, we would always have the new style. This would

While using input_location in this case (as it is invoked from the FEs)
might help sometimes, it still doesn't guarantee input_location will not
be UNKNOWN_LOCATION afterwards, or builtin location, or b), c) or d) above.

Plus there is no fix on the library side to the heuristic, which we need
anyway.
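
For reference, the library-side fix suggested above would be along these
lines (a sketch only, using the names from the upstream handler;
handle_as_old_style is a stand-in, not real libsanitizer code):

  /* Old-style float_cast_overflow data starts with a TypeDescriptor whose
     first two bytes encode the type kind; new-style data starts with a
     SourceLocation whose first field is a filename pointer.  */
  u16 MaybeFromTypeKind
    = FilenameOrTypeDescriptor[0] + FilenameOrTypeDescriptor[1];
  if (MaybeFromTypeKind < 2		/* type kinds 0 and 1 */
      || MaybeFromTypeKind == 0x1fe)	/* filename starting with "\xff\xff" */
    handle_as_old_style ();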

Jakub


Re: [PATCH 2/7] Libsanitizer merge from upstream r249633.

2015-10-19 Thread Jakub Jelinek
On Thu, Oct 15, 2015 at 01:34:06PM +0300, Maxim Ostapenko wrote:
> Ah, right, fixing this now. Does this look better now?

Yes, it is ok now.

> 2015-10-12  Maxim Ostapenko  
> 
> config/
> 
>   * bootstrap-asan.mk: Replace ASAN_OPTIONS=detect_leaks with
>   LSAN_OPTIONS=detect_leaks.
> 
> gcc/
> 
>   * asan.c (asan_emit_stack_protection): Don't pass local stack to
>   asan_stack_malloc_[n] anymore. Check if asan_stack_malloc_[n] returned
>   NULL and use local stack then.
>   (asan_finish_file): Insert __asan_version_mismatch_check_v[n] call
>   in addition to __asan_init.
>   * sanitizer.def (BUILT_IN_ASAN_INIT): Rename to __asan_init.
>   (BUILT_IN_ASAN_VERSION_MISMATCH_CHECK): Add new builtin call.
> 
> gcc/testsuite/
> 
>   * g++.dg/asan/default-options-1.C: Adjust testcase.

Jakub


[RFC] Add OPTGROUP_PAR

2015-10-19 Thread Tom de Vries

Hi,

this patch adds OPTGROUP_PAR.

It allows a user to see on stderr what loops are parallelized by 
pass_parallelize_loops, using -fopt-info-par:

...
$ gcc -O2 -fopt-info-par test.c -ftree-parallelize-loops=32
test.c:5:3: note: parallelized inner loop
...
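
For reference, a minimal test.c that triggers the message above could look
like this (an illustrative example, not part of the patch):
...
#define N 1000

int a[N], b[N];

void
f (void)
{
  for (int i = 0; i < N; i++)
    a[i] = b[i] + 1;
}
...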

This patch doesn't include any MSG_MISSED_OPTIMIZATION/MSG_NOTE messages 
yet.


Idea of the patch OK?

Any other comments?

Thanks,
- Tom
Add OPTGROUP_PAR

2015-10-19  Tom de Vries  

	* doc/invoke.texi (@item -fopt-info): Add @item par in group of
	optimizations table.
	* dumpfile.c (optgroup_options): Add OPTGROUP_PAR entry.
	* dumpfile.h (OPTGROUP_PAR): New define.
	(OPTGROUP_OTHER): Renumber.
	(OPTGROUP_ALL): Add OPTGROUP_PAR.
	* tree-parloops.c (parallelize_loops): Handle -fopt-info-par.
	(pass_data_parallelize_loops): Change optinfo_flags from OPTGROUP_LOOP
	to OPTGROUP_PAR.
---
 gcc/doc/invoke.texi |  2 ++
 gcc/dumpfile.c  |  1 +
 gcc/dumpfile.h  |  5 +++--
 gcc/tree-parloops.c | 16 ++++++++++------
 4 files changed, 16 insertions(+), 8 deletions(-)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 54e9f12..629ee37 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -7319,6 +7319,8 @@ Enable dumps from all loop optimizations.
 Enable dumps from all inlining optimizations.
 @item vec
 Enable dumps from all vectorization optimizations.
+@item par
+Enable dumps from all auto-parallelization optimizations.
 @item optall
 Enable dumps from all optimizations. This is a superset of
 the optimization groups listed above.
diff --git a/gcc/dumpfile.c b/gcc/dumpfile.c
index e4c4748..421d19b 100644
--- a/gcc/dumpfile.c
+++ b/gcc/dumpfile.c
@@ -138,6 +138,7 @@ static const struct dump_option_value_info optgroup_options[] =
   {"loop", OPTGROUP_LOOP},
   {"inline", OPTGROUP_INLINE},
   {"vec", OPTGROUP_VEC},
+  {"par", OPTGROUP_PAR},
   {"optall", OPTGROUP_ALL},
   {NULL, 0}
 };
diff --git a/gcc/dumpfile.h b/gcc/dumpfile.h
index 5f30077..52371f4 100644
--- a/gcc/dumpfile.h
+++ b/gcc/dumpfile.h
@@ -97,9 +97,10 @@ enum tree_dump_index
 #define OPTGROUP_LOOP	 (1 << 2)   /* Loop optimization passes */
 #define OPTGROUP_INLINE  (1 << 3)   /* Inlining passes */
 #define OPTGROUP_VEC (1 << 4)   /* Vectorization passes */
-#define OPTGROUP_OTHER   (1 << 5)   /* All other passes */
+#define OPTGROUP_PAR	 (1 << 5)   /* Auto-parallelization passes */
+#define OPTGROUP_OTHER   (1 << 6)   /* All other passes */
 #define OPTGROUP_ALL	 (OPTGROUP_IPA | OPTGROUP_LOOP | OPTGROUP_INLINE \
-  | OPTGROUP_VEC | OPTGROUP_OTHER)
+			  | OPTGROUP_VEC | OPTGROUP_PAR | OPTGROUP_OTHER)
 
 /* Define a tree dump switch.  */
 struct dump_file_info
diff --git a/gcc/tree-parloops.c b/gcc/tree-parloops.c
index c7aa62c..e98c2c7 100644
--- a/gcc/tree-parloops.c
+++ b/gcc/tree-parloops.c
@@ -2718,17 +2718,21 @@ parallelize_loops (void)
 
   changed = true;
   skip_loop = loop->inner;
+  const char *loop_describe = (loop->inner
+			       ? "outer"
+			       : "inner");
+  loop_loc = find_loop_location (loop);
   if (dump_file && (dump_flags & TDF_DETAILS))
   {
-	if (loop->inner)
-	  fprintf (dump_file, "parallelizing outer loop %d\n",loop->header->index);
-	else
-	  fprintf (dump_file, "parallelizing inner loop %d\n",loop->header->index);
-	loop_loc = find_loop_location (loop);
+	fprintf (dump_file, "parallelizing %s loop %d\n", loop_describe,
+		 loop->header->index);
 	if (loop_loc != UNKNOWN_LOCATION)
 	  fprintf (dump_file, "\nloop at %s:%d: ",
 		   LOCATION_FILE (loop_loc), LOCATION_LINE (loop_loc));
   }
+  if (dump_enabled_p ())
+	dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, loop_loc,
+			 "parallelized %s loop\n", loop_describe);
   gen_parallel_loop (loop, &reduction_list,
 			 n_threads, &niter_desc);
 }
@@ -2752,7 +2756,7 @@ const pass_data pass_data_parallelize_loops =
 {
   GIMPLE_PASS, /* type */
   "parloops", /* name */
-  OPTGROUP_LOOP, /* optinfo_flags */
+  OPTGROUP_PAR, /* optinfo_flags */
   TV_TREE_PARALLELIZE_LOOPS, /* tv_id */
   ( PROP_cfg | PROP_ssa ), /* properties_required */
   0, /* properties_provided */
-- 
1.9.1



Re: Add VIEW_CONVERT_EXPR to operand_equal_p

2015-10-19 Thread Eric Botcazou
> Why is Ada fiddling with the modes? Is it only for packed structures?

Yes, in Ada packing or representation clauses are allowed to modify the type
of components, so you can have e.g. a record type with size S1 and BLKmode and
fields of this type with a packed version of this record type (with size S2 [...])

> I was wondering how to produce VCE conversions of aggregates with the C
> frontend at all (that is, getting them synthesized by the middle-end) to get
> non-Ada testcases.  Storing through a union is never folded to one and I
> don't see any other obvious way of getting them.  Perhaps it may be possible
> to get them via the inliner on an incompatible parameter and LTO, but that
> seems to be the only case I can think of right now.

That makes sense, all the machinery implementing type fiddling for the Ada 
compiler is in gigi, not in stor-layout.c for example.

> I am testing the change to compare modes and revert the two expr.c changes.
> Let's see what Richard's opinion is.  The whole concept of modes on aggregate
> types is a bit funny post-tree-ssa days when we do SRA.  I suppose they may be
> tied to calling conventions but should no longer be needed for code quality?

Ideally it should not be tied to calling conventions either, but it is known 
that some back-ends still use it for this purpose.

-- 
Eric Botcazou


Re: Add VIEW_CONVERT_EXPR to operand_equal_p

2015-10-19 Thread Eric Botcazou
> Adding back the mode check is fine if all types with the same TYPE_CANONICAL
> have the same mode.  Otherwise we'd regress here.

It's true for the Ada compiler, the type fiddling machinery always resets it.

-- 
Eric Botcazou


Re: [patch 2/6] scalar-storage-order merge: C front-end

2015-10-19 Thread Eric Botcazou
> > +  if (BYTES_BIG_ENDIAN != WORDS_BIG_ENDIAN)
> > +error ("scalar_storage_order is not supported");
> 
> You might want to consider indicating why it's not supported.  Not that
> I expect folks to be using this on a pdp11 :-)

Done, I added "because endianness is not uniform".

> > -  /* For &x[y], return x+y */
> > -  if (TREE_CODE (arg) == ARRAY_REF)
> > -   {
> > - tree op0 = TREE_OPERAND (arg, 0);
> > - if (!c_mark_addressable (op0))
> > -   return error_mark_node;
> > -   }
> 
> Do we still get a proper diagnostic for &x[y] where x isn't something we
> can mark addressable?

Yes, c_mark_addressable is invoked on 'arg' later and the function looks into 
the prefix of an ARRAY_REF.
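
The classic case is a register variable, e.g. (illustrative):

  register int x[4];
  int *p = &x[1];   /* still diagnosed: address of register variable requested */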

> No real objections, assuming that &x[y] diagnostics is still handled
> correctly somewhere.

OK, thanks.

-- 
Eric Botcazou


[gomp4, committed] Don't parallelize loops in oacc routine

2015-10-19 Thread Tom de Vries

Hi,

this patch prevents parloops from trying to parallelize loops in an oacc 
routine.


Committed to gomp-4_0-branch.

Thanks,
- Tom
Don't parallelize loops in oacc routine

2015-10-19  Tom de Vries  

	* tree-parloops.c (parallelize_loops): Do not parallelize loops in
	offloaded functions.
---
 gcc/tree-parloops.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/gcc/tree-parloops.c b/gcc/tree-parloops.c
index b2c2e6e..cef1b52 100644
--- a/gcc/tree-parloops.c
+++ b/gcc/tree-parloops.c
@@ -3191,6 +3191,11 @@ parallelize_loops (bool oacc_kernels_p)
   /* Do not parallelize loops in the functions created by parallelization.  */
   if (parallelized_function_p (cfun->decl))
 return false;
+
+  /* Do not parallelize loops in offloaded functions.  */
+  if (get_oacc_fn_attrib (cfun->decl) != NULL)
+return false;
+
   if (cfun->has_nonlocal_label)
 return false;
 
-- 
1.9.1



Re: [RFC VTV] Fix VTV for targets that have section anchors.

2015-10-19 Thread Ramana Radhakrishnan
On Tue, Oct 13, 2015 at 1:53 PM, Ramana Radhakrishnan
 wrote:
>
>
>
> On 12/10/15 21:44, Jeff Law wrote:
>> On 10/09/2015 03:17 AM, Ramana Radhakrishnan wrote:
>>> This started as a Friday afternoon project ...
>>>
>>> It turned out enabling VTV for AArch64 and ARM was a matter of fixing
>>> PR67868 which essentially comes from building libvtv with section
>>> anchors turned on. The problem was that the flow of control from
>>> output_object_block through to switch_section did not have the same
>>> special casing for the vtable section that exists in
>>> assemble_variable.
>> That's some ugly code.  You might consider factoring that code into a 
>> function and just calling it from both places.  Your version doesn't seem to 
>> handle PECOFF, so I'd probably refactor from assemble_variable.
>>
>
> I was a bit lazy as I couldn't immediately think of a target that would want 
> PECOFF, section anchors and VTV. That combination seems to be quite rare, 
> anyway point taken on the refactor.
>
> Ok if no regressions ?

Ping.

Ramana

>
>>>
>>> However both these failures also occur on x86_64 - so I'm content to
>>> declare victory on AArch64 as far as basic enablement goes.
>> Cool.
>>
>>>
>>> 1. Are the generic changes to varasm.c ok ? 2. Can we take the
>>> AArch64 support in now, given this amount of testing ? Marcus /
>>> Caroline ? 3. Any suggestions / helpful debug hints for VTV debugging
>>> (other than turning VTV_DEBUG on and inspecting trace) ?
>> I think that with refactoring they'd be good to go.  No opinions on the 
>> AArch64 specific question -- call for the AArch64 maintainers.
>>
>> Good to see someone hacking on vtv.  It's in my queue to look at as well.
>
> Yeah figuring out more about vtv is also in my background queue.
>
> regards
> Ramana
>
> PR other/67868
>
> * varasm.c (assemble_variable): Move special vtv handling to..
> (handle_vtv_comdat_sections): .. here. New function.
> (output_object_block): Handle vtv sections.
>
> libvtv/ChangeLog
>
> * configure.tgt: Support aarch64 and arm.


Re: [PATCH] PR middle-end/68002: introduce -fkeep-static-functions

2015-10-19 Thread Richard Biener
On Sat, Oct 17, 2015 at 5:17 PM, VandeVondele  Joost
 wrote:
> In some cases (e.g. coverage testing) it is useful to emit code for static 
> functions even if they are never used, which currently is not possible at -O1 
> and above. The following patch introduces a flag for this, which basically 
> triggers the same code that keeps those functions alive at -O0. Thanks to 
> Marc Glisse for replying at gcc-help and for suggesting where to look.
>
> Bootstrapped and regtested on x86_64-unknown-linux-gnu
>
> OK for trunk ?

Ok.

Thanks,
Richard.

> Joost


Re: [PATCH, committed] PR other/65800. Fix crash in gengtype's internal debug dump

2015-10-19 Thread Richard Biener
On Mon, Oct 19, 2015 at 1:46 AM, Mikhail Maltsev  wrote:
> Hi!
>
> gengtype has an option '-d' which allows one to dump its internal state. I 
> planned
> to use it in order to create some kind of list of all data which GCC stores in
> garbage-collected memory.
>
> Unfortunately this option was broken. The attached patch fixes it. Because it
> only affects gengtype's internal debugging option (and is also rather small), 
> I
> think it's OK to commit it without approval (as obvious).
>
> Bootstrapped and regtested on x86_64-pc-linux-gnu.

Ok.

> --
> Regards,
> Mikhail Maltsev
>
> gcc/ChangeLog:
>
> 2015-10-18  Mikhail Maltsev  
>
> PR other/65800
> * gengtype.c (dump_type): Handle TYPE_UNDEFINED correctly.


Re: [RFC] Add OPTGROUP_PAR

2015-10-19 Thread Richard Biener
On Mon, Oct 19, 2015 at 9:27 AM, Tom de Vries  wrote:
> Hi,
>
> this patch adds OPTGROUP_PAR.
>
> It allows a user to see on stderr what loops are parallelized by
> pass_parallelize_loops, using -fopt-info-par:
> ...
> $ gcc -O2 -fopt-info-par test.c -ftree-parallelize-loops=32
> test.c:5:3: note: parallelized inner loop
> ...
>
> This patch doesn't include any MSG_MISSED_OPTIMIZATION/MSG_NOTE messages
> yet.
>
> Idea of the patch OK?
>
> Any other comments?

Ok.

> Thanks,
> - Tom


Re: [PATCH] Fix default_binds_local_p_2 for extern protected data

2015-10-19 Thread Szabolcs Nagy

On 14/10/15 10:55, Szabolcs Nagy wrote:

On 30/09/15 20:23, Andreas Krebbel wrote:

On 09/30/2015 06:21 PM, Szabolcs Nagy wrote:

On 30/09/15 14:47, Bernd Schmidt wrote:

On 09/17/2015 11:15 AM, Szabolcs Nagy wrote:

ping 2.

this patch is needed for working visibility ("protected")
attribute for extern data on targets using default_binds_local_p_2.
https://gcc.gnu.org/ml/gcc-patches/2015-07/msg01871.html


I hesitate to review this one since I don't think I understand the
issues on the various affected arches well enough. It looks like Jakub
had some input on the earlier changes, maybe he could take a look? Or
maybe rth knows best. Adding Ccs.

It would help to have examples of code generation demonstrating the
problem and how you would solve it. Input from the s390 maintainers
whether this is correct for their port would also be appreciated.


We are having the same problem on S/390. I think the GCC change is correct for 
S/390 as well.

-Andreas-



I think the approvals of the arm and aarch64 maintainers
are needed to apply this fix for PR target/66912.

(Only s390, arm and aarch64 use this predicate.)



I was told this needs global maintainer approval.
Adding Jakub and rth back to cc.





consider the TU

__attribute__((visibility("protected"))) int n;

int f () { return n; }

if n "binds_local" then gcc -O -fpic -S is like

  .text
  .align  2
  .global f
  .arch armv8-a+fp+simd
  .type   f, %function
f:
  adrp	x0, n
  ldr w0, [x0, #:lo12:n]
  ret
  .size   f, .-f
  .protected  n
  .comm   n,4,4

So 'n' is a direct reference, not accessed through
the GOT ('n' will be in the .bss of the dso).
This is the current behavior.

If I remove the protected visibility attribute,
then the access goes through the GOT:

  .text
  .align  2
  .global f
  .arch armv8-a+fp+simd
  .type   f, %function
f:
  adrp	x0, _GLOBAL_OFFSET_TABLE_
  ldr x0, [x0, #:gotpage_lo15:n]
  ldr w0, [x0]
  ret
  .size   f, .-f
  .comm   n,4,4

Protected visibility means the definition cannot
be overridden by another module, but it should
still allow extern references.

If the main module references such an object, then
(as an implementation detail) it may use a copy
relocation against it, which places 'n' in the
main module, and the dynamic linker should make
sure that references to 'n' point there.

This is only possible if references to 'n' go
through the GOT (i.e. it should not be "binds_local").
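
Concretely, with a main module like (illustrative):

  /* main.c, linked against the dso above */
  extern int n;
  int main () { return n; }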


[hsa] Fix ICE in build_outer_var_ref within GPUKERNEL

2015-10-19 Thread Martin Jambor
Hi,

the following patch fixes a segfault when building an outer_ref which
would be in GPUKERNEL context when lowering.  In that case, we need to
use the outer context of the GPUKERNEL container.  Committed to the
branch.

Thanks,

Martin


2015-10-19  Martin Jambor  

* omp-low.c (build_outer_var_ref): If outer ctx is GPUKERNEL, use its
outer ctx.

diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index 383f34a..5234a11 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -1186,7 +1186,16 @@ build_outer_var_ref (tree var, omp_context *ctx)
x = var;
 }
   else if (ctx->outer)
-    x = lookup_decl (var, ctx->outer);
+    {
+      omp_context *outer = ctx->outer;
+      if (gimple_code (outer->stmt) == GIMPLE_OMP_GPUKERNEL)
+	{
+	  outer = outer->outer;
+	  gcc_assert (outer
+		      && gimple_code (outer->stmt) != GIMPLE_OMP_GPUKERNEL);
+	}
+      x = lookup_decl (var, outer);
+    }
   else if (is_reference (var))
 /* This can happen with orphaned constructs.  If var is reference, it is
possible it is shared and as such valid.  */


Re: [gomp4.1] map clause parsing improvements

2015-10-19 Thread Thomas Schwinge
Hi!

On Thu, 11 Jun 2015 14:14:20 +0200, Jakub Jelinek  wrote:
> On Tue, Jun 09, 2015 at 09:36:08PM +0300, Ilya Verbin wrote:
> > On Wed, Apr 29, 2015 at 14:06:44 +0200, Jakub Jelinek wrote:
> > > [...] The draft requires only alloc or to
> > > (or always, variants) for enter data and only from or delete (or always,
> > > variants) for exit data, so in theory it is possible to figure that from
> > > the call without extra args, but not so for update - enter data is 
> > > supposed
> > > to increment reference counts, exit data decrement. [...]
> > 
> > TR3.pdf also says about 'release' map-type for exit data, but it is not
> > described in the document.
> 
> So, I've committed a patch to add parsing release map-kind, and fix up or add
> verification in C/C++ FE what map-kinds are used.
> 
> Furthermore, it seems the OpenMP 4.1 always modifier is something completely
> unrelated to the OpenACC force flag, in OpenMP 4.1 everything is reference
> count based, and always seems to make a difference only for from/to/tofrom,
> where it says that the copying is done unconditionally; thus the patch uses
> a different bit for that.

Aha, I see.  (The poor OpenACC/OpenMP users, having to remember so many
small yet intricate details...)

> include/
>   * gomp-constants.h (GOMP_MAP_FLAG_ALWAYS): Define.
>   (enum gomp_map_kind): Add GOMP_MAP_ALWAYS_TO, GOMP_MAP_ALWAYS_FROM,
>   GOMP_MAP_ALWAYS_TOFROM, GOMP_MAP_DELETE, GOMP_MAP_RELEASE.

> --- include/gomp-constants.h.jj   2015-05-21 11:12:09.0 +0200
> +++ include/gomp-constants.h  2015-06-11 11:24:32.041654947 +0200
> @@ -41,6 +41,8 @@
>  #define GOMP_MAP_FLAG_SPECIAL_1  (1 << 3)
>  #define GOMP_MAP_FLAG_SPECIAL(GOMP_MAP_FLAG_SPECIAL_1 \
>| GOMP_MAP_FLAG_SPECIAL_0)
> +/* OpenMP always flag.  */
> +#define GOMP_MAP_FLAG_ALWAYS (1 << 6)
>  /* Flag to force a specific behavior (or else, trigger a run-time error).  */
>  #define GOMP_MAP_FLAG_FORCE  (1 << 7)
>  
> @@ -77,7 +79,21 @@ enum gomp_map_kind
>  /* ..., and copy from device.  */
>  GOMP_MAP_FORCE_FROM =(GOMP_MAP_FLAG_FORCE | GOMP_MAP_FROM),
>  /* ..., and copy to and from device.  */
> -GOMP_MAP_FORCE_TOFROM =  (GOMP_MAP_FLAG_FORCE | GOMP_MAP_TOFROM)
> +GOMP_MAP_FORCE_TOFROM =  (GOMP_MAP_FLAG_FORCE | GOMP_MAP_TOFROM),
> +/* If not already present, allocate.  And unconditionally copy to
> +   device.  */
> +GOMP_MAP_ALWAYS_TO = (GOMP_MAP_FLAG_ALWAYS | GOMP_MAP_TO),
> +/* If not already present, allocate.  And unconditionally copy from
> +   device.  */
> +GOMP_MAP_ALWAYS_FROM =   (GOMP_MAP_FLAG_ALWAYS | GOMP_MAP_FROM),
> +/* If not already present, allocate.  And unconditionally copy to and 
> from
> +   device.  */
> +GOMP_MAP_ALWAYS_TOFROM = (GOMP_MAP_FLAG_ALWAYS | 
> GOMP_MAP_TOFROM),
> +/* OpenMP 4.1 alias for forced deallocation.  */
> +GOMP_MAP_DELETE =GOMP_MAP_FORCE_DEALLOC,

To avoid confusion about two different identifiers naming the same
functionality, I'd prefer to avoid such aliases ("GOMP_MAP_DELETE =
GOMP_MAP_FORCE_DEALLOC"), and instead just rename GOMP_MAP_FORCE_DEALLOC
to GOMP_MAP_DELETE, if that's the name you prefer.

By the way, looking at GCC 6 libgomp compatibility regarding
OpenACC/nvptx offloading for executables compiled with GCC 5, for the
legacy entry point libgomp/oacc-parallel.c:GOACC_parallel only supports
host-fallback execution, which doesn't pay attention to data clause at
all (sizes and kinds formal parameters), so you're free to renumber
GOMP_MAP_* if/where that makes sense.

> +/* Decrement usage count and deallocate if zero.  */
> +GOMP_MAP_RELEASE =   (GOMP_MAP_FLAG_ALWAYS
> +  | GOMP_MAP_FORCE_DEALLOC)
>};

I have not yet read the OpenMP 4.1/4.5 standard, but it's not obvious to
me here how the GOMP_MAP_FLAG_ALWAYS flag relates to the OpenMP release
clause (GOMP_MAP_RELEASE here)?  Shouldn't GOMP_MAP_RELEASE be
"(GOMP_MAP_FLAG_SPECIAL_1 | 3)" or similar?


Grüße
 Thomas




Re: [gomp4.1] map clause parsing improvements

2015-10-19 Thread Jakub Jelinek
On Mon, Oct 19, 2015 at 12:20:23PM +0200, Thomas Schwinge wrote:
> > @@ -77,7 +79,21 @@ enum gomp_map_kind
> >  /* ..., and copy from device.  */
> >  GOMP_MAP_FORCE_FROM =  (GOMP_MAP_FLAG_FORCE | GOMP_MAP_FROM),
> >  /* ..., and copy to and from device.  */
> > -GOMP_MAP_FORCE_TOFROM =(GOMP_MAP_FLAG_FORCE | 
> > GOMP_MAP_TOFROM)
> > +GOMP_MAP_FORCE_TOFROM =(GOMP_MAP_FLAG_FORCE | 
> > GOMP_MAP_TOFROM),
> > +/* If not already present, allocate.  And unconditionally copy to
> > +   device.  */
> > +GOMP_MAP_ALWAYS_TO =   (GOMP_MAP_FLAG_ALWAYS | GOMP_MAP_TO),
> > +/* If not already present, allocate.  And unconditionally copy from
> > +   device.  */
> > +GOMP_MAP_ALWAYS_FROM = (GOMP_MAP_FLAG_ALWAYS | GOMP_MAP_FROM),
> > +/* If not already present, allocate.  And unconditionally copy to and 
> > from
> > +   device.  */
> > +GOMP_MAP_ALWAYS_TOFROM =   (GOMP_MAP_FLAG_ALWAYS | 
> > GOMP_MAP_TOFROM),
> > +/* OpenMP 4.1 alias for forced deallocation.  */
> > +GOMP_MAP_DELETE =  GOMP_MAP_FORCE_DEALLOC,
> 
> To avoid confusion about two different identifiers naming the same
> functionality, I'd prefer to avoid such aliases ("GOMP_MAP_DELETE =
> GOMP_MAP_FORCE_DEALLOC"), and instead just rename GOMP_MAP_FORCE_DEALLOC
> to GOMP_MAP_DELETE, if that's the name you prefer.

If you are ok with removing GOMP_MAP_FORCE_DEALLOC and just use
GOMP_MAP_DELETE, that is ok by me, just post a patch.

> By the way, looking at GCC 6 libgomp compatibility regarding
> OpenACC/nvptx offloading for executables compiled with GCC 5, for the
> legacy entry point libgomp/oacc-parallel.c:GOACC_parallel only supports
> host-fallback execution, which doesn't pay attention to data clause at
> all (sizes and kinds formal parameters), so you're free to renumber
> GOMP_MAP_* if/where that makes sense.
> 
> > +/* Decrement usage count and deallocate if zero.  */
> > +GOMP_MAP_RELEASE = (GOMP_MAP_FLAG_ALWAYS
> > +| GOMP_MAP_FORCE_DEALLOC)
> >};
> 
> I have not yet read the OpenMP 4.1/4.5 standard, but it's not obvious to
> me here how the GOMP_MAP_FLAG_ALWAYS flag relates to the OpenMP release
> clause (GOMP_MAP_RELEASE here)?  Shouldn't GOMP_MAP_RELEASE be
> "(GOMP_MAP_FLAG_SPECIAL_1 | 3)" or similar?

It isn't related to always, but always really is something that affects
solely the data movement (i.e. to, from, tofrom), and while it can be
specified elsewhere, it makes no difference.  Wasting one bit just for that
is something we don't have the luxury for, which is why I've started using
that bit for other OpenMP stuff (it acts there like GOMP_MAP_FLAG_SPECIAL_2
to some extent).  It is not just release, but also the struct mapping etc.
I'll still need to make further changes, because the rules for mapping
structure element pointer/reference based array sections and structure
element references have changed again.
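
For reference, the resulting flag-bit layout per the quoted gomp-constants.h
hunk (GOMP_MAP_FLAG_SPECIAL_0 is assumed to be (1 << 2); it is not shown in
the hunk):

  (1 << 2)  GOMP_MAP_FLAG_SPECIAL_0	/* assumed */
  (1 << 3)  GOMP_MAP_FLAG_SPECIAL_1
  (1 << 6)  GOMP_MAP_FLAG_ALWAYS	/* doubles as a SPECIAL_2-like bit */
  (1 << 7)  GOMP_MAP_FLAG_FORCE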

Some changes in the enum can of course still be done until, say, mid stage3,
but at least for OpenMP 4.0 we should keep backwards compatibility (so
whatever we've already used in GCC 4.9/5 should keep working).

Jakub


Re: Add simple sign-stripping cases to match.pd

2015-10-19 Thread Richard Sandiford
Richard Sandiford  writes:
> Marc Glisse  writes:
>> On Thu, 15 Oct 2015, Richard Sandiford wrote:
>>
>>> This patch makes sure that, for every simplification that uses
>>> fold_strip_sign_ops, there are associated match.pd rules for the
>>> leaf sign ops, i.e. abs, negate and copysign.  A follow-on patch
>>> will add a pass to handle more complex cases.
>>>
>>> Tested on x86_64-linux-gnu, aarch64-linux-gnu and arm-linux-gnueabi.
>>> OK to install?
>>>
>>> Thanks,
>>> Richard
>>>
>>>
>>> gcc/
>>> * match.pd: Add rules to simplify ccos, ccosh, hypot, copysign
>>> and x*x in cases where the operands are sign ops.  Extend these
>>> rules to handle copysign as a sign op (including for cos, cosh
>>> and pow, which already treated negate and abs as sign ops).
>>>
>>> diff --git a/gcc/match.pd b/gcc/match.pd
>>> index 83c48cd..4331df6 100644
>>> --- a/gcc/match.pd
>>> +++ b/gcc/match.pd
>>> @@ -61,6 +61,12 @@ along with GCC; see the file COPYING3.  If not see
>>> (define_operator_list TAN BUILT_IN_TANF BUILT_IN_TAN BUILT_IN_TANL)
>>> (define_operator_list COSH BUILT_IN_COSHF BUILT_IN_COSH BUILT_IN_COSHL)
>>> (define_operator_list CEXPI BUILT_IN_CEXPIF BUILT_IN_CEXPI BUILT_IN_CEXPIL)
>>> +(define_operator_list CCOS BUILT_IN_CCOSF BUILT_IN_CCOS BUILT_IN_CCOSL)
>>> +(define_operator_list CCOSH BUILT_IN_CCOSHF BUILT_IN_CCOSH BUILT_IN_CCOSHL)
>>> +(define_operator_list HYPOT BUILT_IN_HYPOTF BUILT_IN_HYPOT BUILT_IN_HYPOTL)
>>> +(define_operator_list COPYSIGN BUILT_IN_COPYSIGNF
>>> +  BUILT_IN_COPYSIGN
>>> +  BUILT_IN_COPYSIGNL)
>>>
>>> /* Simplifications of operations with one constant operand and
>>>simplifications to constants or single values.  */
>>> @@ -321,7 +327,69 @@ along with GCC; see the file COPYING3.  If not see
>>>(pows (op @0) REAL_CST@1)
>>>(with { HOST_WIDE_INT n; }
>>> (if (real_isinteger (&TREE_REAL_CST (@1), &n) && (n & 1) == 0)
>>> - (pows @0 @1))
>>> + (pows @0 @1)
>>> + /* Strip negate and abs from both operands of hypot.  */
>>> + (for hypots (HYPOT)
>>> +  (simplify
>>> +   (hypots (op @0) @1)
>>> +   (hypots @0 @1))
>>> +  (simplify
>>> +   (hypots @0 (op @1))
>>> +   (hypots @0 @1)))
>>
>> Out of curiosity, would hypots:c have worked? (it is probably not worth 
>> gratuitously swapping the operands to save 3 lines though)
>
> Yeah, I think I'd prefer to keep it like it is if that's OK.
>
>>> + /* copysign(-x, y) and copysign(abs(x), y) -> copysign(x, y).  */
>>> + (for copysigns (COPYSIGN)
>>> +  (simplify
>>> +   (copysigns (op @0) @1)
>>> +   (copysigns @0 @1)))
>>> + /* -x*-x and abs(x)*abs(x) -> x*x.  Should be valid for all types.  */
>>> + (simplify
>>> +  (mult (op @0) (op @1))
>>> +  (mult @0 @0)))
>>
>> Typo @1 -> @0 ?
>
> Argh!  Thanks for catching that.  Wonder how many proof-reads that
> escaped :-(
>
>> This will partially duplicate Naveen's patch "Move some bit and binary 
>> optimizations in simplify and match".
>
> OK.  Should I just limit it to the abs case?
>
>>> +/* copysign(x,y)*copysign(x,y) -> x*x.  */
>>> +(for copysigns (COPYSIGN)
>>> + (simplify
>>> +  (mult (copysigns @0 @1) (copysigns @0 @1))
>>
>> (mult (copysigns@2 @0 @1) @2)
>> ? Or is there some reason not to rely on CSE? (I don't think copysign has 
>> any errno issue)
>
> No, simply didn't know about that trick.  I'll use it for the
> (mult (op @0) (op @0)) case as well.

Here's the updated patch.  I've kept the (mult (negate@1 @0) @1)
pattern for now, but can limit it to abs as necessary when
Naveen's patch goes in.
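
For reference, the capture-based form Marc suggested looks like this in
match.pd (a sketch of the final shape, not the committed text):

  /* copysign(x,y) * copysign(x,y) -> x*x, relying on the @2 capture
     instead of matching the call twice.  */
  (for copysigns (COPYSIGN)
   (simplify
    (mult (copysigns@2 @0 @1) @2)
    (mult @0 @0)))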

Tested on x86_64-linux-gnu, aarch64-linux-gnu and arm-linux-gnueabi.

Thanks,
Richard



Re: Add simple sign-stripping cases to match.pd

2015-10-19 Thread Richard Sandiford
Richard Sandiford  writes:
> Richard Sandiford  writes:
>> Marc Glisse  writes:
>>> On Thu, 15 Oct 2015, Richard Sandiford wrote:
>>>
 This patch makes sure that, for every simplification that uses
 fold_strip_sign_ops, there are associated match.pd rules for the
 leaf sign ops, i.e. abs, negate and copysign.  A follow-on patch
 will add a pass to handle more complex cases.

 Tested on x86_64-linux-gnu, aarch64-linux-gnu and arm-linux-gnueabi.
 OK to install?

 Thanks,
 Richard


 gcc/
* match.pd: Add rules to simplify ccos, ccosh, hypot, copysign
and x*x in cases where the operands are sign ops.  Extend these
rules to handle copysign as a sign op (including for cos, cosh
and pow, which already treated negate and abs as sign ops).

 diff --git a/gcc/match.pd b/gcc/match.pd
 index 83c48cd..4331df6 100644
 --- a/gcc/match.pd
 +++ b/gcc/match.pd
 @@ -61,6 +61,12 @@ along with GCC; see the file COPYING3.  If not see
 (define_operator_list TAN BUILT_IN_TANF BUILT_IN_TAN BUILT_IN_TANL)
 (define_operator_list COSH BUILT_IN_COSHF BUILT_IN_COSH BUILT_IN_COSHL)
 (define_operator_list CEXPI BUILT_IN_CEXPIF BUILT_IN_CEXPI BUILT_IN_CEXPIL)
 +(define_operator_list CCOS BUILT_IN_CCOSF BUILT_IN_CCOS BUILT_IN_CCOSL)
 +(define_operator_list CCOSH BUILT_IN_CCOSHF BUILT_IN_CCOSH 
 BUILT_IN_CCOSHL)
 +(define_operator_list HYPOT BUILT_IN_HYPOTF BUILT_IN_HYPOT 
 BUILT_IN_HYPOTL)
 +(define_operator_list COPYSIGN BUILT_IN_COPYSIGNF
 + BUILT_IN_COPYSIGN
 + BUILT_IN_COPYSIGNL)

 /* Simplifications of operations with one constant operand and
simplifications to constants or single values.  */
 @@ -321,7 +327,69 @@ along with GCC; see the file COPYING3.  If not see
(pows (op @0) REAL_CST@1)
(with { HOST_WIDE_INT n; }
 (if (real_isinteger (&TREE_REAL_CST (@1), &n) && (n & 1) == 0)
 - (pows @0 @1))
 + (pows @0 @1)
 + /* Strip negate and abs from both operands of hypot.  */
 + (for hypots (HYPOT)
 +  (simplify
 +   (hypots (op @0) @1)
 +   (hypots @0 @1))
 +  (simplify
 +   (hypots @0 (op @1))
 +   (hypots @0 @1)))
>>>
>>> Out of curiosity, would hypots:c have worked? (it is probably not worth 
>>> gratuitously swapping the operands to save 3 lines though)
>>
>> Yeah, I think I'd prefer to keep it like it is if that's OK.
>>
 + /* copysign(-x, y) and copysign(abs(x), y) -> copysign(x, y).  */
 + (for copysigns (COPYSIGN)
 +  (simplify
 +   (copysigns (op @0) @1)
 +   (copysigns @0 @1)))
 + /* -x*-x and abs(x)*abs(x) -> x*x.  Should be valid for all types.  */
 + (simplify
 +  (mult (op @0) (op @1))
 +  (mult @0 @0)))
>>>
>>> Typo @1 -> @0 ?
>>
>> Argh!  Thanks for catching that.  Wonder how many proof-reads that
>> escaped :-(
>>
>>> This will partially duplicate Naveen's patch "Move some bit and binary 
>>> optimizations in simplify and match".
>>
>> OK.  Should I just limit it to the abs case?
>>
 +/* copysign(x,y)*copysign(x,y) -> x*x.  */
 +(for copysigns (COPYSIGN)
 + (simplify
 +  (mult (copysigns @0 @1) (copysigns @0 @1))
>>>
>>> (mult (copysigns@2 @0 @1) @2)
>>> ? Or is there some reason not to rely on CSE? (I don't think copysign has 
>>> any errno issue)
>>
>> No, simply didn't know about that trick.  I'll use it for the
>> (mult (op @0) (op @0)) case as well.
>
> Here's the updated patch.  I've kept the (mult (negate@1 @0) @1)
> pattern for now, but can limit it to abs as necessary when
> Naveen's patch goes in.
>
> Tested on x86_64-linux-gnu, aarch64-linux-gnu and arm-linux-gnueabi.
>
> Thanks,
> Richard

Er...

gcc/
* match.pd: Add rules to simplify ccos, ccosh, hypot, copysign
and x*x in cases where the operands are sign ops.  Extend these
rules to handle copysign as a sign op (including for cos, cosh
and pow, which already treated negate and abs as sign ops).

diff --git a/gcc/match.pd b/gcc/match.pd
index f3813d8..d677e69 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -61,6 +61,12 @@ along with GCC; see the file COPYING3.  If not see
 (define_operator_list TAN BUILT_IN_TANF BUILT_IN_TAN BUILT_IN_TANL)
 (define_operator_list COSH BUILT_IN_COSHF BUILT_IN_COSH BUILT_IN_COSHL)
 (define_operator_list CEXPI BUILT_IN_CEXPIF BUILT_IN_CEXPI BUILT_IN_CEXPIL)
+(define_operator_list CCOS BUILT_IN_CCOSF BUILT_IN_CCOS BUILT_IN_CCOSL)
+(define_operator_list CCOSH BUILT_IN_CCOSHF BUILT_IN_CCOSH BUILT_IN_CCOSHL)
+(define_operator_list HYPOT BUILT_IN_HYPOTF BUILT_IN_HYPOT BUILT_IN_HYPOTL)
+(define_operator_list COPYSIGN BUILT_IN_COPYSIGNF
+  BUILT_IN_COPYSIGN
+  BUILT_IN_COPYSIGNL)
 
 /* Simplifications of operations with one constant operand and
simplifications to constants or single values.  */
@@ -322

Re: [PATCH] Use GET_MODE_BITSIZE to get vector natural alignment

2015-10-19 Thread Uros Bizjak
On Fri, Oct 16, 2015 at 7:42 PM, H.J. Lu  wrote:
> Since GET_MODE_ALIGNMENT is defined by psABI and the biggest alignment
> is 4 byte for IA MCU psABI, we should use GET_MODE_BITSIZE to get
> vector natural alignment to check misaligned vector move.
>
> OK for trunk?
>
> Thanks.
>
> H.J.
> ---
> * config/i386/i386.c (ix86_expand_vector_move): Use
> GET_MODE_BITSIZE to get vector natural alignment.
> ---
>  gcc/config/i386/i386.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> index ebe2b0a..d0e1f4c 100644
> --- a/gcc/config/i386/i386.c
> +++ b/gcc/config/i386/i386.c
> @@ -18650,7 +18650,9 @@ void
>  ix86_expand_vector_move (machine_mode mode, rtx operands[])
>  {
>rtx op0 = operands[0], op1 = operands[1];
> -  unsigned int align = GET_MODE_ALIGNMENT (mode);
> +  /* Use GET_MODE_BITSIZE instead of GET_MODE_ALIGNMENT since the
> + biggest alignment is 4 byte for IA MCU psABI.  */
> +  unsigned int align = GET_MODE_BITSIZE (mode);

How about using TARGET_IAMCU condition here and using bitsize only for
TARGET_IAMCU?

Uros.

>if (push_operand (op0, VOIDmode))
>  op0 = emit_move_resolve_push (mode, op0);
> --
> 2.4.3
>


[vec-cmp, patch 7/6] Vector comparison enabling in SLP

2015-10-19 Thread Ilya Enkovich
Hi,

It appeared our testsuite doesn't have a test which would require vector
comparison support in SLP even after boolean pattern disabling.  This patch
adds such a test and allows comparisons in SLP.  Is it OK?

Thanks,
Ilya
--
gcc/

2015-10-19  Ilya Enkovich  

* tree-vect-slp.c (vect_build_slp_tree_1): Allow
comparison statements.
(vect_get_constant_vectors): Support boolean vector
constants.

gcc/testsuite/

2015-10-19  Ilya Enkovich  

* gcc.dg/vect/slp-cond-5.c: New test.

diff --git a/gcc/testsuite/gcc.dg/vect/slp-cond-5.c b/gcc/testsuite/gcc.dg/vect/slp-cond-5.c
new file mode 100644
index 000..5ade7d1
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/slp-cond-5.c
@@ -0,0 +1,81 @@
+/* { dg-require-effective-target vect_condition } */
+
+#include "tree-vect.h"
+
+#define N 128
+
+static inline int
+foo (int x, int y, int a, int b)
+{
+  if (x >= y && a > b)
+    return a;
+  else
+    return b;
+}
+
+__attribute__((noinline, noclone)) void
+bar (int * __restrict__ a, int * __restrict__ b,
+ int * __restrict__ c, int * __restrict__ d,
+ int * __restrict__ e, int w)
+{
+  int i;
+  for (i = 0; i < N/16; i++, a += 16, b += 16, c += 16, d += 16, e += 16)
+    {
+      e[0] = foo (c[0], d[0], a[0] * w, b[0] * w);
+      e[1] = foo (c[1], d[1], a[1] * w, b[1] * w);
+      e[2] = foo (c[2], d[2], a[2] * w, b[2] * w);
+      e[3] = foo (c[3], d[3], a[3] * w, b[3] * w);
+      e[4] = foo (c[4], d[4], a[4] * w, b[4] * w);
+      e[5] = foo (c[5], d[5], a[5] * w, b[5] * w);
+      e[6] = foo (c[6], d[6], a[6] * w, b[6] * w);
+      e[7] = foo (c[7], d[7], a[7] * w, b[7] * w);
+      e[8] = foo (c[8], d[8], a[8] * w, b[8] * w);
+      e[9] = foo (c[9], d[9], a[9] * w, b[9] * w);
+      e[10] = foo (c[10], d[10], a[10] * w, b[10] * w);
+      e[11] = foo (c[11], d[11], a[11] * w, b[11] * w);
+      e[12] = foo (c[12], d[12], a[12] * w, b[12] * w);
+      e[13] = foo (c[13], d[13], a[13] * w, b[13] * w);
+      e[14] = foo (c[14], d[14], a[14] * w, b[14] * w);
+      e[15] = foo (c[15], d[15], a[15] * w, b[15] * w);
+    }
+}
+
+
+int a[N], b[N], c[N], d[N], e[N];
+
+int main ()
+{
+  int i;
+
+  check_vect ();
+
+  for (i = 0; i < N; i++)
+    {
+      a[i] = i;
+      b[i] = 5;
+      e[i] = 0;
+
+      switch (i % 9)
+	{
+	case 0: asm (""); c[i] = i; d[i] = i + 1; break;
+	case 1: c[i] = 0; d[i] = 0; break;
+	case 2: c[i] = i + 1; d[i] = i - 1; break;
+	case 3: c[i] = i; d[i] = i + 7; break;
+	case 4: c[i] = i; d[i] = i; break;
+	case 5: c[i] = i + 16; d[i] = i + 3; break;
+	case 6: c[i] = i - 5; d[i] = i; break;
+	case 7: c[i] = i; d[i] = i; break;
+	case 8: c[i] = i; d[i] = i - 7; break;
+	}
+    }
+
+  bar (a, b, c, d, e, 2);
+  for (i = 0; i < N; i++)
+    if (e[i] != ((i % 3) == 0 || i <= 5 ? 10 : 2 * i))
+      abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { target { i?86-*-* x86_64-*-* } } } } */
+
diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
index 1424123..fa8291e 100644
--- a/gcc/tree-vect-slp.c
+++ b/gcc/tree-vect-slp.c
@@ -827,6 +827,7 @@ vect_build_slp_tree_1 (vec_info *vinfo,
  if (TREE_CODE_CLASS (rhs_code) != tcc_binary
  && TREE_CODE_CLASS (rhs_code) != tcc_unary
  && TREE_CODE_CLASS (rhs_code) != tcc_expression
+ && TREE_CODE_CLASS (rhs_code) != tcc_comparison
  && rhs_code != CALL_EXPR)
{
  if (dump_enabled_p ())
@@ -2596,7 +2597,14 @@ vect_get_constant_vectors (tree op, slp_tree slp_node,
   struct loop *loop;
   gimple_seq ctor_seq = NULL;
 
-  vector_type = get_vectype_for_scalar_type (TREE_TYPE (op));
+  /* Check if vector type is a boolean vector.  */
+  if (TREE_CODE (TREE_TYPE (op)) == BOOLEAN_TYPE
+  && (VECTOR_BOOLEAN_TYPE_P (STMT_VINFO_VECTYPE (stmt_vinfo))
+ || (code == COND_EXPR && op_num < 2)))
+vector_type
+  = build_same_sized_truth_vector_type (STMT_VINFO_VECTYPE (stmt_vinfo));
+  else
+vector_type = get_vectype_for_scalar_type (TREE_TYPE (op));
   nunits = TYPE_VECTOR_SUBPARTS (vector_type);
 
   if (STMT_VINFO_DEF_TYPE (stmt_vinfo) == vect_reduction_def
@@ -2768,8 +2776,21 @@ vect_get_constant_vectors (tree op, slp_tree slp_node,
{
  if (CONSTANT_CLASS_P (op))
{
- op = fold_unary (VIEW_CONVERT_EXPR,
-  TREE_TYPE (vector_type), op);
+ if (VECTOR_BOOLEAN_TYPE_P (vector_type))
+   {
+ /* Can't use VIEW_CONVERT_EXPR for booleans because
+of possibly different sizes of scalar value and
+vector element.  */
+ if (integer_zerop (op))
+   op = build_int_cst (TREE_TYPE (vector_type), 0);
+ else if (integer_onep (op))
+   op 

Re: [PATCH] PR target/67995: __attribute__ ((target("arch=XXX"))) enables unsupported ISA

2015-10-19 Thread Uros Bizjak
On Sat, Oct 17, 2015 at 12:38 AM, H.J. Lu  wrote:
> When processing __attribute__ ((target("arch=XXX"))), we should clear
> the ISA bits in x_ix86_isa_flags first to avoid leaking ISA from
> command line.
>
> Tested on x86-64.  OK for trunk?

OK.

Thanks,
Uros.

> Thanks.
>
> H.J.
> ---
> gcc/
>
> PR target/67995
> * config/i386/i386.c (ix86_valid_target_attribute_tree): If
> arch= is set,  clear all bits in x_ix86_isa_flags, except for
> ISA_64BIT, ABI_64, ABI_X32, and CODE16.
>
> gcc/testsuite/
>
> PR target/67995
> * gcc.target/i386/pr67995-1.c: New test.
> * gcc.target/i386/pr67995-2.c: Likewise.
> * gcc.target/i386/pr67995-3.c: Likewise.
> ---
>  gcc/config/i386/i386.c                    | 13 ++++++++++++-
>  gcc/testsuite/gcc.target/i386/pr67995-1.c | 16 ++++++++++++++++
>  gcc/testsuite/gcc.target/i386/pr67995-2.c | 16 ++++++++++++++++
>  gcc/testsuite/gcc.target/i386/pr67995-3.c | 16 ++++++++++++++++
>  4 files changed, 60 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr67995-1.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr67995-2.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr67995-3.c
>
> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> index d0e1f4c..b0281c9 100644
> --- a/gcc/config/i386/i386.c
> +++ b/gcc/config/i386/i386.c
> @@ -6145,7 +6145,18 @@ ix86_valid_target_attribute_tree (tree args,
>/* If we are using the default tune= or arch=, undo the string 
> assigned,
>  and use the default.  */
>if (option_strings[IX86_FUNCTION_SPECIFIC_ARCH])
> -   opts->x_ix86_arch_string = option_strings[IX86_FUNCTION_SPECIFIC_ARCH];
> +   {
> + opts->x_ix86_arch_string
> +   = option_strings[IX86_FUNCTION_SPECIFIC_ARCH];
> +
> + /* If arch= is set,  clear all bits in x_ix86_isa_flags,
> +except for ISA_64BIT, ABI_64, ABI_X32, and CODE16.  */
> + opts->x_ix86_isa_flags &= (OPTION_MASK_ISA_64BIT
> +| OPTION_MASK_ABI_64
> +| OPTION_MASK_ABI_X32
> +| OPTION_MASK_CODE16);
> +
> +   }
>else if (!orig_arch_specified)
> opts->x_ix86_arch_string = NULL;
>
> diff --git a/gcc/testsuite/gcc.target/i386/pr67995-1.c b/gcc/testsuite/gcc.target/i386/pr67995-1.c
> new file mode 100644
> index 000..072b1fe
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr67995-1.c
> @@ -0,0 +1,16 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -march=haswell" } */
> +
> +unsigned int
> +__attribute__ ((target("arch=core2")))
> +__x86_rdrand(void)
> +{
> +  unsigned int retries = 100;
> +  unsigned int val;
> +
> +  while (__builtin_ia32_rdrand32_step(&val) == 0) /* { dg-error "needs isa option" } */
> +if (--retries == 0)
> +  return 0;
> +
> +  return val;
> +}
> diff --git a/gcc/testsuite/gcc.target/i386/pr67995-2.c b/gcc/testsuite/gcc.target/i386/pr67995-2.c
> new file mode 100644
> index 000..632bb63
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr67995-2.c
> @@ -0,0 +1,16 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -march=core2" } */
> +
> +unsigned int
> +__attribute__ ((target("arch=haswell")))
> +__x86_rdrand(void)
> +{
> +  unsigned int retries = 100;
> +  unsigned int val;
> +
> +  while (__builtin_ia32_rdrand32_step(&val) == 0)
> +if (--retries == 0)
> +  return 0;
> +
> +  return val;
> +}
> diff --git a/gcc/testsuite/gcc.target/i386/pr67995-3.c b/gcc/testsuite/gcc.target/i386/pr67995-3.c
> new file mode 100644
> index 000..11993b7
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr67995-3.c
> @@ -0,0 +1,16 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -march=core2" } */
> +
> +unsigned int
> +__attribute__ ((target("rdrnd")))
> +__x86_rdrand(void)
> +{
> +  unsigned int retries = 100;
> +  unsigned int val;
> +
> +  while (__builtin_ia32_rdrand32_step(&val) == 0)
> +if (--retries == 0)
> +  return 0;
> +
> +  return val;
> +}
> --
> 2.4.3
>


Re: [PATCH, rs6000] Pass --secure-plt to the linker

2015-10-19 Thread Alan Modra
On Thu, Oct 15, 2015 at 06:50:50PM +0100, Szabolcs Nagy wrote:
> A powerpc toolchain built with (or without) --enable-secureplt
> currently creates a binary that uses bss plt if
> 
> (1) any of the linked PIC objects have bss plt relocs
> (2) or all the linked objects are non-PIC or have no relocs,
> 
> because this is the binutils linker behaviour.
> 
> This patch passes --secure-plt to the linker which makes the linker
> warn in case (1) and produce a binary with secure plt in case (2).

The idea is OK I think, but

> @@ -574,6 +577,7 @@ ENDIAN_SELECT(" -mbig", " -mlittle", DEFAULT_ASM_ENDIAN)
>  %{R*} \
>  %(link_shlib) \
>  %{!T*: %(link_start) } \
> +%{!static: %(link_secure_plt_default)} \
>  %(link_os)"

this change needs to be conditional on !mbss-plt too.
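
I.e., something along the lines of (sketch):

  %{!static: %{!mbss-plt: %(link_secure_plt_default)}} \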

-- 
Alan Modra
Australia Development Lab, IBM


Re: [PATCH] Use GET_MODE_BITSIZE to get vector natural alignment

2015-10-19 Thread H.J. Lu
On Mon, Oct 19, 2015 at 4:05 AM, Uros Bizjak  wrote:
> On Fri, Oct 16, 2015 at 7:42 PM, H.J. Lu  wrote:
>> Since GET_MODE_ALIGNMENT is defined by psABI and the biggest alignment
>> is 4 byte for IA MCU psABI, we should use GET_MODE_BITSIZE to get
>> vector natural alignment to check misaligned vector move.
>>
>> OK for trunk?
>>
>> Thanks.
>>
>> H.J.
>> ---
>> * config/i386/i386.c (ix86_expand_vector_move): Use
>> GET_MODE_BITSIZE to get vector natural alignment.
>> ---
>>  gcc/config/i386/i386.c | 4 +++-
>>  1 file changed, 3 insertions(+), 1 deletion(-)
>>
>> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
>> index ebe2b0a..d0e1f4c 100644
>> --- a/gcc/config/i386/i386.c
>> +++ b/gcc/config/i386/i386.c
>> @@ -18650,7 +18650,9 @@ void
>>  ix86_expand_vector_move (machine_mode mode, rtx operands[])
>>  {
>>rtx op0 = operands[0], op1 = operands[1];
>> -  unsigned int align = GET_MODE_ALIGNMENT (mode);
>> +  /* Use GET_MODE_BITSIZE instead of GET_MODE_ALIGNMENT since the
>> + biggest alignment is 4 byte for IA MCU psABI.  */
>> +  unsigned int align = GET_MODE_BITSIZE (mode);
>
> How about using TARGET_IAMCU condition here and using bitsize only for
> TARGET_IAMCU?
>

Works for me.  Is it OK with that change?


-- 
H.J.


Re: [PATCH] Use GET_MODE_BITSIZE to get vector natural alignment

2015-10-19 Thread Uros Bizjak
On Mon, Oct 19, 2015 at 1:12 PM, H.J. Lu  wrote:
> On Mon, Oct 19, 2015 at 4:05 AM, Uros Bizjak  wrote:
>> On Fri, Oct 16, 2015 at 7:42 PM, H.J. Lu  wrote:
>>> Since GET_MODE_ALIGNMENT is defined by psABI and the biggest alignment
>>> is 4 byte for IA MCU psABI, we should use GET_MODE_BITSIZE to get
>>> vector natural alignment to check misaligned vector move.
>>>
>>> OK for trunk?
>>>
>>> Thanks.
>>>
>>> H.J.
>>> ---
>>> * config/i386/i386.c (ix86_expand_vector_move): Use
>>> GET_MODE_BITSIZE to get vector natural alignment.
>>> ---
>>>  gcc/config/i386/i386.c | 4 +++-
>>>  1 file changed, 3 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
>>> index ebe2b0a..d0e1f4c 100644
>>> --- a/gcc/config/i386/i386.c
>>> +++ b/gcc/config/i386/i386.c
>>> @@ -18650,7 +18650,9 @@ void
>>>  ix86_expand_vector_move (machine_mode mode, rtx operands[])
>>>  {
>>>rtx op0 = operands[0], op1 = operands[1];
>>> -  unsigned int align = GET_MODE_ALIGNMENT (mode);
>>> +  /* Use GET_MODE_BITSIZE instead of GET_MODE_ALIGNMENT since the
>>> + biggest alignment is 4 byte for IA MCU psABI.  */
>>> +  unsigned int align = GET_MODE_BITSIZE (mode);
>>
>> How about using TARGET_IAMCU condition here and using bitsize only for
>> TARGET_IAMCU?
>>
>
> Works for me.  Is it OK with that change?

Yes.

Thanks,
Uros.
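
For reference, the agreed change would look roughly like this (a sketch;
the committed version may differ):

  /* The IA MCU psABI caps GET_MODE_ALIGNMENT at 4 bytes, so use the mode
     size as the natural alignment there.  */
  unsigned int align = (TARGET_IAMCU
			? GET_MODE_BITSIZE (mode)
			: GET_MODE_ALIGNMENT (mode));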


Re: [PATCH 1/9] ENABLE_CHECKING refactoring

2015-10-19 Thread Bernd Schmidt

On 10/18/2015 08:17 AM, Mikhail Maltsev wrote:

On 10/12/2015 11:57 PM, Jeff Law wrote:

-#ifdef ENABLE_CHECKING
+#if CHECKING_P


I fail to see the point of this change.

I'm guessing (and Mikhail, please correct me if I'm wrong), but I think he's
trying to get away from ENABLE_CHECKING and instead use a macro which is
always defined to a value.

Yes, exactly. Such a macro is better because it can be used both for conditional
compilation (if needed) and in normal if statements (unlike ENABLE_CHECKING).
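
For instance (illustrative; expensive_verify stands in for any checking
routine and is not a real GCC function):

  #if CHECKING_P		/* compile-time gate: always defined, 0 or 1 */
    expensive_verify ();
  #endif

    if (flag_checking)		/* run-time gate, usable in ordinary code */
      expensive_verify ();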


But for normal C conditions the patches end up using flag_checking, so 
the CHECKING_P macro buys us nothing over ENABLE_CHECKING. A change like 
this is just churn: changing things without making forward progress, and 
every change like that will cause someone else grief when they have to 
adjust their own out-of-tree patches. (It's something I think we've been 
doing too much lately, and others have complained to me about this issue 
as well).


I'm ok with pretty much all of the rest of the changes (some minor 
comments to follow), so if you could eliminate CHECKING_P I'd be likely 
to approve them.



Bernd


Re: Move some bit and binary optimizations in simplify and match

2015-10-19 Thread Hurugalawadi, Naveen
Hi,

Please find attached the modified patch handling the duplicate patterns that
were pointed out earlier.

Please review them and let me know if any further modifications are required.

Thanks,
Naveen

diff --git a/gcc/fold-const.c b/gcc/fold-const.c
index de45a2c..b36e2f5 100644
--- a/gcc/fold-const.c
+++ b/gcc/fold-const.c
@@ -9232,26 +9232,6 @@ fold_binary_loc (location_t loc,
   return NULL_TREE;
 
 case PLUS_EXPR:
-  if (INTEGRAL_TYPE_P (type) || VECTOR_INTEGER_TYPE_P (type))
-	{
-	  /* X + (X / CST) * -CST is X % CST.  */
-	  if (TREE_CODE (arg1) == MULT_EXPR
-	  && TREE_CODE (TREE_OPERAND (arg1, 0)) == TRUNC_DIV_EXPR
-	  && operand_equal_p (arg0,
-  TREE_OPERAND (TREE_OPERAND (arg1, 0), 0), 0))
-	{
-	  tree cst0 = TREE_OPERAND (TREE_OPERAND (arg1, 0), 1);
-	  tree cst1 = TREE_OPERAND (arg1, 1);
-	  tree sum = fold_binary_loc (loc, PLUS_EXPR, TREE_TYPE (cst1),
-  cst1, cst0);
-	  if (sum && integer_zerop (sum))
-		return fold_convert_loc (loc, type,
-	 fold_build2_loc (loc, TRUNC_MOD_EXPR,
-		  TREE_TYPE (arg0), arg0,
-		  cst0));
-	}
-	}
-
   /* Handle (A1 * C1) + (A2 * C2) with A1, A2 or C1, C2 being the same or
 	 one.  Make sure the type is not saturating and has the signedness of
 	 the stripped operands, as fold_plusminus_mult_expr will re-associate.
@@ -9692,28 +9672,6 @@ fold_binary_loc (location_t loc,
 			fold_convert_loc (loc, type,
 	  TREE_OPERAND (arg0, 0)));
 
-  if (! FLOAT_TYPE_P (type))
-	{
-	  /* Fold (A & ~B) - (A & B) into (A ^ B) - B, where B is
-	 any power of 2 minus 1.  */
-	  if (TREE_CODE (arg0) == BIT_AND_EXPR
-	  && TREE_CODE (arg1) == BIT_AND_EXPR
-	  && operand_equal_p (TREE_OPERAND (arg0, 0),
-  TREE_OPERAND (arg1, 0), 0))
-	{
-	  tree mask0 = TREE_OPERAND (arg0, 1);
-	  tree mask1 = TREE_OPERAND (arg1, 1);
-	  tree tem = fold_build1_loc (loc, BIT_NOT_EXPR, type, mask0);
-
-	  if (operand_equal_p (tem, mask1, 0))
-		{
-		  tem = fold_build2_loc (loc, BIT_XOR_EXPR, type,
- TREE_OPERAND (arg0, 0), mask1);
-		  return fold_build2_loc (loc, MINUS_EXPR, type, tem, mask1);
-		}
-	}
-	}
-
   /* Fold __complex__ ( x, 0 ) - __complex__ ( 0, y ) to
 	 __complex__ ( x, -y ).  This is not the same for SNaNs or if
 	 signed zeros are involved.  */
@@ -10013,28 +9971,6 @@ fold_binary_loc (location_t loc,
 arg1);
 	}
 
-  /* (X & ~Y) | (~X & Y) is X ^ Y */
-  if (TREE_CODE (arg0) == BIT_AND_EXPR
-	  && TREE_CODE (arg1) == BIT_AND_EXPR)
-{
-	  tree a0, a1, l0, l1, n0, n1;
-
-	  a0 = fold_convert_loc (loc, type, TREE_OPERAND (arg1, 0));
-	  a1 = fold_convert_loc (loc, type, TREE_OPERAND (arg1, 1));
-
-	  l0 = fold_convert_loc (loc, type, TREE_OPERAND (arg0, 0));
-	  l1 = fold_convert_loc (loc, type, TREE_OPERAND (arg0, 1));
-	  
-	  n0 = fold_build1_loc (loc, BIT_NOT_EXPR, type, l0);
-	  n1 = fold_build1_loc (loc, BIT_NOT_EXPR, type, l1);
-	  
-	  if ((operand_equal_p (n0, a0, 0)
-	   && operand_equal_p (n1, a1, 0))
-	  || (operand_equal_p (n0, a1, 0)
-		  && operand_equal_p (n1, a0, 0)))
-	return fold_build2_loc (loc, BIT_XOR_EXPR, type, l0, n1);
-	}
-
   /* See if this can be simplified into a rotate first.  If that
 	 is unsuccessful continue in the association code.  */
   goto bit_rotate;
diff --git a/gcc/match.pd b/gcc/match.pd
index f3813d8..5ee345e 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -324,6 +324,42 @@ along with GCC; see the file COPYING3.  If not see
 (if (real_isinteger (&TREE_REAL_CST (@1), &n) && (n & 1) == 0)
  (pows @0 @1))
 
+/* Fold X + (X / CST) * -CST to X % CST.  */
+(simplify
+ (plus (convert? @0) (convert? (mult (trunc_div @0 @1) (negate @1
+  (if (INTEGRAL_TYPE_P (type)
+   && tree_nop_conversion_p (type, TREE_TYPE (@0)))
+   (trunc_mod (convert @0) (convert @1
+(simplify
+ (plus (convert? @0) (convert? (mult (trunc_div @0 INTEGER_CST@1) INTEGER_CST@2)))
+  (if (tree_nop_conversion_p (type, TREE_TYPE (@0))
+   && wi::add (@1, @2) == 0)
+   (trunc_mod (convert @0) (convert @1
+
+/* Fold (A & ~B) - (A & B) into (A ^ B) - B.  */
+(simplify
+ (minus (bit_and:s @0 (bit_not @1)) (bit_and:s @0 @1))
+  (if (! FLOAT_TYPE_P (type))
+   (minus (bit_xor @0 @1) @1)))
+(simplify
+ (minus (bit_and:s @0 INTEGER_CST@2) (bit_and:s @0 INTEGER_CST@1))
+ (if (! FLOAT_TYPE_P (type)
+  && wi::eq_p (const_unop (BIT_NOT_EXPR, TREE_TYPE (type), @2), @1))
+  (minus (bit_xor @0 @1) @1)))
+
+/* Simplify (X & ~Y) | (~X & Y) -> X ^ Y.  */
+(simplify
+ (bit_ior (bit_and:c @0 (bit_not @1)) (bit_and:c (bit_not @0) @1))
+  (bit_xor @0 @1))
+(simplify
+ (bit_ior (bit_and:c @0 INTEGER_CST@2) (bit_and:c (bit_not @0) INTEGER_CST@1))
+  (if (wi::eq_p (const_unop (BIT_NOT_EXPR, TREE_TYPE (type), @2), @1))
+   (bit_xor @0 @1)))
+(simplify
+ (bit_ior (bit_and:c INTEGER_CST@0 (bit_not @1)) (bit_and:c (bit_not INTEGER_CST@2) @1))
+  (if (wi::eq_p (const_unop (BIT_NOT_EXPR, T

Re: Move some bit and binary optimizations in simplify and match

2015-10-19 Thread Hurugalawadi, Naveen
Hi,

>> That's not what Richard meant. We already have:

Done, as per the comments.

Please find attached the modified patch as per your comments.

Please review them and let me know if any further modifications are required.

Thanks,
Naveen

diff --git a/gcc/fold-const.c b/gcc/fold-const.c
index de45a2c..1e7fbb4 100644
--- a/gcc/fold-const.c
+++ b/gcc/fold-const.c
@@ -9803,20 +9803,6 @@ fold_binary_loc (location_t loc,
   goto associate;
 
 case MULT_EXPR:
-  /* (-A) * (-B) -> A * B  */
-  if (TREE_CODE (arg0) == NEGATE_EXPR && negate_expr_p (arg1))
-	return fold_build2_loc (loc, MULT_EXPR, type,
-			fold_convert_loc (loc, type,
-	  TREE_OPERAND (arg0, 0)),
-			fold_convert_loc (loc, type,
-	  negate_expr (arg1)));
-  if (TREE_CODE (arg1) == NEGATE_EXPR && negate_expr_p (arg0))
-	return fold_build2_loc (loc, MULT_EXPR, type,
-			fold_convert_loc (loc, type,
-	  negate_expr (arg0)),
-			fold_convert_loc (loc, type,
-	  TREE_OPERAND (arg1, 0)));
-
   if (! FLOAT_TYPE_P (type))
 	{
 	  /* Transform x * -C into -x * C if x is easily negatable.  */
@@ -9830,16 +9816,6 @@ fold_binary_loc (location_t loc,
 		  negate_expr (arg0)),
 tem);
 
-	  /* (a * (1 << b)) is (a << b)  */
-	  if (TREE_CODE (arg1) == LSHIFT_EXPR
-	  && integer_onep (TREE_OPERAND (arg1, 0)))
-	return fold_build2_loc (loc, LSHIFT_EXPR, type, op0,
-TREE_OPERAND (arg1, 1));
-	  if (TREE_CODE (arg0) == LSHIFT_EXPR
-	  && integer_onep (TREE_OPERAND (arg0, 0)))
-	return fold_build2_loc (loc, LSHIFT_EXPR, type, op1,
-TREE_OPERAND (arg0, 1));
-
 	  /* (A + A) * C -> A * 2 * C  */
 	  if (TREE_CODE (arg0) == PLUS_EXPR
 	  && TREE_CODE (arg1) == INTEGER_CST
@@ -9882,21 +9858,6 @@ fold_binary_loc (location_t loc,
 	}
   else
 	{
-	  /* Convert (C1/X)*C2 into (C1*C2)/X.  This transformation may change
- the result for floating point types due to rounding so it is applied
- only if -fassociative-math was specify.  */
-	  if (flag_associative_math
-	  && TREE_CODE (arg0) == RDIV_EXPR
-	  && TREE_CODE (arg1) == REAL_CST
-	  && TREE_CODE (TREE_OPERAND (arg0, 0)) == REAL_CST)
-	{
-	  tree tem = const_binop (MULT_EXPR, TREE_OPERAND (arg0, 0),
-  arg1);
-	  if (tem)
-		return fold_build2_loc (loc, RDIV_EXPR, type, tem,
-TREE_OPERAND (arg0, 1));
-	}
-
   /* Strip sign operations from X in X*X, i.e. -Y*-Y -> Y*Y.  */
 	  if (operand_equal_p (arg0, arg1, 0))
 	{
@@ -10053,22 +10014,6 @@ fold_binary_loc (location_t loc,
   goto bit_rotate;
 
 case BIT_AND_EXPR:
-  /* ~X & X, (X == 0) & X, and !X & X are always zero.  */
-  if ((TREE_CODE (arg0) == BIT_NOT_EXPR
-	   || TREE_CODE (arg0) == TRUTH_NOT_EXPR
-	   || (TREE_CODE (arg0) == EQ_EXPR
-	   && integer_zerop (TREE_OPERAND (arg0, 1
-	  && operand_equal_p (TREE_OPERAND (arg0, 0), arg1, 0))
-	return omit_one_operand_loc (loc, type, integer_zero_node, arg1);
-
-  /* X & ~X , X & (X == 0), and X & !X are always zero.  */
-  if ((TREE_CODE (arg1) == BIT_NOT_EXPR
-	   || TREE_CODE (arg1) == TRUTH_NOT_EXPR
-	   || (TREE_CODE (arg1) == EQ_EXPR
-	   && integer_zerop (TREE_OPERAND (arg1, 1
-	  && operand_equal_p (arg0, TREE_OPERAND (arg1, 0), 0))
-	return omit_one_operand_loc (loc, type, integer_zero_node, arg0);
-
   /* Fold (X ^ 1) & 1 as (X & 1) == 0.  */
   if (TREE_CODE (arg0) == BIT_XOR_EXPR
 	  && INTEGRAL_TYPE_P (type)
diff --git a/gcc/match.pd b/gcc/match.pd
index f3813d8..04b6138 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -324,6 +324,27 @@ along with GCC; see the file COPYING3.  If not see
 (if (real_isinteger (&TREE_REAL_CST (@1), &n) && (n & 1) == 0)
  (pows @0 @1))
 
+/* Fold (a * (1 << b)) into (a << b)  */
+(simplify
+ (mult:c @0 (convert? (lshift integer_onep@1 @2)))
+  (if (! FLOAT_TYPE_P (type)
+   && tree_nop_conversion_p (type, TREE_TYPE (@1)))
+   (lshift @0 @2)))
+
+/* Fold (C1/X)*C2 into (C1*C2)/X.  */
+(simplify
+ (mult (rdiv:s REAL_CST@0 @1) REAL_CST@2)
+  (if (flag_associative_math)
+   (with
+{ tree tem = const_binop (MULT_EXPR, type, @0, @2); }
+(if (tem)
+ (rdiv { tem; } @1)
+
+/* Simplify ~X & X as zero.  */
+(simplify
+ (bit_and:c (convert? @0) (convert? (bit_not @0)))
+  { build_zero_cst (type); })
+
 /* X % Y is smaller than Y.  */
 (for cmp (lt ge)
  (simplify
@@ -543,6 +564,13 @@ along with GCC; see the file COPYING3.  If not see
 (match negate_expr_p
  VECTOR_CST
  (if (FLOAT_TYPE_P (TREE_TYPE (type)) || TYPE_OVERFLOW_WRAPS (type
+
+/* (-A) * (-B) -> A * B  */
+(simplify
+ (mult:c (convert1? (negate @0)) (convert2? negate_expr_p@1))
+  (if (tree_nop_conversion_p (type, TREE_TYPE (@0))
+   && tree_nop_conversion_p (type, TREE_TYPE (@1)))
+   (mult (convert @0) (convert (negate @1)
  
 /* -(A + B) -> (-B) - A.  */
 (simplify
@@ -629,6 +657,8 @@ along with GCC; see the file COPYING3.  If not see
   (truth_not 
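For reference, the transforms being moved, restated as source-level examples
(illustrative only; the function names are made up and none of this is from
the patch or the testsuite):

  int f1 (int a, int b) { return a * (1 << b); }     /* becomes a << b */
  int f2 (int a, int b) { return (-a) * (-b); }      /* becomes a * b */
  int f3 (int x) { return x & ~x; }                  /* becomes 0 */
  /* With -fassociative-math, (C1/X)*C2 -> (C1*C2)/X: */
  double f4 (double x) { return (8.0 / x) * 0.5; }   /* becomes 4.0 / x */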

[mask-vec_cond, patch 3/2] SLP support

2015-10-19 Thread Ilya Enkovich
Hi,

This patch adds the missing support for a cond_expr with no embedded comparison
in SLP.  No new test is added because the vec cmp SLP test becomes (due to the
changes in bool patterns by the first patch) a regression test for this patch.
Does it look OK?
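For reference, the newly allowed case is a COND_EXPR whose condition is a
mask SSA name rather than an embedded comparison (illustrative GIMPLE, not
taken from the patch):

  _1 = a_2(D) < b_3(D);         /* mask defined by a separate statement */
  c_4 = _1 ? x_5(D) : y_6(D);   /* rhs1 is an SSA name, so
                                   COMPARISON_CLASS_P (rhs1) is false */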

Thanks,
Ilya
--
gcc/

2015-10-19  Ilya Enkovich  

* tree-vect-slp.c (vect_get_and_check_slp_defs): Allow
cond_expr with no embedded comparison.
(vect_build_slp_tree_1): Likewise.


diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
index fa8291e..48311dd 100644
--- a/gcc/tree-vect-slp.c
+++ b/gcc/tree-vect-slp.c
@@ -257,7 +257,8 @@ vect_get_and_check_slp_defs (vec_info *vinfo,
 {
   enum tree_code code = gimple_assign_rhs_code (stmt);
   number_of_oprnds = gimple_num_ops (stmt) - 1;
-  if (gimple_assign_rhs_code (stmt) == COND_EXPR)
+  if (gimple_assign_rhs_code (stmt) == COND_EXPR
+ && COMPARISON_CLASS_P (gimple_assign_rhs1 (stmt)))
{
  first_op_cond = true;
  commutative = true;
@@ -482,7 +483,6 @@ vect_build_slp_tree_1 (vec_info *vinfo,
   machine_mode vec_mode;
   HOST_WIDE_INT dummy;
   gimple *first_load = NULL, *prev_first_load = NULL;
-  tree cond;
 
   /* For every stmt in NODE find its def stmt/s.  */
   FOR_EACH_VEC_ELT (stmts, i, stmt)
@@ -527,24 +527,6 @@ vect_build_slp_tree_1 (vec_info *vinfo,
  return false;
}
 
-   if (is_gimple_assign (stmt)
-  && gimple_assign_rhs_code (stmt) == COND_EXPR
-   && (cond = gimple_assign_rhs1 (stmt))
-   && !COMPARISON_CLASS_P (cond))
-{
-  if (dump_enabled_p ())
-{
-  dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, 
-  "Build SLP failed: condition is not "
-  "comparison ");
-  dump_gimple_stmt (MSG_MISSED_OPTIMIZATION, TDF_SLIM, stmt, 0);
-  dump_printf (MSG_MISSED_OPTIMIZATION, "\n");
-}
- /* Fatal mismatch.  */
- matches[0] = false;
-  return false;
-}
-
   scalar_type = vect_get_smallest_scalar_type (stmt, &dummy, &dummy);
   vectype = get_vectype_for_scalar_type (scalar_type);
   if (!vectype)


Re: [PATCH] PR middle-end/68002: introduce -fkeep-static-functions

2015-10-19 Thread H.J. Lu
On Mon, Oct 19, 2015 at 2:18 AM, Richard Biener
 wrote:
> On Sat, Oct 17, 2015 at 5:17 PM, VandeVondele  Joost
>  wrote:
>> In some cases (e.g. coverage testing) it is useful to emit code for static 
>> functions even if they are never used, which currently is not possible at 
>> -O1 and above. The following patch introduces a flag for this, which 
>> basically triggers the same code that keeps those functions alive at -O0. 
>> Thanks to Marc Glisse for replying at gcc-help and for suggesting where to 
>> look.
>>
>> Bootstrapped and regtested on x86_64-unknown-linux-gnu
>>
>> OK for trunk ?
>
> Ok.

I checked this in as an obvious fix.

-- 
H.J.
---
Index: ChangeLog
===
--- ChangeLog (revision 228967)
+++ ChangeLog (working copy)
@@ -1,5 +1,9 @@
 2015-10-19  H.J. Lu  

+ * doc/invoke.texi: Replace @optindex with @opindex.
+
+2015-10-19  H.J. Lu  
+
  PR target/67995
  * config/i386/i386.c (ix86_valid_target_attribute_tree): If
  arch= is set,  clear all bits in x_ix86_isa_flags, except for
Index: doc/invoke.texi
===
--- doc/invoke.texi (revision 228967)
+++ doc/invoke.texi (working copy)
@@ -8014,7 +8014,7 @@ of its callers.  This switch does not af
 inline functions into the object file.

 @item -fkeep-static-functions
-@optindex fkeep-static-functions
+@opindex fkeep-static-functions
 Emit @code{static} functions into the object file, even if the function
 is never used.


Re: Add a pass to back-propagate use information

2015-10-19 Thread Richard Biener
On Thu, Oct 15, 2015 at 3:17 PM, Richard Sandiford
 wrote:
> This patch adds a pass that collects information that is common to all
> uses of an SSA name X and back-propagates that information up the statements
> that generate X.  The general idea is to use the information to simplify
> instructions (rather than a pure DCE) so I've simply called it
> tree-ssa-backprop.c, to go with tree-ssa-forwprop.c.
>
> At the moment the only use of the pass is to remove unnecessary sign
> operations, so that it's effectively a global version of
> fold_strip_sign_ops.  I'm hoping it could be extended in future to
> record which bits of an integer are significant.  There are probably
> other potential uses too.
>
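For illustration, the kind of redundancy such a pass removes -- a sketch,
not one of the new tests; the sign of t never matters below, so the
negation can be dropped:

  double f (double x)
  {
    double t = -x;
    return t * t;    /* simplified to x * x */
  }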
> A later patch gets rid of fold_strip_sign_ops.
>
> Tested on x86_64-linux-gnu, aarch64-linux-gnu and arm-linux-gnueabi.
> OK to install?
>
> Thanks,
> Richard
>
>
> gcc/
> * doc/invoke.texi (-fdump-tree-backprop, -ftree-backprop): Document.
> * Makefile.in (OBJS): Add tree-ssa-backprop.o.
> * common.opt (ftree-backprop): New option.
> * fold-const.h (negate_mathfn_p): Declare.
> * fold-const.c (negate_mathfn_p): Make public.
> * timevar.def (TV_TREE_BACKPROP): New.
> * tree-passes.h (make_pass_backprop): Declare.
> * passes.def (pass_backprop): Add.
> * tree-ssa-backprop.c: New file.
>
> gcc/testsuite/
> * gcc.dg/tree-ssa/backprop-1.c, gcc.dg/tree-ssa/backprop-2.c,
> gcc.dg/tree-ssa/backprop-3.c, gcc.dg/tree-ssa/backprop-4.c,
> gcc.dg/tree-ssa/backprop-5.c, gcc.dg/tree-ssa/backprop-6.c: New tests.
>
> diff --git a/gcc/Makefile.in b/gcc/Makefile.in
> index 783e4c9..69e669d 100644
> --- a/gcc/Makefile.in
> +++ b/gcc/Makefile.in
> @@ -1445,6 +1445,7 @@ OBJS = \
> tree-switch-conversion.o \
> tree-ssa-address.o \
> tree-ssa-alias.o \
> +   tree-ssa-backprop.o \
> tree-ssa-ccp.o \
> tree-ssa-coalesce.o \
> tree-ssa-copy.o \
> diff --git a/gcc/common.opt b/gcc/common.opt
> index 5060208..5aef625 100644
> --- a/gcc/common.opt
> +++ b/gcc/common.opt
> @@ -2364,6 +2364,10 @@ ftree-pta
>  Common Report Var(flag_tree_pta) Optimization
>  Perform function-local points-to analysis on trees.
>
> +ftree-backprop
> +Common Report Var(flag_tree_backprop) Init(1) Optimization
> +Enable backward propagation of use properties at the tree level.

Don't add new -ftree-* options; "tree" doesn't add any info for our users.  I'd
also refer to the SSA level rather than the "tree" level.  Not sure if -fbackprop
is good, but let's go for it.

> +
>  ftree-reassoc
>  Common Report Var(flag_tree_reassoc) Init(1) Optimization
>  Enable reassociation on tree level
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index 54e9f12..fe15d08 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -343,6 +343,7 @@ Objective-C and Objective-C++ Dialects}.
>  -fdump-tree-dse@r{[}-@var{n}@r{]} @gol
>  -fdump-tree-phiprop@r{[}-@var{n}@r{]} @gol
>  -fdump-tree-phiopt@r{[}-@var{n}@r{]} @gol
> +-fdump-tree-backprop@r{[}-@var{n}@r{]} @gol
>  -fdump-tree-forwprop@r{[}-@var{n}@r{]} @gol
>  -fdump-tree-nrv -fdump-tree-vect @gol
>  -fdump-tree-sink @gol
> @@ -451,8 +452,8 @@ Objective-C and Objective-C++ Dialects}.
>  -fstrict-overflow -fthread-jumps -ftracer -ftree-bit-ccp @gol
>  -ftree-builtin-call-dce -ftree-ccp -ftree-ch @gol
>  -ftree-coalesce-vars -ftree-copy-prop -ftree-dce -ftree-dominator-opts @gol
> --ftree-dse -ftree-forwprop -ftree-fre -ftree-loop-if-convert @gol
> --ftree-loop-if-convert-stores -ftree-loop-im @gol
> +-ftree-dse -ftree-backprop -ftree-forwprop -ftree-fre @gol
> +-ftree-loop-if-convert -ftree-loop-if-convert-stores -ftree-loop-im @gol
>  -ftree-phiprop -ftree-loop-distribution -ftree-loop-distribute-patterns @gol
>  -ftree-loop-ivcanon -ftree-loop-linear -ftree-loop-optimize @gol
>  -ftree-loop-vectorize @gol
> @@ -7236,6 +7237,12 @@ name is made by appending @file{.dse} to the source 
> file name.
>  Dump each function after optimizing PHI nodes into straightline code.  The 
> file
>  name is made by appending @file{.phiopt} to the source file name.
>
> +@item backprop
> +@opindex fdump-tree-backprop
> +Dump each function after back-propagating use information up the definition
> +chain.  The file name is made by appending @file{.backprop} to the
> +source file name.
> +
>  @item forwprop
>  @opindex fdump-tree-forwprop
>  Dump each function after forward propagating single use variables.  The file
> @@ -7716,6 +7723,7 @@ compilation time.
>  -ftree-dce @gol
>  -ftree-dominator-opts @gol
>  -ftree-dse @gol
> +-ftree-backprop @gol
>  -ftree-forwprop @gol
>  -ftree-fre @gol
>  -ftree-phiprop @gol
> @@ -8658,6 +8666,13 @@ enabled by default at @option{-O2} and @option{-O3}.
>  Make partial redundancy elimination (PRE) more aggressive.  This flag is
>  enabled by default at @option{-O3}.
>
> +@item -ftree-backprop
> +@opindex ftree-backprop
> +Propagate information about uses of a value

Re: Move some bit and binary optimizations in simplify and match

2015-10-19 Thread Marc Glisse

+/* Fold X + (X / CST) * -CST to X % CST.  */

This one is still wrong.  It is extremely similar to X-(X/CST)*CST, and the
current version of that one in match.pd is broken; we should fix that one
first.


+/* Fold (A & ~B) - (A & B) into (A ^ B) - B.  */
+(simplify
+ (minus (bit_and:s @0 (bit_not @1)) (bit_and:s @0 @1))
+  (if (! FLOAT_TYPE_P (type))
+   (minus (bit_xor @0 @1) @1)))

I don't understand the point of the FLOAT_TYPE_P check.

Will we also simplify (A & B) - (A & ~B) into B - (A ^ B) ?

+(simplify
+ (minus (bit_and:s @0 INTEGER_CST@2) (bit_and:s @0 INTEGER_CST@1))
+ (if (! FLOAT_TYPE_P (type)
+  && wi::eq_p (const_unop (BIT_NOT_EXPR, TREE_TYPE (type), @2), @1))

TREE_TYPE (type) ???

+  (minus (bit_xor @0 @1) @1)))

(just a random comment, not for your patch)
When we generalize this to vector, should that be:
operand_equal_p (const_unop (BIT_NOT_EXPR, type, @2), @1, OEP_ONLY_CONST)
or maybe
integer_all_onesp (const_binop (BIT_XOR_EXPR, type, @2, @1))
?

+/* Simplify (X & ~Y) | (~X & Y) -> X ^ Y.  */
+(simplify
+ (bit_ior (bit_and:c @0 (bit_not @1)) (bit_and:c (bit_not @0) @1))
+  (bit_xor @0 @1))

:c on bit_ior? It should also allow you to merge the 2 CST versions into 
one.
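Something like this merged form, presumably (a sketch only, untested):

  (simplify
   (bit_ior:c (bit_and:c @0 (bit_not @1)) (bit_and:c (bit_not @0) @1))
   (bit_xor @0 @1))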


+ (bit_ior (bit_and:c INTEGER_CST@0 (bit_not @1)) (bit_and:c (bit_not INTEGER_CST@2) @1))


gcc always puts the constant last in bit_and, so
(bit_and (bit_not @1) INTEGER_CST@0)

You still have a (bit_not INTEGER_CST@2)...

-/* X & !X -> 0.  */
+/* X & !X or X & ~X -> 0.  */
 (simplify
  (bit_and:c @0 (logical_inverted_value @0))
- { build_zero_cst (type); })
+  { build_zero_cst (type); })
 /* X | !X and X ^ !X -> 1, , if X is truth-valued.  */
 (for op (bit_ior bit_xor)
  (simplify

I think that was already in your other patch, and I am not really in favor 
of the indentation change (or the comment).


--
Marc Glisse


Re: [PATCH] Use GET_MODE_BITSIZE to get vector natural alignment

2015-10-19 Thread H.J. Lu
On Mon, Oct 19, 2015 at 4:12 AM, Uros Bizjak  wrote:
> On Mon, Oct 19, 2015 at 1:12 PM, H.J. Lu  wrote:
>> On Mon, Oct 19, 2015 at 4:05 AM, Uros Bizjak  wrote:
>>> On Fri, Oct 16, 2015 at 7:42 PM, H.J. Lu  wrote:
 Since GET_MODE_ALIGNMENT is defined by the psABI and the biggest alignment
 is 4 bytes under the IA MCU psABI, we should use GET_MODE_BITSIZE to get
 the vector natural alignment to check for misaligned vector moves.

 OK for trunk?

 Thanks.

 H.J.
 ---
 * config/i386/i386.c (ix86_expand_vector_move): Use
 GET_MODE_BITSIZE to get vector natural alignment.
 ---
  gcc/config/i386/i386.c | 4 +++-
  1 file changed, 3 insertions(+), 1 deletion(-)

 diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
 index ebe2b0a..d0e1f4c 100644
 --- a/gcc/config/i386/i386.c
 +++ b/gcc/config/i386/i386.c
 @@ -18650,7 +18650,9 @@ void
  ix86_expand_vector_move (machine_mode mode, rtx operands[])
  {
rtx op0 = operands[0], op1 = operands[1];
 -  unsigned int align = GET_MODE_ALIGNMENT (mode);
 +  /* Use GET_MODE_BITSIZE instead of GET_MODE_ALIGNMENT since the
 + biggest alignment is 4 byte for IA MCU psABI.  */
 +  unsigned int align = GET_MODE_BITSIZE (mode);
>>>
>>> How about using TARGET_IAMCU condition here and using bitsize only for
>>> TARGET_IAMCU?
>>>
>>
>> Works for me.  Is it OK with that change?
>
> Yes.
>

This is what I checked in.

Thanks.

-- 
H.J.
---
[PATCH] Use GET_MODE_BITSIZE to get vector natural alignment

Since GET_MODE_ALIGNMENT is defined by the psABI and the biggest alignment
is 4 bytes under the IA MCU psABI, we should use GET_MODE_BITSIZE for the
IA MCU psABI to get the vector natural alignment when checking for
misaligned vector moves.
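
To make the numbers concrete (an illustration, not part of the patch): for
V4SImode under the IA MCU psABI,

  GET_MODE_ALIGNMENT (V4SImode) ==  32   /* alignment capped at 4 bytes */
  GET_MODE_BITSIZE (V4SImode)   == 128   /* the natural alignment */

so the misalignment check has to use the bitsize there.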

* config/i386/i386.c (ix86_expand_vector_move): Use
GET_MODE_BITSIZE for IA MCU psABI to get vector natural
alignment.
---
 gcc/config/i386/i386.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 1049455..a4f4b6f 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -18645,7 +18645,11 @@ void
 ix86_expand_vector_move (machine_mode mode, rtx operands[])
 {
   rtx op0 = operands[0], op1 = operands[1];
-  unsigned int align = GET_MODE_ALIGNMENT (mode);
+  /* Use GET_MODE_BITSIZE instead of GET_MODE_ALIGNMENT for IA MCU
+ psABI since the biggest alignment is 4 byte for IA MCU psABI.  */
+  unsigned int align = (TARGET_IAMCU
+ ? GET_MODE_BITSIZE (mode)
+ : GET_MODE_ALIGNMENT (mode));

   if (push_operand (op0, VOIDmode))
 op0 = emit_move_resolve_push (mode, op0);
-- 
2.4.3


Re: Remove fold_strip_sign_ops

2015-10-19 Thread Richard Biener
On Thu, Oct 15, 2015 at 3:28 PM, Richard Sandiford
 wrote:
> This patch deletes fold_strip_sign_ops in favour of the tree-ssa-backprop.c
> pass.
>
> Tested on x86_64-linux-gnu, aarch64-linux-gnu and arm-linux-gnueabi.
> OK to install?

Ok once the pass goes in.

Thanks,
Richard.

> Thanks,
> Richard
>
>
> gcc/
> * fold-const.h (fold_strip_sign_ops): Delete.
> * fold-const.c (fold_strip_sign_ops): Likewise.
> (fold_unary_loc, fold_binary_loc): Remove calls to it.
> * builtins.c (fold_builtin_cos, fold_builtin_cosh)
> (fold_builtin_ccos): Delete.
> (fold_builtin_pow): Don't call fold_strip_sign_ops.
> (fold_builtin_hypot, fold_builtin_copysign): Likewise.
> Remove fndecl argument.
> (fold_builtin_1): Update calls accordingly.  Handle constant
> cos, cosh, ccos and ccosh here.
>
> gcc/testsuite/
> * gcc.dg/builtins-86.c: XFAIL.
> * gcc.dg/torture/builtin-symmetric-1.c: Don't run at -O0.
>
> diff --git a/gcc/builtins.c b/gcc/builtins.c
> index b4ac535..1e4ec35 100644
> --- a/gcc/builtins.c
> +++ b/gcc/builtins.c
> @@ -160,8 +160,6 @@ static rtx expand_builtin_fabs (tree, rtx, rtx);
>  static rtx expand_builtin_signbit (tree, rtx);
>  static tree fold_builtin_pow (location_t, tree, tree, tree, tree);
>  static tree fold_builtin_powi (location_t, tree, tree, tree, tree);
> -static tree fold_builtin_cos (location_t, tree, tree, tree);
> -static tree fold_builtin_cosh (location_t, tree, tree, tree);
>  static tree fold_builtin_tan (tree, tree);
>  static tree fold_builtin_trunc (location_t, tree, tree);
>  static tree fold_builtin_floor (location_t, tree, tree);
> @@ -7688,77 +7686,6 @@ fold_builtin_cproj (location_t loc, tree arg, tree 
> type)
>return NULL_TREE;
>  }
>
> -/* Fold function call to builtin cos, cosf, or cosl with argument ARG.
> -   TYPE is the type of the return value.  Return NULL_TREE if no
> -   simplification can be made.  */
> -
> -static tree
> -fold_builtin_cos (location_t loc,
> - tree arg, tree type, tree fndecl)
> -{
> -  tree res, narg;
> -
> -  if (!validate_arg (arg, REAL_TYPE))
> -return NULL_TREE;
> -
> -  /* Calculate the result when the argument is a constant.  */
> -  if ((res = do_mpfr_arg1 (arg, type, mpfr_cos, NULL, NULL, 0)))
> -return res;
> -
> -  /* Optimize cos(-x) into cos (x).  */
> -  if ((narg = fold_strip_sign_ops (arg)))
> -return build_call_expr_loc (loc, fndecl, 1, narg);
> -
> -  return NULL_TREE;
> -}
> -
> -/* Fold function call to builtin cosh, coshf, or coshl with argument ARG.
> -   Return NULL_TREE if no simplification can be made.  */
> -
> -static tree
> -fold_builtin_cosh (location_t loc, tree arg, tree type, tree fndecl)
> -{
> -  if (validate_arg (arg, REAL_TYPE))
> -{
> -  tree res, narg;
> -
> -  /* Calculate the result when the argument is a constant.  */
> -  if ((res = do_mpfr_arg1 (arg, type, mpfr_cosh, NULL, NULL, 0)))
> -   return res;
> -
> -  /* Optimize cosh(-x) into cosh (x).  */
> -  if ((narg = fold_strip_sign_ops (arg)))
> -   return build_call_expr_loc (loc, fndecl, 1, narg);
> -}
> -
> -  return NULL_TREE;
> -}
> -
> -/* Fold function call to builtin ccos (or ccosh if HYPER is TRUE) with
> -   argument ARG.  TYPE is the type of the return value.  Return
> -   NULL_TREE if no simplification can be made.  */
> -
> -static tree
> -fold_builtin_ccos (location_t loc, tree arg, tree type, tree fndecl,
> -  bool hyper)
> -{
> -  if (validate_arg (arg, COMPLEX_TYPE)
> -  && TREE_CODE (TREE_TYPE (TREE_TYPE (arg))) == REAL_TYPE)
> -{
> -  tree tmp;
> -
> -  /* Calculate the result when the argument is a constant.  */
> -  if ((tmp = do_mpc_arg1 (arg, type, (hyper ? mpc_cosh : mpc_cos
> -   return tmp;
> -
> -  /* Optimize fn(-x) into fn(x).  */
> -  if ((tmp = fold_strip_sign_ops (arg)))
> -   return build_call_expr_loc (loc, fndecl, 1, tmp);
> -}
> -
> -  return NULL_TREE;
> -}
> -
>  /* Fold function call to builtin tan, tanf, or tanl with argument ARG.
> Return NULL_TREE if no simplification can be made.  */
>
> @@ -8174,10 +8101,9 @@ fold_builtin_bswap (tree fndecl, tree arg)
> NULL_TREE if no simplification can be made.  */
>
>  static tree
> -fold_builtin_hypot (location_t loc, tree fndecl,
> -   tree arg0, tree arg1, tree type)
> +fold_builtin_hypot (location_t loc, tree arg0, tree arg1, tree type)
>  {
> -  tree res, narg0, narg1;
> +  tree res;
>
>if (!validate_arg (arg0, REAL_TYPE)
>|| !validate_arg (arg1, REAL_TYPE))
> @@ -8187,16 +8113,6 @@ fold_builtin_hypot (location_t loc, tree fndecl,
>if ((res = do_mpfr_arg2 (arg0, arg1, type, mpfr_hypot)))
>  return res;
>
> -  /* If either argument to hypot has a negate or abs, strip that off.
> - E.g. hypot(-x,fabs(y)) -> hypot(x,y).  */
> -  narg0 = fold_strip_sign_ops (arg0);
> -  narg1 = fold_strip_sign_ops (arg1);
> -  i

Re: [PATCH V3][GCC] Algorithmic optimization in match and simplify

2015-10-19 Thread Richard Biener
On Thu, Oct 15, 2015 at 3:50 PM, Christophe Lyon
 wrote:
> On 9 October 2015 at 18:11, James Greenhalgh  wrote:
>> On Thu, Oct 08, 2015 at 01:29:34PM +0100, Richard Biener wrote:
>>> > Thanks again for the comments Richard!
>>> >
>>> > A new algorithmic optimisation:
>>> >
>>> > ((X inner_op C0) outer_op C1)
>>> > With X being a tree for which value-range analysis has determined that
>>> > certain bits are always zero (we will call this set of bits the zero_mask),
>>> > and with inner_op = {|,^}, outer_op = {|,^} and inner_op != outer_op.
>>> > if (inner_op == '^') C0 &= ~C1;
>>> > if ((C0 & ~zero_mask) == 0) then emit (X outer_op (C0 outer_op C1))
>>> > if ((C1 & ~zero_mask) == 0) then emit (X inner_op (C0 outer_op C1))
>>> >
>>> > And extended '(X & C2) << C1 into (X << C1) & (C2 << C1)' and
>>> > '(X & C2) >> C1 into (X >> C1) & (C2 >> C1)' to also accept the bitwise or
>>> > and xor operators:
>>> > '(X {&,^,|} C2) << C1 into (X << C1) {&,^,|} (C2 << C1)' and
>>> > '(X {&,^,|} C2) >> C1 into (X >> C1) {&,^,|} (C2 >> C1)'.
>>> >
>>> > The second transformation enables more applications of the first. Also 
>>> > some
>>> > targets may benefit from delaying shift operations. I am aware that such 
>>> > an
>>> > optimization, in combination with one or more optimizations that cause the
>>> > reverse transformation, may lead to an infinite loop. Though such behavior
>>> > has not been detected during regression testing and bootstrapping on
>>> > aarch64.
>>> >
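A concrete instance of the first transformation (illustrative only; assume
x is known to be zero-extended from a char, so zero_mask covers bits 8..31):

  unsigned f (unsigned char c)
  {
    unsigned x = c;
    return (x | 0x100) ^ 0x200;  /* C0 = 0x100 only sets bits inside
                                    zero_mask, so this simplifies to
                                    x ^ (0x100 ^ 0x200), i.e. x ^ 0x300 */
  }
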
>>> > gcc/ChangeLog:
>>> >
>>> > 2015-10-05 Andre Vieira 
>>> >
>>> > * match.pd: Added a new pattern
>>> > ((X inner_op C0) outer_op C1)
>>> > and expanded existing one
>>> > (X {|,^,&} C0) {<<,>>} C1 -> (X {<<,>>} C1) {|,^,&} (C0 {<<,>>} C1)
>>> >
>>> > gcc/testsuite/ChangeLog:
>>> >
>>> > 2015-10-05 Andre Vieira 
>>> >
>>> > Hale Wang 
>>> >
>>> > * gcc.dg/tree-ssa/forwprop-33.c: New test.
>>>
>>> Ok.
>>>
>>> Thanks,
>>> Richard.
>>>
>>
>> As Andre does not have commit rights, I've committed this on his behalf as
>> revision 228661. Please watch for any fallout over the weekend.
>>
>
> Since this commit I'm seeing:
> FAIL: gcc.target/arm/xor-and.c scan-assembler orr
> on most arm targets.
>
> See: 
> http://people.linaro.org/~christophe.lyon/cross-validation/gcc/trunk/228661/report-build-info.html
>
> Since that's already a few days old, I suspect you are already aware of that?

Please file a bugreport.

Thanks,
Richard.

> Christophe.
>
>
>> Andre, please check your ChangeLog format in future. In the end I
>> committed this:
>>
>> gcc/ChangeLog
>>
>> 2015-10-09  Andre Vieira  
>>
>> * match.pd: ((X inner_op C0) outer_op C1) New pattern.
>> ((X & C2) << C1): Expand to...
>> (X {&,^,|} C2 << C1): ...This.
>> ((X & C2) >> C1): Expand to...
>> (X {&,^,|} C2 >> C1): ...This.
>>
>> gcc/testsuite/ChangeLog
>>
>> 2015-10-09  Andre Vieira  
>> Hale Wang  
>>
>> * gcc.dg/tree-ssa/forwprop-33.c: New.
>>
>> Thanks,
>> James
>>


Re: [PATCH] tree-scalar-evolution.c: Handle LSHIFT by constant

2015-10-19 Thread Richard Biener
On Fri, Oct 16, 2015 at 5:25 PM, Alan Lawrence  wrote:
> This lets the vectorizer handle some simple strides expressed using left-shift
> rather than mul, e.g. a[i << 1] (whereas previously only a[i * 2] would have
> been handled).
>
> This patch does *not* handle the general case of shifts - neither a[i << j]
> nor a[1 << i] will be handled; that would be a significantly bigger patch
> (probably duplicating or generalizing much of chrec_fold_multiply and
> chrec_fold_multiply_poly_poly in tree-chrec.c), and would probably also only
> be applicable to machines with gather-load support.
>
> Bootstrapped+check-gcc,g++,gfortran on x86_64, AArch64 and ARM, also Ada on 
> x86_64.
>
> Is this OK for trunk?
>
> gcc/ChangeLog:
>
> PR tree-optimization/65963
> * tree-scalar-evolution.c (interpret_rhs_expr): Handle some 
> LSHIFT_EXPRs
> as equivalent MULT_EXPRs.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/vect/vect-strided-shift-1.c: New.
> ---
>  gcc/testsuite/gcc.dg/vect/vect-strided-shift-1.c | 33 
> 
>  gcc/tree-scalar-evolution.c  | 18 +
>  2 files changed, 51 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.dg/vect/vect-strided-shift-1.c
>
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-strided-shift-1.c 
> b/gcc/testsuite/gcc.dg/vect/vect-strided-shift-1.c
> new file mode 100644
> index 000..b1ce2ec
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-strided-shift-1.c
> @@ -0,0 +1,33 @@
> +/* PR tree-optimization/65963.  */
> +#include "tree-vect.h"
> +
> +#define N 512
> +
> +int in[2*N], out[N];
> +
> +__attribute__ ((noinline)) void
> +loop (void)
> +{
> +  for (int i = 0; i < N; i++)
> +out[i] = in[i << 1] + 7;
> +}
> +
> +int
> +main (int argc, char **argv)
> +{
> +  check_vect ();
> +  for (int i = 0; i < 2*N; i++)
> +{
> +  in[i] = i;
> +  __asm__ volatile ("" : : : "memory");
> +}
> +  loop ();
> +  __asm__ volatile ("" : : : "memory");
> +  for (int i = 0; i < N; i++)
> +{
> +  if (out[i] != i*2 + 7)
> +   abort ();
> +}
> +  return 0;
> +}
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 1 
> "vect" { target { vect_strided2 } } } } */
> diff --git a/gcc/tree-scalar-evolution.c b/gcc/tree-scalar-evolution.c
> index 0753bf3..e478b0e 100644
> --- a/gcc/tree-scalar-evolution.c
> +++ b/gcc/tree-scalar-evolution.c
> @@ -1831,12 +1831,30 @@ interpret_rhs_expr (struct loop *loop, gimple 
> *at_stmt,
>break;
>
>  case MULT_EXPR:
> +case LSHIFT_EXPR:
> +  /* Handle A<<B as A * (1<<B), if B is constant.  */
>chrec1 = analyze_scalar_evolution (loop, rhs1);
>chrec2 = analyze_scalar_evolution (loop, rhs2);
>chrec1 = chrec_convert (type, chrec1, at_stmt);
>chrec2 = chrec_convert (type, chrec2, at_stmt);
>chrec1 = instantiate_parameters (loop, chrec1);
>chrec2 = instantiate_parameters (loop, chrec2);
> +  if (code == LSHIFT_EXPR)
> +   {
> + /* Do the shift in the larger size, as in e.g. (long) << (int)32,
> +we must do 1<<32 as a long or we'd overflow.  */

Err, you should always do the shift in the type of rhs1.  You should also
avoid the chrec_convert of rhs2 above for shifts.  I think globbing
shifts and multiplies together doesn't make the code any clearer.

Richard.

> + tree type = TREE_TYPE (chrec2);
> + if (TYPE_PRECISION (TREE_TYPE (chrec1)) > TYPE_PRECISION (type))
> +   type = TREE_TYPE (chrec1);
> + if (TYPE_PRECISION (type) == 0)
> +   {
> + res = chrec_dont_know;
> + break;
> +   }
> + chrec2 = fold_build2 (LSHIFT_EXPR, type,
> +   build_int_cst (type, 1),
> +   chrec2);
> +   }
>res = chrec_fold_multiply (type, chrec1, chrec2);
>break;
>
> --
> 1.9.1
>


Re: [PATCH 7/9] ENABLE_CHECKING refactoring: middle-end, LTO FE

2015-10-19 Thread Bernd Schmidt

diff --git a/gcc/cfgcleanup.c b/gcc/cfgcleanup.c
-
-#ifdef ENABLE_CHECKING
-  verify_flow_info ();
-#endif
+ checking_verify_flow_info ();


This looks misindented.


-#ifdef ENABLE_CHECKING
cgraph_edge *e;
gcc_checking_assert (
!(e = caller->get_edge (call_stmt)) || e->speculative);
-#endif


While you're here, that would look nicer as
 gcc_checking_assert (!(e = caller->get_edge (call_stmt))
  || e->speculative);


-#ifdef ENABLE_CHECKING
-  if (check_same_comdat_groups)
+  if (CHECKING_P && check_same_comdat_groups)


flag_checking


-#ifdef ENABLE_CHECKING
-  struct df_rd_bb_info *bb_info = DF_RD_BB_INFO (g->bb);
-#endif
+  struct df_rd_bb_info *bb_info = flag_checking ? DF_RD_BB_INFO (g->bb)
+   : NULL;


I think there's no need to make that conditional; that's a bit too ugly.


+  if (CHECKING_P)
+   sparseset_set_bit (active_defs_check, regno);



+  if (CHECKING_P)
+sparseset_clear (active_defs_check);


> -#ifdef ENABLE_CHECKING
> -  active_defs_check = sparseset_alloc (max_reg_num ());
> -#endif

> +  if (CHECKING_P)
> +active_defs_check = sparseset_alloc (max_reg_num ());

> +  if (CHECKING_P)
> +sparseset_free (active_defs_check);

flag_checking.  Lots of other occurrences; I'll mention some but not all,
but please fix them for consistency.



  void
  sem_item_optimizer::verify_classes (void)
  {
-#if ENABLE_CHECKING
+  if (!flag_checking)
+return;
+


Not entirely sure whether you want to wrap this into a 
checking_verify_classes instead so that it remains easily callable by 
the debugger?



+ if (flag_checking)
+   {
+ for (symtab_node *n = node->same_comdat_group;
+  n != node;
+  n = n->same_comdat_group)
+   /* If at least one of same comdat group functions is external,
+  all of them have to be, otherwise it is a front-end bug.  */
+   gcc_assert (DECL_EXTERNAL (n->decl));
+   }


Unnecessary set of braces.


diff --git a/gcc/lra-assigns.c b/gcc/lra-assigns.c
index 2986f57..941a829 100644
--- a/gcc/lra-assigns.c
+++ b/gcc/lra-assigns.c
@@ -1591,7 +1591,7 @@ lra_assign (void)
bitmap_initialize (&all_spilled_pseudos, ®_obstack);
create_live_range_start_chains ();
setup_live_pseudos_and_spill_after_risky_transforms (&all_spilled_pseudos);
-#ifdef ENABLE_CHECKING
+#if CHECKING_P
if (!flag_ipa_ra)
  for (i = FIRST_PSEUDO_REGISTER; i < max_regno; i++)
if (lra_reg_info[i].nrefs != 0 && reg_renumber[i] >= 0


Seems inconsistent; use flag_checking and no #if?  Looks like the problem 
you're trying to solve is that a structure field exists only with 
checking; I think that field could just be made available unconditionally - 
the struct is huge anyway.


As mentioned in the other mail, I see no value changing the #ifdefs to 
#ifs here or elsewhere in the patch.



-  check_rtl (false);
-#endif
+  if (flag_checking)
+check_rtl (/*final_p=*/false);


Lose the /*final_p=*/.


-#ifdef ENABLE_CHECKING
+#if CHECKING_P
  gcc_assert (!bitmap_bit_p (output, DECL_UID (node->decl)));
  bitmap_set_bit (output, DECL_UID (node->decl));
  #endif


Not entirely clear why this isn't using flag_checking.


  tree t = (*trees)[i];
-#ifdef ENABLE_CHECKING
- if (TYPE_P (t))
+ if (CHECKING_P && TYPE_P (t))
verify_type (t);
-#endif


flag_checking


@@ -14108,7 +14102,7 @@ lower_omp_target (gimple_stmt_iterator *gsi_p, 
omp_context *ctx)
default:
break;
case OMP_CLAUSE_MAP:
-#ifdef ENABLE_CHECKING
+#if CHECKING_P
/* First check what we're prepared to handle in the following.  */
switch (OMP_CLAUSE_MAP_KIND (c))
  {


Here too...


-#ifdef ENABLE_CHECKING
-static void
+static void DEBUG_FUNCTION
  verify_curr_properties (function *fn, void *data)


Hmm, I noticed a few cases where we lost the DEBUG_FUNCTION annotation 
and was going to comment that this one is odd - but don't we actually 
want to keep DEBUG_FUNCTION annotations for the others as well so that 
they don't get inlined everywhere and eliminated?



+ if (flag_checking)
+   {
+ FOR_EACH_EDGE (e, ei, bb->preds)
+   gcc_assert (!bitmap_bit_p (tovisit, e->src->index)
+   || (e->flags & EDGE_DFS_BACK));
+   }


Unnecessary braces.


+  if (CHECKING_P)
+{
+  for (; argno < PP_NL_ARGMAX; argno++)
+   gcc_assert (!formatters[argno]);
+}


Here too. Use flag_checking.


+  if (CHECKING_P && mode != VOIDmode)


flag_checking.


-#ifdef ENABLE_CHECKING
  static void
  validate_value_data (struct value_data *vd)
  {
+  if (!flag_checking)
+return;


Same thought as before, it might be better to have this check in the 
callers for easier use from the debugger.



-#endif
-

+


Don't change the whitespace

Re: Add simple sign-stripping cases to match.pd

2015-10-19 Thread Richard Biener
On Mon, Oct 19, 2015 at 12:48 PM, Richard Sandiford
 wrote:
> Richard Sandiford  writes:
>> Richard Sandiford  writes:
>>> Marc Glisse  writes:
 On Thu, 15 Oct 2015, Richard Sandiford wrote:

> This patch makes sure that, for every simplification that uses
> fold_strip_sign_ops, there are associated match.pd rules for the
> leaf sign ops, i.e. abs, negate and copysign.  A follow-on patch
> will add a pass to handle more complex cases.
>
> Tested on x86_64-linux-gnu, aarch64-linux-gnu and arm-linux-gnueabi.
> OK to install?
>
> Thanks,
> Richard
>
>
> gcc/
>* match.pd: Add rules to simplify ccos, ccosh, hypot, copysign
>and x*x in cases where the operands are sign ops.  Extend these
>rules to handle copysign as a sign op (including for cos, cosh
>and pow, which already treated negate and abs as sign ops).
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 83c48cd..4331df6 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -61,6 +61,12 @@ along with GCC; see the file COPYING3.  If not see
> (define_operator_list TAN BUILT_IN_TANF BUILT_IN_TAN BUILT_IN_TANL)
> (define_operator_list COSH BUILT_IN_COSHF BUILT_IN_COSH BUILT_IN_COSHL)
> (define_operator_list CEXPI BUILT_IN_CEXPIF BUILT_IN_CEXPI 
> BUILT_IN_CEXPIL)
> +(define_operator_list CCOS BUILT_IN_CCOSF BUILT_IN_CCOS BUILT_IN_CCOSL)
> +(define_operator_list CCOSH BUILT_IN_CCOSHF BUILT_IN_CCOSH 
> BUILT_IN_CCOSHL)
> +(define_operator_list HYPOT BUILT_IN_HYPOTF BUILT_IN_HYPOT 
> BUILT_IN_HYPOTL)
> +(define_operator_list COPYSIGN BUILT_IN_COPYSIGNF
> + BUILT_IN_COPYSIGN
> + BUILT_IN_COPYSIGNL)
>
> /* Simplifications of operations with one constant operand and
>simplifications to constants or single values.  */
> @@ -321,7 +327,69 @@ along with GCC; see the file COPYING3.  If not see
>(pows (op @0) REAL_CST@1)
>(with { HOST_WIDE_INT n; }
> (if (real_isinteger (&TREE_REAL_CST (@1), &n) && (n & 1) == 0)
> - (pows @0 @1))
> + (pows @0 @1)
> + /* Strip negate and abs from both operands of hypot.  */
> + (for hypots (HYPOT)
> +  (simplify
> +   (hypots (op @0) @1)
> +   (hypots @0 @1))
> +  (simplify
> +   (hypots @0 (op @1))
> +   (hypots @0 @1)))

 Out of curiosity, would hypots:c have worked? (it is probably not worth
 gratuitously swapping the operands to save 3 lines though)
>>>
>>> Yeah, I think I'd prefer to keep it like it is if that's OK.
>>>
> + /* copysign(-x, y) and copysign(abs(x), y) -> copysign(x, y).  */
> + (for copysigns (COPYSIGN)
> +  (simplify
> +   (copysigns (op @0) @1)
> +   (copysigns @0 @1)))
> + /* -x*-x and abs(x)*abs(x) -> x*x.  Should be valid for all types.  */
> + (simplify
> +  (mult (op @0) (op @1))
> +  (mult @0 @0)))

 Typo @1 -> @0 ?
>>>
>>> Argh!  Thanks for catching that.  Wonder how many proof-reads that
>>> escaped :-(
>>>
 This will partially duplicate Naveen's patch "Move some bit and binary
 optimizations in simplify and match".
>>>
>>> OK.  Should I just limit it to the abs case?
>>>
> +/* copysign(x,y)*copysign(x,y) -> x*x.  */
> +(for copysigns (COPYSIGN)
> + (simplify
> +  (mult (copysigns @0 @1) (copysigns @0 @1))

 (mult (copysigns@2 @0 @1) @2)
 ? Or is there some reason not to rely on CSE? (I don't think copysign has
 any errno issue)
>>>
>>> No, simply didn't know about that trick.  I'll use it for the
>>> (mult (op @0) (op @0)) case as well.
>>
>> Here's the updated patch.  I've kept the (mult (negate@1 @0) @1)
>> pattern for now, but can limit it to abs as necessary when
>> Naveen's patch goes in.
>>
>> Tested on x86_64-linux-gnu, aarch64-linux-gnu and arm-linux-gnueabi.
>>
>> Thanks,
>> Richard
>
> Er...
>
> gcc/
> * match.pd: Add rules to simplify ccos, ccosh, hypot, copysign
> and x*x in cases where the operands are sign ops.  Extend these
> rules to handle copysign as a sign op (including for cos, cosh
> and pow, which already treated negate and abs as sign ops).

Ok.

Thanks,
Richard.

> diff --git a/gcc/match.pd b/gcc/match.pd
> index f3813d8..d677e69 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -61,6 +61,12 @@ along with GCC; see the file COPYING3.  If not see
>  (define_operator_list TAN BUILT_IN_TANF BUILT_IN_TAN BUILT_IN_TANL)
>  (define_operator_list COSH BUILT_IN_COSHF BUILT_IN_COSH BUILT_IN_COSHL)
>  (define_operator_list CEXPI BUILT_IN_CEXPIF BUILT_IN_CEXPI BUILT_IN_CEXPIL)
> +(define_operator_list CCOS BUILT_IN_CCOSF BUILT_IN_CCOS BUILT_IN_CCOSL)
> +(define_operator_list CCOSH BUILT_IN_CCOSHF BUILT_IN_CCOSH BUILT_IN_CCOSHL)
> +(define_operator_list HYPOT BUILT_IN_HYPOTF BUILT_IN_HYPOT BUILT_IN_HYPOTL)
> +(define_operator_list COPYSIGN BUILT_IN_COPYSIGNF

[mask conversion, patch 1/2] Add pattern for mask conversions

2015-10-19 Thread Ilya Enkovich
Hi,

This patch adds a vectorization pattern which detects cases where a mask
conversion is needed and inserts the required conversion statement.  This is
done for all statements which may consume a mask.  Some additional changes
were made to support MASK_LOAD with a pattern and to allow scalar mode for
the vectype of a pattern stmt.  It is applied on top of all the other
boolean vector series.  Does it look OK?
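
A case that needs the new pattern (illustrative, modeled on the tests in
the i386 patch): with AVX-512 the comparison below produces a mask typed
for 64-bit elements while the masked store of ints consumes a mask typed
for 32-bit elements, so a mask conversion statement has to be generated
in between:

  long l1[1024], l2[1024];
  int  i1[1024];

  void f (void)
  {
    for (int i = 0; i < 1024; i++)
      if (l1[i] > l2[i])   /* mask from a comparison of longs */
        i1[i] = 1;         /* consumed by a masked store of ints */
  }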

Thanks,
Ilya
--
gcc/

2015-10-19  Ilya Enkovich  

* optabs.c (expand_binop_directly): Allow scalar mode for
vec_pack_trunc_optab.
* tree-vect-loop.c (vect_determine_vectorization_factor): Skip
boolean vector producers from pattern sequence when computing VF.
* tree-vect-patterns.c (vect_vect_recog_func_ptrs): Add
vect_recog_mask_conversion_pattern.
(search_type_for_mask): Choose the smallest
type if different size types are mixed.
(build_mask_conversion): New.
(vect_recog_mask_conversion_pattern): New.
(vect_pattern_recog_1): Allow scalar mode for boolean vectype.
* tree-vect-stmts.c (vectorizable_mask_load_store): Support masked
load with pattern.
(vectorizable_conversion): Support boolean vectors.
(free_stmt_vec_info): Allow patterns for statements with no lhs.
* tree-vectorizer.h (NUM_PATTERNS): Increase to 14.


diff --git a/gcc/optabs.c b/gcc/optabs.c
index 83f4be3..8d61d33 100644
--- a/gcc/optabs.c
+++ b/gcc/optabs.c
@@ -1055,7 +1055,8 @@ expand_binop_directly (machine_mode mode, optab binoptab,
   /* The mode of the result is different then the mode of the
 arguments.  */
   tmp_mode = insn_data[(int) icode].operand[0].mode;
-  if (GET_MODE_NUNITS (tmp_mode) != 2 * GET_MODE_NUNITS (mode))
+  if (VECTOR_MODE_P (mode)
+ && GET_MODE_NUNITS (tmp_mode) != 2 * GET_MODE_NUNITS (mode))
{
  delete_insns_since (last);
  return NULL_RTX;
diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index 14804b3..e388533 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -497,6 +497,17 @@ vect_determine_vectorization_factor (loop_vec_info 
loop_vinfo)
}
 }
 
+ /* Boolean vectors don't affect VF.  */
+ if (VECTOR_BOOLEAN_TYPE_P (vectype))
+   {
+ if (!analyze_pattern_stmt && gsi_end_p (pattern_def_si))
+   {
+ pattern_def_seq = NULL;
+ gsi_next (&si);
+   }
+ continue;
+   }
+
  /* The vectorization factor is according to the smallest
 scalar type (or the largest vector size, but we only
 support one vector size per loop).  */
diff --git a/gcc/tree-vect-patterns.c b/gcc/tree-vect-patterns.c
index a737129..34b1ea6 100644
--- a/gcc/tree-vect-patterns.c
+++ b/gcc/tree-vect-patterns.c
@@ -76,6 +76,7 @@ static gimple *vect_recog_mult_pattern (vec *,
 static gimple *vect_recog_mixed_size_cond_pattern (vec *,
  tree *, tree *);
 static gimple *vect_recog_bool_pattern (vec *, tree *, tree *);
+static gimple *vect_recog_mask_conversion_pattern (vec *, tree *, 
tree *);
 static vect_recog_func_ptr vect_vect_recog_func_ptrs[NUM_PATTERNS] = {
vect_recog_widen_mult_pattern,
vect_recog_widen_sum_pattern,
@@ -89,7 +90,8 @@ static vect_recog_func_ptr 
vect_vect_recog_func_ptrs[NUM_PATTERNS] = {
vect_recog_divmod_pattern,
vect_recog_mult_pattern,
vect_recog_mixed_size_cond_pattern,
-   vect_recog_bool_pattern};
+   vect_recog_bool_pattern,
+   vect_recog_mask_conversion_pattern};
 
 static inline void
 append_pattern_def_seq (stmt_vec_info stmt_info, gimple *stmt)
@@ -3180,7 +3182,7 @@ search_type_for_mask (tree var, vec_info *vinfo)
   enum vect_def_type dt;
   tree rhs1;
   enum tree_code rhs_code;
-  tree res = NULL;
+  tree res = NULL, res2;
 
   if (TREE_CODE (var) != SSA_NAME)
 return NULL;
@@ -3213,13 +3215,26 @@ search_type_for_mask (tree var, vec_info *vinfo)
 case BIT_AND_EXPR:
 case BIT_IOR_EXPR:
 case BIT_XOR_EXPR:
-  if (!(res = search_type_for_mask (rhs1, vinfo)))
-   res = search_type_for_mask (gimple_assign_rhs2 (def_stmt), vinfo);
+  res = search_type_for_mask (rhs1, vinfo);
+  res2 = search_type_for_mask (gimple_assign_rhs2 (def_stmt), vinfo);
+  if (!res || (res2 && TYPE_PRECISION (res) > TYPE_PRECISION (res2)))
+   res = res2;
   break;
 
 default:
   if (TREE_CODE_CLASS (rhs_code) == tcc_comparison)
{
+ tree comp_vectype, mask_type;
+
+ comp_vectype = get_vectype_for_scalar_type (TREE_TYPE (rhs1));
+ if (comp_vectype == NULL_TREE)
+   return NULL;
+
+ mask_type = get_mask_type_for_scalar_type (TREE_TYPE (rhs1));
+ if (!mask_type
+ || !expand_vec_cmp_expr_p (comp_vectype, mask_type))
+   return NULL;
+
  if (TREE_CODE (TREE_TYPE (rhs1)) != INTEGER_TYPE

Re: Add VIEW_CONVERT_EXPR to operand_equal_p

2015-10-19 Thread Richard Biener
On Sun, Oct 18, 2015 at 7:14 PM, Jan Hubicka  wrote:
>>
>> Adding back the mode check is fine if all types with the same TYPE_CANONICAL 
>> have the same mode.  Otherwise we'd regress here.  I thought we do for
>>
>> struct x { int i; };
>> typedef struct x y __attribute__((packed));
>>
>> And then doing
>>
>> struct x X;
>> y Y;
>> X = Y;
>
> Do you have any idea how to turn this into a testcase? I don't think we could
> add packed attribute to typedef. Even in
> gimple_canonical_types_compatible_p
>   /* Can't be the same type if they have different mode.  */
>   if (TYPE_MODE (t1) != TYPE_MODE (t2))
> return false;
> (which IMO may be wrong WRT -mavx flags where modes of same types may be 
> different
> in different TUs)

Ok, so the following works:

struct x { int i; };
typedef struct x y __attribute__((aligned(1)));

void foo (void)
{
  struct x X;
  y Y;
  X = Y;
}

but we use SImode for y as well even though its alignment is just one byte ...

Not sure what happens on strict-align targets for this, and not sure how this
can _not_ be a problem.  Consider

void bar (struct x);

and

bar (Y);

or using y *Y and X = *Y or bar (*Y).

> Therefore I would say that TYPE_CANONICAL determines the mode, modulo the fact
> that an incomplete variant of a complete type will have VOIDmode instead of the
> complete type's mode (during non-LTO).  That is why I allow mode changes for
> casts from complete to incomplete.

Incomplete types have VOIDmode, right?

> In longer run I think that every query to useless_type_conversion_p that
> contains incomplete types is a confused query.  useless_type_conversion_p is
> about operations on the value and there are no operations for incomplete type
> (and function types).  I know that ipa-icf-gimple and the following code in
> gimplify-stmt checks this frequently:
>   /* The FEs may end up building ADDR_EXPRs early on a decl with
>  an incomplete type.  Re-build ADDR_EXPRs in canonical form
>  here.  */
>   if (!types_compatible_p (TREE_TYPE (op0), TREE_TYPE (TREE_TYPE (expr
> *expr_p = build_fold_addr_expr (op0);
> Taking address of incomplete type or functions, naturally, makes sense.  We 
> may
> want to check something else here, like simply
>TREE_TYPE (op0) != TREE_TYPE (TREE_TYPE (expr))
> and once ipa-icf is cleaned up, start sanity checking in
> useless_type_conversion
> that we use it to force equality only on types that do have values.
>
> We also can trip it when checking TYPE_METHOD_BASETYPE which may be 
> incomplete.
> This is in the code checking useless_type_conversion on functions that I think
> are confused queries anyway - we need the ABI matcher, I am looking into 
> that.

Ok, so given we seem to be fine in practice with TYPE_MODE (type) ==
TYPE_MODE (TYPE_CANONICAL (type))
(whether that's a bug or not ...) I'm fine with re-instantiating the
mode check for
aggregate types.  Please do that with

Index: gcc/gimple-expr.c
===
--- gcc/gimple-expr.c   (revision 228963)
+++ gcc/gimple-expr.c   (working copy)
@@ -89,8 +89,7 @@ useless_type_conversion_p (tree outer_ty

   /* Changes in machine mode are never useless conversions unless we
  deal with aggregate types in which case we defer to later checks.  */
-  if (TYPE_MODE (inner_type) != TYPE_MODE (outer_type)
-  && !AGGREGATE_TYPE_P (inner_type))
+  if (TYPE_MODE (inner_type) != TYPE_MODE (outer_type))
 return false;

   /* If both the inner and outer types are integral types, then the

Can we assess equal sizes when modes are non-BLKmode then?  Thus

@@ -270,10 +269,9 @@ useless_type_conversion_p (tree outer_ty
  use the types in move operations.  */
   else if (AGGREGATE_TYPE_P (inner_type)
   && TREE_CODE (inner_type) == TREE_CODE (outer_type))
-return (!TYPE_SIZE (outer_type)
-   || (TYPE_SIZE (inner_type)
-   && operand_equal_p (TYPE_SIZE (inner_type),
-   TYPE_SIZE (outer_type), 0)));
+return (TYPE_MODE (outer_type) != BLKmode
+   || operand_equal_p (TYPE_SIZE (inner_type),
+   TYPE_SIZE (outer_type), 0));

   else if (TREE_CODE (inner_type) == OFFSET_TYPE
   && TREE_CODE (outer_type) == OFFSET_TYPE)

?  Hoping for VOIDmode incomplete case.

Richard.

> Honza
>>
>> Richard.
>>
>>
>> >Honza
>> >>
>> >> --
>> >> Eric Botcazou
>>


[mask conversion, patch 2/2, i386] Add pack/unpack patterns for scalar masks

2015-10-19 Thread Ilya Enkovich
Hi,

This patch adds the patterns to be used for vector mask pack/unpack on AVX512.
Bootstrapped and tested on x86_64-unknown-linux-gnu.  Does it look OK?
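
In scalar terms, vec_pack_trunc_qi just concatenates two 8-bit masks into
one 16-bit mask (an illustration of what the expander emits, not code from
the patch):

  unsigned short pack (unsigned char op1, unsigned char op2)
  {
    /* Operand 1 lands in the high byte, operand 2 in the low byte.  */
    return ((unsigned short) op1 << 8) | op2;
  }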

Thanks,
Ilya
--
gcc/

2015-10-19  Ilya Enkovich  

* config/i386/sse.md (HALFMASKMODE): New attribute.
(DOUBLEMASKMODE): New attribute.
(vec_pack_trunc_qi): New.
(vec_pack_trunc_<mode>): New.
(vec_unpacku_lo_hi): New.
(vec_unpacku_lo_si): New.
(vec_unpacku_lo_di): New.
(vec_unpacku_hi_hi): New.
(vec_unpacku_hi_<mode>): New.

gcc/testsuite/

2015-10-19  Ilya Enkovich  

* gcc.target/i386/mask-pack.c: New test.
* gcc.target/i386/mask-unpack.c: New test.


diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 452629f..ed0eedc 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -799,6 +799,14 @@
   [(V32QI "t") (V16HI "t") (V8SI "t") (V4DI "t") (V8SF "t") (V4DF "t")
(V64QI "g") (V32HI "g") (V16SI "g") (V8DI "g") (V16SF "g") (V8DF "g")])
 
+;; Half mask mode for unpacks
+(define_mode_attr HALFMASKMODE
+  [(DI "SI") (SI "HI")])
+
+;; Double mask mode for packs
+(define_mode_attr DOUBLEMASKMODE
+  [(HI "SI") (SI "DI")])
+
 
 ;; Include define_subst patterns for instructions with mask
 (include "subst.md")
@@ -11578,6 +11586,23 @@
   DONE;
 })
 
+(define_expand "vec_pack_trunc_qi"
+  [(set (match_operand:HI 0 ("register_operand"))
+(ior:HI (ashift:HI (zero_extend:HI (match_operand:QI 1 
("register_operand")))
+   (const_int 8))
+(zero_extend:HI (match_operand:QI 2 ("register_operand")]
+  "TARGET_AVX512F")
+
+(define_expand "vec_pack_trunc_"
+  [(set (match_operand: 0 ("register_operand"))
+(ior: (ashift: 
(zero_extend: (match_operand:SWI24 1 ("register_operand")))
+   (match_dup 3))
+(zero_extend: (match_operand:SWI24 2 
("register_operand")]
+  "TARGET_AVX512BW"
+{
+  operands[3] = GEN_INT (GET_MODE_BITSIZE (mode));
+})
+
 (define_insn "_packsswb"
   [(set (match_operand:VI1_AVX512 0 "register_operand" "=x,x")
(vec_concat:VI1_AVX512
@@ -13474,12 +13499,42 @@
   "TARGET_SSE2"
   "ix86_expand_sse_unpack (operands[0], operands[1], true, false); DONE;")
 
+(define_expand "vec_unpacku_lo_hi"
+  [(set (match_operand:QI 0 "register_operand")
+(subreg:QI (match_operand:HI 1 "register_operand") 0))]
+  "TARGET_AVX512DQ")
+
+(define_expand "vec_unpacku_lo_si"
+  [(set (match_operand:HI 0 "register_operand")
+(subreg:HI (match_operand:SI 1 "register_operand") 0))]
+  "TARGET_AVX512F")
+
+(define_expand "vec_unpacku_lo_di"
+  [(set (match_operand:SI 0 "register_operand")
+(subreg:SI (match_operand:DI 1 "register_operand") 0))]
+  "TARGET_AVX512BW")
+
 (define_expand "vec_unpacku_hi_"
   [(match_operand: 0 "register_operand")
(match_operand:VI124_AVX2_24_AVX512F_1_AVX512BW 1 "register_operand")]
   "TARGET_SSE2"
   "ix86_expand_sse_unpack (operands[0], operands[1], true, true); DONE;")
 
+(define_expand "vec_unpacku_hi_hi"
+  [(set (subreg:HI (match_operand:QI 0 "register_operand") 0)
+(lshiftrt:HI (match_operand:HI 1 "register_operand")
+ (const_int 8)))]
+  "TARGET_AVX512F")
+
+(define_expand "vec_unpacku_hi_"
+  [(set (subreg:SWI48x (match_operand: 0 "register_operand") 0)
+(lshiftrt:SWI48x (match_operand:SWI48x 1 "register_operand")
+ (match_dup 2)))]
+  "TARGET_AVX512BW"
+{
+  operands[2] = GEN_INT (GET_MODE_BITSIZE (mode));
+})
+
 ;
 ;;
 ;; Miscellaneous
diff --git a/gcc/testsuite/gcc.target/i386/mask-pack.c 
b/gcc/testsuite/gcc.target/i386/mask-pack.c
new file mode 100644
index 000..0b564ef
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/mask-pack.c
@@ -0,0 +1,100 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O3 -fopenmp-simd -fdump-tree-vect-details" } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 10 "vect" } } */
+/* { dg-final { scan-assembler-not "maskmov" } } */
+
+#define LENGTH 1000
+
+long l1[LENGTH], l2[LENGTH];
+int i1[LENGTH], i2[LENGTH];
+short s1[LENGTH], s2[LENGTH];
+char c1[LENGTH], c2[LENGTH];
+double d1[LENGTH], d2[LENGTH];
+
+int test1 (int n)
+{
+  int i;
+  #pragma omp simd safelen(16)
+  for (i = 0; i < LENGTH; i++)
+if (l1[i] > l2[i])
+  i1[i] = 1;
+}
+
+int test2 (int n)
+{
+  int i;
+  #pragma omp simd safelen(16)
+  for (i = 0; i < LENGTH; i++)
+if (i1[i] > i2[i])
+  s1[i] = 1;
+}
+
+int test3 (int n)
+{
+  int i;
+  #pragma omp simd safelen(16)
+  for (i = 0; i < LENGTH; i++)
+if (s1[i] > s2[i])
+  c1[i] = 1;
+}
+
+int test4 (int n)
+{
+  int i;
+  #pragma omp simd safelen(16)
+  for (i = 0; i < LENGTH; i++)
+if (d1[i] > d2[i])
+  c1[i] = 1;
+}
+
+int test5 (int n)
+{
+  int i;
+  #pragma omp simd safelen(16)
+  for (i = 0; i < LENGTH; i++)
+i1[i] = l1[i] > l2[i] ? 3 : 4;
+}
+
+int test6 (int n)
+{
+

Re: [c++-delayed-folding] First stab at convert_to_integer

2015-10-19 Thread Marek Polacek
On Fri, Oct 16, 2015 at 02:07:51PM -1000, Jason Merrill wrote:
> On 10/16/2015 07:35 AM, Marek Polacek wrote:
> >>This code path seems to be for pushing a conversion down into a binary
> >>expression.  We shouldn't do this at all when we aren't folding.
> >
> >I tend to agree, but this case is tricky.  What's this code about is
> >e.g. for
> >
> >int
> >fn (long p, long o)
> >{
> >   return p + o;
> >}
> >
> >we want to narrow the operation and do the addition on unsigned ints and then
> >convert to int.  We do it here because we're still missing the
> >promotion/demotion pass on GIMPLE (PR45397 / PR47477).  Disabling this
> >optimization here would regress a few testcases, so I kept the code as it 
> >was.
> >Thoughts?
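
For reference, the narrowed form would be (a sketch of the intent, not
code from the patch):

  int fn (long p, long o)
  {
    /* Do the addition in unsigned int, where wrap-around is well defined,
       and convert once at the end; the result is truncated to int anyway.  */
    return (int) ((unsigned int) p + (unsigned int) o);
  }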
> 
> That makes sense, but please add a comment referring to one of those PRs and
> also add a note to the PR about this place.  OK with that change.
 
Done.  But I can't seem to commit the patch to the c++-delayed-folding
branch; is that somehow restricted?  I'm getting:

svn: E170001: Commit failed (details follow):
svn: E170001: Authorization failed
svn: E170001: Your commit message was left in a temporary file:
svn: E170001:'/home/marek/svn/c++-delayed-folding/svn-commit.tmp'

and I've checked out the branch using
svn co svn://mpola...@gcc.gnu.org/svn/gcc/branches/c++-delayed-folding/

> >Moreover, there are some places in the C++ FE where we still call
> >convert_to_integer and not convert_to_integer_nofold -- should they be
> >changed to the _nofold variant?
> 
> Not in build_base_path; its arithmetic is compiler generated and should
> really be delayed until genericize anyway.  Likewise for
> get_delta_difference.
> 
> I think the call in finish_omp_clauses could change.

All right, I'll submit a separate patch.  Thanks,

Marek


Re: Move some bit and binary optimizations in simplify and match

2015-10-19 Thread Richard Biener
On Mon, Oct 19, 2015 at 1:14 PM, Hurugalawadi, Naveen
 wrote:
> Hi,
>
>>> That's not what Richard meant. We already have:
>
> Done. As per the comments.
>
> Please find attached the modified patch as per your comments.
>
> Please review them and let me know if any further modifications are required.

This patch is ok when bootstrapped / tested and with a proper changelog entry.

Thanks,
Richard.

> Thanks,
> Naveen


Re: Add a pass to back-propagate use information

2015-10-19 Thread Richard Sandiford
Richard Biener  writes:
> On Thu, Oct 15, 2015 at 3:17 PM, Richard Sandiford
>  wrote:
>> This patch adds a pass that collects information that is common to all
>> uses of an SSA name X and back-propagates that information up the statements
>> that generate X.  The general idea is to use the information to simplify
>> instructions (rather than a pure DCE) so I've simply called it
>> tree-ssa-backprop.c, to go with tree-ssa-forwprop.c.
>>
>> At the moment the only use of the pass is to remove unnecessary sign
>> operations, so that it's effectively a global version of
>> fold_strip_sign_ops.  I'm hoping it could be extended in future to
>> record which bits of an integer are significant.  There are probably
>> other potential uses too.
>>
>> A later patch gets rid of fold_strip_sign_ops.
>>
>> Tested on x86_64-linux-gnu, aarch64-linux-gnu and arm-linux-gnueabi.
>> OK to install?
>>
>> Thanks,
>> Richard
>>
>>
>> gcc/
>> * doc/invoke.texi (-fdump-tree-backprop, -ftree-backprop): Document.
>> * Makefile.in (OBJS): Add tree-ssa-backprop.o.
>> * common.opt (ftree-backprop): New option.
>> * fold-const.h (negate_mathfn_p): Declare.
>> * fold-const.c (negate_mathfn_p): Make public.
>> * timevar.def (TV_TREE_BACKPROP): New.
>> * tree-passes.h (make_pass_backprop): Declare.
>> * passes.def (pass_backprop): Add.
>> * tree-ssa-backprop.c: New file.
>>
>> gcc/testsuite/
>> * gcc.dg/tree-ssa/backprop-1.c, gcc.dg/tree-ssa/backprop-2.c,
>> gcc.dg/tree-ssa/backprop-3.c, gcc.dg/tree-ssa/backprop-4.c,
>> gcc.dg/tree-ssa/backprop-5.c, gcc.dg/tree-ssa/backprop-6.c: New 
>> tests.
>>
>> diff --git a/gcc/Makefile.in b/gcc/Makefile.in
>> index 783e4c9..69e669d 100644
>> --- a/gcc/Makefile.in
>> +++ b/gcc/Makefile.in
>> @@ -1445,6 +1445,7 @@ OBJS = \
>> tree-switch-conversion.o \
>> tree-ssa-address.o \
>> tree-ssa-alias.o \
>> +   tree-ssa-backprop.o \
>> tree-ssa-ccp.o \
>> tree-ssa-coalesce.o \
>> tree-ssa-copy.o \
>> diff --git a/gcc/common.opt b/gcc/common.opt
>> index 5060208..5aef625 100644
>> --- a/gcc/common.opt
>> +++ b/gcc/common.opt
>> @@ -2364,6 +2364,10 @@ ftree-pta
>>  Common Report Var(flag_tree_pta) Optimization
>>  Perform function-local points-to analysis on trees.
>>
>> +ftree-backprop
>> +Common Report Var(flag_tree_backprop) Init(1) Optimization
>> +Enable backward propagation of use properties at the tree level.
>
> Don't add new -ftree-* options; "tree" doesn't add any info for our users.  I'd
> also refer to the SSA level rather than the "tree" level.  Not sure if -fbackprop
> is good, but let's go for it.

OK.

>> diff --git a/gcc/fold-const.c b/gcc/fold-const.c
>> index de45a2c..7f00e72 100644
>> --- a/gcc/fold-const.c
>> +++ b/gcc/fold-const.c
>> @@ -319,7 +318,7 @@ fold_overflow_warning (const char* gmsgid, enum
> warn_strict_overflow_code wc)
>>  /* Return true if the built-in mathematical function specified by CODE
>> is odd, i.e. -f(x) == f(-x).  */
>>
>> -static bool
>> +bool
>>  negate_mathfn_p (enum built_in_function code)
>>  {
>>switch (code)
>
> Belongs more to builtins.[ch] if exported.

The long-term plan is to abstract away whether it's a built-in function
or an internal function, in which case I hope to have a single predicate
that handles both.  I'm not sure where the code should go after that change.
Maybe a new file?

>> diff --git a/gcc/fold-const.h b/gcc/fold-const.h
>> index ee74dc8..4d5b24b 100644
>> --- a/gcc/fold-const.h
>> +++ b/gcc/fold-const.h
>> @@ -173,6 +173,7 @@ extern tree sign_bit_p (tree, const_tree);
>>  extern tree exact_inverse (tree, tree);
>>  extern tree const_unop (enum tree_code, tree, tree);
>>  extern tree const_binop (enum tree_code, tree, tree, tree);
>> +extern bool negate_mathfn_p (enum built_in_function);
>>
>>  /* Return OFF converted to a pointer offset type suitable as offset for
>> POINTER_PLUS_EXPR.  Use location LOC for this conversion.  */
>> diff --git a/gcc/passes.def b/gcc/passes.def
>> index dc3f44c..36d2b3b 100644
>> --- a/gcc/passes.def
>> +++ b/gcc/passes.def
>> @@ -159,6 +159,7 @@ along with GCC; see the file COPYING3.  If not see
>>/* After CCP we rewrite no longer addressed locals into SSA
>>  form if possible.  */
>>NEXT_PASS (pass_complete_unrolli);
>> +  NEXT_PASS (pass_backprop);
>>NEXT_PASS (pass_phiprop);
>>NEXT_PASS (pass_forwprop);
>
> Any reason to not put this later?  I was thinking before reassoc.

I think we're relying on FRE to notice the redundancy in the
builtins-*.c tests, once this pass has converted the version
with redundant sign ops to make it look like the version without.
reassoc is likely to be too late.

I also thought it should go before rather than after some instance
of forwprop because the pass might expose more forward folding
opportunities.  E.g. if the sign of A = -B * B doesn't matter,
we'll end up with A = B * B, which migh

Re: [PATCH, rs6000] Pass --secure-plt to the linker

2015-10-19 Thread Szabolcs Nagy

On 19/10/15 12:12, Alan Modra wrote:

On Thu, Oct 15, 2015 at 06:50:50PM +0100, Szabolcs Nagy wrote:

A powerpc toolchain built with (or without) --enable-secureplt
currently creates a binary that uses bss plt if

(1) any of the linked PIC objects have bss plt relocs
(2) or all the linked objects are non-PIC or have no relocs,

because this is the binutils linker behaviour.

This patch passes --secure-plt to the linker which makes the linker
warn in case (1) and produce a binary with secure plt in case (2).


The idea is OK I think, but


@@ -574,6 +577,7 @@ ENDIAN_SELECT(" -mbig", " -mlittle", DEFAULT_ASM_ENDIAN)
  %{R*} \
  %(link_shlib) \
  %{!T*: %(link_start) } \
+%{!static: %(link_secure_plt_default)} \
  %(link_os)"


this change needs to be conditional on !mbss-plt too.



OK, will change that.

if -msecure-plt and -mbss-plt are supposed to affect
linking too (not just code gen) then shall I add
%{msecure-plt: --secure-plt} too?



Re: [PATCH 1/9] ENABLE_CHECKING refactoring

2015-10-19 Thread Mikhail Maltsev
On 10/19/2015 02:13 PM, Bernd Schmidt wrote:
> But for normal C conditions the patches end up using flag_checking, so
> the CHECKING_P macro buys us nothing over ENABLE_CHECKING.
Presumably 'if (CHECKING_P)' can be used for performance-critical parts
(in this case the condition will be DCE-d) and also for those parts of
the compiler which we want to decouple from 'options.h'.
IIRC, Jeff's idea was to get rid of 'ENABLE_CHECKING' completely and use
either 'flag_checking' or 'CHECKING_P'.  But I don't know what the
consensus on it is (I would like to hear Jeff's and Richard's opinion).
Of course it will be easy for me to adjust the patch to whatever the
final decision will be.
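
To illustrate the distinction (the verifier call is hypothetical; the
point is compile-time vs run-time evaluation of the guard):

/* CHECKING_P is a compile-time 0/1 constant, so this branch is
   removed entirely by DCE in a non-checking build.  */
if (CHECKING_P)
  verify_some_expensive_invariant ();

/* flag_checking is an ordinary variable behind -fchecking, so this
   test survives into the released compiler.  */
if (flag_checking)
  verify_some_expensive_invariant ();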

-- 
Regards,
Mikhail Maltsev


[PATCH] Move cproj simplification to match.pd

2015-10-19 Thread Richard Biener

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2015-10-19  Richard Biener  

* gimple-fold.c (gimple_phi_nonnegative_warnv_p): New function.
(gimple_stmt_nonnegative_warnv_p): Use it.
* match.pd (CPROJ): New operator list.
(cproj (complex ...)): Move simplifications from ...
* builtins.c (fold_builtin_cproj): ... here.

* gcc.dg/torture/builtin-cproj-1.c: Skip for -O0.

Index: gcc/gimple-fold.c
===
--- gcc/gimple-fold.c   (revision 228877)
+++ gcc/gimple-fold.c   (working copy)
@@ -6224,6 +6224,24 @@ gimple_call_nonnegative_warnv_p (gimple
strict_overflow_p, depth);
 }
 
+/* Return true if return value of call STMT is known to be non-negative.
+   If the return value is based on the assumption that signed overflow is
+   undefined, set *STRICT_OVERFLOW_P to true; otherwise, don't change
+   *STRICT_OVERFLOW_P.  DEPTH is the current nesting depth of the query.  */
+
+static bool
+gimple_phi_nonnegative_warnv_p (gimple *stmt, bool *strict_overflow_p,
+   int depth)
+{
+  for (unsigned i = 0; i < gimple_phi_num_args (stmt); ++i)
+{
+  tree arg = gimple_phi_arg_def (stmt, i);
+  if (!tree_single_nonnegative_warnv_p (arg, strict_overflow_p, depth + 1))
+   return false;
+}
+  return true;
+}
+
 /* Return true if STMT is known to compute a non-negative value.
If the return value is based on the assumption that signed overflow is
undefined, set *STRICT_OVERFLOW_P to true; otherwise, don't change
@@ -6241,6 +6259,9 @@ gimple_stmt_nonnegative_warnv_p (gimple
 case GIMPLE_CALL:
   return gimple_call_nonnegative_warnv_p (stmt, strict_overflow_p,
  depth);
+case GIMPLE_PHI:
+  return gimple_phi_nonnegative_warnv_p (stmt, strict_overflow_p,
+depth);
 default:
   return false;
 }
Index: gcc/match.pd
===
--- gcc/match.pd(revision 228877)
+++ gcc/match.pd(working copy)
@@ -61,6 +61,7 @@ (define_operator_list COS BUILT_IN_COSF
 (define_operator_list TAN BUILT_IN_TANF BUILT_IN_TAN BUILT_IN_TANL)
 (define_operator_list COSH BUILT_IN_COSHF BUILT_IN_COSH BUILT_IN_COSHL)
 (define_operator_list CEXPI BUILT_IN_CEXPIF BUILT_IN_CEXPI BUILT_IN_CEXPIL)
+(define_operator_list CPROJ BUILT_IN_CPROJF BUILT_IN_CPROJ BUILT_IN_CPROJL)
 
 /* Simplifications of operations with one constant operand and
simplifications to constants or single values.  */
@@ -2361,6 +2362,32 @@ (define_operator_list CEXPI BUILT_IN_CEX
(cbrts (pows tree_expr_nonnegative_p@0 @1))
(pows @0 (mult @1 { build_real_truncate (type, dconst_third ()); })
 
+/* If the real part is inf and the imag part is known to be
+   nonnegative, return (inf + 0i).  */
+(simplify
+ (CPROJ (complex REAL_CST@0 tree_expr_nonnegative_p@1))
+ (if (real_isinf (TREE_REAL_CST_PTR (@0)))
+  (with
+{
+  REAL_VALUE_TYPE rinf;
+  real_inf (&rinf);
+}
+   { build_complex (type, build_real (TREE_TYPE (type), rinf),
+   build_zero_cst (TREE_TYPE (type))); })))
+/* If the imag part is inf, return (inf+I*copysign(0,imag)).  */
+(simplify
+ (CPROJ (complex @0 REAL_CST@1))
+ (if (real_isinf (TREE_REAL_CST_PTR (@1)))
+  (with
+{
+  REAL_VALUE_TYPE rinf, rzero = dconst0;
+  real_inf (&rinf);
+  rzero.sign = TREE_REAL_CST_PTR (@1)->sign;
+}
+   { build_complex (type, build_real (TREE_TYPE (type), rinf),
+   build_real (TREE_TYPE (type), rzero)); })))
+
+
 /* Narrowing of arithmetic and logical operations. 
 
These are conceptually similar to the transformations performed for
Index: gcc/builtins.c
===
--- gcc/builtins.c  (revision 228877)
+++ gcc/builtins.c  (working copy)
@@ -7657,33 +7657,6 @@ fold_builtin_cproj (location_t loc, tree
   else
return arg;
 }
-  else if (TREE_CODE (arg) == COMPLEX_EXPR)
-{
-  tree real = TREE_OPERAND (arg, 0);
-  tree imag = TREE_OPERAND (arg, 1);
-
-  STRIP_NOPS (real);
-  STRIP_NOPS (imag);
-  
-  /* If the real part is inf and the imag part is known to be
-nonnegative, return (inf + 0i).  Remember side-effects are
-possible in the imag part.  */
-  if (TREE_CODE (real) == REAL_CST
- && real_isinf (TREE_REAL_CST_PTR (real))
- && tree_expr_nonnegative_p (imag))
-   return omit_one_operand_loc (loc, type,
-build_complex_cproj (type, false),
-arg);
-  
-  /* If the imag part is inf, return (inf+I*copysign(0,imag)).
-Remember side-effects are possible in the real part.  */
-  if (TREE_CODE (imag) == REAL_CST
- && real_

[PATCH][AArch64][1/2] Add fmul-by-power-of-2+fcvt optimisation

2015-10-19 Thread Kyrill Tkachov

Hi all,

The fcvtzs and fcvtzu instructions have a form where they convert to a
fixed-point form with a specified number of fractional bits.  In practice
this has the effect of multiplying the floating point argument by 2^<fbits>
and then converting the result to integer.  We can exploit that behaviour
during combine to eliminate a floating-point multiplication by an FP
immediate that is a power of 2, i.e. 4.0, 8.0, 16.0 etc.
For example for code:
int
sffoo1 (float a)
{
  return a * 4.0f;
}

we currently generate:
sffoo1:
        fmov    s1, 4.0e+0
        fmul    s0, s0, s1
        fcvtzs  w0, s0
        ret

with this patch we can generate:
sffoo1:
        fcvtzs  w0, s0, #2
        ret
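
As a quick sanity check of the equivalence (plain C, no target
assumptions): fcvtzs with N fractional bits computes (int)(x * 2^N)
truncated towards zero, so #2 matches the multiplication by 4.0f:

#include <stdio.h>

int
main (void)
{
  float a = 3.3f;
  int via_mul   = (int) (a * 4.0f);      /* what the original code computes */
  int via_fbits = (int) (a * (1 << 2));  /* what fcvtzs ..., #2 computes */
  printf ("%d %d\n", via_mul, via_fbits);  /* both print 13 */
  return 0;
}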

We already perform the analogous combination for the arm target (see the
*combine_vcvtf2i pattern in config/arm/vfp.md).
This patch also implements the fcvtzu form, i.e. the unsigned_fix form,
as well as the vector forms.

However, not everything is rosy. The code:
int
foo (float a)
{
  return a * 32.0f;
}

will not trigger the optimisation because 32.0f is stored in the constant pool
and, due to a deficiency in simplify-rtx.c, the simplification doesn't get
through.  I have a patch to fix that as part 2/2.

Also, for code:
int
foo (float a)
{
  return a * 2.0f;
}

This gets folded early on as a + a and thus emits an fadd instruction followed 
by a fcvtzs.
Nothing we can do about that (in this patch at least).

I've seen this trigger once in 453.povray in SPEC2006 and one other time in 
435.gromacs after
patch 2/2 is applied. I've heard this can also trigger in codec-like codebases 
and I did see it
trigger a few times in ffmpeg.

Bootstrapped and tested on aarch64.

Ok for trunk?

Thanks,
Kyrill

2015-10-19  Kyrylo Tkachov  

* config/aarch64/aarch64.md
 (*aarch64_fcvt2_mult): New pattern.
* config/aarch64/aarch64-simd.md
 (*aarch64_fcvt2_mult): Likewise.
* config/aarch64/aarch64.c (aarch64_rtx_costs): Handle above patterns.
(aarch64_fpconst_pow_of_2): New function.
(aarch64_vec_fpconst_pow_of_2): Likewise.
* config/aarch64/aarch64-protos.h (aarch64_fpconst_pow_of_2): Declare
prototype.
(aarch64_vec_fpconst_pow_of_2): Likewise.
* config/aarch64/predicates.md (aarch64_fp_pow2): New predicate.
(aarch64_fp_vec_pow2): Likewise.

2015-10-19  Kyrylo Tkachov  

* gcc.target/aarch64/fmul_fcvt_1.c: New test.
* gcc.target/aarch64/fmul_fcvt_2.c: Likewise.
commit a13a5967a1f94744776d616ca84d5512b24bf546
Author: Kyrylo Tkachov 
Date:   Thu Oct 8 15:17:47 2015 +0100

[AArch64] Add fmul+fcvt optimisation

diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index a8ac8d3..309dcfb 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -294,12 +294,14 @@ enum aarch64_symbol_type aarch64_classify_symbol (rtx, rtx);
 enum aarch64_symbol_type aarch64_classify_tls_symbol (rtx);
 enum reg_class aarch64_regno_regclass (unsigned);
 int aarch64_asm_preferred_eh_data_format (int, int);
+int aarch64_fpconst_pow_of_2 (rtx);
 machine_mode aarch64_hard_regno_caller_save_mode (unsigned, unsigned,
 		   machine_mode);
 int aarch64_hard_regno_mode_ok (unsigned, machine_mode);
 int aarch64_hard_regno_nregs (unsigned, machine_mode);
 int aarch64_simd_attr_length_move (rtx_insn *);
 int aarch64_uxt_size (int, HOST_WIDE_INT);
+int aarch64_vec_fpconst_pow_of_2 (rtx);
 rtx aarch64_final_eh_return_addr (void);
 rtx aarch64_legitimize_reload_address (rtx *, machine_mode, int, int, int);
 const char *aarch64_output_move_struct (rtx *operands);
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 6a2ab61..3d2c496 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -1654,6 +1654,27 @@ (define_insn "l2"
   [(set_attr "type" "neon_fp_to_int_")]
 )
 
+(define_insn "*aarch64_fcvt2_mult"
+  [(set (match_operand: 0 "register_operand" "=w")
+	(FIXUORS: (unspec:
+			   [(mult:VDQF
+	 (match_operand:VDQF 1 "register_operand" "w")
+	 (match_operand:VDQF 2 "aarch64_fp_vec_pow2" ""))]
+			   UNSPEC_FRINTZ)))]
+  "TARGET_SIMD
+   && IN_RANGE (aarch64_vec_fpconst_pow_of_2 (operands[2]), 1,
+		GET_MODE_BITSIZE (GET_MODE_INNER (mode)))"
+  {
+int fbits = aarch64_vec_fpconst_pow_of_2 (operands[2]);
+char buf[64];
+sprintf (buf, "fcvtz\\t%%0., %%1., #%d", fbits);
+output_asm_insn (buf, operands);
+return "";
+  }
+  [(set_attr "type" "neon_fp_to_int_")]
+)
+
+
 (define_expand "2"
   [(set (match_operand: 0 "register_operand")
 	(FIXUORS: (unspec:
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 2ec76a5..9b76746 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -6808,6 +6808,19 @@ cost_plus:
 	  else
 	*cost += extra_cost->fp[GET_MODE (x) == DFmode].toint;
 	}
+
+  /* We can combine fmul by a power of 2 followed by a fcvt into a single
+	 fixed-point fcvt.  */
+  if (GET_CODE 

[PATCH][simplify-rtx][2/2] Use constants from pool when simplifying binops

2015-10-19 Thread Kyrill Tkachov

Hi all,

This second patch teaches simplify_binary_operation to return the dereferenced
constants from the constant pool in the binary expression if other 
simplifications failed.

This, combined with the 1/2 patch for aarch64
(https://gcc.gnu.org/ml/gcc-patches/2015-10/msg01744.html) allow for:

int
foo (float a)
{
  return a * 32.0f;
}

to generate the code:
foo:
fcvtzs  w0, s0, #5
ret

because combine now successfully tries to match:
(set (reg/i:SI 0 x0)
(fix:SI (mult:SF (reg:SF 32 v0 [ a ])
(const_double:SF 3.2e+1 [0x0.8p+6]

whereas before it would not try to use the const_double directly
but rather its constant pool reference.

I've seen this patch trigger once in 435.gromacs from SPEC2006 on aarch64,
where it ended up eliminating a floating-point multiplication and a load
from a constant pool.
There were no other changes, so I reckon this is pretty low impact.

Bootstrapped and tested on aarch64, arm, x86_64.
CC'ing Eric as this is an RTL optimisation and Segher as this is something that
has an effect through combine.

Ok for trunk?

Thanks,
Kyrill

2015-10-19  Kyrylo Tkachov  

* simplify-rtx.c (simplify_binary_operation): If either operand was
a constant pool reference use them if all other simplifications failed.

2015-10-19  Kyrylo Tkachov  

* gcc.target/aarch64/fmul_fcvt_1.c: Add multiply-by-32 cases.
commit f941a03f6ca5dcc0d509490d0e0ec39cefed714b
Author: Kyrylo Tkachov 
Date:   Mon Oct 12 17:12:34 2015 +0100

[simplify-rtx] Use constants from pool when simplifying binops

diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c
index 5ea5522..519850a 100644
--- a/gcc/simplify-rtx.c
+++ b/gcc/simplify-rtx.c
@@ -2001,7 +2001,17 @@ simplify_binary_operation (enum rtx_code code, machine_mode mode,
   tem = simplify_const_binary_operation (code, mode, trueop0, trueop1);
   if (tem)
 return tem;
-  return simplify_binary_operation_1 (code, mode, op0, op1, trueop0, trueop1);
+  tem = simplify_binary_operation_1 (code, mode, op0, op1, trueop0, trueop1);
+
+  if (tem)
+return tem;
+
+  /* If the above steps did not result in a simplification and op0 or op1
+ were constant pool references, use the referenced constants directly.  */
+  if (trueop0 != op0 || trueop1 != op1)
+return simplify_gen_binary (code, mode, trueop0, trueop1);
+
+  return NULL_RTX;
 }
 
 /* Subroutine of simplify_binary_operation.  Simplify a binary operation
diff --git a/gcc/testsuite/gcc.target/aarch64/fmul_fcvt_1.c b/gcc/testsuite/gcc.target/aarch64/fmul_fcvt_1.c
index 5af8290..354f2be 100644
--- a/gcc/testsuite/gcc.target/aarch64/fmul_fcvt_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/fmul_fcvt_1.c
@@ -83,6 +83,17 @@ FUNC_DEFD (16)
 /* { dg-final { scan-assembler "fcvtzu\tx\[0-9\], d\[0-9\]*.*#4" } } */
 /* { dg-final { scan-assembler "fcvtzu\tw\[0-9\], d\[0-9\]*.*#4" } } */
 
+FUNC_DEFS (32)
+FUNC_DEFD (32)
+/* { dg-final { scan-assembler "fcvtzs\tw\[0-9\], s\[0-9\]*.*#5" } } */
+/* { dg-final { scan-assembler "fcvtzs\tx\[0-9\], s\[0-9\]*.*#5" } } */
+/* { dg-final { scan-assembler "fcvtzs\tx\[0-9\], d\[0-9\]*.*#5" } } */
+/* { dg-final { scan-assembler "fcvtzs\tw\[0-9\], d\[0-9\]*.*#5" } } */
+/* { dg-final { scan-assembler "fcvtzu\tw\[0-9\], s\[0-9\]*.*#5" } } */
+/* { dg-final { scan-assembler "fcvtzu\tx\[0-9\], s\[0-9\]*.*#5" } } */
+/* { dg-final { scan-assembler "fcvtzu\tx\[0-9\], d\[0-9\]*.*#5" } } */
+/* { dg-final { scan-assembler "fcvtzu\tw\[0-9\], d\[0-9\]*.*#5" } } */
+
 
 #define FUNC_TESTS(__a, __b)	\
 do\
@@ -120,10 +131,12 @@ main (void)
   FUNC_TESTS (4, i);
   FUNC_TESTS (8, i);
   FUNC_TESTS (16, i);
+  FUNC_TESTS (32, i);
 
   FUNC_TESTD (4, i);
   FUNC_TESTD (8, i);
   FUNC_TESTD (16, i);
+  FUNC_TESTD (32, i);
 }
   return 0;
 }


[PATCH v10][aarch64] Implemented reciprocal square root (rsqrt) estimation in -ffast-math

2015-10-19 Thread Benedikt Huber
This tenth revision of the patch:
 * Removes unnecessary enum.

Ok for check in?
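
For reviewers new to the series, the scheme being emitted is ordinary
Newton-Raphson refinement; a rough C model (the helpers stand in for the
FRSQRTE/FRSQRTS instructions, and the step count here is an assumption):

#include <math.h>

/* Hypothetical stand-ins: FRSQRTE yields a low-precision estimate,
   FRSQRTS computes the correction factor (3 - a*b)/2.  */
static float frsqrte_step (float x) { return 1.0f / sqrtf (x); }
static float frsqrts_step (float a, float b) { return (3.0f - a * b) / 2.0f; }

float
rsqrt_sketch (float x)
{
  float e = frsqrte_step (x);        /* initial estimate */
  e = e * frsqrts_step (x, e * e);   /* Newton-Raphson refinement */
  e = e * frsqrts_step (x, e * e);   /* second step */
  return e;
}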


Benedikt Huber (1):
  2015-10-19  Benedikt Huber  
Philipp Tomsich  

 gcc/ChangeLog  |  20 
 gcc/config/aarch64/aarch64-builtins.c  | 115 +
 gcc/config/aarch64/aarch64-protos.h|   4 +
 gcc/config/aarch64/aarch64-simd.md |  27 +
 gcc/config/aarch64/aarch64-tuning-flags.def|   1 +
 gcc/config/aarch64/aarch64.c   | 107 ++-
 gcc/config/aarch64/aarch64.md  |   3 +
 gcc/config/aarch64/aarch64.opt |   5 +
 gcc/doc/invoke.texi|  12 +++
 gcc/testsuite/gcc.target/aarch64/rsqrt_1.c | 111 
 .../gcc.target/aarch64/rsqrt_asm_check_1.c |  25 +
 .../gcc.target/aarch64/rsqrt_asm_check_common.h|  42 
 .../aarch64/rsqrt_asm_check_negative_1.c   |  12 +++
 13 files changed, 482 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/rsqrt_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/rsqrt_asm_check_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/rsqrt_asm_check_common.h
 create mode 100644 
gcc/testsuite/gcc.target/aarch64/rsqrt_asm_check_negative_1.c

-- 
1.9.1



[PATCH] 2015-10-19 Benedikt Huber Philipp Tomsich

2015-10-19 Thread Benedikt Huber
* config/aarch64/aarch64-builtins.c: Builtins for rsqrt and rsqrtf.
* config/aarch64/aarch64-protos.h: Declare.
* config/aarch64/aarch64-simd.md: Matching expressions for frsqrte and
frsqrts.
* config/aarch64/aarch64-tuning-flags.def: Added recip_sqrt.
* config/aarch64/aarch64.c: New functions. Emit rsqrt estimation code 
when
applicable.
* config/aarch64/aarch64.md: Added enum entries.
* config/aarch64/aarch64.opt: Added option -mlow-precision-recip-sqrt.
* testsuite/gcc.target/aarch64/rsqrt_asm_check_common.h: Common macros 
for
assembly checks.
* testsuite/gcc.target/aarch64/rsqrt_asm_check_negative_1.c: Make sure
frsqrts and frsqrte are not emitted.
* testsuite/gcc.target/aarch64/rsqrt_asm_check_1.c: Make sure frsqrts 
and
frsqrte are emitted.
* testsuite/gcc.target/aarch64/rsqrt_1.c: Functional tests for rsqrt.

Signed-off-by: Philipp Tomsich 
---
 gcc/ChangeLog  |  20 
 gcc/config/aarch64/aarch64-builtins.c  | 115 +
 gcc/config/aarch64/aarch64-protos.h|   4 +
 gcc/config/aarch64/aarch64-simd.md |  27 +
 gcc/config/aarch64/aarch64-tuning-flags.def|   1 +
 gcc/config/aarch64/aarch64.c   | 107 ++-
 gcc/config/aarch64/aarch64.md  |   3 +
 gcc/config/aarch64/aarch64.opt |   5 +
 gcc/doc/invoke.texi|  12 +++
 gcc/testsuite/gcc.target/aarch64/rsqrt_1.c | 111 
 .../gcc.target/aarch64/rsqrt_asm_check_1.c |  25 +
 .../gcc.target/aarch64/rsqrt_asm_check_common.h|  42 
 .../aarch64/rsqrt_asm_check_negative_1.c   |  12 +++
 13 files changed, 482 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/rsqrt_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/rsqrt_asm_check_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/rsqrt_asm_check_common.h
 create mode 100644 
gcc/testsuite/gcc.target/aarch64/rsqrt_asm_check_negative_1.c

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index f39753d..596c9c3 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,23 @@
+2015-10-19  Benedikt Huber  
+   Philipp Tomsich  
+
+   * config/aarch64/aarch64-builtins.c: Builtins for rsqrt and rsqrtf.
+   * config/aarch64/aarch64-protos.h: Declare.
+   * config/aarch64/aarch64-simd.md: Matching expressions for frsqrte and
+   frsqrts.
+   * config/aarch64/aarch64-tuning-flags.def: Added recip_sqrt.
+   * config/aarch64/aarch64.c: New functions. Emit rsqrt estimation code 
when
+   applicable.
+   * config/aarch64/aarch64.md: Added enum entries.
+   * config/aarch64/aarch64.opt: Added option -mlow-precision-recip-sqrt.
+   * testsuite/gcc.target/aarch64/rsqrt_asm_check_common.h: Common macros 
for
+   assembly checks.
+   * testsuite/gcc.target/aarch64/rsqrt_asm_check_negative_1.c: Make sure
+   frsqrts and frsqrte are not emitted.
+   * testsuite/gcc.target/aarch64/rsqrt_asm_check_1.c: Make sure frsqrts 
and
+   frsqrte are emitted.
+   * testsuite/gcc.target/aarch64/rsqrt_1.c: Functional tests for rsqrt.
+
 2015-10-16  Trevor Saunders  
 
* lra-constraints.c (add_next_usage_insn): Change argument type
diff --git a/gcc/config/aarch64/aarch64-builtins.c 
b/gcc/config/aarch64/aarch64-builtins.c
index a1998ed..6b4208f 100644
--- a/gcc/config/aarch64/aarch64-builtins.c
+++ b/gcc/config/aarch64/aarch64-builtins.c
@@ -324,6 +324,11 @@ enum aarch64_builtins
   AARCH64_BUILTIN_GET_FPSR,
   AARCH64_BUILTIN_SET_FPSR,
 
+  AARCH64_BUILTIN_RSQRT_DF,
+  AARCH64_BUILTIN_RSQRT_SF,
+  AARCH64_BUILTIN_RSQRT_V2DF,
+  AARCH64_BUILTIN_RSQRT_V2SF,
+  AARCH64_BUILTIN_RSQRT_V4SF,
   AARCH64_SIMD_BUILTIN_BASE,
   AARCH64_SIMD_BUILTIN_LANE_CHECK,
 #include "aarch64-simd-builtins.def"
@@ -822,6 +827,46 @@ aarch64_init_crc32_builtins ()
 }
 }
 
+/* Add builtins for reciprocal square root.  */
+
+void
+aarch64_init_builtin_rsqrt (void)
+{
+  tree fndecl = NULL;
+  tree ftype = NULL;
+
+  tree V2SF_type_node = build_vector_type (float_type_node, 2);
+  tree V2DF_type_node = build_vector_type (double_type_node, 2);
+  tree V4SF_type_node = build_vector_type (float_type_node, 4);
+
+  struct builtin_decls_data
+  {
+tree type_node;
+const char *builtin_name;
+int function_code;
+  };
+
+  builtin_decls_data bdda[] =
+  {
+{ double_type_node, "__builtin_aarch64_rsqrt_df", AARCH64_BUILTIN_RSQRT_DF 
},
+{ float_type_node, "__builtin_aarch64_rsqrt_sf", AARCH64_BUILTIN_RSQRT_SF 
},
+{ V2DF_type_node, "__builtin_aarch64_rsqrt_v2df", 
AARCH64_BUILTIN_RSQRT_V2DF },
+{ V2SF_type_node, "__builtin_aarch64_rsqrt_v2sf", 
AARCH64_BUILTIN_RSQRT_V2SF },
+{ V4SF_type_node, "__builtin_aarch64_rsqrt_v4sf", 
AARCH64_BUILTIN_RSQR

Re: [PATCH v10][aarch64] Implemented reciprocal square root (rsqrt) estimation in -ffast-math

2015-10-19 Thread Bernd Schmidt

On 01/04/1970 01:02 AM, Benedikt Huber wrote:

This tenth revision of the patch:
  * Removes unnecessary enum.


Please fix your clock.


Bernd



Re: [PATCH][simplify-rtx][2/2] Use constants from pool when simplifying binops

2015-10-19 Thread Segher Boessenkool
Hi Kyrill,

On Mon, Oct 19, 2015 at 02:57:54PM +0100, Kyrill Tkachov wrote:
> because combine now successfully tries to match:
> (set (reg/i:SI 0 x0)
> (fix:SI (mult:SF (reg:SF 32 v0 [ a ])
> (const_double:SF 3.2e+1 [0x0.8p+6]
> 
> whereas before it would not try to use the const_double directly
> but rather its constant pool reference.

What happens if the constant pool reference is actually the better
code, do we still generate that?


Segher


Re: [PATCH][simplify-rtx][2/2] Use constants from pool when simplifying binops

2015-10-19 Thread Bernd Schmidt

On 10/19/2015 03:57 PM, Kyrill Tkachov wrote:

This second patch teaches simplify_binary_operation to return the
dereferenced
constants from the constant pool in the binary expression if other
simplifications failed.

This, combined with the 1/2 patch for aarch64
(https://gcc.gnu.org/ml/gcc-patches/2015-10/msg01744.html) allow for:

int
foo (float a)
{
   return a * 32.0f;
}

to generate the code:
foo:
 fcvtzs  w0, s0, #5
 ret

because combine now successfully tries to match:
(set (reg/i:SI 0 x0)
 (fix:SI (mult:SF (reg:SF 32 v0 [ a ])
 (const_double:SF 3.2e+1 [0x0.8p+6]

whereas before it would not try to use the const_double directly
but rather its constant pool reference.


The only way I could see a problem with that is if there are circumstances
where the memory variant would simplify further.  That doesn't seem
highly likely, so...



 * simplify-rtx.c (simplify_binary_operation): If either operand was
 a constant pool reference use them if all other simplifications
failed.


Ok.


Bernd



Re: Add a pass to back-propagate use information

2015-10-19 Thread Richard Biener
On Mon, Oct 19, 2015 at 2:38 PM, Richard Sandiford
 wrote:
> Richard Biener  writes:
>> On Thu, Oct 15, 2015 at 3:17 PM, Richard Sandiford
>>  wrote:
>>> This patch adds a pass that collects information that is common to all
>>> uses of an SSA name X and back-propagates that information up the statements
>>> that generate X.  The general idea is to use the information to simplify
>>> instructions (rather than a pure DCE) so I've simply called it
>>> tree-ssa-backprop.c, to go with tree-ssa-forwprop.c.
>>>
>>> At the moment the only use of the pass is to remove unnecessry sign
>>> operations, so that it's effectively a global version of
>>> fold_strip_sign_ops.  I'm hoping it could be extended in future to
>>> record which bits of an integer are significant.  There are probably
>>> other potential uses too.
>>>
>>> A later patch gets rid of fold_strip_sign_ops.
>>>
>>> Tested on x86_64-linux-gnu, aarch64-linux-gnu and arm-linux-gnueabi.
>>> OK to install?
>>>
>>> Thanks,
>>> Richard
>>>
>>>
>>> gcc/
>>> * doc/invoke.texi (-fdump-tree-backprop, -ftree-backprop): Document.
>>> * Makefile.in (OBJS): Add tree-ssa-backprop.o.
>>> * common.opt (ftree-backprop): New option.
>>> * fold-const.h (negate_mathfn_p): Declare.
>>> * fold-const.c (negate_mathfn_p): Make public.
>>> * timevar.def (TV_TREE_BACKPROP): New.
>>> * tree-passes.h (make_pass_backprop): Declare.
>>> * passes.def (pass_backprop): Add.
>>> * tree-ssa-backprop.c: New file.
>>>
>>> gcc/testsuite/
>>> * gcc.dg/tree-ssa/backprop-1.c, gcc.dg/tree-ssa/backprop-2.c,
>>> gcc.dg/tree-ssa/backprop-3.c, gcc.dg/tree-ssa/backprop-4.c,
>>> gcc.dg/tree-ssa/backprop-5.c, gcc.dg/tree-ssa/backprop-6.c: New 
>>> tests.
>>>
>>> diff --git a/gcc/Makefile.in b/gcc/Makefile.in
>>> index 783e4c9..69e669d 100644
>>> --- a/gcc/Makefile.in
>>> +++ b/gcc/Makefile.in
>>> @@ -1445,6 +1445,7 @@ OBJS = \
>>> tree-switch-conversion.o \
>>> tree-ssa-address.o \
>>> tree-ssa-alias.o \
>>> +   tree-ssa-backprop.o \
>>> tree-ssa-ccp.o \
>>> tree-ssa-coalesce.o \
>>> tree-ssa-copy.o \
>>> diff --git a/gcc/common.opt b/gcc/common.opt
>>> index 5060208..5aef625 100644
>>> --- a/gcc/common.opt
>>> +++ b/gcc/common.opt
>>> @@ -2364,6 +2364,10 @@ ftree-pta
>>>  Common Report Var(flag_tree_pta) Optimization
>>>  Perform function-local points-to analysis on trees.
>>>
>>> +ftree-backprop
>>> +Common Report Var(flag_tree_backprop) Init(1) Optimization
>>> +Enable backward propagation of use properties at the tree level.
>>
>> Don't add new -ftree-* "tree" doesn't add any info for our users.  I'd
>> also refer to SSA level rather than "tree" level.  Not sure if -fbackprop
>> is good, but let's go for it.
>
> OK.
>
>>> diff --git a/gcc/fold-const.c b/gcc/fold-const.c
>>> index de45a2c..7f00e72 100644
>>> --- a/gcc/fold-const.c
>>> +++ b/gcc/fold-const.c
>>> @@ -319,7 +318,7 @@ fold_overflow_warning (const char* gmsgid, enum
>> warn_strict_overflow_code wc)
>>>  /* Return true if the built-in mathematical function specified by CODE
>>> is odd, i.e. -f(x) == f(-x).  */
>>>
>>> -static bool
>>> +bool
>>>  negate_mathfn_p (enum built_in_function code)
>>>  {
>>>switch (code)
>>
>> Belongs more to builtins.[ch] if exported.
>
> The long-term plan is to abstract away whether it's a built-in function
> or an internal function, in which case I hope to have a single predicate
> that handles both.  I'm not sure where the code should go after that change.
> Maybe a new file?

Hmm, we'll see.  So just leave it in fold-const.c for now.

>>> diff --git a/gcc/fold-const.h b/gcc/fold-const.h
>>> index ee74dc8..4d5b24b 100644
>>> --- a/gcc/fold-const.h
>>> +++ b/gcc/fold-const.h
>>> @@ -173,6 +173,7 @@ extern tree sign_bit_p (tree, const_tree);
>>>  extern tree exact_inverse (tree, tree);
>>>  extern tree const_unop (enum tree_code, tree, tree);
>>>  extern tree const_binop (enum tree_code, tree, tree, tree);
>>> +extern bool negate_mathfn_p (enum built_in_function);
>>>
>>>  /* Return OFF converted to a pointer offset type suitable as offset for
>>> POINTER_PLUS_EXPR.  Use location LOC for this conversion.  */
>>> diff --git a/gcc/passes.def b/gcc/passes.def
>>> index dc3f44c..36d2b3b 100644
>>> --- a/gcc/passes.def
>>> +++ b/gcc/passes.def
>>> @@ -159,6 +159,7 @@ along with GCC; see the file COPYING3.  If not see
>>>/* After CCP we rewrite no longer addressed locals into SSA
>>>  form if possible.  */
>>>NEXT_PASS (pass_complete_unrolli);
>>> +  NEXT_PASS (pass_backprop);
>>>NEXT_PASS (pass_phiprop);
>>>NEXT_PASS (pass_forwprop);
>>
>> Any reason to not put this later?  I was thinking before reassoc.
>
> I think we're relying on FRE to notice the redundancy in the
> builtins-*.c tests, once this pass has converted the version
> with redundant sign ops to make it look like the version without.
> reassoc is likely to be too late.

Re: [PATCH][simplify-rtx][2/2] Use constants from pool when simplifying binops

2015-10-19 Thread Kyrill Tkachov


On 19/10/15 15:31, Segher Boessenkool wrote:

Hi Kyrill,


Hi Segher,



On Mon, Oct 19, 2015 at 02:57:54PM +0100, Kyrill Tkachov wrote:

because combine now successfully tries to match:
(set (reg/i:SI 0 x0)
 (fix:SI (mult:SF (reg:SF 32 v0 [ a ])
 (const_double:SF 3.2e+1 [0x0.8p+6]

whereas before it would not try to use the const_double directly
but rather its constant pool reference.

What happens if the constant pool reference is actually the better
code, do we still generate that?


In that case I think the previous calls to simplify_const_binary_operation and
simplify_binary_operation_1 should have returned a non-NULL rtx.

Kyrill





Segher





Re: [PATCH][simplify-rtx][2/2] Use constants from pool when simplifying binops

2015-10-19 Thread Kyrill Tkachov

Hi Bernd,

On 19/10/15 15:31, Bernd Schmidt wrote:

On 10/19/2015 03:57 PM, Kyrill Tkachov wrote:

This second patch teaches simplify_binary_operation to return the
dereferenced
constants from the constant pool in the binary expression if other
simplifications failed.

This, combined with the 1/2 patch for aarch64
(https://gcc.gnu.org/ml/gcc-patches/2015-10/msg01744.html) allow for:

int
foo (float a)
{
   return a * 32.0f;
}

to generate the code:
foo:
 fcvtzs  w0, s0, #5
 ret

because combine now successfully tries to match:
(set (reg/i:SI 0 x0)
 (fix:SI (mult:SF (reg:SF 32 v0 [ a ])
 (const_double:SF 3.2e+1 [0x0.8p+6]

whereas before it would not try to use the const_double directly
but rather its constant pool reference.


The only way I could see a problem with that is if there are circumstances
where the memory variant would simplify further.  That doesn't seem highly
likely, so...



If that were the case, I'd expect the earlier call to
simplify_binary_operation_1 to have returned a non-NULL rtx,
and the code in this patch would not come into play.


 * simplify-rtx.c (simplify_binary_operation): If either operand was
 a constant pool reference use them if all other simplifications
failed.


Ok.


Thanks,
I'll commit it when the first (aarch64-specific) patch is approved.

Kyrill




Bernd





Move cabs simplifications to match.pd

2015-10-19 Thread Richard Sandiford
The fold code also expanded cabs(x+yi) to fsqrt(x*x+y*y) when optimising
for speed.  tree-ssa-math-opts.c has this transformation too, but unlike
the fold code, it first checks whether the target implements the sqrt
optab.  The patch simply removes the fold code and keeps the
tree-ssa-math-opts.c logic the same.
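
In source terms, the retained expansion looks roughly like this (a
sketch; it is only valid under -funsafe-math-optimizations, since the
naive form can overflow where cabs would not):

#include <complex.h>
#include <math.h>

static double
cabs_expanded (double _Complex z)
{
  double re = creal (z), im = cimag (z);
  return sqrt (re * re + im * im);
}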

gcc.dg/lto/20110201-1_0.c was relying on us replacing cabs
with fsqrt even on targets where fsqrt is itself a library call.
The discussion leading up to that patch suggested that we only
want to test the fold on targets with a square root instruction,
so it would be OK to skip the test on other targets:

https://gcc.gnu.org/ml/gcc-patches/2011-07/msg01961.html
https://gcc.gnu.org/ml/gcc-patches/2011-07/msg02036.html

The patch does that using the sqrt_insn effective target.

It's possible that removing the tree folds renders the LTO trick
unnecessary, but since the test was originally for an ICE, it seems
better to leave it as-is.

Tested on x86_64-linux-gnu, aarch64-linux-gnu and arm-linux-gnueabi.
20110201-1_0.c passes on all three.  OK to install?

Thanks,
Richard


gcc/
* builtins.c (fold_builtin_cabs): Delete.
(fold_builtin_1): Update accordingly.  Handle constant arguments here.
* match.pd: Add rules previously handled by fold_builtin_cabs.

gcc/testsuite/
* gcc.dg/lto/20110201-1_0.c: Restrict to sqrt_insn targets.
Add associated options for arm*-*-*.
(sqrt): Remove dummy definition.

diff --git a/gcc/builtins.c b/gcc/builtins.c
index 1e4ec35..8f87fd9 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -7539,82 +7539,6 @@ fold_fixed_mathfn (location_t loc, tree fndecl, tree arg)
   return NULL_TREE;
 }
 
-/* Fold call to builtin cabs, cabsf or cabsl with argument ARG.  TYPE is the
-   return type.  Return NULL_TREE if no simplification can be made.  */
-
-static tree
-fold_builtin_cabs (location_t loc, tree arg, tree type, tree fndecl)
-{
-  tree res;
-
-  if (!validate_arg (arg, COMPLEX_TYPE)
-  || TREE_CODE (TREE_TYPE (TREE_TYPE (arg))) != REAL_TYPE)
-return NULL_TREE;
-
-  /* Calculate the result when the argument is a constant.  */
-  if (TREE_CODE (arg) == COMPLEX_CST
-  && (res = do_mpfr_arg2 (TREE_REALPART (arg), TREE_IMAGPART (arg),
- type, mpfr_hypot)))
-return res;
-
-  if (TREE_CODE (arg) == COMPLEX_EXPR)
-{
-  tree real = TREE_OPERAND (arg, 0);
-  tree imag = TREE_OPERAND (arg, 1);
-
-  /* If either part is zero, cabs is fabs of the other.  */
-  if (real_zerop (real))
-   return fold_build1_loc (loc, ABS_EXPR, type, imag);
-  if (real_zerop (imag))
-   return fold_build1_loc (loc, ABS_EXPR, type, real);
-
-  /* cabs(x+xi) -> fabs(x)*sqrt(2).  */
-  if (flag_unsafe_math_optimizations
- && operand_equal_p (real, imag, OEP_PURE_SAME))
-{
- STRIP_NOPS (real);
- return fold_build2_loc (loc, MULT_EXPR, type,
- fold_build1_loc (loc, ABS_EXPR, type, real),
- build_real_truncate (type, dconst_sqrt2 ()));
-   }
-}
-
-  /* Optimize cabs(-z) and cabs(conj(z)) as cabs(z).  */
-  if (TREE_CODE (arg) == NEGATE_EXPR
-  || TREE_CODE (arg) == CONJ_EXPR)
-return build_call_expr_loc (loc, fndecl, 1, TREE_OPERAND (arg, 0));
-
-  /* Don't do this when optimizing for size.  */
-  if (flag_unsafe_math_optimizations
-  && optimize && optimize_function_for_speed_p (cfun))
-{
-  tree sqrtfn = mathfn_built_in (type, BUILT_IN_SQRT);
-
-  if (sqrtfn != NULL_TREE)
-   {
- tree rpart, ipart, result;
-
- arg = builtin_save_expr (arg);
-
- rpart = fold_build1_loc (loc, REALPART_EXPR, type, arg);
- ipart = fold_build1_loc (loc, IMAGPART_EXPR, type, arg);
-
- rpart = builtin_save_expr (rpart);
- ipart = builtin_save_expr (ipart);
-
- result = fold_build2_loc (loc, PLUS_EXPR, type,
-   fold_build2_loc (loc, MULT_EXPR, type,
-rpart, rpart),
-   fold_build2_loc (loc, MULT_EXPR, type,
-ipart, ipart));
-
- return build_call_expr_loc (loc, sqrtfn, 1, result);
-   }
-}
-
-  return NULL_TREE;
-}
-
 /* Build a complex (inf +- 0i) for the result of cproj.  TYPE is the
complex tree type of the result.  If NEG is true, the imaginary
zero is negative.  */
@@ -9683,7 +9607,11 @@ fold_builtin_1 (location_t loc, tree fndecl, tree arg0)
 break;
 
 CASE_FLT_FN (BUILT_IN_CABS):
-  return fold_builtin_cabs (loc, arg0, type, fndecl);
+  if (TREE_CODE (arg0) == COMPLEX_CST
+ && TREE_CODE (TREE_TYPE (TREE_TYPE (arg0))) == REAL_TYPE)
+return do_mpfr_arg2 (TREE_REALPART (arg0), TREE_IMAGPART (arg0),
+type, mpfr_hypot);
+  break;
 
 CASE_FLT_FN (BUILT_IN_CARG):
   return fold_builti

Re: Benchmarks of v2 (was Re: [PATCH 0/5] RFC: Overhaul of diagnostics (v2))

2015-10-19 Thread Michael Matz
Hi,

On Fri, 16 Oct 2015, David Malcolm wrote:

> This fixes much of the bloat seen for influence.i when sending ranges 
> through for every token.

Yeah, I think that's on the right track.

> This was with 8 bits allocated for packed ranges (which is probably 
> excessive, but it makes debugging easier).

Probably in the end it should be done similarly to how column bits are
dealt with: start with a reasonably low number (5 bits?) and increase if
necessary and budget allows (budget being column+range < N bits && range <
8 bits, or so, so that range can't consume all of the column bits).
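
Something like the following packing, say (the field widths are only the
ones floated above, not a decided layout):

/* Hypothetical split of a 32-bit location word: line, then column,
   then a small range-length field capped so it cannot consume the
   column bits.  */
#define RANGE_BITS  5
#define COLUMN_BITS 12

static inline unsigned int
encode_loc (unsigned int line, unsigned int col, unsigned int range)
{
  return (line << (COLUMN_BITS + RANGE_BITS))
         | (col << RANGE_BITS)
         | (range & ((1u << RANGE_BITS) - 1));
}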


Ciao,
Michael.


Re: Add a pass to back-propagate use information

2015-10-19 Thread Richard Sandiford
Richard Biener  writes:
 diff --git a/gcc/fold-const.c b/gcc/fold-const.c
 index de45a2c..7f00e72 100644
 --- a/gcc/fold-const.c
 +++ b/gcc/fold-const.c
 @@ -319,7 +318,7 @@ fold_overflow_warning (const char* gmsgid, enum
>>> warn_strict_overflow_code wc)
  /* Return true if the built-in mathematical function specified by CODE
 is odd, i.e. -f(x) == f(-x).  */

 -static bool
 +bool
  negate_mathfn_p (enum built_in_function code)
  {
switch (code)
>>>
>>> Belongs more to builtins.[ch] if exported.
>>
>> The long-term plan is to abstract away whether it's a built-in function
>> or an internal function, in which case I hope to have a single predicate
>> that handles both.  I'm not sure where the code should go after that change.
>> Maybe a new file?
>
> Hmm, we'll see.  So just leave it in fold-const.c for now.

OK, thanks.

 diff --git a/gcc/fold-const.h b/gcc/fold-const.h
 index ee74dc8..4d5b24b 100644
 --- a/gcc/fold-const.h
 +++ b/gcc/fold-const.h
 @@ -173,6 +173,7 @@ extern tree sign_bit_p (tree, const_tree);
  extern tree exact_inverse (tree, tree);
  extern tree const_unop (enum tree_code, tree, tree);
  extern tree const_binop (enum tree_code, tree, tree, tree);
 +extern bool negate_mathfn_p (enum built_in_function);

  /* Return OFF converted to a pointer offset type suitable as offset for
 POINTER_PLUS_EXPR.  Use location LOC for this conversion.  */
 diff --git a/gcc/passes.def b/gcc/passes.def
 index dc3f44c..36d2b3b 100644
 --- a/gcc/passes.def
 +++ b/gcc/passes.def
 @@ -159,6 +159,7 @@ along with GCC; see the file COPYING3.  If not see
/* After CCP we rewrite no longer addressed locals into SSA
  form if possible.  */
NEXT_PASS (pass_complete_unrolli);
 +  NEXT_PASS (pass_backprop);
NEXT_PASS (pass_phiprop);
NEXT_PASS (pass_forwprop);
>>>
>>> Any reason to not put this later?  I was thinking before reassoc.
>>
>> I think we're relying on FRE to notice the redundancy in the
>> builtins-*.c tests, once this pass has converted the version
>> with redundant sign ops to make it look like the version without.
>> reassoc is likely to be too late.
>
> There is PRE after reassoc (run as FRE at -O1).

Ah, OK.  It looks like that also runs after sincos though, whereas
I think we want an FRE between this pass and sincos.

>> I also thought it should go before rather than after some instance
>> of forwprop because the pass might expose more forward folding
>> opportunities.  E.g. if the sign of A = -B * B doesn't matter,
>> we'll end up with A = B * B, which might be foldable with uses of A.
>> It seems less likely that forwprop would expose more backprop
>> opportunities.
>
> Indeed.  I was asking because backprop runs after inlining but
> with nearly no effective scalar cleanup after it to cleanup after
> inlining.
>
> In principle it only depends on some kind of DCE (DSE?) to avoid
> false uses, right?

Yeah, that sounds right.  Complex control-flow leading up to uses
shouldn't be a problem, but phantom uses would be a blocker.
>
> It's probably ok where you put it, I just wanted to get an idea of your
> reasoning.
>
>>
 +/* Make INFO describe all uses of RHS in ASSIGN.  */
 +
 +void
 +backprop::process_assign_use (gassign *assign, tree rhs, usage_info *info)
 +{
 +  tree lhs = gimple_assign_lhs (assign);
 +  switch (gimple_assign_rhs_code (assign))
 +{
 +case ABS_EXPR:
 +  /* The sign of the input doesn't matter.  */
 +  info->flags.ignore_sign = true;
 +  break;
 +
 +case COND_EXPR:
 +  /* For A = B ? C : D, propagate information about all uses of A
 +to B and C.  */
 +  if (rhs != gimple_assign_rhs1 (assign))
 +   if (const usage_info *lhs_info = lookup_operand (lhs))
>>>
>>> Use && instead of nested if
>>
>> That means introducing an extra level of braces just for something
>> that that isn't needed by the first statement, i.e.:
>>
>> {
>>   const usage_info *lhs_info;
>>   if (rhs != gimple_assign_rhs1 (assign)
>>   && (lhs_info = lookup_operand (lhs)))
>> *info = *lhs_info;
>>   break;
>> }
>>
>> There also used to be a strong preference for not embedding assignments
>> in && and || conditions.
>>
>> If there had been some other set-up for the lookup_operand call, we
>> would have had:
>>
>>   if (rhs != gimple_assign_rhs1 (assign))
>> {
>>   ...
>>   if (const usage_info *lhs_info = lookup_operand (lhs))
>> ..
>> }
>>
>> and presumably that would have been OK.  So if the original really isn't,
>> acceptable, I'd rather write it as:
>>
>>   if (rhs != gimple_assign_rhs1 (assign))
>> {
>>   const usage_info *lhs_info = lookup_operand (lhs);
>>   if (lhs_info)
>> ..
>> 

Re: [gomp4.1] map clause parsing improvements

2015-10-19 Thread Thomas Schwinge
Hi!

On Mon, 19 Oct 2015 12:34:08 +0200, Jakub Jelinek  wrote:
> On Mon, Oct 19, 2015 at 12:20:23PM +0200, Thomas Schwinge wrote:
> > > +/* Decrement usage count and deallocate if zero.  */
> > > +GOMP_MAP_RELEASE =   (GOMP_MAP_FLAG_ALWAYS
> > > +  | GOMP_MAP_FORCE_DEALLOC)
> > >};
> > 
> > I have not yet read the OpenMP 4.1/4.5 standard, but it's not obvious to
> > me here how the GOMP_MAP_FLAG_ALWAYS flag relates to the OpenMP release
> > clause (GOMP_MAP_RELEASE here)?  Shouldn't GOMP_MAP_RELEASE be
> > "(GOMP_MAP_FLAG_SPECIAL_1 | 3)" or similar?
> 
> It isn't related to always, but always really is something that affects
> solely the data movement (i.e. to, from, tofrom), and while it can be
> specified elsewhere, it makes no difference.  Wasting one bit just for that
> is something we don't have the luxury for, which is why I've started using
> that bit for other OpenMP stuff (it acts there like GOMP_MAP_FLAG_SPECIAL_2
> to some extent).  It is not just release, but also the struct mapping etc.
> I'll still need to make further changes, because the rules for mapping
> structure element pointer/reference based array sections and structure
> element references have changed again.

Hmm, I do think we should allow ourselves the luxury of a separate bit for
GOMP_MAP_FLAG_ALWAYS -- we can extend the interface later, should we
really find uses for the other two remaining bits -- or, if not using a
separate bit, at least make sure that GOMP_MAP_FLAG_ALWAYS is not used as
a flag.
GOMP_MAP_FLAG_ALWAYS is used as a flag: these conditionals will also be
matched for GOMP_MAP_STRUCT, GOMP_MAP_DELETE_ZERO_LEN_ARRAY_SECTION, and
GOMP_MAP_RELEASE.  I have not analyzed whether that is erroneous or not,
but it surely is confusing?

$ < gcc/gimplify.c grep -C3 GOMP_MAP_FLAG_ALWAYS
  struct_map_to_clause->put (decl, *list_p);
  list_p = &OMP_CLAUSE_CHAIN (*list_p);
  flags = GOVD_MAP | GOVD_EXPLICIT;
  if (OMP_CLAUSE_MAP_KIND (c) & GOMP_MAP_FLAG_ALWAYS)
flags |= GOVD_SEEN;
  goto do_add_decl;
}
--
  tree *sc = NULL, *pt = NULL;
  if (!ptr && TREE_CODE (*osc) == TREE_LIST)
osc = &TREE_PURPOSE (*osc);
  if (OMP_CLAUSE_MAP_KIND (c) & GOMP_MAP_FLAG_ALWAYS)
n->value |= GOVD_SEEN;
  offset_int o1, o2;
  if (offset)
--
  n = splay_tree_lookup (ctx->variables, (splay_tree_key) decl);
  if ((ctx->region_type & ORT_TARGET) != 0
  && !(n->value & GOVD_SEEN)
  && ((OMP_CLAUSE_MAP_KIND (c) & GOMP_MAP_FLAG_ALWAYS) == 0
  || OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_STRUCT))
{
  remove = true;

I'd suggest turning GOMP_MAP_FLAG_ALWAYS into GOMP_MAP_FLAG_SPECIAL_2,
and then provide a GOMP_MAP_ALWAYS_P that evaluates to true just for the
three "always,to", "always,from", and "always,tofrom" cases.


Grüße,
 Thomas




Re: Fix prototype for print_insn in rtl.h

2015-10-19 Thread Jeff Law

On 10/15/2015 10:28 AM, Andrew MacLeod wrote:

On 10/13/2015 11:32 AM, Jeff Law wrote:

On 10/13/2015 02:21 AM, Nikolai Bozhenov wrote:

2015-10-13  Nikolai Bozhenov

 * gcc/rtl.h (print_insn): fix prototype

Installed on the trunk after bootstrap & regression test.

jeff


Sorry, a little late to the party... but why is print_insn even in
rtl.h?  It seems that sched-vis.c is the only thing that uses it...

Then let's move it to sched-int.h, unless there's some good reason not to.

jeff


Re: Split out some tests from builtins-20.c

2015-10-19 Thread Jeff Law

On 10/15/2015 07:18 AM, Richard Sandiford wrote:

Stripping unnecessary sign ops at the gimple level means that we're
no longer able to optimise:

   if (cos(y<10 ? -fabs(x) : tan(x<20 ? -x : -fabs(y)))
   != cos(y<10 ? x : tan(x<20 ? x : y)))
 link_error ();

because we're currently not able to fold away the equality in:

int
f1 (double x, double y)
{
   double z1 = __builtin_cos(y<10 ? x : __builtin_tan(x<20 ? x : y));
   double z2 = __builtin_cos(y<10 ? x : __builtin_tan(x<20 ? x : y));
   return z1 == z2;
}

The missed fold is being tracked as PR 67975.  This patch splits the
test out into a separate file so that we can XFAIL it until the PR
is fixed.

Tested on x86_64-linux-gnu, aarch64-linux-gnu and arm-linux-gnueabi.
OK to install?

Thanks,
Richard


gcc/testsuite/
* gcc.dg/builtins-20.c: Move some tests to...
* gcc.dg/builtins-86.c: ...this new file.
Yes.  I just went through this in a totally unrelated area.  I'd much 
rather have the test split out and xfailed.


jeff



[c++-delayed-folding] Introduce convert_to_pointer_nofold

2015-10-19 Thread Marek Polacek
This patch introduces convert_to_pointer_nofold, a variant that only folds
CONSTANT_CLASS_P expressions.  In the C++ FE, convert_to_pointer was only used
in cp_convert_to_pointer, which is only used in cp_convert.  Instead of
introducing many _nofold variants, I just made cp_convert_to_pointer use
convert_to_pointer_nofold.
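
The observable difference, in a nutshell (a sketch; decl stands for any
non-constant tree):

/* Constant operand: still folded, e.g. down to a null pointer cst.  */
tree a = convert_to_pointer_nofold (ptr_type, integer_zero_node);
/* Non-constant operand: the NOP_EXPR is built but left unfolded, so
   the conversion stays visible for the C++ FE's delayed folding.  */
tree b = convert_to_pointer_nofold (ptr_type, decl);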

Bootstrapped/regtested on x86_64-linux, ok for branch?

diff --git gcc/convert.c gcc/convert.c
index 1ce8099..79b4138 100644
--- gcc/convert.c
+++ gcc/convert.c
@@ -39,10 +39,11 @@ along with GCC; see the file COPYING3.  If not see
 
 /* Convert EXPR to some pointer or reference type TYPE.
EXPR must be pointer, reference, integer, enumeral, or literal zero;
-   in other cases error is called.  */
+   in other cases error is called.  If FOLD_P is true, try to fold the
+   expression.  */
 
-tree
-convert_to_pointer (tree type, tree expr)
+static tree
+convert_to_pointer_1 (tree type, tree expr, bool fold_p)
 {
   location_t loc = EXPR_LOCATION (expr);
   if (TREE_TYPE (expr) == type)
@@ -58,10 +59,21 @@ convert_to_pointer (tree type, tree expr)
addr_space_t to_as = TYPE_ADDR_SPACE (TREE_TYPE (type));
addr_space_t from_as = TYPE_ADDR_SPACE (TREE_TYPE (TREE_TYPE (expr)));
 
-   if (to_as == from_as)
- return fold_build1_loc (loc, NOP_EXPR, type, expr);
+   if (fold_p)
+ {
+   if (to_as == from_as)
+ return fold_build1_loc (loc, NOP_EXPR, type, expr);
+   else
+ return fold_build1_loc (loc, ADDR_SPACE_CONVERT_EXPR, type,
+ expr);
+ }
else
- return fold_build1_loc (loc, ADDR_SPACE_CONVERT_EXPR, type, expr);
+ {
+   if (to_as == from_as)
+ return build1_loc (loc, NOP_EXPR, type, expr);
+   else
+ return build1_loc (loc, ADDR_SPACE_CONVERT_EXPR, type, expr);
+ }
   }
 
 case INTEGER_TYPE:
@@ -75,20 +87,43 @@ convert_to_pointer (tree type, tree expr)
unsigned int pprec = TYPE_PRECISION (type);
unsigned int eprec = TYPE_PRECISION (TREE_TYPE (expr));
 
-   if (eprec != pprec)
- expr = fold_build1_loc (loc, NOP_EXPR,
- lang_hooks.types.type_for_size (pprec, 0),
- expr);
+   if (eprec != pprec)
+ {
+   tree totype = lang_hooks.types.type_for_size (pprec, 0);
+   if (fold_p)
+ expr = fold_build1_loc (loc, NOP_EXPR, totype, expr);
+   else
+ expr = build1_loc (loc, NOP_EXPR, totype, expr);
+ }
   }
 
-  return fold_build1_loc (loc, CONVERT_EXPR, type, expr);
+  if (fold_p)
+   return fold_build1_loc (loc, CONVERT_EXPR, type, expr);
+  return build1_loc (loc, CONVERT_EXPR, type, expr);
 
 default:
   error ("cannot convert to a pointer type");
-  return convert_to_pointer (type, integer_zero_node);
+  return convert_to_pointer_1 (type, integer_zero_node, fold_p);
 }
 }
 
+/* A wrapper around convert_to_pointer_1 that always folds the
+   expression.  */
+
+tree
+convert_to_pointer (tree type, tree expr)
+{
+  return convert_to_pointer_1 (type, expr, true);
+}
+
+/* A wrapper around convert_to_pointer_1 that only folds the
+   expression if it is CONSTANT_CLASS_P.  */
+
+tree
+convert_to_pointer_nofold (tree type, tree expr)
+{
+  return convert_to_pointer_1 (type, expr, CONSTANT_CLASS_P (expr));
+}
 
 /* Convert EXPR to some floating-point type TYPE.
 
diff --git gcc/convert.h gcc/convert.h
index ac78f95..24fa6bf 100644
--- gcc/convert.h
+++ gcc/convert.h
@@ -23,6 +23,7 @@ along with GCC; see the file COPYING3.  If not see
 extern tree convert_to_integer (tree, tree);
 extern tree convert_to_integer_nofold (tree, tree);
 extern tree convert_to_pointer (tree, tree);
+extern tree convert_to_pointer_nofold (tree, tree);
 extern tree convert_to_real (tree, tree);
 extern tree convert_to_fixed (tree, tree);
 extern tree convert_to_complex (tree, tree);
diff --git gcc/cp/cvt.c gcc/cp/cvt.c
index 0a30270..cb73bb7 100644
--- gcc/cp/cvt.c
+++ gcc/cp/cvt.c
@@ -241,7 +241,7 @@ cp_convert_to_pointer (tree type, tree expr, tsubst_flags_t 
complain)
   gcc_assert (GET_MODE_SIZE (TYPE_MODE (TREE_TYPE (expr)))
  == GET_MODE_SIZE (TYPE_MODE (type)));
 
-  return convert_to_pointer (type, expr);
+  return convert_to_pointer_nofold (type, expr);
 }
 
   if (type_unknown_p (expr))

Marek


[c++-delayed-folding] Use convert_to_integer_nofold

2015-10-19 Thread Marek Polacek
As discussed in the other thread.  This is a patch I intend to commit to the
branch when I can actually commit stuff.

diff --git gcc/cp/semantics.c gcc/cp/semantics.c
index 9a8caa7..d15d6f9 100644
--- gcc/cp/semantics.c
+++ gcc/cp/semantics.c
@@ -5577,7 +5577,8 @@ finish_omp_clauses (tree clauses)
  if (OMP_CLAUSE_SCHEDULE_KIND (c)
  == OMP_CLAUSE_SCHEDULE_CILKFOR)
{
- t = convert_to_integer (long_integer_type_node, t);
+ t = convert_to_integer_nofold (long_integer_type_node,
+t);
  if (t == error_mark_node)
{
  remove = true;

Marek


[gomp4.5] Add checking that OpenMP loop iterators aren't referenced in the bounds/step expressions

2015-10-19 Thread Jakub Jelinek
Hi!

In 4.0 and earlier, there was just a restriction that the lb, b and
incr expressions in the syntax (low/high bounds and step) can't change their
values during the loop, but in OpenMP 4.5 we have an even stronger
restriction: the iterators may not be referenced in there at all, so even if
you ignore the value, multiply by or AND with 0, subtract it from itself,
etc., it is still invalid.  That means the compiler can actually easily
diagnose invalid loops.
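
For instance, something like the following is now rejected even though
the bound's value never actually changes (a made-up example of the kind
of loop the new checking diagnoses):

void
f (int n, int *a)
{
  int i;
  #pragma omp parallel for
  for (i = 0; i < n + 0 * i; i++)  /* error: i referenced in the bound */
    a[i] = i;
}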

2015-10-19  Jakub Jelinek  

gcc/
* tree.h (OMP_FOR_ORIG_DECLS): Use OMP_LOOP_CHECK instead of
OMP_FOR_CHECK.  Remove comment.
* tree.def (OMP_SIMD, CILK_SIMD, CILK_FOR, OMP_DISTRIBUTE,
OMP_TASKLOOP, OACC_LOOP): Add OMP_FOR_ORIG_DECLS argument.
gcc/c-family/
* c-common.h (c_omp_check_loop_iv, c_omp_check_loop_iv_exprs): New
prototypes.
* c-omp.c (c_finish_omp_for): Store OMP_FOR_ORIG_DECLS always.
Don't call add_stmt here.
(struct c_omp_check_loop_iv_data): New type.
(c_omp_check_loop_iv_r, c_omp_check_loop_iv,
c_omp_check_loop_iv_exprs): New functions.
gcc/c/
* c-parser.c (c_parser_omp_for_loop): Call c_omp_check_loop_iv.
Call add_stmt here.
gcc/cp/
* cp-tree.h (finish_omp_for): Add ORIG_INITS argument.
* parser.c (cp_parser_omp_for_loop_init): Add ORIG_INIT argument,
initialize it.
(cp_parser_omp_for_loop): Compute orig_inits, pass it's address
to finish_omp_for.
* pt.c (tsubst_expr): Use OMP_FOR_ORIG_DECLS for all
OpenMP/OpenACC/Cilk+ looping constructs.  Adjust finish_omp_for
caller.
* semantics.c (handle_omp_for_class_iterator): Add ORIG_DECLS
argument.  Call c_omp_check_loop_iv_exprs on cond.
(finish_omp_for): Add ORIG_INITS argument.  Call
c_omp_check_loop_iv_exprs on ORIG_INITS elements.  Adjust
handle_omp_for_class_iterator caller.  Call c_omp_check_loop_iv.
Call add_stmt.
gcc/testsuite/
* c-c++-common/gomp/pr67521.c: Add dg-error directives.
* gcc.dg/gomp/loop-1.c: New test.
* g++.dg/gomp/pr38639.C (foo): Adjust dg-error.
(bar): Remove dg-message.
* g++.dg/gomp/loop-1.C: New test.
* g++.dg/gomp/loop-2.C: New test.
* g++.dg/gomp/loop-3.C: New test.

--- gcc/tree.h.jj   2015-10-14 10:24:55.0 +0200
+++ gcc/tree.h  2015-10-19 12:01:11.390680056 +0200
@@ -1264,8 +1264,7 @@ extern void protected_set_expr_location
 #define OMP_FOR_COND(NODE)TREE_OPERAND (OMP_LOOP_CHECK (NODE), 3)
 #define OMP_FOR_INCR(NODE)TREE_OPERAND (OMP_LOOP_CHECK (NODE), 4)
 #define OMP_FOR_PRE_BODY(NODE)TREE_OPERAND (OMP_LOOP_CHECK (NODE), 5)
-/* Note that this is only available for OMP_FOR, hence OMP_FOR_CHECK.  */
-#define OMP_FOR_ORIG_DECLS(NODE)   TREE_OPERAND (OMP_FOR_CHECK (NODE), 6)
+#define OMP_FOR_ORIG_DECLS(NODE)   TREE_OPERAND (OMP_LOOP_CHECK (NODE), 6)
 
 #define OMP_SECTIONS_BODY(NODE)TREE_OPERAND (OMP_SECTIONS_CHECK (NODE), 0)
 #define OMP_SECTIONS_CLAUSES(NODE) TREE_OPERAND (OMP_SECTIONS_CHECK (NODE), 1)
--- gcc/tree.def.jj 2015-10-14 10:25:43.0 +0200
+++ gcc/tree.def2015-10-19 12:00:50.282982246 +0200
@@ -1101,28 +1101,28 @@ DEFTREECODE (OMP_TASK, "omp_task", tcc_s
 DEFTREECODE (OMP_FOR, "omp_for", tcc_statement, 7)
 
 /* OpenMP - #pragma omp simd [clause1 ... clauseN]
-   Operands like operands 1-6 of OMP_FOR.  */
-DEFTREECODE (OMP_SIMD, "omp_simd", tcc_statement, 6)
+   Operands like for OMP_FOR.  */
+DEFTREECODE (OMP_SIMD, "omp_simd", tcc_statement, 7)
 
 /* Cilk Plus - #pragma simd [clause1 ... clauseN]
-   Operands like operands 1-6 of OMP_FOR.  */
-DEFTREECODE (CILK_SIMD, "cilk_simd", tcc_statement, 6)
+   Operands like for OMP_FOR.  */
+DEFTREECODE (CILK_SIMD, "cilk_simd", tcc_statement, 7)
 
 /* Cilk Plus - _Cilk_for (..)
-   Operands like operands 1-6 of OMP_FOR.  */
-DEFTREECODE (CILK_FOR, "cilk_for", tcc_statement, 6)
+   Operands like for OMP_FOR.  */
+DEFTREECODE (CILK_FOR, "cilk_for", tcc_statement, 7)
 
 /* OpenMP - #pragma omp distribute [clause1 ... clauseN]
-   Operands like operands 1-6 of OMP_FOR.  */
-DEFTREECODE (OMP_DISTRIBUTE, "omp_distribute", tcc_statement, 6)
+   Operands like for OMP_FOR.  */
+DEFTREECODE (OMP_DISTRIBUTE, "omp_distribute", tcc_statement, 7)
 
 /* OpenMP - #pragma omp taskloop [clause1 ... clauseN]
-   Operands like operands 1-6 of OMP_FOR.  */
-DEFTREECODE (OMP_TASKLOOP, "omp_taskloop", tcc_statement, 6)
+   Operands like for OMP_FOR.  */
+DEFTREECODE (OMP_TASKLOOP, "omp_taskloop", tcc_statement, 7)
 
 /* OpenMP - #pragma acc loop [clause1 ... clauseN]
-   Operands like operands 1-6 of OMP_FOR.  */
-DEFTREECODE (OACC_LOOP, "oacc_loop", tcc_statement, 6)
+   Operands like for OMP_FOR.  */
+DEFTREECODE (OACC_LOOP, "oacc_loop", tcc_statement, 7)
 
 /* OpenMP - #pragma omp teams [clause1 ... clauseN]
Operand 0: OMP_TEAMS_BODY: Teams body.
--- gcc/c-family/c-common.h.jj  2015-10-14 10:24:54.00

Re: config header file reduction patch checked in.

2015-10-19 Thread Andrew MacLeod

On 10/18/2015 05:31 AM, Iain Sandoe wrote:

Hi Andrew,

On 16 Oct 2015, at 20:49, Andrew MacLeod wrote:


On 10/12/2015 04:04 AM, Jeff Law wrote:

On 10/08/2015 07:37 AM, Andrew MacLeod wrote:

On 10/07/2015 06:02 PM, Jeff Law wrote:

I'm slightly concerned about the darwin, windows and solaris bits.  The former 
primarily because Darwin has been a general source of pain, and in the others 
because I'm not sure the cross testing will exercise that code terribly much.

I'll go ahead and approve all the config/ bits.  Please be on the lookout for 
any fallout.

I'll try and get into more of the other patches tomorrow.



OK, I've checked in the config changes.  I rebuilt all the cross compilers for
the 200+ targets, and they still build, as well as bootstrapping on
x86_64-pc-linux-gnu with no regressions.

So, if anyone runs into a native build issue you can either add the required
header back in, or back out the file for your port, and I'll look into why
something happened.  The only thing I can imagine is files that have
conditional compilation based on a macro that is only ever defined on a
native build command line or in headers.  It's unlikely... but possible.

I've applied the following to fix Darwin native bootstrap.
AFAICT (from reading the other thread on the re-ordering tools) putting the 
diagnostics header at the end of the list is the right thing to do.

FWIW,
a) of course, Darwin exercises ObjC/ObjC++ in *both* NeXT and GNU mode - so 
those are pretty well covered by this too.

b) darwin folks will usually do their best to test any patch that you think is 
specifically risky - but you need to ask, because we have (very) limited 
resources in time and hardware ;-) ...

thanks for tidying things up!
(I, for one, think that improving the separation of things is worth a small 
amount of pain along the way).

cheers,
Iain

gcc/

+2015-10-18  Iain Sandoe  
+
+   * config/darwin-driver.h: Adjust includes to add diagnostic-core.
+



Interesting that none of the cross builds need diagnostic-core.h.  I see
it used in 7 different targets.  It must be something defined on the native
build command line that causes it to be needed.


Anyway, thanks for fixing it.

btw, that should be darwin-driver.c, not .h, in the changelog, right?

Andrew

  2015-10-16  Trevor Saunders  
  
  	* lra-constraints.c (add_next_usage_insn): Change argument type

Index: gcc/config/darwin-driver.c
===
--- gcc/config/darwin-driver.c  (revision 228938)
+++ gcc/config/darwin-driver.c  (working copy)
@@ -23,6 +23,7 @@
  #include "coretypes.h"
  #include "tm.h"
  #include "opts.h"
+#include "diagnostic-core.h"
  
  #ifndef CROSS_DIRECTORY_STRUCTURE

  #include 





[gomp4] Merge gomp-4_1-branch r224607 (2015-06-18) into gomp-4_0-branch

2015-10-19 Thread Thomas Schwinge
Hi!

I have recently merged trunk r228776 (2015-10-13) into gomp-4_0-branch,
which is the trunk revision before Jakub's big "Merge from
gomp-4_1-branch to trunk".
Instead of attempting to merge that one in one go -- that is, to avoid
having to deal with a ton of merge conflicts at once, and to allow for
easier understanding of individual changes/regressions -- in the
following I'll gradually merge individual "blocks" of all the
gomp-4_1-branch changes into gomp-4_0-branch.  Committed to
gomp-4_0-branch in r228972:

commit 3931662876141de5c18d0c5e02c156eef5286bee
Merge: fdc2c87 2b9f218
Author: tschwinge 
Date:   Mon Oct 19 15:38:31 2015 +

svn merge -r 222404:224607 
svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_1-branch


git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@228972 
138bc75d-0d04-0410-961f-82ee72b054a4


Grüße,
 Thomas




Re: [mask-vec_cond, patch 3/2] SLP support

2015-10-19 Thread Jeff Law

On 10/19/2015 05:21 AM, Ilya Enkovich wrote:

Hi,

This patch adds missing support for cond_expr with no embedded comparison in
SLP.  No new test is added because the vec cmp SLP test becomes (due to the
changes to bool patterns in the first patch) a regression test for this
patch.  Does it look OK?

Thanks,
Ilya
--
gcc/

2015-10-19  Ilya Enkovich  

* tree-vect-slp.c (vect_get_and_check_slp_defs): Allow
cond_exp with no embedded comparison.
(vect_build_slp_tree_1): Likewise.
Is it even valid gimple to have a COND_EXPR that is anything other than 
a conditional?


From looking at gimplify_cond_expr, it looks like we could have an 
SSA_NAME that's a bool as the conditional.  Presumably we're allowing a 
vector of bools as the conditional once we hit the vectorizer, which 
seems fairly natural.
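
A hedged sketch of source that could end up there (a made-up testcase; the
point is just that the predicate is a precomputed boolean rather than an
embedded comparison):

#include <stdbool.h>

void
f (int *r, int *a, int *b, int *c)
{
  for (int i = 0; i < 1024; i += 2)
    {
      bool p = a[i] > 0;        /* comparison computed separately */
      bool q = a[i + 1] > 0;
      r[i] = p ? b[i] : c[i];               /* COND_EXPRs whose condition */
      r[i + 1] = q ? b[i + 1] : c[i + 1];   /* is a boolean SSA_NAME */
    }
}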


OK.  Please install when the prerequisites are installed.

Thanks,
jeff



[PATCH] Don't allow FSM threader to create irreducible loops unless it eliminates a multi-way branch

2015-10-19 Thread Jeff Law
If I hack up GCC's old jump threader to avoid threading across backedges 
and instead let the FSM threader handle that case, then we end up with 
cases where the FSM threader creates irreducible loops with marginal 
benefit.


This can be seen in ssa-dom-thread-2{d,e,f}.c.

We've long avoided such threads in the old jump threader.  We generally 
want to avoid them in the FSM threader as well.  The only case where 
we're going to allow them is when we're able to eliminate a multi-way 
branch from the loop.
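
For reference, a hedged sketch (not from the testsuite) of the shape of loop 
this is about:

int
run (const int *input, int n)
{
  int state = 0;
  for (int i = 0; i < n; i++)
    switch (state)      /* multi-way branch in the loop body */
      {
      case 0:
        state = input[i] ? 1 : 2;
        break;
      case 1:
        state = 2;
        break;
      default:
        state = 0;
        break;
      }
  return state;
}

Threading the latch through the switch can eliminate the multi-way branch, 
which is the one case deemed worth a possibly irreducible loop.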


Bootstrapped and regression tested on x86_64-linux-gnu.  Also tested the 
above mentioned testcases with my hacked up compiler.


Installed on the trunk.

Jeff
commit 518690952b62c1d38b89cdbef0490a7d11f06c40
Author: Jeff Law 
Date:   Mon Oct 19 10:23:26 2015 -0600

[PATCH] Don't allow FSM threader to create irreducible loops unless it 
eliminates a multi-way branch

* tree-ssa-threadupdate.c (valid_jump_thread_path): Reject paths
that create irreducible loops unless the path eliminates a multiway
branch.

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 89a42c1..ff3d3fc 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,9 @@
+2015-10-19  Jeff Law  
+
+   * tree-ssa-threadupdate.c (valid_jump_thread_path): Reject paths
+   that create irreducible loops unless the path eliminates a multiway
+   branch.
+
 2015-10-19  Richard Biener  
 
PR tree-optimization/67975
diff --git a/gcc/tree-ssa-threadupdate.c b/gcc/tree-ssa-threadupdate.c
index 5632a88..8e3437a 100644
--- a/gcc/tree-ssa-threadupdate.c
+++ b/gcc/tree-ssa-threadupdate.c
@@ -2553,11 +2553,31 @@ static bool
 valid_jump_thread_path (vec *path)
 {
   unsigned len = path->length ();
+  bool multiway_branch = false;
 
-  /* Check that the path is connected.  */
+  /* Check that the path is connected and see if there's a multi-way
+ branch on the path.  */
   for (unsigned int j = 0; j < len - 1; j++)
-if ((*path)[j]->e->dest != (*path)[j+1]->e->src)
-  return false;
+{
+  if ((*path)[j]->e->dest != (*path)[j+1]->e->src)
+return false;
+  gimple *last = last_stmt ((*path)[j]->e->dest);
+  multiway_branch |= (last && gimple_code (last) == GIMPLE_SWITCH);
+}
+
+  /* If we are trying to thread the loop latch to a block that does
+ not dominate the loop latch, then that will create an irreducible
+ loop.  We avoid that unless the jump thread has a multi-way
+ branch, in which case we have deemed it worth losing other
+ loop optimizations later if we can eliminate the multi-way branch.  */
+  edge e = (*path)[0]->e;
+  struct loop *loop = e->dest->loop_father;
+  if (!multiway_branch
+  && loop->latch
+  && loop_latch_edge (loop) == e
+  && (determine_bb_domination_status (loop, path->last ()->e->dest)
+ == DOMST_NONDOMINATING))
+return false;
 
   return true;
 }
@@ -2650,7 +2670,9 @@ thread_through_all_blocks (bool may_peel_loop_headers)
   if (bitmap_bit_p (threaded_blocks, entry->src->index)
  /* Verify that the jump thread path is still valid: a
 previous jump-thread may have changed the CFG, and
-invalidated the current path.  */
+invalidated the current path or the requested jump
+thread might create irreducible loops which should
+generally be avoided.  */
  || !valid_jump_thread_path (path))
{
  /* Remove invalid FSM jump-thread paths.  */


OpenACC async clause regressions (was: [gomp4.1] Add new versions of GOMP_target{,_data,_update} and GOMP_target_enter_exit_data)

2015-10-19 Thread Thomas Schwinge
Hi!

Chung-Lin, would you please have a look at the following (on
gomp-4_0-branch)?  Also, anyone else got any ideas off-hand?

On Tue, 23 Jun 2015 13:51:39 +0200, Jakub Jelinek  wrote:
> On Tue, Jun 23, 2015 at 02:40:43PM +0300, Ilya Verbin wrote:
> > On Sat, Jun 20, 2015 at 00:35:14 +0300, Ilya Verbin wrote:
> > > Given that a mapped variable in 4.1 can have different kinds across 
> > > nested data
> > > regions, we need to store map-type not only for each var, but also for 
> > > each
> > > structured mapping.  Here is my WIP patch, is it sane? :)
> > > Attached testcase works OK on the device with non-shared memory.
> > 
> > A bit updated version with a fix for GOMP_MAP_TO_PSET.
> > make check-target-libgomp passed.
> 
> Ok, thanks.
> 
> > include/gcc/
> > * gomp-constants.h (GOMP_MAP_ALWAYS_TO_P,
> > GOMP_MAP_ALWAYS_FROM_P): Define.
> > libgomp/
> > * libgomp.h (struct target_var_desc): New.
> > (struct target_mem_desc): Replace array of splay_tree_key with array of
> > target_var_desc.
> > (struct splay_tree_key_s): Move copy_from to target_var_desc.
> > * oacc-mem.c (gomp_acc_remove_pointer): Use copy_from from
> > target_var_desc.
> > * oacc-parallel.c (GOACC_parallel): Use copy_from from target_var_desc.
> > * target.c (gomp_map_vars_existing): Copy data to device if map-type is
> > 'always to' or 'always tofrom'.
> > (gomp_map_vars): Use key from target_var_desc.  Set copy_from and
> > always_copy_from.
> > (gomp_copy_from_async): Use key and copy_from from target_var_desc.
> > (gomp_unmap_vars): Copy data from device if always_copy_from is set.
> > (gomp_offload_image_to_device): Do not use copy_from.
> > * testsuite/libgomp.c/target-11.c: New test.

(That's gomp-4_1-branch r224838.  The attached
gomp-4_1-branch-r224838.patch is a variant that applies on top of
gomp-4_0-branch r228972.)  This change introduces regressions in OpenACC
async clause handling.

Testing on gomp-4_1-branch r224838:

PASS: libgomp.oacc-c/../libgomp.oacc-c-c++-common/data-2.c 
-DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 (test for excess errors)
[-PASS:-]{+FAIL:+} libgomp.oacc-c/../libgomp.oacc-c-c++-common/data-2.c 
-DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 execution test
PASS: libgomp.oacc-c/../libgomp.oacc-c-c++-common/data-3.c 
-DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 (test for excess errors)
[-PASS:-]{+FAIL:+} libgomp.oacc-c/../libgomp.oacc-c-c++-common/data-3.c 
-DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 execution test

Same for C++.

Testing on gomp-4_0-branch r228972 plus the attached
gomp-4_1-branch-r224838.patch:

PASS: libgomp.oacc-c/../libgomp.oacc-c-c++-common/asyncwait-1.c 
-DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none (test for 
excess errors)
[-PASS:-]{+FAIL:+} 
libgomp.oacc-c/../libgomp.oacc-c-c++-common/asyncwait-1.c 
-DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none execution 
test

Same for C++.

As I mentioned in
,
all three regressions are visible when testing on trunk r228777.  I have
not analyzed why the three different branches show different sets of
regressions -- I'm hoping they're all manifestations of the same
underlying problem: they're all using the OpenACC async clause.

Looking at gomp-4_0-branch r228972 plus the attached
gomp-4_1-branch-r224838.patch, clearly there is "some kind of data
corruption":

$ gdb -q a.out 
Reading symbols from a.out...done.
(gdb) start
[...]
25  a = (float *) malloc (nbytes);
(gdb) n
26  b = (float *) malloc (nbytes);
(gdb) print a
$1 = (float *) 0xab12c0
(gdb) c
Continuing.

Program received signal SIGSEGV, Segmentation fault.
0x004015d2 in main (argc=1, argv=0x7fffd408) at 
source-gcc/libgomp/testsuite/libgomp.oacc-c-c++-common/asyncwait-1.c:133
133 if (a[i] != 3.0)
(gdb) print a
$2 = (float *) 0x500680620

0x500680620 looks like a nvptx device pointer to me, which is a) wrong
(after the "malloc", "a" shouldn't change its value throughout program
execution), and b) that "explains" the segmentation fault (device pointer
dereferenced in host code).

So, maybe data is erroneously being copied back to the host from device,
or from libgomp internal data structures.  Maybe some copy_from flag
handling needs to be adjusted or added in the OpenACC code in libgomp?


I have no idea whether that's related, but I noticed that currently we're
not in any way handling async_refcount in libgomp/oacc-*.c -- do we have
to?  (Its name certainly makes me believe it's related to asynchronous
data (un-)mapping.)  Should we be able to drop some of the
OpenACC-specific async implementation in libgomp, and use new/generic
target.c code instead?


Please note that there will be further libgomp changes (target.c, and
other files) coming in l

Re: Fix prototype for print_insn in rtl.h

2015-10-19 Thread Jeff Law

On 10/19/2015 09:14 AM, Jeff Law wrote:

On 10/15/2015 10:28 AM, Andrew MacLeod wrote:

On 10/13/2015 11:32 AM, Jeff Law wrote:

On 10/13/2015 02:21 AM, Nikolai Bozhenov wrote:

2015-10-13  Nikolai Bozhenov

 * gcc/rtl.h (print_insn): fix prototype

Installed on the trunk after bootstrap & regression test.

jeff


Sorry, a little late to the party... but why is print_insn even in
rtl.h?  It seems that sched-vis.c is the only thing that uses it...

Then let's move it to sched-int.h, unless there's some good reason not to.
Because there isn't a sched-vis.h file and sched-vis.c would need to 
include sched-int.h.


That's all rather silly because sched-vis.c has nothing to do with 
scheduling.  It's just an RTL dumper.


I think moving all that stuff into print-rtl.[ch] is probably the better 
solution.


jeff


Re: [patch] fix gotools cross build

2015-10-19 Thread Ian Lance Taylor
On Wed, May 6, 2015 at 5:34 AM, Matthias Klose  wrote:
>
> Yes, it's documented that there is still some work to do for cross builds,
> however a cross build for gotools currently fails.
>
> The toplevel make always passes the GOC variable in the environment, 
> overwriting
> anything configured in gotools own configure.ac. Fixed by explictly using 
> @GOC@
> for GOCOMPILER.
>
> gotools is a host project, and the cross_compiling check always fails. Not 
> sure
> if the $host != $target test is any better, but it works for me.
>
> Ok for the trunk and the gcc-5 branch?
>
>   Matthias
>
> * Makefile.am: Use GOC configured in configure.ac for cross builds.
> * configure.ac: Fix NATIVE conditional.
> * Makefile.in, configure: Regenerate.
>
> --- gotools/Makefile.am
> +++ gotools/Makefile.am
> @@ -33,7 +33,7 @@
>  # Use the compiler we just built.
>  GOCOMPILER = $(GOC_FOR_TARGET)
>  else
> -GOCOMPILER = $(GOC)
> +GOCOMPILER = @GOC@
>  endif
>
>  GOCFLAGS = $(CFLAGS_FOR_TARGET)
> --- gotools/configure.ac
> +++ gotools/configure.ac
> @@ -46,7 +46,7 @@
>  AC_PROG_CC
>  AC_PROG_GO
>
> -AM_CONDITIONAL(NATIVE, test "$cross_compiling" = no)
> +AM_CONDITIONAL(NATIVE, test "$host" = "$target")
>
>  dnl Test for -lsocket and -lnsl.  Copied from libjava/configure.ac.
>  AC_CACHE_CHECK([for socket libraries], gotools_cv_lib_sockets,


I really apologize for never responding to this.  I keep meaning to
figure out the right thing to do, but I never have time to think about
it.

As far as I can see, both these suggested changes are wrong.  They may
avoid problems that exist today, but as far as I can see they don't do
it in the right way.

If GOC is not being set correctly by the top level Makefile, then the
fix is to set it correctly.  Overriding GOC in the gotools Makefile is
just going to lead us into more complexity and confusion over time.

The test for NATIVE currently tests whether we are building with a
cross-compiler.  You are suggesting changing it to be whether we are
building a cross-compiler.  Neither test is fully correct.  If we just
want to make things slightly more correct, we should test that $build ==
$host (aka ! $cross_compiling) and that $host == $target.  If both
conditions are true, then we have a native build.  A change along
those lines is OK.
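
Concretely, something like this (an untested sketch of the combined test):

AM_CONDITIONAL(NATIVE,
  test "$cross_compiling" = no -a "$host" = "$target")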

Sorry again for not responding to this.

Ian


Re: OpenACC async clause regressions (was: [gomp4.1] Add new versions of GOMP_target{,_data,_update} and GOMP_target_enter_exit_data)

2015-10-19 Thread Ilya Verbin
On Mon, Oct 19, 2015 at 18:24:35 +0200, Thomas Schwinge wrote:
> Chung-Lin, would you please have a look at the following (on
> gomp-4_0-branch)?  Also, anyone else got any ideas off-hand?
> 
> PASS: libgomp.oacc-c/../libgomp.oacc-c-c++-common/data-2.c 
> -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 (test for excess errors)
> [-PASS:-]{+FAIL:+} libgomp.oacc-c/../libgomp.oacc-c-c++-common/data-2.c 
> -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 execution test
> PASS: libgomp.oacc-c/../libgomp.oacc-c-c++-common/data-3.c 
> -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 (test for excess errors)
> [-PASS:-]{+FAIL:+} libgomp.oacc-c/../libgomp.oacc-c-c++-common/data-3.c 
> -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 execution test

Maybe it was caused by this change in gomp_unmap_vars?
https://gcc.gnu.org/ml/gcc-patches/2015-06/msg01376.html

Looking at the code, I don't see any difference in async_refcount handling, but
I was unable to test it without having hardware :(

  -- Ilya


Forwarding -foffload=[...] from the driver (compile-time) to libgomp (run-time)

2015-10-19 Thread Thomas Schwinge
Hi!

Ping...

On Wed, 30 Sep 2015 17:54:07 +0200, I wrote:
> On Tue, 29 Sep 2015 10:18:14 +0200, Jakub Jelinek  wrote:
> > On Mon, Sep 28, 2015 at 11:39:10AM +0200, Thomas Schwinge wrote:
> > > On Fri, 11 Sep 2015 17:43:49 +0200, Jakub Jelinek  
> > > wrote:
> > > > So, do I understand well that you'll call GOMP_set_offload_targets from
> > > > construct[ors] of all shared libraries (and the binary) that contain 
> > > > offloaded
> > > > code?  If yes, that is surely going to fail the assertions in there.
> > > 
> > > Indeed.  My original plan has been to generate/invoke this constructor
> > > only for/from the final executable and not for any shared libraries, but
> > > it seems I didn't implemented this correctly.
> > 
> > How would you mean to implement it?
> 
> I have come to realize that we need to generate/invoke this constructor
> from everything that links against libgomp (which is what I implemented),
> that is, executables as well as shared libraries.
> 
> > -fopenmp or -fopenacc code with
> > offloading bits might not be in the final executable at all, nor in shared
> > libraries it is linked against; such libraries could be only dlopened,
> > consider say python plugin.  And this is not just made up, perhaps not with
> > offloading yet, but people regularly use OpenMP code in plugins and then we
> > get complains that fork child of the main program is not allowed to do
> > anything but async-signal-safe functions.
> 
> I'm not sure I'm completely understanding that paragraph?  Are you saying
> that offloaded code can be in libraries that are not linked against
> libgomp?  How would these register (GOMP_offload_register) their
> offloaded code?  I think it's a reasonable to expect that every shared
> library that contains offloaded code must link against libgomp, which
> will happen automatically given that it is built with -fopenmp/-fopenacc?
> 
> > > > You can dlopen such libraries etc.  What if you link one library with
> > > > -fopenmp=nvptx-none and another one with 
> > > > -fopenmp=x86_64-intelmicemul-linux?
> > > 
> > > So, the first question to answer is: what do we expect to happen in this
> > > case, or similarly, if the executable and any shared libraries are
> > > compiled with different/incompatible -foffload options?
> > 
> > As the device numbers are per-process, the only possibility I see is that
> > all the physically available devices are always available, and just if you
> > try to offload from some code to a device that doesn't support it, you get
> > host fallback.  Because, one shared library could carefully use device(xyz)
> > to offload to say XeonPhi it is compiled for and supports, and another
> > library device(abc) to offload to PTX it is compiled for and supports.
> 
> OK, I think I get that, and it makes sense.  Even though, I don't know
> how you'd do that today: as far as I can tell, there is no specification
> covering the OpenMP 4 target device IDs, so I have no idea how a user
> program/library could reliably use them in practice?  For example, in
> the current GCC implementation, the OpenMP 4 target device IDs depend on
> the number of individual devices available in the system, and the order in
> which libgomp loads the plugins, which is defined (arbitrarily) by the
> GCC configuration?
> 
> > > For this, I propose that the only mode of operation that we currently can
> > > support is that all of the executable and any shared libraries agree on
> > > the offload targets specified by -foffload, and I thus propose the
> > > following patch on top of what Joseph has posted before (passes the
> > > testsuite, but not yet tested otherwise):
> > 
> > See above, no.
> 
> OK.
> 
> How's the following (complete patch instead of incremental patch; the
> driver changes are still the same as before)?  The changes are:
> 
>   * libgomp/target.c:gomp_target_init again loads all the plugins.
>   * libgomp/target.c:resolve_device and
> libgomp/oacc-init.c:resolve_device verify that a default device
> (OpenMP device-var ICV, and acc_device_default, respectively) is
> actually enabled, or resort to host fallback if not.
>   * GOMP_set_offload_targets renamed to GOMP_enable_offload_targets; used
> to enable devices specified by -foffload.  Can be called multiple
> times (executable, any shared libraries); the set of enabled devices
> is the union of all those ever requested.
>   * GOMP_offload_register (but not the new GOMP_offload_register_ver)
> changed to enable all devices.  This is to maintain compatibility
> with old executables and shared libraries built without the -foffload
> constructor support.
>   * IntelMIC mkoffload changed to use GOMP_offload_register_ver instead
> of GOMP_offload_register, and GOMP_offload_unregister_ver instead of
> GOMP_offload_unregister.  To avoid enabling all devices
> (GOMP_offload_register).
>   * New test cases to verify this (-foffload=disable, host fallback).

(Will write ChangeLog once

Re: [PATCH 1/3] [ARM] PR63870 Add qualifiers for NEON builtins

2015-10-19 Thread Alan Lawrence

On 14/10/15 23:02, Charles Baylis wrote:

On 12 October 2015 at 11:58, Alan Lawrence  wrote:

>

Given we are making changes here to how this all works on bigendian, have
you tested armeb at all?


I tested on big endian, and it passes, except


Well, I asked because it seemed good to make sure that the changes/improvements 
to how lane-swapping was done weren't breaking anything on armeb by the back 
door; so thank you, I'm happy with that as far as your patch is concerned ;).



for a testsuite issue
with the *_f16 tests, which fail because they are built without the
fp16 options on big endian. This is because
check_effective_target_arm_neon_fp16_ok_nocache gets an ICE when it
attempts to compile the test program. I think those fp16 intrinsics
are in your area, do you want to take a look? :)


Heh, yes, I see ;). So I've dug into this a bit, and the problem seems to be 
that we don't define a movv4hf pattern, and hence, we fall back to 
emit_multi_word_move. This uses subregs, and in simplify_subreg_regno, 
REG_CANNOT_CHANGE_MODE_P is true on bigendian (but false on little-endian).


That is, I *think* the right thing to do is just to add a movv4hf (and v8hf) 
pattern; I'm testing this now.


Cheers, Alan



Re: [vec-cmp, patch 7/6] Vector comparison enabling in SLP

2015-10-19 Thread Jeff Law

On 10/19/2015 05:06 AM, Ilya Enkovich wrote:

Hi,

It appears our testsuite doesn't have a test which would require vector 
comparison support in SLP even after boolean pattern disabling.  This patch 
adds such a test and allows comparisons in SLP.  Is it OK?

Thanks,
Ilya
--
gcc/

2015-10-19  Ilya Enkovich  

* tree-vect-slp.c (vect_build_slp_tree_1): Allow
comparison statements.
(vect_get_constant_vectors): Support boolean vector
constants.

gcc/testsuite/

2015-10-19  Ilya Enkovich  

* gcc.dg/vect/slp-cond-5.c: New test.
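
For reference, a sketch of the shape such a test takes (the actual
slp-cond-5.c may differ):

void
foo (int *r, const int *a, const int *b)
{
  for (int i = 0; i < 1024; i += 2)
    {
      /* The two statements form an SLP group whose comparisons must be
	 vectorized directly once the boolean patterns are disabled.  */
      r[i] = a[i] > b[i] ? 3 : 5;
      r[i + 1] = a[i + 1] > b[i + 1] ? 3 : 5;
    }
}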

OK once any prerequisites are approved.

jeff



Re: [PR fortran/63858] Fix mix of OpenACC and OpenMP sentinels in continuations

2015-10-19 Thread Thomas Schwinge
Hi!

Ping...

On Fri, 9 Oct 2015 12:15:24 +0200, I wrote:
> On Mon, 27 Jul 2015 16:14:17 +0200, I wrote:
> > On Tue, 30 Jun 2015 03:39:42 +0300, Ilmir Usmanov  wrote:
> > > 08.06.2015, 17:59, "Cesar Philippidis" :
> > > > On 06/07/2015 02:05 PM, Ilmir Usmanov wrote:
> > > >>  08.06.2015, 00:01, "Ilmir Usmanov" :
> > >   This patch fixes checks of OpenMP and OpenACC continuations in
> > >   case someone mixes them (i.e. continues an OpenMP directive with
> > >   an !$ACC sentinel or vice versa).
> > 
> > Thanks for working on this!
> > 
> > >   OK for gomp branch?
> > 
> > The same applies to GCC trunk, as far as I can tell -- any reason not to
> > apply the patch to trunk?
> 
> Ping -- OK to commit the following (by Ilmir) to trunk:
> 
> commit 38e62678ef11f349f029d42439668071f170e059
> Author: Ilmir Usmanov 
> Date:   Sun Jul 26 12:10:36 2015 +
> 
> [PR fortran/63858] Fix mix of OpenACC and OpenMP sentinels in 
> continuations
> 
>   gcc/fortran/
>   PR fortran/63858
>   * scanner.c (skip_omp_attribute_fixed, skip_oacc_attribute_fixed):
>   New functions.
>   (skip_fixed_comments, gfc_next_char_literal): Fix mix of OpenACC
>   and OpenMP sentinels in continuation.
>   gcc/testsuite/
>   PR fortran/63858
>   * gfortran.dg/goacc/omp-fixed.f: New file.
>   * gfortran.dg/goacc/omp.f95: Extend.
> ---
>  gcc/fortran/scanner.c   | 258 
> +---
>  gcc/testsuite/gfortran.dg/goacc/omp-fixed.f |  32 
>  gcc/testsuite/gfortran.dg/goacc/omp.f95 |  10 +-
>  3 files changed, 199 insertions(+), 101 deletions(-)
> 
> diff --git gcc/fortran/scanner.c gcc/fortran/scanner.c
> index bfb7d45..1e1ea84 100644
> --- gcc/fortran/scanner.c
> +++ gcc/fortran/scanner.c
> @@ -935,6 +935,63 @@ skip_free_comments (void)
>return false;
>  }
>  
> +/* Return true if MP was matched in fixed form.  */
> +static bool
> +skip_omp_attribute_fixed (locus *start)
> +{
> +  gfc_char_t c;
> +  if (((c = next_char ()) == 'm' || c == 'M')
> +  && ((c = next_char ()) == 'p' || c == 'P'))
> +{
> +  c = next_char ();
> +  if (c != '\n'
> +   && (continue_flag
> +   || c == ' ' || c == '\t' || c == '0'))
> + {
> +   do
> + c = next_char ();
> +   while (gfc_is_whitespace (c));
> +   if (c != '\n' && c != '!')
> + {
> +   /* Canonicalize to *$omp.  */
> +   *start->nextc = '*';
> +   openmp_flag = 1;
> +   gfc_current_locus = *start;
> +   return true;
> + }
> + }
> +}
> +  return false;
> +}
> +
> +/* Return true if CC was matched in fixed form.  */
> +static bool
> +skip_oacc_attribute_fixed (locus *start)
> +{
> +  gfc_char_t c;
> +  if (((c = next_char ()) == 'c' || c == 'C')
> +  && ((c = next_char ()) == 'c' || c == 'C'))
> +{
> +  c = next_char ();
> +  if (c != '\n'
> +   && (continue_flag
> +   || c == ' ' || c == '\t' || c == '0'))
> + {
> +   do
> + c = next_char ();
> +   while (gfc_is_whitespace (c));
> +   if (c != '\n' && c != '!')
> + {
> +	   /* Canonicalize to *$acc.  */
> +   *start->nextc = '*';
> +   openacc_flag = 1;
> +   gfc_current_locus = *start;
> +   return true;
> + }
> + }
> +}
> +  return false;
> +}
>  
>  /* Skip comment lines in fixed source mode.  We have the same rules as
> in skip_free_comment(), except that we can have a 'c', 'C' or '*'
> @@ -1003,128 +1060,92 @@ skip_fixed_comments (void)
> && continue_line < gfc_linebuf_linenum (gfc_current_locus.lb))
>   continue_line = gfc_linebuf_linenum (gfc_current_locus.lb);
>  
> -   if (flag_openmp || flag_openmp_simd)
> +   if ((flag_openmp || flag_openmp_simd) && !flag_openacc)
>   {
> if (next_char () == '$')
>   {
> c = next_char ();
> if (c == 'o' || c == 'O')
>   {
> -   if (((c = next_char ()) == 'm' || c == 'M')
> -   && ((c = next_char ()) == 'p' || c == 'P'))
> - {
> -   c = next_char ();
> -   if (c != '\n'
> -   && ((openmp_flag && continue_flag)
> -   || c == ' ' || c == '\t' || c == '0'))
> - {
> -   do
> - c = next_char ();
> -   while (gfc_is_whitespace (c));
> -   if (c != '\n' && c != '!')
> - {
> -   /* Canonicalize to *$omp.  */
> -   *start.nextc = '*';
> -   openmp_flag = 1;
> -   gfc_current_locus = start;
> -   return;
> - }
> - 

[HSA] Class refactoring

2015-10-19 Thread Martin Liška

Hello.

The following tarball consists of patches that rename class member variables 
to respect the C++ coding style, where all member variables should begin 
with 'm_'.
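
For instance (a made-up class, just to illustrate the convention):

class example_insn
{
  int m_opcode;           // was: opcode
  example_insn *m_next;   // was: next
};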

Martin


hsa-class-refactoring.tar.bz2
Description: application/bzip


Re: Move cabs simplifications to match.pd

2015-10-19 Thread Richard Biener
On October 19, 2015 4:42:23 PM GMT+02:00, Richard Sandiford 
 wrote:
>The fold code also expanded cabs(x+yi) to fsqrt(x*x+y*y) when
>optimising
>for speed.  tree-ssa-math-opts.c has this transformation too, but
>unlike
>the fold code, it first checks whether the target implements the sqrt
>optab.  The patch simply removes the fold code and keeps the
>tree-ssa-math-opts.c logic the same.
>
>gcc.dg/lto/20110201-1_0.c was relying on us replacing cabs
>with fsqrt even on targets where fsqrt is itself a library call.
>The discussion leading up to that patch suggested that we only
>want to test the fold on targets with a square root instruction,
>so it would be OK to skip the test on other targets:
>
>https://gcc.gnu.org/ml/gcc-patches/2011-07/msg01961.html
>https://gcc.gnu.org/ml/gcc-patches/2011-07/msg02036.html
>
>The patch does that using the sqrt_insn effective target.
>
>It's possible that removing the tree folds renders the LTO trick
>unnecessary, but since the test was originally for an ICE, it seems
>better to leave it as-is.
>
>Tested on x86_64-linux-gnu, aarch64-linux-gnu and arm-linux-gnueabi.
>20110201-1_0.c passes on all three.  OK to install?

OK.

Thanks,
Richard.

>Thanks,
>Richard
>
>
>gcc/
>   * builtins.c (fold_builtin_cabs): Delete.
>   (fold_builtin_1): Update accordingly.  Handle constant arguments here.
>   * match.pd: Add rules previously handled by fold_builtin_cabs.
>
>gcc/testsuite/
>   * gcc.dg/lto/20110201-1_0.c: Restrict to sqrt_insn targets.
>   Add associated options for arm*-*-*.
>   (sqrt): Remove dummy definition.
>
>diff --git a/gcc/builtins.c b/gcc/builtins.c
>index 1e4ec35..8f87fd9 100644
>--- a/gcc/builtins.c
>+++ b/gcc/builtins.c
>@@ -7539,82 +7539,6 @@ fold_fixed_mathfn (location_t loc, tree fndecl,
>tree arg)
>   return NULL_TREE;
> }
> 
>-/* Fold call to builtin cabs, cabsf or cabsl with argument ARG.  TYPE
>is the
>-   return type.  Return NULL_TREE if no simplification can be made. 
>*/
>-
>-static tree
>-fold_builtin_cabs (location_t loc, tree arg, tree type, tree fndecl)
>-{
>-  tree res;
>-
>-  if (!validate_arg (arg, COMPLEX_TYPE)
>-  || TREE_CODE (TREE_TYPE (TREE_TYPE (arg))) != REAL_TYPE)
>-return NULL_TREE;
>-
>-  /* Calculate the result when the argument is a constant.  */
>-  if (TREE_CODE (arg) == COMPLEX_CST
>-  && (res = do_mpfr_arg2 (TREE_REALPART (arg), TREE_IMAGPART
>(arg),
>-type, mpfr_hypot)))
>-return res;
>-
>-  if (TREE_CODE (arg) == COMPLEX_EXPR)
>-{
>-  tree real = TREE_OPERAND (arg, 0);
>-  tree imag = TREE_OPERAND (arg, 1);
>-
>-  /* If either part is zero, cabs is fabs of the other.  */
>-  if (real_zerop (real))
>-  return fold_build1_loc (loc, ABS_EXPR, type, imag);
>-  if (real_zerop (imag))
>-  return fold_build1_loc (loc, ABS_EXPR, type, real);
>-
>-  /* cabs(x+xi) -> fabs(x)*sqrt(2).  */
>-  if (flag_unsafe_math_optimizations
>-&& operand_equal_p (real, imag, OEP_PURE_SAME))
>-{
>-STRIP_NOPS (real);
>-return fold_build2_loc (loc, MULT_EXPR, type,
>-fold_build1_loc (loc, ABS_EXPR, type, real),
>-build_real_truncate (type, dconst_sqrt2 ()));
>-  }
>-}
>-
>-  /* Optimize cabs(-z) and cabs(conj(z)) as cabs(z).  */
>-  if (TREE_CODE (arg) == NEGATE_EXPR
>-  || TREE_CODE (arg) == CONJ_EXPR)
>-return build_call_expr_loc (loc, fndecl, 1, TREE_OPERAND (arg,
>0));
>-
>-  /* Don't do this when optimizing for size.  */
>-  if (flag_unsafe_math_optimizations
>-  && optimize && optimize_function_for_speed_p (cfun))
>-{
>-  tree sqrtfn = mathfn_built_in (type, BUILT_IN_SQRT);
>-
>-  if (sqrtfn != NULL_TREE)
>-  {
>-tree rpart, ipart, result;
>-
>-arg = builtin_save_expr (arg);
>-
>-rpart = fold_build1_loc (loc, REALPART_EXPR, type, arg);
>-ipart = fold_build1_loc (loc, IMAGPART_EXPR, type, arg);
>-
>-rpart = builtin_save_expr (rpart);
>-ipart = builtin_save_expr (ipart);
>-
>-result = fold_build2_loc (loc, PLUS_EXPR, type,
>-  fold_build2_loc (loc, MULT_EXPR, type,
>-   rpart, rpart),
>-  fold_build2_loc (loc, MULT_EXPR, type,
>-   ipart, ipart));
>-
>-return build_call_expr_loc (loc, sqrtfn, 1, result);
>-  }
>-}
>-
>-  return NULL_TREE;
>-}
>-
> /* Build a complex (inf +- 0i) for the result of cproj.  TYPE is the
>complex tree type of the result.  If NEG is true, the imaginary
>zero is negative.  */
>@@ -9683,7 +9607,11 @@ fold_builtin_1 (location_t loc, tree fndecl,
>tree arg0)
> break;
> 
> CASE_FLT_FN (BUILT_IN_CABS):
>-  return fold_builtin_cabs (loc, arg0, type, fndecl);
>+  if (TREE_CODE (arg0) == COMPLEX_CST
>+&& TREE_CODE (TREE_TYPE (TREE_TYPE (arg0))) == REAL_TY

[gomp4] auto partitioning

2015-10-19 Thread Nathan Sidwell

I've committed this patch to gomp4 branch.

It implements handling of the 'auto' clause on a loop.  Such loops can be 
implicitly partitioned, if they are (explicitly or implicitly) 'independent'. 
This patch walks the loop structure after explicit partitioning has been 
handled, and attempts to allocate a partitioning for such auto loops.  If 
there's no available partitioning, a diagnostic is emitted.
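
As a worked example of the "pick the innermost free partitioning" step in the 
patch below (assuming the usual gomp-constants.h layout, where 
GOMP_DIM_MASK (X) is 1 << X and gang/worker/vector are dimensions 0/1/2):

unsigned outer_mask = 0;                     /* nothing used outside */
unsigned inner_mask = 1 << 2;                /* inner loop took vector */
unsigned this_mask = inner_mask | (1 << 3);  /* add GOMP_DIM_MAX sentinel: 0b1100 */
this_mask = (this_mask & -this_mask) >> 1;   /* lowest set bit (4) >> 1 == 2 */
this_mask &= ~outer_mask;                    /* i.e. the worker axis is chosen */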


Auto partitioning caused a failure of a collapse testcase.  I considered this a 
latent bug and forced that testcase to retain the original behaviour of a 'seq' 
loop.


nathan
2015-10-19  Nathan Sidwell  

	gcc/
	* omp-low.c (oacc_loop_auto_partitions): New.
	(oacc_loop_partition): Call it.

	gcc/testsuite/
	* gfortran.dg/goacc/routine-4.f90: Add diagnostic.
	* gfortran.dg/goacc/routine-5.f90: Add diagnostic.
	* c-c++-common/goacc-gomp/nesting-1.c: Add diagnostic.
	* c-c++-common/goacc/routine-6.c: Add diagnostic.
	* c-c++-common/goacc/routine-7.c: Add diagnostic.

	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/collapse-2.c: Force
	serialization.

Index: gcc/omp-low.c
===
--- gcc/omp-low.c	(revision 228960)
+++ gcc/omp-low.c	(working copy)
@@ -16244,6 +16244,50 @@ oacc_loop_fixed_partitions (oacc_loop *l
   return has_auto;
 }
 
+/* Walk the OpenACC loop hierarchy to assign auto-partitioned loops.
+   OUTER_MASK is the partitioning this loop is contained within.
+   Return the cumulative partitioning used by this loop, siblings and
+   children.  */
+
+static unsigned
+oacc_loop_auto_partitions (oacc_loop *loop, unsigned outer_mask)
+{
+  unsigned inner_mask = 0;
+  bool noisy = true;
+
+#ifdef ACCEL_COMPILER
+  /* When device_type is supported, we want the device compiler to be
+ noisy, if the loop parameters are device_type-specific.  */
+  noisy = false;
+#endif
+
+  if (loop->child)
+inner_mask |= oacc_loop_auto_partitions (loop->child,
+	 outer_mask | loop->mask);
+
+  if ((loop->flags & OLF_AUTO) && (loop->flags & OLF_INDEPENDENT))
+{
+  unsigned this_mask = 0;
+  
+  /* Pick the innermost free partitioning.  */
+  this_mask = inner_mask | GOMP_DIM_MASK (GOMP_DIM_MAX);
+  this_mask = (this_mask & -this_mask) >> 1;
+  this_mask &= ~outer_mask;
+
+  if (!this_mask && noisy)
+	warning_at (loop->loc, 0,
+		"insufficient parallelism available to partition loop");
+
+  loop->mask = this_mask;
+}
+  inner_mask |= loop->mask;
+  
+  if (loop->sibling)
+inner_mask |= oacc_loop_auto_partitions (loop->sibling, outer_mask);
+  
+  return inner_mask;
+}
+
 /* Walk the OpenACC loop hierarchy to check and assign partitioning
axes.  */
 
@@ -16255,7 +16299,8 @@ oacc_loop_partition (oacc_loop *loop, in
   if (fn_level >= 0)
 outer_mask = GOMP_DIM_MASK (fn_level) - 1;
 
-  oacc_loop_fixed_partitions (loop, outer_mask);
+  if (oacc_loop_fixed_partitions (loop, outer_mask))
+oacc_loop_auto_partitions (loop, outer_mask);
 }
 
 /* Default launch dimension validator.  Force everything to 1.  A
Index: gcc/testsuite/gfortran.dg/goacc/routine-4.f90
===
--- gcc/testsuite/gfortran.dg/goacc/routine-4.f90	(revision 228960)
+++ gcc/testsuite/gfortran.dg/goacc/routine-4.f90	(working copy)
@@ -44,7 +44,7 @@ program main
   !
 
   !$acc parallel copy (a)
-  !$acc loop
+  !$acc loop ! { dg-warning "insufficient parallelism" }
   do i = 1, N
  call gang (a)
   end do
Index: gcc/testsuite/gfortran.dg/goacc/routine-5.f90
===
--- gcc/testsuite/gfortran.dg/goacc/routine-5.f90	(revision 228960)
+++ gcc/testsuite/gfortran.dg/goacc/routine-5.f90	(working copy)
@@ -87,7 +87,7 @@ subroutine seq (a)
   integer, intent (inout) :: a(N)
   integer :: i
 
-  !$acc loop
+  !$acc loop ! { dg-warning "insufficient parallelism" }
   do i = 1, N
  a(i) = a(i) - a(i)
   end do
Index: gcc/testsuite/c-c++-common/goacc-gomp/nesting-1.c
===
--- gcc/testsuite/c-c++-common/goacc-gomp/nesting-1.c	(revision 228960)
+++ gcc/testsuite/c-c++-common/goacc-gomp/nesting-1.c	(working copy)
@@ -20,7 +20,7 @@ f_acc_kernels (void)
   }
 }
 
-#pragma acc routine
+#pragma acc routine vector
 void
 f_acc_loop (void)
 {
Index: gcc/testsuite/c-c++-common/goacc/routine-7.c
===
--- gcc/testsuite/c-c++-common/goacc/routine-7.c	(revision 228960)
+++ gcc/testsuite/c-c++-common/goacc/routine-7.c	(working copy)
@@ -74,7 +74,7 @@ vector (int red)
 int
 seq (int red)
 {
-#pragma acc loop reduction (+:red)
+#pragma acc loop reduction (+:red) // { dg-warning "insufficient parallelism" }
   for (int i = 0; i < 10; i++)
 red ++;
 
Index: gcc/testsuite/c-c++-common/goacc/routine-6.c
===
--- gcc/testsuite/c-c++-common/goacc/r

Re: [PATCH] mn10300: Use the STC bb-reorder algorithm at -Os

2015-10-19 Thread Jeff Law

On 10/16/2015 06:53 AM, Segher Boessenkool wrote:

For mn10300, STC still gives better results for optimise-for-size than
"simple" does.  So use STC at -Os as well.

Is this okay for trunk?


Segher


2015-10-16  Segher Boessenkool  

* common/config/mn10300/mn10300-common.c
(mn10300_option_optimization_table) :
Use REORDER_BLOCKS_ALGORITHM_STC at -Os and up.


OK.
jeff



Re: [Patch] Add OPT_Wattributes to ignored attributes on template args

2015-10-19 Thread Ryan Mansfield

Ping:

https://gcc.gnu.org/ml/gcc-patches/2015-09/msg02256.html

Regards,

Ryan Mansfield

On 15-09-29 04:21 PM, Ryan Mansfield wrote:

Hi,

In canonicalize_type_argument, attributes are being discarded with a
warning.  Should that warning be added to OPT_Wattributes?

2015-09-29  Ryan Mansfield  

 * pt.c (canonicalize_type_argument): Use OPT_Wattributes in
warning.
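
For reference, a minimal sketch of code that triggers the warning:

typedef int aligned_int __attribute__ ((aligned (16)));

template <typename T> struct wrap { T t; };

wrap<aligned_int> w;  // warns along the lines of
                      // "ignoring attributes on template argument"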


Re: [PATCH] Fix default_binds_local_p_2 for extern protected data

2015-10-19 Thread Richard Henderson

On 07/22/2015 07:01 AM, Szabolcs Nagy wrote:

2015-07-22  Szabolcs Nagy

PR target/66912
* varasm.c (default_binds_local_p_2): Turn on extern_protected_data.

gcc/testsuite/ChangeLog:

2015-07-22  Szabolcs Nagy

PR target/66912
* gcc.target/aarch64/pr66912.c: New.
* gcc.target/arm/pr66912.c: New.


Ok.


r~


Re: [PATCH, libiberty] Fix PR63758 by using the _NSGetEnviron() API on Darwin.

2015-10-19 Thread Mike Stump
On Oct 18, 2015, at 3:42 AM, Iain Sandoe  wrote:
 This seems likely to break cross-compilers to Darwin that do not have
 the system libraries available.  I guess I don't care about that if
 you don't.
>>> 
>>> I do care about it, but I'm not visualising the case...
>>> 
>>> AFAICS, when built as a host component for a cross to Darwin from 
>>> non-Darwin, environ would be declared as **environ as usual.
>>> 
>>> If an implementation includes a compiler targeting Darwin that defines 
>>> __APPLE__ but doesn't provide _NSGetEnviron in its libc, then isn't it 
>>> broken anyway?
>> 
>> I'm talking about the case of building a cross-compiler where the
>> system libraries are not available.  This is sometimes done as a first
>> step toward building a full cross-compiler.
> 
> I've applied the patch since it solves an immediate problem (and has been 
> requested).
> 
> Right now, the only case that I can think of when there's a Darwin-hosted 
> statically-linked user-space executable is in bringing up the system itself, 
> in which case one has to build non-standard crts and a statically-linkable 
> libc.  Last time I did this was on 10.5 with the darwinbuild stuff, not sure 
> it's even feasible on modern Darwin which is built with a different compiler.
> 
> It's possible that making the Darwin case conditional on ! 
> defined(__STATIC__) might be sufficient to guard that, but I need to think of 
> some way to test it.

So, I see two different things here.  One is a build of the Darwin open source 
kernel.  I've never done that, though I knew people that did.  I don't play in 
this space, so I don't know how much of a rat hole it is, or if it is even 
possible anymore.  Really, it should just be a matter of dropping a new gcc 
into the official open source Darwin build infrastructure and hitting build.  
If it didn't just build before, then it might be a time sink to make it work; 
I just don't know.

The other is that it is theoretically nice to be able to build up an entire 
gcc tool chain for a Mac, starting from a Linux box.  I usually don't do this, 
but I do a subset: a cc1 with no headers and no link or assembly support, 
which fails to build completely but works far enough to get past cc1.  This 
isn't handy for users, but as a developer I like to do this from time to time.

I don't see the case that Ian is concerned about.  Either they have Apple's 
library, and it does include this routine, or someone is making a replacement 
OS, and will then need to provide that routine if they did not before.  
Partial builds without a library are fine, but without a library you can't 
link anything (other than -r) anyway, so I'm not sure it matters that it would 
fail to link, as it failed before anyway (for example, printf would not be 
found either).

Kernel builds are special, and they are one of the few things that build 
static (as does the dyld program).  To test either, I'd recommend either not 
worrying about it (life is short), or, if you do care enough, rolling up your 
sleeves, as they truly are special.
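
For reference, the applied change follows the usual Darwin idiom (a sketch 
from memory, not the exact libiberty hunk):

#ifdef __APPLE__
# include <crt_externs.h>            /* declares _NSGetEnviron */
# define environ (*_NSGetEnviron ())
#else
extern char **environ;
#endif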

Re: [PATCH, sh][v3] musl support for sh

2015-10-19 Thread Szabolcs Nagy

On 17/10/15 02:14, Oleg Endo wrote:

On Fri, 2015-10-16 at 17:06 +0100, Szabolcs Nagy wrote:

Revision of
https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01636.html

The musl dynamic linker name is /lib/ld-musl-sh{-nofpu}{-fdpic}.so.1

New in this revision:

Add -fdpic to the name, will be useful with the pending sh2 FDPIC support.

2015-10-16  Gregor Richards  
Szabolcs Nagy  

* config/sh/linux.h (MUSL_DYNAMIC_LINKER): Define.
(MUSL_DYNAMIC_LINKER_E, MUSL_DYNAMIC_LINKER_FP): Define.




+#if TARGET_CPU_DEFAULT & (MASK_HARD_SH2A_DOUBLE | MASK_SH4)
+/* "-nofpu" if any nofpu option is specified.  */
+#define MUSL_DYNAMIC_LINKER_FP \
+  "%{m1|m2|m2a-nofpu|m3|m4-nofpu|m4-100-nofpu|m4-200-nofpu|m4-300-nofpu|" \
+  "m4-340|m4-400|m4-500|m4al|m5-32media-nofpu|m5-64media-nofpu|" \
+  "m5-compact-nofpu:-nofpu}"
+#else
+/* "-nofpu" if none of the hard fpu options are specified.  */
+#define MUSL_DYNAMIC_LINKER_FP \
+  
"%{m2a|m4|m4-100|m4-200|m4-300|m4a|m5-32media|m5-64media|m5-compact:;:-nofpu}"
+#endif


SH5 has been declared obsolete.  Please do not add any new SH5 related
things.  In this case, drop the m5-* thingies.



removed m5*.
OK to commit with this?

diff --git a/gcc/config/sh/linux.h b/gcc/config/sh/linux.h
index 0f5d614..61cf777 100644
--- a/gcc/config/sh/linux.h
+++ b/gcc/config/sh/linux.h
@@ -43,6 +43,27 @@ along with GCC; see the file COPYING3.  If not see
 
 #define TARGET_ASM_FILE_END file_end_indicate_exec_stack
 
+#if TARGET_ENDIAN_DEFAULT == MASK_LITTLE_ENDIAN
+#define MUSL_DYNAMIC_LINKER_E "%{mb:eb}"
+#else
+#define MUSL_DYNAMIC_LINKER_E "%{!ml:eb}"
+#endif
+
+#if TARGET_CPU_DEFAULT & (MASK_HARD_SH2A_DOUBLE | MASK_SH4)
+/* "-nofpu" if any nofpu option is specified.  */
+#define MUSL_DYNAMIC_LINKER_FP \
+  "%{m1|m2|m2a-nofpu|m3|m4-nofpu|m4-100-nofpu|m4-200-nofpu|m4-300-nofpu|" \
+  "m4-340|m4-400|m4-500|m4al:-nofpu}"
+#else
+/* "-nofpu" if none of the hard fpu options are specified.  */
+#define MUSL_DYNAMIC_LINKER_FP "%{m2a|m4|m4-100|m4-200|m4-300|m4a:;:-nofpu}"
+#endif
+
+#undef MUSL_DYNAMIC_LINKER
+#define MUSL_DYNAMIC_LINKER \
+  "/lib/ld-musl-sh" MUSL_DYNAMIC_LINKER_E MUSL_DYNAMIC_LINKER_FP \
+  "%{mfdpic:-fdpic}.so.1"
+
 #define GLIBC_DYNAMIC_LINKER "/lib/ld-linux.so.2"
 
 #undef SUBTARGET_LINK_EMUL_SUFFIX


Re: [PATCH, rs6000] Pass --secure-plt to the linker

2015-10-19 Thread Szabolcs Nagy

On 19/10/15 14:04, Szabolcs Nagy wrote:

On 19/10/15 12:12, Alan Modra wrote:

On Thu, Oct 15, 2015 at 06:50:50PM +0100, Szabolcs Nagy wrote:

A powerpc toolchain built with (or without) --enable-secureplt
currently creates a binary that uses bss plt if

(1) any of the linked PIC objects have bss plt relocs
(2) or all the linked objects are non-PIC or have no relocs,

because this is the binutils linker behaviour.

This patch passes --secure-plt to the linker which makes the linker
warn in case (1) and produce a binary with secure plt in case (2).


The idea is OK I think, but


@@ -574,6 +577,7 @@ ENDIAN_SELECT(" -mbig", " -mlittle", DEFAULT_ASM_ENDIAN)
  %{R*} \
  %(link_shlib) \
  %{!T*: %(link_start) } \
+%{!static: %(link_secure_plt_default)} \
  %(link_os)"


this change needs to be conditional on !mbss-plt too.



OK, will change that.

If -msecure-plt and -mbss-plt are supposed to affect
linking too (not just code gen), then shall I add
%{msecure-plt: --secure-plt} too?



I added !mbss-plt only for now, as a mix of -msecure-plt
and -mbss-plt options does not cancel each other in gcc;
the patch only changes behaviour for a secureplt toolchain.

OK to commit?

diff --git a/gcc/config/rs6000/secureplt.h b/gcc/config/rs6000/secureplt.h
index b463463..77edf2a 100644
--- a/gcc/config/rs6000/secureplt.h
+++ b/gcc/config/rs6000/secureplt.h
@@ -18,3 +18,4 @@ along with GCC; see the file COPYING3.  If not see
 .  */
 
 #define CC1_SECURE_PLT_DEFAULT_SPEC "-msecure-plt"
+#define LINK_SECURE_PLT_DEFAULT_SPEC "--secure-plt"
diff --git a/gcc/config/rs6000/sysv4.h b/gcc/config/rs6000/sysv4.h
index 7b2f9bd..93499e8 100644
--- a/gcc/config/rs6000/sysv4.h
+++ b/gcc/config/rs6000/sysv4.h
@@ -537,6 +537,9 @@ ENDIAN_SELECT(" -mbig", " -mlittle", DEFAULT_ASM_ENDIAN)
 #ifndef CC1_SECURE_PLT_DEFAULT_SPEC
 #define CC1_SECURE_PLT_DEFAULT_SPEC ""
 #endif
+#ifndef LINK_SECURE_PLT_DEFAULT_SPEC
+#define LINK_SECURE_PLT_DEFAULT_SPEC ""
+#endif
 
 /* Pass -G xxx to the compiler.  */
 #undef CC1_SPEC
@@ -574,6 +577,7 @@ ENDIAN_SELECT(" -mbig", " -mlittle", DEFAULT_ASM_ENDIAN)
 %{R*} \
 %(link_shlib) \
 %{!T*: %(link_start) } \
+%{!static: %{!mbss-plt: %(link_secure_plt_default)}} \
 %(link_os)"
 
 /* Shared libraries are not default.  */
@@ -889,6 +893,7 @@ ncrtn.o%s"
   { "link_os_openbsd",		LINK_OS_OPENBSD_SPEC },			\
   { "link_os_default",		LINK_OS_DEFAULT_SPEC },			\
   { "cc1_secure_plt_default",	CC1_SECURE_PLT_DEFAULT_SPEC },		\
+  { "link_secure_plt_default",	LINK_SECURE_PLT_DEFAULT_SPEC },		\
   { "cpp_os_ads",		CPP_OS_ADS_SPEC },			\
   { "cpp_os_yellowknife",	CPP_OS_YELLOWKNIFE_SPEC },		\
   { "cpp_os_mvme",		CPP_OS_MVME_SPEC },			\


[PATCH] fortran/68019 -- Remove an assert() that prevents error message

2015-10-19 Thread Steve Kargl
The attached patch removes an assert() that prevents gfortran from
issuing an error message.  Built and tested on x86_64-*-freebsd.
Although probably an "obviously correct" patch, OK to commit?

2015-10-19  Steven G. Kargl  

PR fortran/68019
* decl.c (add_init_expr_to_sym): Remove an assert() to allow an error
message to be issued.

2015-10-19  Steven G. Kargl  

PR fortran/68019
* gfortran.dg/pr68019.f90: New test.

-- 
Steve
Index: gcc/fortran/decl.c
===
--- gcc/fortran/decl.c	(revision 228974)
+++ gcc/fortran/decl.c	(working copy)
@@ -1486,7 +1486,6 @@ add_init_expr_to_sym (const char *name, 
 			 " with scalar", &sym->declared_at);
 	  return false;
 	}
-	  gcc_assert (sym->as->rank == init->rank);
 
 	  /* Shape should be present, we get an initialization expression.  */
 	  gcc_assert (init->shape);
Index: gcc/testsuite/gfortran.dg/pr68019.f90
===
--- gcc/testsuite/gfortran.dg/pr68019.f90	(revision 0)
+++ gcc/testsuite/gfortran.dg/pr68019.f90	(working copy)
@@ -0,0 +1,13 @@
+! { dg-do compile }
+! Original code from Gerhard Steinmetz
+! Gerhard dot Steinmetz for fortran at t-online dot de
+! PR fortran/68019
+!
+program p
+   integer :: i
+   type t
+  integer :: n
+   end type
+   type(t), parameter :: vec(*) = [(t(i), i = 1, 4)]
+   type(t), parameter :: arr(*) = reshape(vec, [2, 2])   ! { dg-error "ranks 1 and 2 in assignment" }
+end


Re: [PATCH] Fix partial template specialization syntax in wide-int.h

2015-10-19 Thread H.J. Lu
On Mon, Jul 20, 2015 at 12:15 AM, Mikhail Maltsev  wrote:
> On 07/17/2015 07:46 PM, Mike Stump wrote:
>> On Jul 17, 2015, at 2:28 AM, Mikhail Maltsev  wrote:
>>> The following code (reduced from wide-int.h) is rejected by Intel C++
>>> Compiler (EDG-based):
>>
>> So, could you test this with the top of the tree compiler and file a bug
>> report against g++ for it, if it seems to not work right.  If that bug report
>> is rejected, then I’d say file a bug report against clang and EDG.
>
> In addition to usual bootstrap+regtest, I also checked that build succeeds 
> with
> GCC 4.3.6 (IIRC, this is now the minimal required version) as well as with
> recent GCC snapshot used as stage 0. Committed as r225993.
> I also filed this bugreport: 
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66941
>
>>> I think that the warning is correct, and "template <>" should not be used
>>> here. The attached patch should fix this issue. Bootstrapped and regtested
>>> on x86_64-linux. OK for trunk?
>>
>> Ok.  Does this need to go into the gcc-5 release branch as well?  If so, ok
>> there too.  Thanks.
> I think there is no need for it.

It is also needed for gcc-5.  I am backporting it now.
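
For context, a minimal illustration of the syntax point (made-up names):

template <typename T, int N> struct traits;    // primary template

// Partial specialization: exactly one template parameter list; an
// extra leading "template <>" here is what EDG rejects.
template <typename T>
struct traits<T, 0> { };

// "template <>" belongs only on an explicit (full) specialization:
template <>
struct traits<int, 1> { };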


-- 
H.J.


Re: Add VIEW_CONVERT_EXPR to operand_equal_p

2015-10-19 Thread Jan Hubicka
Hi,
this is a patch that reverts the TYPE_MODE mismatch related changes and
adds a test to the type checker that TYPE_MODE always matches between a
type and its TYPE_CANONICAL.
I have bootstrapped/regtested x86_64-linux, but unfortunately the regtesting
had some unrelated noise (spawn failures).  I am re-testing.  I am on a trip
and will likely only access the internet again from Des Moines tonight.

Honza

* tree.c (verify_type): Verify that TYPE_MODE matches
between TYPE_CANONICAL and type.
* expr.c (store_expr_with_bounds): Revert my previous change.
* expmed.c (store_bit_field_1): Revert previous change.
* gimple-expr.c (useless_type_conversion_p): Require TYPE_MODE
to match for complete types.

Index: tree.c
===
--- tree.c  (revision 228933)
+++ tree.c  (working copy)
@@ -13344,6 +13344,14 @@ verify_type (const_tree t)
   error_found = true;
 }
 
+  if (COMPLETE_TYPE_P (t) && TYPE_CANONICAL (t)
+  && TYPE_MODE (t) != TYPE_MODE (TYPE_CANONICAL (t)))
+{
+  error ("TYPE_MODE of TYPE_CANONICAL is not compatible");
+  debug_tree (ct);
+  error_found = true;
+}
+
 
   /* Check various uses of TYPE_MINVAL.  */
   if (RECORD_OR_UNION_TYPE_P (t))
Index: expr.c
===
--- expr.c  (revision 228933)
+++ expr.c  (working copy)
@@ -5425,14 +5425,6 @@ store_expr_with_bounds (tree exp, rtx ta
 temp = convert_modes (GET_MODE (target), TYPE_MODE (TREE_TYPE (exp)),
  temp, TYPE_UNSIGNED (TREE_TYPE (exp)));
 
-  /* We allow move between structures of same size but different mode.
- If source is in memory and the mode differs, simply change the memory.  */
-  if (GET_MODE (temp) == BLKmode && GET_MODE (target) != BLKmode)
-{
-  gcc_assert (MEM_P (temp));
-  temp = adjust_address_nv (temp, GET_MODE (target), 0);
-}
-
   /* If value was not generated in the target, store it there.
  Convert the value to TARGET's type first if necessary and emit the
  pending incrementations that have been queued when expanding EXP.
Index: expmed.c
===
--- expmed.c(revision 228933)
+++ expmed.c(working copy)
@@ -757,14 +757,6 @@ store_bit_field_1 (rtx str_rtx, unsigned
   }
   }
 
-  /* We allow move between structures of same size but different mode.
- If source is in memory and the mode differs, simply change the memory.  */
-  if (GET_MODE (value) == BLKmode && GET_MODE (op0) != BLKmode)
-{
-  gcc_assert (MEM_P (value));
-  value = adjust_address_nv (value, GET_MODE (op0), 0);
-}
-
   /* Storing an lsb-aligned field in a register
  can be done with a movstrict instruction.  */
 
Index: gimple-expr.c
===
--- gimple-expr.c   (revision 228933)
+++ gimple-expr.c   (working copy)
@@ -88,9 +88,10 @@ useless_type_conversion_p (tree outer_ty
 return true;
 
   /* Changes in machine mode are never useless conversions unless we
- deal with aggregate types in which case we defer to later checks.  */
+ deal with complete aggregate types in which case we defer to later
+ checks.  */
   if (TYPE_MODE (inner_type) != TYPE_MODE (outer_type)
-  && !AGGREGATE_TYPE_P (inner_type))
+  && (!AGGREGATE_TYPE_P (inner_type) || COMPLETE_TYPE_P (outer_type)))
 return false;
 
   /* If both the inner and outer types are integral types, then the


Re: [PATCH] Fix partial template specialization syntax in wide-int.h

2015-10-19 Thread H.J. Lu
On Mon, Oct 19, 2015 at 12:39 PM, H.J. Lu  wrote:
> On Mon, Jul 20, 2015 at 12:15 AM, Mikhail Maltsev  wrote:
>> On 07/17/2015 07:46 PM, Mike Stump wrote:
>>> On Jul 17, 2015, at 2:28 AM, Mikhail Maltsev  wrote:
 The following code (reduced from wide-int.h) is rejected by Intel C++
 Compiler (EDG-based):
>>>
>>> So, could you test this with the top of the tree compiler and file a bug
>>> report against g++ for it, if it seems to not work right.  If that bug 
>>> report
>>> is rejected, then I’d say file a bug report against clang and EDG.
>>
>> In addition to usual bootstrap+regtest, I also checked that build succeeds 
>> with
>> GCC 4.3.6 (IIRC, this is now the minimal required version) as well as with
>> recent GCC snapshot used as stage 0. Committed as r225993.
>> I also filed this bugreport: 
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66941
>>
 I think that the warning is correct, and "template <>" should not be used
 here. The attached patch should fix this issue. Bootstrapped and regtested
 on x86_64-linux. OK for trunk?
>>>
>>> Ok.  Does this need to go into the gcc-5 release branch as well?  If so, ok
>>> there too.  Thanks.
>> I think there is no need for it.
>
> It is also needed for gcc-5.  I am backporting it now.
>

This is what I checked into gcc-5-branch.

-- 
H.J.
From 4ae06c3dbe5fb2c4d345060b1ba9cd34b2dc7d37 Mon Sep 17 00:00:00 2001
From: miyuki 
Date: Mon, 20 Jul 2015 05:30:12 +
Subject: [PATCH] Fix partial specialization syntax of wide int traits.

gcc/
	* wide-int.h (struct binary_traits): Fix partial specialization syntax.
	(struct int_traits): Likewise.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@225993 138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/ChangeLog  |  8 
 gcc/wide-int.h | 10 --
 2 files changed, 8 insertions(+), 10 deletions(-)

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 7fb0538..45ae071 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,11 @@
+2015-10-19  H.J. Lu  
+
+	Backport from mainline
+	2015-07-20  Mikhail Maltsev  
+
+	* wide-int.h (struct binary_traits): Fix partial specialization syntax.
+	(struct int_traits): Likewise.
+
 2015-10-16  Richard Sandiford  
 
 	PR middle-end/66311
diff --git a/gcc/wide-int.h b/gcc/wide-int.h
index 46f4545..9a71c4f 100644
--- a/gcc/wide-int.h
+++ b/gcc/wide-int.h
@@ -365,21 +365,18 @@ namespace wi
  inputs.  Note that CONST_PRECISION and VAR_PRECISION cannot be
  mixed, in order to give stronger type checking.  When both inputs
  are CONST_PRECISION, they must have the same precision.  */
-  template <>
   template <typename T1, typename T2>
   struct binary_traits <T1, T2, FLEXIBLE_PRECISION, FLEXIBLE_PRECISION>
   {
     typedef widest_int result_type;
   };
 
-  template <>
   template <typename T1, typename T2>
   struct binary_traits <T1, T2, FLEXIBLE_PRECISION, VAR_PRECISION>
   {
     typedef wide_int result_type;
   };
 
-  template <>
   template <typename T1, typename T2>
   struct binary_traits <T1, T2, FLEXIBLE_PRECISION, CONST_PRECISION>
   {
@@ -389,14 +386,12 @@ namespace wi
 			       <int_traits <T2>::precision> > result_type;
   };
 
-  template <>
   template <typename T1, typename T2>
   struct binary_traits <T1, T2, VAR_PRECISION, FLEXIBLE_PRECISION>
   {
     typedef wide_int result_type;
   };
 
-  template <>
   template <typename T1, typename T2>
   struct binary_traits <T1, T2, CONST_PRECISION, FLEXIBLE_PRECISION>
   {
@@ -406,7 +401,6 @@ namespace wi
 			       <int_traits <T1>::precision> > result_type;
   };
 
-  template <>
   template <typename T1, typename T2>
   struct binary_traits <T1, T2, CONST_PRECISION, CONST_PRECISION>
   {
@@ -417,7 +411,6 @@ namespace wi
 			       <int_traits <T1>::precision> > result_type;
   };
 
-  template <>
   template <typename T1, typename T2>
   struct binary_traits <T1, T2, VAR_PRECISION, VAR_PRECISION>
   {
@@ -881,7 +874,6 @@ generic_wide_int <storage>::dump () const
 
 namespace wi
 {
-  template <>
   template <typename storage>
   struct int_traits < generic_wide_int <storage> >
     : public wi::int_traits <storage>
@@ -960,7 +952,6 @@ inline wide_int_ref_storage <SE>::wide_int_ref_storage (const T &x,
 
 namespace wi
 {
-  template <>
   template <bool SE>
   struct int_traits <wide_int_ref_storage <SE> >
   {
@@ -1147,7 +1138,6 @@ public:
 
 namespace wi
 {
-  template <>
   template <int N>
   struct int_traits < fixed_wide_int_storage <N> >
   {
-- 
2.4.3



Re: [gomp4.1] depend nowait support for target {update,{enter,exit} data}

2015-10-19 Thread Ilya Verbin
On Thu, Oct 15, 2015 at 16:01:56 +0200, Jakub Jelinek wrote:
> >void *fn_addr = gomp_get_target_fn_addr (devicep, fn);
> >  
> > +  if (flags & GOMP_TARGET_FLAG_NOWAIT)
> > +{
> > +  gomp_create_target_task (devicep, fn_addr, mapnum, hostaddrs, sizes,
> > +  kinds, flags, depend);
> > +  return;
> > +}
> 
> But this is not ok.  You need to do this far earlier, already before the
> if (depend != NULL) code in GOMP_target_41.  And, I think you should just
> not pass fn_addr, but fn itself.
> 
> > @@ -1636,34 +1657,58 @@ void
> >  gomp_target_task_fn (void *data)
> >  {
> >struct gomp_target_task *ttask = (struct gomp_target_task *) data;
> > +  struct gomp_device_descr *devicep = ttask->devicep;
> > +
> >if (ttask->fn != NULL)
> >  {
> > -  /* GOMP_target_41 */
> > +  if (devicep == NULL
> > + || !(devicep->capabilities & GOMP_OFFLOAD_CAP_OPENMP_400))
> > +   {
> > + /* FIXME: Save host fn addr into gomp_target_task?  */
> > + gomp_target_fallback_firstprivate (NULL, ttask->mapnum,
> 
> If you pass above fn instead of fn_addr, ttask->fn is what you want
> to pass to gomp_target_fallback_firstprivate here and remove the FIXME.
> 
> > +ttask->hostaddrs, ttask->sizes,
> > +ttask->kinds);
> > + return;
> > +   }
> > +
> > +  struct target_mem_desc *tgt_vars
> > +   = gomp_map_vars (devicep, ttask->mapnum, ttask->hostaddrs, NULL,
> > +ttask->sizes, ttask->kinds, true,
> > +GOMP_MAP_VARS_TARGET);
> > +  devicep->async_run_func (devicep->target_id, ttask->fn,
> > +  (void *) tgt_vars->tgt_start, data);
> 
> You need to void *fn_addr = gomp_get_target_fn_addr (devicep, ttask->fn);
> first obviously, and pass fn_addr.
> 
> > +
> > +  /* FIXME: TMP example of checking for completion.
> > +Alternatively the plugin can set some completion flag in ttask.  */
> > +  while (!devicep->async_is_completed_func (devicep->target_id, data))
> > +   {
> > + fprintf (stderr, "-");
> > + usleep (10);
> > +   }
> 
> This obviously doesn't belong here.
> 
> >if (device->capabilities & GOMP_OFFLOAD_CAP_OPENACC_200)
> > diff --git a/libgomp/testsuite/libgomp.c/target-tmp.c 
> > b/libgomp/testsuite/libgomp.c/target-tmp.c
> > new file mode 100644
> > index 000..23a739c
> > --- /dev/null
> > +++ b/libgomp/testsuite/libgomp.c/target-tmp.c
> > @@ -0,0 +1,40 @@
> > +#include 
> > +#include 
> > +
> > +#pragma omp declare target
> > +void foo (int n)
> > +{
> > +  printf ("Start tgt %d\n", n);
> > +  usleep (500);
> 
> 5s is too long.  Not to mention that not sure if PTX can do printf
> and especially usleep.
> 
> > diff --git a/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp 
> > b/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp
> > index 26ac6fe..c843710 100644
> > --- a/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp
> > +++ b/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp
> ...
> > +/* Set of asynchronously running target tasks.  */
> > +static std::set *async_tasks;
> > +
> >  /* Thread-safe registration of the main image.  */
> >  static pthread_once_t main_image_is_registered = PTHREAD_ONCE_INIT;
> >  
> > +/* Mutex for protecting async_tasks.  */
> > +static pthread_mutex_t async_tasks_lock = PTHREAD_MUTEX_INITIALIZER;
> > +
> >  static VarDesc vd_host2tgt = {
> >{ 1, 1 },  /* dst, src */
> >{ 1, 0 },  /* in, out  */
> > @@ -156,6 +163,8 @@ init (void)
> >  
> >  out:
> >address_table = new ImgDevAddrMap;
> > +  async_tasks = new std::set;
> > +  pthread_mutex_init (&async_tasks_lock, NULL);
> 
> PTHREAD_MUTEX_INITIALIZER should already initialize the lock.
> But, do you really need async_tasks and the lock?  Better store
> something into some plugin's owned field in target_task struct and
> let the plugin callback be passed address of that field rather than the
> whole target_task?

So, here is what I have for now.  Attached target-29.c testcase works fine with
MIC emul, however I don't know how to (and where) properly check for completion
of async execution on target.  And, similarly, where to do unmapping after that?
Do we need a callback from plugin to libgomp (as far as I understood, PTX
runtime supports this, but HSA doesn't), or libgomp will just check for
ttask->is_completed in task.c?
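
For the record, one possible shape of the callback variant (purely a
sketch -- the entry-point name and the atomics are invented here, and it
assumes the gomp_target_task layout from the libgomp.h hunk below):

/* Hypothetical plugin->libgomp completion callback, illustration only.  */
void
GOMP_PLUGIN_target_task_completion (void *data)
{
  struct gomp_target_task *ttask = (struct gomp_target_task *) data;
  /* Publish completion; task.c would then test ttask->is_completed
     instead of the usleep polling loop quoted above.  */
  __atomic_store_n (&ttask->is_completed, true, __ATOMIC_RELEASE);
  /* Unmapping (gomp_unmap_vars on tgt_vars) still has to happen
     somewhere, here or in task.c, once ownership of tgt_vars at
     completion time is decided.  */
}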

 
diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h
index 9c8b1fb..e707c80 100644
--- a/libgomp/libgomp.h
+++ b/libgomp/libgomp.h
@@ -430,6 +430,7 @@ struct gomp_target_task
   size_t *sizes;
   unsigned short *kinds;
   unsigned int flags;
+  bool is_completed;
   void *hostaddrs[];
 };
 
@@ -877,6 +878,7 @@ struct gomp_device_descr
   void *(*host2dev_func) (int, void *, const void *, size_t);
   void *(*dev2dev_func) (int, void *, const void *, size_t);
   void (*run

PING: [PATCH] PR target/67215: -fno-plt needs improvements for x86

2015-10-19 Thread H.J. Lu
-- Forwarded message --
From: H.J. Lu 
Date: Wed, Sep 9, 2015 at 3:02 PM
Subject: [PATCH] PR target/67215: -fno-plt needs improvements for x86
To: gcc-patches@gcc.gnu.org


prepare_call_address in calls.c is the wrong place to handle -fno-plt.
We shoudn't force function address into register and hope that load
function address via GOT and indirect call via register will be folded
into indirect call via GOT, which doesn't always happen.  Also non-PIC
case can only be handled in backend.  Instead, backend should expand
external function call into indirect call via GOT for -fno-plt.

This patch reverts -fno-plt in prepare_call_address and handles it in
ix86_expand_call.  Other backends may need similar changes to support
-fno-plt.  Alternately, we can introduce a target hook to indicate
whether an external function should be called via register for -fno-plt
so that i386 backend can disable it in prepare_call_address.
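
For concreteness, the codegen this aims for (hand-written expectation
under -fno-plt, not actual compiler output):

/* foo.c -- compile with: gcc -O2 -fpic -fno-plt -S foo.c (x86-64).  */
extern void bar (void);

void
foo (void)
{
  /* Expected: a single "call *bar@GOTPCREL(%rip)" -- an indirect call
     through the GOT with no PLT stub and no intermediate register,
     instead of a GOT load into a register followed by "call *%rax".  */
  bar ();
}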

gcc/

PR target/67215
* calls.c (prepare_call_address): Don't handle -fno-plt here.
* config/i386/i386.c (ix86_expand_call): Generate indirect call
via GOT for -fno-plt.  Support indirect call via GOT for x32.
* config/i386/predicates.md (sibcall_memory_operand): Allow
GOT memory operand.

gcc/testsuite/

PR target/67215
* gcc.target/i386/pr67215-1.c: New test.
* gcc.target/i386/pr67215-2.c: Likewise.
* gcc.target/i386/pr67215-3.c: Likewise.
---
 gcc/calls.c   | 12 --
 gcc/config/i386/i386.c| 71 ---
 gcc/config/i386/predicates.md |  7 ++-
 gcc/testsuite/gcc.target/i386/pr67215-1.c | 20 +
 gcc/testsuite/gcc.target/i386/pr67215-2.c | 20 +
 gcc/testsuite/gcc.target/i386/pr67215-3.c | 13 ++
 6 files changed, 114 insertions(+), 29 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr67215-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr67215-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr67215-3.c

diff --git a/gcc/calls.c b/gcc/calls.c
index 026cb53..22c65cd 100644
--- a/gcc/calls.c
+++ b/gcc/calls.c
@@ -203,18 +203,6 @@ prepare_call_address (tree fndecl_or_type, rtx funexp, rtx static_chain_value,
   && targetm.small_register_classes_for_mode_p (FUNCTION_MODE))
  ? force_not_mem (memory_address (FUNCTION_MODE, funexp))
  : memory_address (FUNCTION_MODE, funexp));
-  else if (flag_pic
-  && fndecl_or_type
-  && TREE_CODE (fndecl_or_type) == FUNCTION_DECL
-  && (!flag_plt
-  || lookup_attribute ("noplt", DECL_ATTRIBUTES (fndecl_or_type)))
-  && !targetm.binds_local_p (fndecl_or_type))
-{
-  /* This is done only for PIC code.  There is no easy interface to force the
-function address into GOT for non-PIC case.  non-PIC case needs to be
-handled specially by the backend.  */
-  funexp = force_reg (Pmode, funexp);
-}
   else if (! sibcallp)
 {
   if (!NO_FUNCTION_CSE && optimize && ! flag_no_function_cse)
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index d78f4e7..b9299d4 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -25649,21 +25649,54 @@ ix86_expand_call (rtx retval, rtx fnaddr, rtx callarg1,
   /* Static functions and indirect calls don't need the pic register.  Also,
      check if PLT was explicitly avoided via no-plt or "noplt" attribute, making
      it an indirect call.  */
+  rtx addr = XEXP (fnaddr, 0);
   if (flag_pic
- && (!TARGET_64BIT
- || (ix86_cmodel == CM_LARGE_PIC
- && DEFAULT_ABI != MS_ABI))
- && GET_CODE (XEXP (fnaddr, 0)) == SYMBOL_REF
- && !SYMBOL_REF_LOCAL_P (XEXP (fnaddr, 0))
- && flag_plt
- && (SYMBOL_REF_DECL ((XEXP (fnaddr, 0))) == NULL_TREE
- || !lookup_attribute ("noplt",
-DECL_ATTRIBUTES (SYMBOL_REF_DECL (XEXP (fnaddr, 0))
+ && GET_CODE (addr) == SYMBOL_REF
+ && !SYMBOL_REF_LOCAL_P (addr))
{
- use_reg (&use, gen_rtx_REG (Pmode, REAL_PIC_OFFSET_TABLE_REGNUM));
- if (ix86_use_pseudo_pic_reg ())
-   emit_move_insn (gen_rtx_REG (Pmode, REAL_PIC_OFFSET_TABLE_REGNUM),
-   pic_offset_table_rtx);
+ if (flag_plt
+ && (SYMBOL_REF_DECL (addr) == NULL_TREE
+ || !lookup_attribute ("noplt",
+   DECL_ATTRIBUTES
(SYMBOL_REF_DECL (addr)
+   {
+ if (!TARGET_64BIT
+ || (ix86_cmodel == CM_LARGE_PIC
+ && DEFAULT_ABI != MS_ABI))
+   {
+ use_reg (&use, gen_rtx_REG (Pmode,
+ REAL_PIC_OFFSET_TABLE_REGNUM));
+ if (ix86_use_pseudo_pic_reg ())
+   emit_move_insn (gen_rtx_REG (Pmode,
+

Re: [c++-delayed-folding] First stab at convert_to_integer

2015-10-19 Thread Jason Merrill

On 10/19/2015 02:31 AM, Marek Polacek wrote:

On Fri, Oct 16, 2015 at 02:07:51PM -1000, Jason Merrill wrote:

On 10/16/2015 07:35 AM, Marek Polacek wrote:

This code path seems to be for pushing a conversion down into a binary
expression.  We shouldn't do this at all when we aren't folding.


I tend to agree, but this case is tricky.  What this code is about is,
e.g., that for

int
fn (long p, long o)
{
   return p + o;
}

we want to narrow the operation and do the addition on unsigned ints and then
convert to int.  We do it here because we're still missing the
promotion/demotion pass on GIMPLE (PR45397 / PR47477).  Disabling this
optimization here would regress a few testcases, so I kept the code as it was.
Thoughts?
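
For readers following along, a source-level sketch of the shortening in
question (not the exact tree the FE builds):

int
fn (long p, long o)
{
  /* Roughly what the narrowing produces: the addition is performed in
     the demoted *unsigned* type so that no new signed overflow is
     introduced, then the result is converted once to the target type.  */
  return (int) ((unsigned int) p + (unsigned int) o);
}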


That makes sense, but please add a comment referring to one of those PRs and
also add a note to the PR about this place.  OK with that change.


Done.  But I can't seem to commit the patch to the c++-delayed-folding
branch; is that somehow restricted?  I'm getting:

svn: E170001: Commit failed (details follow):
svn: E170001: Authorization failed
svn: E170001: Your commit message was left in a temporary file:
svn: E170001:'/home/marek/svn/c++-delayed-folding/svn-commit.tmp'

and I've checked out the branch using
svn co svn://mpola...@gcc.gnu.org/svn/gcc/branches/c++-delayed-folding/


You need to use svn+ssh:// rather than svn:// if you want to be able to 
commit.  From svnwrite.html:


It is also possible to convert an existing SVN tree to use SSH by using 
svn switch --relocate:


svn switch --relocate svn://gcc.gnu.org/svn/gcc 
svn+ssh://usern...@gcc.gnu.org/svn/gcc


Jason



Re: [PATCH] fortran/68019 -- Remove an assert() that prevents error message

2015-10-19 Thread Paul Richard Thomas
Hi Steve,

Yes, this is OK for trunk. I suggest that it is so obvious that it
should go into the 5 branch as well.

Cheers

Paul

On 19 October 2015 at 21:13, Steve Kargl
 wrote:
> The attached patch removes an assert() that prevents gfortran from
> issuing an error message.  Built and tested on x86_64-*-freebsd.
> Although probably an "obviously correct" patch, OK to commit?
>
> 2015-10-19  Steven G. Kargl  
>
> PR fortran/68019
> * decl.c (add_init_expr_to_sym): Remove an assert() to allow an error
> message to be issued.
>
> 2015-10-19  Steven G. Kargl  
>
> PR fortran/68019
> * gfortran.dg/pr68019.f90: new test.
>
> --
> Steve



-- 
Outside of a dog, a book is a man's best friend. Inside of a dog it's
too dark to read.

Groucho Marx


PING: [PATCH] X86: Optimize access to globals in PIE with copy reloc

2015-10-19 Thread H.J. Lu
PING.


-- Forwarded message --
From: H.J. Lu 
Date: Wed, Jul 1, 2015 at 5:11 AM
Subject: [PATCH] X86: Optimize access to globals in PIE with copy reloc
To: gcc-patches@gcc.gnu.org


Normally, with PIE, GCC accesses globals that are extern to the module
using GOT.  This is two instructions, one to get the address of the global
from GOT and the other to get the value.  Examples:

---
extern int a_glob;
int
main ()
{
  return a_glob;
}
---

With PIE, the generated code accesses global via GOT using two memory
loads:

movq    a_glob@GOTPCREL(%rip), %rax
movl    (%rax), %eax

for 64-bit or

movl    a_glob@GOT(%ecx), %eax
movl    (%eax), %eax

for 32-bit.

Some experiments on Google and SPEC CPU benchmarks show that the extra
instruction affects performance by 1% to 5%.

Solution - Copy Relocations:

When the linker supports copy relocations, GCC can always assume that
the global will be defined in the executable.  For globals that are
truly extern (come from shared objects), the linker will create copy
relocations and have them defined in the executable.  Result is that
no global access needs to go through GOT and hence improves performance.
We can generate

movl    a_glob(%rip), %eax

for 64-bit and

movl    a_glob@GOTOFF(%eax), %eax

for 32-bit.  This optimization only applies to undefined non-weak
non-TLS global data.  Undefined weak global or TLS data access still
must go through GOT.

This patch reverts legitimate_pic_address_disp_p change made in revision
218397, which only applies to x86-64.  Instead, this patch updates
targetm.binds_local_p to indicate if undefined non-weak non-TLS global
data is defined locally in PIE.  It also introduces a new target hook,
binds_tls_local_p to distinguish TLS variable from non-TLS variable.  By
default, binds_tls_local_p is the same as binds_local_p which assumes
TLS variable.

This patch checks if 32-bit and 64-bit linkers support PIE with copy
reloc at configure time.  64-bit linker is enabled in binutils 2.25
and 32-bit linker is enabled in binutils 2.26.  This optimization
is enabled only if the linker support is available.

Since copy relocation in PIE is incompatible with DSO created by
-Wl,-Bsymbolic, this patch also adds a new option, -fsymbolic, which
controls how references to global symbols are bound.  The -fsymbolic
option binds references to global symbols to the local definitions
and external references globally.  It avoids copy relocations in PIE
and optimizes global symbol references in shared library created
by -Wl,-Bsymbolic.
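
To see why the two are incompatible, consider this contrived two-file
sketch (file names and values invented for illustration):

/* libfoo.c -- built with: gcc -fpic -shared -Wl,-Bsymbolic  */
int a_glob = 1;
void bump (void) { a_glob++; }  /* -Bsymbolic binds this write to the
                                   DSO's own copy of a_glob.  */

/* main.c -- built as PIE against libfoo.so; with the optimization
   above, the linker emits a copy relocation, duplicating a_glob into
   the executable.  */
extern int a_glob;
extern void bump (void);

int
main (void)
{
  bump ();        /* increments the DSO's a_glob ...  */
  return a_glob;  /* ... but reads the executable's stale copy: 1, not 2.  */
}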

gcc/

PR target/65846
PR target/65886
* configure.ac (HAVE_LD_PIE_COPYRELOC): Renamed to ...
(HAVE_LD_X86_64_PIE_COPYRELOC): This.
(HAVE_LD_386_PIE_COPYRELOC): New.   Defined to 1 if Linux/ia32
linker supports PIE with copy reloc.
* output.h (default_binds_tls_local_p): New.
(default_binds_local_p_3): Add 2 bool arguments.
* target.def (binds_tls_local_p): New target hook.
* varasm.c (decl_default_tls_model): Replace targetm.binds_local_p
with targetm.binds_tls_local_p.
(default_binds_local_p_3): Add a bool argument to indicate TLS
variable and a bool argument to indicate if an undefined non-TLS
non-weak data is local.  Double check TLS variable.  If an
undefined non-TLS non-weak data is local, treat it as defined
locally.
(default_binds_local_p): Pass true and false to
default_binds_local_p_3.
(default_binds_local_p_2): Likewise.
(default_binds_local_p_1): Likewise.
(default_binds_tls_local_p): New.
* config.in: Regenerated.
* configure: Likewise.
* doc/tm.texi: Likewise.
* config/i386/i386.c (legitimate_pic_address_disp_p): Don't
check HAVE_LD_PIE_COPYRELOC here.
(ix86_binds_local): New.
(ix86_binds_tls_local_p): Likewise.
(ix86_binds_local_p): Use it.
(TARGET_BINDS_TLS_LOCAL_P): New.
* doc/tm.texi.in (TARGET_BINDS_TLS_LOCAL_P): New hook.

gcc/testsuite/

PR target/65846
PR target/65886
* gcc.target/i386/pie-copyrelocs-1.c: Updated for ia32.
* gcc.target/i386/pie-copyrelocs-2.c: Likewise.
* gcc.target/i386/pie-copyrelocs-3.c: Likewise.
* gcc.target/i386/pie-copyrelocs-4.c: Likewise.
* gcc.target/i386/pr32219-9.c: Likewise.
* gcc.target/i386/pr32219-10.c: New file.
* gcc.target/i386/pr65886-1.c: Likewise.
* gcc.target/i386/pr65886-2.c: Likewise.
* gcc.target/i386/pr65886-3.c: Likewise.
* gcc.target/i386/pr65886-4.c: Likewise.
* gcc.target/i386/pr65886-5.c: Likewise.

* lib/target-supports.exp (check_effective_target_pie_copyreloc):
Check HAVE_LD_X86_64_PIE_COPYRELOC and HAVE_LD_386_PIE_COPYRELOC
instead of HAVE_LD_X86_64_PIE_COPYRELOC.
---
 gcc/c

Re: New power of 2 hash policy

2015-10-19 Thread François Dumont
Is this one ok ?

François


On 28/09/2015 21:16, François Dumont wrote:
> On 25/09/2015 15:28, Jonathan Wakely wrote:
>> @@ -501,6 +503,129 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>>> mutable std::size_t_M_next_resize;
>>>   };
>>>
>>> +  /// Range hashing function considering that second args is a power
>>> of 2.
>> Does this mean "assuming" not "considering"?
> I assume yes.
>
>>> +  struct _Mask_range_hashing
>>> +  {
>>> +typedef std::size_t first_argument_type;
>>> +typedef std::size_t second_argument_type;
>>> +typedef std::size_t result_type;
>>> +
>>> +result_type
>>> +operator()(first_argument_type __num,
>>> +   second_argument_type __den) const noexcept
>>> +{ return __num & (__den - 1); }
>>> +  };
>>> +
>>> +
>>> +  /// Helper type to compute next power of 2.
>>> +  template<std::size_t _N>
>>> +struct _NextPower2
>>> +{
>>> +  static std::size_t
>>> +  _Get(std::size_t __n)
>>> +  {
>>> +std::size_t __next = _NextPower2<(_N >> 1)>::_Get(__n);
>>> +return __next |= __next >> _N;
>>> +  }
>>> +};
>>> +
>>> +  template<>
>>> +struct _NextPower2<1>
>>> +{
>>> +  static std::size_t
>>> +  _Get(std::size_t __n)
>>> +  { return __n |= __n >> 1; }
>>> +};
>> This doesn't seem to return the next power of 2, it returns one less.
>>
>> _NextPower2<32>::_Get(2) returns 3, but 2 is already a power of 2.
>> _NextPower2<32>::_Get(3) returns 3, but the next power of 2 is 4.
>
> Yes, the name is bad; that is just part of the algo you copy/pasted below. I
> revised the implementation so that _NextPower2 does the whole algo.
>
>>
>> I don't think this needs to be a recursive template, it can simply be
>> a function, can't it?
> I wanted code to adapt to any sizeof(std::size_t) without relying on
> some preprocessor checks. As you pointed out additional >> 32 on 32 bits
> or >> 64 on 64 bits wouldn't hurt, but the recursive template just makes
> sure that we don't do useless operations.
>
>>
>>> +  /// Rehash policy providing power of 2 bucket numbers. Ease modulo
>>> +  /// operations.
>>> +  struct _Power2_rehash_policy
>>> +  {
>>> +using __has_load_factor = std::true_type;
>>> +
>>> +_Power2_rehash_policy(float __z = 1.0) noexcept
>>> +: _M_max_load_factor(__z), _M_next_resize(0) { }
>>> +
>>> +float
>>> +max_load_factor() const noexcept
>>> +{ return _M_max_load_factor; }
>>> +
>>> +// Return a bucket size no smaller than n (as long as n is not
>>> above the
>>> +// highest power of 2).
>> This says "no smaller than n" but it actually seems to guarantee
>> "greater than n" because _NextPower2<>::_Get(n)+1 is 2n when n is a
>> power of two.
> Yes, but this function is calling _NextPower2<>::_Get(n - 1) + 1; there
> is a minus one which makes this comment valid, as shown by the newly
> introduced test.
>
>>> +std::size_t
>>> +_M_next_bkt(std::size_t __n) const
>>> +{
>>> +  constexpr auto __max_bkt
>>> += (std::size_t(1) << (sizeof(std::size_t) * 8 - 1));
>>> +
>>> +  std::size_t __res
>>> += _NextPower2<((sizeof(std::size_t) * 8) >> 1)>::_Get(--__n) + 1;
>> You wouldn't need to add one to the result if the template actually
>> returned a power of two!
>>
>>> +  if (__res == 0)
>>> +__res = __max_bkt;
>>> +
>>> +  if (__res == __max_bkt)
>>> +// Set next resize to the max value so that we never try to
>>> rehash again
>>> +// as we already reach the biggest possible bucket number.
>>> +// Note that it might result in max_load_factor not being
>>> respected.
>>> +_M_next_resize = std::size_t(0) - 1;
>>> +  else
>>> +_M_next_resize
>>> +  = __builtin_floor(__res * (long double)_M_max_load_factor);
>>> +
>>> +  return __res;
>>> +}
>> What are the requirements for this function, "no smaller than n" or
>> "greater than n"?
> 'No smaller than n', as stated in the comment. However, for big n it is
> not possible, even in the prime number based implementation. So I played
> with _M_next_resize to make sure that _M_next_bkt won't be called again
> as soon as the max bucket number has been reached.
>
>
>> If "no smaller than n" is correct then the algorithm you want is
>> "round up to nearest power of 2", which you can find here (I wrote
>> this earlier this year for some reason I can't remember now):
>>
>> https://gitlab.com/redistd/redistd/blob/master/include/redi/bits.h
>>
>> The non-recursive version is only a valid constexpr function in C++14,
>> but since you don't need a constexpr function you could just that,
>> extended to handle 64-bit:
>>
>>  std::size_t
>>  clp2(std::size_t n)
>>  {
>>std::uint_least64_t x = n;
>>// Algorithm from Hacker's Delight, Figure 3-3.
>>x = x - 1;
>>x = x | (x >> 1);
>>x = x | (x >> 2);
>>x = x | (x >> 4);
>>x = x | (x >> 8);
>>x = x | (x >>16);
>>x = x | (x >>32);
>>return x + 1;
>>  }
>>
>> We could avoid the last shift when sizeof(size_t) == 32, I don't know
>> if the optimiser
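
For reference, a quick self-contained check of the round-up behaviour
and of the mask-based range hashing it enables (clp2 restated from the
Hacker's Delight version quoted above):

#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round up to the next power of 2 (Hacker's Delight, Figure 3-3).  */
static size_t
clp2 (size_t n)
{
  uint_least64_t x = n;
  x = x - 1;
  x |= x >> 1;  x |= x >> 2;  x |= x >> 4;
  x |= x >> 8;  x |= x >> 16; x |= x >> 32;
  return x + 1;
}

int
main (void)
{
  assert (clp2 (2) == 2);   /* already a power of 2: unchanged */
  assert (clp2 (3) == 4);
  assert (clp2 (5) == 8);
  /* With a power-of-2 bucket count n, (h & (n - 1)) == h % n, which is
     exactly what _Mask_range_hashing relies on.  */
  size_t n = clp2 (11);     /* 16 */
  assert ((12345u & (n - 1)) == 12345u % n);
  return 0;
}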

Re: [c++-delayed-folding] First stab at convert_to_integer

2015-10-19 Thread Marek Polacek
On Mon, Oct 19, 2015 at 09:59:03AM -1000, Jason Merrill wrote:
> >Done.  But I can't seem to commit the patch to the c++-delayed-folding
> >branch; is that somehow restricted?  I'm getting:
> >
> >svn: E170001: Commit failed (details follow):
> >svn: E170001: Authorization failed
> >svn: E170001: Your commit message was left in a temporary file:
> >svn: E170001:'/home/marek/svn/c++-delayed-folding/svn-commit.tmp'
> >
> >and I've checked out the branch using
> >svn co svn://mpola...@gcc.gnu.org/svn/gcc/branches/c++-delayed-folding/
> 
> You need to use svn+ssh:// rather than svn:// if you want to be able to
> commit.  From svnwrite.html:
> 
> It is also possible to convert an existing SVN tree to use SSH by using svn
> switch --relocate:
> 
> svn switch --relocate svn://gcc.gnu.org/svn/gcc
> svn+ssh://usern...@gcc.gnu.org/svn/gcc

Oh my, thanks.  Committed now.

Marek


[PATCH] Refactoring sese.h and graphite-poly.h

2015-10-19 Thread Aditya Kumar
Renamed scop->region to scop->scop_info.
Removed conversion constructors for sese_l and dr_info.
Removed macros.

No functional change intended. Passes regtest and bootstrap.

gcc/ChangeLog:

2015-10-19  Aditya Kumar  
* graphite-poly.h (struct dr_info): Removed conversion constructor.
(struct scop): Renamed scop::region to scop::scop_info.
(scop_set_region): Same.
(SCOP_REGION): Removed.
(SCOP_CONTEXT): Removed.
(POLY_SCOP_P): Removed.
* graphite-isl-ast-to-gimple.c (translate_isl_ast_node_user):
Rename scop->region to scop->scop_info.
(add_parameters_to_ivs_params): Same.
(graphite_regenerate_ast_isl): Same.
* graphite-poly.c (new_scop): Same.
(free_scop): Same.
(print_scop_params): Same.
* graphite-scop-detection.c (scop_detection::remove_subscops): Same.
(scop_detection::remove_intersecting_scops): Use pointer to sese_l.
(dot_all_scops_1): Rename scop->region to scop->scop_info.
(scop_detection::nb_pbbs_in_loops): Same.
(find_scop_parameters): Same.
(try_generate_gimple_bb): Same.
(gather_bbs::before_dom_children): Same.
(gather_bbs::after_dom_children): Same.
(build_scops): Same.
* graphite-sese-to-poly.c (build_scop_scattering): Same.
(extract_affine_chrec): Same.
(extract_affine): Same.
(set_scop_parameter_dim): Same.
(build_loop_iteration_domains): Same.
(create_pw_aff_from_tree): Same.
(add_param_constraints): Same.
(build_scop_iteration_domain): Same.
(build_scop_drs): Same.
(analyze_drs_in_stmts): Same.
(insert_out_of_ssa_copy_on_edge): Same.
(rewrite_close_phi_out_of_ssa): Same.
(rewrite_reductions_out_of_ssa): Same.
(handle_scalar_deps_crossing_scop_limits): Same.
(rewrite_cross_bb_scalar_deps): Same.
(rewrite_cross_bb_scalar_deps_out_of_ssa): Same.
(build_poly_scop): Same.
(build_alias_set): Use pointer to dr_info.
* graphite.c (print_graphite_scop_statistics): Same.
(graphite_transform_loops): Same.
* sese.h (struct sese_l): Remove conversion constructor.



---
 gcc/graphite-isl-ast-to-gimple.c |  8 
 gcc/graphite-poly.c  |  8 
 gcc/graphite-poly.h  | 14 ++
 gcc/graphite-scop-detection.c| 34 
 gcc/graphite-sese-to-poly.c  | 42 
 gcc/graphite.c   | 10 +-
 gcc/sese.h   |  3 ---
 7 files changed, 53 insertions(+), 66 deletions(-)

diff --git a/gcc/graphite-isl-ast-to-gimple.c b/gcc/graphite-isl-ast-to-gimple.c
index 2f2e2ba..7f99bce 100644
--- a/gcc/graphite-isl-ast-to-gimple.c
+++ b/gcc/graphite-isl-ast-to-gimple.c
@@ -786,10 +786,10 @@ translate_isl_ast_node_user (__isl_keep isl_ast_node *node,
   iv_map.create (nb_loops);
   iv_map.safe_grow_cleared (nb_loops);
 
-  build_iv_mapping (iv_map, gbb, user_expr, ip, pbb->scop->region->region);
+  build_iv_mapping (iv_map, gbb, user_expr, ip, pbb->scop->scop_info->region);
   isl_ast_expr_free (user_expr);
   next_e = copy_bb_and_scalar_dependences (GBB_BB (gbb),
-  pbb->scop->region, next_e,
+  pbb->scop->scop_info, next_e,
   iv_map,
   &graphite_regenerate_error);
   iv_map.release ();
@@ -909,7 +909,7 @@ print_isl_ast_node (FILE *file, __isl_keep isl_ast_node *node,
 static void
 add_parameters_to_ivs_params (scop_p scop, ivs_params &ip)
 {
-  sese_info_p region = scop->region;
+  sese_info_p region = scop->scop_info;
   unsigned nb_parameters = isl_set_dim (scop->param_context, isl_dim_param);
   gcc_assert (nb_parameters == SESE_PARAMS (region).length ());
   unsigned i;
@@ -1144,7 +1144,7 @@ bool
 graphite_regenerate_ast_isl (scop_p scop)
 {
   loop_p context_loop;
-  sese_info_p region = scop->region;
+  sese_info_p region = scop->scop_info;
   ifsese if_region = NULL;
   isl_ast_node *root_node;
   ivs_params ip;
diff --git a/gcc/graphite-poly.c b/gcc/graphite-poly.c
index 0d1dc63..eb76f05 100644
--- a/gcc/graphite-poly.c
+++ b/gcc/graphite-poly.c
@@ -306,7 +306,7 @@ new_scop (edge entry, edge exit)
   scop->may_waw_no_source = NULL;
   scop_set_region (scop, region);
   scop->pbbs.create (3);
-  POLY_SCOP_P (scop) = false;
+  scop->poly_scop_p = false;
   scop->drs.create (3);
 
   return scop;
@@ -321,7 +321,7 @@ free_scop (scop_p scop)
   poly_bb_p pbb;
 
   remove_gbbs_in_scop (scop);
-  free_sese_info (SCOP_REGION (scop));
+  free_sese_info (scop->scop_info);
 
   FOR_EACH_VEC_ELT (scop->pbbs, i, pbb)
 free_poly_bb (pbb);
@@ -475,13 +475,13 @@ print_pbb (FILE *file, poly_bb_p pbb)
 void
 print_scop_params (FILE *file, scop_p scop)
 {
-  if (SESE_PARAMS (SCOP_REGION (scop)).i

Re: [PATCH] Move cproj simplification to match.pd

2015-10-19 Thread Christophe Lyon
On 19 October 2015 at 15:54, Richard Biener  wrote:
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.
>

Hi Richard,

This patch causes arm and aarch64 builds of newlib to ICE:
In file included from
/tmp/884316_1.tmpdir/aci-gcc-fsf/sources/newlib/newlib/libc/include/stdlib.h:11:0,
 from
/tmp/884316_1.tmpdir/aci-gcc-fsf/sources/newlib/newlib/libc/time/mktm_r.c:13:
/tmp/884316_1.tmpdir/aci-gcc-fsf/sources/newlib/newlib/libc/time/mktm_r.c:
In function '_mktm_r':
/tmp/884316_1.tmpdir/aci-gcc-fsf/sources/newlib/newlib/libc/time/mktm_r.c:28:9:
internal compiler error: Segmentation fault
 _DEFUN (_mktm_r, (tim_p, res, is_gmtime),
0xa90205 crash_signal
/tmp/884316_1.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/toplev.c:353
0x7b3b0c tree_class_check
/tmp/884316_1.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/tree.h:3055
0x7b3b0c tree_single_nonnegative_warnv_p(tree_node*, bool*, int)

/tmp/884316_1.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/fold-const.c:13025
0x814053 gimple_phi_nonnegative_warnv_p

/tmp/884316_1.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/gimple-fold.c:6239
0x814053 gimple_stmt_nonnegative_warnv_p(gimple*, bool*, int)

/tmp/884316_1.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/gimple-fold.c:6264
0x7b5c94 tree_expr_nonnegative_p(tree_node*)

/tmp/884316_1.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/fold-const.c:13325
0xe2f657 gimple_simplify_108

/tmp/884316_1.tmpdir/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-arm-none-eabi/gcc1/gcc/gimple-match.c:5116
0xe3060d gimple_simplify_TRUNC_MOD_EXPR

/tmp/884316_1.tmpdir/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-arm-none-eabi/gcc1/gcc/gimple-match.c:24762
0xe0809b gimple_simplify

/tmp/884316_1.tmpdir/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-arm-none-eabi/gcc1/gcc/gimple-match.c:34389
0xe08c2b gimple_resimplify2(gimple**, code_helper*, tree_node*,
tree_node**, tree_node* (*)(tree_node*))

/tmp/884316_1.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/gimple-match-head.c:193
0xe17600 gimple_simplify(gimple*, code_helper*, tree_node**, gimple**,
tree_node* (*)(tree_node*), tree_node* (*)(tree_node*))

/tmp/884316_1.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/gimple-match-head.c:762
0x81c694 fold_stmt_1

/tmp/884316_1.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/gimple-fold.c:3605
0xad0f6c replace_uses_by(tree_node*, tree_node*)

/tmp/884316_1.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/tree-cfg.c:1835
0xad1a2f gimple_merge_blocks

/tmp/884316_1.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/tree-cfg.c:1921
0x67d325 merge_blocks(basic_block_def*, basic_block_def*)

/tmp/884316_1.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/cfghooks.c:776
0xae06da cleanup_tree_cfg_bb

/tmp/884316_1.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/tree-cfgcleanup.c:654
0xae1118 cleanup_tree_cfg_1

/tmp/884316_1.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/tree-cfgcleanup.c:686
0xae1118 cleanup_tree_cfg_noloop

/tmp/884316_1.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/tree-cfgcleanup.c:738
0xae1118 cleanup_tree_cfg()

/tmp/884316_1.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/tree-cfgcleanup.c:793
0x9c5c94 execute_function_todo

/tmp/884316_1.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/passes.c:1920
Please submit a full bug report,

This happens for instance with GCC configured
--target arm-none-eabi
--with-cpu cortex-a9

You can download logs of a failed build from
http://people.linaro.org/~christophe.lyon/cross-validation/gcc-build/trunk/228970/build.html

Sorry, I'm out of office for one week, I can't produce further details.

Christophe


> Richard.
>
> 2015-10-19  Richard Biener  
>
> * gimple-fold.c (gimple_phi_nonnegative_warnv_p): New function.
> (gimple_stmt_nonnegative_warnv_p): Use it.
> * match.pd (CPROJ): New operator list.
> (cproj (complex ...)): Move simplifications from ...
> * builtins.c (fold_builtin_cproj): ... here.
>
> * gcc.dg/torture/builtin-cproj-1.c: Skip for -O0.
>
> Index: gcc/gimple-fold.c
> ===
> --- gcc/gimple-fold.c   (revision 228877)
> +++ gcc/gimple-fold.c   (working copy)
> @@ -6224,6 +6224,24 @@ gimple_call_nonnegative_warnv_p (gimple
> strict_overflow_p, depth);
>  }
>
> +/* Return true if return value of call STMT is known to be non-negative.
> +   If the return value is based on the assumption that signed overflow is
> +   undefined, set *STRICT_OVERFLOW_P to true; otherwise, don't change
> +   *STRICT_OVERFLOW_P.  DEPTH is the current nesting depth of the query.  */
> +
> +static bool
> +gimple_phi_nonnegative_warnv_p (gimple *stmt, bool *strict_overflow_p,
> +   int depth)
> +{
> +  for (unsigned i = 0; i < gimple_phi_num_args (stmt); ++i)
> +{
> +  tree ar

Re: Add VIEW_CONVERT_EXPR to operand_equal_p

2015-10-19 Thread Jan Hubicka
Richard,
I missed your reply earlier today.
> > Therefore I would say that TYPE_CANONICAL determines mode modulo the fact
> > that an incomplete variant of a complete type will have VOIDmode instead
> > of the complete type's mode (during non-LTO).  That is why I allow mode
> > changes for casts from complete to incomplete.
> 
> Incomplete have VOIDmode, right?

Yes
> 
> > In longer run I think that every query to useless_type_conversion_p that
> > contains incomplete types is a confused query.  useless_type_conversion_p is
> > about operations on the value and there are no operations for incomplete 
> > type
> > (and function types).  I know that ipa-icf-gimple and the following code in
> > gimplify-stmt checks this frequently:
> >   /* The FEs may end up building ADDR_EXPRs early on a decl with
> >  an incomplete type.  Re-build ADDR_EXPRs in canonical form
> >  here.  */
> >   if (!types_compatible_p (TREE_TYPE (op0), TREE_TYPE (TREE_TYPE 
> > (expr
> > *expr_p = build_fold_addr_expr (op0);
> > Taking address of incomplete type or functions, naturally, makes sense.  We 
> > may
> > want to check something else here, like simply
> >TREE_TYPE (op0) != TREE_TYPE (TREE_TYPE (expr))
> > and once ipa-icf is cleaned up start sanity checking in
> > useless_type_conversion
> > that we use it to force equality only on types that do have values.
> >
> > We also can trip it when checking TYPE_METHOD_BASETYPE which may be 
> > incomplete.
> > This is in the code checking useless_type_conversion on functions that I 
> > think
> > are confused queries anyway - we need the ABI matcher, I am looking into 
> > that.
> 
> Ok, so given we seem to be fine in practive with TYPE_MODE (type) ==
> TYPE_MODE (TYPE_CANONICAL (type))

With the exception of incomplete variants of a type. Then TYPE_CANONICAL may
be complete and !VOIDmode.
But sure, I believe we ought to chase away the calls to useless_type_conversion
when one of the types is incomplete.
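
The incomplete-type case under discussion is the ordinary forward
declaration; a minimal illustration:

struct S;            /* incomplete: TYPE_MODE is VOIDmode (non-LTO) */
extern struct S s;

void *
addr_of_s (void)
{
  /* Legal C: the FE builds an ADDR_EXPR of a decl with incomplete type.
     There are no value operations on struct S itself, so a
     useless_type_conversion_p query involving it is one of the
     "confused" queries mentioned above.  */
  return &s;
}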
> (whether that's a bug or not ...) I'm fine with re-instantiating the
> mode check for
> aggregate types.  Please do that with
> 
> Index: gcc/gimple-expr.c
> ===
> --- gcc/gimple-expr.c   (revision 228963)
> +++ gcc/gimple-expr.c   (working copy)
> @@ -89,8 +89,7 @@ useless_type_conversion_p (tree outer_ty
> 
>/* Changes in machine mode are never useless conversions unless we
>   deal with aggregate types in which case we defer to later checks.  */
> -  if (TYPE_MODE (inner_type) != TYPE_MODE (outer_type)
> -  && !AGGREGATE_TYPE_P (inner_type))
> +  if (TYPE_MODE (inner_type) != TYPE_MODE (outer_type))
>  return false;

OK, that is the variant of the patch I had at the beginning.  I will test it.
> 
>/* If both the inner and outer types are integral types, then the
> 
> Can we assess equal sizes when modes are non-BLKmode then?  Thus
> 
> @@ -270,10 +269,9 @@ useless_type_conversion_p (tree outer_ty
>   use the types in move operations.  */
>else if (AGGREGATE_TYPE_P (inner_type)
>&& TREE_CODE (inner_type) == TREE_CODE (outer_type))
> -return (!TYPE_SIZE (outer_type)
> -   || (TYPE_SIZE (inner_type)
> -   && operand_equal_p (TYPE_SIZE (inner_type),
> -   TYPE_SIZE (outer_type), 0)));
> +return (TYPE_MODE (outer_type) != BLKmode
> +   || operand_equal_p (TYPE_SIZE (inner_type),
> +   TYPE_SIZE (outer_type), 0));
> 
>else if (TREE_CODE (inner_type) == OFFSET_TYPE
>&& TREE_CODE (outer_type) == OFFSET_TYPE)
> 
> ?  Hoping for VOIDmode incomplete case.
Don't see why this would be a problem either.  I am going to start the testing 
of this variant.

Honza
> 
> Richard.
> 
> > Honza
> >>
> >> Richard.
> >>
> >>
> >> >Honza
> >> >>
> >> >> --
> >> >> Eric Botcazou
> >>


Re: [patch] header file re-ordering.

2015-10-19 Thread Jeff Law

On 10/14/2015 08:05 AM, Andrew MacLeod wrote:

On 10/12/2015 04:04 AM, Jeff Law wrote:

Oh, you must be looking at the original combined patch?

Possibly :-)



fold-const.h is indirectly included by cp-tree.h, which gets it from
including c-common.h.   Some of the output from show-headers on
objc-act.c follows (indentation represents levels of including.  The
number in parentheses is the number of times that include has been seen
so far in the file's include list.   As you can see, we include
ansidecl.h a lot :-)  Most of the time there isn't much we can do about
those sorts of things.):

cp-tree.h
 tm.h  (2)
 hard-reg-set.h
 function.h  (1)
 c-common.h
   splay-tree.h
 ansidecl.h  (4)
   cpplib.h
 symtab.h  (2)
 line-map.h  (2)
   alias.h
   tree.h  (2)
   fold-const.h
   diagnostic-core.h  (1)
 bversion.h

I guess it could be a useful addition to show-headers to specify a
header file you are looking for and show you where it comes from if it's
included...
Yea.  Though I think it's probably easy enough to get it from the 
current output.




In any case, there is some indirection here because none of the front end
files were flattened that much.
And I think that's probably some source of the confusion on my part.  I 
thought we'd flattened the front-end .h files too.  So I didn't look 
deeply into the .h files to see if they were doing something undesirable 
behind my back.




incidentally, you may notice this is the second time tree.h is
included.  The first occurrence of tree.h is included directly by
objc-act.c, but it needs to be left because something between that and
cp-tree.h needs tree.h to compile.This sort of thing is resolved by
using the re-order tool, but I did not run that tool on most of the objc
and objcp files as they have some complex conditionals in their include
list:
#include "tree.h"
#include "stringpool.h"
#include "stor-layout.h"
#include "attribs.h"

#ifdef OBJCPLUS
#include "cp/cp-tree.h"
#else
#include "c/c-tree.h"
#include "c/c-lang.h"
#endif

#include "c-family/c-objc.h"
#include "langhooks.h"

It's beyond the scope of the reorder tool to deal with re-positioning
this automatically... and happens so rarely I didn't even look into it.
So they are not optimal as far as ordering goes.

Understood.  This unholy sharing had me concerned as well.


So you need not worry about that.  It builds fine.
OK.  I think the major source of confusion was the lack of flattening 
for the front-ends.  I'll go back to it with that in mind and probably 
start using the tools when I get a WTF moment.







I'm slightly concerned about the darwin, windows and solaris bits. The
former primarily because Darwin has been a general source of pain, and
in the others because I'm not sure the cross testing will exercise
that code terribly much.


It's easy enough to NOT do this for any of those files if we're too
worried about them.   It's also easy to revert a single file if it
appears to be an issue. That's why I wanted to run as many of these
on the compile farm natively as I could... but alas, PowerPC was the
only thing the farm really offered me.



I'll go ahead and approve all the config/ bits.  Please be on the
lookout for any fallout.


even darwin, windows and solaris? :-)
Yup.  The changes are straightforward enough that if there's fallout (and 
to some degree I expect minor fallout from native builds) it can be 
easily fixed.


Jeff

