Re: [PATCH 2/7] Libsanitizer merge from upstream r249633.

2015-10-14 Thread Jakub Jelinek
On Tue, Oct 13, 2015 at 02:16:23PM +0300, Maxim Ostapenko wrote:
> This patch introduces required compiler changes. Now, we don't version
> asan_init, we have a special __asan_version_mismatch_check_v[n] symbol for
> this.

For this, I just have to wonder what is the actual improvement over what we
had.  To me it looks like a step in the wrong direction, it will only bloat
the size of the ctors.  I can live with it, but just want to put on record I
think it is a mistake.

> Also, asan_stack_malloc_[n] doesn't take a local stack as a second parameter
> anymore, so don't pass it.

I think this is another mistake, but this time with an actually bad fix on the
compiler side for it.  If I read the code correctly, previously
__asan_stack_malloc_n would return you the local stack if it failed for
whatever reason, which is actually what you want as fallback.
But, the new code returns NULL instead, so I think you would need to compare
the return value of __asan_stack_malloc_n with NULL and if it is NULL, use
the addr instead of what it returned; which is not what your asan.c change
does.  Now, what is the improvement here?  Bloat the compiler generated
code... :(
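
A minimal sketch of that fallback (assuming the new entry point is roughly
void *__asan_stack_malloc_N (size_t) and returns NULL on failure; the helper
below is illustrative, not the actual asan.c expansion):

  #include <stddef.h>

  extern void *__asan_stack_malloc_1 (size_t size);  /* assumed new ABI */

  static void *
  pick_frame_base (void *on_stack_buffer, size_t frame_size)
  {
    void *fake = __asan_stack_malloc_1 (frame_size);
    /* Use the real on-stack buffer whenever the runtime returns NULL.  */
    return fake != NULL ? fake : on_stack_buffer;
  }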

> 2015-10-12  Maxim Ostapenko  
> 
> config/
> 
>   * bootstrap-asan.mk: Replace ASAN_OPTIONS=detect_leaks with
>   LSAN_OPTIONS=detect_leaks

Missing . at the end, and the config/ hunk missing from the patch.

> gcc/
> 
>   * asan.c (asan_emit_stack_protection): Don't pass local stack to
>   asan_stack_malloc_[n] anymore.
>   (asan_finish_file): Instert __asan_version_mismatch_check_v[n] call.

s/Instert/Insert/

Jakub


Re: [PATCH 3/7] Libsanitizer merge from upstream r249633.

2015-10-14 Thread Jakub Jelinek
On Tue, Oct 13, 2015 at 02:17:46PM +0300, Maxim Ostapenko wrote:
> This is just a reapplied patch for SPARC by David S. Miller. I was unable to
> test this, so could anyone help me here?

This is ok if all the other changes are approved.  You don't need to list
my name in there, just list David's.  We don't want 20 copies of Reapply in
a few years.

> 2015-10-12  Maxim Ostapenko  
> 
>   PR sanitizer/63958
>   Reapply:
>   2015-03-09  Jakub Jelinek  
> 
>   PR sanitizer/63958
>   Reapply:
>   2014-10-14  David S. Miller  
> 
>   * sanitizer_common/sanitizer_platform_limits_linux.cc (time_t):
>   Define at __kernel_time_t, as needed for sparc.
>   (struct __old_kernel_stat): Don't check if __sparc__ is defined.
>   * libsanitizer/sanitizer_common/sanitizer_platform_limits_posix.h
>   (__sanitizer): Define struct___old_kernel_stat_sz,
>   struct_kernel_stat_sz, and struct_kernel_stat64_sz for sparc.
>   (__sanitizer_ipc_perm): Adjust for sparc targets.
>   (__sanitizer_shmid_ds): Likewise.
>   (__sanitizer_sigaction): Likewise.
>   (IOC_SIZE): Likewise.

Jakub


Re: [PATCH 4/7] Libsanitizer merge from upstream r249633.

2015-10-14 Thread Jakub Jelinek
On Tue, Oct 13, 2015 at 02:18:41PM +0300, Maxim Ostapenko wrote:
> This is Jakub's reapplied patch for disabling ODR violation detection.
> More details can be found here
> (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63888).

This is ok when all the other changes are acked.

> 2015-10-12  Maxim Ostapenko  
> 
>   PR bootstrap/63888
>   Reapply:
>   2015-02-20  Jakub Jelinek  
> 
>   * asan/asan_globals.cc (RegisterGlobal): Disable detect_odr_violation
>   support until it is rewritten upstream.
> 
>   * c-c++-common/asan/pr63888.c: New test.
> 
> Index: libsanitizer/asan/asan_globals.cc
> ===
> --- libsanitizer/asan/asan_globals.cc (revision 250059)
> +++ libsanitizer/asan/asan_globals.cc (working copy)
> @@ -146,7 +146,9 @@
>CHECK(AddrIsInMem(g->beg));
>CHECK(AddrIsAlignedByGranularity(g->beg));
>CHECK(AddrIsAlignedByGranularity(g->size_with_redzone));
> -  if (flags()->detect_odr_violation) {
> +  // This "ODR violation" detection is fundamentally incompatible with
> +  // how GCC registers globals.  Disable as useless until rewritten upstream.
> +  if (0 && flags()->detect_odr_violation) {
>  // Try detecting ODR (One Definition Rule) violation, i.e. the situation
>  // where two globals with the same name are defined in different modules.
>  if (__asan_region_is_poisoned(g->beg, g->size_with_redzone)) {


Jakub


Re: Merge from gomp-4_1-branch to trunk

2015-10-14 Thread Sebastian Huber

Hello,

I now get the following error:

libtool: compile: /scratch/git-build/b-gcc-git-arm-rtems4.12/./gcc/xgcc 
-B/scratch/git-build/b-gcc-git-arm-rtems4.12/./gcc/ -nostdinc 
-B/scratch/git-build/b-gcc-git-arm-rtems4.12/arm-rtems4.12/newlib/ 
-isystem 
/scratch/git-build/b-gcc-git-arm-rtems4.12/arm-rtems4.12/newlib/targ-include 
-isystem /home/EB/sebastian_h/archive/gcc-git/newlib/libc/include 
-B/opt/rtems-4.12/arm-rtems4.12/bin/ 
-B/opt/rtems-4.12/arm-rtems4.12/lib/ -isystem 
/opt/rtems-4.12/arm-rtems4.12/include -isystem 
/opt/rtems-4.12/arm-rtems4.12/sys-include -DHAVE_CONFIG_H -I. 
-I/home/EB/sebastian_h/archive/gcc-git/libgomp 
-I/home/EB/sebastian_h/archive/gcc-git/libgomp/config/rtems 
-I/home/EB/sebastian_h/archive/gcc-git/libgomp/config/posix 
-I/home/EB/sebastian_h/archive/gcc-git/libgomp 
-I/home/EB/sebastian_h/archive/gcc-git/libgomp/../include -Wall -Werror 
-g -O2 -MT fortran.lo -MD -MP -MF .deps/fortran.Tpo -c 
/home/EB/sebastian_h/archive/gcc-git/libgomp/fortran.c -o fortran.o
/home/EB/sebastian_h/archive/gcc-git/libgomp/fortran.c: In function 
'omp_get_place_proc_ids_':
/home/EB/sebastian_h/archive/gcc-git/libgomp/fortran.c:484:39: error: 
passing argument 2 of 'omp_get_place_proc_ids' from incompatible pointer 
type [-Werror=incompatible-pointer-types]

   omp_get_place_proc_ids (*place_num, ids);
   ^
In file included from 
/home/EB/sebastian_h/archive/gcc-git/libgomp/fortran.c:28:0:
/home/EB/sebastian_h/archive/gcc-git/libgomp/fortran.c:73:18: note: 
expected 'int *' but argument is of type 'int32_t * {aka long int *}'

 ialias_redirect (omp_get_place_proc_ids)
  ^
/home/EB/sebastian_h/archive/gcc-git/libgomp/libgomp.h:1011:24: note: in 
definition of macro 'ialias_redirect'
   extern __typeof (fn) fn __asm__ (ialias_ulp "gomp_ialias_" #fn) 
attribute_hidden;

^
/home/EB/sebastian_h/archive/gcc-git/libgomp/fortran.c: In function 
'omp_get_partition_place_nums_':
/home/EB/sebastian_h/archive/gcc-git/libgomp/fortran.c:508:33: error: 
passing argument 1 of 'omp_get_partition_place_nums' from incompatible 
pointer type [-Werror=incompatible-pointer-types]

   omp_get_partition_place_nums (place_nums);
 ^
In file included from 
/home/EB/sebastian_h/archive/gcc-git/libgomp/fortran.c:28:0:
/home/EB/sebastian_h/archive/gcc-git/libgomp/fortran.c:76:18: note: 
expected 'int *' but argument is of type 'int32_t * {aka long int *}'

 ialias_redirect (omp_get_partition_place_nums)
  ^
/home/EB/sebastian_h/archive/gcc-git/libgomp/libgomp.h:1011:24: note: in 
definition of macro 'ialias_redirect'
   extern __typeof (fn) fn __asm__ (ialias_ulp "gomp_ialias_" #fn) 
attribute_hidden;


We have for example (libgomp/omp_lib.f90.in):

  subroutine omp_get_place_proc_ids (place_num, ids)
integer (4), intent(in) :: place_num
integer (4), intent(out) :: ids(*)
  end subroutine omp_get_place_proc_ids

So this interface is different to (libgomp/omp.h.in):

extern void omp_get_place_proc_ids (int, int *) __GOMP_NOTHROW;

The following patch fixes the problem, but I am not sure if this is 
really the best way to address this issue:


diff --git a/libgomp/fortran.c b/libgomp/fortran.c
index ceff9ac..44aaf92 100644
--- a/libgomp/fortran.c
+++ b/libgomp/fortran.c
@@ -481,7 +481,9 @@ omp_get_place_num_procs_8_ (const int64_t *place_num)
 void
 omp_get_place_proc_ids_ (const int32_t *place_num, int32_t *ids)
 {
-  omp_get_place_proc_ids (*place_num, ids);
+  int int_ids;
+  omp_get_place_proc_ids (*place_num, &int_ids);
+  *ids = int_ids;
 }

 void
@@ -505,7 +507,9 @@ omp_get_partition_num_places_ (void)
 void
 omp_get_partition_place_nums_ (int32_t *place_nums)
 {
-  omp_get_partition_place_nums (place_nums);
+  int int_place_nums;
+  omp_get_partition_place_nums (&int_place_nums);
+  *place_nums = int_place_nums;
 }

 void

--
Sebastian Huber, embedded brains GmbH

Address : Dornierstr. 4, D-82178 Puchheim, Germany
Phone   : +49 89 189 47 41-16
Fax : +49 89 189 47 41-09
E-Mail  : sebastian.hu...@embedded-brains.de
PGP : Public key available on request.

Diese Nachricht ist keine geschäftliche Mitteilung im Sinne des EHUG.



Re: [PATCH 5/7] Libsanitizer merge from upstream r249633.

2015-10-14 Thread Jakub Jelinek
On Tue, Oct 13, 2015 at 02:20:06PM +0300, Maxim Ostapenko wrote:
> This patch removes UBSan stubs from ASan and TSan code. We don't embed UBSan
> into ASan and TSan because that would lead to undefined references to C++
> stuff when linking with -static-libasan. AFAIK, sanitizer developers use
> different libraries for C and CXX runtimes, but I think this is out of scope
> of this merge.

Where is CAN_SANITIZE_UB defined?  I don't see it anywhere in the current
libsanitizer and in the patch only:
grep CAN_SANITIZE_UB libsanitizer-249633-2.diff 
+#if CAN_SANITIZE_UB
+# define TSAN_CONTAINS_UBSAN (CAN_SANITIZE_UB && !defined(SANITIZER_GO))
+#if CAN_SANITIZE_UB
+#endif  // CAN_SANITIZE_UB
+#if CAN_SANITIZE_UB
+#endif  // CAN_SANITIZE_UB
+#if CAN_SANITIZE_UB
+#endif  // CAN_SANITIZE_UB
+#if CAN_SANITIZE_UB
+#endif  // CAN_SANITIZE_UB
+#if CAN_SANITIZE_UB
+#endif  // CAN_SANITIZE_UB
+#if CAN_SANITIZE_UB
+#endif  // CAN_SANITIZE_UB
+#if CAN_SANITIZE_UB
+#endif  // CAN_SANITIZE_UB

So, unless I'm missing something, it would be best to arrange for
-DCAN_SANITIZE_UB=1 to be in CXXFLAGS for ubsan/ source files and
-DCAN_SANITIZE_UB=0 to be in CXXFLAGS for {a,t}san/ source files?
Are there any other defines that are supposedly set from cmake or wherever
upstream and are left undefined?
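
Roughly, the intent would be something like this (a sketch only; the defensive
default is an extra safety net, not part of the proposed patch):

  /* -DCAN_SANITIZE_UB=1 in CXXFLAGS for ubsan/ objects,
     -DCAN_SANITIZE_UB=0 in CXXFLAGS for asan/ and tsan/ objects.  */
  #ifndef CAN_SANITIZE_UB
  # define CAN_SANITIZE_UB 0   /* fall back safely if left undefined */
  #endif

  #if CAN_SANITIZE_UB
  /* UBSan-dependent code is then compiled only into libubsan objects.  */
  #endif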

> 2015-10-13  Maxim Ostapenko  
> 
>   * tsan/tsan_defs.h: Define TSAN_CONTAINS_UBSAN to 0.
>   * asan/asan_flags.cc (InitializeFlags): Do not initialize UBSan flags.
>   * asan/asan_rtl.cc (AsanInitInternal): Do not init UBSan.

Jakub


Re: [PATCH 6/7] Libsanitizer merge from upstream r249633.

2015-10-14 Thread Jakub Jelinek
On Tue, Oct 13, 2015 at 02:21:21PM +0300, Maxim Ostapenko wrote:
> This patch adjusts the fix for
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61771 to extract the last PC
> from the stack frame if no valid FP is available for ARM.

I guess this is ok once all other changes are acked.

> 2015-10-13  Maxim Ostapenko  
> 
>   * sanitizer_common/sanitizer_stacktrace.cc (GetCanonicFrame): Assume we
>   compiled code with GCC when extracting the caller PC for ARM if no
>   valid frame pointer is available.
> 
> Index: libsanitizer/sanitizer_common/sanitizer_stacktrace.cc
> ===
> --- libsanitizer/sanitizer_common/sanitizer_stacktrace.cc (revision 
> 250059)
> +++ libsanitizer/sanitizer_common/sanitizer_stacktrace.cc (working copy)
> @@ -62,8 +62,8 @@
>// Nope, this does not look right either. This means the frame after next 
> does
>// not have a valid frame pointer, but we can still extract the caller PC.
>// Unfortunately, there is no way to decide between GCC and LLVM frame
> -  // layouts. Assume LLVM.
> -  return bp_prev;
> +  // layouts. Assume GCC.
> +  return bp_prev - 1;
>  #else
>return (uhwptr*)bp;
>  #endif


Jakub


Re: [PATCH 7/7] Libsanitizer merge from upstream r249633.

2015-10-14 Thread Jakub Jelinek
On Tue, Oct 13, 2015 at 02:22:36PM +0300, Maxim Ostapenko wrote:
> This is the final patch. Force libsanitizer to use an old ABI for ubsan
> float cast data descriptors, because for some exprs (e.g. those of class
> tcc_declaration) we can't get the right location for now. I'm not sure about
> this, perhaps it should be fixed in GCC somehow.

I don't like this (nor the heuristics on the libubsan side; it wouldn't be a
big deal to add a new library entrypoint).
If because of the heuristics you need to ensure that the SourceLocation is
always known, then either you check in ubsan.c whether expand_location
gives you NULL xloc.file and in that case use old style float cast overflow
(without location) - i.e. pass 0, NULL, otherwise you use new style, i.e.
pass 1, &loc.  Or arrange through some special option to emit something like
{ "", 0, 0 } instead of { NULL, 0, 0 } for the float cast case.
And, regardless of this, any progress in making sure we have fewer cases
with UNKNOWN_LOCATION on this will not hurt.  I think at this point I'd
prefer the first choice, i.e. using old style for locations without
filename, and new style otherwise.
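
A minimal sketch of that first choice (the helper name is hypothetical, not
the actual ubsan.c interface):

  /* Decide between the old-style (location-less) and the new-style float
     cast overflow descriptor, based on whether expand_location produced a
     usable filename.  */
  static int
  use_new_style_descriptor (const char *xloc_file)
  {
    /* UNKNOWN_LOCATION typically expands to a NULL (or empty) filename.  */
    return xloc_file != NULL && xloc_file[0] != '\0';
  }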

> 2015-10-13  Maxim Ostapenko  
> 
>   * ubsan/ubsan_handlers.cc (looksLikeFloatCastOverflowDataV1): Always
>   return true for now.
> 
> Index: libsanitizer/ubsan/ubsan_handlers.cc
> ===
> --- libsanitizer/ubsan/ubsan_handlers.cc  (revision 250059)
> +++ libsanitizer/ubsan/ubsan_handlers.cc  (working copy)
> @@ -307,6 +307,9 @@
>  }
>  
>  static bool looksLikeFloatCastOverflowDataV1(void *Data) {
> +  // (TODO): propagate SourceLocation into DataDescriptor and use this
> +  // heuristic than.
> +  return true;
>// First field is either a pointer to filename or a pointer to a
>// TypeDescriptor.
>u8 *FilenameOrTypeDescriptor;


Jakub


Re: [PATCH 2/7] Libsanitizer merge from upstream r249633.

2015-10-14 Thread Yury Gribov

On 10/13/2015 02:16 PM, Maxim Ostapenko wrote:

This patch introduces required compiler changes. Now, we don't version
asan_init, we have a special __asan_version_mismatch_check_v[n] symbol
for this.

Also, asan_stack_malloc_[n] doesn't take a local stack as a second
parameter anymore, so don't pass it.


Did you compare libasan.so and libclang_rt-asan.so for other ABI 
incompatibilities e.g. via libabigail?


-Y


Re: [PATCH 1/7] Libsanitizer merge from upstream r249633.

2015-10-14 Thread Jakub Jelinek
On Tue, Oct 13, 2015 at 07:54:33PM +0300, Maxim Ostapenko wrote:
> On 13/10/15 14:15, Maxim Ostapenko wrote:
> >This is the raw merge itself. I'm bumping SONAME to libasan.so.3.
> >
> >-Maxim
> 
> I have just noticed that I've misused autoconf stuff (used the wrong version).
> Here is a fixed version of the same patch. Sorry for the inconvenience.

Is libubsan, libtsan backwards compatible, or do we want to change SONAME
there too?

The aarch64 changes are terrible, not just because it doesn't yet have
runtime decision on what VA to use or that it doesn't support 48-bit VA,
but also that for the 42-bit VA it uses a different shadow offset from
39-bit VA.  But on the compiler side we have just one...
Though, only the 39-bit VA is enabled right now by default, so out of the
box the state is as bad as we had in 5.x - users wanting 42-bit VA or 48-bit
VA have to patch libsanitizer.

Have you verified libbacktrace sanitization still works properly (that is
something upstream does not test)?

Do you plan to update the asan tests we have to reflect the changes in
upstream?

Jakub


Re: Merge from gomp-4_1-branch to trunk

2015-10-14 Thread Jakub Jelinek
On Wed, Oct 14, 2015 at 09:34:48AM +0200, Sebastian Huber wrote:
> /home/EB/sebastian_h/archive/gcc-git/libgomp/fortran.c:28:0:
> /home/EB/sebastian_h/archive/gcc-git/libgomp/fortran.c:73:18: note: expected
> 'int *' but argument is of type 'int32_t * {aka long int *}'

Ugh, wasn't aware that some targets use long int for int32_t :(.

> The following patch fixes the problem, but I am not sure if this is really
> the best way to address this issue:
> 
> diff --git a/libgomp/fortran.c b/libgomp/fortran.c
> index ceff9ac..44aaf92 100644
> --- a/libgomp/fortran.c
> +++ b/libgomp/fortran.c
> @@ -481,7 +481,9 @@ omp_get_place_num_procs_8_ (const int64_t *place_num)
>  void
>  omp_get_place_proc_ids_ (const int32_t *place_num, int32_t *ids)
>  {
> -  omp_get_place_proc_ids (*place_num, ids);
> +  int int_ids;
> +  omp_get_place_proc_ids (*place_num, &int_ids);
> +  *ids = int_ids;
>  }
> 
>  void
> @@ -505,7 +507,9 @@ omp_get_partition_num_places_ (void)
>  void
>  omp_get_partition_place_nums_ (int32_t *place_nums)
>  {
> -  omp_get_partition_place_nums (place_nums);
> +  int int_place_nums;
> +  omp_get_partition_place_nums (&int_place_nums);
> +  *place_nums = int_place_nums;
>  }

No, both the above changes are wrong.  There is not a single int32_t
written, but could be many more, it is an array of 32-bit integers.
I'd say you just want to cast explicitly,
  omp_get_place_proc_ids (*place_num, (int *) ids);
and
  omp_get_partition_place_nums ((int *) place_nums);
The reason for int32_t is that on the Fortran side it is integer(kind=4)
and everywhere else for that int32_t is used.
If this works, the patch is preapproved.

As for aliasing, it will always be int stores vs. integer(kind=4) reads,
int32_t is just a type used in the wrappers.
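
In other words, the wrappers would end up looking roughly like this (a sketch
of the suggested cast, not the verbatim committed diff):

  #include <stdint.h>
  #include <omp.h>

  void
  omp_get_place_proc_ids_ (const int32_t *place_num, int32_t *ids)
  {
    /* integer(kind=4) on the Fortran side always matches a 32-bit int, so
       casting the buffer pointer is fine even where int32_t is long int.  */
    omp_get_place_proc_ids (*place_num, (int *) ids);
  }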

Jakub


Re: Merge from gomp-4_1-branch to trunk

2015-10-14 Thread Sebastian Huber



On 14/10/15 10:04, Jakub Jelinek wrote:

On Wed, Oct 14, 2015 at 09:34:48AM +0200, Sebastian Huber wrote:

>/home/EB/sebastian_h/archive/gcc-git/libgomp/fortran.c:28:0:
>/home/EB/sebastian_h/archive/gcc-git/libgomp/fortran.c:73:18: note: expected
>'int *' but argument is of type 'int32_t * {aka long int *}'

Ugh, wasn't aware that some targets use long int for int32_t :(.



This is actually a feature of the newlib-stdint.h:

[...]
/* newlib uses 32-bit long in certain cases for all non-SPU
   targets.  */
#ifndef STDINT_LONG32
#define STDINT_LONG32 (LONG_TYPE_SIZE == 32)
#endif

#define SIG_ATOMIC_TYPE "int"

/* The newlib logic actually checks for sizes greater than 32 rather
   than equal to 64 for various 64-bit types.  */

#define INT8_TYPE (CHAR_TYPE_SIZE == 8 ? "signed char" : 0)
#define INT16_TYPE (SHORT_TYPE_SIZE == 16 ? "short int" : INT_TYPE_SIZE 
== 16 ? "int" : CHAR_TYPE_SIZE == 16 ? "signed char" : 0)
#define INT32_TYPE (STDINT_LONG32 ? "long int" : INT_TYPE_SIZE == 32 ? 
"int" : SHORT_TYPE_SIZE == 32 ? "short int" : CHAR_TYPE_SIZE == 32 ? 
"signed char" : 0)

[...]

This regularly causes problems like this. In addition it leads to a 
complicated PRI* macro definition in <inttypes.h>.


--
Sebastian Huber, embedded brains GmbH

Address : Dornierstr. 4, D-82178 Puchheim, Germany
Phone   : +49 89 189 47 41-16
Fax : +49 89 189 47 41-09
E-Mail  : sebastian.hu...@embedded-brains.de
PGP : Public key available on request.

Diese Nachricht ist keine geschäftliche Mitteilung im Sinne des EHUG.



Re: using scratchpads to enhance RTL-level if-conversion: revised patch

2015-10-14 Thread Eric Botcazou
> If you're using one of the switches that checks for stack overflow at the
> start of the function, you certainly don't want to do any such stores.

There is a protection area for -fstack-check (STACK_CHECK_PROTECT bytes) so 
you can do stores just below the stack pointer as far as it's concerned.

There is indeed the issue of the mere writing below the stack pointer.  Our 
experience with various OSes and architectures shows that this almost always 
works.  The only problematic case is x86{-64}/Linux historically, where you 
cannot write below the page pointed to by the stack pointer (that's why there 
is a specific implementation of -fstack-check for x86{-64}/Linux).

-- 
Eric Botcazou


[PATCH] More vectorizer TLC

2015-10-14 Thread Richard Biener

Bootstrapped / tested on x86_64-unknown-linux-gnu, applied.

Richard.

2015-10-14  Richard Biener  

* tree-vect-data-refs.c (vect_enhance_data_refs_alignment):
Reset info at start.
(vect_analyze_group_access_1): Add debug print.
* tree-vect-loop.c (vect_get_single_scalar_iteration_cost): Rename ...
(vect_compute_single_scalar_iteration_cost): ... to this.
(vect_analyze_loop_2): Adjust.
* tree-vect-slp.c (struct _slp_oprnd_info): Move from ...
* tree-vectorizer.h: ... here.
(add_stmt_info_to_vec): Remove.
* tree-vect-stmts.c (record_stmt_cost): Inline add_stmt_info_to_vec.

Index: gcc/tree-vect-data-refs.c
===
*** gcc/tree-vect-data-refs.c   (revision 228759)
--- gcc/tree-vect-data-refs.c   (working copy)
*** vect_enhance_data_refs_alignment (loop_v
*** 1352,1357 
--- 1352,1361 
  dump_printf_loc (MSG_NOTE, vect_location,
   "=== vect_enhance_data_refs_alignment ===\n");
  
+   /* Reset data so we can safely be called multiple times.  */
+   LOOP_VINFO_MAY_MISALIGN_STMTS (loop_vinfo).truncate (0);
+   LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo) = 0;
+ 
/* While cost model enhancements are expected in the future, the high level
   view of the code at this time is as follows:
  
*** vect_analyze_group_access_1 (struct data
*** 2151,2156 
--- 2155,2164 
return false;
  }
  
+ if (dump_enabled_p ())
+   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+"Two or more load stmts share the same dr.\n");
+ 
/* For load use the same data-ref load.  */
GROUP_SAME_DR_STMT (vinfo_for_stmt (next)) = prev;
  
Index: gcc/tree-vect-loop.c
===
*** gcc/tree-vect-loop.c(revision 228759)
--- gcc/tree-vect-loop.c(working copy)
*** destroy_loop_vec_info (loop_vec_info loo
*** 1043,1049 
  
  /* Calculate the cost of one scalar iteration of the loop.  */
  static void
! vect_get_single_scalar_iteration_cost (loop_vec_info loop_vinfo)
  {
struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
basic_block *bbs = LOOP_VINFO_BBS (loop_vinfo);
--- 1043,1049 
  
  /* Calculate the cost of one scalar iteration of the loop.  */
  static void
! vect_compute_single_scalar_iteration_cost (loop_vec_info loop_vinfo)
  {
struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
basic_block *bbs = LOOP_VINFO_BBS (loop_vinfo);
*** vect_analyze_loop_2 (loop_vec_info loop_
*** 1739,1745 
  }
  
/* Compute the scalar iteration cost.  */
!   vect_get_single_scalar_iteration_cost (loop_vinfo);
  
/* This pass will decide on using loop versioning and/or loop peeling in
   order to enhance the alignment of data references in the loop.  */
--- 1739,1745 
  }
  
/* Compute the scalar iteration cost.  */
!   vect_compute_single_scalar_iteration_cost (loop_vinfo);
  
/* This pass will decide on using loop versioning and/or loop peeling in
   order to enhance the alignment of data references in the loop.  */
Index: gcc/tree-vect-slp.c
===
*** gcc/tree-vect-slp.c (revision 228759)
--- gcc/tree-vect-slp.c (working copy)
*** vect_create_new_slp_node (vec
*** 135,140 
--- 135,157 
  }
  
  
+ /* This structure is used in creation of an SLP tree.  Each instance
+corresponds to the same operand in a group of scalar stmts in an SLP
+node.  */
+ typedef struct _slp_oprnd_info
+ {
+   /* Def-stmts for the operands.  */
+   vec def_stmts;
+   /* Information about the first statement, its vector def-type, type, the
+  operand itself in case it's constant, and an indication if it's a pattern
+  stmt.  */
+   enum vect_def_type first_dt;
+   tree first_op_type;
+   bool first_pattern;
+   bool second_pattern;
+ } *slp_oprnd_info;
+ 
+ 
  /* Allocate operands info for NOPS operands, and GROUP_SIZE def-stmts for each
 operand.  */
  static vec 
Index: gcc/tree-vect-stmts.c
===
*** gcc/tree-vect-stmts.c   (revision 228759)
--- gcc/tree-vect-stmts.c   (working copy)
*** record_stmt_cost (stmt_vector_for_cost *
*** 94,105 
if (body_cost_vec)
  {
tree vectype = stmt_info ? stmt_vectype (stmt_info) : NULL_TREE;
!   add_stmt_info_to_vec (body_cost_vec, count, kind,
!   stmt_info ? STMT_VINFO_STMT (stmt_info) : NULL,
!   misalign);
return (unsigned)
(builtin_vectorization_cost (kind, vectype, misalign) * count);
-
  }
else
  return add_stmt_cost (stmt_info->vinfo->target_cost_data,
--- 94,105

Re: Merge from gomp-4_1-branch to trunk

2015-10-14 Thread Sebastian Huber

On 14/10/15 10:04, Jakub Jelinek wrote:

No, both the above changes are wrong.  There is not a single int32_t
written, but could be many more, it is an array of 32-bit integers.
I'd say you just want to cast explicitly,
   omp_get_place_proc_ids (*place_num, (int *) ids);
and
>    omp_get_partition_place_nums ((int *) place_nums);
The reason for int32_t is that on the Fortran side it is integer(kind=4)
and everywhere else for that int32_t is used.
If this works, the patch is preapproved.


I checked this in:

https://gcc.gnu.org/viewcvs/gcc?view=revision&revision=228805

--
Sebastian Huber, embedded brains GmbH

Address : Dornierstr. 4, D-82178 Puchheim, Germany
Phone   : +49 89 189 47 41-16
Fax : +49 89 189 47 41-09
E-Mail  : sebastian.hu...@embedded-brains.de
PGP : Public key available on request.

Diese Nachricht ist keine geschäftliche Mitteilung im Sinne des EHUG.



Re: [vec-cmp, patch 4/6] Support vector mask invariants

2015-10-14 Thread Richard Biener
On Tue, Oct 13, 2015 at 4:52 PM, Ilya Enkovich  wrote:
> 2015-10-13 16:54 GMT+03:00 Richard Biener :
>> On Thu, Oct 8, 2015 at 5:11 PM, Ilya Enkovich  wrote:
>>> Hi,
>>>
>>> This patch adds a special handling of boolean vector invariants.  We need 
>>> additional code to determine type of generated invariant.  For 
>>> VEC_COND_EXPR case we even provide this type directly because statement 
>>> vectype doesn't allow us to compute it.  Separate code is used to generate 
>>> and expand such vectors.
>>>
>>> Thanks,
>>> Ilya
>>> --
>>> gcc/
>>>
>>> 2015-10-08  Ilya Enkovich  
>>>
>>> * expr.c (const_vector_mask_from_tree): New.
>>> (const_vector_from_tree): Use const_vector_mask_from_tree
>>> for boolean vectors.
>>> * tree-vect-stmts.c (vect_init_vector): Support boolean vector
>>> invariants.
>>> (vect_get_vec_def_for_operand): Add VECTYPE arg.
>>> (vectorizable_condition): Directly provide vectype for invariants
>>> used in comparison.
>>> * tree-vectorizer.h (vect_get_vec_def_for_operand): Add VECTYPE
>>> arg.
>>>
>>>
>>> diff --git a/gcc/expr.c b/gcc/expr.c
>>> index 88da8cb..a624a34 100644
>>> --- a/gcc/expr.c
>>> +++ b/gcc/expr.c
>>> @@ -11320,6 +11320,40 @@ try_tablejump (tree index_type, tree index_expr, 
>>> tree minval, tree range,
>>>return 1;
>>>  }
>>>
>>> +/* Return a CONST_VECTOR rtx representing vector mask for
>>> +   a VECTOR_CST of booleans.  */
>>> +static rtx
>>> +const_vector_mask_from_tree (tree exp)
>>> +{
>>> +  rtvec v;
>>> +  unsigned i;
>>> +  int units;
>>> +  tree elt;
>>> +  machine_mode inner, mode;
>>> +
>>> +  mode = TYPE_MODE (TREE_TYPE (exp));
>>> +  units = GET_MODE_NUNITS (mode);
>>> +  inner = GET_MODE_INNER (mode);
>>> +
>>> +  v = rtvec_alloc (units);
>>> +
>>> +  for (i = 0; i < VECTOR_CST_NELTS (exp); ++i)
>>> +{
>>> +  elt = VECTOR_CST_ELT (exp, i);
>>> +
>>> +  gcc_assert (TREE_CODE (elt) == INTEGER_CST);
>>> +  if (integer_zerop (elt))
>>> +   RTVEC_ELT (v, i) = CONST0_RTX (inner);
>>> +  else if (integer_onep (elt)
>>> +  || integer_minus_onep (elt))
>>> +   RTVEC_ELT (v, i) = CONSTM1_RTX (inner);
>>> +  else
>>> +   gcc_unreachable ();
>>> +}
>>> +
>>> +  return gen_rtx_CONST_VECTOR (mode, v);
>>> +}
>>> +
>>>  /* Return a CONST_VECTOR rtx for a VECTOR_CST tree.  */
>>>  static rtx
>>>  const_vector_from_tree (tree exp)
>>> @@ -11335,6 +11369,9 @@ const_vector_from_tree (tree exp)
>>>if (initializer_zerop (exp))
>>>  return CONST0_RTX (mode);
>>>
>>> +  if (VECTOR_BOOLEAN_TYPE_P (TREE_TYPE (exp)))
>>> +  return const_vector_mask_from_tree (exp);
>>> +
>>>units = GET_MODE_NUNITS (mode);
>>>inner = GET_MODE_INNER (mode);
>>>
>>> diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
>>> index 6949c71..337ea7b 100644
>>> --- a/gcc/tree-vect-stmts.c
>>> +++ b/gcc/tree-vect-stmts.c
>>> @@ -1308,27 +1308,61 @@ vect_init_vector_1 (gimple *stmt, gimple *new_stmt, 
>>> gimple_stmt_iterator *gsi)
>>>  tree
>>>  vect_init_vector (gimple *stmt, tree val, tree type, gimple_stmt_iterator 
>>> *gsi)
>>>  {
>>> +  tree val_type = TREE_TYPE (val);
>>> +  machine_mode mode = TYPE_MODE (type);
>>> +  machine_mode val_mode = TYPE_MODE(val_type);
>>>tree new_var;
>>>gimple *init_stmt;
>>>tree vec_oprnd;
>>>tree new_temp;
>>>
>>>if (TREE_CODE (type) == VECTOR_TYPE
>>> -  && TREE_CODE (TREE_TYPE (val)) != VECTOR_TYPE)
>>> -{
>>> -  if (!types_compatible_p (TREE_TYPE (type), TREE_TYPE (val)))
>>> +  && TREE_CODE (val_type) != VECTOR_TYPE)
>>> +{
>>> +  /* Handle vector of bool represented as a vector of
>>> +integers here rather than on expand because it is
>>> +a default mask type for targets.  Vector mask is
>>> +built in a following way:
>>> +
>>> +tmp = (int)val
>>> +vec_tmp = {tmp, ..., tmp}
>>> +vec_cst = VIEW_CONVERT_EXPR(vec_tmp);  */
>>> +  if (TREE_CODE (val_type) == BOOLEAN_TYPE
>>> + && VECTOR_MODE_P (mode)
>>> + && SCALAR_INT_MODE_P (GET_MODE_INNER (mode))
>>> + && GET_MODE_INNER (mode) != val_mode)
>>> {
>>> - if (CONSTANT_CLASS_P (val))
>>> -   val = fold_unary (VIEW_CONVERT_EXPR, TREE_TYPE (type), val);
>>> - else
>>> + unsigned size = GET_MODE_BITSIZE (GET_MODE_INNER (mode));
>>> + tree stype = build_nonstandard_integer_type (size, 1);
>>> + tree vectype = get_vectype_for_scalar_type (stype);
>>> +
>>> + new_temp = make_ssa_name (stype);
>>> + init_stmt = gimple_build_assign (new_temp, NOP_EXPR, val);
>>> + vect_init_vector_1 (stmt, init_stmt, gsi);
>>> +
>>> + val = make_ssa_name (vectype);
>>> + new_temp = build_vector_from_val (vectype, new_temp);
>>> + init_stmt = gimple_build_assign (val, new_temp);
>>> + vect_init_vector_1 (stmt, init_stmt, gsi);
>>> +
>>> + val = build

Re: Benchmarks of v2 (was Re: [PATCH 0/5] RFC: Overhaul of diagnostics (v2))

2015-10-14 Thread Richard Biener
On Tue, Oct 13, 2015 at 5:32 PM, David Malcolm  wrote:
> On Thu, 2015-09-24 at 10:15 +0200, Richard Biener wrote:
>> On Thu, Sep 24, 2015 at 2:25 AM, David Malcolm  wrote:
>> > On Wed, 2015-09-23 at 15:36 +0200, Richard Biener wrote:
>> >> On Wed, Sep 23, 2015 at 3:19 PM, Michael Matz  wrote:
>> >> > Hi,
>> >> >
>> >> > On Tue, 22 Sep 2015, David Malcolm wrote:
>> >> >
>> >> >> The drawback is that it could bloat the ad-hoc table.  Can the ad-hoc
>> >> >> table ever get smaller, or does it only ever get inserted into?
>> >> >
>> >> > It only ever grows.
>> >> >
>> >> >> An idea I had is that we could stash short ranges directly into the 32
>> >> >> bits of location_t, by offsetting the per-column-bits somewhat.
>> >> >
>> >> > It's certainly worth an experiment: let's say you restrict yourself to
>> >> > tokens less than 8 characters, you need an additional 3 bits (using one
>> >> > value, e.g. zero, as the escape value).  That leaves 20 bits for the 
>> >> > line
>> >> > numbers (for the normal 8 bit columns), which might be enough for most
>> >> > single-file compilations.  For LTO compilation this often won't be 
>> >> > enough.
>> >> >
>> >> >> My plan is to investigate the impact these patches have on the time and
>> >> >> memory consumption of the compiler,
>> >> >
>> >> > When you do so, make sure you're also measuring an LTO compilation with
>> >> > debug info of something big (firefox).  I know that we already had 
>> >> > issues
>> >> > with the size of the linemap data in the past for these cases (probably
>> >> > when we added columns).
>> >>
>> >> The issue we have with LTO is that the linemap gets populated in quite
>> >> random order and thus we repeatedly switch files (we've mitigated this
>> >> somewhat for GCC 5).  We also considered dropping column info
>> >> (and would drop range info) as diagnostics are from optimizers only
>> >> with LTO and we keep locations merely for debug info.
>> >
>> > Thanks.  Presumably the mitigation you're referring to is the
>> > lto_location_cache class in lto-streamer-in.c?
>> >
>> > Am I right in thinking that, right now, the LTO code doesn't support
>> > ad-hoc locations? (presumably the block pointers only need to exist
>> > during optimization, which happens after the serialization)
>>
>> LTO code does support ad-hoc locations but they are "restored" only
>> when reading function bodies and stmts (by means of COMBINE_LOCATION_DATA).
>>
>> > The obvious simplification would be, as you suggest, to not bother
>> > storing range information with LTO, falling back to just the existing
>> > representation.  Then there's no need to extend LTO to serialize ad-hoc
>> > data; simply store the underlying locus into the bit stream.  I think
>> > that this happens already: lto-streamer-out.c calls expand_location and
>> > stores the result, so presumably any ad-hoc location_t values made by
>> > the v2 patches would have dropped their range data there when I ran the
>> > test suite.
>>
>> Yep.  We only preserve BLOCKs, so if you don't add extra code to
>> preserve ranges they'll be "dropped".
>>
>> > If it's acceptable to not bother with ranges for LTO, one way to do the
>> > "stashing short ranges into the location_t" idea might be for the
>> > bits-per-range of location_t values to be a property of the line_table
>> > (or possibly the line map), set up when the struct line_maps is created.
>> > For non-LTO it could be some tuned value (maybe from a param?); for LTO
>> > it could be zero, so that we have as many bits as before for line/column
>> > data.
>>
>> That could be a possibility (likewise for column info?)
>>
>> Richard.
>>
>> > Hope this sounds sane
>> > Dave
>
> I did some crude benchmarking of the patchkit, using these scripts:
>   https://github.com/davidmalcolm/gcc-benchmarking
> (specifically, bb0222b455df8cefb53bfc1246eb0a8038256f30),
> using the "big-code.c" and "kdecore.cc" files Michael posted as:
>   https://gcc.gnu.org/ml/gcc-patches/2013-09/msg00062.html
> and "influence.i", a preprocessed version of SPEC2006's 445.gobmk
> engine/influence.c (as an example of a moderate-sized pure C source
> file).
>
> This doesn't yet cover very large autogenerated C files, and the .cc
> file is only being measured to see the effect on the ad-hoc table (and
> tokenization).
>
> "control" was r227977.
> "experiment" was the same revision with the v2 patchkit applied.
>
> Recall that this patchkit captures ranges for tokens as an extra field
> within tokens within libcpp and the C FE, and adds ranges to the ad-hoc
> location lookaside, storing them for all tree nodes within the C FE that
> have a location_t, and passing them around within c_expr for all C
> expressions (including those that don't have a location_t).
>
> Both control and experiment were built with
>   --enable-checking=release \
>   --disable-bootstrap \
>   --disable-multilib \
>   --enable-languages=c,ada,c++,fortran,go,java,lto,objc,obj-c++
>
> The script measures:
>
> (a) wallclock time for

Re: [PATCH] Optimize const1 * copysign (const2, y) in reassoc (PR tree-optimization/67815)

2015-10-14 Thread Richard Biener
On Tue, 13 Oct 2015, Marek Polacek wrote:

> This patch implements the copysign optimization for reassoc I promised
> I'd look into.  I.e.,
> 
> CST1 * copysign (CST2, y) -> copysign (CST1 * CST2, y) if CST1 > 0
> CST1 * copysign (CST2, y) -> -copysign (CST1 * CST2, y) if CST1 < 0
> 
> After getting familiar with reassoc a bit this wasn't that hard.  But
> I'm hopeless when it comes to floating-point stuff, so I'd appreciate
> if you could glance over the tests.  The reassoc-40.c should address
> Joseph's comment in the audit trail (with -fno-rounding-math the
> optimization would take place).
> 
> For 0.0 * copysign (cst, x), the result is folded into 0.0 way before
> reassoc, so we probably don't have to pay attention to this case.
> 
> Bootstrapped/regtested on x86_64-linux, ok for trunk?
> 
> 2015-10-13  Marek Polacek  
> 
>   PR tree-optimization/67815
>   * tree-ssa-reassoc.c (attempt_builtin_copysign): New function.
>   (reassociate_bb): Call it.
> 
>   * gcc.dg/tree-ssa/reassoc-39.c: New test.
>   * gcc.dg/tree-ssa/reassoc-40.c: New test.
> 
> diff --git gcc/testsuite/gcc.dg/tree-ssa/reassoc-39.c 
> gcc/testsuite/gcc.dg/tree-ssa/reassoc-39.c
> index e69de29..589d06b 100644
> --- gcc/testsuite/gcc.dg/tree-ssa/reassoc-39.c
> +++ gcc/testsuite/gcc.dg/tree-ssa/reassoc-39.c
> @@ -0,0 +1,41 @@
> +/* PR tree-optimization/67815 */
> +/* { dg-do compile } */
> +/* { dg-options "-Ofast -fdump-tree-reassoc1-details" } */
> +
> +float
> +f0 (float x)
> +{
> +  return 7.5 * __builtin_copysignf (2.0, x);
> +}
> +
> +float
> +f1 (float x)
> +{
> +  return -7.5 * __builtin_copysignf (2.0, x);
> +}
> +
> +double
> +f2 (double x, double y)
> +{
> +  return x * ((1.0/12) * __builtin_copysign (1.0, y));
> +}
> +
> +double
> +f3 (double x, double y)
> +{
> +  return (x * (-1.0/12)) * __builtin_copysign (1.0, y);
> +}
> +
> +double
> +f4 (double x, double y, double z)
> +{
> +  return (x * z) * ((1.0/12) * __builtin_copysign (4.0, y));
> +}
> +
> +double
> +f5 (double x, double y, double z)
> +{
> +  return (x * (-1.0/12)) * z * __builtin_copysign (2.0, y);
> +}
> +
> +/* { dg-final { scan-tree-dump-times "Optimizing copysign" 6 "reassoc1"} }*/
> diff --git gcc/testsuite/gcc.dg/tree-ssa/reassoc-40.c 
> gcc/testsuite/gcc.dg/tree-ssa/reassoc-40.c
> index e69de29..d65bcc1b 100644
> --- gcc/testsuite/gcc.dg/tree-ssa/reassoc-40.c
> +++ gcc/testsuite/gcc.dg/tree-ssa/reassoc-40.c
> @@ -0,0 +1,21 @@
> +/* PR tree-optimization/67815 */
> +/* { dg-do compile } */
> +/* { dg-options "-Ofast -frounding-math -fdump-tree-reassoc1-details" } */
> +
> +/* Test that the copysign reassoc optimization doesn't fire for
> +   -frounding-math (i.e. HONOR_SIGN_DEPENDENT_ROUNDING) if the multiplication
> +   is inexact.  */
> +
> +double
> +f1 (double y)
> +{
> +  return (1.2 * __builtin_copysign (1.1, y));
> +}
> +
> +double
> +f2 (double y)
> +{
> +  return (-1.2 * __builtin_copysign (1.1, y));
> +}
> +
> +/* { dg-final { scan-tree-dump-not "Optimizing copysign" "reassoc1" } } */
> diff --git gcc/tree-ssa-reassoc.c gcc/tree-ssa-reassoc.c
> index 879722e..b8897b7 100644
> --- gcc/tree-ssa-reassoc.c
> +++ gcc/tree-ssa-reassoc.c
> @@ -4622,6 +4622,95 @@ attempt_builtin_powi (gimple *stmt, vec *> *ops)
>return result;
>  }
>  
> +/* Attempt to optimize
> +   CST1 * copysign (CST2, y) -> copysign (CST1 * CST2, y) if CST1 > 0, or
> +   CST1 * copysign (CST2, y) -> -copysign (CST1 * CST2, y) if CST1 < 0.  */
> +
> +static void
> +attempt_builtin_copysign (vec *ops)
> +{
> +  operand_entry *oe;
> +  unsigned int i;
> +  unsigned int length = ops->length ();
> +  tree cst1 = ops->last ()->op;
> +
> +  if (length == 1 || TREE_CODE (cst1) != REAL_CST)
> +return;
> +
> +  FOR_EACH_VEC_ELT (*ops, i, oe)
> +{
> +  if (TREE_CODE (oe->op) == SSA_NAME)

I think you need to check whether the SSA_NAME has a single use only
as you are changing its value.  Which also means you shouldn't be
"reusing" it (because existing debug stmts will then be wrong).
Thus you have to replace it.

> + {
> +   gimple *def_stmt = SSA_NAME_DEF_STMT (oe->op);
> +   if (is_gimple_call (def_stmt))
> + {
> +   tree fndecl = gimple_call_fndecl (def_stmt);
> +   tree cst2;
> +   switch (DECL_FUNCTION_CODE (fndecl))
> + {
> + CASE_FLT_FN (BUILT_IN_COPYSIGN):
> +   cst2 = gimple_call_arg (def_stmt, 0);
> +   /* The first argument of copysign must be a constant,
> +  otherwise there's nothing to do.  */
> +   if (TREE_CODE (cst2) == REAL_CST)
> + {
> +   tree mul = const_binop (MULT_EXPR, TREE_TYPE (cst1),
> +   cst1, cst2);
> +   /* If we couldn't fold to a single constant, skip it.  */
> +   if (mul == NULL_TREE)
> + break;
> +   /* We're going to replace the copysign argument with
> +

Re: [PATCH 8/9] Add TARGET_ADDR_SPACE_ZERO_ADDRESS_VALID

2015-10-14 Thread Richard Biener
On Tue, Oct 13, 2015 at 10:59 PM, Richard Henderson  wrote:
> On 10/14/2015 02:49 AM, Jeff Law wrote:
>>
>> The problem here is we don't know what address space the *0 is going to
>> hit,
>> right?
>
>
> Correct, not before we do the walk of stmt to see what's present.
>
>> Isn't that also an issue for code generation as well?
>
>
> What sort of problem are you thinking of?  I haven't seen one yet.

The actual dereference of course has a properly address-space qualified zero.

Only your walking depends on operand_equal_p to treat different address-space
zero addresses as equal (which they are of course not ...):


int
operand_equal_p (const_tree arg0, const_tree arg1, unsigned int flags)
{
...
  /* Check equality of integer constants before bailing out due to
 precision differences.  */
  if (TREE_CODE (arg0) == INTEGER_CST && TREE_CODE (arg1) == INTEGER_CST)
{
  /* Address of INTEGER_CST is not defined; check that we did not forget
 to drop the OEP_ADDRESS_OF/OEP_CONSTANT_ADDRESS_OF flags.  */
  gcc_checking_assert (!(flags
 & (OEP_ADDRESS_OF | OEP_CONSTANT_ADDRESS_OF)));
  return tree_int_cst_equal (arg0, arg1);
}

but only later we do

  /* We cannot consider pointers to different address space equal.  */
  if (POINTER_TYPE_P (TREE_TYPE (arg0))
  && POINTER_TYPE_P (TREE_TYPE (arg1))
  && (TYPE_ADDR_SPACE (TREE_TYPE (TREE_TYPE (arg0)))
  != TYPE_ADDR_SPACE (TREE_TYPE (TREE_TYPE (arg1)))))
return 0;

So "fixing" that would make the walker only look for default
address-space zero dereferences.

I think we need to fix operand_equal_p anyway because 0 is clearly not
equal to 0 (only if
they convert to the same literal)

Richard.


>
> r~


Re: [PATCH 8/9] Add TARGET_ADDR_SPACE_ZERO_ADDRESS_VALID

2015-10-14 Thread Richard Biener
On Wed, Oct 14, 2015 at 11:19 AM, Richard Biener
 wrote:
> On Tue, Oct 13, 2015 at 10:59 PM, Richard Henderson  wrote:
>> On 10/14/2015 02:49 AM, Jeff Law wrote:
>>>
>>> The problem here is we don't know what address space the *0 is going to
>>> hit,
>>> right?
>>
>>
>> Correct, not before we do the walk of stmt to see what's present.
>>
>>> Isn't that also an issue for code generation as well?
>>
>>
>> What sort of problem are you thinking of?  I haven't seen one yet.
>
> The actual dereference of course has a properly address-space qualified zero.
>
> Only your walking depends on operand_equal_p to treat different address-space
> zero addresses as equal (which they are of course not ...):
>
>
> int
> operand_equal_p (const_tree arg0, const_tree arg1, unsigned int flags)
> {
> ...
>   /* Check equality of integer constants before bailing out due to
>  precision differences.  */
>   if (TREE_CODE (arg0) == INTEGER_CST && TREE_CODE (arg1) == INTEGER_CST)
> {
>   /* Address of INTEGER_CST is not defined; check that we did not forget
>  to drop the OEP_ADDRESS_OF/OEP_CONSTANT_ADDRESS_OF flags.  */
>   gcc_checking_assert (!(flags
>  & (OEP_ADDRESS_OF | OEP_CONSTANT_ADDRESS_OF)));
>   return tree_int_cst_equal (arg0, arg1);
> }
>
> but only later we do
>
>   /* We cannot consider pointers to different address space equal.  */
>   if (POINTER_TYPE_P (TREE_TYPE (arg0))
>   && POINTER_TYPE_P (TREE_TYPE (arg1))
>   && (TYPE_ADDR_SPACE (TREE_TYPE (TREE_TYPE (arg0)))
>   != TYPE_ADDR_SPACE (TREE_TYPE (TREE_TYPE (arg1)))))
> return 0;
>
> So "fixing" that would make the walker only look for default
> address-space zero dereferences.
>
> I think we need to fix operand_equal_p anyway because 0 is clearly not
> equal to 0 (only if
> they convert to the same literal)

I think you could trigger bogus CSE of dereferences of literal addresses
from different address-spaces.

Richard.

> Richard.
>
>
>>
>> r~


Re: [PR67891] drop is_gimple_reg test from set_parm_rtl

2015-10-14 Thread Richard Biener
On Wed, Oct 14, 2015 at 5:25 AM, Alexandre Oliva  wrote:
> On Oct 12, 2015, Richard Biener  wrote:
>
>> On Sat, Oct 10, 2015 at 3:16 PM, Alexandre Oliva  wrote:
>>> On Oct  9, 2015, Richard Biener  wrote:
>>>
 Ok.  Note that I think emit_block_move shouldn't mess with the addressable 
 flag.
>>>
>>> I have successfully tested a patch that stops it from doing so,
>>> reverting https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49429#c11 but
>>> according to bugs 49429 and 49454, it looks like removing it would mess
>>> with escape analysis introduced in r175063 for bug 44194.  The thread
>>> that introduces the mark_addressable calls suggests some discomfort with
>>> this solution, and even a suggestion that the markings should be
>>> deferred past the end of expand, but in the end there was agreement to
>>> go with it.  https://gcc.gnu.org/ml/gcc-patches/2011-06/msg01746.html
>
>> Aww, indeed.  Of course the issue is that we don't track pointers to the
>> stack introduced during RTL properly.
>
>> Thanks for checking.  Might want to add a comment before that
>> addressable setting now that you've done the archeology.
>
> I decided to give the following approach a try instead.  The following
> patch was regstrapped on x86_64-linux-gnu and i686-linux-gnu.
> Ok to install?

It looks ok to me but lacks a comment in mark_addressable_1 why we
do this queueing when currently expanding to RTL.

Richard.

> Would anyone with access to hpux (pa and ia64 are both affected) give it
> a spin?
>
>
> defer mark_addressable calls during expand till the end of expand
>
> From: Alexandre Oliva 
>
> for  gcc/ChangeLog
>
> * gimple-expr.c: Include hash-set.h and rtl.h.
> (mark_addressable_queue): New var.
> (mark_addressable): Factor actual marking into...
> (mark_addressable_1): ... this.  Queue it up during expand.
> (mark_addressable_2): New.
> (flush_mark_addressable_queue): New.
> * gimple-expr.h (flush_mark_addressable_queue): Declare.
> * cfgexpand.c: Include gimple-expr.h.
> (pass_expand::execute): Flush mark_addressable queue.
> ---
>  gcc/cfgexpand.c   |3 +++
>  gcc/gimple-expr.c |   50 --
>  gcc/gimple-expr.h |1 +
>  3 files changed, 52 insertions(+), 2 deletions(-)
>
> diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
> index eaad859..a362e17 100644
> --- a/gcc/cfgexpand.c
> +++ b/gcc/cfgexpand.c
> @@ -51,6 +51,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "internal-fn.h"
>  #include "tree-eh.h"
>  #include "gimple-iterator.h"
> +#include "gimple-expr.h"
>  #include "gimple-walk.h"
>  #include "cgraph.h"
>  #include "tree-cfg.h"
> @@ -6373,6 +6374,8 @@ pass_expand::execute (function *fun)
>/* We're done expanding trees to RTL.  */
>currently_expanding_to_rtl = 0;
>
> +  flush_mark_addressable_queue ();
> +
>FOR_BB_BETWEEN (bb, ENTRY_BLOCK_PTR_FOR_FN (fun)->next_bb,
>   EXIT_BLOCK_PTR_FOR_FN (fun), next_bb)
>  {
> diff --git a/gcc/gimple-expr.c b/gcc/gimple-expr.c
> index 2a6ba1a..db249a3 100644
> --- a/gcc/gimple-expr.c
> +++ b/gcc/gimple-expr.c
> @@ -35,6 +35,8 @@ along with GCC; see the file COPYING3.  If not see
>  #include "gimplify.h"
>  #include "stor-layout.h"
>  #include "demangle.h"
> +#include "hash-set.h"
> +#include "rtl.h"
>
>  /* - Type related -  */
>
> @@ -823,6 +825,50 @@ is_gimple_mem_ref_addr (tree t)
>   || decl_address_invariant_p (TREE_OPERAND (t, 0)))));
>  }
>
> +/* Hold trees marked addressable during expand.  */
> +
> +static hash_set *mark_addressable_queue;
> +
> +/* Mark X as addressable or queue it up if called during expand.  */
> +
> +static void
> +mark_addressable_1 (tree x)
> +{
> +  if (!currently_expanding_to_rtl)
> +{
> +  TREE_ADDRESSABLE (x) = 1;
> +  return;
> +}
> +
> +  if (!mark_addressable_queue)
> +mark_addressable_queue = new hash_set();
> +  mark_addressable_queue->add (x);
> +}
> +
> +/* Adaptor for mark_addressable_1 for use in hash_set traversal.  */
> +
> +bool
> +mark_addressable_2 (tree const &x, void * ATTRIBUTE_UNUSED = NULL)
> +{
> +  mark_addressable_1 (x);
> +  return false;
> +}
> +
> +/* Mark all queued trees as addressable, and empty the queue.  To be
> +   called right after clearing CURRENTLY_EXPANDING_TO_RTL.  */
> +
> +void
> +flush_mark_addressable_queue ()
> +{
> +  gcc_assert (!currently_expanding_to_rtl);
> +  if (mark_addressable_queue)
> +{
> +  mark_addressable_queue->traverse (NULL);
> +  delete mark_addressable_queue;
> +  mark_addressable_queue = NULL;
> +}
> +}
> +
>  /* Mark X addressable.  Unlike the langhook we expect X to be in gimple
> form and we don't do any syntax checking.  */
>
> @@ -838,7 +884,7 @@ mark_addressable (tree x)
>&& TREE_CODE (x) != PARM_DECL
>&& TREE_CODE (x) != RESULT_DECL)
>  return;
> -  TREE_ADDRESSABLE (x) = 1;
> +  mark_addressable_1 (x);
>
>   

Re: [PATCH 1/7] Libsanitizer merge from upstream r249633.

2015-10-14 Thread Maxim Ostapenko

On 14/10/15 10:54, Jakub Jelinek wrote:

On Tue, Oct 13, 2015 at 07:54:33PM +0300, Maxim Ostapenko wrote:

On 13/10/15 14:15, Maxim Ostapenko wrote:

This is the raw merge itself. I'm bumping SONAME to libasan.so.3.

-Maxim

I have just noticed that I've misused autoconf stuff (used the wrong version).
Here is a fixed version of the same patch. Sorry for the inconvenience.

Is libubsan, libtsan backwards compatible, or do we want to change SONAME
there too?


No, they are not (for UBSan the heuristic doesn't work well for GCC; TSan
has some type changes in interceptors and data structures, e.g. in
struct ReportStack). I can share more details, if desired.




The aarch64 changes are terrible, not just because it doesn't yet have
runtime decision on what VA to use or that it doesn't support 48-bit VA,
but also that for the 42-bit VA it uses a different shadow offset from
39-bit VA.  But on the compiler side we have just one...
Though, only the 39-bit VA is enabled right now by default, so out of the
box the state is as bad as we had in 5.x - users wanting 42-bit VA or 48-bit
VA have to patch libsanitizer.

Have you verified libbacktrace sanitization still works properly (that is
something upstream does not test)?


I'm sorry, I didn't quite catch your point about libbacktrace sanitization.
Did you mean symbolization? If so, I didn't perform any special
validation here (though the output pattern tests use libbacktrace output,
no?). But I wonder how I can verify this more or less automatically.




Do you plan to update the asan tests we have to reflect the changes in
upstream?


Hm, there aren't changes to the instrumentation, so the only thing is new
interceptors. If it is desirable, I can migrate some tests for new 
interceptors from upstream.




Jakub





Re: [PATCH 1/7] Libsanitizer merge from upstream r249633.

2015-10-14 Thread Yury Gribov

On 10/14/2015 12:34 PM, Maxim Ostapenko wrote:

On 14/10/15 10:54, Jakub Jelinek wrote:

Do you plan to update the asan tests we have to reflect the changes in
upstream?


Hm, there aren't changes to the instrumentation, so the only thing is new
interceptors. If it is desirable, I can migrate some tests for new
interceptors from upstream.


What about e.g. "-Improvements for ASan deactivated start were performed"?

-Y


Re: [PATCH] Fix default_binds_local_p_2 for extern protected data

2015-10-14 Thread Szabolcs Nagy

On 30/09/15 20:23, Andreas Krebbel wrote:

On 09/30/2015 06:21 PM, Szabolcs Nagy wrote:

On 30/09/15 14:47, Bernd Schmidt wrote:

On 09/17/2015 11:15 AM, Szabolcs Nagy wrote:

ping 2.

this patch is needed for working visibility ("protected")
attribute for extern data on targets using default_binds_local_p_2.
https://gcc.gnu.org/ml/gcc-patches/2015-07/msg01871.html


I hesitate to review this one since I don't think I understand the
issues on the various affected arches well enough. It looks like Jakub
had some input on the earlier changes, maybe he could take a look? Or
maybe rth knows best. Adding Ccs.

It would help to have examples of code generation demonstrating the
problem and how you would solve it. Input from the s390 maintainers
whether this is correct for their port would also be appreciated.


We are having the same problem on S/390. I think the GCC change is correct for 
S/390 as well.

-Andreas-



i think the approvals of arm and aarch64 maintainers
are needed to apply this fix for pr target/66912.

(only s390, arm and aarch64 use this predicate.)





consider the TU

__attribute__((visibility("protected"))) int n;

int f () { return n; }

if n "binds_local" then gcc -O -fpic -S is like

  .text
  .align  2
  .global f
  .arch armv8-a+fp+simd
  .type   f, %function
f:
  adrp x0, n
  ldr w0, [x0, #:lo12:n]
  ret
  .size   f, .-f
  .protected  n
  .comm   n,4,4

so 'n' is a direct reference, not accessed through
the GOT ('n' will be in the .bss of the dso).
this is the current behavior.

if i remove the protected visibility attribute
then the access goes through GOT:

  .text
  .align  2
  .global f
  .arch armv8-a+fp+simd
  .type   f, %function
f:
  adrp x0, _GLOBAL_OFFSET_TABLE_
  ldr x0, [x0, #:gotpage_lo15:n]
  ldr w0, [x0]
  ret
  .size   f, .-f
  .comm   n,4,4

protected visibility means the definition cannot
be overridden by another module, but it should
still allow extern references.

if the main module references such an object then
(as an implementation detail) it may use copy
relocation against it, which places 'n' in the
main module and the dynamic linker should make
sure that references to 'n' point there.

this is only possible if references to 'n' go
through the GOT (i.e. it should not be "binds_local").
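
for completeness, the main-executable side of that scenario is just (sketch;
'n' is the protected object defined in the dso built from the TU above):

  /* Non-PIC main executable referencing the protected variable.  The static
     linker may resolve this with a COPY relocation, placing its own copy of
     'n' in the executable; the DSO must then reach that copy via the GOT.  */
  extern int n;

  int
  main (void)
  {
    return n;
  }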










Re: Move some bit and binary optimizations in simplify and match

2015-10-14 Thread Richard Biener
On Wed, Oct 14, 2015 at 7:39 AM, Marc Glisse  wrote:
>
> +(simplify
> + (plus (convert? @0) (convert? (xdivamulminusa @0 @1)))
> +  (if ((INTEGRAL_TYPE_P (type) || VECTOR_INTEGER_TYPE_P (type))
> +   && tree_nop_conversion_p (type, TREE_TYPE (@0)))
> +   (trunc_mod (convert @0) (convert @1
>
> See PR 67953.

Please drop xdivamulminusa.  It was a bad idea of mine, just add two patterns.

> +(match (abitandnotb @0 @1)
> + (bit_and:c @0 (bit_not INTEGER_CST@1)))
>
> Does that work?

No.  Please drop these helpers and instead duplicate the patterns.

>
> +/* Fold (a * (1 << b)) into (a << b)  */
> +(simplify
> + (mult:c @0 (convert? (lshift integer_onep@1 @2)))
> +  (if (! FLOAT_TYPE_P (type)
> +&& tree_nop_conversion_p (type, TREE_TYPE (@2)))
> +   (lshift @0 (convert @2
>
> You don't need/want to convert @2 (fold-const doesn't convert, does it?),
> and you don't need to check for tree_nop_conversion_p.

I think for long x and x * (long)(1u << b) you need to do so because the
result for b == 33 would be different.  Indeed you don't need the convert on @2.

Richard.

>
> --
> Marc Glisse


Re: [PATCH] Allow FSM to thread single block cases too

2015-10-14 Thread Richard Biener
On Tue, Oct 13, 2015 at 2:52 PM, Richard Biener
 wrote:
> On Tue, Oct 13, 2015 at 2:21 PM, Jeff Law  wrote:
>>
>> One of the cases that was missing in the FSM support is threading when the
>> path is a single block.  ie, a control statement's output can be statically
>> determined just by looking at PHIs in the control statement's block for one
>> or incoming edges.
>>
>> This is necessary to fix a regression if I turn off the old jump threader's
>> backedge support.  Just as important, Jan has in the past asked about a
>> trivial jump threader to be run during early optimizations.  Limiting the
>> FSM bits to this case would likely satisfy that need in the future.
>
> I think he asked for trivial forward threads though due to repeated tests.
> I hacked FRE to do this (I think), but maybe some trivial cleanup 
> opportunities
> are still left here.  Honza?

This or other related patches in the range r228731:228774 have caused quite a
big jump in SPEC CPU 2000 binary sizes (notably 176.gcc - so maybe affecting
bootstrap as well, at -O3).  Are you sure this doesn't re-introduce DOM
effectively peeling all loops once?

Richard.

> Richard.
>
>> Bootstrapped and regression tested on x86_64-linux-gnu.  Installed on the
>> trunk.
>>
>> Jeff
>>
>> commit a53bb29a1dffd329aa6235b88b0c2a830aa5a59e
>> Author: Jeff Law 
>> Date:   Tue Oct 13 06:19:20 2015 -0600
>>
>> [PATCH] Allow FSM to thread single block cases too
>>
>> * tree-ssa-threadbackward.c
>> (fsm_find_control_statement_thread_paths):
>> Allow single block jump threading paths.
>>
>> * gcc.dg/tree-ssa/ssa-thread-13.c: New test.
>>
>> diff --git a/gcc/ChangeLog b/gcc/ChangeLog
>> index d71bcd2..caab533 100644
>> --- a/gcc/ChangeLog
>> +++ b/gcc/ChangeLog
>> @@ -1,3 +1,8 @@
>> +2015-10-13  Jeff Law  
>> +
>> +   * tree-ssa-threadbackward.c
>> (fsm_find_control_statement_thread_paths):
>> +   Allow single block jump threading paths.
>> +
>>  2015-10-13  Tom de Vries  
>>
>> PR tree-optimization/67476
>> diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
>> index 4a08f0f..acf6df5 100644
>> --- a/gcc/testsuite/ChangeLog
>> +++ b/gcc/testsuite/ChangeLog
>> @@ -1,3 +1,7 @@
>> +2015-10-13  Jeff Law  
>> +
>> +   * gcc.dg/tree-ssa/ssa-thread-13.c: New test.
>> +
>>  2015-10-12  Jeff Law  
>>
>> * gcc.dg/tree-ssa/ssa-thread-12.c: New test.
>> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-13.c
>> b/gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-13.c
>> new file mode 100644
>> index 000..5051d11
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-13.c
>> @@ -0,0 +1,70 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-O2 -fdump-tree-vrp1-details" } */
>> +/* { dg-final { scan-tree-dump "FSM" "vrp1" } } */
>> +
>> +typedef struct rtx_def *rtx;
>> +typedef const struct rtx_def *const_rtx;
>> +enum rtx_code
>> +{
>> +  UNKNOWN, VALUE, DEBUG_EXPR, EXPR_LIST, INSN_LIST, SEQUENCE, ADDRESS,
>> +DEBUG_INSN, INSN, JUMP_INSN, CALL_INSN, BARRIER, CODE_LABEL, NOTE,
>> +COND_EXEC, PARALLEL, ASM_INPUT, ASM_OPERANDS, UNSPEC, UNSPEC_VOLATILE,
>> +ADDR_VEC, ADDR_DIFF_VEC, PREFETCH, SET, USE, CLOBBER, CALL, RETURN,
>> +EH_RETURN, TRAP_IF, CONST_INT, CONST_FIXED, CONST_DOUBLE, CONST_VECTOR,
>> +CONST_STRING, CONST, PC, REG, SCRATCH, SUBREG, STRICT_LOW_PART, CONCAT,
>> +CONCATN, MEM, LABEL_REF, SYMBOL_REF, CC0, IF_THEN_ELSE, COMPARE, PLUS,
>> +MINUS, NEG, MULT, SS_MULT, US_MULT, DIV, SS_DIV, US_DIV, MOD, UDIV,
>> UMOD,
>> +AND, IOR, XOR, NOT, ASHIFT, ROTATE, ASHIFTRT, LSHIFTRT, ROTATERT, SMIN,
>> +SMAX, UMIN, UMAX, PRE_DEC, PRE_INC, POST_DEC, POST_INC, PRE_MODIFY,
>> +POST_MODIFY, NE, EQ, GE, GT, LE, LT, GEU, GTU, LEU, LTU, UNORDERED,
>> +ORDERED, UNEQ, UNGE, UNGT, UNLE, UNLT, LTGT, SIGN_EXTEND, ZERO_EXTEND,
>> +TRUNCATE, FLOAT_EXTEND, FLOAT_TRUNCATE, FLOAT, FIX, UNSIGNED_FLOAT,
>> +UNSIGNED_FIX, FRACT_CONVERT, UNSIGNED_FRACT_CONVERT, SAT_FRACT,
>> +UNSIGNED_SAT_FRACT, ABS, SQRT, BSWAP, FFS, CLZ, CTZ, POPCOUNT, PARITY,
>> +SIGN_EXTRACT, ZERO_EXTRACT, HIGH, LO_SUM, VEC_MERGE, VEC_SELECT,
>> +VEC_CONCAT, VEC_DUPLICATE, SS_PLUS, US_PLUS, SS_MINUS, SS_NEG, US_NEG,
>> +SS_ABS, SS_ASHIFT, US_ASHIFT, US_MINUS, SS_TRUNCATE, US_TRUNCATE, FMA,
>> +VAR_LOCATION, DEBUG_IMPLICIT_PTR, ENTRY_VALUE, LAST_AND_UNUSED_RTX_CODE
>> +};
>> +union rtunion_def
>> +{
>> +  rtx rt_rtx;
>> +};
>> +typedef union rtunion_def rtunion;
>> +struct rtx_def
>> +{
>> +  __extension__ enum rtx_code code:16;
>> +  union u
>> +  {
>> +rtunion fld[1];
>> +  }
>> +  u;
>> +};
>> +
>> +unsigned int rtx_cost (rtx, enum rtx_code, unsigned char);
>> +rtx single_set_2 (const_rtx, rtx);
>> +
>> +unsigned
>> +seq_cost (const_rtx seq, unsigned char speed)
>> +{
>> +  unsigned cost = 0;
>> +  rtx set;
>> +  for (; seq; seq = (((seq)->u.fld[2]).rt_rtx))
>> +{
>> +  set =
>> +   (enum rtx_code) (seq)->code) == INSN)
>> + || (((enum rtx_code) (seq)->cod

Re: Move some bit and binary optimizations in simplify and match

2015-10-14 Thread Marc Glisse

On Wed, 14 Oct 2015, Richard Biener wrote:


+/* Fold (a * (1 << b)) into (a << b)  */
+(simplify
+ (mult:c @0 (convert? (lshift integer_onep@1 @2)))
+  (if (! FLOAT_TYPE_P (type)
+&& tree_nop_conversion_p (type, TREE_TYPE (@2)))
+   (lshift @0 (convert @2

You don't need/want to convert @2 (fold-const doesn't convert, does it?),
and you don't need to check for tree_nop_conversion_p.


I think for long x and x * (long)(1u << b) you need to do because the
result for b == 33 would be different.


- that check should be with TREE_TYPE (@1)
- 1u << 33 is undefined, isn't it?

x * (int)(1ul << b), which for b=33 should yield 0, would give the 
undefined x << b so some check does seem needed indeed.
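
A stand-alone C illustration of that counterexample (a sketch, assuming
32-bit int and 64-bit unsigned long; the function names are made up):

#include <stdio.h>

/* What the source computes: the shift happens in the wider unsigned type
   and is then converted.  For b == 33, (int) (1ul << 33) is 0 (modulo
   conversion), so the product is 0.  */
int as_written (int x, unsigned b)
{
  return x * (int) (1ul << b);
}

/* What the simplification would produce without the nop-conversion check:
   x << 33 on a 32-bit int is undefined, not 0.  */
int simplified (int x, unsigned b)
{
  return x << b;
}

int main (void)
{
  printf ("%d\n", as_written (7, 33));  /* prints 0 */
  /* simplified (7, 33) would invoke undefined behaviour.  */
  return 0;
}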


--
Marc Glisse


Re: [vec-cmp, patch 4/6] Support vector mask invariants

2015-10-14 Thread Ilya Enkovich
2015-10-14 11:49 GMT+03:00 Richard Biener :
> On Tue, Oct 13, 2015 at 4:52 PM, Ilya Enkovich  wrote:
>> I don't understand what you mean. vect_get_vec_def_for_operand has two
>> changes made.
>> 1. For boolean invariants use build_same_sized_truth_vector_type
>> instead of get_vectype_for_scalar_type in case statement produces a
>> boolean vector. This covers cases when we use invariants in
>> comparison, AND, IOR, XOR.
>
> Yes, I understand we need this special-casing to differentiate between
> the vector type
> used for boolean-typed loads/stores and the type for boolean typed constants.
> What happens if we mix them btw, like with
>
>   _Bool b = bools[i];
>   _Bool c = b || d;
>   ...
>
> ?

Here both statements should get a vector of char as their vectype, and we
never go the VECTOR_BOOLEAN_TYPE_P way for them.

>
>> 2. COND_EXPR is an exception because it has built-in boolean vector
>> result not reflected in its vecinfo. Thus I added additional operand
>> for vect_get_vec_def_for_operand to directly specify vectype for
>> vector definition in case it is a loop invariant.
>> So what do you propose to do with these changes?
>
> This is the change I don't like and don't see why we need it.  It works today
> and the comparison operands should be of appropriate type already?

Today it works because we always create a vector of an integer constant.
With boolean vectors it may be either an integer vector or a boolean vector,
depending on context. Consider:

_Bool _1;
int _2;

_2 = _1 != 0 ? 0 : 1

We have two zero constants here requiring different vectypes.
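
For illustration, a scalar source shape that can give that GIMPLE (a sketch
only; earlier passes may of course simplify it):

void
select_zero_one (int *out, _Bool *in, int n)
{
  for (int i = 0; i < n; i++)
    /* The 0 compared against the boolean needs a boolean-vector constant,
       while the selected 0 and 1 need an integer-vector constant.  */
    out[i] = in[i] != 0 ? 0 : 1;
}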

Ilya

>
> Richard.
>
>> Thanks,
>> Ilya


Re: [PATCH 7/7] Libsanitizer merge from upstream r249633.

2015-10-14 Thread Maxim Ostapenko

On 14/10/15 10:48, Jakub Jelinek wrote:

On Tue, Oct 13, 2015 at 02:22:36PM +0300, Maxim Ostapenko wrote:

This is the final patch. Force libsanitizer to use an old ABI for ubsan
float cast data descriptors, because for some exprs (e.g. that type of
tcc_declaration) we can't get the right location for now. I'm not sure about
this, perhaps it should be fixed in GCC somehow.

I don't like this (nor the heuristics on the libubsan side; it wouldn't be a
big deal to add a new library entrypoint).
If because of the heuristics you need to ensure that the SourceLocation is
always known, then either you check in ubsan.c whether expand_location
gives you NULL xloc.file and in that case use old style float cast overflow
(without location) - i.e. pass 0, NULL, otherwise you use new style, i.e.
pass 1, &loc.  Or arrange through some special option to emit something like
{ "", 0, 0 } instead of { NULL, 0, 0 } for the float cast case.
And, regardless of this, any progress in making sure we have fewer cases
with UNKNOWN_LOCATION on this will not hurt.  I think at this point I'd
prefer the first choice, i.e. using old style for locations without
filename, and new style otherwise.


2015-10-13  Maxim Ostapenko  

* ubsan/ubsan_handlers.cc (looksLikeFloatCastOverflowDataV1): Always
return true for now.

Index: libsanitizer/ubsan/ubsan_handlers.cc
===
--- libsanitizer/ubsan/ubsan_handlers.cc(revision 250059)
+++ libsanitizer/ubsan/ubsan_handlers.cc(working copy)
@@ -307,6 +307,9 @@
  }
  
  static bool looksLikeFloatCastOverflowDataV1(void *Data) {

+  // (TODO): propagate SourceLocation into DataDescriptor and use this
+  // heuristic then.
+  return true;
// First field is either a pointer to filename or a pointer to a
// TypeDescriptor.
u8 *FilenameOrTypeDescriptor;


Jakub



Ok, got it. The first solution would require changes in libsanitizer
because the heuristic doesn't work for GCC, so perhaps a new UBSan entry point
should go upstream, right? Or could this be implemented as a local patch for
GCC?


BTW, I actually saw UNKNOWN_LOCATION for this expr:

volatile double var;  // this is a tcc_declaration, so we have
UNKNOWN_LOCATION for it.


I wonder if we need to emit __ubsan_handle_float_cast_overflow here at all.


Re: Move some bit and binary optimizations in simplify and match

2015-10-14 Thread Richard Biener
On Wed, Oct 14, 2015 at 12:45 PM, Marc Glisse  wrote:
> On Wed, 14 Oct 2015, Richard Biener wrote:
>
>>> +/* Fold (a * (1 << b)) into (a << b)  */
>>> +(simplify
>>> + (mult:c @0 (convert? (lshift integer_onep@1 @2)))
>>> +  (if (! FLOAT_TYPE_P (type)
>>> +&& tree_nop_conversion_p (type, TREE_TYPE (@2)))
>>> +   (lshift @0 (convert @2
>>>
>>> You don't need/want to convert @2 (fold-const doesn't convert, does it?),
>>> and you don't need to check for tree_nop_conversion_p.
>>
>>
>> I think for long x and x * (long)(1u << b) you need to do because the
>> result for b == 33 would be different.
>
>
> - that check should be with TREE_TYPE (@1)

of course

> - 1u << 33 is undefined, isn't it?

Is it?  I thought it was fine for unsigned.  Not sure if we should exploit this
undefinedness here.  Btw, if it were a truncating conversion then the
resulting shift could be invalid if @2 is too big (but not too big for the
wider shift).  So either way I think we should only allow nop conversions
here (as fold-const.c did).

Richard.

> x * (int)(1ul << b), which for b=33 should yield 0, would give the undefined
> x << b so some check does seem needed indeed.
>
> --
> Marc Glisse


Re: [PATCH 7/7] Libsanitizer merge from upstream r249633.

2015-10-14 Thread Jakub Jelinek
On Wed, Oct 14, 2015 at 01:51:44PM +0300, Maxim Ostapenko wrote:
> Ok, got it. The first solution would require changes in libsanitizer because
> heuristic doesn't work for GCC, so perhaps new UBSan entry point should go
> upstream, right? Or this may be implemented as local patch for GCC?

No.  The heuristics relies on:
1) either it is old style float cast overflow without location
2) or it is new style float cast with location, but the location must:
   a) not have NULL filename
   b) the filename must not be ""
   c) the filename must not be "\1"
So, my proposal was to emit in GCC the old style float cast overflow if a), b)
or c) would be violated, and the new style otherwise.  I have no idea what you
mean by the heuristic not working for GCC after that.
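
For illustration, a stand-alone sketch of that decision (names invented;
only the logic is meant, i.e. use the new style only when the location
carries a usable filename):

#include <stdbool.h>
#include <stddef.h>

struct xloc { const char *file; int line, column; };

static bool
use_new_style_float_cast_data (const struct xloc *loc)
{
  return loc->file != NULL
         && loc->file[0] != '\0'
         && loc->file[0] != '\1';
}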

> BTW, I actually saw UNKNOWN_LOCATION for this expr:
> 
> volatile double var;  // this is tcc_decaration, so we have UNKNOWN_LOCATION
> for it.

This is not a complete testcase, so I wonder what exactly you are talking
about.  The above doesn't generate any
__ubsan_handle_float_cast_overflow calls with
-fsanitize=float-cast-overflow, and
volatile double d;
int bar (void) { return d; }
has location.

Jakub


Re: [vec-cmp, patch 2/6] Vectorization factor computation

2015-10-14 Thread Ilya Enkovich
2015-10-13 16:37 GMT+03:00 Richard Biener :
> On Thu, Oct 8, 2015 at 4:59 PM, Ilya Enkovich  wrote:
>> Hi,
>>
>> This patch handles statements with boolean result in vectorization factor 
>> computation.  For comparison its operands' type is used instead of result
>> type to compute VF.  Other boolean statements are ignored for VF.
>>
>> Vectype for comparison is computed using type of compared values.  Computed 
>> type is propagated into other boolean operations.
>
> This feels rather ad-hoc, mixing up the existing way of computing
> vector type and VF.  I'd rather have turned the whole
> vector type computation around to the scheme working on the operands
> rather than on the lhs and then searching
> for smaller/larger types on the rhs'.
>
> I know this is a tricky function (heh, but you make it even worse...).
> And it needs a helper with knowledge about operations
> so one can compute the result vector type for an operation on its
> operands.  The seeds should be PHIs (handled like now)
> and loads, and yes, externals need special handling.
>
> Ideally we'd do things in two stages, first compute vector types in a
> less constrained manner (not forcing a single vector size)
> and then in a 2nd run promote to a common size also computing the VF to do 
> that.

This sounds like a refactoring, not a functional change, right? Also I
don't see a reason to analyze DF to compute vectypes if we promote it
to a single vector size anyway. For booleans we have to do it because
boolean vectors of the same size may have different number of
elements. What is the reason to do it for other types?

Shouldn't it be a patch independent from comparison vectorization series?

>
> Btw, I think you "mishandle" bool b = boolvar != 0;

This should be handled fine. The statement will inherit the vectype of the
'boolvar' definition. If it's invariant - then yes, the invariant boolean
statement case is not handled. But this is only because I supposed we
just shouldn't have such statements in a loop. If we may have them,
then using a 'vector _Bool (VF)' type for that should be OK.

Ilya

>
> Richard.
>


Re: Move some bit and binary optimizations in simplify and match

2015-10-14 Thread Marc Glisse

On Wed, 14 Oct 2015, Richard Biener wrote:


On Wed, Oct 14, 2015 at 12:45 PM, Marc Glisse  wrote:

On Wed, 14 Oct 2015, Richard Biener wrote:


+/* Fold (a * (1 << b)) into (a << b)  */
+(simplify
+ (mult:c @0 (convert? (lshift integer_onep@1 @2)))
+  (if (! FLOAT_TYPE_P (type)
+&& tree_nop_conversion_p (type, TREE_TYPE (@2)))
+   (lshift @0 (convert @2

You don't need/want to convert @2 (fold-const doesn't convert, does it?),
and you don't need to check for tree_nop_conversion_p.



I think for long x and x * (long)(1u << b) you need to do because the
result for b == 33 would be different.



- that check should be with TREE_TYPE (@1)


of course


- 1u << 33 is undefined, isn't it?


Is it?  I thought it were fine for unsigned.


Can't be, Intel thinks it is 2u while some other hardware thinks it is 0.


Not sure if we should exploit this
undefinedness here.  Btw, if it were a truncating conversion then the
resulting shift could be invalid if @2 is too big (but not too big for the
wider shift).


Yes, that was my example below.


So either way I think we should only allow nop conversions
here (as fold-const.c did).


I agree that's the safest / easiest for now.


x * (int)(1ul << b), which for b=33 should yield 0, would give the undefined
x << b so some check does seem needed indeed.


--
Marc Glisse


Re: [gomp4] privatize internal array variables introduced by the fortran FE

2015-10-14 Thread Paul Richard Thomas
Dear Cesar,

>
> Is there any reason why only certain arrays have array descriptors? The
> arrays with descriptors don't have this problem. It's only the ones
> without descriptors that leak new internal variables that cause errors
> with default(none).
>

This is an obvious question to which there is no obvious answer. When I
asked one of the originators of gfortran, I was told that they tried but
got into some unspecified mess.

I would add the question as to why characters and scalars do not have
descriptors as well? One day, the volunteer maintainers will have
sorted out enough of the PRs to turn to issues like this. However,
simplification of this kind is just not on the cards at present.

Cheers

Paul


Re: [PATCH 7/7] Libsanitizer merge from upstream r249633.

2015-10-14 Thread Maxim Ostapenko

On 14/10/15 14:06, Jakub Jelinek wrote:

On Wed, Oct 14, 2015 at 01:51:44PM +0300, Maxim Ostapenko wrote:

Ok, got it. The first solution would require changes in libsanitizer because
heuristic doesn't work for GCC, so perhaps new UBSan entry point should go
upstream, right? Or this may be implemented as local patch for GCC?

No.  The heuristics relies on:
1) either it is old style float cast overflow without location
2) or it is new style float cast with location, but the location must:
a) not have NULL filename
b) the filename must not be ""
c) the filename must not be "\1"
So, my proposal was to emit in GCC the old style float cast overflow if a), b) 
or
c) is true, otherwise the new style.  I have no idea what you mean by
heuristic doesn't work for GCC after that.


I mean that there are some cases where (FilenameOrTypeDescriptor[0] +
FilenameOrTypeDescriptor[1] < 2) is not sufficient to determine whether we
should use the old style. I actually caught this with the
float-cast-overflow-10.c testcase. Here:


$ /home/max/build/master-ref/gcc/xgcc -B/home/max/build/master-ref/gcc/ 
/home/max/workspace/downloads/svn/trunk/gcc/testsuite/c-c++-common/ubsan/float-cast-overflow-10.c 
-B/home/max/build/master-ref/x86_64-unknown-linux-gnu/./libsanitizer/ 
-B/home/max/build/master-ref/x86_64-unknown-linux-gnu/./libsanitizer/ubsan/ 
-L/home/max/build/master-ref/x86_64-unknown-linux-gnu/./libsanitizer/ubsan/.libs 
-fno-diagnostics-show-caret -fdiagnostics-color=never -O2 
-fsanitize=float-cast-overflow -fsanitize-recover=float-cast-overflow 
-DUSE_INT128 -DUSE_DFP -DBROKEN_DECIMAL_INT128 -lm -o 
./float-cast-overflow-10.s -S


$ cat float-cast-overflow-10.s

cvt_sc_d32:
.LFB0:
.cfi_startproc
pushq   %rbx
..
.L6:
movl%ebx, %esi
movl$.Lubsan_data0, %edi
call__ubsan_handle_float_cast_overflow
...
.Lubsan_data0:
.quad   .Lubsan_type1
.quad   .Lubsan_type0
.align 2
.type   .Lubsan_type1, @object
.size   .Lubsan_type1, 17
.Lubsan_type1:
.value  -1  // <- TypeKind
.value  32
.string "'_Decimal32'"
.align 2
.type   .Lubsan_type0, @object
.size   .Lubsan_type0, 18
.Lubsan_type0:
.value  0 // <- TypeKind
.value  7
.string "'signed char'"
.section.rodata.cst4,"aM",@progbits,4
.align 4

Here, one can see that we have FilenameOrTypeDescriptor[0] == -1 and
FilenameOrTypeDescriptor[1] == 0. So we end up with the wrong decision and
get a SEGV later.

BTW, I actually saw UNKNOWN_LOCATION for this expr:

volatile double var;  // this is tcc_decaration, so we have UNKNOWN_LOCATION
for it.

This is not a complete testcase, so I wonder what exactly you are talking
about.  The above doesn't generate any
__ubsan_handle_float_cast_overflow calls with
-fsanitize=float-cast-overflow, and
volatile double d;
int bar (void) { return d; }
has location.

Jakub





Re: libgomp testsuite: Remove some explicit acc_device_nvidia usage

2015-10-14 Thread Bernd Schmidt

On 10/09/2015 05:11 PM, Thomas Schwinge wrote:

On Wed, 22 Jul 2015 16:39:54 +0200, I wrote:

[...] cleanup; committed to
gomp-4_0-branch in r226072: [...]


OK for trunk?


I think all three patches here look OK.


Bernd



Re: [vec-cmp, patch 3/6] Vectorize comparison

2015-10-14 Thread Ilya Enkovich
2015-10-13 16:45 GMT+03:00 Richard Biener :
> On Thu, Oct 8, 2015 at 5:03 PM, Ilya Enkovich  wrote:
>> Hi,
>>
>> This patch supports comparison statement vectorization based on the
>> introduced optabs.
>>
>> Thanks,
>> Ilya
>> --
>> gcc/
>>
>> 2015-10-08  Ilya Enkovich  
>>
>> * tree-vect-data-refs.c (vect_get_new_vect_var): Support 
>> vect_mask_var.
>> (vect_create_destination_var): Likewise.
>> * tree-vect-stmts.c (vectorizable_comparison): New.
>> (vect_analyze_stmt): Add vectorizable_comparison.
>> (vect_transform_stmt): Likewise.
>> * tree-vectorizer.h (enum vect_var_kind): Add vect_mask_var.
>> (enum stmt_vec_info_type): Add comparison_vec_info_type.
>> (vectorizable_comparison): New.
>>
>>
>> diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c
>> index 3befa38..9edc663 100644
>> --- a/gcc/tree-vect-data-refs.c
>> +++ b/gcc/tree-vect-data-refs.c
>> @@ -3849,6 +3849,9 @@ vect_get_new_vect_var (tree type, enum vect_var_kind 
>> var_kind, const char *name)
>>case vect_scalar_var:
>>  prefix = "stmp";
>>  break;
>> +  case vect_mask_var:
>> +prefix = "mask";
>> +break;
>>case vect_pointer_var:
>>  prefix = "vectp";
>>  break;
>> @@ -4403,7 +4406,11 @@ vect_create_destination_var (tree scalar_dest, tree 
>> vectype)
>>tree type;
>>enum vect_var_kind kind;
>>
>> -  kind = vectype ? vect_simple_var : vect_scalar_var;
>> +  kind = vectype
>> +? VECTOR_BOOLEAN_TYPE_P (vectype)
>> +? vect_mask_var
>> +: vect_simple_var
>> +: vect_scalar_var;
>>type = vectype ? vectype : TREE_TYPE (scalar_dest);
>>
>>gcc_assert (TREE_CODE (scalar_dest) == SSA_NAME);
>> diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
>> index 8eda8e9..6949c71 100644
>> --- a/gcc/tree-vect-stmts.c
>> +++ b/gcc/tree-vect-stmts.c
>> @@ -7525,6 +7525,211 @@ vectorizable_condition (gimple *stmt, 
>> gimple_stmt_iterator *gsi,
>>return true;
>>  }
>>
>> +/* vectorizable_comparison.
>> +
>> +   Check if STMT is comparison expression that can be vectorized.
>> +   If VEC_STMT is also passed, vectorize the STMT: create a vectorized
>> +   comparison, put it in VEC_STMT, and insert it at GSI.
>> +
>> +   Return FALSE if not a vectorizable STMT, TRUE otherwise.  */
>> +
>> +bool
>> +vectorizable_comparison (gimple *stmt, gimple_stmt_iterator *gsi,
>> +gimple **vec_stmt, tree reduc_def,
>> +slp_tree slp_node)
>> +{
>> +  tree lhs, rhs1, rhs2;
>> +  stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
>> +  tree vectype1 = NULL_TREE, vectype2 = NULL_TREE;
>> +  tree vectype = STMT_VINFO_VECTYPE (stmt_info);
>> +  tree vec_rhs1 = NULL_TREE, vec_rhs2 = NULL_TREE;
>> +  tree vec_compare;
>> +  tree new_temp;
>> +  loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
>> +  tree def;
>> +  enum vect_def_type dt, dts[4];
>> +  unsigned nunits;
>> +  int ncopies;
>> +  enum tree_code code;
>> +  stmt_vec_info prev_stmt_info = NULL;
>> +  int i, j;
>> +  bb_vec_info bb_vinfo = STMT_VINFO_BB_VINFO (stmt_info);
>> +  vec vec_oprnds0 = vNULL;
>> +  vec vec_oprnds1 = vNULL;
>> +  tree mask_type;
>> +  tree mask;
>> +
>> +  if (!VECTOR_BOOLEAN_TYPE_P (vectype))
>> +return false;
>> +
>> +  mask_type = vectype;
>> +  nunits = TYPE_VECTOR_SUBPARTS (vectype);
>> +
>> +  if (slp_node || PURE_SLP_STMT (stmt_info))
>> +ncopies = 1;
>> +  else
>> +ncopies = LOOP_VINFO_VECT_FACTOR (loop_vinfo) / nunits;
>> +
>> +  gcc_assert (ncopies >= 1);
>> +  if (!STMT_VINFO_RELEVANT_P (stmt_info) && !bb_vinfo)
>> +return false;
>> +
>> +  if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_internal_def
>> +  && !(STMT_VINFO_DEF_TYPE (stmt_info) == vect_nested_cycle
>> +  && reduc_def))
>> +return false;
>> +
>> +  if (STMT_VINFO_LIVE_P (stmt_info))
>> +{
>> +  if (dump_enabled_p ())
>> +   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
>> +"value used after loop.\n");
>> +  return false;
>> +}
>> +
>> +  if (!is_gimple_assign (stmt))
>> +return false;
>> +
>> +  code = gimple_assign_rhs_code (stmt);
>> +
>> +  if (TREE_CODE_CLASS (code) != tcc_comparison)
>> +return false;
>> +
>> +  rhs1 = gimple_assign_rhs1 (stmt);
>> +  rhs2 = gimple_assign_rhs2 (stmt);
>> +
>> +  if (TREE_CODE (rhs1) == SSA_NAME)
>> +{
>> +  gimple *rhs1_def_stmt = SSA_NAME_DEF_STMT (rhs1);
>> +  if (!vect_is_simple_use_1 (rhs1, stmt, loop_vinfo, bb_vinfo,
>> +&rhs1_def_stmt, &def, &dt, &vectype1))
>> +   return false;
>> +}
>> +  else if (TREE_CODE (rhs1) != INTEGER_CST && TREE_CODE (rhs1) != REAL_CST
>> +  && TREE_CODE (rhs1) != FIXED_CST)
>> +return false;
>
> I think vect_is_simple_use_1 handles constants just fine and def_stmt
> is an output,
> you don't need to initialize it.

OK

>
>> +
>> +  if (TREE_CODE (rhs2) == SSA_NAME)
>> +{
>> +  gimple *rhs2_d

Re: [PR libgomp/65437, libgomp/66518] Initialize runtime in acc_update_device, acc_update_self

2015-10-14 Thread Bernd Schmidt

On 10/09/2015 05:14 PM, Thomas Schwinge wrote:

Hi!

On Fri, 19 Jun 2015 09:47:41 +0200, I wrote:

On Tue, 5 May 2015 11:43:20 +0200, I wrote:

On Mon, 4 May 2015 10:20:14 -0400, John David Anglin  
wrote:

FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/lib-42.c
-DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 output pattern test, is , should
match \[[0-9a-fA-FxX]+,256\] is not mapped



In r224639, now (at least, and at last...) XFAILed


OK to commit?


Ok.


Bernd



Re: [PATCH 7/7] Libsanitizer merge from upstream r249633.

2015-10-14 Thread Jakub Jelinek
On Wed, Oct 14, 2015 at 03:02:22PM +0300, Maxim Ostapenko wrote:
> On 14/10/15 14:06, Jakub Jelinek wrote:
> >On Wed, Oct 14, 2015 at 01:51:44PM +0300, Maxim Ostapenko wrote:
> >>Ok, got it. The first solution would require changes in libsanitizer because
> >>heuristic doesn't work for GCC, so perhaps new UBSan entry point should go
> >>upstream, right? Or this may be implemented as local patch for GCC?
> >No.  The heuristics relies on:
> >1) either it is old style float cast overflow without location
> >2) or it is new style float cast with location, but the location must:
> >a) not have NULL filename
> >b) the filename must not be ""
> >c) the filename must not be "\1"
> >So, my proposal was to emit in GCC the old style float cast overflow if a), 
> >b) or
> >c) is true, otherwise the new style.  I have no idea what you mean by
> >heuristic doesn't work for GCC after that.
> 
> I mean that there are some cases where (FilenameOrTypeDescriptor[0] +
> FilenameOrTypeDescriptor[1] < 2) is not sufficient to determine if we should
> use old style. I actually caught this on float-cast-overflow-10.c testcase.

Ah, ok, in that case the heuristics is flawed.  If they want to keep it,
they should check if MaybeFromTypeKind is either < 2 or equal to 0x1fe.
Can you report it upstream?  If that is changed, we'd need to change the
above and also add
  d) the filename must not start with "\xff\xff"
to the rules.

I think it would be better to just add a whole new entrypoint, but if they
think the heuristics is good enough, they should at least fix it up.
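
For reference, a stand-alone sketch of the fixed discrimination check
described above (not the actual libubsan code; names follow the snippet
quoted earlier in the thread):

#include <stdbool.h>

typedef unsigned char u8;

/* Old-style data starts with a pointer to a TypeDescriptor whose 16-bit
   TypeKind is currently 0, 1 or 0xffff (the _Decimal32 case above);
   new-style data starts with a pointer to a filename.  */
static bool
looks_like_old_style (const u8 *FilenameOrTypeDescriptor)
{
  unsigned MaybeFromTypeKind
    = FilenameOrTypeDescriptor[0] + FilenameOrTypeDescriptor[1];
  return MaybeFromTypeKind < 2 || MaybeFromTypeKind == 0x1fe;
}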

Jakub


Re: [PATCH] Optimize const1 * copysign (const2, y) in reassoc (PR tree-optimization/67815)

2015-10-14 Thread Marek Polacek
On Wed, Oct 14, 2015 at 11:10:38AM +0200, Richard Biener wrote:
> > +  FOR_EACH_VEC_ELT (*ops, i, oe)
> > +{
> > +  if (TREE_CODE (oe->op) == SSA_NAME)
> 
> I think you need to check whether the SSA_NAME has a single use only
> as you are changing its value.  Which also means you shouldn't be
> "reusing" it (because existing debug stmts will then be wrong).
> Thus you have to replace it.
 
Changed as per our discussion on IRC.  I'm building a new call while
the old one is going to be cleaned up by subsequent DCE.

> > + ops->ordered_remove (i);
> > + add_to_ops_vec (ops, negrhs);
> 
> Why use ordered remove and add_to_ops_vec here?  Just replace the entry?

Fixed.

I also added a new test.
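
For reference, a scalar illustration of the transform (a sketch based on the
f0 case from the new reassoc-39.c test; f0_after just shows the intended
result):

float
f0_before (float x)
{
  return 7.5f * __builtin_copysignf (2.0f, x);
}

float
f0_after (float x)
{
  return __builtin_copysignf (15.0f, x);  /* 7.5 * 2.0, CST1 > 0 */
}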

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2015-10-14  Marek Polacek  

PR tree-optimization/67815
* tree-ssa-reassoc.c (attempt_builtin_copysign): New function.
(reassociate_bb): Call it.

* gcc.dg/tree-ssa/reassoc-39.c: New test.
* gcc.dg/tree-ssa/reassoc-40.c: New test.
* gcc.dg/tree-ssa/reassoc-41.c: New test.

diff --git gcc/testsuite/gcc.dg/tree-ssa/reassoc-39.c 
gcc/testsuite/gcc.dg/tree-ssa/reassoc-39.c
index e69de29..589d06b 100644
--- gcc/testsuite/gcc.dg/tree-ssa/reassoc-39.c
+++ gcc/testsuite/gcc.dg/tree-ssa/reassoc-39.c
@@ -0,0 +1,41 @@
+/* PR tree-optimization/67815 */
+/* { dg-do compile } */
+/* { dg-options "-Ofast -fdump-tree-reassoc1-details" } */
+
+float
+f0 (float x)
+{
+  return 7.5 * __builtin_copysignf (2.0, x);
+}
+
+float
+f1 (float x)
+{
+  return -7.5 * __builtin_copysignf (2.0, x);
+}
+
+double
+f2 (double x, double y)
+{
+  return x * ((1.0/12) * __builtin_copysign (1.0, y));
+}
+
+double
+f3 (double x, double y)
+{
+  return (x * (-1.0/12)) * __builtin_copysign (1.0, y);
+}
+
+double
+f4 (double x, double y, double z)
+{
+  return (x * z) * ((1.0/12) * __builtin_copysign (4.0, y));
+}
+
+double
+f5 (double x, double y, double z)
+{
+  return (x * (-1.0/12)) * z * __builtin_copysign (2.0, y);
+}
+
+/* { dg-final { scan-tree-dump-times "Optimizing copysign" 6 "reassoc1"} }*/
diff --git gcc/testsuite/gcc.dg/tree-ssa/reassoc-40.c 
gcc/testsuite/gcc.dg/tree-ssa/reassoc-40.c
index e69de29..d65bcc1b 100644
--- gcc/testsuite/gcc.dg/tree-ssa/reassoc-40.c
+++ gcc/testsuite/gcc.dg/tree-ssa/reassoc-40.c
@@ -0,0 +1,21 @@
+/* PR tree-optimization/67815 */
+/* { dg-do compile } */
+/* { dg-options "-Ofast -frounding-math -fdump-tree-reassoc1-details" } */
+
+/* Test that the copysign reassoc optimization doesn't fire for
+   -frounding-math (i.e. HONOR_SIGN_DEPENDENT_ROUNDING) if the multiplication
+   is inexact.  */
+
+double
+f1 (double y)
+{
+  return (1.2 * __builtin_copysign (1.1, y));
+}
+
+double
+f2 (double y)
+{
+  return (-1.2 * __builtin_copysign (1.1, y));
+}
+
+/* { dg-final { scan-tree-dump-not "Optimizing copysign" "reassoc1" } } */
diff --git gcc/testsuite/gcc.dg/tree-ssa/reassoc-41.c 
gcc/testsuite/gcc.dg/tree-ssa/reassoc-41.c
index e69de29..8a18b88 100644
--- gcc/testsuite/gcc.dg/tree-ssa/reassoc-41.c
+++ gcc/testsuite/gcc.dg/tree-ssa/reassoc-41.c
@@ -0,0 +1,21 @@
+/* PR tree-optimization/67815 */
+/* { dg-do compile } */
+/* { dg-options "-Ofast -fno-rounding-math -fdump-tree-reassoc1-details" } */
+
+/* Test that the copysign reassoc optimization does fire for
+   -fno-rounding-math (i.e. HONOR_SIGN_DEPENDENT_ROUNDING) if the 
multiplication
+   is inexact.  */
+
+double
+f1 (double y)
+{
+  return (1.2 * __builtin_copysign (1.1, y));
+}
+
+double
+f2 (double y)
+{
+  return (-1.2 * __builtin_copysign (1.1, y));
+}
+
+/* { dg-final { scan-tree-dump-times "Optimizing copysign" 2 "reassoc1"} }*/
diff --git gcc/tree-ssa-reassoc.c gcc/tree-ssa-reassoc.c
index 879722e..62438dd 100644
--- gcc/tree-ssa-reassoc.c
+++ gcc/tree-ssa-reassoc.c
@@ -4622,6 +4622,102 @@ attempt_builtin_powi (gimple *stmt, vec *ops)
   return result;
 }
 
+/* Attempt to optimize
+   CST1 * copysign (CST2, y) -> copysign (CST1 * CST2, y) if CST1 > 0, or
+   CST1 * copysign (CST2, y) -> -copysign (CST1 * CST2, y) if CST1 < 0.  */
+
+static void
+attempt_builtin_copysign (vec *ops)
+{
+  operand_entry *oe;
+  unsigned int i;
+  unsigned int length = ops->length ();
+  tree cst = ops->last ()->op;
+
+  if (length == 1 || TREE_CODE (cst) != REAL_CST)
+return;
+
+  FOR_EACH_VEC_ELT (*ops, i, oe)
+{
+  if (TREE_CODE (oe->op) == SSA_NAME
+ && has_single_use (oe->op))
+   {
+ gimple *def_stmt = SSA_NAME_DEF_STMT (oe->op);
+ if (is_gimple_call (def_stmt))
+   {
+ tree fndecl = gimple_call_fndecl (def_stmt);
+ tree arg0, arg1;
+ switch (DECL_FUNCTION_CODE (fndecl))
+   {
+   CASE_FLT_FN (BUILT_IN_COPYSIGN):
+ arg0 = gimple_call_arg (def_stmt, 0);
+ arg1 = gimple_call_arg (def_stmt, 1);
+ /* The first argument of copysign must be a constant,
+other

[PATCH][ARM] PR target/67929 Tighten vfp3_const_double_for_bits checks

2015-10-14 Thread Kyrill Tkachov

Hi all,

This patch fixes the referenced PR by rewriting the vfp3_const_double_for_bits 
function in arm.c
The function is supposed to accept positive CONST_DOUBLE rtxes whose value is 
an exact power of 2
and whose log2 is between 1 and 32. That is, values like 2.0, 4.0, 8.0, 16.0,
etc.

The current implementation seems to have been written under the assumption that 
exact_real_truncate returns
false if the input value is not an exact integer, whereas in fact 
exact_real_truncate returns false if the
truncation operation was not exact, which are different things. This would lead 
the function to accept any
CONST_DOUBLE that can truncate to a power of 2, such as 4.9, 16.2 etc.
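
A stand-alone illustration of the difference (a sketch in plain C, not using
the real.c API; the function names are made up):

#include <math.h>
#include <stdbool.h>

/* Mimics the buggy idea: it only checks that the *truncated* value is a
   power of two, so 4.9 (which truncates to 4) is accepted.  */
bool
truncation_based_check (double d)
{
  long long v = (long long) d & 0xffffffff;  /* assumes d fits in long long */
  return v != 0 && (v & (v - 1)) == 0;
}

/* Mimics the fixed logic: the value must be an exact positive integer
   power of two with log2 in [1, 32].  */
bool
integer_power_of_two_check (double d)
{
  if (!(d >= 2.0) || d > 4294967296.0 || d != floor (d))
    return false;
  long long v = (long long) d;
  return (v & (v - 1)) == 0;
}

/* truncation_based_check (4.9) is true while integer_power_of_two_check (4.9)
   is false - which is why foo () in the new test must not be compiled as a
   fixed-point vcvt.  */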

In any case, I've rewritten this function and used the real_isinteger predicate 
to check if the real value
is an exact integer.

The testcase demonstrates the kind of wrong code that this patch addresses.

This bug appears on GCC 5 and 4.9 as well, but due to the recent introduction 
of CONST_DOUBLE_REAL_VALUE
this patch doesn't apply on those branches. I will soon post the backportable 
variant.

Bootstrapped and tested on arm-none-linux-gnueabihf.

Ok for trunk?

Thanks,
Kyrill

2015-10-12  Kyrylo Tkachov  

PR target/67929
* config/arm/arm.c (vfp3_const_double_for_bits): Rewrite.
* config/arm/constraints.md (Dp): Update callsite.
* config/arm/predicates.md (const_double_vcvt_power_of_two): Likewise.

2015-10-12  Kyrylo Tkachov  

PR target/67929
* gcc.target/arm/pr67929_1.c: New test.
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 0bf1164..29dd489 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -27734,25 +27734,37 @@ vfp3_const_double_for_fract_bits (rtx operand)
   return 0;
 }
 
+/* If X is a CONST_DOUBLE with a value that is a power of 2 whose
+   log2 is in [1, 32], return that log2.  Otherwise return -1.
+   This is used in the patterns for vcvt.s32.f32 floating-point to
+   fixed-point conversions.  */
+
 int
-vfp3_const_double_for_bits (rtx operand)
+vfp3_const_double_for_bits (rtx x)
 {
-  const REAL_VALUE_TYPE *r0;
+  const REAL_VALUE_TYPE *r;
 
-  if (!CONST_DOUBLE_P (operand))
-return 0;
+  if (!CONST_DOUBLE_P (x))
+return -1;
 
-  r0 = CONST_DOUBLE_REAL_VALUE (operand);
-  if (exact_real_truncate (DFmode, r0))
-{
-  HOST_WIDE_INT value = real_to_integer (r0);
-  value = value & 0x;
-  if ((value != 0) && ( (value & (value - 1)) == 0))
-	return int_log2 (value);
-}
+  r = CONST_DOUBLE_REAL_VALUE (x);
 
-  return 0;
+  if (REAL_VALUE_NEGATIVE (*r)
+  || REAL_VALUE_ISNAN (*r)
+  || REAL_VALUE_ISINF (*r)
+  || !real_isinteger (r, SFmode))
+return -1;
+
+  HOST_WIDE_INT hwint = exact_log2 (real_to_integer (r));
+
+/* The exact_log2 above will have returned -1 if this is
+   not an exact log2.  */
+  if (!IN_RANGE (hwint, 1, 32))
+return -1;
+
+  return hwint;
 }
+
 
 /* Emit a memory barrier around an atomic sequence according to MODEL.  */
 
diff --git a/gcc/config/arm/constraints.md b/gcc/config/arm/constraints.md
index e24858f..901cfe5 100644
--- a/gcc/config/arm/constraints.md
+++ b/gcc/config/arm/constraints.md
@@ -339,7 +339,8 @@
  "@internal
   In ARM/ Thumb2 a const_double which can be used with a vcvt.s32.f32 with bits operation"
   (and (match_code "const_double")
-   (match_test "TARGET_32BIT && TARGET_VFP && vfp3_const_double_for_bits (op)")))
+   (match_test "TARGET_32BIT && TARGET_VFP
+		&& vfp3_const_double_for_bits (op) > 0")))
 
 (define_register_constraint "Ts" "(arm_restrict_it) ? LO_REGS : GENERAL_REGS"
  "For arm_restrict_it the core registers @code{r0}-@code{r7}.  GENERAL_REGS otherwise.")
diff --git a/gcc/config/arm/predicates.md b/gcc/config/arm/predicates.md
index 08cc899..48e4ba8 100644
--- a/gcc/config/arm/predicates.md
+++ b/gcc/config/arm/predicates.md
@@ -668,7 +668,7 @@
 (define_predicate "const_double_vcvt_power_of_two"
   (and (match_code "const_double")
(match_test "TARGET_32BIT && TARGET_VFP
-   && vfp3_const_double_for_bits (op)")))
+		&& vfp3_const_double_for_bits (op) > 0")))
 
 (define_predicate "neon_struct_operand"
   (and (match_code "mem")
diff --git a/gcc/testsuite/gcc.target/arm/pr67929_1.c b/gcc/testsuite/gcc.target/arm/pr67929_1.c
new file mode 100644
index 000..14943b6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/pr67929_1.c
@@ -0,0 +1,21 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_vfp3_ok } */
+/* { dg-options "-O2 -fno-inline" } */
+/* { dg-add-options arm_vfp3 } */
+/* { dg-skip-if "need fp instructions" { *-*-* } { "-mfloat-abi=soft" } { "" } } */
+
+int
+foo (float a)
+{
+  return a * 4.9f;
+}
+
+
+int
+main (void)
+{
+  if (foo (10.0f) != 49)
+__builtin_abort ();
+
+  return 0;
+}
\ No newline at end of file


[PATCH][ARM][4.9/5 Backport] PR target/67929 Tighten vfp3_const_double_for_bits checks

2015-10-14 Thread Kyrill Tkachov

Hi all,

This is the 4.9 and GCC 5 version of the patch I posted earlier to fix the 
referenced PR.
Bootstrapped and tested arm-none-linux-gnueabihf on those branches.

Ok for the branches?

Thanks,
Kyrill

2015-10-12  Kyrylo Tkachov  

PR target/67929
* config/arm/arm.c (vfp3_const_double_for_bits): Rewrite.
* config/arm/constraints.md (Dp): Update callsite.
* config/arm/predicates.md (const_double_vcvt_power_of_two): Likewise.

2015-10-12  Kyrylo Tkachov  

PR target/67929
* gcc.target/arm/pr67929_1.c: New test.
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index d87eca1..abf2dbb 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -27544,25 +27544,36 @@ vfp3_const_double_for_fract_bits (rtx operand)
   return 0;
 }
 
+/* If X is a CONST_DOUBLE with a value that is a power of 2 whose
+   log2 is in [1, 32], return that log2.  Otherwise return -1.
+   This is used in the patterns for vcvt.s32.f32 floating-point to
+   fixed-point conversions.  */
+
 int
-vfp3_const_double_for_bits (rtx operand)
+vfp3_const_double_for_bits (rtx x)
 {
-  REAL_VALUE_TYPE r0;
+  if (!CONST_DOUBLE_P (x))
+return -1;
 
-  if (!CONST_DOUBLE_P (operand))
-return 0;
+  REAL_VALUE_TYPE r;
 
-  REAL_VALUE_FROM_CONST_DOUBLE (r0, operand);
-  if (exact_real_truncate (DFmode, &r0))
-{
-  HOST_WIDE_INT value = real_to_integer (&r0);
-  value = value & 0x;
-  if ((value != 0) && ( (value & (value - 1)) == 0))
-	return int_log2 (value);
-}
+  REAL_VALUE_FROM_CONST_DOUBLE (r, x);
+  if (REAL_VALUE_NEGATIVE (r)
+  || REAL_VALUE_ISNAN (r)
+  || REAL_VALUE_ISINF (r)
+  || !real_isinteger (&r, SFmode))
+return -1;
 
-  return 0;
+  HOST_WIDE_INT hwint = exact_log2 (real_to_integer (&r));
+
+  /* The exact_log2 above will have returned -1 if this is
+ not an exact log2.  */
+  if (!IN_RANGE (hwint, 1, 32))
+return -1;
+
+  return hwint;
 }
+
 
 /* Emit a memory barrier around an atomic sequence according to MODEL.  */
 
diff --git a/gcc/config/arm/constraints.md b/gcc/config/arm/constraints.md
index f9e11e0..d7d0826 100644
--- a/gcc/config/arm/constraints.md
+++ b/gcc/config/arm/constraints.md
@@ -339,7 +339,8 @@
  "@internal
   In ARM/ Thumb2 a const_double which can be used with a vcvt.s32.f32 with bits operation"
   (and (match_code "const_double")
-   (match_test "TARGET_32BIT && TARGET_VFP && vfp3_const_double_for_bits (op)")))
+   (match_test "TARGET_32BIT && TARGET_VFP
+		&& vfp3_const_double_for_bits (op) > 0")))
 
 (define_register_constraint "Ts" "(arm_restrict_it) ? LO_REGS : GENERAL_REGS"
  "For arm_restrict_it the core registers @code{r0}-@code{r7}.  GENERAL_REGS otherwise.")
diff --git a/gcc/config/arm/predicates.md b/gcc/config/arm/predicates.md
index 08cc899..48e4ba8 100644
--- a/gcc/config/arm/predicates.md
+++ b/gcc/config/arm/predicates.md
@@ -668,7 +668,7 @@
 (define_predicate "const_double_vcvt_power_of_two"
   (and (match_code "const_double")
(match_test "TARGET_32BIT && TARGET_VFP
-   && vfp3_const_double_for_bits (op)")))
+		&& vfp3_const_double_for_bits (op) > 0")))
 
 (define_predicate "neon_struct_operand"
   (and (match_code "mem")
diff --git a/gcc/testsuite/gcc.target/arm/pr67929_1.c b/gcc/testsuite/gcc.target/arm/pr67929_1.c
new file mode 100644
index 000..14943b6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/pr67929_1.c
@@ -0,0 +1,21 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_vfp3_ok } */
+/* { dg-options "-O2 -fno-inline" } */
+/* { dg-add-options arm_vfp3 } */
+/* { dg-skip-if "need fp instructions" { *-*-* } { "-mfloat-abi=soft" } { "" } } */
+
+int
+foo (float a)
+{
+  return a * 4.9f;
+}
+
+
+int
+main (void)
+{
+  if (foo (10.0f) != 49)
+__builtin_abort ();
+
+  return 0;
+}
\ No newline at end of file


[PATCH][AArch64] Enable fusion of AES instructions

2015-10-14 Thread Wilco Dijkstra
Enable instruction fusion of dependent AESE; AESMC and AESD; AESIMC pairs. This 
can give up to 2x
speedup on many AArch64 implementations. Also model the crypto instructions on 
Cortex-A57 according
to the Optimization Guide.
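
For reference, a small intrinsics example that produces such a dependent pair
(a sketch; assumes an AArch64 compiler with the crypto extension enabled,
e.g. -march=armv8-a+crypto):

#include <arm_neon.h>

/* One AES round step: AESE followed by the dependent AESMC.  With the
   fusion enabled the two instructions can be scheduled and issued as a
   fused pair.  */
uint8x16_t
aes_round (uint8x16_t state, uint8x16_t key)
{
  return vaesmcq_u8 (vaeseq_u8 (state, key));
}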

Passes regression tests.

ChangeLog:
2015-10-14  Wilco Dijkstra  

* gcc/config/aarch64/aarch64.c (cortexa53_tunings): Add AES fusion.
(cortexa57_tunings): Likewise.
(cortexa72_tunings): Likewise.
(arch_macro_fusion_pair_p): Add support for AES fusion.
* gcc/config/aarch64/aarch64-fusion-pairs.def: Add AES_AESMC entry.
* gcc/config/arm/aarch-common.c (aarch_crypto_can_dual_issue):
Allow virtual registers before reload so early scheduling works.
* gcc/config/arm/cortex-a57.md (cortex_a57_crypto_simple): Use
correct latency and pipeline.
(cortex_a57_crypto_complex): Likewise.
(cortex_a57_crypto_xor): Likewise.
(define_bypass): Add AES bypass.


---
 gcc/config/aarch64/aarch64-fusion-pairs.def |  1 +
 gcc/config/aarch64/aarch64.c| 10 +++---
 gcc/config/arm/aarch-common.c   |  7 +--
 gcc/config/arm/cortex-a57.md| 17 +++--
 4 files changed, 24 insertions(+), 11 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-fusion-pairs.def
b/gcc/config/aarch64/aarch64-fusion-pairs.def
index 53bbef4..fea79fc 100644
--- a/gcc/config/aarch64/aarch64-fusion-pairs.def
+++ b/gcc/config/aarch64/aarch64-fusion-pairs.def
@@ -33,4 +33,5 @@ AARCH64_FUSION_PAIR ("adrp+add", ADRP_ADD)
 AARCH64_FUSION_PAIR ("movk+movk", MOVK_MOVK)
 AARCH64_FUSION_PAIR ("adrp+ldr", ADRP_LDR)
 AARCH64_FUSION_PAIR ("cmp+branch", CMP_BRANCH)
+AARCH64_FUSION_PAIR ("aes+aesmc", AES_AESMC)
 
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 230902d..96368c6 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -376,7 +376,7 @@ static const struct tune_params cortexa53_tunings =
   &generic_branch_cost,
   4, /* memmov_cost  */
   2, /* issue_rate  */
-  (AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD
+  (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD
| AARCH64_FUSE_MOVK_MOVK | AARCH64_FUSE_ADRP_LDR), /* fusible_ops  */
   8,   /* function_align.  */
   8,   /* jump_align.  */
@@ -398,7 +398,7 @@ static const struct tune_params cortexa57_tunings =
   &generic_branch_cost,
   4, /* memmov_cost  */
   3, /* issue_rate  */
-  (AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD
+  (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD
| AARCH64_FUSE_MOVK_MOVK), /* fusible_ops  */
   16,  /* function_align.  */
   8,   /* jump_align.  */
@@ -420,7 +420,7 @@ static const struct tune_params cortexa72_tunings =
   &generic_branch_cost,
   4, /* memmov_cost  */
   3, /* issue_rate  */
-  (AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD
+  (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD
| AARCH64_FUSE_MOVK_MOVK), /* fusible_ops  */
   16,  /* function_align.  */
   8,   /* jump_align.  */
@@ -12843,6 +12843,10 @@ aarch_macro_fusion_pair_p (rtx_insn *prev, rtx_insn 
*curr)
 }
 }
 
+  if ((aarch64_tune_params.fusible_ops & AARCH64_FUSE_AES_AESMC)
+   && aarch_crypto_can_dual_issue (prev, curr))
+return true;
+
   if ((aarch64_tune_params.fusible_ops & AARCH64_FUSE_CMP_BRANCH)
   && any_condjump_p (curr))
 {
diff --git a/gcc/config/arm/aarch-common.c b/gcc/config/arm/aarch-common.c
index 5dd8222..e191ab6 100644
--- a/gcc/config/arm/aarch-common.c
+++ b/gcc/config/arm/aarch-common.c
@@ -63,8 +63,11 @@ aarch_crypto_can_dual_issue (rtx_insn *producer_insn, 
rtx_insn *consumer_insn)
   {
 unsigned int regno = REGNO (SET_DEST (producer_set));
 
-return REGNO (SET_DEST (consumer_set)) == regno
-   && REGNO (XVECEXP (consumer_src, 0, 0)) == regno;
+/* Before reload the registers are virtual, so the destination of
+   consumer_set doesn't need to match.  */
+
+return (REGNO (SET_DEST (consumer_set)) == regno || !reload_completed)
+   && REGNO (XVECEXP (consumer_src, 0, 0)) == regno;
   }
 
   return 0;
diff --git a/gcc/config/arm/cortex-a57.md b/gcc/config/arm/cortex-a57.md
index a32c848..eab9d99 100644
--- a/gcc/config/arm/cortex-a57.md
+++ b/gcc/config/arm/cortex-a57.md
@@ -745,20 +745,20 @@
 neon_fp_sqrt_s_q, neon_fp_sqrt_d_q"))
   "ca57_cx2_block*3")
 
-(define_insn_reservation "cortex_a57_crypto_simple" 4
+(define_insn_reservation "cortex_a57_crypto_simple" 3
   (and (eq_attr "tune" "cortexa57")
(eq_attr "type" 
"crypto_aese,crypto_aesmc,crypto_sha1_fast,crypto_sha256_fast"))
-  "ca57_cx2")
+  "ca57_cx1")
 
-(define_insn_reservation "cortex_a57_crypto_complex" 7
+(define_insn_reservation "cortex_a57_crypto_complex" 6
   (and (eq_attr "tune" "cortexa57")
(eq_attr "type" "crypto_sha1_slow,crypto_sha256_slow"))
-  "ca57_cx2+(ca57_cx2_issue,ca57_cx2)")
+  "ca57_cx1*2")
 
-(

Re: [PATCH, VECTOR ABI] Add __attribute__((__simd__)) to GCC.

2015-10-14 Thread Kirill Yukhin
Hello,
On 07 Oct 11:09, Jeff Law wrote:
> On 10/05/2015 07:24 AM, Joseph Myers wrote:
> >On Mon, 5 Oct 2015, Kirill Yukhin wrote:
> >
> >>To enable vectorization of loops w/ calls to math functions it is reasonable
> >>to enable parsing of attribute vector for functions unconditionally and
> >>change GlibC's header file not to use `omp declare simd', but use
> >>__attribute__((vector)) instead.
> >
> >I assume you mean __vector__, for namespace reasons.  Obviously you need
> >appropriate GCC version conditionals in the headers to use the attribute
> >only when supported.  In addition, (a) this attribute doesn't seem to be
> >documented in extend.texi, and you'll need to include documentation in
> >your GCC patch that makes this a generic extension rather than just part
> >of Cilkplus, and (b) you'll need to agree with the x86_64 ABI mailing list
> >an extension of the ABI document (as attached to
> >) to cover this attribute, and
> >update the document there.
> I'm not sure why this attribute isn't documented, but clearly that should be
> fixed.
> 
> From the GCC side, I don't see a compelling reason to keep this attribute
> conditional on Cilk+ support.   One could very easily want to use the math
> library's vector entry points independent of OpenMP or Cilk+.
> 
> I thought the ABI for this stuff was consistent across the implementations
> (that was certainly the goal).  So aside from an example of how to use the
> attribute to get calls into the vector math library, I'm not sure what's
> needed.  Essentially the attribute is just another way to ensure we exploit
> the vector library when possible.
> 
> It also seems to me that showing that example on the libmvec page would be
> advisable.

The patch at the bottom introduces a new attribute called `simd'.
I've decided to introduce a new one since Cilk's `vector' attribute
generates only one version of the SIMD-enabled function [1] (which one is
implementation specific).

This new attribute shouldn't be used along with Cilk's `vector'.
If it is used with `#pragma omp declare simd', it is ignored.

Bootstrapped. New tests pass.

gcc/
* c/c-parser.c (c_parser): Add simd_attr_present flag.
(c_parser_declaration_or_fndef): Call c_parser_declaration_or_fndef
if simd_attr_present is set.
(c_finish_omp_declare_simd): Handle simd_attr_present.
* omp-low.c (pass_omp_simd_clone::gate): If target allows - call
without additional conditions.
gcc/testsuite/
* c-c++-common/attr-simd.c: New test.
* c-c++-common/attr-simd-2.c: Ditto.
* c-c++-common/attr-simd-3.c: Ditto.

Is it ok for trunk?

Here is a description of the new attribute:
simd
  Enables creation of one or more versions of the function that can process
  multiple arguments using SIMD instructions from a single invocation from a
  SIMD loop.  It is ultimately an alias for the `omp declare simd' pragma,
  available without additional compiler switches.  It is prohibited to use
  the attribute along with Cilk Plus's `vector' attribute.  If the attribute
  is specified and `#pragma omp declare simd' is present on a decl, then the
  attribute is ignored.
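
A usage sketch (the function names are invented; the semantics are as
described above, i.e. equivalent to putting `#pragma omp declare simd' on the
declaration):

/* Ask the compiler to also emit SIMD clones of my_fn that vectorized
   loops may call.  */
__attribute__ ((simd))
extern double my_fn (double x);

void
apply (double *restrict out, const double *restrict in, int n)
{
  for (int i = 0; i < n; i++)
    out[i] = my_fn (in[i]);   /* eligible to call a SIMD clone of my_fn */
}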

[1] - 
https://www.cilkplus.org/sites/default/files/open_specifications/Intel_Cilk_plus_lang_spec_1.2.htm#elem.attr

--
Thanks, K

diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
index 2d24c21..b83c9d8 100644
--- a/gcc/c/c-parser.c
+++ b/gcc/c/c-parser.c
@@ -224,6 +224,9 @@ struct GTY(()) c_parser {
   /* Buffer to hold all the tokens from parsing the vector attribute for the
  SIMD-enabled functions (formerly known as elemental functions).  */
   vec  *cilk_simd_fn_tokens;
+
+  /* Designates if "simd" attribute is specified in decl.  */
+  BOOL_BITFIELD simd_attr_present : 1;
 };
 
 
@@ -1700,7 +1703,8 @@ c_parser_declaration_or_fndef (c_parser *parser, bool 
fndef_ok,
   if (declarator == NULL)
{
  if (omp_declare_simd_clauses.exists ()
- || !vec_safe_is_empty (parser->cilk_simd_fn_tokens))
+ || !vec_safe_is_empty (parser->cilk_simd_fn_tokens)
+ || parser->simd_attr_present)
c_finish_omp_declare_simd (parser, NULL_TREE, NULL_TREE,
   omp_declare_simd_clauses);
  c_parser_skip_to_end_of_block_or_statement (parser);
@@ -1796,7 +1800,8 @@ c_parser_declaration_or_fndef (c_parser *parser, bool 
fndef_ok,
  if (!d)
d = error_mark_node;
  if (omp_declare_simd_clauses.exists ()
- || !vec_safe_is_empty (parser->cilk_simd_fn_tokens))
+ || !vec_safe_is_empty (parser->cilk_simd_fn_tokens)
+ || parser->simd_attr_present)
c_finish_omp_declare_simd (parser, d, NULL_TREE,
   omp_declare_simd_clauses);
}
@@ -1809,7 +1814,8 @@ c_parser_declaration_or_fndef (c_parser *parse

Re: [PATCH] Allow FSM to thread single block cases too

2015-10-14 Thread Jeff Law

On 10/14/2015 04:16 AM, Richard Biener wrote:

On Tue, Oct 13, 2015 at 2:52 PM, Richard Biener
 wrote:

On Tue, Oct 13, 2015 at 2:21 PM, Jeff Law  wrote:


One of the cases that was missing in the FSM support is threading when the
path is a single block.  ie, a control statement's output can be statically
determined just by looking at PHIs in the control statement's block for one
or more incoming edges.

This is necessary to fix a regression if I turn off the old jump threader's
backedge support.  Just as important, Jan has in the past asked about a
trivial jump threader to be run during early optimizations.  Limiting the
FSM bits to this case would likely satisfy that need in the future.


I think he asked for trivial forward threads though due to repeated tests.
I hacked FRE to do this (I think), but maybe some trivial cleanup opportunities
are still left here.  Honza?


This or other related patches in the range r228731:228774 has caused a quite
big jump in SPEC CPU 2000 binary sizes (notably 176.gcc - so maybe affecting
bootstrap as well, at -O3).  Are you sure this doesn't re-introduce DOM
effectively peeling all loops once?
It's possible.  I've actually got a patch in overnight testing that
introduces into the FSM bits some of the heuristics that avoid mucking up loops.


jeff



Re: [PATCH] Optimize const1 * copysign (const2, y) in reassoc (PR tree-optimization/67815)

2015-10-14 Thread Richard Biener
On Wed, 14 Oct 2015, Marek Polacek wrote:

> On Wed, Oct 14, 2015 at 11:10:38AM +0200, Richard Biener wrote:
> > > +  FOR_EACH_VEC_ELT (*ops, i, oe)
> > > +{
> > > +  if (TREE_CODE (oe->op) == SSA_NAME)
> > 
> > I think you need to check whether the SSA_NAME has a single use only
> > as you are changing its value.  Which also means you shouldn't be
> > "reusing" it (because existing debug stmts will then be wrong).
> > Thus you have to replace it.
>  
> Changed as per our discussion on IRC.  I'm building a new call while
> the old one is going to be cleaned up by subsequent DCE.
> 
> > > +   ops->ordered_remove (i);
> > > +   add_to_ops_vec (ops, negrhs);
> > 
> > Why use ordered remove and add_to_ops_vec here?  Just replace the entry?
> 
> Fixed.
> 
> I also added a new test.
> 
> Bootstrapped/regtested on x86_64-linux, ok for trunk?

Ok.

Thanks,
Richard.

> 2015-10-14  Marek Polacek  
> 
>   PR tree-optimization/67815
>   * tree-ssa-reassoc.c (attempt_builtin_copysign): New function.
>   (reassociate_bb): Call it.
> 
>   * gcc.dg/tree-ssa/reassoc-39.c: New test.
>   * gcc.dg/tree-ssa/reassoc-40.c: New test.
>   * gcc.dg/tree-ssa/reassoc-41.c: New test.
> 
> diff --git gcc/testsuite/gcc.dg/tree-ssa/reassoc-39.c 
> gcc/testsuite/gcc.dg/tree-ssa/reassoc-39.c
> index e69de29..589d06b 100644
> --- gcc/testsuite/gcc.dg/tree-ssa/reassoc-39.c
> +++ gcc/testsuite/gcc.dg/tree-ssa/reassoc-39.c
> @@ -0,0 +1,41 @@
> +/* PR tree-optimization/67815 */
> +/* { dg-do compile } */
> +/* { dg-options "-Ofast -fdump-tree-reassoc1-details" } */
> +
> +float
> +f0 (float x)
> +{
> +  return 7.5 * __builtin_copysignf (2.0, x);
> +}
> +
> +float
> +f1 (float x)
> +{
> +  return -7.5 * __builtin_copysignf (2.0, x);
> +}
> +
> +double
> +f2 (double x, double y)
> +{
> +  return x * ((1.0/12) * __builtin_copysign (1.0, y));
> +}
> +
> +double
> +f3 (double x, double y)
> +{
> +  return (x * (-1.0/12)) * __builtin_copysign (1.0, y);
> +}
> +
> +double
> +f4 (double x, double y, double z)
> +{
> +  return (x * z) * ((1.0/12) * __builtin_copysign (4.0, y));
> +}
> +
> +double
> +f5 (double x, double y, double z)
> +{
> +  return (x * (-1.0/12)) * z * __builtin_copysign (2.0, y);
> +}
> +
> +/* { dg-final { scan-tree-dump-times "Optimizing copysign" 6 "reassoc1"} }*/
> diff --git gcc/testsuite/gcc.dg/tree-ssa/reassoc-40.c 
> gcc/testsuite/gcc.dg/tree-ssa/reassoc-40.c
> index e69de29..d65bcc1b 100644
> --- gcc/testsuite/gcc.dg/tree-ssa/reassoc-40.c
> +++ gcc/testsuite/gcc.dg/tree-ssa/reassoc-40.c
> @@ -0,0 +1,21 @@
> +/* PR tree-optimization/67815 */
> +/* { dg-do compile } */
> +/* { dg-options "-Ofast -frounding-math -fdump-tree-reassoc1-details" } */
> +
> +/* Test that the copysign reassoc optimization doesn't fire for
> +   -frounding-math (i.e. HONOR_SIGN_DEPENDENT_ROUNDING) if the multiplication
> +   is inexact.  */
> +
> +double
> +f1 (double y)
> +{
> +  return (1.2 * __builtin_copysign (1.1, y));
> +}
> +
> +double
> +f2 (double y)
> +{
> +  return (-1.2 * __builtin_copysign (1.1, y));
> +}
> +
> +/* { dg-final { scan-tree-dump-not "Optimizing copysign" "reassoc1" } } */
> diff --git gcc/testsuite/gcc.dg/tree-ssa/reassoc-41.c 
> gcc/testsuite/gcc.dg/tree-ssa/reassoc-41.c
> index e69de29..8a18b88 100644
> --- gcc/testsuite/gcc.dg/tree-ssa/reassoc-41.c
> +++ gcc/testsuite/gcc.dg/tree-ssa/reassoc-41.c
> @@ -0,0 +1,21 @@
> +/* PR tree-optimization/67815 */
> +/* { dg-do compile } */
> +/* { dg-options "-Ofast -fno-rounding-math -fdump-tree-reassoc1-details" } */
> +
> +/* Test that the copysign reassoc optimization does fire for
> +   -fno-rounding-math (i.e. HONOR_SIGN_DEPENDENT_ROUNDING) if the 
> multiplication
> +   is inexact.  */
> +
> +double
> +f1 (double y)
> +{
> +  return (1.2 * __builtin_copysign (1.1, y));
> +}
> +
> +double
> +f2 (double y)
> +{
> +  return (-1.2 * __builtin_copysign (1.1, y));
> +}
> +
> +/* { dg-final { scan-tree-dump-times "Optimizing copysign" 2 "reassoc1"} }*/
> diff --git gcc/tree-ssa-reassoc.c gcc/tree-ssa-reassoc.c
> index 879722e..62438dd 100644
> --- gcc/tree-ssa-reassoc.c
> +++ gcc/tree-ssa-reassoc.c
> @@ -4622,6 +4622,102 @@ attempt_builtin_powi (gimple *stmt, vec *> *ops)
>return result;
>  }
>  
> +/* Attempt to optimize
> +   CST1 * copysign (CST2, y) -> copysign (CST1 * CST2, y) if CST1 > 0, or
> +   CST1 * copysign (CST2, y) -> -copysign (CST1 * CST2, y) if CST1 < 0.  */
> +
> +static void
> +attempt_builtin_copysign (vec *ops)
> +{
> +  operand_entry *oe;
> +  unsigned int i;
> +  unsigned int length = ops->length ();
> +  tree cst = ops->last ()->op;
> +
> +  if (length == 1 || TREE_CODE (cst) != REAL_CST)
> +return;
> +
> +  FOR_EACH_VEC_ELT (*ops, i, oe)
> +{
> +  if (TREE_CODE (oe->op) == SSA_NAME
> +   && has_single_use (oe->op))
> + {
> +   gimple *def_stmt = SSA_NAME_DEF_STMT (oe->op);
> +   if (is_gimple_call (def_stmt))
> + {
> +   tree fndecl = gimple_call_fnd

Re: [PATCH] Allow FSM to thread single block cases too

2015-10-14 Thread Richard Biener
On Wed, Oct 14, 2015 at 2:42 PM, Jeff Law  wrote:
> On 10/14/2015 04:16 AM, Richard Biener wrote:
>>
>> On Tue, Oct 13, 2015 at 2:52 PM, Richard Biener
>>  wrote:
>>>
>>> On Tue, Oct 13, 2015 at 2:21 PM, Jeff Law  wrote:


 One of the cases that was missing in the FSM support is threading when
 the
 path is a single block.  ie, a control statement's output can be
 statically
 determined just by looking at PHIs in the control statement's block for
 one
 or more incoming edges.

 This is necessary to fix a regression if I turn off the old jump
 threader's
 backedge support.  Just as important, Jan has in the past asked about a
 trivial jump threader to be run during early optimizations.  Limiting
 the
 FSM bits to this case would likely satisfy that need in the future.
>>>
>>>
>>> I think he asked for trivial forward threads though due to repeated
>>> tests.
>>> I hacked FRE to do this (I think), but maybe some trivial cleanup
>>> opportunities
>>> are still left here.  Honza?
>>
>>
>> This or other related patches in the range r228731:228774 has caused a
>> quite
>> big jump in SPEC CPU 2000 binary sizes (notably 176.gcc - so maybe
>> affecting
>> bootstrap as well, at -O3).  Are you sure this doesn't re-introduce DOM
>> effectively peeling all loops once?
>
> It's possible.  I've actually got a patch in overnight testing that
> introduces some of the heuristics to avoid mucking up loops to the FSM bits.

Like never threading a loop exit test to the loop header (but only to the exit).
At least if it is the only exit in the loop (but maybe better for all exits).

Richard.

> jeff
>


Re: Benchmarks of v2 (was Re: [PATCH 0/5] RFC: Overhaul of diagnostics (v2))

2015-10-14 Thread Michael Matz
Hi,

On Wed, 14 Oct 2015, Richard Biener wrote:

> The compile-time and memory-usage impact for the adhocloc at every token 
> patchkit is quite big.  Remember that gaining 1% in compile-time is hard 
> and 20-40% memory increase for influence.i looks too much.

Yes.  OTOH the compile time and memory use for the v2 patchkit itself look 
reasonable.

> I also wonder why you see differences in memory usage change for
> different -O levels.  I think we should
> have a pretty "static" line table after parsing?  Thus rather than
> percentages I'd like to see absolute changes

He gave the absolute numbers, so you can calculate this yourself :)
empty.c 3KB, big-code.c 6MB, influence.i 400KB, kdecore.cc 4MB and 8MB (v2 
patchkit).

> (which I'd expect to be the same for all -O levels).

This strangely is not the case for influence.i and kdecore.cc.


Ciao,
Michael.


Re: [PATCH] Allow FSM to thread single block cases too

2015-10-14 Thread Jeff Law

On 10/14/2015 06:46 AM, Richard Biener wrote:

This or other related patches in the range r228731:228774 has caused a
quite
big jump in SPEC CPU 2000 binary sizes (notably 176.gcc - so maybe
affecting
bootstrap as well, at -O3).  Are you sure this doesn't re-introduce DOM
effectively peeling all loops once?


It's possible.  I've actually got a patch in overnight testing that
introduces some of the heuristics to avoid mucking up loops to the FSM bits.


Like never threading a loop exit test to the loop header (but only to the exit).
At least if it is the only exit in the loop (but maybe better for all exits).
Right.  The FSM bits are totally missing the restrictions on threading 
through the header or latch that we added to the old style threader many 
years ago.


That was OK as the FSM bits weren't used all that much and primarily 
were concerned with removing the multi-way branch.  With the move 
towards generalizing that code, we need the restrictions.   I expect to 
commit the restrictions today after some minor adjustments.


jeff


[PATCH] Fix PR67915

2015-10-14 Thread Richard Biener

The following removes GENERIC folding from cleanup_control_expr_graph
in favor of the GIMPLE one.  This likely doesn't solve the underlying issue
of PR67915 fully but using const_binop as I originally wanted doesn't
catch all cases fold_binary did because stmts that were previously
folded get non-folded as the cgraph state changes (and thus we
can start simplifying aliases and weaks).

I've adjusted match.pd to handle &"ab" < &"ab"[2] and gimplification
to fold the conditions it builds (the C++ FE leaves us with unfolded
COND_EXPR_CONDs).
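
As a hypothetical illustration (mine, not taken from the PR), source like the
following can end up as such a comparison of addresses into the same string
literal once forward propagation has run:

/* Illustrative only: after propagation both operands are addresses built
   from the same STRING_CST, e.g. &"ab"[0] < &"ab"[2], and the extended
   match.pd pattern below can fold the comparison using just the offsets.  */
bool
within_literal ()
{
  const char *begin = "ab";
  return begin < begin + 2;
}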

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.

Richard.

2015-10-14  Richard Biener  

PR tree-optimization/67915
* match.pd: Handle comparisons of addresses of STRING_CSTs.
* gimplify.c (gimplify_cond_expr): Fold the GIMPLE conds we build.
* tree-cfgcleanup.c (cleanup_control_expr_graph): Remove GENERIC
stmt folding in favor of GIMPLE one.

* gcc.dg/torture/pr67915.c: New testcase.

Index: gcc/testsuite/gcc.dg/torture/pr67915.c
===
*** gcc/testsuite/gcc.dg/torture/pr67915.c  (revision 0)
--- gcc/testsuite/gcc.dg/torture/pr67915.c  (revision 0)
***
*** 0 
--- 1,23 
+ /* { dg-do compile } */
+ 
+ int a, b, c, d, e, f, g;
+ 
+ int
+ fn1 (int p1)
+ {
+   return p1;
+ }
+ 
+ void
+ fn2 ()
+ {
+ lbl:
+   g = b;
+   if (fn1 (c && e))
+ {
+   f = a ? 0 : 1 << 1;
+   short h = b;
+   d = h < 0 || f ? 0 : 1;
+ }
+   goto lbl;
+ }
Index: gcc/match.pd
===
*** gcc/match.pd(revision 228804)
--- gcc/match.pd(working copy)
*** (define_operator_list CEXPI BUILT_IN_CEX
*** 1998,2005 
   && decl_in_symtab_p (base1))
   equal = symtab_node::get_create (base0)
   ->equal_address_to (symtab_node::get_create (base1));
!else if ((DECL_P (base0) || TREE_CODE (base0) == SSA_NAME)
!   && (DECL_P (base1) || TREE_CODE (base1) == SSA_NAME))
   equal = (base0 == base1);
   }
   (if (equal == 1
--- 1998,2009 
   && decl_in_symtab_p (base1))
   equal = symtab_node::get_create (base0)
   ->equal_address_to (symtab_node::get_create (base1));
!else if ((DECL_P (base0)
!|| TREE_CODE (base0) == SSA_NAME
!|| TREE_CODE (base0) == STRING_CST)
!   && (DECL_P (base1)
!   || TREE_CODE (base1) == SSA_NAME
!   || TREE_CODE (base1) == STRING_CST))
   equal = (base0 == base1);
   }
   (if (equal == 1
*** (define_operator_list CEXPI BUILT_IN_CEX
*** 2007,2015 
  /* If the offsets are equal we can ignore overflow.  */
  || off0 == off1
  || POINTER_TYPE_OVERFLOW_UNDEFINED
! /* Or if we compare using pointers to decls.  */
  || (POINTER_TYPE_P (TREE_TYPE (@2))
! && DECL_P (base0
(switch
 (if (cmp == EQ_EXPR)
{ constant_boolean_node (off0 == off1, type); })
--- 2011,2019 
  /* If the offsets are equal we can ignore overflow.  */
  || off0 == off1
  || POINTER_TYPE_OVERFLOW_UNDEFINED
! /* Or if we compare using pointers to decls or strings.  */
  || (POINTER_TYPE_P (TREE_TYPE (@2))
! && (DECL_P (base0) || TREE_CODE (base0) == STRING_CST
(switch
 (if (cmp == EQ_EXPR)
{ constant_boolean_node (off0 == off1, type); })
Index: gcc/gimplify.c
===
*** gcc/gimplify.c  (revision 228804)
--- gcc/gimplify.c  (working copy)
*** gimplify_cond_expr (tree *expr_p, gimple
*** 3152,3162 
  
gimple_cond_get_ops_from_tree (COND_EXPR_COND (expr), &pred_code, &arm1,
 &arm2);
- 
cond_stmt = gimple_build_cond (pred_code, arm1, arm2, label_true,
!label_false);
! 
gimplify_seq_add_stmt (&seq, cond_stmt);
label_cont = NULL_TREE;
if (!have_then_clause_p)
  {
--- 3152,3163 
  
gimple_cond_get_ops_from_tree (COND_EXPR_COND (expr), &pred_code, &arm1,
 &arm2);
cond_stmt = gimple_build_cond (pred_code, arm1, arm2, label_true,
!label_false);
gimplify_seq_add_stmt (&seq, cond_stmt);
+   gimple_stmt_iterator gsi = gsi_last (seq);
+   maybe_fold_stmt (&gsi);
+ 
label_cont = NULL_TREE;
if (!have_then_clause_p)
  {
Index: gcc/tree-cfgcleanup.c
===
*** gcc/tree-cfgcleanup.c   (revision 228804)
--- gcc/tree-cfgcleanup.c   (working copy)
*** along with GCC; see the file COPYING3.
*** 56,61 
--- 56,64 
  #inclu

[gomp4] More deferral of partitioning to target

2015-10-14 Thread Nathan Sidwell

I've committed this to the gomp4 branch.

It is another step towards deferring  partitioned execution choices to the 
target compiler -- though sadly not the last step.


At early omp lowering, we now attach partitioning flags to the HEAD_MARK 
function I introduced yesterday, and adjust the partition enter/exit sequence to 
not be level specific.
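
For illustration (my sketch, not part of the patch), this is the clause-to-flag
encoding that lower_oacc_head_mark below performs for a typical loop:

/* Illustrative only: what the head marker records for this loop.  */
void
vecscale (float *x, int n)
{
#pragma acc parallel loop gang vector
  /* tag    = OLF_DIM_GANG | OLF_DIM_VECTOR | OLF_INDEPENDENT
              (INDEPENDENT is implied inside a parallel region)
     levels = 2, i.e. the loop may be partitioned over up to two axes.  */
  for (int i = 0; i < n; i++)
    x[i] *= 2.0f;
}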


At the oacc_device_lower stage, the loop structure is augmented to hold up to 3 
levels of head & tail markers, allowing a single loop to be partitioned over up 
to 3 axes.  Once constructed we then iterate over the marker sequences, replacing 
the dummy axis argument of the appropriate builtins with the specific chosen axis.


The next step is to iterate over the body doing the same for the loop 
abstraction builtin.


nathan
2015-10-14  Nathan Sidwell  

	* omp-low.c (struct oacc_loop): Add more fields.
	(enum oacc_loop_flags): New.
	(lower_oacc_head_mark): New.
	(lower_oacc_loop_marker): Change meaning of 2nd parameter.
	(lower_oacc_head_tail): Call lower_oacc_head_mark, adjust.
	(new_oacc_loop_raw): Initialize new fields.
	(new_oacc_loop): Extract flags and mask from marker function.
	(new_oacc_loop_routine): New.
	(finish_oacc_loop): Remove tail parameter.
	(dump_oacc_loop_part, dump_oacc_loop): Adjust for new fields.
	(oacc_loop_walk): Collect markers for same loop into single loop
	structure.  Notice routines.
	(oacc_loop_transform): Rename to ...
	(oacc_loop_xform_head_tail): ... here.
	(oacc_loop_process): Assign partitioning levels to head & tail
	sequences.

Index: gcc/omp-low.c
===
--- gcc/omp-low.c	(revision 228758)
+++ gcc/omp-low.c	(working copy)
@@ -250,13 +250,36 @@ struct oacc_loop
   location_t loc; /* Location of the loop start.  */
 
   /* Start of head and tail.  */
-  gcall *head;  /* Head marker function. */
-  gcall *tail;  /* Tail marker function.  */
+  gcall *heads[GOMP_DIM_MAX];  /* Head marker functions. */
+  gcall *tails[GOMP_DIM_MAX];  /* Tail marker functions. */
 
-  /* Partitioning level.  */
-  unsigned level;
+  tree routine;  /* Pseudo-loop enclosing a routine.  */
+
+  /* Partitioning mask.  */
+  unsigned mask;
+
+  /* Partitioning flags.  */
+  unsigned flags;
 };
 
+/*  Flags for an OpenACC loop.  */
+
+enum oacc_loop_flags
+  {
+OLF_SEQ	= 1u << 0,  /* Explicitly sequential  */
+OLF_AUTO	= 1u << 1,	/* Compiler chooses axes.  */
+OLF_INDEPENDENT = 1u << 2,	/* Iterations are known independent.  */
+OLF_GANG_STATIC = 1u << 3,	/* Gang partitioning is static (has op). */
+
+/* Explicitly specified loop axes.  */
+OLF_DIM_BASE = 4 - GOMP_DIM_GANG,
+OLF_DIM_GANG   = 1u << (OLF_DIM_BASE + GOMP_DIM_GANG),
+OLF_DIM_WORKER = 1u << (OLF_DIM_BASE + GOMP_DIM_WORKER),
+OLF_DIM_VECTOR = 1u << (OLF_DIM_BASE + GOMP_DIM_VECTOR),
+
+OLF_MAX = OLF_DIM_BASE + GOMP_DIM_MAX
+  };
+
 static splay_tree all_contexts;
 static int taskreg_nesting_level;
 static int target_nesting_level;
@@ -4897,18 +4920,115 @@ lower_oacc_reductions (location_t loc, t
   gimple_seq_add_seq (join_seq, after_join);
 }
 
+/* Emit an OpenACC head marker call, encapulating the partitioning and
+   other information that must be processed by the target compiler.
+   Return the maximum number of dimensions the associated loop might
+   be partitioned over.  */
+
+static unsigned
+lower_oacc_head_mark (location_t loc, tree clauses,
+		  gimple_seq *seq, omp_context *ctx)
+{
+  unsigned levels = 0;
+  unsigned tag = 0;
+  tree gang_static = NULL_TREE;
+  auto_vec args;
+
+  args.quick_push (build_int_cst
+		   (integer_type_node, IFN_UNIQUE_OACC_HEAD_MARK));
+  for (tree c = clauses; c; c = OMP_CLAUSE_CHAIN (c))
+{
+  switch (OMP_CLAUSE_CODE (c))
+	{
+	case OMP_CLAUSE_GANG:
+	  tag |= OLF_DIM_GANG;
+	  gang_static = OMP_CLAUSE_GANG_STATIC_EXPR (c);
+	  levels++;
+	  break;
+
+	case OMP_CLAUSE_WORKER:
+	  tag |=  OLF_DIM_WORKER;
+	  levels++;
+	  break;
+
+	case OMP_CLAUSE_VECTOR:
+	  tag |= OLF_DIM_VECTOR;
+	  levels++;
+	  break;
+
+	case OMP_CLAUSE_SEQ:
+	  tag |= OLF_SEQ;
+	  break;
+
+	case OMP_CLAUSE_AUTO:
+	  tag |= OLF_AUTO;
+	  break;
+
+	case OMP_CLAUSE_INDEPENDENT:
+	  tag |= OLF_INDEPENDENT;
+	  break;
+
+	case OMP_CLAUSE_DEVICE_TYPE:
+	  /* TODO: Add device type handling.  */
+	  goto done;
+
+	default:
+	  continue;
+	}
+}
+
+ done:
+  if (gang_static)
+tag |= OLF_GANG_STATIC;
+
+  /* In a parallel region, loops are implicitly INDEPENDENT.  */
+  if (is_oacc_parallel (ctx))
+tag |= OLF_INDEPENDENT;
+
+  /* In a kernels region, a loop lacking SEQ, GANG, WORKER and/or
+ VECTOR is implicitly AUTO.  */
+  if (is_oacc_kernels (ctx)
+  && !(tag & (((GOMP_DIM_MASK (GOMP_DIM_MAX) - 1) << OLF_DIM_BASE)
+		  | OLF_SEQ)))
+  tag |= OLF_AUTO;
+
+  {
+/* Check we didn't discover any different partitioning from the
+   existing scheme.  */
+unsigned mask = ctx->gwv_this;
+if (ctx->outer &&  gimple_code (ctx->outer->stmt) == 

Re: [PATCH, VECTOR ABI] Add __attribute__((__simd__)) to GCC.

2015-10-14 Thread Joseph Myers
On Wed, 14 Oct 2015, Kirill Yukhin wrote:

> Is it ok for trunk?

This patch has no documentation.  Documentation for new attributes must be 
added to extend.texi.

> Enables creation of one or more versions that can process multiple 
> arguments using SIMD instructions from a single invocation from a SIMD 
> loop. It is ultimately an alias to `omp declare simd’ pragma, available 
> w/o additional compiler switches. It is prohibited to use the attribute 
> along with Cilk Plus’s `vector’ attribute. If the attribute is specified 
> and `pragma omp declare simd’ presents on a decl, then the attribute is 
> ignored.

This is missing the key information that it's not just about *creation* of 
the versions, it's about (in the case of an external declaration) 
*assuming* such versions were created in another translation unit.  And 
you should have a link to the external ABI documents specifying for each 
architecture for which this involves such an assumption exactly what 
versions may be assumed to be present - this is what's required to be able 
to use the attribute in headers for a library and know that future GCC 
versions won't reinterpret the attribute as implying some versions for 
future ISA extensions are also present.

I wonder whether the syntax for this attribute should allow optional 
arguments to describe what versions are present, but maybe that can be 
deferred until e.g. you want a way in future for a library to specify it 
has an AVX1024 version of a function as well as the baseline ABI set of 
versions.
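
(For reference, a minimal sketch of the usage under discussion; the function
name is hypothetical and the attribute spelling follows the patch subject:)

/* Declaration in a library header: the attribute asserts that SIMD clones of
   do_work exist, much as "#pragma omp declare simd" would, but without
   requiring extra compiler switches on the consumer side.  */
__attribute__ ((__simd__))
extern double do_work (double x);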

-- 
Joseph S. Myers
jos...@codesourcery.com

Re: [PATCH v2] PR rtl-optimization/66790: uninitialized registers handling in REE

2015-10-14 Thread Bernd Schmidt

On 10/13/2015 05:17 PM, Pierre-Marie de Rodat wrote:

The first attached patch is the second attempt to fix PR
rtl-optimization/66790 (see
).

The second one is a fix for some inconsistency noticed while working on
the original bug. This specific patch fixes no known bug, but anyway…

Both were bootstrapped and regtested on x86_64-linux. Ok to commit?
Thank you in advance!

[PATCH 1/2] REE: fix uninitialized registers handling


This one is OK with minor changes. I ran some tests with it, and the mir 
sets look good this time. Code generation still seems unaffected by it 
on all my example code (which is as expected).



+
+  /* Ignoring artificial defs is intentionnal: these often pretend that some


"intentional".


+  if ((!bitmap_equal_p (&problem_data->in[bb->index], DF_MIR_IN (bb)))
+ || (!bitmap_equal_p (&problem_data->out[bb->index], DF_MIR_OUT (bb
+   {
+ /*df_dump (stderr);*/
+ gcc_unreachable ();
+   }


Please remove the commented out code and then also the unnecessary 
braces. In general we avoid commented out code in gcc, but when doing 
it, #if 0 is generally a better method.



+  const rtx reg = XEXP (src, 0);


Drop the const maybe? It doesn't seem to add much and the idiom is to 
just use rtx.



From ff694bf70e0b1ebd336c684713ce6153cc26b3d6 Mon Sep 17 00:00:00 2001

From: Pierre-Marie de Rodat
Date: Tue, 22 Sep 2015 16:02:41 +0200
Subject: [PATCH 2/2] DF_LIVE: make clobbers cancel effect of previous GENs in
  the same BBs

gcc/ChangeLog:

* df-problems.c (df_live_bb_local_compute): Clear GEN bits for
DF_REF_MUST_CLOBBER references.


This one is probably ok too; I still want to experiment with it a little.


Bernd


[PATCH] More vectorizer TLC

2015-10-14 Thread Richard Biener

This removes superfluous parameters from some analysis helpers.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2015-10-14  Richard Biener  

* tree-vectorizer.h (vect_is_simple_use): Remove unused parameters.
(vect_is_simple_use_1): Likewise.  Make overload of vect_is_simple_use.
(vect_get_vec_def_for_operand): Remove unused parameter.
* tree-vect-loop.c (get_initial_def_for_induction): Adjust.
(vect_create_epilog_for_reduction): Likewise.
(vectorizable_reduction): Likewise.
(vectorizable_live_operation): Likewise.
* tree-vect-patterns.c (type_conversion_p): Likewise.
(vect_recog_vector_vector_shift_pattern): Likewise.
(check_bool_pattern): Likewise.
* tree-vect-slp.c (vect_get_and_check_slp_defs): Likewise.
(vect_analyze_slp_cost_1): Likewise.
* tree-vect-stmts.c (process_use): Likewise.
(vect_get_vec_def_for_operand): Do not handle reductions.
(vect_get_vec_defs): Adjust.
(vectorizable_mask_load_store): Likewise.
(vectorizable_call): Likewise.
(vectorizable_simd_clone_call): Likewise.
(vect_get_loop_based_defs): Likewise.
(vectorizable_conversion): Likewise.
(vectorizable_assignment): Likewise.
(vectorizable_shift): Likewise.
(vectorizable_operation): Likewise.
(vectorizable_store): Likewise.
(vectorizable_load): Likewise.
(vect_is_simple_cond): Likewise.
(vectorizable_condition): Likewise.
(vect_is_simple_use): Remove unused parameters.
(vect_is_simple_use_1): Adjust and rename.

Index: gcc/tree-vect-loop.c
===
--- gcc/tree-vect-loop.c(revision 228806)
+++ gcc/tree-vect-loop.c(working copy)
@@ -3412,7 +3412,7 @@ get_initial_def_for_induction (gimple *i
   /* iv_loop is nested in the loop to be vectorized.  init_expr had already
 been created during vectorization of previous stmts.  We obtain it
 from the STMT_VINFO_VEC_STMT of the defining stmt.  */
-  vec_init = vect_get_vec_def_for_operand (init_expr, iv_phi, NULL);
+  vec_init = vect_get_vec_def_for_operand (init_expr, iv_phi);
   /* If the initial value is not of proper type, convert it.  */
   if (!useless_type_conversion_p (vectype, TREE_TYPE (vec_init)))
{
@@ -3798,8 +3798,7 @@ get_initial_def_for_reduction (gimple *s
 if (adjustment_def)
   {
 if (nested_in_vect_loop)
-  *adjustment_def = vect_get_vec_def_for_operand (init_val, stmt,
-  NULL);
+  *adjustment_def = vect_get_vec_def_for_operand (init_val, stmt);
 else
   *adjustment_def = init_val;
   }
@@ -3853,7 +3852,7 @@ get_initial_def_for_reduction (gimple *s
 if (adjustment_def)
   {
 *adjustment_def = NULL_TREE;
-init_def = vect_get_vec_def_for_operand (init_val, stmt, NULL);
+init_def = vect_get_vec_def_for_operand (init_val, stmt);
 break;
   }
 
@@ -4012,12 +4011,13 @@ vect_create_epilog_for_reduction (vecvinfo, def_stmt,
-  &def, &dt))
+  if (!vect_is_simple_use (name, stmt_vinfo->vinfo, def_stmt, &dt))
 return false;
 
   if (dt != vect_internal_def
@@ -207,8 +204,7 @@ type_conversion_p (tree name, gimple *us
   else
 *promotion = false;
 
-  if (!vect_is_simple_use (oprnd0, *def_stmt, stmt_vinfo->vinfo,
-  &dummy_gimple, &dummy, &dt))
+  if (!vect_is_simple_use (oprnd0, stmt_vinfo->vinfo, &dummy_gimple, &dt))
 return false;
 
   return true;
@@ -1830,7 +1826,7 @@ vect_recog_rotate_pattern (vec
   || !TYPE_UNSIGNED (type))
 return NULL;
 
-  if (!vect_is_simple_use (oprnd1, last_stmt, vinfo, &def_stmt, &def, &dt))
+  if (!vect_is_simple_use (oprnd1, vinfo, &def_stmt, &dt))
 return NULL;
 
   if (dt != vect_internal_def
@@ -2058,7 +2054,6 @@ vect_recog_vector_vector_shift_pattern (
   stmt_vec_info stmt_vinfo = vinfo_for_stmt (last_stmt);
   vec_info *vinfo = stmt_vinfo->vinfo;
   enum vect_def_type dt;
-  tree def;
 
   if (!is_gimple_assign (last_stmt))
 return NULL;
@@ -2090,8 +2085,7 @@ vect_recog_vector_vector_shift_pattern (
 != TYPE_PRECISION (TREE_TYPE (oprnd0)))
 return NULL;
 
-  if (!vect_is_simple_use (oprnd1, last_stmt, vinfo, &def_stmt,
-  &def, &dt))
+  if (!vect_is_simple_use (oprnd1, vinfo, &def_stmt, &dt))
 return NULL;
 
   if (dt != vect_internal_def)
@@ -2102,7 +2096,7 @@ vect_recog_vector_vector_shift_pattern (
   if (*type_in == NULL_TREE)
 return NULL;
 
-  def = NULL_TREE;
+  tree def = NULL_TREE;
   if (gimple_assign_cast_p (def_stmt))
 {
   tree rhs1 = gimple_assign_rhs1 (def_stmt);
@@ -2892,11 +2886,10 @@ check_bool_pattern (tree var, vec_info 

Re: [patch] header file re-ordering.

2015-10-14 Thread Andrew MacLeod

On 10/12/2015 04:04 AM, Jeff Law wrote:



<...>
raised TYPES.UNRECOVERABLE_ERROR : comperr.adb:423
../gcc-interface/Makefile:311: recipe for target 's-regpat.o' failed


However, the tool has been run, and I've made the minor adjustments
required to the source files to make it work (ie, a few multi-line
comments and the fact that mul-tables.c is generated on the tile*
targets).


So this is what it should look like.  I used -cp.  Other languages are
bootstrapping, and I have yet to build all the targets... that'll just
take a day.   Be nice if ada worked tho.

I can run the reduction tool over the weekend (it's a long weekend here
:-) on this if you want...  the other patch is a couple of weeks out of
date anyway now.
I find myself looking at the objc stuff and wondering if it was built. 
For example objc-act.c calls functions prototyped in fold-const.h, but 
that header is no longer included after your patch.


wait, what?   I don't see any differences to objc-act.c in the 
reordering patches


Oh, you must be looking at the original combined patch?

fold-const.h is indirectly included by cp-tree.h, which gets it from
including c-common.h.   Here is some of the output from show-headers on
objc-act.c (indentation represents levels of including; the number in
parentheses is the number of times that include has been seen so far in
the file's include list).   As you can see, we include ansidecl.h a lot
:-)  Most of the time there isn't much we can do about those sorts of
things:


cp-tree.h
tm.h  (2)
hard-reg-set.h
function.h  (1)
c-common.h
  splay-tree.h
ansidecl.h  (4)
  cpplib.h
symtab.h  (2)
line-map.h  (2)
  alias.h
  tree.h  (2)
  fold-const.h
  diagnostic-core.h  (1)
bversion.h

I guess it could be a useful addition to show-headers to specify a
header file you are looking for and show you where it comes from if it's
included...


In any case, there is some indirection here because none of the front end
files were flattened that much.


Incidentally, you may notice this is the second time tree.h is
included.  The first occurrence of tree.h is included directly by
objc-act.c, but it needs to be left because something between that and
cp-tree.h needs tree.h to compile.  This sort of thing is resolved by
using the re-order tool, but I did not run that tool on most of the objc
and objcp files as they have some complex conditionals in their include
list:

#include "tree.h"
#include "stringpool.h"
#include "stor-layout.h"
#include "attribs.h"

#ifdef OBJCPLUS
#include "cp/cp-tree.h"
#else
#include "c/c-tree.h"
#include "c/c-lang.h"
#endif

#include "c-family/c-objc.h"
#include "langhooks.h"

It's beyond the scope of the reorder tool to deal with re-positioning
this automatically... and it happens so rarely I didn't even look into it.

So they are not optimal as far as ordering goes.



Similarly in objcp we remove tree.h from objcp-decl.c, but it uses 
TREE macros and I don't immediately see where those macros would be 
coming from if tree.h is no longer included.


Again, thanks to no flattening of the front end files :-)  It also comes
from cp-tree.h.  The objcp source files don't specify the full path of
cp/cp-tree.h like objc does, so the simplistic show-headers tool doesn't
know where to look for cp-tree.h to show you what it included like in
the above example.  Maybe I'll tweak the tool to look in common header
directories.


In general, I'm worried about the objc/objcp stuff.  That in turn 
makes me wonder about the other stuff in a more general sense. 
Regardless, I think I can take a pretty good stab at the config/ changes.




So you can not worry about that.  It builds fine.


A pattern that seems to play out a lot in the target files is they 
liked to include insn-config.h, insn-codes.h, & timevar.h.  I can see 
how those typically won't be needed.  The first two are amazingly 
common.  A comment in the nds32 port indicates that it may have been 
needed by recog.h in the past.  nds32 actually included insn-config 
twice :-)



Interestingly enough m32r, mcore & pdp11 still need insn-config


most ports get insn-config.h from optabs.h:
  optabs.h
optabs-query.h
  insn-opinit.h  (1)
optabs-libfuncs.h
  insn-opinit.h  (2)
insn-config.h

I think those ports that still include it do not include optabs.h




The strangest thing I saw was rs6000 dropping an include of 
emit-rtl.h.  But presumably various powerpc targets were built, so I 
guess it's really not needed.


It gets emit-rtl.h from ira.h:
  regs.h
  ira.h
emit-rtl.h
  recog.h
insn-codes.h  (2)




I'm slightly concerned about the darwin, windows and solaris bits.  
The former primarily because Darwin has been a general source of pain, 
and in the others because I'm not sure the cross testing will exercise 
that code terribly much.


It's easy enough to NOT do this for any of those files if we're too
worried about them.

[committed] Improve reassoc-39.c test

2015-10-14 Thread Marek Polacek
Jakub suggested that I improve a testcase I've added a while ago.
Done in the following.

Tested on x86_64-linux, applying to trunk.

2015-10-14  Marek Polacek  

* gcc.dg/tree-ssa/reassoc-39.c: Use -g.  Adjust dg-final.
(f6): New.
(f7): New.
(f8): New.
(f9): New.
(f10): New.
(f11): New.
(f12): New.
(f13): New.

diff --git gcc/testsuite/gcc.dg/tree-ssa/reassoc-39.c 
gcc/testsuite/gcc.dg/tree-ssa/reassoc-39.c
index 589d06b..9befe18 100644
--- gcc/testsuite/gcc.dg/tree-ssa/reassoc-39.c
+++ gcc/testsuite/gcc.dg/tree-ssa/reassoc-39.c
@@ -1,6 +1,9 @@
 /* PR tree-optimization/67815 */
 /* { dg-do compile } */
-/* { dg-options "-Ofast -fdump-tree-reassoc1-details" } */
+/* { dg-options "-Ofast -g -fdump-tree-reassoc1-details" } */
+
+extern float barf (float, float);
+extern double bar (double, double);
 
 float
 f0 (float x)
@@ -38,4 +41,67 @@ f5 (double x, double y, double z)
   return (x * (-1.0/12)) * z * __builtin_copysign (2.0, y);
 }
 
-/* { dg-final { scan-tree-dump-times "Optimizing copysign" 6 "reassoc1"} }*/
+float
+f6 (float x, float y)
+{
+  return 7.5f * y * __builtin_copysignf (2.0f, x);
+}
+
+float
+f7 (float x, float y)
+{
+  return -7.5f * y * __builtin_copysignf (2.0f, x);
+}
+
+float
+f8 (float x)
+{
+  float tmp1 = 7.5f;
+  float tmp2 = __builtin_copysignf (2.0f, x);
+  return tmp1 * tmp2;
+}
+
+double
+f9 (double x)
+{
+  double tmp1 = 7.5;
+  double tmp2 = __builtin_copysign (2.0, x);
+  return tmp1 * tmp2;
+}
+
+float
+f10 (float x)
+{
+  float tmp1 = 7.5f;
+  float tmp2 = __builtin_copysignf (2.0f, x);
+  float tmp3 = tmp2 * 24.0f;
+  return tmp1 * tmp2;
+}
+
+double
+f11 (double x)
+{
+  double tmp1 = 7.5;
+  double tmp2 = __builtin_copysign (2.0, x);
+  double tmp3 = tmp2 * 24.0;
+  return tmp1 * tmp2;
+}
+
+float
+f12 (float x)
+{
+  float tmp1 = 7.5f;
+  float tmp2 = __builtin_copysignf (2.0f, x);
+  /* Can't reassoc here.  */
+  return barf (tmp1 * tmp2, tmp2);
+}
+
+double
+f13 (double x)
+{
+  double tmp1 = 7.5;
+  double tmp2 = __builtin_copysign (2.0, x);
+  /* Can't reassoc here.  */
+  return bar (tmp1 * tmp2, tmp2);
+}
+/* { dg-final { scan-tree-dump-times "Optimizing copysign" 12 "reassoc1"} }*/

Marek


Re: [PATCH] c/67925 - update documentation on `inline'

2015-10-14 Thread Martin Sebor

On 10/13/2015 04:47 PM, Arkadiusz Drabczyk wrote:

* gcc/doc/extend.texi: The documentation says that functions declared
`inline' would not be integrated if they are called before they are
defined or if they are recursive.  Both of these statements are now
false, as shown in examples on Bugzilla.


It might also be worth updating the note in the subsequent
paragraph and removing the mention of variable-length data types
which no longer prevent inlining.

FWIW, the list of most -Winline warnings issued by GCC is here
(there are two more in Ada which, AFAICT, have to do with nested
functions):

$ grep -A1 "can never be inlined" gcc/tree-inline.c
= G_("function %q+F can never be inlined because it uses "
 "alloca (override using the always_inline attribute)");
--
= G_("function %q+F can never be inlined because it uses setjmp");
  *handled_ops_p = true;
--
  = G_("function %q+F can never be inlined because it "
   "uses variable argument lists");
--
  = G_("function %q+F can never be inlined because "
   "it uses setjmp-longjmp exception handling");
--
  = G_("function %q+F can never be inlined because "
   "it uses non-local goto");
--
  = G_("function %q+F can never be inlined because "
   "it uses __builtin_return or __builtin_apply_args");
--
= G_("function %q+F can never be inlined "
 "because it contains a computed goto");
--
warning (OPT_Winline, "function %q+F can never be inlined 
because it "

 "is suppressed using -fno-inline", fn);
--
warning (OPT_Winline, "function %q+F can never be inlined 
because it "

 "uses attributes conflicting with inlining", fn);

Martin


---
  gcc/doc/extend.texi | 9 +++--
  1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 79440d3..7ea4b62 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -7088,12 +7088,9 @@ function are integrated into the caller, and the 
function's address is
  never used, then the function's own assembler code is never referenced.
  In this case, GCC does not actually output assembler code for the
  function, unless you specify the option @option{-fkeep-inline-functions}.
-Some calls cannot be integrated for various reasons (in particular,
-calls that precede the function's definition cannot be integrated, and
-neither can recursive calls within the definition).  If there is a
-nonintegrated call, then the function is compiled to assembler code as
-usual.  The function must also be compiled as usual if the program
-refers to its address, because that can't be inlined.
+If there is a nonintegrated call, then the function is compiled to
+assembler code as usual.  The function must also be compiled as usual if
+the program refers to its address, because that can't be inlined.

  @opindex Winline
  Note that certain usages in a function definition can make it unsuitable





[gomp4, committed] Backported param parloops-schedule

2015-10-14 Thread Tom de Vries
[ was: Re: [PATCH, 3/5] Handle original loop tree in 
expand_omp_for_generic ]


On 13/10/15 23:48, Thomas Schwinge wrote:

Hi Tom!

On Mon, 12 Oct 2015 18:56:29 +0200, Tom de Vries  wrote:

>Handle original loop tree in expand_omp_for_generic
>
>2015-09-12  Tom de Vries
>
>PR tree-optimization/67476
>* omp-low.c (expand_omp_for_generic): Handle original loop tree.

Working on a merge from trunk into gomp-4_0-branch, I'm seeing your
change (trunk r228754) conflict with code Chung-Lin changed
(gomp-4_0-branch r224505).  So, would you two please cherry-pick/merge
trunk r228754 into gomp-4_0-branch?  Thanks!  (I'm assuming you can
easily tell what needs to be done here; it's been a long time that
Chung-Lin touched this code, so CCing him just in case.)  Thanks!


Hi Thomas,

I've backported the whole patch series:
 1  Handle simple latch in expand_omp_for_generic
 2  Add missing phis in expand_omp_for_generic
 3  Handle original loop tree in expand_omp_for_generic
 4  Support DEFPARAMENUM in params.def
 5  Add param parloops-schedule
and committed to gomp-4_0-branch.

I'm only posting patch nr. 3, the only one with a non-trivial conflict.

Thanks,
- Tom
Handle original loop tree in expand_omp_for_generic

2015-10-14  Tom de Vries  

	backport from trunk:
	2015-10-13  Tom de Vries  

	PR tree-optimization/67476
	* omp-low.c (expand_omp_for_generic): Handle original loop tree.
---
 gcc/omp-low.c | 38 +-
 1 file changed, 25 insertions(+), 13 deletions(-)

diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index 473e2e7..dde3e1b 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -6924,7 +6924,6 @@ expand_omp_for_generic (struct omp_region *region,
   remove_edge (e);
 
   make_edge (cont_bb, l2_bb, EDGE_FALSE_VALUE);
-  add_bb_to_loop (l2_bb, cont_bb->loop_father);
   e = find_edge (cont_bb, l1_bb);
   if (e == NULL)
 	{
@@ -7002,23 +7001,36 @@ expand_omp_for_generic (struct omp_region *region,
   set_immediate_dominator (CDI_DOMINATORS, l1_bb,
 			   recompute_dominator (CDI_DOMINATORS, l1_bb));
 
-  struct loop *outer_loop;
-  if (seq_loop)
-	outer_loop = l0_bb->loop_father;
-  else
+  /* We enter expand_omp_for_generic with a loop.  This original loop may
+	 have its own loop struct, or it may be part of an outer loop struct
+	 (which may be the fake loop).  */
+  struct loop *outer_loop = entry_bb->loop_father;
+  bool orig_loop_has_loop_struct = l1_bb->loop_father != outer_loop;
+
+  add_bb_to_loop (l2_bb, outer_loop);
+
+  struct loop *new_loop = NULL;
+  if (!seq_loop)
 	{
-	  outer_loop = alloc_loop ();
-	  outer_loop->header = l0_bb;
-	  outer_loop->latch = l2_bb;
-	  add_loop (outer_loop, l0_bb->loop_father);
+	  /* We've added a new loop around the original loop.  Allocate the
+	 corresponding loop struct.  */
+	  new_loop = alloc_loop ();
+	  new_loop->header = l0_bb;
+	  new_loop->latch = l2_bb;
+	  add_loop (new_loop, outer_loop);
 	}
 
-  if (!gimple_omp_for_combined_p (fd->for_stmt))
+  /* Allocate a loop structure for the original loop unless we already
+	 had one.  */
+  if (!orig_loop_has_loop_struct
+	  && !gimple_omp_for_combined_p (fd->for_stmt))
 	{
-	  struct loop *loop = alloc_loop ();
-	  loop->header = l1_bb;
+	  struct loop *orig_loop = alloc_loop ();
+	  orig_loop->header = l1_bb;
 	  /* The loop may have multiple latches.  */
-	  add_loop (loop, outer_loop);
+	  add_loop (orig_loop, (new_loop != NULL
+? new_loop
+: outer_loop));
 	}
 }
 }
-- 
1.9.1



[AArch64] --with-arch in config.gcc support "."

2015-10-14 Thread Jiong Wang

Since armv8.1 was added, we need to improve the --with-arch recognition sed
pattern to catch the new "." in the architecture base name.

OK for trunk?

2015-10-14  Jiong Wang  

gcc/
  * config.gcc: Recognize "." in architecture base name for AArch64.

diff --git a/gcc/config.gcc b/gcc/config.gcc
index 5818663..215ad9a 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -3544,7 +3544,7 @@ case "${target}" in
 
 			eval "val=\$with_$which"
 			base_val=`echo $val | sed -e 's/\+.*//'`
-			ext_val=`echo $val | sed -e 's/[a-z0-9\-]\+//'`
+			ext_val=`echo $val | sed -e 's/[a-z0-9\.\-]\+//'`
 
 			if [ $which = arch ]; then
 			  def=aarch64-arches.def


[PATCH] Fix pr67963

2015-10-14 Thread Yulia Koval
Hi,

This patch fixes the issue:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67963

  gcc/config/i386/i386.c (ix86_option_override_internal) Disable
80387 mask if lakemont target is set.

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 4c25c9e..db722aa 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -4943,6 +4943,12 @@ ix86_option_override_internal (bool main_args_p,
  break;
   }

+  if (!strcmp (opts->x_ix86_arch_string, "lakemont"))
+{
+  opts->x_target_flags &= ~MASK_80387;
+  opts_set->x_target_flags |= MASK_80387;
+}
+
   if (TARGET_X32 && (opts->x_ix86_isa_flags & OPTION_MASK_ISA_MPX))
 error ("Intel MPX does not support x32");

Ok for trunk?

Yulia


patch
Description: Binary data


Re: [patch 4/3] Header file reduction - Tools for contrib - second cut

2015-10-14 Thread Andrew MacLeod
Here's the latest version of the tools for a subdirectory in contrib.
I've handled all the feedback, except I have not fully commented the
python code in the tools, nor followed any particular coding
convention...   Documentation has been handled, and I've added some
additional comments to the places which were noted as being unclear.  I've
also removed all tabs from the source files.


I've also updated show-headers slightly to be a little more
error-resistant and to put some emphasis on any header files specified
on the command line as being of interest.  (When there are 140 shown, it can
be hard to find the one you are looking for sometimes.)


Do we wish to impose anything in particular on the source for  tools 
going into this sub-directory of contrib? The other tools in contrib 
don't seem to have much in the way of coding standards. I also 
wonder if anyone other than me will look at them much :-)


Andrew




headers/

	* README : New File.
	* count-headers : New File.
	* gcc-order-headers : New File.
	* graph-header-logs : New File.
	* graph-include-web : New File.
	* headerutils.py : New File.
	* included-by : New File.
	* reduce-headers : New File.
	* replace-header : New File.
	* show-headers : New File.

Index: headers/README
===
*** headers/README	(revision 0)
--- headers/README	(working copy)
***
*** 0 
--- 1,283 
+ Quick start documentation for the header file utilities.  
+ 
+ This isn't a full breakdown of the tools, just they typical use scenarios.
+ 
+ - Each tool accepts -h to show it's usage.  Usually no parameters will also
+ trigger the help message.  Help may specify additional functionality to what is
+ listed here.
+ 
+ - For all tools, option format for specifying filenames must have no spaces
+ between the option and filename.
+ ie.: tool -lfilename.h  target.h
+ 
+ - Many of the tools are required to be run from the core gcc source directory
+ containing coretypes.h.  Typically that is in gcc/gcc from a source checkout.
+ For these tools to work on files not in this directory, their path needs to be
+ specified on the command line.
+ ie.: tool c/c-decl.c  lto/lto.c
+ 
+ - options can be intermixed with filenames anywhere on the command line
+ ie.   tool ssa.h rtl.h -a   is equivalent to 
+   tool ssa.h -a rtl.h
+ 
+ 
+ 
+ 
+ 
+ gcc-order-headers
+ -
+   This will reorder any primary backend headers files known to the tool into a
+   canonical order which will resolve any hidden dependencies they may have.
+   Any unknown headers will simply be placed after the recognized files, and
+   retain the same relative ordering they had.
+  
+   This tool must be run in the core gcc source directory.
+ 
+   Simply execute the command listing any files you wish to process on the
+   command line.
+ 
+   Any files which are changed are output, and the original is saved with a
+   .bak extention.
+ 
+   ex.: gcc-order-headers tree-ssa.c c/c-decl.c
+ 
+   -s will list all of the known headers in their canonical order. It does not
+   show which of those headers include other headers, just the final canonical
+   ordering.
+ 
+   if any header files are included within a conditional code block, the tool
+   will issue a message and not change the file.  When this happens, you can
+   manually inspect the file to determine if reordering it is actually OK.  Then
+   rerun the command with the -i option.  This will ignore the conditional error
+   condition and perform the re-ordering anyway.
+   
+   If any #include line has the beginning of a multi-line comment, it will also
+   refuse to process the file until that is resolved by terminating the comment
+   on the same line, or removing it.
+ 
+ 
+ show-headers
+ 
+   This will show the include structure for any given file. Each level of nesting
+   is indented, and when any duplicate headers are seen, they have their
+   duplicate number shown
+ 
+   -i may be used to specify alternate search directories for headers to parse.
+   -s specifies headers to look for and emphasize in the output.
+ 
+   This tool must be run in the core gcc source directory.
+ 
+   ex.: show-headers -sansidecl.h tree-ssa.c
+ 	tree-ssa.c
+ 	  config.h
+ 	auto-host.h
+ 	ansidecl.h  (1)   <<---
+ 	  system.h
+ 	safe-ctype.h
+ 	filenames.h
+ 	  hashtab.h  (1)
+ 		ansidecl.h  (2)<<---
+ 	libiberty.h
+ 	  ansidecl.h  (3)<<---
+ 	hwint.h
+ 	  coretypes.h
+ 	machmode.h  (1)
+ 	  insn-modes.h  (1)
+ 	signop.h
+ 	  <...>
+ 
+ 
+ 
+ 
+ count-headers
+ -
+   simply count all the headers found in the specified files. A summary is 
+   printed showing occurrences from high to low.
+ 
+   ex.:count-headers  tree*.c
+ 	86 : coretypes.h
+ 	86 : config.h
+ 	86 : system.h
+ 	86 : tree.h
+ 	82 : backend.h
+ 	80 :

Re: [PATCH] Fix pr67963

2015-10-14 Thread H.J. Lu
On Wed, Oct 14, 2015 at 8:08 AM, Yulia Koval  wrote:
> Hi,
>
> This patch fixes the issue:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67963
>
>   gcc/config/i386/i386.c (ix86_option_override_internal) Disable
> 80387 mask if lakemont target is set.
>
> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> index 4c25c9e..db722aa 100644
> --- a/gcc/config/i386/i386.c
> +++ b/gcc/config/i386/i386.c
> @@ -4943,6 +4943,12 @@ ix86_option_override_internal (bool main_args_p,
>   break;
>}
>
> +  if (!strcmp (opts->x_ix86_arch_string, "lakemont"))
> +{
> +  opts->x_target_flags &= ~MASK_80387;
> +  opts_set->x_target_flags |= MASK_80387;
> +}
> +
>if (TARGET_X32 && (opts->x_ix86_isa_flags & OPTION_MASK_ISA_MPX))
>  error ("Intel MPX does not support x32");
>
> Ok for trunk?

We should add a bit to "struct pta" to indicate availability of the
80387 ISA and turn it off for lakemont if the 80387 ISA hasn't been
turned on explicitly.

We also need some testcases.
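
(A hypothetical sketch of such a testcase, mine rather than a submitted one:
with neither x87 nor SSE available, the double addition should go through the
soft-float libgcc routine instead of an x87 instruction.)

/* { dg-do compile { target ia32 } } */
/* { dg-options "-march=lakemont" } */

double
add (double x, double y)
{
  return x + y;
}

/* { dg-final { scan-assembler-not "fadd" } } */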

-- 
H.J.


Re: [PATCH] Fix pr67963

2015-10-14 Thread H.J. Lu
On Wed, Oct 14, 2015 at 8:15 AM, H.J. Lu  wrote:
> On Wed, Oct 14, 2015 at 8:08 AM, Yulia Koval  wrote:
>> Hi,
>>
>> This patch fixes the issue:
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67963
>>
>>   gcc/config/i386/i386.c (ix86_option_override_internal) Disable
>> 80387 mask if lakemont target is set.
>>
>> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
>> index 4c25c9e..db722aa 100644
>> --- a/gcc/config/i386/i386.c
>> +++ b/gcc/config/i386/i386.c
>> @@ -4943,6 +4943,12 @@ ix86_option_override_internal (bool main_args_p,
>>   break;
>>}
>>
>> +  if (!strcmp (opts->x_ix86_arch_string, "lakemont"))
>> +{
>> +  opts->x_target_flags &= ~MASK_80387;
>> +  opts_set->x_target_flags |= MASK_80387;
>> +}
>> +
>>if (TARGET_X32 && (opts->x_ix86_isa_flags & OPTION_MASK_ISA_MPX))
>>  error ("Intel MPX does not support x32");
>>
>> Ok for trunk?
>
> We should add a bit to "struct pta" to indicate availability of the
> 80387 ISA and turn it off for lakemont if the 80387 ISA hasn't been
> turned on explicitly.

Something like

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index a2314e7..1cea58e 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -4348,6 +4348,7 @@ ix86_option_override_internal (bool main_args_p,
   const enum processor_type processor;
   const enum attr_cpu schedule;
   const unsigned HOST_WIDE_INT flags;
+  const unsigned HOST_WIDE_INT mask;
 }
   const processor_alias_table[] =
 {

> We also need some testcases.
>
> --
> H.J.

-- 
H.J.


Re: [PATCH 8/9] Add TARGET_ADDR_SPACE_ZERO_ADDRESS_VALID

2015-10-14 Thread Jeff Law

On 10/13/2015 02:59 PM, Richard Henderson wrote:

On 10/14/2015 02:49 AM, Jeff Law wrote:

The problem here is we don't know what address space the *0 is going
to hit,
right?


Correct, not before we do the walk of stmt to see what's present.
So the address space information isn't part of the address?  I must 
admit I haven't looked at how that stuff is being implemented.





Isn't that also an issue for code generation as well?


What sort of problem are you thinking of?  I haven't seen one yet.
If the address space information was supposed to be carried in the 
address itself, then we'd need the address to be distinct from 
NULL_POINTER_NODE.


It sounds to me like you're carrying address space information outside 
the address itself, which avoids those issues.  However, it does mean 
that the path isolation code needs some kind of adjustment to 
distinguish between *0 that will fault and *0 which hits a different 
address space and may not fault.


jeff


Re: [patch 0/6] scalar-storage-order merge (2)

2015-10-14 Thread Trevor Saunders
On Tue, Oct 13, 2015 at 07:32:08PM +0200, Eric Botcazou wrote:
> > My main question about this series is - how generally useful do you
> > expect it to be? I know of some different projects that would like
> > bi-endian capability, but it looks like this series implements something
> > that is a little too limited to be of use in these cases.
> 
> AdaCore has customers who have been using it for a few years.  With the 
> inline 
> pragma and either the configuration pragma (Ada) or the switch (C/C++), you 
> can use it without much code rewriting.
> 
> > It looks like it comes with a nontrivial maintenance cost.
> 
> Nontrivial but manageable IMO and the heavily modified parts (mostly the RTL 
> expander) are "cold" these days.  I suspect that less "limited" versions 
> would 
> be far more intrusive and less manageable.
> 
> Of course I would do the maintenance (I have been doing it for a few years at 
> AdaCore), except for the C++ front-end that I don't know at all; that's why 
> I'm OK to drop the C++ support for now.

I haven't looked at the C++ changes, but I tend to think it may be
the language where this is the least useful.  I expect it would be
pretty "trivial" to write some wrapper classes that use bswap in
operators so you could say things like struct { uint32_t_be x; }; and
have x stored in big endian.  At which point all you are really missing
is a flag to have all struct members work that way, but rewriting all
struct members of type uint32_t to uint32_t_be should be
straightforward.
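
A minimal sketch of that wrapper-class idea (my illustration, not from the
series; it assumes a little-endian host and GCC's __builtin_bswap32, and the
names are made up):

#include <cstdint>

/* The value is kept big-endian in memory and byte-swapped on every access;
   a big-endian build would omit the swaps.  */
struct uint32_t_be
{
  std::uint32_t raw;   // bytes stored in big-endian order

  uint32_t_be () : raw (0) {}
  uint32_t_be (std::uint32_t v) : raw (__builtin_bswap32 (v)) {}
  operator std::uint32_t () const { return __builtin_bswap32 (raw); }
};

/* A member of this type is laid out big-endian without affecting the rest
   of the enclosing struct.  */
struct packet
{
  uint32_t_be x;
};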

trev



Re: [AArch64] --with-arch in config.gcc support "."

2015-10-14 Thread Andreas Schwab
Jiong Wang  writes:

> diff --git a/gcc/config.gcc b/gcc/config.gcc
> index 5818663..215ad9a 100644
> --- a/gcc/config.gcc
> +++ b/gcc/config.gcc
> @@ -3544,7 +3544,7 @@ case "${target}" in
>  
>   eval "val=\$with_$which"
>   base_val=`echo $val | sed -e 's/\+.*//'`
> - ext_val=`echo $val | sed -e 's/[a-z0-9\-]\+//'`
> + ext_val=`echo $val | sed -e 's/[a-z0-9\.\-]\+//'`

Neither backslash is needed inside bracket expressions (in fact, the
backslash is taken literally here) as the set of special characters is
completely different there.

Andreas.

-- 
Andreas Schwab, SUSE Labs, sch...@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."


Re: [PATCH 8/9] Add TARGET_ADDR_SPACE_ZERO_ADDRESS_VALID

2015-10-14 Thread Jeff Law

On 10/14/2015 03:19 AM, Richard Biener wrote:

On Tue, Oct 13, 2015 at 10:59 PM, Richard Henderson  wrote:

On 10/14/2015 02:49 AM, Jeff Law wrote:


The problem here is we don't know what address space the *0 is going to
hit,
right?



Correct, not before we do the walk of stmt to see what's present.


Isn't that also an issue for code generation as well?



What sort of problem are you thinking of?  I haven't seen one yet.


The actual dereference of course has a properly address-space qualified zero.
OK.  That's the bit I was missing and hinted at in the message I just 
sent -- the address-space information is carried outside the address. 
It seems that we're carrying it in the type, which is probably sensible 
at some level.





Only your walking depends on operand_equal_p to treat different address-space
zero addresses as equal (which they are of course not ...):
And that's the key problem with carrying the address-space information 
outside the address.   We have to look at more than just the raw address 
to determine if we've got a faulting *0 vs an address space qualified *0.





int
operand_equal_p (const_tree arg0, const_tree arg1, unsigned int flags)
{
...
   /* Check equality of integer constants before bailing out due to
  precision differences.  */
   if (TREE_CODE (arg0) == INTEGER_CST && TREE_CODE (arg1) == INTEGER_CST)
 {
   /* Address of INTEGER_CST is not defined; check that we did not forget
  to drop the OEP_ADDRESS_OF/OEP_CONSTANT_ADDRESS_OF flags.  */
   gcc_checking_assert (!(flags
  & (OEP_ADDRESS_OF | OEP_CONSTANT_ADDRESS_OF)));
   return tree_int_cst_equal (arg0, arg1);
 }

but only later we do

   /* We cannot consider pointers to different address space equal.  */
   if (POINTER_TYPE_P (TREE_TYPE (arg0))
   && POINTER_TYPE_P (TREE_TYPE (arg1))
   && (TYPE_ADDR_SPACE (TREE_TYPE (TREE_TYPE (arg0)))
   != TYPE_ADDR_SPACE (TREE_TYPE (TREE_TYPE (arg1)
 return 0;

So "fixing" that would make the walker only look for default
address-space zero dereferences.

Agreed.



I think we need to fix operand_equal_p anyway because 0 is clearly not
equal to 0 (only if they convert to the same literal)
My worry here is we'd be getting onto a slippery slope.  But it may be 
unavoidable.


jeff


Re: [patch 0/6] scalar-storage-order merge (2)

2015-10-14 Thread Jeff Law

On 10/14/2015 09:25 AM, Trevor Saunders wrote:


I haven't looked at the C++ changes, but I tend to think they mat may be
the language where this is the least useful.  I expect it would be
pretty "trivial" to write some wrapper classes that use bswap in
operators so you could say things like struct { uint32_t_be x; }; and
have x stored in big endian.  At which point all you are really missing
is a flag to have all struct members work that way, but rewriting all
struct members of type uint32_t to uint32_t_be should be straight
forward.
I've seen this kind of thing come up in other languages, even some not 
covered by ACT's work.ACT's work does make supporting the extensions 
in those other languages significantly easier.




jeff


Re: [PATCH] Allow FSM to thread single block cases too

2015-10-14 Thread Jan Hubicka
> >>> I think he asked for trivial forward threads though due to repeated
> >>> tests.
> >>> I hacked FRE to do this (I think), but maybe some trivial cleanup
> >>> opportunities
> >>> are still left here.  Honza?

Well, unthreaded jumps quite confuse profile prediction and create profiles
that we can't fix later.  And of course they count in time (and sometimes size)
estimates.

From cases I commonly see it is the usual laziness of repeated tests coming
from early inlining/macro expansion and also C++'s love to introduce

  if (ptr != NULL)
    ptr2 = &ptr->foo;
  else
    ptr2 = NULL;

for instances of multiple inheritance.  Usually ptr is known to be non-NULL.
And also cases where an if is used to check individual cases without having
proper elses.

Honza
> >>
> >>
> >> This or other related patches in the range r228731:228774 has caused a quite
> >> big jump in SPEC CPU 2000 binary sizes (notably 176.gcc - so maybe affecting
> >> bootstrap as well, at -O3).  Are you sure this doesn't re-introduce DOM
> >> effectively peeling all loops once?
> >
> > It's possible.  I've actually got a patch in overnight testing that
> > introduces some of the heuristics to avoid mucking up loops to the FSM bits.
> 
> Like never threading a loop exit test to the loop header (but only to the 
> exit).
> At least if it is the only exit in the loop (but maybe better for all exits).
> 
> Richard.
> 
> > jeff
> >


[gomp4.5] Support for monotonic and nonmonotonic schedule modifiers

2015-10-14 Thread Jakub Jelinek
Hi!

I've created gomp-4_5-branch in svn, where further OpenMP 4.5 development
will happen.

The following patch which I've committed there (and after a while plan to
merge to trunk together with other smaller changes) adds support for
monotonic and nonmonotonic schedule modifiers.  The older versions of the
standard can be read either way for dynamic and guided schedules, whether
the chunks must be given in order or randomly; all current OpenMP
implementations (including libgomp) have monotonic behavior, but for better
scalability at least of dynamic scheduling allowing random order of the
chunks is desirable, so that work-stealing can be used.

On the library side, this patch right now just adds aliases which make it
clear whether the user wants monotonic or nonmonotonic; the static kind as
well as the ordered clause force monotonic, and for now we treat lack of
nonmonotonic as monotonic (which is going to change in 5.0).
Once we have a work-stealing implementation, we can just change the library
side.
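
For reference, a small sketch of the new syntax (loop body and names are mine,
not from the patch):

/* Illustrative only.  The nonmonotonic modifier says chunks of the dynamic
   schedule need not be handed out in increasing order; with this patch it is
   parsed and routed to the new *_nonmonotonic_* entry points, which are for
   now just aliases of the monotonic implementation.  */
void
scale_chunks (float *a, int n)
{
#pragma omp parallel for schedule(nonmonotonic: dynamic, 4)
  for (int i = 0; i < n; i++)
    a[i] *= 2.0f;
}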

2015-10-14  Jakub Jelinek  

* tree-core.h (enum omp_clause_schedule_kind): Add
OMP_CLAUSE_SCHEDULE_MASK, OMP_CLAUSE_SCHEDULE_MONOTONIC,
OMP_CLAUSE_SCHEDULE_NONMONOTONIC and change
OMP_CLAUSE_SCHEDULE_LAST value.
* omp-builtins.def (BUILT_IN_GOMP_LOOP_NONMONOTONIC_DYNAMIC_START,
BUILT_IN_GOMP_LOOP_NONMONOTONIC_GUIDED_START,
BUILT_IN_GOMP_LOOP_NONMONOTONIC_DYNAMIC_NEXT,
BUILT_IN_GOMP_LOOP_NONMONOTONIC_GUIDED_NEXT,
BUILT_IN_GOMP_LOOP_ULL_NONMONOTONIC_DYNAMIC_START,
BUILT_IN_GOMP_LOOP_ULL_NONMONOTONIC_GUIDED_START,
BUILT_IN_GOMP_LOOP_ULL_NONMONOTONIC_DYNAMIC_NEXT,
BUILT_IN_GOMP_LOOP_ULL_NONMONOTONIC_GUIDED_NEXT,
BUILT_IN_GOMP_PARALLEL_LOOP_NONMONOTONIC_DYNAMIC,
BUILT_IN_GOMP_PARALLEL_LOOP_NONMONOTONIC_GUIDED): New built-ins.
* omp-low.c (struct omp_region): Add sched_modifiers field.
(struct omp_for_data): Likewise.
(extract_omp_for_data): Fill in sched_modifiers, and mask out
OMP_CLAUSE_SCHEDULE_KIND bits outside of OMP_CLAUSE_SCHEDULE_MASK
from sched_kind.
(determine_parallel_type): Use only OMP_CLAUSE_SCHEDULE_MASK
bits of OMP_CLAUSE_SCHED_KIND.
(expand_parallel_call): Use nonmonotonic entrypoints for
nonmonotonic: dynamic/guided.
(expand_omp_for): Likewise.  Initialize region->sched_modifiers.
* tree-pretty-print.c (dump_omp_clause): Print schedule clause
modifiers.
gcc/c/
* c-parser.c (c_parser_omp_clause_schedule): Parse schedule
modifiers, diagnose monotonic together with nonmonotonic.
* c-typeck.c (c_finish_omp_clauses): Diagnose nonmonotonic
modifier on kinds other than dynamic or guided or nonmonotonic
modifier together with ordered clause.
gcc/cp/
* parser.c (cp_parser_omp_clause_schedule): Parse schedule
modifiers, diagnose monotonic together with nonmonotonic.
* semantics.c (finish_omp_clauses): Diagnose nonmonotonic
modifier on kinds other than dynamic or guided or nonmonotonic
modifier together with ordered clause.
gcc/testsuite/
* c-c++-common/gomp/schedule-modifiers-1.c: New test.
* gcc.dg/gomp/for-20.c: New test.
* gcc.dg/gomp/for-21.c: New test.
* gcc.dg/gomp/for-22.c: New test.
* gcc.dg/gomp/for-23.c: New test.
* gcc.dg/gomp/for-24.c: New test.
libgomp/
* libgomp.map (GOMP_4.5): Export
GOMP_loop_nonmonotonic_dynamic_next,
GOMP_loop_nonmonotonic_dynamic_start,
GOMP_loop_nonmonotonic_guided_next,
GOMP_loop_nonmonotonic_guided_start,
GOMP_loop_ull_nonmonotonic_dynamic_next,
GOMP_loop_ull_nonmonotonic_dynamic_start,
GOMP_loop_ull_nonmonotonic_guided_next,
GOMP_loop_ull_nonmonotonic_guided_start,
GOMP_parallel_loop_nonmonotonic_dynamic and
GOMP_parallel_loop_nonmonotonic_guided.
* libgomp_g.h (GOMP_loop_nonmonotonic_dynamic_next,
GOMP_loop_nonmonotonic_dynamic_start,
GOMP_loop_nonmonotonic_guided_next,
GOMP_loop_nonmonotonic_guided_start,
GOMP_loop_ull_nonmonotonic_dynamic_next,
GOMP_loop_ull_nonmonotonic_dynamic_start,
GOMP_loop_ull_nonmonotonic_guided_next,
GOMP_loop_ull_nonmonotonic_guided_start,
GOMP_parallel_loop_nonmonotonic_dynamic,
GOMP_parallel_loop_nonmonotonic_guided): New prototypes.
* loop.c (GOMP_parallel_loop_nonmonotonic_dynamic,
GOMP_parallel_loop_nonmonotonic_guided,
GOMP_loop_nonmonotonic_dynamic_start,
GOMP_loop_nonmonotonic_guided_start,
GOMP_loop_nonmonotonic_dynamic_next,
GOMP_loop_nonmonotonic_guided_next): New aliases or functions.
* loop_ull.c (GOMP_loop_ull_nonmonotonic_dynamic_start,
GOMP_loop_ull_nonmonotonic_guided_start,
GOMP_loop_ull_nonmonotonic_dynamic_next,
GOMP_loop_ull_nonmonotonic_guided_next): 

Re: [PATCH] Allow FSM to thread single block cases too

2015-10-14 Thread Jeff Law

On 10/14/2015 09:43 AM, Jan Hubicka wrote:

I think he asked for trivial forward threads though due to repeated
tests.
I hacked FRE to do this (I think), but maybe some trivial cleanup
opportunities
are still left here.  Honza?


Well, unthreaded jumps quite confuse profile prediction and create profiles
that we can't fix later. An of course they count in time (and size sometimes)
estimates.

 From cases I commonly see it is the usual lazyness of repeated tests comming
from early inlining/macro expansion and also C++ love to introduce

   if (ptr != NULL)
 ptr2 = &ptr->foo;
   else
 ptr2 = NULL

for instances of multiple inheritance. usually ptr is known to be non-NULL.
And also cases where if is uses to check individual cases without having proper
esles.
Yea.  I still  see a variety of trivial jump threads lying around early 
in the pipeline.


The nice thing about the backwards walking stuff in this context is we 
can control how hard it looks for jump threads much better.


The difficult thing is it's not currently prepared to find the implicit 
sets from conditionals.  Re-using the ASSERT_EXPR mechanisms from vrp 
may be the solution.  I haven't tried that yet, but it's in the back of 
my mind for solving that class of problems cleanly.




jeff



[patch] Minor adjustment to gimplify_addr_expr

2015-10-14 Thread Eric Botcazou
Hi,

this is the regression of ACATS c37213k at -O2 with an upcoming change in
the front-end of the Ada compiler:

eric@polaris:~/gnat/gnat-head/native> gcc/gnat1 -quiet c37213k.adb -I 
/home/eric/gnat/bugs/support -O2 
+===GNAT BUG DETECTED==+
| Pro 7.4.0w (20151014-60) (x86_64-suse-linux) GCC error:  |
| tree check: expected class 'expression', have  |
| 'exceptional' (ssa_name) in tree_operand_check, at tree.h:3431|
| Error detected around c37213k.adb:95:37

It's recompute_tree_invariant_for_addr_expr receiving an SSA_NAME instead of
an ADDR_EXPR when called from gimplify_addr_expr.  The sequence is as follows:
we start with this GIMPLE statement:

  *R.43_60 = c37213k.B_1.B_6.proc6.B_4.B_5.value (); [static-chain: &FRAME.60] 
[return slot optimization]

Then IPA clones the function and turns the statement into:

  MEM[(struct c37213k__B_1__B_6__proc6__B_4__nrec *)R.43_46] = 
c37213k.B_1.B_6.proc6.B_4.B_5.value (); [static-chain: &FRAME.60] [return slot 
optimization]

The 'value' function has been NRVed and contains:

  __builtin_memcpy (&<retval>, _9, _10);

and gets inlined so the above statement is rewritten into:

__builtin_memcpy (&MEM[(struct c37213k__B_1__B_6__proc6__B_4__nrec *)R.43_46], 
_174, _175);

so gimplify_addr_expr is invoked on:

  &MEM[(struct c37213k__B_1__B_6__proc6__B_4__nrec *)R.43_46]

and gets confused because it doesn't see that it's just R.43_46 (it would have
seen it if the original INDIRECT_REF was still present in lieu of MEM_REF).

Hence the attached fixlet.  Tested on x86_64-suse-linux, OK for the mainline?


2015-10-14  Eric Botcazou  

* gimplify.c (gimplify_addr_expr) <MEM_REF>: New case.

-- 
Eric Botcazou

Index: gimplify.c
===
--- gimplify.c	(revision 228794)
+++ gimplify.c	(working copy)
@@ -4984,6 +4984,12 @@ gimplify_addr_expr (tree *expr_p, gimple
   ret = GS_OK;
   break;
 
+case MEM_REF:
+  if (integer_zerop (TREE_OPERAND (op0, 1)))
+	goto do_indirect_ref;
+
+  /* ... fall through ... */
+
 default:
   /* If we see a call to a declared builtin or see its address
 	 being taken (we can unify those cases here) then we can mark


Re: [vec-cmp, patch 3/6] Vectorize comparison

2015-10-14 Thread Ilya Enkovich
On 14 Oct 15:06, Ilya Enkovich wrote:
> 
> Will send an updated version after testing.
> 
> Thanks,
> Ilya
> 

Here is an updated patch version.

Thanks,
Ilya
--
gcc/

2015-10-14  Ilya Enkovich  

* tree-vect-data-refs.c (vect_get_new_vect_var): Support vect_mask_var.
(vect_create_destination_var): Likewise.
* tree-vect-stmts.c (vectorizable_comparison): New.
(vect_analyze_stmt): Add vectorizable_comparison.
(vect_transform_stmt): Likewise.
* tree-vectorizer.h (enum vect_var_kind): Add vect_mask_var.
(enum stmt_vec_info_type): Add comparison_vec_info_type.
(vectorizable_comparison): New.


diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c
index 8a4d489..0be0523 100644
--- a/gcc/tree-vect-data-refs.c
+++ b/gcc/tree-vect-data-refs.c
@@ -3870,6 +3870,9 @@ vect_get_new_vect_var (tree type, enum vect_var_kind 
var_kind, const char *name)
   case vect_scalar_var:
 prefix = "stmp";
 break;
+  case vect_mask_var:
+prefix = "mask";
+break;
   case vect_pointer_var:
 prefix = "vectp";
 break;
@@ -4424,7 +4427,11 @@ vect_create_destination_var (tree scalar_dest, tree 
vectype)
   tree type;
   enum vect_var_kind kind;
 
-  kind = vectype ? vect_simple_var : vect_scalar_var;
+  kind = vectype
+? VECTOR_BOOLEAN_TYPE_P (vectype)
+? vect_mask_var
+: vect_simple_var
+: vect_scalar_var;
   type = vectype ? vectype : TREE_TYPE (scalar_dest);
 
   gcc_assert (TREE_CODE (scalar_dest) == SSA_NAME);
diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index 23cec8a..6a52895 100644
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -7516,6 +7516,192 @@ vectorizable_condition (gimple *stmt, 
gimple_stmt_iterator *gsi,
   return true;
 }
 
+/* vectorizable_comparison.
+
+   Check if STMT is comparison expression that can be vectorized.
+   If VEC_STMT is also passed, vectorize the STMT: create a vectorized
+   comparison, put it in VEC_STMT, and insert it at GSI.
+
+   Return FALSE if not a vectorizable STMT, TRUE otherwise.  */
+
+bool
+vectorizable_comparison (gimple *stmt, gimple_stmt_iterator *gsi,
+gimple **vec_stmt, tree reduc_def,
+slp_tree slp_node)
+{
+  tree lhs, rhs1, rhs2;
+  stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
+  tree vectype1 = NULL_TREE, vectype2 = NULL_TREE;
+  tree vectype = STMT_VINFO_VECTYPE (stmt_info);
+  tree vec_rhs1 = NULL_TREE, vec_rhs2 = NULL_TREE;
+  tree vec_compare;
+  tree new_temp;
+  loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
+  tree def;
+  enum vect_def_type dts[2] = {vect_unknown_def_type, vect_unknown_def_type};
+  unsigned nunits;
+  int ncopies;
+  enum tree_code code;
+  stmt_vec_info prev_stmt_info = NULL;
+  int i, j;
+  bb_vec_info bb_vinfo = STMT_VINFO_BB_VINFO (stmt_info);
+  vec<tree> vec_oprnds0 = vNULL;
+  vec<tree> vec_oprnds1 = vNULL;
+  gimple *def_stmt;
+  tree mask_type;
+  tree mask;
+
+  if (!VECTOR_BOOLEAN_TYPE_P (vectype))
+return false;
+
+  mask_type = vectype;
+  nunits = TYPE_VECTOR_SUBPARTS (vectype);
+
+  if (slp_node || PURE_SLP_STMT (stmt_info))
+ncopies = 1;
+  else
+ncopies = LOOP_VINFO_VECT_FACTOR (loop_vinfo) / nunits;
+
+  gcc_assert (ncopies >= 1);
+  if (!STMT_VINFO_RELEVANT_P (stmt_info) && !bb_vinfo)
+return false;
+
+  if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_internal_def
+  && !(STMT_VINFO_DEF_TYPE (stmt_info) == vect_nested_cycle
+  && reduc_def))
+return false;
+
+  if (STMT_VINFO_LIVE_P (stmt_info))
+{
+  if (dump_enabled_p ())
+   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+"value used after loop.\n");
+  return false;
+}
+
+  if (!is_gimple_assign (stmt))
+return false;
+
+  code = gimple_assign_rhs_code (stmt);
+
+  if (TREE_CODE_CLASS (code) != tcc_comparison)
+return false;
+
+  rhs1 = gimple_assign_rhs1 (stmt);
+  rhs2 = gimple_assign_rhs2 (stmt);
+
+  if (!vect_is_simple_use_1 (rhs1, stmt, stmt_info->vinfo,
+&def_stmt, &def, &dts[0], &vectype1))
+return false;
+
+  if (!vect_is_simple_use_1 (rhs2, stmt, stmt_info->vinfo,
+&def_stmt, &def, &dts[1], &vectype2))
+   return false;
+
+  if (vectype1 && vectype2
+  && TYPE_VECTOR_SUBPARTS (vectype1) != TYPE_VECTOR_SUBPARTS (vectype2))
+return false;
+
+  vectype = vectype1 ? vectype1 : vectype2;
+
+  /* Invariant comparison.  */
+  if (!vectype)
+{
+  vectype = build_vector_type (TREE_TYPE (rhs1), nunits);
+  if (tree_to_shwi (TYPE_SIZE_UNIT (vectype)) != current_vector_size)
+   return false;
+}
+  else if (nunits != TYPE_VECTOR_SUBPARTS (vectype))
+return false;
+
+  if (!vec_stmt)
+{
+  STMT_VINFO_TYPE (stmt_info) = comparison_vec_info_type;
+  vect_model_simple_cost (stmt_info, ncopies, dts, NULL, NULL);
+  return expand_vec_cmp_expr_p (vectype, mask_type);
+}
+
+  /* Transform.  */
+  i

Re: [vec-cmp, patch 4/6] Support vector mask invariants

2015-10-14 Thread Ilya Enkovich
On 14 Oct 13:50, Ilya Enkovich wrote:
> 2015-10-14 11:49 GMT+03:00 Richard Biener :
> > On Tue, Oct 13, 2015 at 4:52 PM, Ilya Enkovich  
> > wrote:
> >> I don't understand what you mean. vect_get_vec_def_for_operand has two
> >> changes made.
> >> 1. For boolean invariants use build_same_sized_truth_vector_type
> >> instead of get_vectype_for_scalar_type in case statement produces a
> >> boolean vector. This covers cases when we use invariants in
> >> comparison, AND, IOR, XOR.
> >
> > Yes, I understand we need this special-casing to differentiate between
> > the vector type
> > used for boolean-typed loads/stores and the type for boolean typed 
> > constants.
> > What happens if we mix them btw, like with
> >
> >   _Bool b = bools[i];
> >   _Bool c = b || d;
> >   ...
> >
> > ?
> 
> Here both statements should get vector of char as a vectype and we
> never go VECTOR_BOOLEAN_TYPE_P way for them
> 
> >
> >> 2. COND_EXPR is an exception because it has built-in boolean vector
> >> result not reflected in its vecinfo. Thus I added additional operand
> >> for vect_get_vec_def_for_operand to directly specify vectype for
> >> vector definition in case it is a loop invariant.
> >> So what do you propose to do with these changes?
> >
> > This is the change I don't like and don't see why we need it.  It works 
> > today
> > and the comparison operands should be of appropriate type already?
> 
> Today it works because we always create vector of integer constant.
> With boolean vectors it may be either integer vector or boolean vector
> depending on context. Consider:
> 
> _Bool _1;
> int _2;
> 
> _2 = _1 != 0 ? 0 : 1
> 
> We have two zero constants here requiring different vectypes.
> 
> Ilya
> 
> >
> > Richard.
> >
> >> Thanks,
> >> Ilya

Here is an updated patch version.

Thanks,
Ilya
--
gcc/

2015-10-14  Ilya Enkovich  

* expr.c (const_vector_mask_from_tree): New.
(const_vector_from_tree): Use const_vector_mask_from_tree
for boolean vectors.
* tree-vect-stmts.c (vect_init_vector): Support boolean vector
invariants.
(vect_get_vec_def_for_operand): Add VECTYPE arg.
(vectorizable_condition): Directly provide vectype for invariants
used in comparison.
* tree-vectorizer.h (vect_get_vec_def_for_operand): Add VECTYPE
arg.


diff --git a/gcc/expr.c b/gcc/expr.c
index b5ff598..ab25d1a 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -11344,6 +11344,40 @@ try_tablejump (tree index_type, tree index_expr, tree 
minval, tree range,
   return 1;
 }
 
+/* Return a CONST_VECTOR rtx representing vector mask for
+   a VECTOR_CST of booleans.  */
+static rtx
+const_vector_mask_from_tree (tree exp)
+{
+  rtvec v;
+  unsigned i;
+  int units;
+  tree elt;
+  machine_mode inner, mode;
+
+  mode = TYPE_MODE (TREE_TYPE (exp));
+  units = GET_MODE_NUNITS (mode);
+  inner = GET_MODE_INNER (mode);
+
+  v = rtvec_alloc (units);
+
+  for (i = 0; i < VECTOR_CST_NELTS (exp); ++i)
+{
+  elt = VECTOR_CST_ELT (exp, i);
+
+  gcc_assert (TREE_CODE (elt) == INTEGER_CST);
+  if (integer_zerop (elt))
+   RTVEC_ELT (v, i) = CONST0_RTX (inner);
+  else if (integer_onep (elt)
+  || integer_minus_onep (elt))
+   RTVEC_ELT (v, i) = CONSTM1_RTX (inner);
+  else
+   gcc_unreachable ();
+}
+
+  return gen_rtx_CONST_VECTOR (mode, v);
+}
+
 /* Return a CONST_VECTOR rtx for a VECTOR_CST tree.  */
 static rtx
 const_vector_from_tree (tree exp)
@@ -11359,6 +11393,9 @@ const_vector_from_tree (tree exp)
   if (initializer_zerop (exp))
 return CONST0_RTX (mode);
 
+  if (VECTOR_BOOLEAN_TYPE_P (TREE_TYPE (exp)))
+return const_vector_mask_from_tree (exp);
+
   units = GET_MODE_NUNITS (mode);
   inner = GET_MODE_INNER (mode);
 
diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index 6a52895..01168ae 100644
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -1308,7 +1308,22 @@ vect_init_vector (gimple *stmt, tree val, tree type, 
gimple_stmt_iterator *gsi)
   if (!types_compatible_p (TREE_TYPE (type), TREE_TYPE (val)))
{
  if (CONSTANT_CLASS_P (val))
-   val = fold_unary (VIEW_CONVERT_EXPR, TREE_TYPE (type), val);
+   {
+ /* Can't use VIEW_CONVERT_EXPR for booleans because
+of possibly different sizes of scalar value and
+vector element.  */
+ if (VECTOR_BOOLEAN_TYPE_P (type))
+   {
+ if (integer_zerop (val))
+   val = build_int_cst (TREE_TYPE (type), 0);
+ else if (integer_onep (val))
+   val = build_int_cst (TREE_TYPE (type), 1);
+ else
+   gcc_unreachable ();
+   }
+ else
+   val = fold_unary (VIEW_CONVERT_EXPR, TREE_TYPE (type), val);
+   }
  else
{
  new_temp = make_ssa_name (TREE_TYPE (type));
@@ -1339,16 +1354,19 @@ vect_init_vector (gimple *st

[PATCH] PR middle-end/67220: GCC fails to properly handle libcall symbol visibility of built functions

2015-10-14 Thread H.J. Lu
By default, there is no visibility on builtin functions.  When there is
explicitly declared visibility on the C library function which a builtin
function falls back on, we should honor the explicit visibility of the
C library function.

There are 2 issues:

1. We never update the visibility of the fallback C library function.
2. init_block_move_fn and init_block_clear_fn, which are used to implement
the memcpy and memset builtins, generate the library call to memcpy and
memset directly without checking whether there is explicitly declared
visibility on them.

This patch updates the builtin function with the explicit visibility and
checks the visibility of the memcpy/memset builtins when generating the
library call.
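
A reduced illustration of the scenario (this is only an illustrative
sketch, not one of the pr67220-*.c testcases below; the hidden-visibility
declaration is the assumption here):

  /* memcpy declared with explicit hidden visibility; the block move
     emitted for the structure copy should pick up that visibility
     instead of going through the PLT.  */
  typedef __SIZE_TYPE__ size_t;
  extern void *memcpy (void *, const void *, size_t)
    __attribute__ ((visibility ("hidden")));

  struct big { char c[1024]; };

  void
  copy (struct big *d, const struct big *s)
  {
    *d = *s;  /* may be expanded as a library call built by
                 init_block_move_fn  */
  }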

Tested on Linux/x86-64 without regressions.  OK for trunk?


H.J.
---
gcc/c/

PR middle-end/67220
* c-decl.c (diagnose_mismatched_decls): Copy explicit visibility
to builtin function.

gcc/

PR middle-end/67220
* expr.c (init_block_move_fn): Copy visibility from the builtin
memcpy.
(init_block_clear_fn): Copy visibility from the builtin memset.

gcc/testsuite/

PR middle-end/67220
* gcc.target/i386/pr67220-1.c: New test.
* gcc.target/i386/pr67220-2.c: Likewise.
* gcc.target/i386/pr67220-3.c: Likewise.
* gcc.target/i386/pr67220-4.c: Likewise.
* gcc.target/i386/pr67220-5.c: Likewise.
* gcc.target/i386/pr67220-6.c: Likewise.
---
 gcc/c/c-decl.c| 21 +
 gcc/expr.c| 12 ++--
 gcc/testsuite/gcc.target/i386/pr67220-1.c | 15 +++
 gcc/testsuite/gcc.target/i386/pr67220-2.c | 15 +++
 gcc/testsuite/gcc.target/i386/pr67220-3.c | 15 +++
 gcc/testsuite/gcc.target/i386/pr67220-4.c | 15 +++
 gcc/testsuite/gcc.target/i386/pr67220-5.c | 14 ++
 gcc/testsuite/gcc.target/i386/pr67220-6.c | 14 ++
 8 files changed, 115 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr67220-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr67220-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr67220-3.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr67220-4.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr67220-5.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr67220-6.c

diff --git a/gcc/c/c-decl.c b/gcc/c/c-decl.c
index ce8406a..26460eb 100644
--- a/gcc/c/c-decl.c
+++ b/gcc/c/c-decl.c
@@ -2232,11 +2232,24 @@ diagnose_mismatched_decls (tree newdecl, tree olddecl,
   /* warnings */
   /* All decls must agree on a visibility.  */
   if (CODE_CONTAINS_STRUCT (TREE_CODE (newdecl), TS_DECL_WITH_VIS)
-  && DECL_VISIBILITY_SPECIFIED (newdecl) && DECL_VISIBILITY_SPECIFIED 
(olddecl)
-  && DECL_VISIBILITY (newdecl) != DECL_VISIBILITY (olddecl))
+  && DECL_VISIBILITY_SPECIFIED (newdecl))
 {
-  warned |= warning (0, "redeclaration of %q+D with different visibility "
-"(old visibility preserved)", newdecl);
+  if (DECL_VISIBILITY_SPECIFIED (olddecl))
+   {
+ if (DECL_VISIBILITY (newdecl) != DECL_VISIBILITY (olddecl))
+   warned |= warning (0, "redeclaration of %q+D with different "
+  "visibility (old visibility preserved)",
+  newdecl);
+   }
+  else if (TREE_CODE (olddecl) == FUNCTION_DECL
+  && DECL_BUILT_IN (olddecl))
+   {
+ enum built_in_function fncode = DECL_FUNCTION_CODE (olddecl);
+ tree fndecl = builtin_decl_explicit (fncode);
+ gcc_assert (fndecl && !DECL_VISIBILITY_SPECIFIED (fndecl));
+ DECL_VISIBILITY (fndecl) = DECL_VISIBILITY (newdecl);
+ DECL_VISIBILITY_SPECIFIED (fndecl) = 1;
+   }
 }
 
   if (TREE_CODE (newdecl) == FUNCTION_DECL)
diff --git a/gcc/expr.c b/gcc/expr.c
index 595324d..a12db96 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -1390,7 +1390,11 @@ init_block_move_fn (const char *asmspec)
   TREE_PUBLIC (fn) = 1;
   DECL_ARTIFICIAL (fn) = 1;
   TREE_NOTHROW (fn) = 1;
-  DECL_VISIBILITY (fn) = VISIBILITY_DEFAULT;
+  tree fndecl = builtin_decl_explicit (BUILT_IN_MEMCPY);
+  if (fndecl)
+   DECL_VISIBILITY (fn) = DECL_VISIBILITY (fndecl);
+  else
+   DECL_VISIBILITY (fn) = VISIBILITY_DEFAULT;
   DECL_VISIBILITY_SPECIFIED (fn) = 1;
 
   attr_args = build_tree_list (NULL_TREE, build_string (1, "1"));
@@ -2846,7 +2850,11 @@ init_block_clear_fn (const char *asmspec)
   TREE_PUBLIC (fn) = 1;
   DECL_ARTIFICIAL (fn) = 1;
   TREE_NOTHROW (fn) = 1;
-  DECL_VISIBILITY (fn) = VISIBILITY_DEFAULT;
+  tree fndecl = builtin_decl_explicit (BUILT_IN_MEMSET);
+  if (fndecl)
+   DECL_VISIBILITY (fn) = DECL_VISIBILITY (fndecl);
+  else
+   DECL_VISIBILITY (fn) = VISIBILITY_DEFAULT;
   DECL_VISIBILITY_SPECIFIED (fn) = 1;
 
   block_clear_fn = fn;
diff --git a/gcc/testsuite/gcc.target/i

Re: [PATCH 5/7] Libsanitizer merge from upstream r249633.

2015-10-14 Thread Maxim Ostapenko

On 14/10/15 10:37, Jakub Jelinek wrote:

On Tue, Oct 13, 2015 at 02:20:06PM +0300, Maxim Ostapenko wrote:

This patch removes UBSan stubs from ASan and TSan code. We don't embed UBSan
into ASan and TSan because that would lead to undefined references to C++
stuff when linking with -static-libasan. AFAIK, sanitizer developers use
different libraries for C and CXX runtimes, but I think this is out of scope
of this merge.

Where is CAN_SANITIZE_UB defined?  I don't see it anywhere in the current
libsanitizer and in the patch only:
grep CAN_SANITIZE_UB libsanitizer-249633-2.diff
+#if CAN_SANITIZE_UB
+# define TSAN_CONTAINS_UBSAN (CAN_SANITIZE_UB && !defined(SANITIZER_GO))
+#if CAN_SANITIZE_UB
+#endif  // CAN_SANITIZE_UB
+#if CAN_SANITIZE_UB
+#endif  // CAN_SANITIZE_UB
+#if CAN_SANITIZE_UB
+#endif  // CAN_SANITIZE_UB
+#if CAN_SANITIZE_UB
+#endif  // CAN_SANITIZE_UB
+#if CAN_SANITIZE_UB
+#endif  // CAN_SANITIZE_UB
+#if CAN_SANITIZE_UB
+#endif  // CAN_SANITIZE_UB
+#if CAN_SANITIZE_UB
+#endif  // CAN_SANITIZE_UB


Hm, this is strange, perhaps the patch was malformed.



So, unless I'm missing something, it would be best to arrange for
-DCAN_SANITIZE_UB=1 to be in CXXFLAGS for ubsan/ source files and
-DCAN_SANITIZE_UB=0 to be in CXXFLAGS for {a,t}san/ source files?


The CAN_SANITIZE_UB definition is hardcoded into the new
ubsan/ubsan_platform.h file. To use -DCAN_SANITIZE_UB from CXXFLAGS, we
still need some changes in libsanitizer against upstream:


Index: libsanitizer/ubsan/ubsan_platform.h
===
--- libsanitizer/ubsan/ubsan_platform.h(revision 250295)
+++ libsanitizer/ubsan/ubsan_platform.h(working copy)
@@ -13,6 +13,7 @@
 #ifndef UBSAN_PLATFORM_H
 #define UBSAN_PLATFORM_H

+#ifndef CAN_SANITIZE_UB
 // Other platforms should be easy to add, and probably work as-is.
 #if (defined(__linux__) || defined(__FreeBSD__) || defined(__APPLE__)) 
&& \

 (defined(__x86_64__) || defined(__i386__) || defined(__arm__) || \
@@ -23,5 +24,6 @@
 #else
 # define CAN_SANITIZE_UB 0
 #endif
+#endif // CAN_SANITIZE_UB

 #endif


Are there any other defines that are supposedly set from cmake or wherever
upstream and are left undefined?


There is the ASAN_DYNAMIC macro, but I see it in the current libsanitizer
too and it's not touched in any Makefile. Same for
ASAN_DYNAMIC_RUNTIME_THUNK, which is used for the Windows build, and
ASAN_LOW_MEMORY, which is set explicitly only for Android. Do we need to
touch them?
Also, ASAN_FLEXIBLE_MAPPING_AND_OFFSET was bumped upstream, so we don't 
need it anymore.


I'm applying the patch mentioned above, redefining CAN_SANITIZE_UB in 
corresponding Makefiles, dropping ASAN_FLEXIBLE_MAPPING_AND_OFFSET and 
resending libsanitizer-249633-2.diff in corresponding thread.

2015-10-13  Maxim Ostapenko  

* tsan/tsan_defs.h: Define TSAN_CONTAINS_UBSAN to 0.
* asan/asan_flags.cc (InitializeFlags): Do not initialize UBSan flags.
* asan/asan_rtl.cc (AsanInitInternal): Do not init UBSan.

Jakub





Handle CONSTRUCTOR in operand_equal_p

2015-10-14 Thread Jan Hubicka
Hi,
this patch adds the CONSTRUCTOR case discussed a while back.  Only empty
constructors are matched, as those are the only ones appearing in gimple
operands.  I tested that during bootstrap about 7500 matches are for empty
ctors.  There are a couple hundred for non-empty ones, probably used on
GENERIC.

Bootstrapped/regtested x86_64-linux, OK?

Honza

* fold-const.c (operand_equal_p): Match empty constructors.
Index: fold-const.c
===
--- fold-const.c(revision 228735)
+++ fold-const.c(working copy)
@@ -2890,6 +2891,11 @@ operand_equal_p (const_tree arg0, const_
return operand_equal_p (TREE_OPERAND (arg0, 0), TREE_OPERAND (arg1, 0),
flags | OEP_ADDRESS_OF
| OEP_CONSTANT_ADDRESS_OF);
+  case CONSTRUCTOR:
+   /* In GIMPLE empty constructors are allowed in initializers of
+  vector types.  */
+   return (!vec_safe_length (CONSTRUCTOR_ELTS (arg0))
+   && !vec_safe_length (CONSTRUCTOR_ELTS (arg1)));
   default:
break;
   }


Add VIEW_CONVERT_EXPR to operand_equal_p

2015-10-14 Thread Jan Hubicka
Hi,
this patch adds VIEW_CONVERT_EXPR, which is another tree code omitted from
operand_equal_p.  During bootstrap there are about 1000 matches.

Bootstrapped/regtested x86_64-linux, OK?

Honza

* fold-const.c (operand_equal_p): Handle VIEW_CONVERT_EXPR.
Index: fold-const.c
===
--- fold-const.c(revision 228735)
+++ fold-const.c(working copy)
@@ -2962,6 +2968,12 @@ operand_equal_p (const_tree arg0, const_
case IMAGPART_EXPR:
  return OP_SAME (0);
 
+   case VIEW_CONVERT_EXPR:
+ if (!(flags & (OEP_ADDRESS_OF | OEP_CONSTANT_ADDRESS_OF))
+ && !types_compatible_p (TREE_TYPE (arg0), TREE_TYPE (arg1)))
+   return false;
+ return OP_SAME (0);
+
case TARGET_MEM_REF:
case MEM_REF:
  if (!(flags & (OEP_ADDRESS_OF | OEP_CONSTANT_ADDRESS_OF)))


Re: [PATCH] PR middle-end/67220: GCC fails to properly handle libcall symbol visibility of built functions

2015-10-14 Thread Ramana Radhakrishnan
On Wed, Oct 14, 2015 at 5:21 PM, H.J. Lu  wrote:

> ---
> gcc/c/
>
> PR middle-end/67220
> * c-decl.c (diagnose_mismatched_decls): Copy explicit visibility
> to builtin function.
>
> gcc/
>
> PR middle-end/67220
> * expr.c (init_block_move_fn): Copy visibility from the builtin
> memcpy.
> (init_block_clear_fn): Copy visibility from the builtin memset.
>
> gcc/testsuite/
>
> PR middle-end/67220
> * gcc.target/i386/pr67220-1.c: New test.
> * gcc.target/i386/pr67220-2.c: Likewise.
> * gcc.target/i386/pr67220-3.c: Likewise.
> * gcc.target/i386/pr67220-4.c: Likewise.
> * gcc.target/i386/pr67220-5.c: Likewise.
> * gcc.target/i386/pr67220-6.c: Likewise.

Why aren't these tests in gcc.dg ?  The problem affects all targets
not just x86.


Thanks,
Ramana

> ---
>  gcc/c/c-decl.c| 21 +
>  gcc/expr.c| 12 ++--
>  gcc/testsuite/gcc.target/i386/pr67220-1.c | 15 +++
>  gcc/testsuite/gcc.target/i386/pr67220-2.c | 15 +++
>  gcc/testsuite/gcc.target/i386/pr67220-3.c | 15 +++
>  gcc/testsuite/gcc.target/i386/pr67220-4.c | 15 +++
>  gcc/testsuite/gcc.target/i386/pr67220-5.c | 14 ++
>  gcc/testsuite/gcc.target/i386/pr67220-6.c | 14 ++
>  8 files changed, 115 insertions(+), 6 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr67220-1.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr67220-2.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr67220-3.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr67220-4.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr67220-5.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr67220-6.c
>
> diff --git a/gcc/c/c-decl.c b/gcc/c/c-decl.c
> index ce8406a..26460eb 100644
> --- a/gcc/c/c-decl.c
> +++ b/gcc/c/c-decl.c
> @@ -2232,11 +2232,24 @@ diagnose_mismatched_decls (tree newdecl, tree olddecl,
>/* warnings */
>/* All decls must agree on a visibility.  */
>if (CODE_CONTAINS_STRUCT (TREE_CODE (newdecl), TS_DECL_WITH_VIS)
> -  && DECL_VISIBILITY_SPECIFIED (newdecl) && DECL_VISIBILITY_SPECIFIED 
> (olddecl)
> -  && DECL_VISIBILITY (newdecl) != DECL_VISIBILITY (olddecl))
> +  && DECL_VISIBILITY_SPECIFIED (newdecl))
>  {
> -  warned |= warning (0, "redeclaration of %q+D with different visibility 
> "
> -"(old visibility preserved)", newdecl);
> +  if (DECL_VISIBILITY_SPECIFIED (olddecl))
> +   {
> + if (DECL_VISIBILITY (newdecl) != DECL_VISIBILITY (olddecl))
> +   warned |= warning (0, "redeclaration of %q+D with different "
> +  "visibility (old visibility preserved)",
> +  newdecl);
> +   }
> +  else if (TREE_CODE (olddecl) == FUNCTION_DECL
> +  && DECL_BUILT_IN (olddecl))
> +   {
> + enum built_in_function fncode = DECL_FUNCTION_CODE (olddecl);
> + tree fndecl = builtin_decl_explicit (fncode);
> + gcc_assert (fndecl && !DECL_VISIBILITY_SPECIFIED (fndecl));
> + DECL_VISIBILITY (fndecl) = DECL_VISIBILITY (newdecl);
> + DECL_VISIBILITY_SPECIFIED (fndecl) = 1;
> +   }
>  }
>
>if (TREE_CODE (newdecl) == FUNCTION_DECL)
> diff --git a/gcc/expr.c b/gcc/expr.c
> index 595324d..a12db96 100644
> --- a/gcc/expr.c
> +++ b/gcc/expr.c
> @@ -1390,7 +1390,11 @@ init_block_move_fn (const char *asmspec)
>TREE_PUBLIC (fn) = 1;
>DECL_ARTIFICIAL (fn) = 1;
>TREE_NOTHROW (fn) = 1;
> -  DECL_VISIBILITY (fn) = VISIBILITY_DEFAULT;
> +  tree fndecl = builtin_decl_explicit (BUILT_IN_MEMCPY);
> +  if (fndecl)
> +   DECL_VISIBILITY (fn) = DECL_VISIBILITY (fndecl);
> +  else
> +   DECL_VISIBILITY (fn) = VISIBILITY_DEFAULT;
>DECL_VISIBILITY_SPECIFIED (fn) = 1;
>
>attr_args = build_tree_list (NULL_TREE, build_string (1, "1"));
> @@ -2846,7 +2850,11 @@ init_block_clear_fn (const char *asmspec)
>TREE_PUBLIC (fn) = 1;
>DECL_ARTIFICIAL (fn) = 1;
>TREE_NOTHROW (fn) = 1;
> -  DECL_VISIBILITY (fn) = VISIBILITY_DEFAULT;
> +  tree fndecl = builtin_decl_explicit (BUILT_IN_MEMSET);
> +  if (fndecl)
> +   DECL_VISIBILITY (fn) = DECL_VISIBILITY (fndecl);
> +  else
> +   DECL_VISIBILITY (fn) = VISIBILITY_DEFAULT;
>DECL_VISIBILITY_SPECIFIED (fn) = 1;
>
>block_clear_fn = fn;
> diff --git a/gcc/testsuite/gcc.target/i386/pr67220-1.c 
> b/gcc/testsuite/gcc.target/i386/pr67220-1.c
> new file mode 100644
> index 000..06af0ed
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr67220-1.c
> @@ -0,0 +1,15 @@
> +/* { dg-do compile { target fpic } } */
> +/* { dg-options "-O2 -fPIC" } */
> +
> +typedef __SIZE_TYPE__ size_t;
> +extern void *memcpy (void *, const void *, size_t);
> +extern void *memcpy (void *

PR67945: Fix oscillation between pow representations

2015-10-14 Thread Richard Sandiford
This patch fixes some fallout from my patch to move the sqrt and cbrt
folding rules to match.pd.  The rules included canonicalisations like:

   sqrt(sqrt(x))->pow(x,1/4)

which in the original code was only ever done at the generic level.
My patch meant that we'd do it whenever we tried to fold a gimple
statement, and eventually it would win over the sincos optimisation
that replaces pow(x,1/4) with sqrt(sqrt(x)).
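
For illustration, the kind of input involved (just a sketch; the flags are
as I understand them, -O2 with -funsafe-math-optimizations or -ffast-math):

  #include <math.h>

  /* match.pd canonicalizes the nested calls to pow (x, 0.25); the
     cse_sincos pass then expands that pow back into sqrt (sqrt (x))
     because it is cheaper, and without a guard later folding would
     turn it into pow again.  */
  double
  quartic_root (double x)
  {
    return sqrt (sqrt (x));
  }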

Following a suggestion from Richard B, the patch adds a new
PROP_gimple_* flag to say whether fp routines have been optimised
for the target.  If so, match.pd should only transform calls to math
functions if the result is actually an optimisation, not just an
IL simplification or canonicalisation.  The question then of course
is: which rules are which?  I've added block comments that describe
the criteria I was using.

A slight wart is that we need to use the cfun global to access
the PROP_gimple_* flag; there's no local function pointer available.

Bootstrapped & regression-tested on x86_64-linux-gnu.  Also tested
on powerpc64-linux-gnu.  OK to install?

Thanks,
Richard


gcc/
PR tree-optimization/67945
* tree-pass.h (PROP_gimple_opt_math): New property flag.
* generic-match-head.c (canonicalize_math_p): New function.
* gimple-match-head.c: Include tree-pass.h.
(canonicalize_math_p): New function.
* match.pd: Group math built-in rules into simplifications
and canonicalizations.  Guard the latter with canonicalize_math_p.
* tree-ssa-math-opts.c (pass_data_cse_sincos): Provide the
PROP_gimple_opt_math property.

diff --git a/gcc/generic-match-head.c b/gcc/generic-match-head.c
index 0a7038d..94135b3 100644
--- a/gcc/generic-match-head.c
+++ b/gcc/generic-match-head.c
@@ -73,3 +73,12 @@ single_use (tree t ATTRIBUTE_UNUSED)
 {
   return true;
 }
+
+/* Return true if math operations should be canonicalized,
+   e.g. sqrt(sqrt(x)) -> pow(x, 0.25).  */
+
+static inline bool
+canonicalize_math_p ()
+{
+  return true;
+}
diff --git a/gcc/gimple-match-head.c b/gcc/gimple-match-head.c
index cab77a4..f29e97f 100644
--- a/gcc/gimple-match-head.c
+++ b/gcc/gimple-match-head.c
@@ -48,6 +48,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "target.h"
 #include "cgraph.h"
 #include "gimple-match.h"
+#include "tree-pass.h"
 
 
 /* Forward declarations of the private auto-generated matchers.
@@ -827,3 +828,12 @@ single_use (tree t)
 {
   return TREE_CODE (t) != SSA_NAME || has_zero_uses (t) || has_single_use (t);
 }
+
+/* Return true if math operations should be canonicalized,
+   e.g. sqrt(sqrt(x)) -> pow(x, 0.25).  */
+
+static inline bool
+canonicalize_math_p ()
+{
+  return !cfun || (cfun->curr_properties & PROP_gimple_opt_math) == 0;
+}
diff --git a/gcc/match.pd b/gcc/match.pd
index 655c9ff..d319441 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -2134,11 +2134,25 @@ along with GCC; see the file COPYING3.  If not see
clearly less optimal and which we'll transform again in forwprop.  */
 
 
-/* Simplification of math builtins.  */
+/* Simplification of math builtins.  These rules must all be optimizations
+   as well as IL simplifications.  If there is a possibility that the new
+   form could be a pessimization, the rule should go in the canonicalization
+   section that follows this one.
 
-/* fold_builtin_logarithm */
-(if (flag_unsafe_math_optimizations)
+   Rules can generally go in this section if they satisfy one of
+   the following:
+
+   - the rule describes an identity
+
+   - the rule replaces calls with something as simple as addition or
+ multiplication
+
+   - the rule contains unary calls only and simplifies the surrounding
+ arithmetic.  (The idea here is to exclude non-unary calls in which
+ one operand is constant and in which the call is known to be cheap
+ when the operand has that value.)  */
 
+(if (flag_unsafe_math_optimizations)
  /* Simplify sqrt(x) * sqrt(x) -> x.  */
  (simplify
   (mult (SQRT@1 @0) @1)
@@ -2151,63 +2165,12 @@ along with GCC; see the file COPYING3.  If not see
(mult (root:s @0) (root:s @1))
 (root (mult @0 @1
 
- /* Simplify pow(x,y) * pow(x,z) -> pow(x,y+z). */
- (simplify
-  (mult (POW:s @0 @1) (POW:s @0 @2))
-   (POW @0 (plus @1 @2)))
-
- /* Simplify pow(x,y) * pow(z,y) -> pow(x*z,y). */
- (simplify
-  (mult (POW:s @0 @1) (POW:s @2 @1))
-   (POW (mult @0 @2) @1))
-
  /* Simplify expN(x) * expN(y) -> expN(x+y). */
  (for exps (EXP EXP2 EXP10 POW10)
   (simplify
(mult (exps:s @0) (exps:s @1))
 (exps (plus @0 @1
 
- /* Simplify tan(x) * cos(x) -> sin(x). */
- (simplify
-  (mult:c (TAN:s @0) (COS:s @0))
-   (SIN @0))
-
- /* Simplify x * pow(x,c) -> pow(x,c+1). */
- (simplify
-  (mult @0 (POW:s @0 REAL_CST@1))
-  (if (!TREE_OVERFLOW (@1))
-   (POW @0 (plus @1 { build_one_cst (type); }
-
- /* Simplify sin(x) / cos(x) -> tan(x). */
- (simplify
-  (rdiv (SIN:s @0) (COS:s @0))
-   (TAN @0))
-
- /* Simplify cos(x) / sin(x) -> 1 / tan(x). *

Re: [PATCH] PR middle-end/67220: GCC fails to properly handle libcall symbol visibility of built functions

2015-10-14 Thread H.J. Lu
On Wed, Oct 14, 2015 at 9:46 AM, Ramana Radhakrishnan
 wrote:
> On Wed, Oct 14, 2015 at 5:21 PM, H.J. Lu  wrote:
>
>> ---
>> gcc/c/
>>
>> PR middle-end/67220
>> * c-decl.c (diagnose_mismatched_decls): Copy explicit visibility
>> to builtin function.
>>
>> gcc/
>>
>> PR middle-end/67220
>> * expr.c (init_block_move_fn): Copy visibility from the builtin
>> memcpy.
>> (init_block_clear_fn): Copy visibility from the builtin memset.
>>
>> gcc/testsuite/
>>
>> PR middle-end/67220
>> * gcc.target/i386/pr67220-1.c: New test.
>> * gcc.target/i386/pr67220-2.c: Likewise.
>> * gcc.target/i386/pr67220-3.c: Likewise.
>> * gcc.target/i386/pr67220-4.c: Likewise.
>> * gcc.target/i386/pr67220-5.c: Likewise.
>> * gcc.target/i386/pr67220-6.c: Likewise.
>
> Why aren't these tests in gcc.dg ?  The problem affects all targets
> not just x86.
>

If I move tests to gcc.dg, would you mind updating them to verify
that they pass on arm?


-- 
H.J.


[PATCH] [PR testsuite/67959]Minor cleanup for ssa-thread-13.c

2015-10-14 Thread Jeff Law


The enum rtx_code bitfield causes grief for arm-eabi.  Given the test 
doesn't actually care about the size of that field, the easiest fix was 
just to make it a simple integer.


Tested on both x86_64-linux-gnu and arm-eabi to ensure the updated test 
passes on both targets.


Installed on the trunk.

Jeff
commit e1eb08886003fdc954c5763ec712109e158f1b0c
Author: Jeff Law 
Date:   Wed Oct 14 13:01:50 2015 -0400

[PATCH] [PR testsuite/67959]Minor cleanup for ssa-thread-13.c

PR testsuite/67959
* gcc.dg/tree-ssa/ssa-thread-13.c: Avoid bitfield assumptions.

diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index 04dbdcc..8009732 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,8 @@
+2015-10-14  Jeff Law  
+
+PR testsuite/67959
+   * gcc.dg/tree-ssa/ssa-thread-13.c: Avoid bitfield assumptions.
+
 2015-10-14  Marek Polacek  
 
* gcc.dg/tree-ssa/reassoc-39.c: Use -g.  Adjust dg-final.
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-13.c 
b/gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-13.c
index 5051d11..99d45f5 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-13.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-13.c
@@ -33,7 +33,7 @@ union rtunion_def
 typedef union rtunion_def rtunion;
 struct rtx_def
 {
-  __extension__ enum rtx_code code:16;
+  int code;
   union u
   {
 rtunion fld[1];


[gomp4] remove dead code

2015-10-14 Thread Nathan Sidwell
I've committed this to the gomp4 branch.  It removes some now-unreachable
code and the now-bogus description about OpenACC.


nathan
2015-10-14  Nathan Sidwell  

	* omp-low.c (lower_reduction_clauses): Correct comment, remove
	unreachable code.

Index: gcc/omp-low.c
===
--- gcc/omp-low.c	(revision 228810)
+++ gcc/omp-low.c	(working copy)
@@ -5088,9 +5115,7 @@ lower_oacc_head_tail (location_t loc, tr
   lower_oacc_loop_marker (loc, false, NULL_TREE, tail);
 }
 
-/* Generate code to implement the REDUCTION clauses.  OpenACC reductions
-   are usually executed in parallel, but they fallback to sequential code for
-   known single-threaded regions.  */
+/* Generate code to implement the REDUCTION clauses.  */
 
 static void
 lower_reduction_clauses (tree clauses, gimple_seq *stmt_seqp, omp_context *ctx)
@@ -5153,23 +5178,11 @@ lower_reduction_clauses (tree clauses, g
 
 	  addr = save_expr (addr);
 
-	  if (is_gimple_omp_oacc (ctx->stmt)
-	  && (ctx->gwv_this == 0))
-	{
-	  /* This reduction is done sequentially in OpenACC by a single
-		 thread.  There is no need to use atomics.  */
-	  x = build2 (code, TREE_TYPE (ref), ref, new_var);
-	  ref = build_outer_var_ref (var, ctx);
-	  gimplify_assign (ref, x, stmt_seqp);
-	}
-	  else
-	{
-	  ref = build1 (INDIRECT_REF, TREE_TYPE (TREE_TYPE (addr)), addr);
-	  x = fold_build2_loc (clause_loc, code, TREE_TYPE (ref), ref,
-   new_var);
-	  x = build2 (OMP_ATOMIC, void_type_node, addr, x);
-	  gimplify_and_add (x, stmt_seqp);
-	}
+	  ref = build1 (INDIRECT_REF, TREE_TYPE (TREE_TYPE (addr)), addr);
+	  x = fold_build2_loc (clause_loc, code, TREE_TYPE (ref), ref,
+			   new_var);
+	  x = build2 (OMP_ATOMIC, void_type_node, addr, x);
+	  gimplify_and_add (x, stmt_seqp);
 
 	  return;
 	}


[PATCH] Split ssa-dom-thread-2.c into separate files/tests

2015-10-14 Thread Jeff Law


ssa-dom-thread-2.c is actually 6 distinct tests crammed into a single 
file.  That's normally not a huge problem, but it can make tests hard to 
write when we're scanning dumps.


This patch splits it into 6 distinct tests, ssa-dom-thread-2[a-f].c.
It also tightens the expected output slightly for each test and adds 
further comments to the tests.



Tested on x86_64-linux-gnu.  Installed on the trunk.

Jeff
commit 8715d70d2da7ab22a793437c6298c51a6fbae70f
Author: Jeff Law 
Date:   Wed Oct 14 13:09:29 2015 -0400

[PATCH] Split ssa-dom-thread-2.c into separate files/tests

* gcc.dg/tree-ssa/ssa-dom-thread-2.c: Deleted.  The six functions
contained within have their own file/test now.
* gcc.dg/tree-ssa/ssa-dom-thread-2a.c: New test extracted from
ssa-dom-thread-2.c.  Tighten expected output slightly and comment
expectations a bit more.
* gcc.dg/tree-ssa/ssa-dom-thread-2b.c: Likewise.
* gcc.dg/tree-ssa/ssa-dom-thread-2c.c: Likewise.
* gcc.dg/tree-ssa/ssa-dom-thread-2d.c: Likewise.
* gcc.dg/tree-ssa/ssa-dom-thread-2e.c: Likewise.
* gcc.dg/tree-ssa/ssa-dom-thread-2f.c: Likewise.

diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index 8009732..f45ab81 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,5 +1,16 @@
 2015-10-14  Jeff Law  
 
+   * gcc.dg/tree-ssa/ssa-dom-thread-2.c: Deleted.  The six functions
+   contained within have their own file/test now.
+   * gcc.dg/tree-ssa/ssa-dom-thread-2a.c: New test extracted from
+   ssa-dom-thread-2.c.  Tighten expected output slightly and comment
+   expectations a bit more.
+   * gcc.dg/tree-ssa/ssa-dom-thread-2b.c: Likewise.
+   * gcc.dg/tree-ssa/ssa-dom-thread-2c.c: Likewise.
+   * gcc.dg/tree-ssa/ssa-dom-thread-2d.c: Likewise.
+   * gcc.dg/tree-ssa/ssa-dom-thread-2e.c: Likewise.
+   * gcc.dg/tree-ssa/ssa-dom-thread-2f.c: Likewise.
+
 PR testsuite/67959
* gcc.dg/tree-ssa/ssa-thread-13.c: Avoid bitfield assumptions.
 
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-2.c 
b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-2.c
deleted file mode 100644
index bb697d1..000
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-2.c
+++ /dev/null
@@ -1,117 +0,0 @@
-/* { dg-do compile } */ 
-/* { dg-options "-O2 -fdump-tree-vrp1-stats -fdump-tree-dom1-stats" } */
-
-void foo();
-void bla();
-void bar();
-
-/* In the following two cases, we should be able to thread edge through
-   the loop header.  */
-
-void thread_entry_through_header (void)
-{
-  int i;
-
-  for (i = 0; i < 170; i++)
-bla ();
-}
-
-void thread_latch_through_header (void)
-{
-  int i = 0;
-  int first = 1;
-
-  do
-{
-  if (first)
-   foo ();
-
-  first = 0;
-  bla ();
-} while (i++ < 100);
-}
-
-/* This is a TODO -- it is correct to thread both entry and latch edge through
-   the header, but we do not handle this case yet.  */
-
-void dont_thread_1 (void)
-{
-  int i = 0;
-  int first = 1;
-
-  do
-{
-  if (first)
-   foo ();
-  else
-   bar ();
-
-  first = 0;
-  bla ();
-} while (i++ < 100);
-}
-
-/* Avoid threading in the following two cases, to prevent creating subloops.  
*/
-
-void dont_thread_2 (int first)
-{
-  int i = 0;
-
-  do
-{
-  if (first)
-   foo ();
-  else
-   bar ();
-
-  first = 0;
-  bla ();
-} while (i++ < 100);
-}
-
-void dont_thread_3 (int nfirst)
-{
-  int i = 0;
-  int first = 0;
-
-  do
-{
-  if (first)
-   foo ();
-  else
-   bar ();
-
-  first = nfirst;
-  bla ();
-} while (i++ < 100);
-}
-
-/* Avoid threading in this case, in order to avoid creating loop with
-   multiple entries.  */
-
-void dont_thread_4 (int a, int nfirst)
-{
-  int i = 0;
-  int first;
-
-  if (a)
-first = 0;
-  else
-first = 1;
-
-  do
-{
-  if (first)
-   foo ();
-  else
-   bar ();
-
-  first = nfirst;
-  bla ();
-} while (i++ < 100);
-}
-
-/* { dg-final { scan-tree-dump-times "Jumps threaded: 1" 1 "vrp1"} } */
-/* { dg-final { scan-tree-dump-times "Jumps threaded: 2" 0 "vrp1"} } */
-/* { dg-final { scan-tree-dump-times "Jumps threaded: 1" 0 "dom1"} } */
-/* { dg-final { scan-tree-dump-times "Jumps threaded: 2" 1 "dom1"} } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-2a.c 
b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-2a.c
new file mode 100644
index 000..73d0ccf
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-2a.c
@@ -0,0 +1,21 @@
+/* { dg-do compile } */ 
+/* { dg-options "-O2 -fdump-tree-vrp1-stats -fdump-tree-dom1-stats" } */
+
+void bla();
+
+/* In the following case, we should be able to thread edge through
+   the loop header.  */
+
+void thread_entry_through_header (void)
+{
+  int i;
+
+  for (i = 0; i < 170; i++)
+bla ();
+}
+
+/* There's a single jump thread that should be handled by the VRP
+   jump thr

[PATCH] print help for undocumented options

2015-10-14 Thread Martin Sebor

GCC's online help (the output of gcc --help -v) includes a large
number of undocumented options (197 in 5.1.0).  For example, the
section listing language-related options starts with the following
and another 44 or so undocumented options:

  The following options are language-related:
--all-warnings   This switch lacks documentation
--ansi   This switch lacks documentation
--assert This switch lacks documentation
...

It turns out that all of those in the section above and a good
number of others are synonyms for other options that are in fact
documented.  Rather than duplicating the documentation for the
alternate options, the small patchlet below modifies the
print_filtered_help function to print the help for the documented
alias along with its name.  With it applied, the number of options
that "lack documentation" drops to 114, and the section above looks
like this:

  The following options are language-related:
--all-warnings   Enable most warning messages.  Same
 as -Wall
--ansi   A synonym for -std=c89 (for C) or
 -std=c++98 (for C++).  Same as -ansi
-A<question>=<answer>   Assert the <answer> to <question>.
  Putting '-' before <question> disables
  the <answer> to <question>.  Same as -A
-A<question>=<answer>   Assert the <answer> to <question>.
  Putting '-' before <question> disables
  the <answer> to <question>.  Same as -A

2015-10-14  Martin Sebor  

* opts.c (print_filtered_help): Print help for aliased
option and its name instead of undocumented text for
undocumented options.

diff --git a/gcc/opts.c b/gcc/opts.c
index 2bbf653..e441924 100644
--- a/gcc/opts.c
+++ b/gcc/opts.c
@@ -1010,7 +1010,7 @@ print_filtered_help (unsigned int include_flags,
   const char *help;
   bool found = false;
   bool displayed = false;
-  char new_help[128];
+  char new_help[256];

   if (include_flags == CL_PARAMS)
 {
@@ -1086,6 +1086,23 @@ print_filtered_help (unsigned int include_flags,
{
  if (exclude_flags & CL_UNDOCUMENTED)
continue;
+
+ if (option->alias_target < N_OPTS
+ && cl_options [option->alias_target].help)
+   {
+ /* For undocumented options that are aliases for other
+options that are documented, print the other option's
+help and name.  */
+ help = cl_options [option->alias_target].help;
+
+ snprintf (new_help, sizeof new_help, "%s", help);
+ snprintf (new_help + strlen (new_help),
+   sizeof new_help - strlen (new_help),
+   ".  Same as %s",
+   cl_options [option->alias_target].opt_text);
+ help = new_help;
+   }
+ else
help = undocumented_msg;
}



Re: [PATCH] PR middle-end/67220: GCC fails to properly handle libcall symbol visibility of built functions

2015-10-14 Thread Ramana Radhakrishnan
On Wed, Oct 14, 2015 at 5:51 PM, H.J. Lu  wrote:
> On Wed, Oct 14, 2015 at 9:46 AM, Ramana Radhakrishnan
>  wrote:
>> On Wed, Oct 14, 2015 at 5:21 PM, H.J. Lu  wrote:
>>
>>> ---
>>> gcc/c/
>>>
>>> PR middle-end/67220
>>> * c-decl.c (diagnose_mismatched_decls): Copy explicit visibility
>>> to builtin function.
>>>
>>> gcc/
>>>
>>> PR middle-end/67220
>>> * expr.c (init_block_move_fn): Copy visibility from the builtin
>>> memcpy.
>>> (init_block_clear_fn): Copy visibility from the builtin memset.
>>>
>>> gcc/testsuite/
>>>
>>> PR middle-end/67220
>>> * gcc.target/i386/pr67220-1.c: New test.
>>> * gcc.target/i386/pr67220-2.c: Likewise.
>>> * gcc.target/i386/pr67220-3.c: Likewise.
>>> * gcc.target/i386/pr67220-4.c: Likewise.
>>> * gcc.target/i386/pr67220-5.c: Likewise.
>>> * gcc.target/i386/pr67220-6.c: Likewise.
>>
>> Why aren't these tests in gcc.dg ?  The problem affects all targets
>> not just x86.
>>
>
> If I move tests to gcc.dg, would you mind updating them to verify
> that they pass on arm?


It's not just a question of ARM. This affects all targets that support
symbol visibility in shared libraries ... please do the math as to how
many targets are affected. Now reading the test even more, it appears
that you also need a dg-require-visibility so that this test is run on
all targets that support symbol visibility.

The test as written uses target_fpic - target-supports.exp does not
have anything ARM specific about handling this - there is some m68k ,
so it should just work on any target that supports symbol visibility.
If it doesn't, that target has an issue and then folks interested in
that target will do something about it. Even if you don't fix the
issue on every target, please have the courtesy of letting them find
it in some sort of automatic manner rather than auditing every single
commit in every gcc.target directory.


regards
Ramana


>
>
> --
> H.J.


Re: [PATCH] print help for undocumented options

2015-10-14 Thread Joseph Myers
On Wed, 14 Oct 2015, Martin Sebor wrote:

> + /* For undocumented options that are aliases for other
> +options that are documented, print the other option's
> +help and name.  */
> + help = cl_options [option->alias_target].help;
> +
> + snprintf (new_help, sizeof new_help, "%s", help);
> + snprintf (new_help + strlen (new_help),
> +   sizeof new_help - strlen (new_help),
> +   ".  Same as %s",
> +   cl_options [option->alias_target].opt_text);

Obviously this English string needs to be marked for translation.

There is no consistency about whether option descriptions in .opt files 
end with ".", so this might produce "..  Same as" in some cases.  While I 
think the .opt files should be made consistent, I also think it would be 
better just to give the "Same as" message without also repeating the 
description of the canonical option.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: Handle CONSTRUCTOR in operand_equal_p

2015-10-14 Thread Jeff Law

On 10/14/2015 10:27 AM, Jan Hubicka wrote:

Hi,
this patch adds the CONSTRUCTOR case discussed a while back.  Only empty
constructors are matched, as those are the only ones appearing in gimple
operands.  I tested that during bootstrap about 7500 matches are for empty
ctors.  There are a couple hundred for non-empty ones, probably used on
GENERIC.

Bootstrapped/regtested x86_64-linux, OK?

Honza

* fold-const.c (operand_equal_p): Match empty constructors.
OK.  It'd be useful to have a test which shows that matching these 
results in some kind of difference we can see in the dump files.


jeff



Re: [patch] Minor adjustment to gimplify_addr_expr

2015-10-14 Thread Jeff Law

On 10/14/2015 09:59 AM, Eric Botcazou wrote:

Hi,

this is the regression of ACATS c37213k at -O2 with an upcoming change in
the front-end of the Ada compiler:

eric@polaris:~/gnat/gnat-head/native> gcc/gnat1 -quiet c37213k.adb -I
/home/eric/gnat/bugs/support -O2
+===GNAT BUG DETECTED==+
| Pro 7.4.0w (20151014-60) (x86_64-suse-linux) GCC error:  |
| tree check: expected class 'expression', have  |
| 'exceptional' (ssa_name) in tree_operand_check, at tree.h:3431|
| Error detected around c37213k.adb:95:37

It's recompute_tree_invariant_for_addr_expr receiving an SSA_NAME instead of
an ADDR_EXPR when called from gimplify_addr_expr.  The sequence is as follows:
we start with this GIMPLE statement:

   *R.43_60 = c37213k.B_1.B_6.proc6.B_4.B_5.value (); [static-chain: &FRAME.60]
[return slot optimization]

Then IPA clones the function and turns the statement into:

   MEM[(struct c37213k__B_1__B_6__proc6__B_4__nrec *)R.43_46] =
c37213k.B_1.B_6.proc6.B_4.B_5.value (); [static-chain: &FRAME.60] [return slot
optimization]

The 'value' function has been NRVed and contains:

   .builtin_memcpy (&<retval>, _9, _10);

and gets inlined so the above statement is rewritten into:

.builtin_memcpy (&MEM[(struct c37213k__B_1__B_6__proc6__B_4__nrec *)R.43_46],
_174, _175);

so gimplify_addr_expr is invoked on:

   &MEM[(struct c37213k__B_1__B_6__proc6__B_4__nrec *)R.43_46]

and gets confused because it doesn't see that it's just R.43_46 (it would have
seen it if the original INDIRECT_REF was still present in lieu of MEM_REF).

Hence the attached fixlet.  Tested on x86_64-suse-linux, OK for the mainline?


2015-10-14  Eric Botcazou  

* gimplify.c (gimplify_addr_expr) <MEM_REF>: New case.

Can you use the TMR_OFFSET macro rather than TREE_OPERAND (op0, 1)?

It also seems that you need a stronger check here.

Essentially you have to verify that
STEP * INDEX + INDEX2 + OFFSET == 0

Right?

Jeff






Re: [Patch PR target/67366 2/2] [gimple-fold.c] Support movmisalign optabs in gimple-fold.c

2015-10-14 Thread Jeff Law

On 10/08/2015 08:10 AM, Ramana Radhakrishnan wrote:

This patch by Richard allows for movmisalign optabs to be supported
in gimple-fold.c. This caused a bit of pain in the testsuite with strlenopt-8.c
in conjunction with the ARM support for movmisalign_optabs as the test
was coded up to do different things depending on whether the target
supported misaligned access or not. However now with unaligned access
being allowed for different levels of the architecture in the arm backend,
the concept of the helper function non_strict_align mapping identically
to the definition of STRICT_ALIGNMENT disappears.

Adjusted thusly for ARM. The testsuite/lib changes were tested with an
arm-none-eabi multilib that included architecture variants that did not
support unaligned access and architecture variants that did.

The testing matrix for this patch was:

1. x86_64 bootstrap and regression test - no regressions.
2. armhf bootstrap and regression test - no regressions.
3. arm-none-eabi cross build and regression test for

{-marm/-march=armv7-a/-mfpu=vfpv3-d16/-mfloat-abi=softfp}
{-mthumb/-march=armv8-a/-mfpu=crypto-neon-fp-armv8/-mfloat-abi=hard}
{-marm/-mcpu=arm7tdmi/-mfloat-abi=soft}
{-mthumb/-mcpu=arm7tdmi/-mfloat-abi=soft}

with no regressions.

Ok to apply ?

Ramana

2015-10-08  Richard Biener  

* gimple-fold.c (optabs-query.h): Include
(gimple_fold_builtin_memory_op): Allow unaligned stores
when movmisalign_optabs are available.

2015-10-08  Ramana Radhakrishnan  

PR target/67366
* lib/target-supports.exp (check_effective_target_non_strict_align):
Adjust for arm*-*-*.
* gcc.target/arm/pr67366.c: New test.

OK.
jeff



Re: Handle CONSTRUCTOR in operand_equal_p

2015-10-14 Thread Jan Hubicka
> On 10/14/2015 10:27 AM, Jan Hubicka wrote:
> >Hi,
> >this patch adds the CONSTRUCTOR case discussed a while back.  Only empty
> >constructors are matched, as those are the only ones appearing in gimple
> >operands.  I tested that during bootstrap about 7500 matches are for empty
> >ctors.  There are a couple hundred for non-empty ones, probably used on
> >GENERIC.
> >
> >Bootstrapped/regtested x86_64-linux, OK?
> >
> >Honza
> >
> > * fold-const.c (operand_equal_p): Match empty constructors.
> OK.  It'd be useful to have a test which shows that matching these
> results in some kind of difference we can see in the dump files.

I will try to think of something.  My main motivation is to get ipa-icf-gimple
and operand_equal_p into sync and hopefully commonize the logic.  Having a
testcase will then be a lot easier (any function with an empty constructor
will do).
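
Something along these lines, perhaps (just a sketch; whether it shows a
visible difference in the dumps still needs to be checked):

  typedef int v4si __attribute__ ((vector_size (16)));

  v4si
  make_zero (void)
  {
    v4si x = {};  /* gimplifies to an empty CONSTRUCTOR on the RHS */
    return x;
  }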

Honza
> 
> jeff


Re: using scratchpads to enhance RTL-level if-conversion: revised patch

2015-10-14 Thread Jeff Law

On 10/13/2015 02:16 PM, Bernd Schmidt wrote:

_Potentially_ so, yes.  However, GCC is free to put the allocation into
an otherwise-unused part of the stack frame.


Well, I looked at code generation changes, and it usually seems to come
with an increase in stack frame size - sometimes causing extra
instructions to be emitted.

I think that's essentially unavoidable when we end up using the scratchpad.




However, why do we need to allocate anything in the first place?

 > If you want to store something that will be thrown away,
 > just pick an address below the stack pointer.

Because allocating a scratchpad should work on all relevant targets.  We
do not have the resources to test on all GCC-supported
CPU ISAs and on all GCC-supported OSes, and we would like to have an
optimization that works on as many targets as makes sense
[those with cmove-like ability and withOUT full-blown conditional
execution].


Yeah, but if you put in a new facility like this, chances are
maintainers for active targets will pick it up and add the necessary
hooks. That's certainly what happened with shrink-wrapping. So I don't
think this is a concern.
But won't you get valgrind warnings if the code loads/stores outside the 
defined stack?  While we know it's safe, the warnings from valgrind will 
likely cause a backlash of user complaints.




I'm afraid I'll have to reject the patch then, on these grounds:
  * it may pessimize code
  * it does not even estimate costs to attempt avoiding this
  * a much simpler, more efficient implementation is possible.

Noted.  I think the pessimization is the area where folks are most concerned.

Obviously some pessimization relative to current code is necessary to 
fix some of the problems WRT thread safety and avoiding things like 
introducing faults in code which did not previously fault.


However, pessimization of safe code is, err, um, bad and needs to be 
avoided.


Jeff


Re: using scratchpads to enhance RTL-level if-conversion: revised patch

2015-10-14 Thread Jeff Law

On 10/14/2015 02:28 AM, Eric Botcazou wrote:

If you're using one of the switches that checks for stack overflow at the
start of the function, you certainly don't want to do any such stores.


There is a protection area for -fstack-check (STACK_CHECK_PROTECT bytes) so
you can do stores just below the stack pointer as far as it's concerned.

There is indeed the issue of the mere writing below the stack pointer.  Our
experience with various OSes and architectures shows that this almost always
works.  The only problematic case is x86{-64}/Linux historically, where you
cannot write below the page pointed to by the stack pointer (that's why there
is a specific implementation of -fstack-check for x86{-64}/Linux).

It was problematical on the PA, but I can't recall precisely why.

The thing we need to remember here is that if we do something like use 
space just below the stack pointer, valgrind is probably going to start 
complaining (and legitimately so).


While we know the result is throw-away, valgrind doesn't, and the 
complaints and noise from this would IMHO outweigh the benefits from 
using the trick of reading outside the defined stack area.


jeff


[PATCH] Fix accounting for num_threaded_edges

2015-10-14 Thread Jeff Law


tree-ssa-threadupdate.c keeps a running total of the number of edges it 
threads.  Those totals are useful debugging tools and are also examined 
by the testsuite.


While looking at the effects of using the FSM threader on 
ssa-dom-thread-2?.c I noticed the counters weren't being updated 
properly for FSM threads.


This patch fixes that minor goof.

Bootstrapped & regression tested on x86_64-linux-gnu.  Installed on the 
trunk.


Jeff
commit 05dc98161472ce2e3d5f68bfcfca907deac03140
Author: Jeff Law 
Date:   Wed Oct 14 11:52:01 2015 -0600

[PATCH] Fix accounting for num_threaded_edges

* tree-ssa-threadupdate.c (thread_through_all_blocks): Bump
num_threaded_edges for successful FSM threads too.

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index a555d2b..7c64fa8 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,8 @@
+2015-10-14  Jeff Law  
+
+   * tree-ssa-threadupdate.c (thread_through_all_blocks): Bump
+   num_threaded_edges for successful FSM threads too.
+
 2015-10-14  Richard Biener  
 
* tree-vectorizer.h (vect_is_simple_use): Remove unused parameters.
diff --git a/gcc/tree-ssa-threadupdate.c b/gcc/tree-ssa-threadupdate.c
index e426c1d..5632a88 100644
--- a/gcc/tree-ssa-threadupdate.c
+++ b/gcc/tree-ssa-threadupdate.c
@@ -2672,6 +2672,7 @@ thread_through_all_blocks (bool may_peel_loop_headers)
  free_dominance_info (CDI_DOMINATORS);
  bitmap_set_bit (threaded_blocks, entry->src->index);
  retval = true;
+ thread_stats.num_threaded_edges++;
}
 
   delete_jump_thread_path (path);


Re: [PATCH 1/7] Libsanitizer merge from upstream r249633.

2015-10-14 Thread Adhemerval Zanella


On 14-10-2015 04:54, Jakub Jelinek wrote:
> On Tue, Oct 13, 2015 at 07:54:33PM +0300, Maxim Ostapenko wrote:
>> On 13/10/15 14:15, Maxim Ostapenko wrote:
>>> This is the raw merge itself. I'm bumping SONAME to libasan.so.3.
>>>
>>> -Maxim
>>
>> I have just noticed that I've misused autoconf stuff (used wrong version).
>> Here a fixed version of the same patch. Sorry for inconvenience.
> 
> Is libubsan, libtsan backwards compatible, or do we want to change SONAME
> there too?
> 
> The aarch64 changes are terrible, not just because it doesn't yet have
> runtime decision on what VA to use or that it doesn't support 48-bit VA,
> but also that for the 42-bit VA it uses a different shadow offset from
> 39-bit VA.  But on the compiler side we have just one...
> Though, only the 39-bit VA is enabled right now by default, so out of the
> box the state is as bad as we had in 5.x - users wanting 42-bit VA or 48-bit
> VA have to patch libsanitizer.

Yes, we are aware of the current deficiencies for aarch64 with a 39-bit and
42-bit VMA and the lack of support for a 48-bit VMA. On the LLVM side the
current approach is to build compiler support for either 39 or 42 bit (and
again we are aware this is not the ideal approach). This approach was used
mainly for validation and buildbot enablement.

I am currently working on a way to make the sanitizers (asan and msan)
VMA-independent on aarch64 (aiming to support 39/42 and later 48-bit VMA)
and the approach we decided to use is to change the instrumentation to use
a parametrized value to compute the shadow memory regions (based on the VMA
address), which will be initialized externally by the libsanitizer. TSAN is
somewhat easier since the instrumentation does not take the VMA into
consideration (only the libsanitizer itself does).

The idea is to avoid compiler switches and make it transparent to run the
binary regardless of the VMA on the system. The downside is that the
instrumentation will require more steps (to load the parametrized value
used to compute the shadow memory) and thus be slower.
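
Roughly, the per-access shadow computation would go from using a
compile-time constant offset to something like the sketch below (the
symbol name here is made up; the real one is whatever libsanitizer ends
up exporting and initializing at startup):

  #include <stdint.h>

  /* Hypothetical runtime-initialized shadow base, set up by the
     sanitizer runtime according to the VMA detected at startup.  */
  extern uintptr_t __asan_shadow_base;

  static inline uintptr_t
  mem_to_shadow (uintptr_t addr)
  {
    /* One shadow byte covers 8 application bytes, hence the >> 3; the
       extra load of __asan_shadow_base is the slowdown mentioned
       above.  */
    return (addr >> 3) + __asan_shadow_base;
  }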



> 
> Have you verified libbacktrace sanitization still works properly (that is
> something upstream does not test)?
> 
> Do you plan to update the asan tests we have to reflect the changes in
> upstream?
> 
>   Jakub
> 


Re: [PATCH 1/7] Libsanitizer merge from upstream r249633.

2015-10-14 Thread Evgenii Stepanov
On Wed, Oct 14, 2015 at 11:03 AM, Adhemerval Zanella
 wrote:
>
>
> On 14-10-2015 04:54, Jakub Jelinek wrote:
>> On Tue, Oct 13, 2015 at 07:54:33PM +0300, Maxim Ostapenko wrote:
>>> On 13/10/15 14:15, Maxim Ostapenko wrote:
 This is the raw merge itself. I'm bumping SONAME to libasan.so.3.

 -Maxim
>>>
>>> I have just noticed that I've misused autoconf stuff (used wrong version).
>>> Here a fixed version of the same patch. Sorry for inconvenience.
>>
>> Is libubsan, libtsan backwards compatible, or do we want to change SONAME
>> there too?
>>
>> The aarch64 changes are terrible, not just because it doesn't yet have
>> runtime decision on what VA to use or that it doesn't support 48-bit VA,
>> but also that for the 42-bit VA it uses a different shadow offset from
>> 39-bit VA.  But on the compiler side we have just one...
>> Though, only the 39-bit VA is enabled right now by default, so out of the
>> box the state is as bad as we had in 5.x - users wanting 42-bit VA or 48-bit
>> VA have to patch libsanitizer.
>
> Yes, we are aware of the current deficiencies for aarch64 with a 39-bit and
> 42-bit VMA and the lack of support for a 48-bit VMA. On the LLVM side the
> current approach is to build compiler support for either 39 or 42 bit (and
> again we are aware this is not the ideal approach). This approach was used
> mainly for validation and buildbot enablement.
>
> I am currently working on a way to make the sanitizers (asan and msan)
> VMA-independent on aarch64 (aiming to support 39/42 and later 48-bit VMA)
> and the approach we decided to use is to change the instrumentation to use
> a parametrized value to compute the shadow memory regions (based on the VMA
> address), which will be initialized externally by the libsanitizer. TSAN is
> somewhat easier since the instrumentation does not take the VMA into
> consideration (only the libsanitizer itself does).

Wait. As Jakub correctly pointed out in the other thread, there is no
obvious reason why there could not be a single shadow offset value
that would work for all 3 possible VMA settings. I suggest figuring
this out first.

>
> The idea is to avoid compiler switches and make it transparent to run the
> binary regardless of the VMA on the system. The downside is that the
> instrumentation will require more steps (to load the parametrized value
> used to compute the shadow memory) and thus be slower.
>
>
>
>>
>> Have you verified libbacktrace sanitization still works properly (that is
>> something upstream does not test)?
>>
>> Do you plan to update the asan tests we have to reflect the changes in
>> upstream?
>>
>>   Jakub
>>


Re: [PATCH] print help for undocumented options

2015-10-14 Thread Martin Sebor

On 10/14/2015 11:24 AM, Joseph Myers wrote:

On Wed, 14 Oct 2015, Martin Sebor wrote:


+	  /* For undocumented options that are aliases for other
+	     options that are documented, print the other option's
+	     help and name.  */
+	  help = cl_options [option->alias_target].help;
+
+	  snprintf (new_help, sizeof new_help, "%s", help);
+	  snprintf (new_help + strlen (new_help),
+		    sizeof new_help - strlen (new_help),
+		    ".  Same as %s",
+		    cl_options [option->alias_target].opt_text);


Obviously this English string needs to be marked for translation.

There is no consistency about whether option descriptions in .opt files
end with ".", so this might produce "..  Same as" in some cases.  While I
think the .opt files should be made consistent, I also think it would be
better just to give the "Same as" message without also repeating the
description of the canonical option.


Thanks for the quick review.

I tweaked the patch to work around the .opt file inconsistency and
always print a period (the driver itself doesn't print one but that
can be fixed if/once we agree that this is a good way to deal with
it).  I also annotated the "Same as" text so that it can be
translated.

IMO, printing the aliased option's help text makes using the output
easier for users who find the undocumented option first, in that
they don't then have to go look for the one that does have
documentation, so I left that part in place.  If you or someone
feels strongly that it shouldn't be there I'll remove it.
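
To illustrate, with this version an undocumented alias would come out
roughly like this in the --help output (the option names here are made
up for the example; the exact text depends on the .opt entries):

  -fold-name                  Enable the new thing.  Same as -fnew-name.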

Let me know your thoughts on this version.

Martin

2015-10-14  Martin Sebor  

* opts.c (wrap_help): End last sentence in a period.
(print_filtered_help): Print help for aliased option and its name
instead of undocumented text for undocumented options.


diff --git a/gcc/opts.c b/gcc/opts.c
index 2bbf653..7debf33 100644
--- a/gcc/opts.c
+++ b/gcc/opts.c
@@ -960,9 +960,19 @@ wrap_help (const char *help,
 {
   unsigned int col_width = LEFT_COLUMN;
   unsigned int remaining, room, len;
+  char *new_help = NULL;

   remaining = strlen (help);
-
+  if (remaining && help [remaining - 1] != '.')
+{
+  /* Help text in .opt files doesn't always end in a period.  Make
+     it consistent by appending a period when it's missing.  */
+  new_help = XNEWVEC (char, remaining + 2);
+  memcpy (new_help, help, remaining);
+  new_help [remaining++] = '.';
+  new_help [remaining] = '\0';
+  help = new_help;
+}
   do
 {
   room = columns - 3 - MAX (col_width, item_width);
@@ -995,6 +1005,9 @@ wrap_help (const char *help,
   remaining -= len;
 }
   while (remaining);
+
+  /* Free the help text allocated by this function.  */
+  XDELETEVEC (const_cast <char *> (new_help));
 }

 /* Print help for a specific front-end, etc.  */
@@ -1010,7 +1023,7 @@ print_filtered_help (unsigned int include_flags,
   const char *help;
   bool found = false;
   bool displayed = false;
-  char new_help[128];
+  char new_help[256];

   if (include_flags == CL_PARAMS)
 {
@@ -1086,9 +1099,41 @@ print_filtered_help (unsigned int include_flags,
 	{
 	  if (exclude_flags & CL_UNDOCUMENTED)
 	continue;
+
+	  if (option->alias_target < N_OPTS
+	  && cl_options [option->alias_target].help)
+	{
+	  /* For undocumented options that are aliases for other
+		 options that are documented, print the other option's
+		 help and name.  */
+	  help = cl_options [option->alias_target].help;
+
+	  const char *alias_text =
+		cl_options [option->alias_target].opt_text;
+
+	  /* End the alias help text with a period if it lacks one.  */
+	  const char *maybe_period =
+		alias_text [strlen (alias_text) - 1] == '.' ? "" : ".";
+
+	  snprintf (new_help, sizeof new_help, "%s", help);
+	  snprintf (new_help + strlen (new_help),
+			sizeof new_help - strlen (new_help),
+			_("%s  Same as %s"),
+			maybe_period,
+			cl_options [option->alias_target].opt_text);
+	  help = new_help;
+	}
+	  else
 	help = undocumented_msg;
 	}

+  if (option->warn_message)
+	{
+	  snprintf (new_help, sizeof new_help, ">>>%s<<<",
+		option->warn_message);
+	  help = new_help;
+	}
+
   /* Get the translation.  */
   help = _(help);


[PATCH] Fix wrong-code when folding X - (X / Y) * Y (PR tree-optimization/67953)

2015-10-14 Thread Marek Polacek
Evidently, the X - (X / Y) * Y -> X % Y pattern can't change the
signedness of X from signed to unsigned, otherwise we'd generate
wrong code.  (But unsigned -> signed should be fine.)
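
To make the failure concrete (assuming 32-bit int; this mirrors fn1 in
the new test below):

  unsigned int
  fn1 (signed int a)
  {
    /* For a = -5: a / 3 == -1 and (a / 3) * 3 == -3, so the subtraction
       computes (unsigned int) -5 - (unsigned int) -3 == 4294967294,
       i.e. -2.  Folding this to (unsigned int) a % 3 would instead give
       4294967291U % 3U == 2, hence the requirement that the signedness
       of @0 match the signedness of the result type.  */
    return (unsigned int) a - ((a / 3) * 3);
  }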

Does anyone see a better fix than this?

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2015-10-14  Marek Polacek  

PR tree-optimization/67953
* match.pd (X - (X / Y) * Y): Don't change signedness of @0.

* gcc.dg/fold-minus-6.c (fn4): Change the type of A to
unsigned.
* gcc.dg/torture/pr67953.c: New test.

diff --git gcc/match.pd gcc/match.pd
index 655c9ff..24e19a9 100644
--- gcc/match.pd
+++ gcc/match.pd
@@ -267,7 +267,8 @@ along with GCC; see the file COPYING3.  If not see
 /* X - (X / Y) * Y is the same as X % Y.  */
 (simplify
  (minus (convert1? @0) (convert2? (mult (trunc_div @0 @1) @1)))
- (if (INTEGRAL_TYPE_P (type) || VECTOR_INTEGER_TYPE_P (type))
+ (if ((INTEGRAL_TYPE_P (type) || VECTOR_INTEGER_TYPE_P (type))
+  && TYPE_UNSIGNED (TREE_TYPE (@0)) == TYPE_UNSIGNED (type))
   (trunc_mod (convert @0) (convert @1))))
 
 /* Optimize TRUNC_MOD_EXPR by a power of two into a BIT_AND_EXPR,
diff --git gcc/testsuite/gcc.dg/fold-minus-6.c gcc/testsuite/gcc.dg/fold-minus-6.c
index 1c22c25..1535452 100644
--- gcc/testsuite/gcc.dg/fold-minus-6.c
+++ gcc/testsuite/gcc.dg/fold-minus-6.c
@@ -20,7 +20,7 @@ fn3 (long int x)
 }
 
 int
-fn4 (int a, int b)
+fn4 (unsigned int a, int b)
 {
   return a - (unsigned) ((a / b) * b);
 }
diff --git gcc/testsuite/gcc.dg/torture/pr67953.c gcc/testsuite/gcc.dg/torture/pr67953.c
index e69de29..5ce399b 100644
--- gcc/testsuite/gcc.dg/torture/pr67953.c
+++ gcc/testsuite/gcc.dg/torture/pr67953.c
@@ -0,0 +1,42 @@
+/* PR tree-optimization/67953 */
+/* { dg-do run } */
+
+unsigned int
+fn1 (signed int a)
+{
+  return (unsigned int) a - ((a / 3) * 3);
+}
+
+unsigned int
+fn2 (signed int a)
+{
+  return a - ((a / 3) * 3);
+}
+
+unsigned int
+fn3 (int a)
+{
+  return a - (unsigned) ((a / 3) * 3);
+}
+
+signed int
+fn4 (int a)
+{
+  return (unsigned) a - (unsigned) ((a / 3) * 3);
+}
+
+signed int
+fn5 (unsigned int a)
+{
+  return (signed) a - (int) ((a / 3) * 3);
+}
+
+int
+main ()
+{
+  if (fn1 (-5) != -2
+  || fn2 (-5) != -2
+  || fn3 (-5) != -2
+  || fn4 (-5) != -2)
+__builtin_abort ();
+}

Marek


Re: [PATCH 1/7] Libsanitizer merge from upstream r249633.

2015-10-14 Thread Renato Golin
On 14 October 2015 at 19:21, Evgenii Stepanov  wrote:
> Wait. As Jakub correctly pointed out in the other thread, there is no
> obvious reason why there could not be a single shadow offset value
> that would work for all 3 possible VMA settings. I suggest figuring
> this out first.

We are.

cheers,
--renato


Re: Handle CONSTRUCTOR in operand_equal_p

2015-10-14 Thread Richard Biener
On October 14, 2015 6:27:02 PM GMT+02:00, Jan Hubicka  wrote:
>Hi,
>this patch adds the CONSTRUCTOR case discussed a while back.  Only empty
>constructors are matched, as those are the only ones appearing as gimple
>operands.
>I tested that during bootstrap about 7500 matches are for empty ctors.
>There are a couple hundred for non-empty ones, probably used in GENERIC.
>
>Bootstrapped/regtested x86_64-linux, OK?
>
>Honza
>
>   * fold-const.c (operand_equal_p): Match empty constructors.
>Index: fold-const.c
>===
>--- fold-const.c   (revision 228735)
>+++ fold-const.c   (working copy)
>@@ -2890,6 +2891,11 @@ operand_equal_p (const_tree arg0, const_
>   return operand_equal_p (TREE_OPERAND (arg0, 0), TREE_OPERAND (arg1,
>0),
>   flags | OEP_ADDRESS_OF
>   | OEP_CONSTANT_ADDRESS_OF);
>+  case CONSTRUCTOR:
>+  /* In GIMPLE empty constructors are allowed in initializers of
>+ vector types.  */

The comment is wrong (or at least odd):
in GIMPLE an empty vector constructor should be folded to a VECTOR_CST.

>+  return (!vec_safe_length (CONSTRUCTOR_ELTS (arg0))
>+  && !vec_safe_length (CONSTRUCTOR_ELTS (arg1)));
>   default:
>   break;
>   }
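
For context, an example of the kind of source-level construct involved
(illustrative only; whether the empty initializer survives to
operand_equal_p as an empty CONSTRUCTOR or is folded to a VECTOR_CST is
precisely the point above):

  typedef int v4si __attribute__ ((vector_size (16)));

  v4si
  make_zero (void)
  {
    v4si v = { };   /* empty initializer list */
    return v;
  }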




Re: [PATCH] Fix wrong-code when folding X - (X / Y) * Y (PR tree-optimization/67953)

2015-10-14 Thread Richard Biener
On October 14, 2015 8:27:31 PM GMT+02:00, Marek Polacek  
wrote:
>Evidently, the X - (X / Y) * Y -> X % Y pattern can't change the
>signedness of X from signed to unsigned, otherwise we'd generate
>wrong code.  (But unsigned -> signed should be fine.)
>
>Does anyone see a better fix than this?
>
>Bootstrapped/regtested on x86_64-linux, ok for trunk?

OK.

Thanks,
Richard.

>2015-10-14  Marek Polacek  
>
>   PR tree-optimization/67953
>   * match.pd (X - (X / Y) * Y): Don't change signedness of @0.
>
>   * gcc.dg/fold-minus-6.c (fn4): Change the type of A to
>   unsigned.
>   * gcc.dg/torture/pr67953.c: New test.
>
>diff --git gcc/match.pd gcc/match.pd
>index 655c9ff..24e19a9 100644
>--- gcc/match.pd
>+++ gcc/match.pd
>@@ -267,7 +267,8 @@ along with GCC; see the file COPYING3.  If not see
> /* X - (X / Y) * Y is the same as X % Y.  */
> (simplify
>  (minus (convert1? @0) (convert2? (mult (trunc_div @0 @1) @1)))
>- (if (INTEGRAL_TYPE_P (type) || VECTOR_INTEGER_TYPE_P (type))
>+ (if ((INTEGRAL_TYPE_P (type) || VECTOR_INTEGER_TYPE_P (type))
>+  && TYPE_UNSIGNED (TREE_TYPE (@0)) == TYPE_UNSIGNED (type))
>   (trunc_mod (convert @0) (convert @1))))
> 
> /* Optimize TRUNC_MOD_EXPR by a power of two into a BIT_AND_EXPR,
>diff --git gcc/testsuite/gcc.dg/fold-minus-6.c gcc/testsuite/gcc.dg/fold-minus-6.c
>index 1c22c25..1535452 100644
>--- gcc/testsuite/gcc.dg/fold-minus-6.c
>+++ gcc/testsuite/gcc.dg/fold-minus-6.c
>@@ -20,7 +20,7 @@ fn3 (long int x)
> }
> 
> int
>-fn4 (int a, int b)
>+fn4 (unsigned int a, int b)
> {
>   return a - (unsigned) ((a / b) * b);
> }
>diff --git gcc/testsuite/gcc.dg/torture/pr67953.c gcc/testsuite/gcc.dg/torture/pr67953.c
>index e69de29..5ce399b 100644
>--- gcc/testsuite/gcc.dg/torture/pr67953.c
>+++ gcc/testsuite/gcc.dg/torture/pr67953.c
>@@ -0,0 +1,42 @@
>+/* PR tree-optimization/67953 */
>+/* { dg-do run } */
>+
>+unsigned int
>+fn1 (signed int a)
>+{
>+  return (unsigned int) a - ((a / 3) * 3);
>+}
>+
>+unsigned int
>+fn2 (signed int a)
>+{
>+  return a - ((a / 3) * 3);
>+}
>+
>+unsigned int
>+fn3 (int a)
>+{
>+  return a - (unsigned) ((a / 3) * 3);
>+}
>+
>+signed int
>+fn4 (int a)
>+{
>+  return (unsigned) a - (unsigned) ((a / 3) * 3);
>+}
>+
>+signed int
>+fn5 (unsigned int a)
>+{
>+  return (signed) a - (int) ((a / 3) * 3);
>+}
>+
>+int
>+main ()
>+{
>+  if (fn1 (-5) != -2
>+  || fn2 (-5) != -2
>+  || fn3 (-5) != -2
>+  || fn4 (-5) != -2)
>+__builtin_abort ();
>+}
>
>   Marek



