Re: [GSoC] Patches for shared_ptr array and polymorphic_allocator

2015-07-18 Thread Tim Shen
On Fri, Jul 17, 2015 at 7:16 PM, Fan You  wrote:
> Hi,
>
> According to 
> 
>
> Here is my implementation of
>
> [8.2] Extend shared_ptr to support arrays

Please don't resend the shared_ptr patch, since it's already tracked
in another thread.

> [8.3] Type-Erased allocator

Please send a working patch with tests (and probably with Makefile.am changes).

Format: please replace leading consecutive spaces with tabs.

L70:
static std::atomic<memory_resource*> s_default_resource;

naming: _S_default_resource.

L43:
virtual ~memory_resource() { }

Please break the line after virtual/return type. This also applies to
other places in the patch.

L81:
  template 
class __constructor_helper_imp

This doesn't work correctly for at least following case:
std::__constructor_helper_imp(std::allocator_arg,
std::allocator(),
std::true_type(),
std::true_type(),
std::true_type());

Based on [allocator.uses.construction], this falls under the second
rule, because the rules are applied in priority order; but overload
resolution considers the call ambiguous.

To enforce the order, you can do this:

template
struct __uses_allocator_construction_helper;

...
{
constexpr bool __uses_alloc = uses_allocator<...>::value;
constexpr bool __normally_constructible = is_constructible<_Tp,
_Args...>::value;
constexpr bool __constructible_alloc_before =
is_constructible<_Tp, allocator_arg_t, _Alloc, _Args...>::value;
constexpr bool __constructible_alloc_after =
is_constructible<_Tp, _Args..., _Alloc>::value;

constexpr bool __uses_rule1 = !__uses_alloc && __normally_constructible;
constexpr bool __uses_rule2 = __uses_alloc && __constructible_alloc_before;
constexpr bool __uses_rule3 = __uses_alloc && __constructible_alloc_after;
__uses_allocator_construction_helper<__uses_rule1 ? 1 :
(__uses_rule2 ? 2 : (__uses_rule3 ? 3 : 0))>::_S_apply(...);
}

Consider using a more readable helper name, like
__uses_allocator_construction_helper, and document it.
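
As an illustration of the priority ordering described above, here is a
minimal, self-contained sketch (not the libstdc++ implementation). The
names select_rule, Plain, Both, Trailing and Neither are made up for this
example; only std::uses_allocator, std::is_constructible and
std::allocator_arg_t are standard. It computes a single rule number, so a
type that matches more than one rule is never ambiguous:

```cpp
#include <cassert>
#include <memory>
#include <type_traits>

// Hypothetical helper: selects the uses-allocator construction rule
// (1, 2 or 3) at compile time; 0 means no rule applies.
template<typename T, typename Alloc, typename... Args>
  struct select_rule
  {
    static constexpr bool uses_alloc = std::uses_allocator<T, Alloc>::value;
    static constexpr bool plain = std::is_constructible<T, Args...>::value;
    static constexpr bool alloc_first =
      std::is_constructible<T, std::allocator_arg_t, Alloc, Args...>::value;
    static constexpr bool alloc_last =
      std::is_constructible<T, Args..., Alloc>::value;

    // Rules are tested in order, so an earlier rule always wins.
    static constexpr int value =
      (!uses_alloc && plain) ? 1
      : (uses_alloc && alloc_first) ? 2
      : (uses_alloc && alloc_last) ? 3 : 0;
  };

// No allocator_type: rule 1.
struct Plain { Plain(int) { } };

// Accepts the allocator both ways: rule 2 must win over rule 3.
struct Both
{
  using allocator_type = std::allocator<int>;
  Both(int) { }
  Both(std::allocator_arg_t, const allocator_type&, int) { }
  Both(int, const allocator_type&) { }
};

// Accepts the allocator only as the last argument: rule 3.
struct Trailing
{
  using allocator_type = std::allocator<int>;
  Trailing(int, const allocator_type&) { }
};

// Claims to use an allocator but has no matching constructor: no rule.
struct Neither
{
  using allocator_type = std::allocator<int>;
  Neither(int) { }
};

static_assert(select_rule<Plain, std::allocator<int>, int>::value == 1, "");
static_assert(select_rule<Both, std::allocator<int>, int>::value == 2, "");
static_assert(select_rule<Trailing, std::allocator<int>, int>::value == 3, "");
static_assert(select_rule<Neither, std::allocator<int>, int>::value == 0, "");
```

The resulting integer can then be used as the template argument of a
dispatching helper, as in the outline above.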

L73:
  bool operator==(const memory_resource& __a,
  const memory_resource& __b) noexcept
  { return &__a == &__b || __a.is_equal(__b); }

Make all non-template functions inline.

L178:
  { } // used here

What does this mean?

L180:
  polymorphic_allocator(memory_resource* __r)
  : _M_resource(__r ? __r : get_default_resource())
  { }

[8.6.2.3] describes __r != nullptr as a precondition, which is
guaranteed by the caller, so we don't have to check here.
Alternatively you can use _GLIBCXX_ASSERT.

L262:
  memory_resource* _M_resource;

Private member variables should be defined after private member functions.

L286:
  template 
bool operator!=(const polymorphic_allocator<_Tp1>& __a,
const polymorphic_allocator<_Tp2>& __b) noexcept
{ return ! (__a == __b); }

Remove extra space after "!".

L340:
auto __p = dynamic_cast(&__other);

What if the user turns off RTTI?

L345:
  // Calculate Aligned Size
  size_t _Aligned_size(size_t __size, size_t __alignment)
  { return ((__size - 1)|(__alignment - 1)) + 1; }

  bool _M_supported (size_t __x)
  { return ((__x != 0) && (__x != 0) && !(__x & (__x - 1))); }

Document the behavior of these two functions, e.g.:
Returns a size that is greater than or equal to __size and divisible by
__alignment, where __alignment is required to be a power of 2.
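
To make the documented contract concrete, here is a small stand-alone
sketch of the two helpers. The names aligned_size and is_power_of_two are
chosen for this example only and are not the names used in the patch; the
rounding expression is the one quoted above.

```cpp
#include <cassert>
#include <cstddef>

// Returns true if x is a non-zero power of two.
constexpr bool
is_power_of_two(std::size_t x)
{ return x != 0 && (x & (x - 1)) == 0; }

// Returns the smallest multiple of alignment that is >= size.
// Precondition: alignment is a power of two.
constexpr std::size_t
aligned_size(std::size_t size, std::size_t alignment)
{ return ((size - 1) | (alignment - 1)) + 1; }

static_assert(is_power_of_two(8), "");
static_assert(!is_power_of_two(0), "");
static_assert(!is_power_of_two(12), "");
static_assert(aligned_size(13, 8) == 16, "");
static_assert(aligned_size(16, 8) == 16, "");
static_assert(aligned_size(1, 8) == 8, "");
```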

L355:
  // Global memory resources
  atomic<memory_resource*> memory_resource::s_default_resource;

L386:
  // The default memory resource
  memory_resource* get_default_resource() noexcept
  {
memory_resource *__ret
  = memory_resource::s_default_resource.load();

if (__ret == nullptr) { __ret = new_delete_resource(); }
return __ret;
  }

According to [8.8.5], the memory resource pointer should be initialized
with new_delete_resource(), not nullptr; and get_default_resource
should only return the pointer.

L396:
  memory_resource* set_default_resource(memory_resource* __r) noexcept
  {
if ( __r == nullptr)
{ __r = new_delete_resource(); }

memory_resource* __prev = get_default_resource();
memory_resource::s_default_resource.store(__r);
return __prev;
  }

We shouldn't care if it's nullptr or not.
Your get-then-set may cause a data race. I think
std::atomic<>::exchange will work, but we should confirm with Jon.
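
A rough sketch of the exchange-based variant, to show why it avoids the
separate load and store. Everything here is a stand-in for illustration:
resource, fallback_resource, default_slot, get_default and set_default are
hypothetical names, not the ones from the patch or the TS, and whether
this is the right approach still needs to be confirmed as said above.

```cpp
#include <atomic>
#include <cassert>

// Stand-in for memory_resource (hypothetical, for illustration only).
struct resource { };

// Stand-in for new_delete_resource().
resource*
fallback_resource() noexcept
{
  static resource r;
  return &r;
}

// The default pointer starts out as the fallback, never as nullptr.
std::atomic<resource*>&
default_slot() noexcept
{
  static std::atomic<resource*> slot(fallback_resource());
  return slot;
}

// Only returns the current pointer.
resource*
get_default() noexcept
{ return default_slot().load(); }

// Stores the new pointer and returns the previous one in a single
// atomic step, so no other thread can slip in between the two.
resource*
set_default(resource* r) noexcept
{ return default_slot().exchange(r); }
```

With a separate get followed by a store, two concurrent callers could both
observe the same previous value; exchange returns to each caller the value
it actually replaced.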


-- 
Regards,
Tim Shen


[gomp4, committed] Fix OACC_LOOP usage in goacc tests

2015-07-18 Thread Tom de Vries

Hi Chung-Lin,

when compiling f.i. kernels-loop-acc-loop.c with -Wall, I ran into:
...
In file included from kernels-loop-acc-loop.c:8:0:
kernels-loop.c: In function ‘main’:
kernels-loop.c:30:0: warning: ignoring #pragma ACC_LOOP  [-Wunknown-pragmas]
...

kernels-loop-acc-loop.c contains:
...
#define ACC_LOOP acc loop
#include "kernels-loop.c"
...

and kernels-loop.c contains:
...
#pragma ACC_LOOP
 ...

Unfortunately, this expands to '#pragma ACC_LOOP', instead of the 
desired '#pragma acc loop'.


Something that works, is:
...
#define ACC_LOOP loop
#pragma acc ACC_LOOP
...

But the empty variant (the one we need for kernels-loop.c) doesn't work. 
We either get 'ignoring #pragma ACC_LOOP' for:

...
#define ACC_LOOP
#pragma acc ACC_LOOP
...
or 'ignoring #pragma acc' for:
...
#define ACC_LOOP ""
#pragma acc ACC_LOOP
...

It would be nice if the openacc standard defined a pragma acc nop.

For now, I've fixed this using simple conditional compilation:
...
#ifdef ACC_LOOP
#pragma acc loop
#endif
...
which works for both:
- undefined ACC_LOOP, and
- #define ACC_LOOP

[ Btw, note that this scheme:
...
#ifdef ACC_LOOP
#pragma acc ACC_LOOP
#endif
...
would work with:
- undefined ACC_LOOP,
- #define ACC_LOOP loop
- #define ACC_LOOP loop independent ]
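
For reference, a stand-alone sketch of the committed scheme. The function
name fill and its signature are made up for this example. ACC_LOOP is left
undefined here, so the pragma is skipped entirely and the code also builds
without OpenACC support; a wrapper file would put '#define ACC_LOOP'
before including it.

```cpp
#include <cstddef>

// Hypothetical loop body in the style of kernels-loop.c.  The pragma is
// only seen by the compiler when ACC_LOOP is defined (even if empty).
void
fill (unsigned int *a, std::size_t n)
{
#ifdef ACC_LOOP
#pragma acc loop
#endif
  for (std::size_t i = 0; i < n; i++)
    a[i] = static_cast<unsigned int> (i * 2);
}
```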

Committed to gomp-4_0-branch.

Thanks,
- Tom
Fix OACC_LOOP usage in goacc tests

2015-07-17  Tom de Vries  

	* c-c++-common/goacc/kernels-loop-2-acc-loop.c (ACC_LOOP): Define empty.
	* c-c++-common/goacc/kernels-loop-3-acc-loop.c: Same.
	* c-c++-common/goacc/kernels-loop-acc-loop.c: Same.
	* c-c++-common/goacc/kernels-loop-n-acc-loop.c: Same.
	* c-c++-common/goacc/kernels-loop-2.c (ACC_LOOP): Remove default define.
	 (main): Define #pragma acc loop conditional on ACC_LOOP.
	* c-c++-common/goacc/kernels-loop-3.c: Same.
	* c-c++-common/goacc/kernels-loop.c: Same.
	* c-c++-common/goacc/kernels-loop-n.c (ACC_LOOP): Remove default define.
	(foo): Define #pragma acc loop conditional on ACC_LOOP.
---
 .../c-c++-common/goacc/kernels-loop-2-acc-loop.c |  2 +-
 gcc/testsuite/c-c++-common/goacc/kernels-loop-2.c| 16 +---
 .../c-c++-common/goacc/kernels-loop-3-acc-loop.c |  2 +-
 gcc/testsuite/c-c++-common/goacc/kernels-loop-3.c|  8 +++-
 gcc/testsuite/c-c++-common/goacc/kernels-loop-acc-loop.c |  2 +-
 .../c-c++-common/goacc/kernels-loop-n-acc-loop.c |  2 +-
 gcc/testsuite/c-c++-common/goacc/kernels-loop-n.c|  8 +++-
 gcc/testsuite/c-c++-common/goacc/kernels-loop.c  |  8 +++-
 8 files changed, 22 insertions(+), 26 deletions(-)

diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-loop-2-acc-loop.c b/gcc/testsuite/c-c++-common/goacc/kernels-loop-2-acc-loop.c
index abfcb85..c041ca5 100644
--- a/gcc/testsuite/c-c++-common/goacc/kernels-loop-2-acc-loop.c
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-loop-2-acc-loop.c
@@ -4,7 +4,7 @@
 /* { dg-additional-options "-fdump-tree-optimized" } */
 
 /* Check that loops with '#pragma acc loop' tagged gets properly parallelized.  */
-#define ACC_LOOP acc loop
+#define ACC_LOOP
 #include "kernels-loop-2.c"
 
 /* Check that only three loops are analyzed, and that all can be
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-loop-2.c b/gcc/testsuite/c-c++-common/goacc/kernels-loop-2.c
index e175b5b..d3f8805 100644
--- a/gcc/testsuite/c-c++-common/goacc/kernels-loop-2.c
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-loop-2.c
@@ -8,10 +8,6 @@
 #define N (1024 * 512)
 #define COUNTERTYPE unsigned int
 
-#ifndef ACC_LOOP
-#define ACC_LOOP
-#endif
-
 int
 main (void)
 {
@@ -25,21 +21,27 @@ main (void)
 
 #pragma acc kernels copyout (a[0:N])
   {
-#pragma ACC_LOOP
+#ifdef ACC_LOOP
+#pragma acc loop
+#endif
 for (COUNTERTYPE i = 0; i < N; i++)
   a[i] = i * 2;
   }
 
 #pragma acc kernels copyout (b[0:N])
   {
-#pragma ACC_LOOP
+#ifdef ACC_LOOP
+#pragma acc loop
+#endif
 for (COUNTERTYPE i = 0; i < N; i++)
   b[i] = i * 4;
   }
 
 #pragma acc kernels copyin (a[0:N], b[0:N]) copyout (c[0:N])
   {
-#pragma ACC_LOOP
+#ifdef ACC_LOOP
+#pragma acc loop
+#endif
 for (COUNTERTYPE ii = 0; ii < N; ii++)
   c[ii] = a[ii] + b[ii];
   }
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-loop-3-acc-loop.c b/gcc/testsuite/c-c++-common/goacc/kernels-loop-3-acc-loop.c
index d6447a1..7e748c0 100644
--- a/gcc/testsuite/c-c++-common/goacc/kernels-loop-3-acc-loop.c
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-loop-3-acc-loop.c
@@ -4,7 +4,7 @@
 /* { dg-additional-options "-fdump-tree-optimized" } */
 
 /* Check that loops with '#pragma acc loop' tagged gets properly parallelized.  */
-#define ACC_LOOP acc loop
+#define ACC_LOOP
 #include "kernels-loop-3.c"
 
 /* Check that only one loop is analyzed, and that it can be parallelized.  */
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-loop-3.c b/gcc/testsuite/c-c++-common/goacc/kernels-loop-3.c
index 7d8848c..2620075 100644
--- a/gcc/testsuite/c-c++-common/goacc/kernels-loop-3.c
+++ b

[gomp4, committed] Obvious -Wall fixes in openacc tests

2015-07-18 Thread Tom de Vries

Hi,

I've committed these three obvious patches that fix -Wall warnings in 
openacc test-cases to gomp-4_0-branch.


Thanks,
- Tom
Add missing return in private-reduction-1.c

2015-07-17  Tom de Vries  

	* c-c++-common/goacc/private-reduction-1.c (reduction): Add missing
	return.
---
 gcc/testsuite/c-c++-common/goacc/private-reduction-1.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/gcc/testsuite/c-c++-common/goacc/private-reduction-1.c b/gcc/testsuite/c-c++-common/goacc/private-reduction-1.c
index 1e0f286..d4e3995 100644
--- a/gcc/testsuite/c-c++-common/goacc/private-reduction-1.c
+++ b/gcc/testsuite/c-c++-common/goacc/private-reduction-1.c
@@ -7,4 +7,6 @@ reduction ()
   #pragma acc loop private (r) reduction (+:r)
   for (i = 0; i < 100; i++)
 r += 10;
+
+  return r;
 }
-- 
1.9.1

Add missing parentheses in libgomp.oacc-c-c++-common/vec-partn-3.c

2015-07-17  Tom de Vries  

	* testsuite/libgomp.oacc-c-c++-common/vec-partn-3.c (main): Add missing
	parentheses.
---
 libgomp/testsuite/libgomp.oacc-c-c++-common/vec-partn-3.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/vec-partn-3.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/vec-partn-3.c
index 7908d4c..8dd628e2 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/vec-partn-3.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/vec-partn-3.c
@@ -48,7 +48,7 @@ main (int argc, char *argv[])
 assert (n[i] == 2);
 
   for (i = 0; i < 1024; i++)
-assert (arr[i] == (i % 64) < 32 ? 1 : -1);
+assert (arr[i] == ((i % 64) < 32) ? 1 : -1);
 
   return 0;
 }
-- 
1.9.1

Fix return type in libgomp.oacc-c-c++-common/gang-static-1.c

2015-07-17  Tom de Vries  

	* testsuite/libgomp.oacc-c-c++-common/gang-static-1.c (test): Change
	return type to void.
---
 libgomp/testsuite/libgomp.oacc-c-c++-common/gang-static-1.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/gang-static-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/gang-static-1.c
index 42f4585..d8ab958 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/gang-static-1.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/gang-static-1.c
@@ -2,7 +2,8 @@
 
 #define N 100
 
-int test(int *a, int *b, int sarg)
+void
+test (int *a, int *b, int sarg)
 {
   int i;
 
-- 
1.9.1



Re: [PR64164] drop copyrename, integrate into expand

2015-07-18 Thread Alexandre Oliva
On Jul 16, 2015, Alexandre Oliva  wrote:

> So, I decided to run a ppc64le-linux-gnu bootstrap, just in case, and
> there are issues with split complex parms that caused go and fortran
> libs to fail the build.

This incremental patch, along with the previously-posted patches, fixes
split complex args handling with preassigned args RTL, and enables
ppc64le-linux-gnu bootstrap to succeed.

I'm not particularly happy with the abuse of DECL_CONTEXT to recognize
split complex args and leave their RTL alone, but that was the best that
occurred to me.  Any other suggestions?

Is the combined patch ok, assuming further (re)testing of embedded
targets passes?

for  gcc/ChangeLog (to be integrated with the approved patches)

* function.c (split_complex_args): Take assign_parm_data_all
argument.  Pass it to rtl_for_parm.  Set up rtl and context
for split args.
(assign_parms_augmented_arg_list): Adjust.
(maybe_reset_rtl_for_parm): Recognize split complex args.
* stor-layout.c (layout_decl): Don't set mem attributes of
non-MEMs.
---
 gcc/function.c|   39 +--
 gcc/stor-layout.c |3 ++-
 2 files changed, 39 insertions(+), 3 deletions(-)

diff --git a/gcc/function.c b/gcc/function.c
index 753d889..6fba001 100644
--- a/gcc/function.c
+++ b/gcc/function.c
@@ -151,6 +151,8 @@ static bool contains (const_rtx, 
hash_table *);
 static void prepare_function_start (void);
 static void do_clobber_return_reg (rtx, void *);
 static void do_use_return_reg (rtx, void *);
+static rtx rtl_for_parm (struct assign_parm_data_all *, tree);
+
 
 /* Stack of nested functions.  */
 /* Keep track of the cfun stack.  */
@@ -2267,7 +2269,7 @@ assign_parms_initialize_all (struct assign_parm_data_all 
*all)
needed, else the old list.  */
 
 static void
-split_complex_args (vec<tree> *args)
+split_complex_args (struct assign_parm_data_all *all, vec<tree> *args)
 {
   unsigned i;
   tree p;
@@ -2278,6 +2280,7 @@ split_complex_args (vec<tree> *args)
   if (TREE_CODE (type) == COMPLEX_TYPE
  && targetm.calls.split_complex_arg (type))
{
+ tree cparm = p;
  tree decl;
  tree subtype = TREE_TYPE (type);
  bool addressable = TREE_ADDRESSABLE (p);
@@ -2296,6 +2299,9 @@ split_complex_args (vec *args)
  DECL_ARTIFICIAL (p) = addressable;
  DECL_IGNORED_P (p) = addressable;
  TREE_ADDRESSABLE (p) = 0;
+ /* Reset the RTL before layout_decl, or it may change the
+mode of the RTL of the original argument copied to P.  */
+ SET_DECL_RTL (p, NULL_RTX);
  layout_decl (p, 0);
  (*args)[i] = p;
 
@@ -2307,6 +2313,25 @@ split_complex_args (vec *args)
  DECL_IGNORED_P (decl) = addressable;
  layout_decl (decl, 0);
  args->safe_insert (++i, decl);
+
+ /* If we are assigning parameters for a function, rather
+than for a call, propagate the RTL of the complex parm to
+the split declarations, and set their contexts so that
+maybe_reset_rtl_for_parm can recognize them and refrain
+from resetting their RTL.  */
+ if (cfun->gimple_df)
+   {
+ rtx rtl = rtl_for_parm (all, cparm);
+ gcc_assert (!rtl || GET_CODE (rtl) == CONCAT);
+ if (rtl)
+   {
+ SET_DECL_RTL (p, XEXP (rtl, 0));
+ SET_DECL_RTL (decl, XEXP (rtl, 1));
+
+ DECL_CONTEXT (p) = cparm;
+ DECL_CONTEXT (decl) = cparm;
+   }
+   }
}
 }
 }
@@ -2369,7 +2394,7 @@ assign_parms_augmented_arg_list (struct 
assign_parm_data_all *all)
 
   /* If the target wants to split complex arguments into scalars, do so.  */
   if (targetm.calls.split_complex_arg)
-split_complex_args (&fnargs);
+split_complex_args (all, &fnargs);
 
   return fnargs;
 }
@@ -2823,6 +2848,16 @@ maybe_reset_rtl_for_parm (tree parm)
 {
   gcc_assert (TREE_CODE (parm) == PARM_DECL
  || TREE_CODE (parm) == RESULT_DECL);
+
+  /* This is a split complex parameter, and its context was set to its
+ original PARM_DECL in split_complex_args so that we could
+ recognize it here and not reset its RTL.  */
+  if (DECL_CONTEXT (parm) && TREE_CODE (DECL_CONTEXT (parm)) == PARM_DECL)
+{
+  DECL_CONTEXT (parm) = DECL_CONTEXT (DECL_CONTEXT (parm));
+  return;
+}
+
   if ((flag_tree_coalesce_vars
|| (DECL_RTL_SET_P (parm) && DECL_RTL (parm) == pc_rtx))
   && is_gimple_reg (parm))
diff --git a/gcc/stor-layout.c b/gcc/stor-layout.c
index 0d4f4a4..288227a 100644
--- a/gcc/stor-layout.c
+++ b/gcc/stor-layout.c
@@ -794,7 +794,8 @@ layout_decl (tree decl, unsigned int known_align)
 {
   PUT_MODE (rtl, DECL_MODE (decl));
   SET_DECL_RTL (decl, 0);
-  set_mem_attributes (rtl, decl, 1);
+  if (MEM_P (rtl))
+   set_mem_attributes (rtl, decl, 1);
   SET_DECL_RTL (decl, r

Re: [PATCH] 2015-07-14 Benedikt Huber Philipp Tomsich

2015-07-18 Thread Andrew Pinski
On Fri, Jul 17, 2015 at 8:43 AM, Benedikt Huber
 wrote:
> * config/aarch64/aarch64-builtins.c: Builtins
> for rsqrt and rsqrtf.
> * config/aarch64/aarch64-protos.h: Declare.
> * config/aarch64/aarch64-simd.md: Matching expressions
> for frsqrte and frsqrts.
> * config/aarch64/aarch64.c: New functions. Emit rsqrt
> estimation code in fast math mode.
> * config/aarch64/aarch64.md: Added enum entries.
> * config/aarch64/aarch64.opt: Added options -mrecip and
> -mlow-precision-recip-sqrt.
> * testsuite/gcc.target/aarch64/rsqrt-asm-check.c: Assembly scans
> for frsqrte and frsqrts
> * testsuite/gcc.target/aarch64/rsqrt.c: Functional tests
> for rsqrt.

As I mentioned before,  Can we have -mrecip defaulted to on since it
regresses ThunderX performance by at least 10%.

Thanks,
Andrew

>
> Signed-off-by: Philipp Tomsich 
> ---
>  gcc/ChangeLog  |  17 
>  gcc/config/aarch64/aarch64-builtins.c  | 103 
>  gcc/config/aarch64/aarch64-protos.h|   2 +
>  gcc/config/aarch64/aarch64-simd.md |  27 ++
>  gcc/config/aarch64/aarch64.c   |  58 +++
>  gcc/config/aarch64/aarch64.md  |   3 +
>  gcc/config/aarch64/aarch64.opt |   8 ++
>  gcc/doc/invoke.texi|  20 
>  gcc/testsuite/gcc.target/aarch64/rsqrt-asm-check.c |  63 
>  gcc/testsuite/gcc.target/aarch64/rsqrt.c   | 107 
> +
>  10 files changed, 408 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/rsqrt-asm-check.c
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/rsqrt.c
>
> diff --git a/gcc/ChangeLog b/gcc/ChangeLog
> index 3432adb..f4b7407 100644
> --- a/gcc/ChangeLog
> +++ b/gcc/ChangeLog
> @@ -1,3 +1,20 @@
> +2015-07-14  Benedikt Huber  
> +   Philipp Tomsich  
> +
> +   * config/aarch64/aarch64-builtins.c: Builtins for rsqrt and
> +   rsqrtf.
> +   * config/aarch64/aarch64-protos.h: Declare.
> +   * config/aarch64/aarch64-simd.md: Matching expressions for
> +   frsqrte and frsqrts.
> +   * config/aarch64/aarch64.c: New functions. Emit rsqrt
> +   estimation code in fast math mode.
> +   * config/aarch64/aarch64.md: Added enum entries.
> +   * config/aarch64/aarch64.opt: Added options -mrecip and
> +   -mlow-precision-recip-sqrt.
> +   * testsuite/gcc.target/aarch64/rsqrt-asm-check.c: Assembly scans
> +   for frsqrte and frsqrts
> +   * testsuite/gcc.target/aarch64/rsqrt.c: Functional tests for rsqrt.
> +
>  2015-07-08  Jiong Wang  
>
> * config/aarch64/aarch64.c (aarch64_unspec_may_trap_p): New function.
> diff --git a/gcc/config/aarch64/aarch64-builtins.c 
> b/gcc/config/aarch64/aarch64-builtins.c
> index b6c89b9..adcea07 100644
> --- a/gcc/config/aarch64/aarch64-builtins.c
> +++ b/gcc/config/aarch64/aarch64-builtins.c
> @@ -335,6 +335,11 @@ enum aarch64_builtins
>AARCH64_BUILTIN_GET_FPSR,
>AARCH64_BUILTIN_SET_FPSR,
>
> +  AARCH64_BUILTIN_RSQRT_DF,
> +  AARCH64_BUILTIN_RSQRT_SF,
> +  AARCH64_BUILTIN_RSQRT_V2DF,
> +  AARCH64_BUILTIN_RSQRT_V2SF,
> +  AARCH64_BUILTIN_RSQRT_V4SF,
>AARCH64_SIMD_BUILTIN_BASE,
>AARCH64_SIMD_BUILTIN_LANE_CHECK,
>  #include "aarch64-simd-builtins.def"
> @@ -824,6 +829,42 @@ aarch64_init_crc32_builtins ()
>  }
>
>  void
> +aarch64_add_builtin_rsqrt (void)
> +{
> +  tree fndecl = NULL;
> +  tree ftype = NULL;
> +
> +  tree V2SF_type_node = build_vector_type (float_type_node, 2);
> +  tree V2DF_type_node = build_vector_type (double_type_node, 2);
> +  tree V4SF_type_node = build_vector_type (float_type_node, 4);
> +
> +  ftype = build_function_type_list (double_type_node, double_type_node, 
> NULL_TREE);
> +  fndecl = add_builtin_function ("__builtin_aarch64_rsqrt_df",
> +ftype, AARCH64_BUILTIN_RSQRT_DF, BUILT_IN_MD, NULL, NULL_TREE);
> +  aarch64_builtin_decls[AARCH64_BUILTIN_RSQRT_DF] = fndecl;
> +
> +  ftype = build_function_type_list (float_type_node, float_type_node, 
> NULL_TREE);
> +  fndecl = add_builtin_function ("__builtin_aarch64_rsqrt_sf",
> +ftype, AARCH64_BUILTIN_RSQRT_SF, BUILT_IN_MD, NULL, NULL_TREE);
> +  aarch64_builtin_decls[AARCH64_BUILTIN_RSQRT_SF] = fndecl;
> +
> +  ftype = build_function_type_list (V2DF_type_node, V2DF_type_node, 
> NULL_TREE);
> +  fndecl = add_builtin_function ("__builtin_aarch64_rsqrt_v2df",
> +ftype, AARCH64_BUILTIN_RSQRT_V2DF, BUILT_IN_MD, NULL, NULL_TREE);
> +  aarch64_builtin_decls[AARCH64_BUILTIN_RSQRT_V2DF] = fndecl;
> +
> +  ftype = build_function_type_list (V2SF_type_node, V2SF_type_node, 
> NULL_TREE);
> +  fndecl = add_builtin_function ("__builtin_aarch64_rsqrt_v2sf",
> +ftype, AARCH64_BUILTIN_RSQRT_V2SF, BUILT_IN_MD, NULL, NULL_TREE);
> +  aarch64_builtin_decls[AARCH64_BUILTIN_RSQRT_V2SF] = fndecl;
> +
> +  ftype = 

Re: [PATCH] 2015-07-14 Benedikt Huber Philipp Tomsich

2015-07-18 Thread Andrew Pinski
On Sat, Jul 18, 2015 at 1:25 AM, Andrew Pinski  wrote:
> On Fri, Jul 17, 2015 at 8:43 AM, Benedikt Huber
>  wrote:
>> * config/aarch64/aarch64-builtins.c: Builtins
>> for rsqrt and rsqrtf.
>> * config/aarch64/aarch64-protos.h: Declare.
>> * config/aarch64/aarch64-simd.md: Matching expressions
>> for frsqrte and frsqrts.
>> * config/aarch64/aarch64.c: New functions. Emit rsqrt
>> estimation code in fast math mode.
>> * config/aarch64/aarch64.md: Added enum entries.
>> * config/aarch64/aarch64.opt: Added options -mrecip and
>> -mlow-precision-recip-sqrt.
>> * testsuite/gcc.target/aarch64/rsqrt-asm-check.c: Assembly scans
>> for frsqrte and frsqrts
>> * testsuite/gcc.target/aarch64/rsqrt.c: Functional tests
>> for rsqrt.
>
> As I mentioned before,  Can we have -mrecip defaulted to on since it
> regresses ThunderX performance by at least 10%.

As I mentioned before, can we have -mrecip not defaulted to on, since
it regresses ThunderX performance by at least 10%?



Re: Still crashes due to aliasing violation (Re: [RFC, PATCH] Split pool_allocator and create a new object_allocator)

2015-07-18 Thread Richard Biener
On July 17, 2015 11:28:28 PM GMT+02:00, Ulrich Weigand  
wrote:
>On July 17, 2015 6:54:32 PM GMT+02:00, Ulrich Weigand
> wrote:
>> >So do we now consider host compilers < 4.3 (4?) unsupported for
>> >building
>> >mainline GCC, or should we try to work around the issue (e.g. by
>moving
>> >the allocator out-of-line or using some other aliasing barrier)?
>> 
>> Why is this an issue for stage1 which runs w/o optimization?
>
>Well, this is the SPU compiler on a Cell system, which is technically
>a cross compiler from PowerPC (even though the resulting binaries run
>natively on the machine).
>
>> For cross compiling we already suggest using known good compilers.
>
>The documentation says:
>
>  To build a cross compiler, we recommend first building and installing
>  a native compiler. You can then use the native GCC compiler to build
>  the cross compiler. The installed native compiler needs to be GCC
>  version 2.95 or later. 

I think that needs updating anyway since even for crosses we now require a 
C++04 conforming host compiler.

>So building with a native GCC 4.1 seems to have been officially
>supported until now as far as I can tell (unless you're building Ada).
>
>
>Now, I could certainly live with a statement that cross compilers can
>only be built with a native GCC 4.3 or newer; but that should be IMO
>a deliberate decision and be widely announced (maybe even verified
>by a configure check?), so that others don't run into the problem;
>the nature of its symptoms makes the problem difficult to diagnose.

The requirement is to have a bug-free host compiler or use flags that make it 
appear bug-free.  Which is why we use -O0 when bootstrapping...

Yes, we could detect appropriate host gcc versions at configure time and apply 
a workaround (use -fno-strict-aliasing) for too old GCC.

Richard.

>
>Bye,
>Ulrich




[PATCH, i386]: fix PR 66922, wrong code for bit-field struct

2015-07-18 Thread Uros Bizjak
We have to reject misaligned insertions and extractions in
ix86_expand_pextr and ix86_expand_pinsr.
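
The alignment test used in the patch, 'pos & (size-1)', can be tried in
isolation. The sketch below is not GCC code; is_misaligned is a name made
up for this example, pos and size are taken to be in bits, and size is
assumed to be a power of two.

```cpp
#include <cassert>

// Returns true if bit position pos is not a multiple of size.
// Assumes size is a power of two, so size - 1 is a mask of the low bits.
constexpr bool
is_misaligned(unsigned pos, unsigned size)
{ return (pos & (size - 1)) != 0; }

static_assert(!is_misaligned(0, 16), "");
static_assert(!is_misaligned(32, 16), "");
static_assert(is_misaligned(34, 16), "");
static_assert(is_misaligned(8, 16), "");
```

A 16-bit field that starts at a bit offset such as 34 does not sit on a
16-bit element boundary, so it cannot be handled as a single vector
element and the expanders have to return false for it.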

2015-07-18  Uros Bizjak  

PR target/66922
* config/i386/i386.c (ix86_expand_pextr): Reject extractions
from misaligned positions.
(ix86_expand_pinsr): Reject insertions to misaligned positions.

testsuite/ChangeLog:

2015-07-18  Uros Bizjak  

PR target/66922
* gcc.target/i386/pr66922.c: New test.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Committed to mainline SVN and release branches.

Uros.
Index: config/i386/i386.c
===
--- config/i386/i386.c  (revision 225979)
+++ config/i386/i386.c  (working copy)
@@ -50591,6 +50591,10 @@ ix86_expand_pextr (rtx *operands)
return false;
  }
 
+   /* Reject extractions from misaligned positions.  */
+   if (pos & (size-1))
+ return false;
+
if (GET_MODE (dst) == dstmode)
  d = dst;
else
@@ -50687,6 +50691,10 @@ ix86_expand_pinsr (rtx *operands)
return false;
  }
 
+   /* Reject insertions to misaligned positions.  */
+   if (pos & (size-1))
+ return false;
+
if (GET_CODE (src) == SUBREG)
  {
unsigned int srcpos = SUBREG_BYTE (src);
Index: testsuite/gcc.target/i386/pr66922.c
===
--- testsuite/gcc.target/i386/pr66922.c (revision 0)
+++ testsuite/gcc.target/i386/pr66922.c (working copy)
@@ -0,0 +1,23 @@
+/* { dg-do run } */
+/* { dg-options "-O1 -msse2" } */
+/* { dg-require-effective-target sse2 } */
+
+#include "sse2-check.h"
+
+struct S 
+{
+  int:31;
+  int:2;
+  int f0:16;
+  int f1;
+  int f2;
+};
+
+static void 
+sse2_test (void)
+{
+  struct S a = { 1, 0, 0 };
+
+  if (a.f0 != 1)
+__builtin_abort(); 
+}


Re: [PATCH, MIPS] Scheduling for M51xx core family

2015-07-18 Thread Richard Sandiford
Robert Suchanek  writes:
> @@ -771,7 +771,8 @@ struct mips_cpu_info {
>  
>  /* Infer a -mnan=2008 setting from a -mips argument.  */
>  #define MIPS_ISA_NAN2008_SPEC \
> -  "%{mnan*:;mips32r6|mips64r6:-mnan=2008}"
> +  "%{mnan*:;mips32r6|mips64r6:-mnan=2008;march=m51*: \
> +  %{!msoft-float:-mnan=2008}}"

Did you need this, or was it for completeness?  MIPS_ISA_NAN2008_SPEC
should only be used after MIPS_ISA_LEVEL_SPEC, so I would have expected
the mips32r6|mips64r6: case to fire for -march=m51* too, ahead of the
new case.

Thanks,
Richard


Re: [gomp] Move openacc vector& worker single handling to RTL

2015-07-18 Thread Thomas Schwinge
Hi Nathan!

On Thu, 09 Jul 2015 20:25:22 -0400, Nathan Sidwell  wrote:
> This is the patch I committed.  [...]

Prompted by your recent "-O0 patch" to »[f]ix PTX worker spill/fill«, I
used the attached patch 0001-O0-libgomp-C-C-testing.patch to run all C
and C++ libgomp testing with -O0 (for Fortran, we iterate through various
kinds of optimization levels anyway).  (There are no regressions of
OpenMP testing.)  

For OpenACC nvptx offloading, there must still be something wrong; here's
a count of the (non-deterministic!) regressions of ten runs of the
libgomp testsuite.  As private-vars-loop-worker-5.c fails most often, it
probably makes sense to look into that one first.

For avoidance of doubt, there are no such regressions if I un-apply your
patch to »[m]ove openacc vector& worker single handling to RTL«.

libgomp.oacc-c:

3: [-PASS:-]{+FAIL:+} 
libgomp.oacc-c/../libgomp.oacc-c-c++-common/private-vars-local-worker-1.c 
-DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 execution test
4: [-PASS:-]{+FAIL:+} 
libgomp.oacc-c/../libgomp.oacc-c-c++-common/private-vars-local-worker-2.c 
-DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 execution test
3: [-PASS:-]{+FAIL:+} 
libgomp.oacc-c/../libgomp.oacc-c-c++-common/private-vars-local-worker-3.c 
-DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 execution test
5: [-PASS:-]{+FAIL:+} 
libgomp.oacc-c/../libgomp.oacc-c-c++-common/private-vars-local-worker-4.c 
-DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 execution test
4: [-PASS:-]{+FAIL:+} 
libgomp.oacc-c/../libgomp.oacc-c-c++-common/private-vars-local-worker-5.c 
-DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 execution test
3: [-PASS:-]{+FAIL:+} 
libgomp.oacc-c/../libgomp.oacc-c-c++-common/private-vars-loop-vector-1.c 
-DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 execution test
2: [-PASS:-]{+FAIL:+} 
libgomp.oacc-c/../libgomp.oacc-c-c++-common/private-vars-loop-vector-2.c 
-DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 execution test
3: [-PASS:-]{+FAIL:+} 
libgomp.oacc-c/../libgomp.oacc-c-c++-common/private-vars-loop-worker-2.c 
-DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 execution test
2: [-PASS:-]{+FAIL:+} 
libgomp.oacc-c/../libgomp.oacc-c-c++-common/private-vars-loop-worker-3.c 
-DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 execution test
2: [-PASS:-]{+FAIL:+} 
libgomp.oacc-c/../libgomp.oacc-c-c++-common/private-vars-loop-worker-4.c 
-DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 execution test
8: [-PASS:-]{+FAIL:+} 
libgomp.oacc-c/../libgomp.oacc-c-c++-common/private-vars-loop-worker-5.c 
-DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 execution test
4: [-PASS:-]{+FAIL:+} 
libgomp.oacc-c/../libgomp.oacc-c-c++-common/private-vars-loop-worker-6.c 
-DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 execution test
4: [-PASS:-]{+FAIL:+} 
libgomp.oacc-c/../libgomp.oacc-c-c++-common/private-vars-loop-worker-7.c 
-DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 execution test
1: [-PASS:-]{+FAIL:+} 
libgomp.oacc-c/../libgomp.oacc-c-c++-common/worker-partn-5.c 
-DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 execution test
3: [-PASS:-]{+FAIL:+} 
libgomp.oacc-c/../libgomp.oacc-c-c++-common/worker-partn-6.c 
-DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 execution test

libgomp.oacc-c++:

5: [-PASS:-]{+FAIL:+} 
libgomp.oacc-c++/../libgomp.oacc-c-c++-common/private-vars-local-worker-1.c 
-DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 execution test
5: [-PASS:-]{+FAIL:+} 
libgomp.oacc-c++/../libgomp.oacc-c-c++-common/private-vars-local-worker-2.c 
-DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 execution test
4: [-PASS:-]{+FAIL:+} 
libgomp.oacc-c++/../libgomp.oacc-c-c++-common/private-vars-local-worker-3.c 
-DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 execution test
5: [-PASS:-]{+FAIL:+} 
libgomp.oacc-c++/../libgomp.oacc-c-c++-common/private-vars-local-worker-4.c 
-DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 execution test
6: [-PASS:-]{+FAIL:+} 
libgomp.oacc-c++/../libgomp.oacc-c-c++-common/private-vars-local-worker-5.c 
-DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 execution test
3: [-PASS:-]{+FAIL:+} 
libgomp.oacc-c++/../libgomp.oacc-c-c++-common/private-vars-loop-vector-1.c 
-DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 execution test
2: [-PASS:-]{+FAIL:+} 
libgomp.oacc-c++/../libgomp.oacc-c-c++-common/private-vars-loop-worker-2.c 
-DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 execution test
4: [-PASS:-]{+FAIL:+} 
libgomp.oacc-c++/../libgomp.oacc-c-c++-common/private-vars-loop-worker-3.c 
-DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 execution test
4: [-PASS:-]{+FAIL:+} 
libgomp.oacc-c++/../libgomp.oacc-c-c++-common/private-vars-loop-worker-4.c 
-DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 execution test
7: [-PASS:-]{+FAIL:+} 
libgomp.oacc-c++/../libgomp.oacc-c-c++-common/private-vars-loop-worker-5.c 
-DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 execution test
4: [-PASS:-]{+FAIL:+} 
libgomp.oacc-c++/../libgomp.oacc-c-c++-common/private-vars

Re: [PATCH][combine][1/2] Try to simplify before substituting

2015-07-18 Thread Segher Boessenkool
On Fri, Jul 17, 2015 at 02:47:34PM -0600, Jeff Law wrote:
> >>I mean move the whole "if (BINARY_P ..." block to after the existing
> >>simplify calls, to just before the "First see if we can apply" comment,
> >>and not do a new simplify_rtx call at all.  Does that work?
> >
> >Yes, and here's the patch.
> >It just moves the simplification block.
> >The effect on codegen in SPEC2006 on aarch64 looks sane in the same
> >way as the original patch I posted (i.e. many redundant zero_extends
> >eliminated)
> >and together with patch 2/2 this helps in the -abs testcase.
> >
> >I'm bootstrapping this on aarch64, arm and x86.
> >Any other testing would be appreciated.
> >
> >Is this version ok if testing comes clean?
> >
> >Thanks,
> >Kyrill
> >
> >2015-07-17  Kyrylo Tkachov  
> >
> > * combine.c (combine_simplify_rtx): Move simplification step
> > before various transformations/substitutions.
> OK.
> jeff

The patch improves generated code on most archs (or at least code size,
which strongly correlates for combine), or is neutral.  xtensa regresses
a tiny bit; powerpc64 and hppa64 regress more.  I analysed the powerpc64
differences, and it seems to be all down to code that is now expressed as

(set (reg:DI) (lt:DI (reg:SI) (const_int 0)))

where before it was a bit extract (of a subreg).  The newly generated
pattern is simpler all right, but the backend didn't recognise it.  With a
simple patch, it does, and the generated code is nicely better than
before.
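
To illustrate why the two shapes are interchangeable, here is a small C
sketch.  It is not taken from the patch or from any backend, and the
function names are made up for the illustration.  It assumes the bit
extract in question picks out the sign bit of the 32-bit value, which is
what a DImode "lt" against zero of an SImode register computes on a
target where comparisons yield 0 or 1:

```c
#include <assert.h>
#include <stdint.h>

/* Older shape: a one-bit extract of bit 31 of the 32-bit value.  */
static uint64_t
sign_by_bit_extract (int32_t x)
{
  return ((uint32_t) x >> 31) & 1u;
}

/* Newer shape: the comparison itself, widened to 64 bits, mirroring
   (set (reg:DI) (lt:DI (reg:SI) (const_int 0))).  */
static uint64_t
sign_by_compare (int32_t x)
{
  return (uint64_t) (x < 0);
}

/* Check that both shapes agree on a few boundary values.  */
void
check_sign_forms (void)
{
  static const int32_t samples[] = { INT32_MIN, -1, 0, 1, INT32_MAX };
  for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; i++)
    assert (sign_by_bit_extract (samples[i]) == sign_by_compare (samples[i]));
}
```

Since both functions return the same value for every input, a backend
pattern that accepts the comparison form can emit the same instruction
it used for the extract.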

The hppa64 problem looks similar.  Maybe other targets could use such
an improvement as well.

So yes, the patch is fine.  Thank you for working on it Kyrill :-)


Segher


RE: [PATCH, MIPS] Scheduling for M51xx core family

2015-07-18 Thread Matthew Fortune
Richard Sandiford  writes:
> Robert Suchanek  writes:
> > @@ -771,7 +771,8 @@ struct mips_cpu_info {
> >
> >  /* Infer a -mnan=2008 setting from a -mips argument.  */
> >  #define MIPS_ISA_NAN2008_SPEC \
> > -  "%{mnan*:;mips32r6|mips64r6:-mnan=2008}"
> > +  "%{mnan*:;mips32r6|mips64r6:-mnan=2008;march=m51*: \
> > +%{!msoft-float:-mnan=2008}}"
> 
> Did you need this, or was it for completeness?  MIPS_ISA_NAN2008_SPEC
> should only be used after MIPS_ISA_LEVEL_SPEC, so I would have expected
> the mips32r6|mips64r6: case to fire for -march=m51* too, ahead of the
> new case.

The m5100 is a MIPS32R5 core but with a NAN2008 FPU, which is why there is
the special case. The soft-float check is there to limit the number of
multilib variants required, so we stick to legacy NaN for soft-float by
default.
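
As a rough model of what the extended spec string decides, the following
C sketch walks the alternatives in the same order.  The struct and
function names are hypothetical and exist only for this illustration;
the driver evaluates the spec string itself, not code like this.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical summary of the options the spec string looks at.  */
struct mips_opts
{
  bool mnan_given;	/* Some -mnan=... option was passed.  */
  bool isa_r6;		/* -mips32r6 or -mips64r6 is in effect.  */
  const char *march;	/* Argument of -march=, or NULL.  */
  bool soft_float;	/* -msoft-float was passed.  */
};

/* Return true when the spec would add -mnan=2008.  The alternatives
   are tried in order and the first match wins.  */
bool
infers_nan2008 (const struct mips_opts *o)
{
  assert (o != NULL);
  if (o->mnan_given)
    return false;	/* "mnan*:" - an explicit choice is left alone.  */
  if (o->isa_r6)
    return true;	/* "mips32r6|mips64r6:-mnan=2008"  */
  if (o->march != NULL && strncmp (o->march, "m51", 3) == 0)
    return !o->soft_float;	/* "march=m51*:%{!msoft-float:...}"  */
  return false;
}
```

With this model, -march=m5100 alone yields -mnan=2008, while adding
-msoft-float or an explicit -mnan= option leaves the setting untouched.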

Thanks,
Matthew



[PATCH] fix compilation of vmsdbgout.c

2015-07-18 Thread tbsaunde+gcc
From: Trevor Saunders 

The debug-early branch renamed vmsdbgout_decl to
vmsdbgout_function_decl, but didn't update its prototype.

Checked that the alpha and ia64 vms targets in config-list.mk can now build
all-gcc; committing to trunk as obvious.

Trev

gcc/ChangeLog:

2015-07-18  Trevor Saunders  

* vmsdbgout.c (vmsdbgout_decl): Change name of prototyped
function to vmsdbgout_function_decl.
---
 gcc/ChangeLog   | 5 +
 gcc/vmsdbgout.c | 2 +-
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 095713d..128e08a 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,8 @@
+2015-07-18  Trevor Saunders  
+
+   * vmsdbgout.c (vmsdbgout_decl): Change name of prototyped
+   function to vmsdbgout_function_decl.
+
 2015-07-18  Uros Bizjak  
 
PR target/66922
diff --git a/gcc/vmsdbgout.c b/gcc/vmsdbgout.c
index f3ebd75..d41d4b2 100644
--- a/gcc/vmsdbgout.c
+++ b/gcc/vmsdbgout.c
@@ -163,7 +163,7 @@ static void vmsdbgout_end_function (unsigned int);
 static void vmsdbgout_begin_epilogue (unsigned int, const char *);
 static void vmsdbgout_end_epilogue (unsigned int, const char *);
 static void vmsdbgout_begin_function (tree);
-static void vmsdbgout_decl (tree);
+static void vmsdbgout_function_decl (tree);
 static void vmsdbgout_early_global_decl (tree);
 static void vmsdbgout_late_global_decl (tree);
 static void vmsdbgout_type_decl (tree, int);
-- 
2.4.0



[PATCH] PR rtl-optimization/66790: uninitialized registers handling in REE

2015-07-18 Thread Pierre-Marie de Rodat

Hello,

This patch is an attempt to fix PR rtl-optimization/66790: please see 
 for the context. 
This adds a new dataflow problem (MIR for Must-Initialized Registers) 
and uses it in the REE pass to remove oversights, fixing the original issue.


I would also like to draw your attention to the use of the 
"must-initialized" term. Being new to dataflow problems, I understand it 
as "registers that are always initialized whatever the path leading to 
their use" and this seems to be confirmed by the current "LIVE AND 
MUST-INITIALIZED REGISTERS" comment in df-problems.c. However, it feels 
like what is currently called "must-initialized" in this source file is 
actually registers that *may* be initialized: see in particular the use 
of the bitmap_ior_into operation in df_live_confluence_n.


Can someone confirm this, please? If I'm right, I will update the 
attached patch to reword the corresponding comments.
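
To make the may/must distinction concrete, here is a toy C sketch.  It
is not code from df-problems.c: the regset type and both function names
are invented for the illustration, and a 32-bit mask stands in for a
GCC bitmap.  It shows only how the choice of confluence operator at a
join point changes the meaning of the result.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bit i set means "register i has been initialized".  */
typedef uint32_t regset;

/* "May" confluence: a register counts if any predecessor initializes
   it.  This is the union, the analogue of an inclusive-or merge.  */
regset
join_may (const regset *pred_out, size_t n)
{
  assert (pred_out != NULL || n == 0);
  regset in = 0;
  for (size_t i = 0; i < n; i++)
    in |= pred_out[i];
  return in;
}

/* "Must" confluence: a register counts only if every predecessor
   initializes it.  With no predecessors this sketch conservatively
   reports nothing as initialized.  */
regset
join_must (const regset *pred_out, size_t n)
{
  assert (pred_out != NULL || n == 0);
  if (n == 0)
    return 0;
  regset in = pred_out[0];
  for (size_t i = 1; i < n; i++)
    in &= pred_out[i];
  return in;
}
```

For two predecessors whose OUT sets are {r0, r1} (0x3) and {r1} (0x2),
join_may returns 0x3 whereas join_must returns 0x2: r0 is initialized on
one path only, so it may be initialized but is not must-initialized.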


This patch bootstraps and yields no regression on x86_64-linux. I'm 
still trying to get the build system to show how the assembly generated 
during the bootstrap is affected. Thank you in advance for the review! :-)


gcc/ChangeLog:

PR rtl-optimization/66790
* df.h (DF_MIR): New macro.
(DF_LAST_PROBLEM_PLUS1): Update to be past DF_MIR
(DF_MIR_INFO_BB): New macro.
(DF_MIR_IN, DF_MIR_OUT): New macros.
(struct df_mir_bb_info): New.
(df_mir): New macro.
(df_mir_add_problem, df_mir_simulate_one_insn): New forward
declarations.
(df_mir_get_bb_info): New.
* df-problems.c (struct df_mir_problem_data): New.
(df_mir_free_bb_info, df_mir_alloc, df_mir_reset,
df_mir_bb_local_compute, df_mir_local_compute, df_mir_init,
df_mir_confluence_n, df_mir_transfer_function, df_mir_free,
df_mir_top_dump, df_mir_bottom_dump,
df_mir_verify_solution_start, df_mir_verify_solution_end): New.
(problem_MIR): New.
(df_mir_add_problem, df_mir_simulate_one_insn): New.
* timevar.def (TV_DF_MIR): New.
* ree.c: Include bitmap.h
(add_removable_extension): Add an INIT_REGS parameter.  Use it
to skip extensions that may get an uninitialized register.
(find_removable_extensions): Compute must-initialized registers
using the MIR dataflow problem. Update the call to
add_removable_extension.
(find_and_remove_re): Call df_mir_add_problem.

--
Pierre-Marie de Rodat
From 2b0fe78644c3c3b16555b69d0b787dbad4a434a4 Mon Sep 17 00:00:00 2001
From: Pierre-Marie de Rodat 
Date: Sat, 18 Jul 2015 13:10:45 +0200
Subject: [PATCH] REE: fix uninitialized registers handling

gcc/ChangeLog:

	PR rtl-optimization/66790
	* df.h (DF_MIR): New macro.
	(DF_LAST_PROBLEM_PLUS1): Update to be past DF_MIR
	(DF_MIR_INFO_BB): New macro.
	(DF_MIR_IN, DF_MIR_OUT): New macros.
	(struct df_mir_bb_info): New.
	(df_mir): New macro.
	(df_mir_add_problem, df_mir_simulate_one_insn): New forward
	declarations.
	(df_mir_get_bb_info): New.
	* df-problems.c (struct df_mir_problem_data): New.
	(df_mir_free_bb_info, df_mir_alloc, df_mir_reset,
	df_mir_bb_local_compute, df_mir_local_compute, df_mir_init,
	df_mir_confluence_n, df_mir_transfer_function, df_mir_free,
	df_mir_top_dump, df_mir_bottom_dump,
	df_mir_verify_solution_start, df_mir_verify_solution_end): New.
	(problem_MIR): New.
	(df_mir_add_problem, df_mir_simulate_one_insn): New.
	* timevar.def (TV_DF_MIR): New.
	* ree.c: Include bitmap.h
	(add_removable_extension): Add an INIT_REGS parameter.  Use it
	to skip extensions that may get an uninitialized register.
	(find_removable_extensions): Compute must-initialized registers
	using the MIR dataflow problem. Update the call to
	add_removable_extension.
	(find_and_remove_re): Call df_mir_add_problem.
---
 gcc/df-problems.c | 411 ++
 gcc/df.h  |  37 -
 gcc/ree.c |  61 ++--
 gcc/timevar.def   |   1 +
 4 files changed, 496 insertions(+), 14 deletions(-)

diff --git a/gcc/df-problems.c b/gcc/df-problems.c
index d4b5d76..fdb067d 100644
--- a/gcc/df-problems.c
+++ b/gcc/df-problems.c
@@ -1848,6 +1848,417 @@ df_live_verify_transfer_functions (void)
 }
 
 /*
+   MUST-INITIALIZED REGISTERS.
+*/
+
+/* Private data used to verify the solution for this problem.  */
+struct df_mir_problem_data
+{
+  bitmap_head *in;
+  bitmap_head *out;
+  /* An obstack for the bitmaps we need for this problem.  */
+  bitmap_obstack mir_bitmaps;
+};
+
+
+/* Free basic block info.  */
+
+static void
+df_mir_free_bb_info (basic_block bb ATTRIBUTE_UNUSED,
+		 void *vbb_info)
+{
+  struct df_mir_bb_info *bb_info = (struct df_mir_bb_info *) vbb_info;
+  if (bb_info)
+{
+  bitmap_clear (&bb_info->gen);
+  bitmap_clear (&bb_

Re: [PATCH][AArch64] Use cinc for if_then_else of plus-immediates

2015-07-18 Thread Andrew Pinski
On Thu, Jul 16, 2015 at 8:33 AM, Kyrill Tkachov  wrote:
> Hi all,
>
> This patch improves codegen for expressions of the form:
> (x ? y + c1 : y + c2) when |c1 - c2| == 1
>
> It matches the if_then_else of the two plus-immediates,
> performs one of them, then generates a conditional increment
> operation.
>
> Thus, for the code in the testcase we generate a single add, compare
> and cinc instruction rather than two adds, a compare and a csel.
>
> Bootstrapped and tested on aarch64.
>
> Ok for trunk?

Why isn't this done in the generic code already, that is, in ifcvt?  It
seems better to optimize it there rather than having a target-specific
patch for something which is not really target specific, except maybe
for the cost.

Thanks,
Andrew


>
> Thanks,
> Kyrill
>
> 2015-07-16  Kyrylo Tkachov  
>
> * config/aarch64/aarch64.md (*csel_plus6):
> New define_insn_and_split.
> (*csinc2_insn): Rename to...
> (csinc2_insn): ... This.
>
> 2015-07-16  Kyrylo Tkachov  
>
> * gcc.target/aarch64/cinc_common_1.c: New test.


Re: [PATCH][AArch64] Use cinc for if_then_else of plus-immediates

2015-07-18 Thread Oleg Endo

On 19 Jul 2015, at 12:13, Andrew Pinski  wrote:

> On Thu, Jul 16, 2015 at 8:33 AM, Kyrill Tkachov  
> wrote:
>> Hi all,
>> 
>> This patch improves codegen for expressions of the form:
>> (x ? y + c1 : y + c2) when |c1 - c2| == 1
>> 
>> It matches the if_then_else of the two plus-immediates,
>> performs one of them, then generates a conditional increment
>> operation.
>> 
>> Thus, for the code in the testcase we generate a single add, compare
>> and cinc instruction rather than two adds, a compare and a csel.
>> 
>> Bootstrapped and tested on aarch64.
>> 
>> Ok for trunk?
> 
> Why isn't this done in the generic code already, that is, in ifcvt?  It
> seems better to optimize it there rather than having a target-specific
> patch for something which is not really target specific, except maybe
> for the cost.

It'd be better if something transformed

  x > 100 ? x - 2 : x - 1;

into

  x - 1 - (x > 100)

This is much easier to handle with combine patterns without having to rely on 
conditional move patterns.  In this case, it seems that the if_then_else 
combine patterns will be formed by going through conditional move patterns.  So 
if the target doesn't define conditional moves, it will never be able to get 
there.
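
The equivalence of the two forms above can be checked with a short C
sketch.  The function names are made up for the illustration, and the
range is assumed to stay well away from INT_MIN so that "x - 1" does not
overflow:

```c
#include <assert.h>

/* Form as written in the source: a select between two subtractions.  */
static int
select_form (int x)
{
  return x > 100 ? x - 2 : x - 1;
}

/* Branch-free form: subtract the comparison result (0 or 1).  */
static int
arith_form (int x)
{
  return x - 1 - (x > 100);
}

/* Compare both forms for every x in [lo, hi].  */
void
check_forms (int lo, int hi)
{
  for (int x = lo; x <= hi; x++)
    assert (select_form (x) == arith_form (x));
}
```

Calling check_forms (-1000, 1000) covers both sides of the threshold.
The arithmetic form needs only a compare that produces 0 or 1 and a
subtract, so no conditional move is required to express it.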

There are some other similar missed ifcvt cases, such as 
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54236#c9
Maybe the existing addcc handling in ifcvt could be extended to do that.

Cheers,
Oleg