Re: [PATCH] Prevent LTO wrappers to process a recursive execution

2016-04-26 Thread Martin Liška
On 04/25/2016 09:30 PM, Andi Kleen wrote:
> Does that really work? When the executable is found in $PATH
> av[0] does not contain the full path name. But you seem to assume
> it does?
> 
> -Andi

Hi.

Well, it should be resolved by lrealpath. There's usage from my machine:

marxin@marxinbox:~> which nm
/home/marxin/bin/gcc2/bin/nm
marxin@marxinbox:~> strace -f -s512 nm /tmp/a.o 2>&1 | grep exec
execve("/home/marxin/bin/gcc2/bin/nm", ["nm", "/tmp/a.o"], [/* 97 vars */]) = 0
[pid  5288] execve("/home/marxin/bin/gcc2/bin/nm", 
["/home/marxin/bin/gcc2/bin/nm", "--plugin", 
"/home/marxin/bin/gcc2/lib/gcc/x86_64-pc-linux-gnu/7.0.0/liblto_plugin.so", 
"/tmp/a.o"], [/* 97 vars */] 
[pid  5288] <... execve resumed> )  = 0
[pid  5289] execve("/usr/bin/nm", ["/usr/bin/nm", "--plugin", 
"/home/marxin/bin/gcc2/lib/gcc/x86_64-pc-linux-gnu/7.0.0/liblto_plugin.so", 
"--plugin", 
"/home/marxin/bin/gcc2/lib/gcc/x86_64-pc-linux-gnu/7.0.0/liblto_plugin.so", 
"/tmp/a.o"], [/* 97 vars */] 
[pid  5289] <... execve resumed> )  = 0

marxin@marxinbox:~> l /home/marxin/bin/gcc2/bin/ | grep nm
-rwxr-xr-x 2 marxin users  141880 Apr 26 09:53 gcc-nm*
lrwxrwxrwx 1 marxin users   6 Apr 26 09:31 nm -> gcc-nm*

Martin


[PING][PATCH] DWARF: add abstract origin links on lexical blocks DIEs

2016-04-26 Thread Pierre-Marie de Rodat
Ping for the patch submitted at 
. It applies 
just fine on the current trunk and still bootstrapps and regtests 
successfuly on x86_64-linux.


Thank you in advance,

--
Pierre-Marie de Rodat


Re: Compile libcilkrts with -funwind-tables (PR target/60290)

2016-04-26 Thread Rainer Orth
Hi Jeff,

> On 04/06/2016 05:12 AM, Rainer Orth wrote:
>> I've finally gotten around to analyzing this testsuite failure on 32-bit
>> Solaris/x86:
>>
>> FAIL: g++.dg/cilk-plus/CK/catch_exc.cc  -O1 -fcilkplus execution test
>> FAIL: g++.dg/cilk-plus/CK/catch_exc.cc  -O3 -fcilkplus execution test
>> FAIL: g++.dg/cilk-plus/CK/catch_exc.cc  -g -O2 -fcilkplus execution test
>> FAIL: g++.dg/cilk-plus/CK/catch_exc.cc  -g -fcilkplus execution test
>>
>> The testcase aborts like this:
>>
>> Thread 2 received signal SIGABRT, Aborted.
>> [Switching to Thread 1 (LWP 1)]
>> 0xfe3ba3c5 in __lwp_sigqueue () from /lib/libc.so.1
>> (gdb) where
>> #0  0xfe3ba3c5 in __lwp_sigqueue () from /lib/libc.so.1
>> #1  0xfe3b2d4f in thr_kill () from /lib/libc.so.1
>> #2  0xfe2f64da in raise () from /lib/libc.so.1
>> #3  0xfe2c93ee in abort () from /lib/libc.so.1
>> #4  0xfe525b37 in _Unwind_Resume (exc=0x80a75a0)
>>  at /vol/gcc/src/hg/trunk/local/libgcc/unwind.inc:234
>> #5  0xfe783b85 in __cilkrts_gcc_rethrow (sf=0xfeffdb00)
>>  at /vol/gcc/src/hg/trunk/local/libcilkrts/runtime/except-gcc.cpp:589
>> #6  0xfe77f0ea in __cilkrts_rethrow (sf=0xfeffdb00)
>>  at /vol/gcc/src/hg/trunk/local/libcilkrts/runtime/cilk-abi.c:548
>> #7  0x080513a3 in my_test ()
>>  at 
>> /vol/gcc/src/hg/trunk/local/gcc/testsuite/g++.dg/cilk-plus/CK/catch_exc.cc:38
>> #8  0x080515bd in main ()
>>  at 
>> /vol/gcc/src/hg/trunk/local/gcc/testsuite/g++.dg/cilk-plus/CK/catch_exc.cc:62
>>
>> The gcc_assert in _Unwind_Resume triggers since
>> _Unwind_RaiseException_Phase2 returned _URC_FATAL_PHASE2_ERROR.  I found
>> that x86_fallback_frame_state had been invoked for this pc:
>>
>> 0xfe77218e <__cilkrts_rethrow+30>:   add$0x10,%esp
>>
>> and returned _URC_END_OF_STACK, which is totally unexpected since
>> _Unwind_Find_FDE should have found it.  __cilkrts_rethrow is defined in
>> libcilkrts/cilk-abi.o, but in the 32-bit case EH info is missing:
>>
>> 32-bit:
>>
>> ro@fuego 339 > elfdump -u .libs/libcilkrts.so|grep rethrow
>> 0x18def  0x3558c  __cilkrts_gcc_rethrow
>>[0x11fc]  initloc:   0x18def [ sdata4 pcrel ]  __cilkrts_gcc_rethrow
>>
>> 64-bit:
>>
>> ro@fuego 341 > elfdump -u amd64/libcilkrts/.libs/libcilkrts.so|grep rethrow
>> 0x1c639  0x2010  __cilkrts_rethrow
>> 0x2388b  0x5510  __cilkrts_gcc_rethrow
>> [0x488]  initloc:   0x1c639 [ sdata4 pcrel ]  __cilkrts_rethrow
>>[0x3988]  initloc:   0x2388b [ sdata4 pcrel ]  __cilkrts_gcc_rethrow
>>
>> I traced this to -funwind-tables bein set on 32-bit Linux/x86, while it
>> is unset on 32-bit Solaris/x86  due to
>>
>> i386/i386.c (ix86_option_override_internal):
>>
>>if (opts->x_flag_asynchronous_unwind_tables == 2)
>>  opts->x_flag_asynchronous_unwind_tables = !USE_IX86_FRAME_POINTER;
>>
>> where i386/sol2.h has
>>
>> #define USE_IX86_FRAME_POINTER 1
>>
>> while the default is 0.
>>
>> As expected, compiling libcilkrts with -funwind-tables (which is a no-op
>> on Linux/x86, Linux/x86_64, and Solaris/amd64) makes the failure go
>> away.
>>
>> I'm uncertain if this is ok for mainline at this stage or has to wait
>> for gcc-7.  Once it goes into mainline, it's probably worth a backport
>> to all active release branches.
>>
>> Thoughts?
>>
>>  Rainer
>>
>>
>> 2016-04-04  Rainer Orth  
>>
>>  PR target/60290
>>  * Makefile.am (GENERAL_FLAGS): Add -funwind-tables.
>>  * Makefile.in: Regenerate.
> OK.  Thanks for tracking this down, including verification that the
> difference between Solaris and Linux is the latter having -funwind-tables
> on by default.

thanks, installed.  About eventual backports to the 5 and 6 (after 6.1
is released) and maybe even 4.9 branches?

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: Compile libcilkrts with -funwind-tables (PR target/60290)

2016-04-26 Thread Jakub Jelinek
On Tue, Apr 26, 2016 at 10:56:16AM +0200, Rainer Orth wrote:
> > OK.  Thanks for tracking this down, including verification that the
> > difference between Solaris and Linux is the latter having -funwind-tables
> > on by default.
> 
> thanks, installed.  About eventual backports to the 5 and 6 (after 6.1
> is released) and maybe even 4.9 branches?

After a week or three on the trunk so that people have a chance to report
issues, yes.

Jakub


Re: [PATCH] Verify that context of local DECLs is the current function

2016-04-26 Thread Richard Biener
On Mon, Apr 25, 2016 at 3:22 PM, Martin Jambor  wrote:
> Hi,
>
> the patch below moves an assert from expand_expr_real_1 to gimple
> verification.  It triggers when we do a sloppy job outlining stuff
> from one function to another (or perhaps inlining too) and leave in
> the IL of a function a local declaration that belongs to a different
> function.
>
> Like I wrote above, such cases usually ICE in expand anyway, but I
> think it is worth bailing out sooner, if nothing because bugs like PR
> 70348 would not be assigned to me ;-) ...well, actually, I found this
> helpful when working on OpenMP gridification.
>
> In the process, I think that the verifier would not catch a
> SSA_NAME_IN_FREE_LIST when such an SSA_NAME is a base of a MEM_REF so
> I added that check too.
>
> Bootstrapped and tested on x86_64-linux, OK for trunk?
>
> Thanks,
>
> Martin
>
>
>
> 2016-04-21  Martin Jambor  
>
> * tree-cfg.c (verify_var_parm_result_decl): New function.
> (verify_address): Call it on PARM_DECL bases.
> (verify_expr): Likewise, also verify SSA_NAME bases of MEM_REFs.
> ---
>  gcc/tree-cfg.c | 47 +++
>  1 file changed, 47 insertions(+)
>
> diff --git a/gcc/tree-cfg.c b/gcc/tree-cfg.c
> index 3385164..c917967 100644
> --- a/gcc/tree-cfg.c
> +++ b/gcc/tree-cfg.c
> @@ -2764,6 +2764,23 @@ gimple_split_edge (edge edge_in)
>return new_bb;
>  }
>
> +/* Verify that a VAR, PARM_DECL or RESULT_DECL T is from the current 
> function,
> +   and if not, return true.  If it is, return false.  */
> +
> +static bool
> +verify_var_parm_result_decl (tree t)
> +{
> +  tree context = decl_function_context (t);
> +  if (context != cfun->decl
> +  && !SCOPE_FILE_SCOPE_P (context)
> +  && !TREE_STATIC (t)
> +  && !DECL_EXTERNAL (t))
> +{
> +  error ("Local declaration from a different function");
> +  return true;
> +}
> +  return NULL;
> +}
>
>  /* Verify properties of the address expression T with base object BASE.  */
>
> @@ -2798,6 +2815,8 @@ verify_address (tree t, tree base)
> || TREE_CODE (base) == RESULT_DECL))
>  return NULL_TREE;
>
> +  if (verify_var_parm_result_decl (base))
> +return base;

Is that necessary?  We recurse after all, so ...

>if (DECL_GIMPLE_REG_P (base))
>  {
>error ("DECL_GIMPLE_REG_P set on a variable with address taken");
> @@ -2834,6 +2853,13 @@ verify_expr (tree *tp, int *walk_subtrees, void *data 
> ATTRIBUTE_UNUSED)
> }
>break;
>
> +case PARM_DECL:
> +case VAR_DECL:
> +case RESULT_DECL:
> +  if (verify_var_parm_result_decl (t))
> +   return t;
> +  break;
> +

... should apply.

>  case INDIRECT_REF:
>error ("INDIRECT_REF in gimple IL");
>return t;
> @@ -2852,9 +2878,25 @@ verify_expr (tree *tp, int *walk_subtrees, void *data 
> ATTRIBUTE_UNUSED)
>   error ("invalid offset operand of MEM_REF");
>   return TREE_OPERAND (t, 1);
> }
> +  if (TREE_CODE (x) == SSA_NAME)
> +   {
> + if (SSA_NAME_IN_FREE_LIST (x))
> +   {
> + error ("SSA name in freelist but still referenced");
> + return x;
> +   }
> + if (SSA_NAME_VAR (x))
> +   x = SSA_NAME_VAR (x);;
> +   }
> +  if ((TREE_CODE (x) == PARM_DECL
> +  || TREE_CODE (x) == VAR_DECL
> +  || TREE_CODE (x) == RESULT_DECL)
> + && verify_var_parm_result_decl (x))
> +   return x;

please instead try removing *walk_subtrees = 0 ...

>if (TREE_CODE (x) == ADDR_EXPR
>   && (x = verify_address (x, TREE_OPERAND (x, 0
> return x;

... we only get some slight duplicate address verification here
(this copy is stronger than the ADDR_EXPR case).

> +
>*walk_subtrees = 0;
>break;
>
> @@ -3010,6 +3052,11 @@ verify_expr (tree *tp, int *walk_subtrees, void *data 
> ATTRIBUTE_UNUSED)
>
>   t = TREE_OPERAND (t, 0);
> }
> +  if ((TREE_CODE (t) == PARM_DECL
> +  || TREE_CODE (t) == VAR_DECL
> +  || TREE_CODE (t) == RESULT_DECL)
> + && verify_var_parm_result_decl (t))
> +   return t;

Hmm.  I wonder if rather than replicating stuff everywhere we do not walk
subtrees we instead should walk the subtree we care about explicitely
via a walk_tree invocation.  Like in this case

   walk_tree (&t, verify_expr, data);

Richard.

>if (!is_gimple_min_invariant (t) && !is_gimple_lvalue (t))
> {
> --
> 2.8.1
>


Unreviewed^2 patches

2016-04-26 Thread Rainer Orth
The following two patches have remained unreviewed for about 6 weeks,
despite a reminder:

[i386] Support .lbss etc. sections with Solaris as (PR target/59407)
https://gcc.gnu.org/ml/gcc-patches/2016-03/msg01056.html

[i386] Support .largecomm with Solaris as (PR target/61821)
https://gcc.gnu.org/ml/gcc-patches/2016-03/msg01059.html

Both contain x86-specific parts; the first one also needs a middle-end
reviewer for the varasm.c changes.

Thanks.
Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [PATCH] Fix missed DSE opportunity with operator delete.

2016-04-26 Thread Richard Biener
On Mon, Apr 25, 2016 at 9:57 PM, Jason Merrill  wrote:
> Hmm, this seems to assume that operator delete itself doesn't do
> anything with the object being deleted.  This is true of the default
> implementation, but I don't see anything in the standard that
> prohibits a user-supplied replacement or class-specific deallocation
> function from accessing the memory.

Hmm, but the delete expression invokes the (default) destructor which
ends the lifetime of the object and thus invalidates all memory.  Don't
we place a CLOBBER there as well nowadays (seems not).

For

struct A { A(); ~A(); int i; };

int main()
{
  A *a = new A;
  delete a;
}

I see

struct A * a;
  <;, try
{
  A::A ((struct A *) D.2346);
}
  catch
{
  operator delete (D.2346);
}, (struct A *) D.2346;) >;
  < != 0B)
{
  A::~A (SAVE_EXPR );, operator delete ((void *) SAVE_EXPR );;
}
  else
{
  <<< Unknown tree: void_cst >>>
} >;

so after the destructor is invoked the objects lifetime ends.

Richard.

> Jason
>
>
> On Mon, Apr 25, 2016 at 6:08 AM, Richard Biener
>  wrote:
>> On Fri, Apr 22, 2016 at 11:37 PM, Mikhail Maltsev  wrote:
>>> On 04/20/2016 05:12 PM, Richard Biener wrote:
 You have

 +static tree
 +handle_free_attribute (tree *node, tree name, tree /*args*/, int 
 /*flags*/,
 +  bool *no_add_attrs)
 +{
 +  tree decl = *node;
 +  if (TREE_CODE (decl) == FUNCTION_DECL
 +  && type_num_arguments (TREE_TYPE (decl)) != 0
 +  && POINTER_TYPE_P (TREE_VALUE (TYPE_ARG_TYPES (TREE_TYPE (decl)
 +DECL_ALLOC_FN_KIND (decl) = ALLOC_FN_FREE;
 +  else
 +{
 +  warning_at (DECL_SOURCE_LOCATION (decl), OPT_Wattributes,
 + "%qE attribute ignored", name);
 +  *no_add_attrs = true;
 +}

 so one can happily apply the attribute to

  void foo (void *, void *);

 but then

 @@ -2117,6 +2127,13 @@ call_may_clobber_ref_p_1 (gcall *call, ao_ref *ref)
   /* Fallthru to general call handling.  */;
}

 +  if (callee != NULL_TREE
 +  && (flags_from_decl_or_type (callee) & ECF_FREE) != 0)
 +{
 +  tree ptr = gimple_call_arg (call, 0);
 +  return ptr_deref_may_alias_ref_p_1 (ptr, ref);
 +}

 will ignore the 2nd argument.  I think it's better to ignore the attribute
 if type_num_arguments () != 1.
>>>
>>> Actually, the C++ standard ([basic.stc.dynamic]/2) defines the following 4
>>> deallocation functions implicitly:
>>>
>>> void operator delete(void*);
>>> void operator delete[](void*);
>>> void operator delete(void*, std::size_t) noexcept;
>>> void operator delete[](void*, std::size_t) noexcept;
>>>
>>> And the standard library also has:
>>>
>>> void operator delete(void*, const std::nothrow_t&);
>>> void operator delete[](void*, const std::nothrow_t&);
>>> void operator delete(void*, std::size_t, const std::nothrow_t&);
>>> void operator delete[](void*, std::size_t, const std::nothrow_t&);
>>>
>>> IIUC, 'delete(void*, std::size_t)' is used by default in C++14
>>> (https://gcc.gnu.org/ml/gcc-patches/2014-12/msg01266.html). How should we 
>>> handle
>>> this?
>>
>> Hmm.  I guess by adjusting the documentation of the attribute to
>> explicitely mention
>> the behavior on the rest of the argument pointed-to memory (the
>> function is assumed
>> to neither write nor read from that memory).  Also explicitely mention
>> that 'this' is
>> always the first argument if present.
>>
>> Richard.
>>
>>> --
>>> Regards,
>>> Mikhail Maltsev


Re: [PATCH] PR target/70155: Use SSE for TImode load/store

2016-04-26 Thread Richard Biener
On Mon, 25 Apr 2016, Uros Bizjak wrote:

> On Mon, Apr 25, 2016 at 4:47 PM, H.J. Lu  wrote:
> > On Mon, Apr 25, 2016 at 7:18 AM, Uros Bizjak  wrote:
> >> On Mon, Apr 25, 2016 at 2:51 PM, H.J. Lu  wrote:
> >>> Tested on Linux/x86-64.  OK for trunk?
> >>
> >>> +  /* FIXME: Since the CSE pass may change dominance info, which isn't
> >>> + expected by the fwprop pass, call free_dominance_info to
> >>> + invalidate dominance info.  Otherwise, the fwprop pass may crash
> >>> + when dominance info is changed.  */
> >>> +  if (TARGET_64BIT)
> >>> +free_dominance_info (CDI_DOMINATORS);
> >>> +
> >>
> >> Please resolve the above problem first, target-dependent sources are
> >> not the place to apply band-aids for middle-end problems. The thread
> >> with the proposed fix died in [1].
> >>
> >> [1] https://gcc.gnu.org/ml/gcc/2016-03/msg00143.html
> >
> > free_dominance_info (CDI_DOMINATORS) has been called in other
> > places to avoid this middle-end issue.   I don't know when the middle-end
> > will be fixed.  I don't think this target optimization should be penalized 
> > by
> > the middle-end issue.
> 
> Let's ask Richard if he is OK with the workaround...

Well, it's ultimately your call (it's a workaround in the target).

Of course I'd like to see the underlying issue fixed and the 
workarounds in "other places" be removed.

Richard.

> One more thing:
> 
> @@ -3551,6 +3874,7 @@ convert_scalars_to_vector ()
>basic_block bb;
>bitmap candidates;
>int converted_insns = 0;
> +  rtx zero, minus_one;
> 
>bitmap_obstack_initialize (NULL);
>candidates = BITMAP_ALLOC (NULL);
> @@ -3585,22 +3909,40 @@ convert_scalars_to_vector ()
>  if (dump_file)
>fprintf (dump_file, "There are no candidates for optimization.\n");
> 
> +  if (TARGET_64BIT)
> +{
> +  zero = gen_reg_rtx (V1TImode);
> +  minus_one = gen_reg_rtx (V1TImode);
> +}
> +  else
> +{
> +  zero = NULL_RTX;
> +  minus_one = NULL_RTX;
> +}
> +
> 
> Do we *really* need to crate registers here? They are only used as a
> temporary in scalar_chain_64::convert_insn, and I think that any CSE
> worth its name should find out when the same immediate is loaded to
> different temporaries.
> 
> BTW: I'd really like if Ilya can review the functionality of the
> patch, he is an expert in this conversion stuff.
> 
> Uros.
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)


Re: [PATCH] Fix missed DSE opportunity with operator delete.

2016-04-26 Thread Marc Glisse

On Tue, 26 Apr 2016, Richard Biener wrote:


On Mon, Apr 25, 2016 at 9:57 PM, Jason Merrill  wrote:

Hmm, this seems to assume that operator delete itself doesn't do
anything with the object being deleted.  This is true of the default
implementation, but I don't see anything in the standard that
prohibits a user-supplied replacement or class-specific deallocation
function from accessing the memory.


Hmm, but the delete expression invokes the (default) destructor which
ends the lifetime of the object and thus invalidates all memory.  Don't
we place a CLOBBER there as well nowadays (seems not).

For

struct A { A(); ~A(); int i; };

int main()
{
 A *a = new A;
 delete a;
}

I see

   struct A * a;
 <;, try
   {
 A::A ((struct A *) D.2346);
   }
 catch
   {
 operator delete (D.2346);
   }, (struct A *) D.2346;) >;
 < != 0B)
   {
 A::~A (SAVE_EXPR );, operator delete ((void *) SAVE_EXPR );;
   }
 else
   {
 <<< Unknown tree: void_cst >>>
   } >;

so after the destructor is invoked the objects lifetime ends.


You can also call operator new and operator delete directly in C++, not 
just through new/delete expressions, and that's what std::allocator does.


Especially in C++17 where we should have aligned allocation, I can imagine 
operator new / delete implemented like gcc/config/i386/gmm_malloc.h on 
some platforms, which requires reading in operator delete stuff that 
operator new wrote there. Depending on inline decisions, DSE-ing part of 
new could be problematic (IIRC those functions are technically not allowed 
to be marked 'inline', but that's not quite the same thing).


--
Marc Glisse


Re: Unreviewed^2 patches

2016-04-26 Thread Uros Bizjak
On Tue, Apr 26, 2016 at 11:04 AM, Rainer Orth
 wrote:
> The following two patches have remained unreviewed for about 6 weeks,
> despite a reminder:
>
> [i386] Support .lbss etc. sections with Solaris as (PR target/59407)
> https://gcc.gnu.org/ml/gcc-patches/2016-03/msg01056.html
>
> [i386] Support .largecomm with Solaris as (PR target/61821)
> https://gcc.gnu.org/ml/gcc-patches/2016-03/msg01059.html
>
> Both contain x86-specific parts; the first one also needs a middle-end
> reviewer for the varasm.c changes.

You know as and gas better than I, so I'll rubber-stamp x86 part as OK.

Thanks,
Uros.


Re: [PATCH] PR target/70155: Use SSE for TImode load/store

2016-04-26 Thread Uros Bizjak
On Tue, Apr 26, 2016 at 11:17 AM, Richard Biener  wrote:
> On Mon, 25 Apr 2016, Uros Bizjak wrote:
>
>> On Mon, Apr 25, 2016 at 4:47 PM, H.J. Lu  wrote:
>> > On Mon, Apr 25, 2016 at 7:18 AM, Uros Bizjak  wrote:
>> >> On Mon, Apr 25, 2016 at 2:51 PM, H.J. Lu  wrote:
>> >>> Tested on Linux/x86-64.  OK for trunk?
>> >>
>> >>> +  /* FIXME: Since the CSE pass may change dominance info, which isn't
>> >>> + expected by the fwprop pass, call free_dominance_info to
>> >>> + invalidate dominance info.  Otherwise, the fwprop pass may crash
>> >>> + when dominance info is changed.  */
>> >>> +  if (TARGET_64BIT)
>> >>> +free_dominance_info (CDI_DOMINATORS);
>> >>> +
>> >>
>> >> Please resolve the above problem first, target-dependent sources are
>> >> not the place to apply band-aids for middle-end problems. The thread
>> >> with the proposed fix died in [1].
>> >>
>> >> [1] https://gcc.gnu.org/ml/gcc/2016-03/msg00143.html
>> >
>> > free_dominance_info (CDI_DOMINATORS) has been called in other
>> > places to avoid this middle-end issue.   I don't know when the middle-end
>> > will be fixed.  I don't think this target optimization should be penalized 
>> > by
>> > the middle-end issue.
>>
>> Let's ask Richard if he is OK with the workaround...
>
> Well, it's ultimately your call (it's a workaround in the target).

Oh well, ...

> Of course I'd like to see the underlying issue fixed and the
> workarounds in "other places" be removed.

... then at least a reference to a relevant PR should be added to a
FIXME comment.

Uros.


Re: [PATCH] PR target/70155: Use SSE for TImode load/store

2016-04-26 Thread Richard Biener
On Tue, 26 Apr 2016, Uros Bizjak wrote:

> On Tue, Apr 26, 2016 at 11:17 AM, Richard Biener  wrote:
> > On Mon, 25 Apr 2016, Uros Bizjak wrote:
> >
> >> On Mon, Apr 25, 2016 at 4:47 PM, H.J. Lu  wrote:
> >> > On Mon, Apr 25, 2016 at 7:18 AM, Uros Bizjak  wrote:
> >> >> On Mon, Apr 25, 2016 at 2:51 PM, H.J. Lu  wrote:
> >> >>> Tested on Linux/x86-64.  OK for trunk?
> >> >>
> >> >>> +  /* FIXME: Since the CSE pass may change dominance info, which isn't
> >> >>> + expected by the fwprop pass, call free_dominance_info to
> >> >>> + invalidate dominance info.  Otherwise, the fwprop pass may crash
> >> >>> + when dominance info is changed.  */
> >> >>> +  if (TARGET_64BIT)
> >> >>> +free_dominance_info (CDI_DOMINATORS);
> >> >>> +
> >> >>
> >> >> Please resolve the above problem first, target-dependent sources are
> >> >> not the place to apply band-aids for middle-end problems. The thread
> >> >> with the proposed fix died in [1].
> >> >>
> >> >> [1] https://gcc.gnu.org/ml/gcc/2016-03/msg00143.html
> >> >
> >> > free_dominance_info (CDI_DOMINATORS) has been called in other
> >> > places to avoid this middle-end issue.   I don't know when the middle-end
> >> > will be fixed.  I don't think this target optimization should be 
> >> > penalized by
> >> > the middle-end issue.
> >>
> >> Let's ask Richard if he is OK with the workaround...
> >
> > Well, it's ultimately your call (it's a workaround in the target).
> 
> Oh well, ...
> 
> > Of course I'd like to see the underlying issue fixed and the
> > workarounds in "other places" be removed.
> 
> ... then at least a reference to a relevant PR should be added to a
> FIXME comment.

HJ, can you please open a bug with 1) a testcase, 2) a patch to revert
the workaround so it shows the ICE and 3) a pointer to the ml thread
with your preliminary analysis?

The advantage is that with the patch in we have a reproducer at least.

Thanks,
Richard.


Re: Unreviewed^2 patches

2016-04-26 Thread Jakub Jelinek
On Tue, Apr 26, 2016 at 11:29:40AM +0200, Uros Bizjak wrote:
> On Tue, Apr 26, 2016 at 11:04 AM, Rainer Orth
>  wrote:
> > The following two patches have remained unreviewed for about 6 weeks,
> > despite a reminder:
> >
> > [i386] Support .lbss etc. sections with Solaris as (PR target/59407)
> > https://gcc.gnu.org/ml/gcc-patches/2016-03/msg01056.html
> >
> > [i386] Support .largecomm with Solaris as (PR target/61821)
> > https://gcc.gnu.org/ml/gcc-patches/2016-03/msg01059.html
> >
> > Both contain x86-specific parts; the first one also needs a middle-end
> > reviewer for the varasm.c changes.
> 
> You know as and gas better than I, so I'll rubber-stamp x86 part as OK.

The middle-end/doc changes are ok too.

Jakub


Re: C++ PATCH to implement C++17 maybe_unused attribute

2016-04-26 Thread Andreas Schwab
/usr/local/gcc/gcc-20160426/gcc/testsuite/c-c++-common/cpp/pr63831-1.c:5:14: 
error: size of array 'T1' is negative
/usr/local/gcc/gcc-20160426/gcc/testsuite/c-c++-common/cpp/pr63831-1.c:13:14: 
error: size of array 'T5' is negative
/usr/local/gcc/gcc-20160426/gcc/testsuite/c-c++-common/cpp/pr63831-1.c:17:1: 
error: unknown type name 'T4'
/usr/local/gcc/gcc-20160426/gcc/testsuite/c-c++-common/cpp/pr63831-1.c:35:14: 
error: size of array 'T11' is negative
/usr/local/gcc/gcc-20160426/gcc/testsuite/c-c++-common/cpp/pr63831-1.c:43:14: 
error: size of array 'T15' is negative
/usr/local/gcc/gcc-20160426/gcc/testsuite/c-c++-common/cpp/pr63831-1.c:47:1: 
error: unknown type name 'T14'

Andreas.

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."


Re: [PATCH] Fix missed DSE opportunity with operator delete.

2016-04-26 Thread Richard Biener
On Tue, Apr 26, 2016 at 11:28 AM, Marc Glisse  wrote:
> On Tue, 26 Apr 2016, Richard Biener wrote:
>
>> On Mon, Apr 25, 2016 at 9:57 PM, Jason Merrill  wrote:
>>>
>>> Hmm, this seems to assume that operator delete itself doesn't do
>>> anything with the object being deleted.  This is true of the default
>>> implementation, but I don't see anything in the standard that
>>> prohibits a user-supplied replacement or class-specific deallocation
>>> function from accessing the memory.
>>
>>
>> Hmm, but the delete expression invokes the (default) destructor which
>> ends the lifetime of the object and thus invalidates all memory.  Don't
>> we place a CLOBBER there as well nowadays (seems not).
>>
>> For
>>
>> struct A { A(); ~A(); int i; };
>>
>> int main()
>> {
>>  A *a = new A;
>>  delete a;
>> }
>>
>> I see
>>
>>struct A * a;
>>  <>  (void) (a = TARGET_EXPR ;, try
>>{
>>  A::A ((struct A *) D.2346);
>>}
>>  catch
>>{
>>  operator delete (D.2346);
>>}, (struct A *) D.2346;) >;
>>  <>  if (SAVE_EXPR  != 0B)
>>{
>>  A::~A (SAVE_EXPR );, operator delete ((void *) SAVE_EXPR );;
>>}
>>  else
>>{
>>  <<< Unknown tree: void_cst >>>
>>} >;
>>
>> so after the destructor is invoked the objects lifetime ends.
>
>
> You can also call operator new and operator delete directly in C++, not just
> through new/delete expressions, and that's what std::allocator does.
>
> Especially in C++17 where we should have aligned allocation, I can imagine
> operator new / delete implemented like gcc/config/i386/gmm_malloc.h on some
> platforms, which requires reading in operator delete stuff that operator new
> wrote there. Depending on inline decisions, DSE-ing part of new could be
> problematic (IIRC those functions are technically not allowed to be marked
> 'inline', but that's not quite the same thing).

Ok.  Is it then a matter of also seeing the size of the object and only treating
[p, p + size[ as DCEable thus only the object itself becomes dead?

Btw, similar issues would arise if we'd inline malloc() from glibc and that used
a malloc annotated internal helper and we run into a (not inlined) free() call.

Richard.

> --
> Marc Glisse


Re: match.pd: x+x -> 2*x

2016-04-26 Thread Richard Biener
On Mon, Apr 25, 2016 at 9:21 PM, Marc Glisse  wrote:
> Hello,
>
> a simple transform to replace a more complicated one in fold-const.c.
>
> This patch breaks the testcase gcc.dg/gomp/loop-1.c. Indeed, the C front-end
> folds too eagerly
>   newrhs = c_fully_fold (newrhs, false, NULL);
> in build_modify_expr, and by the time the OMP code checks that the increment
> in the for loop has the right form, it sees i=i*2 instead of i=i+i. Since
> the original code is apparently illegal, I guess it isn't that bad... The
> C++ front-end seems fine.
>
> Testcase no-strict-overflow-6.c also breaks. ivcanon is clever enough to
> count how many iterations there are before i*=2 makes i negative, which I
> guess would be great with -fwrapv, but I find it a bit suspicious with just
> -fno-strict-overflow. I adjusted the testcase assuming the ivcanon was doing
> the right thing (with -fstrict-overflow we generate an infinite loop
> instead, so it is still testing that).
>
> Bootstrap+regtest on powerpc64le-unknown-linux-gnu.

Ok.  Note that I think

/* (X /[ex] A) * A -> X.  */
(simplify
  (mult (convert? (exact_div @0 @1)) @1)
  /* Look through a sign-changing conversion.  */
  (convert @0))

has a bug as we use operand_equal_p for comparing @1 but
that treats equal but different typed INTEGER_CSTs as equal...

Thus this lacks the tree_nop_conversion_p check.

Thanks,
Richard.

> 2016-04-26  Marc Glisse  
>
> gcc/
> * genmatch.c (write_predicate): Add ATTRIBUTE_UNUSED.
> * fold-const.c (fold_binary_loc): Remove 2 transformations
> superseded by match.pd.
> * match.pd (x+x -> x*2): Generalize to integers.
>
> gcc/testsuite/
> * gcc.dg/fold-plusmult.c: Adjust.
> * gcc.dg/no-strict-overflow-6.c: Adjust.
> * gcc.dg/gomp/loop-1.c: Xfail some tests.
>
> --
> Marc Glisse
> Index: gcc/fold-const.c
> ===
> --- gcc/fold-const.c(revision 235411)
> +++ gcc/fold-const.c(working copy)
> @@ -9949,39 +9949,20 @@ fold_binary_loc (location_t loc,
>   /* Transform x * -C into -x * C if x is easily negatable.  */
>   if (TREE_CODE (op1) == INTEGER_CST
>   && tree_int_cst_sgn (op1) == -1
>   && negate_expr_p (op0)
>   && (tem = negate_expr (op1)) != op1
>   && ! TREE_OVERFLOW (tem))
> return fold_build2_loc (loc, MULT_EXPR, type,
> fold_convert_loc (loc, type,
>   negate_expr (op0)),
> tem);
>
> - /* (A + A) * C -> A * 2 * C  */
> - if (TREE_CODE (arg0) == PLUS_EXPR
> - && TREE_CODE (arg1) == INTEGER_CST
> - && operand_equal_p (TREE_OPERAND (arg0, 0),
> - TREE_OPERAND (arg0, 1), 0))
> -   return fold_build2_loc (loc, MULT_EXPR, type,
> -   omit_one_operand_loc (loc, type,
> - TREE_OPERAND (arg0, 0),
> - TREE_OPERAND (arg0, 1)),
> -   fold_build2_loc (loc, MULT_EXPR, type,
> -build_int_cst (type, 2) ,
> arg1));
> -
> - /* ((T) (X /[ex] C)) * C cancels out if the conversion is
> -sign-changing only.  */
> - if (TREE_CODE (arg1) == INTEGER_CST
> - && TREE_CODE (arg0) == EXACT_DIV_EXPR
> - && operand_equal_p (arg1, TREE_OPERAND (arg0, 1), 0))
> -   return fold_convert_loc (loc, type, TREE_OPERAND (arg0, 0));
> -
>   strict_overflow_p = false;
>   if (TREE_CODE (arg1) == INTEGER_CST
>   && 0 != (tem = extract_muldiv (op0, arg1, code, NULL_TREE,
>  &strict_overflow_p)))
> {
>   if (strict_overflow_p)
> fold_overflow_warning (("assuming signed overflow does not "
> "occur when simplifying "
> "multiplication"),
>WARN_STRICT_OVERFLOW_MISC);
> Index: gcc/genmatch.c
> ===
> --- gcc/genmatch.c  (revision 235411)
> +++ gcc/genmatch.c  (working copy)
> @@ -3549,21 +3549,21 @@ decision_tree::gen (FILE *f, bool gimple
>
>  /* Output code to implement the predicate P from the decision tree DT.  */
>
>  void
>  write_predicate (FILE *f, predicate_id *p, decision_tree &dt, bool gimple)
>  {
>fprintf (f, "\nbool\n"
>"%s%s (tree t%s%s)\n"
>"{\n", gimple ? "gimple_" : "tree_", p->id,
>p->nargs > 0 ? ", tree *res_ops" : "",
> -  gimple ? ", tree (*valueize)(tree)" : "");
> +  gimple ? ", tree (*valueize)(tree) ATTRIBUTE_UNUSED" : "");
>/* Conveniently make 'type' available.  */
>fprin

Re: [PATCH GCC]Cleanup tree ifcvt by renaming any_mask_load_store.

2016-04-26 Thread Richard Biener
On Mon, Apr 25, 2016 at 6:53 PM, Bin Cheng  wrote:
> Hi,
> This is a simple patch for tree ifcvt.  It renames variable 
> any_mask_load_store to any_pred_load_store, as well as makes the variable 
> visible in file scope.  First rationale is name of that variable is confusing 
> with masked load store.  In fact, it also covers cases in which data race 
> store is introduced, and that's not masked load store at all.  From the point 
> of view of the variable's def/use, it's clear the variable indicates we 
> introduces new load/store during if-conversion that needs to be predicated by 
> some conditions.  The second rationale is the variable records a global flag 
> information and is used in many places.  Together with patch at 
> https://gcc.gnu.org/ml/gcc-patches/2016-04/msg01395.html, this patch resolves 
> ambiguity of the variable and is good for next patch fixing PR56541
>
> Bootstrap and test on x86_64 and AArch64, is it OK?

Ok.

Richard.

> Thanks,
> bin
>
> 2016-04-22  Bin Cheng  
>
> * tree-if-conv.c (any_pred_load_store): New static variable.
> (if_convertible_gimple_assign_stmt_p): Remove parameter.  Use
> any_pred_load_store instead of and_mask_load_store.
> (if_convertible_stmt_p, if_convertible_loop_p_1): Ditto.
> (if_convertible_loop_p, insert_gimplified_predicates): Ditto.
> (combine_blocks, tree_if_conversion): Ditto.


Re: [PATCH GCC]Improve tree ifconv by handling virtual PHIs which can be degenerated.

2016-04-26 Thread Richard Biener
On Mon, Apr 25, 2016 at 6:44 PM, Bin.Cheng  wrote:
> On Fri, Apr 22, 2016 at 11:47 AM, Richard Biener
>  wrote:
>> On Fri, Apr 22, 2016 at 12:33 PM, Bin.Cheng  wrote:
>>> On Fri, Apr 22, 2016 at 11:25 AM, Richard Biener
>>>  wrote:
 On Fri, Apr 22, 2016 at 12:07 PM, Bin Cheng  wrote:
> Hi,
> Tree if-conv has below code checking on virtual PHI nodes in 
> if_convertible__phi_p:
>
>   if (any_mask_load_store)
> return true;
>
>   /* When there were no if-convertible stores, check
>  that there are no memory writes in the branches of the loop to be
>  if-converted.  */
>   if (virtual_operand_p (gimple_phi_result (phi)))
> {
>   imm_use_iterator imm_iter;
>   use_operand_p use_p;
>
>   if (bb != loop->header)
> {
>   if (dump_file && (dump_flags & TDF_DETAILS))
> fprintf (dump_file, "Virtual phi not on loop->header.\n");
>   return false;
> }
>
>   FOR_EACH_IMM_USE_FAST (use_p, imm_iter, gimple_phi_result (phi))
> {
>   if (gimple_code (USE_STMT (use_p)) == GIMPLE_PHI
>   && USE_STMT (use_p) != phi)
> {
>   if (dump_file && (dump_flags & TDF_DETAILS))
> fprintf (dump_file, "Difficult to handle this virtual 
> phi.\n");
>   return false;
> }
> }
> }
>
> After investigation, I think it's to bypass code in the form of:
>
> 
>   .MEM_2232 = PHI <.MEM_574(179), .MEM_1247(183)> //    ...
>   if (cond)
> goto 
>   else
> goto 
>
> :  //empty
> :
>   .MEM_1247 = PHI <.MEM_2232(180), .MEM_2232(181)> //    if (cond2)
> goto 
>   else
> goto 
>
> :
>   goto 
>
> Here PHI_2 can be degenerated and deleted.  Furthermore, after 
> propagating .MEM_2232 to .MEM_1247's uses, PHI_1 can also be degenerated 
> and deleted in this case.  These cases are bypassed because tree if-conv 
> doesn't handle virtual PHI nodes during loop conversion (it only 
> predicates scalar PHI nodes).  Of course this results in loops not 
> converted, and not vectorized.
> This patch firstly deletes the aforementioned checking code, then adds 
> code handling such virtual PHIs during conversion.  The use of 
> `any_mask_load_store' now is less ambiguous with this change, which 
> allows further cleanups and patches fixing PR56541.
> BTW, I think the newly fix at PR70725 on PHI nodes with only one argument 
> is a special case covered by this change too.  Unfortunately I can't use 
> replace_uses_by because I need to handle PHIs at use point after 
> replacing too.  This doesn't really matter since we only care about 
> virtual PHI, it's not possible to be used by anything other than IR 
> itself.
> Bootstrap and test on x86_64 and AArch64, is it OK if no regressions?

 Doesn't this undo my fix for degenerate non-virtual PHIs?
>>> No, since we already support degenerate non-virtual PHIs in
>>> predicate_scalar_phi, your fix is also for virtual PHIs handling.
>>
>> Was it?  I don't remember ;)  I think it was for a non-virtual PHI.
>> Anyway, you should
>> see the PR70725 testcase fail again if not.
>>

 I believe we can just drop virtual PHIs and rely on

   if (any_mask_load_store)
 {
   mark_virtual_operands_for_renaming (cfun);
   todo |= TODO_update_ssa_only_virtuals;
 }

 re-creating them from scratch.  To do better than that we'd simply
>>> I tried this, simply enable above code for all cases can't resolve
>>> verify_ssa issue.  I haven't look into the details, looks like ssa
>>> def-use chain is corrupted in if-conversion if we don't process it
>>> explicitly.  Maybe it's possible along with your below suggestions,
>>> but we need to handle uses outside of loop too.
>>
>> Yes.  I don't like all the new code to deal with virtual PHIs when doing
>> it correctly would also avoid the above virtual SSA update ...
>>
>> After all the above seems to work for the case of if-converted stores
>> (which is where virtual PHIs appear as well, even not degenerate).
>> So I don't see exactly how it would break in the other case.  I suppose
>> you may need to call mark_virtual_phi_result_for_renaming () on
>> all virtual PHIs.
>>
> Hi Richard,
> Here is the updated patch.  It also fixes PR70771 & PR70775. Root
> cause for the ICE is in the fix to PR70725 because it forgot to
> release single-argument PHI nodes after replacing uses.  In
> combine_blocks, these PHIs are removed from basic block but are still
> live in IR.  As a result, the ssa def/use chain for these PHIs are in
> broken state, thus ICE is triggered whenever ssa use list is
> accessed..
> In this updated patch, I made

Re: match.pd patch: min(-x, -y), min(~x, ~y)

2016-04-26 Thread Richard Biener
On Mon, Apr 25, 2016 at 4:25 PM, Kyrill Tkachov
 wrote:
>
> On 22/04/16 12:20, Kyrill Tkachov wrote:
>>
>>
>> On 22/04/16 11:34, Marc Glisse wrote:
>>>
>>> On Fri, 22 Apr 2016, Kyrill Tkachov wrote:
>>>

 On 22/04/16 10:43, Kyrill Tkachov wrote:
>
>
> On 22/04/16 10:42, Marc Glisse wrote:
>>
>> On Fri, 22 Apr 2016, Kyrill Tkachov wrote:
>>
 2016-04-21  Marc Glisse 

 gcc/
 * match.pd (min(-x, -y), max(-x, -y), min(~x, ~y), max(~x, ~y)):
 New transformations.

 gcc/testsuite/
 * gcc.dg/tree-ssa/minmax-2.c: New testcase.


>>>
>>> I see the new testcase failing on aarch64:
>>> FAIL: gcc.dg/tree-ssa/minmax-2.c scan-tree-dump optimized
>>> "__builtin_fmin"
>>
>>
>> Strange, it seems to work in
>> https://gcc.gnu.org/ml/gcc-testresults/2016-04/msg02120.html
>>
>> Is that on some freestanding kind of setup where the builtin might be
>> disabled?
>>
>
> Ah, this is aarch64-none-elf which uses newlib as the C library.
> Let me check on aarch64-none-linux-gnu and get back to you.
>

 Yeah, I see it passing on aarch64-none-linux-gnu.
 Do we have an appropriate effective target check to gate this test on?
>>>
>>>
>>> I don't know, I have a hard time finding something related. I am not even
>>> convinced the test should be skipped. It looks like __builtin_fmax was
>>> recognized, otherwise you would get a warning and a conversion int-double.
>>> Maybe gimple_call_combined_fn rejects it? Ah, builtins.def declares it with
>>> DEF_C99_BUILTIN, which checks targetm.libc_has_function (function_c99_misc).
>>> I assume newlib fails that check? That would make c99_runtime a relevant
>>> target check.
>>>
>>
>> Yeah, adding the below makes this test UNSUPPORTED on aarch64-none-elf.
>> /* { dg-add-options c99_runtime } */
>> /* { dg-require-effective-target c99_runtime } */
>>
>> I'll prepare a patch.
>>
>
> Sorry for the delay, here it is.
> Ok to commit?

Ok.

Richard.

> Thanks,
> Kyrill
>
> 2016-04-25  Kyrylo Tkachov  
>
> * gcc.dg/tree-ssa/minmax-2.c: Require c99_runtime and add the
> associated options.
>
>> Thanks,
>> Kyrill
>
>


Re: [RFC] introduce --param max-lto-partition for having an upper bound on partition size

2016-04-26 Thread Richard Biener
On Mon, 25 Apr 2016, Prathamesh Kulkarni wrote:

> On 6 April 2016 at 14:54, Richard Biener  wrote:
> > On Wed, 6 Apr 2016, Richard Biener wrote:
> >
> >> On Wed, 6 Apr 2016, Prathamesh Kulkarni wrote:
> >>
> >> > On 6 April 2016 at 13:44, Richard Biener  wrote:
> >> > > On Wed, 6 Apr 2016, Prathamesh Kulkarni wrote:
> >> > >
> >> > >> On 5 April 2016 at 18:28, Richard Biener  wrote:
> >> > >> > On Tue, 5 Apr 2016, Prathamesh Kulkarni wrote:
> >> > >> >
> >> > >> >> On 5 April 2016 at 16:58, Richard Biener  wrote:
> >> > >> >> > On Tue, 5 Apr 2016, Prathamesh Kulkarni wrote:
> >> > >> >> >
> >> > >> >> >> On 4 April 2016 at 19:44, Jan Hubicka  wrote:
> >> > >> >> >> >
> >> > >> >> >> >> diff --git a/gcc/lto/lto-partition.c 
> >> > >> >> >> >> b/gcc/lto/lto-partition.c
> >> > >> >> >> >> index 9eb63c2..bc0c612 100644
> >> > >> >> >> >> --- a/gcc/lto/lto-partition.c
> >> > >> >> >> >> +++ b/gcc/lto/lto-partition.c
> >> > >> >> >> >> @@ -511,9 +511,20 @@ lto_balanced_map (int n_lto_partitions)
> >> > >> >> >> >>varpool_order.qsort (varpool_node_cmp);
> >> > >> >> >> >>
> >> > >> >> >> >>/* Compute partition size and create the first partition. 
> >> > >> >> >> >>  */
> >> > >> >> >> >> +  if (PARAM_VALUE (MIN_PARTITION_SIZE) > PARAM_VALUE 
> >> > >> >> >> >> (MAX_PARTITION_SIZE))
> >> > >> >> >> >> +fatal_error (input_location, "min partition size cannot 
> >> > >> >> >> >> be greater than max partition size");
> >> > >> >> >> >> +
> >> > >> >> >> >>partition_size = total_size / n_lto_partitions;
> >> > >> >> >> >>if (partition_size < PARAM_VALUE (MIN_PARTITION_SIZE))
> >> > >> >> >> >>  partition_size = PARAM_VALUE (MIN_PARTITION_SIZE);
> >> > >> >> >> >> +  else if (partition_size > PARAM_VALUE 
> >> > >> >> >> >> (MAX_PARTITION_SIZE))
> >> > >> >> >> >> +{
> >> > >> >> >> >> +  n_lto_partitions = total_size / PARAM_VALUE 
> >> > >> >> >> >> (MAX_PARTITION_SIZE);
> >> > >> >> >> >> +  if (total_size % PARAM_VALUE (MAX_PARTITION_SIZE))
> >> > >> >> >> >> + n_lto_partitions++;
> >> > >> >> >> >> +  partition_size = total_size / n_lto_partitions;
> >> > >> >> >> >> +}
> >> > >> >> >> >
> >> > >> >> >> > lto_balanced_map actually works in a way that looks for 
> >> > >> >> >> > cheapest cutpoint in range
> >> > >> >> >> > 3/4*parittion_size to 2*partition_size and picks the cheapest 
> >> > >> >> >> > range.
> >> > >> >> >> > Setting partition_size to this value will thus not cause 
> >> > >> >> >> > partitioner to produce smaller
> >> > >> >> >> > partitions only.  I suppose modify the conditional:
> >> > >> >> >> >
> >> > >> >> >> >   /* Partition is too large, unwind into step when best 
> >> > >> >> >> > cost was reached and
> >> > >> >> >> >  start new partition.  */
> >> > >> >> >> >   if (partition->insns > 2 * partition_size)
> >> > >> >> >> >
> >> > >> >> >> > and/or in the code above set the partition_size to half of 
> >> > >> >> >> > total_size/max_size.
> >> > >> >> >> >
> >> > >> >> >> > I know this is somewhat sloppy.  This was really just first 
> >> > >> >> >> > cut implementation
> >> > >> >> >> > many years ago. I expected to reimplement it marter soon, but 
> >> > >> >> >> > then there was
> >> > >> >> >> > never really a need for it (I am trying to avoid late IPA 
> >> > >> >> >> > optimizations so the
> >> > >> >> >> > partitioning decisions should mostly affect compile time 
> >> > >> >> >> > performance only).
> >> > >> >> >> > If ARM is more sensitive for partitining, perhaps it would 
> >> > >> >> >> > make sense to try to
> >> > >> >> >> > look for something smarter.
> >> > >> >> >> >
> >> > >> >> >> >> +
> >> > >> >> >> >>npartitions = 1;
> >> > >> >> >> >>partition = new_partition ("");
> >> > >> >> >> >>if (symtab->dump_file)
> >> > >> >> >> >> diff --git a/gcc/lto/lto.c b/gcc/lto/lto.c
> >> > >> >> >> >> index 9dd513f..294b8a4 100644
> >> > >> >> >> >> --- a/gcc/lto/lto.c
> >> > >> >> >> >> +++ b/gcc/lto/lto.c
> >> > >> >> >> >> @@ -3112,6 +3112,12 @@ do_whole_program_analysis (void)
> >> > >> >> >> >>timevar_pop (TV_WHOPR_WPA);
> >> > >> >> >> >>
> >> > >> >> >> >>timevar_push (TV_WHOPR_PARTITIONING);
> >> > >> >> >> >> +
> >> > >> >> >> >> +  if (flag_lto_partition != LTO_PARTITION_BALANCED
> >> > >> >> >> >> +  && PARAM_VALUE (MAX_PARTITION_SIZE) != INT_MAX)
> >> > >> >> >> >> +fatal_error (input_location, "--param max-lto-partition 
> >> > >> >> >> >> should only"
> >> > >> >> >> >> +  " be used with balanced partitioning\n");
> >> > >> >> >> >> +
> >> > >> >> >> >
> >> > >> >> >> > I think we should wire in resonable MAX_PARTITION_SIZE 
> >> > >> >> >> > default.  THe value you
> >> > >> >> >> > found experimentally may be a good start. For that reason we 
> >> > >> >> >> > can't really
> >> > >> >> >> > refuse a value when !LTO_PARTITION_BALANCED.  Just document 
> >> > >> >> >> > it as parameter for
> >> > >> >> >> > balanced partitioning only and add a parameter to 
> >>

Re: match.pd: unsigned A - B > A --> A < B

2016-04-26 Thread Richard Biener
On Sun, Apr 24, 2016 at 7:42 PM, Marc Glisse  wrote:
> Hello,
>
> the first part is something that was discussed last stage3, and Jakub argued
> in favor of single_use. The second part is probably less useful, it notices
> that if we manually check for overflow using the result of IFN_*_OVERFLOW,
> then we might as well read that information from the result of that
> function.
>
> Bootstrap+regtest on powerpc64le-unknown-linux-gnu. (hmm, I probably should
> have done it on x86_64 instead, I don't know if the ppc backend has
> implemented the overflow functions recently)

Ok.  Can you please place the match.pd rules adjacent to the other comparison
simplifications rather than at the end?

Thanks,
Richard.

> 2016-04-25  Marc Glisse  
>
> gcc/
> * match.pd (A - B > A, A + B < A): New transformations.
>
> gcc/testsuite/
> * gcc.dg/tree-ssa/overflow-2.c: New testcase.
> * gcc.dg/tree-ssa/minus-ovf.c: Likewise.
>
> --
> Marc Glisse
> Index: trunk-ovf2/gcc/match.pd
> ===
> --- trunk-ovf2/gcc/match.pd (revision 235371)
> +++ trunk-ovf2/gcc/match.pd (working copy)
> @@ -3071,10 +3071,60 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>  (simplify
>   /* signbit(x) -> 0 if x is nonnegative.  */
>   (SIGNBIT tree_expr_nonnegative_p@0)
>   { integer_zero_node; })
>
>  (simplify
>   /* signbit(x) -> x<0 if x doesn't have signed zeros.  */
>   (SIGNBIT @0)
>   (if (!HONOR_SIGNED_ZEROS (@0))
>(convert (lt @0 { build_real (TREE_TYPE (@0), dconst0); }
> +
> +/* To detect overflow in unsigned A - B, A < B is simpler than A - B > A.
> +   However, the detection logic for SUB_OVERFLOW in tree-ssa-math-opts.c
> +   expects the long form, so we restrict the transformation for now.  */
> +(for cmp (gt le)
> + (simplify
> +  (cmp (minus@2 @0 @1) @0)
> +  (if (single_use (@2)
> +   && ANY_INTEGRAL_TYPE_P (TREE_TYPE (@0))
> +   && TYPE_UNSIGNED (TREE_TYPE (@0))
> +   && TYPE_OVERFLOW_WRAPS (TREE_TYPE (@0)))
> +   (cmp @1 @0
> +(for cmp (lt ge)
> + (simplify
> +  (cmp @0 (minus@2 @0 @1))
> +  (if (single_use (@2)
> +   && ANY_INTEGRAL_TYPE_P (TREE_TYPE (@0))
> +   && TYPE_UNSIGNED (TREE_TYPE (@0))
> +   && TYPE_OVERFLOW_WRAPS (TREE_TYPE (@0)))
> +   (cmp @0 @1
> +
> +/* Testing for overflow is unnecessary if we already know the result.  */
> +/* A < A - B  */
> +(for cmp (lt ge)
> + out (ne eq)
> + (simplify
> +  (cmp @0 (realpart (IFN_SUB_OVERFLOW@2 @0 @1)))
> +  (if (TYPE_UNSIGNED (TREE_TYPE (@0)))
> +   (out (imagpart @2) { build_zero_cst (TREE_TYPE (@0)); }
> +/* A - B > A  */
> +(for cmp (gt le)
> + out (ne eq)
> + (simplify
> +  (cmp (realpart (IFN_SUB_OVERFLOW@2 @0 @1)) @0)
> +  (if (TYPE_UNSIGNED (TREE_TYPE (@0)))
> +   (out (imagpart @2) { build_zero_cst (TREE_TYPE (@0)); }
> +/* A + B < A  */
> +(for cmp (lt ge)
> + out (ne eq)
> + (simplify
> +  (cmp (realpart (IFN_ADD_OVERFLOW:c@2 @0 @1)) @0)
> +  (if (TYPE_UNSIGNED (TREE_TYPE (@0)))
> +   (out (imagpart @2) { build_zero_cst (TREE_TYPE (@0)); }
> +/* A > A + B  */
> +(for cmp (gt le)
> + out (ne eq)
> + (simplify
> +  (cmp @0 (realpart (IFN_ADD_OVERFLOW:c@2 @0 @1)))
> +  (if (TYPE_UNSIGNED (TREE_TYPE (@0)))
> +   (out (imagpart @2) { build_zero_cst (TREE_TYPE (@0)); }
> Index: trunk-ovf2/gcc/testsuite/gcc.dg/tree-ssa/minus-ovf.c
> ===
> --- trunk-ovf2/gcc/testsuite/gcc.dg/tree-ssa/minus-ovf.c(revision 0)
> +++ trunk-ovf2/gcc/testsuite/gcc.dg/tree-ssa/minus-ovf.c(working
> copy)
> @@ -0,0 +1,24 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O -fdump-tree-optimized" } */
> +
> +int f(unsigned a, unsigned b) {
> +  unsigned remove = a - b;
> +  return remove > a;
> +}
> +
> +int g(unsigned a, unsigned b) {
> +  unsigned remove = a - b;
> +  return remove <= a;
> +}
> +
> +int h(unsigned a, unsigned b) {
> +  unsigned remove = a - b;
> +  return a < remove;
> +}
> +
> +int i(unsigned a, unsigned b) {
> +  unsigned remove = a - b;
> +  return a >= remove;
> +}
> +
> +/* { dg-final { scan-tree-dump-not "remove" "optimized" } } */
> Index: trunk-ovf2/gcc/testsuite/gcc.dg/tree-ssa/overflow-2.c
> ===
> --- trunk-ovf2/gcc/testsuite/gcc.dg/tree-ssa/overflow-2.c   (revision 0)
> +++ trunk-ovf2/gcc/testsuite/gcc.dg/tree-ssa/overflow-2.c   (working
> copy)
> @@ -0,0 +1,68 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-optimized-raw" } */
> +
> +int carry;
> +int f(unsigned a, unsigned b) {
> +  unsigned r;
> +  carry = __builtin_sub_overflow(a, b, &r);
> +  return r > a;
> +}
> +int g(unsigned a, unsigned b) {
> +  unsigned r;
> +  carry = __builtin_sub_overflow(a, b, &r);
> +  return a < r;
> +}
> +int h(unsigned a, unsigned b) {
> +  unsigned r;
> +  carry = __builtin_sub_overflow(a, b, &r);
> +  return r <= a;
> +}
> +int i(unsigned a, unsigned b) {
> 

Re: match.pd patch: u + 3 < u is u > UINT_MAX - 3

2016-04-26 Thread Richard Biener
On Sun, Apr 24, 2016 at 7:24 PM, Marc Glisse  wrote:
> On Fri, 22 Apr 2016, Marc Glisse wrote:
>
>> On Fri, 22 Apr 2016, Richard Biener wrote:
>>
>>> On Fri, Apr 22, 2016 at 5:29 AM, Marc Glisse 
>>> wrote:

 Hello,

 this optimizes a common pattern for unsigned overflow detection, when
 one of
 the arguments turns out to be a constant. There are more ways this could
 look like, (a + 42 <= 41) in particular, but that'll be for another
 patch.
>>>
>>>
>>> This case is also covered by fold_comparison which should be re-written
>>> to match.pd patterns (and removed from fold-const.c).
>>>
>>> fold_binary also as a few interesting/similar equality compare cases
>>> like X +- Y CMP X to Y CMP 0 which look related.
>>>
>>> Also your case is in fold_binary for the case of undefined overflow:
>>
>>
>> As far as I can tell, fold-const.c handles this kind of transformation
>> strictly in the case of undefined overflow (or floats), while this is
>> strictly in the case of unsigned with wrapping overflow. I thought it would
>> be more readable to take advantage of the genmatch machinery and group the
>> wrapping transforms in one place, and the undefined overflow ones in another
>> place (they don't group the same way by operator, etc).
>>
>> If you prefer to group by pattern shape and port the related fold-const.c
>> bit at the same time, I could try that...
>>
>>> +/* When one argument is a constant, overflow detection can be
>>> simplified.
>>> +   Currently restricted to single use so as not to interfere too much
>>> with
>>> +   ADD_OVERFLOW detection in tree-ssa-math-opts.c.  */
>>> +(for cmp (lt le ge gt)
>>> + out (gt gt le le)
>>> + (simplify
>>> +  (cmp (plus@2 @0 integer_nonzerop@1) @0)
>>> +  (if (TYPE_UNSIGNED (TREE_TYPE (@0))
>>> +   && TYPE_OVERFLOW_WRAPS (TREE_TYPE (@0))
>>> +   && TYPE_MAX_VALUE (TREE_TYPE (@0))
>>> +   && single_use (@2))
>>> +   (out @0 (minus { TYPE_MAX_VALUE (TREE_TYPE (@0)); } @1)
>>> +(for cmp (gt ge le lt)
>>> + out (gt gt le le)
>>> + (simplify
>>> +  (cmp @0 (plus@2 @0 integer_nonzerop@1))
>>> +  (if (TYPE_UNSIGNED (TREE_TYPE (@0))
>>> +   && TYPE_OVERFLOW_WRAPS (TREE_TYPE (@0))
>>> +   && TYPE_MAX_VALUE (TREE_TYPE (@0))
>>> +   && single_use (@2))
>>> +   (out @0 (minus { TYPE_MAX_VALUE (TREE_TYPE (@0)); } @1)
>>>
>>> please add a comment with the actual transform - A + CST CMP A -> A CMP'
>>> CST'
>>>
>>> As we are relying on twos-complement wrapping you shouldn't need
>>> TYPE_MAX_VALUE
>>> here but you can use wi::max_value (precision, sign).  I'm not sure we
>>> have sensible
>>> TYPE_MAX_VALUE for vector or complex types - the accessor uses
>>> NUMERICAL_TYPE_CKECK
>>> and TYPE_OVERFLOW_WRAPS checks for ANY_INTEGRAL_TYPE.  Thus I wonder
>>> if we should restrict this to INTEGRAL_TYPE_P (making the
>>> wi::max_value route valid).
>>
>>
>> integer_nonzerop currently already restricts to INTEGER_CST or
>> COMPLEX_CST, and I don't think complex can appear in a comparison. I'll go
>> back to writing the more explicit INTEGER_CST in the pattern and I'll use
>> wide_int.
>
>
> Better this way?
>
> By the way, it would be cool to be able to write:
> (lt:c @0 @1)
>
> which would expand to both
> (lt @0 @1)
> (gt @1 @0)
>
> (as per swap_tree_comparison or swapped_tcc_comparison)

Yeah, I know...  I was hesitant to overload :c with "slightly" different
semantics though.

I can give it a shot though - it would avoid quite some repetition I guess.

Ok.

Thanks,
Richard.

> --
> Marc Glisse
> Index: trunk-ovf/gcc/match.pd
> ===
> --- trunk-ovf/gcc/match.pd  (revision 235371)
> +++ trunk-ovf/gcc/match.pd  (working copy)
> @@ -3071,10 +3071,36 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>  (simplify
>   /* signbit(x) -> 0 if x is nonnegative.  */
>   (SIGNBIT tree_expr_nonnegative_p@0)
>   { integer_zero_node; })
>
>  (simplify
>   /* signbit(x) -> x<0 if x doesn't have signed zeros.  */
>   (SIGNBIT @0)
>   (if (!HONOR_SIGNED_ZEROS (@0))
>(convert (lt @0 { build_real (TREE_TYPE (@0), dconst0); }
> +
> +/* When one argument is a constant, overflow detection can be simplified.
> +   Currently restricted to single use so as not to interfere too much with
> +   ADD_OVERFLOW detection in tree-ssa-math-opts.c.
> +   A + CST CMP A  ->  A CMP' CST' */
> +(for cmp (lt le ge gt)
> + out (gt gt le le)
> + (simplify
> +  (cmp (plus@2 @0 INTEGER_CST@1) @0)
> +  (if (TYPE_UNSIGNED (TREE_TYPE (@0))
> +   && TYPE_OVERFLOW_WRAPS (TREE_TYPE (@0))
> +   && wi::ne_p (@1, 0)
> +   && single_use (@2))
> +   (out @0 { wide_int_to_tree (TREE_TYPE (@0), wi::max_value
> +  (TYPE_PRECISION (TREE_TYPE (@0)), UNSIGNED) - @1); }
> +/* A CMP A + CST  ->  A CMP' CST' */
> +(for cmp (gt ge le lt)
> + out (gt gt le le)
> + (simplify
> +  (cmp @0 (plus@2 @0 INTEGER_CST@1))
> +  (if (TYPE_UNSIGNED (TREE_TYPE (@0))
> +   && TYPE_OVERFLO

[C PATCH] Follow-up fix to the misclassified token problem (PR c/67784)

2016-04-26 Thread Marek Polacek
This PR was reopened, because the exact same problem with treating a TYPENAME
wrongly as an ID was found when using just if-clause, without an enclosing for
loop.  More details: .
That fix had a follow-up, because it broke some ObjC code.

To fix this, we need to use the (amended) token reclassification even when
parsing an if statement.  I factored the code into a separate function so as to
not repeat the very same code.

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2016-04-26  Marek Polacek  

PR c/67784
* c-parser.c (c_parser_maybe_reclassify_token): New function factored
out of ...
(c_parser_for_statement): ... here.
(c_parser_if_statement): Use it.

* gcc.dg/pr67784-3.c: New test.

diff --git gcc/c/c-parser.c gcc/c/c-parser.c
index bdd669d..720ce75 100644
--- gcc/c/c-parser.c
+++ gcc/c/c-parser.c
@@ -5425,6 +5425,43 @@ c_parser_else_body (c_parser *parser, const 
token_indent_info &else_tinfo,
   return c_end_compound_stmt (body_loc, block, flag_isoc99);
 }
 
+/* We might need to reclassify any previously-lexed identifier, e.g.
+   when we've left a for loop with an if-statement without else in the
+   body - we might have used a wrong scope for the token.  See PR67784.  */
+
+static void
+c_parser_maybe_reclassify_token (c_parser *parser)
+{
+  if (c_parser_next_token_is (parser, CPP_NAME))
+{
+  c_token *token = c_parser_peek_token (parser);
+
+  if (token->id_kind != C_ID_CLASSNAME)
+   {
+ tree decl = lookup_name (token->value);
+
+ token->id_kind = C_ID_ID;
+ if (decl)
+   {
+ if (TREE_CODE (decl) == TYPE_DECL)
+   token->id_kind = C_ID_TYPENAME;
+   }
+ else if (c_dialect_objc ())
+   {
+ tree objc_interface_decl = objc_is_class_name (token->value);
+ /* Objective-C class names are in the same namespace as
+variables and typedefs, and hence are shadowed by local
+declarations.  */
+ if (objc_interface_decl)
+   {
+ token->value = objc_interface_decl;
+ token->id_kind = C_ID_CLASSNAME;
+   }
+   }
+   }
+}
+}
+
 /* Parse an if statement (C90 6.6.4, C99 6.8.4).
 
if-statement:
@@ -5523,6 +5560,8 @@ c_parser_if_statement (c_parser *parser, bool *if_p, 
vec *chain)
   if (flag_cilkplus && contains_array_notation_expr (if_stmt))
 if_stmt = fix_conditional_array_notations (if_stmt);
   add_stmt (if_stmt);
+
+  c_parser_maybe_reclassify_token (parser);
 }
 
 /* Parse a switch statement (C90 6.6.4, C99 6.8.4).
@@ -5917,37 +5956,7 @@ c_parser_for_statement (c_parser *parser, bool ivdep, 
bool *if_p)
 c_finish_loop (loc, cond, incr, body, c_break_label, c_cont_label, true);
   add_stmt (c_end_compound_stmt (loc, block, flag_isoc99 || c_dialect_objc 
()));
 
-  /* We might need to reclassify any previously-lexed identifier, e.g.
- when we've left a for loop with an if-statement without else in the
- body - we might have used a wrong scope for the token.  See PR67784.  */
-  if (c_parser_next_token_is (parser, CPP_NAME))
-{
-  c_token *token = c_parser_peek_token (parser);
-
-  if (token->id_kind != C_ID_CLASSNAME)
-   {
- tree decl = lookup_name (token->value);
-
- token->id_kind = C_ID_ID;
- if (decl)
-   {
- if (TREE_CODE (decl) == TYPE_DECL)
-   token->id_kind = C_ID_TYPENAME;
-   }
- else if (c_dialect_objc ())
-   {
- tree objc_interface_decl = objc_is_class_name (token->value);
- /* Objective-C class names are in the same namespace as
-variables and typedefs, and hence are shadowed by local
-declarations.  */
- if (objc_interface_decl)
-   {
- token->value = objc_interface_decl;
- token->id_kind = C_ID_CLASSNAME;
-   }
-   }
-   }
-}
+  c_parser_maybe_reclassify_token (parser);
 
   token_indent_info next_tinfo
 = get_token_indent_info (c_parser_peek_token (parser));
diff --git gcc/testsuite/gcc.dg/pr67784-3.c gcc/testsuite/gcc.dg/pr67784-3.c
index e69de29..45e3c44 100644
--- gcc/testsuite/gcc.dg/pr67784-3.c
+++ gcc/testsuite/gcc.dg/pr67784-3.c
@@ -0,0 +1,50 @@
+/* PR c/67784 */
+/* { dg-do compile } */
+/* { dg-options "" } */
+
+typedef int T;
+
+void
+fn1 (void)
+{
+  if (sizeof (enum { T }) == 0)
+;
+  T x;
+}
+
+void
+fn2 (void)
+{
+  int i = 0;
+  if (sizeof (enum { T }) == 0)
+i++;
+  T x;
+}
+
+void
+fn3 (void)
+{
+  if (sizeof (enum { T }) == 0)
+{
+}
+  T x;
+}
+
+void
+fn4 (void)
+{
+  if (sizeof (enum { T }) == 0)
+L:
+;
+  T x;
+}
+
+void
+fn5 (void)
+{
+  if (sizeof (enum { T }) == 0)
+;
+  else
+;
+  T x;
+}

Marek


[PATCH] [ARC] Fix emitting jump tables for ARCv2

2016-04-26 Thread Claudiu Zissulescu
The compact casesi option only make sens for ARCv1 cores. For ARCv2 cores we
use the regular expansion.

OK to apply?
Claudiu

gcc/
2016-04-26  Claudiu Zissulescu  

* common/config/arc/arc-common.c (arc_option_optimization_table):
Disable compact casesi as default option.
* config/arc/arc.c (arc_override_options): Enable compact casesi
option for non-ARCv2 cores.
* config/arc/arc.md (movsi_insn): Use @pcl relocation.
(movsi_ne): Update assembly printing pattern.
(casesi_load): Use short ld instruction.
---
 gcc/common/config/arc/arc-common.c |  1 -
 gcc/config/arc/arc.c   |  7 +++
 gcc/config/arc/arc.md  | 11 +++
 3 files changed, 14 insertions(+), 5 deletions(-)

diff --git a/gcc/common/config/arc/arc-common.c 
b/gcc/common/config/arc/arc-common.c
index 64fb053..17cc1bd 100644
--- a/gcc/common/config/arc/arc-common.c
+++ b/gcc/common/config/arc/arc-common.c
@@ -56,7 +56,6 @@ static const struct default_options 
arc_option_optimization_table[] =
 { OPT_LEVELS_ALL, OPT_mbbit_peephole, NULL, 1 },
 { OPT_LEVELS_SIZE, OPT_mq_class, NULL, 1 },
 { OPT_LEVELS_SIZE, OPT_mcase_vector_pcrel, NULL, 1 },
-{ OPT_LEVELS_SIZE, OPT_mcompact_casesi, NULL, 1 },
 { OPT_LEVELS_NONE, 0, NULL, 0 }
   };
 
diff --git a/gcc/config/arc/arc.c b/gcc/config/arc/arc.c
index 6f2136e..be55c99 100644
--- a/gcc/config/arc/arc.c
+++ b/gcc/config/arc/arc.c
@@ -812,6 +812,13 @@ arc_override_options (void)
   if (arc_size_opt_level == 3)
 optimize_size = 1;
 
+  /* Compact casesi is not a valid option for ARCv2 family, disable
+ it.  */
+  if (TARGET_V2)
+TARGET_COMPACT_CASESI = 0;
+  else if (optimize_size == 1)
+TARGET_COMPACT_CASESI = 1;
+
   if (flag_pic)
 target_flags |= MASK_NO_SDATA_SET;
 
diff --git a/gcc/config/arc/arc.md b/gcc/config/arc/arc.md
index 718443b..aec4b37 100644
--- a/gcc/config/arc/arc.md
+++ b/gcc/config/arc/arc.md
@@ -713,7 +713,7 @@
ror %0,((%1*2+1) & 0x3f) ;6
mov%? %0,%1 ;7
add %0,%S1  ;8
-   * return arc_get_unalign () ? \"add %0,pcl,%1-.+2\" : \"add %0,pcl,%1-.\";
+   add %0,pcl,%1@pcl
mov%? %0,%S1%&  ;10
mov%? %0,%S1;11
ld%?%U1 %0,%1%& ;12
@@ -3467,8 +3467,8 @@
   ""
   "@
* current_insn_predicate = 0; return \"sub%?.ne %0,%0,%0%&\";
-mov_s.ne %0,%1
-mov_s.ne %0,%1
+* current_insn_predicate = 0; return \"mov%?.ne %0,%1\";
+* current_insn_predicate = 0; return \"mov%?.ne %0,%1\";
mov.ne %0,%1
mov.ne %0,%S1"
   [(set_attr "type" "cmove")
@@ -3777,7 +3777,10 @@
   switch (GET_MODE (diff_vec))
 {
 case SImode:
-  return \"ld.as %0,[%1,%2]%&\";
+  if ((which_alternative == 0) && TARGET_CODE_DENSITY)
+   return \"ld_s.as %0,[%1,%2]%&\";
+  else
+   return \"ld.as %0,[%1,%2]%&\";
 case HImode:
   if (ADDR_DIFF_VEC_FLAGS (diff_vec).offset_unsigned)
return \"ld%_.as %0,[%1,%2]\";
-- 
1.9.1



RE: PING [PATCH] [ARC] Add SIMD extensions for ARC HS

2016-04-26 Thread Claudiu Zissulescu
PING

> -Original Message-
> From: Claudiu Zissulescu
> Sent: Friday, April 08, 2016 10:31 AM
> To: gcc-patches@gcc.gnu.org
> Cc: Claudiu Zissulescu; g...@amylaar.uk; francois.bed...@synopsys.com;
> jeremy.benn...@embecosm.com
> Subject: [PATCH] [ARC] Add SIMD extensions for ARC HS
> 
> This patch adds support for the new SIMD operations added to ARC HS
> cpu class. The proposed patch doesn't chase for performance but offers
> support for those newly added operations, and autovectorization.
> 
> The patch is tested using dg.exp, compile.exp, and execute.exp for
> both arc700 and archs with and without SIMD support enabled.
> 
> OK to apply?
> Claudiu
> 
> gcc/
> 2016-03-14  Claudiu Zissulescu  
> 
>   * config/arc/arc.c (arc_vector_mode_supported_p): Add support for
>   the new ARC HS SIMD instructions.
>   (arc_preferred_simd_mode): New function.
>   (arc_autovectorize_vector_sizes): Likewise.
>   (TARGET_VECTORIZE_PREFERRED_SIMD_MODE)
>   (TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_SIZES): Define.
>   (arc_init_reg_tables): Accept new ARC HS SIMD modes.
>   (arc_init_builtins): Add new SIMD builtin types.
>   (arc_split_move): Handle 64 bit vector moves.
>   * config/arc/arc.h (TARGET_PLUS_DMPY, TARGET_PLUS_MACD)
>   (TARGET_PLUS_QMACW): Define.
>   * config/arc/builtins.def (QMACH, QMACHU, QMPYH, QMPYHU,
> DMACH)
>   (DMACHU, DMPYH, DMPYHU, DMACWH, DMACWHU, VMAC2H,
> VMAC2HU, VMPY2H)
>   (VMPY2HU, VADDSUB2H, VSUBADD2H, VADDSUB, VSUBADD,
> VADDSUB4H)
>   (VSUBADD4H): New builtins.
>   * config/arc/simdext.md: Add new ARC HS SIMD instructions.
>   * testsuite/gcc.target/arc/builtin_simdarc.c: New file.
> ---
>  gcc/config/arc/arc.c   | 112 -
>  gcc/config/arc/arc.h   |   6 +
>  gcc/config/arc/builtins.def|  27 ++
>  gcc/config/arc/simdext.md  | 571 
> +
>  gcc/testsuite/gcc.target/arc/builtin_simdarc.c |  38 ++
>  5 files changed, 747 insertions(+), 7 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/arc/builtin_simdarc.c
> 
> diff --git a/gcc/config/arc/arc.c b/gcc/config/arc/arc.c
> index d60db50..d120946 100644
> --- a/gcc/config/arc/arc.c
> +++ b/gcc/config/arc/arc.c
> @@ -247,16 +247,47 @@ static bool arc_use_by_pieces_infrastructure_p
> (unsigned HOST_WIDE_INT,
>  static bool
>  arc_vector_mode_supported_p (machine_mode mode)
>  {
> -  if (!TARGET_SIMD_SET)
> -return false;
> +  switch (mode)
> +{
> +case V2HImode:
> +  return TARGET_PLUS_DMPY;
> +case V4HImode:
> +case V2SImode:
> +  return TARGET_PLUS_QMACW;
> +case V4SImode:
> +case V8HImode:
> +  return TARGET_SIMD_SET;
> 
> -  if ((mode == V4SImode)
> -  || (mode == V8HImode))
> -return true;
> +default:
> +  return false;
> +}
> +}
> 
> -  return false;
> +/* Implements target hook TARGET_VECTORIZE_PREFERRED_SIMD_MODE.
> */
> +
> +static enum machine_mode
> +arc_preferred_simd_mode (enum machine_mode mode)
> +{
> +  switch (mode)
> +{
> +case HImode:
> +  return TARGET_PLUS_QMACW ? V4HImode : V2HImode;
> +case SImode:
> +  return V2SImode;
> +
> +default:
> +  return word_mode;
> +}
>  }
> 
> +/* Implements target hook
> +   TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_SIZES.  */
> +
> +static unsigned int
> +arc_autovectorize_vector_sizes (void)
> +{
> +  return TARGET_PLUS_QMACW ? (8 | 4) : 0;
> +}
> 
>  /* TARGET_PRESERVE_RELOAD_P is still awaiting patch re-evaluation /
> review.  */
>  static bool arc_preserve_reload_p (rtx in) ATTRIBUTE_UNUSED;
> @@ -345,6 +376,12 @@ static void arc_finalize_pic (void);
>  #undef TARGET_VECTOR_MODE_SUPPORTED_P
>  #define TARGET_VECTOR_MODE_SUPPORTED_P
> arc_vector_mode_supported_p
> 
> +#undef TARGET_VECTORIZE_PREFERRED_SIMD_MODE
> +#define TARGET_VECTORIZE_PREFERRED_SIMD_MODE
> arc_preferred_simd_mode
> +
> +#undef TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_SIZES
> +#define TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_SIZES
> arc_autovectorize_vector_sizes
> +
>  #undef TARGET_CAN_USE_DOLOOP_P
>  #define TARGET_CAN_USE_DOLOOP_P arc_can_use_doloop_p
> 
> @@ -1214,7 +1251,12 @@ arc_init_reg_tables (void)
>   arc_mode_class[i] = 0;
> break;
>   case MODE_VECTOR_INT:
> -   arc_mode_class [i] = (1<< (int) V_MODE);
> +   if (GET_MODE_SIZE (m) == 4)
> + arc_mode_class[i] = (1 << (int) S_MODE);
> +   else if (GET_MODE_SIZE (m) == 8)
> + arc_mode_class[i] = (1 << (int) D_MODE);
> +   else
> + arc_mode_class[i] = (1 << (int) V_MODE);
> break;
>   case MODE_CC:
>   default:
> @@ -5277,6 +5319,15 @@ arc_builtin_decl (unsigned id, bool initialize_p
> ATTRIBUTE_UNUSED)
>  static void
>  arc_init_builtins (void)
>  {
> +  tree V4HI_type_node;
> +  tree V2SI_type_node;
> +  tree V2HI_type_node;
> +
> +  /* Vector types based on HS SIMD elements.  */
> +  V4HI_type_node = build_vec

Re: match.pd: x+x -> 2*x

2016-04-26 Thread Marc Glisse

On Tue, 26 Apr 2016, Richard Biener wrote:


Note that I think

/* (X /[ex] A) * A -> X.  */
(simplify
 (mult (convert? (exact_div @0 @1)) @1)
 /* Look through a sign-changing conversion.  */
 (convert @0))

has a bug as we use operand_equal_p for comparing @1 but
that treats equal but different typed INTEGER_CSTs as equal...

Thus this lacks the tree_nop_conversion_p check.


You seemed ok with removing that check last year
https://gcc.gnu.org/ml/gcc-patches/2015-05/msg01551.html

For this specific pattern, I think any conversion should work. Did you 
have a particular example in mind?


--
Marc Glisse


Re: [C PATCH] Follow-up fix to the misclassified token problem (PR c/67784)

2016-04-26 Thread Joseph Myers
On Tue, 26 Apr 2016, Marek Polacek wrote:

> This PR was reopened, because the exact same problem with treating a TYPENAME
> wrongly as an ID was found when using just if-clause, without an enclosing for
> loop.  More details: 
> .
> That fix had a follow-up, because it broke some ObjC code.
> 
> To fix this, we need to use the (amended) token reclassification even when
> parsing an if statement.  I factored the code into a separate function so as 
> to
> not repeat the very same code.
> 
> Bootstrapped/regtested on x86_64-linux, ok for trunk?

I think you need more thorough testcases, to cover the cases where the if 
statement with no else forms the body of a switch or while statement, and 
the declaration in question appears in the expression of that switch or 
while statement, e.g.

typedef int T;

switch (sizeof (enum { T }))
  if (1)
;
T x;

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: match.pd patch: u + 3 < u is u > UINT_MAX - 3

2016-04-26 Thread Richard Biener
On Tue, Apr 26, 2016 at 1:07 PM, Richard Biener
 wrote:
> On Sun, Apr 24, 2016 at 7:24 PM, Marc Glisse  wrote:
>> On Fri, 22 Apr 2016, Marc Glisse wrote:
>>
>>> On Fri, 22 Apr 2016, Richard Biener wrote:
>>>
 On Fri, Apr 22, 2016 at 5:29 AM, Marc Glisse 
 wrote:
>
> Hello,
>
> this optimizes a common pattern for unsigned overflow detection, when
> one of
> the arguments turns out to be a constant. There are more ways this could
> look like, (a + 42 <= 41) in particular, but that'll be for another
> patch.


 This case is also covered by fold_comparison which should be re-written
 to match.pd patterns (and removed from fold-const.c).

 fold_binary also as a few interesting/similar equality compare cases
 like X +- Y CMP X to Y CMP 0 which look related.

 Also your case is in fold_binary for the case of undefined overflow:
>>>
>>>
>>> As far as I can tell, fold-const.c handles this kind of transformation
>>> strictly in the case of undefined overflow (or floats), while this is
>>> strictly in the case of unsigned with wrapping overflow. I thought it would
>>> be more readable to take advantage of the genmatch machinery and group the
>>> wrapping transforms in one place, and the undefined overflow ones in another
>>> place (they don't group the same way by operator, etc).
>>>
>>> If you prefer to group by pattern shape and port the related fold-const.c
>>> bit at the same time, I could try that...
>>>
 +/* When one argument is a constant, overflow detection can be
 simplified.
 +   Currently restricted to single use so as not to interfere too much
 with
 +   ADD_OVERFLOW detection in tree-ssa-math-opts.c.  */
 +(for cmp (lt le ge gt)
 + out (gt gt le le)
 + (simplify
 +  (cmp (plus@2 @0 integer_nonzerop@1) @0)
 +  (if (TYPE_UNSIGNED (TREE_TYPE (@0))
 +   && TYPE_OVERFLOW_WRAPS (TREE_TYPE (@0))
 +   && TYPE_MAX_VALUE (TREE_TYPE (@0))
 +   && single_use (@2))
 +   (out @0 (minus { TYPE_MAX_VALUE (TREE_TYPE (@0)); } @1)
 +(for cmp (gt ge le lt)
 + out (gt gt le le)
 + (simplify
 +  (cmp @0 (plus@2 @0 integer_nonzerop@1))
 +  (if (TYPE_UNSIGNED (TREE_TYPE (@0))
 +   && TYPE_OVERFLOW_WRAPS (TREE_TYPE (@0))
 +   && TYPE_MAX_VALUE (TREE_TYPE (@0))
 +   && single_use (@2))
 +   (out @0 (minus { TYPE_MAX_VALUE (TREE_TYPE (@0)); } @1)

 please add a comment with the actual transform - A + CST CMP A -> A CMP'
 CST'

 As we are relying on twos-complement wrapping you shouldn't need
 TYPE_MAX_VALUE
 here but you can use wi::max_value (precision, sign).  I'm not sure we
 have sensible
 TYPE_MAX_VALUE for vector or complex types - the accessor uses
 NUMERICAL_TYPE_CKECK
 and TYPE_OVERFLOW_WRAPS checks for ANY_INTEGRAL_TYPE.  Thus I wonder
 if we should restrict this to INTEGRAL_TYPE_P (making the
 wi::max_value route valid).
>>>
>>>
>>> integer_nonzerop currently already restricts to INTEGER_CST or
>>> COMPLEX_CST, and I don't think complex can appear in a comparison. I'll go
>>> back to writing the more explicit INTEGER_CST in the pattern and I'll use
>>> wide_int.
>>
>>
>> Better this way?
>>
>> By the way, it would be cool to be able to write:
>> (lt:c @0 @1)
>>
>> which would expand to both
>> (lt @0 @1)
>> (gt @1 @0)
>>
>> (as per swap_tree_comparison or swapped_tcc_comparison)
>
> Yeah, I know...  I was hesitant to overload :c with "slightly" different
> semantics though.
>
> I can give it a shot though - it would avoid quite some repetition I guess.

Being able to write (lt:c @0 @1) is easy, see attached (didn't check
if it works),
but being able to write

 (for cmp (lt gt)
  (cmp:c @0 @1)

is harder (see FIXME), you'd have to create a new for at the nesting
level of the
old with the operator list adjusted.  Not impossible, of course.

Includes some verification I added locally at some point (which also exposed we
use :c on non-commutative tree codes, thus the new :C ...).

Richard.

> Ok.
>
> Thanks,
> Richard.
>
>> --
>> Marc Glisse
>> Index: trunk-ovf/gcc/match.pd
>> ===
>> --- trunk-ovf/gcc/match.pd  (revision 235371)
>> +++ trunk-ovf/gcc/match.pd  (working copy)
>> @@ -3071,10 +3071,36 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>>  (simplify
>>   /* signbit(x) -> 0 if x is nonnegative.  */
>>   (SIGNBIT tree_expr_nonnegative_p@0)
>>   { integer_zero_node; })
>>
>>  (simplify
>>   /* signbit(x) -> x<0 if x doesn't have signed zeros.  */
>>   (SIGNBIT @0)
>>   (if (!HONOR_SIGNED_ZEROS (@0))
>>(convert (lt @0 { build_real (TREE_TYPE (@0), dconst0); }
>> +
>> +/* When one argument is a constant, overflow detection can be simplified.
>> +   Currently restricted to single use so as not to interfere too much with
>> +   ADD_OVERFLOW detection in tree-ssa-math-opts.c.
>> +   A + CST CMP A 

Re: match.pd patch: u + 3 < u is u > UINT_MAX - 3

2016-04-26 Thread Marc Glisse

On Tue, 26 Apr 2016, Richard Biener wrote:


By the way, it would be cool to be able to write:
(lt:c @0 @1)

which would expand to both
(lt @0 @1)
(gt @1 @0)

(as per swap_tree_comparison or swapped_tcc_comparison)


Yeah, I know...  I was hesitant to overload :c with "slightly" different
semantics though.

I can give it a shot though - it would avoid quite some repetition I guess.


Being able to write (lt:c @0 @1) is easy, see attached (didn't check
if it works),
but being able to write

(for cmp (lt gt)
 (cmp:c @0 @1)

is harder (see FIXME), you'd have to create a new for at the nesting
level of the
old with the operator list adjusted.  Not impossible, of course.

Includes some verification I added locally at some point (which also exposed we
use :c on non-commutative tree codes, thus the new :C ...).


Ah, I was hoping it would be as simple as adding op=commuted_op(op) at the 
place where the regular commutation gets lowered. If it is significantly 
more complicated, I guess it isn't that urgent...


--
Marc Glisse


Re: Move "X +- C1 CMP C2 to X CMP C2 -+ C1" to match.pd

2016-04-26 Thread Richard Biener
On Sun, Apr 24, 2016 at 7:14 PM, Marc Glisse  wrote:
> Hello,
>
> trying to move a first pattern from fold_comparison.
>
> I first tried without single_use. It brought the number of 'free' in
> g++.dg/tree-ssa/pr61034.C down to 11, changed gcc.dg/sms-6.c to only 2 SMS
> (I don't think the generated code was worse, maybe even better, but I don't
> know ppc asm), broke Wstrict-overflow-18.c (the optimization moved from VRP
> to CCP if I remember correctly), and caused IVOPTS to make a mess in
> guality/pr54693-2.c (much longer code, and many  debug
> variables). If someone wants to drop the single_use, they can work on that
> after this patch is in.
>
> The conditions do not exactly match the ones in fold-const.c, but I guess
> they are close. The warning in the constant case was missing in
> fold_comparison, but present in VRP, so I had to add it not to regress.
>
> I don't think we were warning much from match.pd. I can't say that I am a
> big fan of those strict overflow warnings, but I expect people would
> complain if we just dropped the existing ones when moving the transforms to
> match.pd?

I just dropped them for patterns I moved (luckily we didn't have
testcases sofar ;))

If we really want to warn from match.pd then you should do the defer/undefer
stuff in fold_stmt itself (same condition I guess) and defer/undefer without
warning in gimple_fold_stmt_to_constant_1.

So you actually ran into a testcase that expected the warning?

> I wanted to restrict the equality case to TYPE_OVERFLOW_WRAPS ||
> TYPE_OVERFLOW_UNDEFINED, but that broke 20041114-1.c at -O1 (no strict
> overflow), so I went with some kind of complement we use elsewhere.

I think I prefer to move things 1:1 (unless sth regresses) and fix bugs in the
fold-const.c variant as followup (possibly also adding testcases).

IMHO -fno-strict-overflow needs to go.  It has very wary designed semantics
(ops neither wrap nor have undefined overflow).

> Now that
> I am writing this, I don't remember why I didn't just add -fstrict-overflow
> to the testcase, or xfail it at -O1. The saturating case could be handled as
> long as the constant is not an extremum, but I don't think we really handle
> saturating integers anyway.
>
> I split the equality case, because it was already getting ugly.

That's fine.

Thanks,
Richard.

> Bootstrap+regtest on powerpc64le-unknown-linux-gnu.
>
> 2016-04-25  Marc Glisse  
>
> gcc/
> * fold-const.h: Include flag-types.h.
> (fold_overflow_warning): Declare.
> * fold-const.c (fold_overflow_warning): Make non-static.
> (fold_comparison): Move the transformation of X +- C1 CMP C2
> into X CMP C2 -+ C1 ...
> * match.pd: ... here.
> * tree-ssa-forwprop.c (execute): Protect fold_stmt with
> fold_defer_overflow_warnings.
>
> gcc/testsuite/
> * gcc.dg/tree-ssa/20040305-1.c: Adjust.
>
>
> --
> Marc Glisse
> Index: trunk4/gcc/fold-const.c
> ===
> --- trunk4/gcc/fold-const.c (revision 235384)
> +++ trunk4/gcc/fold-const.c (working copy)
> @@ -290,21 +290,21 @@ fold_undefer_and_ignore_overflow_warning
>
>  bool
>  fold_deferring_overflow_warnings_p (void)
>  {
>return fold_deferring_overflow_warnings > 0;
>  }
>
>  /* This is called when we fold something based on the fact that signed
> overflow is undefined.  */
>
> -static void
> +void
>  fold_overflow_warning (const char* gmsgid, enum warn_strict_overflow_code
> wc)
>  {
>if (fold_deferring_overflow_warnings > 0)
>  {
>if (fold_deferred_overflow_warning == NULL
>   || wc < fold_deferred_overflow_code)
> {
>   fold_deferred_overflow_warning = gmsgid;
>   fold_deferred_overflow_code = wc;
> }
> @@ -8366,89 +8366,20 @@ fold_comparison (location_t loc, enum tr
>  {
>const bool equality_code = (code == EQ_EXPR || code == NE_EXPR);
>tree arg0, arg1, tem;
>
>arg0 = op0;
>arg1 = op1;
>
>STRIP_SIGN_NOPS (arg0);
>STRIP_SIGN_NOPS (arg1);
>
> -  /* Transform comparisons of the form X +- C1 CMP C2 to X CMP C2 -+ C1.
> */
> -  if ((TREE_CODE (arg0) == PLUS_EXPR || TREE_CODE (arg0) == MINUS_EXPR)
> -  && (equality_code
> - || (ANY_INTEGRAL_TYPE_P (TREE_TYPE (arg0))
> - && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (arg0
> -  && TREE_CODE (TREE_OPERAND (arg0, 1)) == INTEGER_CST
> -  && !TREE_OVERFLOW (TREE_OPERAND (arg0, 1))
> -  && TREE_CODE (arg1) == INTEGER_CST
> -  && !TREE_OVERFLOW (arg1))
> -{
> -  const enum tree_code
> -   reverse_op = TREE_CODE (arg0) == PLUS_EXPR ? MINUS_EXPR : PLUS_EXPR;
> -  tree const1 = TREE_OPERAND (arg0, 1);
> -  tree const2 = fold_convert_loc (loc, TREE_TYPE (const1), arg1);
> -  tree variable = TREE_OPERAND (arg0, 0);
> -  tree new_const = int_const_binop (reverse_op, const2, const1);
> -
> -  /* If the constant operation overflowed this can be
> -simplif

Re: match.pd patch: u + 3 < u is u > UINT_MAX - 3

2016-04-26 Thread Richard Biener
On Tue, Apr 26, 2016 at 1:57 PM, Marc Glisse  wrote:
> On Tue, 26 Apr 2016, Richard Biener wrote:
>
 By the way, it would be cool to be able to write:
 (lt:c @0 @1)

 which would expand to both
 (lt @0 @1)
 (gt @1 @0)

 (as per swap_tree_comparison or swapped_tcc_comparison)
>>>
>>>
>>> Yeah, I know...  I was hesitant to overload :c with "slightly" different
>>> semantics though.
>>>
>>> I can give it a shot though - it would avoid quite some repetition I
>>> guess.
>>
>>
>> Being able to write (lt:c @0 @1) is easy, see attached (didn't check
>> if it works),
>> but being able to write
>>
>> (for cmp (lt gt)
>>  (cmp:c @0 @1)
>>
>> is harder (see FIXME), you'd have to create a new for at the nesting
>> level of the
>> old with the operator list adjusted.  Not impossible, of course.
>>
>> Includes some verification I added locally at some point (which also
>> exposed we
>> use :c on non-commutative tree codes, thus the new :C ...).
>
>
> Ah, I was hoping it would be as simple as adding op=commuted_op(op) at the
> place where the regular commutation gets lowered. If it is significantly
> more complicated, I guess it isn't that urgent...

Yeah, the issue is that lowering of (for ...) has to happen last and
thus we have
to commutate the (for...)s themselves.  Possible, but some work.

Richard.

> --
> Marc Glisse


Re: match.pd: x+x -> 2*x

2016-04-26 Thread Richard Biener
On Tue, Apr 26, 2016 at 1:37 PM, Marc Glisse  wrote:
> On Tue, 26 Apr 2016, Richard Biener wrote:
>
>> Note that I think
>>
>> /* (X /[ex] A) * A -> X.  */
>> (simplify
>>  (mult (convert? (exact_div @0 @1)) @1)
>>  /* Look through a sign-changing conversion.  */
>>  (convert @0))
>>
>> has a bug as we use operand_equal_p for comparing @1 but
>> that treats equal but different typed INTEGER_CSTs as equal...
>>
>> Thus this lacks the tree_nop_conversion_p check.
>
>
> You seemed ok with removing that check last year
> https://gcc.gnu.org/ml/gcc-patches/2015-05/msg01551.html

Ah, ok.

> For this specific pattern, I think any conversion should work. Did you have
> a particular example in mind?

No, I just saw the fold-const.c code again and wondered.

Richard.

> --
> Marc Glisse


Re: Fix some i386 testcases for -frename-registers

2016-04-26 Thread Bernd Schmidt

On 01/29/2016 01:19 PM, Uros Bizjak wrote:

* gcc.target/i386/avx512bw-vptestmb-1.c: Correct [xyz]mm register
number scans.
* gcc.target/i386/avx512bw-vptestmw-1.c: Likewise.
* gcc.target/i386/avx512bw-vptestnmb-1.c: Likewise.
* gcc.target/i386/avx512bw-vptestnmw-1.c: Likewise.
* gcc.target/i386/avx512cd-vpbroadcastmb2q-1.c: Likewise.
* gcc.target/i386/avx512cd-vpbroadcastmw2d-1.c: Likewise.
* gcc.target/i386/avx512dq-vfpclasspd-1.c: Likewise.
* gcc.target/i386/avx512dq-vfpclassps-1.c: Likewise.
* gcc.target/i386/avx512dq-vinsertf64x2-1.c: Likewise.
* gcc.target/i386/avx512dq-vinserti64x2-1.c: Likewise.
* gcc.target/i386/avx512f-gather-5.c: Likewise.
* gcc.target/i386/avx512f-vptestmd-1.c: Likewise.
* gcc.target/i386/avx512f-vptestmq-1.c: Likewise.
* gcc.target/i386/avx512f-vptestnmd-1.c: Likewise.
* gcc.target/i386/avx512f-vptestnmq-1.c: Likewise.
* gcc.target/i386/avx512f-vrndscaleps-1.c: Likewise.
* gcc.target/i386/avx512vl-vpbroadcastmb2q-1.c: Likewise.
* gcc.target/i386/avx512vl-vpbroadcastmw2d-1.c: Likewise.
* gcc.target/i386/avx512vl-vptestmd-1.c: Likewise.
* gcc.target/i386/avx512vl-vptestmq-1.c: Likewise.
* gcc.target/i386/avx512vl-vptestnmd-1.c: Likewise.
* gcc.target/i386/avx512vl-vptestnmq-1.c: Likewise.
* gcc.target/i386/pr32219-2.c: Allow registers other than %eax in
scans.
* gcc.target/i386/pr32219-4.c: Likewise.
* gcc.target/i386/pr32219-6.c: Likewise.
* gcc.target/i386/pr32219-8.c: Likewise.


OK.


Now committed.


Bernd


Re: [PATCH] Fixup nb_iterations_upper_bound adjustment for vectorized loops

2016-04-26 Thread Ilya Enkovich
2016-04-22 10:13 GMT+03:00 Richard Biener :
> On Thu, Apr 21, 2016 at 6:09 PM, Ilya Enkovich  wrote:
>> Hi,
>>
>> Currently when loop is vectorized we adjust its nb_iterations_upper_bound
>> by dividing it by VF.  This is incorrect since nb_iterations_upper_bound
>> is upper bound for ( - 1) and therefore simple
>> dividing it by VF in many cases gives us bounds greater than a real one.
>> Correct value would be ((nb_iterations_upper_bound + 1) / VF - 1).
>
> Yeah, that seems correct.
>
>> Also decrement due to peeling for gaps should happen before we scale it
>> by VF because peeling applies to a scalar loop, not vectorized one.
>
> That's not true - PEELING_FOR_GAPs is so that the last _vector_ iteration
> is peeled as scalar operations.  We do not account for the amount
> of known prologue peeling (if peeling for alignment and the misalignment
> is known at compile-time) - that would be peeling of scalar iterations.

My initial patch didn't change anything for PEELING_FOR_GAP and it caused
a runfail for one of SPEC2006 benchmarks.  My investigation showed number
of vector iterations calculation doesn't match nb_iterations_upper_bound
adjustment in a way PEELING_FOR_GAP is accounted.

Looking into vect_generate_tmps_on_preheader I see:

/* If epilogue loop is required because of data accesses with gaps, we
   subtract one iteration from the total number of iterations here for
   correct calculation of RATIO.  */

And then we decrement loop counter before dividing it by VF to compute
ratio and ratio_mult_vf.  This doesn't match nb_iterations_upper_bound
update and that's why I fixed it.  This resolved runfail for me.

Thus ratio_mult_vf computation conflicts with your statement we peel a
vector iteration.

>
> But it would be interesting to know why we need the != 0 check - static
> cost modelling should have disabled vectorization if the vectorized body
> isn't run.
>
>> This patch modifies nb_iterations_upper_bound computation to resolve
>> these issues.
>
> You do not adjust the ->nb_iterations_estimate accordingly.
>
>> Running regression testing I got one fail due to optimized loop. Heres
>> is a loop:
>>
>> foo (signed char s)
>> {
>>   signed char i;
>>   for (i = 0; i < s; i++)
>> yy[i] = (signed int) i;
>> }
>>
>> Here we vectorize for AVX512 using VF=64.  Original loop has max 127
>> iterations and therefore vectorized loop may be executed only once.
>> With the patch applied compiler detects it and transforms loop into
>> BB with just stores of constants vectors into yy.  Test was adjusted
>> to increase number of possible iterations.  A copy of test was added
>> to check we can optimize out the original loop.
>>
>> Bootstrapped and regtested on x86_64-pc-linux-gnu.  OK for trunk?
>
> I'd like to see testcases covering the corner-cases - have them have
> upper bound estimates by adjusting known array sizes and also cover
> the case of peeling for gaps.

OK, I'll make more tests.

Thanks,
Ilya

>
> Richard.
>


Re: C/C++ PATCH to add -Wdangling-else option

2016-04-26 Thread Marek Polacek
On Wed, Apr 13, 2016 at 05:16:12PM +0200, Jakub Jelinek wrote:
> On Wed, Apr 13, 2016 at 05:13:28PM +0200, Bernd Schmidt wrote:
> > On 04/13/2016 04:14 PM, Marek Polacek wrote:
> > >This patch is meant to be applied on top of the "Wparentheses overhaul" 
> > >patch.
> > >
> > >I really think that warning about the dangling else problem isn't 
> > >appropriate
> > >as a part of the -Wparentheses warning, which I think should only deal with
> > >stuff like precedence of operators, i.e. things where ()'s are missing and 
> > >not
> > >{}'s.
> > >
> > >This new warning is, however, a subset of -Wparentheses.
> > >
> > >Bootstrapped/regtested on x86_64-linux, ok for trunk or should I stash it
> > >for the next stage1?
> > 
> > I think it's not appropriate for now. I'm ambivalent about the concept; my
> > (vague) recollection is that putting it under -Wparentheses was Kenner's
> > idea, and it's been there so long that I'm not sure there's really a point
> > to changing this. In a sense it is a very similar problem as operator
> > precedence.
> 
> Well, even with the change it is still included with -Wparentheses, just
> it is a suboption with more specific name that can be enabled/disabled
> independently from -Wparentheses if needed.
> Though, of course, it can wait for GCC 7.

So how do y'all feel about this patch now that we're in stage1?

Marek


Re: C/C++ PATCH to add -Wdangling-else option

2016-04-26 Thread Jakub Jelinek
On Tue, Apr 26, 2016 at 02:32:01PM +0200, Marek Polacek wrote:
> On Wed, Apr 13, 2016 at 05:16:12PM +0200, Jakub Jelinek wrote:
> > On Wed, Apr 13, 2016 at 05:13:28PM +0200, Bernd Schmidt wrote:
> > > On 04/13/2016 04:14 PM, Marek Polacek wrote:
> > > >This patch is meant to be applied on top of the "Wparentheses overhaul" 
> > > >patch.
> > > >
> > > >I really think that warning about the dangling else problem isn't 
> > > >appropriate
> > > >as a part of the -Wparentheses warning, which I think should only deal 
> > > >with
> > > >stuff like precedence of operators, i.e. things where ()'s are missing 
> > > >and not
> > > >{}'s.
> > > >
> > > >This new warning is, however, a subset of -Wparentheses.
> > > >
> > > >Bootstrapped/regtested on x86_64-linux, ok for trunk or should I stash it
> > > >for the next stage1?
> > > 
> > > I think it's not appropriate for now. I'm ambivalent about the concept; my
> > > (vague) recollection is that putting it under -Wparentheses was Kenner's
> > > idea, and it's been there so long that I'm not sure there's really a point
> > > to changing this. In a sense it is a very similar problem as operator
> > > precedence.
> > 
> > Well, even with the change it is still included with -Wparentheses, just
> > it is a suboption with more specific name that can be enabled/disabled
> > independently from -Wparentheses if needed.
> > Though, of course, it can wait for GCC 7.
> 
> So how do y'all feel about this patch now that we're in stage1?

I support that change, and -Wparentheses will still enable this, it just
gives more fine-grained control and be in line with what clang does.

Bernd, how much are you against this change?

Jakub


Re: Enabling -frename-registers?

2016-04-26 Thread Bernd Schmidt

On 04/17/2016 08:59 PM, Jeff Law wrote:

invoke.texi has an independent list (probably incomplete! ;( of all the
things that -O2 enables.  Make sure to add -frename-registers to that
list and this is Ok for the trunk (gcc-7).


This is what I committed.


Bernd

Index: gcc/ChangeLog
===
--- gcc/ChangeLog   (revision 235441)
+++ gcc/ChangeLog   (working copy)
@@ -1,3 +1,10 @@
+2016-04-26  Bernd Schmidt  
+
+   PR rtl-optimization/57193
+   * opts.c (default_options_table): Add OPT_frename_registers at -O2
+   and above.
+   * doc/invoke.texi (-frename-registers, -O2): Update documentation.
+
 2016-04-26  Bin Cheng  

* tree-if-conv.c (any_pred_load_store): New static variable.
Index: gcc/opts.c
===
--- gcc/opts.c  (revision 235441)
+++ gcc/opts.c  (working copy)
@@ -498,6 +498,7 @@ static const struct default_options defa
 { OPT_LEVELS_2_PLUS, OPT_fstrict_overflow, NULL, 1 },
 { OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_freorder_blocks_algorithm_, NULL,
   REORDER_BLOCKS_ALGORITHM_STC },
+{ OPT_LEVELS_2_PLUS, OPT_frename_registers, NULL, 1 },
 { OPT_LEVELS_2_PLUS, OPT_freorder_functions, NULL, 1 },
 { OPT_LEVELS_2_PLUS, OPT_ftree_vrp, NULL, 1 },
 { OPT_LEVELS_2_PLUS, OPT_ftree_pre, NULL, 1 },
Index: gcc/doc/invoke.texi
===
--- gcc/doc/invoke.texi (revision 235441)
+++ gcc/doc/invoke.texi (working copy)
@@ -6255,6 +6255,7 @@ also turns on the following optimization
 -foptimize-strlen @gol
 -fpartial-inlining @gol
 -fpeephole2 @gol
+-frename-registers @gol
 -freorder-blocks-algorithm=stc @gol
 -freorder-blocks-and-partition -freorder-functions @gol
 -frerun-cse-after-loop  @gol
@@ -8562,7 +8563,8 @@ debug information format adopted by the
 make debugging impossible, since variables no longer stay in
 a ``home register''.

-Enabled by default with @option{-funroll-loops} and @option{-fpeel-loops}.
+Enabled by default with @option{-funroll-loops} and @option{-fpeel-loops},
+and also enabled at levels @option{-O2} and @option{-O3}.

 @item -fschedule-fusion
 @opindex fschedule-fusion


Re: [PATCH] Fix missed DSE opportunity with operator delete.

2016-04-26 Thread Jason Merrill
On Tue, Apr 26, 2016 at 5:07 AM, Richard Biener
 wrote:
> On Mon, Apr 25, 2016 at 9:57 PM, Jason Merrill  wrote:
>> Hmm, this seems to assume that operator delete itself doesn't do
>> anything with the object being deleted.  This is true of the default
>> implementation, but I don't see anything in the standard that
>> prohibits a user-supplied replacement or class-specific deallocation
>> function from accessing the memory.
>
> Hmm, but the delete expression invokes the (default) destructor which
> ends the lifetime of the object and thus invalidates all memory.  Don't
> we place a CLOBBER there as well nowadays (seems not).

We put a CLOBBER inside the destructor.  Now that you mention it, I
suppose we would get more DSE if we put it in the calling context
instead.

But for types with trivial destructors, including scalar types, the
lifetime of the object doesn't end until the storage is released,
somewhere within the call to operator delete.  The clang folks have
been arguing recently that this asymmetry is harmful, so we might move
toward a model where trivial destruction also ends the lifetime of an
object, but that's currently unclear.

operator delete is like free in that it renders the pointer invalid.

Jason


[PATCH] Fix up inchash::add_expr to match more closely operand_equal_p (PR sanitizer/70683)

2016-04-26 Thread Jakub Jelinek
Hi!

As mentioned in the PR, the bug that has been eventually worked around
by making the C++ constexpr hash tables non-deletable is that
operand_equal_p, which is used in lots of hash tables as the equal function
(with last argument 0), may return true, i.e. a match, even for trees that
have different hash values (when iterative_hash_expr is used).

The following (included) patch changes inchash::add_expr, which is the 
underlying
function, to match what operand_equal_p does more closely, in particular
attempting to make sure that if operand_equal_p returns non-zero, then
the hash must be the same.

I've been using the attached hack; without this patch during x86_64-linux
and i686-linux yes,extra,rtl checking bootstraps there were 66931
notes (surprisingly only from the ivopts and gimple-ssa-strength-reduction
hash tables, no others), with the patch there are none.

Ok for trunk?

The debugging hack is too ugly and slows down the compiler (by artificially
increasing number of collisions), so it is not appropriate, but perhaps we
can add some internal only use OEP_* flag, pass it to the recursive calls
of operand_equal_p and if not set and flag_checking, verify
iterative_hash_expr equality in the outermost call).

2016-04-26  Jakub Jelinek  

PR sanitizer/70683
* tree.h (inchash::add_expr): Add FLAGS argument.
* tree.c (inchash::add_expr): Likewise.  If not OEP_ADDRESS_OF,
use STRIP_NOPS first.  For INTEGER_CST assert not OEP_ADDRESS_OF.
For REAL_CST and !HONOR_SIGNED_ZEROS (t) hash +/- 0 the same.
Formatting fix.  Adjust recursive calls.  For tcc_comparison,
if swap_tree_comparison (code) is smaller than code, hash that
and arguments in the other order.  Hash CONVERT_EXPR the same
as NOP_EXPR.  For OEP_ADDRESS_OF hash MEM_REF with 0 offset
of ADDR_EXPR of decl as the decl itself.  Add or remove
OEP_ADDRESS_OF from recursive flags as needed.  For
FMA_EXPR, WIDEN_MULT_{PLUS,MINUS}_EXPR hash the first two
operands commutatively and only the third one normally.
For internal CALL_EXPR hash in CALL_EXPR_IFN.

--- gcc/tree.h.jj   2016-04-22 18:21:32.0 +0200
+++ gcc/tree.h  2016-04-26 10:59:50.333534452 +0200
@@ -4759,7 +4759,7 @@ extern int simple_cst_equal (const_tree,
 namespace inchash
 {
 
-extern void add_expr (const_tree, hash &);
+extern void add_expr (const_tree, hash &, unsigned int = 0);
 
 }
 
--- gcc/tree.c.jj   2016-04-22 18:21:32.0 +0200
+++ gcc/tree.c  2016-04-26 11:20:48.795359078 +0200
@@ -7769,7 +7769,7 @@ namespace inchash
This function is intended to produce the same hash for expressions which
would compare equal using operand_equal_p.  */
 void
-add_expr (const_tree t, inchash::hash &hstate)
+add_expr (const_tree t, inchash::hash &hstate, unsigned int flags)
 {
   int i;
   enum tree_code code;
@@ -7781,6 +7781,9 @@ add_expr (const_tree t, inchash::hash &h
   return;
 }
 
+  if (!(flags & OEP_ADDRESS_OF))
+STRIP_NOPS (t);
+
   code = TREE_CODE (t);
 
   switch (code)
@@ -7791,12 +7794,17 @@ add_expr (const_tree t, inchash::hash &h
   hstate.merge_hash (0);
   return;
 case INTEGER_CST:
+  gcc_checking_assert (!(flags & OEP_ADDRESS_OF));
   for (i = 0; i < TREE_INT_CST_NUNITS (t); i++)
hstate.add_wide_int (TREE_INT_CST_ELT (t, i));
   return;
 case REAL_CST:
   {
-   unsigned int val2 = real_hash (TREE_REAL_CST_PTR (t));
+   unsigned int val2;
+   if (!HONOR_SIGNED_ZEROS (t) && real_zerop (t))
+ val2 = rvc_zero;
+   else
+ val2 = real_hash (TREE_REAL_CST_PTR (t));
hstate.merge_hash (val2);
return;
   }
@@ -7807,17 +7815,18 @@ add_expr (const_tree t, inchash::hash &h
return;
   }
 case STRING_CST:
-  hstate.add ((const void *) TREE_STRING_POINTER (t), TREE_STRING_LENGTH 
(t));
+  hstate.add ((const void *) TREE_STRING_POINTER (t),
+ TREE_STRING_LENGTH (t));
   return;
 case COMPLEX_CST:
-  inchash::add_expr (TREE_REALPART (t), hstate);
-  inchash::add_expr (TREE_IMAGPART (t), hstate);
+  inchash::add_expr (TREE_REALPART (t), hstate, flags);
+  inchash::add_expr (TREE_IMAGPART (t), hstate, flags);
   return;
 case VECTOR_CST:
   {
unsigned i;
for (i = 0; i < VECTOR_CST_NELTS (t); ++i)
- inchash::add_expr (VECTOR_CST_ELT (t, i), hstate);
+ inchash::add_expr (VECTOR_CST_ELT (t, i), hstate, flags);
return;
   }
 case SSA_NAME:
@@ -7831,16 +7840,17 @@ add_expr (const_tree t, inchash::hash &h
   /* A list of expressions, for a CALL_EXPR or as the elements of a
 VECTOR_CST.  */
   for (; t; t = TREE_CHAIN (t))
-   inchash::add_expr (TREE_VALUE (t), hstate);
+   inchash::add_expr (TREE_VALUE (t), hstate, flags);
   return;
 case CONSTRUCTOR:
   {
unsigned HOST_WIDE_INT idx;
tree field,

Re: C/C++ PATCH to add -Wdangling-else option

2016-04-26 Thread Bernd Schmidt

On 04/26/2016 02:39 PM, Jakub Jelinek wrote:

I support that change, and -Wparentheses will still enable this, it just
gives more fine-grained control and be in line with what clang does.

Bernd, how much are you against this change?


Don't really care that much, I just don't quite see the point. Don't let 
me stop you though.



Bernd



Re: C, C++: New warning for memset without multiply by elt size

2016-04-26 Thread Bernd Schmidt

On 04/25/2016 07:55 PM, Jason Merrill wrote:

On 04/25/2016 05:07 AM, Bernd Schmidt wrote:

+if (TREE_CODE (arg2) == CONST_DECL)
+  arg2 = DECL_INITIAL (arg2);
+int literal_mask = ((!!integer_zerop (arg1) << 1)
+| (!!integer_zerop (arg2) << 2));


Are you deliberately treating an enumerator as a literal 0?  I'd think
we should set literal_mask before stripping CONST_DECL.  OK with that
change.


Just to be sure, is that OK for everything or just the C++ parts?


Bernd



Re: [PATCH] add support for placing variables in shared memory

2016-04-26 Thread Nathan Sidwell

On 04/25/16 13:49, Alexander Monakov wrote:

On Mon, 25 Apr 2016, Nathan Sidwell wrote:

acceptable?


No, that really doesn't sound viable.  You'd need to somehow take into account
every instance where the compiler attempts to switch sections internally
(.text/.data/.bss, -ffunction-sections/-fdata-sections etc.).


Hm, -f{function/data}=sections would be the killer here.



Ugh.  Checking DECL_INITIAL in nvptx_encode_section_info would be much
simpler (and that's how other backends perform a similar test).


If that's available, then great.


Note, rejecting zero-initializers is debatable:
C and C++ don't have a concept of uninitialized global-scope data; if
the initializer is missing, it's exactly as if it was 0.  However, GCC has
-fcommon enabled by default (which, btw, shouldn't we change on NVPTX?), and
that makes a difference: 'int v = 0;' is a strong definition, while 'int v;'
becomes a common symbol, and ultimately a weak definition on NVPTX.

So if all-zeros initializers are rejected, to make a strong definition of a
shared variable one would have to write:


I think we should  reject the cases where the backend gets to see an explicit 
initializer.



fno-common would be a good default for PTX (I had thought it was  already on?)

nathan


Re: [C PATCH] Follow-up fix to the misclassified token problem (PR c/67784)

2016-04-26 Thread Marek Polacek
On Tue, Apr 26, 2016 at 11:44:52AM +, Joseph Myers wrote:
> On Tue, 26 Apr 2016, Marek Polacek wrote:
> 
> > This PR was reopened, because the exact same problem with treating a 
> > TYPENAME
> > wrongly as an ID was found when using just if-clause, without an enclosing 
> > for
> > loop.  More details: 
> > .
> > That fix had a follow-up, because it broke some ObjC code.
> > 
> > To fix this, we need to use the (amended) token reclassification even when
> > parsing an if statement.  I factored the code into a separate function so 
> > as to
> > not repeat the very same code.
> > 
> > Bootstrapped/regtested on x86_64-linux, ok for trunk?
> 
> I think you need more thorough testcases, to cover the cases where the if 
> statement with no else forms the body of a switch or while statement, and 
> the declaration in question appears in the expression of that switch or 
> while statement, e.g.
> 
> typedef int T;
> 
> switch (sizeof (enum { T }))
>   if (1)
> ;
> T x;

Ah, right, that revealed two more places that were missing the
c_parser_maybe_reclassify_token call.

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2016-04-26  Marek Polacek  

PR c/67784
* c-parser.c (c_parser_maybe_reclassify_token): New function factored
out of ...
(c_parser_for_statement): ... here.
(c_parser_if_statement): Use it.
(c_parser_switch_statement): Use it.
(c_parser_while_statement): Use it.

* gcc.dg/pr67784-3.c: New test.
* gcc.dg/pr67784-4.c: New test.
* gcc.dg/pr67784-5.c: New test.

diff --git gcc/c/c-parser.c gcc/c/c-parser.c
index bdd669d..74423a6 100644
--- gcc/c/c-parser.c
+++ gcc/c/c-parser.c
@@ -5425,6 +5425,43 @@ c_parser_else_body (c_parser *parser, const 
token_indent_info &else_tinfo,
   return c_end_compound_stmt (body_loc, block, flag_isoc99);
 }
 
+/* We might need to reclassify any previously-lexed identifier, e.g.
+   when we've left a for loop with an if-statement without else in the
+   body - we might have used a wrong scope for the token.  See PR67784.  */
+
+static void
+c_parser_maybe_reclassify_token (c_parser *parser)
+{
+  if (c_parser_next_token_is (parser, CPP_NAME))
+{
+  c_token *token = c_parser_peek_token (parser);
+
+  if (token->id_kind != C_ID_CLASSNAME)
+   {
+ tree decl = lookup_name (token->value);
+
+ token->id_kind = C_ID_ID;
+ if (decl)
+   {
+ if (TREE_CODE (decl) == TYPE_DECL)
+   token->id_kind = C_ID_TYPENAME;
+   }
+ else if (c_dialect_objc ())
+   {
+ tree objc_interface_decl = objc_is_class_name (token->value);
+ /* Objective-C class names are in the same namespace as
+variables and typedefs, and hence are shadowed by local
+declarations.  */
+ if (objc_interface_decl)
+   {
+ token->value = objc_interface_decl;
+ token->id_kind = C_ID_CLASSNAME;
+   }
+   }
+   }
+}
+}
+
 /* Parse an if statement (C90 6.6.4, C99 6.8.4).
 
if-statement:
@@ -5523,6 +5560,7 @@ c_parser_if_statement (c_parser *parser, bool *if_p, 
vec *chain)
   if (flag_cilkplus && contains_array_notation_expr (if_stmt))
 if_stmt = fix_conditional_array_notations (if_stmt);
   add_stmt (if_stmt);
+  c_parser_maybe_reclassify_token (parser);
 }
 
 /* Parse a switch statement (C90 6.6.4, C99 6.8.4).
@@ -5578,6 +5616,7 @@ c_parser_switch_statement (c_parser *parser)
 }
   c_break_label = save_break;
   add_stmt (c_end_compound_stmt (switch_loc, block, flag_isoc99));
+  c_parser_maybe_reclassify_token (parser);
 }
 
 /* Parse a while statement (C90 6.6.5, C99 6.8.5).
@@ -5620,6 +5659,7 @@ c_parser_while_statement (c_parser *parser, bool ivdep, 
bool *if_p)
   body = c_parser_c99_block_statement (parser, if_p);
   c_finish_loop (loc, cond, NULL, body, c_break_label, c_cont_label, true);
   add_stmt (c_end_compound_stmt (loc, block, flag_isoc99));
+  c_parser_maybe_reclassify_token (parser);
 
   token_indent_info next_tinfo
 = get_token_indent_info (c_parser_peek_token (parser));
@@ -5916,38 +5956,7 @@ c_parser_for_statement (c_parser *parser, bool ivdep, 
bool *if_p)
   else
 c_finish_loop (loc, cond, incr, body, c_break_label, c_cont_label, true);
   add_stmt (c_end_compound_stmt (loc, block, flag_isoc99 || c_dialect_objc 
()));
-
-  /* We might need to reclassify any previously-lexed identifier, e.g.
- when we've left a for loop with an if-statement without else in the
- body - we might have used a wrong scope for the token.  See PR67784.  */
-  if (c_parser_next_token_is (parser, CPP_NAME))
-{
-  c_token *token = c_parser_peek_token (parser);
-
-  if (token->id_kind != C_ID_CLASSNAME)
-   {
- tree decl = lookup_name (token->value);
-
- token->id_kind = C_ID_ID;
- if 

[PATCH] Fix a recent warning in reorg.c

2016-04-26 Thread Jakub Jelinek
Hi!

I've noticed a warning during bootstrap:
../../gcc/reorg.c: In function ‘void try_merge_delay_insns(rtx_insn*, 
rtx_insn*)’:
../../gcc/reorg.c:1431:12: warning: name lookup of ‘i’ changed
   for (i = 0; i < XVECLEN (PATTERN (insn), 0); i++)
^
../../gcc/reorg.c:1263:7: warning:   matches this ‘i’ under ISO standard rules
   int i, j;
   ^
../../gcc/reorg.c:1413:25: warning:   matches this ‘i’ under old rules
   for (unsigned int i = len - 1; i < len; i--)
 ^
It is not fatal, but still ugly.  The problem is that the function has
  int i;
...
  for (i = 0; ...)
...
  for (unsigned int i = ... )
...
  for (i = 0; ...)
This patch just declares the var in the only affected loop, so that the warning
is not emitted, unless we start checking -Wshadow warnings, I think this is
good enough.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2016-04-26  Jakub Jelinek  

* reorg.c (try_merge_delay_insns): Declare i inside the last
for loop to avoid warning.

--- gcc/reorg.c.jj  2016-04-26 08:08:16.0 +0200
+++ gcc/reorg.c 2016-04-26 11:13:53.212030471 +0200
@@ -1428,7 +1428,7 @@ try_merge_delay_insns (rtx_insn *insn, r
 
   INSN_ANNULLED_BRANCH_P (delay_insn) = 0;
 
-  for (i = 0; i < XVECLEN (PATTERN (insn), 0); i++)
+  for (int i = 0; i < XVECLEN (PATTERN (insn), 0); i++)
INSN_FROM_TARGET_P (XVECEXP (PATTERN (insn), 0, i)) = 0;
 }
 }

Jakub


Re: [C PATCH] Follow-up fix to the misclassified token problem (PR c/67784)

2016-04-26 Thread Jakub Jelinek
On Tue, Apr 26, 2016 at 03:06:16PM +0200, Marek Polacek wrote:
> On Tue, Apr 26, 2016 at 11:44:52AM +, Joseph Myers wrote:
> > On Tue, 26 Apr 2016, Marek Polacek wrote:
> > 
> > > This PR was reopened, because the exact same problem with treating a 
> > > TYPENAME
> > > wrongly as an ID was found when using just if-clause, without an 
> > > enclosing for
> > > loop.  More details: 
> > > .
> > > That fix had a follow-up, because it broke some ObjC code.
> > > 
> > > To fix this, we need to use the (amended) token reclassification even when
> > > parsing an if statement.  I factored the code into a separate function so 
> > > as to
> > > not repeat the very same code.
> > > 
> > > Bootstrapped/regtested on x86_64-linux, ok for trunk?
> > 
> > I think you need more thorough testcases, to cover the cases where the if 
> > statement with no else forms the body of a switch or while statement, and 
> > the declaration in question appears in the expression of that switch or 
> > while statement, e.g.
> > 
> > typedef int T;
> > 
> > switch (sizeof (enum { T }))
> >   if (1)
> > ;
> > T x;
> 
> Ah, right, that revealed two more places that were missing the
> c_parser_maybe_reclassify_token call.

Bet there are many more, the various OpenMP/OpenACC/Cilk+ loops (if without
{}s around the body or in between collapsed loops), perhaps even #pragma GCC
ivdep.  But I can handle those incrementally.

Jakub


Re: [PATCH] Fix a recent warning in reorg.c

2016-04-26 Thread Bernd Schmidt

On 04/26/2016 03:08 PM, Jakub Jelinek wrote:

It is not fatal, but still ugly.  The problem is that the function has
   int i;
...
   for (i = 0; ...)
...
   for (unsigned int i = ... )
...
   for (i = 0; ...)
This patch just declares the var in the only affected loop, so that the warning
is not emitted, unless we start checking -Wshadow warnings, I think this is
good enough.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2016-04-26  Jakub Jelinek  

* reorg.c (try_merge_delay_insns): Declare i inside the last
for loop to avoid warning.

--- gcc/reorg.c.jj  2016-04-26 08:08:16.0 +0200
+++ gcc/reorg.c 2016-04-26 11:13:53.212030471 +0200
@@ -1428,7 +1428,7 @@ try_merge_delay_insns (rtx_insn *insn, r

INSN_ANNULLED_BRANCH_P (delay_insn) = 0;

-  for (i = 0; i < XVECLEN (PATTERN (insn), 0); i++)
+  for (int i = 0; i < XVECLEN (PATTERN (insn), 0); i++)
INSN_FROM_TARGET_P (XVECEXP (PATTERN (insn), 0, i)) = 0;
  }
  }


Can you make all for statements declare their i as int and remove the 
outer declaration?



Bernd


Re: [PATCH] Fix a recent warning in reorg.c

2016-04-26 Thread Bernd Schmidt

On 04/26/2016 03:08 PM, Jakub Jelinek wrote:

^
../../gcc/reorg.c:1413:25: warning:   matches this ‘i’ under old rules
for (unsigned int i = len - 1; i < len; i--)
  ^


Oh, and also - I flagged this while reviewing other parts of Trevor's 
changes, this pattern is too ugly to live, that should be a 
FOR_EACH_VEC_ELT_



Bernd



[Patch AArch64] Set TARGET_OMIT_STRUCT_RETURN_REG to true.

2016-04-26 Thread Ramana Radhakrishnan
As $SUBJECT. The reason this caught my eye on aarch64 is because
the return value register (x0) is not identical to the register in which
the hidden parameter for AArch64 is set (x8). Thus setting this to true
seems to be quite reasonable and shaves off 100 odd mov x0, x8's from
cc1 in a bootstrap build.

I don't expect this to make a huge impact on performance but as they say every 
little counts. 
The AAPCS64 is quite explicit about not requiring that the contents of x8 be 
kept live.

Bootstrapped and regression tested on aarch64.

Ok to apply ?

Ramana

gcc/
* config/aarch64/aarch64.c (TARGET_OMIT_STRUCT_RETURN_REG): Set to
true.

gcc/testsuite

* gcc.target/aarch64/struct_return.c: New test.
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 
335161992572d36144ceecd9bdda5f387f3a1a2b..df908a56ce5b024a56aeb4d6fcf74cbef0ac7132
 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -14359,6 +14359,9 @@ aarch64_optab_supported_p (int op, machine_mode, 
machine_mode,
 #undef TARGET_OPTAB_SUPPORTED_P
 #define TARGET_OPTAB_SUPPORTED_P aarch64_optab_supported_p
 
+#undef TARGET_OMIT_STRUCT_RETURN_REG
+#define TARGET_OMIT_STRUCT_RETURN_REG true
+
 struct gcc_target targetm = TARGET_INITIALIZER;
 
 #include "gt-aarch64.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/struct_return.c 
b/gcc/testsuite/gcc.target/aarch64/struct_return.c
new file mode 100644
index 
..6d90b7e59537ea66ad89e3615a2b11f2838b8779
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/struct_return.c
@@ -0,0 +1,31 @@
+/* Test the absence of a spurious move from x8 to x0 for functions
+   return structures.  */
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+struct s
+{
+  long x;
+  long y;
+  long z;
+};
+
+struct s __attribute__((noinline))
+foo (long a, long d, long c)
+{
+  struct s b;
+  b.x = a;
+  b.y = d;
+  b.z = c;
+  return b;
+}
+
+int
+main (void)
+{
+  struct s x;
+  x = foo ( 10, 20, 30);
+  return x.x + x.y + x.z;
+}
+
+/* { dg-final { scan-assembler-not "mov\tx0, x8" } } */


Re: [PATCH][AArch64] Replace insn to zero up SIMD registers

2016-04-26 Thread Wilco Dijkstra
Evandro Menezes wrote:
>On 03/10/16 10:37, James Greenhalgh wrote:
>> Thanks for sticking with it. This is OK for GCC 7 when development
>> opens.
>>
>> Remember to mention the most recent changes in your Changelog entry
>> (Remove "fp" attribute from *movhf_aarch64 and *movtf_aarch64).
>
>
> OK to commit?

The updated Changelog looks fine - James already OK'd this.

Wilco



Re: [PING][PATCH] New plugin event when evaluating a constexpr call

2016-04-26 Thread Andres Tiraboschi
Hi, thanks for answering,

2016-04-25 16:21 GMT-03:00 Jason Merrill :
> Let's create a constexpr.h rather than expose constexpr internals to all of
> the front end.  Really, I'd prefer to avoid exposing them at all. Why does
> what you want to do require all this implementation detail?

Ok, you are right, I'll make a constexpr.h.

> This is a curious place to invoke the callback.  Why before the
> *non_constant_p?  More generally, why between evaluating the arguments and
> evaluating the function body?

That was because I was interested just in the functions and its
arguments, but you are right, I'll do the callback at the beginning of
the function.

Regards,
Andrés.


Re: [PATCH] Replace old AWK script (utilizing bc) with Python implementation

2016-04-26 Thread Martin Liška
On 04/25/2016 09:01 PM, Matthias Klose wrote:
> please could you make the shebang python3? Not sure if it's good to replace 
> one old implementation with a soon to become old implementation.
> 
> Matthias

Sure, thanks for pointing out.

Attaching v2, where I changed:
+ switched from python2 to python3
+ added a new column with better readable coverage values:

HEURISTICS   BRANCHES  (REL)  HITRATE
COVERAGE COVERAGE  (REL)
loop iv compare70   0.1%  59.31% /  71.45%  
391732444  391.73M   0.0%
unconditional jump252   0.2% 100.00% / 100.00%  
62269   62.27K   0.0%
guess loop iv compare 362   0.3%  89.02% /  90.08% 
27031841642.70G   0.2%
...

As I've just tested the patch, I runs significantly faster for SPEC2006 profile 
dumps:
real 0m1.162s vs. real 0m17.962s

Martin
>From 38ee7451629e5fe7d9d2468bf24e1929193be2a6 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Mon, 25 Apr 2016 16:42:42 +0200
Subject: [PATCH] Replace AWK script with the python script.

contrib/ChangeLog:

2016-04-25  Martin Liska  

	* analyze_brprob: Remove.
	* analyze_brprob.py: New file.
---
 contrib/analyze_brprob| 147 --
 contrib/analyze_brprob.py | 136 ++
 2 files changed, 136 insertions(+), 147 deletions(-)
 delete mode 100755 contrib/analyze_brprob
 create mode 100755 contrib/analyze_brprob.py

diff --git a/contrib/analyze_brprob b/contrib/analyze_brprob
deleted file mode 100755
index 5702834..000
--- a/contrib/analyze_brprob
+++ /dev/null
@@ -1,147 +0,0 @@
-#!/usr/bin/awk -f
-# Script to analyze experimental results of our branch prediction heuristics
-# Contributed by Jan Hubicka, SuSE Inc.
-# Copyright (C) 2001, 2003 Free Software Foundation, Inc.
-#
-# This file is part of GCC.
-#
-# GCC is free software; you can redistribute it and/or modify
-# it under the terms of the GNU General Public License as published by
-# the Free Software Foundation; either version 3, or (at your option)
-# any later version.
-#
-# GCC is distributed in the hope that it will be useful,
-# but WITHOUT ANY WARRANTY; without even the implied warranty of
-# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
-# GNU General Public License for more details.
-#
-# You should have received a copy of the GNU General Public License
-# along with GCC; see the file COPYING.  If not, write to
-# the Free Software Foundation, 51 Franklin Street, Fifth Floor,
-# Boston, MA 02110-1301, USA.
-#
-#
-# This script is used to calculate two basic properties of the branch prediction
-# heuristics - coverage and hitrate.  Coverage is number of executions of a given
-# branch matched by the heuristics and hitrate is probability that once branch is
-# predicted as taken it is really taken.
-#
-# These values are useful to determine the quality of given heuristics.  Hitrate
-# may be directly used in predict.c.
-#
-# Usage:
-#  Step 1: Compile and profile your program.  You need to use -fprofile-arcs
-#flag to get the profiles
-#  Step 2: Generate log files.  The information about given heuristics are
-#saved into ipa-profile dumps.  You need to pass the -fdimp-ipa-profile switch
-#to the compiler as well
-#as -fbranch-probabilities to get the results of profiling noted in the dumps.
-#Ensure that there are no "Arc profiling: some edge counts were bad." warnings.
-#  Step 3: Run this script to concatenate all *.profile files:
-#analyze_brprob `find . -name *.profile`
-#the information is collected and print once all files are parsed.  This
-#may take a while.
-#Note that the script does use bc to perform long arithmetic.
-#  Step 4: Read the results.  Basically the following table is printed:
-#  (this is just an example from a very early stage of branch prediction pass
-#   development, so please don't take these numbers seriously)
-#
-#HEURISTICS  BRANCHES  (REL)  HITRATE COVERAGE  (REL)
-#opcode  2889  83.7%  94.96%/ 97.62%  7516383  75.3%
-#pointer  246   7.1%  99.69%/ 99.86%   118791   1.2%
-#loop header  449  13.0%  98.32%/ 99.07%43553   0.4%
-#first match 3450 100.0%  89.92%/ 97.27%  9979782 100.0%
-#loop exit924  26.8%  88.95%/ 95.58%  9026266  90.4%
-#error return 150   4.3%  64.48%/ 86.81%   453542   4.5%
-#call 803  23.3%  51.66%/ 98.61%  3614037  36.2%
-#loop branch   51   1.5%  99.26%/ 99.27%26854   0.3%
-#noreturn call951  27.6% 100.00%/100.00%  1759809  17.6%
-#
-#  The heuristic called "first match" is a heuristic used by GCC branch
-#  prediction pass and it predicts 89.92% branches correctly.
-#
-#  The quality of heuristics can

Allow redefinition of libcilkrts debug macros

2016-04-26 Thread Rainer Orth
When working on a couple of Cilk Plus issues lately (PRs target/60290,
target/68945), I noticed that you have to modify the libcilkplus sources
to enable various debugging output.  This seems silly, and the following
patch allows defining them from the command line.

Tested on i386-pc-solaris2.12 and sparc-sun-solaris2.12.

Ok for mainline?

Rainer


2016-04-07  Rainer Orth  

* runtime/except-gcc.cpp (DEBUG_EXCEPTIONS): Allow redefinition.
* runtime/cilk_fiber.h (FIBER_DEBUG): Likewise.
* runtime/scheduler.h (REDPAR_DEBUG): Likewise.

# HG changeset patch
# Parent  072a0724a080e27be15d001b8507af7a2490d73d
Allow redefinition of libcilkrts debug macros

diff --git a/libcilkrts/runtime/cilk_fiber.h b/libcilkrts/runtime/cilk_fiber.h
--- a/libcilkrts/runtime/cilk_fiber.h
+++ b/libcilkrts/runtime/cilk_fiber.h
@@ -63,7 +63,9 @@
  * A value of 0 means no debugging.
  * Higher values generate more debugging output.
  */
+#ifndef FIBER_DEBUG
 #define FIBER_DEBUG 0
+#endif
 
 /**
  * @brief Flag for validating reference counts.
diff --git a/libcilkrts/runtime/except-gcc.cpp b/libcilkrts/runtime/except-gcc.cpp
--- a/libcilkrts/runtime/except-gcc.cpp
+++ b/libcilkrts/runtime/except-gcc.cpp
@@ -49,7 +49,9 @@
 #include 
 #include 
 
+#ifndef DEBUG_EXCEPTIONS
 #define DEBUG_EXCEPTIONS 0
+#endif
 
 struct pending_exception_info
 {
diff --git a/libcilkrts/runtime/scheduler.h b/libcilkrts/runtime/scheduler.h
--- a/libcilkrts/runtime/scheduler.h
+++ b/libcilkrts/runtime/scheduler.h
@@ -74,7 +74,9 @@
  * Print debugging messages and assertions for parallel reducers. 0 is
  * no debugging.  A higher value generates more output.
  */
+#ifndef REDPAR_DEBUG
 #define REDPAR_DEBUG 0
+#endif
 
 /**
  * @brief Lock the worker mutex to allow exclusive access to the

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [PATCH] PR target/70155: Use SSE for TImode load/store

2016-04-26 Thread H.J. Lu
On Mon, Apr 25, 2016 at 9:13 AM, Ilya Enkovich  wrote:
> 2016-04-25 18:27 GMT+03:00 H.J. Lu :
>>
>> Ilya, can you take a look?
>>
>> Thanks.
>>
>> --
>> H.J.
>
> Hi,
>
> Algorithmic part of the patch looks OK to me except the following piece of 
> code.
>
> +/* Check REF's chain to add new insns into a queue
> +   and find registers requiring conversion.  */
>
> Comment is wrong because you don't have any conversions required for
> your candidates.

I will fix it.

> +
> +void
> +scalar_chain_64::analyze_register_chain (bitmap candidates, df_ref ref)
> +{
> +  df_link *chain;
> +
> +  gcc_assert (bitmap_bit_p (insns, DF_REF_INSN_UID (ref))
> + || bitmap_bit_p (candidates, DF_REF_INSN_UID (ref)));
> +  add_to_queue (DF_REF_INSN_UID (ref));
> +
> +  for (chain = DF_REF_CHAIN (ref); chain; chain = chain->next)
> +{
> +  unsigned uid = DF_REF_INSN_UID (chain->ref);
> +
> +  if (!NONDEBUG_INSN_P (DF_REF_INSN (chain->ref)))
> +   continue;
> +
> +  if (!DF_REF_REG_MEM_P (chain->ref))
> +   continue;
>
> I believe here you wrongly jump to the next ref intead of actually adding it
> to a queue.  You may just use
>
> gcc_assert (!DF_REF_REG_MEM_P (chain->ref));
>
> because you should'n have a candidate used in address operand.

I will update.

> +
> +  if (bitmap_bit_p (insns, uid))
> +   continue;
> +
> +  if (bitmap_bit_p (candidates, uid))
> +   add_to_queue (uid);
>
> Probably gcc_assert (bitmap_bit_p (candidates, uid)) since no uses and defs
> out of candidates list are allowed?

That would be wrong since there are

 while (!bitmap_empty_p (queue))
{
  insn_uid = bitmap_first_set_bit (queue);
  bitmap_clear_bit (queue, insn_uid);
  bitmap_clear_bit (candidates, insn_uid);
  add_insn (candidates, insn_uid);
}

An instruction is a candidate and the bit is cleared when
analyze_register_chain is called.

-- 
H.J.


Re: [RFC][PR68217] Improve value range for signed & sign-bit-CST

2016-04-26 Thread Richard Biener
On Fri, Apr 15, 2016 at 12:44 PM, kugan
 wrote:
> As pointed out by Richard, for signed & sign-bit-CST value range should be
> [-INF, 0] range, not a [-INF, INF] range as happens now.
>
> This patch fixes this. I bootstrapped and regression tested for
> x86-64-linux-gnu with no new regression. Is this OK for statege-1.

I think you need to check vr0 for singleton-constant == sign_bit as well,
there is nothing that canonicalizes this during propagation.

Otherwise ok.

Thanks,
Richard.

> Thanks,
> Kugan
>
>
> gcc/ChangeLog:
>
> 2016-04-14  Kugan Vivekanandarajah  
>
> PR middle-end/68217
> * tree-vrp.c (extract_range_from_binary_expr_1): In case of signed &
> sign-bit-CST,
>  generate [-INF, 0] instead of [-INF, INF].
>
>
> gcc/testsuite/ChangeLog:
>
> 2016-04-14  Kugan Vivekanandarajah  
>
> PR middle-end/68217
> * gcc.dg/pr68217.c: New test.


Re: [PATCH] PR target/70155: Use SSE for TImode load/store

2016-04-26 Thread Ilya Enkovich
2016-04-26 17:07 GMT+03:00 H.J. Lu :
> On Mon, Apr 25, 2016 at 9:13 AM, Ilya Enkovich  wrote:
>> 2016-04-25 18:27 GMT+03:00 H.J. Lu :
>>>
>>> Ilya, can you take a look?
>>>
>>> Thanks.
>>>
>>> --
>>> H.J.
>>
>> Hi,
>>
>> Algorithmic part of the patch looks OK to me except the following piece of 
>> code.
>>
>> +/* Check REF's chain to add new insns into a queue
>> +   and find registers requiring conversion.  */
>>
>> Comment is wrong because you don't have any conversions required for
>> your candidates.
>
> I will fix it.
>
>> +
>> +void
>> +scalar_chain_64::analyze_register_chain (bitmap candidates, df_ref ref)
>> +{
>> +  df_link *chain;
>> +
>> +  gcc_assert (bitmap_bit_p (insns, DF_REF_INSN_UID (ref))
>> + || bitmap_bit_p (candidates, DF_REF_INSN_UID (ref)));
>> +  add_to_queue (DF_REF_INSN_UID (ref));
>> +
>> +  for (chain = DF_REF_CHAIN (ref); chain; chain = chain->next)
>> +{
>> +  unsigned uid = DF_REF_INSN_UID (chain->ref);
>> +
>> +  if (!NONDEBUG_INSN_P (DF_REF_INSN (chain->ref)))
>> +   continue;
>> +
>> +  if (!DF_REF_REG_MEM_P (chain->ref))
>> +   continue;
>>
>> I believe here you wrongly jump to the next ref intead of actually adding it
>> to a queue.  You may just use
>>
>> gcc_assert (!DF_REF_REG_MEM_P (chain->ref));
>>
>> because you should'n have a candidate used in address operand.
>
> I will update.
>
>> +
>> +  if (bitmap_bit_p (insns, uid))
>> +   continue;
>> +
>> +  if (bitmap_bit_p (candidates, uid))
>> +   add_to_queue (uid);
>>
>> Probably gcc_assert (bitmap_bit_p (candidates, uid)) since no uses and defs
>> out of candidates list are allowed?
>
> That would be wrong since there are
>
>  while (!bitmap_empty_p (queue))
> {
>   insn_uid = bitmap_first_set_bit (queue);
>   bitmap_clear_bit (queue, insn_uid);
>   bitmap_clear_bit (candidates, insn_uid);
>   add_insn (candidates, insn_uid);
> }
>
> An instruction is a candidate and the bit is cleared when
> analyze_register_chain is called.

You clear candidates bit but the first thing you do in add_insn is set
insns bit.
Thus you should hit:

  if (bitmap_bit_p (insns, uid))
continue;

For handled candidates.

Probably it would be more clear if we keep this clear/set pair
together?  E.g. move
bitmap_clear_bit (candidates, insn_uid) to scalar_chain::add_insn.

Thanks,
Ilya

>
> --
> H.J.


Re: [PATCH] Replace old AWK script (utilizing bc) with Python implementation

2016-04-26 Thread Jan Hubicka
> On 04/25/2016 09:01 PM, Matthias Klose wrote:
> > please could you make the shebang python3? Not sure if it's good to replace 
> > one old implementation with a soon to become old implementation.
> > 
> > Matthias
> 
> Sure, thanks for pointing out.
> 
> Attaching v2, where I changed:
> + switched from python2 to python3
> + added a new column with better readable coverage values:
> 
> HEURISTICS   BRANCHES  (REL)  HITRATE
> COVERAGE COVERAGE  (REL)
> loop iv compare70   0.1%  59.31% /  71.45%  
> 391732444  391.73M   0.0%
> unconditional jump252   0.2% 100.00% / 100.00%
>   62269   62.27K   0.0%
> guess loop iv compare 362   0.3%  89.02% /  90.08% 
> 27031841642.70G   0.2%
> ...
> 
> As I've just tested the patch, I runs significantly faster for SPEC2006 
> profile dumps:
> real 0m1.162s vs. real 0m17.962s

Hehe, my awk-fu was not intended to be extremely effective ;)
The patch is OK. We could make it to respond to --help rather than
having explanation in comments, but I do not care much as long as it gets the 
data.

Honza
> 
> Martin

> >From 38ee7451629e5fe7d9d2468bf24e1929193be2a6 Mon Sep 17 00:00:00 2001
> From: marxin 
> Date: Mon, 25 Apr 2016 16:42:42 +0200
> Subject: [PATCH] Replace AWK script with the python script.
> 
> contrib/ChangeLog:
> 
> 2016-04-25  Martin Liska  
> 
>   * analyze_brprob: Remove.
>   * analyze_brprob.py: New file.
> ---
>  contrib/analyze_brprob| 147 
> --
>  contrib/analyze_brprob.py | 136 ++
>  2 files changed, 136 insertions(+), 147 deletions(-)
>  delete mode 100755 contrib/analyze_brprob
>  create mode 100755 contrib/analyze_brprob.py
> 
> diff --git a/contrib/analyze_brprob b/contrib/analyze_brprob
> deleted file mode 100755
> index 5702834..000
> --- a/contrib/analyze_brprob
> +++ /dev/null
> @@ -1,147 +0,0 @@
> -#!/usr/bin/awk -f
> -# Script to analyze experimental results of our branch prediction heuristics
> -# Contributed by Jan Hubicka, SuSE Inc.
> -# Copyright (C) 2001, 2003 Free Software Foundation, Inc.
> -#
> -# This file is part of GCC.
> -#
> -# GCC is free software; you can redistribute it and/or modify
> -# it under the terms of the GNU General Public License as published by
> -# the Free Software Foundation; either version 3, or (at your option)
> -# any later version.
> -#
> -# GCC is distributed in the hope that it will be useful,
> -# but WITHOUT ANY WARRANTY; without even the implied warranty of
> -# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> -# GNU General Public License for more details.
> -#
> -# You should have received a copy of the GNU General Public License
> -# along with GCC; see the file COPYING.  If not, write to
> -# the Free Software Foundation, 51 Franklin Street, Fifth Floor,
> -# Boston, MA 02110-1301, USA.
> -#
> -#
> -# This script is used to calculate two basic properties of the branch 
> prediction
> -# heuristics - coverage and hitrate.  Coverage is number of executions of a 
> given
> -# branch matched by the heuristics and hitrate is probability that once 
> branch is
> -# predicted as taken it is really taken.
> -#
> -# These values are useful to determine the quality of given heuristics.  
> Hitrate
> -# may be directly used in predict.c.
> -#
> -# Usage:
> -#  Step 1: Compile and profile your program.  You need to use -fprofile-arcs
> -#flag to get the profiles
> -#  Step 2: Generate log files.  The information about given heuristics are
> -#saved into ipa-profile dumps.  You need to pass the -fdimp-ipa-profile 
> switch
> -#to the compiler as well
> -#as -fbranch-probabilities to get the results of profiling noted in the 
> dumps.
> -#Ensure that there are no "Arc profiling: some edge counts were bad." 
> warnings.
> -#  Step 3: Run this script to concatenate all *.profile files:
> -#analyze_brprob `find . -name *.profile`
> -#the information is collected and print once all files are parsed.  This
> -#may take a while.
> -#Note that the script does use bc to perform long arithmetic.
> -#  Step 4: Read the results.  Basically the following table is printed:
> -#  (this is just an example from a very early stage of branch prediction pass
> -#   development, so please don't take these numbers seriously)
> -#
> -#HEURISTICS  BRANCHES  (REL)  HITRATE COVERAGE  
> (REL)
> -#opcode  2889  83.7%  94.96%/ 97.62%  7516383  
> 75.3%
> -#pointer  246   7.1%  99.69%/ 99.86%   118791   
> 1.2%
> -#loop header  449  13.0%  98.32%/ 99.07%43553   
> 0.4%
> -#first match 3450 100.0%  89.92%/ 97.27%  9979782 
> 100.0%
> -#loop exit924  26.8%  88.95%/ 95.58%  9026266  
> 90.4%
> -#error return  

Re: [C PATCH] Follow-up fix to the misclassified token problem (PR c/67784)

2016-04-26 Thread Joseph Myers
On Tue, 26 Apr 2016, Marek Polacek wrote:

> Ah, right, that revealed two more places that were missing the
> c_parser_maybe_reclassify_token call.
> 
> Bootstrapped/regtested on x86_64-linux, ok for trunk?

OK.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH] PR target/70155: Use SSE for TImode load/store

2016-04-26 Thread H.J. Lu
On Tue, Apr 26, 2016 at 7:15 AM, Ilya Enkovich  wrote:
> 2016-04-26 17:07 GMT+03:00 H.J. Lu :
>> On Mon, Apr 25, 2016 at 9:13 AM, Ilya Enkovich  
>> wrote:
>>> 2016-04-25 18:27 GMT+03:00 H.J. Lu :

 Ilya, can you take a look?

 Thanks.

 --
 H.J.
>>>
>>> Hi,
>>>
>>> Algorithmic part of the patch looks OK to me except the following piece of 
>>> code.
>>>
>>> +/* Check REF's chain to add new insns into a queue
>>> +   and find registers requiring conversion.  */
>>>
>>> Comment is wrong because you don't have any conversions required for
>>> your candidates.
>>
>> I will fix it.
>>
>>> +
>>> +void
>>> +scalar_chain_64::analyze_register_chain (bitmap candidates, df_ref ref)
>>> +{
>>> +  df_link *chain;
>>> +
>>> +  gcc_assert (bitmap_bit_p (insns, DF_REF_INSN_UID (ref))
>>> + || bitmap_bit_p (candidates, DF_REF_INSN_UID (ref)));
>>> +  add_to_queue (DF_REF_INSN_UID (ref));
>>> +
>>> +  for (chain = DF_REF_CHAIN (ref); chain; chain = chain->next)
>>> +{
>>> +  unsigned uid = DF_REF_INSN_UID (chain->ref);
>>> +
>>> +  if (!NONDEBUG_INSN_P (DF_REF_INSN (chain->ref)))
>>> +   continue;
>>> +
>>> +  if (!DF_REF_REG_MEM_P (chain->ref))
>>> +   continue;
>>>
>>> I believe here you wrongly jump to the next ref intead of actually adding it
>>> to a queue.  You may just use
>>>
>>> gcc_assert (!DF_REF_REG_MEM_P (chain->ref));
>>>
>>> because you should'n have a candidate used in address operand.
>>
>> I will update.
>>
>>> +
>>> +  if (bitmap_bit_p (insns, uid))
>>> +   continue;
>>> +
>>> +  if (bitmap_bit_p (candidates, uid))
>>> +   add_to_queue (uid);
>>>
>>> Probably gcc_assert (bitmap_bit_p (candidates, uid)) since no uses and defs
>>> out of candidates list are allowed?
>>
>> That would be wrong since there are
>>
>>  while (!bitmap_empty_p (queue))
>> {
>>   insn_uid = bitmap_first_set_bit (queue);
>>   bitmap_clear_bit (queue, insn_uid);
>>   bitmap_clear_bit (candidates, insn_uid);
>>   add_insn (candidates, insn_uid);
>> }
>>
>> An instruction is a candidate and the bit is cleared when
>> analyze_register_chain is called.
>
> You clear candidates bit but the first thing you do in add_insn is set
> insns bit.
> Thus you should hit:
>
>   if (bitmap_bit_p (insns, uid))
> continue;
>
> For handled candidates.
>
> Probably it would be more clear if we keep this clear/set pair
> together?  E.g. move
> bitmap_clear_bit (candidates, insn_uid) to scalar_chain::add_insn.
>

After we started processing candidates, we only use candidates
to check if an instruction is a candidate, not to check if an
instruction is NOT a candidate.

-- 
H.J.


[C PATCH] Better location for -Wnested-externs (PR c/70791)

2016-04-26 Thread Marek Polacek
This patch improves the location info for -Wnested-externs.

Bootstrapped/regtested on x86_64-linux, applying to trunk.

2016-04-26  Marek Polacek  

PR c/70791
* c-decl.c (pushdecl): Pass LOCUS down to warning.

* gcc.dg/Wnested-externs-2.c: New test.

diff --git gcc/c/c-decl.c gcc/c/c-decl.c
index f0c677b..16e4250 100644
--- gcc/c/c-decl.c
+++ gcc/c/c-decl.c
@@ -2949,7 +2949,8 @@ pushdecl (tree x)
}
   if (scope != file_scope
  && !DECL_IN_SYSTEM_HEADER (x))
-   warning (OPT_Wnested_externs, "nested extern declaration of %qD", x);
+   warning_at (locus, OPT_Wnested_externs,
+   "nested extern declaration of %qD", x);
 
   while (b && !B_IN_EXTERNAL_SCOPE (b))
{
diff --git gcc/testsuite/gcc.dg/Wnested-externs-2.c 
gcc/testsuite/gcc.dg/Wnested-externs-2.c
index e69de29..77bed5f 100644
--- gcc/testsuite/gcc.dg/Wnested-externs-2.c
+++ gcc/testsuite/gcc.dg/Wnested-externs-2.c
@@ -0,0 +1,11 @@
+/* PR c/70791 */
+/* { dg-do compile } */
+/* { dg-options "-Wnested-externs" } */
+
+void
+bar (void)
+{
+  extern int i; /* { dg-warning "14:nested extern declaration of 'i'" } */
+  extern short foo (void); /* { dg-warning "16:nested extern declaration of 
'foo'" } */
+  extern struct S *s; /* { dg-warning "20:nested extern declaration of 's'" } 
*/
+}

Marek


Re: [PATCH] Fix a recent warning in reorg.c

2016-04-26 Thread Jeff Law

On 04/26/2016 07:08 AM, Jakub Jelinek wrote:

Hi!

I've noticed a warning during bootstrap:
../../gcc/reorg.c: In function ‘void try_merge_delay_insns(rtx_insn*, 
rtx_insn*)’:
../../gcc/reorg.c:1431:12: warning: name lookup of ‘i’ changed
   for (i = 0; i < XVECLEN (PATTERN (insn), 0); i++)
^
../../gcc/reorg.c:1263:7: warning:   matches this ‘i’ under ISO standard rules
   int i, j;
   ^
../../gcc/reorg.c:1413:25: warning:   matches this ‘i’ under old rules
   for (unsigned int i = len - 1; i < len; i--)
 ^
It is not fatal, but still ugly.  The problem is that the function has
  int i;
...
  for (i = 0; ...)
...
  for (unsigned int i = ... )
...
  for (i = 0; ...)
This patch just declares the var in the only affected loop, so that the warning
is not emitted, unless we start checking -Wshadow warnings, I think this is
good enough.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2016-04-26  Jakub Jelinek  

* reorg.c (try_merge_delay_insns): Declare i inside the last
for loop to avoid warning.
I'd like to declare this in the "obviously OK" category for future 
changes of a similar nature.  But I don't think we can because in the 
general case the value from the loop may be used outside the loop.


Makes me wonder if a plugin could help identify the obvious cases where 
the value from the loop isn't used outside the loop, thus allowing 
someone to quickly convert many of these things with a small effort.


jeff



Re: [PATCH] PR target/70155: Use SSE for TImode load/store

2016-04-26 Thread Ilya Enkovich
2016-04-26 17:55 GMT+03:00 H.J. Lu :
> On Tue, Apr 26, 2016 at 7:15 AM, Ilya Enkovich  wrote:
>> 2016-04-26 17:07 GMT+03:00 H.J. Lu :
>>> On Mon, Apr 25, 2016 at 9:13 AM, Ilya Enkovich  
>>> wrote:
 2016-04-25 18:27 GMT+03:00 H.J. Lu :
>
> Ilya, can you take a look?
>
> Thanks.
>
> --
> H.J.

 Hi,

 Algorithmic part of the patch looks OK to me except the following piece of 
 code.

 +/* Check REF's chain to add new insns into a queue
 +   and find registers requiring conversion.  */

 Comment is wrong because you don't have any conversions required for
 your candidates.
>>>
>>> I will fix it.
>>>
 +
 +void
 +scalar_chain_64::analyze_register_chain (bitmap candidates, df_ref ref)
 +{
 +  df_link *chain;
 +
 +  gcc_assert (bitmap_bit_p (insns, DF_REF_INSN_UID (ref))
 + || bitmap_bit_p (candidates, DF_REF_INSN_UID (ref)));
 +  add_to_queue (DF_REF_INSN_UID (ref));
 +
 +  for (chain = DF_REF_CHAIN (ref); chain; chain = chain->next)
 +{
 +  unsigned uid = DF_REF_INSN_UID (chain->ref);
 +
 +  if (!NONDEBUG_INSN_P (DF_REF_INSN (chain->ref)))
 +   continue;
 +
 +  if (!DF_REF_REG_MEM_P (chain->ref))
 +   continue;

 I believe here you wrongly jump to the next ref intead of actually adding 
 it
 to a queue.  You may just use

 gcc_assert (!DF_REF_REG_MEM_P (chain->ref));

 because you should'n have a candidate used in address operand.
>>>
>>> I will update.
>>>
 +
 +  if (bitmap_bit_p (insns, uid))
 +   continue;
 +
 +  if (bitmap_bit_p (candidates, uid))
 +   add_to_queue (uid);

 Probably gcc_assert (bitmap_bit_p (candidates, uid)) since no uses and defs
 out of candidates list are allowed?
>>>
>>> That would be wrong since there are
>>>
>>>  while (!bitmap_empty_p (queue))
>>> {
>>>   insn_uid = bitmap_first_set_bit (queue);
>>>   bitmap_clear_bit (queue, insn_uid);
>>>   bitmap_clear_bit (candidates, insn_uid);
>>>   add_insn (candidates, insn_uid);
>>> }
>>>
>>> An instruction is a candidate and the bit is cleared when
>>> analyze_register_chain is called.
>>
>> You clear candidates bit but the first thing you do in add_insn is set
>> insns bit.
>> Thus you should hit:
>>
>>   if (bitmap_bit_p (insns, uid))
>> continue;
>>
>> For handled candidates.
>>
>> Probably it would be more clear if we keep this clear/set pair
>> together?  E.g. move
>> bitmap_clear_bit (candidates, insn_uid) to scalar_chain::add_insn.
>>
>
> After we started processing candidates, we only use candidates
> to check if an instruction is a candidate, not to check if an
> instruction is NOT a candidate.

I don't see how it's related to what I said.  My point is that
when you analyze added insn you shouldn't reach insns which are both
not in candidates and not in current scalar_chain_64.  That's why I
think you miss an assert in scalar_chain_64::analyze_register_chain.

Thanks,
Ilya

>
> --
> H.J.


Re: [PATCH] PR target/70155: Use SSE for TImode load/store

2016-04-26 Thread H.J. Lu
On Tue, Apr 26, 2016 at 8:05 AM, Ilya Enkovich  wrote:
> 2016-04-26 17:55 GMT+03:00 H.J. Lu :
>> On Tue, Apr 26, 2016 at 7:15 AM, Ilya Enkovich  
>> wrote:
>>> 2016-04-26 17:07 GMT+03:00 H.J. Lu :
 On Mon, Apr 25, 2016 at 9:13 AM, Ilya Enkovich  
 wrote:
> 2016-04-25 18:27 GMT+03:00 H.J. Lu :
>>
>> Ilya, can you take a look?
>>
>> Thanks.
>>
>> --
>> H.J.
>
> Hi,
>
> Algorithmic part of the patch looks OK to me except the following piece 
> of code.
>
> +/* Check REF's chain to add new insns into a queue
> +   and find registers requiring conversion.  */
>
> Comment is wrong because you don't have any conversions required for
> your candidates.

 I will fix it.

> +
> +void
> +scalar_chain_64::analyze_register_chain (bitmap candidates, df_ref ref)
> +{
> +  df_link *chain;
> +
> +  gcc_assert (bitmap_bit_p (insns, DF_REF_INSN_UID (ref))
> + || bitmap_bit_p (candidates, DF_REF_INSN_UID (ref)));
> +  add_to_queue (DF_REF_INSN_UID (ref));
> +
> +  for (chain = DF_REF_CHAIN (ref); chain; chain = chain->next)
> +{
> +  unsigned uid = DF_REF_INSN_UID (chain->ref);
> +
> +  if (!NONDEBUG_INSN_P (DF_REF_INSN (chain->ref)))
> +   continue;
> +
> +  if (!DF_REF_REG_MEM_P (chain->ref))
> +   continue;
>
> I believe here you wrongly jump to the next ref intead of actually adding 
> it
> to a queue.  You may just use
>
> gcc_assert (!DF_REF_REG_MEM_P (chain->ref));
>
> because you should'n have a candidate used in address operand.

 I will update.

> +
> +  if (bitmap_bit_p (insns, uid))
> +   continue;
> +
> +  if (bitmap_bit_p (candidates, uid))
> +   add_to_queue (uid);
>
> Probably gcc_assert (bitmap_bit_p (candidates, uid)) since no uses and 
> defs
> out of candidates list are allowed?

 That would be wrong since there are

  while (!bitmap_empty_p (queue))
 {
   insn_uid = bitmap_first_set_bit (queue);
   bitmap_clear_bit (queue, insn_uid);
   bitmap_clear_bit (candidates, insn_uid);
   add_insn (candidates, insn_uid);
 }

 An instruction is a candidate and the bit is cleared when
 analyze_register_chain is called.
>>>
>>> You clear candidates bit but the first thing you do in add_insn is set
>>> insns bit.
>>> Thus you should hit:
>>>
>>>   if (bitmap_bit_p (insns, uid))
>>> continue;
>>>
>>> For handled candidates.
>>>
>>> Probably it would be more clear if we keep this clear/set pair
>>> together?  E.g. move
>>> bitmap_clear_bit (candidates, insn_uid) to scalar_chain::add_insn.
>>>
>>
>> After we started processing candidates, we only use candidates
>> to check if an instruction is a candidate, not to check if an
>> instruction is NOT a candidate.
>
> I don't see how it's related to what I said.  My point is that
> when you analyze added insn you shouldn't reach insns which are both
> not in candidates and not in current scalar_chain_64.  That's why I
> think you miss an assert in scalar_chain_64::analyze_register_chain.

Since all candidates will be processed by

 while (!bitmap_empty_p (queue))
{
  insn_uid = bitmap_first_set_bit (queue);
  bitmap_clear_bit (queue, insn_uid);
  bitmap_clear_bit (candidates, insn_uid);
  add_insn (candidates, insn_uid);
}

I will change to

/* Check REF's chain to add new insns into a queue.  */

void
timode_scalar_chain::analyze_register_chain (bitmap candidates,
 df_ref ref)
{
gcc_assert (bitmap_bit_p (insns, DF_REF_INSN_UID (ref))
  || bitmap_bit_p (candidates, DF_REF_INSN_UID (ref)));
  add_to_queue (DF_REF_INSN_UID (ref));
}



-- 
H.J.


Re: [PATCH] [libatomic] Add RTEMS support

2016-04-26 Thread Jeff Law

On 04/19/2016 01:24 AM, Sebastian Huber wrote:

gcc/

* config/rtems.h (LIB_SPEC): Add -latomic.

libatomic/

* configure.tgt (*-*-rtems*): New supported target.
* config/rtems/host-config.h: New file.
* config/rtems/lock.c: Likewise.
IMHO, as the rtems port maintainer, this is all within the scope of what 
you can self-approve.



Jeff


Re: [PATCH] PR target/70155: Use SSE for TImode load/store

2016-04-26 Thread Jeff Law

On 04/26/2016 03:17 AM, Richard Biener wrote:

On Mon, 25 Apr 2016, Uros Bizjak wrote:


On Mon, Apr 25, 2016 at 4:47 PM, H.J. Lu  wrote:

On Mon, Apr 25, 2016 at 7:18 AM, Uros Bizjak  wrote:

On Mon, Apr 25, 2016 at 2:51 PM, H.J. Lu  wrote:

Tested on Linux/x86-64.  OK for trunk?



+  /* FIXME: Since the CSE pass may change dominance info, which isn't
+ expected by the fwprop pass, call free_dominance_info to
+ invalidate dominance info.  Otherwise, the fwprop pass may crash
+ when dominance info is changed.  */
+  if (TARGET_64BIT)
+free_dominance_info (CDI_DOMINATORS);
+


Please resolve the above problem first, target-dependent sources are
not the place to apply band-aids for middle-end problems. The thread
with the proposed fix died in [1].

[1] https://gcc.gnu.org/ml/gcc/2016-03/msg00143.html


free_dominance_info (CDI_DOMINATORS) has been called in other
places to avoid this middle-end issue.   I don't know when the middle-end
will be fixed.  I don't think this target optimization should be penalized by
the middle-end issue.


Let's ask Richard if he is OK with the workaround...


Well, it's ultimately your call (it's a workaround in the target).

Of course I'd like to see the underlying issue fixed and the
workarounds in "other places" be removed.

Agreed.  It's fundamentally papering over a problem elsewhere.

jeff



Re: [PATCH] PR target/70155: Use SSE for TImode load/store

2016-04-26 Thread Ilya Enkovich
2016-04-26 18:12 GMT+03:00 H.J. Lu :
> On Tue, Apr 26, 2016 at 8:05 AM, Ilya Enkovich  wrote:
>> 2016-04-26 17:55 GMT+03:00 H.J. Lu :
>>> On Tue, Apr 26, 2016 at 7:15 AM, Ilya Enkovich  
>>> wrote:
 2016-04-26 17:07 GMT+03:00 H.J. Lu :
> On Mon, Apr 25, 2016 at 9:13 AM, Ilya Enkovich  
> wrote:
>> 2016-04-25 18:27 GMT+03:00 H.J. Lu :
>>>
>>> Ilya, can you take a look?
>>>
>>> Thanks.
>>>
>>> --
>>> H.J.
>>
>> Hi,
>>
>> Algorithmic part of the patch looks OK to me except the following piece 
>> of code.
>>
>> +/* Check REF's chain to add new insns into a queue
>> +   and find registers requiring conversion.  */
>>
>> Comment is wrong because you don't have any conversions required for
>> your candidates.
>
> I will fix it.
>
>> +
>> +void
>> +scalar_chain_64::analyze_register_chain (bitmap candidates, df_ref ref)
>> +{
>> +  df_link *chain;
>> +
>> +  gcc_assert (bitmap_bit_p (insns, DF_REF_INSN_UID (ref))
>> + || bitmap_bit_p (candidates, DF_REF_INSN_UID (ref)));
>> +  add_to_queue (DF_REF_INSN_UID (ref));
>> +
>> +  for (chain = DF_REF_CHAIN (ref); chain; chain = chain->next)
>> +{
>> +  unsigned uid = DF_REF_INSN_UID (chain->ref);
>> +
>> +  if (!NONDEBUG_INSN_P (DF_REF_INSN (chain->ref)))
>> +   continue;
>> +
>> +  if (!DF_REF_REG_MEM_P (chain->ref))
>> +   continue;
>>
>> I believe here you wrongly jump to the next ref intead of actually 
>> adding it
>> to a queue.  You may just use
>>
>> gcc_assert (!DF_REF_REG_MEM_P (chain->ref));
>>
>> because you should'n have a candidate used in address operand.
>
> I will update.
>
>> +
>> +  if (bitmap_bit_p (insns, uid))
>> +   continue;
>> +
>> +  if (bitmap_bit_p (candidates, uid))
>> +   add_to_queue (uid);
>>
>> Probably gcc_assert (bitmap_bit_p (candidates, uid)) since no uses and 
>> defs
>> out of candidates list are allowed?
>
> That would be wrong since there are
>
>  while (!bitmap_empty_p (queue))
> {
>   insn_uid = bitmap_first_set_bit (queue);
>   bitmap_clear_bit (queue, insn_uid);
>   bitmap_clear_bit (candidates, insn_uid);
>   add_insn (candidates, insn_uid);
> }
>
> An instruction is a candidate and the bit is cleared when
> analyze_register_chain is called.

 You clear candidates bit but the first thing you do in add_insn is set
 insns bit.
 Thus you should hit:

   if (bitmap_bit_p (insns, uid))
 continue;

 For handled candidates.

 Probably it would be more clear if we keep this clear/set pair
 together?  E.g. move
 bitmap_clear_bit (candidates, insn_uid) to scalar_chain::add_insn.

>>>
>>> After we started processing candidates, we only use candidates
>>> to check if an instruction is a candidate, not to check if an
>>> instruction is NOT a candidate.
>>
>> I don't see how it's related to what I said.  My point is that
>> when you analyze added insn you shouldn't reach insns which are both
>> not in candidates and not in current scalar_chain_64.  That's why I
>> think you miss an assert in scalar_chain_64::analyze_register_chain.
>
> Since all candidates will be processed by
>
>  while (!bitmap_empty_p (queue))
> {
>   insn_uid = bitmap_first_set_bit (queue);
>   bitmap_clear_bit (queue, insn_uid);
>   bitmap_clear_bit (candidates, insn_uid);
>   add_insn (candidates, insn_uid);
> }

This process only queue, not all candidates.  analyze_register_chain
fills queue bitmap to build a chain.

>
> I will change to
>
> /* Check REF's chain to add new insns into a queue.  */
>
> void
> timode_scalar_chain::analyze_register_chain (bitmap candidates,
>  df_ref ref)
> {
> gcc_assert (bitmap_bit_p (insns, DF_REF_INSN_UID (ref))
>   || bitmap_bit_p (candidates, DF_REF_INSN_UID (ref)));
>   add_to_queue (DF_REF_INSN_UID (ref));
> }
>

The purpose of analyze_register_chain is to collect all register defs
or uses into chain's queue.  You can't just remove a loop over ref's
chain then.

Thanks,
Ilya

>
>
> --
> H.J.


Re: [PATCH] PR target/70155: Use SSE for TImode load/store

2016-04-26 Thread H.J. Lu
On Tue, Apr 26, 2016 at 2:35 AM, Richard Biener  wrote:
> On Tue, 26 Apr 2016, Uros Bizjak wrote:
>
>> On Tue, Apr 26, 2016 at 11:17 AM, Richard Biener  wrote:
>> > On Mon, 25 Apr 2016, Uros Bizjak wrote:
>> >
>> >> On Mon, Apr 25, 2016 at 4:47 PM, H.J. Lu  wrote:
>> >> > On Mon, Apr 25, 2016 at 7:18 AM, Uros Bizjak  wrote:
>> >> >> On Mon, Apr 25, 2016 at 2:51 PM, H.J. Lu  wrote:
>> >> >>> Tested on Linux/x86-64.  OK for trunk?
>> >> >>
>> >> >>> +  /* FIXME: Since the CSE pass may change dominance info, which isn't
>> >> >>> + expected by the fwprop pass, call free_dominance_info to
>> >> >>> + invalidate dominance info.  Otherwise, the fwprop pass may crash
>> >> >>> + when dominance info is changed.  */
>> >> >>> +  if (TARGET_64BIT)
>> >> >>> +free_dominance_info (CDI_DOMINATORS);
>> >> >>> +
>> >> >>
>> >> >> Please resolve the above problem first, target-dependent sources are
>> >> >> not the place to apply band-aids for middle-end problems. The thread
>> >> >> with the proposed fix died in [1].
>> >> >>
>> >> >> [1] https://gcc.gnu.org/ml/gcc/2016-03/msg00143.html
>> >> >
>> >> > free_dominance_info (CDI_DOMINATORS) has been called in other
>> >> > places to avoid this middle-end issue.   I don't know when the 
>> >> > middle-end
>> >> > will be fixed.  I don't think this target optimization should be 
>> >> > penalized by
>> >> > the middle-end issue.
>> >>
>> >> Let's ask Richard if he is OK with the workaround...
>> >
>> > Well, it's ultimately your call (it's a workaround in the target).
>>
>> Oh well, ...
>>
>> > Of course I'd like to see the underlying issue fixed and the
>> > workarounds in "other places" be removed.
>>
>> ... then at least a reference to a relevant PR should be added to a
>> FIXME comment.
>
> HJ, can you please open a bug with 1) a testcase, 2) a patch to revert
> the workaround so it shows the ICE and 3) a pointer to the ml thread
> with your preliminary analysis?

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70807

-- 
H.J.


Re: Patches to fix GCC’s C++ exception handling on NetBSD/VAX

2016-04-26 Thread Jeff Law

On 03/26/2016 06:02 AM, Jake Hamby wrote:

As an added bonus, I see that my patch set also included an old m68k
patch that had been sitting in my tree, which fixes a crash when
-m68040 is defined. I may have submitted it to port-m68k before. It
hasn’t been tested with the new compiler either. Here’s that patch
separately. It only matter when TARGET_68881 && TUNE_68040.


Do you have a testcase for this failure?


[middle-end][PATCH] Update alignment_for_piecewise_move

2016-04-26 Thread H.J. Lu
I am working a patch to enable SSE, AVX and AVX512 for memcpy/memset
optimization.  x86 backend defines MAX_BITSIZE_MODE_ANY_INT to 128
to keep the OI and XI modes from confusing the compiler into thinking
that these modes could actually be used for computation.  But the OI
and XI modes can be used for data movement with vector instructions.

alignment_for_piecewise_move is called only with MOVE_MAX_PIECES or
STORE_MAX_PIECES, which are the number of bytes at a time that we
can move or store efficiently.  We should call mode_for_size without
limit to MAX_FIXED_MODE_SIZE, which is an integer expression for the
size in bits of the largest integer machine mode that should actually
be used, may be smaller than MOVE_MAX_PIECES or STORE_MAX_PIECES, which
may use vector.

Tested on Linux/x86-64.  OK for trunk.


H.J.
---
* expr.c (alignment_for_piecewise_move): Call mode_for_size
without limit to MAX_FIXED_MODE_SIZE.
---
 gcc/expr.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/expr.c b/gcc/expr.c
index 248d3d7..36070f0 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -733,7 +733,7 @@ alignment_for_piecewise_move (unsigned int max_pieces, 
unsigned int align)
 {
   machine_mode tmode;
 
-  tmode = mode_for_size (max_pieces * BITS_PER_UNIT, MODE_INT, 1);
+  tmode = mode_for_size (max_pieces * BITS_PER_UNIT, MODE_INT, 0);
   if (align >= GET_MODE_ALIGNMENT (tmode))
 align = GET_MODE_ALIGNMENT (tmode);
   else
-- 
2.5.5



Re: [PATCH] PR target/70155: Use SSE for TImode load/store

2016-04-26 Thread H.J. Lu
On Tue, Apr 26, 2016 at 8:21 AM, Ilya Enkovich  wrote:
> 2016-04-26 18:12 GMT+03:00 H.J. Lu :
>> On Tue, Apr 26, 2016 at 8:05 AM, Ilya Enkovich  
>> wrote:
>>> 2016-04-26 17:55 GMT+03:00 H.J. Lu :
 On Tue, Apr 26, 2016 at 7:15 AM, Ilya Enkovich  
 wrote:
> 2016-04-26 17:07 GMT+03:00 H.J. Lu :
>> On Mon, Apr 25, 2016 at 9:13 AM, Ilya Enkovich  
>> wrote:
>>> 2016-04-25 18:27 GMT+03:00 H.J. Lu :

 Ilya, can you take a look?

 Thanks.

 --
 H.J.
>>>
>>> Hi,
>>>
>>> Algorithmic part of the patch looks OK to me except the following piece 
>>> of code.
>>>
>>> +/* Check REF's chain to add new insns into a queue
>>> +   and find registers requiring conversion.  */
>>>
>>> Comment is wrong because you don't have any conversions required for
>>> your candidates.
>>
>> I will fix it.
>>
>>> +
>>> +void
>>> +scalar_chain_64::analyze_register_chain (bitmap candidates, df_ref ref)
>>> +{
>>> +  df_link *chain;
>>> +
>>> +  gcc_assert (bitmap_bit_p (insns, DF_REF_INSN_UID (ref))
>>> + || bitmap_bit_p (candidates, DF_REF_INSN_UID (ref)));
>>> +  add_to_queue (DF_REF_INSN_UID (ref));
>>> +
>>> +  for (chain = DF_REF_CHAIN (ref); chain; chain = chain->next)
>>> +{
>>> +  unsigned uid = DF_REF_INSN_UID (chain->ref);
>>> +
>>> +  if (!NONDEBUG_INSN_P (DF_REF_INSN (chain->ref)))
>>> +   continue;
>>> +
>>> +  if (!DF_REF_REG_MEM_P (chain->ref))
>>> +   continue;
>>>
>>> I believe here you wrongly jump to the next ref intead of actually 
>>> adding it
>>> to a queue.  You may just use
>>>
>>> gcc_assert (!DF_REF_REG_MEM_P (chain->ref));
>>>
>>> because you should'n have a candidate used in address operand.
>>
>> I will update.
>>
>>> +
>>> +  if (bitmap_bit_p (insns, uid))
>>> +   continue;
>>> +
>>> +  if (bitmap_bit_p (candidates, uid))
>>> +   add_to_queue (uid);
>>>
>>> Probably gcc_assert (bitmap_bit_p (candidates, uid)) since no uses and 
>>> defs
>>> out of candidates list are allowed?
>>
>> That would be wrong since there are
>>
>>  while (!bitmap_empty_p (queue))
>> {
>>   insn_uid = bitmap_first_set_bit (queue);
>>   bitmap_clear_bit (queue, insn_uid);
>>   bitmap_clear_bit (candidates, insn_uid);
>>   add_insn (candidates, insn_uid);
>> }
>>
>> An instruction is a candidate and the bit is cleared when
>> analyze_register_chain is called.
>
> You clear candidates bit but the first thing you do in add_insn is set
> insns bit.
> Thus you should hit:
>
>   if (bitmap_bit_p (insns, uid))
> continue;
>
> For handled candidates.
>
> Probably it would be more clear if we keep this clear/set pair
> together?  E.g. move
> bitmap_clear_bit (candidates, insn_uid) to scalar_chain::add_insn.
>

 After we started processing candidates, we only use candidates
 to check if an instruction is a candidate, not to check if an
 instruction is NOT a candidate.
>>>
>>> I don't see how it's related to what I said.  My point is that
>>> when you analyze added insn you shouldn't reach insns which are both
>>> not in candidates and not in current scalar_chain_64.  That's why I
>>> think you miss an assert in scalar_chain_64::analyze_register_chain.
>>
>> Since all candidates will be processed by
>>
>>  while (!bitmap_empty_p (queue))
>> {
>>   insn_uid = bitmap_first_set_bit (queue);
>>   bitmap_clear_bit (queue, insn_uid);
>>   bitmap_clear_bit (candidates, insn_uid);
>>   add_insn (candidates, insn_uid);
>> }
>
> This process only queue, not all candidates.  analyze_register_chain
> fills queue bitmap to build a chain.
>
>>
>> I will change to
>>
>> /* Check REF's chain to add new insns into a queue.  */
>>
>> void
>> timode_scalar_chain::analyze_register_chain (bitmap candidates,
>>  df_ref ref)
>> {
>> gcc_assert (bitmap_bit_p (insns, DF_REF_INSN_UID (ref))
>>   || bitmap_bit_p (candidates, DF_REF_INSN_UID (ref)));
>>   add_to_queue (DF_REF_INSN_UID (ref));
>> }
>>
>
> The purpose of analyze_register_chain is to collect all register defs
> or uses into chain's queue.  You can't just remove a loop over ref's
> chain then.

That is true for DImode conversion.  For TImode conversion,
there are no register conversions required:

https://gcc.gnu.org/ml/gcc-patches/2016-04/msg01394.html

All it does it to call add_to_queue.

Here is the updated patch.  OK for trunk if there is no regression
on x86-64?

-- 
H.J.
From 3d0d350aa482d6fd22a1de6d0423825deb379c0b Mon Sep 17 00:00:00 2001
From: "H.J. Lu" 
Date: Mon, 7 Mar 2016 14:44:37 -0800
Subject: [PATCH] Extend STV pas

Re: [PATCH] Fix a recent warning in reorg.c

2016-04-26 Thread Jakub Jelinek
On Tue, Apr 26, 2016 at 03:13:32PM +0200, Bernd Schmidt wrote:
> On 04/26/2016 03:08 PM, Jakub Jelinek wrote:
> >^
> >../../gcc/reorg.c:1413:25: warning:   matches this ‘i’ under old rules
> >for (unsigned int i = len - 1; i < len; i--)
> >  ^
> 
> Oh, and also - I flagged this while reviewing other parts of Trevor's
> changes, this pattern is too ugly to live, that should be a
> FOR_EACH_VEC_ELT_

That would be FOR_EACH_VEC_ELT_REVERSE, but unfortunately that doesn't
really work.

The iterate template does:

template
inline bool
vec::iterate (unsigned ix, T *ptr) const
{
  if (ix < m_vecpfx.m_num)
{
  *ptr = m_vecdata[ix];
  return true;
}
  else
{
  *ptr = 0;
  return false;
}
}

and the element in this case is std::pair , for which
*ptr = 0; doesn't work (maybe *ptr = T (); would work in there instead,
dunno).

So is the following ok if it passes bootstrap/regtest, or shall I do
something about vec.h?

2016-04-26  Jakub Jelinek  

* reorg.c (try_merge_delay_insns): Declare i and j inside the
for loops rather than one for the whole function.

--- gcc/reorg.c.jj  2016-04-26 08:08:16.0 +0200
+++ gcc/reorg.c 2016-04-26 17:32:22.553670734 +0200
@@ -1260,7 +1260,6 @@ try_merge_delay_insns (rtx_insn *insn, r
   rtx next_to_match = XVECEXP (PATTERN (insn), 0, slot_number);
   struct resources set, needed, modified;
   auto_vec, 10> merged_insns;
-  int i, j;
   int flags;
 
   flags = get_jump_flags (delay_insn, JUMP_LABEL (delay_insn));
@@ -1275,7 +1274,7 @@ try_merge_delay_insns (rtx_insn *insn, r
  will essentially disable this optimization.  This method is somewhat of
  a kludge, but I don't see a better way.)  */
   if (! annul_p)
-for (i = 1 ; i < num_slots; i++)
+for (int i = 1; i < num_slots; i++)
   if (XVECEXP (PATTERN (insn), 0, i))
mark_referenced_resources (XVECEXP (PATTERN (insn), 0, i), &needed,
   true);
@@ -1346,19 +1345,19 @@ try_merge_delay_insns (rtx_insn *insn, r
   mark_set_resources (filled_insn, &set, 0, MARK_SRC_DEST_CALL);
   mark_referenced_resources (filled_insn, &needed, true);
 
-  for (i = 1; i < pat->len (); i++)
+  for (int i = 1; i < pat->len (); i++)
{
  rtx_insn *dtrial = pat->insn (i);
 
  CLEAR_RESOURCE (&modified);
  /* Account for resources set by the insn following NEXT_TO_MATCH
 inside INSN's delay list. */
- for (j = 1; slot_number + j < num_slots; j++)
+ for (int j = 1; slot_number + j < num_slots; j++)
mark_set_resources (XVECEXP (PATTERN (insn), 0, slot_number + j),
&modified, 0, MARK_SRC_DEST_CALL);
  /* Account for resources set by the insn before DTRIAL and inside
 TRIAL's delay list. */
- for (j = 1; j < i; j++)
+ for (int j = 1; j < i; j++)
mark_set_resources (XVECEXP (pat, 0, j),
&modified, 0, MARK_SRC_DEST_CALL); 
  if (! insn_references_resource_p (dtrial, &set, true)
@@ -1411,24 +1410,22 @@ try_merge_delay_insns (rtx_insn *insn, r
 {
   unsigned int len = merged_insns.length ();
   for (unsigned int i = len - 1; i < len; i--)
-   {
- if (merged_insns[i].second)
-   {
- update_block (merged_insns[i].first, thread);
- rtx_insn *new_rtx = delete_from_delay_slot 
(merged_insns[i].first);
- if (thread->deleted ())
-   thread = new_rtx;
-   }
- else
-   {
- update_block (merged_insns[i].first, thread);
- delete_related_insns (merged_insns[i].first);
-   }
-   }
+   if (merged_insns[i].second)
+ {
+   update_block (merged_insns[i].first, thread);
+   rtx_insn *new_rtx = delete_from_delay_slot (merged_insns[i].first);
+   if (thread->deleted ())
+ thread = new_rtx;
+ }
+   else
+ {
+   update_block (merged_insns[i].first, thread);
+   delete_related_insns (merged_insns[i].first);
+ }
 
   INSN_ANNULLED_BRANCH_P (delay_insn) = 0;
 
-  for (i = 0; i < XVECLEN (PATTERN (insn), 0); i++)
+  for (int i = 0; i < XVECLEN (PATTERN (insn), 0); i++)
INSN_FROM_TARGET_P (XVECEXP (PATTERN (insn), 0, i)) = 0;
 }
 }


Jakub


Re: Bad CC0 optimizer in VAX backend (was Re: Patches to fix GCC’s C++ exception handling on NetBSD/VAX)

2016-04-26 Thread Jeff Law

On 03/28/2016 04:29 PM, Jake Hamby wrote:

I have some bad news and some good news. The bad news is that there
has been a nasty optimizer bug lurking in the VAX backend for GCC for
many years, which has to do with over-optimistic removal of necessary
tst/cmp instructions under certain circumstances. This manifests at
-O or higher and the symptoms are strange behavior like stack
overflows because of branches going the wrong way.
So let's get that addressed.  Matt would be the best person to review 
this change as he's the Vax maintainer.  But if it's simple and 
straightforward others can handle.


It will make Matt's job easier if you can create a self-contained 
testcases which show these problems.  Ideally it'll exit (0) when the 
test passes and abort() when the test fails.  Matt (or someone else) can 
use that to help verify exactly what is happening *and* the test can be 
added to the regression testsuite.


There are hints for testcase reduction on the gcc website.



The good news is that my suspicions about the CC0 handler appear to
be correct, and better yet, I'm currently testing a patch that isn't
too big and I believe will fix this bug. The bad behavior is in
vax_notice_update_cc (), which is supposed to call CC_STATUS_INIT if
the CC flags have been reset, or set cc_status.value1 and possibly
cc_status.value2 with the destination of the current (set...)
instruction, whose signed comparison to 0 will be stored in the N and
Z flags as a side effect (most VAX instructions do this).

Right.  Pretty standard stuff for a cc0 target.




The first bug is that most, but not all of the define_insn patterns
in vax.md actually do this. The paths that emit movz* instructions
will not set N and Z, and there are some code paths that can't be
trusted because they handle a 64-bit integer in two 32-bit pieces,
for example, so N and Z won't reflect the desired result (comparison
of operand 0 to 0).
Understood.  So it sounds like there's two bugs, which implies we'd 
really like two independent tests.


One is the movz instructions don't set the cc0 codes, but 
notice_update_cc isn't aware of that or doesn't handle it properly.


Second is the handling of notice_update_cc for double-word instructions. 
 Many ports use CC_STATUS_INIT for those as tracking the status bits 
can be painful.




The current version of vax_notice_update_cc has a specific check for
this: it won't set the cc_status cache if GET_CODE (SET_DEST (exp))
!= ZERO_EXTRACT. The problem is that this is checking the RTL
expression for (zero_extract...) and not whether what was actually
emitted is a movz* or not. That's why all of the define_insn()'s have
to be annotated with their actual behavior vis-a-vis compare to 0,
and then the change to vax.c to read the CC attribute makes it really
much easier to get the correct behavior.
Right.  That's similar to how other cc0 ports work -- they have 
attributes which clearly indicate the cc0 status for each alternative. 
Then you just need to add the attribute to each insn and check it in 
notice_update_cc.  See the v850 port as a fairly simple example of how 
this can be handled.





I need to do a lot of testing today to see if this really does help
fix GCC's VAX backend, but I'm hopeful that this is at least a real
bug that needs to be fixed, and that I have a fix for it that should
work. The other good news is that you only need three cc enums:
"none" (doesn't alter Z or N flags), "compare" (Z and N reflect
comparison of operand 0 to literal 0), and "clobber" (call
CC_STATUS_INIT to reset cc_status cache).
It's not uncommon to need an attribute that says the cc0 bits aren't 
modified, but operand0 is modified.  In fact, that may be what you want 
for the movz instructions on the vax from what I've read above.


As far as the patch itself, I'd drop all the grubbing around in RTL, 
except for the case where the cc0 bits are set from a comparison. 
Essentially you let the insn's attributes entirely describe the cc0 
status.   Again, see v850.c::notice_update_cc and the attributes in v850.md.


Jeff


Re: [PATCH] Fix a recent warning in reorg.c

2016-04-26 Thread Bernd Schmidt

On 04/26/2016 05:41 PM, Jakub Jelinek wrote:

On Tue, Apr 26, 2016 at 03:13:32PM +0200, Bernd Schmidt wrote:

On 04/26/2016 03:08 PM, Jakub Jelinek wrote:

^
../../gcc/reorg.c:1413:25: warning:   matches this ‘i’ under old rules
for (unsigned int i = len - 1; i < len; i--)
  ^


Oh, and also - I flagged this while reviewing other parts of Trevor's
changes, this pattern is too ugly to live, that should be a
FOR_EACH_VEC_ELT_


That would be FOR_EACH_VEC_ELT_REVERSE, but unfortunately that doesn't
really work.

The iterate template does:

template
inline bool
vec::iterate (unsigned ix, T *ptr) const
{
   if (ix < m_vecpfx.m_num)
 {
   *ptr = m_vecdata[ix];
   return true;
 }
   else
 {
   *ptr = 0;
   return false;
 }
}

and the element in this case is std::pair , for which
*ptr = 0; doesn't work (maybe *ptr = T (); would work in there instead,
dunno).

So is the following ok if it passes bootstrap/regtest, or shall I do
something about vec.h?


Your patch is fine, and I think we should also work on vec.h separately.


Bernd


Re: C++ PATCH to implement C++17 maybe_unused attribute

2016-04-26 Thread Jason Merrill
Fixed.

Jason


On Tue, Apr 26, 2016 at 5:46 AM, Andreas Schwab  wrote:
> /usr/local/gcc/gcc-20160426/gcc/testsuite/c-c++-common/cpp/pr63831-1.c:5:14: 
> error: size of array 'T1' is negative
> /usr/local/gcc/gcc-20160426/gcc/testsuite/c-c++-common/cpp/pr63831-1.c:13:14: 
> error: size of array 'T5' is negative
> /usr/local/gcc/gcc-20160426/gcc/testsuite/c-c++-common/cpp/pr63831-1.c:17:1: 
> error: unknown type name 'T4'
> /usr/local/gcc/gcc-20160426/gcc/testsuite/c-c++-common/cpp/pr63831-1.c:35:14: 
> error: size of array 'T11' is negative
> /usr/local/gcc/gcc-20160426/gcc/testsuite/c-c++-common/cpp/pr63831-1.c:43:14: 
> error: size of array 'T15' is negative
> /usr/local/gcc/gcc-20160426/gcc/testsuite/c-c++-common/cpp/pr63831-1.c:47:1: 
> error: unknown type name 'T14'
>
> Andreas.
>
> --
> Andreas Schwab, sch...@linux-m68k.org
> GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
> "And now for something completely different."


Re: match.pd: unsigned A - B > A --> A < B

2016-04-26 Thread Marc Glisse

On Tue, 26 Apr 2016, Richard Biener wrote:


On Sun, Apr 24, 2016 at 7:42 PM, Marc Glisse  wrote:

Hello,

the first part is something that was discussed last stage3, and Jakub argued
in favor of single_use. The second part is probably less useful, it notices
that if we manually check for overflow using the result of IFN_*_OVERFLOW,
then we might as well read that information from the result of that
function.

Bootstrap+regtest on powerpc64le-unknown-linux-gnu. (hmm, I probably should
have done it on x86_64 instead, I don't know if the ppc backend has
implemented the overflow functions recently)


Ok.  Can you please place the match.pd rules adjacent to the other comparison
simplifications rather than at the end?


(I did that for the other patch as well)
I just realized that the *_OVERFLOW internal functions do not work the way 
I expected, the arguments and result can be any combination of unrelated 
types, which makes it unlikely that the transforms are safe as they are 
(could be, but that would be a lot of luck).


The patterns have a comparison between @0 and realpart, so at least those 
2 types are (essentially) the same. To be safe, I would add a condition 
that @0 and @1 have (essentially) the same type. Elsewhere we have a 
different condition for generic/gimple, but for such transforms it doesn't 
seem important to do them on generic.


Here is the new patch, with useless_type_conversion_p added. Is that ok?
(I could also drop that part of the patch and commit only the part that 
does not involve builtins)


I'll do another regtest to be sure.


2016-04-25  Marc Glisse  

gcc/
* match.pd (A - B > A, A + B < A): New transformations.

gcc/testsuite/
* gcc.dg/tree-ssa/overflow-2.c: New testcase.
* gcc.dg/tree-ssa/minus-ovf.c: Likewise.


--
Marc GlisseIndex: trunk-ovf2/gcc/match.pd
===
--- trunk-ovf2/gcc/match.pd (revision 235448)
+++ trunk-ovf2/gcc/match.pd (working copy)
@@ -2501,20 +2501,75 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
  out (gt gt le le)
  (simplify
   (cmp @0 (plus@2 @0 INTEGER_CST@1))
   (if (TYPE_UNSIGNED (TREE_TYPE (@0))
&& TYPE_OVERFLOW_WRAPS (TREE_TYPE (@0))
&& wi::ne_p (@1, 0)
&& single_use (@2))
(out @0 { wide_int_to_tree (TREE_TYPE (@0), wi::max_value
   (TYPE_PRECISION (TREE_TYPE (@0)), UNSIGNED) - @1); }
 
+/* To detect overflow in unsigned A - B, A < B is simpler than A - B > A.
+   However, the detection logic for SUB_OVERFLOW in tree-ssa-math-opts.c
+   expects the long form, so we restrict the transformation for now.  */
+(for cmp (gt le)
+ (simplify
+  (cmp (minus@2 @0 @1) @0)
+  (if (single_use (@2)
+   && ANY_INTEGRAL_TYPE_P (TREE_TYPE (@0))
+   && TYPE_UNSIGNED (TREE_TYPE (@0))
+   && TYPE_OVERFLOW_WRAPS (TREE_TYPE (@0)))
+   (cmp @1 @0
+(for cmp (lt ge)
+ (simplify
+  (cmp @0 (minus@2 @0 @1))
+  (if (single_use (@2)
+   && ANY_INTEGRAL_TYPE_P (TREE_TYPE (@0))
+   && TYPE_UNSIGNED (TREE_TYPE (@0))
+   && TYPE_OVERFLOW_WRAPS (TREE_TYPE (@0)))
+   (cmp @0 @1
+
+/* Testing for overflow is unnecessary if we already know the result.  */
+(if (GIMPLE)
+ /* A < A - B  */
+ (for cmp (lt ge)
+  out (ne eq)
+  (simplify
+   (cmp @0 (realpart (IFN_SUB_OVERFLOW@2 @0 @1)))
+   (if (TYPE_UNSIGNED (TREE_TYPE (@0))
+   && useless_type_conversion_p (TREE_TYPE (@0), TREE_TYPE (@1)))
+(out (imagpart @2) { build_zero_cst (TREE_TYPE (@0)); }
+ /* A - B > A  */
+ (for cmp (gt le)
+  out (ne eq)
+  (simplify
+   (cmp (realpart (IFN_SUB_OVERFLOW@2 @0 @1)) @0)
+   (if (TYPE_UNSIGNED (TREE_TYPE (@0))
+   && useless_type_conversion_p (TREE_TYPE (@0), TREE_TYPE (@1)))
+(out (imagpart @2) { build_zero_cst (TREE_TYPE (@0)); }
+ /* A + B < A  */
+ (for cmp (lt ge)
+  out (ne eq)
+  (simplify
+   (cmp (realpart (IFN_ADD_OVERFLOW:c@2 @0 @1)) @0)
+   (if (TYPE_UNSIGNED (TREE_TYPE (@0))
+   && useless_type_conversion_p (TREE_TYPE (@0), TREE_TYPE (@1)))
+(out (imagpart @2) { build_zero_cst (TREE_TYPE (@0)); }
+ /* A > A + B  */
+ (for cmp (gt le)
+  out (ne eq)
+  (simplify
+   (cmp @0 (realpart (IFN_ADD_OVERFLOW:c@2 @0 @1)))
+   (if (TYPE_UNSIGNED (TREE_TYPE (@0))
+   && useless_type_conversion_p (TREE_TYPE (@0), TREE_TYPE (@1)))
+(out (imagpart @2) { build_zero_cst (TREE_TYPE (@0)); })
+
 
 /* Simplification of math builtins.  These rules must all be optimizations
as well as IL simplifications.  If there is a possibility that the new
form could be a pessimization, the rule should go in the canonicalization
section that follows this one.
 
Rules can generally go in this section if they satisfy one of
the following:
 
- the rule describes an identity
Index: trunk-ovf2/gcc/testsuite/gcc.dg/tree-ssa/minus-ovf.c
===
--- trunk-ovf2/gcc/testsuite/gcc.dg/tree-ssa/minus-ovf.c(

Re: Patches to fix GCC's C++ exception_handling on NetBSD/VAX

2016-04-26 Thread Jeff Law

On 03/27/2016 04:09 PM, Jake Hamby wrote:

Thank you for the offer. I already tested it on an Amiga 3000 w/
68040 card when I made the fix. The bug manifested as the
cross-compiler crashing with a failure to find a suitable insn,
because it couldn’t find the correct FP instruction to expand to. I
believe it manifested when running ./build.sh release with “-m68040”
set in CPUFLAGS. I will test it myself and see if it’s still an issue
without the patch. If you look at the .md file, there’s an entirely
different code path to generate the same instructions when
"TARGET_68881 && TUNE_68040" aren't defined.

At the time I made the change, I had already been testing the code on
an Amiga 3000 w/ 68040 card, so I know that the generated code is
likely correct (also, the assembler accepted it). So I’m assuming
that it’s a fairly safe thing. It was the difference between the
build succeeding or failing, and not an issue with the generated
code.

So the only thing I can suggest is that you can try a build with the
patch and make sure it's stable. I was never able to produce a build
without it, because "TARGET_68881 && TUNE_68040" was excluding the
other choices when building, I believe, libm or libc or the kernel or
something like that. I do have a test case for C++ exceptions on VAX,
which I will send separately.
We'd really like to have a test for this -- a blind change without 
analysis is strongly discouraged.  Analysis is virtually impossible 
without a testcase.


My suggestion would be to remove your hack, rebuild (and yes, I know 
it'll take a *long* time) & capture the failure.  We'll want  the .i/.ii 
preprocessed code as well as the options used to trigger the failure. 
Given those things it ought to be relatively easy for someone to analyze 
the problem, determine if your patch or something else is the best 
solution, and reduce the test for use within the testsuite.


jeff




Re: C, C++: New warning for memset without multiply by elt size

2016-04-26 Thread Jason Merrill
On Tue, Apr 26, 2016 at 9:04 AM, Bernd Schmidt  wrote:
> On 04/25/2016 07:55 PM, Jason Merrill wrote:
>>
>> On 04/25/2016 05:07 AM, Bernd Schmidt wrote:
>>>
>>> +if (TREE_CODE (arg2) == CONST_DECL)
>>> +  arg2 = DECL_INITIAL (arg2);
>>> +int literal_mask = ((!!integer_zerop (arg1) << 1)
>>> +| (!!integer_zerop (arg2) << 2));
>>
>>
>> Are you deliberately treating an enumerator as a literal 0?  I'd think
>> we should set literal_mask before stripping CONST_DECL.  OK with that
>> change.
>
>
> Just to be sure, is that OK for everything or just the C++ parts?

Everything.

Jason


Re: [PATCH] PR target/70155: Use SSE for TImode load/store

2016-04-26 Thread Ilya Enkovich
2016-04-26 18:39 GMT+03:00 H.J. Lu :
> On Tue, Apr 26, 2016 at 8:21 AM, Ilya Enkovich  wrote:
>> 2016-04-26 18:12 GMT+03:00 H.J. Lu :
>>> On Tue, Apr 26, 2016 at 8:05 AM, Ilya Enkovich  
>>> wrote:
 2016-04-26 17:55 GMT+03:00 H.J. Lu :
> On Tue, Apr 26, 2016 at 7:15 AM, Ilya Enkovich  
> wrote:
>> 2016-04-26 17:07 GMT+03:00 H.J. Lu :
>>> On Mon, Apr 25, 2016 at 9:13 AM, Ilya Enkovich  
>>> wrote:
 2016-04-25 18:27 GMT+03:00 H.J. Lu :
>
> Ilya, can you take a look?
>
> Thanks.
>
> --
> H.J.

 Hi,

 Algorithmic part of the patch looks OK to me except the following 
 piece of code.

 +/* Check REF's chain to add new insns into a queue
 +   and find registers requiring conversion.  */

 Comment is wrong because you don't have any conversions required for
 your candidates.
>>>
>>> I will fix it.
>>>
 +
 +void
 +scalar_chain_64::analyze_register_chain (bitmap candidates, df_ref 
 ref)
 +{
 +  df_link *chain;
 +
 +  gcc_assert (bitmap_bit_p (insns, DF_REF_INSN_UID (ref))
 + || bitmap_bit_p (candidates, DF_REF_INSN_UID (ref)));
 +  add_to_queue (DF_REF_INSN_UID (ref));
 +
 +  for (chain = DF_REF_CHAIN (ref); chain; chain = chain->next)
 +{
 +  unsigned uid = DF_REF_INSN_UID (chain->ref);
 +
 +  if (!NONDEBUG_INSN_P (DF_REF_INSN (chain->ref)))
 +   continue;
 +
 +  if (!DF_REF_REG_MEM_P (chain->ref))
 +   continue;

 I believe here you wrongly jump to the next ref intead of actually 
 adding it
 to a queue.  You may just use

 gcc_assert (!DF_REF_REG_MEM_P (chain->ref));

 because you should'n have a candidate used in address operand.
>>>
>>> I will update.
>>>
 +
 +  if (bitmap_bit_p (insns, uid))
 +   continue;
 +
 +  if (bitmap_bit_p (candidates, uid))
 +   add_to_queue (uid);

 Probably gcc_assert (bitmap_bit_p (candidates, uid)) since no uses and 
 defs
 out of candidates list are allowed?
>>>
>>> That would be wrong since there are
>>>
>>>  while (!bitmap_empty_p (queue))
>>> {
>>>   insn_uid = bitmap_first_set_bit (queue);
>>>   bitmap_clear_bit (queue, insn_uid);
>>>   bitmap_clear_bit (candidates, insn_uid);
>>>   add_insn (candidates, insn_uid);
>>> }
>>>
>>> An instruction is a candidate and the bit is cleared when
>>> analyze_register_chain is called.
>>
>> You clear candidates bit but the first thing you do in add_insn is set
>> insns bit.
>> Thus you should hit:
>>
>>   if (bitmap_bit_p (insns, uid))
>> continue;
>>
>> For handled candidates.
>>
>> Probably it would be more clear if we keep this clear/set pair
>> together?  E.g. move
>> bitmap_clear_bit (candidates, insn_uid) to scalar_chain::add_insn.
>>
>
> After we started processing candidates, we only use candidates
> to check if an instruction is a candidate, not to check if an
> instruction is NOT a candidate.

 I don't see how it's related to what I said.  My point is that
 when you analyze added insn you shouldn't reach insns which are both
 not in candidates and not in current scalar_chain_64.  That's why I
 think you miss an assert in scalar_chain_64::analyze_register_chain.
>>>
>>> Since all candidates will be processed by
>>>
>>>  while (!bitmap_empty_p (queue))
>>> {
>>>   insn_uid = bitmap_first_set_bit (queue);
>>>   bitmap_clear_bit (queue, insn_uid);
>>>   bitmap_clear_bit (candidates, insn_uid);
>>>   add_insn (candidates, insn_uid);
>>> }
>>
>> This process only queue, not all candidates.  analyze_register_chain
>> fills queue bitmap to build a chain.
>>
>>>
>>> I will change to
>>>
>>> /* Check REF's chain to add new insns into a queue.  */
>>>
>>> void
>>> timode_scalar_chain::analyze_register_chain (bitmap candidates,
>>>  df_ref ref)
>>> {
>>> gcc_assert (bitmap_bit_p (insns, DF_REF_INSN_UID (ref))
>>>   || bitmap_bit_p (candidates, DF_REF_INSN_UID (ref)));
>>>   add_to_queue (DF_REF_INSN_UID (ref));
>>> }
>>>
>>
>> The purpose of analyze_register_chain is to collect all register defs
>> or uses into chain's queue.  You can't just remove a loop over ref's
>> chain then.
>
> That is true for DImode conversion.  For TImode conversion,
> there are no register conversions required:
>
> https://gcc.gnu.org/ml/gcc-patches/2016-04/msg01394.html
>
> All it does it to call add_to_queue.

No conversion means no mark_dual_mode_def cal

Re: [PATCH] PR target/70155: Use SSE for TImode load/store

2016-04-26 Thread H.J. Lu
On Tue, Apr 26, 2016 at 9:07 AM, Ilya Enkovich  wrote:
> 2016-04-26 18:39 GMT+03:00 H.J. Lu :
>> On Tue, Apr 26, 2016 at 8:21 AM, Ilya Enkovich  
>> wrote:
>>> 2016-04-26 18:12 GMT+03:00 H.J. Lu :
 On Tue, Apr 26, 2016 at 8:05 AM, Ilya Enkovich  
 wrote:
> 2016-04-26 17:55 GMT+03:00 H.J. Lu :
>> On Tue, Apr 26, 2016 at 7:15 AM, Ilya Enkovich  
>> wrote:
>>> 2016-04-26 17:07 GMT+03:00 H.J. Lu :
 On Mon, Apr 25, 2016 at 9:13 AM, Ilya Enkovich 
  wrote:
> 2016-04-25 18:27 GMT+03:00 H.J. Lu :
>>
>> Ilya, can you take a look?
>>
>> Thanks.
>>
>> --
>> H.J.
>
> Hi,
>
> Algorithmic part of the patch looks OK to me except the following 
> piece of code.
>
> +/* Check REF's chain to add new insns into a queue
> +   and find registers requiring conversion.  */
>
> Comment is wrong because you don't have any conversions required for
> your candidates.

 I will fix it.

> +
> +void
> +scalar_chain_64::analyze_register_chain (bitmap candidates, df_ref 
> ref)
> +{
> +  df_link *chain;
> +
> +  gcc_assert (bitmap_bit_p (insns, DF_REF_INSN_UID (ref))
> + || bitmap_bit_p (candidates, DF_REF_INSN_UID (ref)));
> +  add_to_queue (DF_REF_INSN_UID (ref));
> +
> +  for (chain = DF_REF_CHAIN (ref); chain; chain = chain->next)
> +{
> +  unsigned uid = DF_REF_INSN_UID (chain->ref);
> +
> +  if (!NONDEBUG_INSN_P (DF_REF_INSN (chain->ref)))
> +   continue;
> +
> +  if (!DF_REF_REG_MEM_P (chain->ref))
> +   continue;
>
> I believe here you wrongly jump to the next ref intead of actually 
> adding it
> to a queue.  You may just use
>
> gcc_assert (!DF_REF_REG_MEM_P (chain->ref));
>
> because you should'n have a candidate used in address operand.

 I will update.

> +
> +  if (bitmap_bit_p (insns, uid))
> +   continue;
> +
> +  if (bitmap_bit_p (candidates, uid))
> +   add_to_queue (uid);
>
> Probably gcc_assert (bitmap_bit_p (candidates, uid)) since no uses 
> and defs
> out of candidates list are allowed?

 That would be wrong since there are

  while (!bitmap_empty_p (queue))
 {
   insn_uid = bitmap_first_set_bit (queue);
   bitmap_clear_bit (queue, insn_uid);
   bitmap_clear_bit (candidates, insn_uid);
   add_insn (candidates, insn_uid);
 }

 An instruction is a candidate and the bit is cleared when
 analyze_register_chain is called.
>>>
>>> You clear candidates bit but the first thing you do in add_insn is set
>>> insns bit.
>>> Thus you should hit:
>>>
>>>   if (bitmap_bit_p (insns, uid))
>>> continue;
>>>
>>> For handled candidates.
>>>
>>> Probably it would be more clear if we keep this clear/set pair
>>> together?  E.g. move
>>> bitmap_clear_bit (candidates, insn_uid) to scalar_chain::add_insn.
>>>
>>
>> After we started processing candidates, we only use candidates
>> to check if an instruction is a candidate, not to check if an
>> instruction is NOT a candidate.
>
> I don't see how it's related to what I said.  My point is that
> when you analyze added insn you shouldn't reach insns which are both
> not in candidates and not in current scalar_chain_64.  That's why I
> think you miss an assert in scalar_chain_64::analyze_register_chain.

 Since all candidates will be processed by

  while (!bitmap_empty_p (queue))
 {
   insn_uid = bitmap_first_set_bit (queue);
   bitmap_clear_bit (queue, insn_uid);
   bitmap_clear_bit (candidates, insn_uid);
   add_insn (candidates, insn_uid);
 }
>>>
>>> This process only queue, not all candidates.  analyze_register_chain
>>> fills queue bitmap to build a chain.
>>>

 I will change to

 /* Check REF's chain to add new insns into a queue.  */

 void
 timode_scalar_chain::analyze_register_chain (bitmap candidates,
  df_ref ref)
 {
 gcc_assert (bitmap_bit_p (insns, DF_REF_INSN_UID (ref))
   || bitmap_bit_p (candidates, DF_REF_INSN_UID (ref)));
   add_to_queue (DF_REF_INSN_UID (ref));
 }

>>>
>>> The purpose of analyze_register_chain is to collect all register defs
>>> or uses into chain's queue.  You can't just remove a loop over ref's
>>> chain then.
>>
>> That is true for DImode conversion.  For TImode co

Re: [PATCH][AArch64] Improve aarch64_case_values_threshold setting

2016-04-26 Thread Wilco Dijkstra
Evandro Menezes wrote:
>
> True, but the results when running on A53 could be quite different.

GCC is ~1.2% faster on Cortex-A53 built for generic, but there is no
difference in perlbench.

Wilco



Re: [PATCH][AArch64] Improve aarch64_case_values_threshold setting

2016-04-26 Thread Evandro Menezes

On 04/26/16 11:14, Wilco Dijkstra wrote:

Evandro Menezes wrote:

True, but the results when running on A53 could be quite different.

GCC is ~1.2% faster on Cortex-A53 built for generic, but there is no
difference in perlbench.


Looks good, then.  Fine by me.

Thanks for your patience,

--
Evandro Menezes



Re: [PATCH] PR target/70155: Use SSE for TImode load/store

2016-04-26 Thread Ilya Enkovich
2016-04-26 19:12 GMT+03:00 H.J. Lu :
> On Tue, Apr 26, 2016 at 9:07 AM, Ilya Enkovich  wrote:
>> 2016-04-26 18:39 GMT+03:00 H.J. Lu :
>>> On Tue, Apr 26, 2016 at 8:21 AM, Ilya Enkovich  
>>> wrote:
 2016-04-26 18:12 GMT+03:00 H.J. Lu :
> On Tue, Apr 26, 2016 at 8:05 AM, Ilya Enkovich  
> wrote:
>> 2016-04-26 17:55 GMT+03:00 H.J. Lu :
>>> On Tue, Apr 26, 2016 at 7:15 AM, Ilya Enkovich  
>>> wrote:
 2016-04-26 17:07 GMT+03:00 H.J. Lu :
> On Mon, Apr 25, 2016 at 9:13 AM, Ilya Enkovich 
>  wrote:
>> 2016-04-25 18:27 GMT+03:00 H.J. Lu :
>>>
>>> Ilya, can you take a look?
>>>
>>> Thanks.
>>>
>>> --
>>> H.J.
>>
>> Hi,
>>
>> Algorithmic part of the patch looks OK to me except the following 
>> piece of code.
>>
>> +/* Check REF's chain to add new insns into a queue
>> +   and find registers requiring conversion.  */
>>
>> Comment is wrong because you don't have any conversions required for
>> your candidates.
>
> I will fix it.
>
>> +
>> +void
>> +scalar_chain_64::analyze_register_chain (bitmap candidates, df_ref 
>> ref)
>> +{
>> +  df_link *chain;
>> +
>> +  gcc_assert (bitmap_bit_p (insns, DF_REF_INSN_UID (ref))
>> + || bitmap_bit_p (candidates, DF_REF_INSN_UID (ref)));
>> +  add_to_queue (DF_REF_INSN_UID (ref));
>> +
>> +  for (chain = DF_REF_CHAIN (ref); chain; chain = chain->next)
>> +{
>> +  unsigned uid = DF_REF_INSN_UID (chain->ref);
>> +
>> +  if (!NONDEBUG_INSN_P (DF_REF_INSN (chain->ref)))
>> +   continue;
>> +
>> +  if (!DF_REF_REG_MEM_P (chain->ref))
>> +   continue;
>>
>> I believe here you wrongly jump to the next ref intead of actually 
>> adding it
>> to a queue.  You may just use
>>
>> gcc_assert (!DF_REF_REG_MEM_P (chain->ref));
>>
>> because you should'n have a candidate used in address operand.
>
> I will update.
>
>> +
>> +  if (bitmap_bit_p (insns, uid))
>> +   continue;
>> +
>> +  if (bitmap_bit_p (candidates, uid))
>> +   add_to_queue (uid);
>>
>> Probably gcc_assert (bitmap_bit_p (candidates, uid)) since no uses 
>> and defs
>> out of candidates list are allowed?
>
> That would be wrong since there are
>
>  while (!bitmap_empty_p (queue))
> {
>   insn_uid = bitmap_first_set_bit (queue);
>   bitmap_clear_bit (queue, insn_uid);
>   bitmap_clear_bit (candidates, insn_uid);
>   add_insn (candidates, insn_uid);
> }
>
> An instruction is a candidate and the bit is cleared when
> analyze_register_chain is called.

 You clear candidates bit but the first thing you do in add_insn is set
 insns bit.
 Thus you should hit:

   if (bitmap_bit_p (insns, uid))
 continue;

 For handled candidates.

 Probably it would be more clear if we keep this clear/set pair
 together?  E.g. move
 bitmap_clear_bit (candidates, insn_uid) to scalar_chain::add_insn.

>>>
>>> After we started processing candidates, we only use candidates
>>> to check if an instruction is a candidate, not to check if an
>>> instruction is NOT a candidate.
>>
>> I don't see how it's related to what I said.  My point is that
>> when you analyze added insn you shouldn't reach insns which are both
>> not in candidates and not in current scalar_chain_64.  That's why I
>> think you miss an assert in scalar_chain_64::analyze_register_chain.
>
> Since all candidates will be processed by
>
>  while (!bitmap_empty_p (queue))
> {
>   insn_uid = bitmap_first_set_bit (queue);
>   bitmap_clear_bit (queue, insn_uid);
>   bitmap_clear_bit (candidates, insn_uid);
>   add_insn (candidates, insn_uid);
> }

 This process only queue, not all candidates.  analyze_register_chain
 fills queue bitmap to build a chain.

>
> I will change to
>
> /* Check REF's chain to add new insns into a queue.  */
>
> void
> timode_scalar_chain::analyze_register_chain (bitmap candidates,
>  df_ref ref)
> {
> gcc_assert (bitmap_bit_p (insns, DF_REF_INSN_UID (ref))
>   || bitmap_bit_p (candidates, DF_REF_INSN_UID (ref)));
>   add_to_queue (DF_REF_INSN_UID (ref));
> }
>

 The purpose of analyze_register_cha

Re: Patches to fix optimizer bug & C++ exceptions for GCC VAX backend

2016-04-26 Thread Jeff Law

On 04/04/2016 08:51 AM, Maciej W. Rozycki wrote:

On Thu, 31 Mar 2016, Jake Hamby wrote:


There's one more thing that's broken in the VAX backend which I'd
*really* like to fix: GCC can't compile many of its own files at -O2, as
well as a few other .c files in the NetBSD tree, because it can't expand
an insn pattern. The Makefiles have hacks to add "-O0" on VAX, and when
I remove that workaround, I get GCC assertion failures, all of the form:

/home/netbsd/current/src/external/gpl3/gcc/dist/gcc/builtins.c: In function 
'void maybe_emit_chk_warning(tree, built_in_function)':
/home/netbsd/current/src/external/gpl3/gcc/dist/gcc/builtins.c:11549:1: error: 
unrecognizable insn:
 }
 ^
(insn 295 294 296 25 (set (reg:SI 111)
(subreg:SI (mem:DI (plus:SI (mult:SI (reg:SI 109)
(const_int 8 [0x8]))
(reg/f:SI 45 [ D.85015 ])) [7 *_98+0 S8 A32]) 4)) 
/home/netbsd/current/src/external/gpl3/gcc/dist/gcc/wide-int.h:799 -1
 (nil))
/home/netbsd/current/src/external/gpl3/gcc/dist/gcc/builtins.c:11549:1: 
internal compiler error: in extract_insn, at recog.c:2343
0xbd0365 _fatal_insn(char const*, rtx_def const*, char const*, int, char const*)

/home/netbsd/current/src/tools/gcc/../../external/gpl3/gcc/dist/gcc/rtl-error.c:110
0xbd03fa _fatal_insn_not_found(rtx_def const*, char const*, int, char const*)

/home/netbsd/current/src/tools/gcc/../../external/gpl3/gcc/dist/gcc/rtl-error.c:118
0xb92a2d extract_insn(rtx_insn*)

/home/netbsd/current/src/tools/gcc/../../external/gpl3/gcc/dist/gcc/recog.c:2343
0x9612cd instantiate_virtual_regs_in_insn

/home/netbsd/current/src/tools/gcc/../../external/gpl3/gcc/dist/gcc/function.c:1598
0x9612cd instantiate_virtual_regs

/home/netbsd/current/src/tools/gcc/../../external/gpl3/gcc/dist/gcc/function.c:1966
0x9612cd execute

/home/netbsd/current/src/tools/gcc/../../external/gpl3/gcc/dist/gcc/function.c:2015

The failures all seem to be related to trying to read a value from an
array of 64-bit values and loading it into a 32-bit register. It seems
like there should be a special insn defined for this sort of array
access, since VAX has mova* and pusha* variants to set a value from an
address plus an index into a byte, word, long, or 64-bit array (it uses
movab/pushab, put not the other variants). The addressing modes,
constraints, and predicates all get very complicated, and I still need
to understand a bit better what is actually required, and what could be
simplified and cleaned up.


 Please note however that this RTL instruction is a memory reference, not
an address load.  So the suitable hardware instruction would be MOVAQ for
an indexed DI mode memory load.  If used to set a SI mode subreg at byte
number 4 (this would be the second hardware register of a pair a DI mode
value is held in) as seen here, it would have to address the immediately
preceding register however (so you can't load R0 this way) and it would
clobber it.

 So offhand I think you need an RTL instruction splitter to express this,
and then avoid fetching 64 bits worth of data from memory -- for the sake
of matching the indexed addressing mode -- where you only need 32 bits.
At the hardware instruction level I'd use a scratch register (as with
MOVAQ you'd have to waste one anyway) to scale the index and then use
MOVAL instead with the modified index.  Where no index is used it gets
simpler even as you can just bump up the displacement according to the
subreg offset.

Note you shouldn't need an expander for this.

That insn is just a 32bit load.  I would have expected something to 
simplify the subreg expression, likely requiring loading the address 
into a register in the process.


Jeff


Re: Patches to fix optimizer bug & C++ exceptions for GCC VAX backend

2016-04-26 Thread Jeff Law

On 04/01/2016 05:37 AM, Bernd Schmidt wrote:

Cc'ing Matt Thomas who is listed as the vax maintainer; most of the
patch should be reviewed by him IMO. If he is no longer active I'd
frankly rather deprecate the port rather than invest effort in keeping
it running.


Index: gcc/except.c
===
RCS file: /cvsroot/src/external/gpl3/gcc/dist/gcc/except.c,v
retrieving revision 1.3
diff -u -r1.3 except.c
--- gcc/except.c23 Mar 2016 15:51:36 -1.3
+++ gcc/except.c28 Mar 2016 23:24:40 -
@@ -2288,7 +2288,8 @@
  #endif
  {
  #ifdef EH_RETURN_HANDLER_RTX
-  emit_move_insn (EH_RETURN_HANDLER_RTX, crtl->eh.ehr_handler);
+  rtx insn = emit_move_insn (EH_RETURN_HANDLER_RTX,
crtl->eh.ehr_handler);
+  RTX_FRAME_RELATED_P (insn) = 1;// XXX FIXME in VAX backend?
  #else
error ("__builtin_eh_return not supported on this target");
  #endif


This part looks highly suspicious and I think there needs to be further
analysis.

Agreed 100%.  This is a symptom of a problem elsewhere.

jeff


Re: [PATCH] PR target/70155: Use SSE for TImode load/store

2016-04-26 Thread Ilya Enkovich
2016-04-26 19:20 GMT+03:00 Ilya Enkovich :
> 2016-04-26 19:12 GMT+03:00 H.J. Lu :
>> On Tue, Apr 26, 2016 at 9:07 AM, Ilya Enkovich  
>> wrote:
>>> 2016-04-26 18:39 GMT+03:00 H.J. Lu :
 On Tue, Apr 26, 2016 at 8:21 AM, Ilya Enkovich  
 wrote:
> 2016-04-26 18:12 GMT+03:00 H.J. Lu :
>> On Tue, Apr 26, 2016 at 8:05 AM, Ilya Enkovich  
>> wrote:
>>> 2016-04-26 17:55 GMT+03:00 H.J. Lu :
 On Tue, Apr 26, 2016 at 7:15 AM, Ilya Enkovich 
  wrote:
> 2016-04-26 17:07 GMT+03:00 H.J. Lu :
>> On Mon, Apr 25, 2016 at 9:13 AM, Ilya Enkovich 
>>  wrote:
>>> 2016-04-25 18:27 GMT+03:00 H.J. Lu :

 Ilya, can you take a look?

 Thanks.

 --
 H.J.
>>>
>>> Hi,
>>>
>>> Algorithmic part of the patch looks OK to me except the following 
>>> piece of code.
>>>
>>> +/* Check REF's chain to add new insns into a queue
>>> +   and find registers requiring conversion.  */
>>>
>>> Comment is wrong because you don't have any conversions required for
>>> your candidates.
>>
>> I will fix it.
>>
>>> +
>>> +void
>>> +scalar_chain_64::analyze_register_chain (bitmap candidates, df_ref 
>>> ref)
>>> +{
>>> +  df_link *chain;
>>> +
>>> +  gcc_assert (bitmap_bit_p (insns, DF_REF_INSN_UID (ref))
>>> + || bitmap_bit_p (candidates, DF_REF_INSN_UID (ref)));
>>> +  add_to_queue (DF_REF_INSN_UID (ref));
>>> +
>>> +  for (chain = DF_REF_CHAIN (ref); chain; chain = chain->next)
>>> +{
>>> +  unsigned uid = DF_REF_INSN_UID (chain->ref);
>>> +
>>> +  if (!NONDEBUG_INSN_P (DF_REF_INSN (chain->ref)))
>>> +   continue;
>>> +
>>> +  if (!DF_REF_REG_MEM_P (chain->ref))
>>> +   continue;
>>>
>>> I believe here you wrongly jump to the next ref intead of actually 
>>> adding it
>>> to a queue.  You may just use
>>>
>>> gcc_assert (!DF_REF_REG_MEM_P (chain->ref));
>>>
>>> because you should'n have a candidate used in address operand.
>>
>> I will update.
>>
>>> +
>>> +  if (bitmap_bit_p (insns, uid))
>>> +   continue;
>>> +
>>> +  if (bitmap_bit_p (candidates, uid))
>>> +   add_to_queue (uid);
>>>
>>> Probably gcc_assert (bitmap_bit_p (candidates, uid)) since no uses 
>>> and defs
>>> out of candidates list are allowed?
>>
>> That would be wrong since there are
>>
>>  while (!bitmap_empty_p (queue))
>> {
>>   insn_uid = bitmap_first_set_bit (queue);
>>   bitmap_clear_bit (queue, insn_uid);
>>   bitmap_clear_bit (candidates, insn_uid);
>>   add_insn (candidates, insn_uid);
>> }
>>
>> An instruction is a candidate and the bit is cleared when
>> analyze_register_chain is called.
>
> You clear candidates bit but the first thing you do in add_insn is set
> insns bit.
> Thus you should hit:
>
>   if (bitmap_bit_p (insns, uid))
> continue;
>
> For handled candidates.
>
> Probably it would be more clear if we keep this clear/set pair
> together?  E.g. move
> bitmap_clear_bit (candidates, insn_uid) to scalar_chain::add_insn.
>

 After we started processing candidates, we only use candidates
 to check if an instruction is a candidate, not to check if an
 instruction is NOT a candidate.
>>>
>>> I don't see how it's related to what I said.  My point is that
>>> when you analyze added insn you shouldn't reach insns which are both
>>> not in candidates and not in current scalar_chain_64.  That's why I
>>> think you miss an assert in scalar_chain_64::analyze_register_chain.
>>
>> Since all candidates will be processed by
>>
>>  while (!bitmap_empty_p (queue))
>> {
>>   insn_uid = bitmap_first_set_bit (queue);
>>   bitmap_clear_bit (queue, insn_uid);
>>   bitmap_clear_bit (candidates, insn_uid);
>>   add_insn (candidates, insn_uid);
>> }
>
> This process only queue, not all candidates.  analyze_register_chain
> fills queue bitmap to build a chain.
>
>>
>> I will change to
>>
>> /* Check REF's chain to add new insns into a queue.  */
>>
>> void
>> timode_scalar_chain::analyze_register_chain (bitmap candidates,
>>  df_ref ref)
>> {
>> gcc_assert (bitmap_bit_p (insns, DF_REF_INSN_UID (

Re: [PATCH] PR target/70155: Use SSE for TImode load/store

2016-04-26 Thread H.J. Lu
On Tue, Apr 26, 2016 at 9:20 AM, Ilya Enkovich  wrote:
> 2016-04-26 19:12 GMT+03:00 H.J. Lu :
>> On Tue, Apr 26, 2016 at 9:07 AM, Ilya Enkovich  
>> wrote:
>>> 2016-04-26 18:39 GMT+03:00 H.J. Lu :
 On Tue, Apr 26, 2016 at 8:21 AM, Ilya Enkovich  
 wrote:
> 2016-04-26 18:12 GMT+03:00 H.J. Lu :
>> On Tue, Apr 26, 2016 at 8:05 AM, Ilya Enkovich  
>> wrote:
>>> 2016-04-26 17:55 GMT+03:00 H.J. Lu :
 On Tue, Apr 26, 2016 at 7:15 AM, Ilya Enkovich 
  wrote:
> 2016-04-26 17:07 GMT+03:00 H.J. Lu :
>> On Mon, Apr 25, 2016 at 9:13 AM, Ilya Enkovich 
>>  wrote:
>>> 2016-04-25 18:27 GMT+03:00 H.J. Lu :

 Ilya, can you take a look?

 Thanks.

 --
 H.J.
>>>
>>> Hi,
>>>
>>> Algorithmic part of the patch looks OK to me except the following 
>>> piece of code.
>>>
>>> +/* Check REF's chain to add new insns into a queue
>>> +   and find registers requiring conversion.  */
>>>
>>> Comment is wrong because you don't have any conversions required for
>>> your candidates.
>>
>> I will fix it.
>>
>>> +
>>> +void
>>> +scalar_chain_64::analyze_register_chain (bitmap candidates, df_ref 
>>> ref)
>>> +{
>>> +  df_link *chain;
>>> +
>>> +  gcc_assert (bitmap_bit_p (insns, DF_REF_INSN_UID (ref))
>>> + || bitmap_bit_p (candidates, DF_REF_INSN_UID (ref)));
>>> +  add_to_queue (DF_REF_INSN_UID (ref));
>>> +
>>> +  for (chain = DF_REF_CHAIN (ref); chain; chain = chain->next)
>>> +{
>>> +  unsigned uid = DF_REF_INSN_UID (chain->ref);
>>> +
>>> +  if (!NONDEBUG_INSN_P (DF_REF_INSN (chain->ref)))
>>> +   continue;
>>> +
>>> +  if (!DF_REF_REG_MEM_P (chain->ref))
>>> +   continue;
>>>
>>> I believe here you wrongly jump to the next ref intead of actually 
>>> adding it
>>> to a queue.  You may just use
>>>
>>> gcc_assert (!DF_REF_REG_MEM_P (chain->ref));
>>>
>>> because you should'n have a candidate used in address operand.
>>
>> I will update.
>>
>>> +
>>> +  if (bitmap_bit_p (insns, uid))
>>> +   continue;
>>> +
>>> +  if (bitmap_bit_p (candidates, uid))
>>> +   add_to_queue (uid);
>>>
>>> Probably gcc_assert (bitmap_bit_p (candidates, uid)) since no uses 
>>> and defs
>>> out of candidates list are allowed?
>>
>> That would be wrong since there are
>>
>>  while (!bitmap_empty_p (queue))
>> {
>>   insn_uid = bitmap_first_set_bit (queue);
>>   bitmap_clear_bit (queue, insn_uid);
>>   bitmap_clear_bit (candidates, insn_uid);
>>   add_insn (candidates, insn_uid);
>> }
>>
>> An instruction is a candidate and the bit is cleared when
>> analyze_register_chain is called.
>
> You clear candidates bit but the first thing you do in add_insn is set
> insns bit.
> Thus you should hit:
>
>   if (bitmap_bit_p (insns, uid))
> continue;
>
> For handled candidates.
>
> Probably it would be more clear if we keep this clear/set pair
> together?  E.g. move
> bitmap_clear_bit (candidates, insn_uid) to scalar_chain::add_insn.
>

 After we started processing candidates, we only use candidates
 to check if an instruction is a candidate, not to check if an
 instruction is NOT a candidate.
>>>
>>> I don't see how it's related to what I said.  My point is that
>>> when you analyze added insn you shouldn't reach insns which are both
>>> not in candidates and not in current scalar_chain_64.  That's why I
>>> think you miss an assert in scalar_chain_64::analyze_register_chain.
>>
>> Since all candidates will be processed by
>>
>>  while (!bitmap_empty_p (queue))
>> {
>>   insn_uid = bitmap_first_set_bit (queue);
>>   bitmap_clear_bit (queue, insn_uid);
>>   bitmap_clear_bit (candidates, insn_uid);
>>   add_insn (candidates, insn_uid);
>> }
>
> This process only queue, not all candidates.  analyze_register_chain
> fills queue bitmap to build a chain.
>
>>
>> I will change to
>>
>> /* Check REF's chain to add new insns into a queue.  */
>>
>> void
>> timode_scalar_chain::analyze_register_chain (bitmap candidates,
>>  df_ref ref)
>> {
>> gcc_assert (bitmap_bit_p (insns, DF_RE

Re: [PATCH] PR target/70155: Use SSE for TImode load/store

2016-04-26 Thread H.J. Lu
On Tue, Apr 26, 2016 at 9:27 AM, Ilya Enkovich  wrote:
> 2016-04-26 19:20 GMT+03:00 Ilya Enkovich :
>> 2016-04-26 19:12 GMT+03:00 H.J. Lu :
>>> On Tue, Apr 26, 2016 at 9:07 AM, Ilya Enkovich  
>>> wrote:
 2016-04-26 18:39 GMT+03:00 H.J. Lu :
> On Tue, Apr 26, 2016 at 8:21 AM, Ilya Enkovich  
> wrote:
>> 2016-04-26 18:12 GMT+03:00 H.J. Lu :
>>> On Tue, Apr 26, 2016 at 8:05 AM, Ilya Enkovich  
>>> wrote:
 2016-04-26 17:55 GMT+03:00 H.J. Lu :
> On Tue, Apr 26, 2016 at 7:15 AM, Ilya Enkovich 
>  wrote:
>> 2016-04-26 17:07 GMT+03:00 H.J. Lu :
>>> On Mon, Apr 25, 2016 at 9:13 AM, Ilya Enkovich 
>>>  wrote:
 2016-04-25 18:27 GMT+03:00 H.J. Lu :
>
> Ilya, can you take a look?
>
> Thanks.
>
> --
> H.J.

 Hi,

 Algorithmic part of the patch looks OK to me except the following 
 piece of code.

 +/* Check REF's chain to add new insns into a queue
 +   and find registers requiring conversion.  */

 Comment is wrong because you don't have any conversions required 
 for
 your candidates.
>>>
>>> I will fix it.
>>>
 +
 +void
 +scalar_chain_64::analyze_register_chain (bitmap candidates, 
 df_ref ref)
 +{
 +  df_link *chain;
 +
 +  gcc_assert (bitmap_bit_p (insns, DF_REF_INSN_UID (ref))
 + || bitmap_bit_p (candidates, DF_REF_INSN_UID (ref)));
 +  add_to_queue (DF_REF_INSN_UID (ref));
 +
 +  for (chain = DF_REF_CHAIN (ref); chain; chain = chain->next)
 +{
 +  unsigned uid = DF_REF_INSN_UID (chain->ref);
 +
 +  if (!NONDEBUG_INSN_P (DF_REF_INSN (chain->ref)))
 +   continue;
 +
 +  if (!DF_REF_REG_MEM_P (chain->ref))
 +   continue;

 I believe here you wrongly jump to the next ref intead of actually 
 adding it
 to a queue.  You may just use

 gcc_assert (!DF_REF_REG_MEM_P (chain->ref));

 because you should'n have a candidate used in address operand.
>>>
>>> I will update.
>>>
 +
 +  if (bitmap_bit_p (insns, uid))
 +   continue;
 +
 +  if (bitmap_bit_p (candidates, uid))
 +   add_to_queue (uid);

 Probably gcc_assert (bitmap_bit_p (candidates, uid)) since no uses 
 and defs
 out of candidates list are allowed?
>>>
>>> That would be wrong since there are
>>>
>>>  while (!bitmap_empty_p (queue))
>>> {
>>>   insn_uid = bitmap_first_set_bit (queue);
>>>   bitmap_clear_bit (queue, insn_uid);
>>>   bitmap_clear_bit (candidates, insn_uid);
>>>   add_insn (candidates, insn_uid);
>>> }
>>>
>>> An instruction is a candidate and the bit is cleared when
>>> analyze_register_chain is called.
>>
>> You clear candidates bit but the first thing you do in add_insn is 
>> set
>> insns bit.
>> Thus you should hit:
>>
>>   if (bitmap_bit_p (insns, uid))
>> continue;
>>
>> For handled candidates.
>>
>> Probably it would be more clear if we keep this clear/set pair
>> together?  E.g. move
>> bitmap_clear_bit (candidates, insn_uid) to scalar_chain::add_insn.
>>
>
> After we started processing candidates, we only use candidates
> to check if an instruction is a candidate, not to check if an
> instruction is NOT a candidate.

 I don't see how it's related to what I said.  My point is that
 when you analyze added insn you shouldn't reach insns which are both
 not in candidates and not in current scalar_chain_64.  That's why I
 think you miss an assert in scalar_chain_64::analyze_register_chain.
>>>
>>> Since all candidates will be processed by
>>>
>>>  while (!bitmap_empty_p (queue))
>>> {
>>>   insn_uid = bitmap_first_set_bit (queue);
>>>   bitmap_clear_bit (queue, insn_uid);
>>>   bitmap_clear_bit (candidates, insn_uid);
>>>   add_insn (candidates, insn_uid);
>>> }
>>
>> This process only queue, not all candidates.  analyze_register_chain
>> fills queue bitmap to build a chain.
>>
>>>
>>> I will change to
>>>
>>> /* Check REF's chain to add new insns into a queue.  */
>>>
>

constexpr function caching

2016-04-26 Thread Nathan Sidwell

Jason.
we currently clone constexpr function bodies when evaluating them (via the reuse 
cache that caused such fun).   The cloning is necessary so decls in different 
(recursive) evaluations are distinct.  We only need to clone for recursive 
evaluations though -- the outermost evaluation could use the original function body.


That leaves us with needing some mechanism to know whether we're doing a 
recursive call or not.  This patch implements such an optimization.


get_fundef_copy is changed to use get-or-insert, and uses the 'existed' arg of 
get_or_insert to know whether a new slot was created or not.  If a new slot was 
created, we know we're not  currently evaluating the function, so we can use it 
directly without copy.  Otherwise, we look in the slot to see if there's a 
cached body available.  If there isn't we create a clone.


When saving the function for later reuse, the original fn is not distinguished 
from copies.  That's ok -- we'll just see it as a cached body available for 
reuse on the next call. and if we run out of cached bodies, we'll create them 
because the slot will not have just been created.


Manually instrumenting the new testcase shows things behaving as I expected.

The change in error behaviour of ubsan/pr63956.C was a surprise, but I think a 
good one.  That testcase boiled down to:


constexpr int
fn6 (const int &a, int b)
{
  b = a;  <-- problem here
  return b;
}

constexpr int
fn7 (const int *a, int b)
{
  return fn6 (*a, b);
}

constexpr int n4 = fn7 ((const int *) 0, 8); <--  diag here

and we emitted the following diagnostic:
pr63956.ii:15:24:   in constexpr expansion of 'fn7(0u, 8)'
pr63956.ii:12:14:   in constexpr expansion of 'fn6((* a), b)'
pr63956.ii:15:43: error: 'a' is not a constant expression
 constexpr int n4 = fn7 ((const int *) 0, 8);

notice that final line is the assignment of 'n4' and not the location of the 
problem.


With this patch we now produce:

pr63956.ii: At global scope:
pr63956.ii:14:24:   in constexpr expansion of 'fn7(0u, 8)'
pr63956.ii:11:14:   in constexpr expansion of 'fn6((* a), b)'
pr63956.ii:5:7: error: 'a' is not a constant expression
   b = a;

which is better.  There's something about the function cloning that throws off 
the error location information.  Changing fn6 to be conditionally recursive will 
lead to the older error location behaviour when the problem is the cloned 
instance.  IMHO this is a collateral latent bug, and now it only affects 
recursive constexpr fns.


ok for trunk?

nathan
2016-04-26  Nathan Sidwell  

	cp/
	* constexpr.c (get_fundef_copy): Use the original function for
	non-recursive evaluations.
	(save_fundef_copy): Always expect a slot to be available.

	testsuite/
	* g++.dg/cpp0x/constexpr-recursion3.C: New.
	* g++.dg/ubsan/pr63956.C: Adjust error location.

Index: cp/constexpr.c
===
--- cp/constexpr.c	(revision 235364)
+++ cp/constexpr.c	(working copy)
@@ -989,7 +989,8 @@ maybe_initialize_fundef_copies_table ()
 }
 
 /* Reuse a copy or create a new unshared copy of the function FUN.
-   Return this copy.  */
+   Return this copy.  We use a TREE_LIST whose PURPOSE is body, VALUE
+   is parms, TYPE is result.  */
 
 static tree
 get_fundef_copy (tree fun)
@@ -997,15 +998,26 @@ get_fundef_copy (tree fun)
   maybe_initialize_fundef_copies_table ();
 
   tree copy;
-  tree *slot = fundef_copies_table->get (fun);
-  if (slot == NULL || *slot == NULL_TREE)
+  bool existed;
+  tree *slot = &fundef_copies_table->get_or_insert (fun, &existed);
+
+  if (!existed)
+{
+  /* There is no cached function available, or in use.  We can use
+	 the function directly.  That the slot is now created records
+	 that this function is now in use.  */
+  copy = build_tree_list (DECL_SAVED_TREE (fun), DECL_ARGUMENTS (fun));
+  TREE_TYPE (copy) = DECL_RESULT (fun);
+}
+  else if (*slot == NULL_TREE)
 {
+  /* We've already used the function itself, so make a copy.  */
   copy = build_tree_list (NULL, NULL);
-  /* PURPOSE is body, VALUE is parms, TYPE is result.  */
   TREE_PURPOSE (copy) = copy_fn (fun, TREE_VALUE (copy), TREE_TYPE (copy));
 }
   else
 {
+  /* We have a cached function available.  */
   copy = *slot;
   *slot = TREE_CHAIN (copy);
 }
@@ -1013,12 +1025,14 @@ get_fundef_copy (tree fun)
   return copy;
 }
 
-/* Save the copy COPY of function FUN for later reuse by get_fundef_copy().  */
+/* Save the copy COPY of function FUN for later reuse by
+   get_fundef_copy().  By construction, there will always be an entry
+   to find.  */
 
 static void
 save_fundef_copy (tree fun, tree copy)
 {
-  tree *slot = &fundef_copies_table->get_or_insert (fun, NULL);
+  tree *slot = fundef_copies_table->get (fun);
   TREE_CHAIN (copy) = *slot;
   *slot = copy;
 }
Index: testsuite/g++.dg/cpp0x/constexpr-recursion3.C
===
--- testsuite/g++.dg/cpp

Re: RFD: annotate iterator patterns with expanded forms

2016-04-26 Thread Richard Sandiford
Bernd Schmidt  writes:
> On 01/01/2016 07:02 PM, Hans-Peter Nilsson wrote:
>> On Tue, 1 Dec 2015, Bernd Schmidt wrote:
>
>>> The automatic Makefile approach might look something like this. The effect 
>>> is
>>> similar to what happens when you edit tm.texi.in, except the build would not
>>> be interrupted every time, only when you modify the iterator expansion of a
>>> pattern. There's a new rtx code which can be put into a machine
>>> description to
>>> enable this feature.
>>
>> (No-one else chimed in, so:)
>>
>> I really like this!
>
> Now that we're in stage1, I thought I'd bring this up again. For 
> reference, the patch was here:
>https://gcc.gnu.org/ml/gcc-patches/2015-12/msg00165.html
>
> So, would you like this for cris and mmix? I could enable it for these, 
> then we'd need someone to review/approve the generator parts. I'm still 
> hoping the x86 maintainers would also consider it.

One question I'd like to ask first is: can anyone think of an alternative
syntax that would make things clearer and remove the need for this kind
of change?  It seems like a Bad Thing if the .md files are now so
indirect that people aren't sure what they're actually doing (and if
mddump doesn't help or is needed so often that it's too cumbersome).

Thanks,
Richard


Re: [middle-end][PATCH] Update alignment_for_piecewise_move

2016-04-26 Thread Richard Sandiford
"H.J. Lu"  writes:
> I am working a patch to enable SSE, AVX and AVX512 for memcpy/memset
> optimization.  x86 backend defines MAX_BITSIZE_MODE_ANY_INT to 128
> to keep the OI and XI modes from confusing the compiler into thinking
> that these modes could actually be used for computation.  But the OI
> and XI modes can be used for data movement with vector instructions.

But doesn't this then open the possibility that a memset or memcpy
will be seen as a "normal" integer operation?  Routines like
simplify_immed_subreg could in principle create a constant integer
for the stored value, which could then be treated by later passes as
a wide_int, breaking the MAX_BITSIZE_MODE_ANY_INT assumption.

Couldn't we move to allowing vector modes instead?  That seems better
than having to pander to the current situation in which every vector
effectively needs an associated integer mode.

Thanks,
Richard


[RFC] Update gmp/mpfr/mpc minimum versions

2016-04-26 Thread Bernd Edlinger
Hi,

as we all know, it's high time now to adjust the minimum supported
gmp/mpfr/mpc versions for gcc-7.

So this attached patch is now targeting at only supporting the currently
latest relased gmp-6.1.0, mpfr-3.1.4 and mpc-1.0.3, instead of trying
to support all previous versions, especially in-tree, but also without the
in-tree configuration the previously used versions are really creating
trouble nowadays.

To support gmp-6.1.0 in-tree we have to set AM_CFLAGS=-DNO_ASM
as already done with mpfr.  This was already discussed recently on this list.

Initially I was trying to use gmp's new --disable-assembly, but
after careful consideration, I think that we still need to set gmp's
target to none-host-vendor triplet, because only that prevents
gmp to get smart on selecting one of the different possible
target-ABIs.  This would not really work in general, because
gcc overrides the CFLAGS an thus defeats the ABI selection
again.

What do you think, is this patch generally acceptable?

Of course it would mean that everybody would have to use
the recent gmp/mpfr/mpc tar balls immediately, and someone would
have to upload them to ftp://gcc.gnu.org/pub/gcc/infrastructure/
before this patch can be applied of course (I don't know how to).


Thanks
Bernd.2016-04-26  Bernd Edlinger  

* configure.ac (mpfr): Remove pre-3.1.0 compatibility.
* configure: Regenerated.
* Makefile.def (gmp): Explicitly disable assembler.
(mpfr): Adjust lib_path.
(mpc): Likewise.
* Makefile.in: Regenerated.

gcc/
2016-04-26  Bernd Edlinger  

* doc/install.texi: Adjust gmp/mpfr/mpc minimum versions.

contrib/
2016-04-26  Bernd Edlinger  

* download_prerequisites: Adjust gmp/mpfr/mpc versions.
Index: Makefile.def
===
--- Makefile.def	(revision 234533)
+++ Makefile.def	(working copy)
@@ -50,6 +50,7 @@ host_modules= { module= gcc; bootstrap=true;
 host_modules= { module= gmp; lib_path=.libs; bootstrap=true;
 		// Work around in-tree gmp configure bug with missing flex.
 		extra_configure_flags='--disable-shared LEX="touch lex.yy.c"';
+		extra_make_flags='AM_CFLAGS="-DNO_ASM"';
 		no_install= true;
 		// none-*-* disables asm optimizations, bootstrap-testing
 		// the compiler more thoroughly.
@@ -57,11 +58,11 @@ host_modules= { module= gmp; lib_path=.libs; boots
 		// gmp's configure will complain if given anything
 		// different from host for target.
 	target="none-${host_vendor}-${host_os}"; };
-host_modules= { module= mpfr; lib_path=.libs; bootstrap=true;
+host_modules= { module= mpfr; lib_path=src/.libs; bootstrap=true;
 		extra_configure_flags='--disable-shared @extra_mpfr_configure_flags@';
 		extra_make_flags='AM_CFLAGS="-DNO_ASM"';
 		no_install= true; };
-host_modules= { module= mpc; lib_path=.libs; bootstrap=true;
+host_modules= { module= mpc; lib_path=src/.libs; bootstrap=true;
 		extra_configure_flags='--disable-shared @extra_mpc_gmp_configure_flags@ @extra_mpc_mpfr_configure_flags@';
 		no_install= true; };
 host_modules= { module= isl; lib_path=.libs; bootstrap=true;
Index: Makefile.in
===
--- Makefile.in	(revision 234533)
+++ Makefile.in	(working copy)
@@ -639,12 +639,12 @@ HOST_LIB_PATH_gmp = \
 
 @if mpfr
 HOST_LIB_PATH_mpfr = \
-  $$r/$(HOST_SUBDIR)/mpfr/.libs:$$r/$(HOST_SUBDIR)/prev-mpfr/.libs:
+  $$r/$(HOST_SUBDIR)/mpfr/src/.libs:$$r/$(HOST_SUBDIR)/prev-mpfr/src/.libs:
 @endif mpfr
 
 @if mpc
 HOST_LIB_PATH_mpc = \
-  $$r/$(HOST_SUBDIR)/mpc/.libs:$$r/$(HOST_SUBDIR)/prev-mpc/.libs:
+  $$r/$(HOST_SUBDIR)/mpc/src/.libs:$$r/$(HOST_SUBDIR)/prev-mpc/src/.libs:
 @endif mpc
 
 @if isl
@@ -11299,7 +11299,7 @@ all-gmp: configure-gmp
 	s=`cd $(srcdir); ${PWD_COMMAND}`; export s; \
 	$(HOST_EXPORTS)  \
 	(cd $(HOST_SUBDIR)/gmp && \
-	  $(MAKE) $(BASE_FLAGS_TO_PASS) $(EXTRA_HOST_FLAGS) $(STAGE1_FLAGS_TO_PASS)  \
+	  $(MAKE) $(BASE_FLAGS_TO_PASS) $(EXTRA_HOST_FLAGS) $(STAGE1_FLAGS_TO_PASS) AM_CFLAGS="-DNO_ASM" \
 		$(TARGET-gmp))
 @endif gmp
 
@@ -11328,7 +11328,7 @@ all-stage1-gmp: configure-stage1-gmp
 		CXXFLAGS_FOR_TARGET="$(CXXFLAGS_FOR_TARGET)" \
 		LIBCFLAGS_FOR_TARGET="$(LIBCFLAGS_FOR_TARGET)" \
 		$(EXTRA_HOST_FLAGS)  \
-		$(STAGE1_FLAGS_TO_PASS)  \
+		$(STAGE1_FLAGS_TO_PASS) AM_CFLAGS="-DNO_ASM" \
 		TFLAGS="$(STAGE1_TFLAGS)" \
 		$(TARGET-stage1-gmp)
 
@@ -11343,7 +11343,7 @@ clean-stage1-gmp:
 	fi; \
 	cd $(HOST_SUBDIR)/gmp && \
 	$(MAKE) $(EXTRA_HOST_FLAGS)  \
-	$(STAGE1_FLAGS_TO_PASS)  clean
+	$(STAGE1_FLAGS_TO_PASS) AM_CFLAGS="-DNO_ASM" clean
 @endif gmp-bootstrap
 
 
@@ -11370,7 +11370,7 @@ all-stage2-gmp: configure-stage2-gmp
 		CFLAGS_FOR_TARGET="$(CFLAGS_FOR_TARGET)" \
 		CXXFLAGS_FOR_TARGET="$(CXXFLAGS_FOR_TARGET)" \
 		LIBCFLAGS_FOR_TARGET="$(LIBCFLAGS_FOR_TARGET)" \
-		$(EXTRA_HOST_FLAGS) $(POSTSTAGE1_FLAGS_TO_PASS)  \
+		$(EXTRA_HOST_FLAGS) $(POSTSTAGE1_FLAGS_TO_PASS) AM_CFLAGS="-DNO_ASM" \
 		TFLAGS="$(STAGE2_TFLAGS)

Re: [RFC] Update gmp/mpfr/mpc minimum versions

2016-04-26 Thread Jakub Jelinek
On Tue, Apr 26, 2016 at 06:23:17PM +, Bernd Edlinger wrote:
> as we all know, it's high time now to adjust the minimum supported
> gmp/mpfr/mpc versions for gcc-7.

I'm not saying we shouldn't bump the minimum supported versions, but bumping
them immediately to the latest releases is IMHO undesirable unless the
benefits can justify it.
E.g. I don't consider gmp-6.0.0, or mpfr-3.1.2, or mpc-1.0.2 that obsolete
that I'd have to download a newer one for my daily use.

Jakub


Re: [middle-end][PATCH] Update alignment_for_piecewise_move

2016-04-26 Thread Bernd Schmidt

On 04/26/2016 08:21 PM, Richard Sandiford wrote:

"H.J. Lu"  writes:

I am working a patch to enable SSE, AVX and AVX512 for memcpy/memset
optimization.  x86 backend defines MAX_BITSIZE_MODE_ANY_INT to 128
to keep the OI and XI modes from confusing the compiler into thinking
that these modes could actually be used for computation.  But the OI
and XI modes can be used for data movement with vector instructions.


But doesn't this then open the possibility that a memset or memcpy
will be seen as a "normal" integer operation?  Routines like
simplify_immed_subreg could in principle create a constant integer
for the stored value, which could then be treated by later passes as
a wide_int, breaking the MAX_BITSIZE_MODE_ANY_INT assumption.

Couldn't we move to allowing vector modes instead?  That seems better
than having to pander to the current situation in which every vector
effectively needs an associated integer mode.


I'm actually working on a patch in this area. Haven't gotten to the 
point of allowing vector modes yet, but it's something that I've had on 
my mind; I think it would be a good idea.



Bernd



Re: [PATCH 02/18] make avail_stores a vec

2016-04-26 Thread Richard Sandiford
Another style nit on top of the ones already posted, sorry:

tbsaunde+...@tbsaunde.org writes:
> +void
> +print_rtx_insn_vec (FILE *file, const vec &vec)
> +{
> +  fputc('{', file);

missing space before '('.

Richard


Re: Fix some i386 testcases for -frename-registers

2016-04-26 Thread H.J. Lu
On Tue, Apr 26, 2016 at 5:26 AM, Bernd Schmidt  wrote:
> On 01/29/2016 01:19 PM, Uros Bizjak wrote:
>>>
>>> * gcc.target/i386/avx512bw-vptestmb-1.c: Correct [xyz]mm register
>>> number scans.
>>> * gcc.target/i386/avx512bw-vptestmw-1.c: Likewise.
>>> * gcc.target/i386/avx512bw-vptestnmb-1.c: Likewise.
>>> * gcc.target/i386/avx512bw-vptestnmw-1.c: Likewise.
>>> * gcc.target/i386/avx512cd-vpbroadcastmb2q-1.c: Likewise.
>>> * gcc.target/i386/avx512cd-vpbroadcastmw2d-1.c: Likewise.
>>> * gcc.target/i386/avx512dq-vfpclasspd-1.c: Likewise.
>>> * gcc.target/i386/avx512dq-vfpclassps-1.c: Likewise.
>>> * gcc.target/i386/avx512dq-vinsertf64x2-1.c: Likewise.
>>> * gcc.target/i386/avx512dq-vinserti64x2-1.c: Likewise.
>>> * gcc.target/i386/avx512f-gather-5.c: Likewise.
>>> * gcc.target/i386/avx512f-vptestmd-1.c: Likewise.
>>> * gcc.target/i386/avx512f-vptestmq-1.c: Likewise.
>>> * gcc.target/i386/avx512f-vptestnmd-1.c: Likewise.
>>> * gcc.target/i386/avx512f-vptestnmq-1.c: Likewise.
>>> * gcc.target/i386/avx512f-vrndscaleps-1.c: Likewise.
>>> * gcc.target/i386/avx512vl-vpbroadcastmb2q-1.c: Likewise.
>>> * gcc.target/i386/avx512vl-vpbroadcastmw2d-1.c: Likewise.
>>> * gcc.target/i386/avx512vl-vptestmd-1.c: Likewise.
>>> * gcc.target/i386/avx512vl-vptestmq-1.c: Likewise.
>>> * gcc.target/i386/avx512vl-vptestnmd-1.c: Likewise.
>>> * gcc.target/i386/avx512vl-vptestnmq-1.c: Likewise.
>>> * gcc.target/i386/pr32219-2.c: Allow registers other than %eax in
>>> scans.
>>> * gcc.target/i386/pr32219-4.c: Likewise.
>>> * gcc.target/i386/pr32219-6.c: Likewise.
>>> * gcc.target/i386/pr32219-8.c: Likewise.
>>
>>
>> OK.
>
>
> Now committed.

This

/* { dg-final { scan-assembler-times "vmovdqa64\[
\\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\nxy\]*\\(.{5}(?:\n|\[ \\t\]+#)" 1 {
target nonpic } } } */

fails on x32 since x32 with 32-bit pointers has (%r10d) instead of
(%r10).  .{5} doesn't match.


-- 
H.J.


Re: [RFC] Update gmp/mpfr/mpc minimum versions

2016-04-26 Thread Marc Glisse

On Tue, 26 Apr 2016, Bernd Edlinger wrote:


as we all know, it's high time now to adjust the minimum supported
gmp/mpfr/mpc versions for gcc-7.

So this attached patch is now targeting at only supporting the currently
latest relased gmp-6.1.0, mpfr-3.1.4 and mpc-1.0.3, instead of trying
to support all previous versions, especially in-tree, but also without the
in-tree configuration the previously used versions are really creating
trouble nowadays.


For in-tree builds, requiring the latest version makes sense to me. When 
building gcc with a pre-installed gmp/mpfr/mpc (say the one provided by 
your linux distribution), that's much more painful, and the benefit is not 
as clear. What "trouble" do you have in mind?



To support gmp-6.1.0 in-tree we have to set AM_CFLAGS=-DNO_ASM
as already done with mpfr.  This was already discussed recently on this list.


Note that gmp-6.1.1 is supposed to be out any day now and will not require 
-DNO_ASM anymore.


--
Marc Glisse


Re: [middle-end][PATCH] Update alignment_for_piecewise_move

2016-04-26 Thread H.J. Lu
On Tue, Apr 26, 2016 at 11:31 AM, Bernd Schmidt  wrote:
> On 04/26/2016 08:21 PM, Richard Sandiford wrote:
>>
>> "H.J. Lu"  writes:
>>>
>>> I am working a patch to enable SSE, AVX and AVX512 for memcpy/memset
>>> optimization.  x86 backend defines MAX_BITSIZE_MODE_ANY_INT to 128
>>> to keep the OI and XI modes from confusing the compiler into thinking
>>> that these modes could actually be used for computation.  But the OI
>>> and XI modes can be used for data movement with vector instructions.
>>
>>
>> But doesn't this then open the possibility that a memset or memcpy
>> will be seen as a "normal" integer operation?  Routines like
>> simplify_immed_subreg could in principle create a constant integer
>> for the stored value, which could then be treated by later passes as
>> a wide_int, breaking the MAX_BITSIZE_MODE_ANY_INT assumption.
>>
>> Couldn't we move to allowing vector modes instead?  That seems better
>> than having to pander to the current situation in which every vector
>> effectively needs an associated integer mode.
>
>
> I'm actually working on a patch in this area. Haven't gotten to the point of
> allowing vector modes yet, but it's something that I've had on my mind; I
> think it would be a good idea.
>

I am looking forward to seeing your patch.

Thanks.

-- 
H.J.


Re: [RFC] Update gmp/mpfr/mpc minimum versions

2016-04-26 Thread Bernd Edlinger
On 26.04.2016 20:29, Jakub Jelinek wrote:
> On Tue, Apr 26, 2016 at 06:23:17PM +, Bernd Edlinger wrote:
>> as we all know, it's high time now to adjust the minimum supported
>> gmp/mpfr/mpc versions for gcc-7.
>
> I'm not saying we shouldn't bump the minimum supported versions, but bumping
> them immediately to the latest releases is IMHO undesirable unless the
> benefits can justify it.
> E.g. I don't consider gmp-6.0.0, or mpfr-3.1.2, or mpc-1.0.2 that obsolete
> that I'd have to download a newer one for my daily use.
>
>   Jakub
>

Yes, thanks.

That would be acceptable as well.  I think the patch will work
with these versions too.

I'd even use these versions for download_prerequisites if they
are more widespread in use today.


Bernd.


Re: [PATCH] PR target/70155: Use SSE for TImode load/store

2016-04-26 Thread H.J. Lu
On Tue, Apr 26, 2016 at 9:33 AM, H.J. Lu  wrote:
> On Tue, Apr 26, 2016 at 9:27 AM, Ilya Enkovich  wrote:
>> 2016-04-26 19:20 GMT+03:00 Ilya Enkovich :
>>> 2016-04-26 19:12 GMT+03:00 H.J. Lu :
 On Tue, Apr 26, 2016 at 9:07 AM, Ilya Enkovich  
 wrote:
> 2016-04-26 18:39 GMT+03:00 H.J. Lu :
>> On Tue, Apr 26, 2016 at 8:21 AM, Ilya Enkovich  
>> wrote:
>>> 2016-04-26 18:12 GMT+03:00 H.J. Lu :
 On Tue, Apr 26, 2016 at 8:05 AM, Ilya Enkovich 
  wrote:
> 2016-04-26 17:55 GMT+03:00 H.J. Lu :
>> On Tue, Apr 26, 2016 at 7:15 AM, Ilya Enkovich 
>>  wrote:
>>> 2016-04-26 17:07 GMT+03:00 H.J. Lu :
 On Mon, Apr 25, 2016 at 9:13 AM, Ilya Enkovich 
  wrote:
> 2016-04-25 18:27 GMT+03:00 H.J. Lu :
>>
>> Ilya, can you take a look?
>>
>> Thanks.
>>
>> --
>> H.J.
>
> Hi,
>
> Algorithmic part of the patch looks OK to me except the following 
> piece of code.
>
> +/* Check REF's chain to add new insns into a queue
> +   and find registers requiring conversion.  */
>
> Comment is wrong because you don't have any conversions required 
> for
> your candidates.

 I will fix it.

> +
> +void
> +scalar_chain_64::analyze_register_chain (bitmap candidates, 
> df_ref ref)
> +{
> +  df_link *chain;
> +
> +  gcc_assert (bitmap_bit_p (insns, DF_REF_INSN_UID (ref))
> + || bitmap_bit_p (candidates, DF_REF_INSN_UID 
> (ref)));
> +  add_to_queue (DF_REF_INSN_UID (ref));
> +
> +  for (chain = DF_REF_CHAIN (ref); chain; chain = chain->next)
> +{
> +  unsigned uid = DF_REF_INSN_UID (chain->ref);
> +
> +  if (!NONDEBUG_INSN_P (DF_REF_INSN (chain->ref)))
> +   continue;
> +
> +  if (!DF_REF_REG_MEM_P (chain->ref))
> +   continue;
>
> I believe here you wrongly jump to the next ref intead of 
> actually adding it
> to a queue.  You may just use
>
> gcc_assert (!DF_REF_REG_MEM_P (chain->ref));
>
> because you should'n have a candidate used in address operand.

 I will update.

> +
> +  if (bitmap_bit_p (insns, uid))
> +   continue;
> +
> +  if (bitmap_bit_p (candidates, uid))
> +   add_to_queue (uid);
>
> Probably gcc_assert (bitmap_bit_p (candidates, uid)) since no 
> uses and defs
> out of candidates list are allowed?

 That would be wrong since there are

  while (!bitmap_empty_p (queue))
 {
   insn_uid = bitmap_first_set_bit (queue);
   bitmap_clear_bit (queue, insn_uid);
   bitmap_clear_bit (candidates, insn_uid);
   add_insn (candidates, insn_uid);
 }

 An instruction is a candidate and the bit is cleared when
 analyze_register_chain is called.
>>>
>>> You clear candidates bit but the first thing you do in add_insn is 
>>> set
>>> insns bit.
>>> Thus you should hit:
>>>
>>>   if (bitmap_bit_p (insns, uid))
>>> continue;
>>>
>>> For handled candidates.
>>>
>>> Probably it would be more clear if we keep this clear/set pair
>>> together?  E.g. move
>>> bitmap_clear_bit (candidates, insn_uid) to scalar_chain::add_insn.
>>>
>>
>> After we started processing candidates, we only use candidates
>> to check if an instruction is a candidate, not to check if an
>> instruction is NOT a candidate.
>
> I don't see how it's related to what I said.  My point is that
> when you analyze added insn you shouldn't reach insns which are both
> not in candidates and not in current scalar_chain_64.  That's why I
> think you miss an assert in scalar_chain_64::analyze_register_chain.

 Since all candidates will be processed by

  while (!bitmap_empty_p (queue))
 {
   insn_uid = bitmap_first_set_bit (queue);
   bitmap_clear_bit (queue, insn_uid);
   bitmap_clear_bit (candidates, insn_uid);
   add_insn (candidates, insn_uid);
 }
>>>
>>> This process only queue, not all candid

Re: [RFC] Update gmp/mpfr/mpc minimum versions

2016-04-26 Thread Richard Biener
On April 26, 2016 8:39:28 PM GMT+02:00, Marc Glisse  
wrote:
>On Tue, 26 Apr 2016, Bernd Edlinger wrote:
>
>> as we all know, it's high time now to adjust the minimum supported
>> gmp/mpfr/mpc versions for gcc-7.
>>
>> So this attached patch is now targeting at only supporting the
>currently
>> latest relased gmp-6.1.0, mpfr-3.1.4 and mpc-1.0.3, instead of trying
>> to support all previous versions, especially in-tree, but also
>without the
>> in-tree configuration the previously used versions are really
>creating
>> trouble nowadays.
>
>For in-tree builds, requiring the latest version makes sense to me.
>When 
>building gcc with a pre-installed gmp/mpfr/mpc (say the one provided by
>
>your linux distribution), that's much more painful, and the benefit is
>not 
>as clear. What "trouble" do you have in mind?
>
>> To support gmp-6.1.0 in-tree we have to set AM_CFLAGS=-DNO_ASM
>> as already done with mpfr.  This was already discussed recently on
>this list.
>
>Note that gmp-6.1.1 is supposed to be out any day now and will not
>require 
>-DNO_ASM anymore.

I agree with Marc here - we should only update the version used by download 
prerequisites and document that this is the only supported version for in-tree 
builds.

Richard.



Re: [RFC] Update gmp/mpfr/mpc minimum versions

2016-04-26 Thread Bernd Edlinger
On 26.04.2016 20:39, Marc Glisse wrote:
> On Tue, 26 Apr 2016, Bernd Edlinger wrote:
>
>> as we all know, it's high time now to adjust the minimum supported
>> gmp/mpfr/mpc versions for gcc-7.
>>
>> So this attached patch is now targeting at only supporting the currently
>> latest relased gmp-6.1.0, mpfr-3.1.4 and mpc-1.0.3, instead of trying
>> to support all previous versions, especially in-tree, but also without
>> the
>> in-tree configuration the previously used versions are really creating
>> trouble nowadays.
>
> For in-tree builds, requiring the latest version makes sense to me. When
> building gcc with a pre-installed gmp/mpfr/mpc (say the one provided by
> your linux distribution), that's much more painful, and the benefit is
> not as clear. What "trouble" do you have in mind?
>

For instance PR libstdc++/69881: gmp.h did this:

#define __need_size_t  /* tell gcc stddef.h we only want size_t */
#include  /* for size_t */

I've persuaded Jonathan to work around that in libstdc++.

Of course the in-tree build does work with less versions than
otherwise.

>> To support gmp-6.1.0 in-tree we have to set AM_CFLAGS=-DNO_ASM
>> as already done with mpfr.  This was already discussed recently on
>> this list.
>
> Note that gmp-6.1.1 is supposed to be out any day now and will not
> require -DNO_ASM anymore.
>

Yes, I know.  I mean, I have tested the patch already with the gmp 
snapshots, but was not aware that it will be available so soon.
However I think it would not really hurt to allow a little
more interoperability, even for in-tree builds.

It is just a matter what versions we want to test, I have not really
any idea where the limits will be, just that it can no longer be 4.3.2.


Thanks
Bernd.


  1   2   >