RE: [RFC] [Patch] PR67326 - relax trap assumption by looking at similar DRS

2015-11-27 Thread Kumar, Venkataramanan
Hi Richard,

> -----Original Message-----
> From: Richard Biener [mailto:richard.guent...@gmail.com]
> Sent: Tuesday, November 24, 2015 9:07 PM
> To: Kumar, Venkataramanan
> Cc: Jakub Jelinek (ja...@redhat.com); gcc-patches@gcc.gnu.org
> Subject: Re: [RFC] [Patch] PR67326 - relax trap assumption by looking at
> similar DRS
> 
> On Fri, Nov 20, 2015 at 1:02 PM, Kumar, Venkataramanan
>  wrote:
> > Hi Richard,
> >
> > As per Jakub's suggestion in
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67326, the patch below fixes
> > the regression in tree if-conversion.
> > Basically, it allows if-conversion of a candidate DR if we find a
> > similar DR with the same dimensions and that DR will not trap.
> >
> > To find similar DRs, a hash table is used that hashes offset and DR pairs.
> > The read/written information that was stored for the reference tree is
> > also reused.
> >
> > Also:
> > (1) I guard these checks with -ftree-loop-if-convert-stores and
> > -fno-common.
> > Sometimes vectorization flags also trigger if-conversion.
> > (2) Base DRs are hashed for writes only.
> >
> > gcc/ChangeLog
> > 2015-11-19  Venkataramanan  
> >
> > PR tree-optimization/67326
> > * tree-if-conv.c (offset_DR_map): Define.
> > (struct ifc_dr): Add new tree base_predicate field.
> > (hash_memrefs_baserefs_and_store_DRs_read_written_info): Hash
> > offsets, DR pairs and hash base ref, DR pairs for write type DRs.
> > (ifcvt_memrefs_wont_trap): Guard checks with
> > -ftree-loop-if-convert-stores flag.
> > Check for similar DR that are accessed unconditionally.
> > (if_convertible_loop_p_1): Initialize and delete offset hash
> > maps.
> >
> > gcc/testsuite/ChangeLog
> > 2015-11-19  Venkataramanan  
> > * gcc.dg/tree-ssa/ifc-pr67326.c:  Add new.
> >
> > Regstrapped on x86_64, Ok for trunk?
> 
> +  if (offset)
> +{
> +  offset_master_dr = &offset_DR_map->get_or_insert (offset,&exist3);
> +  if (!exist3)
> +   *offset_master_dr = a;
> +
> +  if (DR_RW_UNCONDITIONALLY (*offset_master_dr) != 1)
> +   DR_RW_UNCONDITIONALLY (*offset_master_dr)
> +   = DR_RW_UNCONDITIONALLY (*master_dr);
> 
> this is fishy - as far as I can see offset_master globs all _candidates_ and
> 
> +  else if (DR_OFFSET (a))
> +{
> +  offset_dr = offset_DR_map->get (DR_OFFSET (a));
> +  if ((DR_RW_UNCONDITIONALLY (*offset_dr) == 1)
> +  && DR_NUM_DIMENSIONS (a) == DR_NUM_DIMENSIONS
> (*offset_dr))
> +   {
> + tree base_tree = get_base_address (DR_REF (a));
> + if (DECL_P (base_tree)
> + && flag_tree_loop_if_convert_stores
> + && decl_binds_to_current_def_p (base_tree)
> + && !TREE_READONLY (base_tree))
> +   return true;
> +   }
> +}
> 
> where with this that actually checks something (DR_NUM_DIMENSIONS is
> not something you can use to identify two arrays with the same domain) will
> then consider DR_RW_UNCONDITIONALLY ORed from all _candidates_ but
> not only from those which really have the same domain.
> 
> You need to do the domain check as part of the hash-map
> hashing/comparing.
> 
> Note that there is no bounds info in the data ref info so you need to
>   a) consider DR_OFFSET + DR_INIT
>   b) verify the access size is the same
>      (TYPE_SIZE_UNIT (TREE_TYPE (dr->ref)))
>   c) verify the base objects are of the same size - note this is somewhat
> difficult as the base object for DR_OFFSET/INIT is starting at
> DR_BASE_ADDRESS so maybe restrict this to ADDR_EXPR 
> DR_BASE_ADDRESS cases where you can look at DECL_SIZE (decl) of both
> candidates
> 
> You can also try using indices (DR_BASE_OBJECT plus DR_ACCESS_FNS when
> DR_UNCONSTRAINED_BASE is false).  If the size of DR_BASE_OBJECT
> matches and all access functions are equal it should be a compatible enough
> case as well.

OK, I will take some time to figure out the domain analysis part.
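
One way to picture the suggested domain check is a map keyed on everything that must match, so that the unconditional-access flag is never shared between references with different domains. The following is only a sketch in standalone C++, not GCC code: access_key, access_key_hash, rw_map, record_access and known_unconditional are hypothetical names, and the three fields merely stand in for DR_OFFSET + DR_INIT, the access size and the base object size.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <unordered_map>

// Hypothetical summary of one data reference; not GCC's data_reference.
struct access_key
{
  long offset_plus_init;   // stands in for DR_OFFSET + DR_INIT
  unsigned access_size;    // stands in for the size of the accessed type
  unsigned base_size;      // stands in for the size of the base object

  bool operator== (const access_key &other) const
  {
    return offset_plus_init == other.offset_plus_init
           && access_size == other.access_size
           && base_size == other.base_size;
  }
};

// All three fields feed the hash, so the domain check is part of hashing
// and comparing rather than a separate test after the lookup.
struct access_key_hash
{
  std::size_t operator() (const access_key &k) const
  {
    std::size_t h = std::hash<long> () (k.offset_plus_init);
    h = h * 31 + std::hash<unsigned> () (k.access_size);
    h = h * 31 + std::hash<unsigned> () (k.base_size);
    return h;
  }
};

// Maps a key to "some access with exactly this key is unconditional".
typedef std::unordered_map<access_key, bool, access_key_hash> rw_map;

void
record_access (rw_map &map, const access_key &key, bool unconditional)
{
  bool &slot = map[key];   // value-initialized to false on first insert
  slot = slot || unconditional;
}

bool
known_unconditional (const rw_map &map, const access_key &key)
{
  rw_map::const_iterator it = map.find (key);
  return it != map.end () && it->second;
}
```

With this shape, two references that share an offset but sit in base objects of different sizes get separate entries, so an unconditional access to one says nothing about the other.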

> 
> I'd say you should split out the base_predicate introduction into a separate
> patch (this change looks ok).
> 

The attached patch contains the "base_predicate" introduction part alone.
It does the predicate folding and, while hashing, hashes base references
only for write type DRs.
I have not added any new test case since we already have ifc-8.c.

I also fixed the formatting issues Jakub pointed out for this patch.

Bootstrapped on x86_64.

Ok to upstream if it passes regression tests?

gcc/ChangeLog
2015-11-27  Venkataramanan Kumar  

* tree-if-conv.c (struct ifc_dr): Add new tree 
base_predicate field.
(hash_memrefs_baserefs_and_store_DRs_read_written_info): Hash 
base ref, DR pairs and store base_predicate for write type DRs.
(ifcvt_memrefs_wont_trap): Guard checks with
-ftree-loop-if-convert-stores flag

Regards,
Venkat
diff --git a/gcc/tree-if-conv.c b/gcc/tree-if-conv.c
index 01065cb..f43942d 100644
--- a/gcc/tree-if-conv.c
+++ b/gcc/tree-if-conv.c
@@ -589,6 +589,8 @@ struct

Re: [PATCH 7/N] Fix newly introduced memory leak in tree-ssa-loop-ivopts.c

2015-11-27 Thread Martin Liška
On 11/27/2015 04:54 AM, Bin.Cheng wrote:
> On Fri, Nov 27, 2015 at 5:08 AM, Martin Liška  wrote:
>> Hi.
>>
>> There's one more patch that fixes really a lot of memory leaks related to loop
>> ivopts.
>> The regression was introduced by r230647.
>>
>> Patch was tested in the series with the rest and the compiler bootstraps
>> successfully.
>>
>> Ready for trunk?
> 
> Hi Martin,
> Thanks for fixing my issue.  The IVO part of patch is OK.
> Just for me to understand, iv_common_cand is freed via free_ptr_hash,
> and thus typed_free_remove.  So what leaks is the iv_use * vector in
> struct iv_common_cand, right?  I did forget to free that.

Hi.

You are right; the suggested patch uses the delete operator to deallocate the
iv_common_cand structure. That eventually calls the destructor of auto_vec.
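
To illustrate why the choice of deallocation matters, here is a minimal standalone sketch. It is not the ivopts code: candidate, make_candidate and release_candidate are hypothetical names, and std::vector stands in for auto_vec. The point is only that delete runs the destructor, which releases the vector's heap buffer, whereas freeing the raw storage of the struct would skip that step.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for iv_common_cand; std::vector plays the role
// of auto_vec<iv_use *> here.
struct candidate
{
  std::vector<int> uses;
  static int destroyed;           // counts destructor runs, for illustration
  ~candidate () { ++destroyed; }  // `uses` is destroyed right after this body
};
int candidate::destroyed = 0;

// Allocate with new so that delete is the matching deallocation.
candidate *
make_candidate (int n_uses)
{
  candidate *c = new candidate ();
  for (int i = 0; i < n_uses; ++i)
    c->uses.push_back (i);
  return c;
}

// delete runs ~candidate, which in turn destroys `uses` and releases its
// heap buffer.
void
release_candidate (candidate *c)
{
  delete c;
}
```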

> BTW, how do you monitor memory use in GCC, maybe I can run same test
> for my future patches.

I've been working on removing memory leaks using valgrind. Just configure the
compiler with '--enable-valgrind-annotations' and run, for instance:

valgrind --leak-check=yes --trace-children=yes ./gcc/xgcc -Bgcc ../gcc/testsuite/gcc.dg/tree-ssa/loop-32.c -c -O2

Producing:
...
==13919== 216 bytes in 3 blocks are definitely lost in loss record 679 of 795
==13919==    at 0x4C2A00F: malloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==13919==    by 0x107CEDF: xrealloc (xmalloc.c:178)
==13919==    by 0xAC46AA: reserve (vec.h:288)
==13919==    by 0xAC46AA: reserve (vec.h:1406)
==13919==    by 0xAC46AA: reserve_exact (vec.h:1426)
==13919==    by 0xAC46AA: create (vec.h:1441)
==13919==    by 0xAC46AA: record_common_cand(ivopts_data*, tree_node*, tree_node*, iv_use*) (tree-ssa-loop-ivopts.c:3133)
==13919==    by 0xAC49C5: add_iv_candidate_for_use(ivopts_data*, iv_use*) (tree-ssa-loop-ivopts.c:3220)
==13919==    by 0xAC4EA2: add_iv_candidate_for_uses (tree-ssa-loop-ivopts.c:3294)
==13919==    by 0xAC4EA2: find_iv_candidates(ivopts_data*) (tree-ssa-loop-ivopts.c:5705)
==13919==    by 0xAC839D: tree_ssa_iv_optimize_loop (tree-ssa-loop-ivopts.c:7708)
==13919==    by 0xAC839D: tree_ssa_iv_optimize() (tree-ssa-loop-ivopts.c:7758)
==13919==    by 0xADE4D0: (anonymous namespace)::pass_iv_optimize::execute(function*) (tree-ssa-loop.c:520)
==13919==    by 0x920033: execute_one_pass(opt_pass*) (passes.c:2335)
==13919==    by 0x920547: execute_pass_list_1(opt_pass*) [clone .constprop.84] (passes.c:2408)
==13919==    by 0x920559: execute_pass_list_1(opt_pass*) [clone .constprop.84] (passes.c:2409)
==13919==    by 0x920559: execute_pass_list_1(opt_pass*) [clone .constprop.84] (passes.c:2409)
==13919==    by 0x9205A4: execute_pass_list(function*, opt_pass*) (passes.c:2419)
...

Martin

> 
> Thanks,
> bin
> 



[PATCH][1/2] Fix PR68553

2015-11-27 Thread Richard Biener

This is part 1 of a fix for PR68553 which shows that some targets
cannot can_vec_perm_p on an identity permutation.  I chose to fix
this in the vectorizer by detecting the identity itself but with
the current structure of vect_transform_slp_perm_load this is
somewhat awkward.  Thus the following no-op patch simplifies it
greatly (from the times it was restricted to do interleaving-kind
of permutes).  It turned out to not be 100% no-op as we now can
handle non-adjacent source operands so I split it out from the
actual fix.

The two adjusted testcases no longer fail to vectorize because
of "need three vectors" but unadjusted would fail because there
are simply not enough scalar iterations in the loop.  I adjusted
that and now we vectorize it just fine (running into PR68559
which I filed).

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2015-11-27  Richard Biener  

PR tree-optimization/68553
* tree-vect-slp.c (vect_get_mask_element): Remove.
(vect_transform_slp_perm_load): Implement in a simpler way.

* gcc.dg/vect/pr45752.c: Adjust.
* gcc.dg/vect/slp-perm-4.c: Likewise.

Index: gcc/tree-vect-slp.c
===
*** gcc/tree-vect-slp.c (revision 230962)
--- gcc/tree-vect-slp.c (working copy)
*** vect_create_mask_and_perm (gimple *stmt,
*** 3241,3342 
  }
  
  
- /* Given FIRST_MASK_ELEMENT - the mask element in element representation,
-return in CURRENT_MASK_ELEMENT its equivalent in target specific
-representation.  Check that the mask is valid and return FALSE if not.
-Return TRUE in NEED_NEXT_VECTOR if the permutation requires to move to
-the next vector, i.e., the current first vector is not needed.  */
- 
- static bool
- vect_get_mask_element (gimple *stmt, int first_mask_element, int m,
-int mask_nunits, bool only_one_vec, int index,
-  unsigned char *mask, int *current_mask_element,
-bool *need_next_vector, int *number_of_mask_fixes,
-bool *mask_fixed, bool *needs_first_vector)
- {
-   int i;
- 
-   /* Convert to target specific representation.  */
-   *current_mask_element = first_mask_element + m;
-   /* Adjust the value in case it's a mask for second and third vectors.  */
-   *current_mask_element -= mask_nunits * (*number_of_mask_fixes - 1);
- 
-   if (*current_mask_element < 0)
- {
-   if (dump_enabled_p ())
-   {
- dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-  "permutation requires past vector ");
- dump_gimple_stmt (MSG_MISSED_OPTIMIZATION, TDF_SLIM, stmt, 0);
- dump_printf (MSG_MISSED_OPTIMIZATION, "\n");
-   }
-   return false;
- }
- 
-   if (*current_mask_element < mask_nunits)
- *needs_first_vector = true;
- 
-   /* We have only one input vector to permute but the mask accesses values in
-  the next vector as well.  */
-   if (only_one_vec && *current_mask_element >= mask_nunits)
- {
-   if (dump_enabled_p ())
- {
-   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-  "permutation requires at least two vectors ");
-   dump_gimple_stmt (MSG_MISSED_OPTIMIZATION, TDF_SLIM, stmt, 0);
-   dump_printf (MSG_MISSED_OPTIMIZATION, "\n");
- }
- 
-   return false;
- }
- 
-   /* The mask requires the next vector.  */
-   while (*current_mask_element >= mask_nunits * 2)
- {
-   if (*needs_first_vector || *mask_fixed)
- {
-   /* We either need the first vector too or have already moved to the
-  next vector. In both cases, this permutation needs three
-  vectors.  */
-   if (dump_enabled_p ())
- {
-   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-  "permutation requires at "
-  "least three vectors ");
-   dump_gimple_stmt (MSG_MISSED_OPTIMIZATION, TDF_SLIM, stmt, 0);
-   dump_printf (MSG_MISSED_OPTIMIZATION, "\n");
- }
- 
-   return false;
- }
- 
-   /* We move to the next vector, dropping the first one and working with
-  the second and the third - we need to adjust the values of the mask
-  accordingly.  */
-   *current_mask_element -= mask_nunits * *number_of_mask_fixes;
- 
-   for (i = 0; i < index; i++)
- mask[i] -= mask_nunits * *number_of_mask_fixes;
- 
-   (*number_of_mask_fixes)++;
-   *mask_fixed = true;
- }
- 
-   *need_next_vector = *mask_fixed;
- 
-   /* This was the last element of this mask. Start a new one.  */
-   if (index == mask_nunits - 1)
- {
-   *number_of_mask_fixes = 1;
-   *mask_fixed = false;
-   *needs_first_vector = false;
- }
- 
-   return true;
- }
- 
- 
  /* Generate vector permute stateme

Re: [PATCH 7/N] Fix newly introduced memory leak in tree-ssa-loop-ivopts.c

2015-11-27 Thread Bin.Cheng
On Fri, Nov 27, 2015 at 4:29 PM, Martin Liška  wrote:
> On 11/27/2015 04:54 AM, Bin.Cheng wrote:
>> On Fri, Nov 27, 2015 at 5:08 AM, Martin Liška  wrote:
>>> Hi.
>>>
>>> There's one more patch that fixes really a lot of memory leaks related to loop
>>> ivopts.
>>> The regression was introduced by r230647.
>>>
>>> Patch was tested in the series with the rest and the compiler bootstraps
>>> successfully.
>>>
>>> Ready for trunk?
>>
>> Hi Martin,
>> Thanks for fixing my issue.  The IVO part of patch is OK.
>> Just for me to understand, iv_common_cand is freed via free_ptr_hash,
>> and thus typed_free_remove.  So what leaks is the iv_use * vector in
>> struct iv_common_cand, right?  I did forget to free that.
>
> Hi.
>
> You are right; the suggested patch uses the delete operator to deallocate the
> iv_common_cand structure. That eventually calls the destructor of auto_vec.
>
>> BTW, how do you monitor memory use in GCC, maybe I can run same test
>> for my future patches.
>
> I've been working on removing memory leaks using valgrind. Just configure
> the compiler with '--enable-valgrind-annotations' and run, for instance:
>
> valgrind --leak-check=yes --trace-children=yes ./gcc/xgcc -Bgcc ../gcc/testsuite/gcc.dg/tree-ssa/loop-32.c -c -O2
>
> Producing:
> ...
> ==13919== 216 bytes in 3 blocks are definitely lost in loss record 679 of 795
> ==13919==    at 0x4C2A00F: malloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
> ==13919==    by 0x107CEDF: xrealloc (xmalloc.c:178)
> ==13919==    by 0xAC46AA: reserve (vec.h:288)
> ==13919==    by 0xAC46AA: reserve (vec.h:1406)
> ==13919==    by 0xAC46AA: reserve_exact (vec.h:1426)
> ==13919==    by 0xAC46AA: create (vec.h:1441)
> ==13919==    by 0xAC46AA: record_common_cand(ivopts_data*, tree_node*, tree_node*, iv_use*) (tree-ssa-loop-ivopts.c:3133)
> ==13919==    by 0xAC49C5: add_iv_candidate_for_use(ivopts_data*, iv_use*) (tree-ssa-loop-ivopts.c:3220)
> ==13919==    by 0xAC4EA2: add_iv_candidate_for_uses (tree-ssa-loop-ivopts.c:3294)
> ==13919==    by 0xAC4EA2: find_iv_candidates(ivopts_data*) (tree-ssa-loop-ivopts.c:5705)
> ==13919==    by 0xAC839D: tree_ssa_iv_optimize_loop (tree-ssa-loop-ivopts.c:7708)
> ==13919==    by 0xAC839D: tree_ssa_iv_optimize() (tree-ssa-loop-ivopts.c:7758)
> ==13919==    by 0xADE4D0: (anonymous namespace)::pass_iv_optimize::execute(function*) (tree-ssa-loop.c:520)
> ==13919==    by 0x920033: execute_one_pass(opt_pass*) (passes.c:2335)
> ==13919==    by 0x920547: execute_pass_list_1(opt_pass*) [clone .constprop.84] (passes.c:2408)
> ==13919==    by 0x920559: execute_pass_list_1(opt_pass*) [clone .constprop.84] (passes.c:2409)
> ==13919==    by 0x920559: execute_pass_list_1(opt_pass*) [clone .constprop.84] (passes.c:2409)
> ==13919==    by 0x9205A4: execute_pass_list(function*, opt_pass*) (passes.c:2419)
> ...

Thanks for the explanation; I will do that in the future.

Thanks,
bin


Re: [patch] Copy-edit the Option Summary in invoke.texi

2015-11-27 Thread Bernd Schmidt

On 11/26/2015 01:16 PM, Jonathan Wakely wrote:

> At https://gcc.gnu.org/onlinedocs/gcc/Option-Summary.html we document
> -Waggressive-loop-optimizations but you can't find that option at
> https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html because we
> document -Wno-aggressive-loop-optimizations instead. Similarly, you
> can't find -Wpedantic-ms-format in the full listing, because we
> document the negative form, -Wno-pedantic-ms-format, but list *both*
> in the summary. This patch fixes those mistakes.


I'm guessing we want the negative form documented for anything that has 
Init(1)? Ok.



> I've also tried to put the list back into alphabetical order, and
> re-justified the list a bit to avoid some especially short lines (I
> don't understand the inconsistent use of single or double spaces
> between options, so if there's some logic to that I've not followed
> it, but I think this is an improvement).


No idea about single vs double spaces. One line has two @gol which seems 
like an error, you can fix that too if you like.



Bernd



Re: [PR67335] drop dummy zero from reverse VTA ops, fix infinite recursion

2015-11-27 Thread Jakub Jelinek
On Thu, Nov 26, 2015 at 09:45:06PM -0200, Alexandre Oliva wrote:
> VTA's cselib expression hashing compares expressions with the same
> hash before adding them to the hash table.  When there is a collision
> involving a self-referencing expression, we could get infinite
> recursion, in spite of the cycle breakers already in place.  The
> problem is currently latent in the trunk, because by chance we don't
> get a collision.
> 
> Such value cycles are often introduced by reverse_op; most often,
> they're indirect, and then value canonicalization takes care of the
> cycle, but if the reverse operation simplifies to the original value,
> we used to issue a (plus V (const_int 0)), because at some point
> adding a plain value V to a location list as a reverse_op equivalence
> caused other problems.
> 
> (Jakub, do you by any chance still remember what those problems were,
>  some 5+ years ago?)

I'm sorry, but I don't remember.  Perhaps it was before some recursion
prevention was added or whatever, maybe your own PR52001?

Have you checked if your patch results in any significant debug info quality
changes (say on cc1plus itself, using dwlocstat or just comparing
.debug_info/.debug_loc sizes)?

Jakub


[PATCH][2/2] Fix PR68553

2015-11-27 Thread Richard Biener

This detects 1:1 permutations and avoids asking the target if it can
create those as well as generating VEC_PERM_EXPR in the vectorized code.
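
The 1:1 check itself is small. As a standalone sketch (not the patch's code; is_identity_permutation is a hypothetical helper and the mask is assumed to be a plain vector of element indices), a mask is a no-op exactly when every position selects its own index:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns true when MASK selects element i for every position i, i.e.
// the permutation is the identity and no shuffle is needed.
bool
is_identity_permutation (const std::vector<unsigned char> &mask)
{
  for (std::size_t i = 0; i < mask.size (); ++i)
    if (static_cast<std::size_t> (mask[i]) != i)
      return false;
  return true;
}
```

A mask that only picks elements from the second input vector, such as 4 5 6 7 for four-element vectors, is deliberately not treated as an identity by this sketch.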

Bootstrap / regtest in progress on x86_64-unknown-linux-gnu.

Richard.

2015-11-27  Richard Biener  

PR tree-optimization/68553
* tree-vect-slp.c (vect_create_mask_and_perm): Skip VEC_PERM_EXPR
generation for 1:1 permutations.
(vect_transform_slp_perm_load): Detect 1:1 permutations.

Index: gcc/tree-vect-slp.c
===
*** gcc/tree-vect-slp.c (revision 230993)
--- gcc/tree-vect-slp.c (working copy)
*** vect_create_mask_and_perm (gimple *stmt,
*** 3224,3235 
first_vec = dr_chain[first_vec_indx];
second_vec = dr_chain[second_vec_indx];
  
!   /* Generate the permute statement.  */
!   perm_stmt = gimple_build_assign (perm_dest, VEC_PERM_EXPR,
!  first_vec, second_vec, mask);
!   data_ref = make_ssa_name (perm_dest, perm_stmt);
!   gimple_set_lhs (perm_stmt, data_ref);
!   vect_finish_stmt_generation (stmt, perm_stmt, gsi);
  
/* Store the vector statement in NODE.  */
SLP_TREE_VEC_STMTS (node)[stride_out * i + vect_stmts_counter]
--- 3253,3270 
first_vec = dr_chain[first_vec_indx];
second_vec = dr_chain[second_vec_indx];
  
!   /* Generate the permute statement if necessary.  */
!   if (mask)
!   {
! perm_stmt = gimple_build_assign (perm_dest, VEC_PERM_EXPR,
!  first_vec, second_vec, mask);
! data_ref = make_ssa_name (perm_dest, perm_stmt);
! gimple_set_lhs (perm_stmt, data_ref);
! vect_finish_stmt_generation (stmt, perm_stmt, gsi);
!   }
!   else
!   /* If mask was NULL_TREE generate the requested identity transform.  */
!   perm_stmt = SSA_NAME_DEF_STMT (first_vec);
  
/* Store the vector statement in NODE.  */
SLP_TREE_VEC_STMTS (node)[stride_out * i + vect_stmts_counter]
*** vect_transform_slp_perm_load (slp_tree n
*** 3315,3320 
--- 3350,3356 
int index = 0;
int first_vec_index = -1;
int second_vec_index = -1;
+   bool noop_p = true;
  
for (int j = 0; j < unroll_factor; j++)
  {
*** vect_transform_slp_perm_load (slp_tree n
*** 3351,3361 
  
  gcc_assert (mask_element >= 0
  && mask_element < 2 * nunits);
  mask[index++] = mask_element;
  
  if (index == nunits)
{
! if (!can_vec_perm_p (mode, false, mask))
{
  if (dump_enabled_p ())
{
--- 3387,3400 
  
  gcc_assert (mask_element >= 0
  && mask_element < 2 * nunits);
+ if (mask_element != index)
+   noop_p = false;
  mask[index++] = mask_element;
  
  if (index == nunits)
{
! if (! noop_p
! && ! can_vec_perm_p (mode, false, mask))
{
  if (dump_enabled_p ())
{
*** vect_transform_slp_perm_load (slp_tree n
*** 3371,3381 
  
  if (!analyze_only)
{
! tree mask_vec, *mask_elts;
! mask_elts = XALLOCAVEC (tree, nunits);
! for (int l = 0; l < nunits; ++l)
!   mask_elts[l] = build_int_cst (mask_element_type, mask[l]);
! mask_vec = build_vector (mask_type, mask_elts);
  
  if (second_vec_index == -1)
second_vec_index = first_vec_index;
--- 3410,3425 
  
  if (!analyze_only)
{
! tree mask_vec = NULL_TREE;
! 
! if (! noop_p)
!   {
! tree *mask_elts = XALLOCAVEC (tree, nunits);
! for (int l = 0; l < nunits; ++l)
!   mask_elts[l] = build_int_cst (mask_element_type,
! mask[l]);
! mask_vec = build_vector (mask_type, mask_elts);
!   }
  
  if (second_vec_index == -1)
second_vec_index = first_vec_index;
*** vect_transform_slp_perm_load (slp_tree n
*** 3388,3393 
--- 3432,3438 
  index = 0;
  first_vec_index = -1;
  second_vec_index = -1;
+ noop_p = true;
}
}
  }


regrename/i386: ROP vs df and stack-regs

2015-11-27 Thread Bernd Schmidt
This is a patch for PRs 68471 and 68472, which show problems with the 
ROP mitigation:

 * reg-stack doesn't call df_insn_update when it makes changes, and
   if df checking is enabled, any subsequent df_analyze call will
   abort
 * Using -mcmodel=medium fails because of a pattern that has lea type
   and needs its modrm_class overridden.

Both of these are fixed in the i386 backend. As a further safety 
measure, I've added some extra code to regrename to ignore stack regs 
after regstack_completed - they can't be dealt with anymore.
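
The regrename part boils down to setting one bit per operand that must be left alone. A rough standalone sketch of that idea follows; it is not the regrename.c code, untracked_stack_operands is a hypothetical helper, and the register range constants are illustrative placeholders for whatever the target's FIRST_STACK_REG and LAST_STACK_REG expand to.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Illustrative placeholders; the real bounds come from the target's
// FIRST_STACK_REG / LAST_STACK_REG macros.
const unsigned first_stack_reg = 8;
const unsigned last_stack_reg = 15;

// Returns a bitmask with bit I set when operand I is a register in the
// stack-register range and should therefore not be tracked.  Nothing is
// marked before the stack-register pass has completed.
unsigned
untracked_stack_operands (const std::vector<unsigned> &operand_regnos,
                          bool regstack_completed)
{
  assert (operand_regnos.size () <= 32);  // one bit per operand
  unsigned untracked = 0;
  if (!regstack_completed)
    return untracked;
  for (std::size_t i = 0; i < operand_regnos.size (); ++i)
    if (operand_regnos[i] >= first_stack_reg
        && operand_regnos[i] <= last_stack_reg)
      untracked |= 1u << i;
  return untracked;
}
```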


Bootstrapped and tested on x86_64-linux, with -mmitigate-rop forced on. Ok?


Bernd
	PR target/68471
	PR target/68472
	* config/i386/i386.c (ix86_mitigate_rop): Don't call
	compute_bb_for_insn again.  Call df_insn_rescan_all.
	* config/i386/i386.md (set_got_rex64): Override modrm_class.

	* regrename.c (build_def_use): Ignore stack regs if regstack_completed.

testsuite/
	* gcc.target/i386/rop1.c: New test.

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 2ac6c25..14c99eb 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -45243,8 +45243,9 @@ ix86_mitigate_rop (void)
   COPY_HARD_REG_SET (inout_risky, input_risky);
   IOR_HARD_REG_SET (inout_risky, output_risky);
 
-  compute_bb_for_insn ();
   df_note_add_problem ();
+  /* Fix up what stack-regs did.  */
+  df_insn_rescan_all ();
   df_analyze ();
 
   regrename_init (true);
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index a57d165..671580d 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -12418,6 +12418,7 @@
   "lea{q}\t{_GLOBAL_OFFSET_TABLE_(%%rip), %0|%0, _GLOBAL_OFFSET_TABLE_[rip]}"
   [(set_attr "type" "lea")
(set_attr "length_address" "4")
+   (set_attr "modrm_class" "unknown")
(set_attr "mode" "DI")])
 
 (define_insn "set_rip_rex64"
diff --git a/gcc/regrename.c b/gcc/regrename.c
index e2a1e83..47c8dfa 100644
--- a/gcc/regrename.c
+++ b/gcc/regrename.c
@@ -1685,6 +1685,13 @@ build_def_use (basic_block bb)
 		  && !verify_reg_tracked (op))
 		create_new_chain (REGNO (op), REG_NREGS (op), NULL, NULL,
   NO_REGS);
+#ifdef STACK_REGS
+	  if (regstack_completed
+		  && REG_P (recog_data.operand[i])
+		  && IN_RANGE (REGNO (recog_data.operand[i]),
+			   FIRST_STACK_REG, LAST_STACK_REG))
+		untracked_operands |= 1 << i;
+#endif
 	}
 
 	  if (fail_current_block)
--- /dev/null	2015-11-23 12:05:22.553607702 +0100
+++ gcc/testsuite/gcc.target/i386/rop1.c	2015-11-24 15:40:04.381086953 +0100
@@ -0,0 +1,7 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target lp64 } */
+/* { dg-options "-mcmodel=medium -mmitigate-rop" } */
+void
+foo (void)
+{
+}


Re: [PATCH 1/6] Fix memory leak in cilk

2015-11-27 Thread Bernd Schmidt

On 11/26/2015 09:59 PM, Martin Liška wrote:

> I'm sending v2 of the patch, where I removed adding of 'const' to
> certain function arguments.
> Apart from that, I found one more leak related to cilk. As I've retested
> in valgrind, there should not be any memory leak related to cilk.
>
> Ready to be installed?


Already improved, wasn't it? (Please don't quote the entire previous 
patch next time).



Bernd



Re: [AArch64] Rework ARMv8.1 command line options.

2015-11-27 Thread Matthew Wahab

On 24/11/15 15:22, James Greenhalgh wrote:
> On Mon, Nov 16, 2015 at 04:31:32PM +, Matthew Wahab wrote:
>>
>> The command line options for target selection allow ARMv8.1 extensions
>> to be individually enabled/disabled. They also allow the extensions to
>> be enabled with -march=armv8-a. This doesn't reflect the ARMv8.1
>> architecture which requires all extensions to be enabled and doesn't make
>> them available for ARMv8.
>>
>> This patch removes the options for the individual ARMv8.1 extensions
>> except for +lse. This means that setting -march=armv8.1-a will enable
>> all extensions required by ARMv8.1 and that the ARMv8.1 extensions can't
>> be used with -march=armv8.

> I think I mentioned it in another review, but this patch seems a good place
> to solve the problem. Could you please update the documentation to explain
> what you've written above. As it stands I find myself confused by which
> features GCC will make available at -march=armv8-a and -march=armv8.1-a.

Attached is a patch with the documentation for the AArch64 -march option
reworked to try to make it clearer what the -march=armv8.1-a option will
do. Extensions with feature modifiers (+crc, +lse) are explicitly stated
as being enabled by -march=armv8.1-a. Extensions without feature
modifiers (RDMA, PAN, LOR) are treated as part of the generic 'ARMv8.1
architecture extension' term in the description of -march=armv8.1-a.

I've also rearranged the -march section, to put the description of the
values for -march together and reworded the description of the
-march=native option.

Matthew

2015-11-26  Matthew Wahab  

* config/aarch64/aarch64-option-extensions.def: Remove
AARCH64_FL_RDMA from "fp" and "simd".  Remove "pan", "lor",
"rdma".
* config/aarch64/aarch64.h (AARCH64_FL_PAN): Remove.
(AARCH64_FL_LOR): Remove.
(AARCH64_FL_RDMA): Remove.
(AARCH64_FL_V8_1): New.
(AARCH64_FL_FOR_ARCH8_1): Replace AARCH64_FL_PAN, AARCH64_FL_LOR
and AARCH64_FL_RDMA with AARCH64_FL_V8_1.
(AARCH64_ISA_RDMA): Replace AARCH64_FL_RDMA with AARCH64_FL_V8_1.
* doc/invoke.texi (AArch64 -march): Rewrite initial paragraph and
section on -march=native.  Group descriptions of permitted
architecture names together.  Expand description of
-march=armv8.1-a.
(AArch64 -mtune): Slightly rework section on -march=native.
(AArch64 -mcpu): Slightly rework section on -march=native.
(AArch64 Feature Modifiers): Remove "pan", "lor" and "rdma".
State that -march=armv8.1-a enables "crc" and "lse".

diff --git a/gcc/config/aarch64/aarch64-option-extensions.def b/gcc/config/aarch64/aarch64-option-extensions.def
index b261a0f7c3c6f5264fe4f95c85a59535aa951ce4..4f1d53515a9a4ff8920fadb13164c85e39990db5 100644
--- a/gcc/config/aarch64/aarch64-option-extensions.def
+++ b/gcc/config/aarch64/aarch64-option-extensions.def
@@ -34,11 +34,10 @@
should contain a whitespace-separated list of the strings in 'Features'
that are required.  Their order is not important.  */
 
-AARCH64_OPT_EXTENSION("fp",	AARCH64_FL_FP,  AARCH64_FL_FPSIMD | AARCH64_FL_CRYPTO | AARCH64_FL_RDMA, "fp")
-AARCH64_OPT_EXTENSION("simd",	AARCH64_FL_FPSIMD,  AARCH64_FL_SIMD | AARCH64_FL_CRYPTO | AARCH64_FL_RDMA,   "asimd")
+AARCH64_OPT_EXTENSION ("fp", AARCH64_FL_FP,
+		   AARCH64_FL_FPSIMD | AARCH64_FL_CRYPTO, "fp")
+AARCH64_OPT_EXTENSION ("simd", AARCH64_FL_FPSIMD,
+		   AARCH64_FL_SIMD | AARCH64_FL_CRYPTO, "asimd")
 AARCH64_OPT_EXTENSION("crypto",	AARCH64_FL_CRYPTO | AARCH64_FL_FPSIMD,  AARCH64_FL_CRYPTO,   "aes pmull sha1 sha2")
 AARCH64_OPT_EXTENSION("crc",	AARCH64_FL_CRC, AARCH64_FL_CRC,"crc32")
 AARCH64_OPT_EXTENSION("lse",	AARCH64_FL_LSE, AARCH64_FL_LSE,"lse")
-AARCH64_OPT_EXTENSION("pan",	AARCH64_FL_PAN,		AARCH64_FL_PAN,		"pan")
-AARCH64_OPT_EXTENSION("lor",	AARCH64_FL_LOR,		AARCH64_FL_LOR,		"lor")
-AARCH64_OPT_EXTENSION("rdma",	AARCH64_FL_RDMA | AARCH64_FL_FPSIMD,	AARCH64_FL_RDMA,	"rdma")
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index 68c006fa91f6326140cf447c7f4578ac46c24f79..06345f0215ea190b7b089264a0039a201437ecec 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -134,9 +134,7 @@ extern unsigned aarch64_architecture_version;
 #define AARCH64_FL_CRC	  (1 << 3)  /* Has CRC.  */
 /* ARMv8.1 architecture extensions.  */
 #define AARCH64_FL_LSE	  (1 << 4)  /* Has Large System Extensions.  */
-#define AARCH64_FL_PAN	  (1 << 5)  /* Has Privileged Access Never.  */
-#define AARCH64_FL_LOR	  (1 << 6)  /* Has Limited Ordering regions.  */
-#define AARCH64_FL_RDMA	  (1 << 7)  /* Has ARMv8.1 Adv.SIMD.  */
+#define AARCH64_FL_V8_1	  (1 << 5)  /* Has ARMv8.1 extensions.  */
 
 /* Has FP and SIMD.  */
 #define AARCH64_FL_FPSIMD (AARCH64_FL_FP |

Re: basic asm and memory clobbers

2015-11-27 Thread Bernd Edlinger
Hi,


On Tue, 17 Nov 2015 14:31:29, Jeff Law wrote:
> The benefit is traditional asms do the expected thing. With no way to
> describe dataflow, the only rational behaviour for a traditional asm is that
> it has to be considered a use/clobber of memory and hard registers.


I'd like to mention here that there is also another use case for basic asms:

It is most often used as a fairly portable memory barrier like this:

x = 1;
asm(""); // memory barrier
y = 2;

That is also the reason why every basic asm is implicitly a volatile asm.
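
For comparison, the barrier can also be spelled with an explicit clobber, which does not rely on how a basic asm is treated. This is a sketch assuming GCC-style extended asm (also accepted by Clang); compiler_barrier, store_in_order, shared_x and shared_y are hypothetical names. It constrains the compiler only and emits no hardware fence instruction.

```cpp
#include <cassert>

int shared_x, shared_y;

// Extended asm with an explicit "memory" clobber: the compiler must not
// keep memory values cached in registers across it or move memory
// accesses past it.  No instruction is emitted for the CPU.
static inline void
compiler_barrier ()
{
  __asm__ __volatile__ ("" ::: "memory");
}

// The store to shared_x is issued by the compiler before the store to
// shared_y.
void
store_in_order ()
{
  shared_x = 1;
  compiler_barrier ();
  shared_y = 2;
}
```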


Bernd.

Re: regrename/i386: ROP vs df and stack-regs

2015-11-27 Thread Eric Botcazou
> Both of these are fixed in the i386 backend. As a further safety
> measure, I've added some extra code to regrename to ignore stack regs
> after regstack_complete - they can't be dealt with anymore.

+#ifdef STACK_REGS
+ if (regstack_completed
+ && REG_P (recog_data.operand[i])
+ && IN_RANGE (REGNO (recog_data.operand[i]),
+  FIRST_STACK_REG, LAST_STACK_REG))
+   untracked_operands |= 1 << i;
+#endif

Why not use "op" instead of recog_data.operand[i] here?  Doesn't this need to be
placed before the conditional call to create_new_chain?

-- 
Eric Botcazou


Re: regrename/i386: ROP vs df and stack-regs

2015-11-27 Thread Bernd Schmidt

On 11/27/2015 10:26 AM, Eric Botcazou wrote:


> +#ifdef STACK_REGS
> + if (regstack_completed
> + && REG_P (recog_data.operand[i])
> + && IN_RANGE (REGNO (recog_data.operand[i]),
> +  FIRST_STACK_REG, LAST_STACK_REG))
> +   untracked_operands |= 1 << i;
> +#endif
>
> Why not use "op" instead of recog_data.operand[i] here?  Doesn't this need to be
> placed before the conditional call to create_new_chain?


Both good points. Ok with those changes (will retest)?


Bernd



[PATCH] Fix PR68470, ICE in IPA split

2015-11-27 Thread Richard Biener

This fixes the case where we split the returning part of a function
and keep the non-returning part as main.  We were keeping the return
block with stmts not relevant to main in the return path of the
split which obviously doesn't work as it may use SSA names no longer
defined (but split out).

The following patch detects the situation and pretends the exit
block was found as EXIT_BLOCK_PTR_FOR_FN in this case.
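
The detection amounts to classifying each predecessor edge of the return block by whether its source block belongs to the split part. A minimal standalone sketch, assuming block indices are plain integers (return_parts and classify_return_preds are hypothetical names, not taken from ipa-split.c):

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Result of looking at every predecessor of the return block.
struct return_parts
{
  bool split_part_returns;
  bool main_part_returns;
};

// PRED_BLOCKS holds the index of the source block of each edge into the
// return block; SPLIT_BLOCKS holds the indices of blocks that move to the
// split function.  Both are hypothetical stand-ins for the CFG data.
return_parts
classify_return_preds (const std::vector<int> &pred_blocks,
                       const std::set<int> &split_blocks)
{
  return_parts result = { false, false };
  for (std::size_t i = 0; i < pred_blocks.size (); ++i)
    {
      if (split_blocks.count (pred_blocks[i]) != 0)
        result.split_part_returns = true;
      else
        result.main_part_returns = true;
    }
  return result;
}
```

Unlike a loop that stops at the first matching edge, this visits every edge, so both answers are available afterwards; the case of interest is the one where main_part_returns stays false.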

Bootstrap and regtest ongoing on x86_64-unknown-linux-gnu.

Honza, does this look ok?

Thanks,
Richard.

2015-11-27  Richard Biener  

PR ipa/68470
* ipa-split.c (split_function): Handle main part not returning.

* g++.dg/torture/pr68470.C: New testcase.

Index: gcc/ipa-split.c
===
--- gcc/ipa-split.c (revision 230998)
+++ gcc/ipa-split.c (working copy)
@@ -1205,7 +1205,6 @@ split_function (basic_block return_bb, s
   edge e;
   edge_iterator ei;
   tree retval = NULL, real_retval = NULL, retbnd = NULL;
-  bool split_part_return_p = false;
   bool with_bounds = chkp_function_instrumented_p (current_function_decl);
   gimple *last_stmt = NULL;
   unsigned int i;
@@ -1246,12 +1245,16 @@ split_function (basic_block return_bb, s
args_to_pass.safe_push (arg);
   }
 
-  /* See if the split function will return.  */
+  /* See if the split function or the main part will return.  */
+  bool main_part_return_p = false;
+  bool split_part_return_p = false;
   FOR_EACH_EDGE (e, ei, return_bb->preds)
-if (bitmap_bit_p (split_point->split_bbs, e->src->index))
-  break;
-  if (e)
-split_part_return_p = true;
+{
+  if (bitmap_bit_p (split_point->split_bbs, e->src->index))
+   split_part_return_p = true;
+  else
+   main_part_return_p = true;
+}
 
   /* Add return block to what will become the split function.
  We do not return; no return block is needed.  */
@@ -1295,6 +1298,11 @@ split_function (basic_block return_bb, s
   else
 bitmap_set_bit (split_point->split_bbs, return_bb->index);
 
+  /* If the main part doesn't return pretend the return block wasn't
+ found for all of the following.  */
+  if (! main_part_return_p)
+return_bb = EXIT_BLOCK_PTR_FOR_FN (cfun);
+
   /* If RETURN_BB has virtual operand PHIs, they must be removed and the
  virtual operand marked for renaming as we change the CFG in a way that
  tree-inline is not able to compensate for.
Index: gcc/testsuite/g++.dg/torture/pr68470.C
===
--- gcc/testsuite/g++.dg/torture/pr68470.C  (revision 0)
+++ gcc/testsuite/g++.dg/torture/pr68470.C  (working copy)
@@ -0,0 +1,36 @@
+/* { dg-do compile } */
+
+void deallocate(void *);
+void *a;
+
+struct C {
+virtual void m_fn1();
+};
+
+struct D {
+C *m_fn2() {
+   if (a)
+ __builtin_abort();
+}
+};
+D getd();
+
+struct vec_int {
+int _M_start;
+~vec_int() {
+   if (_M_start)
+ deallocate(&_M_start);
+}
+};
+vec_int *b;
+
+struct I {
+virtual void m_fn3();
+};
+
+void I::m_fn3() {
+if (a)
+  getd().m_fn2()->m_fn1();
+b->~vec_int();
+}
+


Re: [PATCH 1/6] Fix memory leak in cilk

2015-11-27 Thread Martin Liška
On 11/27/2015 10:21 AM, Bernd Schmidt wrote:
> Already improved, wasn't it? (Please don't quote the entire previous patch 
> next time).
> 
> 
> Bernd

Thanks, I just applied the nits you pointed out.
Patch installed as r231001.

Thank you,
Martin


Re: [PR67383][ARM][4.9]Backport of "Allow any register for DImode values in Thumb2"

2015-11-27 Thread Renlin Li

Hi Ramana,

On 16/10/15 14:54, Renlin Li wrote:



The command line implies we remove r7 (frame pointer in Thumb2 - 
historical accident, fno-omit-frame-pointer), r9 (ffixed-r9), r10 
(-mpic-register) which

leaves us with:

* r0, r1
* r2, r3
* r4, r5

as the only free registers available for DImode values for the whole 
compilation.


We then have r0, r1 and r2 live across the insn which means that 
there are no free registers to handle DImode values
under the constraints provided unless LRA / reload can spill the 
argument registers which it doesn't seem to be able to do

in this particular testcase. Vlad, is that correct ?
According to the logic, conflicting hard registers are excluded from the spill 
candidates. That's why, in this case, r0, r1, r2 cannot be used.



In the test case, there is a code structure like this.


uint64_t callee (int a, int b, int c, int d);
uint64_t caller (int a, int b, int c, int d)
{
  uint64_t res;
/*
single BB contains complicated data processing which requires register pair
*/

  res = callee (tmp, b ,c, d);
  return res;
}

The CES pass in this case will extend the hard register live range across 
the whole BB up to the call to callee. In this case, r1, r2, r3 are excluded 
from the allocatable registers.


There are places in CES which prevent extending a hard register's 
live range, for example for hard registers which fulfil 
small_register_classes_for_mode_p() or class_likely_spilled_p(). However, 
argument registers belong to neither of them.


I tried to stop CES from extending the argument registers' live range. 
However, later, the scheduler jumps in and re-orders the instructions to 
reduce the pseudo register pressure, which in effect extends the argument 
registers' live range again.


Regards,

Renlin Li





Re: regrename/i386: ROP vs df and stack-regs

2015-11-27 Thread Eric Botcazou
> Both good points. Ok with those changes (will retest)?

Yes, thanks.

-- 
Eric Botcazou


Re: [PATCH][RTL-ifcvt] PR rtl-optimization/68506: Fix emitting order of insns in IF-THEN-JOIN case

2015-11-27 Thread Kyrill Tkachov


On 26/11/15 16:54, Kyrill Tkachov wrote:


On 26/11/15 16:49, Bernd Schmidt wrote:

On 11/26/2015 05:45 PM, Kyrill Tkachov wrote:

 that doesn't help, punt.  */

-  modified_in_a = emit_a != NULL_RTX && modified_in_p (orig_b, emit_a);
if (tmp_b && then_bb)
  {

These bits I thought would be part of a followup patch (which would also guard 
against single_set problems), and as I mentioned I'd rather have a checking 
assert.

Yes, you're right. I have the checking_assert statement in the followup that 
I've been testing.
I'll move the deletion of these two statements there as well to minimise the 
changes to this patch.

I'll move these bits to that patch, re-build cc1 and commit.



Here it is.
I'm committing this to trunk.

Thanks,
Kyrill

2015-11-26  Kyrylo Tkachov  

PR rtl-optimization/68506
* ifcvt.c (noce_try_cmove_arith): Try emitting the else basic block
first if emit_a exists or then_bb modifies 'b'.  Reindent if-else
blocks.

2015-11-26  Kyrylo Tkachov  

PR rtl-optimization/68506
* gcc.c-torture/execute/pr68506.c: New test.


Thanks for your guidance,
Kyrill


So take these deletions out and leave them for the followup, and the patch is 
ok everywhere. No need for a full retest given that practically the same patch 
has been tested already, just make sure you can build cc1.


Bernd





commit ba7633ec30e8e25d7dc1975893bf56eadf223404
Author: Kyrylo Tkachov 
Date:   Tue Nov 24 11:49:30 2015 +

PR rtl-optimization/68506: Fix emitting order of insns in IF-THEN-JOIN case

diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c
index af7a3b9..3ce9fe6 100644
--- a/gcc/ifcvt.c
+++ b/gcc/ifcvt.c
@@ -2220,40 +2220,38 @@ noce_try_cmove_arith (struct noce_if_info *if_info)
 	  }
 
 }
-if (emit_a && modified_in_a)
-  {
-	modified_in_b = emit_b != NULL_RTX && modified_in_p (orig_a, emit_b);
-	if (tmp_b && else_bb)
-	  {
-	FOR_BB_INSNS (else_bb, tmp_insn)
-	/* Don't check inside insn_b.  We will have changed it to emit_b
-	   with a destination that doesn't conflict.  */
-	  if (!(insn_b && tmp_insn == insn_b)
-		  && modified_in_p (orig_a, tmp_insn))
-		{
-		  modified_in_b = true;
-		  break;
-		}
-
-	  }
-	if (modified_in_b)
-	  goto end_seq_and_fail;
-
-	if (!noce_emit_bb (emit_b, else_bb, b_simple))
-	  goto end_seq_and_fail;
+  if (emit_a || modified_in_a)
+{
+  modified_in_b = emit_b != NULL_RTX && modified_in_p (orig_a, emit_b);
+  if (tmp_b && else_bb)
+	{
+	  FOR_BB_INSNS (else_bb, tmp_insn)
+	  /* Don't check inside insn_b.  We will have changed it to emit_b
+	 with a destination that doesn't conflict.  */
+	  if (!(insn_b && tmp_insn == insn_b)
+	  && modified_in_p (orig_a, tmp_insn))
+	{
+	  modified_in_b = true;
+	  break;
+	}
+	}
+  if (modified_in_b)
+	goto end_seq_and_fail;
 
-	if (!noce_emit_bb (emit_a, then_bb, a_simple))
-	  goto end_seq_and_fail;
-  }
-else
-  {
-	if (!noce_emit_bb (emit_a, then_bb, a_simple))
-	  goto end_seq_and_fail;
+  if (!noce_emit_bb (emit_b, else_bb, b_simple))
+	goto end_seq_and_fail;
 
-	if (!noce_emit_bb (emit_b, else_bb, b_simple))
-	  goto end_seq_and_fail;
+  if (!noce_emit_bb (emit_a, then_bb, a_simple))
+	goto end_seq_and_fail;
+}
+  else
+{
+  if (!noce_emit_bb (emit_a, then_bb, a_simple))
+	goto end_seq_and_fail;
 
-  }
+  if (!noce_emit_bb (emit_b, else_bb, b_simple))
+	goto end_seq_and_fail;
+}
 
   target = noce_emit_cmove (if_info, x, code, XEXP (if_info->cond, 0),
 			XEXP (if_info->cond, 1), a, b);
diff --git a/gcc/testsuite/gcc.c-torture/execute/pr68506.c b/gcc/testsuite/gcc.c-torture/execute/pr68506.c
new file mode 100644
index 000..15984ed
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/execute/pr68506.c
@@ -0,0 +1,63 @@
+/* { dg-options "-fno-builtin-abort" } */
+
+int a, b, m, n, o, p, s, u, i;
+char c, q, y;
+short d;
+unsigned char e;
+static int f, h;
+static short g, r, v;
+unsigned t;
+
+extern void abort ();
+
+int
+fn1 (int p1)
+{
+  return a ? p1 : p1 + a;
+}
+
+unsigned char
+fn2 (unsigned char p1, int p2)
+{
+  return p2 >= 2 ? p1 : p1 >> p2;
+}
+
+static short
+fn3 ()
+{
+  int w, x = 0;
+  for (; p < 31; p++)
+{
+  s = fn1 (c | ((1 && c) == c));
+  t = fn2 (s, x);
+  c = (unsigned) c > -(unsigned) ((o = (m = d = t) == p) <= 4UL) && n;
+  v = -c;
+  y = 1;
+  for (; y; y++)
+	e = v == 1;
+  d = 0;
+  for (; h != 2;)
+	{
+	  for (;;)
+	{
+	  if (!m)
+		abort ();
+	  r = 7 - f;
+	  x = e = i | r;
+	  q = u * g;
+	  w = b == q;
+	  if (w)
+		break;
+	}
+	  break;
+	}
+}
+  return x;
+}
+
+int
+main ()
+{
+  fn3 ();
+  return 0;
+}


[PATCH][RTL-ifcvt] Reject insns that are multiple_sets

2015-11-27 Thread Kyrill Tkachov

Hi all,

As discussed, I've added a check for multiple_sets to insn_valid_noce_process_p 
and replaced the redundant modified_in_a and modified_in_b definitions with
checking asserts to catch any unexpected multiple sets that get through
the net.

Bootstrapped and tested on arm, aarch64, x86_64.
As expected, I didn't see any codegen difference, as it should be a pretty rare
case for which we have no reported testcase.

Ok for trunk?

Thanks,
Kyrill

2015-11-27  Kyrylo Tkachov  

* ifcvt.c (insn_valid_noce_process_p): Reject insn if it satisfies
multiple_sets.
(noce_try_cmove_arith): Add checking asserts that orig_a and orig_b
are not modified by the final modified insns in the basic blocks
diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c
index 3ce9fe6082069361504268788c009eec722ac89b..ef3ebe359a3ef80965f9827c67a7cb6b0dca1442 100644
--- a/gcc/ifcvt.c
+++ b/gcc/ifcvt.c
@@ -1880,6 +1880,7 @@ insn_valid_noce_process_p (rtx_insn *insn, rtx cc)
 {
   if (!insn
   || !NONJUMP_INSN_P (insn)
+  || multiple_sets (insn)
   || (cc && set_of (cc, insn)))
   return false;
 
@@ -2206,7 +2207,7 @@ noce_try_cmove_arith (struct noce_if_info *if_info)
swap insn that sets up A with the one that sets up B.  If even
that doesn't help, punt.  */
 
-  modified_in_a = emit_a != NULL_RTX && modified_in_p (orig_b, emit_a);
+  gcc_checking_assert (!emit_a || !modified_in_p (orig_b, emit_a));
   if (tmp_b && then_bb)
 {
   FOR_BB_INSNS (then_bb, tmp_insn)
@@ -,7 +2223,7 @@ noce_try_cmove_arith (struct noce_if_info *if_info)
 }
   if (emit_a || modified_in_a)
 {
-  modified_in_b = emit_b != NULL_RTX && modified_in_p (orig_a, emit_b);
+  gcc_checking_assert (!emit_b || !modified_in_p (orig_a, emit_b));
   if (tmp_b && else_bb)
 	{
 	  FOR_BB_INSNS (else_bb, tmp_insn)


Re: [PATCH][RTL-ifcvt] Reject insns that are multiple_sets

2015-11-27 Thread Bernd Schmidt

On 11/27/2015 10:45 AM, Kyrill Tkachov wrote:

> As discussed, I've added a check for multiple_sets to
> insn_valid_noce_process_p and replaced the redundant modified_in_a and
> modified_in_b definitions with checking asserts to catch any unexpected
> multiple sets that get through the net.


Ok, thanks!


Bernd


Re: Remove noce_mem_write_may_trap_or_fault_p in ifcvt

2015-11-27 Thread Bernd Schmidt

On 11/26/2015 10:46 AM, Richard Biener wrote:


Ok with the change suggested by Micha for the asm()s.  Note that I
originally used gimple_vuse () instead of gimple_vdef () as even
reading random memory is a barrier for the compiler to move stores
across it (not reads, of course).  Which is why I also considered
pure (global memory reading) calls to be a barrier (for the stores).


Yes, but IIUC the stores aren't being moved, they may just be turned 
into unconditional ones. Being nontrapping is one necessary condition 
for that (which we already compute), but we also want to make sure that 
we don't introduce surprises for threaded programs.
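
As a concrete picture of the transformation under discussion, the sketch below contrasts a conditional store with its if-converted form. It is an illustration only, not code from ifcvt.c, and both function names are made up.

```c
/* Conditional store: memory at P is written only when COND is nonzero.  */
static void
store_if (int *p, int cond, int value)
{
  if (cond)
    *p = value;
}

/* If-converted form: P is always loaded and always stored.  The result is
   the same for a single thread, but P must be known not to trap, and
   another thread now sees a write to *P even when COND is zero.  */
static void
store_always (int *p, int cond, int value)
{
  int old = *p;
  *p = cond ? value : old;
}
```

Both functions leave *p in the same state when run in isolation; the difference only shows up if P may fault or if another thread writes *p between the load and the store in store_always.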



Of course as we don't consider regular assign statement reads (or stores)
to be a "barrier" in the sense that matters here (we're not looking for
memory optimization barriers!) this might be moot and then the
middle-end will effectively require all synchronization barriers (which we
are looking for(?)) to appear as clobbering memory.


If I read this correctly you have reached the same conclusions. Test 
results came back ok (with vdef tested for asms), and I've committed the 
change.



Bernd


[PATCH] Fix PR68029

2015-11-27 Thread Jiří Engelthaler
Hi all,
  the attached patch fixes PR68029, where GCC's -fdiagnostics-color option
was ignored if it was the first parameter. The problem is present in
GCC 6.0 only, so the patch should be applied to the trunk.
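
To show why the comparison matters, here is a simplified stand-alone model of the pruning step. It is a sketch under assumptions, not the code in opts-common.c: `prune_color` is a made-up function working on plain strings, and it assumes that the pruning loop sets color options aside instead of copying them, so that the last one has to be re-inserted afterwards.

```c
#include <stddef.h>
#include <string.h>

#define COLOR_PREFIX "-fdiagnostics-color="

/* Copy ARGV[0..ARGC-1] to OUT, which must have room for ARGC entries,
   setting color options aside and re-inserting the last one directly
   after argv[0].  Returns the new argument count.  */
static size_t
prune_color (const char **argv, size_t argc, const char **out)
{
  size_t color_idx = 0;   /* 0 means "not seen"; argv[0] is the program */
  size_t n = 0;

  for (size_t i = 0; i < argc; i++)
    {
      if (i > 0
          && strncmp (argv[i], COLOR_PREFIX, strlen (COLOR_PREFIX)) == 0)
        {
          color_idx = i;  /* remember it, but do not copy it yet */
          continue;
        }
      out[n++] = argv[i];
    }

  /* With "> 1" here an option found at index 1 would be dropped.  */
  if (color_idx >= 1)
    {
      /* Shift everything after argv[0] up by one and fill slot 1.  */
      memmove (out + 2, out + 1, (n - 1) * sizeof *out);
      out[1] = argv[color_idx];
      n++;
    }
  return n;
}
```

In this model an input of "gcc -fdiagnostics-color=always -O2" has the option at index 1, which is exactly the case the old "> 1" test skipped.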


Jiří Engelthaler
2015-11-27  Jiří Engelthaler 

PR driver/68029
* opts-common.c (prune_options): fdiagnostics_color ignored
if it was as first parameter
 gcc/opts-common.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/opts-common.c b/gcc/opts-common.c
index d9bf4d4..24967cc 100644
--- a/gcc/opts-common.c
+++ b/gcc/opts-common.c
@@ -885,7 +885,7 @@ keep:
}
 }
 
-  if (fdiagnostics_color_idx > 1)
+  if (fdiagnostics_color_idx >= 1)
 {
   /* We put the last -fdiagnostics-color= at the first position
 after argv[0] so it can take effect immediately.  */


[PATCH] Convert a test to GIMPLE

2015-11-27 Thread Marek Polacek
In the process of dealing with PR68513, it turned out this test should have
been written with GIMPLE in mind.

Tested on x86_64-linux, ok for trunk?  Maybe we'll want this even for 5, I
don't know yet.
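
For context, all eight functions in the test compute a bitwise select, and the scan for the absence of " | " in the dump suggests the expected fold is into an XOR/AND form. Assuming that is the fold in question (the patch does not spell it out), the identity can be checked exhaustively on small values. The names `select_or`, `select_xor` and `forms_agree` are made up here.

```c
/* Bitwise select written with OR, as in the test's fn1.  */
static int
select_or (int a, int b, int m)
{
  return (a & ~m) | (b & m);
}

/* Equivalent form without OR: take A and flip the bits where A and B
   differ, but only under mask M.  */
static int
select_xor (int a, int b, int m)
{
  return a ^ ((a ^ b) & m);
}

/* Return 1 if both forms agree for all 8-bit A, B and M.  */
static int
forms_agree (void)
{
  for (int a = 0; a < 256; a++)
    for (int b = 0; b < 256; b++)
      for (int m = 0; m < 256; m++)
        if (select_or (a, b, m) != select_xor (a, b, m))
          return 0;
  return 1;
}
```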

2015-11-27  Marek Polacek  

* gcc.dg/pr63568.c: Convert to GIMPLE.

diff --git gcc/testsuite/gcc.dg/pr63568.c gcc/testsuite/gcc.dg/pr63568.c
index c6b88e7..5c688b0 100644
--- gcc/testsuite/gcc.dg/pr63568.c
+++ gcc/testsuite/gcc.dg/pr63568.c
@@ -1,53 +1,69 @@
 /* PR middle-end/63568 */
 /* { dg-do compile } */
-/* { dg-options "-fdump-tree-original" } */
+/* { dg-options "-O -fdump-tree-cddce1" } */
 
 int
 fn1 (int a, int b, int m)
 {
-  return (a & ~m) | (b & m);
+  int tem1 = a & ~m;
+  int tem2 = b & m;
+  return tem1 | tem2;
 }
 
 int
 fn2 (int a, int b, int m)
 {
-  return (a & ~m) | (m & b);
+  int tem1 = a & ~m;
+  int tem2 = m & b;
+  return tem1 | tem2;
 }
 
 int
 fn3 (int a, int b, int m)
 {
-  return (~m & a) | (m & b);
+  int tem1 = ~m & a;
+  int tem2 = m & b;
+  return tem1 | tem2;
 }
 
 int
 fn4 (int a, int b, int m)
 {
-  return (~m & a) | (b & m);
+  int tem1 = ~m & a;
+  int tem2 = b & m;
+  return tem1 | tem2;
 }
 
 int
 fn5 (int a, int b, int m)
 {
-  return (b & m) | (a & ~m);
+  int tem1 = b & m;
+  int tem2 = a & ~m;
+  return tem1 | tem2;
 }
 
 int
 fn6 (int a, int b, int m)
 {
-  return (m & b) | (a & ~m);
+  int tem1 = m & b;
+  int tem2 = a & ~m;
+  return tem1 | tem2;
 }
 
 int
 fn7 (int a, int b, int m)
 {
-  return (m & b) | (~m & a);
+  int tem1 = m & b;
+  int tem2 = ~m & a;
+  return tem1 | tem2;
 }
 
 int
 fn8 (int a, int b, int m)
 {
-  return (b & m) | (~m & a);
+  int tem1 = b & m;
+  int tem2 = ~m & a;
+  return tem1 | tem2;
 }
 
-/* { dg-final { scan-tree-dump-not " \\| " "original" } } */
+/* { dg-final { scan-tree-dump-not " \\| " "cddce1" } } */

Marek


Re: [PATCH] Fix PR68470, ICE in IPA split

2015-11-27 Thread Richard Biener
On Fri, 27 Nov 2015, Richard Biener wrote:

> 
> This fixes the case where we split the returning part of a function
> and keep the non-returning part as main.  We were keeping the return
> block with stmts not relevant to main in the return path of the
> split which obviously doesn't work as it may use SSA names no longer
> defined (but split out).
> 
> The following patch detects the situation and pretends the exit
> block was found as EXIT_BLOCK_FOR_FN in this case.
> 
> Bootstrap and regtest ongoing on x86_64-unknown-linux-gnu.
> 
> Honza, does this look ok?

Ok, fails for example

FAIL: gcc.dg/vect/pr51581-1.c execution test

where we split a region at a point the main part falls thru to
(an odd place to split...).  Happens when we split at a loop
header but the pre-header remains in the main part.
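
The extra condition can be modelled in isolation. This is a simplified sketch, not GCC code: `struct entry_pred` and `main_part_returns` are made-up names, and the two bool fields stand in for the bitmap_bit_p and single_succ_p tests on the predecessors of the split region's entry block.

```c
#include <stdbool.h>
#include <stddef.h>

/* One predecessor edge of the split region's entry block.  */
struct entry_pred
{
  bool src_in_split;     /* source block belongs to the split part */
  bool src_single_succ;  /* source block has exactly one successor */
};

/* MAIN_PART_RETURN_P is what the scan of the return block's predecessors
   found.  If the split part returns and the main part simply falls
   through into the split region, the main part returns as well.  */
static bool
main_part_returns (bool main_part_return_p, bool split_part_return_p,
                   const struct entry_pred *preds, size_t n_preds)
{
  if (main_part_return_p || !split_part_return_p)
    return main_part_return_p;
  for (size_t i = 0; i < n_preds; i++)
    if (!preds[i].src_in_split && preds[i].src_single_succ)
      return true;
  return false;
}
```

A pre-header that stays in the main part and has the loop header as its only successor is the fall-through case from the failing vect test.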

Adjusted patch as follows.

Richard.

2015-11-27  Richard Biener  

PR ipa/68470
* ipa-split.c (split_function): Handle main part not returning.

* g++.dg/torture/pr68470.C: New testcase.

Index: gcc/ipa-split.c
===
--- gcc/ipa-split.c (revision 230998)
+++ gcc/ipa-split.c (working copy)
@@ -1205,7 +1205,6 @@ split_function (basic_block return_bb, s
   edge e;
   edge_iterator ei;
   tree retval = NULL, real_retval = NULL, retbnd = NULL;
-  bool split_part_return_p = false;
   bool with_bounds = chkp_function_instrumented_p (current_function_decl);
   gimple *last_stmt = NULL;
   unsigned int i;
@@ -1246,12 +1245,28 @@ split_function (basic_block return_bb, s
args_to_pass.safe_push (arg);
   }
 
-  /* See if the split function will return.  */
+  /* See if the split function or the main part will return.  */
+  bool main_part_return_p = false;
+  bool split_part_return_p = false;
   FOR_EACH_EDGE (e, ei, return_bb->preds)
-if (bitmap_bit_p (split_point->split_bbs, e->src->index))
-  break;
-  if (e)
-split_part_return_p = true;
+{
+  if (bitmap_bit_p (split_point->split_bbs, e->src->index))
+   split_part_return_p = true;
+  else
+   main_part_return_p = true;
+}
+  /* The main part also returns if we split on a fallthru edge
+ and the split part returns.  */
+  if (split_part_return_p)
+FOR_EACH_EDGE (e, ei, split_point->entry_bb->preds)
+  {
+   if (! bitmap_bit_p (split_point->split_bbs, e->src->index)
+   && single_succ_p (e->src))
+ {
+   main_part_return_p = true;
+   break;
+ }
+  }
 
   /* Add return block to what will become the split function.
  We do not return; no return block is needed.  */
@@ -1295,6 +1310,11 @@ split_function (basic_block return_bb, s
   else
 bitmap_set_bit (split_point->split_bbs, return_bb->index);
 
+  /* If the main part doesn't return pretend the return block wasn't
+ found for all of the following.  */
+  if (! main_part_return_p)
+return_bb = EXIT_BLOCK_PTR_FOR_FN (cfun);
+
   /* If RETURN_BB has virtual operand PHIs, they must be removed and the
  virtual operand marked for renaming as we change the CFG in a way that
  tree-inline is not able to compensate for.
Index: gcc/testsuite/g++.dg/torture/pr68470.C
===
--- gcc/testsuite/g++.dg/torture/pr68470.C  (revision 0)
+++ gcc/testsuite/g++.dg/torture/pr68470.C  (working copy)
@@ -0,0 +1,36 @@
+/* { dg-do compile } */
+
+void deallocate(void *);
+void *a;
+
+struct C {
+virtual void m_fn1();
+};
+
+struct D {
+C *m_fn2() {
+   if (a)
+ __builtin_abort();
+}
+};
+D getd();
+
+struct vec_int {
+int _M_start;
+~vec_int() {
+   if (_M_start)
+ deallocate(&_M_start);
+}
+};
+vec_int *b;
+
+struct I {
+virtual void m_fn3();
+};
+
+void I::m_fn3() {
+if (a)
+  getd().m_fn2()->m_fn1();
+b->~vec_int();
+}
+


Re: [PR67383][ARM][4.9]Backport of "Allow any register for DImode values in Thumb2"

2015-11-27 Thread Ramana Radhakrishnan


On 27/11/15 09:40, Renlin Li wrote:
> Hi Ramana,
> 
> On 16/10/15 14:54, Renlin Li wrote:
>>
>>
>>> The command line implies we remove r7 (frame pointer in Thumb2 - historical 
>>> accident, fno-omit-frame-pointer), r9 (ffixed-r9), r10 (-mpic-register) 
>>> which
>>> leaves us with:
>>>
>>> * r0, r1
>>> * r2, r3
>>> * r4, r5
>>>
>>> as the only free registers available for DImode values for the whole 
>>> compilation.
>>>
>>> We then have r0, r1 and r2 live across the insn which means that there are 
>>> no free registers to handle DImode values
>>> under the constraints provided unless LRA / reload can spill the argument 
>>> registers which it doesn't seem to be able to do
>>> in this particular testcase. Vlad, is that correct ?
>> According to the logic, conflicting hard registers are excluded from the 
>> spill candidates. That's why, in this case, r0, r1, r2 cannot be used.
> 
> 
> In the test case, there is a code structure like this.
> 
> 
> uint64_t callee (int a, int b, int c, int d);
> uint64_t caller (int a, int b, int c, int d)
> {
>   uint64_t res;
> /*
> single BB contains complicated data processing which requires register pair
> */
> 
>   res = callee (tmp, b ,c, d);
>   return res;
> }
> 
> The CES pass in this case will extend the hard register live range across 
> the whole BB up to the call to callee. In this case, r1, r2, r3 are excluded 
> from the allocatable registers.
> 
> There are places in CES which prevent extending a hard register's live 
> range, for example for hard registers which fulfil 
> small_register_classes_for_mode_p() or class_likely_spilled_p(). However, 
> argument registers belong to neither of them.
> 
> I tried to stop CES from extending the argument registers' live range. 
> However, later, the scheduler jumps in and re-orders the instructions to 
> reduce the pseudo register pressure, which in effect extends the argument 
> registers' live range again.

Thanks for digging further and trying to figure out what the solution was. I 
can't think of a less risky fix than what you have proposed, thus Ok if no 
regressions.


regards
Ramana







[PATCH] Fix oacc kernels default mapping for scalars

2015-11-27 Thread Tom de Vries

Hi,

The OpenACC 2.0a standard says this about the default mapping for 
variables used in a kernels region:

...
An array or variable of aggregate data type referenced in the kernels 
construct that does not appear in a data clause for the construct or
any enclosing data construct will be treated as if it appeared in a 
present_or_copy clause for the kernels construct.


A scalar variable referenced in the kernels construct that does not 
appear in a data clause for the construct or any enclosing data 
construct will be treated as if it appeared in a copy clause.

...

But at the moment, all variables, including the scalar ones, have 
'present_or_copy' defaults.


This patch makes sure scalar variables get the 'copy' default.
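
The intended outcome can be summarised as a tiny decision function. This is a toy model and not the gimplify.c code: `kernels_default_map` is a made-up name, and the returned strings are the map kinds that the two new tests look for in the gimple dump.

```c
#include <stdbool.h>

/* Default map kind shown in the gimple dump for a variable that is
   referenced in a kernels region without a data clause.  */
static const char *
kernels_default_map (bool is_aggregate)
{
  /* Arrays and aggregates keep present_or_copy, dumped as "tofrom";
     scalars get copy, dumped as "force_tofrom".  */
  return is_aggregate ? "tofrom" : "force_tofrom";
}
```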

Bootstrapped and reg-tested on x86_64. OK for stage3 trunk?

Thanks,
- Tom
Fix oacc kernels default mapping for scalars

2015-11-27  Tom de Vries  

	* gimplify.c (enum gimplify_omp_var_data): Add enum value
	GOVD_MAP_FORCE.
	(oacc_default_clause): Fix default for scalars in oacc kernels.
	(gimplify_adjust_omp_clauses_1): Handle GOVD_MAP_FORCE.

	* c-c++-common/goacc/kernels-default-2.c: New test.
	* c-c++-common/goacc/kernels-default.c: New test.

---
 gcc/gimplify.c   | 19 ++-
 gcc/testsuite/c-c++-common/goacc/kernels-default-2.c | 17 +
 gcc/testsuite/c-c++-common/goacc/kernels-default.c   | 14 ++
 3 files changed, 45 insertions(+), 5 deletions(-)

diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index fcac745..68d90bf 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -87,6 +87,9 @@ enum gimplify_omp_var_data
   /* Flag for GOVD_MAP, if it is always, to or always, tofrom mapping.  */
   GOVD_MAP_ALWAYS_TO = 65536,
 
+  /* Flag for GOVD_MAP, if it is a forced mapping.  */
+  GOVD_MAP_FORCE = 131072,
+
   GOVD_DATA_SHARE_CLASS = (GOVD_SHARED | GOVD_PRIVATE | GOVD_FIRSTPRIVATE
 			   | GOVD_LASTPRIVATE | GOVD_REDUCTION | GOVD_LINEAR
 			   | GOVD_LOCAL)
@@ -5976,8 +5979,12 @@ oacc_default_clause (struct gimplify_omp_ctx *ctx, tree decl, unsigned flags)
   gcc_unreachable ();
 
 case ORT_ACC_KERNELS:
-  /* Everything under kernels are default 'present_or_copy'.  */
+  /* Scalars are default 'copy' under kernels, non-scalars are default
+	 'present_or_copy'.  */
   flags |= GOVD_MAP;
+  if (!AGGREGATE_TYPE_P (TREE_TYPE (decl)))
+	flags |= GOVD_MAP_FORCE;
+
   rkind = "kernels";
   break;
 
@@ -7489,10 +7496,12 @@ gimplify_adjust_omp_clauses_1 (splay_tree_node n, void *data)
 }
   else if (code == OMP_CLAUSE_MAP)
 {
-  OMP_CLAUSE_SET_MAP_KIND (clause,
-			   flags & GOVD_MAP_TO_ONLY
-			   ? GOMP_MAP_TO
-			   : GOMP_MAP_TOFROM);
+  int kind = (flags & GOVD_MAP_TO_ONLY
+		  ? GOMP_MAP_TO
+		  : GOMP_MAP_TOFROM);
+  if (flags & GOVD_MAP_FORCE)
+	kind |= GOMP_MAP_FLAG_FORCE;
+  OMP_CLAUSE_SET_MAP_KIND (clause, kind);
   if (DECL_SIZE (decl)
 	  && TREE_CODE (DECL_SIZE (decl)) != INTEGER_CST)
 	{
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-default-2.c b/gcc/testsuite/c-c++-common/goacc/kernels-default-2.c
new file mode 100644
index 000..232b123
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-default-2.c
@@ -0,0 +1,17 @@
+/* { dg-additional-options "-O2" } */
+/* { dg-additional-options "-fdump-tree-gimple" } */
+
+#define N 2
+
+void
+foo (void)
+{
+  unsigned int a[N];
+
+#pragma acc kernels
+  {
+a[0]++;
+  }
+}
+
+/* { dg-final { scan-tree-dump-times "map\\(tofrom" 1 "gimple" } } */
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-default.c b/gcc/testsuite/c-c++-common/goacc/kernels-default.c
new file mode 100644
index 000..58cd5e1
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-default.c
@@ -0,0 +1,14 @@
+/* { dg-additional-options "-O2" } */
+/* { dg-additional-options "-fdump-tree-gimple" } */
+
+void
+foo (void)
+{
+  unsigned int i;
+#pragma acc kernels
+  {
+i++;
+  }
+}
+
+/* { dg-final { scan-tree-dump-times "map\\(force_tofrom" 1 "gimple" } } */


Re: [PATCH] Convert a test to GIMPLE

2015-11-27 Thread Richard Biener
On Fri, Nov 27, 2015 at 11:41 AM, Marek Polacek  wrote:
> In the process of dealing with PR68513, it turned out this test should have
> been written with GIMPLE in mind.
>
> Tested on x86_64-linux, ok for trunk?  Maybe we'll want this even for 5, I
> don't know yet.

Ok.

Richard.



Re: [PATCH] Convert a test to GIMPLE

2015-11-27 Thread Jakub Jelinek
On Fri, Nov 27, 2015 at 12:30:24PM +0100, Richard Biener wrote:
> On Fri, Nov 27, 2015 at 11:41 AM, Marek Polacek  wrote:
> > In the process of dealing with PR68513, it turned out this test should have
> > been written with GIMPLE in mind.
> >
> > Tested on x86_64-linux, ok for trunk?  Maybe we'll want this even for 5, I
> > don't know yet.
> 
> Ok.

I think it is just fine for 5 branch too.
You could even use int tem3 = ~m; and use tem3 instead of ~m if you really
wanted to test whether GIMPLE folding works.
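
Spelled out for fn1, that suggestion would look roughly like the sketch below. The name fn1_tem3 is made up here and this variant is not part of the posted patch.

```c
/* fn1 from the test with the negated mask held in its own temporary, so
   that ~m reaches the optimizers as a separate statement.  */
int
fn1_tem3 (int a, int b, int m)
{
  int tem3 = ~m;
  int tem1 = a & tem3;
  int tem2 = b & m;
  return tem1 | tem2;
}
```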

Jakub


[gomp4] Re: OpenACC declare directive updates

2015-11-27 Thread Thomas Schwinge
Hi!

On Thu, 19 Nov 2015 10:22:16 -0600, James Norris  
wrote:
> --- a/gcc/fortran/dump-parse-tree.c
> +++ b/gcc/fortran/dump-parse-tree.c

Don't you need to handle OMP_LIST_LINK in
gcc/fortran/dump-parse-tree.c:show_omp_clauses; OMP_LIST_DEVICE_RESIDENT
is being handled there (but maps to the wrong string?).  (See
gomp-4_0-branch.)  When touching that, please sort the "case OMP_LIST_*"s
corresponding to the order the OMP_LIST_* are defined in
gcc/fortran/gfortran.h.

> --- a/gcc/fortran/openmp.c
> +++ b/gcc/fortran/openmp.c

I see OMP_LIST_DEVICE_RESIDENT being handled in
gcc/fortran/openmp.c:resolve_omp_clauses and
gcc/fortran/openmp.c:gfc_resolve_oacc_declare, but not OMP_LIST_LINK --
is that correct?  Likewise, in
gcc/fortran/trans-openmp.c:gfc_trans_omp_clauses.

Also, oacc_declare_device_resident is handled in a lot more places
compared to oacc_declare_link -- is that correct?  In fact, there doesn't
seem to be any "consumer" for the latter, but I see the OpenACC link
clause being used in the test cases you added, so I wonder how that
works.


Merging your trunk r230722 and r230725 with the existing Fortran OpenACC
declare implementation present on gomp-4_0-branch, I effectively applied
the following to gomp-4_0-branch in 231002.  Please verify this.

Regarding my Fortran XFAIL comments in
,
with some of my earlier changes "#if 0"ed in
gcc/fortran/trans-decl.c:add_attributes_to_decl,
libgomp.oacc-fortran/declare-3.f90 again PASSes.  But I don't understand
(why something like) this code (isn't needed/done differently in C/C++).
The XFAIL in libgomp.oacc-fortran/declare-1.f90 remains to be resolved
(gomp-4_0-branch only; not seen on trunk): "libgomp: cuStreamSynchronize
error: an illegal memory access was encountered".

commit 95e909a492b001df6d6faffdfa6047a5e9919561
Merge: 8373bdf e18d05e
Author: tschwinge 
Date:   Fri Nov 27 09:41:03 2015 +

svn merge -r 230720:230725 svn+ssh://gcc.gnu.org/svn/gcc/trunk


git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@231002 
138bc75d-0d04-0410-961f-82ee72b054a4

 gcc/fortran/ChangeLog  |  51 +
 gcc/fortran/gfortran.h |  17 +-
 gcc/fortran/openmp.c   | 235 +
 gcc/fortran/parse.c|   2 +-
 gcc/fortran/parse.h|   2 +-
 gcc/fortran/resolve.c  |   1 -
 gcc/fortran/st.c   |   2 +-
 gcc/fortran/symbol.c   |  12 +-
 gcc/fortran/trans-decl.c   | 198 +
 gcc/fortran/trans-openmp.c |  29 +--
 gcc/fortran/trans-stmt.c   |   3 +-
 gcc/testsuite/ChangeLog|   6 +
 gcc/testsuite/gfortran.dg/goacc/declare-1.f95  |   4 +-
 gcc/testsuite/gfortran.dg/goacc/declare-2.f95  |  43 +++-
 libgomp/ChangeLog  |   9 +
 .../testsuite/libgomp.oacc-fortran/declare-1.f90   |  13 ++
 .../testsuite/libgomp.oacc-fortran/declare-2.f90   |   2 +
 .../testsuite/libgomp.oacc-fortran/declare-3.f90   |   4 +-
 .../testsuite/libgomp.oacc-fortran/declare-4.f90   |   2 +
 .../testsuite/libgomp.oacc-fortran/declare-5.f90   |   1 +
 20 files changed, 347 insertions(+), 289 deletions(-)

[diff --git gcc/fortran/ChangeLog gcc/fortran/ChangeLog]
diff --git gcc/fortran/gfortran.h gcc/fortran/gfortran.h
index c8401cf..dd186b5 100644
--- gcc/fortran/gfortran.h
+++ gcc/fortran/gfortran.h
@@ -1250,17 +1250,18 @@ gfc_omp_clauses;
 
 #define gfc_get_omp_clauses() XCNEW (gfc_omp_clauses)
 
-/* Node in the linked list used for storing OpenACC declare constructs.  */
+
+/* Node in the linked list used for storing !$oacc declare constructs.  */
 
 typedef struct gfc_oacc_declare
 {
   struct gfc_oacc_declare *next;
-  locus where;
   bool module_var;
   gfc_omp_clauses *clauses;
-  gfc_omp_clauses *return_clauses;
+  locus loc;
 }
 gfc_oacc_declare;
+
 #define gfc_get_oacc_declare() XCNEW (gfc_oacc_declare)
 
 
@@ -1685,8 +1686,8 @@ typedef struct gfc_namespace
  this namespace.  */
   struct gfc_data *data, *old_data;
 
-  /* !$ACC DECLARE clauses.  */
-  struct gfc_oacc_declare *oacc_declare;
+  /* !$ACC DECLARE.  */
+  gfc_oacc_declare *oacc_declare;
 
   /* !$ACC ROUTINE clauses.  */
   gfc_omp_clauses *oacc_routine_clauses;
@@ -2455,8 +2456,8 @@ typedef struct gfc_code
 struct gfc_code *which_construct;
 int stop_code;
 gfc_entry_list *entry;
-gfc_omp_clauses *omp_clauses;
 gfc_oacc_declare *oacc_declare;
+gfc_omp_clauses *omp_clauses;
 const char *omp_name;
 gfc_omp_namelist *omp_namelist;
 bool omp_bool;
@@ -2958,7 +2959,7 @@ gfc_expr *gfc_get_parentheses (gfc_expr *);
 /* openmp.c */
 struct gfc_omp_saved_state { void *ptrs[2]; int ints[1];

[PATCH 1/7] S/390: Fix vrepi constraint letter.

2015-11-27 Thread Andreas Krebbel
gcc/ChangeLog:

2015-11-27  Andreas Krebbel  

* config/s390/vector.md ("*vec_splats"): Fix constraint
letter I->K.

gcc/testsuite/ChangeLog:

2015-11-27  Andreas Krebbel  

* gcc.target/s390/zvector/vec-splat-1.c: New test.
---
 gcc/config/s390/vector.md  |  4 +--
 .../gcc.target/s390/zvector/vec-splat-1.c  | 42 ++
 2 files changed, 44 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/s390/zvector/vec-splat-1.c

diff --git a/gcc/config/s390/vector.md b/gcc/config/s390/vector.md
index 16276e0..d8b9b07 100644
--- a/gcc/config/s390/vector.md
+++ b/gcc/config/s390/vector.md
@@ -370,11 +370,11 @@
 
 (define_insn "*vec_splats"
   [(set (match_operand:V_HW  0 "register_operand" 
"=v,v,v,v")
-   (vec_duplicate:V_HW (match_operand: 1 "general_operand"  
"QR,I,v,d")))]
+   (vec_duplicate:V_HW (match_operand: 1 "general_operand"  
"QR,K,v,d")))]
   "TARGET_VX"
   "@
vlrep\t%v0,%1
-   vrepi\t%v0,%1
+   vrepi\t%v0,%h1
vrep\t%v0,%v1,0
#"
   [(set_attr "op_type" "VRX,VRI,VRI,*")])
diff --git a/gcc/testsuite/gcc.target/s390/zvector/vec-splat-1.c 
b/gcc/testsuite/gcc.target/s390/zvector/vec-splat-1.c
new file mode 100644
index 000..bab2e2d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/zvector/vec-splat-1.c
@@ -0,0 +1,42 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -mzarch -march=z13 -mzvector" } */
+
+#include 
+
+vector signed char v16qi;
+vector short   v8hi;
+vector int v4si;
+vector long long   v2di;
+
+vector unsigned char  uv16qi;
+vector unsigned short uv8hi;
+vector unsigned int   uv4si;
+vector unsigned long long uv2di;
+
+int
+foo ()
+{
+  v16qi  = vec_splats ((signed char)0x77);
+  uv16qi = vec_splats ((unsigned char)0x77);
+
+  v8hi  = vec_splats ((short int)0x7f0f);
+  uv8hi = vec_splats ((unsigned short int)0x7f0f);
+
+  v4si  = vec_splats ((int)0x7f0f);
+  uv4si = vec_splats ((unsigned int)0x7f0f);
+
+  v2di  = vec_splats ((long long)0x7f0f);
+  uv2di = vec_splats ((unsigned long long)0x7f0f);
+}
+
+/* { dg-final { scan-assembler-times "vrepib\t%v.*,119" 1 } } */
+/* { dg-final { scan-assembler-times "vrepib\t%v.*,119" 1 } } */
+
+/* { dg-final { scan-assembler-times "vrepih\t%v.*,32527" 1 } } */
+/* { dg-final { scan-assembler-times "vrepih\t%v.*,32527" 1 } } */
+
+/* { dg-final { scan-assembler-times "vrepif\t%v.*,32527" 1 } } */
+/* { dg-final { scan-assembler-times "vrepif\t%v.*,32527" 1 } } */
+
+/* { dg-final { scan-assembler-times "vrepig\t%v.*,32527" 1 } } */
+/* { dg-final { scan-assembler-times "vrepig\t%v.*,32527" 1 } } */
-- 
2.3.0



[PATCH 0/7] S/390: Minor fixes and improvements for the vector builtins.

2015-11-27 Thread Andreas Krebbel
The patchset has been tested as a whole on current mainline on s390 and s390x.

I'll wait a couple of days for comments before committing them.

Bye,

-Andreas-

Andreas Krebbel (7):
  S/390: Fix vrepi constraint letter.
  S/390: Enable vrepi constants.
  S/390: Fix RT flag in vstrc instruction.
  S/390: Sort builtin types - cleanup only.
  S/390: Fix vec_splat_* builtins.
  S/390: vec_set mode DI->SI for shift_count
  S/390: Make constant checking more strict

 gcc/config/s390/constraints.md |   46 +-
 gcc/config/s390/predicates.md  |5 +
 gcc/config/s390/s390-builtin-types.def | 1027 ++--
 gcc/config/s390/s390-builtins.def  |   27 +-
 gcc/config/s390/s390-c.c   |   24 +-
 gcc/config/s390/s390.c |   35 +-
 gcc/config/s390/vecintrin.h|   16 +-
 gcc/config/s390/vector.md  |   19 +-
 gcc/config/s390/vx-builtins.md |  194 ++--
 gcc/testsuite/gcc.target/s390/vector/vec-vrepi-1.c |   58 ++
 .../gcc.target/s390/zvector/vec-splat-1.c  |   42 +
 .../gcc.target/s390/zvector/vec-splat-2.c  |   42 +
 12 files changed, 857 insertions(+), 678 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/s390/vector/vec-vrepi-1.c
 create mode 100644 gcc/testsuite/gcc.target/s390/zvector/vec-splat-1.c
 create mode 100644 gcc/testsuite/gcc.target/s390/zvector/vec-splat-2.c

-- 
2.3.0



[PATCH 3/7] S/390: Fix RT flag in vstrc instruction.

2015-11-27 Thread Andreas Krebbel
gcc/ChangeLog:

2015-11-27  Andreas Krebbel  

* config/s390/s390-c.c (s390_get_vstring_flags): Invert the
condition for the RT flag.
---
 gcc/config/s390/s390-c.c | 24 
 1 file changed, 8 insertions(+), 16 deletions(-)

diff --git a/gcc/config/s390/s390-c.c b/gcc/config/s390/s390-c.c
index a94eda5..fa69ed3 100644
--- a/gcc/config/s390/s390-c.c
+++ b/gcc/config/s390/s390-c.c
@@ -414,22 +414,14 @@ s390_get_vstring_flags (int ob_fcode)
   switch (ob_fcode)
 {
 
-case S390_OVERLOADED_BUILTIN_s390_vec_find_any_eq_idx:
-case S390_OVERLOADED_BUILTIN_s390_vec_find_any_ne_idx:
-case S390_OVERLOADED_BUILTIN_s390_vec_find_any_eq_or_0_idx:
-case S390_OVERLOADED_BUILTIN_s390_vec_find_any_ne_or_0_idx:
-case S390_OVERLOADED_BUILTIN_s390_vec_find_any_eq_idx_cc:
-case S390_OVERLOADED_BUILTIN_s390_vec_find_any_ne_idx_cc:
-case S390_OVERLOADED_BUILTIN_s390_vec_find_any_eq_or_0_idx_cc:
-case S390_OVERLOADED_BUILTIN_s390_vec_find_any_ne_or_0_idx_cc:
-case S390_OVERLOADED_BUILTIN_s390_vec_cmprg_idx:
-case S390_OVERLOADED_BUILTIN_s390_vec_cmpnrg_idx:
-case S390_OVERLOADED_BUILTIN_s390_vec_cmprg_or_0_idx:
-case S390_OVERLOADED_BUILTIN_s390_vec_cmpnrg_or_0_idx:
-case S390_OVERLOADED_BUILTIN_s390_vec_cmprg_idx_cc:
-case S390_OVERLOADED_BUILTIN_s390_vec_cmpnrg_idx_cc:
-case S390_OVERLOADED_BUILTIN_s390_vec_cmprg_or_0_idx_cc:
-case S390_OVERLOADED_BUILTIN_s390_vec_cmpnrg_or_0_idx_cc:
+case S390_OVERLOADED_BUILTIN_s390_vec_find_any_eq:
+case S390_OVERLOADED_BUILTIN_s390_vec_find_any_ne:
+case S390_OVERLOADED_BUILTIN_s390_vec_find_any_eq_cc:
+case S390_OVERLOADED_BUILTIN_s390_vec_find_any_ne_cc:
+case S390_OVERLOADED_BUILTIN_s390_vec_cmprg:
+case S390_OVERLOADED_BUILTIN_s390_vec_cmpnrg:
+case S390_OVERLOADED_BUILTIN_s390_vec_cmprg_cc:
+case S390_OVERLOADED_BUILTIN_s390_vec_cmpnrg_cc:
   flags |= __VSTRING_FLAG_RT;
   break;
 default:
-- 
2.3.0



[PATCH 2/7] S/390: Enable vrepi constants.

2015-11-27 Thread Andreas Krebbel
This enables vrepi to be used in more situations.
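
As a rough illustration of what the new jKK constraint in this patch
accepts, the check can be modelled in plain C as below.  This is only a
sketch, not the GCC implementation: the names fits_signed16 and
is_vrepi_candidate are hypothetical, and it assumes that the K constraint
means a signed 16-bit immediate.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: the K constraint accepts a signed 16-bit immediate.  */
static int
fits_signed16 (int64_t v)
{
  return v >= INT16_MIN && v <= INT16_MAX;
}

/* Hypothetical model of jKK: every element holds the same value
   (a "duplicate" vector) and that value fits the K range.  */
static int
is_vrepi_candidate (const int64_t *elts, size_t n)
{
  assert (elts != NULL && n > 0);
  for (size_t i = 1; i < n; i++)
    if (elts[i] != elts[0])
      return 0;
  return fits_signed16 (elts[0]);
}
```

Under these assumptions a vector of four 119s qualifies, while a vector
with differing elements, or one whose common value is 40000, does not.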

gcc/testsuite/ChangeLog:

2015-11-27  Andreas Krebbel  

* gcc.target/s390/vector/vec-vrepi-1.c: New test.

gcc/ChangeLog:

2015-11-27  Andreas Krebbel  

* config/s390/constraints.md ("jKK"): New constraint.
* config/s390/s390.c (tm-constrs.h): Include for
satisfies_constraint_*.
(s390_legitimate_constant_p): Allow jKK constants.  Use
satisfies_constraint_* also for the others.
(legitimate_reload_vector_constant_p): Likewise.
(print_operand): Allow h output modifier on vectors.
* config/s390/vector.md ("mov"): Add vrepi.
---
 gcc/config/s390/constraints.md | 46 ++---
 gcc/config/s390/s390.c | 26 ++
 gcc/config/s390/vector.md  |  7 +--
 gcc/testsuite/gcc.target/s390/vector/vec-vrepi-1.c | 58 ++
 4 files changed, 107 insertions(+), 30 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/s390/vector/vec-vrepi-1.c

diff --git a/gcc/config/s390/constraints.md b/gcc/config/s390/constraints.md
index 66d4ace..1dab92a 100644
--- a/gcc/config/s390/constraints.md
+++ b/gcc/config/s390/constraints.md
@@ -34,6 +34,8 @@
 ;; jm1: constant scalar or vector with all bits set
 ;; jxx: contiguous bitmask of 0 or 1 in all vector elements
 ;; jyy: constant consisting of byte chunks being either 0 or 0xff
+;; jKK: constant vector with all elements having the same value and
+;;  matching K constraint
 ;;t -- Access registers 36 and 37.
 ;;v -- Vector registers v0-v31.
 ;;C -- A signed 8-bit constant (-128..127)
@@ -108,23 +110,6 @@
   "FP_REGS"
   "Floating point registers")
 
-(define_constraint "j00"
-  "Zero scalar or vector constant"
-  (match_test "op == CONST0_RTX (GET_MODE (op))"))
-
-(define_constraint "jm1"
-  "All one bit scalar or vector constant"
-  (match_test "op == CONSTM1_RTX (GET_MODE (op))"))
-
-(define_constraint "jxx"
-  "@internal"
-  (and (match_code "const_vector")
-   (match_test "s390_contiguous_bitmask_vector_p (op, NULL, NULL)")))
-
-(define_constraint "jyy"
-  "@internal"
-  (and (match_code "const_vector")
-   (match_test "s390_bytemask_vector_p (op, NULL)")))
 
 (define_register_constraint "t"
   "ACCESS_REGS"
@@ -402,6 +387,33 @@
(match_test "s390_O_constraint_str ('n', ival)")))
 
 
+;;
+;; Vector constraints follow.
+;;
+
+(define_constraint "j00"
+  "Zero scalar or vector constant"
+  (match_test "op == CONST0_RTX (GET_MODE (op))"))
+
+(define_constraint "jm1"
+  "All one bit scalar or vector constant"
+  (match_test "op == CONSTM1_RTX (GET_MODE (op))"))
+
+(define_constraint "jxx"
+  "@internal"
+  (and (match_code "const_vector")
+   (match_test "s390_contiguous_bitmask_vector_p (op, NULL, NULL)")))
+
+(define_constraint "jyy"
+  "@internal"
+  (and (match_code "const_vector")
+   (match_test "s390_bytemask_vector_p (op, NULL)")))
+
+(define_constraint "jKK"
+  "@internal"
+  (and (and (match_code "const_vector")
+   (match_test "const_vec_duplicate_p (op)"))
+   (match_test "satisfies_constraint_K (XVECEXP (op, 0, 0))")))
 
 
 ;;
diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c
index 40ee2f7..e872423 100644
--- a/gcc/config/s390/s390.c
+++ b/gcc/config/s390/s390.c
@@ -74,6 +74,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "builtins.h"
 #include "rtl-iter.h"
 #include "intl.h"
+#include "tm-constrs.h"
 
 /* This file should be included last.  */
 #include "target-def.h"
@@ -3643,9 +3644,11 @@ s390_legitimate_constant_p (machine_mode mode, rtx op)
   if (GET_MODE_SIZE (mode) != 16)
return 0;
 
-  if (!const0_operand (op, mode)
- && !s390_contiguous_bitmask_vector_p (op, NULL, NULL)
- && !s390_bytemask_vector_p (op, NULL))
+  if (!satisfies_constraint_j00 (op)
+ && !satisfies_constraint_jm1 (op)
+ && !satisfies_constraint_jKK (op)
+ && !satisfies_constraint_jxx (op)
+ && !satisfies_constraint_jyy (op))
return 0;
 }
 
@@ -3826,14 +3829,12 @@ legitimate_reload_fp_constant_p (rtx op)
 static bool
 legitimate_reload_vector_constant_p (rtx op)
 {
-  /* FIXME: Support constant vectors with all the same 16 bit unsigned
- operands.  These can be loaded with vrepi.  */
-
   if (TARGET_VX && GET_MODE_SIZE (GET_MODE (op)) == 16
-  && (const0_operand (op, GET_MODE (op))
- || constm1_operand (op, GET_MODE (op))
- || s390_contiguous_bitmask_vector_p (op, NULL, NULL)
- || s390_bytemask_vector_p (op, NULL)))
+  && (satisfies_constraint_j00 (op)
+ || satisfies_constraint_jm1 (op)
+ || satisfies_constraint_jKK (op)
+ || satisfies_constraint_jxx (op)
+ || satisfies_constraint_jyy (op)))
 return true;
 
   return false;
@@ -7117,6 +7118,11 @@ print_operand (FILE *file, rtx x, int code)
 case CONST_VECTOR

[PATCH 7/7] S/390: Make constant checking more strict

2015-11-27 Thread Andreas Krebbel
This makes the predicates and constraints on constant integer operands
more strict in order to catch more problems at compile-time.
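
For illustration only, the kind of range check this patch tightens can be
sketched in plain C.  The helper name fits_unsigned_bits is hypothetical,
and reading O2_U3 as "operand 2 is an unsigned 3-bit value" is my
assumption; the 4-bit case mirrors the "UINTVAL (op) < 16" test in the new
const_mask_operand predicate.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: return 1 if VALUE, reinterpreted as unsigned,
   fits in BITS bits.  Negative inputs become very large unsigned
   numbers and are therefore rejected.  */
static int
fits_unsigned_bits (int64_t value, unsigned int bits)
{
  assert (bits > 0 && bits < 64);
  return (uint64_t) value < ((uint64_t) 1 << bits);
}
```

With this sketch, fits_unsigned_bits (15, 4) holds while 16 and -1 are
rejected, which is how an out-of-range mask would be caught at compile
time instead of being silently encoded.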

gcc/ChangeLog:

2015-11-27  Andreas Krebbel  

* config/s390/predicates.md (const_mask_operand): New predicate.
* config/s390/s390-builtins.def: Set a smaller bitmask for a few 
builtins.
* config/s390/vector.md: Change predicate from immediate_operand
to either const_int_operand or const_mask_operand.  Add special
insn conditions on patterns which have to exclude certain values.
* config/s390/vx-builtins.md: Likewise.
---
 gcc/config/s390/predicates.md |   5 +
 gcc/config/s390/s390-builtins.def |  18 ++--
 gcc/config/s390/vector.md |   6 +-
 gcc/config/s390/vx-builtins.md| 194 +++---
 4 files changed, 114 insertions(+), 109 deletions(-)

diff --git a/gcc/config/s390/predicates.md b/gcc/config/s390/predicates.md
index eeaf1ae..e1a2bc6 100644
--- a/gcc/config/s390/predicates.md
+++ b/gcc/config/s390/predicates.md
@@ -34,6 +34,11 @@
   (and (match_code "const_int, const_double,const_vector")
(match_test "op == CONSTM1_RTX (mode)")))
 
+;; Return true if OP is a 4 bit mask operand
+(define_predicate "const_mask_operand"
+  (and (match_code "const_int")
+   (match_test "UINTVAL (op) < 16")))
+
 ;; Return true if OP is constant.
 
 (define_special_predicate "consttable_operand"
diff --git a/gcc/config/s390/s390-builtins.def 
b/gcc/config/s390/s390-builtins.def
index 0b6961e..b0a86e9 100644
--- a/gcc/config/s390/s390-builtins.def
+++ b/gcc/config/s390/s390-builtins.def
@@ -2470,15 +2470,15 @@ OB_DEF (s390_vec_ctd,   
s390_vec_ctd_s64,   s390_vec_ctd_u64,
 OB_DEF_VAR (s390_vec_ctd_s64,   s390_vec_ctd_s64,   O2_U5, 
 BT_OV_V2DF_V2DI_INT) /* vcdgb */
 OB_DEF_VAR (s390_vec_ctd_u64,   s390_vec_ctd_u64,   O2_U5, 
 BT_OV_V2DF_UV2DI_INT)/* vcdlgb */
 
-B_DEF  (s390_vec_ctd_s64,   vec_ctd_s64,0, 
 B_VX,   O2_U5,  BT_FN_V2DF_V2DI_INT)   
  /* vcdgb */
-B_DEF  (s390_vec_ctd_u64,   vec_ctd_u64,0, 
 B_VX,   O2_U5,  BT_FN_V2DF_UV2DI_INT)  
  /* vcdlgb */
-B_DEF  (s390_vcdgb, vec_di_to_df_s64,   0, 
 B_VX,   O2_U5,  BT_FN_V2DF_V2DI_INT)
-B_DEF  (s390_vcdlgb,vec_di_to_df_u64,   0, 
 B_VX,   O2_U5,  BT_FN_V2DF_UV2DI_INT)
-B_DEF  (s390_vec_ctsl,  vec_ctsl,   0, 
 B_VX,   O2_U5,  BT_FN_V2DI_V2DF_INT)   
  /* vcgdb */
-B_DEF  (s390_vec_ctul,  vec_ctul,   0, 
 B_VX,   O2_U5,  BT_FN_UV2DI_V2DF_INT)  
  /* vclgdb */
-B_DEF  (s390_vcgdb, vec_df_to_di_s64,   0, 
 B_VX,   O2_U5,  BT_FN_V2DI_V2DF_INT)
-B_DEF  (s390_vclgdb,vec_df_to_di_u64,   0, 
 B_VX,   O2_U5,  BT_FN_UV2DI_V2DF_INT)
-B_DEF  (s390_vfidb, vfidb,  0, 
 B_VX,   O2_U4 | O3_U4,  BT_FN_V2DF_V2DF_UCHAR_UCHAR)
+B_DEF  (s390_vec_ctd_s64,   vec_ctd_s64,0, 
 B_VX,   O2_U3,  BT_FN_V2DF_V2DI_INT)   
  /* vcdgb */
+B_DEF  (s390_vec_ctd_u64,   vec_ctd_u64,0, 
 B_VX,   O2_U3,  BT_FN_V2DF_UV2DI_INT)  
  /* vcdlgb */
+B_DEF  (s390_vcdgb, vec_di_to_df_s64,   0, 
 B_VX,   O2_U3,  BT_FN_V2DF_V2DI_INT)   
  /* vcdgb */
+B_DEF  (s390_vcdlgb,vec_di_to_df_u64,   0, 
 B_VX,   O2_U3,  BT_FN_V2DF_UV2DI_INT)  
  /* vcdlgb */
+B_DEF  (s390_vec_ctsl,  vec_ctsl,   0, 
 B_VX,   O2_U3,  BT_FN_V2DI_V2DF_INT)   
  /* vcgdb */
+B_DEF  (s390_vec_ctul,  vec_ctul,   0, 
 B_VX,   O2_U3,  BT_FN_UV2DI_V2DF_INT)  
  /* vclgdb */
+B_DEF  (s390_vcgdb, vec_df_to_di_s64,   0, 
 B_VX,   O2_U3,  BT_FN_V2DI_V2DF_INT)   
  /* vcgdb */
+B_DEF  (s390_vclgdb,vec_df_to_di_u64,   0, 
 B_VX,   O2_U3,  BT_FN_UV2DI_V2DF_INT)  
  /* vclgdb */
+B_DEF  (s390_vfidb, vfidb,  0, 
 B_VX,   O2_U4 | O3_U3,  BT_FN_V2DF_V2DF_UCHAR_UCHAR)
 B

[PATCH 6/7] S/390: vec_set mode DI->SI for shift_count

2015-11-27 Thread Andreas Krebbel
gcc/ChangeLog:

2015-11-27  Andreas Krebbel  

* config/s390/vector.md ("*vec_set"): Change shift count
mode from DI to SI.
---
 gcc/config/s390/vector.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/s390/vector.md b/gcc/config/s390/vector.md
index 9c1e6a6..d4f652a 100644
--- a/gcc/config/s390/vector.md
+++ b/gcc/config/s390/vector.md
@@ -314,7 +314,7 @@
 (define_insn "*vec_set"
   [(set (match_operand:V0 "register_operand" 
"=v, v,v")
(unspec:V [(match_operand: 1 "general_operand"   
"d,QR,K")
-  (match_operand:DI2 "shift_count_or_setmem_operand" 
"Y, I,I")
+  (match_operand:SI2 "shift_count_or_setmem_operand" 
"Y, I,I")
   (match_operand:V 3 "register_operand"  
"0, 0,0")]
  UNSPEC_VEC_SET))]
   "TARGET_VX"
-- 
2.3.0



[PATCH 5/7] S/390: Fix vec_splat_* builtins.

2015-11-27 Thread Andreas Krebbel
This enables the vec_splat_* builtins to make use of vrepi for constant 
operands.

gcc/testsuite/ChangeLog:

2015-11-27  Andreas Krebbel  

* gcc.target/s390/zvector/vec-splat-2.c: New test.

gcc/ChangeLog:

2015-11-27  Andreas Krebbel  

* config/s390/s390-builtin-types.def: New builtin types added.
* config/s390/s390-builtins.def: Add s390_vec_splat_* definitions.
* config/s390/s390.c (s390_expand_builtin): Always truncate
constants to the mode in the pattern.
* config/s390/vecintrin.h: Let the vec_splat_* macros point to the
respective builtin __builtin_s390_vec_splat_*.
---
 gcc/config/s390/s390-builtin-types.def |  3 ++
 gcc/config/s390/s390-builtins.def  |  9 +
 gcc/config/s390/s390.c |  9 +
 gcc/config/s390/vecintrin.h| 16 -
 .../gcc.target/s390/zvector/vec-splat-2.c  | 42 ++
 5 files changed, 71 insertions(+), 8 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/s390/zvector/vec-splat-2.c

diff --git a/gcc/config/s390/s390-builtin-types.def 
b/gcc/config/s390/s390-builtin-types.def
index 724e0b6..bd3d534 100644
--- a/gcc/config/s390/s390-builtin-types.def
+++ b/gcc/config/s390/s390-builtin-types.def
@@ -139,16 +139,19 @@ DEF_FN_TYPE_2 (BT_FN_UV16QI_USHORT, B_VX, BT_UV16QI, 
BT_USHORT)
 DEF_FN_TYPE_2 (BT_FN_UV16QI_UV16QI, B_VX, BT_UV16QI, BT_UV16QI)
 DEF_FN_TYPE_2 (BT_FN_UV2DI_ULONGLONG, B_VX, BT_UV2DI, BT_ULONGLONG)
 DEF_FN_TYPE_2 (BT_FN_UV2DI_ULONGLONGCONSTPTR, B_VX, BT_UV2DI, 
BT_ULONGLONGCONSTPTR)
+DEF_FN_TYPE_2 (BT_FN_UV2DI_USHORT, B_VX, BT_UV2DI, BT_USHORT)
 DEF_FN_TYPE_2 (BT_FN_UV2DI_UV2DI, B_VX, BT_UV2DI, BT_UV2DI)
 DEF_FN_TYPE_2 (BT_FN_UV2DI_UV4SI, B_VX, BT_UV2DI, BT_UV4SI)
 DEF_FN_TYPE_2 (BT_FN_UV4SI_UINT, B_VX, BT_UV4SI, BT_UINT)
 DEF_FN_TYPE_2 (BT_FN_UV4SI_UINTCONSTPTR, B_VX, BT_UV4SI, BT_UINTCONSTPTR)
+DEF_FN_TYPE_2 (BT_FN_UV4SI_USHORT, B_VX, BT_UV4SI, BT_USHORT)
 DEF_FN_TYPE_2 (BT_FN_UV4SI_UV4SI, B_VX, BT_UV4SI, BT_UV4SI)
 DEF_FN_TYPE_2 (BT_FN_UV4SI_UV8HI, B_VX, BT_UV4SI, BT_UV8HI)
 DEF_FN_TYPE_2 (BT_FN_UV8HI_USHORT, B_VX, BT_UV8HI, BT_USHORT)
 DEF_FN_TYPE_2 (BT_FN_UV8HI_USHORTCONSTPTR, B_VX, BT_UV8HI, BT_USHORTCONSTPTR)
 DEF_FN_TYPE_2 (BT_FN_UV8HI_UV16QI, B_VX, BT_UV8HI, BT_UV16QI)
 DEF_FN_TYPE_2 (BT_FN_UV8HI_UV8HI, B_VX, BT_UV8HI, BT_UV8HI)
+DEF_FN_TYPE_2 (BT_FN_V16QI_SCHAR, B_VX, BT_V16QI, BT_SCHAR)
 DEF_FN_TYPE_2 (BT_FN_V16QI_UCHAR, B_VX, BT_V16QI, BT_UCHAR)
 DEF_FN_TYPE_2 (BT_FN_V16QI_V16QI, B_VX, BT_V16QI, BT_V16QI)
 DEF_FN_TYPE_2 (BT_FN_V2DF_DBL, B_VX, BT_V2DF, BT_DBL)
diff --git a/gcc/config/s390/s390-builtins.def 
b/gcc/config/s390/s390-builtins.def
index b267b04..0b6961e 100644
--- a/gcc/config/s390/s390-builtins.def
+++ b/gcc/config/s390/s390-builtins.def
@@ -362,6 +362,15 @@ B_DEF  (s390_vrepih,vec_splatsv8hi,
 0,
 B_DEF  (s390_vrepif,vec_splatsv4si, 0, 
 B_VX,   O1_S16, BT_FN_V4SI_SHORT)
 B_DEF  (s390_vrepig,vec_splatsv2di, 0, 
 B_VX,   O1_S16, BT_FN_V2DI_SHORT)
 
+B_DEF  (s390_vec_splat_u8,  vec_splatsv16qi,0, 
 B_VX,   O1_U8,  BT_FN_UV16QI_UCHAR)
+B_DEF  (s390_vec_splat_s8,  vec_splatsv16qi,0, 
 B_VX,   O1_S8,  BT_FN_V16QI_SCHAR)
+B_DEF  (s390_vec_splat_u16, vec_splatsv8hi, 0, 
 B_VX,   O1_U16, BT_FN_UV8HI_USHORT)
+B_DEF  (s390_vec_splat_s16, vec_splatsv8hi, 0, 
 B_VX,   O1_S16, BT_FN_V8HI_SHORT)
+B_DEF  (s390_vec_splat_u32, vec_splatsv4si, 0, 
 B_VX,   O1_U16, BT_FN_UV4SI_USHORT)
+B_DEF  (s390_vec_splat_s32, vec_splatsv4si, 0, 
 B_VX,   O1_S16, BT_FN_V4SI_SHORT)
+B_DEF  (s390_vec_splat_u64, vec_splatsv2di, 0, 
 B_VX,   O1_U16, BT_FN_UV2DI_USHORT)
+B_DEF  (s390_vec_splat_s64, vec_splatsv2di, 0, 
 B_VX,   O1_S16, BT_FN_V2DI_SHORT)
+
 OB_DEF (s390_vec_insert,s390_vec_insert_s8, 
s390_vec_insert_dbl,B_VX,   BT_FN_OV4SI_INT_OV4SI_INT)
 OB_DEF_VAR (s390_vec_insert_s8, s390_vlvgb, O3_ELEM,   
 BT_OV_V16QI_SCHAR_V16QI_INT)
 OB_DEF_VAR (s390_vec_insert_u8, s390_vlvgb, O3_ELEM,   
 BT_OV_UV16QI_UCHAR_UV16QI_INT)
diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c
index e872423..2c8bbdc 100644
--- a/gcc/config/s390/s390.c
+++ b/gcc/config/s390/s390.c
@@ -835,6 +835,15 @@ s390_expand_builtin (tree exp, rtx target, rtx subtarget 
ATTRIBUTE_UNUSED,
   insn_op = &insn_data[icode].operand[arity + nonvoid];
   op[arity] = 

[PATCH 4/7] S/390: Sort builtin types - cleanup only.

2015-11-27 Thread Andreas Krebbel
This patch sorts the builtin type definitions in the s390-builtin-types.def file
properly.  The file is script-generated from s390-builtins.def and
this hopefully makes future patches smaller.

2015-11-27  Andreas Krebbel  

* config/s390/s390-builtin-types.def: Sort builtin types.
---
 gcc/config/s390/s390-builtin-types.def | 1024 
 1 file changed, 512 insertions(+), 512 deletions(-)

diff --git a/gcc/config/s390/s390-builtin-types.def 
b/gcc/config/s390/s390-builtin-types.def
index 245d538..724e0b6 100644
--- a/gcc/config/s390/s390-builtin-types.def
+++ b/gcc/config/s390/s390-builtin-types.def
@@ -58,695 +58,695 @@
   s390_builtin_types[T4],  \
   s390_builtin_types[T5],  \
   s390_builtin_types[T6])
-DEF_TYPE (BT_DBL, B_VX, double_type_node, 0)
-DEF_TYPE (BT_DBLCONST, B_VX, double_type_node, 1)
-DEF_TYPE (BT_FLT, B_VX, float_type_node, 0)
+DEF_TYPE (BT_INT, B_HTM | B_VX, integer_type_node, 0)
+DEF_TYPE (BT_VOID, 0, void_type_node, 0)
 DEF_TYPE (BT_FLTCONST, B_VX, float_type_node, 1)
+DEF_TYPE (BT_UINT64, B_HTM, c_uint64_type_node, 0)
+DEF_TYPE (BT_FLT, B_VX, float_type_node, 0)
+DEF_TYPE (BT_UINT, 0, unsigned_type_node, 0)
+DEF_TYPE (BT_VOIDCONST, B_VX, void_type_node, 1)
+DEF_TYPE (BT_ULONG, B_VX, long_unsigned_type_node, 0)
+DEF_TYPE (BT_USHORTCONST, B_VX, short_unsigned_type_node, 1)
+DEF_TYPE (BT_SHORTCONST, B_VX, short_integer_type_node, 1)
 DEF_TYPE (BT_INTCONST, B_VX, integer_type_node, 1)
-DEF_TYPE (BT_INT, B_HTM | B_VX, integer_type_node, 0)
-DEF_TYPE (BT_LONGLONGCONST, B_VX, long_long_integer_type_node, 1)
-DEF_TYPE (BT_LONGLONG, B_VX, long_long_integer_type_node, 0)
-DEF_TYPE (BT_LONG, B_VX, long_integer_type_node, 0)
+DEF_TYPE (BT_UCHARCONST, B_VX, unsigned_char_type_node, 1)
 DEF_TYPE (BT_UCHAR, B_VX, unsigned_char_type_node, 0)
-DEF_TYPE (BT_SCHAR, B_VX, signed_char_type_node, 0)
 DEF_TYPE (BT_SCHARCONST, B_VX, signed_char_type_node, 1)
-DEF_TYPE (BT_SHORTCONST, B_VX, short_integer_type_node, 1)
 DEF_TYPE (BT_SHORT, B_VX, short_integer_type_node, 0)
-DEF_TYPE (BT_UINT, 0, unsigned_type_node, 0)
-DEF_TYPE (BT_UINT64, B_HTM, c_uint64_type_node, 0)
-DEF_TYPE (BT_UCHARCONST, B_VX, unsigned_char_type_node, 1)
-DEF_TYPE (BT_UINTCONST, B_VX, unsigned_type_node, 1)
+DEF_TYPE (BT_LONG, B_VX, long_integer_type_node, 0)
+DEF_TYPE (BT_SCHAR, B_VX, signed_char_type_node, 0)
 DEF_TYPE (BT_ULONGLONGCONST, B_VX, long_long_unsigned_type_node, 1)
-DEF_TYPE (BT_USHORTCONST, B_VX, short_unsigned_type_node, 1)
-DEF_TYPE (BT_VOIDCONST, B_VX, void_type_node, 1)
-DEF_TYPE (BT_VOID, 0, void_type_node, 0)
-DEF_TYPE (BT_ULONG, B_VX, long_unsigned_type_node, 0)
-DEF_TYPE (BT_ULONGLONG, B_VX, long_long_unsigned_type_node, 0)
 DEF_TYPE (BT_USHORT, B_VX, short_unsigned_type_node, 0)
-DEF_DISTINCT_TYPE (BT_BCHAR, B_VX, BT_UCHAR)
-DEF_DISTINCT_TYPE (BT_BINT, B_VX, BT_UINT)
-DEF_DISTINCT_TYPE (BT_BLONGLONG, B_VX, BT_ULONGLONG)
-DEF_DISTINCT_TYPE (BT_BSHORT, B_VX, BT_USHORT)
-DEF_POINTER_TYPE (BT_DBLPTR, B_VX, BT_DBL)
+DEF_TYPE (BT_LONGLONG, B_VX, long_long_integer_type_node, 0)
+DEF_TYPE (BT_DBLCONST, B_VX, double_type_node, 1)
+DEF_TYPE (BT_ULONGLONG, B_VX, long_long_unsigned_type_node, 0)
+DEF_TYPE (BT_DBL, B_VX, double_type_node, 0)
+DEF_TYPE (BT_LONGLONGCONST, B_VX, long_long_integer_type_node, 1)
+DEF_TYPE (BT_UINTCONST, B_VX, unsigned_type_node, 1)
+DEF_VECTOR_TYPE (BT_UV2DI, B_VX, BT_ULONGLONG, 2)
+DEF_VECTOR_TYPE (BT_V4SI, B_VX, BT_INT, 4)
+DEF_VECTOR_TYPE (BT_V8HI, B_VX, BT_SHORT, 8)
+DEF_VECTOR_TYPE (BT_UV4SI, B_VX, BT_UINT, 4)
+DEF_VECTOR_TYPE (BT_V16QI, B_VX, BT_SCHAR, 16)
+DEF_VECTOR_TYPE (BT_V2DF, B_VX, BT_DBL, 2)
+DEF_VECTOR_TYPE (BT_V2DI, B_VX, BT_LONGLONG, 2)
+DEF_VECTOR_TYPE (BT_UV8HI, B_VX, BT_USHORT, 8)
+DEF_VECTOR_TYPE (BT_UV16QI, B_VX, BT_UCHAR, 16)
+DEF_POINTER_TYPE (BT_UCHARPTR, B_VX, BT_UCHAR)
 DEF_POINTER_TYPE (BT_DBLCONSTPTR, B_VX, BT_DBLCONST)
+DEF_POINTER_TYPE (BT_VOIDPTR, B_HTM | B_VX, BT_VOID)
 DEF_POINTER_TYPE (BT_FLTPTR, B_VX, BT_FLT)
-DEF_POINTER_TYPE (BT_FLTCONSTPTR, B_VX, BT_FLTCONST)
-DEF_POINTER_TYPE (BT_INTCONSTPTR, B_VX, BT_INTCONST)
-DEF_POINTER_TYPE (BT_INTPTR, B_VX, BT_INT)
-DEF_POINTER_TYPE (BT_LONGLONGCONSTPTR, B_VX, BT_LONGLONGCONST)
-DEF_POINTER_TYPE (BT_LONGLONGPTR, B_VX, BT_LONGLONG)
-DEF_POINTER_TYPE (BT_SCHARCONSTPTR, B_VX, BT_SCHARCONST)
+DEF_POINTER_TYPE (BT_UINT64PTR, B_HTM, BT_UINT64)
 DEF_POINTER_TYPE (BT_SCHARPTR, B_VX, BT_SCHAR)
-DEF_POINTER_TYPE (BT_SHORTCONSTPTR, B_VX, BT_SHORTCONST)
-DEF_POINTER_TYPE (BT_SHORTPTR, B_VX, BT_SHORT)
-DEF_POINTER_TYPE (BT_UCHARCONSTPTR, B_VX, BT_UCHARCONST)
-DEF_POINTER_TYPE (BT_UCHARPTR, B_VX, BT_UCHAR)
-DEF_POINTER_TYPE (BT_UINTPTR, B_VX, BT_UINT)
 DEF_POINTER_TYPE (BT_UINTCONSTPTR, B_VX, BT_UINTCONST)
-DEF_POINTER_TYPE (BT_UINT64PTR, B_HTM, BT_UINT64)
 DEF_POINTER_TYPE (BT_ULONGLONGCONSTPTR, B_VX, BT_ULONGLONGCONST)
-DEF_POINTER_TYPE (BT_ULONGLONGPTR, B_VX, BT_ULONGLONG)
+DEF_POINTER_TYPE (BT_LONGLONGCONSTPTR

Re: [PATCH] Convert a test to GIMPLE

2015-11-27 Thread Marek Polacek
On Fri, Nov 27, 2015 at 12:37:32PM +0100, Jakub Jelinek wrote:
> I think it is just fine for 5 branch too.

Ok, committed there as well.

> You could even use int tem3 = ~m; and use tem3 instead of ~m if you really
> wanted to test whether GIMPLE folding works.

Sure.

Marek


Re: [PATCH, 4/16] Implement -foffload-alias

2015-11-27 Thread Tom de Vries

On 23/11/15 12:41, Richard Biener wrote:

On Sat, 21 Nov 2015, Tom de Vries wrote:


>On 13/11/15 12:39, Jakub Jelinek wrote:

> >On Fri, Nov 13, 2015 at 12:29:51PM +0100, Richard Biener wrote:

> > > >thanks for the explanation. Filed as PR68331 - '[meta-bug] fipa-pta
> > > >issues'.
> > > >
> > > >Any feedback on the '#pragma GCC offload-alias=' bit
> > > >above?
> > > >Is that sort of what you had in mind?

> > >
> > >Yes.  Whether that makes sense is another question of course.  You can
> > >annotate memory references with MR_DEPENDENCE_BASE/CLIQUE yourself
> > >as well if you know dependences without the users intervention.

> >
> >I really don't like even the GCC offload-alias, I just don't see anything
> >special on the offload code.  Not to mention that the same issue is already
> >with other outlined functions, like OpenMP tasks or parallel regions, those
> >aren't offloaded, yet they can suffer from worse alias/points-to analysis
> >too.

>
>AFAIU there is one aspect that is different for offloaded code: the setup of
>the data on the device.
>
>Consider this example:
>...
>unsigned int a[N];
>unsigned int b[N];
>unsigned int c[N];
>
>int
>main (void)
>{
>   ...
>
>#pragma acc kernels copyin (a) copyin (b) copyout (c)
>   {
> for (COUNTERTYPE ii = 0; ii < N; ii++)
>   c[ii] = a[ii] + b[ii];
>   }
>
>   ...
>...
>
>At gimple level, we have:
>...
>#pragma omp target oacc_kernels \
>   map(force_from:c [len: 2097152]) \
>   map(force_to:b [len: 2097152]) \
>   map(force_to:a [len: 2097152])
>...
>
>[ The meaning of the force_from/force_to mappings is given in
>include/gomp-constants.h:
>...
> /* Allocate.  */
> GOMP_MAP_FORCE_ALLOC = (GOMP_MAP_FLAG_FORCE | GOMP_MAP_ALLOC),
> /* ..., and copy to device.  */
> GOMP_MAP_FORCE_TO = (GOMP_MAP_FLAG_FORCE | GOMP_MAP_TO),
> /* ..., and copy from device.  */
> GOMP_MAP_FORCE_FROM = (GOMP_MAP_FLAG_FORCE | GOMP_MAP_FROM),
> /* ..., and copy to and from device.  */
> GOMP_MAP_FORCE_TOFROM = (GOMP_MAP_FLAG_FORCE | GOMP_MAP_TOFROM),
>...  ]
>
>So before calling the offloaded function, a separate alloc is done for a, b
>and c, and the base pointers of the newly allocated objects are passed to the
>offloaded function.
>
>This means we can mark those base pointers as restrict in the offloaded
>function.
>
>Attached proof-of-concept patch implements that.
>

> >We simply have some compiler internal interface between the caller and
> >callee of the outlined regions, each interface in between those has
> >its own structure type used to communicate the info;
> >we can attach attributes on the fields, or some flags to indicate some
> >properties interesting from aliasing POV.
> >We don't really need to perform
> >full IPA-PTA, perhaps it would be enough to a) record somewhere in cgraph
> >the relationship in between such callers and callees (for offloading regions
> >we already have "omp target entrypoint" attribute on the callee and a
> >single caller), tell LTO if possible not to split those into different
> >partitions if easily possible, and then just for these pairs perform
> >aliasing/points-to analysis in the caller and the result record using
> >cliques/special attributes/whatever to the callee side, so that the callee
> >(outlined OpenMP/OpenACC/Cilk+ region) can then improve its alias analysis.

>
>As a start, is the approach of this patch OK?

Works for me but leaving to Jakub to review for correctness.


Attached patch is a complete version:
- added ChangeLog
- added missing function header comments
- moved analysis to separate function
  omp_target_base_pointers_restrict_p
- added example in comment before analysis
- fixed error in omp_target_base_pointers_restrict_p where I was using
  GOMP_MAP_ALLOC but should have been using GOMP_MAP_FORCE_ALLOC
- added testcases
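
To illustrate the intended effect (this is not code from the patch), the
outlined kernels body can be thought of as a function like the one below.
The names vec_add_restrict and vec_add_check are hypothetical.  The
restrict qualifiers stand for the guarantee that a, b and c were allocated
separately on the device, so stores to c cannot alias loads from a or b.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the offloaded function.  Because each
   base pointer refers to its own allocation, all three can be
   restrict-qualified, which lets the compiler treat the iterations
   as independent.  */
static void
vec_add_restrict (const unsigned int *restrict a,
                  const unsigned int *restrict b,
                  unsigned int *restrict c, size_t n)
{
  assert (a != NULL && b != NULL && c != NULL);
  for (size_t ii = 0; ii < n; ii++)
    c[ii] = a[ii] + b[ii];
}

/* Small self-check: returns 1 if the element-wise sums are correct.  */
static int
vec_add_check (void)
{
  unsigned int a[4] = { 1, 2, 3, 4 };
  unsigned int b[4] = { 10, 20, 30, 40 };
  unsigned int c[4] = { 0 };
  vec_add_restrict (a, b, c, 4);
  return c[0] == 11 && c[1] == 22 && c[2] == 33 && c[3] == 44;
}
```

Without the qualifiers, the compiler has to assume that a store to c[ii]
might change a later a[ii] or b[ii], which is what blocks parallelization
of the loop.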

Bootstrapped and reg-tested on x86_64.

OK for stage3 trunk?

Thanks,
- Tom

Mark pointers to allocated target vars as restricted, if possible

2015-11-26  Tom de Vries  

	* omp-low.c (install_var_field_1): New function, factored out of ...
	(install_var_field): ... here.
	(scan_sharing_clauses_1): New function, factored out of ...
	(scan_sharing_clauses): ... here.
	(omp_target_base_pointers_restrict_p): New function.
	(scan_omp_target): Call scan_sharing_clauses_1 instead of
	scan_sharing_clauses, with base_pointers_restrict arg.

	* c-c++-common/goacc/kernels-alias-2.c: New test.
	* c-c++-common/goacc/kernels-alias-3.c: New test.
	* c-c++-common/goacc/kernels-alias-4.c: New test.
	* c-c++-common/goacc/kernels-alias-5.c: New test.
	* c-c++-common/goacc/kernels-alias-6.c: New test.
	* c-c++-common/goacc/kernels-alias-7.c: New test.
	* c-c++-common/goacc/kernels-alias-8.c: New test.
	* c-c++-common/goacc/kernels-alias.c: New test.

---
 gcc/omp-low.c  | 109 +++--
 gcc/testsuite/c-c++-common/goacc/kernels-alias-2.c |  27 +
 gcc/testsuite/c-c++-common/goacc/kernels-alias-3.c |  20 
 gcc/testsuite/

[PATCH PR68529]Fix not recognized scev by computing no-overflow info for loop with NE_EXPR exit condition

2015-11-27 Thread Bin Cheng
Hi,
This patch fixes PR68529.  In my previous scev/niter overflow patches, I
only computed no-overflow information for the control iv in simple loops with
LT_EXPR as the exit condition code.  This bug is about a loop with NE_EXPR as
the exit condition code.  Consider the example below:

#include 
#include 

int main(){
char c[1]={};
unsigned int nchar=;

while(nchar--!=0){   
   c[nchar]='A';
  }   

printf("%s\n",c);
return 0;
}
nchar, used as an index into array 'c', doesn't overflow during loop iterations.
Thus &c[nchar] is a valid scev, but GCC currently fails to recognize it.  This
patch fixes the issue.
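
As an illustration only (a sketch, not part of the patch): the array size N
below is an assumption, since the size in the example above is elided, and
fill_down, fill_memset and same_result are hypothetical names.  The point is
that the unsigned index only takes the values n-1 down to 0 inside the loop
body, so the stores cover c[0..n-1] exactly once, which is what allows the
loop to be treated like a memset call.

```c
#include <string.h>

/* N is an assumed array size, for illustration only.  */
#define N 64

/* Loop with a != exit test, shaped like the example above.  */
void
fill_down (char *c, unsigned int n)
{
  unsigned int nchar = n;

  /* Inside the body nchar takes the values n-1, n-2, ..., 0; the wrap
     to UINT_MAX only happens in the final, failing exit test.  */
  while (nchar-- != 0)
    c[nchar] = 'A';
}

/* What the loop amounts to once &c[nchar] is known not to wrap.  */
void
fill_memset (char *c, unsigned int n)
{
  memset (c, 'A', n);
}

/* Return 1 if both variants leave identical buffers for length n,
   where n must not exceed N.  */
int
same_result (unsigned int n)
{
  char a[N] = { 0 };
  char b[N] = { 0 };

  fill_down (a, n);
  fill_memset (b, n);
  return memcmp (a, b, N) == 0;
}
```

Calling same_result with 0, 1 and N covers the empty, single-store and
full-array cases.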

Furthermore, the computation of no-overflow information could be improved by
using TREE_OVERFLOW_UNDEFINED semantic of signed type for C/C++.  I didn't
do that because:
1) I doubt it would be very useful, because I have already changed scev to use
the semantic whenever possible.  It doesn't need help from loop niter analysis.
2) To do that, I need to expose chrec_convert_aggressive information out of
scev in function simple_iv, because that function could corrupt
TREE_OVERFLOW_UNDEFINED semantic assumption.  This isn't appropriate for
Stage3.

Bootstrapped and tested on x86_64 and x86.  I don't expect any issues on
aarch64 either.  Is it OK?

2015-11-27  Bin Cheng  

PR tree-optimization/68529
* tree-ssa-loop-niter.c (number_of_iterations_ne): Add new param.
Compute no-overflow information for control iv.
(number_of_iterations_lt, number_of_iterations_le): Add new param.
(number_of_iterations_cond): Pass new argument to above functions.

2015-11-27  Bin Cheng  

PR tree-optimization/68529
* gcc.dg/tree-ssa/pr68529-1.c: New test.
* gcc.dg/tree-ssa/pr68529-2.c: New test.
* gcc.dg/tree-ssa/pr68529-3.c: New test.

Index: gcc/testsuite/gcc.dg/tree-ssa/pr68529-1.c
===
--- gcc/testsuite/gcc.dg/tree-ssa/pr68529-1.c   (revision 0)
+++ gcc/testsuite/gcc.dg/tree-ssa/pr68529-1.c   (working copy)
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -ftree-loop-distribution -ftree-loop-distribute-patterns -fdump-tree-ldist-details" } */
+
+void bar(char *s);
+int foo()
+{
+  char c[1] = {};
+  unsigned short nchar = ;
+
+  while(nchar-- != 0)
+{
+  c[nchar] = 'A';
+}
+
+  bar (c);
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump "distributed: split to 0 loops and 1 library calls" "ldist" } } */
+/* { dg-final { scan-tree-dump "generated memset" "ldist" } } */
Index: gcc/testsuite/gcc.dg/tree-ssa/pr68529-2.c
===
--- gcc/testsuite/gcc.dg/tree-ssa/pr68529-2.c   (revision 0)
+++ gcc/testsuite/gcc.dg/tree-ssa/pr68529-2.c   (working copy)
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -ftree-loop-distribution -ftree-loop-distribute-patterns -fdump-tree-ldist-details" } */
+
+void bar(char *s);
+int foo(unsigned short l)
+{
+  char c[1] = {};
+  unsigned short nchar = ;
+
+  if (nchar <= l)
+return -1;
+
+  while(nchar-- != l)
+{
+  c[nchar] = 'A';
+}
+
+  bar (c);
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump "distributed: split to 0 loops and 1 library calls" "ldist" } } */
+/* { dg-final { scan-tree-dump "generated memset" "ldist" } } */
Index: gcc/testsuite/gcc.dg/tree-ssa/pr68529-3.c
===
--- gcc/testsuite/gcc.dg/tree-ssa/pr68529-3.c   (revision 0)
+++ gcc/testsuite/gcc.dg/tree-ssa/pr68529-3.c   (working copy)
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -ftree-loop-distribution -ftree-loop-distribute-patterns -fdump-tree-ldist-details" } */
+
+void bar(char *s);
+int foo(unsigned short l)
+{
+  char c[1] = {};
+  unsigned short nchar = ;
+
+  if (nchar < l)
+return -1;
+
+  while(nchar-- != l)
+{
+  c[nchar] = 'A';
+}
+
+  bar (c);
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump "failed: evolution of offset is not affine" "ldist" } } */
Index: gcc/tree-ssa-loop-niter.c
===
--- gcc/tree-ssa-loop-niter.c   (revision 230945)
+++ gcc/tree-ssa-loop-niter.c   (working copy)
@@ -957,13 +957,14 @@ number_of_iterations_ne_max (mpz_t bnd, bool no_ov
bounds on the difference FINAL - IV->base.  */
 
 static bool
-number_of_iterations_ne (tree type, affine_iv *iv, tree final,
-struct tree_niter_desc *niter, bool exit_must_be_taken,
-bounds *bnds)
+number_of_iterations_ne (struct loop *loop, tree type, affine_iv *iv,
+tree final, struct tree_niter_desc *niter,
+bool exit_must_be_taken, bounds *bnds)
 {
   tree niter_type = unsigned_type_for (type);
   tree s, c, d, bits, assumption, tmp, bound;
   mpz_t max;
+  tree e;
 
   niter->control = *iv;
   niter->bound = final;
@@ -998,6 +999,2

Re: [PATCH, 4/16] Implement -foffload-alias

2015-11-27 Thread Tom de Vries

On 27/11/15 12:42, Tom de Vries wrote:

On 23/11/15 12:41, Richard Biener wrote:

On Sat, 21 Nov 2015, Tom de Vries wrote:


>On 13/11/15 12:39, Jakub Jelinek wrote:

> >On Fri, Nov 13, 2015 at 12:29:51PM +0100, Richard Biener wrote:

> > > >thanks for the explanation. Filed as PR68331 - '[meta-bug] fipa-pta
> > > >issues'.
> > > >
> > > >Any feedback on the '#pragma GCC offload-alias=' bit
> > > >above?
> > > >Is that sort of what you had in mind?

> > >
> > >Yes.  Whether that makes sense is another question of course.  You can
> > >annotate memory references with MR_DEPENDENCE_BASE/CLIQUE yourself
> > >as well if you know dependences without the users intervention.

> >
> >I really don't like even the GCC offload-alias, I just don't see anything
> >special on the offload code.  Not to mention that the same issue is already
> >with other outlined functions, like OpenMP tasks or parallel regions, those
> >aren't offloaded, yet they can suffer from worse alias/points-to analysis
> >too.

>
>AFAIU there is one aspect that is different for offloaded code: the setup of
>the data on the device.
>
>Consider this example:
>...
>unsigned int a[N];
>unsigned int b[N];
>unsigned int c[N];
>
>int
>main (void)
>{
>   ...
>
>#pragma acc kernels copyin (a) copyin (b) copyout (c)
>   {
> for (COUNTERTYPE ii = 0; ii < N; ii++)
>   c[ii] = a[ii] + b[ii];
>   }
>
>   ...
>...
>
>At gimple level, we have:
>...
>#pragma omp target oacc_kernels \
>   map(force_from:c [len: 2097152]) \
>   map(force_to:b [len: 2097152]) \
>   map(force_to:a [len: 2097152])
>...
>
>[ The meaning of the force_from/force_to mappings is given in
>include/gomp-constants.h:
>...
> /* Allocate.  */
> GOMP_MAP_FORCE_ALLOC = (GOMP_MAP_FLAG_FORCE | GOMP_MAP_ALLOC),
> /* ..., and copy to device.  */
> GOMP_MAP_FORCE_TO = (GOMP_MAP_FLAG_FORCE | GOMP_MAP_TO),
> /* ..., and copy from device.  */
> GOMP_MAP_FORCE_FROM = (GOMP_MAP_FLAG_FORCE | GOMP_MAP_FROM),
> /* ..., and copy to and from device.  */
> GOMP_MAP_FORCE_TOFROM = (GOMP_MAP_FLAG_FORCE | GOMP_MAP_TOFROM),
>...  ]
>
>So before calling the offloaded function, a separate alloc is done for a, b
>and c, and the base pointers of the newly allocated objects are passed to the
>offloaded function.
>
>This means we can mark those base pointers as restrict in the offloaded
>function.
>
>Attached proof-of-concept patch implements that.
>

> >We simply have some compiler internal interface between the caller and
> >callee of the outlined regions, each interface in between those has
> >its own structure type used to communicate the info;
> >we can attach attributes on the fields, or some flags to indicate some
> >properties interesting from aliasing POV.
> >We don't really need to perform
> >full IPA-PTA, perhaps it would be enough to a) record somewhere in cgraph
> >the relationship in between such callers and callees (for offloading regions
> >we already have "omp target entrypoint" attribute on the callee and a
> >single caller), tell LTO if possible not to split those into different
> >partitions if easily possible, and then just for these pairs perform
> >aliasing/points-to analysis in the caller and the result record using
> >cliques/special attributes/whatever to the callee side, so that the callee
> >(outlined OpenMP/OpenACC/Cilk+ region) can then improve its alias analysis.

>
>As a start, is the approach of this patch OK?

Works for me but leaving to Jakub to review for correctness.


Attached patch is a complete version:
- added ChangeLog
- added missing function header comments
- moved analysis to separate function
   omp_target_base_pointers_restrict_p
- added example in comment before analysis
- fixed error in omp_target_base_pointers_restrict_p where I was using
   GOMP_MAP_ALLOC but should have been using GOMP_MAP_FORCE_ALLOC
- added testcases



This follow-up patch handles the case that we copy from/to pointers 
rather than declared variables:

...
   void foo (unsigned int *a, unsigned int *b)
   {
 #pragma acc kernels copyout (a[0:2]) copyout (b[0:2])
 {
   a[0] = 0;
   b[0] = 1;
 }
   }
...

After gimplification, we have:
...
 foo (unsigned int * a, unsigned int * b)
 {
   unsigned int * b.0;
   unsigned int * a.1;

   b.0 = b;
   a.1 = a;
   #pragma omp target oacc_kernels \
 map(force_from:*a.1 (*a) [len: 8]) \
 map(alloc:a [pointer assign, bias: 0]) \
 map(force_from:*b.0 (*b) [len: 8]) \
 map(alloc:b [pointer assign, bias: 0])
   {
 unsigned int * a.2;
 unsigned int * b.3;

 a.2 = a;
 *a.2 = 0;
 b.3 = b;
 *b.3 = 1;
  }
 }
...

We don't bail out of omp_target_base_pointers_restrict_p when 
encountering 'map(alloc:a [pointer assign, bias: 0])', given that we can 
find the matching 'map(force_from:*a.1 (*a) [len: 8])'.
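
As a rough C-level picture of the effect (a sketch under assumptions, not the
code the compiler generates): kernel_body and run_kernel are hypothetical
names, and the two local arrays merely stand in for the separate device
allocations that the force_from mappings imply.  Because each mapped object
gets its own allocation, the base pointers handed to the outlined region can
carry restrict.

```c
/* Hypothetical stand-in for the outlined kernels region of foo.
   The restrict qualifiers express the assumption that a[0:2] and
   b[0:2] were mapped to separate device allocations.  */
void
kernel_body (unsigned int *restrict a, unsigned int *restrict b)
{
  a[0] = 0;
  b[0] = 1;
}

/* Host-side stand-in: two distinct buffers play the role of the
   device copies, so the restrict promise holds.  */
unsigned int
run_kernel (void)
{
  unsigned int dev_a[2] = { 7, 7 };
  unsigned int dev_b[2] = { 7, 7 };

  kernel_body (dev_a, dev_b);
  return dev_a[0] * 10 + dev_b[0];
}
```

With restrict in place, alias analysis may treat the stores to a[0] and b[0]
as independent.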


Using this and the previous patch, I'm able to do auto-paral

[PATCH] Add testcase for PR rtl-optimization/68250

2015-11-27 Thread Jakub Jelinek
Hi!

Another REE issue dup.

2015-11-27  Jakub Jelinek  

PR rtl-optimization/68250
* gcc.c-torture/execute/pr68250.c: New test.

--- gcc/testsuite/gcc.c-torture/execute/pr68250.c.jj	2015-11-27 13:10:13.718447138 +0100
+++ gcc/testsuite/gcc.c-torture/execute/pr68250.c	2015-11-27 13:10:07.716531620 +0100
@@ -0,0 +1,40 @@
+/* PR rtl-optimization/68250 */
+
+signed char a, b, h, k, l, m, o;
+short c, d, n;
+int e, f, g, j, q;
+
+void
+fn1 (void)
+{
+  int p = b || a;
+  n = o > 0 || d > 1 >> o ? d : d << o;
+  for (; j; j++)
+m = c < 0 || m || c << p;
+  l = f + 1;
+  for (; f < 1; f = 1)
+k = h + 1;
+}
+
+__attribute__((noinline, noclone)) void
+fn2 (int k)
+{
+  if (k != 1)
+__builtin_abort ();
+}
+
+int
+main ()
+{
+  signed char i;
+  for (; e < 1; e++)
+{
+  fn1 ();
+  if (k)
+   i = k;
+  if (i > q)
+   g = 0;
+}
+  fn2 (k);
+  return 0;
+}

Jakub


Re: [RFC] [Patch] PR67326 - relax trap assumption by looking at similar DRS

2015-11-27 Thread Richard Biener
On Fri, Nov 27, 2015 at 9:24 AM, Kumar, Venkataramanan
 wrote:
> Hi Richard,
>
>> -Original Message-
>> From: Richard Biener [mailto:richard.guent...@gmail.com]
>> Sent: Tuesday, November 24, 2015 9:07 PM
>> To: Kumar, Venkataramanan
>> Cc: Jakub Jelinek (ja...@redhat.com); gcc-patches@gcc.gnu.org
>> Subject: Re: [RFC] [Patch] PR67326 - relax trap assumption by looking at
>> similar DRS
>>
>> On Fri, Nov 20, 2015 at 1:02 PM, Kumar, Venkataramanan
>>  wrote:
>> > Hi Richard,
>> >
>> > As per Jakub's suggestion in
>> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67326, the patch below fixes
>> > the regression in tree if-conversion.
>> > Basically, it allows if-conversion to happen for a candidate DR if we find
>> > a similar DR with the same dimensions and that DR will not trap.
>> >
>> > To find similar DRs, it uses a hash table that hashes offset and DR pairs.
>> > It also reuses the read/written information that was stored for the
>> > reference tree.
>> >
>> > Also:
>> > (1) I guard these checks with -ftree-loop-if-convert-stores and -fno-common.
>> > Sometimes vectorization flags also trigger if-conversion.
>> > (2) Base DRs are hashed for writes only.
>> > gcc/ChangeLog
>> > 2015-11-19  Venkataramanan  
>> >
>> > PR tree-optimization/67326
>> > * tree-if-conv.c  (offset_DR_map): Define.
>> > (struct ifc_dr): Add new tree base_predicate field.
>> > (hash_memrefs_baserefs_and_store_DRs_read_written_info): Hash
>> offsets, DR pairs
>> > and hash base ref,  DR pairs  for write type DRs.
>> > (ifcvt_memrefs_wont_trap):  Guard checks with -ftree-loop-if-
>> convert-stores flag.
>> >Check for similar DR that are accessed unconditionally.
>> >(if_convertible_loop_p_1):  Initialize and delete offset hash
>> > maps
>> >
>> > gcc/testsuite/ChangeLog
>> > 2015-11-19  Venkataramanan  
>> > * gcc.dg/tree-ssa/ifc-pr67326.c:  Add new.
>> >
>> > Regstrapped on x86_64, Ok for trunk?
>>
>> +  if (offset)
>> +{
>> +  offset_master_dr = &offset_DR_map->get_or_insert (offset,&exist3);
>> +  if (!exist3)
>> +   *offset_master_dr = a;
>> +
>> +  if (DR_RW_UNCONDITIONALLY (*offset_master_dr) != 1)
>> +   DR_RW_UNCONDITIONALLY (*offset_master_dr)
>> +   = DR_RW_UNCONDITIONALLY (*master_dr);
>>
>> this is fishy - as far as I can see offset_master globs all _candidates_ and
>>
>> +  else if (DR_OFFSET (a))
>> +{
>> +  offset_dr = offset_DR_map->get (DR_OFFSET (a));
>> +  if ((DR_RW_UNCONDITIONALLY (*offset_dr) == 1)
>> +  && DR_NUM_DIMENSIONS (a) == DR_NUM_DIMENSIONS
>> (*offset_dr))
>> +   {
>> + tree base_tree = get_base_address (DR_REF (a));
>> + if (DECL_P (base_tree)
>> + && flag_tree_loop_if_convert_stores
>> + && decl_binds_to_current_def_p (base_tree)
>> + && !TREE_READONLY (base_tree))
>> +   return true;
>> +   }
>> +}
>>
>> where the only check that does anything (DR_NUM_DIMENSIONS is
>> not something you can use to identify two arrays with the same domain) will
>> then consider DR_RW_UNCONDITIONALLY ORed from all _candidates_, not only
>> from those which really have the same domain.
>>
>> You need to do the domain check as part of the hash-map
>> hashing/comparing.
>>
>> Note that there is no bounds info in the data ref info so you need to
>>   a) consider DR_OFFSET + DR_INIT
>>   b) verify the access size is the same (TYPE_SIZE_UNIT (TREE_TYPE (dr->ref)))
>>   c) verify the base objects are of the same size - note this is somewhat
>> difficult as the base object for DR_OFFSET/INIT is starting at
>> DR_BASE_ADDRESS so maybe restrict this to ADDR_EXPR 
>> DR_BASE_ADDRESS cases where you can look at DECL_SIZE (decl) of both
>> candidates
>>
>> You can also try using indices (DR_BASE_OBJECT plus DR_ACCESS_FNS when
>> DR_UNCONSTRAINED_BASE is false).  If the size of DR_BASE_OBJECT
>> matches and all access functions are equal it should be a compatible enough
>> case as well.
>
> Ok, I will take some time to figure out the domain analysis part.
>
>>
>> I'd say you should split out the base_predicate introduction into a separate
>> patch (this change looks ok).
>>
>
> The attached patch has the "base_predicate" introduction part alone.
> It does the predicate folding and hashes base references only for write-type
> DRs while hashing.
> I have not added any new test case since we already have ifc-8.c.
>
> I also fixed the formatting issues Jakub pointed out for this patch.
>
> Bootstrapped on x86_64.
>
> Ok to upstream if it passes regression tests?

Ok.

Thanks,
Richard.

> gcc/ChangeLog
> 2015-11-27  Venkataramanan Kumar  
>
> * tree-if-conv.c (struct ifc_dr): Add new tree
> base_predicate field.
> (hash_memrefs_baserefs_and_store_DRs_read_written_info): Hash
> base ref, DR pairs and store base_predicate for write type DRs.
> (ifcvt_memrefs_wont_trap): Guard checks with
> -ftre

Re: [Fortran, patch, pr68218, backport to 5 and 4.9, v1] ALLOCATE with size given by a module function

2015-11-27 Thread Andre Vehreschild
Hi all,

I have backported the patch for 68218 (multiple calls of the same
function, where only one call is expected and reasonable) to
gcc-5-branch and gcc-4_9-branch.
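
For readers who do not follow Fortran, a loose C analogy of the problem (a
sketch with hypothetical names, not how gfortran represents descriptors): if
the bound expression is expanded at every use, a function with side effects
runs more than once; evaluating it into a temporary first, which is the role
gfc_evaluate_now plays in the patch, makes it run exactly once.

```c
/* Hypothetical analogue of the module function nquery: every call
   has a visible side effect.  */
int n_calls = 0;

int
nquery (void)
{
  return ++n_calls;
}

/* Stand-in for an array descriptor with an upper bound and a size.  */
struct desc
{
  int ubound;
  int size;
};

/* Problematic shape: the bound expression is expanded at each use,
   so the function runs twice and the two fields disagree.  */
struct desc
init_twice (void)
{
  struct desc d;

  d.ubound = nquery ();
  d.size = nquery ();
  return d;
}

/* Fixed shape: evaluate once into a temporary and reuse it.  */
struct desc
init_once (void)
{
  struct desc d;
  int tmp = nquery ();

  d.ubound = tmp;
  d.size = tmp;
  return d;
}
```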

Bootstrapped and regtested on x86_64-linux-gnu/f21.

Ok for gcc-5-branch?

Ok for gcc-4_9-branch?

The ChangeLog is identical for both patches. The patches are mostly identical
too, apart from a slight shift in line numbers.

Regards,
Andre

On Sun, 8 Nov 2015 18:48:50 +0100
Andre Vehreschild  wrote:

> Hi Paul,
> 
> thanks for the review. Committed as r229956.
> 
> In 5 and 4.9 the same issue exists. Currently checking whether the same
> patch helps.
> 
> Regards,
>   Andre
> 
> On Sat, 7 Nov 2015 14:58:35 +0100
> Paul Richard Thomas  wrote:
> 
> > Dear Andre,
> > 
> > OK for trunk.
> > 
> > I understand that you have investigated the issue(s) reported to you
> > by Dominique and can find no sign of them.
> > 
> > Thanks
> > 
> > Paul
> > 
> > On 5 November 2015 at 15:29, Andre Vehreschild  wrote:
> > > Hi all,
> > >
> > > attached is a rather trivial patch to prevent multiple evaluations of a
> > > function in:
> > >
> > >   allocate( array(func()) )
> > >
> > > The patch tests whether the upper bound of the array is a function
> > > and calls gfc_evaluate_now().
> > >
> > > Bootstrapped and regtested for x86_64-linux-gnu/f21.
> > >
> > > Ok for trunk?
> > >
> > > Regards,
> > > Andre
> > > --
> > > Andre Vehreschild * Email: vehre ad gmx dot de
> > 
> > 
> > 
> 
> 


-- 
Andre Vehreschild * Email: vehre ad gmx dot de 
diff --git a/gcc/fortran/trans-array.c b/gcc/fortran/trans-array.c
index 877e371..4928adf 100644
--- a/gcc/fortran/trans-array.c
+++ b/gcc/fortran/trans-array.c
@@ -4976,6 +4976,8 @@ gfc_array_init_size (tree descriptor, int rank, int corank, tree * poffset,
   gcc_assert (ubound);
   gfc_conv_expr_type (&se, ubound, gfc_array_index_type);
   gfc_add_block_to_block (pblock, &se.pre);
+  if (ubound->expr_type == EXPR_FUNCTION)
+	se.expr = gfc_evaluate_now (se.expr, pblock);
 
   gfc_conv_descriptor_ubound_set (descriptor_block, descriptor,
   gfc_rank_cst[n], se.expr);
diff --git a/gcc/testsuite/gfortran.dg/allocate_with_arrayspec_1.f90 b/gcc/testsuite/gfortran.dg/allocate_with_arrayspec_1.f90
new file mode 100644
index 000..686b612
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/allocate_with_arrayspec_1.f90
@@ -0,0 +1,29 @@
+! { dg-do run }
+! { dg-options "-fdump-tree-original" }
+
+MODULE mo_test
+
+  integer :: n = 0
+CONTAINS
+
+  FUNCTION nquery()
+INTEGER :: nquery
+WRITE (0,*) "hello!"
+n = n + 1
+nquery = n
+  END FUNCTION nquery
+
+END MODULE mo_test
+
+
+! --
+! MAIN PROGRAM
+! --
+PROGRAM example
+   USE mo_test
+   INTEGER, ALLOCATABLE :: query_buf(:)
+   ALLOCATE(query_buf(nquery()))
+   if (n /= 1 .or. size(query_buf) /= n) call abort()
+END PROGRAM example
+
+! { dg-final { scan-tree-dump-times "nquery" 5 "original" } }


diff --git a/gcc/fortran/trans-array.c b/gcc/fortran/trans-array.c
index 9c175b1..3c2c640 100644
--- a/gcc/fortran/trans-array.c
+++ b/gcc/fortran/trans-array.c
@@ -5030,6 +5030,8 @@ gfc_array_init_size (tree descriptor, int rank, int corank, tree * poffset,
   gcc_assert (ubound);
   gfc_conv_expr_type (&se, ubound, gfc_array_index_type);
   gfc_add_block_to_block (pblock, &se.pre);
+  if (ubound->expr_type == EXPR_FUNCTION)
+	se.expr = gfc_evaluate_now (se.expr, pblock);
 
   gfc_conv_descriptor_ubound_set (descriptor_block, descriptor,
   gfc_rank_cst[n], se.expr);
diff --git a/gcc/testsuite/gfortran.dg/allocate_with_arrayspec_1.f90 b/gcc/testsuite/gfortran.dg/allocate_with_arrayspec_1.f90
new file mode 100644
index 000..686b612
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/allocate_with_arrayspec_1.f90
@@ -0,0 +1,29 @@
+! { dg-do run }
+! { dg-options "-fdump-tree-original" }
+
+MODULE mo_test
+
+  integer :: n = 0
+CONTAINS
+
+  FUNCTION nquery()
+INTEGER :: nquery
+WRITE (0,*) "hello!"
+n = n + 1
+nquery = n
+  END FUNCTION nquery
+
+END MODULE mo_test
+
+
+! --
+! MAIN PROGRAM
+! --
+PROGRAM example
+   USE mo_test
+   INTEGER, ALLOCATABLE :: query_buf(:)
+   ALLOCATE(query_buf(nquery()))
+   if (n /= 1 .or. size(query_buf) /= n) call abort()
+END PROGRAM example
+
+! { dg-final { scan-tree-dump-times "nquery" 5 "original" } }


Re: [PATCH] Fix PR68029

2015-11-27 Thread Jiří Engelthaler
Sorry for international characters in my name. It should be

Jiri Engelthaler

2015-11-27 13:29 GMT+01:00 Engelthaler Jiří :
>
>
> -Original Message-
> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-ow...@gcc.gnu.org] On 
> Behalf Of Jiří Engelthaler
> Sent: Friday, November 27, 2015 11:23 AM
> To: gcc-patches@gcc.gnu.org
> Subject: [PATCH] Fix PR68029
>
> Hi all,
>   the attached patch fixes PR68029, where the GCC -fdiagnostics-color option
> was ignored if it was the first parameter. The problem exists only in GCC 6.0,
> so the patch should be applied to trunk.
>
>
> Jiří Engelthaler
2015-11-27  Jiri Engelthaler 

PR driver/68029
* opts-common.c (prune_options): fdiagnostics_color ignored
if it was as first parameter
 gcc/opts-common.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/opts-common.c b/gcc/opts-common.c
index d9bf4d4..24967cc 100644
--- a/gcc/opts-common.c
+++ b/gcc/opts-common.c
@@ -885,7 +885,7 @@ keep:
}
 }
 
-  if (fdiagnostics_color_idx > 1)
+  if (fdiagnostics_color_idx >= 1)
 {
   /* We put the last -fdiagnostics-color= at the first position
 after argv[0] so it can take effect immediately.  */
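
To see why the comparison matters, here is a simplified model (an assumption
about the surrounding logic, not a copy of prune_options; prune_model and
min_idx are hypothetical names).  It assumes the option is skipped during the
scan and re-inserted at position 1 afterwards; min_idx = 2 models the old
"> 1" test and min_idx = 1 models the new ">= 1" test.

```c
#include <string.h>

/* Simplified model, not the real prune_options: copy old[0], skip
   every option starting with "-fdiagnostics-color" while remembering
   the index of the last one, then re-insert that option at position 1
   if its index is at least min_idx.  new_opts must have room for
   count entries.  Returns the new option count.  */
int
prune_model (const char **old, int count, const char **new_opts, int min_idx)
{
  int color_idx = 0;
  int n = 0;

  new_opts[n++] = old[0];
  for (int i = 1; i < count; i++)
    {
      if (strncmp (old[i], "-fdiagnostics-color", 19) == 0)
        {
          color_idx = i;
          continue;
        }
      new_opts[n++] = old[i];
    }

  if (color_idx >= min_idx)
    {
      /* Shift everything after old[0] up by one slot.  */
      memmove (new_opts + 2, new_opts + 1, sizeof (*new_opts) * (n - 1));
      new_opts[1] = old[color_idx];
      n++;
    }
  return n;
}
```

In this model, an option sitting at index 1 is lost when min_idx is 2 and
kept when min_idx is 1.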


Re: [PATCH] Fix PR68029

2015-11-27 Thread Bernd Schmidt

On 11/27/2015 01:30 PM, Jiří Engelthaler wrote:

Sorry for international characters in my name. It should be

Jiri Engelthaler

2015-11-27 13:29 GMT+01:00 Engelthaler Jiří :


There is precedent for non-ASCII characters in ChangeLogs. Grep for 
Rafael Ávila de Espíndola. But I think there should be two spaces before 
the email address.



PR driver/68029
* opts-common.c (prune_options): fdiagnostics_color ignored
if it was as first parameter


This should read "Don't ignore -fdiagnostics-color if it is the first
parameter." Full sentences with punctuation.

Otherwise this seems ok.


Bernd


Re: [PATCH PR68529]Fix not recognized scev by computing no-overflow info for loop with NE_EXPR exit condition

2015-11-27 Thread Richard Biener
On Fri, Nov 27, 2015 at 12:44 PM, Bin Cheng  wrote:
> Hi,
> This patch fixes PR68529.  In my previous scev/niter overflow patches, I
> only computed no-overflow information for the control iv in simple loops with
> LT_EXPR as the exit condition code.  This bug is about a loop with NE_EXPR as
> its exit condition code.  Consider the example below:
>
> #include 
> #include 
>
> int main(){
> char c[1]={};
> unsigned int nchar=;
>
> while(nchar--!=0){
>c[nchar]='A';
>   }
>
> printf("%s\n",c);
> return 0;
> }
> nchar, used as an index into array 'c', doesn't overflow during loop iterations.
> Thus &c[nchar] is a valid scev, but GCC currently fails to recognize it.  This
> patch fixes the issue.
>
> Furthermore, the computation of no-overflow information could be improved by
> using TREE_OVERFLOW_UNDEFINED semantic of signed type for C/C++.  I didn't
> do that because:
> 1) I doubt how useful it could be because I have already changed scev to use
> the semantic whenever possible.  It doesn't need loop niter analysis' help.
> 2) To do that, I need to expose chrec_convert_aggressive information out of
> scev in function simple_iv, because that function could corrupt
> TREE_OVERFLOW_UNDEFINED semantic assumption.  This isn't appropriate for
> Stage3.
>
> Bootstrapped and tested on x86_64 and x86.  I don't expect any issues on
> aarch64 either.  Is it OK?

+  if (integer_onep (e)
+  && (integer_onep (s)
+ || (TREE_CODE (c) == INTEGER_CST
+ && TREE_CODE (s) == INTEGER_CST
+ && wi::mod_trunc (c, s, TYPE_SIGN (type)) == 0)))

the only thing I'm looking at here is the modulo sign.  Considering
we're looking at the sign bit of the step to normalize 'c' and 's' what
happens for

  for (unsigned int i = 0; i != 1000; --i)

?  I suppose we get s == 1 and c == -1000U and you'll say the control
IV doesn't wrap.  Similarly for i -= 2, where even when we use a signed
modulo, (signed)-1000U % 2 is still 0.

So I think you need to remember whether we consider the step
to be negative and compare iv->base and final as well.

Bonus points for a wrong-code testcase with the above.

I'd also like to see a testcase exercising step != 1.
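
A small sketch of the wrapping case (illustration only; count_down_ne is a
hypothetical helper, and an 8-bit IV with bound 100 is used instead of
unsigned int with bound 1000 purely to keep the iteration count small):

```c
#include <stdint.h>

/* Count the iterations of  for (i = start; i != final; i -= step)
   on an 8-bit unsigned IV and report whether i wrapped below zero.
   The caller must pick values for which the loop terminates.  */
unsigned int
count_down_ne (uint8_t start, uint8_t final, uint8_t step, int *wrapped)
{
  unsigned int niter = 0;
  uint8_t i = start;

  *wrapped = 0;
  while (i != final)
    {
      if (i < step)
        *wrapped = 1;   /* i - step wraps around modulo 256.  */
      i = (uint8_t) (i - step);
      niter++;
    }
  return niter;
}
```

Starting at 0 with final value 100, the IV wraps on the first decrement for
both step 1 and step 2, even though the distance is a multiple of the step;
starting at 200 it does not wrap.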

Thanks,
Richard.

> 2015-11-27  Bin Cheng  
>
> PR tree-optimization/68529
> * tree-ssa-loop-niter.c (number_of_iterations_ne): Add new param.
> Compute no-overflow information for control iv.
> (number_of_iterations_lt, number_of_iterations_le): Add new param.
> (number_of_iterations_cond): Pass new argument to above functions.
>
> 2015-11-27  Bin Cheng  
>
> PR tree-optimization/68529
> * gcc.dg/tree-ssa/pr68529-1.c: New test.
> * gcc.dg/tree-ssa/pr68529-2.c: New test.
> * gcc.dg/tree-ssa/pr68529-3.c: New test.
>


[Patch AArch64] Reinstate CANNOT_CHANGE_MODE_CLASS to fix pr67609

2015-11-27 Thread James Greenhalgh

Hi,

This patch follows Richard Henderson's advice to tighten up
CANNOT_CHANGE_MODE_CLASS for AArch64 to avoid a simplification bug in
the middle-end.

There is nothing AArch64-specific about the testcase which triggers this,
so I'll put it in the testcase for other targets. If you see a regression,
the explanation in the PR is much more thorough and correct than I can
reproduce here, so I'd recommend starting there. In short, target
maintainers need to:

> forbid BITS_PER_WORD (64-bit) subregs of hard registers >
> BITS_PER_WORD.  See the verbiage I added to the i386 backend for this.

We removed the CANNOT_CHANGE_MODE_CLASS macro back in January 2015. Before
then, we used it to work around bugs in big-endian vector support
( https://gcc.gnu.org/ml/gcc-patches/2014-12/msg01216.html ). Ideally,
we'd not need to bring this macro back, but if we can't fix the middle-end
bug this exposes, we need the workaround.

For AArch64, doing this runs into some trouble with two of our
instruction patterns - we end up with:

  (truncate:DI (reg:TF))

Which fails if it ever makes it through to the simplify routines with
something nasty like:

  (and:DI (truncate:DI (reg:TF 32 v0 [ a ]))
  (const_int 2305843009213693951 [0x1fffffffffffffff]))

The simplify routines want to turn this around to look like:

  (truncate:DI (and:TF (reg:TF 32 v0 [ a ])
  (const_int 2305843009213693951 [0x1fffffffffffffff])))

Which then wants to further simplify the expression by first building
the constant in TF mode, and trunc_int_for_mode barfs:

  0x7a38a5 trunc_int_for_mode(long, machine_mode)
  .../gcc/explow.c:53

We can fix that by changing the patterns to use a zero_extract, which seems
more in line with what they actually express (extracting the two 64-bit
halves of a 128-bit value).
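
In C terms, the operation these patterns express looks roughly like the
sketch below.  This is only an analogy on integers, not the RTL: the function
names are hypothetical, and unsigned __int128 is a GCC/Clang extension that
is assumed to be available on the target.

```c
#include <stdint.h>

/* Extract the low 64 bits (bits 0..63) of a 128-bit value.  */
uint64_t
low_half (unsigned __int128 x)
{
  return (uint64_t) x;
}

/* Extract the high 64 bits (bits 64..127) of a 128-bit value.  */
uint64_t
high_half (unsigned __int128 x)
{
  return (uint64_t) (x >> 64);
}

/* Build a 128-bit value from two 64-bit halves.  */
unsigned __int128
make_u128 (uint64_t hi, uint64_t lo)
{
  return ((unsigned __int128) hi << 64) | lo;
}
```

Each half is a fixed 64-bit field of the wider value, which is closer to a
zero_extract at a given bit position than to a truncate.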

Bootstrapped on aarch64-none-linux-gnu, and tested on aarch64-none-elf and
aarch64_be-none-elf without seeing any correctness regressions.

OK?

If so, we ought to get this backported to the release branches.  The gcc-5
backport applies cleanly (testing is ongoing but looks OK so far), provided the
release managers and AArch64 maintainers agree this is something that should be
backported this late in the 5.3 release cycle.

Thanks,
James

---
2015-11-27  James Greenhalgh  

* config/aarch64/aarch64-protos.h
(aarch64_cannot_change_mode_class): Bring back.
* config/aarch64/aarch64.c
(aarch64_cannot_change_mode_class): Likewise.
* config/aarch64/aarch64.h (CANNOT_CHANGE_MODE_CLASS): Likewise.
* config/aarch64/aarch64.md (aarch64_movdi_low): Use
zero_extract rather than truncate.
(aarch64_movdi_high): Likewise.

2015-11-27  James Greenhalgh  

* gcc.dg/torture/pr67609.c: New.

diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index e0a050c..59d3da4 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -269,6 +269,9 @@ int aarch64_get_condition_code (rtx);
 bool aarch64_bitmask_imm (HOST_WIDE_INT val, machine_mode);
 int aarch64_branch_cost (bool, bool);
 enum aarch64_symbol_type aarch64_classify_symbolic_expression (rtx);
+bool aarch64_cannot_change_mode_class (machine_mode,
+   machine_mode,
+   enum reg_class);
 bool aarch64_const_vec_all_same_int_p (rtx, HOST_WIDE_INT);
 bool aarch64_constant_address_p (rtx);
 bool aarch64_expand_movmem (rtx *);
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 3fe2f0f..fadb716 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -12408,6 +12408,24 @@ aarch64_vectorize_vec_perm_const_ok (machine_mode vmode,
   return ret;
 }
 
+/* Implement target hook CANNOT_CHANGE_MODE_CLASS.  */
+bool
+aarch64_cannot_change_mode_class (machine_mode from,
+  machine_mode to,
+  enum reg_class rclass)
+{
+  /* We cannot allow word_mode subregs of full vector modes.
+ Otherwise the middle-end will assume it's ok to store to
+ (subreg:DI (reg:TI 100) 0) in order to modify only the low 64 bits
+ of the 128-bit register.  However, after reload the subreg will
+ be dropped leaving a plain DImode store.  See PR67609 for a more
+ detailed discussion.  In all other cases, we want to be permissive
+ and return false.  */
+  return (reg_classes_intersect_p (FP_REGS, rclass)
+	  && GET_MODE_SIZE (to) == UNITS_PER_WORD
+	  && GET_MODE_SIZE (from) > UNITS_PER_WORD);
+}
+
 rtx
 aarch64_reverse_mask (enum machine_mode mode)
 {
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index 68c006f..66b768d 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -831,6 +831,9 @@ do {	 \
   extern void  __aarch64_sync_cache_range (void *, void *);	\
   __aarch64_sync_cache_range (beg, end)
 
+#define CANNOT_CHANGE_MODE_CLASS(FROM, TO, CLASS)	\
+  aarch64_cannot_change_mode_class (FROM, TO, CLASS)
+
 #define SHIFT_COUNT_TRUNCATED !TARGET_SIMD
 
 /* Choose

[testsuite][ARM target attributes] Fix effective_target tests

2015-11-27 Thread Christophe Lyon
Hi,

After the recent commits from Christian adding target attributes
support for ARM FPU settings, I've noticed that some of the tests
were failing because of incorrect assumptions with respect to the default
cpu/fpu/float-abi of the compiler.

This patch fixes the problems I've noticed in the following way:
- do not force -mfloat-abi=softfp in dg-options, to avoid conflicts
when gcc is configured --with-float=hard

- change arm_vfp_ok such that it tries several -mfpu/-mfloat-abi
flags, checks that __ARM_FP is defined and __ARM_NEON_FP is not
defined

- introduce arm_fp_ok, which is similar but does not enforce fpu setting

- add a new effective_target: arm_crypto_pragma_ok to check that
setting this fpu via a pragma is actually supported by the current
"multilib". This is different from checking the command-line option
because the pragma might conflict with the command-line options in
use.

The updates in the testcases are as follows:
- attr-crypto.c: we have to make sure that the default fpu does not
conflict with the one forced by pragma. That's why I use the arm_vfp
options/effective_target. This is needed if gcc has been configured
--with-fpu=neon-fp16, as the pragma fpu=crypto-neon-fp-armv8 would
conflict.

- attr-neon-builtin-fail.c: use arm_fp to force the appropriate
float-abi setting. Enforcing fpu is not needed here.

- attr-neon-fp16.c: similar, I also removed arm_neon_ok since it was
not necessary to make the test pass in my testing. On second thought,
I'm wondering whether I should leave it and make the test unsupported
in more cases (such as when forcing -march=armv5t, although it does
pass with this patch)

- attr-neon2.c: use arm_vfp to force the appropriate float-abi
setting. Enforcing mfpu=vfp is needed to avoid conflict with the
pragma target fpu=neon (for instance if the toolchain default is
neon-fp16)

- attr-neon3.c: similar

Tested on a variety of configurations, see:
http://people.linaro.org/~christophe.lyon/cross-validation/gcc-test-patches/230929-target-attr/report-build-info.html

Note that the regressions reported fall into 3 categories:
- when forcing -march=armv5t: tests are now unsupported because I
modified arm_crypto_ok to require arm_v8_neon_ok instead of arm32.

- the warning reported by attr-neon-builtin-fail.c moved from line 12
to 14 and is thus seen as a regression + one improvement

- finally, attr-neon-fp16.c causes an ICE on armeb compilers, for
which I need to post a bugzilla.


TBH, I'm a bit concerned by the complexity of all these multilib-like
conditions. I'm confident that I'm still missing some combinations :-)

And with new target attributes coming, new architectures etc... all
this logic is likely to become even more complex.

That being said, OK for trunk?

Christophe


2015-11-27  Christophe Lyon  

* lib/target-supports.exp
(check_effective_target_arm_vfp_ok_nocache): New.
(check_effective_target_arm_vfp_ok): Call the new
check_effective_target_arm_vfp_ok_nocache function.
(check_effective_target_arm_fp_ok_nocache): New.
(check_effective_target_arm_fp_ok): New.
(add_options_for_arm_fp): New.
(check_effective_target_arm_crypto_ok_nocache): Require
target_arm_v8_neon_ok instead of arm32.
(check_effective_target_arm_crypto_pragma_ok_nocache): New.
(check_effective_target_arm_crypto_pragma_ok): New.
(add_options_for_arm_vfp): New.
* gcc.target/arm/attr-crypto.c: Use arm_crypto_pragma_ok effective
target. Do not force -mfloat-abi=softfp, use arm_vfp effective
target instead.
* gcc.target/arm/attr-neon-builtin-fail.c: Do not force
-mfloat-abi=softfp, use arm_fp effective target instead.
* gcc.target/arm/attr-neon-fp16.c: Likewise. Remove arm_neon_ok
dependency.
* gcc.target/arm/attr-neon2.c: Do not force -mfloat-abi=softfp,
use arm_vfp effective target instead.
* gcc.target/arm/attr-neon3.c: Likewise.
diff --git a/gcc/testsuite/lib/target-supports.exp 
b/gcc/testsuite/lib/target-supports.exp
index 254c4e3..886ad66 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -2664,17 +2664,34 @@ proc check_effective_target_arm_vect_no_misalign { } {
 
 
 # Return 1 if this is an ARM target supporting -mfpu=vfp
-# -mfloat-abi=softfp.  Some multilibs may be incompatible with these
-# options.
+# -mfloat-abi=softfp or equivalent options.  Some multilibs may be
+# incompatible with these options.  Also set et_arm_vfp_flags to the
+# best options to add.
 
-proc check_effective_target_arm_vfp_ok { } {
+proc check_effective_target_arm_vfp_ok_nocache { } {
+global et_arm_vfp_flags
+set et_arm_vfp_flags ""
 if { [check_effective_target_arm32] } {
-   return [check_no_compiler_messages arm_vfp_ok object {
-   int dummy;
-   } "-mfpu=vfp -mfloat-abi=softfp"]
-} else {
-   return 0
+   foreach flags {"-mfpu=vfp" "-mfpu=vfp -mfloat-abi=softfp" "-mfpu=vfp 
-mfloat-abi=hard"} {
+   if { [check_no_compiler_messag

Re: [Fortran, patch, pr68218, backport to 5 and 4.9, v1] ALLOCATE with size given by a module function

2015-11-27 Thread Mikael Morin

Le 27/11/2015 13:20, Andre Vehreschild a écrit :

Hi all,

I have backported the patch for 68218 (multiple calls of the same
function, where only one call is expected and reasonable) to
gcc-5-branch and gcc-4_9-branch.

Bootstrapped and regtested on x86_64-linux-gnu/f21.

Ok for gcc-5-branch?

Ok for gcc-4_9-branch?


Yes for both.
Richi said in [1] that a 5.3 release candidate was planned for either
today or next Monday, so before proceeding, please ping a release
manager on IRC to check that your commit won't interfere with the
release process.

Thanks

Mikael

[1] https://gcc.gnu.org/ml/gcc/2015-11/msg00186.html


Re: [PATCH] Fix PR68029

2015-11-27 Thread Jiří Engelthaler
2015-11-27 13:49 GMT+01:00 Bernd Schmidt :
> On 11/27/2015 01:30 PM, Jiří Engelthaler wrote:
>>
>> Sorry for international characters in my name. It should be
>>
>> Jiri Engelthaler
>>
>> 2015-11-27 13:29 GMT+01:00 Engelthaler Jiří :
>
>
> There is precedent for non-ASCII characters in ChangeLogs. Grep for Rafael
> Ávila de Espíndola. But I think there should be two spaces before the email
> address.

You are right - two spaces.

>> PR driver/68029
>> * opts-common.c (prune_options): fdiagnostics_color ignored
>> if it was as first parameter
>
>
> This should read "Don't ignore -fdiagnostics-color if it is the first
> parameter." Full sentences with punctuation.

Changelog modified.

Thank you for the recommendation; this is my first patch to GCC.

Engy


2015-11-27  Jiri Engelthaler  

PR driver/68029
* opts-common.c (prune_options): Don't ignore -fdiagnostics-color 
if it is the first parameter.
 gcc/opts-common.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/opts-common.c b/gcc/opts-common.c
index d9bf4d4..24967cc 100644
--- a/gcc/opts-common.c
+++ b/gcc/opts-common.c
@@ -885,7 +885,7 @@ keep:
}
 }
 
-  if (fdiagnostics_color_idx > 1)
+  if (fdiagnostics_color_idx >= 1)
 {
   /* We put the last -fdiagnostics-color= at the first position
 after argv[0] so it can take effect immediately.  */
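
The effect of the one-character change can be sketched with a simplified
model.  This is not the code from opts-common.c: prune, min_idx and the
use of plain strings are assumptions made for illustration only.  The
model drops every -fdiagnostics-color option while copying, remembers the
index of the last one, and re-inserts it right after argv[0] only when
that index passes the threshold.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical, simplified model of the pruning step.  Copies IN to OUT
   without any -fdiagnostics-color option, then re-inserts the last such
   option at position 1 if its index is at least MIN_IDX.  MIN_IDX == 2
   models the old "> 1" test, MIN_IDX == 1 models the new ">= 1" test.
   OUT must have room for N entries.  Returns the new option count.  */
static int
prune (const char **in, int n, const char **out, int min_idx)
{
  static const char color_opt[] = "-fdiagnostics-color";
  int color_idx = 0;
  int count = 0;

  assert (n >= 1);
  out[count++] = in[0];
  for (int i = 1; i < n; i++)
    {
      if (strncmp (in[i], color_opt, sizeof color_opt - 1) == 0)
        {
          color_idx = i;        /* Dropped here, re-added below.  */
          continue;
        }
      out[count++] = in[i];
    }

  if (color_idx >= min_idx)
    {
      /* Shift everything after argv[0] up by one slot.  */
      memmove (out + 2, out + 1, sizeof (out[0]) * (count - 1));
      out[1] = in[color_idx];
      count++;
    }
  return count;
}
```

In this model, calling prune on { "gcc", "-fdiagnostics-color=always",
"-O2" } with min_idx 2 loses the color option, while min_idx 1 keeps it
in position 1.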


Re: [PATCH 1/2][ARM] PR/65956 AAPCS update for alignment attribute

2015-11-27 Thread Alan Lawrence
On 26 November 2015 at 14:00, Alan Lawrence  wrote:
> On 6 November 2015 at 16:59, Jakub Jelinek  wrote:
>>
>> In any case, to manually reproduce, compile
>> gnatmake -g -gnatws macrosub.adb
>> with GCC 5.1.1 (before the ARM changes) and then try to run that process 
>> against
>> GCC 5.2.1 (after the ARM changes) libgnat-5.so, which is what make check
>> does (it uses host_gnatmake to compile the support stuff, so ideally the
>> processes built by host gcc/gnatmake should not be run with the
>> LD_LIBRARY_PATH=$ADA_INCLUDE_PATH:$BASE:$LD_LIBRARY_PATH
>> in the environment, and others should).
>> In macrosub in particular, the problem is in:
>>   WHILE NOT END_OF_FILE (INFILE1) LOOP
>>GET_LINE (INFILE1, A_LINE, A_LENGTH);
>> in FILL_TABLE, where A_LINE'First is 0 and A_LINE'Last is 400 (if I remember
>> right), but if you step into GET_LINE compiled by GCC 5.2.1, Item'First
>> and Item'Last don't match that.
>
> Ok, I see the mismatch now.

The type affected in Jakub's case here is an Ada String, which looks like this:

 constant 64>
unit size  constant 8>
align 64 symtab -151604912 alias set -1 canonical type 0xf7569720
fields 
asm_written unsigned SI
size 
unit size 
align 32 symtab -151604672 alias set -1 canonical type 0xf756a2a0>
unsigned nonaddressable SI file  line 0 col 0 size
 unit size 
align 32 offset_align 64
offset 
bit offset  context

chain 
visited unsigned nonaddressable SI file  line 0
col 0 size  unit size 
align 32 offset_align 64 offset 
bit offset  context >> context 
unconstrained array 
BLK
align 8 symtab 0 alias set -1 canonical type 0xf7569c00
context 
pointer_to_this 
reference_to_this  chain
>
chain >

i.e. a 64-bit DImode struct, with alignment set to 64, containing
P_ARRAY a 32-bit pointer with alignment 32, and P_BOUNDS a 32-bit pointer
with alignment 32, pointing to a record (of size 64, alignment 32, containing
two 32-bit ints LB0 and UB0).

AFAICT, in the fill_table/get_line case, the first parameter to
get_line is a file, a simple pointer; then we have a string. So

*fill_table (compiled with 5.1, doubleword aligned) should pass the
string P_ARRAY in r2 and P_BOUNDS in r3.

0x1f334  movt   r3, #2
0x1f338  str    r3, [r11, #-504]  ; 0x
0x1f33c  sub    r3, r11, #508     ; 0x1fc
0x1f340  ldrd   r2, [r3]
0x1f344  mov    r0, r1
0x1f348  bl     0x1aee4
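
The register assignment described above follows from the AAPCS rule that
an argument requiring doubleword alignment starts in an even-numbered
core register.  A minimal sketch of that rule, assuming only r0-r3 and
ignoring stack and split arguments (assign_core_regs and its parameters
are hypothetical names, not GCC code):

```c
#include <assert.h>

/* Hypothetical model of core-register assignment for one argument of
   SIZE bytes with ALIGN bytes of alignment.  *NCRN is the next free
   core register number (0..4).  Returns the first register used, or -1
   if the argument does not fit entirely in r0-r3 (not modelled).  */
static int
assign_core_regs (int *ncrn, int size, int align)
{
  int nregs = (size + 3) / 4;
  int first;

  assert (*ncrn >= 0 && *ncrn <= 4);
  if (align == 8)
    *ncrn = (*ncrn + 1) & ~1;   /* Round up to an even register.  */
  if (*ncrn + nregs > 4)
    return -1;
  first = *ncrn;
  *ncrn += nregs;
  return first;
}
```

With a 4-byte file pointer first, an 8-byte string descriptor aligned to
8 lands in r2/r3 in this model, matching the ldrd above.  The same
descriptor treated as 4-byte aligned would land in r1/r2, which is the
kind of caller/callee disagreement under discussion.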

Re: [PATCH 5/7][Testsuite] Support ARMv8.1 ARM tests.

2015-11-27 Thread Christophe Lyon
On 26 November 2015 at 17:10, Matthew Wahab  wrote:
> Attached the missing patch.
> Matthew
>
>
> On 26/11/15 16:02, Matthew Wahab wrote:
>>
>> Hello,
>>
>> This patch adds ARMv8.1 support to GCC Dejagnu, to allow ARM
>> tests to specify targets and to set up command line options.
>> It builds on the ARMv8.1 target support added for AArch64 tests, partly
>> reworking that support to take into account the different configurations
>> that tests may be run under.
>>
>> The main changes are
>> - add_options_for_arm_v8_1a_neon: Call
>>check_effective_target_arm_v8_1a_neon_ok to select a suitable set of
>>options.
>> - check_effective_target_arm_v8_1a_neon_ok: Test possible command line
>>options, recording the first set that works.
>> - check_effective_target_arm_v8_1a_neon_hw: Add a test for ARM targets.
>>
>> Tested the series for arm-none-eabi with cross-compiled check-gcc on an
>> ARMv8.1 emulator. Also tested arm-none-linux-gnueabihf with native
>> bootstrap and make check.
>>
>> Ok for trunk?
>> Matthew
>>

Hi Matthew,

I may be mistaken, but -mfpu=neon-fp-armv8 and -mfloat-abi=softfp are not
supported by aarch64-gcc. So it seems to me that
check_effective_target_arm_v8_1a_neon_ok_nocache will not always work
for aarch64 after your patch.

Or does it work because no option is needed and thus "" always
matches and thus the loop always exits after the first iteration
on aarch64?

Maybe a more accurate comment would help us remember that, in case an
-mfpu option becomes necessary for aarch64.
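
The "first set that works" behaviour in question can be sketched in C
rather than Tcl.  Everything here is an assumption for illustration:
first_working_flags, accepts_fn and the stand-in predicate are made-up
names, and the predicate only mimics a compiler that rejects -mfpu= and
-mfloat-abi=.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical callback: nonzero if the compiler accepts FLAGS.  */
typedef int (*accepts_fn) (const char *flags);

/* Return the first candidate that ACCEPTS approves, or NULL if none
   does.  Candidates are tried strictly in order.  */
static const char *
first_working_flags (const char *const *candidates, size_t n,
                     accepts_fn accepts)
{
  assert (accepts != NULL);
  for (size_t i = 0; i < n; i++)
    if (accepts (candidates[i]))
      return candidates[i];
  return NULL;
}

/* Stand-in for a compiler that rejects -mfpu= and -mfloat-abi=.  */
static int
accepts_without_fp_options (const char *flags)
{
  return strstr (flags, "-mfpu=") == NULL
         && strstr (flags, "-mfloat-abi=") == NULL;
}
```

If the empty string is the first candidate, such a compiler never
reaches the -mfpu entries.  If the empty entry were ever removed, the
search would return NULL for it.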

Christophe.



>> testsuite/
>> 2015-11-26  Matthew Wahab  
>>
>>  * lib/target-supports.exp (add_options_for_arm_v8_1a_neon): Update
>>  comment.  Use check_effective_target_arm_v8_1a_neon_ok to select
>>  the command line options.
>>  (check_effective_target_arm_v8_1a_neon_ok_nocache): Update initial
>>  test to allow ARM targets.  Select and record a working set of
>>  command line options.
>>  (check_effective_target_arm_v8_1a_neon_hw): Add tests for ARM
>>  targets.
>>
>


Re: [RFC] Combine vectorized loops with its scalar remainder.

2015-11-27 Thread Richard Biener
On Fri, Nov 13, 2015 at 11:35 AM, Yuri Rumyantsev  wrote:
> Hi Richard,
>
> Here is an updated version of the patch which (1) is in sync with the
> trunk compiler and (2) contains a simple cost model to estimate the
> profitability of scalar epilogue elimination. The part related to
> vectorization of loops with small trip count is still being developed.
> Note that the implemented cost model was not tuned well for HASWELL and
> KNL, but we got ~6% speed-up on 436.cactusADM from spec2006 suite for
> HASWELL.

Ok, so I don't know where to start with this.

First of all, while I wanted to have the actual stmt processing done as
post-processing on the vectorized loop body, I didn't want to have this
completely separated from vectorizing.

So, do combine_vect_loop_remainder () from vect_transform_loop, not by iterating
over all (vectorized) loops at the end.

Second, all the adjustments of the number of iterations for the vector
loop should be integrated into the main vectorization scheme as should
determining the cost of the predication.  So you'll end up adding a
LOOP_VINFO_MASK_MAIN_LOOP_FOR_EPILOGUE flag, determined during
cost analysis and during code generation adjust vector iteration computation
accordingly and _not_ generate the epilogue loop (or wire it up correctly in
the first place).

The actual stmt processing should then still happen in a similar way as you do.

So I'm going to comment on that part only as I expect the rest will look a lot
different.

+/* Generate induction_vector which will be used to mask evaluation.  */
+
+static tree
+gen_vec_induction (loop_vec_info loop_vinfo, unsigned elem_size, unsigned size)
+{

please make use of create_iv.  Add more comments.  I reverse-engineered
that you add a { { 0, ..., vf }, +, {vf, ... vf } } IV which you use
in gen_mask_for_remainder by comparing it against { niter, ..., niter }.
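
A scalar sketch of that mask computation, under stated assumptions: VF
is fixed at 4, the IV lanes are taken to hold BASE, BASE + 1, ...,
BASE + VF - 1, and remainder_mask is a made-up name, not a function from
the patch.

```c
#include <assert.h>

#define VF 4   /* Assumed vectorization factor.  */

/* Emulate one vector iteration whose first lane has scalar index BASE.
   Lane L is active when BASE + L < NITER, i.e. when the induction
   vector compares less than { NITER, ..., NITER }.  Returns the number
   of active lanes.  */
static int
remainder_mask (unsigned base, unsigned niter, int mask[VF])
{
  int active = 0;

  assert (mask != 0);
  for (unsigned lane = 0; lane < VF; lane++)
    {
      mask[lane] = base + lane < niter;
      active += mask[lane];
    }
  return active;
}
```

For niter = 10, the iterations starting at 0 and 4 are fully active and
the one starting at 8 has only its first two lanes enabled, which is
what lets the vector loop absorb the scalar remainder.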

+  gsi = gsi_after_labels (loop->header);
+  niters = LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo)
+  ? LOOP_VINFO_NITERS (loop_vinfo)
+  : LOOP_VINFO_NITERS_UNCHANGED (loop_vinfo);

that's either wrong or unnecessary.  Without peeling for alignment,
LOOP_VINFO_NITERS is equal to LOOP_VINFO_NITERS_UNCHANGED.

+  ptr = build_int_cst (reference_alias_ptr_type (ref), 0);
+  if (!SSA_NAME_PTR_INFO (addr))
+   copy_ref_info (build2 (MEM_REF, TREE_TYPE (ref), addr, ptr), ref);

vect_duplicate_ssa_name_ptr_info.

+
+static void
+fix_mask_for_masked_ld_st (vec *masked_stmt, tree mask)
+{
+  gimple *stmt, *new_stmt;
+  tree old, lhs, vectype, var, n_lhs;

no comment?  What is this for?

+/* Convert vectorized reductions to VEC_COND statements to preserve
+   reduction semantic:
+   s1 = x + s2 --> t = x + s2; s1 = (mask)? t : s2.  */
+
+static void
+convert_reductions (loop_vec_info loop_vinfo, tree mask)
+{

for reductions it looks like preserving the last iteration x plus the mask
could avoid predicating it this way and compensate in the reduction
epilogue by "subtracting" x & mask?  With true predication support
that'll likely be more expensive of course.
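
The VEC_COND form from the quoted comment (t = x + s2; s1 = mask ? t : s2)
can be emulated lane by lane.  This is only a scalar sketch: VF = 4,
masked_sum, and the masked load that yields 0 for inactive lanes are all
assumptions, not code from the patch.

```c
#include <assert.h>

#define VF 4   /* Assumed vectorization factor.  */

/* Sum the N elements of A using VF partial accumulators, with the last
   (partial) vector iteration masked instead of a scalar epilogue.  */
static int
masked_sum (const int *a, unsigned n)
{
  int acc[VF] = { 0, 0, 0, 0 };
  int sum = 0;

  assert (a != 0 || n == 0);
  for (unsigned base = 0; base < n; base += VF)
    for (unsigned lane = 0; lane < VF; lane++)
      {
        int mask = base + lane < n;
        int x = mask ? a[base + lane] : 0;   /* Masked load.  */
        int t = x + acc[lane];               /* t = x + s2  */
        acc[lane] = mask ? t : acc[lane];    /* s1 = mask ? t : s2  */
      }

  /* Reduction epilogue: combine the partial sums.  */
  for (unsigned lane = 0; lane < VF; lane++)
    sum += acc[lane];
  return sum;
}
```

Inactive lanes keep their previous accumulator value, so the result
matches a plain scalar loop for any n.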

+  /* Generate new VEC_COND expr.  */
+  vec_cond_expr = build3 (VEC_COND_EXPR, vectype, mask, new_lhs, rhs);
+  new_stmt = gimple_build_assign (lhs, vec_cond_expr);

gimple_build_assign (lhs, VEC_COND_EXPR, vectype, mask, new_lhs, rhs);

+/* Return true if MEM_REF is incremented by vector size and false
otherwise.  */
+
+static bool
+mem_ref_is_vec_size_incremented (loop_vec_info loop_vinfo, tree lhs)
+{
+  struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);

what?!  Just look at DR_STEP of the store?


+void
+combine_vect_loop_remainder (loop_vec_info loop_vinfo)
+{
+  struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
+  auto_vec loads;
+  auto_vec stores;

so you need to re-structure this in a way that it computes

  a) whether it can perform the operation - and you need to do that
  reliably before the operation has taken place
  b) its cost

instead of looking at def types or gimple_assign_load/store_p predicates
please look at STMT_VINFO_TYPE instead.

I don't like the new target hook for the costing.  We do need some major
re-structuring in the vectorizer cost model implementation, and this
doesn't go in the right direction.

A simplistic hook following the current scheme would have used
the vect_cost_for_stmt as argument and mirrored builtin_vectorization_cost.

There is not a single testcase in the patch.  I would have expected one that
makes sure we keep the 6% speedup for cactusADM at least.


So this was a 45-minute "overall" review not going into all the
implementation details.

Thanks,
Richard.


> 2015-11-10 17:52 GMT+03:00 Richard Biener :
>> On Tue, Nov 10, 2015 at 2:02 PM, Ilya Enkovich  
>> wrote:
>>> 2015-11-10 15:30 GMT+03:00 Richard Biener :
 On Tue, Nov 3, 2015 at 1:08 PM, Yuri Rumyantsev  wrote:
> Richard,
>
> It looks like misunderstanding - we assume that for GCCv6 the simple
> scheme of remainder will be used through introducing new IV :
> https://gcc.gn

RE: [PATCH 1/4 v2][AArch64] Generalize CCMP support

2015-11-27 Thread Wilco Dijkstra

> James Greenhalgh wrote:
> > Could you please repost this with the word-wrapping issues fixed.
> > I can't apply it to my tree for review or to commit it on your behalf in 
> > the current form.

So it looks like Outlook no longer supports sending emails without wrapping
and the maximum is only 132 characters... Now attached.

Wilco



---
 gcc/ccmp.c   |  21 ++-
 gcc/config/aarch64/aarch64-modes.def |  10 --
 gcc/config/aarch64/aarch64.c | 305 ---
 gcc/config/aarch64/aarch64.md|  68 ++--
 gcc/config/aarch64/predicates.md |  17 --
 gcc/doc/tm.texi  |  36 ++---
 gcc/target.def   |  36 ++---
 7 files changed, 128 insertions(+), 365 deletions(-)

diff --git a/gcc/ccmp.c b/gcc/ccmp.c
index 20348d9..58ac126 100644
--- a/gcc/ccmp.c
+++ b/gcc/ccmp.c
@@ -65,6 +65,10 @@ along with GCC; see the file COPYING3.  If not see
 - gen_ccmp_first expands the first compare in CCMP.
 - gen_ccmp_next expands the following compares.
 
+   Both hooks return a comparison with the CC register that is equivalent
+   to the value of the gimple comparison.  This is used by the next CCMP
+   and in the final conditional store.
+
  * We use cstorecc4 pattern to convert the CCmode intermediate to
the integer mode result that expand_normal is expecting.
 
@@ -130,10 +134,12 @@ ccmp_candidate_p (gimple *g)
   return false;
 }
 
-/* PREV is the CC flag from precvious compares.  The function expands the
-   next compare based on G which ops previous compare with CODE.
+/* PREV is a comparison with the CC register which represents the
+   result of the previous CMP or CCMP.  The function expands the
+   next compare based on G which is ANDed/ORed with the previous
+   compare depending on CODE.
PREP_SEQ returns all insns to prepare opearands for compare.
-   GEN_SEQ returnss all compare insns.  */
+   GEN_SEQ returns all compare insns.  */
 static rtx
 expand_ccmp_next (gimple *g, enum tree_code code, rtx prev,
  rtx *prep_seq, rtx *gen_seq)
@@ -226,7 +232,7 @@ expand_ccmp_expr_1 (gimple *g, rtx *prep_seq, rtx *gen_seq)
   return NULL_RTX;
 }
 
-/* Main entry to expand conditional compare statement G. 
+/* Main entry to expand conditional compare statement G.
Return NULL_RTX if G is not a legal candidate or expand fail.
Otherwise return the target.  */
 rtx
@@ -249,9 +255,10 @@ expand_ccmp_expr (gimple *g)
   enum insn_code icode;
   enum machine_mode cc_mode = CCmode;
   tree lhs = gimple_assign_lhs (g);
+  rtx_code cmp_code = GET_CODE (tmp);
 
 #ifdef SELECT_CC_MODE
-  cc_mode = SELECT_CC_MODE (NE, tmp, const0_rtx);
+  cc_mode = SELECT_CC_MODE (cmp_code, XEXP (tmp, 0), const0_rtx);
 #endif
   icode = optab_handler (cstore_optab, cc_mode);
   if (icode != CODE_FOR_nothing)
@@ -262,8 +269,8 @@ expand_ccmp_expr (gimple *g)
  emit_insn (prep_seq);
  emit_insn (gen_seq);
 
- tmp = emit_cstore (target, icode, NE, cc_mode, cc_mode,
-0, tmp, const0_rtx, 1, mode);
+ tmp = emit_cstore (target, icode, cmp_code, cc_mode, cc_mode,
+0, XEXP (tmp, 0), const0_rtx, 1, mode);
  if (tmp)
return tmp;
}
diff --git a/gcc/config/aarch64/aarch64-modes.def 
b/gcc/config/aarch64/aarch64-modes.def
index 3bf3b2d..0c529e9 100644
--- a/gcc/config/aarch64/aarch64-modes.def
+++ b/gcc/config/aarch64/aarch64-modes.def
@@ -25,16 +25,6 @@ CC_MODE (CC_ZESWP); /* zero-extend LHS (but swap to make it 
RHS).  */
 CC_MODE (CC_SESWP); /* sign-extend LHS (but swap to make it RHS).  */
 CC_MODE (CC_NZ);/* Only N and Z bits of condition flags are valid.  */
 CC_MODE (CC_Z); /* Only Z bit of condition flags is valid.  */
-CC_MODE (CC_DNE);
-CC_MODE (CC_DEQ);
-CC_MODE (CC_DLE);
-CC_MODE (CC_DLT);
-CC_MODE (CC_DGE);
-CC_MODE (CC_DGT);
-CC_MODE (CC_DLEU);
-CC_MODE (CC_DLTU);
-CC_MODE (CC_DGEU);
-CC_MODE (CC_DGTU);
 
 /* Half-precision floating point for __fp16.  */
 FLOAT_MODE (HF, 2, 0);
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 3bb4e64..c8bee3b 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -3907,7 +3907,6 @@ aarch64_get_condition_code (rtx x)
 static int
 aarch64_get_condition_code_1 (enum machine_mode mode, enum rtx_code comp_code)
 {
-  int ne = -1, eq = -1;
   switch (mode)
 {
 case CCFPmode:
@@ -3930,56 +3929,6 @@ aarch64_get_condition_code_1 (enum machine_mode mode, 
enum rtx_code comp_code)
}
   break;
 
-case CC_DNEmode:
-  ne = AARCH64_NE;
-  eq = AARCH64_EQ;
-  break;
-
-case CC_DEQmode:
-  ne = AARCH64_EQ;
-  eq = AARCH64_NE;
-  break;
-
-case CC_DGEmode:
-  ne = AARCH64_GE;
-  eq = AARCH64_LT;
-  break;
-
-case CC_DLTmode:
-  ne = AARCH64_LT;
-  eq = AARCH64_GE;
-  break;
-
-cas

[PATCH][ARC] Refurbish emitting DWARF2 for epilogue.

2015-11-27 Thread Claudiu Zissulescu
Properly emit DWARF2 related information while expanding epilogue. Remove
the -m[no]-epilogue-cfi option as it is not needed any longer. This patch
solves the dwarf2cfi errors observed while running dejagnu tests.

Ok to commit?
Claudiu

gcc/
2015-11-27  Claudiu Zissulescu  

* config/arc/arc.c (frame_move): Set frame related flag.
(arc_save_restore): Emit epilogue related DWARF2 information.
(arc_expand_epilogue): Likewise.
* config/arc/arc.opt (mno-epilogue-cfi): Remove.
(mepilogue-cfi): Likewise.
* doc/invoke.texi: Remove -m[no]-epilogue-cfi documentation.
---
 gcc/config/arc/arc.c   | 87 --
 gcc/config/arc/arc.opt |  8 -
 gcc/doc/invoke.texi| 10 +-
 3 files changed, 71 insertions(+), 34 deletions(-)

diff --git a/gcc/config/arc/arc.c b/gcc/config/arc/arc.c
index 8bb0969..5200ea5 100644
--- a/gcc/config/arc/arc.c
+++ b/gcc/config/arc/arc.c
@@ -1884,7 +1884,9 @@ frame_insn (rtx x)
 static rtx
 frame_move (rtx dst, rtx src)
 {
-  return frame_insn (gen_rtx_SET (dst, src));
+  rtx tmp = gen_rtx_SET (dst, src);
+  RTX_FRAME_RELATED_P (tmp) = 1;
+  return frame_insn (tmp);
 }
 
 /* Like frame_move, but add a REG_INC note for REG if ADDR contains an
@@ -2307,7 +2309,15 @@ arc_save_restore (rtx base_reg,
  if (epilogue_p == 2)
sibthunk_insn = insn;
  else
-   frame_insn (insn);
+   {
+ insn = frame_insn (insn);
+ if (epilogue_p)
+   for (r = start_call; r <= end_call; r++)
+ {
+   rtx reg = gen_rtx_REG (SImode, r);
+   add_reg_note (insn, REG_CFA_RESTORE, reg);
+ }
+   }
  offset += off;
}
 
@@ -2317,6 +2327,7 @@ arc_save_restore (rtx base_reg,
{
  rtx reg = gen_rtx_REG (SImode, regno);
  rtx addr, mem;
+ int cfa_adjust = *first_offset;
 
  if (*first_offset)
{
@@ -2332,7 +2343,20 @@ arc_save_restore (rtx base_reg,
}
  mem = gen_frame_mem (SImode, addr);
  if (epilogue_p)
-   frame_move_inc (reg, mem, base_reg, addr);
+   {
+ rtx insn =
+   frame_move_inc (reg, mem, base_reg, addr);
+ add_reg_note (insn, REG_CFA_RESTORE, reg);
+ if (cfa_adjust)
+   {
+ enum reg_note note = REG_CFA_ADJUST_CFA;
+ add_reg_note (insn, note,
+   gen_rtx_SET (stack_pointer_rtx,
+plus_constant (Pmode,
+   
stack_pointer_rtx,
+   cfa_adjust)));
+   }
+   }
  else
frame_move_inc (mem, reg, base_reg, addr);
  offset += UNITS_PER_WORD;
@@ -2341,6 +2365,10 @@ arc_save_restore (rtx base_reg,
 }/* if */
   if (sibthunk_insn)
 {
+  int start_call = frame->millicode_start_reg;
+  int end_call = frame->millicode_end_reg;
+  int r;
+
   rtx r12 = gen_rtx_REG (Pmode, 12);
 
   frame_insn (gen_rtx_SET (r12, GEN_INT (offset)));
@@ -2350,6 +2378,15 @@ arc_save_restore (rtx base_reg,
   gen_rtx_PLUS (Pmode, stack_pointer_rtx, r12));
   sibthunk_insn = emit_jump_insn (sibthunk_insn);
   RTX_FRAME_RELATED_P (sibthunk_insn) = 1;
+
+  /* Would be nice if we could do this earlier, when the PARALLEL
+is populated, but these need to be attached after the
+emit.  */
+  for (r = start_call; r <= end_call; r++)
+   {
+ rtx reg = gen_rtx_REG (SImode, r);
+ add_reg_note (sibthunk_insn, REG_CFA_RESTORE, reg);
+   }
 }
 } /* arc_save_restore */
 
@@ -2470,6 +2507,7 @@ arc_expand_epilogue (int sibcall_p)
   int can_trust_sp_p = !cfun->calls_alloca;
   int first_offset = 0;
   int millicode_p = cfun->machine->frame_info.millicode_end_reg > 0;
+  rtx insn;
 
   size_to_deallocate = size;
 
@@ -2502,11 +2540,18 @@ arc_expand_epilogue (int sibcall_p)
   /* Restore any saved registers.  */
   if (frame_pointer_needed)
 {
- rtx addr = gen_rtx_POST_INC (Pmode, stack_pointer_rtx);
+  insn = emit_insn (gen_blockage ());
+  add_reg_note (insn, REG_CFA_DEF_CFA,
+   plus_constant (SImode, stack_pointer_rtx,
+  4));
+  RTX_FRAME_RELATED_P (insn) = 1;
 
- frame_move_inc (frame_pointer_rtx, gen_frame_mem (Pmode, addr),
- stack_pointer_rtx, 0);
- size_to_deallocate -= UNITS_PER_WORD;
+  rtx addr = gen_rtx_POST_INC (Pmode, stack_pointer_rtx);
+
+  insn = frame_move_inc (frame_pointer_rtx, gen_frame_mem (Pmode, addr),
+stack_pointer_rtx, 0);

Re: [Patch AArch64] Reinstate CANNOT_CHANGE_MODE_CLASS to fix pr67609

2015-11-27 Thread Richard Biener
On Fri, Nov 27, 2015 at 2:01 PM, James Greenhalgh
 wrote:
>
> Hi,
>
> This patch follows Richard Henderson's advice to tighten up
> CANNOT_CHANGE_MODE_CLASS for AArch64 to avoid a simplification bug in
> the middle-end.
>
> There is nothing AArch64-specific about the testcase which triggers this,
> so I'll put it in the testcase for other targets. If you see a regression,
> the explanation in the PR is much more thorough and correct than I can
> reproduce here, so I'd recommend starting there. In short, target
> maintainers need to:
>
>> forbid BITS_PER_WORD (64-bit) subregs of hard registers >
>> BITS_PER_WORD.  See the verbiage I added to the i386 backend for this.
>
> We removed the CANNOT_CHANGE_MODE_CLASS macro back in January 2015. Before
> then, we used it to workaround bugs in big-endian vector support
> ( https://gcc.gnu.org/ml/gcc-patches/2014-12/msg01216.html ). Ideally,
> we'd not need to bring this macro back, but if we can't fix the middle-end
> bug this exposes, we need the workaround.
>
> For AArch64, doing this runs into some trouble with two of our
> instruction patterns - we end up with:
>
>   (truncate:DI (reg:TF))
>
> Which fails if it ever makes it through to the simplify routines with
> something nasty like:
>
>   (and:DI (truncate:DI (reg:TF 32 v0 [ a ]))
>   (const_int 2305843009213693951 [0x1fff]))
>
> The simplify routines want to turn this around to look like:
>
>   (truncate:DI (and:TF (reg:TF 32 v0 [ a ])
>   (const_int 2305843009213693951 [0x1fff])))
>
> Which then wants to further simplify the expression by first building
> the constant in TF mode, and trunc_int_for_mode barfs:
>
>   0x7a38a5 trunc_int_for_mode(long, machine_mode)
>   .../gcc/explow.c:53
>
> We can fix that by changing the patterns to use a zero_extract, which seems
> more in line with what they actually express (extracting the two 64-bit
> halves of a 128-bit value).
>
> Bootstrapped on aarch64-none-linux-gnu, and tested on aarch64-none-elf and
> aarch64_be-none-elf without seeing any correctness regressions.
>
> OK?
>
> If so, we ought to get this backported to the release branches, the gcc-5
> backport applies clean (testing ongoing but looks OK so far) if the release
> managers and AArch64 maintainers agree this is something that should be
> backported this late in the 5.3 release cycle.

Your call, the RC will be done on Monday.

Richard.

> Thanks,
> James
>
> ---
> 2015-11-27  James Greenhalgh  
>
> * config/aarch64/aarch64-protos.h
> (aarch64_cannot_change_mode_class): Bring back.
> * config/aarch64/aarch64.c
> (aarch64_cannot_change_mode_class): Likewise.
> * config/aarch64/aarch64.h (CANNOT_CHANGE_MODE_CLASS): Likewise.
> * config/aarch64/aarch64.md (aarch64_movdi_low): Use
> zero_extract rather than truncate.
> (aarch64_movdi_high): Likewise.
>
> 2015-11-27  James Greenhalgh  
>
> * gcc.dg/torture/pr67609.c: New.
>


[PTX] Another libcall patch

2015-11-27 Thread Nathan Sidwell
I've committed this further cleanup of function decl recording.  We were
recording libcalls during call expansion, but recording other calls during
call outputting.  This moves all recording to call outputting.  The recording
helpers were a little confusing: for instance, 'record_fndecl' was really
'maybe_record_fndecl', and in some cases the caller checked preconditions
(like being an actual fndecl) whereas in other cases the helper did the
checking.  I've moved things around so that the caller always does the
prechecking.  I broke the libcall hash manipulation out into a helper, in
line with the other helpers, and added a helper to deal with looking through
symbol refs.


Finally expand_movdi had no need to register an fndecl itself -- it's already 
calling maybe_convert_symbolic_operand, which does that.
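
The "record once, emit the declaration once" pattern described above can
be sketched independently of the nvptx backend.  This is not the
committed code: record_libfunc here is a made-up stand-in, the fixed-size
table replaces GCC's hash tables, and the sketch assumes fewer than
TABLE_SIZE distinct names.

```c
#include <assert.h>
#include <string.h>

#define TABLE_SIZE 64   /* Assumed upper bound on distinct names.  */

static const char *recorded[TABLE_SIZE];
static int n_decls_written;

static unsigned
hash_name (const char *s)
{
  unsigned h = 5381;
  while (*s)
    h = h * 33 + (unsigned char) *s++;
  return h % TABLE_SIZE;
}

/* Record NAME and count a declaration for it the first time only.
   The caller does the prechecking: NAME must be non-NULL.  Returns 1
   if NAME was newly recorded, 0 if it was already present.  */
static int
record_libfunc (const char *name)
{
  unsigned i;

  assert (name != NULL);
  for (i = hash_name (name); recorded[i] != NULL; i = (i + 1) % TABLE_SIZE)
    if (strcmp (recorded[i], name) == 0)
      return 0;
  recorded[i] = name;
  n_decls_written++;    /* A real backend would write the prototype here.  */
  return 1;
}
```

Because the helper asserts its precondition instead of testing it, every
caller has to decide up front whether the thing it holds is recordable,
which is the division of labour the message describes.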


nathan
2015-11-27  Nathan Sidwell  

	* config/nvptx/nvptx-protos.h (nvptx_record_needed_decl): Don't
	declare.
	* config/nvptx/nvptx.c (write_func_decl_from_insn): Move earlier.
	(nvptx_record_fndecl): Don't return value, remove force
	argument. Require fndecl.
	(nvptx_record_libfunc): New.
	(nvptx_record_needed_decl): Determine how to record decl here.
	(nvptx_maybe_record_fnsym): New.
	(nvptx_expand_call): Don't record libfuncs here.
	(nvptx_maybe_convert_symbolic_operand): Use
	nvptx_maybe_record_fnsym.
	(nvptx_assemble_integer): Reimplement with single switch.
	(nvptx_output_call_insn): Register libfuncs here.
	(nvptx_file_end): Adjust nvptx_record_fndecl call.
	* config/nvptx/nvptx.md (expand_movdi): Don't call
	nvptx_record_needed_decl.

Index: config/nvptx/nvptx-protos.h
===
--- config/nvptx/nvptx-protos.h	(revision 231012)
+++ config/nvptx/nvptx-protos.h	(working copy)
@@ -24,7 +24,6 @@
 extern void nvptx_declare_function_name (FILE *, const char *, const_tree decl);
 extern void nvptx_declare_object_name (FILE *file, const char *name,
    const_tree decl);
-extern void nvptx_record_needed_fndecl (tree decl);
 extern void nvptx_function_end (FILE *);
 extern void nvptx_output_skip (FILE *, unsigned HOST_WIDE_INT);
 extern void nvptx_output_ascii (FILE *, const char *, unsigned HOST_WIDE_INT);
Index: config/nvptx/nvptx.c
===
--- config/nvptx/nvptx.c	(revision 231012)
+++ config/nvptx/nvptx.c	(working copy)
@@ -452,6 +452,55 @@ write_function_decl_and_comment (std::st
   s << ";\n";
 }
 
+/* Construct a function declaration from a call insn.  This can be
+   necessary for two reasons - either we have an indirect call which
+   requires a .callprototype declaration, or we have a libcall
+   generated by emit_library_call for which no decl exists.  */
+
+static void
+write_func_decl_from_insn (std::stringstream &s, const char *name,
+			   rtx result, rtx pat)
+{
+  if (!name)
+{
+  s << "\t.callprototype ";
+  name = "_";
+}
+  else
+{
+  s << "\n// BEGIN GLOBAL FUNCTION DECL: " << name << "\n";
+  s << "\t.extern .func ";
+}
+
+  if (result != NULL_RTX)
+s << "(.param"
+  << nvptx_ptx_type_from_mode (arg_promotion (GET_MODE (result)), false)
+  << " %rval) ";
+
+  s << name;
+
+  const char *sep = " (";
+  int arg_end = XVECLEN (pat, 0);
+  for (int i = 1; i < arg_end; i++)
+{
+  /* We don't have to deal with mode splitting here, as that was
+	 already done when generating the call sequence.  */
+  machine_mode mode = GET_MODE (XEXP (XVECEXP (pat, 0, i), 0));
+
+  s << sep
+	<< ".param"
+	<< nvptx_ptx_type_from_mode (mode, false)
+	<< " %arg"
+	<< i;
+  if (mode == QImode || mode == HImode)
+	s << "[1]";
+  sep = ", ";
+}
+  if (arg_end != 1)
+s << ")";
+  s << ";\n";
+}
+
 /* Check NAME for special function names and redirect them by returning a
replacement.  This applies to malloc, free and realloc, for which we
want to use libgcc wrappers, and call, which triggers a bug in ptxas.  */
@@ -470,20 +519,13 @@ nvptx_name_replacement (const char *name
   return name;
 }
 
-/* If DECL is a FUNCTION_DECL, check the hash table to see if we
-   already encountered it, and if not, insert it and write a ptx
-   declarations that will be output at the end of compilation.  */
+/* DECL is an external FUNCTION_DECL, make sure its in the fndecl hash
+   table and and write a ptx prototype.  These are emitted at end of
+   compilation.  */
 
-static bool
-nvptx_record_fndecl (tree decl, bool force = false)
+static void
+nvptx_record_fndecl (tree decl)
 {
-  if (decl == NULL_TREE || TREE_CODE (decl) != FUNCTION_DECL
-  || !DECL_EXTERNAL (decl))
-return true;
-
-  if (!force && TYPE_ARG_TYPES (TREE_TYPE (decl)) == NULL_TREE)
-return false;
-
   tree *slot = declared_fndecls_htab->find_slot (decl, INSERT);
   if (*slot == NULL)
 {
@@ -492,22 +534,53 @@ nvptx_record_fndecl (tree decl, bool for
   name = nvptx_name_replacement (name);
   write_function_decl_

Re: [PATCH 1/7][ARM] Add support for ARMv8.1.

2015-11-27 Thread Christophe Lyon
On 26 November 2015 at 16:55, Matthew Wahab  wrote:
> Hello,
>
>
> ARMv8.1 includes an extension to ARM which adds two Adv.SIMD
> instructions, vqrdmlah and vqrdmlsh. This patch set adds support for
> ARMv8.1 and for the new instructions, enabling the architecture with
> -march=armv8.1-a. The new instructions are enabled when both ARMv8.1
> and suitable fpu options are set, for instance with -march=armv8.1-a
> -mfpu=neon-fp-armv8 -mfloat-abi=hard.
>
> This patch set adds the command line options and internal feature
> macros. Following patches
> - enable multilib support for ARMv8.1,
> - add patterns for the new instructions,
> - add the ACLE feature macro for the ARMv8.1 extensions,
> - extend target support in the testsuite to ARMv8.1,
> - add the ACLE intrinsics for vqrdml{as}h and
> - add the ACLE intrinsics for vqrdml{as}h_lane.
>
> Tested the series for arm-none-eabi with cross-compiled check-gcc on an
> ARMv8.1 emulator. Also tested arm-none-linux-gnueabihf with native
> bootstrap and make check.
>
> Is this ok for trunk?
> Matthew
>
Hi,

The whole series LGTM, but do you plan to add tests for the new intrinsics?

Thanks,

Christophe.


> gcc/
> 2015-11-26  Matthew Wahab  
>
> * config/arm/arm-arches.def: Add "armv8.1-a" and "armv8.1-a+crc".
> * config/arm/arm-protos.h (FL2_ARCH8_1): New.
> (FL2_FOR_ARCH8_1A): New.
> * config/arm/arm-tables.opt: Regenerate.
> * config/arm/arm.c (arm_arch8_1): New.
> (arm_option_override): Set arm_arch8_1.
> * config/arm/arm.h (TARGET_NEON_RDMA): New.
> (arm_arch8_1): Declare.
> * doc/invoke.texi (ARM Options, -march): Add "armv8.1-a" and
> "armv8.1-a+crc".
> (ARM Options, -mfpu): Fix a typo.
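
For readers unfamiliar with the two instructions named above: as far as I understand the architecture description, vqrdmlah is a signed saturating rounding doubling multiply-accumulate that keeps the high half of the result, and vqrdmlsh is the subtracting counterpart. The following is only a scalar sketch in C of one 16-bit lane under that reading. The helper names are made up for illustration and nothing here is taken from the patch set.

```c
#include <stdint.h>

/* Floor division by 2^16, written out so that the sketch does not rely
   on the implementation-defined right shift of negative values.  */
static int64_t
floor_div_65536 (int64_t x)
{
  int64_t q = x / 65536;
  if (x % 65536 < 0)
    q--;
  return q;
}

/* Clamp X to the range of a signed 16-bit lane.  */
static int16_t
saturate_s16 (int64_t x)
{
  if (x > INT16_MAX)
    return INT16_MAX;
  if (x < INT16_MIN)
    return INT16_MIN;
  return (int16_t) x;
}

/* One lane of the accumulate form (hypothetical helper): widen the
   accumulator, add the doubled product and the rounding constant,
   then keep the saturated high half.  */
static int16_t
sketch_qrdmlah_s16 (int16_t acc, int16_t a, int16_t b)
{
  int64_t sum = (int64_t) acc * 65536 + 2 * (int64_t) a * b + 32768;
  return saturate_s16 (floor_div_65536 (sum));
}

/* One lane of the subtract form (hypothetical helper).  */
static int16_t
sketch_qrdmlsh_s16 (int16_t acc, int16_t a, int16_t b)
{
  int64_t sum = (int64_t) acc * 65536 - 2 * (int64_t) a * b + 32768;
  return saturate_s16 (floor_div_65536 (sum));
}
```

With Q15 inputs, 0.5 * 0.5 accumulated onto zero gives 0.25 (8192), and the product of the two most negative values saturates instead of wrapping.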


Re: [PATCH][RTL-ifcvt] PR rtl-optimization/68506: Fix emitting order of insns in IF-THEN-JOIN case

2015-11-27 Thread Richard Biener
On Fri, Nov 27, 2015 at 10:44 AM, Kyrill Tkachov  wrote:
>
> On 26/11/15 16:54, Kyrill Tkachov wrote:
>>
>>
>> On 26/11/15 16:49, Bernd Schmidt wrote:
>>>
>>> On 11/26/2015 05:45 PM, Kyrill Tkachov wrote:

  that doesn't help, punt.  */

 -  modified_in_a = emit_a != NULL_RTX && modified_in_p (orig_b, emit_a);
 if (tmp_b && then_bb)
   {
>>>
>>> These bits I thought would be part of a followup patch (which would also
>>> guard against single_set problems), and as I mentioned I'd rather have a
>>> checking assert.
>>
>> Yes, you're right. I have the checking_assert statement in the followup
>> that I've been testing.
>> I'll move the deletion of these two statements there as well to minimise
>> the changes to this patch.
>>
>> I'll move these bits to that patch, re-build cc1 and commit.
>>
>
> Here it is.
> I'm committing this to trunk.

I think this causes

FAIL: gcc.c-torture/execute/20050124-1.c   -O2  (internal compiler error)
FAIL: gcc.c-torture/execute/20050124-1.c   -O2  (test for excess errors)
WARNING: gcc.c-torture/execute/20050124-1.c   -O2  compilation failed to produce executable
FAIL: gcc.c-torture/execute/20050124-1.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none  (internal compiler error)
FAIL: gcc.c-torture/execute/20050124-1.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none  (test for excess errors)


/space/rguenther/src/svn/trunk2/gcc/testsuite/gcc.c-torture/execute/20050124-1.c:19:1:
internal compiler error: in noce_try_cmove_arith, at ifcvt.c:2180
0x11f919d noce_try_cmove_arith
/space/rguenther/src/svn/trunk2/gcc/ifcvt.c:2180
0x11fb93f noce_process_if_block
/space/rguenther/src/svn/trunk2/gcc/ifcvt.c:3525
0x11fdd0e noce_find_if_block
/space/rguenther/src/svn/trunk2/gcc/ifcvt.c:3974
0x11fdd0e find_if_header
/space/rguenther/src/svn/trunk2/gcc/ifcvt.c:4179
0x11fdd0e if_convert
/space/rguenther/src/svn/trunk2/gcc/ifcvt.c:5326
0x11ff32d execute


on x86_64 with -m64 and -m32.

Richard.

> Thanks,
> Kyrill
>
> 2015-11-26  Kyrylo Tkachov  
>
> PR rtl-optimization/68506
> * ifcvt.c (noce_try_cmove_arith): Try emitting the else basic block
> first if emit_a exists or then_bb modifies 'b'.  Reindent if-else
> blocks.
>
> 2015-11-26  Kyrylo Tkachov  
>
> PR rtl-optimization/68506
> * gcc.c-torture/execute/pr68506.c: New test.
>
>> Thanks for your guidance,
>> Kyrill
>>
>>> So take these deletions out and leave them for the followup, and the
>>> patch is ok everywhere. No need for a full retest given that practically the
>>> same patch has been tested already, just make sure you can build cc1.
>>>
>>>
>>> Bernd
>>>
>>
>


Re: [PATCH] Allocate constant size dynamic stack space in the prologue

2015-11-27 Thread Dominik Vogt
New patch with the following changes:

* Fixed comment about dynamic var area placement.
* The area is now placed further away from the stack pointer than
  the non-dynamic stack variables (tested only with
  STACK_GROWS_DOWNWARD).  This is a possible performance
  improvement on S/390 (hoping that more variables will be
  addressable using a displacement).
* Moved the code that calculates the size to actually allocate
  from the size required by dynamic stack variables to a separate
  function.  Use that function from allocate_dynamic_stack_space()
  and expand_stack_vars() so the size calculations are the same
  for both.
* Use a target hook to activate the feature (for now).
  (This is just meant to make it more feasible to be included in
  GCC 6.  If it's too late for this, the code may as well be used
  for all targets.)
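
To illustrate the third point above, here is a rough sketch in C of the kind of size and alignment arithmetic that can be shared between two callers. It is not the code from explow.c. The function names and the worst-case padding rule are assumptions chosen for illustration, and all alignments are assumed to be powers of two.

```c
#include <stddef.h>
#include <stdint.h>

/* Round V up to a multiple of ALIGN (a power of two).  */
static size_t
round_up (size_t v, size_t align)
{
  return (v + align - 1) & ~(align - 1);
}

/* Hypothetical helper: bytes to reserve so that a block of SIZE bytes
   aligned to WANT_ALIGN can always be carved out of storage that is
   only guaranteed to be aligned to HAVE_ALIGN.  */
static size_t
padded_dynamic_size (size_t size, size_t want_align, size_t have_align)
{
  size_t slack = want_align > have_align ? want_align - have_align : 0;
  return round_up (size + slack, have_align);
}

/* Hypothetical helper: first address at or above BASE that is a
   multiple of ALIGN.  */
static uintptr_t
align_address (uintptr_t base, size_t align)
{
  return (base + align - 1) & ~(uintptr_t) (align - 1);
}
```

Because both the "allocate now" path and the "allocate in the prologue" path would call the same helper, they cannot disagree about how much padding an over-aligned object needs.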

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany
gcc/ChangeLog

* cfgexpand.c (expand_stack_vars): Implement
ALLOCATE_DYNAMIC_STACK_SPACE_IN_PROLOGUE.
* explow.c (get_dynamic_stack_base): New function to return an address
expression for the dynamic stack base when using
ALLOCATE_DYNAMIC_STACK_SPACE_IN_PROLOGUE.
(get_dynamic_stack_size): New function to do the required dynamic stack
space size calculations.
(allocate_dynamic_stack_space): Use new functions.
(align_dynamic_address): Move some code from
allocate_dynamic_stack_space to new function.
* explow.h (get_dynamic_stack_base, get_dynamic_stack_size): Export.
* doc/tm.texi: Regenerate.
* doc/tm.texi.in (TARGET_ALLOCATE_DYNAMIC_STACK_SPACE_IN_PROLOGUE_P):
Define documentation hook.
* config/s390/s390.c
(TARGET_ALLOCATE_DYNAMIC_STACK_SPACE_IN_PROLOGUE_P):
Provide hook.
* defaults.h (ALLOCATE_DYNAMIC_STACK_SPACE_IN_PROLOGUE_P): Define by
default.
* target.def (allocate_dynamic_stack_space_in_prologue_p): Define hook.

gcc/testsuite/ChangeLog

* gcc.dg/stack-usage-2.c (foo3): Adapt expected warning.
From 55b9ba6882dbd2d8deed6c337b0e7de65617d7b3 Mon Sep 17 00:00:00 2001
From: Dominik Vogt 
Date: Wed, 25 Nov 2015 09:31:19 +0100
Subject: [PATCH] v2: Allocate constant size dynamic stack space in the
 prologue ...

... and place it in the virtual stack vars area, if the platform supports it.
On S/390 this saves adjusting the stack pointer twice and forcing the frame
pointer into existence.  It also removes the warning with -mwarn-dynamicstack
that is triggered by cfun->calls_alloca == 1.

This fixes a problem with the Linux kernel which aligns the page structure to
16 bytes at run time using inefficient code and issuing a bogus warning.
---
 gcc/cfgexpand.c  |  26 +++-
 gcc/config/s390/s390.c   |   3 +
 gcc/config/s390/s390.h   |   4 +
 gcc/defaults.h   |   4 +
 gcc/doc/tm.texi  |   5 +
 gcc/doc/tm.texi.in   |   2 +
 gcc/explow.c | 232 +++
 gcc/explow.h |   9 ++
 gcc/target.def   |   9 ++
 gcc/testsuite/gcc.dg/stack-usage-2.c |   4 +-
 10 files changed, 214 insertions(+), 84 deletions(-)

diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
index 1990e10..81a7aac 100644
--- a/gcc/cfgexpand.c
+++ b/gcc/cfgexpand.c
@@ -1032,7 +1032,9 @@ expand_stack_vars (bool (*pred) (size_t), struct stack_vars_data *data)
   size_t si, i, j, n = stack_vars_num;
   HOST_WIDE_INT large_size = 0, large_alloc = 0;
   rtx large_base = NULL;
+  rtx large_allocsize = NULL;
   unsigned large_align = 0;
+  bool large_allocation_done = false;
   tree decl;
 
   /* Determine if there are any variables requiring "large" alignment.
@@ -1079,8 +1081,17 @@ expand_stack_vars (bool (*pred) (size_t), struct stack_vars_data *data)
 
   /* If there were any, allocate space.  */
   if (large_size > 0)
-	large_base = allocate_dynamic_stack_space (GEN_INT (large_size), 0,
-		   large_align, true);
+	{
+	  if (targetm.calls.allocate_dynamic_stack_space_in_prologue_p ())
+	{
+	  large_allocsize = GEN_INT (large_size);
+	  get_dynamic_stack_size (&large_allocsize, 0, large_align, NULL);
+	}
+	  else
+	/* Allocate space now.  */
+	large_base = allocate_dynamic_stack_space (GEN_INT (large_size), 0,
+		   large_align, true);
+	}
 }
 
   for (si = 0; si < n; ++si)
@@ -1166,6 +1177,17 @@ expand_stack_vars (bool (*pred) (size_t), struct stack_vars_data *data)
 	  /* Large alignment is only processed in the last pass.  */
 	  if (pred)
 	continue;
+
+	  if (large_allocsize && ! large_allocation_done)
+	{
+	  /* Allocate space the virtual stack vars area in the prologue.
+	   */
+	  HOST_WIDE_INT loffset;
+
+	  loffset = alloc_stack_frame_space (INTVAL (large_allocsize), 1);
+	  large_base = get_dynamic_stack_base (loffset, large_align);
+	  large_a

[PATCH] Fix PR68559

2015-11-27 Thread Richard Biener

The following fixes the excessive peeling for gaps we do when doing
SLP now that I removed most of the restrictions on having gaps in
the first place.

This should make low-trip vectorized loops more efficient (something
the combine-epilogue-with-vectorized-body-by-masking patches also
claim to do).
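
As a reading aid, the condition that the tree-vect-stmts.c hunk in this mail installs can be paraphrased as a small predicate in C. This is a sketch only: the struct and function names are invented here, and the fields stand in for the GCC accessors (GROUP_GAP, GROUP_SIZE, STMT_VINFO_STRIDED_P) used in the patch.

```c
#include <stdbool.h>

/* Hypothetical summary of one grouped load.  */
struct group_access
{
  bool in_loop;         /* loop vectorization rather than basic-block SLP */
  bool strided;         /* strided access */
  bool single_element;  /* single-element interleaving */
  unsigned group_gap;   /* unused elements at the end of the group */
  unsigned group_size;  /* elements per group, gaps included */
};

/* Return true if the last vector iteration would read excess elements,
   so that a scalar epilogue has to be peeled off.  */
static bool
needs_peeling_for_gaps (const struct group_access *g, unsigned vf, bool slp)
{
  if (!g->in_loop || g->strided)
    return false;
  return (g->single_element
	  || g->group_gap != 0
	  || (!slp && vf % g->group_size != 0));
}
```

The relaxation for SLP is visible in the last clause: a group size that does not divide the vectorization factor forces peeling only in the non-SLP case.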

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.

Richard.

2015-11-27  Richard Biener  

PR tree-optimization/68559
* tree-vect-data-refs.c (vect_analyze_group_access_1): Move
peeling for gap checks ...
* tree-vect-stmts.c (vectorizable_load): ... here and relax
for SLP.
* tree-vect-loop.c (vect_analyze_loop_2): Re-set
LOOP_VINFO_PEELING_FOR_GAPS before re-trying without SLP.

* gcc.dg/vect/slp-perm-4.c: Adjust again.
* gcc.dg/vect/pr45752.c: Likewise.

Index: gcc/tree-vect-stmts.c
===
*** gcc/tree-vect-stmts.c   (revision 230998)
--- gcc/tree-vect-stmts.c   (working copy)
*** vectorizable_load (gimple *stmt, gimple_
*** 6246,6260 
   that leaves unused vector loads around punt - we at least create
 very sub-optimal code in that case (and blow up memory,
 see PR65518).  */
if (first_stmt == stmt
! && !GROUP_NEXT_ELEMENT (stmt_info)
! && GROUP_SIZE (stmt_info) > TYPE_VECTOR_SUBPARTS (vectype))
{
  if (dump_enabled_p ())
dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
!"single-element interleaving not supported "
!"for not adjacent vector loads\n");
! return false;
}
  
if (slp && SLP_TREE_LOAD_PERMUTATION (slp_node).exists ())
--- 6250,6294 
   that leaves unused vector loads around punt - we at least create
 very sub-optimal code in that case (and blow up memory,
 see PR65518).  */
+   bool force_peeling = false;
if (first_stmt == stmt
! && !GROUP_NEXT_ELEMENT (stmt_info))
!   {
! if (GROUP_SIZE (stmt_info) > TYPE_VECTOR_SUBPARTS (vectype))
!   {
! if (dump_enabled_p ())
!   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
!"single-element interleaving not supported "
!"for not adjacent vector loads\n");
! return false;
!   }
! 
! /* Single-element interleaving requires peeling for gaps.  */
! force_peeling = true;
!   }
! 
!   /* If there is a gap in the end of the group or the group size cannot
!  be made a multiple of the vector element count then we access excess
!elements in the last iteration and thus need to peel that off.  */
!   if (loop_vinfo
! && ! STMT_VINFO_STRIDED_P (stmt_info)
! && (force_peeling
! || GROUP_GAP (vinfo_for_stmt (first_stmt)) != 0
! || (!slp && vf % GROUP_SIZE (vinfo_for_stmt (first_stmt)) != 0)))
{
  if (dump_enabled_p ())
dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
!"Data access with gaps requires scalar "
!"epilogue loop\n");
! if (loop->inner)
!   {
! if (dump_enabled_p ())
!   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
!"Peeling for outer loop is not supported\n");
! return false;
!   }
! 
! LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo) = true;
}
  
if (slp && SLP_TREE_LOAD_PERMUTATION (slp_node).exists ())
Index: gcc/testsuite/gcc.dg/vect/slp-perm-4.c
===
*** gcc/testsuite/gcc.dg/vect/slp-perm-4.c  (revision 230998)
--- gcc/testsuite/gcc.dg/vect/slp-perm-4.c  (working copy)
***
*** 33,39 
  #define M34 7716
  #define M44 16
  
! #define N 40
  
  void foo (unsigned int *__restrict__ pInput, unsigned int *__restrict__ 
pOutput)
  {
--- 33,39 
  #define M34 7716
  #define M44 16
  
! #define N 20
  
  void foo (unsigned int *__restrict__ pInput, unsigned int *__restrict__ 
pOutput)
  {
*** int main (int argc, const char* argv[])
*** 60,68 
unsigned int input[N], output[N], i;
unsigned int check_results[N]
  = {3208, 1334, 28764, 35679, 2789, 13028, 4754, 168364, 91254, 12399, 
! 22848, 8174, 307964, 146829, 22009, 32668, 11594, 447564, 202404, 31619, 
! 42488, 15014, 587164, 257979, 41229, 52308, 18434, 726764, 313554, 50839, 
! 62128, 21854, 866364, 369129, 60449, 71948, 25274, 1005964, 424704, 
70059};
  
check_vect ();
  
--- 60,66 
unsigned int input[N], output[N], i;
unsigned int check_results[N]
  = {3208, 1334, 28764, 35679, 2789, 13028, 4754, 168364, 91254, 12399, 
! 22848, 8174, 307964, 146829, 220

Re: [PATCH] Allocate constant size dynamic stack space in the prologue

2015-11-27 Thread Dominik Vogt
On Fri, Nov 27, 2015 at 03:09:15PM +0100, Dominik Vogt wrote:
> +++ b/gcc/config/s390/s390.h
...
> +/* Constant size dynamic stack space can be allocated through the function
> +   prologue to save the extra instructions to adjust the stack pointer.  */
> +#define ALLOCATE_DYNAMIC_STACK_SPACE_IN_PROLOGUE 1
> +

This is obsolete and needs to be removed.

> +++ b/gcc/defaults.h
> +#ifndef ALLOCATE_DYNAMIC_STACK_SPACE_IN_PROLOGUE_P
> +#define ALLOCATE_DYNAMIC_STACK_SPACE_IN_PROLOGUE_P 0
> +#endif
> +

Ditto.

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany



[PTX] address space patch

2015-11-27 Thread Nathan Sidwell
I've committed this cleanup of some of the address space machinery.  As with the
fn recording helpers, the address space helper was accepting many things that
were not symbols.  I changed the two callers to pass only a symbol; one of them
had already done the symbol extraction anyway.  Also, the prototype is not
needed externally.
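
The symbol extraction that the callers now do themselves can be sketched on a toy expression type in C. The node type and names below are made up for illustration and are not GCC's rtx API; the point is only that at most one CONST wrapper and one PLUS are peeled, after which anything that is not a symbol is rejected.

```c
#include <stddef.h>

/* Toy stand-in for an RTL expression: a code and one operand.  */
enum node_code { NODE_CONST, NODE_PLUS, NODE_SYMBOL_REF, NODE_REG };

struct node
{
  enum node_code code;
  const struct node *op0;
};

/* Peel one optional CONST and one optional PLUS, then return the
   symbol, or NULL if what remains is not a symbol.  */
static const struct node *
strip_to_symbol (const struct node *x)
{
  if (x->code == NODE_CONST)
    x = x->op0;
  if (x->code == NODE_PLUS)
    x = x->op0;
  return x->code == NODE_SYMBOL_REF ? x : NULL;
}
```

Unlike a loop that strips any nesting of PLUS and CONST, this accepts only the (const (plus (symbol_ref) ...)) shape or a prefix of it, so the helper that maps a symbol to an address space is never handed a non-symbol.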


nathan
2015-11-27  Nathan Sidwell  

	* config/nvptx/nvptx-protos.h (nvptx_addr_space_from_address):
	Don't declare.
	* config/nvptx/nvptx.c (nvptx_addr_space_from_sym): New.
	(nvptx_maybe_convert_symbolic_operand): Simplify.
	(nvptx_addr_space_from_address): Delete.
	(nvptx_print_operand): Adjust 'A' case.

Index: config/nvptx/nvptx-protos.h
===
--- config/nvptx/nvptx-protos.h	(revision 231015)
+++ config/nvptx/nvptx-protos.h	(working copy)
@@ -41,7 +41,6 @@ extern const char *nvptx_output_return (
 extern machine_mode nvptx_underlying_object_mode (rtx);
 extern const char *nvptx_section_from_addr_space (addr_space_t);
 extern bool nvptx_hard_regno_mode_ok (int, machine_mode);
-extern addr_space_t nvptx_addr_space_from_address (rtx);
 extern rtx nvptx_maybe_convert_symbolic_operand (rtx);
 #endif
 #endif
Index: config/nvptx/nvptx.c
===
--- config/nvptx/nvptx.c	(revision 231015)
+++ config/nvptx/nvptx.c	(working copy)
@@ -206,6 +206,24 @@ nvptx_ptx_type_from_mode (machine_mode m
 }
 }
 
+/* Determine the address space to use for SYMBOL_REF SYM.  */
+
+static addr_space_t
+nvptx_addr_space_from_sym (rtx sym)
+{
+  tree decl = SYMBOL_REF_DECL (sym);
+  if (decl == NULL_TREE || TREE_CODE (decl) == FUNCTION_DECL)
+return ADDR_SPACE_GENERIC;
+
+  bool is_const = (CONSTANT_CLASS_P (decl)
+		   || TREE_CODE (decl) == CONST_DECL
+		   || TREE_READONLY (decl));
+  if (is_const)
+return ADDR_SPACE_CONST;
+
+  return ADDR_SPACE_GLOBAL;
+}
+
 /* If MODE should be treated as two registers of an inner mode, return
that inner mode.  Otherwise return VOIDmode.  */
 
@@ -1359,22 +1377,25 @@ nvptx_gen_wcast (rtx reg, propagate_mask
original operand, or the converted one.  */
 
 rtx
-nvptx_maybe_convert_symbolic_operand (rtx orig_op)
+nvptx_maybe_convert_symbolic_operand (rtx op)
 {
-  if (GET_MODE (orig_op) != Pmode)
-return orig_op;
+  if (GET_MODE (op) != Pmode)
+return op;
+
+  rtx sym = op;
+  if (GET_CODE (sym) == CONST)
+sym = XEXP (sym, 0);
+  if (GET_CODE (sym) == PLUS)
+sym = XEXP (sym, 0);
 
-  rtx op = orig_op;
-  while (GET_CODE (op) == PLUS || GET_CODE (op) == CONST)
-op = XEXP (op, 0);
-  if (GET_CODE (op) != SYMBOL_REF)
-return orig_op;
+  if (GET_CODE (sym) != SYMBOL_REF)
+return op;
 
-  nvptx_maybe_record_fnsym (op);
+  nvptx_maybe_record_fnsym (sym);
   
-  addr_space_t as = nvptx_addr_space_from_address (op);
+  addr_space_t as = nvptx_addr_space_from_sym (sym);
   if (as == ADDR_SPACE_GENERIC)
-return orig_op;
+return op;
 
   enum unspec code;
   code = (as == ADDR_SPACE_GLOBAL ? UNSPEC_FROM_GLOBAL
@@ -1382,9 +1403,10 @@ nvptx_maybe_convert_symbolic_operand (rt
 	  : as == ADDR_SPACE_SHARED ? UNSPEC_FROM_SHARED
 	  : as == ADDR_SPACE_CONST ? UNSPEC_FROM_CONST
 	  : UNSPEC_FROM_PARAM);
+
   rtx dest = gen_reg_rtx (Pmode);
-  emit_insn (gen_rtx_SET (dest, gen_rtx_UNSPEC (Pmode, gen_rtvec (1, orig_op),
-		code)));
+  emit_insn (gen_rtx_SET (dest,
+			  gen_rtx_UNSPEC (Pmode, gen_rtvec (1, op), code)));
   return dest;
 }
 
@@ -1465,29 +1487,6 @@ nvptx_section_for_decl (const_tree decl)
   return ".global";
 }
 
-/* Look for a SYMBOL_REF in ADDR and return the address space to be used
-   for the insn referencing this address.  */
-
-addr_space_t
-nvptx_addr_space_from_address (rtx addr)
-{
-  while (GET_CODE (addr) == PLUS || GET_CODE (addr) == CONST)
-addr = XEXP (addr, 0);
-  if (GET_CODE (addr) != SYMBOL_REF)
-return ADDR_SPACE_GENERIC;
-
-  tree decl = SYMBOL_REF_DECL (addr);
-  if (decl == NULL_TREE || TREE_CODE (decl) == FUNCTION_DECL)
-return ADDR_SPACE_GENERIC;
-
-  bool is_const = (CONSTANT_CLASS_P (decl)
-		   || TREE_CODE (decl) == CONST_DECL
-		   || TREE_READONLY (decl));
-  if (is_const)
-return ADDR_SPACE_CONST;
-
-  return ADDR_SPACE_GLOBAL;
-}
 
 /* Machinery to output constant initializers.  When beginning an initializer,
we decide on a chunk size (which is visible in ptx in the type used), and
@@ -1945,7 +1944,17 @@ nvptx_print_operand (FILE *file, rtx x,
 {
 case 'A':
   {
-	addr_space_t as = nvptx_addr_space_from_address (XEXP (x, 0));
+	addr_space_t as = ADDR_SPACE_GENERIC;
+	rtx sym = XEXP (x, 0);
+
+	if (GET_CODE (sym) == CONST)
+	  sym = XEXP (sym, 0);
+	if (GET_CODE (sym) == PLUS)
+	  sym = XEXP (sym, 0);
+
+	if (GET_CODE (sym) == SYMBOL_REF)
+	  as = nvptx_addr_space_from_sym (sym);
+
 	fputs (nvptx_section_from_addr_space (as), file);
   }
   break;


Re: [Ping][PATCH][GCC][ARM] testcase memset-inline-10.c uses -mfloat-abi=hard but does not check whether target supports it

2015-11-27 Thread Andre Vieira

On 12/11/15 15:16, Andre Vieira wrote:

On 12/11/15 15:08, Andre Vieira wrote:

Hi,

   This patch changes the memset-inline-10.c testcase to make sure that
it is only compiled for ARM targets that support -mfloat-abi=hard using
the fact that all non-thumb1 targets do.

   This is correct because all targets for which -mthumb causes the
compiler to use thumb2 will support the generation of FP instructions.

   Tested by running regressions for this testcase for various ARM
targets.

   Is this OK to commit?

   Thanks,
   Andre Vieira

gcc/testsuite/ChangeLog:
2015-11-06  Andre Vieira  

 * gcc.target/arm/memset-inline-10.c: Added
 dg-require-effective-target arm_thumb2_ok.


Now with attachment, sorry about that.

Cheers,
Andre


Ping.



Re: [PATCH][RTL-ifcvt] PR rtl-optimization/68506: Fix emitting order of insns in IF-THEN-JOIN case

2015-11-27 Thread Kyrill Tkachov


On 27/11/15 14:09, Richard Biener wrote:

On Fri, Nov 27, 2015 at 10:44 AM, Kyrill Tkachov  wrote:

On 26/11/15 16:54, Kyrill Tkachov wrote:


On 26/11/15 16:49, Bernd Schmidt wrote:

On 11/26/2015 05:45 PM, Kyrill Tkachov wrote:

  that doesn't help, punt.  */

-  modified_in_a = emit_a != NULL_RTX && modified_in_p (orig_b, emit_a);
 if (tmp_b && then_bb)
   {

These bits I thought would be part of a followup patch (which would also
guard against single_set problems), and as I mentioned I'd rather have a
checking assert.

Yes, you're right. I have the checking_assert statement in the followup
that I've been testing.
I'll move the deletion of these two statements there as well to minimise
the changes to this patch.

I'll move these bits to that patch, re-build cc1 and commit.


Here it is.
I'm committing this to trunk.

I think this causes

FAIL: gcc.c-torture/execute/20050124-1.c   -O2  (internal compiler error)
FAIL: gcc.c-torture/execute/20050124-1.c   -O2  (test for excess errors)
WARNING: gcc.c-torture/execute/20050124-1.c   -O2  compilation failed to produce executable
FAIL: gcc.c-torture/execute/20050124-1.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none  (internal compiler error)
FAIL: gcc.c-torture/execute/20050124-1.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none  (test for excess errors)



Sorry for that.
That is caused not by this patch but rather by the followup
https://gcc.gnu.org/ml/gcc-patches/2015-11/msg03327.html

The checking assert fails:
gcc_checking_assert (!emit_a || !modified_in_p (orig_b, emit_a));
emit_a is:
(parallel [
(set (reg:SI 93)
(plus:SI (reg/v:SI 88 [ i ])
(const_int 2 [0x2])))
(clobber (reg:CC 17 flags))
])

and orig_b is:
(if_then_else:SI (eq (reg:CC 17 flags)
(const_int 0 [0]))
(reg/v:SI 87 [  ])
(reg/v:SI 88 [ i ]))

So I think our assumption that this case would never trigger by this point
doesn't hold due to the CC reg clobber.
So the code before that patch was probably correct.
I think we should revert
https://gcc.gnu.org/ml/gcc-patches/2015-11/msg03327.html then.

Kyrill
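
To make the failure mode concrete, here is a toy model in C of the check that the assert performs. It is deliberately not GCC's modified_in_p: the types and names are invented, registers are plain integers, and the only point is that a register which an insn merely clobbers still counts as modified for any expression that reads it.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy insn: the registers it sets and the registers it clobbers.  */
struct toy_insn
{
  const int *sets;
  size_t n_sets;
  const int *clobbers;
  size_t n_clobbers;
};

static bool
contains (const int *regs, size_t n, int reg)
{
  for (size_t i = 0; i < n; i++)
    if (regs[i] == reg)
      return true;
  return false;
}

/* Return true if any register in USED is set or clobbered by INSN.  */
static bool
toy_modified_in (const int *used, size_t n_used, const struct toy_insn *insn)
{
  for (size_t i = 0; i < n_used; i++)
    if (contains (insn->sets, insn->n_sets, used[i])
	|| contains (insn->clobbers, insn->n_clobbers, used[i]))
      return true;
  return false;
}
```

With the dumps above, emit_a sets reg 93 and clobbers reg 17, while orig_b reads regs 17, 87 and 88, so the toy check reports a modification purely because of the flags clobber.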


/space/rguenther/src/svn/trunk2/gcc/testsuite/gcc.c-torture/execute/20050124-1.c:19:1:
internal compiler error: in noce_try_cmove_arith, at ifcvt.c:2180
0x11f919d noce_try_cmove_arith
 /space/rguenther/src/svn/trunk2/gcc/ifcvt.c:2180
0x11fb93f noce_process_if_block
 /space/rguenther/src/svn/trunk2/gcc/ifcvt.c:3525
0x11fdd0e noce_find_if_block
 /space/rguenther/src/svn/trunk2/gcc/ifcvt.c:3974
0x11fdd0e find_if_header
 /space/rguenther/src/svn/trunk2/gcc/ifcvt.c:4179
0x11fdd0e if_convert
 /space/rguenther/src/svn/trunk2/gcc/ifcvt.c:5326
0x11ff32d execute


on x86_64 with -m64 and -m32.

Richard.


Thanks,
Kyrill

2015-11-26  Kyrylo Tkachov  

 PR rtl-optimization/68506
 * ifcvt.c (noce_try_cmove_arith): Try emitting the else basic block
 first if emit_a exists or then_bb modifies 'b'.  Reindent if-else
 blocks.

2015-11-26  Kyrylo Tkachov  

 PR rtl-optimization/68506
 * gcc.c-torture/execute/pr68506.c: New test.


Thanks for your guidance,
Kyrill


So take these deletions out and leave them for the followup, and the
patch is ok everywhere. No need for a full retest given that practically the
same patch has been tested already, just make sure you can build cc1.


Bernd





Re: [PATCH][RTL-ifcvt] PR rtl-optimization/68506: Fix emitting order of insns in IF-THEN-JOIN case

2015-11-27 Thread Bernd Schmidt

On 11/27/2015 03:33 PM, Kyrill Tkachov wrote:

Sorry for that.
That is caused not by this patch but rather by the followup
https://gcc.gnu.org/ml/gcc-patches/2015-11/msg03327.html

The checking assert fails:
gcc_checking_assert (!emit_a || !modified_in_p (orig_b, emit_a));
emit_a is:
(parallel [
 (set (reg:SI 93)
 (plus:SI (reg/v:SI 88 [ i ])
 (const_int 2 [0x2])))
 (clobber (reg:CC 17 flags))
 ])

and orig_b is:
(if_then_else:SI (eq (reg:CC 17 flags)
 (const_int 0 [0]))
 (reg/v:SI 87 [  ])
 (reg/v:SI 88 [ i ]))

So I think our assumption that this case would never trigger by this
point doesn't hold
due to the CC reg clobber.
So the code before that patch was probably correct.
I think we should revert
https://gcc.gnu.org/ml/gcc-patches/2015-11/msg03327.html then.


Yes. Sorry. I thought orig_b would hold "normal" objects and not such an 
if-then-else.



Bernd



Re: [Fortran, patch, pr68218, backport to 5 and 4.9, v1] ALLOCATE with size given by a module function

2015-11-27 Thread Andre Vehreschild
Hi Mikael, hi all,

Mikael, thanks for the fast review.

Committed after sync with richi for 5.3 as r231014 and r231017 for 4.9.

Regards,
Andre

On Fri, 27 Nov 2015 14:09:15 +0100
Mikael Morin  wrote:

> On 27/11/2015 13:20, Andre Vehreschild wrote:
> > Hi all,
> >
> > I have backported the patch for 68218 (multiple calls of the same
> > function, where only one call is expected and reasonable) to
> > gcc-5-branch and gcc-4_9-branch.
> >
> > Bootstrapped and regtested on x86_64-linux-gnu/f21.
> >
> > Ok for gcc-5-branch?
> >
> > Ok for gcc-4_9-branch?
> >
> Yes for both.
> Richi said in [1] that a 5.3 release candidate was planned for either 
> today or next monday, so before proceeding, please ping one release 
> manager on IRC to check that your commit won't interfere with the 
> release process.
> Thanks
> 
> Mikael
> 
> [1] https://gcc.gnu.org/ml/gcc/2015-11/msg00186.html
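
The problem being backported (a function used as an array bound in ALLOCATE being called more than once) has a close analogy in C, sketched below. This is not the gfortran code: the macros, struct and counter are invented for illustration, and the second macro plays the role that evaluating the bound into a temporary plays in the fix.

```c
/* Counts how often the bound function runs, like N in the test case.  */
static int n_calls = 0;

static int
nquery (void)
{
  return ++n_calls;
}

/* Toy array descriptor with two places that need the bound.  */
struct descriptor
{
  int ubound;
  int size;
};

/* Naive expansion: the bound expression is repeated at each use, so a
   function call in it is executed once per use.  */
#define INIT_NAIVE(d, bound_expr)	\
  do					\
    {					\
      (d)->ubound = (bound_expr);	\
      (d)->size = (bound_expr);		\
    }					\
  while (0)

/* Evaluate the bound once into a temporary and reuse the value.  */
#define INIT_ONCE(d, bound_expr)	\
  do					\
    {					\
      int bound_tmp_ = (bound_expr);	\
      (d)->ubound = bound_tmp_;		\
      (d)->size = bound_tmp_;		\
    }					\
  while (0)
```

With INIT_NAIVE the descriptor ends up internally inconsistent when the function has side effects, which is the same symptom the new Fortran test checks for with `n /= 1 .or. size(query_buf) /= n`.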


-- 
Andre Vehreschild * Email: vehre ad gmx dot de 
Index: gcc/fortran/ChangeLog
===
--- gcc/fortran/ChangeLog	(Revision 231014)
+++ gcc/fortran/ChangeLog	(Arbeitskopie)
@@ -1,3 +1,9 @@
+2015-11-27  Andre Vehreschild  
+
+	PR fortran/68218
+	* trans-array.c (gfc_array_init_size): Add gfc_evaluate_now() when
+	array spec in allocate is a function call.
+
 2015-11-25  Paul Thomas  
 
 	Backport from trunk.
Index: gcc/fortran/trans-array.c
===
--- gcc/fortran/trans-array.c	(Revision 231014)
+++ gcc/fortran/trans-array.c	(Arbeitskopie)
@@ -4976,6 +4976,8 @@
   gcc_assert (ubound);
   gfc_conv_expr_type (&se, ubound, gfc_array_index_type);
   gfc_add_block_to_block (pblock, &se.pre);
+  if (ubound->expr_type == EXPR_FUNCTION)
+	se.expr = gfc_evaluate_now (se.expr, pblock);
 
   gfc_conv_descriptor_ubound_set (descriptor_block, descriptor,
   gfc_rank_cst[n], se.expr);
Index: gcc/testsuite/ChangeLog
===
--- gcc/testsuite/ChangeLog	(Revision 231014)
+++ gcc/testsuite/ChangeLog	(Arbeitskopie)
@@ -1,3 +1,8 @@
+2015-11-27  Andre Vehreschild  
+
+	PR fortran/68218
+	* gfortran.dg/allocate_with_arrayspec_1.f90: New test.
+
 2015-11-26  Kyrylo Tkachov  
 
 	Backport from mainline
Index: gcc/testsuite/gfortran.dg/allocate_with_arrayspec_1.f90
===
--- gcc/testsuite/gfortran.dg/allocate_with_arrayspec_1.f90	(Revision 0)
+++ gcc/testsuite/gfortran.dg/allocate_with_arrayspec_1.f90	(Arbeitskopie)
@@ -0,0 +1,29 @@
+! { dg-do run }
+! { dg-options "-fdump-tree-original" }
+
+MODULE mo_test
+
+  integer :: n = 0
+CONTAINS
+
+  FUNCTION nquery()
+INTEGER :: nquery
+WRITE (0,*) "hello!"
+n = n + 1
+nquery = n
+  END FUNCTION nquery
+
+END MODULE mo_test
+
+
+! --
+! MAIN PROGRAM
+! --
+PROGRAM example
+   USE mo_test
+   INTEGER, ALLOCATABLE :: query_buf(:)
+   ALLOCATE(query_buf(nquery()))
+   if (n /= 1 .or. size(query_buf) /= n) call abort()
+END PROGRAM example
+
+! { dg-final { scan-tree-dump-times "nquery" 5 "original" } }
Index: gcc/fortran/ChangeLog
===
--- gcc/fortran/ChangeLog	(Revision 231012)
+++ gcc/fortran/ChangeLog	(Arbeitskopie)
@@ -1,3 +1,9 @@
+2015-11-27  Andre Vehreschild  
+
+	PR fortran/68218
+	* trans-array.c (gfc_array_init_size): Add gfc_evaluate_now() when
+	array spec in allocate is a function call.
+
 2015-11-24  Paul Thomas  
 
 	Backport from trunk.
Index: gcc/fortran/trans-array.c
===
--- gcc/fortran/trans-array.c	(Revision 231012)
+++ gcc/fortran/trans-array.c	(Arbeitskopie)
@@ -5030,6 +5030,8 @@
   gcc_assert (ubound);
   gfc_conv_expr_type (&se, ubound, gfc_array_index_type);
   gfc_add_block_to_block (pblock, &se.pre);
+  if (ubound->expr_type == EXPR_FUNCTION)
+	se.expr = gfc_evaluate_now (se.expr, pblock);
 
   gfc_conv_descriptor_ubound_set (descriptor_block, descriptor,
   gfc_rank_cst[n], se.expr);
Index: gcc/testsuite/ChangeLog
===
--- gcc/testsuite/ChangeLog	(Revision 231012)
+++ gcc/testsuite/ChangeLog	(Arbeitskopie)
@@ -1,3 +1,8 @@
+2015-11-27  Andre Vehreschild  
+
+	PR fortran/68218
+	* gfortran.dg/allocate_with_arrayspec_1.f90: New test.
+
 2015-11-27  Jakub Jelinek  
 
 	PR rtl-optimization/68250
Index: gcc/testsuite/gfortran.dg/allocate_with_arrayspec_1.f90
===
--- gcc/testsuite/gfortran.dg/allocate_with_arrayspec_1.f90	(Revision 0)
+++ gcc/testsuite/gfortran.dg/allocate_with_arrayspec_1.f90	(Arbeitskopie)
@@ -0,0 +1,29 @@
+! { dg-do run }
+! { dg-options "-fdump-tree-original" }
+
+MODULE m

[HSA] Implementation of various omp_* function

2015-11-27 Thread Martin Liška
Hello.

The following pair of patches implements missing omp_* functions.

Installed to the HSA branch.
Thanks,
Martin
From 90cb91ca75ce29e560184fbd1ca03a7e58fc6685 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Thu, 26 Nov 2015 10:18:44 +0100
Subject: [PATCH 1/5] HSA: implement omp_get_level

libgomp/ChangeLog:

2015-11-26  Martin Liska  

	* plugin/plugin-hsa.c (struct hsa_kernel_description):
	Add field gridified_kernel_p.
	(struct kernel_info): Likewise.
	(GOMP_OFFLOAD_load_image): Fill-up the field.
	(init_single_kernel): Dump value of the field.
	(create_kernel_dispatch): Set-up omp_level for kernel
	packet dispatch structure.

gcc/ChangeLog:

2015-11-26  Martin Liska  

	* hsa-brig.c (hsa_output_kernels): Append gridified_kernel_p
	to kernel_info structure.
	* hsa-gen.c (gen_get_level): Generate call of the builtin.
	(gen_hsa_insns_for_known_library_call): Call the aforementioned
	function.
	(generate_hsa): Output gridified_kernel_p from HSA summary.
	* hsa.c (struct hsa_decl_kernel_map_element): Add
	gridified_kernel_p field.
	(hsa_add_kern_decl_mapping): Add argument for the field.
	(hsa_get_decl_kernel_mapping_gridified): New function.
	(hsa_summary_t::link_functions): Add new argument for
	gridified_kernel_p.
	(hsa_register_kernel): Mark gridified kernel within HSA summary.
	* hsa.h (struct hsa_function_summary): Declare new field
	in HSA summary.
	* ipa-hsa.c (process_hsa_functions): Use modified signature of
	link_functions.

include/ChangeLog:

2015-11-26  Martin Liska  

	* gomp-constants.h (struct GOMP_hsa_kernel_dispatch): Declare
	new field in kernel dispatch structure.
---
 gcc/hsa-brig.c  | 27 ++-
 gcc/hsa-gen.c   | 39 ++-
 gcc/hsa.c   | 25 ++---
 gcc/hsa.h   | 11 ---
 gcc/ipa-hsa.c   |  4 ++--
 include/gomp-constants.h|  2 ++
 libgomp/plugin/plugin-hsa.c |  7 +++
 7 files changed, 93 insertions(+), 22 deletions(-)

diff --git a/gcc/hsa-brig.c b/gcc/hsa-brig.c
index ca30598..9f65d50 100644
--- a/gcc/hsa-brig.c
+++ b/gcc/hsa-brig.c
@@ -1982,15 +1982,19 @@ hsa_output_kernels (tree *host_func_table, tree *kernels)
 			   unsigned_type_node);
   DECL_CHAIN (id_f2) = id_f1;
   tree id_f3 = build_decl (BUILTINS_LOCATION, FIELD_DECL,
-			   get_identifier ("kernel_dependencies_count"),
-			   unsigned_type_node);
+			   get_identifier ("gridified_kernel_p"),
+			   boolean_type_node);
   DECL_CHAIN (id_f3) = id_f2;
   tree id_f4 = build_decl (BUILTINS_LOCATION, FIELD_DECL,
+			   get_identifier ("kernel_dependencies_count"),
+			   unsigned_type_node);
+  DECL_CHAIN (id_f4) = id_f3;
+  tree id_f5 = build_decl (BUILTINS_LOCATION, FIELD_DECL,
 			   get_identifier ("kernel_dependencies"),
 			   build_pointer_type (build_pointer_type
 	   (char_type_node)));
-  DECL_CHAIN (id_f4) = id_f3;
-  finish_builtin_struct (kernel_info_type, "__hsa_kernel_info", id_f4,
+  DECL_CHAIN (id_f5) = id_f4;
+  finish_builtin_struct (kernel_info_type, "__hsa_kernel_info", id_f5,
 			 NULL_TREE);
 
   int_num_of_kernels = build_int_cstu (uint32_type_node, map_count);
@@ -2018,7 +2022,10 @@ hsa_output_kernels (tree *host_func_table, tree *kernels)
   free (copy);
 
   unsigned omp_size = hsa_get_decl_kernel_mapping_omp_size (i);
-  tree omp_data_size = build_int_cstu (uint32_type_node, omp_size);
+  tree omp_data_size = build_int_cstu (unsigned_type_node, omp_size);
+  bool gridified_kernel_p = hsa_get_decl_kernel_mapping_gridified (i);
+  tree gridified_kernel_p_tree = build_int_cstu (boolean_type_node,
+		 gridified_kernel_p);
   unsigned count = 0;
 
   kernel_dependencies_vector_type = build_array_type
@@ -2057,7 +2064,7 @@ hsa_output_kernels (tree *host_func_table, tree *kernels)
 	}
 	}
 
-  tree dependencies_count = build_int_cstu (uint32_type_node, count);
+  tree dependencies_count = build_int_cstu (unsigned_type_node, count);
 
   vec *kernel_info_vec = NULL;
   CONSTRUCTOR_APPEND_ELT (kernel_info_vec, NULL_TREE,
@@ -2066,11 +2073,10 @@ hsa_output_kernels (tree *host_func_table, tree *kernels)
 			  (kern_name)),
   kern_name));
   CONSTRUCTOR_APPEND_ELT (kernel_info_vec, NULL_TREE, omp_data_size);
+  CONSTRUCTOR_APPEND_ELT (kernel_info_vec, NULL_TREE,
+			  gridified_kernel_p_tree);
   CONSTRUCTOR_APPEND_ELT (kernel_info_vec, NULL_TREE, dependencies_count);
 
-  tree kernel_info_ctor = build_constructor (kernel_info_type,
-		 kernel_info_vec);
-
   if (count > 0)
 	{
 	  ASM_GENERATE_INTERNAL_LABEL (tmp_name, "__hsa_dependencies_list", i);
@@ -2098,6 +2104,9 @@ hsa_output_kernels (tree *host_func_table, tree *kernels)
   else
 	CONSTRUCTOR_APPEND_ELT (kernel_info_vec, NULL_TREE, null_pointer_node);
 
+  tree kernel_info_ctor = build_constructor (kernel_info_type,
+		 kernel_info_vec);
+
   CONSTRUCTOR_APPEND_ELT (kernel_info_vector

Re: [PATCH][RTL-ifcvt] PR rtl-optimization/68506: Fix emitting order of insns in IF-THEN-JOIN case

2015-11-27 Thread Kyrill Tkachov


On 27/11/15 14:35, Bernd Schmidt wrote:

On 11/27/2015 03:33 PM, Kyrill Tkachov wrote:

Sorry for that.
That is caused not by this patch but rather by the followup
https://gcc.gnu.org/ml/gcc-patches/2015-11/msg03327.html

The checking assert fails:
gcc_checking_assert (!emit_a || !modified_in_p (orig_b, emit_a));
emit_a is:
(parallel [
 (set (reg:SI 93)
 (plus:SI (reg/v:SI 88 [ i ])
 (const_int 2 [0x2])))
 (clobber (reg:CC 17 flags))
 ])

and orig_b is:
(if_then_else:SI (eq (reg:CC 17 flags)
 (const_int 0 [0]))
 (reg/v:SI 87 [  ])
 (reg/v:SI 88 [ i ]))

So I think our assumption that this case would never trigger by this
point doesn't hold
due to the CC reg clobber.
So the code before that patch was probably correct.
I think we should revert
https://gcc.gnu.org/ml/gcc-patches/2015-11/msg03327.html then.


Yes. Sorry. I thought orig_b would hold "normal" objects and not such an 
if-then-else.



Reverted with r231019.
Sorry for not catching it myself earlier.

Kyrill



Bernd





[HSA] fix emission of function names with user-defined assembly names

2015-11-27 Thread Martin Liška
Hello.

The patch has just been applied to the HSA branch.

Martin
From 9f791cd1715b65599a4b022a56a7eac7e0816e72 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Fri, 27 Nov 2015 09:06:58 +0100
Subject: [PATCH 3/5] HSA: fix emission of function names with user-defined
 names

gcc/ChangeLog:

2015-11-27  Martin Liska  

	* hsa.c (hsa_get_declaration_name): Skip leading asterisk symbol
	in assembly name.
---
 gcc/hsa.c | 18 ++++++++++++------
 1 file changed, 12 insertions(+), 6 deletions(-)

diff --git a/gcc/hsa.c b/gcc/hsa.c
index 7c4e404..c728608 100644
--- a/gcc/hsa.c
+++ b/gcc/hsa.c
@@ -710,14 +710,20 @@ hsa_get_declaration_name (tree decl)
   free (b);
   return ggc_str;
 }
-  else if (TREE_CODE (decl) == FUNCTION_DECL)
-return cgraph_node::get_create (decl)->asm_name ();
-  else if (TREE_CODE (decl) == VAR_DECL && is_global_var (decl))
-return IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (decl));
+
+  tree name_tree;
+  if (TREE_CODE (decl) == FUNCTION_DECL
+  || (TREE_CODE (decl) == VAR_DECL && is_global_var (decl)))
+name_tree = DECL_ASSEMBLER_NAME (decl);
   else
-return IDENTIFIER_POINTER (DECL_NAME (decl));
+name_tree = DECL_NAME (decl);
+
+  const char *name = IDENTIFIER_POINTER (name_tree);
+  /* User-defined assembly names have prepended asterisk symbol.  */
+  if (name[0] == '*')
+name++;
 
-  return NULL;
+  return name;
 }
 
 void
-- 
2.6.3
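
The asterisk handling in the patch above can be sketched in isolation. This is a minimal illustration, assuming the name is already available as a plain C string; `strip_user_asm_prefix` is a hypothetical helper and not a function in hsa.c.

```c
#include <stddef.h>

/* Hypothetical helper, not part of hsa.c: return NAME without the
   leading '*' that marks a user-defined assembly name.  A NULL or
   unprefixed name is returned unchanged.  */
static const char *
strip_user_asm_prefix (const char *name)
{
  if (name != NULL && name[0] == '*')
    name++;
  return name;
}
```

For instance, a name stored as "*my_kernel" would come back as "my_kernel", while "plain_name" is returned as-is.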



[PATCH committed] PR other/61321 - demangler crash on casts in template parameters

2015-11-27 Thread Markus Trippelsdorf
I've committed the patch from Pedro Alves for PR61321. It was approved
by Jason over a year ago and the dups kept piling up.

PR other/61321 - demangler crash on casts in template parameters

The fix for bug 59195:

 [C++ demangler handles conversion operator incorrectly]
 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59195

unfortunately makes the demangler crash due to infinite recursion, in
case of casts in template parameters.

For example, with:

 template<int> struct A {};
 template <typename Y> void function_temp(A<sizeof ((Y)(999))>) {}
 template void function_temp<int>(A<sizeof (int)>);

The 'function_temp' instantiation above mangles to:

  _Z13function_tempIiEv1AIXszcvT_Li999EEE

The demangler parses this as:

typed name
  template
name 'function_temp'
template argument list
  builtin type int
  function type
builtin type void
argument list
  template  (*)
name 'A'
template argument list
  unary operator
operator sizeof
unary operator
  cast
template parameter 0(**)
  literal
builtin type int
name '999'

And after the fix for 59195, due to:

 static void
 d_print_cast (struct d_print_info *dpi, int options,
   const struct demangle_component *dc)
 {
 ...
   /* For a cast operator, we need the template parameters from
  the enclosing template in scope for processing the type.  */
   if (dpi->current_template != NULL)
 {
   dpt.next = dpi->templates;
   dpi->templates = &dpt;
   dpt.template_decl = dpi->current_template;
 }

when printing the template argument list of A (what should be ""), the template parameter 0 (that is, "T_", the '**' above) now
refers to the first parameter of the template argument list of the
'A' template (the '*' above), exactly what we were already trying to
print.  This leads to infinite recursion, and stack exhaustion.  The
template parameter 0 should actually refer to the first parameter of
the 'function_temp' template.

Where it reads "for the cast operator" in the comment in d_print_cast
(above), it's really talking about a conversion operator, like:

  struct A { template <typename U> explicit operator U(); };

We don't want to inject the template parameters from the enclosing
template in scope when processing a cast _expression_, only when
handling a conversion operator.

The problem is that DEMANGLE_COMPONENT_CAST is currently ambiguous,
and means _both_ 'conversion operator' and 'cast expression'.

Fix this by adding a new DEMANGLE_COMPONENT_CONVERSION component type,
which does what DEMANGLE_COMPONENT_CAST does today, and making
DEMANGLE_COMPONENT_CAST just simply print its component subtree.
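
The split described above can be sketched with a toy printer. Everything here is a simplified stand-in: `comp_type`, `printer` and `print_comp` are hypothetical names, not the libiberty types, and the "scope" is only a counter. The point is that the conversion-operator path pushes the enclosing template scope, while the cast-expression path prints its subtree and leaves the scope alone.

```c
#include <stdio.h>

/* Hypothetical, simplified stand-ins for the demangler's types.  */
enum comp_type { COMP_CAST, COMP_CONVERSION };

struct printer
{
  int scope_depth;      /* template scopes currently pushed */
  int max_scope_depth;  /* deepest nesting seen, for inspection */
  char out[64];         /* text produced by the last call */
};

static void
print_comp (struct printer *p, enum comp_type type, const char *target)
{
  if (type == COMP_CONVERSION)
    {
      /* Conversion operator: the enclosing template's parameters
         must be in scope while the target type is printed.  */
      p->scope_depth++;
      if (p->scope_depth > p->max_scope_depth)
        p->max_scope_depth = p->scope_depth;
      snprintf (p->out, sizeof p->out, "operator %s", target);
      p->scope_depth--;
    }
  else
    {
      /* Cast expression: print the subtree, no scope change.  */
      snprintf (p->out, sizeof p->out, "(%s)", target);
    }
}
```

With one component type per meaning, the caller no longer has to guess from context which of the two behaviours is wanted.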

I think we could instead reuse DEMANGLE_COMPONENT_CAST and in
d_print_comp_inner still do:

 @@ -5001,9 +5013,9 @@ d_print_comp_inner (struct d_print_info *dpi, int options,
d_print_comp (dpi, options, dc->u.s_extended_operator.name);
return;

 case DEMANGLE_COMPONENT_CAST:
   d_append_string (dpi, "operator ");
 - d_print_cast (dpi, options, dc);
 + d_print_conversion (dpi, options, dc);
   return;

leaving the unary cast case below calling d_print_cast, but it seems to
me that splitting the component types makes it easier to reason about
the code.

g++'s testsuite actually generates three symbols that crash the
demangler in the same way.  I've added those as tests in the demangler
testsuite as well.

And then this fixes PR other/61233 too, which happens to be a
demangler crash originally reported to GDB, at:
https://sourceware.org/bugzilla/show_bug.cgi?id=16957

Bootstrapped and regtested on x86_64 Fedora 20.

Also ran this through GDB's testsuite.  GDB will require a small
update to use DEMANGLE_COMPONENT_CONVERSION in one place it's using
DEMANGLE_COMPONENT_CAST in its sources.

libiberty/
2015-11-27  Pedro Alves  

PR other/61321
PR other/61233
* demangle.h (enum demangle_component_type)
: New value.
* cp-demangle.c (d_demangle_callback, d_make_comp): Handle
DEMANGLE_COMPONENT_CONVERSION.
(is_ctor_dtor_or_conversion): Handle DEMANGLE_COMPONENT_CONVERSION
instead of DEMANGLE_COMPONENT_CAST.
(d_operator_name): Return a DEMANGLE_COMPONENT_CONVERSION
component if handling a conversion.
(d_count_templates_scopes, d_print_comp_inner): Handle
DEMANGLE_COMPONENT_CONVERSION.
(d_print_comp_inner): Handle 

Re: [PATCH] Fix PR68067

2015-11-27 Thread Alan Lawrence

On 23/11/15 09:43, Richard Biener wrote:

On Fri, 20 Nov 2015, Alan Lawrence wrote:


...the asserts
you suggested in (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68117#c27)...

>>

So I have to ask, how sure are you that those assertions are(/should
be!) "correct"? :)


Ideally they should be correct but they happen to be not (and I think
the intent was that this should be harmless).  Basically I tried
to assert that nobody creates stale edge redirect data that is not
later consumed or cleared.  Happens to be too optimistic :/


Maybe so, but it looks like the edge_var_redirect_map is still suspect here. On 
the ~~28th call to loop_version, from tree_unswitch_loop, the call to 
lv_flush_pending_stmts executes (tree-cfg.c flush_pending_stmts):


   def = redirect_edge_var_map_def (vm);
   add_phi_arg (phi, def, e, redirect_edge_var_map_location(vm));

and BLOCK_LOCATION (redirect_edge_var_map_location(vm)) is

< 0x7fb7704a80 side-effects addressable asm_written used 
protected static visited tree_0 tree_2 tree_5>


so yeah, next question, how'd that get there...

A.



Re: [PATCH] GCC system.h and Graphite header order

2015-11-27 Thread Thomas Schwinge
Hi!

On Tue, 24 Nov 2015 10:32:12 +, Alan Lawrence  wrote:
> I note doc/install.texi says that gcc uses "ISL Library version 0.15,
> 0.14, 0.13, or 0.12.2". This patch breaks the build with 0.12.2 (a
> subset of errors below)

 has been filed.  I set you guys on CC.

> but seems fine with 0.14. I haven't tested
> 0.13. Do we want to update install.texi ?

I have a slight preference to keep ISL 0.12.2 supported, but can adapt to
a newer version, if necessary.

> In file included from
> /work/alalaw01/build-aarch64-none-elf/host-tools/include/isl/list.h:13:0,
>  from
> /work/alalaw01/build-aarch64-none-elf/host-tools/include/isl/aff_type.h:4,
>  from
> /work/alalaw01/build-aarch64-none-elf/host-tools/include/isl/local_space.h:4,
>  from
> /work/alalaw01/build-aarch64-none-elf/host-tools/include/isl/constraint.h:13,
>  from /work/alalaw01/src/gcc/gcc/graphite-optimize-isl.c:41:
> /work/alalaw01/build-aarch64-none-elf/host-tools/include/isl/ctx.h:108:8:
> error: attempt to use poisoned "malloc"
> malloc(size)))
> ^
> /work/alalaw01/build-aarch64-none-elf/host-tools/include/isl/ctx.h:112:8:
> error: attempt to use poisoned "realloc"
> realloc(ptr,size)))
> ^
> /usr/include/c++/4.8/bits/locale_facets.h:2566:44: error: macro
> "isdigit" passed 2 arguments, but takes just 1
>  isdigit(_CharT __c, const locale& __loc)
> ^
> /usr/include/c++/4.8/bits/locale_facets.h:2572:44: error: macro
> "ispunct" passed 2 arguments, but takes just 1
>  ispunct(_CharT __c, const locale& __loc)
> ^
> /usr/include/c++/4.8/bits/locale_facets.h:2578:45: error: macro
> "isxdigit" passed 2 arguments, but takes just 1
>  isxdigit(_CharT __c, const locale& __loc)
>  ^
> /usr/include/c++/4.8/bits/locale_facets.h:2584:44: error: macro
> "isalnum" passed 2 arguments, but takes just 1
>  isalnum(_CharT __c, const locale& __loc)
> ^
> /usr/include/c++/4.8/bits/locale_facets.h:2590:44: error: macro
> "isgraph" passed 2 arguments, but takes just 1
>  isgraph(_CharT __c, const locale& __loc)
> ^
> /usr/include/c++/4.8/bits/locale_facets.h:2596:44: error: macro
> "toupper" passed 2 arguments, but takes just 1
>  toupper(_CharT __c, const locale& __loc)
> ^
> /usr/include/c++/4.8/bits/locale_facets.h:2602:44: error: macro
> "tolower" passed 2 arguments, but takes just 1
>  tolower(_CharT __c, const locale& __loc)
> 
> In file included from /usr/include/c++/4.8/bits/basic_ios.h:37:0,
>  from /usr/include/c++/4.8/ios:44,
>  from /usr/include/c++/4.8/ostream:38,
>  from /usr/include/c++/4.8/iostream:39,
>  from
> /work/alalaw01/build-aarch64-none-elf/host-tools/include/isl/int.h:17,
>  from
> /work/alalaw01/build-aarch64-none-elf/host-tools/include/isl/ctx.h:16,
>  from
> /work/alalaw01/build-aarch64-none-elf/host-tools/include/isl/list.h:13,
>  from
> /work/alalaw01/build-aarch64-none-elf/host-tools/include/isl/aff_type.h:4,
>  from
> /work/alalaw01/build-aarch64-none-elf/host-tools/include/isl/local_space.h:4,
>  from
> /work/alalaw01/build-aarch64-none-elf/host-tools/include/isl/constraint.h:13,
>  from /work/alalaw01/src/gcc/gcc/graphite-scop-detection.c:52:
> /usr/include/c++/4.8/bits/locale_facets.h:2530:5: error:
> ‘std::isspace’ declared as an ‘inline’ variable
>  isspace(_CharT __c, const locale& __loc)
>  ^
> /usr/include/c++/4.8/bits/locale_facets.h:2530:5: error: template
> declaration of ‘bool std::isspace’
> /usr/include/c++/4.8/bits/locale_facets.h:2531:7: error: expected
> primary-expression before ‘return’
>  { return use_facet >(__loc).is(ctype_base::space, __c); }
>^
> /usr/include/c++/4.8/bits/locale_facets.h:2531:7: error: expected ‘}’
> before ‘return’
> /usr/include/c++/4.8/bits/locale_facets.h:2536:5: error: ‘isprint’
> declared as an ‘inline’ variable
>  isprint(_CharT __c, const locale& __loc)
>  ^
> /usr/include/c++/4.8/bits/locale_facets.h:2536:5: error: template
> declaration of ‘bool isprint’
> /usr/include/c++/4.8/bits/locale_facets.h:2537:7: error: expected
> primary-expression before ‘return’
>  { return use_facet >(__loc).is(ctype_base::print, __c); }
>^
> /usr/include/c++/4.8/bits/locale_facets.h:2537:7: error: expected ‘}’
> before ‘return’
> /usr/include/c++/4.8/bits/locale_facets.h:2537:75: error: expected
> declaration before ‘}’ token
>  { return use_facet >(__loc).is(ctype_base::print, __c); }


Grüße
 Thomas




Re: basic asm and memory clobbers

2015-11-27 Thread Segher Boessenkool
Hi Bernd,

On Fri, Nov 27, 2015 at 09:26:40AM +, Bernd Edlinger wrote:
> On Tue, 17 Nov 2015 14:31:29, Jeff Law wrote:
> > The benefit is traditional asms do the expected thing. With no way to 
> > describe dataflow, the only rational behaviour for a traditional asm is 
> > that it has to be considered a
> use/clobber of memory and hard registers.
> 
> I'd like to mention here, that there is also another use-case for a basic 
> asms:
> 
> It is most often used as a fairly portable memory barrier like this:
> 
> x = 1;
> asm(""); // memory barrier
> y = 2;
> 
> that is also the reason why every basic asm is implicitly a volatile asm.

But that is not a memory barrier, not as currently implemented anyway:

===
int a, b;

void f(void)
{
int j;
for (j = 0; j < 10; j++) {
a = 42;
asm("lolz");
b = 31;
}
}
===

does the asms in a loop, followed by the two stores.  Making it
asm("lolz" ::: "memory"); works as you seem to expect.

It has behaved like this since at least 4.0 (the oldest compiler I have
around currently).

[ Yes I'm a broken record. ]


Segher
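
The variant that the message above says works as expected can be written out as a small function. This is a sketch assuming a GCC-compatible compiler (it uses the GNU extended asm syntax); `store_with_barrier` is a name chosen for illustration. The "memory" clobber constrains the compiler only and is not a hardware fence.

```c
int a, b;

/* Stores around an asm with a "memory" clobber: the compiler must
   assume the asm may read or write any memory, so it cannot move the
   stores to A and B across it.  This is a compiler-level barrier
   only; it emits no fence instruction.  */
void
store_with_barrier (void)
{
  a = 42;
  __asm__ __volatile__ ("" ::: "memory");
  b = 31;
}
```

Placed in the loop from the example above, this form keeps both stores inside the loop body on either side of the asm.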


Re: [gomp4.5] Handle #pragma omp declare target link

2015-11-27 Thread Ilya Verbin
On Thu, Nov 19, 2015 at 16:31:15 +0100, Jakub Jelinek wrote:
> On Mon, Nov 16, 2015 at 06:40:43PM +0300, Ilya Verbin wrote:
> > @@ -2009,7 +2010,8 @@ scan_sharing_clauses (tree clauses, omp_context *ctx)
> >   decl = OMP_CLAUSE_DECL (c);
> >   /* Global variables with "omp declare target" attribute
> >  don't need to be copied, the receiver side will use them
> > -directly.  */
> > +directly.  However, global variables with "omp declare target link"
> > +attribute need to be copied.  */
> >   if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_MAP
> >   && DECL_P (decl)
> >   && ((OMP_CLAUSE_MAP_KIND (c) != GOMP_MAP_FIRSTPRIVATE_POINTER
> > @@ -2017,7 +2019,9 @@ scan_sharing_clauses (tree clauses, omp_context *ctx)
> >!= GOMP_MAP_FIRSTPRIVATE_REFERENCE))
> >   || TREE_CODE (TREE_TYPE (decl)) == ARRAY_TYPE)
> >   && is_global_var (maybe_lookup_decl_in_outer_ctx (decl, ctx))
> > - && varpool_node::get_create (decl)->offloadable)
> > + && varpool_node::get_create (decl)->offloadable
> > + && !lookup_attribute ("omp declare target link",
> > +   DECL_ATTRIBUTES (decl)))
> 
> I wonder if Honza/Richi wouldn't prefer to have this info also
> in cgraph, instead of looking up the attribute in each case.

So should I add a new flag into cgraph?
Also it is used in gimplify_adjust_omp_clauses.

> > +  if (var.link_ptr_decl == NULL_TREE)
> > +   addr = build_fold_addr_expr (var.decl);
> > +  else
> > +   {
> > + /* For "omp declare target link" var use address of the pointer
> > +instead of address of the var.  */
> > + addr = build_fold_addr_expr (var.link_ptr_decl);
> > + /* Most significant bit of the size marks such vars.  */
> > + unsigned HOST_WIDE_INT isize = tree_to_uhwi (size);
> > + isize |= 1ULL << (int_size_in_bytes (const_ptr_type_node) * 8 - 1);
> > + size = wide_int_to_tree (const_ptr_type_node, isize);
> > +
> > + /* FIXME: Remove varpool node of var?  */
> 
> There is varpool_node::remove (), but not sure if at this point all the
> references are already gone.

Actually removing varpool node here will not remove var from the target code, so
I've added a check in cgraphunit.c before assemble_decl ().
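
The size-marking trick in the quoted hunk (setting the most significant bit of the size for "omp declare target link" variables) can be sketched as plain integer code. The helper names are hypothetical, and the decoding side is an assumption about how a consumer of the table might read the entry; only the bit manipulation mirrors the patch.

```c
#include <limits.h>
#include <stdint.h>

/* Hypothetical names; only the bit trick mirrors the quoted hunk.
   The top bit of a pointer-sized integer flags a "link" entry.  */
#define LINK_BIT ((uintptr_t) 1 << (sizeof (uintptr_t) * CHAR_BIT - 1))

/* Pack SIZE and the link flag into one pointer-sized value.  */
static uintptr_t
encode_size (uintptr_t size, int is_link)
{
  return is_link ? (size | LINK_BIT) : size;
}

/* Nonzero when the entry was marked as a link variable.  */
static int
is_link_entry (uintptr_t encoded)
{
  return (encoded & LINK_BIT) != 0;
}

/* Recover the real size by masking off the flag bit.  */
static uintptr_t
decode_size (uintptr_t encoded)
{
  return encoded & ~LINK_BIT;
}
```

The scheme relies on no real object size ever needing the top bit, which holds for any size below half the address space.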

> > +class pass_omp_target_link : public gimple_opt_pass
> > +{
> > +public:
> > +  pass_omp_target_link (gcc::context *ctxt)
> > +: gimple_opt_pass (pass_data_omp_target_link, ctxt)
> > +  {}
> > +
> > +  /* opt_pass methods: */
> > +  virtual bool gate (function *fun)
> > +{
> > +#ifdef ACCEL_COMPILER
> > +  /* FIXME: Replace globals in target regions too or not?  */
> > +  return lookup_attribute ("omp declare target",
> > +  DECL_ATTRIBUTES (fun->decl));
> 
> Certainly in "omp declare target entrypoint" regions too.

Done.

> > +unsigned
> > +pass_omp_target_link::execute (function *fun)
> > +{
> > +  basic_block bb;
> > +  FOR_EACH_BB_FN (bb, fun)
> > +{
> > +  gimple_stmt_iterator gsi;
> > +  for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
> > +   {
> > + unsigned i;
> > + gimple *stmt = gsi_stmt (gsi);
> > + for (i = 0; i < gimple_num_ops (stmt); i++)
> > +   {
> > + tree op = gimple_op (stmt, i);
> > + tree var = NULL_TREE;
> > +
> > + if (!op)
> > +   continue;
> > + if (TREE_CODE (op) == VAR_DECL)
> > +   var = op;
> > + else if (TREE_CODE (op) == ADDR_EXPR)
> > +   {
> > + tree op1 = TREE_OPERAND (op, 0);
> > + if (TREE_CODE (op1) == VAR_DECL)
> > +   var = op1;
> > +   }
> > + /* FIXME: Support arrays.  What else?  */
> 
> We need to support all the references to the variables.
> So, I think this approach is not right.
> 
> > +
> > + if (var && lookup_attribute ("omp declare target link",
> > +  DECL_ATTRIBUTES (var)))
> > +   {
> > + tree type = TREE_TYPE (var);
> > + tree ptype = build_pointer_type (type);
> > +
> > + /* Find var in offload table.  */
> > + omp_offload_var *table_entry = NULL;
> > + for (unsigned j = 0; j < vec_safe_length (offload_vars); j++)
> > +   if ((*offload_vars)[j].decl == var)
> > + {
> > +   table_entry = &(*offload_vars)[j];
> > +   break;
> > + }
> 
> Plus this would be terribly expensive if there are many variables in
> offload_vars.
> So, what I think should be done instead is that you first somewhere, perhaps
> when streaming in the decls from LTO in ACCEL_COMPILER or so, create
> the artificial link ptr variables for the "omp declare target link"
> global vars and
>   SET_DECL_VALUE_EXPR (var, build_simple_mem_ref (link_ptr_var));
>   DECL_HAS_VALUE_EXPR_P (var) = 1;
> and then in this pass just wal

Re: [PATCH] GCC system.h and Graphite header order

2015-11-27 Thread David Edelsohn
On Fri, Nov 27, 2015 at 11:24 AM, Thomas Schwinge
 wrote:
> Hi!
>
> On Tue, 24 Nov 2015 10:32:12 +, Alan Lawrence  
> wrote:
>> I note doc/install.texi says that gcc uses "ISL Library version 0.15,
>> 0.14, 0.13, or 0.12.2". This patch breaks the build with 0.12.2 (a
>> subset of errors below)
>
>  has been filed.  I set you guys on CC.
>
>> but seems fine with 0.14. I haven't tested
>> 0.13. Do we want to update install.texi ?
>
> I have a slight preference to keep ISL 0.12.2 supported, but can adapt to
> a newer version, if necessary.

I updated the install document yesterday.

I don't object to support for ISL 0.12.2, but someone has to implement
an appropriate header file incantation for the Graphite source files
WITHOUT reordering it again nor including ISL header files first --
before system.h.  Some GCC header files must be included first in GCC
source files.

Thanks, David
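
The include-order constraint discussed above comes from identifier poisoning, which the 'attempt to use poisoned "malloc"' errors in the quoted log point to. A minimal sketch, assuming a compiler that supports `#pragma GCC poison`; `copy_bytes` is a hypothetical helper used only to show a use that precedes the poison.

```c
#include <stdlib.h>
#include <string.h>

/* Defined before the poison: this use of malloc is still allowed.  */
static char *
copy_bytes (const char *src, size_t len)
{
  char *dst = malloc (len);
  if (dst != NULL)
    memcpy (dst, src, len);
  return dst;
}

/* From here on, any appearance of the identifier is a hard error.  */
#pragma GCC poison malloc

/* A header included at this point that mentions the poisoned name in
   its own code would be rejected with an "attempt to use poisoned"
   error, which is the kind of failure shown in the quoted log.  */
```

Code that was already parsed keeps working after the pragma; only new occurrences of the identifier are rejected, which is why the order of includes decides whether a third-party header builds.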


Re: [PATCH 1/7][ARM] Add support for ARMv8.1.

2015-11-27 Thread Matthew Wahab

On 27/11/15 14:05, Christophe Lyon wrote:

On 26 November 2015 at 16:55, Matthew Wahab  wrote:



ARMv8.1 includes an extension to ARM which adds two Adv.SIMD
instructions, vqrdmlah and vqrdmlsh. This patch set adds support for
ARMv8.1 and for the new instructions, enabling the architecture with
-march=armv8.1-a. The new instructions are enabled when both ARMv8.1
and a suitable fpu option are set, for instance with -march=armv8.1-a
-mfpu=neon-fp-armv8 -mfloat-abi=hard.

This patch set adds the command line options and internal feature
macros. Following patches
- enable multilib support for ARMv8.1,
- add patterns for the new instructions,
- add the ACLE feature macro for the ARMv8.1 extensions,
- extend target support in the testsuite to ARMv8.1,
- add the ACLE intrinsics for vqrdml{as}h and
- add the ACLE intrinsics for vqrdml{as}h_lane.





The whole series LGTM, but do you plan to add tests for the new intrinsics?


The Adv.SIMD intrinsics tests are in gcc.target/aarch64/advsimd-intrinsics, they get 
run for both AArch64 and ARM backends. The tests for the new intrinsics were added 
(yesterday) by the AArch64 version of this patch.


Matthew


Re: [PATCH 5/7][Testsuite] Support ARMv8.1 ARM tests.

2015-11-27 Thread Matthew Wahab

On 27/11/15 13:44, Christophe Lyon wrote:

On 26/11/15 16:02, Matthew Wahab wrote:



This patch adds ARMv8.1 support to GCC Dejagnu, to allow ARM
tests to specify targets and to set up command line options.
It builds on the ARMv8.1 target support added for AArch64 tests, partly
reworking that support to take into account the different configurations
that tests may be run under.



I may be mistaken, but -mfpu=neon-fp-armv8 and -mfloat-abi=softfp are not
supported by aarch64-gcc. So it seems to me that
check_effective_target_arm_v8_1a_neon_ok_nocache will not always work
for aarch64 after your patch.



Or does it work because no option is needed and thus "" always
matches and thus the loop always exits after the first iteration
on aarch64?


Yes, the idea is that the empty string will make the function first try 
'-march=armv8.1-a' without any other flag. That will work for AArch64 because it 
doesn't need any other option.



Maybe a more accurate comment would help remembering that, in case
-mfpu option becomes necessary for aarch64.



Agreed, it's worth having a comment to explain what the 'foreach' construct is 
doing.

Matthew
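
The "try the empty string first" idea discussed above can be sketched outside Tcl. This is an illustration in C under stated assumptions: the candidate list, `first_working_flags`, and the two stub predicates are hypothetical and do not reproduce the actual testsuite procedure, which would compile a test program with each option set.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical predicate type: nonzero when a probe builds with FLAGS
   added on top of -march=armv8.1-a.  */
typedef int (*flags_ok_fn) (const char *flags);

/* Return the first entry of CANDIDATES accepted by OK, or NULL.  */
static const char *
first_working_flags (const char *const *candidates, size_t n,
                     flags_ok_fn ok)
{
  for (size_t i = 0; i < n; i++)
    if (ok (candidates[i]))
      return candidates[i];
  return NULL;
}

/* Illustrative candidates; the empty entry comes first so a target
   that needs nothing extra matches it immediately.  */
static const char *const v8_1a_candidates[] = {
  "",
  "-mfpu=neon-fp-armv8 -mfloat-abi=softfp",
};

/* Stub standing in for a target that needs no extra option.  */
static int
accepts_anything (const char *flags)
{
  (void) flags;
  return 1;
}

/* Stub standing in for a target that requires an -mfpu option.  */
static int
needs_fpu (const char *flags)
{
  return strstr (flags, "-mfpu=") != NULL;
}
```

With `accepts_anything` the search stops at the empty entry, so the later options are never tried; with `needs_fpu` the empty entry is rejected and the second candidate is returned.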




Re: [PATCH 1/7][ARM] Add support for ARMv8.1.

2015-11-27 Thread Christophe Lyon
On 27 November 2015 at 18:05, Matthew Wahab  wrote:
> On 27/11/15 14:05, Christophe Lyon wrote:
>>
>> On 26 November 2015 at 16:55, Matthew Wahab 
>> wrote:
>
>
>>> ARMv8.1 includes an extension to ARM which adds two Adv.SIMD
>>> instructions, vqrdmlah and vqrdmlsh. This patch set adds support for
>>> ARMv8.1 and for the new instructions, enabling the architecture with
>>> -march=armv8.1-a. The new instructions are enabled when both ARMv8.1
>>> and a suitable fpu option are set, for instance with -march=armv8.1-a
>>> -mfpu=neon-fp-armv8 -mfloat-abi=hard.
>>>
>>> This patch set adds the command line options and internal feature
>>> macros. Following patches
>>> - enable multilib support for ARMv8.1,
>>> - add patterns for the new instructions,
>>> - add the ACLE feature macro for the ARMv8.1 extensions,
>>> - extend target support in the testsuite to ARMv8.1,
>>> - add the ACLE intrinsics for vqrdml{as}h and
>>> - add the ACLE intrinsics for vqrdml{as}h_lane.
>>>
>
>>
>> The whole series LGTM, but do you plan to add tests for the new
>> intrinsics?
>
>
> The Adv.SIMD intrinsics tests are in gcc.target/aarch64/advsimd-intrinsics,
> they get run for both AArch64 and ARM backends. The tests for the new
> intrinsics were added (yesterday) by the AArch64 version of this patch.
>

Ha yes, of course.

> Matthew


Re: [PATCH 1/2][ARM] PR/65956 AAPCS update for alignment attribute

2015-11-27 Thread Eric Botcazou
> So, I'm not familiar with Ada 'fat pointers' but if that is one -
> well, it's a record, with an alignment that the 'new' AAPCS now
> ignores, so yes the ABI has changed between gcc 5.1 and 5.2, rather
> more significantly for Ada than for C.

Yes, XUP suffixed types are fat pointers and they are maximally aligned so 
that they can be given non-BLK mode and, consequently, live in registers.

> Thoughts?

There is no official ABI for Ada so I guess that's not really a problem as 
long as it's documented on https://gcc.gnu.org/gcc-5/changes.html.

-- 
Eric Botcazou


Re: [PATCH] Fix PR68067

2015-11-27 Thread Alan Lawrence

On 27/11/15 15:07, Alan Lawrence wrote:

On 23/11/15 09:43, Richard Biener wrote:

On Fri, 20 Nov 2015, Alan Lawrence wrote:


...the asserts
you suggested in (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68117#c27)...

 >>

So I have to ask, how sure are you that those assertions are(/should
be!) "correct"? :)


Ideally they should be correct but they happen to be not (and I think
the intent was that this should be harmless).  Basically I tried
to assert that nobody creates stale edge redirect data that is not
later consumed or cleared.  Happens to be too optimistic :/


Maybe so, but it looks like the edge_var_redirect_map is still suspect here. On
the ~~28th call to loop_version, from tree_unswitch_loop, the call to
lv_flush_pending_stmts executes (tree-cfg.c flush_pending_stmts):

def = redirect_edge_var_map_def (vm);
add_phi_arg (phi, def, e, redirect_edge_var_map_location(vm));

and BLOCK_LOCATION (redirect_edge_var_map_location(vm)) is

< 0x7fb7704a80 side-effects addressable asm_written used
protected static visited tree_0 tree_2 tree_5>

so yeah, next question, how'd that get there...

A.


Well, pass_dominator::execute calls redirect_edge_var_map with that edge
pointer, at which time the edge is from block 32 (0x7fb79cc6e8) to block 20
(0x7fb7485e38), and locus is 2147483884; and then again, with locus 0.


With no intervening calls to redirect_edge_var_map_clear for that edge,
loop_version's call to flush_pending_stmts then reads
redirect_edge_var_map_vector for that edge pointer - which is now an edge from
block 126 (0x7fb7485af8) to 117 (0x7fb74856e8). It sees those locations
(2147483884 and 0)...


Clearing the edge redirect map at the end of pass_dominator fixes the ICE (as 
would clearing it at the end of each stage, or clearing it at the beginning of 
loop_unswitch, I guess).


I'll post a patch after more testing, but obviously I'm keen to hear if there 
are obvious problems with the approach?


And coming up with a testcase, well, heh - this broke because of identical 
pointers to structures allocated at different times, with intervening 
free...ideas welcome of course!


--Alan
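
The failure mode described above (a side table keyed by pointer, where the pointed-to object is freed and a new one is allocated at the same address) can be reproduced deterministically with a toy pool allocator. All names below are hypothetical stand-ins, not GCC's; the block numbers and locus value in the usage note are borrowed from the message only as sample data.

```c
#include <stddef.h>

/* Hypothetical stand-ins; none of these names are GCC's.  */
struct edge { int src, dest, in_use; };

/* Tiny pool so that address reuse is deterministic.  */
static struct edge pool[4];

static struct edge *
edge_alloc (int src, int dest)
{
  for (size_t i = 0; i < sizeof pool / sizeof pool[0]; i++)
    if (!pool[i].in_use)
      {
        pool[i] = (struct edge) { src, dest, 1 };
        return &pool[i];
      }
  return NULL;
}

/* Frees the slot but knows nothing about the side table.  */
static void
edge_free (struct edge *e)
{
  e->in_use = 0;
}

/* Side table keyed by edge address, like a pointer-keyed map.  */
struct map_entry { const struct edge *key; unsigned locus; };
static struct map_entry side_map[8];
static size_t side_map_len;

static void
side_map_add (const struct edge *e, unsigned locus)
{
  if (side_map_len < sizeof side_map / sizeof side_map[0])
    side_map[side_map_len++] = (struct map_entry) { e, locus };
}

/* Returns 1 and stores the locus if E has an entry, else 0.  */
static int
side_map_lookup (const struct edge *e, unsigned *locus)
{
  for (size_t i = 0; i < side_map_len; i++)
    if (side_map[i].key == e)
      {
        *locus = side_map[i].locus;
        return 1;
      }
  return 0;
}

/* The proposed kind of fix: drop all entries at a pass boundary.  */
static void
side_map_clear (void)
{
  side_map_len = 0;
}
```

Allocating an edge 32->20, recording locus 2147483884 for it, freeing it, and then allocating 126->117 hands back the same address, so a lookup on the new edge finds the old locus. Calling `side_map_clear` before the second allocation makes that lookup miss, which corresponds to clearing the map at the end of the pass.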



patch to fix PR68536

2015-11-27 Thread Vladimir Makarov

The following patch fixes

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68536

The patch was tested and bootstrapped on x86/x86-64.

Committed as rev. 231021.

Index: lra-constraints.c
===
--- lra-constraints.c	(revision 230986)
+++ lra-constraints.c	(working copy)
@@ -3383,10 +3383,13 @@ curr_insn_transform (bool check_only_p)
depend on memory mode.  */
 for (i = 0; i < n_operands; i++)
   {
-	rtx op = *curr_id->operand_loc[i];
-	rtx subst, old = op;
+	rtx op, subst, old;
 	bool op_change_p = false;
+
+	if (curr_static_id->operand[i].is_operator)
+	  continue;
 	
+	old = op = *curr_id->operand_loc[i];
 	if (GET_CODE (old) == SUBREG)
 	  old = SUBREG_REG (old);
 	subst = get_equiv_with_elimination (old, curr_insn);
Index: lra.c
===
--- lra.c	(revision 230986)
+++ lra.c	(working copy)
@@ -382,7 +382,7 @@ lra_emit_add (rtx x, rtx y, rtx z)
 	  base = a1;
 	  index = a2;
 	}
-  if (! (REG_P (base) || GET_CODE (base) == SUBREG)
+  if ((base != NULL_RTX && ! (REG_P (base) || GET_CODE (base) == SUBREG))
 	  || (index != NULL_RTX
 	  && ! (REG_P (index) || GET_CODE (index) == SUBREG))
 	  || (disp != NULL_RTX && ! CONSTANT_P (disp))
@@ -442,18 +442,28 @@ lra_emit_add (rtx x, rtx y, rtx z)
 		  rtx_insn *insn = emit_add2_insn (x, disp);
 		  if (insn != NULL_RTX)
 		{
-		  insn = emit_add2_insn (x, base);
-		  if (insn != NULL_RTX)
+		  if (base == NULL_RTX)
 			ok_p = true;
+		  else
+			{
+			  insn = emit_add2_insn (x, base);
+			  if (insn != NULL_RTX)
+			ok_p = true;
+			}
 		}
 		}
 	  if (! ok_p)
 		{
+		  rtx_insn *insn;
+		  
 		  delete_insns_since (last);
 		  /* Generate x = disp; x = x + base; x = x + index_scale.  */
 		  emit_move_insn (x, disp);
-		  rtx_insn *insn = emit_add2_insn (x, base);
-		  lra_assert (insn != NULL_RTX);
+		  if (base != NULL_RTX)
+		{
+		  insn = emit_add2_insn (x, base);
+		  lra_assert (insn != NULL_RTX);
+		}
 		  insn = emit_add2_insn (x, index_scale);
 		  lra_assert (insn != NULL_RTX);
 		}
Index: ChangeLog
===
--- ChangeLog	(revision 230986)
+++ ChangeLog	(working copy)
@@ -1,3 +1,10 @@
+2015-11-27  Vladimir Makarov  
+
+	PR rtl-optimization/68536
+	* lra.c (lra_emit_add): Add code for null base.
+	* lra-constraints.c (curr_insn_transform): Skip operators for
+	subreg reloads.
+
 2015-11-26  Alexandre Oliva 
 
 	PR rtl-optimization/67753
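
The lra_emit_add part of the change (treating the base as optional when emitting "x = disp; x = x + base; x = x + index_scale") can be modelled with ordinary integers. This is only a sketch of the control flow under that assumption; `sum_address_parts` is a hypothetical function, and a NULL pointer stands in for a missing base.

```c
#include <stddef.h>

/* Hypothetical model of the fallback sequence: BASE is optional
   (NULL means the address has no base), the other parts are always
   present.  */
static long
sum_address_parts (const long *base, long index_scale, long disp)
{
  long x = disp;            /* x = disp */
  if (base != NULL)         /* x = x + base, only when a base exists */
    x += *base;
  x += index_scale;         /* x = x + index_scale */
  return x;
}
```

Without the NULL check the middle step would dereference a missing base, which is the situation the patch guards against.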


Re: [patch] Copy-edit the Option Summary in invoke.texi

2015-11-27 Thread Jonathan Wakely

On 27/11/15 09:45 +0100, Bernd Schmidt wrote:

On 11/26/2015 01:16 PM, Jonathan Wakely wrote:

At https://gcc.gnu.org/onlinedocs/gcc/Option-Summary.html we document
-Waggressive-loop-optimizations but you can't find that option at
https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html because we
document -Wno-aggressive-loop-optimizations instead. Similarly, you
can't find -Wpedantic-ms-format in the full listing, because we
document the negative form, -Wno-pedantic-ms-format, but list *both*
in the summary. This patch fixes those mistakes.


I'm guessing we want the negative form documented for anything that 
has Init(1)? Ok.


Yep:

"Each of these specific warning options also has a negative form
beginning ‘-Wno-’ to turn off warnings; for example, -Wno-implicit.
This manual lists only one of the two forms, whichever is not the
default."


I've also tried to put the list back into alphabetical order, and
re-justified the list a bit to avoid some especially short lines (I
don't understand the inconsistent use of single or double spaces
between options, so if there's some logic to that I've not followed
it, but I think this is an improvement).


No idea about single vs double spaces. One line has two @gol which 
seems like an error, you can fix that too if you like.


Good catch, fixed in this version, which I've committed to trunk,
thanks.


commit 8ecbce72e467fe11be62df996a53941493eb8f46
Author: Jonathan Wakely 
Date:   Fri Nov 27 18:35:22 2015 +

Copy-edit the Option Summary in invoke.texi

	* doc/invoke.texi (Option Summary): Use negative form of
	-Waggressive-loop-optimizations, remove redundant -Wpedantic-ms-format,
	sort alphabetically and re-justify.

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 8f87268..586f11f 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -242,39 +242,37 @@ Objective-C and Objective-C++ Dialects}.
 @gccoptlist{-fsyntax-only  -fmax-errors=@var{n}  -Wpedantic @gol
 -pedantic-errors @gol
 -w  -Wextra  -Wall  -Waddress  -Waggregate-return  @gol
--Waggressive-loop-optimizations -Warray-bounds -Warray-bounds=@var{n} @gol
--Wbool-compare -Wduplicated-cond -Wframe-address @gol
--Wno-attributes -Wno-builtin-macro-redefined @gol
+-Wno-aggressive-loop-optimizations -Warray-bounds -Warray-bounds=@var{n} @gol
+-Wno-attributes -Wbool-compare -Wno-builtin-macro-redefined @gol
 -Wc90-c99-compat -Wc99-c11-compat @gol
 -Wc++-compat -Wc++11-compat -Wc++14-compat -Wcast-align  -Wcast-qual  @gol
 -Wchar-subscripts -Wclobbered  -Wcomment -Wconditionally-supported  @gol
--Wconversion -Wcoverage-mismatch -Wdate-time -Wdelete-incomplete -Wno-cpp  @gol
+-Wconversion -Wcoverage-mismatch -Wno-cpp -Wdate-time -Wdelete-incomplete @gol
 -Wno-deprecated -Wno-deprecated-declarations -Wno-designated-init @gol
 -Wdisabled-optimization @gol
 -Wno-discarded-qualifiers -Wno-discarded-array-qualifiers @gol
--Wno-div-by-zero -Wdouble-promotion -Wempty-body  -Wenum-compare @gol
--Wno-endif-labels -Werror  -Werror=* @gol
--Wfatal-errors  -Wfloat-equal  -Wformat  -Wformat=2 @gol
+-Wno-div-by-zero -Wdouble-promotion -Wduplicated-cond @gol
+-Wempty-body  -Wenum-compare -Wno-endif-labels @gol
+-Werror  -Werror=* -Wfatal-errors -Wfloat-equal  -Wformat  -Wformat=2 @gol
 -Wno-format-contains-nul -Wno-format-extra-args -Wformat-nonliteral @gol
--Wformat-security  -Wformat-signedness  -Wformat-y2k @gol
+-Wformat-security  -Wformat-signedness  -Wformat-y2k -Wframe-address @gol
 -Wframe-larger-than=@var{len} -Wno-free-nonheap-object -Wjump-misses-init @gol
 -Wignored-qualifiers  -Wincompatible-pointer-types @gol
 -Wimplicit  -Wimplicit-function-declaration  -Wimplicit-int @gol
 -Winit-self  -Winline  -Wno-int-conversion @gol
 -Wno-int-to-pointer-cast -Wno-invalid-offsetof @gol
--Wnull-dereference @gol
--Winvalid-pch -Wlarger-than=@var{len}  -Wunsafe-loop-optimizations @gol
+-Winvalid-pch -Wlarger-than=@var{len} @gol
 -Wlogical-op -Wlogical-not-parentheses -Wlong-long @gol
 -Wmain -Wmaybe-uninitialized -Wmemset-transposed-args @gol
 -Wmisleading-indentation -Wmissing-braces @gol
 -Wmissing-field-initializers -Wmissing-include-dirs @gol
 -Wno-multichar  -Wnonnull  -Wnormalized=@r{[}none@r{|}id@r{|}nfc@r{|}nfkc@r{]} @gol
--Wodr  -Wno-overflow  -Wopenmp-simd @gol
--Woverride-init-side-effects @gol
--Woverlength-strings  -Wpacked  -Wpacked-bitfield-compat  -Wpadded @gol
--Wparentheses  -Wpedantic-ms-format -Wno-pedantic-ms-format @gol
+-Wnull-dereference -Wodr  -Wno-overflow  -Wopenmp-simd  @gol
+-Woverride-init-side-effects -Woverlength-strings @gol
+-Wpacked  -Wpacked-bitfield-compat  -Wpadded @gol
+-Wparentheses -Wno-pedantic-ms-format @gol
 -Wplacement-new -Wpointer-arith  -Wno-pointer-to-int-cast @gol
--Wredundant-decls  -Wno-return-local-addr @gol
+-Wno-pragmas -Wredundant-decls  -Wno-return-local-addr @gol
 -Wreturn-type  -Wsequence-point  -Wshadow  -Wno-shadow-ivar @gol
 -Wshift-overflow -Wshift-overflow=@var{n} @gol
 -Wshift-count-negative -Wshift-count-overflow -Wshi

Re: Improving the cxx0x_warning.h diagnostic

2015-11-27 Thread Jonathan Wakely

On 27/11/15 12:53 -0500, NightStrike wrote:

You could add a pragma that turns on -Wfatal-errors


That might be better, I'll try it, thanks.



[PATCH] Add save_expr langhook (PR c/68513)

2015-11-27 Thread Marek Polacek
As suggested here 
and here , this patch
adds a new langhook to distinguish whether to call c_save_expr or save_expr
from match.pd.  Does this look reasonable?

I didn't know where to put setting of in_late_processing.  With the current
placement, we won't (for valid programs) call c_save_expr from c_genericize
or c_gimplify_expr.

I suppose I should also modify save_expr in fold-const.c to call it via the
langhook, if this approach is sane.  Dunno.

Bootstrapped/regtested on x86_64-linux.

2015-11-27  Marek Polacek  

PR c/68513
* c-common.c (in_late_processing): New global.
(c_common_save_expr): New function.
* c-common.h (in_late_processing, c_common_save_expr): Declare.

* c-objc-common.h (LANG_HOOKS_SAVE_EXPR): Define.
* c-parser.c (c_parser_compound_statement): Set IN_LATE_PROCESSING.

* generic-match-head.c: Include "langhooks.h".
* genmatch.c (dt_simplify::gen_1): Call save_expr via langhook.
* langhooks-def.h (LANG_HOOKS_SAVE_EXPR): Define.
* langhooks.h (struct lang_hooks): Add save_expr langhook.

* gcc.dg/torture/pr68513.c: New test.

diff --git gcc/c-family/c-common.c gcc/c-family/c-common.c
index fe0a235..850bee9 100644
--- gcc/c-family/c-common.c
+++ gcc/c-family/c-common.c
@@ -271,6 +271,12 @@ int c_inhibit_evaluation_warnings;
be generated.  */
 bool in_late_binary_op;
 
+/* When true, all the constant expression checks from parsing should have been
+   done.  This is used so that fold knows whether to call c_save_expr (thus
+   c_fully_fold is called on the expression), or whether to call save_expr via
+   c_common_save_expr langhook.  */
+bool in_late_processing;
+
 /* Whether lexing has been completed, so subsequent preprocessor
errors should use the compiler's input_location.  */
 bool done_lexing = false;
@@ -4928,6 +4934,15 @@ c_save_expr (tree expr)
   return expr;
 }
 
+/* The C version of the save_expr langhook.  Either call save_expr or c_save_expr,
+   depending on IN_LATE_PROCESSING.  */
+
+tree
+c_common_save_expr (tree expr)
+{
+  return in_late_processing ? save_expr (expr) : c_save_expr (expr);
+}
+
 /* Return whether EXPR is a declaration whose address can never be
NULL.  */
 
diff --git gcc/c-family/c-common.h gcc/c-family/c-common.h
index bad8d05..e2d4ba9 100644
--- gcc/c-family/c-common.h
+++ gcc/c-family/c-common.h
@@ -771,6 +771,7 @@ extern void c_register_addr_space (const char *str, addr_space_t as);
 
 /* In c-common.c.  */
 extern bool in_late_binary_op;
+extern bool in_late_processing;
 extern const char *c_addr_space_name (addr_space_t as);
 extern tree identifier_global_value (tree);
 extern tree c_linkage_bindings (tree);
@@ -812,6 +813,7 @@ extern tree c_fully_fold (tree, bool, bool *);
 extern tree decl_constant_value_for_optimization (tree);
 extern tree c_wrap_maybe_const (tree, bool);
 extern tree c_save_expr (tree);
+extern tree c_common_save_expr (tree);
 extern tree c_common_truthvalue_conversion (location_t, tree);
 extern void c_apply_type_quals_to_decl (int, tree);
 extern tree c_sizeof_or_alignof_type (location_t, tree, bool, bool, int);
diff --git gcc/c/c-objc-common.h gcc/c/c-objc-common.h
index 50c9f54..9fd3722 100644
--- gcc/c/c-objc-common.h
+++ gcc/c/c-objc-common.h
@@ -60,6 +60,8 @@ along with GCC; see the file COPYING3.  If not see
 #define LANG_HOOKS_BUILTIN_FUNCTION c_builtin_function
 #undef  LANG_HOOKS_BUILTIN_FUNCTION_EXT_SCOPE
 #define LANG_HOOKS_BUILTIN_FUNCTION_EXT_SCOPE c_builtin_function_ext_scope
+#undef LANG_HOOKS_SAVE_EXPR
+#define LANG_HOOKS_SAVE_EXPR c_common_save_expr
 
 /* Attribute hooks.  */
 #undef LANG_HOOKS_COMMON_ATTRIBUTE_TABLE
diff --git gcc/c/c-parser.c gcc/c/c-parser.c
index 0259f66..3f7c458 100644
--- gcc/c/c-parser.c
+++ gcc/c/c-parser.c
@@ -4586,6 +4586,9 @@ c_parser_compound_statement (c_parser *parser)
 {
   tree stmt;
   location_t brace_loc;
+
+  in_late_processing = false;
+
   brace_loc = c_parser_peek_token (parser)->location;
   if (!c_parser_require (parser, CPP_OPEN_BRACE, "expected %<{%>"))
 {
@@ -4598,6 +4601,9 @@ c_parser_compound_statement (c_parser *parser)
   stmt = c_begin_compound_stmt (true);
   c_parser_compound_statement_nostart (parser);
 
+  /* From now on, the fold machinery shouldn't call c_save_expr.  */
+  in_late_processing = true;
+
   /* If the compound stmt contains array notations, then we expand them.  */
   if (flag_cilkplus && contains_array_notation_expr (stmt))
 stmt = expand_array_notation_exprs (stmt);
diff --git gcc/generic-match-head.c gcc/generic-match-head.c
index f55f91e..8fc9d40 100644
--- gcc/generic-match-head.c
+++ gcc/generic-match-head.c
@@ -33,6 +33,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "builtins.h"
 #include "dumpfile.h"
 #include "case-cfn-macros.h"
+#include "langhooks.h"
 
 
 /* Routine to determine if the t

Re: [PATCH] Add save_expr langhook (PR c/68513)

2015-11-27 Thread Marek Polacek
On Fri, Nov 27, 2015 at 07:55:43PM +0100, Marek Polacek wrote:
> +/* The C version of the save_expr langhook.  Either call save_expr or c_save_expr,
> +   depending on IN_LATE_PROCESSING.  */

Consider this too long line fixed.

Marek


[PATCH] Fix up target {{enter,exit} nowait,update} depend nowait

2015-11-27 Thread Jakub Jelinek
Hi!

I recently changed the code so that a GOMP_TARGET_TASK_DATA depend nowait
construct that does not need to wait for any dependencies is handled
synchronously, and the task that was created up front (because we did not
yet know whether there were dependencies) is freed first.  However, I forgot
to remove the depend entries from the hash table, so stale entries were left
pointing to freed memory.
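
To illustrate the kind of bug described above, here is a minimal sketch,
not the libgomp code: a fixed-size array stands in for the depend hash
table, and every name (toy_task, depend_table, toy_task_release, ...) is
made up for the example.  The point is only the ordering: unlink the task
from the table before freeing it.

```c
#include <stdlib.h>

#define TABLE_SIZE 8

/* Hypothetical stand-in for a task; not struct gomp_task.  */
struct toy_task { int id; };

/* Stand-in for the dependency hash table: slots point at live tasks.  */
static struct toy_task *depend_table[TABLE_SIZE];

/* Record TASK in the first free slot; return 0 on success, -1 if full.  */
int
toy_depend_insert (struct toy_task *task)
{
  for (int i = 0; i < TABLE_SIZE; i++)
    if (depend_table[i] == NULL)
      {
        depend_table[i] = task;
        return 0;
      }
  return -1;
}

/* Clear every slot that refers to TASK.  */
void
toy_depend_remove (struct toy_task *task)
{
  for (int i = 0; i < TABLE_SIZE; i++)
    if (depend_table[i] == task)
      depend_table[i] = NULL;
}

/* Number of occupied slots.  */
int
toy_depend_count (void)
{
  int n = 0;
  for (int i = 0; i < TABLE_SIZE; i++)
    if (depend_table[i] != NULL)
      n++;
  return n;
}

/* Unlink first, then free, so no slot is left pointing at freed memory.
   Skipping the toy_depend_remove call reproduces the stale-entry bug.  */
void
toy_task_release (struct toy_task *task)
{
  toy_depend_remove (task);
  free (task);
}
```

In the actual patch the unlinking is done by
gomp_task_run_post_handle_depend_hash, called while the task lock is
still held.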

Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux,
committed to trunk.

2015-11-27  Jakub Jelinek  

PR libgomp/68579
* task.c (gomp_task_run_post_handle_depend_hash): New forward decl.
(gomp_create_target_task): Call it before freeing
GOMP_TARGET_TASK_DATA tasks.

--- libgomp/task.c.jj   2015-11-14 19:38:31.0 +0100
+++ libgomp/task.c  2015-11-27 11:26:23.796311905 +0100
@@ -585,6 +585,8 @@ GOMP_PLUGIN_target_task_completion (void
   gomp_mutex_unlock (&team->task_lock);
 }
 
+static void gomp_task_run_post_handle_depend_hash (struct gomp_task *);
+
 /* Called for nowait target tasks.  */
 
 bool
@@ -704,6 +706,7 @@ gomp_create_target_task (struct gomp_dev
 }
   if (state == GOMP_TARGET_TASK_DATA)
 {
+  gomp_task_run_post_handle_depend_hash (task);
   gomp_mutex_unlock (&team->task_lock);
   gomp_finish_task (task);
   free (task);

Jakub


[PATCH] Fix vector rsqrt discovery (PR tree-optimization/68501)

2015-11-27 Thread Jakub Jelinek
Hi!

The recent changes where vector sqrt is represented in the IL using
IFN_SQRT instead of target specific builtins broke the discovery
of vector rsqrt, as targetm.builtin_reciprocal is called only
on builtin functions (not internal functions).  Furthermore,
for internal fns, not only the IFN_* is significant, but also the
types (modes actually) of the lhs and/or arguments.

This patch adjusts the target hook, so that the backends can just inspect
the call (builtin or internal function), whatever it is.
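
As a rough sketch of why passing the whole call helps, assuming entirely
hypothetical toy types (toy_call, toy_builtin_reciprocal and the returned
routine names are invented here and are not GCC's gcall or any backend's
code): for an internal function the code alone does not pick a reciprocal
routine, the element width and lane count (standing in for the mode) do.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical model of a call statement; not GCC's gcall.  */
enum toy_call_kind { TOY_BUILTIN, TOY_INTERNAL_FN };

struct toy_call
{
  enum toy_call_kind kind;
  const char *name;   /* illustrative: "sqrtf" or "IFN_SQRT" */
  int vector_lanes;   /* 1 for scalar; stands in for the machine mode */
  int element_bits;
};

/* Hypothetical backend hook: return the name of a reciprocal routine
   for CALL, or NULL if none is available.  */
const char *
toy_builtin_reciprocal (const struct toy_call *call)
{
  if (call->kind == TOY_BUILTIN)
    return strcmp (call->name, "sqrtf") == 0 ? "toy_rsqrtf" : NULL;

  /* Internal function: the same code maps to different routines
     depending on the mode of the operands.  */
  if (strcmp (call->name, "IFN_SQRT") == 0 && call->element_bits == 32)
    {
      if (call->vector_lanes == 4)
        return "toy_rsqrt_v4sf";
      if (call->vector_lanes == 8)
        return "toy_rsqrt_v8sf";
    }
  return NULL;
}
```

With the old (fn, md_fn, sqrt) signature there is no place to carry the
lane count or element width, which is the gap the new argument closes.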

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2015-11-27  Jakub Jelinek  

PR tree-optimization/68501
* target.def (builtin_reciprocal): Replace the 3 arguments with
a gcall * one, adjust description.
* targhooks.h (default_builtin_reciprocal): Replace the 3 arguments
with a gcall * one.
* targhooks.c (default_builtin_reciprocal): Likewise.
* tree-ssa-math-opts.c (pass_cse_reciprocals::execute): Use
targetm.builtin_reciprocal even on internal functions, adjust
the arguments and allow replacing an internal function with normal
built-in.
* config/i386/i386.c (ix86_builtin_reciprocal): Replace the 3 arguments
with a gcall * one.  Handle internal fns too.
* config/rs6000/rs6000.c (rs6000_builtin_reciprocal): Likewise.
* config/aarch64/aarch64.c (aarch64_builtin_reciprocal): Likewise.
* doc/tm.texi (builtin_reciprocal): Document.

--- gcc/target.def.jj   2015-11-18 11:19:19.0 +0100
+++ gcc/target.def  2015-11-27 16:37:07.870823670 +0100
@@ -2463,13 +2463,9 @@ identical versions.",
 DEFHOOK
 (builtin_reciprocal,
  "This hook should return the DECL of a function that implements reciprocal of\n\
-the builtin function with builtin function code @var{fn}, or\n\
-@code{NULL_TREE} if such a function is not available.  @var{md_fn} is true\n\
-when @var{fn} is a code of a machine-dependent builtin function.  When\n\
-@var{sqrt} is true, additional optimizations that apply only to the reciprocal\n\
-of a square root function are performed, and only reciprocals of @code{sqrt}\n\
-function are valid.",
- tree, (unsigned fn, bool md_fn, bool sqrt),
+the builtin or internal function call @var{call}, or\n\
+@code{NULL_TREE} if such a function is not available.",
+ tree, (gcall *call),
  default_builtin_reciprocal)
 
 /* For a vendor-specific TYPE, return a pointer to a statically-allocated
--- gcc/targhooks.h.jj  2015-11-18 11:19:17.0 +0100
+++ gcc/targhooks.h 2015-11-27 16:37:44.828301093 +0100
@@ -90,7 +90,7 @@ extern tree default_builtin_vectorized_c
 
 extern int default_builtin_vectorization_cost (enum vect_cost_for_stmt, tree, int);
 
-extern tree default_builtin_reciprocal (unsigned int, bool, bool);
+extern tree default_builtin_reciprocal (gcall *);
 
 extern HOST_WIDE_INT default_vector_alignment (const_tree);
 
--- gcc/targhooks.c.jj  2015-11-18 11:19:17.0 +0100
+++ gcc/targhooks.c 2015-11-27 16:38:21.461783097 +0100
@@ -600,9 +600,7 @@ default_builtin_vectorization_cost (enum
 /* Reciprocal.  */
 
 tree
-default_builtin_reciprocal (unsigned int fn ATTRIBUTE_UNUSED,
-   bool md_fn ATTRIBUTE_UNUSED,
-   bool sqrt ATTRIBUTE_UNUSED)
+default_builtin_reciprocal (gcall *)
 {
   return NULL_TREE;
 }
--- gcc/tree-ssa-math-opts.c.jj 2015-11-25 09:57:47.0 +0100
+++ gcc/tree-ssa-math-opts.c2015-11-27 17:07:22.756162308 +0100
@@ -601,19 +601,17 @@ pass_cse_reciprocals::execute (function
 
  if (is_gimple_call (stmt1)
  && gimple_call_lhs (stmt1)
- && (fndecl = gimple_call_fndecl (stmt1))
- && (DECL_BUILT_IN_CLASS (fndecl) == BUILT_IN_NORMAL
- || DECL_BUILT_IN_CLASS (fndecl) == BUILT_IN_MD))
+ && (gimple_call_internal_p (stmt1)
+ || ((fndecl = gimple_call_fndecl (stmt1))
+ && (DECL_BUILT_IN_CLASS (fndecl) == BUILT_IN_NORMAL
+ || (DECL_BUILT_IN_CLASS (fndecl)
+ == BUILT_IN_MD)
{
- enum built_in_function code;
- bool md_code, fail;
+ bool fail;
  imm_use_iterator ui;
  use_operand_p use_p;
 
- code = DECL_FUNCTION_CODE (fndecl);
- md_code = DECL_BUILT_IN_CLASS (fndecl) == BUILT_IN_MD;
-
- fndecl = targetm.builtin_reciprocal (code, md_code, false);
+ fndecl = targetm.builtin_reciprocal (as_a <gcall *> (stmt1));
  if (!fndecl)
continue;
 
@@ -639,8 +637,28 @@ pass_cse_reciprocals::execute (function
continue;
 
  gimple_replace_ssa_lhs (stmt1, arg1);
- gimple_call_set_fndecl (stmt1, fndecl);
- update_stmt (stmt1);
+ if (gimple

[PATCH] [ping] Add support for ARM embedded multilibs

2015-11-27 Thread Jasmin J.
Hello !

I rebased the patch (sent on 11/08/2015 01:24 AM) onto the latest trunk
(335ce86cb6cea8046993ab93d573316fd9ff798c).

The patch is originally from Terry Guo
(see https://gcc.gnu.org/ml/gcc-patches/2014-05/msg00729.html).
SVN commit r210320 on 
svn://gcc.gnu.org/svn/gcc/branches/ARM/embedded-4_9-branch .

The original was using "with_multilib_list" instead of TM_MULTILIB_CONFIG.
Moreover, it did not check each argument of "$with_multilib_list".

I simplified the patch and reworked it to use TM_MULTILIB_CONFIG. Additionally
each argument of "$with_multilib_list" is now checked.
I added missing "armv7".

I added the FSF header to t-rmprofile and a little explanation.

I now have a copyright assignment: #1059920

> see for example how I added t-aprofile to the backend and the kind of 
> testing it underwent
If this patch is now in principle acceptable, I will start working on your
suggested test scripts.

> The t-rmprofile file will need updating to newer values for -mcpu and 
> march
I will leave this open for other people, because I am not familiar with
the different CPU and ARCH variants. Keep in mind that I am only porting
Terry's patch. But if someone tells me what is required, I can add
it now and include it in the test scripts.

Regards,
   Jasmin

From 69e0a3852b2c1adb9648ae5c5725d63f6e16b488 Mon Sep 17 00:00:00 2001
From: Jasmin Jessich 
Date: Sat, 24 Oct 2015 00:43:48 +0200
Subject: [PATCH] Add support for ARM embedded multilibs

Based on svn://gcc.gnu.org/svn/gcc/branches/ARM/embedded-4_9-branch
commit r210320 from Terry Guo  .

 * config.gcc (--with-multilib-list): Accept arm embedded cores.
 * configure/configure.ac: Helptext.
 * config/arm/t-rmprofile: New file.

Signed-off-by: Terry Guo 
Signed-off-by: Jasmin Jessich 
---
 gcc/config.gcc |  14 ++
 gcc/config/arm/t-rmprofile | 121 +
 gcc/configure  |   2 +-
 gcc/configure.ac   |   2 +-
 4 files changed, 137 insertions(+), 2 deletions(-)
 create mode 100644 gcc/config/arm/t-rmprofile

diff --git a/gcc/config.gcc b/gcc/config.gcc
index 59aee2c..63841d2 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -3796,6 +3796,18 @@ case "${target}" in
 	tmake_file="${tmake_file} arm/t-aprofile"
 	break
 	;;
+armv6-m|armv7|armv7-m|armv7e-m|armv7-r|armv7-a|cortex-m7)
+	if test "x$with_arch" != x \
+	|| test "x$with_cpu" != x \
+	|| test "x$with_float" != x \
+	|| test "x$with_fpu" != x \
+	|| test "x$with_mode" != x ; then
+	echo "Error: You cannot use any of --with-arch/cpu/fpu/float/mode with --with-multilib-list=${with_multilib_list}" 1>&2
+	exit 1
+	fi
+	tmake_file_ml=" arm/t-rmprofile"
+	TM_MULTILIB_CONFIG="${TM_MULTILIB_CONFIG},${arm_multilib}"
+	;;
 default)
 	;;
 *)
@@ -3804,6 +3816,8 @@ case "${target}" in
 	;;
 esac
 			done
+			tmake_file="${tmake_file}${tmake_file_ml}"
+			TM_MULTILIB_CONFIG=`echo $TM_MULTILIB_CONFIG | sed 's/^,//'`
 		fi
 		;;
 
diff --git a/gcc/config/arm/t-rmprofile b/gcc/config/arm/t-rmprofile
new file mode 100644
index 000..65d60c0
--- /dev/null
+++ b/gcc/config/arm/t-rmprofile
@@ -0,0 +1,121 @@
+# Copyright (C) 2012-2015 Free Software Foundation, Inc.
+#
+# This file is part of GCC.
+#
+# GCC is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3, or (at your option)
+# any later version.
+#
+# GCC is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with GCC; see the file COPYING3.  If not see
+# .
+
+# This is a target makefile fragment that attempts to get
+# multilibs built for the range of CPU's, FPU's and ABI's the user did
+# customize via the configure option --with-multilib-list.
+# It should not be used in conjunction with another make file fragment and
+# assumes --with-arch, --with-cpu, --with-fpu, --with-float, --with-mode
+# have their default values during the configure step.  We enforce
+# this during the top-level configury.
+
+comma := ,
+space :=
+space +=
+
+MULTILIB_OPTIONS   = mthumb/marm
+MULTILIB_DIRNAMES  = thumb arm
+MULTILIB_OPTIONS  += march=armv6s-m/march=armv7-m/march=armv7e-m/march=armv7/mcpu=cortex-m7
+MULTILIB_DIRNAMES += armv6-m armv7-m armv7e-m armv7-ar cortex-m7
+MULTILIB_OPTIONS  += mfloat-abi=softfp/mfloat-abi=hard
+MULTILIB_DIRNAMES += softfp fpu
+MULTILIB_OPTIONS  += mfpu=fpv4-sp-d16/mfpu=vfpv3-d16/mfpu=fpv5-sp-d16/mfpu=fpv5-d16
+MULTILIB_DIRNAMES += fpv4-sp-d16 vfpv3-d16 fpv5-sp-d16 fpv5-d16
+
+MULTILIB_MATCHES   = march?armv6s-m=mcpu?cortex-m0
+MULTILIB_MATCHES  += march?armv6s-m=mcpu

Re: [PATCH] Add save_expr langhook (PR c/68513)

2015-11-27 Thread Joseph Myers
On Fri, 27 Nov 2015, Marek Polacek wrote:

> I didn't know where to put setting of in_late_processing.  With the current
> placement, we won't (for valid programs) call c_save_expr from c_genericize
> or c_gimplify_expr.

Well, the placement in this patch (in c_parser_compound_statement) is 
certainly wrong.  It doesn't even save and restore, so after one compound 
statement inside another, parsing would continue with in_late_processing 
wrongly set.  And c_save_expr is logically right for any parsing outside 
compound statements as well (arbitrary expressions can occur in sizeof 
outside functions and in VLA parameter sizes and should follow the normal 
rules for what's a constant expression - there's a known bug that 
statement expressions are wrongly rejected in such contexts).

Starting from first principles: parsing takes place from within 
c_parse_file as the sole external entry point to the parser.  So you could 
have a parsing_input variable that starts off as false, and where 
c_parse_file saves it, sets to true, and restores the saved value at the 
end.  Then you'd use c_save_expr if parsing_input && !in_late_binary_op.
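
A minimal sketch of the save/set/restore scheme described above, with toy
stand-ins rather than the real front-end functions (toy_parse_file,
toy_choose_saver and the enum are invented for illustration; only the
names parsing_input and in_late_binary_op come from the discussion):

```c
#include <stdbool.h>

static bool parsing_input;      /* the suggested new flag */
static bool in_late_binary_op;  /* stand-in for the existing global */

enum toy_saver { TOY_SAVE_EXPR, TOY_C_SAVE_EXPR };

/* Which variant a hook would pick in the current state:
   c_save_expr only while parsing and not in a late binary op.  */
enum toy_saver
toy_choose_saver (void)
{
  return (parsing_input && !in_late_binary_op)
         ? TOY_C_SAVE_EXPR : TOY_SAVE_EXPR;
}

/* Test helper for the stand-in global.  */
void
toy_set_late_binary_op (bool value)
{
  in_late_binary_op = value;
}

/* Stand-in for the parser entry point: save the flag, set it for the
   duration of parsing, and restore the saved value at the end.
   Returns the choice that was in effect during "parsing".  */
enum toy_saver
toy_parse_file (void)
{
  bool saved = parsing_input;
  parsing_input = true;

  enum toy_saver during = toy_choose_saver ();

  parsing_input = saved;
  return during;
}
```

Because the flag is saved and restored around the single entry point,
nested compound statements cannot leave it in the wrong state, which is
the problem with setting it in c_parser_compound_statement.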

If that doesn't work, it means there are cases where the hook gets called 
from folding that takes place during parsing, on expressions that will not 
subsequently go through c_fully_fold, but without in_late_binary_op set.  
Knowing what those cases are might help work out any fix for them that is 
needed.

> I suppose I should also modify save_expr in fold-const.c to call it via the
> langhook, if this approach is sane.  Dunno.

That's a complication.  When the folding is taking place from within 
c_fully_fold (and so the sub-expressions have already been folded, and had 
their C_MAYBE_CONST_EXPRs removed, and the result of folding will not be 
re-folded), it should be using save_expr not c_save_expr.  So maybe the 
hook needs to say: use c_save_expr, if parsing, not in_late_binary_op and 
not folding from within c_fully_fold.

Again long term we should aim for the representation during parsing not to 
need SAVE_EXPRs and for the folding that creates them (and the other 
folding for optimization in general) to happen only after parsing

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PR67335] drop dummy zero from reverse VTA ops, fix infinite recursion

2015-11-27 Thread Alexandre Oliva
On Nov 27, 2015, Jakub Jelinek  wrote:

> I'm sorry, but I don't remember.  Perhaps it has been before some recursion
> prevention has been added or whatever, maybe your own PR52001?

Yeah.  Thanks anyway.

> Have you checked if your patch results in any significant debug info quality
> changes

It might seem a bit surprising, but there are no differences whatsoever
between stage3-*/*.o in bootstraps with or without the patch, except for
stage3-gcc/var-tracking.o and stage3-gcc/alias.o, both because of
changes to the source code.

AFAICT the only significant difference the patch makes is in
canonicalization of equivalent values, so that we merge into the same
value equivalences coming from both V and (plus V V0) (where V0 is a
value known to be (const_int 0)).

It doesn't immediately affect (plus (plus V V0) V0), though; I think
we'd have to break up the two pluses in a debug insn into separate uops
in var-tracking to get it to do so, or attempt to simplify the incoming
rtl so that both V0s get dropped.


As it stands, the values only get merged in the testcase at the later
insn that computes the second plus in (plus (plus V V0) V0), at which
point the reverse op simplifies to V and then we combine them all.  It's
precisely when we attempt to determine the VALUE for the reverse op of
this second insn that we used to get into infinite recursion.

After the patch, we'll have recognized (plus V V0) as equivalent to V
during the reverse_op of an earlier insn, between the debug insn and the
second plus, and that equivalence cuts off the infinite recursion when
checking they're equal for cselib.

-- 
Alexandre Oliva, freedom fighter    http://FSFLA.org/~lxoliva/
You must be the change you wish to see in the world. -- Gandhi
Be Free! -- http://FSFLA.org/   FSF Latin America board member
Free Software Evangelist|Red Hat Brasil GNU Toolchain Engineer


Re: [PATCH PR68529]Fix not recognized scev by computing no-overflow info for loop with NE_EXPR exit condition

2015-11-27 Thread Bin.Cheng
On Fri, Nov 27, 2015 at 8:51 PM, Richard Biener
 wrote:
> On Fri, Nov 27, 2015 at 12:44 PM, Bin Cheng  wrote:
>> Hi,
>> This patch is to fix PR68529.  In my previous scev/niter overflow patches, I
>> only computed no-overflow information for control iv in simple loops with
>> LT_EXPR as exit condition code.  This bug is about loop with NE_EXPR as exit
>> condition code.  Given below example:
>>
>> #include 
>> #include 
>>
>> int main(){
>> char c[1]={};
>> unsigned int nchar=;
>>
>> while(nchar--!=0){
>>c[nchar]='A';
>>   }
>>
>> printf("%s\n",c);
>> return 0;
>> }
>> nchar used as an index to array 'c' doesn't overflow during loop iterations.
>> Thus &c[nchar] acts as a scev.  GCC now fails to do that.  With this patch,
>> this issue is fixed.
>>
>> Furthermore, the computation of no-overflow information could be improved by
>> using TREE_OVERFLOW_UNDEFINED semantic of signed type for C/C++.  I didn't
>> do that because:
>> 1) I doubt how useful it could be because I have already changed scev to use
>> the semantic whenever possible.  It doesn't need loop niter analysis' help.
>> 2) To do that, I need to expose chrec_convert_aggressive information out of
>> scev in function simple_iv, because that function could corrupt
>> TREE_OVERFLOW_UNDEFINED semantic assumption.  This isn't appropriate for
>> Stage3.
>>
>> Bootstrap and test on x86_64 and x86.  I don't expect any issue on aarch64
>> either.  Is it OK?
>
> +  if (integer_onep (e)
> +  && (integer_onep (s)
> + || (TREE_CODE (c) == INTEGER_CST
> + && TREE_CODE (s) == INTEGER_CST
> + && wi::mod_trunc (c, s, TYPE_SIGN (type)) == 0)))
>
> the only thing I'm looking at here is the modulo sign.  Considering
> we're looking at the sign bit of the step to normalize 'c' and 's' what
> happens for
>
>   for (unsigned int i = 0; i != 1000; --i)
>
> ?  I suppose we get s == 1 and c == -1000U and you'll say the control
> IV doesn't wrap.  Similar for i -= 2 where even when we use a signed
> modulo (signed)-1000U % 2 is still 0.
>
> So I think you need to remember whether we consider the step
> to be negative and compare iv->base and final as well.
I think the patch does the monotonicity check with respect to the sign of the step with the code below:

+  if (tree_int_cst_sign_bit (iv->step))
+e = fold_build2 (GE_EXPR, boolean_type_node, iv->base, final);
+  else
+e = fold_build2 (LE_EXPR, boolean_type_node, iv->base, final);
+  e = simplify_using_initial_conditions (loop, e);
+  if (integer_onep (e)

It acts as expected with your example.
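
For concrete values, the two conditions being discussed can be sketched
as below.  This is only an illustration under simplifying assumptions:
the function name is made up, everything is a known 32-bit constant, and
the real code works on trees and uses simplify_using_initial_conditions
for symbolic bounds.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model: an unsigned IV starts at BASE, advances by STEP each
   iteration, and the loop exits when it equals FINAL (NE_EXPR exit).
   Return true only if the IV provably reaches FINAL without wrapping.  */
bool
toy_ne_exit_iv_no_wrap (uint32_t base, uint32_t final, int32_t step)
{
  if (step == 0)
    return false;

  bool negative = step < 0;

  /* Monotonicity: a decreasing IV needs base >= final,
     an increasing IV needs base <= final.  */
  if (negative ? base < final : base > final)
    return false;

  /* Distance to cover and magnitude of the step, both unsigned.  */
  uint32_t dist = negative ? base - final : final - base;
  uint32_t s = negative ? (uint32_t) 0 - (uint32_t) step : (uint32_t) step;

  /* Divisibility: the IV must land exactly on FINAL.  */
  return s == 1 || dist % s == 0;
}
```

For the example "for (unsigned int i = 0; i != 1000; --i)" this gives
base = 0, final = 1000, step = -1, so the monotonicity test fails and the
IV is not treated as non-wrapping, regardless of the modulo result.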

>
> Bonus points for a wrong-code testcase with the above.
>
> I'd also like to see a testcase exercising step != 1.
I added two new tests each for "step != 1" and the previous case.  I
also tuned original pr68529-3.c a little.  Actually for the case in
the original patch as below:
+void bar(char *s);
+int foo(unsigned short l)
+{
+  char c[1] = {};
+  unsigned short nchar = ;
+
+  if (nchar < l)
+return -1;
+
+  while(nchar-- != l)
+{
+  c[nchar] = 'A';
+}
+
+  bar (c);
+  return 0;
+}

The offset IS affine.  GCC can't detect that because the condition
"nchar (==) < l" is split into two conditions: "l_8 > " and
"l_8 != ".  For now simplify_using_initial_conditions can't merge
range information from two different conditions.  Maybe jump threading
can merge the two condition/jumps, or VRP improvement discussed before
can handle that.

Here is the updated patch.  Is it OK?

Thanks,
bin
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr68529-1.c b/gcc/testsuite/gcc.dg/tree-ssa/pr68529-1.c
new file mode 100644
index 000..eef7460
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr68529-1.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -ftree-loop-distribution -ftree-loop-distribute-patterns -fdump-tree-ldist-details" } */
+
+void bar(char *s);
+int foo()
+{
+  char c[1] = {};
+  unsigned short nchar = ;
+
+  while(nchar-- != 0)
+{
+  c[nchar] = 'A';
+}
+
+  bar (c);
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump "distributed: split to 0 loops and 1 library calls" "ldist" } } */
+/* { dg-final { scan-tree-dump "generated memset" "ldist" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr68529-2.c b/gcc/testsuite/gcc.dg/tree-ssa/pr68529-2.c
new file mode 100644
index 000..a1d2742
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr68529-2.c
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -ftree-loop-distribution -ftree-loop-distribute-patterns -fdump-tree-ldist-details" } */
+
+void bar(char *s);
+int foo(unsigned short l)
+{
+  char c[1] = {};
+  unsigned short nchar = ;
+
+  if (nchar <= l)
+return -1;
+
+  while(nchar-- != l)
+{
+  c[nchar] = 'A';
+}
+
+  bar (c);
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump "distributed: split to 0 loops and 1 library calls" "ldist" } } */
+/* { dg-final { scan-tree-dump "generated memset" "ldist" } } */
diff --g

Re: [PATCH] Fix vector rsqrt discovery (PR tree-optimization/68501)

2015-11-27 Thread Richard Biener
On November 27, 2015 8:40:56 PM GMT+01:00, Jakub Jelinek  
wrote:
>Hi!
>
>The recent changes where vector sqrt is represented in the IL using
>IFN_SQRT instead of target specific builtins broke the discovery
>of vector rsqrt, as targetm.builtin_reciprocal is called only
>on builtin functions (not internal functions).  Furthermore,
>for internal fns, not only the IFN_* is significant, but also the
>types (modes actually) of the lhs and/or arguments.
>
>This patch adjusts the target hook, so that the backends can just
>inspect
>the call (builtin or internal function), whatever it is.
>
>Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK.  Though the other option would be to add an optab with corresponding IFN.

Richard.

>2015-11-27  Jakub Jelinek  
>
>   PR tree-optimization/68501
>   * target.def (builtin_reciprocal): Replace the 3 arguments with
>   a gcall * one, adjust description.
>   * targhooks.h (default_builtin_reciprocal): Replace the 3 arguments
>   with a gcall * one.
>   * targhooks.c (default_builtin_reciprocal): Likewise.
>   * tree-ssa-math-opts.c (pass_cse_reciprocals::execute): Use
>   targetm.builtin_reciprocal even on internal functions, adjust
>   the arguments and allow replacing an internal function with normal
>   built-in.
>   * config/i386/i386.c (ix86_builtin_reciprocal): Replace the 3
>arguments
>   with a gcall * one.  Handle internal fns too.
>   * config/rs6000/rs6000.c (rs6000_builtin_reciprocal): Likewise.
>   * config/aarch64/aarch64.c (aarch64_builtin_reciprocal): Likewise.
>   * doc/tm.texi (builtin_reciprocal): Document.
>
>--- gcc/target.def.jj  2015-11-18 11:19:19.0 +0100
>+++ gcc/target.def 2015-11-27 16:37:07.870823670 +0100
>@@ -2463,13 +2463,9 @@ identical versions.",
> DEFHOOK
> (builtin_reciprocal,
>"This hook should return the DECL of a function that implements
>reciprocal of\n\
>-the builtin function with builtin function code @var{fn}, or\n\
>-@code{NULL_TREE} if such a function is not available.  @var{md_fn} is
>true\n\
>-when @var{fn} is a code of a machine-dependent builtin function. 
>When\n\
>-@var{sqrt} is true, additional optimizations that apply only to the
>reciprocal\n\
>-of a square root function are performed, and only reciprocals of
>@code{sqrt}\n\
>-function are valid.",
>- tree, (unsigned fn, bool md_fn, bool sqrt),
>+the builtin or internal function call @var{call}, or\n\
>+@code{NULL_TREE} if such a function is not available.",
>+ tree, (gcall *call),
>  default_builtin_reciprocal)
> 
>/* For a vendor-specific TYPE, return a pointer to a
>statically-allocated
>--- gcc/targhooks.h.jj 2015-11-18 11:19:17.0 +0100
>+++ gcc/targhooks.h2015-11-27 16:37:44.828301093 +0100
>@@ -90,7 +90,7 @@ extern tree default_builtin_vectorized_c
> 
>extern int default_builtin_vectorization_cost (enum vect_cost_for_stmt,
>tree, int);
> 
>-extern tree default_builtin_reciprocal (unsigned int, bool, bool);
>+extern tree default_builtin_reciprocal (gcall *);
> 
> extern HOST_WIDE_INT default_vector_alignment (const_tree);
> 
>--- gcc/targhooks.c.jj 2015-11-18 11:19:17.0 +0100
>+++ gcc/targhooks.c2015-11-27 16:38:21.461783097 +0100
>@@ -600,9 +600,7 @@ default_builtin_vectorization_cost (enum
> /* Reciprocal.  */
> 
> tree
>-default_builtin_reciprocal (unsigned int fn ATTRIBUTE_UNUSED,
>-  bool md_fn ATTRIBUTE_UNUSED,
>-  bool sqrt ATTRIBUTE_UNUSED)
>+default_builtin_reciprocal (gcall *)
> {
>   return NULL_TREE;
> }
>--- gcc/tree-ssa-math-opts.c.jj2015-11-25 09:57:47.0 +0100
>+++ gcc/tree-ssa-math-opts.c   2015-11-27 17:07:22.756162308 +0100
>@@ -601,19 +601,17 @@ pass_cse_reciprocals::execute (function
> 
> if (is_gimple_call (stmt1)
> && gimple_call_lhs (stmt1)
>-&& (fndecl = gimple_call_fndecl (stmt1))
>-&& (DECL_BUILT_IN_CLASS (fndecl) == BUILT_IN_NORMAL
>-|| DECL_BUILT_IN_CLASS (fndecl) == BUILT_IN_MD))
>+&& (gimple_call_internal_p (stmt1)
>+|| ((fndecl = gimple_call_fndecl (stmt1))
>+&& (DECL_BUILT_IN_CLASS (fndecl) == BUILT_IN_NORMAL
>+|| (DECL_BUILT_IN_CLASS (fndecl)
>+== BUILT_IN_MD)
>   {
>-enum built_in_function code;
>-bool md_code, fail;
>+bool fail;
> imm_use_iterator ui;
> use_operand_p use_p;
> 
>-code = DECL_FUNCTION_CODE (fndecl);
>-md_code = DECL_BUILT_IN_CLASS (fndecl) == BUILT_IN_MD;
>-
>-fndecl = targetm.builtin_reciprocal (code, md_code, false);
>+fndecl = targetm.builtin_reciprocal (as_a <gcall *> (stmt1));
> if (!fndecl)
>   continue;
> 
>@@ -639,8 +637,28 @@ pass_cse_reciprocals::execute (

Re: [PATCH] Add save_expr langhook (PR c/68513)

2015-11-27 Thread Richard Biener
On November 27, 2015 7:55:43 PM GMT+01:00, Marek Polacek wrote:
>As suggested here
>
>and here ,
>this patch adds a new langhook to distinguish whether to call c_save_expr or
>save_expr from match.pd.  Does this look reasonable?
>
>I didn't know where to put setting of in_late_processing.  With the current
>placement, we won't (for valid programs) call c_save_expr from c_genericize
>or c_gimplify_expr.
>
>I suppose I should also modify save_expr in fold-const.c to call it via the
>langhook, if this approach is sane.  Dunno.

I don't like this at all.

Different approach: after the FE folds (unexpectedly?), scan the result for 
SAVE_EXPRs and if found, drop the folding.

Richard.
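
The alternative suggested above (scan the folded result for SAVE_EXPRs and drop the folding if one is found) can be illustrated with a small, self-contained C sketch. Everything in it is hypothetical: toy_tree, toy_code, contains_save_expr and fold_or_keep are stand-ins invented for illustration, not GCC's tree type or any function from the patch. In GCC itself such a scan would more likely be written as a walk_tree callback.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for GCC tree codes; NOT the real enum tree_code.  */
enum toy_code { TOY_CONST, TOY_VAR, TOY_PLUS, TOY_MULT, TOY_SAVE_EXPR };

/* Hypothetical, heavily simplified expression node.  */
struct toy_tree
{
  enum toy_code code;
  struct toy_tree *op[2];	/* Unused operands are NULL.  */
};

/* Return true if T or any of its operands is a TOY_SAVE_EXPR.  */
bool
contains_save_expr (const struct toy_tree *t)
{
  if (t == NULL)
    return false;
  if (t->code == TOY_SAVE_EXPR)
    return true;
  return contains_save_expr (t->op[0]) || contains_save_expr (t->op[1]);
}

/* Keep FOLDED only if the fold did not introduce a SAVE_EXPR;
   otherwise drop the folding and fall back to ORIG.  */
struct toy_tree *
fold_or_keep (struct toy_tree *orig, struct toy_tree *folded)
{
  assert (orig != NULL && folded != NULL);
  return contains_save_expr (folded) ? orig : folded;
}
```

The point of this shape is that the front end needs neither a new langhook nor a global flag: it inspects what the fold produced and discards it when a SAVE_EXPR shows up.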

>Bootstrapped/regtested on x86_64-linux.
>
>2015-11-27  Marek Polacek  
>
>   PR c/68513
>   * c-common.c (in_late_processing): New global.
>   (c_common_save_expr): New function.
>   * c-common.h (in_late_processing, c_common_save_expr): Declare.
>
>   * c-objc-common.h (LANG_HOOKS_SAVE_EXPR): Define.
>   * c-parser.c (c_parser_compound_statement): Set IN_LATE_PROCESSING.
>
>   * generic-match-head.c: Include "langhooks.h".
>   * genmatch.c (dt_simplify::gen_1): Call save_expr via langhook.
>   * langhooks-def.h (LANG_HOOKS_SAVE_EXPR): Define.
>   * langhooks.h (struct lang_hooks): Add save_expr langhook.
>
>   * gcc.dg/torture/pr68513.c: New test.
>
>diff --git gcc/c-family/c-common.c gcc/c-family/c-common.c
>index fe0a235..850bee9 100644
>--- gcc/c-family/c-common.c
>+++ gcc/c-family/c-common.c
>@@ -271,6 +271,12 @@ int c_inhibit_evaluation_warnings;
>be generated.  */
> bool in_late_binary_op;
> 
>+/* When true, all the constant expression checks from parsing should have been
>+   done.  This is used so that fold knows whether to call c_save_expr (thus
>+   c_fully_fold is called on the expression), or whether to call save_expr via
>+   c_common_save_expr langhook.  */
>+bool in_late_processing;
>+
> /* Whether lexing has been completed, so subsequent preprocessor
>errors should use the compiler's input_location.  */
> bool done_lexing = false;
>@@ -4928,6 +4934,15 @@ c_save_expr (tree expr)
>   return expr;
> }
> 
>+/* The C version of the save_expr langhook.  Either call save_expr or c_save_expr,
>+   depending on IN_LATE_PROCESSING.  */
>+
>+tree
>+c_common_save_expr (tree expr)
>+{
>+  return in_late_processing ? save_expr (expr) : c_save_expr (expr);
>+}
>+
> /* Return whether EXPR is a declaration whose address can never be
>NULL.  */
> 
>diff --git gcc/c-family/c-common.h gcc/c-family/c-common.h
>index bad8d05..e2d4ba9 100644
>--- gcc/c-family/c-common.h
>+++ gcc/c-family/c-common.h
>@@ -771,6 +771,7 @@ extern void c_register_addr_space (const char *str, addr_space_t as);
> 
> /* In c-common.c.  */
> extern bool in_late_binary_op;
>+extern bool in_late_processing;
> extern const char *c_addr_space_name (addr_space_t as);
> extern tree identifier_global_value (tree);
> extern tree c_linkage_bindings (tree);
>@@ -812,6 +813,7 @@ extern tree c_fully_fold (tree, bool, bool *);
> extern tree decl_constant_value_for_optimization (tree);
> extern tree c_wrap_maybe_const (tree, bool);
> extern tree c_save_expr (tree);
>+extern tree c_common_save_expr (tree);
> extern tree c_common_truthvalue_conversion (location_t, tree);
> extern void c_apply_type_quals_to_decl (int, tree);
> extern tree c_sizeof_or_alignof_type (location_t, tree, bool, bool, int);
>diff --git gcc/c/c-objc-common.h gcc/c/c-objc-common.h
>index 50c9f54..9fd3722 100644
>--- gcc/c/c-objc-common.h
>+++ gcc/c/c-objc-common.h
>@@ -60,6 +60,8 @@ along with GCC; see the file COPYING3.  If not see
> #define LANG_HOOKS_BUILTIN_FUNCTION c_builtin_function
> #undef  LANG_HOOKS_BUILTIN_FUNCTION_EXT_SCOPE
> #define LANG_HOOKS_BUILTIN_FUNCTION_EXT_SCOPE c_builtin_function_ext_scope
>+#undef LANG_HOOKS_SAVE_EXPR
>+#define LANG_HOOKS_SAVE_EXPR c_common_save_expr
> 
> /* Attribute hooks.  */
> #undef LANG_HOOKS_COMMON_ATTRIBUTE_TABLE
>diff --git gcc/c/c-parser.c gcc/c/c-parser.c
>index 0259f66..3f7c458 100644
>--- gcc/c/c-parser.c
>+++ gcc/c/c-parser.c
>@@ -4586,6 +4586,9 @@ c_parser_compound_statement (c_parser *parser)
> {
>   tree stmt;
>   location_t brace_loc;
>+
>+  in_late_processing = false;
>+
>   brace_loc = c_parser_peek_token (parser)->location;
>   if (!c_parser_require (parser, CPP_OPEN_BRACE, "expected %<{%>"))
> {
>@@ -4598,6 +4601,9 @@ c_parser_compound_statement (c_parser *parser)
>   stmt = c_begin_compound_stmt (true);
>   c_parser_compound_statement_nostart (parser);
> 
>+  /* From now on, the fold machinery shouldn't call c_save_expr.  */
>+  in_late_processing = true;
>+
>   /* If the compound stmt contains array notations, then we expand them.  */
>   if (flag_cilkplus && contains_array_notation_expr (stmt))
> stmt = expand_array_notation_exprs (stmt);
>diff --git gcc