date:20120910

On Fri, Sep 7, 2012 at 8:05 PM, Aldy Hernandez  wrote:
> This is the same thing as gcc.dg/pr52558-1.c, but in this case I had to
> tweak the testcase a bit because optimization passes after LIM are smart
> enough to remove the condition altogether, thus never triggering the test.
> Interestingly, GCC can figure out what's going on when the condition is "l <
> 1234", but not when it is "l != 4".
>
> Luckily, the original PR (PR52558) was testing "l != 4", so now the test
> looks exactly like what the PR writer had.
>
> Tested on x86-64 Linux by running with and without --param
> allow-store-data-races=0, and by visual inspection of the assembly.
>
> OK?

Ok.

Thanks,
Richard.
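The race these simulate-thread tests look for comes from loop store motion. LIM can keep a global in a register across a loop and write it back afterwards, which turns a store that only happened on some iterations into one that happens unconditionally. The hand-written C sketch below contrasts three versions: the original loop, that racy transformation, and a flag-guarded write-back that stores only if some iteration did. The function names and the `flags` array are illustrative and are not the pr52558 testcases. As far as we understand it, the guarded form is the shape LIM aims for when `--param allow-store-data-races=0` is in effect. Single-threaded, all three compute the same value, which is why the difference can only be observed with a second thread.

```c
#include <assert.h>
#include <stddef.h>

int count;  /* global that another thread may also access */

/* Original loop: count is written only when flags[i] is set.  */
void
original (const int *flags, size_t n)
{
  for (size_t i = 0; i < n; i++)
    if (flags[i])
      count++;
}

/* Racy store motion: count lives in a register and is written back
   unconditionally, even if no iteration stored to it.  */
void
racy_store_motion (const int *flags, size_t n)
{
  int tmp = count;
  for (size_t i = 0; i < n; i++)
    if (flags[i])
      tmp++;
  count = tmp;
}

/* Flag-guarded store motion: the write-back happens only if some
   iteration actually performed the store.  */
void
guarded_store_motion (const int *flags, size_t n)
{
  int tmp = count;
  int stored = 0;
  assert (flags != NULL || n == 0);
  for (size_t i = 0; i < n; i++)
    if (flags[i])
      {
        tmp++;
        stored = 1;
      }
  if (stored)
    count = tmp;
}
```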

Re: [patch] fix gcc.dg/tm/reg-promotion.c

On Fri, Sep 7, 2012 at 8:20 PM, Aldy Hernandez  wrote:
> This is a bit different, in that we don't currently have an infrastructure
> to test transactional memory code within the simulate-thread framework.
> Luckily for this test, the LIM pass has an actual dump message when it fails
> to hoist a value due to its presence in a transaction.
>
> Eventually/ideally, we should have a mechanism for testing races of
> transactionally executed code.
>
> OK?

Ok.

Thanks,
Richard.

Re: [PATCH] Fix PR54515

On Fri, Sep 7, 2012 at 11:01 PM, Markus Trippelsdorf
 wrote:
> Here the problem is that get_base_address() can return NULL_TREE and
> this later leads to a segfault. Fix by checking that the return value is
> valid.
> gcc-4.6 and 4.7 are also affected.
>
> Please commit if this looks OK.
> Thanks.

Hmm, we call the function on

VIEW_CONVERT_EXPR(0)[0]

which should have been folded to a constant.  And get_base_address
should just return the constant tree instead of returning NULL (it does
return a plethora of base object kinds already).  Your patch looks ok for
the branches, where I'll install it, and I'll come up with something else for trunk.

Thanks,
Richard.

> Tested on x86_64-pc-linux-gnu
>
> 2012-09-07  Markus Trippelsdorf  
>
> PR middle-end/54515
> * tree-sra.c (disqualify_base_of_expr): Check for possible
> NULL_TREE returned by get_base_address()
>
> * g++.dg/tree-ssa/pr54515.C: new testcase
>
> diff --git a/gcc/testsuite/g++.dg/tree-ssa/pr54515.C b/gcc/testsuite/g++.dg/tree-ssa/pr54515.C
> new file mode 100644
> index 000..11ed468
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/tree-ssa/pr54515.C
> @@ -0,0 +1,19 @@
> +// { dg-do compile }
> +// { dg-options "-O2" }
> +
> +template < typename T > T h2le (T)
> +{
> +  T a;
> +  unsigned short &b = a;
> +  short c = 0;
> +  unsigned char (&d)[2] = reinterpret_cast < unsigned char (&)[2] > (c);
> +  unsigned char (&e)[2] = reinterpret_cast < unsigned char (&)[2] > (b);
> +  e[0] = d[0];
> +  return a;
> +}
> +
> +void
> +bar ()
> +{
> +  h2le ((unsigned short) 0);
> +}
> diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
> index aafaa15..2bb92e9 100644
> --- a/gcc/tree-sra.c
> +++ b/gcc/tree-sra.c
> @@ -984,7 +984,8 @@ static void
>  disqualify_base_of_expr (tree t, const char *reason)
>  {
>    t = get_base_address (t);
> -  if (sra_mode == SRA_MODE_EARLY_IPA
> +  if (t
> +      && sra_mode == SRA_MODE_EARLY_IPA
>        && TREE_CODE (t) == MEM_REF)
>      t = get_ssa_base_param (TREE_OPERAND (t, 0));
>
> --
> Markus
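To make the failure mode concrete, here is a toy model in C of a base-address lookup that, like get_base_address in this PR, can return NULL when the base is a constant, plus a caller guarded the way the patch guards disqualify_base_of_expr. The struct, the enum and every `toy_` function name are hypothetical stand-ins, not GCC's tree API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical miniature of a tree: a code plus one operand.  */
enum toy_code { TOY_VAR_DECL, TOY_MEM_REF, TOY_INTEGER_CST, TOY_VIEW_CONVERT };

struct toy_tree
{
  enum toy_code code;
  struct toy_tree *op0;
};

/* Strip conversion wrappers and return the base object, or NULL when
   the base turns out to be a constant (the case behind PR54515).  */
static struct toy_tree *
toy_get_base_address (struct toy_tree *t)
{
  while (t != NULL && t->code == TOY_VIEW_CONVERT)
    t = t->op0;
  if (t != NULL && t->code == TOY_INTEGER_CST)
    return NULL;
  return t;
}

/* Return 1 if the base is a variable that would be disqualified.
   The "t != NULL" tests mirror the patch: without them, a constant
   base would be dereferenced through a NULL pointer.  */
static int
toy_disqualify_base_of_expr (struct toy_tree *t)
{
  t = toy_get_base_address (t);
  if (t != NULL && t->code == TOY_MEM_REF)
    t = t->op0;
  return t != NULL && t->code == TOY_VAR_DECL;
}
```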

Re: [PATCH] Combine location with block using block_locations

On Sun, Sep 9, 2012 at 12:26 AM, Dehao Chen  wrote:
> Hi, Diego,
>
> Thanks a lot for the review. I've updated the patch.
>
> This patch is large and may easily break builds because it preserves
> more complete information for TREE_BLOCK as well as gimple_block (which may
> trigger bugs that were hidden while this information was unavailable). I've
> done more rigorous testing to ensure that most bugs are caught before
> checking in.
>
> * Synced to the head and retested the whole gcc testsuite.
> * Ported the patch to the google-4_7 branch and retested the whole gcc
> testsuite, as well as built many large applications.
>
> Through these tests, I've found two additional bugs that were missed
> in the original implementation. A new patch is attached (patch.txt) to
> fix these problems. After this fix, all gcc testsuites pass for both
> trunk and the google-4_7 branch. I've also copy-pasted the new fixes
> (lto.c and tree-cfg.c) below. Now I'd say this patch is in good shape,
> but it may not be perfect. I'll look into build failures as soon as they
> arise.
>
> Richard and Diego, could you help me take a look at the following two fixes?
>
> Thanks,
> Dehao
>
> New fixes:
> --- gcc/lto/lto.c   (revision 191083)
> +++ gcc/lto/lto.c   (working copy)
> @@ -1559,8 +1559,6 @@ lto_fixup_prevailing_decls (tree t)
>  {
>    enum tree_code code = TREE_CODE (t);
>    LTO_NO_PREVAIL (TREE_TYPE (t));
> -  if (CODE_CONTAINS_STRUCT (code, TS_COMMON))
> -    LTO_NO_PREVAIL (TREE_CHAIN (t));

That change is odd.  Can you show us how it breaks?

>    if (DECL_P (t))
>      {
>        LTO_NO_PREVAIL (DECL_NAME (t));
>
> Index: gcc/tree-cfg.c
> ===
> --- gcc/tree-cfg.c  (revision 191083)
> +++ gcc/tree-cfg.c  (working copy)
> @@ -5980,9 +5974,21 @@ move_stmt_op (tree *tp, int *walk_subtrees, void *
>    tree t = *tp;
>
>    if (EXPR_P (t))
> -    /* We should never have TREE_BLOCK set on non-statements.  */
> -    gcc_assert (!TREE_BLOCK (t));
> -
> +    {
> +      tree block = TREE_BLOCK (t);
> +      if (p->orig_block == NULL_TREE
> +          || block == p->orig_block
> +          || block == NULL_TREE)
> +        TREE_SET_BLOCK (t, p->new_block);
> +#ifdef ENABLE_CHECKING
> +      else if (block != p->new_block)
> +        {
> +          while (block && block != p->orig_block)
> +            block = BLOCK_SUPERCONTEXT (block);
> +          gcc_assert (block);
> +        }
> +#endif

I think what this means is that TREE_BLOCK on non-stmts is meaningless
(thus only gimple_block is interesting on GIMPLE, not BLOCKs on trees).

So instead of setting a BLOCK in some cases you should clear BLOCK
if it happens to be set, or alternatively, only re-set it if there was
a block associated with it.

Richard.

> +    }
>    else if (DECL_P (t) || TREE_CODE (t) == SSA_NAME)
>      {
>        if (TREE_CODE (t) == SSA_NAME)
>
> Whole patch:
> gcc/ChangeLog:
> 2012-09-08  Dehao Chen  
>
> * toplev.c (general_init): Init block_locations.
> * tree.c (tree_set_block): New.
> (tree_block): Change to use LOCATION_BLOCK.
> * tree.h (TREE_SET_BLOCK): New.
> * final.c (reemit_insn_block_notes): Change to use LOCATION_BLOCK.
> (final_start_function): Likewise.
> * input.c (expand_location_1): Likewise.
> * input.h (LOCATION_LOCUS): New.
> (LOCATION_BLOCK): New.
> (IS_UNKNOWN_LOCATION): New.
> * fold-const.c (expr_location_or): Change to use new location.
> * reorg.c (emit_delay_sequence): Likewise.
> (try_merge_delay_insns): Likewise.
> * modulo-sched.c (dump_insn_location): Likewise.
> * lto-streamer-out.c (lto_output_location_bitpack): Likewise.
> * jump.c (rtx_renumbered_equal_p): Likewise.
> * ifcvt.c (noce_try_move): Likewise.
> (noce_try_store_flag): Likewise.
> (noce_try_store_flag_constants): Likewise.
> (noce_try_addcc): Likewise.
> (noce_try_store_flag_mask): Likewise.
> (noce_try_cmove): Likewise.
> (noce_try_cmove_arith): Likewise.
> (noce_try_minmax): Likewise.
> (noce_try_abs): Likewise.
> (noce_try_sign_mask): Likewise.
> (noce_try_bitop): Likewise.
> (noce_process_if_block): Likewise.
> (cond_move_process_if_block): Likewise.
> (find_cond_trap): Likewise.
> * dwarf2out.c (add_src_coords_attributes): Likewise.
> * expr.c (expand_expr_real): Likewise.
> * tree-parloops.c (create_loop_fn): Likewise.
> * recog.c (peep2_attempt): Likewise.
> * function.c (free_after_compilation): Likewise.
> (expand_function_end): Likewise.
> (set_insn_locations): Likewise.
> (thread_prologue_and_epilogue_insns): Likewise.
> * print-rtl.c (print_rtx): Likewise.
> * profile.c (branch_prob): Likewise.
> * trans-mem.c (ipa_tm_scan_irr_block): Likewise.
> * gimplify.c (gimplify_call_expr): Likewise.
>
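Richard's two alternatives for move_stmt_op can be stated in a few lines. The sketch below uses hypothetical `toy_block` and `toy_expr` types rather than GCC's tree type and TREE_BLOCK accessors, and only shows the difference in behaviour. The first helper drops any BLOCK on a non-statement expression. The second remaps a BLOCK only when one was already present, so expressions without a BLOCK never acquire one (unlike the posted hunk, which also sets new_block when the BLOCK is NULL_TREE).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for a lexical BLOCK and an expression that
   may carry one.  */
struct toy_block { int id; };
struct toy_expr { struct toy_block *block; };

/* Alternative 1: BLOCKs on non-statement expressions are meaningless,
   so clear whatever is there.  */
static void
clear_expr_block (struct toy_expr *e)
{
  e->block = NULL;
}

/* Alternative 2: remap the BLOCK only if the expression already had
   one; an expression without a BLOCK is left alone.  */
static void
remap_expr_block (struct toy_expr *e, struct toy_block *new_block)
{
  assert (new_block != NULL);
  if (e->block != NULL)
    e->block = new_block;
}
```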

Re: [PATCH] Merging Cilk Plus into Trunk (Patch 1 of approximately 22)

On Sun, Sep 9, 2012 at 8:02 PM, Iyer, Balaji V  wrote:
> Hello Joseph,
> Here is an updated patch. I think I have fixed all the changes you 
> and others have mentioned. Please let me know if everything looks OK. Thanks 
> again for doing the review!

The ChangeLog still mentions the vectorizer looking at flag_cilk_plus.  Also
this patch does not contain a single testcase, and thus the new codepaths are
not exercised when bootstrapping and testing.

You do not mention how this feature works from an ABI perspective.

I don't think this is anywhere near ready to go in.  Please split it into
at least two parts.  Part one would be to make the vectorizer support
vectorizing functions with the "elemental function" attribute by assuming a
vectorized variant with documented mangling exists.  That feature should
work regardless of whether Cilk+ is enabled or not.  The mangling needs to
be documented alongside the documentation for the "elemental function"
attribute.  Part two would be the rest, possibly re-implemented in the way
that was suggested.

Thanks,
Richard.
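A minimal sketch, in plain C, of the contract Richard describes for part one: the vectorizer may assume that a vector variant of an elemental function exists and call it from a vectorized loop, handling any leftover iterations with the scalar function. The function names, the vector length of 4 and the `_vlen4` suffix are hypothetical placeholders; they are not the Cilk Plus mangling or ABI, which the thread says still has to be documented.

```c
#include <assert.h>

/* Scalar "elemental" function as the user writes it.  */
static float
scale_add (float x, float y)
{
  return 2.0f * x + y;
}

/* Hypothetical 4-lane variant; its real name would come from the
   documented mangling scheme.  */
static void
scale_add_vlen4 (const float x[4], const float y[4], float out[4])
{
  for (int lane = 0; lane < 4; lane++)
    out[lane] = 2.0f * x[lane] + y[lane];
}

/* Shape of a vectorized loop: full chunks of 4 go to the vector
   variant, the remainder to the scalar function.  */
static void
apply_scale_add (const float *x, const float *y, float *out, int n)
{
  int i = 0;
  assert (n >= 0);
  for (; i + 4 <= n; i += 4)
    scale_add_vlen4 (x + i, y + i, out + i);
  for (; i < n; i++)
    out[i] = scale_add (x[i], y[i]);
}
```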

> Sincerely,
>
> Balaji V. Iyer.
>
> Here are the fixed ChangeLog entries:
>
> 
> gcc/ChangeLog
> 2012-09-09  Balaji V. Iyer  
>
> * attribs.c (is_elem_fn_attribute_p): New function.
> (decl_attributes): Added a check for Elemental function attribute when
> Cilk Plus is enabled.
> * cgraphunit.c (cgraph_decide_is_function_needed): Added a check for
> cloned elemental function when Cilk Plus is enabled.
> (cgraph_add_new_function): When Cilk Plus is enabled we call
> cgraph_get_create_node.
> (cgraph_analyze_functions): Added a check if the function call is a
> cloned elemental function when Cilk Plus is enabled.
> * cilkplus.h: New file.
> * elem-function-common.c: Likewise.
> * config/i386/i386.c (ix86_cilkplus_map_proc_to_attr): New function.
> (TARGET_CILKPLUS_BUILTIN_MAP_PROCESSOR_TO_ATTR): New define.
> * expr.c (expand_expr_real_1): Added a check if Cilk Plus is enabled.
> * function.h (struct function): Added elem_fn_already_cloned field.
> * gimplify.c (gimplify_function_tree): Added a check if Cilk Plus is
> enabled and if the function is an elemental function.  If so, then call
> the function to clone elemental function.
> * langhooks.c (lhd_elem_fn_create_fn): New function.
> * langhooks-def.h (LANG_HOOKS_CILKPLUS): New define.
> (LANG_HOOK_DECLS): Added LANG_HOOKS_CILKPLUS field.
> * langhooks.h (struct lang_hooks_for_cilkplus): New struct.
> (struct lang_hooks): Added a field called cilkplus.
> * target.def (TARGET_CILKPLUS): New hook vector.
> (builtin_map_processor_to_attr): New target hook def.
> * targhooks.c (default_builtin_map_processor_to_attr): New function.
> * doc/tm.texi: Regenerated.
> * doc/tm.texi.in (TARGET_CILKPLUS_BUILTIN_MAP_PROCESSOR_TO_ATTR): Documented
> new hook.
> * tree.h (tree_function_decl): Added a new field called
> elem_fn_already_cloned.
> (DECL_ELEM_FN_ALREADY_CLONED): New define.
> * tree-data-ref.c (find_data_references_in_stmt): Added a check for
> an elemental function call when Cilk Plus is enabled.
> * tree-inline.c (elem_fn_copy_arguments_for_versioning): New function.
> (initialize_elem_fn_cfun): Likewise.
> (tree_elem_fn_versioning): Likewise.
> * tree-vect-stmts.c (vect_get_vec_def_for_operand): Check parm type for
> an elemental function when Cilk Plus is enabled and set data definition
> accordingly.
> (elem_fn_vect_get_vec_def_for_operand): New function.
> (vect_finish_stmt_generation): Added a check for elemental function.
> (vectorizable_function): Check if the function call is a Cilk Plus
> elemental function.  If so, then insert the appropriate mangled name.
> (vectorizable_call): Eliminate the argument requirement when Cilk Plus
> is enabled for vectorization.  Also, set the appropriate data def. for
> an elemental function call.
> (elem_fn_linear_init_vector): New function.
> * tree.c (build_elem_fn_linear_vector): Likewise.
>
> gcc/c-family/ChangeLog
> 2012-09-09  Balaji V. Iyer  
>
> * c-common.c (struct c_common_attribute_table): Added vector
> attribute for Cilk Plus elemental function.
> (handle_vector_atribute): New function.
> * c-cpp-elem-function.c: New file.
> * c.opt (-fcilkplus): Added new flag.
>
> gcc/c/ChangeLog
> 2012-09-09  Balaji V. Iyer  
>
> * c-decl.c (bind): Added a check for non NULL scope.
> * c-parser.c (c_parser_declaration_or_fndef): Added a check if Cilk
> Plus defined.  If so, then we save the arguments for a fun

[Patch,avr,committed]: Fix PR54536

2012-09-10 Thread Georg-Johann Lay

http://gcc.gnu.org/viewcvs?view=revision&revision=191132
http://gcc.gnu.org/viewcvs?view=revision&revision=191133
http://gcc.gnu.org/viewcvs?view=revision&revision=191134

Committed these changes as an obvious fix for a typo in
avr-devices.c / avr-mcus.def.

at90usb1287 had a wrong library_name "usb1286" instead of
"usb1287".

The bug is not critical because at90usb1286/7 are the same
architecture and crtusb128[67].o is the same.

Johann


PR target/54536
* config/avr/avr-mcus.def (at90usb1287): Set LIBRARY_NAME to "usb1287".

RE: [PATCH] Merging Cilk Plus into Trunk (Patch 1 of approximately 22)

2012-09-10 Thread Joseph S. Myers

On Sun, 9 Sep 2012, Iyer, Balaji V wrote:

>   Here is an updated patch. I think I have fixed all the changes you 
> and others have mentioned. Please let me know if everything looks OK. 
> Thanks again for doing the review!

Has the user documentation for this feature been posted?  For patch review 
we really need a self-contained submission that for any feature 
implemented includes not just the implementation but the testcases and the 
documentation.  I think the testsuite patch also needs reworking to make 
it easy to add support for new architectures.

I think you need to revisit your split into 22 patches and arrange things 
based primarily on features.  If the changes for a feature are so big they 
can't be posted in one message, you should still always post all the 
patches for that feature together (implementation, documentation, 
testcases) - even if not all parts have changed in a particular revision.

-- 
Joseph S. Myers
jos...@codesourcery.com

Re: combine vec_perm_expr with constructor

On Sat, Sep 8, 2012 at 9:14 AM, Marc Glisse  wrote:
> On Mon, 3 Sep 2012, Richard Guenther wrote:
>
>> You do work above and then bail late here.  Always do early exits early
>> to reduce useless compile-time.
>
> [...]
>
>> You need to verify that fold_ternary returns something that is valid
>> GIMPLE.  fold () in general happily returns trees that are in need of
>> re-gimplification.  You expect a CONSTRUCTOR or VECTOR_CST here, so you
>> should check for that.
>
>
> Hello,
>
> here is a new version of the patch, again tested on x86_64-linux-gnu.

Ok.

Thanks,
Richard.

> 2012-09-08  Marc Glisse  
>
>
> gcc/
> * tree-ssa-forwprop.c (simplify_permutation): Handle CONSTRUCTOR.
>
> gcc/testsuite/
> * gcc.dg/tree-ssa/forwprop-20.c: New testcase.
>
>
> --
> Marc Glisse
>
> Index: gcc/testsuite/gcc.dg/tree-ssa/forwprop-20.c
> ===
> --- gcc/testsuite/gcc.dg/tree-ssa/forwprop-20.c (revision 0)
> +++ gcc/testsuite/gcc.dg/tree-ssa/forwprop-20.c (revision 0)
> @@ -0,0 +1,70 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target double64 } */
> +/* { dg-options "-O2 -fdump-tree-optimized" }  */
> +
> +#include 
> +
> +/* All of these optimizations happen for unsupported vector modes as a
> +   consequence of the lowering pass. We need to test with a vector mode
> +   that is supported by default on at least some architectures, or make
> +   the test target specific so we can pass a flag like -mavx.  */
> +
> +typedef double vecf __attribute__ ((vector_size (2 * sizeof (double))));
> +typedef int64_t veci __attribute__ ((vector_size (2 * sizeof (int64_t))));
> +
> +void f (double d, vecf* r)
> +{
> +  vecf x = { -d, 5 };
> +  vecf y = {  1, 4 };
> +  veci m = {  2, 0 };
> +  *r = __builtin_shuffle (x, y, m); // { 1, -d }
> +}
> +
> +void g (float d, vecf* r)
> +{
> +  vecf x = { d, 5 };
> +  vecf y = { 1, 4 };
> +  veci m = { 2, 1 };
> +  *r = __builtin_shuffle (x, y, m); // { 1, 5 }
> +}
> +
> +void h (double d, vecf* r)
> +{
> +  vecf x = { d + 1, 5 };
> +  vecf y = {   1  , 4 };
> +  veci m = {   2  , 0 };
> +  *r = __builtin_shuffle (y, x, m); // { d + 1, 1 }
> +}
> +
> +void i (float d, vecf* r)
> +{
> +  vecf x = { d, 5 };
> +  veci m = { 1, 0 };
> +  *r = __builtin_shuffle (x, m); // { 5, d }
> +}
> +
> +void j (vecf* r)
> +{
> +  vecf y = {  1, 2 };
> +  veci m = {  0, 0 };
> +  *r = __builtin_shuffle (y, m); // { 1, 1 }
> +}
> +
> +void k (vecf* r)
> +{
> +  vecf x = {  3, 4 };
> +  vecf y = {  1, 2 };
> +  veci m = {  3, 0 };
> +  *r = __builtin_shuffle (x, y, m); // { 2, 3 }
> +}
> +
> +void l (double d, vecf* r)
> +{
> +  vecf x = { -d, 5 };
> +  vecf y = {  d, 4 };
> +  veci m = {  2, 0 };
> +  *r = __builtin_shuffle (x, y, m); // { d, -d }
> +}
> +
> +/* { dg-final { scan-tree-dump-not "VEC_PERM_EXPR" "optimized" } } */
> +/* { dg-final { cleanup-tree-dump "optimized" } } */
>
> Property changes on: gcc/testsuite/gcc.dg/tree-ssa/forwprop-20.c
> ___
> Added: svn:eol-style
>+ native
> Added: svn:keywords
>+ Author Date Id Revision URL
>
> Index: gcc/tree-ssa-forwprop.c
> ===
> --- gcc/tree-ssa-forwprop.c (revision 191082)
> +++ gcc/tree-ssa-forwprop.c (working copy)
> @@ -2599,75 +2599,134 @@ is_combined_permutation_identity (tree m
>        if (j == i)
>          maybe_identity2 = false;
>        else if (j == i + nelts)
>          maybe_identity1 = false;
>        else
>          return 0;
>      }
>    return maybe_identity1 ? 1 : maybe_identity2 ? 2 : 0;
>  }
>
> -/* Combine two shuffles in a row.  Returns 1 if there were any changes
> -   made, 2 if cfg-cleanup needs to run.  Else it returns 0.  */
> +/* Combine a shuffle with its arguments.  Returns 1 if there were any
> +   changes made, 2 if cfg-cleanup needs to run.  Else it returns 0.  */
>
>  static int
>  simplify_permutation (gimple_stmt_iterator *gsi)
>  {
>    gimple stmt = gsi_stmt (*gsi);
>    gimple def_stmt;
> -  tree op0, op1, op2, op3;
> -  enum tree_code code = gimple_assign_rhs_code (stmt);
> -  enum tree_code code2;
> +  tree op0, op1, op2, op3, arg0, arg1;
> +  enum tree_code code;
>
> -  gcc_checking_assert (code == VEC_PERM_EXPR);
> +  gcc_checking_assert (gimple_assign_rhs_code (stmt) == VEC_PERM_EXPR);
>
>    op0 = gimple_assign_rhs1 (stmt);
>    op1 = gimple_assign_rhs2 (stmt);
>    op2 = gimple_assign_rhs3 (stmt);
>
> -  if (TREE_CODE (op0) != SSA_NAME)
> -    return 0;
> -
>    if (TREE_CODE (op2) != VECTOR_CST)
>      return 0;
>
> -  if (op0 != op1)
> -    return 0;
> +  if (TREE_CODE (op0) == VECTOR_CST)
> +    {
> +      code = VECTOR_CST;
> +      arg0 = op0;
> +    }
> +  else if (TREE_CODE (op0) == SSA_NAME)
> +    {
> +      def_stmt = SSA_NAME_DEF_STMT (op0);
> +      if (!def_stmt || !is_gimple_assign (def_stmt)
> +          || !can_propagate_from (def_stmt))
> +        return 0;
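The expected values in the comments of forwprop-20.c follow from the element-selection rule of `__builtin_shuffle` as documented in the GCC manual: with two operands, mask values are taken modulo 2*N and values N..2N-1 select from the second vector; with one operand, they are taken modulo N. The scalar C model below is only a reference for checking those comments by hand. It is not how forwprop folds the VEC_PERM_EXPR, and `shuffle1`/`shuffle2` are made-up names.

```c
#include <assert.h>
#include <stddef.h>

/* Two-operand form: mask values are reduced modulo 2*N; values below N
   select from X, values N..2N-1 select element (value - N) of Y.
   OUT must not alias X or Y.  */
static void
shuffle2 (const double *x, const double *y, const unsigned long long *mask,
          double *out, size_t n)
{
  for (size_t i = 0; i < n; i++)
    {
      size_t sel = (size_t) (mask[i] % (2 * n));
      out[i] = sel < n ? x[sel] : y[sel - n];
    }
}

/* One-operand form: mask values are reduced modulo N.  */
static void
shuffle1 (const double *x, const unsigned long long *mask,
          double *out, size_t n)
{
  for (size_t i = 0; i < n; i++)
    out[i] = x[mask[i] % n];
}
```

For example, test f uses x = { -d, 5 }, y = { 1, 4 } and mask { 2, 0 }: index 2 selects y[0] and index 0 selects x[0], giving { 1, -d } as the comment says.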

Re: combine BIT_FIELD_REF and VEC_PERM_EXPR

On Sat, Sep 8, 2012 at 10:17 AM, Marc Glisse  wrote:
> On Mon, 3 Sep 2012, Richard Guenther wrote:
>
>> Please do the early outs where you compute the arguments.  Thus, right
>> after getting op0 in this case or right after computing n for the n != 1
>> check.
>>
>> I think you need to verify that the type of 'op' is actually the element
>> type of op0.  The BIT_FIELD_REF can happily access elements two and three
>> of { 1, 2, 3, 4 } as a long for example.  See the BIT_FIELD_REF foldings
>> in fold-const.c.
>
>
> On Mon, 3 Sep 2012, Richard Guenther wrote:
>
>> If you use fold_build3 you need to check that the result is in the expected
>> form (a is_gimple_invariant or an SSA_NAME).
>
>
> I first tried this:
>
> +  if (TREE_CODE (tem) != SSA_NAME
> + && TREE_CODE (tem) != BIT_FIELD_REF
> + && !is_gimple_min_invariant (tem))
> +   return false;
>
> but then I thought that fold_stmt probably does the right thing (?) so I
> switched to it.

It should indeed.

>
>>> Now that I look at this line, I wonder if I am missing some unshare_expr
>>> for p and/or op1.
>>
>>
>> If either is a CONSTRUCTOR and its def stmt is not removed and it survives
>> into tem then yes ...
>
>
> I added an unconditional unshare_expr. I guess it would be possible to check
> whether the permutation is only used once, and in that case maybe not call
> unshare_expr (?) and call remove_prop_source_from_use at the end, but it
> gets a bit complicated and I don't think it helps compared to waiting for
> the next DCE pass.
>
>
>>>> Please also add handling of code == CONSTRUCTOR.
>>>
>>>
>>> The cases I tried were already handled by fre1. I can add code for
>>> constructor, but I'll need to look for a testcase first. Can that go to a
>>> different patch?
>>
>>
>> Yes.
>
>
> This is currently handled by FRE. The only testcase I have that reaches
> forwprop is bit_field_ref (vec_perm_expr (constructor)) and will require me
> to disable this patch so I can test that one...
>
>
> New version of the patch (I really think I already regtested it, but I'll do
> it again to be sure):
>
> 2012-09-08  Marc Glisse  
>
>
> gcc/
> * tree-ssa-forwprop.c (simplify_bitfield): New function.
> (ssa_forward_propagate_and_combine): Call it.
>
> gcc/testsuite/
> * gcc.dg/tree-ssa/forwprop-21.c: New testcase.
>
> --
> Marc Glisse
> Index: tree-ssa-forwprop.c
> ===
> --- tree-ssa-forwprop.c (revision 191089)
> +++ tree-ssa-forwprop.c (working copy)
> @@ -2567,20 +2567,92 @@ combine_conversions (gimple_stmt_iterato
>                gimple_assign_set_rhs_code (stmt, CONVERT_EXPR);
>                update_stmt (stmt);
>                return remove_prop_source_from_use (op0) ? 2 : 1;
>              }
>          }
>      }
>
>    return 0;
>  }
>
> +/* Combine an element access with a shuffle.  Returns true if there were
> +   any changes made, else it returns false.  */
> +
> +static bool
> +simplify_bitfield (gimple_stmt_iterator *gsi)

simplify_bitfield_ref

Ok with that change.

Thanks,
Richard.

> +{
> +  gimple stmt = gsi_stmt (*gsi);
> +  gimple def_stmt;
> +  tree op, op0, op1, op2;
> +  tree elem_type;
> +  unsigned idx, n, size;
> +  enum tree_code code;
> +
> +  op = gimple_assign_rhs1 (stmt);
> +  gcc_checking_assert (TREE_CODE (op) == BIT_FIELD_REF);
> +
> +  op0 = TREE_OPERAND (op, 0);
> +  if (TREE_CODE (op0) != SSA_NAME
> +      || TREE_CODE (TREE_TYPE (op0)) != VECTOR_TYPE)
> +    return false;
> +
> +  elem_type = TREE_TYPE (TREE_TYPE (op0));
> +  if (TREE_TYPE (op) != elem_type)
> +    return false;
> +
> +  size = TREE_INT_CST_LOW (TYPE_SIZE (elem_type));
> +  op1 = TREE_OPERAND (op, 1);
> +  n = TREE_INT_CST_LOW (op1) / size;
> +  if (n != 1)
> +    return false;
> +
> +  def_stmt = SSA_NAME_DEF_STMT (op0);
> +  if (!def_stmt || !is_gimple_assign (def_stmt)
> +      || !can_propagate_from (def_stmt))
> +    return false;
> +
> +  op2 = TREE_OPERAND (op, 2);
> +  idx = TREE_INT_CST_LOW (op2) / size;
> +
> +  code = gimple_assign_rhs_code (def_stmt);
> +
> +  if (code == VEC_PERM_EXPR)
> +    {
> +      tree p, m, index, tem;
> +      unsigned nelts;
> +      m = gimple_assign_rhs3 (def_stmt);
> +      if (TREE_CODE (m) != VECTOR_CST)
> +        return false;
> +      nelts = VECTOR_CST_NELTS (m);
> +      idx = TREE_INT_CST_LOW (VECTOR_CST_ELT (m, idx));
> +      idx %= 2 * nelts;
> +      if (idx < nelts)
> +        {
> +          p = gimple_assign_rhs1 (def_stmt);
> +        }
> +      else
> +        {
> +          p = gimple_assign_rhs2 (def_stmt);
> +          idx -= nelts;
> +        }
> +      index = build_int_cst (TREE_TYPE (TREE_TYPE (m)), idx * size);
> +      tem = build3 (BIT_FIELD_REF, TREE_TYPE (op),
> +                    unshare_expr (p), op1, index);
> +      gimple_assign_set_rhs1 (stmt, tem);
> +      fold_stmt (gsi);
> +      update_stmt (gsi_stmt (*gsi));
> +      return true;
> +    }
> +
> +  return false;
> +}
> +
>  /* De
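The index arithmetic in simplify_bitfield (to be renamed simplify_bitfield_ref per the review) can be checked on its own. The following C model is a sketch that assumes the patch's earlier checks have already passed, i.e. the BIT_FIELD_REF reads exactly one element whose type is the vector's element type. The names `perm_source` and `bitfield_ref_of_perm` are hypothetical. Given the constant permutation mask, it reports which VEC_PERM_EXPR operand the element comes from and the bit offset the rewritten BIT_FIELD_REF would use.

```c
#include <assert.h>

/* Where an element of VEC_PERM_EXPR <op0, op1, mask> comes from.  */
struct perm_source
{
  int operand;          /* 0 for rhs1 (op0), 1 for rhs2 (op1).  */
  unsigned bit_offset;  /* Bit offset of the element in that operand.  */
};

/* Mirrors the patch: element index = offset / element size; the mask
   entry is reduced modulo 2*NELTS; values below NELTS select op0, the
   rest select op1 at (value - NELTS).  */
static struct perm_source
bitfield_ref_of_perm (const unsigned *mask, unsigned nelts,
                      unsigned elem_bits, unsigned bit_offset)
{
  struct perm_source src;
  unsigned idx = bit_offset / elem_bits;
  unsigned sel;

  assert (elem_bits != 0 && idx < nelts);
  sel = mask[idx] % (2 * nelts);
  if (sel < nelts)
    src.operand = 0;
  else
    {
      src.operand = 1;
      sel -= nelts;
    }
  src.bit_offset = sel * elem_bits;
  return src;
}
```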

Re: [Patch contrib] check_GNU_style: remove tmp file

2012-09-10 Thread Christophe Lyon

On 9 September 2012 12:46, Gerald Pfeifer  wrote:
> On Mon, 3 Sep 2012, Christophe Lyon wrote:
>> check_GNU_style.sh currently leaves a temporary file in the current
>> directory. This patch removes it upon exit.
>>
>> Christophe.
>>
>> 2012-09-03   Christophe Lyon  
>>
>>   * check_GNU_style.sh: Remove temporary file upon exit.
>
> Shouldn't this also be removed upon abort?
>
> See contrib/warn_summary for an example.
>
> Gerald

Good point. Here is a new version, catching the same signals as warn_summary.

Christophe.


check-gnu-style.patch
Description: Binary data

Re: [SH] PR 54089 - Improve software dynamic shifts

2012-09-10 Thread Kaz Kojima

Oleg Endo  wrote:
> This patch does two things...
> 
> 1) The dynamic shift cost is set to be the same if HW dynamic shifts are
> available.  This improves code size for SH2A a little (-2 KByte on CSiBE
> for -m2a-single -O2).
> 
> 2) Improve code around library function calls for software dynamic
> shifts (logical right + left shifts only for now).
> For this I had to change the implementations of ashlsi3 and lshrsi3 in
> lib1funcs.S, but the changes are backwards compatible with older
> binaries.  Due to the additional branch insn in the dyn shift functions
> they might be one or two cycles slower than the original, but this
> reduces the amount of clobbered regs and cuts 9.5 KByte in the CSiBE set
> (-m2 -ml -O2), which seems more beneficial to do on average.
> 
> Tested on rev. 190990 with
> make -k check RUNTESTFLAGS="--target_board=sh-sim\{-m2/-ml,-m2/-mb,-m2a/-mb,-m4/-ml,-m4/-mb,-m4a/-ml,-m4a/-mb}"
> 
> and no new failures except for this one on SH2:
> 
> FAIL: gcc.dg/pr28402.c scan-assembler-not __[a-z]*si3
> 
> The reason for this is that now the middle-end will expand DImode shifts
> as SImode shifts instead of a DImode shift library call, because it sees
> the new SImode dynamic library call shift patterns for SH2.  I will have
> a look at this issue later to see if it is beneficial to do special
> handling of DImode shifts on SH2.
> 
> OK to install?

OK.

Regards,
kaz
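On SH variants without a dynamic shift instruction, a shift by a run-time amount has to be synthesized from fixed-amount shifts, which is the job of the ashlsi3/lshrsi3 library routines discussed above. The C sketch below is not the lib1funcs.S code. It models the idea with a decomposition into shifts by 16, 8, 4 (as two shifts by 2), 2 and 1, amounts SH provides as single instructions (shll16, shll8, shll2, shll). The second function shows how a 64-bit left shift can be built from 32-bit operations, the kind of SImode expansion behind the pr28402.c note. Both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Variable 32-bit left shift built only from fixed shifts.  */
static uint32_t
sw_shl32 (uint32_t x, unsigned n)
{
  n &= 31;
  if (n & 16)
    x <<= 16;
  if (n & 8)
    x <<= 8;
  if (n & 4)
    x = (x << 2) << 2;
  if (n & 2)
    x <<= 2;
  if (n & 1)
    x <<= 1;
  return x;
}

/* 64-bit left shift from 32-bit halves; the 64-bit operations here are
   only used to split and re-pack the value.  */
static uint64_t
shl64_via_32 (uint64_t v, unsigned n)
{
  uint32_t lo = (uint32_t) v;
  uint32_t hi = (uint32_t) (v >> 32);

  n &= 63;
  if (n >= 32)
    {
      hi = sw_shl32 (lo, n - 32);
      lo = 0;
    }
  else if (n != 0)
    {
      hi = sw_shl32 (hi, n) | (lo >> (32 - n));
      lo = sw_shl32 (lo, n);
    }
  return ((uint64_t) hi << 32) | lo;
}
```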

RE: [PATCH] Merging Cilk Plus into Trunk (Patch 1 of approximately 22)

2012-09-10 Thread Iyer, Balaji V

>-Original Message-
>From: Joseph Myers [mailto:jos...@codesourcery.com]
>Sent: Monday, September 10, 2012 7:23 AM
>To: Iyer, Balaji V
>Cc: gcc-patches@gcc.gnu.org; Aldy Hernandez (al...@redhat.com); Jeff Law;
>r...@redhat.com
>Subject: RE: [PATCH] Merging Cilk Plus into Trunk (Patch 1 of approximately 22)
>
>On Sun, 9 Sep 2012, Iyer, Balaji V wrote:
>
>>  Here is an updated patch. I think I have fixed all the changes you
>> and others have mentioned. Please let me know if everything looks OK.
>> Thanks again for doing the review!
>
>Has the user documentation for this feature been posted?  For patch review we
>really need a self-contained submission that for any feature implemented
>includes not just the implementation but the testcases and the documentation.
>I think the testsuite patch also needs reworking to make it easy to add support
>for new architectures.

I think you had requested some changes to the test cases, and I am currently
working on them.

>
>I think you need to revisit your split into 22 patches and arrange things based
>primarily on features.  If the changes for a feature are so big they can't be
>posted in one message, you should still always post all the patches for that
>feature together (implementation, documentation, testcases) - even if not all
>parts have changed in a particular revision.

So, I assume it is OK for me to include the testcases with the code changes? I
included them separately because I remember someone on the mailing list saying
that patches must be small, and one logical way to achieve that is to post the
test cases separately from the code changes.

>
>--
>Joseph S. Myers
>jos...@codesourcery.com

Re: VxWorks Patches Back from the Dead!

2012-09-10 Thread Bruce Korb

On 09/09/12 08:54, rbmj wrote:
> Just because I *love* bothering everyone with emails...

I don't mind, as long as you don't expect me to do anything
until I'm certain you've stabilized the patch ;)
I'm glad you rolled it up into one patch, because I was
eventually going to ask you to do that.  Thank you.

Cheers - Bruce

> I've made a few changes and squashed everything into a single patch for ease 
> of application.  The commit message is inside the patch, but here's the 
> suggested ChangeLog:
> 
> configure.ac: add --enable-libstdcxx option
> configure: regenerate
> 
> [gcc]
> gcov-io.c (gcov_open): Pass mode to open() unconditionally
> 
> [fixincludes]
> fixinc.in: Added ability to skip machine_name
> inclhack.def (AAB_vxworks_assert): Added fix
> inclhack.def (AAB_vxworks_regs_vxtypes): Added fix
> inclhack.def (AAB_vxworks_stdint): Added fix
> inclhack.def (AAB_vxworks_unistd): Added fix
> inclhack.def (vxworks_ioctl_macro): Added fix
> inclhack.def (vxworks_mkdir_macro): Added fix
> inclhack.def (vxworks_regs): Added fix
> inclhack.def (vxworks_write_const):  Added fix
> fixincl.x:  Regenerate
> mkfixinc.sh: Removed vxworks from list of no-op fixinc targets
> 
> [libstdc++-v3]
> config/os/vxworks/os_defines.h: #define'd NOMINMAX
> 
> Thanks,
> 
> Robert Mason
>

[SH] Add simple_return pattern

2012-09-10 Thread Christian Bruel

This patch implements the simple_return pattern to enable -fshrink-wrap
on SH. It also cleans up some redundancies in expand_epilogue (called
twice, from the "return" and "epilogue" patterns) and the
sh_expand_epilogue parameter type.
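As an illustration of what -fshrink-wrap is after (not taken from the patch or its testing): when a function's early-exit path needs neither a stack frame nor callee-saved registers, the compiler can emit the prologue only on the path that needs it and leave the early path with a bare return, which is what a simple_return pattern lets a back end express. The C function below is a hypothetical example of that shape. Whether GCC actually shrink-wraps it on SH depends on inlining and register allocation, so treat it as a sketch of the pattern, not a guaranteed result.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical out-of-line helper; assume it is not inlined.  */
long
weighted_sum_slow (const long *v, size_t n)
{
  long acc = 0;
  for (size_t i = 0; i < n; i++)
    acc += v[i] * (long) i;
  return acc;
}

/* The early exit needs no frame and no callee-saved registers, so a
   shrink-wrapped version can return before the prologue runs.  The
   other path keeps v and n live across a call and needs the prologue.  */
long
weighted_sum (const long *v, size_t n)
{
  if (v == NULL || n == 0)
    return 0;
  long s = weighted_sum_slow (v, n);
  return s + v[n - 1];
}
```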

No regressions with sh-superh-elf and sh4-linux gcc testsuites.

Thanks

Christian

2012-08-29  Christian Bruel  

	* config/sh/sh-protos.h (sh_need_epilogue): Delete.
	* config/sh/sh.c (sh_need_epilogue): Delete.
	(sh_need_epilogue_known): Delete.
	(sh_output_function_epilogue): Remove sh_need_epilogue_known.
	* config/sh/sh.md (any_return): New iterator and optab.
	(simple_return): Define.
	(return): Check epilogue_completed.
	(epilogue): Use inline return rtl.
	(sh_expand_epilogue): Cleanup parameters boolean type.

Index: gcc/config/sh/sh-protos.h
===
--- gcc/config/sh/sh-protos.h	(revision 191129)
+++ gcc/config/sh/sh-protos.h	(working copy)
@@ -117,7 +117,6 @@
 extern int sh_media_register_for_return (void);
 extern void sh_expand_prologue (void);
 extern void sh_expand_epilogue (bool);
-extern bool sh_need_epilogue (void);
 extern void sh_set_return_address (rtx, rtx);
 extern int initial_elimination_offset (int, int);
 extern bool fldi_ok (void);
Index: gcc/config/sh/sh.c
===
--- gcc/config/sh/sh.c	(revision 191129)
+++ gcc/config/sh/sh.c	(working copy)
@@ -7901,22 +7901,6 @@
 
 static int sh_need_epilogue_known = 0;
 
-bool
-sh_need_epilogue (void)
-{
-  if (! sh_need_epilogue_known)
-    {
-      rtx epilogue;
-
-      start_sequence ();
-      sh_expand_epilogue (0);
-      epilogue = get_insns ();
-      end_sequence ();
-      sh_need_epilogue_known = (epilogue == NULL ? -1 : 1);
-    }
-  return sh_need_epilogue_known > 0;
-}
-
 /* Emit code to change the current function's return address to RA.
TEMP is available as a scratch register, if needed.  */
 
@@ -7996,7 +7980,6 @@
 sh_output_function_epilogue (FILE *file ATTRIBUTE_UNUSED,
 			 HOST_WIDE_INT size ATTRIBUTE_UNUSED)
 {
-  sh_need_epilogue_known = 0;
 }
 
 static rtx
Index: gcc/config/sh/sh.md
===
--- gcc/config/sh/sh.md	(revision 191129)
+++ gcc/config/sh/sh.md	(working copy)
@@ -177,6 +177,10 @@
   (UNSPECV_EH_RETURN	12)
 ])
 
+(define_code_iterator any_return [return simple_return])
+(define_code_attr optab [(return "return")
+			 (simple_return "simple_return")])
+
 ;; -
 ;; Attributes
 ;; -
@@ -9280,7 +9284,7 @@
   [(return)]
   ""
 {
-  sh_expand_epilogue (1);
+  sh_expand_epilogue (true);
   if (TARGET_SHCOMPACT)
     {
       rtx insn, set;
@@ -10099,9 +10103,13 @@
 }
   [(set_attr "type" "load_media")])
 
+(define_expand "simple_return"
+  [(simple_return)]
+ "")
+
 (define_expand "return"
-  [(return)]
-  "reload_completed && ! sh_need_epilogue ()"
+  [(simple_return)]
+ "reload_completed && epilogue_completed"
 {
   if (TARGET_SHMEDIA)
     {
@@ -10117,8 +10125,8 @@
     }
 })
 
-(define_insn "*return_i"
-  [(return)]
+(define_insn "*_i"
+  [(any_return)]
   "TARGET_SH1 && ! (TARGET_SHCOMPACT
 		&& (crtl->args.info.call_cookie
 			& CALL_COOKIE_RET_TRAMP (1)))
@@ -10244,19 +10252,12 @@
 (define_expand "prologue"
   [(const_int 0)]
   ""
-{
-  sh_expand_prologue ();
-  DONE;
-})
+  "sh_expand_prologue (); DONE;")
 
 (define_expand "epilogue"
   [(return)]
   ""
-{
-  sh_expand_epilogue (0);
-  emit_jump_insn (gen_return ());
-  DONE;
-})
+  "sh_expand_epilogue (false);")
 
 (define_expand "eh_return"
   [(use (match_operand 0 "register_operand" ""))]

Re: [PATCH] PowerPC VLE port

2012-09-10 Thread James Lemke


On 09/07/2012 07:52 PM, David Edelsohn wrote:

This patch contains a lot of unnecessary, gratuitous changes in
addition to being very invasive.  It was not edited and cleaned
sufficiently before posting.  It has too much of a negative impact on
the current PowerPC port.  The patch is not going to be accepted in
its current form.


David,
What are your thoughts on how to move forward?

--
Jim Lemke
Mentor Graphics / CodeSourcery
Orillia Ontario,  +1-613-963-1073

[PATCH] Fix PR54520


The following fixes PR54520 - we were not updating bb->loop_father
for all basic-blocks converted to "pre-header" blocks during jump
threading.  Fixed as follows.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2012-09-10  Richard Guenther  

* tree-ssa-threadupdate.c (def_split_header_continue_p):
Properly consider sub-loops.

Index: gcc/tree-ssa-threadupdate.c
===
*** gcc/tree-ssa-threadupdate.c (revision 191129)
--- gcc/tree-ssa-threadupdate.c (working copy)
*** static bool
*** 846,853 
  def_split_header_continue_p (const_basic_block bb, const void *data)
  {
const_basic_block new_header = (const_basic_block) data;
!   return (bb->loop_father == new_header->loop_father
! && bb != new_header);
  }
  
  /* Thread jumps through the header of LOOP.  Returns true if cfg changes.
--- 846,854 
  def_split_header_continue_p (const_basic_block bb, const void *data)
  {
const_basic_block new_header = (const_basic_block) data;
!   return (bb != new_header
! && (loop_depth (bb->loop_father)
! >= loop_depth (new_header->loop_father)));
  }
  
  /* Thread jumps through the header of LOOP.  Returns true if cfg changes.
*** thread_through_loop_header (struct loop
*** 1031,1040 
nblocks = dfs_enumerate_from (header, 0, def_split_header_continue_p,
bblocks, loop->num_nodes, tgt_bb);
for (i = 0; i < nblocks; i++)
!   {
! remove_bb_from_loops (bblocks[i]);
! add_bb_to_loop (bblocks[i], loop_outer (loop));
!   }
free (bblocks);
  
/* If the new header has multiple latches mark it so.  */
--- 1032,1042 
nblocks = dfs_enumerate_from (header, 0, def_split_header_continue_p,
bblocks, loop->num_nodes, tgt_bb);
for (i = 0; i < nblocks; i++)
!   if (bblocks[i]->loop_father == loop)
! {
!   remove_bb_from_loops (bblocks[i]);
!   add_bb_to_loop (bblocks[i], loop_outer (loop));
! }
free (bblocks);
  
/* If the new header has multiple latches mark it so.  */
Index: gcc/testsuite/gcc.dg/torture/pr54520.c
===
*** gcc/testsuite/gcc.dg/torture/pr54520.c  (revision 0)
--- gcc/testsuite/gcc.dg/torture/pr54520.c  (working copy)
***
*** 0 
--- 1,15 
+ /* { dg-do compile } */
+ 
+ char *a;
+ void
+ fn1 ()
+ {
+   char *p = a;
+   while (p && *p != '\0')
+ {
+   while (*p == '\t')
+   *p++ = '\0';
+   if (*p != '\0')
+   p = 0;
+ }
+ }
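A toy model of the two changes in tree-ssa-threadupdate.c, assuming each loop is reduced to a depth counter and an outer pointer. The names toy_loop, toy_bb, old_continue_p, new_continue_p and reparent_blocks are hypothetical, not GCC's, and this sketches only the walk predicate and the re-parenting step, not the whole jump-threading pass. The old predicate stops at blocks of sub-loops, the new one walks into them, and only blocks owned directly by LOOP are moved to the enclosing loop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for GCC's loop tree and
   basic blocks; only the fields this sketch needs are modelled.  */
struct toy_loop
{
  int depth;              /* 0 = function body, 1 = outermost loop, ...  */
  struct toy_loop *outer; /* Enclosing loop, NULL for the root.  */
};

struct toy_bb
{
  struct toy_loop *loop_father;
};

/* Old walk predicate: continue only into blocks of exactly the same loop
   as the new header, so blocks of sub-loops are never visited.  */
static bool
old_continue_p (const struct toy_bb *bb, const struct toy_bb *new_header)
{
  return bb->loop_father == new_header->loop_father && bb != new_header;
}

/* New walk predicate: also continue into blocks of deeper (nested) loops.  */
static bool
new_continue_p (const struct toy_bb *bb, const struct toy_bb *new_header)
{
  return bb != new_header
	 && bb->loop_father->depth >= new_header->loop_father->depth;
}

/* Move only the blocks that belonged directly to LOOP into the enclosing
   loop; blocks of sub-loops keep their (inner) loop_father.  */
static void
reparent_blocks (struct toy_bb **blocks, size_t n, struct toy_loop *loop)
{
  for (size_t i = 0; i < n; i++)
    if (blocks[i]->loop_father == loop)
      blocks[i]->loop_father = loop->outer;
}
```

With a depth-1 loop inside the function body and a depth-2 loop inside that, a block of the inner loop is visited by new_continue_p but not by old_continue_p, and reparent_blocks leaves its loop_father pointing at the inner loop.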

C++ PATCH for c++/54506 (wrong implicit move)

2012-09-10 Thread Jason Merrill

This area of the standard is in flux, but what we were doing was 
definitely wrong.  The proposed resolution for issue 1402 says that if a 
move constructor would call a non-trivial non-move constructor for a 
subobject, it is not implicitly declared.  We might end up dropping that 
provision entirely, but in any case we should allow moving via template 
constructor as well as non-template.


Tested x86_64-pc-linux-gnu, applying to trunk and 4.7 (since it only 
affects C++11 mode).
commit 37c8977bb82c984645795a9992fe6658841e2d35
Author: Jason Merrill 
Date:   Mon Sep 10 09:22:37 2012 -0400

	PR c++/54506
	* decl.c (move_signature_fn_p): Split out from move_fn_p.
	* method.c (process_subob_fn): Use it.
	* cp-tree.h: Declare it.

diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index 3e0fc3f..3c55ba4 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -5066,6 +5066,7 @@ extern tree build_ptrmem_type			(tree, tree);
 extern tree build_this_parm			(tree, cp_cv_quals);
 extern int copy_fn_p(const_tree);
 extern bool move_fn_p   (const_tree);
+extern bool move_signature_fn_p (const_tree);
 extern tree get_scope_of_declarator		(const cp_declarator *);
 extern void grok_special_member_properties	(tree);
 extern int grok_ctor_properties			(const_tree, const_tree);
diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c
index 7655f78..e34092d 100644
--- a/gcc/cp/decl.c
+++ b/gcc/cp/decl.c
@@ -10859,10 +10859,6 @@ copy_fn_p (const_tree d)
 bool
 move_fn_p (const_tree d)
 {
-  tree args;
-  tree arg_type;
-  bool result = false;
-
   gcc_assert (DECL_FUNCTION_MEMBER_P (d));
 
   if (cxx_dialect == cxx98)
@@ -10872,12 +10868,29 @@ move_fn_p (const_tree d)
   if (TREE_CODE (d) == TEMPLATE_DECL
   || (DECL_TEMPLATE_INFO (d)
  && DECL_MEMBER_TEMPLATE_P (DECL_TI_TEMPLATE (d
-/* Instantiations of template member functions are never copy
+/* Instantiations of template member functions are never move
functions.  Note that member functions of templated classes are
represented as template functions internally, and we must
-   accept those as copy functions.  */
+   accept those as move functions.  */
 return 0;
 
+  return move_signature_fn_p (d);
+}
+
+/* D is a constructor or overloaded `operator='.
+
+   Then, this function returns true when D has the same signature as a move
+   constructor or move assignment operator (because either it is such a
+   ctor/op= or it is a template specialization with the same signature),
+   false otherwise.  */
+
+bool
+move_signature_fn_p (const_tree d)
+{
+  tree args;
+  tree arg_type;
+  bool result = false;
+
   args = FUNCTION_FIRST_USER_PARMTYPE (d);
   if (!args)
 return 0;
diff --git a/gcc/cp/method.c b/gcc/cp/method.c
index c21ae15..a42ed60 100644
--- a/gcc/cp/method.c
+++ b/gcc/cp/method.c
@@ -947,9 +947,10 @@ process_subob_fn (tree fn, bool move_p, tree *spec_p, bool *trivial_p,
 	}
 }
 
-  /* Core 1402: A non-trivial copy op suppresses the implicit
+  /* Core 1402: A non-trivial non-move ctor suppresses the implicit
  declaration of the move ctor/op=.  */
-  if (no_implicit_p && move_p && !move_fn_p (fn) && !trivial_fn_p (fn))
+  if (no_implicit_p && move_p && !move_signature_fn_p (fn)
+  && !trivial_fn_p (fn))
 *no_implicit_p = true;
 
   if (constexpr_p && !DECL_DECLARED_CONSTEXPR_P (fn))
diff --git a/gcc/testsuite/g++.dg/cpp0x/implicit14.C b/gcc/testsuite/g++.dg/cpp0x/implicit14.C
new file mode 100644
index 000..8a56244
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/implicit14.C
@@ -0,0 +1,26 @@
+// PR c++/54506
+// { dg-do compile { target c++11 } }
+
+template 
+struct A
+{
+  A() {}
+
+  A(A const volatile &&) = delete;
+  A &operator =(A const volatile &&) = delete;
+
+  template  A(A &&) {}
+  template  A &operator =(A &&) { return *this; }
+};
+
+struct B
+{
+  A a;
+  B() = default;
+};
+
+int main()
+{
+  B b = B();
+  b = B();
+}

[google] Fix exception in unroller code size heuristics (issue6498112)

2012-09-10 Thread Teresa Johnson

Fix divide by zero error.

Passes bootstrap and regression tests. Ok for google branches?

Teresa

2012-09-10  Teresa Johnson  

* loop-unroll.c (code_size_limit_factor):

Index: loop-unroll.c
===
--- loop-unroll.c   (revision 191138)
+++ loop-unroll.c   (working copy)
@@ -223,7 +223,8 @@ code_size_limit_factor(struct loop *loop)
   /* Next, set the value of the codesize-based unroll factor divisor which in
  most loops will need to be set to a value that will reduce or eliminate
  unrolling/peeling.  */
-  if (profile_info->num_hot_counters < size_threshold * 2)
+  if (profile_info->num_hot_counters < size_threshold * 2
+  && loop->header->count > 0)
 {
   /* For applications that are less than twice the codesize limit, allow
  limited unrolling for very hot loops.  */

--
This patch is available for review at http://codereview.appspot.com/6498112

Re: [google] Fix exception in unroller code size heuristics (issue6498112)

2012-09-10 Thread Diego Novillo

On Mon, Sep 10, 2012 at 10:16 AM, Teresa Johnson  wrote:

> 2012-09-10  Teresa Johnson  
>
> * loop-unroll.c (code_size_limit_factor):

OK.


Diego.

[PATCH] Fix PR54492

Richard found some N^2 behavior in SLSR that has to be suppressed.
Searching for the best possible basis is overkill when there are
hundreds of thousands of possibilities.  This patch constrains the
search to "good enough" in such cases.

Bootstrapped and tested on powerpc64-unknown-linux-gnu with no
regressions.  Ok for trunk?

Thanks,
Bill


2012-08-10  Bill Schmidt  

* gimple-ssa-strength-reduction.c (find_basis_for_candidate): Limit
the time spent searching for a basis.


Index: gcc/gimple-ssa-strength-reduction.c
===
--- gcc/gimple-ssa-strength-reduction.c (revision 191135)
+++ gcc/gimple-ssa-strength-reduction.c (working copy)
@@ -353,10 +353,14 @@ find_basis_for_candidate (slsr_cand_t c)
   cand_chain_t chain;
   slsr_cand_t basis = NULL;
 
+  // Limit potential of N^2 behavior for long candidate chains.
+  int iters = 0;
+  const int MAX_ITERS = 50;
+
   mapping_key.base_expr = c->base_expr;
   chain = (cand_chain_t) htab_find (base_cand_map, &mapping_key);
 
-  for (; chain; chain = chain->next)
+  for (; chain && iters < MAX_ITERS; chain = chain->next, ++iters)
 {
   slsr_cand_t one_basis = chain->cand;
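A minimal model of the bounded search, assuming (as Bill explains later in the thread) that new potential bases are pushed at the head of the chain, so the walk sees the most recent ones first. The names toy_chain, push_cand and find_basis_bounded are hypothetical, the usable flag stands in for SLSR's real legality checks, and the bound is passed in as an argument, which is roughly what turning the hard-coded 50 into a param would amount to.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy chain: new potential bases are pushed at the head.  */
struct toy_chain
{
  int cand_num;          /* Larger = added later.  */
  bool usable;           /* Stand-in for "may serve as a basis".  */
  struct toy_chain *next;
};

static struct toy_chain *
push_cand (struct toy_chain *head, struct toy_chain *node)
{
  node->next = head;
  return node;
}

/* Return the largest cand_num among usable entries found within the
   first MAX_ITERS chain entries, or -1 if none qualifies.  Entries past
   the bound are never examined, which caps the cost per candidate.  */
static int
find_basis_bounded (const struct toy_chain *chain, int max_iters)
{
  int best = -1;
  int iters = 0;

  for (; chain && iters < max_iters; chain = chain->next, ++iters)
    if (chain->usable && chain->cand_num > best)
      best = chain->cand_num;
  return best;
}
```

With entries pushed in the order 1, 2, 3, 4 and only entry 1 usable, a bound of 3 visits 4, 3 and 2 and gives up (-1), while a bound of 4 or more finds entry 1.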

Re: [PATCH] Fix PR54492

On Mon, 10 Sep 2012, William J. Schmidt wrote:

> Richard found some N^2 behavior in SLSR that has to be suppressed.
> Searching for the best possible basis is overkill when there are
> hundreds of thousands of possibilities.  This patch constrains the
> search to "good enough" in such cases.
> 
> Bootstrapped and tested on powerpc64-unknown-linux-gnu with no
> regressions.  Ok for trunk?

Hm, rather than stopping the search, can we stop adding new candidates
instead so the list never grows that long?  If that's not easy
the patch is ok as-is.

Thanks,
Richard.


-- 
Richard Biener 
SUSE / SUSE Labs
SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746
GF: Jeff Hawn, Jennifer Guild, Felix Imend

Re: [PATCH] Fix PR54492

2012-09-10 Thread Jakub Jelinek

On Mon, Sep 10, 2012 at 04:45:24PM +0200, Richard Guenther wrote:
> On Mon, 10 Sep 2012, William J. Schmidt wrote:
> 
> > Richard found some N^2 behavior in SLSR that has to be suppressed.
> > Searching for the best possible basis is overkill when there are
> > hundreds of thousands of possibilities.  This patch constrains the
> > search to "good enough" in such cases.
> > 
> > Bootstrapped and tested on powerpc64-unknown-linux-gnu with no
> > regressions.  Ok for trunk?
> 
> Hm, rather than stopping the search, can we stop adding new candidates
> instead so the list never grows that long?  If that's not easy
> the patch is ok as-is.

Don't we want a param for that, or is a hardcoded magic constant fine here?


Jakub

Re: [PATCH] Fix PR54492

On Mon, 2012-09-10 at 16:45 +0200, Richard Guenther wrote:
> On Mon, 10 Sep 2012, William J. Schmidt wrote:
> 
> > Richard found some N^2 behavior in SLSR that has to be suppressed.
> > Searching for the best possible basis is overkill when there are
> > hundreds of thousands of possibilities.  This patch constrains the
> > search to "good enough" in such cases.
> > 
> > Bootstrapped and tested on powerpc64-unknown-linux-gnu with no
> > regressions.  Ok for trunk?
> 
> Hm, rather than stopping the search, can we stop adding new candidates
> instead so the list never grows that long?  If that's not easy
> the patch is ok as-is.

I think this way is probably better.  Right now the potential bases are
organized as a stack, with new ones added to the front and considered
first.  Capping additions instead would require adding state to keep a
count, and then we would only ever look at the oldest (most distant)
ones.  This way the 50 most recently added potential bases, which are
the most likely to be local, are considered.

Thanks,
Bill

> 
> Thanks,
> Richard.
> 

Re: [PATCH] Fix PR54492

On Mon, 10 Sep 2012, Jakub Jelinek wrote:

> On Mon, Sep 10, 2012 at 04:45:24PM +0200, Richard Guenther wrote:
> > On Mon, 10 Sep 2012, William J. Schmidt wrote:
> > 
> > > Richard found some N^2 behavior in SLSR that has to be suppressed.
> > > Searching for the best possible basis is overkill when there are
> > > hundreds of thousands of possibilities.  This patch constrains the
> > > search to "good enough" in such cases.
> > > 
> > > Bootstrapped and tested on powerpc64-unknown-linux-gnu with no
> > > regressions.  Ok for trunk?
> > 
> > Hm, rather than stopping the search, can we stop adding new candidates
> > instead so the list never grows that long?  If that's not easy
> > the patch is ok as-is.
> 
> Don't we want a param for that, or is a hardcoded magic constant fine here?

I suppose a param for it would be nice.

Richard.


[Patch][AArch64] Expand binary operations' constant operands for neon intrinsics.



Hi,

This patch expands an Advanced SIMD intrinsic's operand into a constant operand 
only if the predicate allows it.


Regression-tested on aarch64-none-elf. OK for aarch64-branch?

Thanks,
Tejas Belagod
ARM.

Changelog

2012-09-10  Tejas Belagod  

gcc/
* config/aarch64/aarch64.c (aarch64_simd_expand_builtin): Expand binary
operations' constant operand only if the predicate allows it.

diff --git a/gcc/config/aarch64/aarch64-builtins.c b/gcc/config/aarch64/aarch64-builtins.c
index 04cc48a..731f369 100644
--- a/gcc/config/aarch64/aarch64-builtins.c
+++ b/gcc/config/aarch64/aarch64-builtins.c
@@ -1215,13 +1215,17 @@ aarch64_simd_expand_builtin (int fcode, tree exp, rtx target)
 
 case AARCH64_SIMD_BINOP:
   {
-   bool op1_const_int_p
- = CONST_INT_P (expand_normal (CALL_EXPR_ARG (exp, 1)));
-   return aarch64_simd_expand_args (target, icode, 1, exp,
-SIMD_ARG_COPY_TO_REG,
-op1_const_int_p ? SIMD_ARG_CONSTANT
-: SIMD_ARG_COPY_TO_REG,
-SIMD_ARG_STOP);
+rtx arg2 = expand_normal (CALL_EXPR_ARG (exp, 1));
+/* Handle constants only if the predicate allows it.  */
+   bool op1_const_int_p =
+ (CONST_INT_P (arg2)
+  && (*insn_data[icode].operand[2].predicate)
+   (arg2, insn_data[icode].operand[2].mode));
+   return aarch64_simd_expand_args
+ (target, icode, 1, exp,
+  SIMD_ARG_COPY_TO_REG,
+  op1_const_int_p ? SIMD_ARG_CONSTANT : SIMD_ARG_COPY_TO_REG,
+  SIMD_ARG_STOP);
   }
 
 case AARCH64_SIMD_TERNOP:
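A sketch of the decision the new BINOP code makes, using hypothetical stand-ins (toy_operand, toy_predicate, classify_binop_arg and the TOY_ARG_* kinds) for rtx operands, insn_data[icode].operand[2].predicate and the SIMD_ARG_* kinds. Before the patch any CONST_INT was passed through as a constant; after it, the constant is used only if the instruction's operand predicate accepts it, and is otherwise forced into a register.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the expander's argument kinds.  */
enum toy_arg_kind { TOY_ARG_COPY_TO_REG, TOY_ARG_CONSTANT };

/* A toy operand: either a register or an integer constant.  */
struct toy_operand
{
  bool is_const_int;
  long value;
};

typedef bool (*toy_predicate) (const struct toy_operand *);

/* Example predicate: an immediate shift count in the range 0..63.  */
static bool
toy_shift_imm_p (const struct toy_operand *op)
{
  return op->is_const_int && op->value >= 0 && op->value < 64;
}

/* Example predicate that only accepts registers.  */
static bool
toy_register_p (const struct toy_operand *op)
{
  return !op->is_const_int;
}

/* Treat the second operand as a constant only if it is a CONST_INT
   *and* the instruction's predicate accepts it.  */
static enum toy_arg_kind
classify_binop_arg (const struct toy_operand *op, toy_predicate pred)
{
  return (op->is_const_int && pred (op)) ? TOY_ARG_CONSTANT
					 : TOY_ARG_COPY_TO_REG;
}
```

For example, the constant 3 is kept as a constant for a shift-immediate operand, but the constant 100 (out of range) or any constant offered to a register-only operand is copied into a register.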

[Patch][AArch64] Split a move of Q-reg vectors contained in general regs.


Hi,

This patch fixes the mov pattern to split a move between general regs that
contain a Q-reg vector value.

Regression-tested on aarch64-none-elf. OK for aarch64-branch?

Thanks,
Tejas Belagod
ARM.

Changelog:

2012-09-10  Tejas Belagod  

gcc/
* config/aarch64/aarch64-simd.md (*aarch64_simd_mov): Split Q-reg
vector value move contained in general registers.

diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index d3f8ef2..1113b06 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -443,7 +443,7 @@
  case 2: return "orr\t%0., %1., %1.";
  case 3: return "umov\t%0, %1.d[0]\;umov\t%H0, %1.d[1]";
  case 4: return "ins\t%0.d[0], %1\;ins\t%0.d[1], %H1";
- case 5: return "mov\t%0, %1;mov\t%H0, %H1";
+ case 5: return "#";
  case 6:
{
int is_valid;
@@ -475,6 +475,27 @@
(set_attr "length" "4,4,4,8,8,8,4")]
 )
 
+(define_split
+  [(set (match_operand:VQ 0 "register_operand" "")
+  (match_operand:VQ 1 "register_operand" ""))]
+  "TARGET_SIMD && reload_completed
+   && GP_REGNUM_P (REGNO (operands[0]))
+   && GP_REGNUM_P (REGNO (operands[1]))"
+  [(set (match_dup 0) (match_dup 1))
+   (set (match_dup 2) (match_dup 3))]
+{
+  int rdest = REGNO (operands[0]);
+  int rsrc = REGNO (operands[1]);
+  rtx dest[2], src[2];
+
+  dest[0] = gen_rtx_REG (DImode, rdest);
+  src[0] = gen_rtx_REG (DImode, rsrc);
+  dest[1] = gen_rtx_REG (DImode, rdest + 1);
+  src[1] = gen_rtx_REG (DImode, rsrc + 1);
+
+  aarch64_simd_disambiguate_copy (operands, dest, src, 2);
+})
+
 (define_insn "orn3"
  [(set (match_operand:VDQ 0 "register_operand" "=w")
(ior:VDQ (not:VDQ (match_operand:VDQ 1 "register_operand" "w"))
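The define_split relies on aarch64_simd_disambiguate_copy (whose body is not part of this patch) to order the two DImode moves. Here is a plain C sketch of why the order matters, assuming a register file modelled as an array and a hypothetical copy_reg_pair helper. When the destination pair overlaps the source pair and starts above it, the high half has to be copied first, otherwise the low-half move overwrites a source word before it is read.

```c
#include <assert.h>

/* Copy a 128-bit value held in the register pair regs[src], regs[src + 1]
   into regs[dst], regs[dst + 1] using two 64-bit moves, choosing the
   order so that an overlapping destination never clobbers a source word
   that has not been read yet.  */
static void
copy_reg_pair (unsigned long long *regs, int dst, int src)
{
  if (dst > src)
    {
      /* Destination starts above the source: copy the high half first.  */
      regs[dst + 1] = regs[src + 1];
      regs[dst] = regs[src];
    }
  else
    {
      /* No overlap, or destination starts below: low half first is safe.  */
      regs[dst] = regs[src];
      regs[dst + 1] = regs[src + 1];
    }
}
```

For example, copying the pair at 0..1 to 1..2 moves register 1 to 2 before moving register 0 to 1, so both halves arrive intact.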

[Patch][AArch64] Tighten predicate for CMP pattern.


Hi,

This patch tightens the predicate for operand 2 of the CMP pattern so that it
accepts only a register or zero, as prescribed by the architecture.

Regression-tested on aarch64-none-elf. OK for aarch64-branch?

Thanks,
Tejas Belagod
ARM.

PS: This patch applies over vldn-vstn.txt sent out earlier.

Changelog:

2012-09-10  Tejas Belagod  

gcc/
* config/aarch64/aarch64-simd.md (aarch64_cm): Tighten
predicate for operand 2 of the compare pattern to accept register
or zero.
* config/aarch64/predicates.md (aarch64_simd_reg_or_zero): New.

diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index d3f8ef2..50114aa 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -2670,7 +2670,7 @@
   [(set (match_operand: 0 "register_operand" "=w,w")
 (unspec:
  [(match_operand:VSDQ_I_DI 1 "register_operand" "w,w")
-  (match_operand:VSDQ_I_DI 2 "nonmemory_operand" "w,Z")]
+  (match_operand:VSDQ_I_DI 2 "aarch64_simd_reg_or_zero" "w,Z")]
   VCMP_S))]
   "TARGET_SIMD"
   "@
diff --git a/gcc/config/aarch64/predicates.md b/gcc/config/aarch64/predicates.md
index 328e5cf..f40ab56 100644
--- a/gcc/config/aarch64/predicates.md
+++ b/gcc/config/aarch64/predicates.md
@@ -265,3 +265,10 @@
 {
   return aarch64_simd_shift_imm_p (op, mode, false);
 })
+
+(define_predicate "aarch64_simd_reg_or_zero"
+  (and (match_code "reg,subreg,const_int,const_vector")
+   (ior (match_operand 0 "register_operand")
+   (ior (match_test "op == const0_rtx")
+(match_test "aarch64_simd_imm_zero_p (op, mode)")))))
+
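A plain C model of what the new aarch64_simd_reg_or_zero predicate accepts, using a hypothetical miniature rtx (toy_rtx). It mirrors the match_code list and the three tests in the predicate but is not GCC code, and it simplifies by treating every SUBREG as a register. Compared with nonmemory_operand, the difference is that non-zero immediates are now rejected.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical miniature RTL: only the codes the predicate looks at.  */
enum toy_code { TOY_REG, TOY_SUBREG, TOY_CONST_INT, TOY_CONST_VECTOR, TOY_MEM };

struct toy_rtx
{
  enum toy_code code;
  long ival;          /* Value of a TOY_CONST_INT.  */
  size_t nelts;       /* Element count of a TOY_CONST_VECTOR.  */
  const long *elts;   /* Elements of a TOY_CONST_VECTOR.  */
};

/* Accept a register, the integer constant 0, or a constant vector whose
   elements are all 0; reject everything else, including non-zero
   immediates that nonmemory_operand would have accepted.  */
static bool
toy_simd_reg_or_zero (const struct toy_rtx *op)
{
  switch (op->code)
    {
    case TOY_REG:
    case TOY_SUBREG:
      return true;
    case TOY_CONST_INT:
      return op->ival == 0;
    case TOY_CONST_VECTOR:
      for (size_t i = 0; i < op->nelts; i++)
	if (op->elts[i] != 0)
	  return false;
      return true;
    default:
      return false;
    }
}
```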

RE: [PATCH] Merging Cilk Plus into Trunk (Patch 1 of approximately 22)

2012-09-10 Thread Joseph S. Myers

On Mon, 10 Sep 2012, Iyer, Balaji V wrote:

> So, I assume it is OK for me to include testsuites with the 
> code-changes? I included them separately because I remember someone in 
> the mailing list saying the patch size must be small and one logical way 
> is to put test cases separately from the code-changes.

I believe the message size limit for gcc-patches is 400 kB, and this patch 
is way below that.  But if the self-contained unit of changes exceeds 400 
kB, you should still post the whole thing at once in multiple messages so 
that a matched set of patches that could go in together is available for 
review.

-- 
Joseph S. Myers
jos...@codesourcery.com

[Patch][AArch64] Move immediate into Advanced SIMD scalar.



Hi,

This patch adds support for moving an immediate DImode value into an AdvSIMD
scalar D register, i.e. movi Dd, #imm.


Regression-tested on aarch64-none-elf. OK for aarch64-branch?

Thanks,
Tejas Belagod.
ARM.

Changelog:

2012-09-10  Tejas Belagod  

gcc/
* config/aarch64/aarch64-protos.h (aarch64_simd_imm_scalar_p): Declare.
* config/aarch64/aarch64.c (aarch64_simd_imm_scalar_p): New.
* config/aarch64/aarch64.md (*movdi_aarch64): Add alternative for moving
valid scalar immediate into an Advanced SIMD D-register.
* config/aarch64/constraints.md (Dd): New.

diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index afb8b1e..e6d35e4 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -178,6 +178,7 @@ bool aarch64_pad_arg_upward (enum machine_mode, const_tree);
 bool aarch64_pad_reg_upward (enum machine_mode, const_tree, bool);
 bool aarch64_regno_ok_for_base_p (int, bool);
 bool aarch64_regno_ok_for_index_p (int, bool);
+bool aarch64_simd_imm_scalar_p (rtx x, enum machine_mode mode);
 bool aarch64_simd_imm_zero_p (rtx, enum machine_mode);
 bool aarch64_simd_shift_imm_p (rtx, enum machine_mode, bool);
 bool aarch64_symbolic_address_p (rtx);
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 10af252..b90be6d 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -6508,6 +6508,23 @@ aarch64_simd_imm_zero_p (rtx x, enum machine_mode mode)
   return true;
 }
 
+bool
+aarch64_simd_imm_scalar_p (rtx x, enum machine_mode mode ATTRIBUTE_UNUSED)
+{
+  HOST_WIDE_INT imm = INTVAL (x);
+  int i;
+
+  for (i = 0; i < 8; i++)
+{
+  unsigned int byte = imm & 0xff;
+  if (byte != 0xff && byte != 0)
+   return false;
+  imm >>= 8;
+}
+
+  return true;
+}
+
 /* Return a const_int vector of VAL.  */
 rtx
 aarch64_simd_gen_const_vector_dup (enum machine_mode mode, int val)
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 8f52ed4..78a71fe 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -950,8 +950,8 @@
 )
 
 (define_insn "*movdi_aarch64"
-  [(set (match_operand:DI 0 "nonimmediate_operand" "=r,k,r,r,r,m, r,  r,  *w, r,*w")
-   (match_operand:DI 1 "aarch64_mov_operand"  " r,r,k,N,m,rZ,Usa,Ush,rZ,*w,*w"))]
+  [(set (match_operand:DI 0 "nonimmediate_operand" "=r,k,r,r,r,m, r,  r,  *w, r,*w,w")
+   (match_operand:DI 1 "aarch64_mov_operand"  " r,r,k,N,m,rZ,Usa,Ush,rZ,*w,*w,Dd"))]
   "(register_operand (operands[0], DImode)
 || aarch64_reg_or_zero (operands[1], DImode))"
   "@
@@ -965,10 +965,11 @@
adrp\\t%x0, %A1
fmov\\t%d0, %x1
fmov\\t%x0, %d1
-   fmov\\t%d0, %d1"
-  [(set_attr "v8type" "move,move,move,alu,load1,store1,adr,adr,fmov,fmov,fmov")
+   fmov\\t%d0, %d1
+   movi\\t%d0, %1"
+  [(set_attr "v8type" "move,move,move,alu,load1,store1,adr,adr,fmov,fmov,fmov,fmov")
(set_attr "mode" "DI")
-   (set_attr "fp" "*,*,*,*,*,*,*,*,yes,yes,yes")]
+   (set_attr "fp" "*,*,*,*,*,*,*,*,yes,yes,yes,yes")]
 )
 
 (define_insn "insv_imm"
diff --git a/gcc/config/aarch64/constraints.md b/gcc/config/aarch64/constraints.md
index 267b0b8..fe61307 100644
--- a/gcc/config/aarch64/constraints.md
+++ b/gcc/config/aarch64/constraints.md
@@ -159,3 +159,9 @@
  A constraint that matches vector of immediate zero."
  (and (match_code "const_vector")
   (match_test "aarch64_simd_imm_zero_p (op, GET_MODE (op))")))
+
+(define_constraint "Dd"
+  "@internal
+ A constraint that matches an immediate operand valid for AdvSIMD scalar."
+ (and (match_code "const_int")
+  (match_test "aarch64_simd_imm_scalar_p (op, GET_MODE (op))")))
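The check in aarch64_simd_imm_scalar_p can be restated as a standalone function on uint64_t (the name simd_imm_scalar_p is hypothetical). It accepts exactly the 64-bit values in which every byte is 0x00 or 0xff, the byte-mask form that the 64-bit MOVI Dd, #imm encoding can represent. Using an unsigned type also sidesteps right-shifting a negative HOST_WIDE_INT, which the original loop does for immediates with the top bit set (harmless there, since only eight bytes are inspected).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Return true if every byte of IMM is either 0x00 or 0xff.  */
static bool
simd_imm_scalar_p (uint64_t imm)
{
  for (int i = 0; i < 8; i++)
    {
      unsigned int byte = imm & 0xff;
      if (byte != 0xff && byte != 0)
	return false;
      imm >>= 8;   /* Logical shift: no sign bits are smeared in.  */
    }
  return true;
}
```

For instance, 0xff00ff0000ffff00 qualifies, while 0x12 and 0xff00000000000001 do not.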

[Patch][AArch64] Fix vfmaq_lane_f64.



Hi,

This patch fixes vfmaq_lane_f64 () AdvSIMD intrinsic.

Regression-tested on aarch64-none-elf. OK for aarch64-branch?

Thanks,
Tejas Belagod.
ARM.

Changelog:

2012-09-10  Tejas Belagod  

gcc/
* config/aarch64/arm_neon.h (vfmaq_lane_f64): Fix prototype and
assembler template accordingly.

diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
index de3a2f2..54eb29c 100644
--- a/gcc/config/aarch64/arm_neon.h
+++ b/gcc/config/aarch64/arm_neon.h
@@ -7859,15 +7859,16 @@ vfmaq_f64 (float64x2_t a, float64x2_t b, float64x2_t c)
result;  \
  })
 
-#define vfmaq_lane_f64(a, b, c) \
+#define vfmaq_lane_f64(a, b, c, d)  \
   __extension__ \
 ({  \
+   float64x2_t c_ = (c);\
float64x2_t b_ = (b);\
float64x2_t a_ = (a);\
float64x2_t result;  \
-   __asm__ ("fmla %0.2d,%1.2d,%2.d[%3]" \
+   __asm__ ("fmla %0.2d,%2.2d,%3.d[%4]" \
 : "=w"(result)  \
-: "w"(a_), "w"(b_), "i"(c)  \
+: "0"(a_), "w"(b_), "w"(c_), "i"(d) \
 : /* No clobbers */);   \
result;  \
  })
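A scalar reference model of what the corrected macro is meant to compute, with plain arrays standing in for float64x2_t and a hypothetical function name, fmla_lane_f64_ref. FMLA both reads and writes its destination, which is why the fixed template ties the accumulator to the output with the "0" constraint, takes the two vector inputs as %2 and %3, and takes the lane number as %4. The model uses a separate multiply and add, so for inexact inputs its rounding can differ from the fused instruction.

```c
#include <assert.h>

/* result[i] = a[i] + b[i] * c[lane] for both lanes, the operation
   FMLA Vd.2D, Vn.2D, Vm.D[lane] performs (modulo fused rounding).  */
static void
fmla_lane_f64_ref (double result[2], const double a[2], const double b[2],
		   const double c[2], int lane)
{
  assert (lane == 0 || lane == 1);
  for (int i = 0; i < 2; i++)
    result[i] = a[i] + b[i] * c[lane];
}
```

With a = {1, 2}, b = {3, 4} and c = {10, 20}, lane 1 gives {61, 82} and lane 0 gives {31, 42}.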

[Patch][AArch64] Implement vmovq_n_f64.



Hi,

This patch adds the missing intrinsic vmovq_n_f64(). OK?

Thanks,
Tejas Belagod
ARM.

Changelog:

2012-09-10  Tejas Belagod  

gcc/
* config/aarch64/arm_neon.h (vmovq_n_f64): Add.

diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
index e7dadf9..cf8b676 100644
--- a/gcc/config/aarch64/arm_neon.h
+++ b/gcc/config/aarch64/arm_neon.h
@@ -11753,6 +11753,12 @@ vmovq_n_f32 (float32_t a)
   return result;
 }
 
+__extension__ static __inline float64x2_t __attribute__ ((__always_inline__))
+vmovq_n_f64 (float64_t a)
+{
+  return (float64x2_t) {a, a};
+}
+
 __extension__ static __inline poly8x16_t __attribute__ ((__always_inline__))
 vmovq_n_p8 (uint32_t a)
 {

[Patch][AArch64] Fix Narrowing high shifts.


Hi,

The attached patch has fixes to assembler templates for rshrn2 and shrn2. OK?
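A scalar reference model of the unsigned 16-to-8-bit case, vrshrn_high_n_u16, with a hypothetical function name, rshrn_high_n_u16_ref. The rounding behaviour shown (add 1 << (n - 1) at full width, shift, keep the low 8 bits) is an assumption about RSHRN2, not something stated in the patch. The low half of the result is A unchanged and the high half holds the narrowed lanes of B. In the macros, "+w"(result) is operand 0, so b_ and c are %1 and %2; the old templates' %2 and %3 were off by one.

```c
#include <assert.h>
#include <stdint.h>

/* result[0..7] = a[0..7];
   result[8 + i] = low 8 bits of ((b[i] + (1 << (n - 1))) >> n).  */
static void
rshrn_high_n_u16_ref (uint8_t result[16], const uint8_t a[8],
		      const uint16_t b[8], int n)
{
  assert (n >= 1 && n <= 8);
  for (int i = 0; i < 8; i++)
    {
      result[i] = a[i];
      /* Add the rounding constant in 32 bits so the carry out of bit 15
	 is not lost before the shift.  */
      uint32_t wide = (uint32_t) b[i] + (1u << (n - 1));
      result[8 + i] = (uint8_t) (wide >> n);
    }
}
```

For example, with n = 8 the lane 0x0180 narrows to 2, and 0xffff rounds up to 0x100, whose low 8 bits are 0.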

Thanks,
Tejas Belagod.
ARM.

Changelog:

2012-09-10  Tejas Belagod  

gcc/
* config/aarch64/arm_neon.h (vrshrn_high_n_s16, vrshrn_high_n_s32,
vrshrn_high_n_s64, vrshrn_high_n_u16, vrshrn_high_n_u32,
vrshrn_high_n_u64, vshrn_high_n_s16, vshrn_high_n_s32,
vshrn_high_n_s64, vshrn_high_n_u16, vshrn_high_n_u32, vshrn_high_n_u64):
Fix template to reference correct operands.

diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
index 46abaf6..a4b2e78 100644
--- a/gcc/config/aarch64/arm_neon.h
+++ b/gcc/config/aarch64/arm_neon.h
@@ -15334,7 +15334,7 @@ vrndqp_f64 (float64x2_t a)
int8x8_t a_ = (a);   \
int8x16_t result = vcombine_s8   \
 (a_, vcreate_s8 (UINT64_C (0x0)));  \
-   __asm__ ("rshrn2 %0.16b,%2.8h,#%3"   \
+   __asm__ ("rshrn2 %0.16b,%1.8h,#%2"   \
 : "+w"(result)  \
 : "w"(b_), "i"(c)   \
 : /* No clobbers */);   \
@@ -15348,7 +15348,7 @@ vrndqp_f64 (float64x2_t a)
int16x4_t a_ = (a);  \
int16x8_t result = vcombine_s16  \
 (a_, vcreate_s16 (UINT64_C (0x0))); \
-   __asm__ ("rshrn2 %0.8h,%2.4s,#%3"\
+   __asm__ ("rshrn2 %0.8h,%1.4s,#%2"\
 : "+w"(result)  \
 : "w"(b_), "i"(c)   \
 : /* No clobbers */);   \
@@ -15362,7 +15362,7 @@ vrndqp_f64 (float64x2_t a)
int32x2_t a_ = (a);  \
int32x4_t result = vcombine_s32  \
 (a_, vcreate_s32 (UINT64_C (0x0))); \
-   __asm__ ("rshrn2 %0.4s,%2.2d,#%3"\
+   __asm__ ("rshrn2 %0.4s,%1.2d,#%2"\
 : "+w"(result)  \
 : "w"(b_), "i"(c)   \
 : /* No clobbers */);   \
@@ -15376,7 +15376,7 @@ vrndqp_f64 (float64x2_t a)
uint8x8_t a_ = (a);  \
uint8x16_t result = vcombine_u8  \
 (a_, vcreate_u8 (UINT64_C (0x0)));  \
-   __asm__ ("rshrn2 %0.16b,%2.8h,#%3"   \
+   __asm__ ("rshrn2 %0.16b,%1.8h,#%2"   \
 : "+w"(result)  \
 : "w"(b_), "i"(c)   \
 : /* No clobbers */);   \
@@ -15390,7 +15390,7 @@ vrndqp_f64 (float64x2_t a)
uint16x4_t a_ = (a); \
uint16x8_t result = vcombine_u16 \
 (a_, vcreate_u16 (UINT64_C (0x0))); \
-   __asm__ ("rshrn2 %0.8h,%2.4s,#%3"\
+   __asm__ ("rshrn2 %0.8h,%1.4s,#%2"\
 : "+w"(result)  \
 : "w"(b_), "i"(c)   \
 : /* No clobbers */);   \
@@ -15404,7 +15404,7 @@ vrndqp_f64 (float64x2_t a)
uint32x2_t a_ = (a); \
uint32x4_t result = vcombine_u32 \
 (a_, vcreate_u32 (UINT64_C (0x0))); \
-   __asm__ ("rshrn2 %0.4s,%2.2d,#%3"\
+   __asm__ ("rshrn2 %0.4s,%1.2d,#%2"\
 : "+w"(result)  \
 : "w"(b_), "i"(c)   \
 : /* No clobbers */);   \
@@ -16088,7 +16088,7 @@ vrsubhn_u64 (uint64x2_t a, uint64x2_t b)
int8x8_t a_ = (a);   \
int8x16_t result = vcombine_s8   \
 (a_, vcreate_s8 (UINT64_C (0x0)));  \
-   __asm__ ("shrn2 %0.16b,%2.8h,#%3"\
+   __asm__ ("shrn2 %0.16b,%1.8h,#

Re: [PATCH] Combine location with block using block_locations

2012-09-10 Thread Dehao Chen

On Mon, Sep 10, 2012 at 3:01 AM, Richard Guenther
 wrote:
> On Sun, Sep 9, 2012 at 12:26 AM, Dehao Chen  wrote:
>> Hi, Diego,
>>
>> Thanks a lot for the review. I've updated the patch.
>>
>> This patch is large and may easily break builds because it preserves
>> more complete information for TREE_BLOCK as well as gimple_block (which
>> may trigger bugs that were hidden while this information was
>> unavailable). I've done more rigorous testing to ensure that most bugs
>> are caught before checking in:
>>
>> * Synced to the head and reran the whole gcc testsuite.
>> * Ported the patch to the google-4_7 branch, reran the whole gcc
>> testsuite, and built many large applications.
>>
>> Through these tests, I found two additional bugs that were missed
>> in the original implementation. A new patch is attached (patch.txt) to
>> fix these problems. With this fix, all gcc testsuites pass on both
>> trunk and the google-4_7 branch. I've also pasted the new fixes
>> (lto.c and tree-cfg.c) below. I'd say this patch is now in good shape,
>> but it may not be perfect; I'll look into any build failures as soon as
>> they arise.
>>
>> Richard and Diego, could you help me take a look at the following two fixes?
>>
>> Thanks,
>> Dehao
>>
>> New fixes:
>> --- gcc/lto/lto.c   (revision 191083)
>> +++ gcc/lto/lto.c   (working copy)
>> @@ -1559,8 +1559,6 @@ lto_fixup_prevailing_decls (tree t)
>>  {
>>enum tree_code code = TREE_CODE (t);
>>LTO_NO_PREVAIL (TREE_TYPE (t));
>> -  if (CODE_CONTAINS_STRUCT (code, TS_COMMON))
>> -LTO_NO_PREVAIL (TREE_CHAIN (t));
>
> That change is odd.  Can you show us how it breaks?

This breaks the LTO build of gcc.c-torture/execute/pr38051.c.

It contains a data structure like:

  union { long int l; char c[sizeof (long int)]; } u;

Once the block info is preserved for it, this data structure is preserved
as well, and inside it there is a VAR_DECL. Thus the LTO_NO_PREVAIL
assertion on TREE_CHAIN (t) does not hold here.

>
>>if (DECL_P (t))
>>  {
>>LTO_NO_PREVAIL (DECL_NAME (t));
>>
>> Index: gcc/tree-cfg.c
>> ===
>> --- gcc/tree-cfg.c  (revision 191083)
>> +++ gcc/tree-cfg.c  (working copy)
>> @@ -5980,9 +5974,21 @@ move_stmt_op (tree *tp, int *walk_subtrees, void *
>>tree t = *tp;
>>
>>if (EXPR_P (t))
>> -/* We should never have TREE_BLOCK set on non-statements.  */
>> -gcc_assert (!TREE_BLOCK (t));
>> -
>> +{
>> +  tree block = TREE_BLOCK (t);
>> +  if (p->orig_block == NULL_TREE
>> + || block == p->orig_block
>> + || block == NULL_TREE)
>> +   TREE_SET_BLOCK (t, p->new_block);
>> +#ifdef ENABLE_CHECKING
>> +  else if (block != p->new_block)
>> +   {
>> + while (block && block != p->orig_block)
>> +   block = BLOCK_SUPERCONTEXT (block);
>> + gcc_assert (block);
>> +   }
>> +#endif
>
> I think what this means is that TREE_BLOCK on non-stmts are meaningless
> (thus only gimple_block is interesting on GIMPLE, not BLOCKs on trees).
>
> So instead of setting a BLOCK in some cases you should clear BLOCK
> if it happens to be set, or alternatively, only re-set it if there was
> a block associated
> with it.

Yeah, makes sense. New change:

@@ -5980,9 +5974,10 @@
   tree t = *tp;

   if (EXPR_P (t))
-    /* We should never have TREE_BLOCK set on non-statements.  */
-    gcc_assert (!TREE_BLOCK (t));
-
+    {
+      if (TREE_BLOCK (t))
+        TREE_SET_BLOCK (t, p->new_block);
+    }
   else if (DECL_P (t) || TREE_CODE (t) == SSA_NAME)
     {
       if (TREE_CODE (t) == SSA_NAME)

Thanks,
Dehao
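The two policies discussed above can be modelled in plain C. This is a minimal sketch under stated assumptions, not GCC code: `struct block` and `struct expr` are hypothetical stand-ins that keep only the enclosing-scope link (what BLOCK_SUPERCONTEXT returns) and an expression's attached BLOCK. `move_expr_block` follows the revised rule (only retarget a BLOCK that is already set), and `block_nested_in` mirrors the ENABLE_CHECKING walk from the first version.

```c
#include <stddef.h>

/* Hypothetical stand-in for a GCC BLOCK: only the link to the
   enclosing scope is modelled.  */
struct block
{
  struct block *supercontext;
};

/* Hypothetical stand-in for an expression that may carry a BLOCK.  */
struct expr
{
  struct block *block;
};

/* Revised rule: expressions without a BLOCK are left alone; a BLOCK
   that is set is retargeted to NEW_BLOCK.  */
static void
move_expr_block (struct expr *e, struct block *new_block)
{
  if (e->block != NULL)
    e->block = new_block;
}

/* Checking variant: walk outward through enclosing scopes and report
   whether ORIG is B itself or one of B's ancestors.  */
static int
block_nested_in (const struct block *b, const struct block *orig)
{
  while (b != NULL && b != orig)
    b = b->supercontext;
  return b != NULL;
}
```

Leaving BLOCK-less expressions untouched avoids attaching scope information to trees that never had it, which is the point of the review comment.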

>
> Richard.
>
>> +}
>>else if (DECL_P (t) || TREE_CODE (t) == SSA_NAME)
>>  {
>>if (TREE_CODE (t) == SSA_NAME)
>>
>> Whole patch:
>> gcc/ChangeLog:
>> 2012-09-08  Dehao Chen  
>>
>> * toplev.c (general_init): Init block_locations.
>> * tree.c (tree_set_block): New.
>> (tree_block): Change to use LOCATION_BLOCK.
>> * tree.h (TREE_SET_BLOCK): New.
>> * final.c (reemit_insn_block_notes): Change to use LOCATION_BLOCK.
>> (final_start_function): Likewise.
>> * input.c (expand_location_1): Likewise.
>> * input.h (LOCATION_LOCUS): New.
>> (LOCATION_BLOCK): New.
>> (IS_UNKNOWN_LOCATION): New.
>> * fold-const.c (expr_location_or): Change to use new location.
>> * reorg.c (emit_delay_sequence): Likewise.
>> (try_merge_delay_insns): Likewise.
>> * modulo-sched.c (dump_insn_location): Likewise.
>> * lto-streamer-out.c (lto_output_location_bitpack): Likewise.
>> * jump.c (rtx_renumbered_equal_p): Likewise.
>> * ifcvt.c (noce_try_move): Likewise.
>> (noce_try_store_flag): Likewise.
>> (noce_try_store_flag_constants): Likewise.
>> (noce_try_addcc): Likewise.
>> (noce_try_store_flag_mask): Likewise.
>> (noce_try_cmove): Likewise.
>> (noce_try_cmove_arith): L
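The input.h entries in the ChangeLog above (LOCATION_LOCUS, LOCATION_BLOCK) point at the basic idea of the patch: one location value carries both a source position and a BLOCK. The sketch below shows one possible encoding and is purely an assumption for illustration, not the patch's implementation: a hypothetical top-bit flag marks an index into a side table of (locus, block) pairs, and all helper names are made up. The property it illustrates is that any consumer needing a source position has to decode the value first instead of handing the packed form to line-map code.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t location_t;   /* Assumption: 32-bit location values.  */

/* Hypothetical side-table entry pairing a plain locus with a BLOCK.  */
struct locus_block
{
  location_t locus;
  const void *block;
};

#define COMBINED_FLAG 0x80000000u   /* Hypothetical "packed" marker.  */
#define MAX_COMBINED 16

static struct locus_block combined_table[MAX_COMBINED];
static unsigned n_combined;

/* Pack LOCUS and BLOCK into one value.  Without a block (or when the
   toy table is full) the plain locus is returned unchanged.  Assumes
   plain loci never have the top bit set.  */
static location_t
combine_location (location_t locus, const void *block)
{
  if (block == NULL || n_combined == MAX_COMBINED)
    return locus;
  combined_table[n_combined].locus = locus;
  combined_table[n_combined].block = block;
  return COMBINED_FLAG | n_combined++;
}

/* Recover the source position from a possibly packed location.  */
static location_t
location_locus (location_t loc)
{
  return (loc & COMBINED_FLAG) ? combined_table[loc & ~COMBINED_FLAG].locus
                               : loc;
}

/* Recover the BLOCK, or NULL for a plain location.  */
static const void *
location_block (location_t loc)
{
  return (loc & COMBINED_FLAG) ? combined_table[loc & ~COMBINED_FLAG].block
                               : NULL;
}
```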

[Patch][AArch64] Implement TARGET_SHIFT_TRUNCATION_MASK.


Hi,

The attached patch implements TARGET_SHIFT_TRUNCATION_MASK target hook.

Regression-tested on aarch64-none-elf. OK for aarch64-branch?

Thanks,
Tejas Belagod
ARM.

PS: This patch applies over vldn-vstn.txt sent earlier.

Changelog:

2012-09-10  Tejas Belagod  

gcc/
* config/aarch64/aarch64.c (aarch64_shift_truncation_mask): Define.
(TARGET_SHIFT_TRUNCATION_MASK): Define.
* config/aarch64/aarch64.h (SHIFT_COUNT_TRUNCATED): Conditionalize on
TARGET_SIMD.

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 20b23d2..7952530 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -6677,6 +6677,14 @@ aarch64_simd_attr_length_move (rtx insn)
   return 4;
 }
 
+static unsigned HOST_WIDE_INT
+aarch64_shift_truncation_mask (enum machine_mode mode)
+{
+  return
+(aarch64_vector_mode_supported_p (mode)
+ || aarch64_vect_struct_mode_p (mode)) ? 0 : (GET_MODE_BITSIZE (mode) - 1);
+}
+
 #ifndef TLS_SECTION_ASM_FLAG
 #define TLS_SECTION_ASM_FLAG 'T'
 #endif
@@ -6930,6 +6938,9 @@ aarch64_c_mode_for_suffix (char suffix)
 #undef TARGET_SECONDARY_RELOAD
 #define TARGET_SECONDARY_RELOAD aarch64_secondary_reload
 
+#undef TARGET_SHIFT_TRUNCATION_MASK
+#define TARGET_SHIFT_TRUNCATION_MASK aarch64_shift_truncation_mask
+
 #undef TARGET_SETUP_INCOMING_VARARGS
 #define TARGET_SETUP_INCOMING_VARARGS aarch64_setup_incoming_varargs
 
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index 28cafa9..8dfcd44 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -786,7 +786,7 @@ enum aarch64_builtins
: 0)
 
 
-#define SHIFT_COUNT_TRUNCATED 1
+#define SHIFT_COUNT_TRUNCATED !TARGET_SIMD
 
 /* Callee only saves lower 64-bits of a 128-bit register.  Tell the
compiler the callee clobbers the top 64-bits when restoring the
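
As background for the hook: GCC's documentation of TARGET_SHIFT_TRUNCATION_MASK says that a return value m means a fixed-width shift of x by y behaves like a shift by y & m, while 0 means no particular behaviour is guaranteed. A minimal C model of that contract, assuming scalar register shifts use only the low log2(width) bits of the count and that vector/structure modes report 0 as in the patch (the helper names are hypothetical):

```c
#include <stdint.h>

/* Hypothetical model of the hook's result: WIDTH - 1 for scalar
   modes, 0 (no guarantee) for vector or structure modes, mirroring
   the logic of aarch64_shift_truncation_mask in the patch.  */
static unsigned
shift_truncation_mask (unsigned width, int is_vector)
{
  return is_vector ? 0 : width - 1;
}

/* A 64-bit left shift that uses only the masked bits of the count,
   the way a truncating scalar shift instruction does.  With a mask of
   63, an explicit "& 63" on the count in the source is redundant.  */
static uint64_t
truncating_shl64 (uint64_t x, unsigned n)
{
  return x << (n & shift_truncation_mask (64, 0));
}
```

Returning 0 for vector modes is the conservative choice: it promises nothing, so the optimizers keep any masking the source wrote.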

Re: [PATCH, AArch64] Allow symbol+offset even if not being used for memory access

On 09/06/2012 10:19 AM, Ian Bolton wrote:
> Based on that, and assuming I remove the constraints on the
> pattern, would you say the patch is worthy of commit?

Can you send me the test case you were looking at for this?


r~

Re: [PATCH] Fix PR54492

On Mon, 2012-09-10 at 16:56 +0200, Richard Guenther wrote:
> On Mon, 10 Sep 2012, Jakub Jelinek wrote:
> 
> > On Mon, Sep 10, 2012 at 04:45:24PM +0200, Richard Guenther wrote:
> > > On Mon, 10 Sep 2012, William J. Schmidt wrote:
> > > 
> > > > Richard found some N^2 behavior in SLSR that has to be suppressed.
> > > > Searching for the best possible basis is overkill when there are
> > > > hundreds of thousands of possibilities.  This patch constrains the
> > > > search to "good enough" in such cases.
> > > > 
> > > > Bootstrapped and tested on powerpc64-unknown-linux-gnu with no
> > > > regressions.  Ok for trunk?
> > > 
> > > Hm, rather than stopping the search, can we stop adding new candidates
> > > instead so the list never grows that long?  If that's not easy
> > > the patch is ok as-is.
> > 
> > Don't we want a param for that, or is a hardcoded magic constant fine here?
> 
> I suppose a param for it would be nice.

OK, I'll get a param in place and get back to you.  Thanks...

Bill
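
The effect of capping the scan can be seen in a toy model. This is a sketch under assumptions, not the SLSR code: `struct cand_chain` is a hypothetical simplified chain, and "best basis" is reduced to "largest index seen". Capping at MAX_ITERS entries bounds the work per candidate, so N candidates cost O(N * MAX_ITERS) instead of O(N^2), at the price of possibly missing a better basis further down the chain.

```c
#include <stddef.h>

/* Hypothetical, simplified candidate chain node.  */
struct cand_chain
{
  int cand_index;
  struct cand_chain *next;
};

/* Return the best basis among at most MAX_ITERS chain entries, or -1
   if none was examined.  "Best" here is simply the largest index.  */
static int
find_basis_capped (const struct cand_chain *chain, int max_iters)
{
  int best = -1;
  int iters = 0;

  for (; chain && iters < max_iters; chain = chain->next, ++iters)
    if (chain->cand_index > best)
      best = chain->cand_index;

  return best;
}
```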

> 
> Richard.
> 
> > > > 2012-08-10  Bill Schmidt  
> > > > 
> > > > * gimple-ssa-strength-reduction.c (find_basis_for_candidate): Limit
> > > > the time spent searching for a basis.
> > > > 
> > > > 
> > > > Index: gcc/gimple-ssa-strength-reduction.c
> > > > ===
> > > > --- gcc/gimple-ssa-strength-reduction.c (revision 191135)
> > > > +++ gcc/gimple-ssa-strength-reduction.c (working copy)
> > > > @@ -353,10 +353,14 @@ find_basis_for_candidate (slsr_cand_t c)
> > > >cand_chain_t chain;
> > > >slsr_cand_t basis = NULL;
> > > >  
> > > > +  // Limit potential of N^2 behavior for long candidate chains.
> > > > +  int iters = 0;
> > > > +  const int MAX_ITERS = 50;
> > > > +
> > > >mapping_key.base_expr = c->base_expr;
> > > >chain = (cand_chain_t) htab_find (base_cand_map, &mapping_key);
> > > >  
> > > > -  for (; chain; chain = chain->next)
> > > > +  for (; chain && iters < MAX_ITERS; chain = chain->next, ++iters)
> > > >  {
> > > >slsr_cand_t one_basis = chain->cand;
> > 
> > Jakub
> > 
> > 
>

Re: [Patch ARM] implement bswap16

2012-09-10 Thread Christophe Lyon

On 7 September 2012 17:28, Richard Earnshaw  wrote:
>
> Ah, sigh!  I'd forgotten about the cond-exec issue.  That makes things
> a little awkward, since we also have to deal with the fact that thumb1
> does not support predication.  The solution, unfortunately, is thus a
> bit more involved.
>
Sorry if your suggestion makes me ask a few more questions :-)

> What we need are two patterns (although currently it looks like we've got two,
> in reality the predication means there are three), which need to read:
What is the advantage of the version you propose?
I mean there are already two explicit patterns, your proposal does not
really bring factorization since we end up with two patterns.

> (define_insn "*arm_revsh"
>   [(set (match_operand:SI 0 "s_register_operand" "=l,l,r")
> (sign_extend:SI (bswap:HI (match_operand:HI 1
> "s_register_operand" "l,l,r"))))]
>   "arm_arch6"
>   "@
>revsh\t%0, %1
>revsh%?\t%0, %1
>revsh%?\t%0, %1"
Why do we have to keep room for the predicate here? (%?) Doesn't this
pattern match only in unconditional cases?

BTW, I didn't manage to have GCC generate conditional revsh. I merely
added an "if (y)" guard before calling builtin_bswap16, but this
didn't turn into a conditional revsh.

>   [(set_attr "arch" "t1,t2,32")
>(set_attr "length" "2,2,4")])



> (define_insn "*arm_revsh_cond"
>   [(cond_exec (match_operator 2 "arm_comparison_operator"
>[(match_operand 3 "cc_register" "") (const_int 0)])
>   (set (match_operand:SI 0 "s_register_operand" "=l,r")
>(sign_extend:SI (bswap:HI (match_operand:HI 1
> "s_register_operand" "l,r")))))]
>   "TARGET_32BIT && arm_arch6"
>   "revsh%?\t%0, %1"
>   [(set_attr "arch" "t2,*")
>(set_attr "length" "2,4")])
>
> Note that this removes the "predicable" attribute as we now handle this
> manually rather than with the auto-generation.
>
> Sorry, this has turned out to be more complex than I originally realised.

I understand that this is also applicable to the existing arm_rev and
thumb1_rev patterns for 32 bit swaps. I'd like to understand the
rationale & implications of your proposal.

Thanks

Christophe.

Re: [PATCH] Merging Cilk Plus into Trunk (Patch 1 of approximately 22)

On 09/07/2012 02:00 PM, Iyer, Balaji V wrote:
> So, if I am understanding this correctly, there is no way to have the
> vectorization turned on/off on a function by function basis? I don't
> mind if it is turned off for -O0, but would like it be turned on/off
> for anything > -O1.

There's probably no reason that we can't enable vectorization on a loop
by loop basis.  Given that we're keeping a bit attached to the loop itself
for #pragma simd anyway.

This ought not be terribly difficult to arrange...

r~

Re: [PATCH] Merging Cilk Plus into Trunk (Patch 1 of approximately 22)

On 09/07/2012 12:31 PM, Iyer, Balaji V wrote:
> I hope I have not mistaken your question, but to clarify the
> elemental function's definition and body is visible to all passes
> after the invocation of gimplify_function_tree (). It is also visible
> for the LTO optimization.

If that's the case, what's the point in defining an external ABI and
defining what __attribute__((vector)) placed on a function declaration
means?

r~

Re: [PATCH 5/6] Thread pointer built-in functions, xtensa [PING]

2012-09-10 Thread Sterling Augustine

On Sun, Sep 9, 2012 at 11:31 PM, Chung-Lin Tang  wrote:
> On 2012/8/28 04:15 PM, Chung-Lin Tang wrote:
>> On 12/7/12 2:52 PM, Chung-Lin Tang wrote:
>> Xtensa parts updated to use MD pattern.
>>
>> Thanks,
>> Chung-Lin
>>
>> * config/xtensa/xtensa.md (get_thread_pointersi): Renamed from
>> load_tp.
>> (set_thread_pointersi): Renamed from set_tp.
>> * config/xtensa/xtensa.c
>> (xtensa_legitimize_tls_address): Change gen_load_tp calls to
>> gen_get_thread_pointersi.
>> (xtensa_builtin): Remove XTENSA_BUILTIN_THREAD_POINTER and
>> XTENSA_BUILTIN_SET_THREAD_POINTER.
>> (xtensa_init_builtins): Remove __builtin_thread_pointer,
>> __builtin_set_thread_pointer machine-specific builtins.
>> (xtensa_fold_builtin): Remove XTENSA_BUILTIN_THREAD_POINTER,
>> XTENSA_BUILTIN_SET_THREAD_POINTER cases.
>> (xtensa_expand_builtin): Remove XTENSA_BUILTIN_THREAD_POINTER,
>> XTENSA_BUILTIN_SET_THREAD_POINTER cases.
>>
>
> Ping.
>

This is OK for xtensa.

RE: [PATCH, AArch64] Allow symbol+offset even if not being used for memory access

2012-09-10 Thread Ian Bolton

> Can you send me the test case you were looking at for this?

See attached.  (Most of it is superfluous, but the point is that
we are not using the address to do a memory access.)

Cheers,
Ian
 

constant-test1.c
Description: Binary data
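
For readers without the attachment, a hypothetical illustration of the situation described (an address formed as symbol plus constant offset that is computed but not used for a memory access) is sketched below. It is not the attached constant-test1.c; the names are made up, and the byte offset of 16 assumes a 4-byte int.

```c
/* Hypothetical example: table+16 is formed and returned, but this
   function never loads or stores through it.  */
int table[16];

int *
element_addr (void)
{
  return &table[4];
}
```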

RE: [PATCH] Merging Cilk Plus into Trunk (Patch 1 of approximately 22)

2012-09-10 Thread Iyer, Balaji V



>-Original Message-
>From: Richard Henderson [mailto:r...@redhat.com]
>Sent: Monday, September 10, 2012 12:03 PM
>To: Iyer, Balaji V
>Cc: Richard Guenther; gcc-patches@gcc.gnu.org; Gabriel Dos Reis; Aldy
>Hernandez (al...@redhat.com); Jeff Law
>Subject: Re: [PATCH] Merging Cilk Plus into Trunk (Patch 1 of approximately 22)
>
>On 09/07/2012 12:31 PM, Iyer, Balaji V wrote:
>> I hope I have not mistaken your question, but to clarify the elemental
>> function's definition and body is visible to all passes after the
>> invocation of gimplify_function_tree (). It is also visible for the
>> LTO optimization.
>
>If that's the case, what's the point in defining an external ABI and defining
>what __attribute__((vector)) placed on a function declaration means?

When you have __attribute__((vector)) you are asking the compiler to create a 
vector AND a scalar version of the function. The advantage is that if the 
function is used, for example, in 2 loops where 1 can be vectorized and another 
cannot, the vectorizable loop won't suffer (i.e. suffer from being 
not-vectorized).

Thanks,

Balaji V. Iyer.

>
>
>r~

RE: [PATCH] Merging Cilk Plus into Trunk (Patch 1 of approximately 22)

2012-09-10 Thread Iyer, Balaji V



>-Original Message-
>From: Richard Henderson [mailto:r...@redhat.com]
>Sent: Monday, September 10, 2012 12:01 PM
>To: Iyer, Balaji V
>Cc: Jakub Jelinek; Andi Kleen; Richard Guenther; gcc-patches@gcc.gnu.org;
>Gabriel Dos Reis; Aldy Hernandez (al...@redhat.com); Jeff Law
>Subject: Re: [PATCH] Merging Cilk Plus into Trunk (Patch 1 of approximately 22)
>
>On 09/07/2012 02:00 PM, Iyer, Balaji V wrote:
>> So, if I am understanding this correctly, there is no way to have the
>> vectorization turned on/off on a function by function basis? I don't
>> mind if it is turned off for -O0, but would like it be turned on/off
>> for anything > -O1.
>
>There's probably no reason that we can't enable vectorization on a loop by loop
>basis.  Given that we're keeping a bit attached to the loop itself for #pragma
>simd anyway.
>
>This ought not be terribly difficult to arrange...

Can you please help me get a start on how this can be done? From what I
understand (please correct me if I am wrong), this requires rearranging and
duplicating a lot of passes and could potentially introduce a lot of bugs.

>
>
>r~

Re: [PATCH] Merging Cilk Plus into Trunk (Patch 1 of approximately 22)

On 09/10/2012 09:11 AM, Iyer, Balaji V wrote:
> Can you please help me get a start on how to get can be done? From
> what I understand (please correct me if I am wrong), this requires
> rearranging and duplicating a lot of passes and can potentially open
> up to a lot of bugs.

Certainly not duplicating passes.  And probably not even rearranging them.

The Important parts are:

  (1) Having a bit in "struct loop" that indicates the special semantics
  you have for #pragma simd.  I don't know if maybe all loops inside an
  elemental function are so automatically marked?

  (2) Have bits in "struct function" that summarize the contents of the
  bit from "struct loop", for all loops in the function.  Note that
  this bit would need to be updated during inlining.

  (3) Change the "gate" predicates for the relevant function to also check
  the bit from "struct function".  In some cases the pass might need
  to run globally (perhaps if-conversion?) and in some cases the pass
  might be able to restrict work to specific loops (e.g. the vectorizer),
  skipping loops for which the optimization is not enabled.
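
The three steps can be sketched with heavily simplified, hypothetical structures; none of the field or function names below are GCC's. The per-loop bit is summarised per function, the summary is OR-ed in on inlining (inlining can only add marked loops), and a gate checks the summary while the pass itself filters loop by loop.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for "struct loop" and "struct function".  */
struct loop
{
  bool simd_pragma;          /* (1) Loop marked with #pragma simd.  */
  struct loop *next;
};

struct function
{
  struct loop *loops;
  bool has_simd_loops;       /* (2) Summary of the per-loop bits.  */
};

/* Recompute the per-function summary from the per-loop bits.  */
static void
update_simd_summary (struct function *fn)
{
  fn->has_simd_loops = false;
  for (struct loop *l = fn->loops; l; l = l->next)
    if (l->simd_pragma)
      fn->has_simd_loops = true;
}

/* Inlining CALLEE into CALLER can only add marked loops.  */
static void
merge_summary_on_inline (struct function *caller,
                         const struct function *callee)
{
  caller->has_simd_loops |= callee->has_simd_loops;
}

/* (3) Pass gate: run if vectorization is enabled globally, or if some
   loop in this function asked for it.  */
static bool
gate_vectorize (const struct function *fn, bool vectorize_globally)
{
  return vectorize_globally || fn->has_simd_loops;
}

/* Per-loop filtering inside the pass itself.  */
static bool
vectorize_this_loop_p (const struct loop *l, bool vectorize_globally)
{
  return vectorize_globally || l->simd_pragma;
}
```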

r~

Re: [PATCH] Merging Cilk Plus into Trunk (Patch 1 of approximately 22)

On 09/10/2012 09:09 AM, Iyer, Balaji V wrote:
>> >If that's the case, what's the point in defining an external ABI and
>> >defining what __attribute__((vector)) placed on a function declaration means?

> When you have __attribute__((vector)) you are asking the compiler to
> create a vector AND a scalar version of the function. The advantage
> is that if the function is used, for example, in 2 loops where 1 can
> be vectorized and another cannot, the vectorizable loop won't suffer
> (i.e. suffer from being not-vectorized).

You've totally mis-understood my point.

Whether or not the compiler creates a clone COULD BE totally up to the
compiler, based on whether or not vectorization is enabled, whether the
loop has been analyzed such that vectorization may proceed, or indeed
the phase of the moon.

But in order for that to happen, the clone must be totally private to
the module for which we are generating code (in the LTO sense, this is
the entire program or dll; without LTO, this is just the object file).
It means that we never attempt to generate clones for functions for
which the body of the function is not visible.

On the other hand, if you insist on assuming a clone exists merely
because a declaration bears an attribute, then you must address ALL
of the problems with respect to defining a stable ABI in the face of
different cpu revisions, different ISAs, and different vector lengths.

I've not seen you address ANY of these problems, despite having the
problem pointed out multiple times.

r~

Re: [PATCH] Combine location with block using block_locations

2012-09-10 Thread Jan Hubicka

Hi,
I was curious how the patch behaves memory-wise when compiling Mozilla.  It
actually crashes on:
(gdb) bt
#0  0x7fab8cd70945 in raise () from /lib64/libc.so.6
#1  0x7fab8cd71f21 in abort () from /lib64/libc.so.6
#2  0x00b52330 in linemap_location_from_macro_expansion_p (set=0x7805, 
location=30725) at ../../libcpp/line-map.c:952
#3  0x00b526fc in linemap_lookup (set=0x7fab8dc34000, line=0) at 
../../libcpp/line-map.c:644
#4  0x00776745 in maybe_unwind_expanded_macro_loc (where=0, 
diagnostic=, context=) at 
../../gcc/tree-diagnostic.c:113
#5  virt_loc_aware_diagnostic_finalizer (context=0x11b8a80, 
diagnostic=0x7fff4d8adf90) at ../../gcc/tree-diagnostic.c:282
#6  0x00b4aa80 in diagnostic_report_diagnostic (context=0x11b8a80, 
diagnostic=0x7fff4d8adf90) at ../../gcc/diagnostic.c:652
#7  0x00b4acd6 in internal_error (gmsgid=) at 
../../gcc/diagnostic.c:957
#8  0x007555c0 in crash_signal (signo=11) at ../../gcc/toplev.c:335
#9  
#10 0x00b526e8 in linemap_lookup (set=0x7fab8dc34000, line=4294967295) 
at ../../libcpp/line-map.c:643
#11 0x00b530fa in linemap_location_in_system_header_p 
(set=0x7fab8dc34000, location=4294967295) at ../../libcpp/line-map.c:916
#12 0x00b4a8b2 in diagnostic_report_diagnostic (context=0x11b8a80, 
diagnostic=0x7fff4d8ae620) at ../../gcc/diagnostic.c:513
#13 0x00b4b462 in warning_at (location=, opt=0, 
gmsgid=) at ../../gcc/diagnostic.c:805
#14 0x00699679 in lto_symtab_merge_decls_2 (diagnosed_p=, slot=) at ../../gcc/lto-symtab.c:574
#15 lto_symtab_merge_decls_1 (slot=, data=) at 
../../gcc/lto-symtab.c:691
#16 0x00bd32e8 in htab_traverse_noresize (htab=, 
callback=0x698ed0 , info=0x0) at 
../../libiberty/hashtab.c:784
#17 0x004e2630 in read_cgraph_and_symbols (nfiles=2849, 
fnames=) at ../../gcc/lto/lto.c:1824
#18 0x004e2b75 in lto_main () at ../../gcc/lto/lto.c:2107

It seems that warning_at is not really able to look up the position.

Honza

Re: status of -fstack-protector-strong?

2012-09-10 Thread 沈涵

Hi, ping: could anyone take a look at this patch? Thanks,

-Han

On Fri, Sep 7, 2012 at 4:07 PM, Kees Cook  wrote:
>
> Hi,
>
> I'm curious about the status of this patch:
> http://gcc.gnu.org/ml/gcc-patches/2012-06/msg00974.html
>
> Chrome OS uses this, and the Ubuntu Security Team has expressed
> interest in it as well. What's needed to land this in gcc?
>
> Thanks!
>
> -Kees
>
> --
> Kees Cook
> Chrome OS Security




--
Han Shen |  Software Engineer |  shen...@google.com |  +1-650-440-3330

Re: [PATCH] Enable bbro for -Os

2012-09-10 Thread Eric Botcazou

> All other comments are accepted.
> 
> The updated patch is attached. Is it OK?

As you probably gathered, I had missed that Steven and Richard had already 
commented on your patch before posting my message.  Sorry about that...

I think that the patch is interesting because, even if it doesn't exactly 
implement what the comment in gate_handle_reorder_blocks was talking about, it 
fixes code layout regressions without increasing the code size (and even 
decreasing it).  So, assuming that Steven and Richard don't strongly oppose, I 
think the patch is OK modulo the following nits:

+   The above description is for the full algorithm, which is used when the
+   function is optimized for speed.  When the function is optimized for size,
+   in order to reduce long jump and connect more fall through edges, the

long jumps... bb-reorder.c uses "fallthru edges" consistently.

+   algorithm is modified as follows:
+   (1) Break long trace to short ones.  The trace is broken at a block, which
+   has multi-predecessors/successors during finding traces.

long traces... A trace is broken at a block that has multiple predecessors/ 
successors during trace discovery.

+   (2) Ignore the edge probability and frequency for fall through edges.

fallthru

+   (3) Keep its original order when there is no chance to fall through.  bbro

Keep the original order of blocks...  We rely on the results of cfg_cleanup

+   bases on the result of cfg_cleanup, which does lots of optimizations on cfg.
+   So the order is expected to be kept if no fall through.
+
+   To implement the change for code size optimization, block's index is
+   selected as the key and all traces are found in one round.


+ /* If the best destination has multiple successors or predecessors,
+don't allow it to be added when optimizing for size.  This makes
+sure predecessors with smaller index handled before the best
+destination.  It breaks long trace and reduces long jumps.

missing "are" before "handled"


+After removing the best edge, the final result will be ABCD/ACBD.
+It does not add jump compared with the previous order. But it
+reduce the possibility of long jump.  */

Double space before "But".


+  if (optimize_function_for_size_p (cfun))
+{
+  e_index = src_index_p ? e->src->index : e->dest->index;
+  b_index = src_index_p ? cur_best_edge->src->index
+ : cur_best_edge->dest->index;
+  /* The smaller one is better to keep the original order.  */
+  return b_index > e_index;
+} 

Trailing space after the last parenthesis.


+ /* If dest has multiple predecessors, skip it.  We expect
+that one predecessor with smaller index connect with it
+later.  */

connects


+ /* Only connect Trace n with Trace n + 1.  It is conservative
+to keep the order as close as possible to the original order.
+It also helps to reduce long jump.  */

long jumps


Thanks for working on this.

-- 
Eric Botcazou
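
The size-mode tie-breaker in the quoted hunk (prefer the edge whose relevant block index is smaller, so blocks tend to keep their original order) can be modelled on its own. This is a sketch: `struct edge_ix` is a hypothetical edge reduced to its endpoint indices, and only the comparison from the hunk is reproduced.

```c
#include <stdbool.h>

/* Hypothetical minimal edge: just the indices of its endpoints.  */
struct edge_ix
{
  int src_index;
  int dest_index;
};

/* When optimizing for size, E replaces CUR_BEST if its relevant block
   index (source or destination, per SRC_INDEX_P) is smaller.  */
static bool
better_edge_for_size_p (const struct edge_ix *e,
                        const struct edge_ix *cur_best, bool src_index_p)
{
  int e_index = src_index_p ? e->src_index : e->dest_index;
  int b_index = src_index_p ? cur_best->src_index : cur_best->dest_index;

  /* The smaller index is preferred to keep the original order.  */
  return b_index > e_index;
}
```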

Re: [PATCH] Merging Cilk Plus into Trunk (Patch 1 of approximately 22)

2012-09-10 Thread Andi Kleen

On Mon, Sep 10, 2012 at 09:30:15AM -0700, Richard Henderson wrote:
> On 09/10/2012 09:11 AM, Iyer, Balaji V wrote:
> > Can you please help me get a start on how to get can be done? From
> > what I understand (please correct me if I am wrong), this requires
> > rearranging and duplicating a lot of passes and can potentially open
> > up to a lot of bugs.
> 
> Certainly not duplicating passes.  And probably not even rearranging them.

It would be great if unrolling was also done loop by loop in a similar
way. I often wanted that (only enable it for some loop, not the whole file)
And a lot of other compilers have pragmas for this, just not gcc.
As I understand vectorization needs some unrolling anyways?

-Andi

Re: [patch, mips] New mips triplet for multilib linux builds

2012-09-10 Thread Steve Ellcey

On Sat, 2012-09-08 at 13:50 +0100, Richard Sandiford wrote:

> Should add BASE_DRIVER_SELF_SPECS too.  OK with that change, thanks.
> And thanks for your patience.
> 
> Richard

I added BASE_DRIVER_SELF_SPECS and did the checkin.  Thanks for all
your help and advice.

Steve Ellcey
sell...@mips.com

Re: [Patch ARM] implement bswap16

2012-09-10 Thread Richard Earnshaw

On 10/09/12 16:40, Christophe Lyon wrote:
> On 7 September 2012 17:28, Richard Earnshaw  wrote:
>>
>> Ah, sigh!  I'd forgotten about the cond-exec issue.  That makes things
>> a little awkward, since we also have to deal with the fact that thumb1
>> does not support predication.  The solution, unfortunately, is thus a
>> bit more involved.
>>
> Sorry if your suggestion makes me ask a few more questions :-)
> 

No problem :-)

>> What we need are two patterns (although currently it looks like we've got two,
>> in reality the predication means there are three), which need to read:
> What is the advantage of the version you propose?
> I mean there are already two explicit patterns, your proposal does not
> really bring factorization since we end up with two patterns.
> 

The code generated by recog would have to recognize three possible
patterns otherwise.  Predication effectively goes through the insn list
and generates additional patterns for each predicable insn.  You never
see them, but they're in there somewhere...

It's relatively minor but it does lead to a slightly simpler recognizer,
which should mean a smaller, faster compiler.

>> (define_insn "*arm_revsh"
>>   [(set (match_operand:SI 0 "s_register_operand" "=l,l,r")
>> (sign_extend:SI (bswap:HI (match_operand:HI 1
>> "s_register_operand" "l,l,r"))))]
>>   "arm_arch6"
>>   "@
>>revsh\t%0, %1
>>revsh%?\t%0, %1
>>revsh%?\t%0, %1"
> Why do we have to keep room for the predicate here? (%?) Doesn't this
> pattern match only in unconditional cases?
> 

Because the ARM back-end has a very late conditionalizer pass that can
also generate conditional execution.  It very rarely kicks in these
days, but if the predication rules are in there you could end up with an
instruction that the compiler thought was conditionally executed being
always run.  That would be bad^TM.


> BTW, I didn't manage to have GCC generate conditional revsh. I merely
> added an "if (y)" guard before calling builtin_bswap16, but this
> didn't turn into a conditional revsh.
> 
>>   [(set_attr "arch" "t1,t2,32")
>>(set_attr "length" "2,2,4")])
> 
> 
> 
>> (define_insn "*arm_revsh_cond"
>>   [(cond_exec (match_operator 2 "arm_comparison_operator"
>>[(match_operand 3 "cc_register" "") (const_int 0)])
>>   (set (match_operand:SI 0 "s_register_operand" "=l,r")
>>(sign_extend:SI (bswap:HI (match_operand:HI 1
>> "s_register_operand" "l,r")))))]
>>   "TARGET_32BIT && arm_arch6"
>>   "revsh%?\t%0, %1"
>>   [(set_attr "arch" "t2,*")
>>(set_attr "length" "2,4")])
>>
>> Note that this removes the "predicable" attribute as we now handle this
>> manually rather than with the auto-generation.
>>
>> Sorry, this has turned out to be more complex than I originally realised.
> 
> I understand that this is also applicable to the existing arm_rev and
> thumb1_rev patterns for 32 bit swaps. I'd like to understand the
> rationale & implications of your proposal.
> 
> Thanks
> 
> Christophe.
> 

R.
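
For reference, the operation both patterns describe, (sign_extend:SI (bswap:HI x)), can be modelled in portable C. This is a sketch for working out expected values (for example in a testcase around __builtin_bswap16), not part of the proposed patch; the helper name is hypothetical, and the narrowing is done arithmetically to avoid implementation-defined conversions.

```c
#include <stdint.h>

/* Swap the two bytes of a halfword, then sign-extend the 16-bit
   result to 32 bits, as REVSH does.  */
static int32_t
revsh_ref (uint16_t x)
{
  uint32_t swapped = ((uint32_t) (x & 0xff) << 8) | (uint32_t) (x >> 8);

  return swapped >= 0x8000 ? (int32_t) swapped - 0x10000
                           : (int32_t) swapped;
}
```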

Re: [PATCH] Combine location with block using block_locations

2012-09-10 Thread Dehao Chen

Thanks for helping test this. I'll try to build Mozilla to check the
memory consumption as well as find new bugs.

Dehao

On Tue, Sep 11, 2012 at 12:41 AM, Jan Hubicka  wrote:
> Hi,
> I was curious how the patch behaves memory wise on compilling Mozilla.  It 
> actually crashes on:
> (gdb) bt
> #0  0x7fab8cd70945 in raise () from /lib64/libc.so.6
> #1  0x7fab8cd71f21 in abort () from /lib64/libc.so.6
> #2  0x00b52330 in linemap_location_from_macro_expansion_p 
> (set=0x7805, location=30725) at ../../libcpp/line-map.c:952
> #3  0x00b526fc in linemap_lookup (set=0x7fab8dc34000, line=0) at 
> ../../libcpp/line-map.c:644
> #4  0x00776745 in maybe_unwind_expanded_macro_loc (where=0, 
> diagnostic=, context=) at 
> ../../gcc/tree-diagnostic.c:113
> #5  virt_loc_aware_diagnostic_finalizer (context=0x11b8a80, 
> diagnostic=0x7fff4d8adf90) at ../../gcc/tree-diagnostic.c:282
> #6  0x00b4aa80 in diagnostic_report_diagnostic (context=0x11b8a80, 
> diagnostic=0x7fff4d8adf90) at ../../gcc/diagnostic.c:652
> #7  0x00b4acd6 in internal_error (gmsgid=) at 
> ../../gcc/diagnostic.c:957
> #8  0x007555c0 in crash_signal (signo=11) at ../../gcc/toplev.c:335
> #9  
> #10 0x00b526e8 in linemap_lookup (set=0x7fab8dc34000, 
> line=4294967295) at ../../libcpp/line-map.c:643
> #11 0x00b530fa in linemap_location_in_system_header_p 
> (set=0x7fab8dc34000, location=4294967295) at ../../libcpp/line-map.c:916
> #12 0x00b4a8b2 in diagnostic_report_diagnostic (context=0x11b8a80, 
> diagnostic=0x7fff4d8ae620) at ../../gcc/diagnostic.c:513
> #13 0x00b4b462 in warning_at (location=, opt=0, 
> gmsgid=) at ../../gcc/diagnostic.c:805
> #14 0x00699679 in lto_symtab_merge_decls_2 (diagnosed_p=, slot=) at ../../gcc/lto-symtab.c:574
> #15 lto_symtab_merge_decls_1 (slot=, data=) at 
> ../../gcc/lto-symtab.c:691
> #16 0x00bd32e8 in htab_traverse_noresize (htab=, 
> callback=0x698ed0 , info=0x0) at 
> ../../libiberty/hashtab.c:784
> #17 0x004e2630 in read_cgraph_and_symbols (nfiles=2849, 
> fnames=) at ../../gcc/lto/lto.c:1824
> #18 0x004e2b75 in lto_main () at ../../gcc/lto/lto.c:2107
>
> It seems that warning_at is not really able to lookup the position.
>
> Honza

Re: [PATCH] Fix PR54492

Here's the revised patch with a param.  Bootstrapped and tested in the
same manner.  Ok for trunk?

Thanks,
Bill


2012-08-10  Bill Schmidt  

* doc/invoke.texi (max-slsr-cand-scan): New description.
* gimple-ssa-strength-reduction.c (find_basis_for_candidate): Limit
the time spent searching for a basis.
* params.def (PARAM_MAX_SLSR_CANDIDATE_SCAN): New param.


Index: gcc/doc/invoke.texi
===
--- gcc/doc/invoke.texi (revision 191135)
+++ gcc/doc/invoke.texi (working copy)
@@ -9407,6 +9407,11 @@ having a regular register file and accurate regist
 See @file{haifa-sched.c} in the GCC sources for more details.
 
 The default choice depends on the target.
+
+@item max-slsr-cand-scan
+Set the maximum number of existing candidates that will be considered when
+seeking a basis for a new straight-line strength reduction candidate.
+
 @end table
 @end table
 
Index: gcc/gimple-ssa-strength-reduction.c
===
--- gcc/gimple-ssa-strength-reduction.c (revision 191135)
+++ gcc/gimple-ssa-strength-reduction.c (working copy)
@@ -54,6 +54,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "domwalk.h"
 #include "pointer-set.h"
 #include "expmed.h"
+#include "params.h"
 
 /* Information about a strength reduction candidate.  Each statement
in the candidate table represents an expression of one of the
@@ -353,10 +354,14 @@ find_basis_for_candidate (slsr_cand_t c)
   cand_chain_t chain;
   slsr_cand_t basis = NULL;
 
+  // Limit potential of N^2 behavior for long candidate chains.
+  int iters = 0;
+  int max_iters = PARAM_VALUE (PARAM_MAX_SLSR_CANDIDATE_SCAN);
+
   mapping_key.base_expr = c->base_expr;
   chain = (cand_chain_t) htab_find (base_cand_map, &mapping_key);
 
-  for (; chain; chain = chain->next)
+  for (; chain && iters < max_iters; chain = chain->next, ++iters)
 {
   slsr_cand_t one_basis = chain->cand;
 
Index: gcc/params.def
===
--- gcc/params.def  (revision 191135)
+++ gcc/params.def  (working copy)
@@ -973,6 +973,13 @@ DEFPARAM (PARAM_SCHED_PRESSURE_ALGORITHM,
  "Which -fsched-pressure algorithm to apply",
  1, 1, 2)
 
+/* Maximum length of candidate scans in straight-line strength reduction.  */
+DEFPARAM (PARAM_MAX_SLSR_CANDIDATE_SCAN,
+ "max-slsr-cand-scan",
+ "Maximum length of candidate scans for straight-line "
+ "strength reduction",
+ 50, 1, 99)
+
 /*
 Local variables:
 mode:c

Re: VxWorks Patches Back from the Dead!

2012-09-10 Thread rbmj

On 9/10/2012 9:35 AM, Bruce Korb wrote:

> On 09/09/12 08:54, rbmj wrote:
>
>> Just because I *love* bothering everyone with emails...
>
> I don't mind, as long as you don't expect me to do anything
> until I'm certain you've stabilized the patch ;)
> I'm glad you rolled it up into one patch, because I was
> eventually going to ask you to do that. Thank you.

I keep thinking everything is stable, but then something changes
(bitrot? something elsewhere in GCC? I don't know) and I have to
regroup. Sorry for changing everything 10 times - please bear with me.

At this point, I've recompiled with different settings about 10 times
and it hasn't broken itself yet. I've tried in a different VM and it
works there too. So *hopefully* it should be good.

On the other hand, I've read this on the website:

> Don't mix together changes made for different reasons. Send them
> individually. Ideally, each change you send should be impossible to
> subdivide into parts that we might want to consider separately,
> because each of its parts gets its motivation from the other parts.

"impossible to subdivide into parts" seems like one patch per
fixincludes rule (am I looking at the wrong level of granularity here?).
At the same time, it's a pain in the rear to worry about 12 different
commits (especially when I'm making changes, I git rebase a TON). I'm
also not sure about practicality of this approach in terms of the amount
of work it creates on all ends.

Unless cosmic rays break everything again, that should be all.

Thanks,
Robert Mason

[alpha] Fix GPREL16 relocation error building glibc

There's an access within glibc wherein we've forced an important global 
variable into the .sdata section (so that accesses to its members can use 
16-bit relocations), but an array access gets constant-folded such that we 
produce an offset well outside of a 16-bit range.

A test case that must be visually inspected looks like the following.  I'm not 
certain how to turn this into a portable link-time test.  But considering that 
other ports also handle SYMBOL_REF_SMALL_DATA, it does seem like something 
portable would be nice.  Ideas?  Or should I just drop this in as an alpha 
specific test?

extern int x[10] __attribute__((visibility("hidden"), section(".sdata")));

int foo(int y)
{
  return x[y-10];
}


Committed the patch itself to mainline and 4.7 branch.



r~
diff --git a/gcc/config/alpha/predicates.md b/gcc/config/alpha/predicates.md
index 598742f..0a1885b 100644
--- a/gcc/config/alpha/predicates.md
+++ b/gcc/config/alpha/predicates.md
@@ -328,26 +328,50 @@
 (define_predicate "small_symbolic_operand"
   (match_code "const,symbol_ref")
 {
+  HOST_WIDE_INT ofs = 0, max_ofs = 0;
+
   if (! TARGET_SMALL_DATA)
-return 0;
+return false;
 
   if (GET_CODE (op) == CONST
   && GET_CODE (XEXP (op, 0)) == PLUS
   && CONST_INT_P (XEXP (XEXP (op, 0), 1)))
-op = XEXP (XEXP (op, 0), 0);
+{
+  ofs = INTVAL (XEXP (XEXP (op, 0), 1));
+  op = XEXP (XEXP (op, 0), 0);
+}
 
   if (GET_CODE (op) != SYMBOL_REF)
-return 0;
+return false;
 
   /* ??? There's no encode_section_info equivalent for the rtl
  constant pool, so SYMBOL_FLAG_SMALL never gets set.  */
   if (CONSTANT_POOL_ADDRESS_P (op))
-return GET_MODE_SIZE (get_pool_mode (op)) <= g_switch_value;
+{
+  max_ofs = GET_MODE_SIZE (get_pool_mode (op));
+  if (max_ofs > g_switch_value)
+   return false;
+}
+  else if (SYMBOL_REF_LOCAL_P (op)
+   && SYMBOL_REF_SMALL_P (op)
+   && !SYMBOL_REF_WEAK (op)
+   && !SYMBOL_REF_TLS_MODEL (op))
+{
+  if (SYMBOL_REF_DECL (op))
+max_ofs = tree_low_cst (DECL_SIZE_UNIT (SYMBOL_REF_DECL (op)), 1);
+}
+  else
+return false;
 
-  return (SYMBOL_REF_LOCAL_P (op)
- && SYMBOL_REF_SMALL_P (op)
- && !SYMBOL_REF_WEAK (op)
- && !SYMBOL_REF_TLS_MODEL (op));
+  /* Given that we know that the GP is always 8 byte aligned, we can
+ always adjust by 7 without overflowing.  */
+  if (max_ofs < 8)
+max_ofs = 8;
+
+  /* Since we know this is an object in a small data section, we know the
+ entire section is addressable via GP.  We don't know where the section
+ boundaries are, but we know the entire object is within.  */
+  return IN_RANGE (ofs, 0, max_ofs - 1);
 })
 
 ;; Return true if OP is a SYMBOL_REF or CONST referencing a variable
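The rewritten small_symbolic_operand accepts (const (plus sym ofs)) only when ofs falls inside the referenced object. The object size comes from the pool mode or DECL_SIZE_UNIT, with a floor of 8 bytes because GP is 8-byte aligned. The following plain-C model is a sketch of just that range test. The function name is hypothetical, and the size is passed in rather than read from the decl. It shows that any addend outside [0, size) is rejected, whether it is slightly negative or a large value produced by folding an index.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the final check in small_symbolic_operand.
   OFS is the constant addend of (const (plus sym ofs)); OBJ_SIZE is
   the object's size in bytes.  */
static bool
small_data_offset_ok (long ofs, long obj_size)
{
  long max_ofs = obj_size;

  assert (obj_size >= 0);

  /* GP is always 8-byte aligned, so adjusting by up to 7 cannot
     overflow even for tiny objects.  */
  if (max_ofs < 8)
    max_ofs = 8;

  /* Same test as IN_RANGE (ofs, 0, max_ofs - 1).  */
  return ofs >= 0 && ofs <= max_ofs - 1;
}
```

For a 40-byte object such as int x[10], small_data_offset_ok (12, 40) holds, while -40, 40, or a large folded addend all fail. A 4-byte object still admits offsets up to 7.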

Re: [PATCH] Add option for dumping to stderr (issue6190057)

2012-09-10 Thread Sharad Singhai

Ping.

Thanks,
Sharad


On Wed, Sep 5, 2012 at 10:34 AM, Sharad Singhai  wrote:
> Ping.
>
> Thanks,
> Sharad
>
> On Fri, Aug 24, 2012 at 1:06 AM, Sharad Singhai  wrote:
>>
>> Sorry about the delay. Please see comments inline.
>>
>> On Wed, Jul 4, 2012 at 6:33 AM, Richard Guenther
>>  wrote:
>> > On Tue, Jul 3, 2012 at 11:07 PM, Sharad Singhai 
>> > wrote:
>> >> Apologies for the spam. Attempting to resend the patch after shrinking
>> >> it.
>> >>
>> >> I have updated the attached patch to use a new dump message
>> >> classification system for the vectorizer. It currently uses four
>> >> classes, viz., MSG_OPTIMIZED_LOCATIONS, MSG_UNOPTIMIZED_LOCATIONS,
>> >> MSG_MISSED_OPTIMIZATION, and MSG_NOTE. I have gone through the
>> >> vectorizer passes and have converted each call to fprintf (dump_file,
>> >> ) to a message classification matching in spirit. Most often, it
>> >> is MSG_OPTIMIZED_LOCATIONS, but occasionally others as well.
>> >>
>> >> For example, the following
>> >>
>> >> if (vect_print_dump_info (REPORT_DETAILS))
>> >>   {
>> >> fprintf (vect_dump, "niters for prolog loop: ");
>> >> print_generic_expr (vect_dump, iters, TDF_SLIM);
>> >>   }
>> >>
>> >> gets converted to
>> >>
>> >> if (dump_kind (MSG_OPTIMIZED_LOCATIONS))
>> >>   {
>> >>  dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, vect_location,
>> >>   "niters for prolog loop: ");
>> >>  dump_generic_expr (MSG_OPTIMIZED_LOCATIONS, TDF_SLIM, iters);
>> >>   }
>> >>
>> >> The asymmetry between the first printf and the second is due to the
>> >> fact that 'vect_print_dump_info (xxx)' prints the location as a
>> >> "side-effect". To preserve the original intent somewhat, I have
>> >> converted the first call within a dump sequence to a dump_printf_loc
>> >> (xxx) which prints the location while the subsequent calls within the
>> >> same conditional get converted to the corresponding plain variants.
>> >
>> > Ok, that looks reasonable.
>> >
>> >> I considered removing the support for alternate dump file, but ended
>> >> up preserving it instead since it is needed for setting the alternate
>> >> dump file to stderr for the case when -fopt-info is given but no dump
>> >> file is available.
>> >>
>> >> The following invocation
>> >> g++ ... -ftree-vectorize -fopt-info=4
>> >>
>> >> dumps *all* available information to stderr. Currently, the opt-info
>> >> level is common to all passes, i.e., a pass can't specify if it wants a
>> >> different level of diagnostic info. This can be added as an
>> >> enhancement with a suitable syntax for selecting passes.
>> >>
>> >> I haven't fixed up the documentation/tests but wanted to get some
>> >> feedback about the current state of patch before doing that.
>> >
>> > Some comments / questions.
>> >
>> > +  if (dump_file && (dump_kind & opt_info_flags))
>> > +{
>> > +  dump_loc (dump_kind, dump_file, loc);
>> > +  print_generic_expr (dump_file, t, dump_flags | extra_dump_flags);
>> > +}
>> > +
>> > +  if (alt_dump_file && (dump_kind & opt_info_flags))
>> > +{
>> >
>> > you always test dump_kind against the same opt_info_flags variable.
>> > I would have thought that the alternate dump file has a different
>> > opt_info_flags
>> > setting so I can have -fdump-tree-vect-details -fopt-info=1.  Am I
>> > missing
>> > something?
>>
>> It was an oversight on my part. I have since fixed this. There are two
>> separate flags corresponding to the two types of dump files,
>>
>> pflags ==> pass private dump file
>> alt_flags ==> opt-info dump file
>>
>> > If I do
>> >
>> >> gcc file1.c file2.c -O3 -fdump-tree-vectorize=foo
>> >
>> > what will foo contain afterwards?  I think you need to document the
>> > behavior
>> > when such redirection is used with the compiler-driver feature of
>> > handling
>> > multiple translation units.  Especially the difference (or not
>> > difference) to
>> >
>> >> gcc file1.c -O3 -fdump-tree-vectorize=foo
>> >> gcc file2.c -O3 -fdump-tree-vectorize=foo
>>
>> Yes, the dump file gets overwritten during each invocation. I have
>> noted this in the documentation.
>>
>> > I suppose we do not want to append to foo (but eventually support that
>> > with some alternate syntax?  Like -fdump-tree-vectorize=+foo?)
>>
>> Yes, I agree. We could define a new syntax as you suggested for
>> appending to a dump file. However, this feature can wait for a
>> separate patch.
>>
>> > +
>> > +static void
>> > +set_opt_info (int opt_info_level)
>> > +{
>> >
>> > This function needs a comment.  How do the dump flags translate to
>> > the opt-info flags?  Is this documented anywhere?  I only see
>> >
>> > +/* Different types of dump classifications.  */
>> > +enum dump_msg_kind {
>> > +  MSG_NONE = 1 << 0,
>> > +  MSG_OPTIMIZED_LOCATIONS  = 1 << 1,
>> > +  MSG_UNOPTIMIZED_LOCATIONS= 1 << 2,
>> > +  MSG_MISSED_OPTIMIZATION  = 1 << 3,
>> > +  MSG_NOTE = 1 << 4
>> > +};
>>
>> Yes, my mapping was s
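Richard's question is about each dump stream having its own filter. Sharad's answer is that there are now two flag sets, pflags for the pass-private dump file and alt_flags for the opt-info file. Below is a minimal sketch of that gating, using the enum values quoted above. The globals and the dump_msg helper are hypothetical and do not reproduce the patch's actual dump_printf_loc implementation. The helper only illustrates how one message kind is tested against each stream's own mask, so that -fdump-tree-vect-details and -fopt-info=1 can select different messages.

```c
#include <assert.h>
#include <stdio.h>

/* Message kinds, bit values as in the quoted enum dump_msg_kind.  */
enum dump_msg_kind {
  MSG_NONE                  = 1 << 0,
  MSG_OPTIMIZED_LOCATIONS   = 1 << 1,
  MSG_UNOPTIMIZED_LOCATIONS = 1 << 2,
  MSG_MISSED_OPTIMIZATION   = 1 << 3,
  MSG_NOTE                  = 1 << 4
};

/* Hypothetical globals: one stream and one filter mask per dump.  */
static FILE *dump_file, *alt_dump_file;
static int pflags, alt_flags;

/* Write MSG to every stream whose own mask accepts KIND.  Returns a
   bitmask: 1 if the private dump was written, 2 if the alternate
   (opt-info) dump was written.  */
static int
dump_msg (enum dump_msg_kind kind, const char *msg)
{
  int written = 0;

  assert (msg != NULL);
  if (dump_file && (kind & pflags))
    {
      fputs (msg, dump_file);
      written |= 1;
    }
  if (alt_dump_file && (kind & alt_flags))
    {
      fputs (msg, alt_dump_file);
      written |= 2;
    }
  return written;
}
```

With pflags accepting every kind and alt_flags accepting only MSG_OPTIMIZED_LOCATIONS, a MSG_NOTE reaches only the private dump. A MSG_OPTIMIZED_LOCATIONS message reaches both dumps.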

Re: [Patch, fortran] PR46897 - [OOP] type-bound defined ASSIGNMENT(=) not used for derived type component in intrinsic assign

2012-09-10 Thread Paul Richard Thomas

Dear All,

Please find attached a new attempt at the patch for PR46897.  It now
uses temporaries to overcome the side effects that Mikael pointed out.
 The resulting code can be quite profligate:

  infant0 = new_child()

produces

  ASSIGN main:da@0 new_child[[()]]
  ASSIGN main:da@1 main:infant0
  ASSIGN main:da@2 main:infant0
  ASSIGN main:infant0 main:da@0
  ASSIGN main:da@3 main:da@1 % parent
  ASSIGN main:da@4 main:da@1 % parent
  CALL assign0 ((main:da@3 % foo) (main:da@0 % parent % foo))
  ASSIGN main:da@1 % parent % foo main:da@3 % foo
  ASSIGN main:infant0 % parent main:da@1 % parent

It could be simplified, I suspect, but I do not believe that it is
worth any more effort for what is, after all, well off the beaten
track.

The comments in resolve.c explain how the patch works.

Bootstrapped and regtested on FC9/x86_64 - OK for trunk?

Cheers

Paul

2012-09-10   Alessandro Fanfarillo 
 Paul Thomas  

PR fortran/46897
* gfortran.h : Add bit field 'defined_assign_comp' to
symbol_attribute structure.
Add primitive for gfc_add_full_array_ref.
* expr.c (gfc_add_full_array_ref): New function.
(gfc_lval_expr_from_sym): Call new function.
* resolve.c (add_comp_ref): New function.
(build_assignment): New function.
(get_temp_from_expr): New function.
(add_code_to_chain): New function.
(generate_component_assignments): New function that calls all
the above new functions.
(resolve_code): Call generate_component_assignments.

2012-09-10   Alessandro Fanfarillo 
 Paul Thomas  

PR fortran/46897
* gfortran.dg/defined_assignment_1.f90: New test.
* gfortran.dg/defined_assignment_2.f90: New test.
* gfortran.dg/defined_assignment_3.f90: New test.
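The expr.c part of this patch adds gfc_add_full_array_ref. The visible part of its hunk walks the gfc_ref chain to the last node and appends a new reference there. Below is a minimal sketch of that tail-append pattern with hypothetical struct types in place of gfc_expr and gfc_ref. The archived hunk is cut off before its else branch, so the empty-chain handling here is our own assumption, not the patch's code.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for gfc_ref and gfc_expr.  */
struct ref { int kind; struct ref *next; };
struct expr { struct ref *ref; };

/* Append a new ref of KIND at the end of E's reference chain and
   return it.  */
static struct ref *
append_ref (struct expr *e, int kind)
{
  struct ref *r;
  struct ref *n;

  assert (e != NULL);
  n = calloc (1, sizeof *n);
  if (n == NULL)
    abort ();
  n->kind = kind;

  /* Walk to the last node; R stays NULL when the chain is empty.  */
  for (r = e->ref; r; r = r->next)
    if (!r->next)
      break;

  if (r)
    r->next = n;
  else
    e->ref = n;   /* Assumed: an empty chain gets N as its head.  */
  return n;
}
```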



On 14/08/2012, Paul Richard Thomas  wrote:
> Mikael,
>
> On 14 August 2012 10:42, Mikael Morin  wrote:
>> On 14/08/2012 07:03, Paul Richard Thomas wrote:
>>>> However, if we do it before, we also overwrite components to be
>>>> assigned with a typebound call, and this can have some side effects
>>>> as the LHS's argument can be INTENT(INOUT).
>>>
>>> This might be so but it is what the standard dictates should
>>> happen isn't it?
>>>
>> It dictates that the components should be assigned one by one (by either
>> defined or intrinsic assignment), which I don't see as strictly
>> equivalent to a whole structure assignment followed by typebound calls
>> (for components needing it).
>
> Hmmm.  That's true.  ***sigh***
>
> I'll put it right.
>
> Cheers
>
> Paul
>
Index: gcc/fortran/gfortran.h
===
*** gcc/fortran/gfortran.h	(revision 191115)
--- gcc/fortran/gfortran.h	(working copy)
*** typedef struct
*** 786,794 
/* The symbol is a derived type with allocatable components, pointer 
   components or private components, procedure pointer components,
   possibly nested.  zero_comp is true if the derived type has no
!  component at all.  */
unsigned alloc_comp:1, pointer_comp:1, proc_pointer_comp:1,
! 	   private_comp:1, zero_comp:1, coarray_comp:1, lock_comp:1;
  
/* This is a temporary selector for SELECT TYPE.  */
unsigned select_type_temporary:1;
--- 786,796 
/* The symbol is a derived type with allocatable components, pointer 
   components or private components, procedure pointer components,
   possibly nested.  zero_comp is true if the derived type has no
!  component at all.  defined_assign_comp is true if the derived
!  type or an ancestor has a typebound defined assignment.  */
unsigned alloc_comp:1, pointer_comp:1, proc_pointer_comp:1,
! 	   private_comp:1, zero_comp:1, coarray_comp:1, lock_comp:1,
! 	   defined_assign_comp:1;
  
/* This is a temporary selector for SELECT TYPE.  */
unsigned select_type_temporary:1;
*** gfc_try gfc_check_assign_symbol (gfc_sym
*** 2761,2766 
--- 2763,2769 
  bool gfc_has_default_initializer (gfc_symbol *);
  gfc_expr *gfc_default_initializer (gfc_typespec *);
  gfc_expr *gfc_get_variable_expr (gfc_symtree *);
+ void gfc_add_full_array_ref (gfc_expr *, gfc_array_spec *);
  gfc_expr * gfc_lval_expr_from_sym (gfc_symbol *);
  
  gfc_array_spec *gfc_get_full_arrayspec_from_expr (gfc_expr *expr);
Index: gcc/fortran/expr.c
===
*** gcc/fortran/expr.c	(revision 191115)
--- gcc/fortran/expr.c	(working copy)
*** gfc_get_variable_expr (gfc_symtree *var)
*** 3878,3883 
--- 3878,3910 
  }
  
  
+ /* Adds a full array reference to an expression, as needed.  */
+ 
+ void
+ gfc_add_full_array_ref (gfc_expr *e, gfc_array_spec *as)
+ {
+   gfc_ref *ref;
+   for (ref = e->ref; ref; ref = ref->next)
+ if (!ref->next)
+   break;
+   if (ref)
+ {
+   ref->next = gfc_get_ref ();
+   ref = ref->next;
+ }
+   else
+ {
+   e

Re: [PATCH, AArch64] Allow symbol+offset even if not being used for memory access

On 09/10/2012 09:09 AM, Ian Bolton wrote:
>> Can you send me the test case you were looking at for this?
> 
> See attached.  (Most of it is superfluous, but the point is that
> we are not using the address to do a memory access.)

Ok.  Having dug a bit deeper I think the main problem is that you're
working against yourself by not handling this pattern right from the
beginning.  You have split the address incorrectly to begin with and are
now trying to recover after the fact.

The following patch seems to do the trick for me, producing

> (insn 6 5 7 (set (reg:DI 81)
> (high:DI (const:DI (plus:DI (symbol_ref:DI ("arr") [flags 0x80]  
> )
> (const_int 12 [0xc]) z.c:8 -1
>  (nil))
> 
> (insn 7 6 8 (set (reg:DI 80)
> (lo_sum:DI (reg:DI 81)
> (const:DI (plus:DI (symbol_ref:DI ("arr") [flags 0x80]   0x7f9bae1105f0 arr>)
> (const_int 12 [0xc]) z.c:8 -1
>  (expr_list:REG_EQUAL (const:DI (plus:DI (symbol_ref:DI ("arr") [flags 
> 0x80]  )
> (const_int 12 [0xc])))
> (nil)))

right from the .150.expand dump.
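The HIGH/LO_SUM pair in that dump is meant to become a page-address instruction followed by an "add ..., :lo12:". Both halves are computed from the same arr+12. The following sketch is simple address arithmetic, assuming 4 KiB pages. It ignores the PC-relative encoding and models only the resulting address. It shows why taking the high part of the bare symbol and the low 12 bits of symbol+offset can go wrong. The "mismatched" helper is hypothetical and illustrative. It is not a claim about exactly which mis-split the original patch produced.

```c
#include <assert.h>
#include <stdint.h>

/* 4 KiB page containing ADDR: what the HIGH part provides.  */
static uint64_t
high_part (uint64_t addr)
{
  return addr & ~(uint64_t) 0xfff;
}

/* Low 12 bits of ADDR: what the :lo12: add provides.  */
static uint64_t
lo12_part (uint64_t addr)
{
  return addr & 0xfff;
}

/* HIGH and LO_SUM both taken from SYM + OFFSET, as in the dump.  */
static uint64_t
materialize (uint64_t sym, int64_t offset)
{
  uint64_t target = sym + (uint64_t) offset;
  uint64_t result = high_part (target) + lo12_part (target);

  /* Page plus low bits always reassemble the full address.  */
  assert (result == target);
  return result;
}

/* Hypothetical mis-split: page of the bare symbol, low bits of
   SYM + OFFSET.  Wrong whenever OFFSET crosses a page boundary.  */
static uint64_t
mismatched (uint64_t sym, int64_t offset)
{
  return high_part (sym) + lo12_part (sym + (uint64_t) offset);
}
```

For a symbol at 0x1ff8, materialize (0x1ff8, 12) yields 0x2004, while mismatched (0x1ff8, 12) yields 0x1004. When the offset stays within the symbol's page, the two agree.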

I'll leave it to you to fully regression test and commit the patch
as appropriate.  ;-)


r~
Index: aarch64.md
===
--- aarch64.md  (revision 191152)
+++ aarch64.md  (working copy)
@@ -2840,7 +2840,7 @@
(lo_sum:DI (match_operand:DI 1 "register_operand" "r")
   (match_operand 2 "aarch64_valid_symref" "S")))]
   ""
-  "add\\t%0, %1, :lo12:%2"
+  "add\\t%0, %1, :lo12:%a2"
   [(set_attr "v8type" "alu")
(set_attr "mode" "DI")]
 
Index: aarch64.c
===
--- aarch64.c   (revision 191152)
+++ aarch64.c   (working copy)
@@ -652,43 +652,57 @@
   unsigned HOST_WIDE_INT val;
   bool subtargets;
   rtx subtarget;
-  rtx base, offset;
   int one_match, zero_match;
 
   gcc_assert (mode == SImode || mode == DImode);
 
-  /* If we have (const (plus symbol offset)), and that expression cannot
- be forced into memory, load the symbol first and add in the offset.  */
-  split_const (imm, &base, &offset);
-  if (offset != const0_rtx
-  && (targetm.cannot_force_const_mem (mode, imm)
- || (can_create_pseudo_p (
+  /* Check on what type of symbol it is.  */
+  if (GET_CODE (imm) == SYMBOL_REF
+  || GET_CODE (imm) == LABEL_REF
+  || GET_CODE (imm) == CONST)
 {
-  base = aarch64_force_temporary (dest, base);
-  aarch64_emit_move (dest, aarch64_add_offset (mode, NULL, base, INTVAL (offset)));
-  return;
-}
+  rtx mem, base, offset;
+  enum aarch64_symbol_type sty;
 
-  /* Check on what type of symbol it is.  */
-  if (GET_CODE (base) == SYMBOL_REF || GET_CODE (base) == LABEL_REF)
-{
-  rtx mem;
-  switch (aarch64_classify_symbol (base, SYMBOL_CONTEXT_ADR))
+  /* If we have (const (plus symbol offset)), separate out the offset
+before we start classifying the symbol.  */
+  split_const (imm, &base, &offset);
+
+  sty = aarch64_classify_symbol (base, SYMBOL_CONTEXT_ADR);
+  switch (sty)
{
case SYMBOL_FORCE_TO_MEM:
- mem  = force_const_mem (mode, imm);
+ if (offset != const0_rtx
+ && targetm.cannot_force_const_mem (mode, imm))
+   {
+ gcc_assert(can_create_pseudo_p ());
+ base = aarch64_force_temporary (dest, base);
+ base = aarch64_add_offset (mode, NULL, base, INTVAL (offset));
+ aarch64_emit_move (dest, base);
+ return;
+   }
+ mem = force_const_mem (mode, imm);
  gcc_assert (mem);
  emit_insn (gen_rtx_SET (VOIDmode, dest, mem));
  return;
 
-case SYMBOL_SMALL_TLSGD:
-case SYMBOL_SMALL_TLSDESC:
-case SYMBOL_SMALL_GOTTPREL:
-case SYMBOL_SMALL_TPREL:
+case SYMBOL_SMALL_TLSGD:
+case SYMBOL_SMALL_TLSDESC:
+case SYMBOL_SMALL_GOTTPREL:
case SYMBOL_SMALL_GOT:
+ if (offset != const0_rtx)
+   {
+ gcc_assert(can_create_pseudo_p ());
+ base = aarch64_force_temporary (dest, base);
+ base = aarch64_add_offset (mode, NULL, base, INTVAL (offset));
+ aarch64_emit_move (dest, base);
+ return;
+   }
+ /* FALLTHRU */
+
+case SYMBOL_SMALL_TPREL:
case SYMBOL_SMALL_ABSOLUTE:
- aarch64_load_symref_appropriately
-   (dest, imm, aarch64_classify_symbol (base, SYMBOL_CONTEXT_ADR));
+ aarch64_load_symref_appropriately (dest, imm, sty);
  return;
 
default:
@@ -696,7 +710,7 @@
}
 }
 
-  if ((CONST_INT_P (imm) && aarch64_move_imm (INTVAL (imm), mode)))
+  if (CONST_INT_P (imm) && aarch64_move_imm (INTVAL (imm), mode))
 {
   emit_insn (gen_rtx_SET (VOIDmode, dest, imm));
   return;

Re: [PATCH] Use -lgcc in libgcc_so linker script

2012-09-10 Thread Ian Lance Taylor

On Sun, Sep 9, 2012 at 12:29 PM, Andreas Schwab  wrote:
>
> PR target/46191
> * config/t-slibgcc-libgcc (SHLIB_MAKE_SOLINK): Use -lgcc instead
> of libgcc.a.

This is OK.

Thanks.

Ian

Re: VxWorks Patches Back from the Dead!

2012-09-10 Thread Bruce Korb

Hi,

On Mon, Sep 10, 2012 at 10:48 AM, rbmj  wrote:
> On the other hand, I've read this on the website:
>
>> Don't mix together changes made for different reasons. Send them
>> individually.  Ideally, each change you send should be impossible to
>> subdivide into
>
> parts that we might want to consider separately, because each of its parts
> gets its motivation from the other parts

OTOH, this is a fairly cohesive set of patches.
A single project.  Even if, strictly speaking, each include fix
is entirely separate from the others (by the design of fixincludes),
I see them as a cohesive set that ought to be in a single commit.
Fixes to fixes for fixincludes have been very infrequent.

> ... At the same
> time, it's a pain in the rear to worry about 12 different commits

I'm into comforting one's derriere.

> Unless cosmic rays break everything again, that should be all.

:)  OK.  I'll push it on your behalf once the other bits have been
approved by their approvers.

Cheers - Bruce

Re: [PATCH] Enable bbro for -Os

On 09/06/2012 02:56 AM, Zhenqiang Chen wrote:
> +   (3) Keep its original order when there is no chance to fall through.  bbro
> +   bases on the result of cfg_cleanup, which does lots of optimizations on cfg.
> +   So the order is expected to be kept if no fall through.

Thanks for doing this.  Our kernel guys have been asking for something like this
for quite a while.  I am curious about the case of no fall through.  Especially
about using that opportunity to sort cold blocks to the end of the function.

I'm thinking here of stuff like switch with a default: abort(), or asm goto with
an explicitly annotated cold path.

r~

Re: [PATCH, libstdc++] Add proper OpenBSD support

2012-09-10 Thread Mark Kettenis

> 
> On 10 September 2012 07:34, Mark Kettenis wrote:
> >> Date: Sun, 9 Sep 2012 21:07:39 +0100
> >> From: Jonathan Wakely 
> >>
> >> On 4 September 2012 20:26, Mark Kettenis wrote:
> >> > Fixes a few testcases.  Mostly based on the existing
> >> > NetBSD/FreeBSD/Darwin code.
> >> >
> >> > 2012-09-04  Mark Kettenis  
> >> >
> >> > * configure.host (*-*-openbsd*) Set cpu_include_dir.
> >> > * config/os/bsd/openbsd/ctype_base.h: New file.
> >> > * config/os/bsd/openbsd/ctype_configure_char.cc: New file.
> >> > * config/os/bsd/openbsd/ctype_inline.h: New file.
> >> > * config/os/bsd/openbsd/os_defines.h: New file.
> >>
> >> This patch is OK, thanks.  Do you want me to commit it for you?
> >
> > Yes please.
> 
> It occurs to me now that the patch changes the size of
> ctype_base::mask, from the generic unsigned to char. I assume the
> OpenBSD system compiler uses char? How long has that change been
> present in the OpenBSD source tree?

Yes, the system compiler uses char and has been doing so since mid-2005.

> I'm not sure whether or not it's better to change the size of that
> type in GCC 4.8, which would break compatibility with previous
> versions of the FSF sources but provide compatibility with the OpenBSD
> system compiler.  My guess would be that most people on OpenBSD are
> using the system compiler not upstream FSF sources.

Indeed.  People either use the system compiler or install one from
ports/packages.  Given the sorry state of OpenBSD support in the FSF
source tree (barely buildable) I think binary compatibility with the
system compiler is more important.

> >> It shouldn't stop the patch going in, but I assume that this test
> >> fails on OpenBSD even with your patch applied?
> >>
> >> #include <locale>
> >> #include <cassert>
> >>
> >> class gnu_ctype: public std::ctype<wchar_t> { };
> >>
> >> int main()
> >> {
> >>   gnu_ctype gctype;
> >>
> >>   assert(gctype.is(std::ctype_base::xdigit, L'a'));
> >> }
> >
> > Interestingly enough, it doesn't fail without my diff.  But it does
> > fail for OpenBSD's system compiler (GCC 4.2.1 with a lot of local
> > modifications).  As far as I can determine this is the result of
> > ctype_base::mask being an 8-bit integer type which doesn't go well
> > with the generic ctype_members.cc implementation.  Probably need to
> > have an OpenBSD-specific implementation just like newlib.  Looking
> > into that now.
> 
> See http://gcc.gnu.org/PR51772 (the original description gets the
> cause wrong, see comment 3 for the real problem)

Right!  Using the newlib locale model on OpenBSD fixes the problem,
and seems to fix a couple of test cases in the g++ testsuite as well.


2012-09-10  Mark Kettenis  

* acinclude.m4 (GLIBCXX_ENABLE_CLOCALE): Use newlib locale model
for OpenBSD.
* configure: Regenerated.


Index: acinclude.m4
===
--- acinclude.m4(revision 191120)
+++ acinclude.m4(working copy)
@@ -1836,6 +1836,9 @@
   darwin* | freebsd*)
enable_clocale_flag=darwin
;;
+  openbsd*)
+   enable_clocale_flag=newlib
+   ;;
   *)
if test x"$with_newlib" = x"yes"; then
  enable_clocale_flag=newlib

Re: [ping][PATCH] Power: Reorder a sign-extend RTL pattern for readability

2012-09-10 Thread Maciej W. Rozycki

On Sat, 8 Sep 2012, David Edelsohn wrote:

> 2012-08-10  Maciej W. Rozycki  
> 
>   gcc/
>   * config/rs6000/rs6000.md: Move a splitter next to its insn.
> 
> This patch is okay.  Yes, the splitter should not have been separated
> from the basic pattern. Thanks for helping to clean up the port.

 Applied now, thanks for your review.

  Maciej

Re: [PATCH,i386] fma4 addition for bdver2