date:20120820

Re: [PATCH][C++] Save memory and reallocations in name-lookup

2012-08-20 Thread Richard Guenther

On Sat, 18 Aug 2012, Dimitrios Apostolou wrote:

> Hi,
> 
> On Fri, 17 Aug 2012, Jakub Jelinek wrote:
> > On Fri, Aug 17, 2012 at 06:41:37AM -0500, Gabriel Dos Reis wrote:
> > > I am however concerned with:
> > > 
> > > >   static void
> > > >   store_bindings (tree names, VEC(cxx_saved_binding,gc) **old_bindings)
> > > >   {
> > > > !   static VEC(tree,heap) *bindings_need_stored = NULL;
> > > 
> > > I would be more comfortable to see the cache be on per-scope
> > > (e.g. namespace scope) basis as opposed
> > > a blanket global cache stored in a global variable.
> > 
> > It is not any kind of cache.  It could be in theory an automatic variable
> > vector pointer, it is only used during that function.  The reason why it is
> > static variable instead is just to avoid constant allocation/deallocation
> > of the vector, this way after the first call it will be already allocated
> > (but, upon entry to store_bindings will always be empty).
> 
> Why not use stack vector of sufficient size for most cases then? I usually do
> something like:
> 
>   VEC (tree, stack) *bindings_need_stored = VEC_alloc (tree, stack, 32);
>   ...
>   VEC_free (tree, stack, bindings_need_stored);
> 
> if I've measured that 32 (or whatever) is enough for most cases. I don't have
> a clue about this case though.

I considered that, but I'm not sure it would be an improvement.  We'd
have a re-allocation and free in some cases then.

Richard.

[PATCH] Fix PR54326

2012-08-20 Thread Richard Guenther


The following should fix PR54326 (3.4.6 at least compiles genoutput.c
without warnings after this).

Bootstrap running on x86_64-unknown-linux-gnu.

Richard.

2012-08-20  Richard Guenther  

PR bootstrap/54326
* genoutput.c (note_constraint): Properly use CONST_CAST.

Index: gcc/genoutput.c
===
--- gcc/genoutput.c (revision 190523)
+++ gcc/genoutput.c (working copy)
@@ -1175,7 +1175,7 @@ note_constraint (rtx exp, int lineno)
}
 }
   new_cdata = XNEWVAR (struct constraint_data, sizeof (struct constraint_data) 
+ namelen);
-  strcpy ((char *)new_cdata + offsetof(struct constraint_data, name), name);
+  strcpy (CONST_CAST(char *, new_cdata->name), name);
   new_cdata->namelen = namelen;
   new_cdata->lineno = lineno;
   new_cdata->next_this_letter = *slot;

Re: alloc_pool for tree-ssa-pre.c:phi_translate_table

2012-08-20 Thread Richard Guenther

On Sun, Aug 19, 2012 at 8:30 PM, Dimitrios Apostolou  wrote:
>
>
> 2012-08-19  Dimitrios Apostolou  
>
> * gcc/tree-ssa-pre.c (phi_translate_pool): New static global
> alloc_pool, used for allocating struct expr_pred_trans_d for
> phi_translate_table.
> (phi_trans_add, init_pre, fini_pre): Use it, avoids thousand of
> malloc() and free() calls.
>
>
> This avoids lots of malloc/free calls and slow iterations during numerous
> htab_delete() in fini_pre(). Tested on pre C++-snapshot, will update info as
> soon as a post C++ one is available.

Ok, if bootstrap / testing still succeeds.

Thanks,
Richard.

>
> Thanks,
> Dimitris

Re: CXX conversion: min g++ version pre-requisite?

2012-08-20 Thread Richard Guenther

On Sun, Aug 19, 2012 at 8:44 PM, Gary Funck  wrote:
> I filed two bug reports:
>
> "GCC install document does not list minimum required g++ version"
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54324
>
> "GCC does not build with G++ version 3.4.0"
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54326
>
> Re: the latter bug report (54326), it might be further
> divided into two bug reports: one documenting the failure
> to build with g++ 3.4.0 and the other having to do with
> the use of casts in genoutput.c.  Let me know if you
> recommend that I separate the bug reports.

Does it still fail, even after fixing genoutput.c?  If so, yes, please open
a bug for the compile-with-3.4.6(!) issue.  No, I think 3.4._0_ should not
be what we should be after.

Richard.

Richard.

> - Gary

Re: [PATCH] Fix PR 52631 (VN does not use simplified expression for lookup)

2012-08-20 Thread Richard Guenther

On Mon, Aug 20, 2012 at 6:49 AM, Andrew Pinski
 wrote:
> On Wed, Jul 25, 2012 at 4:39 AM, Richard Guenther
>  wrote:
>> On Tue, Jul 24, 2012 at 5:50 PM, Andrew Pinski
>>  wrote:
>>> Hi,
>>>   Before tuples was introduced, VN used to lookup the simplified
>>> expression to see if it was available already and use that instead of
>>> the non simplified one.  This patch adds the support back to VN to do
>>> exactly that.
>>>
>>> OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
>>
>> I think this should be done for all RHS and SSA name LHS, not only
>> for UNARY/BINARY/TERNARY - because even for SINGLE rhs we
>> can end up simplifying (for REALPART_EXPR for example which we
>> handle as nary, too).  I think we constrain try_to_simplify enough
>> so that
>>
>> + /* First try to lookup the simplified expression. */
>> + if (simplified && valid_gimple_rhs_p (simplified))
>> +   {
>> + tree result = vn_nary_op_lookup (simplified, NULL);
>> + if (result)
>> +   {
>> + changed = set_ssa_val_to (lhs, result);
>> + goto done;
>> +   }
>> + changed = set_ssa_val_to (lhs, lhs);
>> + vn_nary_op_insert (simplified, lhs);
>> +   }
>>   switch (get_gimple_rhs_class (code))
>> {
>> case GIMPLE_UNARY_RHS:
>> case GIMPLE_BINARY_RHS:
>> ...
>>
>> should work.  As you also insert the simplified variant I think we really
>> (finally) want to have a valid_nary_op routine rather than relying on
>> valid_gimple_rhs_p which is way too generic.
>
> I don't see valid_gimple_rhs_p being that generic as it checks to make
> sure the operands of the gimple are valid.
> Maybe I am missing something here though.

valid_gimple_rhs_p checks what it says.  But what we want to know is whether
the rhs is valid for a SCCVN NARY.  Those are not the same.

Richard.

> Thanks,
> Andrew Pinski
>
>
>>
>> Thanks,
>> Richard.
>>
>>> Thanks,
>>> Andrew Pinski
>>>
>>> ChangeLog:
>>>
>>> * tree-ssa-sccvn.c (visit_use): Look up the simplified
>>> expression before visting the original one.
>>>
>>> * gcc.dg/tree-ssa/ssa-fre-9.c: Update the number of
>>> eliminatations that happen.

Re: [PATCH] Fix loop pattern distribution ICE (PR tree-optimization/54321)

2012-08-20 Thread Richard Guenther

On Mon, Aug 20, 2012 at 8:27 AM, Jakub Jelinek  wrote:
> Hi!
>
> The middle-end argument of memset is signed (int), so simplify_builtin_call
> correctly checks host_integerp (val2, 0), but later on used tree_low_cst
> (val2, 1), so for negative values it would ICE.  Fixed thusly, the memset
> is supposed to cast the int to unsigned char internally anyway.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Yes, sorry for the mixup.

Thanks,
Richard.

> 2012-08-20  Jakub Jelinek  
>
> PR tree-optimization/54321
> * tree-ssa-forwprop.c (simplify_builtin_call): Pass 0 instead of 1
> as second argument to tree_low_cst call on val2.
>
> * gcc.c-torture/compile/pr54321.c: New test.
>
> --- gcc/tree-ssa-forwprop.c.jj  2012-08-14 08:45:00.0 +0200
> +++ gcc/tree-ssa-forwprop.c 2012-08-20 08:11:06.247936035 +0200
> @@ -1554,7 +1554,7 @@ simplify_builtin_call (gimple_stmt_itera
>   else
> src_buf[0] = tree_low_cst (src1, 0);
>   memset (src_buf + tree_low_cst (diff, 1),
> - tree_low_cst (val2, 1), tree_low_cst (len2, 1));
> + tree_low_cst (val2, 0), tree_low_cst (len2, 1));
>   src_buf[src_len] = '\0';
>   /* Neither builtin_strncpy_read_str nor builtin_memcpy_read_str
>  handle embedded '\0's.  */
> --- gcc/testsuite/gcc.c-torture/compile/pr54321.c.jj2012-08-20 
> 08:12:10.955630873 +0200
> +++ gcc/testsuite/gcc.c-torture/compile/pr54321.c   2012-08-20 
> 08:13:27.963398948 +0200
> @@ -0,0 +1,12 @@
> +/* PR tree-optimization/54321 */
> +struct S { char s[0]; } *a;
> +
> +void
> +foo (void)
> +{
> +  char *b = a->s;
> +  int c = 0;
> +  b[0] = 0;
> +  while (++c < 9)
> +b[c] = 255;
> +}
>
> Jakub

[PATCH] GTY chain_next annotate gimple_statement_base

2012-08-20 Thread Richard Guenther


This creates better marking code (even though we probably tail-recurse
for the exising one).

Bootstrapped and tested on x86-64-unknown-linux-gnu, applied.

Richard.

2012-08-20  Richard Guenther  

* gimple.h (gimple_statement_base): Annotate with GTY chain_next.

Index: gcc/gimple.h
===
--- gcc/gimple.h(revision 190523)
+++ gcc/gimple.h(working copy)
@@ -151,7 +151,7 @@ typedef struct
 /* Data structure definitions for GIMPLE tuples.  NOTE: word markers
are for 64 bit hosts.  */
 
-struct GTY(()) gimple_statement_base {
+struct GTY((chain_next ("%h.next"))) gimple_statement_base {
   /* [ WORD 1 ]
  Main identifying code for a tuple.  */
   ENUM_BITFIELD(gimple_code) code : 8;

Re: Speedups/Cleanups: End of GSOC patch collection

2012-08-20 Thread Dodji Seketeli

Dimitrios Apostolou  a écrit:

[...]

>   * include/libiberty.h (XOBDELETE, XOBGROW, XOBGROWVEC, XOBSHRINK)
>   (XOBSHRINKVEC, XOBFINISH): New type-safe macros for obstack
>   operations.
>   (XOBFINISH): Changed to return (T *) instead of T. All callers
>   updated.
>   * libcpp/include/symtab.h (obstack_chunk_alloc)
>   (obstack_chunk_free): Define, so that obstack_init() can be used.
>   * libcpp/internal.h (struct cset_converter): Same.
>   * libcpp/files.c (_cpp_init_files): Changed _obstack_begin() to
>   obstack_init().
>   * libcpp/identifiers.c (_cpp_init_hashtable): Same.
>   * libcpp/symtab.c (ht_create): Same.
>   * libcpp/init.c (cpp_create_reader): Same.
>

[...]

> +++ libcpp/include/symtab.h   2012-08-18 15:07:01 +

[...]

> +#ifndef obstack_chunk_alloc

Please add a comment here, as you did bellow in hunk for
libcpp/internal.h:

> +#ifndef obstack_chunk_alloc
> +  /* Needed for calling obstack_init().  */
> +  #define obstack_chunk_alloc(void *(*) (long)) xmalloc
> +  #define obstack_chunk_free (void (*) (void *)) free
> +#endif

> +  #define obstack_chunk_alloc(void *(*) (long)) xmalloc
> +  #define obstack_chunk_free (void (*) (void *)) free
> +#endif

[...]

With these changes, the libcpp parts look OK to me if they still
boostrap post c++ conversion.  I am not a maintainer so I a deferring to
Tom and the other maintainers.

Thanks.

-- 
Dodji

Re: alloc_pool for tree-ssa-pre.c:phi_translate_table

2012-08-20 Thread Jakub Jelinek

On Mon, Aug 20, 2012 at 09:37:39AM +0200, Richard Guenther wrote:
> On Sun, Aug 19, 2012 at 8:30 PM, Dimitrios Apostolou  wrote:
> >
> >
> > 2012-08-19  Dimitrios Apostolou  
> >
> > * gcc/tree-ssa-pre.c (phi_translate_pool): New static global
> > alloc_pool, used for allocating struct expr_pred_trans_d for
> > phi_translate_table.
> > (phi_trans_add, init_pre, fini_pre): Use it, avoids thousand of
> > malloc() and free() calls.
> >
> >
> > This avoids lots of malloc/free calls and slow iterations during numerous
> > htab_delete() in fini_pre(). Tested on pre C++-snapshot, will update info as
> > soon as a post C++ one is available.
> 
> Ok, if bootstrap / testing still succeeds.

I'd note for all the recently posted patches from Dimitrios, the gcc/
prefix doesn't belong to the ChangeLog entry pathnames, the filenames are
relative to the corresponding ChangeLog location.

Jakub

Re: alloc_pool for tree-ssa-pre.c:phi_translate_table

2012-08-20 Thread Dimitrios Apostolou


On Mon, 20 Aug 2012, Jakub Jelinek wrote:


I'd note for all the recently posted patches from Dimitrios, the gcc/
prefix doesn't belong to the ChangeLog entry pathnames, the filenames are
relative to the corresponding ChangeLog location.



Ah sorry, it's what the mklog utility generates, it seems I got carried 
away over-using it. :-)



Dimitris

Re: [PATCH] Add working-set size and hotness information to fdo summary (issue6465057)

2012-08-20 Thread Jan Hubicka

> Well, it should store the largest working set in BBs, or one that came
> from a much longer run. But I see the issue you are pointing out. The
> num_counters (the number of hot bbs) should be reasonable, as the
> total number of BBs is the same between all runs, and I want to show
> the largest or more dominant working set in terms of the number of hot
> bbs. But the min_bb_counter will become more and more inaccurate as
> the number of runs increases, since the counter values are
> accumulated.

Yes and that is the one that we plan to use to determine hot/cold decisions on,
right?

Note that there is no really 1-1 corespondence in betwen BBs and counters.
For each function the there should be num_edges-num_bbs+1 counters.
What do you plan to use BB counts for?
> 
> I typed up a detailed email below on why getting this right would be
> difficult, but then I realized there might be a fairly simple accurate
> solution, which I'll describe first:
> 
> The only way I see to do this completely accurately is to take two
> passes through the existing gcda files that we are merging into, one
> to read in all the counter values and merge them into all the counter
> values in the arrays from the current run (after which we can do the
> histogramming/working set computation accurately from the merged
> counters), and the second to rewrite them. In libgcov this doesn't
> seem like it would be too difficult to do, although it is a little bit
> of restructuring of the main merging loop and needs some special
> handling for buffered functions (which could increase the memory
> footprint a bit if there are many of these since they will all need to
> be buffered across the iteration over all the gcda files).
> 
> The summary merging in coverage.c confuses me a bit as it seems to be
> handling the case when there are multiple program summaries in a
> single gcda file. When would this happen? It looks like the merge
> handling in libgcov should always produce a single program summary per
> gcda file.


This is something Nathan introduced years back. The idea was IMO to handle
more acurately objects linked into multiple binaries. I am not sure
if the code really works or worked on some point.

The idea, as I recall it, was to produce overall checksum of all objects and
have different profile records for each combination.

This is not really useful for profile feedback as you generate single object
file for all uses, but it might become useful for LTO where you know into which
binary you are linking to. I am not really sure it is worth all the 
infrastructure
needed to support this though.
> 
> >
> >
> > Why you don't simply write the histogram into gcov file and don't merge the 
> > values
> > here (i.e. doing the cummulation loop in GCC instead of within libgcov)?
> 
> That doesn't completely solve the problem, unfortunately. The reason
> is that you don't know which histogram entry contains a particular
> block each run, and therefore you don't know how to compute the new
> combined histogram value and index for that bb counter. For example, a
> particular histogram index may have 5 counters (bbs) in it for one run
> and 2 counters (bbs) in it for a second run, so the question is how to
> compute the new entry for all of those bb counters, as the 5 bbs from
> the first run may or may not be a superset of the 2 from the second
> run. You could assume that the bbs have the same relative order of
> hotness between runs, and combine the bb counters accordingly, but
> there is still some trickiness because you have to apportion the
> cumulative counter sum stored in the histogram entry between new
> entries. For example, assume the highest indexed non-zero entries (the
> histogram buckets containing the largest counter values) in the two
> incoming histograms are:
> 
> histogram 1:
> 
> index 100: 4 counters, cumulative value X, min counter value minx
> ...
> 
> histogram 2:
> 
> index 100: 2 counters, cumulative value Y, min counter value miny
> index 99: 3 counters, cumulative value W, min counter value minw
> ...
> 
> To merge, you could assume something like 2 counters with a new
> cumulative value (Y + X*2/4), and new min counter value minx+miny,
> that go into the merged histogram with the index corresponding to
> counter value minx+miny. And then 2 counters have a new cumulative
> value (W*2/3 + X*2/4) and new min counter value minx+minw, that go
> into the merged histogram with index corresponding to counter value
> minw+minx. Etc... Not entirely accurate, although it might be a
> reasonable approximation, but it also requires a number of division
> operations during the merge in libgcov.
> 
> Another possibility, that might also provide a reasonable
> approximation, would be to scale the min_bb_counter values in the
> working sets by the sum_all_merged/sum_all_orig, where sum_all_merged
> is the new sum_all, and sum_all_orig is the sum_all from the summary
> whose working_set was chosen to be propagated to the new mer

Re: [PATCH, RFC] Re-work find_reloads_subreg_address (Re: [PATCH][RFC, Reload]. Reload bug?)

2012-08-20 Thread Tejas Belagod


Tejas Belagod wrote:

Ulrich Weigand wrote:

The following patch implements this idea; it passes a basic regression
test on arm-linux-gnueabi.  (Obviously this would need a lot more
testing on various platforms before getting into mainline ...)

Can you have a look whether this fixes the problem you're seeing?



Sorry for the delay in replying. Thanks for the patch. I tried this patch -  it
doesn't seem to reach as far as cleanup_subreg_operands (), but fails an
assertion in push_reload () in reload.c:1307. I'm currently investigating this
and will let you know the reason soon.



Hi,

Sorry for the delay in replying. These new issues that I was seeing were bugs 
specific to my code that I've fixed. Your patch seems to work fine. Thanks a lot 
for the patch.


Thanks,
Tejas Belagod.
ARM.

[PATCH] Fix PR54327

2012-08-20 Thread Richard Guenther


This is an issue of folding looking at SSA def stmts when SSA form
is not up-to-date.  That's not safe unless the SSA name we are
looking at is not marked for update (see GIMPLE_COND folding in
cfgcleaup for another example).

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2012-08-20  Richard Guenther  

PR tree-optimization/54327
* gimple-fold.c (get_maxval_strlen): Do not walk use-def chains
if the use is registered for SSA update.

* gcc.dg/torture/pr54327.c: New testcase.

Index: gcc/gimple-fold.c
===
*** gcc/gimple-fold.c   (revision 190523)
--- gcc/gimple-fold.c   (working copy)
*** get_maxval_strlen (tree arg, tree *lengt
*** 736,741 
--- 736,746 
return true;
  }
  
+   /* If ARG is registered for SSA update we cannot look at its defining
+  statement.  */
+   if (name_registered_for_update_p (arg))
+ return false;
+ 
/* If we were already here, break the infinite cycle.  */
if (!bitmap_set_bit (visited, SSA_NAME_VERSION (arg)))
  return true;
Index: gcc/testsuite/gcc.dg/torture/pr54327.c
===
*** gcc/testsuite/gcc.dg/torture/pr54327.c  (revision 0)
--- gcc/testsuite/gcc.dg/torture/pr54327.c  (working copy)
***
*** 0 
--- 1,15 
+ /* { dg-do compile } */
+ 
+ #include 
+ #include 
+ void treathead ()
+ {
+   char *a = ';' == '\0' ? : 0;
+   if (*a == '=')
+ {
+   while (*a == (*a == 0) || *a == '\'')
+   a++;
+   if (strlen (a) < 2)
+   abort ();
+ }
+ }

Re: [PATCH][C++] Get rid of TREE_CHAIN use for TREE_VEC (NON_DEFAULT_TEMPLATE_ARGS_COUNT)

2012-08-20 Thread Richard Guenther

On Fri, 17 Aug 2012, Richard Guenther wrote:

> 
> This gets rid of this field, pushing it into a short int in tree_base
> (hopefully 2^16 non-defaulted template args are enough ...).  The
> existing code is a bit confusing because of the differences with
> respect to ENABLE_CHECKING, it has very little testcase coverage as
> well (a single testcase, g++.dg/template/error39.C), so I am not sure
> I got the idea correct (especially as ENABLE_CHECKING vs. 
> non-ENABLE_CHECKING looks to introduce subtle differences?).
> 
> Anyway, a previous iteration that failed g++.dg/template/error39.C
> bootstrapped and tested all languages ok, the following version now
> passes g++.dg/template/error39.C and hopefully bootstrap and regtest
> as well.
> 
> We build a _lot_ of TREE_VECs from the C++ frontend, so this 8 byte
> saving is quite visible.  TREE_VEC is the most used TREE_CODE by far
> on PR54146:
> 
> ...
> ssa_name  363597
> mem_ref   403966
> addr_expr1203839
> tree_list1205721
> tree_vec 2654415
> 
> which translates to 109615896 bytes allocated in TREE_VECs (yes,
> that includes the actual storage, the average size is 41.3 bytes,
> with this patch TREE_VEC itself shrinks from 40 bytes to 32 bytes
> which means the average TREE_VEC size is one!).
> 
> Let's hope removing TREE_TYPE from TREE_VEC is as "easy" ;)

It turns out the C++ frontend is again the only user.  It gets
used for DECL_PRIMARY_TEMPLATE.  Now, the C++ fronend indeed seems
to want to associate information with the template argument vector
itself, not the template.  As the patch removing TREE_CAHIN is in
some sense not an improvement (C++ specific bits remain in the
generic tree structure, blocking from the VEC size being moved
to spare bits in tree_base), the DECL_PRIMARY_TEMPLATE usage shows
that using a TREE_VEC to represent the template argument is the
issue.  We cannot simply derive from TREE_VEC because it relies
on storage allocated at its tail, so instead a C++ specific tree
node mimicking a TREE_VEC but sporting extra fields, including
ones for the number of non-defaulted args and for DECL_PRIMARY_TEMPLATE
would be the way to go.  That node would derive from tree_base.

This should be pretty straight-forward to do, but I'll push it
back for now due to lack of time.  So consider this patch
withdrawn.  You'll instead see me trying to move the length
field into tree_base.

Richard.

> Re-bootstrap and regtest running on x86_64-unknown-linux-gnu.
> 
> Ok for the C++ parts?
> 
> Thanks,
> Richard.
> 
> 2012-08-17  Richard Guenther  
> 
>   * tree.h (struct tree_base): Put address_space into a union,
>   alongside a new field non_default_template_args_count used by
>   C++ TREE_VECs.
>   (struct tree_vec): Derive from tree_typed instead of tree_common.
>   (TYPE_ADDR_SPACE): Adjust.
>   * tree.c (initialize_tree_contains_struct): Adjust tree type
>   hierarchy.
> 
>   cp/
>   * cp-tree.h (NON_DEFAULT_TEMPLATE_ARGS_COUNT): Remove.
>   (SET_NON_DEFAULT_TEMPLATE_ARGS_COUNT,
>   GET_NON_DEFAULT_TEMPLATE_ARGS_COUNT): Use
>   non_default_template_args_count in tree_base instead of TREE_CHAIN.
>   * pt.c (expand_template_argument_pack): Adjust.
>   (template_parm_to_arg): Likewise.
>   (current_template_args): Likewise.
>   (coerce_template_parameter_pack): Likewise.
>   (coerce_template_parms): Likewise.
>   (tsubst_template_args): Likewise.
>   (type_unification_real): Likewise.
>   * parser.c (cp_parser_template_argument_list): Likewise.
> 
> Index: gcc/tree.h
> ===
> *** gcc/tree.h(revision 190469)
> --- gcc/tree.h(working copy)
> *** struct GTY(()) tree_base {
> *** 453,465 
> unsigned packed_flag : 1;
> unsigned user_align : 1;
> unsigned nameless_flag : 1;
>   
> !   unsigned spare : 12;
> ! 
> !   /* This field is only used with type nodes; the only reason it is present
> !  in tree_base instead of tree_type is to save space.  The size of the
> !  field must be large enough to hold addr_space_t values.  */
> !   unsigned address_space : 8;
>   };
>   
>   struct GTY(()) tree_typed {
> --- 453,473 
> unsigned packed_flag : 1;
> unsigned user_align : 1;
> unsigned nameless_flag : 1;
> +   unsigned spare0 : 4;
>   
> !   /* The following has to be at a 16-bit alignment boundary to be properly
> !  packed and not make tree_base larger than 64 bits.  */
> !   union tree_base_spare_bits {
> ! /* This field is only used with type nodes; the only reason it is 
> present
> !in tree_base instead of tree_type is to save space.  The size of the
> !field must be large enough to hold addr_space_t values.  */
> ! unsigned char address_space;
> ! 
> ! /* This field is only used with tree_vec nodes by the C++ frontend; the
> !only reason it is present

Re: [wwwdocs] Update Fortran secrion in 4.8/changes.html

2012-08-20 Thread Gerald Pfeifer

On Tue, 14 Aug 2012, Tobias Burnus wrote:
> I have committed the patch as obvious, however, I am happy for any 
> comments.

I went ahead and made some smaller changes, patch below.

I noticed you are using ..., as in e,
which we usually don't.  Why that?

Gerald

Index: changes.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-4.8/changes.html,v
retrieving revision 1.15
diff -u -3 -p -r1.15 changes.html
--- changes.html19 Aug 2012 19:48:02 -  1.15
+++ changes.html20 Aug 2012 10:32:20 -
@@ -92,18 +92,20 @@ by this change.
 (re)allocation in hot loops. (For arrays, replacing 
var=
 by var(:)= disables the automatic reallocation.)
 
-Reading floating point numbers which use q for the
-exponential (such as 4.0q0) is now supported as vendor
+Reading floating point numbers which use q for
+the exponential (such as 4.0q0) is now supported as vendor
 extension for better compatibility with old data files. It is strongly
 recommended to use for I/O the equivalent but standard conforming
-e (such as 4.0e0). [For the Fortran
+e (such as 4.0e0).
+
+(For Fortran
 source code, consider replacing the q in
 floating-point literals by a kind parameter (e.g. 4.0e0_qp
-with a suitable qp). Note that – in the Fortran
+with a suitable qp). Note that – in Fortran
 source code – replacing q by a simple
-e is not equivalent.]
+e is not equivalent.)
 
-The GFORTRAN_TMPDIR environment variable, for specifying
+The GFORTRAN_TMPDIR environment variable for specifying
 a non-default directory for files opened with 
STATUS="SCRATCH",
 is not used anymore. Instead gfortran checks the POSIX/GNU standard
 TMPDIR environment variable. If TMPDIR is not

Re: [SH] PR 54089 - Add support for rotcr insn

2012-08-20 Thread Kaz Kojima

Oleg Endo  wrote:
> This adds support for SH's rotcr insn.
> Tested on rev 190459 with
> make -k check RUNTESTFLAGS="--target_board=sh-sim
> \{-m2/-ml,-m2/-mb,-m2a/-mb,-m4/-ml,-m4/-mb,-m4a/-ml,-m4a/-mb}"
> 
> and no new failures.
> OK?

OK.

Regards,
kaz

Re: [SH] PR 51244 - Use more zero displacement branches

2012-08-20 Thread Kaz Kojima

Oleg Endo  wrote:
> This adds two new patterns to undo an optimization that is done by ifcvt
> and is not beneficial if zero displacement branches are available on SH.
> Tested on rev 190459 with
> make -k check RUNTESTFLAGS="--target_board=sh-sim
> \{-m2/-ml,-m2/-mb,-m2a/-mb,-m4/-ml,-m4/-mb,-m4a/-ml,-m4a/-mb}"
> 
> and no new failures.
> OK?

OK.

Regards,
kaz

Check for ffs declaration in gcc/configure.ac

2012-08-20 Thread Joseph S. Myers

Building a cross compiler from i686-mingw32 to arm-none-eabi fails
after the move to build as C++ because config/arm/neon.md uses ffs but
no MinGW header declares ffs (and unlike in C, an implicit declaration
of this built-in function can't be used).  This patch fixes this issue
by making the gcc directory configure check for a declaration, so
causing libiberty.h to provide one if needed.

Tested building a cross compiler as described above (fails before the
patch, passes afterwards).  OK to commit?

2012-08-20  Joseph Myers  

* configure.ac (ffs): Check for declaration.
* configure, config.in: Regenerate.

Index: configure
===
--- configure   (revision 190529)
+++ configure   (working copy)
@@ -10288,7 +10288,7 @@
 for ac_func in getenv atol asprintf sbrk abort atof getcwd getwd \
strsignal strstr stpcpy strverscmp \
errno snprintf vsnprintf vasprintf malloc realloc calloc \
-   free basename getopt clock getpagesize clearerr_unlocked feof_unlocked  
 ferror_unlocked fflush_unlocked fgetc_unlocked fgets_unlocked   
fileno_unlocked fprintf_unlocked fputc_unlocked fputs_unlocked   fread_unlocked 
fwrite_unlocked getchar_unlocked getc_unlocked   putchar_unlocked putc_unlocked
+   free basename getopt clock getpagesize ffs clearerr_unlocked 
feof_unlocked   ferror_unlocked fflush_unlocked fgetc_unlocked fgets_unlocked   
fileno_unlocked fprintf_unlocked fputc_unlocked fputs_unlocked   fread_unlocked 
fwrite_unlocked getchar_unlocked getc_unlocked   putchar_unlocked putc_unlocked
 do
   ac_tr_decl=`$as_echo "HAVE_DECL_$ac_func" | $as_tr_cpp`
 { $as_echo "$as_me:${as_lineno-$LINENO}: checking whether $ac_func is 
declared" >&5
Index: config.in
===
--- config.in   (revision 190529)
+++ config.in   (working copy)
@@ -621,6 +621,12 @@
 #endif
 
 
+/* Define to 1 if we found a declaration for 'ffs', otherwise define to 0. */
+#ifndef USED_FOR_TARGET
+#undef HAVE_DECL_FFS
+#endif
+
+
 /* Define to 1 if we found a declaration for 'fgetc_unlocked', otherwise
define to 0. */
 #ifndef USED_FOR_TARGET
Index: configure.ac
===
--- configure.ac(revision 190529)
+++ configure.ac(working copy)
@@ -1075,7 +1075,7 @@
 gcc_AC_CHECK_DECLS(getenv atol asprintf sbrk abort atof getcwd getwd \
strsignal strstr stpcpy strverscmp \
errno snprintf vsnprintf vasprintf malloc realloc calloc \
-   free basename getopt clock getpagesize gcc_UNLOCKED_FUNCS, , ,[
+   free basename getopt clock getpagesize ffs gcc_UNLOCKED_FUNCS, , ,[
 #include "ansidecl.h"
 #include "system.h"])
 

-- 
Joseph S. Myers
jos...@codesourcery.com

Re: Check for ffs declaration in gcc/configure.ac

2012-08-20 Thread Richard Guenther

On Mon, Aug 20, 2012 at 1:28 PM, Joseph S. Myers
 wrote:
> Building a cross compiler from i686-mingw32 to arm-none-eabi fails
> after the move to build as C++ because config/arm/neon.md uses ffs but
> no MinGW header declares ffs (and unlike in C, an implicit declaration
> of this built-in function can't be used).  This patch fixes this issue
> by making the gcc directory configure check for a declaration, so
> causing libiberty.h to provide one if needed.
>
> Tested building a cross compiler as described above (fails before the
> patch, passes afterwards).  OK to commit?

Ok.

Thanks,
Richard.

> 2012-08-20  Joseph Myers  
>
> * configure.ac (ffs): Check for declaration.
> * configure, config.in: Regenerate.
>
> Index: configure
> ===
> --- configure   (revision 190529)
> +++ configure   (working copy)
> @@ -10288,7 +10288,7 @@
>  for ac_func in getenv atol asprintf sbrk abort atof getcwd getwd \
> strsignal strstr stpcpy strverscmp \
> errno snprintf vsnprintf vasprintf malloc realloc calloc \
> -   free basename getopt clock getpagesize clearerr_unlocked 
> feof_unlocked   ferror_unlocked fflush_unlocked fgetc_unlocked fgets_unlocked 
>   fileno_unlocked fprintf_unlocked fputc_unlocked fputs_unlocked   
> fread_unlocked fwrite_unlocked getchar_unlocked getc_unlocked   
> putchar_unlocked putc_unlocked
> +   free basename getopt clock getpagesize ffs clearerr_unlocked 
> feof_unlocked   ferror_unlocked fflush_unlocked fgetc_unlocked fgets_unlocked 
>   fileno_unlocked fprintf_unlocked fputc_unlocked fputs_unlocked   
> fread_unlocked fwrite_unlocked getchar_unlocked getc_unlocked   
> putchar_unlocked putc_unlocked
>  do
>ac_tr_decl=`$as_echo "HAVE_DECL_$ac_func" | $as_tr_cpp`
>  { $as_echo "$as_me:${as_lineno-$LINENO}: checking whether $ac_func is 
> declared" >&5
> Index: config.in
> ===
> --- config.in   (revision 190529)
> +++ config.in   (working copy)
> @@ -621,6 +621,12 @@
>  #endif
>
>
> +/* Define to 1 if we found a declaration for 'ffs', otherwise define to 0. */
> +#ifndef USED_FOR_TARGET
> +#undef HAVE_DECL_FFS
> +#endif
> +
> +
>  /* Define to 1 if we found a declaration for 'fgetc_unlocked', otherwise
> define to 0. */
>  #ifndef USED_FOR_TARGET
> Index: configure.ac
> ===
> --- configure.ac(revision 190529)
> +++ configure.ac(working copy)
> @@ -1075,7 +1075,7 @@
>  gcc_AC_CHECK_DECLS(getenv atol asprintf sbrk abort atof getcwd getwd \
> strsignal strstr stpcpy strverscmp \
> errno snprintf vsnprintf vasprintf malloc realloc calloc \
> -   free basename getopt clock getpagesize gcc_UNLOCKED_FUNCS, , ,[
> +   free basename getopt clock getpagesize ffs gcc_UNLOCKED_FUNCS, , ,[
>  #include "ansidecl.h"
>  #include "system.h"])
>
>
> --
> Joseph S. Myers
> jos...@codesourcery.com

Re: combine permutations in gimple

2012-08-20 Thread Richard Guenther

On Sat, Aug 18, 2012 at 11:59 AM, Marc Glisse  wrote:
> Hello,
>
> here is a new patch (passes bootstrap+regtest), which only combines
> permutations if the result is the identity. I'll file a PR about more
> general
> combinations to link to this conversation and the cost hook proposition.
>
> Ok?

Ok.

Thanks,
Richard.

>
> 2012-08-18  Marc Glisse  
>
>
> gcc/
> * fold-const.c (fold_ternary_loc): Detect identity permutations.
> Canonicalize permutations more.
> * tree-ssa-forwprop.c (is_combined_permutation_identity): New
> function.
>
> (simplify_permutation): Likewise.
> (ssa_forward_propagate_and_combine): Call it.
>
> gcc/testsuite/
> * gcc.dg/tree-ssa/forwprop-19.c: New testcase.
> * gcc.dg/fold-perm.c: Likewise.
>
> --
> Marc Glisse
> Index: gcc/fold-const.c
> ===
> --- gcc/fold-const.c(revision 190500)
> +++ gcc/fold-const.c(working copy)
> @@ -14148,58 +14148,96 @@ fold_ternary_loc (location_t loc, enum t
> return fold_build2_loc (loc, PLUS_EXPR, type,
> const_binop (MULT_EXPR, arg0, arg1), arg2);
>if (integer_zerop (arg2))
> return fold_build2_loc (loc, MULT_EXPR, type, arg0, arg1);
>
>return fold_fma (loc, type, arg0, arg1, arg2);
>
>  case VEC_PERM_EXPR:
>if (TREE_CODE (arg2) == VECTOR_CST)
> {
> - unsigned int nelts = TYPE_VECTOR_SUBPARTS (type), i;
> + unsigned int nelts = TYPE_VECTOR_SUBPARTS (type), i, mask;
>   unsigned char *sel = XALLOCAVEC (unsigned char, nelts);
>   tree t;
>   bool need_mask_canon = false;
> + bool all_in_vec0 = true;
> + bool all_in_vec1 = true;
> + bool maybe_identity = true;
> + bool single_arg = (op0 == op1);
> + bool changed = false;
>
> + mask = single_arg ? (nelts - 1) : (2 * nelts - 1);
>   gcc_assert (nelts == VECTOR_CST_NELTS (arg2));
>   for (i = 0; i < nelts; i++)
> {
>   tree val = VECTOR_CST_ELT (arg2, i);
>   if (TREE_CODE (val) != INTEGER_CST)
> return NULL_TREE;
>
> - sel[i] = TREE_INT_CST_LOW (val) & (2 * nelts - 1);
> + sel[i] = TREE_INT_CST_LOW (val) & mask;
>   if (TREE_INT_CST_HIGH (val)
>   || ((unsigned HOST_WIDE_INT)
>   TREE_INT_CST_LOW (val) != sel[i]))
> need_mask_canon = true;
> +
> + if (sel[i] < nelts)
> +   all_in_vec1 = false;
> + else
> +   all_in_vec0 = false;
> +
> + if ((sel[i] & (nelts-1)) != i)
> +   maybe_identity = false;
> +   }
> +
> + if (maybe_identity)
> +   {
> + if (all_in_vec0)
> +   return op0;
> + if (all_in_vec1)
> +   return op1;
> }
>
>   if ((TREE_CODE (arg0) == VECTOR_CST
>|| TREE_CODE (arg0) == CONSTRUCTOR)
>   && (TREE_CODE (arg1) == VECTOR_CST
>   || TREE_CODE (arg1) == CONSTRUCTOR))
> {
>   t = fold_vec_perm (type, arg0, arg1, sel);
>   if (t != NULL_TREE)
> return t;
> }
>
> + if (all_in_vec0)
> +   op1 = op0;
> + else if (all_in_vec1)
> +   {
> + op0 = op1;
> + for (i = 0; i < nelts; i++)
> +   sel[i] -= nelts;
> + need_mask_canon = true;
> +   }
> +
> + if (op0 == op1 && !single_arg)
> +   changed = true;
> +
>   if (need_mask_canon && arg2 == op2)
> {
>   tree *tsel = XALLOCAVEC (tree, nelts);
>   tree eltype = TREE_TYPE (TREE_TYPE (arg2));
>   for (i = 0; i < nelts; i++)
> tsel[i] = build_int_cst (eltype, sel[i]);
> - t = build_vector (TREE_TYPE (arg2), tsel);
> - return build3_loc (loc, VEC_PERM_EXPR, type, op0, op1, t);
> + op2 = build_vector (TREE_TYPE (arg2), tsel);
> + changed = true;
> }
> +
> + if (changed)
> +   return build3_loc (loc, VEC_PERM_EXPR, type, op0, op1, op2);
> }
>return NULL_TREE;
>
>  default:
>return NULL_TREE;
>  } /* switch (code) */
>  }
>
>  /* Perform constant folding and related simplification of EXPR.
> The related simplifications include x*1 => x, x*0 => 0, etc.,
> Index: gcc/tree-ssa-forwprop.c
> ===
> --- gcc/tree-ssa-forwprop.c (revision 190500)
> +++ gcc/tree-ssa-forwprop.c (working copy)
> @@ -2570,20 +2570,109 @@ combine_conversions (gimple_stmt_iterato
>   gimple_assign_set_rhs_code (stmt, CONVERT_EXPR);
>   update_stmt (stmt);
>   return remove_prop_source_from_

Re: [patch, tree-ssa] PR54295 Incorrect value extension in widening multiply-accumulate

2012-08-20 Thread Richard Guenther

On Fri, Aug 17, 2012 at 7:05 PM, Richard Earnshaw  wrote:
> On 17/08/12 16:20, Richard Earnshaw wrote:
>> Ok, in which case we have to give is_widening_mult_rhs_p enough smarts
>> to not strip
>>
>>   (s32)u32
>>
>> and return u32.
>>
>> I'll have another think about it.
>
> Take two.
>
> This version should address your concerns about handling
>
> (u32)u16 * (u32)u16 -> u64
>
> We now look at each operand directly, but when doing so we check whether
> the operand is the same size as the result or not.  When it is, we can
> strip any conversion; when it isn't the conversion must preserve
> signedness of the inner operand and mustn't be a narrowing conversion.
>
> * tree-ssa-math-opts.c (widening_mult_conversion_strippable_p):
> New function.
> (is_widening_mult_rhs_p): Use it.
>
> Testing underway (again)
>
> OK?

Ok.

Thanks,
Richard.

> R.
>

[PATCH] Remove register_new_name_mapping, do less timevar push/pop

2012-08-20 Thread Richard Guenther


This performs timevar push/pop at the single remaining entry to 
incremental SSA rewrite setup.  The other entry, 
register_new_name_mapping, is removed by making its only other
user use create_new_def_for.  Which needs to handle a NULL DEF
for this case, as we have no access to a DEF op for an SSA name LHS
(but we do for a PHI node).  Something Micha may change soon.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2012-08-20  Richard Guenther  

* tree-flow.h (register_new_name_mapping): Remove.
* tree-into-ssa.c (register_new_name_mapping): Likewise.
(add_new_name_mapping): Do not push/pop timevar here.
(create_new_def_for): Instead do it here.  Initialize
update-ssa here, handle a NULL def.
* tree-vrp.c (build_assert_expr_for): Use create_new_def_for.

Index: gcc/tree-flow.h
===
*** gcc/tree-flow.h (revision 190525)
--- gcc/tree-flow.h (working copy)
*** void release_defs_bitset (bitmap toremov
*** 516,522 
  /* In tree-into-ssa.c  */
  void update_ssa (unsigned);
  void delete_update_ssa (void);
- void register_new_name_mapping (tree, tree);
  tree create_new_def_for (tree, gimple, def_operand_p);
  bool need_ssa_update_p (struct function *);
  bool name_registered_for_update_p (tree);
--- 516,521 
Index: gcc/tree-into-ssa.c
===
*** gcc/tree-into-ssa.c (revision 190525)
--- gcc/tree-into-ssa.c (working copy)
*** static VEC(gimple_vec, heap) *phis_to_re
*** 111,125 
  static bitmap blocks_with_phis_to_rewrite;
  
  /* Growth factor for NEW_SSA_NAMES and OLD_SSA_NAMES.  These sets need
!to grow as the callers to register_new_name_mapping will typically
!create new names on the fly.  FIXME.  Currently set to 1/3 to avoid
!frequent reallocations but still need to find a reasonable growth
!strategy.  */
  #define NAME_SETS_GROWTH_FACTOR   (MAX (3, num_ssa_names / 3))
  
  
  /* The function the SSA updating data structures have been initialized for.
!NULL if they need to be initialized by register_new_name_mapping.  */
  static struct function *update_ssa_initialized_fn = NULL;
  
  /* Global data to attach to the main dominator walk structure.  */
--- 111,125 
  static bitmap blocks_with_phis_to_rewrite;
  
  /* Growth factor for NEW_SSA_NAMES and OLD_SSA_NAMES.  These sets need
!to grow as the callers to create_new_def_for will create new names on
!the fly.
!FIXME.  Currently set to 1/3 to avoid frequent reallocations but still
!need to find a reasonable growth strategy.  */
  #define NAME_SETS_GROWTH_FACTOR   (MAX (3, num_ssa_names / 3))
  
  
  /* The function the SSA updating data structures have been initialized for.
!NULL if they need to be initialized by create_new_def_for.  */
  static struct function *update_ssa_initialized_fn = NULL;
  
  /* Global data to attach to the main dominator walk structure.  */
*** add_to_repl_tbl (tree new_tree, tree old
*** 587,594 
  static void
  add_new_name_mapping (tree new_tree, tree old)
  {
-   timevar_push (TV_TREE_SSA_INCREMENTAL);
- 
/* OLD and NEW_TREE must be different SSA names for the same symbol.  */
gcc_assert (new_tree != old && SSA_NAME_VAR (new_tree) == SSA_NAME_VAR 
(old));
  
--- 587,592 
*** add_new_name_mapping (tree new_tree, tre
*** 613,620 
   respectively.  */
SET_BIT (new_ssa_names, SSA_NAME_VERSION (new_tree));
SET_BIT (old_ssa_names, SSA_NAME_VERSION (old));
- 
-   timevar_pop (TV_TREE_SSA_INCREMENTAL);
  }
  
  
--- 611,616 
*** delete_update_ssa (void)
*** 2842,2857 
  
  
  /* Create a new name for OLD_NAME in statement STMT and replace the
!operand pointed to by DEF_P with the newly created name.  Return
!the new name and register the replacement mapping  in
 update_ssa's tables.  */
  
  tree
  create_new_def_for (tree old_name, gimple stmt, def_operand_p def)
  {
!   tree new_name = duplicate_ssa_name (old_name, stmt);
  
!   SET_DEF (def, new_name);
  
if (gimple_code (stmt) == GIMPLE_PHI)
  {
--- 2838,2865 
  
  
  /* Create a new name for OLD_NAME in statement STMT and replace the
!operand pointed to by DEF_P with the newly created name.  If DEF_P
!is NULL then STMT should be a GIMPLE assignment.
!Return the new name and register the replacement mapping  in
 update_ssa's tables.  */
  
  tree
  create_new_def_for (tree old_name, gimple stmt, def_operand_p def)
  {
!   tree new_name;
  
!   timevar_push (TV_TREE_SSA_INCREMENTAL);
! 
!   if (!update_ssa_initialized_fn)
! init_update_ssa (cfun);
! 
!   gcc_assert (update_ssa_initialized_fn == cfun);
! 
!   new_name = duplicate_ssa_name (old_name, stmt);
!   if (def)
! SET_DEF (def, new_name);
!   else
! gimple_assign_set_lhs (stmt, new_name);
  
if (gimple_co

Re: [graphds.h] Allocate graph from obstack

2012-08-20 Thread Paolo Bonzini

Il 19/08/2012 18:55, Richard Guenther ha scritto:
>> > Initially I had one obstack per struct graph, which was better than using
>> > XNEW for every edge, but still obstack_init() called from new_graph() was
>> > too frequent.
>> >
>> > So in this iteration of the patch the obstack is static global, initialised
>> > once and never freed. Please advise on whether this is acceptable, and also
>> > where I should initialise the obstack once, and avoid checking if it's NULL
>> > in every use.
>> >
>> > Minor speed gains (couple of ms), tested with pre-C++ conversion snapshot,
>> > I'll retest soon and post update.
> Either an obstack per graph or the ability to specify an obstack for 
> allocation.
> A global obstack with global lifetime is bad IMHO.

Dimitrios's patch has a per-file obstack with per-pass lifetime, which I
think is the right solution---but putting graph_obstack as a static
variable in graphds.h is gly.  You can just move the declaration to
each file separately, and give it a better name perhaps (e.g.
loop_graph_obstack).

Paolo

Re: [graphds.h] Allocate graph from obstack

2012-08-20 Thread Richard Guenther

On Mon, Aug 20, 2012 at 2:03 PM, Paolo Bonzini  wrote:
> Il 19/08/2012 18:55, Richard Guenther ha scritto:
>>> > Initially I had one obstack per struct graph, which was better than using
>>> > XNEW for every edge, but still obstack_init() called from new_graph() was
>>> > too frequent.
>>> >
>>> > So in this iteration of the patch the obstack is static global, 
>>> > initialised
>>> > once and never freed. Please advise on whether this is acceptable, and 
>>> > also
>>> > where I should initialise the obstack once, and avoid checking if it's 
>>> > NULL
>>> > in every use.
>>> >
>>> > Minor speed gains (couple of ms), tested with pre-C++ conversion snapshot,
>>> > I'll retest soon and post update.
>> Either an obstack per graph or the ability to specify an obstack for 
>> allocation.
>> A global obstack with global lifetime is bad IMHO.
>
> Dimitrios's patch has a per-file obstack with per-pass lifetime, which I
> think is the right solution---but putting graph_obstack as a static
> variable in graphds.h is gly.  You can just move the declaration to
> each file separately, and give it a better name perhaps (e.g.
> loop_graph_obstack).

Well, I see that the way it is constructed is that you can at most have a
single live graph at the same time (using the same obstack).  That's a
big limitation IMHO.  Now, if the users would manage the obstack completely
and instead would obstack_init()/free() them that would be different.  Of course
the outcome is exactly as with the initial patch having the obstack per
struct graph.

The present patch has too much low-level stuff exposed and does not easily
allow putting other data on the same obstack.

So I'd vote for managing the obstack completely in the caller for flexibility.
Some may want to allocate auxiliar data on the same obstack for example.

Richard.

> Paolo

[PATCH][RFC] Move TREE_VEC length and SSA_NAME version into tree_base

2012-08-20 Thread Richard Guenther


This shrinks TREE_VEC from 40 bytes to 32 bytes and SSA_NAME from
80 bytes to 72 bytes on a 64bit host.  Both structures suffer
from the fact they need storage for an integer (length and version)
which leaves unused padding.  Both data structures do not require
as many flag bits as we keep in tree_base though, so they can
conveniently use the upper 4-bytes of the 8-bytes tree_base to
store length / version.

I added a union to tree_base to divide up the space between flags
(possibly) used for all tree kinds and flags that are not used
for those who chose to re-use the upper 4-bytes of tree_base for
something else.

This superseeds the patch that moved the C++ specific usage of
TREE_CHAIN on TREE_VECs to tree_base (same savings, but TREE_VEC
isn't any closer to be based on tree_base only).

Due to re-use of flags from frontends definitive checking for
flag accesses is not always possible (TREE_NOTRHOW for example).
Where appropriate I added TREE_NOT_CHECK (NODE, TREE_VEC) instead,
to catch mis-uses of the C++ frontend.  Changed ARGUMENT_PACK_INCOMPLETE_P
to use TREE_ADDRESSABLE instead of TREE_LANG_FLAG_0 then which
it used on TREE_VECs.

We are very lazy adjusting flag usage documentation :/

Bootstrap and regtest pending on x86_64-unknown-linux-gnu.

Any comments?

Thanks,
Richard.

2012-08-20  Richard Guenther  

cp/
* cp-tree.h (TREE_INDIRECT_USING): Use TREE_LANG_FLAG_0 accessor.
(ATTR_IS_DEPENDENT): Likewise.
(ARGUMENT_PACK_INCOMPLETE_P): Use TREE_ADDRESSABLE instead of
TREE_LANG_FLAG_0 on TREE_VECs.

* tree.h (struct tree_base): Add union to make it possible to
re-use the upper 4 bytes for tree codes that do not need as
many flags as others.  Move visited and default_def_flag to
common bits section in exchange for saturating_flag and
unsigned_flag.  Add SSA name version and tree vec length
fields here.
(struct tree_vec): Remove length field here.
(struct tree_ssa_name): Remove version field here.

Index: trunk/gcc/cp/cp-tree.h
===
*** trunk.orig/gcc/cp/cp-tree.h 2012-08-20 12:47:47.0 +0200
--- trunk/gcc/cp/cp-tree.h  2012-08-20 13:53:05.212969994 +0200
*** struct GTY((variable_size)) lang_decl {
*** 2520,2530 
  
  /* In a TREE_LIST concatenating using directives, indicate indirect
 directives  */
! #define TREE_INDIRECT_USING(NODE) (TREE_LIST_CHECK (NODE)->base.lang_flag_0)
  
  /* In a TREE_LIST in an attribute list, indicates that the attribute
 must be applied at instantiation time.  */
! #define ATTR_IS_DEPENDENT(NODE) (TREE_LIST_CHECK (NODE)->base.lang_flag_0)
  
  extern tree decl_shadowed_for_var_lookup (tree);
  extern void decl_shadowed_for_var_insert (tree, tree);
--- 2520,2530 
  
  /* In a TREE_LIST concatenating using directives, indicate indirect
 directives  */
! #define TREE_INDIRECT_USING(NODE) TREE_LANG_FLAG_0 (TREE_LIST_CHECK (NODE))
  
  /* In a TREE_LIST in an attribute list, indicates that the attribute
 must be applied at instantiation time.  */
! #define ATTR_IS_DEPENDENT(NODE) TREE_LANG_FLAG_0 (TREE_LIST_CHECK (NODE))
  
  extern tree decl_shadowed_for_var_lookup (tree);
  extern void decl_shadowed_for_var_insert (tree, tree);
*** extern void decl_shadowed_for_var_insert
*** 2881,2887 
 arguments will be placed into the beginning of the argument pack,
 but additional arguments might still be deduced.  */
  #define ARGUMENT_PACK_INCOMPLETE_P(NODE)\
!   TREE_LANG_FLAG_0 (ARGUMENT_PACK_ARGS (NODE))
  
  /* When ARGUMENT_PACK_INCOMPLETE_P, stores the explicit template
 arguments used to fill this pack.  */
--- 2881,2887 
 arguments will be placed into the beginning of the argument pack,
 but additional arguments might still be deduced.  */
  #define ARGUMENT_PACK_INCOMPLETE_P(NODE)\
!   TREE_ADDRESSABLE (ARGUMENT_PACK_ARGS (NODE))
  
  /* When ARGUMENT_PACK_INCOMPLETE_P, stores the explicit template
 arguments used to fill this pack.  */
Index: trunk/gcc/tree.h
===
*** trunk.orig/gcc/tree.h   2012-08-20 12:47:47.0 +0200
--- trunk/gcc/tree.h2012-08-20 13:56:12.109963537 +0200
*** struct GTY(()) tree_base {
*** 427,435 
unsigned addressable_flag : 1;
unsigned volatile_flag : 1;
unsigned readonly_flag : 1;
-   unsigned unsigned_flag : 1;
unsigned asm_written_flag: 1;
unsigned nowarning_flag : 1;
  
unsigned used_flag : 1;
unsigned nothrow_flag : 1;
--- 427,435 
unsigned addressable_flag : 1;
unsigned volatile_flag : 1;
unsigned readonly_flag : 1;
unsigned asm_written_flag: 1;
unsigned nowarning_flag : 1;
+   unsigned visited : 1;
  
unsigned used_flag : 1;
unsigned nothrow_flag : 1;
*** struct GTY(()) tree_base {
*** 438,465 
unsigned pr

Re: [wwwdocs] SH 4.8 changes update

2012-08-20 Thread Gerald Pfeifer

On Mon, 20 Aug 2012, Oleg Endo wrote:
> You mean, since SH is neither a primary nor a secondary platform,
> there are no particular release criteria for it?  What does that
> actually mean?

>From what I recall, you can make stage 1 type changes even after
stage 1.  However, if your port is broken, the release managers
will shrug their shoulders and release even if your port is quite
broken due to that.


Related to your web page update, I just comitted the patch below
which allows for direct linking to the SH entry.

Gerald


Index: changes.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-4.8/changes.html,v
retrieving revision 1.16
diff -u -3 -p -r1.16 changes.html
--- changes.html20 Aug 2012 11:02:28 -  1.16
+++ changes.html20 Aug 2012 12:22:52 -
@@ -159,7 +159,7 @@ by this change.
 options to the assembler.
   
 
-SH
+SH
   
 The default alignment settings have been reduced to be less aggressive.
 This results in more compact code for optimization levels other than

Re: [wwwdocs] SH 4.8 changes update

2012-08-20 Thread Richard Guenther

On Mon, Aug 20, 2012 at 1:37 AM, Oleg Endo  wrote:
> On Sun, 2012-08-19 at 21:43 +0200, Gerald Pfeifer wrote:
>> On Sun, 19 Aug 2012, Oleg Endo wrote:
>> > Thanks.  Let's hope that I can squeeze in some more stuff while
>> > stage 1 lasts. :T
>>
>> You know that for backend-specific changes (especially for "smaller"
>> ports) you actually have some more leeway?
>
> You mean, since SH is neither a primary nor a secondary platform, there
> are no particular release criteria for it?  What does that actually
> mean?

It means that release managers do not care about the state of the SH port.
It means SH target maintainers have to watch how the release goes and
decide when to put effort into testing and fixing bugs themselves, otherwise
they may run into the situation that the SH port does not even build for
the release.  So kind of target maintainers for ports that are not primary
or secondary are their own release managers without any control over
release timing.

Richard.

Re: [graphds.h] Allocate graph from obstack

2012-08-20 Thread Dimitrios Apostolou


Hi Paolo,

On Mon, 20 Aug 2012, Paolo Bonzini wrote:

Il 19/08/2012 18:55, Richard Guenther ha scritto:

Initially I had one obstack per struct graph, which was better than using
XNEW for every edge, but still obstack_init() called from new_graph() was
too frequent.

So in this iteration of the patch the obstack is static global, initialised
once and never freed. Please advise on whether this is acceptable, and also
where I should initialise the obstack once, and avoid checking if it's NULL
in every use.

Minor speed gains (couple of ms), tested with pre-C++ conversion snapshot,
I'll retest soon and post update.

Either an obstack per graph or the ability to specify an obstack for allocation.
A global obstack with global lifetime is bad IMHO.


Dimitrios's patch has a per-file obstack with per-pass lifetime


Notice that I never call XOBDELETE with NULL argument, I only free the 
first object, which means that the 4KB per obstack overhead is retained 
until program exit, I did that to save on malloc calls.



Thanks,
Dimitris

Re: [bootstrap] Tentative fix for PR 54281

2012-08-20 Thread Diego Novillo


On 2012-08-19 07:18 , Arnaud Charlet wrote:



The conditionals cannot be removed for the time being because the
foreign language interface of the Ada part of the compiler is
hardcoded for C.

Barring a massive switch to the Ada language for the GNU project,
we would need to switch the foreign language interface to C++,
which might introduce various maintenance issues in the short term
(Arno CCed).


Yes, that would be a large hearthquake for both the compiler and the
run-time, which is unwanted at this stage. I guess the solution for
now is to more selectively export the various symbols as C symbols
and include header files as is.


Thanks.

One potential issue I see here is that as we start using more C++ 
headers (and more C++ in existing headers), we will reach a point in 
which including something like system.h inside an extern "C" {} block 
will fail.


None of the GCC headers should be included as C code anymore.


Diego.

Re: new sign/zero extension elimination pass

2012-08-20 Thread Tom de Vries

On 11/07/12 12:30, Tom de Vries wrote:
> On 13/11/10 10:50, Eric Botcazou wrote:
>>> I profiled the pass on spec2000:
>>>
>>> -mabi=32 -mabi=64
>>> ee-pass (usr time): 0.70 1.16
>>> total   (usr time):   919.30   879.26
>>> ee-pass(%): 0.08 0.13
>>>
>>> The pass takes 0.13% or less of the total usr runtime.
>>
>> For how many hits?  What are the numbers with --param ee-max-propagate=0?
>>
>>> Is it necessary to improve the runtime of this pass?
>>
>> I've already given my opinion about the implementation.  The other passes in 
>> the compiler try hard not to rescan everything when a single bit changes; as 
>> currently written, yours doesn't.
>>
> 
> Eric,
> 
> I've done the following:
> - refactored the pass such that it now scans at most twice over all
>   instructions.
> - updated the patch to be applicable to current trunk
> - updated the motivating example to a more applicable one (as discussed in
>   this thread), and added that one as test-case.
> - added a part in the header comment illustrating the working of the pass
>   on the motivating example.
> 
> bootstrapped and reg-tested on x86_64 and i686.
> 
> build and reg-tested on mips, mips64, and arm.
> 
> OK for trunk?
> 

Eric,

does the new patch meet your concerns related to rescanning?

If so, OK for trunk?

Thanks,
- Tom


> Thanks,
> - Tom
> 
> 2012-07-10  Tom de Vries  
> 
>   * ee.c: New file.
>   * tree-pass.h (pass_ee): Declare.
>   * opts.c ( default_options_table): Set flag_ee at -O2.
>   * timevar.def (TV_EE): New timevar.
>   * common.opt (fextension-elimination): New option.
>   * Makefile.in (ee.o): New rule.
>   * passes.c (pass_ee): Add it.
> 
>   * gcc.dg/extend-1.c: New test.
>   * gcc.dg/extend-2.c: Same.
>   * gcc.dg/extend-2-64.c: Same.
>   * gcc.dg/extend-3.c: Same.
>   * gcc.dg/extend-4.c: Same.
>   * gcc.dg/extend-5.c: Same.
>   * gcc.target/mips/octeon-bbit-2.c: Make test more robust.
>

patch for machine dependent rtl section to hide case statements for different types of constants.

2012-08-20 Thread Kenneth Zadeck

This patch started out to be a purely mechanical change to the switch 
statements so that the ones that are used to take apart constants can be 
logically grouped.This is important for the next patch that I will 
submit this week that frees the rtl level from only being able to 
represent large integer constants with two HWIs.


I sent the patch to Richard Sandiford and when the comments came back 
from him, this patch turned into something that actually has real 
semantic changes.   (His comments are enclosed below.)   I did almost 
all of Richard's changes because he is generally right about such 
things, but it does mean that the patch has to be more carefully 
reviewed.   Richard does not count his comments as a review.


The patch has, of course, been properly tested on x86-64.

Any comments?  Ok for commit?

Kenny

Richard's comments:
=
The omission of CONST_FIXED from the cselib_expand_value_rtx_1,
attr_copy_rtx, clear_struct_flag and combine switches looks
unintentional (though only as a missed compiler-speed optimisation).
Same goes for the omission of CONST_VECTOR from check_maybe_invariant.

The omission of CONST_FIXED from dse.c:const_or_frame_p looks like
a missed target-code optimisation.  The function ought to be using
CONSTANT_P instead.

 I did not do what is suggested in the last sentence because
 it changes the behavior of the rtx "HIGH".

I don't see any reason to prefer the hashing of CONST_VECTORs in
hash_invariant_expr_1 and invariant_expr_equal_p over the default
cse implementation, so here too I think we can add CONST_VECTOR
to the switch.

cse.c:exp_equiv_p, rtx_equal_for_memref_p, operands_match_p,
rtx_equal_p_cb and rtx_equal_p  are all testing the property
"is pointer equality sufficient for this kind of constant".
rtx_renumbered_equal_p is too, so I think the omission of
CONST_FIXED there is again unintentional.

That leaves mark_jump_label_1.  This block is really testing
"does this rtx have no operands that might be labels",
which is again true for CONST_FIXED.  TBH, this smacks
of premature optimisation to me.  How do we know that
checking each code here is cheaper than falling through?
I doubt the switch is populated enough to merit a jump table,
and the number of executed branches might well be higher
like this when you take other codes into account.

So in terms of setting the abstraction level, I think there
are two options:

1) Define:

/* Match CONST_*s that can represent compile-time constant integers.  */
#define CASE_CONST_SCALAR_INT \
  case CONST_INT: \
  case CONST_DOUBLE

/* Match CONST_*s for which pointer equality corresponds to value 
equality.  */

#define CASE_CONST_UNIQUE \
  case CONST_INT: \
  case CONST_DOUBLE: \
  case CONST_FIXED

/* Match all CONST_* rtxes.  */
#define CASE_CONST_ANY \
  case CONST_INT: \
  case CONST_DOUBLE: \
  case CONST_FIXED: \
  case CONST_VECTOR

and remove the mark_jump_label_1 cases.

The name "CASE_CONST_SCALAR_INT" matches "SCALAR_INT_MODE_P",
CASE_CONST_UNIQUE is solely for the equality functions above.

2) As for (1), but keep the mark_jump_label_1 cases and
rename CASE_CONST_UNIQUE to CASE_CONST_LEAF so that mark_jump_label_1
can use it.  This implies that all "leaf" CONST_*s (i.e. those without
rtx operands) will always be hashed to give pointer equality.

I don't like that assumption, and CASE_CONST_UNIQUE feels like a better
abstraction, so I prefer (1) over (2).

For avoidance of doubt, this isn't an approval or anything.
It definitely needs to be submitted upstream.  (Yeah, yeah,
sorry for nagging. :-))

Richard
==

diff -uprN '--exclude=.svn' gccBaseline/gcc/alias.c gccWCase/gcc/alias.c
--- gccBaseline/gcc/alias.c	2012-08-17 09:35:24.794195890 -0400
+++ gccWCase/gcc/alias.c	2012-08-19 09:48:33.666509880 -0400
@@ -1486,9 +1486,7 @@ rtx_equal_for_memref_p (const_rtx x, con
   return XSTR (x, 0) == XSTR (y, 0);
 
 case VALUE:
-case CONST_INT:
-case CONST_DOUBLE:
-case CONST_FIXED:
+CASE_CONST_UNIQUE:
   /* There's no need to compare the contents of CONST_DOUBLEs or
 	 CONST_INTs because pointer equality is a good enough
 	 comparison for these nodes.  */
diff -uprN '--exclude=.svn' gccBaseline/gcc/combine.c gccWCase/gcc/combine.c
--- gccBaseline/gcc/combine.c	2012-08-17 09:35:24.802195795 -0400
+++ gccWCase/gcc/combine.c	2012-08-17 09:36:49.921199621 -0400
@@ -531,12 +531,10 @@ find_single_use_1 (rtx dest, rtx *loc)
 
   switch (code)
 {
-case CONST_INT:
 case CONST:
 case LABEL_REF:
 case SYMBOL_REF:
-case CONST_DOUBLE:
-case CONST_VECTOR:
+CASE_CONST_INTEGER_OR_FLOAT:
 case CLOBBER:
   return 0;
 
@@ -12788,10 +12786,8 @@ mark_used_regs_combine (rtx x)
 {
 case LABEL_REF:
 case SYMBOL_REF:
-case CONST_INT:
 case CONST:
-case CONST_DOUBLE:
-case CONST_VECTOR:
+CASE_CONST_INTEGER_OR_FLOAT:
 case PC:
 case ADDR_VEC:
 case ADDR_DIFF_VEC:
diff -uprN '--exclu

Re: patch for machine dependent rtl section to hide case statements for different types of constants.

2012-08-20 Thread Kenneth Zadeck


I of course meant the machine "independent" not "dependent"
On 08/20/2012 09:50 AM, Kenneth Zadeck wrote:
This patch started out to be a purely mechanical change to the switch 
statements so that the ones that are used to take apart constants can 
be logically grouped.This is important for the next patch that I 
will submit this week that frees the rtl level from only being able to 
represent large integer constants with two HWIs.


I sent the patch to Richard Sandiford and when the comments came back 
from him, this patch turned into something that actually has real 
semantic changes.   (His comments are enclosed below.)   I did almost 
all of Richard's changes because he is generally right about such 
things, but it does mean that the patch has to be more carefully 
reviewed.   Richard does not count his comments as a review.


The patch has, of course, been properly tested on x86-64.

Any comments?  Ok for commit?

Kenny

Re: [patch, tree-ssa] PR54295 Incorrect value extension in widening multiply-accumulate

2012-08-20 Thread Tobias Burnus


Hi Richard,

your patch fails here; I get the build failure:

/projects/tob/gcc-git/gcc/gcc/tree-ssa-math-opts.c: In function ‘bool 
is_widening_mult_rhs_p(tree, tree, tree_node**, tree_node**)’:
/projects/tob/gcc-git/gcc/gcc/tree-ssa-math-opts.c:2014:18: error: 
variable ‘rhs_code’ set but not used [-Werror=unused-but-set-variable]

   enum tree_code rhs_code;
  ^

Tobias

On 08/17/2012 07:05 PM, Richard Earnshaw wrote:

--- tree-ssa-math-opts.c(revision 190502)
+++ tree-ssa-math-opts.c(local)



@@ -1982,9 +2019,7 @@ is_widening_mult_rhs_p (tree type, tree
if (is_gimple_assign (stmt))
{
  rhs_code = gimple_assign_rhs_code (stmt);
- if (TREE_CODE (type) == INTEGER_TYPE
- ? !CONVERT_EXPR_CODE_P (rhs_code)
- : rhs_code != FIXED_CONVERT_EXPR)
+ if (! widening_mult_conversion_strippable_p (type, stmt))
rhs1 = rhs;
  else
{

Re: PATCH: PR bootstrap/54209: [4.8 Regression] Failed to build gcc for Android/x86

2012-08-20 Thread H.J. Lu

On Fri, Aug 17, 2012 at 2:42 AM, Chupin, Pavel V
 wrote:
> Submitted patch here: https://android-review.googlesource.com/#/c/41705
>
> -- Pavel
>

link.h has been added to AOSP.  I am closing PR 54209.

Thanks.

-- 
H.J.

Re: [patch, tree-ssa] PR54295 Incorrect value extension in widening multiply-accumulate

2012-08-20 Thread Richard Earnshaw

On 20/08/12 15:01, Tobias Burnus wrote:
> Hi Richard,
> 
> your patch fails here; I get the build failure:
> 
> /projects/tob/gcc-git/gcc/gcc/tree-ssa-math-opts.c: In function ‘bool 
> is_widening_mult_rhs_p(tree, tree, tree_node**, tree_node**)’:
> /projects/tob/gcc-git/gcc/gcc/tree-ssa-math-opts.c:2014:18: error: 
> variable ‘rhs_code’ set but not used [-Werror=unused-but-set-variable]
> enum tree_code rhs_code;
>^
> 
> Tobias
> 
> On 08/17/2012 07:05 PM, Richard Earnshaw wrote:
>> --- tree-ssa-math-opts.c (revision 190502)
>> +++ tree-ssa-math-opts.c (local)
> 
>> @@ -1982,9 +2019,7 @@ is_widening_mult_rhs_p (tree type, tree
>> if (is_gimple_assign (stmt))
>>  {
>>rhs_code = gimple_assign_rhs_code (stmt);
>> -  if (TREE_CODE (type) == INTEGER_TYPE
>> -  ? !CONVERT_EXPR_CODE_P (rhs_code)
>> -  : rhs_code != FIXED_CONVERT_EXPR)
>> +  if (! widening_mult_conversion_strippable_p (type, stmt))
>>  rhs1 = rhs;
>>else
>>  {
>>
> 
> 


Whoops!  Sorry about that.

Fixed thusly.  Committed as obvious.

PR tree-ssa/54295
* tree-ssa-math-opts.c (is_widening_mult_rhs_p): Delete rhs_code
declaration and setter.

R.

Index: tree-ssa-math-opts.c
===
--- tree-ssa-math-opts.c(revision 190533)
+++ tree-ssa-math-opts.c(working copy)
@@ -2011,14 +2011,12 @@ is_widening_mult_rhs_p (tree type, tree 
 {
   gimple stmt;
   tree type1, rhs1;
-  enum tree_code rhs_code;
 
   if (TREE_CODE (rhs) == SSA_NAME)
 {
   stmt = SSA_NAME_DEF_STMT (rhs);
   if (is_gimple_assign (stmt))
{
- rhs_code = gimple_assign_rhs_code (stmt);
  if (! widening_mult_conversion_strippable_p (type, stmt))
rhs1 = rhs;
  else

[C++ Patch] PR 10416

2012-08-20 Thread Paolo Carlini


Hi,

in this old issue submitter points out that we emit too easily 
Wunused_variable warnings even when the destructor has side effects 
(otoh, the constructor is handled Ok). This is particularly annoying 
together with eg, RAII.


Turns out that lately we are already careful when we handle the new 
Wunused_but_set_variable, thus I'm simply proposing to commonize the 
additional check.


Tested x86_64-linux.

Thanks,
Paolo.

/
/cp
2012-08-20  Paolo Carlini  

PR c++/10416
* decl.c (poplevel): Check TYPE_HAS_NONTRIVIAL_DESTRUCTOR for
Wunused_variable too.

/testsuite
2012-08-20  Paolo Carlini  

PR c++/10416
* g++.dg/warn/Wunused-var-17.C: New.
Index: testsuite/g++.dg/warn/Wunused-var-17.C
===
--- testsuite/g++.dg/warn/Wunused-var-17.C  (revision 0)
+++ testsuite/g++.dg/warn/Wunused-var-17.C  (revision 0)
@@ -0,0 +1,4 @@
+// PR c++/10416
+// { dg-options "-Wunused" }
+
+void f () { struct atend { ~atend () { __builtin_printf("leaving f\n"); } } a; 
}
Index: cp/decl.c
===
--- cp/decl.c   (revision 190526)
+++ cp/decl.c   (working copy)
@@ -621,16 +621,16 @@ poplevel (int keep, int reverse, int functionbody)
   if (TREE_CODE (decl) == VAR_DECL
  && (! TREE_USED (decl) || !DECL_READ_P (decl))
  && ! DECL_IN_SYSTEM_HEADER (decl)
- && DECL_NAME (decl) && ! DECL_ARTIFICIAL (decl))
+ && DECL_NAME (decl) && ! DECL_ARTIFICIAL (decl)
+ && TREE_TYPE (decl) != error_mark_node
+ && (!CLASS_TYPE_P (TREE_TYPE (decl))
+ || !TYPE_HAS_NONTRIVIAL_DESTRUCTOR (TREE_TYPE (decl
{
  if (! TREE_USED (decl))
warning (OPT_Wunused_variable, "unused variable %q+D", decl);
  else if (DECL_CONTEXT (decl) == current_function_decl
-  && TREE_TYPE (decl) != error_mark_node
   && TREE_CODE (TREE_TYPE (decl)) != REFERENCE_TYPE
-  && errorcount == unused_but_set_errorcount
-  && (!CLASS_TYPE_P (TREE_TYPE (decl))
-  || !TYPE_HAS_NONTRIVIAL_DESTRUCTOR (TREE_TYPE (decl
+  && errorcount == unused_but_set_errorcount)
{
  warning (OPT_Wunused_but_set_variable,
   "variable %q+D set but not used", decl);

[PATCH] PR53992 - openmp lower transaction code

2012-08-20 Thread Patrick Marlier


In this PR, OMP lowering is not going into the transaction code.
So if GIMPLE_TRANSACTION is found, we lower its body.
(Patch also fixes a format issue.)

Note that PR53992 component in Bugzilla must be change from c to libgomp 
(I don't have bugzilla account with admin rights, who should I ask for 
that?).


Tested on trunk / i686.

Ok for trunk? Ok to backport to 4.7 branch if no regression?
Thanks.

gcc/
2012-08-17  Patrick Marlier  

PR libgomp/53992
* omp-low.c (lower_omp_1): Handle GIMPLE_TRANSACTION.
Index: omp-low.c
===
--- omp-low.c	(revision 190488)
+++ omp-low.c	(working copy)
@@ -6827,6 +6827,9 @@ lower_omp_1 (gimple_stmt_iterator *gsi_p, omp_cont
   lower_omp (gimple_try_eval_ptr (stmt), ctx);
   lower_omp (gimple_try_cleanup_ptr (stmt), ctx);
   break;
+case GIMPLE_TRANSACTION:
+  lower_omp (gimple_transaction_body_ptr (stmt), ctx);
+  break;
 case GIMPLE_BIND:
   lower_omp (gimple_bind_body_ptr (stmt), ctx);
   break;
@@ -7108,24 +7111,24 @@ diagnose_sb_2 (gimple_stmt_iterator *gsi_p, bool *
   break;
 
 case GIMPLE_COND:
-	{
-	  tree lab = gimple_cond_true_label (stmt);
-	  if (lab)
-	{
-	  n = splay_tree_lookup (all_labels,
- (splay_tree_key) lab);
-	  diagnose_sb_0 (gsi_p, context,
-			 n ? (gimple) n->value : NULL);
-	}
-	  lab = gimple_cond_false_label (stmt);
-	  if (lab)
-	{
-	  n = splay_tree_lookup (all_labels,
- (splay_tree_key) lab);
-	  diagnose_sb_0 (gsi_p, context,
-			 n ? (gimple) n->value : NULL);
-	}
-	}
+  {
+	tree lab = gimple_cond_true_label (stmt);
+	if (lab)
+	  {
+	n = splay_tree_lookup (all_labels,
+   (splay_tree_key) lab);
+	diagnose_sb_0 (gsi_p, context,
+			   n ? (gimple) n->value : NULL);
+	  }
+	lab = gimple_cond_false_label (stmt);
+	if (lab)
+	  {
+	n = splay_tree_lookup (all_labels,
+   (splay_tree_key) lab);
+	diagnose_sb_0 (gsi_p, context,
+			   n ? (gimple) n->value : NULL);
+	  }
+  }
   break;
 
 case GIMPLE_GOTO:
Index: testsuite/gcc.dg/gomp/pr53992.c
===
--- testsuite/gcc.dg/gomp/pr53992.c	(revision 0)
+++ testsuite/gcc.dg/gomp/pr53992.c	(working copy)
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-options "-fgnu-tm -fopenmp" } */
+/* { dg-require-effective-target fgnu_tm } */
+
+int main() {
+long data[1];
+long i, min=1;
+for (i=0; i<1; i++) data[i] = -i;
+
+#pragma omp parallel for
+for (i=0; i<1; i++) {
+__transaction_atomic
+{
+if (data[i] < min)
+min = data[i];
+}
+}
+
+return !(min == -);
+}

Fix -ftime-report for C++ lookup

2012-08-20 Thread Diego Novillo


Found this while running -ftime-report on a largish C++ source file.
We need to start TV_NAME_LOOKUP conditionally inside poplevel()
because it may be called from another lookup routine that already has
TV_NAME_LOOKUP going.

Tested on x86_64.  Committed to trunk.

2012-08-20  Diego Novillo  

* decl.c (poplevel): Start TV_NAME_LOOKUP conditionally.

diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c
index 5908996..0dad597 100644
--- a/gcc/cp/decl.c
+++ b/gcc/cp/decl.c
@@ -552,7 +552,7 @@ poplevel (int keep, int reverse, int functionbody)
   unsigned ix;
   cp_label_binding *label_bind;

-  timevar_start (TV_NAME_LOOKUP);
+  bool subtime = timevar_cond_start (TV_NAME_LOOKUP);
  restart:

   block = NULL_TREE;
@@ -818,7 +818,7 @@ poplevel (int keep, int reverse, int functionbody)
   if (kind == sk_cleanup)
 goto restart;

-  timevar_stop (TV_NAME_LOOKUP);
+  timevar_cond_stop (TV_NAME_LOOKUP, subtime);
   return block;
 }

Re: Reproducible gcc builds, gfortran, and -grecord-gcc-switches

2012-08-20 Thread Simon Baldwin

On 17 August 2012 16:55, Joseph S. Myers  wrote:
>
> On Fri, 17 Aug 2012, Simon Baldwin wrote:
>
> > > You could have just added
> > >   case OPT_cpp_:
> > > to the switch in gen_producer_string, instead of all this.
> >
> > Thanks.  I was under the impression, apparently mistaken, that
> > OPT_cpp_ exists only if fortran is enabled.
>
> OPT_* for Fortran options only exist when the Fortran front-end is in the
> source tree (whether or not enabled).  I think we try to avoid knowingly
> breaking use cases where people remove some front ends from the source
> tree, although we don't actively test them and no longer provide split-up
> source tarballs.

Thanks for the update.  Which fix should move forwards?

--
Google UK Limited | Registered Office: Belgrave House, 76 Buckingham
Palace Road, London SW1W 9TQ | Registered in England Number: 3977902

Re: [PATCH] PR53992 - openmp lower transaction code

2012-08-20 Thread Richard Henderson

On 08/20/2012 07:20 AM, Patrick Marlier wrote:
> 2012-08-17  Patrick Marlier  
> 
> PR libgomp/53992
> * omp-low.c (lower_omp_1): Handle GIMPLE_TRANSACTION.

Ok everywhere.

The Bugzilla must not change to libgomp, as that is
reserved for the runtime library.


r~

Re: [PATCH] PR53992 - openmp lower transaction code

2012-08-20 Thread Jakub Jelinek

On Mon, Aug 20, 2012 at 10:20:33AM -0400, Patrick Marlier wrote:
> Ok for trunk? Ok to backport to 4.7 branch if no regression?

Ok for both, with the following nits resolved:

> gcc/
> 2012-08-17  Patrick Marlier  
> 
>   PR libgomp/53992

Use PR middle-end/53992 instead, libgomp is for library issues only.

>   * omp-low.c (lower_omp_1): Handle GIMPLE_TRANSACTION.

> @@ -7108,24 +7111,24 @@ diagnose_sb_2 (gimple_stmt_iterator *gsi_p, bool *
>break;
>  
>  case GIMPLE_COND:
> - {
> -   tree lab = gimple_cond_true_label (stmt);
> -   if (lab)
> - {
> -   n = splay_tree_lookup (all_labels,
> -  (splay_tree_key) lab);
> -   diagnose_sb_0 (gsi_p, context,
> -  n ? (gimple) n->value : NULL);
> - }
> -   lab = gimple_cond_false_label (stmt);
> -   if (lab)
> - {
> -   n = splay_tree_lookup (all_labels,
> -  (splay_tree_key) lab);
> -   diagnose_sb_0 (gsi_p, context,
> -  n ? (gimple) n->value : NULL);
> - }
> - }
> +  {
> + tree lab = gimple_cond_true_label (stmt);
> + if (lab)
> +   {
> + n = splay_tree_lookup (all_labels,
> +(splay_tree_key) lab);
> + diagnose_sb_0 (gsi_p, context,
> +n ? (gimple) n->value : NULL);
> +   }
> + lab = gimple_cond_false_label (stmt);
> + if (lab)
> +   {
> + n = splay_tree_lookup (all_labels,
> +(splay_tree_key) lab);
> + diagnose_sb_0 (gsi_p, context,
> +n ? (gimple) n->value : NULL);
> +   }
> +  }
>break;
>  
>  case GIMPLE_GOTO:

Please leave this hunk out.  Formatting can be normally fixed only
if you touch the code in question or at least lines around it, not
in an unrelated patch that touches completely different function.

> --- testsuite/gcc.dg/gomp/pr53992.c   (revision 0)
> +++ testsuite/gcc.dg/gomp/pr53992.c   (working copy)
> @@ -0,0 +1,20 @@

Please add
/* PR middle-end/53992 */
line to the beginning of the file.

> +/* { dg-do compile } */
> +/* { dg-options "-fgnu-tm -fopenmp" } */
> +/* { dg-require-effective-target fgnu_tm } */
> +
> +int main() {
> +long data[1];
> +long i, min=1;
> +for (i=0; i<1; i++) data[i] = -i;
> +
> +#pragma omp parallel for
> +for (i=0; i<1; i++) {
> +__transaction_atomic
> +{
> +if (data[i] < min)
> +min = data[i];
> +}
> +}
> +
> +return !(min == -);
> +}

Jakub

Re: Reproducible gcc builds, gfortran, and -grecord-gcc-switches

2012-08-20 Thread Joseph S. Myers

On Mon, 20 Aug 2012, Simon Baldwin wrote:

> > OPT_* for Fortran options only exist when the Fortran front-end is in the
> > source tree (whether or not enabled).  I think we try to avoid knowingly
> > breaking use cases where people remove some front ends from the source
> > tree, although we don't actively test them and no longer provide split-up
> > source tarballs.
> 
> Thanks for the update.  Which fix should move forwards?

I think the approach using a new option flag is the way to go, though the 
patch needs (at least) documentation for the new flag in options.texi.

-- 
Joseph S. Myers
jos...@codesourcery.com

Re: [PATCH] Add working-set size and hotness information to fdo summary (issue6465057)

2012-08-20 Thread Teresa Johnson

On Mon, Aug 20, 2012 at 2:48 AM, Jan Hubicka  wrote:
>> Well, it should store the largest working set in BBs, or one that came
>> from a much longer run. But I see the issue you are pointing out. The
>> num_counters (the number of hot bbs) should be reasonable, as the
>> total number of BBs is the same between all runs, and I want to show
>> the largest or more dominant working set in terms of the number of hot
>> bbs. But the min_bb_counter will become more and more inaccurate as
>> the number of runs increases, since the counter values are
>> accumulated.
>
> Yes and that is the one that we plan to use to determine hot/cold decisions 
> on,
> right?

Yes, so I agree it is important to do something to update this as
profiles are merged.

>
> Note that there is no really 1-1 corespondence in betwen BBs and counters.
> For each function the there should be num_edges-num_bbs+1 counters.
> What do you plan to use BB counts for?

True, although what I am using this for is to get an idea of the size
of the working set to help identify large icache footprints in order
to control code size increasing optimizations such as unrolling. The
idea is that a huge number of hot counters indicates a huge number of
hot bbs which in turn indicates a huge number of hot instructions and
therefore high icache pressure. We're using it in the google branch to
control unrolling successfully.

>>
>> I typed up a detailed email below on why getting this right would be
>> difficult, but then I realized there might be a fairly simple accurate
>> solution, which I'll describe first:
>>
>> The only way I see to do this completely accurately is to take two
>> passes through the existing gcda files that we are merging into, one
>> to read in all the counter values and merge them into all the counter
>> values in the arrays from the current run (after which we can do the
>> histogramming/working set computation accurately from the merged
>> counters), and the second to rewrite them. In libgcov this doesn't
>> seem like it would be too difficult to do, although it is a little bit
>> of restructuring of the main merging loop and needs some special
>> handling for buffered functions (which could increase the memory
>> footprint a bit if there are many of these since they will all need to
>> be buffered across the iteration over all the gcda files).
>>
>> The summary merging in coverage.c confuses me a bit as it seems to be
>> handling the case when there are multiple program summaries in a
>> single gcda file. When would this happen? It looks like the merge
>> handling in libgcov should always produce a single program summary per
>> gcda file.
>
>
> This is something Nathan introduced years back. The idea was IMO to handle
> more acurately objects linked into multiple binaries. I am not sure
> if the code really works or worked on some point.
>
> The idea, as I recall it, was to produce overall checksum of all objects and
> have different profile records for each combination.
>
> This is not really useful for profile feedback as you generate single object
> file for all uses, but it might become useful for LTO where you know into 
> which
> binary you are linking to. I am not really sure it is worth all the 
> infrastructure
> needed to support this though.

Ok, so perhaps for the merging in coverage.c we can do something less
accurate, and worry more about the libgcov merging.

>>
>> >
>> >
>> > Why you don't simply write the histogram into gcov file and don't merge 
>> > the values
>> > here (i.e. doing the cummulation loop in GCC instead of within libgcov)?
>>
>> That doesn't completely solve the problem, unfortunately. The reason
>> is that you don't know which histogram entry contains a particular
>> block each run, and therefore you don't know how to compute the new
>> combined histogram value and index for that bb counter. For example, a
>> particular histogram index may have 5 counters (bbs) in it for one run
>> and 2 counters (bbs) in it for a second run, so the question is how to
>> compute the new entry for all of those bb counters, as the 5 bbs from
>> the first run may or may not be a superset of the 2 from the second
>> run. You could assume that the bbs have the same relative order of
>> hotness between runs, and combine the bb counters accordingly, but
>> there is still some trickiness because you have to apportion the
>> cumulative counter sum stored in the histogram entry between new
>> entries. For example, assume the highest indexed non-zero entries (the
>> histogram buckets containing the largest counter values) in the two
>> incoming histograms are:
>>
>> histogram 1:
>>
>> index 100: 4 counters, cumulative value X, min counter value minx
>> ...
>>
>> histogram 2:
>>
>> index 100: 2 counters, cumulative value Y, min counter value miny
>> index 99: 3 counters, cumulative value W, min counter value minw
>> ...
>>
>> To merge, you could assume something like 2 counters with a new
>> cumulative value (Y + X*2/4), and new mi

[DF] RFC: obstacks in DF

2012-08-20 Thread Dimitrios Apostolou


Hi,

while I was happy using obstacks in other parts of the compiler I thought 
they would provide a handy solution for the XNEWVECs/XRESIZEVECs in 
df-scan.c, especially df_install_refs() which is the heaviest malloc() 
user after the rest of my patches.


In the process I realised that obstacks weren't exactly suitable (thanks 
matz for helping on IRC), nevertheless I have a patch which is stable, a 
bit faster and more memory friendly (don't know why, but RSS memory for 
common non-pathological compilations, was smaller). However after trying 
various incorrect changes I'm aware that there are leaks in there that 
can't be closed without dirty hacks, so I gave up.


I'm posting the simplest but stable version of my changes and would really 
like to hear ideas. There are "holes" in the obstack that should have been 
free'd, but in the end I didn't see memory increase. I don't know what 
would happen for huge functions that keep the obstack alive for long, I 
didn't have such testcase. Finally, I think my next try will involve 
converting the chains to pool_alloc'ed linked list, do you think that 
makes sense?



Thanks,
Dimitris
=== modified file 'gcc/df-scan.c'
--- gcc/df-scan.c   2012-07-16 11:32:42 +
+++ gcc/df-scan.c   2012-08-20 14:01:59 +
@@ -184,6 +184,17 @@ struct df_scan_problem_data
   bitmap_obstack insn_bitmaps;
 };
 
+/* Obstack for storing all of all of
+   insn_info->{defs,uses,eq_uses,mw_hardregs} and
+   bb_info->artificial_{defs,uses}.  */
+static struct obstack df_refs_obstack;
+
+/* Keep the obstack initialised (only 4k overhead) and use this pointer to
+   actually empty it fast.  */
+static void *df_first_refs_obj;
+static int df_refs_obstack_is_init;
+
+
 typedef struct df_scan_bb_info *df_scan_bb_info_t;
 
 
@@ -193,36 +204,13 @@ df_scan_free_internal (void)
 {
   struct df_scan_problem_data *problem_data
 = (struct df_scan_problem_data *) df_scan->problem_data;
-  unsigned int i;
-  basic_block bb;
 
-  /* The vectors that hold the refs are not pool allocated because
- they come in many sizes.  This makes them impossible to delete
- all at once.  */
-  for (i = 0; i < DF_INSN_SIZE(); i++)
-{
-  struct df_insn_info *insn_info = DF_INSN_UID_GET(i);
-  /* Skip the insns that have no insn_info or have been
-deleted.  */
-  if (insn_info)
-   {
- df_scan_free_ref_vec (insn_info->defs);
- df_scan_free_ref_vec (insn_info->uses);
- df_scan_free_ref_vec (insn_info->eq_uses);
- df_scan_free_mws_vec (insn_info->mw_hardregs);
-   }
-}
-
-  FOR_ALL_BB (bb)
-{
-  unsigned int bb_index = bb->index;
-  struct df_scan_bb_info *bb_info = df_scan_get_bb_info (bb_index);
-  if (bb_info)
-   {
- df_scan_free_ref_vec (bb_info->artificial_defs);
- df_scan_free_ref_vec (bb_info->artificial_uses);
-   }
-}
+  /* Delete at once all vectors that hold the refs (all of
+ insn_info->{defs,uses,eq_uses,mw_hardregs} and
+ bb_info->artificial_{defs,uses}) but keep the obstack initialised, so
+ that it's ready for more BBs.  */
+  obstack_free (&df_refs_obstack, df_first_refs_obj);
+  df_first_refs_obj = NULL;
 
   free (df->def_info.refs);
   free (df->def_info.begin);
@@ -364,6 +352,14 @@ df_scan_alloc (bitmap all_blocks ATTRIBU
   bb_info->artificial_uses = NULL;
 }
 
+  if (! df_refs_obstack_is_init)
+{
+  obstack_init (&df_refs_obstack);
+  df_refs_obstack_is_init = 1;
+}
+  gcc_assert (df_first_refs_obj == NULL);
+  df_first_refs_obj = XOBNEW (&df_refs_obstack, void *);
+
   bitmap_initialize (&df->hardware_regs_used, &problem_data->reg_bitmaps);
   bitmap_initialize (&df->regular_block_artificial_uses, 
&problem_data->reg_bitmaps);
   bitmap_initialize (&df->eh_block_artificial_uses, 
&problem_data->reg_bitmaps);
@@ -791,9 +787,15 @@ df_install_ref_incremental (df_ref ref)
 }
 
   ref_rec = *ref_rec_ptr;
+  /* fprintf (stderr, "count:%d ref_rec:%p\n", count, ref_rec); */
   if (count)
 {
-  ref_rec = XRESIZEVEC (df_ref, ref_rec, count+2);
+  /* Impossible to actually know if ref_rec was malloc'ed or obstack'ed
+thus we might have a leak here!  */
+  df_ref *ref_rec2 = XOBNEWVEC (&df_refs_obstack, df_ref, count+2);
+  memcpy (ref_rec2, ref_rec, count*sizeof(df_ref));
+  /* free (ref_rec); */
+  ref_rec = ref_rec2;
   *ref_rec_ptr = ref_rec;
   ref_rec[count] = ref;
   ref_rec[count+1] = NULL;
@@ -801,7 +803,7 @@ df_install_ref_incremental (df_ref ref)
 }
   else
 {
-  df_ref *ref_rec = XNEWVEC (df_ref, 2);
+  ref_rec = XOBNEWVEC (&df_refs_obstack, df_ref, 2);
   ref_rec[0] = ref;
   ref_rec[1] = NULL;
   *ref_rec_ptr = ref_rec;
@@ -1070,8 +1072,9 @@ df_ref_chain_delete (df_ref *ref_rec)
 
   /* If the list is empty, it has a special shared element that is not
  to be deleted.  */
-  if (*start)
-free (start);
+  /* if (*start) */
+  /*   free (sta

Re: C++ PR 54197: lifetime of reference not properly extended

2012-08-20 Thread Ollie Wild

On Thu, Aug 16, 2012 at 2:13 PM, Gabriel Dos Reis
 wrote:
>
> On Wed, Aug 15, 2012 at 9:52 AM, Ollie Wild  wrote:
> > (Adding other C++ maintainers in case someone else wants to have a
> > stab.)
> >
> > Ping?
>
> I consider Jason to be the expert on this; so let see what he says.
>
> -- Gaby

Jason, any idea when you can look at this?

The patch is about as short as they come, so it shouldn't take long to review.

Thanks,
Ollie

Re: [DF] RFC: obstacks in DF

2012-08-20 Thread Steven Bosscher

On Mon, Aug 20, 2012 at 4:54 PM, Dimitrios Apostolou  wrote:
> In the process I realised that obstacks weren't exactly suitable (thanks
> matz for helping on IRC),

Right, alloc-pool would be a better choice here.

Ciao!
Steven

Re: [PATCH] Add working-set size and hotness information to fdo summary (issue6465057)

2012-08-20 Thread Steven Bosscher

On Mon, Aug 20, 2012 at 11:48 AM, Jan Hubicka  wrote:
>> The summary merging in coverage.c confuses me a bit as it seems to be
>> handling the case when there are multiple program summaries in a
>> single gcda file. When would this happen? It looks like the merge
>> handling in libgcov should always produce a single program summary per
>> gcda file.

AFAIU it merges existing profile data with new profile data from a
second (or third, or ...) trial run.

Ciao!
Steven

Re: patch for machine dependent rtl section to hide case statements for different types of constants.

2012-08-20 Thread Richard Sandiford

Kenneth Zadeck  writes:
> The omission of CONST_FIXED from the cselib_expand_value_rtx_1,
> attr_copy_rtx, clear_struct_flag and combine switches looks
> unintentional (though only as a missed compiler-speed optimisation).
> Same goes for the omission of CONST_VECTOR from check_maybe_invariant.
>
> The omission of CONST_FIXED from dse.c:const_or_frame_p looks like
> a missed target-code optimisation.  The function ought to be using
> CONSTANT_P instead.
>
>  I did not do what is suggested in the last sentence because
>  it changes the behavior of the rtx "HIGH".

As mentioned privately, that's what we want.

> 1) Define:
>
> /* Match CONST_*s that can represent compile-time constant integers.  */
> #define CASE_CONST_SCALAR_INT \
>case CONST_INT: \
>case CONST_DOUBLE
>
> /* Match CONST_*s for which pointer equality corresponds to value 
> equality.  */
> #define CASE_CONST_UNIQUE \
>case CONST_INT: \
>case CONST_DOUBLE: \
>case CONST_FIXED
>
> /* Match all CONST_* rtxes.  */
> #define CASE_CONST_ANY \
>case CONST_INT: \
>case CONST_DOUBLE: \
>case CONST_FIXED: \
>case CONST_VECTOR
>
> and remove the mark_jump_label_1 cases.

I meant that these three should be the _only_ cases we need.
The reason I listed all those missed cases was that, with the
exception of mark_jump_label_1, the switches really seemed to be
testing one of the three conditions above.

Richard

Re: [PATCH] Add working-set size and hotness information to fdo summary (issue6465057)

2012-08-20 Thread Xinliang David Li

>
> So I definitely preffer 2 or 3 over 1. David has experience with 3. How well 
> does
> it work for LIPO?
>

This (lack of locking, races) is not a new problem. There is no
synchronization in libgcov for profile update/merge at both thread and
process level. Thread level data races leads to inconsistent counters,
but can be mostly corrected under -fprofile-correction and smoothing.
There is also problem with false indirect call targets --- gcc has
mechanism to filter those out.

Process level synchronization problems can happen when two processes
(running the instrumented binary) exit at the same time. The
updated/merged counters from one process may be overwritten by another
process -- this is true for both counter data and summary data.
Solution 3) does not introduce any new problems.

thanks,

David



> Honza
>>
>> Thanks,
>> Teresa
>>
>> >
>> > Honza
>>
>>
>>
>>
>> --
>> Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413

failed attempt: retain identifier length from frontend to backend

2012-08-20 Thread Dimitrios Apostolou


Hello,

my last attempt on improving something serious was about three weeks ago, 
trying to keep all lengths of all strings parsed in the frontend for the 
whole compilation phase until the assembly output. I was hoping that would 
help on using faster hashes (knowing the length allows us to hash 4 or 8 
bytes per iteration), quicker strcmp in various places, and using less 
strlen() calls, which show especially on -g3 compilations that store huge 
macro strings.


I'll post no patch here, since what I currently have is a mess in 3 
different branches and most don't even compile. I tried various 
approaches. First I tried adding an extra length parameter in all relevant 
functions, starting from the assembly generation and working my way 
upwards. This got too complex, and I'd really like to ask if you find any 
meaning in changing target specific hooks and macros to actually accept 
length as argument (e.g. ASM_OUTPUT_*) or return it (e.g. ASM_GENERATE_*). 
Changes seemed too intrusive for me to continue.


But seeing that identifier length is there inside struct ht_identifier (or 
cpp_hashnode) and not lost, I tried the approach of having the length at 
str[-4] for all identifiers. To achieve this I changed ht_identifier to 
store str with the flexible array hack. Unfortunately I hadn't noticed 
that ht_identifier was a part of tree_node and also part of too many other 
structs, so changing all those structs to have variable size was not 
without side effects. In the end it compiled but I got crashes all over, 
and I'm sure I didn't do things right since I broke things like the static 
assert in libcpp/identifiers.c, that I don't even understand:


 /* We don't need a proxy since the hash table's identifier comes first
in cpp_hashnode.  However, in case this is ever changed, we have a
static assertion for it.  */
-extern char proxy_assertion_broken[offsetof (struct cpp_hashnode, ident) == 0 
? 1 : -1];

Anyway last attempt was to decouple ht_identifier completely from trees 
and other structs by storing pointer to it, but I was pretty worn out and 
quickly gave up after getting errors on gengtype-generated files that I 
didn't even know how to handle.


Was all this project too ambitious? I'd appreciate all input.


Thanks,
Dimitris

Re: [PATCH] Add working-set size and hotness information to fdo summary (issue6465057)

2012-08-20 Thread Xinliang David Li

> If this approach seems like it is feasible, then we could stick with
> the current approach of emitting the working set array in the summary,
> mitigating it somewhat by doing the sum_all based scaling of the
> counter values, then in a follow on patch restructure the merging code
> to delay the working set computation as described above.

+1

David

> Teresa
>
>>
>> Honza
>>>
>>> Thanks,
>>> Teresa
>>>
>>> >
>>> > Honza
>>>
>>>
>>>
>>>
>>> --
>>> Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
>
>
>
> --
> Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413

[PATCH][LIBTOOL] AIX PIC compiler options

2012-08-20 Thread David Edelsohn

The GCC -fpic/-fPIC option has evolved to mean code generation for a
shared library and changes the optimization behavior of the compiler.
Code for AIX PowerPC always is PIC, but the optimization behavior is
affecting AIX.

libtool exports all global symbols on AIX while GCC binds_local_p()
determines that some function calls are local, causing GCC to emit the
local (non-external) form of function calls, which generates AIX
linker warnings.

The IBM XL compiler defaults to the equivalent of -fpic.  GCC could
try to match that, but it requires working around other GCC bootstrap
problems and continually chasing compatibility.  The other option is
to assume that developers using GCC to build shared libraries either
are using libtool or copying FOSS Makefiles designed for GNU/Linux or
expecting GNU/Linux compatibility.

The following patch starts to implement the latter by adding "-fPIC"
to the compiler command line options when building objects for a
shared library.  Because this places control in the hands of the user
with a command line option and matches GNU/Linux, this seems better
than playing with defaults and fighting GCC semantics for building
shared libraries.

I also will post this patch upstream to Libtool.

Comments?

Thanks, David

* libtool.m4 (_LT_COMPILER_PIC): Add -fPIC to GCC and GXX for AIX.
* configure: Regenerate.

Index: libtool.m4
===
--- libtool.m4  (revision 190521)
+++ libtool.m4  (working copy)
@@ -3580,6 +3580,7 @@
# AIX 5 now supports IA64 processor
_LT_TAGVAR(lt_prog_compiler_static, $1)='-Bstatic'
   fi
+  _LT_TAGVAR(lt_prog_compiler_pic, $1)='-fPIC'
   ;;

 amigaos*)
@@ -3891,6 +3892,7 @@
# AIX 5 now supports IA64 processor
_LT_TAGVAR(lt_prog_compiler_static, $1)='-Bstatic'
   fi
+  _LT_TAGVAR(lt_prog_compiler_pic, $1)='-fPIC'
   ;;

 amigaos*)

Re: [C++ Patch] PR 10416

2012-08-20 Thread Jason Merrill


OK.

Jason

Re: [PATCH] Add working-set size and hotness information to fdo summary (issue6465057)

2012-08-20 Thread Teresa Johnson

On Mon, Aug 20, 2012 at 8:35 AM, Steven Bosscher  wrote:
> On Mon, Aug 20, 2012 at 11:48 AM, Jan Hubicka  wrote:
>>> The summary merging in coverage.c confuses me a bit as it seems to be
>>> handling the case when there are multiple program summaries in a
>>> single gcda file. When would this happen? It looks like the merge
>>> handling in libgcov should always produce a single program summary per
>>> gcda file.
>
> AFAIU it merges existing profile data with new profile data from a
> second (or third, or ...) trial run.

It looks like libgcov will always merge program summaries between
different runs though. As far as I can tell, it will always either
rewrite the whole gcda file with a single merged program summary, or
abort the write if the merge was not successful. However, the comment
above gcov_exit does talk about keeping program summaries separate for
object files contained in different programs, which is what Honza was
describing as the motivation for the coverage.c merging. But I don't
see where multiple program summaries ever get written to the same gcda
file - if the checksums are different it seems to abort the write. But
the code in coverage.c is dealing with a single gcda file containing
multiple program summaries. Is there code somewhere that will cause
this to happen?

Thanks,
Teresa

>
> Ciao!
> Steven

-- 
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413

Re: [PATCH] Add working-set size and hotness information to fdo summary (issue6465057)

2012-08-20 Thread Steven Bosscher

On Mon, Aug 20, 2012 at 7:44 PM, Teresa Johnson  wrote:
> But
> the code in coverage.c is dealing with a single gcda file containing
> multiple program summaries. Is there code somewhere that will cause
> this to happen?

Not that I know of, no.
(There are enhancement requests for this, see e.g. PR47618).

Ciao!
Steven

Re: [Patch, Fortran] PR54301 - add warning for pointer might outlive its target

2012-08-20 Thread Thomas Koenig


Hi Tobias,

do you think that -Wtarget-lifetime should be included with -Wall?
I think so, because the code flagged is certainly invalid, and likely
to cause random errors.

Thomas

Re: [wwwdocs] Document Runtime CPU detection builtins

2012-08-20 Thread Sriraman Tallam

Ping.



On Tue, Aug 14, 2012 at 11:02 AM, Sriraman Tallam  wrote:
> +ger...@pfiefer.com
>
> On Tue, Aug 14, 2012 at 10:51 AM, Sriraman Tallam  wrote:
>> Hi Gerald,
>>
>>Is this release note alright?
>>
>> Thanks,
>> -Sri.
>>
>> On Fri, Aug 10, 2012 at 7:20 PM, Sriraman Tallam  wrote:
>>> Hi,
>>>
>>>I have added a release note for x86 builtins __builtin_cpu_is and
>>> __builtin_cpu_supports. They were checked in to trunk in rev. 186789.
>>> Is this ok to submit?
>>>
>>> Thanks,
>>> -Sri.

Re: [wwwdocs] Document Runtime CPU detection builtins

2012-08-20 Thread Diego Novillo

On Fri, Aug 10, 2012 at 10:20 PM, Sriraman Tallam  wrote:
> Hi,
>
>I have added a release note for x86 builtins __builtin_cpu_is and
> __builtin_cpu_supports. They were checked in to trunk in rev. 186789.
> Is this ok to submit?

I would not include such detailed documentation in changes.html.  I
assume that all this documentation, including caveats and limitations
is documented in the manual itself?  If that's the case, then simply
mention the builtins, an overview description of about a paragraph and
pointers to the user documentation for limitations and caveats.

Diego.

Re: [Google 4.7] Generate pubnames compatible with gdb-index version 7. (issue6459099)

2012-08-20 Thread Sterling Augustine

On Thu, Aug 16, 2012 at 5:32 PM, Cary Coutant  wrote:
>> +/* Output a single entry in the pubnames table.  */
>> +
>> +static void
>> +output_pubname (dw_offset die_offset, pubname_entry *entry)
>
> For this function, I'd suggest a comment to the effect that the logic
> is lifted from GDB.
>
>> @@ -2424,6 +2424,10 @@ gpubnames
>>  Common RejectNegative Var(debug_generate_pub_sections, 1)
>>  Generate DWARF pubnames and pubtypes sections.
>>
>> +ggnu-pubnames
>> +Common RejectNegative Var(debug_generate_pub_sections, 2)
>> +Generate DWARF pubnames and pubtypes sections.
>
> Instead of "RejectNegative", I think these three options should now
> use "Negative(...)" flags (each one naming the next, circularly). Not
> sure about that, though. (See the treatment of -gdwarf-, -gstabs,
> etc.)
>
> OK for google/gcc-4_7 branch.

Committed with your recommended changes, as attached.

Sterling


patch.diff
Description: Binary data

Re: [PATCH] Fix -fcompare-debug failure in fwprop (PR rtl-optimization/54294)

2012-08-20 Thread Richard Sandiford

Jakub Jelinek  writes:
> 2012-08-17  Jakub Jelinek  
>
>   PR rtl-optimization/54294
>   * fwprop.c (all_uses_available_at): Ignore debug insns in between
>   def_insn and target_insn when checking whether the shortcut is
>   possible.

OK, thanks.

Richard

Re: [Patch, Fortran] PR54301 - add warning for pointer might outlive its target

2012-08-20 Thread Tobias Burnus


Hi Thomas,

Thomas Koenig wrote:

do you think that -Wtarget-lifetime should be included with -Wall?
I think so, because the code flagged is certainly invalid, and likely
to cause random errors.


I concur. However, that's what the current version in the trunk does: 
-Wall implies -Wtarget-lifetime.


On the other hand, I just realized that "attr.result" is not set if the 
function symbols is also its result symbol – hence, there is no warning for:


function f()
  integer, pointer :: f
  integer, target :: t
  f => t
end

I think the following patch will fix that. I will package, regtest and 
commit it later.


Tobias

--- a/gcc/fortran/expr.c
+++ b/gcc/fortran/expr.c
@@ -3673,6 +3673,7 @@ gfc_check_pointer_assign (gfc_expr *lvalue, 
gfc_expr *rvalue)


   warn = lvalue->symtree->n.sym->attr.dummy
 || lvalue->symtree->n.sym->attr.result
+|| lvalue->symtree->n.sym->attr.function
 || lvalue->symtree->n.sym->attr.host_assoc
 || lvalue->symtree->n.sym->attr.use_assoc
 || lvalue->symtree->n.sym->attr.in_common;

[SPARC] Define MAX_FIXED_MODE_SIZE

2012-08-20 Thread Eric Botcazou

SPARC is now one of the last mainstream 64-bit platforms that do not define 
MAX_FIXED_MODE_SIZE to TImode.  Doing so helps the Ada compiler in particular 
because it is a heavy user of structures made up of a pair of pointers.

Bootstrapped/regtested/compat-regtested on SPARC64/Solaris, applied on the 
mainline.


2012-08-20  Eric Botcazou  

* config/sparc/sparc.h (MAX_FIXED_MODE_SIZE): Define.


-- 
Eric Botcazou
Index: config/sparc/sparc.h
===
--- config/sparc/sparc.h	(revision 190512)
+++ config/sparc/sparc.h	(working copy)
@@ -475,7 +475,6 @@ extern enum cmodel sparc_cmodel;
 #endif
 
 /* Now define the sizes of the C data types.  */
-
 #define SHORT_TYPE_SIZE		16
 #define INT_TYPE_SIZE		32
 #define LONG_TYPE_SIZE		(TARGET_ARCH64 ? 64 : 32)
@@ -512,7 +511,6 @@ extern enum cmodel sparc_cmodel;
 #define SPARC_STACK_BOUNDARY_HACK (TARGET_ARCH64 && TARGET_STACK_BIAS)
 
 /* ALIGN FRAMES on double word boundaries */
-
 #define SPARC_STACK_ALIGN(LOC) \
   (TARGET_ARCH64 ? (((LOC)+15) & ~15) : (((LOC)+7) & ~7))
 
@@ -551,6 +549,10 @@ extern enum cmodel sparc_cmodel;
  : MAX ((COMPUTED), (SPECIFIED)))			\
:  MAX ((COMPUTED), (SPECIFIED)))
 
+/* An integer expression for the size in bits of the largest integer machine
+   mode that should actually be used.  We allow pairs of registers.  */
+#define MAX_FIXED_MODE_SIZE GET_MODE_BITSIZE (TARGET_ARCH64 ? TImode : DImode)
+
 /* We need 2 words, so we can save the stack pointer and the return register
of the function containing a non-local goto target.  */
 #define STACK_SAVEAREA_MODE(LEVEL) \

Re: [PATCH] Add working-set size and hotness information to fdo summary (issue6465057)

2012-08-20 Thread Andi Kleen

Xinliang David Li  writes:
>
> Process level synchronization problems can happen when two processes
> (running the instrumented binary) exit at the same time. The
> updated/merged counters from one process may be overwritten by another
> process -- this is true for both counter data and summary data.
> Solution 3) does not introduce any new problems.

You could just use lockf() ?

-Andi

[PATCH] Fix valtrack ICE (PR debug/53923)

2012-08-20 Thread Jakub Jelinek

Hi!

On the testcase from this PR on AVR (from libgcc, thus not including
it into testsuite/) we ICE, because dead_debug_insert_temp is called
several times on the same insn, for multi-register hard register for each
regno in it (except the first which doesn't seem to be dead).
In the first call dead_debug_insert_temp changes *DF_REF_REAL_LOC
to DEBUG_EXPR, and in the next call for the next consecutive hard register
we set reg variable to *DF_REF_REAL_LOC and rely on it to be a REG,
when it is a DEBUG_EXPR already instead.  No change is needed in that case.

Bootstrapped/regtested on x86_64-linux and i686-linux (where this never
kicks in though), tested on the testcase in cross to avr.  Ok for trunk?

2012-08-20  Jakub Jelinek  

PR debug/53923
* valtrack.c (dead_debug_insert_temp): Drop non-reg uses
from the chain.

--- gcc/valtrack.c.jj   2012-08-10 12:57:21.0 +0200
+++ gcc/valtrack.c  2012-08-20 14:59:07.609586159 +0200
@@ -336,6 +336,14 @@ dead_debug_insert_temp (struct dead_debu
 {
   if (DF_REF_REGNO (cur->use) == uregno)
{
+ /* If this loc has been changed e.g. to debug_expr already
+as part of a multi-register use, just drop it.  */
+ if (!REG_P (*DF_REF_REAL_LOC (cur->use)))
+   {
+ *tailp = cur->next;
+ XDELETE (cur);
+ continue;
+   }
  *usesp = cur;
  usesp = &cur->next;
  *tailp = cur->next;

Jakub

[Patch, Fortran, commit] -Wtarget-lifetime: Fix omission

2012-08-20 Thread Tobias Burnus


Committed as Rev. 190542

Tobias
Index: gcc/testsuite/gfortran.dg/warn_target_lifetime_2.f90
===
--- gcc/testsuite/gfortran.dg/warn_target_lifetime_2.f90	(Revision 0)
+++ gcc/testsuite/gfortran.dg/warn_target_lifetime_2.f90	(Arbeitskopie)
@@ -0,0 +1,10 @@
+! { dg-do compile }
+! { dg-options "-Wtarget-lifetime" }
+!
+! PR fortran/54301
+!
+function f()
+  integer, pointer :: f
+  integer, target :: t
+  f => t ! { dg-warning "Pointer at .1. in pointer assignment might outlive the pointer target" }
+end
Index: gcc/testsuite/ChangeLog
===
--- gcc/testsuite/ChangeLog	(Revision 190541)
+++ gcc/testsuite/ChangeLog	(Arbeitskopie)
@@ -1,3 +1,8 @@
+2012-08-20  Tobias Burnus  
+
+	PR fortran/54301
+	* gfortran.dg/warn_target_lifetime_2.f90: New.
+
 2012-08-20  Paolo Carlini  
 
 	PR c++/10416
Index: gcc/fortran/expr.c
===
--- gcc/fortran/expr.c	(Revision 190541)
+++ gcc/fortran/expr.c	(Arbeitskopie)
@@ -3673,6 +3673,7 @@ gfc_check_pointer_assign (gfc_expr *lvalue, gfc_ex
 
   warn = lvalue->symtree->n.sym->attr.dummy
 	 || lvalue->symtree->n.sym->attr.result
+	 || lvalue->symtree->n.sym->attr.function
 	 || lvalue->symtree->n.sym->attr.host_assoc
 	 || lvalue->symtree->n.sym->attr.use_assoc
 	 || lvalue->symtree->n.sym->attr.in_common;
Index: gcc/fortran/ChangeLog
===
--- gcc/fortran/ChangeLog	(Revision 190541)
+++ gcc/fortran/ChangeLog	(Arbeitskopie)
@@ -1,6 +1,12 @@
 2012-08-20  Tobias Burnus  
 
 	PR fortran/54301
+	* expr.c (gfc_check_pointer_assign): Warn when a pointer,
+	which is a function result, might outlive its target.
+
+2012-08-20  Tobias Burnus  
+
+	PR fortran/54301
 	* expr.c (gfc_check_pointer_assign): Warn when the pointer
 	might outlive its target.
 	* gfortran.h (struct gfc_option_t): Add warn_target_lifetime.

Re: [PATCH] Add working-set size and hotness information to fdo summary (issue6465057)

2012-08-20 Thread Xinliang David Li

I was mistaken here -- gcov_open actually uses locking via fcntl
interface -- so we do need to find a way to solve the summary update
synchronization problem when it is separate out of the per-file update
loop.

David

On Mon, Aug 20, 2012 at 12:03 PM, Andi Kleen  wrote:
> Xinliang David Li  writes:
>>
>> Process level synchronization problems can happen when two processes
>> (running the instrumented binary) exit at the same time. The
>> updated/merged counters from one process may be overwritten by another
>> process -- this is true for both counter data and summary data.
>> Solution 3) does not introduce any new problems.
>
> You could just use lockf() ?
>
> -Andi

Re: dwarf2out.c: For DWARF 4+, output DW_AT_high_pc as constant offset.

2012-08-20 Thread Mark Wielaard

On Fri, Apr 27, 2012 at 08:16:04PM +0200, Mark Wielaard wrote:
> On Fri, 2012-04-27 at 15:43 +0200, Jakub Jelinek wrote:
> > On Fri, Apr 27, 2012 at 03:36:56PM +0200, Mark Wielaard wrote:
> > > But even without this, I think the patch is worth it just to get rid of
> > > all the relocations necessary otherwise.
> > 
> > IMHO we should defer applying this by a few months, given that GDB support
> > is only being added these days and -gdwarf-4 is now the default.
> > Applying it in August/September when there is already GDB version with
> > the support would be better.
> 
> OK, I'll try to remember and ping as soon as a new GDB release is out
> with the patch in.

Ping. There are stable releases of GDB 7.5, valgrind 3.8.0 and
elfutils 0.154 out now that support it.

I rebased the patch and tested against GDB 7.5.

2012-08-20  Mark Wielaard  

* dwarf2out.h (enum dw_val_class): Add dw_val_class_high_pc.
* dwarf2out.c (dw_val_equal_p): Handle dw_val_class_high_pc.
(add_AT_low_high_pc): New function.
(AT_lbl): Handle dw_val_class_high_pc.
(print_die): Likewise.
(attr_checksum): Likewise.
(attr_checksum_ordered): Likewise.
(same_dw_val_p): Likewise.
(size_of_die): Likewise.
(value_format): Likewise.
(output_die): Likewise.
(gen_subprogram_die): Use add_AT_low_high_pc.
(add_high_low_attributes): Likewise.
(dwarf2out_finish): Likewise.

OK to commit now?

Thanks,

Mark
diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c
index 4bc4cc3..11d925b 100644
--- a/gcc/dwarf2out.c
+++ b/gcc/dwarf2out.c
@@ -1312,6 +1312,7 @@ dw_val_equal_p (dw_val_node *a, dw_val_node *b)
 case dw_val_class_fde_ref:
   return a->v.val_fde_index == b->v.val_fde_index;
 case dw_val_class_lbl_id:
+case dw_val_class_high_pc:
   return strcmp (a->v.val_lbl_id, b->v.val_lbl_id) == 0;
 case dw_val_class_str:
   return a->v.val_str == b->v.val_str;
@@ -3598,6 +3599,26 @@ add_AT_data8 (dw_die_ref die, enum dwarf_attribute 
attr_kind,
   add_dwarf_attr (die, &attr);
 }
 
+/* Add DW_AT_low_pc and DW_AT_high_pc to a DIE.  */
+static inline void
+add_AT_low_high_pc (dw_die_ref die, const char *lbl_low, const char *lbl_high)
+{
+  dw_attr_node attr;
+
+  attr.dw_attr = DW_AT_low_pc;
+  attr.dw_attr_val.val_class = dw_val_class_lbl_id;
+  attr.dw_attr_val.v.val_lbl_id = xstrdup (lbl_low);
+  add_dwarf_attr (die, &attr);
+
+  attr.dw_attr = DW_AT_high_pc;
+  if (dwarf_version < 4)
+attr.dw_attr_val.val_class = dw_val_class_lbl_id;
+  else
+attr.dw_attr_val.val_class = dw_val_class_high_pc;
+  attr.dw_attr_val.v.val_lbl_id = xstrdup (lbl_high);
+  add_dwarf_attr (die, &attr);
+}
+
 /* Hash and equality functions for debug_str_hash.  */
 
 static hashval_t
@@ -3981,7 +4002,8 @@ AT_lbl (dw_attr_ref a)
 {
   gcc_assert (a && (AT_class (a) == dw_val_class_lbl_id
|| AT_class (a) == dw_val_class_lineptr
-   || AT_class (a) == dw_val_class_macptr));
+   || AT_class (a) == dw_val_class_macptr
+   || AT_class (a) == dw_val_class_high_pc));
   return a->dw_attr_val.v.val_lbl_id;
 }
 
@@ -4877,6 +4899,7 @@ print_die (dw_die_ref die, FILE *outfile)
case dw_val_class_lbl_id:
case dw_val_class_lineptr:
case dw_val_class_macptr:
+   case dw_val_class_high_pc:
  fprintf (outfile, "label: %s", AT_lbl (a));
  break;
case dw_val_class_str:
@@ -5033,6 +5056,7 @@ attr_checksum (dw_attr_ref at, struct md5_ctx *ctx, int 
*mark)
 case dw_val_class_lbl_id:
 case dw_val_class_lineptr:
 case dw_val_class_macptr:
+case dw_val_class_high_pc:
   break;
 
 case dw_val_class_file:
@@ -5305,6 +5329,7 @@ attr_checksum_ordered (enum dwarf_tag tag, dw_attr_ref at,
 case dw_val_class_lbl_id:
 case dw_val_class_lineptr:
 case dw_val_class_macptr:
+case dw_val_class_high_pc:
   break;
 
 case dw_val_class_file:
@@ -5770,6 +5795,7 @@ same_dw_val_p (const dw_val_node *v1, const dw_val_node 
*v2, int *mark)
 case dw_val_class_lbl_id:
 case dw_val_class_lineptr:
 case dw_val_class_macptr:
+case dw_val_class_high_pc:
   return 1;
 
 case dw_val_class_file:
@@ -7241,6 +7267,9 @@ size_of_die (dw_die_ref die)
case dw_val_class_vms_delta:
  size += DWARF_OFFSET_SIZE;
  break;
+   case dw_val_class_high_pc:
+ size += DWARF2_ADDR_SIZE;
+ break;
default:
  gcc_unreachable ();
}
@@ -7558,6 +7587,17 @@ value_format (dw_attr_ref a)
 case dw_val_class_data8:
   return DW_FORM_data8;
 
+case dw_val_class_high_pc:
+  switch (DWARF2_ADDR_SIZE)
+   {
+ case 4:
+   return DW_FORM_data4;
+ case 8:
+   return DW_FORM_data8;
+ default:
+   gcc_unreachable ();
+   }
+
 default:
   gcc_unreachable ();
 }
@@ -7984,6 +8024,11 @@ output_die (dw_die_ref die)
break;
  }
 
+   case

Re: dwarf2out.c: For DWARF 4+, output DW_AT_high_pc as constant offset.

2012-08-20 Thread Jakub Jelinek

On Mon, Aug 20, 2012 at 09:59:26PM +0200, Mark Wielaard wrote:
> Ping. There are stable releases of GDB 7.5, valgrind 3.8.0 and
> elfutils 0.154 out now that support it.
> 
> I rebased the patch and tested against GDB 7.5.
> 
> 2012-08-20  Mark Wielaard  
> 
> * dwarf2out.h (enum dw_val_class): Add dw_val_class_high_pc.
> * dwarf2out.c (dw_val_equal_p): Handle dw_val_class_high_pc.
> (add_AT_low_high_pc): New function.
> (AT_lbl): Handle dw_val_class_high_pc.
> (print_die): Likewise.
> (attr_checksum): Likewise.
> (attr_checksum_ordered): Likewise.
> (same_dw_val_p): Likewise.
> (size_of_die): Likewise.
> (value_format): Likewise.
> (output_die): Likewise.
> (gen_subprogram_die): Use add_AT_low_high_pc.
> (add_high_low_attributes): Likewise.
> (dwarf2out_finish): Likewise.
> 
> OK to commit now?

Okay, thanks.

Jakub

Re: Fix PR c++/19351 (operator new[] overflow)

2012-08-20 Thread Jason Merrill


OK.  Sorry for the delay.

Jason

Re: [wwwdocs] Document Runtime CPU detection builtins

2012-08-20 Thread Gerald Pfeifer

Hi Sriraman,

On Fri, 10 Aug 2012, Sriraman Tallam wrote:
> I have added a release note for x86 builtins __builtin_cpu_is and
> __builtin_cpu_supports. They were checked in to trunk in rev. 186789.

I had hoped one of the x86 maintainers would review this from his
perspective given that they have more background.  For the lack of
that, let me give it a try.

Index: changes.html
===
+ New builtin functions to detect run-time CPU type and ISA:

"built-in", cf. http://gcc.gnu.org/codingconventions.html; here and
in the following.

No  here;  should just do that.

+
+  Builtin __builtin_cpu_is has been added to detect if
+  the run-time CPU is of a particular type. The builtin returns a postive
+  integer on a match and zero otherwise. The builtin accepts one string
+  literal argument, the CPU name. For example,

"A built-in function..."

"positive"

"It accepts one string" (to make this shorter)

+  __builtin_cpu_is("westmere") returns a postive integer if

"positive"

+  the run-time CPU is an Intel Corei7 Westmere processor.  The following

I don't work for Intel, but should there be a space before "i7"?

+  are the CPU names recognized by __builtin_cpu_is:

How about making this "The following are the CPU names recognized for
now", which avoids another reference to the name of the built-in and
makes it clear that this is subject to change.

+  Builtin __builtin_cpu_supports has been added to detect

"A built-in function..."

+  returns a postive integer on a match and zero otherwise. The builtin

"positive"

+  following are the ISA features recognized by
+  __builtin_cpu_supports:

Same is above?

+Caveat: If the above builtins are called before any constructors are
+invoked, like during IFUNC initialization, then the CPU detection
+initialization must be explicity run using this newly provided
+builtin,  __builtin_cpu_init.

"...using the new built-in function __builtin_cpu_init."

What is a constructor in this context, by the way?  Will this be clear
to all the users?

+
+static void (*some_ifunc_resolver(void))(void)
+{
+   __builtin_cpu_init();
+   if (__builtin_cpu_is("amdfam10h") ...
+   if (__builtin_cpu_supports("popcnt") ...
+}
+

How about using  here? That avoids the s which will cause
problems with the web page validator, by the way.

Nice job for documenting this so well.  Thanks for taking the time
and your patience!

The patch is fine modulo the changes I pointed out (though some of
them are more suggestions and you do not need to slavishly follow
those).

Gerald

Re: [patch] Fix problems with -fdebug-types-section and local types

2012-08-20 Thread Cary Coutant

Ping.

http://gcc.gnu.org/ml/gcc-patches/2012-08/msg00398.html

-cary

On Mon, Aug 13, 2012 at 1:13 PM, Cary Coutant  wrote:
>> 2012-08-07   Cary Coutant  
>>
>> gcc/
>> * dwarf2out.c (clone_as_declaration): Copy DW_AT_abstract_origin
>> attribute.
>> (generate_skeleton_bottom_up): Remove DW_AT_object_pointer attribute
>> from original DIE.
>> (clone_tree_hash): Rename to ...
>> (clone_tree_partial): ... this; change callers.  Copy
>> DW_TAG_subprogram DIEs as declarations.
>>
>> gcc/testsuite/
>> * testsuite/g++.dg/debug/dwarf2/dwarf4-nested.C: New test case.
>> * testsuite/g++.dg/debug/dwarf2/dwarf4-typedef.C: Add
>> -fdebug-types-section flag.
>
> Ping?
>
> -cary

Re: Merge C++ conversion into trunk (0/6 - Overview)

2012-08-20 Thread H.J. Lu

On Tue, Aug 14, 2012 at 11:59 AM, Diego Novillo  wrote:
> On 12-08-14 09:48 , Diego Novillo wrote:
>
>> This merge touches several files, so I'm thinking that the best time is
>> going to be on Thu 16/Aug around 2:00 GMT.
>
>
> So, the fixes I needed from Lawrence are already in so we can proceed with
> the merge.  I'll commit the merge tonight at ~2:00 GMT.
>
> After the merge is in, I will send an announcement and request major branch
> merges to wait for another 24 hrs to allow testers the chance to pick up
> this merge.
>

The C++ merge caused:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54332

GCC memory usage is more than doubled from <= 3GB
to >= 10GB.  Is this a known issue?

-- 
H.J.

Re: [SH] PR 54089 - Add support for rotcr insn

2012-08-20 Thread Gary Funck

On 08/20/12 01:02:39, Oleg Endo wrote:
> Hello,
> 
> This adds support for SH's rotcr insn.
> Tested on rev 190459 with
> make -k check RUNTESTFLAGS="--target_board=sh-sim
> \{-m2/-ml,-m2/-mb,-m2a/-mb,-m4/-ml,-m4/-mb,-m4a/-ml,-m4a/-mb}"
> 
> and no new failures.
> OK?
> 
> Cheers,
> Oleg
> 
> ChangeLog:
> 
>   PR target/50489

Above: that should be: PR target/54089.

>   * config/sh/sh.md (rotcr, *rotcr, shar, shlr): New insns and 
>   splits.
>   (ashrdi3_k, lshrdi3_k): Rewrite as insn_and_split.
>   * config/sh/sh.c (sh_lshrsi_clobbers_t_reg_p): New function.
>   * config/sh/sh-protos.h (sh_lshrsi_clobbers_t_reg_p): Declare 
>   it.
>   
> testsuite/ChangeLog:
>   
>   PR target/50489

Here also.

>   * gcc.target/sh/pr54089-1.c: New.

This is OK.

For a sec. I thought my long-standing bug report
got some attention. :)

- Gary

Re: [SH] PR 54089 - Add support for rotcr insn

2012-08-20 Thread Oleg Endo

On Mon, 2012-08-20 at 14:09 -0700, Gary Funck wrote:
> > 
> > ChangeLog:
> > 
> > PR target/50489
> 
> Above: that should be: PR target/54089.
> 
> > * config/sh/sh.md (rotcr, *rotcr, shar, shlr): New insns and 
> > splits.
> > (ashrdi3_k, lshrdi3_k): Rewrite as insn_and_split.
> > * config/sh/sh.c (sh_lshrsi_clobbers_t_reg_p): New function.
> > * config/sh/sh-protos.h (sh_lshrsi_clobbers_t_reg_p): Declare 
> > it.
> > 
> > testsuite/ChangeLog:
> > 
> > PR target/50489
> 
> Here also.
> 
> > * gcc.target/sh/pr54089-1.c: New.
> 
> This is OK.

Thanks and sorry for the digit twister.

I've corrected it in the ChangeLog files.  Is there any way to delete
the PR commit comment in bugzilla?

> For a sec. I thought my long-standing bug report
> got some attention. :)

It did, but probably not in a constructive way ;)

Cheers,
Oleg

Re: [PATCH v2, rtl-optimization]: Fix PR46829, ICE in spill_failure with -fschedule-insns [was: Fixing instability of -fschedule-insns for x86]

2012-08-20 Thread Oleg Endo

On Sat, 2012-08-18 at 10:23 +0200, Uros Bizjak wrote:
> On Sat, Aug 18, 2012 at 10:14 AM, Uros Bizjak  wrote:
> 
> > After discussion with Oleg, it looks that it is enough to prevent
> > wrong registers in the output of the (multi-output) insn pattern. As
> > far as inputs are concerned, combine already handles limited reload
> > classes in the right way. The problem with x86 is, that reload tried
> > to fix output operand of the multi-output ins, where scheduler already
> > moved load of ax register before this insn.
> >
> > Version 2 of the patch now handles only output operands. Also,
> > handling of empty constraints was fixed.
> 
> ... but here we fail testcase from PR46843... so we HAVE to check
> input operands.

Hm, I'm curious ... how would that work for patterns such as

(define_insn "*addc"
  [(set (match_operand:SI 0 "arith_reg_dest" "=r")
(plus:SI (plus:SI
(match_operand:SI 1 "arith_reg_operand" "%0")
(match_operand:SI 2 "arith_reg_or_0_operand" "r"))
 (match_operand:SI 3 "t_reg_operand" "")))
   (clobber (reg:SI T_REG))]
  "TARGET_SH1"
  "addc %2,%0"
  [(set_attr "type" "arith")])

... where the predicate "arith_reg_or_0_operand" allows either
"const_int 0" or an "arith_reg_operand", but the constraint "r" tells
reload to load the constant into a register for this insn.
Probably those kind of patterns that rely on this behavior would need to
be rewritten as insn_and_split to do the constant loading 'manually'?

Cheers,
Oleg

[google/gcc-4_7] Fix ICEs with -gfission.

2012-08-20 Thread Cary Coutant

This patch is for the google/gcc-4_7 branch.

When a location list or location expression is removed from a DIE, we
need to remove entries in the .debug_addr table that were referenced
by those location expressions.  Except for one case, the existing code
checked only the first descriptor in each location expression instead
of looping through all the descriptors.  In cases where we don't
remove the .debug_addr table entries, an ICE occurs during assembly
output.

This patch also fixes an ICE in output_pubname with -ggnu-pubnames,
where we are asserting on TAGs that GDB doesn't care about.  Instead
of asserting, we should just be setting the flags to 0.


2012-08-20   Cary Coutant  

gcc/
* dwarf2out.c (remove_loc_list_addr_table_entries): Change
parameter; update all calls.
(output_pubname): Don't assert on unknown TAGs.
(resolve_addr): Call remove_loc_list_addr_table_entries for all
location expressions.


Index: gcc/dwarf2out.c
===
--- gcc/dwarf2out.c (revision 190548)
+++ gcc/dwarf2out.c (working copy)
@@ -4858,13 +4858,9 @@ remove_addr_table_entry (unsigned int i)
address_table.  */
 
 static void
-remove_loc_list_addr_table_entries (dw_loc_list_ref loc)
+remove_loc_list_addr_table_entries (dw_loc_descr_ref descr)
 {
-  dw_loc_descr_ref descr;
-
-  gcc_assert (loc->replaced);
-
-  for (descr = loc->expr; descr; descr = descr->dw_loc_next)
+  for (; descr; descr = descr->dw_loc_next)
 if (descr->dw_loc_oprnd1.val_index != -1U)
   remove_addr_table_entry (descr->dw_loc_oprnd1.val_index);
 }
@@ -9468,7 +9464,8 @@ output_pubname (dw_offset die_offset, pu
   GDB_INDEX_SYMBOL_STATIC_SET_VALUE(flags, 1);
   break;
 default:
-  gcc_unreachable ();
+ /* For unrecognized TAGs, don't set the flags.  */
+  break;
   }
   dw2_asm_output_data (1, flags >> GDB_INDEX_CU_BITSIZE,
"GDB-index flags");
@@ -22875,8 +22872,8 @@ resolve_addr (dw_die_ref die)
gcc_assert (!next->ll_symbol);
next->ll_symbol = (*curr)->ll_symbol;
  }
-   if (l->dw_loc_oprnd1.val_index != -1U)
- remove_addr_table_entry (l->dw_loc_oprnd1.val_index);
+   if (dwarf_split_debug_info)
+ remove_loc_list_addr_table_entries (l);
*curr = next;
  }
else
@@ -22891,7 +22888,7 @@ resolve_addr (dw_die_ref die)
  {
loc->replaced = 1;
 if (dwarf_split_debug_info)
-  remove_loc_list_addr_table_entries (loc);
+  remove_loc_list_addr_table_entries (loc->expr);
loc->dw_loc_next = *start;
  }
  }
@@ -22916,8 +22913,8 @@ resolve_addr (dw_die_ref die)
   || l->dw_loc_next != NULL)
  && !resolve_addr_in_expr (l))
{
- if (l->dw_loc_oprnd1.val_index != -1U)
-   remove_addr_table_entry (l->dw_loc_oprnd1.val_index);
+ if (dwarf_split_debug_info)
+   remove_loc_list_addr_table_entries (l);
  remove_AT (die, a->dw_attr);
  ix--;
}

Re: [google/gcc-4_7] Fix ICEs with -gfission.

2012-08-20 Thread Sterling Augustine

On Mon, Aug 20, 2012 at 5:30 PM, Cary Coutant  wrote:
> This patch is for the google/gcc-4_7 branch.
>
> When a location list or location expression is removed from a DIE, we
> need to remove entries in the .debug_addr table that were referenced
> by those location expressions.  Except for one case, the existing code
> checked only the first descriptor in each location expression instead
> of looping through all the descriptors.  In cases where we don't
> remove the .debug_addr table entries, an ICE occurs during assembly
> output.
>
> This patch also fixes an ICE in output_pubname with -ggnu-pubnames,
> where we are asserting on TAGs that GDB doesn't care about.  Instead
> of asserting, we should just be setting the flags to 0.
>
>
> 2012-08-20   Cary Coutant  
>
> gcc/
> * dwarf2out.c (remove_loc_list_addr_table_entries): Change
> parameter; update all calls.
> (output_pubname): Don't assert on unknown TAGs.
> (resolve_addr): Call remove_loc_list_addr_table_entries for all
> location expressions.

OK for google 4.7.

Sterling

[Google 4.7 Obvious] Fix trailing enumerator comma

2012-08-20 Thread Sterling Augustine

My last change to google 4.7 that included gdb/gdb-index.h as it
exists in binutils, but it included a trailing comma in an enumerator.
I have checked in as obvious the patch below which eliminates the
trailing comma.

Sterling


 2012-08-20  Sterling Augustine  

* gdb/gdb-index.h: Remove comma from last enum.


--- branches/google/gcc-4_7/include/gdb/gdb-index.h 2012/08/20 18:23:19 
190539
+++ branches/google/gcc-4_7/include/gdb/gdb-index.h 2012/08/20 23:05:44 
190549
@@ -68,7 +68,7 @@
  Give the unused bits a value so gdb will print them sensibly.  */
   GDB_INDEX_SYMBOL_KIND_UNUSED5 = 5,
   GDB_INDEX_SYMBOL_KIND_UNUSED6 = 6,
-  GDB_INDEX_SYMBOL_KIND_UNUSED7 = 7,
+  GDB_INDEX_SYMBOL_KIND_UNUSED7 = 7
 } gdb_index_symbol_kind;

 #define GDB_INDEX_SYMBOL_KIND_SHIFT 28

Re: [google/gcc-4_7] Fix ICEs with -gfission.

2012-08-20 Thread Cary Coutant

>> 2012-08-20   Cary Coutant  
>>
>> gcc/
>> * dwarf2out.c (remove_loc_list_addr_table_entries): Change
>> parameter; update all calls.
>> (output_pubname): Don't assert on unknown TAGs.
>> (resolve_addr): Call remove_loc_list_addr_table_entries for all
>> location expressions.
>
> OK for google 4.7.

Thanks, committed at r190553.

-cary

Re: [PATCH] Add working-set size and hotness information to fdo summary (issue6465057)

2012-08-20 Thread Jan Hubicka

> Xinliang David Li  writes:
> >
> > Process level synchronization problems can happen when two processes
> > (running the instrumented binary) exit at the same time. The
> > updated/merged counters from one process may be overwritten by another
> > process -- this is true for both counter data and summary data.
> > Solution 3) does not introduce any new problems.
> 
> You could just use lockf() ?

The issue here is holding lock for all the files (that can be many) versus
number of locks limits & possibilities for deadlocking (mind that updating
may happen in different orders on the same files for different programs built
from same objects)

For David: there is no thread safety code in mainline for the counters.
Long time ago Zdenek implmented poor-mans TLS for counters (before TLS was 
invented)
http://gcc.gnu.org/ml/gcc-patches/2001-11/msg01546.html but it was voted down
as too memory expensive per thread. We could optionaly do atomic updates like 
ICC
or combination of both as discussed in the thread.
So far no one implemented it since the coverage fixups seems to work well 
enough in
pracitce for multithreaded programs where reproducibility do not seem to be 
_that_
important.

For GCC profiled bootstrap however I would like to see the output binary to be
reproducible. We realy ought to update profiles safe for multple processes.
Trashing whole process run is worse than doing race in increment. There is good
chance that one of runs is more important than others and it will get trashed.

I do not think we do have serious update problems in the summaries at the 
moment.
We lock individual files as we update them. The summary is simple enough to be 
safe.
sum_all is summed, max_all is maximum over the individual runs. Even when you 
combine
mutiple programs the summary will end up same. Everything except for max_all is 
ignored
anyway.

Solution 2 (i.e. histogram streaming) will also have the property that it is 
safe
WRT multiple programs, just like sum_all.

Honza
> 
> -Andi

Re: Merge C++ conversion into trunk (0/6 - Overview)

2012-08-20 Thread Lawrence Crowl

On 8/20/12, H.J. Lu  wrote:
> The C++ merge caused:
>
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54332
>
> GCC memory usage is more than doubled from <= 3GB to >= 10GB.
> Is this a known issue?

The two memory stat reports show no differences.  Are you sure you
didn't splice in the wrong report?

-- 
Lawrence Crowl

[PATCH, MIPS] fix MIPS16 jump table overflow

2012-08-20 Thread Sandra Loosemore


In config/mips/mips.h, there is presently this comment:

/* ??? 16-bit offsets can overflow in large functions.  */
#define TARGET_MIPS16_SHORT_JUMP_TABLES TARGET_MIPS16_TEXT_LOADS

A while ago we had a bug report where a big switch statement did, in 
fact, overflow the range of 16-bit offsets, causing a runtime error.


GCC already has generic machinery to shorten offset tables for switch 
statements that does the necessary range checking, but it only works 
with "casesi", not the lower-level "tablejump" expansion.  So, this 
patch provides a "casesi" expander to handle this situation.


This patch has been in use on our local 4.5 and 4.6 branches for about a 
year now.  When testing it on mainline, I found it tripped over the 
recent change to add MIPS16 branch overflow checking in other 
situations, causing it to get into an infinite loop.  I think telling it 
to ignore these new jump insns it doesn't know how to process is the 
right thing to do, but I'm not sure if there's a better way to restrict 
the condition or make mips16_split_long_branches more robust.  Richard,
since that's your code I assume you'll suggest an alternative if this 
doesn't meet with your approval.  Is the rest of the patch OK to check in?


-Sandra


2012-08-20  Julian Brown  
Sandra Loosemore  

gcc/
* config/mips/mips.md (MIPS16_T_REGNUM): New constant.
(tablejump): Don't use for MIPS16_SHORT_JUMP_TABLES.
(casesi): New.
(casesi_internal_mips16): New.
* config/mips/mips.c (mips16_split_long_branches): Ignore
casesi_internal_mips16 insns.
* config/mips/mips.h (TARGET_MIPS16_SHORT_JUMP_TABLES): Update
comment.
(CASE_VECTOR_MODE): Use ptr_mode unconditionally.
(CASE_VECTOR_SHORTEN_MODE): Define.
(ASM_OUTPUT_ADDR_DIFF_ELT): Output word-sized addr_diff_elts
when necessary for MIPS16_SHORT_JUMP_TABLES.

gcc/testsuite/
* gcc.target/mips/code-readable-1.c: Generalize assembler
pattern for jump table.
Index: gcc/config/mips/mips.md
===
--- gcc/config/mips/mips.md	(revision 190463)
+++ gcc/config/mips/mips.md	(working copy)
@@ -138,6 +138,7 @@
 
 (define_constants
   [(TLS_GET_TP_REGNUM		3)
+   (MIPS16_T_REGNUM		24)
(PIC_FUNCTION_ADDR_REGNUM	25)
(RETURN_ADDR_REGNUM		31)
(CPRESTORE_SLOT_REGNUM	76)
@@ -5904,14 +5905,9 @@
   [(set (pc)
 	(match_operand 0 "register_operand"))
(use (label_ref (match_operand 1 "")))]
-  ""
+  "!TARGET_MIPS16_SHORT_JUMP_TABLES"
 {
-  if (TARGET_MIPS16_SHORT_JUMP_TABLES)
-operands[0] = expand_binop (Pmode, add_optab,
-convert_to_mode (Pmode, operands[0], false),
-gen_rtx_LABEL_REF (Pmode, operands[1]),
-0, 0, OPTAB_WIDEN);
-  else if (TARGET_GPWORD)
+  if (TARGET_GPWORD)
 operands[0] = expand_binop (Pmode, add_optab, operands[0],
 pic_offset_table_rtx, 0, 0, OPTAB_WIDEN);
   else if (TARGET_RTP_PIC)
@@ -5937,6 +5933,91 @@
   [(set_attr "type" "jump")
(set_attr "mode" "none")])
 
+;; For MIPS16, we don't know whether a given jump table will use short or
+;; word-sized offsets until late in compilation, when we are able to determine
+;; the sizes of the insns which comprise the containing function.  This
+;; necessitates the use of the casesi rather than the tablejump pattern, since
+;; the latter tries to calculate the index of the offset to jump through early
+;; in compilation, i.e. at expand time, when nothing is known about the
+;; eventual function layout.
+
+(define_expand "casesi"
+  [(match_operand:SI 0 "register_operand" "")	; index to jump on
+   (match_operand:SI 1 "const_int_operand" "")	; lower bound
+   (match_operand:SI 2 "const_int_operand" "")	; total range
+   (match_operand:SI 3 "" "")			; table label
+   (match_operand:SI 4 "" "")]			; out of range label
+  "TARGET_MIPS16_SHORT_JUMP_TABLES"
+{
+  if (operands[1] != const0_rtx)
+{
+  rtx reg = gen_reg_rtx (SImode);
+  rtx offset = gen_int_mode (-INTVAL (operands[1]), SImode);
+  
+  if (!arith_operand (offset, SImode))
+offset = force_reg (SImode, offset);
+  
+  emit_insn (gen_addsi3 (reg, operands[0], offset));
+  operands[0] = reg;
+}
+
+  if (!arith_operand (operands[0], SImode))
+operands[0] = force_reg (SImode, operands[0]);
+
+  operands[2] = GEN_INT (INTVAL (operands[2]) + 1);
+
+  emit_jump_insn (gen_casesi_internal_mips16 (operands[0], operands[2],
+	  operands[3], operands[4]));
+
+  DONE;
+})
+
+(define_insn "casesi_internal_mips16"
+  [(set (pc)
+ (if_then_else
+   (leu (match_operand:SI 0 "register_operand" "d")
+	(match_operand:SI 1 "arith_operand" "dI"))
+   (mem:SI (plus:SI (mult:SI (match_dup 0) (const_int 4))
+			(label_ref (match_operand 2 "" ""
+   (label_ref (match_operand 3 "" ""
+   (clobber (match_scratch:SI 4 "=&d"))
+   (clobber (match_scratch:SI 5 "=d"))
+   (clobber (reg:SI MIPS16_T_REGNU

Re: [PATCH] convert m32c to constraints.md

2012-08-20 Thread DJ Delorie


I ran the testsuite for you; no regressions.  Looks OK to me, please
apply.  Thanks!

Re: [wwwdocs] Document Runtime CPU detection builtins

2012-08-20 Thread Sriraman Tallam

Hi Gerald / Diego,

I have made all the mentioned changes.  I also shortened the
description like Diego mentioned by removing all the strings but kept
the caveats. I have not added a reference to the documentation because
i do not know what link to reference. The builtins are completely
documented in extend.texi.

   I have attached the patch. If there are no further comments I will
submit this tomorrow.

Thanks,
-Sri.


On Mon, Aug 20, 2012 at 1:31 PM, Gerald Pfeifer  wrote:
> Hi Sriraman,
>
> On Fri, 10 Aug 2012, Sriraman Tallam wrote:
>> I have added a release note for x86 builtins __builtin_cpu_is and
>> __builtin_cpu_supports. They were checked in to trunk in rev. 186789.
>
> I had hoped one of the x86 maintainers would review this from his
> perspective given that they have more background.  For the lack of
> that, let me give it a try.
>
> Index: changes.html
> ===
> + New builtin functions to detect run-time CPU type and ISA:
>
> "built-in", cf. http://gcc.gnu.org/codingconventions.html; here and
> in the following.
>
> No  here;  should just do that.
>
> +
> +  Builtin __builtin_cpu_is has been added to detect if
> +  the run-time CPU is of a particular type. The builtin returns a postive
> +  integer on a match and zero otherwise. The builtin accepts one string
> +  literal argument, the CPU name. For example,
>
> "A built-in function..."
>
> "positive"
>
> "It accepts one string" (to make this shorter)
>
> +  __builtin_cpu_is("westmere") returns a postive integer if
>
> "positive"
>
> +  the run-time CPU is an Intel Corei7 Westmere processor.  The following
>
> I don't work for Intel, but should there be a space before "i7"?
>
> +  are the CPU names recognized by __builtin_cpu_is:
>
> How about making this "The following are the CPU names recognized for
> now", which avoids another reference to the name of the built-in and
> makes it clear that this is subject to change.
>
> +  Builtin __builtin_cpu_supports has been added to 
> detect
>
> "A built-in function..."
>
> +  returns a postive integer on a match and zero otherwise. The builtin
>
> "positive"
>
> +  following are the ISA features recognized by
> +  __builtin_cpu_supports:
>
> Same is above?
>
> +Caveat: If the above builtins are called before any constructors are
> +invoked, like during IFUNC initialization, then the CPU detection
> +initialization must be explicity run using this newly provided
> +builtin,  __builtin_cpu_init.
>
> "...using the new built-in function __builtin_cpu_init."
>
> What is a constructor in this context, by the way?  Will this be clear
> to all the users?
>
> +
> +static void (*some_ifunc_resolver(void))(void)
> +{
> +   __builtin_cpu_init();
> +   if (__builtin_cpu_is("amdfam10h") ...
> +   if (__builtin_cpu_supports("popcnt") ...
> +}
> +
>
> How about using  here? That avoids the s which will cause
> problems with the web page validator, by the way.
>
>
> Nice job for documenting this so well.  Thanks for taking the time
> and your patience!
>
> The patch is fine modulo the changes I pointed out (though some of
> them are more suggestions and you do not need to slavishly follow
> those).
>
> Gerald
Index: changes.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-4.8/changes.html,v
retrieving revision 1.10
diff -u -r1.10 changes.html
--- changes.html10 Aug 2012 16:25:46 -  1.10
+++ changes.html21 Aug 2012 02:38:40 -
@@ -92,6 +92,38 @@
 wrong results.  You must build all
 modules with -mpreferred-stack-boundary=3, including any
 libraries.  This includes the system libraries and startup modules.
+ New built-in functions to detect run-time CPU type and ISA:
+
+  A built-in function __builtin_cpu_is has been added to
+  detect if the run-time CPU is of a particular type.  It returns a
+  positive integer on a match and zero otherwise.  It accepts one string
+  literal argument, the CPU name.  For example,
+  __builtin_cpu_is("westmere") returns a positive integer if
+  the run-time CPU is an Intel Core i7 Westmere processor.  Please refer
+  to the documentation for the list of valid CPU names recognized.
+  A built-in function __builtin_cpu_supports has been
+  added to detect if the run-time CPU supports a particular ISA feature.
+  It returns a positive integer on a match and zero otherwise.  It accepts
+  one string literal argument, the ISA feature.  For example,
+  __builtin_cpu_supports("ssse3") returns a positive integer
+  if the run-time CPU supports SSSE3 instructions.  Please refer to the
+  documentation for the list of valid ISA names recognized.
+
+Caveat: If these built-in functions are called before any static
+constructors are invoked, like during IFUNC

Re: Merge C++ conversion into trunk (0/6 - Overview)

2012-08-20 Thread H.J. Lu

On Mon, Aug 20, 2012 at 6:31 PM, Lawrence Crowl  wrote:
> On 8/20/12, H.J. Lu  wrote:
>> The C++ merge caused:
>>
>> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54332
>>
>> GCC memory usage is more than doubled from <= 3GB to >= 10GB.
>> Is this a known issue?
>
> The two memory stat reports show no differences.  Are you sure you
> didn't splice in the wrong report?
>

Those are outputs from -fmem-report.  I am not sure how useful
they are for this bug.

-- 
H.J.

Re: [PATCH] Add working-set size and hotness information to fdo summary (issue6465057)

2012-08-20 Thread Teresa Johnson

On Mon, Aug 20, 2012 at 6:27 PM, Jan Hubicka  wrote:
>> Xinliang David Li  writes:
>> >
>> > Process level synchronization problems can happen when two processes
>> > (running the instrumented binary) exit at the same time. The
>> > updated/merged counters from one process may be overwritten by another
>> > process -- this is true for both counter data and summary data.
>> > Solution 3) does not introduce any new problems.
>>
>> You could just use lockf() ?
>
> The issue here is holding lock for all the files (that can be many) versus
> number of locks limits & possibilities for deadlocking (mind that updating
> may happen in different orders on the same files for different programs built
> from same objects)
>
> For David: there is no thread safety code in mainline for the counters.
> Long time ago Zdenek implmented poor-mans TLS for counters (before TLS was 
> invented)
> http://gcc.gnu.org/ml/gcc-patches/2001-11/msg01546.html but it was voted down
> as too memory expensive per thread. We could optionaly do atomic updates like 
> ICC
> or combination of both as discussed in the thread.
> So far no one implemented it since the coverage fixups seems to work well 
> enough in
> pracitce for multithreaded programs where reproducibility do not seem to be 
> _that_
> important.
>
> For GCC profiled bootstrap however I would like to see the output binary to be
> reproducible. We realy ought to update profiles safe for multple processes.
> Trashing whole process run is worse than doing race in increment. There is 
> good
> chance that one of runs is more important than others and it will get trashed.
>
> I do not think we do have serious update problems in the summaries at the 
> moment.
> We lock individual files as we update them. The summary is simple enough to 
> be safe.
> sum_all is summed, max_all is maximum over the individual runs. Even when you 
> combine
> mutiple programs the summary will end up same. Everything except for max_all 
> is ignored
> anyway.
>
> Solution 2 (i.e. histogram streaming) will also have the property that it is 
> safe
> WRT multiple programs, just like sum_all.

I think the sum_all based scaling of the working set entries also has
this property. What is your opinion on saving the histogram in the
summary and merging histograms together as best as possible compared
to the alternative of saving the working set information as now and
scaling it up by the ratio between the new and old sum_all when
merging?

Thanks,
Teresa

>
> Honza
>>
>> -Andi



-- 
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413

Re: [PATCH] Add working-set size and hotness information to fdo summary (issue6465057)

2012-08-20 Thread Jan Hubicka

> On Mon, Aug 20, 2012 at 6:27 PM, Jan Hubicka  wrote:
> >> Xinliang David Li  writes:
> >> >
> >> > Process level synchronization problems can happen when two processes
> >> > (running the instrumented binary) exit at the same time. The
> >> > updated/merged counters from one process may be overwritten by another
> >> > process -- this is true for both counter data and summary data.
> >> > Solution 3) does not introduce any new problems.
> >>
> >> You could just use lockf() ?
> >
> > The issue here is holding lock for all the files (that can be many) versus
> > number of locks limits & possibilities for deadlocking (mind that updating
> > may happen in different orders on the same files for different programs 
> > built
> > from same objects)
> >
> > For David: there is no thread safety code in mainline for the counters.
> > Long time ago Zdenek implmented poor-mans TLS for counters (before TLS was 
> > invented)
> > http://gcc.gnu.org/ml/gcc-patches/2001-11/msg01546.html but it was voted 
> > down
> > as too memory expensive per thread. We could optionaly do atomic updates 
> > like ICC
> > or combination of both as discussed in the thread.
> > So far no one implemented it since the coverage fixups seems to work well 
> > enough in
> > pracitce for multithreaded programs where reproducibility do not seem to be 
> > _that_
> > important.
> >
> > For GCC profiled bootstrap however I would like to see the output binary to 
> > be
> > reproducible. We realy ought to update profiles safe for multple processes.
> > Trashing whole process run is worse than doing race in increment. There is 
> > good
> > chance that one of runs is more important than others and it will get 
> > trashed.
> >
> > I do not think we do have serious update problems in the summaries at the 
> > moment.
> > We lock individual files as we update them. The summary is simple enough to 
> > be safe.
> > sum_all is summed, max_all is maximum over the individual runs. Even when 
> > you combine
> > mutiple programs the summary will end up same. Everything except for 
> > max_all is ignored
> > anyway.
> >
> > Solution 2 (i.e. histogram streaming) will also have the property that it 
> > is safe
> > WRT multiple programs, just like sum_all.
> 
> I think the sum_all based scaling of the working set entries also has
> this property. What is your opinion on saving the histogram in the

I think the scaling will have at lest roundoff issues WRT different merging 
orders.

> summary and merging histograms together as best as possible compared
> to the alternative of saving the working set information as now and
> scaling it up by the ratio between the new and old sum_all when
> merging?

So far I like this option best. But David seems to lean towards thirtd option 
with
whole file locking.  I see it may show to be more extensible in the future.
At the moment I do not understand two things
 1) why do we need info on the number of counter above given threshold, sinc 
ethe hot/cold
decisions usually depends purely on the count cutoff.
Removing those will solve merging issues with variant 2 and then it would 
be probably
good solution.
 2) Do we plan to add some features in near future that will anyway require 
global locking?
I guess LIPO itself does not count since it streams its data into 
independent file as you
mentioned earlier and locking LIPO file is not that hard.
Does LIPO stream everything into that common file, or does it use 
combination of gcda files
and common summary?

What other stuff Google plans to merge?
(In general I would be curious about merging plans WRT profile stuff, so we 
get more
synchronized and effective on getting patches in. We have about two months 
to get it done
in stage1 and it would be nice to get as much as possible. Obviously some 
of the patches will
need bit fo dicsussion like this one. Hope you do not find it frustrating, 
I actually think
this is an important feature).

I also realized today that the common value counters (used by switch, indirect
call and div/mod value profiling) are non-stanble WRT different merging orders
(i.e.  parallel make in train run).  I do not think there is actual solution to
that except for not merging the counter section of this type in libgcov and
merge them in some canonical order at profile feedback time.  Perhaps we just
want to live with this, since the disprepancy here is small. (i.e. these
counters are quite rare and their outcome has just local effect on the final
binary, unlike the global summaries/edge counters).

Honza
> 
> Thanks,
> Teresa
> 
> >
> > Honza
> >>
> >> -Andi
> 
> 
> 
> -- 
> Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413

libgo patch committed: Define F_GETLK and friends on i386 GNU/Linux

2012-08-20 Thread Ian Lance Taylor

On i386 GNU/Linux, when compiling with -D_FILE_OFFSET_BITS=64, 
winds up doing this:

#  define F_GETLK   F_GETLK64  /* Get record locking info.  */
...
# define F_GETLK64  12  /* Get record locking info.  */

Because of the ordering, -fdump-go-spec does not write F_GETLK into the
file that it creates.  The effect is that syscall.F_GETLK is not defined
for i386 GNU/Linux.  This patch fixes the problem.  Bootstrapped and ran
Go testsuite on x86_64-unknown-linux-gnu, both 32-bit and 64-bit mode.
Committed to mainline and 4.7 branch.

Ian

diff -r a602dc132c2d libgo/mksysinfo.sh
--- a/libgo/mksysinfo.sh	Tue Aug 14 20:46:42 2012 -0700
+++ b/libgo/mksysinfo.sh	Mon Aug 20 22:10:34 2012 -0700
@@ -211,6 +211,16 @@
   echo "const O_CLOEXEC = 0" >> ${OUT}
 fi
 
+# These flags can be lost on i386 GNU/Linux when using
+# -D_FILE_OFFSET_BITS=64, because we see "#define F_SETLK F_SETLK64"
+# before we see the definition of F_SETLK64.
+for flag in F_GETLK F_SETLK F_SETLKW; do
+  if ! grep "^const ${flag} " ${OUT} >/dev/null 2>&1 \
+  && grep "^const ${flag}64 " ${OUT} >/dev/null 2>&1; then
+echo "const ${flag} = ${flag}64" >> ${OUT}
+  fi
+done
+
 # The signal numbers.
 grep '^const _SIG[^_]' gen-sysinfo.go | \
   grep -v '^const _SIGEV_' | \

Re: failed attempt: retain identifier length from frontend to backend

2012-08-20 Thread Richard Guenther

On Mon, Aug 20, 2012 at 7:03 PM, Dimitrios Apostolou  wrote:
> Hello,
>
> my last attempt on improving something serious was about three weeks ago,
> trying to keep all lengths of all strings parsed in the frontend for the
> whole compilation phase until the assembly output. I was hoping that would
> help on using faster hashes (knowing the length allows us to hash 4 or 8
> bytes per iteration), quicker strcmp in various places, and using less
> strlen() calls, which show especially on -g3 compilations that store huge
> macro strings.
>
> I'll post no patch here, since what I currently have is a mess in 3
> different branches and most don't even compile. I tried various approaches.
> First I tried adding an extra length parameter in all relevant functions,
> starting from the assembly generation and working my way upwards. This got
> too complex, and I'd really like to ask if you find any meaning in changing
> target specific hooks and macros to actually accept length as argument (e.g.
> ASM_OUTPUT_*) or return it (e.g. ASM_GENERATE_*). Changes seemed too
> intrusive for me to continue.
>
> But seeing that identifier length is there inside struct ht_identifier (or
> cpp_hashnode) and not lost, I tried the approach of having the length at
> str[-4] for all identifiers. To achieve this I changed ht_identifier to
> store str with the flexible array hack. Unfortunately I hadn't noticed that
> ht_identifier was a part of tree_node and also part of too many other
> structs, so changing all those structs to have variable size was not without
> side effects. In the end it compiled but I got crashes all over, and I'm
> sure I didn't do things right since I broke things like the static assert in
> libcpp/identifiers.c, that I don't even understand:
>
>  /* We don't need a proxy since the hash table's identifier comes first
> in cpp_hashnode.  However, in case this is ever changed, we have a
> static assertion for it.  */
> -extern char proxy_assertion_broken[offsetof (struct cpp_hashnode, ident) ==
> 0 ? 1 : -1];
>
> Anyway last attempt was to decouple ht_identifier completely from trees and
> other structs by storing pointer to it, but I was pretty worn out and
> quickly gave up after getting errors on gengtype-generated files that I
> didn't even know how to handle.
>
> Was all this project too ambitious? I'd appreciate all input.

I think the proper thing would indeed have been to pass down the length of
the string to relevant functions.

Richard.

>
> Thanks,
> Dimitris
>

Re: [PATCH] Add working-set size and hotness information to fdo summary (issue6465057)

2012-08-20 Thread Xinliang David Li

On Mon, Aug 20, 2012 at 10:29 PM, Jan Hubicka  wrote:
>> On Mon, Aug 20, 2012 at 6:27 PM, Jan Hubicka  wrote:
>> >> Xinliang David Li  writes:
>> >> >
>> >> > Process level synchronization problems can happen when two processes
>> >> > (running the instrumented binary) exit at the same time. The
>> >> > updated/merged counters from one process may be overwritten by another
>> >> > process -- this is true for both counter data and summary data.
>> >> > Solution 3) does not introduce any new problems.
>> >>
>> >> You could just use lockf() ?
>> >
>> > The issue here is holding lock for all the files (that can be many) versus
>> > number of locks limits & possibilities for deadlocking (mind that updating
>> > may happen in different orders on the same files for different programs 
>> > built
>> > from same objects)
>> >
>> > For David: there is no thread safety code in mainline for the counters.
>> > Long time ago Zdenek implmented poor-mans TLS for counters (before TLS was 
>> > invented)
>> > http://gcc.gnu.org/ml/gcc-patches/2001-11/msg01546.html but it was voted 
>> > down
>> > as too memory expensive per thread. We could optionaly do atomic updates 
>> > like ICC
>> > or combination of both as discussed in the thread.
>> > So far no one implemented it since the coverage fixups seems to work well 
>> > enough in
>> > pracitce for multithreaded programs where reproducibility do not seem to 
>> > be _that_
>> > important.
>> >
>> > For GCC profiled bootstrap however I would like to see the output binary 
>> > to be
>> > reproducible. We realy ought to update profiles safe for multple processes.
>> > Trashing whole process run is worse than doing race in increment. There is 
>> > good
>> > chance that one of runs is more important than others and it will get 
>> > trashed.
>> >
>> > I do not think we do have serious update problems in the summaries at the 
>> > moment.
>> > We lock individual files as we update them. The summary is simple enough 
>> > to be safe.
>> > sum_all is summed, max_all is maximum over the individual runs. Even when 
>> > you combine
>> > mutiple programs the summary will end up same. Everything except for 
>> > max_all is ignored
>> > anyway.
>> >
>> > Solution 2 (i.e. histogram streaming) will also have the property that it 
>> > is safe
>> > WRT multiple programs, just like sum_all.
>>
>> I think the sum_all based scaling of the working set entries also has
>> this property. What is your opinion on saving the histogram in the
>
> I think the scaling will have at lest roundoff issues WRT different merging 
> orders.
>
>> summary and merging histograms together as best as possible compared
>> to the alternative of saving the working set information as now and
>> scaling it up by the ratio between the new and old sum_all when
>> merging?
>
> So far I like this option best. But David seems to lean towards thirtd option 
> with
> whole file locking.  I see it may show to be more extensible in the future.
> At the moment I do not understand two things
>  1) why do we need info on the number of counter above given threshold, sinc 
> ethe hot/cold
> decisions usually depends purely on the count cutoff.
> Removing those will solve merging issues with variant 2 and then it would 
> be probably
> good solution.

This is useful for large applications with a long tail. The
instruction working set for those applications are very large, and
inliner and unroller need to be aware of that and good heuristics can
be developed to throttle aggressive code bloat transformations. For
inliner, it is kind of the like the global budget but more application
dependent. In the long run, we will collect more advanced fdo summary
regarding working set -- it will be working set size for each code
region (locality region).


>  2) Do we plan to add some features in near future that will anyway require 
> global locking?
> I guess LIPO itself does not count since it streams its data into 
> independent file as you
> mentioned earlier and locking LIPO file is not that hard.
> Does LIPO stream everything into that common file, or does it use 
> combination of gcda files
> and common summary?

Actually, LIPO module grouping information are stored in gcda files.
It is also stored in a separate .imports file (one per object) ---
this is primarily used by our build system for dependence information.


>
> What other stuff Google plans to merge?
> (In general I would be curious about merging plans WRT profile stuff, so 
> we get more
> synchronized and effective on getting patches in. We have about two 
> months to get it done
> in stage1 and it would be nice to get as much as possible. Obviously some 
> of the patches will
> need bit fo dicsussion like this one. Hope you do not find it 
> frustrating, I actually think
> this is an important feature).

We plan to merge in the new LIPO implementation based on LTO
streaming. Rong Xu finished this in 4.6 based compiler, an

Re: [PATCH] Add working-set size and hotness information to fdo summary (issue6465057)

2012-08-20 Thread Jan Hubicka

> 
> This is useful for large applications with a long tail. The
> instruction working set for those applications are very large, and
> inliner and unroller need to be aware of that and good heuristics can
> be developed to throttle aggressive code bloat transformations. For
> inliner, it is kind of the like the global budget but more application
> dependent. In the long run, we will collect more advanced fdo summary
> regarding working set -- it will be working set size for each code
> region (locality region).

I see, so you use it to estimate size of the working set and effect of bloating
optimizations on cache size. This sounds interesting. What are you experiences
with this?

What concerns me that it is greatly inaccurate - you have no idea how many
instructions given counter is guarding and it can differ quite a lot. Also
inlining/optimization makes working sets significantly different (by factor of
100 for tramp3d). But on the ohter hand any solution at this level will be
greatly inaccurate. So I am curious how reliable data you can get from this?
How you take this into account for the heuristics?

It seems to me that for this use perhaps the simple logic in histogram merging
maximizing the number of BBs for given bucket will work well?  It is
inaccurate, but we are working with greatly inaccurate data anyway.
Except for degenerated cases, the small and unimportant runs will have small BB
counts, while large runs will have larger counts and those are ones we optimize
for anyway.
> 
> 
> >  2) Do we plan to add some features in near future that will anyway require 
> > global locking?
> > I guess LIPO itself does not count since it streams its data into 
> > independent file as you
> > mentioned earlier and locking LIPO file is not that hard.
> > Does LIPO stream everything into that common file, or does it use 
> > combination of gcda files
> > and common summary?
> 
> Actually, LIPO module grouping information are stored in gcda files.
> It is also stored in a separate .imports file (one per object) ---
> this is primarily used by our build system for dependence information.

I see, getting LIPO safe WRT parallel updates will be fun. How does LIPO behave
on GCC bootstrap? (i.e. it does a lot more work in the libgcov module per each
invocation, so I am curious if it is practically useful at all).

With LTO based solution a lot can be probably pushed at link time? Before
actual GCC starts from the linker plugin, LIPO module can read gcov CFGs from
gcda files and do all the merging/updating/CFG constructions that is currently
performed at runtime, right?
> 
> 
> >
> > What other stuff Google plans to merge?
> > (In general I would be curious about merging plans WRT profile stuff, 
> > so we get more
> > synchronized and effective on getting patches in. We have about two 
> > months to get it done
> > in stage1 and it would be nice to get as much as possible. Obviously 
> > some of the patches will
> > need bit fo dicsussion like this one. Hope you do not find it 
> > frustrating, I actually think
> > this is an important feature).
> 
> We plan to merge in the new LIPO implementation based on LTO
> streaming. Rong Xu finished this in 4.6 based compiler, and he needs
> to port it to 4.8.

Good.  Looks like a lot of work ahead. It would be nice if we can perhaps start
by merging the libgcov infrastructure updates prior the LIPO changes.  From
what I saw at LIPO branch some time ago it has a lot of stuff that is not
exactly LIPO specific.

Honza
> 
> 
> thanks,
> 
> David
> 
> >
> > I also realized today that the common value counters (used by switch, 
> > indirect
> > call and div/mod value profiling) are non-stanble WRT different merging 
> > orders
> > (i.e.  parallel make in train run).  I do not think there is actual 
> > solution to
> > that except for not merging the counter section of this type in libgcov and
> > merge them in some canonical order at profile feedback time.  Perhaps we 
> > just
> > want to live with this, since the disprepancy here is small. (i.e. these
> > counters are quite rare and their outcome has just local effect on the final
> > binary, unlike the global summaries/edge counters).
> >
> > Honza
> >>
> >> Thanks,
> >> Teresa
> >>
> >> >
> >> > Honza
> >> >>
> >> >> -Andi
> >>
> >>
> >>
> >> --
> >> Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413

Re: Merge C++ conversion into trunk (0/6 - Overview)

2012-08-20 Thread Richard Guenther

On Tue, Aug 21, 2012 at 3:31 AM, Lawrence Crowl  wrote:
> On 8/20/12, H.J. Lu  wrote:
>> The C++ merge caused:
>>
>> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54332
>>
>> GCC memory usage is more than doubled from <= 3GB to >= 10GB.
>> Is this a known issue?
>
> The two memory stat reports show no differences.  Are you sure you
> didn't splice in the wrong report?

Well, easy things such as a messed up hashtab conversion (no freeing)
or vec conversion (no freeing) can cause this, or even the gengtype change
causing GC issues  (which is why those should
have been different revisions).

Richard.

> --
> Lawrence Crowl

Loop iterations inline hint

2012-08-20 Thread Jan Hubicka


Hi,
this patch adds a hint that if inlining makes bounds on loop iterations known,
it is probably good idea.  This is primarely targetting Fortran's array
descriptors, but should be generally useful.

Fortran will still need a bit more work. Often we disregard inlining because we
think the call is cold (because it comes from Main) so inlining heuristic will
need more updating and apparently we will also need to update for PHI
conditionals as done in Martin's patch 3/3.

At the moment the hint is interpreted same way as the indirect_call hint from
previous patch.

Martin: I think ipa-cp should also make use of this hint. Resolving number of
loop iterations is important enough reason to specialize in many cases.
I think it already has logic for devirtualization but perhaps it should be made
more aggressive? I was sort of surprised that for Mozila the inlining hint
makes us to catch 20 times more cases than before. Most of the cases sounds like
good ipa-cp candidates.

Also can you please try to finaly make param notes to be used by the virtual
clones machinery and thus make it possible for ipa-cp to specialize for known
aggregate parameters? This should make a lot of difference for Fortran, I think.

Boostrapped/regtested x86_64-linux, will commit it after a bit more testing.
Honza

* gcc.dg/ipa/inlinehint-1.c: New.

PR fortran/48636
* ipa-inline.c (want_inline_small_function_p): Take loop_iterations 
hint.
(edge_badness): Likewise.
* ipa-inline.h (inline_hints_vals): Add INLINE_HINT_loop_iterations.
(inline_summary): Add loop_iterations.
* ipa-inline-analysis.c: Include tree-scalar-evolution.h.
(dump_inline_hints): Dump loop_iterations.
(reset_inline_summary): Free loop_iterations.
(inline_node_duplication_hook): Update loop_iterations.
(dump_inline_summary): Dump loop_iterations.
(will_be_nonconstant_expr_predicate): New function.
(estimate_function_body_sizes): Analyze loops.
(estimate_node_size_and_time): Set hint loop_iterations.
(inline_merge_summary): Merge loop iterations.
(inline_read_section): Stream in loop_iterations.
(inline_write_summary): Stram out loop_iterations.
Index: testsuite/gcc.dg/ipa/inlinehint-1.c
===
*** testsuite/gcc.dg/ipa/inlinehint-1.c (revision 0)
--- testsuite/gcc.dg/ipa/inlinehint-1.c (revision 0)
***
*** 0 
--- 1,16 
+ /* { dg-options "-O3 -c -fdump-ipa-inline-details -fno-early-inlining 
-fno-ipa-cp"  } */
+ test (int a)
+ {
+int i;
+for (i=0; isymbol.decl)
   && growth >= MAX_INLINE_INSNS_SINGLE
!  && !(hints & INLINE_HINT_indirect_call))
{
e->inline_failed = CIF_MAX_INLINE_INSNS_SINGLE_LIMIT;
  want_inline = false;
--- 480,487 
 hints suggests that inlining given function is very profitable.  */
else if (DECL_DECLARED_INLINE_P (callee->symbol.decl)
   && growth >= MAX_INLINE_INSNS_SINGLE
!  && !(hints & (INLINE_HINT_indirect_call
!| INLINE_HINT_loop_iterations)))
{
e->inline_failed = CIF_MAX_INLINE_INSNS_SINGLE_LIMIT;
  want_inline = false;
*** edge_badness (struct cgraph_edge *edge, 
*** 863,869 
  if (dump)
fprintf (dump_file, "Badness overflow\n");
}
!   if (hints & INLINE_HINT_indirect_call)
badness /= 8;
if (dump)
{
--- 864,871 
  if (dump)
fprintf (dump_file, "Badness overflow\n");
}
!   if (hints & (INLINE_HINT_indirect_call
!  | INLINE_HINT_loop_iterations))
badness /= 8;
if (dump)
{
Index: ipa-inline.h
===
*** ipa-inline.h(revision 190510)
--- ipa-inline.h(working copy)
*** typedef struct GTY(()) condition
*** 45,51 
  /* Inline hints are reasons why inline heuristics should preffer inlining 
given function.
 They are represtented as bitmap of the following values.  */
  enum inline_hints_vals {
!   INLINE_HINT_indirect_call = 1
  };
  typedef int inline_hints;
  
--- 45,52 
  /* Inline hints are reasons why inline heuristics should preffer inlining 
given function.
 They are represtented as bitmap of the following values.  */
  enum inline_hints_vals {
!   INLINE_HINT_indirect_call = 1,
!   INLINE_HINT_loop_iterations = 2
  };
  typedef int inline_hints;
  
*** struct GTY(()) inline_summary
*** 118,123 
--- 119,128 
   merged during inlining.  */
conditions conds;
VEC(size_time_entry,gc) *entry;
+ 
+   /* Predicate on when some loop in the function sbecomes to have known
+  bounds.   */
+   struct predicate * GTY((skip)) loop_iterations;
  };
  
  
Index: ipa-inline-analysis.c
=

Re: [wwwdocs] Update Fortran secrion in 4.8/changes.html

2012-08-20 Thread Tobias Burnus


Gerald Pfeifer wrote:

I went ahead and made some smaller changes, patch below.


Thanks.


I noticed you are using ..., as in e,
which we usually don't.  Why that?


My impression was that a one-letter  didn't stand out enough and 
looked rather odd; if you think it improves consistency or readability, 
feel free to change it.


 * * *

I intent to commit the attached patch to document two new warning flags, 
which were recently added. (Suggested in ISO/IEC Technical Report 24772 
"Guidance for Avoiding Vulnerabilities through Language Selection and Use".)


Tobias
Index: changes.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-4.8/changes.html,v
retrieving revision 1.17
diff -u -r1.17 changes.html
--- changes.html	20 Aug 2012 12:23:39 -	1.17
+++ changes.html	21 Aug 2012 06:56:55 -
@@ -92,6 +92,21 @@
 (re)allocation in hot loops. (For arrays, replacing var=
 by var(:)= disables the automatic reallocation.)
 
+The http://gcc.gnu.org/onlinedocs/gfortran/Error-and-Warning-Options.html";>
+-Wcompare-reals flag has been added. When this flag is set,
+warnings are issued when comparing REAL or
+COMPLEX types for equality and inequality; consider replacing
+a == b by abs(a−b) < eps with a suitable
+eps. The -Wcompare-reals flag is enabled by
+-Wall.
+
+The http://gcc.gnu.org/onlinedocs/gfortran/Error-and-Warning-Options.html";>
+-Wtarget-lifetime flag has been added (enabled with
+-Wall), which warns if the pointer in a pointer assignment
+might outlive its target.
+
 Reading floating point numbers which use q for
 the exponential (such as 4.0q0) is now supported as vendor
 extension for better compatibility with old data files. It is strongly

94 matches

Mail list logo